      1 \input texinfo @c -*-texinfo-*-
      2 @setfilename gprof.info
      3 @c Copyright (C) 1988-2016 Free Software Foundation, Inc.
      4 @settitle GNU gprof
      5 @setchapternewpage odd
      6 
      7 @c man begin INCLUDE
      8 @include bfdver.texi
      9 @c man end
     10 
     11 @ifnottex
     12 @c This is a dir.info fragment to support semi-automated addition of
     13 @c manuals to an info tree.  zoo@@cygnus.com is developing this facility.
     14 @dircategory Software development
     15 @direntry
     16 * gprof: (gprof).                Profiling your program's execution
     17 @end direntry
     18 @end ifnottex
     19 
     20 @copying
     21 This file documents the gprof profiler of the GNU system.
     22 
     23 @c man begin COPYRIGHT
     24 Copyright @copyright{} 1988-2016 Free Software Foundation, Inc.
     25 
     26 Permission is granted to copy, distribute and/or modify this document
     27 under the terms of the GNU Free Documentation License, Version 1.3
     28 or any later version published by the Free Software Foundation;
     29 with no Invariant Sections, with no Front-Cover Texts, and with no
     30 Back-Cover Texts.  A copy of the license is included in the
     31 section entitled ``GNU Free Documentation License''.
     32 
     33 @c man end
     34 @end copying
     35 
     36 @finalout
     37 @smallbook
     38 
     39 @titlepage
     40 @title GNU gprof
     41 @subtitle The @sc{gnu} Profiler
     42 @ifset VERSION_PACKAGE
     43 @subtitle @value{VERSION_PACKAGE}
     44 @end ifset
     45 @subtitle Version @value{VERSION}
     46 @author Jay Fenlason and Richard Stallman
     47 
     48 @page
     49 
     50 This manual describes the @sc{gnu} profiler, @code{gprof}, and how you
     51 can use it to determine which parts of a program are taking most of the
     52 execution time.  We assume that you know how to write, compile, and
     53 execute programs.  @sc{gnu} @code{gprof} was written by Jay Fenlason.
     54 Eric S. Raymond made some minor corrections and additions in 2003.
     55 
     56 @vskip 0pt plus 1filll
     57 Copyright @copyright{} 1988-2016 Free Software Foundation, Inc.
     58 
     59       Permission is granted to copy, distribute and/or modify this document
     60       under the terms of the GNU Free Documentation License, Version 1.3
     61       or any later version published by the Free Software Foundation;
     62       with no Invariant Sections, with no Front-Cover Texts, and with no
     63       Back-Cover Texts.  A copy of the license is included in the
     64       section entitled ``GNU Free Documentation License''.
     65 
     66 @end titlepage
     67 @contents
     68 
     69 @ifnottex
     70 @node Top
     71 @top Profiling a Program: Where Does It Spend Its Time?
     72 
     73 This manual describes the @sc{gnu} profiler, @code{gprof}, and how you
     74 can use it to determine which parts of a program are taking most of the
     75 execution time.  We assume that you know how to write, compile, and
     76 execute programs.  @sc{gnu} @code{gprof} was written by Jay Fenlason.
     77 
     78 This manual is for @code{gprof}
     79 @ifset VERSION_PACKAGE
     80 @value{VERSION_PACKAGE}
     81 @end ifset
     82 version @value{VERSION}.
     83 
     84 This document is distributed under the terms of the GNU Free
     85 Documentation License version 1.3.  A copy of the license is included
     86 in the section entitled ``GNU Free Documentation License''.
     87 
     88 @menu
     89 * Introduction::        What profiling means, and why it is useful.
     90 
     91 * Compiling::           How to compile your program for profiling.
     92 * Executing::           Executing your program to generate profile data
     93 * Invoking::            How to run @code{gprof}, and its options
     94 
     95 * Output::              Interpreting @code{gprof}'s output
     96 
     97 * Inaccuracy::          Potential problems you should be aware of
     98 * How do I?::           Answers to common questions
     99 * Incompatibilities::   (between @sc{gnu} @code{gprof} and Unix @code{gprof}.)
    100 * Details::             Details of how profiling is done
    101 * GNU Free Documentation License::  GNU Free Documentation License
    102 @end menu
    103 @end ifnottex
    104 
    105 @node Introduction
    106 @chapter Introduction to Profiling
    107 
    108 @ifset man
    109 @c man title gprof display call graph profile data
    110 
    111 @smallexample
    112 @c man begin SYNOPSIS
    113 gprof [ -[abcDhilLrsTvwxyz] ] [ -[ACeEfFJnNOpPqQZ][@var{name}] ]
    114  [ -I @var{dirs} ] [ -d[@var{num}] ] [ -k @var{from/to} ]
    115  [ -m @var{min-count} ] [ -R @var{map_file} ] [ -t @var{table-length} ]
    116  [ --[no-]annotated-source[=@var{name}] ]
    117  [ --[no-]exec-counts[=@var{name}] ]
    118  [ --[no-]flat-profile[=@var{name}] ] [ --[no-]graph[=@var{name}] ]
    119  [ --[no-]time=@var{name}] [ --all-lines ] [ --brief ]
    120  [ --debug[=@var{level}] ] [ --function-ordering ]
    121  [ --file-ordering @var{map_file} ] [ --directory-path=@var{dirs} ]
    122  [ --display-unused-functions ] [ --file-format=@var{name} ]
    123  [ --file-info ] [ --help ] [ --line ] [ --inline-file-names ]
    124  [ --min-count=@var{n} ] [ --no-static ] [ --print-path ]
    125  [ --separate-files ] [ --static-call-graph ] [ --sum ]
    126  [ --table-length=@var{len} ] [ --traditional ] [ --version ]
    127  [ --width=@var{n} ] [ --ignore-non-functions ]
    128  [ --demangle[=@var{STYLE}] ] [ --no-demangle ]
    129  [ --external-symbol-table=@var{name} ]
    130  [ @var{image-file} ] [ @var{profile-file} @dots{} ]
    131 @c man end
    132 @end smallexample
    133 
    134 @c man begin DESCRIPTION
    135 @code{gprof} produces an execution profile of C, Pascal, or Fortran77
    136 programs.  The effect of called routines is incorporated in the profile
    137 of each caller.  The profile data is taken from the call graph profile file
    138 (@file{gmon.out} default) which is created by programs
    139 that are compiled with the @samp{-pg} option of
    140 @code{cc}, @code{pc}, and @code{f77}.
    141 The @samp{-pg} option also links in versions of the library routines
    142 that are compiled for profiling.  @code{Gprof} reads the given object
    143 file (the default is @code{a.out}) and establishes the relation between
    144 its symbol table and the call graph profile from @file{gmon.out}.
    145 If more than one profile file is specified, the @code{gprof}
    146 output shows the sum of the profile information in the given profile files.
    147 
    148 @code{Gprof} calculates the amount of time spent in each routine.
    149 Next, these times are propagated along the edges of the call graph.
    150 Cycles are discovered, and calls into a cycle are made to share the time
    151 of the cycle.
    152 
    153 @c man end
    154 
    155 @c man begin BUGS
    156 The granularity of the sampling is shown, but remains
    157 statistical at best.
    158 We assume that the time for each execution of a function
    159 can be expressed by the total time for the function divided
    160 by the number of times the function is called.
    161 Thus the time propagated along the call graph arcs to the function's
    162 parents is directly proportional to the number of times that
    163 arc is traversed.
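
To illustrate the proportional rule above, here is a small shell sketch.
It is not part of @code{gprof}; the function name @code{propagated_time}
and the figures are invented for illustration.  It computes the share of a
child's total time that would be charged to one parent, assuming every
call costs the same.

```shell
# propagated_time TOTAL ARC_CALLS ALL_CALLS
# Share of a child's TOTAL seconds charged to one parent, assuming each
# call costs TOTAL / ALL_CALLS (the averaging assumption described above).
propagated_time() {
    awk -v total="$1" -v arc="$2" -v all="$3" \
        'BEGIN { printf "%.2f\n", total * arc / all }'
}

# A child accounts for 2.00 seconds over 4 calls: 3 from parent A, 1 from B.
propagated_time 2.00 3 4    # share charged to A: prints 1.50
propagated_time 2.00 1 4    # share charged to B: prints 0.50
```

If the three calls from A were in fact much cheaper than the one from B,
the real split would differ; the profile cannot show that.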
    164 
    165 Parents that are not themselves profiled will have the time of
    166 their profiled children propagated to them, but they will appear
    167 to be spontaneously invoked in the call graph listing, and will
    168 not have their time propagated further.
    169 Similarly, signal catchers, even though profiled, will appear
    170 to be spontaneous (although for more obscure reasons).
    171 Any profiled children of signal catchers should have their times
    172 propagated properly, unless the signal catcher was invoked during
    173 the execution of the profiling routine, in which case all is lost.
    174 
    175 The profiled program must call @code{exit}(2)
    176 or return normally for the profiling information to be saved
    177 in the @file{gmon.out} file.
    178 @c man end
    179 
    180 @c man begin FILES
    181 @table @code
    182 @item @file{a.out}
    183 the namelist and text space.
    184 @item @file{gmon.out}
    185 dynamic call graph and profile.
    186 @item @file{gmon.sum}
    187 summarized dynamic call graph and profile.
    188 @end table
    189 @c man end
    190 
    191 @c man begin SEEALSO
    192 monitor(3), profil(2), cc(1), prof(1), and the Info entry for @file{gprof}.
    193 
    194 ``An Execution Profiler for Modular Programs'',
    195 by S. Graham, P. Kessler, M. McKusick;
    196 Software - Practice and Experience,
    197 Vol. 13, pp. 671-685, 1983.
    198 
    199 ``gprof: A Call Graph Execution Profiler'',
    200 by S. Graham, P. Kessler, M. McKusick;
    201 Proceedings of the SIGPLAN '82 Symposium on Compiler Construction,
    202 SIGPLAN Notices, Vol. 17, No. 6, pp. 120-126, June 1982.
    203 @c man end
    204 @end ifset
    205 
    206 Profiling allows you to learn where your program spent its time and which
    207 functions called which other functions while it was executing.  This
    208 information can show you which pieces of your program are slower than you
    209 expected, and might be candidates for rewriting to make your program
    210 execute faster.  It can also tell you which functions are being called more
    211 or less often than you expected.  This may help you spot bugs that had
    212 otherwise been unnoticed.
    213 
    214 Since the profiler uses information collected during the actual execution
    215 of your program, it can be used on programs that are too large or too
    216 complex to analyze by reading the source.  However, how your program is run
    217 will affect the information that shows up in the profile data.  If you
    218 don't use some feature of your program while it is being profiled, no
    219 profile information will be generated for that feature.
    220 
    221 Profiling has several steps:
    222 
    223 @itemize @bullet
    224 @item
    225 You must compile and link your program with profiling enabled.
    226 @xref{Compiling, ,Compiling a Program for Profiling}.
    227 
    228 @item
    229 You must execute your program to generate a profile data file.
    230 @xref{Executing, ,Executing the Program}.
    231 
    232 @item
    233 You must run @code{gprof} to analyze the profile data.
    234 @xref{Invoking, ,@code{gprof} Command Summary}.
    235 @end itemize
    236 
    237 The next three chapters explain these steps in greater detail.
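
As a rough sketch, the three steps can be strung together in one shell
function.  The helper name @code{profile_once} and the report file name
are assumptions made for this illustration; the sketch also assumes the
program does not change directory, so that @file{gmon.out} appears in the
current directory.

```shell
# profile_once SOURCE.c [ARGS...]
# Hypothetical helper: build SOURCE.c with profiling, run it, and write the
# gprof report to SOURCE-without-.c plus ".profile.txt".
profile_once() {
    src=$1; shift
    exe=${src%.c}
    cc -g -pg -o "$exe" "$src" || return 1       # step 1: compile and link with -pg
    "./$exe" "$@" || return 1                     # step 2: run; gmon.out is written on exit
    gprof "$exe" gmon.out > "$exe.profile.txt"    # step 3: analyze the profile data
}
```

For example, @code{profile_once myprog.c input.dat} would leave the report
in @file{myprog.profile.txt}.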
    238 
    239 @c man begin DESCRIPTION
    240 
    241 Several forms of output are available from the analysis.
    242 
    243 The @dfn{flat profile} shows how much time your program spent in each function,
    244 and how many times that function was called.  If you simply want to know
    245 which functions burn most of the cycles, it is stated concisely here.
    246 @xref{Flat Profile, ,The Flat Profile}.
    247 
    248 The @dfn{call graph} shows, for each function, which functions called it, which
    249 other functions it called, and how many times.  There is also an estimate
    250 of how much time was spent in the subroutines of each function.  This can
    251 suggest places where you might try to eliminate function calls that use a
    252 lot of time.  @xref{Call Graph, ,The Call Graph}.
    253 
    254 The @dfn{annotated source} listing is a copy of the program's
    255 source code, labeled with the number of times each line of the
    256 program was executed.  @xref{Annotated Source, ,The Annotated Source
    257 Listing}.
    258 @c man end
    259 
    260 To better understand how profiling works, you may wish to read
    261 a description of its implementation.
    262 @xref{Implementation, ,Implementation of Profiling}.
    263 
    264 @node Compiling
    265 @chapter Compiling a Program for Profiling
    266 
    267 The first step in generating profile information for your program is
    268 to compile and link it with profiling enabled.
    269 
    270 To compile a source file for profiling, specify the @samp{-pg} option when
    271 you run the compiler.  (This is in addition to the options you normally
    272 use.)
    273 
    274 To link the program for profiling, if you use a compiler such as @code{cc}
    275 to do the linking, simply specify @samp{-pg} in addition to your usual
    276 options.  The same option, @samp{-pg}, alters either compilation or linking
    277 to do what is necessary for profiling.  Here are examples:
    278 
    279 @example
    280 cc -g -c myprog.c utils.c -pg
    281 cc -o myprog myprog.o utils.o -pg
    282 @end example
    283 
    284 The @samp{-pg} option also works with a command that both compiles and links:
    285 
    286 @example
    287 cc -o myprog myprog.c utils.c -g -pg
    288 @end example
    289 
    290 Note: The @samp{-pg} option must be part of your compilation options
    291 as well as your link options.  If it is not, no call-graph data
    292 will be gathered, and when you run @code{gprof} you will get an error
    293 message like this:
    294 
    295 @example
    296 gprof: gmon.out file is missing call-graph data
    297 @end example
    298 
    299 If you add the @samp{-Q} switch to suppress the printing of the call
    300 graph data you will still be able to see the time samples:
    301 
    302 @example
    303 Flat profile:
    304 
    305 Each sample counts as 0.01 seconds.
    306   %   cumulative   self              self     total
    307  time   seconds   seconds    calls  Ts/call  Ts/call  name
    308  44.12      0.07     0.07                             zazLoop
    309  35.29      0.14     0.06                             main
    310  20.59      0.17     0.04                             bazMillion
    311 @end example
    312 
    313 If you run the linker @code{ld} directly instead of through a compiler
    314 such as @code{cc}, you may have to specify a profiling startup file
    315 @file{gcrt0.o} as the first input file instead of the usual startup
    316 file @file{crt0.o}.  In addition, you would probably want to
    317 specify the profiling C library, @file{libc_p.a}, by writing
    318 @samp{-lc_p} instead of the usual @samp{-lc}.  This is not absolutely
    319 necessary, but doing this gives you number-of-calls information for
    320 standard library functions such as @code{read} and @code{open}.  For
    321 example:
    322 
    323 @example
    324 ld -o myprog /lib/gcrt0.o myprog.o utils.o -lc_p
    325 @end example
    326 
    327 If you are running the program on a system which supports shared
    328 libraries you may run into problems with the profiling support code in
    329 a shared library being called before that library has been fully
    330 initialised.  This is usually detected by the program encountering a
    331 segmentation fault as soon as it is run.  The solution is to link
    332 against a static version of the library containing the profiling
    333 support code, which for @code{gcc} users can be done via the
    334 @samp{-static} or @samp{-static-libgcc} command line option.  For
    335 example:
    336 
    337 @example
    338 gcc -g -pg -static-libgcc myprog.c utils.c -o myprog
    339 @end example
    340 
    341 If you compile only some of the modules of the program with @samp{-pg}, you
    342 can still profile the program, but you won't get complete information about
    343 the modules that were compiled without @samp{-pg}.  The only information
    344 you get for the functions in those modules is the total time spent in them;
    345 there is no record of how many times they were called, or from where.  This
    346 will not affect the flat profile (except that the @code{calls} field for
    347 the functions will be blank), but will greatly reduce the usefulness of the
    348 call graph.
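
A partly profiled build might look like the following sketch.  The function
name @code{build_partly_profiled} is hypothetical, and the file names are
borrowed from the earlier examples.

```shell
# build_partly_profiled
# Hypothetical build in which only myprog.c is compiled with -pg.
build_partly_profiled() {
    cc -g -pg -c myprog.c &&            # instrumented: call counts and arcs are recorded
    cc -g -c utils.c &&                 # not instrumented: only time samples are attributed
    cc -pg -o myprog myprog.o utils.o   # -pg is still given when linking
}
```

In the resulting profile, functions from @file{utils.c} would be expected
to show time but a blank @code{calls} field.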
    349 
    350 If you wish to perform line-by-line profiling you should use the
    351 @code{gcov} tool instead of @code{gprof}.  See that tool's manual or
    352 info pages for more details of how to do this.
    353 
    354 Note that older versions of @code{gcc} produce line-by-line profiling
    355 information that works with @code{gprof} rather than @code{gcov}, so
    356 @code{gprof} still supports displaying this kind of information.
    357 @xref{Line-by-line, ,Line-by-line Profiling}.
    358 
    359 It is also worth noting that @code{gcc} implements a
    360 @samp{-finstrument-functions} command line option which will insert
    361 calls to special user-supplied instrumentation routines at the entry
    362 and exit of every function in the program.  This can be used to
    363 implement an alternative profiling scheme.
    364 
    365 @node Executing
    366 @chapter Executing the Program
    367 
    368 Once the program is compiled for profiling, you must run it in order to
    369 generate the information that @code{gprof} needs.  Simply run the program
    370 as usual, using the normal arguments, file names, etc.  The program should
    371 run normally, producing the same output as usual.  It will, however, run
    372 somewhat slower than normal because of the time spent collecting and
    373 writing the profile data.
    374 
    375 The way you run the program---the arguments and input that you give
    376 it---may have a dramatic effect on what the profile information shows.  The
    377 profile data will describe the parts of the program that were activated for
    378 the particular input you use.  For example, if the first command you give
    379 to your program is to quit, the profile data will show the time used in
    380 initialization and in cleanup, but not much else.
    381 
    382 Your program will write the profile data into a file called @file{gmon.out}
    383 just before exiting.  If there is already a file called @file{gmon.out},
    384 its contents are overwritten.  There is currently no way to tell the
    385 program to write the profile data under a different name, but you can rename
    386 the file afterwards if you are concerned that it may be overwritten.
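
One way to do the renaming is a small shell helper such as the sketch
below.  The name @code{keep_profile} and the @file{gmon.@var{tag}.out}
naming scheme are assumptions for this example, not conventions of
@code{gprof}.

```shell
# keep_profile TAG
# Hypothetical helper: rename ./gmon.out to ./gmon.TAG.out so that the next
# profiled run does not overwrite it.
keep_profile() {
    [ -f gmon.out ] || { echo "keep_profile: no gmon.out here" >&2; return 1; }
    mv gmon.out "gmon.$1.out"
}
```

After each profiled run, calling for instance @code{keep_profile run1}
preserves that run's data under its own name.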
    387 
    388 In order to write the @file{gmon.out} file properly, your program must exit
    389 normally: by returning from @code{main} or by calling @code{exit}.  Calling
    390 the low-level function @code{_exit} does not write the profile data, and
    391 neither does abnormal termination due to an unhandled signal.
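
The following shell wrapper is a sketch of how to check that a run really
produced profile data.  The name @code{run_profiled} is hypothetical.  The
sketch assumes the program stays in the current directory, and it relies
on the shell convention that an exit status above 128 normally means the
command was killed by a signal.

```shell
# run_profiled COMMAND [ARGS...]
# Hypothetical wrapper: run a profiled program and report whether it left a
# fresh gmon.out in the current directory.
run_profiled() {
    rm -f gmon.out                     # drop any stale profile first
    "$@"
    rc=$?
    if [ "$rc" -gt 128 ]; then         # usually means death by signal
        echo "killed by signal $((rc - 128)); no profile expected" >&2
    elif [ ! -f gmon.out ]; then
        echo "exited with status $rc but wrote no gmon.out" >&2
    fi
    [ -f gmon.out ]                    # succeed only if a profile exists
}
```

A missing file after a seemingly normal exit may point to a call to
@code{_exit} somewhere in the program.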
    392 
    393 The @file{gmon.out} file is written in the program's @emph{current working
    394 directory} at the time it exits.  This means that if your program calls
    395 @code{chdir}, the @file{gmon.out} file will be left in the last directory
    396 your program @code{chdir}'d to.  If you don't have permission to write in
    397 this directory, the file is not written, and you will get an error message.
    398 
    399 Older versions of the @sc{gnu} profiling library may also write a file
    400 called @file{bb.out}.  This file, if present, contains a human-readable
    401 listing of the basic-block execution counts.  Unfortunately, the
    402 appearance of a human-readable @file{bb.out} means the basic-block
    403 counts didn't get written into @file{gmon.out}.
    404 The Perl script @code{bbconv.pl}, included with the @code{gprof}
    405 source distribution, will convert a @file{bb.out} file into
    406 a format readable by @code{gprof}.  Invoke it like this:
    407 
    408 @smallexample
    409 bbconv.pl < bb.out > @var{bb-data}
    410 @end smallexample
    411 
    412 This translates the information in @file{bb.out} into a form that
    413 @code{gprof} can understand.  But you still need to tell @code{gprof}
    414 about the existence of this translated information.  To do that, include
    415 @var{bb-data} on the @code{gprof} command line, @emph{along with
    416 @file{gmon.out}}, like this:
    417 
    418 @smallexample
    419 gprof @var{options} @var{executable-file} gmon.out @var{bb-data} [@var{yet-more-profile-data-files}@dots{}] [> @var{outfile}]
    420 @end smallexample
    421 
    422 @node Invoking
    423 @chapter @code{gprof} Command Summary
    424 
    425 After you have a profile data file @file{gmon.out}, you can run @code{gprof}
    426 to interpret the information in it.  The @code{gprof} program prints a
    427 flat profile and a call graph on standard output.  Typically you would
    428 redirect the output of @code{gprof} into a file with @samp{>}.
    429 
    430 You run @code{gprof} like this:
    431 
    432 @smallexample
    433 gprof @var{options} [@var{executable-file} [@var{profile-data-files}@dots{}]] [> @var{outfile}]
    434 @end smallexample
    435 
    436 @noindent
    437 Here square brackets indicate optional arguments.
    438 
    439 If you omit the executable file name, the file @file{a.out} is used.  If
    440 you give no profile data file name, the file @file{gmon.out} is used.  If
    441 any file is not in the proper format, or if the profile data file does not
    442 appear to belong to the executable file, an error message is printed.
    443 
    444 You can give more than one profile data file by entering all their names
    445 after the executable file name; then the statistics in all the data files
    446 are summed together.
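
For instance, if several runs were saved under names of the form
@file{gmon.@var{tag}.out} (an assumed naming scheme), a helper like this
sketch passes them all to one @code{gprof} invocation.  The name
@code{report_all} is hypothetical.

```shell
# report_all EXE OUTFILE
# Hypothetical helper: feed every gmon.*.out file in the current directory
# to a single gprof run, which sums their statistics.
report_all() {
    exe=$1; out=$2
    set -- gmon.*.out                  # positional parameters become the file list
    [ -e "$1" ] || { echo "report_all: no gmon.*.out files" >&2; return 1; }
    gprof "$exe" "$@" > "$out"
}
```

Here @code{report_all myprog summed.txt} would write one combined report
to @file{summed.txt}.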
    447 
    448 The order of these options does not matter.
    449 
    450 @menu
    451 * Output Options::      Controlling @code{gprof}'s output style
    452 * Analysis Options::    Controlling how @code{gprof} analyzes its data
    453 * Miscellaneous Options::
    454 * Deprecated Options::  Options you no longer need to use, but which
    455                             have been retained for compatibility
    456 * Symspecs::            Specifying functions to include or exclude
    457 @end menu
    458 
    459 @node Output Options
    460 @section Output Options
    461 
    462 @c man begin OPTIONS
    463 These options specify which of several output formats
    464 @code{gprof} should produce.
    465 
    466 Many of these options take an optional @dfn{symspec} to specify
    467 functions to be included or excluded.  These options can be
    468 specified multiple times, with different symspecs, to include
    469 or exclude sets of symbols.  @xref{Symspecs, ,Symspecs}.
    470 
    471 Specifying any of these options overrides the default (@samp{-p -q}),
    472 which prints a flat profile and call graph analysis
    473 for all functions.
    474 
    475 @table @code
    476 
    477 @item -A[@var{symspec}]
    478 @itemx --annotated-source[=@var{symspec}]
    479 The @samp{-A} option causes @code{gprof} to print annotated source code.
    480 If @var{symspec} is specified, print output only for matching symbols.
    481 @xref{Annotated Source, ,The Annotated Source Listing}.
    482 
    483 @item -b
    484 @itemx --brief
    485 If the @samp{-b} option is given, @code{gprof} doesn't print the
    486 verbose blurbs that try to explain the meaning of all of the fields in
    487 the tables.  This is useful if you intend to print out the output, or
    488 are tired of seeing the blurbs.
    489 
    490 @item -C[@var{symspec}]
    491 @itemx --exec-counts[=@var{symspec}]
    492 The @samp{-C} option causes @code{gprof} to
    493 print a tally of functions and the number of times each was called.
    494 If @var{symspec} is specified, print tally only for matching symbols.
    495 
    496 If the profile data file contains basic-block count records, specifying
    497 the @samp{-l} option, along with @samp{-C}, will cause basic-block
    498 execution counts to be tallied and displayed.
    499 
    500 @item -i
    501 @itemx --file-info
    502 The @samp{-i} option causes @code{gprof} to display summary information
    503 about the profile data file(s) and then exit.  The number of histogram,
    504 call graph, and basic-block count records is displayed.
    505 
    506 @item -I @var{dirs}
    507 @itemx --directory-path=@var{dirs}
    508 The @samp{-I} option specifies a list of search directories in
    509 which to find source files.  Environment variable @var{GPROF_PATH}
    510 can also be used to convey this information.
    511 Used mostly for annotated source output.
    512 
    513 @item -J[@var{symspec}]
    514 @itemx --no-annotated-source[=@var{symspec}]
    515 The @samp{-J} option causes @code{gprof} not to
    516 print annotated source code.
    517 If @var{symspec} is specified, @code{gprof} prints annotated source,
    518 but excludes matching symbols.
    519 
    520 @item -L
    521 @itemx --print-path
    522 Normally, source filenames are printed with the path
    523 component suppressed.  The @samp{-L} option causes @code{gprof}
    524 to print the full pathname of
    525 source filenames, which is determined
    526 from symbolic debugging information in the image file
    527 and is relative to the directory in which the compiler
    528 was invoked.
    529 
    530 @item -p[@var{symspec}]
    531 @itemx --flat-profile[=@var{symspec}]
    532 The @samp{-p} option causes @code{gprof} to print a flat profile.
    533 If @var{symspec} is specified, print flat profile only for matching symbols.
    534 @xref{Flat Profile, ,The Flat Profile}.
    535 
    536 @item -P[@var{symspec}]
    537 @itemx --no-flat-profile[=@var{symspec}]
    538 The @samp{-P} option causes @code{gprof} to suppress printing a flat profile.
    539 If @var{symspec} is specified, @code{gprof} prints a flat profile,
    540 but excludes matching symbols.
    541 
    542 @item -q[@var{symspec}]
    543 @itemx --graph[=@var{symspec}]
    544 The @samp{-q} option causes @code{gprof} to print the call graph analysis.
    545 If @var{symspec} is specified, print call graph only for matching symbols
    546 and their children.
    547 @xref{Call Graph, ,The Call Graph}.
    548 
    549 @item -Q[@var{symspec}]
    550 @itemx --no-graph[=@var{symspec}]
    551 The @samp{-Q} option causes @code{gprof} to suppress printing the
    552 call graph.
    553 If @var{symspec} is specified, @code{gprof} prints a call graph,
    554 but excludes matching symbols.
    555 
    556 @item -t @var{num}
    557 @itemx --table-length=@var{num}
    558 The @samp{-t} option causes the @var{num} most active source lines in
    559 each source file to be listed when source annotation is enabled.  The
    560 default is 10.
    561 
    562 @item -y
    563 @itemx --separate-files
    564 This option affects annotated source output only.
    565 Normally, @code{gprof} prints annotated source files
    566 to standard output.  If this option is specified,
    567 annotated source for a file named @file{path/@var{filename}}
    568 is generated in the file @file{@var{filename}-ann}.  If the underlying
    569 file system would truncate @file{@var{filename}-ann} so that it
    570 overwrites the original @file{@var{filename}}, @code{gprof} generates
    571 annotated source in the file @file{@var{filename}.ann} instead (if the
    572 original file name has an extension, that extension is @emph{replaced}
    573 with @file{.ann}).
    574 
    575 @item -Z[@var{symspec}]
    576 @itemx --no-exec-counts[=@var{symspec}]
    577 The @samp{-Z} option causes @code{gprof} not to
    578 print a tally of functions and the number of times each was called.
    579 If @var{symspec} is specified, print tally, but exclude matching symbols.
    580 
    581 @item -r
    582 @itemx --function-ordering
    583 The @samp{--function-ordering} option causes @code{gprof} to print a
    584 suggested function ordering for the program based on profiling data.
    585 This option suggests an ordering which may improve paging, TLB and
    586 cache behavior for the program on systems which support arbitrary
    587 ordering of functions in an executable.
    588 
    589 The exact details of how to force the linker to place functions
    590 in a particular order are system dependent and outside the scope of this
    591 manual.
    592 
    593 @item -R @var{map_file}
    594 @itemx --file-ordering @var{map_file}
    595 The @samp{--file-ordering} option causes @code{gprof} to print a
    596 suggested .o link line ordering for the program based on profiling data.
    597 This option suggests an ordering which may improve paging, TLB and
    598 cache behavior for the program on systems which do not support arbitrary
    599 ordering of functions in an executable.
    600 
    601 Use of the @samp{-a} argument is highly recommended with this option.
    602 
    603 The @var{map_file} argument is a pathname to a file which provides
    604 function name to object file mappings.  The format of the file is similar to
    605 the output of the program @code{nm}.
    606 
    607 @smallexample
    608 @group
    609 c-parse.o:00000000 T yyparse
    610 c-parse.o:00000004 C yyerrflag
    611 c-lang.o:00000000 T maybe_objc_method_name
    612 c-lang.o:00000000 T print_lang_statistics
    613 c-lang.o:00000000 T recognize_objc_keyword
    614 c-decl.o:00000000 T print_lang_identifier
    615 c-decl.o:00000000 T print_lang_type
    616 @dots{}
    617 
    618 @end group
    619 @end smallexample
    620 
    621 To create a @var{map_file} with @sc{gnu} @code{nm}, type a command like
    622 @kbd{nm --extern-only --defined-only -v --print-file-name program-name}.
    623 
    624 @item -T
    625 @itemx --traditional
    626 The @samp{-T} option causes @code{gprof} to print its output in
    627 ``traditional'' BSD style.
    628 
    629 @item -w @var{width}
    630 @itemx --width=@var{width}
    631 Sets width of output lines to @var{width}.
    632 Currently only used when printing the function index at the bottom
    633 of the call graph.
    634 
    635 @item -x
    636 @itemx --all-lines
    637 This option affects annotated source output only.
    638 By default, only the lines at the beginning of a basic-block
    639 are annotated.  If this option is specified, every line in
    640 a basic-block is annotated by repeating the annotation for the
    641 first line.  This behavior is similar to @code{tcov}'s @samp{-a}.
    642 
    643 @item --demangle[=@var{style}]
    644 @itemx --no-demangle
    645 These options control whether C++ symbol names should be demangled when
    646 printing output.  The default is to demangle symbols.  The
    647 @code{--no-demangle} option may be used to turn off demangling. Different
    648 compilers have different mangling styles.  The optional demangling style
    649 argument can be used to choose an appropriate demangling style for your
    650 compiler.
    651 @end table
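
As an illustration of the @samp{--file-ordering} workflow described in the
table above, the sketch below builds a map file with the @code{nm} command
quoted there and then asks @code{gprof} for a suggested link order, adding
@samp{-a} as recommended.  The helper name @code{suggest_link_order} and
the @file{@var{program}.map} file name are assumptions.

```shell
# suggest_link_order PROGRAM [PROFILE...]
# Hypothetical helper: create PROGRAM.map with GNU nm, then print gprof's
# suggested object-file ordering.  Defaults to gmon.out if no profile is named.
suggest_link_order() {
    prog=$1; shift
    [ $# -gt 0 ] || set -- gmon.out
    map=$prog.map                      # assumed map file name
    nm --extern-only --defined-only -v --print-file-name "$prog" > "$map" || return 1
    gprof -a --file-ordering "$map" "$prog" "$@"
}
```

The map file is left in place afterwards so it can be inspected or reused.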
    652 
    653 @node Analysis Options
    654 @section Analysis Options
    655 
    656 @table @code
    657 
    658 @item -a
    659 @itemx --no-static
    660 The @samp{-a} option causes @code{gprof} to suppress the printing of
    661 statically declared (private) functions.  (These are functions whose
    662 names are not listed as global, and which are not visible outside the
    663 file/function/block where they were defined.)  Time spent in these
    664 functions, calls to/from them, etc., will all be attributed to the
    665 function that was loaded directly before it in the executable file.
    666 @c This is compatible with Unix @code{gprof}, but a bad idea.
    667 This option affects both the flat profile and the call graph.
    668 
    669 @item -c
    670 @itemx --static-call-graph
    671 The @samp{-c} option causes the call graph of the program to be
    672 augmented by a heuristic which examines the text space of the object
    673 file and identifies function calls in the binary machine code.
    674 Since normal call graph records are only generated when functions are
    675 entered, this option identifies children that could have been called,
    676 but never were.  Calls to functions that were not compiled with
    677 profiling enabled are also identified, but only if symbol table
    678 entries are present for them.
    679 Calls to dynamic library routines are typically @emph{not} found
    680 by this option.
    681 Parents or children identified via this heuristic
    682 are indicated in the call graph with call counts of @samp{0}.
    683 
    684 @item -D
    685 @itemx --ignore-non-functions
    686 The @samp{-D} option causes @code{gprof} to ignore symbols which
    687 are not known to be functions.  This option will give more accurate
    688 profile data on systems where it is supported (Solaris and HP-UX for
    689 example).
    690 
    691 @item -k @var{from}/@var{to}
    692 The @samp{-k} option allows you to delete from the call graph any arcs from
    693 symbols matching symspec @var{from} to those matching symspec @var{to}.
    694 
    695 @item -l
    696 @itemx --line
    697 The @samp{-l} option enables line-by-line profiling, which causes
    698 histogram hits to be charged to individual source code lines,
    699 instead of functions.  This feature only works with programs compiled
    700 by older versions of the @code{gcc} compiler.  Newer versions of
    701 @code{gcc} are designed to work with the @code{gcov} tool instead.
    702 
    703 If the program was compiled with basic-block counting enabled,
    704 this option will also identify how many times each line of
    705 code was executed.
    706 While line-by-line profiling can help isolate where in a large function
    707 a program is spending its time, it also significantly increases
    708 the running time of @code{gprof}, and magnifies statistical
    709 inaccuracies.
    710 @xref{Sampling Error, ,Statistical Sampling Error}.
    711 
    712 @item --inline-file-names
    713 This option causes @code{gprof} to print the source file after each
    714 symbol in both the flat profile and the call graph. The full path to the
    715 file is printed if used with the @samp{-L} option.
    716 
    717 @item -m @var{num}
    718 @itemx --min-count=@var{num}
    719 This option affects execution count output only.
    720 Symbols that are executed fewer than @var{num} times are suppressed.
    721 
    722 @item -n@var{symspec}
    723 @itemx --time=@var{symspec}
    724 The @samp{-n} option causes @code{gprof}, in its call graph analysis,
    725 to only propagate times for symbols matching @var{symspec}.
    726 
    727 @item -N@var{symspec}
    728 @itemx --no-time=@var{symspec}
    729 The @samp{-N} option causes @code{gprof}, in its call graph analysis,
    730 not to propagate times for symbols matching @var{symspec}.
    731 
    732 @item -S@var{filename}
    733 @itemx --external-symbol-table=@var{filename}
    734 The @samp{-S} option causes @code{gprof} to read an external symbol table
    735 file, such as @file{/proc/kallsyms}, rather than read the symbol table
    736 from the given object file (the default is @code{a.out}). This is useful
    737 for profiling kernel modules.
    738 
    739 @item -z
    740 @itemx --display-unused-functions
    741 If you give the @samp{-z} option, @code{gprof} will mention all
    742 functions in the flat profile, even those that were never called, and
    743 that had no time spent in them.  This is useful in conjunction with the
    744 @samp{-c} option for discovering which routines were never called.
    745 
    746 @end table
    747 
    748 @node Miscellaneous Options
    749 @section Miscellaneous Options
    750 
    751 @table @code
    752 
    753 @item -d[@var{num}]
    754 @itemx --debug[=@var{num}]
    755 The @samp{-d @var{num}} option specifies debugging options.
    756 If @var{num} is not specified, enable all debugging.
    757 @xref{Debugging, ,Debugging @code{gprof}}.
    758 
    759 @item -h
    760 @itemx --help
    761 The @samp{-h} option prints command line usage.
    762 
    763 @item -O@var{name}
    764 @itemx --file-format=@var{name}
    765 Selects the format of the profile data files.  Recognized formats are
    766 @samp{auto} (the default), @samp{bsd}, @samp{4.4bsd}, @samp{magic}, and
    767 @samp{prof} (not yet supported).
    768 
    769 @item -s
    770 @itemx --sum
    771 The @samp{-s} option causes @code{gprof} to summarize the information
    772 in the profile data files it read in, and write out a profile data
    773 file called @file{gmon.sum}, which contains all the information from
    774 the profile data files that @code{gprof} read in.  The file @file{gmon.sum}
    775 may be one of the specified input files; the effect of this is to
    776 merge the data in the other input files into @file{gmon.sum}.
    777 
    778 You can then run @code{gprof} again without @samp{-s} to analyze the
    779 cumulative data in the file @file{gmon.sum}.
    780 
    781 @item -v
    782 @itemx --version
    783 The @samp{-v} flag causes @code{gprof} to print the current version
    784 number, and then exit.
    785 
    786 @end table
    787 
    788 @node Deprecated Options
    789 @section Deprecated Options
    790 
    791 These options have been replaced with newer versions that use symspecs.
    792 
    793 @table @code
    794 
    795 @item -e @var{function_name}
    796 The @samp{-e @var{function_name}} option tells @code{gprof} not to print
    797 information about the function @var{function_name} (and its
    798 children@dots{}) in the call graph.  The function will still be listed
    799 as a child of any functions that call it, but its index number will be
    800 shown as @samp{[not printed]}.  More than one @samp{-e} option may be
    801 given; only one @var{function_name} may be indicated with each @samp{-e}
    802 option.
    803 
    804 @item -E @var{function_name}
    805 The @samp{-E @var{function_name}} option works like the @samp{-e} option, but
    806 time spent in the function (and children who were not called from
    807 anywhere else) will not be used to compute the percentages-of-time for
    808 the call graph.  More than one @samp{-E} option may be given; only one
    809 @var{function_name} may be indicated with each @samp{-E} option.
    810 
    811 @item -f @var{function_name}
    812 The @samp{-f @var{function_name}} option causes @code{gprof} to limit the
    813 call graph to the function @var{function_name} and its children (and
    814 their children@dots{}).  More than one @samp{-f} option may be given;
    815 only one @var{function_name} may be indicated with each @samp{-f}
    816 option.
    817 
    818 @item -F @var{function_name}
    819 The @samp{-F @var{function_name}} option works like the @samp{-f} option, but
    820 only time spent in the function and its children (and their
    821 children@dots{}) will be used to determine total-time and
    822 percentages-of-time for the call graph.  More than one @samp{-F} option
    823 may be given; only one @var{function_name} may be indicated with each
    824 @samp{-F} option.  The @samp{-F} option overrides the @samp{-E} option.
    825 
    826 @end table
    827 
    828 @c man end
    829 
    830 Note that only one function can be specified with each @code{-e},
    831 @code{-E}, @code{-f} or @code{-F} option.  To specify more than one
    832 function, use multiple options.  For example, this command:
    833 
    834 @example
    835 gprof -e boring -f foo -f bar myprogram > gprof.output
    836 @end example
    837 
    838 @noindent
    839 lists in the call graph all functions that were reached from either
    840 @code{foo} or @code{bar} and were not reachable from @code{boring}.
    841 
    842 @node Symspecs
    843 @section Symspecs
    844 
    845 Many of the output options allow functions to be included or excluded
    846 using @dfn{symspecs} (symbol specifications), which observe the
    847 following syntax:
    848 
    849 @example
    850   filename_containing_a_dot
    851 | funcname_not_containing_a_dot
    852 | linenumber
    853 | ( [ any_filename ] `:' ( any_funcname | linenumber ) )
    854 @end example
    855 
    856 Here are some sample symspecs:
    857 
    858 @table @samp
    859 @item main.c
    860 Selects everything in file @file{main.c}---the
    861 dot in the string tells @code{gprof} to interpret
    862 the string as a filename, rather than as
    863 a function name.  To select a file whose
    864 name does not contain a dot, a trailing colon
    865 should be specified.  For example, @samp{odd:} is
    866 interpreted as the file named @file{odd}.
    867 
    868 @item main
    869 Selects all functions named @samp{main}.
    870 
    871 Note that there may be multiple instances of the same function name
    872 because some of the definitions may be local (i.e., static).  Unless a
    873 function name is unique in a program, you must use the colon notation
    874 explained below to specify a function from a specific source file.
    875 
    876 Sometimes, function names contain dots.  In such cases, it is necessary
    877 to add a leading colon to the name.  For example, @samp{:.mul} selects
    878 function @samp{.mul}.
    879 
    880 In some object file formats, symbols have a leading underscore.
    881 @code{gprof} will normally not print these underscores.  When you name a
    882 symbol in a symspec, you should type it exactly as @code{gprof} prints
    883 it in its output.  For example, if the compiler produces a symbol
    884 @samp{_main} from your @code{main} function, @code{gprof} still prints
    885 it as @samp{main} in its output, so you should use @samp{main} in
    886 symspecs.
    887 
    888 @item main.c:main
    889 Selects function @samp{main} in file @file{main.c}.
    890 
    891 @item main.c:134
    892 Selects line 134 in file @file{main.c}.
    893 @end table
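
The grammar and the sample symspecs above can be illustrated with a small
classifier.  The following Python sketch is not @code{gprof}'s own parser;
the function name @code{classify_symspec} and the layout of the returned
tuple are assumptions made for illustration.  It only mirrors the rules
described in this section: a colon separates an optional filename from a
function name or line number, and without a colon a dot marks a filename.

```python
def classify_symspec(spec):
    """Split a symspec into (filename, funcname, linenumber).

    Illustrative only: follows the documented grammar, not gprof's source.
    Parts that the symspec does not constrain are returned as None.
    """
    filename = funcname = line = None
    if ":" in spec:
        # [ any_filename ] ':' ( any_funcname | linenumber )
        left, right = spec.split(":", 1)
        filename = left or None
        if right.isdecimal():
            line = int(right)
        elif right:
            funcname = right
    elif "." in spec:
        # No colon, but a dot: treat the whole string as a filename.
        filename = spec
    elif spec.isdecimal():
        line = int(spec)
    else:
        funcname = spec
    return filename, funcname, line


for example in ("main.c", "main", "odd:", ":.mul", "main.c:main", "main.c:134"):
    print(example, "->", classify_symspec(example))
```

The trailing-colon and leading-colon forms (@samp{odd:} and @samp{:.mul})
fall out of the same rule: an empty part on either side of the colon is
simply left unconstrained.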
    894 
    895 @node Output
    896 @chapter Interpreting @code{gprof}'s Output
    897 
    898 @code{gprof} can produce several different output styles, the
    899 most important of which are described below.  The simplest output
    900 styles (file information, execution count, and function and file ordering)
    901 are not described here, but are documented with the respective options
    902 that trigger them.
    903 @xref{Output Options, ,Output Options}.
    904 
    905 @menu
    906 * Flat Profile::        The flat profile shows how much time was spent
    907                             executing directly in each function.
    908 * Call Graph::          The call graph shows which functions called which
    909                             others, and how much time each function used
    910                             when its subroutine calls are included.
    911 * Line-by-line::        @code{gprof} can analyze individual source code lines
    912 * Annotated Source::    The annotated source listing displays source code
    913                             labeled with execution counts
    914 @end menu
    915 
    916 
    917 @node Flat Profile
    918 @section The Flat Profile
    919 @cindex flat profile
    920 
    921 The @dfn{flat profile} shows the total amount of time your program
    922 spent executing each function.  Unless the @samp{-z} option is given,
    923 functions with no apparent time spent in them, and no apparent calls
    924 to them, are not mentioned.  Note that if a function was not compiled
    925 for profiling, and didn't run long enough to show up on the program
    926 counter histogram, it will be indistinguishable from a function that
    927 was never called.
    928 
    929 This is part of a flat profile for a small program:
    930 
    931 @smallexample
    932 @group
    933 Flat profile:
    934 
    935 Each sample counts as 0.01 seconds.
    936   %   cumulative   self              self     total
    937  time   seconds   seconds    calls  ms/call  ms/call  name
    938  33.34      0.02     0.02     7208     0.00     0.00  open
    939  16.67      0.03     0.01      244     0.04     0.12  offtime
    940  16.67      0.04     0.01        8     1.25     1.25  memccpy
    941  16.67      0.05     0.01        7     1.43     1.43  write
    942  16.67      0.06     0.01                             mcount
    943   0.00      0.06     0.00      236     0.00     0.00  tzset
    944   0.00      0.06     0.00      192     0.00     0.00  tolower
    945   0.00      0.06     0.00       47     0.00     0.00  strlen
    946   0.00      0.06     0.00       45     0.00     0.00  strchr
    947   0.00      0.06     0.00        1     0.00    50.00  main
    948   0.00      0.06     0.00        1     0.00     0.00  memcpy
    949   0.00      0.06     0.00        1     0.00    10.11  print
    950   0.00      0.06     0.00        1     0.00     0.00  profil
    951   0.00      0.06     0.00        1     0.00    50.00  report
    952 @dots{}
    953 @end group
    954 @end smallexample
    955 
    956 @noindent
    957 The functions are sorted first by decreasing run-time spent in them,
    958 then by decreasing number of calls, then alphabetically by name.  The
    959 functions @samp{mcount} and @samp{profil} are part of the profiling
    960 apparatus and appear in every flat profile; their time gives a measure of
    961 the amount of overhead due to profiling.
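
The sort order described above can be expressed as a three-part sort key.
The Python sketch below is an illustration only, not code from
@code{gprof}; the rows are a subset of the sample flat profile, and
@samp{mcount} is left out because its @samp{calls} field is blank.

```python
# (name, self_seconds, calls) rows; a subset of the sample flat profile.
rows = [
    ("tzset", 0.00, 236),
    ("write", 0.01, 7),
    ("open", 0.02, 7208),
    ("strlen", 0.00, 47),
    ("offtime", 0.01, 244),
    ("memccpy", 0.01, 8),
    ("tolower", 0.00, 192),
]

# Decreasing self time, then decreasing call count, then name.
ordered = sorted(rows, key=lambda r: (-r[1], -r[2], r[0]))
for name, self_seconds, calls in ordered:
    print(f"{self_seconds:5.2f} {calls:6d}  {name}")
```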
    962 
    963 Just before the column headers, a statement appears indicating
    964 how much time each sample counted as.
    965 This @dfn{sampling period} estimates the margin of error in each of the time
    966 figures.  A time figure that is not much larger than this is not
    967 reliable.  In this example, each sample counted as 0.01 seconds,
    968 suggesting a 100 Hz sampling rate.
    969 The program's total execution time was 0.06
    970 seconds, as indicated by the @samp{cumulative seconds} field.  Since
    971 each sample counted for 0.01 seconds, this means only six samples
    972 were taken during the run.  Two of the samples occurred while the
    973 program was in the @samp{open} function, as indicated by the
    974 @samp{self seconds} field.  The other four samples
    975 occurred one each in @samp{offtime}, @samp{memccpy}, @samp{write},
    976 and @samp{mcount}.
    977 Since only six samples were taken, none of these values can
    978 be regarded as particularly reliable.
    979 In another run,
    980 the @samp{self seconds} field for
    981 @samp{mcount} might well be @samp{0.00} or @samp{0.02}.
    982 @xref{Sampling Error, ,Statistical Sampling Error},
    983 for a complete discussion.
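
The arithmetic in this example can be checked with a short script.  The
sketch below is written in Python purely for illustration and is not part
of @code{gprof}; the @samp{self seconds} figures are copied from the sample
flat profile above, and the variable names are assumptions.

```python
# Sampling period printed just before the column headers, in seconds.
SAMPLE_PERIOD = 0.01

# "self seconds" figures copied from the sample flat profile.
self_seconds = {
    "open": 0.02,
    "offtime": 0.01,
    "memccpy": 0.01,
    "write": 0.01,
    "mcount": 0.01,
}

# Each histogram sample accounts for one sampling period.
samples = {name: round(t / SAMPLE_PERIOD) for name, t in self_seconds.items()}
total_samples = sum(samples.values())

print("total samples:", total_samples)
for name, count in samples.items():
    share = 100.0 * count / total_samples
    print(f"{name:8s} {count} sample(s), {share:.2f}% of run time")
```

With so few samples, a single extra or missing sample changes a function's
share by one sixth of the total, which is why the figures in this listing
should not be taken as reliable.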
    984 
    985 The remaining functions in the listing (those whose
    986 @samp{self seconds} field is @samp{0.00}) didn't appear
    987 in the histogram samples at all.  However, the call graph
    988 indicated that they were called, so they are listed,
    989 sorted in decreasing order by the @samp{calls} field.
    990 Clearly some time was spent executing these functions,
    991 but the paucity of histogram samples prevents any
    992 determination of how much time each took.
    993 
    994 Here is what the fields in each line mean:
    995 
    996 @table @code
    997 @item % time
    998 This is the percentage of the total execution time your program spent
    999 in this function.  These should all add up to 100%.
   1000 
   1001 @item cumulative seconds
   1002 This is the cumulative total number of seconds the computer spent
   1003 executing this function, plus the time spent in all the functions
   1004 above this one in this table.
   1005 
   1006 @item self seconds
   1007 This is the number of seconds accounted for by this function alone.
   1008 The flat profile listing is sorted first by this number.
   1009 
   1010 @item calls
   1011 This is the total number of times the function was called.  If the
   1012 function was never called, or the number of times it was called cannot
   1013 be determined (probably because the function was not compiled with
   1014 profiling enabled), the @dfn{calls} field is blank.
   1015 
   1016 @item self ms/call
   1017 This represents the average number of milliseconds spent in this
   1018 function per call, if this function is profiled.  Otherwise, this field
   1019 is blank for this function.
   1020 
   1021 @item total ms/call
   1022 This represents the average number of milliseconds spent in this
   1023 function and its descendants per call, if this function is profiled.
   1024 Otherwise, this field is blank for this function.
   1025 This is the only field in the flat profile that uses call graph analysis.
   1026 
   1027 @item name
   1028 This is the name of the function.   The flat profile is sorted by this
   1029 field alphabetically after the @dfn{self seconds} and @dfn{calls}
   1030 fields are sorted.
   1031 @end table
   1032 
   1033 @node Call Graph
   1034 @section The Call Graph
   1035 @cindex call graph
   1036 
   1037 The @dfn{call graph} shows how much time was spent in each function
   1038 and its children.  From this information, you can find functions that,
   1039 while they themselves may not have used much time, called other
   1040 functions that did use unusual amounts of time.
   1041 
   1042 Here is a sample call graph from a small program.  It came from the
   1043 same @code{gprof} run as the flat profile example in the previous
   1044 section.
   1045 
   1046 @smallexample
   1047 @group
   1048 granularity: each sample hit covers 2 byte(s) for 20.00% of 0.05 seconds
   1049 
   1050 index % time    self  children    called     name
   1051                                                  <spontaneous>
   1052 [1]    100.0    0.00    0.05                 start [1]
   1053                 0.00    0.05       1/1           main [2]
   1054                 0.00    0.00       1/2           on_exit [28]
   1055                 0.00    0.00       1/1           exit [59]
   1056 -----------------------------------------------
   1057                 0.00    0.05       1/1           start [1]
   1058 [2]    100.0    0.00    0.05       1         main [2]
   1059                 0.00    0.05       1/1           report [3]
   1060 -----------------------------------------------
   1061                 0.00    0.05       1/1           main [2]
   1062 [3]    100.0    0.00    0.05       1         report [3]
   1063                 0.00    0.03       8/8           timelocal [6]
   1064                 0.00    0.01       1/1           print [9]
   1065                 0.00    0.01       9/9           fgets [12]
   1066                 0.00    0.00      12/34          strncmp <cycle 1> [40]
   1067                 0.00    0.00       8/8           lookup [20]
   1068                 0.00    0.00       1/1           fopen [21]
   1069                 0.00    0.00       8/8           chewtime [24]
   1070                 0.00    0.00       8/16          skipspace [44]
   1071 -----------------------------------------------
   1072 [4]     59.8    0.01        0.02       8+472     <cycle 2 as a whole> [4]
   1073                 0.01        0.02     244+260         offtime <cycle 2> [7]
   1074                 0.00        0.00     236+1           tzset <cycle 2> [26]
   1075 -----------------------------------------------
   1076 @end group
   1077 @end smallexample
   1078 
   1079 The lines full of dashes divide this table into @dfn{entries}, one for each
   1080 function.  Each entry has one or more lines.
   1081 
   1082 In each entry, the primary line is the one that starts with an index number
   1083 in square brackets.  The end of this line says which function the entry is
   1084 for.  The preceding lines in the entry describe the callers of this
   1085 function and the following lines describe its subroutines (also called
   1086 @dfn{children} when we speak of the call graph).
   1087 
   1088 The entries are sorted by time spent in the function and its subroutines.
   1089 
   1090 The internal profiling function @code{mcount} (@pxref{Flat Profile, ,The
   1091 Flat Profile}) is never mentioned in the call graph.
   1092 
   1093 @menu
   1094 * Primary::       Details of the primary line's contents.
   1095 * Callers::       Details of caller-lines' contents.
   1096 * Subroutines::   Details of subroutine-lines' contents.
   1097 * Cycles::        When there are cycles of recursion,
   1098                    such as @code{a} calls @code{b} calls @code{a}@dots{}
   1099 @end menu
   1100 
   1101 @node Primary
   1102 @subsection The Primary Line
   1103 
   1104 The @dfn{primary line} in a call graph entry is the line that
   1105 describes the function which the entry is about and gives the overall
   1106 statistics for this function.
   1107 
   1108 For reference, we repeat the primary line from the entry for function
   1109 @code{report} in our main example, together with the heading line that
   1110 shows the names of the fields:
   1111 
   1112 @smallexample
   1113 @group
   1114 index  % time    self  children called     name
   1115 @dots{}
   1116 [3]    100.0    0.00    0.05       1         report [3]
   1117 @end group
   1118 @end smallexample
   1119 
   1120 Here is what the fields in the primary line mean:
   1121 
   1122 @table @code
   1123 @item index
   1124 Entries are numbered with consecutive integers.  Each function
   1125 therefore has an index number, which appears at the beginning of its
   1126 primary line.
   1127 
   1128 Each cross-reference to a function, as a caller or subroutine of
   1129 another, gives its index number as well as its name.  The index number
   1130 guides you if you wish to look for the entry for that function.
   1131 
   1132 @item % time
   1133 This is the percentage of the total time that was spent in this
   1134 function, including time spent in subroutines called from this
   1135 function.
   1136 
   1137 The time spent in this function is counted again for the callers of
   1138 this function.  Therefore, adding up these percentages is meaningless.
   1139 
   1140 @item self
   1141 This is the total amount of time spent in this function.  This
   1142 should be identical to the number printed in the @code{self seconds} field
   1143 for this function in the flat profile.
   1144 
   1145 @item children
   1146 This is the total amount of time spent in the subroutine calls made by
   1147 this function.  This should be equal to the sum of all the @code{self}
   1148 and @code{children} entries of the children listed directly below this
   1149 function.
   1150 
   1151 @item called
   1152 This is the number of times the function was called.
   1153 
   1154 If the function called itself recursively, there are two numbers,
   1155 separated by a @samp{+}.  The first number counts non-recursive calls,
   1156 and the second counts recursive calls.
   1157 
   1158 In the example above, the function @code{report} was called once from
   1159 @code{main}.
   1160 
   1161 @item name
   1162 This is the name of the current function.  The index number is
   1163 repeated after it.
   1164 
   1165 If the function is part of a cycle of recursion, the cycle number is
   1166 printed between the function's name and the index number
   1167 (@pxref{Cycles, ,How Mutually Recursive Functions Are Described}).
   1168 For example, if function @code{gnurr} is part of
   1169 cycle number one, and has index number twelve, its primary line would
   1170 end like this:
   1171 
   1172 @example
   1173 gnurr <cycle 1> [12]
   1174 @end example
   1175 @end table
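
The relationship between the @code{children} field of a primary line and
the subroutine lines below it can be verified against the sample call
graph.  The Python sketch below is illustrative only; the figures are
copied from the entry for @code{report} in the main example, and times are
held as integer hundredths of a second (an assumption made here to avoid
floating-point rounding).

```python
# (self, children) times in hundredths of a second for each subroutine
# line of the entry for `report`, copied from the sample call graph.
report_subroutines = {
    "timelocal": (0, 3),
    "print": (0, 1),
    "fgets": (0, 1),
    "strncmp": (0, 0),
    "lookup": (0, 0),
    "fopen": (0, 0),
    "chewtime": (0, 0),
    "skipspace": (0, 0),
}

# The `children` field on the primary line for `report` is 0.05 seconds.
report_children = 5

total = sum(s + c for s, c in report_subroutines.values())
print(f"sum over subroutine lines: {total / 100:.2f} s")
print("matches primary line:", total == report_children)
```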
   1176 
   1177 @node Callers
   1178 @subsection Lines for a Function's Callers
   1179 
   1180 A function's entry has a line for each function it was called by.
   1181 These lines' fields correspond to the fields of the primary line, but
   1182 their meanings are different because of the difference in context.
   1183 
   1184 For reference, we repeat two lines from the entry for the function
   1185 @code{report}, the primary line and one caller-line preceding it, together
   1186 with the heading line that shows the names of the fields:
   1187 
   1188 @smallexample
   1189 index  % time    self  children called     name
   1190 @dots{}
   1191                 0.00    0.05       1/1           main [2]
   1192 [3]    100.0    0.00    0.05       1         report [3]
   1193 @end smallexample
   1194 
   1195 Here are the meanings of the fields in the caller-line for @code{report}
   1196 called from @code{main}:
   1197 
   1198 @table @code
   1199 @item self
   1200 An estimate of the amount of time spent in @code{report} itself when it was
   1201 called from @code{main}.
   1202 
   1203 @item children
   1204 An estimate of the amount of time spent in subroutines of @code{report}
   1205 when @code{report} was called from @code{main}.
   1206 
   1207 The sum of the @code{self} and @code{children} fields is an estimate
   1208 of the amount of time spent within calls to @code{report} from @code{main}.
   1209 
   1210 @item called
   1211 Two numbers: the number of times @code{report} was called from @code{main},
   1212 followed by the total number of non-recursive calls to @code{report} from
   1213 all its callers.
   1214 
   1215 @item name and index number
   1216 The name of the caller of @code{report} to which this line applies,
   1217 followed by the caller's index number.
   1218 
   1219 Not all functions have entries in the call graph; some
   1220 options to @code{gprof} request the omission of certain functions.
   1221 When a caller has no entry of its own, it still has caller-lines
   1222 in the entries of the functions it calls.
   1223 
   1224 If the caller is part of a recursion cycle, the cycle number is
   1225 printed between the name and the index number.
   1226 @end table
   1227 
   1228 If the identity of the callers of a function cannot be determined, a
   1229 dummy caller-line is printed which has @samp{<spontaneous>} as the
   1230 ``caller's name'' and all other fields blank.  This can happen for
   1231 signal handlers.
   1232 @c What if some calls have determinable callers' names but not all?
   1233 @c FIXME - still relevant?
   1234 
   1235 @node Subroutines
   1236 @subsection Lines for a Function's Subroutines
   1237 
   1238 A function's entry has a line for each of its subroutines---in other
   1239 words, a line for each other function that it called.  These lines'
   1240 fields correspond to the fields of the primary line, but their meanings
   1241 are different because of the difference in context.
   1242 
   1243 For reference, we repeat two lines from the entry for the function
   1244 @code{main}, the primary line and a line for a subroutine, together
   1245 with the heading line that shows the names of the fields:
   1246 
   1247 @smallexample
   1248 index  % time    self  children called     name
   1249 @dots{}
   1250 [2]    100.0    0.00    0.05       1         main [2]
   1251                 0.00    0.05       1/1           report [3]
   1252 @end smallexample
   1253 
   1254 Here are the meanings of the fields in the subroutine-line for @code{main}
   1255 calling @code{report}:
   1256 
   1257 @table @code
   1258 @item self
   1259 An estimate of the amount of time spent directly within @code{report}
   1260 when @code{report} was called from @code{main}.
   1261 
   1262 @item children
   1263 An estimate of the amount of time spent in subroutines of @code{report}
   1264 when @code{report} was called from @code{main}.
   1265 
   1266 The sum of the @code{self} and @code{children} fields is an estimate
   1267 of the total time spent in calls to @code{report} from @code{main}.
   1268 
   1269 @item called
   1270 Two numbers, the number of calls to @code{report} from @code{main}
   1271 followed by the total number of non-recursive calls to @code{report}.
   1272 This ratio is used to determine how much of @code{report}'s @code{self}
   1273 and @code{children} time gets credited to @code{main}.
   1274 @xref{Assumptions, ,Estimating @code{children} Times}.
   1275 
   1276 @item name
   1277 The name of the subroutine of @code{main} to which this line applies,
   1278 followed by the subroutine's index number.
   1279 
   1280 If the subroutine is part of a recursion cycle, the cycle number is
   1281 printed between the name and the index number.
   1282 @end table
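
The way the @code{called} ratio apportions a subroutine's time can be
sketched as follows.  This Python function illustrates the proportional
rule described above and is not code taken from @code{gprof}; the name
@code{credit_to_caller} and its parameters are hypothetical.

```python
def credit_to_caller(child_self, child_children, calls_from_caller, total_calls):
    """Return the (self, children) time credited to one caller.

    The child's times are scaled by the fraction of its non-recursive
    calls that came from this caller.
    """
    if total_calls == 0:
        return 0.0, 0.0
    share = calls_from_caller / total_calls
    return child_self * share, child_children * share


# `report` was called once in total, and that call came from `main`,
# so `main` is credited with all of report's 0.00 + 0.05 seconds.
print(credit_to_caller(0.00, 0.05, 1, 1))

# Hypothetical figures: a child called 12 times by this caller out of 34.
print(credit_to_caller(0.10, 0.24, 12, 34))
```

Because the split depends only on call counts, it assumes that every call
to the subroutine costs about the same, whichever function made it.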
   1283 
   1284 @node Cycles
   1285 @subsection How Mutually Recursive Functions Are Described
   1286 @cindex cycle
   1287 @cindex recursion cycle
   1288 
   1289 The graph may be complicated by the presence of @dfn{cycles of
   1290 recursion} in the call graph.  A cycle exists if a function calls
   1291 another function that (directly or indirectly) calls (or appears to
   1292 call) the original function.  For example: if @code{a} calls @code{b},
   1293 and @code{b} calls @code{a}, then @code{a} and @code{b} form a cycle.
   1294 
   1295 Whenever there are call paths both ways between a pair of functions, they
   1296 belong to the same cycle.  If @code{a} and @code{b} call each other and
   1297 @code{b} and @code{c} call each other, all three make one cycle.  Note that
   1298 even if @code{b} only calls @code{a} if it was not called from @code{a},
   1299 @code{gprof} cannot determine this, so @code{a} and @code{b} are still
   1300 considered a cycle.
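
The rule that functions with call paths both ways belong to one cycle
amounts to grouping the call graph into sets of mutually reachable
functions.  The Python sketch below illustrates that idea with a simple
reachability search; it is not @code{gprof}'s actual algorithm, and the
function names and the example graph are assumptions.  Groups with a
single member (a function that only calls itself) are not reported as
cycles here.

```python
def reachable(graph, start):
    """Return the set of functions reachable from `start` via one or more calls."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        func = stack.pop()
        if func not in seen:
            seen.add(func)
            stack.extend(graph.get(func, ()))
    return seen


def find_cycles(graph):
    """Group functions that can reach each other into cycles."""
    funcs = set(graph) | {f for callees in graph.values() for f in callees}
    reach = {f: reachable(graph, f) for f in funcs}
    cycles, assigned = [], set()
    for f in sorted(funcs):
        if f in assigned or f not in reach[f]:
            continue  # already grouped, or no call path leads back to f
        members = {g for g in reach[f] if f in reach[g]}
        assigned |= members
        if len(members) > 1:
            cycles.append(sorted(members))
    return cycles


# a and b call each other, b and c call each other; main and d are outside.
calls = {"main": ["a"], "a": ["b"], "b": ["a", "c"], "c": ["b", "d"]}
print(find_cycles(calls))
```

As in the text, @code{a}, @code{b} and @code{c} end up in one cycle even
though @code{a} and @code{c} never call each other directly.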
   1301 
   1302 The cycles are numbered with consecutive integers.  When a function
   1303 belongs to a cycle, each time the function name appears in the call graph
   1304 it is followed by @samp{<cycle @var{number}>}.
   1305 
   1306 The reason cycles matter is that they make the time values in the call
   1307 graph paradoxical.  The ``time spent in children'' of @code{a} should
   1308 include the time spent in its subroutine @code{b} and in @code{b}'s
   1309 subroutines---but one of @code{b}'s subroutines is @code{a}!  How much of
   1310 @code{a}'s time should be included in the children of @code{a}, when
   1311 @code{a} is indirectly recursive?
   1312 
   1313 The way @code{gprof} resolves this paradox is by creating a single entry
   1314 for the cycle as a whole.  The primary line of this entry describes the
   1315 total time spent directly in the functions of the cycle.  The
   1316 ``subroutines'' of the cycle are the individual functions of the cycle, and
   1317 all other functions that were called directly by them.  The ``callers'' of
   1318 the cycle are the functions, outside the cycle, that called functions in
   1319 the cycle.
   1320 
   1321 Here is an example portion of a call graph which shows a cycle containing
   1322 functions @code{a} and @code{b}.  The cycle was entered by a call to
   1323 @code{a} from @code{main}; both @code{a} and @code{b} called @code{c}.
   1324 
   1325 @smallexample
   1326 index  % time    self  children called     name
   1327 ----------------------------------------
   1328                  1.77        0    1/1        main [2]
   1329 [3]     91.71    1.77        0    1+5    <cycle 1 as a whole> [3]
   1330                  1.02        0    3          b <cycle 1> [4]
   1331                  0.75        0    2          a <cycle 1> [5]
   1332 ----------------------------------------
   1333                                   3          a <cycle 1> [5]
   1334 [4]     52.85    1.02        0    0      b <cycle 1> [4]
   1335                                   2          a <cycle 1> [5]
   1336                     0        0    3/6        c [6]
   1337 ----------------------------------------
   1338                  1.77        0    1/1        main [2]
   1339                                   2          b <cycle 1> [4]
   1340 [5]     38.86    0.75        0    1      a <cycle 1> [5]
   1341                                   3          b <cycle 1> [4]
   1342                     0        0    3/6        c [6]
   1343 ----------------------------------------
   1344 @end smallexample
   1345 
   1346 @noindent
   1347 (The entire call graph for this program contains in addition an entry for
   1348 @code{main}, which calls @code{a}, and an entry for @code{c}, with callers
   1349 @code{a} and @code{b}.)
   1350 
   1351 @smallexample
   1352 index  % time    self  children called     name
   1353                                              <spontaneous>
   1354 [1]    100.00       0     1.93    0      start [1]
   1355                  0.16     1.77    1/1        main [2]
   1356 ----------------------------------------
   1357                  0.16     1.77    1/1        start [1]
   1358 [2]    100.00    0.16     1.77    1      main [2]
   1359                  1.77        0    1/1        a <cycle 1> [5]
   1360 ----------------------------------------
   1361                  1.77        0    1/1        main [2]
   1362 [3]     91.71    1.77        0    1+5    <cycle 1 as a whole> [3]
   1363                  1.02        0    3          b <cycle 1> [4]
   1364                  0.75        0    2          a <cycle 1> [5]
   1365                     0        0    6/6        c [6]
   1366 ----------------------------------------
   1367                                   3          a <cycle 1> [5]
   1368 [4]     52.85    1.02        0    0      b <cycle 1> [4]
   1369                                   2          a <cycle 1> [5]
   1370                     0        0    3/6        c [6]
   1371 ----------------------------------------
   1372                  1.77        0    1/1        main [2]
   1373                                   2          b <cycle 1> [4]
   1374 [5]     38.86    0.75        0    1      a <cycle 1> [5]
   1375                                   3          b <cycle 1> [4]
   1376                     0        0    3/6        c [6]
   1377 ----------------------------------------
   1378                     0        0    3/6        b <cycle 1> [4]
   1379                     0        0    3/6        a <cycle 1> [5]
   1380 [6]      0.00       0        0    6      c [6]
   1381 ----------------------------------------
   1382 @end smallexample
   1383 
   1384 The @code{self} field of the cycle's primary line is the total time
   1385 spent in all the functions of the cycle.  It equals the sum of the
   1386 @code{self} fields for the individual functions in the cycle, which appear
   1387 in the cycle's entry as subroutine lines (here, 1.02 + 0.75 = 1.77).
   1388 
   1389 The @code{children} fields of the cycle's primary line and subroutine lines
   1390 count only subroutines outside the cycle.  Even though @code{a} calls
   1391 @code{b}, the time spent in those calls to @code{b} is not counted in
   1392 @code{a}'s @code{children} time.  Thus, we do not encounter the problem of
   1393 what to do when the time in those calls to @code{b} includes indirect
   1394 recursive calls back to @code{a}.
   1395 
   1396 The @code{children} field of a caller-line in the cycle's entry estimates
   1397 the amount of time spent @emph{in the whole cycle}, and its other
   1398 subroutines, on the times when that caller called a function in the cycle.
   1399 
   1400 The @code{called} field in the primary line for the cycle has two numbers:
   1401 first, the number of times functions in the cycle were called by functions
   1402 outside the cycle; second, the number of times they were called by
   1403 functions in the cycle (including times when a function in the cycle calls
   1404 itself).  This is a generalization of the usual split into non-recursive and
   1405 recursive calls.
   1406 
   1407 The @code{called} field of a subroutine-line for a cycle member in the
   1408 cycle's entry says how many times that function was called from functions in
   1409 the cycle.  The total of all these is the second number in the primary line's
   1410 @code{called} field.
   1411 
   1412 In the individual entry for a function in a cycle, the other functions in
   1413 the same cycle can appear as subroutines and as callers.  These lines show
   1414 how many times each function in the cycle called or was called from each other
   1415 function in the cycle.  The @code{self} and @code{children} fields in these
   1416 lines are blank because of the difficulty of defining meanings for them
   1417 when recursion is going on.
   1418 
   1419 @node Line-by-line
   1420 @section Line-by-line Profiling
   1421 
   1422 @code{gprof}'s @samp{-l} option causes the program to perform
   1423 @dfn{line-by-line} profiling.  In this mode, histogram
   1424 samples are assigned not to functions, but to individual
   1425 lines of source code.  This only works with programs compiled with
   1426 older versions of the @code{gcc} compiler.  Newer versions of @code{gcc}
   1427 use a different program, @code{gcov}, to display line-by-line
   1428 profiling information.
   1429 
   1430 With the older versions of @code{gcc} the program usually has to be
   1431 compiled with a @samp{-g} option, in addition to @samp{-pg}, in order
   1432 to generate debugging symbols for tracking source code lines.
   1433 Note that in much older versions of @code{gcc} the program had to be
   1434 compiled with the @samp{-a} command line option as well.
   1435 
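As a minimal sketch of the workflow described above, the commands below
build a hypothetical program, run it, and request a line-by-line profile.
The names @file{myprog.c} and @file{myprog} are placeholders, and whether
useful line-level data is produced depends on the @code{gcc} version in
use, as noted above.

@example
# Compile and link with debugging symbols and profiling code.
gcc -g -pg -o myprog myprog.c

# Run the program; on exit it writes gmon.out in the current directory.
./myprog

# Ask gprof to assign histogram samples to source lines.
gprof -l myprog gmon.out > myprog.lines.txt
@end example
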
   1436 The flat profile is the most useful output table
   1437 in line-by-line mode.
   1438 The call graph isn't as useful as normal, since
   1439 the current version of @code{gprof} does not propagate
   1440 call graph arcs from source code lines to the enclosing function.
   1441 The call graph does, however, show each line of code
   1442 that called each function, along with a count.
   1443 
   1444 Here is a section of @code{gprof}'s output, without line-by-line profiling.
   1445 Note that @code{ct_init} accounted for four histogram hits, and
   1446 13327 calls to @code{init_block}.
   1447 
   1448 @smallexample
   1449 Flat profile:
   1450 
   1451 Each sample counts as 0.01 seconds.
   1452   %   cumulative   self              self     total
   1453  time   seconds   seconds    calls  us/call  us/call  name
   1454  30.77      0.13     0.04     6335     6.31     6.31  ct_init
   1455 
   1456 
   1457 		     Call graph (explanation follows)
   1458 
   1459 
   1460 granularity: each sample hit covers 4 byte(s) for 7.69% of 0.13 seconds
   1461 
   1462 index % time    self  children    called     name
   1463 
   1464                 0.00    0.00       1/13496       name_too_long
   1465                 0.00    0.00      40/13496       deflate
   1466                 0.00    0.00     128/13496       deflate_fast
   1467                 0.00    0.00   13327/13496       ct_init
   1468 [7]      0.0    0.00    0.00   13496         init_block
   1469 
   1470 @end smallexample
   1471 
   1472 Now let's look at some of @code{gprof}'s output from the same program run,
   1473 this time with line-by-line profiling enabled.  Note that @code{ct_init}'s
   1474 four histogram hits are broken down into four lines of source code---one hit
   1475 occurred on each of lines 349, 351, 382 and 385.  In the call graph,
   1476 note how
   1477 @code{ct_init}'s 13327 calls to @code{init_block} are broken down
   1478 into one call from line 396, 3071 calls from line 384, 3730 calls
   1479 from line 385, and 6525 calls from line 387.
   1480 
   1481 @smallexample
   1482 Flat profile:
   1483 
   1484 Each sample counts as 0.01 seconds.
   1485   %   cumulative   self
   1486  time   seconds   seconds    calls  name
   1487   7.69      0.10     0.01           ct_init (trees.c:349)
   1488   7.69      0.11     0.01           ct_init (trees.c:351)
   1489   7.69      0.12     0.01           ct_init (trees.c:382)
   1490   7.69      0.13     0.01           ct_init (trees.c:385)
   1491 
   1492 
   1493 		     Call graph (explanation follows)
   1494 
   1495 
   1496 granularity: each sample hit covers 4 byte(s) for 7.69% of 0.13 seconds
   1497 
   1498   % time    self  children    called     name
   1499 
   1500             0.00    0.00       1/13496       name_too_long (gzip.c:1440)
   1501             0.00    0.00       1/13496       deflate (deflate.c:763)
   1502             0.00    0.00       1/13496       ct_init (trees.c:396)
   1503             0.00    0.00       2/13496       deflate (deflate.c:727)
   1504             0.00    0.00       4/13496       deflate (deflate.c:686)
   1505             0.00    0.00       5/13496       deflate (deflate.c:675)
   1506             0.00    0.00      12/13496       deflate (deflate.c:679)
   1507             0.00    0.00      16/13496       deflate (deflate.c:730)
   1508             0.00    0.00     128/13496       deflate_fast (deflate.c:654)
   1509             0.00    0.00    3071/13496       ct_init (trees.c:384)
   1510             0.00    0.00    3730/13496       ct_init (trees.c:385)
   1511             0.00    0.00    6525/13496       ct_init (trees.c:387)
   1512 [6]  0.0    0.00    0.00   13496         init_block (trees.c:408)
   1513 
   1514 @end smallexample
   1515 
   1516 
   1517 @node Annotated Source
   1518 @section The Annotated Source Listing
   1519 
   1520 @code{gprof}'s @samp{-A} option triggers an annotated source listing,
   1521 which lists the program's source code, each function labeled with the
   1522 number of times it was called.  You may also need to specify the
   1523 @samp{-I} option, if @code{gprof} can't find the source code files.
   1524 
   1525 With older versions of @code{gcc} compiling with @samp{gcc @dots{} -g
   1526 -pg -a} augments your program with basic-block counting code, in
   1527 addition to function counting code.  This enables @code{gprof} to
   1528 determine how many times each line of code was executed.  With newer
   1529 versions of @code{gcc} support for displaying basic-block counts is
   1530 provided by the @code{gcov} program.
   1531 
   1532 For example, consider the following function, taken from gzip,
   1533 with line numbers added:
   1534 
   1535 @smallexample
   1536  1 ulg updcrc(s, n)
   1537  2     uch *s;
   1538  3     unsigned n;
   1539  4 @{
   1540  5     register ulg c;
   1541  6
   1542  7     static ulg crc = (ulg)0xffffffffL;
   1543  8
   1544  9     if (s == NULL) @{
   1545 10         c = 0xffffffffL;
   1546 11     @} else @{
   1547 12         c = crc;
   1548 13         if (n) do @{
   1549 14             c = crc_32_tab[...];
   1550 15         @} while (--n);
   1551 16     @}
   1552 17     crc = c;
   1553 18     return c ^ 0xffffffffL;
   1554 19 @}
   1555 
   1556 @end smallexample
   1557 
   1558 @code{updcrc} has at least five basic-blocks.
   1559 One is the function itself.  The
   1560 @code{if} statement on line 9 generates two more basic-blocks, one
   1561 for each branch of the @code{if}.  A fourth basic-block results from
   1562 the @code{if} on line 13, and the contents of the @code{do} loop form
   1563 the fifth basic-block.  The compiler may also generate additional
   1564 basic-blocks to handle various special cases.
   1565 
   1566 A program augmented for basic-block counting can be analyzed with
   1567 @samp{gprof -l -A}.
   1568 The @samp{-x} option is also helpful,
   1569 to ensure that each line of code is labeled at least once.
   1570 Here is @code{updcrc}'s
   1571 annotated source listing for a sample @code{gzip} run:
   1572 
   1573 @smallexample
   1574                 ulg updcrc(s, n)
   1575                     uch *s;
   1576                     unsigned n;
   1577             2 ->@{
   1578                     register ulg c;
   1579 
   1580                     static ulg crc = (ulg)0xffffffffL;
   1581 
   1582             2 ->    if (s == NULL) @{
   1583             1 ->        c = 0xffffffffL;
   1584             1 ->    @} else @{
   1585             1 ->        c = crc;
   1586             1 ->        if (n) do @{
   1587         26312 ->            c = crc_32_tab[...];
   1588 26312,1,26311 ->        @} while (--n);
   1589                     @}
   1590             2 ->    crc = c;
   1591             2 ->    return c ^ 0xffffffffL;
   1592             2 ->@}
   1593 @end smallexample
   1594 
   1595 In this example, the function was called twice, passing once through
   1596 each branch of the @code{if} statement.  The body of the @code{do}
   1597 loop was executed a total of 26312 times.  Note how the @code{while}
   1598 statement is annotated.  It began execution 26312 times, once for
   1599 each iteration through the loop.  One of those times (the last time)
   1600 it exited, while it branched back to the beginning of the loop 26311 times.
   1601 
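A possible invocation for producing a listing like the one above is
sketched below.  It assumes the profiled executable is named @file{gzip}
and that its sources live in a hypothetical directory @file{../src};
adjust both to match your setup.

@example
# -l: line-by-line mode, -A: annotated source, -x: label every line,
# -I: directory to search for source files.
gprof -l -A -x -I ../src gzip gmon.out > gzip.annotated.txt
@end example
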
   1602 @node Inaccuracy
   1603 @chapter Inaccuracy of @code{gprof} Output
   1604 
   1605 @menu
   1606 * Sampling Error::      Statistical margins of error
   1607 * Assumptions::         Estimating children times
   1608 @end menu
   1609 
   1610 @node Sampling Error
   1611 @section Statistical Sampling Error
   1612 
   1613 The run-time figures that @code{gprof} gives you are based on a sampling
   1614 process, so they are subject to statistical inaccuracy.  If a function runs
   1615 only a small amount of time, so that on the average the sampling process
   1616 ought to catch that function in the act only once, there is a pretty good
   1617 chance it will actually find that function zero times, or twice.
   1618 
   1619 By contrast, the number-of-calls and basic-block figures are derived
   1620 by counting, not sampling.  They are completely accurate and will not
   1621 vary from run to run if your program is deterministic and single
   1622 threaded.  In multi-threaded applications, or single threaded
   1623 applications that link with multi-threaded libraries, the counts are
   1624 only deterministic if the counting function is thread-safe.  (Note:
   1625 beware that the @code{mcount} counting function in glibc is @emph{not}
   1626 thread-safe.)  @xref{Implementation, ,Implementation of Profiling}.
   1627 
   1628 The @dfn{sampling period} that is printed at the beginning of the flat
   1629 profile says how often samples are taken.  The rule of thumb is that a
   1630 run-time figure is accurate if it is considerably bigger than the sampling
   1631 period.
   1632 
   1633 The actual amount of error can be predicted.
   1634 For @var{n} samples, the @emph{expected} error
   1635 is the square-root of @var{n}.  For example,
   1636 if the sampling period is 0.01 seconds and @code{foo}'s run-time is 1 second,
   1637 @var{n} is 100 samples (1 second/0.01 seconds), sqrt(@var{n}) is 10 samples, so
   1638 the expected error in @code{foo}'s run-time is 0.1 seconds (10*0.01 seconds),
   1639 or ten percent of the observed value.
   1640 Again, if the sampling period is 0.01 seconds and @code{bar}'s run-time is
   1641 100 seconds, @var{n} is 10000 samples, sqrt(@var{n}) is 100 samples, so
   1642 the expected error in @code{bar}'s run-time is 1 second,
   1643 or one percent of the observed value.
   1644 It is likely to
   1645 vary this much @emph{on the average} from one profiling run to the next.
   1646 (@emph{Sometimes} it will vary more.)
   1647 
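The arithmetic above can be reproduced with a short calculation.  The
sketch below assumes the @code{bc} calculator is available and uses the
values from the first example (a run-time of 1 second and a sampling
period of 0.01 seconds); substitute your own figures.

@example
# Expected error in seconds: sqrt(run-time / period) * period.
echo "sqrt(1 / 0.01) * 0.01" | bc -l
@end example

The result is approximately 0.1 seconds, matching the figure given for
@code{foo}.
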
   1648 This does not mean that a small run-time figure is devoid of information.
   1649 If the program's @emph{total} run-time is large, a small run-time for one
   1650 function does tell you that that function used an insignificant fraction of
   1651 the whole program's time.  Usually this means it is not worth optimizing.
   1652 
   1653 One way to get more accuracy is to give your program more (but similar)
   1654 input data so it will take longer.  Another way is to combine the data from
   1655 several runs, using the @samp{-s} option of @code{gprof}.  Here is how:
   1656 
   1657 @enumerate
   1658 @item
   1659 Run your program once.
   1660 
   1661 @item
   1662 Issue the command @samp{mv gmon.out gmon.sum}.
   1663 
   1664 @item
   1665 Run your program again, the same as before.
   1666 
   1667 @item
   1668 Merge the new data in @file{gmon.out} into @file{gmon.sum} with this command:
   1669 
   1670 @example
   1671 gprof -s @var{executable-file} gmon.out gmon.sum
   1672 @end example
   1673 
   1674 @item
   1675 Repeat the last two steps as often as you wish.
   1676 
   1677 @item
   1678 Analyze the cumulative data using this command:
   1679 
   1680 @example
   1681 gprof @var{executable-file} gmon.sum > @var{output-file}
   1682 @end example
   1683 @end enumerate
   1684 
   1685 @node Assumptions
   1686 @section Estimating @code{children} Times
   1687 
   1688 Some of the figures in the call graph are estimates---for example, the
   1689 @code{children} time values and all the time figures in caller and
   1690 subroutine lines.
   1691 
   1692 There is no direct information about these measurements in the profile
   1693 data itself.  Instead, @code{gprof} estimates them by making an assumption
   1694 about your program that might or might not be true.
   1695 
   1696 The assumption made is that the average time spent in each call to any
   1697 function @code{foo} is not correlated with who called @code{foo}.  If
   1698 @code{foo} used 5 seconds in all, and 2/5 of the calls to @code{foo} came
   1699 from @code{a}, then @code{foo} contributes 2 seconds to @code{a}'s
   1700 @code{children} time, by assumption.
   1701 
   1702 This assumption is usually true enough, but for some programs it is far
   1703 from true.  Suppose that @code{foo} returns very quickly when its argument
   1704 is zero; suppose that @code{a} always passes zero as an argument, while
   1705 other callers of @code{foo} pass other arguments.  In this program, all the
   1706 time spent in @code{foo} is in the calls from callers other than @code{a}.
   1707 But @code{gprof} has no way of knowing this; it will blindly and
   1708 incorrectly charge 2 seconds of time in @code{foo} to the children of
   1709 @code{a}.
   1710 
   1711 @c FIXME - has this been fixed?
   1712 We hope some day to put more complete data into @file{gmon.out}, so that
   1713 this assumption is no longer needed, if we can figure out how.  For the
   1714 novice, the estimated figures are usually more useful than misleading.
   1715 
   1716 @node How do I?
   1717 @chapter Answers to Common Questions
   1718 
   1719 @table @asis
   1720 @item How can I get more exact information about hot spots in my program?
   1721 
   1722 Looking at the per-line call counts only tells part of the story.
   1723 Because @code{gprof} can only report call times and counts by function,
   1724 the best way to get finer-grained information on where the program
   1725 is spending its time is to re-factor large functions into sequences
   1726 of calls to smaller ones.  Beware however that this can introduce
   1727 artificial hot spots since compiling with @samp{-pg} adds a significant
   1728 overhead to function calls.  An alternative solution is to use a
   1729 non-intrusive profiler, e.g.@: oprofile.
   1730 
   1731 @item How do I find which lines in my program were executed the most times?
   1732 
   1733 Use the @code{gcov} program.
   1734 
   1735 @item How do I find which lines in my program called a particular function?
   1736 
   1737 Use @samp{gprof -l} and look up the function in the call graph.
   1738 The callers will be broken down by function and line number.
   1739 
   1740 @item How do I analyze a program that runs for less than a second?
   1741 
   1742 Try using a shell script like this one:
   1743 
   1744 @example
   1745 for i in `seq 1 100`; do
   1746   fastprog
   1747   mv gmon.out gmon.out.$i
   1748 done
   1749 
   1750 gprof -s fastprog gmon.out.*
   1751 
   1752 gprof fastprog gmon.sum
   1753 @end example
   1754 
   1755 If your program is completely deterministic, all the call counts
   1756 will be simple multiples of 100 (i.e., a function called once in
   1757 each run will appear with a call count of 100).
   1758 
   1759 @end table
   1760 
   1761 @node Incompatibilities
   1762 @chapter Incompatibilities with Unix @code{gprof}
   1763 
   1764 @sc{gnu} @code{gprof} and Berkeley Unix @code{gprof} use the same data
   1765 file @file{gmon.out}, and provide essentially the same information.  But
   1766 there are a few differences.
   1767 
   1768 @itemize @bullet
   1769 @item
   1770 @sc{gnu} @code{gprof} uses a new, generalized file format with support
   1771 for basic-block execution counts and non-realtime histograms.  A magic
   1772 cookie and version number allow @code{gprof} to easily identify
   1773 new style files.  Old BSD-style files can still be read.
   1774 @xref{File Format, ,Profiling Data File Format}.
   1775 
   1776 @item
   1777 For a recursive function, Unix @code{gprof} lists the function as a
   1778 parent and as a child, with a @code{calls} field that lists the number
   1779 of recursive calls.  @sc{gnu} @code{gprof} omits these lines and puts
   1780 the number of recursive calls in the primary line.
   1781 
   1782 @item
   1783 When a function is suppressed from the call graph with @samp{-e}, @sc{gnu}
   1784 @code{gprof} still lists it as a subroutine of functions that call it.
   1785 
   1786 @item
   1787 @sc{gnu} @code{gprof} accepts the @samp{-k} option with its argument
   1788 in the form @samp{from/to}, instead of @samp{from to}.
   1789 
   1790 @item
   1791 In the annotated source listing,
   1792 if there are multiple basic blocks on the same line,
   1793 @sc{gnu} @code{gprof} prints all of their counts, separated by commas.
   1794 
   1795 @ignore - it does this now
   1796 @item
   1797 The function names printed in @sc{gnu} @code{gprof} output do not include
   1798 the leading underscores that are added internally to the front of all
   1799 C identifiers on many operating systems.
   1800 @end ignore
   1801 
   1802 @item
   1803 The blurbs, field widths, and output formats are different.  @sc{gnu}
   1804 @code{gprof} prints blurbs after the tables, so that you can see the
   1805 tables without skipping the blurbs.
   1806 @end itemize
   1807 
   1808 @node Details
   1809 @chapter Details of Profiling
   1810 
   1811 @menu
   1812 * Implementation::      How a program collects profiling information
   1813 * File Format::         Format of @samp{gmon.out} files
   1814 * Internals::           @code{gprof}'s internal operation
   1815 * Debugging::           Using @code{gprof}'s @samp{-d} option
   1816 @end menu
   1817 
   1818 @node Implementation
   1819 @section Implementation of Profiling
   1820 
   1821 Profiling works by changing how every function in your program is compiled
   1822 so that when it is called, it will stash away some information about where
   1823 it was called from.  From this, the profiler can figure out what function
   1824 called it, and can count how many times it was called.  This change is made
   1825 by the compiler when your program is compiled with the @samp{-pg} option,
   1826 which causes every function to call @code{mcount}
   1827 (or @code{_mcount}, or @code{__mcount}, depending on the OS and compiler)
   1828 as one of its first operations.
   1829 
   1830 The @code{mcount} routine, included in the profiling library,
   1831 is responsible for recording in an in-memory call graph table
   1832 both its parent routine (the child) and its parent's parent.  This is
   1833 typically done by examining the stack frame to find both
   1834 the address of the child, and the return address in the original parent.
   1835 Since this is a very machine-dependent operation, @code{mcount}
   1836 itself is typically a short assembly-language stub routine
   1837 that extracts the required
   1838 information, and then calls @code{__mcount_internal}
   1839 (a normal C function) with two arguments---@code{frompc} and @code{selfpc}.
   1840 @code{__mcount_internal} is responsible for maintaining
   1841 the in-memory call graph, which records @code{frompc}, @code{selfpc},
   1842 and the number of times each of these call arcs was traversed.
   1843 
   1844 GCC Version 2 provides a magical function (@code{__builtin_return_address}),
   1845 which allows a generic @code{mcount} function to extract the
   1846 required information from the stack frame.  However, on some
   1847 architectures, most notably the SPARC, using this builtin can be
   1848 very computationally expensive, and an assembly language version
   1849 of @code{mcount} is used for performance reasons.
   1850 
   1851 Number-of-calls information for library routines is collected by using a
   1852 special version of the C library.  The programs in it are the same as in
   1853 the usual C library, but they were compiled with @samp{-pg}.  If you
   1854 link your program with @samp{gcc @dots{} -pg}, it automatically uses the
   1855 profiling version of the library.
   1856 
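When compiling and linking are separate steps, the @samp{-pg} option is
given at both.  A minimal sketch, using hypothetical file names:

@example
# Compile each source file with profiling code inserted.
gcc -pg -c main.c
gcc -pg -c util.c

# Link with -pg as well, so that the profiling startup code and
# library support are pulled in.
gcc -pg -o myprog main.o util.o
@end example
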
   1857 Profiling also involves watching your program as it runs, and keeping a
   1858 histogram of where the program counter happens to be every now and then.
   1859 Typically the program counter is looked at around 100 times per second of
   1860 run time, but the exact frequency may vary from system to system.
   1861 
   1862 This is done in one of two ways.  Most UNIX-like operating systems
   1863 provide a @code{profil()} system call, which registers a memory
   1864 array with the kernel, along with a scale
   1865 factor that determines how the program's address space maps
   1866 into the array.
   1867 Typical scaling values cause every 2 to 8 bytes of address space
   1868 to map into a single array slot.
   1869 On every tick of the system clock
   1870 (assuming the profiled program is running), the value of the
   1871 program counter is examined and the corresponding slot in
   1872 the memory array is incremented.  Since this is done in the kernel,
   1873 which had to interrupt the process anyway to handle the clock
   1874 interrupt, very little additional system overhead is required.
   1875 
   1876 However, some operating systems, most notably Linux 2.0 (and earlier),
   1877 do not provide a @code{profil()} system call.  On such a system,
   1878 arrangements are made for the kernel to periodically deliver
   1879 a signal to the process (typically via @code{setitimer()}),
   1880 which then performs the same operation of examining the
   1881 program counter and incrementing a slot in the memory array.
   1882 Since this method requires a signal to be delivered to
   1883 user space every time a sample is taken, it uses considerably
   1884 more overhead than kernel-based profiling.  Also, due to the
   1885 added delay required to deliver the signal, this method is
   1886 less accurate as well.
   1887 
   1888 A special startup routine allocates memory for the histogram and
   1889 either calls @code{profil()} or sets up
   1890 a clock signal handler.
   1891 This routine (@code{monstartup}) can be invoked in several ways.
   1892 On Linux systems, a special profiling startup file @code{gcrt0.o},
   1893 which invokes @code{monstartup} before @code{main},
   1894 is used instead of the default @code{crt0.o}.
   1895 Use of this special startup file is one of the effects
   1896 of using @samp{gcc @dots{} -pg} to link.
   1897 On SPARC systems, no special startup files are used.
   1898 Rather, the @code{mcount} routine, when it is invoked for
   1899 the first time (typically when @code{main} is called),
   1900 calls @code{monstartup}.
   1901 
   1902 If the compiler's @samp{-a} option was used, basic-block counting
   1903 is also enabled.  Each object file is then compiled with a static array
   1904 of counts, initially zero.
   1905 In the executable code, every time a new basic-block begins
   1906 (for example, at each branch of an @code{if} statement), an extra instruction
   1907 is inserted to increment the corresponding count in the array.
   1908 At compile time, a paired array is constructed that records
   1909 the starting address of each basic-block.  Taken together,
   1910 the two arrays record the starting address of every basic-block,
   1911 along with the number of times it was executed.
   1912 
   1913 The profiling library also includes a function (@code{mcleanup}) which is
   1914 typically registered using @code{atexit()} to be called as the
   1915 program exits, and is responsible for writing the file @file{gmon.out}.
   1916 Profiling is turned off, various headers are output, and the histogram
   1917 is written, followed by the call-graph arcs and the basic-block counts.
   1918 
   1919 The output from @code{gprof} gives no indication of parts of your program that
   1920 are limited by I/O or swapping bandwidth.  This is because samples of the
   1921 program counter are taken at fixed intervals of the program's run time.
   1922 Therefore, the
   1923 time measurements in @code{gprof} output say nothing about time that your
   1924 program was not running.  For example, a part of the program that creates
   1925 so much data that it cannot all fit in physical memory at once may run very
   1926 slowly due to thrashing, but @code{gprof} will say it uses little time.  On
   1927 the other hand, sampling by run time has the advantage that the amount of
   1928 load due to other users won't directly affect the output you get.
   1929 
   1930 @node File Format
   1931 @section Profiling Data File Format
   1932 
   1933 The old BSD-derived file format used for profile data does not contain a
   1934 magic cookie that makes it possible to check whether a data file really is a
   1935 @code{gprof} file.  Furthermore, it does not provide a version number, thus
   1936 rendering changes to the file format almost impossible.  @sc{gnu} @code{gprof}
   1937 uses a new file format that provides these features.  For backward
   1938 compatibility, @sc{gnu} @code{gprof} continues to support the old BSD-derived
   1939 format, but not all features are supported with it.  For example,
   1940 basic-block execution counts cannot be accommodated by the old file
   1941 format.
   1942 
   1943 The new file format is defined in header file @file{gmon_out.h}.  It
   1944 consists of a header containing the magic cookie and a version number,
   1945 as well as some spare bytes available for future extensions.  All data
   1946 in a profile data file is in the native format of the target for which
   1947 the profile was collected.  @sc{gnu} @code{gprof} adapts automatically
   1948 to the byte-order in use.
   1949 
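One way to inspect the header by hand is to dump the first bytes of a
data file.  The sketch below assumes a new-style @file{gmon.out} in the
current directory and the standard @code{od} utility; in glibc-generated
files the cookie is expected to be the four characters @samp{gmon}, but
check @file{gmon_out.h} for the definition that applies to your system.

@example
# Show the first 16 bytes as characters; the magic cookie comes first.
od -c -N 16 gmon.out
@end example
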
   1950 In the new file format, the header is followed by a sequence of
   1951 records.  Currently, there are three different record types: histogram
   1952 records, call-graph arc records, and basic-block execution count
   1953 records.  Each file can contain any number of each record type.  When
   1954 reading a file, @sc{gnu} @code{gprof} will ensure records of the same type are
   1955 compatible with each other and compute the union of all records.  For
   1956 example, for basic-block execution counts, the union is simply the sum
   1957 of all execution counts for each basic-block.
   1958 
   1959 @subsection Histogram Records
   1960 
   1961 Histogram records consist of a header that is followed by an array of
   1962 bins.  The header contains the text-segment range that the histogram
   1963 spans, the size of the histogram in bytes (unlike in the old BSD
   1964 format, this does not include the size of the header), the rate of the
   1965 profiling clock, and the physical dimension that the bin counts
   1966 represent after being scaled by the profiling clock rate.  The
   1967 physical dimension is specified in two parts: a long name of up to 15
   1968 characters and a single character abbreviation.  For example, a
   1969 histogram representing real-time would specify the long name as
   1970 ``seconds'' and the abbreviation as ``s''.  This feature is useful for
   1971 architectures that support performance monitor hardware (which,
   1972 fortunately, is becoming increasingly common).  For example, under DEC
   1973 OSF/1, the ``uprofile'' command can be used to produce a histogram of,
   1974 say, instruction cache misses.  In this case, the dimension in the
   1975 histogram header could be set to ``i-cache misses'' and the abbreviation
   1976 could be set to ``1'' (because it is simply a count, not a physical
   1977 dimension).  Also, the profiling rate would have to be set to 1 in
   1978 this case.
   1979 
Histogram bins are 16-bit numbers and each bin represents an equal
   1981 amount of text-space.  For example, if the text-segment is one
   1982 thousand bytes long and if there are ten bins in the histogram, each
   1983 bin represents one hundred bytes.
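
The mapping from a bin index to the text addresses it covers can be
sketched in Python as follows.  The function name @code{bin_range} and the
half-open range convention are assumptions of this sketch.

```python
def bin_range(low_pc, high_pc, num_bins, index):
    """Return the [start, end) text addresses covered by one bin."""
    if not 0 <= index < num_bins:
        raise IndexError("bin index out of range")
    span = high_pc - low_pc
    # Integer arithmetic keeps the bin edges exact when the span is
    # not an even multiple of the bin count.
    start = low_pc + span * index // num_bins
    end = low_pc + span * (index + 1) // num_bins
    return start, end
```

With a text-segment of one thousand bytes and ten bins, bin 0 covers
addresses 0 up to (but not including) 100, matching the example above.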
   1984 
   1985 
   1986 @subsection Call-Graph Records
   1987 
   1988 Call-graph records have a format that is identical to the one used in
   1989 the BSD-derived file format.  It consists of an arc in the call graph
   1990 and a count indicating the number of times the arc was traversed
   1991 during program execution.  Arcs are specified by a pair of addresses:
the first must be within the caller's function and the second must be
   1993 within the callee's function.  When performing profiling at the
   1994 function level, these addresses can point anywhere within the
   1995 respective function.  However, when profiling at the line-level, it is
   1996 better if the addresses are as close to the call-site/entry-point as
   1997 possible.  This will ensure that the line-level call-graph is able to
   1998 identify exactly which line of source code performed calls to a
   1999 function.
   2000 
   2001 @subsection Basic-Block Execution Count Records
   2002 
   2003 Basic-block execution count records consist of a header followed by a
   2004 sequence of address/count pairs.  The header simply specifies the
   2005 length of the sequence.  In an address/count pair, the address
   2006 identifies a basic-block and the count specifies the number of times
that basic-block was executed.  Any address within the basic-block can
   2008 be used.
   2009 
   2010 @node Internals
   2011 @section @code{gprof}'s Internal Operation
   2012 
   2013 Like most programs, @code{gprof} begins by processing its options.
During this stage, it may build its symspec list
   2015 (@code{sym_ids.c:@-sym_id_add}), if
   2016 options are specified which use symspecs.
   2017 @code{gprof} maintains a single linked list of symspecs,
   2018 which will eventually get turned into 12 symbol tables,
   2019 organized into six include/exclude pairs---one
   2020 pair each for the flat profile (INCL_FLAT/EXCL_FLAT),
   2021 the call graph arcs (INCL_ARCS/EXCL_ARCS),
   2022 printing in the call graph (INCL_GRAPH/EXCL_GRAPH),
   2023 timing propagation in the call graph (INCL_TIME/EXCL_TIME),
   2024 the annotated source listing (INCL_ANNO/EXCL_ANNO),
   2025 and the execution count listing (INCL_EXEC/EXCL_EXEC).
   2026 
   2027 After option processing, @code{gprof} finishes
   2028 building the symspec list by adding all the symspecs in
   2029 @code{default_excluded_list} to the exclude lists
   2030 EXCL_TIME and EXCL_GRAPH, and if line-by-line profiling is specified,
   2031 EXCL_FLAT as well.
These default excludes are not added to EXCL_ANNO, EXCL_ARCS, or EXCL_EXEC.
   2033 
   2034 Next, the BFD library is called to open the object file,
   2035 verify that it is an object file,
   2036 and read its symbol table (@code{core.c:@-core_init}),
   2037 using @code{bfd_canonicalize_symtab} after mallocing
   2038 an appropriately sized array of symbols.  At this point,
   2039 function mappings are read (if the @samp{--file-ordering} option
   2040 has been specified), and the core text space is read into
   2041 memory (if the @samp{-c} option was given).
   2042 
   2043 @code{gprof}'s own symbol table, an array of Sym structures,
   2044 is now built.
   2045 This is done in one of two ways, by one of two routines, depending
   2046 on whether line-by-line profiling (@samp{-l} option) has been
   2047 enabled.
   2048 For normal profiling, the BFD canonical symbol table is scanned.
   2049 For line-by-line profiling, every
   2050 text space address is examined, and a new symbol table entry
   2051 gets created every time the line number changes.
   2052 In either case, two passes are made through the symbol
   2053 table---one to count the size of the symbol table required,
   2054 and the other to actually read the symbols.  In between the
   2055 two passes, a single array of type @code{Sym} is created of
   2056 the appropriate length.
   2057 Finally, @code{symtab.c:@-symtab_finalize}
   2058 is called to sort the symbol table and remove duplicate entries
   2059 (entries with the same memory address).
   2060 
   2061 The symbol table must be a contiguous array for two reasons.
   2062 First, the @code{qsort} library function (which sorts an array)
   2063 will be used to sort the symbol table.
   2064 Also, the symbol lookup routine (@code{symtab.c:@-sym_lookup}),
   2065 which finds symbols
   2066 based on memory address, uses a binary search algorithm
   2067 which requires the symbol table to be a sorted array.
   2068 Function symbols are indicated with an @code{is_func} flag.
   2069 Line number symbols have no special flags set.
   2070 Additionally, a symbol can have an @code{is_static} flag
   2071 to indicate that it is a local symbol.
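
The reason a sorted, contiguous table matters can be seen in a short
Python sketch of an address lookup.  This is only an illustration of the
binary search idea; it is not the code in @code{symtab.c}, it ignores
symbol end addresses, and the parallel lists @code{starts} and
@code{names} are hypothetical.

```python
from bisect import bisect_right

def sym_lookup(starts, names, address):
    """Return the name of the last symbol starting at or below address.

    starts holds symbol start addresses in ascending order with no
    duplicates; names holds the matching symbol names.
    """
    i = bisect_right(starts, address) - 1
    if i < 0:
        return None  # address lies below the first symbol
    return names[i]
```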
   2072 
   2073 With the symbol table read, the symspecs can now be translated
   2074 into Syms (@code{sym_ids.c:@-sym_id_parse}).  Remember that a single
   2075 symspec can match multiple symbols.
   2076 An array of symbol tables
   2077 (@code{syms}) is created, each entry of which is a symbol table
   2078 of Syms to be included or excluded from a particular listing.
   2079 The master symbol table and the symspecs are examined by nested
   2080 loops, and every symbol that matches a symspec is inserted
   2081 into the appropriate syms table.  This is done twice, once to
   2082 count the size of each required symbol table, and again to build
   2083 the tables, which have been malloced between passes.
   2084 From now on, to determine whether a symbol is on an include
   2085 or exclude symspec list, @code{gprof} simply uses its
   2086 standard symbol lookup routine on the appropriate table
   2087 in the @code{syms} array.
   2088 
   2089 Now the profile data file(s) themselves are read
   2090 (@code{gmon_io.c:@-gmon_out_read}),
   2091 first by checking for a new-style @samp{gmon.out} header,
   2092 then assuming this is an old-style BSD @samp{gmon.out}
   2093 if the magic number test failed.
   2094 
   2095 New-style histogram records are read by @code{hist.c:@-hist_read_rec}.
For the first histogram record, @code{gprof} allocates a memory array to hold
all the bins, and reads them in.
   2098 When multiple profile data files (or files with multiple histogram
   2099 records) are read, the memory ranges of each pair of histogram records
   2100 must be either equal, or non-overlapping.  For each pair of histogram
   2101 records, the resolution (memory region size divided by the number of
   2102 bins) must be the same.  The time unit must be the same for all
histogram records.  If the above constraints are met, all histograms
   2104 for the same memory range are merged.
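
The three constraints on a pair of histogram records can be written out
as a Python sketch.  The tuple layout and the name
@code{histograms_compatible} are assumptions made for illustration; the
actual checks live in @code{hist.c}.

```python
def histograms_compatible(a, b):
    """Check one pair of histogram records.

    Each record is a (low_pc, high_pc, num_bins, unit) tuple.
    """
    a_low, a_high, a_bins, a_unit = a
    b_low, b_high, b_bins, b_unit = b
    same_range = (a_low, a_high) == (b_low, b_high)
    disjoint = a_high <= b_low or b_high <= a_low
    # Resolution is range size divided by bin count; cross-multiply
    # so the comparison needs no division.
    same_resolution = (a_high - a_low) * b_bins == (b_high - b_low) * a_bins
    return (same_range or disjoint) and same_resolution and a_unit == b_unit
```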
   2105 
   2106 As each call graph record is read (@code{call_graph.c:@-cg_read_rec}),
   2107 the parent and child addresses
   2108 are matched to symbol table entries, and a call graph arc is
   2109 created by @code{cg_arcs.c:@-arc_add}, unless the arc fails a symspec
   2110 check against INCL_ARCS/EXCL_ARCS.  As each arc is added,
   2111 a linked list is maintained of the parent's child arcs, and of the child's
   2112 parent arcs.
   2113 Both the child's call count and the arc's call count are
   2114 incremented by the record's call count.
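
The tallying step can be sketched in Python.  The dictionaries
@code{arcs} and @code{ncalls} stand in for the linked arc lists and the
per-symbol call counters; they and the name @code{add_arc} are
hypothetical, and the symspec check is omitted.

```python
def add_arc(arcs, ncalls, parent, child, count):
    """Record one call graph record's count on the arc and the child."""
    key = (parent, child)
    arcs[key] = arcs.get(key, 0) + count
    ncalls[child] = ncalls.get(child, 0) + count
```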
   2115 
   2116 Basic-block records are read (@code{basic_blocks.c:@-bb_read_rec}),
   2117 but only if line-by-line profiling has been selected.
   2118 Each basic-block address is matched to a corresponding line
   2119 symbol in the symbol table, and an entry made in the symbol's
   2120 bb_addr and bb_calls arrays.  Again, if multiple basic-block
   2121 records are present for the same address, the call counts
   2122 are cumulative.
   2123 
   2124 A gmon.sum file is dumped, if requested (@code{gmon_io.c:@-gmon_out_write}).
   2125 
   2126 If histograms were present in the data files, assign them to symbols
   2127 (@code{hist.c:@-hist_assign_samples}) by iterating over all the sample
   2128 bins and assigning them to symbols.  Since the symbol table
   2129 is sorted in order of ascending memory addresses, we can
simply follow along in the symbol table as we make our pass
   2131 over the sample bins.
   2132 This step includes a symspec check against INCL_FLAT/EXCL_FLAT.
   2133 Depending on the histogram
   2134 scale factor, a sample bin may span multiple symbols,
   2135 in which case a fraction of the sample count is allocated
   2136 to each symbol, proportional to the degree of overlap.
   2137 This effect is rare for normal profiling, but overlaps
   2138 are more common during line-by-line profiling, and can
   2139 cause each of two adjacent lines to be credited with half
   2140 a hit, for example.
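
The proportional split can be illustrated with a Python sketch.  It
assumes non-overlapping symbols given as name/start/end triples and
omits the INCL_FLAT/EXCL_FLAT check; the name @code{assign_samples} is
hypothetical and this is not the code in @code{hist.c}.

```python
def assign_samples(bins, low_pc, bin_width, symbols):
    """Credit each bin's count to the symbols it overlaps.

    bins: list of sample counts; bin i covers the addresses from
          low_pc + i*bin_width up to low_pc + (i+1)*bin_width.
    symbols: list of (name, start, end) sorted by ascending start.
    Returns a dict mapping name to (possibly fractional) hits.
    """
    hits = dict((name, 0.0) for name, _, _ in symbols)
    j = 0  # first symbol that may still overlap the current bin
    for i, count in enumerate(bins):
        bin_lo = low_pc + i * bin_width
        bin_hi = bin_lo + bin_width
        # Both sequences ascend, so symbols behind the bin are dropped.
        while j < len(symbols) and symbols[j][2] <= bin_lo:
            j += 1
        k = j
        while k < len(symbols) and symbols[k][1] < bin_hi:
            name, start, end = symbols[k]
            overlap = min(end, bin_hi) - max(start, bin_lo)
            if overlap > 0:
                hits[name] += count * overlap / bin_width
            k += 1
    return hits
```

When a bin of width 100 straddles two symbols that each cover half of
it, each symbol is credited with half of that bin's count.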
   2141 
   2142 If call graph data is present, @code{cg_arcs.c:@-cg_assemble} is called.
   2143 First, if @samp{-c} was specified, a machine-dependent
   2144 routine (@code{find_call}) scans through each symbol's machine code,
   2145 looking for subroutine call instructions, and adding them
   2146 to the call graph with a zero call count.
   2147 A topological sort is performed by depth-first numbering
   2148 all the symbols (@code{cg_dfn.c:@-cg_dfn}), so that
   2149 children are always numbered less than their parents,
then making an array of pointers into the symbol table and sorting it into
   2151 numerical order, which is reverse topological
   2152 order (children appear before parents).
   2153 Cycles are also detected at this point, all members
   2154 of which are assigned the same topological number.
   2155 Two passes are now made through this sorted array of symbol pointers.
   2156 The first pass, from end to beginning (parents to children),
   2157 computes the fraction of child time to propagate to each parent
   2158 and a print flag.
   2159 The print flag reflects symspec handling of INCL_GRAPH/EXCL_GRAPH,
   2160 with a parent's include or exclude (print or no print) property
   2161 being propagated to its children, unless they themselves explicitly appear
   2162 in INCL_GRAPH or EXCL_GRAPH.
   2163 A second pass, from beginning to end (children to parents) actually
   2164 propagates the timings along the call graph, subject
   2165 to a check against INCL_TIME/EXCL_TIME.
   2166 With the print flag, fractions, and timings now stored in the symbol
   2167 structures, the topological sort array is now discarded, and a
   2168 new array of pointers is assembled, this time sorted by propagated time.
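
The numbering and the children-to-parents pass can be sketched in
Python under strong simplifying assumptions: the call graph has no
cycles, every symbol appears in @code{self_time}, and the symspec checks
and print flags are left out.  The names @code{topo_numbers} and
@code{propagate_times} are hypothetical.

```python
def topo_numbers(children):
    """Number symbols depth-first so children get smaller numbers.

    children maps each symbol name to a list of callee names.
    Assumes the call graph has no cycles.
    """
    numbers = dict()

    def visit(name):
        if name in numbers:
            return
        for child in children.get(name, []):
            visit(child)
        numbers[name] = len(numbers) + 1

    for name in children:
        visit(name)
    return numbers

def propagate_times(self_time, arc_count):
    """Return (numbers, total) where total includes propagated child time.

    self_time maps symbol name to its own time; arc_count maps
    (parent, child) to the number of calls along that arc.
    """
    children = dict((name, []) for name in self_time)
    parents = dict((name, []) for name in self_time)
    calls_to = dict((name, 0) for name in self_time)
    for (parent, child), count in arc_count.items():
        children[parent].append(child)
        parents[child].append((parent, count))
        calls_to[child] += count
    numbers = topo_numbers(children)
    order = sorted(numbers, key=numbers.get)  # children before parents
    total = dict(self_time)
    for child in order:
        for parent, count in parents[child]:
            if calls_to[child] > 0:
                # Each parent receives a share proportional to its calls.
                total[parent] += total[child] * count / calls_to[child]
    return numbers, total
```

Because children are visited before their parents, a symbol's total is
complete by the time it is shared out among its callers.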
   2169 
Finally, @code{gprof} prints the various outputs the user requested, which is now fairly
   2171 straightforward.  The call graph (@code{cg_print.c:@-cg_print}) and
   2172 flat profile (@code{hist.c:@-hist_print}) are regurgitations of values
   2173 already computed.  The annotated source listing
   2174 (@code{basic_blocks.c:@-print_annotated_source}) uses basic-block
information, if present, to label each line of code with call counts;
   2176 otherwise only the function call counts are presented.
   2177 
   2178 The function ordering code is marginally well documented
   2179 in the source code itself (@code{cg_print.c}).  Basically,
   2180 the functions with the most use and the most parents are
   2181 placed first, followed by other functions with the most use,
   2182 followed by lower use functions, followed by unused functions
   2183 at the end.
   2184 
   2185 @node Debugging
   2186 @section Debugging @code{gprof}
   2187 
   2188 If @code{gprof} was compiled with debugging enabled,
   2189 the @samp{-d} option triggers debugging output
   2190 (to stdout) which can be helpful in understanding its operation.
   2191 The debugging number specified is interpreted as a sum of the following
   2192 options:
   2193 
   2194 @table @asis
   2195 @item 2 - Topological sort
   2196 Monitor depth-first numbering of symbols during call graph analysis
   2197 @item 4 - Cycles
   2198 Shows symbols as they are identified as cycle heads
   2199 @item 16 - Tallying
   2200 As the call graph arcs are read, show each arc and how
   2201 the total calls to each function are tallied
   2202 @item 32 - Call graph arc sorting
   2203 Details sorting individual parents/children within each call graph entry
   2204 @item 64 - Reading histogram and call graph records
   2205 Shows address ranges of histograms as they are read, and each
   2206 call graph arc
   2207 @item 128 - Symbol table
   2208 Reading, classifying, and sorting the symbol table from the object file.
   2209 For line-by-line profiling (@samp{-l} option), also shows line numbers
   2210 being assigned to memory addresses.
   2211 @item 256 - Static call graph
   2212 Trace operation of @samp{-c} option
   2213 @item 512 - Symbol table and arc table lookups
   2214 Detail operation of lookup routines
   2215 @item 1024 - Call graph propagation
   2216 Shows how function times are propagated along the call graph
   2217 @item 2048 - Basic-blocks
   2218 Shows basic-block records as they are read from profile data
   2219 (only meaningful with @samp{-l} option)
   2220 @item 4096 - Symspecs
   2221 Shows symspec-to-symbol pattern matching operation
   2222 @item 8192 - Annotate source
   2223 Tracks operation of @samp{-A} option
   2224 @end table
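
Because the debugging number is a sum of the values above, it can be
decoded bit by bit.  The following Python sketch is a reading aid only;
the table is copied from the list above and the name
@code{decode_debug_level} is hypothetical.

```python
DEBUG_FLAGS = [
    (2, "Topological sort"),
    (4, "Cycles"),
    (16, "Tallying"),
    (32, "Call graph arc sorting"),
    (64, "Reading histogram and call graph records"),
    (128, "Symbol table"),
    (256, "Static call graph"),
    (512, "Symbol table and arc table lookups"),
    (1024, "Call graph propagation"),
    (2048, "Basic-blocks"),
    (4096, "Symspecs"),
    (8192, "Annotate source"),
]

def decode_debug_level(level):
    """List the debugging options selected by a summed debug number."""
    return [name for bit, name in DEBUG_FLAGS if level & bit]
```

For example, the number 1026 (2 + 1024) selects the topological sort and
call graph propagation output.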
   2225 
   2226 @node GNU Free Documentation License
   2227 @appendix GNU Free Documentation License
   2228 @include fdl.texi
   2229 
   2230 @bye
   2231 
   2232 NEEDS AN INDEX
   2233 
   2234 -T - "traditional BSD style": How is it different?  Should the
   2235 differences be documented?
   2236 
   2237 example flat file adds up to 100.01%...
   2238 
   2239 note: time estimates now only go out to one decimal place (0.0), where
   2240 they used to extend two (78.67).
   2241