<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>

<HEAD>
  <link rel="stylesheet" href="designstyle.css">
  <title>Gperftools Heap Profiler</title>
</HEAD>

<BODY>

<p align=right>
  <i>Last modified
  <script type=text/javascript>
    var lm = new Date(document.lastModified);
    document.write(lm.toDateString());
  </script></i>
</p>

<p>This is the heap profiler we use at Google to explore how C++
programs manage memory.  This facility can be useful for:</p>
<ul>
  <li> Figuring out what is in the program heap at any given time
  <li> Locating memory leaks
  <li> Finding places that do a lot of allocation
</ul>

<p>The profiling system instruments all allocations and frees.  For
each allocation site it keeps track of information such as the number
of objects and bytes allocated and freed.  An allocation site is
defined as the active stack trace at the call to
<code>malloc</code>, <code>calloc</code>, <code>realloc</code>, or
<code>new</code>.</p>
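<p>The per-site bookkeeping described above can be pictured with a small model.  The following sketch is a hypothetical illustration, not tcmalloc's actual data structures.  It represents a stack trace as a list of function names; a real profiler records return addresses.  It keeps allocation and free counters per trace.  It also remembers which site each live block came from, so that a later <code>free</code> is charged to the site that allocated the block.  The <code>SiteTable</code> and <code>SiteStats</code> names are invented for this example.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: a stack trace is a list of function names, innermost
// frame first.  A real profiler records raw return addresses instead.
using StackTrace = std::vector<std::string>;

// Counters kept for one allocation site.
struct SiteStats {
  std::int64_t alloc_count = 0;
  std::int64_t alloc_bytes = 0;
  std::int64_t free_count = 0;
  std::int64_t free_bytes = 0;

  std::int64_t inuse_count() const { return alloc_count - free_count; }
  std::int64_t inuse_bytes() const { return alloc_bytes - free_bytes; }
};

class SiteTable {
 public:
  // Called for each allocation with the stack trace active at the call.
  void RecordAlloc(const void* ptr, std::size_t size, const StackTrace& stack) {
    SiteStats& s = sites_[stack];
    s.alloc_count += 1;
    s.alloc_bytes += static_cast<std::int64_t>(size);
    live_[ptr] = Live{stack, size};
  }

  // Called for each free.  The block is charged back to the site that
  // allocated it, not to the code that happens to free it.
  void RecordFree(const void* ptr) {
    auto it = live_.find(ptr);
    if (it == live_.end()) return;  // Not tracked: ignored in this model.
    SiteStats& s = sites_[it->second.stack];
    s.free_count += 1;
    s.free_bytes += static_cast<std::int64_t>(it->second.size);
    live_.erase(it);
  }

  // Returns the counters for a site (all zero if it never allocated).
  SiteStats Stats(const StackTrace& stack) const {
    auto it = sites_.find(stack);
    return it == sites_.end() ? SiteStats{} : it->second;
  }

 private:
  struct Live {
    StackTrace stack;
    std::size_t size;
  };
  std::map<StackTrace, SiteStats> sites_;  // per-site counters
  std::map<const void*, Live> live_;       // live block -> allocating site
};
```

<p>In this model, a site's in-use figures are what it allocated minus what was later freed from it.  That in-use quantity is what the default <code>pprof</code> output reports.</p>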

<p>There are three parts to using it: linking the library into an
application, running the code, and analyzing the output.</p>


<h1>Linking in the Library</h1>

<p>To install the heap profiler into your executable, add
<code>-ltcmalloc</code> to the link-time step for your executable.
Also, while we don't necessarily recommend this form of usage, it's
possible to add in the profiler at run-time using
<code>LD_PRELOAD</code>:
<pre>% env LD_PRELOAD="/usr/lib/libtcmalloc.so" &lt;binary&gt;</pre>

<p>This does <i>not</i> turn on heap profiling; it just inserts the
code.  For that reason, it's practical to just always link
<code>-ltcmalloc</code> into a binary while developing; that's what we
do at Google.  (However, since any user can turn on the profiler by
setting an environment variable, it's not necessarily recommended to
install profiler-linked binaries into a running production
system.)  Note that if you wish to use the heap profiler, you must
also use the tcmalloc memory-allocation library.  There is currently
no way to use the heap profiler separately from tcmalloc.</p>


<h1>Running the Code</h1>

<p>There are two ways to turn on heap profiling for a given run of an
executable:</p>

<ol>
  <li> <p>Set the environment variable <code>HEAPPROFILE</code> to the
       filename prefix for the profile dumps.  For instance, to profile
       <code>/usr/local/bin/my_binary_compiled_with_tcmalloc</code>:</p>
       <pre>% env HEAPPROFILE=/tmp/mybin.hprof /usr/local/bin/my_binary_compiled_with_tcmalloc</pre>
  <li> <p>In your code, bracket the code you want profiled in calls to
       <code>HeapProfilerStart()</code> and <code>HeapProfilerStop()</code>.
       (These functions are declared in <code>&lt;gperftools/heap-profiler.h&gt;</code>.)
       <code>HeapProfilerStart()</code> takes the profile filename
       prefix as its argument.  Then, as often as you'd like before
       calling <code>HeapProfilerStop()</code>, you can use
       <code>HeapProfilerDump()</code> or <code>GetHeapProfile()</code>
       to examine the profile.  (<code>GetHeapProfile()</code> returns
       the profile as a <code>malloc</code>-allocated string, which the
       caller must <code>free()</code>.)  In case it's useful,
       <code>IsHeapProfilerRunning()</code> will tell you whether
       you've already called <code>HeapProfilerStart()</code> or not.</p>
</ol>
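<p>In C++, one way to apply the bracketing approach is a small RAII guard.  It calls <code>HeapProfilerStart()</code> in its constructor and <code>HeapProfilerStop()</code> in its destructor, so every exit path from the profiled scope stops the profiler.  The <code>ScopedHeapProfiler</code> class below is hypothetical and not part of gperftools.  So that the sketch builds with or without <code>-ltcmalloc</code>, it re-declares the profiler functions as weak symbols.  That trick is an assumption tied to GCC or Clang on ELF platforms such as Linux.  In an ordinary build you would instead include <code>&lt;gperftools/heap-profiler.h&gt;</code> and link <code>-ltcmalloc</code>.</p>

```cpp
#include <cassert>

// Assumption: GCC or Clang on an ELF platform such as Linux.  These
// declarations mirror <gperftools/heap-profiler.h> but are marked weak, so
// the program still links without -ltcmalloc; the addresses are then null.
extern "C" {
__attribute__((weak)) void HeapProfilerStart(const char* prefix);
__attribute__((weak)) void HeapProfilerStop();
__attribute__((weak)) void HeapProfilerDump(const char* reason);
__attribute__((weak)) int IsHeapProfilerRunning();
}

// Hypothetical helper: true if the heap profiler was linked in.
inline bool HeapProfilerLinked() { return HeapProfilerStart != nullptr; }

// Hypothetical RAII guard: profiles the scope in which it lives.
class ScopedHeapProfiler {
 public:
  explicit ScopedHeapProfiler(const char* prefix) {
    // Skip if profiling is unavailable or already on (e.g. via HEAPPROFILE).
    if (HeapProfilerLinked() && !IsHeapProfilerRunning()) {
      HeapProfilerStart(prefix);
      started_ = true;
    }
  }
  ~ScopedHeapProfiler() {
    if (started_) HeapProfilerStop();  // Runs on every exit path.
  }
  ScopedHeapProfiler(const ScopedHeapProfiler&) = delete;
  ScopedHeapProfiler& operator=(const ScopedHeapProfiler&) = delete;

  // Writes an intermediate profile; `reason` is a short label that the
  // profiler includes in its log message.
  void Dump(const char* reason) const {
    if (started_) HeapProfilerDump(reason);
  }
  bool started() const { return started_; }

 private:
  bool started_ = false;
};
```

<p>For instance, declaring <code>ScopedHeapProfiler guard("/tmp/mybin.hprof");</code> at the top of a function profiles that function's allocations until it returns.  The guard leaves a profile started through <code>HEAPPROFILE</code> alone, since it only stops what it started.</p>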


<p>For security reasons, heap profiling will not write to a file --
and is thus not usable -- for setuid programs.</p>

<H2>Modifying Runtime Behavior</H2>

<p>You can more finely control the behavior of the heap profiler via
environment variables.</p>

<table frame=box rules=sides cellpadding=5 width=100%>

<tr valign=top>
  <td><code>HEAP_PROFILE_ALLOCATION_INTERVAL</code></td>
  <td>default: 1073741824 (1 GB)</td>
  <td>
    Dump heap profiling information each time the program has
    allocated the specified number of additional bytes.
  </td>
</tr>

<tr valign=top>
  <td><code>HEAP_PROFILE_INUSE_INTERVAL</code></td>
  <td>default: 104857600 (100 MB)</td>
  <td>
    Dump heap profiling information whenever the high-water mark of
    in-use memory increases by the specified number of bytes.
  </td>
</tr>

<tr valign=top>
  <td><code>HEAP_PROFILE_MMAP</code></td>
  <td>default: false</td>
  <td>
    Profile <code>mmap</code>, <code>mremap</code> and <code>sbrk</code>
    calls in addition
    to <code>malloc</code>, <code>calloc</code>, <code>realloc</code>,
    and <code>new</code>.  <b>NOTE:</b> this causes the profiler to
    profile calls internal to tcmalloc, since tcmalloc and friends use
    mmap and sbrk internally for allocations.  One partial solution is
    to filter these allocations out when running <code>pprof</code>,
    with something like
    <code>pprof --ignore='DoAllocWithArena|SbrkSysAllocator::Alloc|MmapSysAllocator::Alloc'</code>.
  </td>
</tr>

<tr valign=top>
  <td><code>HEAP_PROFILE_MMAP_ONLY</code></td>
  <td>default: false</td>
  <td>
    Only profile <code>mmap</code>, <code>mremap</code>, and <code>sbrk</code>
    calls; do not profile
    <code>malloc</code>, <code>calloc</code>, <code>realloc</code>,
    or <code>new</code>.
  </td>
</tr>

<tr valign=top>
  <td><code>HEAP_PROFILE_MMAP_LOG</code></td>
  <td>default: false</td>
  <td>
    Log <code>mmap</code>/<code>munmap</code> calls.
  </td>
</tr>

</table>
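<p>The two interval settings can be read as thresholds on running totals.  The sketch below is a simplified model, not tcmalloc's code, and rests on three assumptions.  First, a dump happens when cumulative allocated bytes have grown by at least <code>HEAP_PROFILE_ALLOCATION_INTERVAL</code> since the previous dump.  Second, a dump also happens when current in-use bytes exceed the high-water mark recorded at the previous dump by more than <code>HEAP_PROFILE_INUSE_INTERVAL</code>.  Third, each dump resets both reference points.  The <code>DumpSimulator</code> class is hypothetical.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical model of when a new profile file would be written.
class DumpSimulator {
 public:
  DumpSimulator(std::int64_t allocation_interval, std::int64_t inuse_interval)
      : allocation_interval_(allocation_interval),
        inuse_interval_(inuse_interval) {}

  // Records an allocation; returns a non-empty reason if it triggers a dump.
  std::string OnAlloc(std::int64_t bytes) {
    total_allocated_ += bytes;
    inuse_ += bytes;
    return MaybeDump();
  }

  // Frees only lower the in-use total; they never trigger a dump here.
  void OnFree(std::int64_t bytes) { inuse_ -= bytes; }

  int dumps() const { return dumps_; }

 private:
  std::string MaybeDump() {
    std::string reason;
    if (total_allocated_ >= last_dump_allocated_ + allocation_interval_) {
      reason = "allocation interval";
    } else if (inuse_ > high_water_mark_ + inuse_interval_) {
      reason = "in-use interval";
    }
    if (!reason.empty()) {
      ++dumps_;
      last_dump_allocated_ = total_allocated_;         // reset reference
      if (inuse_ > high_water_mark_) high_water_mark_ = inuse_;
    }
    return reason;
  }

  std::int64_t allocation_interval_;
  std::int64_t inuse_interval_;
  std::int64_t total_allocated_ = 0;
  std::int64_t inuse_ = 0;
  std::int64_t last_dump_allocated_ = 0;
  std::int64_t high_water_mark_ = 0;
  int dumps_ = 0;
};
```

<p>With the defaults, the model would be constructed as <code>DumpSimulator sim(1073741824, 104857600);</code>.  After memory is freed, the next in-use dump in this model requires usage to climb back above the recorded high-water mark plus the interval.</p>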

<H2>Checking for Leaks</H2>

<p>You can use the heap profiler to manually check for leaks, for
instance by reading the profiler output and looking for large
allocations.  However, for that task, it's easier to use the <A
HREF="heap_checker.html">automatic heap-checking facility</A> built
into tcmalloc.</p>


<h1><a name="pprof">Analyzing the Output</a></h1>

<p>If heap profiling is turned on in a program, the program will
periodically write profiles to the filesystem.  The sequence of
profiles will be named:</p>
<pre>
           &lt;prefix&gt;.0000.heap
           &lt;prefix&gt;.0001.heap
           &lt;prefix&gt;.0002.heap
           ...
</pre>
<p>where <code>&lt;prefix&gt;</code> is the filename prefix supplied
when running the code (e.g. via the <code>HEAPPROFILE</code>
environment variable).  Note that if the supplied prefix
does not start with a <code>/</code>, the profile files will be
written to the program's working directory.</p>
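<p>The naming scheme amounts to a zero-padded, four-digit sequence number between the prefix and a <code>.heap</code> suffix.  The following is a minimal sketch of that formatting, using a hypothetical <code>ProfileFileName</code> helper.  It illustrates the pattern only and is not the profiler's own code.</p>

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: builds "<prefix>.NNNN.heap" for a dump index.
// A prefix without a leading '/' is a relative path, so the file would be
// created relative to the program's working directory.
std::string ProfileFileName(const std::string& prefix, int index) {
  char seq[16];
  std::snprintf(seq, sizeof(seq), "%04d", index);  // zero-pad to 4 digits
  return prefix + "." + seq + ".heap";
}
```

<p>For example, <code>ProfileFileName("/tmp/profile", 100)</code> yields <code>/tmp/profile.0100.heap</code>, the last file in the examples that follow.</p>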

<p>The profile output can be viewed by passing it to the
<code>pprof</code> tool -- the same tool that's used to analyze <A
HREF="cpuprofile.html">CPU profiles</A>.</p>

<p>Here are some examples.  These examples assume the binary is named
<code>gfs_master</code>, and a sequence of heap profile files can be
found in files named:</p>
<pre>
  /tmp/profile.0001.heap
  /tmp/profile.0002.heap
  ...
  /tmp/profile.0100.heap
</pre>

<h3>Why is a process so big?</h3>

<pre>
    % pprof --gv gfs_master /tmp/profile.0100.heap
</pre>

<p>This command will pop up a <code>gv</code> window that displays
the profile information as a directed graph.  Here is a portion
of the resulting output:</p>

<p><center>
<img src="heap-example1.png">
</center></p>

<p>A few explanations:</p>
<ul>
<li> <code>GFS_MasterChunk::AddServer</code> accounts for 255.6 MB
     of the live memory, which is 25% of the total live memory.
<li> <code>GFS_MasterChunkTable::UpdateState</code> is directly
     accountable for 176.2 MB of the live memory (i.e., it directly
     allocated 176.2 MB that has not been freed yet).  Furthermore,
     it and its callees are responsible for 729.9 MB.  The
     labels on the outgoing edges give a good indication of the
     amount allocated by each callee.
</ul>

<h3>Comparing Profiles</h3>

<p>You often want to skip allocations during the initialization phase
of a program so you can find gradual memory leaks.  One simple way to
do this is to compare two profiles -- both collected after the program
has been running for a while.  Specify the name of the first profile
using the <code>--base</code> option.  For example:</p>
<pre>
   % pprof --base=/tmp/profile.0004.heap gfs_master /tmp/profile.0100.heap
</pre>

<p>The memory usage in <code>/tmp/profile.0004.heap</code> will be
subtracted from the memory usage in
<code>/tmp/profile.0100.heap</code>, and the result will be
displayed.</p>
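<p>Conceptually, <code>--base</code> subtracts each allocation site's usage in the earlier profile from its usage in the later one, so that only the change between the two snapshots remains.  The sketch below models a profile as a hypothetical map from a call-site name to in-use bytes.  It illustrates the idea and is not pprof's implementation, which works on whole stack traces.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical profile representation: call site -> in-use bytes.
using Profile = std::map<std::string, std::int64_t>;

// Returns later - base for every site present in either profile, dropping
// sites whose usage did not change.  Sites that shrank come out negative.
Profile SubtractBase(const Profile& later, const Profile& base) {
  Profile diff = later;
  for (const auto& entry : base) diff[entry.first] -= entry.second;
  for (auto it = diff.begin(); it != diff.end();) {
    if (it->second == 0) {
      it = diff.erase(it);  // unchanged site: nothing to report
    } else {
      ++it;
    }
  }
  return diff;
}
```

<p>In this model, sites whose usage did not change between the two snapshots drop out entirely.  This is what makes a slow leak stand out against a large but stable initialization footprint.</p>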

<h3>Text display</h3>

<pre>
% pprof --text gfs_master /tmp/profile.0100.heap
   255.6  24.7%  24.7%    255.6  24.7% GFS_MasterChunk::AddServer
   184.6  17.8%  42.5%    298.8  28.8% GFS_MasterChunkTable::Create
   176.2  17.0%  59.5%    729.9  70.5% GFS_MasterChunkTable::UpdateState
   169.8  16.4%  75.9%    169.8  16.4% PendingClone::PendingClone
    76.3   7.4%  83.3%     76.3   7.4% __default_alloc_template::_S_chunk_alloc
    49.5   4.8%  88.0%     49.5   4.8% hashtable::resize
   ...
</pre>

<ul>
  <li> The first column contains the direct memory use in MB.
  <li> The fourth column contains memory use by the procedure
       and all of its callees.
  <li> The second and fifth columns are just percentage
       representations of the numbers in the first and fourth columns.
  <li> The third column is a cumulative sum of the second column
       (i.e., the <code>k</code>th entry in the third column is the
       sum of the first <code>k</code> entries in the second column).
</ul>
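<p>The percentage columns can be reproduced from the first and fourth columns once the total profiled memory is known.  The following sketch uses a hypothetical <code>Row</code> type and <code>FillPercentages</code> function.  It assumes the percentages are relative to the total in-use memory of the profile.  The caller supplies that total because the listing above is truncated.</p>

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// One line of the text listing (hypothetical representation).
struct Row {
  double flat_mb;    // column 1: memory allocated directly by the function
  double flat_pct;   // column 2: column 1 as a percentage of the total
  double sum_pct;    // column 3: running sum of column 2
  double cum_mb;     // column 4: memory allocated by it and its callees
  double cum_pct;    // column 5: column 4 as a percentage of the total
  std::string name;
};

// Fills columns 2, 3 and 5 from columns 1 and 4, given the total in MB.
void FillPercentages(std::vector<Row>& rows, double total_mb) {
  double running = 0.0;
  for (Row& r : rows) {
    r.flat_pct = 100.0 * r.flat_mb / total_mb;
    running += r.flat_pct;
    r.sum_pct = running;
    r.cum_pct = 100.0 * r.cum_mb / total_mb;
  }
}
```

<p>Applied to the listing above, the third column reads 24.7%, then 24.7% + 17.8% = 42.5%, then 42.5% + 17.0% = 59.5%.  The fourth column cannot be derived this way, because it depends on the call graph.</p>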

<h3>Ignoring or focusing on specific regions</h3>

<p>The following command will give a graphical display of a subset of
the call-graph.  Only paths in the call-graph that match the regular
expression <code>DataBuffer</code> are included:</p>
<pre>
% pprof --gv --focus=DataBuffer gfs_master /tmp/profile.0100.heap
</pre>

<p>Similarly, the following command will omit a subset of the
call-graph.  All paths in the call-graph that match the regular
expression <code>DataBuffer</code> are discarded:</p>
<pre>
% pprof --gv --ignore=DataBuffer gfs_master /tmp/profile.0100.heap
</pre>
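<p>Both options select paths of the call graph by regular expression.  The following is a minimal model, which assumes a path counts as matching when any function name on it matches the pattern.  <code>--focus</code> keeps only matching paths and <code>--ignore</code> drops them.  The <code>Sample</code> type and the <code>Focus</code>/<code>Ignore</code> functions are hypothetical, not pprof's code.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <regex>
#include <string>
#include <vector>

// Hypothetical sample: a call path (outermost caller first) and its bytes.
struct Sample {
  std::vector<std::string> path;
  std::int64_t bytes;
};

// True if any function on the path contains a match for the expression.
bool PathMatches(const Sample& s, const std::regex& re) {
  for (const std::string& frame : s.path) {
    if (std::regex_search(frame, re)) return true;
  }
  return false;
}

// Models --focus=<re>: keep only matching paths.
std::vector<Sample> Focus(const std::vector<Sample>& in, const std::regex& re) {
  std::vector<Sample> out;
  for (const Sample& s : in) {
    if (PathMatches(s, re)) out.push_back(s);
  }
  return out;
}

// Models --ignore=<re>: drop matching paths.
std::vector<Sample> Ignore(const std::vector<Sample>& in, const std::regex& re) {
  std::vector<Sample> out;
  for (const Sample& s : in) {
    if (!PathMatches(s, re)) out.push_back(s);
  }
  return out;
}
```

<p>Because this model uses <code>std::regex_search</code>, a pattern such as <code>DataBuffer</code> matches any function whose name merely contains that text, e.g. <code>DataBuffer::Append</code>.</p>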

<h3>Total allocations + object-level information</h3>

<p>All of the previous examples have displayed the amount of in-use
space, i.e., the number of bytes that have been allocated but not
freed.  You can also get other types of information by supplying a
flag to <code>pprof</code>:</p>

<center>
<table frame=box rules=sides cellpadding=5 width=100%>

<tr valign=top>
  <td><code>--inuse_space</code></td>
  <td>
     Display the number of in-use megabytes (i.e. space that has
     been allocated but not freed).  This is the default.
  </td>
</tr>

<tr valign=top>
  <td><code>--inuse_objects</code></td>
  <td>
     Display the number of in-use objects (i.e. number of
     objects that have been allocated but not freed).
  </td>
</tr>

<tr valign=top>
  <td><code>--alloc_space</code></td>
  <td>
     Display the number of allocated megabytes.  This includes
     the space that has since been de-allocated.  Use this
     if you want to find the main allocation sites in the
     program.
  </td>
</tr>

<tr valign=top>
  <td><code>--alloc_objects</code></td>
  <td>
     Display the number of allocated objects.  This includes
     the objects that have since been de-allocated.  Use this
     if you want to find the main allocation sites in the
     program.
  </td>
</tr>

</table>
</center>
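<p>Each of these flags selects a different counter for every allocation site.  The sketch below assumes a site records allocated and freed object counts and byte totals.  The <code>SiteCounters</code> type and <code>Metric</code> enum are hypothetical.  The in-use views subtract frees from allocations, while the alloc views ignore frees.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-site counters.
struct SiteCounters {
  std::int64_t alloc_objects;
  std::int64_t alloc_bytes;
  std::int64_t free_objects;
  std::int64_t free_bytes;
};

enum class Metric { kInuseSpace, kInuseObjects, kAllocSpace, kAllocObjects };

// The quantity each flag reports for one site in this model (bytes for
// the *_space views; pprof itself displays space in megabytes).
std::int64_t MetricValue(const SiteCounters& c, Metric m) {
  switch (m) {
    case Metric::kInuseSpace:   return c.alloc_bytes - c.free_bytes;
    case Metric::kInuseObjects: return c.alloc_objects - c.free_objects;
    case Metric::kAllocSpace:   return c.alloc_bytes;
    case Metric::kAllocObjects: return c.alloc_objects;
  }
  return 0;  // Not reached for a valid Metric.
}
```

<p>For instance, an object that is allocated and promptly freed contributes nothing to <code>--inuse_space</code> but is still counted by <code>--alloc_space</code>.  That is why the alloc views are the ones to use for finding allocation-heavy code.</p>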


<h3>Interactive mode</h3>

<p>By default -- if you don't specify any flags to the contrary --
pprof runs in interactive mode.  At the <code>(pprof)</code> prompt,
you can run many of the commands described above.  You can type
<code>help</code> for a list of what commands are available in
interactive mode.</p>


<h1>Caveats</h1>

<ul>
  <li> Heap profiling requires the use of libtcmalloc.  This
       requirement may be removed in a future version of the heap
       profiler, and the heap profiler separated out into its own
       library.

  <li> If the program linked in a library that was not compiled
       with enough symbolic information, all samples associated
       with the library may be charged to the last symbol found
       in the program before the library.  This will artificially
       inflate the count for that symbol.

  <li> If you run the program on one machine, and profile it on
       another, and the shared libraries are different on the two
       machines, the profiling output may be confusing: samples that
       fall within the shared libraries may be assigned to arbitrary
       procedures.

  <li> Several libraries, such as some STL implementations, do their
       own memory management.  This may cause strange profiling
       results.  We have code in libtcmalloc to cause STL to use
       tcmalloc for memory management (which in our tests is better
       than STL's internal management), though it only works for some
       STL implementations.

  <li> If your program forks, the children will also be profiled
       (since they inherit the same HEAPPROFILE setting).  Each
       process is profiled separately; to distinguish the child
       profiles from the parent profile and from each other, all
       children will have their process ID appended to the HEAPPROFILE
       name.

  <li> Due to a hack we make to work around a possible gcc bug, your
       profiles may end up named strangely if the first character of
       your HEAPPROFILE variable has a byte value greater than 127.
       This should be exceedingly rare, but if you need to use such a
       name, just prepend <code>./</code> to your filename:
       <code>HEAPPROFILE=./&Auml;gypten</code>.
</ul>

<hr>
<address>Sanjay Ghemawat
<!-- Created: Tue Dec 19 10:43:14 PST 2000 -->
</address>
</body>
</html>