<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>

<HEAD>
  <link rel="stylesheet" href="designstyle.css">
  <title>Gperftools Heap Profiler</title>
</HEAD>

<BODY>

<p align=right>
  <i>Last modified
  <script type=text/javascript>
    var lm = new Date(document.lastModified);
    document.write(lm.toDateString());
  </script></i>
</p>

<p>This is the heap profiler we use at Google, to explore how C++
programs manage memory.  This facility can be useful for</p>
<ul>
  <li> Figuring out what is in the program heap at any given time
  <li> Locating memory leaks
  <li> Finding places that do a lot of allocation
</ul>

<p>The profiling system instruments all allocations and frees.  It
keeps track of various pieces of information per allocation site.  An
allocation site is defined as the active stack trace at the call to
<code>malloc</code>, <code>calloc</code>, <code>realloc</code>, or
<code>new</code>.</p>

<p>There are three parts to using it: linking the library into an
application, running the code, and analyzing the output.</p>


<h1>Linking in the Library</h1>

<p>To install the heap profiler into your executable, add
<code>-ltcmalloc</code> to the link-time step for your executable.
Also, while we don't necessarily recommend this form of usage, it's
possible to add in the profiler at run-time using
<code>LD_PRELOAD</code>:</p>
<pre>% env LD_PRELOAD="/usr/lib/libtcmalloc.so" &lt;binary&gt;</pre>

<p>This does <i>not</i> turn on heap profiling; it just inserts the
code.  For that reason, it's practical to just always link
<code>-ltcmalloc</code> into a binary while developing; that's what we
do at Google.  (However, since any user can turn on the profiler by
setting an environment variable, it's not necessarily recommended to
install profiler-linked binaries into a running production
system.)
Note that if you wish to use the heap profiler, you must
also use the tcmalloc memory-allocation library.  There is no way
currently to use the heap profiler separate from tcmalloc.</p>


<h1>Running the Code</h1>

<p>There are several alternatives to actually turn on heap profiling
for a given run of an executable:</p>

<ol>
  <li> <p>Define the environment variable <code>HEAPPROFILE</code> to the filename
       to dump the profile to.  For instance, to profile
       <code>/usr/local/bin/my_binary_compiled_with_tcmalloc</code>:</p>
       <pre>% env HEAPPROFILE=/tmp/mybin.hprof /usr/local/bin/my_binary_compiled_with_tcmalloc</pre>
  <li> <p>In your code, bracket the code you want profiled in calls to
       <code>HeapProfilerStart()</code> and <code>HeapProfilerStop()</code>.
       (These functions are declared in <code>&lt;gperftools/heap-profiler.h&gt;</code>.)
       <code>HeapProfilerStart()</code> will take the
       profile-filename-prefix as an argument.  Then, as often as
       you'd like before calling <code>HeapProfilerStop()</code>, you
       can use <code>HeapProfilerDump()</code> or
       <code>GetHeapProfile()</code> to examine the profile.  In case
       it's useful, <code>IsHeapProfilerRunning()</code> will tell you
       whether you've already called <code>HeapProfilerStart()</code> or not.</p>
</ol>


<p>For security reasons, heap profiling will not write to a file --
and is thus not usable -- for setuid programs.</p>

<H2>Modifying Runtime Behavior</H2>

<p>You can more finely control the behavior of the heap profiler via
environment variables.</p>

<table frame=box rules=sides cellpadding=5 width=100%>

<tr valign=top>
  <td><code>HEAP_PROFILE_ALLOCATION_INTERVAL</code></td>
  <td>default: 1073741824 (1 GB)</td>
  <td>
    Dump heap profiling information each time the specified number of
    bytes has been allocated by the program.
  </td>
</tr>

<tr valign=top>
  <td><code>HEAP_PROFILE_INUSE_INTERVAL</code></td>
  <td>default: 104857600 (100 MB)</td>
  <td>
    Dump heap profiling information whenever the high-water memory
    usage mark increases by the specified number of bytes.
  </td>
</tr>

<tr valign=top>
  <td><code>HEAP_PROFILE_MMAP</code></td>
  <td>default: false</td>
  <td>
    Profile <code>mmap</code>, <code>mremap</code> and <code>sbrk</code>
    calls in addition
    to <code>malloc</code>, <code>calloc</code>, <code>realloc</code>,
    and <code>new</code>.  <b>NOTE:</b> this causes the profiler to
    profile calls internal to tcmalloc, since tcmalloc and friends use
    <code>mmap</code> and <code>sbrk</code> internally for allocations.
    One partial solution is
    to filter these allocations out when running <code>pprof</code>,
    with something like
    <code>pprof --ignore='DoAllocWithArena|SbrkSysAllocator::Alloc|MmapSysAllocator::Alloc'</code>.
  </td>
</tr>

<tr valign=top>
  <td><code>HEAP_PROFILE_MMAP_ONLY</code></td>
  <td>default: false</td>
  <td>
    Only profile <code>mmap</code>, <code>mremap</code>, and <code>sbrk</code>
    calls; do not profile
    <code>malloc</code>, <code>calloc</code>, <code>realloc</code>,
    or <code>new</code>.
  </td>
</tr>

<tr valign=top>
  <td><code>HEAP_PROFILE_MMAP_LOG</code></td>
  <td>default: false</td>
  <td>
    Log <code>mmap</code>/<code>munmap</code> calls.
  </td>
</tr>

</table>

<H2>Checking for Leaks</H2>

<p>You can use the heap profiler to manually check for leaks, for
instance by reading the profiler output and looking for large
allocations.
However, for that task, it's easier to use the <A
HREF="heap_checker.html">automatic heap-checking facility</A> built
into tcmalloc.</p>


<h1><a name="pprof">Analyzing the Output</a></h1>

<p>If heap-profiling is turned on in a program, the program will
periodically write profiles to the filesystem.  The sequence of
profiles will be named:</p>
<pre>
           &lt;prefix&gt;.0000.heap
           &lt;prefix&gt;.0001.heap
           &lt;prefix&gt;.0002.heap
           ...
</pre>
<p>where <code>&lt;prefix&gt;</code> is the filename-prefix supplied
when running the code (e.g. via the <code>HEAPPROFILE</code>
environment variable).  Note that if the supplied prefix
does not start with a <code>/</code>, the profile files will be
written to the program's working directory.</p>

<p>The profile output can be viewed by passing it to the
<code>pprof</code> tool -- the same tool that's used to analyze <A
HREF="cpuprofile.html">CPU profiles</A>.</p>

<p>Here are some examples.  These examples assume the binary is named
<code>gfs_master</code>, and a sequence of heap profile files can be
found in files named:</p>
<pre>
  /tmp/profile.0001.heap
  /tmp/profile.0002.heap
  ...
  /tmp/profile.0100.heap
</pre>

<h3>Why is a process so big</h3>

<pre>
    % pprof --gv gfs_master /tmp/profile.0100.heap
</pre>

<p>This command will pop up a <code>gv</code> window that displays
the profile information as a directed graph.  Here is a portion
of the resulting output:</p>

<p><center>
<img src="heap-example1.png">
</center></p>

A few explanations:
<ul>
  <li> <code>GFS_MasterChunk::AddServer</code> accounts for 255.6 MB
       of the live memory, which is about 25% of the total live memory.
  <li> <code>GFS_MasterChunkTable::UpdateState</code> is directly
       accountable for 176.2 MB of the live memory (i.e., it directly
       allocated 176.2 MB that has not been freed yet).
       Furthermore,
       it and its callees are responsible for 729.9 MB.  The
       labels on the outgoing edges give a good indication of the
       amount allocated by each callee.
</ul>

<h3>Comparing Profiles</h3>

<p>You often want to skip allocations during the initialization phase
of a program so you can find gradual memory leaks.  One simple way to
do this is to compare two profiles -- both collected after the program
has been running for a while.  Specify the name of the first profile
using the <code>--base</code> option.  For example:</p>
<pre>
   % pprof --base=/tmp/profile.0004.heap gfs_master /tmp/profile.0100.heap
</pre>

<p>The memory-usage in <code>/tmp/profile.0004.heap</code> will be
subtracted from the memory-usage in
<code>/tmp/profile.0100.heap</code> and the result will be
displayed.</p>

<h3>Text display</h3>

<pre>
% pprof --text gfs_master /tmp/profile.0100.heap
   255.6  24.7%  24.7%    255.6  24.7% GFS_MasterChunk::AddServer
   184.6  17.8%  42.5%    298.8  28.8% GFS_MasterChunkTable::Create
   176.2  17.0%  59.5%    729.9  70.5% GFS_MasterChunkTable::UpdateState
   169.8  16.4%  75.9%    169.8  16.4% PendingClone::PendingClone
    76.3   7.4%  83.3%     76.3   7.4% __default_alloc_template::_S_chunk_alloc
    49.5   4.8%  88.0%     49.5   4.8% hashtable::resize
   ...
</pre>

<p>
<ul>
  <li> The first column contains the direct memory use in MB.
  <li> The fourth column contains memory use by the procedure
       and all of its callees.
  <li> The second and fifth columns are just percentage
       representations of the numbers in the first and fourth columns.
  <li> The third column is a cumulative sum of the second column
       (i.e., the <code>k</code>th entry in the third column is the
       sum of the first <code>k</code> entries in the second column).
</ul>

<h3>Ignoring or focusing on specific regions</h3>

<p>The following command will give a graphical display of a subset of
the call-graph.  Only paths in the call-graph that match the regular
expression <code>DataBuffer</code> are included:</p>
<pre>
% pprof --gv --focus=DataBuffer gfs_master /tmp/profile.0100.heap
</pre>

<p>Similarly, the following command will omit a subset of the
call-graph.  All paths in the call-graph that match the regular
expression <code>DataBuffer</code> are discarded:</p>
<pre>
% pprof --gv --ignore=DataBuffer gfs_master /tmp/profile.0100.heap
</pre>

<h3>Total allocations + object-level information</h3>

<p>All of the previous examples have displayed the amount of in-use
space, i.e., the number of bytes that have been allocated but not
freed.  You can also get other types of information by supplying a
flag to <code>pprof</code>:</p>

<center>
<table frame=box rules=sides cellpadding=5 width=100%>

<tr valign=top>
  <td><code>--inuse_space</code></td>
  <td>
     Display the number of in-use megabytes (i.e. space that has
     been allocated but not freed).  This is the default.
  </td>
</tr>

<tr valign=top>
  <td><code>--inuse_objects</code></td>
  <td>
     Display the number of in-use objects (i.e. number of
     objects that have been allocated but not freed).
  </td>
</tr>

<tr valign=top>
  <td><code>--alloc_space</code></td>
  <td>
     Display the number of allocated megabytes.  This includes
     the space that has since been de-allocated.  Use this
     if you want to find the main allocation sites in the
     program.
  </td>
</tr>

<tr valign=top>
  <td><code>--alloc_objects</code></td>
  <td>
     Display the number of allocated objects.  This includes
     the objects that have since been de-allocated.
     Use this
     if you want to find the main allocation sites in the
     program.
  </td>
</tr>

</table>
</center>


<h3>Interactive mode</h3>

<p>By default -- if you don't specify any flags to the contrary --
<code>pprof</code> runs in interactive mode.  At the <code>(pprof)</code> prompt,
you can run many of the commands described above.  You can type
<code>help</code> for a list of what commands are available in
interactive mode.</p>


<h1>Caveats</h1>

<ul>
  <li> Heap profiling requires the use of libtcmalloc.  This
       requirement may be removed in a future version of the heap
       profiler, and the heap profiler separated out into its own
       library.

  <li> If the program links in a library that was not compiled
       with enough symbolic information, all samples associated
       with the library may be charged to the last symbol found
       in the program before the library.  This will artificially
       inflate the count for that symbol.

  <li> If you run the program on one machine, and profile it on
       another, and the shared libraries are different on the two
       machines, the profiling output may be confusing: samples that
       fall within the shared libraries may be assigned to arbitrary
       procedures.

  <li> Several libraries, such as some STL implementations, do their
       own memory management.  This may cause strange profiling
       results.  We have code in libtcmalloc to cause STL to use
       tcmalloc for memory management (which in our tests is better
       than STL's internal management), though it only works for some
       STL implementations.

  <li> If your program forks, the children will also be profiled
       (since they inherit the same <code>HEAPPROFILE</code> setting).  Each
       process is profiled separately; to distinguish the child
       profiles from the parent profile and from each other, all
       children will have their process-id attached to the <code>HEAPPROFILE</code>
       name.

  <li> Due to a hack we make to work around a possible gcc bug, your
       profiles may end up named strangely if the first character of
       your <code>HEAPPROFILE</code> variable has an ASCII value greater than 127.
       This should be exceedingly rare, but if you need to use such a
       name, just prepend <code>./</code> to your filename:
       <code>HEAPPROFILE=./Ägypten</code>.
</ul>

<hr>
<address>Sanjay Ghemawat
<!-- Created: Tue Dec 19 10:43:14 PST 2000 -->
</address>
</body>
</html>