IMPORTANT NOTE FOR 64-BIT USERS
-------------------------------
There are known issues with some perftools functionality on x86_64
systems. See 64-BIT ISSUES, below.


TCMALLOC
--------
Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of
tcmalloc -- a replacement for malloc and new. See below for some
environment variables you can use with tcmalloc, as well.

tcmalloc functionality is available on all systems we've tested; see
INSTALL for more details. See README_windows.txt for instructions on
using tcmalloc on Windows.

NOTE: When compiling programs with gcc that you plan to link with
libtcmalloc, it's safest to pass in the flags

 -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free

when compiling. gcc makes some optimizations assuming it is using its
own, built-in malloc; that assumption obviously isn't true with
tcmalloc. In practice, we haven't seen any problems with this, but
the expected risk is highest for users who register their own malloc
hooks with tcmalloc (using gperftools/malloc_hook.h). The risk is
lowest for folks who use tcmalloc_minimal (or, of course, who pass in
the above flags :-) ).


HEAP PROFILER
-------------
See doc/heapprofile.html for information about how to use tcmalloc's
heap profiler and analyze its output.

As a quick-start, do the following after installing this package:

1) Link your executable with -ltcmalloc
2) Run your executable with the HEAPPROFILE environment var set:
     $ HEAPPROFILE=/tmp/heapprof <path/to/binary> [binary args]
3) Run pprof to analyze the heap usage
     $ pprof <path/to/binary> /tmp/heapprof.0045.heap  # run 'ls' to see options
     $ pprof --gv <path/to/binary> /tmp/heapprof.0045.heap

You can also use LD_PRELOAD to heap-profile an executable that you
didn't compile.
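
The LD_PRELOAD route can be sketched as follows. This is only an
illustration, not the one supported recipe: the library path
/usr/lib/libtcmalloc.so, the target /bin/ls, and the variable names
TCMALLOC_LIB, TARGET, and PROFILE_PREFIX are all assumptions made for
this example, so substitute whatever matches your installation.

```shell
# Sketch: heap-profile a prebuilt binary by preloading tcmalloc at run time.
# ASSUMPTIONS: the library path and target binary are placeholders, and the
# three variable names are hypothetical helpers for this example only.
TCMALLOC_LIB="${TCMALLOC_LIB:-/usr/lib/libtcmalloc.so}"
TARGET="${TARGET:-/bin/ls}"
PROFILE_PREFIX="${PROFILE_PREFIX:-/tmp/heapprof}"

if [ -r "$TCMALLOC_LIB" ]; then
  # LD_PRELOAD makes the dynamic loader pull in tcmalloc ahead of libc's
  # malloc; HEAPPROFILE then turns the heap profiler on for this one run.
  LD_PRELOAD="$TCMALLOC_LIB" HEAPPROFILE="$PROFILE_PREFIX" "$TARGET"
else
  # Do not guess at another location; ask the user to point at the library.
  echo "tcmalloc not found at $TCMALLOC_LIB; set TCMALLOC_LIB" >&2
fi
```

Any dumps written during the run use the HEAPPROFILE prefix (compare
the /tmp/heapprof.0045.heap name in step 3 above) and can be passed to
pprof in the same way.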

There are other environment variables, besides HEAPPROFILE, you can
set to adjust the heap-profiler behavior; cf. "ENVIRONMENT VARIABLES"
below.

The heap profiler is available on all unix-based systems we've tested;
see INSTALL for more details. It is not currently available on Windows.


HEAP CHECKER
------------
See doc/heap_checker.html for information about how to use tcmalloc's
heap checker.

In order to catch all heap leaks, tcmalloc must be linked *last* into
your executable. The heap checker may mischaracterize some memory
accesses in libraries listed after it on the link line. For instance,
it may report these libraries as leaking memory when they're not.
(See the source code for more details.)

As a quick-start, do the following after installing this package:

1) Link your executable with -ltcmalloc
2) Run your executable with the HEAPCHECK environment var set:
     $ HEAPCHECK=1 <path/to/binary> [binary args]

Other values for HEAPCHECK: normal (equivalent to "1"), strict, draconian

You can also use LD_PRELOAD to heap-check an executable that you
didn't compile.

The heap checker is only available on Linux at this time; see INSTALL
for more details.


CPU PROFILER
------------
See doc/cpuprofile.html for information about how to use the CPU
profiler and analyze its output.

As a quick-start, do the following after installing this package:

1) Link your executable with -lprofiler
2) Run your executable with the CPUPROFILE environment var set:
     $ CPUPROFILE=/tmp/prof.out <path/to/binary> [binary args]
3) Run pprof to analyze the CPU usage
     $ pprof <path/to/binary> /tmp/prof.out      # -pg-like text output
     $ pprof --gv <path/to/binary> /tmp/prof.out # really cool graphical output

There are other environment variables, besides CPUPROFILE, you can set
to adjust the cpu-profiler behavior; cf. "ENVIRONMENT VARIABLES" below.

The CPU profiler is available on all unix-based systems we've tested;
see INSTALL for more details. It is not currently available on Windows.

NOTE: CPU profiling doesn't work after fork (unless you immediately
      do an exec()-like call afterwards). Furthermore, if you do
      fork, and the child calls exit(), it may corrupt the profile
      data. You can use _exit() to work around this. We hope to have
      a fix for both problems in the next release of perftools
      (hopefully perftools 1.2).


EVERYTHING IN ONE
-----------------
If you want the CPU profiler, heap profiler, and heap leak-checker to
all be available for your application, you can do:
   gcc -o myapp ... -lprofiler -ltcmalloc

However, if you have a reason to use the static versions of the
library, this two-library linking won't work:
   gcc -o myapp ... /usr/lib/libprofiler.a /usr/lib/libtcmalloc.a  # errors!

Instead, use the special libtcmalloc_and_profiler library, which we
make for just this purpose:
   gcc -o myapp ... /usr/lib/libtcmalloc_and_profiler.a


CONFIGURATION OPTIONS
---------------------
For advanced users, there are several flags you can pass to
'./configure' that tweak tcmalloc performance. (These are in addition
to the environment variables you can set at runtime to affect
tcmalloc, described below.)
See the INSTALL file for details.


ENVIRONMENT VARIABLES
---------------------
The cpu profiler, heap checker, and heap profiler will lie dormant,
using no memory or CPU, until you turn them on. (Thus, there's no
harm in linking -lprofiler into every application, and also -ltcmalloc
assuming you're ok using the non-libc malloc library.)

The easiest way to turn them on is by setting the appropriate
environment variables. We have several variables that let you
enable/disable features as well as tweak parameters.

Here are some of the most important variables:

HEAPPROFILE=<pre>  -- turns on heap profiling and dumps data using this prefix
HEAPCHECK=<type>   -- turns on heap checking with strictness 'type'
CPUPROFILE=<file>  -- turns on cpu profiling and dumps data to this file.
PROFILESELECTED=1  -- if set, cpu-profiler will only profile regions of code
                      surrounded with ProfilerEnable()/ProfilerDisable().
PROFILEFREQUENCY=x -- how many interrupts/second the cpu-profiler samples.

TCMALLOC_DEBUG=<level> -- the higher the level, the more messages malloc emits
MALLOCSTATS=<level>    -- prints memory-use stats at program-exit

For a full list of variables, see the documentation pages:
   doc/cpuprofile.html
   doc/heapprofile.html
   doc/heap_checker.html


COMPILING ON NON-LINUX SYSTEMS
------------------------------

Perftools was developed and tested on x86 Linux systems, and it works
in its full generality only on those systems. However, we've
successfully ported much of the tcmalloc library to FreeBSD, Solaris
x86, and Darwin (Mac OS X) x86 and ppc; and we've ported the basic
functionality in tcmalloc_minimal to Windows. See INSTALL for details.
See README_windows.txt for details on the Windows port.


PERFORMANCE
-----------

If you're interested in some third-party comparisons of tcmalloc to
other malloc libraries, here are a few web pages that have been
brought to our attention. The first discusses the effect of using
various malloc libraries on OpenLDAP. The second compares tcmalloc to
win32's malloc.
  http://www.highlandsun.com/hyc/malloc/
  http://gaiacrtn.free.fr/articles/win32perftools.html

It's possible to build tcmalloc in a way that gains faster
performance (particularly for deletes) at the cost of more memory
fragmentation (that is, more unusable memory on your system). See the
INSTALL file for details.


OLD SYSTEM ISSUES
-----------------

When compiling perftools on some old systems, like RedHat 8, you may
get an error like this:
    ___tls_get_addr: symbol not found

This means that you have a system where some parts are updated enough
to support Thread Local Storage, but others are not. The perftools
configure script can't always detect this kind of case, leading to
that error. To fix it, just comment out (or delete) the line
   #define HAVE_TLS 1
in your config.h file before building.


64-BIT ISSUES
-------------

There are two issues that can cause program hangs or crashes on x86_64
systems, which use the libunwind library to get stack-traces.
Neither issue should affect the core tcmalloc library; they both
affect the perftools tools such as cpu-profiler, heap-checker, and
heap-profiler.

1) Some libc's -- at least glibc 2.4 on x86_64 -- have a bug where the
libc function dl_iterate_phdr() acquires its locks in the wrong
order. This bug should not affect tcmalloc, but may cause occasional
deadlock with the cpu-profiler, heap-profiler, and heap-checker.
Its likelihood increases with the number of dlopen() calls an executable makes.
Most executables don't have any, though several library routines like
getgrgid() call dlopen() behind the scenes.

2) On x86_64 systems, while tcmalloc itself works fine, the
cpu-profiler tool is unreliable: it will sometimes work, but sometimes
cause a segfault. I'll explain the problem first, and then some
workarounds.

Note that this only affects the cpu-profiler, which is a
gperftools feature you must turn on manually by setting the
CPUPROFILE environment variable. If you do not turn on cpu-profiling,
you shouldn't see any crashes due to perftools.

The gory details: The underlying problem is in the backtrace()
function, which is a built-in function in libc.
Backtracing is fairly straightforward in the normal case, but can run
into problems when having to backtrace across a signal frame.
Unfortunately, the cpu-profiler uses signals in order to register a
profiling event, so every backtrace that the profiler does crosses a
signal frame.

In our experience, the only time there is trouble is when the signal
fires in the middle of pthread_mutex_lock. pthread_mutex_lock is
called quite a bit from system libraries, particularly at program
startup and when creating a new thread.

The solution: The DWARF debugging format has support for 'cfi
annotations', which make it easy to recognize a signal frame. Some OS
distributions, such as Fedora and Gentoo 2007.0, have already added
cfi annotations to their libc. A future version of libunwind should
recognize these annotations; these systems should not see any
crashes.

Workarounds: If you see problems with crashes when running the
cpu-profiler, consider inserting ProfilerStart()/ProfilerStop() into
your code, rather than setting CPUPROFILE. This will profile only
those sections of the codebase.
Though we haven't done much testing,
in theory this should reduce the chance of crashes by limiting the
signal generation to only a small part of the codebase. Ideally, you
would not use ProfilerStart()/ProfilerStop() around code that spawns
new threads, or is otherwise likely to cause a call to
pthread_mutex_lock!

---
17 May 2011