IMPORTANT NOTE FOR 64-BIT USERS
-------------------------------
There are known issues with some perftools functionality on x86_64
systems.  See 64-BIT ISSUES, below.


TCMALLOC
--------
Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of
tcmalloc -- a replacement for malloc and new.  See below for some
environment variables you can use with tcmalloc, as well.

tcmalloc functionality is available on all systems we've tested; see
INSTALL for more details.  See README_windows.txt for instructions on
using tcmalloc on Windows.

NOTE: When compiling programs with gcc that you plan to link with
libtcmalloc, it's safest to pass in the flags

 -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free

when compiling.  gcc makes some optimizations assuming it is using its
own, built-in malloc; that assumption obviously isn't true with
tcmalloc.  In practice, we haven't seen any problems with this, but
the expected risk is highest for users who register their own malloc
hooks with tcmalloc (using gperftools/malloc_hook.h).  The risk is
lowest for folks who use tcmalloc_minimal (or, of course, who pass in
the above flags :-) ).
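
To make the concern concrete, here is a minimal sketch of the kind of
code where the builtin assumption can matter.  The function name
alloc_roundtrip is hypothetical and not part of tcmalloc.  With
optimization turned on, gcc may treat the malloc/free pair below as
dead code and remove both calls, in which case a hook registered
through gperftools/malloc_hook.h would never see the allocation.
Compiling with the flags above should keep the calls in place.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical example: allocate n bytes and release them right away.
 * Returns 1 if the allocation succeeded (or was optimized away) and
 * 0 if malloc reported failure.  Because the pointer never escapes,
 * an optimizing compiler that knows malloc/free as builtins is
 * allowed to drop both calls entirely. */
int alloc_roundtrip(size_t n) {
    void *p = malloc(n);
    if (p == NULL) {
        return 0;
    }
    free(p);
    return 1;
}
```

Whether the calls are actually removed depends on the compiler version
and optimization level, so treat this as an illustration of the risk
rather than a guaranteed reproduction.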


HEAP PROFILER
-------------
See doc/heapprofile.html for information about how to use tcmalloc's
heap profiler and analyze its output.

As a quick-start, do the following after installing this package:

1) Link your executable with -ltcmalloc
2) Run your executable with the HEAPPROFILE environment var set:
     $ HEAPPROFILE=/tmp/heapprof <path/to/binary> [binary args]
3) Run pprof to analyze the heap usage
     $ pprof <path/to/binary> /tmp/heapprof.0045.heap  # run 'ls' to see options
     $ pprof --gv <path/to/binary> /tmp/heapprof.0045.heap

You can also use LD_PRELOAD to heap-profile an executable that you
didn't compile.

There are other environment variables, besides HEAPPROFILE, you can
set to adjust the heap-profiler behavior; cf. "ENVIRONMENT VARIABLES"
below.

The heap profiler is available on all unix-based systems we've tested;
see INSTALL for more details.  It is not currently available on Windows.


HEAP CHECKER
------------
See doc/heap_checker.html for information about how to use tcmalloc's
heap checker.

In order to catch all heap leaks, tcmalloc must be linked *last* into
your executable.  The heap checker may mischaracterize some memory
accesses in libraries listed after it on the link line.  For instance,
it may report these libraries as leaking memory when they're not.
(See the source code for more details.)

As a quick-start, do the following after installing this package:

1) Link your executable with -ltcmalloc
2) Run your executable with the HEAPCHECK environment var set:
     $ HEAPCHECK=1 <path/to/binary> [binary args]

Other values for HEAPCHECK: normal (equivalent to "1"), strict, draconian

You can also use LD_PRELOAD to heap-check an executable that you
didn't compile.

The heap checker is only available on Linux at this time; see INSTALL
for more details.


CPU PROFILER
------------
See doc/cpuprofile.html for information about how to use the CPU
profiler and analyze its output.

As a quick-start, do the following after installing this package:

1) Link your executable with -lprofiler
2) Run your executable with the CPUPROFILE environment var set:
     $ CPUPROFILE=/tmp/prof.out <path/to/binary> [binary args]
3) Run pprof to analyze the CPU usage
     $ pprof <path/to/binary> /tmp/prof.out      # -pg-like text output
     $ pprof --gv <path/to/binary> /tmp/prof.out # really cool graphical output

There are other environment variables, besides CPUPROFILE, you can set
to adjust the cpu-profiler behavior; cf. "ENVIRONMENT VARIABLES" below.

The CPU profiler is available on all unix-based systems we've tested;
see INSTALL for more details.  It is not currently available on Windows.

NOTE: CPU profiling doesn't work after fork (unless you immediately
      do an exec()-like call afterwards).  Furthermore, if you do
      fork, and the child calls exit(), it may corrupt the profile
      data.  You can use _exit() to work around this.  We hope to have
      a fix for both problems in the next release of perftools
      (hopefully perftools 1.2).
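
The _exit() workaround can be sketched as follows.  This is a minimal
illustration using only POSIX calls; the names run_in_child and
child_work are hypothetical and not part of perftools.  The child
leaves through _exit(), which skips atexit handlers and stdio
flushing, so it presumably never runs whatever exit-time code would
otherwise touch the parent's profile data.

```c
#define _POSIX_C_SOURCE 200809L  /* must come before any #include */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for whatever the child process has to do. */
static int child_work(void) {
    return 7;  /* pretend status code */
}

/* Fork, run work() in the child, and return the child's exit status
 * in the parent (or -1 if fork/waitpid fails or the child did not
 * exit normally).  The child terminates with _exit(), not exit(). */
int run_in_child(int (*work)(void)) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: leave immediately without running atexit handlers. */
        _exit(work() & 0xff);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_in_child(child_work) from the parent returns the child's
status code.  Note that _exit() also skips flushing stdio buffers, so
a child that writes through stdio should fflush() before leaving.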


EVERYTHING IN ONE
-----------------
If you want the CPU profiler, heap profiler, and heap leak-checker to
all be available for your application, you can do:
   gcc -o myapp ... -lprofiler -ltcmalloc

However, if you have a reason to use the static versions of the
library, this two-library linking won't work:
   gcc -o myapp ... /usr/lib/libprofiler.a /usr/lib/libtcmalloc.a  # errors!

Instead, use the special libtcmalloc_and_profiler library, which we
make for just this purpose:
   gcc -o myapp ... /usr/lib/libtcmalloc_and_profiler.a


CONFIGURATION OPTIONS
---------------------
For advanced users, there are several flags you can pass to
'./configure' that tweak tcmalloc performance.  (These are in addition
to the environment variables you can set at runtime to affect
tcmalloc, described below.)  See the INSTALL file for details.


ENVIRONMENT VARIABLES
---------------------
The cpu profiler, heap checker, and heap profiler will lie dormant,
using no memory or CPU, until you turn them on.  (Thus, there's no
harm in linking -lprofiler into every application, and also -ltcmalloc,
assuming you're ok using the non-libc malloc library.)

The easiest way to turn them on is by setting the appropriate
environment variables.  We have several variables that let you
enable/disable features as well as tweak parameters.

Here are some of the most important variables:

HEAPPROFILE=<pre>  -- turns on heap profiling and dumps data using this prefix
HEAPCHECK=<type>   -- turns on heap checking with strictness 'type'
CPUPROFILE=<file>  -- turns on cpu profiling and dumps data to this file
PROFILESELECTED=1  -- if set, cpu-profiler will only profile regions of code
                      surrounded with ProfilerEnable()/ProfilerDisable()
PROFILEFREQUENCY=x -- how many interrupts/second the cpu-profiler samples

TCMALLOC_DEBUG=<level> -- the higher level, the more messages malloc emits
MALLOCSTATS=<level>    -- prints memory-use stats at program-exit

For a full list of variables, see the documentation pages:
   doc/cpuprofile.html
   doc/heapprofile.html
   doc/heap_checker.html


COMPILING ON NON-LINUX SYSTEMS
------------------------------

Perftools was developed and tested on x86 Linux systems, and it works
in its full generality only on those systems.  However, we've
successfully ported much of the tcmalloc library to FreeBSD, Solaris
x86, and Darwin (Mac OS X) x86 and ppc; and we've ported the basic
functionality in tcmalloc_minimal to Windows.  See INSTALL for details.
See README_windows.txt for details on the Windows port.


PERFORMANCE
-----------

If you're interested in some third-party comparisons of tcmalloc to
other malloc libraries, here are a few web pages that have been
brought to our attention.  The first discusses the effect of using
various malloc libraries on OpenLDAP.  The second compares tcmalloc to
win32's malloc.
  http://www.highlandsun.com/hyc/malloc/
  http://gaiacrtn.free.fr/articles/win32perftools.html

It's possible to build tcmalloc in a way that buys faster performance
(particularly for deletes) at the cost of more memory fragmentation
(that is, more unusable memory on your system).  See the INSTALL file
for details.


OLD SYSTEM ISSUES
-----------------

When compiling perftools on some old systems, like RedHat 8, you may
get an error like this:
    ___tls_get_addr: symbol not found

This means that you have a system where some parts are updated enough
to support Thread Local Storage, but others are not.  The perftools
configure script can't always detect this kind of case, leading to
that error.  To fix it, just comment out (or delete) the line
   #define HAVE_TLS 1
in your config.h file before building.


64-BIT ISSUES
-------------

There are two issues that can cause program hangs or crashes on x86_64
64-bit systems, which use the libunwind library to get stack-traces.
Neither issue should affect the core tcmalloc library; they both
affect the perftools tools such as cpu-profiler, heap-checker, and
heap-profiler.

1) Some libc's -- at least glibc 2.4 on x86_64 -- have a bug where the
libc function dl_iterate_phdr() acquires its locks in the wrong
order.  This bug should not affect tcmalloc, but may cause occasional
deadlock with the cpu-profiler, heap-profiler, and heap-checker.
The deadlock becomes more likely the more dlopen() calls an executable
makes.  Most executables don't make any, though several library
routines like getgrgid() call dlopen() behind the scenes.

2) On x86_64 64-bit systems, while tcmalloc itself works fine, the
cpu-profiler tool is unreliable: it will sometimes work, but sometimes
cause a segfault.  We explain the problem first, and then some
workarounds.

Note that this only affects the cpu-profiler, which is a
gperftools feature you must turn on manually by setting the
CPUPROFILE environment variable.  If you do not turn on cpu-profiling,
you shouldn't see any crashes due to perftools.

The gory details: The underlying problem is in the backtrace()
function, which is a built-in function in libc.
Backtracing is fairly straightforward in the normal case, but can run
into problems when having to backtrace across a signal frame.
Unfortunately, the cpu-profiler uses signals in order to register a
profiling event, so every backtrace that the profiler does crosses a
signal frame.

In our experience, the only time there is trouble is when the signal
fires in the middle of pthread_mutex_lock.  pthread_mutex_lock is
called quite a bit from system libraries, particularly at program
startup and when creating a new thread.
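
For readers who want to see the pattern in isolation, here is a
minimal sketch of "take a backtrace from inside a signal handler".  It
is not the profiler's actual code.  The use of SIGPROF with
setitimer(ITIMER_PROF), the 100 Hz rate, and the names on_sigprof and
sample_backtraces are assumptions made for illustration; the text
above only says that the profiler uses signals.  It assumes glibc,
which provides backtrace() in <execinfo.h>.

```c
#define _GNU_SOURCE  /* must come before any #include */
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t g_samples = 0;
static volatile sig_atomic_t g_last_depth = 0;

/* Signal handler: every backtrace taken here has to unwind across the
 * signal frame.  backtrace() is not async-signal-safe, so this is a
 * demonstration only, not something to ship. */
static void on_sigprof(int signo) {
    void *frames[32];
    (void)signo;
    g_last_depth = backtrace(frames, 32);
    g_samples++;
}

/* Burn CPU until `wanted` profiling signals have been handled, then
 * return the depth of the last backtrace (or -1 on setup failure). */
int sample_backtraces(int wanted) {
    void *warmup[4];
    struct sigaction sa;
    struct itimerval tick, off;
    volatile unsigned long spin = 0;

    /* Call backtrace() once up front so any lazy initialization inside
     * libc happens outside the signal handler. */
    backtrace(warmup, 4);
    g_samples = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGPROF, &sa, NULL) != 0) return -1;

    memset(&tick, 0, sizeof tick);
    tick.it_interval.tv_usec = 10000;  /* fire every 10 ms of CPU time */
    tick.it_value.tv_usec = 10000;
    if (setitimer(ITIMER_PROF, &tick, NULL) != 0) return -1;

    while (g_samples < wanted) spin++;  /* the "profiled" busy work */

    memset(&off, 0, sizeof off);
    setitimer(ITIMER_PROF, &off, NULL);  /* stop the timer */
    return g_last_depth;
}
```

On a libc whose unwind information describes the signal frame, a call
such as sample_backtraces(3) should return a positive depth.  On an
affected system, this is the pattern that can misbehave when the
signal lands at an unlucky point.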

The solution: The dwarf debugging format has support for 'cfi
annotations', which make it easy to recognize a signal frame.  Some OS
distributions, such as Fedora and Gentoo 2007.0, have already added
cfi annotations to their libc.  A future version of libunwind should
recognize these annotations; these systems should not see any
crashes.

Workarounds: If you see problems with crashes when running the
cpu-profiler, consider inserting ProfilerStart()/ProfilerStop() into
your code, rather than setting CPUPROFILE.  This will profile only
the code that runs between those two calls.  Though we haven't done
much testing, in theory this should reduce the chance of crashes by
limiting the signal generation to only a small part of the codebase.
Ideally, you would not use ProfilerStart()/ProfilerStop() around code
that spawns new threads, or is otherwise likely to cause a call to
pthread_mutex_lock!
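
A minimal sketch of this workaround follows.  The macro
USE_GPERFTOOLS_PROFILER and the functions hot_loop and
profiled_hot_loop are hypothetical names chosen for the example.  The
stand-in definitions of ProfilerStart()/ProfilerStop() exist only so
the sketch builds without perftools installed, and they mirror the
signatures in gperftools/profiler.h as we understand them.

```c
#ifdef USE_GPERFTOOLS_PROFILER  /* hypothetical macro: set it when linking -lprofiler */
#include <gperftools/profiler.h>
#else
/* No-op stand-ins so the sketch compiles without perftools. */
static int ProfilerStart(const char *fname) { (void)fname; return 1; }
static void ProfilerStop(void) {}
#endif

/* Hypothetical CPU-bound region: sum of squares 1..n.  It creates no
 * threads and takes no locks, which is the kind of code the
 * workaround is best suited to. */
static unsigned long hot_loop(unsigned long n) {
    unsigned long sum = 0;
    for (unsigned long i = 1; i <= n; ++i) {
        sum += i * i;
    }
    return sum;
}

/* Profile only the hot region, writing samples to profile_path. */
unsigned long profiled_hot_loop(const char *profile_path, unsigned long n) {
    ProfilerStart(profile_path);  /* sampling signals start here */
    unsigned long result = hot_loop(n);
    ProfilerStop();               /* and stop here */
    return result;
}
```

To use the real profiler, build with -DUSE_GPERFTOOLS_PROFILER and
link with -lprofiler, then run without CPUPROFILE set and analyze the
file at profile_path with pprof as described under CPU PROFILER.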

---
17 May 2011
