README
1 IMPORTANT NOTE FOR 64-BIT USERS
2 -------------------------------
3 There are known issues with some perftools functionality on x86_64
4 systems. See 64-BIT ISSUES, below.
5
6
7 TCMALLOC
8 --------
9 Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of
10 tcmalloc -- a replacement for malloc and new. See below for some
11 environment variables you can use with tcmalloc, as well.
12
13 tcmalloc functionality is available on all systems we've tested; see
14 INSTALL for more details. See README_windows.txt for instructions on
15 using tcmalloc on Windows.
16
17 NOTE: When compiling programs with gcc that you plan to link
18 with libtcmalloc, it's safest to pass in the flags
19
20 -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free
21
22 when compiling. gcc makes some optimizations assuming it is using its
23 own, built-in malloc; that assumption obviously isn't true with
24 tcmalloc. In practice, we haven't seen any problems with this, but
25 the expected risk is highest for users who register their own malloc
26 hooks with tcmalloc (using gperftools/malloc_hook.h). The risk is
27 lowest for folks who use tcmalloc_minimal (or, of course, who pass in
28 the above flags :-) ).
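
To illustrate the kind of optimization meant here, consider the sketch
below. The function name is made up for this example. Because gcc
treats malloc and free as built-ins, an optimizing build may remove the
allocation altogether, in which case a malloc hook would never see it.
With the -fno-builtin-* flags above, the calls should be left in place.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical example: adds two ints by way of a short-lived heap cell.
 * With built-in malloc/free, an optimizing compiler is allowed to elide
 * the allocation. With -fno-builtin-malloc -fno-builtin-free it should
 * emit real calls, which then reach tcmalloc (and any hooks). */
int sum_via_heap(int a, int b)
{
    int *cell = malloc(sizeof *cell);
    int result;

    assert(cell != NULL);      /* keeps the sketch short: no recovery path */
    *cell = a + b;
    result = *cell;
    free(cell);
    return result;
}
```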
29
30
31 HEAP PROFILER
32 -------------
33 See doc/heapprofile.html for information about how to use tcmalloc's
34 heap profiler and analyze its output.
35
36 As a quick-start, do the following after installing this package:
37
38 1) Link your executable with -ltcmalloc
39 2) Run your executable with the HEAPPROFILE environment var set:
40 $ HEAPPROFILE=/tmp/heapprof <path/to/binary> [binary args]
41 3) Run pprof to analyze the heap usage
42 $ pprof <path/to/binary> /tmp/heapprof.0045.heap # run 'ls' to see the available profile files
43 $ pprof --gv <path/to/binary> /tmp/heapprof.0045.heap
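
If you want something small to try the steps above on, a sketch like
the following gives the profiler a recognizable allocation site. The
function names are made up for this example; they are not part of
perftools.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical workload: allocates `count` buffers of `size` bytes each,
 * touches them, and returns the array so the memory stays live until
 * free_buffers() is called. Returns NULL if any allocation fails. */
char **make_buffers(size_t count, size_t size)
{
    char **bufs;
    size_t i;

    assert(count > 0 && size > 0);
    bufs = calloc(count, sizeof *bufs);
    if (bufs == NULL)
        return NULL;
    for (i = 0; i < count; i++) {
        bufs[i] = malloc(size);
        if (bufs[i] == NULL) {
            /* Undo the partial work before reporting failure. */
            while (i > 0)
                free(bufs[--i]);
            free(bufs);
            return NULL;
        }
        memset(bufs[i], 0xAB, size);
    }
    return bufs;
}

/* Releases everything make_buffers() handed out. */
void free_buffers(char **bufs, size_t count)
{
    size_t i;

    if (bufs == NULL)
        return;
    for (i = 0; i < count; i++)
        free(bufs[i]);
    free(bufs);
}
```

Call make_buffers() from your main() with a large count, keep the
result around for a while, and the heap profile should attribute most
of the live memory to make_buffers.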
44
45 You can also use LD_PRELOAD to heap-profile an executable that you
46 didn't compile.
47
48 There are other environment variables, besides HEAPPROFILE, you can
49 set to adjust the heap-profiler behavior; see "ENVIRONMENT VARIABLES"
50 below.
51
52 The heap profiler is available on all unix-based systems we've tested;
53 see INSTALL for more details. It is not currently available on Windows.
54
55
56 HEAP CHECKER
57 ------------
58 See doc/heap_checker.html for information about how to use tcmalloc's
59 heap checker.
60
61 In order to catch all heap leaks, tcmalloc must be linked *last* into
62 your executable. The heap checker may mischaracterize some memory
63 accesses in libraries listed after it on the link line. For instance,
64 it may report these libraries as leaking memory when they're not.
65 (See the source code for more details.)
66
69 As a quick-start, do the following after installing this package:
70
71 1) Link your executable with -ltcmalloc
72 2) Run your executable with the HEAPCHECK environment var set:
73 $ HEAPCHECK=1 <path/to/binary> [binary args]
74
75 Other values for HEAPCHECK: normal (equivalent to "1"), strict, draconian
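
To see what a report looks like, you can feed the checker a deliberate
leak. The sketch below is one way to do that; the function name is made
up for this example. Build it without optimization (or with the
-fno-builtin-* flags from the TCMALLOC section) so the compiler doesn't
remove the allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: copies `text` into a fresh heap block and
 * returns its length, but never frees the block. Each successful call
 * leaks strlen(text) + 1 bytes, which is the kind of thing the heap
 * checker is meant to report. */
size_t leaky_strlen(const char *text)
{
    size_t len;
    char *copy;

    assert(text != NULL);
    len = strlen(text);
    copy = malloc(len + 1);
    if (copy == NULL)
        return len;            /* nothing allocated, nothing leaked */
    memcpy(copy, text, len + 1);
    return strlen(copy);       /* `copy` is dropped here: leaked */
}
```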
76
77 You can also use LD_PRELOAD to heap-check an executable that you
78 didn't compile.
79
80 The heap checker is only available on Linux at this time; see INSTALL
81 for more details.
82
83
84 CPU PROFILER
85 ------------
86 See doc/cpuprofile.html for information about how to use the CPU
87 profiler and analyze its output.
88
89 As a quick-start, do the following after installing this package:
90
91 1) Link your executable with -lprofiler
92 2) Run your executable with the CPUPROFILE environment var set:
93 $ CPUPROFILE=/tmp/prof.out <path/to/binary> [binary args]
94 3) Run pprof to analyze the CPU usage
95 $ pprof <path/to/binary> /tmp/prof.out # -pg-like text output
96 $ pprof --gv <path/to/binary> /tmp/prof.out # really cool graphical output
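
The profiler samples at a fixed rate, so a program has to burn some CPU
before the output is interesting. A deliberately slow function like the
one below is enough to try the steps above. The name is made up for
this example.

```c
#include <assert.h>

/* Naive recursive Fibonacci: exponentially slow on purpose, so the
 * profiler has plenty of samples to attribute to it. */
unsigned long long slow_fib(unsigned n)
{
    assert(n <= 93);           /* fib(93) is the largest that fits in 64 bits */
    return n < 2 ? n : slow_fib(n - 1) + slow_fib(n - 2);
}
```

Calling slow_fib() with an argument around 40 from main() should run
long enough for the profile to show it at the top.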
97
98 There are other environment variables, besides CPUPROFILE, you can set
99 to adjust the cpu-profiler behavior; see "ENVIRONMENT VARIABLES" below.
100
101 The CPU profiler is available on all unix-based systems we've tested;
102 see INSTALL for more details. It is not currently available on Windows.
103
104 NOTE: CPU profiling doesn't work after fork (unless you immediately
105 do an exec()-like call afterwards). Furthermore, if you do
106 fork, and the child calls exit(), it may corrupt the profile
107 data. You can use _exit() to work around this. We hope to have
108 a fix for both problems in the next release of perftools
109 (hopefully perftools 1.2).
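
The _exit() workaround can be sketched as follows. This is plain POSIX
code with no perftools calls in it; the function name is made up for
this example.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that leaves via _exit() instead of exit(), so the
 * child skips atexit() handlers and other exit-time cleanup.
 * Returns the child's exit status, or -1 on error. */
int run_child(int status)
{
    pid_t pid;
    int wstatus;

    assert(status >= 0 && status <= 255);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* ... child-only work goes here ... */
        _exit(status);         /* not exit(): see the NOTE above */
    }
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Keep in mind that _exit() does not flush stdio buffers either, so
fflush() anything the child has written before calling it.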
110
111
112 EVERYTHING IN ONE
113 -----------------
114 If you want the CPU profiler, heap profiler, and heap leak-checker to
115 all be available for your application, you can do:
116 gcc -o myapp ... -lprofiler -ltcmalloc
117
118 However, if you have a reason to use the static versions of the
119 library, this two-library linking won't work:
120 gcc -o myapp ... /usr/lib/libprofiler.a /usr/lib/libtcmalloc.a # errors!
121
122 Instead, use the special libtcmalloc_and_profiler library, which we
123 make for just this purpose:
124 gcc -o myapp ... /usr/lib/libtcmalloc_and_profiler.a
125
126
127 CONFIGURATION OPTIONS
128 ---------------------
129 For advanced users, there are several flags you can pass to
130 './configure' that tweak tcmalloc performance. (These are in addition
131 to the environment variables you can set at runtime to affect
132 tcmalloc, described below.) See the INSTALL file for details.
133
134
135 ENVIRONMENT VARIABLES
136 ---------------------
137 The cpu profiler, heap checker, and heap profiler will lie dormant,
138 using no memory or CPU, until you turn them on. (Thus, there's no
139 harm in linking -lprofiler into every application, and also -ltcmalloc
140 assuming you're ok using the non-libc malloc library.)
141
142 The easiest way to turn them on is by setting the appropriate
143 environment variables. We have several variables that let you
144 enable/disable features as well as tweak parameters.
145
146 Here are some of the most important variables:
147
148 HEAPPROFILE=<pre> -- turns on heap profiling and dumps data using this prefix
149 HEAPCHECK=<type> -- turns on heap checking with strictness 'type'
150 CPUPROFILE=<file> -- turns on cpu profiling and dumps data to this file.
151 PROFILESELECTED=1 -- if set, cpu-profiler will only profile regions of code
152 surrounded with ProfilerEnable()/ProfilerDisable().
153 PROFILEFREQUENCY=x -- how many interrupts/second the cpu-profiler samples.
154
155 TCMALLOC_DEBUG=<level> -- the higher level, the more messages malloc emits
156 MALLOCSTATS=<level> -- prints memory-use stats at program-exit
157
158 For a full list of variables, see the documentation pages:
159 doc/cpuprofile.html
160 doc/heapprofile.html
161 doc/heap_checker.html
162
163
164 COMPILING ON NON-LINUX SYSTEMS
165 ------------------------------
166
167 Perftools was developed and tested on x86 Linux systems, and it works
168 in its full generality only on those systems. However, we've
169 successfully ported much of the tcmalloc library to FreeBSD, Solaris
170 x86, and Darwin (Mac OS X) x86 and ppc; and we've ported the basic
171 functionality in tcmalloc_minimal to Windows. See INSTALL for details.
172 See README_windows.txt for details on the Windows port.
173
174
175 PERFORMANCE
176 -----------
177
178 If you're interested in some third-party comparisons of tcmalloc to
179 other malloc libraries, here are a few web pages that have been
180 brought to our attention. The first discusses the effect of using
181 various malloc libraries on OpenLDAP. The second compares tcmalloc to
182 win32's malloc.
183 http://www.highlandsun.com/hyc/malloc/
184 http://gaiacrtn.free.fr/articles/win32perftools.html
185
186 It's possible to build tcmalloc in a way that gains faster
187 performance (particularly for deletes) at the cost of more memory
188 fragmentation (that is, more unusable memory on your system). See the
189 INSTALL file for details.
190
191
192 OLD SYSTEM ISSUES
193 -----------------
194
195 When compiling perftools on some old systems, like RedHat 8, you may
196 get an error like this:
197 ___tls_get_addr: symbol not found
198
199 This means that you have a system where some parts are updated enough
200 to support Thread Local Storage, but others are not. The perftools
201 configure script can't always detect this kind of case, leading to
202 that error. To fix it, just comment out (or delete) the line
203 #define HAVE_TLS 1
204 in your config.h file before building.
205
206
207 64-BIT ISSUES
208 -------------
209
210 There are two issues that can cause program hangs or crashes on x86_64
211 64-bit systems, which use the libunwind library to get stack-traces.
212 Neither issue should affect the core tcmalloc library; they both
213 affect the perftools tools such as cpu-profiler, heap-checker, and
214 heap-profiler.
215
216 1) Some libc's -- at least glibc 2.4 on x86_64 -- have a bug where the
217 libc function dl_iterate_phdr() acquires its locks in the wrong
218 order. This bug should not affect tcmalloc, but may cause occasional
219 deadlock with the cpu-profiler, heap-profiler, and heap-checker.
220 Its likelihood increases with the number of dlopen() calls an executable makes.
221 Most executables don't have any, though several library routines like
222 getgrgid() call dlopen() behind the scenes.
223
224 2) On x86_64 64-bit systems, while tcmalloc itself works fine, the
225 cpu-profiler tool is unreliable: it will sometimes work, but sometimes
226 cause a segfault. I'll explain the problem first, and then some
227 workarounds.
228
229 Note that this only affects the cpu-profiler, which is a
230 gperftools feature you must turn on manually by setting the
231 CPUPROFILE environment variable. If you do not turn on cpu-profiling,
232 you shouldn't see any crashes due to perftools.
233
234 The gory details: The underlying problem is in the backtrace()
235 function, which is a built-in function in libc.
236 Backtracing is fairly straightforward in the normal case, but can run
237 into problems when having to backtrace across a signal frame.
238 Unfortunately, the cpu-profiler uses signals in order to register a
239 profiling event, so every backtrace that the profiler does crosses a
240 signal frame.
241
242 In our experience, the only time there is trouble is when the signal
243 fires in the middle of pthread_mutex_lock. pthread_mutex_lock is
244 called quite a bit from system libraries, particularly at program
245 startup and when creating a new thread.
246
247 The solution: The dwarf debugging format has support for 'cfi
248 annotations', which make it easy to recognize a signal frame. Some OS
249 distributions, such as Fedora and gentoo 2007.0, already have added
250 cfi annotations to their libc. A future version of libunwind should
251 recognize these annotations; these systems should not see any
252 crashes.
253
254 Workarounds: If you see problems with crashes when running the
255 cpu-profiler, consider inserting ProfilerStart()/ProfilerStop() into
256 your code, rather than setting CPUPROFILE. This will profile only
257 those sections of the codebase. Though we haven't done much testing,
258 in theory this should reduce the chance of crashes by limiting the
259 signal generation to only a small part of the codebase. Ideally, you
260 would not use ProfilerStart()/ProfilerStop() around code that spawns
261 new threads, or is otherwise likely to cause a call to
262 pthread_mutex_lock!
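
A sketch of this workaround is below. ProfilerStart() and
ProfilerStop() come from gperftools/profiler.h; everything else,
including the USE_GPERFTOOLS_PROFILER macro and the function names, is
made up for this example. Without the macro the sketch builds with
no-op stand-ins, so you can try it before linking in -lprofiler.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_GPERFTOOLS_PROFILER      /* hypothetical build macro for this sketch */
#include <gperftools/profiler.h>    /* ProfilerStart(), ProfilerStop() */
#else
/* No-op stand-ins so the sketch also builds without perftools. */
static int ProfilerStart(const char *fname) { (void)fname; return 1; }
static void ProfilerStop(void) { }
#endif

/* Sums the squares 1..n; stands in for the code you want profiled. */
static unsigned long long hot_loop(unsigned n)
{
    unsigned long long total = 0;
    unsigned i;

    for (i = 1; i <= n; i++)
        total += (unsigned long long)i * i;
    return total;
}

/* Profiles only the call to hot_loop(), writing samples to
 * `profile_path`. Thread creation and other startup work should stay
 * outside the ProfilerStart()/ProfilerStop() pair. */
unsigned long long profiled_hot_loop(unsigned n, const char *profile_path)
{
    unsigned long long result;

    assert(profile_path != NULL);
    ProfilerStart(profile_path);   /* sampling signals begin here */
    result = hot_loop(n);
    ProfilerStop();                /* ...and end here */
    return result;
}
```

To profile for real, compile with -DUSE_GPERFTOOLS_PROFILER, link with
-lprofiler, and leave CPUPROFILE unset.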
263
264 ---
265 17 May 2011
266
README_windows.txt
1 --- COMPILING
2
3 A port of this project to Windows is under way. A working solution
4 file exists in this directory:
5 gperftools.sln
6
7 You can load this solution file into VC++ 7.1 (Visual Studio 2003) or
8 later -- in the latter case, it will automatically convert the files
9 to the latest format for you.
10
11 When you build the solution, it will create a number of unittests,
12 which you can run by hand (or, more easily, under the Visual Studio
13 debugger) to make sure everything is working properly on your system.
14 The binaries will end up in a directory called "debug" or "release" in
15 the top-level directory (next to the .sln file). It will also create
16 two binaries, nm-pdb and addr2line-pdb, which you should install in
17 the same directory you install the 'pprof' perl script.
18
19 I don't know very much about how to install DLLs on Windows, so you'll
20 have to figure out that part for yourself. If you choose to just
21 re-use the existing .sln, make sure you set the IncludeDir's
22 appropriately! Look at the properties for libtcmalloc_minimal.dll.
23
24 Note that these systems are set to build in Debug mode by default.
25 You may want to change them to Release mode.
26
27 To use tcmalloc_minimal in your own projects, you should only need to
28 build the dll and install it someplace, so you can link it into
29 further binaries. To use the dll, you need to add the following to
30 the linker line of your executable:
31 "libtcmalloc_minimal.lib" /INCLUDE:"__tcmalloc"
32
33 Here is how to accomplish this in Visual Studio 2005 (VC8):
34
35 1) Have your executable depend on the tcmalloc library by selecting
36 "Project Dependencies..." from the "Project" menu. Your executable
37 should depend on "libtcmalloc_minimal".
38
39 2) Have your executable depend on a tcmalloc symbol -- this is
40 necessary so the linker doesn't "optimize out" the libtcmalloc
41 dependency -- by right-clicking on your executable's project (in
42 the solution explorer), selecting Properties from the pull-down
43 menu, then selecting "Configuration Properties" -> "Linker" ->
44 "Input". Then, in the "Force Symbol References" field, enter the
45 text "__tcmalloc" (without the quotes). Be sure to do this for both
46 debug and release modes!
47
48 You can also link tcmalloc code in statically -- see the example
49 project tcmalloc_minimal_unittest-static, which does this. For this
50 to work, you'll need to add "/D PERFTOOLS_DLL_DECL=" to the compile
51 line of every perftools .cc file. You do not need to depend on the
52 tcmalloc symbol in this case (that is, you don't need to do either
53 step 1 or step 2 from above).
54
55 An alternative to all the above is to statically link your application
56 with libc, and then replace its malloc with tcmalloc. This allows you
57 to just build and link your program normally; the tcmalloc support
58 comes in a post-processing step. This is more reliable than the above
59 technique (which depends on run-time patching, which is inherently
60 fragile), though more work to set up. For details, see
61 https://groups.google.com/group/google-perftools/browse_thread/thread/41cd3710af85e57b
62
63
64 --- THE HEAP-PROFILER
65
66 The heap-profiler has had a preliminary port to Windows. It has not
67 been well tested, and probably does not work at all when Frame Pointer
68 Optimization (FPO) is enabled -- that is, in release mode. The other
69 features of perftools, such as the cpu-profiler and leak-checker, have
70 not yet been ported to Windows at all.
71
72
73 --- WIN64
74
75 The function-patcher has to disassemble code, and is very
76 x86-specific. However, the rest of perftools should work fine for
77 both x86 and x64. In particular, if you use the 'statically link with
78 libc, and replace its malloc with tcmalloc' approach, mentioned above,
79 it should be possible to use tcmalloc with 64-bit windows.
80
81 As of perftools 1.10, there is some support for disassembling x86_64
82 instructions, for work with win64. This work is preliminary, but the
83 test file preamble_patcher_test.cc is provided to play around with
84 that a bit. preamble_patcher_test will not compile on win32.
85
86
87 --- ISSUES
88
89 NOTE FOR WIN2K USERS: According to reports
90 (http://code.google.com/p/gperftools/issues/detail?id=127)
91 the stack-tracing necessary for the heap-profiler does not work on
92 Win2K. The best workaround, if you are building on a Win2K system,
93 is to add "/D NO_TCMALLOC_SAMPLES=" to your build, to turn off the
94 stack-tracing. You will not be able to use the heap-profiler if you
95 do this.
96
97 NOTE ON _MSIZE and _RECALLOC: The tcmalloc version of _msize returns
98 the size of the region tcmalloc allocated for you -- which is at least
99 as many bytes as you asked for, but may be more. (btw, these *are* bytes
100 you own, even if you didn't ask for all of them, so it's correct code
101 to access all of them if you want.) Unfortunately, the Windows CRT
102 _recalloc() routine assumes that _msize returns exactly as many bytes
103 as were requested. As a result, _recalloc() may not zero out new
104 bytes correctly. IT'S SAFEST NOT TO USE _RECALLOC WITH TCMALLOC.
105 _recalloc() is a tricky routine to use in any case (it's not safe to
106 use with realloc, for instance).
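
If you need _recalloc()-like behavior, one option is to track the
requested size yourself and zero the new tail by hand, so nothing
depends on what _msize returns. The helper below is a sketch of that
idea in portable C. It is not part of tcmalloc or the Windows CRT, and
its name and signature are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: resizes `ptr` from old_count to new_count
 * elements of `size` bytes each and zero-fills any newly added tail.
 * The caller supplies old_count, so the block size reported by the
 * allocator never enters into it. Returns NULL on overflow, on a zero
 * new_count, or on allocation failure, leaving `ptr` untouched. */
void *recalloc_tracked(void *ptr, size_t old_count, size_t new_count,
                       size_t size)
{
    size_t old_bytes, new_bytes;
    char *grown;

    assert(size > 0);
    if (new_count == 0 ||
        new_count > SIZE_MAX / size || old_count > SIZE_MAX / size)
        return NULL;
    old_bytes = ptr ? old_count * size : 0;
    new_bytes = new_count * size;

    grown = realloc(ptr, new_bytes);
    if (grown == NULL)
        return NULL;
    if (new_bytes > old_bytes)
        memset(grown + old_bytes, 0, new_bytes - old_bytes);
    return grown;
}
```

The cost is that the caller has to remember the element count for each
block, which _recalloc() was meant to spare you.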
107
108
109 I have little experience with Windows programming, so there may be
110 better ways to set this up than I've done! If you run across any
111 problems, please post to the google-perftools Google Group, or report
112 them on the gperftools Google Code site:
113 http://groups.google.com/group/google-perftools
114 http://code.google.com/p/gperftools/issues/list
115
116 -- craig
117
118 Last modified: 2 February 2012
119