This is an (incomplete) list of some of the stuff we want to look at doing.

If you're interested in hacking on any of these, please contact the list first
for some pointers and/or read HACKING and doc/CodingStyle.

1.0 release
-----------

(this is a minimal selection of stuff I think we need)

 o default to a vmlinux location: need agreement from kernel developers
 o default to --separate=library (with anon, =none does not make much sense)
 o prettify image name for .jo files and allow lib-image: to specify it
 o gisle's fixes
 o opreport tgid:<tgid> doesn't work even if .jo files with that pid exist
 o Fix:

warning: [vdso] (tgid:9236 range:0x7fff98ffd000-0x7fff98fff000) could not be found.
warning: /no-vmlinux could not be found.
warning: /usr/lib64/libpanel-applet-2.so.0.2.27.#prelink#.sXCUK1 (deleted) could not be found.

 o amd64 32 bit build needs a sys32_lookup_dcookie() translator in the
   kernel
 o decide on -m tgid semantics for anon regions
 o if ev67 is not fixed, back it out
 o lapic: module should say "didn't find apic" if needed; FAQ and doc should
   say a bit about the lapic kernel option on x86 with recent kernels
 o see the big comment in db_insert.c, it's possible to allow an unlimited
   amount of samples with a very minor change in libdb.
 o if oprofile doesn't recognize the processor selected by the kernel,
   opcontrol could set up the module in timer mode (remove/reload prolly), and
   warn the user that they must upgrade oprofile to get all the features of
   their hardware.

Later
-----

 o remove 2.95/2.2 support so we can use boost multi index container in
   symbol/sample container
 o consider if we can improve anon mapping growing support

<movement> [moz@lambent pp]$ ./opreport -lf lib-image:/lib/tls/libc-2.3.2.so /bin/bash | grep vfprintf
<movement> 14 0.1301 6 0.0102 /lib/tls/libc-2.3.2.so vfprintf
<movement> [moz@lambent pp]$ ./opreport -lf lib-image:/lib/tls/libc-2.3.2.so /usr/bin/vim | grep vfprintf
<movement> 176 2.0927 349 1.2552 /lib/tls/libc-2.3.2.so vfprintf
<movement> [moz@lambent pp]$ ./opreport -lf lib-image:/lib/tls/libc-2.3.2.so { image:/bin/bash } { image:/usr/bin/vim } | grep vfprintf
<movement> 176 10.9657 +++ 349 7.8888 +++ vfprintf
<movement> 14 --- --- 6 --- --- vfprintf
<movement> it seems them as two separate symbols
<movement> but can we remove the app_name from rough_less and still be able to walk the two lists?
<movement> even if we could, it would still go wrong when we're profiling multiple apps

 o Java stuff??
 o with opreport -c I can get "warning: /no-vmlinux could not be found.".
   Should be smarter ?
 o opreport -c gives weird output for an image with no symbols:

samples  %        symbol name
15965    100.000  (no symbols)
253      100.000  (no symbols)
15965    98.4400  (no symbols)
253      1.5600   (no symbols) [self]

 o consider tagging opreport -c entries with a number like gprof
 o --details for opreport -c, or diff??
 o should [self] entries be omitted if 0 ??
 o stress test opreport -c: compile a Big Application w/o frame pointer and
   look at how the driver and opreport -c react.
 o oparchive could fix up {kern} paths with -p (what about diff between
   archive and current though?)
 o can say more in opcontrol --status
 o consider a sort option for diff %
 o opannotate is silent about symbols missing debug info
 o oprofiled.log now contains various statistics about lost samples etc. from
   the driver.
   Post profile tools must parse that and eventually warn; the warning
   must include a proposed workaround. Users need this: if nothing seems wrong,
   people are unlikely to look in oprofiled.log (I ran oprofile on 2.6.1
   2 weeks before noticing that at 30000 I lost a lot of samples; the profile
   seemed ok due to the randomization of lost samples). As developers we need
   that too, actually we have no clear idea of the behavior on different arch,
   NUMA etc. Not perfect, because if the profiler is running, oprofiled.log
   will show those warnings only after the first alarm signal; I think we must
   dump the statistics information after each opcontrol --dump to avoid that.
 o odb_insert() can fail on ftruncate or mremap() in db_manage.c but we don't
   try to recover gracefully.
 o output column shortname headers for opreport -l
 o is relative_to_absolute_path guaranteeing a trailing '/' documented ?
 o move oprofiled.log to OP_SAMPLE_DIR/current ?
 o pp tools must handle samples count overflow (marked as (unsigned)-1)
 o the way we show kernel modules in 2.5 is not very obvious - "/oprofile"
 o oparchive would be more useful with a --root= option to allow profiling
   on a small box, nfs mounting / on another box and transferring sample files
   and binaries to a bigger box for analysis. There is also a problem in
   oparchive: you can use session: to get the right path to sample files, but
   the oprofiled.log and abi file paths are hardcoded to /var/lib/oprofile.
 o callgraph patch: better way to skip ignored backtrace ?
 o lib-image: and image: behavior depends on --separate=: with
   --separate=library
   opreport "lib-image:*libc*" --merge=lib works but not
   opreport "image:*libc*" --merge=lib, whilst the behavior is reversed with
   --separate=none. Must we take care ?
 o dependencies between profile_container.h, symbol_container.h and
   sample_container.h become more and more ugly, I needed to include them
   in a specific order in some sources (still true??)
 o add event aliases for common things like icache misses. We must start to
   think about metrics, including simple ones like an event alias mapped to
   two or more events and interpreted specially by user space tools, e.g. by
   using the ratio of samples; more tricky will be to select an event used as
   call count (no cg on it) and used to emulate the call count field in gprof.
   I think this is an after 1.0 thing, but event aliases must be specified in
   a way allowing such an extension
 o do we need an opreport like opreport -c (showing caller/callee at binary
   boundary not symbols) ?
 o we should notice an opcontrol config change (--separate etc.) and
   auto-restart the daemon if necessary (Run)
 o we can add lots more unit tests yet
 o Itanium event constraints are not implemented
 o GUI still has a physical-counter interface, should have a general one
   like opcontrol --event
 o I think we should have the ability to have *fixed* width headers, e.g. :

vma      samples  cum. samples  %        cum. %   symbol name  image name              app name
0804c350 64582    64582         35.0757  35.0757  odb_insert   /usr/loc...in/oprofiled /usr/local/oprofile-pp/bin/oprofiled

   Note the ellipsis
 o should we make the sighup handler re-read counter config and re-start
   profiling too ?
 o improve --smart-demangle
    o allow the user to add their own patterns in user.pat, document it.
    o hard code the ${typename} regular definition to remove all current
      limitations (difficult, perhaps after 1.0 ?).
 o oprof_start dialog size is too small initially
 o i18n.
   We need a good formatter, and also remember format_percent()
 o opannotate --source --output-dir=~moz/op/ /usr/bin/oprofiled
   will fail because the ~ is not expanded (no space around it) (popt bug I say)
 o cpu names instead of numbers in 2.4 module/ ?
 o remove 1 and 2 magic numbers for oprof_ready
 o adapt Anton's patch for handling non-symbolled libraries ? (nowadays C++
   anon namespace symbols are static, 3.4 iirc, so with recent distros we are
   more likely to get problems with a "fallback to dynamic symbols" approach)
 o use standard C integer types from <stdint.h>: int32_t, int16_t etc.
 o event multiplexing for real
 o randomizing of reset value
 o XML output
 o profile the NMI handler code
 o opannotate : I added this to the doc about the difference between the nr of
   samples credited to a source function and the total number of samples for
   this function:
   "The missing samples are not lost, they will be credited to another source
   location where the inlined function is defined. The inlined function will
   be credited from multiple call sites and merged in one place in the
   annotated source file, so there is no way to see from which call site the
   samples for an inlined function are coming."
   I think we can work around this: output multiple instances of the inlined
   function like :
   inline foo() { foo: total 1500 30.00 ...
   ... annotated source from all call sites
   inline foo() { foo (call site bar()): total 500 10.00
   .. annotated source from call site bar() etc.
   what about templates..., can we do/must we do something like that for
   template <class T> eat_cpu() and do a similar thing, merging and annotating
   all instantiations then annotating each distinct instantiation? This will
   break our "keep the source line number in annotated source file identical to
   the original source"
 o events/mips/34k/events: some events do not make sense, they get identical
   event number, um and counter nr so they overlap; currently commented out
 o can we find a more efficient implementation for sparse_array ?
 o libpp/profile.cpp:is_spu_sample_file() can be simplified by using
   read_header()
 o while fixing #1819350 I needed to make extra_images per profile session
   rather than a global var, so I think we need to revisit find_image_path(),
   extra_found_images, --image-path (-p).
   Actually we can't do something like:
   opreport { archive:tmp1 search_path=/lib/modules/2.6.20 } { archive:tmp2 search_path=/.../2.6.20.9 }
   because search_path is specified through -p, which is not a part of the
   profile spec. Fixing #1819350 covered all cases except this one, but w/o any
   user visible change. Another way would be to save the -p option used with
   oparchive in a file at the toplevel of the archive, use it with all tools
   when an archive: is specified on the command line and deprecate the use of
   -p in such a case.
 o consider making extra_images a ref counted object, it's copied by value
   a few times but can contain a lot of strings. There is also an ugly public
   member extra_images to fix.
 o daemon bss size can be improved, grep for MAX_PATH to see where dynamic
   allocation can be used, try $ nm oprofiled --size-sort too.

Documentation
-------------

 o the docs should mention the default event for each arch somewhere
 o more discussion of problematic code needs to go in the "interpreting" section.
 o document gcc 2.95 and linenr info problems especially for inline functions
 o finish the internals manual

JIT support
-----------

 o We need a more dynamic structure to handle entries_address_ascending and
   entries_symbols_ascending, actually many scaling problems occur because
   they are arrays. This was perfect to get a first implementation focusing on
   handling overlap and all, but the need to qsort/copy arrays at each
   iteration is a performance killer. Some sort of AVL tree will do the job.
 o Related to the previous, it's possible to do all processing in opjitconv.c
   in a single left to right walk of the jitentry list.
 o see the FIXME at parse_dump.c:parse_code_unload()
 o Increment JITHEADER_VERSION in jitdump.h to be sure that the new code only
   accepts dump files created by the new code.
 o opjitconv.c:replacement_name() should be clever enough to avoid name
   collisions so we can remove the recursive call to
   disambiguate_symbol_names(). This needs a hash table or some sort of
   associative array to check quickly if a name exists; we will need some sort
   of avl tree anyway, so it's probably better not to implement a hash table
   only for this purpose.
 o op_write_native_code() must accept one more parameter, the real code size,
   which can be zero or equal to code_size. This will allow creating an elf
   file w/o any code contents, only a symbol table and .text sections w/o
   contents (yes, the ELF format allows that). For dynamic binary translation
   it'll avoid dumping tons of code for little use; opannotate --assembly will
   not work on such an elf file but it can be a real win. It'll need a
   real_size field added to jitrecord0, and some trickery when building the
   elf file, taking care of the case where we mix zero code size with non zero
   code size.
   Perhaps we can use it too for java, filtering native methods etc.
   Actually we allow a simplified form of this feature by allowing to
   disable/enable code dumping, but at the whole dump level not on a symbol
   basis, quite possibly sufficient. [mpj: We're backing away from the idea of
   dumping JIT records without code. Since the BFD asymbol type does not
   include symbol size, the op_bfd technique for determining symbol size
   relies on knowing the true file size; and if code is not included in the
   .jo file, we don't have the true size.]
 o The pipe used for triggering JIT dump conversion should be used for normal
   dumping too.
 o See FIXME in agents/jvmti/libjvmti_oprofile.c:
   If enablement to get line number info would be configurable through command line,
   what should be the default on/off?
 o See FIXME in opjitconv/debug_line.c
 o The way to use the pipe should be made more secure to avoid denial of service
   attacks. We have to think about it.
 o Callgraph does not work properly for the .jo files the JIT support creates.
   See Chapter 4, sect 2.3.2 "Callgraph and JIT support". Try to figure
   out a way to correlate an anonymous sample callgraph entry with
   the .jo file that may exist for the anonymous code.
 o see mail from Gisle Dankel:
   "JIT_SUPPORT: Adding support for file-backed non-ELF JIT code"
   -> should be changed (if useful) before next release
 o See FIXME in op_header.cpp:
   The check for header.mtime of JIT sample files is not correct because currently
   this mtime value is set to zero due to missing cookie setting for JIT sample files.
   Some additional check/setting to header.mtime should be made for JIT sample files.
 o Mono JIT support:

2007-11-08: with callgraph massi got
<massi> oparchive error: parse_filename() invalid filename: /var/lib/oprofile/samples/current/{root}/var/lib/oprofile/samples/current/{root}/home/massi/mono/amd64/bin/mono/{dep}/{anon:anon}/32432.0x40a26000.0x40a36000/CPU_CLK_UNHALTED.100000.0.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/home/massi/mono/amd64/bin/mono/{dep}/{anon:anon}/32432.0x40a26000.0x40a36000/CPU_CLK_UNHALTED.100000.0.all.all.all/{cg}/{root}/usr/oprofile/bin/oprofiled/CPU_CLK_

   Massi added Mono JIT support. Code on the stack is never unloaded and there
   is no byte code, code is always compiled to native machine code; this means
   that for mono at least we can do callgraph if we can fix this samples
   filename problem.

General checks to make
----------------------

 o rgrep FIXME
 o valgrind (--show-reachable=yes --leak-check=yes)
 o audit to track unnecessary include <>
 o gcc 3.0/3.x compile
 o Qt2/3 check, no Qt check
 o verify builds (modversions, kernel versions, athlon etc.). I have the
   necessary stuff to check kernel versions/configurations on PIII core (Phil)
 o use nm and a little script to track unused functions
 o test it to hell and back
 o compile all C++ programs with STLport and test them (gcc 3.4 contains a
   debug mode too but std::string iterators are not checked)
 o There are probably places in the post profile tools where looking at errno
   would give better error messages.
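
A minimal sketch of the errno idea from the last item, not taken from the
oprofile sources. describe_errno() and open_or_explain() are hypothetical
helper names invented for illustration. It assumes a POSIX system, where
fopen() sets errno on failure.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper (not part of oprofile): build an error message that
// appends the system's description of the given errno value.
std::string describe_errno(std::string const & action,
                           std::string const & path, int err)
{
	return action + " " + path + ": " + std::strerror(err);
}

// Hypothetical helper: try to open a file for reading. On failure return
// false and fill `error` with a message derived from errno. Assumes POSIX
// behaviour, where fopen() sets errno when it fails.
bool open_or_explain(std::string const & path, std::string & error)
{
	errno = 0;
	std::FILE * fp = std::fopen(path.c_str(), "r");
	if (!fp) {
		// Save errno at once, later library calls may overwrite it.
		int const saved = errno;
		error = describe_errno("cannot open", path, saved);
		return false;
	}
	std::fclose(fp);
	return true;
}
```

With something like this, a tool could report "cannot open <path>:" followed
by the system's reason (missing file, permission problem, etc.) instead of a
generic failure message.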