Note, 11 May 2009. The XML format evolved over several versions,
as expected. This file describes 3 different versions of the
format (called Protocols 1, 2 and 3 respectively). As of 11 May 09
a fourth version, Protocol 4, was defined, and that is described
in xml-output-protocol4.txt.

The original May 2005 introduction follows. These comments are
correct up to and including Protocol 3, which was used in the Valgrind
3.4.x series. However, there were some more significant changes in
the format and the required flags for Valgrind, in Protocol 4.

----------------------

As of May 2005, Valgrind can produce its output in XML form. The
intention is to provide an easily parsed, stable format which is
suitable for GUIs to read.


Design goals
~~~~~~~~~~~~

* Produce XML output which is easily parsed

* Have a stable output format which does not change much over time, so
  that investments in parser-writing by GUI developers are not lost as
  new versions of Valgrind appear.

* Have an extensible output format, so that future changes to the
  format do not break backwards compatibility with existing parsers of
  it.

* Produce output in a form which is suitable for both offline GUIs (run
  all the way to the end, then examine output) and interactive GUIs
  (parse XML incrementally, update display as we go).

* Put as much information as possible into the XML and let the GUIs
  decide what to show the user (a.k.a. provide mechanism, not policy).

* Make XML which is actually parseable by standard XML tools.


How to use
~~~~~~~~~~

Run with flag --xml=yes. That's all. Note, however, several
caveats.

* At the present time only Memcheck is supported. The scheme extends
  easily enough to cover Helgrind if needed.

* When XML output is selected, various other settings are made.
  This is in order that the output format is more controlled.
  The settings which are changed are:

  - Suppression generation is disabled, as that would require user
    input.

  - Attaching to GDB is disabled for the same reason.

  - The verbosity level is set to 1 (-v).

  - Error limits are disabled. Usually if the program generates a lot
    of errors, Valgrind slows down and eventually stops collecting
    them. When outputting XML this is not the case.

  - VEX emulation warnings are not shown.

  - File descriptor leak checking is disabled. This could be
    re-enabled at some future point.

  - Maximum-detail leak checking is selected (--leak-check=full).


The output format
~~~~~~~~~~~~~~~~~
For the most part this should be self-descriptive. It is printed in a
sort-of human-readable way for easy understanding. You may want to
read the rest of this together with the results of "valgrind --xml=yes
memcheck/tests/xml1" as an example.

All tags are balanced: a <foo> tag is always closed by </foo>. Hence
in the description that follows, mention of a tag <foo> implicitly
means there is a matching closing tag </foo>.

Symbols in CAPITALS are nonterminals in the grammar and are defined
somewhere below. The root nonterminal is TOPLEVEL.

The following nonterminals are not described further:
   INT   is a 64-bit signed decimal integer.
   TEXT  is arbitrary text.
   HEX64 is a 64-bit hexadecimal number, with leading "0x".

Text strings are escaped so as to remove the <, > and & characters
which would otherwise mess up parsing. They are replaced with the
standard encodings "&lt;", "&gt;" and "&amp;" respectively.
Note this is not (yet) done throughout, only for function names in
<frame>..</frame> tag-pairs.


TOPLEVEL
--------

The first line output is always this:

  <?xml version="1.0"?>

All remaining output is contained within the tag-pair
<valgrindoutput>.

Inside that, the first entity is an indication of the protocol
version. This is provided so that existing parsers can identify XML
created by future versions of Valgrind merely by observing that the
protocol version is one they don't understand. Hence TOPLEVEL is:

  <?xml version="1.0"?>
  <valgrindoutput>
    <protocolversion>INT</protocolversion>
    PROTOCOL
  </valgrindoutput>

Valgrind versions 3.0.0 and 3.0.1 emit protocol version 1. Versions
3.1.X and 3.2.X emit protocol version 2. 3.4.X emits protocol version
3.


PROTOCOL for version 3
----------------------
Changes in 3.4.X (tentative): (jrs, 1 March 2008)

* There may be more than one <logfilequalifier> clause.

* Some errors may have two <auxwhat> blocks, rather than just one
  (resulting from merge of the DATASYMS branch)

* Some errors may have an ORIGIN component, indicating the origins of
  uninitialised values. This results from the merge of the
  OTRACK_BY_INSTRUMENTATION branch.


PROTOCOL for version 2
----------------------
Version 2 is identical in every way to version 1, except that the time
string in

  <time>human-readable-time-string</time>

has changed format, and now gives elapsed wallclock time since process
start rather than local time or any such. In fact version 1 does not
define the format of the string so in some ways this revision is
irrelevant.


PROTOCOL for version 1
----------------------
This is the main top-level construction. Roughly speaking, it
contains a load of preamble, the errors from the run of the
program, and the result of the final leak check. Hence the
following in sequence:

* Various preamble lines which give version info for the various
  components.
  The text in them can be anything; it is not intended
  for interpretation by the GUI:

  <preamble>
     <line>Misc version/copyright text</line>  (zero or more of)
  </preamble>

* The PID of this process and of its parent:

  <pid>INT</pid>
  <ppid>INT</ppid>

* The name of the tool being used:

  <tool>TEXT</tool>

* OPTIONALLY, if --log-file-qualifier=VAR flag was given:

  <logfilequalifier> <var>VAR</var> <value>$VAR</value>
  </logfilequalifier>

  That is, both the name of the environment variable and its value
  are given.
  [update: as of v3.3.0, this is not present, as the --log-file-qualifier
  option has been removed, replaced by the %q format specifier in --log-file.]

* OPTIONALLY, if --xml-user-comment=STRING was given:

  <usercomment>STRING</usercomment>

  STRING is not escaped in any way, so that it itself may be a piece
  of XML with arbitrary tags etc.

* The program and args: first those pertaining to Valgrind itself, and
  then those pertaining to the program to be run under Valgrind (the
  client):

  <args>
    <vargv>
      <exe>TEXT</exe>
      <arg>TEXT</arg> (zero or more of)
    </vargv>
    <argv>
      <exe>TEXT</exe>
      <arg>TEXT</arg> (zero or more of)
    </argv>
  </args>

* The following, indicating that the program has now started:

  <status> <state>RUNNING</state>
           <time>human-readable-time-string</time>
  </status>

* Zero or more of (either ERROR or ERRORCOUNTS).

* The following, indicating that the program has now finished, and
  that the wrapup (leak checking) is happening.

  <status> <state>FINISHED</state>
           <time>human-readable-time-string</time>
  </status>

* SUPPCOUNTS, indicating how many times each suppression was used.

* Zero or more ERRORs, each of which is a complaint from the
  leak checker.

That's it.
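
As a rough illustration of the sequence above, the following Python
sketch reads the fixed header fields from a small hand-written
document. The sample XML, its values and the helper name read_header
are made up for illustration and are not real Valgrind output; only
the tag names come from the description above. It also assumes the
whole document is available at once (the "offline GUI" case).

```python
import xml.etree.ElementTree as ET

# Hand-written sample following the Protocol 1 layout; all values are invented.
SAMPLE = """<?xml version="1.0"?>
<valgrindoutput>
<protocolversion>1</protocolversion>
<preamble>
  <line>Misc version/copyright text</line>
</preamble>
<pid>1234</pid>
<ppid>1200</ppid>
<tool>memcheck</tool>
<args>
  <vargv>
    <exe>valgrind</exe>
    <arg>--xml=yes</arg>
  </vargv>
  <argv>
    <exe>./a.out</exe>
  </argv>
</args>
<status> <state>RUNNING</state>
         <time>human-readable-time-string</time>
</status>
<status> <state>FINISHED</state>
         <time>human-readable-time-string</time>
</status>
<suppcounts>
</suppcounts>
</valgrindoutput>
"""


def read_header(xml_text):
    """Collect the header fields of a Protocol 1-3 document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    if root.tag != "valgrindoutput":
        raise ValueError("not a valgrindoutput document")
    # Refuse protocol versions this sketch does not know about.
    version = int(root.findtext("protocolversion"))
    if version not in (1, 2, 3):
        raise ValueError("unsupported protocol version %d" % version)
    return {
        "protocol": version,
        "pid": int(root.findtext("pid")),
        "ppid": int(root.findtext("ppid")),
        "tool": root.findtext("tool"),
        # <exe> comes first, followed by zero or more <arg>.
        "valgrind_argv": [e.text for e in root.find("args/vargv")],
        "client_argv": [e.text for e in root.find("args/argv")],
        "states": [s.findtext("state") for s in root.findall("status")],
    }


header = read_header(SAMPLE)
print(header)
```

The preamble lines are deliberately ignored, since their text is not
intended for interpretation. The <time> strings are left unparsed
because version 1 does not define their format.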


ERROR
-----
This shows an error, and is the most complex nonterminal. The format
is as follows:

  <error>
     <unique>HEX64</unique>
     <tid>INT</tid>
     <kind>KIND</kind>
     <what>TEXT</what>

     optionally: <leakedbytes>INT</leakedbytes>
     optionally: <leakedblocks>INT</leakedblocks>

     STACK

     optionally: <auxwhat>TEXT</auxwhat>
     optionally: STACK
     optionally: ORIGIN

  </error>

* Each error contains a unique, arbitrary 64-bit hex number. This is
  used to refer to the error in ERRORCOUNTS nonterminals (see below).

* The <tid> tag indicates the Valgrind thread number. This value
  is arbitrary but may be used to determine which threads produced
  which errors (at least, the first instance of each error).

* The <kind> tag specifies one of a small number of fixed error
  types (enumerated below), so that GUIs may roughly categorise
  errors by type if they want.

* The <what> tag gives a human-understandable description of the
  error.

* For <kind> tags specifying a KIND of the form "Leak_*", the
  optional <leakedbytes> and <leakedblocks> indicate the number of
  bytes and blocks leaked by this error.

* The primary STACK for this error, indicating where it occurred.

* Some error types may have auxiliary information attached:

     <auxwhat>TEXT</auxwhat> gives an auxiliary human-readable
     description (usually of invalid addresses)

     STACK gives an auxiliary stack (usually the allocation/free
     point of a block). If this STACK is present then
     <auxwhat>TEXT</auxwhat> will precede it.


KIND
----
This is a small enumeration indicating roughly the nature of an error.
The possible values are:

   InvalidFree

      free/delete/delete[] on an invalid pointer

   MismatchedFree

      free/delete/delete[] does not match allocation function
      (eg doing new[] then free on the result)

   InvalidRead

      read of an invalid address

   InvalidWrite

      write of an invalid address

   InvalidJump

      jump to an invalid address

   Overlap

      args overlap or are otherwise bogus in eg memcpy

   InvalidMemPool

      invalid mem pool specified in client request

   UninitCondition

      conditional jump/move depends on undefined value

   UninitValue

      other use of undefined value (primarily memory addresses)

   SyscallParam

      system call params are undefined or point to
      undefined/unaddressable memory

   ClientCheck

      "error" resulting from a client check request

   Leak_DefinitelyLost

      memory leak; the referenced blocks are definitely lost

   Leak_IndirectlyLost

      memory leak; the referenced blocks are lost because all pointers
      to them are also in leaked blocks

   Leak_PossiblyLost

      memory leak; only interior pointers to referenced blocks were
      found

   Leak_StillReachable

      memory leak; pointers to un-freed blocks are still available


STACK
-----
STACK indicates locations in the program being debugged. A STACK
is one or more FRAMEs. The first is the innermost frame, the
next its caller, etc.

   <stack>
      one or more FRAME
   </stack>


FRAME
-----
FRAME records a single program location:

   <frame>
      <ip>HEX64</ip>
      optionally <obj>TEXT</obj>
      optionally <fn>TEXT</fn>
      optionally <dir>TEXT</dir>
      optionally <file>TEXT</file>
      optionally <line>INT</line>
   </frame>

Only the <ip> field is guaranteed to be present. It indicates a
code ("instruction pointer") address.

The optional fields, if present, appear in the order stated:

* obj: gives the name of the ELF object containing the code address

* fn: gives the name of the function containing the code address

* dir: gives the source directory associated with the name specified
       by <file>. Note the current implementation often does not
       put anything useful in this field.

* file: gives the name of the source file containing the code address

* line: gives the line number in the source file


ORIGIN
------
ORIGIN shows the origin of uninitialised data in errors that involve
uninitialised data. STACK shows the origin of the uninitialised
value. TEXT gives a human-understandable hint as to the meaning of
the information in STACK.

   <origin>
      <what>TEXT</what>
      STACK
   </origin>


ERRORCOUNTS
-----------
This specifies, for each error that has been so far presented,
the number of occurrences of that error.

  <errorcounts>
     zero or more of
        <pair> <count>INT</count> <unique>HEX64</unique> </pair>
  </errorcounts>

Each <pair> gives the current error count <count> for the error with
unique tag <unique>. The counts do not have to give a count for each
error so far presented - partial information is allowable.

As of Valgrind rev 3793, error counts are only emitted at program
termination. However, it is perfectly acceptable to periodically emit
error counts as the program is running. Doing so would let a GUI
dynamically update its error-count display as the program runs.


SUPPCOUNTS
----------
A SUPPCOUNTS block appears exactly once, after the program terminates.
It specifies the number of times each error-suppression was used.
Suppressions not mentioned were used zero times.

  <suppcounts>
     zero or more of
        <pair> <count>INT</count> <name>TEXT</name> </pair>
  </suppcounts>

The <name> is as specified in the suppression name fields in .supp
files.
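
To show how the ERROR, STACK, FRAME, ERRORCOUNTS and SUPPCOUNTS
nonterminals fit together, here is a minimal Python sketch that ties
error counts back to errors via their <unique> tags. The XML fragment,
the suppression name and the helper names parse_frame and summarise
are invented for illustration; a real GUI would more likely parse
incrementally rather than load the whole document.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment; addresses, names and counts are invented.
SAMPLE = """<valgrindoutput>
<error>
  <unique>0x1</unique>
  <tid>1</tid>
  <kind>InvalidRead</kind>
  <what>Invalid read of size 4</what>
  <stack>
    <frame>
      <ip>0x8048400</ip>
      <fn>main</fn>
      <file>demo.c</file>
      <line>12</line>
    </frame>
    <frame>
      <ip>0x8048500</ip>
    </frame>
  </stack>
</error>
<errorcounts>
  <pair> <count>3</count> <unique>0x1</unique> </pair>
</errorcounts>
<suppcounts>
  <pair> <count>2</count> <name>hypothetical-supp</name> </pair>
</suppcounts>
</valgrindoutput>
"""


def parse_frame(frame):
    """Turn one <frame> into a dict; only 'ip' is guaranteed to be present."""
    info = {"ip": int(frame.findtext("ip"), 16)}  # HEX64 with leading "0x"
    for tag in ("obj", "fn", "dir", "file"):
        text = frame.findtext(tag)
        if text is not None:
            info[tag] = text
    line = frame.findtext("line")
    if line is not None:
        info["line"] = int(line)
    return info


def summarise(xml_text):
    """Return (errors keyed by unique tag, suppression counts keyed by name)."""
    root = ET.fromstring(xml_text)
    errors = {}
    for err in root.findall("error"):
        # The first <stack> child is the primary STACK; later ones are auxiliary.
        primary = err.find("stack")
        errors[err.findtext("unique")] = {
            "kind": err.findtext("kind"),
            "what": err.findtext("what"),
            "frames": [parse_frame(f) for f in primary.findall("frame")],
            "count": None,  # unknown until an <errorcounts> pair mentions it
        }
    # Counts may be partial, so only update the errors that are mentioned.
    for pair in root.findall("errorcounts/pair"):
        unique = pair.findtext("unique")
        if unique in errors:
            errors[unique]["count"] = int(pair.findtext("count"))
    supps = {p.findtext("name"): int(p.findtext("count"))
             for p in root.findall("suppcounts/pair")}
    return errors, supps


errors, supps = summarise(SAMPLE)
print(errors, supps)
```

Keeping "count" as None until a pair mentions the error reflects the
rule that partial count information is allowable. Suppressions absent
from the resulting dict were used zero times.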