Note, 11 May 2009.  The XML format evolved over several versions,
as expected.  This file describes 3 different versions of the
format (called Protocols 1, 2 and 3 respectively).  As of 11 May 2009
a fourth version, Protocol 4, was defined, and that is described
in xml-output-protocol4.txt.

The original May 2005 introduction follows.  These comments are
correct up to and including Protocol 3, which was used in the Valgrind
3.4.x series.  However, Protocol 4 made more significant changes, both
to the format and to the flags Valgrind requires.

                       ----------------------

As of May 2005, Valgrind can produce its output in XML form.  The
intention is to provide an easily parsed, stable format which is
suitable for GUIs to read.


Design goals
~~~~~~~~~~~~

* Produce XML output which is easily parsed.

* Have a stable output format which does not change much over time, so
  that investments in parser-writing by GUI developers are not lost as
  new versions of Valgrind appear.

* Have an extensible output format, so that future changes to the
  format do not break backwards compatibility with existing parsers of
  it.

* Produce output in a form which is suitable for both offline GUIs (run
  all the way to the end, then examine output) and interactive GUIs
  (parse XML incrementally, update display as we go).

* Put as much information as possible into the XML and let the GUIs
  decide what to show the user (a.k.a. provide mechanism, not policy).

* Make XML which is actually parseable by standard XML tools.


How to use
~~~~~~~~~~

Run with the flag --xml=yes.  That's all.  Note, however, the
following caveats.

* At present only Memcheck is supported.  The scheme extends
  easily enough to cover Helgrind if needed.

* When XML output is selected, various other settings are changed
  so that the output format is more tightly controlled.  The settings
  which are changed are:

  - Suppression generation is disabled, as that would require user
    input.

  - Attaching to GDB is disabled for the same reason.

  - The verbosity level is set to 1 (-v).

  - Error limits are disabled.  Usually, if the program generates a lot
    of errors, Valgrind slows down and eventually stops collecting
    them.  When outputting XML this does not happen.

  - VEX emulation warnings are not shown.

  - File descriptor leak checking is disabled.  This could be
    re-enabled at some future point.

  - Maximum-detail leak checking is selected (--leak-check=full).


The output format
~~~~~~~~~~~~~~~~~
For the most part this should be self-descriptive.  It is printed in a
sort-of human-readable way for easy understanding.  You may want to
read the rest of this together with the output of "valgrind --xml=yes
memcheck/tests/xml1" as an example.

All tags are balanced: a <foo> tag is always closed by </foo>.  Hence
in the description that follows, mention of a tag <foo> implicitly
means there is a matching closing tag </foo>.

Symbols in CAPITALS are nonterminals in the grammar and are defined
somewhere below.  The root nonterminal is TOPLEVEL.

The following nonterminals are not described further:
   INT   is a 64-bit signed decimal integer.
   TEXT  is arbitrary text.
   HEX64 is a 64-bit hexadecimal number, with leading "0x".

Text strings are escaped so as to remove the <, > and & characters,
which would otherwise mess up parsing.  They are replaced with the
standard encodings "&lt;", "&gt;" and "&amp;" respectively.  Note
this is not (yet) done throughout, only for function names in
<frame>..</frame> tag pairs.
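
As an illustration, a C++ function name containing template brackets
must be escaped before it can appear inside <fn>.  The following is a
minimal Python sketch, not Valgrind's own code: it uses the standard
library's xml.sax.saxutils.escape, which performs exactly these three
substitutions by default, and shows that a standard XML parser decodes
them again.  The function name is a made-up example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Made-up C++ function name containing all three special characters.
fn_name = "std::vector<int>::push_back(int const&)"

# escape() replaces "&" first, then ">" and "<", so the ampersands it
# introduces in "&lt;" and "&gt;" are not themselves re-escaped.
escaped = escape(fn_name)
fragment = "<frame><ip>0x0</ip><fn>" + escaped + "</fn></frame>"

# A standard XML parser turns the entities back into the original text.
decoded = ET.fromstring(fragment).findtext("fn")

print(escaped)             # std::vector&lt;int&gt;::push_back(int const&amp;)
print(decoded == fn_name)  # True
```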


TOPLEVEL
--------

The first line output is always this:

   <?xml version="1.0"?>

All remaining output is contained within the tag pair
<valgrindoutput>.

Inside that, the first entity is an indication of the protocol
version.  This is provided so that existing parsers can identify XML
created by future versions of Valgrind merely by observing that the
protocol version is one they don't understand.  Hence TOPLEVEL is:

  <?xml version="1.0"?>
  <valgrindoutput>
    <protocolversion>INT</protocolversion>
    PROTOCOL
  </valgrindoutput>

Valgrind versions 3.0.0 and 3.0.1 emit protocol version 1.  Versions
3.1.X and 3.2.X emit protocol version 2.  3.4.X emits protocol version
3.
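
A parser can use <protocolversion> to refuse input it does not
understand, as described above.  A minimal Python sketch using the
standard library's xml.etree.ElementTree; the function name
protocol_version and the set of supported versions are illustrative
assumptions, not part of Valgrind.

```python
import xml.etree.ElementTree as ET

# Protocol versions this hypothetical parser understands (1 to 3 are the
# ones described in this file).
SUPPORTED_PROTOCOLS = {1, 2, 3}


def protocol_version(xml_text: str) -> int:
    """Return the protocol version of a Valgrind XML document.

    Raises ValueError if the root is not <valgrindoutput>, if there is
    no <protocolversion>, or if the version is not one we know.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "valgrindoutput":
        raise ValueError("not a <valgrindoutput> document")
    text = root.findtext("protocolversion")
    if text is None:
        raise ValueError("missing <protocolversion>")
    version = int(text.strip())
    if version not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"unsupported protocol version {version}")
    return version


sample = """<?xml version="1.0"?>
<valgrindoutput>
  <protocolversion>3</protocolversion>
</valgrindoutput>"""
print(protocol_version(sample))  # 3
```

This sketch parses the whole document first for simplicity; an
interactive GUI would more likely check the version as soon as the
<protocolversion> element arrives.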


PROTOCOL for version 3
----------------------
Changes in 3.4.X (tentative): (jrs, 1 March 2008)

* There may be more than one <logfilequalifier> clause.

* Some errors may have two <auxwhat> blocks, rather than just one
  (resulting from the merge of the DATASYMS branch).

* Some errors may have an ORIGIN component, indicating the origins of
  uninitialised values.  This results from the merge of the
  OTRACK_BY_INSTRUMENTATION branch.


PROTOCOL for version 2
----------------------
Version 2 is identical in every way to version 1, except that the time
string in

   <time>human-readable-time-string</time>

has a different format, and now gives the elapsed wallclock time since
process start, rather than local time.  In fact version 1 does not
define the format of the string, so in some ways this revision is
irrelevant.


PROTOCOL for version 1
----------------------
This is the main top-level construction.  Roughly speaking, it
contains a block of preamble, the errors from the run of the
program, and the result of the final leak check.  It consists of the
following, in sequence:

* Various preamble lines which give version info for the various
  components.  The text in them can be anything; it is not intended
  for interpretation by the GUI:

     <preamble>
        <line>Misc version/copyright text</line>  (zero or more of)
     </preamble>

* The PID of this process and of its parent:

     <pid>INT</pid>
     <ppid>INT</ppid>

* The name of the tool being used:

     <tool>TEXT</tool>

* OPTIONALLY, if the --log-file-qualifier=VAR flag was given:

     <logfilequalifier> <var>VAR</var> <value>$VAR</value>
     </logfilequalifier>

  That is, both the name of the environment variable and its value
  are given.
  [Update: as of v3.3.0, this is not present, as the --log-file-qualifier
  option has been removed, replaced by the %q format specifier in --log-file.]

* OPTIONALLY, if --xml-user-comment=STRING was given:

     <usercomment>STRING</usercomment>

  STRING is not escaped in any way, so it may itself be a piece
  of XML with arbitrary tags etc.

* The program and args: first those pertaining to Valgrind itself, and
  then those pertaining to the program to be run under Valgrind (the
  client):

     <args>
       <vargv>
         <exe>TEXT</exe>
         <arg>TEXT</arg> (zero or more of)
       </vargv>
       <argv>
         <exe>TEXT</exe>
         <arg>TEXT</arg> (zero or more of)
       </argv>
     </args>

* The following, indicating that the program has now started:

     <status> <state>RUNNING</state>
              <time>human-readable-time-string</time>
     </status>

* Zero or more items, each of which is either an ERROR or an
  ERRORCOUNTS.

* The following, indicating that the program has now finished, and
  that the wrap-up (leak checking) is happening:

     <status> <state>FINISHED</state>
              <time>human-readable-time-string</time>
     </status>

* SUPPCOUNTS, indicating how many times each suppression was used.

* Zero or more ERRORs, each of which is a complaint from the
  leak checker.

That's it.
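
The sequence above is what lets an interactive GUI follow a run as it
happens.  A minimal Python sketch, assuming the XML arrives in
arbitrary chunks (for instance from a pipe): it feeds them to the
standard library's xml.etree.ElementTree.XMLPullParser and reports a
few top-level items as soon as their closing tags have been seen.  The
function names and the trimmed sample document are hypothetical; the
sample omits <preamble>, <args> and SUPPCOUNTS for brevity, and its
<time> values are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, Tuple


def _report(parser: ET.XMLPullParser) -> Iterator[Tuple[str, str]]:
    """Yield (item, value) for the interesting elements completed so far."""
    for _event, elem in parser.read_events():
        if elem.tag in ("pid", "ppid", "tool"):
            yield (elem.tag, elem.text)
        elif elem.tag == "status":
            yield ("status", elem.findtext("state"))
        elif elem.tag == "error":
            yield ("error", elem.findtext("kind"))


def follow(chunks: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Parse Valgrind XML incrementally, one chunk of text at a time."""
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)
        yield from _report(parser)
    parser.close()
    yield from _report(parser)  # anything completed by the final close()


sample_chunks = [
    '<?xml version="1.0"?>\n<valgrindoutput>\n',
    "<protocolversion>3</protocolversion><pid>4242</pid><ppid>1</ppid>",
    "<tool>memcheck</tool><status><state>RUNNING</state>",
    "<time>placeholder</time></status><error><unique>0x1</unique>",
    "<tid>1</tid><kind>InvalidRead</kind><what>Invalid read</what>",
    "<stack><frame><ip>0x8048400</ip></frame></stack></error>",
    "<status><state>FINISHED</state><time>placeholder</time></status>",
    "</valgrindoutput>\n",
]
for item in follow(sample_chunks):
    print(item)
```

Filtering on "end" events means each item is reported only once its
closing tag has arrived, which is exactly when its contents are
complete.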


ERROR
-----
This shows an error, and is the most complex nonterminal.  The format
is as follows:

  <error>
     <unique>HEX64</unique>
     <tid>INT</tid>
     <kind>KIND</kind>
     <what>TEXT</what>

     optionally: <leakedbytes>INT</leakedbytes>
     optionally: <leakedblocks>INT</leakedblocks>

     STACK

     optionally: <auxwhat>TEXT</auxwhat>
     optionally: STACK
     optionally: ORIGIN

  </error>

* Each error contains a unique, arbitrary 64-bit hex number.  This is
  used to refer to the error in ERRORCOUNTS nonterminals (see below).

* The <tid> tag indicates the Valgrind thread number.  This value
  is arbitrary but may be used to determine which threads produced
  which errors (at least, the first instance of each error).

* The <kind> tag specifies one of a small number of fixed error
  types (enumerated below), so that GUIs may roughly categorise
  errors by type if they want.

* The <what> tag gives a human-understandable description of the
  error.

* For <kind> tags specifying a KIND of the form "Leak_*", the
  optional <leakedbytes> and <leakedblocks> indicate the number of
  bytes and blocks leaked by this error.

* The primary STACK for this error, indicating where it occurred.

* Some error types may have auxiliary information attached:

     <auxwhat>TEXT</auxwhat> gives an auxiliary human-readable
     description (usually of invalid addresses).

     STACK gives an auxiliary stack (usually the allocation/free
     point of a block).  If this STACK is present then
     <auxwhat>TEXT</auxwhat> will precede it.
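
For a GUI, the natural first step is to turn each <error> into a
record.  A minimal Python sketch with xml.etree.ElementTree; the Error
dataclass and parse_error function are hypothetical names, and frames
are reduced to their <ip> values to keep it short.  Only direct
children of <error> are inspected, so the <what> and STACK inside an
ORIGIN are not mistaken for the error's own.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Error:
    unique: int
    tid: int
    kind: str
    what: str
    leakedbytes: Optional[int] = None
    leakedblocks: Optional[int] = None
    auxwhat: List[str] = field(default_factory=list)
    # One list of <ip> strings per STACK; stacks[0] is the primary STACK.
    stacks: List[List[str]] = field(default_factory=list)


def _optional_int(elem: ET.Element, tag: str) -> Optional[int]:
    text = elem.findtext(tag)
    return int(text) if text is not None else None


def parse_error(elem: ET.Element) -> Error:
    """Convert one <error> element into an Error record."""
    return Error(
        unique=int(elem.findtext("unique"), 16),  # HEX64; "0x" prefix is accepted
        tid=int(elem.findtext("tid")),
        kind=elem.findtext("kind"),
        what=elem.findtext("what"),
        leakedbytes=_optional_int(elem, "leakedbytes"),
        leakedblocks=_optional_int(elem, "leakedblocks"),
        # findall() with a bare tag only matches direct children of <error>.
        auxwhat=[a.text or "" for a in elem.findall("auxwhat")],
        stacks=[[f.findtext("ip") for f in s.findall("frame")]
                for s in elem.findall("stack")],
    )


sample = ET.fromstring(
    "<error><unique>0x2a</unique><tid>1</tid><kind>InvalidWrite</kind>"
    "<what>Invalid write of size 4</what>"
    "<stack><frame><ip>0x400544</ip></frame></stack>"
    "<auxwhat>Address is just past the end of a block</auxwhat>"
    "<stack><frame><ip>0x4C2AB80</ip></frame>"
    "<frame><ip>0x400537</ip></frame></stack>"
    "</error>")
err = parse_error(sample)
print(err.kind, hex(err.unique), len(err.stacks))  # InvalidWrite 0x2a 2
```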


KIND
----
This is a small enumeration indicating roughly the nature of an error.
The possible values are:

   InvalidFree

      free/delete/delete[] on an invalid pointer

   MismatchedFree

      free/delete/delete[] does not match allocation function
      (e.g. doing new[] then free on the result)

   InvalidRead

      read of an invalid address

   InvalidWrite

      write of an invalid address

   InvalidJump

      jump to an invalid address

   Overlap

      args overlap or are otherwise bogus, e.g. in memcpy

   InvalidMemPool

      invalid mem pool specified in client request

   UninitCondition

      conditional jump/move depends on undefined value

   UninitValue

      other use of undefined value (primarily memory addresses)

   SyscallParam

      system call params are undefined or point to
      undefined/unaddressable memory

   ClientCheck

      "error" resulting from a client check request

   Leak_DefinitelyLost

      memory leak; the referenced blocks are definitely lost

   Leak_IndirectlyLost

      memory leak; the referenced blocks are lost because all pointers
      to them are also in leaked blocks

   Leak_PossiblyLost

      memory leak; only interior pointers to referenced blocks were
      found

   Leak_StillReachable

      memory leak; pointers to un-freed blocks are still available
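
A GUI that wants to group errors by type might mirror this list in
code.  A minimal Python sketch; the Kind enum, parse_kind and
leaked_bytes_by_kind are hypothetical names.  Unknown KIND values are
tolerated rather than rejected, in keeping with the goal that newer
Valgrind output should not break older parsers, and leaked bytes are
totalled for the Leak_* kinds only.

```python
from enum import Enum
from typing import Dict, Iterable, Optional, Tuple


class Kind(Enum):
    InvalidFree = "InvalidFree"
    MismatchedFree = "MismatchedFree"
    InvalidRead = "InvalidRead"
    InvalidWrite = "InvalidWrite"
    InvalidJump = "InvalidJump"
    Overlap = "Overlap"
    InvalidMemPool = "InvalidMemPool"
    UninitCondition = "UninitCondition"
    UninitValue = "UninitValue"
    SyscallParam = "SyscallParam"
    ClientCheck = "ClientCheck"
    Leak_DefinitelyLost = "Leak_DefinitelyLost"
    Leak_IndirectlyLost = "Leak_IndirectlyLost"
    Leak_PossiblyLost = "Leak_PossiblyLost"
    Leak_StillReachable = "Leak_StillReachable"

    @property
    def is_leak(self) -> bool:
        return self.value.startswith("Leak_")


def parse_kind(text: str) -> Optional[Kind]:
    """Map <kind> text to a Kind, or None for a value this code doesn't know."""
    try:
        return Kind(text)
    except ValueError:
        return None


def leaked_bytes_by_kind(
        errors: Iterable[Tuple[str, Optional[int]]]) -> Dict[Kind, int]:
    """Sum <leakedbytes> per leak kind from (kind text, leakedbytes) pairs."""
    totals: Dict[Kind, int] = {}
    for kind_text, leaked in errors:
        kind = parse_kind(kind_text)
        if kind is not None and kind.is_leak and leaked is not None:
            totals[kind] = totals.get(kind, 0) + leaked
    return totals


totals = leaked_bytes_by_kind([
    ("Leak_DefinitelyLost", 40),
    ("InvalidRead", None),
    ("Leak_DefinitelyLost", 8),
    ("Leak_PossiblyLost", 16),
    ("SomeFutureKind", 100),  # unknown kind: ignored
])
print({k.name: v for k, v in totals.items()})
# {'Leak_DefinitelyLost': 48, 'Leak_PossiblyLost': 16}
```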


STACK
-----
STACK indicates locations in the program being debugged.  A STACK
is one or more FRAMEs.  The first is the innermost frame, the
next its caller, etc.

   <stack>
      one or more FRAME
   </stack>


FRAME
-----
FRAME records a single program location:

   <frame>
      <ip>HEX64</ip>
      optionally <obj>TEXT</obj>
      optionally <fn>TEXT</fn>
      optionally <dir>TEXT</dir>
      optionally <file>TEXT</file>
      optionally <line>INT</line>
   </frame>

Only the <ip> field is guaranteed to be present.  It indicates a
code ("instruction pointer") address.

The optional fields, if present, appear in the order stated:

* obj: gives the name of the ELF object containing the code address

* fn: gives the name of the function containing the code address

* dir: gives the source directory associated with the name specified
       by <file>.  Note the current implementation often does not
       put anything useful in this field.

* file: gives the name of the source file containing the code address

* line: gives the line number in the source file
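
Because only <ip> is guaranteed, code that displays a STACK has to
cope with any subset of the optional fields.  A minimal Python sketch;
the Frame dataclass, parse_stack and describe are hypothetical names,
<dir> is stored as "directory" to avoid reusing a builtin name, and the
"???" placeholder for missing information is an arbitrary choice.  The
object and file names in the sample are made up.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    ip: int
    obj: Optional[str] = None
    fn: Optional[str] = None
    directory: Optional[str] = None  # the <dir> field
    file: Optional[str] = None
    line: Optional[int] = None


def parse_stack(stack: ET.Element) -> List[Frame]:
    """Return the frames of a <stack>, innermost first (document order)."""
    frames = []
    for f in stack.findall("frame"):
        line = f.findtext("line")
        frames.append(Frame(
            ip=int(f.findtext("ip"), 16),
            obj=f.findtext("obj"),
            fn=f.findtext("fn"),
            directory=f.findtext("dir"),
            file=f.findtext("file"),
            line=int(line) if line is not None else None,
        ))
    return frames


def describe(frame: Frame) -> str:
    """Format a frame, falling back to the object name without source info."""
    if frame.file is not None and frame.line is not None:
        where = f"{frame.file}:{frame.line}"
    else:
        where = frame.obj or "???"
    return f"0x{frame.ip:X}: {frame.fn or '???'} ({where})"


stack = ET.fromstring(
    "<stack>"
    "<frame><ip>0x400544</ip><obj>/tmp/a.out</obj><fn>main</fn>"
    "<file>a.c</file><line>7</line></frame>"
    "<frame><ip>0x4E5176D</ip><obj>/lib/libc.so.6</obj></frame>"
    "</stack>")
for depth, frame in enumerate(parse_stack(stack)):
    print(f"#{depth} {describe(frame)}")
# #0 0x400544: main (a.c:7)
# #1 0x4E5176D: ??? (/lib/libc.so.6)
```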


ORIGIN
------
ORIGIN shows the origin of uninitialised data in errors that involve
uninitialised data.  STACK shows where the uninitialised value
originated.  TEXT gives a human-understandable hint as to the meaning
of the information in STACK.

   <origin>
      <what>TEXT</what>
      STACK
   </origin>
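
Since ORIGIN has its own <what> and STACK, a parser should look inside
<origin> explicitly rather than searching the whole <error>.  A minimal
Python sketch; parse_origin is a hypothetical name and the sample
error's texts are invented.  It returns the hint text and the origin's
<ip> values, or None when an error has no ORIGIN, which is always the
case in Protocols 1 and 2.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def parse_origin(error: ET.Element) -> Optional[Tuple[str, List[str]]]:
    """Return (what, [ip, ...]) for the error's ORIGIN, or None if absent."""
    origin = error.find("origin")
    if origin is None:
        return None
    # "stack/frame" selects the <frame> children of the origin's <stack>.
    ips = [f.findtext("ip") for f in origin.findall("stack/frame")]
    return origin.findtext("what"), ips


error = ET.fromstring(
    "<error><unique>0x5</unique><tid>1</tid><kind>UninitCondition</kind>"
    "<what>Conditional jump depends on an uninitialised value</what>"
    "<stack><frame><ip>0x400510</ip></frame></stack>"
    "<origin><what>Uninitialised value was created by a stack allocation</what>"
    "<stack><frame><ip>0x4004F4</ip></frame></stack></origin>"
    "</error>")
print(parse_origin(error))
```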


ERRORCOUNTS
-----------
This specifies, for each error that has been presented so far,
the number of occurrences of that error.

  <errorcounts>
     zero or more of
        <pair> <count>INT</count> <unique>HEX64</unique> </pair>
  </errorcounts>

Each <pair> gives the current error count <count> for the error with
unique tag <unique>.  The counts do not have to give a count for each
error presented so far; partial information is allowable.

As at Valgrind rev 3793, error counts are only emitted at program
termination.  However, it is perfectly acceptable to periodically emit
error counts as the program is running.  Doing so would allow a GUI
to update its error-count display dynamically as the program runs.
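
Because a later <errorcounts> may mention only some errors, a GUI can
keep a running table and overwrite just the entries each block
supplies.  A minimal Python sketch; apply_errorcounts is a hypothetical
name, and keying the table by the integer value of <unique> is an
assumption.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def apply_errorcounts(counts: Dict[int, int], block: ET.Element) -> None:
    """Update counts (unique -> occurrences) in place from one <errorcounts>.

    Errors not mentioned in this block keep their previous count, since
    partial information is allowed.
    """
    for pair in block.findall("pair"):
        unique = int(pair.findtext("unique"), 16)
        counts[unique] = int(pair.findtext("count"))


counts: Dict[int, int] = {}
apply_errorcounts(counts, ET.fromstring(
    "<errorcounts>"
    "<pair><count>1</count><unique>0x1</unique></pair>"
    "<pair><count>3</count><unique>0x2</unique></pair>"
    "</errorcounts>"))
# A later, partial block that updates only error 0x2.
apply_errorcounts(counts, ET.fromstring(
    "<errorcounts><pair><count>5</count><unique>0x2</unique></pair></errorcounts>"))
print(counts)  # {1: 1, 2: 5}
```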


SUPPCOUNTS
----------
A SUPPCOUNTS block appears exactly once, after the program terminates.
It specifies the number of times each error-suppression was used.
Suppressions not mentioned were used zero times.

  <suppcounts>
     zero or more of
        <pair> <count>INT</count> <name>TEXT</name> </pair>
  </suppcounts>

The <name> is as specified in the suppression name fields in .supp
files.
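
Since unmentioned suppressions count as zero, Python's
collections.Counter is a convenient container: looking up a missing
name returns 0.  A minimal sketch; parse_suppcounts and the suppression
names in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def parse_suppcounts(block: ET.Element) -> Counter:
    """Map suppression name -> number of times it was used."""
    used: Counter = Counter()
    for pair in block.findall("pair"):
        used[pair.findtext("name")] = int(pair.findtext("count"))
    return used


block = ET.fromstring(
    "<suppcounts>"
    "<pair><count>12</count><name>my-libc-suppression</name></pair>"
    "<pair><count>2</count><name>my-x11-suppression</name></pair>"
    "</suppcounts>")
used = parse_suppcounts(block)
print(used["my-libc-suppression"], used["never-mentioned"])  # 12 0
```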