====================================================================

14 October 2011

Protocols 1 through 3 supported Memcheck only.  Protocol 4 provides
XML output for Memcheck, Helgrind, DRD and SGcheck.  Technically there
are four variants of Protocol 4, one for each tool, since they
produce different errors.  The four variants differ only in the
definition of the ERROR nonterminal and are otherwise identical.

NOTE that Protocol 4 (for the current svn trunk, which will eventually
become 3.7.x) is still under development.  The text herein should not
be regarded as the final definition.


Identification of Protocols
~~~~~~~~~~~~~~~~~~~~~~~~~~~

In Protocols 1 through 3, a <protocolversion>INT</protocolversion>
close to the start of the stream makes it possible for parsers to
ascertain the version, so they can tell whether or not they can handle
it.  The presence of support for multiple tools brings a complication,
though: it is not enough merely to state the protocol version -- the
tool name must also be stated.  Hence in Protocol 4, the
<protocolversion>INT</protocolversion> is followed immediately by
<protocoltool>TEXT</protocoltool>, to identify the tool.

This duplicates the tool name present later in the preamble, but it
was felt important to place the tool name right at the front along
with the protocol number, for easy determination of parseability.


How this specification is structured
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The TOPLEVEL nonterminal specifies top level XML output structure.  It
is common to all error producing tools.

TOPLEVEL references TOOLSPECIFICs for each tool, and these are defined
differently for each tool.  Each TOOLSPECIFIC is an error, which is
tool-specific.  For Helgrind and DRD, a TOOLSPECIFIC may also be a
so-called thread-announcement record (described below).

Overall there is a very high degree of format commonality between the
four tools.  Once a GUI is able to display the output correctly for
one tool, it should be easy to extend it for the others.


Protocol 4 changes for Memcheck
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Protocol 4 for Memcheck is similar to Protocol 3, but has a number
of changes to make it fit in the common framework:

- the SUPPCOUNTS nonterminal now appears after the "Zero or more
  ERRORs" block, and not before it.

- the abovementioned "Zero or more ERRORs" block now becomes
  "Zero or more of (either ERROR or ERRORCOUNTS)".

- ERRORs for Memcheck may contain a SUPPRESSION field, which gives
  the corresponding suppression for it.

- ERRORs for Memcheck now use the XWHAT and XAUXWHAT nonterminals, as
  well as WHAT and AUXWHAT.

- The ad-hoc blocks <leakedbytes> and <leakedblocks> used by Memcheck
  have been moved inside the XWHAT for the relevant error kinds.  This
  facilitates a common definition of ERROR across all the tools.

The first two changes are required in order to correct a longstanding
design flaw in the way Memcheck interacts with Valgrind's error
management mechanism.  See bug #186790
(https://bugs.kde.org/show_bug.cgi?id=186790).  The third change was
requested in #191189 (https://bugs.kde.org/show_bug.cgi?id=191189).

For GUI authors upgrading from Protocol 3 or earlier, the most
significant new concept to grasp is the relationship between WHAT and
XWHAT, and between AUXWHAT and XAUXWHAT.

The definition of Protocol 4 now follows.  It is structured similarly
to that of the previous protocols, except that there is a separate
definition of a nonterminal called TOOLSPECIFIC for each of Memcheck,
Helgrind, DRD and SGcheck.  The XWHAT and XAUXWHAT nonterminals also
have tool-specific components.  Apart from that, the structure is
common to all supported tools.


====================================================================

TOPLEVEL
--------

The first line output is always this:

   <?xml version="1.0"?>

All remaining output is contained within the tag-pair
<valgrindoutput>.

Inside that, the first entity is an indication of the protocol
version.  This is provided so that existing parsers can identify XML
created by future versions of Valgrind merely by observing that the
protocol version is one they don't understand.  Hence TOPLEVEL is:

  <?xml version="1.0"?>
  <valgrindoutput>
    <protocolversion>INT</protocolversion>
    <protocoltool>TEXT</protocoltool>
    PROTOCOL
  </valgrindoutput>

Valgrind versions 3.0.0 and 3.0.1 emit protocol version 1.  Versions
3.1.X and 3.2.X [and 3.3.X ??] emit protocol version 2.  3.4.X emits
protocol version 3.  3.5.X emits version 4.

The TEXT in <protocoltool> is either "memcheck", "helgrind", "drd" or
"exp-ptrcheck" and determines the allowed format of the ERROR
nonterminal.  Note that <protocoltool> is only present when the
protocol version is 4 or above.

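As a minimal sketch of the parseability check described above, a
parser might read the two identification fields before committing to
the rest of the stream.  The function name identify and the SUPPORTED
table are illustrative assumptions, not part of the protocol, and the
sketch assumes the whole document is available as a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: protocol versions and tools this parser understands.
SUPPORTED = {4: {"memcheck", "helgrind", "drd", "exp-ptrcheck"}}


def identify(xml_text):
    """Return (version, tool, parseable) for a <valgrindoutput> document."""
    root = ET.fromstring(xml_text)
    version_text = root.findtext("protocolversion")
    if root.tag != "valgrindoutput" or version_text is None:
        return (None, None, False)
    version = int(version_text)
    # <protocoltool> is only present for protocol version 4 or above.
    tool = root.findtext("protocoltool")
    parseable = tool in SUPPORTED.get(version, set())
    return (version, tool, parseable)


SAMPLE = """<?xml version="1.0"?>
<valgrindoutput>
  <protocolversion>4</protocolversion>
  <protocoltool>memcheck</protocoltool>
</valgrindoutput>"""

print(identify(SAMPLE))
```

A stream with an unknown version or tool is reported as not parseable
rather than raising, so a GUI can decline it cleanly.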

PROTOCOL for version 4
----------------------

This is the main top-level construction.  Roughly speaking, it
contains a preamble, a program-started marker, the errors from the run
of the program, a program-ended marker, and any further errors
resulting from post-run analysis (eg, memory leak detection).  Hence
the following in sequence:

* Various preamble lines which give version info for the various
  components.  The text in them can be anything; it is not intended
  for interpretation by the GUI:

     <preamble>
        <line>Misc version/copyright text</line>  (zero or more of)
     </preamble>

* The PID of this process and of its parent:

     <pid>INT</pid>
     <ppid>INT</ppid>

* The name of the tool being used:

     <tool>TEXT</tool>

  This can be anything, and it doesn't have to match the
  <protocoltool> entry, although that might be wise.

* Zero or more bindings of environment variable names to actual
  values.  These describe precisely the instantiations of %q format
  specifiers used in the --xml-file= argument for the run, if any.
  There is one <logfilequalifier> entry for each %q expanded:

     <logfilequalifier> <var>VAR</var> <value>$VAR</value>
     </logfilequalifier>

* OPTIONALLY, if --xml-user-comment=STRING was given:

     <usercomment>STRING</usercomment>

  STRING is not escaped in any way, so that it itself may be a piece
  of XML with arbitrary tags etc.

* The program and args: first those pertaining to Valgrind itself, and
  then those pertaining to the program to be run under Valgrind (the
  client):

     <args>
       <vargv>
         <exe>TEXT</exe>
         <arg>TEXT</arg> (zero or more of)
       </vargv>
       <argv>
         <exe>TEXT</exe>
         <arg>TEXT</arg> (zero or more of)
       </argv>
     </args>

* The following, indicating that the program has now started:

     <status> <state>RUNNING</state>
              <time>human-readable-time-string</time>
     </status>

  The format of this string is not defined, but it is expected to be
  human-understandable.  In current Valgrind versions it is the
  elapsed wallclock time since process start.

* Zero or more of (either ERRORCOUNTS or TOOLSPECIFIC).

* The following, indicating that the program has now finished, and
  that any final wrapup (eg, for Memcheck, leak checking) is happening:

     <status> <state>FINISHED</state>
              <time>human-readable-time-string</time>
     </status>

* Zero or more of (either ERRORCOUNTS or TOOLSPECIFIC).  In Memcheck's
  case these will be complaints from the leak checker.  For SGcheck
  and Helgrind we don't expect any output here (but the spec does not
  guarantee that either).

* SUPPCOUNTS, indicating how many times each suppression was used.


That's it.  The tool-specific definitions for TOOLSPECIFIC are below;
however let's first continue with some smaller nonterminals used in
the construction of errors for all the tool types.

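The sequence above can be sketched as a single pass over the children
of <valgrindoutput>, using the two <status> markers to tell run-time
items from post-run items.  This is only an illustration: the function
name split_by_phase and the sample document are assumptions, and a real
consumer would probably process the stream incrementally rather than
load it whole.

```python
import xml.etree.ElementTree as ET


def split_by_phase(root):
    """Group ERRORCOUNTS/TOOLSPECIFIC element tags by run phase."""
    phases = {"RUNNING": [], "FINISHED": []}
    current = None  # still in the preamble until the first <status>
    for child in root:
        if child.tag == "status":
            current = (child.findtext("state") or "").strip()
        elif child.tag == "suppcounts":
            break  # SUPPCOUNTS closes the sequence
        elif current is not None:
            phases.setdefault(current, []).append(child.tag)
    return phases


# Made-up sample; the <time> strings have no defined format.
SAMPLE = ET.fromstring("""<valgrindoutput>
  <protocolversion>4</protocolversion>
  <protocoltool>memcheck</protocoltool>
  <preamble><line>text</line></preamble>
  <pid>1</pid><ppid>2</ppid><tool>memcheck</tool>
  <args><vargv><exe>valgrind</exe></vargv><argv><exe>./a.out</exe></argv></args>
  <status><state>RUNNING</state><time>t0</time></status>
  <error/>
  <status><state>FINISHED</state><time>t1</time></status>
  <error/>
  <errorcounts/>
  <suppcounts/>
</valgrindoutput>""")

print(split_by_phase(SAMPLE))
```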

====================================================================

Nonterminals used in construction of ERRORs
-------------------------------------------

STACK
-----
STACK indicates locations in the program being debugged.  A STACK
is one or more FRAMEs.  The first is the innermost frame, the
next its caller, etc.

   <stack>
      one or more FRAME
   </stack>


FRAME
-----
FRAME records a single program location:

   <frame>
      <ip>HEX64</ip>
      optionally <obj>TEXT</obj>
      optionally <fn>TEXT</fn>
      optionally <dir>TEXT</dir>
      optionally <file>TEXT</file>
      optionally <line>INT</line>
   </frame>

Only the <ip> field is guaranteed to be present.  It indicates a
code ("instruction pointer") address.

The optional fields, if present, appear in the order stated:

* obj: gives the name of the ELF object containing the code address

* fn: gives the name of the function containing the code address

* dir: gives the source directory associated with the name specified
       by <file>.  Note the current implementation often does not
       put anything useful in this field.

* file: gives the name of the source file containing the code address

* line: gives the line number in the source file

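A consumer might model FRAME and STACK roughly as follows.  This is a
sketch only: the Frame class and the parse_frame and parse_stack
helpers are hypothetical names, the sample addresses are made up, and
HEX64 values are assumed to be hexadecimal strings with an optional
"0x" prefix.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    ip: int                      # the only field guaranteed to be present
    obj: Optional[str] = None
    fn: Optional[str] = None
    dir: Optional[str] = None
    file: Optional[str] = None
    line: Optional[int] = None


def parse_frame(elem) -> Frame:
    line = elem.findtext("line")
    return Frame(
        ip=int(elem.findtext("ip"), 16),  # base 16 accepts a "0x" prefix
        obj=elem.findtext("obj"),
        fn=elem.findtext("fn"),
        dir=elem.findtext("dir"),
        file=elem.findtext("file"),
        line=int(line) if line is not None else None,
    )


def parse_stack(elem) -> List[Frame]:
    # Document order is innermost frame first, then its caller, and so on.
    return [parse_frame(f) for f in elem.findall("frame")]


STACK = ET.fromstring("""<stack>
  <frame><ip>0x4005D4</ip><fn>leaf</fn><file>a.c</file><line>7</line></frame>
  <frame><ip>0x400610</ip></frame>
</stack>""")
frames = parse_stack(STACK)
print(frames[0].fn, frames[0].line, hex(frames[1].ip))
```

Every optional field defaults to None, so a frame carrying only <ip>
is still representable.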

ERRORCOUNTS
-----------
This specifies, for each error that has been so far presented,
the number of occurrences of that error.

  <errorcounts>
     zero or more of
        <pair> <count>INT</count> <unique>HEX64</unique> </pair>
  </errorcounts>

Each <pair> gives the current error count <count> for the error with
unique tag <unique>.  The counts do not have to give a count for each
error so far presented - partial information is allowable.

As of Valgrind rev 3793, error counts are only emitted at program
termination.  However, it is perfectly acceptable to periodically emit
error counts as the program is running.  Doing so would allow a GUI
to dynamically update its error-count display as the program runs.

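Because an ERRORCOUNTS block may carry only partial information, a GUI
would typically merge each block into a running table instead of
replacing the table.  A minimal sketch, in which the function name
apply_error_counts and the choice of keying on the integer value of
<unique> are assumptions:

```python
import xml.etree.ElementTree as ET


def apply_error_counts(counts, errorcounts_elem):
    """Merge one <errorcounts> block into a running {unique: count} map."""
    for pair in errorcounts_elem.findall("pair"):
        unique = int(pair.findtext("unique"), 16)
        counts[unique] = int(pair.findtext("count"))
    return counts


counts = {0x1: 1, 0x2: 1}   # errors already seen, one occurrence each
block = ET.fromstring(
    "<errorcounts>"
    "<pair> <count>5</count> <unique>0x2</unique> </pair>"
    "</errorcounts>"
)
apply_error_counts(counts, block)
print(counts)
```

Errors not mentioned in a block keep their previous count.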

SUPPCOUNTS
----------
A SUPPCOUNTS block appears exactly once, after the program terminates.
It specifies the number of times each error-suppression was used.
Suppressions not mentioned were used zero times.

  <suppcounts>
     zero or more of
        <pair> <count>INT</count> <name>TEXT</name> </pair>
  </suppcounts>

The <name> is as specified in the suppression name fields in .supp
files.


SUPPRESSION
-----------
These are optionally emitted as part of ERRORs, and specify the
suppression that would be needed to suppress the containing error.
For convenience, the suppression is presented twice, once in
a structured form nicely wrapped up in tags, and once as raw text
suitable for direct copying and pasting into a suppressions file.

  <suppression>
    <sname>TEXT</sname>    name of the suppression
    <skind>TEXT</skind>    kind, eg                 "Memcheck:Param"
    <skaux>TEXT</skaux>    (optional) aux kind, eg  "write(buf)"
    SFRAME                 (one or more) frames
    <rawtext> CDATAS </rawtext>
  </suppression>

where CDATAS is a sequence of one or more <![CDATA[ .. ]]> blocks
holding the raw text.  Unfortunately, CDATA provides no way to escape
the ending marker "]]>", which means that if the raw data contains
such a sequence, it has to be split between two CDATA blocks, one
ending with data "]]" and the other beginning with data ">".  This is
why the spec calls for one or more CDATA blocks rather than exactly
one.

Note that, so far, we cannot envisage a circumstance in which a
generated suppression would contain the string "]]>", since neither
"]" nor ">" appear to turn up in mangled symbol names.  Hence it is
not envisaged that there will ever be more than one CDATA block, and
indeed the implementation as of Valgrind 3.5.0 will only ever generate
one block (it ignores any possible escaping problems).  Nevertheless
the specification allows multiple blocks, as a matter of safety.

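The splitting rule can be illustrated with a short sketch.  It is not
how Valgrind generates its output (the text above notes that the
implementation emits a single block); the function name
to_cdata_blocks and the sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET


def to_cdata_blocks(raw):
    """Wrap raw text in one or more CDATA blocks, splitting every "]]>"."""
    # Each "]]>" is cut so that "]]" ends one block and ">" begins the next.
    return "<![CDATA[" + raw.replace("]]>", "]]]]><![CDATA[>") + "]]>"


raw = "fun:odd]]>name"
xml_text = "<rawtext>" + to_cdata_blocks(raw) + "</rawtext>"
# An XML parser joins adjacent CDATA blocks back into the original text.
assert ET.fromstring(xml_text).text == raw
print(xml_text)
```

On the reading side nothing special is needed: a conforming XML parser
delivers the concatenated character data of all the blocks.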

SFRAME
------
Either

  <sframe> <obj>TEXT</obj> </sframe>

eg denoting "obj:/usr/X11R6/lib*/libX11.so.6.2", or

  <sframe> <fun>TEXT</fun> </sframe>

eg denoting "fun:*libc_write"


WHAT and XWHAT
--------------

WHAT supplies a single line of text, which is a human-understandable,
primary description of an error.

XWHAT is an extended version of WHAT.  It also contains a piece of
text intended for human reading, but in addition may contain arbitrary
other tagged data.  This extra data is tool-specific.  One of its
purposes is to supply GUIs with links to other data in the sequence of
TOOLSPECIFICs, that are associated with the error.  Another purpose is
to wrap certain quantities (numbers, file names, etc) embedded in the
message, so that the GUIs can get hold of them without having to parse
the text itself.

For example, we could get:

  <what>Possible data race on address 0x12345678</what>

or alternatively

  <xwhat>
     <text>Possible data race by thread #17 on address 0x12345678</text>
     <threadid>17</threadid>
  </xwhat>

And presumably the <threadid>17</threadid> refers to some previously
emitted entity in the stream of TOOLSPECIFICs for this tool.

In an XWHAT, the <text> tag-pair is mandatory.  GUIs which don't want
to handle the extra fields can just ignore them and display the text
part.  In this way they have the option to present at least something
useful to the user even in the case where the extra fields can't be
handled, for whatever reason.

A corollary of this is that the degenerate extended case

   <xwhat> <text>T</text> </xwhat>

is exactly equivalent to

   <what>T</what>


AUXWHAT and XAUXWHAT
--------------------

AUXWHAT is exactly like WHAT: a single line of text.  It provides
additional, secondary description of an error, that should be shown to
the user.

XAUXWHAT relates to AUXWHAT in the same way XWHAT relates to WHAT: it
wraps up extra tagged info along with the line of text that would be
in the AUXWHAT.

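One way a GUI might treat the plain and extended forms uniformly is
sketched below.  The function name describe is hypothetical; the
extra fields are returned as a list of (tag, text) pairs because a
tag may appear more than once.

```python
import xml.etree.ElementTree as ET


def describe(elem):
    """Return (text, extras) for WHAT, XWHAT, AUXWHAT or XAUXWHAT."""
    if elem.tag in ("what", "auxwhat"):
        return (elem.text or "", [])
    if elem.tag in ("xwhat", "xauxwhat"):
        # <text> is mandatory; everything else is tool-specific extra data.
        extras = [(c.tag, c.text) for c in elem if c.tag != "text"]
        return (elem.findtext("text"), extras)
    raise ValueError("not a description element: " + elem.tag)


plain = ET.fromstring("<what>Possible data race on address 0x12345678</what>")
extended = ET.fromstring(
    "<xwhat>"
    "<text>Possible data race by thread #17 on address 0x12345678</text>"
    "<threadid>17</threadid>"
    "</xwhat>"
)
print(describe(plain))
print(describe(extended))
```

A GUI that cannot handle the extras can simply display the first
element of the returned pair.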

====================================================================

ERROR definition -- common structure
------------------------------------

ERROR defines an error, and is the most complex nonterminal.  For all
of the tools, the structure is common, and always conforms to the
following:

  <error>
     <unique>HEX64</unique>
     <tid>INT</tid>
     <kind>KIND</kind>

     (either WHAT or XWHAT)
     optionally: (either WHAT or XWHAT)

     STACK

     zero or more: (either AUXWHAT or XAUXWHAT or STACK)

     optionally: SUPPRESSION
  </error>


* Each error contains a unique, arbitrary 64-bit hex number.  This is
  used to refer to the error in ERRORCOUNTS nonterminals (see above).

* The <tid> tag indicates the Valgrind thread number.  This value
  is arbitrary but may be used to determine which threads produced
  which errors (at least, the first instance of each error).

* The <kind> tag specifies one of a small number of fixed error types,
  so that GUIs may roughly categorise errors by type if they want.
  The tags themselves are tool-specific and are defined further
  below, for each tool.

* The "(either WHAT or XWHAT)" gives a primary description of the
  error.  WHAT and XWHAT are defined earlier in this file.  Any XWHATs
  appearing here may contain tool-specific subcomponents.

* Optionally, a second line of primary description may be present.

* A STACK gives the primary source location for the error.

* There then follow zero or more of "(either AUXWHAT or XAUXWHAT or
  STACK)".  These give further (auxiliary) information about the
  error, possibly including stack traces.  They should be shown to the
  user in the order they appear.  AUXWHAT and XAUXWHAT are defined
  earlier in this file.  Any XAUXWHATs appearing here may contain
  tool-specific subcomponents.

* Optionally, as the last field, a SUPPRESSION may be provided.  This
  contains a suppression that would hide the error.

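The common structure can be walked in document order, as in the
following sketch.  The function name parse_error, the dictionary
layout and the sample error are all assumptions made for illustration;
the first STACK encountered is taken as the primary one and any later
STACK is treated as auxiliary.

```python
import xml.etree.ElementTree as ET


def parse_error(elem):
    """Split an <error> into its common parts, preserving document order."""
    err = {
        "unique": int(elem.findtext("unique"), 16),
        "tid": int(elem.findtext("tid")),
        "kind": elem.findtext("kind"),
        "primary": [],        # one or two WHAT/XWHAT elements
        "stack": None,        # the primary STACK
        "aux": [],            # AUXWHAT/XAUXWHAT/STACK elements, in order
        "suppression": None,  # optional, last field
    }
    for child in elem:
        if child.tag in ("what", "xwhat"):
            err["primary"].append(child)
        elif child.tag == "stack" and err["stack"] is None:
            err["stack"] = child
        elif child.tag in ("auxwhat", "xauxwhat", "stack"):
            err["aux"].append(child)
        elif child.tag == "suppression":
            err["suppression"] = child
    return err


# Made-up sample error, not real tool output.
ERROR = ET.fromstring("""<error>
  <unique>0x1a</unique>
  <tid>1</tid>
  <kind>InvalidRead</kind>
  <what>primary description</what>
  <stack><frame><ip>0x4005D4</ip></frame></stack>
  <auxwhat>secondary description</auxwhat>
  <stack><frame><ip>0x400610</ip></frame></stack>
</error>""")
parsed = parse_error(ERROR)
print(parsed["kind"], [c.tag for c in parsed["aux"]])
```

Keeping the auxiliary items in one ordered list preserves the
requirement that they be shown in the order they appear.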

====================================================================

TOOLSPECIFIC definition for Memcheck
------------------------------------

For Memcheck, a TOOLSPECIFIC is simply an ERROR:

TOOLSPECIFIC = ERROR


ERROR details for Memcheck
--------------------------

XWHATs (for definition, see above) may contain the following extra
components (along with the mandatory <text>...</text> component):

* <leakedbytes>INT</leakedbytes>

* <leakedblocks>INT</leakedblocks>

These fields are used in errors that have a <kind> tag specifying a
KIND of the form "Leak_*", to indicate the number of leaked bytes and
blocks.

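Since the leak figures are tagged, a GUI can total them without
parsing the message text.  A rough sketch follows; the function name
leak_totals and the sample numbers are invented for illustration, and
missing fields are assumed to count as zero.

```python
import xml.etree.ElementTree as ET


def leak_totals(root):
    """Sum leaked bytes and blocks per Leak_* kind."""
    totals = {}
    for err in root.iter("error"):
        kind = err.findtext("kind", "")
        xwhat = err.find("xwhat")
        if not kind.startswith("Leak_") or xwhat is None:
            continue
        nbytes, nblocks = totals.get(kind, (0, 0))
        nbytes += int(xwhat.findtext("leakedbytes", "0"))
        nblocks += int(xwhat.findtext("leakedblocks", "0"))
        totals[kind] = (nbytes, nblocks)
    return totals


# Made-up sample; only the fields used by the sketch are present.
SAMPLE = ET.fromstring("""<valgrindoutput>
  <error><kind>InvalidRead</kind><what>ignored</what></error>
  <error><kind>Leak_DefinitelyLost</kind>
    <xwhat><text>t</text><leakedbytes>40</leakedbytes>
           <leakedblocks>1</leakedblocks></xwhat></error>
  <error><kind>Leak_DefinitelyLost</kind>
    <xwhat><text>t</text><leakedbytes>8</leakedbytes>
           <leakedblocks>2</leakedblocks></xwhat></error>
</valgrindoutput>""")
print(leak_totals(SAMPLE))
```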

XAUXWHATs (for definition, see above) may contain the following extra
components (along with the mandatory <text>...</text> component):

* <file>TEXT</file>, as defined in FRAME

* <line>INT</line>, as defined in FRAME

* <dir>TEXT</dir>, as defined in FRAME


KIND for Memcheck
-----------------

This is a small enumeration indicating roughly the nature of an error.
The possible values are:

   InvalidFree

      free/delete/delete[] on an invalid pointer

   MismatchedFree

      free/delete/delete[] does not match allocation function
      (eg doing new[] then free on the result)

   InvalidRead

      read of an invalid address

   InvalidWrite

      write of an invalid address

   InvalidJump

      jump to an invalid address

   Overlap

      args overlap or are otherwise bogus in eg memcpy

   InvalidMemPool

      invalid mem pool specified in client request

   UninitCondition

      conditional jump/move depends on undefined value

   UninitValue

      other use of undefined value (primarily memory addresses)

   SyscallParam

      system call params are undefined or point to
      undefined/unaddressable memory

   ClientCheck

      "error" resulting from a client check request

   Leak_DefinitelyLost

      memory leak; the referenced blocks are definitely lost

   Leak_IndirectlyLost

      memory leak; the referenced blocks are lost because all pointers
      to them are also in leaked blocks

   Leak_PossiblyLost

      memory leak; only interior pointers to referenced blocks were
      found

   Leak_StillReachable

      memory leak; pointers to un-freed blocks are still available


====================================================================

TOOLSPECIFIC definition for SGcheck
-----------------------------------

For SGcheck, a TOOLSPECIFIC is simply an ERROR:

TOOLSPECIFIC = ERROR


ERROR details for SGcheck
-------------------------

SGcheck does not produce any XWHAT records, despite the fact that
"ERROR definition -- common structure" says that tools may do so.


XAUXWHATs (for definition, see above) may contain the following extra
components (along with the mandatory <text>...</text> component):

* <file>TEXT</file>, as defined in FRAME

* <line>INT</line>, as defined in FRAME

* <dir>TEXT</dir>, as defined in FRAME


KIND for SGcheck
----------------
This is a small enumeration indicating roughly the nature of an error.
The possible values are:

   SorG

      Stack or global array inconsistency (roughly speaking, an
      overrun of a stack or global array).  The <auxwhat> blocks give
      further details.


====================================================================

TOOLSPECIFIC definition for Helgrind
------------------------------------

For Helgrind, a TOOLSPECIFIC may be one of two things:

TOOLSPECIFIC = either ERROR or ANNOUNCETHREAD


ANNOUNCETHREAD
--------------

The definition is

   <announcethread>
      <hthreadid>INT</hthreadid>
      STACK
   </announcethread>

This states the creation point of a thread, and gives it a unique
"hthreadid", which may be referred to in subsequent ERRORs.  Note that

1. The appearance of ANNOUNCETHREAD does not mean that the thread was
   actually created at that point relative to any preceding or
   following ERRORs in the output stream -- in general the thread will
   have been created arbitrarily earlier.  Helgrind only "announces" a
   thread when it needs to refer to it for the first time, in a
   subsequent ERROR.

2. The "hthreadid" is a number which uniquely identifies the thread
   for the run - no other thread will have the same hthreadid.  The
   hthreadid is a Helgrind-specific piece of information and is
   unrelated to the <tid> fields in the common part of an ERROR.
   Be careful not to confuse the two.


ERROR details for Helgrind
--------------------------

XWHATs (for definition, see above) may contain the following extra
components (along with the mandatory <text>...</text> component):

* <hthreadid>INT</hthreadid> fields.  These refer to ANNOUNCETHREADs
  appearing previously in the stream, and state the creation points of
  the thread(s) concerned in the ERROR.  Hence it should be possible
  for GUIs to show users stacks of the creation points of all threads
  involved in each ERROR.

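A GUI could resolve these references by remembering each
ANNOUNCETHREAD as it goes by, as in this sketch.  The function name
link_threads and the sample stream are illustrative assumptions; only
<hthreadid> fields inside XWHATs are followed here.

```python
import xml.etree.ElementTree as ET


def link_threads(root):
    """Pair each ERROR with the creation stacks of the threads it names."""
    announced = {}   # hthreadid -> <stack> of the thread's creation point
    linked = []
    for item in root:
        if item.tag == "announcethread":
            announced[item.findtext("hthreadid")] = item.find("stack")
        elif item.tag == "error":
            ids = [h.text
                   for x in item.findall("xwhat")
                   for h in x.findall("hthreadid")]
            # An id with no earlier announcement resolves to None.
            linked.append((item, [announced.get(i) for i in ids]))
    return linked


# Made-up sample stream, trimmed to the relevant elements.
SAMPLE = ET.fromstring("""<valgrindoutput>
  <announcethread>
    <hthreadid>1</hthreadid>
    <stack><frame><ip>0x4005D4</ip></frame></stack>
  </announcethread>
  <error>
    <unique>0x0</unique><tid>1</tid><kind>Race</kind>
    <xwhat><text>Possible data race by thread #1</text>
           <hthreadid>1</hthreadid></xwhat>
    <stack><frame><ip>0x400610</ip></frame></stack>
  </error>
</valgrindoutput>""")
for err, stacks in link_threads(SAMPLE):
    ips = [s.findtext("frame/ip") for s in stacks if s is not None]
    print(err.findtext("kind"), ips)
```

The lookup is keyed on hthreadid, never on <tid>, since the two are
unrelated.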

XAUXWHATs (for definition, see above) may contain the following extra
components (along with the mandatory <text>...</text> component):

* <hthreadid>INT</hthreadid>, same meaning as when referred to in
  XWHAT

* <file>TEXT</file>, as defined in FRAME

* <line>INT</line>, as defined in FRAME

* <dir>TEXT</dir>, as defined in FRAME


KIND for Helgrind
-----------------
This is a small enumeration indicating roughly the nature of an error.
The possible values are:

   Race

      Data race.  Helgrind will try to show the stacks for both
      conflicting accesses if it can; it will always show the stack
      for at least one of them.

   UnlockUnlocked

      Unlocking a not-locked lock

   UnlockForeign

      Unlocking a lock held by some other thread

   UnlockBogus

      Unlocking an address which is not known to be a lock

   PthAPIerror

      One of the POSIX pthread_ functions that are intercepted
      by Helgrind failed with an error code.  Usually indicates
      something bad happening.

   LockOrder

      An inconsistency in the acquisition order of locks was observed;
      dangerous, as it can potentially lead to deadlocks

   Misc

      One of various miscellaneous noteworthy conditions was observed
      (eg, thread exited whilst holding locks, "impossible" behaviour
      from the underlying threading library, etc)