Home | History | Annotate | Download | only in html
      1 <html>
      2 <head>
      3 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
      4 <title>11.SGCheck: an experimental stack and global array overrun detector</title>
      5 <link rel="stylesheet" type="text/css" href="vg_basic.css">
      6 <meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
      7 <link rel="home" href="index.html" title="Valgrind Documentation">
      8 <link rel="up" href="manual.html" title="Valgrind User Manual">
      9 <link rel="prev" href="dh-manual.html" title="10.DHAT: a dynamic heap analysis tool">
     10 <link rel="next" href="bbv-manual.html" title="12.BBV: an experimental basic block vector generation tool">
     11 </head>
     12 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
     13 <div><table class="nav" width="100%" cellspacing="3" cellpadding="3" border="0" summary="Navigation header"><tr>
     14 <td width="22px" align="center" valign="middle"><a accesskey="p" href="dh-manual.html"><img src="images/prev.png" width="18" height="21" border="0" alt="Prev"></a></td>
     15 <td width="25px" align="center" valign="middle"><a accesskey="u" href="manual.html"><img src="images/up.png" width="21" height="18" border="0" alt="Up"></a></td>
     16 <td width="31px" align="center" valign="middle"><a accesskey="h" href="index.html"><img src="images/home.png" width="27" height="20" border="0" alt="Up"></a></td>
     17 <th align="center" valign="middle">Valgrind User Manual</th>
     18 <td width="22px" align="center" valign="middle"><a accesskey="n" href="bbv-manual.html"><img src="images/next.png" width="18" height="21" border="0" alt="Next"></a></td>
     19 </tr></table></div>
     20 <div class="chapter">
     21 <div class="titlepage"><div><div><h1 class="title">
     22 <a name="sg-manual"></a>11.SGCheck: an experimental stack and global array overrun detector</h1></div></div></div>
     23 <div class="toc">
     24 <p><b>Table of Contents</b></p>
     25 <dl class="toc">
     26 <dt><span class="sect1"><a href="sg-manual.html#sg-manual.overview">11.1. Overview</a></span></dt>
     27 <dt><span class="sect1"><a href="sg-manual.html#sg-manual.options">11.2. SGCheck Command-line Options</a></span></dt>
     28 <dt><span class="sect1"><a href="sg-manual.html#sg-manual.how-works.sg-checks">11.3. How SGCheck Works</a></span></dt>
     29 <dt><span class="sect1"><a href="sg-manual.html#sg-manual.cmp-w-memcheck">11.4. Comparison with Memcheck</a></span></dt>
     30 <dt><span class="sect1"><a href="sg-manual.html#sg-manual.limitations">11.5. Limitations</a></span></dt>
     31 <dt><span class="sect1"><a href="sg-manual.html#sg-manual.todo-user-visible">11.6. Still To Do: User-visible Functionality</a></span></dt>
     32 <dt><span class="sect1"><a href="sg-manual.html#sg-manual.todo-implementation">11.7. Still To Do: Implementation Tidying</a></span></dt>
     33 </dl>
     34 </div>
     35 <p>To use this tool, you must specify
     36 <code class="option">--tool=exp-sgcheck</code> on the Valgrind
     37 command line.</p>
     38 <div class="sect1">
     39 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
     40 <a name="sg-manual.overview"></a>11.1.Overview</h2></div></div></div>
     41 <p>SGCheck is a tool for finding overruns of stack and global
     42 arrays.  It works by using a heuristic approach derived from an
     43 observation about the likely forms of stack and global array accesses.
     44 </p>
     45 </div>
     46 <div class="sect1">
     47 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
     48 <a name="sg-manual.options"></a>11.2.SGCheck Command-line Options</h2></div></div></div>
     49 <p><a name="sg.opts.list"></a>There are no SGCheck-specific command-line options at present.</p>
     50 </div>
     51 <div class="sect1">
     52 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
     53 <a name="sg-manual.how-works.sg-checks"></a>11.3.How SGCheck Works</h2></div></div></div>
     54 <p>When a source file is compiled
     55 with <code class="option">-g</code>, the compiler attaches DWARF3
     56 debugging information which describes the location of all stack and
     57 global arrays in the file.</p>
     58 <p>Checking of accesses to such arrays would then be relatively
     59 simple, if the compiler could also tell us which array (if any) each
     60 memory referencing instruction was supposed to access.  Unfortunately
     61 the DWARF3 debugging format does not provide a way to represent such
     62 information, so we have to resort to a heuristic technique to
     63 approximate it.  The key observation is that
     64    <span class="emphasis"><em>
     65    if a memory referencing instruction accesses inside a stack or
     66    global array once, then it is highly likely to always access that
     67    same array</em></span>.</p>
     68 <p>To see how this might be useful, consider the following buggy
     69 fragment:</p>
     70 <pre class="programlisting">
     71    { int i, a[10];  // both are auto vars
     72      for (i = 0; i &lt;= 10; i++)
     73         a[i] = 42;
     74    }
     75 </pre>
     76 <p>At run time we will know the precise address
     77 of <code class="computeroutput">a[]</code> on the stack, and so we can
     78 observe that the first store resulting from <code class="computeroutput">a[i] =
     79 42</code> writes <code class="computeroutput">a[]</code>, and
     80 we will (correctly) assume that that instruction is intended always to
     81 access <code class="computeroutput">a[]</code>.  Then, on the 11th
     82 iteration, it accesses somewhere else, possibly a different local,
     83 possibly an un-accounted for area of the stack (eg, spill slot), so
     84 SGCheck reports an error.</p>
     85 <p>There is an important caveat.</p>
     86 <p>Imagine a function such as <code class="function">memcpy</code>, which is used
     87 to read and write many different areas of memory over the lifetime of the
     88 program.  If we insist that the read and write instructions in its memory
     89 copying loop only ever access one particular stack or global variable, we
     90 will be flooded with errors resulting from calls to
     91 <code class="function">memcpy</code>.</p>
     92 <p>To avoid this problem, SGCheck instantiates fresh likely-target
     93 records for each entry to a function, and discards them on exit.  This
     94 allows detection of cases where (e.g.) <code class="function">memcpy</code>
     95 overflows its source or destination buffers for any specific call, but
     96 does not carry any restriction from one call to the next.  Indeed,
     97 multiple threads may make multiple simultaneous calls to
     98 (e.g.) <code class="function">memcpy</code> without mutual interference.</p>
     99 <p>It is important to note that the association is done between
    100   a <span class="emphasis"><em>binary instruction</em></span> and an array, the
    101   <span class="emphasis"><em>first time</em></span> this binary instruction accesses an
    102   array during a function call.  When the same instruction is executed
    103   again during the same function call, then SGCheck might report a
    104   problem, if these further executions are not accessing the same
    105   array. This technique causes several limitations in SGCheck, see
    106   <a class="xref" href="sg-manual.html#sg-manual.limitations" title="11.5.Limitations">Limitations</a>.
    107 </p>
    108 </div>
    109 <div class="sect1">
    110 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
    111 <a name="sg-manual.cmp-w-memcheck"></a>11.4.Comparison with Memcheck</h2></div></div></div>
    112 <p>SGCheck and Memcheck are complementary: their capabilities do
    113 not overlap.  Memcheck performs bounds checks and use-after-free
    114 checks for heap arrays.  It also finds uses of uninitialised values
    115 created by heap or stack allocations.  But it does not perform bounds
    116 checking for stack or global arrays.</p>
    117 <p>SGCheck, on the other hand, does do bounds checking for stack or
    118 global arrays, but it doesn't do anything else.</p>
    119 </div>
    120 <div class="sect1">
    121 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
    122 <a name="sg-manual.limitations"></a>11.5.Limitations</h2></div></div></div>
    123 <p>This is an experimental tool, which relies rather too heavily on some
    124 not-as-robust-as-I-would-like assumptions on the behaviour of correct
    125 programs.  There are a number of limitations which you should be aware
    126 of.</p>
    127 <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
    128 <li class="listitem">
    129 <p>False negatives (missed errors): it follows from the
    130    description above (<a class="xref" href="sg-manual.html#sg-manual.how-works.sg-checks" title="11.3.How SGCheck Works">How SGCheck Works</a>)
    131    that the first access by a memory referencing instruction to a
    132    stack or global array creates an association between that
    133    instruction and the array, which is checked on subsequent accesses
    134    by that instruction, until the containing function exits.  Hence,
    135    the first access by an instruction to an array (in any given
    136    function instantiation) is not checked for overrun, since SGCheck
    137    uses that as the "example" of how subsequent accesses should
    138    behave.</p>
    139 <p>It also means that errors will not be found in an instruction
    140      executed only once (e.g. because this instruction is not in a loop,
    141      or the loop is executed only once).</p>
    142 </li>
    143 <li class="listitem">
    144 <p>False positives (false errors): similarly, and more serious,
    145    it is clearly possible to write legitimate pieces of code which
    146    break the basic assumption upon which the checking algorithm
    147    depends.  For example:</p>
    148 <pre class="programlisting">
    149   { int a[10], b[10], *p, i;
    150     for (i = 0; i &lt; 10; i++) {
    151        p = /* arbitrary condition */  ? &amp;a[i]  : &amp;b[i];
    152        *p = 42;
    153     }
    154   }
    155 </pre>
    156 <p>In this case the store sometimes
    157    accesses <code class="computeroutput">a[]</code> and
    158    sometimes <code class="computeroutput">b[]</code>, but in no cases is
    159    the addressed array overrun.  Nevertheless the change in target
    160    will cause an error to be reported.</p>
    161 <p>It is hard to see how to get around this problem.  The only
    162    mitigating factor is that such constructions appear very rare, at
    163    least judging from the results using the tool so far.  Such a
    164    construction appears only once in the Valgrind sources (running
    165    Valgrind on Valgrind) and perhaps two or three times for a start
    166    and exit of Firefox.  The best that can be done is to suppress the
    167    errors.</p>
    168 </li>
    169 <li class="listitem"><p>Performance: SGCheck has to read all of
    170    the DWARF3 type and variable information on the executable and its
    171    shared objects.  This is computationally expensive and makes
    172    startup quite slow.  You can expect debuginfo reading time to be in
    173    the region of a minute for an OpenOffice sized application, on a
    174    2.4 GHz Core 2 machine.  Reading this information also requires a
    175    lot of memory.  To make it viable, SGCheck goes to considerable
    176    trouble to compress the in-memory representation of the DWARF3
    177    data, which is why the process of reading it appears slow.</p></li>
    178 <li class="listitem"><p>Performance: SGCheck runs slower than Memcheck.  This is
    179    partly due to a lack of tuning, but partly due to algorithmic
    180    difficulties.  The
    181    stack and global checks can sometimes require a number of range
    182    checks per memory access, and these are difficult to short-circuit,
    183    despite considerable efforts having been made.  A
    184    redesign and reimplementation could potentially make it much faster.
    185    </p></li>
    186 <li class="listitem">
    187 <p>Coverage: Stack and global checking is fragile.  If a shared
    188    object does not have debug information attached, then SGCheck will
    189    not be able to determine the bounds of any stack or global arrays
    190    defined within that shared object, and so will not be able to check
    191    accesses to them.  This is true even when those arrays are accessed
    192    from some other shared object which was compiled with debug
    193    info.</p>
    194 <p>At the moment SGCheck accepts objects lacking debuginfo
    195    without comment.  This is dangerous as it causes SGCheck to
    196    silently skip stack and global checking for such objects.  It would
    197    be better to print a warning in such circumstances.</p>
    198 </li>
    199 <li class="listitem"><p>Coverage: SGCheck does not check whether the areas read
    200    or written by system calls do overrun stack or global arrays.  This
    201    would be easy to add.</p></li>
    202 <li class="listitem"><p>Platforms: the stack/global checks won't work properly on
    203    PowerPC, ARM or S390X platforms, only on X86 and AMD64 targets.
    204    That's because the stack and global checking requires tracking
    205    function calls and exits reliably, and there's no obvious way to do
    206    it on ABIs that use a link register for function returns.
    207    </p></li>
    208 <li class="listitem"><p>Robustness: related to the previous point.  Function
    209    call/exit tracking for X86 and AMD64 is believed to work properly
    210    even in the presence of longjmps within the same stack (although
    211    this has not been tested).  However, code which switches stacks is
    212    likely to cause breakage/chaos.</p></li>
    213 </ul></div>
    214 </div>
    215 <div class="sect1">
    216 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
    217 <a name="sg-manual.todo-user-visible"></a>11.6.Still To Do: User-visible Functionality</h2></div></div></div>
    218 <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
    219 <li class="listitem"><p>Extend system call checking to work on stack and global arrays.</p></li>
    220 <li class="listitem"><p>Print a warning if a shared object does not have debug info
    221    attached, or if, for whatever reason, debug info could not be
    222    found, or read.</p></li>
    223 <li class="listitem"><p>Add some heuristic filtering that removes obvious false
    224      positives.  This would be easy to do.  For example, an access
    225      transition from a heap to a stack object almost certainly isn't a
    226      bug and so should not be reported to the user.</p></li>
    227 </ul></div>
    228 </div>
    229 <div class="sect1">
    230 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
    231 <a name="sg-manual.todo-implementation"></a>11.7.Still To Do: Implementation Tidying</h2></div></div></div>
    232 <p>Items marked CRITICAL are considered important for correctness:
    233 non-fixage of them is liable to lead to crashes or assertion failures
    234 in real use.</p>
    235 <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
    236 <li class="listitem"><p> sg_main.c: Redesign and reimplement the basic checking
    237    algorithm.  It could be done much faster than it is -- the current
    238    implementation isn't very good.
    239    </p></li>
    240 <li class="listitem"><p> sg_main.c: Improve the performance of the stack / global
    241    checks by doing some up-front filtering to ignore references in
    242    areas which "obviously" can't be stack or globals.  This will
    243    require using information that m_aspacemgr knows about the address
    244    space layout.</p></li>
    245 <li class="listitem"><p>sg_main.c: fix compute_II_hash to make it a bit more sensible
    246    for ppc32/64 targets (except that sg_ doesn't work on ppc32/64
    247    targets, so this is a bit academic at the moment).</p></li>
    248 </ul></div>
    249 </div>
    250 </div>
    251 <div>
    252 <br><table class="nav" width="100%" cellspacing="3" cellpadding="2" border="0" summary="Navigation footer">
    253 <tr>
    254 <td rowspan="2" width="40%" align="left">
    255 <a accesskey="p" href="dh-manual.html">&lt;&lt;10.DHAT: a dynamic heap analysis tool</a></td>
    256 <td width="20%" align="center"><a accesskey="u" href="manual.html">Up</a></td>
    257 <td rowspan="2" width="40%" align="right"><a accesskey="n" href="bbv-manual.html">12.BBV: an experimental basic block vector generation tool&gt;&gt;</a>
    258 </td>
    259 </tr>
    260 <tr><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td></tr>
    261 </table>
    262 </div>
    263 </body>
    264 </html>
    265