<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>


<chapter id="sg-manual"
         xreflabel="SGCheck: an experimental stack and global array overrun detector">
  <title>SGCheck: an experimental stack and global array overrun detector</title>

<para>To use this tool, you must specify
<option>--tool=exp-sgcheck</option> on the Valgrind
command line.</para>


<sect1 id="sg-manual.overview" xreflabel="Overview">
<title>Overview</title>

<para>SGCheck is a tool for finding overruns of stack and global
arrays.  It uses a heuristic derived from an observation about the
likely forms of stack and global array accesses.
</para>

</sect1>


<sect1 id="sg-manual.options" xreflabel="SGCheck Command-line Options">
<title>SGCheck Command-line Options</title>

<para>There are no SGCheck-specific command-line options at present.</para>
<!--
<para>SGCheck-specific command-line options are:</para>


<variablelist id="sg.opts.list">
</variablelist>
-->

</sect1>


<sect1 id="sg-manual.how-works.sg-checks"
       xreflabel="How SGCheck Works">
<title>How SGCheck Works</title>

<para>When a source file is compiled
with <option>-g</option>, the compiler attaches DWARF3
debugging information which describes the location of all stack and
global arrays in the file.</para>

<para>Checking accesses to such arrays would then be relatively
simple if the compiler could also tell us which array (if any) each
memory referencing instruction was supposed to access.  Unfortunately,
the DWARF3 debugging format does not provide a way to represent such
information, so we have to resort to a heuristic technique to
approximate it.  The key observation is that
   <emphasis>
   if a memory referencing instruction accesses inside a stack or
   global array once, then it is highly likely to always access that
   same array</emphasis>.</para>

<para>To see how this might be useful, consider the following buggy
fragment:</para>
<programlisting><![CDATA[
   { int i, a[10];  // both are auto vars
     for (i = 0; i <= 10; i++)
        a[i] = 42;
   }
]]></programlisting>

<para>At run time we will know the precise address
of <computeroutput>a[]</computeroutput> on the stack, and so we can
observe that the first store resulting from <computeroutput>a[i] =
42</computeroutput> writes <computeroutput>a[]</computeroutput>, and
we will (correctly) assume that that instruction is intended always to
access <computeroutput>a[]</computeroutput>.  Then, on the 11th
iteration, it accesses somewhere else, possibly a different local,
possibly an unaccounted-for area of the stack (e.g. a spill slot), so
SGCheck reports an error.</para>
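
<para>The heuristic can be modelled in a few lines of C.  The following
is a minimal sketch under stated assumptions, not SGCheck's actual
implementation: <computeroutput>ArrayBounds</computeroutput>,
<computeroutput>InstrRecord</computeroutput> and
<function>check_access</function> are hypothetical names introduced
here for illustration only.  Each memory referencing instruction
carries a record that stays empty until the instruction first lands
inside a known array; from then on, any access by that instruction
which falls outside that array is reported.</para>
<programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one known array: bytes [base, base + size). */
typedef struct { uintptr_t base; size_t size; } ArrayBounds;

/* Hypothetical per-instruction record: the array this instruction is
   assumed to target, or NULL if it has not yet hit any array. */
typedef struct { const ArrayBounds *likely; } InstrRecord;

/* Does the access [addr, addr + len) lie entirely inside 'a'? */
static int inside(const ArrayBounds *a, uintptr_t addr, size_t len)
{
   return addr >= a->base && len <= a->size
          && addr - a->base <= a->size - len;
}

/* Returns 1 if the access should be reported, 0 otherwise. */
int check_access(InstrRecord *rec, const ArrayBounds *arrays, size_t n,
                 uintptr_t addr, size_t len)
{
   size_t i;
   if (rec->likely == NULL) {
      /* No association yet: adopt whichever array contains the access.
         Nothing is reported, since there is nothing to compare with. */
      for (i = 0; i < n; i++) {
         if (inside(&arrays[i], addr, len)) {
            rec->likely = &arrays[i];
            break;
         }
      }
      return 0;
   }
   /* Association exists: report any access that leaves the array. */
   return !inside(rec->likely, addr, len);
}
]]></programlisting>
<para>Replaying the eleven 4-byte stores of the fragment above through
<function>check_access</function>, with a single 40-byte array, gives
one report, on the store that corresponds
to <computeroutput>a[10]</computeroutput>.</para>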

<para>There is an important caveat.</para>

<para>Imagine a function such as <function>memcpy</function>, which is used
to read and write many different areas of memory over the lifetime of the
program.  If we insist that the read and write instructions in its memory
copying loop only ever access one particular stack or global variable, we
will be flooded with errors resulting from calls to
<function>memcpy</function>.</para>

<para>To avoid this problem, SGCheck instantiates fresh likely-target
records for each entry to a function, and discards them on exit.  This
allows detection of cases where (e.g.) <function>memcpy</function>
overflows its source or destination buffers for any specific call, but
does not carry any restriction from one call to the next.  Indeed,
multiple threads may make multiple simultaneous calls to
(e.g.) <function>memcpy</function> without mutual interference.</para>
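
<para>The per-call lifetime of these records can be pictured as a chain
of frames, one per active function call.  This is a minimal sketch under
assumptions, not SGCheck's data structure:
<computeroutput>Frame</computeroutput>,
<computeroutput>N_INSTRS</computeroutput>,
<function>frame_push</function>, <function>frame_pop</function> and
<function>frame_access</function> are hypothetical names, and arrays
are identified by small integers purely to keep the example
short.</para>
<programlisting><![CDATA[
#include <stdlib.h>

#define N_INSTRS 4   /* hypothetical: instructions tracked per function */

/* One frame of likely-target records, created on function entry.
   A target of -1 means "no array associated yet". */
typedef struct Frame {
   int           target[N_INSTRS];
   struct Frame *caller;
} Frame;

/* Function entry: push a frame whose records are all empty. */
Frame *frame_push(Frame *caller)
{
   Frame *f = malloc(sizeof *f);
   int i;
   if (f == NULL) abort();
   for (i = 0; i < N_INSTRS; i++) f->target[i] = -1;
   f->caller = caller;
   return f;
}

/* Function exit: discard the records and resume the caller's frame. */
Frame *frame_pop(Frame *f)
{
   Frame *caller = f->caller;
   free(f);
   return caller;
}

/* Returns 1 if instruction 'instr' touching array 'array_id' should be
   reported within frame 'f', 0 otherwise. */
int frame_access(Frame *f, int instr, int array_id)
{
   if (f->target[instr] == -1) {
      f->target[instr] = array_id;   /* first hit in this call */
      return 0;
   }
   return f->target[instr] != array_id;
}
]]></programlisting>
<para>Because every call starts from empty records, the same
instruction may target one array in one call and a different array in
the next without any report.  Keeping one such chain per thread would
give the independence between threads described above.</para>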

</sect1>




<sect1 id="sg-manual.cmp-w-memcheck"
       xreflabel="Comparison with Memcheck">
<title>Comparison with Memcheck</title>

<para>SGCheck and Memcheck are complementary: their capabilities do
not overlap.  Memcheck performs bounds checks and use-after-free
checks for heap arrays.  It also finds uses of uninitialised values
created by heap or stack allocations.  But it does not perform bounds
checking for stack or global arrays.</para>

<para>SGCheck, on the other hand, performs bounds checking for stack
and global arrays, but does nothing else.</para>
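
<para>The division of labour can be illustrated with three small
functions.  This is only a sketch: the names
<function>fill_heap</function>, <function>fill_stack</function>,
<function>fill_global</function> and
<computeroutput>LEN</computeroutput> are made up for this example.
Each function writes <computeroutput>n</computeroutput> elements (at
least one) to an array of <computeroutput>LEN</computeroutput>
elements, so any call with <computeroutput>n</computeroutput> greater
than <computeroutput>LEN</computeroutput> overruns the array.</para>
<programlisting><![CDATA[
#include <stdlib.h>

#define LEN 10

static int global_arr[LEN];

/* Heap array: an overrun here falls within Memcheck's checks. */
int fill_heap(int n)
{
   int i, last, *p = malloc(LEN * sizeof *p);
   if (p == NULL) return -1;
   for (i = 0; i < n; i++) p[i] = i;
   last = p[n - 1];
   free(p);
   return last;
}

/* Stack array: an overrun here is the kind SGCheck looks for. */
int fill_stack(int n)
{
   int i, a[LEN];
   for (i = 0; i < n; i++) a[i] = i;
   return a[n - 1];
}

/* Global array: likewise the kind SGCheck looks for. */
int fill_global(int n)
{
   int i;
   for (i = 0; i < n; i++) global_arr[i] = i;
   return global_arr[n - 1];
}
]]></programlisting>
<para>With <computeroutput>n</computeroutput> equal
to <computeroutput>LEN</computeroutput> all three functions are correct.
With <computeroutput>n</computeroutput> equal
to <computeroutput>LEN + 1</computeroutput>, going by the description
above, the overrun in <function>fill_heap</function> is one for
Memcheck to find, while those in <function>fill_stack</function>
and <function>fill_global</function> are ones for SGCheck.</para>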

</sect1>





<sect1 id="sg-manual.limitations"
       xreflabel="Limitations">
<title>Limitations</title>

<para>This is an experimental tool, which relies rather too heavily on some
not-as-robust-as-I-would-like assumptions about the behaviour of correct
programs.  There are a number of limitations which you should be aware
of.</para>

<itemizedlist>

  <listitem>
   <para>False negatives (missed errors): it follows from the
   description above (<xref linkend="sg-manual.how-works.sg-checks"/>)
   that the first access by a memory referencing instruction to a
   stack or global array creates an association between that
   instruction and the array, which is checked on subsequent accesses
   by that instruction, until the containing function exits.  Hence,
   the first access by an instruction to an array (in any given
   function instantiation) is not checked for overrun, since SGCheck
   uses that as the "example" of how subsequent accesses should
   behave.</para>
  </listitem>

  <listitem>
   <para>False positives (false errors): similarly, and more seriously,
   it is clearly possible to write legitimate pieces of code which
   break the basic assumption upon which the checking algorithm
   depends.  For example:</para>

<programlisting><![CDATA[
  { int a[10], b[10], *p, i;
    for (i = 0; i < 10; i++) {
       p = /* arbitrary condition */  ? &a[i]  : &b[i];
       *p = 42;
    }
  }
]]></programlisting>

   <para>In this case the store sometimes
   accesses <computeroutput>a[]</computeroutput> and
   sometimes <computeroutput>b[]</computeroutput>, but in no case is
   the addressed array overrun.  Nevertheless the change in target
   will cause an error to be reported.</para>

   <para>It is hard to see how to get around this problem.  The only
   mitigating factor is that such constructions appear very rare, at
   least judging from the results of using the tool so far.  Such a
   construction appears only once in the Valgrind sources (running
   Valgrind on Valgrind) and perhaps two or three times for a start
   and exit of Firefox.  The best that can be done is to suppress the
   errors.</para>
  </listitem>

  <listitem>
   <para>Performance: SGCheck has to read all of
   the DWARF3 type and variable information on the executable and its
   shared objects.  This is computationally expensive and makes
   startup quite slow.  You can expect debuginfo reading time to be in
   the region of a minute for an OpenOffice-sized application, on a
   2.4 GHz Core 2 machine.  Reading this information also requires a
   lot of memory.  To make it viable, SGCheck goes to considerable
   trouble to compress the in-memory representation of the DWARF3
   data, which is why the process of reading it appears slow.</para>
  </listitem>

  <listitem>
   <para>Performance: SGCheck runs slower than Memcheck.  This is
   partly due to a lack of tuning, but partly due to algorithmic
   difficulties.  The
   stack and global checks can sometimes require a number of range
   checks per memory access, and these are difficult to short-circuit,
   despite considerable efforts having been made.  A
   redesign and reimplementation could potentially make it much faster.
   </para>
  </listitem>

  <listitem>
   <para>Coverage: Stack and global checking is fragile.  If a shared
   object does not have debug information attached, then SGCheck will
   not be able to determine the bounds of any stack or global arrays
   defined within that shared object, and so will not be able to check
   accesses to them.  This is true even when those arrays are accessed
   from some other shared object which was compiled with debug
   info.</para>

   <para>At the moment SGCheck accepts objects lacking debuginfo
   without comment.  This is dangerous as it causes SGCheck to
   silently skip stack and global checking for such objects.  It would
   be better to print a warning in such circumstances.</para>
  </listitem>

  <listitem>
   <para>Coverage: SGCheck does not check whether the areas read
   or written by system calls overrun stack or global arrays.  This
   would be easy to add.</para>
  </listitem>

  <listitem>
   <para>Platforms: the stack/global checks won't work properly on
   PowerPC, ARM or S390X platforms, only on X86 and AMD64 targets.
   That's because the stack and global checking requires tracking
   function calls and exits reliably, and there's no obvious way to do
   it on ABIs that use a link register for function returns.
   </para>
  </listitem>

  <listitem>
   <para>Robustness: related to the previous point.  Function
   call/exit tracking for X86 and AMD64 is believed to work properly
   even in the presence of longjmps within the same stack (although
   this has not been tested).  However, code which switches stacks is
   likely to cause breakage/chaos.</para>
  </listitem>
</itemizedlist>
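
<para>The false-negative case in the list above can be made concrete
with a small function.  This is an illustrative sketch only:
<function>clear_from</function> and <computeroutput>LEN</computeroutput>
are hypothetical names, and the array is
declared <computeroutput>volatile</computeroutput> merely so that a
compiler keeps the stores.</para>
<programlisting><![CDATA[
#define LEN 10

/* Zeroes 'count' elements of a local array, starting at index 'first',
   and returns how many stores were made.  The call is correct only
   while first >= 0 and first + count <= LEN. */
int clear_from(int first, int count)
{
   volatile int a[LEN];
   int i, written = 0;
   for (i = first; i < first + count; i++) {
      a[i] = 0;      /* a single store instruction, run 'count' times */
      written++;
   }
   return written;
}
]]></programlisting>
<para>In a call such as <computeroutput>clear_from(0, 11)</computeroutput>
the first store hits <computeroutput>a[]</computeroutput>, so the
eleventh store breaks the association and would be expected to be
reported.  In a call such
as <computeroutput>clear_from(10, 1)</computeroutput> the only store is
already outside <computeroutput>a[]</computeroutput>; since it is the
instruction's first access in that call, there is no earlier
association for it to violate, and going by the description above the
overrun would go unreported.</para>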

</sect1>





<sect1 id="sg-manual.todo-user-visible"
       xreflabel="Still To Do: User-visible Functionality">
<title>Still To Do: User-visible Functionality</title>

<itemizedlist>

  <listitem>
   <para>Extend system call checking to work on stack and global arrays.</para>
  </listitem>

  <listitem>
   <para>Print a warning if a shared object does not have debug info
   attached, or if, for whatever reason, debug info could not be
   found or read.</para>
  </listitem>

  <listitem>
   <para>Add some heuristic filtering that removes obvious false
     positives.  This would be easy to do.  For example, an access
     transition from a heap to a stack object almost certainly isn't a
     bug and so should not be reported to the user.</para>
  </listitem>

</itemizedlist>
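
<para>The heuristic filtering proposed in the last item might look
roughly as follows.  This is a sketch of an unimplemented idea, not
existing SGCheck code: <computeroutput>ObjKind</computeroutput>
and <function>should_report</function> are hypothetical names, and the
sketch assumes that a change of target has already been detected and
that both the expected and the actual target have been
classified.</para>
<programlisting><![CDATA[
/* Hypothetical classification of the object an access lands in. */
typedef enum { OBJ_NONE, OBJ_STACK, OBJ_GLOBAL, OBJ_HEAP } ObjKind;

/* Given that an instruction's target changed from an object of kind
   'expected' to one of kind 'actual', returns 1 if the change should
   be reported to the user, 0 if it should be filtered out. */
int should_report(ObjKind expected, ObjKind actual)
{
   /* A heap-to-stack transition is almost certainly not a bug. */
   if (expected == OBJ_HEAP && actual == OBJ_STACK)
      return 0;
   return 1;
}
]]></programlisting>
<para>Further rules could be added to the same function as other
classes of obvious false positive are identified.</para>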

</sect1>




<sect1 id="sg-manual.todo-implementation"
       xreflabel="Still To Do: Implementation Tidying">
<title>Still To Do: Implementation Tidying</title>

<para>Items marked CRITICAL are considered important for correctness:
failure to fix them is liable to lead to crashes or assertion failures
in real use.</para>

<itemizedlist>

  <listitem>
   <para>sg_main.c: Redesign and reimplement the basic checking
   algorithm.  It could be done much faster than it is; the current
   implementation isn't very good.
   </para>
  </listitem>

  <listitem>
   <para>sg_main.c: Improve the performance of the stack / global
   checks by doing some up-front filtering to ignore references in
   areas which "obviously" can't be stack or globals.  This will
   require using information that m_aspacemgr knows about the address
   space layout.</para>
  </listitem>

  <listitem>
   <para>sg_main.c: fix compute_II_hash to make it a bit more sensible
   for ppc32/64 targets (except that sg_ doesn't work on ppc32/64
   targets, so this is a bit academic at the moment).</para>
  </listitem>

</itemizedlist>
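
<para>The up-front filtering mentioned in the second item could take
the form of a cheap range test made before any per-array checks.  The
following is a sketch under assumptions, not existing code: it does not
use the m_aspacemgr interface, and <computeroutput>Range</computeroutput>
and <function>may_be_stack_or_global</function> are hypothetical names.
It assumes the caller can supply the current thread's stack range and
the ranges of the mapped data segments.</para>
<programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open address range [lo, hi). */
typedef struct { uintptr_t lo, hi; } Range;

static int in_range(const Range *r, uintptr_t addr)
{
   return addr >= r->lo && addr < r->hi;
}

/* Cheap pre-filter.  Returns 0 if 'addr' lies neither in the stack nor
   in any data segment, so the expensive per-array checks can be
   skipped; returns 1 if the full checks are still needed. */
int may_be_stack_or_global(const Range *stack,
                           const Range *data_segs, size_t n_segs,
                           uintptr_t addr)
{
   size_t i;
   if (in_range(stack, addr))
      return 1;
   for (i = 0; i < n_segs; i++)
      if (in_range(&data_segs[i], addr))
         return 1;
   return 0;
}
]]></programlisting>
<para>A linear scan is used here only for brevity; with many data
segments a sorted table and binary search would be the more natural
choice.</para>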

</sect1>



</chapter>