=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It lets you
collect function-level, basic-block-level, and edge-level coverage at a very
low cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer`, :doc:`MemorySanitizer`, and UndefinedBehaviorSanitizer.
In addition to ``-fsanitize=``, pass one of the following compile-time flags:

* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  **extra** slowdown).
* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).

You may also specify ``-fsanitize-coverage=indirect-calls`` for
additional `caller-callee coverage`_.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``, ``LSAN_OPTIONS``,
``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as appropriate.

To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
to one of the above compile-time flags. At runtime, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created at process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file will also be created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the
magic number, either ``0xC0BFFFFFFFFFFF64`` or ``0xC0BFFFFFFFFFFF32``. The last
byte of the magic number gives the size of the offsets that follow (``64`` for
64-bit offsets, ``32`` for 32-bit offsets). The rest of the data is the offsets
in the corresponding binary/DSO that were executed during the run.
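
The layout described above can be decoded in a few lines of C++. The following
is a minimal sketch, not the code used by the sanitizer runtime or by
``sancov.py``: the function name ``ParseSancov`` is hypothetical, and the
sketch assumes the file was written in the byte order of the machine that
reads it.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <stdexcept>
    #include <vector>

    // Magic values taken from the format description above.
    constexpr uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
    constexpr uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;

    // Decodes the raw bytes of one *.sancov file into a list of offsets.
    // Assumption: the file uses the byte order of the host reading it.
    std::vector<uint64_t> ParseSancov(const std::vector<uint8_t> &data) {
      uint64_t magic = 0;
      if (data.size() < sizeof(magic))
        throw std::runtime_error("file too short to hold the magic");
      std::memcpy(&magic, data.data(), sizeof(magic));

      size_t width = 0;  // size of each offset in bytes
      if (magic == kMagic64)
        width = 8;
      else if (magic == kMagic32)
        width = 4;
      else
        throw std::runtime_error("unrecognized magic");

      if ((data.size() - sizeof(magic)) % width != 0)
        throw std::runtime_error("trailing partial offset");

      std::vector<uint64_t> offsets;
      for (size_t pos = sizeof(magic); pos < data.size(); pos += width) {
        uint64_t value = 0;
        if (width == 8) {
          std::memcpy(&value, &data[pos], 8);
        } else {
          uint32_t narrow = 0;
          std::memcpy(&narrow, &data[pos], 4);
          value = narrow;  // widen 32-bit offsets for a uniform result type
        }
        offsets.push_back(value);
      }
      return offsets;
    }

A caller would read the whole file into a byte vector, pass it to
``ParseSancov``, and print or symbolize the returned offsets.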
     73 
     74 A simple script
     75 ``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
     76 provided to dump these offsets.
     77 
     78 .. code-block:: console
     79 
     80     % sancov.py print a.out.22679.sancov a.out.22673.sancov
     81     sancov.py: read 2 PCs from a.out.22679.sancov
     82     sancov.py: read 1 PCs from a.out.22673.sancov
     83     sancov.py: 2 files merged; 2 PCs total
     84     0x465250
     85     0x4652a0
     86 
     87 You can then filter the output of ``sancov.py`` through ``addr2line --exe
     88 ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
     89 numbers:
     90 
     91 .. code-block:: console
     92 
     93     % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
     94     cov.cc:3
     95     cov.cc:5
     96 
How good is the coverage?
=========================

It is possible to find out which PCs are not covered by subtracting the covered
set from the set of all instrumented PCs. The latter can be obtained by listing
all call sites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
can do this for you. Just supply the path to the binary and a list of covered
PCs:

.. code-block:: console

    % sancov.py print a.out.12345.sancov > covered.txt
    sancov.py: read 2 64-bit PCs from a.out.12345.sancov
    sancov.py: 1 file merged; 2 PCs total
    % sancov.py missing a.out < covered.txt
    sancov.py: found 3 instrumented PCs in a.out
    sancov.py: read 2 PCs from stdin
    sancov.py: 1 PCs missing from coverage
    0x4cc61c

Edge coverage
=============

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains 3 basic blocks; let's name them A, B, and C:

.. code-block:: none

    A
    |\
    | \
    |  B
    | /
    |/
    C

If blocks A, B, and C are all covered, we know for certain that the edges A=>B
and B=>C were executed, but we still don't know whether the edge A=>C was
executed. Such edges of the control flow graph are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
edges by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

    A
    |\
    | \
    D  B
    | /
    |/
    C
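
To make the notion of a critical edge concrete, the sketch below finds such
edges in a small control flow graph, using the usual definition: an edge is
critical when its source block has more than one successor and its destination
block has more than one predecessor. This is an illustration only, not the
compiler's implementation; the ``Edge`` type and ``CriticalEdges`` function are
hypothetical, and the edge list is assumed to contain no duplicates.

.. code-block:: c++

    #include <map>
    #include <utility>
    #include <vector>

    using Edge = std::pair<char, char>;  // (source block, destination block)

    // Returns the edges whose source has several successors and whose
    // destination has several predecessors.
    std::vector<Edge> CriticalEdges(const std::vector<Edge> &edges) {
      std::map<char, int> successors, predecessors;
      for (const Edge &e : edges) {
        ++successors[e.first];
        ++predecessors[e.second];
      }
      std::vector<Edge> critical;
      for (const Edge &e : edges)
        if (successors[e.first] > 1 && predecessors[e.second] > 1)
          critical.push_back(e);
      return critical;
    }

For the graph of ``foo`` above, with edges A=>B, A=>C, and B=>C, only A=>C
qualifies, which is the edge that receives the dummy block D.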

Bitset
======

When the ``coverage_bitset=1`` run-time flag is given, the coverage will also
be dumped as a bitset (a text file with 1 for blocks that have been executed
and 0 for blocks that have not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%

For a given executable the length of the bitset is always the same (unless
dlopen/dlclose come into play), so the bitset coverage can easily be used for
bitset-based corpus distillation.
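
As an illustration of bitset-based corpus distillation, the sketch below keeps
an input only if its bitset sets at least one bit that no previously kept
input has set. This greedy pass is one possible approach, offered as an
assumption rather than a description of any existing tool; the function name
``DistillCorpus`` is hypothetical.

.. code-block:: c++

    #include <cstddef>
    #include <string>
    #include <vector>

    // Takes one bitset string ('0'/'1' characters) per input and returns the
    // indices of the inputs that contribute new coverage, in input order.
    std::vector<size_t> DistillCorpus(const std::vector<std::string> &bitsets) {
      std::string seen;  // union of the bitsets of all kept inputs
      std::vector<size_t> kept;
      for (size_t i = 0; i < bitsets.size(); ++i) {
        const std::string &bits = bitsets[i];
        if (seen.size() < bits.size())
          seen.resize(bits.size(), '0');
        bool adds_coverage = false;
        for (size_t j = 0; j < bits.size(); ++j) {
          if (bits[j] == '1' && seen[j] == '0') {
            seen[j] = '1';
            adds_coverage = true;
          }
        }
        if (adds_coverage)
          kept.push_back(i);
      }
      return kept;
    }

Given the two bitsets from the example (``01101`` and ``11011``), both inputs
are kept, since each sets a bit the other does not.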

Caller-callee coverage
======================

(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures the caller and the callee. At shutdown, the process dumps a separate
file called ``caller-callee.PID.sancov``, which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees).

.. code-block:: console

    a.out 0x4a2e0c
    a.out 0x4a6510
    a.out 0x4a2e0c
    a.out 0x4a87f0

Current limitations:

* Only the first 14 callees for every caller are recorded; the rest are
  silently ignored.
* The output format is not very compact since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.

Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`_'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information. In addition to boolean values assigned to
every basic block (edge), the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`` and those 8-bit bitsets will
be dumped to disk.

.. code-block:: console

    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
    % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
    % ls -l *counters-sancov
    ... a.out.17110.counters-sancov
    % xxd *counters-sancov
    0000000: 0001 0100 01
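
The mapping from a counter value to its range bit can be sketched as follows.
This is an illustration of the ranges listed above, not the runtime's code:
the function name ``CounterToBit`` is hypothetical, and the sketch assumes the
ranges are numbered from 0 in the order listed, with range *i* stored in bit
*i* counting from the least significant bit. That assumption is consistent
with the ``01`` bytes in the dump above for blocks executed once.

.. code-block:: c++

    #include <cstdint>

    // Returns a byte with the bit of the matching range set, or 0 if the
    // counter never fired.
    // Ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+ (assumed bit 0..7).
    uint8_t CounterToBit(uint64_t counter) {
      if (counter == 0) return 0;
      if (counter == 1) return 1 << 0;
      if (counter == 2) return 1 << 1;
      if (counter == 3) return 1 << 2;
      if (counter <= 7) return 1 << 3;
      if (counter <= 15) return 1 << 4;
      if (counter <= 31) return 1 << 5;
      if (counter <= 127) return 1 << 6;
      return 1 << 7;
    }

Bucketing the counts this way lets a fuzzer distinguish "executed once" from
"executed many times" without reacting to every small change in a loop's trip
count.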

These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

    // The coverage instrumentation may optionally provide imprecise counters.
    // Rather than exposing the counter values to the user we instead map
    // the counters to a bitset.
    // Every counter is associated with 8 bits in the bitset.
    // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
    // The i-th bit is set to 1 if the counter value is in the i-th range.
    // This counter-based coverage implementation is *not* thread-safe.

    // Returns the number of registered coverage counters.
    uintptr_t __sanitizer_get_number_of_counters();
    // Updates the counter 'bitset', clears the counters and returns the number of
    // new bits in 'bitset'.
    // If 'bitset' is nullptr, only clears the counters.
    // Otherwise 'bitset' should be at least
    // __sanitizer_get_number_of_counters bytes long and 8-aligned.
    uintptr_t
    __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);

Tracing basic blocks
====================

An *experimental* feature to support basic block (or edge) tracing.
With ``-fsanitize-coverage=trace-bb`` the compiler will insert
``__sanitizer_cov_trace_basic_block(s32 *id)`` before every function, basic
block, or edge (depending on the value of
``-fsanitize-coverage=[func,bb,edge]``).

Tracing data flow
=================

An *experimental* feature to support data-flow-guided fuzzing.
With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra
instrumentation around comparison instructions and switch statements.
The fuzzer will need to define the following functions, which will be called by
the instrumented code.

.. code-block:: c++

  // Called before a comparison instruction.
  // SizeAndType is a packed value containing
  //   - [63:32] the Size of the operands of comparison in bits
  //   - [31:0] the Type of comparison (one of ICMP_EQ, ... ICMP_SLE)
  // Arg1 and Arg2 are arguments of the comparison.
  void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1, uint64_t Arg2);

  // Called before a switch statement.
  // Val is the switch operand.
  // Cases[0] is the number of case constants.
  // Cases[1] is the size of Val in bits.
  // Cases[2:] are the case constants.
  void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

This interface is subject to change.
The current implementation is not thread-safe and thus can be safely used only
for single-threaded targets.
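
A fuzzer-side definition of these two callbacks might look like the sketch
below. It only unpacks the arguments according to the comments above and
records them. The ``CmpRecord`` type and the two global logs are hypothetical,
the ``extern "C"`` linkage is an assumption about how the instrumented code
references the symbols, and, like the interface itself, the sketch is not
thread-safe.

.. code-block:: c++

    #include <cstdint>
    #include <vector>

    // Hypothetical record of one observed comparison.
    struct CmpRecord {
      uint32_t size_bits;  // operand size in bits, from SizeAndType[63:32]
      uint32_t type;       // comparison type, from SizeAndType[31:0]
      uint64_t arg1, arg2;
    };

    static std::vector<CmpRecord> g_cmp_log;       // hypothetical log
    static std::vector<uint64_t> g_untaken_cases;  // hypothetical log

    extern "C" void __sanitizer_cov_trace_cmp(uint64_t SizeAndType,
                                              uint64_t Arg1, uint64_t Arg2) {
      CmpRecord record;
      record.size_bits = static_cast<uint32_t>(SizeAndType >> 32);
      record.type = static_cast<uint32_t>(SizeAndType & 0xFFFFFFFFu);
      record.arg1 = Arg1;
      record.arg2 = Arg2;
      g_cmp_log.push_back(record);
    }

    extern "C" void __sanitizer_cov_trace_switch(uint64_t Val,
                                                 uint64_t *Cases) {
      uint64_t num_cases = Cases[0];  // Cases[1] (bit width) is unused here
      // Remember the case constants that Val did not match this time.
      for (uint64_t i = 0; i < num_cases; ++i)
        if (Cases[2 + i] != Val)
          g_untaken_cases.push_back(Cases[2 + i]);
    }

A fuzzer could consult such logs after each run, for example to try inputs
that make a recorded comparison succeed or that reach a case not yet taken.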

Output directory
================

By default, ``.sancov`` files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
    % ls -l /tmp/cov/*sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Sudden death
============

Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
memory-mapped file as soon as it is collected.

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
    main
    % ls
    7036.sancov.map  7036.sancov.raw  a.out
    % sancov.py rawunpack 7036.sancov.raw
    sancov.py: reading map 7036.sancov.map
    sancov.py: unpacking 7036.sancov.raw
    writing 1 PCs to a.out.7036.sancov
    % sancov.py print a.out.7036.sancov
    sancov.py: read 1 PCs from a.out.7036.sancov
    sancov.py: 1 files merged; 1 PCs total
    0x4b2bae

Note that on 64-bit platforms, this method writes twice as much data as the
default, because it stores full PC values instead of 32-bit offsets.

In-process fuzzing
==================

Coverage data can be useful for fuzzers, and it is sometimes preferable to run
a fuzzer in the same process as the code being fuzzed (in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>``, which returns the number of currently
covered entities in the program. Calling it after testing each new input tells
the fuzzer whether the coverage has increased.

If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process. Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.
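
The coverage check described above can be sketched as a small loop. So that
the sketch compiles without the sanitizer runtime, the coverage query is
passed in as a callable; in an instrumented build that callable would be
expected to wrap ``__sanitizer_get_total_unique_coverage()``. The names
``Input`` and ``KeepInteresting`` are hypothetical, and this is not the fuzzer
from the LLVM tree.

.. code-block:: c++

    #include <cstdint>
    #include <functional>
    #include <vector>

    using Input = std::vector<uint8_t>;

    // Runs every candidate once and returns those after which the total
    // coverage grew. 'total_coverage' stands in for the sanitizer query.
    std::vector<Input> KeepInteresting(
        const std::vector<Input> &candidates,
        const std::function<void(const Input &)> &run_target,
        const std::function<uintptr_t()> &total_coverage) {
      std::vector<Input> corpus;
      uintptr_t best = total_coverage();
      for (const Input &input : candidates) {
        run_target(input);
        uintptr_t now = total_coverage();
        if (now > best) {  // this input reached something new
          best = now;
          corpus.push_back(input);
        }
      }
      return corpus;
    }

Because the total only ever grows within a process, comparing it before and
after each run is enough to decide whether an input is worth keeping.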

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

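Performance
===========

This coverage implementation is **fast**. With function-level coverage
(``-fsanitize-coverage=func``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
between 0 and 25%.

In the table below, the ``diff`` columns are ratios of the corresponding
measurements (for example, ``diff 0-1`` is ``cov1`` divided by ``cov0``). The
columns ``cov0``, ``cov1``, and ``cov2`` appear to correspond to builds with no
coverage, function-level coverage, and basic-block-level coverage,
respectively.
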
==============  =========  =========  =========  =========  =========  =========
     benchmark      cov0        cov1   diff 0-1       cov2   diff 0-2   diff 1-2
==============  =========  =========  =========  =========  =========  =========
 400.perlbench    1296.00    1307.00       1.01    1465.00       1.13       1.12
     401.bzip2     858.00     854.00       1.00    1010.00       1.18       1.18
       403.gcc     613.00     617.00       1.01     683.00       1.11       1.11
       429.mcf     605.00     582.00       0.96     610.00       1.01       1.05
     445.gobmk     896.00     880.00       0.98    1050.00       1.17       1.19
     456.hmmer     892.00     892.00       1.00     918.00       1.03       1.03
     458.sjeng     995.00    1009.00       1.01    1217.00       1.22       1.21
462.libquantum     497.00     492.00       0.99     534.00       1.07       1.09
   464.h264ref    1461.00    1467.00       1.00    1543.00       1.06       1.05
   471.omnetpp     575.00     590.00       1.03     660.00       1.15       1.12
     473.astar     658.00     652.00       0.99     715.00       1.09       1.10
 483.xalancbmk     471.00     491.00       1.04     582.00       1.24       1.19
      433.milc     616.00     627.00       1.02     627.00       1.02       1.00
      444.namd     602.00     601.00       1.00     654.00       1.09       1.09
    447.dealII     630.00     634.00       1.01     653.00       1.04       1.03
    450.soplex     365.00     368.00       1.01     395.00       1.08       1.07
    453.povray     427.00     434.00       1.02     495.00       1.16       1.14
       470.lbm     357.00     375.00       1.05     370.00       1.04       0.99
   482.sphinx3     927.00     928.00       1.00    1000.00       1.08       1.08
==============  =========  =========  =========  =========  =========  =========

Why another coverage?
=====================

Why did we implement yet another code coverage tool?
  * We needed something that is lightning fast, plays well with
    AddressSanitizer, and does not significantly increase the binary size.
  * Traditional coverage implementations based on global counters
    `suffer from contention on counters
    <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.