=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It provides
function-level, basic-block-level, and edge-level coverage at a very low
cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer`, :doc:`MemorySanitizer`,
UndefinedBehaviorSanitizer, or without any sanitizer.  Pass one of the
following compile-time flags:

* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  **extra** slowdown).
* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).

You may also specify ``-fsanitize-coverage=indirect-calls`` for
additional `caller-callee coverage`_.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``,
``LSAN_OPTIONS``, ``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as
appropriate. For the standalone coverage mode, use ``UBSAN_OPTIONS``.

To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
to one of the above compile-time flags. At run time, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file is also created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the
magic number, either ``0xC0BFFFFFFFFFFF64`` or ``0xC0BFFFFFFFFFFF32``. The last
byte of the magic defines the size of the following offsets (64 or 32 bits).
The rest of the data is the offsets in the corresponding binary/DSO that were
executed during the run.

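The layout described above can be decoded in a few lines. The following is a
minimal sketch, not the code used by ``sancov.py``: the function name
``ParseSancov`` is hypothetical, and the sketch assumes the file was written in
the byte order of the machine that reads it.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Magic values quoted in the text above.
    const uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
    const uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;

    // Hypothetical helper: decodes the bytes of one .sancov file into offsets.
    // Assumption: the data uses the byte order of the host.
    // Returns false if the magic is unknown or the payload is truncated.
    bool ParseSancov(const std::vector<uint8_t> &data,
                     std::vector<uint64_t> *offsets) {
      offsets->clear();
      uint64_t magic = 0;
      if (data.size() < sizeof(magic)) return false;
      std::memcpy(&magic, data.data(), sizeof(magic));

      // The low byte of the magic value (0x64 or 0x32) selects the width.
      size_t width = 0;
      if (magic == kMagic64) width = 8;
      else if (magic == kMagic32) width = 4;
      else return false;

      if ((data.size() - sizeof(magic)) % width != 0) return false;
      for (size_t pos = sizeof(magic); pos < data.size(); pos += width) {
        uint64_t value = 0;
        if (width == 8) {
          std::memcpy(&value, &data[pos], 8);
        } else {
          uint32_t narrow = 0;
          std::memcpy(&narrow, &data[pos], 4);
          value = narrow;
        }
        offsets->push_back(value);
      }
      return true;
    }

A caller would read the whole file into a byte vector, pass it to
``ParseSancov``, and then symbolize the resulting offsets.
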
A simple script
``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
provided to dump these offsets.

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov
    sancov.py: read 2 PCs from a.out.22679.sancov
    sancov.py: read 1 PCs from a.out.22673.sancov
    sancov.py: 2 files merged; 2 PCs total
    0x465250
    0x4652a0

You can then filter the output of ``sancov.py`` through ``addr2line --exe
ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
numbers:

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
    cov.cc:3
    cov.cc:5

Sancov Tool
===========

A new experimental ``sancov`` tool is being developed to process coverage files.
The tool is part of the LLVM project and is currently supported only on Linux.
It can handle symbolization tasks autonomously without any extra support
from the environment. You need to pass it the ``.sancov`` files (named
``<module_name>.<pid>.sancov``) and paths to all corresponding binary ELF files.
Sancov matches these files using module names and binary file names.

.. code-block:: console

    USAGE: sancov [options] <action> (<binary file>|<.sancov file>)...

    Action (required)
      -print                    - Print coverage addresses
      -covered-functions        - Print all covered functions.
      -not-covered-functions    - Print all not covered functions.
      -html-report              - Print HTML coverage report.

    Options
      -blacklist=<string>         - Blacklist file (sanitizer blacklist format).
      -demangle                   - Print demangled function name.
      -strip_path_prefix=<string> - Strip this prefix from file paths in reports


    124 
    125 Automatic HTML Report Generation
    126 ================================
    127 
    128 If ``*SAN_OPTIONS`` contains ``html_cov_report=1`` option set, then html
    129 coverage report would be automatically generated alongside the coverage files.
    130 The ``sancov`` binary should be present in ``PATH`` or
    131 ``sancov_path=<path_to_sancov`` option can be used to specify tool location.
    132 
    133 
    134 How good is the coverage?
    135 =========================
    136 
    137 It is possible to find out which PCs are not covered, by subtracting the covered
    138 set from the set of all instrumented PCs. The latter can be obtained by listing
    139 all callsites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
    140 can do this for you. Just supply the path to binary and a list of covered PCs:
    141 
    142 .. code-block:: console
    143 
    144     % sancov.py print a.out.12345.sancov > covered.txt
    145     sancov.py: read 2 64-bit PCs from a.out.12345.sancov
    146     sancov.py: 1 file merged; 2 PCs total
    147     % sancov.py missing a.out < covered.txt
    148     sancov.py: found 3 instrumented PCs in a.out
    149     sancov.py: read 2 PCs from stdin
    150     sancov.py: 1 PCs missing from coverage
    151     0x4cc61c
    152 
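The subtraction itself is an ordinary set difference. A minimal sketch,
assuming both PC sets have already been loaded into memory (``MissingPcs`` is a
hypothetical name, not part of ``sancov.py``):

.. code-block:: c++

    #include <algorithm>
    #include <cstdint>
    #include <iterator>
    #include <set>
    #include <vector>

    // Hypothetical helper: returns the instrumented PCs that were never
    // covered, in ascending order.
    std::vector<uint64_t> MissingPcs(const std::set<uint64_t> &instrumented,
                                     const std::set<uint64_t> &covered) {
      std::vector<uint64_t> missing;
      // std::set iterates in sorted order, as std::set_difference requires.
      std::set_difference(instrumented.begin(), instrumented.end(),
                          covered.begin(), covered.end(),
                          std::back_inserter(missing));
      return missing;
    }

With three instrumented PCs of which two are covered, the result holds the one
remaining PC, matching the shape of the ``sancov.py missing`` output above.
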
Edge coverage
=============

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains 3 basic blocks; let's name them A, B, and C:

.. code-block:: none

    A
    |\
    | \
    |  B
    | /
    |/
    C

If blocks A, B, and C are all covered, we know for certain that the edges A=>B
and B=>C were executed, but we still don't know if the edge A=>C was executed.
Such edges of the control flow graph are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
edges by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

    A
    |\
    | \
    D  B
    | /
    |/
    C

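The splitting step can be illustrated on a toy graph. This is a sketch of the
general technique, not the compiler pass: the ``Edge`` type, the function names
and the ``D0``, ``D1``, ... naming of dummy blocks are all assumptions made for
the example, and the graph is assumed to contain no duplicate edges.

.. code-block:: c++

    #include <algorithm>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    typedef std::pair<std::string, std::string> Edge;  // (from, to)

    // An edge is critical when its source has more than one successor and
    // its destination has more than one predecessor.
    std::vector<Edge> CriticalEdges(const std::vector<Edge> &edges) {
      std::map<std::string, int> successors, predecessors;
      for (const Edge &e : edges) {
        ++successors[e.first];
        ++predecessors[e.second];
      }
      std::vector<Edge> critical;
      for (const Edge &e : edges)
        if (successors[e.first] > 1 && predecessors[e.second] > 1)
          critical.push_back(e);
      return critical;
    }

    // Replaces every critical edge X=>Y with X=>Dn and Dn=>Y, where Dn is a
    // fresh dummy block (hypothetical naming scheme).
    std::vector<Edge> SplitCriticalEdges(const std::vector<Edge> &edges) {
      const std::vector<Edge> critical = CriticalEdges(edges);
      std::vector<Edge> result;
      int next_dummy = 0;
      for (const Edge &e : edges) {
        bool is_critical =
            std::find(critical.begin(), critical.end(), e) != critical.end();
        if (!is_critical) {
          result.push_back(e);
          continue;
        }
        const std::string dummy = "D" + std::to_string(next_dummy++);
        result.push_back(Edge(e.first, dummy));
        result.push_back(Edge(dummy, e.second));
      }
      return result;
    }

For the graph above (A=>B, A=>C, B=>C), only A=>C is critical. After splitting,
no critical edge remains, so covering every block implies covering every edge.
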
Bitset
======

When the ``coverage_bitset=1`` run-time flag is given, the coverage will also
be dumped as a bitset (a text file with 1 for blocks that have been executed
and 0 for blocks that were not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%

For a given executable the length of the bitset is always the same (well,
unless ``dlopen``/``dlclose`` come into play), so the bitset coverage can be
easily used for bitset-based corpus distillation.

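One way to use the bitsets for distillation is a greedy pass that keeps an
input only if it sets a bit that no earlier kept input has set. The sketch
below assumes each bitset has been read into a string of ``0`` and ``1``
characters; ``DistillCorpus`` is a hypothetical name, and real distillation
tools may use a different selection strategy.

.. code-block:: c++

    #include <cstddef>
    #include <string>
    #include <vector>

    // Hypothetical helper: returns the indices of the inputs worth keeping.
    // bitsets[i] is the text of the bitset file produced by input i.
    std::vector<size_t> DistillCorpus(const std::vector<std::string> &bitsets) {
      std::vector<size_t> kept;
      std::string seen;  // Union of the bitsets of all kept inputs.
      for (size_t i = 0; i < bitsets.size(); ++i) {
        const std::string &bits = bitsets[i];
        if (seen.size() < bits.size()) seen.resize(bits.size(), '0');
        bool adds_coverage = false;
        for (size_t j = 0; j < bits.size(); ++j) {
          if (bits[j] == '1' && seen[j] == '0') {
            seen[j] = '1';
            adds_coverage = true;
          }
        }
        if (adds_coverage) kept.push_back(i);
      }
      return kept;
    }

Given the two bitsets shown above followed by a third, made-up bitset
``01001``, the first two inputs are kept and the third is dropped because it
covers nothing new.
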
Caller-callee coverage
======================

(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures caller and callee.  At shutdown, the process dumps a separate
file called ``caller-callee.PID.sancov``, which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees):

.. code-block:: console

    a.out 0x4a2e0c
    a.out 0x4a6510
    a.out 0x4a2e0c
    a.out 0x4a87f0

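A reader for this file only has to consume the lines two at a time. The sketch
below is an illustration under assumptions: the struct and function names are
hypothetical, module names are assumed to contain no whitespace, and addresses
are assumed to be hexadecimal as in the sample above.

.. code-block:: c++

    #include <cstdint>
    #include <sstream>
    #include <string>
    #include <vector>

    struct CodeLocation {
      std::string module_name;
      uint64_t address;
    };

    struct CallPair {
      CodeLocation caller;  // From an odd line.
      CodeLocation callee;  // From the following even line.
    };

    // Hypothetical helper: parses the text of a caller-callee file.
    std::vector<CallPair> ParseCallerCallee(const std::string &text) {
      std::istringstream in(text);
      std::vector<CallPair> pairs;
      std::string caller_module, caller_hex, callee_module, callee_hex;
      while (in >> caller_module >> caller_hex >> callee_module >> callee_hex) {
        CallPair pair;
        pair.caller.module_name = caller_module;
        // Base 16 accepts the leading "0x".
        pair.caller.address = std::stoull(caller_hex, nullptr, 16);
        pair.callee.module_name = callee_module;
        pair.callee.address = std::stoull(callee_hex, nullptr, 16);
        pairs.push_back(pair);
      }
      return pairs;
    }

For the four sample lines above this yields two pairs with the same caller,
``0x4a2e0c``, and two different callees.
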
Current limitations:

* Only the first 14 callees for every caller are recorded; the rest are silently
  ignored.
* The output format is not very compact since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.

Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`__'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information.  In addition to boolean values assigned to
every basic block (edge) the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`` and those 8-bit bitsets will
be dumped to disk.

.. code-block:: console

    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
    % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
    % ls -l *counters-sancov
    ... a.out.17110.counters-sancov
    % xxd *counters-sancov
    0000000: 0001 0100 01

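The mapping from a counter value to its range bit can be written down
directly. The sketch below is not the run-time's code: ``CounterToBitset`` is a
hypothetical name, and it assumes that ranges are numbered from the least
significant bit and that a counter of zero sets no bit.

.. code-block:: c++

    #include <cstdint>

    // Hypothetical helper: maps a counter value to the 8-bit value that has
    // only the bit of the matching range set. Ranges, from bit 0 to bit 7:
    // 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+.
    uint8_t CounterToBitset(uint32_t counter) {
      if (counter == 0) return 0;  // Never executed: no bit set (assumption).
      if (counter >= 128) return 1 << 7;
      if (counter >= 32) return 1 << 6;
      if (counter >= 16) return 1 << 5;
      if (counter >= 8) return 1 << 4;
      if (counter >= 4) return 1 << 3;
      return 1 << (counter - 1);  // Values 1, 2, 3 map to bits 0, 1, 2.
    }

Under these assumptions a block executed once maps to ``0x01`` and a block
executed 5 times maps to ``0x08``.
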
These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

    // The coverage instrumentation may optionally provide imprecise counters.
    // Rather than exposing the counter values to the user we instead map
    // the counters to a bitset.
    // Every counter is associated with 8 bits in the bitset.
    // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
    // The i-th bit is set to 1 if the counter value is in the i-th range.
    // This counter-based coverage implementation is *not* thread-safe.

    // Returns the number of registered coverage counters.
    uintptr_t __sanitizer_get_number_of_counters();
    // Updates the counter 'bitset', clears the counters and returns the number of
    // new bits in 'bitset'.
    // If 'bitset' is nullptr, only clears the counters.
    // Otherwise 'bitset' should be at least
    // __sanitizer_get_number_of_counters bytes long and 8-aligned.
    uintptr_t
    __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);

Tracing basic blocks
====================

Experimental support for basic block (or edge) tracing.
With ``-fsanitize-coverage=trace-bb`` the compiler will insert
``__sanitizer_cov_trace_basic_block(s32 *id)`` before every function, basic
block, or edge (depending on the value of
``-fsanitize-coverage=[func,bb,edge]``).
Example:

.. code-block:: console

    % clang -g -fsanitize=address -fsanitize-coverage=edge,trace-bb foo.cc
    % ASAN_OPTIONS=coverage=1 ./a.out

This produces two files when the process exits:
``trace-points.PID.sancov`` and ``trace-events.PID.sancov``.
The first file contains a textual description of all the instrumented points
in the program, one per line, in a form that you can feed into
``llvm-symbolizer`` (e.g. ``a.out 0x4dca89``).
The second file contains the actual execution trace as a sequence of 4-byte
integers; these integers are indices into the array of instrumented points
(the first file).

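Resolving a trace therefore means looking each 4-byte index up in the list of
points. A minimal sketch, assuming the events file uses the byte order of the
host and that indices are signed 32-bit integers (``DecodeTrace`` is a
hypothetical name):

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <string>
    #include <vector>

    // Hypothetical helper.
    // points: the lines of trace-points.PID.sancov, in file order.
    // events: the raw bytes of trace-events.PID.sancov.
    // Returns the executed points in execution order.
    std::vector<std::string> DecodeTrace(const std::vector<std::string> &points,
                                         const std::vector<uint8_t> &events) {
      std::vector<std::string> trace;
      for (size_t pos = 0; pos + 4 <= events.size(); pos += 4) {
        int32_t index = 0;
        std::memcpy(&index, &events[pos], sizeof(index));
        if (index >= 0 && static_cast<size_t>(index) < points.size())
          trace.push_back(points[index]);
        else
          trace.push_back("<invalid index>");
      }
      return trace;
    }

Each returned line (for example ``a.out 0x4dca89``) can then be passed to
``llvm-symbolizer`` to obtain source locations.
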
Basic block tracing is currently supported only for single-threaded applications.


Tracing PCs
===========

An *experimental* feature similar to tracing basic blocks, but with a different
API. With ``-fsanitize-coverage=trace-pc`` the compiler will insert
``__sanitizer_cov_trace_pc()`` on every edge.
With an additional ``...=trace-pc,indirect-calls`` flag,
``__sanitizer_cov_trace_pc_indirect(void *callee)`` will be inserted on every
indirect call.
These callbacks are not implemented in the Sanitizer run-time and should be
defined by the user, so these flags do not require any other sanitizer to be
used.
This mechanism is used for fuzzing the Linux kernel
(https://github.com/google/syzkaller) and can be used with
`AFL <http://lcamtuf.coredump.cx/afl>`__.

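A user-defined callback might look like the sketch below, which records each
distinct caller PC once. The storage scheme, the table size and the
``NumUniquePcs`` helper are assumptions made for illustration. The sketch is
not thread-safe, and the file that defines the callback should itself be
compiled without ``-fsanitize-coverage=trace-pc`` so that the callback is not
instrumented.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>

    namespace {
    const size_t kMaxPcs = 1 << 12;  // Arbitrary capacity for the sketch.
    uintptr_t g_pcs[kMaxPcs];
    size_t g_num_pcs = 0;
    }  // namespace

    // Called by instrumented code on every edge.
    extern "C" void __sanitizer_cov_trace_pc() {
      // The return address identifies the instrumented location.
      uintptr_t pc = reinterpret_cast<uintptr_t>(__builtin_return_address(0));
      for (size_t i = 0; i < g_num_pcs; ++i)
        if (g_pcs[i] == pc) return;  // Already recorded.
      if (g_num_pcs < kMaxPcs) g_pcs[g_num_pcs++] = pc;
    }

    // Hypothetical helper: number of distinct PCs seen so far.
    size_t NumUniquePcs() { return g_num_pcs; }

A fuzzer could compare ``NumUniquePcs()`` before and after running an input to
decide whether the input reached new code. The linear search keeps the example
short; a real implementation would likely use a hash table or a bitmap.
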
Tracing data flow
=================

An *experimental* feature to support data-flow-guided fuzzing.
With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra
instrumentation around comparison instructions and switch statements.
The fuzzer needs to define the following functions, which are called by the
instrumented code.

.. code-block:: c++

  // Called before a comparison instruction.
  // SizeAndType is a packed value containing
  //   - [63:32] the Size of the operands of comparison in bits
  //   - [31:0] the Type of comparison (one of ICMP_EQ, ... ICMP_SLE)
  // Arg1 and Arg2 are arguments of the comparison.
  void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1, uint64_t Arg2);

  // Called before a switch statement.
  // Val is the switch operand.
  // Cases[0] is the number of case constants.
  // Cases[1] is the size of Val in bits.
  // Cases[2:] are the case constants.
  void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

This interface is subject to change.
The current implementation is not thread-safe and thus can be safely used only
for single-threaded targets.

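As an illustration of how a fuzzer might consume these callbacks, the sketch
below unpacks ``SizeAndType`` and walks the ``Cases`` array as documented
above. The ``CmpEvent`` struct and the global vectors are hypothetical, and the
file should be compiled without ``-fsanitize-coverage=trace-cmp`` so that the
callbacks themselves are not instrumented.

.. code-block:: c++

    #include <cstdint>
    #include <vector>

    // Hypothetical record of one observed comparison.
    struct CmpEvent {
      uint32_t size_in_bits;
      uint32_t type;
      uint64_t arg1;
      uint64_t arg2;
    };

    std::vector<CmpEvent> g_cmp_events;         // Observed comparisons.
    std::vector<uint64_t> g_switch_constants;   // Observed case constants.

    extern "C" void __sanitizer_cov_trace_cmp(uint64_t SizeAndType,
                                              uint64_t Arg1, uint64_t Arg2) {
      CmpEvent event;
      event.size_in_bits = static_cast<uint32_t>(SizeAndType >> 32);  // [63:32]
      event.type = static_cast<uint32_t>(SizeAndType & 0xffffffffu);  // [31:0]
      event.arg1 = Arg1;
      event.arg2 = Arg2;
      g_cmp_events.push_back(event);
    }

    extern "C" void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases) {
      (void)Val;  // The operand itself is not used in this sketch.
      const uint64_t num_cases = Cases[0];
      // Cases[1] is the operand size in bits; the constants start at Cases[2].
      for (uint64_t i = 0; i < num_cases; ++i)
        g_switch_constants.push_back(Cases[2 + i]);
    }

The recorded operands and case constants could then be used by the fuzzer as
candidate values to splice into future inputs.
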
Output directory
================

By default, ``.sancov`` files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
    % ls -l /tmp/cov/*sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Sudden death
============

Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
memory-mapped file as soon as it is collected.

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
    main
    % ls
    7036.sancov.map  7036.sancov.raw  a.out
    % sancov.py rawunpack 7036.sancov.raw
    sancov.py: reading map 7036.sancov.map
    sancov.py: unpacking 7036.sancov.raw
    writing 1 PCs to a.out.7036.sancov
    % sancov.py print a.out.7036.sancov
    sancov.py: read 1 PCs from a.out.7036.sancov
    sancov.py: 1 files merged; 1 PCs total
    0x4b2bae

Note that on 64-bit platforms, this method writes 2x more data than the default,
because it stores full PC values instead of 32-bit offsets.

In-process fuzzing
==================

Coverage data can be useful for fuzzers, and sometimes it is preferable to run
a fuzzer in the same process as the code being fuzzed (in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>`` which returns the number of currently
covered entities in the program. This will tell the fuzzer if the coverage has
increased after testing every new input.

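The feedback loop can be sketched as follows. To keep the example
self-contained, the coverage query is passed in as a callable; in an
instrumented build one would pass ``__sanitizer_get_total_unique_coverage``
(assumed here to take no arguments and return an unsigned integer). The
function name ``KeepInterestingInputs`` is hypothetical.

.. code-block:: c++

    #include <cstdint>
    #include <functional>
    #include <string>
    #include <vector>

    // Hypothetical helper: runs every candidate through `target` and keeps
    // the inputs after which the value reported by `coverage` has grown.
    std::vector<std::string> KeepInterestingInputs(
        const std::vector<std::string> &candidates,
        const std::function<void(const std::string &)> &target,
        const std::function<uintptr_t()> &coverage) {
      std::vector<std::string> corpus;
      uintptr_t best = coverage();
      for (const std::string &input : candidates) {
        target(input);
        const uintptr_t now = coverage();
        if (now > best) {  // The input reached something new.
          best = now;
          corpus.push_back(input);
        }
      }
      return corpus;
    }

A real fuzzer would additionally mutate the kept inputs to generate new
candidates; this sketch shows only the coverage check.
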
If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process.  Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

Performance
===========

This coverage implementation is **fast**. With function-level coverage
(``-fsanitize-coverage=func``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
between 0 and 25%.

In the table below, each ``diff`` column is the ratio of the two named columns
(for example, ``diff 0-1`` is ``cov1 / cov0``).

==============  =========  =========  =========  =========  =========  =========
     benchmark      cov0        cov1   diff 0-1       cov2   diff 0-2   diff 1-2
==============  =========  =========  =========  =========  =========  =========
 400.perlbench    1296.00    1307.00       1.01    1465.00       1.13       1.12
     401.bzip2     858.00     854.00       1.00    1010.00       1.18       1.18
       403.gcc     613.00     617.00       1.01     683.00       1.11       1.11
       429.mcf     605.00     582.00       0.96     610.00       1.01       1.05
     445.gobmk     896.00     880.00       0.98    1050.00       1.17       1.19
     456.hmmer     892.00     892.00       1.00     918.00       1.03       1.03
     458.sjeng     995.00    1009.00       1.01    1217.00       1.22       1.21
462.libquantum     497.00     492.00       0.99     534.00       1.07       1.09
   464.h264ref    1461.00    1467.00       1.00    1543.00       1.06       1.05
   471.omnetpp     575.00     590.00       1.03     660.00       1.15       1.12
     473.astar     658.00     652.00       0.99     715.00       1.09       1.10
 483.xalancbmk     471.00     491.00       1.04     582.00       1.24       1.19
      433.milc     616.00     627.00       1.02     627.00       1.02       1.00
      444.namd     602.00     601.00       1.00     654.00       1.09       1.09
    447.dealII     630.00     634.00       1.01     653.00       1.04       1.03
    450.soplex     365.00     368.00       1.01     395.00       1.08       1.07
    453.povray     427.00     434.00       1.02     495.00       1.16       1.14
       470.lbm     357.00     375.00       1.05     370.00       1.04       0.99
   482.sphinx3     927.00     928.00       1.00    1000.00       1.08       1.08
==============  =========  =========  =========  =========  =========  =========

Why another coverage?
=====================

Why did we implement yet another code coverage?
  * We needed something that is lightning fast, plays well with
    AddressSanitizer, and does not significantly increase the binary size.
  * Traditional coverage implementations based on global counters
    `suffer from contention on counters
    <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.