Home | History | Annotate | Download | only in doc
      1 <?xml version="1.0" encoding="UTF-8" ?>
      2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
      3 <html xmlns="http://www.w3.org/1999/xhtml" lang="en">
      4 <head>
      5   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
      6   <link rel="stylesheet" href="resources/doc.css" charset="UTF-8" type="text/css" />
      7   <link rel="stylesheet" href="../coverage/jacoco-resources/prettify.css" charset="UTF-8" type="text/css" />
      8   <link rel="shortcut icon" href="resources/report.gif" type="image/gif" />
      9   <script type="text/javascript" src="../coverage/jacoco-resources/prettify.js"></script>
     10   <title>JaCoCo - Control Flow Analysis</title>
     11 </head>
     12 <body onload="prettyPrint()">
     13 
     14 <div class="breadcrumb">
     15   <a href="../index.html" class="el_report">JaCoCo</a> &gt;
     16   <a href="index.html" class="el_group">Documentation</a> &gt;
     17   <span class="el_source">Control Flow Analysis</span>
     18 </div>
     19 <div id="content"> 
     20 
     21 <h1>Control Flow Analysis for Java Methods</h1>
     22 
     23 <p class="hint">
     24   Implementing a coverage tool that supports statement (C0) as well as branch
     25   coverage coverage (C1) requires detailed analysis of the internal control flow
     26   of Java methods. Due to the architecture of JaCoCo this analysis happens on
     27   the bytecode of compiled class files. This document describes JaCoCo's
     28   strategies for inserting probes into the control flow at runtime and analyzing
     29   the actual code coverage. Marc R. Hoffmann, November 2011
     30 </p>
     31 
     32 <h2>Control Flow Graphs for Java Bytecode</h2>
     33 
     34 <p> 
     35   As an starting point we take the following example method that contains a
     36   single branching point:
     37 </p>
     38 
     39 <pre class="source lang-java linenums">
     40 public static void example() {
     41     a();
     42     if (cond()) {
     43         b();
     44     } else {
     45         c();
     46     }
     47     d();
     48 }
     49 </pre>
     50 
     51 <p>
     52   A Java compiler will create the following bytecode from this example method.
     53   Java bytecode is a linear sequence of instructions. Control flow is
     54   implemented with <i>jump</i> instructions like the conditional
     55   <code>IFEQ</code> or the unconditional <code>GOTO</code> opcode. The jump
     56   targets are technically relative offsets to the target instruction. For better
     57   readability we use symbolic labels (<code>L1</code>, <code>L2</code>) instead
     58   (also the ASM API uses such symbolic labels):  
     59 </p>
     60 
     61 <pre class="source linenums">
     62 public static example()V
     63       INVOKESTATIC a()V
     64       INVOKESTATIC cond()Z
     65       IFEQ L1
     66       INVOKESTATIC b()V
     67       GOTO L2
     68   L1: INVOKESTATIC c()V
     69   L2: INVOKESTATIC d()V
     70       RETURN
     71 </pre>
     72 
     73 <p>
     74   The possible control flow in the bytecode above can be represented by a graph.
     75   The nodes are byte code instruction, the edged of the graph represent the
     76   possible control flow between the instructions. The control flow of the
     77   example is shown in the left box of this diagram:
     78 </p>
     79 
     80 <img src="resources/flow-example.png" alt="Bytecode Control Flow"/>
     81 
     82 
     83 <h3>Flow Edges</h3>
     84 
     85 <p>
     86   The control flow graph of a Java method defined by Java byte code may have
     87   the following Edges. Each edge connects a source instruction with a target
     88   instruction. In some cases the source instruction or the target instruction
     89   does not exist (virtual edges for method entry and exit) or cannot be
     90   exactly specified (exception handlers). 
     91 </p>
     92 
     93 <table class="coverage">
     94   <thead>
     95     <tr>
     96       <td>Type</td>
     97       <td>Source</td>
     98       <td>Target</td>
     99       <td>Remarks</td>
    100     </tr>
    101   </thead>
    102   <tbody>
    103     <tr>
    104       <td>ENTRY</td>
    105       <td>-</td>
    106       <td>First instruction in method</td>
    107       <td></td>
    108     </tr>
    109     <tr>
    110       <td>SEQUENCE</td>
    111       <td>Instruction, except <code>GOTO</code>, <code>xRETURN</code>, 
    112         <code>THROW</code>, <code>TABLESWITCH</code> and <code>LOOKUPSWITCH</code></td>
    113       <td>Subsequent instruction</td>
    114       <td></td>
    115     </tr>
    116     <tr>
    117       <td>JUMP</td>
    118       <td><code>GOTO</code>, <code>IFx</code>, <code>TABLESWITCH</code> or
    119         <code>LOOKUPSWITCH</code>  instruction</td>
    120       <td>Target instruction</td>
    121       <td><code>TABLESWITCH</code> and <code>LOOKUPSWITCH</code> will define
    122         multiple edges.</td>
    123     </tr>
    124     <tr>
    125       <td>EXHANDLER</td>
    126       <td>Any instruction in handler scope</td>
    127       <td>Target instruction</td>
    128       <td></td>
    129     </tr>
    130     <tr>
    131       <td>EXIT</td>
    132       <td><code>xRETURN</code> or <code>THROW</code> instruction</td>
    133       <td>-</td>
    134       <td></td>
    135     </tr>
    136     <tr>
    137       <td>EXEXIT</td>
    138       <td>Any instruction</td>
    139       <td>-</td>
    140       <td>Unhandled exception.</td>
    141     </tr>
    142   </tbody>
    143 </table>
    144 
    145 <p>
    146   The current JaCoCo implementation ignores edges caused by implicit exceptions
    147   and the the method entry. This means we consider SEQUENCE, JUMP, EXIT.
    148 </p>
    149 
    150 
    151 <h2>Probe Insertion Strategy</h2>
    152 
    153 <p>
    154   Probes are additional instructions that can be inserted between existing
    155   instructions. They do not change the behavior of the method but record the
    156   fact that they have been executed. One can think probes are placed on edges of
    157   the control flow graph. Theoretically we could insert a probe at every edge of
    158   the control flow graph. As a probe implementation itself requires multiple
    159   bytecode instructions this would increase the size of the class files several
    160   times and significantly slow down execution speed of the instrumented classes.
    161   Fortunately this is not required, in fact we only need a few probes per method
    162   depending on the control flow of the method. For example a method without any
    163   branches requires a single probe only. The reason for this is that starting
    164   from a certain probe we can back-trace the execution path and typically get
    165   coverage information for multiple instructions.
    166 </p>
    167 
    168 <p>
    169   If a probe has been executed we know that the corresponding edge has been
    170   visited. From this edge we can conclude to other preceding nodes and edges: 
    171 </p>
    172 
    173 <ul>
    174   <li>If a edge has been visited, we know that the source node of the this edge
    175       has been executed.</li>
    176   <li>If a node has been executed and the node is the target of only one edge
    177       we know that this edge has been visited.</li>
    178 </ul>
    179 
    180 <p>
    181   Recursively applying these rules allows to determine the execution status of
    182   all instructions of a method &ndash; given that we have probes at the right
    183   positions. Therefore JaCoCo inserts probes      
    184 </p>
    185 
    186 <ul>
    187   <li>at every method exit (return or throws) and</li>
    188   <li>at every edge where the target instruction is the target of more than one
    189       edge.</li>
    190 </ul>
    191 
    192 <p>
    193   We recall that a probe is simply a small sequence of additional instructions
    194   that needs to be inserted at a control flow edge. The following table
    195   illustrates how this extra instructions are added in case of different edge
    196   types.
    197 </p>
    198 
    199 <table class="coverage">
    200   <thead>
    201     <tr>
    202       <td>Type</td>
    203       <td>Before</td>
    204       <td>After</td>
    205       <td>Remarks</td>
    206     </tr>
    207   </thead>
    208   <tbody>
    209     <tr>
    210       <td>SEQUENCE</td>
    211       <td><img src="resources/flow-sequence.png" alt="Sequence"/></td>
    212       <td><img src="resources/flow-sequence-probe.png" alt="Sequence with Probe"/></td>
    213       <td>
    214         In case of a simple sequence the probe is simply inserted between the
    215         two instructions. 
    216       </td>
    217     </tr>
    218     <tr>
    219       <td>JUMP (unconditional)</td>
    220       <td><img src="resources/flow-goto.png" alt="Unconditional Jump"/></td>
    221       <td><img src="resources/flow-goto-probe.png" alt="Unconditional Jump with Probe"/></td>
    222       <td>
    223         As an unconditional jump is executed in any case, we can also insert the
    224         probe just before the GOTO instruction.
    225       </td>
    226     </tr>
    227     <tr>
    228       <td>JUMP (conditional)</td>
    229       <td><img src="resources/flow-cond.png" alt="Conditional Jump"/></td>
    230       <td><img src="resources/flow-cond-probe.png" alt="Conditional Jump with Probe"/></td>
    231       <td>
    232         Adding a probe to an conditional jump is little bit more tricky. We
    233         invert the semantic of the opcode and add the probe right after the
    234         conditional jump. With a subsequent <code>GOTO</code> instruction we
    235         jump to the original target. Note that this approach will not introduce
    236         a backward jump, which would cause trouble with the Java verifier if we
    237         have an uninitialized object on the stack.
    238       </td>
    239     </tr>
    240     <tr>
    241       <td>EXIT</td>
    242       <td><img src="resources/flow-exit.png" alt="Exit"/></td>
    243       <td><img src="resources/flow-exit-probe.png" alt="Exit with Probe"/></td>
    244       <td>
    245         As is is the nature of RETURN and THROW statements to actually leave the
    246         method we add the probe right before these statements.
    247       </td>
    248     </tr>
    249   </tbody>
    250 </table>
    251 
    252 <p>
    253   Now let's see how this rules apply to the example snippet above. We see that
    254   <code>INVOKE d()</code> instruction is the only node with more than one
    255   incoming edge. So we need to place probes on those edges and another probe on
    256   the only exit node. The result is shown the the right box of the diagram
    257   above.
    258 </p>
    259 
    260 <h2>Additional Probes Between Lines</h2>
    261 
    262 <p>
    263   The probe insertion strategy described so far does not consider implicit
    264   exceptions thrown for example from invoked methods. If the control flow
    265   between two probes is interrupted by a exception not explicitly created with
    266   a <code>throw</code> statement all instruction in between are considered as
    267   not covered. This leads to unexpected results especially when the the block of
    268   instructions spans multiple lines of source code.
    269 </p>
    270 
    271 <p>
    272   Therefore JaCoCo adds an additional probe between the instructions of two
    273   lines whenever the subsequent line contains at least one method invocation.
    274   This limits the effect of implicit exceptions from method invocations to
    275   single lines of source. The approach only works for class files compiled with
    276   debug information (line numbers) and does not consider implicit exceptions
    277   from other instructions than method invocations (e.g.
    278   <code>NullPointerException</code> or <code>ArrayIndexOutOfBoundsException</code>).
    279 </p>
    280 
    281 <h2>Probe Implementation</h2>
    282 
    283 <p>
    284   Code coverage analysis is a runtime metric that provides execution details
    285   of the software under test. This requires detailed recording about the
    286   instructions (instruction coverage) that have been executed. For branch
    287   coverage also the outcome of decisions has to be recorded. In any case
    288   execution data is collected by so called probes:
    289 </p>
    290 
    291 <p class="hint">
    292   A <b>probe</b> is a sequence of bytecode instructions that can be inserted
    293   into a Java method. When the probe is executed, this fact is recorded and can
    294   be reported by the coverage runtime. The probe must not change the behavior
    295   of the original code.
    296 </p>
    297 
    298 <p>
    299   The only purpose of the probe is to record that it has been executed at least
    300   once. The probe does not record the number of times it has been called or
    301   collect any timing information. The latter is out of scope for code coverage
    302   analysis and more in the objective of a performance analysis tool. Typically
    303   multiple probes needs to be inserted into each method, therefore probes needs
    304   to be identified. Also the probe implementation and the storage mechanism it
    305   depends on needs to be thread safe as multi-threaded execution is a common
    306   scenario for java applications (albeit not for plain unit tests). Probes must
    307   not have any side effects on the original code of the method. Also they should
    308   add minimal overhead.
    309 </p>
    310 
    311 <p>
    312   So to summarize the requirements for execution probes:
    313 </p>
    314 
    315 <ul>
    316   <li>Record execution</li>
    317   <li>Identification for different probes</li>
    318   <li>Thread safe</li>
    319   <li>No side effects on application code</li>
    320   <li>Minimal runtime overhead</li>
    321 </ul>
    322 
    323 <p>
    324   JaCoCo implements probes with a <code>boolean[]</code> array instance per
    325   class. Each probe corresponds to a entry in this array. Whenever the probe is
    326   executed the entry is set to <code>true</code> with the following four
    327   bytecode instructions:
    328 </p>
    329 
    330 <pre class="source">
    331 ALOAD    probearray
    332 xPUSH    probeid
    333 ICONST_1
    334 BASTORE
    335 </pre>
    336 
    337 <p>
    338   Note that this probe code is thread safe, does not modify the operand stack
    339   or modify local variables and is also thread safe. It does also not leave the
    340   method though an external call. The only prerequisite is that the probe array
    341   is available as a local variable. For this at the beginning of each method
    342   additional instrumentation code needs to be added to obtain the array instance
    343   associated with the belonging class. To avoid code duplication the
    344   initialization is delegated to a static private method
    345   <code>$jacocoinit()</code> which is added to every non-interface class.
    346 </p>
    347 
    348 <p>
    349   The size of the probe code above depends on the position of the probe array
    350   variable and the value of the probe identifier as different opcodes can be
    351   used. As calculated in the table below the overhead per probe ranges between 4
    352   and 7 bytes of additional bytecode: 
    353 </p>
    354 
    355 <table class="coverage">
    356   <thead>
    357     <tr>
    358       <td>Possible Opcodes</td>
    359       <td>Min. Size [bytes]</td>
    360       <td>Max. Size [bytes]</td>
    361     </tr>
    362   </thead>
    363   <tfoot>
    364     <tr>
    365       <td>Total:</td>
    366       <td>4</td>
    367       <td>7</td>
    368     </tr>
    369   </tfoot>
    370   <tbody>
    371     <tr>
    372       <td><code>ALOAD_x</code>, <code>ALOAD</code> <sup>1</sup></td>
    373       <td>1</td>
    374       <td>2</td>
    375     </tr>
    376     <tr>
    377       <td><code>ICONST_x</code>, <code>BIPUSH</code>, <code>SIPUSH</code>, <code>LDC</code>, <code>LDC_W</code> <sup>2</sup></td>
    378       <td>1</td>
    379       <td>3</td>
    380     </tr>
    381     <tr>
    382       <td><code>ICONST_1</code></td>
    383       <td>1</td>
    384       <td>1</td>
    385     </tr>
    386     <tr>
    387       <td><code>BASTORE</code></td>
    388       <td>1</td>
    389       <td>1</td>
    390     </tr>
    391   </tbody>
    392 </table>
    393 
    394 <p>
    395   <sup>1</sup> The probe array is the first variable after the arguments.
    396   If the method arguments do not consume more that 3 slots the 1-byte opcode can
    397   be used.<br/>
    398   <sup>2</sup> 1-byte opcodes for ids 0 to 5, 2-byte opcode for ids up to 127,
    399   3-byte opcode for ids up to 32767. Ids values of 32768 or more require an
    400   additional constant pool entry. For normal class files it is very unlikely to
    401   require more than 32,000 probes.
    402 </p>
    403 
    404 <h2>Performance</h2>
    405 
    406 <p>
    407   The control flow analysis and probe insertion strategy described in this
    408   document allows to efficiently record instruction and branch coverage. In
    409   total classes instrumented with JaCoCo increase their size by about 30%. Due
    410   to the fact that probe execution does not require any method calls, only local
    411   instructions, the observed execution time overhead for instrumented
    412   applications typically is less than 10%. 
    413 </p>
    414 
    415 <h2>References</h2>
    416 
    417 <ul>
    418   <li><a href="http://asm.objectweb.org/">ASM byte code library</a> by Eric Bruneton at al.</li>
    419   <li><a href="http://andrei.gmxhome.de/bytecode/index.html">Bytecode Outline Plug-In</a> by Andrei Loskutov</li>
    420   <li><a href="http://en.wikipedia.org/wiki/Glossary_of_graph_theory">Wikipedia: Glossary of Graph Theory</a></li>
    421 </ul>
    422 
    423 </div>
    424 <div class="footer">
    425   <div class="right"><a href="@jacoco.home.url@">JaCoCo</a> @qualified.bundle.version@</div>
    426   <a href="license.html">Copyright</a> &copy; @copyright.years@ Mountainminds GmbH &amp; Co. KG and Contributors
    427 </div>
    428 
    429 </body>
    430 </html>
    431