1 <?xml version="1.0" encoding="UTF-8" ?> 2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> 3 <html xmlns="http://www.w3.org/1999/xhtml" lang="en"> 4 <head> 5 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> 6 <link rel="stylesheet" href=".resources/doc.css" charset="UTF-8" type="text/css" /> 7 <link rel="stylesheet" href="../coverage/.resources/prettify.css" charset="UTF-8" type="text/css" /> 8 <link rel="shortcut icon" href=".resources/report.gif" type="image/gif" /> 9 <script type="text/javascript" src="../coverage/.resources/prettify.js"></script> 10 <title>JaCoCo - Control Flow Analysis</title> 11 </head> 12 <body onload="prettyPrint()"> 13 14 <div class="breadcrumb"> 15 <a href="../index.html" class="el_report">JaCoCo</a> > 16 <a href="index.html" class="el_group">Documentation</a> > 17 <span class="el_source">Control Flow Analysis</span> 18 </div> 19 <div id="content"> 20 21 <h1>Control Flow Analysis for Java Methods</h1> 22 23 <p class="hint"> 24 Implementing a coverage tool that supports statement (C0) as well as branch 25 coverage coverage (C1) requires detailed analysis of the internal control flow 26 of Java methods. Due to the architecture of JaCoCo this analysis happens on 27 the bytecode of compiled class files. This document describes JaCoCo's 28 strategies for inserting probes into the control flow at runtime and analyzing 29 the actual code coverage. Marc R. Hoffmann, November 2011 30 </p> 31 32 <h2>Control Flow Graphs for Java Bytecode</h2> 33 34 <p> 35 As an starting point we take the following example method that contains a 36 single branching point: 37 </p> 38 39 <pre class="source lang-java linenums"> 40 public static void example() { 41 a(); 42 if (cond()) { 43 b(); 44 } else { 45 c(); 46 } 47 d(); 48 } 49 </pre> 50 51 <p> 52 A Java compiler will create the following bytecode from this example method. 53 Java bytecode is a linear sequence of instructions. Control flow is 54 implemented with <i>jump</i> instructions like the conditional 55 <code>IFEQ</code> or the unconditional <code>GOTO</code> opcode. The jump 56 targets are technically relative offsets to the target instruction. For better 57 readability we use symbolic labels (<code>L1</code>, <code>L2</code>) instead 58 (also the ASM API uses such symbolic labels): 59 </p> 60 61 <pre class="source linenums"> 62 public static example()V 63 INVOKESTATIC a()V 64 INVOKESTATIC cond()Z 65 IFEQ L1 66 INVOKESTATIC b()V 67 GOTO L2 68 L1: INVOKESTATIC c()V 69 L2: INVOKESTATIC d()V 70 RETURN 71 </pre> 72 73 <p> 74 The possible control flow in the bytecode above can be represented by a graph. 75 The nodes are byte code instruction, the edged of the graph represent the 76 possible control flow between the instructions. The control flow of the 77 example is shown in the left box of this diagram: 78 </p> 79 80 <img src=".resources/flow-example.png" alt="Bytecode Control Flow"/> 81 82 83 <h3>Flow Edges</h3> 84 85 <p> 86 The control flow graph of a Java method defined by Java byte code may have 87 the following Edges. Each edge connects a source instruction with a target 88 instruction. In some cases the source instruction or the target instruction 89 does not exist (virtual edges for method entry and exit) or cannot be 90 exactly specified (exception handlers). 91 </p> 92 93 <table class="coverage"> 94 <thead> 95 <tr> 96 <td>Type</td> 97 <td>Source</td> 98 <td>Target</td> 99 <td>Remarks</td> 100 </tr> 101 </thead> 102 <tbody> 103 <tr> 104 <td>ENTRY</td> 105 <td>-</td> 106 <td>First instruction in method</td> 107 <td></td> 108 </tr> 109 <tr> 110 <td>SEQUENCE</td> 111 <td>Instruction, except <code>GOTO</code>, <code>xRETURN</code>, 112 <code>THROW</code>, <code>TABLESWITCH</code> and <code>LOOKUPSWITCH</code></td> 113 <td>Subsequent instruction</td> 114 <td></td> 115 </tr> 116 <tr> 117 <td>JUMP</td> 118 <td><code>GOTO</code>, <code>IFx</code>, <code>TABLESWITCH</code> or 119 <code>LOOKUPSWITCH</code> instruction</td> 120 <td>Target instruction</td> 121 <td><code>TABLESWITCH</code> and <code>LOOKUPSWITCH</code> will define 122 multiple edges.</td> 123 </tr> 124 <tr> 125 <td>EXHANDLER</td> 126 <td>Any instruction in handler scope</td> 127 <td>Target instruction</td> 128 <td></td> 129 </tr> 130 <tr> 131 <td>EXIT</td> 132 <td><code>xRETURN</code> or <code>THROW</code> instruction</td> 133 <td>-</td> 134 <td></td> 135 </tr> 136 <tr> 137 <td>EXEXIT</td> 138 <td>Any instruction</td> 139 <td>-</td> 140 <td>Unhandled exception.</td> 141 </tr> 142 </tbody> 143 </table> 144 145 <p> 146 The current JaCoCo implementation ignores edges caused by implicit exceptions 147 and the the method entry. This means we consider SEQUENCE, JUMP, EXIT. 148 </p> 149 150 151 <h2>Probe Insertion Strategy</h2> 152 153 <p> 154 Probes are additional instructions that can be inserted between existing 155 instructions. They do not change the behavior of the method but record the 156 fact that they have been executed. One can think probes are placed on edges of 157 the control flow graph. Theoretically we could insert a probe at every edge of 158 the control flow graph. As a probe implementation itself requires multiple 159 bytecode instructions this would increase the size of the class files several 160 times and significantly slow down execution speed of the instrumented classes. 161 Fortunately this is not required, in fact we only need a few probes per method 162 depending on the control flow of the method. For example a method without any 163 branches requires a single probe only. The reason for this is that starting 164 from a certain probe we can back-trace the execution path and typically get 165 coverage information for multiple instructions. 166 </p> 167 168 <p> 169 If a probe has been executed we know that the corresponding edge has been 170 visited. From this edge we can conclude to other preceding nodes and edges: 171 </p> 172 173 <ul> 174 <li>If a edge has been visited, we know that the source node of the this edge 175 has been executed.</li> 176 <li>If a node has been executed and the node is the target of only one edge 177 we know that this edge has been visited.</li> 178 </ul> 179 180 <p> 181 Recursively applying these rules allows to determine the execution status of 182 all instructions of a method – given that we have probes at the right 183 positions. Therefore JaCoCo inserts probes 184 </p> 185 186 <ul> 187 <li>at every method exit (return or throws) and</li> 188 <li>at every edge where the target instruction is the target of more than one 189 edge.</li> 190 </ul> 191 192 <p> 193 We recall that a probe is simply a small sequence of additional instructions 194 that needs to be inserted at a control flow edge. The following table 195 illustrates how this extra instructions are added in case of different edge 196 types. 197 </p> 198 199 <table class="coverage"> 200 <thead> 201 <tr> 202 <td>Type</td> 203 <td>Before</td> 204 <td>After</td> 205 <td>Remarks</td> 206 </tr> 207 </thead> 208 <tbody> 209 <tr> 210 <td>SEQUENCE</td> 211 <td><img src=".resources/flow-sequence.png" alt="Sequence"/></td> 212 <td><img src=".resources/flow-sequence-probe.png" alt="Sequence with Probe"/></td> 213 <td> 214 In case of a simple sequence the probe is simply inserted between the 215 two instructions. 216 </td> 217 </tr> 218 <tr> 219 <td>JUMP (unconditional)</td> 220 <td><img src=".resources/flow-goto.png" alt="Unconditional Jump"/></td> 221 <td><img src=".resources/flow-goto-probe.png" alt="Unconditional Jump with Probe"/></td> 222 <td> 223 As an unconditional jump is executed in any case, we can also insert the 224 probe just before the GOTO instruction. 225 </td> 226 </tr> 227 <tr> 228 <td>JUMP (conditional)</td> 229 <td><img src=".resources/flow-cond.png" alt="Conditional Jump"/></td> 230 <td><img src=".resources/flow-cond-probe.png" alt="Conditional Jump with Probe"/></td> 231 <td> 232 Adding a probe to an conditional jump is little bit more tricky. We 233 invert the semantic of the opcode and add the probe right after the 234 conditional jump. With a subsequent <code>GOTO</code> instruction we 235 jump to the original target. Note that this approach will not introduce 236 a backward jump, which would cause trouble with the Java verifier if we 237 have an uninitialized object on the stack. 238 </td> 239 </tr> 240 <tr> 241 <td>EXIT</td> 242 <td><img src=".resources/flow-exit.png" alt="Exit"/></td> 243 <td><img src=".resources/flow-exit-probe.png" alt="Exit with Probe"/></td> 244 <td> 245 As is is the nature of RETURN and THROW statements to actually leave the 246 method we add the probe right before these statements. 247 </td> 248 </tr> 249 </tbody> 250 </table> 251 252 <p> 253 Now let's see how this rules apply to the example snippet above. We see that 254 <code>INVOKE d()</code> instruction is the only node with more than one 255 incoming edge. So we need to place probes on those edges and another probe on 256 the only exit node. The result is shown the the right box of the diagram 257 above. 258 </p> 259 260 <h2>Additional Probes Between Lines</h2> 261 262 <p> 263 The probe insertion strategy described so far does not consider implicit 264 exceptions thrown for example from invoked methods. If the control flow 265 between two probes is interrupted by a exception not explicitly created with 266 a <code>throw</code> statement all instruction in between are considered as 267 not covered. This leads to unexpected results especially when the the block of 268 instructions spans multiple lines of source code. 269 </p> 270 271 <p> 272 Therefore JaCoCo adds an additional probe between the instructions of two 273 lines whenever the subsequent line contains at least one method invocation. 274 This limits the effect of implicit exceptions from method invocations to 275 single lines of source. The approach only works for class files compiled with 276 debug information (line numbers) and does not consider implicit exceptions 277 from other instructions than method invocations (e.g. 278 <code>NullPointerException</code> or <code>ArrayIndexOutOfBoundsException</code>). 279 </p> 280 281 <h2>Probe Implementation</h2> 282 283 <p> 284 Code coverage analysis is a runtime metric that provides execution details 285 of the software under test. This requires detailed recording about the 286 instructions (instruction coverage) that have been executed. For branch 287 coverage also the outcome of decisions has to be recorded. In any case 288 execution data is collected by so called probes: 289 </p> 290 291 <p class="hint"> 292 A <b>probe</b> is a sequence of bytecode instructions that can be inserted 293 into a Java method. When the probe is executed, this fact is recorded and can 294 be reported by the coverage runtime. The probe must not change the behavior 295 of the original code. 296 </p> 297 298 <p> 299 The only purpose of the probe is to record that it has been executed at least 300 once. The probe does not record the number of times it has been called or 301 collect any timing information. The latter is out of scope for code coverage 302 analysis and more in the objective of a performance analysis tool. Typically 303 multiple probes needs to be inserted into each method, therefore probes needs 304 to be identified. Also the probe implementation and the storage mechanism it 305 depends on needs to be thread safe as multi-threaded execution is a common 306 scenario for java applications (albeit not for plain unit tests). Probes must 307 not have any side effects on the original code of the method. Also they should 308 add minimal overhead. 309 </p> 310 311 <p> 312 So to summarize the requirements for execution probes: 313 </p> 314 315 <ul> 316 <li>Record execution</li> 317 <li>Identification for different probes</li> 318 <li>Thread safe</li> 319 <li>No side effects on application code</li> 320 <li>Minimal runtime overhead</li> 321 </ul> 322 323 <p> 324 JaCoCo implements probes with a <code>boolean[]</code> array instance per 325 class. Each probe corresponds to a entry in this array. Whenever the probe is 326 executed the entry is set to <code>true</code> with the following four 327 bytecode instructions: 328 </p> 329 330 <pre class="source"> 331 ALOAD probearray 332 xPUSH probeid 333 ICONST_1 334 BASTORE 335 </pre> 336 337 <p> 338 Note that this probe code is thread safe, does not modify the operand stack 339 or modify local variables and is also thread safe. It does also not leave the 340 method though an external call. The only prerequisite is that the probe array 341 is available as a local variable. For this at the beginning of each method 342 additional instrumentation code needs to be added to obtain the array instance 343 associated with the belonging class. To avoid code duplication the 344 initialization is delegated to a static private method 345 <code>$jacocoinit()</code> which is added to every non-interface class. 346 </p> 347 348 <p> 349 The size of the probe code above depends on the position of the probe array 350 variable and the value of the probe identifier as different opcodes can be 351 used. As calculated in the table below the overhead per probe ranges between 4 352 and 7 bytes of additional bytecode: 353 </p> 354 355 <table class="coverage"> 356 <thead> 357 <tr> 358 <td>Possible Opcodes</td> 359 <td>Min. Size [bytes]</td> 360 <td>Max. Size [bytes]</td> 361 </tr> 362 </thead> 363 <tfoot> 364 <tr> 365 <td>Total:</td> 366 <td>4</td> 367 <td>7</td> 368 </tr> 369 </tfoot> 370 <tbody> 371 <tr> 372 <td><code>ALOAD_x</code>, <code>ALOAD</code> <sup>1</sup></td> 373 <td>1</td> 374 <td>2</td> 375 </tr> 376 <tr> 377 <td><code>ICONST_x</code>, <code>BIPUSH</code>, <code>SIPUSH</code>, <code>LDC</code>, <code>LDC_W</code> <sup>2</sup></td> 378 <td>1</td> 379 <td>3</td> 380 </tr> 381 <tr> 382 <td><code>ICONST_1</code></td> 383 <td>1</td> 384 <td>1</td> 385 </tr> 386 <tr> 387 <td><code>BASTORE</code></td> 388 <td>1</td> 389 <td>1</td> 390 </tr> 391 </tbody> 392 </table> 393 394 <p> 395 <sup>1</sup> The probe array is the first variable after the arguments. 396 If the method arguments do not consume more that 3 slots the 1-byte opcode can 397 be used.<br/> 398 <sup>2</sup> 1-byte opcodes for ids 0 to 5, 2-byte opcode for ids up to 127, 399 3-byte opcode for ids up to 32767. Ids values of 32768 or more require an 400 additional constant pool entry. For normal class files it is very unlikely to 401 require more than 32,000 probes. 402 </p> 403 404 <h2>Performance</h2> 405 406 <p> 407 The control flow analysis and probe insertion strategy described in this 408 document allows to efficiently record instruction and branch coverage. In 409 total classes instrumented with JaCoCo increase their size by about 30%. Due 410 to the fact that probe execution does not require any method calls, only local 411 instructions, the observed execution time overhead for instrumented 412 applications typically is less than 10%. 413 </p> 414 415 <h2>References</h2> 416 417 <ul> 418 <li><a href="http://asm.objectweb.org/">ASM byte code library</a> by Eric Bruneton at al.</li> 419 <li><a href="http://andrei.gmxhome.de/bytecode/index.html">Bytecode Outline Plug-In</a> by Andrei Loskutov</li> 420 <li><a href="http://en.wikipedia.org/wiki/Glossary_of_graph_theory">Wikipedia: Glossary of Graph Theory</a></li> 421 </ul> 422 423 </div> 424 <div class="footer"> 425 <div class="right"><a href="@jacoco.home.url@">JaCoCo</a> @qualified.bundle.version@</div> 426 <a href="license.html">Copyright</a> © @copyright.years@ Mountainminds GmbH & Co. KG and Contributors 427 </div> 428 429 </body> 430 </html> 431