      1 <?xml version="1.0" encoding="UTF-8" ?>
      2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
      3 <html xmlns="http://www.w3.org/1999/xhtml" lang="en">
      4 <head>
      5   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
      6   <link rel="stylesheet" href="resources/doc.css" charset="UTF-8" type="text/css" />
      7   <link rel="stylesheet" href="../coverage/jacoco-resources/prettify.css" charset="UTF-8" type="text/css" />
      8   <link rel="shortcut icon" href="resources/report.gif" type="image/gif" />
      9   <script type="text/javascript" src="../coverage/jacoco-resources/prettify.js"></script>
     10   <title>JaCoCo - Implementation Design</title>
     11 </head>
     12 <body onload="prettyPrint()">
     13 
     14 <div class="breadcrumb">
     15   <a href="../index.html" class="el_report">JaCoCo</a> &gt;
     16   <a href="index.html" class="el_group">Documentation</a> &gt;
     17   <span class="el_source">Implementation Design</span>
     18 </div>
     19 <div id="content"> 
     20 
     21 <h1>Implementation Design</h1>
     22 
     23 <p>
  This is an unordered list of implementation design decisions. Each topic tries
     25   to follow this structure:
     26 </p>
     27 
     28 <ul>
     29   <li>Problem statement</li>
     30   <li>Proposed Solution</li>
     31   <li>Alternatives and Discussion</li>
     32 </ul>
     33 
     34 
     35 <h2>Coverage Analysis Mechanism</h2>
     36 
     37 <p class="intro">
  Coverage information has to be collected at runtime. For this purpose JaCoCo
  creates instrumented versions of the original class definitions. The
  instrumentation process happens on the fly during class loading using
  so-called Java agents.
     42 </p>
     43 
     44 <p>
  There are several different approaches to collecting coverage information.
  For each approach different implementation techniques are known. The following
  diagram gives an overview, with the techniques used by JaCoCo highlighted:
     48 </p>
     49 
     50 <img src="resources/implementation.png" alt="Coverage Implementation Techniques"/>
     51 
     52 <p>
     53   Byte code instrumentation is very fast, can be implemented in pure Java and
     54   works with every Java VM. On-the-fly instrumentation with the Java agent
     55   hook can be added to the JVM without any modification of the target
     56   application.
     57 </p>
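
<p>
  As a minimal sketch of the agent hook described above, the following class
  registers a <code>ClassFileTransformer</code> through the
  <code>java.lang.instrument</code> API. The class names
  <code>SketchAgent</code> and <code>PassThroughTransformer</code> are
  hypothetical and not part of JaCoCo. The transformer leaves every class
  unchanged, where a coverage tool would return instrumented bytes instead:
</p>

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

/** Hypothetical agent class for illustration; not part of JaCoCo. */
final class SketchAgent {

    /** Called by the JVM before main() when started with -javaagent. */
    public static void premain(String options, Instrumentation inst) {
        inst.addTransformer(new PassThroughTransformer());
    }

    /** Sees the raw bytes of every class before it is defined. */
    static final class PassThroughTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                Class<?> classBeingRedefined, ProtectionDomain domain,
                byte[] classfileBuffer) {
            // A coverage tool would insert its probes into classfileBuffer
            // here and return the modified class definition.
            return null; // null tells the JVM to keep the class unchanged
        }
    }
}
```

<p>
  Such an agent is packaged in a JAR file whose manifest names the class in a
  <code>Premain-Class</code> entry and is activated with the
  <code>-javaagent</code> JVM option.
</p>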
     58 
     59 <p>
  The Java agent hook requires a JVM of version 1.5 or later. Class files
  compiled with debug information (line numbers) allow for source code
  highlighting. Unfortunately, some Java language constructs get compiled to
  byte code that produces unexpected highlighting results, especially in case of
  implicitly generated code like default constructors or control structures for
  finally statements.
     65 </p>
     66 
     67 
     68 <h2>Coverage Agent Isolation</h2>
     69 
     70 <p class="intro">
  The Java agent is loaded by the application class loader. Therefore the
  classes of the agent live in the same namespace as the application classes,
  which can result in clashes, especially with the third-party library ASM. The
  JaCoCo build therefore moves all agent classes into a unique package.
     75 </p>
     76 
     77 <p>
  The JaCoCo build renames all classes contained in the
  <code>jacocoagent.jar</code> into classes with an
  <code>org.jacoco.agent.rt_&lt;randomid&gt;</code> prefix, including the
  required ASM library classes. The identifier is created from a random number.
  As the agent does not provide any API, no one should be affected by this
  renaming. This trick also allows JaCoCo's own tests to be verified with
  JaCoCo.
     85 </p>
     86 
     87 
     88 <h2>Minimal Java Version</h2>
     89 
     90 <p class="intro">
     91   JaCoCo requires Java 1.5.
     92 </p>
     93 
     94 <p>
  The Java agent mechanism used for on-the-fly instrumentation became available
  with Java 1.5 VMs. Coding and testing with Java 1.5 language level is more
  efficient, less error-prone &ndash; and more fun than with older versions.
  JaCoCo can still be run against Java code compiled for these older versions.
     99 </p>
    100 
    101 
    102 <h2>Byte Code Manipulation</h2>
    103 
    104 <p class="intro">
    105   Instrumentation requires mechanisms to modify and generate Java byte code.
    106   JaCoCo uses the ASM library for this purpose internally.
    107 </p>
    108 
    109 <p>
    110   Implementing the Java byte code specification would be an extensive and
    111   error-prone task. Therefore an existing library should be used. The
    112   <a href="http://asm.objectweb.org/">ASM</a> library is lightweight, easy to
    113   use and very efficient in terms of memory and CPU usage. It is actively
  maintained and includes a huge regression test suite. Its simplified BSD
    115   license is approved by the Eclipse Foundation for usage with EPL products.
    116 </p>
    117 
    118 <h2>Java Class Identity</h2>
    119 
    120 <p class="intro">
    121   Each class loaded at runtime needs a unique identity to associate coverage data with.
    122   JaCoCo creates such identities by a CRC64 hash code of the raw class definition.
    123 </p>
    124 
    125 <p>
  In multi-classloader environments the plain name of a class does not
  unambiguously identify a class. For example, OSGi allows different versions
  of the same class to be loaded within the same VM. In complex deployment
  scenarios the actual version of the test target might be different from the
  current development version. A code coverage report should guarantee that
  the presented figures are extracted from a valid test target. A hash code of
  the class definitions makes it possible to differentiate between classes and
  versions of classes. The CRC64 hash computation is simple and fast, resulting
  in a small 64-bit identifier.
    135 </p>
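
<p>
  The idea of hashing the raw class definition can be sketched with a
  table-driven CRC-64. The class name <code>Crc64Sketch</code> and the
  polynomial constant are assumptions chosen for illustration; this text does
  not specify which CRC-64 variant JaCoCo uses:
</p>

```java
/** Hypothetical helper for illustration; not the JaCoCo implementation. */
final class Crc64Sketch {

    // Assumed polynomial in reflected (bit-reversed) form.
    private static final long POLY_REFLECTED = 0xd800000000000000L;

    private static final long[] TABLE = new long[256];

    static {
        // Precompute the CRC of every possible byte value.
        for (int i = 0; i < 256; i++) {
            long v = i;
            for (int bit = 0; bit < 8; bit++) {
                v = (v & 1) == 1 ? (v >>> 1) ^ POLY_REFLECTED : v >>> 1;
            }
            TABLE[i] = v;
        }
    }

    /** Returns a 64-bit checksum of the given raw class definition. */
    static long checksum(byte[] classBytes) {
        long sum = 0;
        for (byte b : classBytes) {
            int index = ((int) sum ^ b) & 0xff;
            sum = (sum >>> 8) ^ TABLE[index];
        }
        return sum;
    }
}
```

<p>
  Two byte-identical class files always yield the same identifier, regardless
  of the class loader that defines them, while any change to the bytes is very
  likely to yield a different one.
</p>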
    136 
    137 <p>
  The same class definition might be loaded by several class loaders, which
  results in different classes for the Java runtime system. For coverage
  analysis this distinction should be irrelevant. Class definitions might be
  altered by other instrumentation-based technologies (e.g. AspectJ). In this
  case the hash code will change and the identity gets lost. On the other hand,
  code coverage analysis based on classes that have been altered in some way
  will produce unexpected results. The CRC64 code might produce so-called
  <i>collisions</i>, i.e. the same hash code for two different classes.
  Although CRC64 is not cryptographically strong and collision examples can be
  easily computed, for regular class files the collision probability is very
  low.
    148 </p>
    149 
    150 <h2>Coverage Runtime Dependency</h2>
    151 
    152 <p class="intro">
  Instrumented code typically gets a dependency on a coverage runtime which is
  responsible for collecting and storing execution data. JaCoCo uses only JRE
  types in generated instrumentation code.
    156 </p>
    157 
    158 <p>
  Making a runtime library available to all instrumented classes can be a
  painful or impossible task in frameworks that use their own class loading
  mechanisms. Since Java 1.6 <code>java.lang.instrument.Instrumentation</code>
  has an API to extend the bootstrap loader. As our minimum target is
  Java 1.5, JaCoCo decouples the instrumented classes and the coverage runtime
  through official JRE API types only. The instrumented classes communicate
  with the runtime through the <code>Object.equals(Object)</code> method. An
  instrumented class can retrieve its probe array instance with the following
  code. Note that only JRE APIs are used:
    168 </p>
    169 
    170 
    171 <pre class="source lang-java linenums">
Object access = ...                           // Retrieve instance

Object[] args = new Object[3];
args[0] = Long.valueOf(8060044182221863588L); // class id
args[1] = "com/example/MyClass";              // class name
args[2] = Integer.valueOf(24);                // probe count

access.equals(args);                          // runtime replaces args[0]

boolean[] probes = (boolean[]) args[0];
    182 </pre>
    183 
    184 <p>
  The trickiest part takes place in line 1 and is not shown in the snippet
  above: the object instance providing access to the coverage runtime through
  its <code>equals()</code> method has to be obtained. Different approaches have
  been implemented and tested so far:
    189 </p>
    190 
    191 <ul>
    192   <li><b><code>SystemPropertiesRuntime</code></b>: This approach stores the
    193     object instance under a system property. This solution breaks the contract
    194     that system properties must only contain <code>java.lang.String</code>
    195     values and therefore causes trouble in applications that rely on this
    196     definition (e.g. Ant).</li>
  <li><b><code>LoggerRuntime</code></b>: Here we use a shared
    <code>java.util.logging.Logger</code> and communicate through the logging
    parameter array instead of an <code>equals()</code> method. The coverage
    runtime registers a custom <code>Handler</code> to receive the parameter
    array. This approach might break environments that install their own log
    managers (e.g. Glassfish).</li>
  <li><b><code>URLStreamHandlerRuntime</code></b>: This runtime registers a
    <code>URLStreamHandler</code> for a "jacoco-xxxxx" protocol. Instrumented
    classes open a connection on this protocol. The returned connection object
    is the one that provides access to the coverage runtime through its
    <code>equals()</code> method. However, to register the protocol the runtime
    needs to access internal members of the <code>java.net.URL</code> class.</li>
  <li><b><code>ModifiedSystemClassRuntime</code></b>: This approach adds a
    public static field to an existing JRE class through instrumentation. Unlike
    the other approaches above, this is only possible in environments where a
    Java agent is active.</li>
    213 </ul>
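
<p>
  The runtime side of the <code>equals()</code> protocol, combined with the
  system property lookup from the first approach in the list, might look
  roughly like the following sketch. All names (<code>RuntimeAccessSketch</code>,
  <code>RuntimeAccess</code>, the property key) are hypothetical, and the
  in-memory map merely stands in for the real execution data store:
</p>

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of the equals() protocol; not JaCoCo code. */
final class RuntimeAccessSketch {

    /** Assumed property key, chosen for this sketch only. */
    static final String KEY = "jacoco-sketch-access";

    /** Runtime-side object that hands out probe arrays through equals(). */
    static final class RuntimeAccess {
        private final Map<Long, boolean[]> store = new HashMap<Long, boolean[]>();

        @Override
        public boolean equals(Object other) {
            if (other instanceof Object[]) {
                Object[] args = (Object[]) other;
                Long classId = (Long) args[0];
                int probeCount = ((Integer) args[2]).intValue();
                boolean[] probes = store.get(classId);
                if (probes == null) {
                    probes = new boolean[probeCount];
                    store.put(classId, probes);
                }
                args[0] = probes; // result is passed back in the same array
            }
            return super.equals(other);
        }

        @Override
        public int hashCode() {
            return super.hashCode();
        }
    }

    /** Runtime side: publish the access object under a system property. */
    static void install() {
        System.getProperties().put(KEY, new RuntimeAccess());
    }

    /** Instrumented side: uses JRE types only, as in the snippet above. */
    static boolean[] getProbes(long classId, String className, int probeCount) {
        Object access = System.getProperties().get(KEY);
        Object[] args = new Object[3];
        args[0] = Long.valueOf(classId);
        args[1] = className;
        args[2] = Integer.valueOf(probeCount);
        access.equals(args);
        return (boolean[]) args[0];
    }
}
```

<p>
  Note that this sketch stores a non-<code>String</code> value in the system
  properties and therefore shares the drawback described for
  <code>SystemPropertiesRuntime</code>.
</p>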
    214 
    215 <p>
  The current JaCoCo Java agent implementation uses the
  <code>ModifiedSystemClassRuntime</code>, adding a field to the class
  <code>java.lang.UnknownError</code>. Versions 0.5.0 - 0.7.9 added a field
  to the class <code>java.util.UUID</code>, which had a greater chance of
  conflicts with other agents.
    221 </p>
    222 
    223 
    224 <h2>Memory Usage</h2>
    225 
    226 <p class="intro">
  Coverage analysis for huge projects with several thousand classes or hundreds
  of thousands of lines of code should be possible. To allow this with
  reasonable memory usage, the coverage analysis is based on streaming patterns
  and "depth first" traversals.
    231 </p>
    232 
    233 <p>
  The complete data tree of a huge coverage report is too big to fit into a
  reasonable heap memory configuration. Therefore the coverage analysis and
  report generation are implemented as "depth first" traversals, which means
  that at any point in time only the following data has to be held in working
  memory:
    238 </p>
    239 
    240 <ul>
    241   <li>A single class which is currently processed.</li>
    242   <li>The summary information of all parents of this class (package, groups).</li>
    243 </ul>
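
<p>
  The following sketch illustrates the memory pattern described above for a
  single package: classes arrive one at a time from an iterator, and only the
  current class plus the running summary of its parent are kept. The types
  <code>ClassCoverage</code> and <code>Summary</code> are hypothetical and much
  simpler than JaCoCo's actual data model:
</p>

```java
import java.util.Iterator;

/** Hypothetical illustration of streaming aggregation; not JaCoCo code. */
final class DepthFirstSketch {

    /** Coverage result of one class (assumed shape for this sketch). */
    static final class ClassCoverage {
        final String name;
        final int coveredLines;
        final int totalLines;

        ClassCoverage(String name, int coveredLines, int totalLines) {
            this.name = name;
            this.coveredLines = coveredLines;
            this.totalLines = totalLines;
        }
    }

    /** Summary information kept for a parent node such as a package. */
    static final class Summary {
        int coveredLines;
        int totalLines;

        void add(ClassCoverage c) {
            coveredLines += c.coveredLines;
            totalLines += c.totalLines;
        }
    }

    /** Consumes the classes of one package without collecting them. */
    static Summary summarizePackage(Iterator<ClassCoverage> classes) {
        Summary summary = new Summary();
        while (classes.hasNext()) {
            ClassCoverage current = classes.next();
            // A report generator would write the page for 'current' here.
            summary.add(current);
            // 'current' is not stored anywhere and can be garbage collected.
        }
        return summary;
    }
}
```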
    244 
    245 <h2>Java Element Identifiers</h2>
    246 
    247 <p class="intro">
    248   The Java language and the Java VM use different String representation formats
    249   for Java elements. For example while a type reference in Java reads like 
    250   <code>java.lang.Object</code>, the VM references the same type as
    251   <code>Ljava/lang/Object;</code>. The JaCoCo API is based on VM identifiers only.
    252 </p>
    253 
    254 <p>
    255   Using VM identifiers directly does not cause any transformation overhead at
    256   runtime. There are several programming languages based on the Java VM that
    257   might use different notations. Specific transformations should therefore only
    258   happen at the user interface level, for example during report generation.
    259 </p>
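
<p>
  A transformation at the user interface level could be as simple as the
  following sketch, which converts a VM type descriptor into Java source
  notation. The class and method names are hypothetical; nested classes keep
  their <code>$</code> separator in this simplified version:
</p>

```java
/** Hypothetical report-side helper for illustration; not JaCoCo API. */
final class VmNameSketch {

    /** Converts a VM type descriptor into Java source notation. */
    static String toJavaName(String descriptor) {
        // Each leading '[' adds one array dimension.
        int dims = 0;
        while (descriptor.charAt(dims) == '[') {
            dims++;
        }
        String element = descriptor.substring(dims);
        String name;
        switch (element.charAt(0)) {
        case 'L': // object type: strip 'L' and ';', replace '/' with '.'
            name = element.substring(1, element.length() - 1).replace('/', '.');
            break;
        case 'Z': name = "boolean"; break;
        case 'B': name = "byte"; break;
        case 'C': name = "char"; break;
        case 'S': name = "short"; break;
        case 'I': name = "int"; break;
        case 'J': name = "long"; break;
        case 'F': name = "float"; break;
        case 'D': name = "double"; break;
        case 'V': name = "void"; break;
        default:
            throw new IllegalArgumentException(descriptor);
        }
        StringBuilder result = new StringBuilder(name);
        for (int i = 0; i < dims; i++) {
            result.append("[]");
        }
        return result.toString();
    }
}
```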
    260 
    261 <h2>Modularization of the JaCoCo implementation</h2>
    262 
    263 <p class="intro">
    264   JaCoCo is implemented in several modules providing different functionality.
    265   These modules are provided as OSGi bundles with proper manifest files. But
    266   there are no dependencies on OSGi itself.
    267 </p>
    268 
    269 <p>
    270   Using OSGi bundles allows well defined dependencies at development time and
    271   at runtime in OSGi containers. As there are no dependencies on OSGi, the
    272   bundles can also be used like regular JAR files.  
    273 </p>
    274 
    275 </div>
    276 <div class="footer">
    277   <span class="right"><a href="@jacoco.home.url@">JaCoCo</a> @qualified.bundle.version@</span>
    278   <a href="license.html">Copyright</a> &copy; @copyright.years@ Mountainminds GmbH &amp; Co. KG and Contributors
    279 </div>
    280 
    281 </body>
    282 </html>
    283