Name          Date         Size
Android.mk    24-Aug-2016  495
classfile.cc  24-Aug-2016  49.7K
common.h      24-Aug-2016  2.2K
ijar.cc       24-Aug-2016  5.4K
LICENSE       24-Aug-2016  11.1K
README.txt    24-Aug-2016  5.1K
zip.cc        24-Aug-2016  33K
zip.h         24-Aug-2016  6.6K
zip_main.cc   24-Aug-2016  8.8K

README.txt

ijar: A tool for generating interface .jars from normal .jars
=============================================================

Alan Donovan, 26 May 2007.

Rationale:

  In order to improve the speed of compilation of Java programs in
  Bazel, the output of build steps is cached.

  This works very nicely for C++ compilation: a compilation unit
  includes a .cc source file and typically dozens of header files.
  Header files change relatively infrequently, so the need for a
  rebuild is usually driven by a change in the .cc file.  Even after
  syncing a slightly newer version of the tree and doing a rebuild,
  many hits in the cache are still observed.

  In Java, by contrast, a compilation unit involves a set of .java
  source files, plus a set of .jar files containing already-compiled
  JVM .class files.  Class files serve a dual purpose: from the JVM's
  perspective, they are containers of executable code, but from the
  compiler's perspective, they are interface definitions.  The problem
  here is that .jar files are far more sensitive to change than
  C++ header files, so even a change that is insignificant to the
  compiler (such as the addition of a print statement to a method in a
  prerequisite class) will cause the jar to change, and any code that
  depends on this jar's interface will be recompiled unnecessarily.

  The purpose of ijar is to produce, from a .jar file, a much smaller,
  simpler .jar file containing only the parts that are significant for
  the purposes of compilation.  In other words, an interface .jar
  file.  By changing one's compilation dependencies to be the interface
  jar files, unnecessary recompilation is avoided when upstream
  changes don't affect the interface.

Details:

  ijar is a tool that reads a .jar file and emits a .jar file
  containing only the parts that are relevant to Java compilation.
  For example, it throws away:

  - Files whose name does not end in ".class".
  - All executable method code.
  - All private methods and fields.
  - All constants and attributes except the minimal set necessary to
    describe the class interface.
  - All debugging information
    (LineNumberTable, SourceFile, LocalVariableTable attributes).

  It also sets to zero the file modification times in the index of the
  .jar file.
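
  As a rough illustration of two of the filtering rules above, the
  sketch below keeps only entries whose name ends in ".class" and drops
  members marked private.  The function names are hypothetical and are
  not taken from the ijar sources; the 0x0002 value is the ACC_PRIVATE
  access-flag bit from the JVM class file format.

```cpp
#include <cstdint>
#include <string>

// ACC_PRIVATE bit from the JVM class file format.
constexpr uint16_t kAccPrivate = 0x0002;

// Hypothetical helper: true if a jar entry name ends in ".class".
bool IsClassEntry(const std::string& name) {
  const std::string suffix = ".class";
  return name.size() >= suffix.size() &&
         name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: true if a field or method should survive stripping.
// Only private members are dropped; a member with no access bit set
// (package-private) is kept.
bool KeepMember(uint16_t access_flags) {
  return (access_flags & kAccPrivate) == 0;
}
```

  With these helpers, "META-INF/MANIFEST.MF" would be skipped, and a
  member with flags 0x000A (private static) would be dropped.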

Implementation:

  ijar is implemented in C++, and runs very quickly.  For example,
  when optimized, it takes only 530ms to process a 42MB
  .jar file containing 5878 classes, resulting in an interface .jar
  file of only 11.4MB in size.  For more usual .jar sizes of a few
  megabytes, a runtime of 50ms is typical.

  The implementation strategy is to mmap both the input jar and the
  newly-created _interface.jar, and to scan through the former and
  emit the latter in a single pass. There are a couple of locations
  where some kind of "backpatching" is required:

  - in the .zip file format, for each file, the size field precedes
    the data.  We emit a zero but note its location, generate and emit
    the stripped classfile, then poke the correct size into the
    location.

  - for JVM .class files, the header (including the constant table)
    precedes the body, but cannot be emitted before it because it's
    not until we emit the body that we know which constants are
    referenced and which are garbage.  So we emit the body into a
    temporary buffer, then emit the header to the output jar, followed
    by the contents of the temp buffer.
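
  The first kind of backpatching can be sketched as follows.  This is
  a minimal illustration, not the ijar code: it writes into a
  std::vector rather than an mmap'd file, the record layout is reduced
  to a bare size field followed by the data, and all function names
  are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends a 32-bit value in little-endian order (the byte order zip uses).
void PutU4le(std::vector<uint8_t>& out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Overwrites four already-emitted bytes at `offset` with `v` (the backpatch).
void PatchU4le(std::vector<uint8_t>& out, size_t offset, uint32_t v) {
  assert(offset + 4 <= out.size());
  for (int i = 0; i < 4; ++i) {
    out[offset + i] = static_cast<uint8_t>(v >> (8 * i));
  }
}

// Emits [size][payload] when the size is only known after the payload has
// been written.  Returns the offset of the size field.
size_t EmitSizedRecord(std::vector<uint8_t>& out,
                       const std::vector<uint8_t>& payload) {
  const size_t size_offset = out.size();
  PutU4le(out, 0);  // placeholder zero; remember where it lives
  const size_t start = out.size();
  // Copying the payload stands in for generating the stripped classfile.
  out.insert(out.end(), payload.begin(), payload.end());
  PatchU4le(out, size_offset, static_cast<uint32_t>(out.size() - start));
  return size_offset;
}
```

  The second case needs no patching of emitted bytes: the body goes to
  a temporary buffer first, so the header can simply be written ahead
  of it once the set of referenced constants is known.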

  Also note that the zip file format has unnecessary duplication of
  the index metadata: it has header+data for each file, then another
  set of (similar) headers at the end.  Rather than save the metadata
  explicitly in some data structure, we just record the addresses of
  the already-emitted zip metadata entries in the output file, and
  then read from there as necessary.
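
  A small sketch of that idea, under simplifying assumptions: the
  per-entry header here is only a 16-bit name length followed by the
  name, which is much less than a real zip local header, and the
  output is a std::vector, so offsets are recorded instead of
  addresses.  EmitEntry and NameAt are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Writes a simplified entry: [u2 name length, little-endian][name][data].
// Returns the offset at which the entry starts, which is all the caller
// needs to remember about it.
size_t EmitEntry(std::vector<uint8_t>& out, const std::string& name,
                 const std::vector<uint8_t>& data) {
  assert(name.size() <= 0xFFFF);
  const size_t offset = out.size();
  out.push_back(static_cast<uint8_t>(name.size() & 0xFF));
  out.push_back(static_cast<uint8_t>(name.size() >> 8));
  out.insert(out.end(), name.begin(), name.end());
  out.insert(out.end(), data.begin(), data.end());
  return offset;
}

// Recovers an entry's name by re-reading the bytes already written at
// `offset`, instead of keeping a separate copy of the metadata.
std::string NameAt(const std::vector<uint8_t>& out, size_t offset) {
  assert(offset + 2 <= out.size());
  const size_t len = out[offset] | (out[offset + 1] << 8);
  assert(offset + 2 + len <= out.size());
  return std::string(out.begin() + offset + 2, out.begin() + offset + 2 + len);
}
```

  When the trailing set of headers is written, a loop over the
  recorded offsets can call NameAt (and similar readers for the other
  fields) to rebuild each index record from the output itself.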

Notes:

  This code has no dependency except on the STL and on zlib.

  Almost all of the getX/putX/ReadX/WriteX functions in the code
  advance their first argument pointer, which is passed by reference.
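
  For readers unfamiliar with this convention, here is a minimal
  sketch of a reader and a writer that take the cursor by reference
  and advance it.  The names GetU2be and PutU2be are illustrative and
  may not match the helpers in common.h; big-endian is used because
  that is the byte order of JVM class files.

```cpp
#include <cstdint>

// Reads a big-endian 16-bit value at p and advances p past it.
inline uint16_t GetU2be(const uint8_t*& p) {
  const uint16_t v = static_cast<uint16_t>((p[0] << 8) | p[1]);
  p += 2;
  return v;
}

// Writes a big-endian 16-bit value at p and advances p past it.
inline void PutU2be(uint8_t*& p, uint16_t v) {
  *p++ = static_cast<uint8_t>(v >> 8);
  *p++ = static_cast<uint8_t>(v & 0xFF);
}
```

  Because the cursor moves on every call, consecutive fields can be
  read or written with consecutive calls and no explicit offset
  arithmetic at the call site.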

  It's tempting to discard package-private classes and class members.
  However, this would be incorrect because they are a necessary part
  of the package interface, as a Java package is often compiled in
  multiple stages.  For example: in Bazel, both Java tests and Java
  code inhabit the same Java package but are compiled separately.

Assumptions:

  We assume that jar files are uncompressed v1.0 zip files (created
  with 'jar c0f') with a zero general_purpose_bit_flag.

  We assume that javap/javac don't need the correct CRC checksums in
  the .jar file.

  We assume that it's better simply to abort in the face of unknown
  input than to risk leaving out something important from the output
  (although in the case of annotations, it should be safe to ignore
  ones we don't understand).
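
  The first assumption could be checked along the following lines.
  This is a sketch, not the check ijar performs: the field offsets
  and the 30-byte fixed header size follow the zip local file header
  layout, reading "v1.0" as a version-needed-to-extract value of 10 is
  an interpretation, and IsSupportedLocalHeader is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// Reads a little-endian 16-bit value at p.
inline uint16_t ReadU2le(const uint8_t* p) {
  return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

// Reads a little-endian 32-bit value at p.
inline uint32_t ReadU4le(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Returns true if the bytes at p look like a local file header that
// satisfies the stated assumptions.
bool IsSupportedLocalHeader(const uint8_t* p, size_t len) {
  if (len < 30) return false;                   // fixed part is 30 bytes
  if (ReadU4le(p) != 0x04034b50) return false;  // signature "PK\3\4"
  if (ReadU2le(p + 4) != 10) return false;      // version needed: 1.0
  if (ReadU2le(p + 6) != 0) return false;       // general_purpose_bit_flag
  if (ReadU2le(p + 8) != 0) return false;       // method 0: stored
  return true;
}
```

  In keeping with the third assumption, a caller would report the
  problem and abort when this returns false rather than try to carry
  on with input it does not understand.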

TODO:
  Maybe: ensure a canonical sort order is used for every list (jar
  entries, class members, attributes, etc.)  This isn't essential
  because we can assume the compiler is deterministic and the order in
  the source files changes little.  Also, it would require two passes. :(

  Maybe: delete dynamically-allocated memory.

  Add (a lot) more tests.  Include a test of idempotency.