Home | History | Annotate | Download | only in docs
      1 <html>
      2 <head>
      3     <title>Dalvik Optimization and Verification</title>
      4 </head>
      5 
      6 <body>
      7 <h1>Dalvik Optimization and Verification With <i>dexopt</i></h1>
      8 
      9 <p>
     10 The Dalvik virtual machine was designed specifically for the Android
     11 mobile platform.  The target systems have little RAM, store data on slow
     12 internal flash memory, and generally have the performance characteristics
     13 of decade-old desktop systems.  They also run Linux, which provides
     14 virtual memory, processes and threads, and UID-based security mechanisms.
     15 <p>
     16 The features and limitations caused us to focus on certain goals:
     17 
     18 <ul>
     19     <li>Class data, notably bytecode, must be shared between multiple
     20     processes to minimize total system memory usage.
     21     <li>The overhead in launching a new app must be minimized to keep
     22     the device responsive.
     23     <li>Storing class data in individual files results in a lot of
     24     redundancy, especially with respect to strings.  To conserve disk
     25     space we need to factor this out.
     26     <li>Parsing class data fields adds unnecessary overhead during
     27     class loading.  Accessing data values (e.g. integers and strings)
     28     directly as C types is better.
     29     <li>Bytecode verification is necessary, but slow, so we want to verify
     30     as much as possible outside app execution.
     31     <li>Bytecode optimization (quickened instructions, method pruning) is
     32     important for speed and battery life.
     33     <li>For security reasons, processes may not edit shared code.
     34 </ul>
     35 
     36 <p>
     37 The typical VM implementation uncompresses individual classes from a
     38 compressed archive and stores them on the heap.  This implies a separate
     39 copy of each class in every process, and slows application startup because
     40 the code must be uncompressed (or at least read off disk in many small
     41 pieces).  On the other hand, having the bytecode on the local heap makes
     42 it easy to rewrite instructions on first use, facilitating a number of
     43 different optimizations.
     44 <p>
     45 The goals led us to make some fundamental decisions:
     46 
     47 <ul>
     48     <li>Multiple classes are aggregated into a single "DEX" file.
     49     <li>DEX files are mapped read-only and shared between processes.
     50     <li>Byte ordering and word alignment are adjusted to suit the local
     51     system.
     52     <li>Bytecode verification is mandatory for all classes, but we want
     53     to "pre-verify" whatever we can.
     54     <li>Optimizations that require rewriting bytecode must be done ahead
     55     of time.
     56 </ul>
     57 
     58 <p>
     59 The consequences of these decisions are explained in the following sections.
     60 
     61 
     62 <h2>VM Operation</h2>
     63 
     64 <p>
     65 Application code is delivered to the system in a <code>.jar</code>
     66 or <code>.apk</code> file.  These are really just <code>.zip</code>
     67 archives with some meta-data files added.  The Dalvik DEX data file
     68 is always called <code>classes.dex</code>.
     69 <p>
     70 The bytecode cannot be memory-mapped and executed directly from the zip
     71 file, because the data is compressed and the start of the file is not
     72 guaranteed to be word-aligned.  These problems could be addressed by
     73 storing <code>classes.dex</code> without compression and padding out the zip
     74 file, but that would increase the size of the package sent across the
     75 data network.
     76 <p>
     77 We need to extract <code>classes.dex</code> from the zip archive before
     78 we can use it.  While we have the file available, we might as well perform
     79 some of the other actions (realignment, optimization, verification) described
     80 earlier.  This raises a new question however: who is responsible for doing
     81 this, and where do we keep the output?
     82 
     83 <h3>Preparation</h3>
     84 
     85 <p>
     86 There are at least three different ways to create a "prepared" DEX file,
     87 sometimes known as "ODEX" (for Optimized DEX):
     88 <ol>
     89     <li>The VM does it "just in time".  The output goes into a special
     90     <code>dalvik-cache</code> directory.  This works on the desktop and
     91     engineering-only device builds where the permissions on the
     92     <code>dalvik-cache</code> directory are not restricted.  On production
     93     devices, this is not allowed.
     94     <li>The system installer does it when an application is first added.
     95     It has the privileges required to write to <code>dalvik-cache</code>.
     96     <li>The build system does it ahead of time.  The relevant <code>jar</code>
     97     / <code>apk</code> files are present, but the <code>classes.dex</code>
     98     is stripped out.  The optimized DEX is stored next to the original
     99     zip archive, not in <code>dalvik-cache</code>, and is part of the
    100     system image.
    101 </ol>
    102 <p>
    103 The <code>dalvik-cache</code> directory is more accurately
    104 <code>$ANDROID_DATA/data/dalvik-cache</code>.  The files inside it have
    105 names derived from the full path of the source DEX.  On the device the
    106 directory is owned by <code>system</code> / <code>system</code>
    107 and has 0771 permissions, and the optimized DEX files stored there are
    108 owned by <code>system</code> and the
    109 application's group, with 0644 permissions.  DRM-locked applications will
    110 use 640 permissions to prevent other user applications from examining them.
    111 The bottom line is that you can read your own DEX file and those of most
    112 other applications, but you cannot create, modify, or remove them.
    113 <p>
    114 Preparation of the DEX file for the "just in time" and "system installer"
    115 approaches proceeds in three steps:
    116 <p>
    117 First, the dalvik-cache file is created.  This must be done in a process
    118 with appropriate privileges, so for the "system installer" case this is
    119 done within <code>installd</code>, which runs as root.
    120 <p>
    121 Second, the <code>classes.dex</code> entry is extracted from the the zip
    122 archive.  A small amount of space is left at the start of the file for
    123 the ODEX header.
    124 <p>
    125 Third, the file is memory-mapped for easy access and tweaked for use on
    126 the current system.  This includes byte-swapping and structure realigning,
    127 but no meaningful changes to the DEX file.  We also do some basic
    128 structure checks, such as ensuring that file offsets and data indices
    129 fall within valid ranges.
    130 <p>
    131 The build system uses a hairy process that involves starting the
    132 emulator, forcing just-in-time optimization of all relevant DEX files,
    133 and then extracting the results from <code>dalvik-cache</code>.  The
    134 reasons for doing this, rather than using a tool that runs on the desktop,
    135 will become more apparent when the optimizations are explained.
    136 <p>
    137 Once the code is byte-swapped and aligned, we're ready to go.  We append
    138 some pre-computed data, fill in the ODEX header at the start of the file,
    139 and start executing.  (The header is filled in last, so that we don't
    140 try to use a partial file.)  If we're interested in verification and
    141 optimization, however, we need to insert a step after the initial prep.
    142 
    143 <h3>dexopt</h3>
    144 
    145 <p>
    146 We want to verify and optimize all of the classes in the DEX file.  The
    147 easiest and safest way to do this is to load all of the classes into
    148 the VM and run through them.  Anything that fails to load is simply not
    149 verified or optimized.  Unfortunately, this can cause allocation of some
    150 resources that are difficult to release (e.g. loading of native shared
    151 libraries), so we don't want to do it in the same virtual machine that
    152 we're running applications in.
    153 <p>
    154 The solution is to invoke a program called <code>dexopt</code>, which
    155 is really just a back door into the VM.  It performs an abbreviated VM
    156 initialization, loads zero or more DEX files from the bootstrap class
    157 path, and then sets about verifying and optimizing whatever it can from
    158 the target DEX.  On completion, the process exits, freeing all resources.
    159 <p>
    160 It is possible for multiple VMs to want the same DEX file at the same
    161 time.  File locking is used to ensure that dexopt is only run once.
    162 
    163 
    164 <h2>Verification</h2>
    165 
    166 <p>
    167 The bytecode verification process involves scanning through the instructions
    168 in every method in every class in a DEX file.  The goal is to identify
    169 illegal instruction sequences so that we don't have to check for them at
    170 run time.  Many of the computations involved are also necessary for "exact"
    171 garbage collection.  See
    172 <a href="verifier.html">Dalvik Bytecode Verifier Notes</a> for more
    173 information.
    174 <p>
    175 For performance reasons, the optimizer (described in the next section)
    176 assumes that the verifier has run successfully, and makes some potentially
    177 unsafe assumptions.  By default, Dalvik insists upon verifying all classes,
    178 and only optimizes classes that have been verified.  If you want to
    179 disable the verifier, you can use command-line flags to do so.  See also
    180 <a href="embedded-vm-control.html"> Controlling the Embedded VM</a>
    181 for instructions on controlling these
    182 features within the Android application framework.
    183 <p>
    184 Reporting of verification failures is a tricky issue.  For example,
    185 calling a package-scope method on a class in a different package is
    186 illegal and will be caught by the verifier.  We don't necessarily want
    187 to report it during verification though -- we actually want to throw
    188 an exception when the method call is attempted.  Checking the access
    189 flags on every method call is expensive though.  The
    190 <a href="verifier.html">Dalvik Bytecode Verifier Notes</a> document
    191 addresses this issue.
    192 <p>
    193 Classes that have been verified successfully have a flag set in the ODEX.
    194 They will not be re-verified when loaded.  The Linux access permissions
    195 are expected to prevent tampering; if you can get around those, installing
    196 faulty bytecode is far from the easiest line of attack.  The ODEX file has
    197 a 32-bit checksum, but that's chiefly present as a quick check for
    198 corrupted data.
    199 
    200 
    201 <h2>Optimization</h2>
    202 
    203 <p>
    204 Virtual machine interpreters typically perform certain optimizations the
    205 first time a piece of code is used.  Constant pool references are replaced
    206 with pointers to internal data structures, operations that always succeed
    207 or always work a certain way are replaced with simpler forms.  Some of
    208 these require information only available at runtime, others can be inferred
    209 statically when certain assumptions are made.
    210 <p>
    211 The Dalvik optimizer does the following:
    212 <ul>
    213     <li>For virtual method calls, replace the method index with a
    214     vtable index.
    215     <li>For instance field get/put, replace the field index with
    216     a byte offset.  Also, merge the boolean / byte / char / short
    217     variants into a single 32-bit form (less code in the interpreter
    218     means more room in the CPU I-cache).
    219     <li>Replace a handful of high-volume calls, like String.length(),
    220     with "inline" replacements.  This skips the usual method call
    221     overhead, directly switching from the interpreter to a native
    222     implementation.
    223     <li>Prune empty methods.  The simplest example is
    224     <code>Object.&lt;init&gt;</code>, which does nothing, but must be
    225     called whenever any object is allocated.  The instruction is
    226     replaced with a new version that acts as a no-op unless a debugger
    227     is attached.
    228     <li>Append pre-computed data.  For example, the VM wants to have a
    229     hash table for lookups on class name.  Instead of computing this
    230     when the DEX file is loaded, we can compute it now, saving heap
    231     space and computation time in every VM where the DEX is loaded.
    232 </ul>
    233 
    234 <p>
    235 All of the instruction modifications involve replacing the opcode with
    236 one not defined by the Dalvik specification.  This allows us to freely
    237 mix optimized and unoptimized instructions.  The set of optimized
    238 instructions, and their exact representation, is tied closely to the VM
    239 version.
    240 <p>
    241 Most of the optimizations are obvious "wins".  The use of raw indices
    242 and offsets not only allows us to execute more quickly, we can also
    243 skip the initial symbolic resolution.  Pre-computation eats up
    244 disk space, and so must be done in moderation.
    245 <p>
    246 There are a couple of potential sources of trouble with these
    247 optimizations.  First, vtable indices and byte offsets are subject to
    248 change if the VM is updated.  Second, if a superclass is in a different
    249 DEX, and that other DEX is updated, we need to ensure that our optimized
    250 indices and offsets are updated as well.  A similar but more subtle
    251 problem emerges when user-defined class loaders are employed: the class
    252 we actually call may not be the one we expected to call.
    253 <p>These problems are addressed with dependency lists and some limitations
    254 on what can be optimized.
    255 
    256 
    257 <h2>Dependencies and Limitations</h2>
    258 
    259 <p>
    260 The optimized DEX file includes a list of dependencies on other DEX files,
    261 plus the CRC-32 and modification date from the originating
    262 <code>classes.dex</code> zip file entry.  The dependency list includes the
    263 full path to the <code>dalvik-cache</code> file, and the file's SHA-1
    264 signature.  The timestamps of files on the device are unreliable and
    265 not used.  The dependency area also includes the VM version number.
    266 <p>
    267 An optimized DEX is dependent upon all of the DEX files in the bootstrap
    268 class path.  DEX files that are part of the bootstrap class path depend
    269 upon the DEX files that appeared earlier.  To ensure that nothing outside
    270 the dependent DEX files is available, <code>dexopt</code> only loads the
    271 bootstrap classes.  References to classes in other DEX files fail, which
    272 causes class loading and/or verification to fail, and classes with
    273 external dependencies are simply not optimized.
    274 <p>
    275 This means that splitting code out into many separate DEX files has a
    276 disadvantage: virtual method calls and instance field lookups between
    277 non-boot DEX files can't be optimized.  Because verification is pass/fail
    278 with class granularity, no method in a class that has any reliance on
    279 classes in external DEX files can be optimized.  This may be a bit
    280 heavy-handed, but it's the only way to guarantee that nothing breaks
    281 when individual pieces are updated.
    282 <p>
    283 Another negative consequence: any change to a bootstrap DEX will result
    284 in rejection of all optimized DEX files.  This makes it hard to keep
    285 system updates small.
    286 <p>
    287 Despite our caution, there is still a possibility that a class in a DEX
    288 file loaded by a user-defined class loader could ask for a bootstrap class
    289 (say, String) and be given a different class with the same name.  If a
    290 class in the DEX file being processed has the same name as a class in the
    291 bootstrap DEX files, the class will be flagged as ambiguous and references
    292 to it will not be resolved during verification / optimization.  The class
    293 linking code in the VM does additional checks to plug another hole;
    294 see the verbose description in the VM sources for details (vm/oo/Class.c).
    295 <p>
    296 If one of the dependencies is updated, we need to re-verify and
    297 re-optimize the DEX file.  If we can do a just-in-time <code>dexopt</code>
    298 invocation, this is easy.  If we have to rely on the installer daemon, or
    299 the DEX was shipped only in ODEX, then the VM has to reject the DEX.
    300 <p>
    301 The output of <code>dexopt</code> is byte-swapped and struct-aligned
    302 for the host, and contains indices and offsets that are highly VM-specific
    303 (both version-wise and platform-wise).  For this reason it's tricky to
    304 write a version of <code>dexopt</code> that runs on the desktop but
    305 generates output suitable for a particular device.  The safest way to
    306 invoke it is on the target device, or on an emulator for that device.
    307 
    308 
    309 <h2>Generated DEX</h2>
    310 
    311 <p>
    312 Some languages and frameworks rely on the ability to generate bytecode
    313 and execute it.  The rather heavy <code>dexopt</code> verification and
    314 optimization model doesn't work well with that.
    315 <p>
    316 We intend to support this in a future release, but the exact method is
    317 to be determined.  We may allow individual classes to be added or whole
    318 DEX files; may allow Java bytecode or Dalvik bytecode in instructions;
    319 may perform the usual set of optimizations, or use a separate interpreter
    320 that performs on-first-use optimizations directly on the bytecode (which
    321 won't be mapped read-only, since it's locally defined).
    322 
    323 <address>Copyright &copy; 2008 The Android Open Source Project</address>
    324 
    325 </body>
    326 </html>
    327