Dalvik "mterp" README

NOTE: Find rebuilding instructions in the "Rebuilding" section near the
end of this file.


==== Overview ====

This is the source code for the Dalvik interpreter.  The core of the
original version was implemented as a single C function, but to improve
performance we rewrote it in assembly.  To make this and future assembly
ports easier and less error-prone, we used a modular approach that allows
development of platform-specific code one opcode at a time.

The original all-in-one-function C version still exists as the "portable"
interpreter, and is generated using the same sources and tools that
generate the platform-specific versions.

Every configuration has a "config-*" file that controls how the sources
are generated.  The sources are written into the "out" directory, where
they are picked up by the Android build system.

The best way to become familiar with the interpreter is to look at the
generated files in the "out" directory, such as out/InterpC-portstd.c,
rather than trying to look at the various component pieces in (say)
armv5te.


==== Platform-specific source generation ====

The architecture-specific config files determine what goes into two
generated output files (InterpC-<arch>.c, InterpAsm-<arch>.S).  The goal is
to make it easy to swap C and assembly sources during initial development
and testing, and to provide a way to use architecture-specific versions of
some operations (e.g. making use of PLD instructions on ARMv6 or avoiding
CLZ on ARMv4T).

Depending on architecture, instruction-to-instruction transitions may
be done as either computed goto or jump table.  In the computed goto
variant, each instruction handler is allocated a fixed-size area (e.g. 64
bytes).  "Overflow" code is tacked on to the end.  In the jump table variant,
all of the instruction handlers are contiguous and may be of any size.
The interpreter style is selected via the "handler-style" command (see below).
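As a rough illustration of the two transition styles, the Python sketch
below models how the address of the next handler is found in each case.
The base address and handler lengths are made-up values, and the function
names are hypothetical; the real interpreters do this in assembly.

```python
# Hypothetical values, chosen only for illustration.
HANDLER_SIZE = 64      # bytes reserved per handler in the computed goto style
TABLE_BASE = 0x1000    # address of the first handler


def computed_goto_target(opcode, base=TABLE_BASE, size=HANDLER_SIZE):
    """Computed goto: the handler address is pure arithmetic."""
    return base + opcode * size


def build_jump_table(handler_lengths, base=TABLE_BASE):
    """Jump table: handlers are contiguous but may have any length,
    so record where each one starts."""
    table, addr = [], base
    for length in handler_lengths:
        table.append(addr)
        addr += length
    return table


def jump_table_target(opcode, table):
    """Jump table: the handler address is looked up, not computed."""
    return table[opcode]
```

The computed goto form avoids a memory load per transition, at the cost of
padding every handler out to the same size.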

When a C implementation for an instruction is desired, the assembly
version packs all local state into the Thread structure and passes
that to the C function.  Updates to the state are pulled out of
"Thread" on return.

The "arch" value should indicate an architecture family with common
programming characteristics, so "armv5te" would work for all ARMv5TE CPUs,
but might not be backward- or forward-compatible.  (We *might* want to
specify the ABI model as well, e.g. "armv5te-eabi", but currently that adds
verbosity without value.)


==== Config file format ====

The config files are parsed from top to bottom.  Each line in the file
may be blank, hold a comment (line starts with '#'), or be a command.

The commands are:

  handler-style <computed-goto|jump-table|all-c>

    Specify which style of interpreter to generate.  In computed-goto,
    each handler is allocated a fixed region, allowing transitions to
    be done via table-start-address + (opcode * handler-size). With
    jump-table style, handlers may be of any length, and the generated
    table is an array of pointers to the handlers. The "all-c" style is
    for the portable interpreter (which is implemented completely in C).
    [Note: all-c is distinct from an "allstubs" configuration.  In both
    configurations, all handlers are the C versions, but the allstubs
    configuration uses the assembly outer loop and assembly stubs to
    transition to the handlers].  This command is required, and must be
    the first command in the config file.

  handler-size <bytes>

    Specify the size of the fixed region, in bytes.  On most platforms
    this will need to be a power of 2.  For jump-table and all-c
    implementations, this command is ignored.

  import <filename>

    The specified file is included immediately, in its entirety.  No
    substitutions are performed.  ".cpp" and ".h" files are copied to the
    C output, ".S" files are copied to the asm output.

  asm-stub <filename>

    The named file will be included whenever an assembly "stub" is needed
    to transfer control to a handler written in C.  Text substitution is
    performed on the opcode name.  This command is not applicable to
    "all-c" configurations.

  asm-alt-stub <filename>

    When present, this command will cause the generation of an alternate
    set of entry points (for computed-goto interpreters) or an alternate
    jump table (for jump-table interpreters).

  op-start <directory>

    Indicates the start of the opcode list.  Must precede any "op"
    commands.  The specified directory is the default location to pull
    instruction files from.

  op <opcode> <directory>

    Can only appear after "op-start" and before "op-end".  Overrides the
    default source file location of the specified opcode.  The opcode
    definition will come from the specified file, e.g. "op OP_NOP armv5te"
    will load from "armv5te/OP_NOP.S".  A substitution dictionary will be
    applied (see below).

  alt <opcode> <directory>

    Can only appear after "op-start" and before "op-end".  Similar to the
    "op" command above, but denotes a source file to override the entry
    in the alternate handler table.  The opcode definition will come from
    the specified file, e.g. "alt OP_NOP armv5te" will load from
    "armv5te/ALT_OP_NOP.S".  A substitution dictionary will be applied
    (see below).

  op-end

    Indicates the end of the opcode list.  All kNumPackedOpcodes
    opcodes are emitted when this is seen, followed by any code that
    didn't fit inside the fixed-size instruction handler space.

The order of "op" and "alt" directives is not significant; the generation
tool will extract ordering info from the VM sources.

Typically the "op-start" directive names the directory that holds most of
the opcodes in their current form.  For a new port you would start with
"c", and add architecture-specific "op" entries as you write instructions.
Once the port is complete, "op-start" names the target architecture
instead, and you insert "c" ops to stub out platform-specific code.
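As an illustration, a config file for an early-stage port might look
roughly like the fragment below.  The "myarch" directory and all file
names are hypothetical; only the commands themselves come from the list
above.

```
# Hypothetical config for a new port to "myarch".
handler-style computed-goto
handler-size 64

# Hypothetical file names, copied verbatim into the C and asm outputs.
import c/header.cpp
import myarch/header.S

# Stub used to reach handlers that are still written in C.
asm-stub myarch/stub.S

# Default to the C implementations, then override opcodes one at a
# time as their assembly versions are written.
op-start c
op OP_NOP myarch
op OP_MOVE myarch
op-end
```

When most opcodes have been ported, the same file would instead use
"op-start myarch" and list only the remaining "c" exceptions.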

For the <directory> specified in the "op" command, the "c" directory
is special in two ways: (1) the sources are assumed to be C code, and
will be inserted into the generated C file; (2) when a C implementation
is emitted, a "glue stub" is emitted in the assembly source file.
(The generator script always emits kNumPackedOpcodes assembly
instructions, unless "asm-stub" was left blank, in which case it only
emits some labels.)


==== Instruction file format ====

The assembly instruction files are simply fragments of assembly sources.
The starting label will be provided by the generation tool, as will
declarations for the segment type and alignment.  The expected target
assembler is GNU "as", but others should work (this may require fiddling
with some of the pseudo-ops emitted by the generation tool).

The C files do a bunch of fancy things with macros in an attempt to share
code with the portable interpreter.  (This is expected to be reduced in
the future.)

A substitution dictionary is applied to all opcode fragments as they are
appended to the output.  Substitutions can look like "$value" or "${value}".

The dictionary always includes:

  $opcode - opcode name, e.g. "OP_NOP"
  $opnum - opcode number, e.g. 0 for OP_NOP
  $handler_size_bytes - max size of an instruction handler, in bytes
  $handler_size_bits - max size of an instruction handler, log 2
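The effect of the substitution can be sketched in a few lines of Python.
This uses the standard string.Template class, which accepts the same
"$value" and "${value}" forms; whether the generation tool works exactly
this way is an assumption, and the helper names are hypothetical.

```python
from string import Template


def make_dict(opcode, opnum, handler_size_bytes):
    """Build the entries that are always present in the dictionary."""
    return {
        "opcode": opcode,
        "opnum": opnum,
        "handler_size_bytes": handler_size_bytes,
        # log 2 of the handler size (assumes a power of 2), e.g. 6 for 64
        "handler_size_bits": handler_size_bytes.bit_length() - 1,
    }


def substitute(fragment, values):
    """Expand $name and ${name} references in one opcode fragment."""
    return Template(fragment).substitute(values)


example = substitute("/* ${opcode} is opcode $opnum */",
                     make_dict("OP_NOP", 0, 64))
```

Here "example" ends up holding "/* OP_NOP is opcode 0 */".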

Both C and assembly sources will be passed through the C pre-processor,
so you can take advantage of C-style comments and preprocessor directives
like "#define".

Some generator operations are available.

  %include "filename" [subst-dict]

    Includes the file, which should look like "armv5te/OP_NOP.S".  You can
    specify values for the substitution dictionary, using standard Python
    syntax.  For example, this:
      %include "armv5te/unop.S" {"result":"r1"}
    would insert "armv5te/unop.S" at the current file position, replacing
    occurrences of "$result" with "r1".

  %default <subst-dict>

    Specify default substitution dictionary values, using standard Python
    syntax.  Useful if you want to have a "base" version and variants.

  %break

    Identifies the split between the main portion of the instruction
    handler (which must fit in "handler-size" bytes) and the "sister"
    code, which is appended to the end of the instruction handler block.
    In jump table implementations, %break is ignored.

  %verify "message"

    Leave a note to yourself about what needs to be tested.  (This may
    turn into something more interesting someday; for now, it just gets
    stripped out before the output is generated.)
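To make the generator operations above concrete, here is a minimal Python
sketch of how such lines could be recognized.  It is not the generation
tool's code: the function names are hypothetical, nested %include
expansion is left out, and the jump table case (where %break is ignored)
is not modeled.

```python
import ast


def parse_include(line):
    """Split '%include "file" {dict}' into (filename, overrides)."""
    rest = line[len("%include"):].strip()
    first = rest.index('"')
    second = rest.index('"', first + 1)
    filename = rest[first + 1:second]
    dict_text = rest[second + 1:].strip()
    # The dictionary uses Python literal syntax, so literal_eval can read it.
    overrides = ast.literal_eval(dict_text) if dict_text else {}
    return filename, overrides


def process_fragment(lines):
    """Return (main, sister, defaults) for one instruction fragment.

    %default lines update the default dictionary, %verify lines are
    dropped, and the first %break splits main code from sister code.
    """
    main, sister, defaults = [], [], {}
    seen_break = False
    for line in lines:
        if line.startswith("%default"):
            defaults.update(ast.literal_eval(line[len("%default"):].strip()))
        elif line.startswith("%verify"):
            continue
        elif line.startswith("%break"):
            seen_break = True   # a later %break changes nothing
        else:
            (sister if seen_break else main).append(line)
    return main, sister, defaults
```

For example, parse_include('%include "armv5te/unop.S" {"result":"r1"}')
yields the file name "armv5te/unop.S" and the dictionary {"result": "r1"}.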

The generation tool does *not* print a warning if your instructions
exceed "handler-size", but the VM will abort on startup if it detects an
oversized handler.  On architectures with fixed-width instructions this
is easy to work with; on others you will need to count bytes.


==== Using C constants from assembly sources ====

The file "common/asm-constants.h" has some definitions for constant
values, structure sizes, and struct member offsets.  The format is fairly
restricted, as simple macros are used to massage it for use with both C
(where it is verified) and assembly (where the definitions are used).

If a constant in the file falls out of sync, the VM will log an error
message and abort during startup.
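The idea behind the verification step can be sketched in Python with
ctypes, which exposes field offsets the way offsetof() does in C.  The
structure, the constant names, and the values below are all hypothetical
stand-ins, not the contents of asm-constants.h.

```python
import ctypes


class DemoStruct(ctypes.Structure):
    # Hypothetical stand-in for a VM structure; not the real Thread layout.
    _fields_ = [
        ("pc", ctypes.c_uint32),
        ("fp", ctypes.c_uint32),
        ("sub_mode", ctypes.c_uint16),
    ]


# Values as they might be hard-coded for the assembly sources.
ASM_CONSTANTS = {
    "offDemoStruct_pc": ("pc", 0),
    "offDemoStruct_fp": ("fp", 4),
    "offDemoStruct_sub_mode": ("sub_mode", 8),
}
SIZEOF_DEMO_STRUCT = 12


def check_constants(struct, constants, expected_size):
    """Return a list of mismatch messages; empty means all values agree."""
    errors = []
    for name, (field, expected) in constants.items():
        actual = getattr(struct, field).offset
        if actual != expected:
            errors.append("%s: expected %d, got %d" % (name, expected, actual))
    if ctypes.sizeof(struct) != expected_size:
        errors.append("sizeof: expected %d, got %d"
                      % (expected_size, ctypes.sizeof(struct)))
    return errors
```

A startup check in this style would log each message and abort if the
returned list is not empty.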


==== Development tips ====

If you need to debug the initial piece of an opcode handler, and your
debug code expands it beyond the handler size limit, you can insert a
generic header at the top:

    b       ${opcode}_start
%break
${opcode}_start:

If you already have a %break, it's okay to leave it in place -- the second
%break is ignored.


==== Rebuilding ====

If you change any of the source file fragments, you need to rebuild the
combined source files in the "out" directory.  Make sure the files in
"out" are editable, then:

    $ cd mterp
    $ ./rebuild.sh

As of this writing, this requires Python 2.5. You may see inscrutable
error messages or just general failure if you have a different version
of Python installed.

The ultimate goal is to have the build system generate the necessary
output files without requiring this separate step, but we're not yet
ready to require Python in the build.


==== Interpreter Control ====

The central mechanism for interpreter control is the InterpBreak structure
that is found in each thread's Thread struct (see vm/Thread.h).  There
is one mandatory field, and two optional fields:

    subMode - required, describes debug/profile/special operation
    breakFlags & curHandlerTable - optional, used to lower subMode polling costs

The subMode field is a bitmask which records all currently active
special modes of operation.  For example, when Traceview profiling
is active, kSubModeMethodTrace is set.  This bit informs the interpreter
that it must notify the profiling subsystem on each method entry and
return.  There are similar bits for an active debugging session,
instruction count profiling, pending thread suspension request, etc.

To support special subMode operation, the simplest mechanism for the
interpreter is to poll the subMode field before interpreting each Dalvik
bytecode and take any required action.  In fact, this is precisely
what the portable interpreter does.  The "FINISH" macro expands to
include a test of subMode and a subsequent call to dvmCheckBefore().
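The polling approach amounts to the loop sketched below in Python.  The
bit value, the dictionary-based thread state, and the function names are
hypothetical; the check_before callback merely stands in for
dvmCheckBefore().

```python
# Hypothetical bit value; the real one is defined in the VM sources.
K_SUBMODE_METHOD_TRACE = 0x0001


def run_portable(bytecodes, thread, handlers, check_before):
    """Interpret a list of opcodes, polling sub_mode before each one."""
    for pc, opcode in enumerate(bytecodes):
        if thread["sub_mode"] != 0:
            check_before(pc, thread)    # stands in for dvmCheckBefore()
        handlers[opcode](thread)
```

The test of sub_mode runs once per instruction even when no special mode
is active, which is the cost the rest of this section is about avoiding.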

Per-instruction polling, however, is expensive and subMode operation is
relatively rare.  For normal operation we'd like to avoid having to perform
any checks unless a special subMode is actually in effect.  This is
where curHandlerTable and breakFlags come into play.

The mterp fast interpreter achieves much of its performance advantage
over the portable interpreter through its efficient mechanism of
transitioning from one Dalvik bytecode to the next.  Mterp for ARM targets
uses a computed-goto mechanism, in which the handler entrypoints are
located at the base of the handler table + (opcode * 64).  Mterp for x86
targets instead uses a jump table of handler entry points indexed
by the Dalvik opcode.  To support efficient handling of special subModes,
mterp supports two sets of handler entries (for ARM) or two jump
tables (for x86).  One handler set is optimized for speed and performs no
inter-instruction checks (mainHandlerTable in the Thread structure), while
the other includes a test of the subMode field (altHandlerTable).

In normal operation (i.e. subMode == 0), the dedicated register rIBASE
(r8 for ARM, edx for x86) holds mainHandlerTable.  If we need to switch
to a subMode that requires inter-instruction checking, rIBASE is changed
to altHandlerTable.  Note that this change is not immediate.  What is actually
changed is the value of curHandlerTable, which is part of the interpBreak
structure.  Rather than explicitly check for changes, each thread will
blindly refresh rIBASE at backward branches, exception throws and returns.

The breakFlags field tells the interpreter control mechanism whether
curHandlerTable should hold the real or alternate handler base.  If
non-zero, we use the alternate handler base (altHandlerTable).  The bits
within breakFlags tell dvmCheckBefore() which set of subModes needs to be
checked.
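The delayed switch between handler tables can be modeled in a few lines
of Python.  This is only a sketch of the behavior described above: the
class and method names are hypothetical, and it assumes for simplicity
that every subMode bit sets a matching breakFlags bit, which the real VM
need not do.

```python
class DemoControl:
    """Hypothetical model of the per-thread interpreter control state."""

    def __init__(self, main_table, alt_table):
        self.main_handler_table = main_table
        self.alt_handler_table = alt_table
        self.sub_mode = 0
        self.break_flags = 0
        self.cur_handler_table = main_table
        self.ibase = main_table     # stands in for the rIBASE register

    def _update_table(self):
        # Non-zero break_flags selects the alternate handlers.
        self.cur_handler_table = (self.alt_handler_table if self.break_flags
                                  else self.main_handler_table)

    def enable_sub_mode(self, bits):
        # Simplifying assumption: break flags mirror the sub-mode bits.
        self.sub_mode |= bits
        self.break_flags |= bits
        self._update_table()

    def disable_sub_mode(self, bits):
        self.sub_mode &= ~bits
        self.break_flags &= ~bits
        self._update_table()

    def refresh_ibase(self):
        """Done at backward branches, exception throws and returns."""
        self.ibase = self.cur_handler_table
```

Note that enable_sub_mode() changes cur_handler_table at once, but ibase
keeps its old value until the next refresh_ibase() call.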

See dvmCheckBefore() for subMode handling, and dvmEnableSubMode(),
dvmDisableSubMode() for switching on and off.
    306