===============================================================
libbcc: A Versatile Bitcode Execution Engine for Mobile Devices
===============================================================


Introduction
------------

libbcc is an LLVM bitcode execution engine that compiles the bitcode
to an in-memory executable. libbcc is versatile because:

* it implements both AOT (Ahead-of-Time) and JIT (Just-in-Time)
  compilation.

* it attempts to address the design constraints of Android devices,
  which demand fast start-up time, small size, and high performance
  *at the same time*.

* it supports on-device linking. Each device vendor can supply its
  own runtime bitcode library (lib*.bc) that differentiates its
  system. Specialization becomes ecosystem-friendly.

libbcc provides:

* a *just-in-time bitcode compiler*, which translates the LLVM bitcode
  into machine code

* a *caching mechanism*, which can:

  * serialize the in-memory executable into a cache file after each
    compilation.  Note that compilation is triggered by a cache miss.
  * load the executable from the cache file upon a cache hit.

Highlights of libbcc are:

* libbcc supports bitcode from various language frontends, such as
  Renderscript and GLSL (pixelflinger2).

* libbcc strives to balance library size, launch time, and
  steady-state performance:

  * The size of libbcc is aggressively reduced for mobile devices. We
    customize and improve upon the default Execution Engine from
    upstream. Without these changes, libbcc's execution engine could
    easily be at least twice as large.

  * To reduce launch time, we support caching of
    binaries. Just-in-Time compilation is often Just-too-Late
    if the given apps are performance-sensitive. Thus, we implemented
    AOT to get the best of both worlds: fast launch time and high
    steady-state performance.

    AOT is also important for projects such as NDK on LLVM with
    portability enhancement. The launch time reduction after we
    implemented AOT is significant::


     Apps          libbcc without AOT       libbcc with AOT
                   launch time in libbcc    launch time in libbcc
     App_1            1218ms                   9ms
     App_2            842ms                    4ms
     Wallpaper:
       MagicSmoke     182ms                    3ms
       Halo           127ms                    3ms
     Balls            149ms                    3ms
     SceneGraph       146ms                    90ms
     Model            104ms                    4ms
     Fountain         57ms                     3ms

    AOT also masks the launch time overhead of on-device linking
    and helps make it practical.

  * For steady-state performance, we enable VFP3 and aggressive
    optimizations.

* Lazy JITting is currently disabled.

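The caching behavior described in this section can be sketched in a few
lines of C. This is a minimal illustration, not libbcc's implementation:
``CacheSlot``, ``compile_bitcode``, and ``prepare_executable`` are
hypothetical names, the cache file is modeled as an in-memory buffer, and
"compilation" merely formats a string. It shows only the control flow: a
cache miss triggers compilation followed by serialization, and a cache hit
loads the stored executable without compiling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* All names below are hypothetical stand-ins, not libbcc identifiers. */
enum PrepareResult { LOADED_FROM_CACHE, COMPILED_AND_CACHED };

struct CacheSlot {
    int  valid;        /* nonzero once an executable has been serialized */
    char source[64];   /* the bitcode the cached image was built from */
    char image[64];    /* stands in for the contents of the cache file */
};

/* Stand-in for the expensive step: "compile" bitcode into an executable. */
static void compile_bitcode(const char *bitcode, char *exe, size_t exe_size)
{
    snprintf(exe, exe_size, "exe(%s)", bitcode);
}

/* Produce the executable for `bitcode`, consulting the cache first. */
enum PrepareResult prepare_executable(struct CacheSlot *slot,
                                      const char *bitcode,
                                      char *exe, size_t exe_size)
{
    assert(slot != NULL && bitcode != NULL && exe != NULL && exe_size > 0);

    if (slot->valid && strcmp(slot->source, bitcode) == 0) {
        /* Cache hit: load the stored executable, no compilation needed. */
        snprintf(exe, exe_size, "%s", slot->image);
        return LOADED_FROM_CACHE;
    }

    /* Cache miss: compile, then serialize the result for the next launch. */
    compile_bitcode(bitcode, exe, exe_size);
    snprintf(slot->source, sizeof slot->source, "%s", bitcode);
    snprintf(slot->image, sizeof slot->image, "%s", exe);
    slot->valid = 1;
    return COMPILED_AND_CACHED;
}
```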


API
---

**Basic:**

* **bccCreateScript** - Create new bcc script

* **bccRegisterSymbolCallback** - Register the callback function for external
  symbol lookup

* **bccReadBC** - Set the source bitcode for compilation

* **bccReadModule** - Set the llvm::Module for compilation

* **bccLinkBC** - Set the library bitcode for linking

* **bccPrepareExecutable** - *deprecated* - Use bccPrepareExecutableEx instead

* **bccPrepareExecutableEx** - Create the in-memory executable by either
  just-in-time compilation or cache loading

* **bccGetFuncAddr** - Get the entry address of the function

* **bccDisposeScript** - Destroy bcc script and release the resources

* **bccGetError** - *deprecated* - Don't use this

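bccRegisterSymbolCallback registers a callback for external symbol
lookup. The general shape of such a callback can be sketched as below.
Everything in the sketch is an assumption made for illustration: the
``SymbolLookupFn`` typedef, the table types, and the two example symbols
are hypothetical, and the actual callback type is defined in libbcc's
headers and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: maps a symbol name to an address. */
typedef void *(*SymbolLookupFn)(void *context, const char *name);

struct SymbolEntry { const char *name; void *addr; };
struct SymbolTable { const struct SymbolEntry *entries; size_t count; };

/* Example data symbols that a runtime might expose to a script. */
int   runtime_frame_count = 0;
float runtime_gravity = 9.8f;

const struct SymbolEntry k_entries[] = {
    { "frame_count", &runtime_frame_count },
    { "gravity",     &runtime_gravity },
};

const struct SymbolTable k_runtime_symbols = {
    k_entries, sizeof k_entries / sizeof k_entries[0]
};

/* Resolve `name` against the table passed in through `context`.
   Returns NULL when the symbol is unknown. */
void *lookup_symbol(void *context, const char *name)
{
    const struct SymbolTable *table = context;
    assert(table != NULL && name != NULL);

    for (size_t i = 0; i < table->count; ++i) {
        if (strcmp(table->entries[i].name, name) == 0)
            return table->entries[i].addr;
    }
    return NULL;
}
```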

**Reflection:**

* **bccGetExportVarCount** - Get the count of exported variables

* **bccGetExportVarList** - Get the addresses of exported variables

* **bccGetExportFuncCount** - Get the count of exported functions

* **bccGetExportFuncList** - Get the addresses of exported functions

* **bccGetPragmaCount** - Get the count of pragmas

* **bccGetPragmaList** - Get the pragmas


**Debug:**

* **bccGetFuncCount** - Get the count of functions (including non-exported)

* **bccGetFuncInfoList** - Get the function information (name, base, size)



Cache File Format
-----------------

A cache file (denoted as \*.oBCC) for libbcc consists of several sections:
header, string pool, dependencies table, relocation table, exported
variable list, exported function list, pragma list, function information
table, and bcc context.  Every section should be aligned to a word size.
Here is a brief description of each section:

* **Header** (MCO_Header) - The header of a cache file. It contains the
  magic word, version, machine integer type information (the endianness,
  the size of off_t, size_t, and ptr_t), and the sizes
  and offsets of the other sections.  The header section is guaranteed
  to be at the beginning of the cache file.

* **String Pool** (MCO_StringPool) - A collection of serialized
  variable-length strings.  A strp_index elsewhere in the cache file
  is the index of a string in this string pool.

* **Dependencies Table** (MCO_DependencyTable) - The dependencies table.
  This table stores the resource name (or file path), the resource
  type (whether in an APK or on the file system), and the SHA1 checksum.

* **Relocation Table** (MCO_RelocationTable) - *not enabled*

* **Exported Variable List** (MCO_ExportVarList) -
  The list of the addresses of exported variables.

* **Exported Function List** (MCO_ExportFuncList) -
  The list of the addresses of exported functions.

* **Pragma List** (MCO_PragmaList) - The list of pragma key-value pairs.

* **Function Information Table** (MCO_FuncTable) - This is a table of
  function information, such as function name, function entry address,
  and function binary size.  The table should be ordered by
  function name.

* **Context** - The context of the in-memory executable, including
  the code and the data.  The offset of the context should be aligned to
  a page size, so that we can mmap the context directly into memory.

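The two alignment rules above (every section on a word boundary, the
context on a page boundary) can be illustrated with a small offset
calculation. This is a sketch under stated assumptions, not the layout
code of libbcc: ``align_up`` and ``layout_cache_file`` are hypothetical
helpers, and the word and page sizes are passed in by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Round `offset` up to the next multiple of `alignment`,
   which must be a power of two. */
size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/*
 * Assign file offsets to `count` sections that follow a header of
 * `header_size` bytes. Each section starts on a `word_size` boundary.
 * Returns the offset of the context, rounded up to `page_size` so that
 * it could be mapped directly.
 */
size_t layout_cache_file(size_t header_size,
                         const size_t *section_sizes, size_t count,
                         size_t word_size, size_t page_size,
                         size_t *section_offsets)
{
    size_t cursor = align_up(header_size, word_size);

    for (size_t i = 0; i < count; ++i) {
        section_offsets[i] = cursor;
        cursor = align_up(cursor + section_sizes[i], word_size);
    }
    return align_up(cursor, page_size);
}
```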
For further information, see `bcc_cache.h <include/bcc/bcc_cache.h>`_,
`CacheReader.cpp <lib/bcc/CacheReader.cpp>`_, and
`CacheWriter.cpp <lib/bcc/CacheWriter.cpp>`_.

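The function information table is ordered by function name, and names
elsewhere in the cache file are referenced by a strp_index into the
string pool. One plausible use of that ordering is a binary search by
name, sketched below. The ``FuncEntry`` layout and ``find_func`` are
hypothetical and simplified; the real structures are declared in
bcc_cache.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory view of one function table entry. */
struct FuncEntry {
    size_t name_strp_index;  /* index of the function name in the string pool */
    size_t addr;             /* entry address, kept as an integer here */
    size_t size;             /* size of the function's binary, in bytes */
};

/*
 * Binary search over a table that is sorted by function name.
 * Returns the matching entry, or NULL if `name` is not present.
 */
const struct FuncEntry *find_func(const struct FuncEntry *table, size_t count,
                                  const char *const *string_pool,
                                  const char *name)
{
    assert(table != NULL && string_pool != NULL && name != NULL);

    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(string_pool[table[mid].name_strp_index], name);
        if (cmp == 0)
            return &table[mid];
        if (cmp < 0)
            lo = mid + 1;   /* entry sorts before `name`: search upper half */
        else
            hi = mid;       /* entry sorts after `name`: search lower half */
    }
    return NULL;
}
```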


JIT'ed Code Calling Conventions
-------------------------------

1. Calls from Execution Environment or from/to within script:

   On ARM, the first 4 arguments go into r0, r1, r2, and r3, in that order.
   The remaining arguments (if any) are passed on the stack.

   For ext_vec_types such as float2, a set of registers is used. In the case
   of float2, a register pair is used. Specifically, if float2 is the first
   argument in the function prototype, float2.x goes into r0 and float2.y
   goes into r1.

   Note: the stack is aligned to the coarsest-grained argument. In the case of
   float2 above as an argument, the parameter stack is aligned to an 8-byte
   boundary (if the sizes of the other arguments are no greater than 8).

2. Calls from/to a separate compilation unit (e.g., calls to Execution
   Environment if those runtime library callees are not compiled using LLVM):

   On ARM, we use hardfp.  Note that a double is placed in a register pair.

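Case 1 above can be modeled as handing out 32-bit argument words in
order: the first four words go to r0 through r3 and the rest go to the
stack. The sketch below is a deliberately simplified model with
hypothetical names (``ArgSlot``, ``assign_arg_slots``); it ignores stack
alignment padding and does not cover the hardfp convention of case 2.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_ARG_REGS = 4 };  /* r0, r1, r2, r3 */

struct ArgSlot {
    int in_register;  /* 1: passed in r<index>; 0: passed on the stack */
    int index;        /* register number, or word index on the stack */
};

/* Assign a location to each of `nwords` 32-bit argument words, in order. */
void assign_arg_slots(size_t nwords, struct ArgSlot *slots)
{
    assert(slots != NULL || nwords == 0);

    for (size_t i = 0; i < nwords; ++i) {
        if (i < (size_t)NUM_ARG_REGS) {
            slots[i].in_register = 1;
            slots[i].index = (int)i;
        } else {
            slots[i].in_register = 0;
            slots[i].index = (int)(i - (size_t)NUM_ARG_REGS);
        }
    }
}
```

For a prototype such as ``f(float2 v, int a, int b, int c)``, the five
argument words are v.x, v.y, a, b, and c. In this model v.x and v.y take
r0 and r1, a and b take r2 and r3, and c is the first word on the stack.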