Subzero - Fast code generator for PNaCl bitcode
===============================================

Design
------

See the accompanying DESIGN.rst file for a more detailed technical overview of
Subzero.

Building
--------

Subzero is set up to be built within the Native Client tree.  Follow the
`Developing PNaCl
<https://sites.google.com/a/chromium.org/dev/nativeclient/pnacl/developing-pnacl>`_
instructions, in particular the section on building PNaCl sources.  This will
prepare the necessary external headers and libraries that Subzero needs.
Checking out the Native Client project also gets the pre-built clang and LLVM
tools in ``native_client/../third_party/llvm-build/Release+Asserts/bin`` which
are used for building Subzero.

The Subzero source is in ``native_client/toolchain_build/src/subzero``.  From
within that directory, run ``git checkout master && git pull`` to get the
latest version of the Subzero source code.

The Makefile is designed to be used as part of the higher level LLVM build
system.  To build manually, use ``Makefile.standalone``.  Several build
configurations are available from the command line::

    make -f Makefile.standalone
    make -f Makefile.standalone DEBUG=1
    make -f Makefile.standalone NOASSERT=1
    make -f Makefile.standalone DEBUG=1 NOASSERT=1
    make -f Makefile.standalone MINIMAL=1
    make -f Makefile.standalone ASAN=1
    make -f Makefile.standalone TSAN=1

``DEBUG=1`` builds without optimizations and is useful when running the
translator inside a debugger.  ``NOASSERT=1`` disables assertions and is the
preferred configuration for performance testing the translator.  ``MINIMAL=1``
attempts to minimize the size of the translator by compiling out everything
unnecessary.  ``ASAN=1`` enables AddressSanitizer, and ``TSAN=1`` enables
ThreadSanitizer.

The ``make`` command produces the ``pnacl-sz`` executable in the current
directory.

Building within LLVM trunk
--------------------------

Subzero can also be built from within a standard LLVM trunk checkout.  Here is
an example of how it can be checked out and built::

    mkdir llvm-git
    cd llvm-git
    git clone http://llvm.org/git/llvm.git
    cd llvm/projects/
    git clone https://chromium.googlesource.com/native_client/pnacl-subzero
    cd ../..
    mkdir build
    cd build
    cmake -G Ninja ../llvm/
    ninja
    ./bin/pnacl-sz -version

This creates a default build of ``pnacl-sz``; currently any options such as
``DEBUG=1`` or ``MINIMAL=1`` have to be added manually.

``pnacl-sz``
------------

The ``pnacl-sz`` program parses a pexe or an LLVM bitcode file and translates it
into ICE (Subzero's intermediate representation).  It then invokes the ICE
translate method to lower it to target-specific machine code, optionally dumping
the intermediate representation at various stages of the translation.

The program can be run as follows::

    ../pnacl-sz ./path/to/<file>.pexe
    ../pnacl-sz ./tests_lit/pnacl-sz_tests/<file>.ll

At this time, ``pnacl-sz`` accepts a number of arguments, including the
following:

    ``-help`` -- Show available arguments and possible values.  (Note: this
    unfortunately also pulls in some LLVM-specific options that are reported but
    that Subzero doesn't use.)

    ``-notranslate`` -- Suppress the ICE translation phase, which is useful if
    ICE is missing some support.

    ``-target=<TARGET>`` -- Set the target architecture.  The default is x8632.
    Future targets include x8664, arm32, and arm64.

    ``-filetype=obj|asm|iasm`` -- Select the output file type.  ``obj`` is a
    native ELF file, ``asm`` is a textual assembly file, and ``iasm`` is a
    low-level textual assembly file demonstrating the integrated assembler.

    ``-O<LEVEL>`` -- Set the optimization level.  Valid levels are ``2``, ``1``,
    ``0``, ``-1``, and ``m1``.  Levels ``-1`` and ``m1`` are synonyms, and
    represent the minimum optimization and worst code quality, but fastest code
    generation.

    ``-verbose=<list>`` -- Set verbosity flags.  This argument allows a
    comma-separated list of values.  The default is ``none``, and the value
    ``inst,pred`` will roughly match the .ll bitcode file.  Of particular use
    are ``all``, ``most``, and ``none``.

    ``-o <FILE>`` -- Set the assembly output file name.  Default is stdout.

    ``-log <FILE>`` -- Set the file name for diagnostic output (whose level is
    controlled by ``-verbose``).  Default is stdout.

    ``-timing`` -- Dump some pass timing information after translating the input
    file.

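As a rough illustration of how the arguments above combine, the following
Python sketch assembles a ``pnacl-sz`` command line from the documented
options only.  The helper name ``pnacl_sz_args`` and the default binary path
are hypothetical and are not part of Subzero.

```python
def pnacl_sz_args(input_path, target=None, filetype=None, opt_level=None,
                  verbose=None, output=None, log=None, timing=False,
                  binary="./pnacl-sz"):
    """Build an argument list for pnacl-sz (hypothetical helper).

    Only options listed in this README are emitted; anything left as None
    is omitted so that pnacl-sz falls back to its own default.
    """
    if filetype is not None and filetype not in ("obj", "asm", "iasm"):
        raise ValueError("unknown file type: %r" % filetype)
    if opt_level is not None and opt_level not in ("2", "1", "0", "-1", "m1"):
        raise ValueError("unknown optimization level: %r" % opt_level)

    args = [binary]
    if target is not None:
        args.append("-target=" + target)
    if filetype is not None:
        args.append("-filetype=" + filetype)
    if opt_level is not None:
        args.append("-O" + opt_level)
    if verbose:
        # -verbose takes a comma-separated list, e.g. "inst,pred".
        args.append("-verbose=" + ",".join(verbose))
    if output is not None:
        args.extend(["-o", output])
    if log is not None:
        args.extend(["-log", log])
    if timing:
        args.append("-timing")
    args.append(input_path)
    return args
```

The resulting list can be passed to ``subprocess.run`` once ``pnacl-sz`` has
been built.
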
Running the test suite
----------------------

Subzero uses the LLVM ``lit`` testing tool for part of its test suite, which
lives in ``tests_lit``. To execute the test suite, first build Subzero, and then
run::

    make -f Makefile.standalone check-lit

There is also a suite of cross tests in the ``crosstest`` directory.  A cross
test takes a test bitcode file implementing some unit tests, and translates it
twice, once with Subzero and once with LLVM's known-good ``llc`` translator.
The Subzero-translated symbols are specially mangled to avoid multiple
definition errors from the linker.  Both translated versions are linked together
with a driver program that calls each version of each unit test with a variety
of interesting inputs and compares the results for equality.  The cross tests
are currently invoked by running::

    make -f Makefile.standalone check-xtest

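The idea behind a cross test driver can be sketched in Python as follows.
This is only a model of the comparison loop, not the actual driver: the two
``add`` functions stand in for the ``llc``-translated and Subzero-translated
versions of one unit test, and every name in it is hypothetical.

```python
import itertools

def llc_add(a, b):
    """Stand-in for the known-good llc translation of a 32-bit add."""
    return (a + b) & 0xFFFFFFFF

def subzero_add(a, b):
    """Stand-in for the Subzero translation (distinct name, as if mangled)."""
    return (a + b) & 0xFFFFFFFF

# Boundary values that tend to expose translation bugs.
INTERESTING_INPUTS = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]

def cross_test(tests, inputs):
    """Call both versions of each test on every input pair; collect mismatches."""
    failures = []
    for name, (reference, candidate) in tests.items():
        for a, b in itertools.product(inputs, repeat=2):
            expected = reference(a, b)
            actual = candidate(a, b)
            if expected != actual:
                failures.append((name, a, b, expected, actual))
    return failures
```

An empty result from ``cross_test({"add": (llc_add, subzero_add)},
INTERESTING_INPUTS)`` means that both versions agreed on every input pair.
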
Similarly, there is a suite of unit tests::

    make -f Makefile.standalone check-unit

A convenient way to run the lit, cross, and unit tests is::

    make -f Makefile.standalone check

Assembling ``pnacl-sz`` output as needed
----------------------------------------

``pnacl-sz`` can now produce a native ELF binary using ``-filetype=obj``.

``pnacl-sz`` can also produce textual assembly code in a structure suitable for
input to ``llvm-mc``, using ``-filetype=asm`` or ``-filetype=iasm``.  An object
file can then be produced with the following command, which reads the assembly
from standard input::

    llvm-mc -triple=i686 -filetype=obj -o=MyObj.o

Building a translated binary
----------------------------

There is a helper script, ``pydir/szbuild.py``, that translates a finalized pexe
into a fully linked executable.  Run it with ``-help`` for extensive
documentation.

By default, ``szbuild.py`` builds an executable using only Subzero translation,
but it can also be used to produce hybrid Subzero/``llc`` binaries (``llc`` is
the name of the LLVM translator) for bisection-based debugging.  In bisection
debugging mode, the pexe is translated using both Subzero and ``llc``, and the
resulting object files are combined into a single executable using symbol
weakening and other linker tricks to control which Subzero symbols and which
``llc`` symbols take precedence.  This is controlled by the ``-include`` and
``-exclude`` arguments.  These arguments can be used to quickly isolate a
single function whose incorrect translation by Subzero leads to incorrect
output.

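The bisection strategy itself can be sketched in Python as a binary search
over function names.  This is a minimal sketch under two assumptions: exactly
one function is mistranslated, and a caller-supplied ``output_is_correct``
callback (hypothetical) builds a hybrid binary in which only the given
functions use Subzero's translation and reports whether the program output is
correct.  It does not invoke ``szbuild.py``.

```python
def find_bad_function(functions, output_is_correct):
    """Narrow down the single mistranslated function by bisection.

    functions: non-empty list of function names in the pexe.
    output_is_correct: hypothetical callback; given the names to translate
        with Subzero (all others use llc), returns True if the resulting
        binary produces correct output.
    """
    candidates = list(functions)
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        if output_is_correct(first_half):
            # The first half is fine, so the culprit is in the second half.
            candidates = candidates[middle:]
        else:
            candidates = first_half
    return candidates[0]
```

Each round halves the candidate set, so the number of hybrid builds grows
logarithmically with the number of functions.
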
There is another helper script, ``pydir/szbuild_spec2k.py``, that runs
``szbuild.py`` on one or more components of the Spec2K suite.  This assumes that
Spec2K is set up in the usual place in the Native Client tree, and the finalized
pexe files have been built.  (Note: for working with Spec2K and other pexes,
it's helpful to finalize the pexe using ``--no-strip-syms``, to preserve the
original function and global variable names.)

Status
------

Subzero currently fully supports the x86-32 architecture, for both native and
Native Client sandboxing modes.  The x86-64 architecture is also supported, in
native mode only and only for the x32 flavor, because pointers and 32-bit
integers are indistinguishable in PNaCl bitcode.  Sandboxing support for
x86-64 is in progress.  ARM and MIPS support is in progress.  Two optimization
levels, ``-Om1`` and ``-O2``, are implemented.

The ``-Om1`` configuration is designed to be the simplest and fastest possible,
with a minimal set of passes and transformations.

* Simple Phi lowering before target lowering, by generating temporaries and
  adding assignments to the end of predecessor blocks.

* Simple register allocation limited to pre-colored or infinite-weight
  Variables.

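The simple Phi lowering described above can be modeled in a few lines of
Python.  This is a sketch on a toy instruction format, not Subzero's ICE data
structures: blocks are lists of tuples, the last instruction of each block is
assumed to be its terminator, and the temporary naming scheme is made up for
illustration.

```python
def lower_phis(blocks):
    """Replace each phi with a fresh temporary assigned in the predecessors.

    blocks: dict mapping block name -> list of instruction tuples.
        ("phi", dest, {pred_name: value}) is a phi; the last instruction
        of every block is assumed to be its terminator.
    Returns a new dict; the input is left unmodified.
    """
    pending = {name: [] for name in blocks}  # assignments to add per block
    lowered = {}
    counter = 0
    for name, instrs in blocks.items():
        body = []
        for instr in instrs:
            if instr[0] == "phi":
                _, dest, incoming = instr
                temp = "%s_phi%d" % (dest, counter)
                counter += 1
                # Each predecessor writes its incoming value to the temporary.
                for pred, value in incoming.items():
                    pending[pred].append(("assign", temp, value))
                # The phi itself becomes a plain copy from the temporary.
                body.append(("assign", dest, temp))
            else:
                body.append(instr)
        lowered[name] = body
    for name, body in lowered.items():
        # Place the new assignments just before the terminator.
        lowered[name] = body[:-1] + pending[name] + body[-1:]
    return lowered
```

Because every phi gets its own temporary, the copies in a predecessor never
overwrite a value that another phi still needs, at the cost of extra moves.
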
The ``-O2`` configuration is designed to use all optimizations available and
produce the best code.

* Address mode inference to leverage the complex x86 addressing modes.

* Compare/branch fusing based on liveness/last-use analysis.

* Global, linear-scan register allocation.

* Advanced phi lowering after target lowering and global register allocation,
  via edge splitting, topological sorting of the parallel moves, and final local
  register allocation.

* Stack slot coalescing to reduce frame size.

* Branch optimization to reduce the number of branches to the following block.

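To illustrate what ordering parallel moves involves, the following Python
sketch turns a set of simultaneous moves into a sequence that is safe to
execute one at a time.  A move is emitted once no other pending move still
reads its destination, which amounts to a topological order of the
dependencies.  Breaking cycles with a scratch location is an assumption made
for this sketch; how Subzero handles cycles is not stated here.

```python
def sequentialize(moves, scratch="tmp"):
    """Order parallel moves so that no source is overwritten before it is read.

    moves: dict mapping destination -> source (destinations are distinct).
    scratch: name of a spare location used to break cycles (an assumption).
    Returns a list of (destination, source) pairs to execute in order.
    """
    # Moves from a location to itself need no code.
    pending = {dst: src for dst, src in moves.items() if dst != src}
    ordered = []
    while pending:
        sources = set(pending.values())
        ready = [dst for dst in pending if dst not in sources]
        if ready:
            # Nothing pending reads these destinations, so they can be written.
            for dst in ready:
                ordered.append((dst, pending.pop(dst)))
        else:
            # Only cycles remain: save one destination's old value in the
            # scratch location and redirect its readers there.
            victim = next(iter(pending))
            ordered.append((scratch, victim))
            for dst, src in pending.items():
                if src == victim:
                    pending[dst] = scratch
    return ordered
```

For a swap such as ``{"a": "b", "b": "a"}`` the sketch first copies ``a`` to
the scratch location, then writes ``a`` from ``b``, and finally writes ``b``
from the scratch location.
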