Subzero - Fast code generator for PNaCl bitcode
===============================================

Design
------

See the accompanying DESIGN.rst file for a more detailed technical overview of
Subzero.

Building
--------

Subzero is set up to be built within the Native Client tree.  Follow the
`Developing PNaCl
<https://sites.google.com/a/chromium.org/dev/nativeclient/pnacl/developing-pnacl>`_
instructions, in particular the section on building PNaCl sources.  This will
prepare the necessary external headers and libraries that Subzero needs.
Checking out the Native Client project also gets the pre-built clang and LLVM
tools in ``native_client/../third_party/llvm-build/Release+Asserts/bin`` which
are used for building Subzero.

The Subzero source is in ``native_client/toolchain_build/src/subzero``.  From
within that directory, ``git checkout master && git pull`` to get the latest
version of Subzero source code.

The Makefile is designed to be used as part of the higher level LLVM build
system.  To build manually, use the ``Makefile.standalone``.  There are several
build configurations from the command line::

    make -f Makefile.standalone
    make -f Makefile.standalone DEBUG=1
    make -f Makefile.standalone NOASSERT=1
    make -f Makefile.standalone DEBUG=1 NOASSERT=1
    make -f Makefile.standalone MINIMAL=1
    make -f Makefile.standalone ASAN=1
    make -f Makefile.standalone TSAN=1

``DEBUG=1`` builds without optimizations and is good when running the translator
inside a debugger.  ``NOASSERT=1`` disables assertions and is the preferred
configuration for performance testing the translator.  ``MINIMAL=1`` attempts to
minimize the size of the translator by compiling out everything unnecessary.
``ASAN=1`` enables AddressSanitizer, and ``TSAN=1`` enables ThreadSanitizer.

The result of the ``make`` command is the target ``pnacl-sz`` in the current
directory.


Building within LLVM trunk
--------------------------

Subzero can also be built from within a standard LLVM trunk checkout.  Here is
an example of how it can be checked out and built::

    mkdir llvm-git
    cd llvm-git
    git clone http://llvm.org/git/llvm.git
    cd llvm/projects/
    git clone https://chromium.googlesource.com/native_client/pnacl-subzero
    cd ../..
    mkdir build
    cd build
    cmake -G Ninja ../llvm/
    ninja
    ./bin/pnacl-sz -version

This creates a default build of ``pnacl-sz``; currently any options such as
``DEBUG=1`` or ``MINIMAL=1`` have to be added manually.

``pnacl-sz``
------------

The ``pnacl-sz`` program parses a pexe or an LLVM bitcode file and translates it
into ICE (Subzero's intermediate representation).  It then invokes the ICE
translate method to lower it to target-specific machine code, optionally dumping
the intermediate representation at various stages of the translation.

The program can be run as follows::

    ../pnacl-sz ./path/to/<file>.pexe
    ../pnacl-sz ./tests_lit/pnacl-sz_tests/<file>.ll

At this time, ``pnacl-sz`` accepts a number of arguments, including the
following:

``-help`` -- Show available arguments and possible values.  (Note: this
unfortunately also pulls in some LLVM-specific options that are reported but
that Subzero doesn't use.)

``-notranslate`` -- Suppress the ICE translation phase, which is useful if
ICE is missing some support.

``-target=<TARGET>`` -- Set the target architecture.  The default is x8632.
Future targets include x8664, arm32, and arm64.

``-filetype=obj|asm|iasm`` -- Select the output file type.  ``obj`` is a
native ELF file, ``asm`` is a textual assembly file, and ``iasm`` is a
low-level textual assembly file demonstrating the integrated assembler.

``-O<LEVEL>`` -- Set the optimization level.  Valid levels are ``2``, ``1``,
``0``, ``-1``, and ``m1``.
Levels ``-1`` and ``m1`` are synonyms, and
represent the minimum optimization and worst code quality, but fastest code
generation.

``-verbose=<list>`` -- Set verbosity flags.  This argument allows a
comma-separated list of values.  The default is ``none``, and the value
``inst,pred`` will roughly match the .ll bitcode file.  Of particular use
are ``all``, ``most``, and ``none``.

``-o <FILE>`` -- Set the assembly output file name.  Default is stdout.

``-log <FILE>`` -- Set the file name for diagnostic output (whose level is
controlled by ``-verbose``).  Default is stdout.

``-timing`` -- Dump some pass timing information after translating the input
file.

Running the test suite
----------------------

Subzero uses the LLVM ``lit`` testing tool for part of its test suite, which
lives in ``tests_lit``.  To execute the test suite, first build Subzero, and then
run::

    make -f Makefile.standalone check-lit

There is also a suite of cross tests in the ``crosstest`` directory.  A cross
test takes a test bitcode file implementing some unit tests, and translates it
twice, once with Subzero and once with LLVM's known-good ``llc`` translator.
The Subzero-translated symbols are specially mangled to avoid multiple
definition errors from the linker.  Both translated versions are linked together
with a driver program that calls each version of each unit test with a variety
of interesting inputs and compares the results for equality.
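
The comparison that a cross-test driver performs can be sketched in Python as
follows.  This is only an illustration of the idea, not the actual driver: the
names ``cross_test``, ``add_llc``, ``add_sz``, and ``INPUTS`` are hypothetical,
and the two ``add_*`` functions merely stand in for the ``llc``-translated and
Subzero-translated versions of one unit test (here assumed to be a 32-bit
wrapping add).

```python
import itertools


def cross_test(reference, candidate, inputs):
    """Return (args, expected, actual) for every input where the two differ."""
    mismatches = []
    for args in inputs:
        expected = reference(*args)   # known-good result
        actual = candidate(*args)     # result under test
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches


# Hypothetical stand-ins for the two translations of the same unit test.
def add_llc(a, b):
    return (a + b) & 0xFFFFFFFF


def add_sz(a, b):
    return (a + b) & 0xFFFFFFFF


# Boundary values are the kind of "interesting" inputs worth pairing up.
INTERESTING = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
INPUTS = list(itertools.product(INTERESTING, repeat=2))
```

Calling ``cross_test(add_llc, add_sz, INPUTS)`` returns an empty list when the
two versions agree on every input pair; any entry in the list names the
arguments and the two differing results.
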
The cross tests
are currently invoked by running::

    make -f Makefile.standalone check-xtest

Similarly, there is a suite of unit tests::

    make -f Makefile.standalone check-unit

A convenient way to run the lit, cross, and unit tests is::

    make -f Makefile.standalone check

Assembling ``pnacl-sz`` output as needed
----------------------------------------

``pnacl-sz`` can now produce a native ELF binary using ``-filetype=obj``.

``pnacl-sz`` can also produce textual assembly code in a structure suitable for
input to ``llvm-mc``, using ``-filetype=asm`` or ``-filetype=iasm``.  An object
file can then be produced using the command::

    llvm-mc -triple=i686 -filetype=obj -o=MyObj.o

Building a translated binary
----------------------------

There is a helper script, ``pydir/szbuild.py``, that translates a finalized pexe
into a fully linked executable.  Run it with ``-help`` for extensive
documentation.

By default, ``szbuild.py`` builds an executable using only Subzero translation,
but it can also be used to produce hybrid Subzero/``llc`` binaries (``llc`` is
the name of the LLVM translator) for bisection-based debugging.  In bisection
debugging mode, the pexe is translated using both Subzero and ``llc``, and the
resulting object files are combined into a single executable using symbol
weakening and other linker tricks to control which Subzero symbols and which
``llc`` symbols take precedence.  This is controlled by the ``-include`` and
``-exclude`` arguments.  These can be used to rapidly find a single function
that Subzero translates incorrectly, leading to incorrect output.

There is another helper script, ``pydir/szbuild_spec2k.py``, that runs
``szbuild.py`` on one or more components of the Spec2K suite.
This assumes that
Spec2K is set up in the usual place in the Native Client tree, and the finalized
pexe files have been built.  (Note: for working with Spec2K and other pexes,
it's helpful to finalize the pexe using ``--no-strip-syms``, to preserve the
original function and global variable names.)

Status
------

Subzero currently fully supports the x86-32 architecture, for both native and
Native Client sandboxing modes.  The x86-64 architecture is also supported in
native mode only, and only for the x32 flavor due to the fact that pointers and
32-bit integers are indistinguishable in PNaCl bitcode.  Sandboxing support for
x86-64 is in progress.  ARM and MIPS support is in progress.  Two optimization
levels, ``-Om1`` and ``-O2``, are implemented.

The ``-Om1`` configuration is designed to be the simplest and fastest possible,
with a minimal set of passes and transformations.

* Simple Phi lowering before target lowering, by generating temporaries and
  adding assignments to the end of predecessor blocks.

* Simple register allocation limited to pre-colored or infinite-weight
  Variables.

The ``-O2`` configuration is designed to use all optimizations available and
produce the best code.

* Address mode inference to leverage the complex x86 addressing modes.

* Compare/branch fusing based on liveness/last-use analysis.

* Global, linear-scan register allocation.

* Advanced phi lowering after target lowering and global register allocation,
  via edge splitting, topological sorting of the parallel moves, and final local
  register allocation.

* Stack slot coalescing to reduce frame size.

* Branch optimization to reduce the number of branches to the following block.
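
To illustrate what ordering the parallel moves involves in advanced phi
lowering, here is a minimal Python sketch.  It is not Subzero's implementation
and makes several assumptions: each move is a ``(dest, src)`` pair of variable
names, all destinations are distinct, and the scratch name ``tmp``
(hypothetical) is free to use for breaking cycles.  The function name
``sequentialize`` is likewise made up for this example.

```python
def sequentialize(moves, temp="tmp"):
    """Order parallel moves [(dest, src), ...] so they can run one at a time."""
    # Self-moves are no-ops; everything else is still pending.
    pending = {dest: src for dest, src in moves if dest != src}
    ordered = []
    while pending:
        sources = set(pending.values())
        # A destination may be overwritten once no pending move still reads it.
        ready = next((d for d in pending if d not in sources), None)
        if ready is not None:
            ordered.append((ready, pending.pop(ready)))
            continue
        # Nothing is ready, so the remaining moves form cycles.  Save one
        # destination in the scratch variable and redirect its reader there.
        victim = next(iter(pending))
        ordered.append((temp, victim))
        pending = {d: (temp if s == victim else s) for d, s in pending.items()}
    return ordered
```

For example, the parallel swap ``[("a", "b"), ("b", "a")]`` cannot be emitted
in either order directly, so the sketch first copies ``a`` into ``tmp`` and
then emits the two moves reading from ``b`` and ``tmp``.  A single scratch
variable is enough here because each cycle is fully emitted before the next
one is broken.
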