==============================================
Kaleidoscope: Adding JIT and Optimizer Support
==============================================

.. contents::
   :local:

Chapter 4 Introduction
======================

Welcome to Chapter 4 of the "`Implementing a language with
LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation
of a simple language and added support for generating LLVM IR. This
chapter describes two new techniques: adding optimizer support to your
language, and adding JIT compiler support. These additions will
demonstrate how to get nice, efficient code for the Kaleidoscope
language.

Trivial Constant Folding
========================

Our demonstration for Chapter 3 is elegant and easy to extend.
Unfortunately, it does not produce wonderful code. The IRBuilder,
however, does give us obvious optimizations when compiling simple code:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double 3.000000e+00, %x
            ret double %addtmp
    }

This code is not a literal transcription of the AST built by parsing the
input. That would be:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double 2.000000e+00, 1.000000e+00
            %addtmp1 = fadd double %addtmp, %x
            ret double %addtmp1
    }

Constant folding, as seen above, is a very common and very important
optimization: so much so that many language implementors implement
constant folding support directly in their AST representation.

With LLVM, you don't need this support in the AST. Since all calls to
build LLVM IR go through the LLVM IR builder, the builder itself checks
for a constant folding opportunity whenever you call it. If there is
one, it simply does the constant fold and returns the constant instead
of creating an instruction.

Well, that was easy :). In practice, we recommend always using
``IRBuilder`` when generating code like this. It has no "syntactic
overhead" for its use (you don't have to uglify your compiler with
constant checks everywhere) and it can dramatically reduce the amount of
LLVM IR that is generated in some cases (particularly for languages with
a macro preprocessor or that use a lot of constants).
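
To see the folding in isolation, here is a minimal sketch (illustrative
only, and not part of the tutorial's code; it assumes ``using namespace
llvm;`` as in the tutorial sources, and the header path varies slightly
between LLVM releases): handing two constants to ``CreateFAdd`` yields a
folded ``ConstantFP`` and never creates an instruction.

.. code-block:: c++

    #include "llvm/IR/IRBuilder.h"  // "llvm/IRBuilder.h" on older releases

    LLVMContext &Ctx = getGlobalContext();
    IRBuilder<> Builder(Ctx);

    Value *One = ConstantFP::get(Ctx, APFloat(1.0));
    Value *Two = ConstantFP::get(Ctx, APFloat(2.0));
    // Both operands are constants, so CreateFAdd returns ConstantFP 3.0
    // directly instead of emitting (or inserting) an fadd instruction.
    Value *Sum = Builder.CreateFAdd(One, Two, "addtmp");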

On the other hand, the ``IRBuilder`` is limited by the fact that it does
all of its analysis inline with the code as it is built. If you take a
slightly more complex example:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double 3.000000e+00, %x
            %addtmp1 = fadd double %x, 3.000000e+00
            %multmp = fmul double %addtmp, %addtmp1
            ret double %multmp
    }

In this case, the LHS and RHS of the multiplication are the same value.
We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``"
instead of computing "``x+3``" twice.

Unfortunately, no amount of local analysis will be able to detect and
correct this. This requires two transformations: reassociation of
expressions (to make the adds lexically identical) and Common
Subexpression Elimination (CSE) to delete the redundant add instruction.
Fortunately, LLVM provides a broad range of optimizations that you can
use, in the form of "passes".

LLVM Optimization Passes
========================

LLVM provides many optimization passes, which do many different sorts of
things and have different tradeoffs. Unlike other systems, LLVM doesn't
hold to the mistaken notion that one set of optimizations is right for
all languages and for all situations. LLVM allows a compiler implementor
to make complete decisions about what optimizations to use, in which
order, and in what situation.

As a concrete example, LLVM supports both "whole module" passes, which
look across as large a body of code as they can (often a whole file, but
if run at link time, this can be a substantial portion of the whole
program), and "per-function" passes, which just operate on a single
function at a time without looking at other functions. For more
information on passes and how they are run, see the
`How to Write a Pass <../WritingAnLLVMPass.html>`_ document and the
`List of LLVM Passes <../Passes.html>`_.

For Kaleidoscope, we are currently generating functions on the fly, one
at a time, as the user types them in. We aren't shooting for the
ultimate optimization experience in this setting, but we also want to
catch the easy and quick stuff where possible. As such, we will choose
to run a few per-function optimizations as the user types the function
in. If we wanted to make a "static Kaleidoscope compiler", we would use
exactly the code we have now, except that we would defer running the
optimizer until the entire file has been parsed.
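
As a rough illustration of that "static compiler" variant (a
hypothetical sketch, not code from this tutorial; it assumes ``using
namespace llvm;`` and the header locations of the LLVM 3.x era this
chapter targets), the optimizations introduced below could instead be
run once over the whole module after parsing finishes, using a
module-level ``PassManager``:

.. code-block:: c++

    #include "llvm/PassManager.h"
    #include "llvm/Transforms/Scalar.h"

    // Hypothetical helper: run the "cleanup" passes once over a fully
    // parsed module instead of per function as the user types it in.
    static void OptimizeWholeModule(Module *M) {
      PassManager PM;
      PM.add(createInstructionCombiningPass());
      PM.add(createReassociatePass());
      PM.add(createGVNPass());
      PM.add(createCFGSimplificationPass());
      PM.run(*M);  // Each pass is run over every function in the module.
    }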

In order to get per-function optimizations going, we need to set up a
`FunctionPassManager <../WritingAnLLVMPass.html#passmanager>`_ to hold
and organize the LLVM optimizations that we want to run. Once we have
that, we can add a set of optimizations to run. The code looks like
this:

.. code-block:: c++

      FunctionPassManager OurFPM(TheModule);

      // Set up the optimizer pipeline.  Start with registering info about how the
      // target lays out data structures.
      OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout()));
      // Provide basic AliasAnalysis support for GVN.
      OurFPM.add(createBasicAliasAnalysisPass());
      // Do simple "peephole" optimizations and bit-twiddling optzns.
      OurFPM.add(createInstructionCombiningPass());
      // Reassociate expressions.
      OurFPM.add(createReassociatePass());
      // Eliminate Common SubExpressions.
      OurFPM.add(createGVNPass());
      // Simplify the control flow graph (deleting unreachable blocks, etc).
      OurFPM.add(createCFGSimplificationPass());

      OurFPM.doInitialization();

      // Set the global so the code gen can use this.
      TheFPM = &OurFPM;

      // Run the main "interpreter loop" now.
      MainLoop();

This code defines a ``FunctionPassManager``, "``OurFPM``". It requires a
pointer to the ``Module`` to construct itself. Once it is set up, we use
a series of "add" calls to add a bunch of LLVM passes. The first pass is
basically boilerplate: it adds a pass so that later optimizations know
how the data structures in the program are laid out. The
"``TheExecutionEngine``" variable is related to the JIT, which we will
get to in the next section.

In this case, we choose to add four optimization passes. The passes we
chose here are a pretty standard set of "cleanup" optimizations that are
useful for a wide variety of code. I won't delve into what they do but,
believe me, they are a good starting place :).

Once the PassManager is set up, we need to make use of it. We do this by
running it after our newly created function is constructed (in
``FunctionAST::Codegen``), but before it is returned to the client:

.. code-block:: c++

      if (Value *RetVal = Body->Codegen()) {
        // Finish off the function.
        Builder.CreateRet(RetVal);

        // Validate the generated code, checking for consistency.
        verifyFunction(*TheFunction);

        // Optimize the function.
        TheFPM->run(*TheFunction);

        return TheFunction;
      }

As you can see, this is pretty straightforward. The
``FunctionPassManager`` optimizes and updates the LLVM Function\* in
place, improving (hopefully) its body. With this in place, we can try
our test above again:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double %x, 3.000000e+00
            %multmp = fmul double %addtmp, %addtmp
            ret double %multmp
    }

As expected, we now get our nicely optimized code, saving a
floating-point add instruction from every execution of this function.

LLVM provides a wide variety of optimizations that can be used in
certain circumstances. Some `documentation about the various
passes <../Passes.html>`_ is available, but it isn't very complete.
Another good source of ideas is to look at the passes that ``Clang``
runs to get started. The "``opt``" tool allows you to experiment with
passes from the command line, so you can see if they do anything.

Now that we have reasonable code coming out of our front-end, let's talk
about executing it!

Adding a JIT Compiler
=====================

Code that is available in LLVM IR can have a wide variety of tools
applied to it. For example, you can run optimizations on it (as we did
above), you can dump it out in textual or binary forms, you can compile
the code to an assembly file (.s) for some target, or you can JIT
compile it. The nice thing about the LLVM IR representation is that it
is the "common currency" between many different parts of the compiler.

In this section, we'll add JIT compiler support to our interpreter. The
basic idea that we want for Kaleidoscope is to have the user enter
function bodies as they do now, but immediately evaluate the top-level
expressions they type in. For example, if they type in "1 + 2;", we
should evaluate and print out 3. If they define a function, they should
be able to call it from the command line.

In order to do this, we first declare and initialize the JIT. This is
done by adding a global variable and a call in ``main``:

.. code-block:: c++

    static ExecutionEngine *TheExecutionEngine;
    ...
    int main() {
      ..
      // Create the JIT.  This takes ownership of the module.
      TheExecutionEngine = EngineBuilder(TheModule).create();
      ..
    }

This creates an abstract "Execution Engine" which can be either a JIT
compiler or the LLVM interpreter. LLVM will automatically pick a JIT
compiler for you if one is available for your platform, otherwise it
will fall back to the interpreter.
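
In practice, the creation call is a little more defensive. Here is a
slightly fuller sketch (the full code listing at the end of the chapter
does essentially this): ``InitializeNativeTarget()`` is called first so
a JIT for the host is available, and the error string is captured so a
failure is reported instead of silently leaving the pointer null.

.. code-block:: c++

    InitializeNativeTarget();
    ...
    // Create the JIT, reporting an error if no engine could be built.
    std::string ErrStr;
    TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create();
    if (!TheExecutionEngine) {
      fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
      exit(1);
    }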

Once the ``ExecutionEngine`` is created, the JIT is ready to be used.
There are a variety of APIs that are useful, but the simplest one is the
"``getPointerToFunction(F)``" method. This method JIT compiles the
specified LLVM Function and returns a function pointer to the generated
machine code. In our case, this means that we can change the code that
parses a top-level expression to look like this:

.. code-block:: c++

    static void HandleTopLevelExpression() {
      // Evaluate a top-level expression into an anonymous function.
      if (FunctionAST *F = ParseTopLevelExpr()) {
        if (Function *LF = F->Codegen()) {
          LF->dump();  // Dump the function for exposition purposes.

          // JIT the function, returning a function pointer.
          void *FPtr = TheExecutionEngine->getPointerToFunction(LF);

          // Cast it to the right type (takes no arguments, returns a double) so we
          // can call it as a native function.
          double (*FP)() = (double (*)())(intptr_t)FPtr;
          fprintf(stderr, "Evaluated to %f\n", FP());
        }

Recall that we compile top-level expressions into a self-contained LLVM
function that takes no arguments and returns the computed double.
Because the LLVM JIT compiler matches the native platform ABI, this
means that you can just cast the result pointer to a function pointer of
that type and call it directly. This means there is no difference
between JIT compiled code and native machine code that is statically
linked into your application.

With just these two changes, let's see how Kaleidoscope works now!

::

    ready> 4+5;
    Read top-level expression:
    define double @0() {
    entry:
      ret double 9.000000e+00
    }

    Evaluated to 9.000000

Well, this looks like it is basically working. The dump of the function
shows the "no argument function that always returns double" that we
synthesize for each top-level expression that is typed in. This
demonstrates very basic functionality, but can we do more?

::

    ready> def testfunc(x y) x + y*2;
    Read function definition:
    define double @testfunc(double %x, double %y) {
    entry:
      %multmp = fmul double %y, 2.000000e+00
      %addtmp = fadd double %multmp, %x
      ret double %addtmp
    }

    ready> testfunc(4, 10);
    Read top-level expression:
    define double @1() {
    entry:
      %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)
      ret double %calltmp
    }

    Evaluated to 24.000000

This illustrates that we can now call user code, but there is something
a bit subtle going on here. Note that we only invoke the JIT on the
anonymous functions that *call testfunc*, but we never invoke it on
*testfunc* itself. What actually happened here is that the JIT scanned
for all non-JIT'd functions transitively called from the anonymous
function and compiled all of them before returning from
``getPointerToFunction()``.

The JIT provides a number of other more advanced interfaces for things
like freeing allocated machine code, rejit'ing functions to update them,
etc. However, even with this simple code, we get some surprisingly
powerful capabilities - check this out (I removed the dump of the
anonymous functions, you should get the idea by now :) :

::

    ready> extern sin(x);
    Read extern:
    declare double @sin(double)

    ready> extern cos(x);
    Read extern:
    declare double @cos(double)

    ready> sin(1.0);
    Read top-level expression:
    define double @2() {
    entry:
      ret double 0x3FEAED548F090CEE
    }

    Evaluated to 0.841471

    ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);
    Read function definition:
    define double @foo(double %x) {
    entry:
      %calltmp = call double @sin(double %x)
      %multmp = fmul double %calltmp, %calltmp
      %calltmp2 = call double @cos(double %x)
      %multmp4 = fmul double %calltmp2, %calltmp2
      %addtmp = fadd double %multmp, %multmp4
      ret double %addtmp
    }

    ready> foo(4.0);
    Read top-level expression:
    define double @3() {
    entry:
      %calltmp = call double @foo(double 4.000000e+00)
      ret double %calltmp
    }

    Evaluated to 1.000000

Whoa, how does the JIT know about sin and cos? The answer is
surprisingly simple: in this example, the JIT started execution of a
function and got to a function call. It realized that the function was
not yet JIT compiled and invoked the standard set of routines to resolve
the function. In this case, there is no body defined for the function,
so the JIT ended up calling "``dlsym("sin")``" on the Kaleidoscope
process itself. Since "``sin``" is defined within the JIT's address
space, it simply patches up calls in the module to call the libm version
of ``sin`` directly.

The LLVM JIT provides a number of interfaces (look in the
``ExecutionEngine.h`` file) for controlling how unknown functions get
resolved. It allows you to establish explicit mappings between IR
objects and addresses (useful for LLVM global variables that you want to
map to static tables, for example), allows you to dynamically decide on
the fly based on the function name, and even allows you to have the JIT
compile functions lazily the first time they're called.
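
For a flavor of the explicit-mapping interface, here is a hypothetical
sketch (the ``mysqrt`` name and helper are made up for illustration, and
``using namespace llvm;`` is assumed): calls to a declared function can
be bound to a native address of our choosing, bypassing the
``dlsym``-based lookup entirely.

.. code-block:: c++

    #include <cmath>

    /// mysqrt - a native helper we want IR calls to "mysqrt" to resolve to.
    extern "C" double mysqrt(double X) { return sqrt(X); }
    ...
    // After codegen, bind the IR declaration to the helper's address.
    if (Function *F = TheModule->getFunction("mysqrt"))
      TheExecutionEngine->addGlobalMapping(F, (void *)&mysqrt);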

One interesting application of this is that we can now extend the
language by writing arbitrary C++ code to implement operations. For
example, if we add:

.. code-block:: c++

    /// putchard - putchar that takes a double and returns 0.
    extern "C"
    double putchard(double X) {
      putchar((char)X);
      return 0;
    }

Now we can produce simple output to the console by using things like:
"``extern putchard(x); putchard(120);``", which prints a lowercase 'x'
on the console (120 is the ASCII code for 'x'). Similar code could be
used to implement file I/O, console input, and many other capabilities
in Kaleidoscope.
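
Another helper in the same style (a hypothetical sketch, not part of
this chapter's code listing) would let Kaleidoscope print full numeric
values rather than single characters:

.. code-block:: c++

    /// printd - hypothetical helper: prints a double followed by a
    /// newline, returning 0.
    extern "C"
    double printd(double X) {
      printf("%f\n", X);
      return 0;
    }

With that in place, "``extern printd(x); printd(24);``" would print
"24.000000" to the console.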

This completes the JIT and optimizer chapter of the Kaleidoscope
tutorial. At this point, we can compile a non-Turing-complete
programming language, and optimize and JIT compile it in a user-driven
way. Next up we'll look into `extending the language with control flow
constructs <LangImpl5.html>`_, tackling some interesting LLVM IR issues
along the way.

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
the LLVM JIT and optimizer. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
    # Run
    ./toy

If you are compiling this on Linux, make sure to add the "-rdynamic"
option as well. This makes sure that the external functions are resolved
properly at runtime.

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/Chapter4/toy.cpp
   :language: c++

`Next: Extending the language: control flow <LangImpl5.html>`_