==============================================
Kaleidoscope: Adding JIT and Optimizer Support
==============================================

.. contents::
   :local:

Chapter 4 Introduction
======================

Welcome to Chapter 4 of the "`Implementing a language with
LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation
of a simple language and added support for generating LLVM IR. This
chapter describes two new techniques: adding optimizer support to your
language, and adding JIT compiler support. These additions will
demonstrate how to get nice, efficient code for the Kaleidoscope
language.

Trivial Constant Folding
========================

Our demonstration for Chapter 3 is elegant and easy to extend.
Unfortunately, it does not produce wonderful code. The IRBuilder,
however, does give us obvious optimizations when compiling simple code:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double 3.000000e+00, %x
            ret double %addtmp
    }

This code is not a literal transcription of the AST built by parsing the
input. That would be:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double 2.000000e+00, 1.000000e+00
            %addtmp1 = fadd double %addtmp, %x
            ret double %addtmp1
    }

Constant folding, as seen above, is a very common and very important
optimization: so much so that many language implementors implement
constant folding support in their AST representation.

With LLVM, you don't need this support in the AST. Since all calls to
build LLVM IR go through the LLVM IR builder, the builder itself checks
whether there is a constant folding opportunity when you call it. If so,
it just does the constant fold and returns the constant instead of
creating an instruction.

Well, that was easy :). In practice, we recommend always using
``IRBuilder`` when generating code like this. It has no "syntactic
overhead" for its use (you don't have to uglify your compiler with
constant checks everywhere) and it can dramatically reduce the amount of
LLVM IR that is generated in some cases (particularly for languages with
a macro preprocessor or that use a lot of constants).
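
To make this concrete, here is a minimal sketch (assuming the ``Builder``
global and an ``LLVMContext`` global named ``TheContext``, as in the earlier
chapters; the variable names below are purely illustrative) showing that when
both operands are constants, ``CreateFAdd`` returns a folded constant instead
of emitting an ``fadd`` instruction:

.. code-block:: c++

    // Both operands are compile-time constants...
    Value *L = ConstantFP::get(TheContext, APFloat(1.0));
    Value *R = ConstantFP::get(TheContext, APFloat(2.0));

    // ...so the builder folds the add: Sum is a ConstantFP holding 3.0 and
    // no instruction is inserted into the current basic block.
    Value *Sum = Builder.CreateFAdd(L, R, "addtmp");
    assert(isa<ConstantFP>(Sum) && "expected the builder to constant fold");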

On the other hand, the ``IRBuilder`` is limited by the fact that it does
all of its analysis inline with the code as it is built. If you take a
slightly more complex example:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double 3.000000e+00, %x
            %addtmp1 = fadd double %x, 3.000000e+00
            %multmp = fmul double %addtmp, %addtmp1
            ret double %multmp
    }

In this case, the LHS and RHS of the multiplication are the same value.
We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``"
instead of computing "``x+3``" twice.

Unfortunately, no amount of local analysis will be able to detect and
correct this. This requires two transformations: reassociation of
expressions (to make the adds lexically identical) and Common
Subexpression Elimination (CSE) to delete the redundant add instruction.
Fortunately, LLVM provides a broad range of optimizations that you can
use, in the form of "passes".

LLVM Optimization Passes
========================

LLVM provides many optimization passes, which do many different sorts of
things and have different tradeoffs. Unlike other systems, LLVM doesn't
hold to the mistaken notion that one set of optimizations is right for
all languages and for all situations. LLVM allows a compiler implementor
to make complete decisions about what optimizations to use, in which
order, and in what situation.

As a concrete example, LLVM supports "whole module" passes, which look
across as large a body of code as they can (often a whole file, but if
run at link time, this can be a substantial portion of the whole
program). It also supports and includes "per-function" passes, which
operate on a single function at a time without looking at other
functions. For more information on passes and how they are run, see the
`How to Write a Pass <../WritingAnLLVMPass.html>`_ document and the
`List of LLVM Passes <../Passes.html>`_.

For Kaleidoscope, we are currently generating functions on the fly, one
at a time, as the user types them in. We aren't shooting for the
ultimate optimization experience in this setting, but we also want to
catch the easy and quick stuff where possible. As such, we will choose
to run a few per-function optimizations as the user types the function
in. If we wanted to make a "static Kaleidoscope compiler", we would use
exactly the code we have now, except that we would defer running the
optimizer until the entire file has been parsed.

In order to get per-function optimizations going, we need to set up a
`FunctionPassManager <../WritingAnLLVMPass.html#what-passmanager-does>`_ to hold
and organize the LLVM optimizations that we want to run. Once we have
that, we can add a set of optimizations to run. We'll need a new
FunctionPassManager for each module that we want to optimize, so we'll
write a function to create and initialize both the module and pass manager
for us:

.. code-block:: c++

    void InitializeModuleAndPassManager(void) {
      // Open a new module.
      TheModule = llvm::make_unique<Module>("my cool jit", TheContext);
      TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout());

      // Create a new pass manager attached to it.
      TheFPM = llvm::make_unique<FunctionPassManager>(TheModule.get());

      // Provide basic AliasAnalysis support for GVN.
      TheFPM->add(createBasicAliasAnalysisPass());
      // Do simple "peephole" optimizations and bit-twiddling optzns.
      TheFPM->add(createInstructionCombiningPass());
      // Reassociate expressions.
      TheFPM->add(createReassociatePass());
      // Eliminate Common SubExpressions.
      TheFPM->add(createGVNPass());
      // Simplify the control flow graph (deleting unreachable blocks, etc).
      TheFPM->add(createCFGSimplificationPass());

      TheFPM->doInitialization();
    }

This code initializes the global module ``TheModule``, and the function pass
manager ``TheFPM``, which is attached to ``TheModule``. Once the pass manager is
set up, we use a series of "add" calls to add a bunch of LLVM passes.

In this case, we choose to add five passes: one analysis pass (alias analysis),
and four optimization passes. The passes we choose here are a pretty standard set
of "cleanup" optimizations that are useful for a wide variety of code. I won't
delve into what they do but, believe me, they are a good starting place :).

Once the PassManager is set up, we need to make use of it. We do this by
running it after our newly created function is constructed (in
``FunctionAST::codegen()``), but before it is returned to the client:

.. code-block:: c++

      if (Value *RetVal = Body->codegen()) {
        // Finish off the function.
        Builder.CreateRet(RetVal);

        // Validate the generated code, checking for consistency.
        verifyFunction(*TheFunction);

        // Optimize the function.
        TheFPM->run(*TheFunction);

        return TheFunction;
      }

As you can see, this is pretty straightforward. The
``FunctionPassManager`` optimizes and updates the LLVM Function\* in
place, improving (hopefully) its body. With this in place, we can try
our test above again:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double %x, 3.000000e+00
            %multmp = fmul double %addtmp, %addtmp
            ret double %multmp
    }

As expected, we now get our nicely optimized code, saving a floating
point add instruction from every execution of this function.

LLVM provides a wide variety of optimizations that can be used in
certain circumstances. Some `documentation about the various
passes <../Passes.html>`_ is available, but it isn't very complete.
Another good source of ideas can come from looking at the passes that
``Clang`` runs to get started. The "``opt``" tool allows you to
experiment with passes from the command line, so you can see if they do
anything.
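
For example, assuming you have saved the unoptimized IR from above into a file
named ``test.ll`` (the file name is just for illustration), you could apply the
same cleanup passes by hand and inspect the result; note that the exact
pass-flag spelling depends on your LLVM version:

.. code-block:: bash

    # Run instruction combining, reassociation, GVN and CFG simplification on
    # the IR in test.ll, printing the optimized IR to stdout.
    opt -S -instcombine -reassociate -gvn -simplifycfg test.ll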

Now that we have reasonable code coming out of our front-end, let's talk
about executing it!

Adding a JIT Compiler
=====================

Code that is available in LLVM IR can have a wide variety of tools
applied to it. For example, you can run optimizations on it (as we did
above), you can dump it out in textual or binary forms, you can compile
the code to an assembly file (.s) for some target, or you can JIT
compile it. The nice thing about the LLVM IR representation is that it
is the "common currency" between many different parts of the compiler.

In this section, we'll add JIT compiler support to our interpreter. The
basic idea that we want for Kaleidoscope is to have the user enter
function bodies as they do now, but immediately evaluate the top-level
expressions they type in. For example, if they type in "1 + 2;", we
should evaluate and print out 3. If they define a function, they should
be able to call it from the command line.

In order to do this, we first declare and initialize the JIT. This is
done by adding a global variable ``TheJIT``, and initializing it in
``main``:

.. code-block:: c++

    static std::unique_ptr<KaleidoscopeJIT> TheJIT;
    ...
    int main() {
      ..
      TheJIT = llvm::make_unique<KaleidoscopeJIT>();

      // Run the main "interpreter loop" now.
      MainLoop();

      return 0;
    }

The KaleidoscopeJIT class is a simple JIT built specifically for these
tutorials. In later chapters we will look at how it works and extend it with
new features, but for now we will take it as given. Its API is very simple:
``addModule`` adds an LLVM IR module to the JIT, making its functions
available for execution; ``removeModule`` removes a module, freeing any
memory associated with the code in that module; and ``findSymbol`` allows us
to look up pointers to the compiled code.

We can take this simple API and change our code that parses top-level expressions to
look like this:

.. code-block:: c++

    static void HandleTopLevelExpression() {
      // Evaluate a top-level expression into an anonymous function.
      if (auto FnAST = ParseTopLevelExpr()) {
        if (FnAST->codegen()) {

          // JIT the module containing the anonymous expression, keeping a handle so
          // we can free it later.
          auto H = TheJIT->addModule(std::move(TheModule));
          InitializeModuleAndPassManager();

          // Search the JIT for the __anon_expr symbol.
          auto ExprSymbol = TheJIT->findSymbol("__anon_expr");
          assert(ExprSymbol && "Function not found");

          // Get the symbol's address and cast it to the right type (takes no
          // arguments, returns a double) so we can call it as a native function.
          double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
          fprintf(stderr, "Evaluated to %f\n", FP());

          // Delete the anonymous expression module from the JIT.
          TheJIT->removeModule(H);
        }
      } else {
        // Skip token for error recovery.
        getNextToken();
      }
    }

If parsing and codegen succeed, the next step is to add the module containing
the top-level expression to the JIT. We do this by calling addModule, which
triggers code generation for all the functions in the module, and returns a
handle that can be used to remove the module from the JIT later. Once the module
has been added to the JIT it can no longer be modified, so we also open a new
module to hold subsequent code by calling ``InitializeModuleAndPassManager()``.

Once we've added the module to the JIT we need to get a pointer to the final
generated code. We do this by calling the JIT's findSymbol method, and passing
the name of the top-level expression function: ``__anon_expr``. Since we just
added this function, we assert that findSymbol returned a result.

Next, we get the in-memory address of the ``__anon_expr`` function by calling
``getAddress()`` on the symbol. Recall that we compile top-level expressions
into a self-contained LLVM function that takes no arguments and returns the
computed double. Because the LLVM JIT compiler matches the native platform ABI,
this means that you can just cast the result pointer to a function pointer of
that type and call it directly. There is no difference between JIT compiled
code and native machine code that is statically linked into your application.

Finally, since we don't support re-evaluation of top-level expressions, we
remove the module from the JIT when we're done to free the associated memory.
Recall, however, that the module we created a few lines earlier (via
``InitializeModuleAndPassManager``) is still open and waiting for new code to be
added.

With just these two changes, let's see how Kaleidoscope works now!

::

    ready> 4+5;
    Read top-level expression:
    define double @0() {
    entry:
      ret double 9.000000e+00
    }

    Evaluated to 9.000000

Well this looks like it is basically working. The dump of the function
shows the "no argument function that always returns double" that we
synthesize for each top-level expression that is typed in. This
demonstrates very basic functionality, but can we do more?

::

    ready> def testfunc(x y) x + y*2;
    Read function definition:
    define double @testfunc(double %x, double %y) {
    entry:
      %multmp = fmul double %y, 2.000000e+00
      %addtmp = fadd double %multmp, %x
      ret double %addtmp
    }

    ready> testfunc(4, 10);
    Read top-level expression:
    define double @1() {
    entry:
      %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)
      ret double %calltmp
    }

    Evaluated to 24.000000

    ready> testfunc(5, 10);
    ready> LLVM ERROR: Program used external function 'testfunc' which could not be resolved!

Function definitions and calls also work, but something went very wrong on that
last line. The call looks valid, so what happened? As you may have guessed from
the API, a Module is a unit of allocation for the JIT, and testfunc was part
of the same module that contained the anonymous expression. When we removed that
module from the JIT to free the memory for the anonymous expression, we deleted
the definition of ``testfunc`` along with it. Then, when we tried to call
testfunc a second time, the JIT could no longer find it.

The easiest way to fix this is to put the anonymous expression in a separate
module from the rest of the function definitions. The JIT will happily resolve
function calls across module boundaries, as long as each of the functions called
has a prototype, and is added to the JIT before it is called. By putting the
anonymous expression in a different module we can delete it without affecting
the rest of the functions.

In fact, we're going to go a step further and put every function in its own
module. Doing so allows us to exploit a useful property of the KaleidoscopeJIT
that will make our environment more REPL-like: Functions can be added to the
JIT more than once (unlike a module where every function must have a unique
definition). When you look up a symbol in KaleidoscopeJIT it will always return
the most recent definition:

::

    ready> def foo(x) x + 1;
    Read function definition:
    define double @foo(double %x) {
    entry:
      %addtmp = fadd double %x, 1.000000e+00
      ret double %addtmp
    }

    ready> foo(2);
    Evaluated to 3.000000

    ready> def foo(x) x + 2;
    define double @foo(double %x) {
    entry:
      %addtmp = fadd double %x, 2.000000e+00
      ret double %addtmp
    }

    ready> foo(2);
    Evaluated to 4.000000

To allow each function to live in its own module we'll need a way to
re-generate previous function declarations into each new module we open:

.. code-block:: c++

    static std::unique_ptr<KaleidoscopeJIT> TheJIT;

    ...

    Function *getFunction(std::string Name) {
      // First, see if the function has already been added to the current module.
      if (auto *F = TheModule->getFunction(Name))
        return F;

      // If not, check whether we can codegen the declaration from some existing
      // prototype.
      auto FI = FunctionProtos.find(Name);
      if (FI != FunctionProtos.end())
        return FI->second->codegen();

      // If no existing prototype exists, return null.
      return nullptr;
    }

    ...

    Value *CallExprAST::codegen() {
      // Look up the name in the global module table.
      Function *CalleeF = getFunction(Callee);

    ...

    Function *FunctionAST::codegen() {
      // Transfer ownership of the prototype to the FunctionProtos map, but keep a
      // reference to it for use below.
      auto &P = *Proto;
      FunctionProtos[Proto->getName()] = std::move(Proto);
      Function *TheFunction = getFunction(P.getName());
      if (!TheFunction)
        return nullptr;

To enable this, we'll start by adding a new global, ``FunctionProtos``, that
holds the most recent prototype for each function. We'll also add a convenience
method, ``getFunction()``, to replace calls to ``TheModule->getFunction()``.
Our convenience method searches ``TheModule`` for an existing function
declaration, falling back to generating a new declaration from FunctionProtos if
it doesn't find one. In ``CallExprAST::codegen()`` we just need to replace the
call to ``TheModule->getFunction()``. In ``FunctionAST::codegen()`` we need to
update the FunctionProtos map first, then call ``getFunction()``. With this
done, we can always obtain a function declaration in the current module for any
previously declared function.
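
For reference, the new global can be declared as a simple map from a function's
name to its most recent prototype (a minimal sketch, in the same style as the
other globals in this chapter):

.. code-block:: c++

    // Holds the most recent prototype seen for each function name, so we can
    // re-emit a declaration into whichever module is currently open.
    static std::map<std::string, std::unique_ptr<PrototypeAST>> FunctionProtos;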

We also need to update HandleDefinition and HandleExtern:

.. code-block:: c++

    static void HandleDefinition() {
      if (auto FnAST = ParseDefinition()) {
        if (auto *FnIR = FnAST->codegen()) {
          fprintf(stderr, "Read function definition:");
          FnIR->dump();
          TheJIT->addModule(std::move(TheModule));
          InitializeModuleAndPassManager();
        }
      } else {
        // Skip token for error recovery.
        getNextToken();
      }
    }

    static void HandleExtern() {
      if (auto ProtoAST = ParseExtern()) {
        if (auto *FnIR = ProtoAST->codegen()) {
          fprintf(stderr, "Read extern: ");
          FnIR->dump();
          FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST);
        }
      } else {
        // Skip token for error recovery.
        getNextToken();
      }
    }

In HandleDefinition, we add two lines to transfer the newly defined function to
the JIT and open a new module. In HandleExtern, we just need to add one line to
add the prototype to FunctionProtos.

With these changes made, let's try our REPL again (I removed the dump of the
anonymous functions this time; you should get the idea by now :) :

::

    ready> def foo(x) x + 1;
    ready> foo(2);
    Evaluated to 3.000000

    ready> def foo(x) x + 2;
    ready> foo(2);
    Evaluated to 4.000000

It works!

Even with this simple code, we get some surprisingly powerful capabilities -
check this out:

::

    ready> extern sin(x);
    Read extern:
    declare double @sin(double)

    ready> extern cos(x);
    Read extern:
    declare double @cos(double)

    ready> sin(1.0);
    Read top-level expression:
    define double @2() {
    entry:
      ret double 0x3FEAED548F090CEE
    }

    Evaluated to 0.841471

    ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);
    Read function definition:
    define double @foo(double %x) {
    entry:
      %calltmp = call double @sin(double %x)
      %multmp = fmul double %calltmp, %calltmp
      %calltmp2 = call double @cos(double %x)
      %multmp4 = fmul double %calltmp2, %calltmp2
      %addtmp = fadd double %multmp, %multmp4
      ret double %addtmp
    }

    ready> foo(4.0);
    Read top-level expression:
    define double @3() {
    entry:
      %calltmp = call double @foo(double 4.000000e+00)
      ret double %calltmp
    }

    Evaluated to 1.000000

Whoa, how does the JIT know about sin and cos? The answer is surprisingly
simple: The KaleidoscopeJIT has a straightforward symbol resolution rule that
it uses to find symbols that aren't available in any given module: First
it searches all the modules that have already been added to the JIT, from the
most recent to the oldest, to find the newest definition. If no definition is
found inside the JIT, it falls back to calling "``dlsym("sin")``" on the
Kaleidoscope process itself. Since "``sin``" is defined within the JIT's
address space, it simply patches up calls in the module to call the libm
version of ``sin`` directly.
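
Conceptually, that last-resort lookup behaves like asking the dynamic linker
for the symbol in the running process. The snippet below is only an
illustration of the idea, not the JIT's actual resolver code:

.. code-block:: c++

    #include <dlfcn.h>

    // Illustration only: resolve a name against whatever is already loaded
    // into this process (libm's sin, in this case), much like the JIT's
    // dlsym-style fallback.
    void *resolveInProcess(const char *Name) {
      return dlsym(RTLD_DEFAULT, Name);
    }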

In the future we'll see how tweaking this symbol resolution rule can be used to
enable all sorts of useful features, from security (restricting the set of
symbols available to JIT'd code), to dynamic code generation based on symbol
names, and even lazy compilation.

One immediate benefit of the symbol resolution rule is that we can now extend
the language by writing arbitrary C++ code to implement operations. For example,
if we add:

.. code-block:: c++

    /// putchard - putchar that takes a double and returns 0.
    extern "C" double putchard(double X) {
      fputc((char)X, stderr);
      return 0;
    }

Now we can produce simple output to the console by using things like:
"``extern putchard(x); putchard(120);``", which prints a lowercase 'x'
on the console (120 is the ASCII code for 'x'). Similar code could be
used to implement file I/O, console input, and many other capabilities
in Kaleidoscope.
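
A second helper in the same style can print numeric values; this is a sketch of
the kind of function you might add alongside ``putchard`` (the name ``printd``
is just a suggestion). Like ``putchard``, it only needs C linkage so the JIT's
symbol resolution can find it:

.. code-block:: c++

    /// printd - print a double followed by a newline, returning 0.
    extern "C" double printd(double X) {
      fprintf(stderr, "%f\n", X);
      return 0;
    }

After "``extern printd(x);``", Kaleidoscope code can call ``printd(5);`` to
print intermediate values while experimenting.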

This completes the JIT and optimizer chapter of the Kaleidoscope
tutorial. At this point, we can compile a non-Turing-complete
programming language, and optimize and JIT compile it in a user-driven way.
Next up we'll look into `extending the language with control flow
constructs <LangImpl05.html>`_, tackling some interesting LLVM IR issues
along the way.

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
the LLVM JIT and optimizer. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
    # Run
    ./toy

If you are compiling this on Linux, make sure to add the "-rdynamic"
option as well. This makes sure that the external functions are resolved
properly at runtime.
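
For example, on Linux the compile line becomes:

.. code-block:: bash

    # -rdynamic exports the executable's own symbols (e.g. putchard) so the
    # JIT's dlsym fallback can resolve them at runtime.
    clang++ -g -rdynamic toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy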

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/Chapter4/toy.cpp
   :language: c++

`Next: Extending the language: control flow <LangImpl05.html>`_