==============================================
Kaleidoscope: Adding JIT and Optimizer Support
==============================================

.. contents::
   :local:

Chapter 4 Introduction
======================

Welcome to Chapter 4 of the "`Implementing a language with
LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation
of a simple language and added support for generating LLVM IR. This
chapter describes two new techniques: adding optimizer support to your
language, and adding JIT compiler support. These additions will
demonstrate how to get nice, efficient code for the Kaleidoscope
language.

Trivial Constant Folding
========================

Our demonstration for Chapter 3 is elegant and easy to extend.
Unfortunately, it does not produce wonderful code. The IRBuilder,
however, does give us obvious optimizations when compiling simple code:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double 3.000000e+00, %x
      ret double %addtmp
    }

This code is not a literal transcription of the AST built by parsing the
input. That would be:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double 2.000000e+00, 1.000000e+00
      %addtmp1 = fadd double %addtmp, %x
      ret double %addtmp1
    }

Constant folding, as seen above, is a very common and very important
optimization: so much so that many language implementors build constant
folding support into their AST representation.

With LLVM, you don't need this support in the AST. Since all calls to
build LLVM IR go through the LLVM IR builder, the builder itself checks
whether there is a constant folding opportunity when you call it. If
there is, it simply does the constant fold and returns the constant
instead of creating an instruction.

Well, that was easy :). In practice, we recommend always using
``IRBuilder`` when generating code like this. It has no "syntactic
overhead" for its use (you don't have to uglify your compiler with
constant checks everywhere) and it can dramatically reduce the amount of
LLVM IR that is generated in some cases (particularly for languages with
a macro preprocessor or that use a lot of constants).

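
To make the builder's folding concrete, here is a minimal standalone sketch
(not part of the tutorial's code) that hands ``CreateFAdd`` two constant
operands. Because the ``IRBuilder`` folds them, the call returns a
``ConstantFP`` holding 3.0 rather than emitting an ``fadd`` instruction:

.. code-block:: c++

    #include "llvm/ADT/APFloat.h"
    #include "llvm/IR/Constants.h"
    #include "llvm/IR/IRBuilder.h"
    #include <cassert>
    using namespace llvm;

    Value *emitOnePlusTwo(IRBuilder<> &Builder, LLVMContext &Ctx) {
      Value *L = ConstantFP::get(Ctx, APFloat(1.0));
      Value *R = ConstantFP::get(Ctx, APFloat(2.0));
      // Both operands are constants, so no fadd instruction is created; the
      // builder folds the expression and returns the constant 3.0 directly.
      Value *Sum = Builder.CreateFAdd(L, R, "addtmp");
      assert(isa<ConstantFP>(Sum) && "folded by the IRBuilder");
      return Sum;
    }

This is exactly why the ``1+2`` in our first example never shows up in the
generated IR.
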

On the other hand, the ``IRBuilder`` is limited by the fact that it does
all of its analysis inline with the code as it is built. If you take a
slightly more complex example:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double 3.000000e+00, %x
      %addtmp1 = fadd double %x, 3.000000e+00
      %multmp = fmul double %addtmp, %addtmp1
      ret double %multmp
    }

In this case, the LHS and RHS of the multiplication are the same value.
We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``"
instead of computing "``x+3``" twice.

Unfortunately, no amount of local analysis will be able to detect and
correct this. This requires two transformations: reassociation of
expressions (to make the adds lexically identical) and Common
Subexpression Elimination (CSE) to delete the redundant add instruction.
Fortunately, LLVM provides a broad range of optimizations that you can
use, in the form of "passes".

LLVM Optimization Passes
========================

LLVM provides many optimization passes, which do many different sorts of
things and have different tradeoffs. Unlike other systems, LLVM doesn't
hold to the mistaken notion that one set of optimizations is right for
all languages and for all situations. LLVM allows a compiler implementor
to make complete decisions about what optimizations to use, in which
order, and in what situation.

As a concrete example, LLVM supports "whole module" passes, which look
across as large a body of code as they can (often a whole file, but if
run at link time, this can be a substantial portion of the whole
program). It also supports and includes "per-function" passes, which just
operate on a single function at a time, without looking at other
functions. For more information on passes and how they are run, see the
`How to Write a Pass <../WritingAnLLVMPass.html>`_ document and the
`List of LLVM Passes <../Passes.html>`_.

For Kaleidoscope, we are currently generating functions on the fly, one
at a time, as the user types them in. We aren't shooting for the
ultimate optimization experience in this setting, but we also want to
catch the easy and quick stuff where possible. As such, we will choose
to run a few per-function optimizations as the user types the function
in. If we wanted to make a "static Kaleidoscope compiler", we would use
exactly the code we have now, except that we would defer running the
optimizer until the entire file has been parsed.

In order to get per-function optimizations going, we need to set up a
`FunctionPassManager <../WritingAnLLVMPass.html#what-passmanager-does>`_ to hold
and organize the LLVM optimizations that we want to run. Once we have
that, we can add a set of optimizations to run. We'll need a new
FunctionPassManager for each module that we want to optimize, so we'll
write a function to create and initialize both the module and pass manager
for us:

.. code-block:: c++

    void InitializeModuleAndPassManager(void) {
      // Open a new module.
      TheModule = llvm::make_unique<Module>("my cool jit", TheContext);
      TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout());

      // Create a new pass manager attached to it.
      TheFPM = llvm::make_unique<legacy::FunctionPassManager>(TheModule.get());

      // Provide basic AliasAnalysis support for GVN.
      TheFPM->add(createBasicAliasAnalysisPass());
      // Do simple "peephole" optimizations and bit-twiddling optzns.
      TheFPM->add(createInstructionCombiningPass());
      // Reassociate expressions.
      TheFPM->add(createReassociatePass());
      // Eliminate Common SubExpressions.
      TheFPM->add(createGVNPass());
      // Simplify the control flow graph (deleting unreachable blocks, etc).
      TheFPM->add(createCFGSimplificationPass());

      TheFPM->doInitialization();
    }

This code initializes the global module ``TheModule`` and the function pass
manager ``TheFPM``, which is attached to ``TheModule``. Once the pass manager is
set up, we use a series of "add" calls to add a bunch of LLVM passes.

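
``TheContext``, ``TheModule``, ``TheFPM``, and ``TheJIT`` are file-scope
globals shared by the whole compiler. As a rough sketch (the exact spelling
may differ slightly), their declarations look like this:

.. code-block:: c++

    static LLVMContext TheContext;              // owns the IR's types and constants
    static std::unique_ptr<Module> TheModule;   // the module we are currently filling in
    static std::unique_ptr<legacy::FunctionPassManager> TheFPM;  // per-function optimizer
    static std::unique_ptr<KaleidoscopeJIT> TheJIT;  // declared again in the JIT section below

The full code listing at the end of this chapter shows their exact
declarations.
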

In this case, we choose to add five passes: one analysis pass (alias analysis),
and four optimization passes. The passes we choose here are a pretty standard
set of "cleanup" optimizations that are useful for a wide variety of code. I
won't delve into what they do but, believe me, they are a good starting
place :).

Once the PassManager is set up, we need to make use of it. We do this by
running it after our newly created function is constructed (in
``FunctionAST::codegen()``), but before it is returned to the client:

.. code-block:: c++

    if (Value *RetVal = Body->codegen()) {
      // Finish off the function.
      Builder.CreateRet(RetVal);

      // Validate the generated code, checking for consistency.
      verifyFunction(*TheFunction);

      // Optimize the function.
      TheFPM->run(*TheFunction);

      return TheFunction;
    }

As you can see, this is pretty straightforward. The
``FunctionPassManager`` optimizes and updates the LLVM Function\* in
place, improving (hopefully) its body. With this in place, we can try
our test above again:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double %x, 3.000000e+00
      %multmp = fmul double %addtmp, %addtmp
      ret double %multmp
    }

As expected, we now get our nicely optimized code, saving a floating
point add instruction from every execution of this function.

LLVM provides a wide variety of optimizations that can be used in
certain circumstances. Some `documentation about the various
passes <../Passes.html>`_ is available, but it isn't very complete.
Another good source of ideas is to look at the passes that ``Clang``
runs to get started. The "``opt``" tool allows you to experiment with
passes from the command line, so you can see if they do anything.

Now that we have reasonable code coming out of our front-end, let's talk
about executing it!

Adding a JIT Compiler
=====================

Code that is available in LLVM IR can have a wide variety of tools
applied to it. For example, you can run optimizations on it (as we did
above), you can dump it out in textual or binary forms, you can compile
the code to an assembly file (.s) for some target, or you can JIT
compile it. The nice thing about the LLVM IR representation is that it
is the "common currency" between many different parts of the compiler.

In this section, we'll add JIT compiler support to our interpreter. The
basic idea that we want for Kaleidoscope is to have the user enter
function bodies as they do now, but immediately evaluate the top-level
expressions they type in. For example, if they type in "1 + 2;", we
should evaluate and print out 3. If they define a function, they should
be able to call it from the command line.

In order to do this, we first declare and initialize the JIT. This is
done by adding a global variable ``TheJIT``, and initializing it in
``main``:

.. code-block:: c++

    static std::unique_ptr<KaleidoscopeJIT> TheJIT;
    ...
    int main() {
      ..
      TheJIT = llvm::make_unique<KaleidoscopeJIT>();

      // Run the main "interpreter loop" now.
      MainLoop();

      return 0;
    }

The KaleidoscopeJIT class is a simple JIT built specifically for these
tutorials. In later chapters we will look at how it works and extend it
with new features, but for now we will take it as given.

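
If you want to follow along in your own copy of the sources,
``KaleidoscopeJIT`` is a small, header-only class that ships with the LLVM
example code. A plausible include, assuming you build from the
``examples/Kaleidoscope/Chapter4`` directory (adjust the path for your tree),
looks like:

.. code-block:: c++

    #include "../include/KaleidoscopeJIT.h"

    using namespace llvm;
    using namespace llvm::orc;  // KaleidoscopeJIT lives in the llvm::orc namespace
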

Its API is very simple: ``addModule`` adds an LLVM IR module to the JIT,
making its functions available for execution; ``removeModule`` removes a
module, freeing any memory associated with the code in that module; and
``findSymbol`` allows us to look up pointers to the compiled code.

We can take this simple API and change our code that parses top-level
expressions to look like this:

.. code-block:: c++

    static void HandleTopLevelExpression() {
      // Evaluate a top-level expression into an anonymous function.
      if (auto FnAST = ParseTopLevelExpr()) {
        if (FnAST->codegen()) {

          // JIT the module containing the anonymous expression, keeping a handle so
          // we can free it later.
          auto H = TheJIT->addModule(std::move(TheModule));
          InitializeModuleAndPassManager();

          // Search the JIT for the __anon_expr symbol.
          auto ExprSymbol = TheJIT->findSymbol("__anon_expr");
          assert(ExprSymbol && "Function not found");

          // Get the symbol's address and cast it to the right type (takes no
          // arguments, returns a double) so we can call it as a native function.
          double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
          fprintf(stderr, "Evaluated to %f\n", FP());

          // Delete the anonymous expression module from the JIT.
          TheJIT->removeModule(H);
        }
      } else {
        // Skip token for error recovery.
        getNextToken();
      }
    }

If parsing and codegen succeed, the next step is to add the module containing
the top-level expression to the JIT. We do this by calling addModule, which
triggers code generation for all the functions in the module, and returns a
handle that can be used to remove the module from the JIT later. Once the module
has been added to the JIT it can no longer be modified, so we also open a new
module to hold subsequent code by calling ``InitializeModuleAndPassManager()``.

Once we've added the module to the JIT we need to get a pointer to the final
generated code. We do this by calling the JIT's findSymbol method, and passing
the name of the top-level expression function: ``__anon_expr``. Since we just
added this function, we assert that findSymbol returned a result.

Next, we get the in-memory address of the ``__anon_expr`` function by calling
``getAddress()`` on the symbol. Recall that we compile top-level expressions
into a self-contained LLVM function that takes no arguments and returns the
computed double. Because the LLVM JIT compiler matches the native platform ABI,
this means that you can just cast the result pointer to a function pointer of
that type and call it directly. As a result, there is no difference between
JIT-compiled code and native machine code that is statically linked into your
application.

Finally, since we don't support re-evaluation of top-level expressions, we
remove the module from the JIT when we're done to free the associated memory.
Recall, however, that the module we created a few lines earlier (via
``InitializeModuleAndPassManager``) is still open and waiting for new code to be
added.

With just these two changes, let's see how Kaleidoscope works now!

::

    ready> 4+5;
    Read top-level expression:
    define double @0() {
    entry:
      ret double 9.000000e+00
    }

    Evaluated to 9.000000

Well this looks like it is basically working. The dump of the function
shows the "no argument function that always returns double" that we
synthesize for each top-level expression that is typed in. This
demonstrates very basic functionality, but can we do more?

::

    ready> def testfunc(x y) x + y*2;
    Read function definition:
    define double @testfunc(double %x, double %y) {
    entry:
      %multmp = fmul double %y, 2.000000e+00
      %addtmp = fadd double %multmp, %x
      ret double %addtmp
    }

    ready> testfunc(4, 10);
    Read top-level expression:
    define double @1() {
    entry:
      %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)
      ret double %calltmp
    }

    Evaluated to 24.000000

    ready> testfunc(5, 10);
    ready> LLVM ERROR: Program used external function 'testfunc' which could not be resolved!

Function definitions and calls also work, but something went very wrong on that
last line. The call looks valid, so what happened? As you may have guessed from
the API, a Module is a unit of allocation for the JIT, and testfunc was part of
the same module that contained the anonymous expression. When we removed that
module from the JIT to free the memory for the anonymous expression, we deleted
the definition of ``testfunc`` along with it. Then, when we tried to call
testfunc a second time, the JIT could no longer find it.

The easiest way to fix this is to put the anonymous expression in a separate
module from the rest of the function definitions. The JIT will happily resolve
function calls across module boundaries, as long as each of the functions called
has a prototype, and is added to the JIT before it is called. By putting the
anonymous expression in a different module we can delete it without affecting
the rest of the functions.

In fact, we're going to go a step further and put every function in its own
module. Doing so allows us to exploit a useful property of the KaleidoscopeJIT
that will make our environment more REPL-like: Functions can be added to the
JIT more than once (unlike a module, where every function must have a unique
definition). When you look up a symbol in KaleidoscopeJIT it will always return
the most recent definition:

::

    ready> def foo(x) x + 1;
    Read function definition:
    define double @foo(double %x) {
    entry:
      %addtmp = fadd double %x, 1.000000e+00
      ret double %addtmp
    }

    ready> foo(2);
    Evaluated to 3.000000

    ready> def foo(x) x + 2;
    define double @foo(double %x) {
    entry:
      %addtmp = fadd double %x, 2.000000e+00
      ret double %addtmp
    }

    ready> foo(2);
    Evaluated to 4.000000

To allow each function to live in its own module we'll need a way to
re-generate previous function declarations into each new module we open:

.. code-block:: c++

    static std::unique_ptr<KaleidoscopeJIT> TheJIT;
    // Holds the most recent prototype for each function, so declarations can
    // be re-emitted into whichever module is currently open.
    static std::map<std::string, std::unique_ptr<PrototypeAST>> FunctionProtos;

    ...

    Function *getFunction(std::string Name) {
      // First, see if the function has already been added to the current module.
      if (auto *F = TheModule->getFunction(Name))
        return F;

      // If not, check whether we can codegen the declaration from some existing
      // prototype.
      auto FI = FunctionProtos.find(Name);
      if (FI != FunctionProtos.end())
        return FI->second->codegen();

      // If no existing prototype exists, return null.
      return nullptr;
    }

    ...

    Value *CallExprAST::codegen() {
      // Look up the name in the global module table.
      Function *CalleeF = getFunction(Callee);

    ...

    Function *FunctionAST::codegen() {
      // Transfer ownership of the prototype to the FunctionProtos map, but keep a
      // reference to it for use below.
      auto &P = *Proto;
      FunctionProtos[Proto->getName()] = std::move(Proto);
      Function *TheFunction = getFunction(P.getName());
      if (!TheFunction)
        return nullptr;

To enable this, we'll start by adding a new global, ``FunctionProtos``, that
holds the most recent prototype for each function. We'll also add a convenience
method, ``getFunction()``, to replace calls to ``TheModule->getFunction()``.
Our convenience method searches ``TheModule`` for an existing function
declaration, falling back to generating a new declaration from FunctionProtos if
it doesn't find one. In ``CallExprAST::codegen()`` we just need to replace the
call to ``TheModule->getFunction()``. In ``FunctionAST::codegen()`` we need to
update the FunctionProtos map first, then call ``getFunction()``. With this
done, we can always obtain a function declaration in the current module for any
previously declared function.

We also need to update HandleDefinition and HandleExtern:

.. code-block:: c++

    static void HandleDefinition() {
      if (auto FnAST = ParseDefinition()) {
        if (auto *FnIR = FnAST->codegen()) {
          fprintf(stderr, "Read function definition:");
          FnIR->dump();
          TheJIT->addModule(std::move(TheModule));
          InitializeModuleAndPassManager();
        }
      } else {
        // Skip token for error recovery.
        getNextToken();
      }
    }

    static void HandleExtern() {
      if (auto ProtoAST = ParseExtern()) {
        if (auto *FnIR = ProtoAST->codegen()) {
          fprintf(stderr, "Read extern: ");
          FnIR->dump();
          FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST);
        }
      } else {
        // Skip token for error recovery.
        getNextToken();
      }
    }

In HandleDefinition, we add two lines to transfer the newly defined function to
the JIT and open a new module. In HandleExtern, we just need to add one line to
add the prototype to FunctionProtos.

With these changes made, let's try our REPL again (I've removed the dump of the
anonymous functions this time; you should get the idea by now :) :

::

    ready> def foo(x) x + 1;
    ready> foo(2);
    Evaluated to 3.000000

    ready> def foo(x) x + 2;
    ready> foo(2);
    Evaluated to 4.000000

It works!


Even with this simple code, we get some surprisingly powerful capabilities -
check this out:

::

    ready> extern sin(x);
    Read extern:
    declare double @sin(double)

    ready> extern cos(x);
    Read extern:
    declare double @cos(double)

    ready> sin(1.0);
    Read top-level expression:
    define double @2() {
    entry:
      ret double 0x3FEAED548F090CEE
    }

    Evaluated to 0.841471

    ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);
    Read function definition:
    define double @foo(double %x) {
    entry:
      %calltmp = call double @sin(double %x)
      %multmp = fmul double %calltmp, %calltmp
      %calltmp2 = call double @cos(double %x)
      %multmp4 = fmul double %calltmp2, %calltmp2
      %addtmp = fadd double %multmp, %multmp4
      ret double %addtmp
    }

    ready> foo(4.0);
    Read top-level expression:
    define double @3() {
    entry:
      %calltmp = call double @foo(double 4.000000e+00)
      ret double %calltmp
    }

    Evaluated to 1.000000

Whoa, how does the JIT know about sin and cos? The answer is surprisingly
simple: The KaleidoscopeJIT has a straightforward symbol resolution rule that
it uses to find symbols that aren't available in any given module: First
it searches all the modules that have already been added to the JIT, from the
most recent to the oldest, to find the newest definition. If no definition is
found inside the JIT, it falls back to calling "``dlsym("sin")``" on the
Kaleidoscope process itself. Since "``sin``" is defined within the JIT's
address space, it simply patches up calls in the module to call the libm
version of ``sin`` directly.

In the future we'll see how tweaking this symbol resolution rule can be used to
enable all sorts of useful features, from security (restricting the set of
symbols available to JIT'd code), to dynamic code generation based on symbol
names, and even lazy compilation.

One immediate benefit of the symbol resolution rule is that we can now extend
the language by writing arbitrary C++ code to implement operations. For example,
if we add:

.. code-block:: c++

    /// putchard - putchar that takes a double and returns 0.
    extern "C" double putchard(double X) {
      fputc((char)X, stderr);
      return 0;
    }

Now we can produce simple output to the console by using things like:
"``extern putchard(x); putchard(120);``", which prints a lowercase 'x'
on the console (120 is the ASCII code for 'x'). Similar code could be
used to implement file I/O, console input, and many other capabilities
in Kaleidoscope.

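
As one more illustration in the same spirit, here is a hypothetical helper
(the name ``printd`` and its behavior are chosen just for this example) that
prints a full double value instead of a single character:

.. code-block:: c++

    /// printd - illustrative helper that prints a double followed by a newline
    /// and returns 0; callable from Kaleidoscope as "extern printd(x); printd(5);".
    extern "C" double printd(double X) {
      fprintf(stderr, "%f\n", X);
      return 0;
    }

As with ``putchard``, remember the "-rdynamic" note in the build instructions
below so that the JIT's ``dlsym`` fallback can find the symbol.
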

This completes the JIT and optimizer chapter of the Kaleidoscope
tutorial. At this point, we can compile a non-Turing-complete
programming language, and optimize and JIT compile it in a user-driven way.
Next up we'll look into `extending the language with control flow
constructs <LangImpl05.html>`_, tackling some interesting LLVM IR issues
along the way.

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
the LLVM JIT and optimizer. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
    # Run
    ./toy

If you are compiling this on Linux, make sure to add the "-rdynamic"
option as well. This makes sure that the external functions are resolved
properly at runtime.

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/Chapter4/toy.cpp
   :language: c++

`Next: Extending the language: control flow <LangImpl05.html>`_