1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 2 "http://www.w3.org/TR/html4/strict.dtd"> 3 4 <html> 5 <head> 6 <title>Kaleidoscope: Extending the Language: Control Flow</title> 7 <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 8 <meta name="author" content="Chris Lattner"> 9 <link rel="stylesheet" href="../llvm.css" type="text/css"> 10 </head> 11 12 <body> 13 14 <h1>Kaleidoscope: Extending the Language: Control Flow</h1> 15 16 <ul> 17 <li><a href="index.html">Up to Tutorial Index</a></li> 18 <li>Chapter 5 19 <ol> 20 <li><a href="#intro">Chapter 5 Introduction</a></li> 21 <li><a href="#ifthen">If/Then/Else</a> 22 <ol> 23 <li><a href="#iflexer">Lexer Extensions</a></li> 24 <li><a href="#ifast">AST Extensions</a></li> 25 <li><a href="#ifparser">Parser Extensions</a></li> 26 <li><a href="#ifir">LLVM IR</a></li> 27 <li><a href="#ifcodegen">Code Generation</a></li> 28 </ol> 29 </li> 30 <li><a href="#for">'for' Loop Expression</a> 31 <ol> 32 <li><a href="#forlexer">Lexer Extensions</a></li> 33 <li><a href="#forast">AST Extensions</a></li> 34 <li><a href="#forparser">Parser Extensions</a></li> 35 <li><a href="#forir">LLVM IR</a></li> 36 <li><a href="#forcodegen">Code Generation</a></li> 37 </ol> 38 </li> 39 <li><a href="#code">Full Code Listing</a></li> 40 </ol> 41 </li> 42 <li><a href="LangImpl6.html">Chapter 6</a>: Extending the Language: 43 User-defined Operators</li> 44 </ul> 45 46 <div class="doc_author"> 47 <p>Written by <a href="mailto:sabre (a] nondot.org">Chris Lattner</a></p> 48 </div> 49 50 <!-- *********************************************************************** --> 51 <h2><a name="intro">Chapter 5 Introduction</a></h2> 52 <!-- *********************************************************************** --> 53 54 <div> 55 56 <p>Welcome to Chapter 5 of the "<a href="index.html">Implementing a language 57 with LLVM</a>" tutorial. Parts 1-4 described the implementation of the simple 58 Kaleidoscope language and included support for generating LLVM IR, followed by 59 optimizations and a JIT compiler. Unfortunately, as presented, Kaleidoscope is 60 mostly useless: it has no control flow other than call and return. This means 61 that you can't have conditional branches in the code, significantly limiting its 62 power. In this episode of "build that compiler", we'll extend Kaleidoscope to 63 have an if/then/else expression plus a simple 'for' loop.</p> 64 65 </div> 66 67 <!-- *********************************************************************** --> 68 <h2><a name="ifthen">If/Then/Else</a></h2> 69 <!-- *********************************************************************** --> 70 71 <div> 72 73 <p> 74 Extending Kaleidoscope to support if/then/else is quite straightforward. It 75 basically requires adding support for this "new" concept to the lexer, 76 parser, AST, and LLVM code emitter. This example is nice, because it shows how 77 easy it is to "grow" a language over time, incrementally extending it as new 78 ideas are discovered.</p> 79 80 <p>Before we get going on "how" we add this extension, lets talk about "what" we 81 want. The basic idea is that we want to be able to write this sort of thing: 82 </p> 83 84 <div class="doc_code"> 85 <pre> 86 def fib(x) 87 if x < 3 then 88 1 89 else 90 fib(x-1)+fib(x-2); 91 </pre> 92 </div> 93 94 <p>In Kaleidoscope, every construct is an expression: there are no statements. 95 As such, the if/then/else expression needs to return a value like any other. 96 Since we're using a mostly functional form, we'll have it evaluate its 97 conditional, then return the 'then' or 'else' value based on how the condition 98 was resolved. This is very similar to the C "?:" expression.</p> 99 100 <p>The semantics of the if/then/else expression is that it evaluates the 101 condition to a boolean equality value: 0.0 is considered to be false and 102 everything else is considered to be true. 103 If the condition is true, the first subexpression is evaluated and returned, if 104 the condition is false, the second subexpression is evaluated and returned. 105 Since Kaleidoscope allows side-effects, this behavior is important to nail down. 106 </p> 107 108 <p>Now that we know what we "want", lets break this down into its constituent 109 pieces.</p> 110 111 <!-- ======================================================================= --> 112 <h4><a name="iflexer">Lexer Extensions for If/Then/Else</a></h4> 113 <!-- ======================================================================= --> 114 115 116 <div> 117 118 <p>The lexer extensions are straightforward. First we add new enum values 119 for the relevant tokens:</p> 120 121 <div class="doc_code"> 122 <pre> 123 // control 124 tok_if = -6, tok_then = -7, tok_else = -8, 125 </pre> 126 </div> 127 128 <p>Once we have that, we recognize the new keywords in the lexer. This is pretty simple 129 stuff:</p> 130 131 <div class="doc_code"> 132 <pre> 133 ... 134 if (IdentifierStr == "def") return tok_def; 135 if (IdentifierStr == "extern") return tok_extern; 136 <b>if (IdentifierStr == "if") return tok_if; 137 if (IdentifierStr == "then") return tok_then; 138 if (IdentifierStr == "else") return tok_else;</b> 139 return tok_identifier; 140 </pre> 141 </div> 142 143 </div> 144 145 <!-- ======================================================================= --> 146 <h4><a name="ifast">AST Extensions for If/Then/Else</a></h4> 147 <!-- ======================================================================= --> 148 149 <div> 150 151 <p>To represent the new expression we add a new AST node for it:</p> 152 153 <div class="doc_code"> 154 <pre> 155 /// IfExprAST - Expression class for if/then/else. 156 class IfExprAST : public ExprAST { 157 ExprAST *Cond, *Then, *Else; 158 public: 159 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) 160 : Cond(cond), Then(then), Else(_else) {} 161 virtual Value *Codegen(); 162 }; 163 </pre> 164 </div> 165 166 <p>The AST node just has pointers to the various subexpressions.</p> 167 168 </div> 169 170 <!-- ======================================================================= --> 171 <h4><a name="ifparser">Parser Extensions for If/Then/Else</a></h4> 172 <!-- ======================================================================= --> 173 174 <div> 175 176 <p>Now that we have the relevant tokens coming from the lexer and we have the 177 AST node to build, our parsing logic is relatively straightforward. First we 178 define a new parsing function:</p> 179 180 <div class="doc_code"> 181 <pre> 182 /// ifexpr ::= 'if' expression 'then' expression 'else' expression 183 static ExprAST *ParseIfExpr() { 184 getNextToken(); // eat the if. 185 186 // condition. 187 ExprAST *Cond = ParseExpression(); 188 if (!Cond) return 0; 189 190 if (CurTok != tok_then) 191 return Error("expected then"); 192 getNextToken(); // eat the then 193 194 ExprAST *Then = ParseExpression(); 195 if (Then == 0) return 0; 196 197 if (CurTok != tok_else) 198 return Error("expected else"); 199 200 getNextToken(); 201 202 ExprAST *Else = ParseExpression(); 203 if (!Else) return 0; 204 205 return new IfExprAST(Cond, Then, Else); 206 } 207 </pre> 208 </div> 209 210 <p>Next we hook it up as a primary expression:</p> 211 212 <div class="doc_code"> 213 <pre> 214 static ExprAST *ParsePrimary() { 215 switch (CurTok) { 216 default: return Error("unknown token when expecting an expression"); 217 case tok_identifier: return ParseIdentifierExpr(); 218 case tok_number: return ParseNumberExpr(); 219 case '(': return ParseParenExpr(); 220 <b>case tok_if: return ParseIfExpr();</b> 221 } 222 } 223 </pre> 224 </div> 225 226 </div> 227 228 <!-- ======================================================================= --> 229 <h4><a name="ifir">LLVM IR for If/Then/Else</a></h4> 230 <!-- ======================================================================= --> 231 232 <div> 233 234 <p>Now that we have it parsing and building the AST, the final piece is adding 235 LLVM code generation support. This is the most interesting part of the 236 if/then/else example, because this is where it starts to introduce new concepts. 237 All of the code above has been thoroughly described in previous chapters. 238 </p> 239 240 <p>To motivate the code we want to produce, lets take a look at a simple 241 example. Consider:</p> 242 243 <div class="doc_code"> 244 <pre> 245 extern foo(); 246 extern bar(); 247 def baz(x) if x then foo() else bar(); 248 </pre> 249 </div> 250 251 <p>If you disable optimizations, the code you'll (soon) get from Kaleidoscope 252 looks like this:</p> 253 254 <div class="doc_code"> 255 <pre> 256 declare double @foo() 257 258 declare double @bar() 259 260 define double @baz(double %x) { 261 entry: 262 %ifcond = fcmp one double %x, 0.000000e+00 263 br i1 %ifcond, label %then, label %else 264 265 then: ; preds = %entry 266 %calltmp = call double @foo() 267 br label %ifcont 268 269 else: ; preds = %entry 270 %calltmp1 = call double @bar() 271 br label %ifcont 272 273 ifcont: ; preds = %else, %then 274 %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ] 275 ret double %iftmp 276 } 277 </pre> 278 </div> 279 280 <p>To visualize the control flow graph, you can use a nifty feature of the LLVM 281 '<a href="http://llvm.org/cmds/opt.html">opt</a>' tool. If you put this LLVM IR 282 into "t.ll" and run "<tt>llvm-as < t.ll | opt -analyze -view-cfg</tt>", <a 283 href="../ProgrammersManual.html#ViewGraph">a window will pop up</a> and you'll 284 see this graph:</p> 285 286 <div style="text-align: center"><img src="LangImpl5-cfg.png" alt="Example CFG" width="423" 287 height="315"></div> 288 289 <p>Another way to get this is to call "<tt>F->viewCFG()</tt>" or 290 "<tt>F->viewCFGOnly()</tt>" (where F is a "<tt>Function*</tt>") either by 291 inserting actual calls into the code and recompiling or by calling these in the 292 debugger. LLVM has many nice features for visualizing various graphs.</p> 293 294 <p>Getting back to the generated code, it is fairly simple: the entry block 295 evaluates the conditional expression ("x" in our case here) and compares the 296 result to 0.0 with the "<tt><a href="../LangRef.html#i_fcmp">fcmp</a> one</tt>" 297 instruction ('one' is "Ordered and Not Equal"). Based on the result of this 298 expression, the code jumps to either the "then" or "else" blocks, which contain 299 the expressions for the true/false cases.</p> 300 301 <p>Once the then/else blocks are finished executing, they both branch back to the 302 'ifcont' block to execute the code that happens after the if/then/else. In this 303 case the only thing left to do is to return to the caller of the function. The 304 question then becomes: how does the code know which expression to return?</p> 305 306 <p>The answer to this question involves an important SSA operation: the 307 <a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Phi 308 operation</a>. If you're not familiar with SSA, <a 309 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">the wikipedia 310 article</a> is a good introduction and there are various other introductions to 311 it available on your favorite search engine. The short version is that 312 "execution" of the Phi operation requires "remembering" which block control came 313 from. The Phi operation takes on the value corresponding to the input control 314 block. In this case, if control comes in from the "then" block, it gets the 315 value of "calltmp". If control comes from the "else" block, it gets the value 316 of "calltmp1".</p> 317 318 <p>At this point, you are probably starting to think "Oh no! This means my 319 simple and elegant front-end will have to start generating SSA form in order to 320 use LLVM!". Fortunately, this is not the case, and we strongly advise 321 <em>not</em> implementing an SSA construction algorithm in your front-end 322 unless there is an amazingly good reason to do so. In practice, there are two 323 sorts of values that float around in code written for your average imperative 324 programming language that might need Phi nodes:</p> 325 326 <ol> 327 <li>Code that involves user variables: <tt>x = 1; x = x + 1; </tt></li> 328 <li>Values that are implicit in the structure of your AST, such as the Phi node 329 in this case.</li> 330 </ol> 331 332 <p>In <a href="LangImpl7.html">Chapter 7</a> of this tutorial ("mutable 333 variables"), we'll talk about #1 334 in depth. For now, just believe me that you don't need SSA construction to 335 handle this case. For #2, you have the choice of using the techniques that we will 336 describe for #1, or you can insert Phi nodes directly, if convenient. In this 337 case, it is really really easy to generate the Phi node, so we choose to do it 338 directly.</p> 339 340 <p>Okay, enough of the motivation and overview, lets generate code!</p> 341 342 </div> 343 344 <!-- ======================================================================= --> 345 <h4><a name="ifcodegen">Code Generation for If/Then/Else</a></h4> 346 <!-- ======================================================================= --> 347 348 <div> 349 350 <p>In order to generate code for this, we implement the <tt>Codegen</tt> method 351 for <tt>IfExprAST</tt>:</p> 352 353 <div class="doc_code"> 354 <pre> 355 Value *IfExprAST::Codegen() { 356 Value *CondV = Cond->Codegen(); 357 if (CondV == 0) return 0; 358 359 // Convert condition to a bool by comparing equal to 0.0. 360 CondV = Builder.CreateFCmpONE(CondV, 361 ConstantFP::get(getGlobalContext(), APFloat(0.0)), 362 "ifcond"); 363 </pre> 364 </div> 365 366 <p>This code is straightforward and similar to what we saw before. We emit the 367 expression for the condition, then compare that value to zero to get a truth 368 value as a 1-bit (bool) value.</p> 369 370 <div class="doc_code"> 371 <pre> 372 Function *TheFunction = Builder.GetInsertBlock()->getParent(); 373 374 // Create blocks for the then and else cases. Insert the 'then' block at the 375 // end of the function. 376 BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); 377 BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); 378 BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); 379 380 Builder.CreateCondBr(CondV, ThenBB, ElseBB); 381 </pre> 382 </div> 383 384 <p>This code creates the basic blocks that are related to the if/then/else 385 statement, and correspond directly to the blocks in the example above. The 386 first line gets the current Function object that is being built. It 387 gets this by asking the builder for the current BasicBlock, and asking that 388 block for its "parent" (the function it is currently embedded into).</p> 389 390 <p>Once it has that, it creates three blocks. Note that it passes "TheFunction" 391 into the constructor for the "then" block. This causes the constructor to 392 automatically insert the new block into the end of the specified function. The 393 other two blocks are created, but aren't yet inserted into the function.</p> 394 395 <p>Once the blocks are created, we can emit the conditional branch that chooses 396 between them. Note that creating new blocks does not implicitly affect the 397 IRBuilder, so it is still inserting into the block that the condition 398 went into. Also note that it is creating a branch to the "then" block and the 399 "else" block, even though the "else" block isn't inserted into the function yet. 400 This is all ok: it is the standard way that LLVM supports forward 401 references.</p> 402 403 <div class="doc_code"> 404 <pre> 405 // Emit then value. 406 Builder.SetInsertPoint(ThenBB); 407 408 Value *ThenV = Then->Codegen(); 409 if (ThenV == 0) return 0; 410 411 Builder.CreateBr(MergeBB); 412 // Codegen of 'Then' can change the current block, update ThenBB for the PHI. 413 ThenBB = Builder.GetInsertBlock(); 414 </pre> 415 </div> 416 417 <p>After the conditional branch is inserted, we move the builder to start 418 inserting into the "then" block. Strictly speaking, this call moves the 419 insertion point to be at the end of the specified block. However, since the 420 "then" block is empty, it also starts out by inserting at the beginning of the 421 block. :)</p> 422 423 <p>Once the insertion point is set, we recursively codegen the "then" expression 424 from the AST. To finish off the "then" block, we create an unconditional branch 425 to the merge block. One interesting (and very important) aspect of the LLVM IR 426 is that it <a href="../LangRef.html#functionstructure">requires all basic blocks 427 to be "terminated"</a> with a <a href="../LangRef.html#terminators">control flow 428 instruction</a> such as return or branch. This means that all control flow, 429 <em>including fall throughs</em> must be made explicit in the LLVM IR. If you 430 violate this rule, the verifier will emit an error.</p> 431 432 <p>The final line here is quite subtle, but is very important. The basic issue 433 is that when we create the Phi node in the merge block, we need to set up the 434 block/value pairs that indicate how the Phi will work. Importantly, the Phi 435 node expects to have an entry for each predecessor of the block in the CFG. Why 436 then, are we getting the current block when we just set it to ThenBB 5 lines 437 above? The problem is that the "Then" expression may actually itself change the 438 block that the Builder is emitting into if, for example, it contains a nested 439 "if/then/else" expression. Because calling Codegen recursively could 440 arbitrarily change the notion of the current block, we are required to get an 441 up-to-date value for code that will set up the Phi node.</p> 442 443 <div class="doc_code"> 444 <pre> 445 // Emit else block. 446 TheFunction->getBasicBlockList().push_back(ElseBB); 447 Builder.SetInsertPoint(ElseBB); 448 449 Value *ElseV = Else->Codegen(); 450 if (ElseV == 0) return 0; 451 452 Builder.CreateBr(MergeBB); 453 // Codegen of 'Else' can change the current block, update ElseBB for the PHI. 454 ElseBB = Builder.GetInsertBlock(); 455 </pre> 456 </div> 457 458 <p>Code generation for the 'else' block is basically identical to codegen for 459 the 'then' block. The only significant difference is the first line, which adds 460 the 'else' block to the function. Recall previously that the 'else' block was 461 created, but not added to the function. Now that the 'then' and 'else' blocks 462 are emitted, we can finish up with the merge code:</p> 463 464 <div class="doc_code"> 465 <pre> 466 // Emit merge block. 467 TheFunction->getBasicBlockList().push_back(MergeBB); 468 Builder.SetInsertPoint(MergeBB); 469 PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, 470 "iftmp"); 471 472 PN->addIncoming(ThenV, ThenBB); 473 PN->addIncoming(ElseV, ElseBB); 474 return PN; 475 } 476 </pre> 477 </div> 478 479 <p>The first two lines here are now familiar: the first adds the "merge" block 480 to the Function object (it was previously floating, like the else block above). 481 The second block changes the insertion point so that newly created code will go 482 into the "merge" block. Once that is done, we need to create the PHI node and 483 set up the block/value pairs for the PHI.</p> 484 485 <p>Finally, the CodeGen function returns the phi node as the value computed by 486 the if/then/else expression. In our example above, this returned value will 487 feed into the code for the top-level function, which will create the return 488 instruction.</p> 489 490 <p>Overall, we now have the ability to execute conditional code in 491 Kaleidoscope. With this extension, Kaleidoscope is a fairly complete language 492 that can calculate a wide variety of numeric functions. Next up we'll add 493 another useful expression that is familiar from non-functional languages...</p> 494 495 </div> 496 497 </div> 498 499 <!-- *********************************************************************** --> 500 <h2><a name="for">'for' Loop Expression</a></h2> 501 <!-- *********************************************************************** --> 502 503 <div> 504 505 <p>Now that we know how to add basic control flow constructs to the language, 506 we have the tools to add more powerful things. Lets add something more 507 aggressive, a 'for' expression:</p> 508 509 <div class="doc_code"> 510 <pre> 511 extern putchard(char) 512 def printstar(n) 513 for i = 1, i < n, 1.0 in 514 putchard(42); # ascii 42 = '*' 515 516 # print 100 '*' characters 517 printstar(100); 518 </pre> 519 </div> 520 521 <p>This expression defines a new variable ("i" in this case) which iterates from 522 a starting value, while the condition ("i < n" in this case) is true, 523 incrementing by an optional step value ("1.0" in this case). If the step value 524 is omitted, it defaults to 1.0. While the loop is true, it executes its 525 body expression. Because we don't have anything better to return, we'll just 526 define the loop as always returning 0.0. In the future when we have mutable 527 variables, it will get more useful.</p> 528 529 <p>As before, lets talk about the changes that we need to Kaleidoscope to 530 support this.</p> 531 532 <!-- ======================================================================= --> 533 <h4><a name="forlexer">Lexer Extensions for the 'for' Loop</a></h4> 534 <!-- ======================================================================= --> 535 536 <div> 537 538 <p>The lexer extensions are the same sort of thing as for if/then/else:</p> 539 540 <div class="doc_code"> 541 <pre> 542 ... in enum Token ... 543 // control 544 tok_if = -6, tok_then = -7, tok_else = -8, 545 <b> tok_for = -9, tok_in = -10</b> 546 547 ... in gettok ... 548 if (IdentifierStr == "def") return tok_def; 549 if (IdentifierStr == "extern") return tok_extern; 550 if (IdentifierStr == "if") return tok_if; 551 if (IdentifierStr == "then") return tok_then; 552 if (IdentifierStr == "else") return tok_else; 553 <b>if (IdentifierStr == "for") return tok_for; 554 if (IdentifierStr == "in") return tok_in;</b> 555 return tok_identifier; 556 </pre> 557 </div> 558 559 </div> 560 561 <!-- ======================================================================= --> 562 <h4><a name="forast">AST Extensions for the 'for' Loop</a></h4> 563 <!-- ======================================================================= --> 564 565 <div> 566 567 <p>The AST node is just as simple. It basically boils down to capturing 568 the variable name and the constituent expressions in the node.</p> 569 570 <div class="doc_code"> 571 <pre> 572 /// ForExprAST - Expression class for for/in. 573 class ForExprAST : public ExprAST { 574 std::string VarName; 575 ExprAST *Start, *End, *Step, *Body; 576 public: 577 ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, 578 ExprAST *step, ExprAST *body) 579 : VarName(varname), Start(start), End(end), Step(step), Body(body) {} 580 virtual Value *Codegen(); 581 }; 582 </pre> 583 </div> 584 585 </div> 586 587 <!-- ======================================================================= --> 588 <h4><a name="forparser">Parser Extensions for the 'for' Loop</a></h4> 589 <!-- ======================================================================= --> 590 591 <div> 592 593 <p>The parser code is also fairly standard. The only interesting thing here is 594 handling of the optional step value. The parser code handles it by checking to 595 see if the second comma is present. If not, it sets the step value to null in 596 the AST node:</p> 597 598 <div class="doc_code"> 599 <pre> 600 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression 601 static ExprAST *ParseForExpr() { 602 getNextToken(); // eat the for. 603 604 if (CurTok != tok_identifier) 605 return Error("expected identifier after for"); 606 607 std::string IdName = IdentifierStr; 608 getNextToken(); // eat identifier. 609 610 if (CurTok != '=') 611 return Error("expected '=' after for"); 612 getNextToken(); // eat '='. 613 614 615 ExprAST *Start = ParseExpression(); 616 if (Start == 0) return 0; 617 if (CurTok != ',') 618 return Error("expected ',' after for start value"); 619 getNextToken(); 620 621 ExprAST *End = ParseExpression(); 622 if (End == 0) return 0; 623 624 // The step value is optional. 625 ExprAST *Step = 0; 626 if (CurTok == ',') { 627 getNextToken(); 628 Step = ParseExpression(); 629 if (Step == 0) return 0; 630 } 631 632 if (CurTok != tok_in) 633 return Error("expected 'in' after for"); 634 getNextToken(); // eat 'in'. 635 636 ExprAST *Body = ParseExpression(); 637 if (Body == 0) return 0; 638 639 return new ForExprAST(IdName, Start, End, Step, Body); 640 } 641 </pre> 642 </div> 643 644 </div> 645 646 <!-- ======================================================================= --> 647 <h4><a name="forir">LLVM IR for the 'for' Loop</a></h4> 648 <!-- ======================================================================= --> 649 650 <div> 651 652 <p>Now we get to the good part: the LLVM IR we want to generate for this thing. 653 With the simple example above, we get this LLVM IR (note that this dump is 654 generated with optimizations disabled for clarity): 655 </p> 656 657 <div class="doc_code"> 658 <pre> 659 declare double @putchard(double) 660 661 define double @printstar(double %n) { 662 entry: 663 ; initial value = 1.0 (inlined into phi) 664 br label %loop 665 666 loop: ; preds = %loop, %entry 667 %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ] 668 ; body 669 %calltmp = call double @putchard(double 4.200000e+01) 670 ; increment 671 %nextvar = fadd double %i, 1.000000e+00 672 673 ; termination test 674 %cmptmp = fcmp ult double %i, %n 675 %booltmp = uitofp i1 %cmptmp to double 676 %loopcond = fcmp one double %booltmp, 0.000000e+00 677 br i1 %loopcond, label %loop, label %afterloop 678 679 afterloop: ; preds = %loop 680 ; loop always returns 0.0 681 ret double 0.000000e+00 682 } 683 </pre> 684 </div> 685 686 <p>This loop contains all the same constructs we saw before: a phi node, several 687 expressions, and some basic blocks. Lets see how this fits together.</p> 688 689 </div> 690 691 <!-- ======================================================================= --> 692 <h4><a name="forcodegen">Code Generation for the 'for' Loop</a></h4> 693 <!-- ======================================================================= --> 694 695 <div> 696 697 <p>The first part of Codegen is very simple: we just output the start expression 698 for the loop value:</p> 699 700 <div class="doc_code"> 701 <pre> 702 Value *ForExprAST::Codegen() { 703 // Emit the start code first, without 'variable' in scope. 704 Value *StartVal = Start->Codegen(); 705 if (StartVal == 0) return 0; 706 </pre> 707 </div> 708 709 <p>With this out of the way, the next step is to set up the LLVM basic block 710 for the start of the loop body. In the case above, the whole loop body is one 711 block, but remember that the body code itself could consist of multiple blocks 712 (e.g. if it contains an if/then/else or a for/in expression).</p> 713 714 <div class="doc_code"> 715 <pre> 716 // Make the new basic block for the loop header, inserting after current 717 // block. 718 Function *TheFunction = Builder.GetInsertBlock()->getParent(); 719 BasicBlock *PreheaderBB = Builder.GetInsertBlock(); 720 BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); 721 722 // Insert an explicit fall through from the current block to the LoopBB. 723 Builder.CreateBr(LoopBB); 724 </pre> 725 </div> 726 727 <p>This code is similar to what we saw for if/then/else. Because we will need 728 it to create the Phi node, we remember the block that falls through into the 729 loop. Once we have that, we create the actual block that starts the loop and 730 create an unconditional branch for the fall-through between the two blocks.</p> 731 732 <div class="doc_code"> 733 <pre> 734 // Start insertion in LoopBB. 735 Builder.SetInsertPoint(LoopBB); 736 737 // Start the PHI node with an entry for Start. 738 PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str()); 739 Variable->addIncoming(StartVal, PreheaderBB); 740 </pre> 741 </div> 742 743 <p>Now that the "preheader" for the loop is set up, we switch to emitting code 744 for the loop body. To begin with, we move the insertion point and create the 745 PHI node for the loop induction variable. Since we already know the incoming 746 value for the starting value, we add it to the Phi node. Note that the Phi will 747 eventually get a second value for the backedge, but we can't set it up yet 748 (because it doesn't exist!).</p> 749 750 <div class="doc_code"> 751 <pre> 752 // Within the loop, the variable is defined equal to the PHI node. If it 753 // shadows an existing variable, we have to restore it, so save it now. 754 Value *OldVal = NamedValues[VarName]; 755 NamedValues[VarName] = Variable; 756 757 // Emit the body of the loop. This, like any other expr, can change the 758 // current BB. Note that we ignore the value computed by the body, but don't 759 // allow an error. 760 if (Body->Codegen() == 0) 761 return 0; 762 </pre> 763 </div> 764 765 <p>Now the code starts to get more interesting. Our 'for' loop introduces a new 766 variable to the symbol table. This means that our symbol table can now contain 767 either function arguments or loop variables. To handle this, before we codegen 768 the body of the loop, we add the loop variable as the current value for its 769 name. Note that it is possible that there is a variable of the same name in the 770 outer scope. It would be easy to make this an error (emit an error and return 771 null if there is already an entry for VarName) but we choose to allow shadowing 772 of variables. In order to handle this correctly, we remember the Value that 773 we are potentially shadowing in <tt>OldVal</tt> (which will be null if there is 774 no shadowed variable).</p> 775 776 <p>Once the loop variable is set into the symbol table, the code recursively 777 codegen's the body. This allows the body to use the loop variable: any 778 references to it will naturally find it in the symbol table.</p> 779 780 <div class="doc_code"> 781 <pre> 782 // Emit the step value. 783 Value *StepVal; 784 if (Step) { 785 StepVal = Step->Codegen(); 786 if (StepVal == 0) return 0; 787 } else { 788 // If not specified, use 1.0. 789 StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); 790 } 791 792 Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar"); 793 </pre> 794 </div> 795 796 <p>Now that the body is emitted, we compute the next value of the iteration 797 variable by adding the step value, or 1.0 if it isn't present. '<tt>NextVar</tt>' 798 will be the value of the loop variable on the next iteration of the loop.</p> 799 800 <div class="doc_code"> 801 <pre> 802 // Compute the end condition. 803 Value *EndCond = End->Codegen(); 804 if (EndCond == 0) return EndCond; 805 806 // Convert condition to a bool by comparing equal to 0.0. 807 EndCond = Builder.CreateFCmpONE(EndCond, 808 ConstantFP::get(getGlobalContext(), APFloat(0.0)), 809 "loopcond"); 810 </pre> 811 </div> 812 813 <p>Finally, we evaluate the exit value of the loop, to determine whether the 814 loop should exit. This mirrors the condition evaluation for the if/then/else 815 statement.</p> 816 817 <div class="doc_code"> 818 <pre> 819 // Create the "after loop" block and insert it. 820 BasicBlock *LoopEndBB = Builder.GetInsertBlock(); 821 BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); 822 823 // Insert the conditional branch into the end of LoopEndBB. 824 Builder.CreateCondBr(EndCond, LoopBB, AfterBB); 825 826 // Any new code will be inserted in AfterBB. 827 Builder.SetInsertPoint(AfterBB); 828 </pre> 829 </div> 830 831 <p>With the code for the body of the loop complete, we just need to finish up 832 the control flow for it. This code remembers the end block (for the phi node), then creates the block for the loop exit ("afterloop"). Based on the value of the 833 exit condition, it creates a conditional branch that chooses between executing 834 the loop again and exiting the loop. Any future code is emitted in the 835 "afterloop" block, so it sets the insertion position to it.</p> 836 837 <div class="doc_code"> 838 <pre> 839 // Add a new entry to the PHI node for the backedge. 840 Variable->addIncoming(NextVar, LoopEndBB); 841 842 // Restore the unshadowed variable. 843 if (OldVal) 844 NamedValues[VarName] = OldVal; 845 else 846 NamedValues.erase(VarName); 847 848 // for expr always returns 0.0. 849 return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); 850 } 851 </pre> 852 </div> 853 854 <p>The final code handles various cleanups: now that we have the "NextVar" 855 value, we can add the incoming value to the loop PHI node. After that, we 856 remove the loop variable from the symbol table, so that it isn't in scope after 857 the for loop. Finally, code generation of the for loop always returns 0.0, so 858 that is what we return from <tt>ForExprAST::Codegen</tt>.</p> 859 860 <p>With this, we conclude the "adding control flow to Kaleidoscope" chapter of 861 the tutorial. In this chapter we added two control flow constructs, and used them to motivate a couple of aspects of the LLVM IR that are important for front-end implementors 862 to know. In the next chapter of our saga, we will get a bit crazier and add 863 <a href="LangImpl6.html">user-defined operators</a> to our poor innocent 864 language.</p> 865 866 </div> 867 868 </div> 869 870 <!-- *********************************************************************** --> 871 <h2><a name="code">Full Code Listing</a></h2> 872 <!-- *********************************************************************** --> 873 874 <div> 875 876 <p> 877 Here is the complete code listing for our running example, enhanced with the 878 if/then/else and for expressions.. To build this example, use: 879 </p> 880 881 <div class="doc_code"> 882 <pre> 883 # Compile 884 g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy 885 # Run 886 ./toy 887 </pre> 888 </div> 889 890 <p>Here is the code:</p> 891 892 <div class="doc_code"> 893 <pre> 894 #include "llvm/DerivedTypes.h" 895 #include "llvm/ExecutionEngine/ExecutionEngine.h" 896 #include "llvm/ExecutionEngine/JIT.h" 897 #include "llvm/LLVMContext.h" 898 #include "llvm/Module.h" 899 #include "llvm/PassManager.h" 900 #include "llvm/Analysis/Verifier.h" 901 #include "llvm/Analysis/Passes.h" 902 #include "llvm/Target/TargetData.h" 903 #include "llvm/Target/TargetSelect.h" 904 #include "llvm/Transforms/Scalar.h" 905 #include "llvm/Support/IRBuilder.h" 906 #include <cstdio> 907 #include <string> 908 #include <map> 909 #include <vector> 910 using namespace llvm; 911 912 //===----------------------------------------------------------------------===// 913 // Lexer 914 //===----------------------------------------------------------------------===// 915 916 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one 917 // of these for known things. 918 enum Token { 919 tok_eof = -1, 920 921 // commands 922 tok_def = -2, tok_extern = -3, 923 924 // primary 925 tok_identifier = -4, tok_number = -5, 926 927 // control 928 tok_if = -6, tok_then = -7, tok_else = -8, 929 tok_for = -9, tok_in = -10 930 }; 931 932 static std::string IdentifierStr; // Filled in if tok_identifier 933 static double NumVal; // Filled in if tok_number 934 935 /// gettok - Return the next token from standard input. 936 static int gettok() { 937 static int LastChar = ' '; 938 939 // Skip any whitespace. 940 while (isspace(LastChar)) 941 LastChar = getchar(); 942 943 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* 944 IdentifierStr = LastChar; 945 while (isalnum((LastChar = getchar()))) 946 IdentifierStr += LastChar; 947 948 if (IdentifierStr == "def") return tok_def; 949 if (IdentifierStr == "extern") return tok_extern; 950 if (IdentifierStr == "if") return tok_if; 951 if (IdentifierStr == "then") return tok_then; 952 if (IdentifierStr == "else") return tok_else; 953 if (IdentifierStr == "for") return tok_for; 954 if (IdentifierStr == "in") return tok_in; 955 return tok_identifier; 956 } 957 958 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ 959 std::string NumStr; 960 do { 961 NumStr += LastChar; 962 LastChar = getchar(); 963 } while (isdigit(LastChar) || LastChar == '.'); 964 965 NumVal = strtod(NumStr.c_str(), 0); 966 return tok_number; 967 } 968 969 if (LastChar == '#') { 970 // Comment until end of line. 971 do LastChar = getchar(); 972 while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); 973 974 if (LastChar != EOF) 975 return gettok(); 976 } 977 978 // Check for end of file. Don't eat the EOF. 979 if (LastChar == EOF) 980 return tok_eof; 981 982 // Otherwise, just return the character as its ascii value. 983 int ThisChar = LastChar; 984 LastChar = getchar(); 985 return ThisChar; 986 } 987 988 //===----------------------------------------------------------------------===// 989 // Abstract Syntax Tree (aka Parse Tree) 990 //===----------------------------------------------------------------------===// 991 992 /// ExprAST - Base class for all expression nodes. 993 class ExprAST { 994 public: 995 virtual ~ExprAST() {} 996 virtual Value *Codegen() = 0; 997 }; 998 999 /// NumberExprAST - Expression class for numeric literals like "1.0". 1000 class NumberExprAST : public ExprAST { 1001 double Val; 1002 public: 1003 NumberExprAST(double val) : Val(val) {} 1004 virtual Value *Codegen(); 1005 }; 1006 1007 /// VariableExprAST - Expression class for referencing a variable, like "a". 1008 class VariableExprAST : public ExprAST { 1009 std::string Name; 1010 public: 1011 VariableExprAST(const std::string &name) : Name(name) {} 1012 virtual Value *Codegen(); 1013 }; 1014 1015 /// BinaryExprAST - Expression class for a binary operator. 1016 class BinaryExprAST : public ExprAST { 1017 char Op; 1018 ExprAST *LHS, *RHS; 1019 public: 1020 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) 1021 : Op(op), LHS(lhs), RHS(rhs) {} 1022 virtual Value *Codegen(); 1023 }; 1024 1025 /// CallExprAST - Expression class for function calls. 1026 class CallExprAST : public ExprAST { 1027 std::string Callee; 1028 std::vector<ExprAST*> Args; 1029 public: 1030 CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) 1031 : Callee(callee), Args(args) {} 1032 virtual Value *Codegen(); 1033 }; 1034 1035 /// IfExprAST - Expression class for if/then/else. 1036 class IfExprAST : public ExprAST { 1037 ExprAST *Cond, *Then, *Else; 1038 public: 1039 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) 1040 : Cond(cond), Then(then), Else(_else) {} 1041 virtual Value *Codegen(); 1042 }; 1043 1044 /// ForExprAST - Expression class for for/in. 1045 class ForExprAST : public ExprAST { 1046 std::string VarName; 1047 ExprAST *Start, *End, *Step, *Body; 1048 public: 1049 ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, 1050 ExprAST *step, ExprAST *body) 1051 : VarName(varname), Start(start), End(end), Step(step), Body(body) {} 1052 virtual Value *Codegen(); 1053 }; 1054 1055 /// PrototypeAST - This class represents the "prototype" for a function, 1056 /// which captures its name, and its argument names (thus implicitly the number 1057 /// of arguments the function takes). 1058 class PrototypeAST { 1059 std::string Name; 1060 std::vector<std::string> Args; 1061 public: 1062 PrototypeAST(const std::string &name, const std::vector<std::string> &args) 1063 : Name(name), Args(args) {} 1064 1065 Function *Codegen(); 1066 }; 1067 1068 /// FunctionAST - This class represents a function definition itself. 1069 class FunctionAST { 1070 PrototypeAST *Proto; 1071 ExprAST *Body; 1072 public: 1073 FunctionAST(PrototypeAST *proto, ExprAST *body) 1074 : Proto(proto), Body(body) {} 1075 1076 Function *Codegen(); 1077 }; 1078 1079 //===----------------------------------------------------------------------===// 1080 // Parser 1081 //===----------------------------------------------------------------------===// 1082 1083 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current 1084 /// token the parser is looking at. getNextToken reads another token from the 1085 /// lexer and updates CurTok with its results. 1086 static int CurTok; 1087 static int getNextToken() { 1088 return CurTok = gettok(); 1089 } 1090 1091 /// BinopPrecedence - This holds the precedence for each binary operator that is 1092 /// defined. 1093 static std::map<char, int> BinopPrecedence; 1094 1095 /// GetTokPrecedence - Get the precedence of the pending binary operator token. 1096 static int GetTokPrecedence() { 1097 if (!isascii(CurTok)) 1098 return -1; 1099 1100 // Make sure it's a declared binop. 1101 int TokPrec = BinopPrecedence[CurTok]; 1102 if (TokPrec <= 0) return -1; 1103 return TokPrec; 1104 } 1105 1106 /// Error* - These are little helper functions for error handling. 1107 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} 1108 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } 1109 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } 1110 1111 static ExprAST *ParseExpression(); 1112 1113 /// identifierexpr 1114 /// ::= identifier 1115 /// ::= identifier '(' expression* ')' 1116 static ExprAST *ParseIdentifierExpr() { 1117 std::string IdName = IdentifierStr; 1118 1119 getNextToken(); // eat identifier. 1120 1121 if (CurTok != '(') // Simple variable ref. 1122 return new VariableExprAST(IdName); 1123 1124 // Call. 1125 getNextToken(); // eat ( 1126 std::vector<ExprAST*> Args; 1127 if (CurTok != ')') { 1128 while (1) { 1129 ExprAST *Arg = ParseExpression(); 1130 if (!Arg) return 0; 1131 Args.push_back(Arg); 1132 1133 if (CurTok == ')') break; 1134 1135 if (CurTok != ',') 1136 return Error("Expected ')' or ',' in argument list"); 1137 getNextToken(); 1138 } 1139 } 1140 1141 // Eat the ')'. 1142 getNextToken(); 1143 1144 return new CallExprAST(IdName, Args); 1145 } 1146 1147 /// numberexpr ::= number 1148 static ExprAST *ParseNumberExpr() { 1149 ExprAST *Result = new NumberExprAST(NumVal); 1150 getNextToken(); // consume the number 1151 return Result; 1152 } 1153 1154 /// parenexpr ::= '(' expression ')' 1155 static ExprAST *ParseParenExpr() { 1156 getNextToken(); // eat (. 1157 ExprAST *V = ParseExpression(); 1158 if (!V) return 0; 1159 1160 if (CurTok != ')') 1161 return Error("expected ')'"); 1162 getNextToken(); // eat ). 1163 return V; 1164 } 1165 1166 /// ifexpr ::= 'if' expression 'then' expression 'else' expression 1167 static ExprAST *ParseIfExpr() { 1168 getNextToken(); // eat the if. 1169 1170 // condition. 1171 ExprAST *Cond = ParseExpression(); 1172 if (!Cond) return 0; 1173 1174 if (CurTok != tok_then) 1175 return Error("expected then"); 1176 getNextToken(); // eat the then 1177 1178 ExprAST *Then = ParseExpression(); 1179 if (Then == 0) return 0; 1180 1181 if (CurTok != tok_else) 1182 return Error("expected else"); 1183 1184 getNextToken(); 1185 1186 ExprAST *Else = ParseExpression(); 1187 if (!Else) return 0; 1188 1189 return new IfExprAST(Cond, Then, Else); 1190 } 1191 1192 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression 1193 static ExprAST *ParseForExpr() { 1194 getNextToken(); // eat the for. 1195 1196 if (CurTok != tok_identifier) 1197 return Error("expected identifier after for"); 1198 1199 std::string IdName = IdentifierStr; 1200 getNextToken(); // eat identifier. 1201 1202 if (CurTok != '=') 1203 return Error("expected '=' after for"); 1204 getNextToken(); // eat '='. 1205 1206 1207 ExprAST *Start = ParseExpression(); 1208 if (Start == 0) return 0; 1209 if (CurTok != ',') 1210 return Error("expected ',' after for start value"); 1211 getNextToken(); 1212 1213 ExprAST *End = ParseExpression(); 1214 if (End == 0) return 0; 1215 1216 // The step value is optional. 1217 ExprAST *Step = 0; 1218 if (CurTok == ',') { 1219 getNextToken(); 1220 Step = ParseExpression(); 1221 if (Step == 0) return 0; 1222 } 1223 1224 if (CurTok != tok_in) 1225 return Error("expected 'in' after for"); 1226 getNextToken(); // eat 'in'. 1227 1228 ExprAST *Body = ParseExpression(); 1229 if (Body == 0) return 0; 1230 1231 return new ForExprAST(IdName, Start, End, Step, Body); 1232 } 1233 1234 /// primary 1235 /// ::= identifierexpr 1236 /// ::= numberexpr 1237 /// ::= parenexpr 1238 /// ::= ifexpr 1239 /// ::= forexpr 1240 static ExprAST *ParsePrimary() { 1241 switch (CurTok) { 1242 default: return Error("unknown token when expecting an expression"); 1243 case tok_identifier: return ParseIdentifierExpr(); 1244 case tok_number: return ParseNumberExpr(); 1245 case '(': return ParseParenExpr(); 1246 case tok_if: return ParseIfExpr(); 1247 case tok_for: return ParseForExpr(); 1248 } 1249 } 1250 1251 /// binoprhs 1252 /// ::= ('+' primary)* 1253 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { 1254 // If this is a binop, find its precedence. 1255 while (1) { 1256 int TokPrec = GetTokPrecedence(); 1257 1258 // If this is a binop that binds at least as tightly as the current binop, 1259 // consume it, otherwise we are done. 1260 if (TokPrec < ExprPrec) 1261 return LHS; 1262 1263 // Okay, we know this is a binop. 1264 int BinOp = CurTok; 1265 getNextToken(); // eat binop 1266 1267 // Parse the primary expression after the binary operator. 1268 ExprAST *RHS = ParsePrimary(); 1269 if (!RHS) return 0; 1270 1271 // If BinOp binds less tightly with RHS than the operator after RHS, let 1272 // the pending operator take RHS as its LHS. 1273 int NextPrec = GetTokPrecedence(); 1274 if (TokPrec < NextPrec) { 1275 RHS = ParseBinOpRHS(TokPrec+1, RHS); 1276 if (RHS == 0) return 0; 1277 } 1278 1279 // Merge LHS/RHS. 1280 LHS = new BinaryExprAST(BinOp, LHS, RHS); 1281 } 1282 } 1283 1284 /// expression 1285 /// ::= primary binoprhs 1286 /// 1287 static ExprAST *ParseExpression() { 1288 ExprAST *LHS = ParsePrimary(); 1289 if (!LHS) return 0; 1290 1291 return ParseBinOpRHS(0, LHS); 1292 } 1293 1294 /// prototype 1295 /// ::= id '(' id* ')' 1296 static PrototypeAST *ParsePrototype() { 1297 if (CurTok != tok_identifier) 1298 return ErrorP("Expected function name in prototype"); 1299 1300 std::string FnName = IdentifierStr; 1301 getNextToken(); 1302 1303 if (CurTok != '(') 1304 return ErrorP("Expected '(' in prototype"); 1305 1306 std::vector<std::string> ArgNames; 1307 while (getNextToken() == tok_identifier) 1308 ArgNames.push_back(IdentifierStr); 1309 if (CurTok != ')') 1310 return ErrorP("Expected ')' in prototype"); 1311 1312 // success. 1313 getNextToken(); // eat ')'. 1314 1315 return new PrototypeAST(FnName, ArgNames); 1316 } 1317 1318 /// definition ::= 'def' prototype expression 1319 static FunctionAST *ParseDefinition() { 1320 getNextToken(); // eat def. 1321 PrototypeAST *Proto = ParsePrototype(); 1322 if (Proto == 0) return 0; 1323 1324 if (ExprAST *E = ParseExpression()) 1325 return new FunctionAST(Proto, E); 1326 return 0; 1327 } 1328 1329 /// toplevelexpr ::= expression 1330 static FunctionAST *ParseTopLevelExpr() { 1331 if (ExprAST *E = ParseExpression()) { 1332 // Make an anonymous proto. 1333 PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); 1334 return new FunctionAST(Proto, E); 1335 } 1336 return 0; 1337 } 1338 1339 /// external ::= 'extern' prototype 1340 static PrototypeAST *ParseExtern() { 1341 getNextToken(); // eat extern. 1342 return ParsePrototype(); 1343 } 1344 1345 //===----------------------------------------------------------------------===// 1346 // Code Generation 1347 //===----------------------------------------------------------------------===// 1348 1349 static Module *TheModule; 1350 static IRBuilder<> Builder(getGlobalContext()); 1351 static std::map<std::string, Value*> NamedValues; 1352 static FunctionPassManager *TheFPM; 1353 1354 Value *ErrorV(const char *Str) { Error(Str); return 0; } 1355 1356 Value *NumberExprAST::Codegen() { 1357 return ConstantFP::get(getGlobalContext(), APFloat(Val)); 1358 } 1359 1360 Value *VariableExprAST::Codegen() { 1361 // Look this variable up in the function. 1362 Value *V = NamedValues[Name]; 1363 return V ? V : ErrorV("Unknown variable name"); 1364 } 1365 1366 Value *BinaryExprAST::Codegen() { 1367 Value *L = LHS->Codegen(); 1368 Value *R = RHS->Codegen(); 1369 if (L == 0 || R == 0) return 0; 1370 1371 switch (Op) { 1372 case '+': return Builder.CreateFAdd(L, R, "addtmp"); 1373 case '-': return Builder.CreateFSub(L, R, "subtmp"); 1374 case '*': return Builder.CreateFMul(L, R, "multmp"); 1375 case '<': 1376 L = Builder.CreateFCmpULT(L, R, "cmptmp"); 1377 // Convert bool 0/1 to double 0.0 or 1.0 1378 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), 1379 "booltmp"); 1380 default: return ErrorV("invalid binary operator"); 1381 } 1382 } 1383 1384 Value *CallExprAST::Codegen() { 1385 // Look up the name in the global module table. 1386 Function *CalleeF = TheModule->getFunction(Callee); 1387 if (CalleeF == 0) 1388 return ErrorV("Unknown function referenced"); 1389 1390 // If argument mismatch error. 1391 if (CalleeF->arg_size() != Args.size()) 1392 return ErrorV("Incorrect # arguments passed"); 1393 1394 std::vector<Value*> ArgsV; 1395 for (unsigned i = 0, e = Args.size(); i != e; ++i) { 1396 ArgsV.push_back(Args[i]->Codegen()); 1397 if (ArgsV.back() == 0) return 0; 1398 } 1399 1400 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp"); 1401 } 1402 1403 Value *IfExprAST::Codegen() { 1404 Value *CondV = Cond->Codegen(); 1405 if (CondV == 0) return 0; 1406 1407 // Convert condition to a bool by comparing equal to 0.0. 1408 CondV = Builder.CreateFCmpONE(CondV, 1409 ConstantFP::get(getGlobalContext(), APFloat(0.0)), 1410 "ifcond"); 1411 1412 Function *TheFunction = Builder.GetInsertBlock()->getParent(); 1413 1414 // Create blocks for the then and else cases. Insert the 'then' block at the 1415 // end of the function. 1416 BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); 1417 BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); 1418 BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); 1419 1420 Builder.CreateCondBr(CondV, ThenBB, ElseBB); 1421 1422 // Emit then value. 1423 Builder.SetInsertPoint(ThenBB); 1424 1425 Value *ThenV = Then->Codegen(); 1426 if (ThenV == 0) return 0; 1427 1428 Builder.CreateBr(MergeBB); 1429 // Codegen of 'Then' can change the current block, update ThenBB for the PHI. 1430 ThenBB = Builder.GetInsertBlock(); 1431 1432 // Emit else block. 1433 TheFunction->getBasicBlockList().push_back(ElseBB); 1434 Builder.SetInsertPoint(ElseBB); 1435 1436 Value *ElseV = Else->Codegen(); 1437 if (ElseV == 0) return 0; 1438 1439 Builder.CreateBr(MergeBB); 1440 // Codegen of 'Else' can change the current block, update ElseBB for the PHI. 1441 ElseBB = Builder.GetInsertBlock(); 1442 1443 // Emit merge block. 1444 TheFunction->getBasicBlockList().push_back(MergeBB); 1445 Builder.SetInsertPoint(MergeBB); 1446 PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, 1447 "iftmp"); 1448 1449 PN->addIncoming(ThenV, ThenBB); 1450 PN->addIncoming(ElseV, ElseBB); 1451 return PN; 1452 } 1453 1454 Value *ForExprAST::Codegen() { 1455 // Output this as: 1456 // ... 1457 // start = startexpr 1458 // goto loop 1459 // loop: 1460 // variable = phi [start, loopheader], [nextvariable, loopend] 1461 // ... 1462 // bodyexpr 1463 // ... 1464 // loopend: 1465 // step = stepexpr 1466 // nextvariable = variable + step 1467 // endcond = endexpr 1468 // br endcond, loop, endloop 1469 // outloop: 1470 1471 // Emit the start code first, without 'variable' in scope. 1472 Value *StartVal = Start->Codegen(); 1473 if (StartVal == 0) return 0; 1474 1475 // Make the new basic block for the loop header, inserting after current 1476 // block. 1477 Function *TheFunction = Builder.GetInsertBlock()->getParent(); 1478 BasicBlock *PreheaderBB = Builder.GetInsertBlock(); 1479 BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); 1480 1481 // Insert an explicit fall through from the current block to the LoopBB. 1482 Builder.CreateBr(LoopBB); 1483 1484 // Start insertion in LoopBB. 1485 Builder.SetInsertPoint(LoopBB); 1486 1487 // Start the PHI node with an entry for Start. 1488 PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str()); 1489 Variable->addIncoming(StartVal, PreheaderBB); 1490 1491 // Within the loop, the variable is defined equal to the PHI node. If it 1492 // shadows an existing variable, we have to restore it, so save it now. 1493 Value *OldVal = NamedValues[VarName]; 1494 NamedValues[VarName] = Variable; 1495 1496 // Emit the body of the loop. This, like any other expr, can change the 1497 // current BB. Note that we ignore the value computed by the body, but don't 1498 // allow an error. 1499 if (Body->Codegen() == 0) 1500 return 0; 1501 1502 // Emit the step value. 1503 Value *StepVal; 1504 if (Step) { 1505 StepVal = Step->Codegen(); 1506 if (StepVal == 0) return 0; 1507 } else { 1508 // If not specified, use 1.0. 1509 StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); 1510 } 1511 1512 Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar"); 1513 1514 // Compute the end condition. 1515 Value *EndCond = End->Codegen(); 1516 if (EndCond == 0) return EndCond; 1517 1518 // Convert condition to a bool by comparing equal to 0.0. 1519 EndCond = Builder.CreateFCmpONE(EndCond, 1520 ConstantFP::get(getGlobalContext(), APFloat(0.0)), 1521 "loopcond"); 1522 1523 // Create the "after loop" block and insert it. 1524 BasicBlock *LoopEndBB = Builder.GetInsertBlock(); 1525 BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); 1526 1527 // Insert the conditional branch into the end of LoopEndBB. 1528 Builder.CreateCondBr(EndCond, LoopBB, AfterBB); 1529 1530 // Any new code will be inserted in AfterBB. 1531 Builder.SetInsertPoint(AfterBB); 1532 1533 // Add a new entry to the PHI node for the backedge. 1534 Variable->addIncoming(NextVar, LoopEndBB); 1535 1536 // Restore the unshadowed variable. 1537 if (OldVal) 1538 NamedValues[VarName] = OldVal; 1539 else 1540 NamedValues.erase(VarName); 1541 1542 1543 // for expr always returns 0.0. 1544 return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); 1545 } 1546 1547 Function *PrototypeAST::Codegen() { 1548 // Make the function type: double(double,double) etc. 1549 std::vector<const Type*> Doubles(Args.size(), 1550 Type::getDoubleTy(getGlobalContext())); 1551 FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), 1552 Doubles, false); 1553 1554 Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); 1555 1556 // If F conflicted, there was already something named 'Name'. If it has a 1557 // body, don't allow redefinition or reextern. 1558 if (F->getName() != Name) { 1559 // Delete the one we just made and get the existing one. 1560 F->eraseFromParent(); 1561 F = TheModule->getFunction(Name); 1562 1563 // If F already has a body, reject this. 1564 if (!F->empty()) { 1565 ErrorF("redefinition of function"); 1566 return 0; 1567 } 1568 1569 // If F took a different number of args, reject. 1570 if (F->arg_size() != Args.size()) { 1571 ErrorF("redefinition of function with different # args"); 1572 return 0; 1573 } 1574 } 1575 1576 // Set names for all arguments. 1577 unsigned Idx = 0; 1578 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); 1579 ++AI, ++Idx) { 1580 AI->setName(Args[Idx]); 1581 1582 // Add arguments to variable symbol table. 1583 NamedValues[Args[Idx]] = AI; 1584 } 1585 1586 return F; 1587 } 1588 1589 Function *FunctionAST::Codegen() { 1590 NamedValues.clear(); 1591 1592 Function *TheFunction = Proto->Codegen(); 1593 if (TheFunction == 0) 1594 return 0; 1595 1596 // Create a new basic block to start insertion into. 1597 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); 1598 Builder.SetInsertPoint(BB); 1599 1600 if (Value *RetVal = Body->Codegen()) { 1601 // Finish off the function. 1602 Builder.CreateRet(RetVal); 1603 1604 // Validate the generated code, checking for consistency. 1605 verifyFunction(*TheFunction); 1606 1607 // Optimize the function. 1608 TheFPM->run(*TheFunction); 1609 1610 return TheFunction; 1611 } 1612 1613 // Error reading body, remove function. 1614 TheFunction->eraseFromParent(); 1615 return 0; 1616 } 1617 1618 //===----------------------------------------------------------------------===// 1619 // Top-Level parsing and JIT Driver 1620 //===----------------------------------------------------------------------===// 1621 1622 static ExecutionEngine *TheExecutionEngine; 1623 1624 static void HandleDefinition() { 1625 if (FunctionAST *F = ParseDefinition()) { 1626 if (Function *LF = F->Codegen()) { 1627 fprintf(stderr, "Read function definition:"); 1628 LF->dump(); 1629 } 1630 } else { 1631 // Skip token for error recovery. 1632 getNextToken(); 1633 } 1634 } 1635 1636 static void HandleExtern() { 1637 if (PrototypeAST *P = ParseExtern()) { 1638 if (Function *F = P->Codegen()) { 1639 fprintf(stderr, "Read extern: "); 1640 F->dump(); 1641 } 1642 } else { 1643 // Skip token for error recovery. 1644 getNextToken(); 1645 } 1646 } 1647 1648 static void HandleTopLevelExpression() { 1649 // Evaluate a top-level expression into an anonymous function. 1650 if (FunctionAST *F = ParseTopLevelExpr()) { 1651 if (Function *LF = F->Codegen()) { 1652 // JIT the function, returning a function pointer. 1653 void *FPtr = TheExecutionEngine->getPointerToFunction(LF); 1654 1655 // Cast it to the right type (takes no arguments, returns a double) so we 1656 // can call it as a native function. 1657 double (*FP)() = (double (*)())(intptr_t)FPtr; 1658 fprintf(stderr, "Evaluated to %f\n", FP()); 1659 } 1660 } else { 1661 // Skip token for error recovery. 1662 getNextToken(); 1663 } 1664 } 1665 1666 /// top ::= definition | external | expression | ';' 1667 static void MainLoop() { 1668 while (1) { 1669 fprintf(stderr, "ready> "); 1670 switch (CurTok) { 1671 case tok_eof: return; 1672 case ';': getNextToken(); break; // ignore top-level semicolons. 1673 case tok_def: HandleDefinition(); break; 1674 case tok_extern: HandleExtern(); break; 1675 default: HandleTopLevelExpression(); break; 1676 } 1677 } 1678 } 1679 1680 //===----------------------------------------------------------------------===// 1681 // "Library" functions that can be "extern'd" from user code. 1682 //===----------------------------------------------------------------------===// 1683 1684 /// putchard - putchar that takes a double and returns 0. 1685 extern "C" 1686 double putchard(double X) { 1687 putchar((char)X); 1688 return 0; 1689 } 1690 1691 //===----------------------------------------------------------------------===// 1692 // Main driver code. 1693 //===----------------------------------------------------------------------===// 1694 1695 int main() { 1696 InitializeNativeTarget(); 1697 LLVMContext &Context = getGlobalContext(); 1698 1699 // Install standard binary operators. 1700 // 1 is lowest precedence. 1701 BinopPrecedence['<'] = 10; 1702 BinopPrecedence['+'] = 20; 1703 BinopPrecedence['-'] = 20; 1704 BinopPrecedence['*'] = 40; // highest. 1705 1706 // Prime the first token. 1707 fprintf(stderr, "ready> "); 1708 getNextToken(); 1709 1710 // Make the module, which holds all the code. 1711 TheModule = new Module("my cool jit", Context); 1712 1713 // Create the JIT. This takes ownership of the module. 1714 std::string ErrStr; 1715 TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create(); 1716 if (!TheExecutionEngine) { 1717 fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str()); 1718 exit(1); 1719 } 1720 1721 FunctionPassManager OurFPM(TheModule); 1722 1723 // Set up the optimizer pipeline. Start with registering info about how the 1724 // target lays out data structures. 1725 OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData())); 1726 // Provide basic AliasAnalysis support for GVN. 1727 OurFPM.add(createBasicAliasAnalysisPass()); 1728 // Do simple "peephole" optimizations and bit-twiddling optzns. 1729 OurFPM.add(createInstructionCombiningPass()); 1730 // Reassociate expressions. 1731 OurFPM.add(createReassociatePass()); 1732 // Eliminate Common SubExpressions. 1733 OurFPM.add(createGVNPass()); 1734 // Simplify the control flow graph (deleting unreachable blocks, etc). 1735 OurFPM.add(createCFGSimplificationPass()); 1736 1737 OurFPM.doInitialization(); 1738 1739 // Set the global so the code gen can use this. 1740 TheFPM = &OurFPM; 1741 1742 // Run the main "interpreter loop" now. 1743 MainLoop(); 1744 1745 TheFPM = 0; 1746 1747 // Print out all of the generated code. 1748 TheModule->dump(); 1749 1750 return 0; 1751 } 1752 </pre> 1753 </div> 1754 1755 <a href="LangImpl6.html">Next: Extending the language: user-defined operators</a> 1756 </div> 1757 1758 <!-- *********************************************************************** --> 1759 <hr> 1760 <address> 1761 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img 1762 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> 1763 <a href="http://validator.w3.org/check/referer"><img 1764 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> 1765 1766 <a href="mailto:sabre (a] nondot.org">Chris Lattner</a><br> 1767 <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> 1768 Last modified: $Date$ 1769 </address> 1770 </body> 1771 </html> 1772