Home | History | Annotate | Download | only in tutorial
      1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
      2                       "http://www.w3.org/TR/html4/strict.dtd">
      3 
      4 <html>
      5 <head>
      6   <title>Kaleidoscope: Extending the Language: Control Flow</title>
      7   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      8   <meta name="author" content="Chris Lattner">
      9   <link rel="stylesheet" href="../_static/llvm.css" type="text/css">
     10 </head>
     11 
     12 <body>
     13 
     14 <h1>Kaleidoscope: Extending the Language: Control Flow</h1>
     15 
     16 <ul>
     17 <li><a href="index.html">Up to Tutorial Index</a></li>
     18 <li>Chapter 5
     19   <ol>
     20     <li><a href="#intro">Chapter 5 Introduction</a></li>
     21     <li><a href="#ifthen">If/Then/Else</a>
     22     <ol>
     23       <li><a href="#iflexer">Lexer Extensions</a></li>
     24       <li><a href="#ifast">AST Extensions</a></li>
     25       <li><a href="#ifparser">Parser Extensions</a></li>
     26       <li><a href="#ifir">LLVM IR</a></li>
     27       <li><a href="#ifcodegen">Code Generation</a></li>
     28     </ol>
     29     </li>
     30     <li><a href="#for">'for' Loop Expression</a>
     31     <ol>
     32       <li><a href="#forlexer">Lexer Extensions</a></li>
     33       <li><a href="#forast">AST Extensions</a></li>
     34       <li><a href="#forparser">Parser Extensions</a></li>
     35       <li><a href="#forir">LLVM IR</a></li>
     36       <li><a href="#forcodegen">Code Generation</a></li>
     37     </ol>
     38     </li>
     39     <li><a href="#code">Full Code Listing</a></li>
     40   </ol>
     41 </li>
     42 <li><a href="LangImpl6.html">Chapter 6</a>: Extending the Language: 
     43 User-defined Operators</li>
     44 </ul>
     45 
     46 <div class="doc_author">
     47   <p>Written by <a href="mailto:sabre (a] nondot.org">Chris Lattner</a></p>
     48 </div>
     49 
     50 <!-- *********************************************************************** -->
     51 <h2><a name="intro">Chapter 5 Introduction</a></h2>
     52 <!-- *********************************************************************** -->
     53 
     54 <div>
     55 
     56 <p>Welcome to Chapter 5 of the "<a href="index.html">Implementing a language
     57 with LLVM</a>" tutorial.  Parts 1-4 described the implementation of the simple
     58 Kaleidoscope language and included support for generating LLVM IR, followed by
     59 optimizations and a JIT compiler.  Unfortunately, as presented, Kaleidoscope is
     60 mostly useless: it has no control flow other than call and return.  This means
     61 that you can't have conditional branches in the code, significantly limiting its
     62 power.  In this episode of "build that compiler", we'll extend Kaleidoscope to
     63 have an if/then/else expression plus a simple 'for' loop.</p>
     64 
     65 </div>
     66 
     67 <!-- *********************************************************************** -->
     68 <h2><a name="ifthen">If/Then/Else</a></h2>
     69 <!-- *********************************************************************** -->
     70 
     71 <div>
     72 
     73 <p>
     74 Extending Kaleidoscope to support if/then/else is quite straightforward.  It
     75 basically requires adding support for this "new" concept to the lexer,
     76 parser, AST, and LLVM code emitter.  This example is nice, because it shows how
     77 easy it is to "grow" a language over time, incrementally extending it as new
     78 ideas are discovered.</p>
     79 
     80 <p>Before we get going on "how" we add this extension, lets talk about "what" we
     81 want.  The basic idea is that we want to be able to write this sort of thing:
     82 </p>
     83 
     84 <div class="doc_code">
     85 <pre>
     86 def fib(x)
     87   if x &lt; 3 then
     88     1
     89   else
     90     fib(x-1)+fib(x-2);
     91 </pre>
     92 </div>
     93 
     94 <p>In Kaleidoscope, every construct is an expression: there are no statements.
     95 As such, the if/then/else expression needs to return a value like any other.
     96 Since we're using a mostly functional form, we'll have it evaluate its
     97 conditional, then return the 'then' or 'else' value based on how the condition
     98 was resolved.  This is very similar to the C "?:" expression.</p>
     99 
    100 <p>The semantics of the if/then/else expression is that it evaluates the
    101 condition to a boolean equality value: 0.0 is considered to be false and
    102 everything else is considered to be true.
    103 If the condition is true, the first subexpression is evaluated and returned, if
    104 the condition is false, the second subexpression is evaluated and returned.
    105 Since Kaleidoscope allows side-effects, this behavior is important to nail down.
    106 </p>
    107 
    108 <p>Now that we know what we "want", lets break this down into its constituent
    109 pieces.</p>
    110 
    111 <!-- ======================================================================= -->
    112 <h4><a name="iflexer">Lexer Extensions for If/Then/Else</a></h4>
    113 <!-- ======================================================================= -->
    114 
    115 
    116 <div>
    117 
    118 <p>The lexer extensions are straightforward.  First we add new enum values
    119 for the relevant tokens:</p>
    120 
    121 <div class="doc_code">
    122 <pre>
    123   // control
    124   tok_if = -6, tok_then = -7, tok_else = -8,
    125 </pre>
    126 </div>
    127 
    128 <p>Once we have that, we recognize the new keywords in the lexer. This is pretty simple
    129 stuff:</p>
    130 
    131 <div class="doc_code">
    132 <pre>
    133     ...
    134     if (IdentifierStr == "def") return tok_def;
    135     if (IdentifierStr == "extern") return tok_extern;
    136     <b>if (IdentifierStr == "if") return tok_if;
    137     if (IdentifierStr == "then") return tok_then;
    138     if (IdentifierStr == "else") return tok_else;</b>
    139     return tok_identifier;
    140 </pre>
    141 </div>
    142 
    143 </div>
    144 
    145 <!-- ======================================================================= -->
    146 <h4><a name="ifast">AST Extensions for If/Then/Else</a></h4>
    147 <!-- ======================================================================= -->
    148 
    149 <div>
    150 
    151 <p>To represent the new expression we add a new AST node for it:</p>
    152 
    153 <div class="doc_code">
    154 <pre>
    155 /// IfExprAST - Expression class for if/then/else.
    156 class IfExprAST : public ExprAST {
    157   ExprAST *Cond, *Then, *Else;
    158 public:
    159   IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
    160     : Cond(cond), Then(then), Else(_else) {}
    161   virtual Value *Codegen();
    162 };
    163 </pre>
    164 </div>
    165 
    166 <p>The AST node just has pointers to the various subexpressions.</p>
    167 
    168 </div>
    169 
    170 <!-- ======================================================================= -->
    171 <h4><a name="ifparser">Parser Extensions for If/Then/Else</a></h4>
    172 <!-- ======================================================================= -->
    173 
    174 <div>
    175 
    176 <p>Now that we have the relevant tokens coming from the lexer and we have the
    177 AST node to build, our parsing logic is relatively straightforward.  First we
    178 define a new parsing function:</p>
    179 
    180 <div class="doc_code">
    181 <pre>
    182 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
    183 static ExprAST *ParseIfExpr() {
    184   getNextToken();  // eat the if.
    185   
    186   // condition.
    187   ExprAST *Cond = ParseExpression();
    188   if (!Cond) return 0;
    189   
    190   if (CurTok != tok_then)
    191     return Error("expected then");
    192   getNextToken();  // eat the then
    193   
    194   ExprAST *Then = ParseExpression();
    195   if (Then == 0) return 0;
    196   
    197   if (CurTok != tok_else)
    198     return Error("expected else");
    199   
    200   getNextToken();
    201   
    202   ExprAST *Else = ParseExpression();
    203   if (!Else) return 0;
    204   
    205   return new IfExprAST(Cond, Then, Else);
    206 }
    207 </pre>
    208 </div>
    209 
    210 <p>Next we hook it up as a primary expression:</p>
    211 
    212 <div class="doc_code">
    213 <pre>
    214 static ExprAST *ParsePrimary() {
    215   switch (CurTok) {
    216   default: return Error("unknown token when expecting an expression");
    217   case tok_identifier: return ParseIdentifierExpr();
    218   case tok_number:     return ParseNumberExpr();
    219   case '(':            return ParseParenExpr();
    220   <b>case tok_if:         return ParseIfExpr();</b>
    221   }
    222 }
    223 </pre>
    224 </div>
    225 
    226 </div>
    227 
    228 <!-- ======================================================================= -->
    229 <h4><a name="ifir">LLVM IR for If/Then/Else</a></h4>
    230 <!-- ======================================================================= -->
    231 
    232 <div>
    233 
    234 <p>Now that we have it parsing and building the AST, the final piece is adding
    235 LLVM code generation support.  This is the most interesting part of the
    236 if/then/else example, because this is where it starts to introduce new concepts.
    237 All of the code above has been thoroughly described in previous chapters.
    238 </p>
    239 
    240 <p>To motivate the code we want to produce, lets take a look at a simple
    241 example.  Consider:</p>
    242 
    243 <div class="doc_code">
    244 <pre>
    245 extern foo();
    246 extern bar();
    247 def baz(x) if x then foo() else bar();
    248 </pre>
    249 </div>
    250 
    251 <p>If you disable optimizations, the code you'll (soon) get from Kaleidoscope
    252 looks like this:</p>
    253 
    254 <div class="doc_code">
    255 <pre>
    256 declare double @foo()
    257 
    258 declare double @bar()
    259 
    260 define double @baz(double %x) {
    261 entry:
    262   %ifcond = fcmp one double %x, 0.000000e+00
    263   br i1 %ifcond, label %then, label %else
    264 
    265 then:		; preds = %entry
    266   %calltmp = call double @foo()
    267   br label %ifcont
    268 
    269 else:		; preds = %entry
    270   %calltmp1 = call double @bar()
    271   br label %ifcont
    272 
    273 ifcont:		; preds = %else, %then
    274   %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
    275   ret double %iftmp
    276 }
    277 </pre>
    278 </div>
    279 
    280 <p>To visualize the control flow graph, you can use a nifty feature of the LLVM
    281 '<a href="http://llvm.org/cmds/opt.html">opt</a>' tool.  If you put this LLVM IR
    282 into "t.ll" and run "<tt>llvm-as &lt; t.ll | opt -analyze -view-cfg</tt>", <a
    283 href="../ProgrammersManual.html#ViewGraph">a window will pop up</a> and you'll
    284 see this graph:</p>
    285 
    286 <div style="text-align: center"><img src="LangImpl5-cfg.png" alt="Example CFG" width="423" 
    287 height="315"></div>
    288 
    289 <p>Another way to get this is to call "<tt>F-&gt;viewCFG()</tt>" or
    290 "<tt>F-&gt;viewCFGOnly()</tt>" (where F is a "<tt>Function*</tt>") either by
    291 inserting actual calls into the code and recompiling or by calling these in the
    292 debugger.  LLVM has many nice features for visualizing various graphs.</p>
    293 
    294 <p>Getting back to the generated code, it is fairly simple: the entry block 
    295 evaluates the conditional expression ("x" in our case here) and compares the
    296 result to 0.0 with the "<tt><a href="../LangRef.html#i_fcmp">fcmp</a> one</tt>"
    297 instruction ('one' is "Ordered and Not Equal").  Based on the result of this
    298 expression, the code jumps to either the "then" or "else" blocks, which contain
    299 the expressions for the true/false cases.</p>
    300 
    301 <p>Once the then/else blocks are finished executing, they both branch back to the
    302 'ifcont' block to execute the code that happens after the if/then/else.  In this
    303 case the only thing left to do is to return to the caller of the function.  The
    304 question then becomes: how does the code know which expression to return?</p>
    305 
    306 <p>The answer to this question involves an important SSA operation: the
    307 <a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Phi
    308 operation</a>.  If you're not familiar with SSA, <a 
    309 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">the wikipedia
    310 article</a> is a good introduction and there are various other introductions to
    311 it available on your favorite search engine.  The short version is that
    312 "execution" of the Phi operation requires "remembering" which block control came
    313 from.  The Phi operation takes on the value corresponding to the input control
    314 block.  In this case, if control comes in from the "then" block, it gets the
    315 value of "calltmp".  If control comes from the "else" block, it gets the value
    316 of "calltmp1".</p>
    317 
    318 <p>At this point, you are probably starting to think "Oh no! This means my
    319 simple and elegant front-end will have to start generating SSA form in order to
    320 use LLVM!".  Fortunately, this is not the case, and we strongly advise
    321 <em>not</em> implementing an SSA construction algorithm in your front-end
    322 unless there is an amazingly good reason to do so.  In practice, there are two
    323 sorts of values that float around in code written for your average imperative
    324 programming language that might need Phi nodes:</p>
    325 
    326 <ol>
    327 <li>Code that involves user variables: <tt>x = 1; x = x + 1; </tt></li>
    328 <li>Values that are implicit in the structure of your AST, such as the Phi node
    329 in this case.</li>
    330 </ol>
    331 
    332 <p>In <a href="LangImpl7.html">Chapter 7</a> of this tutorial ("mutable 
    333 variables"), we'll talk about #1
    334 in depth.  For now, just believe me that you don't need SSA construction to
    335 handle this case.  For #2, you have the choice of using the techniques that we will 
    336 describe for #1, or you can insert Phi nodes directly, if convenient.  In this 
    337 case, it is really really easy to generate the Phi node, so we choose to do it
    338 directly.</p>
    339 
    340 <p>Okay, enough of the motivation and overview, lets generate code!</p>
    341 
    342 </div>
    343 
    344 <!-- ======================================================================= -->
    345 <h4><a name="ifcodegen">Code Generation for If/Then/Else</a></h4>
    346 <!-- ======================================================================= -->
    347 
    348 <div>
    349 
    350 <p>In order to generate code for this, we implement the <tt>Codegen</tt> method
    351 for <tt>IfExprAST</tt>:</p>
    352 
    353 <div class="doc_code">
    354 <pre>
    355 Value *IfExprAST::Codegen() {
    356   Value *CondV = Cond-&gt;Codegen();
    357   if (CondV == 0) return 0;
    358   
    359   // Convert condition to a bool by comparing equal to 0.0.
    360   CondV = Builder.CreateFCmpONE(CondV, 
    361                               ConstantFP::get(getGlobalContext(), APFloat(0.0)),
    362                                 "ifcond");
    363 </pre>
    364 </div>
    365 
    366 <p>This code is straightforward and similar to what we saw before.  We emit the
    367 expression for the condition, then compare that value to zero to get a truth
    368 value as a 1-bit (bool) value.</p>
    369 
    370 <div class="doc_code">
    371 <pre>
    372   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
    373   
    374   // Create blocks for the then and else cases.  Insert the 'then' block at the
    375   // end of the function.
    376   BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction);
    377   BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
    378   BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
    379 
    380   Builder.CreateCondBr(CondV, ThenBB, ElseBB);
    381 </pre>
    382 </div>
    383 
    384 <p>This code creates the basic blocks that are related to the if/then/else
    385 statement, and correspond directly to the blocks in the example above.  The
    386 first line gets the current Function object that is being built.  It
    387 gets this by asking the builder for the current BasicBlock, and asking that
    388 block for its "parent" (the function it is currently embedded into).</p>
    389 
    390 <p>Once it has that, it creates three blocks.  Note that it passes "TheFunction"
    391 into the constructor for the "then" block.  This causes the constructor to
    392 automatically insert the new block into the end of the specified function.  The
    393 other two blocks are created, but aren't yet inserted into the function.</p>
    394 
    395 <p>Once the blocks are created, we can emit the conditional branch that chooses
    396 between them.  Note that creating new blocks does not implicitly affect the
    397 IRBuilder, so it is still inserting into the block that the condition
    398 went into.  Also note that it is creating a branch to the "then" block and the
    399 "else" block, even though the "else" block isn't inserted into the function yet.
    400 This is all ok: it is the standard way that LLVM supports forward 
    401 references.</p>
    402 
    403 <div class="doc_code">
    404 <pre>
    405   // Emit then value.
    406   Builder.SetInsertPoint(ThenBB);
    407   
    408   Value *ThenV = Then-&gt;Codegen();
    409   if (ThenV == 0) return 0;
    410   
    411   Builder.CreateBr(MergeBB);
    412   // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
    413   ThenBB = Builder.GetInsertBlock();
    414 </pre>
    415 </div>
    416 
    417 <p>After the conditional branch is inserted, we move the builder to start
    418 inserting into the "then" block.  Strictly speaking, this call moves the
    419 insertion point to be at the end of the specified block.  However, since the
    420 "then" block is empty, it also starts out by inserting at the beginning of the
    421 block.  :)</p>
    422 
    423 <p>Once the insertion point is set, we recursively codegen the "then" expression
    424 from the AST.  To finish off the "then" block, we create an unconditional branch
    425 to the merge block.  One interesting (and very important) aspect of the LLVM IR
    426 is that it <a href="../LangRef.html#functionstructure">requires all basic blocks
    427 to be "terminated"</a> with a <a href="../LangRef.html#terminators">control flow
    428 instruction</a> such as return or branch.  This means that all control flow,
    429 <em>including fall throughs</em> must be made explicit in the LLVM IR.  If you
    430 violate this rule, the verifier will emit an error.</p>
    431 
    432 <p>The final line here is quite subtle, but is very important.  The basic issue
    433 is that when we create the Phi node in the merge block, we need to set up the
    434 block/value pairs that indicate how the Phi will work.  Importantly, the Phi
    435 node expects to have an entry for each predecessor of the block in the CFG.  Why
    436 then, are we getting the current block when we just set it to ThenBB 5 lines
    437 above?  The problem is that the "Then" expression may actually itself change the
    438 block that the Builder is emitting into if, for example, it contains a nested
    439 "if/then/else" expression.  Because calling Codegen recursively could
    440 arbitrarily change the notion of the current block, we are required to get an
    441 up-to-date value for code that will set up the Phi node.</p>
    442 
    443 <div class="doc_code">
    444 <pre>
    445   // Emit else block.
    446   TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
    447   Builder.SetInsertPoint(ElseBB);
    448   
    449   Value *ElseV = Else-&gt;Codegen();
    450   if (ElseV == 0) return 0;
    451   
    452   Builder.CreateBr(MergeBB);
    453   // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
    454   ElseBB = Builder.GetInsertBlock();
    455 </pre>
    456 </div>
    457 
    458 <p>Code generation for the 'else' block is basically identical to codegen for
    459 the 'then' block.  The only significant difference is the first line, which adds
    460 the 'else' block to the function.  Recall previously that the 'else' block was
    461 created, but not added to the function.  Now that the 'then' and 'else' blocks
    462 are emitted, we can finish up with the merge code:</p>
    463 
    464 <div class="doc_code">
    465 <pre>
    466   // Emit merge block.
    467   TheFunction->getBasicBlockList().push_back(MergeBB);
    468   Builder.SetInsertPoint(MergeBB);
    469   PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2,
    470                                   "iftmp");
    471   
    472   PN->addIncoming(ThenV, ThenBB);
    473   PN->addIncoming(ElseV, ElseBB);
    474   return PN;
    475 }
    476 </pre>
    477 </div>
    478 
    479 <p>The first two lines here are now familiar: the first adds the "merge" block
    480 to the Function object (it was previously floating, like the else block above).
    481 The second block changes the insertion point so that newly created code will go
    482 into the "merge" block.  Once that is done, we need to create the PHI node and
    483 set up the block/value pairs for the PHI.</p>
    484 
    485 <p>Finally, the CodeGen function returns the phi node as the value computed by
    486 the if/then/else expression.  In our example above, this returned value will 
    487 feed into the code for the top-level function, which will create the return
    488 instruction.</p>
    489 
    490 <p>Overall, we now have the ability to execute conditional code in
    491 Kaleidoscope.  With this extension, Kaleidoscope is a fairly complete language
    492 that can calculate a wide variety of numeric functions.  Next up we'll add
    493 another useful expression that is familiar from non-functional languages...</p>
    494 
    495 </div>
    496 
    497 </div>
    498 
    499 <!-- *********************************************************************** -->
    500 <h2><a name="for">'for' Loop Expression</a></h2>
    501 <!-- *********************************************************************** -->
    502 
    503 <div>
    504 
    505 <p>Now that we know how to add basic control flow constructs to the language,
    506 we have the tools to add more powerful things.  Lets add something more
    507 aggressive, a 'for' expression:</p>
    508 
    509 <div class="doc_code">
    510 <pre>
    511  extern putchard(char)
    512  def printstar(n)
    513    for i = 1, i &lt; n, 1.0 in
    514      putchard(42);  # ascii 42 = '*'
    515      
    516  # print 100 '*' characters
    517  printstar(100);
    518 </pre>
    519 </div>
    520 
    521 <p>This expression defines a new variable ("i" in this case) which iterates from
    522 a starting value, while the condition ("i &lt; n" in this case) is true, 
    523 incrementing by an optional step value ("1.0" in this case).  If the step value
    524 is omitted, it defaults to 1.0.  While the loop is true, it executes its 
    525 body expression.  Because we don't have anything better to return, we'll just
    526 define the loop as always returning 0.0.  In the future when we have mutable
    527 variables, it will get more useful.</p>
    528 
    529 <p>As before, lets talk about the changes that we need to Kaleidoscope to
    530 support this.</p>
    531 
    532 <!-- ======================================================================= -->
    533 <h4><a name="forlexer">Lexer Extensions for the 'for' Loop</a></h4>
    534 <!-- ======================================================================= -->
    535 
    536 <div>
    537 
    538 <p>The lexer extensions are the same sort of thing as for if/then/else:</p>
    539 
    540 <div class="doc_code">
    541 <pre>
    542   ... in enum Token ...
    543   // control
    544   tok_if = -6, tok_then = -7, tok_else = -8,
    545 <b>  tok_for = -9, tok_in = -10</b>
    546 
    547   ... in gettok ...
    548   if (IdentifierStr == "def") return tok_def;
    549   if (IdentifierStr == "extern") return tok_extern;
    550   if (IdentifierStr == "if") return tok_if;
    551   if (IdentifierStr == "then") return tok_then;
    552   if (IdentifierStr == "else") return tok_else;
    553   <b>if (IdentifierStr == "for") return tok_for;
    554   if (IdentifierStr == "in") return tok_in;</b>
    555   return tok_identifier;
    556 </pre>
    557 </div>
    558 
    559 </div>
    560 
    561 <!-- ======================================================================= -->
    562 <h4><a name="forast">AST Extensions for the 'for' Loop</a></h4>
    563 <!-- ======================================================================= -->
    564 
    565 <div>
    566 
    567 <p>The AST node is just as simple.  It basically boils down to capturing
    568 the variable name and the constituent expressions in the node.</p>
    569 
    570 <div class="doc_code">
    571 <pre>
    572 /// ForExprAST - Expression class for for/in.
    573 class ForExprAST : public ExprAST {
    574   std::string VarName;
    575   ExprAST *Start, *End, *Step, *Body;
    576 public:
    577   ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
    578              ExprAST *step, ExprAST *body)
    579     : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
    580   virtual Value *Codegen();
    581 };
    582 </pre>
    583 </div>
    584 
    585 </div>
    586 
    587 <!-- ======================================================================= -->
    588 <h4><a name="forparser">Parser Extensions for the 'for' Loop</a></h4>
    589 <!-- ======================================================================= -->
    590 
    591 <div>
    592 
    593 <p>The parser code is also fairly standard.  The only interesting thing here is
    594 handling of the optional step value.  The parser code handles it by checking to
    595 see if the second comma is present.  If not, it sets the step value to null in
    596 the AST node:</p>
    597 
    598 <div class="doc_code">
    599 <pre>
    600 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
    601 static ExprAST *ParseForExpr() {
    602   getNextToken();  // eat the for.
    603 
    604   if (CurTok != tok_identifier)
    605     return Error("expected identifier after for");
    606   
    607   std::string IdName = IdentifierStr;
    608   getNextToken();  // eat identifier.
    609   
    610   if (CurTok != '=')
    611     return Error("expected '=' after for");
    612   getNextToken();  // eat '='.
    613   
    614   
    615   ExprAST *Start = ParseExpression();
    616   if (Start == 0) return 0;
    617   if (CurTok != ',')
    618     return Error("expected ',' after for start value");
    619   getNextToken();
    620   
    621   ExprAST *End = ParseExpression();
    622   if (End == 0) return 0;
    623   
    624   // The step value is optional.
    625   ExprAST *Step = 0;
    626   if (CurTok == ',') {
    627     getNextToken();
    628     Step = ParseExpression();
    629     if (Step == 0) return 0;
    630   }
    631   
    632   if (CurTok != tok_in)
    633     return Error("expected 'in' after for");
    634   getNextToken();  // eat 'in'.
    635   
    636   ExprAST *Body = ParseExpression();
    637   if (Body == 0) return 0;
    638 
    639   return new ForExprAST(IdName, Start, End, Step, Body);
    640 }
    641 </pre>
    642 </div>
    643 
    644 </div>
    645 
    646 <!-- ======================================================================= -->
    647 <h4><a name="forir">LLVM IR for the 'for' Loop</a></h4>
    648 <!-- ======================================================================= -->
    649 
    650 <div>
    651 
    652 <p>Now we get to the good part: the LLVM IR we want to generate for this thing.
    653 With the simple example above, we get this LLVM IR (note that this dump is
    654 generated with optimizations disabled for clarity):
    655 </p>
    656 
    657 <div class="doc_code">
    658 <pre>
    659 declare double @putchard(double)
    660 
    661 define double @printstar(double %n) {
    662 entry:
    663   ; initial value = 1.0 (inlined into phi)
    664   br label %loop
    665 
    666 loop:		; preds = %loop, %entry
    667   %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]
    668   ; body
    669   %calltmp = call double @putchard(double 4.200000e+01)
    670   ; increment
    671   %nextvar = fadd double %i, 1.000000e+00
    672 
    673   ; termination test
    674   %cmptmp = fcmp ult double %i, %n
    675   %booltmp = uitofp i1 %cmptmp to double
    676   %loopcond = fcmp one double %booltmp, 0.000000e+00
    677   br i1 %loopcond, label %loop, label %afterloop
    678 
    679 afterloop:		; preds = %loop
    680   ; loop always returns 0.0
    681   ret double 0.000000e+00
    682 }
    683 </pre>
    684 </div>
    685 
    686 <p>This loop contains all the same constructs we saw before: a phi node, several
    687 expressions, and some basic blocks.  Lets see how this fits together.</p>
    688 
    689 </div>
    690 
    691 <!-- ======================================================================= -->
    692 <h4><a name="forcodegen">Code Generation for the 'for' Loop</a></h4>
    693 <!-- ======================================================================= -->
    694 
    695 <div>
    696 
    697 <p>The first part of Codegen is very simple: we just output the start expression
    698 for the loop value:</p>
    699 
    700 <div class="doc_code">
    701 <pre>
    702 Value *ForExprAST::Codegen() {
    703   // Emit the start code first, without 'variable' in scope.
    704   Value *StartVal = Start-&gt;Codegen();
    705   if (StartVal == 0) return 0;
    706 </pre>
    707 </div>
    708 
    709 <p>With this out of the way, the next step is to set up the LLVM basic block
    710 for the start of the loop body.  In the case above, the whole loop body is one
    711 block, but remember that the body code itself could consist of multiple blocks
    712 (e.g. if it contains an if/then/else or a for/in expression).</p>
    713 
    714 <div class="doc_code">
    715 <pre>
    716   // Make the new basic block for the loop header, inserting after current
    717   // block.
    718   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
    719   BasicBlock *PreheaderBB = Builder.GetInsertBlock();
    720   BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
    721   
    722   // Insert an explicit fall through from the current block to the LoopBB.
    723   Builder.CreateBr(LoopBB);
    724 </pre>
    725 </div>
    726 
    727 <p>This code is similar to what we saw for if/then/else.  Because we will need
    728 it to create the Phi node, we remember the block that falls through into the
    729 loop.  Once we have that, we create the actual block that starts the loop and
    730 create an unconditional branch for the fall-through between the two blocks.</p>
    731   
    732 <div class="doc_code">
    733 <pre>
    734   // Start insertion in LoopBB.
    735   Builder.SetInsertPoint(LoopBB);
    736   
    737   // Start the PHI node with an entry for Start.
    738   PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str());
    739   Variable-&gt;addIncoming(StartVal, PreheaderBB);
    740 </pre>
    741 </div>
    742 
    743 <p>Now that the "preheader" for the loop is set up, we switch to emitting code
    744 for the loop body.  To begin with, we move the insertion point and create the
    745 PHI node for the loop induction variable.  Since we already know the incoming
    746 value for the starting value, we add it to the Phi node.  Note that the Phi will
    747 eventually get a second value for the backedge, but we can't set it up yet
    748 (because it doesn't exist!).</p>
    749 
    750 <div class="doc_code">
    751 <pre>
    752   // Within the loop, the variable is defined equal to the PHI node.  If it
    753   // shadows an existing variable, we have to restore it, so save it now.
    754   Value *OldVal = NamedValues[VarName];
    755   NamedValues[VarName] = Variable;
    756   
    757   // Emit the body of the loop.  This, like any other expr, can change the
    758   // current BB.  Note that we ignore the value computed by the body, but don't
    759   // allow an error.
    760   if (Body-&gt;Codegen() == 0)
    761     return 0;
    762 </pre>
    763 </div>
    764 
    765 <p>Now the code starts to get more interesting.  Our 'for' loop introduces a new
    766 variable to the symbol table.  This means that our symbol table can now contain
    767 either function arguments or loop variables.  To handle this, before we codegen
    768 the body of the loop, we add the loop variable as the current value for its
    769 name.  Note that it is possible that there is a variable of the same name in the
    770 outer scope.  It would be easy to make this an error (emit an error and return
    771 null if there is already an entry for VarName) but we choose to allow shadowing
    772 of variables.  In order to handle this correctly, we remember the Value that
    773 we are potentially shadowing in <tt>OldVal</tt> (which will be null if there is
    774 no shadowed variable).</p>
    775 
    776 <p>Once the loop variable is set into the symbol table, the code recursively
    777 codegen's the body.  This allows the body to use the loop variable: any
    778 references to it will naturally find it in the symbol table.</p>
    779 
    780 <div class="doc_code">
    781 <pre>
    782   // Emit the step value.
    783   Value *StepVal;
    784   if (Step) {
    785     StepVal = Step-&gt;Codegen();
    786     if (StepVal == 0) return 0;
    787   } else {
    788     // If not specified, use 1.0.
    789     StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
    790   }
    791   
    792   Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar");
    793 </pre>
    794 </div>
    795 
    796 <p>Now that the body is emitted, we compute the next value of the iteration
    797 variable by adding the step value, or 1.0 if it isn't present. '<tt>NextVar</tt>'
    798 will be the value of the loop variable on the next iteration of the loop.</p>
    799 
    800 <div class="doc_code">
    801 <pre>
    802   // Compute the end condition.
    803   Value *EndCond = End-&gt;Codegen();
    804   if (EndCond == 0) return EndCond;
    805   
    806   // Convert condition to a bool by comparing equal to 0.0.
    807   EndCond = Builder.CreateFCmpONE(EndCond, 
    808                               ConstantFP::get(getGlobalContext(), APFloat(0.0)),
    809                                   "loopcond");
    810 </pre>
    811 </div>
    812 
    813 <p>Finally, we evaluate the exit value of the loop, to determine whether the
    814 loop should exit.  This mirrors the condition evaluation for the if/then/else
    815 statement.</p>
    816       
    817 <div class="doc_code">
    818 <pre>
    819   // Create the "after loop" block and insert it.
    820   BasicBlock *LoopEndBB = Builder.GetInsertBlock();
    821   BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
    822   
    823   // Insert the conditional branch into the end of LoopEndBB.
    824   Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
    825   
    826   // Any new code will be inserted in AfterBB.
    827   Builder.SetInsertPoint(AfterBB);
    828 </pre>
    829 </div>
    830 
    831 <p>With the code for the body of the loop complete, we just need to finish up
    832 the control flow for it.  This code remembers the end block (for the phi node),
    833 then creates the block for the loop exit ("afterloop").  Based on the value of
    834 the exit condition, it creates a conditional branch that chooses between
    835 executing the loop again and exiting the loop.  Any future code is emitted in
    836 the "afterloop" block, so it sets the insertion position to it.</p>
    837   
    838 <div class="doc_code">
    839 <pre>
    840   // Add a new entry to the PHI node for the backedge.
    841   Variable-&gt;addIncoming(NextVar, LoopEndBB);
    842   
    843   // Restore the unshadowed variable.
    844   if (OldVal)
    845     NamedValues[VarName] = OldVal;
    846   else
    847     NamedValues.erase(VarName);
    848   
    849   // for expr always returns 0.0.
    850   return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
    851 }
    852 </pre>
    853 </div>
    854 
    855 <p>The final code handles various cleanups: now that we have the "NextVar"
    856 value, we can add the incoming value to the loop PHI node.  After that, we
    857 remove the loop variable from the symbol table, so that it isn't in scope after
    858 the for loop.  Finally, code generation of the for loop always returns 0.0, so
    859 that is what we return from <tt>ForExprAST::Codegen</tt>.</p>
    860 
    861 <p>With this, we conclude the "adding control flow to Kaleidoscope" chapter of
    862 the tutorial.  In this chapter we added two control flow constructs, and used them to motivate a couple of aspects of the LLVM IR that are important for front-end implementors
    863 to know.  In the next chapter of our saga, we will get a bit crazier and add
    864 <a href="LangImpl6.html">user-defined operators</a> to our poor innocent 
    865 language.</p>
    866 
    867 </div>
    868 
    869 </div>
    870 
    871 <!-- *********************************************************************** -->
    872 <h2><a name="code">Full Code Listing</a></h2>
    873 <!-- *********************************************************************** -->
    874 
    875 <div>
    876 
    877 <p>
    878 Here is the complete code listing for our running example, enhanced with the
    879 if/then/else and for expressions..  To build this example, use:
    880 </p>
    881 
    882 <div class="doc_code">
    883 <pre>
    884 # Compile
    885 clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
    886 # Run
    887 ./toy
    888 </pre>
    889 </div>
    890 
    891 <p>Here is the code:</p>
    892 
    893 <div class="doc_code">
    894 <pre>
    895 #include "llvm/DerivedTypes.h"
    896 #include "llvm/ExecutionEngine/ExecutionEngine.h"
    897 #include "llvm/ExecutionEngine/JIT.h"
    898 #include "llvm/IRBuilder.h"
    899 #include "llvm/LLVMContext.h"
    900 #include "llvm/Module.h"
    901 #include "llvm/PassManager.h"
    902 #include "llvm/Analysis/Verifier.h"
    903 #include "llvm/Analysis/Passes.h"
    904 #include "llvm/Target/TargetData.h"
    905 #include "llvm/Transforms/Scalar.h"
    906 #include "llvm/Support/TargetSelect.h"
    907 #include &lt;cstdio&gt;
    908 #include &lt;string&gt;
    909 #include &lt;map&gt;
    910 #include &lt;vector&gt;
    911 using namespace llvm;
    912 
    913 //===----------------------------------------------------------------------===//
    914 // Lexer
    915 //===----------------------------------------------------------------------===//
    916 
    917 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
    918 // of these for known things.
    919 enum Token {
    920   tok_eof = -1,
    921 
    922   // commands
    923   tok_def = -2, tok_extern = -3,
    924 
    925   // primary
    926   tok_identifier = -4, tok_number = -5,
    927   
    928   // control
    929   tok_if = -6, tok_then = -7, tok_else = -8,
    930   tok_for = -9, tok_in = -10
    931 };
    932 
    933 static std::string IdentifierStr;  // Filled in if tok_identifier
    934 static double NumVal;              // Filled in if tok_number
    935 
    936 /// gettok - Return the next token from standard input.
    937 static int gettok() {
    938   static int LastChar = ' ';
    939 
    940   // Skip any whitespace.
    941   while (isspace(LastChar))
    942     LastChar = getchar();
    943 
    944   if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
    945     IdentifierStr = LastChar;
    946     while (isalnum((LastChar = getchar())))
    947       IdentifierStr += LastChar;
    948 
    949     if (IdentifierStr == "def") return tok_def;
    950     if (IdentifierStr == "extern") return tok_extern;
    951     if (IdentifierStr == "if") return tok_if;
    952     if (IdentifierStr == "then") return tok_then;
    953     if (IdentifierStr == "else") return tok_else;
    954     if (IdentifierStr == "for") return tok_for;
    955     if (IdentifierStr == "in") return tok_in;
    956     return tok_identifier;
    957   }
    958 
    959   if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+
    960     std::string NumStr;
    961     do {
    962       NumStr += LastChar;
    963       LastChar = getchar();
    964     } while (isdigit(LastChar) || LastChar == '.');
    965 
    966     NumVal = strtod(NumStr.c_str(), 0);
    967     return tok_number;
    968   }
    969 
    970   if (LastChar == '#') {
    971     // Comment until end of line.
    972     do LastChar = getchar();
    973     while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
    974     
    975     if (LastChar != EOF)
    976       return gettok();
    977   }
    978   
    979   // Check for end of file.  Don't eat the EOF.
    980   if (LastChar == EOF)
    981     return tok_eof;
    982 
    983   // Otherwise, just return the character as its ascii value.
    984   int ThisChar = LastChar;
    985   LastChar = getchar();
    986   return ThisChar;
    987 }
    988 
    989 //===----------------------------------------------------------------------===//
    990 // Abstract Syntax Tree (aka Parse Tree)
    991 //===----------------------------------------------------------------------===//
    992 
    993 /// ExprAST - Base class for all expression nodes.
    994 class ExprAST {
    995 public:
    996   virtual ~ExprAST() {}
    997   virtual Value *Codegen() = 0;
    998 };
    999 
   1000 /// NumberExprAST - Expression class for numeric literals like "1.0".
   1001 class NumberExprAST : public ExprAST {
   1002   double Val;
   1003 public:
   1004   NumberExprAST(double val) : Val(val) {}
   1005   virtual Value *Codegen();
   1006 };
   1007 
   1008 /// VariableExprAST - Expression class for referencing a variable, like "a".
   1009 class VariableExprAST : public ExprAST {
   1010   std::string Name;
   1011 public:
   1012   VariableExprAST(const std::string &amp;name) : Name(name) {}
   1013   virtual Value *Codegen();
   1014 };
   1015 
   1016 /// BinaryExprAST - Expression class for a binary operator.
   1017 class BinaryExprAST : public ExprAST {
   1018   char Op;
   1019   ExprAST *LHS, *RHS;
   1020 public:
   1021   BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) 
   1022     : Op(op), LHS(lhs), RHS(rhs) {}
   1023   virtual Value *Codegen();
   1024 };
   1025 
   1026 /// CallExprAST - Expression class for function calls.
   1027 class CallExprAST : public ExprAST {
   1028   std::string Callee;
   1029   std::vector&lt;ExprAST*&gt; Args;
   1030 public:
   1031   CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
   1032     : Callee(callee), Args(args) {}
   1033   virtual Value *Codegen();
   1034 };
   1035 
   1036 /// IfExprAST - Expression class for if/then/else.
   1037 class IfExprAST : public ExprAST {
   1038   ExprAST *Cond, *Then, *Else;
   1039 public:
   1040   IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
   1041   : Cond(cond), Then(then), Else(_else) {}
   1042   virtual Value *Codegen();
   1043 };
   1044 
   1045 /// ForExprAST - Expression class for for/in.
   1046 class ForExprAST : public ExprAST {
   1047   std::string VarName;
   1048   ExprAST *Start, *End, *Step, *Body;
   1049 public:
   1050   ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
   1051              ExprAST *step, ExprAST *body)
   1052     : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
   1053   virtual Value *Codegen();
   1054 };
   1055 
   1056 /// PrototypeAST - This class represents the "prototype" for a function,
   1057 /// which captures its name, and its argument names (thus implicitly the number
   1058 /// of arguments the function takes).
   1059 class PrototypeAST {
   1060   std::string Name;
   1061   std::vector&lt;std::string&gt; Args;
   1062 public:
   1063   PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
   1064     : Name(name), Args(args) {}
   1065   
   1066   Function *Codegen();
   1067 };
   1068 
   1069 /// FunctionAST - This class represents a function definition itself.
   1070 class FunctionAST {
   1071   PrototypeAST *Proto;
   1072   ExprAST *Body;
   1073 public:
   1074   FunctionAST(PrototypeAST *proto, ExprAST *body)
   1075     : Proto(proto), Body(body) {}
   1076   
   1077   Function *Codegen();
   1078 };
   1079 
   1080 //===----------------------------------------------------------------------===//
   1081 // Parser
   1082 //===----------------------------------------------------------------------===//
   1083 
   1084 /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
   1085 /// token the parser is looking at.  getNextToken reads another token from the
   1086 /// lexer and updates CurTok with its results.
   1087 static int CurTok;
   1088 static int getNextToken() {
   1089   return CurTok = gettok();
   1090 }
   1091 
   1092 /// BinopPrecedence - This holds the precedence for each binary operator that is
   1093 /// defined.
   1094 static std::map&lt;char, int&gt; BinopPrecedence;
   1095 
   1096 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
   1097 static int GetTokPrecedence() {
   1098   if (!isascii(CurTok))
   1099     return -1;
   1100   
   1101   // Make sure it's a declared binop.
   1102   int TokPrec = BinopPrecedence[CurTok];
   1103   if (TokPrec &lt;= 0) return -1;
   1104   return TokPrec;
   1105 }
   1106 
   1107 /// Error* - These are little helper functions for error handling.
   1108 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
   1109 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
   1110 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
   1111 
   1112 static ExprAST *ParseExpression();
   1113 
   1114 /// identifierexpr
   1115 ///   ::= identifier
   1116 ///   ::= identifier '(' expression* ')'
   1117 static ExprAST *ParseIdentifierExpr() {
   1118   std::string IdName = IdentifierStr;
   1119   
   1120   getNextToken();  // eat identifier.
   1121   
   1122   if (CurTok != '(') // Simple variable ref.
   1123     return new VariableExprAST(IdName);
   1124   
   1125   // Call.
   1126   getNextToken();  // eat (
   1127   std::vector&lt;ExprAST*&gt; Args;
   1128   if (CurTok != ')') {
   1129     while (1) {
   1130       ExprAST *Arg = ParseExpression();
   1131       if (!Arg) return 0;
   1132       Args.push_back(Arg);
   1133 
   1134       if (CurTok == ')') break;
   1135 
   1136       if (CurTok != ',')
   1137         return Error("Expected ')' or ',' in argument list");
   1138       getNextToken();
   1139     }
   1140   }
   1141 
   1142   // Eat the ')'.
   1143   getNextToken();
   1144   
   1145   return new CallExprAST(IdName, Args);
   1146 }
   1147 
   1148 /// numberexpr ::= number
   1149 static ExprAST *ParseNumberExpr() {
   1150   ExprAST *Result = new NumberExprAST(NumVal);
   1151   getNextToken(); // consume the number
   1152   return Result;
   1153 }
   1154 
   1155 /// parenexpr ::= '(' expression ')'
   1156 static ExprAST *ParseParenExpr() {
   1157   getNextToken();  // eat (.
   1158   ExprAST *V = ParseExpression();
   1159   if (!V) return 0;
   1160   
   1161   if (CurTok != ')')
   1162     return Error("expected ')'");
   1163   getNextToken();  // eat ).
   1164   return V;
   1165 }
   1166 
   1167 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
   1168 static ExprAST *ParseIfExpr() {
   1169   getNextToken();  // eat the if.
   1170   
   1171   // condition.
   1172   ExprAST *Cond = ParseExpression();
   1173   if (!Cond) return 0;
   1174   
   1175   if (CurTok != tok_then)
   1176     return Error("expected then");
   1177   getNextToken();  // eat the then
   1178   
   1179   ExprAST *Then = ParseExpression();
   1180   if (Then == 0) return 0;
   1181   
   1182   if (CurTok != tok_else)
   1183     return Error("expected else");
   1184   
   1185   getNextToken();
   1186   
   1187   ExprAST *Else = ParseExpression();
   1188   if (!Else) return 0;
   1189   
   1190   return new IfExprAST(Cond, Then, Else);
   1191 }
   1192 
   1193 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
   1194 static ExprAST *ParseForExpr() {
   1195   getNextToken();  // eat the for.
   1196 
   1197   if (CurTok != tok_identifier)
   1198     return Error("expected identifier after for");
   1199   
   1200   std::string IdName = IdentifierStr;
   1201   getNextToken();  // eat identifier.
   1202   
   1203   if (CurTok != '=')
   1204     return Error("expected '=' after for");
   1205   getNextToken();  // eat '='.
   1206   
   1207   
   1208   ExprAST *Start = ParseExpression();
   1209   if (Start == 0) return 0;
   1210   if (CurTok != ',')
   1211     return Error("expected ',' after for start value");
   1212   getNextToken();
   1213   
   1214   ExprAST *End = ParseExpression();
   1215   if (End == 0) return 0;
   1216   
   1217   // The step value is optional.
   1218   ExprAST *Step = 0;
   1219   if (CurTok == ',') {
   1220     getNextToken();
   1221     Step = ParseExpression();
   1222     if (Step == 0) return 0;
   1223   }
   1224   
   1225   if (CurTok != tok_in)
   1226     return Error("expected 'in' after for");
   1227   getNextToken();  // eat 'in'.
   1228   
   1229   ExprAST *Body = ParseExpression();
   1230   if (Body == 0) return 0;
   1231 
   1232   return new ForExprAST(IdName, Start, End, Step, Body);
   1233 }
   1234 
   1235 /// primary
   1236 ///   ::= identifierexpr
   1237 ///   ::= numberexpr
   1238 ///   ::= parenexpr
   1239 ///   ::= ifexpr
   1240 ///   ::= forexpr
   1241 static ExprAST *ParsePrimary() {
   1242   switch (CurTok) {
   1243   default: return Error("unknown token when expecting an expression");
   1244   case tok_identifier: return ParseIdentifierExpr();
   1245   case tok_number:     return ParseNumberExpr();
   1246   case '(':            return ParseParenExpr();
   1247   case tok_if:         return ParseIfExpr();
   1248   case tok_for:        return ParseForExpr();
   1249   }
   1250 }
   1251 
   1252 /// binoprhs
   1253 ///   ::= ('+' primary)*
   1254 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
   1255   // If this is a binop, find its precedence.
   1256   while (1) {
   1257     int TokPrec = GetTokPrecedence();
   1258     
   1259     // If this is a binop that binds at least as tightly as the current binop,
   1260     // consume it, otherwise we are done.
   1261     if (TokPrec &lt; ExprPrec)
   1262       return LHS;
   1263     
   1264     // Okay, we know this is a binop.
   1265     int BinOp = CurTok;
   1266     getNextToken();  // eat binop
   1267     
   1268     // Parse the primary expression after the binary operator.
   1269     ExprAST *RHS = ParsePrimary();
   1270     if (!RHS) return 0;
   1271     
   1272     // If BinOp binds less tightly with RHS than the operator after RHS, let
   1273     // the pending operator take RHS as its LHS.
   1274     int NextPrec = GetTokPrecedence();
   1275     if (TokPrec &lt; NextPrec) {
   1276       RHS = ParseBinOpRHS(TokPrec+1, RHS);
   1277       if (RHS == 0) return 0;
   1278     }
   1279     
   1280     // Merge LHS/RHS.
   1281     LHS = new BinaryExprAST(BinOp, LHS, RHS);
   1282   }
   1283 }
   1284 
   1285 /// expression
   1286 ///   ::= primary binoprhs
   1287 ///
   1288 static ExprAST *ParseExpression() {
   1289   ExprAST *LHS = ParsePrimary();
   1290   if (!LHS) return 0;
   1291   
   1292   return ParseBinOpRHS(0, LHS);
   1293 }
   1294 
   1295 /// prototype
   1296 ///   ::= id '(' id* ')'
   1297 static PrototypeAST *ParsePrototype() {
   1298   if (CurTok != tok_identifier)
   1299     return ErrorP("Expected function name in prototype");
   1300 
   1301   std::string FnName = IdentifierStr;
   1302   getNextToken();
   1303   
   1304   if (CurTok != '(')
   1305     return ErrorP("Expected '(' in prototype");
   1306   
   1307   std::vector&lt;std::string&gt; ArgNames;
   1308   while (getNextToken() == tok_identifier)
   1309     ArgNames.push_back(IdentifierStr);
   1310   if (CurTok != ')')
   1311     return ErrorP("Expected ')' in prototype");
   1312   
   1313   // success.
   1314   getNextToken();  // eat ')'.
   1315   
   1316   return new PrototypeAST(FnName, ArgNames);
   1317 }
   1318 
   1319 /// definition ::= 'def' prototype expression
   1320 static FunctionAST *ParseDefinition() {
   1321   getNextToken();  // eat def.
   1322   PrototypeAST *Proto = ParsePrototype();
   1323   if (Proto == 0) return 0;
   1324 
   1325   if (ExprAST *E = ParseExpression())
   1326     return new FunctionAST(Proto, E);
   1327   return 0;
   1328 }
   1329 
   1330 /// toplevelexpr ::= expression
   1331 static FunctionAST *ParseTopLevelExpr() {
   1332   if (ExprAST *E = ParseExpression()) {
   1333     // Make an anonymous proto.
   1334     PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
   1335     return new FunctionAST(Proto, E);
   1336   }
   1337   return 0;
   1338 }
   1339 
   1340 /// external ::= 'extern' prototype
   1341 static PrototypeAST *ParseExtern() {
   1342   getNextToken();  // eat extern.
   1343   return ParsePrototype();
   1344 }
   1345 
   1346 //===----------------------------------------------------------------------===//
   1347 // Code Generation
   1348 //===----------------------------------------------------------------------===//
   1349 
   1350 static Module *TheModule;
   1351 static IRBuilder&lt;&gt; Builder(getGlobalContext());
   1352 static std::map&lt;std::string, Value*&gt; NamedValues;
   1353 static FunctionPassManager *TheFPM;
   1354 
   1355 Value *ErrorV(const char *Str) { Error(Str); return 0; }
   1356 
   1357 Value *NumberExprAST::Codegen() {
   1358   return ConstantFP::get(getGlobalContext(), APFloat(Val));
   1359 }
   1360 
   1361 Value *VariableExprAST::Codegen() {
   1362   // Look this variable up in the function.
   1363   Value *V = NamedValues[Name];
   1364   return V ? V : ErrorV("Unknown variable name");
   1365 }
   1366 
   1367 Value *BinaryExprAST::Codegen() {
   1368   Value *L = LHS-&gt;Codegen();
   1369   Value *R = RHS-&gt;Codegen();
   1370   if (L == 0 || R == 0) return 0;
   1371   
   1372   switch (Op) {
   1373   case '+': return Builder.CreateFAdd(L, R, "addtmp");
   1374   case '-': return Builder.CreateFSub(L, R, "subtmp");
   1375   case '*': return Builder.CreateFMul(L, R, "multmp");
   1376   case '&lt;':
   1377     L = Builder.CreateFCmpULT(L, R, "cmptmp");
   1378     // Convert bool 0/1 to double 0.0 or 1.0
   1379     return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
   1380                                 "booltmp");
   1381   default: return ErrorV("invalid binary operator");
   1382   }
   1383 }
   1384 
   1385 Value *CallExprAST::Codegen() {
   1386   // Look up the name in the global module table.
   1387   Function *CalleeF = TheModule-&gt;getFunction(Callee);
   1388   if (CalleeF == 0)
   1389     return ErrorV("Unknown function referenced");
   1390   
   1391   // If argument mismatch error.
   1392   if (CalleeF-&gt;arg_size() != Args.size())
   1393     return ErrorV("Incorrect # arguments passed");
   1394 
   1395   std::vector&lt;Value*&gt; ArgsV;
   1396   for (unsigned i = 0, e = Args.size(); i != e; ++i) {
   1397     ArgsV.push_back(Args[i]-&gt;Codegen());
   1398     if (ArgsV.back() == 0) return 0;
   1399   }
   1400   
   1401   return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
   1402 }
   1403 
   1404 Value *IfExprAST::Codegen() {
   1405   Value *CondV = Cond-&gt;Codegen();
   1406   if (CondV == 0) return 0;
   1407   
   1408   // Convert condition to a bool by comparing equal to 0.0.
   1409   CondV = Builder.CreateFCmpONE(CondV, 
   1410                               ConstantFP::get(getGlobalContext(), APFloat(0.0)),
   1411                                 "ifcond");
   1412   
   1413   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
   1414   
   1415   // Create blocks for the then and else cases.  Insert the 'then' block at the
   1416   // end of the function.
   1417   BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction);
   1418   BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
   1419   BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
   1420   
   1421   Builder.CreateCondBr(CondV, ThenBB, ElseBB);
   1422   
   1423   // Emit then value.
   1424   Builder.SetInsertPoint(ThenBB);
   1425   
   1426   Value *ThenV = Then-&gt;Codegen();
   1427   if (ThenV == 0) return 0;
   1428   
   1429   Builder.CreateBr(MergeBB);
   1430   // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
   1431   ThenBB = Builder.GetInsertBlock();
   1432   
   1433   // Emit else block.
   1434   TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
   1435   Builder.SetInsertPoint(ElseBB);
   1436   
   1437   Value *ElseV = Else-&gt;Codegen();
   1438   if (ElseV == 0) return 0;
   1439   
   1440   Builder.CreateBr(MergeBB);
   1441   // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
   1442   ElseBB = Builder.GetInsertBlock();
   1443   
   1444   // Emit merge block.
   1445   TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
   1446   Builder.SetInsertPoint(MergeBB);
   1447   PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2,
   1448                                   "iftmp");
   1449   
   1450   PN-&gt;addIncoming(ThenV, ThenBB);
   1451   PN-&gt;addIncoming(ElseV, ElseBB);
   1452   return PN;
   1453 }
   1454 
   1455 Value *ForExprAST::Codegen() {
   1456   // Output this as:
   1457   //   ...
   1458   //   start = startexpr
   1459   //   goto loop
   1460   // loop: 
   1461   //   variable = phi [start, loopheader], [nextvariable, loopend]
   1462   //   ...
   1463   //   bodyexpr
   1464   //   ...
   1465   // loopend:
   1466   //   step = stepexpr
   1467   //   nextvariable = variable + step
   1468   //   endcond = endexpr
   1469   //   br endcond, loop, endloop
   1470   // outloop:
   1471   
   1472   // Emit the start code first, without 'variable' in scope.
   1473   Value *StartVal = Start-&gt;Codegen();
   1474   if (StartVal == 0) return 0;
   1475   
   1476   // Make the new basic block for the loop header, inserting after current
   1477   // block.
   1478   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
   1479   BasicBlock *PreheaderBB = Builder.GetInsertBlock();
   1480   BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
   1481   
   1482   // Insert an explicit fall through from the current block to the LoopBB.
   1483   Builder.CreateBr(LoopBB);
   1484 
   1485   // Start insertion in LoopBB.
   1486   Builder.SetInsertPoint(LoopBB);
   1487   
   1488   // Start the PHI node with an entry for Start.
   1489   PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str());
   1490   Variable-&gt;addIncoming(StartVal, PreheaderBB);
   1491   
   1492   // Within the loop, the variable is defined equal to the PHI node.  If it
   1493   // shadows an existing variable, we have to restore it, so save it now.
   1494   Value *OldVal = NamedValues[VarName];
   1495   NamedValues[VarName] = Variable;
   1496   
   1497   // Emit the body of the loop.  This, like any other expr, can change the
   1498   // current BB.  Note that we ignore the value computed by the body, but don't
   1499   // allow an error.
   1500   if (Body-&gt;Codegen() == 0)
   1501     return 0;
   1502   
   1503   // Emit the step value.
   1504   Value *StepVal;
   1505   if (Step) {
   1506     StepVal = Step-&gt;Codegen();
   1507     if (StepVal == 0) return 0;
   1508   } else {
   1509     // If not specified, use 1.0.
   1510     StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
   1511   }
   1512   
   1513   Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar");
   1514 
   1515   // Compute the end condition.
   1516   Value *EndCond = End-&gt;Codegen();
   1517   if (EndCond == 0) return EndCond;
   1518   
   1519   // Convert condition to a bool by comparing equal to 0.0.
   1520   EndCond = Builder.CreateFCmpONE(EndCond, 
   1521                               ConstantFP::get(getGlobalContext(), APFloat(0.0)),
   1522                                   "loopcond");
   1523   
   1524   // Create the "after loop" block and insert it.
   1525   BasicBlock *LoopEndBB = Builder.GetInsertBlock();
   1526   BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
   1527   
   1528   // Insert the conditional branch into the end of LoopEndBB.
   1529   Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
   1530   
   1531   // Any new code will be inserted in AfterBB.
   1532   Builder.SetInsertPoint(AfterBB);
   1533   
   1534   // Add a new entry to the PHI node for the backedge.
   1535   Variable-&gt;addIncoming(NextVar, LoopEndBB);
   1536   
   1537   // Restore the unshadowed variable.
   1538   if (OldVal)
   1539     NamedValues[VarName] = OldVal;
   1540   else
   1541     NamedValues.erase(VarName);
   1542 
   1543   
   1544   // for expr always returns 0.0.
   1545   return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
   1546 }
   1547 
   1548 Function *PrototypeAST::Codegen() {
   1549   // Make the function type:  double(double,double) etc.
   1550   std::vector&lt;Type*&gt; Doubles(Args.size(),
   1551                              Type::getDoubleTy(getGlobalContext()));
   1552   FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
   1553                                        Doubles, false);
   1554   
   1555   Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
   1556   
   1557   // If F conflicted, there was already something named 'Name'.  If it has a
   1558   // body, don't allow redefinition or reextern.
   1559   if (F-&gt;getName() != Name) {
   1560     // Delete the one we just made and get the existing one.
   1561     F-&gt;eraseFromParent();
   1562     F = TheModule-&gt;getFunction(Name);
   1563     
   1564     // If F already has a body, reject this.
   1565     if (!F-&gt;empty()) {
   1566       ErrorF("redefinition of function");
   1567       return 0;
   1568     }
   1569     
   1570     // If F took a different number of args, reject.
   1571     if (F-&gt;arg_size() != Args.size()) {
   1572       ErrorF("redefinition of function with different # args");
   1573       return 0;
   1574     }
   1575   }
   1576   
   1577   // Set names for all arguments.
   1578   unsigned Idx = 0;
   1579   for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
   1580        ++AI, ++Idx) {
   1581     AI-&gt;setName(Args[Idx]);
   1582     
   1583     // Add arguments to variable symbol table.
   1584     NamedValues[Args[Idx]] = AI;
   1585   }
   1586   
   1587   return F;
   1588 }
   1589 
   1590 Function *FunctionAST::Codegen() {
   1591   NamedValues.clear();
   1592   
   1593   Function *TheFunction = Proto-&gt;Codegen();
   1594   if (TheFunction == 0)
   1595     return 0;
   1596   
   1597   // Create a new basic block to start insertion into.
   1598   BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
   1599   Builder.SetInsertPoint(BB);
   1600   
   1601   if (Value *RetVal = Body-&gt;Codegen()) {
   1602     // Finish off the function.
   1603     Builder.CreateRet(RetVal);
   1604 
   1605     // Validate the generated code, checking for consistency.
   1606     verifyFunction(*TheFunction);
   1607 
   1608     // Optimize the function.
   1609     TheFPM-&gt;run(*TheFunction);
   1610     
   1611     return TheFunction;
   1612   }
   1613   
   1614   // Error reading body, remove function.
   1615   TheFunction-&gt;eraseFromParent();
   1616   return 0;
   1617 }
   1618 
   1619 //===----------------------------------------------------------------------===//
   1620 // Top-Level parsing and JIT Driver
   1621 //===----------------------------------------------------------------------===//
   1622 
   1623 static ExecutionEngine *TheExecutionEngine;
   1624 
   1625 static void HandleDefinition() {
   1626   if (FunctionAST *F = ParseDefinition()) {
   1627     if (Function *LF = F-&gt;Codegen()) {
   1628       fprintf(stderr, "Read function definition:");
   1629       LF-&gt;dump();
   1630     }
   1631   } else {
   1632     // Skip token for error recovery.
   1633     getNextToken();
   1634   }
   1635 }
   1636 
   1637 static void HandleExtern() {
   1638   if (PrototypeAST *P = ParseExtern()) {
   1639     if (Function *F = P-&gt;Codegen()) {
   1640       fprintf(stderr, "Read extern: ");
   1641       F-&gt;dump();
   1642     }
   1643   } else {
   1644     // Skip token for error recovery.
   1645     getNextToken();
   1646   }
   1647 }
   1648 
   1649 static void HandleTopLevelExpression() {
   1650   // Evaluate a top-level expression into an anonymous function.
   1651   if (FunctionAST *F = ParseTopLevelExpr()) {
   1652     if (Function *LF = F-&gt;Codegen()) {
   1653       // JIT the function, returning a function pointer.
   1654       void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
   1655       
   1656       // Cast it to the right type (takes no arguments, returns a double) so we
   1657       // can call it as a native function.
   1658       double (*FP)() = (double (*)())(intptr_t)FPtr;
   1659       fprintf(stderr, "Evaluated to %f\n", FP());
   1660     }
   1661   } else {
   1662     // Skip token for error recovery.
   1663     getNextToken();
   1664   }
   1665 }
   1666 
   1667 /// top ::= definition | external | expression | ';'
   1668 static void MainLoop() {
   1669   while (1) {
   1670     fprintf(stderr, "ready&gt; ");
   1671     switch (CurTok) {
   1672     case tok_eof:    return;
   1673     case ';':        getNextToken(); break;  // ignore top-level semicolons.
   1674     case tok_def:    HandleDefinition(); break;
   1675     case tok_extern: HandleExtern(); break;
   1676     default:         HandleTopLevelExpression(); break;
   1677     }
   1678   }
   1679 }
   1680 
   1681 //===----------------------------------------------------------------------===//
   1682 // "Library" functions that can be "extern'd" from user code.
   1683 //===----------------------------------------------------------------------===//
   1684 
   1685 /// putchard - putchar that takes a double and returns 0.
   1686 extern "C" 
   1687 double putchard(double X) {
   1688   putchar((char)X);
   1689   return 0;
   1690 }
   1691 
   1692 //===----------------------------------------------------------------------===//
   1693 // Main driver code.
   1694 //===----------------------------------------------------------------------===//
   1695 
   1696 int main() {
   1697   InitializeNativeTarget();
   1698   LLVMContext &amp;Context = getGlobalContext();
   1699 
   1700   // Install standard binary operators.
   1701   // 1 is lowest precedence.
   1702   BinopPrecedence['&lt;'] = 10;
   1703   BinopPrecedence['+'] = 20;
   1704   BinopPrecedence['-'] = 20;
   1705   BinopPrecedence['*'] = 40;  // highest.
   1706 
   1707   // Prime the first token.
   1708   fprintf(stderr, "ready&gt; ");
   1709   getNextToken();
   1710 
   1711   // Make the module, which holds all the code.
   1712   TheModule = new Module("my cool jit", Context);
   1713 
   1714   // Create the JIT.  This takes ownership of the module.
   1715   std::string ErrStr;
   1716   TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&amp;ErrStr).create();
   1717   if (!TheExecutionEngine) {
   1718     fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
   1719     exit(1);
   1720   }
   1721 
   1722   FunctionPassManager OurFPM(TheModule);
   1723 
   1724   // Set up the optimizer pipeline.  Start with registering info about how the
   1725   // target lays out data structures.
   1726   OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
   1727   // Provide basic AliasAnalysis support for GVN.
   1728   OurFPM.add(createBasicAliasAnalysisPass());
   1729   // Do simple "peephole" optimizations and bit-twiddling optzns.
   1730   OurFPM.add(createInstructionCombiningPass());
   1731   // Reassociate expressions.
   1732   OurFPM.add(createReassociatePass());
   1733   // Eliminate Common SubExpressions.
   1734   OurFPM.add(createGVNPass());
   1735   // Simplify the control flow graph (deleting unreachable blocks, etc).
   1736   OurFPM.add(createCFGSimplificationPass());
   1737 
   1738   OurFPM.doInitialization();
   1739 
   1740   // Set the global so the code gen can use this.
   1741   TheFPM = &amp;OurFPM;
   1742 
   1743   // Run the main "interpreter loop" now.
   1744   MainLoop();
   1745 
   1746   TheFPM = 0;
   1747 
   1748   // Print out all of the generated code.
   1749   TheModule-&gt;dump();
   1750 
   1751   return 0;
   1752 }
   1753 </pre>
   1754 </div>
   1755 
   1756 <a href="LangImpl6.html">Next: Extending the language: user-defined operators</a>
   1757 </div>
   1758 
   1759 <!-- *********************************************************************** -->
   1760 <hr>
   1761 <address>
   1762   <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
   1763   src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
   1764   <a href="http://validator.w3.org/check/referer"><img
   1765   src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
   1766 
   1767   <a href="mailto:sabre (a] nondot.org">Chris Lattner</a><br>
   1768   <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br>
   1769   Last modified: $Date$
   1770 </address>
   1771 </body>
   1772 </html>
   1773