Home | History | Annotate | Download | only in tutorial
      1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
      2                       "http://www.w3.org/TR/html4/strict.dtd">
      3 
      4 <html>
      5 <head>
      6   <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
      7          construction</title>
      8   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      9   <meta name="author" content="Chris Lattner">
     10   <link rel="stylesheet" href="../llvm.css" type="text/css">
     11 </head>
     12 
     13 <body>
     14 
     15 <h1>Kaleidoscope: Extending the Language: Mutable Variables</h1>
     16 
     17 <ul>
     18 <li><a href="index.html">Up to Tutorial Index</a></li>
     19 <li>Chapter 7
     20   <ol>
     21     <li><a href="#intro">Chapter 7 Introduction</a></li>
     22     <li><a href="#why">Why is this a hard problem?</a></li>
     23     <li><a href="#memory">Memory in LLVM</a></li>
     24     <li><a href="#kalvars">Mutable Variables in Kaleidoscope</a></li>
     25     <li><a href="#adjustments">Adjusting Existing Variables for
     26      Mutation</a></li>
     27     <li><a href="#assignment">New Assignment Operator</a></li>
     28     <li><a href="#localvars">User-defined Local Variables</a></li>
     29     <li><a href="#code">Full Code Listing</a></li>
     30   </ol>
     31 </li>
     32 <li><a href="LangImpl8.html">Chapter 8</a>: Conclusion and other useful LLVM
     33  tidbits</li>
     34 </ul>
     35 
     36 <div class="doc_author">
     37   <p>Written by <a href="mailto:sabre (a] nondot.org">Chris Lattner</a></p>
     38 </div>
     39 
     40 <!-- *********************************************************************** -->
     41 <h2><a name="intro">Chapter 7 Introduction</a></h2>
     42 <!-- *********************************************************************** -->
     43 
     44 <div>
     45 
     46 <p>Welcome to Chapter 7 of the "<a href="index.html">Implementing a language
     47 with LLVM</a>" tutorial.  In chapters 1 through 6, we've built a very
     48 respectable, albeit simple, <a 
     49 href="http://en.wikipedia.org/wiki/Functional_programming">functional
     50 programming language</a>.  In our journey, we learned some parsing techniques,
     51 how to build and represent an AST, how to build LLVM IR, and how to optimize
     52 the resultant code as well as JIT compile it.</p>
     53 
     54 <p>While Kaleidoscope is interesting as a functional language, the fact that it
     55 is functional makes it "too easy" to generate LLVM IR for it.  In particular, a 
     56 functional language makes it very easy to build LLVM IR directly in <a 
     57 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
     58 Since LLVM requires that the input code be in SSA form, this is a very nice
     59 property and it is often unclear to newcomers how to generate code for an
     60 imperative language with mutable variables.</p>
     61 
     62 <p>The short (and happy) summary of this chapter is that there is no need for
     63 your front-end to build SSA form: LLVM provides highly tuned and well tested
     64 support for this, though the way it works is a bit unexpected for some.</p>
     65 
     66 </div>
     67 
     68 <!-- *********************************************************************** -->
     69 <h2><a name="why">Why is this a hard problem?</a></h2>
     70 <!-- *********************************************************************** -->
     71 
     72 <div>
     73 
     74 <p>
     75 To understand why mutable variables cause complexities in SSA construction, 
     76 consider this extremely simple C example:
     77 </p>
     78 
     79 <div class="doc_code">
     80 <pre>
     81 int G, H;
     82 int test(_Bool Condition) {
     83   int X;
     84   if (Condition)
     85     X = G;
     86   else
     87     X = H;
     88   return X;
     89 }
     90 </pre>
     91 </div>
     92 
     93 <p>In this case, we have the variable "X", whose value depends on the path 
     94 executed in the program.  Because there are two different possible values for X
     95 before the return instruction, a PHI node is inserted to merge the two values.
     96 The LLVM IR that we want for this example looks like this:</p>
     97 
     98 <div class="doc_code">
     99 <pre>
    100 @G = weak global i32 0   ; type of @G is i32*
    101 @H = weak global i32 0   ; type of @H is i32*
    102 
    103 define i32 @test(i1 %Condition) {
    104 entry:
    105   br i1 %Condition, label %cond_true, label %cond_false
    106 
    107 cond_true:
    108   %X.0 = load i32* @G
    109   br label %cond_next
    110 
    111 cond_false:
    112   %X.1 = load i32* @H
    113   br label %cond_next
    114 
    115 cond_next:
    116   %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
    117   ret i32 %X.2
    118 }
    119 </pre>
    120 </div>
    121 
    122 <p>In this example, the loads from the G and H global variables are explicit in
    123 the LLVM IR, and they live in the then/else branches of the if statement
    124 (cond_true/cond_false).  In order to merge the incoming values, the X.2 phi node
    125 in the cond_next block selects the right value to use based on where control 
    126 flow is coming from: if control flow comes from the cond_false block, X.2 gets
    127 the value of X.1.  Alternatively, if control flow comes from cond_true, it gets
    128 the value of X.0.  The intent of this chapter is not to explain the details of
    129 SSA form.  For more information, see one of the many <a 
    130 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online 
    131 references</a>.</p>
    132 
    133 <p>The question for this article is "who places the phi nodes when lowering 
    134 assignments to mutable variables?".  The issue here is that LLVM 
    135 <em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it.
    136 However, SSA construction requires non-trivial algorithms and data structures,
    137 so it is inconvenient and wasteful for every front-end to have to reproduce this
    138 logic.</p>
    139 
    140 </div>
    141 
    142 <!-- *********************************************************************** -->
    143 <h2><a name="memory">Memory in LLVM</a></h2>
    144 <!-- *********************************************************************** -->
    145 
    146 <div>
    147 
    148 <p>The 'trick' here is that while LLVM does require all register values to be
    149 in SSA form, it does not require (or permit) memory objects to be in SSA form.
    150 In the example above, note that the loads from G and H are direct accesses to
    151 G and H: they are not renamed or versioned.  This differs from some other
    152 compiler systems, which do try to version memory objects.  In LLVM, instead of
    153 encoding dataflow analysis of memory into the LLVM IR, it is handled with <a 
    154 href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
    155 demand.</p>
    156 
    157 <p>
    158 With this in mind, the high-level idea is that we want to make a stack variable
    159 (which lives in memory, because it is on the stack) for each mutable object in
    160 a function.  To take advantage of this trick, we need to talk about how LLVM
    161 represents stack variables.
    162 </p>
    163 
    164 <p>In LLVM, all memory accesses are explicit with load/store instructions, and
    165 it is carefully designed not to have (or need) an "address-of" operator.  Notice
    166 how the type of the @G/@H global variables is actually "i32*" even though the 
    167 variable is defined as "i32".  What this means is that @G defines <em>space</em>
    168 for an i32 in the global data area, but its <em>name</em> actually refers to the
    169 address for that space.  Stack variables work the same way, except that instead of 
    170 being declared with global variable definitions, they are declared with the 
    171 <a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>
    172 
    173 <div class="doc_code">
    174 <pre>
    175 define i32 @example() {
    176 entry:
    177   %X = alloca i32           ; type of %X is i32*.
    178   ...
    179   %tmp = load i32* %X       ; load the stack value %X from the stack.
    180   %tmp2 = add i32 %tmp, 1   ; increment it
    181   store i32 %tmp2, i32* %X  ; store it back
    182   ...
    183 </pre>
    184 </div>
    185 
    186 <p>This code shows an example of how you can declare and manipulate a stack
    187 variable in the LLVM IR.  Stack memory allocated with the alloca instruction is
    188 fully general: you can pass the address of the stack slot to functions, you can
    189 store it in other variables, etc.  In our example above, we could rewrite the
    190 example to use the alloca technique to avoid using a PHI node:</p>
    191 
    192 <div class="doc_code">
    193 <pre>
    194 @G = weak global i32 0   ; type of @G is i32*
    195 @H = weak global i32 0   ; type of @H is i32*
    196 
    197 define i32 @test(i1 %Condition) {
    198 entry:
    199   %X = alloca i32           ; type of %X is i32*.
    200   br i1 %Condition, label %cond_true, label %cond_false
    201 
    202 cond_true:
    203   %X.0 = load i32* @G
    204   store i32 %X.0, i32* %X   ; Update X
    205   br label %cond_next
    206 
    207 cond_false:
    208   %X.1 = load i32* @H
    209   store i32 %X.1, i32* %X   ; Update X
    210   br label %cond_next
    211 
    212 cond_next:
    213   %X.2 = load i32* %X       ; Read X
    214   ret i32 %X.2
    215 }
    216 </pre>
    217 </div>
    218 
    219 <p>With this, we have discovered a way to handle arbitrary mutable variables
    220 without the need to create Phi nodes at all:</p>
    221 
    222 <ol>
    223 <li>Each mutable variable becomes a stack allocation.</li>
    224 <li>Each read of the variable becomes a load from the stack.</li>
    225 <li>Each update of the variable becomes a store to the stack.</li>
    226 <li>Taking the address of a variable just uses the stack address directly.</li>
    227 </ol>
    228 
    229 <p>While this solution has solved our immediate problem, it introduced another
    230 one: we have now apparently introduced a lot of stack traffic for very simple
    231 and common operations, a major performance problem.  Fortunately for us, the
    232 LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
    233 this case, promoting allocas like this into SSA registers, inserting Phi nodes
    234 as appropriate.  If you run this example through the pass, for example, you'll
    235 get:</p>
    236 
    237 <div class="doc_code">
    238 <pre>
    239 $ <b>llvm-as &lt; example.ll | opt -mem2reg | llvm-dis</b>
    240 @G = weak global i32 0
    241 @H = weak global i32 0
    242 
    243 define i32 @test(i1 %Condition) {
    244 entry:
    245   br i1 %Condition, label %cond_true, label %cond_false
    246 
    247 cond_true:
    248   %X.0 = load i32* @G
    249   br label %cond_next
    250 
    251 cond_false:
    252   %X.1 = load i32* @H
    253   br label %cond_next
    254 
    255 cond_next:
    256   %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
    257   ret i32 %X.01
    258 }
    259 </pre>
    260 </div>
    261 
    262 <p>The mem2reg pass implements the standard "iterated dominance frontier"
    263 algorithm for constructing SSA form and has a number of optimizations that speed
    264 up (very common) degenerate cases. The mem2reg optimization pass is the answer to dealing 
    265 with mutable variables, and we highly recommend that you depend on it.  Note that
    266 mem2reg only works on variables in certain circumstances:</p>
    267 
    268 <ol>
    269 <li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
    270 promotes them.  It does not apply to global variables or heap allocations.</li>
    271 
    272 <li>mem2reg only looks for alloca instructions in the entry block of the
    273 function.  Being in the entry block guarantees that the alloca is only executed
    274 once, which makes analysis simpler.</li>
    275 
    276 <li>mem2reg only promotes allocas whose uses are direct loads and stores.  If
    277 the address of the stack object is passed to a function, or if any funny pointer
    278 arithmetic is involved, the alloca will not be promoted.</li>
    279 
    280 <li>mem2reg only works on allocas of <a 
    281 href="../LangRef.html#t_classifications">first class</a> 
    282 values (such as pointers, scalars and vectors), and only if the array size
    283 of the allocation is 1 (or missing in the .ll file).  mem2reg is not capable of
    284 promoting structs or arrays to registers.  Note that the "scalarrepl" pass is
    285 more powerful and can promote structs, "unions", and arrays in many cases.</li>
    286 
    287 </ol>
    288 
    289 <p>
    290 All of these properties are easy to satisfy for most imperative languages, and
    291 we'll illustrate it below with Kaleidoscope.  The final question you may be
    292 asking is: should I bother with this nonsense for my front-end?  Wouldn't it be
    293 better if I just did SSA construction directly, avoiding use of the mem2reg
    294 optimization pass?  In short, we strongly recommend that you use this technique
    295 for building SSA form, unless there is an extremely good reason not to.  Using
    296 this technique is:</p>
    297 
    298 <ul>
    299 <li>Proven and well tested: llvm-gcc and clang both use this technique for local
    300 mutable variables.  As such, the most common clients of LLVM are using this to
    301 handle a bulk of their variables.  You can be sure that bugs are found fast and
    302 fixed early.</li>
    303 
    304 <li>Extremely Fast: mem2reg has a number of special cases that make it fast in
    305 common cases as well as fully general.  For example, it has fast-paths for
    306 variables that are only used in a single block, variables that only have one
    307 assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
    308 </li>
    309 
    310 <li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
    311 Debug information in LLVM</a> relies on having the address of the variable
    312 exposed so that debug info can be attached to it.  This technique dovetails 
    313 very naturally with this style of debug info.</li>
    314 </ul>
    315 
    316 <p>If nothing else, this makes it much easier to get your front-end up and 
    317 running, and is very simple to implement.  Lets extend Kaleidoscope with mutable
    318 variables now!
    319 </p>
    320 
    321 </div>
    322 
    323 <!-- *********************************************************************** -->
    324 <h2><a name="kalvars">Mutable Variables in Kaleidoscope</a></h2>
    325 <!-- *********************************************************************** -->
    326 
    327 <div>
    328 
    329 <p>Now that we know the sort of problem we want to tackle, lets see what this
    330 looks like in the context of our little Kaleidoscope language.  We're going to
    331 add two features:</p>
    332 
    333 <ol>
    334 <li>The ability to mutate variables with the '=' operator.</li>
    335 <li>The ability to define new variables.</li>
    336 </ol>
    337 
    338 <p>While the first item is really what this is about, we only have variables
    339 for incoming arguments as well as for induction variables, and redefining those only
    340 goes so far :).  Also, the ability to define new variables is a
    341 useful thing regardless of whether you will be mutating them.  Here's a
    342 motivating example that shows how we could use these:</p>
    343 
    344 <div class="doc_code">
    345 <pre>
    346 # Define ':' for sequencing: as a low-precedence operator that ignores operands
    347 # and just returns the RHS.
    348 def binary : 1 (x y) y;
    349 
    350 # Recursive fib, we could do this before.
    351 def fib(x)
    352   if (x &lt; 3) then
    353     1
    354   else
    355     fib(x-1)+fib(x-2);
    356 
    357 # Iterative fib.
    358 def fibi(x)
    359   <b>var a = 1, b = 1, c in</b>
    360   (for i = 3, i &lt; x in 
    361      <b>c = a + b</b> :
    362      <b>a = b</b> :
    363      <b>b = c</b>) :
    364   b;
    365 
    366 # Call it. 
    367 fibi(10);
    368 </pre>
    369 </div>
    370 
    371 <p>
    372 In order to mutate variables, we have to change our existing variables to use
    373 the "alloca trick".  Once we have that, we'll add our new operator, then extend
    374 Kaleidoscope to support new variable definitions.
    375 </p>
    376 
    377 </div>
    378 
    379 <!-- *********************************************************************** -->
    380 <h2><a name="adjustments">Adjusting Existing Variables for Mutation</a></h2>
    381 <!-- *********************************************************************** -->
    382 
    383 <div>
    384 
    385 <p>
    386 The symbol table in Kaleidoscope is managed at code generation time by the 
    387 '<tt>NamedValues</tt>' map.  This map currently keeps track of the LLVM "Value*"
    388 that holds the double value for the named variable.  In order to support
    389 mutation, we need to change this slightly, so that it <tt>NamedValues</tt> holds
    390 the <em>memory location</em> of the variable in question.  Note that this 
    391 change is a refactoring: it changes the structure of the code, but does not
    392 (by itself) change the behavior of the compiler.  All of these changes are 
    393 isolated in the Kaleidoscope code generator.</p>
    394 
    395 <p>
    396 At this point in Kaleidoscope's development, it only supports variables for two
    397 things: incoming arguments to functions and the induction variable of 'for'
    398 loops.  For consistency, we'll allow mutation of these variables in addition to
    399 other user-defined variables.  This means that these will both need memory
    400 locations.
    401 </p>
    402 
    403 <p>To start our transformation of Kaleidoscope, we'll change the NamedValues
    404 map so that it maps to AllocaInst* instead of Value*.  Once we do this, the C++ 
    405 compiler will tell us what parts of the code we need to update:</p>
    406 
    407 <div class="doc_code">
    408 <pre>
    409 static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
    410 </pre>
    411 </div>
    412 
    413 <p>Also, since we will need to create these alloca's, we'll use a helper
    414 function that ensures that the allocas are created in the entry block of the
    415 function:</p>
    416 
    417 <div class="doc_code">
    418 <pre>
    419 /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
    420 /// the function.  This is used for mutable variables etc.
    421 static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
    422                                           const std::string &amp;VarName) {
    423   IRBuilder&lt;&gt; TmpB(&amp;TheFunction-&gt;getEntryBlock(),
    424                  TheFunction-&gt;getEntryBlock().begin());
    425   return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0,
    426                            VarName.c_str());
    427 }
    428 </pre>
    429 </div>
    430 
    431 <p>This funny looking code creates an IRBuilder object that is pointing at
    432 the first instruction (.begin()) of the entry block.  It then creates an alloca
    433 with the expected name and returns it.  Because all values in Kaleidoscope are
    434 doubles, there is no need to pass in a type to use.</p>
    435 
    436 <p>With this in place, the first functionality change we want to make is to
    437 variable references.  In our new scheme, variables live on the stack, so code
    438 generating a reference to them actually needs to produce a load from the stack
    439 slot:</p>
    440 
    441 <div class="doc_code">
    442 <pre>
    443 Value *VariableExprAST::Codegen() {
    444   // Look this variable up in the function.
    445   Value *V = NamedValues[Name];
    446   if (V == 0) return ErrorV("Unknown variable name");
    447 
    448   <b>// Load the value.
    449   return Builder.CreateLoad(V, Name.c_str());</b>
    450 }
    451 </pre>
    452 </div>
    453 
    454 <p>As you can see, this is pretty straightforward.  Now we need to update the
    455 things that define the variables to set up the alloca.  We'll start with 
    456 <tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for
    457 the unabridged code):</p>
    458 
    459 <div class="doc_code">
    460 <pre>
    461   Function *TheFunction = Builder.GetInsertBlock()->getParent();
    462 
    463   <b>// Create an alloca for the variable in the entry block.
    464   AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);</b>
    465   
    466     // Emit the start code first, without 'variable' in scope.
    467   Value *StartVal = Start-&gt;Codegen();
    468   if (StartVal == 0) return 0;
    469   
    470   <b>// Store the value into the alloca.
    471   Builder.CreateStore(StartVal, Alloca);</b>
    472   ...
    473 
    474   // Compute the end condition.
    475   Value *EndCond = End-&gt;Codegen();
    476   if (EndCond == 0) return EndCond;
    477   
    478   <b>// Reload, increment, and restore the alloca.  This handles the case where
    479   // the body of the loop mutates the variable.
    480   Value *CurVar = Builder.CreateLoad(Alloca);
    481   Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");
    482   Builder.CreateStore(NextVar, Alloca);</b>
    483   ...
    484 </pre>
    485 </div>
    486 
    487 <p>This code is virtually identical to the code <a 
    488 href="LangImpl5.html#forcodegen">before we allowed mutable variables</a>.  The
    489 big difference is that we no longer have to construct a PHI node, and we use
    490 load/store to access the variable as needed.</p>
    491 
    492 <p>To support mutable argument variables, we need to also make allocas for them.
    493 The code for this is also pretty simple:</p>
    494 
    495 <div class="doc_code">
    496 <pre>
    497 /// CreateArgumentAllocas - Create an alloca for each argument and register the
    498 /// argument in the symbol table so that references to it will succeed.
    499 void PrototypeAST::CreateArgumentAllocas(Function *F) {
    500   Function::arg_iterator AI = F-&gt;arg_begin();
    501   for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
    502     // Create an alloca for this variable.
    503     AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
    504 
    505     // Store the initial value into the alloca.
    506     Builder.CreateStore(AI, Alloca);
    507 
    508     // Add arguments to variable symbol table.
    509     NamedValues[Args[Idx]] = Alloca;
    510   }
    511 }
    512 </pre>
    513 </div>
    514 
    515 <p>For each argument, we make an alloca, store the input value to the function
    516 into the alloca, and register the alloca as the memory location for the
    517 argument.  This method gets invoked by <tt>FunctionAST::Codegen</tt> right after
    518 it sets up the entry block for the function.</p>
    519 
    520 <p>The final missing piece is adding the mem2reg pass, which allows us to get
    521 good codegen once again:</p>
    522 
    523 <div class="doc_code">
    524 <pre>
    525     // Set up the optimizer pipeline.  Start with registering info about how the
    526     // target lays out data structures.
    527     OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
    528     <b>// Promote allocas to registers.
    529     OurFPM.add(createPromoteMemoryToRegisterPass());</b>
    530     // Do simple "peephole" optimizations and bit-twiddling optzns.
    531     OurFPM.add(createInstructionCombiningPass());
    532     // Reassociate expressions.
    533     OurFPM.add(createReassociatePass());
    534 </pre>
    535 </div>
    536 
    537 <p>It is interesting to see what the code looks like before and after the
    538 mem2reg optimization runs.  For example, this is the before/after code for our
    539 recursive fib function.  Before the optimization:</p>
    540 
    541 <div class="doc_code">
    542 <pre>
    543 define double @fib(double %x) {
    544 entry:
    545   <b>%x1 = alloca double
    546   store double %x, double* %x1
    547   %x2 = load double* %x1</b>
    548   %cmptmp = fcmp ult double %x2, 3.000000e+00
    549   %booltmp = uitofp i1 %cmptmp to double
    550   %ifcond = fcmp one double %booltmp, 0.000000e+00
    551   br i1 %ifcond, label %then, label %else
    552 
    553 then:		; preds = %entry
    554   br label %ifcont
    555 
    556 else:		; preds = %entry
    557   <b>%x3 = load double* %x1</b>
    558   %subtmp = fsub double %x3, 1.000000e+00
    559   %calltmp = call double @fib(double %subtmp)
    560   <b>%x4 = load double* %x1</b>
    561   %subtmp5 = fsub double %x4, 2.000000e+00
    562   %calltmp6 = call double @fib(double %subtmp5)
    563   %addtmp = fadd double %calltmp, %calltmp6
    564   br label %ifcont
    565 
    566 ifcont:		; preds = %else, %then
    567   %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
    568   ret double %iftmp
    569 }
    570 </pre>
    571 </div>
    572 
    573 <p>Here there is only one variable (x, the input argument) but you can still
    574 see the extremely simple-minded code generation strategy we are using.  In the
    575 entry block, an alloca is created, and the initial input value is stored into
    576 it.  Each reference to the variable does a reload from the stack.  Also, note
    577 that we didn't modify the if/then/else expression, so it still inserts a PHI
    578 node.  While we could make an alloca for it, it is actually easier to create a 
    579 PHI node for it, so we still just make the PHI.</p>
    580 
    581 <p>Here is the code after the mem2reg pass runs:</p>
    582 
    583 <div class="doc_code">
    584 <pre>
    585 define double @fib(double %x) {
    586 entry:
    587   %cmptmp = fcmp ult double <b>%x</b>, 3.000000e+00
    588   %booltmp = uitofp i1 %cmptmp to double
    589   %ifcond = fcmp one double %booltmp, 0.000000e+00
    590   br i1 %ifcond, label %then, label %else
    591 
    592 then:
    593   br label %ifcont
    594 
    595 else:
    596   %subtmp = fsub double <b>%x</b>, 1.000000e+00
    597   %calltmp = call double @fib(double %subtmp)
    598   %subtmp5 = fsub double <b>%x</b>, 2.000000e+00
    599   %calltmp6 = call double @fib(double %subtmp5)
    600   %addtmp = fadd double %calltmp, %calltmp6
    601   br label %ifcont
    602 
    603 ifcont:		; preds = %else, %then
    604   %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
    605   ret double %iftmp
    606 }
    607 </pre>
    608 </div>
    609 
    610 <p>This is a trivial case for mem2reg, since there are no redefinitions of the
    611 variable.  The point of showing this is to calm your tension about inserting
    612 such blatent inefficiencies :).</p>
    613 
    614 <p>After the rest of the optimizers run, we get:</p>
    615 
    616 <div class="doc_code">
    617 <pre>
    618 define double @fib(double %x) {
    619 entry:
    620   %cmptmp = fcmp ult double %x, 3.000000e+00
    621   %booltmp = uitofp i1 %cmptmp to double
    622   %ifcond = fcmp ueq double %booltmp, 0.000000e+00
    623   br i1 %ifcond, label %else, label %ifcont
    624 
    625 else:
    626   %subtmp = fsub double %x, 1.000000e+00
    627   %calltmp = call double @fib(double %subtmp)
    628   %subtmp5 = fsub double %x, 2.000000e+00
    629   %calltmp6 = call double @fib(double %subtmp5)
    630   %addtmp = fadd double %calltmp, %calltmp6
    631   ret double %addtmp
    632 
    633 ifcont:
    634   ret double 1.000000e+00
    635 }
    636 </pre>
    637 </div>
    638 
    639 <p>Here we see that the simplifycfg pass decided to clone the return instruction
    640 into the end of the 'else' block.  This allowed it to eliminate some branches
    641 and the PHI node.</p>
    642 
    643 <p>Now that all symbol table references are updated to use stack variables, 
    644 we'll add the assignment operator.</p>
    645 
    646 </div>
    647 
    648 <!-- *********************************************************************** -->
    649 <h2><a name="assignment">New Assignment Operator</a></h2>
    650 <!-- *********************************************************************** -->
    651 
    652 <div>
    653 
    654 <p>With our current framework, adding a new assignment operator is really
    655 simple.  We will parse it just like any other binary operator, but handle it
    656 internally (instead of allowing the user to define it).  The first step is to
    657 set a precedence:</p>
    658 
    659 <div class="doc_code">
    660 <pre>
    661  int main() {
    662    // Install standard binary operators.
    663    // 1 is lowest precedence.
    664    <b>BinopPrecedence['='] = 2;</b>
    665    BinopPrecedence['&lt;'] = 10;
    666    BinopPrecedence['+'] = 20;
    667    BinopPrecedence['-'] = 20;
    668 </pre>
    669 </div>
    670 
    671 <p>Now that the parser knows the precedence of the binary operator, it takes
    672 care of all the parsing and AST generation.  We just need to implement codegen
    673 for the assignment operator.  This looks like:</p> 
    674 
    675 <div class="doc_code">
    676 <pre>
    677 Value *BinaryExprAST::Codegen() {
    678   // Special case '=' because we don't want to emit the LHS as an expression.
    679   if (Op == '=') {
    680     // Assignment requires the LHS to be an identifier.
    681     VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
    682     if (!LHSE)
    683       return ErrorV("destination of '=' must be a variable");
    684 </pre>
    685 </div>
    686 
    687 <p>Unlike the rest of the binary operators, our assignment operator doesn't
    688 follow the "emit LHS, emit RHS, do computation" model.  As such, it is handled
    689 as a special case before the other binary operators are handled.  The other 
    690 strange thing is that it requires the LHS to be a variable.  It is invalid to
    691 have "(x+1) = expr" - only things like "x = expr" are allowed.
    692 </p>
    693 
    694 <div class="doc_code">
    695 <pre>
    696     // Codegen the RHS.
    697     Value *Val = RHS-&gt;Codegen();
    698     if (Val == 0) return 0;
    699 
    700     // Look up the name.
    701     Value *Variable = NamedValues[LHSE-&gt;getName()];
    702     if (Variable == 0) return ErrorV("Unknown variable name");
    703 
    704     Builder.CreateStore(Val, Variable);
    705     return Val;
    706   }
    707   ...  
    708 </pre>
    709 </div>
    710 
    711 <p>Once we have the variable, codegen'ing the assignment is straightforward:
    712 we emit the RHS of the assignment, create a store, and return the computed
    713 value.  Returning a value allows for chained assignments like "X = (Y = Z)".</p>
    714 
    715 <p>Now that we have an assignment operator, we can mutate loop variables and
    716 arguments.  For example, we can now run code like this:</p>
    717 
    718 <div class="doc_code">
    719 <pre>
    720 # Function to print a double.
    721 extern printd(x);
    722 
    723 # Define ':' for sequencing: as a low-precedence operator that ignores operands
    724 # and just returns the RHS.
    725 def binary : 1 (x y) y;
    726 
    727 def test(x)
    728   printd(x) :
    729   x = 4 :
    730   printd(x);
    731 
    732 test(123);
    733 </pre>
    734 </div>
    735 
    736 <p>When run, this example prints "123" and then "4", showing that we did
    737 actually mutate the value!  Okay, we have now officially implemented our goal:
    738 getting this to work requires SSA construction in the general case.  However,
    739 to be really useful, we want the ability to define our own local variables, lets
    740 add this next! 
    741 </p>
    742 
    743 </div>
    744 
    745 <!-- *********************************************************************** -->
    746 <h2><a name="localvars">User-defined Local Variables</a></h2>
    747 <!-- *********************************************************************** -->
    748 
    749 <div>
    750 
    751 <p>Adding var/in is just like any other other extensions we made to 
    752 Kaleidoscope: we extend the lexer, the parser, the AST and the code generator.
    753 The first step for adding our new 'var/in' construct is to extend the lexer.
    754 As before, this is pretty trivial, the code looks like this:</p>
    755 
    756 <div class="doc_code">
    757 <pre>
    758 enum Token {
    759   ...
    760   <b>// var definition
    761   tok_var = -13</b>
    762 ...
    763 }
    764 ...
    765 static int gettok() {
    766 ...
    767     if (IdentifierStr == "in") return tok_in;
    768     if (IdentifierStr == "binary") return tok_binary;
    769     if (IdentifierStr == "unary") return tok_unary;
    770     <b>if (IdentifierStr == "var") return tok_var;</b>
    771     return tok_identifier;
    772 ...
    773 </pre>
    774 </div>
    775 
    776 <p>The next step is to define the AST node that we will construct.  For var/in,
    777 it looks like this:</p>
    778 
    779 <div class="doc_code">
    780 <pre>
    781 /// VarExprAST - Expression class for var/in
    782 class VarExprAST : public ExprAST {
    783   std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
    784   ExprAST *Body;
    785 public:
    786   VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
    787              ExprAST *body)
    788   : VarNames(varnames), Body(body) {}
    789   
    790   virtual Value *Codegen();
    791 };
    792 </pre>
    793 </div>
    794 
    795 <p>var/in allows a list of names to be defined all at once, and each name can
    796 optionally have an initializer value.  As such, we capture this information in
    797 the VarNames vector.  Also, var/in has a body, this body is allowed to access
    798 the variables defined by the var/in.</p>
    799 
    800 <p>With this in place, we can define the parser pieces.  The first thing we do is add
    801 it as a primary expression:</p>
    802 
    803 <div class="doc_code">
    804 <pre>
    805 /// primary
    806 ///   ::= identifierexpr
    807 ///   ::= numberexpr
    808 ///   ::= parenexpr
    809 ///   ::= ifexpr
    810 ///   ::= forexpr
    811 <b>///   ::= varexpr</b>
    812 static ExprAST *ParsePrimary() {
    813   switch (CurTok) {
    814   default: return Error("unknown token when expecting an expression");
    815   case tok_identifier: return ParseIdentifierExpr();
    816   case tok_number:     return ParseNumberExpr();
    817   case '(':            return ParseParenExpr();
    818   case tok_if:         return ParseIfExpr();
    819   case tok_for:        return ParseForExpr();
    820   <b>case tok_var:        return ParseVarExpr();</b>
    821   }
    822 }
    823 </pre>
    824 </div>
    825 
    826 <p>Next we define ParseVarExpr:</p>
    827 
    828 <div class="doc_code">
    829 <pre>
    830 /// varexpr ::= 'var' identifier ('=' expression)? 
    831 //                    (',' identifier ('=' expression)?)* 'in' expression
    832 static ExprAST *ParseVarExpr() {
    833   getNextToken();  // eat the var.
    834 
    835   std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
    836 
    837   // At least one variable name is required.
    838   if (CurTok != tok_identifier)
    839     return Error("expected identifier after var");
    840 </pre>
    841 </div>
    842 
    843 <p>The first part of this code parses the list of identifier/expr pairs into the
    844 local <tt>VarNames</tt> vector.  
    845 
    846 <div class="doc_code">
    847 <pre>
    848   while (1) {
    849     std::string Name = IdentifierStr;
    850     getNextToken();  // eat identifier.
    851 
    852     // Read the optional initializer.
    853     ExprAST *Init = 0;
    854     if (CurTok == '=') {
    855       getNextToken(); // eat the '='.
    856       
    857       Init = ParseExpression();
    858       if (Init == 0) return 0;
    859     }
    860     
    861     VarNames.push_back(std::make_pair(Name, Init));
    862     
    863     // End of var list, exit loop.
    864     if (CurTok != ',') break;
    865     getNextToken(); // eat the ','.
    866     
    867     if (CurTok != tok_identifier)
    868       return Error("expected identifier list after var");
    869   }
    870 </pre>
    871 </div>
    872 
    873 <p>Once all the variables are parsed, we then parse the body and create the
    874 AST node:</p>
    875 
    876 <div class="doc_code">
    877 <pre>
    878   // At this point, we have to have 'in'.
    879   if (CurTok != tok_in)
    880     return Error("expected 'in' keyword after 'var'");
    881   getNextToken();  // eat 'in'.
    882   
    883   ExprAST *Body = ParseExpression();
    884   if (Body == 0) return 0;
    885   
    886   return new VarExprAST(VarNames, Body);
    887 }
    888 </pre>
    889 </div>
    890 
    891 <p>Now that we can parse and represent the code, we need to support emission of
    892 LLVM IR for it.  This code starts out with:</p>
    893 
    894 <div class="doc_code">
    895 <pre>
    896 Value *VarExprAST::Codegen() {
    897   std::vector&lt;AllocaInst *&gt; OldBindings;
    898   
    899   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
    900 
    901   // Register all variables and emit their initializer.
    902   for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
    903     const std::string &amp;VarName = VarNames[i].first;
    904     ExprAST *Init = VarNames[i].second;
    905 </pre>
    906 </div>
    907 
    908 <p>Basically it loops over all the variables, installing them one at a time.
    909 For each variable we put into the symbol table, we remember the previous value
    910 that we replace in OldBindings.</p>
    911 
    912 <div class="doc_code">
    913 <pre>
    914     // Emit the initializer before adding the variable to scope, this prevents
    915     // the initializer from referencing the variable itself, and permits stuff
    916     // like this:
    917     //  var a = 1 in
    918     //    var a = a in ...   # refers to outer 'a'.
    919     Value *InitVal;
    920     if (Init) {
    921       InitVal = Init-&gt;Codegen();
    922       if (InitVal == 0) return 0;
    923     } else { // If not specified, use 0.0.
    924       InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0));
    925     }
    926     
    927     AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
    928     Builder.CreateStore(InitVal, Alloca);
    929 
    930     // Remember the old variable binding so that we can restore the binding when
    931     // we unrecurse.
    932     OldBindings.push_back(NamedValues[VarName]);
    933     
    934     // Remember this binding.
    935     NamedValues[VarName] = Alloca;
    936   }
    937 </pre>
    938 </div>
    939 
    940 <p>There are more comments here than code.  The basic idea is that we emit the
    941 initializer, create the alloca, then update the symbol table to point to it.
    942 Once all the variables are installed in the symbol table, we evaluate the body
    943 of the var/in expression:</p>
    944 
    945 <div class="doc_code">
    946 <pre>
    947   // Codegen the body, now that all vars are in scope.
    948   Value *BodyVal = Body-&gt;Codegen();
    949   if (BodyVal == 0) return 0;
    950 </pre>
    951 </div>
    952 
    953 <p>Finally, before returning, we restore the previous variable bindings:</p>
    954 
    955 <div class="doc_code">
    956 <pre>
    957   // Pop all our variables from scope.
    958   for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
    959     NamedValues[VarNames[i].first] = OldBindings[i];
    960 
    961   // Return the body computation.
    962   return BodyVal;
    963 }
    964 </pre>
    965 </div>
    966 
    967 <p>The end result of all of this is that we get properly scoped variable 
    968 definitions, and we even (trivially) allow mutation of them :).</p>
    969 
    970 <p>With this, we completed what we set out to do.  Our nice iterative fib
    971 example from the intro compiles and runs just fine.  The mem2reg pass optimizes
    972 all of our stack variables into SSA registers, inserting PHI nodes where needed,
    973 and our front-end remains simple: no "iterated dominance frontier" computation
    974 anywhere in sight.</p>
    975 
    976 </div>
    977 
    978 <!-- *********************************************************************** -->
    979 <h2><a name="code">Full Code Listing</a></h2>
    980 <!-- *********************************************************************** -->
    981 
    982 <div>
    983 
    984 <p>
    985 Here is the complete code listing for our running example, enhanced with mutable
    986 variables and var/in support.  To build this example, use:
    987 </p>
    988 
    989 <div class="doc_code">
    990 <pre>
    991 # Compile
    992 clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
    993 # Run
    994 ./toy
    995 </pre>
    996 </div>
    997 
    998 <p>Here is the code:</p>
    999 
   1000 <div class="doc_code">
   1001 <pre>
   1002 #include "llvm/DerivedTypes.h"
   1003 #include "llvm/ExecutionEngine/ExecutionEngine.h"
   1004 #include "llvm/ExecutionEngine/JIT.h"
   1005 #include "llvm/LLVMContext.h"
   1006 #include "llvm/Module.h"
   1007 #include "llvm/PassManager.h"
   1008 #include "llvm/Analysis/Verifier.h"
   1009 #include "llvm/Analysis/Passes.h"
   1010 #include "llvm/Target/TargetData.h"
   1011 #include "llvm/Transforms/Scalar.h"
   1012 #include "llvm/Support/IRBuilder.h"
   1013 #include "llvm/Support/TargetSelect.h"
   1014 #include &lt;cstdio&gt;
   1015 #include &lt;string&gt;
   1016 #include &lt;map&gt;
   1017 #include &lt;vector&gt;
   1018 using namespace llvm;
   1019 
   1020 //===----------------------------------------------------------------------===//
   1021 // Lexer
   1022 //===----------------------------------------------------------------------===//
   1023 
   1024 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
   1025 // of these for known things.
   1026 enum Token {
   1027   tok_eof = -1,
   1028 
   1029   // commands
   1030   tok_def = -2, tok_extern = -3,
   1031 
   1032   // primary
   1033   tok_identifier = -4, tok_number = -5,
   1034   
   1035   // control
   1036   tok_if = -6, tok_then = -7, tok_else = -8,
   1037   tok_for = -9, tok_in = -10,
   1038   
   1039   // operators
   1040   tok_binary = -11, tok_unary = -12,
   1041   
   1042   // var definition
   1043   tok_var = -13
   1044 };
   1045 
   1046 static std::string IdentifierStr;  // Filled in if tok_identifier
   1047 static double NumVal;              // Filled in if tok_number
   1048 
   1049 /// gettok - Return the next token from standard input.
   1050 static int gettok() {
   1051   static int LastChar = ' ';
   1052 
   1053   // Skip any whitespace.
   1054   while (isspace(LastChar))
   1055     LastChar = getchar();
   1056 
   1057   if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
   1058     IdentifierStr = LastChar;
   1059     while (isalnum((LastChar = getchar())))
   1060       IdentifierStr += LastChar;
   1061 
   1062     if (IdentifierStr == "def") return tok_def;
   1063     if (IdentifierStr == "extern") return tok_extern;
   1064     if (IdentifierStr == "if") return tok_if;
   1065     if (IdentifierStr == "then") return tok_then;
   1066     if (IdentifierStr == "else") return tok_else;
   1067     if (IdentifierStr == "for") return tok_for;
   1068     if (IdentifierStr == "in") return tok_in;
   1069     if (IdentifierStr == "binary") return tok_binary;
   1070     if (IdentifierStr == "unary") return tok_unary;
   1071     if (IdentifierStr == "var") return tok_var;
   1072     return tok_identifier;
   1073   }
   1074 
   1075   if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+
   1076     std::string NumStr;
   1077     do {
   1078       NumStr += LastChar;
   1079       LastChar = getchar();
   1080     } while (isdigit(LastChar) || LastChar == '.');
   1081 
   1082     NumVal = strtod(NumStr.c_str(), 0);
   1083     return tok_number;
   1084   }
   1085 
   1086   if (LastChar == '#') {
   1087     // Comment until end of line.
   1088     do LastChar = getchar();
   1089     while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
   1090     
   1091     if (LastChar != EOF)
   1092       return gettok();
   1093   }
   1094   
   1095   // Check for end of file.  Don't eat the EOF.
   1096   if (LastChar == EOF)
   1097     return tok_eof;
   1098 
   1099   // Otherwise, just return the character as its ascii value.
   1100   int ThisChar = LastChar;
   1101   LastChar = getchar();
   1102   return ThisChar;
   1103 }
   1104 
   1105 //===----------------------------------------------------------------------===//
   1106 // Abstract Syntax Tree (aka Parse Tree)
   1107 //===----------------------------------------------------------------------===//
   1108 
   1109 /// ExprAST - Base class for all expression nodes.
   1110 class ExprAST {
   1111 public:
   1112   virtual ~ExprAST() {}
   1113   virtual Value *Codegen() = 0;
   1114 };
   1115 
   1116 /// NumberExprAST - Expression class for numeric literals like "1.0".
   1117 class NumberExprAST : public ExprAST {
   1118   double Val;
   1119 public:
   1120   NumberExprAST(double val) : Val(val) {}
   1121   virtual Value *Codegen();
   1122 };
   1123 
   1124 /// VariableExprAST - Expression class for referencing a variable, like "a".
   1125 class VariableExprAST : public ExprAST {
   1126   std::string Name;
   1127 public:
   1128   VariableExprAST(const std::string &amp;name) : Name(name) {}
   1129   const std::string &amp;getName() const { return Name; }
   1130   virtual Value *Codegen();
   1131 };
   1132 
   1133 /// UnaryExprAST - Expression class for a unary operator.
   1134 class UnaryExprAST : public ExprAST {
   1135   char Opcode;
   1136   ExprAST *Operand;
   1137 public:
   1138   UnaryExprAST(char opcode, ExprAST *operand) 
   1139     : Opcode(opcode), Operand(operand) {}
   1140   virtual Value *Codegen();
   1141 };
   1142 
   1143 /// BinaryExprAST - Expression class for a binary operator.
   1144 class BinaryExprAST : public ExprAST {
   1145   char Op;
   1146   ExprAST *LHS, *RHS;
   1147 public:
   1148   BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) 
   1149     : Op(op), LHS(lhs), RHS(rhs) {}
   1150   virtual Value *Codegen();
   1151 };
   1152 
   1153 /// CallExprAST - Expression class for function calls.
   1154 class CallExprAST : public ExprAST {
   1155   std::string Callee;
   1156   std::vector&lt;ExprAST*&gt; Args;
   1157 public:
   1158   CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
   1159     : Callee(callee), Args(args) {}
   1160   virtual Value *Codegen();
   1161 };
   1162 
   1163 /// IfExprAST - Expression class for if/then/else.
   1164 class IfExprAST : public ExprAST {
   1165   ExprAST *Cond, *Then, *Else;
   1166 public:
   1167   IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
   1168   : Cond(cond), Then(then), Else(_else) {}
   1169   virtual Value *Codegen();
   1170 };
   1171 
   1172 /// ForExprAST - Expression class for for/in.
   1173 class ForExprAST : public ExprAST {
   1174   std::string VarName;
   1175   ExprAST *Start, *End, *Step, *Body;
   1176 public:
   1177   ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
   1178              ExprAST *step, ExprAST *body)
   1179     : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
   1180   virtual Value *Codegen();
   1181 };
   1182 
   1183 /// VarExprAST - Expression class for var/in
   1184 class VarExprAST : public ExprAST {
   1185   std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
   1186   ExprAST *Body;
   1187 public:
   1188   VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
   1189              ExprAST *body)
   1190   : VarNames(varnames), Body(body) {}
   1191   
   1192   virtual Value *Codegen();
   1193 };
   1194 
   1195 /// PrototypeAST - This class represents the "prototype" for a function,
   1196 /// which captures its name, and its argument names (thus implicitly the number
   1197 /// of arguments the function takes), as well as if it is an operator.
   1198 class PrototypeAST {
   1199   std::string Name;
   1200   std::vector&lt;std::string&gt; Args;
   1201   bool isOperator;
   1202   unsigned Precedence;  // Precedence if a binary op.
   1203 public:
   1204   PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args,
   1205                bool isoperator = false, unsigned prec = 0)
   1206   : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
   1207   
   1208   bool isUnaryOp() const { return isOperator &amp;&amp; Args.size() == 1; }
   1209   bool isBinaryOp() const { return isOperator &amp;&amp; Args.size() == 2; }
   1210   
   1211   char getOperatorName() const {
   1212     assert(isUnaryOp() || isBinaryOp());
   1213     return Name[Name.size()-1];
   1214   }
   1215   
   1216   unsigned getBinaryPrecedence() const { return Precedence; }
   1217   
   1218   Function *Codegen();
   1219   
   1220   void CreateArgumentAllocas(Function *F);
   1221 };
   1222 
   1223 /// FunctionAST - This class represents a function definition itself.
   1224 class FunctionAST {
   1225   PrototypeAST *Proto;
   1226   ExprAST *Body;
   1227 public:
   1228   FunctionAST(PrototypeAST *proto, ExprAST *body)
   1229     : Proto(proto), Body(body) {}
   1230   
   1231   Function *Codegen();
   1232 };
   1233 
   1234 //===----------------------------------------------------------------------===//
   1235 // Parser
   1236 //===----------------------------------------------------------------------===//
   1237 
   1238 /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
   1239 /// token the parser is looking at.  getNextToken reads another token from the
   1240 /// lexer and updates CurTok with its results.
   1241 static int CurTok;
   1242 static int getNextToken() {
   1243   return CurTok = gettok();
   1244 }
   1245 
   1246 /// BinopPrecedence - This holds the precedence for each binary operator that is
   1247 /// defined.
   1248 static std::map&lt;char, int&gt; BinopPrecedence;
   1249 
   1250 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
   1251 static int GetTokPrecedence() {
   1252   if (!isascii(CurTok))
   1253     return -1;
   1254   
   1255   // Make sure it's a declared binop.
   1256   int TokPrec = BinopPrecedence[CurTok];
   1257   if (TokPrec &lt;= 0) return -1;
   1258   return TokPrec;
   1259 }
   1260 
   1261 /// Error* - These are little helper functions for error handling.
   1262 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
   1263 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
   1264 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
   1265 
   1266 static ExprAST *ParseExpression();
   1267 
   1268 /// identifierexpr
   1269 ///   ::= identifier
   1270 ///   ::= identifier '(' expression* ')'
   1271 static ExprAST *ParseIdentifierExpr() {
   1272   std::string IdName = IdentifierStr;
   1273   
   1274   getNextToken();  // eat identifier.
   1275   
   1276   if (CurTok != '(') // Simple variable ref.
   1277     return new VariableExprAST(IdName);
   1278   
   1279   // Call.
   1280   getNextToken();  // eat (
   1281   std::vector&lt;ExprAST*&gt; Args;
   1282   if (CurTok != ')') {
   1283     while (1) {
   1284       ExprAST *Arg = ParseExpression();
   1285       if (!Arg) return 0;
   1286       Args.push_back(Arg);
   1287 
   1288       if (CurTok == ')') break;
   1289 
   1290       if (CurTok != ',')
   1291         return Error("Expected ')' or ',' in argument list");
   1292       getNextToken();
   1293     }
   1294   }
   1295 
   1296   // Eat the ')'.
   1297   getNextToken();
   1298   
   1299   return new CallExprAST(IdName, Args);
   1300 }
   1301 
   1302 /// numberexpr ::= number
   1303 static ExprAST *ParseNumberExpr() {
   1304   ExprAST *Result = new NumberExprAST(NumVal);
   1305   getNextToken(); // consume the number
   1306   return Result;
   1307 }
   1308 
   1309 /// parenexpr ::= '(' expression ')'
   1310 static ExprAST *ParseParenExpr() {
   1311   getNextToken();  // eat (.
   1312   ExprAST *V = ParseExpression();
   1313   if (!V) return 0;
   1314   
   1315   if (CurTok != ')')
   1316     return Error("expected ')'");
   1317   getNextToken();  // eat ).
   1318   return V;
   1319 }
   1320 
   1321 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
   1322 static ExprAST *ParseIfExpr() {
   1323   getNextToken();  // eat the if.
   1324   
   1325   // condition.
   1326   ExprAST *Cond = ParseExpression();
   1327   if (!Cond) return 0;
   1328   
   1329   if (CurTok != tok_then)
   1330     return Error("expected then");
   1331   getNextToken();  // eat the then
   1332   
   1333   ExprAST *Then = ParseExpression();
   1334   if (Then == 0) return 0;
   1335   
   1336   if (CurTok != tok_else)
   1337     return Error("expected else");
   1338   
   1339   getNextToken();
   1340   
   1341   ExprAST *Else = ParseExpression();
   1342   if (!Else) return 0;
   1343   
   1344   return new IfExprAST(Cond, Then, Else);
   1345 }
   1346 
   1347 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
   1348 static ExprAST *ParseForExpr() {
   1349   getNextToken();  // eat the for.
   1350 
   1351   if (CurTok != tok_identifier)
   1352     return Error("expected identifier after for");
   1353   
   1354   std::string IdName = IdentifierStr;
   1355   getNextToken();  // eat identifier.
   1356   
   1357   if (CurTok != '=')
   1358     return Error("expected '=' after for");
   1359   getNextToken();  // eat '='.
   1360   
   1361   
   1362   ExprAST *Start = ParseExpression();
   1363   if (Start == 0) return 0;
   1364   if (CurTok != ',')
   1365     return Error("expected ',' after for start value");
   1366   getNextToken();
   1367   
   1368   ExprAST *End = ParseExpression();
   1369   if (End == 0) return 0;
   1370   
   1371   // The step value is optional.
   1372   ExprAST *Step = 0;
   1373   if (CurTok == ',') {
   1374     getNextToken();
   1375     Step = ParseExpression();
   1376     if (Step == 0) return 0;
   1377   }
   1378   
   1379   if (CurTok != tok_in)
   1380     return Error("expected 'in' after for");
   1381   getNextToken();  // eat 'in'.
   1382   
   1383   ExprAST *Body = ParseExpression();
   1384   if (Body == 0) return 0;
   1385 
   1386   return new ForExprAST(IdName, Start, End, Step, Body);
   1387 }
   1388 
   1389 /// varexpr ::= 'var' identifier ('=' expression)? 
   1390 //                    (',' identifier ('=' expression)?)* 'in' expression
   1391 static ExprAST *ParseVarExpr() {
   1392   getNextToken();  // eat the var.
   1393 
   1394   std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
   1395 
   1396   // At least one variable name is required.
   1397   if (CurTok != tok_identifier)
   1398     return Error("expected identifier after var");
   1399   
   1400   while (1) {
   1401     std::string Name = IdentifierStr;
   1402     getNextToken();  // eat identifier.
   1403 
   1404     // Read the optional initializer.
   1405     ExprAST *Init = 0;
   1406     if (CurTok == '=') {
   1407       getNextToken(); // eat the '='.
   1408       
   1409       Init = ParseExpression();
   1410       if (Init == 0) return 0;
   1411     }
   1412     
   1413     VarNames.push_back(std::make_pair(Name, Init));
   1414     
   1415     // End of var list, exit loop.
   1416     if (CurTok != ',') break;
   1417     getNextToken(); // eat the ','.
   1418     
   1419     if (CurTok != tok_identifier)
   1420       return Error("expected identifier list after var");
   1421   }
   1422   
   1423   // At this point, we have to have 'in'.
   1424   if (CurTok != tok_in)
   1425     return Error("expected 'in' keyword after 'var'");
   1426   getNextToken();  // eat 'in'.
   1427   
   1428   ExprAST *Body = ParseExpression();
   1429   if (Body == 0) return 0;
   1430   
   1431   return new VarExprAST(VarNames, Body);
   1432 }
   1433 
   1434 /// primary
   1435 ///   ::= identifierexpr
   1436 ///   ::= numberexpr
   1437 ///   ::= parenexpr
   1438 ///   ::= ifexpr
   1439 ///   ::= forexpr
   1440 ///   ::= varexpr
   1441 static ExprAST *ParsePrimary() {
   1442   switch (CurTok) {
   1443   default: return Error("unknown token when expecting an expression");
   1444   case tok_identifier: return ParseIdentifierExpr();
   1445   case tok_number:     return ParseNumberExpr();
   1446   case '(':            return ParseParenExpr();
   1447   case tok_if:         return ParseIfExpr();
   1448   case tok_for:        return ParseForExpr();
   1449   case tok_var:        return ParseVarExpr();
   1450   }
   1451 }
   1452 
   1453 /// unary
   1454 ///   ::= primary
   1455 ///   ::= '!' unary
   1456 static ExprAST *ParseUnary() {
   1457   // If the current token is not an operator, it must be a primary expr.
   1458   if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
   1459     return ParsePrimary();
   1460   
   1461   // If this is a unary operator, read it.
   1462   int Opc = CurTok;
   1463   getNextToken();
   1464   if (ExprAST *Operand = ParseUnary())
   1465     return new UnaryExprAST(Opc, Operand);
   1466   return 0;
   1467 }
   1468 
   1469 /// binoprhs
   1470 ///   ::= ('+' unary)*
   1471 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
   1472   // If this is a binop, find its precedence.
   1473   while (1) {
   1474     int TokPrec = GetTokPrecedence();
   1475     
   1476     // If this is a binop that binds at least as tightly as the current binop,
   1477     // consume it, otherwise we are done.
   1478     if (TokPrec &lt; ExprPrec)
   1479       return LHS;
   1480     
   1481     // Okay, we know this is a binop.
   1482     int BinOp = CurTok;
   1483     getNextToken();  // eat binop
   1484     
   1485     // Parse the unary expression after the binary operator.
   1486     ExprAST *RHS = ParseUnary();
   1487     if (!RHS) return 0;
   1488     
   1489     // If BinOp binds less tightly with RHS than the operator after RHS, let
   1490     // the pending operator take RHS as its LHS.
   1491     int NextPrec = GetTokPrecedence();
   1492     if (TokPrec &lt; NextPrec) {
   1493       RHS = ParseBinOpRHS(TokPrec+1, RHS);
   1494       if (RHS == 0) return 0;
   1495     }
   1496     
   1497     // Merge LHS/RHS.
   1498     LHS = new BinaryExprAST(BinOp, LHS, RHS);
   1499   }
   1500 }
   1501 
   1502 /// expression
   1503 ///   ::= unary binoprhs
   1504 ///
   1505 static ExprAST *ParseExpression() {
   1506   ExprAST *LHS = ParseUnary();
   1507   if (!LHS) return 0;
   1508   
   1509   return ParseBinOpRHS(0, LHS);
   1510 }
   1511 
   1512 /// prototype
   1513 ///   ::= id '(' id* ')'
   1514 ///   ::= binary LETTER number? (id, id)
   1515 ///   ::= unary LETTER (id)
   1516 static PrototypeAST *ParsePrototype() {
   1517   std::string FnName;
   1518   
   1519   unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
   1520   unsigned BinaryPrecedence = 30;
   1521   
   1522   switch (CurTok) {
   1523   default:
   1524     return ErrorP("Expected function name in prototype");
   1525   case tok_identifier:
   1526     FnName = IdentifierStr;
   1527     Kind = 0;
   1528     getNextToken();
   1529     break;
   1530   case tok_unary:
   1531     getNextToken();
   1532     if (!isascii(CurTok))
   1533       return ErrorP("Expected unary operator");
   1534     FnName = "unary";
   1535     FnName += (char)CurTok;
   1536     Kind = 1;
   1537     getNextToken();
   1538     break;
   1539   case tok_binary:
   1540     getNextToken();
   1541     if (!isascii(CurTok))
   1542       return ErrorP("Expected binary operator");
   1543     FnName = "binary";
   1544     FnName += (char)CurTok;
   1545     Kind = 2;
   1546     getNextToken();
   1547     
   1548     // Read the precedence if present.
   1549     if (CurTok == tok_number) {
   1550       if (NumVal &lt; 1 || NumVal &gt; 100)
   1551         return ErrorP("Invalid precedecnce: must be 1..100");
   1552       BinaryPrecedence = (unsigned)NumVal;
   1553       getNextToken();
   1554     }
   1555     break;
   1556   }
   1557   
   1558   if (CurTok != '(')
   1559     return ErrorP("Expected '(' in prototype");
   1560   
   1561   std::vector&lt;std::string&gt; ArgNames;
   1562   while (getNextToken() == tok_identifier)
   1563     ArgNames.push_back(IdentifierStr);
   1564   if (CurTok != ')')
   1565     return ErrorP("Expected ')' in prototype");
   1566   
   1567   // success.
   1568   getNextToken();  // eat ')'.
   1569   
   1570   // Verify right number of names for operator.
   1571   if (Kind &amp;&amp; ArgNames.size() != Kind)
   1572     return ErrorP("Invalid number of operands for operator");
   1573   
   1574   return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
   1575 }
   1576 
   1577 /// definition ::= 'def' prototype expression
   1578 static FunctionAST *ParseDefinition() {
   1579   getNextToken();  // eat def.
   1580   PrototypeAST *Proto = ParsePrototype();
   1581   if (Proto == 0) return 0;
   1582 
   1583   if (ExprAST *E = ParseExpression())
   1584     return new FunctionAST(Proto, E);
   1585   return 0;
   1586 }
   1587 
   1588 /// toplevelexpr ::= expression
   1589 static FunctionAST *ParseTopLevelExpr() {
   1590   if (ExprAST *E = ParseExpression()) {
   1591     // Make an anonymous proto.
   1592     PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
   1593     return new FunctionAST(Proto, E);
   1594   }
   1595   return 0;
   1596 }
   1597 
   1598 /// external ::= 'extern' prototype
   1599 static PrototypeAST *ParseExtern() {
   1600   getNextToken();  // eat extern.
   1601   return ParsePrototype();
   1602 }
   1603 
   1604 //===----------------------------------------------------------------------===//
   1605 // Code Generation
   1606 //===----------------------------------------------------------------------===//
   1607 
   1608 static Module *TheModule;
   1609 static IRBuilder&lt;&gt; Builder(getGlobalContext());
   1610 static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
   1611 static FunctionPassManager *TheFPM;
   1612 
   1613 Value *ErrorV(const char *Str) { Error(Str); return 0; }
   1614 
   1615 /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
   1616 /// the function.  This is used for mutable variables etc.
   1617 static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
   1618                                           const std::string &amp;VarName) {
   1619   IRBuilder&lt;&gt; TmpB(&amp;TheFunction-&gt;getEntryBlock(),
   1620                  TheFunction-&gt;getEntryBlock().begin());
   1621   return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0,
   1622                            VarName.c_str());
   1623 }
   1624 
   1625 Value *NumberExprAST::Codegen() {
   1626   return ConstantFP::get(getGlobalContext(), APFloat(Val));
   1627 }
   1628 
   1629 Value *VariableExprAST::Codegen() {
   1630   // Look this variable up in the function.
   1631   Value *V = NamedValues[Name];
   1632   if (V == 0) return ErrorV("Unknown variable name");
   1633 
   1634   // Load the value.
   1635   return Builder.CreateLoad(V, Name.c_str());
   1636 }
   1637 
   1638 Value *UnaryExprAST::Codegen() {
   1639   Value *OperandV = Operand-&gt;Codegen();
   1640   if (OperandV == 0) return 0;
   1641   
   1642   Function *F = TheModule-&gt;getFunction(std::string("unary")+Opcode);
   1643   if (F == 0)
   1644     return ErrorV("Unknown unary operator");
   1645   
   1646   return Builder.CreateCall(F, OperandV, "unop");
   1647 }
   1648 
   1649 Value *BinaryExprAST::Codegen() {
   1650   // Special case '=' because we don't want to emit the LHS as an expression.
   1651   if (Op == '=') {
   1652     // Assignment requires the LHS to be an identifier.
   1653     VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
   1654     if (!LHSE)
   1655       return ErrorV("destination of '=' must be a variable");
   1656     // Codegen the RHS.
   1657     Value *Val = RHS-&gt;Codegen();
   1658     if (Val == 0) return 0;
   1659 
   1660     // Look up the name.
   1661     Value *Variable = NamedValues[LHSE-&gt;getName()];
   1662     if (Variable == 0) return ErrorV("Unknown variable name");
   1663 
   1664     Builder.CreateStore(Val, Variable);
   1665     return Val;
   1666   }
   1667   
   1668   Value *L = LHS-&gt;Codegen();
   1669   Value *R = RHS-&gt;Codegen();
   1670   if (L == 0 || R == 0) return 0;
   1671   
   1672   switch (Op) {
   1673   case '+': return Builder.CreateFAdd(L, R, "addtmp");
   1674   case '-': return Builder.CreateFSub(L, R, "subtmp");
   1675   case '*': return Builder.CreateFMul(L, R, "multmp");
   1676   case '&lt;':
   1677     L = Builder.CreateFCmpULT(L, R, "cmptmp");
   1678     // Convert bool 0/1 to double 0.0 or 1.0
   1679     return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
   1680                                 "booltmp");
   1681   default: break;
   1682   }
   1683   
   1684   // If it wasn't a builtin binary operator, it must be a user defined one. Emit
   1685   // a call to it.
   1686   Function *F = TheModule-&gt;getFunction(std::string("binary")+Op);
   1687   assert(F &amp;&amp; "binary operator not found!");
   1688   
   1689   Value *Ops[2] = { L, R };
   1690   return Builder.CreateCall(F, Ops, "binop");
   1691 }
   1692 
   1693 Value *CallExprAST::Codegen() {
   1694   // Look up the name in the global module table.
   1695   Function *CalleeF = TheModule-&gt;getFunction(Callee);
   1696   if (CalleeF == 0)
   1697     return ErrorV("Unknown function referenced");
   1698   
   1699   // If argument mismatch error.
   1700   if (CalleeF-&gt;arg_size() != Args.size())
   1701     return ErrorV("Incorrect # arguments passed");
   1702 
   1703   std::vector&lt;Value*&gt; ArgsV;
   1704   for (unsigned i = 0, e = Args.size(); i != e; ++i) {
   1705     ArgsV.push_back(Args[i]-&gt;Codegen());
   1706     if (ArgsV.back() == 0) return 0;
   1707   }
   1708   
   1709   return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
   1710 }
   1711 
   1712 Value *IfExprAST::Codegen() {
   1713   Value *CondV = Cond-&gt;Codegen();
   1714   if (CondV == 0) return 0;
   1715   
   1716   // Convert condition to a bool by comparing equal to 0.0.
   1717   CondV = Builder.CreateFCmpONE(CondV, 
   1718                               ConstantFP::get(getGlobalContext(), APFloat(0.0)),
   1719                                 "ifcond");
   1720   
   1721   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
   1722   
   1723   // Create blocks for the then and else cases.  Insert the 'then' block at the
   1724   // end of the function.
   1725   BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction);
   1726   BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
   1727   BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
   1728   
   1729   Builder.CreateCondBr(CondV, ThenBB, ElseBB);
   1730   
   1731   // Emit then value.
   1732   Builder.SetInsertPoint(ThenBB);
   1733   
   1734   Value *ThenV = Then-&gt;Codegen();
   1735   if (ThenV == 0) return 0;
   1736   
   1737   Builder.CreateBr(MergeBB);
   1738   // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
   1739   ThenBB = Builder.GetInsertBlock();
   1740   
   1741   // Emit else block.
   1742   TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
   1743   Builder.SetInsertPoint(ElseBB);
   1744   
   1745   Value *ElseV = Else-&gt;Codegen();
   1746   if (ElseV == 0) return 0;
   1747   
   1748   Builder.CreateBr(MergeBB);
   1749   // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
   1750   ElseBB = Builder.GetInsertBlock();
   1751   
   1752   // Emit merge block.
   1753   TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
   1754   Builder.SetInsertPoint(MergeBB);
   1755   PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2,
   1756                                   "iftmp");
   1757   
   1758   PN-&gt;addIncoming(ThenV, ThenBB);
   1759   PN-&gt;addIncoming(ElseV, ElseBB);
   1760   return PN;
   1761 }
   1762 
   1763 Value *ForExprAST::Codegen() {
   1764   // Output this as:
   1765   //   var = alloca double
   1766   //   ...
   1767   //   start = startexpr
   1768   //   store start -&gt; var
   1769   //   goto loop
   1770   // loop: 
   1771   //   ...
   1772   //   bodyexpr
   1773   //   ...
   1774   // loopend:
   1775   //   step = stepexpr
   1776   //   endcond = endexpr
   1777   //
   1778   //   curvar = load var
   1779   //   nextvar = curvar + step
   1780   //   store nextvar -&gt; var
   1781   //   br endcond, loop, endloop
   1782   // outloop:
   1783   
   1784   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
   1785 
   1786   // Create an alloca for the variable in the entry block.
   1787   AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
   1788   
   1789   // Emit the start code first, without 'variable' in scope.
   1790   Value *StartVal = Start-&gt;Codegen();
   1791   if (StartVal == 0) return 0;
   1792   
   1793   // Store the value into the alloca.
   1794   Builder.CreateStore(StartVal, Alloca);
   1795   
   1796   // Make the new basic block for the loop header, inserting after current
   1797   // block.
   1798   BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
   1799   
   1800   // Insert an explicit fall through from the current block to the LoopBB.
   1801   Builder.CreateBr(LoopBB);
   1802 
   1803   // Start insertion in LoopBB.
   1804   Builder.SetInsertPoint(LoopBB);
   1805   
   1806   // Within the loop, the variable is defined equal to the PHI node.  If it
   1807   // shadows an existing variable, we have to restore it, so save it now.
   1808   AllocaInst *OldVal = NamedValues[VarName];
   1809   NamedValues[VarName] = Alloca;
   1810   
   1811   // Emit the body of the loop.  This, like any other expr, can change the
   1812   // current BB.  Note that we ignore the value computed by the body, but don't
   1813   // allow an error.
   1814   if (Body-&gt;Codegen() == 0)
   1815     return 0;
   1816   
   1817   // Emit the step value.
   1818   Value *StepVal;
   1819   if (Step) {
   1820     StepVal = Step-&gt;Codegen();
   1821     if (StepVal == 0) return 0;
   1822   } else {
   1823     // If not specified, use 1.0.
   1824     StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
   1825   }
   1826   
   1827   // Compute the end condition.
   1828   Value *EndCond = End-&gt;Codegen();
   1829   if (EndCond == 0) return EndCond;
   1830   
   1831   // Reload, increment, and restore the alloca.  This handles the case where
   1832   // the body of the loop mutates the variable.
   1833   Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
   1834   Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");
   1835   Builder.CreateStore(NextVar, Alloca);
   1836   
   1837   // Convert condition to a bool by comparing equal to 0.0.
   1838   EndCond = Builder.CreateFCmpONE(EndCond, 
   1839                               ConstantFP::get(getGlobalContext(), APFloat(0.0)),
   1840                                   "loopcond");
   1841   
   1842   // Create the "after loop" block and insert it.
   1843   BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
   1844   
   1845   // Insert the conditional branch into the end of LoopEndBB.
   1846   Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
   1847   
   1848   // Any new code will be inserted in AfterBB.
   1849   Builder.SetInsertPoint(AfterBB);
   1850   
   1851   // Restore the unshadowed variable.
   1852   if (OldVal)
   1853     NamedValues[VarName] = OldVal;
   1854   else
   1855     NamedValues.erase(VarName);
   1856 
   1857   
   1858   // for expr always returns 0.0.
   1859   return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
   1860 }
   1861 
   1862 Value *VarExprAST::Codegen() {
   1863   std::vector&lt;AllocaInst *&gt; OldBindings;
   1864   
   1865   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
   1866 
   1867   // Register all variables and emit their initializer.
   1868   for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
   1869     const std::string &amp;VarName = VarNames[i].first;
   1870     ExprAST *Init = VarNames[i].second;
   1871     
   1872     // Emit the initializer before adding the variable to scope, this prevents
   1873     // the initializer from referencing the variable itself, and permits stuff
   1874     // like this:
   1875     //  var a = 1 in
   1876     //    var a = a in ...   # refers to outer 'a'.
   1877     Value *InitVal;
   1878     if (Init) {
   1879       InitVal = Init-&gt;Codegen();
   1880       if (InitVal == 0) return 0;
   1881     } else { // If not specified, use 0.0.
   1882       InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0));
   1883     }
   1884     
   1885     AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
   1886     Builder.CreateStore(InitVal, Alloca);
   1887 
   1888     // Remember the old variable binding so that we can restore the binding when
   1889     // we unrecurse.
   1890     OldBindings.push_back(NamedValues[VarName]);
   1891     
   1892     // Remember this binding.
   1893     NamedValues[VarName] = Alloca;
   1894   }
   1895   
   1896   // Codegen the body, now that all vars are in scope.
   1897   Value *BodyVal = Body-&gt;Codegen();
   1898   if (BodyVal == 0) return 0;
   1899   
   1900   // Pop all our variables from scope.
   1901   for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
   1902     NamedValues[VarNames[i].first] = OldBindings[i];
   1903 
   1904   // Return the body computation.
   1905   return BodyVal;
   1906 }
   1907 
   1908 Function *PrototypeAST::Codegen() {
   1909   // Make the function type:  double(double,double) etc.
   1910   std::vector&lt;Type*&gt; Doubles(Args.size(),
   1911                              Type::getDoubleTy(getGlobalContext()));
   1912   FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
   1913                                        Doubles, false);
   1914   
   1915   Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
   1916   
   1917   // If F conflicted, there was already something named 'Name'.  If it has a
   1918   // body, don't allow redefinition or reextern.
   1919   if (F-&gt;getName() != Name) {
   1920     // Delete the one we just made and get the existing one.
   1921     F-&gt;eraseFromParent();
   1922     F = TheModule-&gt;getFunction(Name);
   1923     
   1924     // If F already has a body, reject this.
   1925     if (!F-&gt;empty()) {
   1926       ErrorF("redefinition of function");
   1927       return 0;
   1928     }
   1929     
   1930     // If F took a different number of args, reject.
   1931     if (F-&gt;arg_size() != Args.size()) {
   1932       ErrorF("redefinition of function with different # args");
   1933       return 0;
   1934     }
   1935   }
   1936   
   1937   // Set names for all arguments.
   1938   unsigned Idx = 0;
   1939   for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
   1940        ++AI, ++Idx)
   1941     AI-&gt;setName(Args[Idx]);
   1942     
   1943   return F;
   1944 }
   1945 
   1946 /// CreateArgumentAllocas - Create an alloca for each argument and register the
   1947 /// argument in the symbol table so that references to it will succeed.
   1948 void PrototypeAST::CreateArgumentAllocas(Function *F) {
   1949   Function::arg_iterator AI = F-&gt;arg_begin();
   1950   for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
   1951     // Create an alloca for this variable.
   1952     AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
   1953 
   1954     // Store the initial value into the alloca.
   1955     Builder.CreateStore(AI, Alloca);
   1956 
   1957     // Add arguments to variable symbol table.
   1958     NamedValues[Args[Idx]] = Alloca;
   1959   }
   1960 }
   1961 
   1962 Function *FunctionAST::Codegen() {
   1963   NamedValues.clear();
   1964   
   1965   Function *TheFunction = Proto-&gt;Codegen();
   1966   if (TheFunction == 0)
   1967     return 0;
   1968   
   1969   // If this is an operator, install it.
   1970   if (Proto-&gt;isBinaryOp())
   1971     BinopPrecedence[Proto-&gt;getOperatorName()] = Proto-&gt;getBinaryPrecedence();
   1972   
   1973   // Create a new basic block to start insertion into.
   1974   BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
   1975   Builder.SetInsertPoint(BB);
   1976   
   1977   // Add all arguments to the symbol table and create their allocas.
   1978   Proto-&gt;CreateArgumentAllocas(TheFunction);
   1979 
   1980   if (Value *RetVal = Body-&gt;Codegen()) {
   1981     // Finish off the function.
   1982     Builder.CreateRet(RetVal);
   1983 
   1984     // Validate the generated code, checking for consistency.
   1985     verifyFunction(*TheFunction);
   1986 
   1987     // Optimize the function.
   1988     TheFPM-&gt;run(*TheFunction);
   1989     
   1990     return TheFunction;
   1991   }
   1992   
   1993   // Error reading body, remove function.
   1994   TheFunction-&gt;eraseFromParent();
   1995 
   1996   if (Proto-&gt;isBinaryOp())
   1997     BinopPrecedence.erase(Proto-&gt;getOperatorName());
   1998   return 0;
   1999 }
   2000 
   2001 //===----------------------------------------------------------------------===//
   2002 // Top-Level parsing and JIT Driver
   2003 //===----------------------------------------------------------------------===//
   2004 
   2005 static ExecutionEngine *TheExecutionEngine;
   2006 
   2007 static void HandleDefinition() {
   2008   if (FunctionAST *F = ParseDefinition()) {
   2009     if (Function *LF = F-&gt;Codegen()) {
   2010       fprintf(stderr, "Read function definition:");
   2011       LF-&gt;dump();
   2012     }
   2013   } else {
   2014     // Skip token for error recovery.
   2015     getNextToken();
   2016   }
   2017 }
   2018 
   2019 static void HandleExtern() {
   2020   if (PrototypeAST *P = ParseExtern()) {
   2021     if (Function *F = P-&gt;Codegen()) {
   2022       fprintf(stderr, "Read extern: ");
   2023       F-&gt;dump();
   2024     }
   2025   } else {
   2026     // Skip token for error recovery.
   2027     getNextToken();
   2028   }
   2029 }
   2030 
   2031 static void HandleTopLevelExpression() {
   2032   // Evaluate a top-level expression into an anonymous function.
   2033   if (FunctionAST *F = ParseTopLevelExpr()) {
   2034     if (Function *LF = F-&gt;Codegen()) {
   2035       // JIT the function, returning a function pointer.
   2036       void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
   2037       
   2038       // Cast it to the right type (takes no arguments, returns a double) so we
   2039       // can call it as a native function.
   2040       double (*FP)() = (double (*)())(intptr_t)FPtr;
   2041       fprintf(stderr, "Evaluated to %f\n", FP());
   2042     }
   2043   } else {
   2044     // Skip token for error recovery.
   2045     getNextToken();
   2046   }
   2047 }
   2048 
   2049 /// top ::= definition | external | expression | ';'
   2050 static void MainLoop() {
   2051   while (1) {
   2052     fprintf(stderr, "ready&gt; ");
   2053     switch (CurTok) {
   2054     case tok_eof:    return;
   2055     case ';':        getNextToken(); break;  // ignore top-level semicolons.
   2056     case tok_def:    HandleDefinition(); break;
   2057     case tok_extern: HandleExtern(); break;
   2058     default:         HandleTopLevelExpression(); break;
   2059     }
   2060   }
   2061 }
   2062 
   2063 //===----------------------------------------------------------------------===//
   2064 // "Library" functions that can be "extern'd" from user code.
   2065 //===----------------------------------------------------------------------===//
   2066 
   2067 /// putchard - putchar that takes a double and returns 0.
   2068 extern "C" 
   2069 double putchard(double X) {
   2070   putchar((char)X);
   2071   return 0;
   2072 }
   2073 
   2074 /// printd - printf that takes a double prints it as "%f\n", returning 0.
   2075 extern "C" 
   2076 double printd(double X) {
   2077   printf("%f\n", X);
   2078   return 0;
   2079 }
   2080 
   2081 //===----------------------------------------------------------------------===//
   2082 // Main driver code.
   2083 //===----------------------------------------------------------------------===//
   2084 
   2085 int main() {
   2086   InitializeNativeTarget();
   2087   LLVMContext &amp;Context = getGlobalContext();
   2088 
   2089   // Install standard binary operators.
   2090   // 1 is lowest precedence.
   2091   BinopPrecedence['='] = 2;
   2092   BinopPrecedence['&lt;'] = 10;
   2093   BinopPrecedence['+'] = 20;
   2094   BinopPrecedence['-'] = 20;
   2095   BinopPrecedence['*'] = 40;  // highest.
   2096 
   2097   // Prime the first token.
   2098   fprintf(stderr, "ready&gt; ");
   2099   getNextToken();
   2100 
   2101   // Make the module, which holds all the code.
   2102   TheModule = new Module("my cool jit", Context);
   2103 
   2104   // Create the JIT.  This takes ownership of the module.
   2105   std::string ErrStr;
   2106   TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&amp;ErrStr).create();
   2107   if (!TheExecutionEngine) {
   2108     fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
   2109     exit(1);
   2110   }
   2111 
   2112   FunctionPassManager OurFPM(TheModule);
   2113 
   2114   // Set up the optimizer pipeline.  Start with registering info about how the
   2115   // target lays out data structures.
   2116   OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
   2117   // Provide basic AliasAnalysis support for GVN.
   2118   OurFPM.add(createBasicAliasAnalysisPass());
   2119   // Promote allocas to registers.
   2120   OurFPM.add(createPromoteMemoryToRegisterPass());
   2121   // Do simple "peephole" optimizations and bit-twiddling optzns.
   2122   OurFPM.add(createInstructionCombiningPass());
   2123   // Reassociate expressions.
   2124   OurFPM.add(createReassociatePass());
   2125   // Eliminate Common SubExpressions.
   2126   OurFPM.add(createGVNPass());
   2127   // Simplify the control flow graph (deleting unreachable blocks, etc).
   2128   OurFPM.add(createCFGSimplificationPass());
   2129 
   2130   OurFPM.doInitialization();
   2131 
   2132   // Set the global so the code gen can use this.
   2133   TheFPM = &amp;OurFPM;
   2134 
   2135   // Run the main "interpreter loop" now.
   2136   MainLoop();
   2137 
   2138   TheFPM = 0;
   2139 
   2140   // Print out all of the generated code.
   2141   TheModule-&gt;dump();
   2142 
   2143   return 0;
   2144 }
   2145 </pre>
   2146 </div>
   2147 
   2148 <a href="LangImpl8.html">Next: Conclusion and other useful LLVM tidbits</a>
   2149 </div>
   2150 
   2151 <!-- *********************************************************************** -->
   2152 <hr>
   2153 <address>
   2154   <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
   2155   src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
   2156   <a href="http://validator.w3.org/check/referer"><img
   2157   src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
   2158 
   2159   <a href="mailto:sabre (a] nondot.org">Chris Lattner</a><br>
   2160   <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br>
   2161   Last modified: $Date: 2011-10-16 04:07:38 -0400 (Sun, 16 Oct 2011) $
   2162 </address>
   2163 </body>
   2164 </html>
   2165