Home | History | Annotate | Download | only in tutorial
      1 ========================================
      2 Kaleidoscope: Code generation to LLVM IR
      3 ========================================
      4 
      5 .. contents::
      6    :local:
      7 
      8 Chapter 3 Introduction
      9 ======================
     10 
     11 Welcome to Chapter 3 of the "`Implementing a language with
     12 LLVM <index.html>`_" tutorial. This chapter shows you how to transform
     13 the `Abstract Syntax Tree <OCamlLangImpl2.html>`_, built in Chapter 2,
     14 into LLVM IR. This will teach you a little bit about how LLVM does
     15 things, as well as demonstrate how easy it is to use. It's much more
     16 work to build a lexer and parser than it is to generate LLVM IR code. :)
     17 
     18 **Please note**: the code in this chapter and later require LLVM 2.3 or
     19 LLVM SVN to work. LLVM 2.2 and before will not work with it.
     20 
     21 Code Generation Setup
     22 =====================
     23 
     24 In order to generate LLVM IR, we want some simple setup to get started.
     25 First we define virtual code generation (codegen) methods in each AST
     26 class:
     27 
     28 .. code-block:: ocaml
     29 
     30     let rec codegen_expr = function
     31       | Ast.Number n -> ...
     32       | Ast.Variable name -> ...
     33 
     34 The ``Codegen.codegen_expr`` function says to emit IR for that AST node
     35 along with all the things it depends on, and they all return an LLVM
     36 Value object. "Value" is the class used to represent a "`Static Single
     37 Assignment
     38 (SSA) <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
     39 register" or "SSA value" in LLVM. The most distinct aspect of SSA values
     40 is that their value is computed as the related instruction executes, and
     41 it does not get a new value until (and if) the instruction re-executes.
     42 In other words, there is no way to "change" an SSA value. For more
     43 information, please read up on `Static Single
     44 Assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
     45 - the concepts are really quite natural once you grok them.
     46 
     47 The second thing we want is an "Error" exception like we used for the
     48 parser, which will be used to report errors found during code generation
     49 (for example, use of an undeclared parameter):
     50 
     51 .. code-block:: ocaml
     52 
     53     exception Error of string
     54 
     55     let context = global_context ()
     56     let the_module = create_module context "my cool jit"
     57     let builder = builder context
     58     let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
     59     let double_type = double_type context
     60 
     61 The static variables will be used during code generation.
     62 ``Codgen.the_module`` is the LLVM construct that contains all of the
     63 functions and global variables in a chunk of code. In many ways, it is
     64 the top-level structure that the LLVM IR uses to contain code.
     65 
     66 The ``Codegen.builder`` object is a helper object that makes it easy to
     67 generate LLVM instructions. Instances of the
     68 ```IRBuilder`` <http://llvm.org/doxygen/IRBuilder_8h-source.html>`_
     69 class keep track of the current place to insert instructions and has
     70 methods to create new instructions.
     71 
     72 The ``Codegen.named_values`` map keeps track of which values are defined
     73 in the current scope and what their LLVM representation is. (In other
     74 words, it is a symbol table for the code). In this form of Kaleidoscope,
     75 the only things that can be referenced are function parameters. As such,
     76 function parameters will be in this map when generating code for their
     77 function body.
     78 
     79 With these basics in place, we can start talking about how to generate
     80 code for each expression. Note that this assumes that the
     81 ``Codgen.builder`` has been set up to generate code *into* something.
     82 For now, we'll assume that this has already been done, and we'll just
     83 use it to emit code.
     84 
     85 Expression Code Generation
     86 ==========================
     87 
     88 Generating LLVM code for expression nodes is very straightforward: less
     89 than 30 lines of commented code for all four of our expression nodes.
     90 First we'll do numeric literals:
     91 
     92 .. code-block:: ocaml
     93 
     94       | Ast.Number n -> const_float double_type n
     95 
     96 In the LLVM IR, numeric constants are represented with the
     97 ``ConstantFP`` class, which holds the numeric value in an ``APFloat``
     98 internally (``APFloat`` has the capability of holding floating point
     99 constants of Arbitrary Precision). This code basically just creates
    100 and returns a ``ConstantFP``. Note that in the LLVM IR that constants
    101 are all uniqued together and shared. For this reason, the API uses "the
    102 foo::get(..)" idiom instead of "new foo(..)" or "foo::Create(..)".
    103 
    104 .. code-block:: ocaml
    105 
    106       | Ast.Variable name ->
    107           (try Hashtbl.find named_values name with
    108             | Not_found -> raise (Error "unknown variable name"))
    109 
    110 References to variables are also quite simple using LLVM. In the simple
    111 version of Kaleidoscope, we assume that the variable has already been
    112 emitted somewhere and its value is available. In practice, the only
    113 values that can be in the ``Codegen.named_values`` map are function
    114 arguments. This code simply checks to see that the specified name is in
    115 the map (if not, an unknown variable is being referenced) and returns
    116 the value for it. In future chapters, we'll add support for `loop
    117 induction variables <LangImpl5.html#for>`_ in the symbol table, and for
    118 `local variables <LangImpl7.html#localvars>`_.
    119 
    120 .. code-block:: ocaml
    121 
    122       | Ast.Binary (op, lhs, rhs) ->
    123           let lhs_val = codegen_expr lhs in
    124           let rhs_val = codegen_expr rhs in
    125           begin
    126             match op with
    127             | '+' -> build_fadd lhs_val rhs_val "addtmp" builder
    128             | '-' -> build_fsub lhs_val rhs_val "subtmp" builder
    129             | '*' -> build_fmul lhs_val rhs_val "multmp" builder
    130             | '<' ->
    131                 (* Convert bool 0/1 to double 0.0 or 1.0 *)
    132                 let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
    133                 build_uitofp i double_type "booltmp" builder
    134             | _ -> raise (Error "invalid binary operator")
    135           end
    136 
    137 Binary operators start to get more interesting. The basic idea here is
    138 that we recursively emit code for the left-hand side of the expression,
    139 then the right-hand side, then we compute the result of the binary
    140 expression. In this code, we do a simple switch on the opcode to create
    141 the right LLVM instruction.
    142 
    143 In the example above, the LLVM builder class is starting to show its
    144 value. IRBuilder knows where to insert the newly created instruction,
    145 all you have to do is specify what instruction to create (e.g. with
    146 ``Llvm.create_add``), which operands to use (``lhs`` and ``rhs`` here)
    147 and optionally provide a name for the generated instruction.
    148 
    149 One nice thing about LLVM is that the name is just a hint. For instance,
    150 if the code above emits multiple "addtmp" variables, LLVM will
    151 automatically provide each one with an increasing, unique numeric
    152 suffix. Local value names for instructions are purely optional, but it
    153 makes it much easier to read the IR dumps.
    154 
    155 `LLVM instructions <../LangRef.html#instref>`_ are constrained by strict
    156 rules: for example, the Left and Right operators of an `add
    157 instruction <../LangRef.html#i_add>`_ must have the same type, and the
    158 result type of the add must match the operand types. Because all values
    159 in Kaleidoscope are doubles, this makes for very simple code for add,
    160 sub and mul.
    161 
    162 On the other hand, LLVM specifies that the `fcmp
    163 instruction <../LangRef.html#i_fcmp>`_ always returns an 'i1' value (a
    164 one bit integer). The problem with this is that Kaleidoscope wants the
    165 value to be a 0.0 or 1.0 value. In order to get these semantics, we
    166 combine the fcmp instruction with a `uitofp
    167 instruction <../LangRef.html#i_uitofp>`_. This instruction converts its
    168 input integer into a floating point value by treating the input as an
    169 unsigned value. In contrast, if we used the `sitofp
    170 instruction <../LangRef.html#i_sitofp>`_, the Kaleidoscope '<' operator
    171 would return 0.0 and -1.0, depending on the input value.
    172 
    173 .. code-block:: ocaml
    174 
    175       | Ast.Call (callee, args) ->
    176           (* Look up the name in the module table. *)
    177           let callee =
    178             match lookup_function callee the_module with
    179             | Some callee -> callee
    180             | None -> raise (Error "unknown function referenced")
    181           in
    182           let params = params callee in
    183 
    184           (* If argument mismatch error. *)
    185           if Array.length params == Array.length args then () else
    186             raise (Error "incorrect # arguments passed");
    187           let args = Array.map codegen_expr args in
    188           build_call callee args "calltmp" builder
    189 
    190 Code generation for function calls is quite straightforward with LLVM.
    191 The code above initially does a function name lookup in the LLVM
    192 Module's symbol table. Recall that the LLVM Module is the container that
    193 holds all of the functions we are JIT'ing. By giving each function the
    194 same name as what the user specifies, we can use the LLVM symbol table
    195 to resolve function names for us.
    196 
    197 Once we have the function to call, we recursively codegen each argument
    198 that is to be passed in, and create an LLVM `call
    199 instruction <../LangRef.html#i_call>`_. Note that LLVM uses the native C
    200 calling conventions by default, allowing these calls to also call into
    201 standard library functions like "sin" and "cos", with no additional
    202 effort.
    203 
    204 This wraps up our handling of the four basic expressions that we have so
    205 far in Kaleidoscope. Feel free to go in and add some more. For example,
    206 by browsing the `LLVM language reference <../LangRef.html>`_ you'll find
    207 several other interesting instructions that are really easy to plug into
    208 our basic framework.
    209 
    210 Function Code Generation
    211 ========================
    212 
    213 Code generation for prototypes and functions must handle a number of
    214 details, which make their code less beautiful than expression code
    215 generation, but allows us to illustrate some important points. First,
    216 lets talk about code generation for prototypes: they are used both for
    217 function bodies and external function declarations. The code starts
    218 with:
    219 
    220 .. code-block:: ocaml
    221 
    222     let codegen_proto = function
    223       | Ast.Prototype (name, args) ->
    224           (* Make the function type: double(double,double) etc. *)
    225           let doubles = Array.make (Array.length args) double_type in
    226           let ft = function_type double_type doubles in
    227           let f =
    228             match lookup_function name the_module with
    229 
    230 This code packs a lot of power into a few lines. Note first that this
    231 function returns a "Function\*" instead of a "Value\*" (although at the
    232 moment they both are modeled by ``llvalue`` in ocaml). Because a
    233 "prototype" really talks about the external interface for a function
    234 (not the value computed by an expression), it makes sense for it to
    235 return the LLVM Function it corresponds to when codegen'd.
    236 
    237 The call to ``Llvm.function_type`` creates the ``Llvm.llvalue`` that
    238 should be used for a given Prototype. Since all function arguments in
    239 Kaleidoscope are of type double, the first line creates a vector of "N"
    240 LLVM double types. It then uses the ``Llvm.function_type`` method to
    241 create a function type that takes "N" doubles as arguments, returns one
    242 double as a result, and that is not vararg (that uses the function
    243 ``Llvm.var_arg_function_type``). Note that Types in LLVM are uniqued
    244 just like ``Constant``'s are, so you don't "new" a type, you "get" it.
    245 
    246 The final line above checks if the function has already been defined in
    247 ``Codegen.the_module``. If not, we will create it.
    248 
    249 .. code-block:: ocaml
    250 
    251             | None -> declare_function name ft the_module
    252 
    253 This indicates the type and name to use, as well as which module to
    254 insert into. By default we assume a function has
    255 ``Llvm.Linkage.ExternalLinkage``. "`external
    256 linkage <LangRef.html#linkage>`_" means that the function may be defined
    257 outside the current module and/or that it is callable by functions
    258 outside the module. The "``name``" passed in is the name the user
    259 specified: this name is registered in "``Codegen.the_module``"s symbol
    260 table, which is used by the function call code above.
    261 
    262 In Kaleidoscope, I choose to allow redefinitions of functions in two
    263 cases: first, we want to allow 'extern'ing a function more than once, as
    264 long as the prototypes for the externs match (since all arguments have
    265 the same type, we just have to check that the number of arguments
    266 match). Second, we want to allow 'extern'ing a function and then
    267 defining a body for it. This is useful when defining mutually recursive
    268 functions.
    269 
    270 .. code-block:: ocaml
    271 
    272             (* If 'f' conflicted, there was already something named 'name'. If it
    273              * has a body, don't allow redefinition or reextern. *)
    274             | Some f ->
    275                 (* If 'f' already has a body, reject this. *)
    276                 if Array.length (basic_blocks f) == 0 then () else
    277                   raise (Error "redefinition of function");
    278 
    279                 (* If 'f' took a different number of arguments, reject. *)
    280                 if Array.length (params f) == Array.length args then () else
    281                   raise (Error "redefinition of function with different # args");
    282                 f
    283           in
    284 
    285 In order to verify the logic above, we first check to see if the
    286 pre-existing function is "empty". In this case, empty means that it has
    287 no basic blocks in it, which means it has no body. If it has no body, it
    288 is a forward declaration. Since we don't allow anything after a full
    289 definition of the function, the code rejects this case. If the previous
    290 reference to a function was an 'extern', we simply verify that the
    291 number of arguments for that definition and this one match up. If not,
    292 we emit an error.
    293 
    294 .. code-block:: ocaml
    295 
    296           (* Set names for all arguments. *)
    297           Array.iteri (fun i a ->
    298             let n = args.(i) in
    299             set_value_name n a;
    300             Hashtbl.add named_values n a;
    301           ) (params f);
    302           f
    303 
    304 The last bit of code for prototypes loops over all of the arguments in
    305 the function, setting the name of the LLVM Argument objects to match,
    306 and registering the arguments in the ``Codegen.named_values`` map for
    307 future use by the ``Ast.Variable`` variant. Once this is set up, it
    308 returns the Function object to the caller. Note that we don't check for
    309 conflicting argument names here (e.g. "extern foo(a b a)"). Doing so
    310 would be very straight-forward with the mechanics we have already used
    311 above.
    312 
    313 .. code-block:: ocaml
    314 
    315     let codegen_func = function
    316       | Ast.Function (proto, body) ->
    317           Hashtbl.clear named_values;
    318           let the_function = codegen_proto proto in
    319 
    320 Code generation for function definitions starts out simply enough: we
    321 just codegen the prototype (Proto) and verify that it is ok. We then
    322 clear out the ``Codegen.named_values`` map to make sure that there isn't
    323 anything in it from the last function we compiled. Code generation of
    324 the prototype ensures that there is an LLVM Function object that is
    325 ready to go for us.
    326 
    327 .. code-block:: ocaml
    328 
    329           (* Create a new basic block to start insertion into. *)
    330           let bb = append_block context "entry" the_function in
    331           position_at_end bb builder;
    332 
    333           try
    334             let ret_val = codegen_expr body in
    335 
    336 Now we get to the point where the ``Codegen.builder`` is set up. The
    337 first line creates a new `basic
    338 block <http://en.wikipedia.org/wiki/Basic_block>`_ (named "entry"),
    339 which is inserted into ``the_function``. The second line then tells the
    340 builder that new instructions should be inserted into the end of the new
    341 basic block. Basic blocks in LLVM are an important part of functions
    342 that define the `Control Flow
    343 Graph <http://en.wikipedia.org/wiki/Control_flow_graph>`_. Since we
    344 don't have any control flow, our functions will only contain one block
    345 at this point. We'll fix this in `Chapter 5 <OCamlLangImpl5.html>`_ :).
    346 
    347 .. code-block:: ocaml
    348 
    349             let ret_val = codegen_expr body in
    350 
    351             (* Finish off the function. *)
    352             let _ = build_ret ret_val builder in
    353 
    354             (* Validate the generated code, checking for consistency. *)
    355             Llvm_analysis.assert_valid_function the_function;
    356 
    357             the_function
    358 
    359 Once the insertion point is set up, we call the ``Codegen.codegen_func``
    360 method for the root expression of the function. If no error happens,
    361 this emits code to compute the expression into the entry block and
    362 returns the value that was computed. Assuming no error, we then create
    363 an LLVM `ret instruction <../LangRef.html#i_ret>`_, which completes the
    364 function. Once the function is built, we call
    365 ``Llvm_analysis.assert_valid_function``, which is provided by LLVM. This
    366 function does a variety of consistency checks on the generated code, to
    367 determine if our compiler is doing everything right. Using this is
    368 important: it can catch a lot of bugs. Once the function is finished and
    369 validated, we return it.
    370 
    371 .. code-block:: ocaml
    372 
    373           with e ->
    374             delete_function the_function;
    375             raise e
    376 
    377 The only piece left here is handling of the error case. For simplicity,
    378 we handle this by merely deleting the function we produced with the
    379 ``Llvm.delete_function`` method. This allows the user to redefine a
    380 function that they incorrectly typed in before: if we didn't delete it,
    381 it would live in the symbol table, with a body, preventing future
    382 redefinition.
    383 
    384 This code does have a bug, though. Since the ``Codegen.codegen_proto``
    385 can return a previously defined forward declaration, our code can
    386 actually delete a forward declaration. There are a number of ways to fix
    387 this bug, see what you can come up with! Here is a testcase:
    388 
    389 ::
    390 
    391     extern foo(a b);     # ok, defines foo.
    392     def foo(a b) c;      # error, 'c' is invalid.
    393     def bar() foo(1, 2); # error, unknown function "foo"
    394 
    395 Driver Changes and Closing Thoughts
    396 ===================================
    397 
    398 For now, code generation to LLVM doesn't really get us much, except that
    399 we can look at the pretty IR calls. The sample code inserts calls to
    400 Codegen into the "``Toplevel.main_loop``", and then dumps out the LLVM
    401 IR. This gives a nice way to look at the LLVM IR for simple functions.
    402 For example:
    403 
    404 ::
    405 
    406     ready> 4+5;
    407     Read top-level expression:
    408     define double @""() {
    409     entry:
    410             %addtmp = fadd double 4.000000e+00, 5.000000e+00
    411             ret double %addtmp
    412     }
    413 
    414 Note how the parser turns the top-level expression into anonymous
    415 functions for us. This will be handy when we add `JIT
    416 support <OCamlLangImpl4.html#jit>`_ in the next chapter. Also note that
    417 the code is very literally transcribed, no optimizations are being
    418 performed. We will `add
    419 optimizations <OCamlLangImpl4.html#trivialconstfold>`_ explicitly in the
    420 next chapter.
    421 
    422 ::
    423 
    424     ready> def foo(a b) a*a + 2*a*b + b*b;
    425     Read function definition:
    426     define double @foo(double %a, double %b) {
    427     entry:
    428             %multmp = fmul double %a, %a
    429             %multmp1 = fmul double 2.000000e+00, %a
    430             %multmp2 = fmul double %multmp1, %b
    431             %addtmp = fadd double %multmp, %multmp2
    432             %multmp3 = fmul double %b, %b
    433             %addtmp4 = fadd double %addtmp, %multmp3
    434             ret double %addtmp4
    435     }
    436 
    437 This shows some simple arithmetic. Notice the striking similarity to the
    438 LLVM builder calls that we use to create the instructions.
    439 
    440 ::
    441 
    442     ready> def bar(a) foo(a, 4.0) + bar(31337);
    443     Read function definition:
    444     define double @bar(double %a) {
    445     entry:
    446             %calltmp = call double @foo(double %a, double 4.000000e+00)
    447             %calltmp1 = call double @bar(double 3.133700e+04)
    448             %addtmp = fadd double %calltmp, %calltmp1
    449             ret double %addtmp
    450     }
    451 
    452 This shows some function calls. Note that this function will take a long
    453 time to execute if you call it. In the future we'll add conditional
    454 control flow to actually make recursion useful :).
    455 
    456 ::
    457 
    458     ready> extern cos(x);
    459     Read extern:
    460     declare double @cos(double)
    461 
    462     ready> cos(1.234);
    463     Read top-level expression:
    464     define double @""() {
    465     entry:
    466             %calltmp = call double @cos(double 1.234000e+00)
    467             ret double %calltmp
    468     }
    469 
    470 This shows an extern for the libm "cos" function, and a call to it.
    471 
    472 ::
    473 
    474     ready> ^D
    475     ; ModuleID = 'my cool jit'
    476 
    477     define double @""() {
    478     entry:
    479             %addtmp = fadd double 4.000000e+00, 5.000000e+00
    480             ret double %addtmp
    481     }
    482 
    483     define double @foo(double %a, double %b) {
    484     entry:
    485             %multmp = fmul double %a, %a
    486             %multmp1 = fmul double 2.000000e+00, %a
    487             %multmp2 = fmul double %multmp1, %b
    488             %addtmp = fadd double %multmp, %multmp2
    489             %multmp3 = fmul double %b, %b
    490             %addtmp4 = fadd double %addtmp, %multmp3
    491             ret double %addtmp4
    492     }
    493 
    494     define double @bar(double %a) {
    495     entry:
    496             %calltmp = call double @foo(double %a, double 4.000000e+00)
    497             %calltmp1 = call double @bar(double 3.133700e+04)
    498             %addtmp = fadd double %calltmp, %calltmp1
    499             ret double %addtmp
    500     }
    501 
    502     declare double @cos(double)
    503 
    504     define double @""() {
    505     entry:
    506             %calltmp = call double @cos(double 1.234000e+00)
    507             ret double %calltmp
    508     }
    509 
    510 When you quit the current demo, it dumps out the IR for the entire
    511 module generated. Here you can see the big picture with all the
    512 functions referencing each other.
    513 
    514 This wraps up the third chapter of the Kaleidoscope tutorial. Up next,
    515 we'll describe how to `add JIT codegen and optimizer
    516 support <OCamlLangImpl4.html>`_ to this so we can actually start running
    517 code!
    518 
    519 Full Code Listing
    520 =================
    521 
    522 Here is the complete code listing for our running example, enhanced with
    523 the LLVM code generator. Because this uses the LLVM libraries, we need
    524 to link them in. To do this, we use the
    525 `llvm-config <http://llvm.org/cmds/llvm-config.html>`_ tool to inform
    526 our makefile/command line about which options to use:
    527 
    528 .. code-block:: bash
    529 
    530     # Compile
    531     ocamlbuild toy.byte
    532     # Run
    533     ./toy.byte
    534 
    535 Here is the code:
    536 
    537 \_tags:
    538     ::
    539 
    540         <{lexer,parser}.ml>: use_camlp4, pp(camlp4of)
    541         <*.{byte,native}>: g++, use_llvm, use_llvm_analysis
    542 
    543 myocamlbuild.ml:
    544     .. code-block:: ocaml
    545 
    546         open Ocamlbuild_plugin;;
    547 
    548         ocaml_lib ~extern:true "llvm";;
    549         ocaml_lib ~extern:true "llvm_analysis";;
    550 
    551         flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);;
    552 
    553 token.ml:
    554     .. code-block:: ocaml
    555 
    556         (*===----------------------------------------------------------------------===
    557          * Lexer Tokens
    558          *===----------------------------------------------------------------------===*)
    559 
    560         (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
    561          * these others for known things. *)
    562         type token =
    563           (* commands *)
    564           | Def | Extern
    565 
    566           (* primary *)
    567           | Ident of string | Number of float
    568 
    569           (* unknown *)
    570           | Kwd of char
    571 
    572 lexer.ml:
    573     .. code-block:: ocaml
    574 
    575         (*===----------------------------------------------------------------------===
    576          * Lexer
    577          *===----------------------------------------------------------------------===*)
    578 
    579         let rec lex = parser
    580           (* Skip any whitespace. *)
    581           | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream
    582 
    583           (* identifier: [a-zA-Z][a-zA-Z0-9] *)
    584           | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
    585               let buffer = Buffer.create 1 in
    586               Buffer.add_char buffer c;
    587               lex_ident buffer stream
    588 
    589           (* number: [0-9.]+ *)
    590           | [< ' ('0' .. '9' as c); stream >] ->
    591               let buffer = Buffer.create 1 in
    592               Buffer.add_char buffer c;
    593               lex_number buffer stream
    594 
    595           (* Comment until end of line. *)
    596           | [< ' ('#'); stream >] ->
    597               lex_comment stream
    598 
    599           (* Otherwise, just return the character as its ascii value. *)
    600           | [< 'c; stream >] ->
    601               [< 'Token.Kwd c; lex stream >]
    602 
    603           (* end of stream. *)
    604           | [< >] -> [< >]
    605 
    606         and lex_number buffer = parser
    607           | [< ' ('0' .. '9' | '.' as c); stream >] ->
    608               Buffer.add_char buffer c;
    609               lex_number buffer stream
    610           | [< stream=lex >] ->
    611               [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]
    612 
    613         and lex_ident buffer = parser
    614           | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
    615               Buffer.add_char buffer c;
    616               lex_ident buffer stream
    617           | [< stream=lex >] ->
    618               match Buffer.contents buffer with
    619               | "def" -> [< 'Token.Def; stream >]
    620               | "extern" -> [< 'Token.Extern; stream >]
    621               | id -> [< 'Token.Ident id; stream >]
    622 
    623         and lex_comment = parser
    624           | [< ' ('\n'); stream=lex >] -> stream
    625           | [< 'c; e=lex_comment >] -> e
    626           | [< >] -> [< >]
    627 
    628 ast.ml:
    629     .. code-block:: ocaml
    630 
    631         (*===----------------------------------------------------------------------===
    632          * Abstract Syntax Tree (aka Parse Tree)
    633          *===----------------------------------------------------------------------===*)
    634 
    635         (* expr - Base type for all expression nodes. *)
    636         type expr =
    637           (* variant for numeric literals like "1.0". *)
    638           | Number of float
    639 
    640           (* variant for referencing a variable, like "a". *)
    641           | Variable of string
    642 
    643           (* variant for a binary operator. *)
    644           | Binary of char * expr * expr
    645 
    646           (* variant for function calls. *)
    647           | Call of string * expr array
    648 
    649         (* proto - This type represents the "prototype" for a function, which captures
    650          * its name, and its argument names (thus implicitly the number of arguments the
    651          * function takes). *)
    652         type proto = Prototype of string * string array
    653 
    654         (* func - This type represents a function definition itself. *)
    655         type func = Function of proto * expr
    656 
    657 parser.ml:
    658     .. code-block:: ocaml
    659 
    660         (*===---------------------------------------------------------------------===
    661          * Parser
    662          *===---------------------------------------------------------------------===*)
    663 
    664         (* binop_precedence - This holds the precedence for each binary operator that is
    665          * defined *)
    666         let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
    667 
    668         (* precedence - Get the precedence of the pending binary operator token. *)
    669         let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1
    670 
    671         (* primary
    672          *   ::= identifier
    673          *   ::= numberexpr
    674          *   ::= parenexpr *)
    675         let rec parse_primary = parser
    676           (* numberexpr ::= number *)
    677           | [< 'Token.Number n >] -> Ast.Number n
    678 
    679           (* parenexpr ::= '(' expression ')' *)
    680           | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e
    681 
    682           (* identifierexpr
    683            *   ::= identifier
    684            *   ::= identifier '(' argumentexpr ')' *)
    685           | [< 'Token.Ident id; stream >] ->
    686               let rec parse_args accumulator = parser
    687                 | [< e=parse_expr; stream >] ->
    688                     begin parser
    689                       | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
    690                       | [< >] -> e :: accumulator
    691                     end stream
    692                 | [< >] -> accumulator
    693               in
    694               let rec parse_ident id = parser
    695                 (* Call. *)
    696                 | [< 'Token.Kwd '(';
    697                      args=parse_args [];
    698                      'Token.Kwd ')' ?? "expected ')'">] ->
    699                     Ast.Call (id, Array.of_list (List.rev args))
    700 
    701                 (* Simple variable ref. *)
    702                 | [< >] -> Ast.Variable id
    703               in
    704               parse_ident id stream
    705 
    706           | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")
    707 
    708         (* binoprhs
    709          *   ::= ('+' primary)* *)
    710         and parse_bin_rhs expr_prec lhs stream =
    711           match Stream.peek stream with
    712           (* If this is a binop, find its precedence. *)
    713           | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
    714               let token_prec = precedence c in
    715 
    716               (* If this is a binop that binds at least as tightly as the current binop,
    717                * consume it, otherwise we are done. *)
    718               if token_prec < expr_prec then lhs else begin
    719                 (* Eat the binop. *)
    720                 Stream.junk stream;
    721 
    722                 (* Parse the primary expression after the binary operator. *)
    723                 let rhs = parse_primary stream in
    724 
    725                 (* Okay, we know this is a binop. *)
    726                 let rhs =
    727                   match Stream.peek stream with
    728                   | Some (Token.Kwd c2) ->
    729                       (* If BinOp binds less tightly with rhs than the operator after
    730                        * rhs, let the pending operator take rhs as its lhs. *)
    731                       let next_prec = precedence c2 in
    732                       if token_prec < next_prec
    733                       then parse_bin_rhs (token_prec + 1) rhs stream
    734                       else rhs
    735                   | _ -> rhs
    736                 in
    737 
    738                 (* Merge lhs/rhs. *)
    739                 let lhs = Ast.Binary (c, lhs, rhs) in
    740                 parse_bin_rhs expr_prec lhs stream
    741               end
    742           | _ -> lhs
    743 
    744         (* expression
    745          *   ::= primary binoprhs *)
    746         and parse_expr = parser
    747           | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream
    748 
    749         (* prototype
    750          *   ::= id '(' id* ')' *)
    751         let parse_prototype =
    752           let rec parse_args accumulator = parser
    753             | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
    754             | [< >] -> accumulator
    755           in
    756 
    757           parser
    758           | [< 'Token.Ident id;
    759                'Token.Kwd '(' ?? "expected '(' in prototype";
    760                args=parse_args [];
    761                'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
    762               (* success. *)
    763               Ast.Prototype (id, Array.of_list (List.rev args))
    764 
    765           | [< >] ->
    766               raise (Stream.Error "expected function name in prototype")
    767 
    768         (* definition ::= 'def' prototype expression *)
    769         let parse_definition = parser
    770           | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
    771               Ast.Function (p, e)
    772 
    773         (* toplevelexpr ::= expression *)
    774         let parse_toplevel = parser
    775           | [< e=parse_expr >] ->
    776               (* Make an anonymous proto. *)
    777               Ast.Function (Ast.Prototype ("", [||]), e)
    778 
    779         (*  external ::= 'extern' prototype *)
    780         let parse_extern = parser
    781           | [< 'Token.Extern; e=parse_prototype >] -> e
    782 
    783 codegen.ml:
    784     .. code-block:: ocaml
    785 
    786         (*===----------------------------------------------------------------------===
    787          * Code Generation
    788          *===----------------------------------------------------------------------===*)
    789 
    790         open Llvm
    791 
    792         exception Error of string
    793 
    794         let context = global_context ()
    795         let the_module = create_module context "my cool jit"
    796         let builder = builder context
    797         let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
    798         let double_type = double_type context
    799 
    800         let rec codegen_expr = function
    801           | Ast.Number n -> const_float double_type n
    802           | Ast.Variable name ->
    803               (try Hashtbl.find named_values name with
    804                 | Not_found -> raise (Error "unknown variable name"))
    805           | Ast.Binary (op, lhs, rhs) ->
    806               let lhs_val = codegen_expr lhs in
    807               let rhs_val = codegen_expr rhs in
    808               begin
    809                 match op with
    810                 | '+' -> build_add lhs_val rhs_val "addtmp" builder
    811                 | '-' -> build_sub lhs_val rhs_val "subtmp" builder
    812                 | '*' -> build_mul lhs_val rhs_val "multmp" builder
    813                 | '<' ->
    814                     (* Convert bool 0/1 to double 0.0 or 1.0 *)
    815                     let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
    816                     build_uitofp i double_type "booltmp" builder
    817                 | _ -> raise (Error "invalid binary operator")
    818               end
    819           | Ast.Call (callee, args) ->
    820               (* Look up the name in the module table. *)
    821               let callee =
    822                 match lookup_function callee the_module with
    823                 | Some callee -> callee
    824                 | None -> raise (Error "unknown function referenced")
    825               in
    826               let params = params callee in
    827 
    828               (* If argument mismatch error. *)
    829               if Array.length params == Array.length args then () else
    830                 raise (Error "incorrect # arguments passed");
    831               let args = Array.map codegen_expr args in
    832               build_call callee args "calltmp" builder
    833 
    834         let codegen_proto = function
    835           | Ast.Prototype (name, args) ->
    836               (* Make the function type: double(double,double) etc. *)
    837               let doubles = Array.make (Array.length args) double_type in
    838               let ft = function_type double_type doubles in
    839               let f =
    840                 match lookup_function name the_module with
    841                 | None -> declare_function name ft the_module
    842 
    843                 (* If 'f' conflicted, there was already something named 'name'. If it
    844                  * has a body, don't allow redefinition or reextern. *)
    845                 | Some f ->
    846                     (* If 'f' already has a body, reject this. *)
    847                     if block_begin f <> At_end f then
    848                       raise (Error "redefinition of function");
    849 
    850                     (* If 'f' took a different number of arguments, reject. *)
    851                     if element_type (type_of f) <> ft then
    852                       raise (Error "redefinition of function with different # args");
    853                     f
    854               in
    855 
    856               (* Set names for all arguments. *)
    857               Array.iteri (fun i a ->
    858                 let n = args.(i) in
    859                 set_value_name n a;
    860                 Hashtbl.add named_values n a;
    861               ) (params f);
    862               f
    863 
    864         let codegen_func = function
    865           | Ast.Function (proto, body) ->
    866               Hashtbl.clear named_values;
    867               let the_function = codegen_proto proto in
    868 
    869               (* Create a new basic block to start insertion into. *)
    870               let bb = append_block context "entry" the_function in
    871               position_at_end bb builder;
    872 
    873               try
    874                 let ret_val = codegen_expr body in
    875 
    876                 (* Finish off the function. *)
    877                 let _ = build_ret ret_val builder in
    878 
    879                 (* Validate the generated code, checking for consistency. *)
    880                 Llvm_analysis.assert_valid_function the_function;
    881 
    882                 the_function
    883               with e ->
    884                 delete_function the_function;
    885                 raise e
    886 
    887 toplevel.ml:
    888     .. code-block:: ocaml
    889 
    890         (*===----------------------------------------------------------------------===
    891          * Top-Level parsing and JIT Driver
    892          *===----------------------------------------------------------------------===*)
    893 
    894         open Llvm
    895 
    896         (* top ::= definition | external | expression | ';' *)
    897         let rec main_loop stream =
    898           match Stream.peek stream with
    899           | None -> ()
    900 
    901           (* ignore top-level semicolons. *)
    902           | Some (Token.Kwd ';') ->
    903               Stream.junk stream;
    904               main_loop stream
    905 
    906           | Some token ->
    907               begin
    908                 try match token with
    909                 | Token.Def ->
    910                     let e = Parser.parse_definition stream in
    911                     print_endline "parsed a function definition.";
    912                     dump_value (Codegen.codegen_func e);
    913                 | Token.Extern ->
    914                     let e = Parser.parse_extern stream in
    915                     print_endline "parsed an extern.";
    916                     dump_value (Codegen.codegen_proto e);
    917                 | _ ->
    918                     (* Evaluate a top-level expression into an anonymous function. *)
    919                     let e = Parser.parse_toplevel stream in
    920                     print_endline "parsed a top-level expr";
    921                     dump_value (Codegen.codegen_func e);
    922                 with Stream.Error s | Codegen.Error s ->
    923                   (* Skip token for error recovery. *)
    924                   Stream.junk stream;
    925                   print_endline s;
    926               end;
    927               print_string "ready> "; flush stdout;
    928               main_loop stream
    929 
    930 toy.ml:
    931     .. code-block:: ocaml
    932 
    933         (*===----------------------------------------------------------------------===
    934          * Main driver code.
    935          *===----------------------------------------------------------------------===*)
    936 
    937         open Llvm
    938 
    939         let main () =
    940           (* Install standard binary operators.
    941            * 1 is the lowest precedence. *)
    942           Hashtbl.add Parser.binop_precedence '<' 10;
    943           Hashtbl.add Parser.binop_precedence '+' 20;
    944           Hashtbl.add Parser.binop_precedence '-' 20;
    945           Hashtbl.add Parser.binop_precedence '*' 40;    (* highest. *)
    946 
    947           (* Prime the first token. *)
    948           print_string "ready> "; flush stdout;
    949           let stream = Lexer.lex (Stream.of_channel stdin) in
    950 
    951           (* Run the main "interpreter loop" now. *)
    952           Toplevel.main_loop stream;
    953 
    954           (* Print out all the generated code. *)
    955           dump_module Codegen.the_module
    956         ;;
    957 
    958         main ()
    959 
    960 `Next: Adding JIT and Optimizer Support <OCamlLangImpl4.html>`_
    961 
    962