1 ======================================== 2 Kaleidoscope: Code generation to LLVM IR 3 ======================================== 4 5 .. contents:: 6 :local: 7 8 Chapter 3 Introduction 9 ====================== 10 11 Welcome to Chapter 3 of the "`Implementing a language with 12 LLVM <index.html>`_" tutorial. This chapter shows you how to transform 13 the `Abstract Syntax Tree <OCamlLangImpl2.html>`_, built in Chapter 2, 14 into LLVM IR. This will teach you a little bit about how LLVM does 15 things, as well as demonstrate how easy it is to use. It's much more 16 work to build a lexer and parser than it is to generate LLVM IR code. :) 17 18 **Please note**: the code in this chapter and later require LLVM 2.3 or 19 LLVM SVN to work. LLVM 2.2 and before will not work with it. 20 21 Code Generation Setup 22 ===================== 23 24 In order to generate LLVM IR, we want some simple setup to get started. 25 First we define virtual code generation (codegen) methods in each AST 26 class: 27 28 .. code-block:: ocaml 29 30 let rec codegen_expr = function 31 | Ast.Number n -> ... 32 | Ast.Variable name -> ... 33 34 The ``Codegen.codegen_expr`` function says to emit IR for that AST node 35 along with all the things it depends on, and they all return an LLVM 36 Value object. "Value" is the class used to represent a "`Static Single 37 Assignment 38 (SSA) <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_ 39 register" or "SSA value" in LLVM. The most distinct aspect of SSA values 40 is that their value is computed as the related instruction executes, and 41 it does not get a new value until (and if) the instruction re-executes. 42 In other words, there is no way to "change" an SSA value. For more 43 information, please read up on `Static Single 44 Assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_ 45 - the concepts are really quite natural once you grok them. 46 47 The second thing we want is an "Error" exception like we used for the 48 parser, which will be used to report errors found during code generation 49 (for example, use of an undeclared parameter): 50 51 .. code-block:: ocaml 52 53 exception Error of string 54 55 let context = global_context () 56 let the_module = create_module context "my cool jit" 57 let builder = builder context 58 let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 59 let double_type = double_type context 60 61 The static variables will be used during code generation. 62 ``Codgen.the_module`` is the LLVM construct that contains all of the 63 functions and global variables in a chunk of code. In many ways, it is 64 the top-level structure that the LLVM IR uses to contain code. 65 66 The ``Codegen.builder`` object is a helper object that makes it easy to 67 generate LLVM instructions. Instances of the 68 `IRBuilder <http://llvm.org/doxygen/IRBuilder_8h-source.html>`_ 69 class keep track of the current place to insert instructions and has 70 methods to create new instructions. 71 72 The ``Codegen.named_values`` map keeps track of which values are defined 73 in the current scope and what their LLVM representation is. (In other 74 words, it is a symbol table for the code). In this form of Kaleidoscope, 75 the only things that can be referenced are function parameters. As such, 76 function parameters will be in this map when generating code for their 77 function body. 78 79 With these basics in place, we can start talking about how to generate 80 code for each expression. Note that this assumes that the 81 ``Codgen.builder`` has been set up to generate code *into* something. 82 For now, we'll assume that this has already been done, and we'll just 83 use it to emit code. 84 85 Expression Code Generation 86 ========================== 87 88 Generating LLVM code for expression nodes is very straightforward: less 89 than 30 lines of commented code for all four of our expression nodes. 90 First we'll do numeric literals: 91 92 .. code-block:: ocaml 93 94 | Ast.Number n -> const_float double_type n 95 96 In the LLVM IR, numeric constants are represented with the 97 ``ConstantFP`` class, which holds the numeric value in an ``APFloat`` 98 internally (``APFloat`` has the capability of holding floating point 99 constants of Arbitrary Precision). This code basically just creates 100 and returns a ``ConstantFP``. Note that in the LLVM IR that constants 101 are all uniqued together and shared. For this reason, the API uses "the 102 foo::get(..)" idiom instead of "new foo(..)" or "foo::Create(..)". 103 104 .. code-block:: ocaml 105 106 | Ast.Variable name -> 107 (try Hashtbl.find named_values name with 108 | Not_found -> raise (Error "unknown variable name")) 109 110 References to variables are also quite simple using LLVM. In the simple 111 version of Kaleidoscope, we assume that the variable has already been 112 emitted somewhere and its value is available. In practice, the only 113 values that can be in the ``Codegen.named_values`` map are function 114 arguments. This code simply checks to see that the specified name is in 115 the map (if not, an unknown variable is being referenced) and returns 116 the value for it. In future chapters, we'll add support for `loop 117 induction variables <LangImpl5.html#for-loop-expression>`_ in the symbol table, and for 118 `local variables <LangImpl7.html#user-defined-local-variables>`_. 119 120 .. code-block:: ocaml 121 122 | Ast.Binary (op, lhs, rhs) -> 123 let lhs_val = codegen_expr lhs in 124 let rhs_val = codegen_expr rhs in 125 begin 126 match op with 127 | '+' -> build_fadd lhs_val rhs_val "addtmp" builder 128 | '-' -> build_fsub lhs_val rhs_val "subtmp" builder 129 | '*' -> build_fmul lhs_val rhs_val "multmp" builder 130 | '<' -> 131 (* Convert bool 0/1 to double 0.0 or 1.0 *) 132 let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in 133 build_uitofp i double_type "booltmp" builder 134 | _ -> raise (Error "invalid binary operator") 135 end 136 137 Binary operators start to get more interesting. The basic idea here is 138 that we recursively emit code for the left-hand side of the expression, 139 then the right-hand side, then we compute the result of the binary 140 expression. In this code, we do a simple switch on the opcode to create 141 the right LLVM instruction. 142 143 In the example above, the LLVM builder class is starting to show its 144 value. IRBuilder knows where to insert the newly created instruction, 145 all you have to do is specify what instruction to create (e.g. with 146 ``Llvm.create_add``), which operands to use (``lhs`` and ``rhs`` here) 147 and optionally provide a name for the generated instruction. 148 149 One nice thing about LLVM is that the name is just a hint. For instance, 150 if the code above emits multiple "addtmp" variables, LLVM will 151 automatically provide each one with an increasing, unique numeric 152 suffix. Local value names for instructions are purely optional, but it 153 makes it much easier to read the IR dumps. 154 155 `LLVM instructions <../LangRef.html#instruction-reference>`_ are constrained by strict 156 rules: for example, the Left and Right operators of an `add 157 instruction <../LangRef.html#add-instruction>`_ must have the same type, and the 158 result type of the add must match the operand types. Because all values 159 in Kaleidoscope are doubles, this makes for very simple code for add, 160 sub and mul. 161 162 On the other hand, LLVM specifies that the `fcmp 163 instruction <../LangRef.html#fcmp-instruction>`_ always returns an 'i1' value (a 164 one bit integer). The problem with this is that Kaleidoscope wants the 165 value to be a 0.0 or 1.0 value. In order to get these semantics, we 166 combine the fcmp instruction with a `uitofp 167 instruction <../LangRef.html#uitofp-to-instruction>`_. This instruction converts its 168 input integer into a floating point value by treating the input as an 169 unsigned value. In contrast, if we used the `sitofp 170 instruction <../LangRef.html#sitofp-to-instruction>`_, the Kaleidoscope '<' operator 171 would return 0.0 and -1.0, depending on the input value. 172 173 .. code-block:: ocaml 174 175 | Ast.Call (callee, args) -> 176 (* Look up the name in the module table. *) 177 let callee = 178 match lookup_function callee the_module with 179 | Some callee -> callee 180 | None -> raise (Error "unknown function referenced") 181 in 182 let params = params callee in 183 184 (* If argument mismatch error. *) 185 if Array.length params == Array.length args then () else 186 raise (Error "incorrect # arguments passed"); 187 let args = Array.map codegen_expr args in 188 build_call callee args "calltmp" builder 189 190 Code generation for function calls is quite straightforward with LLVM. 191 The code above initially does a function name lookup in the LLVM 192 Module's symbol table. Recall that the LLVM Module is the container that 193 holds all of the functions we are JIT'ing. By giving each function the 194 same name as what the user specifies, we can use the LLVM symbol table 195 to resolve function names for us. 196 197 Once we have the function to call, we recursively codegen each argument 198 that is to be passed in, and create an LLVM `call 199 instruction <../LangRef.html#call-instruction>`_. Note that LLVM uses the native C 200 calling conventions by default, allowing these calls to also call into 201 standard library functions like "sin" and "cos", with no additional 202 effort. 203 204 This wraps up our handling of the four basic expressions that we have so 205 far in Kaleidoscope. Feel free to go in and add some more. For example, 206 by browsing the `LLVM language reference <../LangRef.html>`_ you'll find 207 several other interesting instructions that are really easy to plug into 208 our basic framework. 209 210 Function Code Generation 211 ======================== 212 213 Code generation for prototypes and functions must handle a number of 214 details, which make their code less beautiful than expression code 215 generation, but allows us to illustrate some important points. First, 216 lets talk about code generation for prototypes: they are used both for 217 function bodies and external function declarations. The code starts 218 with: 219 220 .. code-block:: ocaml 221 222 let codegen_proto = function 223 | Ast.Prototype (name, args) -> 224 (* Make the function type: double(double,double) etc. *) 225 let doubles = Array.make (Array.length args) double_type in 226 let ft = function_type double_type doubles in 227 let f = 228 match lookup_function name the_module with 229 230 This code packs a lot of power into a few lines. Note first that this 231 function returns a "Function\*" instead of a "Value\*" (although at the 232 moment they both are modeled by ``llvalue`` in ocaml). Because a 233 "prototype" really talks about the external interface for a function 234 (not the value computed by an expression), it makes sense for it to 235 return the LLVM Function it corresponds to when codegen'd. 236 237 The call to ``Llvm.function_type`` creates the ``Llvm.llvalue`` that 238 should be used for a given Prototype. Since all function arguments in 239 Kaleidoscope are of type double, the first line creates a vector of "N" 240 LLVM double types. It then uses the ``Llvm.function_type`` method to 241 create a function type that takes "N" doubles as arguments, returns one 242 double as a result, and that is not vararg (that uses the function 243 ``Llvm.var_arg_function_type``). Note that Types in LLVM are uniqued 244 just like ``Constant``'s are, so you don't "new" a type, you "get" it. 245 246 The final line above checks if the function has already been defined in 247 ``Codegen.the_module``. If not, we will create it. 248 249 .. code-block:: ocaml 250 251 | None -> declare_function name ft the_module 252 253 This indicates the type and name to use, as well as which module to 254 insert into. By default we assume a function has 255 ``Llvm.Linkage.ExternalLinkage``. "`external 256 linkage <../LangRef.html#linkage>`_" means that the function may be defined 257 outside the current module and/or that it is callable by functions 258 outside the module. The "``name``" passed in is the name the user 259 specified: this name is registered in "``Codegen.the_module``"s symbol 260 table, which is used by the function call code above. 261 262 In Kaleidoscope, I choose to allow redefinitions of functions in two 263 cases: first, we want to allow 'extern'ing a function more than once, as 264 long as the prototypes for the externs match (since all arguments have 265 the same type, we just have to check that the number of arguments 266 match). Second, we want to allow 'extern'ing a function and then 267 defining a body for it. This is useful when defining mutually recursive 268 functions. 269 270 .. code-block:: ocaml 271 272 (* If 'f' conflicted, there was already something named 'name'. If it 273 * has a body, don't allow redefinition or reextern. *) 274 | Some f -> 275 (* If 'f' already has a body, reject this. *) 276 if Array.length (basic_blocks f) == 0 then () else 277 raise (Error "redefinition of function"); 278 279 (* If 'f' took a different number of arguments, reject. *) 280 if Array.length (params f) == Array.length args then () else 281 raise (Error "redefinition of function with different # args"); 282 f 283 in 284 285 In order to verify the logic above, we first check to see if the 286 pre-existing function is "empty". In this case, empty means that it has 287 no basic blocks in it, which means it has no body. If it has no body, it 288 is a forward declaration. Since we don't allow anything after a full 289 definition of the function, the code rejects this case. If the previous 290 reference to a function was an 'extern', we simply verify that the 291 number of arguments for that definition and this one match up. If not, 292 we emit an error. 293 294 .. code-block:: ocaml 295 296 (* Set names for all arguments. *) 297 Array.iteri (fun i a -> 298 let n = args.(i) in 299 set_value_name n a; 300 Hashtbl.add named_values n a; 301 ) (params f); 302 f 303 304 The last bit of code for prototypes loops over all of the arguments in 305 the function, setting the name of the LLVM Argument objects to match, 306 and registering the arguments in the ``Codegen.named_values`` map for 307 future use by the ``Ast.Variable`` variant. Once this is set up, it 308 returns the Function object to the caller. Note that we don't check for 309 conflicting argument names here (e.g. "extern foo(a b a)"). Doing so 310 would be very straight-forward with the mechanics we have already used 311 above. 312 313 .. code-block:: ocaml 314 315 let codegen_func = function 316 | Ast.Function (proto, body) -> 317 Hashtbl.clear named_values; 318 let the_function = codegen_proto proto in 319 320 Code generation for function definitions starts out simply enough: we 321 just codegen the prototype (Proto) and verify that it is ok. We then 322 clear out the ``Codegen.named_values`` map to make sure that there isn't 323 anything in it from the last function we compiled. Code generation of 324 the prototype ensures that there is an LLVM Function object that is 325 ready to go for us. 326 327 .. code-block:: ocaml 328 329 (* Create a new basic block to start insertion into. *) 330 let bb = append_block context "entry" the_function in 331 position_at_end bb builder; 332 333 try 334 let ret_val = codegen_expr body in 335 336 Now we get to the point where the ``Codegen.builder`` is set up. The 337 first line creates a new `basic 338 block <http://en.wikipedia.org/wiki/Basic_block>`_ (named "entry"), 339 which is inserted into ``the_function``. The second line then tells the 340 builder that new instructions should be inserted into the end of the new 341 basic block. Basic blocks in LLVM are an important part of functions 342 that define the `Control Flow 343 Graph <http://en.wikipedia.org/wiki/Control_flow_graph>`_. Since we 344 don't have any control flow, our functions will only contain one block 345 at this point. We'll fix this in `Chapter 5 <OCamlLangImpl5.html>`_ :). 346 347 .. code-block:: ocaml 348 349 let ret_val = codegen_expr body in 350 351 (* Finish off the function. *) 352 let _ = build_ret ret_val builder in 353 354 (* Validate the generated code, checking for consistency. *) 355 Llvm_analysis.assert_valid_function the_function; 356 357 the_function 358 359 Once the insertion point is set up, we call the ``Codegen.codegen_func`` 360 method for the root expression of the function. If no error happens, 361 this emits code to compute the expression into the entry block and 362 returns the value that was computed. Assuming no error, we then create 363 an LLVM `ret instruction <../LangRef.html#ret-instruction>`_, which completes the 364 function. Once the function is built, we call 365 ``Llvm_analysis.assert_valid_function``, which is provided by LLVM. This 366 function does a variety of consistency checks on the generated code, to 367 determine if our compiler is doing everything right. Using this is 368 important: it can catch a lot of bugs. Once the function is finished and 369 validated, we return it. 370 371 .. code-block:: ocaml 372 373 with e -> 374 delete_function the_function; 375 raise e 376 377 The only piece left here is handling of the error case. For simplicity, 378 we handle this by merely deleting the function we produced with the 379 ``Llvm.delete_function`` method. This allows the user to redefine a 380 function that they incorrectly typed in before: if we didn't delete it, 381 it would live in the symbol table, with a body, preventing future 382 redefinition. 383 384 This code does have a bug, though. Since the ``Codegen.codegen_proto`` 385 can return a previously defined forward declaration, our code can 386 actually delete a forward declaration. There are a number of ways to fix 387 this bug, see what you can come up with! Here is a testcase: 388 389 :: 390 391 extern foo(a b); # ok, defines foo. 392 def foo(a b) c; # error, 'c' is invalid. 393 def bar() foo(1, 2); # error, unknown function "foo" 394 395 Driver Changes and Closing Thoughts 396 =================================== 397 398 For now, code generation to LLVM doesn't really get us much, except that 399 we can look at the pretty IR calls. The sample code inserts calls to 400 Codegen into the "``Toplevel.main_loop``", and then dumps out the LLVM 401 IR. This gives a nice way to look at the LLVM IR for simple functions. 402 For example: 403 404 :: 405 406 ready> 4+5; 407 Read top-level expression: 408 define double @""() { 409 entry: 410 %addtmp = fadd double 4.000000e+00, 5.000000e+00 411 ret double %addtmp 412 } 413 414 Note how the parser turns the top-level expression into anonymous 415 functions for us. This will be handy when we add `JIT 416 support <OCamlLangImpl4.html#adding-a-jit-compiler>`_ in the next chapter. Also note that 417 the code is very literally transcribed, no optimizations are being 418 performed. We will `add 419 optimizations <OCamlLangImpl4.html#trivial-constant-folding>`_ explicitly in the 420 next chapter. 421 422 :: 423 424 ready> def foo(a b) a*a + 2*a*b + b*b; 425 Read function definition: 426 define double @foo(double %a, double %b) { 427 entry: 428 %multmp = fmul double %a, %a 429 %multmp1 = fmul double 2.000000e+00, %a 430 %multmp2 = fmul double %multmp1, %b 431 %addtmp = fadd double %multmp, %multmp2 432 %multmp3 = fmul double %b, %b 433 %addtmp4 = fadd double %addtmp, %multmp3 434 ret double %addtmp4 435 } 436 437 This shows some simple arithmetic. Notice the striking similarity to the 438 LLVM builder calls that we use to create the instructions. 439 440 :: 441 442 ready> def bar(a) foo(a, 4.0) + bar(31337); 443 Read function definition: 444 define double @bar(double %a) { 445 entry: 446 %calltmp = call double @foo(double %a, double 4.000000e+00) 447 %calltmp1 = call double @bar(double 3.133700e+04) 448 %addtmp = fadd double %calltmp, %calltmp1 449 ret double %addtmp 450 } 451 452 This shows some function calls. Note that this function will take a long 453 time to execute if you call it. In the future we'll add conditional 454 control flow to actually make recursion useful :). 455 456 :: 457 458 ready> extern cos(x); 459 Read extern: 460 declare double @cos(double) 461 462 ready> cos(1.234); 463 Read top-level expression: 464 define double @""() { 465 entry: 466 %calltmp = call double @cos(double 1.234000e+00) 467 ret double %calltmp 468 } 469 470 This shows an extern for the libm "cos" function, and a call to it. 471 472 :: 473 474 ready> ^D 475 ; ModuleID = 'my cool jit' 476 477 define double @""() { 478 entry: 479 %addtmp = fadd double 4.000000e+00, 5.000000e+00 480 ret double %addtmp 481 } 482 483 define double @foo(double %a, double %b) { 484 entry: 485 %multmp = fmul double %a, %a 486 %multmp1 = fmul double 2.000000e+00, %a 487 %multmp2 = fmul double %multmp1, %b 488 %addtmp = fadd double %multmp, %multmp2 489 %multmp3 = fmul double %b, %b 490 %addtmp4 = fadd double %addtmp, %multmp3 491 ret double %addtmp4 492 } 493 494 define double @bar(double %a) { 495 entry: 496 %calltmp = call double @foo(double %a, double 4.000000e+00) 497 %calltmp1 = call double @bar(double 3.133700e+04) 498 %addtmp = fadd double %calltmp, %calltmp1 499 ret double %addtmp 500 } 501 502 declare double @cos(double) 503 504 define double @""() { 505 entry: 506 %calltmp = call double @cos(double 1.234000e+00) 507 ret double %calltmp 508 } 509 510 When you quit the current demo, it dumps out the IR for the entire 511 module generated. Here you can see the big picture with all the 512 functions referencing each other. 513 514 This wraps up the third chapter of the Kaleidoscope tutorial. Up next, 515 we'll describe how to `add JIT codegen and optimizer 516 support <OCamlLangImpl4.html>`_ to this so we can actually start running 517 code! 518 519 Full Code Listing 520 ================= 521 522 Here is the complete code listing for our running example, enhanced with 523 the LLVM code generator. Because this uses the LLVM libraries, we need 524 to link them in. To do this, we use the 525 `llvm-config <http://llvm.org/cmds/llvm-config.html>`_ tool to inform 526 our makefile/command line about which options to use: 527 528 .. code-block:: bash 529 530 # Compile 531 ocamlbuild toy.byte 532 # Run 533 ./toy.byte 534 535 Here is the code: 536 537 \_tags: 538 :: 539 540 <{lexer,parser}.ml>: use_camlp4, pp(camlp4of) 541 <*.{byte,native}>: g++, use_llvm, use_llvm_analysis 542 543 myocamlbuild.ml: 544 .. code-block:: ocaml 545 546 open Ocamlbuild_plugin;; 547 548 ocaml_lib ~extern:true "llvm";; 549 ocaml_lib ~extern:true "llvm_analysis";; 550 551 flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);; 552 553 token.ml: 554 .. code-block:: ocaml 555 556 (*===----------------------------------------------------------------------=== 557 * Lexer Tokens 558 *===----------------------------------------------------------------------===*) 559 560 (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of 561 * these others for known things. *) 562 type token = 563 (* commands *) 564 | Def | Extern 565 566 (* primary *) 567 | Ident of string | Number of float 568 569 (* unknown *) 570 | Kwd of char 571 572 lexer.ml: 573 .. code-block:: ocaml 574 575 (*===----------------------------------------------------------------------=== 576 * Lexer 577 *===----------------------------------------------------------------------===*) 578 579 let rec lex = parser 580 (* Skip any whitespace. *) 581 | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream 582 583 (* identifier: [a-zA-Z][a-zA-Z0-9] *) 584 | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> 585 let buffer = Buffer.create 1 in 586 Buffer.add_char buffer c; 587 lex_ident buffer stream 588 589 (* number: [0-9.]+ *) 590 | [< ' ('0' .. '9' as c); stream >] -> 591 let buffer = Buffer.create 1 in 592 Buffer.add_char buffer c; 593 lex_number buffer stream 594 595 (* Comment until end of line. *) 596 | [< ' ('#'); stream >] -> 597 lex_comment stream 598 599 (* Otherwise, just return the character as its ascii value. *) 600 | [< 'c; stream >] -> 601 [< 'Token.Kwd c; lex stream >] 602 603 (* end of stream. *) 604 | [< >] -> [< >] 605 606 and lex_number buffer = parser 607 | [< ' ('0' .. '9' | '.' as c); stream >] -> 608 Buffer.add_char buffer c; 609 lex_number buffer stream 610 | [< stream=lex >] -> 611 [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] 612 613 and lex_ident buffer = parser 614 | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> 615 Buffer.add_char buffer c; 616 lex_ident buffer stream 617 | [< stream=lex >] -> 618 match Buffer.contents buffer with 619 | "def" -> [< 'Token.Def; stream >] 620 | "extern" -> [< 'Token.Extern; stream >] 621 | id -> [< 'Token.Ident id; stream >] 622 623 and lex_comment = parser 624 | [< ' ('\n'); stream=lex >] -> stream 625 | [< 'c; e=lex_comment >] -> e 626 | [< >] -> [< >] 627 628 ast.ml: 629 .. code-block:: ocaml 630 631 (*===----------------------------------------------------------------------=== 632 * Abstract Syntax Tree (aka Parse Tree) 633 *===----------------------------------------------------------------------===*) 634 635 (* expr - Base type for all expression nodes. *) 636 type expr = 637 (* variant for numeric literals like "1.0". *) 638 | Number of float 639 640 (* variant for referencing a variable, like "a". *) 641 | Variable of string 642 643 (* variant for a binary operator. *) 644 | Binary of char * expr * expr 645 646 (* variant for function calls. *) 647 | Call of string * expr array 648 649 (* proto - This type represents the "prototype" for a function, which captures 650 * its name, and its argument names (thus implicitly the number of arguments the 651 * function takes). *) 652 type proto = Prototype of string * string array 653 654 (* func - This type represents a function definition itself. *) 655 type func = Function of proto * expr 656 657 parser.ml: 658 .. code-block:: ocaml 659 660 (*===---------------------------------------------------------------------=== 661 * Parser 662 *===---------------------------------------------------------------------===*) 663 664 (* binop_precedence - This holds the precedence for each binary operator that is 665 * defined *) 666 let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 667 668 (* precedence - Get the precedence of the pending binary operator token. *) 669 let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 670 671 (* primary 672 * ::= identifier 673 * ::= numberexpr 674 * ::= parenexpr *) 675 let rec parse_primary = parser 676 (* numberexpr ::= number *) 677 | [< 'Token.Number n >] -> Ast.Number n 678 679 (* parenexpr ::= '(' expression ')' *) 680 | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e 681 682 (* identifierexpr 683 * ::= identifier 684 * ::= identifier '(' argumentexpr ')' *) 685 | [< 'Token.Ident id; stream >] -> 686 let rec parse_args accumulator = parser 687 | [< e=parse_expr; stream >] -> 688 begin parser 689 | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e 690 | [< >] -> e :: accumulator 691 end stream 692 | [< >] -> accumulator 693 in 694 let rec parse_ident id = parser 695 (* Call. *) 696 | [< 'Token.Kwd '('; 697 args=parse_args []; 698 'Token.Kwd ')' ?? "expected ')'">] -> 699 Ast.Call (id, Array.of_list (List.rev args)) 700 701 (* Simple variable ref. *) 702 | [< >] -> Ast.Variable id 703 in 704 parse_ident id stream 705 706 | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") 707 708 (* binoprhs 709 * ::= ('+' primary)* *) 710 and parse_bin_rhs expr_prec lhs stream = 711 match Stream.peek stream with 712 (* If this is a binop, find its precedence. *) 713 | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> 714 let token_prec = precedence c in 715 716 (* If this is a binop that binds at least as tightly as the current binop, 717 * consume it, otherwise we are done. *) 718 if token_prec < expr_prec then lhs else begin 719 (* Eat the binop. *) 720 Stream.junk stream; 721 722 (* Parse the primary expression after the binary operator. *) 723 let rhs = parse_primary stream in 724 725 (* Okay, we know this is a binop. *) 726 let rhs = 727 match Stream.peek stream with 728 | Some (Token.Kwd c2) -> 729 (* If BinOp binds less tightly with rhs than the operator after 730 * rhs, let the pending operator take rhs as its lhs. *) 731 let next_prec = precedence c2 in 732 if token_prec < next_prec 733 then parse_bin_rhs (token_prec + 1) rhs stream 734 else rhs 735 | _ -> rhs 736 in 737 738 (* Merge lhs/rhs. *) 739 let lhs = Ast.Binary (c, lhs, rhs) in 740 parse_bin_rhs expr_prec lhs stream 741 end 742 | _ -> lhs 743 744 (* expression 745 * ::= primary binoprhs *) 746 and parse_expr = parser 747 | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream 748 749 (* prototype 750 * ::= id '(' id* ')' *) 751 let parse_prototype = 752 let rec parse_args accumulator = parser 753 | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e 754 | [< >] -> accumulator 755 in 756 757 parser 758 | [< 'Token.Ident id; 759 'Token.Kwd '(' ?? "expected '(' in prototype"; 760 args=parse_args []; 761 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> 762 (* success. *) 763 Ast.Prototype (id, Array.of_list (List.rev args)) 764 765 | [< >] -> 766 raise (Stream.Error "expected function name in prototype") 767 768 (* definition ::= 'def' prototype expression *) 769 let parse_definition = parser 770 | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> 771 Ast.Function (p, e) 772 773 (* toplevelexpr ::= expression *) 774 let parse_toplevel = parser 775 | [< e=parse_expr >] -> 776 (* Make an anonymous proto. *) 777 Ast.Function (Ast.Prototype ("", [||]), e) 778 779 (* external ::= 'extern' prototype *) 780 let parse_extern = parser 781 | [< 'Token.Extern; e=parse_prototype >] -> e 782 783 codegen.ml: 784 .. code-block:: ocaml 785 786 (*===----------------------------------------------------------------------=== 787 * Code Generation 788 *===----------------------------------------------------------------------===*) 789 790 open Llvm 791 792 exception Error of string 793 794 let context = global_context () 795 let the_module = create_module context "my cool jit" 796 let builder = builder context 797 let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 798 let double_type = double_type context 799 800 let rec codegen_expr = function 801 | Ast.Number n -> const_float double_type n 802 | Ast.Variable name -> 803 (try Hashtbl.find named_values name with 804 | Not_found -> raise (Error "unknown variable name")) 805 | Ast.Binary (op, lhs, rhs) -> 806 let lhs_val = codegen_expr lhs in 807 let rhs_val = codegen_expr rhs in 808 begin 809 match op with 810 | '+' -> build_add lhs_val rhs_val "addtmp" builder 811 | '-' -> build_sub lhs_val rhs_val "subtmp" builder 812 | '*' -> build_mul lhs_val rhs_val "multmp" builder 813 | '<' -> 814 (* Convert bool 0/1 to double 0.0 or 1.0 *) 815 let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in 816 build_uitofp i double_type "booltmp" builder 817 | _ -> raise (Error "invalid binary operator") 818 end 819 | Ast.Call (callee, args) -> 820 (* Look up the name in the module table. *) 821 let callee = 822 match lookup_function callee the_module with 823 | Some callee -> callee 824 | None -> raise (Error "unknown function referenced") 825 in 826 let params = params callee in 827 828 (* If argument mismatch error. *) 829 if Array.length params == Array.length args then () else 830 raise (Error "incorrect # arguments passed"); 831 let args = Array.map codegen_expr args in 832 build_call callee args "calltmp" builder 833 834 let codegen_proto = function 835 | Ast.Prototype (name, args) -> 836 (* Make the function type: double(double,double) etc. *) 837 let doubles = Array.make (Array.length args) double_type in 838 let ft = function_type double_type doubles in 839 let f = 840 match lookup_function name the_module with 841 | None -> declare_function name ft the_module 842 843 (* If 'f' conflicted, there was already something named 'name'. If it 844 * has a body, don't allow redefinition or reextern. *) 845 | Some f -> 846 (* If 'f' already has a body, reject this. *) 847 if block_begin f <> At_end f then 848 raise (Error "redefinition of function"); 849 850 (* If 'f' took a different number of arguments, reject. *) 851 if element_type (type_of f) <> ft then 852 raise (Error "redefinition of function with different # args"); 853 f 854 in 855 856 (* Set names for all arguments. *) 857 Array.iteri (fun i a -> 858 let n = args.(i) in 859 set_value_name n a; 860 Hashtbl.add named_values n a; 861 ) (params f); 862 f 863 864 let codegen_func = function 865 | Ast.Function (proto, body) -> 866 Hashtbl.clear named_values; 867 let the_function = codegen_proto proto in 868 869 (* Create a new basic block to start insertion into. *) 870 let bb = append_block context "entry" the_function in 871 position_at_end bb builder; 872 873 try 874 let ret_val = codegen_expr body in 875 876 (* Finish off the function. *) 877 let _ = build_ret ret_val builder in 878 879 (* Validate the generated code, checking for consistency. *) 880 Llvm_analysis.assert_valid_function the_function; 881 882 the_function 883 with e -> 884 delete_function the_function; 885 raise e 886 887 toplevel.ml: 888 .. code-block:: ocaml 889 890 (*===----------------------------------------------------------------------=== 891 * Top-Level parsing and JIT Driver 892 *===----------------------------------------------------------------------===*) 893 894 open Llvm 895 896 (* top ::= definition | external | expression | ';' *) 897 let rec main_loop stream = 898 match Stream.peek stream with 899 | None -> () 900 901 (* ignore top-level semicolons. *) 902 | Some (Token.Kwd ';') -> 903 Stream.junk stream; 904 main_loop stream 905 906 | Some token -> 907 begin 908 try match token with 909 | Token.Def -> 910 let e = Parser.parse_definition stream in 911 print_endline "parsed a function definition."; 912 dump_value (Codegen.codegen_func e); 913 | Token.Extern -> 914 let e = Parser.parse_extern stream in 915 print_endline "parsed an extern."; 916 dump_value (Codegen.codegen_proto e); 917 | _ -> 918 (* Evaluate a top-level expression into an anonymous function. *) 919 let e = Parser.parse_toplevel stream in 920 print_endline "parsed a top-level expr"; 921 dump_value (Codegen.codegen_func e); 922 with Stream.Error s | Codegen.Error s -> 923 (* Skip token for error recovery. *) 924 Stream.junk stream; 925 print_endline s; 926 end; 927 print_string "ready> "; flush stdout; 928 main_loop stream 929 930 toy.ml: 931 .. code-block:: ocaml 932 933 (*===----------------------------------------------------------------------=== 934 * Main driver code. 935 *===----------------------------------------------------------------------===*) 936 937 open Llvm 938 939 let main () = 940 (* Install standard binary operators. 941 * 1 is the lowest precedence. *) 942 Hashtbl.add Parser.binop_precedence '<' 10; 943 Hashtbl.add Parser.binop_precedence '+' 20; 944 Hashtbl.add Parser.binop_precedence '-' 20; 945 Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) 946 947 (* Prime the first token. *) 948 print_string "ready> "; flush stdout; 949 let stream = Lexer.lex (Stream.of_channel stdin) in 950 951 (* Run the main "interpreter loop" now. *) 952 Toplevel.main_loop stream; 953 954 (* Print out all the generated code. *) 955 dump_module Codegen.the_module 956 ;; 957 958 main () 959 960 `Next: Adding JIT and Optimizer Support <OCamlLangImpl4.html>`_ 961 962