1 ============================================================ 2 Kaleidoscope: Extending the Language: User-defined Operators 3 ============================================================ 4 5 .. contents:: 6 :local: 7 8 Chapter 6 Introduction 9 ====================== 10 11 Welcome to Chapter 6 of the "`Implementing a language with 12 LLVM <index.html>`_" tutorial. At this point in our tutorial, we now 13 have a fully functional language that is fairly minimal, but also 14 useful. There is still one big problem with it, however. Our language 15 doesn't have many useful operators (like division, logical negation, or 16 even any comparisons besides less-than). 17 18 This chapter of the tutorial takes a wild digression into adding 19 user-defined operators to the simple and beautiful Kaleidoscope 20 language. This digression now gives us a simple and ugly language in 21 some ways, but also a powerful one at the same time. One of the great 22 things about creating your own language is that you get to decide what 23 is good or bad. In this tutorial we'll assume that it is okay to use 24 this as a way to show some interesting parsing techniques. 25 26 At the end of this tutorial, we'll run through an example Kaleidoscope 27 application that `renders the Mandelbrot set <#example>`_. This gives an 28 example of what you can build with Kaleidoscope and its feature set. 29 30 User-defined Operators: the Idea 31 ================================ 32 33 The "operator overloading" that we will add to Kaleidoscope is more 34 general than languages like C++. In C++, you are only allowed to 35 redefine existing operators: you can't programatically change the 36 grammar, introduce new operators, change precedence levels, etc. In this 37 chapter, we will add this capability to Kaleidoscope, which will let the 38 user round out the set of operators that are supported. 39 40 The point of going into user-defined operators in a tutorial like this 41 is to show the power and flexibility of using a hand-written parser. 42 Thus far, the parser we have been implementing uses recursive descent 43 for most parts of the grammar and operator precedence parsing for the 44 expressions. See `Chapter 2 <LangImpl2.html>`_ for details. Without 45 using operator precedence parsing, it would be very difficult to allow 46 the programmer to introduce new operators into the grammar: the grammar 47 is dynamically extensible as the JIT runs. 48 49 The two specific features we'll add are programmable unary operators 50 (right now, Kaleidoscope has no unary operators at all) as well as 51 binary operators. An example of this is: 52 53 :: 54 55 # Logical unary not. 56 def unary!(v) 57 if v then 58 0 59 else 60 1; 61 62 # Define > with the same precedence as <. 63 def binary> 10 (LHS RHS) 64 RHS < LHS; 65 66 # Binary "logical or", (note that it does not "short circuit") 67 def binary| 5 (LHS RHS) 68 if LHS then 69 1 70 else if RHS then 71 1 72 else 73 0; 74 75 # Define = with slightly lower precedence than relationals. 76 def binary= 9 (LHS RHS) 77 !(LHS < RHS | LHS > RHS); 78 79 Many languages aspire to being able to implement their standard runtime 80 library in the language itself. In Kaleidoscope, we can implement 81 significant parts of the language in the library! 82 83 We will break down implementation of these features into two parts: 84 implementing support for user-defined binary operators and adding unary 85 operators. 86 87 User-defined Binary Operators 88 ============================= 89 90 Adding support for user-defined binary operators is pretty simple with 91 our current framework. We'll first add support for the unary/binary 92 keywords: 93 94 .. code-block:: c++ 95 96 enum Token { 97 ... 98 // operators 99 tok_binary = -11, tok_unary = -12 100 }; 101 ... 102 static int gettok() { 103 ... 104 if (IdentifierStr == "for") return tok_for; 105 if (IdentifierStr == "in") return tok_in; 106 if (IdentifierStr == "binary") return tok_binary; 107 if (IdentifierStr == "unary") return tok_unary; 108 return tok_identifier; 109 110 This just adds lexer support for the unary and binary keywords, like we 111 did in `previous chapters <LangImpl5.html#iflexer>`_. One nice thing 112 about our current AST, is that we represent binary operators with full 113 generalisation by using their ASCII code as the opcode. For our extended 114 operators, we'll use this same representation, so we don't need any new 115 AST or parser support. 116 117 On the other hand, we have to be able to represent the definitions of 118 these new operators, in the "def binary\| 5" part of the function 119 definition. In our grammar so far, the "name" for the function 120 definition is parsed as the "prototype" production and into the 121 ``PrototypeAST`` AST node. To represent our new user-defined operators 122 as prototypes, we have to extend the ``PrototypeAST`` AST node like 123 this: 124 125 .. code-block:: c++ 126 127 /// PrototypeAST - This class represents the "prototype" for a function, 128 /// which captures its argument names as well as if it is an operator. 129 class PrototypeAST { 130 std::string Name; 131 std::vector<std::string> Args; 132 bool isOperator; 133 unsigned Precedence; // Precedence if a binary op. 134 public: 135 PrototypeAST(const std::string &name, const std::vector<std::string> &args, 136 bool isoperator = false, unsigned prec = 0) 137 : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {} 138 139 bool isUnaryOp() const { return isOperator && Args.size() == 1; } 140 bool isBinaryOp() const { return isOperator && Args.size() == 2; } 141 142 char getOperatorName() const { 143 assert(isUnaryOp() || isBinaryOp()); 144 return Name[Name.size()-1]; 145 } 146 147 unsigned getBinaryPrecedence() const { return Precedence; } 148 149 Function *Codegen(); 150 }; 151 152 Basically, in addition to knowing a name for the prototype, we now keep 153 track of whether it was an operator, and if it was, what precedence 154 level the operator is at. The precedence is only used for binary 155 operators (as you'll see below, it just doesn't apply for unary 156 operators). Now that we have a way to represent the prototype for a 157 user-defined operator, we need to parse it: 158 159 .. code-block:: c++ 160 161 /// prototype 162 /// ::= id '(' id* ')' 163 /// ::= binary LETTER number? (id, id) 164 static PrototypeAST *ParsePrototype() { 165 std::string FnName; 166 167 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. 168 unsigned BinaryPrecedence = 30; 169 170 switch (CurTok) { 171 default: 172 return ErrorP("Expected function name in prototype"); 173 case tok_identifier: 174 FnName = IdentifierStr; 175 Kind = 0; 176 getNextToken(); 177 break; 178 case tok_binary: 179 getNextToken(); 180 if (!isascii(CurTok)) 181 return ErrorP("Expected binary operator"); 182 FnName = "binary"; 183 FnName += (char)CurTok; 184 Kind = 2; 185 getNextToken(); 186 187 // Read the precedence if present. 188 if (CurTok == tok_number) { 189 if (NumVal < 1 || NumVal > 100) 190 return ErrorP("Invalid precedecnce: must be 1..100"); 191 BinaryPrecedence = (unsigned)NumVal; 192 getNextToken(); 193 } 194 break; 195 } 196 197 if (CurTok != '(') 198 return ErrorP("Expected '(' in prototype"); 199 200 std::vector<std::string> ArgNames; 201 while (getNextToken() == tok_identifier) 202 ArgNames.push_back(IdentifierStr); 203 if (CurTok != ')') 204 return ErrorP("Expected ')' in prototype"); 205 206 // success. 207 getNextToken(); // eat ')'. 208 209 // Verify right number of names for operator. 210 if (Kind && ArgNames.size() != Kind) 211 return ErrorP("Invalid number of operands for operator"); 212 213 return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence); 214 } 215 216 This is all fairly straightforward parsing code, and we have already 217 seen a lot of similar code in the past. One interesting part about the 218 code above is the couple lines that set up ``FnName`` for binary 219 operators. This builds names like "binary@" for a newly defined "@" 220 operator. This then takes advantage of the fact that symbol names in the 221 LLVM symbol table are allowed to have any character in them, including 222 embedded nul characters. 223 224 The next interesting thing to add, is codegen support for these binary 225 operators. Given our current structure, this is a simple addition of a 226 default case for our existing binary operator node: 227 228 .. code-block:: c++ 229 230 Value *BinaryExprAST::Codegen() { 231 Value *L = LHS->Codegen(); 232 Value *R = RHS->Codegen(); 233 if (L == 0 || R == 0) return 0; 234 235 switch (Op) { 236 case '+': return Builder.CreateFAdd(L, R, "addtmp"); 237 case '-': return Builder.CreateFSub(L, R, "subtmp"); 238 case '*': return Builder.CreateFMul(L, R, "multmp"); 239 case '<': 240 L = Builder.CreateFCmpULT(L, R, "cmptmp"); 241 // Convert bool 0/1 to double 0.0 or 1.0 242 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), 243 "booltmp"); 244 default: break; 245 } 246 247 // If it wasn't a builtin binary operator, it must be a user defined one. Emit 248 // a call to it. 249 Function *F = TheModule->getFunction(std::string("binary")+Op); 250 assert(F && "binary operator not found!"); 251 252 Value *Ops[2] = { L, R }; 253 return Builder.CreateCall(F, Ops, "binop"); 254 } 255 256 As you can see above, the new code is actually really simple. It just 257 does a lookup for the appropriate operator in the symbol table and 258 generates a function call to it. Since user-defined operators are just 259 built as normal functions (because the "prototype" boils down to a 260 function with the right name) everything falls into place. 261 262 The final piece of code we are missing, is a bit of top-level magic: 263 264 .. code-block:: c++ 265 266 Function *FunctionAST::Codegen() { 267 NamedValues.clear(); 268 269 Function *TheFunction = Proto->Codegen(); 270 if (TheFunction == 0) 271 return 0; 272 273 // If this is an operator, install it. 274 if (Proto->isBinaryOp()) 275 BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence(); 276 277 // Create a new basic block to start insertion into. 278 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); 279 Builder.SetInsertPoint(BB); 280 281 if (Value *RetVal = Body->Codegen()) { 282 ... 283 284 Basically, before codegening a function, if it is a user-defined 285 operator, we register it in the precedence table. This allows the binary 286 operator parsing logic we already have in place to handle it. Since we 287 are working on a fully-general operator precedence parser, this is all 288 we need to do to "extend the grammar". 289 290 Now we have useful user-defined binary operators. This builds a lot on 291 the previous framework we built for other operators. Adding unary 292 operators is a bit more challenging, because we don't have any framework 293 for it yet - lets see what it takes. 294 295 User-defined Unary Operators 296 ============================ 297 298 Since we don't currently support unary operators in the Kaleidoscope 299 language, we'll need to add everything to support them. Above, we added 300 simple support for the 'unary' keyword to the lexer. In addition to 301 that, we need an AST node: 302 303 .. code-block:: c++ 304 305 /// UnaryExprAST - Expression class for a unary operator. 306 class UnaryExprAST : public ExprAST { 307 char Opcode; 308 ExprAST *Operand; 309 public: 310 UnaryExprAST(char opcode, ExprAST *operand) 311 : Opcode(opcode), Operand(operand) {} 312 virtual Value *Codegen(); 313 }; 314 315 This AST node is very simple and obvious by now. It directly mirrors the 316 binary operator AST node, except that it only has one child. With this, 317 we need to add the parsing logic. Parsing a unary operator is pretty 318 simple: we'll add a new function to do it: 319 320 .. code-block:: c++ 321 322 /// unary 323 /// ::= primary 324 /// ::= '!' unary 325 static ExprAST *ParseUnary() { 326 // If the current token is not an operator, it must be a primary expr. 327 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') 328 return ParsePrimary(); 329 330 // If this is a unary operator, read it. 331 int Opc = CurTok; 332 getNextToken(); 333 if (ExprAST *Operand = ParseUnary()) 334 return new UnaryExprAST(Opc, Operand); 335 return 0; 336 } 337 338 The grammar we add is pretty straightforward here. If we see a unary 339 operator when parsing a primary operator, we eat the operator as a 340 prefix and parse the remaining piece as another unary operator. This 341 allows us to handle multiple unary operators (e.g. "!!x"). Note that 342 unary operators can't have ambiguous parses like binary operators can, 343 so there is no need for precedence information. 344 345 The problem with this function, is that we need to call ParseUnary from 346 somewhere. To do this, we change previous callers of ParsePrimary to 347 call ParseUnary instead: 348 349 .. code-block:: c++ 350 351 /// binoprhs 352 /// ::= ('+' unary)* 353 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { 354 ... 355 // Parse the unary expression after the binary operator. 356 ExprAST *RHS = ParseUnary(); 357 if (!RHS) return 0; 358 ... 359 } 360 /// expression 361 /// ::= unary binoprhs 362 /// 363 static ExprAST *ParseExpression() { 364 ExprAST *LHS = ParseUnary(); 365 if (!LHS) return 0; 366 367 return ParseBinOpRHS(0, LHS); 368 } 369 370 With these two simple changes, we are now able to parse unary operators 371 and build the AST for them. Next up, we need to add parser support for 372 prototypes, to parse the unary operator prototype. We extend the binary 373 operator code above with: 374 375 .. code-block:: c++ 376 377 /// prototype 378 /// ::= id '(' id* ')' 379 /// ::= binary LETTER number? (id, id) 380 /// ::= unary LETTER (id) 381 static PrototypeAST *ParsePrototype() { 382 std::string FnName; 383 384 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. 385 unsigned BinaryPrecedence = 30; 386 387 switch (CurTok) { 388 default: 389 return ErrorP("Expected function name in prototype"); 390 case tok_identifier: 391 FnName = IdentifierStr; 392 Kind = 0; 393 getNextToken(); 394 break; 395 case tok_unary: 396 getNextToken(); 397 if (!isascii(CurTok)) 398 return ErrorP("Expected unary operator"); 399 FnName = "unary"; 400 FnName += (char)CurTok; 401 Kind = 1; 402 getNextToken(); 403 break; 404 case tok_binary: 405 ... 406 407 As with binary operators, we name unary operators with a name that 408 includes the operator character. This assists us at code generation 409 time. Speaking of, the final piece we need to add is codegen support for 410 unary operators. It looks like this: 411 412 .. code-block:: c++ 413 414 Value *UnaryExprAST::Codegen() { 415 Value *OperandV = Operand->Codegen(); 416 if (OperandV == 0) return 0; 417 418 Function *F = TheModule->getFunction(std::string("unary")+Opcode); 419 if (F == 0) 420 return ErrorV("Unknown unary operator"); 421 422 return Builder.CreateCall(F, OperandV, "unop"); 423 } 424 425 This code is similar to, but simpler than, the code for binary 426 operators. It is simpler primarily because it doesn't need to handle any 427 predefined operators. 428 429 Kicking the Tires 430 ================= 431 432 It is somewhat hard to believe, but with a few simple extensions we've 433 covered in the last chapters, we have grown a real-ish language. With 434 this, we can do a lot of interesting things, including I/O, math, and a 435 bunch of other things. For example, we can now add a nice sequencing 436 operator (printd is defined to print out the specified value and a 437 newline): 438 439 :: 440 441 ready> extern printd(x); 442 Read extern: 443 declare double @printd(double) 444 445 ready> def binary : 1 (x y) 0; # Low-precedence operator that ignores operands. 446 .. 447 ready> printd(123) : printd(456) : printd(789); 448 123.000000 449 456.000000 450 789.000000 451 Evaluated to 0.000000 452 453 We can also define a bunch of other "primitive" operations, such as: 454 455 :: 456 457 # Logical unary not. 458 def unary!(v) 459 if v then 460 0 461 else 462 1; 463 464 # Unary negate. 465 def unary-(v) 466 0-v; 467 468 # Define > with the same precedence as <. 469 def binary> 10 (LHS RHS) 470 RHS < LHS; 471 472 # Binary logical or, which does not short circuit. 473 def binary| 5 (LHS RHS) 474 if LHS then 475 1 476 else if RHS then 477 1 478 else 479 0; 480 481 # Binary logical and, which does not short circuit. 482 def binary& 6 (LHS RHS) 483 if !LHS then 484 0 485 else 486 !!RHS; 487 488 # Define = with slightly lower precedence than relationals. 489 def binary = 9 (LHS RHS) 490 !(LHS < RHS | LHS > RHS); 491 492 # Define ':' for sequencing: as a low-precedence operator that ignores operands 493 # and just returns the RHS. 494 def binary : 1 (x y) y; 495 496 Given the previous if/then/else support, we can also define interesting 497 functions for I/O. For example, the following prints out a character 498 whose "density" reflects the value passed in: the lower the value, the 499 denser the character: 500 501 :: 502 503 ready> 504 505 extern putchard(char) 506 def printdensity(d) 507 if d > 8 then 508 putchard(32) # ' ' 509 else if d > 4 then 510 putchard(46) # '.' 511 else if d > 2 then 512 putchard(43) # '+' 513 else 514 putchard(42); # '*' 515 ... 516 ready> printdensity(1): printdensity(2): printdensity(3): 517 printdensity(4): printdensity(5): printdensity(9): 518 putchard(10); 519 **++. 520 Evaluated to 0.000000 521 522 Based on these simple primitive operations, we can start to define more 523 interesting things. For example, here's a little function that solves 524 for the number of iterations it takes a function in the complex plane to 525 converge: 526 527 :: 528 529 # Determine whether the specific location diverges. 530 # Solve for z = z^2 + c in the complex plane. 531 def mandleconverger(real imag iters creal cimag) 532 if iters > 255 | (real*real + imag*imag > 4) then 533 iters 534 else 535 mandleconverger(real*real - imag*imag + creal, 536 2*real*imag + cimag, 537 iters+1, creal, cimag); 538 539 # Return the number of iterations required for the iteration to escape 540 def mandleconverge(real imag) 541 mandleconverger(real, imag, 0, real, imag); 542 543 This "``z = z2 + c``" function is a beautiful little creature that is 544 the basis for computation of the `Mandelbrot 545 Set <http://en.wikipedia.org/wiki/Mandelbrot_set>`_. Our 546 ``mandelconverge`` function returns the number of iterations that it 547 takes for a complex orbit to escape, saturating to 255. This is not a 548 very useful function by itself, but if you plot its value over a 549 two-dimensional plane, you can see the Mandelbrot set. Given that we are 550 limited to using putchard here, our amazing graphical output is limited, 551 but we can whip together something using the density plotter above: 552 553 :: 554 555 # Compute and plot the mandlebrot set with the specified 2 dimensional range 556 # info. 557 def mandelhelp(xmin xmax xstep ymin ymax ystep) 558 for y = ymin, y < ymax, ystep in ( 559 (for x = xmin, x < xmax, xstep in 560 printdensity(mandleconverge(x,y))) 561 : putchard(10) 562 ) 563 564 # mandel - This is a convenient helper function for plotting the mandelbrot set 565 # from the specified position with the specified Magnification. 566 def mandel(realstart imagstart realmag imagmag) 567 mandelhelp(realstart, realstart+realmag*78, realmag, 568 imagstart, imagstart+imagmag*40, imagmag); 569 570 Given this, we can try plotting out the mandlebrot set! Lets try it out: 571 572 :: 573 574 ready> mandel(-2.3, -1.3, 0.05, 0.07); 575 *******************************+++++++++++************************************* 576 *************************+++++++++++++++++++++++******************************* 577 **********************+++++++++++++++++++++++++++++**************************** 578 *******************+++++++++++++++++++++.. ...++++++++************************* 579 *****************++++++++++++++++++++++.... ...+++++++++*********************** 580 ***************+++++++++++++++++++++++..... ...+++++++++********************* 581 **************+++++++++++++++++++++++.... ....+++++++++******************** 582 *************++++++++++++++++++++++...... .....++++++++******************* 583 ************+++++++++++++++++++++....... .......+++++++****************** 584 ***********+++++++++++++++++++.... ... .+++++++***************** 585 **********+++++++++++++++++....... .+++++++**************** 586 *********++++++++++++++........... ...+++++++*************** 587 ********++++++++++++............ ...++++++++************** 588 ********++++++++++... .......... .++++++++************** 589 *******+++++++++..... .+++++++++************* 590 *******++++++++...... ..+++++++++************* 591 *******++++++....... ..+++++++++************* 592 *******+++++...... ..+++++++++************* 593 *******.... .... ...+++++++++************* 594 *******.... . ...+++++++++************* 595 *******+++++...... ...+++++++++************* 596 *******++++++....... ..+++++++++************* 597 *******++++++++...... .+++++++++************* 598 *******+++++++++..... ..+++++++++************* 599 ********++++++++++... .......... .++++++++************** 600 ********++++++++++++............ ...++++++++************** 601 *********++++++++++++++.......... ...+++++++*************** 602 **********++++++++++++++++........ .+++++++**************** 603 **********++++++++++++++++++++.... ... ..+++++++**************** 604 ***********++++++++++++++++++++++....... .......++++++++***************** 605 ************+++++++++++++++++++++++...... ......++++++++****************** 606 **************+++++++++++++++++++++++.... ....++++++++******************** 607 ***************+++++++++++++++++++++++..... ...+++++++++********************* 608 *****************++++++++++++++++++++++.... ...++++++++*********************** 609 *******************+++++++++++++++++++++......++++++++************************* 610 *********************++++++++++++++++++++++.++++++++*************************** 611 *************************+++++++++++++++++++++++******************************* 612 ******************************+++++++++++++************************************ 613 ******************************************************************************* 614 ******************************************************************************* 615 ******************************************************************************* 616 Evaluated to 0.000000 617 ready> mandel(-2, -1, 0.02, 0.04); 618 **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++ 619 ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 620 *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++. 621 *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++... 622 *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++..... 623 ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........ 624 **************++++++++++++++++++++++++++++++++++++++++++++++++++++++........... 625 ************+++++++++++++++++++++++++++++++++++++++++++++++++++++.............. 626 ***********++++++++++++++++++++++++++++++++++++++++++++++++++........ . 627 **********++++++++++++++++++++++++++++++++++++++++++++++............. 628 ********+++++++++++++++++++++++++++++++++++++++++++.................. 629 *******+++++++++++++++++++++++++++++++++++++++....................... 630 ******+++++++++++++++++++++++++++++++++++........................... 631 *****++++++++++++++++++++++++++++++++............................ 632 *****++++++++++++++++++++++++++++............................... 633 ****++++++++++++++++++++++++++...... ......................... 634 ***++++++++++++++++++++++++......... ...... ........... 635 ***++++++++++++++++++++++............ 636 **+++++++++++++++++++++.............. 637 **+++++++++++++++++++................ 638 *++++++++++++++++++................. 639 *++++++++++++++++............ ... 640 *++++++++++++++.............. 641 *+++....++++................ 642 *.......... ........... 643 * 644 *.......... ........... 645 *+++....++++................ 646 *++++++++++++++.............. 647 *++++++++++++++++............ ... 648 *++++++++++++++++++................. 649 **+++++++++++++++++++................ 650 **+++++++++++++++++++++.............. 651 ***++++++++++++++++++++++............ 652 ***++++++++++++++++++++++++......... ...... ........... 653 ****++++++++++++++++++++++++++...... ......................... 654 *****++++++++++++++++++++++++++++............................... 655 *****++++++++++++++++++++++++++++++++............................ 656 ******+++++++++++++++++++++++++++++++++++........................... 657 *******+++++++++++++++++++++++++++++++++++++++....................... 658 ********+++++++++++++++++++++++++++++++++++++++++++.................. 659 Evaluated to 0.000000 660 ready> mandel(-0.9, -1.4, 0.02, 0.03); 661 ******************************************************************************* 662 ******************************************************************************* 663 ******************************************************************************* 664 **********+++++++++++++++++++++************************************************ 665 *+++++++++++++++++++++++++++++++++++++++*************************************** 666 +++++++++++++++++++++++++++++++++++++++++++++********************************** 667 ++++++++++++++++++++++++++++++++++++++++++++++++++***************************** 668 ++++++++++++++++++++++++++++++++++++++++++++++++++++++************************* 669 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++********************** 670 +++++++++++++++++++++++++++++++++.........++++++++++++++++++******************* 671 +++++++++++++++++++++++++++++++.... ......+++++++++++++++++++**************** 672 +++++++++++++++++++++++++++++....... ........+++++++++++++++++++************** 673 ++++++++++++++++++++++++++++........ ........++++++++++++++++++++************ 674 +++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++********** 675 ++++++++++++++++++++++++++........... ....++++++++++++++++++++++******** 676 ++++++++++++++++++++++++............. .......++++++++++++++++++++++****** 677 +++++++++++++++++++++++............. ........+++++++++++++++++++++++**** 678 ++++++++++++++++++++++........... ..........++++++++++++++++++++++*** 679 ++++++++++++++++++++........... .........++++++++++++++++++++++* 680 ++++++++++++++++++............ ...........++++++++++++++++++++ 681 ++++++++++++++++............... .............++++++++++++++++++ 682 ++++++++++++++................. ...............++++++++++++++++ 683 ++++++++++++.................. .................++++++++++++++ 684 +++++++++.................. .................+++++++++++++ 685 ++++++........ . ......... ..++++++++++++ 686 ++............ ...... ....++++++++++ 687 .............. ...++++++++++ 688 .............. ....+++++++++ 689 .............. .....++++++++ 690 ............. ......++++++++ 691 ........... .......++++++++ 692 ......... ........+++++++ 693 ......... ........+++++++ 694 ......... ....+++++++ 695 ........ ...+++++++ 696 ....... ...+++++++ 697 ....+++++++ 698 .....+++++++ 699 ....+++++++ 700 ....+++++++ 701 ....+++++++ 702 Evaluated to 0.000000 703 ready> ^D 704 705 At this point, you may be starting to realize that Kaleidoscope is a 706 real and powerful language. It may not be self-similar :), but it can be 707 used to plot things that are! 708 709 With this, we conclude the "adding user-defined operators" chapter of 710 the tutorial. We have successfully augmented our language, adding the 711 ability to extend the language in the library, and we have shown how 712 this can be used to build a simple but interesting end-user application 713 in Kaleidoscope. At this point, Kaleidoscope can build a variety of 714 applications that are functional and can call functions with 715 side-effects, but it can't actually define and mutate a variable itself. 716 717 Strikingly, variable mutation is an important feature of some languages, 718 and it is not at all obvious how to `add support for mutable 719 variables <LangImpl7.html>`_ without having to add an "SSA construction" 720 phase to your front-end. In the next chapter, we will describe how you 721 can add variable mutation without building SSA in your front-end. 722 723 Full Code Listing 724 ================= 725 726 Here is the complete code listing for our running example, enhanced with 727 the if/then/else and for expressions.. To build this example, use: 728 729 .. code-block:: bash 730 731 # Compile 732 clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy 733 # Run 734 ./toy 735 736 On some platforms, you will need to specify -rdynamic or 737 -Wl,--export-dynamic when linking. This ensures that symbols defined in 738 the main executable are exported to the dynamic linker and so are 739 available for symbol resolution at run time. This is not needed if you 740 compile your support code into a shared library, although doing that 741 will cause problems on Windows. 742 743 Here is the code: 744 745 .. code-block:: c++ 746 747 #include "llvm/DerivedTypes.h" 748 #include "llvm/ExecutionEngine/ExecutionEngine.h" 749 #include "llvm/ExecutionEngine/JIT.h" 750 #include "llvm/IRBuilder.h" 751 #include "llvm/LLVMContext.h" 752 #include "llvm/Module.h" 753 #include "llvm/PassManager.h" 754 #include "llvm/Analysis/Verifier.h" 755 #include "llvm/Analysis/Passes.h" 756 #include "llvm/DataLayout.h" 757 #include "llvm/Transforms/Scalar.h" 758 #include "llvm/Support/TargetSelect.h" 759 #include <cstdio> 760 #include <string> 761 #include <map> 762 #include <vector> 763 using namespace llvm; 764 765 //===----------------------------------------------------------------------===// 766 // Lexer 767 //===----------------------------------------------------------------------===// 768 769 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one 770 // of these for known things. 771 enum Token { 772 tok_eof = -1, 773 774 // commands 775 tok_def = -2, tok_extern = -3, 776 777 // primary 778 tok_identifier = -4, tok_number = -5, 779 780 // control 781 tok_if = -6, tok_then = -7, tok_else = -8, 782 tok_for = -9, tok_in = -10, 783 784 // operators 785 tok_binary = -11, tok_unary = -12 786 }; 787 788 static std::string IdentifierStr; // Filled in if tok_identifier 789 static double NumVal; // Filled in if tok_number 790 791 /// gettok - Return the next token from standard input. 792 static int gettok() { 793 static int LastChar = ' '; 794 795 // Skip any whitespace. 796 while (isspace(LastChar)) 797 LastChar = getchar(); 798 799 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* 800 IdentifierStr = LastChar; 801 while (isalnum((LastChar = getchar()))) 802 IdentifierStr += LastChar; 803 804 if (IdentifierStr == "def") return tok_def; 805 if (IdentifierStr == "extern") return tok_extern; 806 if (IdentifierStr == "if") return tok_if; 807 if (IdentifierStr == "then") return tok_then; 808 if (IdentifierStr == "else") return tok_else; 809 if (IdentifierStr == "for") return tok_for; 810 if (IdentifierStr == "in") return tok_in; 811 if (IdentifierStr == "binary") return tok_binary; 812 if (IdentifierStr == "unary") return tok_unary; 813 return tok_identifier; 814 } 815 816 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ 817 std::string NumStr; 818 do { 819 NumStr += LastChar; 820 LastChar = getchar(); 821 } while (isdigit(LastChar) || LastChar == '.'); 822 823 NumVal = strtod(NumStr.c_str(), 0); 824 return tok_number; 825 } 826 827 if (LastChar == '#') { 828 // Comment until end of line. 829 do LastChar = getchar(); 830 while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); 831 832 if (LastChar != EOF) 833 return gettok(); 834 } 835 836 // Check for end of file. Don't eat the EOF. 837 if (LastChar == EOF) 838 return tok_eof; 839 840 // Otherwise, just return the character as its ascii value. 841 int ThisChar = LastChar; 842 LastChar = getchar(); 843 return ThisChar; 844 } 845 846 //===----------------------------------------------------------------------===// 847 // Abstract Syntax Tree (aka Parse Tree) 848 //===----------------------------------------------------------------------===// 849 850 /// ExprAST - Base class for all expression nodes. 851 class ExprAST { 852 public: 853 virtual ~ExprAST() {} 854 virtual Value *Codegen() = 0; 855 }; 856 857 /// NumberExprAST - Expression class for numeric literals like "1.0". 858 class NumberExprAST : public ExprAST { 859 double Val; 860 public: 861 NumberExprAST(double val) : Val(val) {} 862 virtual Value *Codegen(); 863 }; 864 865 /// VariableExprAST - Expression class for referencing a variable, like "a". 866 class VariableExprAST : public ExprAST { 867 std::string Name; 868 public: 869 VariableExprAST(const std::string &name) : Name(name) {} 870 virtual Value *Codegen(); 871 }; 872 873 /// UnaryExprAST - Expression class for a unary operator. 874 class UnaryExprAST : public ExprAST { 875 char Opcode; 876 ExprAST *Operand; 877 public: 878 UnaryExprAST(char opcode, ExprAST *operand) 879 : Opcode(opcode), Operand(operand) {} 880 virtual Value *Codegen(); 881 }; 882 883 /// BinaryExprAST - Expression class for a binary operator. 884 class BinaryExprAST : public ExprAST { 885 char Op; 886 ExprAST *LHS, *RHS; 887 public: 888 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) 889 : Op(op), LHS(lhs), RHS(rhs) {} 890 virtual Value *Codegen(); 891 }; 892 893 /// CallExprAST - Expression class for function calls. 894 class CallExprAST : public ExprAST { 895 std::string Callee; 896 std::vector<ExprAST*> Args; 897 public: 898 CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) 899 : Callee(callee), Args(args) {} 900 virtual Value *Codegen(); 901 }; 902 903 /// IfExprAST - Expression class for if/then/else. 904 class IfExprAST : public ExprAST { 905 ExprAST *Cond, *Then, *Else; 906 public: 907 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) 908 : Cond(cond), Then(then), Else(_else) {} 909 virtual Value *Codegen(); 910 }; 911 912 /// ForExprAST - Expression class for for/in. 913 class ForExprAST : public ExprAST { 914 std::string VarName; 915 ExprAST *Start, *End, *Step, *Body; 916 public: 917 ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, 918 ExprAST *step, ExprAST *body) 919 : VarName(varname), Start(start), End(end), Step(step), Body(body) {} 920 virtual Value *Codegen(); 921 }; 922 923 /// PrototypeAST - This class represents the "prototype" for a function, 924 /// which captures its name, and its argument names (thus implicitly the number 925 /// of arguments the function takes), as well as if it is an operator. 926 class PrototypeAST { 927 std::string Name; 928 std::vector<std::string> Args; 929 bool isOperator; 930 unsigned Precedence; // Precedence if a binary op. 931 public: 932 PrototypeAST(const std::string &name, const std::vector<std::string> &args, 933 bool isoperator = false, unsigned prec = 0) 934 : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {} 935 936 bool isUnaryOp() const { return isOperator && Args.size() == 1; } 937 bool isBinaryOp() const { return isOperator && Args.size() == 2; } 938 939 char getOperatorName() const { 940 assert(isUnaryOp() || isBinaryOp()); 941 return Name[Name.size()-1]; 942 } 943 944 unsigned getBinaryPrecedence() const { return Precedence; } 945 946 Function *Codegen(); 947 }; 948 949 /// FunctionAST - This class represents a function definition itself. 950 class FunctionAST { 951 PrototypeAST *Proto; 952 ExprAST *Body; 953 public: 954 FunctionAST(PrototypeAST *proto, ExprAST *body) 955 : Proto(proto), Body(body) {} 956 957 Function *Codegen(); 958 }; 959 960 //===----------------------------------------------------------------------===// 961 // Parser 962 //===----------------------------------------------------------------------===// 963 964 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current 965 /// token the parser is looking at. getNextToken reads another token from the 966 /// lexer and updates CurTok with its results. 967 static int CurTok; 968 static int getNextToken() { 969 return CurTok = gettok(); 970 } 971 972 /// BinopPrecedence - This holds the precedence for each binary operator that is 973 /// defined. 974 static std::map<char, int> BinopPrecedence; 975 976 /// GetTokPrecedence - Get the precedence of the pending binary operator token. 977 static int GetTokPrecedence() { 978 if (!isascii(CurTok)) 979 return -1; 980 981 // Make sure it's a declared binop. 982 int TokPrec = BinopPrecedence[CurTok]; 983 if (TokPrec <= 0) return -1; 984 return TokPrec; 985 } 986 987 /// Error* - These are little helper functions for error handling. 988 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} 989 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } 990 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } 991 992 static ExprAST *ParseExpression(); 993 994 /// identifierexpr 995 /// ::= identifier 996 /// ::= identifier '(' expression* ')' 997 static ExprAST *ParseIdentifierExpr() { 998 std::string IdName = IdentifierStr; 999 1000 getNextToken(); // eat identifier. 1001 1002 if (CurTok != '(') // Simple variable ref. 1003 return new VariableExprAST(IdName); 1004 1005 // Call. 1006 getNextToken(); // eat ( 1007 std::vector<ExprAST*> Args; 1008 if (CurTok != ')') { 1009 while (1) { 1010 ExprAST *Arg = ParseExpression(); 1011 if (!Arg) return 0; 1012 Args.push_back(Arg); 1013 1014 if (CurTok == ')') break; 1015 1016 if (CurTok != ',') 1017 return Error("Expected ')' or ',' in argument list"); 1018 getNextToken(); 1019 } 1020 } 1021 1022 // Eat the ')'. 1023 getNextToken(); 1024 1025 return new CallExprAST(IdName, Args); 1026 } 1027 1028 /// numberexpr ::= number 1029 static ExprAST *ParseNumberExpr() { 1030 ExprAST *Result = new NumberExprAST(NumVal); 1031 getNextToken(); // consume the number 1032 return Result; 1033 } 1034 1035 /// parenexpr ::= '(' expression ')' 1036 static ExprAST *ParseParenExpr() { 1037 getNextToken(); // eat (. 1038 ExprAST *V = ParseExpression(); 1039 if (!V) return 0; 1040 1041 if (CurTok != ')') 1042 return Error("expected ')'"); 1043 getNextToken(); // eat ). 1044 return V; 1045 } 1046 1047 /// ifexpr ::= 'if' expression 'then' expression 'else' expression 1048 static ExprAST *ParseIfExpr() { 1049 getNextToken(); // eat the if. 1050 1051 // condition. 1052 ExprAST *Cond = ParseExpression(); 1053 if (!Cond) return 0; 1054 1055 if (CurTok != tok_then) 1056 return Error("expected then"); 1057 getNextToken(); // eat the then 1058 1059 ExprAST *Then = ParseExpression(); 1060 if (Then == 0) return 0; 1061 1062 if (CurTok != tok_else) 1063 return Error("expected else"); 1064 1065 getNextToken(); 1066 1067 ExprAST *Else = ParseExpression(); 1068 if (!Else) return 0; 1069 1070 return new IfExprAST(Cond, Then, Else); 1071 } 1072 1073 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression 1074 static ExprAST *ParseForExpr() { 1075 getNextToken(); // eat the for. 1076 1077 if (CurTok != tok_identifier) 1078 return Error("expected identifier after for"); 1079 1080 std::string IdName = IdentifierStr; 1081 getNextToken(); // eat identifier. 1082 1083 if (CurTok != '=') 1084 return Error("expected '=' after for"); 1085 getNextToken(); // eat '='. 1086 1087 1088 ExprAST *Start = ParseExpression(); 1089 if (Start == 0) return 0; 1090 if (CurTok != ',') 1091 return Error("expected ',' after for start value"); 1092 getNextToken(); 1093 1094 ExprAST *End = ParseExpression(); 1095 if (End == 0) return 0; 1096 1097 // The step value is optional. 1098 ExprAST *Step = 0; 1099 if (CurTok == ',') { 1100 getNextToken(); 1101 Step = ParseExpression(); 1102 if (Step == 0) return 0; 1103 } 1104 1105 if (CurTok != tok_in) 1106 return Error("expected 'in' after for"); 1107 getNextToken(); // eat 'in'. 1108 1109 ExprAST *Body = ParseExpression(); 1110 if (Body == 0) return 0; 1111 1112 return new ForExprAST(IdName, Start, End, Step, Body); 1113 } 1114 1115 /// primary 1116 /// ::= identifierexpr 1117 /// ::= numberexpr 1118 /// ::= parenexpr 1119 /// ::= ifexpr 1120 /// ::= forexpr 1121 static ExprAST *ParsePrimary() { 1122 switch (CurTok) { 1123 default: return Error("unknown token when expecting an expression"); 1124 case tok_identifier: return ParseIdentifierExpr(); 1125 case tok_number: return ParseNumberExpr(); 1126 case '(': return ParseParenExpr(); 1127 case tok_if: return ParseIfExpr(); 1128 case tok_for: return ParseForExpr(); 1129 } 1130 } 1131 1132 /// unary 1133 /// ::= primary 1134 /// ::= '!' unary 1135 static ExprAST *ParseUnary() { 1136 // If the current token is not an operator, it must be a primary expr. 1137 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') 1138 return ParsePrimary(); 1139 1140 // If this is a unary operator, read it. 1141 int Opc = CurTok; 1142 getNextToken(); 1143 if (ExprAST *Operand = ParseUnary()) 1144 return new UnaryExprAST(Opc, Operand); 1145 return 0; 1146 } 1147 1148 /// binoprhs 1149 /// ::= ('+' unary)* 1150 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { 1151 // If this is a binop, find its precedence. 1152 while (1) { 1153 int TokPrec = GetTokPrecedence(); 1154 1155 // If this is a binop that binds at least as tightly as the current binop, 1156 // consume it, otherwise we are done. 1157 if (TokPrec < ExprPrec) 1158 return LHS; 1159 1160 // Okay, we know this is a binop. 1161 int BinOp = CurTok; 1162 getNextToken(); // eat binop 1163 1164 // Parse the unary expression after the binary operator. 1165 ExprAST *RHS = ParseUnary(); 1166 if (!RHS) return 0; 1167 1168 // If BinOp binds less tightly with RHS than the operator after RHS, let 1169 // the pending operator take RHS as its LHS. 1170 int NextPrec = GetTokPrecedence(); 1171 if (TokPrec < NextPrec) { 1172 RHS = ParseBinOpRHS(TokPrec+1, RHS); 1173 if (RHS == 0) return 0; 1174 } 1175 1176 // Merge LHS/RHS. 1177 LHS = new BinaryExprAST(BinOp, LHS, RHS); 1178 } 1179 } 1180 1181 /// expression 1182 /// ::= unary binoprhs 1183 /// 1184 static ExprAST *ParseExpression() { 1185 ExprAST *LHS = ParseUnary(); 1186 if (!LHS) return 0; 1187 1188 return ParseBinOpRHS(0, LHS); 1189 } 1190 1191 /// prototype 1192 /// ::= id '(' id* ')' 1193 /// ::= binary LETTER number? (id, id) 1194 /// ::= unary LETTER (id) 1195 static PrototypeAST *ParsePrototype() { 1196 std::string FnName; 1197 1198 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. 1199 unsigned BinaryPrecedence = 30; 1200 1201 switch (CurTok) { 1202 default: 1203 return ErrorP("Expected function name in prototype"); 1204 case tok_identifier: 1205 FnName = IdentifierStr; 1206 Kind = 0; 1207 getNextToken(); 1208 break; 1209 case tok_unary: 1210 getNextToken(); 1211 if (!isascii(CurTok)) 1212 return ErrorP("Expected unary operator"); 1213 FnName = "unary"; 1214 FnName += (char)CurTok; 1215 Kind = 1; 1216 getNextToken(); 1217 break; 1218 case tok_binary: 1219 getNextToken(); 1220 if (!isascii(CurTok)) 1221 return ErrorP("Expected binary operator"); 1222 FnName = "binary"; 1223 FnName += (char)CurTok; 1224 Kind = 2; 1225 getNextToken(); 1226 1227 // Read the precedence if present. 1228 if (CurTok == tok_number) { 1229 if (NumVal < 1 || NumVal > 100) 1230 return ErrorP("Invalid precedecnce: must be 1..100"); 1231 BinaryPrecedence = (unsigned)NumVal; 1232 getNextToken(); 1233 } 1234 break; 1235 } 1236 1237 if (CurTok != '(') 1238 return ErrorP("Expected '(' in prototype"); 1239 1240 std::vector<std::string> ArgNames; 1241 while (getNextToken() == tok_identifier) 1242 ArgNames.push_back(IdentifierStr); 1243 if (CurTok != ')') 1244 return ErrorP("Expected ')' in prototype"); 1245 1246 // success. 1247 getNextToken(); // eat ')'. 1248 1249 // Verify right number of names for operator. 1250 if (Kind && ArgNames.size() != Kind) 1251 return ErrorP("Invalid number of operands for operator"); 1252 1253 return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence); 1254 } 1255 1256 /// definition ::= 'def' prototype expression 1257 static FunctionAST *ParseDefinition() { 1258 getNextToken(); // eat def. 1259 PrototypeAST *Proto = ParsePrototype(); 1260 if (Proto == 0) return 0; 1261 1262 if (ExprAST *E = ParseExpression()) 1263 return new FunctionAST(Proto, E); 1264 return 0; 1265 } 1266 1267 /// toplevelexpr ::= expression 1268 static FunctionAST *ParseTopLevelExpr() { 1269 if (ExprAST *E = ParseExpression()) { 1270 // Make an anonymous proto. 1271 PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); 1272 return new FunctionAST(Proto, E); 1273 } 1274 return 0; 1275 } 1276 1277 /// external ::= 'extern' prototype 1278 static PrototypeAST *ParseExtern() { 1279 getNextToken(); // eat extern. 1280 return ParsePrototype(); 1281 } 1282 1283 //===----------------------------------------------------------------------===// 1284 // Code Generation 1285 //===----------------------------------------------------------------------===// 1286 1287 static Module *TheModule; 1288 static IRBuilder<> Builder(getGlobalContext()); 1289 static std::map<std::string, Value*> NamedValues; 1290 static FunctionPassManager *TheFPM; 1291 1292 Value *ErrorV(const char *Str) { Error(Str); return 0; } 1293 1294 Value *NumberExprAST::Codegen() { 1295 return ConstantFP::get(getGlobalContext(), APFloat(Val)); 1296 } 1297 1298 Value *VariableExprAST::Codegen() { 1299 // Look this variable up in the function. 1300 Value *V = NamedValues[Name]; 1301 return V ? V : ErrorV("Unknown variable name"); 1302 } 1303 1304 Value *UnaryExprAST::Codegen() { 1305 Value *OperandV = Operand->Codegen(); 1306 if (OperandV == 0) return 0; 1307 1308 Function *F = TheModule->getFunction(std::string("unary")+Opcode); 1309 if (F == 0) 1310 return ErrorV("Unknown unary operator"); 1311 1312 return Builder.CreateCall(F, OperandV, "unop"); 1313 } 1314 1315 Value *BinaryExprAST::Codegen() { 1316 Value *L = LHS->Codegen(); 1317 Value *R = RHS->Codegen(); 1318 if (L == 0 || R == 0) return 0; 1319 1320 switch (Op) { 1321 case '+': return Builder.CreateFAdd(L, R, "addtmp"); 1322 case '-': return Builder.CreateFSub(L, R, "subtmp"); 1323 case '*': return Builder.CreateFMul(L, R, "multmp"); 1324 case '<': 1325 L = Builder.CreateFCmpULT(L, R, "cmptmp"); 1326 // Convert bool 0/1 to double 0.0 or 1.0 1327 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), 1328 "booltmp"); 1329 default: break; 1330 } 1331 1332 // If it wasn't a builtin binary operator, it must be a user defined one. Emit 1333 // a call to it. 1334 Function *F = TheModule->getFunction(std::string("binary")+Op); 1335 assert(F && "binary operator not found!"); 1336 1337 Value *Ops[2] = { L, R }; 1338 return Builder.CreateCall(F, Ops, "binop"); 1339 } 1340 1341 Value *CallExprAST::Codegen() { 1342 // Look up the name in the global module table. 1343 Function *CalleeF = TheModule->getFunction(Callee); 1344 if (CalleeF == 0) 1345 return ErrorV("Unknown function referenced"); 1346 1347 // If argument mismatch error. 1348 if (CalleeF->arg_size() != Args.size()) 1349 return ErrorV("Incorrect # arguments passed"); 1350 1351 std::vector<Value*> ArgsV; 1352 for (unsigned i = 0, e = Args.size(); i != e; ++i) { 1353 ArgsV.push_back(Args[i]->Codegen()); 1354 if (ArgsV.back() == 0) return 0; 1355 } 1356 1357 return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); 1358 } 1359 1360 Value *IfExprAST::Codegen() { 1361 Value *CondV = Cond->Codegen(); 1362 if (CondV == 0) return 0; 1363 1364 // Convert condition to a bool by comparing equal to 0.0. 1365 CondV = Builder.CreateFCmpONE(CondV, 1366 ConstantFP::get(getGlobalContext(), APFloat(0.0)), 1367 "ifcond"); 1368 1369 Function *TheFunction = Builder.GetInsertBlock()->getParent(); 1370 1371 // Create blocks for the then and else cases. Insert the 'then' block at the 1372 // end of the function. 1373 BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); 1374 BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); 1375 BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); 1376 1377 Builder.CreateCondBr(CondV, ThenBB, ElseBB); 1378 1379 // Emit then value. 1380 Builder.SetInsertPoint(ThenBB); 1381 1382 Value *ThenV = Then->Codegen(); 1383 if (ThenV == 0) return 0; 1384 1385 Builder.CreateBr(MergeBB); 1386 // Codegen of 'Then' can change the current block, update ThenBB for the PHI. 1387 ThenBB = Builder.GetInsertBlock(); 1388 1389 // Emit else block. 1390 TheFunction->getBasicBlockList().push_back(ElseBB); 1391 Builder.SetInsertPoint(ElseBB); 1392 1393 Value *ElseV = Else->Codegen(); 1394 if (ElseV == 0) return 0; 1395 1396 Builder.CreateBr(MergeBB); 1397 // Codegen of 'Else' can change the current block, update ElseBB for the PHI. 1398 ElseBB = Builder.GetInsertBlock(); 1399 1400 // Emit merge block. 1401 TheFunction->getBasicBlockList().push_back(MergeBB); 1402 Builder.SetInsertPoint(MergeBB); 1403 PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, 1404 "iftmp"); 1405 1406 PN->addIncoming(ThenV, ThenBB); 1407 PN->addIncoming(ElseV, ElseBB); 1408 return PN; 1409 } 1410 1411 Value *ForExprAST::Codegen() { 1412 // Output this as: 1413 // ... 1414 // start = startexpr 1415 // goto loop 1416 // loop: 1417 // variable = phi [start, loopheader], [nextvariable, loopend] 1418 // ... 1419 // bodyexpr 1420 // ... 1421 // loopend: 1422 // step = stepexpr 1423 // nextvariable = variable + step 1424 // endcond = endexpr 1425 // br endcond, loop, endloop 1426 // outloop: 1427 1428 // Emit the start code first, without 'variable' in scope. 1429 Value *StartVal = Start->Codegen(); 1430 if (StartVal == 0) return 0; 1431 1432 // Make the new basic block for the loop header, inserting after current 1433 // block. 1434 Function *TheFunction = Builder.GetInsertBlock()->getParent(); 1435 BasicBlock *PreheaderBB = Builder.GetInsertBlock(); 1436 BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); 1437 1438 // Insert an explicit fall through from the current block to the LoopBB. 1439 Builder.CreateBr(LoopBB); 1440 1441 // Start insertion in LoopBB. 1442 Builder.SetInsertPoint(LoopBB); 1443 1444 // Start the PHI node with an entry for Start. 1445 PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str()); 1446 Variable->addIncoming(StartVal, PreheaderBB); 1447 1448 // Within the loop, the variable is defined equal to the PHI node. If it 1449 // shadows an existing variable, we have to restore it, so save it now. 1450 Value *OldVal = NamedValues[VarName]; 1451 NamedValues[VarName] = Variable; 1452 1453 // Emit the body of the loop. This, like any other expr, can change the 1454 // current BB. Note that we ignore the value computed by the body, but don't 1455 // allow an error. 1456 if (Body->Codegen() == 0) 1457 return 0; 1458 1459 // Emit the step value. 1460 Value *StepVal; 1461 if (Step) { 1462 StepVal = Step->Codegen(); 1463 if (StepVal == 0) return 0; 1464 } else { 1465 // If not specified, use 1.0. 1466 StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); 1467 } 1468 1469 Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar"); 1470 1471 // Compute the end condition. 1472 Value *EndCond = End->Codegen(); 1473 if (EndCond == 0) return EndCond; 1474 1475 // Convert condition to a bool by comparing equal to 0.0. 1476 EndCond = Builder.CreateFCmpONE(EndCond, 1477 ConstantFP::get(getGlobalContext(), APFloat(0.0)), 1478 "loopcond"); 1479 1480 // Create the "after loop" block and insert it. 1481 BasicBlock *LoopEndBB = Builder.GetInsertBlock(); 1482 BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); 1483 1484 // Insert the conditional branch into the end of LoopEndBB. 1485 Builder.CreateCondBr(EndCond, LoopBB, AfterBB); 1486 1487 // Any new code will be inserted in AfterBB. 1488 Builder.SetInsertPoint(AfterBB); 1489 1490 // Add a new entry to the PHI node for the backedge. 1491 Variable->addIncoming(NextVar, LoopEndBB); 1492 1493 // Restore the unshadowed variable. 1494 if (OldVal) 1495 NamedValues[VarName] = OldVal; 1496 else 1497 NamedValues.erase(VarName); 1498 1499 1500 // for expr always returns 0.0. 1501 return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); 1502 } 1503 1504 Function *PrototypeAST::Codegen() { 1505 // Make the function type: double(double,double) etc. 1506 std::vector<Type*> Doubles(Args.size(), 1507 Type::getDoubleTy(getGlobalContext())); 1508 FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), 1509 Doubles, false); 1510 1511 Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); 1512 1513 // If F conflicted, there was already something named 'Name'. If it has a 1514 // body, don't allow redefinition or reextern. 1515 if (F->getName() != Name) { 1516 // Delete the one we just made and get the existing one. 1517 F->eraseFromParent(); 1518 F = TheModule->getFunction(Name); 1519 1520 // If F already has a body, reject this. 1521 if (!F->empty()) { 1522 ErrorF("redefinition of function"); 1523 return 0; 1524 } 1525 1526 // If F took a different number of args, reject. 1527 if (F->arg_size() != Args.size()) { 1528 ErrorF("redefinition of function with different # args"); 1529 return 0; 1530 } 1531 } 1532 1533 // Set names for all arguments. 1534 unsigned Idx = 0; 1535 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); 1536 ++AI, ++Idx) { 1537 AI->setName(Args[Idx]); 1538 1539 // Add arguments to variable symbol table. 1540 NamedValues[Args[Idx]] = AI; 1541 } 1542 1543 return F; 1544 } 1545 1546 Function *FunctionAST::Codegen() { 1547 NamedValues.clear(); 1548 1549 Function *TheFunction = Proto->Codegen(); 1550 if (TheFunction == 0) 1551 return 0; 1552 1553 // If this is an operator, install it. 1554 if (Proto->isBinaryOp()) 1555 BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence(); 1556 1557 // Create a new basic block to start insertion into. 1558 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); 1559 Builder.SetInsertPoint(BB); 1560 1561 if (Value *RetVal = Body->Codegen()) { 1562 // Finish off the function. 1563 Builder.CreateRet(RetVal); 1564 1565 // Validate the generated code, checking for consistency. 1566 verifyFunction(*TheFunction); 1567 1568 // Optimize the function. 1569 TheFPM->run(*TheFunction); 1570 1571 return TheFunction; 1572 } 1573 1574 // Error reading body, remove function. 1575 TheFunction->eraseFromParent(); 1576 1577 if (Proto->isBinaryOp()) 1578 BinopPrecedence.erase(Proto->getOperatorName()); 1579 return 0; 1580 } 1581 1582 //===----------------------------------------------------------------------===// 1583 // Top-Level parsing and JIT Driver 1584 //===----------------------------------------------------------------------===// 1585 1586 static ExecutionEngine *TheExecutionEngine; 1587 1588 static void HandleDefinition() { 1589 if (FunctionAST *F = ParseDefinition()) { 1590 if (Function *LF = F->Codegen()) { 1591 fprintf(stderr, "Read function definition:"); 1592 LF->dump(); 1593 } 1594 } else { 1595 // Skip token for error recovery. 1596 getNextToken(); 1597 } 1598 } 1599 1600 static void HandleExtern() { 1601 if (PrototypeAST *P = ParseExtern()) { 1602 if (Function *F = P->Codegen()) { 1603 fprintf(stderr, "Read extern: "); 1604 F->dump(); 1605 } 1606 } else { 1607 // Skip token for error recovery. 1608 getNextToken(); 1609 } 1610 } 1611 1612 static void HandleTopLevelExpression() { 1613 // Evaluate a top-level expression into an anonymous function. 1614 if (FunctionAST *F = ParseTopLevelExpr()) { 1615 if (Function *LF = F->Codegen()) { 1616 // JIT the function, returning a function pointer. 1617 void *FPtr = TheExecutionEngine->getPointerToFunction(LF); 1618 1619 // Cast it to the right type (takes no arguments, returns a double) so we 1620 // can call it as a native function. 1621 double (*FP)() = (double (*)())(intptr_t)FPtr; 1622 fprintf(stderr, "Evaluated to %f\n", FP()); 1623 } 1624 } else { 1625 // Skip token for error recovery. 1626 getNextToken(); 1627 } 1628 } 1629 1630 /// top ::= definition | external | expression | ';' 1631 static void MainLoop() { 1632 while (1) { 1633 fprintf(stderr, "ready> "); 1634 switch (CurTok) { 1635 case tok_eof: return; 1636 case ';': getNextToken(); break; // ignore top-level semicolons. 1637 case tok_def: HandleDefinition(); break; 1638 case tok_extern: HandleExtern(); break; 1639 default: HandleTopLevelExpression(); break; 1640 } 1641 } 1642 } 1643 1644 //===----------------------------------------------------------------------===// 1645 // "Library" functions that can be "extern'd" from user code. 1646 //===----------------------------------------------------------------------===// 1647 1648 /// putchard - putchar that takes a double and returns 0. 1649 extern "C" 1650 double putchard(double X) { 1651 putchar((char)X); 1652 return 0; 1653 } 1654 1655 /// printd - printf that takes a double prints it as "%f\n", returning 0. 1656 extern "C" 1657 double printd(double X) { 1658 printf("%f\n", X); 1659 return 0; 1660 } 1661 1662 //===----------------------------------------------------------------------===// 1663 // Main driver code. 1664 //===----------------------------------------------------------------------===// 1665 1666 int main() { 1667 InitializeNativeTarget(); 1668 LLVMContext &Context = getGlobalContext(); 1669 1670 // Install standard binary operators. 1671 // 1 is lowest precedence. 1672 BinopPrecedence['<'] = 10; 1673 BinopPrecedence['+'] = 20; 1674 BinopPrecedence['-'] = 20; 1675 BinopPrecedence['*'] = 40; // highest. 1676 1677 // Prime the first token. 1678 fprintf(stderr, "ready> "); 1679 getNextToken(); 1680 1681 // Make the module, which holds all the code. 1682 TheModule = new Module("my cool jit", Context); 1683 1684 // Create the JIT. This takes ownership of the module. 1685 std::string ErrStr; 1686 TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create(); 1687 if (!TheExecutionEngine) { 1688 fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str()); 1689 exit(1); 1690 } 1691 1692 FunctionPassManager OurFPM(TheModule); 1693 1694 // Set up the optimizer pipeline. Start with registering info about how the 1695 // target lays out data structures. 1696 OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout())); 1697 // Provide basic AliasAnalysis support for GVN. 1698 OurFPM.add(createBasicAliasAnalysisPass()); 1699 // Do simple "peephole" optimizations and bit-twiddling optzns. 1700 OurFPM.add(createInstructionCombiningPass()); 1701 // Reassociate expressions. 1702 OurFPM.add(createReassociatePass()); 1703 // Eliminate Common SubExpressions. 1704 OurFPM.add(createGVNPass()); 1705 // Simplify the control flow graph (deleting unreachable blocks, etc). 1706 OurFPM.add(createCFGSimplificationPass()); 1707 1708 OurFPM.doInitialization(); 1709 1710 // Set the global so the code gen can use this. 1711 TheFPM = &OurFPM; 1712 1713 // Run the main "interpreter loop" now. 1714 MainLoop(); 1715 1716 TheFPM = 0; 1717 1718 // Print out all of the generated code. 1719 TheModule->dump(); 1720 1721 return 0; 1722 } 1723 1724 `Next: Extending the language: mutable variables / SSA 1725 construction <LangImpl7.html>`_ 1726 1727