1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 2 "http://www.w3.org/TR/html4/strict.dtd"> 3 4 <html> 5 <head> 6 <title>Kaleidoscope: Extending the Language: User-defined Operators</title> 7 <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 8 <meta name="author" content="Chris Lattner"> 9 <link rel="stylesheet" href="../llvm.css" type="text/css"> 10 </head> 11 12 <body> 13 14 <h1>Kaleidoscope: Extending the Language: User-defined Operators</h1> 15 16 <ul> 17 <li><a href="index.html">Up to Tutorial Index</a></li> 18 <li>Chapter 6 19 <ol> 20 <li><a href="#intro">Chapter 6 Introduction</a></li> 21 <li><a href="#idea">User-defined Operators: the Idea</a></li> 22 <li><a href="#binary">User-defined Binary Operators</a></li> 23 <li><a href="#unary">User-defined Unary Operators</a></li> 24 <li><a href="#example">Kicking the Tires</a></li> 25 <li><a href="#code">Full Code Listing</a></li> 26 </ol> 27 </li> 28 <li><a href="LangImpl7.html">Chapter 7</a>: Extending the Language: Mutable 29 Variables / SSA Construction</li> 30 </ul> 31 32 <div class="doc_author"> 33 <p>Written by <a href="mailto:sabre (a] nondot.org">Chris Lattner</a></p> 34 </div> 35 36 <!-- *********************************************************************** --> 37 <h2><a name="intro">Chapter 6 Introduction</a></h2> 38 <!-- *********************************************************************** --> 39 40 <div> 41 42 <p>Welcome to Chapter 6 of the "<a href="index.html">Implementing a language 43 with LLVM</a>" tutorial. At this point in our tutorial, we now have a fully 44 functional language that is fairly minimal, but also useful. There 45 is still one big problem with it, however. Our language doesn't have many 46 useful operators (like division, logical negation, or even any comparisons 47 besides less-than).</p> 48 49 <p>This chapter of the tutorial takes a wild digression into adding user-defined 50 operators to the simple and beautiful Kaleidoscope language. This digression now gives 51 us a simple and ugly language in some ways, but also a powerful one at the same time. 52 One of the great things about creating your own language is that you get to 53 decide what is good or bad. In this tutorial we'll assume that it is okay to 54 use this as a way to show some interesting parsing techniques.</p> 55 56 <p>At the end of this tutorial, we'll run through an example Kaleidoscope 57 application that <a href="#example">renders the Mandelbrot set</a>. This gives 58 an example of what you can build with Kaleidoscope and its feature set.</p> 59 60 </div> 61 62 <!-- *********************************************************************** --> 63 <h2><a name="idea">User-defined Operators: the Idea</a></h2> 64 <!-- *********************************************************************** --> 65 66 <div> 67 68 <p> 69 The "operator overloading" that we will add to Kaleidoscope is more general than 70 languages like C++. In C++, you are only allowed to redefine existing 71 operators: you can't programatically change the grammar, introduce new 72 operators, change precedence levels, etc. In this chapter, we will add this 73 capability to Kaleidoscope, which will let the user round out the set of 74 operators that are supported.</p> 75 76 <p>The point of going into user-defined operators in a tutorial like this is to 77 show the power and flexibility of using a hand-written parser. Thus far, the parser 78 we have been implementing uses recursive descent for most parts of the grammar and 79 operator precedence parsing for the expressions. See <a 80 href="LangImpl2.html">Chapter 2</a> for details. Without using operator 81 precedence parsing, it would be very difficult to allow the programmer to 82 introduce new operators into the grammar: the grammar is dynamically extensible 83 as the JIT runs.</p> 84 85 <p>The two specific features we'll add are programmable unary operators (right 86 now, Kaleidoscope has no unary operators at all) as well as binary operators. 87 An example of this is:</p> 88 89 <div class="doc_code"> 90 <pre> 91 # Logical unary not. 92 def unary!(v) 93 if v then 94 0 95 else 96 1; 97 98 # Define > with the same precedence as <. 99 def binary> 10 (LHS RHS) 100 RHS < LHS; 101 102 # Binary "logical or", (note that it does not "short circuit") 103 def binary| 5 (LHS RHS) 104 if LHS then 105 1 106 else if RHS then 107 1 108 else 109 0; 110 111 # Define = with slightly lower precedence than relationals. 112 def binary= 9 (LHS RHS) 113 !(LHS < RHS | LHS > RHS); 114 </pre> 115 </div> 116 117 <p>Many languages aspire to being able to implement their standard runtime 118 library in the language itself. In Kaleidoscope, we can implement significant 119 parts of the language in the library!</p> 120 121 <p>We will break down implementation of these features into two parts: 122 implementing support for user-defined binary operators and adding unary 123 operators.</p> 124 125 </div> 126 127 <!-- *********************************************************************** --> 128 <h2><a name="binary">User-defined Binary Operators</a></h2> 129 <!-- *********************************************************************** --> 130 131 <div> 132 133 <p>Adding support for user-defined binary operators is pretty simple with our 134 current framework. We'll first add support for the unary/binary keywords:</p> 135 136 <div class="doc_code"> 137 <pre> 138 enum Token { 139 ... 140 <b>// operators 141 tok_binary = -11, tok_unary = -12</b> 142 }; 143 ... 144 static int gettok() { 145 ... 146 if (IdentifierStr == "for") return tok_for; 147 if (IdentifierStr == "in") return tok_in; 148 <b>if (IdentifierStr == "binary") return tok_binary; 149 if (IdentifierStr == "unary") return tok_unary;</b> 150 return tok_identifier; 151 </pre> 152 </div> 153 154 <p>This just adds lexer support for the unary and binary keywords, like we 155 did in <a href="LangImpl5.html#iflexer">previous chapters</a>. One nice thing 156 about our current AST, is that we represent binary operators with full generalisation 157 by using their ASCII code as the opcode. For our extended operators, we'll use this 158 same representation, so we don't need any new AST or parser support.</p> 159 160 <p>On the other hand, we have to be able to represent the definitions of these 161 new operators, in the "def binary| 5" part of the function definition. In our 162 grammar so far, the "name" for the function definition is parsed as the 163 "prototype" production and into the <tt>PrototypeAST</tt> AST node. To 164 represent our new user-defined operators as prototypes, we have to extend 165 the <tt>PrototypeAST</tt> AST node like this:</p> 166 167 <div class="doc_code"> 168 <pre> 169 /// PrototypeAST - This class represents the "prototype" for a function, 170 /// which captures its argument names as well as if it is an operator. 171 class PrototypeAST { 172 std::string Name; 173 std::vector<std::string> Args; 174 <b>bool isOperator; 175 unsigned Precedence; // Precedence if a binary op.</b> 176 public: 177 PrototypeAST(const std::string &name, const std::vector<std::string> &args, 178 <b>bool isoperator = false, unsigned prec = 0</b>) 179 : Name(name), Args(args), <b>isOperator(isoperator), Precedence(prec)</b> {} 180 181 <b>bool isUnaryOp() const { return isOperator && Args.size() == 1; } 182 bool isBinaryOp() const { return isOperator && Args.size() == 2; } 183 184 char getOperatorName() const { 185 assert(isUnaryOp() || isBinaryOp()); 186 return Name[Name.size()-1]; 187 } 188 189 unsigned getBinaryPrecedence() const { return Precedence; }</b> 190 191 Function *Codegen(); 192 }; 193 </pre> 194 </div> 195 196 <p>Basically, in addition to knowing a name for the prototype, we now keep track 197 of whether it was an operator, and if it was, what precedence level the operator 198 is at. The precedence is only used for binary operators (as you'll see below, 199 it just doesn't apply for unary operators). Now that we have a way to represent 200 the prototype for a user-defined operator, we need to parse it:</p> 201 202 <div class="doc_code"> 203 <pre> 204 /// prototype 205 /// ::= id '(' id* ')' 206 <b>/// ::= binary LETTER number? (id, id)</b> 207 static PrototypeAST *ParsePrototype() { 208 std::string FnName; 209 210 <b>unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. 211 unsigned BinaryPrecedence = 30;</b> 212 213 switch (CurTok) { 214 default: 215 return ErrorP("Expected function name in prototype"); 216 case tok_identifier: 217 FnName = IdentifierStr; 218 Kind = 0; 219 getNextToken(); 220 break; 221 <b>case tok_binary: 222 getNextToken(); 223 if (!isascii(CurTok)) 224 return ErrorP("Expected binary operator"); 225 FnName = "binary"; 226 FnName += (char)CurTok; 227 Kind = 2; 228 getNextToken(); 229 230 // Read the precedence if present. 231 if (CurTok == tok_number) { 232 if (NumVal < 1 || NumVal > 100) 233 return ErrorP("Invalid precedecnce: must be 1..100"); 234 BinaryPrecedence = (unsigned)NumVal; 235 getNextToken(); 236 } 237 break;</b> 238 } 239 240 if (CurTok != '(') 241 return ErrorP("Expected '(' in prototype"); 242 243 std::vector<std::string> ArgNames; 244 while (getNextToken() == tok_identifier) 245 ArgNames.push_back(IdentifierStr); 246 if (CurTok != ')') 247 return ErrorP("Expected ')' in prototype"); 248 249 // success. 250 getNextToken(); // eat ')'. 251 252 <b>// Verify right number of names for operator. 253 if (Kind && ArgNames.size() != Kind) 254 return ErrorP("Invalid number of operands for operator"); 255 256 return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);</b> 257 } 258 </pre> 259 </div> 260 261 <p>This is all fairly straightforward parsing code, and we have already seen 262 a lot of similar code in the past. One interesting part about the code above is 263 the couple lines that set up <tt>FnName</tt> for binary operators. This builds names 264 like "binary@" for a newly defined "@" operator. This then takes advantage of the 265 fact that symbol names in the LLVM symbol table are allowed to have any character in 266 them, including embedded nul characters.</p> 267 268 <p>The next interesting thing to add, is codegen support for these binary operators. 269 Given our current structure, this is a simple addition of a default case for our 270 existing binary operator node:</p> 271 272 <div class="doc_code"> 273 <pre> 274 Value *BinaryExprAST::Codegen() { 275 Value *L = LHS->Codegen(); 276 Value *R = RHS->Codegen(); 277 if (L == 0 || R == 0) return 0; 278 279 switch (Op) { 280 case '+': return Builder.CreateFAdd(L, R, "addtmp"); 281 case '-': return Builder.CreateFSub(L, R, "subtmp"); 282 case '*': return Builder.CreateFMul(L, R, "multmp"); 283 case '<': 284 L = Builder.CreateFCmpULT(L, R, "cmptmp"); 285 // Convert bool 0/1 to double 0.0 or 1.0 286 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), 287 "booltmp"); 288 <b>default: break;</b> 289 } 290 291 <b>// If it wasn't a builtin binary operator, it must be a user defined one. Emit 292 // a call to it. 293 Function *F = TheModule->getFunction(std::string("binary")+Op); 294 assert(F && "binary operator not found!"); 295 296 Value *Ops[2] = { L, R }; 297 return Builder.CreateCall(F, Ops, "binop");</b> 298 } 299 300 </pre> 301 </div> 302 303 <p>As you can see above, the new code is actually really simple. It just does 304 a lookup for the appropriate operator in the symbol table and generates a 305 function call to it. Since user-defined operators are just built as normal 306 functions (because the "prototype" boils down to a function with the right 307 name) everything falls into place.</p> 308 309 <p>The final piece of code we are missing, is a bit of top-level magic:</p> 310 311 <div class="doc_code"> 312 <pre> 313 Function *FunctionAST::Codegen() { 314 NamedValues.clear(); 315 316 Function *TheFunction = Proto->Codegen(); 317 if (TheFunction == 0) 318 return 0; 319 320 <b>// If this is an operator, install it. 321 if (Proto->isBinaryOp()) 322 BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();</b> 323 324 // Create a new basic block to start insertion into. 325 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); 326 Builder.SetInsertPoint(BB); 327 328 if (Value *RetVal = Body->Codegen()) { 329 ... 330 </pre> 331 </div> 332 333 <p>Basically, before codegening a function, if it is a user-defined operator, we 334 register it in the precedence table. This allows the binary operator parsing 335 logic we already have in place to handle it. Since we are working on a fully-general operator precedence parser, this is all we need to do to "extend the grammar".</p> 336 337 <p>Now we have useful user-defined binary operators. This builds a lot 338 on the previous framework we built for other operators. Adding unary operators 339 is a bit more challenging, because we don't have any framework for it yet - lets 340 see what it takes.</p> 341 342 </div> 343 344 <!-- *********************************************************************** --> 345 <h2><a name="unary">User-defined Unary Operators</a></h2> 346 <!-- *********************************************************************** --> 347 348 <div> 349 350 <p>Since we don't currently support unary operators in the Kaleidoscope 351 language, we'll need to add everything to support them. Above, we added simple 352 support for the 'unary' keyword to the lexer. In addition to that, we need an 353 AST node:</p> 354 355 <div class="doc_code"> 356 <pre> 357 /// UnaryExprAST - Expression class for a unary operator. 358 class UnaryExprAST : public ExprAST { 359 char Opcode; 360 ExprAST *Operand; 361 public: 362 UnaryExprAST(char opcode, ExprAST *operand) 363 : Opcode(opcode), Operand(operand) {} 364 virtual Value *Codegen(); 365 }; 366 </pre> 367 </div> 368 369 <p>This AST node is very simple and obvious by now. It directly mirrors the 370 binary operator AST node, except that it only has one child. With this, we 371 need to add the parsing logic. Parsing a unary operator is pretty simple: we'll 372 add a new function to do it:</p> 373 374 <div class="doc_code"> 375 <pre> 376 /// unary 377 /// ::= primary 378 /// ::= '!' unary 379 static ExprAST *ParseUnary() { 380 // If the current token is not an operator, it must be a primary expr. 381 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') 382 return ParsePrimary(); 383 384 // If this is a unary operator, read it. 385 int Opc = CurTok; 386 getNextToken(); 387 if (ExprAST *Operand = ParseUnary()) 388 return new UnaryExprAST(Opc, Operand); 389 return 0; 390 } 391 </pre> 392 </div> 393 394 <p>The grammar we add is pretty straightforward here. If we see a unary 395 operator when parsing a primary operator, we eat the operator as a prefix and 396 parse the remaining piece as another unary operator. This allows us to handle 397 multiple unary operators (e.g. "!!x"). Note that unary operators can't have 398 ambiguous parses like binary operators can, so there is no need for precedence 399 information.</p> 400 401 <p>The problem with this function, is that we need to call ParseUnary from somewhere. 402 To do this, we change previous callers of ParsePrimary to call ParseUnary 403 instead:</p> 404 405 <div class="doc_code"> 406 <pre> 407 /// binoprhs 408 /// ::= ('+' unary)* 409 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { 410 ... 411 <b>// Parse the unary expression after the binary operator. 412 ExprAST *RHS = ParseUnary(); 413 if (!RHS) return 0;</b> 414 ... 415 } 416 /// expression 417 /// ::= unary binoprhs 418 /// 419 static ExprAST *ParseExpression() { 420 <b>ExprAST *LHS = ParseUnary();</b> 421 if (!LHS) return 0; 422 423 return ParseBinOpRHS(0, LHS); 424 } 425 </pre> 426 </div> 427 428 <p>With these two simple changes, we are now able to parse unary operators and build the 429 AST for them. Next up, we need to add parser support for prototypes, to parse 430 the unary operator prototype. We extend the binary operator code above 431 with:</p> 432 433 <div class="doc_code"> 434 <pre> 435 /// prototype 436 /// ::= id '(' id* ')' 437 /// ::= binary LETTER number? (id, id) 438 <b>/// ::= unary LETTER (id)</b> 439 static PrototypeAST *ParsePrototype() { 440 std::string FnName; 441 442 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. 443 unsigned BinaryPrecedence = 30; 444 445 switch (CurTok) { 446 default: 447 return ErrorP("Expected function name in prototype"); 448 case tok_identifier: 449 FnName = IdentifierStr; 450 Kind = 0; 451 getNextToken(); 452 break; 453 <b>case tok_unary: 454 getNextToken(); 455 if (!isascii(CurTok)) 456 return ErrorP("Expected unary operator"); 457 FnName = "unary"; 458 FnName += (char)CurTok; 459 Kind = 1; 460 getNextToken(); 461 break;</b> 462 case tok_binary: 463 ... 464 </pre> 465 </div> 466 467 <p>As with binary operators, we name unary operators with a name that includes 468 the operator character. This assists us at code generation time. Speaking of, 469 the final piece we need to add is codegen support for unary operators. It looks 470 like this:</p> 471 472 <div class="doc_code"> 473 <pre> 474 Value *UnaryExprAST::Codegen() { 475 Value *OperandV = Operand->Codegen(); 476 if (OperandV == 0) return 0; 477 478 Function *F = TheModule->getFunction(std::string("unary")+Opcode); 479 if (F == 0) 480 return ErrorV("Unknown unary operator"); 481 482 return Builder.CreateCall(F, OperandV, "unop"); 483 } 484 </pre> 485 </div> 486 487 <p>This code is similar to, but simpler than, the code for binary operators. It 488 is simpler primarily because it doesn't need to handle any predefined operators. 489 </p> 490 491 </div> 492 493 <!-- *********************************************************************** --> 494 <h2><a name="example">Kicking the Tires</a></h2> 495 <!-- *********************************************************************** --> 496 497 <div> 498 499 <p>It is somewhat hard to believe, but with a few simple extensions we've 500 covered in the last chapters, we have grown a real-ish language. With this, we 501 can do a lot of interesting things, including I/O, math, and a bunch of other 502 things. For example, we can now add a nice sequencing operator (printd is 503 defined to print out the specified value and a newline):</p> 504 505 <div class="doc_code"> 506 <pre> 507 ready> <b>extern printd(x);</b> 508 Read extern: 509 declare double @printd(double) 510 511 ready> <b>def binary : 1 (x y) 0; # Low-precedence operator that ignores operands.</b> 512 .. 513 ready> <b>printd(123) : printd(456) : printd(789);</b> 514 123.000000 515 456.000000 516 789.000000 517 Evaluated to 0.000000 518 </pre> 519 </div> 520 521 <p>We can also define a bunch of other "primitive" operations, such as:</p> 522 523 <div class="doc_code"> 524 <pre> 525 # Logical unary not. 526 def unary!(v) 527 if v then 528 0 529 else 530 1; 531 532 # Unary negate. 533 def unary-(v) 534 0-v; 535 536 # Define > with the same precedence as <. 537 def binary> 10 (LHS RHS) 538 RHS < LHS; 539 540 # Binary logical or, which does not short circuit. 541 def binary| 5 (LHS RHS) 542 if LHS then 543 1 544 else if RHS then 545 1 546 else 547 0; 548 549 # Binary logical and, which does not short circuit. 550 def binary& 6 (LHS RHS) 551 if !LHS then 552 0 553 else 554 !!RHS; 555 556 # Define = with slightly lower precedence than relationals. 557 def binary = 9 (LHS RHS) 558 !(LHS < RHS | LHS > RHS); 559 560 # Define ':' for sequencing: as a low-precedence operator that ignores operands 561 # and just returns the RHS. 562 def binary : 1 (x y) y; 563 </pre> 564 </div> 565 566 567 <p>Given the previous if/then/else support, we can also define interesting 568 functions for I/O. For example, the following prints out a character whose 569 "density" reflects the value passed in: the lower the value, the denser the 570 character:</p> 571 572 <div class="doc_code"> 573 <pre> 574 ready> 575 <b> 576 extern putchard(char) 577 def printdensity(d) 578 if d > 8 then 579 putchard(32) # ' ' 580 else if d > 4 then 581 putchard(46) # '.' 582 else if d > 2 then 583 putchard(43) # '+' 584 else 585 putchard(42); # '*'</b> 586 ... 587 ready> <b>printdensity(1): printdensity(2): printdensity(3): 588 printdensity(4): printdensity(5): printdensity(9): 589 putchard(10);</b> 590 **++. 591 Evaluated to 0.000000 592 </pre> 593 </div> 594 595 <p>Based on these simple primitive operations, we can start to define more 596 interesting things. For example, here's a little function that solves for the 597 number of iterations it takes a function in the complex plane to 598 converge:</p> 599 600 <div class="doc_code"> 601 <pre> 602 # Determine whether the specific location diverges. 603 # Solve for z = z^2 + c in the complex plane. 604 def mandleconverger(real imag iters creal cimag) 605 if iters > 255 | (real*real + imag*imag > 4) then 606 iters 607 else 608 mandleconverger(real*real - imag*imag + creal, 609 2*real*imag + cimag, 610 iters+1, creal, cimag); 611 612 # Return the number of iterations required for the iteration to escape 613 def mandleconverge(real imag) 614 mandleconverger(real, imag, 0, real, imag); 615 </pre> 616 </div> 617 618 <p>This "<code>z = z<sup>2</sup> + c</code>" function is a beautiful little 619 creature that is the basis for computation of 620 the <a href="http://en.wikipedia.org/wiki/Mandelbrot_set">Mandelbrot Set</a>. 621 Our <tt>mandelconverge</tt> function returns the number of iterations that it 622 takes for a complex orbit to escape, saturating to 255. This is not a very 623 useful function by itself, but if you plot its value over a two-dimensional 624 plane, you can see the Mandelbrot set. Given that we are limited to using 625 putchard here, our amazing graphical output is limited, but we can whip together 626 something using the density plotter above:</p> 627 628 <div class="doc_code"> 629 <pre> 630 # Compute and plot the mandlebrot set with the specified 2 dimensional range 631 # info. 632 def mandelhelp(xmin xmax xstep ymin ymax ystep) 633 for y = ymin, y < ymax, ystep in ( 634 (for x = xmin, x < xmax, xstep in 635 printdensity(mandleconverge(x,y))) 636 : putchard(10) 637 ) 638 639 # mandel - This is a convenient helper function for ploting the mandelbrot set 640 # from the specified position with the specified Magnification. 641 def mandel(realstart imagstart realmag imagmag) 642 mandelhelp(realstart, realstart+realmag*78, realmag, 643 imagstart, imagstart+imagmag*40, imagmag); 644 </pre> 645 </div> 646 647 <p>Given this, we can try plotting out the mandlebrot set! Lets try it out:</p> 648 649 <div class="doc_code"> 650 <pre> 651 ready> <b>mandel(-2.3, -1.3, 0.05, 0.07);</b> 652 *******************************+++++++++++************************************* 653 *************************+++++++++++++++++++++++******************************* 654 **********************+++++++++++++++++++++++++++++**************************** 655 *******************+++++++++++++++++++++.. ...++++++++************************* 656 *****************++++++++++++++++++++++.... ...+++++++++*********************** 657 ***************+++++++++++++++++++++++..... ...+++++++++********************* 658 **************+++++++++++++++++++++++.... ....+++++++++******************** 659 *************++++++++++++++++++++++...... .....++++++++******************* 660 ************+++++++++++++++++++++....... .......+++++++****************** 661 ***********+++++++++++++++++++.... ... .+++++++***************** 662 **********+++++++++++++++++....... .+++++++**************** 663 *********++++++++++++++........... ...+++++++*************** 664 ********++++++++++++............ ...++++++++************** 665 ********++++++++++... .......... .++++++++************** 666 *******+++++++++..... .+++++++++************* 667 *******++++++++...... ..+++++++++************* 668 *******++++++....... ..+++++++++************* 669 *******+++++...... ..+++++++++************* 670 *******.... .... ...+++++++++************* 671 *******.... . ...+++++++++************* 672 *******+++++...... ...+++++++++************* 673 *******++++++....... ..+++++++++************* 674 *******++++++++...... .+++++++++************* 675 *******+++++++++..... ..+++++++++************* 676 ********++++++++++... .......... .++++++++************** 677 ********++++++++++++............ ...++++++++************** 678 *********++++++++++++++.......... ...+++++++*************** 679 **********++++++++++++++++........ .+++++++**************** 680 **********++++++++++++++++++++.... ... ..+++++++**************** 681 ***********++++++++++++++++++++++....... .......++++++++***************** 682 ************+++++++++++++++++++++++...... ......++++++++****************** 683 **************+++++++++++++++++++++++.... ....++++++++******************** 684 ***************+++++++++++++++++++++++..... ...+++++++++********************* 685 *****************++++++++++++++++++++++.... ...++++++++*********************** 686 *******************+++++++++++++++++++++......++++++++************************* 687 *********************++++++++++++++++++++++.++++++++*************************** 688 *************************+++++++++++++++++++++++******************************* 689 ******************************+++++++++++++************************************ 690 ******************************************************************************* 691 ******************************************************************************* 692 ******************************************************************************* 693 Evaluated to 0.000000 694 ready> <b>mandel(-2, -1, 0.02, 0.04);</b> 695 **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++ 696 ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 697 *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++. 698 *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++... 699 *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++..... 700 ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........ 701 **************++++++++++++++++++++++++++++++++++++++++++++++++++++++........... 702 ************+++++++++++++++++++++++++++++++++++++++++++++++++++++.............. 703 ***********++++++++++++++++++++++++++++++++++++++++++++++++++........ . 704 **********++++++++++++++++++++++++++++++++++++++++++++++............. 705 ********+++++++++++++++++++++++++++++++++++++++++++.................. 706 *******+++++++++++++++++++++++++++++++++++++++....................... 707 ******+++++++++++++++++++++++++++++++++++........................... 708 *****++++++++++++++++++++++++++++++++............................ 709 *****++++++++++++++++++++++++++++............................... 710 ****++++++++++++++++++++++++++...... ......................... 711 ***++++++++++++++++++++++++......... ...... ........... 712 ***++++++++++++++++++++++............ 713 **+++++++++++++++++++++.............. 714 **+++++++++++++++++++................ 715 *++++++++++++++++++................. 716 *++++++++++++++++............ ... 717 *++++++++++++++.............. 718 *+++....++++................ 719 *.......... ........... 720 * 721 *.......... ........... 722 *+++....++++................ 723 *++++++++++++++.............. 724 *++++++++++++++++............ ... 725 *++++++++++++++++++................. 726 **+++++++++++++++++++................ 727 **+++++++++++++++++++++.............. 728 ***++++++++++++++++++++++............ 729 ***++++++++++++++++++++++++......... ...... ........... 730 ****++++++++++++++++++++++++++...... ......................... 731 *****++++++++++++++++++++++++++++............................... 732 *****++++++++++++++++++++++++++++++++............................ 733 ******+++++++++++++++++++++++++++++++++++........................... 734 *******+++++++++++++++++++++++++++++++++++++++....................... 735 ********+++++++++++++++++++++++++++++++++++++++++++.................. 736 Evaluated to 0.000000 737 ready> <b>mandel(-0.9, -1.4, 0.02, 0.03);</b> 738 ******************************************************************************* 739 ******************************************************************************* 740 ******************************************************************************* 741 **********+++++++++++++++++++++************************************************ 742 *+++++++++++++++++++++++++++++++++++++++*************************************** 743 +++++++++++++++++++++++++++++++++++++++++++++********************************** 744 ++++++++++++++++++++++++++++++++++++++++++++++++++***************************** 745 ++++++++++++++++++++++++++++++++++++++++++++++++++++++************************* 746 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++********************** 747 +++++++++++++++++++++++++++++++++.........++++++++++++++++++******************* 748 +++++++++++++++++++++++++++++++.... ......+++++++++++++++++++**************** 749 +++++++++++++++++++++++++++++....... ........+++++++++++++++++++************** 750 ++++++++++++++++++++++++++++........ ........++++++++++++++++++++************ 751 +++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++********** 752 ++++++++++++++++++++++++++........... ....++++++++++++++++++++++******** 753 ++++++++++++++++++++++++............. .......++++++++++++++++++++++****** 754 +++++++++++++++++++++++............. ........+++++++++++++++++++++++**** 755 ++++++++++++++++++++++........... ..........++++++++++++++++++++++*** 756 ++++++++++++++++++++........... .........++++++++++++++++++++++* 757 ++++++++++++++++++............ ...........++++++++++++++++++++ 758 ++++++++++++++++............... .............++++++++++++++++++ 759 ++++++++++++++................. ...............++++++++++++++++ 760 ++++++++++++.................. .................++++++++++++++ 761 +++++++++.................. .................+++++++++++++ 762 ++++++........ . ......... ..++++++++++++ 763 ++............ ...... ....++++++++++ 764 .............. ...++++++++++ 765 .............. ....+++++++++ 766 .............. .....++++++++ 767 ............. ......++++++++ 768 ........... .......++++++++ 769 ......... ........+++++++ 770 ......... ........+++++++ 771 ......... ....+++++++ 772 ........ ...+++++++ 773 ....... ...+++++++ 774 ....+++++++ 775 .....+++++++ 776 ....+++++++ 777 ....+++++++ 778 ....+++++++ 779 Evaluated to 0.000000 780 ready> <b>^D</b> 781 </pre> 782 </div> 783 784 <p>At this point, you may be starting to realize that Kaleidoscope is a real 785 and powerful language. It may not be self-similar :), but it can be used to 786 plot things that are!</p> 787 788 <p>With this, we conclude the "adding user-defined operators" chapter of the 789 tutorial. We have successfully augmented our language, adding the ability to extend the 790 language in the library, and we have shown how this can be used to build a simple but 791 interesting end-user application in Kaleidoscope. At this point, Kaleidoscope 792 can build a variety of applications that are functional and can call functions 793 with side-effects, but it can't actually define and mutate a variable itself. 794 </p> 795 796 <p>Strikingly, variable mutation is an important feature of some 797 languages, and it is not at all obvious how to <a href="LangImpl7.html">add 798 support for mutable variables</a> without having to add an "SSA construction" 799 phase to your front-end. In the next chapter, we will describe how you can 800 add variable mutation without building SSA in your front-end.</p> 801 802 </div> 803 804 <!-- *********************************************************************** --> 805 <h2><a name="code">Full Code Listing</a></h2> 806 <!-- *********************************************************************** --> 807 808 <div> 809 810 <p> 811 Here is the complete code listing for our running example, enhanced with the 812 if/then/else and for expressions.. To build this example, use: 813 </p> 814 815 <div class="doc_code"> 816 <pre> 817 # Compile 818 clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy 819 # Run 820 ./toy 821 </pre> 822 </div> 823 824 <p>On some platforms, you will need to specify -rdynamic or -Wl,--export-dynamic 825 when linking. This ensures that symbols defined in the main executable are 826 exported to the dynamic linker and so are available for symbol resolution at 827 run time. This is not needed if you compile your support code into a shared 828 library, although doing that will cause problems on Windows.</p> 829 830 <p>Here is the code:</p> 831 832 <div class="doc_code"> 833 <pre> 834 #include "llvm/DerivedTypes.h" 835 #include "llvm/ExecutionEngine/ExecutionEngine.h" 836 #include "llvm/ExecutionEngine/JIT.h" 837 #include "llvm/LLVMContext.h" 838 #include "llvm/Module.h" 839 #include "llvm/PassManager.h" 840 #include "llvm/Analysis/Verifier.h" 841 #include "llvm/Analysis/Passes.h" 842 #include "llvm/Target/TargetData.h" 843 #include "llvm/Transforms/Scalar.h" 844 #include "llvm/Support/IRBuilder.h" 845 #include "llvm/Support/TargetSelect.h" 846 #include <cstdio> 847 #include <string> 848 #include <map> 849 #include <vector> 850 using namespace llvm; 851 852 //===----------------------------------------------------------------------===// 853 // Lexer 854 //===----------------------------------------------------------------------===// 855 856 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one 857 // of these for known things. 858 enum Token { 859 tok_eof = -1, 860 861 // commands 862 tok_def = -2, tok_extern = -3, 863 864 // primary 865 tok_identifier = -4, tok_number = -5, 866 867 // control 868 tok_if = -6, tok_then = -7, tok_else = -8, 869 tok_for = -9, tok_in = -10, 870 871 // operators 872 tok_binary = -11, tok_unary = -12 873 }; 874 875 static std::string IdentifierStr; // Filled in if tok_identifier 876 static double NumVal; // Filled in if tok_number 877 878 /// gettok - Return the next token from standard input. 879 static int gettok() { 880 static int LastChar = ' '; 881 882 // Skip any whitespace. 883 while (isspace(LastChar)) 884 LastChar = getchar(); 885 886 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* 887 IdentifierStr = LastChar; 888 while (isalnum((LastChar = getchar()))) 889 IdentifierStr += LastChar; 890 891 if (IdentifierStr == "def") return tok_def; 892 if (IdentifierStr == "extern") return tok_extern; 893 if (IdentifierStr == "if") return tok_if; 894 if (IdentifierStr == "then") return tok_then; 895 if (IdentifierStr == "else") return tok_else; 896 if (IdentifierStr == "for") return tok_for; 897 if (IdentifierStr == "in") return tok_in; 898 if (IdentifierStr == "binary") return tok_binary; 899 if (IdentifierStr == "unary") return tok_unary; 900 return tok_identifier; 901 } 902 903 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ 904 std::string NumStr; 905 do { 906 NumStr += LastChar; 907 LastChar = getchar(); 908 } while (isdigit(LastChar) || LastChar == '.'); 909 910 NumVal = strtod(NumStr.c_str(), 0); 911 return tok_number; 912 } 913 914 if (LastChar == '#') { 915 // Comment until end of line. 916 do LastChar = getchar(); 917 while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); 918 919 if (LastChar != EOF) 920 return gettok(); 921 } 922 923 // Check for end of file. Don't eat the EOF. 924 if (LastChar == EOF) 925 return tok_eof; 926 927 // Otherwise, just return the character as its ascii value. 928 int ThisChar = LastChar; 929 LastChar = getchar(); 930 return ThisChar; 931 } 932 933 //===----------------------------------------------------------------------===// 934 // Abstract Syntax Tree (aka Parse Tree) 935 //===----------------------------------------------------------------------===// 936 937 /// ExprAST - Base class for all expression nodes. 938 class ExprAST { 939 public: 940 virtual ~ExprAST() {} 941 virtual Value *Codegen() = 0; 942 }; 943 944 /// NumberExprAST - Expression class for numeric literals like "1.0". 945 class NumberExprAST : public ExprAST { 946 double Val; 947 public: 948 NumberExprAST(double val) : Val(val) {} 949 virtual Value *Codegen(); 950 }; 951 952 /// VariableExprAST - Expression class for referencing a variable, like "a". 953 class VariableExprAST : public ExprAST { 954 std::string Name; 955 public: 956 VariableExprAST(const std::string &name) : Name(name) {} 957 virtual Value *Codegen(); 958 }; 959 960 /// UnaryExprAST - Expression class for a unary operator. 961 class UnaryExprAST : public ExprAST { 962 char Opcode; 963 ExprAST *Operand; 964 public: 965 UnaryExprAST(char opcode, ExprAST *operand) 966 : Opcode(opcode), Operand(operand) {} 967 virtual Value *Codegen(); 968 }; 969 970 /// BinaryExprAST - Expression class for a binary operator. 971 class BinaryExprAST : public ExprAST { 972 char Op; 973 ExprAST *LHS, *RHS; 974 public: 975 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) 976 : Op(op), LHS(lhs), RHS(rhs) {} 977 virtual Value *Codegen(); 978 }; 979 980 /// CallExprAST - Expression class for function calls. 981 class CallExprAST : public ExprAST { 982 std::string Callee; 983 std::vector<ExprAST*> Args; 984 public: 985 CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) 986 : Callee(callee), Args(args) {} 987 virtual Value *Codegen(); 988 }; 989 990 /// IfExprAST - Expression class for if/then/else. 991 class IfExprAST : public ExprAST { 992 ExprAST *Cond, *Then, *Else; 993 public: 994 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) 995 : Cond(cond), Then(then), Else(_else) {} 996 virtual Value *Codegen(); 997 }; 998 999 /// ForExprAST - Expression class for for/in. 1000 class ForExprAST : public ExprAST { 1001 std::string VarName; 1002 ExprAST *Start, *End, *Step, *Body; 1003 public: 1004 ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, 1005 ExprAST *step, ExprAST *body) 1006 : VarName(varname), Start(start), End(end), Step(step), Body(body) {} 1007 virtual Value *Codegen(); 1008 }; 1009 1010 /// PrototypeAST - This class represents the "prototype" for a function, 1011 /// which captures its name, and its argument names (thus implicitly the number 1012 /// of arguments the function takes), as well as if it is an operator. 1013 class PrototypeAST { 1014 std::string Name; 1015 std::vector<std::string> Args; 1016 bool isOperator; 1017 unsigned Precedence; // Precedence if a binary op. 1018 public: 1019 PrototypeAST(const std::string &name, const std::vector<std::string> &args, 1020 bool isoperator = false, unsigned prec = 0) 1021 : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {} 1022 1023 bool isUnaryOp() const { return isOperator && Args.size() == 1; } 1024 bool isBinaryOp() const { return isOperator && Args.size() == 2; } 1025 1026 char getOperatorName() const { 1027 assert(isUnaryOp() || isBinaryOp()); 1028 return Name[Name.size()-1]; 1029 } 1030 1031 unsigned getBinaryPrecedence() const { return Precedence; } 1032 1033 Function *Codegen(); 1034 }; 1035 1036 /// FunctionAST - This class represents a function definition itself. 1037 class FunctionAST { 1038 PrototypeAST *Proto; 1039 ExprAST *Body; 1040 public: 1041 FunctionAST(PrototypeAST *proto, ExprAST *body) 1042 : Proto(proto), Body(body) {} 1043 1044 Function *Codegen(); 1045 }; 1046 1047 //===----------------------------------------------------------------------===// 1048 // Parser 1049 //===----------------------------------------------------------------------===// 1050 1051 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current 1052 /// token the parser is looking at. getNextToken reads another token from the 1053 /// lexer and updates CurTok with its results. 1054 static int CurTok; 1055 static int getNextToken() { 1056 return CurTok = gettok(); 1057 } 1058 1059 /// BinopPrecedence - This holds the precedence for each binary operator that is 1060 /// defined. 1061 static std::map<char, int> BinopPrecedence; 1062 1063 /// GetTokPrecedence - Get the precedence of the pending binary operator token. 1064 static int GetTokPrecedence() { 1065 if (!isascii(CurTok)) 1066 return -1; 1067 1068 // Make sure it's a declared binop. 1069 int TokPrec = BinopPrecedence[CurTok]; 1070 if (TokPrec <= 0) return -1; 1071 return TokPrec; 1072 } 1073 1074 /// Error* - These are little helper functions for error handling. 1075 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} 1076 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } 1077 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } 1078 1079 static ExprAST *ParseExpression(); 1080 1081 /// identifierexpr 1082 /// ::= identifier 1083 /// ::= identifier '(' expression* ')' 1084 static ExprAST *ParseIdentifierExpr() { 1085 std::string IdName = IdentifierStr; 1086 1087 getNextToken(); // eat identifier. 1088 1089 if (CurTok != '(') // Simple variable ref. 1090 return new VariableExprAST(IdName); 1091 1092 // Call. 1093 getNextToken(); // eat ( 1094 std::vector<ExprAST*> Args; 1095 if (CurTok != ')') { 1096 while (1) { 1097 ExprAST *Arg = ParseExpression(); 1098 if (!Arg) return 0; 1099 Args.push_back(Arg); 1100 1101 if (CurTok == ')') break; 1102 1103 if (CurTok != ',') 1104 return Error("Expected ')' or ',' in argument list"); 1105 getNextToken(); 1106 } 1107 } 1108 1109 // Eat the ')'. 1110 getNextToken(); 1111 1112 return new CallExprAST(IdName, Args); 1113 } 1114 1115 /// numberexpr ::= number 1116 static ExprAST *ParseNumberExpr() { 1117 ExprAST *Result = new NumberExprAST(NumVal); 1118 getNextToken(); // consume the number 1119 return Result; 1120 } 1121 1122 /// parenexpr ::= '(' expression ')' 1123 static ExprAST *ParseParenExpr() { 1124 getNextToken(); // eat (. 1125 ExprAST *V = ParseExpression(); 1126 if (!V) return 0; 1127 1128 if (CurTok != ')') 1129 return Error("expected ')'"); 1130 getNextToken(); // eat ). 1131 return V; 1132 } 1133 1134 /// ifexpr ::= 'if' expression 'then' expression 'else' expression 1135 static ExprAST *ParseIfExpr() { 1136 getNextToken(); // eat the if. 1137 1138 // condition. 1139 ExprAST *Cond = ParseExpression(); 1140 if (!Cond) return 0; 1141 1142 if (CurTok != tok_then) 1143 return Error("expected then"); 1144 getNextToken(); // eat the then 1145 1146 ExprAST *Then = ParseExpression(); 1147 if (Then == 0) return 0; 1148 1149 if (CurTok != tok_else) 1150 return Error("expected else"); 1151 1152 getNextToken(); 1153 1154 ExprAST *Else = ParseExpression(); 1155 if (!Else) return 0; 1156 1157 return new IfExprAST(Cond, Then, Else); 1158 } 1159 1160 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression 1161 static ExprAST *ParseForExpr() { 1162 getNextToken(); // eat the for. 1163 1164 if (CurTok != tok_identifier) 1165 return Error("expected identifier after for"); 1166 1167 std::string IdName = IdentifierStr; 1168 getNextToken(); // eat identifier. 1169 1170 if (CurTok != '=') 1171 return Error("expected '=' after for"); 1172 getNextToken(); // eat '='. 1173 1174 1175 ExprAST *Start = ParseExpression(); 1176 if (Start == 0) return 0; 1177 if (CurTok != ',') 1178 return Error("expected ',' after for start value"); 1179 getNextToken(); 1180 1181 ExprAST *End = ParseExpression(); 1182 if (End == 0) return 0; 1183 1184 // The step value is optional. 1185 ExprAST *Step = 0; 1186 if (CurTok == ',') { 1187 getNextToken(); 1188 Step = ParseExpression(); 1189 if (Step == 0) return 0; 1190 } 1191 1192 if (CurTok != tok_in) 1193 return Error("expected 'in' after for"); 1194 getNextToken(); // eat 'in'. 1195 1196 ExprAST *Body = ParseExpression(); 1197 if (Body == 0) return 0; 1198 1199 return new ForExprAST(IdName, Start, End, Step, Body); 1200 } 1201 1202 /// primary 1203 /// ::= identifierexpr 1204 /// ::= numberexpr 1205 /// ::= parenexpr 1206 /// ::= ifexpr 1207 /// ::= forexpr 1208 static ExprAST *ParsePrimary() { 1209 switch (CurTok) { 1210 default: return Error("unknown token when expecting an expression"); 1211 case tok_identifier: return ParseIdentifierExpr(); 1212 case tok_number: return ParseNumberExpr(); 1213 case '(': return ParseParenExpr(); 1214 case tok_if: return ParseIfExpr(); 1215 case tok_for: return ParseForExpr(); 1216 } 1217 } 1218 1219 /// unary 1220 /// ::= primary 1221 /// ::= '!' unary 1222 static ExprAST *ParseUnary() { 1223 // If the current token is not an operator, it must be a primary expr. 1224 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') 1225 return ParsePrimary(); 1226 1227 // If this is a unary operator, read it. 1228 int Opc = CurTok; 1229 getNextToken(); 1230 if (ExprAST *Operand = ParseUnary()) 1231 return new UnaryExprAST(Opc, Operand); 1232 return 0; 1233 } 1234 1235 /// binoprhs 1236 /// ::= ('+' unary)* 1237 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { 1238 // If this is a binop, find its precedence. 1239 while (1) { 1240 int TokPrec = GetTokPrecedence(); 1241 1242 // If this is a binop that binds at least as tightly as the current binop, 1243 // consume it, otherwise we are done. 1244 if (TokPrec < ExprPrec) 1245 return LHS; 1246 1247 // Okay, we know this is a binop. 1248 int BinOp = CurTok; 1249 getNextToken(); // eat binop 1250 1251 // Parse the unary expression after the binary operator. 1252 ExprAST *RHS = ParseUnary(); 1253 if (!RHS) return 0; 1254 1255 // If BinOp binds less tightly with RHS than the operator after RHS, let 1256 // the pending operator take RHS as its LHS. 1257 int NextPrec = GetTokPrecedence(); 1258 if (TokPrec < NextPrec) { 1259 RHS = ParseBinOpRHS(TokPrec+1, RHS); 1260 if (RHS == 0) return 0; 1261 } 1262 1263 // Merge LHS/RHS. 1264 LHS = new BinaryExprAST(BinOp, LHS, RHS); 1265 } 1266 } 1267 1268 /// expression 1269 /// ::= unary binoprhs 1270 /// 1271 static ExprAST *ParseExpression() { 1272 ExprAST *LHS = ParseUnary(); 1273 if (!LHS) return 0; 1274 1275 return ParseBinOpRHS(0, LHS); 1276 } 1277 1278 /// prototype 1279 /// ::= id '(' id* ')' 1280 /// ::= binary LETTER number? (id, id) 1281 /// ::= unary LETTER (id) 1282 static PrototypeAST *ParsePrototype() { 1283 std::string FnName; 1284 1285 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. 1286 unsigned BinaryPrecedence = 30; 1287 1288 switch (CurTok) { 1289 default: 1290 return ErrorP("Expected function name in prototype"); 1291 case tok_identifier: 1292 FnName = IdentifierStr; 1293 Kind = 0; 1294 getNextToken(); 1295 break; 1296 case tok_unary: 1297 getNextToken(); 1298 if (!isascii(CurTok)) 1299 return ErrorP("Expected unary operator"); 1300 FnName = "unary"; 1301 FnName += (char)CurTok; 1302 Kind = 1; 1303 getNextToken(); 1304 break; 1305 case tok_binary: 1306 getNextToken(); 1307 if (!isascii(CurTok)) 1308 return ErrorP("Expected binary operator"); 1309 FnName = "binary"; 1310 FnName += (char)CurTok; 1311 Kind = 2; 1312 getNextToken(); 1313 1314 // Read the precedence if present. 1315 if (CurTok == tok_number) { 1316 if (NumVal < 1 || NumVal > 100) 1317 return ErrorP("Invalid precedecnce: must be 1..100"); 1318 BinaryPrecedence = (unsigned)NumVal; 1319 getNextToken(); 1320 } 1321 break; 1322 } 1323 1324 if (CurTok != '(') 1325 return ErrorP("Expected '(' in prototype"); 1326 1327 std::vector<std::string> ArgNames; 1328 while (getNextToken() == tok_identifier) 1329 ArgNames.push_back(IdentifierStr); 1330 if (CurTok != ')') 1331 return ErrorP("Expected ')' in prototype"); 1332 1333 // success. 1334 getNextToken(); // eat ')'. 1335 1336 // Verify right number of names for operator. 1337 if (Kind && ArgNames.size() != Kind) 1338 return ErrorP("Invalid number of operands for operator"); 1339 1340 return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence); 1341 } 1342 1343 /// definition ::= 'def' prototype expression 1344 static FunctionAST *ParseDefinition() { 1345 getNextToken(); // eat def. 1346 PrototypeAST *Proto = ParsePrototype(); 1347 if (Proto == 0) return 0; 1348 1349 if (ExprAST *E = ParseExpression()) 1350 return new FunctionAST(Proto, E); 1351 return 0; 1352 } 1353 1354 /// toplevelexpr ::= expression 1355 static FunctionAST *ParseTopLevelExpr() { 1356 if (ExprAST *E = ParseExpression()) { 1357 // Make an anonymous proto. 1358 PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); 1359 return new FunctionAST(Proto, E); 1360 } 1361 return 0; 1362 } 1363 1364 /// external ::= 'extern' prototype 1365 static PrototypeAST *ParseExtern() { 1366 getNextToken(); // eat extern. 1367 return ParsePrototype(); 1368 } 1369 1370 //===----------------------------------------------------------------------===// 1371 // Code Generation 1372 //===----------------------------------------------------------------------===// 1373 1374 static Module *TheModule; 1375 static IRBuilder<> Builder(getGlobalContext()); 1376 static std::map<std::string, Value*> NamedValues; 1377 static FunctionPassManager *TheFPM; 1378 1379 Value *ErrorV(const char *Str) { Error(Str); return 0; } 1380 1381 Value *NumberExprAST::Codegen() { 1382 return ConstantFP::get(getGlobalContext(), APFloat(Val)); 1383 } 1384 1385 Value *VariableExprAST::Codegen() { 1386 // Look this variable up in the function. 1387 Value *V = NamedValues[Name]; 1388 return V ? V : ErrorV("Unknown variable name"); 1389 } 1390 1391 Value *UnaryExprAST::Codegen() { 1392 Value *OperandV = Operand->Codegen(); 1393 if (OperandV == 0) return 0; 1394 1395 Function *F = TheModule->getFunction(std::string("unary")+Opcode); 1396 if (F == 0) 1397 return ErrorV("Unknown unary operator"); 1398 1399 return Builder.CreateCall(F, OperandV, "unop"); 1400 } 1401 1402 Value *BinaryExprAST::Codegen() { 1403 Value *L = LHS->Codegen(); 1404 Value *R = RHS->Codegen(); 1405 if (L == 0 || R == 0) return 0; 1406 1407 switch (Op) { 1408 case '+': return Builder.CreateFAdd(L, R, "addtmp"); 1409 case '-': return Builder.CreateFSub(L, R, "subtmp"); 1410 case '*': return Builder.CreateFMul(L, R, "multmp"); 1411 case '<': 1412 L = Builder.CreateFCmpULT(L, R, "cmptmp"); 1413 // Convert bool 0/1 to double 0.0 or 1.0 1414 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), 1415 "booltmp"); 1416 default: break; 1417 } 1418 1419 // If it wasn't a builtin binary operator, it must be a user defined one. Emit 1420 // a call to it. 1421 Function *F = TheModule->getFunction(std::string("binary")+Op); 1422 assert(F && "binary operator not found!"); 1423 1424 Value *Ops[2] = { L, R }; 1425 return Builder.CreateCall(F, Ops, "binop"); 1426 } 1427 1428 Value *CallExprAST::Codegen() { 1429 // Look up the name in the global module table. 1430 Function *CalleeF = TheModule->getFunction(Callee); 1431 if (CalleeF == 0) 1432 return ErrorV("Unknown function referenced"); 1433 1434 // If argument mismatch error. 1435 if (CalleeF->arg_size() != Args.size()) 1436 return ErrorV("Incorrect # arguments passed"); 1437 1438 std::vector<Value*> ArgsV; 1439 for (unsigned i = 0, e = Args.size(); i != e; ++i) { 1440 ArgsV.push_back(Args[i]->Codegen()); 1441 if (ArgsV.back() == 0) return 0; 1442 } 1443 1444 return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); 1445 } 1446 1447 Value *IfExprAST::Codegen() { 1448 Value *CondV = Cond->Codegen(); 1449 if (CondV == 0) return 0; 1450 1451 // Convert condition to a bool by comparing equal to 0.0. 1452 CondV = Builder.CreateFCmpONE(CondV, 1453 ConstantFP::get(getGlobalContext(), APFloat(0.0)), 1454 "ifcond"); 1455 1456 Function *TheFunction = Builder.GetInsertBlock()->getParent(); 1457 1458 // Create blocks for the then and else cases. Insert the 'then' block at the 1459 // end of the function. 1460 BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); 1461 BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); 1462 BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); 1463 1464 Builder.CreateCondBr(CondV, ThenBB, ElseBB); 1465 1466 // Emit then value. 1467 Builder.SetInsertPoint(ThenBB); 1468 1469 Value *ThenV = Then->Codegen(); 1470 if (ThenV == 0) return 0; 1471 1472 Builder.CreateBr(MergeBB); 1473 // Codegen of 'Then' can change the current block, update ThenBB for the PHI. 1474 ThenBB = Builder.GetInsertBlock(); 1475 1476 // Emit else block. 1477 TheFunction->getBasicBlockList().push_back(ElseBB); 1478 Builder.SetInsertPoint(ElseBB); 1479 1480 Value *ElseV = Else->Codegen(); 1481 if (ElseV == 0) return 0; 1482 1483 Builder.CreateBr(MergeBB); 1484 // Codegen of 'Else' can change the current block, update ElseBB for the PHI. 1485 ElseBB = Builder.GetInsertBlock(); 1486 1487 // Emit merge block. 1488 TheFunction->getBasicBlockList().push_back(MergeBB); 1489 Builder.SetInsertPoint(MergeBB); 1490 PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, 1491 "iftmp"); 1492 1493 PN->addIncoming(ThenV, ThenBB); 1494 PN->addIncoming(ElseV, ElseBB); 1495 return PN; 1496 } 1497 1498 Value *ForExprAST::Codegen() { 1499 // Output this as: 1500 // ... 1501 // start = startexpr 1502 // goto loop 1503 // loop: 1504 // variable = phi [start, loopheader], [nextvariable, loopend] 1505 // ... 1506 // bodyexpr 1507 // ... 1508 // loopend: 1509 // step = stepexpr 1510 // nextvariable = variable + step 1511 // endcond = endexpr 1512 // br endcond, loop, endloop 1513 // outloop: 1514 1515 // Emit the start code first, without 'variable' in scope. 1516 Value *StartVal = Start->Codegen(); 1517 if (StartVal == 0) return 0; 1518 1519 // Make the new basic block for the loop header, inserting after current 1520 // block. 1521 Function *TheFunction = Builder.GetInsertBlock()->getParent(); 1522 BasicBlock *PreheaderBB = Builder.GetInsertBlock(); 1523 BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); 1524 1525 // Insert an explicit fall through from the current block to the LoopBB. 1526 Builder.CreateBr(LoopBB); 1527 1528 // Start insertion in LoopBB. 1529 Builder.SetInsertPoint(LoopBB); 1530 1531 // Start the PHI node with an entry for Start. 1532 PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str()); 1533 Variable->addIncoming(StartVal, PreheaderBB); 1534 1535 // Within the loop, the variable is defined equal to the PHI node. If it 1536 // shadows an existing variable, we have to restore it, so save it now. 1537 Value *OldVal = NamedValues[VarName]; 1538 NamedValues[VarName] = Variable; 1539 1540 // Emit the body of the loop. This, like any other expr, can change the 1541 // current BB. Note that we ignore the value computed by the body, but don't 1542 // allow an error. 1543 if (Body->Codegen() == 0) 1544 return 0; 1545 1546 // Emit the step value. 1547 Value *StepVal; 1548 if (Step) { 1549 StepVal = Step->Codegen(); 1550 if (StepVal == 0) return 0; 1551 } else { 1552 // If not specified, use 1.0. 1553 StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); 1554 } 1555 1556 Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar"); 1557 1558 // Compute the end condition. 1559 Value *EndCond = End->Codegen(); 1560 if (EndCond == 0) return EndCond; 1561 1562 // Convert condition to a bool by comparing equal to 0.0. 1563 EndCond = Builder.CreateFCmpONE(EndCond, 1564 ConstantFP::get(getGlobalContext(), APFloat(0.0)), 1565 "loopcond"); 1566 1567 // Create the "after loop" block and insert it. 1568 BasicBlock *LoopEndBB = Builder.GetInsertBlock(); 1569 BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); 1570 1571 // Insert the conditional branch into the end of LoopEndBB. 1572 Builder.CreateCondBr(EndCond, LoopBB, AfterBB); 1573 1574 // Any new code will be inserted in AfterBB. 1575 Builder.SetInsertPoint(AfterBB); 1576 1577 // Add a new entry to the PHI node for the backedge. 1578 Variable->addIncoming(NextVar, LoopEndBB); 1579 1580 // Restore the unshadowed variable. 1581 if (OldVal) 1582 NamedValues[VarName] = OldVal; 1583 else 1584 NamedValues.erase(VarName); 1585 1586 1587 // for expr always returns 0.0. 1588 return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); 1589 } 1590 1591 Function *PrototypeAST::Codegen() { 1592 // Make the function type: double(double,double) etc. 1593 std::vector<Type*> Doubles(Args.size(), 1594 Type::getDoubleTy(getGlobalContext())); 1595 FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), 1596 Doubles, false); 1597 1598 Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); 1599 1600 // If F conflicted, there was already something named 'Name'. If it has a 1601 // body, don't allow redefinition or reextern. 1602 if (F->getName() != Name) { 1603 // Delete the one we just made and get the existing one. 1604 F->eraseFromParent(); 1605 F = TheModule->getFunction(Name); 1606 1607 // If F already has a body, reject this. 1608 if (!F->empty()) { 1609 ErrorF("redefinition of function"); 1610 return 0; 1611 } 1612 1613 // If F took a different number of args, reject. 1614 if (F->arg_size() != Args.size()) { 1615 ErrorF("redefinition of function with different # args"); 1616 return 0; 1617 } 1618 } 1619 1620 // Set names for all arguments. 1621 unsigned Idx = 0; 1622 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); 1623 ++AI, ++Idx) { 1624 AI->setName(Args[Idx]); 1625 1626 // Add arguments to variable symbol table. 1627 NamedValues[Args[Idx]] = AI; 1628 } 1629 1630 return F; 1631 } 1632 1633 Function *FunctionAST::Codegen() { 1634 NamedValues.clear(); 1635 1636 Function *TheFunction = Proto->Codegen(); 1637 if (TheFunction == 0) 1638 return 0; 1639 1640 // If this is an operator, install it. 1641 if (Proto->isBinaryOp()) 1642 BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence(); 1643 1644 // Create a new basic block to start insertion into. 1645 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); 1646 Builder.SetInsertPoint(BB); 1647 1648 if (Value *RetVal = Body->Codegen()) { 1649 // Finish off the function. 1650 Builder.CreateRet(RetVal); 1651 1652 // Validate the generated code, checking for consistency. 1653 verifyFunction(*TheFunction); 1654 1655 // Optimize the function. 1656 TheFPM->run(*TheFunction); 1657 1658 return TheFunction; 1659 } 1660 1661 // Error reading body, remove function. 1662 TheFunction->eraseFromParent(); 1663 1664 if (Proto->isBinaryOp()) 1665 BinopPrecedence.erase(Proto->getOperatorName()); 1666 return 0; 1667 } 1668 1669 //===----------------------------------------------------------------------===// 1670 // Top-Level parsing and JIT Driver 1671 //===----------------------------------------------------------------------===// 1672 1673 static ExecutionEngine *TheExecutionEngine; 1674 1675 static void HandleDefinition() { 1676 if (FunctionAST *F = ParseDefinition()) { 1677 if (Function *LF = F->Codegen()) { 1678 fprintf(stderr, "Read function definition:"); 1679 LF->dump(); 1680 } 1681 } else { 1682 // Skip token for error recovery. 1683 getNextToken(); 1684 } 1685 } 1686 1687 static void HandleExtern() { 1688 if (PrototypeAST *P = ParseExtern()) { 1689 if (Function *F = P->Codegen()) { 1690 fprintf(stderr, "Read extern: "); 1691 F->dump(); 1692 } 1693 } else { 1694 // Skip token for error recovery. 1695 getNextToken(); 1696 } 1697 } 1698 1699 static void HandleTopLevelExpression() { 1700 // Evaluate a top-level expression into an anonymous function. 1701 if (FunctionAST *F = ParseTopLevelExpr()) { 1702 if (Function *LF = F->Codegen()) { 1703 // JIT the function, returning a function pointer. 1704 void *FPtr = TheExecutionEngine->getPointerToFunction(LF); 1705 1706 // Cast it to the right type (takes no arguments, returns a double) so we 1707 // can call it as a native function. 1708 double (*FP)() = (double (*)())(intptr_t)FPtr; 1709 fprintf(stderr, "Evaluated to %f\n", FP()); 1710 } 1711 } else { 1712 // Skip token for error recovery. 1713 getNextToken(); 1714 } 1715 } 1716 1717 /// top ::= definition | external | expression | ';' 1718 static void MainLoop() { 1719 while (1) { 1720 fprintf(stderr, "ready> "); 1721 switch (CurTok) { 1722 case tok_eof: return; 1723 case ';': getNextToken(); break; // ignore top-level semicolons. 1724 case tok_def: HandleDefinition(); break; 1725 case tok_extern: HandleExtern(); break; 1726 default: HandleTopLevelExpression(); break; 1727 } 1728 } 1729 } 1730 1731 //===----------------------------------------------------------------------===// 1732 // "Library" functions that can be "extern'd" from user code. 1733 //===----------------------------------------------------------------------===// 1734 1735 /// putchard - putchar that takes a double and returns 0. 1736 extern "C" 1737 double putchard(double X) { 1738 putchar((char)X); 1739 return 0; 1740 } 1741 1742 /// printd - printf that takes a double prints it as "%f\n", returning 0. 1743 extern "C" 1744 double printd(double X) { 1745 printf("%f\n", X); 1746 return 0; 1747 } 1748 1749 //===----------------------------------------------------------------------===// 1750 // Main driver code. 1751 //===----------------------------------------------------------------------===// 1752 1753 int main() { 1754 InitializeNativeTarget(); 1755 LLVMContext &Context = getGlobalContext(); 1756 1757 // Install standard binary operators. 1758 // 1 is lowest precedence. 1759 BinopPrecedence['<'] = 10; 1760 BinopPrecedence['+'] = 20; 1761 BinopPrecedence['-'] = 20; 1762 BinopPrecedence['*'] = 40; // highest. 1763 1764 // Prime the first token. 1765 fprintf(stderr, "ready> "); 1766 getNextToken(); 1767 1768 // Make the module, which holds all the code. 1769 TheModule = new Module("my cool jit", Context); 1770 1771 // Create the JIT. This takes ownership of the module. 1772 std::string ErrStr; 1773 TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create(); 1774 if (!TheExecutionEngine) { 1775 fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str()); 1776 exit(1); 1777 } 1778 1779 FunctionPassManager OurFPM(TheModule); 1780 1781 // Set up the optimizer pipeline. Start with registering info about how the 1782 // target lays out data structures. 1783 OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData())); 1784 // Provide basic AliasAnalysis support for GVN. 1785 OurFPM.add(createBasicAliasAnalysisPass()); 1786 // Do simple "peephole" optimizations and bit-twiddling optzns. 1787 OurFPM.add(createInstructionCombiningPass()); 1788 // Reassociate expressions. 1789 OurFPM.add(createReassociatePass()); 1790 // Eliminate Common SubExpressions. 1791 OurFPM.add(createGVNPass()); 1792 // Simplify the control flow graph (deleting unreachable blocks, etc). 1793 OurFPM.add(createCFGSimplificationPass()); 1794 1795 OurFPM.doInitialization(); 1796 1797 // Set the global so the code gen can use this. 1798 TheFPM = &OurFPM; 1799 1800 // Run the main "interpreter loop" now. 1801 MainLoop(); 1802 1803 TheFPM = 0; 1804 1805 // Print out all of the generated code. 1806 TheModule->dump(); 1807 1808 return 0; 1809 } 1810 </pre> 1811 </div> 1812 1813 <a href="LangImpl7.html">Next: Extending the language: mutable variables / SSA construction</a> 1814 </div> 1815 1816 <!-- *********************************************************************** --> 1817 <hr> 1818 <address> 1819 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img 1820 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> 1821 <a href="http://validator.w3.org/check/referer"><img 1822 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> 1823 1824 <a href="mailto:sabre (a] nondot.org">Chris Lattner</a><br> 1825 <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> 1826 Last modified: $Date: 2011-10-16 04:07:38 -0400 (Sun, 16 Oct 2011) $ 1827 </address> 1828 </body> 1829 </html> 1830