Home | History | Annotate | Download | only in tutorial
      1 ===========================================
      2 Kaleidoscope: Implementing a Parser and AST
      3 ===========================================
      4 
      5 .. contents::
      6    :local:
      7 
      8 Chapter 2 Introduction
      9 ======================
     10 
     11 Welcome to Chapter 2 of the "`Implementing a language with
     12 LLVM <index.html>`_" tutorial. This chapter shows you how to use the
     13 lexer, built in `Chapter 1 <LangImpl1.html>`_, to build a full
     14 `parser <http://en.wikipedia.org/wiki/Parsing>`_ for our Kaleidoscope
     15 language. Once we have a parser, we'll define and build an `Abstract
     16 Syntax Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST).
     17 
     18 The parser we will build uses a combination of `Recursive Descent
     19 Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and
     20 `Operator-Precedence
     21 Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to
     22 parse the Kaleidoscope language (the latter for binary expressions and
     23 the former for everything else). Before we get to parsing though, lets
     24 talk about the output of the parser: the Abstract Syntax Tree.
     25 
     26 The Abstract Syntax Tree (AST)
     27 ==============================
     28 
     29 The AST for a program captures its behavior in such a way that it is
     30 easy for later stages of the compiler (e.g. code generation) to
     31 interpret. We basically want one object for each construct in the
     32 language, and the AST should closely model the language. In
     33 Kaleidoscope, we have expressions, a prototype, and a function object.
     34 We'll start with expressions first:
     35 
     36 .. code-block:: c++
     37 
     38     /// ExprAST - Base class for all expression nodes.
     39     class ExprAST {
     40     public:
     41       virtual ~ExprAST() {}
     42     };
     43 
     44     /// NumberExprAST - Expression class for numeric literals like "1.0".
     45     class NumberExprAST : public ExprAST {
     46       double Val;
     47 
     48     public:
     49       NumberExprAST(double Val) : Val(Val) {}
     50     };
     51 
     52 The code above shows the definition of the base ExprAST class and one
     53 subclass which we use for numeric literals. The important thing to note
     54 about this code is that the NumberExprAST class captures the numeric
     55 value of the literal as an instance variable. This allows later phases
     56 of the compiler to know what the stored numeric value is.
     57 
     58 Right now we only create the AST, so there are no useful accessor
     59 methods on them. It would be very easy to add a virtual method to pretty
     60 print the code, for example. Here are the other expression AST node
     61 definitions that we'll use in the basic form of the Kaleidoscope
     62 language:
     63 
     64 .. code-block:: c++
     65 
     66     /// VariableExprAST - Expression class for referencing a variable, like "a".
     67     class VariableExprAST : public ExprAST {
     68       std::string Name;
     69 
     70     public:
     71       VariableExprAST(const std::string &Name) : Name(Name) {}
     72     };
     73 
     74     /// BinaryExprAST - Expression class for a binary operator.
     75     class BinaryExprAST : public ExprAST {
     76       char Op;
     77       std::unique_ptr<ExprAST> LHS, RHS;
     78 
     79     public:
     80       BinaryExprAST(char op, std::unique_ptr<ExprAST> LHS,
     81                     std::unique_ptr<ExprAST> RHS)
     82         : Op(op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}
     83     };
     84 
     85     /// CallExprAST - Expression class for function calls.
     86     class CallExprAST : public ExprAST {
     87       std::string Callee;
     88       std::vector<std::unique_ptr<ExprAST>> Args;
     89 
     90     public:
     91       CallExprAST(const std::string &Callee,
     92                   std::vector<std::unique_ptr<ExprAST>> Args)
     93         : Callee(Callee), Args(std::move(Args)) {}
     94     };
     95 
     96 This is all (intentionally) rather straight-forward: variables capture
     97 the variable name, binary operators capture their opcode (e.g. '+'), and
     98 calls capture a function name as well as a list of any argument
     99 expressions. One thing that is nice about our AST is that it captures
    100 the language features without talking about the syntax of the language.
    101 Note that there is no discussion about precedence of binary operators,
    102 lexical structure, etc.
    103 
    104 For our basic language, these are all of the expression nodes we'll
    105 define. Because it doesn't have conditional control flow, it isn't
    106 Turing-complete; we'll fix that in a later installment. The two things
    107 we need next are a way to talk about the interface to a function, and a
    108 way to talk about functions themselves:
    109 
    110 .. code-block:: c++
    111 
    112     /// PrototypeAST - This class represents the "prototype" for a function,
    113     /// which captures its name, and its argument names (thus implicitly the number
    114     /// of arguments the function takes).
    115     class PrototypeAST {
    116       std::string Name;
    117       std::vector<std::string> Args;
    118 
    119     public:
    120       PrototypeAST(const std::string &name, std::vector<std::string> Args)
    121         : Name(name), Args(std::move(Args)) {}
    122     };
    123 
    124     /// FunctionAST - This class represents a function definition itself.
    125     class FunctionAST {
    126       std::unique_ptr<PrototypeAST> Proto;
    127       std::unique_ptr<ExprAST> Body;
    128 
    129     public:
    130       FunctionAST(std::unique_ptr<PrototypeAST> Proto,
    131                   std::unique_ptr<ExprAST> Body)
    132         : Proto(std::move(Proto)), Body(std::move(Body)) {}
    133     };
    134 
    135 In Kaleidoscope, functions are typed with just a count of their
    136 arguments. Since all values are double precision floating point, the
    137 type of each argument doesn't need to be stored anywhere. In a more
    138 aggressive and realistic language, the "ExprAST" class would probably
    139 have a type field.
    140 
    141 With this scaffolding, we can now talk about parsing expressions and
    142 function bodies in Kaleidoscope.
    143 
    144 Parser Basics
    145 =============
    146 
    147 Now that we have an AST to build, we need to define the parser code to
    148 build it. The idea here is that we want to parse something like "x+y"
    149 (which is returned as three tokens by the lexer) into an AST that could
    150 be generated with calls like this:
    151 
    152 .. code-block:: c++
    153 
    154       auto LHS = llvm::make_unique<VariableExprAST>("x");
    155       auto RHS = llvm::make_unique<VariableExprAST>("y");
    156       auto Result = std::make_unique<BinaryExprAST>('+', std::move(LHS),
    157                                                     std::move(RHS));
    158 
    159 In order to do this, we'll start by defining some basic helper routines:
    160 
    161 .. code-block:: c++
    162 
    163     /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
    164     /// token the parser is looking at.  getNextToken reads another token from the
    165     /// lexer and updates CurTok with its results.
    166     static int CurTok;
    167     static int getNextToken() {
    168       return CurTok = gettok();
    169     }
    170 
    171 This implements a simple token buffer around the lexer. This allows us
    172 to look one token ahead at what the lexer is returning. Every function
    173 in our parser will assume that CurTok is the current token that needs to
    174 be parsed.
    175 
    176 .. code-block:: c++
    177 
    178 
    179     /// LogError* - These are little helper functions for error handling.
    180     std::unique_ptr<ExprAST> LogError(const char *Str) {
    181       fprintf(stderr, "LogError: %s\n", Str);
    182       return nullptr;
    183     }
    184     std::unique_ptr<PrototypeAST> LogErrorP(const char *Str) {
    185       LogError(Str);
    186       return nullptr;
    187     }
    188 
    189 The ``LogError`` routines are simple helper routines that our parser will
    190 use to handle errors. The error recovery in our parser will not be the
    191 best and is not particular user-friendly, but it will be enough for our
    192 tutorial. These routines make it easier to handle errors in routines
    193 that have various return types: they always return null.
    194 
    195 With these basic helper functions, we can implement the first piece of
    196 our grammar: numeric literals.
    197 
    198 Basic Expression Parsing
    199 ========================
    200 
    201 We start with numeric literals, because they are the simplest to
    202 process. For each production in our grammar, we'll define a function
    203 which parses that production. For numeric literals, we have:
    204 
    205 .. code-block:: c++
    206 
    207     /// numberexpr ::= number
    208     static std::unique_ptr<ExprAST> ParseNumberExpr() {
    209       auto Result = llvm::make_unique<NumberExprAST>(NumVal);
    210       getNextToken(); // consume the number
    211       return std::move(Result);
    212     }
    213 
    214 This routine is very simple: it expects to be called when the current
    215 token is a ``tok_number`` token. It takes the current number value,
    216 creates a ``NumberExprAST`` node, advances the lexer to the next token,
    217 and finally returns.
    218 
    219 There are some interesting aspects to this. The most important one is
    220 that this routine eats all of the tokens that correspond to the
    221 production and returns the lexer buffer with the next token (which is
    222 not part of the grammar production) ready to go. This is a fairly
    223 standard way to go for recursive descent parsers. For a better example,
    224 the parenthesis operator is defined like this:
    225 
    226 .. code-block:: c++
    227 
    228     /// parenexpr ::= '(' expression ')'
    229     static std::unique_ptr<ExprAST> ParseParenExpr() {
    230       getNextToken(); // eat (.
    231       auto V = ParseExpression();
    232       if (!V)
    233         return nullptr;
    234 
    235       if (CurTok != ')')
    236         return LogError("expected ')'");
    237       getNextToken(); // eat ).
    238       return V;
    239     }
    240 
    241 This function illustrates a number of interesting things about the
    242 parser:
    243 
    244 1) It shows how we use the LogError routines. When called, this function
    245 expects that the current token is a '(' token, but after parsing the
    246 subexpression, it is possible that there is no ')' waiting. For example,
    247 if the user types in "(4 x" instead of "(4)", the parser should emit an
    248 error. Because errors can occur, the parser needs a way to indicate that
    249 they happened: in our parser, we return null on an error.
    250 
    251 2) Another interesting aspect of this function is that it uses recursion
    252 by calling ``ParseExpression`` (we will soon see that
    253 ``ParseExpression`` can call ``ParseParenExpr``). This is powerful
    254 because it allows us to handle recursive grammars, and keeps each
    255 production very simple. Note that parentheses do not cause construction
    256 of AST nodes themselves. While we could do it this way, the most
    257 important role of parentheses are to guide the parser and provide
    258 grouping. Once the parser constructs the AST, parentheses are not
    259 needed.
    260 
    261 The next simple production is for handling variable references and
    262 function calls:
    263 
    264 .. code-block:: c++
    265 
    266     /// identifierexpr
    267     ///   ::= identifier
    268     ///   ::= identifier '(' expression* ')'
    269     static std::unique_ptr<ExprAST> ParseIdentifierExpr() {
    270       std::string IdName = IdentifierStr;
    271 
    272       getNextToken();  // eat identifier.
    273 
    274       if (CurTok != '(') // Simple variable ref.
    275         return llvm::make_unique<VariableExprAST>(IdName);
    276 
    277       // Call.
    278       getNextToken();  // eat (
    279       std::vector<std::unique_ptr<ExprAST>> Args;
    280       if (CurTok != ')') {
    281         while (1) {
    282           if (auto Arg = ParseExpression())
    283             Args.push_back(std::move(Arg));
    284           else
    285             return nullptr;
    286 
    287           if (CurTok == ')')
    288             break;
    289 
    290           if (CurTok != ',')
    291             return LogError("Expected ')' or ',' in argument list");
    292           getNextToken();
    293         }
    294       }
    295 
    296       // Eat the ')'.
    297       getNextToken();
    298 
    299       return llvm::make_unique<CallExprAST>(IdName, std::move(Args));
    300     }
    301 
    302 This routine follows the same style as the other routines. (It expects
    303 to be called if the current token is a ``tok_identifier`` token). It
    304 also has recursion and error handling. One interesting aspect of this is
    305 that it uses *look-ahead* to determine if the current identifier is a
    306 stand alone variable reference or if it is a function call expression.
    307 It handles this by checking to see if the token after the identifier is
    308 a '(' token, constructing either a ``VariableExprAST`` or
    309 ``CallExprAST`` node as appropriate.
    310 
    311 Now that we have all of our simple expression-parsing logic in place, we
    312 can define a helper function to wrap it together into one entry point.
    313 We call this class of expressions "primary" expressions, for reasons
    314 that will become more clear `later in the
    315 tutorial <LangImpl6.html#user-defined-unary-operators>`_. In order to parse an arbitrary
    316 primary expression, we need to determine what sort of expression it is:
    317 
    318 .. code-block:: c++
    319 
    320     /// primary
    321     ///   ::= identifierexpr
    322     ///   ::= numberexpr
    323     ///   ::= parenexpr
    324     static std::unique_ptr<ExprAST> ParsePrimary() {
    325       switch (CurTok) {
    326       default:
    327         return LogError("unknown token when expecting an expression");
    328       case tok_identifier:
    329         return ParseIdentifierExpr();
    330       case tok_number:
    331         return ParseNumberExpr();
    332       case '(':
    333         return ParseParenExpr();
    334       }
    335     }
    336 
    337 Now that you see the definition of this function, it is more obvious why
    338 we can assume the state of CurTok in the various functions. This uses
    339 look-ahead to determine which sort of expression is being inspected, and
    340 then parses it with a function call.
    341 
    342 Now that basic expressions are handled, we need to handle binary
    343 expressions. They are a bit more complex.
    344 
    345 Binary Expression Parsing
    346 =========================
    347 
    348 Binary expressions are significantly harder to parse because they are
    349 often ambiguous. For example, when given the string "x+y\*z", the parser
    350 can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common
    351 definitions from mathematics, we expect the later parse, because "\*"
    352 (multiplication) has higher *precedence* than "+" (addition).
    353 
    354 There are many ways to handle this, but an elegant and efficient way is
    355 to use `Operator-Precedence
    356 Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_.
    357 This parsing technique uses the precedence of binary operators to guide
    358 recursion. To start with, we need a table of precedences:
    359 
    360 .. code-block:: c++
    361 
    362     /// BinopPrecedence - This holds the precedence for each binary operator that is
    363     /// defined.
    364     static std::map<char, int> BinopPrecedence;
    365 
    366     /// GetTokPrecedence - Get the precedence of the pending binary operator token.
    367     static int GetTokPrecedence() {
    368       if (!isascii(CurTok))
    369         return -1;
    370 
    371       // Make sure it's a declared binop.
    372       int TokPrec = BinopPrecedence[CurTok];
    373       if (TokPrec <= 0) return -1;
    374       return TokPrec;
    375     }
    376 
    377     int main() {
    378       // Install standard binary operators.
    379       // 1 is lowest precedence.
    380       BinopPrecedence['<'] = 10;
    381       BinopPrecedence['+'] = 20;
    382       BinopPrecedence['-'] = 20;
    383       BinopPrecedence['*'] = 40;  // highest.
    384       ...
    385     }
    386 
    387 For the basic form of Kaleidoscope, we will only support 4 binary
    388 operators (this can obviously be extended by you, our brave and intrepid
    389 reader). The ``GetTokPrecedence`` function returns the precedence for
    390 the current token, or -1 if the token is not a binary operator. Having a
    391 map makes it easy to add new operators and makes it clear that the
    392 algorithm doesn't depend on the specific operators involved, but it
    393 would be easy enough to eliminate the map and do the comparisons in the
    394 ``GetTokPrecedence`` function. (Or just use a fixed-size array).
    395 
    396 With the helper above defined, we can now start parsing binary
    397 expressions. The basic idea of operator precedence parsing is to break
    398 down an expression with potentially ambiguous binary operators into
    399 pieces. Consider, for example, the expression "a+b+(c+d)\*e\*f+g".
    400 Operator precedence parsing considers this as a stream of primary
    401 expressions separated by binary operators. As such, it will first parse
    402 the leading primary expression "a", then it will see the pairs [+, b]
    403 [+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are
    404 primary expressions, the binary expression parser doesn't need to worry
    405 about nested subexpressions like (c+d) at all.
    406 
    407 To start, an expression is a primary expression potentially followed by
    408 a sequence of [binop,primaryexpr] pairs:
    409 
    410 .. code-block:: c++
    411 
    412     /// expression
    413     ///   ::= primary binoprhs
    414     ///
    415     static std::unique_ptr<ExprAST> ParseExpression() {
    416       auto LHS = ParsePrimary();
    417       if (!LHS)
    418         return nullptr;
    419 
    420       return ParseBinOpRHS(0, std::move(LHS));
    421     }
    422 
    423 ``ParseBinOpRHS`` is the function that parses the sequence of pairs for
    424 us. It takes a precedence and a pointer to an expression for the part
    425 that has been parsed so far. Note that "x" is a perfectly valid
    426 expression: As such, "binoprhs" is allowed to be empty, in which case it
    427 returns the expression that is passed into it. In our example above, the
    428 code passes the expression for "a" into ``ParseBinOpRHS`` and the
    429 current token is "+".
    430 
    431 The precedence value passed into ``ParseBinOpRHS`` indicates the
    432 *minimal operator precedence* that the function is allowed to eat. For
    433 example, if the current pair stream is [+, x] and ``ParseBinOpRHS`` is
    434 passed in a precedence of 40, it will not consume any tokens (because
    435 the precedence of '+' is only 20). With this in mind, ``ParseBinOpRHS``
    436 starts with:
    437 
    438 .. code-block:: c++
    439 
    440     /// binoprhs
    441     ///   ::= ('+' primary)*
    442     static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,
    443                                                   std::unique_ptr<ExprAST> LHS) {
    444       // If this is a binop, find its precedence.
    445       while (1) {
    446         int TokPrec = GetTokPrecedence();
    447 
    448         // If this is a binop that binds at least as tightly as the current binop,
    449         // consume it, otherwise we are done.
    450         if (TokPrec < ExprPrec)
    451           return LHS;
    452 
    453 This code gets the precedence of the current token and checks to see if
    454 if is too low. Because we defined invalid tokens to have a precedence of
    455 -1, this check implicitly knows that the pair-stream ends when the token
    456 stream runs out of binary operators. If this check succeeds, we know
    457 that the token is a binary operator and that it will be included in this
    458 expression:
    459 
    460 .. code-block:: c++
    461 
    462         // Okay, we know this is a binop.
    463         int BinOp = CurTok;
    464         getNextToken();  // eat binop
    465 
    466         // Parse the primary expression after the binary operator.
    467         auto RHS = ParsePrimary();
    468         if (!RHS)
    469           return nullptr;
    470 
    471 As such, this code eats (and remembers) the binary operator and then
    472 parses the primary expression that follows. This builds up the whole
    473 pair, the first of which is [+, b] for the running example.
    474 
    475 Now that we parsed the left-hand side of an expression and one pair of
    476 the RHS sequence, we have to decide which way the expression associates.
    477 In particular, we could have "(a+b) binop unparsed" or "a + (b binop
    478 unparsed)". To determine this, we look ahead at "binop" to determine its
    479 precedence and compare it to BinOp's precedence (which is '+' in this
    480 case):
    481 
    482 .. code-block:: c++
    483 
    484         // If BinOp binds less tightly with RHS than the operator after RHS, let
    485         // the pending operator take RHS as its LHS.
    486         int NextPrec = GetTokPrecedence();
    487         if (TokPrec < NextPrec) {
    488 
    489 If the precedence of the binop to the right of "RHS" is lower or equal
    490 to the precedence of our current operator, then we know that the
    491 parentheses associate as "(a+b) binop ...". In our example, the current
    492 operator is "+" and the next operator is "+", we know that they have the
    493 same precedence. In this case we'll create the AST node for "a+b", and
    494 then continue parsing:
    495 
    496 .. code-block:: c++
    497 
    498           ... if body omitted ...
    499         }
    500 
    501         // Merge LHS/RHS.
    502         LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),
    503                                                std::move(RHS));
    504       }  // loop around to the top of the while loop.
    505     }
    506 
    507 In our example above, this will turn "a+b+" into "(a+b)" and execute the
    508 next iteration of the loop, with "+" as the current token. The code
    509 above will eat, remember, and parse "(c+d)" as the primary expression,
    510 which makes the current pair equal to [+, (c+d)]. It will then evaluate
    511 the 'if' conditional above with "\*" as the binop to the right of the
    512 primary. In this case, the precedence of "\*" is higher than the
    513 precedence of "+" so the if condition will be entered.
    514 
    515 The critical question left here is "how can the if condition parse the
    516 right hand side in full"? In particular, to build the AST correctly for
    517 our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression
    518 variable. The code to do this is surprisingly simple (code from the
    519 above two blocks duplicated for context):
    520 
    521 .. code-block:: c++
    522 
    523         // If BinOp binds less tightly with RHS than the operator after RHS, let
    524         // the pending operator take RHS as its LHS.
    525         int NextPrec = GetTokPrecedence();
    526         if (TokPrec < NextPrec) {
    527           RHS = ParseBinOpRHS(TokPrec+1, std::move(RHS));
    528           if (!RHS)
    529             return nullptr;
    530         }
    531         // Merge LHS/RHS.
    532         LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),
    533                                                std::move(RHS));
    534       }  // loop around to the top of the while loop.
    535     }
    536 
    537 At this point, we know that the binary operator to the RHS of our
    538 primary has higher precedence than the binop we are currently parsing.
    539 As such, we know that any sequence of pairs whose operators are all
    540 higher precedence than "+" should be parsed together and returned as
    541 "RHS". To do this, we recursively invoke the ``ParseBinOpRHS`` function
    542 specifying "TokPrec+1" as the minimum precedence required for it to
    543 continue. In our example above, this will cause it to return the AST
    544 node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of the '+'
    545 expression.
    546 
    547 Finally, on the next iteration of the while loop, the "+g" piece is
    548 parsed and added to the AST. With this little bit of code (14
    549 non-trivial lines), we correctly handle fully general binary expression
    550 parsing in a very elegant way. This was a whirlwind tour of this code,
    551 and it is somewhat subtle. I recommend running through it with a few
    552 tough examples to see how it works.
    553 
    554 This wraps up handling of expressions. At this point, we can point the
    555 parser at an arbitrary token stream and build an expression from it,
    556 stopping at the first token that is not part of the expression. Next up
    557 we need to handle function definitions, etc.
    558 
    559 Parsing the Rest
    560 ================
    561 
    562 The next thing missing is handling of function prototypes. In
    563 Kaleidoscope, these are used both for 'extern' function declarations as
    564 well as function body definitions. The code to do this is
    565 straight-forward and not very interesting (once you've survived
    566 expressions):
    567 
    568 .. code-block:: c++
    569 
    570     /// prototype
    571     ///   ::= id '(' id* ')'
    572     static std::unique_ptr<PrototypeAST> ParsePrototype() {
    573       if (CurTok != tok_identifier)
    574         return LogErrorP("Expected function name in prototype");
    575 
    576       std::string FnName = IdentifierStr;
    577       getNextToken();
    578 
    579       if (CurTok != '(')
    580         return LogErrorP("Expected '(' in prototype");
    581 
    582       // Read the list of argument names.
    583       std::vector<std::string> ArgNames;
    584       while (getNextToken() == tok_identifier)
    585         ArgNames.push_back(IdentifierStr);
    586       if (CurTok != ')')
    587         return LogErrorP("Expected ')' in prototype");
    588 
    589       // success.
    590       getNextToken();  // eat ')'.
    591 
    592       return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames));
    593     }
    594 
    595 Given this, a function definition is very simple, just a prototype plus
    596 an expression to implement the body:
    597 
    598 .. code-block:: c++
    599 
    600     /// definition ::= 'def' prototype expression
    601     static std::unique_ptr<FunctionAST> ParseDefinition() {
    602       getNextToken();  // eat def.
    603       auto Proto = ParsePrototype();
    604       if (!Proto) return nullptr;
    605 
    606       if (auto E = ParseExpression())
    607         return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));
    608       return nullptr;
    609     }
    610 
    611 In addition, we support 'extern' to declare functions like 'sin' and
    612 'cos' as well as to support forward declaration of user functions. These
    613 'extern's are just prototypes with no body:
    614 
    615 .. code-block:: c++
    616 
    617     /// external ::= 'extern' prototype
    618     static std::unique_ptr<PrototypeAST> ParseExtern() {
    619       getNextToken();  // eat extern.
    620       return ParsePrototype();
    621     }
    622 
    623 Finally, we'll also let the user type in arbitrary top-level expressions
    624 and evaluate them on the fly. We will handle this by defining anonymous
    625 nullary (zero argument) functions for them:
    626 
    627 .. code-block:: c++
    628 
    629     /// toplevelexpr ::= expression
    630     static std::unique_ptr<FunctionAST> ParseTopLevelExpr() {
    631       if (auto E = ParseExpression()) {
    632         // Make an anonymous proto.
    633         auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>());
    634         return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));
    635       }
    636       return nullptr;
    637     }
    638 
    639 Now that we have all the pieces, let's build a little driver that will
    640 let us actually *execute* this code we've built!
    641 
    642 The Driver
    643 ==========
    644 
    645 The driver for this simply invokes all of the parsing pieces with a
    646 top-level dispatch loop. There isn't much interesting here, so I'll just
    647 include the top-level loop. See `below <#full-code-listing>`_ for full code in the
    648 "Top-Level Parsing" section.
    649 
    650 .. code-block:: c++
    651 
    652     /// top ::= definition | external | expression | ';'
    653     static void MainLoop() {
    654       while (1) {
    655         fprintf(stderr, "ready> ");
    656         switch (CurTok) {
    657         case tok_eof:
    658           return;
    659         case ';': // ignore top-level semicolons.
    660           getNextToken();
    661           break;
    662         case tok_def:
    663           HandleDefinition();
    664           break;
    665         case tok_extern:
    666           HandleExtern();
    667           break;
    668         default:
    669           HandleTopLevelExpression();
    670           break;
    671         }
    672       }
    673     }
    674 
    675 The most interesting part of this is that we ignore top-level
    676 semicolons. Why is this, you ask? The basic reason is that if you type
    677 "4 + 5" at the command line, the parser doesn't know whether that is the
    678 end of what you will type or not. For example, on the next line you
    679 could type "def foo..." in which case 4+5 is the end of a top-level
    680 expression. Alternatively you could type "\* 6", which would continue
    681 the expression. Having top-level semicolons allows you to type "4+5;",
    682 and the parser will know you are done.
    683 
    684 Conclusions
    685 ===========
    686 
    687 With just under 400 lines of commented code (240 lines of non-comment,
    688 non-blank code), we fully defined our minimal language, including a
    689 lexer, parser, and AST builder. With this done, the executable will
    690 validate Kaleidoscope code and tell us if it is grammatically invalid.
    691 For example, here is a sample interaction:
    692 
    693 .. code-block:: bash
    694 
    695     $ ./a.out
    696     ready> def foo(x y) x+foo(y, 4.0);
    697     Parsed a function definition.
    698     ready> def foo(x y) x+y y;
    699     Parsed a function definition.
    700     Parsed a top-level expr
    701     ready> def foo(x y) x+y );
    702     Parsed a function definition.
    703     Error: unknown token when expecting an expression
    704     ready> extern sin(a);
    705     ready> Parsed an extern
    706     ready> ^D
    707     $
    708 
    709 There is a lot of room for extension here. You can define new AST nodes,
    710 extend the language in many ways, etc. In the `next
    711 installment <LangImpl3.html>`_, we will describe how to generate LLVM
    712 Intermediate Representation (IR) from the AST.
    713 
    714 Full Code Listing
    715 =================
    716 
    717 Here is the complete code listing for this and the previous chapter.
    718 Note that it is fully self-contained: you don't need LLVM or any
    719 external libraries at all for this. (Besides the C and C++ standard
    720 libraries, of course.) To build this, just compile with:
    721 
    722 .. code-block:: bash
    723 
    724     # Compile
    725     clang++ -g -O3 toy.cpp
    726     # Run
    727     ./a.out
    728 
    729 Here is the code:
    730 
    731 .. literalinclude:: ../../examples/Kaleidoscope/Chapter2/toy.cpp
    732    :language: c++
    733 
    734 `Next: Implementing Code Generation to LLVM IR <LangImpl03.html>`_
    735 
    736