Home | History | Annotate | Download | only in tutorial
      1 ===========================================
      2 Kaleidoscope: Implementing a Parser and AST
      3 ===========================================
      4 
      5 .. contents::
      6    :local:
      7 
      8 Chapter 2 Introduction
      9 ======================
     10 
     11 Welcome to Chapter 2 of the "`Implementing a language with
     12 LLVM <index.html>`_" tutorial. This chapter shows you how to use the
     13 lexer, built in `Chapter 1 <LangImpl01.html>`_, to build a full
     14 `parser <http://en.wikipedia.org/wiki/Parsing>`_ for our Kaleidoscope
     15 language. Once we have a parser, we'll define and build an `Abstract
     16 Syntax Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST).
     17 
     18 The parser we will build uses a combination of `Recursive Descent
     19 Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and
     20 `Operator-Precedence
     21 Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to
     22 parse the Kaleidoscope language (the latter for binary expressions and
     23 the former for everything else). Before we get to parsing though, let's
     24 talk about the output of the parser: the Abstract Syntax Tree.
     25 
     26 The Abstract Syntax Tree (AST)
     27 ==============================
     28 
     29 The AST for a program captures its behavior in such a way that it is
     30 easy for later stages of the compiler (e.g. code generation) to
     31 interpret. We basically want one object for each construct in the
     32 language, and the AST should closely model the language. In
     33 Kaleidoscope, we have expressions, a prototype, and a function object.
     34 We'll start with expressions first:
     35 
     36 .. code-block:: c++
     37 
     38     /// ExprAST - Base class for all expression nodes.
     39     class ExprAST {
     40     public:
     41       virtual ~ExprAST() {}
     42     };
     43 
     44     /// NumberExprAST - Expression class for numeric literals like "1.0".
     45     class NumberExprAST : public ExprAST {
     46       double Val;
     47 
     48     public:
     49       NumberExprAST(double Val) : Val(Val) {}
     50     };
     51 
     52 The code above shows the definition of the base ExprAST class and one
     53 subclass which we use for numeric literals. The important thing to note
     54 about this code is that the NumberExprAST class captures the numeric
     55 value of the literal as an instance variable. This allows later phases
     56 of the compiler to know what the stored numeric value is.
     57 
     58 Right now we only create the AST, so there are no useful accessor
     59 methods on them. It would be very easy to add a virtual method to pretty
     60 print the code, for example. Here are the other expression AST node
     61 definitions that we'll use in the basic form of the Kaleidoscope
     62 language:
     63 
     64 .. code-block:: c++
     65 
     66     /// VariableExprAST - Expression class for referencing a variable, like "a".
     67     class VariableExprAST : public ExprAST {
     68       std::string Name;
     69 
     70     public:
     71       VariableExprAST(const std::string &Name) : Name(Name) {}
     72     };
     73 
     74     /// BinaryExprAST - Expression class for a binary operator.
     75     class BinaryExprAST : public ExprAST {
     76       char Op;
     77       std::unique_ptr<ExprAST> LHS, RHS;
     78 
     79     public:
     80       BinaryExprAST(char op, std::unique_ptr<ExprAST> LHS,
     81                     std::unique_ptr<ExprAST> RHS)
     82         : Op(op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}
     83     };
     84 
     85     /// CallExprAST - Expression class for function calls.
     86     class CallExprAST : public ExprAST {
     87       std::string Callee;
     88       std::vector<std::unique_ptr<ExprAST>> Args;
     89 
     90     public:
     91       CallExprAST(const std::string &Callee,
     92                   std::vector<std::unique_ptr<ExprAST>> Args)
     93         : Callee(Callee), Args(std::move(Args)) {}
     94     };
     95 
     96 This is all (intentionally) rather straight-forward: variables capture
     97 the variable name, binary operators capture their opcode (e.g. '+'), and
     98 calls capture a function name as well as a list of any argument
     99 expressions. One thing that is nice about our AST is that it captures
    100 the language features without talking about the syntax of the language.
    101 Note that there is no discussion about precedence of binary operators,
    102 lexical structure, etc.
    103 
    104 For our basic language, these are all of the expression nodes we'll
    105 define. Because it doesn't have conditional control flow, it isn't
    106 Turing-complete; we'll fix that in a later installment. The two things
    107 we need next are a way to talk about the interface to a function, and a
    108 way to talk about functions themselves:
    109 
    110 .. code-block:: c++
    111 
    112     /// PrototypeAST - This class represents the "prototype" for a function,
    113     /// which captures its name, and its argument names (thus implicitly the number
    114     /// of arguments the function takes).
    115     class PrototypeAST {
    116       std::string Name;
    117       std::vector<std::string> Args;
    118 
    119     public:
    120       PrototypeAST(const std::string &name, std::vector<std::string> Args)
    121         : Name(name), Args(std::move(Args)) {}
    122 
    123       const std::string &getName() const { return Name; }
    124     };
    125 
    126     /// FunctionAST - This class represents a function definition itself.
    127     class FunctionAST {
    128       std::unique_ptr<PrototypeAST> Proto;
    129       std::unique_ptr<ExprAST> Body;
    130 
    131     public:
    132       FunctionAST(std::unique_ptr<PrototypeAST> Proto,
    133                   std::unique_ptr<ExprAST> Body)
    134         : Proto(std::move(Proto)), Body(std::move(Body)) {}
    135     };
    136 
    137 In Kaleidoscope, functions are typed with just a count of their
    138 arguments. Since all values are double precision floating point, the
    139 type of each argument doesn't need to be stored anywhere. In a more
    140 aggressive and realistic language, the "ExprAST" class would probably
    141 have a type field.
    142 
    143 With this scaffolding, we can now talk about parsing expressions and
    144 function bodies in Kaleidoscope.
    145 
    146 Parser Basics
    147 =============
    148 
    149 Now that we have an AST to build, we need to define the parser code to
    150 build it. The idea here is that we want to parse something like "x+y"
    151 (which is returned as three tokens by the lexer) into an AST that could
    152 be generated with calls like this:
    153 
    154 .. code-block:: c++
    155 
    156       auto LHS = llvm::make_unique<VariableExprAST>("x");
    157       auto RHS = llvm::make_unique<VariableExprAST>("y");
    158       auto Result = std::make_unique<BinaryExprAST>('+', std::move(LHS),
    159                                                     std::move(RHS));
    160 
    161 In order to do this, we'll start by defining some basic helper routines:
    162 
    163 .. code-block:: c++
    164 
    165     /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
    166     /// token the parser is looking at.  getNextToken reads another token from the
    167     /// lexer and updates CurTok with its results.
    168     static int CurTok;
    169     static int getNextToken() {
    170       return CurTok = gettok();
    171     }
    172 
    173 This implements a simple token buffer around the lexer. This allows us
    174 to look one token ahead at what the lexer is returning. Every function
    175 in our parser will assume that CurTok is the current token that needs to
    176 be parsed.
    177 
    178 .. code-block:: c++
    179 
    180 
    181     /// LogError* - These are little helper functions for error handling.
    182     std::unique_ptr<ExprAST> LogError(const char *Str) {
    183       fprintf(stderr, "LogError: %s\n", Str);
    184       return nullptr;
    185     }
    186     std::unique_ptr<PrototypeAST> LogErrorP(const char *Str) {
    187       LogError(Str);
    188       return nullptr;
    189     }
    190 
    191 The ``LogError`` routines are simple helper routines that our parser will
    192 use to handle errors. The error recovery in our parser will not be the
    193 best and is not particular user-friendly, but it will be enough for our
    194 tutorial. These routines make it easier to handle errors in routines
    195 that have various return types: they always return null.
    196 
    197 With these basic helper functions, we can implement the first piece of
    198 our grammar: numeric literals.
    199 
    200 Basic Expression Parsing
    201 ========================
    202 
    203 We start with numeric literals, because they are the simplest to
    204 process. For each production in our grammar, we'll define a function
    205 which parses that production. For numeric literals, we have:
    206 
    207 .. code-block:: c++
    208 
    209     /// numberexpr ::= number
    210     static std::unique_ptr<ExprAST> ParseNumberExpr() {
    211       auto Result = llvm::make_unique<NumberExprAST>(NumVal);
    212       getNextToken(); // consume the number
    213       return std::move(Result);
    214     }
    215 
    216 This routine is very simple: it expects to be called when the current
    217 token is a ``tok_number`` token. It takes the current number value,
    218 creates a ``NumberExprAST`` node, advances the lexer to the next token,
    219 and finally returns.
    220 
    221 There are some interesting aspects to this. The most important one is
    222 that this routine eats all of the tokens that correspond to the
    223 production and returns the lexer buffer with the next token (which is
    224 not part of the grammar production) ready to go. This is a fairly
    225 standard way to go for recursive descent parsers. For a better example,
    226 the parenthesis operator is defined like this:
    227 
    228 .. code-block:: c++
    229 
    230     /// parenexpr ::= '(' expression ')'
    231     static std::unique_ptr<ExprAST> ParseParenExpr() {
    232       getNextToken(); // eat (.
    233       auto V = ParseExpression();
    234       if (!V)
    235         return nullptr;
    236 
    237       if (CurTok != ')')
    238         return LogError("expected ')'");
    239       getNextToken(); // eat ).
    240       return V;
    241     }
    242 
    243 This function illustrates a number of interesting things about the
    244 parser:
    245 
    246 1) It shows how we use the LogError routines. When called, this function
    247 expects that the current token is a '(' token, but after parsing the
    248 subexpression, it is possible that there is no ')' waiting. For example,
    249 if the user types in "(4 x" instead of "(4)", the parser should emit an
    250 error. Because errors can occur, the parser needs a way to indicate that
    251 they happened: in our parser, we return null on an error.
    252 
    253 2) Another interesting aspect of this function is that it uses recursion
    254 by calling ``ParseExpression`` (we will soon see that
    255 ``ParseExpression`` can call ``ParseParenExpr``). This is powerful
    256 because it allows us to handle recursive grammars, and keeps each
    257 production very simple. Note that parentheses do not cause construction
    258 of AST nodes themselves. While we could do it this way, the most
    259 important role of parentheses are to guide the parser and provide
    260 grouping. Once the parser constructs the AST, parentheses are not
    261 needed.
    262 
    263 The next simple production is for handling variable references and
    264 function calls:
    265 
    266 .. code-block:: c++
    267 
    268     /// identifierexpr
    269     ///   ::= identifier
    270     ///   ::= identifier '(' expression* ')'
    271     static std::unique_ptr<ExprAST> ParseIdentifierExpr() {
    272       std::string IdName = IdentifierStr;
    273 
    274       getNextToken();  // eat identifier.
    275 
    276       if (CurTok != '(') // Simple variable ref.
    277         return llvm::make_unique<VariableExprAST>(IdName);
    278 
    279       // Call.
    280       getNextToken();  // eat (
    281       std::vector<std::unique_ptr<ExprAST>> Args;
    282       if (CurTok != ')') {
    283         while (1) {
    284           if (auto Arg = ParseExpression())
    285             Args.push_back(std::move(Arg));
    286           else
    287             return nullptr;
    288 
    289           if (CurTok == ')')
    290             break;
    291 
    292           if (CurTok != ',')
    293             return LogError("Expected ')' or ',' in argument list");
    294           getNextToken();
    295         }
    296       }
    297 
    298       // Eat the ')'.
    299       getNextToken();
    300 
    301       return llvm::make_unique<CallExprAST>(IdName, std::move(Args));
    302     }
    303 
    304 This routine follows the same style as the other routines. (It expects
    305 to be called if the current token is a ``tok_identifier`` token). It
    306 also has recursion and error handling. One interesting aspect of this is
    307 that it uses *look-ahead* to determine if the current identifier is a
    308 stand alone variable reference or if it is a function call expression.
    309 It handles this by checking to see if the token after the identifier is
    310 a '(' token, constructing either a ``VariableExprAST`` or
    311 ``CallExprAST`` node as appropriate.
    312 
    313 Now that we have all of our simple expression-parsing logic in place, we
    314 can define a helper function to wrap it together into one entry point.
    315 We call this class of expressions "primary" expressions, for reasons
    316 that will become more clear `later in the
    317 tutorial <LangImpl6.html#user-defined-unary-operators>`_. In order to parse an arbitrary
    318 primary expression, we need to determine what sort of expression it is:
    319 
    320 .. code-block:: c++
    321 
    322     /// primary
    323     ///   ::= identifierexpr
    324     ///   ::= numberexpr
    325     ///   ::= parenexpr
    326     static std::unique_ptr<ExprAST> ParsePrimary() {
    327       switch (CurTok) {
    328       default:
    329         return LogError("unknown token when expecting an expression");
    330       case tok_identifier:
    331         return ParseIdentifierExpr();
    332       case tok_number:
    333         return ParseNumberExpr();
    334       case '(':
    335         return ParseParenExpr();
    336       }
    337     }
    338 
    339 Now that you see the definition of this function, it is more obvious why
    340 we can assume the state of CurTok in the various functions. This uses
    341 look-ahead to determine which sort of expression is being inspected, and
    342 then parses it with a function call.
    343 
    344 Now that basic expressions are handled, we need to handle binary
    345 expressions. They are a bit more complex.
    346 
    347 Binary Expression Parsing
    348 =========================
    349 
    350 Binary expressions are significantly harder to parse because they are
    351 often ambiguous. For example, when given the string "x+y\*z", the parser
    352 can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common
    353 definitions from mathematics, we expect the later parse, because "\*"
    354 (multiplication) has higher *precedence* than "+" (addition).
    355 
    356 There are many ways to handle this, but an elegant and efficient way is
    357 to use `Operator-Precedence
    358 Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_.
    359 This parsing technique uses the precedence of binary operators to guide
    360 recursion. To start with, we need a table of precedences:
    361 
    362 .. code-block:: c++
    363 
    364     /// BinopPrecedence - This holds the precedence for each binary operator that is
    365     /// defined.
    366     static std::map<char, int> BinopPrecedence;
    367 
    368     /// GetTokPrecedence - Get the precedence of the pending binary operator token.
    369     static int GetTokPrecedence() {
    370       if (!isascii(CurTok))
    371         return -1;
    372 
    373       // Make sure it's a declared binop.
    374       int TokPrec = BinopPrecedence[CurTok];
    375       if (TokPrec <= 0) return -1;
    376       return TokPrec;
    377     }
    378 
    379     int main() {
    380       // Install standard binary operators.
    381       // 1 is lowest precedence.
    382       BinopPrecedence['<'] = 10;
    383       BinopPrecedence['+'] = 20;
    384       BinopPrecedence['-'] = 20;
    385       BinopPrecedence['*'] = 40;  // highest.
    386       ...
    387     }
    388 
    389 For the basic form of Kaleidoscope, we will only support 4 binary
    390 operators (this can obviously be extended by you, our brave and intrepid
    391 reader). The ``GetTokPrecedence`` function returns the precedence for
    392 the current token, or -1 if the token is not a binary operator. Having a
    393 map makes it easy to add new operators and makes it clear that the
    394 algorithm doesn't depend on the specific operators involved, but it
    395 would be easy enough to eliminate the map and do the comparisons in the
    396 ``GetTokPrecedence`` function. (Or just use a fixed-size array).
    397 
    398 With the helper above defined, we can now start parsing binary
    399 expressions. The basic idea of operator precedence parsing is to break
    400 down an expression with potentially ambiguous binary operators into
    401 pieces. Consider, for example, the expression "a+b+(c+d)\*e\*f+g".
    402 Operator precedence parsing considers this as a stream of primary
    403 expressions separated by binary operators. As such, it will first parse
    404 the leading primary expression "a", then it will see the pairs [+, b]
    405 [+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are
    406 primary expressions, the binary expression parser doesn't need to worry
    407 about nested subexpressions like (c+d) at all.
    408 
    409 To start, an expression is a primary expression potentially followed by
    410 a sequence of [binop,primaryexpr] pairs:
    411 
    412 .. code-block:: c++
    413 
    414     /// expression
    415     ///   ::= primary binoprhs
    416     ///
    417     static std::unique_ptr<ExprAST> ParseExpression() {
    418       auto LHS = ParsePrimary();
    419       if (!LHS)
    420         return nullptr;
    421 
    422       return ParseBinOpRHS(0, std::move(LHS));
    423     }
    424 
    425 ``ParseBinOpRHS`` is the function that parses the sequence of pairs for
    426 us. It takes a precedence and a pointer to an expression for the part
    427 that has been parsed so far. Note that "x" is a perfectly valid
    428 expression: As such, "binoprhs" is allowed to be empty, in which case it
    429 returns the expression that is passed into it. In our example above, the
    430 code passes the expression for "a" into ``ParseBinOpRHS`` and the
    431 current token is "+".
    432 
    433 The precedence value passed into ``ParseBinOpRHS`` indicates the
    434 *minimal operator precedence* that the function is allowed to eat. For
    435 example, if the current pair stream is [+, x] and ``ParseBinOpRHS`` is
    436 passed in a precedence of 40, it will not consume any tokens (because
    437 the precedence of '+' is only 20). With this in mind, ``ParseBinOpRHS``
    438 starts with:
    439 
    440 .. code-block:: c++
    441 
    442     /// binoprhs
    443     ///   ::= ('+' primary)*
    444     static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,
    445                                                   std::unique_ptr<ExprAST> LHS) {
    446       // If this is a binop, find its precedence.
    447       while (1) {
    448         int TokPrec = GetTokPrecedence();
    449 
    450         // If this is a binop that binds at least as tightly as the current binop,
    451         // consume it, otherwise we are done.
    452         if (TokPrec < ExprPrec)
    453           return LHS;
    454 
    455 This code gets the precedence of the current token and checks to see if
    456 if is too low. Because we defined invalid tokens to have a precedence of
    457 -1, this check implicitly knows that the pair-stream ends when the token
    458 stream runs out of binary operators. If this check succeeds, we know
    459 that the token is a binary operator and that it will be included in this
    460 expression:
    461 
    462 .. code-block:: c++
    463 
    464         // Okay, we know this is a binop.
    465         int BinOp = CurTok;
    466         getNextToken();  // eat binop
    467 
    468         // Parse the primary expression after the binary operator.
    469         auto RHS = ParsePrimary();
    470         if (!RHS)
    471           return nullptr;
    472 
    473 As such, this code eats (and remembers) the binary operator and then
    474 parses the primary expression that follows. This builds up the whole
    475 pair, the first of which is [+, b] for the running example.
    476 
    477 Now that we parsed the left-hand side of an expression and one pair of
    478 the RHS sequence, we have to decide which way the expression associates.
    479 In particular, we could have "(a+b) binop unparsed" or "a + (b binop
    480 unparsed)". To determine this, we look ahead at "binop" to determine its
    481 precedence and compare it to BinOp's precedence (which is '+' in this
    482 case):
    483 
    484 .. code-block:: c++
    485 
    486         // If BinOp binds less tightly with RHS than the operator after RHS, let
    487         // the pending operator take RHS as its LHS.
    488         int NextPrec = GetTokPrecedence();
    489         if (TokPrec < NextPrec) {
    490 
    491 If the precedence of the binop to the right of "RHS" is lower or equal
    492 to the precedence of our current operator, then we know that the
    493 parentheses associate as "(a+b) binop ...". In our example, the current
    494 operator is "+" and the next operator is "+", we know that they have the
    495 same precedence. In this case we'll create the AST node for "a+b", and
    496 then continue parsing:
    497 
    498 .. code-block:: c++
    499 
    500           ... if body omitted ...
    501         }
    502 
    503         // Merge LHS/RHS.
    504         LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),
    505                                                std::move(RHS));
    506       }  // loop around to the top of the while loop.
    507     }
    508 
    509 In our example above, this will turn "a+b+" into "(a+b)" and execute the
    510 next iteration of the loop, with "+" as the current token. The code
    511 above will eat, remember, and parse "(c+d)" as the primary expression,
    512 which makes the current pair equal to [+, (c+d)]. It will then evaluate
    513 the 'if' conditional above with "\*" as the binop to the right of the
    514 primary. In this case, the precedence of "\*" is higher than the
    515 precedence of "+" so the if condition will be entered.
    516 
    517 The critical question left here is "how can the if condition parse the
    518 right hand side in full"? In particular, to build the AST correctly for
    519 our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression
    520 variable. The code to do this is surprisingly simple (code from the
    521 above two blocks duplicated for context):
    522 
    523 .. code-block:: c++
    524 
    525         // If BinOp binds less tightly with RHS than the operator after RHS, let
    526         // the pending operator take RHS as its LHS.
    527         int NextPrec = GetTokPrecedence();
    528         if (TokPrec < NextPrec) {
    529           RHS = ParseBinOpRHS(TokPrec+1, std::move(RHS));
    530           if (!RHS)
    531             return nullptr;
    532         }
    533         // Merge LHS/RHS.
    534         LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),
    535                                                std::move(RHS));
    536       }  // loop around to the top of the while loop.
    537     }
    538 
    539 At this point, we know that the binary operator to the RHS of our
    540 primary has higher precedence than the binop we are currently parsing.
    541 As such, we know that any sequence of pairs whose operators are all
    542 higher precedence than "+" should be parsed together and returned as
    543 "RHS". To do this, we recursively invoke the ``ParseBinOpRHS`` function
    544 specifying "TokPrec+1" as the minimum precedence required for it to
    545 continue. In our example above, this will cause it to return the AST
    546 node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of the '+'
    547 expression.
    548 
    549 Finally, on the next iteration of the while loop, the "+g" piece is
    550 parsed and added to the AST. With this little bit of code (14
    551 non-trivial lines), we correctly handle fully general binary expression
    552 parsing in a very elegant way. This was a whirlwind tour of this code,
    553 and it is somewhat subtle. I recommend running through it with a few
    554 tough examples to see how it works.
    555 
    556 This wraps up handling of expressions. At this point, we can point the
    557 parser at an arbitrary token stream and build an expression from it,
    558 stopping at the first token that is not part of the expression. Next up
    559 we need to handle function definitions, etc.
    560 
    561 Parsing the Rest
    562 ================
    563 
    564 The next thing missing is handling of function prototypes. In
    565 Kaleidoscope, these are used both for 'extern' function declarations as
    566 well as function body definitions. The code to do this is
    567 straight-forward and not very interesting (once you've survived
    568 expressions):
    569 
    570 .. code-block:: c++
    571 
    572     /// prototype
    573     ///   ::= id '(' id* ')'
    574     static std::unique_ptr<PrototypeAST> ParsePrototype() {
    575       if (CurTok != tok_identifier)
    576         return LogErrorP("Expected function name in prototype");
    577 
    578       std::string FnName = IdentifierStr;
    579       getNextToken();
    580 
    581       if (CurTok != '(')
    582         return LogErrorP("Expected '(' in prototype");
    583 
    584       // Read the list of argument names.
    585       std::vector<std::string> ArgNames;
    586       while (getNextToken() == tok_identifier)
    587         ArgNames.push_back(IdentifierStr);
    588       if (CurTok != ')')
    589         return LogErrorP("Expected ')' in prototype");
    590 
    591       // success.
    592       getNextToken();  // eat ')'.
    593 
    594       return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames));
    595     }
    596 
    597 Given this, a function definition is very simple, just a prototype plus
    598 an expression to implement the body:
    599 
    600 .. code-block:: c++
    601 
    602     /// definition ::= 'def' prototype expression
    603     static std::unique_ptr<FunctionAST> ParseDefinition() {
    604       getNextToken();  // eat def.
    605       auto Proto = ParsePrototype();
    606       if (!Proto) return nullptr;
    607 
    608       if (auto E = ParseExpression())
    609         return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));
    610       return nullptr;
    611     }
    612 
    613 In addition, we support 'extern' to declare functions like 'sin' and
    614 'cos' as well as to support forward declaration of user functions. These
    615 'extern's are just prototypes with no body:
    616 
    617 .. code-block:: c++
    618 
    619     /// external ::= 'extern' prototype
    620     static std::unique_ptr<PrototypeAST> ParseExtern() {
    621       getNextToken();  // eat extern.
    622       return ParsePrototype();
    623     }
    624 
    625 Finally, we'll also let the user type in arbitrary top-level expressions
    626 and evaluate them on the fly. We will handle this by defining anonymous
    627 nullary (zero argument) functions for them:
    628 
    629 .. code-block:: c++
    630 
    631     /// toplevelexpr ::= expression
    632     static std::unique_ptr<FunctionAST> ParseTopLevelExpr() {
    633       if (auto E = ParseExpression()) {
    634         // Make an anonymous proto.
    635         auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>());
    636         return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));
    637       }
    638       return nullptr;
    639     }
    640 
    641 Now that we have all the pieces, let's build a little driver that will
    642 let us actually *execute* this code we've built!
    643 
    644 The Driver
    645 ==========
    646 
    647 The driver for this simply invokes all of the parsing pieces with a
    648 top-level dispatch loop. There isn't much interesting here, so I'll just
    649 include the top-level loop. See `below <#full-code-listing>`_ for full code in the
    650 "Top-Level Parsing" section.
    651 
    652 .. code-block:: c++
    653 
    654     /// top ::= definition | external | expression | ';'
    655     static void MainLoop() {
    656       while (1) {
    657         fprintf(stderr, "ready> ");
    658         switch (CurTok) {
    659         case tok_eof:
    660           return;
    661         case ';': // ignore top-level semicolons.
    662           getNextToken();
    663           break;
    664         case tok_def:
    665           HandleDefinition();
    666           break;
    667         case tok_extern:
    668           HandleExtern();
    669           break;
    670         default:
    671           HandleTopLevelExpression();
    672           break;
    673         }
    674       }
    675     }
    676 
    677 The most interesting part of this is that we ignore top-level
    678 semicolons. Why is this, you ask? The basic reason is that if you type
    679 "4 + 5" at the command line, the parser doesn't know whether that is the
    680 end of what you will type or not. For example, on the next line you
    681 could type "def foo..." in which case 4+5 is the end of a top-level
    682 expression. Alternatively you could type "\* 6", which would continue
    683 the expression. Having top-level semicolons allows you to type "4+5;",
    684 and the parser will know you are done.
    685 
    686 Conclusions
    687 ===========
    688 
    689 With just under 400 lines of commented code (240 lines of non-comment,
    690 non-blank code), we fully defined our minimal language, including a
    691 lexer, parser, and AST builder. With this done, the executable will
    692 validate Kaleidoscope code and tell us if it is grammatically invalid.
    693 For example, here is a sample interaction:
    694 
    695 .. code-block:: bash
    696 
    697     $ ./a.out
    698     ready> def foo(x y) x+foo(y, 4.0);
    699     Parsed a function definition.
    700     ready> def foo(x y) x+y y;
    701     Parsed a function definition.
    702     Parsed a top-level expr
    703     ready> def foo(x y) x+y );
    704     Parsed a function definition.
    705     Error: unknown token when expecting an expression
    706     ready> extern sin(a);
    707     ready> Parsed an extern
    708     ready> ^D
    709     $
    710 
    711 There is a lot of room for extension here. You can define new AST nodes,
    712 extend the language in many ways, etc. In the `next
    713 installment <LangImpl03.html>`_, we will describe how to generate LLVM
    714 Intermediate Representation (IR) from the AST.
    715 
    716 Full Code Listing
    717 =================
    718 
    719 Here is the complete code listing for our running example. Because this
    720 uses the LLVM libraries, we need to link them in. To do this, we use the
    721 `llvm-config <http://llvm.org/cmds/llvm-config.html>`_ tool to inform
    722 our makefile/command line about which options to use:
    723 
    724 .. code-block:: bash
    725 
    726     # Compile
    727     clang++ -g -O3 toy.cpp `llvm-config --cxxflags`
    728     # Run
    729     ./a.out
    730 
    731 Here is the code:
    732 
    733 .. literalinclude:: ../../examples/Kaleidoscope/Chapter2/toy.cpp
    734    :language: c++
    735 
    736 `Next: Implementing Code Generation to LLVM IR <LangImpl03.html>`_
    737 
    738