Home | History | Annotate | Download | only in tutorial
      1 ===========================================
      2 Kaleidoscope: Implementing a Parser and AST
      3 ===========================================
      4 
      5 .. contents::
      6    :local:
      7 
      8 Chapter 2 Introduction
      9 ======================
     10 
     11 Welcome to Chapter 2 of the "`Implementing a language with
     12 LLVM <index.html>`_" tutorial. This chapter shows you how to use the
     13 lexer, built in `Chapter 1 <LangImpl1.html>`_, to build a full
     14 `parser <http://en.wikipedia.org/wiki/Parsing>`_ for our Kaleidoscope
     15 language. Once we have a parser, we'll define and build an `Abstract
     16 Syntax Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST).
     17 
     18 The parser we will build uses a combination of `Recursive Descent
     19 Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and
     20 `Operator-Precedence
     21 Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to
     22 parse the Kaleidoscope language (the latter for binary expressions and
     23 the former for everything else). Before we get to parsing though, lets
     24 talk about the output of the parser: the Abstract Syntax Tree.
     25 
     26 The Abstract Syntax Tree (AST)
     27 ==============================
     28 
     29 The AST for a program captures its behavior in such a way that it is
     30 easy for later stages of the compiler (e.g. code generation) to
     31 interpret. We basically want one object for each construct in the
     32 language, and the AST should closely model the language. In
     33 Kaleidoscope, we have expressions, a prototype, and a function object.
     34 We'll start with expressions first:
     35 
     36 .. code-block:: c++
     37 
     38     /// ExprAST - Base class for all expression nodes.
     39     class ExprAST {
     40     public:
     41       virtual ~ExprAST() {}
     42     };
     43 
     44     /// NumberExprAST - Expression class for numeric literals like "1.0".
     45     class NumberExprAST : public ExprAST {
     46       double Val;
     47     public:
     48       NumberExprAST(double val) : Val(val) {}
     49     };
     50 
     51 The code above shows the definition of the base ExprAST class and one
     52 subclass which we use for numeric literals. The important thing to note
     53 about this code is that the NumberExprAST class captures the numeric
     54 value of the literal as an instance variable. This allows later phases
     55 of the compiler to know what the stored numeric value is.
     56 
     57 Right now we only create the AST, so there are no useful accessor
     58 methods on them. It would be very easy to add a virtual method to pretty
     59 print the code, for example. Here are the other expression AST node
     60 definitions that we'll use in the basic form of the Kaleidoscope
     61 language:
     62 
     63 .. code-block:: c++
     64 
     65     /// VariableExprAST - Expression class for referencing a variable, like "a".
     66     class VariableExprAST : public ExprAST {
     67       std::string Name;
     68     public:
     69       VariableExprAST(const std::string &name) : Name(name) {}
     70     };
     71 
     72     /// BinaryExprAST - Expression class for a binary operator.
     73     class BinaryExprAST : public ExprAST {
     74       char Op;
     75       ExprAST *LHS, *RHS;
     76     public:
     77       BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
     78         : Op(op), LHS(lhs), RHS(rhs) {}
     79     };
     80 
     81     /// CallExprAST - Expression class for function calls.
     82     class CallExprAST : public ExprAST {
     83       std::string Callee;
     84       std::vector<ExprAST*> Args;
     85     public:
     86       CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
     87         : Callee(callee), Args(args) {}
     88     };
     89 
     90 This is all (intentionally) rather straight-forward: variables capture
     91 the variable name, binary operators capture their opcode (e.g. '+'), and
     92 calls capture a function name as well as a list of any argument
     93 expressions. One thing that is nice about our AST is that it captures
     94 the language features without talking about the syntax of the language.
     95 Note that there is no discussion about precedence of binary operators,
     96 lexical structure, etc.
     97 
     98 For our basic language, these are all of the expression nodes we'll
     99 define. Because it doesn't have conditional control flow, it isn't
    100 Turing-complete; we'll fix that in a later installment. The two things
    101 we need next are a way to talk about the interface to a function, and a
    102 way to talk about functions themselves:
    103 
    104 .. code-block:: c++
    105 
    106     /// PrototypeAST - This class represents the "prototype" for a function,
    107     /// which captures its name, and its argument names (thus implicitly the number
    108     /// of arguments the function takes).
    109     class PrototypeAST {
    110       std::string Name;
    111       std::vector<std::string> Args;
    112     public:
    113       PrototypeAST(const std::string &name, const std::vector<std::string> &args)
    114         : Name(name), Args(args) {}
    115     };
    116 
    117     /// FunctionAST - This class represents a function definition itself.
    118     class FunctionAST {
    119       PrototypeAST *Proto;
    120       ExprAST *Body;
    121     public:
    122       FunctionAST(PrototypeAST *proto, ExprAST *body)
    123         : Proto(proto), Body(body) {}
    124     };
    125 
    126 In Kaleidoscope, functions are typed with just a count of their
    127 arguments. Since all values are double precision floating point, the
    128 type of each argument doesn't need to be stored anywhere. In a more
    129 aggressive and realistic language, the "ExprAST" class would probably
    130 have a type field.
    131 
    132 With this scaffolding, we can now talk about parsing expressions and
    133 function bodies in Kaleidoscope.
    134 
    135 Parser Basics
    136 =============
    137 
    138 Now that we have an AST to build, we need to define the parser code to
    139 build it. The idea here is that we want to parse something like "x+y"
    140 (which is returned as three tokens by the lexer) into an AST that could
    141 be generated with calls like this:
    142 
    143 .. code-block:: c++
    144 
    145       ExprAST *X = new VariableExprAST("x");
    146       ExprAST *Y = new VariableExprAST("y");
    147       ExprAST *Result = new BinaryExprAST('+', X, Y);
    148 
    149 In order to do this, we'll start by defining some basic helper routines:
    150 
    151 .. code-block:: c++
    152 
    153     /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
    154     /// token the parser is looking at.  getNextToken reads another token from the
    155     /// lexer and updates CurTok with its results.
    156     static int CurTok;
    157     static int getNextToken() {
    158       return CurTok = gettok();
    159     }
    160 
    161 This implements a simple token buffer around the lexer. This allows us
    162 to look one token ahead at what the lexer is returning. Every function
    163 in our parser will assume that CurTok is the current token that needs to
    164 be parsed.
    165 
    166 .. code-block:: c++
    167 
    168 
    169     /// Error* - These are little helper functions for error handling.
    170     ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
    171     PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
    172     FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
    173 
    174 The ``Error`` routines are simple helper routines that our parser will
    175 use to handle errors. The error recovery in our parser will not be the
    176 best and is not particular user-friendly, but it will be enough for our
    177 tutorial. These routines make it easier to handle errors in routines
    178 that have various return types: they always return null.
    179 
    180 With these basic helper functions, we can implement the first piece of
    181 our grammar: numeric literals.
    182 
    183 Basic Expression Parsing
    184 ========================
    185 
    186 We start with numeric literals, because they are the simplest to
    187 process. For each production in our grammar, we'll define a function
    188 which parses that production. For numeric literals, we have:
    189 
    190 .. code-block:: c++
    191 
    192     /// numberexpr ::= number
    193     static ExprAST *ParseNumberExpr() {
    194       ExprAST *Result = new NumberExprAST(NumVal);
    195       getNextToken(); // consume the number
    196       return Result;
    197     }
    198 
    199 This routine is very simple: it expects to be called when the current
    200 token is a ``tok_number`` token. It takes the current number value,
    201 creates a ``NumberExprAST`` node, advances the lexer to the next token,
    202 and finally returns.
    203 
    204 There are some interesting aspects to this. The most important one is
    205 that this routine eats all of the tokens that correspond to the
    206 production and returns the lexer buffer with the next token (which is
    207 not part of the grammar production) ready to go. This is a fairly
    208 standard way to go for recursive descent parsers. For a better example,
    209 the parenthesis operator is defined like this:
    210 
    211 .. code-block:: c++
    212 
    213     /// parenexpr ::= '(' expression ')'
    214     static ExprAST *ParseParenExpr() {
    215       getNextToken();  // eat (.
    216       ExprAST *V = ParseExpression();
    217       if (!V) return 0;
    218 
    219       if (CurTok != ')')
    220         return Error("expected ')'");
    221       getNextToken();  // eat ).
    222       return V;
    223     }
    224 
    225 This function illustrates a number of interesting things about the
    226 parser:
    227 
    228 1) It shows how we use the Error routines. When called, this function
    229 expects that the current token is a '(' token, but after parsing the
    230 subexpression, it is possible that there is no ')' waiting. For example,
    231 if the user types in "(4 x" instead of "(4)", the parser should emit an
    232 error. Because errors can occur, the parser needs a way to indicate that
    233 they happened: in our parser, we return null on an error.
    234 
    235 2) Another interesting aspect of this function is that it uses recursion
    236 by calling ``ParseExpression`` (we will soon see that
    237 ``ParseExpression`` can call ``ParseParenExpr``). This is powerful
    238 because it allows us to handle recursive grammars, and keeps each
    239 production very simple. Note that parentheses do not cause construction
    240 of AST nodes themselves. While we could do it this way, the most
    241 important role of parentheses are to guide the parser and provide
    242 grouping. Once the parser constructs the AST, parentheses are not
    243 needed.
    244 
    245 The next simple production is for handling variable references and
    246 function calls:
    247 
    248 .. code-block:: c++
    249 
    250     /// identifierexpr
    251     ///   ::= identifier
    252     ///   ::= identifier '(' expression* ')'
    253     static ExprAST *ParseIdentifierExpr() {
    254       std::string IdName = IdentifierStr;
    255 
    256       getNextToken();  // eat identifier.
    257 
    258       if (CurTok != '(') // Simple variable ref.
    259         return new VariableExprAST(IdName);
    260 
    261       // Call.
    262       getNextToken();  // eat (
    263       std::vector<ExprAST*> Args;
    264       if (CurTok != ')') {
    265         while (1) {
    266           ExprAST *Arg = ParseExpression();
    267           if (!Arg) return 0;
    268           Args.push_back(Arg);
    269 
    270           if (CurTok == ')') break;
    271 
    272           if (CurTok != ',')
    273             return Error("Expected ')' or ',' in argument list");
    274           getNextToken();
    275         }
    276       }
    277 
    278       // Eat the ')'.
    279       getNextToken();
    280 
    281       return new CallExprAST(IdName, Args);
    282     }
    283 
    284 This routine follows the same style as the other routines. (It expects
    285 to be called if the current token is a ``tok_identifier`` token). It
    286 also has recursion and error handling. One interesting aspect of this is
    287 that it uses *look-ahead* to determine if the current identifier is a
    288 stand alone variable reference or if it is a function call expression.
    289 It handles this by checking to see if the token after the identifier is
    290 a '(' token, constructing either a ``VariableExprAST`` or
    291 ``CallExprAST`` node as appropriate.
    292 
    293 Now that we have all of our simple expression-parsing logic in place, we
    294 can define a helper function to wrap it together into one entry point.
    295 We call this class of expressions "primary" expressions, for reasons
    296 that will become more clear `later in the
    297 tutorial <LangImpl6.html#unary>`_. In order to parse an arbitrary
    298 primary expression, we need to determine what sort of expression it is:
    299 
    300 .. code-block:: c++
    301 
    302     /// primary
    303     ///   ::= identifierexpr
    304     ///   ::= numberexpr
    305     ///   ::= parenexpr
    306     static ExprAST *ParsePrimary() {
    307       switch (CurTok) {
    308       default: return Error("unknown token when expecting an expression");
    309       case tok_identifier: return ParseIdentifierExpr();
    310       case tok_number:     return ParseNumberExpr();
    311       case '(':            return ParseParenExpr();
    312       }
    313     }
    314 
    315 Now that you see the definition of this function, it is more obvious why
    316 we can assume the state of CurTok in the various functions. This uses
    317 look-ahead to determine which sort of expression is being inspected, and
    318 then parses it with a function call.
    319 
    320 Now that basic expressions are handled, we need to handle binary
    321 expressions. They are a bit more complex.
    322 
    323 Binary Expression Parsing
    324 =========================
    325 
    326 Binary expressions are significantly harder to parse because they are
    327 often ambiguous. For example, when given the string "x+y\*z", the parser
    328 can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common
    329 definitions from mathematics, we expect the later parse, because "\*"
    330 (multiplication) has higher *precedence* than "+" (addition).
    331 
    332 There are many ways to handle this, but an elegant and efficient way is
    333 to use `Operator-Precedence
    334 Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_.
    335 This parsing technique uses the precedence of binary operators to guide
    336 recursion. To start with, we need a table of precedences:
    337 
    338 .. code-block:: c++
    339 
    340     /// BinopPrecedence - This holds the precedence for each binary operator that is
    341     /// defined.
    342     static std::map<char, int> BinopPrecedence;
    343 
    344     /// GetTokPrecedence - Get the precedence of the pending binary operator token.
    345     static int GetTokPrecedence() {
    346       if (!isascii(CurTok))
    347         return -1;
    348 
    349       // Make sure it's a declared binop.
    350       int TokPrec = BinopPrecedence[CurTok];
    351       if (TokPrec <= 0) return -1;
    352       return TokPrec;
    353     }
    354 
    355     int main() {
    356       // Install standard binary operators.
    357       // 1 is lowest precedence.
    358       BinopPrecedence['<'] = 10;
    359       BinopPrecedence['+'] = 20;
    360       BinopPrecedence['-'] = 20;
    361       BinopPrecedence['*'] = 40;  // highest.
    362       ...
    363     }
    364 
    365 For the basic form of Kaleidoscope, we will only support 4 binary
    366 operators (this can obviously be extended by you, our brave and intrepid
    367 reader). The ``GetTokPrecedence`` function returns the precedence for
    368 the current token, or -1 if the token is not a binary operator. Having a
    369 map makes it easy to add new operators and makes it clear that the
    370 algorithm doesn't depend on the specific operators involved, but it
    371 would be easy enough to eliminate the map and do the comparisons in the
    372 ``GetTokPrecedence`` function. (Or just use a fixed-size array).
    373 
    374 With the helper above defined, we can now start parsing binary
    375 expressions. The basic idea of operator precedence parsing is to break
    376 down an expression with potentially ambiguous binary operators into
    377 pieces. Consider ,for example, the expression "a+b+(c+d)\*e\*f+g".
    378 Operator precedence parsing considers this as a stream of primary
    379 expressions separated by binary operators. As such, it will first parse
    380 the leading primary expression "a", then it will see the pairs [+, b]
    381 [+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are
    382 primary expressions, the binary expression parser doesn't need to worry
    383 about nested subexpressions like (c+d) at all.
    384 
    385 To start, an expression is a primary expression potentially followed by
    386 a sequence of [binop,primaryexpr] pairs:
    387 
    388 .. code-block:: c++
    389 
    390     /// expression
    391     ///   ::= primary binoprhs
    392     ///
    393     static ExprAST *ParseExpression() {
    394       ExprAST *LHS = ParsePrimary();
    395       if (!LHS) return 0;
    396 
    397       return ParseBinOpRHS(0, LHS);
    398     }
    399 
    400 ``ParseBinOpRHS`` is the function that parses the sequence of pairs for
    401 us. It takes a precedence and a pointer to an expression for the part
    402 that has been parsed so far. Note that "x" is a perfectly valid
    403 expression: As such, "binoprhs" is allowed to be empty, in which case it
    404 returns the expression that is passed into it. In our example above, the
    405 code passes the expression for "a" into ``ParseBinOpRHS`` and the
    406 current token is "+".
    407 
    408 The precedence value passed into ``ParseBinOpRHS`` indicates the
    409 *minimal operator precedence* that the function is allowed to eat. For
    410 example, if the current pair stream is [+, x] and ``ParseBinOpRHS`` is
    411 passed in a precedence of 40, it will not consume any tokens (because
    412 the precedence of '+' is only 20). With this in mind, ``ParseBinOpRHS``
    413 starts with:
    414 
    415 .. code-block:: c++
    416 
    417     /// binoprhs
    418     ///   ::= ('+' primary)*
    419     static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
    420       // If this is a binop, find its precedence.
    421       while (1) {
    422         int TokPrec = GetTokPrecedence();
    423 
    424         // If this is a binop that binds at least as tightly as the current binop,
    425         // consume it, otherwise we are done.
    426         if (TokPrec < ExprPrec)
    427           return LHS;
    428 
    429 This code gets the precedence of the current token and checks to see if
    430 if is too low. Because we defined invalid tokens to have a precedence of
    431 -1, this check implicitly knows that the pair-stream ends when the token
    432 stream runs out of binary operators. If this check succeeds, we know
    433 that the token is a binary operator and that it will be included in this
    434 expression:
    435 
    436 .. code-block:: c++
    437 
    438         // Okay, we know this is a binop.
    439         int BinOp = CurTok;
    440         getNextToken();  // eat binop
    441 
    442         // Parse the primary expression after the binary operator.
    443         ExprAST *RHS = ParsePrimary();
    444         if (!RHS) return 0;
    445 
    446 As such, this code eats (and remembers) the binary operator and then
    447 parses the primary expression that follows. This builds up the whole
    448 pair, the first of which is [+, b] for the running example.
    449 
    450 Now that we parsed the left-hand side of an expression and one pair of
    451 the RHS sequence, we have to decide which way the expression associates.
    452 In particular, we could have "(a+b) binop unparsed" or "a + (b binop
    453 unparsed)". To determine this, we look ahead at "binop" to determine its
    454 precedence and compare it to BinOp's precedence (which is '+' in this
    455 case):
    456 
    457 .. code-block:: c++
    458 
    459         // If BinOp binds less tightly with RHS than the operator after RHS, let
    460         // the pending operator take RHS as its LHS.
    461         int NextPrec = GetTokPrecedence();
    462         if (TokPrec < NextPrec) {
    463 
    464 If the precedence of the binop to the right of "RHS" is lower or equal
    465 to the precedence of our current operator, then we know that the
    466 parentheses associate as "(a+b) binop ...". In our example, the current
    467 operator is "+" and the next operator is "+", we know that they have the
    468 same precedence. In this case we'll create the AST node for "a+b", and
    469 then continue parsing:
    470 
    471 .. code-block:: c++
    472 
    473           ... if body omitted ...
    474         }
    475 
    476         // Merge LHS/RHS.
    477         LHS = new BinaryExprAST(BinOp, LHS, RHS);
    478       }  // loop around to the top of the while loop.
    479     }
    480 
    481 In our example above, this will turn "a+b+" into "(a+b)" and execute the
    482 next iteration of the loop, with "+" as the current token. The code
    483 above will eat, remember, and parse "(c+d)" as the primary expression,
    484 which makes the current pair equal to [+, (c+d)]. It will then evaluate
    485 the 'if' conditional above with "\*" as the binop to the right of the
    486 primary. In this case, the precedence of "\*" is higher than the
    487 precedence of "+" so the if condition will be entered.
    488 
    489 The critical question left here is "how can the if condition parse the
    490 right hand side in full"? In particular, to build the AST correctly for
    491 our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression
    492 variable. The code to do this is surprisingly simple (code from the
    493 above two blocks duplicated for context):
    494 
    495 .. code-block:: c++
    496 
    497         // If BinOp binds less tightly with RHS than the operator after RHS, let
    498         // the pending operator take RHS as its LHS.
    499         int NextPrec = GetTokPrecedence();
    500         if (TokPrec < NextPrec) {
    501           RHS = ParseBinOpRHS(TokPrec+1, RHS);
    502           if (RHS == 0) return 0;
    503         }
    504         // Merge LHS/RHS.
    505         LHS = new BinaryExprAST(BinOp, LHS, RHS);
    506       }  // loop around to the top of the while loop.
    507     }
    508 
    509 At this point, we know that the binary operator to the RHS of our
    510 primary has higher precedence than the binop we are currently parsing.
    511 As such, we know that any sequence of pairs whose operators are all
    512 higher precedence than "+" should be parsed together and returned as
    513 "RHS". To do this, we recursively invoke the ``ParseBinOpRHS`` function
    514 specifying "TokPrec+1" as the minimum precedence required for it to
    515 continue. In our example above, this will cause it to return the AST
    516 node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of the '+'
    517 expression.
    518 
    519 Finally, on the next iteration of the while loop, the "+g" piece is
    520 parsed and added to the AST. With this little bit of code (14
    521 non-trivial lines), we correctly handle fully general binary expression
    522 parsing in a very elegant way. This was a whirlwind tour of this code,
    523 and it is somewhat subtle. I recommend running through it with a few
    524 tough examples to see how it works.
    525 
    526 This wraps up handling of expressions. At this point, we can point the
    527 parser at an arbitrary token stream and build an expression from it,
    528 stopping at the first token that is not part of the expression. Next up
    529 we need to handle function definitions, etc.
    530 
    531 Parsing the Rest
    532 ================
    533 
    534 The next thing missing is handling of function prototypes. In
    535 Kaleidoscope, these are used both for 'extern' function declarations as
    536 well as function body definitions. The code to do this is
    537 straight-forward and not very interesting (once you've survived
    538 expressions):
    539 
    540 .. code-block:: c++
    541 
    542     /// prototype
    543     ///   ::= id '(' id* ')'
    544     static PrototypeAST *ParsePrototype() {
    545       if (CurTok != tok_identifier)
    546         return ErrorP("Expected function name in prototype");
    547 
    548       std::string FnName = IdentifierStr;
    549       getNextToken();
    550 
    551       if (CurTok != '(')
    552         return ErrorP("Expected '(' in prototype");
    553 
    554       // Read the list of argument names.
    555       std::vector<std::string> ArgNames;
    556       while (getNextToken() == tok_identifier)
    557         ArgNames.push_back(IdentifierStr);
    558       if (CurTok != ')')
    559         return ErrorP("Expected ')' in prototype");
    560 
    561       // success.
    562       getNextToken();  // eat ')'.
    563 
    564       return new PrototypeAST(FnName, ArgNames);
    565     }
    566 
    567 Given this, a function definition is very simple, just a prototype plus
    568 an expression to implement the body:
    569 
    570 .. code-block:: c++
    571 
    572     /// definition ::= 'def' prototype expression
    573     static FunctionAST *ParseDefinition() {
    574       getNextToken();  // eat def.
    575       PrototypeAST *Proto = ParsePrototype();
    576       if (Proto == 0) return 0;
    577 
    578       if (ExprAST *E = ParseExpression())
    579         return new FunctionAST(Proto, E);
    580       return 0;
    581     }
    582 
    583 In addition, we support 'extern' to declare functions like 'sin' and
    584 'cos' as well as to support forward declaration of user functions. These
    585 'extern's are just prototypes with no body:
    586 
    587 .. code-block:: c++
    588 
    589     /// external ::= 'extern' prototype
    590     static PrototypeAST *ParseExtern() {
    591       getNextToken();  // eat extern.
    592       return ParsePrototype();
    593     }
    594 
    595 Finally, we'll also let the user type in arbitrary top-level expressions
    596 and evaluate them on the fly. We will handle this by defining anonymous
    597 nullary (zero argument) functions for them:
    598 
    599 .. code-block:: c++
    600 
    601     /// toplevelexpr ::= expression
    602     static FunctionAST *ParseTopLevelExpr() {
    603       if (ExprAST *E = ParseExpression()) {
    604         // Make an anonymous proto.
    605         PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
    606         return new FunctionAST(Proto, E);
    607       }
    608       return 0;
    609     }
    610 
    611 Now that we have all the pieces, let's build a little driver that will
    612 let us actually *execute* this code we've built!
    613 
    614 The Driver
    615 ==========
    616 
    617 The driver for this simply invokes all of the parsing pieces with a
    618 top-level dispatch loop. There isn't much interesting here, so I'll just
    619 include the top-level loop. See `below <#code>`_ for full code in the
    620 "Top-Level Parsing" section.
    621 
    622 .. code-block:: c++
    623 
    624     /// top ::= definition | external | expression | ';'
    625     static void MainLoop() {
    626       while (1) {
    627         fprintf(stderr, "ready> ");
    628         switch (CurTok) {
    629         case tok_eof:    return;
    630         case ';':        getNextToken(); break;  // ignore top-level semicolons.
    631         case tok_def:    HandleDefinition(); break;
    632         case tok_extern: HandleExtern(); break;
    633         default:         HandleTopLevelExpression(); break;
    634         }
    635       }
    636     }
    637 
    638 The most interesting part of this is that we ignore top-level
    639 semicolons. Why is this, you ask? The basic reason is that if you type
    640 "4 + 5" at the command line, the parser doesn't know whether that is the
    641 end of what you will type or not. For example, on the next line you
    642 could type "def foo..." in which case 4+5 is the end of a top-level
    643 expression. Alternatively you could type "\* 6", which would continue
    644 the expression. Having top-level semicolons allows you to type "4+5;",
    645 and the parser will know you are done.
    646 
    647 Conclusions
    648 ===========
    649 
    650 With just under 400 lines of commented code (240 lines of non-comment,
    651 non-blank code), we fully defined our minimal language, including a
    652 lexer, parser, and AST builder. With this done, the executable will
    653 validate Kaleidoscope code and tell us if it is grammatically invalid.
    654 For example, here is a sample interaction:
    655 
    656 .. code-block:: bash
    657 
    658     $ ./a.out
    659     ready> def foo(x y) x+foo(y, 4.0);
    660     Parsed a function definition.
    661     ready> def foo(x y) x+y y;
    662     Parsed a function definition.
    663     Parsed a top-level expr
    664     ready> def foo(x y) x+y );
    665     Parsed a function definition.
    666     Error: unknown token when expecting an expression
    667     ready> extern sin(a);
    668     ready> Parsed an extern
    669     ready> ^D
    670     $
    671 
    672 There is a lot of room for extension here. You can define new AST nodes,
    673 extend the language in many ways, etc. In the `next
    674 installment <LangImpl3.html>`_, we will describe how to generate LLVM
    675 Intermediate Representation (IR) from the AST.
    676 
    677 Full Code Listing
    678 =================
    679 
    680 Here is the complete code listing for this and the previous chapter.
    681 Note that it is fully self-contained: you don't need LLVM or any
    682 external libraries at all for this. (Besides the C and C++ standard
    683 libraries, of course.) To build this, just compile with:
    684 
    685 .. code-block:: bash
    686 
    687     # Compile
    688     clang++ -g -O3 toy.cpp
    689     # Run
    690     ./a.out
    691 
    692 Here is the code:
    693 
    694 .. literalinclude:: ../../examples/Kaleidoscope/Chapter2/toy.cpp
    695    :language: c++
    696 
    697 `Next: Implementing Code Generation to LLVM IR <LangImpl3.html>`_
    698 
    699