Home | History | Annotate | Download | only in doc
      1 This is ../../../doc/bison.info, produced by makeinfo version 4.13 from
      2 ../../../doc/bison.texi.
      3 
      4 This manual (9 December 2012) is for GNU Bison (version 2.7), the GNU
      5 parser generator.
      6 
      7    Copyright (C) 1988-1993, 1995, 1998-2012 Free Software Foundation,
      8 Inc.
      9 
     10      Permission is granted to copy, distribute and/or modify this
     11      document under the terms of the GNU Free Documentation License,
     12      Version 1.3 or any later version published by the Free Software
     13      Foundation; with no Invariant Sections, with the Front-Cover texts
     14      being "A GNU Manual," and with the Back-Cover Texts as in (a)
     15      below.  A copy of the license is included in the section entitled
     16      "GNU Free Documentation License."
     17 
     18      (a) The FSF's Back-Cover Text is: "You have the freedom to copy and
     19      modify this GNU manual.  Buying copies from the FSF supports it in
     20      developing GNU and promoting software freedom."
     21 
     22 INFO-DIR-SECTION Software development
     23 START-INFO-DIR-ENTRY
     24 * bison: (bison).       GNU parser generator (Yacc replacement).
     25 END-INFO-DIR-ENTRY
     26 
     27 
     28 File: bison.info,  Node: Top,  Next: Introduction,  Up: (dir)
     29 
     30 Bison
     31 *****
     32 
     33 This manual (9 December 2012) is for GNU Bison (version 2.7), the GNU
     34 parser generator.
     35 
     36    Copyright (C) 1988-1993, 1995, 1998-2012 Free Software Foundation,
     37 Inc.
     38 
     39      Permission is granted to copy, distribute and/or modify this
     40      document under the terms of the GNU Free Documentation License,
     41      Version 1.3 or any later version published by the Free Software
     42      Foundation; with no Invariant Sections, with the Front-Cover texts
     43      being "A GNU Manual," and with the Back-Cover Texts as in (a)
     44      below.  A copy of the license is included in the section entitled
     45      "GNU Free Documentation License."
     46 
     47      (a) The FSF's Back-Cover Text is: "You have the freedom to copy and
     48      modify this GNU manual.  Buying copies from the FSF supports it in
     49      developing GNU and promoting software freedom."
     50 
     51 * Menu:
     52 
     53 * Introduction::
     54 * Conditions::
     55 * Copying::             The GNU General Public License says
     56                           how you can copy and share Bison.
     57 
     58 Tutorial sections:
     59 * Concepts::            Basic concepts for understanding Bison.
     60 * Examples::            Three simple explained examples of using Bison.
     61 
     62 Reference sections:
     63 * Grammar File::        Writing Bison declarations and rules.
     64 * Interface::           C-language interface to the parser function `yyparse'.
     65 * Algorithm::           How the Bison parser works at run-time.
     66 * Error Recovery::      Writing rules for error recovery.
     67 * Context Dependency::  What to do if your language syntax is too
     68                           messy for Bison to handle straightforwardly.
     69 * Debugging::           Understanding or debugging Bison parsers.
     70 * Invocation::          How to run Bison (to produce the parser implementation).
     71 * Other Languages::     Creating C++ and Java parsers.
     72 * FAQ::                 Frequently Asked Questions
     73 * Table of Symbols::    All the keywords of the Bison language are explained.
     74 * Glossary::            Basic concepts are explained.
     75 * Copying This Manual:: License for copying this manual.
     76 * Bibliography::        Publications cited in this manual.
     77 * Index of Terms::      Cross-references to the text.
     78 
     79  --- The Detailed Node Listing ---
     80 
     81 The Concepts of Bison
     82 
     83 * Language and Grammar:: Languages and context-free grammars,
     84                            as mathematical ideas.
     85 * Grammar in Bison::     How we represent grammars for Bison's sake.
     86 * Semantic Values::      Each token or syntactic grouping can have
     87                            a semantic value (the value of an integer,
     88                            the name of an identifier, etc.).
     89 * Semantic Actions::     Each rule can have an action containing C code.
     90 * GLR Parsers::          Writing parsers for general context-free languages.
     91 * Locations::            Overview of location tracking.
     92 * Bison Parser::         What are Bison's input and output,
     93                            how is the output used?
     94 * Stages::               Stages in writing and running Bison grammars.
     95 * Grammar Layout::       Overall structure of a Bison grammar file.
     96 
     97 Writing GLR Parsers
     98 
     99 * Simple GLR Parsers::     Using GLR parsers on unambiguous grammars.
    100 * Merging GLR Parses::     Using GLR parsers to resolve ambiguities.
    101 * GLR Semantic Actions::   Deferred semantic actions have special concerns.
    102 * Compiler Requirements::  GLR parsers require a modern C compiler.
    103 
    104 Examples
    105 
    106 * RPN Calc::               Reverse polish notation calculator;
    107                              a first example with no operator precedence.
    108 * Infix Calc::             Infix (algebraic) notation calculator.
    109                              Operator precedence is introduced.
    110 * Simple Error Recovery::  Continuing after syntax errors.
    111 * Location Tracking Calc:: Demonstrating the use of @N and @$.
    112 * Multi-function Calc::    Calculator with memory and trig functions.
    113                              It uses multiple data-types for semantic values.
    114 * Exercises::              Ideas for improving the multi-function calculator.
    115 
    116 Reverse Polish Notation Calculator
    117 
    118 * Rpcalc Declarations::    Prologue (declarations) for rpcalc.
    119 * Rpcalc Rules::           Grammar Rules for rpcalc, with explanation.
    120 * Rpcalc Lexer::           The lexical analyzer.
    121 * Rpcalc Main::            The controlling function.
    122 * Rpcalc Error::           The error reporting function.
    123 * Rpcalc Generate::        Running Bison on the grammar file.
    124 * Rpcalc Compile::         Run the C compiler on the output code.
    125 
    126 Grammar Rules for `rpcalc'
    127 
    128 * Rpcalc Input::
    129 * Rpcalc Line::
    130 * Rpcalc Expr::
    131 
    132 Location Tracking Calculator: `ltcalc'
    133 
    134 * Ltcalc Declarations::    Bison and C declarations for ltcalc.
    135 * Ltcalc Rules::           Grammar rules for ltcalc, with explanations.
    136 * Ltcalc Lexer::           The lexical analyzer.
    137 
    138 Multi-Function Calculator: `mfcalc'
    139 
    140 * Mfcalc Declarations::    Bison declarations for multi-function calculator.
    141 * Mfcalc Rules::           Grammar rules for the calculator.
    142 * Mfcalc Symbol Table::    Symbol table management subroutines.
    143 
    144 Bison Grammar Files
    145 
    146 * Grammar Outline::    Overall layout of the grammar file.
    147 * Symbols::            Terminal and nonterminal symbols.
    148 * Rules::              How to write grammar rules.
    149 * Recursion::          Writing recursive rules.
    150 * Semantics::          Semantic values and actions.
    151 * Tracking Locations:: Locations and actions.
    152 * Named References::   Using named references in actions.
    153 * Declarations::       All kinds of Bison declarations are described here.
    154 * Multiple Parsers::   Putting more than one Bison parser in one program.
    155 
    156 Outline of a Bison Grammar
    157 
    158 * Prologue::              Syntax and usage of the prologue.
    159 * Prologue Alternatives:: Syntax and usage of alternatives to the prologue.
    160 * Bison Declarations::    Syntax and usage of the Bison declarations section.
    161 * Grammar Rules::         Syntax and usage of the grammar rules section.
    162 * Epilogue::              Syntax and usage of the epilogue.
    163 
    164 Defining Language Semantics
    165 
    166 * Value Type::        Specifying one data type for all semantic values.
    167 * Multiple Types::    Specifying several alternative data types.
    168 * Actions::           An action is the semantic definition of a grammar rule.
    169 * Action Types::      Specifying data types for actions to operate on.
    170 * Mid-Rule Actions::  Most actions go at the end of a rule.
    171                       This says when, why and how to use the exceptional
    172                         action in the middle of a rule.
    173 
    174 Actions in Mid-Rule
    175 
    176 * Using Mid-Rule Actions::       Putting an action in the middle of a rule.
    177 * Mid-Rule Action Translation::  How mid-rule actions are actually processed.
    178 * Mid-Rule Conflicts::           Mid-rule actions can cause conflicts.
    179 
    180 Tracking Locations
    181 
    182 * Location Type::               Specifying a data type for locations.
    183 * Actions and Locations::       Using locations in actions.
    184 * Location Default Action::     Defining a general way to compute locations.
    185 
    186 Bison Declarations
    187 
    188 * Require Decl::      Requiring a Bison version.
    189 * Token Decl::        Declaring terminal symbols.
    190 * Precedence Decl::   Declaring terminals with precedence and associativity.
    191 * Union Decl::        Declaring the set of all semantic value types.
    192 * Type Decl::         Declaring the choice of type for a nonterminal symbol.
    193 * Initial Action Decl::  Code run before parsing starts.
    194 * Destructor Decl::   Declaring how symbols are freed.
    195 * Printer Decl::      Declaring how symbol values are displayed.
    196 * Expect Decl::       Suppressing warnings about parsing conflicts.
    197 * Start Decl::        Specifying the start symbol.
    198 * Pure Decl::         Requesting a reentrant parser.
    199 * Push Decl::         Requesting a push parser.
    200 * Decl Summary::      Table of all Bison declarations.
    201 * %define Summary::   Defining variables to adjust Bison's behavior.
    202 * %code Summary::     Inserting code into the parser source.
    203 
    204 Parser C-Language Interface
    205 
    206 * Parser Function::         How to call `yyparse' and what it returns.
    207 * Push Parser Function::    How to call `yypush_parse' and what it returns.
    208 * Pull Parser Function::    How to call `yypull_parse' and what it returns.
    209 * Parser Create Function::  How to call `yypstate_new' and what it returns.
    210 * Parser Delete Function::  How to call `yypstate_delete' and what it returns.
    211 * Lexical::                 You must supply a function `yylex'
    212                               which reads tokens.
    213 * Error Reporting::         You must supply a function `yyerror'.
    214 * Action Features::         Special features for use in actions.
    215 * Internationalization::    How to let the parser speak in the user's
    216                               native language.
    217 
    218 The Lexical Analyzer Function `yylex'
    219 
    220 * Calling Convention::  How `yyparse' calls `yylex'.
    221 * Token Values::        How `yylex' must return the semantic value
    222                           of the token it has read.
    223 * Token Locations::     How `yylex' must return the text location
    224                           (line number, etc.) of the token, if the
    225                           actions want that.
    226 * Pure Calling::        How the calling convention differs in a pure parser
    227                           (*note A Pure (Reentrant) Parser: Pure Decl.).
    228 
    229 The Bison Parser Algorithm
    230 
    231 * Lookahead::         Parser looks one token ahead when deciding what to do.
    232 * Shift/Reduce::      Conflicts: when either shifting or reduction is valid.
    233 * Precedence::        Operator precedence works by resolving conflicts.
    234 * Contextual Precedence::  When an operator's precedence depends on context.
    235 * Parser States::     The parser is a finite-state-machine with stack.
    236 * Reduce/Reduce::     When two rules are applicable in the same situation.
    237 * Mysterious Conflicts:: Conflicts that look unjustified.
    238 * Tuning LR::         How to tune fundamental aspects of LR-based parsing.
    239 * Generalized LR Parsing::  Parsing arbitrary context-free grammars.
    240 * Memory Management:: What happens when memory is exhausted.  How to avoid it.
    241 
    242 Operator Precedence
    243 
    244 * Why Precedence::    An example showing why precedence is needed.
    245 * Using Precedence::  How to specify precedence in Bison grammars.
    246 * Precedence Examples::  How these features are used in the previous example.
    247 * How Precedence::    How they work.
    248 * Non Operators::     Using precedence for general conflicts.
    249 
    250 Tuning LR
    251 
    252 * LR Table Construction:: Choose a different construction algorithm.
    253 * Default Reductions::    Disable default reductions.
    254 * LAC::                   Correct lookahead sets in the parser states.
    255 * Unreachable States::    Keep unreachable parser states for debugging.
    256 
    257 Handling Context Dependencies
    258 
    259 * Semantic Tokens::   Token parsing can depend on the semantic context.
    260 * Lexical Tie-ins::   Token parsing can depend on the syntactic context.
    261 * Tie-in Recovery::   Lexical tie-ins have implications for how
    262                         error recovery rules must be written.
    263 
    264 Debugging Your Parser
    265 
    266 * Understanding::     Understanding the structure of your parser.
    267 * Graphviz::          Getting a visual representation of the parser.
    268 * Xml::               Getting a markup representation of the parser.
    269 * Tracing::           Tracing the execution of your parser.
    270 
    271 Tracing Your Parser
    272 
    273 * Enabling Traces::             Activating run-time trace support
    274 * Mfcalc Traces::               Extending `mfcalc' to support traces
    275 * The YYPRINT Macro::           Obsolete interface for semantic value reports
    276 
    277 Invoking Bison
    278 
    279 * Bison Options::     All the options described in detail,
    280                         in alphabetical order by short options.
    281 * Option Cross Key::  Alphabetical list of long options.
    282 * Yacc Library::      Yacc-compatible `yylex' and `main'.
    283 
    284 Parsers Written In Other Languages
    285 
    286 * C++ Parsers::                 The interface to generate C++ parser classes
    287 * Java Parsers::                The interface to generate Java parser classes
    288 
    289 C++ Parsers
    290 
    291 * C++ Bison Interface::         Asking for C++ parser generation
    292 * C++ Semantic Values::         %union vs. C++
    293 * C++ Location Values::         The position and location classes
    294 * C++ Parser Interface::        Instantiating and running the parser
    295 * C++ Scanner Interface::       Exchanges between yylex and parse
    296 * A Complete C++ Example::      Demonstrating their use
    297 
    298 C++ Location Values
    299 
    300 * C++ position::                One point in the source file
    301 * C++ location::                Two points in the source file
    302 * User Defined Location Type::  Required interface for locations
    303 
    304 A Complete C++ Example
    305 
    306 * Calc++ --- C++ Calculator::   The specifications
    307 * Calc++ Parsing Driver::       An active parsing context
    308 * Calc++ Parser::               A parser class
    309 * Calc++ Scanner::              A pure C++ Flex scanner
    310 * Calc++ Top Level::            Conducting the band
    311 
    312 Java Parsers
    313 
    314 * Java Bison Interface::        Asking for Java parser generation
    315 * Java Semantic Values::        %type and %token vs. Java
    316 * Java Location Values::        The position and location classes
    317 * Java Parser Interface::       Instantiating and running the parser
    318 * Java Scanner Interface::      Specifying the scanner for the parser
    319 * Java Action Features::        Special features for use in actions
    320 * Java Differences::            Differences between C/C++ and Java Grammars
    321 * Java Declarations Summary::   List of Bison declarations used with Java
    322 
    323 Frequently Asked Questions
    324 
    325 * Memory Exhausted::            Breaking the Stack Limits
    326 * How Can I Reset the Parser::  `yyparse' Keeps some State
    327 * Strings are Destroyed::       `yylval' Loses Track of Strings
    328 * Implementing Gotos/Loops::    Control Flow in the Calculator
    329 * Multiple start-symbols::      Factoring closely related grammars
    330 * Secure?  Conform?::           Is Bison POSIX safe?
    331 * I can't build Bison::         Troubleshooting
    332 * Where can I find help?::      Troubleshouting
    333 * Bug Reports::                 Troublereporting
    334 * More Languages::              Parsers in C++, Java, and so on
    335 * Beta Testing::                Experimenting development versions
    336 * Mailing Lists::               Meeting other Bison users
    337 
    338 Copying This Manual
    339 
    340 * Copying This Manual::         License for copying this manual.
    341 
    342 
    343 File: bison.info,  Node: Introduction,  Next: Conditions,  Prev: Top,  Up: Top
    344 
    345 Introduction
    346 ************
    347 
    348 "Bison" is a general-purpose parser generator that converts an
    349 annotated context-free grammar into a deterministic LR or generalized
    350 LR (GLR) parser employing LALR(1) parser tables.  As an experimental
    351 feature, Bison can also generate IELR(1) or canonical LR(1) parser
    352 tables.  Once you are proficient with Bison, you can use it to develop
    353 a wide range of language parsers, from those used in simple desk
    354 calculators to complex programming languages.
    355 
    356    Bison is upward compatible with Yacc: all properly-written Yacc
    357 grammars ought to work with Bison with no change.  Anyone familiar with
    358 Yacc should be able to use Bison with little trouble.  You need to be
    359 fluent in C or C++ programming in order to use Bison or to understand
    360 this manual.  Java is also supported as an experimental feature.
    361 
    362    We begin with tutorial chapters that explain the basic concepts of
    363 using Bison and show three explained examples, each building on the
    364 last.  If you don't know Bison or Yacc, start by reading these
    365 chapters.  Reference chapters follow, which describe specific aspects
    366 of Bison in detail.
    367 
    368    Bison was written originally by Robert Corbett.  Richard Stallman
    369 made it Yacc-compatible.  Wilfred Hansen of Carnegie Mellon University
    370 added multi-character string literals and other features.  Since then,
    371 Bison has grown more robust and evolved many other new features thanks
    372 to the hard work of a long list of volunteers.  For details, see the
    373 `THANKS' and `ChangeLog' files included in the Bison distribution.
    374 
    375    This edition corresponds to version 2.7 of Bison.
    376 
    377 
    378 File: bison.info,  Node: Conditions,  Next: Copying,  Prev: Introduction,  Up: Top
    379 
    380 Conditions for Using Bison
    381 **************************
    382 
    383 The distribution terms for Bison-generated parsers permit using the
    384 parsers in nonfree programs.  Before Bison version 2.2, these extra
    385 permissions applied only when Bison was generating LALR(1) parsers in
    386 C.  And before Bison version 1.24, Bison-generated parsers could be
    387 used only in programs that were free software.
    388 
    389    The other GNU programming tools, such as the GNU C compiler, have
    390 never had such a requirement.  They could always be used for nonfree
    391 software.  The reason Bison was different was not due to a special
    392 policy decision; it resulted from applying the usual General Public
    393 License to all of the Bison source code.
    394 
    395    The main output of the Bison utility--the Bison parser implementation
    396 file--contains a verbatim copy of a sizable piece of Bison, which is
    397 the code for the parser's implementation.  (The actions from your
    398 grammar are inserted into this implementation at one point, but most of
    399 the rest of the implementation is not changed.)  When we applied the
    400 GPL terms to the skeleton code for the parser's implementation, the
    401 effect was to restrict the use of Bison output to free software.
    402 
    403    We didn't change the terms because of sympathy for people who want to
    404 make software proprietary.  *Software should be free.*  But we
    405 concluded that limiting Bison's use to free software was doing little to
    406 encourage people to make other software free.  So we decided to make the
    407 practical conditions for using Bison match the practical conditions for
    408 using the other GNU tools.
    409 
    410    This exception applies when Bison is generating code for a parser.
    411 You can tell whether the exception applies to a Bison output file by
    412 inspecting the file for text beginning with "As a special
    413 exception...".  The text spells out the exact terms of the exception.
    414 
    415 
    416 File: bison.info,  Node: Copying,  Next: Concepts,  Prev: Conditions,  Up: Top
    417 
    418 GNU GENERAL PUBLIC LICENSE
    419 **************************
    420 
    421                         Version 3, 29 June 2007
    422 
    423      Copyright (C) 2007 Free Software Foundation, Inc. `http://fsf.org/'
    424 
    425      Everyone is permitted to copy and distribute verbatim copies of this
    426      license document, but changing it is not allowed.
    427 
    428 Preamble
    429 ========
    430 
    431 The GNU General Public License is a free, copyleft license for software
    432 and other kinds of works.
    433 
    434    The licenses for most software and other practical works are designed
    435 to take away your freedom to share and change the works.  By contrast,
    436 the GNU General Public License is intended to guarantee your freedom to
    437 share and change all versions of a program--to make sure it remains
    438 free software for all its users.  We, the Free Software Foundation, use
    439 the GNU General Public License for most of our software; it applies
    440 also to any other work released this way by its authors.  You can apply
    441 it to your programs, too.
    442 
    443    When we speak of free software, we are referring to freedom, not
    444 price.  Our General Public Licenses are designed to make sure that you
    445 have the freedom to distribute copies of free software (and charge for
    446 them if you wish), that you receive source code or can get it if you
    447 want it, that you can change the software or use pieces of it in new
    448 free programs, and that you know you can do these things.
    449 
    450    To protect your rights, we need to prevent others from denying you
    451 these rights or asking you to surrender the rights.  Therefore, you
    452 have certain responsibilities if you distribute copies of the software,
    453 or if you modify it: responsibilities to respect the freedom of others.
    454 
    455    For example, if you distribute copies of such a program, whether
    456 gratis or for a fee, you must pass on to the recipients the same
    457 freedoms that you received.  You must make sure that they, too, receive
    458 or can get the source code.  And you must show them these terms so they
    459 know their rights.
    460 
    461    Developers that use the GNU GPL protect your rights with two steps:
    462 (1) assert copyright on the software, and (2) offer you this License
    463 giving you legal permission to copy, distribute and/or modify it.
    464 
    465    For the developers' and authors' protection, the GPL clearly explains
    466 that there is no warranty for this free software.  For both users' and
    467 authors' sake, the GPL requires that modified versions be marked as
    468 changed, so that their problems will not be attributed erroneously to
    469 authors of previous versions.
    470 
    471    Some devices are designed to deny users access to install or run
    472 modified versions of the software inside them, although the
    473 manufacturer can do so.  This is fundamentally incompatible with the
    474 aim of protecting users' freedom to change the software.  The
    475 systematic pattern of such abuse occurs in the area of products for
    476 individuals to use, which is precisely where it is most unacceptable.
    477 Therefore, we have designed this version of the GPL to prohibit the
    478 practice for those products.  If such problems arise substantially in
    479 other domains, we stand ready to extend this provision to those domains
    480 in future versions of the GPL, as needed to protect the freedom of
    481 users.
    482 
    483    Finally, every program is threatened constantly by software patents.
    484 States should not allow patents to restrict development and use of
    485 software on general-purpose computers, but in those that do, we wish to
    486 avoid the special danger that patents applied to a free program could
    487 make it effectively proprietary.  To prevent this, the GPL assures that
    488 patents cannot be used to render the program non-free.
    489 
    490    The precise terms and conditions for copying, distribution and
    491 modification follow.
    492 
    493 TERMS AND CONDITIONS
    494 ====================
    495 
    496   0. Definitions.
    497 
    498      "This License" refers to version 3 of the GNU General Public
    499      License.
    500 
    501      "Copyright" also means copyright-like laws that apply to other
    502      kinds of works, such as semiconductor masks.
    503 
    504      "The Program" refers to any copyrightable work licensed under this
    505      License.  Each licensee is addressed as "you".  "Licensees" and
    506      "recipients" may be individuals or organizations.
    507 
    508      To "modify" a work means to copy from or adapt all or part of the
    509      work in a fashion requiring copyright permission, other than the
    510      making of an exact copy.  The resulting work is called a "modified
    511      version" of the earlier work or a work "based on" the earlier work.
    512 
    513      A "covered work" means either the unmodified Program or a work
    514      based on the Program.
    515 
    516      To "propagate" a work means to do anything with it that, without
    517      permission, would make you directly or secondarily liable for
    518      infringement under applicable copyright law, except executing it
    519      on a computer or modifying a private copy.  Propagation includes
    520      copying, distribution (with or without modification), making
    521      available to the public, and in some countries other activities as
    522      well.
    523 
    524      To "convey" a work means any kind of propagation that enables other
    525      parties to make or receive copies.  Mere interaction with a user
    526      through a computer network, with no transfer of a copy, is not
    527      conveying.
    528 
    529      An interactive user interface displays "Appropriate Legal Notices"
    530      to the extent that it includes a convenient and prominently visible
    531      feature that (1) displays an appropriate copyright notice, and (2)
    532      tells the user that there is no warranty for the work (except to
    533      the extent that warranties are provided), that licensees may
    534      convey the work under this License, and how to view a copy of this
    535      License.  If the interface presents a list of user commands or
    536      options, such as a menu, a prominent item in the list meets this
    537      criterion.
    538 
    539   1. Source Code.
    540 
    541      The "source code" for a work means the preferred form of the work
    542      for making modifications to it.  "Object code" means any
    543      non-source form of a work.
    544 
    545      A "Standard Interface" means an interface that either is an
    546      official standard defined by a recognized standards body, or, in
    547      the case of interfaces specified for a particular programming
    548      language, one that is widely used among developers working in that
    549      language.
    550 
    551      The "System Libraries" of an executable work include anything,
    552      other than the work as a whole, that (a) is included in the normal
    553      form of packaging a Major Component, but which is not part of that
    554      Major Component, and (b) serves only to enable use of the work
    555      with that Major Component, or to implement a Standard Interface
    556      for which an implementation is available to the public in source
    557      code form.  A "Major Component", in this context, means a major
    558      essential component (kernel, window system, and so on) of the
    559      specific operating system (if any) on which the executable work
    560      runs, or a compiler used to produce the work, or an object code
    561      interpreter used to run it.
    562 
    563      The "Corresponding Source" for a work in object code form means all
    564      the source code needed to generate, install, and (for an executable
    565      work) run the object code and to modify the work, including
    566      scripts to control those activities.  However, it does not include
    567      the work's System Libraries, or general-purpose tools or generally
    568      available free programs which are used unmodified in performing
    569      those activities but which are not part of the work.  For example,
    570      Corresponding Source includes interface definition files
    571      associated with source files for the work, and the source code for
    572      shared libraries and dynamically linked subprograms that the work
    573      is specifically designed to require, such as by intimate data
    574      communication or control flow between those subprograms and other
    575      parts of the work.
    576 
    577      The Corresponding Source need not include anything that users can
    578      regenerate automatically from other parts of the Corresponding
    579      Source.
    580 
    581      The Corresponding Source for a work in source code form is that
    582      same work.
    583 
    584   2. Basic Permissions.
    585 
    586      All rights granted under this License are granted for the term of
    587      copyright on the Program, and are irrevocable provided the stated
    588      conditions are met.  This License explicitly affirms your unlimited
    589      permission to run the unmodified Program.  The output from running
    590      a covered work is covered by this License only if the output,
    591      given its content, constitutes a covered work.  This License
    592      acknowledges your rights of fair use or other equivalent, as
    593      provided by copyright law.
    594 
    595      You may make, run and propagate covered works that you do not
    596      convey, without conditions so long as your license otherwise
    597      remains in force.  You may convey covered works to others for the
    598      sole purpose of having them make modifications exclusively for
    599      you, or provide you with facilities for running those works,
    600      provided that you comply with the terms of this License in
    601      conveying all material for which you do not control copyright.
    602      Those thus making or running the covered works for you must do so
    603      exclusively on your behalf, under your direction and control, on
    604      terms that prohibit them from making any copies of your
    605      copyrighted material outside their relationship with you.
    606 
    607      Conveying under any other circumstances is permitted solely under
    608      the conditions stated below.  Sublicensing is not allowed; section
    609      10 makes it unnecessary.
    610 
    611   3. Protecting Users' Legal Rights From Anti-Circumvention Law.
    612 
    613      No covered work shall be deemed part of an effective technological
    614      measure under any applicable law fulfilling obligations under
    615      article 11 of the WIPO copyright treaty adopted on 20 December
    616      1996, or similar laws prohibiting or restricting circumvention of
    617      such measures.
    618 
    619      When you convey a covered work, you waive any legal power to forbid
    620      circumvention of technological measures to the extent such
    621      circumvention is effected by exercising rights under this License
    622      with respect to the covered work, and you disclaim any intention
    623      to limit operation or modification of the work as a means of
    624      enforcing, against the work's users, your or third parties' legal
    625      rights to forbid circumvention of technological measures.
    626 
    627   4. Conveying Verbatim Copies.
    628 
    629      You may convey verbatim copies of the Program's source code as you
    630      receive it, in any medium, provided that you conspicuously and
    631      appropriately publish on each copy an appropriate copyright notice;
    632      keep intact all notices stating that this License and any
    633      non-permissive terms added in accord with section 7 apply to the
    634      code; keep intact all notices of the absence of any warranty; and
    635      give all recipients a copy of this License along with the Program.
    636 
    637      You may charge any price or no price for each copy that you convey,
    638      and you may offer support or warranty protection for a fee.
    639 
    640   5. Conveying Modified Source Versions.
    641 
    642      You may convey a work based on the Program, or the modifications to
    643      produce it from the Program, in the form of source code under the
    644      terms of section 4, provided that you also meet all of these
    645      conditions:
    646 
    647        a. The work must carry prominent notices stating that you
    648           modified it, and giving a relevant date.
    649 
    650        b. The work must carry prominent notices stating that it is
    651           released under this License and any conditions added under
    652           section 7.  This requirement modifies the requirement in
    653           section 4 to "keep intact all notices".
    654 
    655        c. You must license the entire work, as a whole, under this
    656           License to anyone who comes into possession of a copy.  This
    657           License will therefore apply, along with any applicable
    658           section 7 additional terms, to the whole of the work, and all
    659           its parts, regardless of how they are packaged.  This License
    660           gives no permission to license the work in any other way, but
    661           it does not invalidate such permission if you have separately
    662           received it.
    663 
    664        d. If the work has interactive user interfaces, each must display
    665           Appropriate Legal Notices; however, if the Program has
    666           interactive interfaces that do not display Appropriate Legal
    667           Notices, your work need not make them do so.
    668 
    669      A compilation of a covered work with other separate and independent
    670      works, which are not by their nature extensions of the covered
    671      work, and which are not combined with it such as to form a larger
    672      program, in or on a volume of a storage or distribution medium, is
    673      called an "aggregate" if the compilation and its resulting
    674      copyright are not used to limit the access or legal rights of the
    675      compilation's users beyond what the individual works permit.
    676      Inclusion of a covered work in an aggregate does not cause this
    677      License to apply to the other parts of the aggregate.
    678 
    679   6. Conveying Non-Source Forms.
    680 
    681      You may convey a covered work in object code form under the terms
    682      of sections 4 and 5, provided that you also convey the
    683      machine-readable Corresponding Source under the terms of this
    684      License, in one of these ways:
    685 
    686        a. Convey the object code in, or embodied in, a physical product
    687           (including a physical distribution medium), accompanied by the
    688           Corresponding Source fixed on a durable physical medium
    689           customarily used for software interchange.
    690 
    691        b. Convey the object code in, or embodied in, a physical product
    692           (including a physical distribution medium), accompanied by a
    693           written offer, valid for at least three years and valid for
    694           as long as you offer spare parts or customer support for that
    695           product model, to give anyone who possesses the object code
    696           either (1) a copy of the Corresponding Source for all the
    697           software in the product that is covered by this License, on a
    698           durable physical medium customarily used for software
    699           interchange, for a price no more than your reasonable cost of
    700           physically performing this conveying of source, or (2) access
    701           to copy the Corresponding Source from a network server at no
    702           charge.
    703 
    704        c. Convey individual copies of the object code with a copy of
    705           the written offer to provide the Corresponding Source.  This
    706           alternative is allowed only occasionally and noncommercially,
    707           and only if you received the object code with such an offer,
    708           in accord with subsection 6b.
    709 
    710        d. Convey the object code by offering access from a designated
    711           place (gratis or for a charge), and offer equivalent access
    712           to the Corresponding Source in the same way through the same
    713           place at no further charge.  You need not require recipients
    714           to copy the Corresponding Source along with the object code.
    715           If the place to copy the object code is a network server, the
    716           Corresponding Source may be on a different server (operated
    717           by you or a third party) that supports equivalent copying
    718           facilities, provided you maintain clear directions next to
    719           the object code saying where to find the Corresponding Source.
    720           Regardless of what server hosts the Corresponding Source, you
    721           remain obligated to ensure that it is available for as long
    722           as needed to satisfy these requirements.
    723 
    724        e. Convey the object code using peer-to-peer transmission,
    725           provided you inform other peers where the object code and
    726           Corresponding Source of the work are being offered to the
    727           general public at no charge under subsection 6d.
    728 
    729 
    730      A separable portion of the object code, whose source code is
    731      excluded from the Corresponding Source as a System Library, need
    732      not be included in conveying the object code work.
    733 
    734      A "User Product" is either (1) a "consumer product", which means
    735      any tangible personal property which is normally used for personal,
    736      family, or household purposes, or (2) anything designed or sold for
    737      incorporation into a dwelling.  In determining whether a product
    738      is a consumer product, doubtful cases shall be resolved in favor of
    739      coverage.  For a particular product received by a particular user,
    740      "normally used" refers to a typical or common use of that class of
    741      product, regardless of the status of the particular user or of the
    742      way in which the particular user actually uses, or expects or is
    743      expected to use, the product.  A product is a consumer product
    744      regardless of whether the product has substantial commercial,
    745      industrial or non-consumer uses, unless such uses represent the
    746      only significant mode of use of the product.
    747 
    748      "Installation Information" for a User Product means any methods,
    749      procedures, authorization keys, or other information required to
    750      install and execute modified versions of a covered work in that
    751      User Product from a modified version of its Corresponding Source.
    752      The information must suffice to ensure that the continued
    753      functioning of the modified object code is in no case prevented or
    754      interfered with solely because modification has been made.
    755 
    756      If you convey an object code work under this section in, or with,
    757      or specifically for use in, a User Product, and the conveying
    758      occurs as part of a transaction in which the right of possession
    759      and use of the User Product is transferred to the recipient in
    760      perpetuity or for a fixed term (regardless of how the transaction
    761      is characterized), the Corresponding Source conveyed under this
    762      section must be accompanied by the Installation Information.  But
    763      this requirement does not apply if neither you nor any third party
    764      retains the ability to install modified object code on the User
    765      Product (for example, the work has been installed in ROM).
    766 
    767      The requirement to provide Installation Information does not
    768      include a requirement to continue to provide support service,
    769      warranty, or updates for a work that has been modified or
    770      installed by the recipient, or for the User Product in which it
    771      has been modified or installed.  Access to a network may be denied
    772      when the modification itself materially and adversely affects the
    773      operation of the network or violates the rules and protocols for
    774      communication across the network.
    775 
    776      Corresponding Source conveyed, and Installation Information
    777      provided, in accord with this section must be in a format that is
    778      publicly documented (and with an implementation available to the
    779      public in source code form), and must require no special password
    780      or key for unpacking, reading or copying.
    781 
    782   7. Additional Terms.
    783 
    784      "Additional permissions" are terms that supplement the terms of
    785      this License by making exceptions from one or more of its
    786      conditions.  Additional permissions that are applicable to the
    787      entire Program shall be treated as though they were included in
    788      this License, to the extent that they are valid under applicable
    789      law.  If additional permissions apply only to part of the Program,
    790      that part may be used separately under those permissions, but the
    791      entire Program remains governed by this License without regard to
    792      the additional permissions.
    793 
    794      When you convey a copy of a covered work, you may at your option
    795      remove any additional permissions from that copy, or from any part
    796      of it.  (Additional permissions may be written to require their own
    797      removal in certain cases when you modify the work.)  You may place
    798      additional permissions on material, added by you to a covered work,
    799      for which you have or can give appropriate copyright permission.
    800 
    801      Notwithstanding any other provision of this License, for material
    802      you add to a covered work, you may (if authorized by the copyright
    803      holders of that material) supplement the terms of this License
    804      with terms:
    805 
    806        a. Disclaiming warranty or limiting liability differently from
    807           the terms of sections 15 and 16 of this License; or
    808 
    809        b. Requiring preservation of specified reasonable legal notices
    810           or author attributions in that material or in the Appropriate
    811           Legal Notices displayed by works containing it; or
    812 
    813        c. Prohibiting misrepresentation of the origin of that material,
    814           or requiring that modified versions of such material be
    815           marked in reasonable ways as different from the original
    816           version; or
    817 
    818        d. Limiting the use for publicity purposes of names of licensors
    819           or authors of the material; or
    820 
    821        e. Declining to grant rights under trademark law for use of some
    822           trade names, trademarks, or service marks; or
    823 
    824        f. Requiring indemnification of licensors and authors of that
    825           material by anyone who conveys the material (or modified
    826           versions of it) with contractual assumptions of liability to
    827           the recipient, for any liability that these contractual
    828           assumptions directly impose on those licensors and authors.
    829 
    830      All other non-permissive additional terms are considered "further
    831      restrictions" within the meaning of section 10.  If the Program as
    832      you received it, or any part of it, contains a notice stating that
    833      it is governed by this License along with a term that is a further
    834      restriction, you may remove that term.  If a license document
    835      contains a further restriction but permits relicensing or
    836      conveying under this License, you may add to a covered work
    837      material governed by the terms of that license document, provided
    838      that the further restriction does not survive such relicensing or
    839      conveying.
    840 
    841      If you add terms to a covered work in accord with this section, you
    842      must place, in the relevant source files, a statement of the
    843      additional terms that apply to those files, or a notice indicating
    844      where to find the applicable terms.
    845 
    846      Additional terms, permissive or non-permissive, may be stated in
    847      the form of a separately written license, or stated as exceptions;
    848      the above requirements apply either way.
    849 
    850   8. Termination.
    851 
    852      You may not propagate or modify a covered work except as expressly
    853      provided under this License.  Any attempt otherwise to propagate or
    854      modify it is void, and will automatically terminate your rights
    855      under this License (including any patent licenses granted under
    856      the third paragraph of section 11).
    857 
    858      However, if you cease all violation of this License, then your
    859      license from a particular copyright holder is reinstated (a)
    860      provisionally, unless and until the copyright holder explicitly
    861      and finally terminates your license, and (b) permanently, if the
    862      copyright holder fails to notify you of the violation by some
    863      reasonable means prior to 60 days after the cessation.
    864 
    865      Moreover, your license from a particular copyright holder is
    866      reinstated permanently if the copyright holder notifies you of the
    867      violation by some reasonable means, this is the first time you have
    868      received notice of violation of this License (for any work) from
    869      that copyright holder, and you cure the violation prior to 30 days
    870      after your receipt of the notice.
    871 
    872      Termination of your rights under this section does not terminate
    873      the licenses of parties who have received copies or rights from
    874      you under this License.  If your rights have been terminated and
    875      not permanently reinstated, you do not qualify to receive new
    876      licenses for the same material under section 10.
    877 
    878   9. Acceptance Not Required for Having Copies.
    879 
    880      You are not required to accept this License in order to receive or
    881      run a copy of the Program.  Ancillary propagation of a covered work
    882      occurring solely as a consequence of using peer-to-peer
    883      transmission to receive a copy likewise does not require
    884      acceptance.  However, nothing other than this License grants you
    885      permission to propagate or modify any covered work.  These actions
    886      infringe copyright if you do not accept this License.  Therefore,
    887      by modifying or propagating a covered work, you indicate your
    888      acceptance of this License to do so.
    889 
    890  10. Automatic Licensing of Downstream Recipients.
    891 
    892      Each time you convey a covered work, the recipient automatically
    893      receives a license from the original licensors, to run, modify and
    894      propagate that work, subject to this License.  You are not
    895      responsible for enforcing compliance by third parties with this
    896      License.
    897 
    898      An "entity transaction" is a transaction transferring control of an
    899      organization, or substantially all assets of one, or subdividing an
    900      organization, or merging organizations.  If propagation of a
    901      covered work results from an entity transaction, each party to that
    902      transaction who receives a copy of the work also receives whatever
    903      licenses to the work the party's predecessor in interest had or
    904      could give under the previous paragraph, plus a right to
    905      possession of the Corresponding Source of the work from the
    906      predecessor in interest, if the predecessor has it or can get it
    907      with reasonable efforts.
    908 
    909      You may not impose any further restrictions on the exercise of the
    910      rights granted or affirmed under this License.  For example, you
    911      may not impose a license fee, royalty, or other charge for
    912      exercise of rights granted under this License, and you may not
    913      initiate litigation (including a cross-claim or counterclaim in a
    914      lawsuit) alleging that any patent claim is infringed by making,
    915      using, selling, offering for sale, or importing the Program or any
    916      portion of it.
    917 
    918  11. Patents.
    919 
    920      A "contributor" is a copyright holder who authorizes use under this
    921      License of the Program or a work on which the Program is based.
    922      The work thus licensed is called the contributor's "contributor
    923      version".
    924 
    925      A contributor's "essential patent claims" are all patent claims
    926      owned or controlled by the contributor, whether already acquired or
    927      hereafter acquired, that would be infringed by some manner,
    928      permitted by this License, of making, using, or selling its
    929      contributor version, but do not include claims that would be
    930      infringed only as a consequence of further modification of the
    931      contributor version.  For purposes of this definition, "control"
    932      includes the right to grant patent sublicenses in a manner
    933      consistent with the requirements of this License.
    934 
    935      Each contributor grants you a non-exclusive, worldwide,
    936      royalty-free patent license under the contributor's essential
    937      patent claims, to make, use, sell, offer for sale, import and
    938      otherwise run, modify and propagate the contents of its
    939      contributor version.
    940 
    941      In the following three paragraphs, a "patent license" is any
    942      express agreement or commitment, however denominated, not to
    943      enforce a patent (such as an express permission to practice a
    944      patent or covenant not to sue for patent infringement).  To
    945      "grant" such a patent license to a party means to make such an
    946      agreement or commitment not to enforce a patent against the party.
    947 
    948      If you convey a covered work, knowingly relying on a patent
    949      license, and the Corresponding Source of the work is not available
    950      for anyone to copy, free of charge and under the terms of this
    951      License, through a publicly available network server or other
    952      readily accessible means, then you must either (1) cause the
    953      Corresponding Source to be so available, or (2) arrange to deprive
    954      yourself of the benefit of the patent license for this particular
    955      work, or (3) arrange, in a manner consistent with the requirements
    956      of this License, to extend the patent license to downstream
    957      recipients.  "Knowingly relying" means you have actual knowledge
    958      that, but for the patent license, your conveying the covered work
    959      in a country, or your recipient's use of the covered work in a
    960      country, would infringe one or more identifiable patents in that
    961      country that you have reason to believe are valid.
    962 
    963      If, pursuant to or in connection with a single transaction or
    964      arrangement, you convey, or propagate by procuring conveyance of, a
    965      covered work, and grant a patent license to some of the parties
    966      receiving the covered work authorizing them to use, propagate,
    967      modify or convey a specific copy of the covered work, then the
    968      patent license you grant is automatically extended to all
    969      recipients of the covered work and works based on it.
    970 
    971      A patent license is "discriminatory" if it does not include within
    972      the scope of its coverage, prohibits the exercise of, or is
    973      conditioned on the non-exercise of one or more of the rights that
    974      are specifically granted under this License.  You may not convey a
    975      covered work if you are a party to an arrangement with a third
    976      party that is in the business of distributing software, under
    977      which you make payment to the third party based on the extent of
    978      your activity of conveying the work, and under which the third
    979      party grants, to any of the parties who would receive the covered
    980      work from you, a discriminatory patent license (a) in connection
    981      with copies of the covered work conveyed by you (or copies made
    982      from those copies), or (b) primarily for and in connection with
    983      specific products or compilations that contain the covered work,
    984      unless you entered into that arrangement, or that patent license
    985      was granted, prior to 28 March 2007.
    986 
    987      Nothing in this License shall be construed as excluding or limiting
    988      any implied license or other defenses to infringement that may
    989      otherwise be available to you under applicable patent law.
    990 
    991  12. No Surrender of Others' Freedom.
    992 
    993      If conditions are imposed on you (whether by court order,
    994      agreement or otherwise) that contradict the conditions of this
    995      License, they do not excuse you from the conditions of this
    996      License.  If you cannot convey a covered work so as to satisfy
    997      simultaneously your obligations under this License and any other
    998      pertinent obligations, then as a consequence you may not convey it
    999      at all.  For example, if you agree to terms that obligate you to
   1000      collect a royalty for further conveying from those to whom you
   1001      convey the Program, the only way you could satisfy both those
   1002      terms and this License would be to refrain entirely from conveying
   1003      the Program.
   1004 
   1005  13. Use with the GNU Affero General Public License.
   1006 
   1007      Notwithstanding any other provision of this License, you have
   1008      permission to link or combine any covered work with a work licensed
   1009      under version 3 of the GNU Affero General Public License into a
   1010      single combined work, and to convey the resulting work.  The terms
   1011      of this License will continue to apply to the part which is the
   1012      covered work, but the special requirements of the GNU Affero
   1013      General Public License, section 13, concerning interaction through
   1014      a network will apply to the combination as such.
   1015 
   1016  14. Revised Versions of this License.
   1017 
   1018      The Free Software Foundation may publish revised and/or new
   1019      versions of the GNU General Public License from time to time.
   1020      Such new versions will be similar in spirit to the present
   1021      version, but may differ in detail to address new problems or
   1022      concerns.
   1023 
   1024      Each version is given a distinguishing version number.  If the
   1025      Program specifies that a certain numbered version of the GNU
   1026      General Public License "or any later version" applies to it, you
   1027      have the option of following the terms and conditions either of
   1028      that numbered version or of any later version published by the
   1029      Free Software Foundation.  If the Program does not specify a
   1030      version number of the GNU General Public License, you may choose
   1031      any version ever published by the Free Software Foundation.
   1032 
   1033      If the Program specifies that a proxy can decide which future
   1034      versions of the GNU General Public License can be used, that
   1035      proxy's public statement of acceptance of a version permanently
   1036      authorizes you to choose that version for the Program.
   1037 
   1038      Later license versions may give you additional or different
   1039      permissions.  However, no additional obligations are imposed on any
   1040      author or copyright holder as a result of your choosing to follow a
   1041      later version.
   1042 
   1043  15. Disclaimer of Warranty.
   1044 
   1045      THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
   1046      APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE
   1047      COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS"
   1048      WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
   1049      INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
   1050      MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE
   1051      RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.
   1052      SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL
   1053      NECESSARY SERVICING, REPAIR OR CORRECTION.
   1054 
   1055  16. Limitation of Liability.
   1056 
   1057      IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
   1058      WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES
   1059      AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU
   1060      FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
   1061      CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE
   1062      THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA
   1063      BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
   1064      PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
   1065      PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF
   1066      THE POSSIBILITY OF SUCH DAMAGES.
   1067 
   1068  17. Interpretation of Sections 15 and 16.
   1069 
   1070      If the disclaimer of warranty and limitation of liability provided
   1071      above cannot be given local legal effect according to their terms,
   1072      reviewing courts shall apply local law that most closely
   1073      approximates an absolute waiver of all civil liability in
   1074      connection with the Program, unless a warranty or assumption of
   1075      liability accompanies a copy of the Program in return for a fee.
   1076 
   1077 
   1078 END OF TERMS AND CONDITIONS
   1079 ===========================
   1080 
   1081 How to Apply These Terms to Your New Programs
   1082 =============================================
   1083 
   1084 If you develop a new program, and you want it to be of the greatest
   1085 possible use to the public, the best way to achieve this is to make it
   1086 free software which everyone can redistribute and change under these
   1087 terms.
   1088 
   1089    To do so, attach the following notices to the program.  It is safest
   1090 to attach them to the start of each source file to most effectively
   1091 state the exclusion of warranty; and each file should have at least the
   1092 "copyright" line and a pointer to where the full notice is found.
   1093 
   1094      ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.
   1095      Copyright (C) YEAR NAME OF AUTHOR
   1096 
   1097      This program is free software: you can redistribute it and/or modify
   1098      it under the terms of the GNU General Public License as published by
   1099      the Free Software Foundation, either version 3 of the License, or (at
   1100      your option) any later version.
   1101 
   1102      This program is distributed in the hope that it will be useful, but
   1103      WITHOUT ANY WARRANTY; without even the implied warranty of
   1104      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   1105      General Public License for more details.
   1106 
   1107      You should have received a copy of the GNU General Public License
   1108      along with this program.  If not, see `http://www.gnu.org/licenses/'.
   1109 
   1110    Also add information on how to contact you by electronic and paper
   1111 mail.
   1112 
   1113    If the program does terminal interaction, make it output a short
   1114 notice like this when it starts in an interactive mode:
   1115 
   1116      PROGRAM Copyright (C) YEAR NAME OF AUTHOR
   1117      This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
   1118      This is free software, and you are welcome to redistribute it
   1119      under certain conditions; type `show c' for details.
   1120 
   1121    The hypothetical commands `show w' and `show c' should show the
   1122 appropriate parts of the General Public License.  Of course, your
   1123 program's commands might be different; for a GUI interface, you would
   1124 use an "about box".
   1125 
   1126    You should also get your employer (if you work as a programmer) or
   1127 school, if any, to sign a "copyright disclaimer" for the program, if
   1128 necessary.  For more information on this, and how to apply and follow
   1129 the GNU GPL, see `http://www.gnu.org/licenses/'.
   1130 
   1131    The GNU General Public License does not permit incorporating your
   1132 program into proprietary programs.  If your program is a subroutine
   1133 library, you may consider it more useful to permit linking proprietary
   1134 applications with the library.  If this is what you want to do, use the
   1135 GNU Lesser General Public License instead of this License.  But first,
   1136 please read `http://www.gnu.org/philosophy/why-not-lgpl.html'.
   1137 
   1138 
   1139 File: bison.info,  Node: Concepts,  Next: Examples,  Prev: Copying,  Up: Top
   1140 
   1141 1 The Concepts of Bison
   1142 ***********************
   1143 
   1144 This chapter introduces many of the basic concepts without which the
   1145 details of Bison will not make sense.  If you do not already know how to
   1146 use Bison or Yacc, we suggest you start by reading this chapter
   1147 carefully.
   1148 
   1149 * Menu:
   1150 
   1151 * Language and Grammar:: Languages and context-free grammars,
   1152                            as mathematical ideas.
   1153 * Grammar in Bison::     How we represent grammars for Bison's sake.
   1154 * Semantic Values::      Each token or syntactic grouping can have
   1155                            a semantic value (the value of an integer,
   1156                            the name of an identifier, etc.).
   1157 * Semantic Actions::     Each rule can have an action containing C code.
   1158 * GLR Parsers::          Writing parsers for general context-free languages.
   1159 * Locations::            Overview of location tracking.
   1160 * Bison Parser::         What are Bison's input and output,
   1161                            how is the output used?
   1162 * Stages::               Stages in writing and running Bison grammars.
   1163 * Grammar Layout::       Overall structure of a Bison grammar file.
   1164 
   1165 
   1166 File: bison.info,  Node: Language and Grammar,  Next: Grammar in Bison,  Up: Concepts
   1167 
   1168 1.1 Languages and Context-Free Grammars
   1169 =======================================
   1170 
   1171 In order for Bison to parse a language, it must be described by a
   1172 "context-free grammar".  This means that you specify one or more
   1173 "syntactic groupings" and give rules for constructing them from their
   1174 parts.  For example, in the C language, one kind of grouping is called
   1175 an `expression'.  One rule for making an expression might be, "An
   1176 expression can be made of a minus sign and another expression".
   1177 Another would be, "An expression can be an integer".  As you can see,
   1178 rules are often recursive, but there must be at least one rule which
   1179 leads out of the recursion.
   1180 
   1181    The most common formal system for presenting such rules for humans
   1182 to read is "Backus-Naur Form" or "BNF", which was developed in order to
   1183 specify the language Algol 60.  Any grammar expressed in BNF is a
   1184 context-free grammar.  The input to Bison is essentially
   1185 machine-readable BNF.
   1186 
   1187    There are various important subclasses of context-free grammars.
   1188 Although it can handle almost all context-free grammars, Bison is
   1189 optimized for what are called LR(1) grammars.  In brief, in these
   1190 grammars, it must be possible to tell how to parse any portion of an
   1191 input string with just a single token of lookahead.  For historical
   1192 reasons, Bison by default is limited by the additional restrictions of
   1193 LALR(1), which is hard to explain simply.  *Note Mysterious
   1194 Conflicts::, for more information on this.  As an experimental feature,
   1195 you can escape these additional restrictions by requesting IELR(1) or
   1196 canonical LR(1) parser tables.  *Note LR Table Construction::, to learn
   1197 how.
   1198 
   1199    Parsers for LR(1) grammars are "deterministic", meaning roughly that
   1200 the next grammar rule to apply at any point in the input is uniquely
   1201 determined by the preceding input and a fixed, finite portion (called a
   1202 "lookahead") of the remaining input.  A context-free grammar can be
   1203 "ambiguous", meaning that there are multiple ways to apply the grammar
   1204 rules to get the same inputs.  Even unambiguous grammars can be
   1205 "nondeterministic", meaning that no fixed lookahead always suffices to
   1206 determine the next grammar rule to apply.  With the proper
   1207 declarations, Bison is also able to parse these more general
   1208 context-free grammars, using a technique known as GLR parsing (for
   1209 Generalized LR).  Bison's GLR parsers are able to handle any
   1210 context-free grammar for which the number of possible parses of any
   1211 given string is finite.
   1212 
   1213    In the formal grammatical rules for a language, each kind of
   1214 syntactic unit or grouping is named by a "symbol".  Those which are
   1215 built by grouping smaller constructs according to grammatical rules are
   1216 called "nonterminal symbols"; those which can't be subdivided are called
   1217 "terminal symbols" or "token types".  We call a piece of input
   1218 corresponding to a single terminal symbol a "token", and a piece
   1219 corresponding to a single nonterminal symbol a "grouping".
   1220 
   1221    We can use the C language as an example of what symbols, terminal and
   1222 nonterminal, mean.  The tokens of C are identifiers, constants (numeric
   1223 and string), and the various keywords, arithmetic operators and
   1224 punctuation marks.  So the terminal symbols of a grammar for C include
   1225 `identifier', `number', `string', plus one symbol for each keyword,
   1226 operator or punctuation mark: `if', `return', `const', `static', `int',
   1227 `char', `plus-sign', `open-brace', `close-brace', `comma' and many more.
   1228 (These tokens can be subdivided into characters, but that is a matter of
   1229 lexicography, not grammar.)
   1230 
   1231    Here is a simple C function subdivided into tokens:
   1232 
   1233      int             /* keyword `int' */
   1234      square (int x)  /* identifier, open-paren, keyword `int',
   1235                         identifier, close-paren */
   1236      {               /* open-brace */
   1237        return x * x; /* keyword `return', identifier, asterisk,
   1238                         identifier, semicolon */
   1239      }               /* close-brace */
   1240 
   1241    The syntactic groupings of C include the expression, the statement,
   1242 the declaration, and the function definition.  These are represented in
   1243 the grammar of C by nonterminal symbols `expression', `statement',
   1244 `declaration' and `function definition'.  The full grammar uses dozens
   1245 of additional language constructs, each with its own nonterminal
   1246 symbol, in order to express the meanings of these four.  The example
   1247 above is a function definition; it contains one declaration, and one
   1248 statement.  In the statement, each `x' is an expression and so is `x *
   1249 x'.
   1250 
   1251    Each nonterminal symbol must have grammatical rules showing how it
   1252 is made out of simpler constructs.  For example, one kind of C
   1253 statement is the `return' statement; this would be described with a
   1254 grammar rule which reads informally as follows:
   1255 
   1256      A `statement' can be made of a `return' keyword, an `expression'
   1257      and a `semicolon'.
   1258 
   1259 There would be many other rules for `statement', one for each kind of
   1260 statement in C.
   1261 
   1262    One nonterminal symbol must be distinguished as the special one which
   1263 defines a complete utterance in the language.  It is called the "start
   1264 symbol".  In a compiler, this means a complete input program.  In the C
   1265 language, the nonterminal symbol `sequence of definitions and
   1266 declarations' plays this role.
   1267 
   1268    For example, `1 + 2' is a valid C expression--a valid part of a C
   1269 program--but it is not valid as an _entire_ C program.  In the
   1270 context-free grammar of C, this follows from the fact that `expression'
   1271 is not the start symbol.
   1272 
   1273    The Bison parser reads a sequence of tokens as its input, and groups
   1274 the tokens using the grammar rules.  If the input is valid, the end
   1275 result is that the entire token sequence reduces to a single grouping
   1276 whose symbol is the grammar's start symbol.  If we use a grammar for C,
   1277 the entire input must be a `sequence of definitions and declarations'.
   1278 If not, the parser reports a syntax error.
   1279 
   1280 
   1281 File: bison.info,  Node: Grammar in Bison,  Next: Semantic Values,  Prev: Language and Grammar,  Up: Concepts
   1282 
   1283 1.2 From Formal Rules to Bison Input
   1284 ====================================
   1285 
   1286 A formal grammar is a mathematical construct.  To define the language
   1287 for Bison, you must write a file expressing the grammar in Bison syntax:
   1288 a "Bison grammar" file.  *Note Bison Grammar Files: Grammar File.
   1289 
   1290    A nonterminal symbol in the formal grammar is represented in Bison
   1291 input as an identifier, like an identifier in C.  By convention, it
   1292 should be in lower case, such as `expr', `stmt' or `declaration'.
   1293 
   1294    The Bison representation for a terminal symbol is also called a
   1295 "token type".  Token types as well can be represented as C-like
   1296 identifiers.  By convention, these identifiers should be upper case to
   1297 distinguish them from nonterminals: for example, `INTEGER',
   1298 `IDENTIFIER', `IF' or `RETURN'.  A terminal symbol that stands for a
   1299 particular keyword in the language should be named after that keyword
   1300 converted to upper case.  The terminal symbol `error' is reserved for
   1301 error recovery.  *Note Symbols::.
   1302 
   1303    A terminal symbol can also be represented as a character literal,
   1304 just like a C character constant.  You should do this whenever a token
   1305 is just a single character (parenthesis, plus-sign, etc.): use that
   1306 same character in a literal as the terminal symbol for that token.
   1307 
   1308    A third way to represent a terminal symbol is with a C string
   1309 constant containing several characters.  *Note Symbols::, for more
   1310 information.
   1311 
   1312    The grammar rules also have an expression in Bison syntax.  For
   1313 example, here is the Bison rule for a C `return' statement.  The
   1314 semicolon in quotes is a literal character token, representing part of
   1315 the C syntax for the statement; the naked semicolon, and the colon, are
   1316 Bison punctuation used in every rule.
   1317 
   1318      stmt: RETURN expr ';' ;
   1319 
   1320 *Note Syntax of Grammar Rules: Rules.
   1321 
   1322 
   1323 File: bison.info,  Node: Semantic Values,  Next: Semantic Actions,  Prev: Grammar in Bison,  Up: Concepts
   1324 
   1325 1.3 Semantic Values
   1326 ===================
   1327 
   1328 A formal grammar selects tokens only by their classifications: for
   1329 example, if a rule mentions the terminal symbol `integer constant', it
   1330 means that _any_ integer constant is grammatically valid in that
   1331 position.  The precise value of the constant is irrelevant to how to
   1332 parse the input: if `x+4' is grammatical then `x+1' or `x+3989' is
   1333 equally grammatical.
   1334 
   1335    But the precise value is very important for what the input means
   1336 once it is parsed.  A compiler is useless if it fails to distinguish
   1337 between 4, 1 and 3989 as constants in the program!  Therefore, each
   1338 token in a Bison grammar has both a token type and a "semantic value".
   1339 *Note Defining Language Semantics: Semantics, for details.
   1340 
   1341    The token type is a terminal symbol defined in the grammar, such as
   1342 `INTEGER', `IDENTIFIER' or `',''.  It tells everything you need to know
   1343 to decide where the token may validly appear and how to group it with
   1344 other tokens.  The grammar rules know nothing about tokens except their
   1345 types.
   1346 
   1347    The semantic value has all the rest of the information about the
   1348 meaning of the token, such as the value of an integer, or the name of an
   1349 identifier.  (A token such as `','' which is just punctuation doesn't
   1350 need to have any semantic value.)
   1351 
   1352    For example, an input token might be classified as token type
   1353 `INTEGER' and have the semantic value 4.  Another input token might
   1354 have the same token type `INTEGER' but value 3989.  When a grammar rule
   1355 says that `INTEGER' is allowed, either of these tokens is acceptable
   1356 because each is an `INTEGER'.  When the parser accepts the token, it
   1357 keeps track of the token's semantic value.
   1358 
   1359    Each grouping can also have a semantic value as well as its
   1360 nonterminal symbol.  For example, in a calculator, an expression
   1361 typically has a semantic value that is a number.  In a compiler for a
   1362 programming language, an expression typically has a semantic value that
   1363 is a tree structure describing the meaning of the expression.
   1364 
   1365 
   1366 File: bison.info,  Node: Semantic Actions,  Next: GLR Parsers,  Prev: Semantic Values,  Up: Concepts
   1367 
   1368 1.4 Semantic Actions
   1369 ====================
   1370 
   1371 In order to be useful, a program must do more than parse input; it must
   1372 also produce some output based on the input.  In a Bison grammar, a
   1373 grammar rule can have an "action" made up of C statements.  Each time
   1374 the parser recognizes a match for that rule, the action is executed.
   1375 *Note Actions::.
   1376 
   1377    Most of the time, the purpose of an action is to compute the
   1378 semantic value of the whole construct from the semantic values of its
   1379 parts.  For example, suppose we have a rule which says an expression
   1380 can be the sum of two expressions.  When the parser recognizes such a
   1381 sum, each of the subexpressions has a semantic value which describes
   1382 how it was built up.  The action for this rule should create a similar
   1383 sort of value for the newly recognized larger expression.
   1384 
   1385    For example, here is a rule that says an expression can be the sum of
   1386 two subexpressions:
   1387 
   1388      expr: expr '+' expr   { $$ = $1 + $3; } ;
   1389 
   1390 The action says how to produce the semantic value of the sum expression
   1391 from the values of the two subexpressions.
   1392 
   1393 
   1394 File: bison.info,  Node: GLR Parsers,  Next: Locations,  Prev: Semantic Actions,  Up: Concepts
   1395 
   1396 1.5 Writing GLR Parsers
   1397 =======================
   1398 
   1399 In some grammars, Bison's deterministic LR(1) parsing algorithm cannot
   1400 decide whether to apply a certain grammar rule at a given point.  That
   1401 is, it may not be able to decide (on the basis of the input read so
   1402 far) which of two possible reductions (applications of a grammar rule)
   1403 applies, or whether to apply a reduction or read more of the input and
   1404 apply a reduction later in the input.  These are known respectively as
   1405 "reduce/reduce" conflicts (*note Reduce/Reduce::), and "shift/reduce"
   1406 conflicts (*note Shift/Reduce::).
   1407 
   1408    To use a grammar that is not easily modified to be LR(1), a more
   1409 general parsing algorithm is sometimes necessary.  If you include
   1410 `%glr-parser' among the Bison declarations in your file (*note Grammar
   1411 Outline::), the result is a Generalized LR (GLR) parser.  These parsers
   1412 handle Bison grammars that contain no unresolved conflicts (i.e., after
   1413 applying precedence declarations) identically to deterministic parsers.
   1414 However, when faced with unresolved shift/reduce and reduce/reduce
   1415 conflicts, GLR parsers use the simple expedient of doing both,
   1416 effectively cloning the parser to follow both possibilities.  Each of
   1417 the resulting parsers can again split, so that at any given time, there
   1418 can be any number of possible parses being explored.  The parsers
   1419 proceed in lockstep; that is, all of them consume (shift) a given input
   1420 symbol before any of them proceed to the next.  Each of the cloned
   1421 parsers eventually meets one of two possible fates: either it runs into
   1422 a parsing error, in which case it simply vanishes, or it merges with
   1423 another parser, because the two of them have reduced the input to an
   1424 identical set of symbols.
   1425 
   1426    During the time that there are multiple parsers, semantic actions are
   1427 recorded, but not performed.  When a parser disappears, its recorded
   1428 semantic actions disappear as well, and are never performed.  When a
   1429 reduction makes two parsers identical, causing them to merge, Bison
   1430 records both sets of semantic actions.  Whenever the last two parsers
   1431 merge, reverting to the single-parser case, Bison resolves all the
   1432 outstanding actions either by precedences given to the grammar rules
   1433 involved, or by performing both actions, and then calling a designated
   1434 user-defined function on the resulting values to produce an arbitrary
   1435 merged result.
   1436 
   1437 * Menu:
   1438 
   1439 * Simple GLR Parsers::     Using GLR parsers on unambiguous grammars.
   1440 * Merging GLR Parses::     Using GLR parsers to resolve ambiguities.
   1441 * GLR Semantic Actions::   Deferred semantic actions have special concerns.
   1442 * Compiler Requirements::  GLR parsers require a modern C compiler.
   1443 
   1444 
   1445 File: bison.info,  Node: Simple GLR Parsers,  Next: Merging GLR Parses,  Up: GLR Parsers
   1446 
   1447 1.5.1 Using GLR on Unambiguous Grammars
   1448 ---------------------------------------
   1449 
   1450 In the simplest cases, you can use the GLR algorithm to parse grammars
   1451 that are unambiguous but fail to be LR(1).  Such grammars typically
   1452 require more than one symbol of lookahead.
   1453 
   1454    Consider a problem that arises in the declaration of enumerated and
   1455 subrange types in the programming language Pascal.  Here are some
   1456 examples:
   1457 
   1458      type subrange = lo .. hi;
   1459      type enum = (a, b, c);
   1460 
   1461 The original language standard allows only numeric literals and
   1462 constant identifiers for the subrange bounds (`lo' and `hi'), but
   1463 Extended Pascal (ISO/IEC 10206) and many other Pascal implementations
   1464 allow arbitrary expressions there.  This gives rise to the following
   1465 situation, containing a superfluous pair of parentheses:
   1466 
   1467      type subrange = (a) .. b;
   1468 
   1469 Compare this to the following declaration of an enumerated type with
   1470 only one value:
   1471 
   1472      type enum = (a);
   1473 
   1474 (These declarations are contrived, but they are syntactically valid,
   1475 and more-complicated cases can come up in practical programs.)
   1476 
   1477    These two declarations look identical until the `..' token.  With
   1478 normal LR(1) one-token lookahead it is not possible to decide between
   1479 the two forms when the identifier `a' is parsed.  It is, however,
   1480 desirable for a parser to decide this, since in the latter case `a'
   1481 must become a new identifier to represent the enumeration value, while
   1482 in the former case `a' must be evaluated with its current meaning,
   1483 which may be a constant or even a function call.
   1484 
   1485    You could parse `(a)' as an "unspecified identifier in parentheses",
   1486 to be resolved later, but this typically requires substantial
   1487 contortions in both semantic actions and large parts of the grammar,
   1488 where the parentheses are nested in the recursive rules for expressions.
   1489 
   1490    You might think of using the lexer to distinguish between the two
   1491 forms by returning different tokens for currently defined and undefined
   1492 identifiers.  But if these declarations occur in a local scope, and `a'
   1493 is defined in an outer scope, then both forms are possible--either
   1494 locally redefining `a', or using the value of `a' from the outer scope.
   1495 So this approach cannot work.
   1496 
   1497    A simple solution to this problem is to declare the parser to use
   1498 the GLR algorithm.  When the GLR parser reaches the critical state, it
   1499 merely splits into two branches and pursues both syntax rules
   1500 simultaneously.  Sooner or later, one of them runs into a parsing
   1501 error.  If there is a `..' token before the next `;', the rule for
   1502 enumerated types fails since it cannot accept `..' anywhere; otherwise,
   1503 the subrange type rule fails since it requires a `..' token.  So one of
   1504 the branches fails silently, and the other one continues normally,
   1505 performing all the intermediate actions that were postponed during the
   1506 split.
   1507 
   1508    If the input is syntactically incorrect, both branches fail and the
   1509 parser reports a syntax error as usual.
   1510 
   1511    The effect of all this is that the parser seems to "guess" the
   1512 correct branch to take, or in other words, it seems to use more
   1513 lookahead than the underlying LR(1) algorithm actually allows for.  In
   1514 this example, LR(2) would suffice, but also some cases that are not
   1515 LR(k) for any k can be handled this way.
   1516 
   1517    In general, a GLR parser can take quadratic or cubic worst-case time,
   1518 and the current Bison parser even takes exponential time and space for
   1519 some grammars.  In practice, this rarely happens, and for many grammars
   1520 it is possible to prove that it cannot happen.  The present example
   1521 contains only one conflict between two rules, and the type-declaration
   1522 context containing the conflict cannot be nested.  So the number of
   1523 branches that can exist at any time is limited by the constant 2, and
   1524 the parsing time is still linear.
   1525 
   1526    Here is a Bison grammar corresponding to the example above.  It
   1527 parses a vastly simplified form of Pascal type declarations.
   1528 
   1529      %token TYPE DOTDOT ID
   1530 
   1531      %left '+' '-'
   1532      %left '*' '/'
   1533 
   1534      %%
   1535 
   1536      type_decl: TYPE ID '=' type ';' ;
   1537 
   1538      type:
   1539        '(' id_list ')'
   1540      | expr DOTDOT expr
   1541      ;
   1542 
   1543      id_list:
   1544        ID
   1545      | id_list ',' ID
   1546      ;
   1547 
   1548      expr:
   1549        '(' expr ')'
   1550      | expr '+' expr
   1551      | expr '-' expr
   1552      | expr '*' expr
   1553      | expr '/' expr
   1554      | ID
   1555      ;
   1556 
   1557    When used as a normal LR(1) grammar, Bison correctly complains about
   1558 one reduce/reduce conflict.  In the conflicting situation the parser
   1559 chooses one of the alternatives, arbitrarily the one declared first.
   1560 Therefore the following correct input is not recognized:
   1561 
   1562      type t = (a) .. b;
   1563 
   1564    The parser can be turned into a GLR parser, while also telling Bison
   1565 to be silent about the one known reduce/reduce conflict, by adding
   1566 these two declarations to the Bison grammar file (before the first
   1567 `%%'):
   1568 
   1569      %glr-parser
   1570      %expect-rr 1
   1571 
   1572 No change in the grammar itself is required.  Now the parser recognizes
   1573 all valid declarations, according to the limited syntax above,
   1574 transparently.  In fact, the user does not even notice when the parser
   1575 splits.
   1576 
   1577    So here we have a case where we can use the benefits of GLR, almost
   1578 without disadvantages.  Even in simple cases like this, however, there
   1579 are at least two potential problems to beware.  First, always analyze
   1580 the conflicts reported by Bison to make sure that GLR splitting is only
   1581 done where it is intended.  A GLR parser splitting inadvertently may
   1582 cause problems less obvious than an LR parser statically choosing the
   1583 wrong alternative in a conflict.  Second, consider interactions with
   1584 the lexer (*note Semantic Tokens::) with great care.  Since a split
   1585 parser consumes tokens without performing any actions during the split,
   1586 the lexer cannot obtain information via parser actions.  Some cases of
   1587 lexer interactions can be eliminated by using GLR to shift the
   1588 complications from the lexer to the parser.  You must check the
   1589 remaining cases for correctness.
   1590 
   1591    In our example, it would be safe for the lexer to return tokens
   1592 based on their current meanings in some symbol table, because no new
   1593 symbols are defined in the middle of a type declaration.  Though it is
   1594 possible for a parser to define the enumeration constants as they are
   1595 parsed, before the type declaration is completed, it actually makes no
   1596 difference since they cannot be used within the same enumerated type
   1597 declaration.
   1598 
   1599 
   1600 File: bison.info,  Node: Merging GLR Parses,  Next: GLR Semantic Actions,  Prev: Simple GLR Parsers,  Up: GLR Parsers
   1601 
   1602 1.5.2 Using GLR to Resolve Ambiguities
   1603 --------------------------------------
   1604 
   1605 Let's consider an example, vastly simplified from a C++ grammar.
   1606 
   1607      %{
   1608        #include <stdio.h>
   1609        #define YYSTYPE char const *
   1610        int yylex (void);
   1611        void yyerror (char const *);
   1612      %}
   1613 
   1614      %token TYPENAME ID
   1615 
   1616      %right '='
   1617      %left '+'
   1618 
   1619      %glr-parser
   1620 
   1621      %%
   1622 
   1623      prog:
   1624        /* Nothing.  */
   1625      | prog stmt   { printf ("\n"); }
   1626      ;
   1627 
   1628      stmt:
   1629        expr ';'  %dprec 1
   1630      | decl      %dprec 2
   1631      ;
   1632 
   1633      expr:
   1634        ID               { printf ("%s ", $$); }
   1635      | TYPENAME '(' expr ')'
   1636                         { printf ("%s <cast> ", $1); }
   1637      | expr '+' expr    { printf ("+ "); }
   1638      | expr '=' expr    { printf ("= "); }
   1639      ;
   1640 
   1641      decl:
   1642        TYPENAME declarator ';'
   1643                         { printf ("%s <declare> ", $1); }
   1644      | TYPENAME declarator '=' expr ';'
   1645                         { printf ("%s <init-declare> ", $1); }
   1646      ;
   1647 
   1648      declarator:
   1649        ID               { printf ("\"%s\" ", $1); }
   1650      | '(' declarator ')'
   1651      ;
   1652 
   1653 This models a problematic part of the C++ grammar--the ambiguity between
   1654 certain declarations and statements.  For example,
   1655 
   1656      T (x) = y+z;
   1657 
   1658 parses as either an `expr' or a `stmt' (assuming that `T' is recognized
   1659 as a `TYPENAME' and `x' as an `ID').  Bison detects this as a
   1660 reduce/reduce conflict between the rules `expr : ID' and `declarator :
   1661 ID', which it cannot resolve at the time it encounters `x' in the
   1662 example above.  Since this is a GLR parser, it therefore splits the
   1663 problem into two parses, one for each choice of resolving the
   1664 reduce/reduce conflict.  Unlike the example from the previous section
   1665 (*note Simple GLR Parsers::), however, neither of these parses "dies,"
   1666 because the grammar as it stands is ambiguous.  One of the parsers
   1667 eventually reduces `stmt : expr ';'' and the other reduces `stmt :
   1668 decl', after which both parsers are in an identical state: they've seen
   1669 `prog stmt' and have the same unprocessed input remaining.  We say that
   1670 these parses have "merged."
   1671 
   1672    At this point, the GLR parser requires a specification in the
   1673 grammar of how to choose between the competing parses.  In the example
   1674 above, the two `%dprec' declarations specify that Bison is to give
   1675 precedence to the parse that interprets the example as a `decl', which
   1676 implies that `x' is a declarator.  The parser therefore prints
   1677 
   1678      "x" y z + T <init-declare>
   1679 
   1680    The `%dprec' declarations only come into play when more than one
   1681 parse survives.  Consider a different input string for this parser:
   1682 
   1683      T (x) + y;
   1684 
   1685 This is another example of using GLR to parse an unambiguous construct,
   1686 as shown in the previous section (*note Simple GLR Parsers::).  Here,
   1687 there is no ambiguity (this cannot be parsed as a declaration).
   1688 However, at the time the Bison parser encounters `x', it does not have
   1689 enough information to resolve the reduce/reduce conflict (again,
   1690 between `x' as an `expr' or a `declarator').  In this case, no
   1691 precedence declaration is used.  Again, the parser splits into two, one
   1692 assuming that `x' is an `expr', and the other assuming `x' is a
   1693 `declarator'.  The second of these parsers then vanishes when it sees
   1694 `+', and the parser prints
   1695 
   1696      x T <cast> y +
   1697 
   1698    Suppose that instead of resolving the ambiguity, you wanted to see
   1699 all the possibilities.  For this purpose, you must merge the semantic
   1700 actions of the two possible parsers, rather than choosing one over the
   1701 other.  To do so, you could change the declaration of `stmt' as follows:
   1702 
   1703      stmt:
   1704        expr ';'  %merge <stmtMerge>
   1705      | decl      %merge <stmtMerge>
   1706      ;
   1707 
   1708 and define the `stmtMerge' function as:
   1709 
   1710      static YYSTYPE
   1711      stmtMerge (YYSTYPE x0, YYSTYPE x1)
   1712      {
   1713        printf ("<OR> ");
   1714        return "";
   1715      }
   1716 
   1717 with an accompanying forward declaration in the C declarations at the
   1718 beginning of the file:
   1719 
   1720      %{
   1721        #define YYSTYPE char const *
   1722        static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1);
   1723      %}
   1724 
   1725 With these declarations, the resulting parser parses the first example
   1726 as both an `expr' and a `decl', and prints
   1727 
   1728      "x" y z + T <init-declare> x T <cast> y z + = <OR>
   1729 
   1730    Bison requires that all of the productions that participate in any
   1731 particular merge have identical `%merge' clauses.  Otherwise, the
   1732 ambiguity would be unresolvable, and the parser will report an error
   1733 during any parse that results in the offending merge.
   1734 
   1735 
   1736 File: bison.info,  Node: GLR Semantic Actions,  Next: Compiler Requirements,  Prev: Merging GLR Parses,  Up: GLR Parsers
   1737 
   1738 1.5.3 GLR Semantic Actions
   1739 --------------------------
   1740 
   1741 By definition, a deferred semantic action is not performed at the same
   1742 time as the associated reduction.  This raises caveats for several
   1743 Bison features you might use in a semantic action in a GLR parser.
   1744 
   1745    In any semantic action, you can examine `yychar' to determine the
   1746 type of the lookahead token present at the time of the associated
   1747 reduction.  After checking that `yychar' is not set to `YYEMPTY' or
   1748 `YYEOF', you can then examine `yylval' and `yylloc' to determine the
   1749 lookahead token's semantic value and location, if any.  In a
   1750 nondeferred semantic action, you can also modify any of these variables
   1751 to influence syntax analysis.  *Note Lookahead Tokens: Lookahead.
   1752 
   1753    In a deferred semantic action, it's too late to influence syntax
   1754 analysis.  In this case, `yychar', `yylval', and `yylloc' are set to
   1755 shallow copies of the values they had at the time of the associated
   1756 reduction.  For this reason alone, modifying them is dangerous.
   1757 Moreover, the result of modifying them is undefined and subject to
   1758 change with future versions of Bison.  For example, if a semantic
   1759 action might be deferred, you should never write it to invoke
   1760 `yyclearin' (*note Action Features::) or to attempt to free memory
   1761 referenced by `yylval'.
   1762 
   1763    Another Bison feature requiring special consideration is `YYERROR'
   1764 (*note Action Features::), which you can invoke in a semantic action to
   1765 initiate error recovery.  During deterministic GLR operation, the
   1766 effect of `YYERROR' is the same as its effect in a deterministic parser.
   1767 In a deferred semantic action, its effect is undefined.
   1768 
   1769    Also, see *note Default Action for Locations: Location Default
   1770 Action, which describes a special usage of `YYLLOC_DEFAULT' in GLR
   1771 parsers.
   1772 
   1773 
   1774 File: bison.info,  Node: Compiler Requirements,  Prev: GLR Semantic Actions,  Up: GLR Parsers
   1775 
   1776 1.5.4 Considerations when Compiling GLR Parsers
   1777 -----------------------------------------------
   1778 
   1779 The GLR parsers require a compiler for ISO C89 or later.  In addition,
   1780 they use the `inline' keyword, which is not C89, but is C99 and is a
   1781 common extension in pre-C99 compilers.  It is up to the user of these
   1782 parsers to handle portability issues.  For instance, if using Autoconf
   1783 and the Autoconf macro `AC_C_INLINE', a mere
   1784 
   1785      %{
   1786        #include <config.h>
   1787      %}
   1788 
   1789 will suffice.  Otherwise, we suggest
   1790 
   1791      %{
   1792        #if (__STDC_VERSION__ < 199901 && ! defined __GNUC__ \
   1793             && ! defined inline)
   1794        # define inline
   1795        #endif
   1796      %}
   1797 
   1798 
   1799 File: bison.info,  Node: Locations,  Next: Bison Parser,  Prev: GLR Parsers,  Up: Concepts
   1800 
   1801 1.6 Locations
   1802 =============
   1803 
   1804 Many applications, like interpreters or compilers, have to produce
   1805 verbose and useful error messages.  To achieve this, one must be able
   1806 to keep track of the "textual location", or "location", of each
   1807 syntactic construct.  Bison provides a mechanism for handling these
   1808 locations.
   1809 
   1810    Each token has a semantic value.  In a similar fashion, each token
   1811 has an associated location, but the type of locations is the same for
   1812 all tokens and groupings.  Moreover, the output parser is equipped with
   1813 a default data structure for storing locations (*note Tracking
   1814 Locations::, for more details).
   1815 
   1816    Like semantic values, locations can be reached in actions using a
   1817 dedicated set of constructs.  In the example above, the location of the
   1818 whole grouping is `@$', while the locations of the subexpressions are
   1819 `@1' and `@3'.
   1820 
   1821    When a rule is matched, a default action is used to compute the
   1822 semantic value of its left hand side (*note Actions::).  In the same
   1823 way, another default action is used for locations.  However, the action
   1824 for locations is general enough for most cases, meaning there is
   1825 usually no need to describe for each rule how `@$' should be formed.
   1826 When building a new location for a given grouping, the default behavior
   1827 of the output parser is to take the beginning of the first symbol, and
   1828 the end of the last symbol.
   1829 
   1830 
   1831 File: bison.info,  Node: Bison Parser,  Next: Stages,  Prev: Locations,  Up: Concepts
   1832 
   1833 1.7 Bison Output: the Parser Implementation File
   1834 ================================================
   1835 
   1836 When you run Bison, you give it a Bison grammar file as input.  The
   1837 most important output is a C source file that implements a parser for
   1838 the language described by the grammar.  This parser is called a "Bison
   1839 parser", and this file is called a "Bison parser implementation file".
   1840 Keep in mind that the Bison utility and the Bison parser are two
   1841 distinct programs: the Bison utility is a program whose output is the
   1842 Bison parser implementation file that becomes part of your program.
   1843 
   1844    The job of the Bison parser is to group tokens into groupings
   1845 according to the grammar rules--for example, to build identifiers and
   1846 operators into expressions.  As it does this, it runs the actions for
   1847 the grammar rules it uses.
   1848 
   1849    The tokens come from a function called the "lexical analyzer" that
   1850 you must supply in some fashion (such as by writing it in C).  The Bison
   1851 parser calls the lexical analyzer each time it wants a new token.  It
   1852 doesn't know what is "inside" the tokens (though their semantic values
   1853 may reflect this).  Typically the lexical analyzer makes the tokens by
   1854 parsing characters of text, but Bison does not depend on this.  *Note
   1855 The Lexical Analyzer Function `yylex': Lexical.
   1856 
   1857    The Bison parser implementation file is C code which defines a
   1858 function named `yyparse' which implements that grammar.  This function
   1859 does not make a complete C program: you must supply some additional
   1860 functions.  One is the lexical analyzer.  Another is an error-reporting
   1861 function which the parser calls to report an error.  In addition, a
   1862 complete C program must start with a function called `main'; you have
   1863 to provide this, and arrange for it to call `yyparse' or the parser
   1864 will never run.  *Note Parser C-Language Interface: Interface.
   1865 
   1866    Aside from the token type names and the symbols in the actions you
   1867 write, all symbols defined in the Bison parser implementation file
   1868 itself begin with `yy' or `YY'.  This includes interface functions such
   1869 as the lexical analyzer function `yylex', the error reporting function
   1870 `yyerror' and the parser function `yyparse' itself.  This also includes
   1871 numerous identifiers used for internal purposes.  Therefore, you should
   1872 avoid using C identifiers starting with `yy' or `YY' in the Bison
   1873 grammar file except for the ones defined in this manual.  Also, you
   1874 should avoid using the C identifiers `malloc' and `free' for anything
   1875 other than their usual meanings.
   1876 
   1877    In some cases the Bison parser implementation file includes system
   1878 headers, and in those cases your code should respect the identifiers
   1879 reserved by those headers.  On some non-GNU hosts, `<alloca.h>',
   1880 `<malloc.h>', `<stddef.h>', and `<stdlib.h>' are included as needed to
   1881 declare memory allocators and related types.  `<libintl.h>' is included
   1882 if message translation is in use (*note Internationalization::).  Other
   1883 system headers may be included if you define `YYDEBUG' to a nonzero
   1884 value (*note Tracing Your Parser: Tracing.).
   1885 
   1886 
   1887 File: bison.info,  Node: Stages,  Next: Grammar Layout,  Prev: Bison Parser,  Up: Concepts
   1888 
   1889 1.8 Stages in Using Bison
   1890 =========================
   1891 
   1892 The actual language-design process using Bison, from grammar
   1893 specification to a working compiler or interpreter, has these parts:
   1894 
   1895   1. Formally specify the grammar in a form recognized by Bison (*note
   1896      Bison Grammar Files: Grammar File.).  For each grammatical rule in
   1897      the language, describe the action that is to be taken when an
   1898      instance of that rule is recognized.  The action is described by a
   1899      sequence of C statements.
   1900 
   1901   2. Write a lexical analyzer to process input and pass tokens to the
   1902      parser.  The lexical analyzer may be written by hand in C (*note
   1903      The Lexical Analyzer Function `yylex': Lexical.).  It could also
   1904      be produced using Lex, but the use of Lex is not discussed in this
   1905      manual.
   1906 
   1907   3. Write a controlling function that calls the Bison-produced parser.
   1908 
   1909   4. Write error-reporting routines.
   1910 
   1911    To turn this source code as written into a runnable program, you
   1912 must follow these steps:
   1913 
   1914   1. Run Bison on the grammar to produce the parser.
   1915 
   1916   2. Compile the code output by Bison, as well as any other source
   1917      files.
   1918 
   1919   3. Link the object files to produce the finished product.
   1920 
   1921 
   1922 File: bison.info,  Node: Grammar Layout,  Prev: Stages,  Up: Concepts
   1923 
   1924 1.9 The Overall Layout of a Bison Grammar
   1925 =========================================
   1926 
   1927 The input file for the Bison utility is a "Bison grammar file".  The
   1928 general form of a Bison grammar file is as follows:
   1929 
   1930      %{
   1931      PROLOGUE
   1932      %}
   1933 
   1934      BISON DECLARATIONS
   1935 
   1936      %%
   1937      GRAMMAR RULES
   1938      %%
   1939      EPILOGUE
   1940 
   1941 The `%%', `%{' and `%}' are punctuation that appears in every Bison
   1942 grammar file to separate the sections.
   1943 
   1944    The prologue may define types and variables used in the actions.
   1945 You can also use preprocessor commands to define macros used there, and
   1946 use `#include' to include header files that do any of these things.
   1947 You need to declare the lexical analyzer `yylex' and the error printer
   1948 `yyerror' here, along with any other global identifiers used by the
   1949 actions in the grammar rules.
   1950 
   1951    The Bison declarations declare the names of the terminal and
   1952 nonterminal symbols, and may also describe operator precedence and the
   1953 data types of semantic values of various symbols.
   1954 
   1955    The grammar rules define how to construct each nonterminal symbol
   1956 from its parts.
   1957 
   1958    The epilogue can contain any code you want to use.  Often the
   1959 definitions of functions declared in the prologue go here.  In a simple
   1960 program, all the rest of the program can go here.
   1961 
   1962 
   1963 File: bison.info,  Node: Examples,  Next: Grammar File,  Prev: Concepts,  Up: Top
   1964 
   1965 2 Examples
   1966 **********
   1967 
   1968 Now we show and explain several sample programs written using Bison: a
   1969 reverse polish notation calculator, an algebraic (infix) notation
   1970 calculator -- later extended to track "locations" -- and a
   1971 multi-function calculator.  All produce usable, though limited,
   1972 interactive desk-top calculators.
   1973 
   1974    These examples are simple, but Bison grammars for real programming
   1975 languages are written the same way.  You can copy these examples into a
   1976 source file to try them.
   1977 
   1978 * Menu:
   1979 
   1980 * RPN Calc::               Reverse polish notation calculator;
   1981                              a first example with no operator precedence.
   1982 * Infix Calc::             Infix (algebraic) notation calculator.
   1983                              Operator precedence is introduced.
   1984 * Simple Error Recovery::  Continuing after syntax errors.
   1985 * Location Tracking Calc:: Demonstrating the use of @N and @$.
   1986 * Multi-function Calc::    Calculator with memory and trig functions.
   1987                              It uses multiple data-types for semantic values.
   1988 * Exercises::              Ideas for improving the multi-function calculator.
   1989 
   1990 
   1991 File: bison.info,  Node: RPN Calc,  Next: Infix Calc,  Up: Examples
   1992 
   1993 2.1 Reverse Polish Notation Calculator
   1994 ======================================
   1995 
   1996 The first example is that of a simple double-precision "reverse polish
   1997 notation" calculator (a calculator using postfix operators).  This
   1998 example provides a good starting point, since operator precedence is
   1999 not an issue.  The second example will illustrate how operator
   2000 precedence is handled.
   2001 
   2002    The source code for this calculator is named `rpcalc.y'.  The `.y'
   2003 extension is a convention used for Bison grammar files.
   2004 
   2005 * Menu:
   2006 
   2007 * Rpcalc Declarations::    Prologue (declarations) for rpcalc.
   2008 * Rpcalc Rules::           Grammar Rules for rpcalc, with explanation.
   2009 * Rpcalc Lexer::           The lexical analyzer.
   2010 * Rpcalc Main::            The controlling function.
   2011 * Rpcalc Error::           The error reporting function.
   2012 * Rpcalc Generate::        Running Bison on the grammar file.
   2013 * Rpcalc Compile::         Run the C compiler on the output code.
   2014 
   2015 
   2016 File: bison.info,  Node: Rpcalc Declarations,  Next: Rpcalc Rules,  Up: RPN Calc
   2017 
   2018 2.1.1 Declarations for `rpcalc'
   2019 -------------------------------
   2020 
   2021 Here are the C and Bison declarations for the reverse polish notation
   2022 calculator.  As in C, comments are placed between `/*...*/'.
   2023 
   2024      /* Reverse polish notation calculator.  */
   2025 
   2026      %{
   2027        #define YYSTYPE double
   2028        #include <math.h>
   2029        int yylex (void);
   2030        void yyerror (char const *);
   2031      %}
   2032 
   2033      %token NUM
   2034 
   2035      %% /* Grammar rules and actions follow.  */
   2036 
   2037    The declarations section (*note The prologue: Prologue.) contains two
   2038 preprocessor directives and two forward declarations.
   2039 
   2040    The `#define' directive defines the macro `YYSTYPE', thus specifying
   2041 the C data type for semantic values of both tokens and groupings (*note
   2042 Data Types of Semantic Values: Value Type.).  The Bison parser will use
   2043 whatever type `YYSTYPE' is defined as; if you don't define it, `int' is
   2044 the default.  Because we specify `double', each token and each
   2045 expression has an associated value, which is a floating point number.
   2046 
   2047    The `#include' directive is used to declare the exponentiation
   2048 function `pow'.
   2049 
   2050    The forward declarations for `yylex' and `yyerror' are needed
   2051 because the C language requires that functions be declared before they
   2052 are used.  These functions will be defined in the epilogue, but the
   2053 parser calls them so they must be declared in the prologue.
   2054 
   2055    The second section, Bison declarations, provides information to Bison
   2056 about the token types (*note The Bison Declarations Section: Bison
   2057 Declarations.).  Each terminal symbol that is not a single-character
   2058 literal must be declared here.  (Single-character literals normally
   2059 don't need to be declared.)  In this example, all the arithmetic
   2060 operators are designated by single-character literals, so the only
   2061 terminal symbol that needs to be declared is `NUM', the token type for
   2062 numeric constants.
   2063 
   2064 
   2065 File: bison.info,  Node: Rpcalc Rules,  Next: Rpcalc Lexer,  Prev: Rpcalc Declarations,  Up: RPN Calc
   2066 
   2067 2.1.2 Grammar Rules for `rpcalc'
   2068 --------------------------------
   2069 
   2070 Here are the grammar rules for the reverse polish notation calculator.
   2071 
   2072      input:
   2073        /* empty */
   2074      | input line
   2075      ;
   2076 
   2077      line:
   2078        '\n'
   2079      | exp '\n'      { printf ("%.10g\n", $1); }
   2080      ;
   2081 
   2082      exp:
   2083        NUM           { $$ = $1;           }
   2084      | exp exp '+'   { $$ = $1 + $2;      }
   2085      | exp exp '-'   { $$ = $1 - $2;      }
   2086      | exp exp '*'   { $$ = $1 * $2;      }
   2087      | exp exp '/'   { $$ = $1 / $2;      }
   2088      | exp exp '^'   { $$ = pow ($1, $2); }  /* Exponentiation */
   2089      | exp 'n'       { $$ = -$1;          }  /* Unary minus    */
   2090      ;
   2091      %%
   2092 
   2093    The groupings of the rpcalc "language" defined here are the
   2094 expression (given the name `exp'), the line of input (`line'), and the
   2095 complete input transcript (`input').  Each of these nonterminal symbols
   2096 has several alternate rules, joined by the vertical bar `|' which is
   2097 read as "or".  The following sections explain what these rules mean.
   2098 
   2099    The semantics of the language is determined by the actions taken
   2100 when a grouping is recognized.  The actions are the C code that appears
   2101 inside braces.  *Note Actions::.
   2102 
   2103    You must specify these actions in C, but Bison provides the means for
   2104 passing semantic values between the rules.  In each action, the
   2105 pseudo-variable `$$' stands for the semantic value for the grouping
   2106 that the rule is going to construct.  Assigning a value to `$$' is the
   2107 main job of most actions.  The semantic values of the components of the
   2108 rule are referred to as `$1', `$2', and so on.
   2109 
   2110 * Menu:
   2111 
   2112 * Rpcalc Input::
   2113 * Rpcalc Line::
   2114 * Rpcalc Expr::
   2115 
   2116 
   2117 File: bison.info,  Node: Rpcalc Input,  Next: Rpcalc Line,  Up: Rpcalc Rules
   2118 
   2119 2.1.2.1 Explanation of `input'
   2120 ..............................
   2121 
   2122 Consider the definition of `input':
   2123 
   2124      input:
   2125        /* empty */
   2126      | input line
   2127      ;
   2128 
   2129    This definition reads as follows: "A complete input is either an
   2130 empty string, or a complete input followed by an input line".  Notice
   2131 that "complete input" is defined in terms of itself.  This definition
   2132 is said to be "left recursive" since `input' appears always as the
   2133 leftmost symbol in the sequence.  *Note Recursive Rules: Recursion.
   2134 
   2135    The first alternative is empty because there are no symbols between
   2136 the colon and the first `|'; this means that `input' can match an empty
   2137 string of input (no tokens).  We write the rules this way because it is
   2138 legitimate to type `Ctrl-d' right after you start the calculator.  It's
   2139 conventional to put an empty alternative first and write the comment
   2140 `/* empty */' in it.
   2141 
   2142    The second alternate rule (`input line') handles all nontrivial
   2143 input.  It means, "After reading any number of lines, read one more
   2144 line if possible."  The left recursion makes this rule into a loop.
   2145 Since the first alternative matches empty input, the loop can be
   2146 executed zero or more times.
   2147 
   2148    The parser function `yyparse' continues to process input until a
   2149 grammatical error is seen or the lexical analyzer says there are no more
   2150 input tokens; we will arrange for the latter to happen at end-of-input.
   2151 
   2152 
   2153 File: bison.info,  Node: Rpcalc Line,  Next: Rpcalc Expr,  Prev: Rpcalc Input,  Up: Rpcalc Rules
   2154 
   2155 2.1.2.2 Explanation of `line'
   2156 .............................
   2157 
   2158 Now consider the definition of `line':
   2159 
   2160      line:
   2161        '\n'
   2162      | exp '\n'  { printf ("%.10g\n", $1); }
   2163      ;
   2164 
   2165    The first alternative is a token which is a newline character; this
   2166 means that rpcalc accepts a blank line (and ignores it, since there is
   2167 no action).  The second alternative is an expression followed by a
   2168 newline.  This is the alternative that makes rpcalc useful.  The
   2169 semantic value of the `exp' grouping is the value of `$1' because the
   2170 `exp' in question is the first symbol in the alternative.  The action
   2171 prints this value, which is the result of the computation the user
   2172 asked for.
   2173 
   2174    This action is unusual because it does not assign a value to `$$'.
   2175 As a consequence, the semantic value associated with the `line' is
   2176 uninitialized (its value will be unpredictable).  This would be a bug if
   2177 that value were ever used, but we don't use it: once rpcalc has printed
   2178 the value of the user's input line, that value is no longer needed.
   2179 
   2180 
   2181 File: bison.info,  Node: Rpcalc Expr,  Prev: Rpcalc Line,  Up: Rpcalc Rules
   2182 
   2183 2.1.2.3 Explanation of `expr'
   2184 .............................
   2185 
   2186 The `exp' grouping has several rules, one for each kind of expression.
   2187 The first rule handles the simplest expressions: those that are just
   2188 numbers.  The second handles an addition-expression, which looks like
   2189 two expressions followed by a plus-sign.  The third handles
   2190 subtraction, and so on.
   2191 
   2192      exp:
   2193        NUM
   2194      | exp exp '+'     { $$ = $1 + $2;    }
   2195      | exp exp '-'     { $$ = $1 - $2;    }
   2196      ...
   2197      ;
   2198 
   2199    We have used `|' to join all the rules for `exp', but we could
   2200 equally well have written them separately:
   2201 
   2202      exp: NUM ;
   2203      exp: exp exp '+'     { $$ = $1 + $2; };
   2204      exp: exp exp '-'     { $$ = $1 - $2; };
   2205      ...
   2206 
   2207    Most of the rules have actions that compute the value of the
   2208 expression in terms of the value of its parts.  For example, in the
   2209 rule for addition, `$1' refers to the first component `exp' and `$2'
   2210 refers to the second one.  The third component, `'+'', has no meaningful
   2211 associated semantic value, but if it had one you could refer to it as
   2212 `$3'.  When `yyparse' recognizes a sum expression using this rule, the
   2213 sum of the two subexpressions' values is produced as the value of the
   2214 entire expression.  *Note Actions::.
   2215 
   2216    You don't have to give an action for every rule.  When a rule has no
   2217 action, Bison by default copies the value of `$1' into `$$'.  This is
   2218 what happens in the first rule (the one that uses `NUM').
   2219 
   2220    The formatting shown here is the recommended convention, but Bison
   2221 does not require it.  You can add or change white space as much as you
   2222 wish.  For example, this:
   2223 
   2224      exp: NUM | exp exp '+' {$$ = $1 + $2; } | ... ;
   2225 
   2226 means the same thing as this:
   2227 
   2228      exp:
   2229        NUM
   2230      | exp exp '+'    { $$ = $1 + $2; }
   2231      | ...
   2232      ;
   2233 
   2234 The latter, however, is much more readable.
   2235 
   2236 
   2237 File: bison.info,  Node: Rpcalc Lexer,  Next: Rpcalc Main,  Prev: Rpcalc Rules,  Up: RPN Calc
   2238 
   2239 2.1.3 The `rpcalc' Lexical Analyzer
   2240 -----------------------------------
   2241 
   2242 The lexical analyzer's job is low-level parsing: converting characters
   2243 or sequences of characters into tokens.  The Bison parser gets its
   2244 tokens by calling the lexical analyzer.  *Note The Lexical Analyzer
   2245 Function `yylex': Lexical.
   2246 
   2247    Only a simple lexical analyzer is needed for the RPN calculator.
   2248 This lexical analyzer skips blanks and tabs, then reads in numbers as
   2249 `double' and returns them as `NUM' tokens.  Any other character that
   2250 isn't part of a number is a separate token.  Note that the token-code
   2251 for such a single-character token is the character itself.
   2252 
   2253    The return value of the lexical analyzer function is a numeric code
   2254 which represents a token type.  The same text used in Bison rules to
   2255 stand for this token type is also a C expression for the numeric code
   2256 for the type.  This works in two ways.  If the token type is a
   2257 character literal, then its numeric code is that of the character; you
   2258 can use the same character literal in the lexical analyzer to express
   2259 the number.  If the token type is an identifier, that identifier is
   2260 defined by Bison as a C macro whose definition is the appropriate
   2261 number.  In this example, therefore, `NUM' becomes a macro for `yylex'
   2262 to use.
   2263 
   2264    The semantic value of the token (if it has one) is stored into the
   2265 global variable `yylval', which is where the Bison parser will look for
   2266 it.  (The C data type of `yylval' is `YYSTYPE', which was defined at
   2267 the beginning of the grammar; *note Declarations for `rpcalc': Rpcalc
   2268 Declarations.)
   2269 
   2270    A token type code of zero is returned if the end-of-input is
   2271 encountered.  (Bison recognizes any nonpositive value as indicating
   2272 end-of-input.)
   2273 
   2274    Here is the code for the lexical analyzer:
   2275 
   2276      /* The lexical analyzer returns a double floating point
   2277         number on the stack and the token NUM, or the numeric code
   2278         of the character read if not a number.  It skips all blanks
   2279         and tabs, and returns 0 for end-of-input.  */
   2280 
   2281      #include <ctype.h>
   2282 
   2283      int
   2284      yylex (void)
   2285      {
   2286        int c;
   2287 
   2288        /* Skip white space.  */
   2289        while ((c = getchar ()) == ' ' || c == '\t')
   2290          continue;
   2291        /* Process numbers.  */
   2292        if (c == '.' || isdigit (c))
   2293          {
   2294            ungetc (c, stdin);
   2295            scanf ("%lf", &yylval);
   2296            return NUM;
   2297          }
   2298        /* Return end-of-input.  */
   2299        if (c == EOF)
   2300          return 0;
   2301        /* Return a single char.  */
   2302        return c;
   2303      }
   2304 
   2305 
   2306 File: bison.info,  Node: Rpcalc Main,  Next: Rpcalc Error,  Prev: Rpcalc Lexer,  Up: RPN Calc
   2307 
   2308 2.1.4 The Controlling Function
   2309 ------------------------------
   2310 
   2311 In keeping with the spirit of this example, the controlling function is
   2312 kept to the bare minimum.  The only requirement is that it call
   2313 `yyparse' to start the process of parsing.
   2314 
   2315      int
   2316      main (void)
   2317      {
   2318        return yyparse ();
   2319      }
   2320 
   2321 
   2322 File: bison.info,  Node: Rpcalc Error,  Next: Rpcalc Generate,  Prev: Rpcalc Main,  Up: RPN Calc
   2323 
   2324 2.1.5 The Error Reporting Routine
   2325 ---------------------------------
   2326 
   2327 When `yyparse' detects a syntax error, it calls the error reporting
   2328 function `yyerror' to print an error message (usually but not always
   2329 `"syntax error"').  It is up to the programmer to supply `yyerror'
   2330 (*note Parser C-Language Interface: Interface.), so here is the
   2331 definition we will use:
   2332 
   2333      #include <stdio.h>
   2334 
   2335      /* Called by yyparse on error.  */
   2336      void
   2337      yyerror (char const *s)
   2338      {
   2339        fprintf (stderr, "%s\n", s);
   2340      }
   2341 
   2342    After `yyerror' returns, the Bison parser may recover from the error
   2343 and continue parsing if the grammar contains a suitable error rule
   2344 (*note Error Recovery::).  Otherwise, `yyparse' returns nonzero.  We
   2345 have not written any error rules in this example, so any invalid input
   2346 will cause the calculator program to exit.  This is not clean behavior
   2347 for a real calculator, but it is adequate for the first example.
   2348 
   2349 
   2350 File: bison.info,  Node: Rpcalc Generate,  Next: Rpcalc Compile,  Prev: Rpcalc Error,  Up: RPN Calc
   2351 
   2352 2.1.6 Running Bison to Make the Parser
   2353 --------------------------------------
   2354 
   2355 Before running Bison to produce a parser, we need to decide how to
   2356 arrange all the source code in one or more source files.  For such a
   2357 simple example, the easiest thing is to put everything in one file, the
   2358 grammar file.  The definitions of `yylex', `yyerror' and `main' go at
   2359 the end, in the epilogue of the grammar file (*note The Overall Layout
   2360 of a Bison Grammar: Grammar Layout.).
   2361 
   2362    For a large project, you would probably have several source files,
   2363 and use `make' to arrange to recompile them.
   2364 
   2365    With all the source in the grammar file, you use the following
   2366 command to convert it into a parser implementation file:
   2367 
   2368      bison FILE.y
   2369 
   2370 In this example, the grammar file is called `rpcalc.y' (for "Reverse
   2371 Polish CALCulator").  Bison produces a parser implementation file named
   2372 `FILE.tab.c', removing the `.y' from the grammar file name.  The parser
   2373 implementation file contains the source code for `yyparse'.  The
   2374 additional functions in the grammar file (`yylex', `yyerror' and
   2375 `main') are copied verbatim to the parser implementation file.
   2376 
   2377 
   2378 File: bison.info,  Node: Rpcalc Compile,  Prev: Rpcalc Generate,  Up: RPN Calc
   2379 
   2380 2.1.7 Compiling the Parser Implementation File
   2381 ----------------------------------------------
   2382 
   2383 Here is how to compile and run the parser implementation file:
   2384 
   2385      # List files in current directory.
   2386      $ ls
   2387      rpcalc.tab.c  rpcalc.y
   2388 
   2389      # Compile the Bison parser.
   2390      # `-lm' tells compiler to search math library for `pow'.
   2391      $ cc -lm -o rpcalc rpcalc.tab.c
   2392 
   2393      # List files again.
   2394      $ ls
   2395      rpcalc  rpcalc.tab.c  rpcalc.y
   2396 
   2397    The file `rpcalc' now contains the executable code.  Here is an
   2398 example session using `rpcalc'.
   2399 
   2400      $ rpcalc
   2401      4 9 +
   2402      13
   2403      3 7 + 3 4 5 *+-
   2404      -13
   2405      3 7 + 3 4 5 * + - n              Note the unary minus, `n'
   2406      13
   2407      5 6 / 4 n +
   2408      -3.166666667
   2409      3 4 ^                            Exponentiation
   2410      81
   2411      ^D                               End-of-file indicator
   2412      $
   2413 
   2414 
   2415 File: bison.info,  Node: Infix Calc,  Next: Simple Error Recovery,  Prev: RPN Calc,  Up: Examples
   2416 
   2417 2.2 Infix Notation Calculator: `calc'
   2418 =====================================
   2419 
   2420 We now modify rpcalc to handle infix operators instead of postfix.
   2421 Infix notation involves the concept of operator precedence and the need
   2422 for parentheses nested to arbitrary depth.  Here is the Bison code for
   2423 `calc.y', an infix desk-top calculator.
   2424 
   2425      /* Infix notation calculator.  */
   2426 
   2427      %{
   2428        #define YYSTYPE double
   2429        #include <math.h>
   2430        #include <stdio.h>
   2431        int yylex (void);
   2432        void yyerror (char const *);
   2433      %}
   2434 
   2435      /* Bison declarations.  */
   2436      %token NUM
   2437      %left '-' '+'
   2438      %left '*' '/'
   2439      %left NEG     /* negation--unary minus */
   2440      %right '^'    /* exponentiation */
   2441 
   2442      %% /* The grammar follows.  */
   2443      input:
   2444        /* empty */
   2445      | input line
   2446      ;
   2447 
   2448      line:
   2449        '\n'
   2450      | exp '\n'  { printf ("\t%.10g\n", $1); }
   2451      ;
   2452 
   2453      exp:
   2454        NUM                { $$ = $1;           }
   2455      | exp '+' exp        { $$ = $1 + $3;      }
   2456      | exp '-' exp        { $$ = $1 - $3;      }
   2457      | exp '*' exp        { $$ = $1 * $3;      }
   2458      | exp '/' exp        { $$ = $1 / $3;      }
   2459      | '-' exp  %prec NEG { $$ = -$2;          }
   2460      | exp '^' exp        { $$ = pow ($1, $3); }
   2461      | '(' exp ')'        { $$ = $2;           }
   2462      ;
   2463      %%
   2464 
   2465 The functions `yylex', `yyerror' and `main' can be the same as before.
   2466 
   2467    There are two important new features shown in this code.
   2468 
   2469    In the second section (Bison declarations), `%left' declares token
   2470 types and says they are left-associative operators.  The declarations
   2471 `%left' and `%right' (right associativity) take the place of `%token'
   2472 which is used to declare a token type name without associativity.
   2473 (These tokens are single-character literals, which ordinarily don't
   2474 need to be declared.  We declare them here to specify the
   2475 associativity.)
   2476 
   2477    Operator precedence is determined by the line ordering of the
   2478 declarations; the higher the line number of the declaration (lower on
   2479 the page or screen), the higher the precedence.  Hence, exponentiation
   2480 has the highest precedence, unary minus (`NEG') is next, followed by
   2481 `*' and `/', and so on.  *Note Operator Precedence: Precedence.
   2482 
   2483    The other important new feature is the `%prec' in the grammar
   2484 section for the unary minus operator.  The `%prec' simply instructs
   2485 Bison that the rule `| '-' exp' has the same precedence as `NEG'--in
   2486 this case the next-to-highest.  *Note Context-Dependent Precedence:
   2487 Contextual Precedence.
   2488 
   2489    Here is a sample run of `calc.y':
   2490 
   2491      $ calc
   2492      4 + 4.5 - (34/(8*3+-3))
   2493      6.880952381
   2494      -56 + 2
   2495      -54
   2496      3 ^ 2
   2497      9
   2498 
   2499 
   2500 File: bison.info,  Node: Simple Error Recovery,  Next: Location Tracking Calc,  Prev: Infix Calc,  Up: Examples
   2501 
   2502 2.3 Simple Error Recovery
   2503 =========================
   2504 
   2505 Up to this point, this manual has not addressed the issue of "error
   2506 recovery"--how to continue parsing after the parser detects a syntax
   2507 error.  All we have handled is error reporting with `yyerror'.  Recall
   2508 that by default `yyparse' returns after calling `yyerror'.  This means
   2509 that an erroneous input line causes the calculator program to exit.
   2510 Now we show how to rectify this deficiency.
   2511 
   2512    The Bison language itself includes the reserved word `error', which
   2513 may be included in the grammar rules.  In the example below it has been
   2514 added to one of the alternatives for `line':
   2515 
   2516      line:
   2517        '\n'
   2518      | exp '\n'   { printf ("\t%.10g\n", $1); }
   2519      | error '\n' { yyerrok;                  }
   2520      ;
   2521 
   2522    This addition to the grammar allows for simple error recovery in the
   2523 event of a syntax error.  If an expression that cannot be evaluated is
   2524 read, the error will be recognized by the third rule for `line', and
   2525 parsing will continue.  (The `yyerror' function is still called upon to
   2526 print its message as well.)  The action executes the statement
   2527 `yyerrok', a macro defined automatically by Bison; its meaning is that
   2528 error recovery is complete (*note Error Recovery::).  Note the
   2529 difference between `yyerrok' and `yyerror'; neither one is a misprint.
   2530 
   2531    This form of error recovery deals with syntax errors.  There are
   2532 other kinds of errors; for example, division by zero, which raises an
   2533 exception signal that is normally fatal.  A real calculator program
   2534 must handle this signal and use `longjmp' to return to `main' and
   2535 resume parsing input lines; it would also have to discard the rest of
   2536 the current line of input.  We won't discuss this issue further because
   2537 it is not specific to Bison programs.
   2538 
   2539 
   2540 File: bison.info,  Node: Location Tracking Calc,  Next: Multi-function Calc,  Prev: Simple Error Recovery,  Up: Examples
   2541 
   2542 2.4 Location Tracking Calculator: `ltcalc'
   2543 ==========================================
   2544 
   2545 This example extends the infix notation calculator with location
   2546 tracking.  This feature will be used to improve the error messages.  For
   2547 the sake of clarity, this example is a simple integer calculator, since
   2548 most of the work needed to use locations will be done in the lexical
   2549 analyzer.
   2550 
   2551 * Menu:
   2552 
   2553 * Ltcalc Declarations::    Bison and C declarations for ltcalc.
   2554 * Ltcalc Rules::           Grammar rules for ltcalc, with explanations.
   2555 * Ltcalc Lexer::           The lexical analyzer.
   2556 
   2557 
   2558 File: bison.info,  Node: Ltcalc Declarations,  Next: Ltcalc Rules,  Up: Location Tracking Calc
   2559 
   2560 2.4.1 Declarations for `ltcalc'
   2561 -------------------------------
   2562 
   2563 The C and Bison declarations for the location tracking calculator are
   2564 the same as the declarations for the infix notation calculator.
   2565 
   2566      /* Location tracking calculator.  */
   2567 
   2568      %{
   2569        #define YYSTYPE int
   2570        #include <math.h>
   2571        int yylex (void);
   2572        void yyerror (char const *);
   2573      %}
   2574 
   2575      /* Bison declarations.  */
   2576      %token NUM
   2577 
   2578      %left '-' '+'
   2579      %left '*' '/'
   2580      %left NEG
   2581      %right '^'
   2582 
   2583      %% /* The grammar follows.  */
   2584 
   2585 Note there are no declarations specific to locations.  Defining a data
   2586 type for storing locations is not needed: we will use the type provided
   2587 by default (*note Data Types of Locations: Location Type.), which is a
   2588 four member structure with the following integer fields: `first_line',
   2589 `first_column', `last_line' and `last_column'.  By conventions, and in
   2590 accordance with the GNU Coding Standards and common practice, the line
   2591 and column count both start at 1.
   2592 
   2593 
   2594 File: bison.info,  Node: Ltcalc Rules,  Next: Ltcalc Lexer,  Prev: Ltcalc Declarations,  Up: Location Tracking Calc
   2595 
   2596 2.4.2 Grammar Rules for `ltcalc'
   2597 --------------------------------
   2598 
   2599 Whether handling locations or not has no effect on the syntax of your
   2600 language.  Therefore, grammar rules for this example will be very close
   2601 to those of the previous example: we will only modify them to benefit
   2602 from the new information.
   2603 
   2604    Here, we will use locations to report divisions by zero, and locate
   2605 the wrong expressions or subexpressions.
   2606 
   2607      input:
   2608        /* empty */
   2609      | input line
   2610      ;
   2611 
   2612      line:
   2613        '\n'
   2614      | exp '\n' { printf ("%d\n", $1); }
   2615      ;
   2616 
   2617      exp:
   2618        NUM           { $$ = $1; }
   2619      | exp '+' exp   { $$ = $1 + $3; }
   2620      | exp '-' exp   { $$ = $1 - $3; }
   2621      | exp '*' exp   { $$ = $1 * $3; }
   2622      | exp '/' exp
   2623          {
   2624            if ($3)
   2625              $$ = $1 / $3;
   2626            else
   2627              {
   2628                $$ = 1;
   2629                fprintf (stderr, "%d.%d-%d.%d: division by zero",
   2630                         @3.first_line, @3.first_column,
   2631                         @3.last_line, @3.last_column);
   2632              }
   2633          }
   2634      | '-' exp %prec NEG     { $$ = -$2; }
   2635      | exp '^' exp           { $$ = pow ($1, $3); }
   2636      | '(' exp ')'           { $$ = $2; }
   2637 
   2638    This code shows how to reach locations inside of semantic actions, by
   2639 using the pseudo-variables `@N' for rule components, and the
   2640 pseudo-variable `@$' for groupings.
   2641 
   2642    We don't need to assign a value to `@$': the output parser does it
   2643 automatically.  By default, before executing the C code of each action,
   2644 `@$' is set to range from the beginning of `@1' to the end of `@N', for
   2645 a rule with N components.  This behavior can be redefined (*note
   2646 Default Action for Locations: Location Default Action.), and for very
   2647 specific rules, `@$' can be computed by hand.
   2648 
   2649 
   2650 File: bison.info,  Node: Ltcalc Lexer,  Prev: Ltcalc Rules,  Up: Location Tracking Calc
   2651 
   2652 2.4.3 The `ltcalc' Lexical Analyzer.
   2653 ------------------------------------
   2654 
   2655 Until now, we relied on Bison's defaults to enable location tracking.
   2656 The next step is to rewrite the lexical analyzer, and make it able to
   2657 feed the parser with the token locations, as it already does for
   2658 semantic values.
   2659 
   2660    To this end, we must take into account every single character of the
   2661 input text, to avoid the computed locations of being fuzzy or wrong:
   2662 
   2663      int
   2664      yylex (void)
   2665      {
   2666        int c;
   2667 
   2668        /* Skip white space.  */
   2669        while ((c = getchar ()) == ' ' || c == '\t')
   2670          ++yylloc.last_column;
   2671 
   2672        /* Step.  */
   2673        yylloc.first_line = yylloc.last_line;
   2674        yylloc.first_column = yylloc.last_column;
   2675 
   2676        /* Process numbers.  */
   2677        if (isdigit (c))
   2678          {
   2679            yylval = c - '0';
   2680            ++yylloc.last_column;
   2681            while (isdigit (c = getchar ()))
   2682              {
   2683                ++yylloc.last_column;
   2684                yylval = yylval * 10 + c - '0';
   2685              }
   2686            ungetc (c, stdin);
   2687            return NUM;
   2688          }
   2689 
   2690        /* Return end-of-input.  */
   2691        if (c == EOF)
   2692          return 0;
   2693 
   2694        /* Return a single char, and update location.  */
   2695        if (c == '\n')
   2696          {
   2697            ++yylloc.last_line;
   2698            yylloc.last_column = 0;
   2699          }
   2700        else
   2701          ++yylloc.last_column;
   2702        return c;
   2703      }
   2704 
   2705    Basically, the lexical analyzer performs the same processing as
   2706 before: it skips blanks and tabs, and reads numbers or single-character
   2707 tokens.  In addition, it updates `yylloc', the global variable (of type
   2708 `YYLTYPE') containing the token's location.
   2709 
   2710    Now, each time this function returns a token, the parser has its
   2711 number as well as its semantic value, and its location in the text.
   2712 The last needed change is to initialize `yylloc', for example in the
   2713 controlling function:
   2714 
   2715      int
   2716      main (void)
   2717      {
   2718        yylloc.first_line = yylloc.last_line = 1;
   2719        yylloc.first_column = yylloc.last_column = 0;
   2720        return yyparse ();
   2721      }
   2722 
   2723    Remember that computing locations is not a matter of syntax.  Every
   2724 character must be associated to a location update, whether it is in
   2725 valid input, in comments, in literal strings, and so on.
   2726 
   2727 
   2728 File: bison.info,  Node: Multi-function Calc,  Next: Exercises,  Prev: Location Tracking Calc,  Up: Examples
   2729 
   2730 2.5 Multi-Function Calculator: `mfcalc'
   2731 =======================================
   2732 
   2733 Now that the basics of Bison have been discussed, it is time to move on
   2734 to a more advanced problem.  The above calculators provided only five
   2735 functions, `+', `-', `*', `/' and `^'.  It would be nice to have a
   2736 calculator that provides other mathematical functions such as `sin',
   2737 `cos', etc.
   2738 
   2739    It is easy to add new operators to the infix calculator as long as
   2740 they are only single-character literals.  The lexical analyzer `yylex'
   2741 passes back all nonnumeric characters as tokens, so new grammar rules
   2742 suffice for adding a new operator.  But we want something more
   2743 flexible: built-in functions whose syntax has this form:
   2744 
   2745      FUNCTION_NAME (ARGUMENT)
   2746 
   2747 At the same time, we will add memory to the calculator, by allowing you
   2748 to create named variables, store values in them, and use them later.
   2749 Here is a sample session with the multi-function calculator:
   2750 
   2751      $ mfcalc
   2752      pi = 3.141592653589
   2753      3.1415926536
   2754      sin(pi)
   2755      0.0000000000
   2756      alpha = beta1 = 2.3
   2757      2.3000000000
   2758      alpha
   2759      2.3000000000
   2760      ln(alpha)
   2761      0.8329091229
   2762      exp(ln(beta1))
   2763      2.3000000000
   2764      $
   2765 
   2766    Note that multiple assignment and nested function calls are
   2767 permitted.
   2768 
   2769 * Menu:
   2770 
   2771 * Mfcalc Declarations::    Bison declarations for multi-function calculator.
   2772 * Mfcalc Rules::           Grammar rules for the calculator.
   2773 * Mfcalc Symbol Table::    Symbol table management subroutines.
   2774 
   2775 
   2776 File: bison.info,  Node: Mfcalc Declarations,  Next: Mfcalc Rules,  Up: Multi-function Calc
   2777 
   2778 2.5.1 Declarations for `mfcalc'
   2779 -------------------------------
   2780 
   2781 Here are the C and Bison declarations for the multi-function calculator.
   2782 
   2783      %{
   2784        #include <math.h>  /* For math functions, cos(), sin(), etc.  */
   2785        #include "calc.h"  /* Contains definition of `symrec'.  */
   2786        int yylex (void);
   2787        void yyerror (char const *);
   2788      %}
   2789 
   2790      %union {
   2791        double    val;   /* For returning numbers.  */
   2792        symrec  *tptr;   /* For returning symbol-table pointers.  */
   2793      }
   2794      %token <val>  NUM        /* Simple double precision number.  */
   2795      %token <tptr> VAR FNCT   /* Variable and function.  */
   2796      %type  <val>  exp
   2797 
   2798      %right '='
   2799      %left '-' '+'
   2800      %left '*' '/'
   2801      %left NEG     /* negation--unary minus */
   2802      %right '^'    /* exponentiation */
   2803 
   2804    The above grammar introduces only two new features of the Bison
   2805 language.  These features allow semantic values to have various data
   2806 types (*note More Than One Value Type: Multiple Types.).
   2807 
   2808    The `%union' declaration specifies the entire list of possible types;
   2809 this is instead of defining `YYSTYPE'.  The allowable types are now
   2810 double-floats (for `exp' and `NUM') and pointers to entries in the
   2811 symbol table.  *Note The Collection of Value Types: Union Decl.
   2812 
   2813    Since values can now have various types, it is necessary to
   2814 associate a type with each grammar symbol whose semantic value is used.
   2815 These symbols are `NUM', `VAR', `FNCT', and `exp'.  Their declarations
   2816 are augmented with information about their data type (placed between
   2817 angle brackets).
   2818 
   2819    The Bison construct `%type' is used for declaring nonterminal
   2820 symbols, just as `%token' is used for declaring token types.  We have
   2821 not used `%type' before because nonterminal symbols are normally
   2822 declared implicitly by the rules that define them.  But `exp' must be
   2823 declared explicitly so we can specify its value type.  *Note
   2824 Nonterminal Symbols: Type Decl.
   2825 
   2826 
   2827 File: bison.info,  Node: Mfcalc Rules,  Next: Mfcalc Symbol Table,  Prev: Mfcalc Declarations,  Up: Multi-function Calc
   2828 
   2829 2.5.2 Grammar Rules for `mfcalc'
   2830 --------------------------------
   2831 
   2832 Here are the grammar rules for the multi-function calculator.  Most of
   2833 them are copied directly from `calc'; three rules, those which mention
   2834 `VAR' or `FNCT', are new.
   2835 
   2836      %% /* The grammar follows.  */
   2837      input:
   2838        /* empty */
   2839      | input line
   2840      ;
   2841 
   2842      line:
   2843        '\n'
   2844      | exp '\n'   { printf ("%.10g\n", $1); }
   2845      | error '\n' { yyerrok;                }
   2846      ;
   2847 
   2848      exp:
   2849        NUM                { $$ = $1;                         }
   2850      | VAR                { $$ = $1->value.var;              }
   2851      | VAR '=' exp        { $$ = $3; $1->value.var = $3;     }
   2852      | FNCT '(' exp ')'   { $$ = (*($1->value.fnctptr))($3); }
   2853      | exp '+' exp        { $$ = $1 + $3;                    }
   2854      | exp '-' exp        { $$ = $1 - $3;                    }
   2855      | exp '*' exp        { $$ = $1 * $3;                    }
   2856      | exp '/' exp        { $$ = $1 / $3;                    }
   2857      | '-' exp  %prec NEG { $$ = -$2;                        }
   2858      | exp '^' exp        { $$ = pow ($1, $3);               }
   2859      | '(' exp ')'        { $$ = $2;                         }
   2860      ;
   2861      /* End of grammar.  */
   2862      %%
   2863 
   2864 
   2865 File: bison.info,  Node: Mfcalc Symbol Table,  Prev: Mfcalc Rules,  Up: Multi-function Calc
   2866 
   2867 2.5.3 The `mfcalc' Symbol Table
   2868 -------------------------------
   2869 
   2870 The multi-function calculator requires a symbol table to keep track of
   2871 the names and meanings of variables and functions.  This doesn't affect
   2872 the grammar rules (except for the actions) or the Bison declarations,
   2873 but it requires some additional C functions for support.
   2874 
   2875    The symbol table itself consists of a linked list of records.  Its
   2876 definition, which is kept in the header `calc.h', is as follows.  It
   2877 provides for either functions or variables to be placed in the table.
   2878 
   2879      /* Function type.  */
   2880      typedef double (*func_t) (double);
   2881 
   2882      /* Data type for links in the chain of symbols.  */
   2883      struct symrec
   2884      {
   2885        char *name;  /* name of symbol */
   2886        int type;    /* type of symbol: either VAR or FNCT */
   2887        union
   2888        {
   2889          double var;      /* value of a VAR */
   2890          func_t fnctptr;  /* value of a FNCT */
   2891        } value;
   2892        struct symrec *next;  /* link field */
   2893      };
   2894 
   2895      typedef struct symrec symrec;
   2896 
   2897      /* The symbol table: a chain of `struct symrec'.  */
   2898      extern symrec *sym_table;
   2899 
   2900      symrec *putsym (char const *, int);
   2901      symrec *getsym (char const *);
   2902 
   2903    The new version of `main' includes a call to `init_table', a
   2904 function that initializes the symbol table.  Here it is, and
   2905 `init_table' as well:
   2906 
   2907      #include <stdio.h>
   2908 
   2909      /* Called by yyparse on error.  */
   2910      void
   2911      yyerror (char const *s)
   2912      {
   2913        fprintf (stderr, "%s\n", s);
   2914      }
   2915 
   2916      struct init
   2917      {
   2918        char const *fname;
   2919        double (*fnct) (double);
   2920      };
   2921 
   2922      struct init const arith_fncts[] =
   2923      {
   2924        "sin",  sin,
   2925        "cos",  cos,
   2926        "atan", atan,
   2927        "ln",   log,
   2928        "exp",  exp,
   2929        "sqrt", sqrt,
   2930        0, 0
   2931      };
   2932 
   2933      /* The symbol table: a chain of `struct symrec'.  */
   2934      symrec *sym_table;
   2935 
   2936      /* Put arithmetic functions in table.  */
   2937      void
   2938      init_table (void)
   2939      {
   2940        int i;
   2941        for (i = 0; arith_fncts[i].fname != 0; i++)
   2942          {
   2943            symrec *ptr = putsym (arith_fncts[i].fname, FNCT);
   2944            ptr->value.fnctptr = arith_fncts[i].fnct;
   2945          }
   2946      }
   2947 
   2948      int
   2949      main (void)
   2950      {
   2951        init_table ();
   2952        return yyparse ();
   2953      }
   2954 
   2955    By simply editing the initialization list and adding the necessary
   2956 include files, you can add additional functions to the calculator.
   2957 
   2958    Two important functions allow look-up and installation of symbols in
   2959 the symbol table.  The function `putsym' is passed a name and the type
   2960 (`VAR' or `FNCT') of the object to be installed.  The object is linked
   2961 to the front of the list, and a pointer to the object is returned.  The
   2962 function `getsym' is passed the name of the symbol to look up.  If
   2963 found, a pointer to that symbol is returned; otherwise zero is returned.
   2964 
   2965      #include <stdlib.h> /* malloc. */
   2966      #include <string.h> /* strlen. */
   2967 
   2968      symrec *
   2969      putsym (char const *sym_name, int sym_type)
   2970      {
   2971        symrec *ptr = (symrec *) malloc (sizeof (symrec));
   2972        ptr->name = (char *) malloc (strlen (sym_name) + 1);
   2973        strcpy (ptr->name,sym_name);
   2974        ptr->type = sym_type;
   2975        ptr->value.var = 0; /* Set value to 0 even if fctn.  */
   2976        ptr->next = (struct symrec *)sym_table;
   2977        sym_table = ptr;
   2978        return ptr;
   2979      }
   2980 
   2981      symrec *
   2982      getsym (char const *sym_name)
   2983      {
   2984        symrec *ptr;
   2985        for (ptr = sym_table; ptr != (symrec *) 0;
   2986             ptr = (symrec *)ptr->next)
   2987          if (strcmp (ptr->name,sym_name) == 0)
   2988            return ptr;
   2989        return 0;
   2990      }
   2991 
   2992    The function `yylex' must now recognize variables, numeric values,
   2993 and the single-character arithmetic operators.  Strings of alphanumeric
   2994 characters with a leading letter are recognized as either variables or
   2995 functions depending on what the symbol table says about them.
   2996 
   2997    The string is passed to `getsym' for look up in the symbol table.  If
   2998 the name appears in the table, a pointer to its location and its type
   2999 (`VAR' or `FNCT') is returned to `yyparse'.  If it is not already in
   3000 the table, then it is installed as a `VAR' using `putsym'.  Again, a
   3001 pointer and its type (which must be `VAR') is returned to `yyparse'.
   3002 
   3003    No change is needed in the handling of numeric values and arithmetic
   3004 operators in `yylex'.
   3005 
   3006      #include <ctype.h>
   3007 
   3008      int
   3009      yylex (void)
   3010      {
   3011        int c;
   3012 
   3013        /* Ignore white space, get first nonwhite character.  */
   3014        while ((c = getchar ()) == ' ' || c == '\t')
   3015          continue;
   3016 
   3017        if (c == EOF)
   3018          return 0;
   3019 
   3020        /* Char starts a number => parse the number.         */
   3021        if (c == '.' || isdigit (c))
   3022          {
   3023            ungetc (c, stdin);
   3024            scanf ("%lf", &yylval.val);
   3025            return NUM;
   3026          }
   3027 
   3028        /* Char starts an identifier => read the name.       */
   3029        if (isalpha (c))
   3030          {
   3031            /* Initially make the buffer long enough
   3032               for a 40-character symbol name.  */
   3033            static size_t length = 40;
   3034            static char *symbuf = 0;
   3035            symrec *s;
   3036            int i;
   3037 
   3038            if (!symbuf)
   3039              symbuf = (char *) malloc (length + 1);
   3040 
   3041            i = 0;
   3042            do
   3043              {
   3044                /* If buffer is full, make it bigger.        */
   3045                if (i == length)
   3046                  {
   3047                    length *= 2;
   3048                    symbuf = (char *) realloc (symbuf, length + 1);
   3049                  }
   3050                /* Add this character to the buffer.         */
   3051                symbuf[i++] = c;
   3052                /* Get another character.                    */
   3053                c = getchar ();
   3054              }
   3055            while (isalnum (c));
   3056 
   3057            ungetc (c, stdin);
   3058            symbuf[i] = '\0';
   3059 
   3060            s = getsym (symbuf);
   3061            if (s == 0)
   3062              s = putsym (symbuf, VAR);
   3063            yylval.tptr = s;
   3064            return s->type;
   3065          }
   3066 
   3067        /* Any other character is a token by itself.        */
   3068        return c;
   3069      }
   3070 
   3071    The error reporting function is unchanged, and the new version of
   3072 `main' includes a call to `init_table' and sets the `yydebug' on user
   3073 demand (*Note Tracing Your Parser: Tracing, for details):
   3074 
   3075      /* Called by yyparse on error.  */
   3076      void
   3077      yyerror (char const *s)
   3078      {
   3079        fprintf (stderr, "%s\n", s);
   3080      }
   3081 
   3082      int
   3083      main (int argc, char const* argv[])
   3084      {
   3085        int i;
   3086        /* Enable parse traces on option -p.  */
   3087        for (i = 1; i < argc; ++i)
   3088          if (!strcmp(argv[i], "-p"))
   3089            yydebug = 1;
   3090        init_table ();
   3091        return yyparse ();
   3092      }
   3093 
   3094    This program is both powerful and flexible.  You may easily add new
   3095 functions, and it is a simple job to modify this code to install
   3096 predefined variables such as `pi' or `e' as well.
   3097 
   3098 
   3099 File: bison.info,  Node: Exercises,  Prev: Multi-function Calc,  Up: Examples
   3100 
   3101 2.6 Exercises
   3102 =============
   3103 
   3104   1. Add some new functions from `math.h' to the initialization list.
   3105 
   3106   2. Add another array that contains constants and their values.  Then
   3107      modify `init_table' to add these constants to the symbol table.
   3108      It will be easiest to give the constants type `VAR'.
   3109 
   3110   3. Make the program report an error if the user refers to an
   3111      uninitialized variable in any way except to store a value in it.
   3112 
   3113 
   3114 File: bison.info,  Node: Grammar File,  Next: Interface,  Prev: Examples,  Up: Top
   3115 
   3116 3 Bison Grammar Files
   3117 *********************
   3118 
   3119 Bison takes as input a context-free grammar specification and produces a
   3120 C-language function that recognizes correct instances of the grammar.
   3121 
   3122    The Bison grammar file conventionally has a name ending in `.y'.
   3123 *Note Invoking Bison: Invocation.
   3124 
   3125 * Menu:
   3126 
   3127 * Grammar Outline::    Overall layout of the grammar file.
   3128 * Symbols::            Terminal and nonterminal symbols.
   3129 * Rules::              How to write grammar rules.
   3130 * Recursion::          Writing recursive rules.
   3131 * Semantics::          Semantic values and actions.
   3132 * Tracking Locations:: Locations and actions.
   3133 * Named References::   Using named references in actions.
   3134 * Declarations::       All kinds of Bison declarations are described here.
   3135 * Multiple Parsers::   Putting more than one Bison parser in one program.
   3136 
   3137 
   3138 File: bison.info,  Node: Grammar Outline,  Next: Symbols,  Up: Grammar File
   3139 
   3140 3.1 Outline of a Bison Grammar
   3141 ==============================
   3142 
   3143 A Bison grammar file has four main sections, shown here with the
   3144 appropriate delimiters:
   3145 
   3146      %{
   3147        PROLOGUE
   3148      %}
   3149 
   3150      BISON DECLARATIONS
   3151 
   3152      %%
   3153      GRAMMAR RULES
   3154      %%
   3155 
   3156      EPILOGUE
   3157 
   3158    Comments enclosed in `/* ... */' may appear in any of the sections.
   3159 As a GNU extension, `//' introduces a comment that continues until end
   3160 of line.
   3161 
   3162 * Menu:
   3163 
   3164 * Prologue::              Syntax and usage of the prologue.
   3165 * Prologue Alternatives:: Syntax and usage of alternatives to the prologue.
   3166 * Bison Declarations::    Syntax and usage of the Bison declarations section.
   3167 * Grammar Rules::         Syntax and usage of the grammar rules section.
   3168 * Epilogue::              Syntax and usage of the epilogue.
   3169 
   3170 
   3171 File: bison.info,  Node: Prologue,  Next: Prologue Alternatives,  Up: Grammar Outline
   3172 
   3173 3.1.1 The prologue
   3174 ------------------
   3175 
   3176 The PROLOGUE section contains macro definitions and declarations of
   3177 functions and variables that are used in the actions in the grammar
   3178 rules.  These are copied to the beginning of the parser implementation
   3179 file so that they precede the definition of `yyparse'.  You can use
   3180 `#include' to get the declarations from a header file.  If you don't
   3181 need any C declarations, you may omit the `%{' and `%}' delimiters that
   3182 bracket this section.
   3183 
   3184    The PROLOGUE section is terminated by the first occurrence of `%}'
   3185 that is outside a comment, a string literal, or a character constant.
   3186 
   3187    You may have more than one PROLOGUE section, intermixed with the
   3188 BISON DECLARATIONS.  This allows you to have C and Bison declarations
   3189 that refer to each other.  For example, the `%union' declaration may
   3190 use types defined in a header file, and you may wish to prototype
   3191 functions that take arguments of type `YYSTYPE'.  This can be done with
   3192 two PROLOGUE blocks, one before and one after the `%union' declaration.
   3193 
   3194      %{
   3195        #define _GNU_SOURCE
   3196        #include <stdio.h>
   3197        #include "ptypes.h"
   3198      %}
   3199 
   3200      %union {
   3201        long int n;
   3202        tree t;  /* `tree' is defined in `ptypes.h'. */
   3203      }
   3204 
   3205      %{
   3206        static void print_token_value (FILE *, int, YYSTYPE);
   3207        #define YYPRINT(F, N, L) print_token_value (F, N, L)
   3208      %}
   3209 
   3210      ...
   3211 
   3212    When in doubt, it is usually safer to put prologue code before all
   3213 Bison declarations, rather than after.  For example, any definitions of
   3214 feature test macros like `_GNU_SOURCE' or `_POSIX_C_SOURCE' should
   3215 appear before all Bison declarations, as feature test macros can affect
   3216 the behavior of Bison-generated `#include' directives.
   3217 
   3218 
   3219 File: bison.info,  Node: Prologue Alternatives,  Next: Bison Declarations,  Prev: Prologue,  Up: Grammar Outline
   3220 
   3221 3.1.2 Prologue Alternatives
   3222 ---------------------------
   3223 
   3224 The functionality of PROLOGUE sections can often be subtle and
   3225 inflexible.  As an alternative, Bison provides a `%code' directive with
   3226 an explicit qualifier field, which identifies the purpose of the code
   3227 and thus the location(s) where Bison should generate it.  For C/C++,
   3228 the qualifier can be omitted for the default location, or it can be one
   3229 of `requires', `provides', `top'.  *Note %code Summary::.
   3230 
   3231    Look again at the example of the previous section:
   3232 
   3233      %{
   3234        #define _GNU_SOURCE
   3235        #include <stdio.h>
   3236        #include "ptypes.h"
   3237      %}
   3238 
   3239      %union {
   3240        long int n;
   3241        tree t;  /* `tree' is defined in `ptypes.h'. */
   3242      }
   3243 
   3244      %{
   3245        static void print_token_value (FILE *, int, YYSTYPE);
   3246        #define YYPRINT(F, N, L) print_token_value (F, N, L)
   3247      %}
   3248 
   3249      ...
   3250 
   3251 Notice that there are two PROLOGUE sections here, but there's a subtle
   3252 distinction between their functionality.  For example, if you decide to
   3253 override Bison's default definition for `YYLTYPE', in which PROLOGUE
   3254 section should you write your new definition?  You should write it in
   3255 the first since Bison will insert that code into the parser
   3256 implementation file _before_ the default `YYLTYPE' definition.  In
   3257 which PROLOGUE section should you prototype an internal function,
   3258 `trace_token', that accepts `YYLTYPE' and `yytokentype' as arguments?
   3259 You should prototype it in the second since Bison will insert that code
   3260 _after_ the `YYLTYPE' and `yytokentype' definitions.
   3261 
   3262    This distinction in functionality between the two PROLOGUE sections
   3263 is established by the appearance of the `%union' between them.  This
   3264 behavior raises a few questions.  First, why should the position of a
   3265 `%union' affect definitions related to `YYLTYPE' and `yytokentype'?
   3266 Second, what if there is no `%union'?  In that case, the second kind of
   3267 PROLOGUE section is not available.  This behavior is not intuitive.
   3268 
   3269    To avoid this subtle `%union' dependency, rewrite the example using a
   3270 `%code top' and an unqualified `%code'.  Let's go ahead and add the new
   3271 `YYLTYPE' definition and the `trace_token' prototype at the same time:
   3272 
   3273      %code top {
   3274        #define _GNU_SOURCE
   3275        #include <stdio.h>
   3276 
   3277        /* WARNING: The following code really belongs
   3278         * in a `%code requires'; see below.  */
   3279 
   3280        #include "ptypes.h"
   3281        #define YYLTYPE YYLTYPE
   3282        typedef struct YYLTYPE
   3283        {
   3284          int first_line;
   3285          int first_column;
   3286          int last_line;
   3287          int last_column;
   3288          char *filename;
   3289        } YYLTYPE;
   3290      }
   3291 
   3292      %union {
   3293        long int n;
   3294        tree t;  /* `tree' is defined in `ptypes.h'. */
   3295      }
   3296 
   3297      %code {
   3298        static void print_token_value (FILE *, int, YYSTYPE);
   3299        #define YYPRINT(F, N, L) print_token_value (F, N, L)
   3300        static void trace_token (enum yytokentype token, YYLTYPE loc);
   3301      }
   3302 
   3303      ...
   3304 
   3305 In this way, `%code top' and the unqualified `%code' achieve the same
   3306 functionality as the two kinds of PROLOGUE sections, but it's always
   3307 explicit which kind you intend.  Moreover, both kinds are always
   3308 available even in the absence of `%union'.
   3309 
   3310    The `%code top' block above logically contains two parts.  The first
   3311 two lines before the warning need to appear near the top of the parser
   3312 implementation file.  The first line after the warning is required by
   3313 `YYSTYPE' and thus also needs to appear in the parser implementation
   3314 file.  However, if you've instructed Bison to generate a parser header
   3315 file (*note %defines: Decl Summary.), you probably want that line to
   3316 appear before the `YYSTYPE' definition in that header file as well.
   3317 The `YYLTYPE' definition should also appear in the parser header file
   3318 to override the default `YYLTYPE' definition there.
   3319 
   3320    In other words, in the `%code top' block above, all but the first two
   3321 lines are dependency code required by the `YYSTYPE' and `YYLTYPE'
   3322 definitions.  Thus, they belong in one or more `%code requires':
   3323 
   3324      %code top {
   3325        #define _GNU_SOURCE
   3326        #include <stdio.h>
   3327      }
   3328 
   3329      %code requires {
   3330        #include "ptypes.h"
   3331      }
   3332      %union {
   3333        long int n;
   3334        tree t;  /* `tree' is defined in `ptypes.h'. */
   3335      }
   3336 
   3337      %code requires {
   3338        #define YYLTYPE YYLTYPE
   3339        typedef struct YYLTYPE
   3340        {
   3341          int first_line;
   3342          int first_column;
   3343          int last_line;
   3344          int last_column;
   3345          char *filename;
   3346        } YYLTYPE;
   3347      }
   3348 
   3349      %code {
   3350        static void print_token_value (FILE *, int, YYSTYPE);
   3351        #define YYPRINT(F, N, L) print_token_value (F, N, L)
   3352        static void trace_token (enum yytokentype token, YYLTYPE loc);
   3353      }
   3354 
   3355      ...
   3356 
   3357 Now Bison will insert `#include "ptypes.h"' and the new `YYLTYPE'
   3358 definition before the Bison-generated `YYSTYPE' and `YYLTYPE'
   3359 definitions in both the parser implementation file and the parser
   3360 header file.  (By the same reasoning, `%code requires' would also be
   3361 the appropriate place to write your own definition for `YYSTYPE'.)
   3362 
   3363    When you are writing dependency code for `YYSTYPE' and `YYLTYPE',
   3364 you should prefer `%code requires' over `%code top' regardless of
   3365 whether you instruct Bison to generate a parser header file.  When you
   3366 are writing code that you need Bison to insert only into the parser
   3367 implementation file and that has no special need to appear at the top
   3368 of that file, you should prefer the unqualified `%code' over `%code
   3369 top'.  These practices will make the purpose of each block of your code
   3370 explicit to Bison and to other developers reading your grammar file.
   3371 Following these practices, we expect the unqualified `%code' and `%code
   3372 requires' to be the most important of the four PROLOGUE alternatives.
   3373 
   3374    At some point while developing your parser, you might decide to
   3375 provide `trace_token' to modules that are external to your parser.
   3376 Thus, you might wish for Bison to insert the prototype into both the
   3377 parser header file and the parser implementation file.  Since this
   3378 function is not a dependency required by `YYSTYPE' or `YYLTYPE', it
   3379 doesn't make sense to move its prototype to a `%code requires'.  More
   3380 importantly, since it depends upon `YYLTYPE' and `yytokentype', `%code
   3381 requires' is not sufficient.  Instead, move its prototype from the
   3382 unqualified `%code' to a `%code provides':
   3383 
   3384      %code top {
   3385        #define _GNU_SOURCE
   3386        #include <stdio.h>
   3387      }
   3388 
   3389      %code requires {
   3390        #include "ptypes.h"
   3391      }
   3392      %union {
   3393        long int n;
   3394        tree t;  /* `tree' is defined in `ptypes.h'. */
   3395      }
   3396 
   3397      %code requires {
   3398        #define YYLTYPE YYLTYPE
   3399        typedef struct YYLTYPE
   3400        {
   3401          int first_line;
   3402          int first_column;
   3403          int last_line;
   3404          int last_column;
   3405          char *filename;
   3406        } YYLTYPE;
   3407      }
   3408 
   3409      %code provides {
   3410        void trace_token (enum yytokentype token, YYLTYPE loc);
   3411      }
   3412 
   3413      %code {
   3414        static void print_token_value (FILE *, int, YYSTYPE);
   3415        #define YYPRINT(F, N, L) print_token_value (F, N, L)
   3416      }
   3417 
   3418      ...
   3419 
   3420 Bison will insert the `trace_token' prototype into both the parser
   3421 header file and the parser implementation file after the definitions
   3422 for `yytokentype', `YYLTYPE', and `YYSTYPE'.
   3423 
   3424    The above examples are careful to write directives in an order that
   3425 reflects the layout of the generated parser implementation and header
   3426 files: `%code top', `%code requires', `%code provides', and then
   3427 `%code'.  While your grammar files may generally be easier to read if
   3428 you also follow this order, Bison does not require it.  Instead, Bison
   3429 lets you choose an organization that makes sense to you.
   3430 
   3431    You may declare any of these directives multiple times in the
   3432 grammar file.  In that case, Bison concatenates the contained code in
   3433 declaration order.  This is the only way in which the position of one
   3434 of these directives within the grammar file affects its functionality.
   3435 
   3436    The result of the previous two properties is greater flexibility in
   3437 how you may organize your grammar file.  For example, you may organize
   3438 semantic-type-related directives by semantic type:
   3439 
   3440      %code requires { #include "type1.h" }
   3441      %union { type1 field1; }
   3442      %destructor { type1_free ($$); } <field1>
   3443      %printer { type1_print (yyoutput, $$); } <field1>
   3444 
   3445      %code requires { #include "type2.h" }
   3446      %union { type2 field2; }
   3447      %destructor { type2_free ($$); } <field2>
   3448      %printer { type2_print (yyoutput, $$); } <field2>
   3449 
   3450 You could even place each of the above directive groups in the rules
   3451 section of the grammar file next to the set of rules that uses the
   3452 associated semantic type.  (In the rules section, you must terminate
   3453 each of those directives with a semicolon.)  And you don't have to
   3454 worry that some directive (like a `%union') in the definitions section
   3455 is going to adversely affect their functionality in some
   3456 counter-intuitive manner just because it comes first.  Such an
   3457 organization is not possible using PROLOGUE sections.
   3458 
   3459    This section has been concerned with explaining the advantages of
   3460 the four PROLOGUE alternatives over the original Yacc PROLOGUE.
   3461 However, in most cases when using these directives, you shouldn't need
   3462 to think about all the low-level ordering issues discussed here.
   3463 Instead, you should simply use these directives to label each block of
   3464 your code according to its purpose and let Bison handle the ordering.
   3465 `%code' is the most generic label.  Move code to `%code requires',
   3466 `%code provides', or `%code top' as needed.
   3467 
   3468 
   3469 File: bison.info,  Node: Bison Declarations,  Next: Grammar Rules,  Prev: Prologue Alternatives,  Up: Grammar Outline
   3470 
   3471 3.1.3 The Bison Declarations Section
   3472 ------------------------------------
   3473 
   3474 The BISON DECLARATIONS section contains declarations that define
   3475 terminal and nonterminal symbols, specify precedence, and so on.  In
   3476 some simple grammars you may not need any declarations.  *Note Bison
   3477 Declarations: Declarations.
   3478 
   3479 
   3480 File: bison.info,  Node: Grammar Rules,  Next: Epilogue,  Prev: Bison Declarations,  Up: Grammar Outline
   3481 
   3482 3.1.4 The Grammar Rules Section
   3483 -------------------------------
   3484 
   3485 The "grammar rules" section contains one or more Bison grammar rules,
   3486 and nothing else.  *Note Syntax of Grammar Rules: Rules.
   3487 
   3488    There must always be at least one grammar rule, and the first `%%'
   3489 (which precedes the grammar rules) may never be omitted even if it is
   3490 the first thing in the file.
   3491 
   3492 
   3493 File: bison.info,  Node: Epilogue,  Prev: Grammar Rules,  Up: Grammar Outline
   3494 
   3495 3.1.5 The epilogue
   3496 ------------------
   3497 
   3498 The EPILOGUE is copied verbatim to the end of the parser implementation
   3499 file, just as the PROLOGUE is copied to the beginning.  This is the
   3500 most convenient place to put anything that you want to have in the
   3501 parser implementation file but which need not come before the
   3502 definition of `yyparse'.  For example, the definitions of `yylex' and
   3503 `yyerror' often go here.  Because C requires functions to be declared
   3504 before being used, you often need to declare functions like `yylex' and
   3505 `yyerror' in the Prologue, even if you define them in the Epilogue.
   3506 *Note Parser C-Language Interface: Interface.
   3507 
   3508    If the last section is empty, you may omit the `%%' that separates it
   3509 from the grammar rules.
   3510 
   3511    The Bison parser itself contains many macros and identifiers whose
   3512 names start with `yy' or `YY', so it is a good idea to avoid using any
   3513 such names (except those documented in this manual) in the epilogue of
   3514 the grammar file.
   3515 
   3516 
   3517 File: bison.info,  Node: Symbols,  Next: Rules,  Prev: Grammar Outline,  Up: Grammar File
   3518 
   3519 3.2 Symbols, Terminal and Nonterminal
   3520 =====================================
   3521 
   3522 "Symbols" in Bison grammars represent the grammatical classifications
   3523 of the language.
   3524 
   3525    A "terminal symbol" (also known as a "token type") represents a
   3526 class of syntactically equivalent tokens.  You use the symbol in grammar
   3527 rules to mean that a token in that class is allowed.  The symbol is
   3528 represented in the Bison parser by a numeric code, and the `yylex'
   3529 function returns a token type code to indicate what kind of token has
   3530 been read.  You don't need to know what the code value is; you can use
   3531 the symbol to stand for it.
   3532 
   3533    A "nonterminal symbol" stands for a class of syntactically
   3534 equivalent groupings.  The symbol name is used in writing grammar rules.
   3535 By convention, it should be all lower case.
   3536 
   3537    Symbol names can contain letters, underscores, periods, and
   3538 non-initial digits and dashes.  Dashes in symbol names are a GNU
   3539 extension, incompatible with POSIX Yacc.  Periods and dashes make
   3540 symbol names less convenient to use with named references, which
   3541 require brackets around such names (*note Named References::).
   3542 Terminal symbols that contain periods or dashes make little sense:
   3543 since they are not valid symbols (in most programming languages) they
   3544 are not exported as token names.
   3545 
   3546    There are three ways of writing terminal symbols in the grammar:
   3547 
   3548    * A "named token type" is written with an identifier, like an
   3549      identifier in C.  By convention, it should be all upper case.  Each
   3550      such name must be defined with a Bison declaration such as
   3551      `%token'.  *Note Token Type Names: Token Decl.
   3552 
   3553    * A "character token type" (or "literal character token") is written
   3554      in the grammar using the same syntax used in C for character
   3555      constants; for example, `'+'' is a character token type.  A
   3556      character token type doesn't need to be declared unless you need to
   3557      specify its semantic value data type (*note Data Types of Semantic
   3558      Values: Value Type.), associativity, or precedence (*note Operator
   3559      Precedence: Precedence.).
   3560 
   3561      By convention, a character token type is used only to represent a
   3562      token that consists of that particular character.  Thus, the token
   3563      type `'+'' is used to represent the character `+' as a token.
   3564      Nothing enforces this convention, but if you depart from it, your
   3565      program will confuse other readers.
   3566 
   3567      All the usual escape sequences used in character literals in C can
   3568      be used in Bison as well, but you must not use the null character
   3569      as a character literal because its numeric code, zero, signifies
   3570      end-of-input (*note Calling Convention for `yylex': Calling
   3571      Convention.).  Also, unlike standard C, trigraphs have no special
   3572      meaning in Bison character literals, nor is backslash-newline
   3573      allowed.
   3574 
   3575    * A "literal string token" is written like a C string constant; for
   3576      example, `"<="' is a literal string token.  A literal string token
   3577      doesn't need to be declared unless you need to specify its semantic
   3578      value data type (*note Value Type::), associativity, or precedence
   3579      (*note Precedence::).
   3580 
   3581      You can associate the literal string token with a symbolic name as
   3582      an alias, using the `%token' declaration (*note Token
   3583      Declarations: Token Decl.).  If you don't do that, the lexical
   3584      analyzer has to retrieve the token number for the literal string
   3585      token from the `yytname' table (*note Calling Convention::).
   3586 
   3587      *Warning*: literal string tokens do not work in Yacc.
   3588 
   3589      By convention, a literal string token is used only to represent a
   3590      token that consists of that particular string.  Thus, you should
   3591      use the token type `"<="' to represent the string `<=' as a token.
   3592      Bison does not enforce this convention, but if you depart from it,
   3593      people who read your program will be confused.
   3594 
   3595      All the escape sequences used in string literals in C can be used
   3596      in Bison as well, except that you must not use a null character
   3597      within a string literal.  Also, unlike Standard C, trigraphs have
   3598      no special meaning in Bison string literals, nor is
   3599      backslash-newline allowed.  A literal string token must contain
   3600      two or more characters; for a token containing just one character,
   3601      use a character token (see above).
   3602 
   3603    How you choose to write a terminal symbol has no effect on its
   3604 grammatical meaning.  That depends only on where it appears in rules and
   3605 on when the parser function returns that symbol.
   3606 
   3607    The value returned by `yylex' is always one of the terminal symbols,
   3608 except that a zero or negative value signifies end-of-input.  Whichever
   3609 way you write the token type in the grammar rules, you write it the
   3610 same way in the definition of `yylex'.  The numeric code for a
   3611 character token type is simply the positive numeric code of the
   3612 character, so `yylex' can use the identical value to generate the
   3613 requisite code, though you may need to convert it to `unsigned char' to
   3614 avoid sign-extension on hosts where `char' is signed.  Each named token
   3615 type becomes a C macro in the parser implementation file, so `yylex'
   3616 can use the name to stand for the code.  (This is why periods don't
   3617 make sense in terminal symbols.)  *Note Calling Convention for `yylex':
   3618 Calling Convention.
   3619 
   3620    If `yylex' is defined in a separate file, you need to arrange for the
   3621 token-type macro definitions to be available there.  Use the `-d'
   3622 option when you run Bison, so that it will write these macro definitions
   3623 into a separate header file `NAME.tab.h' which you can include in the
   3624 other source files that need it.  *Note Invoking Bison: Invocation.
   3625 
   3626    If you want to write a grammar that is portable to any Standard C
   3627 host, you must use only nonnull character tokens taken from the basic
   3628 execution character set of Standard C.  This set consists of the ten
   3629 digits, the 52 lower- and upper-case English letters, and the
   3630 characters in the following C-language string:
   3631 
   3632      "\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_{|}~"
   3633 
   3634    The `yylex' function and Bison must use a consistent character set
   3635 and encoding for character tokens.  For example, if you run Bison in an
   3636 ASCII environment, but then compile and run the resulting program in an
   3637 environment that uses an incompatible character set like EBCDIC, the
   3638 resulting program may not work because the tables generated by Bison
   3639 will assume ASCII numeric values for character tokens.  It is standard
   3640 practice for software distributions to contain C source files that were
   3641 generated by Bison in an ASCII environment, so installers on platforms
   3642 that are incompatible with ASCII must rebuild those files before
   3643 compiling them.
   3644 
   3645    The symbol `error' is a terminal symbol reserved for error recovery
   3646 (*note Error Recovery::); you shouldn't use it for any other purpose.
   3647 In particular, `yylex' should never return this value.  The default
   3648 value of the error token is 256, unless you explicitly assigned 256 to
   3649 one of your tokens with a `%token' declaration.
   3650 
   3651 
   3652 File: bison.info,  Node: Rules,  Next: Recursion,  Prev: Symbols,  Up: Grammar File
   3653 
   3654 3.3 Syntax of Grammar Rules
   3655 ===========================
   3656 
   3657 A Bison grammar rule has the following general form:
   3658 
   3659      RESULT: COMPONENTS...;
   3660 
   3661 where RESULT is the nonterminal symbol that this rule describes, and
   3662 COMPONENTS are various terminal and nonterminal symbols that are put
   3663 together by this rule (*note Symbols::).
   3664 
   3665    For example,
   3666 
   3667      exp: exp '+' exp;
   3668 
   3669 says that two groupings of type `exp', with a `+' token in between, can
   3670 be combined into a larger grouping of type `exp'.
   3671 
   3672    White space in rules is significant only to separate symbols.  You
   3673 can add extra white space as you wish.
   3674 
   3675    Scattered among the components can be ACTIONS that determine the
   3676 semantics of the rule.  An action looks like this:
   3677 
   3678      {C STATEMENTS}
   3679 
   3680 This is an example of "braced code", that is, C code surrounded by
   3681 braces, much like a compound statement in C.  Braced code can contain
   3682 any sequence of C tokens, so long as its braces are balanced.  Bison
   3683 does not check the braced code for correctness directly; it merely
   3684 copies the code to the parser implementation file, where the C compiler
   3685 can check it.
   3686 
   3687    Within braced code, the balanced-brace count is not affected by
   3688 braces within comments, string literals, or character constants, but it
   3689 is affected by the C digraphs `<%' and `%>' that represent braces.  At
   3690 the top level braced code must be terminated by `}' and not by a
   3691 digraph.  Bison does not look for trigraphs, so if braced code uses
   3692 trigraphs you should ensure that they do not affect the nesting of
   3693 braces or the boundaries of comments, string literals, or character
   3694 constants.
   3695 
   3696    Usually there is only one action and it follows the components.
   3697 *Note Actions::.
   3698 
   3699    Multiple rules for the same RESULT can be written separately or can
   3700 be joined with the vertical-bar character `|' as follows:
   3701 
   3702      RESULT:
   3703        RULE1-COMPONENTS...
   3704      | RULE2-COMPONENTS...
   3705      ...
   3706      ;
   3707 
   3708 They are still considered distinct rules even when joined in this way.
   3709 
   3710    If COMPONENTS in a rule is empty, it means that RESULT can match the
   3711 empty string.  For example, here is how to define a comma-separated
   3712 sequence of zero or more `exp' groupings:
   3713 
   3714      expseq:
   3715        /* empty */
   3716      | expseq1
   3717      ;
   3718 
   3719      expseq1:
   3720        exp
   3721      | expseq1 ',' exp
   3722      ;
   3723 
   3724 It is customary to write a comment `/* empty */' in each rule with no
   3725 components.
   3726 
   3727 
   3728 File: bison.info,  Node: Recursion,  Next: Semantics,  Prev: Rules,  Up: Grammar File
   3729 
   3730 3.4 Recursive Rules
   3731 ===================
   3732 
   3733 A rule is called "recursive" when its RESULT nonterminal appears also
   3734 on its right hand side.  Nearly all Bison grammars need to use
   3735 recursion, because that is the only way to define a sequence of any
   3736 number of a particular thing.  Consider this recursive definition of a
   3737 comma-separated sequence of one or more expressions:
   3738 
   3739      expseq1:
   3740        exp
   3741      | expseq1 ',' exp
   3742      ;
   3743 
   3744 Since the recursive use of `expseq1' is the leftmost symbol in the
   3745 right hand side, we call this "left recursion".  By contrast, here the
   3746 same construct is defined using "right recursion":
   3747 
   3748      expseq1:
   3749        exp
   3750      | exp ',' expseq1
   3751      ;
   3752 
   3753 Any kind of sequence can be defined using either left recursion or right
   3754 recursion, but you should always use left recursion, because it can
   3755 parse a sequence of any number of elements with bounded stack space.
   3756 Right recursion uses up space on the Bison stack in proportion to the
   3757 number of elements in the sequence, because all the elements must be
   3758 shifted onto the stack before the rule can be applied even once.  *Note
   3759 The Bison Parser Algorithm: Algorithm, for further explanation of this.
   3760 
   3761    "Indirect" or "mutual" recursion occurs when the result of the rule
   3762 does not appear directly on its right hand side, but does appear in
   3763 rules for other nonterminals which do appear on its right hand side.
   3764 
   3765    For example:
   3766 
   3767      expr:
   3768        primary
   3769      | primary '+' primary
   3770      ;
   3771 
   3772      primary:
   3773        constant
   3774      | '(' expr ')'
   3775      ;
   3776 
   3777 defines two mutually-recursive nonterminals, since each refers to the
   3778 other.
   3779 
   3780 
   3781 File: bison.info,  Node: Semantics,  Next: Tracking Locations,  Prev: Recursion,  Up: Grammar File
   3782 
   3783 3.5 Defining Language Semantics
   3784 ===============================
   3785 
   3786 The grammar rules for a language determine only the syntax.  The
   3787 semantics are determined by the semantic values associated with various
   3788 tokens and groupings, and by the actions taken when various groupings
   3789 are recognized.
   3790 
   3791    For example, the calculator calculates properly because the value
   3792 associated with each expression is the proper number; it adds properly
   3793 because the action for the grouping `X + Y' is to add the numbers
   3794 associated with X and Y.
   3795 
   3796 * Menu:
   3797 
   3798 * Value Type::        Specifying one data type for all semantic values.
   3799 * Multiple Types::    Specifying several alternative data types.
   3800 * Actions::           An action is the semantic definition of a grammar rule.
   3801 * Action Types::      Specifying data types for actions to operate on.
   3802 * Mid-Rule Actions::  Most actions go at the end of a rule.
   3803                       This says when, why and how to use the exceptional
   3804                         action in the middle of a rule.
   3805 
   3806 
   3807 File: bison.info,  Node: Value Type,  Next: Multiple Types,  Up: Semantics
   3808 
   3809 3.5.1 Data Types of Semantic Values
   3810 -----------------------------------
   3811 
   3812 In a simple program it may be sufficient to use the same data type for
   3813 the semantic values of all language constructs.  This was true in the
   3814 RPN and infix calculator examples (*note Reverse Polish Notation
   3815 Calculator: RPN Calc.).
   3816 
   3817    Bison normally uses the type `int' for semantic values if your
   3818 program uses the same data type for all language constructs.  To
   3819 specify some other type, define `YYSTYPE' as a macro, like this:
   3820 
   3821      #define YYSTYPE double
   3822 
   3823 `YYSTYPE''s replacement list should be a type name that does not
   3824 contain parentheses or square brackets.  This macro definition must go
   3825 in the prologue of the grammar file (*note Outline of a Bison Grammar:
   3826 Grammar Outline.).
   3827 
   3828 
   3829 File: bison.info,  Node: Multiple Types,  Next: Actions,  Prev: Value Type,  Up: Semantics
   3830 
   3831 3.5.2 More Than One Value Type
   3832 ------------------------------
   3833 
   3834 In most programs, you will need different data types for different kinds
   3835 of tokens and groupings.  For example, a numeric constant may need type
   3836 `int' or `long int', while a string constant needs type `char *', and
   3837 an identifier might need a pointer to an entry in the symbol table.
   3838 
   3839    To use more than one data type for semantic values in one parser,
   3840 Bison requires you to do two things:
   3841 
   3842    * Specify the entire collection of possible data types, either by
   3843      using the `%union' Bison declaration (*note The Collection of
   3844      Value Types: Union Decl.), or by using a `typedef' or a `#define'
   3845      to define `YYSTYPE' to be a union type whose member names are the
   3846      type tags.
   3847 
   3848    * Choose one of those types for each symbol (terminal or
   3849      nonterminal) for which semantic values are used.  This is done for
   3850      tokens with the `%token' Bison declaration (*note Token Type
   3851      Names: Token Decl.)  and for groupings with the `%type' Bison
   3852      declaration (*note Nonterminal Symbols: Type Decl.).
   3853 
   3854 
   3855 File: bison.info,  Node: Actions,  Next: Action Types,  Prev: Multiple Types,  Up: Semantics
   3856 
   3857 3.5.3 Actions
   3858 -------------
   3859 
   3860 An action accompanies a syntactic rule and contains C code to be
   3861 executed each time an instance of that rule is recognized.  The task of
   3862 most actions is to compute a semantic value for the grouping built by
   3863 the rule from the semantic values associated with tokens or smaller
   3864 groupings.
   3865 
   3866    An action consists of braced code containing C statements, and can be
   3867 placed at any position in the rule; it is executed at that position.
   3868 Most rules have just one action at the end of the rule, following all
   3869 the components.  Actions in the middle of a rule are tricky and used
   3870 only for special purposes (*note Actions in Mid-Rule: Mid-Rule
   3871 Actions.).
   3872 
   3873    The C code in an action can refer to the semantic values of the
   3874 components matched by the rule with the construct `$N', which stands
   3875 for the value of the Nth component.  The semantic value for the
   3876 grouping being constructed is `$$'.  In addition, the semantic values
   3877 of symbols can be accessed with the named references construct `$NAME'
   3878 or `$[NAME]'.  Bison translates both of these constructs into
   3879 expressions of the appropriate type when it copies the actions into the
   3880 parser implementation file.  `$$' (or `$NAME', when it stands for the
   3881 current grouping) is translated to a modifiable lvalue, so it can be
   3882 assigned to.
   3883 
   3884    Here is a typical example:
   3885 
   3886      exp:
   3887      ...
   3888      | exp '+' exp     { $$ = $1 + $3; }
   3889 
   3890    Or, in terms of named references:
   3891 
   3892      exp[result]:
   3893      ...
   3894      | exp[left] '+' exp[right]  { $result = $left + $right; }
   3895 
   3896 This rule constructs an `exp' from two smaller `exp' groupings
   3897 connected by a plus-sign token.  In the action, `$1' and `$3' (`$left'
   3898 and `$right') refer to the semantic values of the two component `exp'
   3899 groupings, which are the first and third symbols on the right hand side
   3900 of the rule.  The sum is stored into `$$' (`$result') so that it
   3901 becomes the semantic value of the addition-expression just recognized
   3902 by the rule.  If there were a useful semantic value associated with the
   3903 `+' token, it could be referred to as `$2'.
   3904 
   3905    *Note Named References::, for more information about using the named
   3906 references construct.
   3907 
   3908    Note that the vertical-bar character `|' is really a rule separator,
   3909 and actions are attached to a single rule.  This is a difference with
   3910 tools like Flex, for which `|' stands for either "or", or "the same
   3911 action as that of the next rule".  In the following example, the action
   3912 is triggered only when `b' is found:
   3913 
   3914      a-or-b: 'a'|'b'   { a_or_b_found = 1; };
   3915 
   3916    If you don't specify an action for a rule, Bison supplies a default:
   3917 `$$ = $1'.  Thus, the value of the first symbol in the rule becomes the
   3918 value of the whole rule.  Of course, the default action is valid only
   3919 if the two data types match.  There is no meaningful default action for
   3920 an empty rule; every empty rule must have an explicit action unless the
   3921 rule's value does not matter.
   3922 
   3923    `$N' with N zero or negative is allowed for reference to tokens and
   3924 groupings on the stack _before_ those that match the current rule.
   3925 This is a very risky practice, and to use it reliably you must be
   3926 certain of the context in which the rule is applied.  Here is a case in
   3927 which you can use this reliably:
   3928 
   3929      foo:
   3930        expr bar '+' expr  { ... }
   3931      | expr bar '-' expr  { ... }
   3932      ;
   3933 
   3934      bar:
   3935        /* empty */    { previous_expr = $0; }
   3936      ;
   3937 
   3938    As long as `bar' is used only in the fashion shown here, `$0' always
   3939 refers to the `expr' which precedes `bar' in the definition of `foo'.
   3940 
   3941    It is also possible to access the semantic value of the lookahead
   3942 token, if any, from a semantic action.  This semantic value is stored
   3943 in `yylval'.  *Note Special Features for Use in Actions: Action
   3944 Features.
   3945 
   3946 
   3947 File: bison.info,  Node: Action Types,  Next: Mid-Rule Actions,  Prev: Actions,  Up: Semantics
   3948 
   3949 3.5.4 Data Types of Values in Actions
   3950 -------------------------------------
   3951 
   3952 If you have chosen a single data type for semantic values, the `$$' and
   3953 `$N' constructs always have that data type.
   3954 
   3955    If you have used `%union' to specify a variety of data types, then
   3956 you must declare a choice among these types for each terminal or
   3957 nonterminal symbol that can have a semantic value.  Then each time you
   3958 use `$$' or `$N', its data type is determined by which symbol it refers
   3959 to in the rule.  In this example,
   3960 
   3961      exp:
   3962        ...
   3963      | exp '+' exp    { $$ = $1 + $3; }
   3964 
   3965 `$1' and `$3' refer to instances of `exp', so they all have the data
   3966 type declared for the nonterminal symbol `exp'.  If `$2' were used, it
   3967 would have the data type declared for the terminal symbol `'+'',
   3968 whatever that might be.
   3969 
   3970    Alternatively, you can specify the data type when you refer to the
   3971 value, by inserting `<TYPE>' after the `$' at the beginning of the
   3972 reference.  For example, if you have defined types as shown here:
   3973 
   3974      %union {
   3975        int itype;
   3976        double dtype;
   3977      }
   3978 
   3979 then you can write `$<itype>1' to refer to the first subunit of the
   3980 rule as an integer, or `$<dtype>1' to refer to it as a double.
   3981 
   3982 
   3983 File: bison.info,  Node: Mid-Rule Actions,  Prev: Action Types,  Up: Semantics
   3984 
   3985 3.5.5 Actions in Mid-Rule
   3986 -------------------------
   3987 
   3988 Occasionally it is useful to put an action in the middle of a rule.
   3989 These actions are written just like usual end-of-rule actions, but they
   3990 are executed before the parser even recognizes the following components.
   3991 
   3992 * Menu:
   3993 
   3994 * Using Mid-Rule Actions::       Putting an action in the middle of a rule.
   3995 * Mid-Rule Action Translation::  How mid-rule actions are actually processed.
   3996 * Mid-Rule Conflicts::           Mid-rule actions can cause conflicts.
   3997 
   3998 
   3999 File: bison.info,  Node: Using Mid-Rule Actions,  Next: Mid-Rule Action Translation,  Up: Mid-Rule Actions
   4000 
   4001 3.5.5.1 Using Mid-Rule Actions
   4002 ..............................
   4003 
   4004 A mid-rule action may refer to the components preceding it using `$N',
   4005 but it may not refer to subsequent components because it is run before
   4006 they are parsed.
   4007 
   4008    The mid-rule action itself counts as one of the components of the
   4009 rule.  This makes a difference when there is another action later in
   4010 the same rule (and usually there is another at the end): you have to
   4011 count the actions along with the symbols when working out which number
   4012 N to use in `$N'.
   4013 
   4014    The mid-rule action can also have a semantic value.  The action can
   4015 set its value with an assignment to `$$', and actions later in the rule
   4016 can refer to the value using `$N'.  Since there is no symbol to name
   4017 the action, there is no way to declare a data type for the value in
   4018 advance, so you must use the `$<...>N' construct to specify a data type
   4019 each time you refer to this value.
   4020 
   4021    There is no way to set the value of the entire rule with a mid-rule
   4022 action, because assignments to `$$' do not have that effect.  The only
   4023 way to set the value for the entire rule is with an ordinary action at
   4024 the end of the rule.
   4025 
   4026    Here is an example from a hypothetical compiler, handling a `let'
   4027 statement that looks like `let (VARIABLE) STATEMENT' and serves to
   4028 create a variable named VARIABLE temporarily for the duration of
   4029 STATEMENT.  To parse this construct, we must put VARIABLE into the
   4030 symbol table while STATEMENT is parsed, then remove it afterward.  Here
   4031 is how it is done:
   4032 
   4033      stmt:
   4034        "let" '(' var ')'
   4035          {
   4036            $<context>$ = push_context ();
   4037            declare_variable ($3);
   4038          }
   4039        stmt
   4040          {
   4041            $$ = $6;
   4042            pop_context ($<context>5);
   4043          }
   4044 
   4045 As soon as `let (VARIABLE)' has been recognized, the first action is
   4046 run.  It saves a copy of the current semantic context (the list of
   4047 accessible variables) as its semantic value, using alternative
   4048 `context' in the data-type union.  Then it calls `declare_variable' to
   4049 add the new variable to that list.  Once the first action is finished,
   4050 the embedded statement `stmt' can be parsed.
   4051 
   4052    Note that the mid-rule action is component number 5, so the `stmt' is
   4053 component number 6.  Named references can be used to improve the
   4054 readability and maintainability (*note Named References::):
   4055 
   4056      stmt:
   4057        "let" '(' var ')'
   4058          {
   4059            $<context>let = push_context ();
   4060            declare_variable ($3);
   4061          }[let]
   4062        stmt
   4063          {
   4064            $$ = $6;
   4065            pop_context ($<context>let);
   4066          }
   4067 
   4068    After the embedded statement is parsed, its semantic value becomes
   4069 the value of the entire `let'-statement.  Then the semantic value from
   4070 the earlier action is used to restore the prior list of variables.  This
   4071 removes the temporary `let'-variable from the list so that it won't
   4072 appear to exist while the rest of the program is parsed.
   4073 
   4074    In the above example, if the parser initiates error recovery (*note
   4075 Error Recovery::) while parsing the tokens in the embedded statement
   4076 `stmt', it might discard the previous semantic context `$<context>5'
   4077 without restoring it.  Thus, `$<context>5' needs a destructor (*note
   4078 Freeing Discarded Symbols: Destructor Decl.).  However, Bison currently
   4079 provides no means to declare a destructor specific to a particular
   4080 mid-rule action's semantic value.
   4081 
   4082    One solution is to bury the mid-rule action inside a nonterminal
   4083 symbol and to declare a destructor for that symbol:
   4084 
   4085      %type <context> let
   4086      %destructor { pop_context ($$); } let
   4087 
   4088      %%
   4089 
   4090      stmt:
   4091        let stmt
   4092          {
   4093            $$ = $2;
   4094            pop_context ($let);
   4095          };
   4096 
   4097      let:
   4098        "let" '(' var ')'
   4099          {
   4100            $let = push_context ();
   4101            declare_variable ($3);
   4102          };
   4103 
   4104 Note that the action is now at the end of its rule.  Any mid-rule
   4105 action can be converted to an end-of-rule action in this way, and this
   4106 is what Bison actually does to implement mid-rule actions.
   4107 
   4108 
   4109 File: bison.info,  Node: Mid-Rule Action Translation,  Next: Mid-Rule Conflicts,  Prev: Using Mid-Rule Actions,  Up: Mid-Rule Actions
   4110 
   4111 3.5.5.2 Mid-Rule Action Translation
   4112 ...................................
   4113 
   4114 As hinted earlier, mid-rule actions are actually transformed into
   4115 regular rules and actions.  The various reports generated by Bison
   4116 (textual, graphical, etc., see *note Understanding Your Parser:
   4117 Understanding.)  reveal this translation, best explained by means of an
   4118 example.  The following rule:
   4119 
   4120      exp: { a(); } "b" { c(); } { d(); } "e" { f(); };
   4121 
   4122 is translated into:
   4123 
   4124      $@1: /* empty */ { a(); };
   4125      $@2: /* empty */ { c(); };
   4126      $@3: /* empty */ { d(); };
   4127      exp: $@1 "b" $@2 $@3 "e" { f(); };
   4128 
   4129 with new nonterminal symbols `$@N', where N is a number.
   4130 
   4131    A mid-rule action is expected to generate a value if it uses `$$', or
   4132 the (final) action uses `$N' where N denote the mid-rule action.  In
   4133 that case its nonterminal is rather named `@N':
   4134 
   4135      exp: { a(); } "b" { $$ = c(); } { d(); } "e" { f = $1; };
   4136 
   4137 is translated into
   4138 
   4139      @1: /* empty */ { a(); };
   4140      @2: /* empty */ { $$ = c(); };
   4141      $@3: /* empty */ { d(); };
   4142      exp: @1 "b" @2 $@3 "e" { f = $1; }
   4143 
   4144    There are probably two errors in the above example: the first
   4145 mid-rule action does not generate a value (it does not use `$$'
   4146 although the final action uses it), and the value of the second one is
   4147 not used (the final action does not use `$3').  Bison reports these
   4148 errors when the `midrule-value' warnings are enabled (*note Invoking
   4149 Bison: Invocation.):
   4150 
   4151      $ bison -fcaret -Wmidrule-value mid.y
   4152      mid.y:2.6-13: warning: unset value: $$
   4153       exp: { a(); } "b" { $$ = c(); } { d(); } "e" { f = $1; };
   4154            ^^^^^^^^
   4155      mid.y:2.19-31: warning: unused value: $3
   4156       exp: { a(); } "b" { $$ = c(); } { d(); } "e" { f = $1; };
   4157                         ^^^^^^^^^^^^^
   4158 
   4159 
   4160 File: bison.info,  Node: Mid-Rule Conflicts,  Prev: Mid-Rule Action Translation,  Up: Mid-Rule Actions
   4161 
   4162 3.5.5.3 Conflicts due to Mid-Rule Actions
   4163 .........................................
   4164 
   4165 Taking action before a rule is completely recognized often leads to
   4166 conflicts since the parser must commit to a parse in order to execute
   4167 the action.  For example, the following two rules, without mid-rule
   4168 actions, can coexist in a working parser because the parser can shift
   4169 the open-brace token and look at what follows before deciding whether
   4170 there is a declaration or not:
   4171 
   4172      compound:
   4173        '{' declarations statements '}'
   4174      | '{' statements '}'
   4175      ;
   4176 
   4177 But when we add a mid-rule action as follows, the rules become
   4178 nonfunctional:
   4179 
   4180      compound:
   4181        { prepare_for_local_variables (); }
   4182           '{' declarations statements '}'
   4183      |    '{' statements '}'
   4184      ;
   4185 
   4186 Now the parser is forced to decide whether to run the mid-rule action
   4187 when it has read no farther than the open-brace.  In other words, it
   4188 must commit to using one rule or the other, without sufficient
   4189 information to do it correctly.  (The open-brace token is what is called
   4190 the "lookahead" token at this time, since the parser is still deciding
   4191 what to do about it.  *Note Lookahead Tokens: Lookahead.)
   4192 
   4193    You might think that you could correct the problem by putting
   4194 identical actions into the two rules, like this:
   4195 
   4196      compound:
   4197        { prepare_for_local_variables (); }
   4198          '{' declarations statements '}'
   4199      | { prepare_for_local_variables (); }
   4200          '{' statements '}'
   4201      ;
   4202 
   4203 But this does not help, because Bison does not realize that the two
   4204 actions are identical.  (Bison never tries to understand the C code in
   4205 an action.)
   4206 
   4207    If the grammar is such that a declaration can be distinguished from a
   4208 statement by the first token (which is true in C), then one solution
   4209 which does work is to put the action after the open-brace, like this:
   4210 
   4211      compound:
   4212        '{' { prepare_for_local_variables (); }
   4213          declarations statements '}'
   4214      | '{' statements '}'
   4215      ;
   4216 
   4217 Now the first token of the following declaration or statement, which
   4218 would in any case tell Bison which rule to use, can still do so.
   4219 
   4220    Another solution is to bury the action inside a nonterminal symbol
   4221 which serves as a subroutine:
   4222 
   4223      subroutine:
   4224        /* empty */  { prepare_for_local_variables (); }
   4225      ;
   4226 
   4227      compound:
   4228        subroutine '{' declarations statements '}'
   4229      | subroutine '{' statements '}'
   4230      ;
   4231 
   4232 Now Bison can execute the action in the rule for `subroutine' without
   4233 deciding which rule for `compound' it will eventually use.
   4234 
   4235 
   4236 File: bison.info,  Node: Tracking Locations,  Next: Named References,  Prev: Semantics,  Up: Grammar File
   4237 
   4238 3.6 Tracking Locations
   4239 ======================
   4240 
   4241 Though grammar rules and semantic actions are enough to write a fully
   4242 functional parser, it can be useful to process some additional
   4243 information, especially symbol locations.
   4244 
   4245    The way locations are handled is defined by providing a data type,
   4246 and actions to take when rules are matched.
   4247 
   4248 * Menu:
   4249 
   4250 * Location Type::               Specifying a data type for locations.
   4251 * Actions and Locations::       Using locations in actions.
   4252 * Location Default Action::     Defining a general way to compute locations.
   4253 
   4254 
   4255 File: bison.info,  Node: Location Type,  Next: Actions and Locations,  Up: Tracking Locations
   4256 
   4257 3.6.1 Data Type of Locations
   4258 ----------------------------
   4259 
   4260 Defining a data type for locations is much simpler than for semantic
   4261 values, since all tokens and groupings always use the same type.
   4262 
   4263    You can specify the type of locations by defining a macro called
   4264 `YYLTYPE', just as you can specify the semantic value type by defining
   4265 a `YYSTYPE' macro (*note Value Type::).  When `YYLTYPE' is not defined,
   4266 Bison uses a default structure type with four members:
   4267 
   4268      typedef struct YYLTYPE
   4269      {
   4270        int first_line;
   4271        int first_column;
   4272        int last_line;
   4273        int last_column;
   4274      } YYLTYPE;
   4275 
   4276    When `YYLTYPE' is not defined, at the beginning of the parsing, Bison
   4277 initializes all these fields to 1 for `yylloc'.  To initialize `yylloc'
   4278 with a custom location type (or to chose a different initialization),
   4279 use the `%initial-action' directive.  *Note Performing Actions before
   4280 Parsing: Initial Action Decl.
   4281 
   4282 
   4283 File: bison.info,  Node: Actions and Locations,  Next: Location Default Action,  Prev: Location Type,  Up: Tracking Locations
   4284 
   4285 3.6.2 Actions and Locations
   4286 ---------------------------
   4287 
   4288 Actions are not only useful for defining language semantics, but also
   4289 for describing the behavior of the output parser with locations.
   4290 
   4291    The most obvious way for building locations of syntactic groupings
   4292 is very similar to the way semantic values are computed.  In a given
   4293 rule, several constructs can be used to access the locations of the
   4294 elements being matched.  The location of the Nth component of the right
   4295 hand side is `@N', while the location of the left hand side grouping is
   4296 `@$'.
   4297 
   4298    In addition, the named references construct `@NAME' and `@[NAME]'
   4299 may also be used to address the symbol locations.  *Note Named
   4300 References::, for more information about using the named references
   4301 construct.
   4302 
   4303    Here is a basic example using the default data type for locations:
   4304 
   4305      exp:
   4306        ...
   4307      | exp '/' exp
   4308          {
   4309            @$.first_column = @1.first_column;
   4310            @$.first_line = @1.first_line;
   4311            @$.last_column = @3.last_column;
   4312            @$.last_line = @3.last_line;
   4313            if ($3)
   4314              $$ = $1 / $3;
   4315            else
   4316              {
   4317                $$ = 1;
   4318                fprintf (stderr,
   4319                         "Division by zero, l%d,c%d-l%d,c%d",
   4320                         @3.first_line, @3.first_column,
   4321                         @3.last_line, @3.last_column);
   4322              }
   4323          }
   4324 
   4325    As for semantic values, there is a default action for locations that
   4326 is run each time a rule is matched.  It sets the beginning of `@$' to
   4327 the beginning of the first symbol, and the end of `@$' to the end of the
   4328 last symbol.
   4329 
   4330    With this default action, the location tracking can be fully
   4331 automatic.  The example above simply rewrites this way:
   4332 
   4333      exp:
   4334        ...
   4335      | exp '/' exp
   4336          {
   4337            if ($3)
   4338              $$ = $1 / $3;
   4339            else
   4340              {
   4341                $$ = 1;
   4342                fprintf (stderr,
   4343                         "Division by zero, l%d,c%d-l%d,c%d",
   4344                         @3.first_line, @3.first_column,
   4345                         @3.last_line, @3.last_column);
   4346              }
   4347          }
   4348 
   4349    It is also possible to access the location of the lookahead token,
   4350 if any, from a semantic action.  This location is stored in `yylloc'.
   4351 *Note Special Features for Use in Actions: Action Features.
   4352 
   4353 
   4354 File: bison.info,  Node: Location Default Action,  Prev: Actions and Locations,  Up: Tracking Locations
   4355 
   4356 3.6.3 Default Action for Locations
   4357 ----------------------------------
   4358 
   4359 Actually, actions are not the best place to compute locations.  Since
   4360 locations are much more general than semantic values, there is room in
   4361 the output parser to redefine the default action to take for each rule.
   4362 The `YYLLOC_DEFAULT' macro is invoked each time a rule is matched,
   4363 before the associated action is run.  It is also invoked while
   4364 processing a syntax error, to compute the error's location.  Before
   4365 reporting an unresolvable syntactic ambiguity, a GLR parser invokes
   4366 `YYLLOC_DEFAULT' recursively to compute the location of that ambiguity.
   4367 
   4368    Most of the time, this macro is general enough to suppress location
   4369 dedicated code from semantic actions.
   4370 
   4371    The `YYLLOC_DEFAULT' macro takes three parameters.  The first one is
   4372 the location of the grouping (the result of the computation).  When a
   4373 rule is matched, the second parameter identifies locations of all right
   4374 hand side elements of the rule being matched, and the third parameter
   4375 is the size of the rule's right hand side.  When a GLR parser reports
   4376 an ambiguity, which of multiple candidate right hand sides it passes to
   4377 `YYLLOC_DEFAULT' is undefined.  When processing a syntax error, the
   4378 second parameter identifies locations of the symbols that were
   4379 discarded during error processing, and the third parameter is the
   4380 number of discarded symbols.
   4381 
   4382    By default, `YYLLOC_DEFAULT' is defined this way:
   4383 
   4384      # define YYLLOC_DEFAULT(Cur, Rhs, N)                      \
   4385      do                                                        \
   4386        if (N)                                                  \
   4387          {                                                     \
   4388            (Cur).first_line   = YYRHSLOC(Rhs, 1).first_line;   \
   4389            (Cur).first_column = YYRHSLOC(Rhs, 1).first_column; \
   4390            (Cur).last_line    = YYRHSLOC(Rhs, N).last_line;    \
   4391            (Cur).last_column  = YYRHSLOC(Rhs, N).last_column;  \
   4392          }                                                     \
   4393        else                                                    \
   4394          {                                                     \
   4395            (Cur).first_line   = (Cur).last_line   =            \
   4396              YYRHSLOC(Rhs, 0).last_line;                       \
   4397            (Cur).first_column = (Cur).last_column =            \
   4398              YYRHSLOC(Rhs, 0).last_column;                     \
   4399          }                                                     \
   4400      while (0)
   4401 
   4402 where `YYRHSLOC (rhs, k)' is the location of the Kth symbol in RHS when
   4403 K is positive, and the location of the symbol just before the reduction
   4404 when K and N are both zero.
   4405 
   4406    When defining `YYLLOC_DEFAULT', you should consider that:
   4407 
   4408    * All arguments are free of side-effects.  However, only the first
   4409      one (the result) should be modified by `YYLLOC_DEFAULT'.
   4410 
   4411    * For consistency with semantic actions, valid indexes within the
   4412      right hand side range from 1 to N.  When N is zero, only 0 is a
   4413      valid index, and it refers to the symbol just before the reduction.
   4414      During error processing N is always positive.
   4415 
   4416    * Your macro should parenthesize its arguments, if need be, since the
   4417      actual arguments may not be surrounded by parentheses.  Also, your
   4418      macro should expand to something that can be used as a single
   4419      statement when it is followed by a semicolon.
   4420 
   4421 
   4422 File: bison.info,  Node: Named References,  Next: Declarations,  Prev: Tracking Locations,  Up: Grammar File
   4423 
   4424 3.7 Named References
   4425 ====================
   4426 
   4427 As described in the preceding sections, the traditional way to refer to
   4428 any semantic value or location is a "positional reference", which takes
   4429 the form `$N', `$$', `@N', and `@$'.  However, such a reference is not
   4430 very descriptive.  Moreover, if you later decide to insert or remove
   4431 symbols in the right-hand side of a grammar rule, the need to renumber
   4432 such references can be tedious and error-prone.
   4433 
   4434    To avoid these issues, you can also refer to a semantic value or
   4435 location using a "named reference".  First of all, original symbol
   4436 names may be used as named references.  For example:
   4437 
   4438      invocation: op '(' args ')'
   4439        { $invocation = new_invocation ($op, $args, @invocation); }
   4440 
   4441 Positional and named references can be mixed arbitrarily.  For example:
   4442 
   4443      invocation: op '(' args ')'
   4444        { $$ = new_invocation ($op, $args, @$); }
   4445 
   4446 However, sometimes regular symbol names are not sufficient due to
   4447 ambiguities:
   4448 
   4449      exp: exp '/' exp
   4450        { $exp = $exp / $exp; } // $exp is ambiguous.
   4451 
   4452      exp: exp '/' exp
   4453        { $$ = $1 / $exp; } // One usage is ambiguous.
   4454 
   4455      exp: exp '/' exp
   4456        { $$ = $1 / $3; } // No error.
   4457 
   4458 When ambiguity occurs, explicitly declared names may be used for values
   4459 and locations.  Explicit names are declared as a bracketed name after a
   4460 symbol appearance in rule definitions.  For example:
   4461      exp[result]: exp[left] '/' exp[right]
   4462        { $result = $left / $right; }
   4463 
   4464 In order to access a semantic value generated by a mid-rule action, an
   4465 explicit name may also be declared by putting a bracketed name after the
   4466 closing brace of the mid-rule action code:
   4467      exp[res]: exp[x] '+' {$left = $x;}[left] exp[right]
   4468        { $res = $left + $right; }
   4469 
   4470 In references, in order to specify names containing dots and dashes, an
   4471 explicit bracketed syntax `$[name]' and `@[name]' must be used:
   4472      if-stmt: "if" '(' expr ')' "then" then.stmt ';'
   4473        { $[if-stmt] = new_if_stmt ($expr, $[then.stmt]); }
   4474 
   4475    It often happens that named references are followed by a dot, dash
   4476 or other C punctuation marks and operators.  By default, Bison will read
   4477 `$name.suffix' as a reference to symbol value `$name' followed by
   4478 `.suffix', i.e., an access to the `suffix' field of the semantic value.
   4479 In order to force Bison to recognize `name.suffix' in its entirety as
   4480 the name of a semantic value, the bracketed syntax `$[name.suffix]'
   4481 must be used.
   4482 
   4483    The named references feature is experimental.  More user feedback
   4484 will help to stabilize it.
   4485 
   4486 
   4487 File: bison.info,  Node: Declarations,  Next: Multiple Parsers,  Prev: Named References,  Up: Grammar File
   4488 
   4489 3.8 Bison Declarations
   4490 ======================
   4491 
   4492 The "Bison declarations" section of a Bison grammar defines the symbols
   4493 used in formulating the grammar and the data types of semantic values.
   4494 *Note Symbols::.
   4495 
   4496    All token type names (but not single-character literal tokens such as
   4497 `'+'' and `'*'') must be declared.  Nonterminal symbols must be
   4498 declared if you need to specify which data type to use for the semantic
   4499 value (*note More Than One Value Type: Multiple Types.).
   4500 
   4501    The first rule in the grammar file also specifies the start symbol,
   4502 by default.  If you want some other symbol to be the start symbol, you
   4503 must declare it explicitly (*note Languages and Context-Free Grammars:
   4504 Language and Grammar.).
   4505 
   4506 * Menu:
   4507 
   4508 * Require Decl::      Requiring a Bison version.
   4509 * Token Decl::        Declaring terminal symbols.
   4510 * Precedence Decl::   Declaring terminals with precedence and associativity.
   4511 * Union Decl::        Declaring the set of all semantic value types.
   4512 * Type Decl::         Declaring the choice of type for a nonterminal symbol.
   4513 * Initial Action Decl::  Code run before parsing starts.
   4514 * Destructor Decl::   Declaring how symbols are freed.
   4515 * Printer Decl::      Declaring how symbol values are displayed.
   4516 * Expect Decl::       Suppressing warnings about parsing conflicts.
   4517 * Start Decl::        Specifying the start symbol.
   4518 * Pure Decl::         Requesting a reentrant parser.
   4519 * Push Decl::         Requesting a push parser.
   4520 * Decl Summary::      Table of all Bison declarations.
   4521 * %define Summary::   Defining variables to adjust Bison's behavior.
   4522 * %code Summary::     Inserting code into the parser source.
   4523 
   4524 
   4525 File: bison.info,  Node: Require Decl,  Next: Token Decl,  Up: Declarations
   4526 
   4527 3.8.1 Require a Version of Bison
   4528 --------------------------------
   4529 
   4530 You may require the minimum version of Bison to process the grammar.  If
   4531 the requirement is not met, `bison' exits with an error (exit status
   4532 63).
   4533 
   4534      %require "VERSION"
   4535 
   4536 
   4537 File: bison.info,  Node: Token Decl,  Next: Precedence Decl,  Prev: Require Decl,  Up: Declarations
   4538 
   4539 3.8.2 Token Type Names
   4540 ----------------------
   4541 
   4542 The basic way to declare a token type name (terminal symbol) is as
   4543 follows:
   4544 
   4545      %token NAME
   4546 
   4547    Bison will convert this into a `#define' directive in the parser, so
   4548 that the function `yylex' (if it is in this file) can use the name NAME
   4549 to stand for this token type's code.
   4550 
   4551    Alternatively, you can use `%left', `%right', or `%nonassoc' instead
   4552 of `%token', if you wish to specify associativity and precedence.
   4553 *Note Operator Precedence: Precedence Decl.
   4554 
   4555    You can explicitly specify the numeric code for a token type by
   4556 appending a nonnegative decimal or hexadecimal integer value in the
   4557 field immediately following the token name:
   4558 
   4559      %token NUM 300
   4560      %token XNUM 0x12d // a GNU extension
   4561 
   4562 It is generally best, however, to let Bison choose the numeric codes for
   4563 all token types.  Bison will automatically select codes that don't
   4564 conflict with each other or with normal characters.
   4565 
   4566    In the event that the stack type is a union, you must augment the
   4567 `%token' or other token declaration to include the data type
   4568 alternative delimited by angle-brackets (*note More Than One Value
   4569 Type: Multiple Types.).
   4570 
   4571    For example:
   4572 
   4573      %union {              /* define stack type */
   4574        double val;
   4575        symrec *tptr;
   4576      }
   4577      %token <val> NUM      /* define token NUM and its type */
   4578 
   4579    You can associate a literal string token with a token type name by
   4580 writing the literal string at the end of a `%token' declaration which
   4581 declares the name.  For example:
   4582 
   4583      %token arrow "=>"
   4584 
   4585 For example, a grammar for the C language might specify these names with
   4586 equivalent literal string tokens:
   4587 
   4588      %token  <operator>  OR      "||"
   4589      %token  <operator>  LE 134  "<="
   4590      %left  OR  "<="
   4591 
   4592 Once you equate the literal string and the token name, you can use them
   4593 interchangeably in further declarations or the grammar rules.  The
   4594 `yylex' function can use the token name or the literal string to obtain
   4595 the token type code number (*note Calling Convention::).  Syntax error
   4596 messages passed to `yyerror' from the parser will reference the literal
   4597 string instead of the token name.
   4598 
   4599    The token numbered as 0 corresponds to end of file; the following
   4600 line allows for nicer error messages referring to "end of file" instead
   4601 of "$end":
   4602 
   4603      %token END 0 "end of file"
   4604 
   4605 
   4606 File: bison.info,  Node: Precedence Decl,  Next: Union Decl,  Prev: Token Decl,  Up: Declarations
   4607 
   4608 3.8.3 Operator Precedence
   4609 -------------------------
   4610 
   4611 Use the `%left', `%right' or `%nonassoc' declaration to declare a token
   4612 and specify its precedence and associativity, all at once.  These are
   4613 called "precedence declarations".  *Note Operator Precedence:
   4614 Precedence, for general information on operator precedence.
   4615 
   4616    The syntax of a precedence declaration is nearly the same as that of
   4617 `%token': either
   4618 
   4619      %left SYMBOLS...
   4620 
   4621 or
   4622 
   4623      %left <TYPE> SYMBOLS...
   4624 
   4625    And indeed any of these declarations serves the purposes of `%token'.
   4626 But in addition, they specify the associativity and relative precedence
   4627 for all the SYMBOLS:
   4628 
   4629    * The associativity of an operator OP determines how repeated uses
   4630      of the operator nest: whether `X OP Y OP Z' is parsed by grouping
   4631      X with Y first or by grouping Y with Z first.  `%left' specifies
   4632      left-associativity (grouping X with Y first) and `%right'
   4633      specifies right-associativity (grouping Y with Z first).
   4634      `%nonassoc' specifies no associativity, which means that `X OP Y
   4635      OP Z' is considered a syntax error.
   4636 
   4637    * The precedence of an operator determines how it nests with other
   4638      operators.  All the tokens declared in a single precedence
   4639      declaration have equal precedence and nest together according to
   4640      their associativity.  When two tokens declared in different
   4641      precedence declarations associate, the one declared later has the
   4642      higher precedence and is grouped first.
   4643 
   4644    For backward compatibility, there is a confusing difference between
   4645 the argument lists of `%token' and precedence declarations.  Only a
   4646 `%token' can associate a literal string with a token type name.  A
   4647 precedence declaration always interprets a literal string as a
   4648 reference to a separate token.  For example:
   4649 
   4650      %left  OR "<="         // Does not declare an alias.
   4651      %left  OR 134 "<=" 135 // Declares 134 for OR and 135 for "<=".
   4652 
   4653 
   4654 File: bison.info,  Node: Union Decl,  Next: Type Decl,  Prev: Precedence Decl,  Up: Declarations
   4655 
   4656 3.8.4 The Collection of Value Types
   4657 -----------------------------------
   4658 
   4659 The `%union' declaration specifies the entire collection of possible
   4660 data types for semantic values.  The keyword `%union' is followed by
   4661 braced code containing the same thing that goes inside a `union' in C.
   4662 
   4663    For example:
   4664 
   4665      %union {
   4666        double val;
   4667        symrec *tptr;
   4668      }
   4669 
   4670 This says that the two alternative types are `double' and `symrec *'.
   4671 They are given names `val' and `tptr'; these names are used in the
   4672 `%token' and `%type' declarations to pick one of the types for a
   4673 terminal or nonterminal symbol (*note Nonterminal Symbols: Type Decl.).
   4674 
   4675    As an extension to POSIX, a tag is allowed after the `union'.  For
   4676 example:
   4677 
   4678      %union value {
   4679        double val;
   4680        symrec *tptr;
   4681      }
   4682 
   4683 specifies the union tag `value', so the corresponding C type is `union
   4684 value'.  If you do not specify a tag, it defaults to `YYSTYPE'.
   4685 
   4686    As another extension to POSIX, you may specify multiple `%union'
   4687 declarations; their contents are concatenated.  However, only the first
   4688 `%union' declaration can specify a tag.
   4689 
   4690    Note that, unlike making a `union' declaration in C, you need not
   4691 write a semicolon after the closing brace.
   4692 
   4693    Instead of `%union', you can define and use your own union type
   4694 `YYSTYPE' if your grammar contains at least one `<TYPE>' tag.  For
   4695 example, you can put the following into a header file `parser.h':
   4696 
   4697      union YYSTYPE {
   4698        double val;
   4699        symrec *tptr;
   4700      };
   4701      typedef union YYSTYPE YYSTYPE;
   4702 
   4703 and then your grammar can use the following instead of `%union':
   4704 
   4705      %{
   4706      #include "parser.h"
   4707      %}
   4708      %type <val> expr
   4709      %token <tptr> ID
   4710 
   4711 
   4712 File: bison.info,  Node: Type Decl,  Next: Initial Action Decl,  Prev: Union Decl,  Up: Declarations
   4713 
   4714 3.8.5 Nonterminal Symbols
   4715 -------------------------
   4716 
   4717 When you use `%union' to specify multiple value types, you must declare
   4718 the value type of each nonterminal symbol for which values are used.
   4719 This is done with a `%type' declaration, like this:
   4720 
   4721      %type <TYPE> NONTERMINAL...
   4722 
   4723 Here NONTERMINAL is the name of a nonterminal symbol, and TYPE is the
   4724 name given in the `%union' to the alternative that you want (*note The
   4725 Collection of Value Types: Union Decl.).  You can give any number of
   4726 nonterminal symbols in the same `%type' declaration, if they have the
   4727 same value type.  Use spaces to separate the symbol names.
   4728 
   4729    You can also declare the value type of a terminal symbol.  To do
   4730 this, use the same `<TYPE>' construction in a declaration for the
   4731 terminal symbol.  All kinds of token declarations allow `<TYPE>'.
   4732 
   4733 
   4734 File: bison.info,  Node: Initial Action Decl,  Next: Destructor Decl,  Prev: Type Decl,  Up: Declarations
   4735 
   4736 3.8.6 Performing Actions before Parsing
   4737 ---------------------------------------
   4738 
   4739 Sometimes your parser needs to perform some initializations before
   4740 parsing.  The `%initial-action' directive allows for such arbitrary
   4741 code.
   4742 
   4743  -- Directive: %initial-action { CODE }
   4744      Declare that the braced CODE must be invoked before parsing each
   4745      time `yyparse' is called.  The CODE may use `$$' (or `$<TAG>$')
   4746      and `@$' -- initial value and location of the lookahead -- and the
   4747      `%parse-param'.
   4748 
   4749    For instance, if your locations use a file name, you may use
   4750 
   4751      %parse-param { char const *file_name };
   4752      %initial-action
   4753      {
   4754        @$.initialize (file_name);
   4755      };
   4756 
   4757 
   4758 File: bison.info,  Node: Destructor Decl,  Next: Printer Decl,  Prev: Initial Action Decl,  Up: Declarations
   4759 
   4760 3.8.7 Freeing Discarded Symbols
   4761 -------------------------------
   4762 
   4763 During error recovery (*note Error Recovery::), symbols already pushed
   4764 on the stack and tokens coming from the rest of the file are discarded
   4765 until the parser falls on its feet.  If the parser runs out of memory,
   4766 or if it returns via `YYABORT' or `YYACCEPT', all the symbols on the
   4767 stack must be discarded.  Even if the parser succeeds, it must discard
   4768 the start symbol.
   4769 
   4770    When discarded symbols convey heap based information, this memory is
   4771 lost.  While this behavior can be tolerable for batch parsers, such as
   4772 in traditional compilers, it is unacceptable for programs like shells or
   4773 protocol implementations that may parse and execute indefinitely.
   4774 
   4775    The `%destructor' directive defines code that is called when a
   4776 symbol is automatically discarded.
   4777 
   4778  -- Directive: %destructor { CODE } SYMBOLS
   4779      Invoke the braced CODE whenever the parser discards one of the
   4780      SYMBOLS.  Within CODE, `$$' (or `$<TAG>$') designates the semantic
   4781      value associated with the discarded symbol, and `@$' designates
   4782      its location.  The additional parser parameters are also available
   4783      (*note The Parser Function `yyparse': Parser Function.).
   4784 
   4785      When a symbol is listed among SYMBOLS, its `%destructor' is called
   4786      a per-symbol `%destructor'.  You may also define a per-type
   4787      `%destructor' by listing a semantic type tag among SYMBOLS.  In
   4788      that case, the parser will invoke this CODE whenever it discards
   4789      any grammar symbol that has that semantic type tag unless that
   4790      symbol has its own per-symbol `%destructor'.
   4791 
   4792      Finally, you can define two different kinds of default
   4793      `%destructor's.  (These default forms are experimental.  More user
   4794      feedback will help to determine whether they should become
   4795      permanent features.)  You can place each of `<*>' and `<>' in the
   4796      SYMBOLS list of exactly one `%destructor' declaration in your
   4797      grammar file.  The parser will invoke the CODE associated with one
   4798      of these whenever it discards any user-defined grammar symbol that
   4799      has no per-symbol and no per-type `%destructor'.  The parser uses
   4800      the CODE for `<*>' in the case of such a grammar symbol for which
   4801      you have formally declared a semantic type tag (`%type' counts as
   4802      such a declaration, but `$<tag>$' does not).  The parser uses the
   4803      CODE for `<>' in the case of such a grammar symbol that has no
   4804      declared semantic type tag.
   4805 
   4806 For example:
   4807 
   4808      %union { char *string; }
   4809      %token <string> STRING1
   4810      %token <string> STRING2
   4811      %type  <string> string1
   4812      %type  <string> string2
   4813      %union { char character; }
   4814      %token <character> CHR
   4815      %type  <character> chr
   4816      %token TAGLESS
   4817 
   4818      %destructor { } <character>
   4819      %destructor { free ($$); } <*>
   4820      %destructor { free ($$); printf ("%d", @$.first_line); } STRING1 string1
   4821      %destructor { printf ("Discarding tagless symbol.\n"); } <>
   4822 
   4823 guarantees that, when the parser discards any user-defined symbol that
   4824 has a semantic type tag other than `<character>', it passes its
   4825 semantic value to `free' by default.  However, when the parser discards
   4826 a `STRING1' or a `string1', it also prints its line number to `stdout'.
   4827 It performs only the second `%destructor' in this case, so it invokes
   4828 `free' only once.  Finally, the parser merely prints a message whenever
   4829 it discards any symbol, such as `TAGLESS', that has no semantic type
   4830 tag.
   4831 
   4832    A Bison-generated parser invokes the default `%destructor's only for
   4833 user-defined as opposed to Bison-defined symbols.  For example, the
   4834 parser will not invoke either kind of default `%destructor' for the
   4835 special Bison-defined symbols `$accept', `$undefined', or `$end' (*note
   4836 Bison Symbols: Table of Symbols.), none of which you can reference in
   4837 your grammar.  It also will not invoke either for the `error' token
   4838 (*note error: Table of Symbols.), which is always defined by Bison
   4839 regardless of whether you reference it in your grammar.  However, it
   4840 may invoke one of them for the end token (token 0) if you redefine it
   4841 from `$end' to, for example, `END':
   4842 
   4843      %token END 0
   4844 
   4845    Finally, Bison will never invoke a `%destructor' for an unreferenced
   4846 mid-rule semantic value (*note Actions in Mid-Rule: Mid-Rule Actions.).
   4847 That is, Bison does not consider a mid-rule to have a semantic value if
   4848 you do not reference `$$' in the mid-rule's action or `$N' (where N is
   4849 the right-hand side symbol position of the mid-rule) in any later
   4850 action in that rule.  However, if you do reference either, the
   4851 Bison-generated parser will invoke the `<>' `%destructor' whenever it
   4852 discards the mid-rule symbol.
   4853 
   4854 
   4855    "Discarded symbols" are the following:
   4856 
   4857    * stacked symbols popped during the first phase of error recovery,
   4858 
   4859    * incoming terminals during the second phase of error recovery,
   4860 
   4861    * the current lookahead and the entire stack (except the current
   4862      right-hand side symbols) when the parser returns immediately, and
   4863 
   4864    * the current lookahead and the entire stack (including the current
   4865      right-hand side symbols) when the C++ parser (`lalr1.cc') catches
   4866      an exception in `parse',
   4867 
   4868    * the start symbol, when the parser succeeds.
   4869 
   4870    The parser can "return immediately" because of an explicit call to
   4871 `YYABORT' or `YYACCEPT', or failed error recovery, or memory exhaustion.
   4872 
   4873    Right-hand side symbols of a rule that explicitly triggers a syntax
   4874 error via `YYERROR' are not discarded automatically.  As a rule of
   4875 thumb, destructors are invoked only when user actions cannot manage the
   4876 memory.
   4877 
   4878 
   4879 File: bison.info,  Node: Printer Decl,  Next: Expect Decl,  Prev: Destructor Decl,  Up: Declarations
   4880 
   4881 3.8.8 Printing Semantic Values
   4882 ------------------------------
   4883 
   4884 When run-time traces are enabled (*note Tracing Your Parser: Tracing.),
   4885 the parser reports its actions, such as reductions.  When a symbol
   4886 involved in an action is reported, only its kind is displayed, as the
   4887 parser cannot know how semantic values should be formatted.
   4888 
   4889    The `%printer' directive defines code that is called when a symbol is
   4890 reported.  Its syntax is the same as `%destructor' (*note Freeing
   4891 Discarded Symbols: Destructor Decl.).
   4892 
   4893  -- Directive: %printer { CODE } SYMBOLS
   4894      Invoke the braced CODE whenever the parser displays one of the
   4895      SYMBOLS.  Within CODE, `yyoutput' denotes the output stream (a
   4896      `FILE*' in C, and an `std::ostream&' in C++), `$$' (or `$<TAG>$')
   4897      designates the semantic value associated with the symbol, and `@$'
   4898      its location.  The additional parser parameters are also available
   4899      (*note The Parser Function `yyparse': Parser Function.).
   4900 
   4901      The SYMBOLS are defined as for `%destructor' (*note Freeing
   4902      Discarded Symbols: Destructor Decl.): they can be per-type (e.g.,
   4903      `<ival>'), per-symbol (e.g., `exp', `NUM', `"float"'), typed
   4904      per-default (i.e., `<*>', or untyped per-default (i.e., `<>').
   4905 
   4906 For example:
   4907 
   4908      %union { char *string; }
   4909      %token <string> STRING1
   4910      %token <string> STRING2
   4911      %type  <string> string1
   4912      %type  <string> string2
   4913      %union { char character; }
   4914      %token <character> CHR
   4915      %type  <character> chr
   4916      %token TAGLESS
   4917 
   4918      %printer { fprintf (yyoutput, "'%c'", $$); } <character>
   4919      %printer { fprintf (yyoutput, "&%p", $$); } <*>
   4920      %printer { fprintf (yyoutput, "\"%s\"", $$); } STRING1 string1
   4921      %printer { fprintf (yyoutput, "<>"); } <>
   4922 
   4923 guarantees that, when the parser print any symbol that has a semantic
   4924 type tag other than `<character>', it display the address of the
   4925 semantic value by default.  However, when the parser displays a
   4926 `STRING1' or a `string1', it formats it as a string in double quotes.
   4927 It performs only the second `%printer' in this case, so it prints only
   4928 once.  Finally, the parser print `<>' for any symbol, such as `TAGLESS',
   4929 that has no semantic type tag.  See also
   4930 
   4931 
   4932 File: bison.info,  Node: Expect Decl,  Next: Start Decl,  Prev: Printer Decl,  Up: Declarations
   4933 
   4934 3.8.9 Suppressing Conflict Warnings
   4935 -----------------------------------
   4936 
   4937 Bison normally warns if there are any conflicts in the grammar (*note
   4938 Shift/Reduce Conflicts: Shift/Reduce.), but most real grammars have
   4939 harmless shift/reduce conflicts which are resolved in a predictable way
   4940 and would be difficult to eliminate.  It is desirable to suppress the
   4941 warning about these conflicts unless the number of conflicts changes.
   4942 You can do this with the `%expect' declaration.
   4943 
   4944    The declaration looks like this:
   4945 
   4946      %expect N
   4947 
   4948    Here N is a decimal integer.  The declaration says there should be N
   4949 shift/reduce conflicts and no reduce/reduce conflicts.  Bison reports
   4950 an error if the number of shift/reduce conflicts differs from N, or if
   4951 there are any reduce/reduce conflicts.
   4952 
   4953    For deterministic parsers, reduce/reduce conflicts are more serious,
   4954 and should be eliminated entirely.  Bison will always report
   4955 reduce/reduce conflicts for these parsers.  With GLR parsers, however,
   4956 both kinds of conflicts are routine; otherwise, there would be no need
   4957 to use GLR parsing.  Therefore, it is also possible to specify an
   4958 expected number of reduce/reduce conflicts in GLR parsers, using the
   4959 declaration:
   4960 
   4961      %expect-rr N
   4962 
   4963    In general, using `%expect' involves these steps:
   4964 
   4965    * Compile your grammar without `%expect'.  Use the `-v' option to
   4966      get a verbose list of where the conflicts occur.  Bison will also
   4967      print the number of conflicts.
   4968 
   4969    * Check each of the conflicts to make sure that Bison's default
   4970      resolution is what you really want.  If not, rewrite the grammar
   4971      and go back to the beginning.
   4972 
   4973    * Add an `%expect' declaration, copying the number N from the number
   4974      which Bison printed.  With GLR parsers, add an `%expect-rr'
   4975      declaration as well.
   4976 
   4977    Now Bison will report an error if you introduce an unexpected
   4978 conflict, but will keep silent otherwise.
   4979 
   4980 
   4981 File: bison.info,  Node: Start Decl,  Next: Pure Decl,  Prev: Expect Decl,  Up: Declarations
   4982 
   4983 3.8.10 The Start-Symbol
   4984 -----------------------
   4985 
   4986 Bison assumes by default that the start symbol for the grammar is the
   4987 first nonterminal specified in the grammar specification section.  The
   4988 programmer may override this restriction with the `%start' declaration
   4989 as follows:
   4990 
   4991      %start SYMBOL
   4992 
   4993 
   4994 File: bison.info,  Node: Pure Decl,  Next: Push Decl,  Prev: Start Decl,  Up: Declarations
   4995 
   4996 3.8.11 A Pure (Reentrant) Parser
   4997 --------------------------------
   4998 
   4999 A "reentrant" program is one which does not alter in the course of
   5000 execution; in other words, it consists entirely of "pure" (read-only)
   5001 code.  Reentrancy is important whenever asynchronous execution is
   5002 possible; for example, a nonreentrant program may not be safe to call
   5003 from a signal handler.  In systems with multiple threads of control, a
   5004 nonreentrant program must be called only within interlocks.
   5005 
   5006    Normally, Bison generates a parser which is not reentrant.  This is
   5007 suitable for most uses, and it permits compatibility with Yacc.  (The
   5008 standard Yacc interfaces are inherently nonreentrant, because they use
   5009 statically allocated variables for communication with `yylex',
   5010 including `yylval' and `yylloc'.)
   5011 
   5012    Alternatively, you can generate a pure, reentrant parser.  The Bison
   5013 declaration `%define api.pure' says that you want the parser to be
   5014 reentrant.  It looks like this:
   5015 
   5016      %define api.pure full
   5017 
   5018    The result is that the communication variables `yylval' and `yylloc'
   5019 become local variables in `yyparse', and a different calling convention
   5020 is used for the lexical analyzer function `yylex'.  *Note Calling
   5021 Conventions for Pure Parsers: Pure Calling, for the details of this.
   5022 The variable `yynerrs' becomes local in `yyparse' in pull mode but it
   5023 becomes a member of yypstate in push mode.  (*note The Error Reporting
   5024 Function `yyerror': Error Reporting.).  The convention for calling
   5025 `yyparse' itself is unchanged.
   5026 
   5027    Whether the parser is pure has nothing to do with the grammar rules.
   5028 You can generate either a pure parser or a nonreentrant parser from any
   5029 valid grammar.
   5030 
   5031 
   5032 File: bison.info,  Node: Push Decl,  Next: Decl Summary,  Prev: Pure Decl,  Up: Declarations
   5033 
   5034 3.8.12 A Push Parser
   5035 --------------------
   5036 
   5037 (The current push parsing interface is experimental and may evolve.
   5038 More user feedback will help to stabilize it.)
   5039 
   5040    A pull parser is called once and it takes control until all its input
   5041 is completely parsed.  A push parser, on the other hand, is called each
   5042 time a new token is made available.
   5043 
   5044    A push parser is typically useful when the parser is part of a main
   5045 event loop in the client's application.  This is typically a
   5046 requirement of a GUI, when the main event loop needs to be triggered
   5047 within a certain time period.
   5048 
   5049    Normally, Bison generates a pull parser.  The following Bison
   5050 declaration says that you want the parser to be a push parser (*note
   5051 api.push-pull: %define Summary.):
   5052 
   5053      %define api.push-pull push
   5054 
   5055    In almost all cases, you want to ensure that your push parser is also
   5056 a pure parser (*note A Pure (Reentrant) Parser: Pure Decl.).  The only
   5057 time you should create an impure push parser is to have backwards
   5058 compatibility with the impure Yacc pull mode interface.  Unless you know
   5059 what you are doing, your declarations should look like this:
   5060 
   5061      %define api.pure full
   5062      %define api.push-pull push
   5063 
   5064    There is a major notable functional difference between the pure push
   5065 parser and the impure push parser.  It is acceptable for a pure push
   5066 parser to have many parser instances, of the same type of parser, in
   5067 memory at the same time.  An impure push parser should only use one
   5068 parser at a time.
   5069 
   5070    When a push parser is selected, Bison will generate some new symbols
   5071 in the generated parser.  `yypstate' is a structure that the generated
   5072 parser uses to store the parser's state.  `yypstate_new' is the
   5073 function that will create a new parser instance.  `yypstate_delete'
   5074 will free the resources associated with the corresponding parser
   5075 instance.  Finally, `yypush_parse' is the function that should be
   5076 called whenever a token is available to provide the parser.  A trivial
   5077 example of using a pure push parser would look like this:
   5078 
   5079      int status;
   5080      yypstate *ps = yypstate_new ();
   5081      do {
   5082        status = yypush_parse (ps, yylex (), NULL);
   5083      } while (status == YYPUSH_MORE);
   5084      yypstate_delete (ps);
   5085 
   5086    If the user decided to use an impure push parser, a few things about
   5087 the generated parser will change.  The `yychar' variable becomes a
   5088 global variable instead of a variable in the `yypush_parse' function.
   5089 For this reason, the signature of the `yypush_parse' function is
   5090 changed to remove the token as a parameter.  A nonreentrant push parser
   5091 example would thus look like this:
   5092 
   5093      extern int yychar;
   5094      int status;
   5095      yypstate *ps = yypstate_new ();
   5096      do {
   5097        yychar = yylex ();
   5098        status = yypush_parse (ps);
   5099      } while (status == YYPUSH_MORE);
   5100      yypstate_delete (ps);
   5101 
   5102    That's it. Notice the next token is put into the global variable
   5103 `yychar' for use by the next invocation of the `yypush_parse' function.
   5104 
   5105    Bison also supports both the push parser interface along with the
   5106 pull parser interface in the same generated parser.  In order to get
   5107 this functionality, you should replace the `%define api.push-pull push'
   5108 declaration with the `%define api.push-pull both' declaration.  Doing
   5109 this will create all of the symbols mentioned earlier along with the
   5110 two extra symbols, `yyparse' and `yypull_parse'.  `yyparse' can be used
   5111 exactly as it normally would be used.  However, the user should note
   5112 that it is implemented in the generated parser by calling
   5113 `yypull_parse'.  This makes the `yyparse' function that is generated
   5114 with the `%define api.push-pull both' declaration slower than the normal
   5115 `yyparse' function.  If the user calls the `yypull_parse' function it
   5116 will parse the rest of the input stream.  It is possible to
   5117 `yypush_parse' tokens to select a subgrammar and then `yypull_parse'
   5118 the rest of the input stream.  If you would like to switch back and
   5119 forth between between parsing styles, you would have to write your own
   5120 `yypull_parse' function that knows when to quit looking for input.  An
   5121 example of using the `yypull_parse' function would look like this:
   5122 
   5123      yypstate *ps = yypstate_new ();
   5124      yypull_parse (ps); /* Will call the lexer */
   5125      yypstate_delete (ps);
   5126 
   5127    Adding the `%define api.pure full' declaration does exactly the same
   5128 thing to the generated parser with `%define api.push-pull both' as it
   5129 did for `%define api.push-pull push'.
   5130 
   5131 
   5132 File: bison.info,  Node: Decl Summary,  Next: %define Summary,  Prev: Push Decl,  Up: Declarations
   5133 
   5134 3.8.13 Bison Declaration Summary
   5135 --------------------------------
   5136 
   5137 Here is a summary of the declarations used to define a grammar:
   5138 
   5139  -- Directive: %union
   5140      Declare the collection of data types that semantic values may have
   5141      (*note The Collection of Value Types: Union Decl.).
   5142 
   5143  -- Directive: %token
   5144      Declare a terminal symbol (token type name) with no precedence or
   5145      associativity specified (*note Token Type Names: Token Decl.).
   5146 
   5147  -- Directive: %right
   5148      Declare a terminal symbol (token type name) that is
   5149      right-associative (*note Operator Precedence: Precedence Decl.).
   5150 
   5151  -- Directive: %left
   5152      Declare a terminal symbol (token type name) that is
   5153      left-associative (*note Operator Precedence: Precedence Decl.).
   5154 
   5155  -- Directive: %nonassoc
   5156      Declare a terminal symbol (token type name) that is nonassociative
   5157      (*note Operator Precedence: Precedence Decl.).  Using it in a way
   5158      that would be associative is a syntax error.
   5159 
   5160  -- Directive: %type
   5161      Declare the type of semantic values for a nonterminal symbol
   5162      (*note Nonterminal Symbols: Type Decl.).
   5163 
   5164  -- Directive: %start
   5165      Specify the grammar's start symbol (*note The Start-Symbol: Start
   5166      Decl.).
   5167 
   5168  -- Directive: %expect
   5169      Declare the expected number of shift-reduce conflicts (*note
   5170      Suppressing Conflict Warnings: Expect Decl.).
   5171 
   5172 
   5173 In order to change the behavior of `bison', use the following
   5174 directives:
   5175 
   5176  -- Directive: %code {CODE}
   5177  -- Directive: %code QUALIFIER {CODE}
   5178      Insert CODE verbatim into the output parser source at the default
   5179      location or at the location specified by QUALIFIER.  *Note %code
   5180      Summary::.
   5181 
   5182  -- Directive: %debug
   5183      In the parser implementation file, define the macro `YYDEBUG' (or
   5184      `PREFIXDEBUG' with `%define api.prefix PREFIX', see *note Multiple
   5185      Parsers in the Same Program: Multiple Parsers.) to 1 if it is not
   5186      already defined, so that the debugging facilities are compiled.
   5187      *Note Tracing Your Parser: Tracing.
   5188 
   5189  -- Directive: %define VARIABLE
   5190  -- Directive: %define VARIABLE VALUE
   5191  -- Directive: %define VARIABLE "VALUE"
   5192      Define a variable to adjust Bison's behavior.  *Note %define
   5193      Summary::.
   5194 
   5195  -- Directive: %defines
   5196      Write a parser header file containing macro definitions for the
   5197      token type names defined in the grammar as well as a few other
   5198      declarations.  If the parser implementation file is named `NAME.c'
   5199      then the parser header file is named `NAME.h'.
   5200 
   5201      For C parsers, the parser header file declares `YYSTYPE' unless
   5202      `YYSTYPE' is already defined as a macro or you have used a
   5203      `<TYPE>' tag without using `%union'.  Therefore, if you are using
   5204      a `%union' (*note More Than One Value Type: Multiple Types.) with
   5205      components that require other definitions, or if you have defined
   5206      a `YYSTYPE' macro or type definition (*note Data Types of Semantic
   5207      Values: Value Type.), you need to arrange for these definitions to
   5208      be propagated to all modules, e.g., by putting them in a
   5209      prerequisite header that is included both by your parser and by any
   5210      other module that needs `YYSTYPE'.
   5211 
   5212      Unless your parser is pure, the parser header file declares
   5213      `yylval' as an external variable.  *Note A Pure (Reentrant)
   5214      Parser: Pure Decl.
   5215 
   5216      If you have also used locations, the parser header file declares
   5217      `YYLTYPE' and `yylloc' using a protocol similar to that of the
   5218      `YYSTYPE' macro and `yylval'.  *Note Tracking Locations::.
   5219 
   5220      This parser header file is normally essential if you wish to put
   5221      the definition of `yylex' in a separate source file, because
   5222      `yylex' typically needs to be able to refer to the above-mentioned
   5223      declarations and to the token type codes.  *Note Semantic Values
   5224      of Tokens: Token Values.
   5225 
   5226      If you have declared `%code requires' or `%code provides', the
   5227      output header also contains their code.  *Note %code Summary::.
   5228 
   5229      The generated header is protected against multiple inclusions with
   5230      a C preprocessor guard: `YY_PREFIX_FILE_INCLUDED', where PREFIX
   5231      and FILE are the prefix (*note Multiple Parsers in the Same
   5232      Program: Multiple Parsers.) and generated file name turned
   5233      uppercase, with each series of non alphanumerical characters
   5234      converted to a single underscore.
   5235 
   5236      For instance with `%define api.prefix "calc"' and `%defines
   5237      "lib/parse.h"', the header will be guarded as follows.
   5238           #ifndef YY_CALC_LIB_PARSE_H_INCLUDED
   5239           # define YY_CALC_LIB_PARSE_H_INCLUDED
   5240           ...
   5241           #endif /* ! YY_CALC_LIB_PARSE_H_INCLUDED */
   5242 
   5243  -- Directive: %defines DEFINES-FILE
   5244      Same as above, but save in the file DEFINES-FILE.
   5245 
   5246  -- Directive: %destructor
   5247      Specify how the parser should reclaim the memory associated to
   5248      discarded symbols.  *Note Freeing Discarded Symbols: Destructor
   5249      Decl.
   5250 
   5251  -- Directive: %file-prefix "PREFIX"
   5252      Specify a prefix to use for all Bison output file names.  The names
   5253      are chosen as if the grammar file were named `PREFIX.y'.
   5254 
   5255  -- Directive: %language "LANGUAGE"
   5256      Specify the programming language for the generated parser.
   5257      Currently supported languages include C, C++, and Java.  LANGUAGE
   5258      is case-insensitive.
   5259 
   5260 
   5261  -- Directive: %locations
   5262      Generate the code processing the locations (*note Special Features
   5263      for Use in Actions: Action Features.).  This mode is enabled as
   5264      soon as the grammar uses the special `@N' tokens, but if your
   5265      grammar does not use it, using `%locations' allows for more
   5266      accurate syntax error messages.
   5267 
   5268  -- Directive: %no-lines
   5269      Don't generate any `#line' preprocessor commands in the parser
   5270      implementation file.  Ordinarily Bison writes these commands in the
   5271      parser implementation file so that the C compiler and debuggers
   5272      will associate errors and object code with your source file (the
   5273      grammar file).  This directive causes them to associate errors
   5274      with the parser implementation file, treating it as an independent
   5275      source file in its own right.
   5276 
   5277  -- Directive: %output "FILE"
   5278      Specify FILE for the parser implementation file.
   5279 
   5280  -- Directive: %pure-parser
   5281      Deprecated version of `%define api.pure' (*note api.pure: %define
   5282      Summary.), for which Bison is more careful to warn about
   5283      unreasonable usage.
   5284 
   5285  -- Directive: %require "VERSION"
   5286      Require version VERSION or higher of Bison.  *Note Require a
   5287      Version of Bison: Require Decl.
   5288 
   5289  -- Directive: %skeleton "FILE"
   5290      Specify the skeleton to use.
   5291 
   5292      If FILE does not contain a `/', FILE is the name of a skeleton
   5293      file in the Bison installation directory.  If it does, FILE is an
   5294      absolute file name or a file name relative to the directory of the
   5295      grammar file.  This is similar to how most shells resolve commands.
   5296 
   5297  -- Directive: %token-table
   5298      Generate an array of token names in the parser implementation file.
   5299      The name of the array is `yytname'; `yytname[I]' is the name of
   5300      the token whose internal Bison token code number is I.  The first
   5301      three elements of `yytname' correspond to the predefined tokens
   5302      `"$end"', `"error"', and `"$undefined"'; after these come the
   5303      symbols defined in the grammar file.
   5304 
   5305      The name in the table includes all the characters needed to
   5306      represent the token in Bison.  For single-character literals and
   5307      literal strings, this includes the surrounding quoting characters
   5308      and any escape sequences.  For example, the Bison single-character
   5309      literal `'+'' corresponds to a three-character name, represented
   5310      in C as `"'+'"'; and the Bison two-character literal string `"\\/"'
   5311      corresponds to a five-character name, represented in C as
   5312      `"\"\\\\/\""'.
   5313 
   5314      When you specify `%token-table', Bison also generates macro
   5315      definitions for macros `YYNTOKENS', `YYNNTS', and `YYNRULES', and
   5316      `YYNSTATES':
   5317 
   5318     `YYNTOKENS'
   5319           The highest token number, plus one.
   5320 
   5321     `YYNNTS'
   5322           The number of nonterminal symbols.
   5323 
   5324     `YYNRULES'
   5325           The number of grammar rules,
   5326 
   5327     `YYNSTATES'
   5328           The number of parser states (*note Parser States::).
   5329 
   5330  -- Directive: %verbose
   5331      Write an extra output file containing verbose descriptions of the
   5332      parser states and what is done for each type of lookahead token in
   5333      that state.  *Note Understanding Your Parser: Understanding, for
   5334      more information.
   5335 
   5336  -- Directive: %yacc
   5337      Pretend the option `--yacc' was given, i.e., imitate Yacc,
   5338      including its naming conventions.  *Note Bison Options::, for more.
   5339 
   5340 
   5341 File: bison.info,  Node: %define Summary,  Next: %code Summary,  Prev: Decl Summary,  Up: Declarations
   5342 
   5343 3.8.14 %define Summary
   5344 ----------------------
   5345 
   5346 There are many features of Bison's behavior that can be controlled by
   5347 assigning the feature a single value.  For historical reasons, some
   5348 such features are assigned values by dedicated directives, such as
   5349 `%start', which assigns the start symbol.  However, newer such features
   5350 are associated with variables, which are assigned by the `%define'
   5351 directive:
   5352 
   5353  -- Directive: %define VARIABLE
   5354  -- Directive: %define VARIABLE VALUE
   5355  -- Directive: %define VARIABLE "VALUE"
   5356      Define VARIABLE to VALUE.
   5357 
   5358      VALUE must be placed in quotation marks if it contains any
   5359      character other than a letter, underscore, period, or non-initial
   5360      dash or digit.  Omitting `"VALUE"' entirely is always equivalent
   5361      to specifying `""'.
   5362 
   5363      It is an error if a VARIABLE is defined by `%define' multiple
   5364      times, but see *note -D NAME[=VALUE]: Bison Options.
   5365 
   5366    The rest of this section summarizes variables and values that
   5367 `%define' accepts.
   5368 
   5369    Some VARIABLEs take Boolean values.  In this case, Bison will
   5370 complain if the variable definition does not meet one of the following
   5371 four conditions:
   5372 
   5373   1. `VALUE' is `true'
   5374 
   5375   2. `VALUE' is omitted (or `""' is specified).  This is equivalent to
   5376      `true'.
   5377 
   5378   3. `VALUE' is `false'.
   5379 
   5380   4. VARIABLE is never defined.  In this case, Bison selects a default
   5381      value.
   5382 
   5383    What VARIABLEs are accepted, as well as their meanings and default
   5384 values, depend on the selected target language and/or the parser
   5385 skeleton (*note %language: Decl Summary, *note %skeleton: Decl
   5386 Summary.).  Unaccepted VARIABLEs produce an error.  Some of the
   5387 accepted VARIABLEs are:
   5388 
   5389    * `api.location.type' 
   5390 
   5391         * Language(s): C++, Java
   5392 
   5393         * Purpose: Define the location type.  *Note User Defined
   5394           Location Type::.
   5395 
   5396         * Accepted Values: String
   5397 
   5398         * Default Value: none
   5399 
   5400         * History: introduced in Bison 2.7
   5401 
   5402    * `api.prefix' 
   5403 
   5404         * Language(s): All
   5405 
   5406         * Purpose: Rename exported symbols.  *Note Multiple Parsers in
   5407           the Same Program: Multiple Parsers.
   5408 
   5409         * Accepted Values: String
   5410 
   5411         * Default Value: `yy'
   5412 
   5413         * History: introduced in Bison 2.6
   5414 
   5415    * `api.pure' 
   5416 
   5417         * Language(s): C
   5418 
   5419         * Purpose: Request a pure (reentrant) parser program.  *Note A
   5420           Pure (Reentrant) Parser: Pure Decl.
   5421 
   5422         * Accepted Values: `true', `false', `full'
   5423 
   5424           The value may be omitted: this is equivalent to specifying
   5425           `true', as is the case for Boolean values.
   5426 
   5427           When `%define api.pure full' is used, the parser is made
   5428           reentrant. This changes the signature for `yylex' (*note Pure
   5429           Calling::), and also that of `yyerror' when the tracking of
   5430           locations has been activated, as shown below.
   5431 
   5432           The `true' value is very similar to the `full' value, the only
   5433           difference is in the signature of `yyerror' on Yacc parsers
   5434           without `%parse-param', for historical reasons.
   5435 
   5436           I.e., if `%locations %define api.pure' is passed then the
   5437           prototypes for `yyerror' are:
   5438 
   5439                void yyerror (char const *msg);                 // Yacc parsers.
   5440                void yyerror (YYLTYPE *locp, char const *msg);  // GLR parsers.
   5441 
   5442           But if `%locations %define api.pure %parse-param {int
   5443           *nastiness}' is used, then both parsers have the same
   5444           signature:
   5445 
   5446                void yyerror (YYLTYPE *llocp, int *nastiness, char const *msg);
   5447 
   5448           (*note The Error Reporting Function `yyerror': Error
   5449           Reporting.)
   5450 
   5451         * Default Value: `false'
   5452 
   5453         * History: the `full' value was introduced in Bison 2.7
   5454 
   5455    * `api.push-pull' 
   5456 
   5457         * Language(s): C (deterministic parsers only)
   5458 
   5459         * Purpose: Request a pull parser, a push parser, or both.
   5460           *Note A Push Parser: Push Decl.  (The current push parsing
   5461           interface is experimental and may evolve.  More user feedback
   5462           will help to stabilize it.)
   5463 
   5464         * Accepted Values: `pull', `push', `both'
   5465 
   5466         * Default Value: `pull'
   5467 
   5468    * `lr.default-reductions' 
   5469 
   5470         * Language(s): all
   5471 
   5472         * Purpose: Specify the kind of states that are permitted to
   5473           contain default reductions.  *Note Default Reductions::.
   5474           (The ability to specify where default reductions should be
   5475           used is experimental.  More user feedback will help to
   5476           stabilize it.)
   5477 
   5478         * Accepted Values: `most', `consistent', `accepting'
   5479 
   5480         * Default Value:
   5481              * `accepting' if `lr.type' is `canonical-lr'.
   5482 
   5483              * `most' otherwise.
   5484 
   5485    * `lr.keep-unreachable-states' 
   5486 
   5487         * Language(s): all
   5488 
   5489         * Purpose: Request that Bison allow unreachable parser states to
   5490           remain in the parser tables.  *Note Unreachable States::.
   5491 
   5492         * Accepted Values: Boolean
   5493 
   5494         * Default Value: `false'
   5495 
   5496    * `lr.type' 
   5497 
   5498         * Language(s): all
   5499 
   5500         * Purpose: Specify the type of parser tables within the LR(1)
   5501           family.  *Note LR Table Construction::.  (This feature is
   5502           experimental.  More user feedback will help to stabilize it.)
   5503 
   5504         * Accepted Values: `lalr', `ielr', `canonical-lr'
   5505 
   5506         * Default Value: `lalr'
   5507 
   5508    * `namespace' 
   5509 
   5510         * Languages(s): C++
   5511 
   5512         * Purpose: Specify the namespace for the parser class.  For
   5513           example, if you specify:
   5514 
   5515                %define namespace "foo::bar"
   5516 
   5517           Bison uses `foo::bar' verbatim in references such as:
   5518 
   5519                foo::bar::parser::semantic_type
   5520 
   5521           However, to open a namespace, Bison removes any leading `::'
   5522           and then splits on any remaining occurrences:
   5523 
   5524                namespace foo { namespace bar {
   5525                  class position;
   5526                  class location;
   5527                } }
   5528 
   5529         * Accepted Values: Any absolute or relative C++ namespace
   5530           reference without a trailing `"::"'.  For example, `"foo"' or
   5531           `"::foo::bar"'.
   5532 
   5533         * Default Value: The value specified by `%name-prefix', which
   5534           defaults to `yy'.  This usage of `%name-prefix' is for
   5535           backward compatibility and can be confusing since
   5536           `%name-prefix' also specifies the textual prefix for the
   5537           lexical analyzer function.  Thus, if you specify
   5538           `%name-prefix', it is best to also specify `%define
   5539           namespace' so that `%name-prefix' _only_ affects the lexical
   5540           analyzer function.  For example, if you specify:
   5541 
   5542                %define namespace "foo"
   5543                %name-prefix "bar::"
   5544 
   5545           The parser namespace is `foo' and `yylex' is referenced as
   5546           `bar::lex'.
   5547 
   5548    * `parse.lac' 
   5549 
   5550         * Languages(s): C (deterministic parsers only)
   5551 
   5552         * Purpose: Enable LAC (lookahead correction) to improve syntax
   5553           error handling.  *Note LAC::.
   5554 
   5555         * Accepted Values: `none', `full'
   5556 
   5557         * Default Value: `none'
   5558 
   5559 
   5560 File: bison.info,  Node: %code Summary,  Prev: %define Summary,  Up: Declarations
   5561 
   5562 3.8.15 %code Summary
   5563 --------------------
   5564 
   5565 The `%code' directive inserts code verbatim into the output parser
   5566 source at any of a predefined set of locations.  It thus serves as a
   5567 flexible and user-friendly alternative to the traditional Yacc
   5568 prologue, `%{CODE%}'.  This section summarizes the functionality of
   5569 `%code' for the various target languages supported by Bison.  For a
   5570 detailed discussion of how to use `%code' in place of `%{CODE%}' for
   5571 C/C++ and why it is advantageous to do so, *note Prologue
   5572 Alternatives::.
   5573 
   5574  -- Directive: %code {CODE}
   5575      This is the unqualified form of the `%code' directive.  It inserts
   5576      CODE verbatim at a language-dependent default location in the
   5577      parser implementation.
   5578 
   5579      For C/C++, the default location is the parser implementation file
   5580      after the usual contents of the parser header file.  Thus, the
   5581      unqualified form replaces `%{CODE%}' for most purposes.
   5582 
   5583      For Java, the default location is inside the parser class.
   5584 
   5585  -- Directive: %code QUALIFIER {CODE}
   5586      This is the qualified form of the `%code' directive.  QUALIFIER
   5587      identifies the purpose of CODE and thus the location(s) where
   5588      Bison should insert it.  That is, if you need to specify
   5589      location-sensitive CODE that does not belong at the default
   5590      location selected by the unqualified `%code' form, use this form
   5591      instead.
   5592 
   5593    For any particular qualifier or for the unqualified form, if there
   5594 are multiple occurrences of the `%code' directive, Bison concatenates
   5595 the specified code in the order in which it appears in the grammar file.
   5596 
   5597    Not all qualifiers are accepted for all target languages.  Unaccepted
   5598 qualifiers produce an error.  Some of the accepted qualifiers are:
   5599 
   5600    * requires 
   5601 
   5602         * Language(s): C, C++
   5603 
   5604         * Purpose: This is the best place to write dependency code
   5605           required for `YYSTYPE' and `YYLTYPE'.  In other words, it's
   5606           the best place to define types referenced in `%union'
   5607           directives, and it's the best place to override Bison's
   5608           default `YYSTYPE' and `YYLTYPE' definitions.
   5609 
   5610         * Location(s): The parser header file and the parser
   5611           implementation file before the Bison-generated `YYSTYPE' and
   5612           `YYLTYPE' definitions.
   5613 
   5614    * provides 
   5615 
   5616         * Language(s): C, C++
   5617 
   5618         * Purpose: This is the best place to write additional
   5619           definitions and declarations that should be provided to other
   5620           modules.
   5621 
   5622         * Location(s): The parser header file and the parser
   5623           implementation file after the Bison-generated `YYSTYPE',
   5624           `YYLTYPE', and token definitions.
   5625 
   5626    * top 
   5627 
   5628         * Language(s): C, C++
   5629 
   5630         * Purpose: The unqualified `%code' or `%code requires' should
   5631           usually be more appropriate than `%code top'.  However,
   5632           occasionally it is necessary to insert code much nearer the
   5633           top of the parser implementation file.  For example:
   5634 
   5635                %code top {
   5636                  #define _GNU_SOURCE
   5637                  #include <stdio.h>
   5638                }
   5639 
   5640         * Location(s): Near the top of the parser implementation file.
   5641 
   5642    * imports 
   5643 
   5644         * Language(s): Java
   5645 
   5646         * Purpose: This is the best place to write Java import
   5647           directives.
   5648 
   5649         * Location(s): The parser Java file after any Java package
   5650           directive and before any class definitions.
   5651 
   5652    Though we say the insertion locations are language-dependent, they
   5653 are technically skeleton-dependent.  Writers of non-standard skeletons
   5654 however should choose their locations consistently with the behavior of
   5655 the standard Bison skeletons.
   5656 
   5657 
   5658 File: bison.info,  Node: Multiple Parsers,  Prev: Declarations,  Up: Grammar File
   5659 
   5660 3.9 Multiple Parsers in the Same Program
   5661 ========================================
   5662 
   5663 Most programs that use Bison parse only one language and therefore
   5664 contain only one Bison parser.  But what if you want to parse more than
   5665 one language with the same program?  Then you need to avoid name
   5666 conflicts between different definitions of functions and variables such
   5667 as `yyparse', `yylval'.  To use different parsers from the same
   5668 compilation unit, you also need to avoid conflicts on types and macros
   5669 (e.g., `YYSTYPE') exported in the generated header.
   5670 
   5671    The easy way to do this is to define the `%define' variable
   5672 `api.prefix'.  With different `api.prefix's it is guaranteed that
   5673 headers do not conflict when included together, and that compiled
   5674 objects can be linked together too.  Specifying `%define api.prefix
   5675 PREFIX' (or passing the option `-Dapi.prefix=PREFIX', see *note
   5676 Invoking Bison: Invocation.) renames the interface functions and
   5677 variables of the Bison parser to start with PREFIX instead of `yy', and
   5678 all the macros to start by PREFIX (i.e., PREFIX upper-cased) instead of
   5679 `YY'.
   5680 
   5681    The renamed symbols include `yyparse', `yylex', `yyerror',
   5682 `yynerrs', `yylval', `yylloc', `yychar' and `yydebug'.  If you use a
   5683 push parser, `yypush_parse', `yypull_parse', `yypstate', `yypstate_new'
   5684 and `yypstate_delete' will also be renamed.  The renamed macros include
   5685 `YYSTYPE', `YYLTYPE', and `YYDEBUG', which is treated specifically --
   5686 more about this below.
   5687 
   5688    For example, if you use `%define api.prefix c', the names become
   5689 `cparse', `clex', ..., `CSTYPE', `CLTYPE', and so on.
   5690 
   5691    The `%define' variable `api.prefix' works in two different ways.  In
   5692 the implementation file, it works by adding macro definitions to the
   5693 beginning of the parser implementation file, defining `yyparse' as
   5694 `PREFIXparse', and so on:
   5695 
   5696      #define YYSTYPE CTYPE
   5697      #define yyparse cparse
   5698      #define yylval  clval
   5699      ...
   5700      YYSTYPE yylval;
   5701      int yyparse (void);
   5702 
   5703    This effectively substitutes one name for the other in the entire
   5704 parser implementation file, thus the "original" names (`yylex',
   5705 `YYSTYPE', ...) are also usable in the parser implementation file.
   5706 
   5707    However, in the parser header file, the symbols are defined renamed,
   5708 for instance:
   5709 
   5710      extern CSTYPE clval;
   5711      int cparse (void);
   5712 
   5713    The macro `YYDEBUG' is commonly used to enable the tracing support in
   5714 parsers.  To comply with this tradition, when `api.prefix' is used,
   5715 `YYDEBUG' (not renamed) is used as a default value:
   5716 
   5717      /* Enabling traces.  */
   5718      #ifndef CDEBUG
   5719      # if defined YYDEBUG
   5720      #  if YYDEBUG
   5721      #   define CDEBUG 1
   5722      #  else
   5723      #   define CDEBUG 0
   5724      #  endif
   5725      # else
   5726      #  define CDEBUG 0
   5727      # endif
   5728      #endif
   5729      #if CDEBUG
   5730      extern int cdebug;
   5731      #endif
   5732 
   5733 
   5734 
   5735    Prior to Bison 2.6, a feature similar to `api.prefix' was provided by
   5736 the obsolete directive `%name-prefix' (*note Bison Symbols: Table of
   5737 Symbols.) and the option `--name-prefix' (*note Bison Options::).
   5738 
   5739 
   5740 File: bison.info,  Node: Interface,  Next: Algorithm,  Prev: Grammar File,  Up: Top
   5741 
   5742 4 Parser C-Language Interface
   5743 *****************************
   5744 
   5745 The Bison parser is actually a C function named `yyparse'.  Here we
   5746 describe the interface conventions of `yyparse' and the other functions
   5747 that it needs to use.
   5748 
   5749    Keep in mind that the parser uses many C identifiers starting with
   5750 `yy' and `YY' for internal purposes.  If you use such an identifier
   5751 (aside from those in this manual) in an action or in epilogue in the
   5752 grammar file, you are likely to run into trouble.
   5753 
   5754 * Menu:
   5755 
   5756 * Parser Function::         How to call `yyparse' and what it returns.
   5757 * Push Parser Function::    How to call `yypush_parse' and what it returns.
   5758 * Pull Parser Function::    How to call `yypull_parse' and what it returns.
   5759 * Parser Create Function::  How to call `yypstate_new' and what it returns.
   5760 * Parser Delete Function::  How to call `yypstate_delete' and what it returns.
   5761 * Lexical::                 You must supply a function `yylex'
   5762                               which reads tokens.
   5763 * Error Reporting::         You must supply a function `yyerror'.
   5764 * Action Features::         Special features for use in actions.
   5765 * Internationalization::    How to let the parser speak in the user's
   5766                               native language.
   5767 
   5768 
   5769 File: bison.info,  Node: Parser Function,  Next: Push Parser Function,  Up: Interface
   5770 
   5771 4.1 The Parser Function `yyparse'
   5772 =================================
   5773 
   5774 You call the function `yyparse' to cause parsing to occur.  This
   5775 function reads tokens, executes actions, and ultimately returns when it
   5776 encounters end-of-input or an unrecoverable syntax error.  You can also
   5777 write an action which directs `yyparse' to return immediately without
   5778 reading further.
   5779 
   5780  -- Function: int yyparse (void)
   5781      The value returned by `yyparse' is 0 if parsing was successful
   5782      (return is due to end-of-input).
   5783 
   5784      The value is 1 if parsing failed because of invalid input, i.e.,
   5785      input that contains a syntax error or that causes `YYABORT' to be
   5786      invoked.
   5787 
   5788      The value is 2 if parsing failed due to memory exhaustion.
   5789 
   5790    In an action, you can cause immediate return from `yyparse' by using
   5791 these macros:
   5792 
   5793  -- Macro: YYACCEPT
   5794      Return immediately with value 0 (to report success).
   5795 
   5796  -- Macro: YYABORT
   5797      Return immediately with value 1 (to report failure).
   5798 
   5799    If you use a reentrant parser, you can optionally pass additional
   5800 parameter information to it in a reentrant way.  To do so, use the
   5801 declaration `%parse-param':
   5802 
   5803  -- Directive: %parse-param {ARGUMENT-DECLARATION}
   5804      Declare that an argument declared by the braced-code
   5805      ARGUMENT-DECLARATION is an additional `yyparse' argument.  The
   5806      ARGUMENT-DECLARATION is used when declaring functions or
   5807      prototypes.  The last identifier in ARGUMENT-DECLARATION must be
   5808      the argument name.
   5809 
   5810    Here's an example.  Write this in the parser:
   5811 
   5812      %parse-param {int *nastiness}
   5813      %parse-param {int *randomness}
   5814 
   5815 Then call the parser like this:
   5816 
   5817      {
   5818        int nastiness, randomness;
   5819        ...  /* Store proper data in `nastiness' and `randomness'.  */
   5820        value = yyparse (&nastiness, &randomness);
   5821        ...
   5822      }
   5823 
   5824 In the grammar actions, use expressions like this to refer to the data:
   5825 
   5826      exp: ...    { ...; *randomness += 1; ... }
   5827 
   5828 Using the following:
   5829      %parse-param {int *randomness}
   5830 
   5831    Results in these signatures:
   5832      void yyerror (int *randomness, const char *msg);
   5833      int  yyparse (int *randomness);
   5834 
   5835 Or, if both `%define api.pure full' (or just `%define api.pure') and
   5836 `%locations' are used:
   5837 
   5838      void yyerror (YYLTYPE *llocp, int *randomness, const char *msg);
   5839      int  yyparse (int *randomness);
   5840 
   5841 
   5842 File: bison.info,  Node: Push Parser Function,  Next: Pull Parser Function,  Prev: Parser Function,  Up: Interface
   5843 
   5844 4.2 The Push Parser Function `yypush_parse'
   5845 ===========================================
   5846 
   5847 (The current push parsing interface is experimental and may evolve.
   5848 More user feedback will help to stabilize it.)
   5849 
   5850    You call the function `yypush_parse' to parse a single token.  This
   5851 function is available if either the `%define api.push-pull push' or
   5852 `%define api.push-pull both' declaration is used.  *Note A Push Parser:
   5853 Push Decl.
   5854 
   5855  -- Function: int yypush_parse (yypstate *yyps)
   5856      The value returned by `yypush_parse' is the same as for yyparse
   5857      with the following exception: it returns `YYPUSH_MORE' if more
   5858      input is required to finish parsing the grammar.
   5859 
   5860 
   5861 File: bison.info,  Node: Pull Parser Function,  Next: Parser Create Function,  Prev: Push Parser Function,  Up: Interface
   5862 
   5863 4.3 The Pull Parser Function `yypull_parse'
   5864 ===========================================
   5865 
   5866 (The current push parsing interface is experimental and may evolve.
   5867 More user feedback will help to stabilize it.)
   5868 
   5869    You call the function `yypull_parse' to parse the rest of the input
   5870 stream.  This function is available if the `%define api.push-pull both'
   5871 declaration is used.  *Note A Push Parser: Push Decl.
   5872 
   5873  -- Function: int yypull_parse (yypstate *yyps)
   5874      The value returned by `yypull_parse' is the same as for `yyparse'.
   5875 
   5876 
   5877 File: bison.info,  Node: Parser Create Function,  Next: Parser Delete Function,  Prev: Pull Parser Function,  Up: Interface
   5878 
   5879 4.4 The Parser Create Function `yystate_new'
   5880 ============================================
   5881 
   5882 (The current push parsing interface is experimental and may evolve.
   5883 More user feedback will help to stabilize it.)
   5884 
   5885    You call the function `yypstate_new' to create a new parser instance.
   5886 This function is available if either the `%define api.push-pull push' or
   5887 `%define api.push-pull both' declaration is used.  *Note A Push Parser:
   5888 Push Decl.
   5889 
   5890  -- Function: yypstate* yypstate_new (void)
   5891      The function will return a valid parser instance if there was
   5892      memory available or 0 if no memory was available.  In impure mode,
   5893      it will also return 0 if a parser instance is currently allocated.
   5894 
   5895 
   5896 File: bison.info,  Node: Parser Delete Function,  Next: Lexical,  Prev: Parser Create Function,  Up: Interface
   5897 
   5898 4.5 The Parser Delete Function `yystate_delete'
   5899 ===============================================
   5900 
   5901 (The current push parsing interface is experimental and may evolve.
   5902 More user feedback will help to stabilize it.)
   5903 
   5904    You call the function `yypstate_delete' to delete a parser instance.
   5905 function is available if either the `%define api.push-pull push' or
   5906 `%define api.push-pull both' declaration is used.  *Note A Push Parser:
   5907 Push Decl.
   5908 
   5909  -- Function: void yypstate_delete (yypstate *yyps)
   5910      This function will reclaim the memory associated with a parser
   5911      instance.  After this call, you should no longer attempt to use
   5912      the parser instance.
   5913 
   5914 
   5915 File: bison.info,  Node: Lexical,  Next: Error Reporting,  Prev: Parser Delete Function,  Up: Interface
   5916 
   5917 4.6 The Lexical Analyzer Function `yylex'
   5918 =========================================
   5919 
   5920 The "lexical analyzer" function, `yylex', recognizes tokens from the
   5921 input stream and returns them to the parser.  Bison does not create
   5922 this function automatically; you must write it so that `yyparse' can
   5923 call it.  The function is sometimes referred to as a lexical scanner.
   5924 
   5925    In simple programs, `yylex' is often defined at the end of the Bison
   5926 grammar file.  If `yylex' is defined in a separate source file, you
   5927 need to arrange for the token-type macro definitions to be available
   5928 there.  To do this, use the `-d' option when you run Bison, so that it
   5929 will write these macro definitions into the separate parser header
   5930 file, `NAME.tab.h', which you can include in the other source files
   5931 that need it.  *Note Invoking Bison: Invocation.
   5932 
   5933 * Menu:
   5934 
   5935 * Calling Convention::  How `yyparse' calls `yylex'.
   5936 * Token Values::        How `yylex' must return the semantic value
   5937                           of the token it has read.
   5938 * Token Locations::     How `yylex' must return the text location
   5939                           (line number, etc.) of the token, if the
   5940                           actions want that.
   5941 * Pure Calling::        How the calling convention differs in a pure parser
   5942                           (*note A Pure (Reentrant) Parser: Pure Decl.).
   5943 
   5944 
   5945 File: bison.info,  Node: Calling Convention,  Next: Token Values,  Up: Lexical
   5946 
   5947 4.6.1 Calling Convention for `yylex'
   5948 ------------------------------------
   5949 
   5950 The value that `yylex' returns must be the positive numeric code for
   5951 the type of token it has just found; a zero or negative value signifies
   5952 end-of-input.
   5953 
   5954    When a token is referred to in the grammar rules by a name, that name
   5955 in the parser implementation file becomes a C macro whose definition is
   5956 the proper numeric code for that token type.  So `yylex' can use the
   5957 name to indicate that type.  *Note Symbols::.
   5958 
   5959    When a token is referred to in the grammar rules by a character
   5960 literal, the numeric code for that character is also the code for the
   5961 token type.  So `yylex' can simply return that character code, possibly
   5962 converted to `unsigned char' to avoid sign-extension.  The null
   5963 character must not be used this way, because its code is zero and that
   5964 signifies end-of-input.
   5965 
   5966    Here is an example showing these things:
   5967 
   5968      int
   5969      yylex (void)
   5970      {
   5971        ...
   5972        if (c == EOF)    /* Detect end-of-input.  */
   5973          return 0;
   5974        ...
   5975        if (c == '+' || c == '-')
   5976          return c;      /* Assume token type for `+' is '+'.  */
   5977        ...
   5978        return INT;      /* Return the type of the token.  */
   5979        ...
   5980      }
   5981 
   5982 This interface has been designed so that the output from the `lex'
   5983 utility can be used without change as the definition of `yylex'.
   5984 
   5985    If the grammar uses literal string tokens, there are two ways that
   5986 `yylex' can determine the token type codes for them:
   5987 
   5988    * If the grammar defines symbolic token names as aliases for the
   5989      literal string tokens, `yylex' can use these symbolic names like
   5990      all others.  In this case, the use of the literal string tokens in
   5991      the grammar file has no effect on `yylex'.
   5992 
   5993    * `yylex' can find the multicharacter token in the `yytname' table.
   5994      The index of the token in the table is the token type's code.  The
   5995      name of a multicharacter token is recorded in `yytname' with a
   5996      double-quote, the token's characters, and another double-quote.
   5997      The token's characters are escaped as necessary to be suitable as
   5998      input to Bison.
   5999 
   6000      Here's code for looking up a multicharacter token in `yytname',
   6001      assuming that the characters of the token are stored in
   6002      `token_buffer', and assuming that the token does not contain any
   6003      characters like `"' that require escaping.
   6004 
   6005           for (i = 0; i < YYNTOKENS; i++)
   6006             {
   6007               if (yytname[i] != 0
   6008                   && yytname[i][0] == '"'
   6009                   && ! strncmp (yytname[i] + 1, token_buffer,
   6010                                 strlen (token_buffer))
   6011                   && yytname[i][strlen (token_buffer) + 1] == '"'
   6012                   && yytname[i][strlen (token_buffer) + 2] == 0)
   6013                 break;
   6014             }
   6015 
   6016      The `yytname' table is generated only if you use the
   6017      `%token-table' declaration.  *Note Decl Summary::.
   6018 
   6019 
   6020 File: bison.info,  Node: Token Values,  Next: Token Locations,  Prev: Calling Convention,  Up: Lexical
   6021 
   6022 4.6.2 Semantic Values of Tokens
   6023 -------------------------------
   6024 
   6025 In an ordinary (nonreentrant) parser, the semantic value of the token
   6026 must be stored into the global variable `yylval'.  When you are using
   6027 just one data type for semantic values, `yylval' has that type.  Thus,
   6028 if the type is `int' (the default), you might write this in `yylex':
   6029 
   6030        ...
   6031        yylval = value;  /* Put value onto Bison stack.  */
   6032        return INT;      /* Return the type of the token.  */
   6033        ...
   6034 
   6035    When you are using multiple data types, `yylval''s type is a union
   6036 made from the `%union' declaration (*note The Collection of Value
   6037 Types: Union Decl.).  So when you store a token's value, you must use
   6038 the proper member of the union.  If the `%union' declaration looks like
   6039 this:
   6040 
   6041      %union {
   6042        int intval;
   6043        double val;
   6044        symrec *tptr;
   6045      }
   6046 
   6047 then the code in `yylex' might look like this:
   6048 
   6049        ...
   6050        yylval.intval = value; /* Put value onto Bison stack.  */
   6051        return INT;            /* Return the type of the token.  */
   6052        ...
   6053 
   6054 
   6055 File: bison.info,  Node: Token Locations,  Next: Pure Calling,  Prev: Token Values,  Up: Lexical
   6056 
   6057 4.6.3 Textual Locations of Tokens
   6058 ---------------------------------
   6059 
   6060 If you are using the `@N'-feature (*note Tracking Locations::) in
   6061 actions to keep track of the textual locations of tokens and groupings,
   6062 then you must provide this information in `yylex'.  The function
   6063 `yyparse' expects to find the textual location of a token just parsed
   6064 in the global variable `yylloc'.  So `yylex' must store the proper data
   6065 in that variable.
   6066 
   6067    By default, the value of `yylloc' is a structure and you need only
   6068 initialize the members that are going to be used by the actions.  The
   6069 four members are called `first_line', `first_column', `last_line' and
   6070 `last_column'.  Note that the use of this feature makes the parser
   6071 noticeably slower.
   6072 
   6073    The data type of `yylloc' has the name `YYLTYPE'.
   6074 
   6075 
   6076 File: bison.info,  Node: Pure Calling,  Prev: Token Locations,  Up: Lexical
   6077 
   6078 4.6.4 Calling Conventions for Pure Parsers
   6079 ------------------------------------------
   6080 
   6081 When you use the Bison declaration `%define api.pure full' to request a
   6082 pure, reentrant parser, the global communication variables `yylval' and
   6083 `yylloc' cannot be used.  (*Note A Pure (Reentrant) Parser: Pure Decl.)
   6084 In such parsers the two global variables are replaced by pointers
   6085 passed as arguments to `yylex'.  You must declare them as shown here,
   6086 and pass the information back by storing it through those pointers.
   6087 
   6088      int
   6089      yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
   6090      {
   6091        ...
   6092        *lvalp = value;  /* Put value onto Bison stack.  */
   6093        return INT;      /* Return the type of the token.  */
   6094        ...
   6095      }
   6096 
   6097    If the grammar file does not use the `@' constructs to refer to
   6098 textual locations, then the type `YYLTYPE' will not be defined.  In
   6099 this case, omit the second argument; `yylex' will be called with only
   6100 one argument.
   6101 
   6102    If you wish to pass the additional parameter data to `yylex', use
   6103 `%lex-param' just like `%parse-param' (*note Parser Function::).
   6104 
   6105  -- Directive: lex-param {ARGUMENT-DECLARATION}
   6106      Declare that the braced-code ARGUMENT-DECLARATION is an additional
   6107      `yylex' argument declaration.
   6108 
   6109 For instance:
   6110 
   6111      %lex-param   {int *nastiness}
   6112 
   6113 results in the following signature:
   6114 
   6115      int yylex (int *nastiness);
   6116 
   6117 If `%define api.pure full' (or just `%define api.pure') is added:
   6118 
   6119      int yylex (YYSTYPE *lvalp, int *nastiness);
   6120 
   6121 
   6122 File: bison.info,  Node: Error Reporting,  Next: Action Features,  Prev: Lexical,  Up: Interface
   6123 
   6124 4.7 The Error Reporting Function `yyerror'
   6125 ==========================================
   6126 
   6127 The Bison parser detects a "syntax error" or "parse error" whenever it
   6128 reads a token which cannot satisfy any syntax rule.  An action in the
   6129 grammar can also explicitly proclaim an error, using the macro
   6130 `YYERROR' (*note Special Features for Use in Actions: Action Features.).
   6131 
   6132    The Bison parser expects to report the error by calling an error
   6133 reporting function named `yyerror', which you must supply.  It is
   6134 called by `yyparse' whenever a syntax error is found, and it receives
   6135 one argument.  For a syntax error, the string is normally
   6136 `"syntax error"'.
   6137 
   6138    If you invoke the directive `%error-verbose' in the Bison
   6139 declarations section (*note The Bison Declarations Section: Bison
   6140 Declarations.), then Bison provides a more verbose and specific error
   6141 message string instead of just plain `"syntax error"'.  However, that
   6142 message sometimes contains incorrect information if LAC is not enabled
   6143 (*note LAC::).
   6144 
   6145    The parser can detect one other kind of error: memory exhaustion.
   6146 This can happen when the input contains constructions that are very
   6147 deeply nested.  It isn't likely you will encounter this, since the Bison
   6148 parser normally extends its stack automatically up to a very large
   6149 limit.  But if memory is exhausted, `yyparse' calls `yyerror' in the
   6150 usual fashion, except that the argument string is `"memory exhausted"'.
   6151 
   6152    In some cases diagnostics like `"syntax error"' are translated
   6153 automatically from English to some other language before they are
   6154 passed to `yyerror'.  *Note Internationalization::.
   6155 
   6156    The following definition suffices in simple programs:
   6157 
   6158      void
   6159      yyerror (char const *s)
   6160      {
   6161        fprintf (stderr, "%s\n", s);
   6162      }
   6163 
   6164    After `yyerror' returns to `yyparse', the latter will attempt error
   6165 recovery if you have written suitable error recovery grammar rules
   6166 (*note Error Recovery::).  If recovery is impossible, `yyparse' will
   6167 immediately return 1.
   6168 
   6169    Obviously, in location tracking pure parsers, `yyerror' should have
   6170 an access to the current location. With `%define api.pure', this is
   6171 indeed the case for the GLR parsers, but not for the Yacc parser, for
   6172 historical reasons, and this is the why `%define api.pure full' should
   6173 be prefered over `%define api.pure'.
   6174 
   6175    When `%locations %define api.pure full' is used, `yyerror' has the
   6176 following signature:
   6177 
   6178      void yyerror (YYLTYPE *locp, char const *msg);
   6179 
   6180 The prototypes are only indications of how the code produced by Bison
   6181 uses `yyerror'.  Bison-generated code always ignores the returned
   6182 value, so `yyerror' can return any type, including `void'.  Also,
   6183 `yyerror' can be a variadic function; that is why the message is always
   6184 passed last.
   6185 
   6186    Traditionally `yyerror' returns an `int' that is always ignored, but
   6187 this is purely for historical reasons, and `void' is preferable since
   6188 it more accurately describes the return type for `yyerror'.
   6189 
   6190    The variable `yynerrs' contains the number of syntax errors reported
   6191 so far.  Normally this variable is global; but if you request a pure
   6192 parser (*note A Pure (Reentrant) Parser: Pure Decl.)  then it is a
   6193 local variable which only the actions can access.
   6194 
   6195 
   6196 File: bison.info,  Node: Action Features,  Next: Internationalization,  Prev: Error Reporting,  Up: Interface
   6197 
   6198 4.8 Special Features for Use in Actions
   6199 =======================================
   6200 
   6201 Here is a table of Bison constructs, variables and macros that are
   6202 useful in actions.
   6203 
   6204  -- Variable: $$
   6205      Acts like a variable that contains the semantic value for the
   6206      grouping made by the current rule.  *Note Actions::.
   6207 
   6208  -- Variable: $N
   6209      Acts like a variable that contains the semantic value for the Nth
   6210      component of the current rule.  *Note Actions::.
   6211 
   6212  -- Variable: $<TYPEALT>$
   6213      Like `$$' but specifies alternative TYPEALT in the union specified
   6214      by the `%union' declaration.  *Note Data Types of Values in
   6215      Actions: Action Types.
   6216 
   6217  -- Variable: $<TYPEALT>N
   6218      Like `$N' but specifies alternative TYPEALT in the union specified
   6219      by the `%union' declaration.  *Note Data Types of Values in
   6220      Actions: Action Types.
   6221 
   6222  -- Macro: YYABORT `;'
   6223      Return immediately from `yyparse', indicating failure.  *Note The
   6224      Parser Function `yyparse': Parser Function.
   6225 
   6226  -- Macro: YYACCEPT `;'
   6227      Return immediately from `yyparse', indicating success.  *Note The
   6228      Parser Function `yyparse': Parser Function.
   6229 
   6230  -- Macro: YYBACKUP (TOKEN, VALUE)`;'
   6231      Unshift a token.  This macro is allowed only for rules that reduce
   6232      a single value, and only when there is no lookahead token.  It is
   6233      also disallowed in GLR parsers.  It installs a lookahead token
   6234      with token type TOKEN and semantic value VALUE; then it discards
   6235      the value that was going to be reduced by this rule.
   6236 
   6237      If the macro is used when it is not valid, such as when there is a
   6238      lookahead token already, then it reports a syntax error with a
   6239      message `cannot back up' and performs ordinary error recovery.
   6240 
   6241      In either case, the rest of the action is not executed.
   6242 
   6243  -- Macro: YYEMPTY
   6244      Value stored in `yychar' when there is no lookahead token.
   6245 
   6246  -- Macro: YYEOF
   6247      Value stored in `yychar' when the lookahead is the end of the input
   6248      stream.
   6249 
   6250  -- Macro: YYERROR `;'
   6251      Cause an immediate syntax error.  This statement initiates error
   6252      recovery just as if the parser itself had detected an error;
   6253      however, it does not call `yyerror', and does not print any
   6254      message.  If you want to print an error message, call `yyerror'
   6255      explicitly before the `YYERROR;' statement.  *Note Error
   6256      Recovery::.
   6257 
   6258  -- Macro: YYRECOVERING
   6259      The expression `YYRECOVERING ()' yields 1 when the parser is
   6260      recovering from a syntax error, and 0 otherwise.  *Note Error
   6261      Recovery::.
   6262 
   6263  -- Variable: yychar
   6264      Variable containing either the lookahead token, or `YYEOF' when the
   6265      lookahead is the end of the input stream, or `YYEMPTY' when no
   6266      lookahead has been performed so the next token is not yet known.
   6267      Do not modify `yychar' in a deferred semantic action (*note GLR
   6268      Semantic Actions::).  *Note Lookahead Tokens: Lookahead.
   6269 
   6270  -- Macro: yyclearin `;'
   6271      Discard the current lookahead token.  This is useful primarily in
   6272      error rules.  Do not invoke `yyclearin' in a deferred semantic
   6273      action (*note GLR Semantic Actions::).  *Note Error Recovery::.
   6274 
   6275  -- Macro: yyerrok `;'
   6276      Resume generating error messages immediately for subsequent syntax
   6277      errors.  This is useful primarily in error rules.  *Note Error
   6278      Recovery::.
   6279 
   6280  -- Variable: yylloc
   6281      Variable containing the lookahead token location when `yychar' is
   6282      not set to `YYEMPTY' or `YYEOF'.  Do not modify `yylloc' in a
   6283      deferred semantic action (*note GLR Semantic Actions::).  *Note
   6284      Actions and Locations: Actions and Locations.
   6285 
   6286  -- Variable: yylval
   6287      Variable containing the lookahead token semantic value when
   6288      `yychar' is not set to `YYEMPTY' or `YYEOF'.  Do not modify
   6289      `yylval' in a deferred semantic action (*note GLR Semantic
   6290      Actions::).  *Note Actions: Actions.
   6291 
   6292  -- Value: @$
   6293      Acts like a structure variable containing information on the
   6294      textual location of the grouping made by the current rule.  *Note
   6295      Tracking Locations::.
   6296 
   6297 
   6298  -- Value: @N
   6299      Acts like a structure variable containing information on the
   6300      textual location of the Nth component of the current rule.  *Note
   6301      Tracking Locations::.
   6302 
   6303 
   6304 File: bison.info,  Node: Internationalization,  Prev: Action Features,  Up: Interface
   6305 
   6306 4.9 Parser Internationalization
   6307 ===============================
   6308 
   6309 A Bison-generated parser can print diagnostics, including error and
   6310 tracing messages.  By default, they appear in English.  However, Bison
   6311 also supports outputting diagnostics in the user's native language.  To
   6312 make this work, the user should set the usual environment variables.
   6313 *Note The User's View: (gettext)Users.  For example, the shell command
   6314 `export LC_ALL=fr_CA.UTF-8' might set the user's locale to French
   6315 Canadian using the UTF-8 encoding.  The exact set of available locales
   6316 depends on the user's installation.
   6317 
   6318    The maintainer of a package that uses a Bison-generated parser
   6319 enables the internationalization of the parser's output through the
   6320 following steps.  Here we assume a package that uses GNU Autoconf and
   6321 GNU Automake.
   6322 
   6323   1. Into the directory containing the GNU Autoconf macros used by the
   6324      package --often called `m4'-- copy the `bison-i18n.m4' file
   6325      installed by Bison under `share/aclocal/bison-i18n.m4' in Bison's
   6326      installation directory.  For example:
   6327 
   6328           cp /usr/local/share/aclocal/bison-i18n.m4 m4/bison-i18n.m4
   6329 
   6330   2. In the top-level `configure.ac', after the `AM_GNU_GETTEXT'
   6331      invocation, add an invocation of `BISON_I18N'.  This macro is
   6332      defined in the file `bison-i18n.m4' that you copied earlier.  It
   6333      causes `configure' to find the value of the `BISON_LOCALEDIR'
   6334      variable, and it defines the source-language symbol `YYENABLE_NLS'
   6335      to enable translations in the Bison-generated parser.
   6336 
   6337   3. In the `main' function of your program, designate the directory
   6338      containing Bison's runtime message catalog, through a call to
   6339      `bindtextdomain' with domain name `bison-runtime'.  For example:
   6340 
   6341           bindtextdomain ("bison-runtime", BISON_LOCALEDIR);
   6342 
   6343      Typically this appears after any other call `bindtextdomain
   6344      (PACKAGE, LOCALEDIR)' that your package already has.  Here we rely
   6345      on `BISON_LOCALEDIR' to be defined as a string through the
   6346      `Makefile'.
   6347 
   6348   4. In the `Makefile.am' that controls the compilation of the `main'
   6349      function, make `BISON_LOCALEDIR' available as a C preprocessor
   6350      macro, either in `DEFS' or in `AM_CPPFLAGS'.  For example:
   6351 
   6352           DEFS = @DEFS@ -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"'
   6353 
   6354      or:
   6355 
   6356           AM_CPPFLAGS = -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"'
   6357 
   6358   5. Finally, invoke the command `autoreconf' to generate the build
   6359      infrastructure.
   6360 
   6361 
   6362 File: bison.info,  Node: Algorithm,  Next: Error Recovery,  Prev: Interface,  Up: Top
   6363 
   6364 5 The Bison Parser Algorithm
   6365 ****************************
   6366 
   6367 As Bison reads tokens, it pushes them onto a stack along with their
   6368 semantic values.  The stack is called the "parser stack".  Pushing a
   6369 token is traditionally called "shifting".
   6370 
   6371    For example, suppose the infix calculator has read `1 + 5 *', with a
   6372 `3' to come.  The stack will have four elements, one for each token
   6373 that was shifted.
   6374 
   6375    But the stack does not always have an element for each token read.
   6376 When the last N tokens and groupings shifted match the components of a
   6377 grammar rule, they can be combined according to that rule.  This is
   6378 called "reduction".  Those tokens and groupings are replaced on the
   6379 stack by a single grouping whose symbol is the result (left hand side)
   6380 of that rule.  Running the rule's action is part of the process of
   6381 reduction, because this is what computes the semantic value of the
   6382 resulting grouping.
   6383 
   6384    For example, if the infix calculator's parser stack contains this:
   6385 
   6386      1 + 5 * 3
   6387 
   6388 and the next input token is a newline character, then the last three
   6389 elements can be reduced to 15 via the rule:
   6390 
   6391      expr: expr '*' expr;
   6392 
   6393 Then the stack contains just these three elements:
   6394 
   6395      1 + 15
   6396 
   6397 At this point, another reduction can be made, resulting in the single
   6398 value 16.  Then the newline token can be shifted.
   6399 
   6400    The parser tries, by shifts and reductions, to reduce the entire
   6401 input down to a single grouping whose symbol is the grammar's
   6402 start-symbol (*note Languages and Context-Free Grammars: Language and
   6403 Grammar.).
   6404 
   6405    This kind of parser is known in the literature as a bottom-up parser.
   6406 
   6407 * Menu:
   6408 
   6409 * Lookahead::         Parser looks one token ahead when deciding what to do.
   6410 * Shift/Reduce::      Conflicts: when either shifting or reduction is valid.
   6411 * Precedence::        Operator precedence works by resolving conflicts.
   6412 * Contextual Precedence::  When an operator's precedence depends on context.
   6413 * Parser States::     The parser is a finite-state-machine with stack.
   6414 * Reduce/Reduce::     When two rules are applicable in the same situation.
   6415 * Mysterious Conflicts:: Conflicts that look unjustified.
   6416 * Tuning LR::         How to tune fundamental aspects of LR-based parsing.
   6417 * Generalized LR Parsing::  Parsing arbitrary context-free grammars.
   6418 * Memory Management:: What happens when memory is exhausted.  How to avoid it.
   6419 
   6420 
   6421 File: bison.info,  Node: Lookahead,  Next: Shift/Reduce,  Up: Algorithm
   6422 
   6423 5.1 Lookahead Tokens
   6424 ====================
   6425 
   6426 The Bison parser does _not_ always reduce immediately as soon as the
   6427 last N tokens and groupings match a rule.  This is because such a
   6428 simple strategy is inadequate to handle most languages.  Instead, when a
   6429 reduction is possible, the parser sometimes "looks ahead" at the next
   6430 token in order to decide what to do.
   6431 
   6432    When a token is read, it is not immediately shifted; first it
   6433 becomes the "lookahead token", which is not on the stack.  Now the
   6434 parser can perform one or more reductions of tokens and groupings on
   6435 the stack, while the lookahead token remains off to the side.  When no
   6436 more reductions should take place, the lookahead token is shifted onto
   6437 the stack.  This does not mean that all possible reductions have been
   6438 done; depending on the token type of the lookahead token, some rules
   6439 may choose to delay their application.
   6440 
   6441    Here is a simple case where lookahead is needed.  These three rules
   6442 define expressions which contain binary addition operators and postfix
   6443 unary factorial operators (`!'), and allow parentheses for grouping.
   6444 
   6445      expr:
   6446        term '+' expr
   6447      | term
   6448      ;
   6449 
   6450      term:
   6451        '(' expr ')'
   6452      | term '!'
   6453      | "number"
   6454      ;
   6455 
   6456    Suppose that the tokens `1 + 2' have been read and shifted; what
   6457 should be done?  If the following token is `)', then the first three
   6458 tokens must be reduced to form an `expr'.  This is the only valid
   6459 course, because shifting the `)' would produce a sequence of symbols
   6460 `term ')'', and no rule allows this.
   6461 
   6462    If the following token is `!', then it must be shifted immediately so
   6463 that `2 !' can be reduced to make a `term'.  If instead the parser were
   6464 to reduce before shifting, `1 + 2' would become an `expr'.  It would
   6465 then be impossible to shift the `!' because doing so would produce on
   6466 the stack the sequence of symbols `expr '!''.  No rule allows that
   6467 sequence.
   6468 
   6469    The lookahead token is stored in the variable `yychar'.  Its
   6470 semantic value and location, if any, are stored in the variables
   6471 `yylval' and `yylloc'.  *Note Special Features for Use in Actions:
   6472 Action Features.
   6473 
   6474 
   6475 File: bison.info,  Node: Shift/Reduce,  Next: Precedence,  Prev: Lookahead,  Up: Algorithm
   6476 
   6477 5.2 Shift/Reduce Conflicts
   6478 ==========================
   6479 
   6480 Suppose we are parsing a language which has if-then and if-then-else
   6481 statements, with a pair of rules like this:
   6482 
   6483      if_stmt:
   6484        "if" expr "then" stmt
   6485      | "if" expr "then" stmt "else" stmt
   6486      ;
   6487 
   6488 Here `"if"', `"then"' and `"else"' are terminal symbols for specific
   6489 keyword tokens.
   6490 
   6491    When the `"else"' token is read and becomes the lookahead token, the
   6492 contents of the stack (assuming the input is valid) are just right for
   6493 reduction by the first rule.  But it is also legitimate to shift the
   6494 `"else"', because that would lead to eventual reduction by the second
   6495 rule.
   6496 
   6497    This situation, where either a shift or a reduction would be valid,
   6498 is called a "shift/reduce conflict".  Bison is designed to resolve
   6499 these conflicts by choosing to shift, unless otherwise directed by
   6500 operator precedence declarations.  To see the reason for this, let's
   6501 contrast it with the other alternative.
   6502 
   6503    Since the parser prefers to shift the `"else"', the result is to
   6504 attach the else-clause to the innermost if-statement, making these two
   6505 inputs equivalent:
   6506 
   6507      if x then if y then win; else lose;
   6508 
   6509      if x then do; if y then win; else lose; end;
   6510 
   6511    But if the parser chose to reduce when possible rather than shift,
   6512 the result would be to attach the else-clause to the outermost
   6513 if-statement, making these two inputs equivalent:
   6514 
   6515      if x then if y then win; else lose;
   6516 
   6517      if x then do; if y then win; end; else lose;
   6518 
   6519    The conflict exists because the grammar as written is ambiguous:
   6520 either parsing of the simple nested if-statement is legitimate.  The
   6521 established convention is that these ambiguities are resolved by
   6522 attaching the else-clause to the innermost if-statement; this is what
   6523 Bison accomplishes by choosing to shift rather than reduce.  (It would
   6524 ideally be cleaner to write an unambiguous grammar, but that is very
   6525 hard to do in this case.)  This particular ambiguity was first
   6526 encountered in the specifications of Algol 60 and is called the
   6527 "dangling `else'" ambiguity.
   6528 
   6529    To avoid warnings from Bison about predictable, legitimate
   6530 shift/reduce conflicts, you can use the `%expect N' declaration.  There
   6531 will be no warning as long as the number of shift/reduce conflicts is
   6532 exactly N, and Bison will report an error if there is a different
   6533 number.  *Note Suppressing Conflict Warnings: Expect Decl.  However, we
   6534 don't recommend the use of `%expect' (except `%expect 0'!), as an equal
   6535 number of conflicts does not mean that they are the _same_.  When
   6536 possible, you should rather use precedence directives to _fix_ the
   6537 conflicts explicitly (*note Using Precedence For Non Operators: Non
   6538 Operators.).
   6539 
   6540    The definition of `if_stmt' above is solely to blame for the
   6541 conflict, but the conflict does not actually appear without additional
   6542 rules.  Here is a complete Bison grammar file that actually manifests
   6543 the conflict:
   6544 
   6545      %%
   6546      stmt:
   6547        expr
   6548      | if_stmt
   6549      ;
   6550 
   6551      if_stmt:
   6552        "if" expr "then" stmt
   6553      | "if" expr "then" stmt "else" stmt
   6554      ;
   6555 
   6556      expr:
   6557        "identifier"
   6558      ;
   6559 
   6560 
   6561 File: bison.info,  Node: Precedence,  Next: Contextual Precedence,  Prev: Shift/Reduce,  Up: Algorithm
   6562 
   6563 5.3 Operator Precedence
   6564 =======================
   6565 
   6566 Another situation where shift/reduce conflicts appear is in arithmetic
   6567 expressions.  Here shifting is not always the preferred resolution; the
   6568 Bison declarations for operator precedence allow you to specify when to
   6569 shift and when to reduce.
   6570 
   6571 * Menu:
   6572 
   6573 * Why Precedence::    An example showing why precedence is needed.
   6574 * Using Precedence::  How to specify precedence in Bison grammars.
   6575 * Precedence Examples::  How these features are used in the previous example.
   6576 * How Precedence::    How they work.
   6577 * Non Operators::     Using precedence for general conflicts.
   6578 
   6579 
   6580 File: bison.info,  Node: Why Precedence,  Next: Using Precedence,  Up: Precedence
   6581 
   6582 5.3.1 When Precedence is Needed
   6583 -------------------------------
   6584 
   6585 Consider the following ambiguous grammar fragment (ambiguous because the
   6586 input `1 - 2 * 3' can be parsed in two different ways):
   6587 
   6588      expr:
   6589        expr '-' expr
   6590      | expr '*' expr
   6591      | expr '<' expr
   6592      | '(' expr ')'
   6593      ...
   6594      ;
   6595 
   6596 Suppose the parser has seen the tokens `1', `-' and `2'; should it
   6597 reduce them via the rule for the subtraction operator?  It depends on
   6598 the next token.  Of course, if the next token is `)', we must reduce;
   6599 shifting is invalid because no single rule can reduce the token
   6600 sequence `- 2 )' or anything starting with that.  But if the next token
   6601 is `*' or `<', we have a choice: either shifting or reduction would
   6602 allow the parse to complete, but with different results.
   6603 
   6604    To decide which one Bison should do, we must consider the results.
   6605 If the next operator token OP is shifted, then it must be reduced first
   6606 in order to permit another opportunity to reduce the difference.  The
   6607 result is (in effect) `1 - (2 OP 3)'.  On the other hand, if the
   6608 subtraction is reduced before shifting OP, the result is
   6609 `(1 - 2) OP 3'.  Clearly, then, the choice of shift or reduce should
   6610 depend on the relative precedence of the operators `-' and OP: `*'
   6611 should be shifted first, but not `<'.
   6612 
   6613    What about input such as `1 - 2 - 5'; should this be `(1 - 2) - 5'
   6614 or should it be `1 - (2 - 5)'?  For most operators we prefer the
   6615 former, which is called "left association".  The latter alternative,
   6616 "right association", is desirable for assignment operators.  The choice
   6617 of left or right association is a matter of whether the parser chooses
   6618 to shift or reduce when the stack contains `1 - 2' and the lookahead
   6619 token is `-': shifting makes right-associativity.
   6620 
   6621 
   6622 File: bison.info,  Node: Using Precedence,  Next: Precedence Examples,  Prev: Why Precedence,  Up: Precedence
   6623 
   6624 5.3.2 Specifying Operator Precedence
   6625 ------------------------------------
   6626 
   6627 Bison allows you to specify these choices with the operator precedence
   6628 declarations `%left' and `%right'.  Each such declaration contains a
   6629 list of tokens, which are operators whose precedence and associativity
   6630 is being declared.  The `%left' declaration makes all those operators
   6631 left-associative and the `%right' declaration makes them
   6632 right-associative.  A third alternative is `%nonassoc', which declares
   6633 that it is a syntax error to find the same operator twice "in a row".
   6634 
   6635    The relative precedence of different operators is controlled by the
   6636 order in which they are declared.  The first `%left' or `%right'
   6637 declaration in the file declares the operators whose precedence is
   6638 lowest, the next such declaration declares the operators whose
   6639 precedence is a little higher, and so on.
   6640 
   6641 
   6642 File: bison.info,  Node: Precedence Examples,  Next: How Precedence,  Prev: Using Precedence,  Up: Precedence
   6643 
   6644 5.3.3 Precedence Examples
   6645 -------------------------
   6646 
   6647 In our example, we would want the following declarations:
   6648 
   6649      %left '<'
   6650      %left '-'
   6651      %left '*'
   6652 
   6653    In a more complete example, which supports other operators as well,
   6654 we would declare them in groups of equal precedence.  For example,
   6655 `'+'' is declared with `'-'':
   6656 
   6657      %left '<' '>' '=' "!=" "<=" ">="
   6658      %left '+' '-'
   6659      %left '*' '/'
   6660 
   6661 
   6662 File: bison.info,  Node: How Precedence,  Next: Non Operators,  Prev: Precedence Examples,  Up: Precedence
   6663 
   6664 5.3.4 How Precedence Works
   6665 --------------------------
   6666 
   6667 The first effect of the precedence declarations is to assign precedence
   6668 levels to the terminal symbols declared.  The second effect is to assign
   6669 precedence levels to certain rules: each rule gets its precedence from
   6670 the last terminal symbol mentioned in the components.  (You can also
   6671 specify explicitly the precedence of a rule.  *Note Context-Dependent
   6672 Precedence: Contextual Precedence.)
   6673 
   6674    Finally, the resolution of conflicts works by comparing the
   6675 precedence of the rule being considered with that of the lookahead
   6676 token.  If the token's precedence is higher, the choice is to shift.
   6677 If the rule's precedence is higher, the choice is to reduce.  If they
   6678 have equal precedence, the choice is made based on the associativity of
   6679 that precedence level.  The verbose output file made by `-v' (*note
   6680 Invoking Bison: Invocation.) says how each conflict was resolved.
   6681 
   6682    Not all rules and not all tokens have precedence.  If either the
   6683 rule or the lookahead token has no precedence, then the default is to
   6684 shift.
   6685 
   6686 
   6687 File: bison.info,  Node: Non Operators,  Prev: How Precedence,  Up: Precedence
   6688 
   6689 5.3.5 Using Precedence For Non Operators
   6690 ----------------------------------------
   6691 
   6692 Using properly precedence and associativity directives can help fixing
   6693 shift/reduce conflicts that do not involve arithmetics-like operators.
   6694 For instance, the "dangling `else'" problem (*note Shift/Reduce
   6695 Conflicts: Shift/Reduce.) can be solved elegantly in two different ways.
   6696 
   6697    In the present case, the conflict is between the token `"else"'
   6698 willing to be shifted, and the rule `if_stmt: "if" expr "then" stmt',
   6699 asking for reduction.  By default, the precedence of a rule is that of
   6700 its last token, here `"then"', so the conflict will be solved
   6701 appropriately by giving `"else"' a precedence higher than that of
   6702 `"then"', for instance as follows:
   6703 
   6704      %nonassoc "then"
   6705      %nonassoc "else"
   6706 
   6707    Alternatively, you may give both tokens the same precedence, in
   6708 which case associativity is used to solve the conflict.  To preserve
   6709 the shift action, use right associativity:
   6710 
   6711      %right "then" "else"
   6712 
   6713    Neither solution is perfect however.  Since Bison does not provide,
   6714 so far, support for "scoped" precedence, both force you to declare the
   6715 precedence of these keywords with respect to the other operators your
   6716 grammar.  Therefore, instead of being warned about new conflicts you
   6717 would be unaware of (e.g., a shift/reduce conflict due to `if test then
   6718 1 else 2 + 3' being ambiguous: `if test then 1 else (2 + 3)' or `(if
   6719 test then 1 else 2) + 3'?), the conflict will be already "fixed".
   6720 
   6721 
   6722 File: bison.info,  Node: Contextual Precedence,  Next: Parser States,  Prev: Precedence,  Up: Algorithm
   6723 
   6724 5.4 Context-Dependent Precedence
   6725 ================================
   6726 
   6727 Often the precedence of an operator depends on the context.  This sounds
   6728 outlandish at first, but it is really very common.  For example, a minus
   6729 sign typically has a very high precedence as a unary operator, and a
   6730 somewhat lower precedence (lower than multiplication) as a binary
   6731 operator.
   6732 
   6733    The Bison precedence declarations, `%left', `%right' and
   6734 `%nonassoc', can only be used once for a given token; so a token has
   6735 only one precedence declared in this way.  For context-dependent
   6736 precedence, you need to use an additional mechanism: the `%prec'
   6737 modifier for rules.
   6738 
   6739    The `%prec' modifier declares the precedence of a particular rule by
   6740 specifying a terminal symbol whose precedence should be used for that
   6741 rule.  It's not necessary for that symbol to appear otherwise in the
   6742 rule.  The modifier's syntax is:
   6743 
   6744      %prec TERMINAL-SYMBOL
   6745 
   6746 and it is written after the components of the rule.  Its effect is to
   6747 assign the rule the precedence of TERMINAL-SYMBOL, overriding the
   6748 precedence that would be deduced for it in the ordinary way.  The
   6749 altered rule precedence then affects how conflicts involving that rule
   6750 are resolved (*note Operator Precedence: Precedence.).
   6751 
   6752    Here is how `%prec' solves the problem of unary minus.  First,
   6753 declare a precedence for a fictitious terminal symbol named `UMINUS'.
   6754 There are no tokens of this type, but the symbol serves to stand for its
   6755 precedence:
   6756 
   6757      ...
   6758      %left '+' '-'
   6759      %left '*'
   6760      %left UMINUS
   6761 
   6762    Now the precedence of `UMINUS' can be used in specific rules:
   6763 
   6764      exp:
   6765        ...
   6766      | exp '-' exp
   6767        ...
   6768      | '-' exp %prec UMINUS
   6769 
   6770 
   6771 File: bison.info,  Node: Parser States,  Next: Reduce/Reduce,  Prev: Contextual Precedence,  Up: Algorithm
   6772 
   6773 5.5 Parser States
   6774 =================
   6775 
   6776 The function `yyparse' is implemented using a finite-state machine.
   6777 The values pushed on the parser stack are not simply token type codes;
   6778 they represent the entire sequence of terminal and nonterminal symbols
   6779 at or near the top of the stack.  The current state collects all the
   6780 information about previous input which is relevant to deciding what to
   6781 do next.
   6782 
   6783    Each time a lookahead token is read, the current parser state
   6784 together with the type of lookahead token are looked up in a table.
   6785 This table entry can say, "Shift the lookahead token."  In this case,
   6786 it also specifies the new parser state, which is pushed onto the top of
   6787 the parser stack.  Or it can say, "Reduce using rule number N."  This
   6788 means that a certain number of tokens or groupings are taken off the
   6789 top of the stack, and replaced by one grouping.  In other words, that
   6790 number of states are popped from the stack, and one new state is pushed.
   6791 
   6792    There is one other alternative: the table can say that the lookahead
   6793 token is erroneous in the current state.  This causes error processing
   6794 to begin (*note Error Recovery::).
   6795 
   6796 
   6797 File: bison.info,  Node: Reduce/Reduce,  Next: Mysterious Conflicts,  Prev: Parser States,  Up: Algorithm
   6798 
   6799 5.6 Reduce/Reduce Conflicts
   6800 ===========================
   6801 
   6802 A reduce/reduce conflict occurs if there are two or more rules that
   6803 apply to the same sequence of input.  This usually indicates a serious
   6804 error in the grammar.
   6805 
   6806    For example, here is an erroneous attempt to define a sequence of
   6807 zero or more `word' groupings.
   6808 
   6809      sequence:
   6810        /* empty */    { printf ("empty sequence\n"); }
   6811      | maybeword
   6812      | sequence word  { printf ("added word %s\n", $2); }
   6813      ;
   6814 
   6815      maybeword:
   6816        /* empty */   { printf ("empty maybeword\n"); }
   6817      | word          { printf ("single word %s\n", $1); }
   6818      ;
   6819 
   6820 The error is an ambiguity: there is more than one way to parse a single
   6821 `word' into a `sequence'.  It could be reduced to a `maybeword' and
   6822 then into a `sequence' via the second rule.  Alternatively,
   6823 nothing-at-all could be reduced into a `sequence' via the first rule,
   6824 and this could be combined with the `word' using the third rule for
   6825 `sequence'.
   6826 
   6827    There is also more than one way to reduce nothing-at-all into a
   6828 `sequence'.  This can be done directly via the first rule, or
   6829 indirectly via `maybeword' and then the second rule.
   6830 
   6831    You might think that this is a distinction without a difference,
   6832 because it does not change whether any particular input is valid or
   6833 not.  But it does affect which actions are run.  One parsing order runs
   6834 the second rule's action; the other runs the first rule's action and
   6835 the third rule's action.  In this example, the output of the program
   6836 changes.
   6837 
   6838    Bison resolves a reduce/reduce conflict by choosing to use the rule
   6839 that appears first in the grammar, but it is very risky to rely on
   6840 this.  Every reduce/reduce conflict must be studied and usually
   6841 eliminated.  Here is the proper way to define `sequence':
   6842 
   6843      sequence:
   6844        /* empty */    { printf ("empty sequence\n"); }
   6845      | sequence word  { printf ("added word %s\n", $2); }
   6846      ;
   6847 
   6848    Here is another common error that yields a reduce/reduce conflict:
   6849 
   6850      sequence:
   6851        /* empty */
   6852      | sequence words
   6853      | sequence redirects
   6854      ;
   6855 
   6856      words:
   6857        /* empty */
   6858      | words word
   6859      ;
   6860 
   6861      redirects:
   6862        /* empty */
   6863      | redirects redirect
   6864      ;
   6865 
   6866 The intention here is to define a sequence which can contain either
   6867 `word' or `redirect' groupings.  The individual definitions of
   6868 `sequence', `words' and `redirects' are error-free, but the three
   6869 together make a subtle ambiguity: even an empty input can be parsed in
   6870 infinitely many ways!
   6871 
   6872    Consider: nothing-at-all could be a `words'.  Or it could be two
   6873 `words' in a row, or three, or any number.  It could equally well be a
   6874 `redirects', or two, or any number.  Or it could be a `words' followed
   6875 by three `redirects' and another `words'.  And so on.
   6876 
   6877    Here are two ways to correct these rules.  First, to make it a
   6878 single level of sequence:
   6879 
   6880      sequence:
   6881        /* empty */
   6882      | sequence word
   6883      | sequence redirect
   6884      ;
   6885 
   6886    Second, to prevent either a `words' or a `redirects' from being
   6887 empty:
   6888 
   6889      sequence:
   6890        /* empty */
   6891      | sequence words
   6892      | sequence redirects
   6893      ;
   6894 
   6895      words:
   6896        word
   6897      | words word
   6898      ;
   6899 
   6900      redirects:
   6901        redirect
   6902      | redirects redirect
   6903      ;
   6904 
   6905    Yet this proposal introduces another kind of ambiguity!  The input
   6906 `word word' can be parsed as a single `words' composed of two `word's,
   6907 or as two one-`word' `words' (and likewise for `redirect'/`redirects').
   6908 However this ambiguity is now a shift/reduce conflict, and therefore it
   6909 can now be addressed with precedence directives.
   6910 
   6911    To simplify the matter, we will proceed with `word' and `redirect'
   6912 being tokens: `"word"' and `"redirect"'.
   6913 
   6914    To prefer the longest `words', the conflict between the token
   6915 `"word"' and the rule `sequence: sequence words' must be resolved as a
   6916 shift.  To this end, we use the same techniques as exposed above, see
   6917 *note Using Precedence For Non Operators: Non Operators.  One solution
   6918 relies on precedences: use `%prec' to give a lower precedence to the
   6919 rule:
   6920 
   6921      %nonassoc "word"
   6922      %nonassoc "sequence"
   6923      %%
   6924      sequence:
   6925        /* empty */
   6926      | sequence word      %prec "sequence"
   6927      | sequence redirect  %prec "sequence"
   6928      ;
   6929 
   6930      words:
   6931        word
   6932      | words "word"
   6933      ;
   6934 
   6935    Another solution relies on associativity: provide both the token and
   6936 the rule with the same precedence, but make them right-associative:
   6937 
   6938      %right "word" "redirect"
   6939      %%
   6940      sequence:
   6941        /* empty */
   6942      | sequence word      %prec "word"
   6943      | sequence redirect  %prec "redirect"
   6944      ;
   6945 
   6946 
   6947 File: bison.info,  Node: Mysterious Conflicts,  Next: Tuning LR,  Prev: Reduce/Reduce,  Up: Algorithm
   6948 
   6949 5.7 Mysterious Conflicts
   6950 ========================
   6951 
   6952 Sometimes reduce/reduce conflicts can occur that don't look warranted.
   6953 Here is an example:
   6954 
   6955      %%
   6956      def: param_spec return_spec ',';
   6957      param_spec:
   6958        type
   6959      | name_list ':' type
   6960      ;
   6961      return_spec:
   6962        type
   6963      | name ':' type
   6964      ;
   6965      type: "id";
   6966      name: "id";
   6967      name_list:
   6968        name
   6969      | name ',' name_list
   6970      ;
   6971 
   6972    It would seem that this grammar can be parsed with only a single
   6973 token of lookahead: when a `param_spec' is being read, an `"id"' is a
   6974 `name' if a comma or colon follows, or a `type' if another `"id"'
   6975 follows.  In other words, this grammar is LR(1).
   6976 
   6977    However, for historical reasons, Bison cannot by default handle all
   6978 LR(1) grammars.  In this grammar, two contexts, that after an `"id"' at
   6979 the beginning of a `param_spec' and likewise at the beginning of a
   6980 `return_spec', are similar enough that Bison assumes they are the same.
   6981 They appear similar because the same set of rules would be active--the
   6982 rule for reducing to a `name' and that for reducing to a `type'.  Bison
   6983 is unable to determine at that stage of processing that the rules would
   6984 require different lookahead tokens in the two contexts, so it makes a
   6985 single parser state for them both.  Combining the two contexts causes a
   6986 conflict later.  In parser terminology, this occurrence means that the
   6987 grammar is not LALR(1).
   6988 
   6989    For many practical grammars (specifically those that fall into the
   6990 non-LR(1) class), the limitations of LALR(1) result in difficulties
   6991 beyond just mysterious reduce/reduce conflicts.  The best way to fix
   6992 all these problems is to select a different parser table construction
   6993 algorithm.  Either IELR(1) or canonical LR(1) would suffice, but the
   6994 former is more efficient and easier to debug during development.  *Note
   6995 LR Table Construction::, for details.  (Bison's IELR(1) and canonical
   6996 LR(1) implementations are experimental.  More user feedback will help
   6997 to stabilize them.)
   6998 
   6999    If you instead wish to work around LALR(1)'s limitations, you can
   7000 often fix a mysterious conflict by identifying the two parser states
   7001 that are being confused, and adding something to make them look
   7002 distinct.  In the above example, adding one rule to `return_spec' as
   7003 follows makes the problem go away:
   7004 
   7005      ...
   7006      return_spec:
   7007        type
   7008      | name ':' type
   7009      | "id" "bogus"       /* This rule is never used.  */
   7010      ;
   7011 
   7012    This corrects the problem because it introduces the possibility of an
   7013 additional active rule in the context after the `"id"' at the beginning
   7014 of `return_spec'.  This rule is not active in the corresponding context
   7015 in a `param_spec', so the two contexts receive distinct parser states.
   7016 As long as the token `"bogus"' is never generated by `yylex', the added
   7017 rule cannot alter the way actual input is parsed.
   7018 
   7019    In this particular example, there is another way to solve the
   7020 problem: rewrite the rule for `return_spec' to use `"id"' directly
   7021 instead of via `name'.  This also causes the two confusing contexts to
   7022 have different sets of active rules, because the one for `return_spec'
   7023 activates the altered rule for `return_spec' rather than the one for
   7024 `name'.
   7025 
   7026      param_spec:
   7027        type
   7028      | name_list ':' type
   7029      ;
   7030      return_spec:
   7031        type
   7032      | "id" ':' type
   7033      ;
   7034 
   7035    For a more detailed exposition of LALR(1) parsers and parser
   7036 generators, *note DeRemer 1982: Bibliography.
   7037 
   7038 
   7039 File: bison.info,  Node: Tuning LR,  Next: Generalized LR Parsing,  Prev: Mysterious Conflicts,  Up: Algorithm
   7040 
   7041 5.8 Tuning LR
   7042 =============
   7043 
   7044 The default behavior of Bison's LR-based parsers is chosen mostly for
   7045 historical reasons, but that behavior is often not robust.  For
   7046 example, in the previous section, we discussed the mysterious conflicts
   7047 that can be produced by LALR(1), Bison's default parser table
   7048 construction algorithm.  Another example is Bison's `%error-verbose'
   7049 directive, which instructs the generated parser to produce verbose
   7050 syntax error messages, which can sometimes contain incorrect
   7051 information.
   7052 
   7053    In this section, we explore several modern features of Bison that
   7054 allow you to tune fundamental aspects of the generated LR-based
   7055 parsers.  Some of these features easily eliminate shortcomings like
   7056 those mentioned above.  Others can be helpful purely for understanding
   7057 your parser.
   7058 
   7059    Most of the features discussed in this section are still
   7060 experimental.  More user feedback will help to stabilize them.
   7061 
   7062 * Menu:
   7063 
   7064 * LR Table Construction:: Choose a different construction algorithm.
   7065 * Default Reductions::    Disable default reductions.
   7066 * LAC::                   Correct lookahead sets in the parser states.
   7067 * Unreachable States::    Keep unreachable parser states for debugging.
   7068 
   7069 
   7070 File: bison.info,  Node: LR Table Construction,  Next: Default Reductions,  Up: Tuning LR
   7071 
   7072 5.8.1 LR Table Construction
   7073 ---------------------------
   7074 
   7075 For historical reasons, Bison constructs LALR(1) parser tables by
   7076 default.  However, LALR does not possess the full language-recognition
   7077 power of LR.  As a result, the behavior of parsers employing LALR
   7078 parser tables is often mysterious.  We presented a simple example of
   7079 this effect in *note Mysterious Conflicts::.
   7080 
   7081    As we also demonstrated in that example, the traditional approach to
   7082 eliminating such mysterious behavior is to restructure the grammar.
   7083 Unfortunately, doing so correctly is often difficult.  Moreover, merely
   7084 discovering that LALR causes mysterious behavior in your parser can be
   7085 difficult as well.
   7086 
   7087    Fortunately, Bison provides an easy way to eliminate the possibility
   7088 of such mysterious behavior altogether.  You simply need to activate a
   7089 more powerful parser table construction algorithm by using the `%define
   7090 lr.type' directive.
   7091 
   7092  -- Directive: %define lr.type TYPE
   7093      Specify the type of parser tables within the LR(1) family.  The
   7094      accepted values for TYPE are:
   7095 
   7096         * `lalr' (default)
   7097 
   7098         * `ielr'
   7099 
   7100         * `canonical-lr'
   7101 
   7102      (This feature is experimental. More user feedback will help to
   7103      stabilize it.)
   7104 
   7105    For example, to activate IELR, you might add the following directive
   7106 to you grammar file:
   7107 
   7108      %define lr.type ielr
   7109 
   7110 For the example in *note Mysterious Conflicts::, the mysterious
   7111 conflict is then eliminated, so there is no need to invest time in
   7112 comprehending the conflict or restructuring the grammar to fix it.  If,
   7113 during future development, the grammar evolves such that all mysterious
   7114 behavior would have disappeared using just LALR, you need not fear that
   7115 continuing to use IELR will result in unnecessarily large parser tables.
   7116 That is, IELR generates LALR tables when LALR (using a deterministic
   7117 parsing algorithm) is sufficient to support the full
   7118 language-recognition power of LR.  Thus, by enabling IELR at the start
   7119 of grammar development, you can safely and completely eliminate the
   7120 need to consider LALR's shortcomings.
   7121 
   7122    While IELR is almost always preferable, there are circumstances
   7123 where LALR or the canonical LR parser tables described by Knuth (*note
   7124 Knuth 1965: Bibliography.) can be useful.  Here we summarize the
   7125 relative advantages of each parser table construction algorithm within
   7126 Bison:
   7127 
   7128    * LALR
   7129 
   7130      There are at least two scenarios where LALR can be worthwhile:
   7131 
   7132         * GLR without static conflict resolution.
   7133 
   7134           When employing GLR parsers (*note GLR Parsers::), if you do
   7135           not resolve any conflicts statically (for example, with
   7136           `%left' or `%prec'), then the parser explores all potential
   7137           parses of any given input.  In this case, the choice of
   7138           parser table construction algorithm is guaranteed not to alter
   7139           the language accepted by the parser.  LALR parser tables are
   7140           the smallest parser tables Bison can currently construct, so
   7141           they may then be preferable.  Nevertheless, once you begin to
   7142           resolve conflicts statically, GLR behaves more like a
   7143           deterministic parser in the syntactic contexts where those
   7144           conflicts appear, and so either IELR or canonical LR can then
   7145           be helpful to avoid LALR's mysterious behavior.
   7146 
   7147         * Malformed grammars.
   7148 
   7149           Occasionally during development, an especially malformed
   7150           grammar with a major recurring flaw may severely impede the
   7151           IELR or canonical LR parser table construction algorithm.
   7152           LALR can be a quick way to construct parser tables in order
   7153           to investigate such problems while ignoring the more subtle
   7154           differences from IELR and canonical LR.
   7155 
   7156    * IELR
   7157 
   7158      IELR (Inadequacy Elimination LR) is a minimal LR algorithm.  That
   7159      is, given any grammar (LR or non-LR), parsers using IELR or
   7160      canonical LR parser tables always accept exactly the same set of
   7161      sentences.  However, like LALR, IELR merges parser states during
   7162      parser table construction so that the number of parser states is
   7163      often an order of magnitude less than for canonical LR.  More
   7164      importantly, because canonical LR's extra parser states may contain
   7165      duplicate conflicts in the case of non-LR grammars, the number of
   7166      conflicts for IELR is often an order of magnitude less as well.
   7167      This effect can significantly reduce the complexity of developing
   7168      a grammar.
   7169 
   7170    * Canonical LR
   7171 
   7172      While inefficient, canonical LR parser tables can be an
   7173      interesting means to explore a grammar because they possess a
   7174      property that IELR and LALR tables do not.  That is, if
   7175      `%nonassoc' is not used and default reductions are left disabled
   7176      (*note Default Reductions::), then, for every left context of
   7177      every canonical LR state, the set of tokens accepted by that state
   7178      is guaranteed to be the exact set of tokens that is syntactically
   7179      acceptable in that left context.  It might then seem that an
   7180      advantage of canonical LR parsers in production is that, under the
   7181      above constraints, they are guaranteed to detect a syntax error as
   7182      soon as possible without performing any unnecessary reductions.
   7183      However, IELR parsers that use LAC are also able to achieve this
   7184      behavior without sacrificing `%nonassoc' or default reductions.
   7185      For details and a few caveats of LAC, *note LAC::.
   7186 
   7187    For a more detailed exposition of the mysterious behavior in LALR
   7188 parsers and the benefits of IELR, *note Denny 2008 March: Bibliography,
   7189 and *note Denny 2010 November: Bibliography.
   7190 
   7191 
   7192 File: bison.info,  Node: Default Reductions,  Next: LAC,  Prev: LR Table Construction,  Up: Tuning LR
   7193 
   7194 5.8.2 Default Reductions
   7195 ------------------------
   7196 
   7197 After parser table construction, Bison identifies the reduction with the
   7198 largest lookahead set in each parser state.  To reduce the size of the
   7199 parser state, traditional Bison behavior is to remove that lookahead
   7200 set and to assign that reduction to be the default parser action.  Such
   7201 a reduction is known as a "default reduction".
   7202 
   7203    Default reductions affect more than the size of the parser tables.
   7204 They also affect the behavior of the parser:
   7205 
   7206    * Delayed `yylex' invocations.
   7207 
   7208      A "consistent state" is a state that has only one possible parser
   7209      action.  If that action is a reduction and is encoded as a default
   7210      reduction, then that consistent state is called a "defaulted
   7211      state".  Upon reaching a defaulted state, a Bison-generated parser
   7212      does not bother to invoke `yylex' to fetch the next token before
   7213      performing the reduction.  In other words, whether default
   7214      reductions are enabled in consistent states determines how soon a
   7215      Bison-generated parser invokes `yylex' for a token: immediately
   7216      when it _reaches_ that token in the input or when it eventually
   7217      _needs_ that token as a lookahead to determine the next parser
   7218      action.  Traditionally, default reductions are enabled, and so the
   7219      parser exhibits the latter behavior.
   7220 
   7221      The presence of defaulted states is an important consideration when
   7222      designing `yylex' and the grammar file.  That is, if the behavior
   7223      of `yylex' can influence or be influenced by the semantic actions
   7224      associated with the reductions in defaulted states, then the delay
   7225      of the next `yylex' invocation until after those reductions is
   7226      significant.  For example, the semantic actions might pop a scope
   7227      stack that `yylex' uses to determine what token to return.  Thus,
   7228      the delay might be necessary to ensure that `yylex' does not look
   7229      up the next token in a scope that should already be considered
   7230      closed.
   7231 
   7232    * Delayed syntax error detection.
   7233 
   7234      When the parser fetches a new token by invoking `yylex', it checks
   7235      whether there is an action for that token in the current parser
   7236      state.  The parser detects a syntax error if and only if either
   7237      (1) there is no action for that token or (2) the action for that
   7238      token is the error action (due to the use of `%nonassoc').
   7239      However, if there is a default reduction in that state (which
   7240      might or might not be a defaulted state), then it is impossible
   7241      for condition 1 to exist.  That is, all tokens have an action.
   7242      Thus, the parser sometimes fails to detect the syntax error until
   7243      it reaches a later state.
   7244 
   7245      While default reductions never cause the parser to accept
   7246      syntactically incorrect sentences, the delay of syntax error
   7247      detection can have unexpected effects on the behavior of the
   7248      parser.  However, the delay can be caused anyway by parser state
   7249      merging and the use of `%nonassoc', and it can be fixed by another
   7250      Bison feature, LAC.  We discuss the effects of delayed syntax
   7251      error detection and LAC more in the next section (*note LAC::).
   7252 
   7253    For canonical LR, the only default reduction that Bison enables by
   7254 default is the accept action, which appears only in the accepting
   7255 state, which has no other action and is thus a defaulted state.
   7256 However, the default accept action does not delay any `yylex'
   7257 invocation or syntax error detection because the accept action ends the
   7258 parse.
   7259 
   7260    For LALR and IELR, Bison enables default reductions in nearly all
   7261 states by default.  There are only two exceptions.  First, states that
   7262 have a shift action on the `error' token do not have default reductions
   7263 because delayed syntax error detection could then prevent the `error'
   7264 token from ever being shifted in that state.  However, parser state
   7265 merging can cause the same effect anyway, and LAC fixes it in both
   7266 cases, so future versions of Bison might drop this exception when LAC
   7267 is activated.  Second, GLR parsers do not record the default reduction
   7268 as the action on a lookahead token for which there is a conflict.  The
   7269 correct action in this case is to split the parse instead.
   7270 
   7271    To adjust which states have default reductions enabled, use the
   7272 `%define lr.default-reductions' directive.
   7273 
   7274  -- Directive: %define lr.default-reductions WHERE
   7275      Specify the kind of states that are permitted to contain default
   7276      reductions.  The accepted values of WHERE are:
   7277         * `most' (default for LALR and IELR)
   7278 
   7279         * `consistent'
   7280 
   7281         * `accepting' (default for canonical LR)
   7282 
   7283      (The ability to specify where default reductions are permitted is
   7284      experimental.  More user feedback will help to stabilize it.)
   7285 
   7286 
   7287 File: bison.info,  Node: LAC,  Next: Unreachable States,  Prev: Default Reductions,  Up: Tuning LR
   7288 
   7289 5.8.3 LAC
   7290 ---------
   7291 
   7292 Canonical LR, IELR, and LALR can suffer from a couple of problems upon
   7293 encountering a syntax error.  First, the parser might perform additional
   7294 parser stack reductions before discovering the syntax error.  Such
   7295 reductions can perform user semantic actions that are unexpected because
   7296 they are based on an invalid token, and they cause error recovery to
   7297 begin in a different syntactic context than the one in which the
   7298 invalid token was encountered.  Second, when verbose error messages are
   7299 enabled (*note Error Reporting::), the expected token list in the
   7300 syntax error message can both contain invalid tokens and omit valid
   7301 tokens.
   7302 
   7303    The culprits for the above problems are `%nonassoc', default
   7304 reductions in inconsistent states (*note Default Reductions::), and
   7305 parser state merging.  Because IELR and LALR merge parser states, they
   7306 suffer the most.  Canonical LR can suffer only if `%nonassoc' is used
   7307 or if default reductions are enabled for inconsistent states.
   7308 
   7309    LAC (Lookahead Correction) is a new mechanism within the parsing
   7310 algorithm that solves these problems for canonical LR, IELR, and LALR
   7311 without sacrificing `%nonassoc', default reductions, or state merging.
   7312 You can enable LAC with the `%define parse.lac' directive.
   7313 
   7314  -- Directive: %define parse.lac VALUE
   7315      Enable LAC to improve syntax error handling.
   7316         * `none' (default)
   7317 
   7318         * `full'
   7319      (This feature is experimental.  More user feedback will help to
   7320      stabilize it.  Moreover, it is currently only available for
   7321      deterministic parsers in C.)
   7322 
   7323    Conceptually, the LAC mechanism is straight-forward.  Whenever the
   7324 parser fetches a new token from the scanner so that it can determine
   7325 the next parser action, it immediately suspends normal parsing and
   7326 performs an exploratory parse using a temporary copy of the normal
   7327 parser state stack.  During this exploratory parse, the parser does not
   7328 perform user semantic actions.  If the exploratory parse reaches a
   7329 shift action, normal parsing then resumes on the normal parser stacks.
   7330 If the exploratory parse reaches an error instead, the parser reports a
   7331 syntax error.  If verbose syntax error messages are enabled, the parser
   7332 must then discover the list of expected tokens, so it performs a
   7333 separate exploratory parse for each token in the grammar.
   7334 
   7335    There is one subtlety about the use of LAC.  That is, when in a
   7336 consistent parser state with a default reduction, the parser will not
   7337 attempt to fetch a token from the scanner because no lookahead is
   7338 needed to determine the next parser action.  Thus, whether default
   7339 reductions are enabled in consistent states (*note Default
   7340 Reductions::) affects how soon the parser detects a syntax error:
   7341 immediately when it _reaches_ an erroneous token or when it eventually
   7342 _needs_ that token as a lookahead to determine the next parser action.
   7343 The latter behavior is probably more intuitive, so Bison currently
   7344 provides no way to achieve the former behavior while default reductions
   7345 are enabled in consistent states.
   7346 
   7347    Thus, when LAC is in use, for some fixed decision of whether to
   7348 enable default reductions in consistent states, canonical LR and IELR
   7349 behave almost exactly the same for both syntactically acceptable and
   7350 syntactically unacceptable input.  While LALR still does not support
   7351 the full language-recognition power of canonical LR and IELR, LAC at
   7352 least enables LALR's syntax error handling to correctly reflect LALR's
   7353 language-recognition power.
   7354 
   7355    There are a few caveats to consider when using LAC:
   7356 
   7357    * Infinite parsing loops.
   7358 
   7359      IELR plus LAC does have one shortcoming relative to canonical LR.
   7360      Some parsers generated by Bison can loop infinitely.  LAC does not
   7361      fix infinite parsing loops that occur between encountering a
   7362      syntax error and detecting it, but enabling canonical LR or
   7363      disabling default reductions sometimes does.
   7364 
   7365    * Verbose error message limitations.
   7366 
   7367      Because of internationalization considerations, Bison-generated
   7368      parsers limit the size of the expected token list they are willing
   7369      to report in a verbose syntax error message.  If the number of
   7370      expected tokens exceeds that limit, the list is simply dropped
   7371      from the message.  Enabling LAC can increase the size of the list
   7372      and thus cause the parser to drop it.  Of course, dropping the
   7373      list is better than reporting an incorrect list.
   7374 
   7375    * Performance.
   7376 
   7377      Because LAC requires many parse actions to be performed twice, it
   7378      can have a performance penalty.  However, not all parse actions
   7379      must be performed twice.  Specifically, during a series of default
   7380      reductions in consistent states and shift actions, the parser
   7381      never has to initiate an exploratory parse.  Moreover, the most
   7382      time-consuming tasks in a parse are often the file I/O, the
   7383      lexical analysis performed by the scanner, and the user's semantic
   7384      actions, but none of these are performed during the exploratory
   7385      parse.  Finally, the base of the temporary stack used during an
   7386      exploratory parse is a pointer into the normal parser state stack
   7387      so that the stack is never physically copied.  In our experience,
   7388      the performance penalty of LAC has proved insignificant for
   7389      practical grammars.
   7390 
   7391    While the LAC algorithm shares techniques that have been recognized
   7392 in the parser community for years, for the publication that introduces
   7393 LAC, *note Denny 2010 May: Bibliography.
   7394 
   7395 
   7396 File: bison.info,  Node: Unreachable States,  Prev: LAC,  Up: Tuning LR
   7397 
   7398 5.8.4 Unreachable States
   7399 ------------------------
   7400 
   7401 If there exists no sequence of transitions from the parser's start
   7402 state to some state S, then Bison considers S to be an "unreachable
   7403 state".  A state can become unreachable during conflict resolution if
   7404 Bison disables a shift action leading to it from a predecessor state.
   7405 
   7406    By default, Bison removes unreachable states from the parser after
   7407 conflict resolution because they are useless in the generated parser.
   7408 However, keeping unreachable states is sometimes useful when trying to
   7409 understand the relationship between the parser and the grammar.
   7410 
   7411  -- Directive: %define lr.keep-unreachable-states VALUE
   7412      Request that Bison allow unreachable states to remain in the
   7413      parser tables.  VALUE must be a Boolean.  The default is `false'.
   7414 
   7415    There are a few caveats to consider:
   7416 
   7417    * Missing or extraneous warnings.
   7418 
   7419      Unreachable states may contain conflicts and may use rules not
   7420      used in any other state.  Thus, keeping unreachable states may
   7421      induce warnings that are irrelevant to your parser's behavior, and
   7422      it may eliminate warnings that are relevant.  Of course, the
   7423      change in warnings may actually be relevant to a parser table
   7424      analysis that wants to keep unreachable states, so this behavior
   7425      will likely remain in future Bison releases.
   7426 
   7427    * Other useless states.
   7428 
   7429      While Bison is able to remove unreachable states, it is not
   7430      guaranteed to remove other kinds of useless states.  Specifically,
   7431      when Bison disables reduce actions during conflict resolution,
   7432      some goto actions may become useless, and thus some additional
   7433      states may become useless.  If Bison were to compute which goto
   7434      actions were useless and then disable those actions, it could
   7435      identify such states as unreachable and then remove those states.
   7436      However, Bison does not compute which goto actions are useless.
   7437 
   7438 
   7439 File: bison.info,  Node: Generalized LR Parsing,  Next: Memory Management,  Prev: Tuning LR,  Up: Algorithm
   7440 
   7441 5.9 Generalized LR (GLR) Parsing
   7442 ================================
   7443 
   7444 Bison produces _deterministic_ parsers that choose uniquely when to
   7445 reduce and which reduction to apply based on a summary of the preceding
   7446 input and on one extra token of lookahead.  As a result, normal Bison
   7447 handles a proper subset of the family of context-free languages.
   7448 Ambiguous grammars, since they have strings with more than one possible
   7449 sequence of reductions cannot have deterministic parsers in this sense.
   7450 The same is true of languages that require more than one symbol of
   7451 lookahead, since the parser lacks the information necessary to make a
   7452 decision at the point it must be made in a shift-reduce parser.
   7453 Finally, as previously mentioned (*note Mysterious Conflicts::), there
   7454 are languages where Bison's default choice of how to summarize the
   7455 input seen so far loses necessary information.
   7456 
   7457    When you use the `%glr-parser' declaration in your grammar file,
   7458 Bison generates a parser that uses a different algorithm, called
   7459 Generalized LR (or GLR).  A Bison GLR parser uses the same basic
   7460 algorithm for parsing as an ordinary Bison parser, but behaves
   7461 differently in cases where there is a shift-reduce conflict that has not
   7462 been resolved by precedence rules (*note Precedence::) or a
   7463 reduce-reduce conflict.  When a GLR parser encounters such a situation,
   7464 it effectively _splits_ into a several parsers, one for each possible
   7465 shift or reduction.  These parsers then proceed as usual, consuming
   7466 tokens in lock-step.  Some of the stacks may encounter other conflicts
   7467 and split further, with the result that instead of a sequence of states,
   7468 a Bison GLR parsing stack is what is in effect a tree of states.
   7469 
   7470    In effect, each stack represents a guess as to what the proper parse
   7471 is.  Additional input may indicate that a guess was wrong, in which case
   7472 the appropriate stack silently disappears.  Otherwise, the semantics
   7473 actions generated in each stack are saved, rather than being executed
   7474 immediately.  When a stack disappears, its saved semantic actions never
   7475 get executed.  When a reduction causes two stacks to become equivalent,
   7476 their sets of semantic actions are both saved with the state that
   7477 results from the reduction.  We say that two stacks are equivalent when
   7478 they both represent the same sequence of states, and each pair of
   7479 corresponding states represents a grammar symbol that produces the same
   7480 segment of the input token stream.
   7481 
   7482    Whenever the parser makes a transition from having multiple states
   7483 to having one, it reverts to the normal deterministic parsing
   7484 algorithm, after resolving and executing the saved-up actions.  At this
   7485 transition, some of the states on the stack will have semantic values
   7486 that are sets (actually multisets) of possible actions.  The parser
   7487 tries to pick one of the actions by first finding one whose rule has
   7488 the highest dynamic precedence, as set by the `%dprec' declaration.
   7489 Otherwise, if the alternative actions are not ordered by precedence,
   7490 but there the same merging function is declared for both rules by the
   7491 `%merge' declaration, Bison resolves and evaluates both and then calls
   7492 the merge function on the result.  Otherwise, it reports an ambiguity.
   7493 
   7494    It is possible to use a data structure for the GLR parsing tree that
   7495 permits the processing of any LR(1) grammar in linear time (in the size
   7496 of the input), any unambiguous (not necessarily LR(1)) grammar in
   7497 quadratic worst-case time, and any general (possibly ambiguous)
   7498 context-free grammar in cubic worst-case time.  However, Bison currently
   7499 uses a simpler data structure that requires time proportional to the
   7500 length of the input times the maximum number of stacks required for any
   7501 prefix of the input.  Thus, really ambiguous or nondeterministic
   7502 grammars can require exponential time and space to process.  Such badly
   7503 behaving examples, however, are not generally of practical interest.
   7504 Usually, nondeterminism in a grammar is local--the parser is "in doubt"
   7505 only for a few tokens at a time.  Therefore, the current data structure
   7506 should generally be adequate.  On LR(1) portions of a grammar, in
   7507 particular, it is only slightly slower than with the deterministic
   7508 LR(1) Bison parser.
   7509 
   7510    For a more detailed exposition of GLR parsers, *note Scott 2000:
   7511 Bibliography.
   7512 
   7513 
   7514 File: bison.info,  Node: Memory Management,  Prev: Generalized LR Parsing,  Up: Algorithm
   7515 
   7516 5.10 Memory Management, and How to Avoid Memory Exhaustion
   7517 ==========================================================
   7518 
   7519 The Bison parser stack can run out of memory if too many tokens are
   7520 shifted and not reduced.  When this happens, the parser function
   7521 `yyparse' calls `yyerror' and then returns 2.
   7522 
   7523    Because Bison parsers have growing stacks, hitting the upper limit
   7524 usually results from using a right recursion instead of a left
   7525 recursion, see *note Recursive Rules: Recursion.
   7526 
   7527    By defining the macro `YYMAXDEPTH', you can control how deep the
   7528 parser stack can become before memory is exhausted.  Define the macro
   7529 with a value that is an integer.  This value is the maximum number of
   7530 tokens that can be shifted (and not reduced) before overflow.
   7531 
   7532    The stack space allowed is not necessarily allocated.  If you
   7533 specify a large value for `YYMAXDEPTH', the parser normally allocates a
   7534 small stack at first, and then makes it bigger by stages as needed.
   7535 This increasing allocation happens automatically and silently.
   7536 Therefore, you do not need to make `YYMAXDEPTH' painfully small merely
   7537 to save space for ordinary inputs that do not need much stack.
   7538 
   7539    However, do not allow `YYMAXDEPTH' to be a value so large that
   7540 arithmetic overflow could occur when calculating the size of the stack
   7541 space.  Also, do not allow `YYMAXDEPTH' to be less than `YYINITDEPTH'.
   7542 
   7543    The default value of `YYMAXDEPTH', if you do not define it, is 10000.
   7544 
   7545    You can control how much stack is allocated initially by defining the
   7546 macro `YYINITDEPTH' to a positive integer.  For the deterministic
   7547 parser in C, this value must be a compile-time constant unless you are
   7548 assuming C99 or some other target language or compiler that allows
   7549 variable-length arrays.  The default is 200.
   7550 
   7551    Do not allow `YYINITDEPTH' to be greater than `YYMAXDEPTH'.
   7552 
   7553    Because of semantic differences between C and C++, the deterministic
   7554 parsers in C produced by Bison cannot grow when compiled by C++
   7555 compilers.  In this precise case (compiling a C parser as C++) you are
   7556 suggested to grow `YYINITDEPTH'.  The Bison maintainers hope to fix
   7557 this deficiency in a future release.
   7558 
   7559 
   7560 File: bison.info,  Node: Error Recovery,  Next: Context Dependency,  Prev: Algorithm,  Up: Top
   7561 
   7562 6 Error Recovery
   7563 ****************
   7564 
   7565 It is not usually acceptable to have a program terminate on a syntax
   7566 error.  For example, a compiler should recover sufficiently to parse the
   7567 rest of the input file and check it for errors; a calculator should
   7568 accept another expression.
   7569 
   7570    In a simple interactive command parser where each input is one line,
   7571 it may be sufficient to allow `yyparse' to return 1 on error and have
   7572 the caller ignore the rest of the input line when that happens (and