1 This is ../../../doc/bison.info, produced by makeinfo version 4.13 from 2 ../../../doc/bison.texi. 3 4 This manual (9 December 2012) is for GNU Bison (version 2.7), the GNU 5 parser generator. 6 7 Copyright (C) 1988-1993, 1995, 1998-2012 Free Software Foundation, 8 Inc. 9 10 Permission is granted to copy, distribute and/or modify this 11 document under the terms of the GNU Free Documentation License, 12 Version 1.3 or any later version published by the Free Software 13 Foundation; with no Invariant Sections, with the Front-Cover texts 14 being "A GNU Manual," and with the Back-Cover Texts as in (a) 15 below. A copy of the license is included in the section entitled 16 "GNU Free Documentation License." 17 18 (a) The FSF's Back-Cover Text is: "You have the freedom to copy and 19 modify this GNU manual. Buying copies from the FSF supports it in 20 developing GNU and promoting software freedom." 21 22 INFO-DIR-SECTION Software development 23 START-INFO-DIR-ENTRY 24 * bison: (bison). GNU parser generator (Yacc replacement). 25 END-INFO-DIR-ENTRY 26 27 28 File: bison.info, Node: Top, Next: Introduction, Up: (dir) 29 30 Bison 31 ***** 32 33 This manual (9 December 2012) is for GNU Bison (version 2.7), the GNU 34 parser generator. 35 36 Copyright (C) 1988-1993, 1995, 1998-2012 Free Software Foundation, 37 Inc. 38 39 Permission is granted to copy, distribute and/or modify this 40 document under the terms of the GNU Free Documentation License, 41 Version 1.3 or any later version published by the Free Software 42 Foundation; with no Invariant Sections, with the Front-Cover texts 43 being "A GNU Manual," and with the Back-Cover Texts as in (a) 44 below. A copy of the license is included in the section entitled 45 "GNU Free Documentation License." 46 47 (a) The FSF's Back-Cover Text is: "You have the freedom to copy and 48 modify this GNU manual. Buying copies from the FSF supports it in 49 developing GNU and promoting software freedom." 50 51 * Menu: 52 53 * Introduction:: 54 * Conditions:: 55 * Copying:: The GNU General Public License says 56 how you can copy and share Bison. 57 58 Tutorial sections: 59 * Concepts:: Basic concepts for understanding Bison. 60 * Examples:: Three simple explained examples of using Bison. 61 62 Reference sections: 63 * Grammar File:: Writing Bison declarations and rules. 64 * Interface:: C-language interface to the parser function `yyparse'. 65 * Algorithm:: How the Bison parser works at run-time. 66 * Error Recovery:: Writing rules for error recovery. 67 * Context Dependency:: What to do if your language syntax is too 68 messy for Bison to handle straightforwardly. 69 * Debugging:: Understanding or debugging Bison parsers. 70 * Invocation:: How to run Bison (to produce the parser implementation). 71 * Other Languages:: Creating C++ and Java parsers. 72 * FAQ:: Frequently Asked Questions 73 * Table of Symbols:: All the keywords of the Bison language are explained. 74 * Glossary:: Basic concepts are explained. 75 * Copying This Manual:: License for copying this manual. 76 * Bibliography:: Publications cited in this manual. 77 * Index of Terms:: Cross-references to the text. 78 79 --- The Detailed Node Listing --- 80 81 The Concepts of Bison 82 83 * Language and Grammar:: Languages and context-free grammars, 84 as mathematical ideas. 85 * Grammar in Bison:: How we represent grammars for Bison's sake. 86 * Semantic Values:: Each token or syntactic grouping can have 87 a semantic value (the value of an integer, 88 the name of an identifier, etc.). 89 * Semantic Actions:: Each rule can have an action containing C code. 90 * GLR Parsers:: Writing parsers for general context-free languages. 91 * Locations:: Overview of location tracking. 92 * Bison Parser:: What are Bison's input and output, 93 how is the output used? 94 * Stages:: Stages in writing and running Bison grammars. 95 * Grammar Layout:: Overall structure of a Bison grammar file. 96 97 Writing GLR Parsers 98 99 * Simple GLR Parsers:: Using GLR parsers on unambiguous grammars. 100 * Merging GLR Parses:: Using GLR parsers to resolve ambiguities. 101 * GLR Semantic Actions:: Deferred semantic actions have special concerns. 102 * Compiler Requirements:: GLR parsers require a modern C compiler. 103 104 Examples 105 106 * RPN Calc:: Reverse polish notation calculator; 107 a first example with no operator precedence. 108 * Infix Calc:: Infix (algebraic) notation calculator. 109 Operator precedence is introduced. 110 * Simple Error Recovery:: Continuing after syntax errors. 111 * Location Tracking Calc:: Demonstrating the use of @N and @$. 112 * Multi-function Calc:: Calculator with memory and trig functions. 113 It uses multiple data-types for semantic values. 114 * Exercises:: Ideas for improving the multi-function calculator. 115 116 Reverse Polish Notation Calculator 117 118 * Rpcalc Declarations:: Prologue (declarations) for rpcalc. 119 * Rpcalc Rules:: Grammar Rules for rpcalc, with explanation. 120 * Rpcalc Lexer:: The lexical analyzer. 121 * Rpcalc Main:: The controlling function. 122 * Rpcalc Error:: The error reporting function. 123 * Rpcalc Generate:: Running Bison on the grammar file. 124 * Rpcalc Compile:: Run the C compiler on the output code. 125 126 Grammar Rules for `rpcalc' 127 128 * Rpcalc Input:: 129 * Rpcalc Line:: 130 * Rpcalc Expr:: 131 132 Location Tracking Calculator: `ltcalc' 133 134 * Ltcalc Declarations:: Bison and C declarations for ltcalc. 135 * Ltcalc Rules:: Grammar rules for ltcalc, with explanations. 136 * Ltcalc Lexer:: The lexical analyzer. 137 138 Multi-Function Calculator: `mfcalc' 139 140 * Mfcalc Declarations:: Bison declarations for multi-function calculator. 141 * Mfcalc Rules:: Grammar rules for the calculator. 142 * Mfcalc Symbol Table:: Symbol table management subroutines. 143 144 Bison Grammar Files 145 146 * Grammar Outline:: Overall layout of the grammar file. 147 * Symbols:: Terminal and nonterminal symbols. 148 * Rules:: How to write grammar rules. 149 * Recursion:: Writing recursive rules. 150 * Semantics:: Semantic values and actions. 151 * Tracking Locations:: Locations and actions. 152 * Named References:: Using named references in actions. 153 * Declarations:: All kinds of Bison declarations are described here. 154 * Multiple Parsers:: Putting more than one Bison parser in one program. 155 156 Outline of a Bison Grammar 157 158 * Prologue:: Syntax and usage of the prologue. 159 * Prologue Alternatives:: Syntax and usage of alternatives to the prologue. 160 * Bison Declarations:: Syntax and usage of the Bison declarations section. 161 * Grammar Rules:: Syntax and usage of the grammar rules section. 162 * Epilogue:: Syntax and usage of the epilogue. 163 164 Defining Language Semantics 165 166 * Value Type:: Specifying one data type for all semantic values. 167 * Multiple Types:: Specifying several alternative data types. 168 * Actions:: An action is the semantic definition of a grammar rule. 169 * Action Types:: Specifying data types for actions to operate on. 170 * Mid-Rule Actions:: Most actions go at the end of a rule. 171 This says when, why and how to use the exceptional 172 action in the middle of a rule. 173 174 Actions in Mid-Rule 175 176 * Using Mid-Rule Actions:: Putting an action in the middle of a rule. 177 * Mid-Rule Action Translation:: How mid-rule actions are actually processed. 178 * Mid-Rule Conflicts:: Mid-rule actions can cause conflicts. 179 180 Tracking Locations 181 182 * Location Type:: Specifying a data type for locations. 183 * Actions and Locations:: Using locations in actions. 184 * Location Default Action:: Defining a general way to compute locations. 185 186 Bison Declarations 187 188 * Require Decl:: Requiring a Bison version. 189 * Token Decl:: Declaring terminal symbols. 190 * Precedence Decl:: Declaring terminals with precedence and associativity. 191 * Union Decl:: Declaring the set of all semantic value types. 192 * Type Decl:: Declaring the choice of type for a nonterminal symbol. 193 * Initial Action Decl:: Code run before parsing starts. 194 * Destructor Decl:: Declaring how symbols are freed. 195 * Printer Decl:: Declaring how symbol values are displayed. 196 * Expect Decl:: Suppressing warnings about parsing conflicts. 197 * Start Decl:: Specifying the start symbol. 198 * Pure Decl:: Requesting a reentrant parser. 199 * Push Decl:: Requesting a push parser. 200 * Decl Summary:: Table of all Bison declarations. 201 * %define Summary:: Defining variables to adjust Bison's behavior. 202 * %code Summary:: Inserting code into the parser source. 203 204 Parser C-Language Interface 205 206 * Parser Function:: How to call `yyparse' and what it returns. 207 * Push Parser Function:: How to call `yypush_parse' and what it returns. 208 * Pull Parser Function:: How to call `yypull_parse' and what it returns. 209 * Parser Create Function:: How to call `yypstate_new' and what it returns. 210 * Parser Delete Function:: How to call `yypstate_delete' and what it returns. 211 * Lexical:: You must supply a function `yylex' 212 which reads tokens. 213 * Error Reporting:: You must supply a function `yyerror'. 214 * Action Features:: Special features for use in actions. 215 * Internationalization:: How to let the parser speak in the user's 216 native language. 217 218 The Lexical Analyzer Function `yylex' 219 220 * Calling Convention:: How `yyparse' calls `yylex'. 221 * Token Values:: How `yylex' must return the semantic value 222 of the token it has read. 223 * Token Locations:: How `yylex' must return the text location 224 (line number, etc.) of the token, if the 225 actions want that. 226 * Pure Calling:: How the calling convention differs in a pure parser 227 (*note A Pure (Reentrant) Parser: Pure Decl.). 228 229 The Bison Parser Algorithm 230 231 * Lookahead:: Parser looks one token ahead when deciding what to do. 232 * Shift/Reduce:: Conflicts: when either shifting or reduction is valid. 233 * Precedence:: Operator precedence works by resolving conflicts. 234 * Contextual Precedence:: When an operator's precedence depends on context. 235 * Parser States:: The parser is a finite-state-machine with stack. 236 * Reduce/Reduce:: When two rules are applicable in the same situation. 237 * Mysterious Conflicts:: Conflicts that look unjustified. 238 * Tuning LR:: How to tune fundamental aspects of LR-based parsing. 239 * Generalized LR Parsing:: Parsing arbitrary context-free grammars. 240 * Memory Management:: What happens when memory is exhausted. How to avoid it. 241 242 Operator Precedence 243 244 * Why Precedence:: An example showing why precedence is needed. 245 * Using Precedence:: How to specify precedence in Bison grammars. 246 * Precedence Examples:: How these features are used in the previous example. 247 * How Precedence:: How they work. 248 * Non Operators:: Using precedence for general conflicts. 249 250 Tuning LR 251 252 * LR Table Construction:: Choose a different construction algorithm. 253 * Default Reductions:: Disable default reductions. 254 * LAC:: Correct lookahead sets in the parser states. 255 * Unreachable States:: Keep unreachable parser states for debugging. 256 257 Handling Context Dependencies 258 259 * Semantic Tokens:: Token parsing can depend on the semantic context. 260 * Lexical Tie-ins:: Token parsing can depend on the syntactic context. 261 * Tie-in Recovery:: Lexical tie-ins have implications for how 262 error recovery rules must be written. 263 264 Debugging Your Parser 265 266 * Understanding:: Understanding the structure of your parser. 267 * Graphviz:: Getting a visual representation of the parser. 268 * Xml:: Getting a markup representation of the parser. 269 * Tracing:: Tracing the execution of your parser. 270 271 Tracing Your Parser 272 273 * Enabling Traces:: Activating run-time trace support 274 * Mfcalc Traces:: Extending `mfcalc' to support traces 275 * The YYPRINT Macro:: Obsolete interface for semantic value reports 276 277 Invoking Bison 278 279 * Bison Options:: All the options described in detail, 280 in alphabetical order by short options. 281 * Option Cross Key:: Alphabetical list of long options. 282 * Yacc Library:: Yacc-compatible `yylex' and `main'. 283 284 Parsers Written In Other Languages 285 286 * C++ Parsers:: The interface to generate C++ parser classes 287 * Java Parsers:: The interface to generate Java parser classes 288 289 C++ Parsers 290 291 * C++ Bison Interface:: Asking for C++ parser generation 292 * C++ Semantic Values:: %union vs. C++ 293 * C++ Location Values:: The position and location classes 294 * C++ Parser Interface:: Instantiating and running the parser 295 * C++ Scanner Interface:: Exchanges between yylex and parse 296 * A Complete C++ Example:: Demonstrating their use 297 298 C++ Location Values 299 300 * C++ position:: One point in the source file 301 * C++ location:: Two points in the source file 302 * User Defined Location Type:: Required interface for locations 303 304 A Complete C++ Example 305 306 * Calc++ --- C++ Calculator:: The specifications 307 * Calc++ Parsing Driver:: An active parsing context 308 * Calc++ Parser:: A parser class 309 * Calc++ Scanner:: A pure C++ Flex scanner 310 * Calc++ Top Level:: Conducting the band 311 312 Java Parsers 313 314 * Java Bison Interface:: Asking for Java parser generation 315 * Java Semantic Values:: %type and %token vs. Java 316 * Java Location Values:: The position and location classes 317 * Java Parser Interface:: Instantiating and running the parser 318 * Java Scanner Interface:: Specifying the scanner for the parser 319 * Java Action Features:: Special features for use in actions 320 * Java Differences:: Differences between C/C++ and Java Grammars 321 * Java Declarations Summary:: List of Bison declarations used with Java 322 323 Frequently Asked Questions 324 325 * Memory Exhausted:: Breaking the Stack Limits 326 * How Can I Reset the Parser:: `yyparse' Keeps some State 327 * Strings are Destroyed:: `yylval' Loses Track of Strings 328 * Implementing Gotos/Loops:: Control Flow in the Calculator 329 * Multiple start-symbols:: Factoring closely related grammars 330 * Secure? Conform?:: Is Bison POSIX safe? 331 * I can't build Bison:: Troubleshooting 332 * Where can I find help?:: Troubleshouting 333 * Bug Reports:: Troublereporting 334 * More Languages:: Parsers in C++, Java, and so on 335 * Beta Testing:: Experimenting development versions 336 * Mailing Lists:: Meeting other Bison users 337 338 Copying This Manual 339 340 * Copying This Manual:: License for copying this manual. 341 342 343 File: bison.info, Node: Introduction, Next: Conditions, Prev: Top, Up: Top 344 345 Introduction 346 ************ 347 348 "Bison" is a general-purpose parser generator that converts an 349 annotated context-free grammar into a deterministic LR or generalized 350 LR (GLR) parser employing LALR(1) parser tables. As an experimental 351 feature, Bison can also generate IELR(1) or canonical LR(1) parser 352 tables. Once you are proficient with Bison, you can use it to develop 353 a wide range of language parsers, from those used in simple desk 354 calculators to complex programming languages. 355 356 Bison is upward compatible with Yacc: all properly-written Yacc 357 grammars ought to work with Bison with no change. Anyone familiar with 358 Yacc should be able to use Bison with little trouble. You need to be 359 fluent in C or C++ programming in order to use Bison or to understand 360 this manual. Java is also supported as an experimental feature. 361 362 We begin with tutorial chapters that explain the basic concepts of 363 using Bison and show three explained examples, each building on the 364 last. If you don't know Bison or Yacc, start by reading these 365 chapters. Reference chapters follow, which describe specific aspects 366 of Bison in detail. 367 368 Bison was written originally by Robert Corbett. Richard Stallman 369 made it Yacc-compatible. Wilfred Hansen of Carnegie Mellon University 370 added multi-character string literals and other features. Since then, 371 Bison has grown more robust and evolved many other new features thanks 372 to the hard work of a long list of volunteers. For details, see the 373 `THANKS' and `ChangeLog' files included in the Bison distribution. 374 375 This edition corresponds to version 2.7 of Bison. 376 377 378 File: bison.info, Node: Conditions, Next: Copying, Prev: Introduction, Up: Top 379 380 Conditions for Using Bison 381 ************************** 382 383 The distribution terms for Bison-generated parsers permit using the 384 parsers in nonfree programs. Before Bison version 2.2, these extra 385 permissions applied only when Bison was generating LALR(1) parsers in 386 C. And before Bison version 1.24, Bison-generated parsers could be 387 used only in programs that were free software. 388 389 The other GNU programming tools, such as the GNU C compiler, have 390 never had such a requirement. They could always be used for nonfree 391 software. The reason Bison was different was not due to a special 392 policy decision; it resulted from applying the usual General Public 393 License to all of the Bison source code. 394 395 The main output of the Bison utility--the Bison parser implementation 396 file--contains a verbatim copy of a sizable piece of Bison, which is 397 the code for the parser's implementation. (The actions from your 398 grammar are inserted into this implementation at one point, but most of 399 the rest of the implementation is not changed.) When we applied the 400 GPL terms to the skeleton code for the parser's implementation, the 401 effect was to restrict the use of Bison output to free software. 402 403 We didn't change the terms because of sympathy for people who want to 404 make software proprietary. *Software should be free.* But we 405 concluded that limiting Bison's use to free software was doing little to 406 encourage people to make other software free. So we decided to make the 407 practical conditions for using Bison match the practical conditions for 408 using the other GNU tools. 409 410 This exception applies when Bison is generating code for a parser. 411 You can tell whether the exception applies to a Bison output file by 412 inspecting the file for text beginning with "As a special 413 exception...". The text spells out the exact terms of the exception. 414 415 416 File: bison.info, Node: Copying, Next: Concepts, Prev: Conditions, Up: Top 417 418 GNU GENERAL PUBLIC LICENSE 419 ************************** 420 421 Version 3, 29 June 2007 422 423 Copyright (C) 2007 Free Software Foundation, Inc. `http://fsf.org/' 424 425 Everyone is permitted to copy and distribute verbatim copies of this 426 license document, but changing it is not allowed. 427 428 Preamble 429 ======== 430 431 The GNU General Public License is a free, copyleft license for software 432 and other kinds of works. 433 434 The licenses for most software and other practical works are designed 435 to take away your freedom to share and change the works. By contrast, 436 the GNU General Public License is intended to guarantee your freedom to 437 share and change all versions of a program--to make sure it remains 438 free software for all its users. We, the Free Software Foundation, use 439 the GNU General Public License for most of our software; it applies 440 also to any other work released this way by its authors. You can apply 441 it to your programs, too. 442 443 When we speak of free software, we are referring to freedom, not 444 price. Our General Public Licenses are designed to make sure that you 445 have the freedom to distribute copies of free software (and charge for 446 them if you wish), that you receive source code or can get it if you 447 want it, that you can change the software or use pieces of it in new 448 free programs, and that you know you can do these things. 449 450 To protect your rights, we need to prevent others from denying you 451 these rights or asking you to surrender the rights. Therefore, you 452 have certain responsibilities if you distribute copies of the software, 453 or if you modify it: responsibilities to respect the freedom of others. 454 455 For example, if you distribute copies of such a program, whether 456 gratis or for a fee, you must pass on to the recipients the same 457 freedoms that you received. You must make sure that they, too, receive 458 or can get the source code. And you must show them these terms so they 459 know their rights. 460 461 Developers that use the GNU GPL protect your rights with two steps: 462 (1) assert copyright on the software, and (2) offer you this License 463 giving you legal permission to copy, distribute and/or modify it. 464 465 For the developers' and authors' protection, the GPL clearly explains 466 that there is no warranty for this free software. For both users' and 467 authors' sake, the GPL requires that modified versions be marked as 468 changed, so that their problems will not be attributed erroneously to 469 authors of previous versions. 470 471 Some devices are designed to deny users access to install or run 472 modified versions of the software inside them, although the 473 manufacturer can do so. This is fundamentally incompatible with the 474 aim of protecting users' freedom to change the software. The 475 systematic pattern of such abuse occurs in the area of products for 476 individuals to use, which is precisely where it is most unacceptable. 477 Therefore, we have designed this version of the GPL to prohibit the 478 practice for those products. If such problems arise substantially in 479 other domains, we stand ready to extend this provision to those domains 480 in future versions of the GPL, as needed to protect the freedom of 481 users. 482 483 Finally, every program is threatened constantly by software patents. 484 States should not allow patents to restrict development and use of 485 software on general-purpose computers, but in those that do, we wish to 486 avoid the special danger that patents applied to a free program could 487 make it effectively proprietary. To prevent this, the GPL assures that 488 patents cannot be used to render the program non-free. 489 490 The precise terms and conditions for copying, distribution and 491 modification follow. 492 493 TERMS AND CONDITIONS 494 ==================== 495 496 0. Definitions. 497 498 "This License" refers to version 3 of the GNU General Public 499 License. 500 501 "Copyright" also means copyright-like laws that apply to other 502 kinds of works, such as semiconductor masks. 503 504 "The Program" refers to any copyrightable work licensed under this 505 License. Each licensee is addressed as "you". "Licensees" and 506 "recipients" may be individuals or organizations. 507 508 To "modify" a work means to copy from or adapt all or part of the 509 work in a fashion requiring copyright permission, other than the 510 making of an exact copy. The resulting work is called a "modified 511 version" of the earlier work or a work "based on" the earlier work. 512 513 A "covered work" means either the unmodified Program or a work 514 based on the Program. 515 516 To "propagate" a work means to do anything with it that, without 517 permission, would make you directly or secondarily liable for 518 infringement under applicable copyright law, except executing it 519 on a computer or modifying a private copy. Propagation includes 520 copying, distribution (with or without modification), making 521 available to the public, and in some countries other activities as 522 well. 523 524 To "convey" a work means any kind of propagation that enables other 525 parties to make or receive copies. Mere interaction with a user 526 through a computer network, with no transfer of a copy, is not 527 conveying. 528 529 An interactive user interface displays "Appropriate Legal Notices" 530 to the extent that it includes a convenient and prominently visible 531 feature that (1) displays an appropriate copyright notice, and (2) 532 tells the user that there is no warranty for the work (except to 533 the extent that warranties are provided), that licensees may 534 convey the work under this License, and how to view a copy of this 535 License. If the interface presents a list of user commands or 536 options, such as a menu, a prominent item in the list meets this 537 criterion. 538 539 1. Source Code. 540 541 The "source code" for a work means the preferred form of the work 542 for making modifications to it. "Object code" means any 543 non-source form of a work. 544 545 A "Standard Interface" means an interface that either is an 546 official standard defined by a recognized standards body, or, in 547 the case of interfaces specified for a particular programming 548 language, one that is widely used among developers working in that 549 language. 550 551 The "System Libraries" of an executable work include anything, 552 other than the work as a whole, that (a) is included in the normal 553 form of packaging a Major Component, but which is not part of that 554 Major Component, and (b) serves only to enable use of the work 555 with that Major Component, or to implement a Standard Interface 556 for which an implementation is available to the public in source 557 code form. A "Major Component", in this context, means a major 558 essential component (kernel, window system, and so on) of the 559 specific operating system (if any) on which the executable work 560 runs, or a compiler used to produce the work, or an object code 561 interpreter used to run it. 562 563 The "Corresponding Source" for a work in object code form means all 564 the source code needed to generate, install, and (for an executable 565 work) run the object code and to modify the work, including 566 scripts to control those activities. However, it does not include 567 the work's System Libraries, or general-purpose tools or generally 568 available free programs which are used unmodified in performing 569 those activities but which are not part of the work. For example, 570 Corresponding Source includes interface definition files 571 associated with source files for the work, and the source code for 572 shared libraries and dynamically linked subprograms that the work 573 is specifically designed to require, such as by intimate data 574 communication or control flow between those subprograms and other 575 parts of the work. 576 577 The Corresponding Source need not include anything that users can 578 regenerate automatically from other parts of the Corresponding 579 Source. 580 581 The Corresponding Source for a work in source code form is that 582 same work. 583 584 2. Basic Permissions. 585 586 All rights granted under this License are granted for the term of 587 copyright on the Program, and are irrevocable provided the stated 588 conditions are met. This License explicitly affirms your unlimited 589 permission to run the unmodified Program. The output from running 590 a covered work is covered by this License only if the output, 591 given its content, constitutes a covered work. This License 592 acknowledges your rights of fair use or other equivalent, as 593 provided by copyright law. 594 595 You may make, run and propagate covered works that you do not 596 convey, without conditions so long as your license otherwise 597 remains in force. You may convey covered works to others for the 598 sole purpose of having them make modifications exclusively for 599 you, or provide you with facilities for running those works, 600 provided that you comply with the terms of this License in 601 conveying all material for which you do not control copyright. 602 Those thus making or running the covered works for you must do so 603 exclusively on your behalf, under your direction and control, on 604 terms that prohibit them from making any copies of your 605 copyrighted material outside their relationship with you. 606 607 Conveying under any other circumstances is permitted solely under 608 the conditions stated below. Sublicensing is not allowed; section 609 10 makes it unnecessary. 610 611 3. Protecting Users' Legal Rights From Anti-Circumvention Law. 612 613 No covered work shall be deemed part of an effective technological 614 measure under any applicable law fulfilling obligations under 615 article 11 of the WIPO copyright treaty adopted on 20 December 616 1996, or similar laws prohibiting or restricting circumvention of 617 such measures. 618 619 When you convey a covered work, you waive any legal power to forbid 620 circumvention of technological measures to the extent such 621 circumvention is effected by exercising rights under this License 622 with respect to the covered work, and you disclaim any intention 623 to limit operation or modification of the work as a means of 624 enforcing, against the work's users, your or third parties' legal 625 rights to forbid circumvention of technological measures. 626 627 4. Conveying Verbatim Copies. 628 629 You may convey verbatim copies of the Program's source code as you 630 receive it, in any medium, provided that you conspicuously and 631 appropriately publish on each copy an appropriate copyright notice; 632 keep intact all notices stating that this License and any 633 non-permissive terms added in accord with section 7 apply to the 634 code; keep intact all notices of the absence of any warranty; and 635 give all recipients a copy of this License along with the Program. 636 637 You may charge any price or no price for each copy that you convey, 638 and you may offer support or warranty protection for a fee. 639 640 5. Conveying Modified Source Versions. 641 642 You may convey a work based on the Program, or the modifications to 643 produce it from the Program, in the form of source code under the 644 terms of section 4, provided that you also meet all of these 645 conditions: 646 647 a. The work must carry prominent notices stating that you 648 modified it, and giving a relevant date. 649 650 b. The work must carry prominent notices stating that it is 651 released under this License and any conditions added under 652 section 7. This requirement modifies the requirement in 653 section 4 to "keep intact all notices". 654 655 c. You must license the entire work, as a whole, under this 656 License to anyone who comes into possession of a copy. This 657 License will therefore apply, along with any applicable 658 section 7 additional terms, to the whole of the work, and all 659 its parts, regardless of how they are packaged. This License 660 gives no permission to license the work in any other way, but 661 it does not invalidate such permission if you have separately 662 received it. 663 664 d. If the work has interactive user interfaces, each must display 665 Appropriate Legal Notices; however, if the Program has 666 interactive interfaces that do not display Appropriate Legal 667 Notices, your work need not make them do so. 668 669 A compilation of a covered work with other separate and independent 670 works, which are not by their nature extensions of the covered 671 work, and which are not combined with it such as to form a larger 672 program, in or on a volume of a storage or distribution medium, is 673 called an "aggregate" if the compilation and its resulting 674 copyright are not used to limit the access or legal rights of the 675 compilation's users beyond what the individual works permit. 676 Inclusion of a covered work in an aggregate does not cause this 677 License to apply to the other parts of the aggregate. 678 679 6. Conveying Non-Source Forms. 680 681 You may convey a covered work in object code form under the terms 682 of sections 4 and 5, provided that you also convey the 683 machine-readable Corresponding Source under the terms of this 684 License, in one of these ways: 685 686 a. Convey the object code in, or embodied in, a physical product 687 (including a physical distribution medium), accompanied by the 688 Corresponding Source fixed on a durable physical medium 689 customarily used for software interchange. 690 691 b. Convey the object code in, or embodied in, a physical product 692 (including a physical distribution medium), accompanied by a 693 written offer, valid for at least three years and valid for 694 as long as you offer spare parts or customer support for that 695 product model, to give anyone who possesses the object code 696 either (1) a copy of the Corresponding Source for all the 697 software in the product that is covered by this License, on a 698 durable physical medium customarily used for software 699 interchange, for a price no more than your reasonable cost of 700 physically performing this conveying of source, or (2) access 701 to copy the Corresponding Source from a network server at no 702 charge. 703 704 c. Convey individual copies of the object code with a copy of 705 the written offer to provide the Corresponding Source. This 706 alternative is allowed only occasionally and noncommercially, 707 and only if you received the object code with such an offer, 708 in accord with subsection 6b. 709 710 d. Convey the object code by offering access from a designated 711 place (gratis or for a charge), and offer equivalent access 712 to the Corresponding Source in the same way through the same 713 place at no further charge. You need not require recipients 714 to copy the Corresponding Source along with the object code. 715 If the place to copy the object code is a network server, the 716 Corresponding Source may be on a different server (operated 717 by you or a third party) that supports equivalent copying 718 facilities, provided you maintain clear directions next to 719 the object code saying where to find the Corresponding Source. 720 Regardless of what server hosts the Corresponding Source, you 721 remain obligated to ensure that it is available for as long 722 as needed to satisfy these requirements. 723 724 e. Convey the object code using peer-to-peer transmission, 725 provided you inform other peers where the object code and 726 Corresponding Source of the work are being offered to the 727 general public at no charge under subsection 6d. 728 729 730 A separable portion of the object code, whose source code is 731 excluded from the Corresponding Source as a System Library, need 732 not be included in conveying the object code work. 733 734 A "User Product" is either (1) a "consumer product", which means 735 any tangible personal property which is normally used for personal, 736 family, or household purposes, or (2) anything designed or sold for 737 incorporation into a dwelling. In determining whether a product 738 is a consumer product, doubtful cases shall be resolved in favor of 739 coverage. For a particular product received by a particular user, 740 "normally used" refers to a typical or common use of that class of 741 product, regardless of the status of the particular user or of the 742 way in which the particular user actually uses, or expects or is 743 expected to use, the product. A product is a consumer product 744 regardless of whether the product has substantial commercial, 745 industrial or non-consumer uses, unless such uses represent the 746 only significant mode of use of the product. 747 748 "Installation Information" for a User Product means any methods, 749 procedures, authorization keys, or other information required to 750 install and execute modified versions of a covered work in that 751 User Product from a modified version of its Corresponding Source. 752 The information must suffice to ensure that the continued 753 functioning of the modified object code is in no case prevented or 754 interfered with solely because modification has been made. 755 756 If you convey an object code work under this section in, or with, 757 or specifically for use in, a User Product, and the conveying 758 occurs as part of a transaction in which the right of possession 759 and use of the User Product is transferred to the recipient in 760 perpetuity or for a fixed term (regardless of how the transaction 761 is characterized), the Corresponding Source conveyed under this 762 section must be accompanied by the Installation Information. But 763 this requirement does not apply if neither you nor any third party 764 retains the ability to install modified object code on the User 765 Product (for example, the work has been installed in ROM). 766 767 The requirement to provide Installation Information does not 768 include a requirement to continue to provide support service, 769 warranty, or updates for a work that has been modified or 770 installed by the recipient, or for the User Product in which it 771 has been modified or installed. Access to a network may be denied 772 when the modification itself materially and adversely affects the 773 operation of the network or violates the rules and protocols for 774 communication across the network. 775 776 Corresponding Source conveyed, and Installation Information 777 provided, in accord with this section must be in a format that is 778 publicly documented (and with an implementation available to the 779 public in source code form), and must require no special password 780 or key for unpacking, reading or copying. 781 782 7. Additional Terms. 783 784 "Additional permissions" are terms that supplement the terms of 785 this License by making exceptions from one or more of its 786 conditions. Additional permissions that are applicable to the 787 entire Program shall be treated as though they were included in 788 this License, to the extent that they are valid under applicable 789 law. If additional permissions apply only to part of the Program, 790 that part may be used separately under those permissions, but the 791 entire Program remains governed by this License without regard to 792 the additional permissions. 793 794 When you convey a copy of a covered work, you may at your option 795 remove any additional permissions from that copy, or from any part 796 of it. (Additional permissions may be written to require their own 797 removal in certain cases when you modify the work.) You may place 798 additional permissions on material, added by you to a covered work, 799 for which you have or can give appropriate copyright permission. 800 801 Notwithstanding any other provision of this License, for material 802 you add to a covered work, you may (if authorized by the copyright 803 holders of that material) supplement the terms of this License 804 with terms: 805 806 a. Disclaiming warranty or limiting liability differently from 807 the terms of sections 15 and 16 of this License; or 808 809 b. Requiring preservation of specified reasonable legal notices 810 or author attributions in that material or in the Appropriate 811 Legal Notices displayed by works containing it; or 812 813 c. Prohibiting misrepresentation of the origin of that material, 814 or requiring that modified versions of such material be 815 marked in reasonable ways as different from the original 816 version; or 817 818 d. Limiting the use for publicity purposes of names of licensors 819 or authors of the material; or 820 821 e. Declining to grant rights under trademark law for use of some 822 trade names, trademarks, or service marks; or 823 824 f. Requiring indemnification of licensors and authors of that 825 material by anyone who conveys the material (or modified 826 versions of it) with contractual assumptions of liability to 827 the recipient, for any liability that these contractual 828 assumptions directly impose on those licensors and authors. 829 830 All other non-permissive additional terms are considered "further 831 restrictions" within the meaning of section 10. If the Program as 832 you received it, or any part of it, contains a notice stating that 833 it is governed by this License along with a term that is a further 834 restriction, you may remove that term. If a license document 835 contains a further restriction but permits relicensing or 836 conveying under this License, you may add to a covered work 837 material governed by the terms of that license document, provided 838 that the further restriction does not survive such relicensing or 839 conveying. 840 841 If you add terms to a covered work in accord with this section, you 842 must place, in the relevant source files, a statement of the 843 additional terms that apply to those files, or a notice indicating 844 where to find the applicable terms. 845 846 Additional terms, permissive or non-permissive, may be stated in 847 the form of a separately written license, or stated as exceptions; 848 the above requirements apply either way. 849 850 8. Termination. 851 852 You may not propagate or modify a covered work except as expressly 853 provided under this License. Any attempt otherwise to propagate or 854 modify it is void, and will automatically terminate your rights 855 under this License (including any patent licenses granted under 856 the third paragraph of section 11). 857 858 However, if you cease all violation of this License, then your 859 license from a particular copyright holder is reinstated (a) 860 provisionally, unless and until the copyright holder explicitly 861 and finally terminates your license, and (b) permanently, if the 862 copyright holder fails to notify you of the violation by some 863 reasonable means prior to 60 days after the cessation. 864 865 Moreover, your license from a particular copyright holder is 866 reinstated permanently if the copyright holder notifies you of the 867 violation by some reasonable means, this is the first time you have 868 received notice of violation of this License (for any work) from 869 that copyright holder, and you cure the violation prior to 30 days 870 after your receipt of the notice. 871 872 Termination of your rights under this section does not terminate 873 the licenses of parties who have received copies or rights from 874 you under this License. If your rights have been terminated and 875 not permanently reinstated, you do not qualify to receive new 876 licenses for the same material under section 10. 877 878 9. Acceptance Not Required for Having Copies. 879 880 You are not required to accept this License in order to receive or 881 run a copy of the Program. Ancillary propagation of a covered work 882 occurring solely as a consequence of using peer-to-peer 883 transmission to receive a copy likewise does not require 884 acceptance. However, nothing other than this License grants you 885 permission to propagate or modify any covered work. These actions 886 infringe copyright if you do not accept this License. Therefore, 887 by modifying or propagating a covered work, you indicate your 888 acceptance of this License to do so. 889 890 10. Automatic Licensing of Downstream Recipients. 891 892 Each time you convey a covered work, the recipient automatically 893 receives a license from the original licensors, to run, modify and 894 propagate that work, subject to this License. You are not 895 responsible for enforcing compliance by third parties with this 896 License. 897 898 An "entity transaction" is a transaction transferring control of an 899 organization, or substantially all assets of one, or subdividing an 900 organization, or merging organizations. If propagation of a 901 covered work results from an entity transaction, each party to that 902 transaction who receives a copy of the work also receives whatever 903 licenses to the work the party's predecessor in interest had or 904 could give under the previous paragraph, plus a right to 905 possession of the Corresponding Source of the work from the 906 predecessor in interest, if the predecessor has it or can get it 907 with reasonable efforts. 908 909 You may not impose any further restrictions on the exercise of the 910 rights granted or affirmed under this License. For example, you 911 may not impose a license fee, royalty, or other charge for 912 exercise of rights granted under this License, and you may not 913 initiate litigation (including a cross-claim or counterclaim in a 914 lawsuit) alleging that any patent claim is infringed by making, 915 using, selling, offering for sale, or importing the Program or any 916 portion of it. 917 918 11. Patents. 919 920 A "contributor" is a copyright holder who authorizes use under this 921 License of the Program or a work on which the Program is based. 922 The work thus licensed is called the contributor's "contributor 923 version". 924 925 A contributor's "essential patent claims" are all patent claims 926 owned or controlled by the contributor, whether already acquired or 927 hereafter acquired, that would be infringed by some manner, 928 permitted by this License, of making, using, or selling its 929 contributor version, but do not include claims that would be 930 infringed only as a consequence of further modification of the 931 contributor version. For purposes of this definition, "control" 932 includes the right to grant patent sublicenses in a manner 933 consistent with the requirements of this License. 934 935 Each contributor grants you a non-exclusive, worldwide, 936 royalty-free patent license under the contributor's essential 937 patent claims, to make, use, sell, offer for sale, import and 938 otherwise run, modify and propagate the contents of its 939 contributor version. 940 941 In the following three paragraphs, a "patent license" is any 942 express agreement or commitment, however denominated, not to 943 enforce a patent (such as an express permission to practice a 944 patent or covenant not to sue for patent infringement). To 945 "grant" such a patent license to a party means to make such an 946 agreement or commitment not to enforce a patent against the party. 947 948 If you convey a covered work, knowingly relying on a patent 949 license, and the Corresponding Source of the work is not available 950 for anyone to copy, free of charge and under the terms of this 951 License, through a publicly available network server or other 952 readily accessible means, then you must either (1) cause the 953 Corresponding Source to be so available, or (2) arrange to deprive 954 yourself of the benefit of the patent license for this particular 955 work, or (3) arrange, in a manner consistent with the requirements 956 of this License, to extend the patent license to downstream 957 recipients. "Knowingly relying" means you have actual knowledge 958 that, but for the patent license, your conveying the covered work 959 in a country, or your recipient's use of the covered work in a 960 country, would infringe one or more identifiable patents in that 961 country that you have reason to believe are valid. 962 963 If, pursuant to or in connection with a single transaction or 964 arrangement, you convey, or propagate by procuring conveyance of, a 965 covered work, and grant a patent license to some of the parties 966 receiving the covered work authorizing them to use, propagate, 967 modify or convey a specific copy of the covered work, then the 968 patent license you grant is automatically extended to all 969 recipients of the covered work and works based on it. 970 971 A patent license is "discriminatory" if it does not include within 972 the scope of its coverage, prohibits the exercise of, or is 973 conditioned on the non-exercise of one or more of the rights that 974 are specifically granted under this License. You may not convey a 975 covered work if you are a party to an arrangement with a third 976 party that is in the business of distributing software, under 977 which you make payment to the third party based on the extent of 978 your activity of conveying the work, and under which the third 979 party grants, to any of the parties who would receive the covered 980 work from you, a discriminatory patent license (a) in connection 981 with copies of the covered work conveyed by you (or copies made 982 from those copies), or (b) primarily for and in connection with 983 specific products or compilations that contain the covered work, 984 unless you entered into that arrangement, or that patent license 985 was granted, prior to 28 March 2007. 986 987 Nothing in this License shall be construed as excluding or limiting 988 any implied license or other defenses to infringement that may 989 otherwise be available to you under applicable patent law. 990 991 12. No Surrender of Others' Freedom. 992 993 If conditions are imposed on you (whether by court order, 994 agreement or otherwise) that contradict the conditions of this 995 License, they do not excuse you from the conditions of this 996 License. If you cannot convey a covered work so as to satisfy 997 simultaneously your obligations under this License and any other 998 pertinent obligations, then as a consequence you may not convey it 999 at all. For example, if you agree to terms that obligate you to 1000 collect a royalty for further conveying from those to whom you 1001 convey the Program, the only way you could satisfy both those 1002 terms and this License would be to refrain entirely from conveying 1003 the Program. 1004 1005 13. Use with the GNU Affero General Public License. 1006 1007 Notwithstanding any other provision of this License, you have 1008 permission to link or combine any covered work with a work licensed 1009 under version 3 of the GNU Affero General Public License into a 1010 single combined work, and to convey the resulting work. The terms 1011 of this License will continue to apply to the part which is the 1012 covered work, but the special requirements of the GNU Affero 1013 General Public License, section 13, concerning interaction through 1014 a network will apply to the combination as such. 1015 1016 14. Revised Versions of this License. 1017 1018 The Free Software Foundation may publish revised and/or new 1019 versions of the GNU General Public License from time to time. 1020 Such new versions will be similar in spirit to the present 1021 version, but may differ in detail to address new problems or 1022 concerns. 1023 1024 Each version is given a distinguishing version number. If the 1025 Program specifies that a certain numbered version of the GNU 1026 General Public License "or any later version" applies to it, you 1027 have the option of following the terms and conditions either of 1028 that numbered version or of any later version published by the 1029 Free Software Foundation. If the Program does not specify a 1030 version number of the GNU General Public License, you may choose 1031 any version ever published by the Free Software Foundation. 1032 1033 If the Program specifies that a proxy can decide which future 1034 versions of the GNU General Public License can be used, that 1035 proxy's public statement of acceptance of a version permanently 1036 authorizes you to choose that version for the Program. 1037 1038 Later license versions may give you additional or different 1039 permissions. However, no additional obligations are imposed on any 1040 author or copyright holder as a result of your choosing to follow a 1041 later version. 1042 1043 15. Disclaimer of Warranty. 1044 1045 THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY 1046 APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE 1047 COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" 1048 WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, 1049 INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 1050 MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE 1051 RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. 1052 SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL 1053 NECESSARY SERVICING, REPAIR OR CORRECTION. 1054 1055 16. Limitation of Liability. 1056 1057 IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN 1058 WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES 1059 AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU 1060 FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR 1061 CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE 1062 THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA 1063 BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD 1064 PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER 1065 PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF 1066 THE POSSIBILITY OF SUCH DAMAGES. 1067 1068 17. Interpretation of Sections 15 and 16. 1069 1070 If the disclaimer of warranty and limitation of liability provided 1071 above cannot be given local legal effect according to their terms, 1072 reviewing courts shall apply local law that most closely 1073 approximates an absolute waiver of all civil liability in 1074 connection with the Program, unless a warranty or assumption of 1075 liability accompanies a copy of the Program in return for a fee. 1076 1077 1078 END OF TERMS AND CONDITIONS 1079 =========================== 1080 1081 How to Apply These Terms to Your New Programs 1082 ============================================= 1083 1084 If you develop a new program, and you want it to be of the greatest 1085 possible use to the public, the best way to achieve this is to make it 1086 free software which everyone can redistribute and change under these 1087 terms. 1088 1089 To do so, attach the following notices to the program. It is safest 1090 to attach them to the start of each source file to most effectively 1091 state the exclusion of warranty; and each file should have at least the 1092 "copyright" line and a pointer to where the full notice is found. 1093 1094 ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES. 1095 Copyright (C) YEAR NAME OF AUTHOR 1096 1097 This program is free software: you can redistribute it and/or modify 1098 it under the terms of the GNU General Public License as published by 1099 the Free Software Foundation, either version 3 of the License, or (at 1100 your option) any later version. 1101 1102 This program is distributed in the hope that it will be useful, but 1103 WITHOUT ANY WARRANTY; without even the implied warranty of 1104 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 1105 General Public License for more details. 1106 1107 You should have received a copy of the GNU General Public License 1108 along with this program. If not, see `http://www.gnu.org/licenses/'. 1109 1110 Also add information on how to contact you by electronic and paper 1111 mail. 1112 1113 If the program does terminal interaction, make it output a short 1114 notice like this when it starts in an interactive mode: 1115 1116 PROGRAM Copyright (C) YEAR NAME OF AUTHOR 1117 This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 1118 This is free software, and you are welcome to redistribute it 1119 under certain conditions; type `show c' for details. 1120 1121 The hypothetical commands `show w' and `show c' should show the 1122 appropriate parts of the General Public License. Of course, your 1123 program's commands might be different; for a GUI interface, you would 1124 use an "about box". 1125 1126 You should also get your employer (if you work as a programmer) or 1127 school, if any, to sign a "copyright disclaimer" for the program, if 1128 necessary. For more information on this, and how to apply and follow 1129 the GNU GPL, see `http://www.gnu.org/licenses/'. 1130 1131 The GNU General Public License does not permit incorporating your 1132 program into proprietary programs. If your program is a subroutine 1133 library, you may consider it more useful to permit linking proprietary 1134 applications with the library. If this is what you want to do, use the 1135 GNU Lesser General Public License instead of this License. But first, 1136 please read `http://www.gnu.org/philosophy/why-not-lgpl.html'. 1137 1138 1139 File: bison.info, Node: Concepts, Next: Examples, Prev: Copying, Up: Top 1140 1141 1 The Concepts of Bison 1142 *********************** 1143 1144 This chapter introduces many of the basic concepts without which the 1145 details of Bison will not make sense. If you do not already know how to 1146 use Bison or Yacc, we suggest you start by reading this chapter 1147 carefully. 1148 1149 * Menu: 1150 1151 * Language and Grammar:: Languages and context-free grammars, 1152 as mathematical ideas. 1153 * Grammar in Bison:: How we represent grammars for Bison's sake. 1154 * Semantic Values:: Each token or syntactic grouping can have 1155 a semantic value (the value of an integer, 1156 the name of an identifier, etc.). 1157 * Semantic Actions:: Each rule can have an action containing C code. 1158 * GLR Parsers:: Writing parsers for general context-free languages. 1159 * Locations:: Overview of location tracking. 1160 * Bison Parser:: What are Bison's input and output, 1161 how is the output used? 1162 * Stages:: Stages in writing and running Bison grammars. 1163 * Grammar Layout:: Overall structure of a Bison grammar file. 1164 1165 1166 File: bison.info, Node: Language and Grammar, Next: Grammar in Bison, Up: Concepts 1167 1168 1.1 Languages and Context-Free Grammars 1169 ======================================= 1170 1171 In order for Bison to parse a language, it must be described by a 1172 "context-free grammar". This means that you specify one or more 1173 "syntactic groupings" and give rules for constructing them from their 1174 parts. For example, in the C language, one kind of grouping is called 1175 an `expression'. One rule for making an expression might be, "An 1176 expression can be made of a minus sign and another expression". 1177 Another would be, "An expression can be an integer". As you can see, 1178 rules are often recursive, but there must be at least one rule which 1179 leads out of the recursion. 1180 1181 The most common formal system for presenting such rules for humans 1182 to read is "Backus-Naur Form" or "BNF", which was developed in order to 1183 specify the language Algol 60. Any grammar expressed in BNF is a 1184 context-free grammar. The input to Bison is essentially 1185 machine-readable BNF. 1186 1187 There are various important subclasses of context-free grammars. 1188 Although it can handle almost all context-free grammars, Bison is 1189 optimized for what are called LR(1) grammars. In brief, in these 1190 grammars, it must be possible to tell how to parse any portion of an 1191 input string with just a single token of lookahead. For historical 1192 reasons, Bison by default is limited by the additional restrictions of 1193 LALR(1), which is hard to explain simply. *Note Mysterious 1194 Conflicts::, for more information on this. As an experimental feature, 1195 you can escape these additional restrictions by requesting IELR(1) or 1196 canonical LR(1) parser tables. *Note LR Table Construction::, to learn 1197 how. 1198 1199 Parsers for LR(1) grammars are "deterministic", meaning roughly that 1200 the next grammar rule to apply at any point in the input is uniquely 1201 determined by the preceding input and a fixed, finite portion (called a 1202 "lookahead") of the remaining input. A context-free grammar can be 1203 "ambiguous", meaning that there are multiple ways to apply the grammar 1204 rules to get the same inputs. Even unambiguous grammars can be 1205 "nondeterministic", meaning that no fixed lookahead always suffices to 1206 determine the next grammar rule to apply. With the proper 1207 declarations, Bison is also able to parse these more general 1208 context-free grammars, using a technique known as GLR parsing (for 1209 Generalized LR). Bison's GLR parsers are able to handle any 1210 context-free grammar for which the number of possible parses of any 1211 given string is finite. 1212 1213 In the formal grammatical rules for a language, each kind of 1214 syntactic unit or grouping is named by a "symbol". Those which are 1215 built by grouping smaller constructs according to grammatical rules are 1216 called "nonterminal symbols"; those which can't be subdivided are called 1217 "terminal symbols" or "token types". We call a piece of input 1218 corresponding to a single terminal symbol a "token", and a piece 1219 corresponding to a single nonterminal symbol a "grouping". 1220 1221 We can use the C language as an example of what symbols, terminal and 1222 nonterminal, mean. The tokens of C are identifiers, constants (numeric 1223 and string), and the various keywords, arithmetic operators and 1224 punctuation marks. So the terminal symbols of a grammar for C include 1225 `identifier', `number', `string', plus one symbol for each keyword, 1226 operator or punctuation mark: `if', `return', `const', `static', `int', 1227 `char', `plus-sign', `open-brace', `close-brace', `comma' and many more. 1228 (These tokens can be subdivided into characters, but that is a matter of 1229 lexicography, not grammar.) 1230 1231 Here is a simple C function subdivided into tokens: 1232 1233 int /* keyword `int' */ 1234 square (int x) /* identifier, open-paren, keyword `int', 1235 identifier, close-paren */ 1236 { /* open-brace */ 1237 return x * x; /* keyword `return', identifier, asterisk, 1238 identifier, semicolon */ 1239 } /* close-brace */ 1240 1241 The syntactic groupings of C include the expression, the statement, 1242 the declaration, and the function definition. These are represented in 1243 the grammar of C by nonterminal symbols `expression', `statement', 1244 `declaration' and `function definition'. The full grammar uses dozens 1245 of additional language constructs, each with its own nonterminal 1246 symbol, in order to express the meanings of these four. The example 1247 above is a function definition; it contains one declaration, and one 1248 statement. In the statement, each `x' is an expression and so is `x * 1249 x'. 1250 1251 Each nonterminal symbol must have grammatical rules showing how it 1252 is made out of simpler constructs. For example, one kind of C 1253 statement is the `return' statement; this would be described with a 1254 grammar rule which reads informally as follows: 1255 1256 A `statement' can be made of a `return' keyword, an `expression' 1257 and a `semicolon'. 1258 1259 There would be many other rules for `statement', one for each kind of 1260 statement in C. 1261 1262 One nonterminal symbol must be distinguished as the special one which 1263 defines a complete utterance in the language. It is called the "start 1264 symbol". In a compiler, this means a complete input program. In the C 1265 language, the nonterminal symbol `sequence of definitions and 1266 declarations' plays this role. 1267 1268 For example, `1 + 2' is a valid C expression--a valid part of a C 1269 program--but it is not valid as an _entire_ C program. In the 1270 context-free grammar of C, this follows from the fact that `expression' 1271 is not the start symbol. 1272 1273 The Bison parser reads a sequence of tokens as its input, and groups 1274 the tokens using the grammar rules. If the input is valid, the end 1275 result is that the entire token sequence reduces to a single grouping 1276 whose symbol is the grammar's start symbol. If we use a grammar for C, 1277 the entire input must be a `sequence of definitions and declarations'. 1278 If not, the parser reports a syntax error. 1279 1280 1281 File: bison.info, Node: Grammar in Bison, Next: Semantic Values, Prev: Language and Grammar, Up: Concepts 1282 1283 1.2 From Formal Rules to Bison Input 1284 ==================================== 1285 1286 A formal grammar is a mathematical construct. To define the language 1287 for Bison, you must write a file expressing the grammar in Bison syntax: 1288 a "Bison grammar" file. *Note Bison Grammar Files: Grammar File. 1289 1290 A nonterminal symbol in the formal grammar is represented in Bison 1291 input as an identifier, like an identifier in C. By convention, it 1292 should be in lower case, such as `expr', `stmt' or `declaration'. 1293 1294 The Bison representation for a terminal symbol is also called a 1295 "token type". Token types as well can be represented as C-like 1296 identifiers. By convention, these identifiers should be upper case to 1297 distinguish them from nonterminals: for example, `INTEGER', 1298 `IDENTIFIER', `IF' or `RETURN'. A terminal symbol that stands for a 1299 particular keyword in the language should be named after that keyword 1300 converted to upper case. The terminal symbol `error' is reserved for 1301 error recovery. *Note Symbols::. 1302 1303 A terminal symbol can also be represented as a character literal, 1304 just like a C character constant. You should do this whenever a token 1305 is just a single character (parenthesis, plus-sign, etc.): use that 1306 same character in a literal as the terminal symbol for that token. 1307 1308 A third way to represent a terminal symbol is with a C string 1309 constant containing several characters. *Note Symbols::, for more 1310 information. 1311 1312 The grammar rules also have an expression in Bison syntax. For 1313 example, here is the Bison rule for a C `return' statement. The 1314 semicolon in quotes is a literal character token, representing part of 1315 the C syntax for the statement; the naked semicolon, and the colon, are 1316 Bison punctuation used in every rule. 1317 1318 stmt: RETURN expr ';' ; 1319 1320 *Note Syntax of Grammar Rules: Rules. 1321 1322 1323 File: bison.info, Node: Semantic Values, Next: Semantic Actions, Prev: Grammar in Bison, Up: Concepts 1324 1325 1.3 Semantic Values 1326 =================== 1327 1328 A formal grammar selects tokens only by their classifications: for 1329 example, if a rule mentions the terminal symbol `integer constant', it 1330 means that _any_ integer constant is grammatically valid in that 1331 position. The precise value of the constant is irrelevant to how to 1332 parse the input: if `x+4' is grammatical then `x+1' or `x+3989' is 1333 equally grammatical. 1334 1335 But the precise value is very important for what the input means 1336 once it is parsed. A compiler is useless if it fails to distinguish 1337 between 4, 1 and 3989 as constants in the program! Therefore, each 1338 token in a Bison grammar has both a token type and a "semantic value". 1339 *Note Defining Language Semantics: Semantics, for details. 1340 1341 The token type is a terminal symbol defined in the grammar, such as 1342 `INTEGER', `IDENTIFIER' or `',''. It tells everything you need to know 1343 to decide where the token may validly appear and how to group it with 1344 other tokens. The grammar rules know nothing about tokens except their 1345 types. 1346 1347 The semantic value has all the rest of the information about the 1348 meaning of the token, such as the value of an integer, or the name of an 1349 identifier. (A token such as `','' which is just punctuation doesn't 1350 need to have any semantic value.) 1351 1352 For example, an input token might be classified as token type 1353 `INTEGER' and have the semantic value 4. Another input token might 1354 have the same token type `INTEGER' but value 3989. When a grammar rule 1355 says that `INTEGER' is allowed, either of these tokens is acceptable 1356 because each is an `INTEGER'. When the parser accepts the token, it 1357 keeps track of the token's semantic value. 1358 1359 Each grouping can also have a semantic value as well as its 1360 nonterminal symbol. For example, in a calculator, an expression 1361 typically has a semantic value that is a number. In a compiler for a 1362 programming language, an expression typically has a semantic value that 1363 is a tree structure describing the meaning of the expression. 1364 1365 1366 File: bison.info, Node: Semantic Actions, Next: GLR Parsers, Prev: Semantic Values, Up: Concepts 1367 1368 1.4 Semantic Actions 1369 ==================== 1370 1371 In order to be useful, a program must do more than parse input; it must 1372 also produce some output based on the input. In a Bison grammar, a 1373 grammar rule can have an "action" made up of C statements. Each time 1374 the parser recognizes a match for that rule, the action is executed. 1375 *Note Actions::. 1376 1377 Most of the time, the purpose of an action is to compute the 1378 semantic value of the whole construct from the semantic values of its 1379 parts. For example, suppose we have a rule which says an expression 1380 can be the sum of two expressions. When the parser recognizes such a 1381 sum, each of the subexpressions has a semantic value which describes 1382 how it was built up. The action for this rule should create a similar 1383 sort of value for the newly recognized larger expression. 1384 1385 For example, here is a rule that says an expression can be the sum of 1386 two subexpressions: 1387 1388 expr: expr '+' expr { $$ = $1 + $3; } ; 1389 1390 The action says how to produce the semantic value of the sum expression 1391 from the values of the two subexpressions. 1392 1393 1394 File: bison.info, Node: GLR Parsers, Next: Locations, Prev: Semantic Actions, Up: Concepts 1395 1396 1.5 Writing GLR Parsers 1397 ======================= 1398 1399 In some grammars, Bison's deterministic LR(1) parsing algorithm cannot 1400 decide whether to apply a certain grammar rule at a given point. That 1401 is, it may not be able to decide (on the basis of the input read so 1402 far) which of two possible reductions (applications of a grammar rule) 1403 applies, or whether to apply a reduction or read more of the input and 1404 apply a reduction later in the input. These are known respectively as 1405 "reduce/reduce" conflicts (*note Reduce/Reduce::), and "shift/reduce" 1406 conflicts (*note Shift/Reduce::). 1407 1408 To use a grammar that is not easily modified to be LR(1), a more 1409 general parsing algorithm is sometimes necessary. If you include 1410 `%glr-parser' among the Bison declarations in your file (*note Grammar 1411 Outline::), the result is a Generalized LR (GLR) parser. These parsers 1412 handle Bison grammars that contain no unresolved conflicts (i.e., after 1413 applying precedence declarations) identically to deterministic parsers. 1414 However, when faced with unresolved shift/reduce and reduce/reduce 1415 conflicts, GLR parsers use the simple expedient of doing both, 1416 effectively cloning the parser to follow both possibilities. Each of 1417 the resulting parsers can again split, so that at any given time, there 1418 can be any number of possible parses being explored. The parsers 1419 proceed in lockstep; that is, all of them consume (shift) a given input 1420 symbol before any of them proceed to the next. Each of the cloned 1421 parsers eventually meets one of two possible fates: either it runs into 1422 a parsing error, in which case it simply vanishes, or it merges with 1423 another parser, because the two of them have reduced the input to an 1424 identical set of symbols. 1425 1426 During the time that there are multiple parsers, semantic actions are 1427 recorded, but not performed. When a parser disappears, its recorded 1428 semantic actions disappear as well, and are never performed. When a 1429 reduction makes two parsers identical, causing them to merge, Bison 1430 records both sets of semantic actions. Whenever the last two parsers 1431 merge, reverting to the single-parser case, Bison resolves all the 1432 outstanding actions either by precedences given to the grammar rules 1433 involved, or by performing both actions, and then calling a designated 1434 user-defined function on the resulting values to produce an arbitrary 1435 merged result. 1436 1437 * Menu: 1438 1439 * Simple GLR Parsers:: Using GLR parsers on unambiguous grammars. 1440 * Merging GLR Parses:: Using GLR parsers to resolve ambiguities. 1441 * GLR Semantic Actions:: Deferred semantic actions have special concerns. 1442 * Compiler Requirements:: GLR parsers require a modern C compiler. 1443 1444 1445 File: bison.info, Node: Simple GLR Parsers, Next: Merging GLR Parses, Up: GLR Parsers 1446 1447 1.5.1 Using GLR on Unambiguous Grammars 1448 --------------------------------------- 1449 1450 In the simplest cases, you can use the GLR algorithm to parse grammars 1451 that are unambiguous but fail to be LR(1). Such grammars typically 1452 require more than one symbol of lookahead. 1453 1454 Consider a problem that arises in the declaration of enumerated and 1455 subrange types in the programming language Pascal. Here are some 1456 examples: 1457 1458 type subrange = lo .. hi; 1459 type enum = (a, b, c); 1460 1461 The original language standard allows only numeric literals and 1462 constant identifiers for the subrange bounds (`lo' and `hi'), but 1463 Extended Pascal (ISO/IEC 10206) and many other Pascal implementations 1464 allow arbitrary expressions there. This gives rise to the following 1465 situation, containing a superfluous pair of parentheses: 1466 1467 type subrange = (a) .. b; 1468 1469 Compare this to the following declaration of an enumerated type with 1470 only one value: 1471 1472 type enum = (a); 1473 1474 (These declarations are contrived, but they are syntactically valid, 1475 and more-complicated cases can come up in practical programs.) 1476 1477 These two declarations look identical until the `..' token. With 1478 normal LR(1) one-token lookahead it is not possible to decide between 1479 the two forms when the identifier `a' is parsed. It is, however, 1480 desirable for a parser to decide this, since in the latter case `a' 1481 must become a new identifier to represent the enumeration value, while 1482 in the former case `a' must be evaluated with its current meaning, 1483 which may be a constant or even a function call. 1484 1485 You could parse `(a)' as an "unspecified identifier in parentheses", 1486 to be resolved later, but this typically requires substantial 1487 contortions in both semantic actions and large parts of the grammar, 1488 where the parentheses are nested in the recursive rules for expressions. 1489 1490 You might think of using the lexer to distinguish between the two 1491 forms by returning different tokens for currently defined and undefined 1492 identifiers. But if these declarations occur in a local scope, and `a' 1493 is defined in an outer scope, then both forms are possible--either 1494 locally redefining `a', or using the value of `a' from the outer scope. 1495 So this approach cannot work. 1496 1497 A simple solution to this problem is to declare the parser to use 1498 the GLR algorithm. When the GLR parser reaches the critical state, it 1499 merely splits into two branches and pursues both syntax rules 1500 simultaneously. Sooner or later, one of them runs into a parsing 1501 error. If there is a `..' token before the next `;', the rule for 1502 enumerated types fails since it cannot accept `..' anywhere; otherwise, 1503 the subrange type rule fails since it requires a `..' token. So one of 1504 the branches fails silently, and the other one continues normally, 1505 performing all the intermediate actions that were postponed during the 1506 split. 1507 1508 If the input is syntactically incorrect, both branches fail and the 1509 parser reports a syntax error as usual. 1510 1511 The effect of all this is that the parser seems to "guess" the 1512 correct branch to take, or in other words, it seems to use more 1513 lookahead than the underlying LR(1) algorithm actually allows for. In 1514 this example, LR(2) would suffice, but also some cases that are not 1515 LR(k) for any k can be handled this way. 1516 1517 In general, a GLR parser can take quadratic or cubic worst-case time, 1518 and the current Bison parser even takes exponential time and space for 1519 some grammars. In practice, this rarely happens, and for many grammars 1520 it is possible to prove that it cannot happen. The present example 1521 contains only one conflict between two rules, and the type-declaration 1522 context containing the conflict cannot be nested. So the number of 1523 branches that can exist at any time is limited by the constant 2, and 1524 the parsing time is still linear. 1525 1526 Here is a Bison grammar corresponding to the example above. It 1527 parses a vastly simplified form of Pascal type declarations. 1528 1529 %token TYPE DOTDOT ID 1530 1531 %left '+' '-' 1532 %left '*' '/' 1533 1534 %% 1535 1536 type_decl: TYPE ID '=' type ';' ; 1537 1538 type: 1539 '(' id_list ')' 1540 | expr DOTDOT expr 1541 ; 1542 1543 id_list: 1544 ID 1545 | id_list ',' ID 1546 ; 1547 1548 expr: 1549 '(' expr ')' 1550 | expr '+' expr 1551 | expr '-' expr 1552 | expr '*' expr 1553 | expr '/' expr 1554 | ID 1555 ; 1556 1557 When used as a normal LR(1) grammar, Bison correctly complains about 1558 one reduce/reduce conflict. In the conflicting situation the parser 1559 chooses one of the alternatives, arbitrarily the one declared first. 1560 Therefore the following correct input is not recognized: 1561 1562 type t = (a) .. b; 1563 1564 The parser can be turned into a GLR parser, while also telling Bison 1565 to be silent about the one known reduce/reduce conflict, by adding 1566 these two declarations to the Bison grammar file (before the first 1567 `%%'): 1568 1569 %glr-parser 1570 %expect-rr 1 1571 1572 No change in the grammar itself is required. Now the parser recognizes 1573 all valid declarations, according to the limited syntax above, 1574 transparently. In fact, the user does not even notice when the parser 1575 splits. 1576 1577 So here we have a case where we can use the benefits of GLR, almost 1578 without disadvantages. Even in simple cases like this, however, there 1579 are at least two potential problems to beware. First, always analyze 1580 the conflicts reported by Bison to make sure that GLR splitting is only 1581 done where it is intended. A GLR parser splitting inadvertently may 1582 cause problems less obvious than an LR parser statically choosing the 1583 wrong alternative in a conflict. Second, consider interactions with 1584 the lexer (*note Semantic Tokens::) with great care. Since a split 1585 parser consumes tokens without performing any actions during the split, 1586 the lexer cannot obtain information via parser actions. Some cases of 1587 lexer interactions can be eliminated by using GLR to shift the 1588 complications from the lexer to the parser. You must check the 1589 remaining cases for correctness. 1590 1591 In our example, it would be safe for the lexer to return tokens 1592 based on their current meanings in some symbol table, because no new 1593 symbols are defined in the middle of a type declaration. Though it is 1594 possible for a parser to define the enumeration constants as they are 1595 parsed, before the type declaration is completed, it actually makes no 1596 difference since they cannot be used within the same enumerated type 1597 declaration. 1598 1599 1600 File: bison.info, Node: Merging GLR Parses, Next: GLR Semantic Actions, Prev: Simple GLR Parsers, Up: GLR Parsers 1601 1602 1.5.2 Using GLR to Resolve Ambiguities 1603 -------------------------------------- 1604 1605 Let's consider an example, vastly simplified from a C++ grammar. 1606 1607 %{ 1608 #include <stdio.h> 1609 #define YYSTYPE char const * 1610 int yylex (void); 1611 void yyerror (char const *); 1612 %} 1613 1614 %token TYPENAME ID 1615 1616 %right '=' 1617 %left '+' 1618 1619 %glr-parser 1620 1621 %% 1622 1623 prog: 1624 /* Nothing. */ 1625 | prog stmt { printf ("\n"); } 1626 ; 1627 1628 stmt: 1629 expr ';' %dprec 1 1630 | decl %dprec 2 1631 ; 1632 1633 expr: 1634 ID { printf ("%s ", $$); } 1635 | TYPENAME '(' expr ')' 1636 { printf ("%s <cast> ", $1); } 1637 | expr '+' expr { printf ("+ "); } 1638 | expr '=' expr { printf ("= "); } 1639 ; 1640 1641 decl: 1642 TYPENAME declarator ';' 1643 { printf ("%s <declare> ", $1); } 1644 | TYPENAME declarator '=' expr ';' 1645 { printf ("%s <init-declare> ", $1); } 1646 ; 1647 1648 declarator: 1649 ID { printf ("\"%s\" ", $1); } 1650 | '(' declarator ')' 1651 ; 1652 1653 This models a problematic part of the C++ grammar--the ambiguity between 1654 certain declarations and statements. For example, 1655 1656 T (x) = y+z; 1657 1658 parses as either an `expr' or a `stmt' (assuming that `T' is recognized 1659 as a `TYPENAME' and `x' as an `ID'). Bison detects this as a 1660 reduce/reduce conflict between the rules `expr : ID' and `declarator : 1661 ID', which it cannot resolve at the time it encounters `x' in the 1662 example above. Since this is a GLR parser, it therefore splits the 1663 problem into two parses, one for each choice of resolving the 1664 reduce/reduce conflict. Unlike the example from the previous section 1665 (*note Simple GLR Parsers::), however, neither of these parses "dies," 1666 because the grammar as it stands is ambiguous. One of the parsers 1667 eventually reduces `stmt : expr ';'' and the other reduces `stmt : 1668 decl', after which both parsers are in an identical state: they've seen 1669 `prog stmt' and have the same unprocessed input remaining. We say that 1670 these parses have "merged." 1671 1672 At this point, the GLR parser requires a specification in the 1673 grammar of how to choose between the competing parses. In the example 1674 above, the two `%dprec' declarations specify that Bison is to give 1675 precedence to the parse that interprets the example as a `decl', which 1676 implies that `x' is a declarator. The parser therefore prints 1677 1678 "x" y z + T <init-declare> 1679 1680 The `%dprec' declarations only come into play when more than one 1681 parse survives. Consider a different input string for this parser: 1682 1683 T (x) + y; 1684 1685 This is another example of using GLR to parse an unambiguous construct, 1686 as shown in the previous section (*note Simple GLR Parsers::). Here, 1687 there is no ambiguity (this cannot be parsed as a declaration). 1688 However, at the time the Bison parser encounters `x', it does not have 1689 enough information to resolve the reduce/reduce conflict (again, 1690 between `x' as an `expr' or a `declarator'). In this case, no 1691 precedence declaration is used. Again, the parser splits into two, one 1692 assuming that `x' is an `expr', and the other assuming `x' is a 1693 `declarator'. The second of these parsers then vanishes when it sees 1694 `+', and the parser prints 1695 1696 x T <cast> y + 1697 1698 Suppose that instead of resolving the ambiguity, you wanted to see 1699 all the possibilities. For this purpose, you must merge the semantic 1700 actions of the two possible parsers, rather than choosing one over the 1701 other. To do so, you could change the declaration of `stmt' as follows: 1702 1703 stmt: 1704 expr ';' %merge <stmtMerge> 1705 | decl %merge <stmtMerge> 1706 ; 1707 1708 and define the `stmtMerge' function as: 1709 1710 static YYSTYPE 1711 stmtMerge (YYSTYPE x0, YYSTYPE x1) 1712 { 1713 printf ("<OR> "); 1714 return ""; 1715 } 1716 1717 with an accompanying forward declaration in the C declarations at the 1718 beginning of the file: 1719 1720 %{ 1721 #define YYSTYPE char const * 1722 static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1); 1723 %} 1724 1725 With these declarations, the resulting parser parses the first example 1726 as both an `expr' and a `decl', and prints 1727 1728 "x" y z + T <init-declare> x T <cast> y z + = <OR> 1729 1730 Bison requires that all of the productions that participate in any 1731 particular merge have identical `%merge' clauses. Otherwise, the 1732 ambiguity would be unresolvable, and the parser will report an error 1733 during any parse that results in the offending merge. 1734 1735 1736 File: bison.info, Node: GLR Semantic Actions, Next: Compiler Requirements, Prev: Merging GLR Parses, Up: GLR Parsers 1737 1738 1.5.3 GLR Semantic Actions 1739 -------------------------- 1740 1741 By definition, a deferred semantic action is not performed at the same 1742 time as the associated reduction. This raises caveats for several 1743 Bison features you might use in a semantic action in a GLR parser. 1744 1745 In any semantic action, you can examine `yychar' to determine the 1746 type of the lookahead token present at the time of the associated 1747 reduction. After checking that `yychar' is not set to `YYEMPTY' or 1748 `YYEOF', you can then examine `yylval' and `yylloc' to determine the 1749 lookahead token's semantic value and location, if any. In a 1750 nondeferred semantic action, you can also modify any of these variables 1751 to influence syntax analysis. *Note Lookahead Tokens: Lookahead. 1752 1753 In a deferred semantic action, it's too late to influence syntax 1754 analysis. In this case, `yychar', `yylval', and `yylloc' are set to 1755 shallow copies of the values they had at the time of the associated 1756 reduction. For this reason alone, modifying them is dangerous. 1757 Moreover, the result of modifying them is undefined and subject to 1758 change with future versions of Bison. For example, if a semantic 1759 action might be deferred, you should never write it to invoke 1760 `yyclearin' (*note Action Features::) or to attempt to free memory 1761 referenced by `yylval'. 1762 1763 Another Bison feature requiring special consideration is `YYERROR' 1764 (*note Action Features::), which you can invoke in a semantic action to 1765 initiate error recovery. During deterministic GLR operation, the 1766 effect of `YYERROR' is the same as its effect in a deterministic parser. 1767 In a deferred semantic action, its effect is undefined. 1768 1769 Also, see *note Default Action for Locations: Location Default 1770 Action, which describes a special usage of `YYLLOC_DEFAULT' in GLR 1771 parsers. 1772 1773 1774 File: bison.info, Node: Compiler Requirements, Prev: GLR Semantic Actions, Up: GLR Parsers 1775 1776 1.5.4 Considerations when Compiling GLR Parsers 1777 ----------------------------------------------- 1778 1779 The GLR parsers require a compiler for ISO C89 or later. In addition, 1780 they use the `inline' keyword, which is not C89, but is C99 and is a 1781 common extension in pre-C99 compilers. It is up to the user of these 1782 parsers to handle portability issues. For instance, if using Autoconf 1783 and the Autoconf macro `AC_C_INLINE', a mere 1784 1785 %{ 1786 #include <config.h> 1787 %} 1788 1789 will suffice. Otherwise, we suggest 1790 1791 %{ 1792 #if (__STDC_VERSION__ < 199901 && ! defined __GNUC__ \ 1793 && ! defined inline) 1794 # define inline 1795 #endif 1796 %} 1797 1798 1799 File: bison.info, Node: Locations, Next: Bison Parser, Prev: GLR Parsers, Up: Concepts 1800 1801 1.6 Locations 1802 ============= 1803 1804 Many applications, like interpreters or compilers, have to produce 1805 verbose and useful error messages. To achieve this, one must be able 1806 to keep track of the "textual location", or "location", of each 1807 syntactic construct. Bison provides a mechanism for handling these 1808 locations. 1809 1810 Each token has a semantic value. In a similar fashion, each token 1811 has an associated location, but the type of locations is the same for 1812 all tokens and groupings. Moreover, the output parser is equipped with 1813 a default data structure for storing locations (*note Tracking 1814 Locations::, for more details). 1815 1816 Like semantic values, locations can be reached in actions using a 1817 dedicated set of constructs. In the example above, the location of the 1818 whole grouping is `@$', while the locations of the subexpressions are 1819 `@1' and `@3'. 1820 1821 When a rule is matched, a default action is used to compute the 1822 semantic value of its left hand side (*note Actions::). In the same 1823 way, another default action is used for locations. However, the action 1824 for locations is general enough for most cases, meaning there is 1825 usually no need to describe for each rule how `@$' should be formed. 1826 When building a new location for a given grouping, the default behavior 1827 of the output parser is to take the beginning of the first symbol, and 1828 the end of the last symbol. 1829 1830 1831 File: bison.info, Node: Bison Parser, Next: Stages, Prev: Locations, Up: Concepts 1832 1833 1.7 Bison Output: the Parser Implementation File 1834 ================================================ 1835 1836 When you run Bison, you give it a Bison grammar file as input. The 1837 most important output is a C source file that implements a parser for 1838 the language described by the grammar. This parser is called a "Bison 1839 parser", and this file is called a "Bison parser implementation file". 1840 Keep in mind that the Bison utility and the Bison parser are two 1841 distinct programs: the Bison utility is a program whose output is the 1842 Bison parser implementation file that becomes part of your program. 1843 1844 The job of the Bison parser is to group tokens into groupings 1845 according to the grammar rules--for example, to build identifiers and 1846 operators into expressions. As it does this, it runs the actions for 1847 the grammar rules it uses. 1848 1849 The tokens come from a function called the "lexical analyzer" that 1850 you must supply in some fashion (such as by writing it in C). The Bison 1851 parser calls the lexical analyzer each time it wants a new token. It 1852 doesn't know what is "inside" the tokens (though their semantic values 1853 may reflect this). Typically the lexical analyzer makes the tokens by 1854 parsing characters of text, but Bison does not depend on this. *Note 1855 The Lexical Analyzer Function `yylex': Lexical. 1856 1857 The Bison parser implementation file is C code which defines a 1858 function named `yyparse' which implements that grammar. This function 1859 does not make a complete C program: you must supply some additional 1860 functions. One is the lexical analyzer. Another is an error-reporting 1861 function which the parser calls to report an error. In addition, a 1862 complete C program must start with a function called `main'; you have 1863 to provide this, and arrange for it to call `yyparse' or the parser 1864 will never run. *Note Parser C-Language Interface: Interface. 1865 1866 Aside from the token type names and the symbols in the actions you 1867 write, all symbols defined in the Bison parser implementation file 1868 itself begin with `yy' or `YY'. This includes interface functions such 1869 as the lexical analyzer function `yylex', the error reporting function 1870 `yyerror' and the parser function `yyparse' itself. This also includes 1871 numerous identifiers used for internal purposes. Therefore, you should 1872 avoid using C identifiers starting with `yy' or `YY' in the Bison 1873 grammar file except for the ones defined in this manual. Also, you 1874 should avoid using the C identifiers `malloc' and `free' for anything 1875 other than their usual meanings. 1876 1877 In some cases the Bison parser implementation file includes system 1878 headers, and in those cases your code should respect the identifiers 1879 reserved by those headers. On some non-GNU hosts, `<alloca.h>', 1880 `<malloc.h>', `<stddef.h>', and `<stdlib.h>' are included as needed to 1881 declare memory allocators and related types. `<libintl.h>' is included 1882 if message translation is in use (*note Internationalization::). Other 1883 system headers may be included if you define `YYDEBUG' to a nonzero 1884 value (*note Tracing Your Parser: Tracing.). 1885 1886 1887 File: bison.info, Node: Stages, Next: Grammar Layout, Prev: Bison Parser, Up: Concepts 1888 1889 1.8 Stages in Using Bison 1890 ========================= 1891 1892 The actual language-design process using Bison, from grammar 1893 specification to a working compiler or interpreter, has these parts: 1894 1895 1. Formally specify the grammar in a form recognized by Bison (*note 1896 Bison Grammar Files: Grammar File.). For each grammatical rule in 1897 the language, describe the action that is to be taken when an 1898 instance of that rule is recognized. The action is described by a 1899 sequence of C statements. 1900 1901 2. Write a lexical analyzer to process input and pass tokens to the 1902 parser. The lexical analyzer may be written by hand in C (*note 1903 The Lexical Analyzer Function `yylex': Lexical.). It could also 1904 be produced using Lex, but the use of Lex is not discussed in this 1905 manual. 1906 1907 3. Write a controlling function that calls the Bison-produced parser. 1908 1909 4. Write error-reporting routines. 1910 1911 To turn this source code as written into a runnable program, you 1912 must follow these steps: 1913 1914 1. Run Bison on the grammar to produce the parser. 1915 1916 2. Compile the code output by Bison, as well as any other source 1917 files. 1918 1919 3. Link the object files to produce the finished product. 1920 1921 1922 File: bison.info, Node: Grammar Layout, Prev: Stages, Up: Concepts 1923 1924 1.9 The Overall Layout of a Bison Grammar 1925 ========================================= 1926 1927 The input file for the Bison utility is a "Bison grammar file". The 1928 general form of a Bison grammar file is as follows: 1929 1930 %{ 1931 PROLOGUE 1932 %} 1933 1934 BISON DECLARATIONS 1935 1936 %% 1937 GRAMMAR RULES 1938 %% 1939 EPILOGUE 1940 1941 The `%%', `%{' and `%}' are punctuation that appears in every Bison 1942 grammar file to separate the sections. 1943 1944 The prologue may define types and variables used in the actions. 1945 You can also use preprocessor commands to define macros used there, and 1946 use `#include' to include header files that do any of these things. 1947 You need to declare the lexical analyzer `yylex' and the error printer 1948 `yyerror' here, along with any other global identifiers used by the 1949 actions in the grammar rules. 1950 1951 The Bison declarations declare the names of the terminal and 1952 nonterminal symbols, and may also describe operator precedence and the 1953 data types of semantic values of various symbols. 1954 1955 The grammar rules define how to construct each nonterminal symbol 1956 from its parts. 1957 1958 The epilogue can contain any code you want to use. Often the 1959 definitions of functions declared in the prologue go here. In a simple 1960 program, all the rest of the program can go here. 1961 1962 1963 File: bison.info, Node: Examples, Next: Grammar File, Prev: Concepts, Up: Top 1964 1965 2 Examples 1966 ********** 1967 1968 Now we show and explain several sample programs written using Bison: a 1969 reverse polish notation calculator, an algebraic (infix) notation 1970 calculator -- later extended to track "locations" -- and a 1971 multi-function calculator. All produce usable, though limited, 1972 interactive desk-top calculators. 1973 1974 These examples are simple, but Bison grammars for real programming 1975 languages are written the same way. You can copy these examples into a 1976 source file to try them. 1977 1978 * Menu: 1979 1980 * RPN Calc:: Reverse polish notation calculator; 1981 a first example with no operator precedence. 1982 * Infix Calc:: Infix (algebraic) notation calculator. 1983 Operator precedence is introduced. 1984 * Simple Error Recovery:: Continuing after syntax errors. 1985 * Location Tracking Calc:: Demonstrating the use of @N and @$. 1986 * Multi-function Calc:: Calculator with memory and trig functions. 1987 It uses multiple data-types for semantic values. 1988 * Exercises:: Ideas for improving the multi-function calculator. 1989 1990 1991 File: bison.info, Node: RPN Calc, Next: Infix Calc, Up: Examples 1992 1993 2.1 Reverse Polish Notation Calculator 1994 ====================================== 1995 1996 The first example is that of a simple double-precision "reverse polish 1997 notation" calculator (a calculator using postfix operators). This 1998 example provides a good starting point, since operator precedence is 1999 not an issue. The second example will illustrate how operator 2000 precedence is handled. 2001 2002 The source code for this calculator is named `rpcalc.y'. The `.y' 2003 extension is a convention used for Bison grammar files. 2004 2005 * Menu: 2006 2007 * Rpcalc Declarations:: Prologue (declarations) for rpcalc. 2008 * Rpcalc Rules:: Grammar Rules for rpcalc, with explanation. 2009 * Rpcalc Lexer:: The lexical analyzer. 2010 * Rpcalc Main:: The controlling function. 2011 * Rpcalc Error:: The error reporting function. 2012 * Rpcalc Generate:: Running Bison on the grammar file. 2013 * Rpcalc Compile:: Run the C compiler on the output code. 2014 2015 2016 File: bison.info, Node: Rpcalc Declarations, Next: Rpcalc Rules, Up: RPN Calc 2017 2018 2.1.1 Declarations for `rpcalc' 2019 ------------------------------- 2020 2021 Here are the C and Bison declarations for the reverse polish notation 2022 calculator. As in C, comments are placed between `/*...*/'. 2023 2024 /* Reverse polish notation calculator. */ 2025 2026 %{ 2027 #define YYSTYPE double 2028 #include <math.h> 2029 int yylex (void); 2030 void yyerror (char const *); 2031 %} 2032 2033 %token NUM 2034 2035 %% /* Grammar rules and actions follow. */ 2036 2037 The declarations section (*note The prologue: Prologue.) contains two 2038 preprocessor directives and two forward declarations. 2039 2040 The `#define' directive defines the macro `YYSTYPE', thus specifying 2041 the C data type for semantic values of both tokens and groupings (*note 2042 Data Types of Semantic Values: Value Type.). The Bison parser will use 2043 whatever type `YYSTYPE' is defined as; if you don't define it, `int' is 2044 the default. Because we specify `double', each token and each 2045 expression has an associated value, which is a floating point number. 2046 2047 The `#include' directive is used to declare the exponentiation 2048 function `pow'. 2049 2050 The forward declarations for `yylex' and `yyerror' are needed 2051 because the C language requires that functions be declared before they 2052 are used. These functions will be defined in the epilogue, but the 2053 parser calls them so they must be declared in the prologue. 2054 2055 The second section, Bison declarations, provides information to Bison 2056 about the token types (*note The Bison Declarations Section: Bison 2057 Declarations.). Each terminal symbol that is not a single-character 2058 literal must be declared here. (Single-character literals normally 2059 don't need to be declared.) In this example, all the arithmetic 2060 operators are designated by single-character literals, so the only 2061 terminal symbol that needs to be declared is `NUM', the token type for 2062 numeric constants. 2063 2064 2065 File: bison.info, Node: Rpcalc Rules, Next: Rpcalc Lexer, Prev: Rpcalc Declarations, Up: RPN Calc 2066 2067 2.1.2 Grammar Rules for `rpcalc' 2068 -------------------------------- 2069 2070 Here are the grammar rules for the reverse polish notation calculator. 2071 2072 input: 2073 /* empty */ 2074 | input line 2075 ; 2076 2077 line: 2078 '\n' 2079 | exp '\n' { printf ("%.10g\n", $1); } 2080 ; 2081 2082 exp: 2083 NUM { $$ = $1; } 2084 | exp exp '+' { $$ = $1 + $2; } 2085 | exp exp '-' { $$ = $1 - $2; } 2086 | exp exp '*' { $$ = $1 * $2; } 2087 | exp exp '/' { $$ = $1 / $2; } 2088 | exp exp '^' { $$ = pow ($1, $2); } /* Exponentiation */ 2089 | exp 'n' { $$ = -$1; } /* Unary minus */ 2090 ; 2091 %% 2092 2093 The groupings of the rpcalc "language" defined here are the 2094 expression (given the name `exp'), the line of input (`line'), and the 2095 complete input transcript (`input'). Each of these nonterminal symbols 2096 has several alternate rules, joined by the vertical bar `|' which is 2097 read as "or". The following sections explain what these rules mean. 2098 2099 The semantics of the language is determined by the actions taken 2100 when a grouping is recognized. The actions are the C code that appears 2101 inside braces. *Note Actions::. 2102 2103 You must specify these actions in C, but Bison provides the means for 2104 passing semantic values between the rules. In each action, the 2105 pseudo-variable `$$' stands for the semantic value for the grouping 2106 that the rule is going to construct. Assigning a value to `$$' is the 2107 main job of most actions. The semantic values of the components of the 2108 rule are referred to as `$1', `$2', and so on. 2109 2110 * Menu: 2111 2112 * Rpcalc Input:: 2113 * Rpcalc Line:: 2114 * Rpcalc Expr:: 2115 2116 2117 File: bison.info, Node: Rpcalc Input, Next: Rpcalc Line, Up: Rpcalc Rules 2118 2119 2.1.2.1 Explanation of `input' 2120 .............................. 2121 2122 Consider the definition of `input': 2123 2124 input: 2125 /* empty */ 2126 | input line 2127 ; 2128 2129 This definition reads as follows: "A complete input is either an 2130 empty string, or a complete input followed by an input line". Notice 2131 that "complete input" is defined in terms of itself. This definition 2132 is said to be "left recursive" since `input' appears always as the 2133 leftmost symbol in the sequence. *Note Recursive Rules: Recursion. 2134 2135 The first alternative is empty because there are no symbols between 2136 the colon and the first `|'; this means that `input' can match an empty 2137 string of input (no tokens). We write the rules this way because it is 2138 legitimate to type `Ctrl-d' right after you start the calculator. It's 2139 conventional to put an empty alternative first and write the comment 2140 `/* empty */' in it. 2141 2142 The second alternate rule (`input line') handles all nontrivial 2143 input. It means, "After reading any number of lines, read one more 2144 line if possible." The left recursion makes this rule into a loop. 2145 Since the first alternative matches empty input, the loop can be 2146 executed zero or more times. 2147 2148 The parser function `yyparse' continues to process input until a 2149 grammatical error is seen or the lexical analyzer says there are no more 2150 input tokens; we will arrange for the latter to happen at end-of-input. 2151 2152 2153 File: bison.info, Node: Rpcalc Line, Next: Rpcalc Expr, Prev: Rpcalc Input, Up: Rpcalc Rules 2154 2155 2.1.2.2 Explanation of `line' 2156 ............................. 2157 2158 Now consider the definition of `line': 2159 2160 line: 2161 '\n' 2162 | exp '\n' { printf ("%.10g\n", $1); } 2163 ; 2164 2165 The first alternative is a token which is a newline character; this 2166 means that rpcalc accepts a blank line (and ignores it, since there is 2167 no action). The second alternative is an expression followed by a 2168 newline. This is the alternative that makes rpcalc useful. The 2169 semantic value of the `exp' grouping is the value of `$1' because the 2170 `exp' in question is the first symbol in the alternative. The action 2171 prints this value, which is the result of the computation the user 2172 asked for. 2173 2174 This action is unusual because it does not assign a value to `$$'. 2175 As a consequence, the semantic value associated with the `line' is 2176 uninitialized (its value will be unpredictable). This would be a bug if 2177 that value were ever used, but we don't use it: once rpcalc has printed 2178 the value of the user's input line, that value is no longer needed. 2179 2180 2181 File: bison.info, Node: Rpcalc Expr, Prev: Rpcalc Line, Up: Rpcalc Rules 2182 2183 2.1.2.3 Explanation of `expr' 2184 ............................. 2185 2186 The `exp' grouping has several rules, one for each kind of expression. 2187 The first rule handles the simplest expressions: those that are just 2188 numbers. The second handles an addition-expression, which looks like 2189 two expressions followed by a plus-sign. The third handles 2190 subtraction, and so on. 2191 2192 exp: 2193 NUM 2194 | exp exp '+' { $$ = $1 + $2; } 2195 | exp exp '-' { $$ = $1 - $2; } 2196 ... 2197 ; 2198 2199 We have used `|' to join all the rules for `exp', but we could 2200 equally well have written them separately: 2201 2202 exp: NUM ; 2203 exp: exp exp '+' { $$ = $1 + $2; }; 2204 exp: exp exp '-' { $$ = $1 - $2; }; 2205 ... 2206 2207 Most of the rules have actions that compute the value of the 2208 expression in terms of the value of its parts. For example, in the 2209 rule for addition, `$1' refers to the first component `exp' and `$2' 2210 refers to the second one. The third component, `'+'', has no meaningful 2211 associated semantic value, but if it had one you could refer to it as 2212 `$3'. When `yyparse' recognizes a sum expression using this rule, the 2213 sum of the two subexpressions' values is produced as the value of the 2214 entire expression. *Note Actions::. 2215 2216 You don't have to give an action for every rule. When a rule has no 2217 action, Bison by default copies the value of `$1' into `$$'. This is 2218 what happens in the first rule (the one that uses `NUM'). 2219 2220 The formatting shown here is the recommended convention, but Bison 2221 does not require it. You can add or change white space as much as you 2222 wish. For example, this: 2223 2224 exp: NUM | exp exp '+' {$$ = $1 + $2; } | ... ; 2225 2226 means the same thing as this: 2227 2228 exp: 2229 NUM 2230 | exp exp '+' { $$ = $1 + $2; } 2231 | ... 2232 ; 2233 2234 The latter, however, is much more readable. 2235 2236 2237 File: bison.info, Node: Rpcalc Lexer, Next: Rpcalc Main, Prev: Rpcalc Rules, Up: RPN Calc 2238 2239 2.1.3 The `rpcalc' Lexical Analyzer 2240 ----------------------------------- 2241 2242 The lexical analyzer's job is low-level parsing: converting characters 2243 or sequences of characters into tokens. The Bison parser gets its 2244 tokens by calling the lexical analyzer. *Note The Lexical Analyzer 2245 Function `yylex': Lexical. 2246 2247 Only a simple lexical analyzer is needed for the RPN calculator. 2248 This lexical analyzer skips blanks and tabs, then reads in numbers as 2249 `double' and returns them as `NUM' tokens. Any other character that 2250 isn't part of a number is a separate token. Note that the token-code 2251 for such a single-character token is the character itself. 2252 2253 The return value of the lexical analyzer function is a numeric code 2254 which represents a token type. The same text used in Bison rules to 2255 stand for this token type is also a C expression for the numeric code 2256 for the type. This works in two ways. If the token type is a 2257 character literal, then its numeric code is that of the character; you 2258 can use the same character literal in the lexical analyzer to express 2259 the number. If the token type is an identifier, that identifier is 2260 defined by Bison as a C macro whose definition is the appropriate 2261 number. In this example, therefore, `NUM' becomes a macro for `yylex' 2262 to use. 2263 2264 The semantic value of the token (if it has one) is stored into the 2265 global variable `yylval', which is where the Bison parser will look for 2266 it. (The C data type of `yylval' is `YYSTYPE', which was defined at 2267 the beginning of the grammar; *note Declarations for `rpcalc': Rpcalc 2268 Declarations.) 2269 2270 A token type code of zero is returned if the end-of-input is 2271 encountered. (Bison recognizes any nonpositive value as indicating 2272 end-of-input.) 2273 2274 Here is the code for the lexical analyzer: 2275 2276 /* The lexical analyzer returns a double floating point 2277 number on the stack and the token NUM, or the numeric code 2278 of the character read if not a number. It skips all blanks 2279 and tabs, and returns 0 for end-of-input. */ 2280 2281 #include <ctype.h> 2282 2283 int 2284 yylex (void) 2285 { 2286 int c; 2287 2288 /* Skip white space. */ 2289 while ((c = getchar ()) == ' ' || c == '\t') 2290 continue; 2291 /* Process numbers. */ 2292 if (c == '.' || isdigit (c)) 2293 { 2294 ungetc (c, stdin); 2295 scanf ("%lf", &yylval); 2296 return NUM; 2297 } 2298 /* Return end-of-input. */ 2299 if (c == EOF) 2300 return 0; 2301 /* Return a single char. */ 2302 return c; 2303 } 2304 2305 2306 File: bison.info, Node: Rpcalc Main, Next: Rpcalc Error, Prev: Rpcalc Lexer, Up: RPN Calc 2307 2308 2.1.4 The Controlling Function 2309 ------------------------------ 2310 2311 In keeping with the spirit of this example, the controlling function is 2312 kept to the bare minimum. The only requirement is that it call 2313 `yyparse' to start the process of parsing. 2314 2315 int 2316 main (void) 2317 { 2318 return yyparse (); 2319 } 2320 2321 2322 File: bison.info, Node: Rpcalc Error, Next: Rpcalc Generate, Prev: Rpcalc Main, Up: RPN Calc 2323 2324 2.1.5 The Error Reporting Routine 2325 --------------------------------- 2326 2327 When `yyparse' detects a syntax error, it calls the error reporting 2328 function `yyerror' to print an error message (usually but not always 2329 `"syntax error"'). It is up to the programmer to supply `yyerror' 2330 (*note Parser C-Language Interface: Interface.), so here is the 2331 definition we will use: 2332 2333 #include <stdio.h> 2334 2335 /* Called by yyparse on error. */ 2336 void 2337 yyerror (char const *s) 2338 { 2339 fprintf (stderr, "%s\n", s); 2340 } 2341 2342 After `yyerror' returns, the Bison parser may recover from the error 2343 and continue parsing if the grammar contains a suitable error rule 2344 (*note Error Recovery::). Otherwise, `yyparse' returns nonzero. We 2345 have not written any error rules in this example, so any invalid input 2346 will cause the calculator program to exit. This is not clean behavior 2347 for a real calculator, but it is adequate for the first example. 2348 2349 2350 File: bison.info, Node: Rpcalc Generate, Next: Rpcalc Compile, Prev: Rpcalc Error, Up: RPN Calc 2351 2352 2.1.6 Running Bison to Make the Parser 2353 -------------------------------------- 2354 2355 Before running Bison to produce a parser, we need to decide how to 2356 arrange all the source code in one or more source files. For such a 2357 simple example, the easiest thing is to put everything in one file, the 2358 grammar file. The definitions of `yylex', `yyerror' and `main' go at 2359 the end, in the epilogue of the grammar file (*note The Overall Layout 2360 of a Bison Grammar: Grammar Layout.). 2361 2362 For a large project, you would probably have several source files, 2363 and use `make' to arrange to recompile them. 2364 2365 With all the source in the grammar file, you use the following 2366 command to convert it into a parser implementation file: 2367 2368 bison FILE.y 2369 2370 In this example, the grammar file is called `rpcalc.y' (for "Reverse 2371 Polish CALCulator"). Bison produces a parser implementation file named 2372 `FILE.tab.c', removing the `.y' from the grammar file name. The parser 2373 implementation file contains the source code for `yyparse'. The 2374 additional functions in the grammar file (`yylex', `yyerror' and 2375 `main') are copied verbatim to the parser implementation file. 2376 2377 2378 File: bison.info, Node: Rpcalc Compile, Prev: Rpcalc Generate, Up: RPN Calc 2379 2380 2.1.7 Compiling the Parser Implementation File 2381 ---------------------------------------------- 2382 2383 Here is how to compile and run the parser implementation file: 2384 2385 # List files in current directory. 2386 $ ls 2387 rpcalc.tab.c rpcalc.y 2388 2389 # Compile the Bison parser. 2390 # `-lm' tells compiler to search math library for `pow'. 2391 $ cc -lm -o rpcalc rpcalc.tab.c 2392 2393 # List files again. 2394 $ ls 2395 rpcalc rpcalc.tab.c rpcalc.y 2396 2397 The file `rpcalc' now contains the executable code. Here is an 2398 example session using `rpcalc'. 2399 2400 $ rpcalc 2401 4 9 + 2402 13 2403 3 7 + 3 4 5 *+- 2404 -13 2405 3 7 + 3 4 5 * + - n Note the unary minus, `n' 2406 13 2407 5 6 / 4 n + 2408 -3.166666667 2409 3 4 ^ Exponentiation 2410 81 2411 ^D End-of-file indicator 2412 $ 2413 2414 2415 File: bison.info, Node: Infix Calc, Next: Simple Error Recovery, Prev: RPN Calc, Up: Examples 2416 2417 2.2 Infix Notation Calculator: `calc' 2418 ===================================== 2419 2420 We now modify rpcalc to handle infix operators instead of postfix. 2421 Infix notation involves the concept of operator precedence and the need 2422 for parentheses nested to arbitrary depth. Here is the Bison code for 2423 `calc.y', an infix desk-top calculator. 2424 2425 /* Infix notation calculator. */ 2426 2427 %{ 2428 #define YYSTYPE double 2429 #include <math.h> 2430 #include <stdio.h> 2431 int yylex (void); 2432 void yyerror (char const *); 2433 %} 2434 2435 /* Bison declarations. */ 2436 %token NUM 2437 %left '-' '+' 2438 %left '*' '/' 2439 %left NEG /* negation--unary minus */ 2440 %right '^' /* exponentiation */ 2441 2442 %% /* The grammar follows. */ 2443 input: 2444 /* empty */ 2445 | input line 2446 ; 2447 2448 line: 2449 '\n' 2450 | exp '\n' { printf ("\t%.10g\n", $1); } 2451 ; 2452 2453 exp: 2454 NUM { $$ = $1; } 2455 | exp '+' exp { $$ = $1 + $3; } 2456 | exp '-' exp { $$ = $1 - $3; } 2457 | exp '*' exp { $$ = $1 * $3; } 2458 | exp '/' exp { $$ = $1 / $3; } 2459 | '-' exp %prec NEG { $$ = -$2; } 2460 | exp '^' exp { $$ = pow ($1, $3); } 2461 | '(' exp ')' { $$ = $2; } 2462 ; 2463 %% 2464 2465 The functions `yylex', `yyerror' and `main' can be the same as before. 2466 2467 There are two important new features shown in this code. 2468 2469 In the second section (Bison declarations), `%left' declares token 2470 types and says they are left-associative operators. The declarations 2471 `%left' and `%right' (right associativity) take the place of `%token' 2472 which is used to declare a token type name without associativity. 2473 (These tokens are single-character literals, which ordinarily don't 2474 need to be declared. We declare them here to specify the 2475 associativity.) 2476 2477 Operator precedence is determined by the line ordering of the 2478 declarations; the higher the line number of the declaration (lower on 2479 the page or screen), the higher the precedence. Hence, exponentiation 2480 has the highest precedence, unary minus (`NEG') is next, followed by 2481 `*' and `/', and so on. *Note Operator Precedence: Precedence. 2482 2483 The other important new feature is the `%prec' in the grammar 2484 section for the unary minus operator. The `%prec' simply instructs 2485 Bison that the rule `| '-' exp' has the same precedence as `NEG'--in 2486 this case the next-to-highest. *Note Context-Dependent Precedence: 2487 Contextual Precedence. 2488 2489 Here is a sample run of `calc.y': 2490 2491 $ calc 2492 4 + 4.5 - (34/(8*3+-3)) 2493 6.880952381 2494 -56 + 2 2495 -54 2496 3 ^ 2 2497 9 2498 2499 2500 File: bison.info, Node: Simple Error Recovery, Next: Location Tracking Calc, Prev: Infix Calc, Up: Examples 2501 2502 2.3 Simple Error Recovery 2503 ========================= 2504 2505 Up to this point, this manual has not addressed the issue of "error 2506 recovery"--how to continue parsing after the parser detects a syntax 2507 error. All we have handled is error reporting with `yyerror'. Recall 2508 that by default `yyparse' returns after calling `yyerror'. This means 2509 that an erroneous input line causes the calculator program to exit. 2510 Now we show how to rectify this deficiency. 2511 2512 The Bison language itself includes the reserved word `error', which 2513 may be included in the grammar rules. In the example below it has been 2514 added to one of the alternatives for `line': 2515 2516 line: 2517 '\n' 2518 | exp '\n' { printf ("\t%.10g\n", $1); } 2519 | error '\n' { yyerrok; } 2520 ; 2521 2522 This addition to the grammar allows for simple error recovery in the 2523 event of a syntax error. If an expression that cannot be evaluated is 2524 read, the error will be recognized by the third rule for `line', and 2525 parsing will continue. (The `yyerror' function is still called upon to 2526 print its message as well.) The action executes the statement 2527 `yyerrok', a macro defined automatically by Bison; its meaning is that 2528 error recovery is complete (*note Error Recovery::). Note the 2529 difference between `yyerrok' and `yyerror'; neither one is a misprint. 2530 2531 This form of error recovery deals with syntax errors. There are 2532 other kinds of errors; for example, division by zero, which raises an 2533 exception signal that is normally fatal. A real calculator program 2534 must handle this signal and use `longjmp' to return to `main' and 2535 resume parsing input lines; it would also have to discard the rest of 2536 the current line of input. We won't discuss this issue further because 2537 it is not specific to Bison programs. 2538 2539 2540 File: bison.info, Node: Location Tracking Calc, Next: Multi-function Calc, Prev: Simple Error Recovery, Up: Examples 2541 2542 2.4 Location Tracking Calculator: `ltcalc' 2543 ========================================== 2544 2545 This example extends the infix notation calculator with location 2546 tracking. This feature will be used to improve the error messages. For 2547 the sake of clarity, this example is a simple integer calculator, since 2548 most of the work needed to use locations will be done in the lexical 2549 analyzer. 2550 2551 * Menu: 2552 2553 * Ltcalc Declarations:: Bison and C declarations for ltcalc. 2554 * Ltcalc Rules:: Grammar rules for ltcalc, with explanations. 2555 * Ltcalc Lexer:: The lexical analyzer. 2556 2557 2558 File: bison.info, Node: Ltcalc Declarations, Next: Ltcalc Rules, Up: Location Tracking Calc 2559 2560 2.4.1 Declarations for `ltcalc' 2561 ------------------------------- 2562 2563 The C and Bison declarations for the location tracking calculator are 2564 the same as the declarations for the infix notation calculator. 2565 2566 /* Location tracking calculator. */ 2567 2568 %{ 2569 #define YYSTYPE int 2570 #include <math.h> 2571 int yylex (void); 2572 void yyerror (char const *); 2573 %} 2574 2575 /* Bison declarations. */ 2576 %token NUM 2577 2578 %left '-' '+' 2579 %left '*' '/' 2580 %left NEG 2581 %right '^' 2582 2583 %% /* The grammar follows. */ 2584 2585 Note there are no declarations specific to locations. Defining a data 2586 type for storing locations is not needed: we will use the type provided 2587 by default (*note Data Types of Locations: Location Type.), which is a 2588 four member structure with the following integer fields: `first_line', 2589 `first_column', `last_line' and `last_column'. By conventions, and in 2590 accordance with the GNU Coding Standards and common practice, the line 2591 and column count both start at 1. 2592 2593 2594 File: bison.info, Node: Ltcalc Rules, Next: Ltcalc Lexer, Prev: Ltcalc Declarations, Up: Location Tracking Calc 2595 2596 2.4.2 Grammar Rules for `ltcalc' 2597 -------------------------------- 2598 2599 Whether handling locations or not has no effect on the syntax of your 2600 language. Therefore, grammar rules for this example will be very close 2601 to those of the previous example: we will only modify them to benefit 2602 from the new information. 2603 2604 Here, we will use locations to report divisions by zero, and locate 2605 the wrong expressions or subexpressions. 2606 2607 input: 2608 /* empty */ 2609 | input line 2610 ; 2611 2612 line: 2613 '\n' 2614 | exp '\n' { printf ("%d\n", $1); } 2615 ; 2616 2617 exp: 2618 NUM { $$ = $1; } 2619 | exp '+' exp { $$ = $1 + $3; } 2620 | exp '-' exp { $$ = $1 - $3; } 2621 | exp '*' exp { $$ = $1 * $3; } 2622 | exp '/' exp 2623 { 2624 if ($3) 2625 $$ = $1 / $3; 2626 else 2627 { 2628 $$ = 1; 2629 fprintf (stderr, "%d.%d-%d.%d: division by zero", 2630 @3.first_line, @3.first_column, 2631 @3.last_line, @3.last_column); 2632 } 2633 } 2634 | '-' exp %prec NEG { $$ = -$2; } 2635 | exp '^' exp { $$ = pow ($1, $3); } 2636 | '(' exp ')' { $$ = $2; } 2637 2638 This code shows how to reach locations inside of semantic actions, by 2639 using the pseudo-variables `@N' for rule components, and the 2640 pseudo-variable `@$' for groupings. 2641 2642 We don't need to assign a value to `@$': the output parser does it 2643 automatically. By default, before executing the C code of each action, 2644 `@$' is set to range from the beginning of `@1' to the end of `@N', for 2645 a rule with N components. This behavior can be redefined (*note 2646 Default Action for Locations: Location Default Action.), and for very 2647 specific rules, `@$' can be computed by hand. 2648 2649 2650 File: bison.info, Node: Ltcalc Lexer, Prev: Ltcalc Rules, Up: Location Tracking Calc 2651 2652 2.4.3 The `ltcalc' Lexical Analyzer. 2653 ------------------------------------ 2654 2655 Until now, we relied on Bison's defaults to enable location tracking. 2656 The next step is to rewrite the lexical analyzer, and make it able to 2657 feed the parser with the token locations, as it already does for 2658 semantic values. 2659 2660 To this end, we must take into account every single character of the 2661 input text, to avoid the computed locations of being fuzzy or wrong: 2662 2663 int 2664 yylex (void) 2665 { 2666 int c; 2667 2668 /* Skip white space. */ 2669 while ((c = getchar ()) == ' ' || c == '\t') 2670 ++yylloc.last_column; 2671 2672 /* Step. */ 2673 yylloc.first_line = yylloc.last_line; 2674 yylloc.first_column = yylloc.last_column; 2675 2676 /* Process numbers. */ 2677 if (isdigit (c)) 2678 { 2679 yylval = c - '0'; 2680 ++yylloc.last_column; 2681 while (isdigit (c = getchar ())) 2682 { 2683 ++yylloc.last_column; 2684 yylval = yylval * 10 + c - '0'; 2685 } 2686 ungetc (c, stdin); 2687 return NUM; 2688 } 2689 2690 /* Return end-of-input. */ 2691 if (c == EOF) 2692 return 0; 2693 2694 /* Return a single char, and update location. */ 2695 if (c == '\n') 2696 { 2697 ++yylloc.last_line; 2698 yylloc.last_column = 0; 2699 } 2700 else 2701 ++yylloc.last_column; 2702 return c; 2703 } 2704 2705 Basically, the lexical analyzer performs the same processing as 2706 before: it skips blanks and tabs, and reads numbers or single-character 2707 tokens. In addition, it updates `yylloc', the global variable (of type 2708 `YYLTYPE') containing the token's location. 2709 2710 Now, each time this function returns a token, the parser has its 2711 number as well as its semantic value, and its location in the text. 2712 The last needed change is to initialize `yylloc', for example in the 2713 controlling function: 2714 2715 int 2716 main (void) 2717 { 2718 yylloc.first_line = yylloc.last_line = 1; 2719 yylloc.first_column = yylloc.last_column = 0; 2720 return yyparse (); 2721 } 2722 2723 Remember that computing locations is not a matter of syntax. Every 2724 character must be associated to a location update, whether it is in 2725 valid input, in comments, in literal strings, and so on. 2726 2727 2728 File: bison.info, Node: Multi-function Calc, Next: Exercises, Prev: Location Tracking Calc, Up: Examples 2729 2730 2.5 Multi-Function Calculator: `mfcalc' 2731 ======================================= 2732 2733 Now that the basics of Bison have been discussed, it is time to move on 2734 to a more advanced problem. The above calculators provided only five 2735 functions, `+', `-', `*', `/' and `^'. It would be nice to have a 2736 calculator that provides other mathematical functions such as `sin', 2737 `cos', etc. 2738 2739 It is easy to add new operators to the infix calculator as long as 2740 they are only single-character literals. The lexical analyzer `yylex' 2741 passes back all nonnumeric characters as tokens, so new grammar rules 2742 suffice for adding a new operator. But we want something more 2743 flexible: built-in functions whose syntax has this form: 2744 2745 FUNCTION_NAME (ARGUMENT) 2746 2747 At the same time, we will add memory to the calculator, by allowing you 2748 to create named variables, store values in them, and use them later. 2749 Here is a sample session with the multi-function calculator: 2750 2751 $ mfcalc 2752 pi = 3.141592653589 2753 3.1415926536 2754 sin(pi) 2755 0.0000000000 2756 alpha = beta1 = 2.3 2757 2.3000000000 2758 alpha 2759 2.3000000000 2760 ln(alpha) 2761 0.8329091229 2762 exp(ln(beta1)) 2763 2.3000000000 2764 $ 2765 2766 Note that multiple assignment and nested function calls are 2767 permitted. 2768 2769 * Menu: 2770 2771 * Mfcalc Declarations:: Bison declarations for multi-function calculator. 2772 * Mfcalc Rules:: Grammar rules for the calculator. 2773 * Mfcalc Symbol Table:: Symbol table management subroutines. 2774 2775 2776 File: bison.info, Node: Mfcalc Declarations, Next: Mfcalc Rules, Up: Multi-function Calc 2777 2778 2.5.1 Declarations for `mfcalc' 2779 ------------------------------- 2780 2781 Here are the C and Bison declarations for the multi-function calculator. 2782 2783 %{ 2784 #include <math.h> /* For math functions, cos(), sin(), etc. */ 2785 #include "calc.h" /* Contains definition of `symrec'. */ 2786 int yylex (void); 2787 void yyerror (char const *); 2788 %} 2789 2790 %union { 2791 double val; /* For returning numbers. */ 2792 symrec *tptr; /* For returning symbol-table pointers. */ 2793 } 2794 %token <val> NUM /* Simple double precision number. */ 2795 %token <tptr> VAR FNCT /* Variable and function. */ 2796 %type <val> exp 2797 2798 %right '=' 2799 %left '-' '+' 2800 %left '*' '/' 2801 %left NEG /* negation--unary minus */ 2802 %right '^' /* exponentiation */ 2803 2804 The above grammar introduces only two new features of the Bison 2805 language. These features allow semantic values to have various data 2806 types (*note More Than One Value Type: Multiple Types.). 2807 2808 The `%union' declaration specifies the entire list of possible types; 2809 this is instead of defining `YYSTYPE'. The allowable types are now 2810 double-floats (for `exp' and `NUM') and pointers to entries in the 2811 symbol table. *Note The Collection of Value Types: Union Decl. 2812 2813 Since values can now have various types, it is necessary to 2814 associate a type with each grammar symbol whose semantic value is used. 2815 These symbols are `NUM', `VAR', `FNCT', and `exp'. Their declarations 2816 are augmented with information about their data type (placed between 2817 angle brackets). 2818 2819 The Bison construct `%type' is used for declaring nonterminal 2820 symbols, just as `%token' is used for declaring token types. We have 2821 not used `%type' before because nonterminal symbols are normally 2822 declared implicitly by the rules that define them. But `exp' must be 2823 declared explicitly so we can specify its value type. *Note 2824 Nonterminal Symbols: Type Decl. 2825 2826 2827 File: bison.info, Node: Mfcalc Rules, Next: Mfcalc Symbol Table, Prev: Mfcalc Declarations, Up: Multi-function Calc 2828 2829 2.5.2 Grammar Rules for `mfcalc' 2830 -------------------------------- 2831 2832 Here are the grammar rules for the multi-function calculator. Most of 2833 them are copied directly from `calc'; three rules, those which mention 2834 `VAR' or `FNCT', are new. 2835 2836 %% /* The grammar follows. */ 2837 input: 2838 /* empty */ 2839 | input line 2840 ; 2841 2842 line: 2843 '\n' 2844 | exp '\n' { printf ("%.10g\n", $1); } 2845 | error '\n' { yyerrok; } 2846 ; 2847 2848 exp: 2849 NUM { $$ = $1; } 2850 | VAR { $$ = $1->value.var; } 2851 | VAR '=' exp { $$ = $3; $1->value.var = $3; } 2852 | FNCT '(' exp ')' { $$ = (*($1->value.fnctptr))($3); } 2853 | exp '+' exp { $$ = $1 + $3; } 2854 | exp '-' exp { $$ = $1 - $3; } 2855 | exp '*' exp { $$ = $1 * $3; } 2856 | exp '/' exp { $$ = $1 / $3; } 2857 | '-' exp %prec NEG { $$ = -$2; } 2858 | exp '^' exp { $$ = pow ($1, $3); } 2859 | '(' exp ')' { $$ = $2; } 2860 ; 2861 /* End of grammar. */ 2862 %% 2863 2864 2865 File: bison.info, Node: Mfcalc Symbol Table, Prev: Mfcalc Rules, Up: Multi-function Calc 2866 2867 2.5.3 The `mfcalc' Symbol Table 2868 ------------------------------- 2869 2870 The multi-function calculator requires a symbol table to keep track of 2871 the names and meanings of variables and functions. This doesn't affect 2872 the grammar rules (except for the actions) or the Bison declarations, 2873 but it requires some additional C functions for support. 2874 2875 The symbol table itself consists of a linked list of records. Its 2876 definition, which is kept in the header `calc.h', is as follows. It 2877 provides for either functions or variables to be placed in the table. 2878 2879 /* Function type. */ 2880 typedef double (*func_t) (double); 2881 2882 /* Data type for links in the chain of symbols. */ 2883 struct symrec 2884 { 2885 char *name; /* name of symbol */ 2886 int type; /* type of symbol: either VAR or FNCT */ 2887 union 2888 { 2889 double var; /* value of a VAR */ 2890 func_t fnctptr; /* value of a FNCT */ 2891 } value; 2892 struct symrec *next; /* link field */ 2893 }; 2894 2895 typedef struct symrec symrec; 2896 2897 /* The symbol table: a chain of `struct symrec'. */ 2898 extern symrec *sym_table; 2899 2900 symrec *putsym (char const *, int); 2901 symrec *getsym (char const *); 2902 2903 The new version of `main' includes a call to `init_table', a 2904 function that initializes the symbol table. Here it is, and 2905 `init_table' as well: 2906 2907 #include <stdio.h> 2908 2909 /* Called by yyparse on error. */ 2910 void 2911 yyerror (char const *s) 2912 { 2913 fprintf (stderr, "%s\n", s); 2914 } 2915 2916 struct init 2917 { 2918 char const *fname; 2919 double (*fnct) (double); 2920 }; 2921 2922 struct init const arith_fncts[] = 2923 { 2924 "sin", sin, 2925 "cos", cos, 2926 "atan", atan, 2927 "ln", log, 2928 "exp", exp, 2929 "sqrt", sqrt, 2930 0, 0 2931 }; 2932 2933 /* The symbol table: a chain of `struct symrec'. */ 2934 symrec *sym_table; 2935 2936 /* Put arithmetic functions in table. */ 2937 void 2938 init_table (void) 2939 { 2940 int i; 2941 for (i = 0; arith_fncts[i].fname != 0; i++) 2942 { 2943 symrec *ptr = putsym (arith_fncts[i].fname, FNCT); 2944 ptr->value.fnctptr = arith_fncts[i].fnct; 2945 } 2946 } 2947 2948 int 2949 main (void) 2950 { 2951 init_table (); 2952 return yyparse (); 2953 } 2954 2955 By simply editing the initialization list and adding the necessary 2956 include files, you can add additional functions to the calculator. 2957 2958 Two important functions allow look-up and installation of symbols in 2959 the symbol table. The function `putsym' is passed a name and the type 2960 (`VAR' or `FNCT') of the object to be installed. The object is linked 2961 to the front of the list, and a pointer to the object is returned. The 2962 function `getsym' is passed the name of the symbol to look up. If 2963 found, a pointer to that symbol is returned; otherwise zero is returned. 2964 2965 #include <stdlib.h> /* malloc. */ 2966 #include <string.h> /* strlen. */ 2967 2968 symrec * 2969 putsym (char const *sym_name, int sym_type) 2970 { 2971 symrec *ptr = (symrec *) malloc (sizeof (symrec)); 2972 ptr->name = (char *) malloc (strlen (sym_name) + 1); 2973 strcpy (ptr->name,sym_name); 2974 ptr->type = sym_type; 2975 ptr->value.var = 0; /* Set value to 0 even if fctn. */ 2976 ptr->next = (struct symrec *)sym_table; 2977 sym_table = ptr; 2978 return ptr; 2979 } 2980 2981 symrec * 2982 getsym (char const *sym_name) 2983 { 2984 symrec *ptr; 2985 for (ptr = sym_table; ptr != (symrec *) 0; 2986 ptr = (symrec *)ptr->next) 2987 if (strcmp (ptr->name,sym_name) == 0) 2988 return ptr; 2989 return 0; 2990 } 2991 2992 The function `yylex' must now recognize variables, numeric values, 2993 and the single-character arithmetic operators. Strings of alphanumeric 2994 characters with a leading letter are recognized as either variables or 2995 functions depending on what the symbol table says about them. 2996 2997 The string is passed to `getsym' for look up in the symbol table. If 2998 the name appears in the table, a pointer to its location and its type 2999 (`VAR' or `FNCT') is returned to `yyparse'. If it is not already in 3000 the table, then it is installed as a `VAR' using `putsym'. Again, a 3001 pointer and its type (which must be `VAR') is returned to `yyparse'. 3002 3003 No change is needed in the handling of numeric values and arithmetic 3004 operators in `yylex'. 3005 3006 #include <ctype.h> 3007 3008 int 3009 yylex (void) 3010 { 3011 int c; 3012 3013 /* Ignore white space, get first nonwhite character. */ 3014 while ((c = getchar ()) == ' ' || c == '\t') 3015 continue; 3016 3017 if (c == EOF) 3018 return 0; 3019 3020 /* Char starts a number => parse the number. */ 3021 if (c == '.' || isdigit (c)) 3022 { 3023 ungetc (c, stdin); 3024 scanf ("%lf", &yylval.val); 3025 return NUM; 3026 } 3027 3028 /* Char starts an identifier => read the name. */ 3029 if (isalpha (c)) 3030 { 3031 /* Initially make the buffer long enough 3032 for a 40-character symbol name. */ 3033 static size_t length = 40; 3034 static char *symbuf = 0; 3035 symrec *s; 3036 int i; 3037 3038 if (!symbuf) 3039 symbuf = (char *) malloc (length + 1); 3040 3041 i = 0; 3042 do 3043 { 3044 /* If buffer is full, make it bigger. */ 3045 if (i == length) 3046 { 3047 length *= 2; 3048 symbuf = (char *) realloc (symbuf, length + 1); 3049 } 3050 /* Add this character to the buffer. */ 3051 symbuf[i++] = c; 3052 /* Get another character. */ 3053 c = getchar (); 3054 } 3055 while (isalnum (c)); 3056 3057 ungetc (c, stdin); 3058 symbuf[i] = '\0'; 3059 3060 s = getsym (symbuf); 3061 if (s == 0) 3062 s = putsym (symbuf, VAR); 3063 yylval.tptr = s; 3064 return s->type; 3065 } 3066 3067 /* Any other character is a token by itself. */ 3068 return c; 3069 } 3070 3071 The error reporting function is unchanged, and the new version of 3072 `main' includes a call to `init_table' and sets the `yydebug' on user 3073 demand (*Note Tracing Your Parser: Tracing, for details): 3074 3075 /* Called by yyparse on error. */ 3076 void 3077 yyerror (char const *s) 3078 { 3079 fprintf (stderr, "%s\n", s); 3080 } 3081 3082 int 3083 main (int argc, char const* argv[]) 3084 { 3085 int i; 3086 /* Enable parse traces on option -p. */ 3087 for (i = 1; i < argc; ++i) 3088 if (!strcmp(argv[i], "-p")) 3089 yydebug = 1; 3090 init_table (); 3091 return yyparse (); 3092 } 3093 3094 This program is both powerful and flexible. You may easily add new 3095 functions, and it is a simple job to modify this code to install 3096 predefined variables such as `pi' or `e' as well. 3097 3098 3099 File: bison.info, Node: Exercises, Prev: Multi-function Calc, Up: Examples 3100 3101 2.6 Exercises 3102 ============= 3103 3104 1. Add some new functions from `math.h' to the initialization list. 3105 3106 2. Add another array that contains constants and their values. Then 3107 modify `init_table' to add these constants to the symbol table. 3108 It will be easiest to give the constants type `VAR'. 3109 3110 3. Make the program report an error if the user refers to an 3111 uninitialized variable in any way except to store a value in it. 3112 3113 3114 File: bison.info, Node: Grammar File, Next: Interface, Prev: Examples, Up: Top 3115 3116 3 Bison Grammar Files 3117 ********************* 3118 3119 Bison takes as input a context-free grammar specification and produces a 3120 C-language function that recognizes correct instances of the grammar. 3121 3122 The Bison grammar file conventionally has a name ending in `.y'. 3123 *Note Invoking Bison: Invocation. 3124 3125 * Menu: 3126 3127 * Grammar Outline:: Overall layout of the grammar file. 3128 * Symbols:: Terminal and nonterminal symbols. 3129 * Rules:: How to write grammar rules. 3130 * Recursion:: Writing recursive rules. 3131 * Semantics:: Semantic values and actions. 3132 * Tracking Locations:: Locations and actions. 3133 * Named References:: Using named references in actions. 3134 * Declarations:: All kinds of Bison declarations are described here. 3135 * Multiple Parsers:: Putting more than one Bison parser in one program. 3136 3137 3138 File: bison.info, Node: Grammar Outline, Next: Symbols, Up: Grammar File 3139 3140 3.1 Outline of a Bison Grammar 3141 ============================== 3142 3143 A Bison grammar file has four main sections, shown here with the 3144 appropriate delimiters: 3145 3146 %{ 3147 PROLOGUE 3148 %} 3149 3150 BISON DECLARATIONS 3151 3152 %% 3153 GRAMMAR RULES 3154 %% 3155 3156 EPILOGUE 3157 3158 Comments enclosed in `/* ... */' may appear in any of the sections. 3159 As a GNU extension, `//' introduces a comment that continues until end 3160 of line. 3161 3162 * Menu: 3163 3164 * Prologue:: Syntax and usage of the prologue. 3165 * Prologue Alternatives:: Syntax and usage of alternatives to the prologue. 3166 * Bison Declarations:: Syntax and usage of the Bison declarations section. 3167 * Grammar Rules:: Syntax and usage of the grammar rules section. 3168 * Epilogue:: Syntax and usage of the epilogue. 3169 3170 3171 File: bison.info, Node: Prologue, Next: Prologue Alternatives, Up: Grammar Outline 3172 3173 3.1.1 The prologue 3174 ------------------ 3175 3176 The PROLOGUE section contains macro definitions and declarations of 3177 functions and variables that are used in the actions in the grammar 3178 rules. These are copied to the beginning of the parser implementation 3179 file so that they precede the definition of `yyparse'. You can use 3180 `#include' to get the declarations from a header file. If you don't 3181 need any C declarations, you may omit the `%{' and `%}' delimiters that 3182 bracket this section. 3183 3184 The PROLOGUE section is terminated by the first occurrence of `%}' 3185 that is outside a comment, a string literal, or a character constant. 3186 3187 You may have more than one PROLOGUE section, intermixed with the 3188 BISON DECLARATIONS. This allows you to have C and Bison declarations 3189 that refer to each other. For example, the `%union' declaration may 3190 use types defined in a header file, and you may wish to prototype 3191 functions that take arguments of type `YYSTYPE'. This can be done with 3192 two PROLOGUE blocks, one before and one after the `%union' declaration. 3193 3194 %{ 3195 #define _GNU_SOURCE 3196 #include <stdio.h> 3197 #include "ptypes.h" 3198 %} 3199 3200 %union { 3201 long int n; 3202 tree t; /* `tree' is defined in `ptypes.h'. */ 3203 } 3204 3205 %{ 3206 static void print_token_value (FILE *, int, YYSTYPE); 3207 #define YYPRINT(F, N, L) print_token_value (F, N, L) 3208 %} 3209 3210 ... 3211 3212 When in doubt, it is usually safer to put prologue code before all 3213 Bison declarations, rather than after. For example, any definitions of 3214 feature test macros like `_GNU_SOURCE' or `_POSIX_C_SOURCE' should 3215 appear before all Bison declarations, as feature test macros can affect 3216 the behavior of Bison-generated `#include' directives. 3217 3218 3219 File: bison.info, Node: Prologue Alternatives, Next: Bison Declarations, Prev: Prologue, Up: Grammar Outline 3220 3221 3.1.2 Prologue Alternatives 3222 --------------------------- 3223 3224 The functionality of PROLOGUE sections can often be subtle and 3225 inflexible. As an alternative, Bison provides a `%code' directive with 3226 an explicit qualifier field, which identifies the purpose of the code 3227 and thus the location(s) where Bison should generate it. For C/C++, 3228 the qualifier can be omitted for the default location, or it can be one 3229 of `requires', `provides', `top'. *Note %code Summary::. 3230 3231 Look again at the example of the previous section: 3232 3233 %{ 3234 #define _GNU_SOURCE 3235 #include <stdio.h> 3236 #include "ptypes.h" 3237 %} 3238 3239 %union { 3240 long int n; 3241 tree t; /* `tree' is defined in `ptypes.h'. */ 3242 } 3243 3244 %{ 3245 static void print_token_value (FILE *, int, YYSTYPE); 3246 #define YYPRINT(F, N, L) print_token_value (F, N, L) 3247 %} 3248 3249 ... 3250 3251 Notice that there are two PROLOGUE sections here, but there's a subtle 3252 distinction between their functionality. For example, if you decide to 3253 override Bison's default definition for `YYLTYPE', in which PROLOGUE 3254 section should you write your new definition? You should write it in 3255 the first since Bison will insert that code into the parser 3256 implementation file _before_ the default `YYLTYPE' definition. In 3257 which PROLOGUE section should you prototype an internal function, 3258 `trace_token', that accepts `YYLTYPE' and `yytokentype' as arguments? 3259 You should prototype it in the second since Bison will insert that code 3260 _after_ the `YYLTYPE' and `yytokentype' definitions. 3261 3262 This distinction in functionality between the two PROLOGUE sections 3263 is established by the appearance of the `%union' between them. This 3264 behavior raises a few questions. First, why should the position of a 3265 `%union' affect definitions related to `YYLTYPE' and `yytokentype'? 3266 Second, what if there is no `%union'? In that case, the second kind of 3267 PROLOGUE section is not available. This behavior is not intuitive. 3268 3269 To avoid this subtle `%union' dependency, rewrite the example using a 3270 `%code top' and an unqualified `%code'. Let's go ahead and add the new 3271 `YYLTYPE' definition and the `trace_token' prototype at the same time: 3272 3273 %code top { 3274 #define _GNU_SOURCE 3275 #include <stdio.h> 3276 3277 /* WARNING: The following code really belongs 3278 * in a `%code requires'; see below. */ 3279 3280 #include "ptypes.h" 3281 #define YYLTYPE YYLTYPE 3282 typedef struct YYLTYPE 3283 { 3284 int first_line; 3285 int first_column; 3286 int last_line; 3287 int last_column; 3288 char *filename; 3289 } YYLTYPE; 3290 } 3291 3292 %union { 3293 long int n; 3294 tree t; /* `tree' is defined in `ptypes.h'. */ 3295 } 3296 3297 %code { 3298 static void print_token_value (FILE *, int, YYSTYPE); 3299 #define YYPRINT(F, N, L) print_token_value (F, N, L) 3300 static void trace_token (enum yytokentype token, YYLTYPE loc); 3301 } 3302 3303 ... 3304 3305 In this way, `%code top' and the unqualified `%code' achieve the same 3306 functionality as the two kinds of PROLOGUE sections, but it's always 3307 explicit which kind you intend. Moreover, both kinds are always 3308 available even in the absence of `%union'. 3309 3310 The `%code top' block above logically contains two parts. The first 3311 two lines before the warning need to appear near the top of the parser 3312 implementation file. The first line after the warning is required by 3313 `YYSTYPE' and thus also needs to appear in the parser implementation 3314 file. However, if you've instructed Bison to generate a parser header 3315 file (*note %defines: Decl Summary.), you probably want that line to 3316 appear before the `YYSTYPE' definition in that header file as well. 3317 The `YYLTYPE' definition should also appear in the parser header file 3318 to override the default `YYLTYPE' definition there. 3319 3320 In other words, in the `%code top' block above, all but the first two 3321 lines are dependency code required by the `YYSTYPE' and `YYLTYPE' 3322 definitions. Thus, they belong in one or more `%code requires': 3323 3324 %code top { 3325 #define _GNU_SOURCE 3326 #include <stdio.h> 3327 } 3328 3329 %code requires { 3330 #include "ptypes.h" 3331 } 3332 %union { 3333 long int n; 3334 tree t; /* `tree' is defined in `ptypes.h'. */ 3335 } 3336 3337 %code requires { 3338 #define YYLTYPE YYLTYPE 3339 typedef struct YYLTYPE 3340 { 3341 int first_line; 3342 int first_column; 3343 int last_line; 3344 int last_column; 3345 char *filename; 3346 } YYLTYPE; 3347 } 3348 3349 %code { 3350 static void print_token_value (FILE *, int, YYSTYPE); 3351 #define YYPRINT(F, N, L) print_token_value (F, N, L) 3352 static void trace_token (enum yytokentype token, YYLTYPE loc); 3353 } 3354 3355 ... 3356 3357 Now Bison will insert `#include "ptypes.h"' and the new `YYLTYPE' 3358 definition before the Bison-generated `YYSTYPE' and `YYLTYPE' 3359 definitions in both the parser implementation file and the parser 3360 header file. (By the same reasoning, `%code requires' would also be 3361 the appropriate place to write your own definition for `YYSTYPE'.) 3362 3363 When you are writing dependency code for `YYSTYPE' and `YYLTYPE', 3364 you should prefer `%code requires' over `%code top' regardless of 3365 whether you instruct Bison to generate a parser header file. When you 3366 are writing code that you need Bison to insert only into the parser 3367 implementation file and that has no special need to appear at the top 3368 of that file, you should prefer the unqualified `%code' over `%code 3369 top'. These practices will make the purpose of each block of your code 3370 explicit to Bison and to other developers reading your grammar file. 3371 Following these practices, we expect the unqualified `%code' and `%code 3372 requires' to be the most important of the four PROLOGUE alternatives. 3373 3374 At some point while developing your parser, you might decide to 3375 provide `trace_token' to modules that are external to your parser. 3376 Thus, you might wish for Bison to insert the prototype into both the 3377 parser header file and the parser implementation file. Since this 3378 function is not a dependency required by `YYSTYPE' or `YYLTYPE', it 3379 doesn't make sense to move its prototype to a `%code requires'. More 3380 importantly, since it depends upon `YYLTYPE' and `yytokentype', `%code 3381 requires' is not sufficient. Instead, move its prototype from the 3382 unqualified `%code' to a `%code provides': 3383 3384 %code top { 3385 #define _GNU_SOURCE 3386 #include <stdio.h> 3387 } 3388 3389 %code requires { 3390 #include "ptypes.h" 3391 } 3392 %union { 3393 long int n; 3394 tree t; /* `tree' is defined in `ptypes.h'. */ 3395 } 3396 3397 %code requires { 3398 #define YYLTYPE YYLTYPE 3399 typedef struct YYLTYPE 3400 { 3401 int first_line; 3402 int first_column; 3403 int last_line; 3404 int last_column; 3405 char *filename; 3406 } YYLTYPE; 3407 } 3408 3409 %code provides { 3410 void trace_token (enum yytokentype token, YYLTYPE loc); 3411 } 3412 3413 %code { 3414 static void print_token_value (FILE *, int, YYSTYPE); 3415 #define YYPRINT(F, N, L) print_token_value (F, N, L) 3416 } 3417 3418 ... 3419 3420 Bison will insert the `trace_token' prototype into both the parser 3421 header file and the parser implementation file after the definitions 3422 for `yytokentype', `YYLTYPE', and `YYSTYPE'. 3423 3424 The above examples are careful to write directives in an order that 3425 reflects the layout of the generated parser implementation and header 3426 files: `%code top', `%code requires', `%code provides', and then 3427 `%code'. While your grammar files may generally be easier to read if 3428 you also follow this order, Bison does not require it. Instead, Bison 3429 lets you choose an organization that makes sense to you. 3430 3431 You may declare any of these directives multiple times in the 3432 grammar file. In that case, Bison concatenates the contained code in 3433 declaration order. This is the only way in which the position of one 3434 of these directives within the grammar file affects its functionality. 3435 3436 The result of the previous two properties is greater flexibility in 3437 how you may organize your grammar file. For example, you may organize 3438 semantic-type-related directives by semantic type: 3439 3440 %code requires { #include "type1.h" } 3441 %union { type1 field1; } 3442 %destructor { type1_free ($$); } <field1> 3443 %printer { type1_print (yyoutput, $$); } <field1> 3444 3445 %code requires { #include "type2.h" } 3446 %union { type2 field2; } 3447 %destructor { type2_free ($$); } <field2> 3448 %printer { type2_print (yyoutput, $$); } <field2> 3449 3450 You could even place each of the above directive groups in the rules 3451 section of the grammar file next to the set of rules that uses the 3452 associated semantic type. (In the rules section, you must terminate 3453 each of those directives with a semicolon.) And you don't have to 3454 worry that some directive (like a `%union') in the definitions section 3455 is going to adversely affect their functionality in some 3456 counter-intuitive manner just because it comes first. Such an 3457 organization is not possible using PROLOGUE sections. 3458 3459 This section has been concerned with explaining the advantages of 3460 the four PROLOGUE alternatives over the original Yacc PROLOGUE. 3461 However, in most cases when using these directives, you shouldn't need 3462 to think about all the low-level ordering issues discussed here. 3463 Instead, you should simply use these directives to label each block of 3464 your code according to its purpose and let Bison handle the ordering. 3465 `%code' is the most generic label. Move code to `%code requires', 3466 `%code provides', or `%code top' as needed. 3467 3468 3469 File: bison.info, Node: Bison Declarations, Next: Grammar Rules, Prev: Prologue Alternatives, Up: Grammar Outline 3470 3471 3.1.3 The Bison Declarations Section 3472 ------------------------------------ 3473 3474 The BISON DECLARATIONS section contains declarations that define 3475 terminal and nonterminal symbols, specify precedence, and so on. In 3476 some simple grammars you may not need any declarations. *Note Bison 3477 Declarations: Declarations. 3478 3479 3480 File: bison.info, Node: Grammar Rules, Next: Epilogue, Prev: Bison Declarations, Up: Grammar Outline 3481 3482 3.1.4 The Grammar Rules Section 3483 ------------------------------- 3484 3485 The "grammar rules" section contains one or more Bison grammar rules, 3486 and nothing else. *Note Syntax of Grammar Rules: Rules. 3487 3488 There must always be at least one grammar rule, and the first `%%' 3489 (which precedes the grammar rules) may never be omitted even if it is 3490 the first thing in the file. 3491 3492 3493 File: bison.info, Node: Epilogue, Prev: Grammar Rules, Up: Grammar Outline 3494 3495 3.1.5 The epilogue 3496 ------------------ 3497 3498 The EPILOGUE is copied verbatim to the end of the parser implementation 3499 file, just as the PROLOGUE is copied to the beginning. This is the 3500 most convenient place to put anything that you want to have in the 3501 parser implementation file but which need not come before the 3502 definition of `yyparse'. For example, the definitions of `yylex' and 3503 `yyerror' often go here. Because C requires functions to be declared 3504 before being used, you often need to declare functions like `yylex' and 3505 `yyerror' in the Prologue, even if you define them in the Epilogue. 3506 *Note Parser C-Language Interface: Interface. 3507 3508 If the last section is empty, you may omit the `%%' that separates it 3509 from the grammar rules. 3510 3511 The Bison parser itself contains many macros and identifiers whose 3512 names start with `yy' or `YY', so it is a good idea to avoid using any 3513 such names (except those documented in this manual) in the epilogue of 3514 the grammar file. 3515 3516 3517 File: bison.info, Node: Symbols, Next: Rules, Prev: Grammar Outline, Up: Grammar File 3518 3519 3.2 Symbols, Terminal and Nonterminal 3520 ===================================== 3521 3522 "Symbols" in Bison grammars represent the grammatical classifications 3523 of the language. 3524 3525 A "terminal symbol" (also known as a "token type") represents a 3526 class of syntactically equivalent tokens. You use the symbol in grammar 3527 rules to mean that a token in that class is allowed. The symbol is 3528 represented in the Bison parser by a numeric code, and the `yylex' 3529 function returns a token type code to indicate what kind of token has 3530 been read. You don't need to know what the code value is; you can use 3531 the symbol to stand for it. 3532 3533 A "nonterminal symbol" stands for a class of syntactically 3534 equivalent groupings. The symbol name is used in writing grammar rules. 3535 By convention, it should be all lower case. 3536 3537 Symbol names can contain letters, underscores, periods, and 3538 non-initial digits and dashes. Dashes in symbol names are a GNU 3539 extension, incompatible with POSIX Yacc. Periods and dashes make 3540 symbol names less convenient to use with named references, which 3541 require brackets around such names (*note Named References::). 3542 Terminal symbols that contain periods or dashes make little sense: 3543 since they are not valid symbols (in most programming languages) they 3544 are not exported as token names. 3545 3546 There are three ways of writing terminal symbols in the grammar: 3547 3548 * A "named token type" is written with an identifier, like an 3549 identifier in C. By convention, it should be all upper case. Each 3550 such name must be defined with a Bison declaration such as 3551 `%token'. *Note Token Type Names: Token Decl. 3552 3553 * A "character token type" (or "literal character token") is written 3554 in the grammar using the same syntax used in C for character 3555 constants; for example, `'+'' is a character token type. A 3556 character token type doesn't need to be declared unless you need to 3557 specify its semantic value data type (*note Data Types of Semantic 3558 Values: Value Type.), associativity, or precedence (*note Operator 3559 Precedence: Precedence.). 3560 3561 By convention, a character token type is used only to represent a 3562 token that consists of that particular character. Thus, the token 3563 type `'+'' is used to represent the character `+' as a token. 3564 Nothing enforces this convention, but if you depart from it, your 3565 program will confuse other readers. 3566 3567 All the usual escape sequences used in character literals in C can 3568 be used in Bison as well, but you must not use the null character 3569 as a character literal because its numeric code, zero, signifies 3570 end-of-input (*note Calling Convention for `yylex': Calling 3571 Convention.). Also, unlike standard C, trigraphs have no special 3572 meaning in Bison character literals, nor is backslash-newline 3573 allowed. 3574 3575 * A "literal string token" is written like a C string constant; for 3576 example, `"<="' is a literal string token. A literal string token 3577 doesn't need to be declared unless you need to specify its semantic 3578 value data type (*note Value Type::), associativity, or precedence 3579 (*note Precedence::). 3580 3581 You can associate the literal string token with a symbolic name as 3582 an alias, using the `%token' declaration (*note Token 3583 Declarations: Token Decl.). If you don't do that, the lexical 3584 analyzer has to retrieve the token number for the literal string 3585 token from the `yytname' table (*note Calling Convention::). 3586 3587 *Warning*: literal string tokens do not work in Yacc. 3588 3589 By convention, a literal string token is used only to represent a 3590 token that consists of that particular string. Thus, you should 3591 use the token type `"<="' to represent the string `<=' as a token. 3592 Bison does not enforce this convention, but if you depart from it, 3593 people who read your program will be confused. 3594 3595 All the escape sequences used in string literals in C can be used 3596 in Bison as well, except that you must not use a null character 3597 within a string literal. Also, unlike Standard C, trigraphs have 3598 no special meaning in Bison string literals, nor is 3599 backslash-newline allowed. A literal string token must contain 3600 two or more characters; for a token containing just one character, 3601 use a character token (see above). 3602 3603 How you choose to write a terminal symbol has no effect on its 3604 grammatical meaning. That depends only on where it appears in rules and 3605 on when the parser function returns that symbol. 3606 3607 The value returned by `yylex' is always one of the terminal symbols, 3608 except that a zero or negative value signifies end-of-input. Whichever 3609 way you write the token type in the grammar rules, you write it the 3610 same way in the definition of `yylex'. The numeric code for a 3611 character token type is simply the positive numeric code of the 3612 character, so `yylex' can use the identical value to generate the 3613 requisite code, though you may need to convert it to `unsigned char' to 3614 avoid sign-extension on hosts where `char' is signed. Each named token 3615 type becomes a C macro in the parser implementation file, so `yylex' 3616 can use the name to stand for the code. (This is why periods don't 3617 make sense in terminal symbols.) *Note Calling Convention for `yylex': 3618 Calling Convention. 3619 3620 If `yylex' is defined in a separate file, you need to arrange for the 3621 token-type macro definitions to be available there. Use the `-d' 3622 option when you run Bison, so that it will write these macro definitions 3623 into a separate header file `NAME.tab.h' which you can include in the 3624 other source files that need it. *Note Invoking Bison: Invocation. 3625 3626 If you want to write a grammar that is portable to any Standard C 3627 host, you must use only nonnull character tokens taken from the basic 3628 execution character set of Standard C. This set consists of the ten 3629 digits, the 52 lower- and upper-case English letters, and the 3630 characters in the following C-language string: 3631 3632 "\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_{|}~" 3633 3634 The `yylex' function and Bison must use a consistent character set 3635 and encoding for character tokens. For example, if you run Bison in an 3636 ASCII environment, but then compile and run the resulting program in an 3637 environment that uses an incompatible character set like EBCDIC, the 3638 resulting program may not work because the tables generated by Bison 3639 will assume ASCII numeric values for character tokens. It is standard 3640 practice for software distributions to contain C source files that were 3641 generated by Bison in an ASCII environment, so installers on platforms 3642 that are incompatible with ASCII must rebuild those files before 3643 compiling them. 3644 3645 The symbol `error' is a terminal symbol reserved for error recovery 3646 (*note Error Recovery::); you shouldn't use it for any other purpose. 3647 In particular, `yylex' should never return this value. The default 3648 value of the error token is 256, unless you explicitly assigned 256 to 3649 one of your tokens with a `%token' declaration. 3650 3651 3652 File: bison.info, Node: Rules, Next: Recursion, Prev: Symbols, Up: Grammar File 3653 3654 3.3 Syntax of Grammar Rules 3655 =========================== 3656 3657 A Bison grammar rule has the following general form: 3658 3659 RESULT: COMPONENTS...; 3660 3661 where RESULT is the nonterminal symbol that this rule describes, and 3662 COMPONENTS are various terminal and nonterminal symbols that are put 3663 together by this rule (*note Symbols::). 3664 3665 For example, 3666 3667 exp: exp '+' exp; 3668 3669 says that two groupings of type `exp', with a `+' token in between, can 3670 be combined into a larger grouping of type `exp'. 3671 3672 White space in rules is significant only to separate symbols. You 3673 can add extra white space as you wish. 3674 3675 Scattered among the components can be ACTIONS that determine the 3676 semantics of the rule. An action looks like this: 3677 3678 {C STATEMENTS} 3679 3680 This is an example of "braced code", that is, C code surrounded by 3681 braces, much like a compound statement in C. Braced code can contain 3682 any sequence of C tokens, so long as its braces are balanced. Bison 3683 does not check the braced code for correctness directly; it merely 3684 copies the code to the parser implementation file, where the C compiler 3685 can check it. 3686 3687 Within braced code, the balanced-brace count is not affected by 3688 braces within comments, string literals, or character constants, but it 3689 is affected by the C digraphs `<%' and `%>' that represent braces. At 3690 the top level braced code must be terminated by `}' and not by a 3691 digraph. Bison does not look for trigraphs, so if braced code uses 3692 trigraphs you should ensure that they do not affect the nesting of 3693 braces or the boundaries of comments, string literals, or character 3694 constants. 3695 3696 Usually there is only one action and it follows the components. 3697 *Note Actions::. 3698 3699 Multiple rules for the same RESULT can be written separately or can 3700 be joined with the vertical-bar character `|' as follows: 3701 3702 RESULT: 3703 RULE1-COMPONENTS... 3704 | RULE2-COMPONENTS... 3705 ... 3706 ; 3707 3708 They are still considered distinct rules even when joined in this way. 3709 3710 If COMPONENTS in a rule is empty, it means that RESULT can match the 3711 empty string. For example, here is how to define a comma-separated 3712 sequence of zero or more `exp' groupings: 3713 3714 expseq: 3715 /* empty */ 3716 | expseq1 3717 ; 3718 3719 expseq1: 3720 exp 3721 | expseq1 ',' exp 3722 ; 3723 3724 It is customary to write a comment `/* empty */' in each rule with no 3725 components. 3726 3727 3728 File: bison.info, Node: Recursion, Next: Semantics, Prev: Rules, Up: Grammar File 3729 3730 3.4 Recursive Rules 3731 =================== 3732 3733 A rule is called "recursive" when its RESULT nonterminal appears also 3734 on its right hand side. Nearly all Bison grammars need to use 3735 recursion, because that is the only way to define a sequence of any 3736 number of a particular thing. Consider this recursive definition of a 3737 comma-separated sequence of one or more expressions: 3738 3739 expseq1: 3740 exp 3741 | expseq1 ',' exp 3742 ; 3743 3744 Since the recursive use of `expseq1' is the leftmost symbol in the 3745 right hand side, we call this "left recursion". By contrast, here the 3746 same construct is defined using "right recursion": 3747 3748 expseq1: 3749 exp 3750 | exp ',' expseq1 3751 ; 3752 3753 Any kind of sequence can be defined using either left recursion or right 3754 recursion, but you should always use left recursion, because it can 3755 parse a sequence of any number of elements with bounded stack space. 3756 Right recursion uses up space on the Bison stack in proportion to the 3757 number of elements in the sequence, because all the elements must be 3758 shifted onto the stack before the rule can be applied even once. *Note 3759 The Bison Parser Algorithm: Algorithm, for further explanation of this. 3760 3761 "Indirect" or "mutual" recursion occurs when the result of the rule 3762 does not appear directly on its right hand side, but does appear in 3763 rules for other nonterminals which do appear on its right hand side. 3764 3765 For example: 3766 3767 expr: 3768 primary 3769 | primary '+' primary 3770 ; 3771 3772 primary: 3773 constant 3774 | '(' expr ')' 3775 ; 3776 3777 defines two mutually-recursive nonterminals, since each refers to the 3778 other. 3779 3780 3781 File: bison.info, Node: Semantics, Next: Tracking Locations, Prev: Recursion, Up: Grammar File 3782 3783 3.5 Defining Language Semantics 3784 =============================== 3785 3786 The grammar rules for a language determine only the syntax. The 3787 semantics are determined by the semantic values associated with various 3788 tokens and groupings, and by the actions taken when various groupings 3789 are recognized. 3790 3791 For example, the calculator calculates properly because the value 3792 associated with each expression is the proper number; it adds properly 3793 because the action for the grouping `X + Y' is to add the numbers 3794 associated with X and Y. 3795 3796 * Menu: 3797 3798 * Value Type:: Specifying one data type for all semantic values. 3799 * Multiple Types:: Specifying several alternative data types. 3800 * Actions:: An action is the semantic definition of a grammar rule. 3801 * Action Types:: Specifying data types for actions to operate on. 3802 * Mid-Rule Actions:: Most actions go at the end of a rule. 3803 This says when, why and how to use the exceptional 3804 action in the middle of a rule. 3805 3806 3807 File: bison.info, Node: Value Type, Next: Multiple Types, Up: Semantics 3808 3809 3.5.1 Data Types of Semantic Values 3810 ----------------------------------- 3811 3812 In a simple program it may be sufficient to use the same data type for 3813 the semantic values of all language constructs. This was true in the 3814 RPN and infix calculator examples (*note Reverse Polish Notation 3815 Calculator: RPN Calc.). 3816 3817 Bison normally uses the type `int' for semantic values if your 3818 program uses the same data type for all language constructs. To 3819 specify some other type, define `YYSTYPE' as a macro, like this: 3820 3821 #define YYSTYPE double 3822 3823 `YYSTYPE''s replacement list should be a type name that does not 3824 contain parentheses or square brackets. This macro definition must go 3825 in the prologue of the grammar file (*note Outline of a Bison Grammar: 3826 Grammar Outline.). 3827 3828 3829 File: bison.info, Node: Multiple Types, Next: Actions, Prev: Value Type, Up: Semantics 3830 3831 3.5.2 More Than One Value Type 3832 ------------------------------ 3833 3834 In most programs, you will need different data types for different kinds 3835 of tokens and groupings. For example, a numeric constant may need type 3836 `int' or `long int', while a string constant needs type `char *', and 3837 an identifier might need a pointer to an entry in the symbol table. 3838 3839 To use more than one data type for semantic values in one parser, 3840 Bison requires you to do two things: 3841 3842 * Specify the entire collection of possible data types, either by 3843 using the `%union' Bison declaration (*note The Collection of 3844 Value Types: Union Decl.), or by using a `typedef' or a `#define' 3845 to define `YYSTYPE' to be a union type whose member names are the 3846 type tags. 3847 3848 * Choose one of those types for each symbol (terminal or 3849 nonterminal) for which semantic values are used. This is done for 3850 tokens with the `%token' Bison declaration (*note Token Type 3851 Names: Token Decl.) and for groupings with the `%type' Bison 3852 declaration (*note Nonterminal Symbols: Type Decl.). 3853 3854 3855 File: bison.info, Node: Actions, Next: Action Types, Prev: Multiple Types, Up: Semantics 3856 3857 3.5.3 Actions 3858 ------------- 3859 3860 An action accompanies a syntactic rule and contains C code to be 3861 executed each time an instance of that rule is recognized. The task of 3862 most actions is to compute a semantic value for the grouping built by 3863 the rule from the semantic values associated with tokens or smaller 3864 groupings. 3865 3866 An action consists of braced code containing C statements, and can be 3867 placed at any position in the rule; it is executed at that position. 3868 Most rules have just one action at the end of the rule, following all 3869 the components. Actions in the middle of a rule are tricky and used 3870 only for special purposes (*note Actions in Mid-Rule: Mid-Rule 3871 Actions.). 3872 3873 The C code in an action can refer to the semantic values of the 3874 components matched by the rule with the construct `$N', which stands 3875 for the value of the Nth component. The semantic value for the 3876 grouping being constructed is `$$'. In addition, the semantic values 3877 of symbols can be accessed with the named references construct `$NAME' 3878 or `$[NAME]'. Bison translates both of these constructs into 3879 expressions of the appropriate type when it copies the actions into the 3880 parser implementation file. `$$' (or `$NAME', when it stands for the 3881 current grouping) is translated to a modifiable lvalue, so it can be 3882 assigned to. 3883 3884 Here is a typical example: 3885 3886 exp: 3887 ... 3888 | exp '+' exp { $$ = $1 + $3; } 3889 3890 Or, in terms of named references: 3891 3892 exp[result]: 3893 ... 3894 | exp[left] '+' exp[right] { $result = $left + $right; } 3895 3896 This rule constructs an `exp' from two smaller `exp' groupings 3897 connected by a plus-sign token. In the action, `$1' and `$3' (`$left' 3898 and `$right') refer to the semantic values of the two component `exp' 3899 groupings, which are the first and third symbols on the right hand side 3900 of the rule. The sum is stored into `$$' (`$result') so that it 3901 becomes the semantic value of the addition-expression just recognized 3902 by the rule. If there were a useful semantic value associated with the 3903 `+' token, it could be referred to as `$2'. 3904 3905 *Note Named References::, for more information about using the named 3906 references construct. 3907 3908 Note that the vertical-bar character `|' is really a rule separator, 3909 and actions are attached to a single rule. This is a difference with 3910 tools like Flex, for which `|' stands for either "or", or "the same 3911 action as that of the next rule". In the following example, the action 3912 is triggered only when `b' is found: 3913 3914 a-or-b: 'a'|'b' { a_or_b_found = 1; }; 3915 3916 If you don't specify an action for a rule, Bison supplies a default: 3917 `$$ = $1'. Thus, the value of the first symbol in the rule becomes the 3918 value of the whole rule. Of course, the default action is valid only 3919 if the two data types match. There is no meaningful default action for 3920 an empty rule; every empty rule must have an explicit action unless the 3921 rule's value does not matter. 3922 3923 `$N' with N zero or negative is allowed for reference to tokens and 3924 groupings on the stack _before_ those that match the current rule. 3925 This is a very risky practice, and to use it reliably you must be 3926 certain of the context in which the rule is applied. Here is a case in 3927 which you can use this reliably: 3928 3929 foo: 3930 expr bar '+' expr { ... } 3931 | expr bar '-' expr { ... } 3932 ; 3933 3934 bar: 3935 /* empty */ { previous_expr = $0; } 3936 ; 3937 3938 As long as `bar' is used only in the fashion shown here, `$0' always 3939 refers to the `expr' which precedes `bar' in the definition of `foo'. 3940 3941 It is also possible to access the semantic value of the lookahead 3942 token, if any, from a semantic action. This semantic value is stored 3943 in `yylval'. *Note Special Features for Use in Actions: Action 3944 Features. 3945 3946 3947 File: bison.info, Node: Action Types, Next: Mid-Rule Actions, Prev: Actions, Up: Semantics 3948 3949 3.5.4 Data Types of Values in Actions 3950 ------------------------------------- 3951 3952 If you have chosen a single data type for semantic values, the `$$' and 3953 `$N' constructs always have that data type. 3954 3955 If you have used `%union' to specify a variety of data types, then 3956 you must declare a choice among these types for each terminal or 3957 nonterminal symbol that can have a semantic value. Then each time you 3958 use `$$' or `$N', its data type is determined by which symbol it refers 3959 to in the rule. In this example, 3960 3961 exp: 3962 ... 3963 | exp '+' exp { $$ = $1 + $3; } 3964 3965 `$1' and `$3' refer to instances of `exp', so they all have the data 3966 type declared for the nonterminal symbol `exp'. If `$2' were used, it 3967 would have the data type declared for the terminal symbol `'+'', 3968 whatever that might be. 3969 3970 Alternatively, you can specify the data type when you refer to the 3971 value, by inserting `<TYPE>' after the `$' at the beginning of the 3972 reference. For example, if you have defined types as shown here: 3973 3974 %union { 3975 int itype; 3976 double dtype; 3977 } 3978 3979 then you can write `$<itype>1' to refer to the first subunit of the 3980 rule as an integer, or `$<dtype>1' to refer to it as a double. 3981 3982 3983 File: bison.info, Node: Mid-Rule Actions, Prev: Action Types, Up: Semantics 3984 3985 3.5.5 Actions in Mid-Rule 3986 ------------------------- 3987 3988 Occasionally it is useful to put an action in the middle of a rule. 3989 These actions are written just like usual end-of-rule actions, but they 3990 are executed before the parser even recognizes the following components. 3991 3992 * Menu: 3993 3994 * Using Mid-Rule Actions:: Putting an action in the middle of a rule. 3995 * Mid-Rule Action Translation:: How mid-rule actions are actually processed. 3996 * Mid-Rule Conflicts:: Mid-rule actions can cause conflicts. 3997 3998 3999 File: bison.info, Node: Using Mid-Rule Actions, Next: Mid-Rule Action Translation, Up: Mid-Rule Actions 4000 4001 3.5.5.1 Using Mid-Rule Actions 4002 .............................. 4003 4004 A mid-rule action may refer to the components preceding it using `$N', 4005 but it may not refer to subsequent components because it is run before 4006 they are parsed. 4007 4008 The mid-rule action itself counts as one of the components of the 4009 rule. This makes a difference when there is another action later in 4010 the same rule (and usually there is another at the end): you have to 4011 count the actions along with the symbols when working out which number 4012 N to use in `$N'. 4013 4014 The mid-rule action can also have a semantic value. The action can 4015 set its value with an assignment to `$$', and actions later in the rule 4016 can refer to the value using `$N'. Since there is no symbol to name 4017 the action, there is no way to declare a data type for the value in 4018 advance, so you must use the `$<...>N' construct to specify a data type 4019 each time you refer to this value. 4020 4021 There is no way to set the value of the entire rule with a mid-rule 4022 action, because assignments to `$$' do not have that effect. The only 4023 way to set the value for the entire rule is with an ordinary action at 4024 the end of the rule. 4025 4026 Here is an example from a hypothetical compiler, handling a `let' 4027 statement that looks like `let (VARIABLE) STATEMENT' and serves to 4028 create a variable named VARIABLE temporarily for the duration of 4029 STATEMENT. To parse this construct, we must put VARIABLE into the 4030 symbol table while STATEMENT is parsed, then remove it afterward. Here 4031 is how it is done: 4032 4033 stmt: 4034 "let" '(' var ')' 4035 { 4036 $<context>$ = push_context (); 4037 declare_variable ($3); 4038 } 4039 stmt 4040 { 4041 $$ = $6; 4042 pop_context ($<context>5); 4043 } 4044 4045 As soon as `let (VARIABLE)' has been recognized, the first action is 4046 run. It saves a copy of the current semantic context (the list of 4047 accessible variables) as its semantic value, using alternative 4048 `context' in the data-type union. Then it calls `declare_variable' to 4049 add the new variable to that list. Once the first action is finished, 4050 the embedded statement `stmt' can be parsed. 4051 4052 Note that the mid-rule action is component number 5, so the `stmt' is 4053 component number 6. Named references can be used to improve the 4054 readability and maintainability (*note Named References::): 4055 4056 stmt: 4057 "let" '(' var ')' 4058 { 4059 $<context>let = push_context (); 4060 declare_variable ($3); 4061 }[let] 4062 stmt 4063 { 4064 $$ = $6; 4065 pop_context ($<context>let); 4066 } 4067 4068 After the embedded statement is parsed, its semantic value becomes 4069 the value of the entire `let'-statement. Then the semantic value from 4070 the earlier action is used to restore the prior list of variables. This 4071 removes the temporary `let'-variable from the list so that it won't 4072 appear to exist while the rest of the program is parsed. 4073 4074 In the above example, if the parser initiates error recovery (*note 4075 Error Recovery::) while parsing the tokens in the embedded statement 4076 `stmt', it might discard the previous semantic context `$<context>5' 4077 without restoring it. Thus, `$<context>5' needs a destructor (*note 4078 Freeing Discarded Symbols: Destructor Decl.). However, Bison currently 4079 provides no means to declare a destructor specific to a particular 4080 mid-rule action's semantic value. 4081 4082 One solution is to bury the mid-rule action inside a nonterminal 4083 symbol and to declare a destructor for that symbol: 4084 4085 %type <context> let 4086 %destructor { pop_context ($$); } let 4087 4088 %% 4089 4090 stmt: 4091 let stmt 4092 { 4093 $$ = $2; 4094 pop_context ($let); 4095 }; 4096 4097 let: 4098 "let" '(' var ')' 4099 { 4100 $let = push_context (); 4101 declare_variable ($3); 4102 }; 4103 4104 Note that the action is now at the end of its rule. Any mid-rule 4105 action can be converted to an end-of-rule action in this way, and this 4106 is what Bison actually does to implement mid-rule actions. 4107 4108 4109 File: bison.info, Node: Mid-Rule Action Translation, Next: Mid-Rule Conflicts, Prev: Using Mid-Rule Actions, Up: Mid-Rule Actions 4110 4111 3.5.5.2 Mid-Rule Action Translation 4112 ................................... 4113 4114 As hinted earlier, mid-rule actions are actually transformed into 4115 regular rules and actions. The various reports generated by Bison 4116 (textual, graphical, etc., see *note Understanding Your Parser: 4117 Understanding.) reveal this translation, best explained by means of an 4118 example. The following rule: 4119 4120 exp: { a(); } "b" { c(); } { d(); } "e" { f(); }; 4121 4122 is translated into: 4123 4124 $@1: /* empty */ { a(); }; 4125 $@2: /* empty */ { c(); }; 4126 $@3: /* empty */ { d(); }; 4127 exp: $@1 "b" $@2 $@3 "e" { f(); }; 4128 4129 with new nonterminal symbols `$@N', where N is a number. 4130 4131 A mid-rule action is expected to generate a value if it uses `$$', or 4132 the (final) action uses `$N' where N denote the mid-rule action. In 4133 that case its nonterminal is rather named `@N': 4134 4135 exp: { a(); } "b" { $$ = c(); } { d(); } "e" { f = $1; }; 4136 4137 is translated into 4138 4139 @1: /* empty */ { a(); }; 4140 @2: /* empty */ { $$ = c(); }; 4141 $@3: /* empty */ { d(); }; 4142 exp: @1 "b" @2 $@3 "e" { f = $1; } 4143 4144 There are probably two errors in the above example: the first 4145 mid-rule action does not generate a value (it does not use `$$' 4146 although the final action uses it), and the value of the second one is 4147 not used (the final action does not use `$3'). Bison reports these 4148 errors when the `midrule-value' warnings are enabled (*note Invoking 4149 Bison: Invocation.): 4150 4151 $ bison -fcaret -Wmidrule-value mid.y 4152 mid.y:2.6-13: warning: unset value: $$ 4153 exp: { a(); } "b" { $$ = c(); } { d(); } "e" { f = $1; }; 4154 ^^^^^^^^ 4155 mid.y:2.19-31: warning: unused value: $3 4156 exp: { a(); } "b" { $$ = c(); } { d(); } "e" { f = $1; }; 4157 ^^^^^^^^^^^^^ 4158 4159 4160 File: bison.info, Node: Mid-Rule Conflicts, Prev: Mid-Rule Action Translation, Up: Mid-Rule Actions 4161 4162 3.5.5.3 Conflicts due to Mid-Rule Actions 4163 ......................................... 4164 4165 Taking action before a rule is completely recognized often leads to 4166 conflicts since the parser must commit to a parse in order to execute 4167 the action. For example, the following two rules, without mid-rule 4168 actions, can coexist in a working parser because the parser can shift 4169 the open-brace token and look at what follows before deciding whether 4170 there is a declaration or not: 4171 4172 compound: 4173 '{' declarations statements '}' 4174 | '{' statements '}' 4175 ; 4176 4177 But when we add a mid-rule action as follows, the rules become 4178 nonfunctional: 4179 4180 compound: 4181 { prepare_for_local_variables (); } 4182 '{' declarations statements '}' 4183 | '{' statements '}' 4184 ; 4185 4186 Now the parser is forced to decide whether to run the mid-rule action 4187 when it has read no farther than the open-brace. In other words, it 4188 must commit to using one rule or the other, without sufficient 4189 information to do it correctly. (The open-brace token is what is called 4190 the "lookahead" token at this time, since the parser is still deciding 4191 what to do about it. *Note Lookahead Tokens: Lookahead.) 4192 4193 You might think that you could correct the problem by putting 4194 identical actions into the two rules, like this: 4195 4196 compound: 4197 { prepare_for_local_variables (); } 4198 '{' declarations statements '}' 4199 | { prepare_for_local_variables (); } 4200 '{' statements '}' 4201 ; 4202 4203 But this does not help, because Bison does not realize that the two 4204 actions are identical. (Bison never tries to understand the C code in 4205 an action.) 4206 4207 If the grammar is such that a declaration can be distinguished from a 4208 statement by the first token (which is true in C), then one solution 4209 which does work is to put the action after the open-brace, like this: 4210 4211 compound: 4212 '{' { prepare_for_local_variables (); } 4213 declarations statements '}' 4214 | '{' statements '}' 4215 ; 4216 4217 Now the first token of the following declaration or statement, which 4218 would in any case tell Bison which rule to use, can still do so. 4219 4220 Another solution is to bury the action inside a nonterminal symbol 4221 which serves as a subroutine: 4222 4223 subroutine: 4224 /* empty */ { prepare_for_local_variables (); } 4225 ; 4226 4227 compound: 4228 subroutine '{' declarations statements '}' 4229 | subroutine '{' statements '}' 4230 ; 4231 4232 Now Bison can execute the action in the rule for `subroutine' without 4233 deciding which rule for `compound' it will eventually use. 4234 4235 4236 File: bison.info, Node: Tracking Locations, Next: Named References, Prev: Semantics, Up: Grammar File 4237 4238 3.6 Tracking Locations 4239 ====================== 4240 4241 Though grammar rules and semantic actions are enough to write a fully 4242 functional parser, it can be useful to process some additional 4243 information, especially symbol locations. 4244 4245 The way locations are handled is defined by providing a data type, 4246 and actions to take when rules are matched. 4247 4248 * Menu: 4249 4250 * Location Type:: Specifying a data type for locations. 4251 * Actions and Locations:: Using locations in actions. 4252 * Location Default Action:: Defining a general way to compute locations. 4253 4254 4255 File: bison.info, Node: Location Type, Next: Actions and Locations, Up: Tracking Locations 4256 4257 3.6.1 Data Type of Locations 4258 ---------------------------- 4259 4260 Defining a data type for locations is much simpler than for semantic 4261 values, since all tokens and groupings always use the same type. 4262 4263 You can specify the type of locations by defining a macro called 4264 `YYLTYPE', just as you can specify the semantic value type by defining 4265 a `YYSTYPE' macro (*note Value Type::). When `YYLTYPE' is not defined, 4266 Bison uses a default structure type with four members: 4267 4268 typedef struct YYLTYPE 4269 { 4270 int first_line; 4271 int first_column; 4272 int last_line; 4273 int last_column; 4274 } YYLTYPE; 4275 4276 When `YYLTYPE' is not defined, at the beginning of the parsing, Bison 4277 initializes all these fields to 1 for `yylloc'. To initialize `yylloc' 4278 with a custom location type (or to chose a different initialization), 4279 use the `%initial-action' directive. *Note Performing Actions before 4280 Parsing: Initial Action Decl. 4281 4282 4283 File: bison.info, Node: Actions and Locations, Next: Location Default Action, Prev: Location Type, Up: Tracking Locations 4284 4285 3.6.2 Actions and Locations 4286 --------------------------- 4287 4288 Actions are not only useful for defining language semantics, but also 4289 for describing the behavior of the output parser with locations. 4290 4291 The most obvious way for building locations of syntactic groupings 4292 is very similar to the way semantic values are computed. In a given 4293 rule, several constructs can be used to access the locations of the 4294 elements being matched. The location of the Nth component of the right 4295 hand side is `@N', while the location of the left hand side grouping is 4296 `@$'. 4297 4298 In addition, the named references construct `@NAME' and `@[NAME]' 4299 may also be used to address the symbol locations. *Note Named 4300 References::, for more information about using the named references 4301 construct. 4302 4303 Here is a basic example using the default data type for locations: 4304 4305 exp: 4306 ... 4307 | exp '/' exp 4308 { 4309 @$.first_column = @1.first_column; 4310 @$.first_line = @1.first_line; 4311 @$.last_column = @3.last_column; 4312 @$.last_line = @3.last_line; 4313 if ($3) 4314 $$ = $1 / $3; 4315 else 4316 { 4317 $$ = 1; 4318 fprintf (stderr, 4319 "Division by zero, l%d,c%d-l%d,c%d", 4320 @3.first_line, @3.first_column, 4321 @3.last_line, @3.last_column); 4322 } 4323 } 4324 4325 As for semantic values, there is a default action for locations that 4326 is run each time a rule is matched. It sets the beginning of `@$' to 4327 the beginning of the first symbol, and the end of `@$' to the end of the 4328 last symbol. 4329 4330 With this default action, the location tracking can be fully 4331 automatic. The example above simply rewrites this way: 4332 4333 exp: 4334 ... 4335 | exp '/' exp 4336 { 4337 if ($3) 4338 $$ = $1 / $3; 4339 else 4340 { 4341 $$ = 1; 4342 fprintf (stderr, 4343 "Division by zero, l%d,c%d-l%d,c%d", 4344 @3.first_line, @3.first_column, 4345 @3.last_line, @3.last_column); 4346 } 4347 } 4348 4349 It is also possible to access the location of the lookahead token, 4350 if any, from a semantic action. This location is stored in `yylloc'. 4351 *Note Special Features for Use in Actions: Action Features. 4352 4353 4354 File: bison.info, Node: Location Default Action, Prev: Actions and Locations, Up: Tracking Locations 4355 4356 3.6.3 Default Action for Locations 4357 ---------------------------------- 4358 4359 Actually, actions are not the best place to compute locations. Since 4360 locations are much more general than semantic values, there is room in 4361 the output parser to redefine the default action to take for each rule. 4362 The `YYLLOC_DEFAULT' macro is invoked each time a rule is matched, 4363 before the associated action is run. It is also invoked while 4364 processing a syntax error, to compute the error's location. Before 4365 reporting an unresolvable syntactic ambiguity, a GLR parser invokes 4366 `YYLLOC_DEFAULT' recursively to compute the location of that ambiguity. 4367 4368 Most of the time, this macro is general enough to suppress location 4369 dedicated code from semantic actions. 4370 4371 The `YYLLOC_DEFAULT' macro takes three parameters. The first one is 4372 the location of the grouping (the result of the computation). When a 4373 rule is matched, the second parameter identifies locations of all right 4374 hand side elements of the rule being matched, and the third parameter 4375 is the size of the rule's right hand side. When a GLR parser reports 4376 an ambiguity, which of multiple candidate right hand sides it passes to 4377 `YYLLOC_DEFAULT' is undefined. When processing a syntax error, the 4378 second parameter identifies locations of the symbols that were 4379 discarded during error processing, and the third parameter is the 4380 number of discarded symbols. 4381 4382 By default, `YYLLOC_DEFAULT' is defined this way: 4383 4384 # define YYLLOC_DEFAULT(Cur, Rhs, N) \ 4385 do \ 4386 if (N) \ 4387 { \ 4388 (Cur).first_line = YYRHSLOC(Rhs, 1).first_line; \ 4389 (Cur).first_column = YYRHSLOC(Rhs, 1).first_column; \ 4390 (Cur).last_line = YYRHSLOC(Rhs, N).last_line; \ 4391 (Cur).last_column = YYRHSLOC(Rhs, N).last_column; \ 4392 } \ 4393 else \ 4394 { \ 4395 (Cur).first_line = (Cur).last_line = \ 4396 YYRHSLOC(Rhs, 0).last_line; \ 4397 (Cur).first_column = (Cur).last_column = \ 4398 YYRHSLOC(Rhs, 0).last_column; \ 4399 } \ 4400 while (0) 4401 4402 where `YYRHSLOC (rhs, k)' is the location of the Kth symbol in RHS when 4403 K is positive, and the location of the symbol just before the reduction 4404 when K and N are both zero. 4405 4406 When defining `YYLLOC_DEFAULT', you should consider that: 4407 4408 * All arguments are free of side-effects. However, only the first 4409 one (the result) should be modified by `YYLLOC_DEFAULT'. 4410 4411 * For consistency with semantic actions, valid indexes within the 4412 right hand side range from 1 to N. When N is zero, only 0 is a 4413 valid index, and it refers to the symbol just before the reduction. 4414 During error processing N is always positive. 4415 4416 * Your macro should parenthesize its arguments, if need be, since the 4417 actual arguments may not be surrounded by parentheses. Also, your 4418 macro should expand to something that can be used as a single 4419 statement when it is followed by a semicolon. 4420 4421 4422 File: bison.info, Node: Named References, Next: Declarations, Prev: Tracking Locations, Up: Grammar File 4423 4424 3.7 Named References 4425 ==================== 4426 4427 As described in the preceding sections, the traditional way to refer to 4428 any semantic value or location is a "positional reference", which takes 4429 the form `$N', `$$', `@N', and `@$'. However, such a reference is not 4430 very descriptive. Moreover, if you later decide to insert or remove 4431 symbols in the right-hand side of a grammar rule, the need to renumber 4432 such references can be tedious and error-prone. 4433 4434 To avoid these issues, you can also refer to a semantic value or 4435 location using a "named reference". First of all, original symbol 4436 names may be used as named references. For example: 4437 4438 invocation: op '(' args ')' 4439 { $invocation = new_invocation ($op, $args, @invocation); } 4440 4441 Positional and named references can be mixed arbitrarily. For example: 4442 4443 invocation: op '(' args ')' 4444 { $$ = new_invocation ($op, $args, @$); } 4445 4446 However, sometimes regular symbol names are not sufficient due to 4447 ambiguities: 4448 4449 exp: exp '/' exp 4450 { $exp = $exp / $exp; } // $exp is ambiguous. 4451 4452 exp: exp '/' exp 4453 { $$ = $1 / $exp; } // One usage is ambiguous. 4454 4455 exp: exp '/' exp 4456 { $$ = $1 / $3; } // No error. 4457 4458 When ambiguity occurs, explicitly declared names may be used for values 4459 and locations. Explicit names are declared as a bracketed name after a 4460 symbol appearance in rule definitions. For example: 4461 exp[result]: exp[left] '/' exp[right] 4462 { $result = $left / $right; } 4463 4464 In order to access a semantic value generated by a mid-rule action, an 4465 explicit name may also be declared by putting a bracketed name after the 4466 closing brace of the mid-rule action code: 4467 exp[res]: exp[x] '+' {$left = $x;}[left] exp[right] 4468 { $res = $left + $right; } 4469 4470 In references, in order to specify names containing dots and dashes, an 4471 explicit bracketed syntax `$[name]' and `@[name]' must be used: 4472 if-stmt: "if" '(' expr ')' "then" then.stmt ';' 4473 { $[if-stmt] = new_if_stmt ($expr, $[then.stmt]); } 4474 4475 It often happens that named references are followed by a dot, dash 4476 or other C punctuation marks and operators. By default, Bison will read 4477 `$name.suffix' as a reference to symbol value `$name' followed by 4478 `.suffix', i.e., an access to the `suffix' field of the semantic value. 4479 In order to force Bison to recognize `name.suffix' in its entirety as 4480 the name of a semantic value, the bracketed syntax `$[name.suffix]' 4481 must be used. 4482 4483 The named references feature is experimental. More user feedback 4484 will help to stabilize it. 4485 4486 4487 File: bison.info, Node: Declarations, Next: Multiple Parsers, Prev: Named References, Up: Grammar File 4488 4489 3.8 Bison Declarations 4490 ====================== 4491 4492 The "Bison declarations" section of a Bison grammar defines the symbols 4493 used in formulating the grammar and the data types of semantic values. 4494 *Note Symbols::. 4495 4496 All token type names (but not single-character literal tokens such as 4497 `'+'' and `'*'') must be declared. Nonterminal symbols must be 4498 declared if you need to specify which data type to use for the semantic 4499 value (*note More Than One Value Type: Multiple Types.). 4500 4501 The first rule in the grammar file also specifies the start symbol, 4502 by default. If you want some other symbol to be the start symbol, you 4503 must declare it explicitly (*note Languages and Context-Free Grammars: 4504 Language and Grammar.). 4505 4506 * Menu: 4507 4508 * Require Decl:: Requiring a Bison version. 4509 * Token Decl:: Declaring terminal symbols. 4510 * Precedence Decl:: Declaring terminals with precedence and associativity. 4511 * Union Decl:: Declaring the set of all semantic value types. 4512 * Type Decl:: Declaring the choice of type for a nonterminal symbol. 4513 * Initial Action Decl:: Code run before parsing starts. 4514 * Destructor Decl:: Declaring how symbols are freed. 4515 * Printer Decl:: Declaring how symbol values are displayed. 4516 * Expect Decl:: Suppressing warnings about parsing conflicts. 4517 * Start Decl:: Specifying the start symbol. 4518 * Pure Decl:: Requesting a reentrant parser. 4519 * Push Decl:: Requesting a push parser. 4520 * Decl Summary:: Table of all Bison declarations. 4521 * %define Summary:: Defining variables to adjust Bison's behavior. 4522 * %code Summary:: Inserting code into the parser source. 4523 4524 4525 File: bison.info, Node: Require Decl, Next: Token Decl, Up: Declarations 4526 4527 3.8.1 Require a Version of Bison 4528 -------------------------------- 4529 4530 You may require the minimum version of Bison to process the grammar. If 4531 the requirement is not met, `bison' exits with an error (exit status 4532 63). 4533 4534 %require "VERSION" 4535 4536 4537 File: bison.info, Node: Token Decl, Next: Precedence Decl, Prev: Require Decl, Up: Declarations 4538 4539 3.8.2 Token Type Names 4540 ---------------------- 4541 4542 The basic way to declare a token type name (terminal symbol) is as 4543 follows: 4544 4545 %token NAME 4546 4547 Bison will convert this into a `#define' directive in the parser, so 4548 that the function `yylex' (if it is in this file) can use the name NAME 4549 to stand for this token type's code. 4550 4551 Alternatively, you can use `%left', `%right', or `%nonassoc' instead 4552 of `%token', if you wish to specify associativity and precedence. 4553 *Note Operator Precedence: Precedence Decl. 4554 4555 You can explicitly specify the numeric code for a token type by 4556 appending a nonnegative decimal or hexadecimal integer value in the 4557 field immediately following the token name: 4558 4559 %token NUM 300 4560 %token XNUM 0x12d // a GNU extension 4561 4562 It is generally best, however, to let Bison choose the numeric codes for 4563 all token types. Bison will automatically select codes that don't 4564 conflict with each other or with normal characters. 4565 4566 In the event that the stack type is a union, you must augment the 4567 `%token' or other token declaration to include the data type 4568 alternative delimited by angle-brackets (*note More Than One Value 4569 Type: Multiple Types.). 4570 4571 For example: 4572 4573 %union { /* define stack type */ 4574 double val; 4575 symrec *tptr; 4576 } 4577 %token <val> NUM /* define token NUM and its type */ 4578 4579 You can associate a literal string token with a token type name by 4580 writing the literal string at the end of a `%token' declaration which 4581 declares the name. For example: 4582 4583 %token arrow "=>" 4584 4585 For example, a grammar for the C language might specify these names with 4586 equivalent literal string tokens: 4587 4588 %token <operator> OR "||" 4589 %token <operator> LE 134 "<=" 4590 %left OR "<=" 4591 4592 Once you equate the literal string and the token name, you can use them 4593 interchangeably in further declarations or the grammar rules. The 4594 `yylex' function can use the token name or the literal string to obtain 4595 the token type code number (*note Calling Convention::). Syntax error 4596 messages passed to `yyerror' from the parser will reference the literal 4597 string instead of the token name. 4598 4599 The token numbered as 0 corresponds to end of file; the following 4600 line allows for nicer error messages referring to "end of file" instead 4601 of "$end": 4602 4603 %token END 0 "end of file" 4604 4605 4606 File: bison.info, Node: Precedence Decl, Next: Union Decl, Prev: Token Decl, Up: Declarations 4607 4608 3.8.3 Operator Precedence 4609 ------------------------- 4610 4611 Use the `%left', `%right' or `%nonassoc' declaration to declare a token 4612 and specify its precedence and associativity, all at once. These are 4613 called "precedence declarations". *Note Operator Precedence: 4614 Precedence, for general information on operator precedence. 4615 4616 The syntax of a precedence declaration is nearly the same as that of 4617 `%token': either 4618 4619 %left SYMBOLS... 4620 4621 or 4622 4623 %left <TYPE> SYMBOLS... 4624 4625 And indeed any of these declarations serves the purposes of `%token'. 4626 But in addition, they specify the associativity and relative precedence 4627 for all the SYMBOLS: 4628 4629 * The associativity of an operator OP determines how repeated uses 4630 of the operator nest: whether `X OP Y OP Z' is parsed by grouping 4631 X with Y first or by grouping Y with Z first. `%left' specifies 4632 left-associativity (grouping X with Y first) and `%right' 4633 specifies right-associativity (grouping Y with Z first). 4634 `%nonassoc' specifies no associativity, which means that `X OP Y 4635 OP Z' is considered a syntax error. 4636 4637 * The precedence of an operator determines how it nests with other 4638 operators. All the tokens declared in a single precedence 4639 declaration have equal precedence and nest together according to 4640 their associativity. When two tokens declared in different 4641 precedence declarations associate, the one declared later has the 4642 higher precedence and is grouped first. 4643 4644 For backward compatibility, there is a confusing difference between 4645 the argument lists of `%token' and precedence declarations. Only a 4646 `%token' can associate a literal string with a token type name. A 4647 precedence declaration always interprets a literal string as a 4648 reference to a separate token. For example: 4649 4650 %left OR "<=" // Does not declare an alias. 4651 %left OR 134 "<=" 135 // Declares 134 for OR and 135 for "<=". 4652 4653 4654 File: bison.info, Node: Union Decl, Next: Type Decl, Prev: Precedence Decl, Up: Declarations 4655 4656 3.8.4 The Collection of Value Types 4657 ----------------------------------- 4658 4659 The `%union' declaration specifies the entire collection of possible 4660 data types for semantic values. The keyword `%union' is followed by 4661 braced code containing the same thing that goes inside a `union' in C. 4662 4663 For example: 4664 4665 %union { 4666 double val; 4667 symrec *tptr; 4668 } 4669 4670 This says that the two alternative types are `double' and `symrec *'. 4671 They are given names `val' and `tptr'; these names are used in the 4672 `%token' and `%type' declarations to pick one of the types for a 4673 terminal or nonterminal symbol (*note Nonterminal Symbols: Type Decl.). 4674 4675 As an extension to POSIX, a tag is allowed after the `union'. For 4676 example: 4677 4678 %union value { 4679 double val; 4680 symrec *tptr; 4681 } 4682 4683 specifies the union tag `value', so the corresponding C type is `union 4684 value'. If you do not specify a tag, it defaults to `YYSTYPE'. 4685 4686 As another extension to POSIX, you may specify multiple `%union' 4687 declarations; their contents are concatenated. However, only the first 4688 `%union' declaration can specify a tag. 4689 4690 Note that, unlike making a `union' declaration in C, you need not 4691 write a semicolon after the closing brace. 4692 4693 Instead of `%union', you can define and use your own union type 4694 `YYSTYPE' if your grammar contains at least one `<TYPE>' tag. For 4695 example, you can put the following into a header file `parser.h': 4696 4697 union YYSTYPE { 4698 double val; 4699 symrec *tptr; 4700 }; 4701 typedef union YYSTYPE YYSTYPE; 4702 4703 and then your grammar can use the following instead of `%union': 4704 4705 %{ 4706 #include "parser.h" 4707 %} 4708 %type <val> expr 4709 %token <tptr> ID 4710 4711 4712 File: bison.info, Node: Type Decl, Next: Initial Action Decl, Prev: Union Decl, Up: Declarations 4713 4714 3.8.5 Nonterminal Symbols 4715 ------------------------- 4716 4717 When you use `%union' to specify multiple value types, you must declare 4718 the value type of each nonterminal symbol for which values are used. 4719 This is done with a `%type' declaration, like this: 4720 4721 %type <TYPE> NONTERMINAL... 4722 4723 Here NONTERMINAL is the name of a nonterminal symbol, and TYPE is the 4724 name given in the `%union' to the alternative that you want (*note The 4725 Collection of Value Types: Union Decl.). You can give any number of 4726 nonterminal symbols in the same `%type' declaration, if they have the 4727 same value type. Use spaces to separate the symbol names. 4728 4729 You can also declare the value type of a terminal symbol. To do 4730 this, use the same `<TYPE>' construction in a declaration for the 4731 terminal symbol. All kinds of token declarations allow `<TYPE>'. 4732 4733 4734 File: bison.info, Node: Initial Action Decl, Next: Destructor Decl, Prev: Type Decl, Up: Declarations 4735 4736 3.8.6 Performing Actions before Parsing 4737 --------------------------------------- 4738 4739 Sometimes your parser needs to perform some initializations before 4740 parsing. The `%initial-action' directive allows for such arbitrary 4741 code. 4742 4743 -- Directive: %initial-action { CODE } 4744 Declare that the braced CODE must be invoked before parsing each 4745 time `yyparse' is called. The CODE may use `$$' (or `$<TAG>$') 4746 and `@$' -- initial value and location of the lookahead -- and the 4747 `%parse-param'. 4748 4749 For instance, if your locations use a file name, you may use 4750 4751 %parse-param { char const *file_name }; 4752 %initial-action 4753 { 4754 @$.initialize (file_name); 4755 }; 4756 4757 4758 File: bison.info, Node: Destructor Decl, Next: Printer Decl, Prev: Initial Action Decl, Up: Declarations 4759 4760 3.8.7 Freeing Discarded Symbols 4761 ------------------------------- 4762 4763 During error recovery (*note Error Recovery::), symbols already pushed 4764 on the stack and tokens coming from the rest of the file are discarded 4765 until the parser falls on its feet. If the parser runs out of memory, 4766 or if it returns via `YYABORT' or `YYACCEPT', all the symbols on the 4767 stack must be discarded. Even if the parser succeeds, it must discard 4768 the start symbol. 4769 4770 When discarded symbols convey heap based information, this memory is 4771 lost. While this behavior can be tolerable for batch parsers, such as 4772 in traditional compilers, it is unacceptable for programs like shells or 4773 protocol implementations that may parse and execute indefinitely. 4774 4775 The `%destructor' directive defines code that is called when a 4776 symbol is automatically discarded. 4777 4778 -- Directive: %destructor { CODE } SYMBOLS 4779 Invoke the braced CODE whenever the parser discards one of the 4780 SYMBOLS. Within CODE, `$$' (or `$<TAG>$') designates the semantic 4781 value associated with the discarded symbol, and `@$' designates 4782 its location. The additional parser parameters are also available 4783 (*note The Parser Function `yyparse': Parser Function.). 4784 4785 When a symbol is listed among SYMBOLS, its `%destructor' is called 4786 a per-symbol `%destructor'. You may also define a per-type 4787 `%destructor' by listing a semantic type tag among SYMBOLS. In 4788 that case, the parser will invoke this CODE whenever it discards 4789 any grammar symbol that has that semantic type tag unless that 4790 symbol has its own per-symbol `%destructor'. 4791 4792 Finally, you can define two different kinds of default 4793 `%destructor's. (These default forms are experimental. More user 4794 feedback will help to determine whether they should become 4795 permanent features.) You can place each of `<*>' and `<>' in the 4796 SYMBOLS list of exactly one `%destructor' declaration in your 4797 grammar file. The parser will invoke the CODE associated with one 4798 of these whenever it discards any user-defined grammar symbol that 4799 has no per-symbol and no per-type `%destructor'. The parser uses 4800 the CODE for `<*>' in the case of such a grammar symbol for which 4801 you have formally declared a semantic type tag (`%type' counts as 4802 such a declaration, but `$<tag>$' does not). The parser uses the 4803 CODE for `<>' in the case of such a grammar symbol that has no 4804 declared semantic type tag. 4805 4806 For example: 4807 4808 %union { char *string; } 4809 %token <string> STRING1 4810 %token <string> STRING2 4811 %type <string> string1 4812 %type <string> string2 4813 %union { char character; } 4814 %token <character> CHR 4815 %type <character> chr 4816 %token TAGLESS 4817 4818 %destructor { } <character> 4819 %destructor { free ($$); } <*> 4820 %destructor { free ($$); printf ("%d", @$.first_line); } STRING1 string1 4821 %destructor { printf ("Discarding tagless symbol.\n"); } <> 4822 4823 guarantees that, when the parser discards any user-defined symbol that 4824 has a semantic type tag other than `<character>', it passes its 4825 semantic value to `free' by default. However, when the parser discards 4826 a `STRING1' or a `string1', it also prints its line number to `stdout'. 4827 It performs only the second `%destructor' in this case, so it invokes 4828 `free' only once. Finally, the parser merely prints a message whenever 4829 it discards any symbol, such as `TAGLESS', that has no semantic type 4830 tag. 4831 4832 A Bison-generated parser invokes the default `%destructor's only for 4833 user-defined as opposed to Bison-defined symbols. For example, the 4834 parser will not invoke either kind of default `%destructor' for the 4835 special Bison-defined symbols `$accept', `$undefined', or `$end' (*note 4836 Bison Symbols: Table of Symbols.), none of which you can reference in 4837 your grammar. It also will not invoke either for the `error' token 4838 (*note error: Table of Symbols.), which is always defined by Bison 4839 regardless of whether you reference it in your grammar. However, it 4840 may invoke one of them for the end token (token 0) if you redefine it 4841 from `$end' to, for example, `END': 4842 4843 %token END 0 4844 4845 Finally, Bison will never invoke a `%destructor' for an unreferenced 4846 mid-rule semantic value (*note Actions in Mid-Rule: Mid-Rule Actions.). 4847 That is, Bison does not consider a mid-rule to have a semantic value if 4848 you do not reference `$$' in the mid-rule's action or `$N' (where N is 4849 the right-hand side symbol position of the mid-rule) in any later 4850 action in that rule. However, if you do reference either, the 4851 Bison-generated parser will invoke the `<>' `%destructor' whenever it 4852 discards the mid-rule symbol. 4853 4854 4855 "Discarded symbols" are the following: 4856 4857 * stacked symbols popped during the first phase of error recovery, 4858 4859 * incoming terminals during the second phase of error recovery, 4860 4861 * the current lookahead and the entire stack (except the current 4862 right-hand side symbols) when the parser returns immediately, and 4863 4864 * the current lookahead and the entire stack (including the current 4865 right-hand side symbols) when the C++ parser (`lalr1.cc') catches 4866 an exception in `parse', 4867 4868 * the start symbol, when the parser succeeds. 4869 4870 The parser can "return immediately" because of an explicit call to 4871 `YYABORT' or `YYACCEPT', or failed error recovery, or memory exhaustion. 4872 4873 Right-hand side symbols of a rule that explicitly triggers a syntax 4874 error via `YYERROR' are not discarded automatically. As a rule of 4875 thumb, destructors are invoked only when user actions cannot manage the 4876 memory. 4877 4878 4879 File: bison.info, Node: Printer Decl, Next: Expect Decl, Prev: Destructor Decl, Up: Declarations 4880 4881 3.8.8 Printing Semantic Values 4882 ------------------------------ 4883 4884 When run-time traces are enabled (*note Tracing Your Parser: Tracing.), 4885 the parser reports its actions, such as reductions. When a symbol 4886 involved in an action is reported, only its kind is displayed, as the 4887 parser cannot know how semantic values should be formatted. 4888 4889 The `%printer' directive defines code that is called when a symbol is 4890 reported. Its syntax is the same as `%destructor' (*note Freeing 4891 Discarded Symbols: Destructor Decl.). 4892 4893 -- Directive: %printer { CODE } SYMBOLS 4894 Invoke the braced CODE whenever the parser displays one of the 4895 SYMBOLS. Within CODE, `yyoutput' denotes the output stream (a 4896 `FILE*' in C, and an `std::ostream&' in C++), `$$' (or `$<TAG>$') 4897 designates the semantic value associated with the symbol, and `@$' 4898 its location. The additional parser parameters are also available 4899 (*note The Parser Function `yyparse': Parser Function.). 4900 4901 The SYMBOLS are defined as for `%destructor' (*note Freeing 4902 Discarded Symbols: Destructor Decl.): they can be per-type (e.g., 4903 `<ival>'), per-symbol (e.g., `exp', `NUM', `"float"'), typed 4904 per-default (i.e., `<*>', or untyped per-default (i.e., `<>'). 4905 4906 For example: 4907 4908 %union { char *string; } 4909 %token <string> STRING1 4910 %token <string> STRING2 4911 %type <string> string1 4912 %type <string> string2 4913 %union { char character; } 4914 %token <character> CHR 4915 %type <character> chr 4916 %token TAGLESS 4917 4918 %printer { fprintf (yyoutput, "'%c'", $$); } <character> 4919 %printer { fprintf (yyoutput, "&%p", $$); } <*> 4920 %printer { fprintf (yyoutput, "\"%s\"", $$); } STRING1 string1 4921 %printer { fprintf (yyoutput, "<>"); } <> 4922 4923 guarantees that, when the parser print any symbol that has a semantic 4924 type tag other than `<character>', it display the address of the 4925 semantic value by default. However, when the parser displays a 4926 `STRING1' or a `string1', it formats it as a string in double quotes. 4927 It performs only the second `%printer' in this case, so it prints only 4928 once. Finally, the parser print `<>' for any symbol, such as `TAGLESS', 4929 that has no semantic type tag. See also 4930 4931 4932 File: bison.info, Node: Expect Decl, Next: Start Decl, Prev: Printer Decl, Up: Declarations 4933 4934 3.8.9 Suppressing Conflict Warnings 4935 ----------------------------------- 4936 4937 Bison normally warns if there are any conflicts in the grammar (*note 4938 Shift/Reduce Conflicts: Shift/Reduce.), but most real grammars have 4939 harmless shift/reduce conflicts which are resolved in a predictable way 4940 and would be difficult to eliminate. It is desirable to suppress the 4941 warning about these conflicts unless the number of conflicts changes. 4942 You can do this with the `%expect' declaration. 4943 4944 The declaration looks like this: 4945 4946 %expect N 4947 4948 Here N is a decimal integer. The declaration says there should be N 4949 shift/reduce conflicts and no reduce/reduce conflicts. Bison reports 4950 an error if the number of shift/reduce conflicts differs from N, or if 4951 there are any reduce/reduce conflicts. 4952 4953 For deterministic parsers, reduce/reduce conflicts are more serious, 4954 and should be eliminated entirely. Bison will always report 4955 reduce/reduce conflicts for these parsers. With GLR parsers, however, 4956 both kinds of conflicts are routine; otherwise, there would be no need 4957 to use GLR parsing. Therefore, it is also possible to specify an 4958 expected number of reduce/reduce conflicts in GLR parsers, using the 4959 declaration: 4960 4961 %expect-rr N 4962 4963 In general, using `%expect' involves these steps: 4964 4965 * Compile your grammar without `%expect'. Use the `-v' option to 4966 get a verbose list of where the conflicts occur. Bison will also 4967 print the number of conflicts. 4968 4969 * Check each of the conflicts to make sure that Bison's default 4970 resolution is what you really want. If not, rewrite the grammar 4971 and go back to the beginning. 4972 4973 * Add an `%expect' declaration, copying the number N from the number 4974 which Bison printed. With GLR parsers, add an `%expect-rr' 4975 declaration as well. 4976 4977 Now Bison will report an error if you introduce an unexpected 4978 conflict, but will keep silent otherwise. 4979 4980 4981 File: bison.info, Node: Start Decl, Next: Pure Decl, Prev: Expect Decl, Up: Declarations 4982 4983 3.8.10 The Start-Symbol 4984 ----------------------- 4985 4986 Bison assumes by default that the start symbol for the grammar is the 4987 first nonterminal specified in the grammar specification section. The 4988 programmer may override this restriction with the `%start' declaration 4989 as follows: 4990 4991 %start SYMBOL 4992 4993 4994 File: bison.info, Node: Pure Decl, Next: Push Decl, Prev: Start Decl, Up: Declarations 4995 4996 3.8.11 A Pure (Reentrant) Parser 4997 -------------------------------- 4998 4999 A "reentrant" program is one which does not alter in the course of 5000 execution; in other words, it consists entirely of "pure" (read-only) 5001 code. Reentrancy is important whenever asynchronous execution is 5002 possible; for example, a nonreentrant program may not be safe to call 5003 from a signal handler. In systems with multiple threads of control, a 5004 nonreentrant program must be called only within interlocks. 5005 5006 Normally, Bison generates a parser which is not reentrant. This is 5007 suitable for most uses, and it permits compatibility with Yacc. (The 5008 standard Yacc interfaces are inherently nonreentrant, because they use 5009 statically allocated variables for communication with `yylex', 5010 including `yylval' and `yylloc'.) 5011 5012 Alternatively, you can generate a pure, reentrant parser. The Bison 5013 declaration `%define api.pure' says that you want the parser to be 5014 reentrant. It looks like this: 5015 5016 %define api.pure full 5017 5018 The result is that the communication variables `yylval' and `yylloc' 5019 become local variables in `yyparse', and a different calling convention 5020 is used for the lexical analyzer function `yylex'. *Note Calling 5021 Conventions for Pure Parsers: Pure Calling, for the details of this. 5022 The variable `yynerrs' becomes local in `yyparse' in pull mode but it 5023 becomes a member of yypstate in push mode. (*note The Error Reporting 5024 Function `yyerror': Error Reporting.). The convention for calling 5025 `yyparse' itself is unchanged. 5026 5027 Whether the parser is pure has nothing to do with the grammar rules. 5028 You can generate either a pure parser or a nonreentrant parser from any 5029 valid grammar. 5030 5031 5032 File: bison.info, Node: Push Decl, Next: Decl Summary, Prev: Pure Decl, Up: Declarations 5033 5034 3.8.12 A Push Parser 5035 -------------------- 5036 5037 (The current push parsing interface is experimental and may evolve. 5038 More user feedback will help to stabilize it.) 5039 5040 A pull parser is called once and it takes control until all its input 5041 is completely parsed. A push parser, on the other hand, is called each 5042 time a new token is made available. 5043 5044 A push parser is typically useful when the parser is part of a main 5045 event loop in the client's application. This is typically a 5046 requirement of a GUI, when the main event loop needs to be triggered 5047 within a certain time period. 5048 5049 Normally, Bison generates a pull parser. The following Bison 5050 declaration says that you want the parser to be a push parser (*note 5051 api.push-pull: %define Summary.): 5052 5053 %define api.push-pull push 5054 5055 In almost all cases, you want to ensure that your push parser is also 5056 a pure parser (*note A Pure (Reentrant) Parser: Pure Decl.). The only 5057 time you should create an impure push parser is to have backwards 5058 compatibility with the impure Yacc pull mode interface. Unless you know 5059 what you are doing, your declarations should look like this: 5060 5061 %define api.pure full 5062 %define api.push-pull push 5063 5064 There is a major notable functional difference between the pure push 5065 parser and the impure push parser. It is acceptable for a pure push 5066 parser to have many parser instances, of the same type of parser, in 5067 memory at the same time. An impure push parser should only use one 5068 parser at a time. 5069 5070 When a push parser is selected, Bison will generate some new symbols 5071 in the generated parser. `yypstate' is a structure that the generated 5072 parser uses to store the parser's state. `yypstate_new' is the 5073 function that will create a new parser instance. `yypstate_delete' 5074 will free the resources associated with the corresponding parser 5075 instance. Finally, `yypush_parse' is the function that should be 5076 called whenever a token is available to provide the parser. A trivial 5077 example of using a pure push parser would look like this: 5078 5079 int status; 5080 yypstate *ps = yypstate_new (); 5081 do { 5082 status = yypush_parse (ps, yylex (), NULL); 5083 } while (status == YYPUSH_MORE); 5084 yypstate_delete (ps); 5085 5086 If the user decided to use an impure push parser, a few things about 5087 the generated parser will change. The `yychar' variable becomes a 5088 global variable instead of a variable in the `yypush_parse' function. 5089 For this reason, the signature of the `yypush_parse' function is 5090 changed to remove the token as a parameter. A nonreentrant push parser 5091 example would thus look like this: 5092 5093 extern int yychar; 5094 int status; 5095 yypstate *ps = yypstate_new (); 5096 do { 5097 yychar = yylex (); 5098 status = yypush_parse (ps); 5099 } while (status == YYPUSH_MORE); 5100 yypstate_delete (ps); 5101 5102 That's it. Notice the next token is put into the global variable 5103 `yychar' for use by the next invocation of the `yypush_parse' function. 5104 5105 Bison also supports both the push parser interface along with the 5106 pull parser interface in the same generated parser. In order to get 5107 this functionality, you should replace the `%define api.push-pull push' 5108 declaration with the `%define api.push-pull both' declaration. Doing 5109 this will create all of the symbols mentioned earlier along with the 5110 two extra symbols, `yyparse' and `yypull_parse'. `yyparse' can be used 5111 exactly as it normally would be used. However, the user should note 5112 that it is implemented in the generated parser by calling 5113 `yypull_parse'. This makes the `yyparse' function that is generated 5114 with the `%define api.push-pull both' declaration slower than the normal 5115 `yyparse' function. If the user calls the `yypull_parse' function it 5116 will parse the rest of the input stream. It is possible to 5117 `yypush_parse' tokens to select a subgrammar and then `yypull_parse' 5118 the rest of the input stream. If you would like to switch back and 5119 forth between between parsing styles, you would have to write your own 5120 `yypull_parse' function that knows when to quit looking for input. An 5121 example of using the `yypull_parse' function would look like this: 5122 5123 yypstate *ps = yypstate_new (); 5124 yypull_parse (ps); /* Will call the lexer */ 5125 yypstate_delete (ps); 5126 5127 Adding the `%define api.pure full' declaration does exactly the same 5128 thing to the generated parser with `%define api.push-pull both' as it 5129 did for `%define api.push-pull push'. 5130 5131 5132 File: bison.info, Node: Decl Summary, Next: %define Summary, Prev: Push Decl, Up: Declarations 5133 5134 3.8.13 Bison Declaration Summary 5135 -------------------------------- 5136 5137 Here is a summary of the declarations used to define a grammar: 5138 5139 -- Directive: %union 5140 Declare the collection of data types that semantic values may have 5141 (*note The Collection of Value Types: Union Decl.). 5142 5143 -- Directive: %token 5144 Declare a terminal symbol (token type name) with no precedence or 5145 associativity specified (*note Token Type Names: Token Decl.). 5146 5147 -- Directive: %right 5148 Declare a terminal symbol (token type name) that is 5149 right-associative (*note Operator Precedence: Precedence Decl.). 5150 5151 -- Directive: %left 5152 Declare a terminal symbol (token type name) that is 5153 left-associative (*note Operator Precedence: Precedence Decl.). 5154 5155 -- Directive: %nonassoc 5156 Declare a terminal symbol (token type name) that is nonassociative 5157 (*note Operator Precedence: Precedence Decl.). Using it in a way 5158 that would be associative is a syntax error. 5159 5160 -- Directive: %type 5161 Declare the type of semantic values for a nonterminal symbol 5162 (*note Nonterminal Symbols: Type Decl.). 5163 5164 -- Directive: %start 5165 Specify the grammar's start symbol (*note The Start-Symbol: Start 5166 Decl.). 5167 5168 -- Directive: %expect 5169 Declare the expected number of shift-reduce conflicts (*note 5170 Suppressing Conflict Warnings: Expect Decl.). 5171 5172 5173 In order to change the behavior of `bison', use the following 5174 directives: 5175 5176 -- Directive: %code {CODE} 5177 -- Directive: %code QUALIFIER {CODE} 5178 Insert CODE verbatim into the output parser source at the default 5179 location or at the location specified by QUALIFIER. *Note %code 5180 Summary::. 5181 5182 -- Directive: %debug 5183 In the parser implementation file, define the macro `YYDEBUG' (or 5184 `PREFIXDEBUG' with `%define api.prefix PREFIX', see *note Multiple 5185 Parsers in the Same Program: Multiple Parsers.) to 1 if it is not 5186 already defined, so that the debugging facilities are compiled. 5187 *Note Tracing Your Parser: Tracing. 5188 5189 -- Directive: %define VARIABLE 5190 -- Directive: %define VARIABLE VALUE 5191 -- Directive: %define VARIABLE "VALUE" 5192 Define a variable to adjust Bison's behavior. *Note %define 5193 Summary::. 5194 5195 -- Directive: %defines 5196 Write a parser header file containing macro definitions for the 5197 token type names defined in the grammar as well as a few other 5198 declarations. If the parser implementation file is named `NAME.c' 5199 then the parser header file is named `NAME.h'. 5200 5201 For C parsers, the parser header file declares `YYSTYPE' unless 5202 `YYSTYPE' is already defined as a macro or you have used a 5203 `<TYPE>' tag without using `%union'. Therefore, if you are using 5204 a `%union' (*note More Than One Value Type: Multiple Types.) with 5205 components that require other definitions, or if you have defined 5206 a `YYSTYPE' macro or type definition (*note Data Types of Semantic 5207 Values: Value Type.), you need to arrange for these definitions to 5208 be propagated to all modules, e.g., by putting them in a 5209 prerequisite header that is included both by your parser and by any 5210 other module that needs `YYSTYPE'. 5211 5212 Unless your parser is pure, the parser header file declares 5213 `yylval' as an external variable. *Note A Pure (Reentrant) 5214 Parser: Pure Decl. 5215 5216 If you have also used locations, the parser header file declares 5217 `YYLTYPE' and `yylloc' using a protocol similar to that of the 5218 `YYSTYPE' macro and `yylval'. *Note Tracking Locations::. 5219 5220 This parser header file is normally essential if you wish to put 5221 the definition of `yylex' in a separate source file, because 5222 `yylex' typically needs to be able to refer to the above-mentioned 5223 declarations and to the token type codes. *Note Semantic Values 5224 of Tokens: Token Values. 5225 5226 If you have declared `%code requires' or `%code provides', the 5227 output header also contains their code. *Note %code Summary::. 5228 5229 The generated header is protected against multiple inclusions with 5230 a C preprocessor guard: `YY_PREFIX_FILE_INCLUDED', where PREFIX 5231 and FILE are the prefix (*note Multiple Parsers in the Same 5232 Program: Multiple Parsers.) and generated file name turned 5233 uppercase, with each series of non alphanumerical characters 5234 converted to a single underscore. 5235 5236 For instance with `%define api.prefix "calc"' and `%defines 5237 "lib/parse.h"', the header will be guarded as follows. 5238 #ifndef YY_CALC_LIB_PARSE_H_INCLUDED 5239 # define YY_CALC_LIB_PARSE_H_INCLUDED 5240 ... 5241 #endif /* ! YY_CALC_LIB_PARSE_H_INCLUDED */ 5242 5243 -- Directive: %defines DEFINES-FILE 5244 Same as above, but save in the file DEFINES-FILE. 5245 5246 -- Directive: %destructor 5247 Specify how the parser should reclaim the memory associated to 5248 discarded symbols. *Note Freeing Discarded Symbols: Destructor 5249 Decl. 5250 5251 -- Directive: %file-prefix "PREFIX" 5252 Specify a prefix to use for all Bison output file names. The names 5253 are chosen as if the grammar file were named `PREFIX.y'. 5254 5255 -- Directive: %language "LANGUAGE" 5256 Specify the programming language for the generated parser. 5257 Currently supported languages include C, C++, and Java. LANGUAGE 5258 is case-insensitive. 5259 5260 5261 -- Directive: %locations 5262 Generate the code processing the locations (*note Special Features 5263 for Use in Actions: Action Features.). This mode is enabled as 5264 soon as the grammar uses the special `@N' tokens, but if your 5265 grammar does not use it, using `%locations' allows for more 5266 accurate syntax error messages. 5267 5268 -- Directive: %no-lines 5269 Don't generate any `#line' preprocessor commands in the parser 5270 implementation file. Ordinarily Bison writes these commands in the 5271 parser implementation file so that the C compiler and debuggers 5272 will associate errors and object code with your source file (the 5273 grammar file). This directive causes them to associate errors 5274 with the parser implementation file, treating it as an independent 5275 source file in its own right. 5276 5277 -- Directive: %output "FILE" 5278 Specify FILE for the parser implementation file. 5279 5280 -- Directive: %pure-parser 5281 Deprecated version of `%define api.pure' (*note api.pure: %define 5282 Summary.), for which Bison is more careful to warn about 5283 unreasonable usage. 5284 5285 -- Directive: %require "VERSION" 5286 Require version VERSION or higher of Bison. *Note Require a 5287 Version of Bison: Require Decl. 5288 5289 -- Directive: %skeleton "FILE" 5290 Specify the skeleton to use. 5291 5292 If FILE does not contain a `/', FILE is the name of a skeleton 5293 file in the Bison installation directory. If it does, FILE is an 5294 absolute file name or a file name relative to the directory of the 5295 grammar file. This is similar to how most shells resolve commands. 5296 5297 -- Directive: %token-table 5298 Generate an array of token names in the parser implementation file. 5299 The name of the array is `yytname'; `yytname[I]' is the name of 5300 the token whose internal Bison token code number is I. The first 5301 three elements of `yytname' correspond to the predefined tokens 5302 `"$end"', `"error"', and `"$undefined"'; after these come the 5303 symbols defined in the grammar file. 5304 5305 The name in the table includes all the characters needed to 5306 represent the token in Bison. For single-character literals and 5307 literal strings, this includes the surrounding quoting characters 5308 and any escape sequences. For example, the Bison single-character 5309 literal `'+'' corresponds to a three-character name, represented 5310 in C as `"'+'"'; and the Bison two-character literal string `"\\/"' 5311 corresponds to a five-character name, represented in C as 5312 `"\"\\\\/\""'. 5313 5314 When you specify `%token-table', Bison also generates macro 5315 definitions for macros `YYNTOKENS', `YYNNTS', and `YYNRULES', and 5316 `YYNSTATES': 5317 5318 `YYNTOKENS' 5319 The highest token number, plus one. 5320 5321 `YYNNTS' 5322 The number of nonterminal symbols. 5323 5324 `YYNRULES' 5325 The number of grammar rules, 5326 5327 `YYNSTATES' 5328 The number of parser states (*note Parser States::). 5329 5330 -- Directive: %verbose 5331 Write an extra output file containing verbose descriptions of the 5332 parser states and what is done for each type of lookahead token in 5333 that state. *Note Understanding Your Parser: Understanding, for 5334 more information. 5335 5336 -- Directive: %yacc 5337 Pretend the option `--yacc' was given, i.e., imitate Yacc, 5338 including its naming conventions. *Note Bison Options::, for more. 5339 5340 5341 File: bison.info, Node: %define Summary, Next: %code Summary, Prev: Decl Summary, Up: Declarations 5342 5343 3.8.14 %define Summary 5344 ---------------------- 5345 5346 There are many features of Bison's behavior that can be controlled by 5347 assigning the feature a single value. For historical reasons, some 5348 such features are assigned values by dedicated directives, such as 5349 `%start', which assigns the start symbol. However, newer such features 5350 are associated with variables, which are assigned by the `%define' 5351 directive: 5352 5353 -- Directive: %define VARIABLE 5354 -- Directive: %define VARIABLE VALUE 5355 -- Directive: %define VARIABLE "VALUE" 5356 Define VARIABLE to VALUE. 5357 5358 VALUE must be placed in quotation marks if it contains any 5359 character other than a letter, underscore, period, or non-initial 5360 dash or digit. Omitting `"VALUE"' entirely is always equivalent 5361 to specifying `""'. 5362 5363 It is an error if a VARIABLE is defined by `%define' multiple 5364 times, but see *note -D NAME[=VALUE]: Bison Options. 5365 5366 The rest of this section summarizes variables and values that 5367 `%define' accepts. 5368 5369 Some VARIABLEs take Boolean values. In this case, Bison will 5370 complain if the variable definition does not meet one of the following 5371 four conditions: 5372 5373 1. `VALUE' is `true' 5374 5375 2. `VALUE' is omitted (or `""' is specified). This is equivalent to 5376 `true'. 5377 5378 3. `VALUE' is `false'. 5379 5380 4. VARIABLE is never defined. In this case, Bison selects a default 5381 value. 5382 5383 What VARIABLEs are accepted, as well as their meanings and default 5384 values, depend on the selected target language and/or the parser 5385 skeleton (*note %language: Decl Summary, *note %skeleton: Decl 5386 Summary.). Unaccepted VARIABLEs produce an error. Some of the 5387 accepted VARIABLEs are: 5388 5389 * `api.location.type' 5390 5391 * Language(s): C++, Java 5392 5393 * Purpose: Define the location type. *Note User Defined 5394 Location Type::. 5395 5396 * Accepted Values: String 5397 5398 * Default Value: none 5399 5400 * History: introduced in Bison 2.7 5401 5402 * `api.prefix' 5403 5404 * Language(s): All 5405 5406 * Purpose: Rename exported symbols. *Note Multiple Parsers in 5407 the Same Program: Multiple Parsers. 5408 5409 * Accepted Values: String 5410 5411 * Default Value: `yy' 5412 5413 * History: introduced in Bison 2.6 5414 5415 * `api.pure' 5416 5417 * Language(s): C 5418 5419 * Purpose: Request a pure (reentrant) parser program. *Note A 5420 Pure (Reentrant) Parser: Pure Decl. 5421 5422 * Accepted Values: `true', `false', `full' 5423 5424 The value may be omitted: this is equivalent to specifying 5425 `true', as is the case for Boolean values. 5426 5427 When `%define api.pure full' is used, the parser is made 5428 reentrant. This changes the signature for `yylex' (*note Pure 5429 Calling::), and also that of `yyerror' when the tracking of 5430 locations has been activated, as shown below. 5431 5432 The `true' value is very similar to the `full' value, the only 5433 difference is in the signature of `yyerror' on Yacc parsers 5434 without `%parse-param', for historical reasons. 5435 5436 I.e., if `%locations %define api.pure' is passed then the 5437 prototypes for `yyerror' are: 5438 5439 void yyerror (char const *msg); // Yacc parsers. 5440 void yyerror (YYLTYPE *locp, char const *msg); // GLR parsers. 5441 5442 But if `%locations %define api.pure %parse-param {int 5443 *nastiness}' is used, then both parsers have the same 5444 signature: 5445 5446 void yyerror (YYLTYPE *llocp, int *nastiness, char const *msg); 5447 5448 (*note The Error Reporting Function `yyerror': Error 5449 Reporting.) 5450 5451 * Default Value: `false' 5452 5453 * History: the `full' value was introduced in Bison 2.7 5454 5455 * `api.push-pull' 5456 5457 * Language(s): C (deterministic parsers only) 5458 5459 * Purpose: Request a pull parser, a push parser, or both. 5460 *Note A Push Parser: Push Decl. (The current push parsing 5461 interface is experimental and may evolve. More user feedback 5462 will help to stabilize it.) 5463 5464 * Accepted Values: `pull', `push', `both' 5465 5466 * Default Value: `pull' 5467 5468 * `lr.default-reductions' 5469 5470 * Language(s): all 5471 5472 * Purpose: Specify the kind of states that are permitted to 5473 contain default reductions. *Note Default Reductions::. 5474 (The ability to specify where default reductions should be 5475 used is experimental. More user feedback will help to 5476 stabilize it.) 5477 5478 * Accepted Values: `most', `consistent', `accepting' 5479 5480 * Default Value: 5481 * `accepting' if `lr.type' is `canonical-lr'. 5482 5483 * `most' otherwise. 5484 5485 * `lr.keep-unreachable-states' 5486 5487 * Language(s): all 5488 5489 * Purpose: Request that Bison allow unreachable parser states to 5490 remain in the parser tables. *Note Unreachable States::. 5491 5492 * Accepted Values: Boolean 5493 5494 * Default Value: `false' 5495 5496 * `lr.type' 5497 5498 * Language(s): all 5499 5500 * Purpose: Specify the type of parser tables within the LR(1) 5501 family. *Note LR Table Construction::. (This feature is 5502 experimental. More user feedback will help to stabilize it.) 5503 5504 * Accepted Values: `lalr', `ielr', `canonical-lr' 5505 5506 * Default Value: `lalr' 5507 5508 * `namespace' 5509 5510 * Languages(s): C++ 5511 5512 * Purpose: Specify the namespace for the parser class. For 5513 example, if you specify: 5514 5515 %define namespace "foo::bar" 5516 5517 Bison uses `foo::bar' verbatim in references such as: 5518 5519 foo::bar::parser::semantic_type 5520 5521 However, to open a namespace, Bison removes any leading `::' 5522 and then splits on any remaining occurrences: 5523 5524 namespace foo { namespace bar { 5525 class position; 5526 class location; 5527 } } 5528 5529 * Accepted Values: Any absolute or relative C++ namespace 5530 reference without a trailing `"::"'. For example, `"foo"' or 5531 `"::foo::bar"'. 5532 5533 * Default Value: The value specified by `%name-prefix', which 5534 defaults to `yy'. This usage of `%name-prefix' is for 5535 backward compatibility and can be confusing since 5536 `%name-prefix' also specifies the textual prefix for the 5537 lexical analyzer function. Thus, if you specify 5538 `%name-prefix', it is best to also specify `%define 5539 namespace' so that `%name-prefix' _only_ affects the lexical 5540 analyzer function. For example, if you specify: 5541 5542 %define namespace "foo" 5543 %name-prefix "bar::" 5544 5545 The parser namespace is `foo' and `yylex' is referenced as 5546 `bar::lex'. 5547 5548 * `parse.lac' 5549 5550 * Languages(s): C (deterministic parsers only) 5551 5552 * Purpose: Enable LAC (lookahead correction) to improve syntax 5553 error handling. *Note LAC::. 5554 5555 * Accepted Values: `none', `full' 5556 5557 * Default Value: `none' 5558 5559 5560 File: bison.info, Node: %code Summary, Prev: %define Summary, Up: Declarations 5561 5562 3.8.15 %code Summary 5563 -------------------- 5564 5565 The `%code' directive inserts code verbatim into the output parser 5566 source at any of a predefined set of locations. It thus serves as a 5567 flexible and user-friendly alternative to the traditional Yacc 5568 prologue, `%{CODE%}'. This section summarizes the functionality of 5569 `%code' for the various target languages supported by Bison. For a 5570 detailed discussion of how to use `%code' in place of `%{CODE%}' for 5571 C/C++ and why it is advantageous to do so, *note Prologue 5572 Alternatives::. 5573 5574 -- Directive: %code {CODE} 5575 This is the unqualified form of the `%code' directive. It inserts 5576 CODE verbatim at a language-dependent default location in the 5577 parser implementation. 5578 5579 For C/C++, the default location is the parser implementation file 5580 after the usual contents of the parser header file. Thus, the 5581 unqualified form replaces `%{CODE%}' for most purposes. 5582 5583 For Java, the default location is inside the parser class. 5584 5585 -- Directive: %code QUALIFIER {CODE} 5586 This is the qualified form of the `%code' directive. QUALIFIER 5587 identifies the purpose of CODE and thus the location(s) where 5588 Bison should insert it. That is, if you need to specify 5589 location-sensitive CODE that does not belong at the default 5590 location selected by the unqualified `%code' form, use this form 5591 instead. 5592 5593 For any particular qualifier or for the unqualified form, if there 5594 are multiple occurrences of the `%code' directive, Bison concatenates 5595 the specified code in the order in which it appears in the grammar file. 5596 5597 Not all qualifiers are accepted for all target languages. Unaccepted 5598 qualifiers produce an error. Some of the accepted qualifiers are: 5599 5600 * requires 5601 5602 * Language(s): C, C++ 5603 5604 * Purpose: This is the best place to write dependency code 5605 required for `YYSTYPE' and `YYLTYPE'. In other words, it's 5606 the best place to define types referenced in `%union' 5607 directives, and it's the best place to override Bison's 5608 default `YYSTYPE' and `YYLTYPE' definitions. 5609 5610 * Location(s): The parser header file and the parser 5611 implementation file before the Bison-generated `YYSTYPE' and 5612 `YYLTYPE' definitions. 5613 5614 * provides 5615 5616 * Language(s): C, C++ 5617 5618 * Purpose: This is the best place to write additional 5619 definitions and declarations that should be provided to other 5620 modules. 5621 5622 * Location(s): The parser header file and the parser 5623 implementation file after the Bison-generated `YYSTYPE', 5624 `YYLTYPE', and token definitions. 5625 5626 * top 5627 5628 * Language(s): C, C++ 5629 5630 * Purpose: The unqualified `%code' or `%code requires' should 5631 usually be more appropriate than `%code top'. However, 5632 occasionally it is necessary to insert code much nearer the 5633 top of the parser implementation file. For example: 5634 5635 %code top { 5636 #define _GNU_SOURCE 5637 #include <stdio.h> 5638 } 5639 5640 * Location(s): Near the top of the parser implementation file. 5641 5642 * imports 5643 5644 * Language(s): Java 5645 5646 * Purpose: This is the best place to write Java import 5647 directives. 5648 5649 * Location(s): The parser Java file after any Java package 5650 directive and before any class definitions. 5651 5652 Though we say the insertion locations are language-dependent, they 5653 are technically skeleton-dependent. Writers of non-standard skeletons 5654 however should choose their locations consistently with the behavior of 5655 the standard Bison skeletons. 5656 5657 5658 File: bison.info, Node: Multiple Parsers, Prev: Declarations, Up: Grammar File 5659 5660 3.9 Multiple Parsers in the Same Program 5661 ======================================== 5662 5663 Most programs that use Bison parse only one language and therefore 5664 contain only one Bison parser. But what if you want to parse more than 5665 one language with the same program? Then you need to avoid name 5666 conflicts between different definitions of functions and variables such 5667 as `yyparse', `yylval'. To use different parsers from the same 5668 compilation unit, you also need to avoid conflicts on types and macros 5669 (e.g., `YYSTYPE') exported in the generated header. 5670 5671 The easy way to do this is to define the `%define' variable 5672 `api.prefix'. With different `api.prefix's it is guaranteed that 5673 headers do not conflict when included together, and that compiled 5674 objects can be linked together too. Specifying `%define api.prefix 5675 PREFIX' (or passing the option `-Dapi.prefix=PREFIX', see *note 5676 Invoking Bison: Invocation.) renames the interface functions and 5677 variables of the Bison parser to start with PREFIX instead of `yy', and 5678 all the macros to start by PREFIX (i.e., PREFIX upper-cased) instead of 5679 `YY'. 5680 5681 The renamed symbols include `yyparse', `yylex', `yyerror', 5682 `yynerrs', `yylval', `yylloc', `yychar' and `yydebug'. If you use a 5683 push parser, `yypush_parse', `yypull_parse', `yypstate', `yypstate_new' 5684 and `yypstate_delete' will also be renamed. The renamed macros include 5685 `YYSTYPE', `YYLTYPE', and `YYDEBUG', which is treated specifically -- 5686 more about this below. 5687 5688 For example, if you use `%define api.prefix c', the names become 5689 `cparse', `clex', ..., `CSTYPE', `CLTYPE', and so on. 5690 5691 The `%define' variable `api.prefix' works in two different ways. In 5692 the implementation file, it works by adding macro definitions to the 5693 beginning of the parser implementation file, defining `yyparse' as 5694 `PREFIXparse', and so on: 5695 5696 #define YYSTYPE CTYPE 5697 #define yyparse cparse 5698 #define yylval clval 5699 ... 5700 YYSTYPE yylval; 5701 int yyparse (void); 5702 5703 This effectively substitutes one name for the other in the entire 5704 parser implementation file, thus the "original" names (`yylex', 5705 `YYSTYPE', ...) are also usable in the parser implementation file. 5706 5707 However, in the parser header file, the symbols are defined renamed, 5708 for instance: 5709 5710 extern CSTYPE clval; 5711 int cparse (void); 5712 5713 The macro `YYDEBUG' is commonly used to enable the tracing support in 5714 parsers. To comply with this tradition, when `api.prefix' is used, 5715 `YYDEBUG' (not renamed) is used as a default value: 5716 5717 /* Enabling traces. */ 5718 #ifndef CDEBUG 5719 # if defined YYDEBUG 5720 # if YYDEBUG 5721 # define CDEBUG 1 5722 # else 5723 # define CDEBUG 0 5724 # endif 5725 # else 5726 # define CDEBUG 0 5727 # endif 5728 #endif 5729 #if CDEBUG 5730 extern int cdebug; 5731 #endif 5732 5733 5734 5735 Prior to Bison 2.6, a feature similar to `api.prefix' was provided by 5736 the obsolete directive `%name-prefix' (*note Bison Symbols: Table of 5737 Symbols.) and the option `--name-prefix' (*note Bison Options::). 5738 5739 5740 File: bison.info, Node: Interface, Next: Algorithm, Prev: Grammar File, Up: Top 5741 5742 4 Parser C-Language Interface 5743 ***************************** 5744 5745 The Bison parser is actually a C function named `yyparse'. Here we 5746 describe the interface conventions of `yyparse' and the other functions 5747 that it needs to use. 5748 5749 Keep in mind that the parser uses many C identifiers starting with 5750 `yy' and `YY' for internal purposes. If you use such an identifier 5751 (aside from those in this manual) in an action or in epilogue in the 5752 grammar file, you are likely to run into trouble. 5753 5754 * Menu: 5755 5756 * Parser Function:: How to call `yyparse' and what it returns. 5757 * Push Parser Function:: How to call `yypush_parse' and what it returns. 5758 * Pull Parser Function:: How to call `yypull_parse' and what it returns. 5759 * Parser Create Function:: How to call `yypstate_new' and what it returns. 5760 * Parser Delete Function:: How to call `yypstate_delete' and what it returns. 5761 * Lexical:: You must supply a function `yylex' 5762 which reads tokens. 5763 * Error Reporting:: You must supply a function `yyerror'. 5764 * Action Features:: Special features for use in actions. 5765 * Internationalization:: How to let the parser speak in the user's 5766 native language. 5767 5768 5769 File: bison.info, Node: Parser Function, Next: Push Parser Function, Up: Interface 5770 5771 4.1 The Parser Function `yyparse' 5772 ================================= 5773 5774 You call the function `yyparse' to cause parsing to occur. This 5775 function reads tokens, executes actions, and ultimately returns when it 5776 encounters end-of-input or an unrecoverable syntax error. You can also 5777 write an action which directs `yyparse' to return immediately without 5778 reading further. 5779 5780 -- Function: int yyparse (void) 5781 The value returned by `yyparse' is 0 if parsing was successful 5782 (return is due to end-of-input). 5783 5784 The value is 1 if parsing failed because of invalid input, i.e., 5785 input that contains a syntax error or that causes `YYABORT' to be 5786 invoked. 5787 5788 The value is 2 if parsing failed due to memory exhaustion. 5789 5790 In an action, you can cause immediate return from `yyparse' by using 5791 these macros: 5792 5793 -- Macro: YYACCEPT 5794 Return immediately with value 0 (to report success). 5795 5796 -- Macro: YYABORT 5797 Return immediately with value 1 (to report failure). 5798 5799 If you use a reentrant parser, you can optionally pass additional 5800 parameter information to it in a reentrant way. To do so, use the 5801 declaration `%parse-param': 5802 5803 -- Directive: %parse-param {ARGUMENT-DECLARATION} 5804 Declare that an argument declared by the braced-code 5805 ARGUMENT-DECLARATION is an additional `yyparse' argument. The 5806 ARGUMENT-DECLARATION is used when declaring functions or 5807 prototypes. The last identifier in ARGUMENT-DECLARATION must be 5808 the argument name. 5809 5810 Here's an example. Write this in the parser: 5811 5812 %parse-param {int *nastiness} 5813 %parse-param {int *randomness} 5814 5815 Then call the parser like this: 5816 5817 { 5818 int nastiness, randomness; 5819 ... /* Store proper data in `nastiness' and `randomness'. */ 5820 value = yyparse (&nastiness, &randomness); 5821 ... 5822 } 5823 5824 In the grammar actions, use expressions like this to refer to the data: 5825 5826 exp: ... { ...; *randomness += 1; ... } 5827 5828 Using the following: 5829 %parse-param {int *randomness} 5830 5831 Results in these signatures: 5832 void yyerror (int *randomness, const char *msg); 5833 int yyparse (int *randomness); 5834 5835 Or, if both `%define api.pure full' (or just `%define api.pure') and 5836 `%locations' are used: 5837 5838 void yyerror (YYLTYPE *llocp, int *randomness, const char *msg); 5839 int yyparse (int *randomness); 5840 5841 5842 File: bison.info, Node: Push Parser Function, Next: Pull Parser Function, Prev: Parser Function, Up: Interface 5843 5844 4.2 The Push Parser Function `yypush_parse' 5845 =========================================== 5846 5847 (The current push parsing interface is experimental and may evolve. 5848 More user feedback will help to stabilize it.) 5849 5850 You call the function `yypush_parse' to parse a single token. This 5851 function is available if either the `%define api.push-pull push' or 5852 `%define api.push-pull both' declaration is used. *Note A Push Parser: 5853 Push Decl. 5854 5855 -- Function: int yypush_parse (yypstate *yyps) 5856 The value returned by `yypush_parse' is the same as for yyparse 5857 with the following exception: it returns `YYPUSH_MORE' if more 5858 input is required to finish parsing the grammar. 5859 5860 5861 File: bison.info, Node: Pull Parser Function, Next: Parser Create Function, Prev: Push Parser Function, Up: Interface 5862 5863 4.3 The Pull Parser Function `yypull_parse' 5864 =========================================== 5865 5866 (The current push parsing interface is experimental and may evolve. 5867 More user feedback will help to stabilize it.) 5868 5869 You call the function `yypull_parse' to parse the rest of the input 5870 stream. This function is available if the `%define api.push-pull both' 5871 declaration is used. *Note A Push Parser: Push Decl. 5872 5873 -- Function: int yypull_parse (yypstate *yyps) 5874 The value returned by `yypull_parse' is the same as for `yyparse'. 5875 5876 5877 File: bison.info, Node: Parser Create Function, Next: Parser Delete Function, Prev: Pull Parser Function, Up: Interface 5878 5879 4.4 The Parser Create Function `yystate_new' 5880 ============================================ 5881 5882 (The current push parsing interface is experimental and may evolve. 5883 More user feedback will help to stabilize it.) 5884 5885 You call the function `yypstate_new' to create a new parser instance. 5886 This function is available if either the `%define api.push-pull push' or 5887 `%define api.push-pull both' declaration is used. *Note A Push Parser: 5888 Push Decl. 5889 5890 -- Function: yypstate* yypstate_new (void) 5891 The function will return a valid parser instance if there was 5892 memory available or 0 if no memory was available. In impure mode, 5893 it will also return 0 if a parser instance is currently allocated. 5894 5895 5896 File: bison.info, Node: Parser Delete Function, Next: Lexical, Prev: Parser Create Function, Up: Interface 5897 5898 4.5 The Parser Delete Function `yystate_delete' 5899 =============================================== 5900 5901 (The current push parsing interface is experimental and may evolve. 5902 More user feedback will help to stabilize it.) 5903 5904 You call the function `yypstate_delete' to delete a parser instance. 5905 function is available if either the `%define api.push-pull push' or 5906 `%define api.push-pull both' declaration is used. *Note A Push Parser: 5907 Push Decl. 5908 5909 -- Function: void yypstate_delete (yypstate *yyps) 5910 This function will reclaim the memory associated with a parser 5911 instance. After this call, you should no longer attempt to use 5912 the parser instance. 5913 5914 5915 File: bison.info, Node: Lexical, Next: Error Reporting, Prev: Parser Delete Function, Up: Interface 5916 5917 4.6 The Lexical Analyzer Function `yylex' 5918 ========================================= 5919 5920 The "lexical analyzer" function, `yylex', recognizes tokens from the 5921 input stream and returns them to the parser. Bison does not create 5922 this function automatically; you must write it so that `yyparse' can 5923 call it. The function is sometimes referred to as a lexical scanner. 5924 5925 In simple programs, `yylex' is often defined at the end of the Bison 5926 grammar file. If `yylex' is defined in a separate source file, you 5927 need to arrange for the token-type macro definitions to be available 5928 there. To do this, use the `-d' option when you run Bison, so that it 5929 will write these macro definitions into the separate parser header 5930 file, `NAME.tab.h', which you can include in the other source files 5931 that need it. *Note Invoking Bison: Invocation. 5932 5933 * Menu: 5934 5935 * Calling Convention:: How `yyparse' calls `yylex'. 5936 * Token Values:: How `yylex' must return the semantic value 5937 of the token it has read. 5938 * Token Locations:: How `yylex' must return the text location 5939 (line number, etc.) of the token, if the 5940 actions want that. 5941 * Pure Calling:: How the calling convention differs in a pure parser 5942 (*note A Pure (Reentrant) Parser: Pure Decl.). 5943 5944 5945 File: bison.info, Node: Calling Convention, Next: Token Values, Up: Lexical 5946 5947 4.6.1 Calling Convention for `yylex' 5948 ------------------------------------ 5949 5950 The value that `yylex' returns must be the positive numeric code for 5951 the type of token it has just found; a zero or negative value signifies 5952 end-of-input. 5953 5954 When a token is referred to in the grammar rules by a name, that name 5955 in the parser implementation file becomes a C macro whose definition is 5956 the proper numeric code for that token type. So `yylex' can use the 5957 name to indicate that type. *Note Symbols::. 5958 5959 When a token is referred to in the grammar rules by a character 5960 literal, the numeric code for that character is also the code for the 5961 token type. So `yylex' can simply return that character code, possibly 5962 converted to `unsigned char' to avoid sign-extension. The null 5963 character must not be used this way, because its code is zero and that 5964 signifies end-of-input. 5965 5966 Here is an example showing these things: 5967 5968 int 5969 yylex (void) 5970 { 5971 ... 5972 if (c == EOF) /* Detect end-of-input. */ 5973 return 0; 5974 ... 5975 if (c == '+' || c == '-') 5976 return c; /* Assume token type for `+' is '+'. */ 5977 ... 5978 return INT; /* Return the type of the token. */ 5979 ... 5980 } 5981 5982 This interface has been designed so that the output from the `lex' 5983 utility can be used without change as the definition of `yylex'. 5984 5985 If the grammar uses literal string tokens, there are two ways that 5986 `yylex' can determine the token type codes for them: 5987 5988 * If the grammar defines symbolic token names as aliases for the 5989 literal string tokens, `yylex' can use these symbolic names like 5990 all others. In this case, the use of the literal string tokens in 5991 the grammar file has no effect on `yylex'. 5992 5993 * `yylex' can find the multicharacter token in the `yytname' table. 5994 The index of the token in the table is the token type's code. The 5995 name of a multicharacter token is recorded in `yytname' with a 5996 double-quote, the token's characters, and another double-quote. 5997 The token's characters are escaped as necessary to be suitable as 5998 input to Bison. 5999 6000 Here's code for looking up a multicharacter token in `yytname', 6001 assuming that the characters of the token are stored in 6002 `token_buffer', and assuming that the token does not contain any 6003 characters like `"' that require escaping. 6004 6005 for (i = 0; i < YYNTOKENS; i++) 6006 { 6007 if (yytname[i] != 0 6008 && yytname[i][0] == '"' 6009 && ! strncmp (yytname[i] + 1, token_buffer, 6010 strlen (token_buffer)) 6011 && yytname[i][strlen (token_buffer) + 1] == '"' 6012 && yytname[i][strlen (token_buffer) + 2] == 0) 6013 break; 6014 } 6015 6016 The `yytname' table is generated only if you use the 6017 `%token-table' declaration. *Note Decl Summary::. 6018 6019 6020 File: bison.info, Node: Token Values, Next: Token Locations, Prev: Calling Convention, Up: Lexical 6021 6022 4.6.2 Semantic Values of Tokens 6023 ------------------------------- 6024 6025 In an ordinary (nonreentrant) parser, the semantic value of the token 6026 must be stored into the global variable `yylval'. When you are using 6027 just one data type for semantic values, `yylval' has that type. Thus, 6028 if the type is `int' (the default), you might write this in `yylex': 6029 6030 ... 6031 yylval = value; /* Put value onto Bison stack. */ 6032 return INT; /* Return the type of the token. */ 6033 ... 6034 6035 When you are using multiple data types, `yylval''s type is a union 6036 made from the `%union' declaration (*note The Collection of Value 6037 Types: Union Decl.). So when you store a token's value, you must use 6038 the proper member of the union. If the `%union' declaration looks like 6039 this: 6040 6041 %union { 6042 int intval; 6043 double val; 6044 symrec *tptr; 6045 } 6046 6047 then the code in `yylex' might look like this: 6048 6049 ... 6050 yylval.intval = value; /* Put value onto Bison stack. */ 6051 return INT; /* Return the type of the token. */ 6052 ... 6053 6054 6055 File: bison.info, Node: Token Locations, Next: Pure Calling, Prev: Token Values, Up: Lexical 6056 6057 4.6.3 Textual Locations of Tokens 6058 --------------------------------- 6059 6060 If you are using the `@N'-feature (*note Tracking Locations::) in 6061 actions to keep track of the textual locations of tokens and groupings, 6062 then you must provide this information in `yylex'. The function 6063 `yyparse' expects to find the textual location of a token just parsed 6064 in the global variable `yylloc'. So `yylex' must store the proper data 6065 in that variable. 6066 6067 By default, the value of `yylloc' is a structure and you need only 6068 initialize the members that are going to be used by the actions. The 6069 four members are called `first_line', `first_column', `last_line' and 6070 `last_column'. Note that the use of this feature makes the parser 6071 noticeably slower. 6072 6073 The data type of `yylloc' has the name `YYLTYPE'. 6074 6075 6076 File: bison.info, Node: Pure Calling, Prev: Token Locations, Up: Lexical 6077 6078 4.6.4 Calling Conventions for Pure Parsers 6079 ------------------------------------------ 6080 6081 When you use the Bison declaration `%define api.pure full' to request a 6082 pure, reentrant parser, the global communication variables `yylval' and 6083 `yylloc' cannot be used. (*Note A Pure (Reentrant) Parser: Pure Decl.) 6084 In such parsers the two global variables are replaced by pointers 6085 passed as arguments to `yylex'. You must declare them as shown here, 6086 and pass the information back by storing it through those pointers. 6087 6088 int 6089 yylex (YYSTYPE *lvalp, YYLTYPE *llocp) 6090 { 6091 ... 6092 *lvalp = value; /* Put value onto Bison stack. */ 6093 return INT; /* Return the type of the token. */ 6094 ... 6095 } 6096 6097 If the grammar file does not use the `@' constructs to refer to 6098 textual locations, then the type `YYLTYPE' will not be defined. In 6099 this case, omit the second argument; `yylex' will be called with only 6100 one argument. 6101 6102 If you wish to pass the additional parameter data to `yylex', use 6103 `%lex-param' just like `%parse-param' (*note Parser Function::). 6104 6105 -- Directive: lex-param {ARGUMENT-DECLARATION} 6106 Declare that the braced-code ARGUMENT-DECLARATION is an additional 6107 `yylex' argument declaration. 6108 6109 For instance: 6110 6111 %lex-param {int *nastiness} 6112 6113 results in the following signature: 6114 6115 int yylex (int *nastiness); 6116 6117 If `%define api.pure full' (or just `%define api.pure') is added: 6118 6119 int yylex (YYSTYPE *lvalp, int *nastiness); 6120 6121 6122 File: bison.info, Node: Error Reporting, Next: Action Features, Prev: Lexical, Up: Interface 6123 6124 4.7 The Error Reporting Function `yyerror' 6125 ========================================== 6126 6127 The Bison parser detects a "syntax error" or "parse error" whenever it 6128 reads a token which cannot satisfy any syntax rule. An action in the 6129 grammar can also explicitly proclaim an error, using the macro 6130 `YYERROR' (*note Special Features for Use in Actions: Action Features.). 6131 6132 The Bison parser expects to report the error by calling an error 6133 reporting function named `yyerror', which you must supply. It is 6134 called by `yyparse' whenever a syntax error is found, and it receives 6135 one argument. For a syntax error, the string is normally 6136 `"syntax error"'. 6137 6138 If you invoke the directive `%error-verbose' in the Bison 6139 declarations section (*note The Bison Declarations Section: Bison 6140 Declarations.), then Bison provides a more verbose and specific error 6141 message string instead of just plain `"syntax error"'. However, that 6142 message sometimes contains incorrect information if LAC is not enabled 6143 (*note LAC::). 6144 6145 The parser can detect one other kind of error: memory exhaustion. 6146 This can happen when the input contains constructions that are very 6147 deeply nested. It isn't likely you will encounter this, since the Bison 6148 parser normally extends its stack automatically up to a very large 6149 limit. But if memory is exhausted, `yyparse' calls `yyerror' in the 6150 usual fashion, except that the argument string is `"memory exhausted"'. 6151 6152 In some cases diagnostics like `"syntax error"' are translated 6153 automatically from English to some other language before they are 6154 passed to `yyerror'. *Note Internationalization::. 6155 6156 The following definition suffices in simple programs: 6157 6158 void 6159 yyerror (char const *s) 6160 { 6161 fprintf (stderr, "%s\n", s); 6162 } 6163 6164 After `yyerror' returns to `yyparse', the latter will attempt error 6165 recovery if you have written suitable error recovery grammar rules 6166 (*note Error Recovery::). If recovery is impossible, `yyparse' will 6167 immediately return 1. 6168 6169 Obviously, in location tracking pure parsers, `yyerror' should have 6170 an access to the current location. With `%define api.pure', this is 6171 indeed the case for the GLR parsers, but not for the Yacc parser, for 6172 historical reasons, and this is the why `%define api.pure full' should 6173 be prefered over `%define api.pure'. 6174 6175 When `%locations %define api.pure full' is used, `yyerror' has the 6176 following signature: 6177 6178 void yyerror (YYLTYPE *locp, char const *msg); 6179 6180 The prototypes are only indications of how the code produced by Bison 6181 uses `yyerror'. Bison-generated code always ignores the returned 6182 value, so `yyerror' can return any type, including `void'. Also, 6183 `yyerror' can be a variadic function; that is why the message is always 6184 passed last. 6185 6186 Traditionally `yyerror' returns an `int' that is always ignored, but 6187 this is purely for historical reasons, and `void' is preferable since 6188 it more accurately describes the return type for `yyerror'. 6189 6190 The variable `yynerrs' contains the number of syntax errors reported 6191 so far. Normally this variable is global; but if you request a pure 6192 parser (*note A Pure (Reentrant) Parser: Pure Decl.) then it is a 6193 local variable which only the actions can access. 6194 6195 6196 File: bison.info, Node: Action Features, Next: Internationalization, Prev: Error Reporting, Up: Interface 6197 6198 4.8 Special Features for Use in Actions 6199 ======================================= 6200 6201 Here is a table of Bison constructs, variables and macros that are 6202 useful in actions. 6203 6204 -- Variable: $$ 6205 Acts like a variable that contains the semantic value for the 6206 grouping made by the current rule. *Note Actions::. 6207 6208 -- Variable: $N 6209 Acts like a variable that contains the semantic value for the Nth 6210 component of the current rule. *Note Actions::. 6211 6212 -- Variable: $<TYPEALT>$ 6213 Like `$$' but specifies alternative TYPEALT in the union specified 6214 by the `%union' declaration. *Note Data Types of Values in 6215 Actions: Action Types. 6216 6217 -- Variable: $<TYPEALT>N 6218 Like `$N' but specifies alternative TYPEALT in the union specified 6219 by the `%union' declaration. *Note Data Types of Values in 6220 Actions: Action Types. 6221 6222 -- Macro: YYABORT `;' 6223 Return immediately from `yyparse', indicating failure. *Note The 6224 Parser Function `yyparse': Parser Function. 6225 6226 -- Macro: YYACCEPT `;' 6227 Return immediately from `yyparse', indicating success. *Note The 6228 Parser Function `yyparse': Parser Function. 6229 6230 -- Macro: YYBACKUP (TOKEN, VALUE)`;' 6231 Unshift a token. This macro is allowed only for rules that reduce 6232 a single value, and only when there is no lookahead token. It is 6233 also disallowed in GLR parsers. It installs a lookahead token 6234 with token type TOKEN and semantic value VALUE; then it discards 6235 the value that was going to be reduced by this rule. 6236 6237 If the macro is used when it is not valid, such as when there is a 6238 lookahead token already, then it reports a syntax error with a 6239 message `cannot back up' and performs ordinary error recovery. 6240 6241 In either case, the rest of the action is not executed. 6242 6243 -- Macro: YYEMPTY 6244 Value stored in `yychar' when there is no lookahead token. 6245 6246 -- Macro: YYEOF 6247 Value stored in `yychar' when the lookahead is the end of the input 6248 stream. 6249 6250 -- Macro: YYERROR `;' 6251 Cause an immediate syntax error. This statement initiates error 6252 recovery just as if the parser itself had detected an error; 6253 however, it does not call `yyerror', and does not print any 6254 message. If you want to print an error message, call `yyerror' 6255 explicitly before the `YYERROR;' statement. *Note Error 6256 Recovery::. 6257 6258 -- Macro: YYRECOVERING 6259 The expression `YYRECOVERING ()' yields 1 when the parser is 6260 recovering from a syntax error, and 0 otherwise. *Note Error 6261 Recovery::. 6262 6263 -- Variable: yychar 6264 Variable containing either the lookahead token, or `YYEOF' when the 6265 lookahead is the end of the input stream, or `YYEMPTY' when no 6266 lookahead has been performed so the next token is not yet known. 6267 Do not modify `yychar' in a deferred semantic action (*note GLR 6268 Semantic Actions::). *Note Lookahead Tokens: Lookahead. 6269 6270 -- Macro: yyclearin `;' 6271 Discard the current lookahead token. This is useful primarily in 6272 error rules. Do not invoke `yyclearin' in a deferred semantic 6273 action (*note GLR Semantic Actions::). *Note Error Recovery::. 6274 6275 -- Macro: yyerrok `;' 6276 Resume generating error messages immediately for subsequent syntax 6277 errors. This is useful primarily in error rules. *Note Error 6278 Recovery::. 6279 6280 -- Variable: yylloc 6281 Variable containing the lookahead token location when `yychar' is 6282 not set to `YYEMPTY' or `YYEOF'. Do not modify `yylloc' in a 6283 deferred semantic action (*note GLR Semantic Actions::). *Note 6284 Actions and Locations: Actions and Locations. 6285 6286 -- Variable: yylval 6287 Variable containing the lookahead token semantic value when 6288 `yychar' is not set to `YYEMPTY' or `YYEOF'. Do not modify 6289 `yylval' in a deferred semantic action (*note GLR Semantic 6290 Actions::). *Note Actions: Actions. 6291 6292 -- Value: @$ 6293 Acts like a structure variable containing information on the 6294 textual location of the grouping made by the current rule. *Note 6295 Tracking Locations::. 6296 6297 6298 -- Value: @N 6299 Acts like a structure variable containing information on the 6300 textual location of the Nth component of the current rule. *Note 6301 Tracking Locations::. 6302 6303 6304 File: bison.info, Node: Internationalization, Prev: Action Features, Up: Interface 6305 6306 4.9 Parser Internationalization 6307 =============================== 6308 6309 A Bison-generated parser can print diagnostics, including error and 6310 tracing messages. By default, they appear in English. However, Bison 6311 also supports outputting diagnostics in the user's native language. To 6312 make this work, the user should set the usual environment variables. 6313 *Note The User's View: (gettext)Users. For example, the shell command 6314 `export LC_ALL=fr_CA.UTF-8' might set the user's locale to French 6315 Canadian using the UTF-8 encoding. The exact set of available locales 6316 depends on the user's installation. 6317 6318 The maintainer of a package that uses a Bison-generated parser 6319 enables the internationalization of the parser's output through the 6320 following steps. Here we assume a package that uses GNU Autoconf and 6321 GNU Automake. 6322 6323 1. Into the directory containing the GNU Autoconf macros used by the 6324 package --often called `m4'-- copy the `bison-i18n.m4' file 6325 installed by Bison under `share/aclocal/bison-i18n.m4' in Bison's 6326 installation directory. For example: 6327 6328 cp /usr/local/share/aclocal/bison-i18n.m4 m4/bison-i18n.m4 6329 6330 2. In the top-level `configure.ac', after the `AM_GNU_GETTEXT' 6331 invocation, add an invocation of `BISON_I18N'. This macro is 6332 defined in the file `bison-i18n.m4' that you copied earlier. It 6333 causes `configure' to find the value of the `BISON_LOCALEDIR' 6334 variable, and it defines the source-language symbol `YYENABLE_NLS' 6335 to enable translations in the Bison-generated parser. 6336 6337 3. In the `main' function of your program, designate the directory 6338 containing Bison's runtime message catalog, through a call to 6339 `bindtextdomain' with domain name `bison-runtime'. For example: 6340 6341 bindtextdomain ("bison-runtime", BISON_LOCALEDIR); 6342 6343 Typically this appears after any other call `bindtextdomain 6344 (PACKAGE, LOCALEDIR)' that your package already has. Here we rely 6345 on `BISON_LOCALEDIR' to be defined as a string through the 6346 `Makefile'. 6347 6348 4. In the `Makefile.am' that controls the compilation of the `main' 6349 function, make `BISON_LOCALEDIR' available as a C preprocessor 6350 macro, either in `DEFS' or in `AM_CPPFLAGS'. For example: 6351 6352 DEFS = @DEFS@ -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"' 6353 6354 or: 6355 6356 AM_CPPFLAGS = -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"' 6357 6358 5. Finally, invoke the command `autoreconf' to generate the build 6359 infrastructure. 6360 6361 6362 File: bison.info, Node: Algorithm, Next: Error Recovery, Prev: Interface, Up: Top 6363 6364 5 The Bison Parser Algorithm 6365 **************************** 6366 6367 As Bison reads tokens, it pushes them onto a stack along with their 6368 semantic values. The stack is called the "parser stack". Pushing a 6369 token is traditionally called "shifting". 6370 6371 For example, suppose the infix calculator has read `1 + 5 *', with a 6372 `3' to come. The stack will have four elements, one for each token 6373 that was shifted. 6374 6375 But the stack does not always have an element for each token read. 6376 When the last N tokens and groupings shifted match the components of a 6377 grammar rule, they can be combined according to that rule. This is 6378 called "reduction". Those tokens and groupings are replaced on the 6379 stack by a single grouping whose symbol is the result (left hand side) 6380 of that rule. Running the rule's action is part of the process of 6381 reduction, because this is what computes the semantic value of the 6382 resulting grouping. 6383 6384 For example, if the infix calculator's parser stack contains this: 6385 6386 1 + 5 * 3 6387 6388 and the next input token is a newline character, then the last three 6389 elements can be reduced to 15 via the rule: 6390 6391 expr: expr '*' expr; 6392 6393 Then the stack contains just these three elements: 6394 6395 1 + 15 6396 6397 At this point, another reduction can be made, resulting in the single 6398 value 16. Then the newline token can be shifted. 6399 6400 The parser tries, by shifts and reductions, to reduce the entire 6401 input down to a single grouping whose symbol is the grammar's 6402 start-symbol (*note Languages and Context-Free Grammars: Language and 6403 Grammar.). 6404 6405 This kind of parser is known in the literature as a bottom-up parser. 6406 6407 * Menu: 6408 6409 * Lookahead:: Parser looks one token ahead when deciding what to do. 6410 * Shift/Reduce:: Conflicts: when either shifting or reduction is valid. 6411 * Precedence:: Operator precedence works by resolving conflicts. 6412 * Contextual Precedence:: When an operator's precedence depends on context. 6413 * Parser States:: The parser is a finite-state-machine with stack. 6414 * Reduce/Reduce:: When two rules are applicable in the same situation. 6415 * Mysterious Conflicts:: Conflicts that look unjustified. 6416 * Tuning LR:: How to tune fundamental aspects of LR-based parsing. 6417 * Generalized LR Parsing:: Parsing arbitrary context-free grammars. 6418 * Memory Management:: What happens when memory is exhausted. How to avoid it. 6419 6420 6421 File: bison.info, Node: Lookahead, Next: Shift/Reduce, Up: Algorithm 6422 6423 5.1 Lookahead Tokens 6424 ==================== 6425 6426 The Bison parser does _not_ always reduce immediately as soon as the 6427 last N tokens and groupings match a rule. This is because such a 6428 simple strategy is inadequate to handle most languages. Instead, when a 6429 reduction is possible, the parser sometimes "looks ahead" at the next 6430 token in order to decide what to do. 6431 6432 When a token is read, it is not immediately shifted; first it 6433 becomes the "lookahead token", which is not on the stack. Now the 6434 parser can perform one or more reductions of tokens and groupings on 6435 the stack, while the lookahead token remains off to the side. When no 6436 more reductions should take place, the lookahead token is shifted onto 6437 the stack. This does not mean that all possible reductions have been 6438 done; depending on the token type of the lookahead token, some rules 6439 may choose to delay their application. 6440 6441 Here is a simple case where lookahead is needed. These three rules 6442 define expressions which contain binary addition operators and postfix 6443 unary factorial operators (`!'), and allow parentheses for grouping. 6444 6445 expr: 6446 term '+' expr 6447 | term 6448 ; 6449 6450 term: 6451 '(' expr ')' 6452 | term '!' 6453 | "number" 6454 ; 6455 6456 Suppose that the tokens `1 + 2' have been read and shifted; what 6457 should be done? If the following token is `)', then the first three 6458 tokens must be reduced to form an `expr'. This is the only valid 6459 course, because shifting the `)' would produce a sequence of symbols 6460 `term ')'', and no rule allows this. 6461 6462 If the following token is `!', then it must be shifted immediately so 6463 that `2 !' can be reduced to make a `term'. If instead the parser were 6464 to reduce before shifting, `1 + 2' would become an `expr'. It would 6465 then be impossible to shift the `!' because doing so would produce on 6466 the stack the sequence of symbols `expr '!''. No rule allows that 6467 sequence. 6468 6469 The lookahead token is stored in the variable `yychar'. Its 6470 semantic value and location, if any, are stored in the variables 6471 `yylval' and `yylloc'. *Note Special Features for Use in Actions: 6472 Action Features. 6473 6474 6475 File: bison.info, Node: Shift/Reduce, Next: Precedence, Prev: Lookahead, Up: Algorithm 6476 6477 5.2 Shift/Reduce Conflicts 6478 ========================== 6479 6480 Suppose we are parsing a language which has if-then and if-then-else 6481 statements, with a pair of rules like this: 6482 6483 if_stmt: 6484 "if" expr "then" stmt 6485 | "if" expr "then" stmt "else" stmt 6486 ; 6487 6488 Here `"if"', `"then"' and `"else"' are terminal symbols for specific 6489 keyword tokens. 6490 6491 When the `"else"' token is read and becomes the lookahead token, the 6492 contents of the stack (assuming the input is valid) are just right for 6493 reduction by the first rule. But it is also legitimate to shift the 6494 `"else"', because that would lead to eventual reduction by the second 6495 rule. 6496 6497 This situation, where either a shift or a reduction would be valid, 6498 is called a "shift/reduce conflict". Bison is designed to resolve 6499 these conflicts by choosing to shift, unless otherwise directed by 6500 operator precedence declarations. To see the reason for this, let's 6501 contrast it with the other alternative. 6502 6503 Since the parser prefers to shift the `"else"', the result is to 6504 attach the else-clause to the innermost if-statement, making these two 6505 inputs equivalent: 6506 6507 if x then if y then win; else lose; 6508 6509 if x then do; if y then win; else lose; end; 6510 6511 But if the parser chose to reduce when possible rather than shift, 6512 the result would be to attach the else-clause to the outermost 6513 if-statement, making these two inputs equivalent: 6514 6515 if x then if y then win; else lose; 6516 6517 if x then do; if y then win; end; else lose; 6518 6519 The conflict exists because the grammar as written is ambiguous: 6520 either parsing of the simple nested if-statement is legitimate. The 6521 established convention is that these ambiguities are resolved by 6522 attaching the else-clause to the innermost if-statement; this is what 6523 Bison accomplishes by choosing to shift rather than reduce. (It would 6524 ideally be cleaner to write an unambiguous grammar, but that is very 6525 hard to do in this case.) This particular ambiguity was first 6526 encountered in the specifications of Algol 60 and is called the 6527 "dangling `else'" ambiguity. 6528 6529 To avoid warnings from Bison about predictable, legitimate 6530 shift/reduce conflicts, you can use the `%expect N' declaration. There 6531 will be no warning as long as the number of shift/reduce conflicts is 6532 exactly N, and Bison will report an error if there is a different 6533 number. *Note Suppressing Conflict Warnings: Expect Decl. However, we 6534 don't recommend the use of `%expect' (except `%expect 0'!), as an equal 6535 number of conflicts does not mean that they are the _same_. When 6536 possible, you should rather use precedence directives to _fix_ the 6537 conflicts explicitly (*note Using Precedence For Non Operators: Non 6538 Operators.). 6539 6540 The definition of `if_stmt' above is solely to blame for the 6541 conflict, but the conflict does not actually appear without additional 6542 rules. Here is a complete Bison grammar file that actually manifests 6543 the conflict: 6544 6545 %% 6546 stmt: 6547 expr 6548 | if_stmt 6549 ; 6550 6551 if_stmt: 6552 "if" expr "then" stmt 6553 | "if" expr "then" stmt "else" stmt 6554 ; 6555 6556 expr: 6557 "identifier" 6558 ; 6559 6560 6561 File: bison.info, Node: Precedence, Next: Contextual Precedence, Prev: Shift/Reduce, Up: Algorithm 6562 6563 5.3 Operator Precedence 6564 ======================= 6565 6566 Another situation where shift/reduce conflicts appear is in arithmetic 6567 expressions. Here shifting is not always the preferred resolution; the 6568 Bison declarations for operator precedence allow you to specify when to 6569 shift and when to reduce. 6570 6571 * Menu: 6572 6573 * Why Precedence:: An example showing why precedence is needed. 6574 * Using Precedence:: How to specify precedence in Bison grammars. 6575 * Precedence Examples:: How these features are used in the previous example. 6576 * How Precedence:: How they work. 6577 * Non Operators:: Using precedence for general conflicts. 6578 6579 6580 File: bison.info, Node: Why Precedence, Next: Using Precedence, Up: Precedence 6581 6582 5.3.1 When Precedence is Needed 6583 ------------------------------- 6584 6585 Consider the following ambiguous grammar fragment (ambiguous because the 6586 input `1 - 2 * 3' can be parsed in two different ways): 6587 6588 expr: 6589 expr '-' expr 6590 | expr '*' expr 6591 | expr '<' expr 6592 | '(' expr ')' 6593 ... 6594 ; 6595 6596 Suppose the parser has seen the tokens `1', `-' and `2'; should it 6597 reduce them via the rule for the subtraction operator? It depends on 6598 the next token. Of course, if the next token is `)', we must reduce; 6599 shifting is invalid because no single rule can reduce the token 6600 sequence `- 2 )' or anything starting with that. But if the next token 6601 is `*' or `<', we have a choice: either shifting or reduction would 6602 allow the parse to complete, but with different results. 6603 6604 To decide which one Bison should do, we must consider the results. 6605 If the next operator token OP is shifted, then it must be reduced first 6606 in order to permit another opportunity to reduce the difference. The 6607 result is (in effect) `1 - (2 OP 3)'. On the other hand, if the 6608 subtraction is reduced before shifting OP, the result is 6609 `(1 - 2) OP 3'. Clearly, then, the choice of shift or reduce should 6610 depend on the relative precedence of the operators `-' and OP: `*' 6611 should be shifted first, but not `<'. 6612 6613 What about input such as `1 - 2 - 5'; should this be `(1 - 2) - 5' 6614 or should it be `1 - (2 - 5)'? For most operators we prefer the 6615 former, which is called "left association". The latter alternative, 6616 "right association", is desirable for assignment operators. The choice 6617 of left or right association is a matter of whether the parser chooses 6618 to shift or reduce when the stack contains `1 - 2' and the lookahead 6619 token is `-': shifting makes right-associativity. 6620 6621 6622 File: bison.info, Node: Using Precedence, Next: Precedence Examples, Prev: Why Precedence, Up: Precedence 6623 6624 5.3.2 Specifying Operator Precedence 6625 ------------------------------------ 6626 6627 Bison allows you to specify these choices with the operator precedence 6628 declarations `%left' and `%right'. Each such declaration contains a 6629 list of tokens, which are operators whose precedence and associativity 6630 is being declared. The `%left' declaration makes all those operators 6631 left-associative and the `%right' declaration makes them 6632 right-associative. A third alternative is `%nonassoc', which declares 6633 that it is a syntax error to find the same operator twice "in a row". 6634 6635 The relative precedence of different operators is controlled by the 6636 order in which they are declared. The first `%left' or `%right' 6637 declaration in the file declares the operators whose precedence is 6638 lowest, the next such declaration declares the operators whose 6639 precedence is a little higher, and so on. 6640 6641 6642 File: bison.info, Node: Precedence Examples, Next: How Precedence, Prev: Using Precedence, Up: Precedence 6643 6644 5.3.3 Precedence Examples 6645 ------------------------- 6646 6647 In our example, we would want the following declarations: 6648 6649 %left '<' 6650 %left '-' 6651 %left '*' 6652 6653 In a more complete example, which supports other operators as well, 6654 we would declare them in groups of equal precedence. For example, 6655 `'+'' is declared with `'-'': 6656 6657 %left '<' '>' '=' "!=" "<=" ">=" 6658 %left '+' '-' 6659 %left '*' '/' 6660 6661 6662 File: bison.info, Node: How Precedence, Next: Non Operators, Prev: Precedence Examples, Up: Precedence 6663 6664 5.3.4 How Precedence Works 6665 -------------------------- 6666 6667 The first effect of the precedence declarations is to assign precedence 6668 levels to the terminal symbols declared. The second effect is to assign 6669 precedence levels to certain rules: each rule gets its precedence from 6670 the last terminal symbol mentioned in the components. (You can also 6671 specify explicitly the precedence of a rule. *Note Context-Dependent 6672 Precedence: Contextual Precedence.) 6673 6674 Finally, the resolution of conflicts works by comparing the 6675 precedence of the rule being considered with that of the lookahead 6676 token. If the token's precedence is higher, the choice is to shift. 6677 If the rule's precedence is higher, the choice is to reduce. If they 6678 have equal precedence, the choice is made based on the associativity of 6679 that precedence level. The verbose output file made by `-v' (*note 6680 Invoking Bison: Invocation.) says how each conflict was resolved. 6681 6682 Not all rules and not all tokens have precedence. If either the 6683 rule or the lookahead token has no precedence, then the default is to 6684 shift. 6685 6686 6687 File: bison.info, Node: Non Operators, Prev: How Precedence, Up: Precedence 6688 6689 5.3.5 Using Precedence For Non Operators 6690 ---------------------------------------- 6691 6692 Using properly precedence and associativity directives can help fixing 6693 shift/reduce conflicts that do not involve arithmetics-like operators. 6694 For instance, the "dangling `else'" problem (*note Shift/Reduce 6695 Conflicts: Shift/Reduce.) can be solved elegantly in two different ways. 6696 6697 In the present case, the conflict is between the token `"else"' 6698 willing to be shifted, and the rule `if_stmt: "if" expr "then" stmt', 6699 asking for reduction. By default, the precedence of a rule is that of 6700 its last token, here `"then"', so the conflict will be solved 6701 appropriately by giving `"else"' a precedence higher than that of 6702 `"then"', for instance as follows: 6703 6704 %nonassoc "then" 6705 %nonassoc "else" 6706 6707 Alternatively, you may give both tokens the same precedence, in 6708 which case associativity is used to solve the conflict. To preserve 6709 the shift action, use right associativity: 6710 6711 %right "then" "else" 6712 6713 Neither solution is perfect however. Since Bison does not provide, 6714 so far, support for "scoped" precedence, both force you to declare the 6715 precedence of these keywords with respect to the other operators your 6716 grammar. Therefore, instead of being warned about new conflicts you 6717 would be unaware of (e.g., a shift/reduce conflict due to `if test then 6718 1 else 2 + 3' being ambiguous: `if test then 1 else (2 + 3)' or `(if 6719 test then 1 else 2) + 3'?), the conflict will be already "fixed". 6720 6721 6722 File: bison.info, Node: Contextual Precedence, Next: Parser States, Prev: Precedence, Up: Algorithm 6723 6724 5.4 Context-Dependent Precedence 6725 ================================ 6726 6727 Often the precedence of an operator depends on the context. This sounds 6728 outlandish at first, but it is really very common. For example, a minus 6729 sign typically has a very high precedence as a unary operator, and a 6730 somewhat lower precedence (lower than multiplication) as a binary 6731 operator. 6732 6733 The Bison precedence declarations, `%left', `%right' and 6734 `%nonassoc', can only be used once for a given token; so a token has 6735 only one precedence declared in this way. For context-dependent 6736 precedence, you need to use an additional mechanism: the `%prec' 6737 modifier for rules. 6738 6739 The `%prec' modifier declares the precedence of a particular rule by 6740 specifying a terminal symbol whose precedence should be used for that 6741 rule. It's not necessary for that symbol to appear otherwise in the 6742 rule. The modifier's syntax is: 6743 6744 %prec TERMINAL-SYMBOL 6745 6746 and it is written after the components of the rule. Its effect is to 6747 assign the rule the precedence of TERMINAL-SYMBOL, overriding the 6748 precedence that would be deduced for it in the ordinary way. The 6749 altered rule precedence then affects how conflicts involving that rule 6750 are resolved (*note Operator Precedence: Precedence.). 6751 6752 Here is how `%prec' solves the problem of unary minus. First, 6753 declare a precedence for a fictitious terminal symbol named `UMINUS'. 6754 There are no tokens of this type, but the symbol serves to stand for its 6755 precedence: 6756 6757 ... 6758 %left '+' '-' 6759 %left '*' 6760 %left UMINUS 6761 6762 Now the precedence of `UMINUS' can be used in specific rules: 6763 6764 exp: 6765 ... 6766 | exp '-' exp 6767 ... 6768 | '-' exp %prec UMINUS 6769 6770 6771 File: bison.info, Node: Parser States, Next: Reduce/Reduce, Prev: Contextual Precedence, Up: Algorithm 6772 6773 5.5 Parser States 6774 ================= 6775 6776 The function `yyparse' is implemented using a finite-state machine. 6777 The values pushed on the parser stack are not simply token type codes; 6778 they represent the entire sequence of terminal and nonterminal symbols 6779 at or near the top of the stack. The current state collects all the 6780 information about previous input which is relevant to deciding what to 6781 do next. 6782 6783 Each time a lookahead token is read, the current parser state 6784 together with the type of lookahead token are looked up in a table. 6785 This table entry can say, "Shift the lookahead token." In this case, 6786 it also specifies the new parser state, which is pushed onto the top of 6787 the parser stack. Or it can say, "Reduce using rule number N." This 6788 means that a certain number of tokens or groupings are taken off the 6789 top of the stack, and replaced by one grouping. In other words, that 6790 number of states are popped from the stack, and one new state is pushed. 6791 6792 There is one other alternative: the table can say that the lookahead 6793 token is erroneous in the current state. This causes error processing 6794 to begin (*note Error Recovery::). 6795 6796 6797 File: bison.info, Node: Reduce/Reduce, Next: Mysterious Conflicts, Prev: Parser States, Up: Algorithm 6798 6799 5.6 Reduce/Reduce Conflicts 6800 =========================== 6801 6802 A reduce/reduce conflict occurs if there are two or more rules that 6803 apply to the same sequence of input. This usually indicates a serious 6804 error in the grammar. 6805 6806 For example, here is an erroneous attempt to define a sequence of 6807 zero or more `word' groupings. 6808 6809 sequence: 6810 /* empty */ { printf ("empty sequence\n"); } 6811 | maybeword 6812 | sequence word { printf ("added word %s\n", $2); } 6813 ; 6814 6815 maybeword: 6816 /* empty */ { printf ("empty maybeword\n"); } 6817 | word { printf ("single word %s\n", $1); } 6818 ; 6819 6820 The error is an ambiguity: there is more than one way to parse a single 6821 `word' into a `sequence'. It could be reduced to a `maybeword' and 6822 then into a `sequence' via the second rule. Alternatively, 6823 nothing-at-all could be reduced into a `sequence' via the first rule, 6824 and this could be combined with the `word' using the third rule for 6825 `sequence'. 6826 6827 There is also more than one way to reduce nothing-at-all into a 6828 `sequence'. This can be done directly via the first rule, or 6829 indirectly via `maybeword' and then the second rule. 6830 6831 You might think that this is a distinction without a difference, 6832 because it does not change whether any particular input is valid or 6833 not. But it does affect which actions are run. One parsing order runs 6834 the second rule's action; the other runs the first rule's action and 6835 the third rule's action. In this example, the output of the program 6836 changes. 6837 6838 Bison resolves a reduce/reduce conflict by choosing to use the rule 6839 that appears first in the grammar, but it is very risky to rely on 6840 this. Every reduce/reduce conflict must be studied and usually 6841 eliminated. Here is the proper way to define `sequence': 6842 6843 sequence: 6844 /* empty */ { printf ("empty sequence\n"); } 6845 | sequence word { printf ("added word %s\n", $2); } 6846 ; 6847 6848 Here is another common error that yields a reduce/reduce conflict: 6849 6850 sequence: 6851 /* empty */ 6852 | sequence words 6853 | sequence redirects 6854 ; 6855 6856 words: 6857 /* empty */ 6858 | words word 6859 ; 6860 6861 redirects: 6862 /* empty */ 6863 | redirects redirect 6864 ; 6865 6866 The intention here is to define a sequence which can contain either 6867 `word' or `redirect' groupings. The individual definitions of 6868 `sequence', `words' and `redirects' are error-free, but the three 6869 together make a subtle ambiguity: even an empty input can be parsed in 6870 infinitely many ways! 6871 6872 Consider: nothing-at-all could be a `words'. Or it could be two 6873 `words' in a row, or three, or any number. It could equally well be a 6874 `redirects', or two, or any number. Or it could be a `words' followed 6875 by three `redirects' and another `words'. And so on. 6876 6877 Here are two ways to correct these rules. First, to make it a 6878 single level of sequence: 6879 6880 sequence: 6881 /* empty */ 6882 | sequence word 6883 | sequence redirect 6884 ; 6885 6886 Second, to prevent either a `words' or a `redirects' from being 6887 empty: 6888 6889 sequence: 6890 /* empty */ 6891 | sequence words 6892 | sequence redirects 6893 ; 6894 6895 words: 6896 word 6897 | words word 6898 ; 6899 6900 redirects: 6901 redirect 6902 | redirects redirect 6903 ; 6904 6905 Yet this proposal introduces another kind of ambiguity! The input 6906 `word word' can be parsed as a single `words' composed of two `word's, 6907 or as two one-`word' `words' (and likewise for `redirect'/`redirects'). 6908 However this ambiguity is now a shift/reduce conflict, and therefore it 6909 can now be addressed with precedence directives. 6910 6911 To simplify the matter, we will proceed with `word' and `redirect' 6912 being tokens: `"word"' and `"redirect"'. 6913 6914 To prefer the longest `words', the conflict between the token 6915 `"word"' and the rule `sequence: sequence words' must be resolved as a 6916 shift. To this end, we use the same techniques as exposed above, see 6917 *note Using Precedence For Non Operators: Non Operators. One solution 6918 relies on precedences: use `%prec' to give a lower precedence to the 6919 rule: 6920 6921 %nonassoc "word" 6922 %nonassoc "sequence" 6923 %% 6924 sequence: 6925 /* empty */ 6926 | sequence word %prec "sequence" 6927 | sequence redirect %prec "sequence" 6928 ; 6929 6930 words: 6931 word 6932 | words "word" 6933 ; 6934 6935 Another solution relies on associativity: provide both the token and 6936 the rule with the same precedence, but make them right-associative: 6937 6938 %right "word" "redirect" 6939 %% 6940 sequence: 6941 /* empty */ 6942 | sequence word %prec "word" 6943 | sequence redirect %prec "redirect" 6944 ; 6945 6946 6947 File: bison.info, Node: Mysterious Conflicts, Next: Tuning LR, Prev: Reduce/Reduce, Up: Algorithm 6948 6949 5.7 Mysterious Conflicts 6950 ======================== 6951 6952 Sometimes reduce/reduce conflicts can occur that don't look warranted. 6953 Here is an example: 6954 6955 %% 6956 def: param_spec return_spec ','; 6957 param_spec: 6958 type 6959 | name_list ':' type 6960 ; 6961 return_spec: 6962 type 6963 | name ':' type 6964 ; 6965 type: "id"; 6966 name: "id"; 6967 name_list: 6968 name 6969 | name ',' name_list 6970 ; 6971 6972 It would seem that this grammar can be parsed with only a single 6973 token of lookahead: when a `param_spec' is being read, an `"id"' is a 6974 `name' if a comma or colon follows, or a `type' if another `"id"' 6975 follows. In other words, this grammar is LR(1). 6976 6977 However, for historical reasons, Bison cannot by default handle all 6978 LR(1) grammars. In this grammar, two contexts, that after an `"id"' at 6979 the beginning of a `param_spec' and likewise at the beginning of a 6980 `return_spec', are similar enough that Bison assumes they are the same. 6981 They appear similar because the same set of rules would be active--the 6982 rule for reducing to a `name' and that for reducing to a `type'. Bison 6983 is unable to determine at that stage of processing that the rules would 6984 require different lookahead tokens in the two contexts, so it makes a 6985 single parser state for them both. Combining the two contexts causes a 6986 conflict later. In parser terminology, this occurrence means that the 6987 grammar is not LALR(1). 6988 6989 For many practical grammars (specifically those that fall into the 6990 non-LR(1) class), the limitations of LALR(1) result in difficulties 6991 beyond just mysterious reduce/reduce conflicts. The best way to fix 6992 all these problems is to select a different parser table construction 6993 algorithm. Either IELR(1) or canonical LR(1) would suffice, but the 6994 former is more efficient and easier to debug during development. *Note 6995 LR Table Construction::, for details. (Bison's IELR(1) and canonical 6996 LR(1) implementations are experimental. More user feedback will help 6997 to stabilize them.) 6998 6999 If you instead wish to work around LALR(1)'s limitations, you can 7000 often fix a mysterious conflict by identifying the two parser states 7001 that are being confused, and adding something to make them look 7002 distinct. In the above example, adding one rule to `return_spec' as 7003 follows makes the problem go away: 7004 7005 ... 7006 return_spec: 7007 type 7008 | name ':' type 7009 | "id" "bogus" /* This rule is never used. */ 7010 ; 7011 7012 This corrects the problem because it introduces the possibility of an 7013 additional active rule in the context after the `"id"' at the beginning 7014 of `return_spec'. This rule is not active in the corresponding context 7015 in a `param_spec', so the two contexts receive distinct parser states. 7016 As long as the token `"bogus"' is never generated by `yylex', the added 7017 rule cannot alter the way actual input is parsed. 7018 7019 In this particular example, there is another way to solve the 7020 problem: rewrite the rule for `return_spec' to use `"id"' directly 7021 instead of via `name'. This also causes the two confusing contexts to 7022 have different sets of active rules, because the one for `return_spec' 7023 activates the altered rule for `return_spec' rather than the one for 7024 `name'. 7025 7026 param_spec: 7027 type 7028 | name_list ':' type 7029 ; 7030 return_spec: 7031 type 7032 | "id" ':' type 7033 ; 7034 7035 For a more detailed exposition of LALR(1) parsers and parser 7036 generators, *note DeRemer 1982: Bibliography. 7037 7038 7039 File: bison.info, Node: Tuning LR, Next: Generalized LR Parsing, Prev: Mysterious Conflicts, Up: Algorithm 7040 7041 5.8 Tuning LR 7042 ============= 7043 7044 The default behavior of Bison's LR-based parsers is chosen mostly for 7045 historical reasons, but that behavior is often not robust. For 7046 example, in the previous section, we discussed the mysterious conflicts 7047 that can be produced by LALR(1), Bison's default parser table 7048 construction algorithm. Another example is Bison's `%error-verbose' 7049 directive, which instructs the generated parser to produce verbose 7050 syntax error messages, which can sometimes contain incorrect 7051 information. 7052 7053 In this section, we explore several modern features of Bison that 7054 allow you to tune fundamental aspects of the generated LR-based 7055 parsers. Some of these features easily eliminate shortcomings like 7056 those mentioned above. Others can be helpful purely for understanding 7057 your parser. 7058 7059 Most of the features discussed in this section are still 7060 experimental. More user feedback will help to stabilize them. 7061 7062 * Menu: 7063 7064 * LR Table Construction:: Choose a different construction algorithm. 7065 * Default Reductions:: Disable default reductions. 7066 * LAC:: Correct lookahead sets in the parser states. 7067 * Unreachable States:: Keep unreachable parser states for debugging. 7068 7069 7070 File: bison.info, Node: LR Table Construction, Next: Default Reductions, Up: Tuning LR 7071 7072 5.8.1 LR Table Construction 7073 --------------------------- 7074 7075 For historical reasons, Bison constructs LALR(1) parser tables by 7076 default. However, LALR does not possess the full language-recognition 7077 power of LR. As a result, the behavior of parsers employing LALR 7078 parser tables is often mysterious. We presented a simple example of 7079 this effect in *note Mysterious Conflicts::. 7080 7081 As we also demonstrated in that example, the traditional approach to 7082 eliminating such mysterious behavior is to restructure the grammar. 7083 Unfortunately, doing so correctly is often difficult. Moreover, merely 7084 discovering that LALR causes mysterious behavior in your parser can be 7085 difficult as well. 7086 7087 Fortunately, Bison provides an easy way to eliminate the possibility 7088 of such mysterious behavior altogether. You simply need to activate a 7089 more powerful parser table construction algorithm by using the `%define 7090 lr.type' directive. 7091 7092 -- Directive: %define lr.type TYPE 7093 Specify the type of parser tables within the LR(1) family. The 7094 accepted values for TYPE are: 7095 7096 * `lalr' (default) 7097 7098 * `ielr' 7099 7100 * `canonical-lr' 7101 7102 (This feature is experimental. More user feedback will help to 7103 stabilize it.) 7104 7105 For example, to activate IELR, you might add the following directive 7106 to you grammar file: 7107 7108 %define lr.type ielr 7109 7110 For the example in *note Mysterious Conflicts::, the mysterious 7111 conflict is then eliminated, so there is no need to invest time in 7112 comprehending the conflict or restructuring the grammar to fix it. If, 7113 during future development, the grammar evolves such that all mysterious 7114 behavior would have disappeared using just LALR, you need not fear that 7115 continuing to use IELR will result in unnecessarily large parser tables. 7116 That is, IELR generates LALR tables when LALR (using a deterministic 7117 parsing algorithm) is sufficient to support the full 7118 language-recognition power of LR. Thus, by enabling IELR at the start 7119 of grammar development, you can safely and completely eliminate the 7120 need to consider LALR's shortcomings. 7121 7122 While IELR is almost always preferable, there are circumstances 7123 where LALR or the canonical LR parser tables described by Knuth (*note 7124 Knuth 1965: Bibliography.) can be useful. Here we summarize the 7125 relative advantages of each parser table construction algorithm within 7126 Bison: 7127 7128 * LALR 7129 7130 There are at least two scenarios where LALR can be worthwhile: 7131 7132 * GLR without static conflict resolution. 7133 7134 When employing GLR parsers (*note GLR Parsers::), if you do 7135 not resolve any conflicts statically (for example, with 7136 `%left' or `%prec'), then the parser explores all potential 7137 parses of any given input. In this case, the choice of 7138 parser table construction algorithm is guaranteed not to alter 7139 the language accepted by the parser. LALR parser tables are 7140 the smallest parser tables Bison can currently construct, so 7141 they may then be preferable. Nevertheless, once you begin to 7142 resolve conflicts statically, GLR behaves more like a 7143 deterministic parser in the syntactic contexts where those 7144 conflicts appear, and so either IELR or canonical LR can then 7145 be helpful to avoid LALR's mysterious behavior. 7146 7147 * Malformed grammars. 7148 7149 Occasionally during development, an especially malformed 7150 grammar with a major recurring flaw may severely impede the 7151 IELR or canonical LR parser table construction algorithm. 7152 LALR can be a quick way to construct parser tables in order 7153 to investigate such problems while ignoring the more subtle 7154 differences from IELR and canonical LR. 7155 7156 * IELR 7157 7158 IELR (Inadequacy Elimination LR) is a minimal LR algorithm. That 7159 is, given any grammar (LR or non-LR), parsers using IELR or 7160 canonical LR parser tables always accept exactly the same set of 7161 sentences. However, like LALR, IELR merges parser states during 7162 parser table construction so that the number of parser states is 7163 often an order of magnitude less than for canonical LR. More 7164 importantly, because canonical LR's extra parser states may contain 7165 duplicate conflicts in the case of non-LR grammars, the number of 7166 conflicts for IELR is often an order of magnitude less as well. 7167 This effect can significantly reduce the complexity of developing 7168 a grammar. 7169 7170 * Canonical LR 7171 7172 While inefficient, canonical LR parser tables can be an 7173 interesting means to explore a grammar because they possess a 7174 property that IELR and LALR tables do not. That is, if 7175 `%nonassoc' is not used and default reductions are left disabled 7176 (*note Default Reductions::), then, for every left context of 7177 every canonical LR state, the set of tokens accepted by that state 7178 is guaranteed to be the exact set of tokens that is syntactically 7179 acceptable in that left context. It might then seem that an 7180 advantage of canonical LR parsers in production is that, under the 7181 above constraints, they are guaranteed to detect a syntax error as 7182 soon as possible without performing any unnecessary reductions. 7183 However, IELR parsers that use LAC are also able to achieve this 7184 behavior without sacrificing `%nonassoc' or default reductions. 7185 For details and a few caveats of LAC, *note LAC::. 7186 7187 For a more detailed exposition of the mysterious behavior in LALR 7188 parsers and the benefits of IELR, *note Denny 2008 March: Bibliography, 7189 and *note Denny 2010 November: Bibliography. 7190 7191 7192 File: bison.info, Node: Default Reductions, Next: LAC, Prev: LR Table Construction, Up: Tuning LR 7193 7194 5.8.2 Default Reductions 7195 ------------------------ 7196 7197 After parser table construction, Bison identifies the reduction with the 7198 largest lookahead set in each parser state. To reduce the size of the 7199 parser state, traditional Bison behavior is to remove that lookahead 7200 set and to assign that reduction to be the default parser action. Such 7201 a reduction is known as a "default reduction". 7202 7203 Default reductions affect more than the size of the parser tables. 7204 They also affect the behavior of the parser: 7205 7206 * Delayed `yylex' invocations. 7207 7208 A "consistent state" is a state that has only one possible parser 7209 action. If that action is a reduction and is encoded as a default 7210 reduction, then that consistent state is called a "defaulted 7211 state". Upon reaching a defaulted state, a Bison-generated parser 7212 does not bother to invoke `yylex' to fetch the next token before 7213 performing the reduction. In other words, whether default 7214 reductions are enabled in consistent states determines how soon a 7215 Bison-generated parser invokes `yylex' for a token: immediately 7216 when it _reaches_ that token in the input or when it eventually 7217 _needs_ that token as a lookahead to determine the next parser 7218 action. Traditionally, default reductions are enabled, and so the 7219 parser exhibits the latter behavior. 7220 7221 The presence of defaulted states is an important consideration when 7222 designing `yylex' and the grammar file. That is, if the behavior 7223 of `yylex' can influence or be influenced by the semantic actions 7224 associated with the reductions in defaulted states, then the delay 7225 of the next `yylex' invocation until after those reductions is 7226 significant. For example, the semantic actions might pop a scope 7227 stack that `yylex' uses to determine what token to return. Thus, 7228 the delay might be necessary to ensure that `yylex' does not look 7229 up the next token in a scope that should already be considered 7230 closed. 7231 7232 * Delayed syntax error detection. 7233 7234 When the parser fetches a new token by invoking `yylex', it checks 7235 whether there is an action for that token in the current parser 7236 state. The parser detects a syntax error if and only if either 7237 (1) there is no action for that token or (2) the action for that 7238 token is the error action (due to the use of `%nonassoc'). 7239 However, if there is a default reduction in that state (which 7240 might or might not be a defaulted state), then it is impossible 7241 for condition 1 to exist. That is, all tokens have an action. 7242 Thus, the parser sometimes fails to detect the syntax error until 7243 it reaches a later state. 7244 7245 While default reductions never cause the parser to accept 7246 syntactically incorrect sentences, the delay of syntax error 7247 detection can have unexpected effects on the behavior of the 7248 parser. However, the delay can be caused anyway by parser state 7249 merging and the use of `%nonassoc', and it can be fixed by another 7250 Bison feature, LAC. We discuss the effects of delayed syntax 7251 error detection and LAC more in the next section (*note LAC::). 7252 7253 For canonical LR, the only default reduction that Bison enables by 7254 default is the accept action, which appears only in the accepting 7255 state, which has no other action and is thus a defaulted state. 7256 However, the default accept action does not delay any `yylex' 7257 invocation or syntax error detection because the accept action ends the 7258 parse. 7259 7260 For LALR and IELR, Bison enables default reductions in nearly all 7261 states by default. There are only two exceptions. First, states that 7262 have a shift action on the `error' token do not have default reductions 7263 because delayed syntax error detection could then prevent the `error' 7264 token from ever being shifted in that state. However, parser state 7265 merging can cause the same effect anyway, and LAC fixes it in both 7266 cases, so future versions of Bison might drop this exception when LAC 7267 is activated. Second, GLR parsers do not record the default reduction 7268 as the action on a lookahead token for which there is a conflict. The 7269 correct action in this case is to split the parse instead. 7270 7271 To adjust which states have default reductions enabled, use the 7272 `%define lr.default-reductions' directive. 7273 7274 -- Directive: %define lr.default-reductions WHERE 7275 Specify the kind of states that are permitted to contain default 7276 reductions. The accepted values of WHERE are: 7277 * `most' (default for LALR and IELR) 7278 7279 * `consistent' 7280 7281 * `accepting' (default for canonical LR) 7282 7283 (The ability to specify where default reductions are permitted is 7284 experimental. More user feedback will help to stabilize it.) 7285 7286 7287 File: bison.info, Node: LAC, Next: Unreachable States, Prev: Default Reductions, Up: Tuning LR 7288 7289 5.8.3 LAC 7290 --------- 7291 7292 Canonical LR, IELR, and LALR can suffer from a couple of problems upon 7293 encountering a syntax error. First, the parser might perform additional 7294 parser stack reductions before discovering the syntax error. Such 7295 reductions can perform user semantic actions that are unexpected because 7296 they are based on an invalid token, and they cause error recovery to 7297 begin in a different syntactic context than the one in which the 7298 invalid token was encountered. Second, when verbose error messages are 7299 enabled (*note Error Reporting::), the expected token list in the 7300 syntax error message can both contain invalid tokens and omit valid 7301 tokens. 7302 7303 The culprits for the above problems are `%nonassoc', default 7304 reductions in inconsistent states (*note Default Reductions::), and 7305 parser state merging. Because IELR and LALR merge parser states, they 7306 suffer the most. Canonical LR can suffer only if `%nonassoc' is used 7307 or if default reductions are enabled for inconsistent states. 7308 7309 LAC (Lookahead Correction) is a new mechanism within the parsing 7310 algorithm that solves these problems for canonical LR, IELR, and LALR 7311 without sacrificing `%nonassoc', default reductions, or state merging. 7312 You can enable LAC with the `%define parse.lac' directive. 7313 7314 -- Directive: %define parse.lac VALUE 7315 Enable LAC to improve syntax error handling. 7316 * `none' (default) 7317 7318 * `full' 7319 (This feature is experimental. More user feedback will help to 7320 stabilize it. Moreover, it is currently only available for 7321 deterministic parsers in C.) 7322 7323 Conceptually, the LAC mechanism is straight-forward. Whenever the 7324 parser fetches a new token from the scanner so that it can determine 7325 the next parser action, it immediately suspends normal parsing and 7326 performs an exploratory parse using a temporary copy of the normal 7327 parser state stack. During this exploratory parse, the parser does not 7328 perform user semantic actions. If the exploratory parse reaches a 7329 shift action, normal parsing then resumes on the normal parser stacks. 7330 If the exploratory parse reaches an error instead, the parser reports a 7331 syntax error. If verbose syntax error messages are enabled, the parser 7332 must then discover the list of expected tokens, so it performs a 7333 separate exploratory parse for each token in the grammar. 7334 7335 There is one subtlety about the use of LAC. That is, when in a 7336 consistent parser state with a default reduction, the parser will not 7337 attempt to fetch a token from the scanner because no lookahead is 7338 needed to determine the next parser action. Thus, whether default 7339 reductions are enabled in consistent states (*note Default 7340 Reductions::) affects how soon the parser detects a syntax error: 7341 immediately when it _reaches_ an erroneous token or when it eventually 7342 _needs_ that token as a lookahead to determine the next parser action. 7343 The latter behavior is probably more intuitive, so Bison currently 7344 provides no way to achieve the former behavior while default reductions 7345 are enabled in consistent states. 7346 7347 Thus, when LAC is in use, for some fixed decision of whether to 7348 enable default reductions in consistent states, canonical LR and IELR 7349 behave almost exactly the same for both syntactically acceptable and 7350 syntactically unacceptable input. While LALR still does not support 7351 the full language-recognition power of canonical LR and IELR, LAC at 7352 least enables LALR's syntax error handling to correctly reflect LALR's 7353 language-recognition power. 7354 7355 There are a few caveats to consider when using LAC: 7356 7357 * Infinite parsing loops. 7358 7359 IELR plus LAC does have one shortcoming relative to canonical LR. 7360 Some parsers generated by Bison can loop infinitely. LAC does not 7361 fix infinite parsing loops that occur between encountering a 7362 syntax error and detecting it, but enabling canonical LR or 7363 disabling default reductions sometimes does. 7364 7365 * Verbose error message limitations. 7366 7367 Because of internationalization considerations, Bison-generated 7368 parsers limit the size of the expected token list they are willing 7369 to report in a verbose syntax error message. If the number of 7370 expected tokens exceeds that limit, the list is simply dropped 7371 from the message. Enabling LAC can increase the size of the list 7372 and thus cause the parser to drop it. Of course, dropping the 7373 list is better than reporting an incorrect list. 7374 7375 * Performance. 7376 7377 Because LAC requires many parse actions to be performed twice, it 7378 can have a performance penalty. However, not all parse actions 7379 must be performed twice. Specifically, during a series of default 7380 reductions in consistent states and shift actions, the parser 7381 never has to initiate an exploratory parse. Moreover, the most 7382 time-consuming tasks in a parse are often the file I/O, the 7383 lexical analysis performed by the scanner, and the user's semantic 7384 actions, but none of these are performed during the exploratory 7385 parse. Finally, the base of the temporary stack used during an 7386 exploratory parse is a pointer into the normal parser state stack 7387 so that the stack is never physically copied. In our experience, 7388 the performance penalty of LAC has proved insignificant for 7389 practical grammars. 7390 7391 While the LAC algorithm shares techniques that have been recognized 7392 in the parser community for years, for the publication that introduces 7393 LAC, *note Denny 2010 May: Bibliography. 7394 7395 7396 File: bison.info, Node: Unreachable States, Prev: LAC, Up: Tuning LR 7397 7398 5.8.4 Unreachable States 7399 ------------------------ 7400 7401 If there exists no sequence of transitions from the parser's start 7402 state to some state S, then Bison considers S to be an "unreachable 7403 state". A state can become unreachable during conflict resolution if 7404 Bison disables a shift action leading to it from a predecessor state. 7405 7406 By default, Bison removes unreachable states from the parser after 7407 conflict resolution because they are useless in the generated parser. 7408 However, keeping unreachable states is sometimes useful when trying to 7409 understand the relationship between the parser and the grammar. 7410 7411 -- Directive: %define lr.keep-unreachable-states VALUE 7412 Request that Bison allow unreachable states to remain in the 7413 parser tables. VALUE must be a Boolean. The default is `false'. 7414 7415 There are a few caveats to consider: 7416 7417 * Missing or extraneous warnings. 7418 7419 Unreachable states may contain conflicts and may use rules not 7420 used in any other state. Thus, keeping unreachable states may 7421 induce warnings that are irrelevant to your parser's behavior, and 7422 it may eliminate warnings that are relevant. Of course, the 7423 change in warnings may actually be relevant to a parser table 7424 analysis that wants to keep unreachable states, so this behavior 7425 will likely remain in future Bison releases. 7426 7427 * Other useless states. 7428 7429 While Bison is able to remove unreachable states, it is not 7430 guaranteed to remove other kinds of useless states. Specifically, 7431 when Bison disables reduce actions during conflict resolution, 7432 some goto actions may become useless, and thus some additional 7433 states may become useless. If Bison were to compute which goto 7434 actions were useless and then disable those actions, it could 7435 identify such states as unreachable and then remove those states. 7436 However, Bison does not compute which goto actions are useless. 7437 7438 7439 File: bison.info, Node: Generalized LR Parsing, Next: Memory Management, Prev: Tuning LR, Up: Algorithm 7440 7441 5.9 Generalized LR (GLR) Parsing 7442 ================================ 7443 7444 Bison produces _deterministic_ parsers that choose uniquely when to 7445 reduce and which reduction to apply based on a summary of the preceding 7446 input and on one extra token of lookahead. As a result, normal Bison 7447 handles a proper subset of the family of context-free languages. 7448 Ambiguous grammars, since they have strings with more than one possible 7449 sequence of reductions cannot have deterministic parsers in this sense. 7450 The same is true of languages that require more than one symbol of 7451 lookahead, since the parser lacks the information necessary to make a 7452 decision at the point it must be made in a shift-reduce parser. 7453 Finally, as previously mentioned (*note Mysterious Conflicts::), there 7454 are languages where Bison's default choice of how to summarize the 7455 input seen so far loses necessary information. 7456 7457 When you use the `%glr-parser' declaration in your grammar file, 7458 Bison generates a parser that uses a different algorithm, called 7459 Generalized LR (or GLR). A Bison GLR parser uses the same basic 7460 algorithm for parsing as an ordinary Bison parser, but behaves 7461 differently in cases where there is a shift-reduce conflict that has not 7462 been resolved by precedence rules (*note Precedence::) or a 7463 reduce-reduce conflict. When a GLR parser encounters such a situation, 7464 it effectively _splits_ into a several parsers, one for each possible 7465 shift or reduction. These parsers then proceed as usual, consuming 7466 tokens in lock-step. Some of the stacks may encounter other conflicts 7467 and split further, with the result that instead of a sequence of states, 7468 a Bison GLR parsing stack is what is in effect a tree of states. 7469 7470 In effect, each stack represents a guess as to what the proper parse 7471 is. Additional input may indicate that a guess was wrong, in which case 7472 the appropriate stack silently disappears. Otherwise, the semantics 7473 actions generated in each stack are saved, rather than being executed 7474 immediately. When a stack disappears, its saved semantic actions never 7475 get executed. When a reduction causes two stacks to become equivalent, 7476 their sets of semantic actions are both saved with the state that 7477 results from the reduction. We say that two stacks are equivalent when 7478 they both represent the same sequence of states, and each pair of 7479 corresponding states represents a grammar symbol that produces the same 7480 segment of the input token stream. 7481 7482 Whenever the parser makes a transition from having multiple states 7483 to having one, it reverts to the normal deterministic parsing 7484 algorithm, after resolving and executing the saved-up actions. At this 7485 transition, some of the states on the stack will have semantic values 7486 that are sets (actually multisets) of possible actions. The parser 7487 tries to pick one of the actions by first finding one whose rule has 7488 the highest dynamic precedence, as set by the `%dprec' declaration. 7489 Otherwise, if the alternative actions are not ordered by precedence, 7490 but there the same merging function is declared for both rules by the 7491 `%merge' declaration, Bison resolves and evaluates both and then calls 7492 the merge function on the result. Otherwise, it reports an ambiguity. 7493 7494 It is possible to use a data structure for the GLR parsing tree that 7495 permits the processing of any LR(1) grammar in linear time (in the size 7496 of the input), any unambiguous (not necessarily LR(1)) grammar in 7497 quadratic worst-case time, and any general (possibly ambiguous) 7498 context-free grammar in cubic worst-case time. However, Bison currently 7499 uses a simpler data structure that requires time proportional to the 7500 length of the input times the maximum number of stacks required for any 7501 prefix of the input. Thus, really ambiguous or nondeterministic 7502 grammars can require exponential time and space to process. Such badly 7503 behaving examples, however, are not generally of practical interest. 7504 Usually, nondeterminism in a grammar is local--the parser is "in doubt" 7505 only for a few tokens at a time. Therefore, the current data structure 7506 should generally be adequate. On LR(1) portions of a grammar, in 7507 particular, it is only slightly slower than with the deterministic 7508 LR(1) Bison parser. 7509 7510 For a more detailed exposition of GLR parsers, *note Scott 2000: 7511 Bibliography. 7512 7513 7514 File: bison.info, Node: Memory Management, Prev: Generalized LR Parsing, Up: Algorithm 7515 7516 5.10 Memory Management, and How to Avoid Memory Exhaustion 7517 ========================================================== 7518 7519 The Bison parser stack can run out of memory if too many tokens are 7520 shifted and not reduced. When this happens, the parser function 7521 `yyparse' calls `yyerror' and then returns 2. 7522 7523 Because Bison parsers have growing stacks, hitting the upper limit 7524 usually results from using a right recursion instead of a left 7525 recursion, see *note Recursive Rules: Recursion. 7526 7527 By defining the macro `YYMAXDEPTH', you can control how deep the 7528 parser stack can become before memory is exhausted. Define the macro 7529 with a value that is an integer. This value is the maximum number of 7530 tokens that can be shifted (and not reduced) before overflow. 7531 7532 The stack space allowed is not necessarily allocated. If you 7533 specify a large value for `YYMAXDEPTH', the parser normally allocates a 7534 small stack at first, and then makes it bigger by stages as needed. 7535 This increasing allocation happens automatically and silently. 7536 Therefore, you do not need to make `YYMAXDEPTH' painfully small merely 7537 to save space for ordinary inputs that do not need much stack. 7538 7539 However, do not allow `YYMAXDEPTH' to be a value so large that 7540 arithmetic overflow could occur when calculating the size of the stack 7541 space. Also, do not allow `YYMAXDEPTH' to be less than `YYINITDEPTH'. 7542 7543 The default value of `YYMAXDEPTH', if you do not define it, is 10000. 7544 7545 You can control how much stack is allocated initially by defining the 7546 macro `YYINITDEPTH' to a positive integer. For the deterministic 7547 parser in C, this value must be a compile-time constant unless you are 7548 assuming C99 or some other target language or compiler that allows 7549 variable-length arrays. The default is 200. 7550 7551 Do not allow `YYINITDEPTH' to be greater than `YYMAXDEPTH'. 7552 7553 Because of semantic differences between C and C++, the deterministic 7554 parsers in C produced by Bison cannot grow when compiled by C++ 7555 compilers. In this precise case (compiling a C parser as C++) you are 7556 suggested to grow `YYINITDEPTH'. The Bison maintainers hope to fix 7557 this deficiency in a future release. 7558 7559 7560 File: bison.info, Node: Error Recovery, Next: Context Dependency, Prev: Algorithm, Up: Top 7561 7562 6 Error Recovery 7563 **************** 7564 7565 It is not usually acceptable to have a program terminate on a syntax 7566 error. For example, a compiler should recover sufficiently to parse the 7567 rest of the input file and check it for errors; a calculator should 7568 accept another expression. 7569 7570 In a simple interactive command parser where each input is one line, 7571 it may be sufficient to allow `yyparse' to return 1 on error and have 7572 the caller ignore the rest of the input line when that happens (and