1 This is ../../../doc/bison.info, produced by makeinfo version 4.13 from 2 ../../../doc/bison.texi. 3 4 This manual (9 December 2012) is for GNU Bison (version 2.7), the GNU 5 parser generator. 6 7 Copyright (C) 1988-1993, 1995, 1998-2012 Free Software Foundation, 8 Inc. 9 10 Permission is granted to copy, distribute and/or modify this 11 document under the terms of the GNU Free Documentation License, 12 Version 1.3 or any later version published by the Free Software 13 Foundation; with no Invariant Sections, with the Front-Cover texts 14 being "A GNU Manual," and with the Back-Cover Texts as in (a) 15 below. A copy of the license is included in the section entitled 16 "GNU Free Documentation License." 17 18 (a) The FSF's Back-Cover Text is: "You have the freedom to copy and 19 modify this GNU manual. Buying copies from the FSF supports it in 20 developing GNU and promoting software freedom." 21 22 INFO-DIR-SECTION Software development 23 START-INFO-DIR-ENTRY 24 * bison: (bison). GNU parser generator (Yacc replacement). 25 END-INFO-DIR-ENTRY 26 27 28 File: bison.info, Node: Top, Next: Introduction, Up: (dir) 29 30 Bison 31 ***** 32 33 This manual (9 December 2012) is for GNU Bison (version 2.7), the GNU 34 parser generator. 35 36 Copyright (C) 1988-1993, 1995, 1998-2012 Free Software Foundation, 37 Inc. 38 39 Permission is granted to copy, distribute and/or modify this 40 document under the terms of the GNU Free Documentation License, 41 Version 1.3 or any later version published by the Free Software 42 Foundation; with no Invariant Sections, with the Front-Cover texts 43 being "A GNU Manual," and with the Back-Cover Texts as in (a) 44 below. A copy of the license is included in the section entitled 45 "GNU Free Documentation License." 46 47 (a) The FSF's Back-Cover Text is: "You have the freedom to copy and 48 modify this GNU manual. Buying copies from the FSF supports it in 49 developing GNU and promoting software freedom." 50 51 * Menu: 52 53 * Introduction:: 54 * Conditions:: 55 * Copying:: The GNU General Public License says 56 how you can copy and share Bison. 57 58 Tutorial sections: 59 * Concepts:: Basic concepts for understanding Bison. 60 * Examples:: Three simple explained examples of using Bison. 61 62 Reference sections: 63 * Grammar File:: Writing Bison declarations and rules. 64 * Interface:: C-language interface to the parser function `yyparse'. 65 * Algorithm:: How the Bison parser works at run-time. 66 * Error Recovery:: Writing rules for error recovery. 67 * Context Dependency:: What to do if your language syntax is too 68 messy for Bison to handle straightforwardly. 69 * Debugging:: Understanding or debugging Bison parsers. 70 * Invocation:: How to run Bison (to produce the parser implementation). 71 * Other Languages:: Creating C++ and Java parsers. 72 * FAQ:: Frequently Asked Questions 73 * Table of Symbols:: All the keywords of the Bison language are explained. 74 * Glossary:: Basic concepts are explained. 75 * Copying This Manual:: License for copying this manual. 76 * Bibliography:: Publications cited in this manual. 77 * Index of Terms:: Cross-references to the text. 78 79 --- The Detailed Node Listing --- 80 81 The Concepts of Bison 82 83 * Language and Grammar:: Languages and context-free grammars, 84 as mathematical ideas. 85 * Grammar in Bison:: How we represent grammars for Bison's sake. 86 * Semantic Values:: Each token or syntactic grouping can have 87 a semantic value (the value of an integer, 88 the name of an identifier, etc.). 89 * Semantic Actions:: Each rule can have an action containing C code. 90 * GLR Parsers:: Writing parsers for general context-free languages. 91 * Locations:: Overview of location tracking. 92 * Bison Parser:: What are Bison's input and output, 93 how is the output used? 94 * Stages:: Stages in writing and running Bison grammars. 95 * Grammar Layout:: Overall structure of a Bison grammar file. 96 97 Writing GLR Parsers 98 99 * Simple GLR Parsers:: Using GLR parsers on unambiguous grammars. 100 * Merging GLR Parses:: Using GLR parsers to resolve ambiguities. 101 * GLR Semantic Actions:: Deferred semantic actions have special concerns. 102 * Compiler Requirements:: GLR parsers require a modern C compiler. 103 104 Examples 105 106 * RPN Calc:: Reverse polish notation calculator; 107 a first example with no operator precedence. 108 * Infix Calc:: Infix (algebraic) notation calculator. 109 Operator precedence is introduced. 110 * Simple Error Recovery:: Continuing after syntax errors. 111 * Location Tracking Calc:: Demonstrating the use of @N and @$. 112 * Multi-function Calc:: Calculator with memory and trig functions. 113 It uses multiple data-types for semantic values. 114 * Exercises:: Ideas for improving the multi-function calculator. 115 116 Reverse Polish Notation Calculator 117 118 * Rpcalc Declarations:: Prologue (declarations) for rpcalc. 119 * Rpcalc Rules:: Grammar Rules for rpcalc, with explanation. 120 * Rpcalc Lexer:: The lexical analyzer. 121 * Rpcalc Main:: The controlling function. 122 * Rpcalc Error:: The error reporting function. 123 * Rpcalc Generate:: Running Bison on the grammar file. 124 * Rpcalc Compile:: Run the C compiler on the output code. 125 126 Grammar Rules for `rpcalc' 127 128 * Rpcalc Input:: 129 * Rpcalc Line:: 130 * Rpcalc Expr:: 131 132 Location Tracking Calculator: `ltcalc' 133 134 * Ltcalc Declarations:: Bison and C declarations for ltcalc. 135 * Ltcalc Rules:: Grammar rules for ltcalc, with explanations. 136 * Ltcalc Lexer:: The lexical analyzer. 137 138 Multi-Function Calculator: `mfcalc' 139 140 * Mfcalc Declarations:: Bison declarations for multi-function calculator. 141 * Mfcalc Rules:: Grammar rules for the calculator. 142 * Mfcalc Symbol Table:: Symbol table management subroutines. 143 144 Bison Grammar Files 145 146 * Grammar Outline:: Overall layout of the grammar file. 147 * Symbols:: Terminal and nonterminal symbols. 148 * Rules:: How to write grammar rules. 149 * Recursion:: Writing recursive rules. 150 * Semantics:: Semantic values and actions. 151 * Tracking Locations:: Locations and actions. 152 * Named References:: Using named references in actions. 153 * Declarations:: All kinds of Bison declarations are described here. 154 * Multiple Parsers:: Putting more than one Bison parser in one program. 155 156 Outline of a Bison Grammar 157 158 * Prologue:: Syntax and usage of the prologue. 159 * Prologue Alternatives:: Syntax and usage of alternatives to the prologue. 160 * Bison Declarations:: Syntax and usage of the Bison declarations section. 161 * Grammar Rules:: Syntax and usage of the grammar rules section. 162 * Epilogue:: Syntax and usage of the epilogue. 163 164 Defining Language Semantics 165 166 * Value Type:: Specifying one data type for all semantic values. 167 * Multiple Types:: Specifying several alternative data types. 168 * Actions:: An action is the semantic definition of a grammar rule. 169 * Action Types:: Specifying data types for actions to operate on. 170 * Mid-Rule Actions:: Most actions go at the end of a rule. 171 This says when, why and how to use the exceptional 172 action in the middle of a rule. 173 174 Actions in Mid-Rule 175 176 * Using Mid-Rule Actions:: Putting an action in the middle of a rule. 177 * Mid-Rule Action Translation:: How mid-rule actions are actually processed. 178 * Mid-Rule Conflicts:: Mid-rule actions can cause conflicts. 179 180 Tracking Locations 181 182 * Location Type:: Specifying a data type for locations. 183 * Actions and Locations:: Using locations in actions. 184 * Location Default Action:: Defining a general way to compute locations. 185 186 Bison Declarations 187 188 * Require Decl:: Requiring a Bison version. 189 * Token Decl:: Declaring terminal symbols. 190 * Precedence Decl:: Declaring terminals with precedence and associativity. 191 * Union Decl:: Declaring the set of all semantic value types. 192 * Type Decl:: Declaring the choice of type for a nonterminal symbol. 193 * Initial Action Decl:: Code run before parsing starts. 194 * Destructor Decl:: Declaring how symbols are freed. 195 * Printer Decl:: Declaring how symbol values are displayed. 196 * Expect Decl:: Suppressing warnings about parsing conflicts. 197 * Start Decl:: Specifying the start symbol. 198 * Pure Decl:: Requesting a reentrant parser. 199 * Push Decl:: Requesting a push parser. 200 * Decl Summary:: Table of all Bison declarations. 201 * %define Summary:: Defining variables to adjust Bison's behavior. 202 * %code Summary:: Inserting code into the parser source. 203 204 Parser C-Language Interface 205 206 * Parser Function:: How to call `yyparse' and what it returns. 207 * Push Parser Function:: How to call `yypush_parse' and what it returns. 208 * Pull Parser Function:: How to call `yypull_parse' and what it returns. 209 * Parser Create Function:: How to call `yypstate_new' and what it returns. 210 * Parser Delete Function:: How to call `yypstate_delete' and what it returns. 211 * Lexical:: You must supply a function `yylex' 212 which reads tokens. 213 * Error Reporting:: You must supply a function `yyerror'. 214 * Action Features:: Special features for use in actions. 215 * Internationalization:: How to let the parser speak in the user's 216 native language. 217 218 The Lexical Analyzer Function `yylex' 219 220 * Calling Convention:: How `yyparse' calls `yylex'. 221 * Token Values:: How `yylex' must return the semantic value 222 of the token it has read. 223 * Token Locations:: How `yylex' must return the text location 224 (line number, etc.) of the token, if the 225 actions want that. 226 * Pure Calling:: How the calling convention differs in a pure parser 227 (*note A Pure (Reentrant) Parser: Pure Decl.). 228 229 The Bison Parser Algorithm 230 231 * Lookahead:: Parser looks one token ahead when deciding what to do. 232 * Shift/Reduce:: Conflicts: when either shifting or reduction is valid. 233 * Precedence:: Operator precedence works by resolving conflicts. 234 * Contextual Precedence:: When an operator's precedence depends on context. 235 * Parser States:: The parser is a finite-state-machine with stack. 236 * Reduce/Reduce:: When two rules are applicable in the same situation. 237 * Mysterious Conflicts:: Conflicts that look unjustified. 238 * Tuning LR:: How to tune fundamental aspects of LR-based parsing. 239 * Generalized LR Parsing:: Parsing arbitrary context-free grammars. 240 * Memory Management:: What happens when memory is exhausted. How to avoid it. 241 242 Operator Precedence 243 244 * Why Precedence:: An example showing why precedence is needed. 245 * Using Precedence:: How to specify precedence in Bison grammars. 246 * Precedence Examples:: How these features are used in the previous example. 247 * How Precedence:: How they work. 248 * Non Operators:: Using precedence for general conflicts. 249 250 Tuning LR 251 252 * LR Table Construction:: Choose a different construction algorithm. 253 * Default Reductions:: Disable default reductions. 254 * LAC:: Correct lookahead sets in the parser states. 255 * Unreachable States:: Keep unreachable parser states for debugging. 256 257 Handling Context Dependencies 258 259 * Semantic Tokens:: Token parsing can depend on the semantic context. 260 * Lexical Tie-ins:: Token parsing can depend on the syntactic context. 261 * Tie-in Recovery:: Lexical tie-ins have implications for how 262 error recovery rules must be written. 263 264 Debugging Your Parser 265 266 * Understanding:: Understanding the structure of your parser. 267 * Graphviz:: Getting a visual representation of the parser. 268 * Xml:: Getting a markup representation of the parser. 269 * Tracing:: Tracing the execution of your parser. 270 271 Tracing Your Parser 272 273 * Enabling Traces:: Activating run-time trace support 274 * Mfcalc Traces:: Extending `mfcalc' to support traces 275 * The YYPRINT Macro:: Obsolete interface for semantic value reports 276 277 Invoking Bison 278 279 * Bison Options:: All the options described in detail, 280 in alphabetical order by short options. 281 * Option Cross Key:: Alphabetical list of long options. 282 * Yacc Library:: Yacc-compatible `yylex' and `main'. 283 284 Parsers Written In Other Languages 285 286 * C++ Parsers:: The interface to generate C++ parser classes 287 * Java Parsers:: The interface to generate Java parser classes 288 289 C++ Parsers 290 291 * C++ Bison Interface:: Asking for C++ parser generation 292 * C++ Semantic Values:: %union vs. C++ 293 * C++ Location Values:: The position and location classes 294 * C++ Parser Interface:: Instantiating and running the parser 295 * C++ Scanner Interface:: Exchanges between yylex and parse 296 * A Complete C++ Example:: Demonstrating their use 297 298 C++ Location Values 299 300 * C++ position:: One point in the source file 301 * C++ location:: Two points in the source file 302 * User Defined Location Type:: Required interface for locations 303 304 A Complete C++ Example 305 306 * Calc++ --- C++ Calculator:: The specifications 307 * Calc++ Parsing Driver:: An active parsing context 308 * Calc++ Parser:: A parser class 309 * Calc++ Scanner:: A pure C++ Flex scanner 310 * Calc++ Top Level:: Conducting the band 311 312 Java Parsers 313 314 * Java Bison Interface:: Asking for Java parser generation 315 * Java Semantic Values:: %type and %token vs. Java 316 * Java Location Values:: The position and location classes 317 * Java Parser Interface:: Instantiating and running the parser 318 * Java Scanner Interface:: Specifying the scanner for the parser 319 * Java Action Features:: Special features for use in actions 320 * Java Differences:: Differences between C/C++ and Java Grammars 321 * Java Declarations Summary:: List of Bison declarations used with Java 322 323 Frequently Asked Questions 324 325 * Memory Exhausted:: Breaking the Stack Limits 326 * How Can I Reset the Parser:: `yyparse' Keeps some State 327 * Strings are Destroyed:: `yylval' Loses Track of Strings 328 * Implementing Gotos/Loops:: Control Flow in the Calculator 329 * Multiple start-symbols:: Factoring closely related grammars 330 * Secure? Conform?:: Is Bison POSIX safe? 331 * I can't build Bison:: Troubleshooting 332 * Where can I find help?:: Troubleshouting 333 * Bug Reports:: Troublereporting 334 * More Languages:: Parsers in C++, Java, and so on 335 * Beta Testing:: Experimenting development versions 336 * Mailing Lists:: Meeting other Bison users 337 338 Copying This Manual 339 340 * Copying This Manual:: License for copying this manual. 341 342 343 File: bison.info, Node: Introduction, Next: Conditions, Prev: Top, Up: Top 344 345 Introduction 346 ************ 347 348 "Bison" is a general-purpose parser generator that converts an 349 annotated context-free grammar into a deterministic LR or generalized 350 LR (GLR) parser employing LALR(1) parser tables. As an experimental 351 feature, Bison can also generate IELR(1) or canonical LR(1) parser 352 tables. Once you are proficient with Bison, you can use it to develop 353 a wide range of language parsers, from those used in simple desk 354 calculators to complex programming languages. 355 356 Bison is upward compatible with Yacc: all properly-written Yacc 357 grammars ought to work with Bison with no change. Anyone familiar with 358 Yacc should be able to use Bison with little trouble. You need to be 359 fluent in C or C++ programming in order to use Bison or to understand 360 this manual. Java is also supported as an experimental feature. 361 362 We begin with tutorial chapters that explain the basic concepts of 363 using Bison and show three explained examples, each building on the 364 last. If you don't know Bison or Yacc, start by reading these 365 chapters. Reference chapters follow, which describe specific aspects 366 of Bison in detail. 367 368 Bison was written originally by Robert Corbett. Richard Stallman 369 made it Yacc-compatible. Wilfred Hansen of Carnegie Mellon University 370 added multi-character string literals and other features. Since then, 371 Bison has grown more robust and evolved many other new features thanks 372 to the hard work of a long list of volunteers. For details, see the 373 `THANKS' and `ChangeLog' files included in the Bison distribution. 374 375 This edition corresponds to version 2.7 of Bison. 376 377 378 File: bison.info, Node: Conditions, Next: Copying, Prev: Introduction, Up: Top 379 380 Conditions for Using Bison 381 ************************** 382 383 The distribution terms for Bison-generated parsers permit using the 384 parsers in nonfree programs. Before Bison version 2.2, these extra 385 permissions applied only when Bison was generating LALR(1) parsers in 386 C. And before Bison version 1.24, Bison-generated parsers could be 387 used only in programs that were free software. 388 389 The other GNU programming tools, such as the GNU C compiler, have 390 never had such a requirement. They could always be used for nonfree 391 software. The reason Bison was different was not due to a special 392 policy decision; it resulted from applying the usual General Public 393 License to all of the Bison source code. 394 395 The main output of the Bison utility--the Bison parser implementation 396 file--contains a verbatim copy of a sizable piece of Bison, which is 397 the code for the parser's implementation. (The actions from your 398 grammar are inserted into this implementation at one point, but most of 399 the rest of the implementation is not changed.) When we applied the 400 GPL terms to the skeleton code for the parser's implementation, the 401 effect was to restrict the use of Bison output to free software. 402 403 We didn't change the terms because of sympathy for people who want to 404 make software proprietary. *Software should be free.* But we 405 concluded that limiting Bison's use to free software was doing little to 406 encourage people to make other software free. So we decided to make the 407 practical conditions for using Bison match the practical conditions for 408 using the other GNU tools. 409 410 This exception applies when Bison is generating code for a parser. 411 You can tell whether the exception applies to a Bison output file by 412 inspecting the file for text beginning with "As a special 413 exception...". The text spells out the exact terms of the exception. 414 415 416 File: bison.info, Node: Copying, Next: Concepts, Prev: Conditions, Up: Top 417 418 GNU GENERAL PUBLIC LICENSE 419 ************************** 420 421 Version 3, 29 June 2007 422 423 Copyright (C) 2007 Free Software Foundation, Inc. `http://fsf.org/' 424 425 Everyone is permitted to copy and distribute verbatim copies of this 426 license document, but changing it is not allowed. 427 428 Preamble 429 ======== 430 431 The GNU General Public License is a free, copyleft license for software 432 and other kinds of works. 433 434 The licenses for most software and other practical works are designed 435 to take away your freedom to share and change the works. By contrast, 436 the GNU General Public License is intended to guarantee your freedom to 437 share and change all versions of a program--to make sure it remains 438 free software for all its users. We, the Free Software Foundation, use 439 the GNU General Public License for most of our software; it applies 440 also to any other work released this way by its authors. You can apply 441 it to your programs, too. 442 443 When we speak of free software, we are referring to freedom, not 444 price. Our General Public Licenses are designed to make sure that you 445 have the freedom to distribute copies of free software (and charge for 446 them if you wish), that you receive source code or can get it if you 447 want it, that you can change the software or use pieces of it in new 448 free programs, and that you know you can do these things. 449 450 To protect your rights, we need to prevent others from denying you 451 these rights or asking you to surrender the rights. Therefore, you 452 have certain responsibilities if you distribute copies of the software, 453 or if you modify it: responsibilities to respect the freedom of others. 454 455 For example, if you distribute copies of such a program, whether 456 gratis or for a fee, you must pass on to the recipients the same 457 freedoms that you received. You must make sure that they, too, receive 458 or can get the source code. And you must show them these terms so they 459 know their rights. 460 461 Developers that use the GNU GPL protect your rights with two steps: 462 (1) assert copyright on the software, and (2) offer you this License 463 giving you legal permission to copy, distribute and/or modify it. 464 465 For the developers' and authors' protection, the GPL clearly explains 466 that there is no warranty for this free software. For both users' and 467 authors' sake, the GPL requires that modified versions be marked as 468 changed, so that their problems will not be attributed erroneously to 469 authors of previous versions. 470 471 Some devices are designed to deny users access to install or run 472 modified versions of the software inside them, although the 473 manufacturer can do so. This is fundamentally incompatible with the 474 aim of protecting users' freedom to change the software. The 475 systematic pattern of such abuse occurs in the area of products for 476 individuals to use, which is precisely where it is most unacceptable. 477 Therefore, we have designed this version of the GPL to prohibit the 478 practice for those products. If such problems arise substantially in 479 other domains, we stand ready to extend this provision to those domains 480 in future versions of the GPL, as needed to protect the freedom of 481 users. 482 483 Finally, every program is threatened constantly by software patents. 484 States should not allow patents to restrict development and use of 485 software on general-purpose computers, but in those that do, we wish to 486 avoid the special danger that patents applied to a free program could 487 make it effectively proprietary. To prevent this, the GPL assures that 488 patents cannot be used to render the program non-free. 489 490 The precise terms and conditions for copying, distribution and 491 modification follow. 492 493 TERMS AND CONDITIONS 494 ==================== 495 496 0. Definitions. 497 498 "This License" refers to version 3 of the GNU General Public 499 License. 500 501 "Copyright" also means copyright-like laws that apply to other 502 kinds of works, such as semiconductor masks. 503 504 "The Program" refers to any copyrightable work licensed under this 505 License. Each licensee is addressed as "you". "Licensees" and 506 "recipients" may be individuals or organizations. 507 508 To "modify" a work means to copy from or adapt all or part of the 509 work in a fashion requiring copyright permission, other than the 510 making of an exact copy. The resulting work is called a "modified 511 version" of the earlier work or a work "based on" the earlier work. 512 513 A "covered work" means either the unmodified Program or a work 514 based on the Program. 515 516 To "propagate" a work means to do anything with it that, without 517 permission, would make you directly or secondarily liable for 518 infringement under applicable copyright law, except executing it 519 on a computer or modifying a private copy. Propagation includes 520 copying, distribution (with or without modification), making 521 available to the public, and in some countries other activities as 522 well. 523 524 To "convey" a work means any kind of propagation that enables other 525 parties to make or receive copies. Mere interaction with a user 526 through a computer network, with no transfer of a copy, is not 527 conveying. 528 529 An interactive user interface displays "Appropriate Legal Notices" 530 to the extent that it includes a convenient and prominently visible 531 feature that (1) displays an appropriate copyright notice, and (2) 532 tells the user that there is no warranty for the work (except to 533 the extent that warranties are provided), that licensees may 534 convey the work under this License, and how to view a copy of this 535 License. If the interface presents a list of user commands or 536 options, such as a menu, a prominent item in the list meets this 537 criterion. 538 539 1. Source Code. 540 541 The "source code" for a work means the preferred form of the work 542 for making modifications to it. "Object code" means any 543 non-source form of a work. 544 545 A "Standard Interface" means an interface that either is an 546 official standard defined by a recognized standards body, or, in 547 the case of interfaces specified for a particular programming 548 language, one that is widely used among developers working in that 549 language. 550 551 The "System Libraries" of an executable work include anything, 552 other than the work as a whole, that (a) is included in the normal 553 form of packaging a Major Component, but which is not part of that 554 Major Component, and (b) serves only to enable use of the work 555 with that Major Component, or to implement a Standard Interface 556 for which an implementation is available to the public in source 557 code form. A "Major Component", in this context, means a major 558 essential component (kernel, window system, and so on) of the 559 specific operating system (if any) on which the executable work 560 runs, or a compiler used to produce the work, or an object code 561 interpreter used to run it. 562 563 The "Corresponding Source" for a work in object code form means all 564 the source code needed to generate, install, and (for an executable 565 work) run the object code and to modify the work, including 566 scripts to control those activities. However, it does not include 567 the work's System Libraries, or general-purpose tools or generally 568 available free programs which are used unmodified in performing 569 those activities but which are not part of the work. For example, 570 Corresponding Source includes interface definition files 571 associated with source files for the work, and the source code for 572 shared libraries and dynamically linked subprograms that the work 573 is specifically designed to require, such as by intimate data 574 communication or control flow between those subprograms and other 575 parts of the work. 576 577 The Corresponding Source need not include anything that users can 578 regenerate automatically from other parts of the Corresponding 579 Source. 580 581 The Corresponding Source for a work in source code form is that 582 same work. 583 584 2. Basic Permissions. 585 586 All rights granted under this License are granted for the term of 587 copyright on the Program, and are irrevocable provided the stated 588 conditions are met. This License explicitly affirms your unlimited 589 permission to run the unmodified Program. The output from running 590 a covered work is covered by this License only if the output, 591 given its content, constitutes a covered work. This License 592 acknowledges your rights of fair use or other equivalent, as 593 provided by copyright law. 594 595 You may make, run and propagate covered works that you do not 596 convey, without conditions so long as your license otherwise 597 remains in force. You may convey covered works to others for the 598 sole purpose of having them make modifications exclusively for 599 you, or provide you with facilities for running those works, 600 provided that you comply with the terms of this License in 601 conveying all material for which you do not control copyright. 602 Those thus making or running the covered works for you must do so 603 exclusively on your behalf, under your direction and control, on 604 terms that prohibit them from making any copies of your 605 copyrighted material outside their relationship with you. 606 607 Conveying under any other circumstances is permitted solely under 608 the conditions stated below. Sublicensing is not allowed; section 609 10 makes it unnecessary. 610 611 3. Protecting Users' Legal Rights From Anti-Circumvention Law. 612 613 No covered work shall be deemed part of an effective technological 614 measure under any applicable law fulfilling obligations under 615 article 11 of the WIPO copyright treaty adopted on 20 December 616 1996, or similar laws prohibiting or restricting circumvention of 617 such measures. 618 619 When you convey a covered work, you waive any legal power to forbid 620 circumvention of technological measures to the extent such 621 circumvention is effected by exercising rights under this License 622 with respect to the covered work, and you disclaim any intention 623 to limit operation or modification of the work as a means of 624 enforcing, against the work's users, your or third parties' legal 625 rights to forbid circumvention of technological measures. 626 627 4. Conveying Verbatim Copies. 628 629 You may convey verbatim copies of the Program's source code as you 630 receive it, in any medium, provided that you conspicuously and 631 appropriately publish on each copy an appropriate copyright notice; 632 keep intact all notices stating that this License and any 633 non-permissive terms added in accord with section 7 apply to the 634 code; keep intact all notices of the absence of any warranty; and 635 give all recipients a copy of this License along with the Program. 636 637 You may charge any price or no price for each copy that you convey, 638 and you may offer support or warranty protection for a fee. 639 640 5. Conveying Modified Source Versions. 641 642 You may convey a work based on the Program, or the modifications to 643 produce it from the Program, in the form of source code under the 644 terms of section 4, provided that you also meet all of these 645 conditions: 646 647 a. The work must carry prominent notices stating that you 648 modified it, and giving a relevant date. 649 650 b. The work must carry prominent notices stating that it is 651 released under this License and any conditions added under 652 section 7. This requirement modifies the requirement in 653 section 4 to "keep intact all notices". 654 655 c. You must license the entire work, as a whole, under this 656 License to anyone who comes into possession of a copy. This 657 License will therefore apply, along with any applicable 658 section 7 additional terms, to the whole of the work, and all 659 its parts, regardless of how they are packaged. This License 660 gives no permission to license the work in any other way, but 661 it does not invalidate such permission if you have separately 662 received it. 663 664 d. If the work has interactive user interfaces, each must display 665 Appropriate Legal Notices; however, if the Program has 666 interactive interfaces that do not display Appropriate Legal 667 Notices, your work need not make them do so. 668 669 A compilation of a covered work with other separate and independent 670 works, which are not by their nature extensions of the covered 671 work, and which are not combined with it such as to form a larger 672 program, in or on a volume of a storage or distribution medium, is 673 called an "aggregate" if the compilation and its resulting 674 copyright are not used to limit the access or legal rights of the 675 compilation's users beyond what the individual works permit. 676 Inclusion of a covered work in an aggregate does not cause this 677 License to apply to the other parts of the aggregate. 678 679 6. Conveying Non-Source Forms. 680 681 You may convey a covered work in object code form under the terms 682 of sections 4 and 5, provided that you also convey the 683 machine-readable Corresponding Source under the terms of this 684 License, in one of these ways: 685 686 a. Convey the object code in, or embodied in, a physical product 687 (including a physical distribution medium), accompanied by the 688 Corresponding Source fixed on a durable physical medium 689 customarily used for software interchange. 690 691 b. Convey the object code in, or embodied in, a physical product 692 (including a physical distribution medium), accompanied by a 693 written offer, valid for at least three years and valid for 694 as long as you offer spare parts or customer support for that 695 product model, to give anyone who possesses the object code 696 either (1) a copy of the Corresponding Source for all the 697 software in the product that is covered by this License, on a 698 durable physical medium customarily used for software 699 interchange, for a price no more than your reasonable cost of 700 physically performing this conveying of source, or (2) access 701 to copy the Corresponding Source from a network server at no 702 charge. 703 704 c. Convey individual copies of the object code with a copy of 705 the written offer to provide the Corresponding Source. This 706 alternative is allowed only occasionally and noncommercially, 707 and only if you received the object code with such an offer, 708 in accord with subsection 6b. 709 710 d. Convey the object code by offering access from a designated 711 place (gratis or for a charge), and offer equivalent access 712 to the Corresponding Source in the same way through the same 713 place at no further charge. You need not require recipients 714 to copy the Corresponding Source along with the object code. 715 If the place to copy the object code is a network server, the 716 Corresponding Source may be on a different server (operated 717 by you or a third party) that supports equivalent copying 718 facilities, provided you maintain clear directions next to 719 the object code saying where to find the Corresponding Source. 720 Regardless of what server hosts the Corresponding Source, you 721 remain obligated to ensure that it is available for as long 722 as needed to satisfy these requirements. 723 724 e. Convey the object code using peer-to-peer transmission, 725 provided you inform other peers where the object code and 726 Corresponding Source of the work are being offered to the 727 general public at no charge under subsection 6d. 728 729 730 A separable portion of the object code, whose source code is 731 excluded from the Corresponding Source as a System Library, need 732 not be included in conveying the object code work. 733 734 A "User Product" is either (1) a "consumer product", which means 735 any tangible personal property which is normally used for personal, 736 family, or household purposes, or (2) anything designed or sold for 737 incorporation into a dwelling. In determining whether a product 738 is a consumer product, doubtful cases shall be resolved in favor of 739 coverage. For a particular product received by a particular user, 740 "normally used" refers to a typical or common use of that class of 741 product, regardless of the status of the particular user or of the 742 way in which the particular user actually uses, or expects or is 743 expected to use, the product. A product is a consumer product 744 regardless of whether the product has substantial commercial, 745 industrial or non-consumer uses, unless such uses represent the 746 only significant mode of use of the product. 747 748 "Installation Information" for a User Product means any methods, 749 procedures, authorization keys, or other information required to 750 install and execute modified versions of a covered work in that 751 User Product from a modified version of its Corresponding Source. 752 The information must suffice to ensure that the continued 753 functioning of the modified object code is in no case prevented or 754 interfered with solely because modification has been made. 755 756 If you convey an object code work under this section in, or with, 757 or specifically for use in, a User Product, and the conveying 758 occurs as part of a transaction in which the right of possession 759 and use of the User Product is transferred to the recipient in 760 perpetuity or for a fixed term (regardless of how the transaction 761 is characterized), the Corresponding Source conveyed under this 762 section must be accompanied by the Installation Information. But 763 this requirement does not apply if neither you nor any third party 764 retains the ability to install modified object code on the User 765 Product (for example, the work has been installed in ROM). 766 767 The requirement to provide Installation Information does not 768 include a requirement to continue to provide support service, 769 warranty, or updates for a work that has been modified or 770 installed by the recipient, or for the User Product in which it 771 has been modified or installed. Access to a network may be denied 772 when the modification itself materially and adversely affects the 773 operation of the network or violates the rules and protocols for 774 communication across the network. 775 776 Corresponding Source conveyed, and Installation Information 777 provided, in accord with this section must be in a format that is 778 publicly documented (and with an implementation available to the 779 public in source code form), and must require no special password 780 or key for unpacking, reading or copying. 781 782 7. Additional Terms. 783 784 "Additional permissions" are terms that supplement the terms of 785 this License by making exceptions from one or more of its 786 conditions. Additional permissions that are applicable to the 787 entire Program shall be treated as though they were included in 788 this License, to the extent that they are valid under applicable 789 law. If additional permissions apply only to part of the Program, 790 that part may be used separately under those permissions, but the 791 entire Program remains governed by this License without regard to 792 the additional permissions. 793 794 When you convey a copy of a covered work, you may at your option 795 remove any additional permissions from that copy, or from any part 796 of it. (Additional permissions may be written to require their own 797 removal in certain cases when you modify the work.) You may place 798 additional permissions on material, added by you to a covered work, 799 for which you have or can give appropriate copyright permission. 800 801 Notwithstanding any other provision of this License, for material 802 you add to a covered work, you may (if authorized by the copyright 803 holders of that material) supplement the terms of this License 804 with terms: 805 806 a. Disclaiming warranty or limiting liability differently from 807 the terms of sections 15 and 16 of this License; or 808 809 b. Requiring preservation of specified reasonable legal notices 810 or author attributions in that material or in the Appropriate 811 Legal Notices displayed by works containing it; or 812 813 c. Prohibiting misrepresentation of the origin of that material, 814 or requiring that modified versions of such material be 815 marked in reasonable ways as different from the original 816 version; or 817 818 d. Limiting the use for publicity purposes of names of licensors 819 or authors of the material; or 820 821 e. Declining to grant rights under trademark law for use of some 822 trade names, trademarks, or service marks; or 823 824 f. Requiring indemnification of licensors and authors of that 825 material by anyone who conveys the material (or modified 826 versions of it) with contractual assumptions of liability to 827 the recipient, for any liability that these contractual 828 assumptions directly impose on those licensors and authors. 829 830 All other non-permissive additional terms are considered "further 831 restrictions" within the meaning of section 10. If the Program as 832 you received it, or any part of it, contains a notice stating that 833 it is governed by this License along with a term that is a further 834 restriction, you may remove that term. If a license document 835 contains a further restriction but permits relicensing or 836 conveying under this License, you may add to a covered work 837 material governed by the terms of that license document, provided 838 that the further restriction does not survive such relicensing or 839 conveying. 840 841 If you add terms to a covered work in accord with this section, you 842 must place, in the relevant source files, a statement of the 843 additional terms that apply to those files, or a notice indicating 844 where to find the applicable terms. 845 846 Additional terms, permissive or non-permissive, may be stated in 847 the form of a separately written license, or stated as exceptions; 848 the above requirements apply either way. 849 850 8. Termination. 851 852 You may not propagate or modify a covered work except as expressly 853 provided under this License. Any attempt otherwise to propagate or 854 modify it is void, and will automatically terminate your rights 855 under this License (including any patent licenses granted under 856 the third paragraph of section 11). 857 858 However, if you cease all violation of this License, then your 859 license from a particular copyright holder is reinstated (a) 860 provisionally, unless and until the copyright holder explicitly 861 and finally terminates your license, and (b) permanently, if the 862 copyright holder fails to notify you of the violation by some 863 reasonable means prior to 60 days after the cessation. 864 865 Moreover, your license from a particular copyright holder is 866 reinstated permanently if the copyright holder notifies you of the 867 violation by some reasonable means, this is the first time you have 868 received notice of violation of this License (for any work) from 869 that copyright holder, and you cure the violation prior to 30 days 870 after your receipt of the notice. 871 872 Termination of your rights under this section does not terminate 873 the licenses of parties who have received copies or rights from 874 you under this License. If your rights have been terminated and 875 not permanently reinstated, you do not qualify to receive new 876 licenses for the same material under section 10. 877 878 9. Acceptance Not Required for Having Copies. 879 880 You are not required to accept this License in order to receive or 881 run a copy of the Program. Ancillary propagation of a covered work 882 occurring solely as a consequence of using peer-to-peer 883 transmission to receive a copy likewise does not require 884 acceptance. However, nothing other than this License grants you 885 permission to propagate or modify any covered work. These actions 886 infringe copyright if you do not accept this License. Therefore, 887 by modifying or propagating a covered work, you indicate your 888 acceptance of this License to do so. 889 890 10. Automatic Licensing of Downstream Recipients. 891 892 Each time you convey a covered work, the recipient automatically 893 receives a license from the original licensors, to run, modify and 894 propagate that work, subject to this License. You are not 895 responsible for enforcing compliance by third parties with this 896 License. 897 898 An "entity transaction" is a transaction transferring control of an 899 organization, or substantially all assets of one, or subdividing an 900 organization, or merging organizations. If propagation of a 901 covered work results from an entity transaction, each party to that 902 transaction who receives a copy of the work also receives whatever 903 licenses to the work the party's predecessor in interest had or 904 could give under the previous paragraph, plus a right to 905 possession of the Corresponding Source of the work from the 906 predecessor in interest, if the predecessor has it or can get it 907 with reasonable efforts. 908 909 You may not impose any further restrictions on the exercise of the 910 rights granted or affirmed under this License. For example, you 911 may not impose a license fee, royalty, or other charge for 912 exercise of rights granted under this License, and you may not 913 initiate litigation (including a cross-claim or counterclaim in a 914 lawsuit) alleging that any patent claim is infringed by making, 915 using, selling, offering for sale, or importing the Program or any 916 portion of it. 917 918 11. Patents. 919 920 A "contributor" is a copyright holder who authorizes use under this 921 License of the Program or a work on which the Program is based. 922 The work thus licensed is called the contributor's "contributor 923 version". 924 925 A contributor's "essential patent claims" are all patent claims 926 owned or controlled by the contributor, whether already acquired or 927 hereafter acquired, that would be infringed by some manner, 928 permitted by this License, of making, using, or selling its 929 contributor version, but do not include claims that would be 930 infringed only as a consequence of further modification of the 931 contributor version. For purposes of this definition, "control" 932 includes the right to grant patent sublicenses in a manner 933 consistent with the requirements of this License. 934 935 Each contributor grants you a non-exclusive, worldwide, 936 royalty-free patent license under the contributor's essential 937 patent claims, to make, use, sell, offer for sale, import and 938 otherwise run, modify and propagate the contents of its 939 contributor version. 940 941 In the following three paragraphs, a "patent license" is any 942 express agreement or commitment, however denominated, not to 943 enforce a patent (such as an express permission to practice a 944 patent or covenant not to sue for patent infringement). To 945 "grant" such a patent license to a party means to make such an 946 agreement or commitment not to enforce a patent against the party. 947 948 If you convey a covered work, knowingly relying on a patent 949 license, and the Corresponding Source of the work is not available 950 for anyone to copy, free of charge and under the terms of this 951 License, through a publicly available network server or other 952 readily accessible means, then you must either (1) cause the 953 Corresponding Source to be so available, or (2) arrange to deprive 954 yourself of the benefit of the patent license for this particular 955 work, or (3) arrange, in a manner consistent with the requirements 956 of this License, to extend the patent license to downstream 957 recipients. "Knowingly relying" means you have actual knowledge 958 that, but for the patent license, your conveying the covered work 959 in a country, or your recipient's use of the covered work in a 960 country, would infringe one or more identifiable patents in that 961 country that you have reason to believe are valid. 962 963 If, pursuant to or in connection with a single transaction or 964 arrangement, you convey, or propagate by procuring conveyance of, a 965 covered work, and grant a patent license to some of the parties 966 receiving the covered work authorizing them to use, propagate, 967 modify or convey a specific copy of the covered work, then the 968 patent license you grant is automatically extended to all 969 recipients of the covered work and works based on it. 970 971 A patent license is "discriminatory" if it does not include within 972 the scope of its coverage, prohibits the exercise of, or is 973 conditioned on the non-exercise of one or more of the rights that 974 are specifically granted under this License. You may not convey a 975 covered work if you are a party to an arrangement with a third 976 party that is in the business of distributing software, under 977 which you make payment to the third party based on the extent of 978 your activity of conveying the work, and under which the third 979 party grants, to any of the parties who would receive the covered 980 work from you, a discriminatory patent license (a) in connection 981 with copies of the covered work conveyed by you (or copies made 982 from those copies), or (b) primarily for and in connection with 983 specific products or compilations that contain the covered work, 984 unless you entered into that arrangement, or that patent license 985 was granted, prior to 28 March 2007. 986 987 Nothing in this License shall be construed as excluding or limiting 988 any implied license or other defenses to infringement that may 989 otherwise be available to you under applicable patent law. 990 991 12. No Surrender of Others' Freedom. 992 993 If conditions are imposed on you (whether by court order, 994 agreement or otherwise) that contradict the conditions of this 995 License, they do not excuse you from the conditions of this 996 License. If you cannot convey a covered work so as to satisfy 997 simultaneously your obligations under this License and any other 998 pertinent obligations, then as a consequence you may not convey it 999 at all. For example, if you agree to terms that obligate you to 1000 collect a royalty for further conveying from those to whom you 1001 convey the Program, the only way you could satisfy both those 1002 terms and this License would be to refrain entirely from conveying 1003 the Program. 1004 1005 13. Use with the GNU Affero General Public License. 1006 1007 Notwithstanding any other provision of this License, you have 1008 permission to link or combine any covered work with a work licensed 1009 under version 3 of the GNU Affero General Public License into a 1010 single combined work, and to convey the resulting work. The terms 1011 of this License will continue to apply to the part which is the 1012 covered work, but the special requirements of the GNU Affero 1013 General Public License, section 13, concerning interaction through 1014 a network will apply to the combination as such. 1015 1016 14. Revised Versions of this License. 1017 1018 The Free Software Foundation may publish revised and/or new 1019 versions of the GNU General Public License from time to time. 1020 Such new versions will be similar in spirit to the present 1021 version, but may differ in detail to address new problems or 1022 concerns. 1023 1024 Each version is given a distinguishing version number. If the 1025 Program specifies that a certain numbered version of the GNU 1026 General Public License "or any later version" applies to it, you 1027 have the option of following the terms and conditions either of 1028 that numbered version or of any later version published by the 1029 Free Software Foundation. If the Program does not specify a 1030 version number of the GNU General Public License, you may choose 1031 any version ever published by the Free Software Foundation. 1032 1033 If the Program specifies that a proxy can decide which future 1034 versions of the GNU General Public License can be used, that 1035 proxy's public statement of acceptance of a version permanently 1036 authorizes you to choose that version for the Program. 1037 1038 Later license versions may give you additional or different 1039 permissions. However, no additional obligations are imposed on any 1040 author or copyright holder as a result of your choosing to follow a 1041 later version. 1042 1043 15. Disclaimer of Warranty. 1044 1045 THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY 1046 APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE 1047 COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" 1048 WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, 1049 INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 1050 MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE 1051 RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. 1052 SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL 1053 NECESSARY SERVICING, REPAIR OR CORRECTION. 1054 1055 16. Limitation of Liability. 1056 1057 IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN 1058 WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES 1059 AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU 1060 FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR 1061 CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE 1062 THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA 1063 BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD 1064 PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER 1065 PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF 1066 THE POSSIBILITY OF SUCH DAMAGES. 1067 1068 17. Interpretation of Sections 15 and 16. 1069 1070 If the disclaimer of warranty and limitation of liability provided 1071 above cannot be given local legal effect according to their terms, 1072 reviewing courts shall apply local law that most closely 1073 approximates an absolute waiver of all civil liability in 1074 connection with the Program, unless a warranty or assumption of 1075 liability accompanies a copy of the Program in return for a fee. 1076 1077 1078 END OF TERMS AND CONDITIONS 1079 =========================== 1080 1081 How to Apply These Terms to Your New Programs 1082 ============================================= 1083 1084 If you develop a new program, and you want it to be of the greatest 1085 possible use to the public, the best way to achieve this is to make it 1086 free software which everyone can redistribute and change under these 1087 terms. 1088 1089 To do so, attach the following notices to the program. It is safest 1090 to attach them to the start of each source file to most effectively 1091 state the exclusion of warranty; and each file should have at least the 1092 "copyright" line and a pointer to where the full notice is found. 1093 1094 ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES. 1095 Copyright (C) YEAR NAME OF AUTHOR 1096 1097 This program is free software: you can redistribute it and/or modify 1098 it under the terms of the GNU General Public License as published by 1099 the Free Software Foundation, either version 3 of the License, or (at 1100 your option) any later version. 1101 1102 This program is distributed in the hope that it will be useful, but 1103 WITHOUT ANY WARRANTY; without even the implied warranty of 1104 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 1105 General Public License for more details. 1106 1107 You should have received a copy of the GNU General Public License 1108 along with this program. If not, see `http://www.gnu.org/licenses/'. 1109 1110 Also add information on how to contact you by electronic and paper 1111 mail. 1112 1113 If the program does terminal interaction, make it output a short 1114 notice like this when it starts in an interactive mode: 1115 1116 PROGRAM Copyright (C) YEAR NAME OF AUTHOR 1117 This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 1118 This is free software, and you are welcome to redistribute it 1119 under certain conditions; type `show c' for details. 1120 1121 The hypothetical commands `show w' and `show c' should show the 1122 appropriate parts of the General Public License. Of course, your 1123 program's commands might be different; for a GUI interface, you would 1124 use an "about box". 1125 1126 You should also get your employer (if you work as a programmer) or 1127 school, if any, to sign a "copyright disclaimer" for the program, if 1128 necessary. For more information on this, and how to apply and follow 1129 the GNU GPL, see `http://www.gnu.org/licenses/'. 1130 1131 The GNU General Public License does not permit incorporating your 1132 program into proprietary programs. If your program is a subroutine 1133 library, you may consider it more useful to permit linking proprietary 1134 applications with the library. If this is what you want to do, use the 1135 GNU Lesser General Public License instead of this License. But first, 1136 please read `http://www.gnu.org/philosophy/why-not-lgpl.html'. 1137 1138 1139 File: bison.info, Node: Concepts, Next: Examples, Prev: Copying, Up: Top 1140 1141 1 The Concepts of Bison 1142 *********************** 1143 1144 This chapter introduces many of the basic concepts without which the 1145 details of Bison will not make sense. If you do not already know how to 1146 use Bison or Yacc, we suggest you start by reading this chapter 1147 carefully. 1148 1149 * Menu: 1150 1151 * Language and Grammar:: Languages and context-free grammars, 1152 as mathematical ideas. 1153 * Grammar in Bison:: How we represent grammars for Bison's sake. 1154 * Semantic Values:: Each token or syntactic grouping can have 1155 a semantic value (the value of an integer, 1156 the name of an identifier, etc.). 1157 * Semantic Actions:: Each rule can have an action containing C code. 1158 * GLR Parsers:: Writing parsers for general context-free languages. 1159 * Locations:: Overview of location tracking. 1160 * Bison Parser:: What are Bison's input and output, 1161 how is the output used? 1162 * Stages:: Stages in writing and running Bison grammars. 1163 * Grammar Layout:: Overall structure of a Bison grammar file. 1164 1165 1166 File: bison.info, Node: Language and Grammar, Next: Grammar in Bison, Up: Concepts 1167 1168 1.1 Languages and Context-Free Grammars 1169 ======================================= 1170 1171 In order for Bison to parse a language, it must be described by a 1172 "context-free grammar". This means that you specify one or more 1173 "syntactic groupings" and give rules for constructing them from their 1174 parts. For example, in the C language, one kind of grouping is called 1175 an `expression'. One rule for making an expression might be, "An 1176 expression can be made of a minus sign and another expression". 1177 Another would be, "An expression can be an integer". As you can see, 1178 rules are often recursive, but there must be at least one rule which 1179 leads out of the recursion. 1180 1181 The most common formal system for presenting such rules for humans 1182 to read is "Backus-Naur Form" or "BNF", which was developed in order to 1183 specify the language Algol 60. Any grammar expressed in BNF is a 1184 context-free grammar. The input to Bison is essentially 1185 machine-readable BNF. 1186 1187 There are various important subclasses of context-free grammars. 1188 Although it can handle almost all context-free grammars, Bison is 1189 optimized for what are called LR(1) grammars. In brief, in these 1190 grammars, it must be possible to tell how to parse any portion of an 1191 input string with just a single token of lookahead. For historical 1192 reasons, Bison by default is limited by the additional restrictions of 1193 LALR(1), which is hard to explain simply. *Note Mysterious 1194 Conflicts::, for more information on this. As an experimental feature, 1195 you can escape these additional restrictions by requesting IELR(1) or 1196 canonical LR(1) parser tables. *Note LR Table Construction::, to learn 1197 how. 1198 1199 Parsers for LR(1) grammars are "deterministic", meaning roughly that 1200 the next grammar rule to apply at any point in the input is uniquely 1201 determined by the preceding input and a fixed, finite portion (called a 1202 "lookahead") of the remaining input. A context-free grammar can be 1203 "ambiguous", meaning that there are multiple ways to apply the grammar 1204 rules to get the same inputs. Even unambiguous grammars can be 1205 "nondeterministic", meaning that no fixed lookahead always suffices to 1206 determine the next grammar rule to apply. With the proper 1207 declarations, Bison is also able to parse these more general 1208 context-free grammars, using a technique known as GLR parsing (for 1209 Generalized LR). Bison's GLR parsers are able to handle any 1210 context-free grammar for which the number of possible parses of any 1211 given string is finite. 1212 1213 In the formal grammatical rules for a language, each kind of 1214 syntactic unit or grouping is named by a "symbol". Those which are 1215 built by grouping smaller constructs according to grammatical rules are 1216 called "nonterminal symbols"; those which can't be subdivided are called 1217 "terminal symbols" or "token types". We call a piece of input 1218 corresponding to a single terminal symbol a "token", and a piece 1219 corresponding to a single nonterminal symbol a "grouping". 1220 1221 We can use the C language as an example of what symbols, terminal and 1222 nonterminal, mean. The tokens of C are identifiers, constants (numeric 1223 and string), and the various keywords, arithmetic operators and 1224 punctuation marks. So the terminal symbols of a grammar for C include 1225 `identifier', `number', `string', plus one symbol for each keyword, 1226 operator or punctuation mark: `if', `return', `const', `static', `int', 1227 `char', `plus-sign', `open-brace', `close-brace', `comma' and many more. 1228 (These tokens can be subdivided into characters, but that is a matter of 1229 lexicography, not grammar.) 1230 1231 Here is a simple C function subdivided into tokens: 1232 1233 int /* keyword `int' */ 1234 square (int x) /* identifier, open-paren, keyword `int', 1235 identifier, close-paren */ 1236 { /* open-brace */ 1237 return x * x; /* keyword `return', identifier, asterisk, 1238 identifier, semicolon */ 1239 } /* close-brace */ 1240 1241 The syntactic groupings of C include the expression, the statement, 1242 the declaration, and the function definition. These are represented in 1243 the grammar of C by nonterminal symbols `expression', `statement', 1244 `declaration' and `function definition'. The full grammar uses dozens 1245 of additional language constructs, each with its own nonterminal 1246 symbol, in order to express the meanings of these four. The example 1247 above is a function definition; it contains one declaration, and one 1248 statement. In the statement, each `x' is an expression and so is `x * 1249 x'. 1250 1251 Each nonterminal symbol must have grammatical rules showing how it 1252 is made out of simpler constructs. For example, one kind of C 1253 statement is the `return' statement; this would be described with a 1254 grammar rule which reads informally as follows: 1255 1256 A `statement' can be made of a `return' keyword, an `expression' 1257 and a `semicolon'. 1258 1259 There would be many other rules for `statement', one for each kind of 1260 statement in C. 1261 1262 One nonterminal symbol must be distinguished as the special one which 1263 defines a complete utterance in the language. It is called the "start 1264 symbol". In a compiler, this means a complete input program. In the C 1265 language, the nonterminal symbol `sequence of definitions and 1266 declarations' plays this role. 1267 1268 For example, `1 + 2' is a valid C expression--a valid part of a C 1269 program--but it is not valid as an _entire_ C program. In the 1270 context-free grammar of C, this follows from the fact that `expression' 1271 is not the start symbol. 1272 1273 The Bison parser reads a sequence of tokens as its input, and groups 1274 the tokens using the grammar rules. If the input is valid, the end 1275 result is that the entire token sequence reduces to a single grouping 1276 whose symbol is the grammar's start symbol. If we use a grammar for C, 1277 the entire input must be a `sequence of definitions and declarations'. 1278 If not, the parser reports a syntax error. 1279 1280 1281 File: bison.info, Node: Grammar in Bison, Next: Semantic Values, Prev: Language and Grammar, Up: Concepts 1282 1283 1.2 From Formal Rules to Bison Input 1284 ==================================== 1285 1286 A formal grammar is a mathematical construct. To define the language 1287 for Bison, you must write a file expressing the grammar in Bison syntax: 1288 a "Bison grammar" file. *Note Bison Grammar Files: Grammar File. 1289 1290 A nonterminal symbol in the formal grammar is represented in Bison 1291 input as an identifier, like an identifier in C. By convention, it 1292 should be in lower case, such as `expr', `stmt' or `declaration'. 1293 1294 The Bison representation for a terminal symbol is also called a 1295 "token type". Token types as well can be represented as C-like 1296 identifiers. By convention, these identifiers should be upper case to 1297 distinguish them from nonterminals: for example, `INTEGER', 1298 `IDENTIFIER', `IF' or `RETURN'. A terminal symbol that stands for a 1299 particular keyword in the language should be named after that keyword 1300 converted to upper case. The terminal symbol `error' is reserved for 1301 error recovery. *Note Symbols::. 1302 1303 A terminal symbol can also be represented as a character literal, 1304 just like a C character constant. You should do this whenever a token 1305 is just a single character (parenthesis, plus-sign, etc.): use that 1306 same character in a literal as the terminal symbol for that token. 1307 1308 A third way to represent a terminal symbol is with a C string 1309 constant containing several characters. *Note Symbols::, for more 1310 information. 1311 1312 The grammar rules also have an expression in Bison syntax. For 1313 example, here is the Bison rule for a C `return' statement. The 1314 semicolon in quotes is a literal character token, representing part of 1315 the C syntax for the statement; the naked semicolon, and the colon, are 1316 Bison punctuation used in every rule. 1317 1318 stmt: RETURN expr ';' ; 1319 1320 *Note Syntax of Grammar Rules: Rules. 1321 1322 1323 File: bison.info, Node: Semantic Values, Next: Semantic Actions, Prev: Grammar in Bison, Up: Concepts 1324 1325 1.3 Semantic Values 1326 =================== 1327 1328 A formal grammar selects tokens only by their classifications: for 1329 example, if a rule mentions the terminal symbol `integer constant', it 1330 means that _any_ integer constant is grammatically valid in that 1331 position. The precise value of the constant is irrelevant to how to 1332 parse the input: if `x+4' is grammatical then `x+1' or `x+3989' is 1333 equally grammatical. 1334 1335 But the precise value is very important for what the input means 1336 once it is parsed. A compiler is useless if it fails to distinguish 1337 between 4, 1 and 3989 as constants in the program! Therefore, each 1338 token in a Bison grammar has both a token type and a "semantic value". 1339 *Note Defining Language Semantics: Semantics, for details. 1340 1341 The token type is a terminal symbol defined in the grammar, such as 1342 `INTEGER', `IDENTIFIER' or `',''. It tells everything you need to know 1343 to decide where the token may validly appear and how to group it with 1344 other tokens. The grammar rules know nothing about tokens except their 1345 types. 1346 1347 The semantic value has all the rest of the information about the 1348 meaning of the token, such as the value of an integer, or the name of an 1349 identifier. (A token such as `','' which is just punctuation doesn't 1350 need to have any semantic value.) 1351 1352 For example, an input token might be classified as token type 1353 `INTEGER' and have the semantic value 4. Another input token might 1354 have the same token type `INTEGER' but value 3989. When a grammar rule 1355 says that `INTEGER' is allowed, either of these tokens is acceptable 1356 because each is an `INTEGER'. When the parser accepts the token, it 1357 keeps track of the token's semantic value. 1358 1359 Each grouping can also have a semantic value as well as its 1360 nonterminal symbol. For example, in a calculator, an expression 1361 typically has a semantic value that is a number. In a compiler for a 1362 programming language, an expression typically has a semantic value that 1363 is a tree structure describing the meaning of the expression. 1364 1365 1366 File: bison.info, Node: Semantic Actions, Next: GLR Parsers, Prev: Semantic Values, Up: Concepts 1367 1368 1.4 Semantic Actions 1369 ==================== 1370 1371 In order to be useful, a program must do more than parse input; it must 1372 also produce some output based on the input. In a Bison grammar, a 1373 grammar rule can have an "action" made up of C statements. Each time 1374 the parser recognizes a match for that rule, the action is executed. 1375 *Note Actions::. 1376 1377 Most of the time, the purpose of an action is to compute the 1378 semantic value of the whole construct from the semantic values of its 1379 parts. For example, suppose we have a rule which says an expression 1380 can be the sum of two expressions. When the parser recognizes such a 1381 sum, each of the subexpressions has a semantic value which describes 1382 how it was built up. The action for this rule should create a similar 1383 sort of value for the newly recognized larger expression. 1384 1385 For example, here is a rule that says an expression can be the sum of 1386 two subexpressions: 1387 1388 expr: expr '+' expr { $$ = $1 + $3; } ; 1389 1390 The action says how to produce the semantic value of the sum expression 1391 from the values of the two subexpressions. 1392 1393 1394 File: bison.info, Node: GLR Parsers, Next: Locations, Prev: Semantic Actions, Up: Concepts 1395 1396 1.5 Writing GLR Parsers 1397 ======================= 1398 1399 In some grammars, Bison's deterministic LR(1) parsing algorithm cannot 1400 decide whether to apply a certain grammar rule at a given point. That 1401 is, it may not be able to decide (on the basis of the input read so 1402 far) which of two possible reductions (applications of a grammar rule) 1403 applies, or whether to apply a reduction or read more of the input and 1404 apply a reduction later in the input. These are known respectively as 1405 "reduce/reduce" conflicts (*note Reduce/Reduce::), and "shift/reduce" 1406 conflicts (*note Shift/Reduce::). 1407 1408 To use a grammar that is not easily modified to be LR(1), a more 1409 general parsing algorithm is sometimes necessary. If you include 1410 `%glr-parser' among the Bison declarations in your file (*note Grammar 1411 Outline::), the result is a Generalized LR (GLR) parser. These parsers 1412 handle Bison grammars that contain no unresolved conflicts (i.e., after 1413 applying precedence declarations) identically to deterministic parsers. 1414 However, when faced with unresolved shift/reduce and reduce/reduce 1415 conflicts, GLR parsers use the simple expedient of doing both, 1416 effectively cloning the parser to follow both possibilities. Each of 1417 the resulting parsers can again split, so that at any given time, there 1418 can be any number of possible parses being explored. The parsers 1419 proceed in lockstep; that is, all of them consume (shift) a given input 1420 symbol before any of them proceed to the next. Each of the cloned 1421 parsers eventually meets one of two possible fates: either it runs into 1422 a parsing error, in which case it simply vanishes, or it merges with 1423 another parser, because the two of them have reduced the input to an 1424 identical set of symbols. 1425 1426 During the time that there are multiple parsers, semantic actions are 1427 recorded, but not performed. When a parser disappears, its recorded 1428 semantic actions disappear as well, and are never performed. When a 1429 reduction makes two parsers identical, causing them to merge, Bison 1430 records both sets of semantic actions. Whenever the last two parsers 1431 merge, reverting to the single-parser case, Bison resolves all the 1432 outstanding actions either by precedences given to the grammar rules 1433 involved, or by performing both actions, and then calling a designated 1434 user-defined function on the resulting values to produce an arbitrary 1435 merged result. 1436 1437 * Menu: 1438 1439 * Simple GLR Parsers:: Using GLR parsers on unambiguous grammars. 1440 * Merging GLR Parses:: Using GLR parsers to resolve ambiguities. 1441 * GLR Semantic Actions:: Deferred semantic actions have special concerns. 1442 * Compiler Requirements:: GLR parsers require a modern C compiler. 1443 1444 1445 File: bison.info, Node: Simple GLR Parsers, Next: Merging GLR Parses, Up: GLR Parsers 1446 1447 1.5.1 Using GLR on Unambiguous Grammars 1448 --------------------------------------- 1449 1450 In the simplest cases, you can use the GLR algorithm to parse grammars 1451 that are unambiguous but fail to be LR(1). Such grammars typically 1452 require more than one symbol of lookahead. 1453 1454 Consider a problem that arises in the declaration of enumerated and 1455 subrange types in the programming language Pascal. Here are some 1456 examples: 1457 1458 type subrange = lo .. hi; 1459 type enum = (a, b, c); 1460 1461 The original language standard allows only numeric literals and 1462 constant identifiers for the subrange bounds (`lo' and `hi'), but 1463 Extended Pascal (ISO/IEC 10206) and many other Pascal implementations 1464 allow arbitrary expressions there. This gives rise to the following 1465 situation, containing a superfluous pair of parentheses: 1466 1467 type subrange = (a) .. b; 1468 1469 Compare this to the following declaration of an enumerated type with 1470 only one value: 1471 1472 type enum = (a); 1473 1474 (These declarations are contrived, but they are syntactically valid, 1475 and more-complicated cases can come up in practical programs.) 1476 1477 These two declarations look identical until the `..' token. With 1478 normal LR(1) one-token lookahead it is not possible to decide between 1479 the two forms when the identifier `a' is parsed. It is, however, 1480 desirable for a parser to decide this, since in the latter case `a' 1481 must become a new identifier to represent the enumeration value, while 1482 in the former case `a' must be evaluated with its current meaning, 1483 which may be a constant or even a function call. 1484 1485 You could parse `(a)' as an "unspecified identifier in parentheses", 1486 to be resolved later, but this typically requires substantial 1487 contortions in both semantic actions and large parts of the grammar, 1488 where the parentheses are nested in the recursive rules for expressions. 1489 1490 You might think of using the lexer to distinguish between the two 1491 forms by returning different tokens for currently defined and undefined 1492 identifiers. But if these declarations occur in a local scope, and `a' 1493 is defined in an outer scope, then both forms are possible--either 1494 locally redefining `a', or using the value of `a' from the outer scope. 1495 So this approach cannot work. 1496 1497 A simple solution to this problem is to declare the parser to use 1498 the GLR algorithm. When the GLR parser reaches the critical state, it 1499 merely splits into two branches and pursues both syntax rules 1500 simultaneously. Sooner or later, one of them runs into a parsing 1501 error. If there is a `..' token before the next `;', the rule for 1502 enumerated types fails since it cannot accept `..' anywhere; otherwise, 1503 the subrange type rule fails since it requires a `..' token. So one of 1504 the branches fails silently, and the other one continues normally, 1505 performing all the intermediate actions that were postponed during the 1506 split. 1507 1508 If the input is syntactically incorrect, both branches fail and the 1509 parser reports a syntax error as usual. 1510 1511 The effect of all this is that the parser seems to "guess" the 1512 correct branch to take, or in other words, it seems to use more 1513 lookahead than the underlying LR(1) algorithm actually allows for. In 1514 this example, LR(2) would suffice, but also some cases that are not 1515 LR(k) for any k can be handled this way. 1516 1517 In general, a GLR parser can take quadratic or cubic worst-case time, 1518 and the current Bison parser even takes exponential time and space for 1519 some grammars. In practice, this rarely happens, and for many grammars 1520 it is possible to prove that it cannot happen. The present example 1521 contains only one conflict between two rules, and the type-declaration 1522 context containing the conflict cannot be nested. So the number of 1523 branches that can exist at any time is limited by the constant 2, and 1524 the parsing time is still linear. 1525 1526 Here is a Bison grammar corresponding to the example above. It 1527 parses a vastly simplified form of Pascal type declarations. 1528 1529 %token TYPE DOTDOT ID 1530 1531 %left '+' '-' 1532 %left '*' '/' 1533 1534 %% 1535 1536 type_decl: TYPE ID '=' type ';' ; 1537 1538 type: 1539 '(' id_list ')' 1540 | expr DOTDOT expr 1541 ; 1542 1543 id_list: 1544 ID 1545 | id_list ',' ID 1546 ; 1547 1548 expr: 1549 '(' expr ')' 1550 | expr '+' expr 1551 | expr '-' expr 1552 | expr '*' expr 1553 | expr '/' expr 1554 | ID 1555 ; 1556 1557 When used as a normal LR(1) grammar, Bison correctly complains about 1558 one reduce/reduce conflict. In the conflicting situation the parser 1559 chooses one of the alternatives, arbitrarily the one declared first. 1560 Therefore the following correct input is not recognized: 1561 1562 type t = (a) .. b; 1563 1564 The parser can be turned into a GLR parser, while also telling Bison 1565 to be silent about the one known reduce/reduce conflict, by adding 1566 these two declarations to the Bison grammar file (before the first 1567 `%%'): 1568 1569 %glr-parser 1570 %expect-rr 1 1571 1572 No change in the grammar itself is required. Now the parser recognizes 1573 all valid declarations, according to the limited syntax above, 1574 transparently. In fact, the user does not even notice when the parser 1575 splits. 1576 1577 So here we have a case where we can use the benefits of GLR, almost 1578 without disadvantages. Even in simple cases like this, however, there 1579 are at least two potential problems to beware. First, always analyze 1580 the conflicts reported by Bison to make sure that GLR splitting is only 1581 done where it is intended. A GLR parser splitting inadvertently may 1582 cause problems less obvious than an LR parser statically choosing the 1583 wrong alternative in a conflict. Second, consider interactions with 1584 the lexer (*note Semantic Tokens::) with great care. Since a split 1585 parser consumes tokens without performing any actions during the split, 1586 the lexer cannot obtain information via parser actions. Some cases of 1587 lexer interactions can be eliminated by using GLR to shift the 1588 complications from the lexer to the parser. You must check the 1589 remaining cases for correctness. 1590 1591 In our example, it would be safe for the lexer to return tokens 1592 based on their current meanings in some symbol table, because no new 1593 symbols are defined in the middle of a type declaration. Though it is 1594 possible for a parser to define the enumeration constants as they are 1595 parsed, before the type declaration is completed, it actually makes no 1596 difference since they cannot be used within the same enumerated type 1597 declaration. 1598 1599 1600 File: bison.info, Node: Merging GLR Parses, Next: GLR Semantic Actions, Prev: Simple GLR Parsers, Up: GLR Parsers 1601 1602 1.5.2 Using GLR to Resolve Ambiguities 1603 -------------------------------------- 1604 1605 Let's consider an example, vastly simplified from a C++ grammar. 1606 1607 %{ 1608 #include <stdio.h> 1609 #define YYSTYPE char const * 1610 int yylex (void); 1611 void yyerror (char const *); 1612 %} 1613 1614 %token TYPENAME ID 1615 1616 %right '=' 1617 %left '+' 1618 1619 %glr-parser 1620 1621 %% 1622 1623 prog: 1624 /* Nothing. */ 1625 | prog stmt { printf ("\n"); } 1626 ; 1627 1628 stmt: 1629 expr ';' %dprec 1 1630 | decl %dprec 2 1631 ; 1632 1633 expr: 1634 ID { printf ("%s ", $$); } 1635 | TYPENAME '(' expr ')' 1636 { printf ("%s <cast> ", $1); } 1637 | expr '+' expr { printf ("+ "); } 1638 | expr '=' expr { printf ("= "); } 1639 ; 1640 1641 decl: 1642 TYPENAME declarator ';' 1643 { printf ("%s <declare> ", $1); } 1644 | TYPENAME declarator '=' expr ';' 1645 { printf ("%s <init-declare> ", $1); } 1646 ; 1647 1648 declarator: 1649 ID { printf ("\"%s\" ", $1); } 1650 | '(' declarator ')' 1651 ; 1652 1653 This models a problematic part of the C++ grammar--the ambiguity between 1654 certain declarations and statements. For example, 1655 1656 T (x) = y+z; 1657 1658 parses as either an `expr' or a `stmt' (assuming that `T' is recognized 1659 as a `TYPENAME' and `x' as an `ID'). Bison detects this as a 1660 reduce/reduce conflict between the rules `expr : ID' and `declarator : 1661 ID', which it cannot resolve at the time it encounters `x' in the 1662 example above. Since this is a GLR parser, it therefore splits the 1663 problem into two parses, one for each choice of resolving the 1664 reduce/reduce conflict. Unlike the example from the previous section 1665 (*note Simple GLR Parsers::), however, neither of these parses "dies," 1666 because the grammar as it stands is ambiguous. One of the parsers 1667 eventually reduces `stmt : expr ';'' and the other reduces `stmt : 1668 decl', after which both parsers are in an identical state: they've seen 1669 `prog stmt' and have the same unprocessed input remaining. We say that 1670 these parses have "merged." 1671 1672 At this point, the GLR parser requires a specification in the 1673 grammar of how to choose between the competing parses. In the example 1674 above, the two `%dprec' declarations specify that Bison is to give 1675 precedence to the parse that interprets the example as a `decl', which 1676 implies that `x' is a declarator. The parser therefore prints 1677 1678 "x" y z + T <init-declare> 1679 1680 The `%dprec' declarations only come into play when more than one 1681 parse survives. Consider a different input string for this parser: 1682 1683 T (x) + y; 1684 1685 This is another example of using GLR to parse an unambiguous construct, 1686 as shown in the previous section (*note Simple GLR Parsers::). Here, 1687 there is no ambiguity (this cannot be parsed as a declaration). 1688 However, at the time the Bison parser encounters `x', it does not have 1689 enough information to resolve the reduce/reduce conflict (again, 1690 between `x' as an `expr' or a `declarator'). In this case, no 1691 precedence declaration is used. Again, the parser splits into two, one 1692 assuming that `x' is an `expr', and the other assuming `x' is a 1693 `declarator'. The second of these parsers then vanishes when it sees 1694 `+', and the parser prints 1695 1696 x T <cast> y + 1697 1698 Suppose that instead of resolving the ambiguity, you wanted to see 1699 all the possibilities. For this purpose, you must merge the semantic 1700 actions of the two possible parsers, rather than choosing one over the 1701 other. To do so, you could change the declaration of `stmt' as follows: 1702 1703 stmt: 1704 expr ';' %merge <stmtMerge> 1705 | decl %merge <stmtMerge> 1706 ; 1707 1708 and define the `stmtMerge' function as: 1709 1710 static YYSTYPE 1711 stmtMerge (YYSTYPE x0, YYSTYPE x1) 1712 { 1713 printf ("<OR> "); 1714 return ""; 1715 } 1716 1717 with an accompanying forward declaration in the C declarations at the 1718 beginning of the file: 1719 1720 %{ 1721 #define YYSTYPE char const * 1722 static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1); 1723 %} 1724 1725 With these declarations, the resulting parser parses the first example 1726 as both an `expr' and a `decl', and prints 1727 1728 "x" y z + T <init-declare> x T <cast> y z + = <OR> 1729 1730 Bison requires that all of the productions that participate in any 1731 particular merge have identical `%merge' clauses. Otherwise, the 1732 ambiguity would be unresolvable, and the parser will report an error 1733 during any parse that results in the offending merge. 1734 1735 1736 File: bison.info, Node: GLR Semantic Actions, Next: Compiler Requirements, Prev: Merging GLR Parses, Up: GLR Parsers 1737 1738 1.5.3 GLR Semantic Actions 1739 -------------------------- 1740 1741 By definition, a deferred semantic action is not performed at the same 1742 time as the associated reduction. This raises caveats for several 1743 Bison features you might use in a semantic action in a GLR parser. 1744 1745 In any semantic action, you can examine `yychar' to determine the 1746 type of the lookahead token present at the time of the associated 1747 reduction. After checking that `yychar' is not set to `YYEMPTY' or 1748 `YYEOF', you can then examine `yylval' and `yylloc' to determine the 1749 lookahead token's semantic value and location, if any. In a 1750 nondeferred semantic action, you can also modify any of these variables 1751 to influence syntax analysis. *Note Lookahead Tokens: Lookahead. 1752 1753 In a deferred semantic action, it's too late to influence syntax 1754 analysis. In this case, `yychar', `yylval', and `yylloc' are set to 1755 shallow copies of the values they had at the time of the associated 1756 reduction. For this reason alone, modifying them is dangerous. 1757 Moreover, the result of modifying them is undefined and subject to 1758 change with future versions of Bison. For example, if a semantic 1759 action might be deferred, you should never write it to invoke 1760 `yyclearin' (*note Action Features::) or to attempt to free memory 1761 referenced by `yylval'. 1762 1763 Another Bison feature requiring special consideration is `YYERROR' 1764 (*note Action Features::), which you can invoke in a semantic action to 1765 initiate error recovery. During deterministic GLR operation, the 1766 effect of `YYERROR' is the same as its effect in a deterministic parser. 1767 In a deferred semantic action, its effect is undefined. 1768 1769 Also, see *note Default Action for Locations: Location Default 1770 Action, which describes a special usage of `YYLLOC_DEFAULT' in GLR 1771 parsers. 1772 1773 1774 File: bison.info, Node: Compiler Requirements, Prev: GLR Semantic Actions, Up: GLR Parsers 1775 1776 1.5.4 Considerations when Compiling GLR Parsers 1777 ----------------------------------------------- 1778 1779 The GLR parsers require a compiler for ISO C89 or later. In addition, 1780 they use the `inline' keyword, which is not C89, but is C99 and is a 1781 common extension in pre-C99 compilers. It is up to the user of these 1782 parsers to handle portability issues. For instance, if using Autoconf 1783 and the Autoconf macro `AC_C_INLINE', a mere 1784 1785 %{ 1786 #include <config.h> 1787 %} 1788 1789 will suffice. Otherwise, we suggest 1790 1791 %{ 1792 #if (__STDC_VERSION__ < 199901 && ! defined __GNUC__ \ 1793 && ! defined inline) 1794 # define inline 1795 #endif 1796 %} 1797 1798 1799 File: bison.info, Node: Locations, Next: Bison Parser, Prev: GLR Parsers, Up: Concepts 1800 1801 1.6 Locations 1802 ============= 1803 1804 Many applications, like interpreters or compilers, have to produce 1805 verbose and useful error messages. To achieve this, one must be able 1806 to keep track of the "textual location", or "location", of each 1807 syntactic construct. Bison provides a mechanism for handling these 1808 locations. 1809 1810 Each token has a semantic value. In a similar fashion, each token 1811 has an associated location, but the type of locations is the same for 1812 all tokens and groupings. Moreover, the output parser is equipped with 1813 a default data structure for storing locations (*note Tracking 1814 Locations::, for more details). 1815 1816 Like semantic values, locations can be reached in actions using a 1817 dedicated set of constructs. In the example above, the location of the 1818 whole grouping is `@$', while the locations of the subexpressions are 1819 `@1' and `@3'. 1820 1821 When a rule is matched, a default action is used to compute the 1822 semantic value of its left hand side (*note Actions::). In the same 1823 way, another default action is used for locations. However, the action 1824 for locations is general enough for most cases, meaning there is 1825 usually no need to describe for each rule how `@$' should be formed. 1826 When building a new location for a given grouping, the default behavior 1827 of the output parser is to take the beginning of the first symbol, and 1828 the end of the last symbol. 1829 1830 1831 File: bison.info, Node: Bison Parser, Next: Stages, Prev: Locations, Up: Concepts 1832 1833 1.7 Bison Output: the Parser Implementation File 1834 ================================================ 1835 1836 When you run Bison, you give it a Bison grammar file as input. The 1837 most important output is a C source file that implements a parser for 1838 the language described by the grammar. This parser is called a "Bison 1839 parser", and this file is called a "Bison parser implementation file". 1840 Keep in mind that the Bison utility and the Bison parser are two 1841 distinct programs: the Bison utility is a program whose output is the 1842 Bison parser implementation file that becomes part of your program. 1843 1844 The job of the Bison parser is to group tokens into groupings 1845 according to the grammar rules--for example, to build identifiers and 1846 operators into expressions. As it does this, it runs the actions for 1847 the grammar rules it uses. 1848 1849 The tokens come from a function called the "lexical analyzer" that 1850 you must supply in some fashion (such as by writing it in C). The Bison 1851 parser calls the lexical analyzer each time it wants a new token. It 1852 doesn't know what is "inside" the tokens (though their semantic values 1853 may reflect this). Typically the lexical analyzer makes the tokens by 1854 parsing characters of text, but Bison does not depend on this. *Note 1855 The Lexical Analyzer Function `yylex': Lexical. 1856 1857 The Bison parser implementation file is C code which defines a 1858 function named `yyparse' which implements that grammar. This function 1859 does not make a complete C program: you must supply some additional 1860 functions. One is the lexical analyzer. Another is an error-reporting 1861 function which the parser calls to report an error. In addition, a 1862 complete C program must start with a function called `main'; you have 1863 to provide this, and arrange for it to call `yyparse' or the parser 1864 will never run. *Note Parser C-Language Interface: Interface. 1865 1866 Aside from the token type names and the symbols in the actions you 1867 write, all symbols defined in the Bison parser implementation file 1868 itself begin with `yy' or `YY'. This includes interface functions such 1869 as the lexical analyzer function `yylex', the error reporting function 1870 `yyerror' and the parser function `yyparse' itself. This also includes 1871 numerous identifiers used for internal purposes. Therefore, you should 1872 avoid using C identifiers starting with `yy' or `YY' in the Bison 1873 grammar file except for the ones defined in this manual. Also, you 1874 should avoid using the C identifiers `malloc' and `free' for anything 1875 other than their usual meanings. 1876 1877 In some cases the Bison parser implementation file includes system 1878 headers, and in those cases your code should respect the identifiers 1879 reserved by those headers. On some non-GNU hosts, `<alloca.h>', 1880 `<malloc.h>', `<stddef.h>', and `<stdlib.h>' are included as needed to 1881 declare memory allocators and related types. `<libintl.h>' is included 1882 if message translation is in use (*note Internationalization::). Other 1883 system headers may be included if you define `YYDEBUG' to a nonzero 1884 value (*note Tracing Your Parser: Tracing.). 1885 1886 1887 File: bison.info, Node: Stages, Next: Grammar Layout, Prev: Bison Parser, Up: Concepts 1888 1889 1.8 Stages in Using Bison 1890 ========================= 1891 1892 The actual language-design process using Bison, from grammar 1893 specification to a working compiler or interpreter, has these parts: 1894 1895 1. Formally specify the grammar in a form recognized by Bison (*note 1896 Bison Grammar Files: Grammar File.). For each grammatical rule in 1897 the language, describe the action that is to be taken when an 1898 instance of that rule is recognized. The action is described by a 1899 sequence of C statements. 1900 1901 2. Write a lexical analyzer to process input and pass tokens to the 1902 parser. The lexical analyzer may be written by hand in C (*note 1903 The Lexical Analyzer Function `yylex': Lexical.). It could also 1904 be produced using Lex, but the use of Lex is not discussed in this 1905 manual. 1906 1907 3. Write a controlling function that calls the Bison-produced parser. 1908 1909 4. Write error-reporting routines. 1910 1911 To turn this source code as written into a runnable program, you 1912 must follow these steps: 1913 1914 1. Run Bison on the grammar to produce the parser. 1915 1916 2. Compile the code output by Bison, as well as any other source 1917 files. 1918 1919 3. Link the object files to produce the finished product. 1920 1921 1922 File: bison.info, Node: Grammar Layout, Prev: Stages, Up: Concepts 1923 1924 1.9 The Overall Layout of a Bison Grammar 1925 ========================================= 1926 1927 The input file for the Bison utility is a "Bison grammar file". The 1928 general form of a Bison grammar file is as follows: 1929 1930 %{ 1931 PROLOGUE 1932 %} 1933 1934 BISON DECLARATIONS 1935 1936 %% 1937 GRAMMAR RULES 1938 %% 1939 EPILOGUE 1940 1941 The `%%', `%{' and `%}' are punctuation that appears in every Bison 1942 grammar file to separate the sections. 1943 1944 The prologue may define types and variables used in the actions. 1945 You can also use preprocessor commands to define macros used there, and 1946 use `#include' to include header files that do any of these things. 1947 You need to declare the lexical analyzer `yylex' and the error printer 1948 `yyerror' here, along with any other global identifiers used by the 1949 actions in the grammar rules. 1950 1951 The Bison declarations declare the names of the terminal and 1952 nonterminal symbols, and may also describe operator precedence and the 1953 data types of semantic values of various symbols. 1954 1955 The grammar rules define how to construct each nonterminal symbol 1956 from its parts. 1957 1958 The epilogue can contain any code you want to use. Often the 1959 definitions of functions declared in the prologue go here. In a simple 1960 program, all the rest of the program can go here. 1961 1962 1963 File: bison.info, Node: Examples, Next: Grammar File, Prev: Concepts, Up: Top 1964 1965 2 Examples 1966 ********** 1967 1968 Now we show and explain several sample programs written using Bison: a 1969 reverse polish notation calculator, an algebraic (infix) notation 1970 calculator -- later extended to track "locations" -- and a 1971 multi-function calculator. All produce usable, though limited, 1972 interactive desk-top calculators. 1973 1974 These examples are simple, but Bison grammars for real programming 1975 languages are written the same way. You can copy these examples into a 1976 source file to try them. 1977 1978 * Menu: 1979 1980 * RPN Calc:: Reverse polish notation calculator; 1981 a first example with no operator precedence. 1982 * Infix Calc:: Infix (algebraic) notation calculator. 1983 Operator precedence is introduced. 1984 * Simple Error Recovery:: Continuing after syntax errors. 1985 * Location Tracking Calc:: Demonstrating the use of @N and @$. 1986 * Multi-function Calc:: Calculator with memory and trig functions. 1987 It uses multiple data-types for semantic values. 1988 * Exercises:: Ideas for improving the multi-function calculator. 1989 1990 1991 File: bison.info, Node: RPN Calc, Next: Infix Calc, Up: Examples 1992 1993 2.1 Reverse Polish Notation Calculator 1994 ====================================== 1995 1996 The first example is that of a simple double-precision "reverse polish 1997 notation" calculator (a calculator using postfix operators). This 1998 example provides a good starting point, since operator precedence is 1999 not an issue. The second example will illustrate how operator 2000 precedence is handled. 2001 2002 The source code for this calculator is named `rpcalc.y'. The `.y' 2003 extension is a convention used for Bison grammar files. 2004 2005 * Menu: 2006 2007 * Rpcalc Declarations:: Prologue (declarations) for rpcalc. 2008 * Rpcalc Rules:: Grammar Rules for rpcalc, with explanation. 2009 * Rpcalc Lexer:: The lexical analyzer. 2010 * Rpcalc Main:: The controlling function. 2011 * Rpcalc Error:: The error reporting function. 2012 * Rpcalc Generate:: Running Bison on the grammar file. 2013 * Rpcalc Compile:: Run the C compiler on the output code. 2014 2015 2016 File: bison.info, Node: Rpcalc Declarations, Next: Rpcalc Rules, Up: RPN Calc 2017 2018 2.1.1 Declarations for `rpcalc' 2019 ------------------------------- 2020 2021 Here are the C and Bison declarations for the reverse polish notation 2022 calculator. As in C, comments are placed between `/*...*/'. 2023 2024 /* Reverse polish notation calculator. */ 2025 2026 %{ 2027 #define YYSTYPE double 2028 #include <math.h> 2029 int yylex (void); 2030 void yyerror (char const *); 2031 %} 2032 2033 %token NUM 2034 2035 %% /* Grammar rules and actions follow. */ 2036 2037 The declarations section (*note The prologue: Prologue.) contains two 2038 preprocessor directives and two forward declarations. 2039 2040 The `#define' directive defines the macro `YYSTYPE', thus specifying 2041 the C data type for semantic values of both tokens and groupings (*note 2042 Data Types of Semantic Values: Value Type.). The Bison parser will use 2043 whatever type `YYSTYPE' is defined as; if you don't define it, `int' is 2044 the default. Because we specify `double', each token and each 2045 expression has an associated value, which is a floating point number. 2046 2047 The `#include' directive is used to declare the exponentiation 2048 function `pow'. 2049 2050 The forward declarations for `yylex' and `yyerror' are needed 2051 because the C language requires that functions be declared before they 2052 are used. These functions will be defined in the epilogue, but the 2053 parser calls them so they must be declared in the prologue. 2054 2055 The second section, Bison declarations, provides information to Bison 2056 about the token types (*note The Bison Declarations Section: Bison 2057 Declarations.). Each terminal symbol that is not a single-character 2058 literal must be declared here. (Single-character literals normally 2059 don't need to be declared.) In this example, all the arithmetic 2060 operators are designated by single-character literals, so the only 2061 terminal symbol that needs to be declared is `NUM', the token type for 2062 numeric constants. 2063 2064 2065 File: bison.info, Node: Rpcalc Rules, Next: Rpcalc Lexer, Prev: Rpcalc Declarations, Up: RPN Calc 2066 2067 2.1.2 Grammar Rules for `rpcalc' 2068 -------------------------------- 2069 2070 Here are the grammar rules for the reverse polish notation calculator. 2071 2072 input: 2073 /* empty */ 2074 | input line 2075 ; 2076 2077 line: 2078 '\n' 2079 | exp '\n' { printf ("%.10g\n", $1); } 2080 ; 2081 2082 exp: 2083 NUM { $$ = $1; } 2084 | exp exp '+' { $$ = $1 + $2; } 2085 | exp exp '-' { $$ = $1 - $2; } 2086 | exp exp '*' { $$ = $1 * $2; } 2087 | exp exp '/' { $$ = $1 / $2; } 2088 | exp exp '^' { $$ = pow ($1, $2); } /* Exponentiation */ 2089 | exp 'n' { $$ = -$1; } /* Unary minus */ 2090 ; 2091 %% 2092 2093 The groupings of the rpcalc "language" defined here are the 2094 expression (given the name `exp'), the line of input (`line'), and the 2095 complete input transcript (`input'). Each of these nonterminal symbols 2096 has several alternate rules, joined by the vertical bar `|' which is 2097 read as "or". The following sections explain what these rules mean. 2098 2099 The semantics of the language is determined by the actions taken 2100 when a grouping is recognized. The actions are the C code that appears 2101 inside braces. *Note Actions::. 2102 2103 You must specify these actions in C, but Bison provides the means for 2104 passing semantic values between the rules. In each action, the 2105 pseudo-variable `$$' stands for the semantic value for the grouping 2106 that the rule is going to construct. Assigning a value to `$$' is the 2107 main job of most actions. The semantic values of the components of the 2108 rule are referred to as `$1', `$2', and so on. 2109 2110 * Menu: 2111 2112 * Rpcalc Input:: 2113 * Rpcalc Line:: 2114 * Rpcalc Expr:: 2115 2116 2117 File: bison.info, Node: Rpcalc Input, Next: Rpcalc Line, Up: Rpcalc Rules 2118 2119 2.1.2.1 Explanation of `input' 2120 .............................. 2121 2122 Consider the definition of `input': 2123 2124 input: 2125 /* empty */ 2126 | input line 2127 ; 2128 2129 This definition reads as follows: "A complete input is either an 2130 empty string, or a complete input followed by an input line". Notice 2131 that "complete input" is defined in terms of itself. This definition 2132 is said to be "left recursive" since `input' appears always as the 2133 leftmost symbol in the sequence. *Note Recursive Rules: Recursion. 2134 2135 The first alternative is empty because there are no symbols between 2136 the colon and the first `|'; this means that `input' can match an empty 2137 string of input (no tokens). We write the rules this way because it is 2138 legitimate to type `Ctrl-d' right after you start the calculator. It's 2139 conventional to put an empty alternative first and write the comment 2140 `/* empty */' in it. 2141 2142 The second alternate rule (`input line') handles all nontrivial 2143 input. It means, "After reading any number of lines, read one more 2144 line if possible." The left recursion makes this rule into a loop. 2145 Since the first alternative matches empty input, the loop can be 2146 executed zero or more times. 2147 2148 The parser function `yyparse' continues to process input until a 2149 grammatical error is seen or the lexical analyzer says there are no more 2150 input tokens; we will arrange for the latter to happen at end-of-input. 2151 2152 2153 File: bison.info, Node: Rpcalc Line, Next: Rpcalc Expr, Prev: Rpcalc Input, Up: Rpcalc Rules 2154 2155 2.1.2.2 Explanation of `line' 2156 ............................. 2157 2158 Now consider the definition of `line': 2159 2160 line: 2161 '\n' 2162 | exp '\n' { printf ("%.10g\n", $1); } 2163 ; 2164 2165 The first alternative is a token which is a newline character; this 2166 means that rpcalc accepts a blank line (and ignores it, since there is 2167 no action). The second alternative is an expression followed by a 2168 newline. This is the alternative that makes rpcalc useful. The 2169 semantic value of the `exp' grouping is the value of `$1' because the 2170 `exp' in question is the first symbol in the alternative. The action 2171 prints this value, which is the result of the computation the user 2172 asked for. 2173 2174 This action is unusual because it does not assign a value to `$$'. 2175 As a consequence, the semantic value associated with the `line' is 2176 uninitialized (its value will be unpredictable). This would be a bug if 2177 that value were ever used, but we don't use it: once rpcalc has printed 2178 the value of the user's input line, that value is no longer needed. 2179 2180 2181 File: bison.info, Node: Rpcalc Expr, Prev: Rpcalc Line, Up: Rpcalc Rules 2182 2183 2.1.2.3 Explanation of `expr' 2184 ............................. 2185 2186 The `exp' grouping has several rules, one for each kind of expression. 2187 The first rule handles the simplest expressions: those that are just 2188 numbers. The second handles an addition-expression, which looks like 2189 two expressions followed by a plus-sign. The third handles 2190 subtraction, and so on. 2191 2192 exp: 2193 NUM 2194 | exp exp '+' { $$ = $1 + $2; } 2195 | exp exp '-' { $$ = $1 - $2; } 2196 ... 2197 ; 2198 2199 We have used `|' to join all the rules for `exp', but we could 2200 equally well have written them separately: 2201 2202 exp: NUM ; 2203 exp: exp exp '+' { $$ = $1 + $2; }; 2204 exp: exp exp '-' { $$ = $1 - $2; }; 2205 ... 2206 2207 Most of the rules have actions that compute the value of the 2208 expression in terms of the value of its parts. For example, in the 2209 rule for addition, `$1' refers to the first component `exp' and `$2' 2210 refers to the second one. The third component, `'+'', has no meaningful 2211 associated semantic value, but if it had one you could refer to it as 2212 `$3'. When `yyparse' recognizes a sum expression using this rule, the 2213 sum of the two subexpressions' values is produced as the value of the 2214 entire expression. *Note Actions::. 2215 2216 You don't have to give an action for every rule. When a rule has no 2217 action, Bison by default copies the value of `$1' into `$$'. This is 2218 what happens in the first rule (the one that uses `NUM'). 2219 2220 The formatting shown here is the recommended convention, but Bison 2221 does not require it. You can add or change white space as much as you 2222 wish. For example, this: 2223 2224 exp: NUM | exp exp '+' {$$ = $1 + $2; } | ... ; 2225 2226 means the same thing as this: 2227 2228 exp: 2229 NUM 2230 | exp exp '+' { $$ = $1 + $2; } 2231 | ... 2232 ; 2233 2234 The latter, however, is much more readable. 2235 2236 2237 File: bison.info, Node: Rpcalc Lexer, Next: Rpcalc Main, Prev: Rpcalc Rules, Up: RPN Calc 2238 2239 2.1.3 The `rpcalc' Lexical Analyzer 2240 ----------------------------------- 2241 2242 The lexical analyzer's job is low-level parsing: converting characters 2243 or sequences of characters into tokens. The Bison parser gets its 2244 tokens by calling the lexical analyzer. *Note The Lexical Analyzer 2245 Function `yylex': Lexical. 2246 2247 Only a simple lexical analyzer is needed for the RPN calculator. 2248 This lexical analyzer skips blanks and tabs, then reads in numbers as 2249 `double' and returns them as `NUM' tokens. Any other character that 2250 isn't part of a number is a separate token. Note that the token-code 2251 for such a single-character token is the character itself. 2252 2253 The return value of the lexical analyzer function is a numeric code 2254 which represents a token type. The same text used in Bison rules to 2255 stand for this token type is also a C expression for the numeric code 2256 for the type. This works in two ways. If the token type is a 2257 character literal, then its numeric code is that of the character; you 2258 can use the same character literal in the lexical analyzer to express 2259 the number. If the token type is an identifier, that identifier is 2260 defined by Bison as a C macro whose definition is the appropriate 2261 number. In this example, therefore, `NUM' becomes a macro for `yylex' 2262 to use. 2263 2264 The semantic value of the token (if it has one) is stored into the 2265 global variable `yylval', which is where the Bison parser will look for 2266 it. (The C data type of `yylval' is `YYSTYPE', which was defined at 2267 the beginning of the grammar; *note Declarations for `rpcalc': Rpcalc 2268 Declarations.) 2269 2270 A token type code of zero is returned if the end-of-input is 2271 encountered. (Bison recognizes any nonpositive value as indicating 2272 end-of-input.) 2273 2274 Here is the code for the lexical analyzer: 2275 2276 /* The lexical analyzer returns a double floating point 2277 number on the stack and the token NUM, or the numeric code 2278 of the character read if not a number. It skips all blanks 2279 and tabs, and returns 0 for end-of-input. */ 2280 2281 #include <ctype.h> 2282 2283 int 2284 yylex (void) 2285 { 2286 int c; 2287 2288 /* Skip white space. */ 2289 while ((c = getchar ()) == ' ' || c == '\t') 2290 continue; 2291 /* Process numbers. */ 2292 if (c == '.' || isdigit (c)) 2293 { 2294 ungetc (c, stdin); 2295 scanf ("%lf", &yylval); 2296 return NUM; 2297 } 2298 /* Return end-of-input. */ 2299 if (c == EOF) 2300 return 0; 2301 /* Return a single char. */ 2302 return c; 2303 } 2304 2305 2306 File: bison.info, Node: Rpcalc Main, Next: Rpcalc Error, Prev: Rpcalc Lexer, Up: RPN Calc 2307 2308 2.1.4 The Controlling Function 2309 ------------------------------ 2310 2311 In keeping with the spirit of this example, the controlling function is 2312 kept to the bare minimum. The only requirement is that it call 2313 `yyparse' to start the process of parsing. 2314 2315 int 2316 main (void) 2317 { 2318 return yyparse (); 2319 } 2320 2321 2322 File: bison.info, Node: Rpcalc Error, Next: Rpcalc Generate, Prev: Rpcalc Main, Up: RPN Calc 2323 2324 2.1.5 The Error Reporting Routine 2325 --------------------------------- 2326 2327 When `yyparse' detects a syntax error, it calls the error reporting 2328 function `yyerror' to print an error message (usually but not always 2329 `"syntax error"'). It is up to the programmer to supply `yyerror' 2330 (*note Parser C-Language Interface: Interface.), so here is the 2331 definition we will use: 2332 2333 #include <stdio.h> 2334 2335 /* Called by yyparse on error. */ 2336 void 2337 yyerror (char const *s) 2338 { 2339 fprintf (stderr, "%s\n", s); 2340 } 2341 2342 After `yyerror' returns, the Bison parser may recover from the error 2343 and continue parsing if the grammar contains a suitable error rule 2344 (*note Error Recovery::). Otherwise, `yyparse' returns nonzero. We 2345 have not written any error rules in this example, so any invalid input 2346 will cause the calculator program to exit. This is not clean behavior 2347 for a real calculator, but it is adequate for the first example. 2348 2349 2350 File: bison.info, Node: Rpcalc Generate, Next: Rpcalc Compile, Prev: Rpcalc Error, Up: RPN Calc 2351 2352 2.1.6 Running Bison to Make the Parser 2353 -------------------------------------- 2354 2355 Before running Bison to produce a parser, we need to decide how to 2356 arrange all the source code in one or more source files. For such a 2357 simple example, the easiest thing is to put everything in one file, the 2358 grammar file. The definitions of `yylex', `yyerror' and `main' go at 2359 the end, in the epilogue of the grammar file (*note The Overall Layout 2360 of a Bison Grammar: Grammar Layout.). 2361 2362 For a large project, you would probably have several source files, 2363 and use `make' to arrange to recompile them. 2364 2365 With all the source in the grammar file, you use the following 2366 command to convert it into a parser implementation file: 2367 2368 bison FILE.y 2369 2370 In this example, the grammar file is called `rpcalc.y' (for "Reverse 2371 Polish CALCulator"). Bison produces a parser implementation file named 2372 `FILE.tab.c', removing the `.y' from the grammar file name. The parser 2373 implementation file contains the source code for `yyparse'. The 2374 additional functions in the grammar file (`yylex', `yyerror' and 2375 `main') are copied verbatim to the parser implementation file. 2376 2377 2378 File: bison.info, Node: Rpcalc Compile, Prev: Rpcalc Generate, Up: RPN Calc 2379 2380 2.1.7 Compiling the Parser Implementation File 2381 ---------------------------------------------- 2382 2383 Here is how to compile and run the parser implementation file: 2384 2385 # List files in current directory. 2386 $ ls 2387 rpcalc.tab.c rpcalc.y 2388 2389 # Compile the Bison parser. 2390 # `-lm' tells compiler to search math library for `pow'. 2391 $ cc -lm -o rpcalc rpcalc.tab.c 2392 2393 # List files again. 2394 $ ls 2395 rpcalc rpcalc.tab.c rpcalc.y 2396 2397 The file `rpcalc' now contains the executable code. Here is an 2398 example session using `rpcalc'. 2399 2400 $ rpcalc 2401 4 9 + 2402 13 2403 3 7 + 3 4 5 *+- 2404 -13 2405 3 7 + 3 4 5 * + - n Note the unary minus, `n' 2406 13 2407 5 6 / 4 n + 2408 -3.166666667 2409 3 4 ^ Exponentiation 2410 81 2411 ^D End-of-file indicator 2412 $ 2413 2414 2415 File: bison.info, Node: Infix Calc, Next: Simple Error Recovery, Prev: RPN Calc, Up: Examples 2416 2417 2.2 Infix Notation Calculator: `calc' 2418 ===================================== 2419 2420 We now modify rpcalc to handle infix operators instead of postfix. 2421 Infix notation involves the concept of operator precedence and the need 2422 for parentheses nested to arbitrary depth. Here is the Bison code for 2423 `calc.y', an infix desk-top calculator. 2424 2425 /* Infix notation calculator. */ 2426 2427 %{ 2428 #define YYSTYPE double 2429 #include <math.h> 2430 #include <stdio.h> 2431 int yylex (void); 2432 void yyerror (char const *); 2433 %} 2434 2435 /* Bison declarations. */ 2436 %token NUM 2437 %left '-' '+' 2438 %left '*' '/' 2439 %left NEG /* negation--unary minus */ 2440 %right '^' /* exponentiation */ 2441 2442 %% /* The grammar follows. */ 2443 input: 2444 /* empty */ 2445 | input line 2446 ; 2447 2448 line: 2449 '\n' 2450 | exp '\n' { printf ("\t%.10g\n", $1); } 2451 ; 2452 2453 exp: 2454 NUM { $$ = $1; } 2455 | exp '+' exp { $$ = $1 + $3; } 2456 | exp '-' exp { $$ = $1 - $3; } 2457 | exp '*' exp { $$ = $1 * $3; } 2458 | exp '/' exp { $$ = $1 / $3; } 2459 | '-' exp %prec NEG { $$ = -$2; } 2460 | exp '^' exp { $$ = pow ($1, $3); } 2461 | '(' exp ')' { $$ = $2; } 2462 ; 2463 %% 2464 2465 The functions `yylex', `yyerror' and `main' can be the same as before. 2466 2467 There are two important new features shown in this code. 2468 2469 In the second section (Bison declarations), `%left' declares token 2470 types and says they are left-associative operators. The declarations 2471 `%left' and `%right' (right associativity) take the place of `%token' 2472 which is used to declare a token type name without associativity. 2473 (These tokens are single-character literals, which ordinarily don't 2474 need to be declared. We declare them here to specify the 2475 associativity.) 2476 2477 Operator precedence is determined by the line ordering of the 2478 declarations; the higher the line number of the declaration (lower on 2479 the page or screen), the higher the precedence. Hence, exponentiation 2480 has the highest precedence, unary minus (`NEG') is next, followed by 2481 `*' and `/', and so on. *Note Operator Precedence: Precedence. 2482 2483 The other important new feature is the `%prec' in the grammar 2484 section for the unary minus operator. The `%prec' simply instructs 2485 Bison that the rule `| '-' exp' has the same precedence as `NEG'--in 2486 this case the next-to-highest. *Note Context-Dependent Precedence: 2487 Contextual Precedence. 2488 2489 Here is a sample run of `calc.y': 2490 2491 $ calc 2492 4 + 4.5 - (34/(8*3+-3)) 2493 6.880952381 2494 -56 + 2 2495 -54 2496 3 ^ 2 2497 9 2498 2499 2500 File: bison.info, Node: Simple Error Recovery, Next: Location Tracking Calc, Prev: Infix Calc, Up: Examples 2501 2502 2.3 Simple Error Recovery 2503 ========================= 2504 2505 Up to this point, this manual has not addressed the issue of "error 2506 recovery"--how to continue parsing after the parser detects a syntax 2507 error. All we have handled is error reporting with `yyerror'. Recall 2508 that by default `yyparse' returns after calling `yyerror'. This means 2509 that an erroneous input line causes the calculator program to exit. 2510 Now we show how to rectify this deficiency. 2511 2512 The Bison language itself includes the reserved word `error', which 2513 may be included in the grammar rules. In the example below it has been 2514 added to one of the alternatives for `line': 2515 2516 line: 2517 '\n' 2518 | exp '\n' { printf ("\t%.10g\n", $1); } 2519 | error '\n' { yyerrok; } 2520 ; 2521 2522 This addition to the grammar allows for simple error recovery in the 2523 event of a syntax error. If an expression that cannot be evaluated is 2524 read, the error will be recognized by the third rule for `line', and 2525 parsing will continue. (The `yyerror' function is still called upon to 2526 print its message as well.) The action executes the statement 2527 `yyerrok', a macro defined automatically by Bison; its meaning is that 2528 error recovery is complete (*note Error Recovery::). Note the 2529 difference between `yyerrok' and `yyerror'; neither one is a misprint. 2530 2531 This form of error recovery deals with syntax errors. There are 2532 other kinds of errors; for example, division by zero, which raises an 2533 exception signal that is normally fatal. A real calculator program 2534 must handle this signal and use `longjmp' to return to `main' and 2535 resume parsing input lines; it would also have to discard the rest of 2536 the current line of input. We won't discuss this issue further because 2537 it is not specific to Bison programs. 2538 2539 2540 File: bison.info, Node: Location Tracking Calc, Next: Multi-function Calc, Prev: Simple Error Recovery, Up: Examples 2541 2542 2.4 Location Tracking Calculator: `ltcalc' 2543 ========================================== 2544 2545 This example extends the infix notation calculator with location 2546 tracking. This feature will be used to improve the error messages. For 2547 the sake of clarity, this example is a simple integer calculator, since 2548 most of the work needed to use locations will be done in the lexical 2549 analyzer. 2550 2551 * Menu: 2552 2553 * Ltcalc Declarations:: Bison and C declarations for ltcalc. 2554 * Ltcalc Rules:: Grammar rules for ltcalc, with explanations. 2555 * Ltcalc Lexer:: The lexical analyzer. 2556 2557 2558 File: bison.info, Node: Ltcalc Declarations, Next: Ltcalc Rules, Up: Location Tracking Calc 2559 2560 2.4.1 Declarations for `ltcalc' 2561 ------------------------------- 2562 2563 The C and Bison declarations for the location tracking calculator are 2564 the same as the declarations for the infix notation calculator. 2565 2566 /* Location tracking calculator. */ 2567 2568 %{ 2569 #define YYSTYPE int 2570 #include <math.h> 2571 int yylex (void); 2572 void yyerror (char const *); 2573 %} 2574 2575 /* Bison declarations. */ 2576 %token NUM 2577 2578 %left '-' '+' 2579 %left '*' '/' 2580 %left NEG 2581 %right '^' 2582 2583 %% /* The grammar follows. */ 2584 2585 Note there are no declarations specific to locations. Defining a data 2586 type for storing locations is not needed: we will use the type provided 2587 by default (*note Data Types of Locations: Location Type.), which is a 2588 four member structure with the following integer fields: `first_line', 2589 `first_column', `last_line' and `last_column'. By conventions, and in 2590 accordance with the GNU Coding Standards and common practice, the line 2591 and column count both start at 1. 2592 2593 2594 File: bison.info, Node: Ltcalc Rules, Next: Ltcalc Lexer, Prev: Ltcalc Declarations, Up: Location Tracking Calc 2595 2596 2.4.2 Grammar Rules for `ltcalc' 2597 -------------------------------- 2598 2599 Whether handling locations or not has no effect on the syntax of your 2600 language. Therefore, grammar rules for this example will be very close 2601 to those of the previous example: we will only modify them to benefit 2602 from the new information. 2603 2604 Here, we will use locations to report divisions by zero, and locate 2605 the wrong expressions or subexpressions. 2606 2607 input: 2608 /* empty */ 2609 | input line 2610 ; 2611 2612 line: 2613 '\n' 2614 | exp '\n' { printf ("%d\n", $1); } 2615 ; 2616 2617 exp: 2618 NUM { $$ = $1; } 2619 | exp '+' exp { $$ = $1 + $3; } 2620 | exp '-' exp { $$ = $1 - $3; } 2621 | exp '*' exp { $$ = $1 * $3; } 2622 | exp '/' exp 2623 { 2624 if ($3) 2625 $$ = $1 / $3; 2626 else 2627 { 2628 $$ = 1; 2629 fprintf (stderr, "%d.%d-%d.%d: division by zero", 2630 @3.first_line, @3.first_column, 2631 @3.last_line, @3.last_column); 2632 } 2633 } 2634 | '-' exp %prec NEG { $$ = -$2; } 2635 | exp '^' exp { $$ = pow ($1, $3); } 2636 | '(' exp ')' { $$ = $2; } 2637 2638 This code shows how to reach locations inside of semantic actions, by 2639 using the pseudo-variables `@N' for rule components, and the 2640 pseudo-variable `@$' for groupings. 2641 2642 We don't need to assign a value to `@$': the output parser does it 2643 automatically. By default, before executing the C code of each action, 2644 `@$' is set to range from the beginning of `@1' to the end of `@N', for 2645 a rule with N components. This behavior can be redefined (*note 2646 Default Action for Locations: Location Default Action.), and for very 2647 specific rules, `@$' can be computed by hand. 2648 2649 2650 File: bison.info, Node: Ltcalc Lexer, Prev: Ltcalc Rules, Up: Location Tracking Calc 2651 2652 2.4.3 The `ltcalc' Lexical Analyzer. 2653 ------------------------------------ 2654 2655 Until now, we relied on Bison's defaults to enable location tracking. 2656 The next step is to rewrite the lexical analyzer, and make it able to 2657 feed the parser with the token locations, as it already does for 2658 semantic values. 2659 2660 To this end, we must take into account every single character of the 2661 input text, to avoid the computed locations of being fuzzy or wrong: 2662 2663 int 2664 yylex (void) 2665 { 2666 int c; 2667 2668 /* Skip white space. */ 2669 while ((c = getchar ()) == ' ' || c == '\t') 2670 ++yylloc.last_column; 2671 2672 /* Step. */ 2673 yylloc.first_line = yylloc.last_line; 2674 yylloc.first_column = yylloc.last_column; 2675 2676 /* Process numbers. */ 2677 if (isdigit (c)) 2678 { 2679 yylval = c - '0'; 2680 ++yylloc.last_column; 2681 while (isdigit (c = getchar ())) 2682 { 2683 ++yylloc.last_column; 2684 yylval = yylval * 10 + c - '0'; 2685 } 2686 ungetc (c, stdin); 2687 return NUM; 2688 } 2689 2690 /* Return end-of-input. */ 2691 if (c == EOF) 2692 return 0; 2693 2694 /* Return a single char, and update location. */ 2695 if (c == '\n') 2696 { 2697 ++yylloc.last_line; 2698 yylloc.last_column = 0; 2699 } 2700 else 2701 ++yylloc.last_column; 2702 return c; 2703 } 2704 2705 Basically, the lexical analyzer performs the same processing as 2706 before: it skips blanks and tabs, and reads numbers or single-character 2707 tokens. In addition, it updates `yylloc', the global variable (of type 2708 `YYLTYPE') containing the token's location. 2709 2710 Now, each time this function returns a token, the parser has its 2711 number as well as its semantic value, and its location in the text. 2712 The last needed change is to initialize `yylloc', for example in the 2713 controlling function: 2714 2715 int 2716 main (void) 2717 { 2718 yylloc.first_line = yylloc.last_line = 1; 2719 yylloc.first_column = yylloc.last_column = 0; 2720 return yyparse (); 2721 } 2722 2723 Remember that computing locations is not a matter of syntax. Every 2724 character must be associated to a location update, whether it is in 2725 valid input, in comments, in literal strings, and so on. 2726 2727 2728 File: bison.info, Node: Multi-function Calc, Next: Exercises, Prev: Location Tracking Calc, Up: Examples 2729 2730 2.5 Multi-Function Calculator: `mfcalc' 2731 ======================================= 2732 2733 Now that the basics of Bison have been discussed, it is time to move on 2734 to a more advanced problem. The above calculators provided only five 2735 functions, `+', `-', `*', `/' and `^'. It would be nice to have a 2736 calculator that provides other mathematical functions such as `sin', 2737 `cos', etc. 2738 2739 It is easy to add new operators to the infix calculator as long as 2740 they are only single-character literals. The lexical analyzer `yylex' 2741 passes back all nonnumeric characters as tokens, so new grammar rules 2742 suffice for adding a new operator. But we want something more 2743 flexible: built-in functions whose syntax has this form: 2744 2745 FUNCTION_NAME (ARGUMENT) 2746 2747 At the same time, we will add memory to the calculator, by allowing you 2748 to create named variables, store values in them, and use them later. 2749 Here is a sample session with the multi-function calculator: 2750 2751 $ mfcalc 2752 pi = 3.141592653589 2753 3.1415926536 2754 sin(pi) 2755 0.0000000000 2756 alpha = beta1 = 2.3 2757 2.3000000000 2758 alpha 2759 2.3000000000 2760 ln(alpha) 2761 0.8329091229 2762 exp(ln(beta1)) 2763 2.3000000000 2764 $ 2765 2766 Note that multiple assignment and nested function calls are 2767 permitted. 2768 2769 * Menu: 2770 2771 * Mfcalc Declarations:: Bison declarations for multi-function calculator. 2772 * Mfcalc Rules:: Grammar rules for the calculator. 2773 * Mfcalc Symbol Table:: Symbol table management subroutines. 2774 2775 2776 File: bison.info, Node: Mfcalc Declarations, Next: Mfcalc Rules, Up: Multi-function Calc 2777 2778 2.5.1 Declarations for `mfcalc' 2779 ------------------------------- 2780 2781 Here are the C and Bison declarations for the multi-function calculator. 2782 2783 %{ 2784 #include <math.h> /* For math functions, cos(), sin(), etc. */ 2785 #include "calc.h" /* Contains definition of `symrec'. */ 2786 int yylex (void); 2787 void yyerror (char const *); 2788 %} 2789 2790 %union { 2791 double val; /* For returning numbers. */ 2792 symrec *tptr; /* For returning symbol-table pointers. */ 2793 } 2794 %token <val> NUM /* Simple double precision number. */ 2795 %token <tptr> VAR FNCT /* Variable and function. */ 2796 %type <val> exp 2797 2798 %right '=' 2799 %left '-' '+' 2800 %left '*' '/' 2801 %left NEG /* negation--unary minus */ 2802 %right '^' /* exponentiation */ 2803 2804 The above grammar introduces only two new features of the Bison 2805 language. These features allow semantic values to have various data 2806 types (*note More Than One Value Type: Multiple Types.). 2807 2808 The `%union' declaration specifies the entire list of possible types; 2809 this is instead of defining `YYSTYPE'. The allowable types are now 2810 double-floats (for `exp' and `NUM') and pointers to entries in the 2811 symbol table. *Note The Collection of Value Types: Union Decl. 2812 2813 Since values can now have various types, it is necessary to 2814 associate a type with each grammar symbol whose semantic value is used. 2815 These symbols are `NUM', `VAR', `FNCT', and `exp'. Their declarations 2816 are augmented with information about their data type (placed between 2817 angle brackets). 2818 2819 The Bison construct `%type' is used for declaring nonterminal 2820 symbols, just as `%token' is used for declaring token types. We have 2821 not used `%type' before because nonterminal symbols are normally 2822 declared implicitly by the rules that define them. But `exp' must be 2823 declared explicitly so we can specify its value type. *Note 2824 Nonterminal Symbols: Type Decl. 2825 2826 2827 File: bison.info, Node: Mfcalc Rules, Next: Mfcalc Symbol Table, Prev: Mfcalc Declarations, Up: Multi-function Calc 2828 2829 2.5.2 Grammar Rules for `mfcalc' 2830 -------------------------------- 2831 2832 Here are the grammar rules for the multi-function calculator. Most of 2833 them are copied directly from `calc'; three rules, those which mention 2834 `VAR' or `FNCT', are new. 2835 2836 %% /* The grammar follows. */ 2837 input: 2838 /* empty */ 2839 | input line 2840 ; 2841 2842 line: 2843 '\n' 2844 | exp '\n' { printf ("%.10g\n", $1); } 2845 | error '\n' { yyerrok; } 2846 ; 2847 2848 exp: 2849 NUM { $$ = $1; } 2850 | VAR { $$ = $1->value.var; } 2851 | VAR '=' exp { $$ = $3; $1->value.var = $3; } 2852 | FNCT '(' exp ')' { $$ = (*($1->value.fnctptr))($3); } 2853 | exp '+' exp { $$ = $1 + $3; } 2854 | exp '-' exp { $$ = $1 - $3; } 2855 | exp '*' exp { $$ = $1 * $3; } 2856 | exp '/' exp { $$ = $1 / $3; } 2857 | '-' exp %prec NEG { $$ = -$2; } 2858 | exp '^' exp { $$ = pow ($1, $3); } 2859 | '(' exp ')' { $$ = $2; } 2860 ; 2861 /* End of grammar. */ 2862 %% 2863 2864 2865 File: bison.info, Node: Mfcalc Symbol Table, Prev: Mfcalc Rules, Up: Multi-function Calc 2866 2867 2.5.3 The `mfcalc' Symbol Table 2868 ------------------------------- 2869 2870 The multi-function calculator requires a symbol table to keep track of 2871 the names and meanings of variables and functions. This doesn't affect 2872 the grammar rules (except for the actions) or the Bison declarations, 2873 but it requires some additional C functions for support. 2874 2875 The symbol table itself consists of a linked list of records. Its 2876 definition, which is kept in the header `calc.h', is as follows. It 2877 provides for either functions or variables to be placed in the table. 2878 2879 /* Function type. */ 2880 typedef double (*func_t) (double); 2881 2882 /* Data type for links in the chain of symbols. */ 2883 struct symrec 2884 { 2885 char *name; /* name of symbol */ 2886 int type; /* type of symbol: either VAR or FNCT */ 2887 union 2888 { 2889 double var; /* value of a VAR */ 2890 func_t fnctptr; /* value of a FNCT */ 2891 } value; 2892 struct symrec *next; /* link field */ 2893 }; 2894 2895 typedef struct symrec symrec; 2896 2897 /* The symbol table: a chain of `struct symrec'. */ 2898 extern symrec *sym_table; 2899 2900 symrec *putsym (char const *, int); 2901 symrec *getsym (char const *); 2902 2903 The new version of `main' includes a call to `init_table', a 2904 function that initializes the symbol table. Here it is, and 2905 `init_table' as well: 2906 2907 #include <stdio.h> 2908 2909 /* Called by yyparse on error. */ 2910 void 2911 yyerror (char const *s) 2912 { 2913 fprintf (stderr, "%s\n", s); 2914 } 2915 2916 struct init 2917 { 2918 char const *fname; 2919 double (*fnct) (double); 2920 }; 2921 2922 struct init const arith_fncts[] = 2923 { 2924 "sin", sin, 2925 "cos", cos, 2926 "atan", atan, 2927 "ln", log, 2928 "exp", exp, 2929 "sqrt", sqrt, 2930 0, 0 2931 }; 2932 2933 /* The symbol table: a chain of `struct symrec'. */ 2934 symrec *sym_table; 2935 2936 /* Put arithmetic functions in table. */ 2937 void 2938 init_table (void) 2939 { 2940 int i; 2941 for (i = 0; arith_fncts[i].fname != 0; i++) 2942 { 2943 symrec *ptr = putsym (arith_fncts[i].fname, FNCT); 2944 ptr->value.fnctptr = arith_fncts[i].fnct; 2945 } 2946 } 2947 2948 int 2949 main (void) 2950 { 2951 init_table (); 2952 return yyparse (); 2953 } 2954 2955 By simply editing the initialization list and adding the necessary 2956 include files, you can add additional functions to the calculator. 2957 2958 Two important functions allow look-up and installation of symbols in 2959 the symbol table. The function `putsym' is passed a name and the type 2960 (`VAR' or `FNCT') of the object to be installed. The object is linked 2961 to the front of the list, and a pointer to the object is returned. The 2962 function `getsym' is passed the name of the symbol to look up. If 2963 found, a pointer to that symbol is returned; otherwise zero is returned. 2964 2965 #include <stdlib.h> /* malloc. */ 2966 #include <string.h> /* strlen. */ 2967 2968 symrec * 2969 putsym (char const *sym_name, int sym_type) 2970 { 2971 symrec *ptr = (symrec *) malloc (sizeof (symrec)); 2972 ptr->name = (char *) malloc (strlen (sym_name) + 1); 2973 strcpy (ptr->name,sym_name); 2974 ptr->type = sym_type; 2975 ptr->value.var = 0; /* Set value to 0 even if fctn. */ 2976 ptr->next = (struct symrec *)sym_table; 2977 sym_table = ptr; 2978 return ptr; 2979 } 2980 2981 symrec * 2982 getsym (char const *sym_name) 2983 { 2984 symrec *ptr; 2985 for (ptr = sym_table; ptr != (symrec *) 0; 2986 ptr = (symrec *)ptr->next) 2987 if (strcmp (ptr->name,sym_name) == 0) 2988 return ptr; 2989 return 0; 2990 } 2991 2992 The function `yylex' must now recognize variables, numeric values, 2993 and the single-character arithmetic operators. Strings of alphanumeric 2994 characters with a leading letter are recognized as either variables or 2995 functions depending on what the symbol table says about them. 2996 2997 The string is passed to `getsym' for look up in the symbol table. If 2998 the name appears in the table, a pointer to its location and its type 2999 (`VAR' or `FNCT') is returned to `yyparse'. If it is not already in 3000 the table, then it is installed as a `VAR' using `putsym'. Again, a 3001 pointer and its type (which must be `VAR') is returned to `yyparse'. 3002 3003 No change is needed in the handling of numeric values and arithmetic 3004 operators in `yylex'. 3005 3006 #include <ctype.h> 3007 3008 int 3009 yylex (void) 3010 { 3011 int c; 3012 3013 /* Ignore white space, get first nonwhite character. */ 3014 while ((c = getchar ()) == ' ' || c == '\t') 3015 continue; 3016 3017 if (c == EOF) 3018 return 0; 3019 3020 /* Char starts a number => parse the number. */ 3021 if (c == '.' || isdigit (c)) 3022 { 3023 ungetc (c, stdin); 3024 scanf ("%lf", &yylval.val); 3025 return NUM; 3026 } 3027 3028 /* Char starts an identifier => read the name. */ 3029 if (isalpha (c)) 3030 { 3031 /* Initially make the buffer long enough 3032 for a 40-character symbol name. */ 3033 static size_t length = 40; 3034 static char *symbuf = 0; 3035 symrec *s; 3036 int i; 3037 3038 if (!symbuf) 3039 symbuf = (char *) malloc (length + 1); 3040 3041 i = 0; 3042 do 3043 { 3044 /* If buffer is full, make it bigger. */ 3045 if (i == length) 3046 { 3047 length *= 2; 3048 symbuf = (char *) realloc (symbuf, length + 1); 3049 } 3050 /* Add this character to the buffer. */ 3051 symbuf[i++] = c; 3052 /* Get another character. */ 3053 c = getchar (); 3054 } 3055 while (isalnum (c)); 3056 3057 ungetc (c, stdin); 3058 symbuf[i] = '\0'; 3059 3060 s = getsym (symbuf); 3061 if (s == 0) 3062 s = putsym (symbuf, VAR); 3063 yylval.tptr = s; 3064 return s->type; 3065 } 3066 3067 /* Any other character is a token by itself. */ 3068 return c; 3069 } 3070 3071 The error reporting function is unchanged, and the new version of 3072 `main' includes a call to `init_table' and sets the `yydebug' on user 3073 demand (*Note Tracing Your Parser: Tracing, for details): 3074 3075 /* Called by yyparse on error. */ 3076 void 3077 yyerror (char const *s) 3078 { 3079 fprintf (stderr, "%s\n", s); 3080 } 3081 3082 int 3083 main (int argc, char const* argv[]) 3084 { 3085 int i; 3086 /* Enable parse traces on option -p. */ 3087 for (i = 1; i < argc; ++i) 3088 if (!strcmp(argv[i], "-p")) 3089 yydebug = 1; 3090 init_table (); 3091 return yyparse (); 3092 } 3093 3094 This program is both powerful and flexible. You may easily add new 3095 functions, and it is a simple job to modify this code to install 3096 predefined variables such as `pi' or `e' as well. 3097 3098 3099 File: bison.info, Node: Exercises, Prev: Multi-function Calc, Up: Examples 3100 3101 2.6 Exercises 3102 ============= 3103 3104 1. Add some new functions from `math.h' to the initialization list. 3105 3106 2. Add another array that contains constants and their values. Then 3107 modify `init_table' to add these constants to the symbol table. 3108 It will be easiest to give the constants type `VAR'. 3109 3110 3. Make the program report an error if the user refers to an 3111 uninitialized variable in any way except to store a value in it. 3112 3113 3114 File: bison.info, Node: Grammar File, Next: Interface, Prev: Examples, Up: Top 3115 3116 3 Bison Grammar Files 3117 ********************* 3118 3119 Bison takes as input a context-free grammar specification and produces a 3120 C-language function that recognizes correct instances of the grammar. 3121 3122 The Bison grammar file conventionally has a name ending in `.y'. 3123 *Note Invoking Bison: Invocation. 3124 3125 * Menu: 3126 3127 * Grammar Outline:: Overall layout of the grammar file. 3128 * Symbols:: Terminal and nonterminal symbols. 3129 * Rules:: How to write grammar rules. 3130 * Recursion:: Writing recursive rules. 3131 * Semantics:: Semantic values and actions. 3132 * Tracking Locations:: Locations and actions. 3133 * Named References:: Using named references in actions. 3134 * Declarations:: All kinds of Bison declarations are described here. 3135 * Multiple Parsers:: Putting more than one Bison parser in one program. 3136 3137 3138 File: bison.info, Node: Grammar Outline, Next: Symbols, Up: Grammar File 3139 3140 3.1 Outline of a Bison Grammar 3141 ============================== 3142 3143 A Bison grammar file has four main sections, shown here with the 3144 appropriate delimiters: 3145 3146 %{ 3147 PROLOGUE 3148 %} 3149 3150 BISON DECLARATIONS 3151 3152 %% 3153 GRAMMAR RULES 3154 %% 3155 3156 EPILOGUE 3157 3158 Comments enclosed in `/* ... */' may appear in any of the sections. 3159 As a GNU extension, `//' introduces a comment that continues until end 3160 of line. 3161 3162 * Menu: 3163 3164 * Prologue:: Syntax and usage of the prologue. 3165 * Prologue Alternatives:: Syntax and usage of alternatives to the prologue. 3166 * Bison Declarations:: Syntax and usage of the Bison declarations section. 3167 * Grammar Rules:: Syntax and usage of the grammar rules section. 3168 * Epilogue:: Syntax and usage of the epilogue. 3169 3170 3171 File: bison.info, Node: Prologue, Next: Prologue Alternatives, Up: Grammar Outline 3172 3173 3.1.1 The prologue 3174 ------------------ 3175 3176 The PROLOGUE section contains macro definitions and declarations of 3177 functions and variables that are used in the actions in the grammar 3178 rules. These are copied to the beginning of the parser implementation 3179 file so that they precede the definition of `yyparse'. You can use 3180 `#include' to get the declarations from a header file. If you don't 3181 need any C declarations, you may omit the `%{' and `%}' delimiters that 3182 bracket this section. 3183 3184 The PROLOGUE section is terminated by the first occurrence of `%}' 3185 that is outside a comment, a string literal, or a character constant. 3186 3187 You may have more than one PROLOGUE section, intermixed with the 3188 BISON DECLARATIONS. This allows you to have C and Bison declarations 3189 that refer to each other. For example, the `%union' declaration may 3190 use types defined in a header file, and you may wish to prototype 3191 functions that take arguments of type `YYSTYPE'. This can be done with 3192 two PROLOGUE blocks, one before and one after the `%union' declaration. 3193 3194 %{ 3195 #define _GNU_SOURCE 3196 #include <stdio.h> 3197 #include "ptypes.h" 3198 %} 3199 3200 %union { 3201 long int n; 3202 tree t; /* `tree' is defined in `ptypes.h'. */ 3203 } 3204 3205 %{ 3206 static void print_token_value (FILE *, int, YYSTYPE); 3207 #define YYPRINT(F, N, L) print_token_value (F, N, L) 3208 %} 3209 3210 ... 3211 3212 When in doubt, it is usually safer to put prologue code before all 3213 Bison declarations, rather than after. For example, any definitions of 3214 feature test macros like `_GNU_SOURCE' or `_POSIX_C_SOURCE' should 3215 appear before all Bison declarations, as feature test macros can affect 3216 the behavior of Bison-generated `#include' directives. 3217 3218 3219 File: bison.info, Node: Prologue Alternatives, Next: Bison Declarations, Prev: Prologue, Up: Grammar Outline 3220 3221 3.1.2 Prologue Alternatives 3222 --------------------------- 3223 3224 The functionality of PROLOGUE sections can often be subtle and 3225 inflexible. As an alternative, Bison provides a `%code' directive with 3226 an explicit qualifier field, which identifies the purpose of the code 3227 and thus the location(s) where Bison should generate it. For C/C++, 3228 the qualifier can be omitted for the default location, or it can be one 3229 of `requires', `provides', `top'. *Note %code Summary::. 3230 3231 Look again at the example of the previous section: 3232 3233 %{ 3234 #define _GNU_SOURCE 3235 #include <stdio.h> 3236 #include "ptypes.h" 3237 %} 3238 3239 %union { 3240 long int n; 3241 tree t; /* `tree' is defined in `ptypes.h'. */ 3242 } 3243 3244 %{ 3245 static void print_token_value (FILE *, int, YYSTYPE); 3246 #define YYPRINT(F, N, L) print_token_value (F, N, L) 3247 %} 3248 3249 ... 3250 3251 Notice that there are two PROLOGUE sections here, but there's a subtle 3252 distinction between their functionality. For example, if you decide to 3253 override Bison's default definition for `YYLTYPE', in which PROLOGUE 3254 section should you write your new definition? You should write it in 3255 the first since Bison will insert that code into the parser 3256 implementation file _before_ the default `YYLTYPE' definition. In 3257 which PROLOGUE section should you prototype an internal function, 3258 `trace_token', that accepts `YYLTYPE' and `yytokentype' as arguments? 3259 You should prototype it in the second since Bison will insert that code 3260 _after_ the `YYLTYPE' and `yytokentype' definitions. 3261 3262 This distinction in functionality between the two PROLOGUE sections 3263 is established by the appearance of the `%union' between them. This 3264 behavior raises a few questions. First, why should the position of a 3265 `%union' affect definitions related to `YYLTYPE' and `yytokentype'? 3266 Second, what if there is no `%union'? In that case, the second kind of 3267 PROLOGUE section is not available. This behavior is not intuitive. 3268 3269 To avoid this subtle `%union' dependency, rewrite the example using a 3270 `%code top' and an unqualified `%code'. Let's go ahead and add the new 3271 `YYLTYPE' definition and the `trace_token' prototype at the same time: 3272 3273 %code top { 3274 #define _GNU_SOURCE 3275 #include <stdio.h> 3276 3277 /* WARNING: The following code really belongs 3278 * in a `%code requires'; see below. */ 3279 3280 #include "ptypes.h" 3281 #define YYLTYPE YYLTYPE 3282 typedef struct YYLTYPE 3283 { 3284 int first_line; 3285 int first_column; 3286 int last_line; 3287 int last_column; 3288 char *filename; 3289 } YYLTYPE; 3290 } 3291 3292 %union { 3293 long int n; 3294 tree t; /* `tree' is defined in `ptypes.h'. */ 3295 } 3296 3297 %code { 3298 static void print_token_value (FILE *, int, YYSTYPE); 3299 #define YYPRINT(F, N, L) print_token_value (F, N, L) 3300 static void trace_token (enum yytokentype token, YYLTYPE loc); 3301 } 3302 3303 ... 3304 3305 In this way, `%code top' and the unqualified `%code' achieve the same 3306 functionality as the two kinds of PROLOGUE sections, but it's always 3307 explicit which kind you intend. Moreover, both kinds are always 3308 available even in the absence of `%union'. 3309 3310 The `%code top' block above logically contains two parts. The first 3311 two lines before the warning need to appear near the top of the parser 3312 implementation file. The first line after the warning is required by 3313 `YYSTYPE' and thus also needs to appear in the parser implementation 3314 file. However, if you've instructed Bison to generate a parser header 3315 file (*note %defines: Decl Summary.), you probably want that line to 3316 appear before the `YYSTYPE' definition in that header file as well. 3317 The `YYLTYPE' definition should also appear in the parser header file 3318 to override the default `YYLTYPE' definition there. 3319 3320 In other words, in the `%code top' block above, all but the first two 3321 lines are dependency code required by the `YYSTYPE' and `YYLTYPE' 3322 definitions. Thus, they belong in one or more `%code requires': 3323 3324 %code top { 3325 #define _GNU_SOURCE 3326 #include <stdio.h> 3327 } 3328 3329 %code requires { 3330 #include "ptypes.h" 3331 } 3332 %union { 3333 long int n; 3334 tree t; /* `tree' is defined in `ptypes.h'. */ 3335 } 3336 3337 %code requires { 3338 #define YYLTYPE YYLTYPE 3339 typedef struct YYLTYPE 3340 { 3341 int first_line; 3342 int first_column; 3343 int last_line; 3344 int last_column; 3345 char *filename; 3346 } YYLTYPE; 3347 } 3348 3349 %code { 3350 static void print_token_value (FILE *, int, YYSTYPE); 3351 #define YYPRINT(F, N, L) print_token_value (F, N, L) 3352 static void trace_token (enum yytokentype token, YYLTYPE loc); 3353 } 3354 3355 ... 3356 3357 Now Bison will insert `#include "ptypes.h"' and the new `YYLTYPE' 3358 definition before the Bison-generated `YYSTYPE' and `YYLTYPE' 3359 definitions in both the parser implementation file and the parser 3360 header file. (By the same reasoning, `%code requires' would also be 3361 the appropriate place to write your own definition for `YYSTYPE'.) 3362 3363 When you are writing dependency code for `YYSTYPE' and `YYLTYPE', 3364 you should prefer `%code requires' over `%code top' regardless of 3365 whether you instruct Bison to generate a parser header file. When you 3366 are writing code that you need Bison to insert only into the parser 3367 implementation file and that has no special need to appear at the top 3368 of that file, you should prefer the unqualified `%code' over `%code 3369 top'. These practices will make the purpose of each block of your code 3370 explicit to Bison and to other developers reading your grammar file. 3371 Following these practices, we expect the unqualified `%code' and `%code 3372 requires' to be the most important of the four PROLOGUE alternatives. 3373 3374 At some point while developing your parser, you might decide to 3375 provide `trace_token' to modules that are external to your parser. 3376 Thus, you might wish for Bison to insert the prototype into both the 3377 parser header file and the parser implementation file. Since this 3378 function is not a dependency required by `YYSTYPE' or `YYLTYPE', it 3379 doesn't make sense to move its prototype to a `%code requires'. More 3380 importantly, since it depends upon `YYLTYPE' and `yytokentype', `%code 3381 requires' is not sufficient. Instead, move its prototype from the 3382 unqualified `%code' to a `%code provides': 3383 3384 %code top { 3385 #define _GNU_SOURCE 3386 #include <stdio.h> 3387 } 3388 3389 %code requires { 3390 #include "ptypes.h" 3391 } 3392 %union { 3393 long int n; 3394 tree t; /* `tree' is defined in `ptypes.h'. */ 3395 } 3396 3397 %code requires { 3398 #define YYLTYPE YYLTYPE 3399 typedef struct YYLTYPE 3400 { 3401 int first_line; 3402 int first_column; 3403 int last_line; 3404 int last_column; 3405 char *filename; 3406 } YYLTYPE; 3407 } 3408 3409 %code provides { 3410 void trace_token (enum yytokentype token, YYLTYPE loc); 3411 } 3412 3413 %code { 3414 static void print_token_value (FILE *, int, YYSTYPE); 3415 #define YYPRINT(F, N, L) print_token_value (F, N, L) 3416 } 3417 3418 ... 3419 3420 Bison will insert the `trace_token' prototype into both the parser 3421 header file and the parser implementation file after the definitions 3422 for `yytokentype', `YYLTYPE', and `YYSTYPE'. 3423 3424 The above examples are careful to write directives in an order that 3425 reflects the layout of the generated parser implementation and header 3426 files: `%code top', `%code requires', `%code provides', and then 3427 `%code'. While your grammar files may generally be easier to read if 3428 you also follow this order, Bison does not require it. Instead, Bison 3429 lets you choose an organization that makes sense to you. 3430 3431 You may declare any of these directives multiple times in the 3432 grammar file. In that case, Bison concatenates the contained code in 3433 declaration order. This is the only way in which the position of one 3434 of these directives within the grammar file affects its functionality. 3435 3436 The result of the previous two properties is greater flexibility in 3437 how you may organize your grammar file. For example, you may organize 3438 semantic-type-related directives by semantic type: 3439 3440 %code requires { #include "type1.h" } 3441 %union { type1 field1; } 3442 %destructor { type1_free ($$); } <field1> 3443 %printer { type1_print (yyoutput, $$); } <field1> 3444 3445 %code requires { #include "type2.h" } 3446 %union { type2 field2; } 3447 %destructor { type2_free ($$); } <field2> 3448 %printer { type2_print (yyoutput, $$); } <field2> 3449 3450 You could even place each of the above directive groups in the rules 3451 section of the grammar file next to the set of rules that uses the 3452 associated semantic type. (In the rules section, you must terminate 3453 each of those directives with a semicolon.) And you don't have to 3454 worry that some directive (like a `%union') in the definitions section 3455 is going to adversely affect their functionality in some 3456 counter-intuitive manner just because it comes first. Such an 3457 organization is not possible using PROLOGUE sections. 3458 3459 This section has been concerned with explaining the advantages of 3460 the four PROLOGUE alternatives over the original Yacc PROLOGUE. 3461 However, in most cases when using these directives, you shouldn't need 3462 to think about all the low-level ordering issues discussed here. 3463 Instead, you should simply use these directives to label each block of 3464 your code according to its purpose and let Bison handle the ordering. 3465 `%code' is the most generic label. Move code to `%code requires', 3466 `%code provides', or `%code top' as needed. 3467 3468 3469 File: bison.info, Node: Bison Declarations, Next: Grammar Rules, Prev: Prologue Alternatives, Up: Grammar Outline 3470 3471 3.1.3 The Bison Declarations Section 3472 ------------------------------------ 3473 3474 The BISON DECLARATIONS section contains declarations that define 3475 terminal and nonterminal symbols, specify precedence, and so on. In 3476 some simple grammars you may not need any declarations. *Note Bison 3477 Declarations: Declarations. 3478 3479 3480 File: bison.info, Node: Grammar Rules, Next: Epilogue, Prev: Bison Declarations, Up: Grammar Outline 3481 3482 3.1.4 The Grammar Rules Section 3483 ------------------------------- 3484 3485 The "grammar rules" section contains one or more Bison grammar rules, 3486 and nothing else. *Note Syntax of Grammar Rules: Rules. 3487 3488 There must always be at least one grammar rule, and the first `%%' 3489 (which precedes the grammar rules) may never be omitted even if it is 3490 the first thing in the file. 3491 3492 3493 File: bison.info, Node: Epilogue, Prev: Grammar Rules, Up: Grammar Outline 3494 3495 3.1.5 The epilogue 3496 ------------------ 3497 3498 The EPILOGUE is copied verbatim to the end of the parser implementation 3499 file, just as the PROLOGUE is copied to the beginning. This is the 3500 most convenient place to put anything that you want to have in the 3501 parser implementation file but which need not come before the 3502 definition of `yyparse'. For example, the definitions of `yylex' and 3503 `yyerror' often go here. Because C requires functions to be declared 3504 before being used, you often need to declare functions like `yylex' and 3505 `yyerror' in the Prologue, even if you define them in the Epilogue. 3506 *Note Parser C-Language Interface: Interface. 3507 3508 If the last section is empty, you may omit the `%%' that separates it 3509 from the grammar rules. 3510 3511 The Bison parser itself contains many macros and identifiers whose 3512 names start with `yy' or `YY', so it is a good idea to avoid using any 3513 such names (except those documented in this manual) in the epilogue of 3514 the grammar file. 3515 3516 3517 File: bison.info, Node: Symbols, Next: Rules, Prev: Grammar Outline, Up: Grammar File 3518 3519 3.2 Symbols, Terminal and Nonterminal 3520 ===================================== 3521 3522 "Symbols" in Bison grammars represent the grammatical classifications 3523 of the language. 3524 3525 A "terminal symbol" (also known as a "token type") represents a 3526 class of syntactically equivalent tokens. You use the symbol in grammar 3527 rules to mean that a token in that class is allowed. The symbol is 3528 represented in the Bison parser by a numeric code, and the `yylex' 3529 function returns a token type code to indicate what kind of token has 3530 been read. You don't need to know what the code value is; you can use 3531 the symbol to stand for it. 3532 3533 A "nonterminal symbol" stands for a class of syntactically 3534 equivalent groupings. The symbol name is used in writing grammar rules. 3535 By convention, it should be all lower case. 3536 3537 Symbol names can contain letters, underscores, periods, and 3538 non-initial digits and dashes. Dashes in symbol names are a GNU 3539 extension, incompatible with POSIX Yacc. Periods and dashes make 3540 symbol names less convenient to use with named references, which 3541 require brackets around such names (*note Named References::). 3542 Terminal symbols that contain periods or dashes make little sense: 3543 since they are not valid symbols (in most programming languages) they 3544 are not exported as token names. 3545 3546 There are three ways of writing terminal symbols in the grammar: 3547 3548 * A "named token type" is written with an identifier, like an 3549 identifier in C. By convention, it should be all upper case. Each 3550 such name must be defined with a Bison declaration such as 3551 `%token'. *Note Token Type Names: Token Decl. 3552 3553 * A "character token type" (or "literal character token") is written 3554 in the grammar using the same syntax used in C for character 3555 constants; for example, `'+'' is a character token type. A 3556 character token type doesn't need to be declared unless you need to 3557 specify its semantic value data type (*note Data Types of Semantic 3558 Values: Value Type.), associativity, or precedence (*note Operator 3559 Precedence: Precedence.). 3560 3561 By convention, a character token type is used only to represent a 3562 token that consists of that particular character. Thus, the token 3563 type `'+'' is used to represent the character `+' as a token. 3564 Nothing enforces this convention, but if you depart from it, your 3565 program will confuse other readers. 3566 3567 All the usual escape sequences used in character literals in C can 3568 be used in Bison as well, but you must not use the null character 3569 as a character literal because its numeric code, zero, signifies 3570 end-of-input (*note Calling Convention for `yylex': Calling 3571 Convention.). Also, unlike standard C, trigraphs have no special 3572 meaning in Bison character literals, nor is backslash-newline 3573 allowed. 3574 3575 * A "literal string token" is written like a C string constant; for 3576 example, `"<="' is a literal string token. A literal string token 3577 doesn't need to be declared unless you need to specify its semantic 3578 value data type (*note Value Type::), associativity, or precedence 3579 (*note Precedence::). 3580 3581 You can associate the literal string token with a symbolic name as 3582 an alias, using the `%token' declaration (*note Token 3583 Declarations: Token Decl.). If you don't do that, the lexical 3584 analyzer has to retrieve the token number for the literal string 3585 token from the `yytname' table (*note Calling Convention::). 3586 3587 *Warning*: literal string tokens do not work in Yacc. 3588 3589 By convention, a literal string token is used only to represent a 3590 token that consists of that particular string. Thus, you should 3591 use the token type `"<="' to represent the string `<=' as a token. 3592 Bison does not enforce this convention, but if you depart from it, 3593 people who read your program will be confused. 3594 3595 All the escape sequences used in string literals in C can be used 3596 in Bison as well, except that you must not use a null character 3597 within a string literal. Also, unlike Standard C, trigraphs have 3598 no special meaning in Bison string literals, nor is 3599 backslash-newline allowed. A literal string token must contain 3600 two or more characters; for a token containing just one character, 3601 use a character token (see above). 3602 3603 How you choose to write a terminal symbol has no effect on its 3604 grammatical meaning. That depends only on where it appears in rules and 3605 on when the parser function returns that symbol. 3606 3607 The value returned by `yylex' is always one of the terminal symbols, 3608 except that a zero or negative value signifies end-of-input. Whichever 3609 way you write the token type in the grammar rules, you write it the 3610 same way in the definition of `yylex'. The numeric code for a 3611 character token type is simply the positive numeric code of the 3612 character, so `yylex' can use the identical value to generate the 3613 requisite code, though you may need to convert it to `unsigned char' to 3614 avoid sign-extension on hosts where `char' is signed. Each named token 3615 type becomes a C macro in the parser implementation file, so `yylex' 3616 can use the name to stand for the code. (This is why periods don't 3617 make sense in terminal symbols.) *Note Calling Convention for `yylex': 3618 Calling Convention. 3619 3620 If `yylex' is defined in a separate file, you need to arrange for the 3621 token-type macro definitions to be available there. Use the `-d' 3622 option when you run Bison, so that it will write these macro definitions 3623 into a separate header file `NAME.tab.h' which you can include in the 3624 other source files that need it. *Note Invoking Bison: Invocation. 3625 3626 If you want to write a grammar that is portable to any Standard C 3627 host, you must use only nonnull character tokens taken from the basic 3628 execution character set of Standard C. This set consists of the ten 3629 digits, the 52 lower- and upper-case English letters, and the 3630 characters in the following C-language string: 3631 3632 "\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_{|}~" 3633 3634 The `yylex' function and Bison must use a consistent character set 3635 and encoding for character tokens. For example, if you run Bison in an 3636 ASCII environment, but then compile and run the resulting program in an 3637 environment that uses an incompatible character set like EBCDIC, the 3638 resulting program may not work because the tables generated by Bison 3639 will assume ASCII numeric values for character tokens. It is standard 3640 practice for software distributions to contain C source files that were 3641 generated by Bison in an ASCII environment, so installers on platforms 3642 that are incompatible with ASCII must rebuild those files before 3643 compiling them. 3644 3645 The symbol `error' is a terminal symbol reserved for error recovery 3646 (*note Error Recovery::); you shouldn't use it for any other purpose. 3647 In particular, `yylex' should never return this value. The default 3648 value of the error token is 256, unless you explicitly assigned 256 to 3649 one of your tokens with a `%token' declaration. 3650 3651 3652 File: bison.info, Node: Rules, Next: Recursion, Prev: Symbols, Up: Grammar File 3653 3654 3.3 Syntax of Grammar Rules 3655 =========================== 3656 3657 A Bison grammar rule has the following general form: 3658 3659 RESULT: COMPONENTS...; 3660 3661 where RESULT is the nonterminal symbol that this rule describes, and 3662 COMPONENTS are various terminal and nonterminal symbols that are put 3663 together by this rule (*note Symbols::). 3664 3665 For example, 3666 3667 exp: exp '+' exp; 3668 3669 says that two groupings of type `exp', with a `+' token in between, can 3670 be combined into a larger grouping of type `exp'. 3671 3672 White space in rules is significant only to separate symbols. You 3673 can add extra white space as you wish. 3674 3675 Scattered among the components can be ACTIONS that determine the 3676 semantics of the rule. An action looks like this: 3677 3678 {C STATEMENTS} 3679 3680 This is an example of "braced code", that is, C code surrounded by 3681 braces, much like a compound statement in C. Braced code can contain 3682 any sequence of C tokens, so long as its braces are balanced. Bison 3683 does not check the braced code for correctness directly; it merely 3684 copies the code to the parser implementation file, where the C compiler 3685 can check it. 3686 3687 Within braced code, the balanced-brace count is not affected by 3688 braces within comments, string literals, or character constants, but it 3689 is affected by the C digraphs `<%' and `%>' that represent braces. At 3690 the top level braced code must be terminated by `}' and not by a 3691 digraph. Bison does not look for trigraphs, so if braced code uses 3692 trigraphs you should ensure that they do not affect the nesting of 3693 braces or the boundaries of comments, string literals, or character 3694 constants. 3695 3696 Usually there is only one action and it follows the components. 3697 *Note Actions::. 3698 3699 Multiple rules for the same RESULT can be written separately or can 3700 be joined with the vertical-bar character `|' as follows: 3701 3702 RESULT: 3703 RULE1-COMPONENTS... 3704 | RULE2-COMPONENTS... 3705 ... 3706 ; 3707 3708 They are still considered distinct rules even when joined in this way. 3709 3710 If COMPONENTS in a rule is empty, it means that RESULT can match the 3711 empty string. For example, here is how to define a comma-separated 3712 sequence of zero or more `exp' groupings: 3713 3714 expseq: 3715 /* empty */ 3716 | expseq1 3717 ; 3718 3719 expseq1: 3720 exp 3721 | expseq1 ',' exp 3722 ; 3723 3724 It is customary to write a comment `/* empty */' in each rule with no 3725 components. 3726 3727 3728 File: bison.info, Node: Recursion, Next: Semantics, Prev: Rules, Up: Grammar File 3729 3730 3.4 Recursive Rules 3731 =================== 3732 3733 A rule is called "recursive" when its RESULT nonterminal appears also 3734 on its right hand side. Nearly all Bison grammars need to use 3735 recursion, because that is the only way to define a sequence of any 3736 number of a particular thing. Consider this recursive definition of a 3737 comma-separated sequence of one or more expressions: 3738 3739 expseq1: 3740 exp 3741 | expseq1 ',' exp 3742 ; 3743 3744 Since the recursive use of `expseq1' is the leftmost symbol in the 3745 right hand side, we call this "left recursion". By contrast, here the 3746 same construct is defined using "right recursion": 3747 3748 expseq1: 3749 exp 3750 | exp ',' expseq1 3751 ; 3752 3753 Any kind of sequence can be defined using either left recursion or right 3754 recursion, but you should always use left recursion, because it can 3755 parse a sequence of any number of elements with bounded stack space. 3756 Right recursion uses up space on the Bison stack in proportion to the 3757 number of elements in the sequence, because all the elements must be 3758 shifted onto the stack before the rule can be applied even once. *Note 3759 The Bison Parser Algorithm: Algorithm, for further explanation of this. 3760 3761 "Indirect" or "mutual" recursion occurs when the result of the rule 3762 does not appear directly on its right hand side, but does appear in 3763 rules for other nonterminals which do appear on its right hand side. 3764 3765 For example: 3766 3767 expr: 3768 primary 3769 | primary '+' primary 3770 ; 3771 3772 primary: 3773 constant 3774 | '(' expr ')' 3775 ; 3776 3777 defines two mutually-recursive nonterminals, since each refers to the 3778 other. 3779 3780 3781 File: bison.info, Node: Semantics, Next: Tracking Locations, Prev: Recursion, Up: Grammar File 3782 3783 3.5 Defining Language Semantics 3784 =============================== 3785 3786 The grammar rules for a language determine only the syntax. The 3787 semantics are determined by the semantic values associated with various 3788 tokens and groupings, and by the actions taken when various groupings 3789 are recognized. 3790 3791 For example, the calculator calculates properly because the value 3792 associated with each expression is the proper number; it adds properly 3793 because the action for the grouping `X + Y' is to add the numbers 3794 associated with X and Y. 3795 3796 * Menu: 3797 3798 * Value Type:: Specifying one data type for all semantic values. 3799 * Multiple Types:: Specifying several alternative data types. 3800 * Actions:: An action is the semantic definition of a grammar rule. 3801 * Action Types:: Specifying data types for actions to operate on. 3802 * Mid-Rule Actions:: Most actions go at the end of a rule. 3803 This says when, why and how to use the exceptional 3804 action in the middle of a rule. 3805 3806 3807 File: bison.info, Node: Value Type, Next: Multiple Types, Up: Semantics 3808 3809 3.5.1 Data Types of Semantic Values 3810 ----------------------------------- 3811 3812 In a simple program it may be sufficient to use the same data type for 3813 the semantic values of all language constructs. This was true in the 3814 RPN and infix calculator examples (*note Reverse Polish Notation 3815 Calculator: RPN Calc.). 3816 3817 Bison normally uses the type `int' for semantic values if your 3818 program uses the same data type for all language constructs. To 3819 specify some other type, define `YYSTYPE' as a macro, like this: 3820 3821 #define YYSTYPE double 3822 3823 `YYSTYPE''s replacement list should be a type name that does not 3824 contain parentheses or square brackets. This macro definition must go 3825 in the prologue of the grammar file (*note Outline of a Bison Grammar: 3826 Grammar Outline.). 3827 3828 3829 File: bison.info, Node: Multiple Types, Next: Actions, Prev: Value Type, Up: Semantics 3830 3831 3.5.2 More Than One Value Type 3832 ------------------------------ 3833 3834 In most programs, you will need different data types for different kinds 3835 of tokens and groupings. For example, a numeric constant may need type 3836 `int' or `long int', while a string constant needs type `char *', and 3837 an identifier might need a pointer to an entry in the symbol table. 3838 3839 To use more than one data type for semantic values in one parser, 3840 Bison requires you to do two things: 3841 3842 * Specify the entire collection of possible data types, either by 3843 using the `%union' Bison declaration (*note The Collection of 3844 Value Types: Union Decl.), or by using a `typedef' or a `#define' 3845 to define `YYSTYPE' to be a union type whose member names are the 3846 type tags. 3847 3848 * Choose one of those types for each symbol (terminal or 3849 nonterminal) for which semantic values are used. This is done for 3850 tokens with the `%token' Bison declaration (*note Token Type 3851 Names: Token Decl.) and for groupings with the `%type' Bison 3852 declaration (*note Nonterminal Symbols: Type Decl.). 3853 3854 3855 File: bison.info, Node: Actions, Next: Action Types, Prev: Multiple Types, Up: Semantics 3856 3857 3.5.3 Actions 3858 ------------- 3859 3860 An action accompanies a syntactic rule and contains C code to be 3861 executed each time an instance of that rule is recognized. The task of 3862 most actions is to compute a semantic value for the grouping built by 3863 the rule from the semantic values associated with tokens or smaller 3864 groupings. 3865 3866 An action consists of braced code containing C statements, and can be 3867 placed at any position in the rule; it is executed at that position. 3868 Most rules have just one action at the end of the rule, following all 3869 the components. Actions in the middle of a rule are tricky and used 3870 only for special purposes (*note Actions in Mid-Rule: Mid-Rule 3871 Actions.). 3872 3873 The C code in an action can refer to the semantic values of the 3874 components matched by the rule with the construct `$N', which stands 3875 for the value of the Nth component. The semantic value for the 3876 grouping being constructed is `$$'. In addition, the semantic values 3877 of symbols can be accessed with the named references construct `$NAME' 3878 or `$[NAME]'. Bison translates both of these constructs into 3879 expressions of the appropriate type when it copies the actions into the 3880 parser implementation file. `$$' (or `$NAME', when it stands for the 3881 current grouping) is translated to a modifiable lvalue, so it can be 3882 assigned to. 3883 3884 Here is a typical example: 3885 3886 exp: 3887 ... 3888 | exp '+' exp { $$ = $1 + $3; } 3889 3890 Or, in terms of named references: 3891 3892 exp[result]: 3893 ... 3894 | exp[left] '+' exp[right] { $result = $left + $right; } 3895 3896 This rule constructs an `exp' from two smaller `exp' groupings 3897 connected by a plus-sign token. In the action, `$1' and `$3' (`$left' 3898 and `$right') refer to the semantic values of the two component `exp' 3899 groupings, which are the first and third symbols on the right hand side 3900 of the rule. The sum is stored into `$$' (`$result') so that it 3901 becomes the semantic value of the addition-expression just recognized 3902 by the rule. If there were a useful semantic value associated with the 3903 `+' token, it could be referred to as `$2'. 3904 3905 *Note Named References::, for more information about using the named 3906 references construct. 3907 3908 Note that the vertical-bar character `|' is really a rule separator, 3909 and actions are attached to a single rule. This is a difference with 3910 tools like Flex, for which `|' stands for either "or", or "the same 3911 action as that of the next rule". In the following example, the action 3912 is triggered only when `b' is found: 3913 3914 a-or-b: 'a'|'b' { a_or_b_found = 1; }; 3915 3916 If you don't specify an action for a rule, Bison supplies a default: 3917 `$$ = $1'. Thus, the value of the first symbol in the rule becomes the 3918 value of the whole rule. Of course, the default action is valid only 3919 if the two data types match. There is no meaningful default action for 3920 an empty rule; every empty rule must have an explicit action unless the 3921 rule's value does not matter. 3922 3923 `$N' with N zero or negative is allowed for reference to tokens and 3924 groupings on the stack _before_ those that match the current rule. 3925 This is a very risky practice, and to use it reliably you must be 3926 certain of the context in which the rule is applied. Here is a case in 3927 which you can use this reliably: 3928 3929 foo: 3930 expr bar '+' expr { ... } 3931 | expr bar '-' expr { ... } 3932 ; 3933 3934 bar: 3935 /* empty */ { previous_expr = $0; } 3936 ; 3937 3938 As long as `bar' is used only in the fashion shown here, `$0' always 3939 refers to the `expr' which precedes `bar' in the definition of `foo'. 3940 3941 It is also possible to access the semantic value of the lookahead 3942 token, if any, from a semantic action. This semantic value is stored 3943 in `yylval'. *Note Special Features for Use in Actions: Action 3944 Features. 3945 3946 3947 File: bison.info, Node: Action Types, Next: Mid-Rule Actions, Prev: Actions, Up: Semantics 3948 3949 3.5.4 Data Types of Values in Actions 3950 ------------------------------------- 3951 3952 If you have chosen a single data type for semantic values, the `$$' and 3953 `$N' constructs always have that data type. 3954 3955 If you have used `%union' to specify a variety of data types, then 3956 you must declare a choice among these types for each terminal or 3957 nonterminal symbol that can have a semantic value. Then each time you 3958 use `$$' or `$N', its data type is determined by which symbol it refers 3959 to in the rule. In this example, 3960 3961 exp: 3962 ... 3963 | exp '+' exp { $$ = $1 + $3; } 3964 3965 `$1' and `$3' refer to instances of `exp', so they all have the data 3966 type declared for the nonterminal symbol `exp'. If `$2' were used, it 3967 would have the data type declared for the terminal symbol `'+'', 3968 whatever that might be. 3969 3970 Alternatively, you can specify the data type when you refer to the 3971 value, by inserting `<TYPE>' after the `$' at the beginning of the 3972 reference. For example, if you have defined types as shown here: 3973 3974 %union { 3975 int itype; 3976 double dtype; 3977 } 3978 3979 then you can write `$<itype>1' to refer to the first subunit of the 3980 rule as an integer, or `$<dtype>1' to refer to it as a double. 3981 3982 3983 File: bison.info, Node: Mid-Rule Actions, Prev: Action Types, Up: Semantics 3984 3985 3.5.5 Actions in Mid-Rule 3986 ------------------------- 3987 3988 Occasionally it is useful to put an action in the middle of a rule. 3989 These actions are written just like usual end-of-rule actions, but they 3990 are executed before the parser even recognizes the following components. 3991 3992 * Menu: 3993 3994 * Using Mid-Rule Actions:: Putting an action in the middle of a rule. 3995 * Mid-Rule Action Translation:: How mid-rule actions are actually processed. 3996 * Mid-Rule Conflicts:: Mid-rule actions can cause conflicts. 3997 3998 3999 File: bison.info, Node: Using Mid-Rule Actions, Next: Mid-Rule Action Translation, Up: Mid-Rule Actions 4000 4001 3.5.5.1 Using Mid-Rule Actions 4002 .............................. 4003 4004 A mid-rule action may refer to the components preceding it using `$N', 4005 but it may not refer to subsequent components because it is run before 4006 they are parsed. 4007 4008 The mid-rule action itself counts as one of the components of the 4009 rule. This makes a difference when there is another action later in 4010 the same rule (and usually there is another at the end): you have to 4011 count the actions along with the symbols when working out which number 4012 N to use in `$N'. 4013 4014 The mid-rule action can also have a semantic value. The action can 4015 set its value with an assignment to `$$', and actions later in the rule 4016 can refer to the value using `$N'. Since there is no symbol to name 4017 the action, there is no way to declare a data type for the value in 4018 advance, so you must use the `$<...>N' construct to specify a data type 4019 each time you refer to this value. 4020 4021 There is no way to set the value of the entire rule with a mid-rule 4022 action, because assignments to `$$' do not have that effect. The only 4023 way to set the value for the entire rule is with an ordinary action at 4024 the end of the rule. 4025 4026 Here is an example from a hypothetical compiler, handling a `let' 4027 statement that looks like `let (VARIABLE) STATEMENT' and serves to 4028 create a variable named VARIABLE temporarily for the duration of 4029 STATEMENT. To parse this construct, we must put VARIABLE into the 4030 symbol table while STATEMENT is parsed, then remove it afterward. Here 4031 is how it is done: 4032 4033 stmt: 4034 "let" '(' var ')' 4035 { 4036 $<context>$ = push_context (); 4037 declare_variable ($3); 4038 } 4039 stmt 4040 { 4041 $$ = $6; 4042 pop_context ($<context>5); 4043 } 4044 4045 As soon as `let (VARIABLE)' has been recognized, the first action is 4046 run. It saves a copy of the current semantic context (the list of 4047 accessible variables) as its semantic value, using alternative 4048 `context' in the data-type union. Then it calls `declare_variable' to 4049 add the new variable to that list. Once the first action is finished, 4050 the embedded statement `stmt' can be parsed. 4051 4052 Note that the mid-rule action is component number 5, so the `stmt' is 4053 component number 6. Named references can be used to improve the 4054 readability and maintainability (*note Named References::): 4055 4056 stmt: 4057 "let" '(' var ')' 4058 { 4059 $<context>let = push_context (); 4060 declare_variable ($3); 4061 }[let] 4062 stmt 4063 { 4064 $$ = $6; 4065 pop_context ($<context>let); 4066 } 4067 4068 After the embedded statement is parsed, its semantic value becomes 4069 the value of the entire `let'-statement. Then the semantic value from 4070 the earlier action is used to restore the prior list of variables. This 4071 removes the temporary `let'-variable from the list so that it won't 4072 appear to exist while the rest of the program is parsed. 4073 4074 In the above example, if the parser initiates error recovery (*note 4075 Error Recovery::) while parsing the tokens in the embedded statement 4076 `stmt', it might discard the previous semantic context `$<context>5' 4077 without restoring it. Thus, `$<context>5' needs a destructor (*note 4078 Freeing Discarded Symbols: Destructor Decl.). However, Bison currently 4079 provides no means to declare a destructor specific to a particular 4080 mid-rule action's semantic value. 4081 4082 One solution is to bury the mid-rule action inside a nonterminal 4083 symbol and to declare a destructor for that symbol: 4084 4085 %type <context> let 4086 %destructor { pop_context ($$); } let 4087 4088 %% 4089 4090 stmt: 4091 let stmt 4092 { 4093 $$ = $2; 4094 pop_context ($let); 4095 }; 4096 4097 let: 4098 "let" '(' var ')' 4099 { 4100 $let = push_context (); 4101 declare_variable ($3); 4102 }; 4103 4104 Note that the action is now at the end of its rule. Any mid-rule 4105 action can be converted to an end-of-rule action in this way, and this 4106 is what Bison actually does to implement mid-rule actions. 4107 4108 4109 File: bison.info, Node: Mid-Rule Action Translation, Next: Mid-Rule Conflicts, Prev: Using Mid-Rule Actions, Up: Mid-Rule Actions 4110 4111 3.5.5.2 Mid-Rule Action Translation 4112 ................................... 4113 4114 As hinted earlier, mid-rule actions are actually transformed into 4115 regular rules and actions. The various reports generated by Bison 4116 (textual, graphical, etc., see *note Understanding Your Parser: 4117 Understanding.) reveal this translation, best explained by means of an 4118 example. The following rule: 4119 4120 exp: { a(); } "b" { c(); } { d(); } "e" { f(); }; 4121 4122 is translated into: 4123 4124 $@1: /* empty */ { a(); }; 4125 $@2: /* empty */ { c(); }; 4126 $@3: /* empty */ { d(); }; 4127 exp: $@1 "b" $@2 $@3 "e" { f(); }; 4128 4129 with new nonterminal symbols `$@N', where N is a number. 4130 4131 A mid-rule action is expected to generate a value if it uses `$$', or 4132 the (final) action uses `$N' where N denote the mid-rule action. In 4133 that case its nonterminal is rather named `@N': 4134 4135 exp: { a(); } "b" { $$ = c(); } { d(); } "e" { f = $1; }; 4136 4137 is translated into 4138 4139 @1: /* empty */ { a(); }; 4140 @2: /* empty */ { $$ = c(); }; 4141 $@3: /* empty */ { d(); }; 4142 exp: @1 "b" @2 $@3 "e" { f = $1; } 4143 4144 There are probably two errors in the above example: the first 4145 mid-rule action does not generate a value (it does not use `$$' 4146 although the final action uses it), and the value of the second one is 4147 not used (the final action does not use `$3'). Bison reports these 4148 errors when the `midrule-value' warnings are enabled (*note Invoking 4149 Bison: Invocation.): 4150 4151 $ bison -fcaret -Wmidrule-value mid.y 4152 mid.y:2.6-13: warning: unset value: $$ 4153 exp: { a(); } "b" { $$ = c(); } { d(); } "e" { f = $1; }; 4154 ^^^^^^^^ 4155 mid.y:2.19-31: warning: unused value: $3 4156 exp: { a(); } "b" { $$ = c(); } { d(); } "e" { f = $1; }; 4157 ^^^^^^^^^^^^^ 4158 4159 4160 File: bison.info, Node: Mid-Rule Conflicts, Prev: Mid-Rule Action Translation, Up: Mid-Rule Actions 4161 4162 3.5.5.3 Conflicts due to Mid-Rule Actions 4163 ......................................... 4164 4165 Taking action before a rule is completely recognized often leads to 4166 conflicts since the parser must commit to a parse in order to execute 4167 the action. For example, the following two rules, without mid-rule 4168 actions, can coexist in a working parser because the parser can shift 4169 the open-brace token and look at what follows before deciding whether 4170 there is a declaration or not: 4171 4172 compound: 4173 '{' declarations statements '}' 4174 | '{' statements '}' 4175 ; 4176 4177 But when we add a mid-rule action as follows, the rules become 4178 nonfunctional: 4179 4180 compound: 4181 { prepare_for_local_variables (); } 4182 '{' declarations statements '}' 4183 | '{' statements '}' 4184 ; 4185 4186 Now the parser is forced to decide whether to run the mid-rule action 4187 when it has read no farther than the open-brace. In other words, it 4188 must commit to using one rule or the other, without sufficient 4189 information to do it correctly. (The open-brace token is what is called 4190 the "lookahead" token at this time, since the parser is still deciding 4191 what to do about it. *Note Lookahead Tokens: Lookahead.) 4192 4193 You might think that you could correct the problem by putting 4194 identical actions into the two rules, like this: 4195 4196 compound: 4197 { prepare_for_local_variables (); } 4198 '{' declarations statements '}' 4199 | { prepare_for_local_variables (); } 4200 '{' statements '}' 4201 ; 4202 4203 But this does not help, because Bison does not realize that the two 4204 actions are identical. (Bison never tries to understand the C code in 4205 an action.) 4206 4207 If the grammar is such that a declaration can be distinguished from a 4208 statement by the first token (which is true in C), then one solution 4209 which does work is to put the action after the open-brace, like this: 4210 4211 compound: 4212 '{' { prepare_for_local_variables (); } 4213 declarations statements '}' 4214 | '{' statements '}' 4215 ; 4216 4217 Now the first token of the following declaration or statement, which 4218 would in any case tell Bison which rule to use, can still do so. 4219 4220 Another solution is to bury the action inside a nonterminal symbol 4221 which serves as a subroutine: 4222 4223 subroutine: 4224 /* empty */ { prepare_for_local_variables (); } 4225 ; 4226 4227 compound: 4228 subroutine '{' declarations statements '}' 4229 | subroutine '{' statements '}' 4230 ; 4231 4232 Now Bison can execute the action in the rule for `subroutine' without 4233 deciding which rule for `compound' it will eventually use. 4234 4235 4236 File: bison.info, Node: Tracking Locations, Next: Named References, Prev: Semantics, Up: Grammar File 4237 4238 3.6 Tracking Locations 4239 ====================== 4240 4241 Though grammar rules and semantic actions are enough to write a fully 4242 functional parser, it can be useful to process some additional 4243 information, especially symbol locations. 4244 4245 The way locations are handled is defined by providing a data type, 4246 and actions to take when rules are matched. 4247 4248 * Menu: 4249 4250 * Location Type:: Specifying a data type for locations. 4251 * Actions and Locations:: Using locations in actions. 4252 * Location Default Action:: Defining a general way to compute locations. 4253 4254 4255 File: bison.info, Node: Location Type, Next: Actions and Locations, Up: Tracking Locations 4256 4257 3.6.1 Data Type of Locations 4258 ---------------------------- 4259 4260 Defining a data type for locations is much simpler than for semantic 4261 values, since all tokens and groupings always use the same type. 4262 4263 You can specify the type of locations by defining a macro called 4264 `YYLTYPE', just as you can specify the semantic value type by defining 4265 a `YYSTYPE' macro (*note Value Type::). When `YYLTYPE' is not defined, 4266 Bison uses a default structure type with four members: 4267 4268 typedef struct YYLTYPE 4269 { 4270 int first_line; 4271 int first_column; 4272 int last_line; 4273 int last_column; 4274 } YYLTYPE; 4275 4276 When `YYLTYPE' is not defined, at the beginning of the parsing, Bison 4277 initializes all these fields to 1 for `yylloc'. To initialize `yylloc' 4278 with a custom location type (or to chose a different initialization), 4279 use the `%initial-action' directive. *Note Performing Actions before 4280 Parsing: Initial Action Decl. 4281 4282 4283 File: bison.info, Node: Actions and Locations, Next: Location Default Action, Prev: Location Type, Up: Tracking Locations 4284 4285 3.6.2 Actions and Locations 4286 --------------------------- 4287 4288 Actions are not only useful for defining language semantics, but also 4289 for describing the behavior of the output parser with locations. 4290 4291 The most obvious way for building locations of syntactic groupings 4292 is very similar to the way semantic values are computed. In a given 4293 rule, several constructs can be used to access the locations of the 4294 elements being matched. The location of the Nth component of the right 4295 hand side is `@N', while the location of the left hand side grouping is 4296 `@$'. 4297 4298 In addition, the named references construct `@NAME' and `@[NAME]' 4299 may also be used to address the symbol locations. *Note Named 4300 References::, for more information about using the named references 4301 construct. 4302 4303 Here is a basic example using the default data type for locations: 4304 4305 exp: 4306 ... 4307 | exp '/' exp 4308 { 4309 @$.first_column = @1.first_column; 4310 @$.first_line = @1.first_line; 4311 @$.last_column = @3.last_column; 4312 @$.last_line = @3.last_line; 4313 if ($3) 4314 $$ = $1 / $3; 4315 else 4316 { 4317 $$ = 1; 4318 fprintf (stderr, 4319 "Division by zero, l%d,c%d-l%d,c%d", 4320 @3.first_line, @3.first_column, 4321 @3.last_line, @3.last_column); 4322 } 4323 } 4324 4325 As for semantic values, there is a default action for locations that 4326 is run each time a rule is matched. It sets the beginning of `@$' to 4327 the beginning of the first symbol, and the end of `@$' to the end of the 4328 last symbol. 4329 4330 With this default action, the location tracking can be fully 4331 automatic. The example above simply rewrites this way: 4332 4333 exp: 4334 ... 4335 | exp '/' exp 4336 { 4337 if ($3) 4338 $$ = $1 / $3; 4339 else 4340 { 4341 $$ = 1; 4342 fprintf (stderr, 4343 "Division by zero, l%d,c%d-l%d,c%d", 4344 @3.first_line, @3.first_column, 4345 @3.last_line, @3.last_column); 4346 } 4347 } 4348 4349 It is also possible to access the location of the lookahead token, 4350 if any, from a semantic action. This location is stored in `yylloc'. 4351 *Note Special Features for Use in Actions: Action Features. 4352 4353 4354 File: bison.info, Node: Location Default Action, Prev: Actions and Locations, Up: Tracking Locations 4355 4356 3.6.3 Default Action for Locations 4357 ---------------------------------- 4358 4359 Actually, actions are not the best place to compute locations. Since 4360 locations are much more general than semantic values, there is room in 4361 the output parser to redefine the default action to take for each rule. 4362 The `YYLLOC_DEFAULT' macro is invoked each time a rule is matched, 4363 before the associated action is run. It is also invoked while 4364 processing a syntax error, to compute the error's location. Before 4365 reporting an unresolvable syntactic ambiguity, a GLR parser invokes 4366 `YYLLOC_DEFAULT' recursively to compute the location of that ambiguity. 4367 4368 Most of the time, this macro is general enough to suppress location 4369 dedicated code from semantic actions. 4370 4371 The `YYLLOC_DEFAULT' macro takes three parameters. The first one is 4372 the location of the grouping (the result of the computation). When a 4373 rule is matched, the second parameter identifies locations of all right 4374 hand side elements of the rule being matched, and the third parameter 4375 is the size of the rule's right hand side. When a GLR parser reports 4376 an ambiguity, which of multiple candidate right hand sides it passes to 4377 `YYLLOC_DEFAULT' is undefined. When processing a syntax error, the 4378 second parameter identifies locations of the symbols that were 4379 discarded during error processing, and the third parameter is the 4380 number of discarded symbols. 4381 4382 By default, `YYLLOC_DEFAULT' is defined this way: 4383 4384 # define YYLLOC_DEFAULT(Cur, Rhs, N) \ 4385 do \ 4386 if (N) \ 4387 { \ 4388 (Cur).first_line = YYRHSLOC(Rhs, 1).first_line; \ 4389 (Cur).first_column = YYRHSLOC(Rhs, 1).first_column; \ 4390 (Cur).last_line = YYRHSLOC(Rhs, N).last_line; \ 4391 (Cur).last_column = YYRHSLOC(Rhs, N).last_column; \ 4392 } \ 4393 else \ 4394 { \ 4395 (Cur).first_line = (Cur).last_line = \ 4396 YYRHSLOC(Rhs, 0).last_line; \ 4397 (Cur).first_column = (Cur).last_column = \ 4398 YYRHSLOC(Rhs, 0).last_column; \ 4399 } \ 4400 while (0) 4401 4402 where `YYRHSLOC (rhs, k)' is the location of the Kth symbol in RHS when 4403 K is positive, and the location of the symbol just before the reduction 4404 when K and N are both zero. 4405 4406 When defining `YYLLOC_DEFAULT', you should consider that: 4407 4408 * All arguments are free of side-effects. However, only the first 4409 one (the result) should be modified by `YYLLOC_DEFAULT'. 4410 4411 * For consistency with semantic actions, valid indexes within the 4412 right hand side range from 1 to N. When N is zero, only 0 is a 4413 valid index, and it refers to the symbol just before the reduction. 4414 During error processing N is always positive. 4415 4416 * Your macro should parenthesize its arguments, if need be, since the 4417 actual arguments may not be surrounded by parentheses. Also, your 4418 macro should expand to something that can be used as a single 4419 statement when it is followed by a semicolon. 4420 4421 4422 File: bison.info, Node: Named References, Next: Declarations, Prev: Tracking Locations, Up: Grammar File 4423 4424 3.7 Named References 4425 ==================== 4426 4427 As described in the preceding sections, the traditional way to refer to 4428 any semantic value or location is a "positional reference", which takes 4429 the form `$N', `$$', `@N', and `@$'. However, such a reference is not 4430 very descriptive. Moreover, if you later decide to insert or remove 4431 symbols in the right-hand side of a grammar rule, the need to renumber 4432 such references can be tedious and error-prone. 4433 4434 To avoid these issues, you can also refer to a semantic value or 4435 location using a "named reference". First of all, original symbol 4436 names may be used as named references. For example: 4437 4438 invocation: op '(' args ')' 4439 { $invocation = new_invocation ($op, $args, @invocation); } 4440 4441 Positional and named references can be mixed arbitrarily. For example: 4442 4443 invocation: op '(' args ')' 4444 { $$ = new_invocation ($op, $args, @$); } 4445 4446 However, sometimes regular symbol names are not sufficient due to 4447 ambiguities: 4448 4449 exp: exp '/' exp 4450 { $exp = $exp / $exp; } // $exp is ambiguous. 4451 4452 exp: exp '/' exp 4453 { $$ = $1 / $exp; } // One usage is ambiguous. 4454 4455 exp: exp '/' exp 4456 { $$ = $1 / $3; } // No error. 4457 4458 When ambiguity occurs, explicitly declared names may be used for values 4459 and locations. Explicit names are declared as a bracketed name after a 4460 symbol appearance in rule definitions. For example: 4461 exp[result]: exp[left] '/' exp[right] 4462 { $result = $left / $right; } 4463 4464 In order to access a semantic value generated by a mid-rule action, an 4465 explicit name may also be declared by putting a bracketed name after the 4466 closing brace of the mid-rule action code: 4467 exp[res]: exp[x] '+' {$left = $x;}[left] exp[right] 4468 { $res = $left + $right; } 4469 4470 In references, in order to specify names containing dots and dashes, an 4471 explicit bracketed syntax `$[name]' and `@[name]' must be used: 4472 if-stmt: "if" '(' expr ')' "then" then.stmt ';' 4473 { $[if-stmt] = new_if_stmt ($expr, $[then.stmt]); } 4474 4475 It often happens that named references are followed by a dot, dash 4476 or other C punctuation marks and operators. By default, Bison will read 4477 `$name.suffix' as a reference to symbol value `$name' followed by 4478 `.suffix', i.e., an access to the `suffix' field of the semantic value. 4479 In order to force Bison to recognize `name.suffix' in its entirety as 4480 the name of a semantic value, the bracketed syntax `$[name.suffix]' 4481 must be used. 4482 4483 The named references feature is experimental. More user feedback 4484 will help to stabilize it. 4485 4486 4487 File: bison.info, Node: Declarations, Next: Multiple Parsers, Prev: Named References, Up: Grammar File 4488 4489 3.8 Bison Declarations 4490 ====================== 4491 4492 The "Bison declarations" section of a Bison grammar defines the symbols 4493 used in formulating the grammar and the data types of semantic values. 4494 *Note Symbols::. 4495 4496 All token type names (but not single-character literal tokens such as 4497 `'+'' and `'*'') must be declared. Nonterminal symbols must be 4498 declared if you need to specify which data type to use for the semantic 4499 value (*note More Than One Value Type: Multiple Types.). 4500 4501 The first rule in the grammar file also specifies the start symbol, 4502 by default. If you want some other symbol to be the start symbol, you 4503 must declare it explicitly (*note Languages and Context-Free Grammars: 4504 Language and Grammar.). 4505 4506 * Menu: 4507 4508 * Require Decl:: Requiring a Bison version. 4509 * Token Decl:: Declaring terminal symbols. 4510 * Precedence Decl:: Declaring terminals with precedence and associativity. 4511 * Union Decl:: Declaring the set of all semantic value types. 4512 * Type Decl:: Declaring the choice of type for a nonterminal symbol. 4513 * Initial Action Decl:: Code run before parsing starts. 4514 * Destructor Decl:: Declaring how symbols are freed. 4515 * Printer Decl:: Declaring how symbol values are displayed. 4516 * Expect Decl:: Suppressing warnings about parsing conflicts. 4517 * Start Decl:: Specifying the start symbol. 4518 * Pure Decl:: Requesting a reentrant parser. 4519 * Push Decl:: Requesting a push parser. 4520 * Decl Summary:: Table of all Bison declarations. 4521 * %define Summary:: Defining variables to adjust Bison's behavior. 4522 * %code Summary:: Inserting code into the parser source. 4523 4524 4525 File: bison.info, Node: Require Decl, Next: Token Decl, Up: Declarations 4526 4527 3.8.1 Require a Version of Bison 4528 -------------------------------- 4529 4530 You may require the minimum version of Bison to process the grammar. If 4531 the requirement is not met, `bison' exits with an error (exit status 4532 63). 4533 4534 %require "VERSION" 4535 4536 4537 File: bison.info, Node: Token Decl, Next: Precedence Decl, Prev: Require Decl, Up: Declarations 4538 4539 3.8.2 Token Type Names 4540 ---------------------- 4541 4542 The basic way to declare a token type name (terminal symbol) is as 4543 follows: 4544 4545 %token NAME 4546 4547 Bison will convert this into a `#define' directive in the parser, so 4548 that the function `yylex' (if it is in this file) can use the name NAME 4549 to stand for this token type's code. 4550 4551 Alternatively, you can use `%left', `%right', or `%nonassoc' instead 4552 of `%token', if you wish to specify associativity and precedence. 4553 *Note Operator Precedence: Precedence Decl. 4554 4555 You can explicitly specify the numeric code for a token type by 4556 appending a nonnegative decimal or hexadecimal integer value in the 4557 field immediately following the token name: 4558 4559 %token NUM 300 4560 %token XNUM 0x12d // a GNU extension 4561 4562 It is generally best, however, to let Bison choose the numeric codes for 4563 all token types. Bison will automatically select codes that don't 4564 conflict with each other or with normal characters. 4565 4566 In the event that the stack type is a union, you must augment the 4567 `%token' or other token declaration to include the data type 4568 alternative delimited by angle-brackets (*note More Than One Value 4569 Type: Multiple Types.). 4570 4571 For example: 4572 4573 %union { /* define stack type */ 4574 double val; 4575 symrec *tptr; 4576 } 4577 %token <val> NUM /* define token NUM and its type */ 4578 4579 You can associate a literal string token with a token type name by 4580 writing the literal string at the end of a `%token' declaration which 4581 declares the name. For example: 4582 4583 %token arrow "=>" 4584 4585 For example, a grammar for the C language might specify these names with 4586 equivalent literal string tokens: 4587 4588 %token <operator> OR "||" 4589 %token <operator> LE 134 "<=" 4590 %left OR "<=" 4591 4592 Once you equate the literal string and the token name, you can use them 4593 interchangeably in further declarations or the grammar rules. The 4594 `yylex' function can use the token name or the literal string to obtain 4595 the token type code number (*note Calling Convention::). Syntax error 4596 messages passed to `yyerror' from the parser will reference the literal 4597 string instead of the token name. 4598 4599 The token numbered as 0 corresponds to end of file; the following 4600 line allows for nicer error messages referring to "end of file" instead 4601 of "$end": 4602 4603 %token END 0 "end of file" 4604 4605 4606 File: bison.info, Node: Precedence Decl, Next: Union Decl, Prev: Token Decl, Up: Declarations 4607 4608 3.8.3 Operator Precedence 4609 ------------------------- 4610 4611 Use the `%left', `%right' or `%nonassoc' declaration to declare a token 4612 and specify its precedence and associativity, all at once. These are 4613 called "precedence declarations". *Note Operator Precedence: 4614 Precedence, for general information on operator precedence. 4615 4616 The syntax of a precedence declaration is nearly the same as that of 4617 `%token': either 4618 4619 %left SYMBOLS... 4620 4621 or 4622 4623 %left <TYPE> SYMBOLS... 4624 4625 And indeed any of these declarations serves the purposes of `%token'. 4626 But in addition, they specify the associativity and relative precedence 4627 for all the SYMBOLS: 4628 4629 * The associativity of an operator OP determines how repeated uses 4630 of the operator nest: whether `X OP Y OP Z' is parsed by grouping 4631 X with Y first or by grouping Y with Z first. `%left' specifies 4632 left-associativity (grouping X with Y first) and `%right' 4633 specifies right-associativity (grouping Y with Z first). 4634 `%nonassoc' specifies no associativity, which means that `X OP Y 4635 OP Z' is considered a syntax error. 4636 4637 * The precedence of an operator determines how it nests with other 4638 operators. All the tokens declared in a single precedence 4639 declaration have equal precedence and nest together according to 4640 their associativity. When two tokens declared in different 4641 precedence declarations associate, the one declared later has the 4642 higher precedence and is grouped first. 4643 4644 For backward compatibility, there is a confusing difference between 4645 the argument lists of `%token' and precedence declarations. Only a 4646 `%token' can associate a literal string with a token type name. A 4647 precedence declaration always interprets a literal string as a 4648 reference to a separate token. For example: 4649 4650 %left OR "<=" // Does not declare an alias. 4651 %left OR 134 "<=" 135 // Declares 134 for OR and 135 for "<=". 4652 4653 4654 File: bison.info, Node: Union Decl, Next: Type Decl, Prev: Precedence Decl, Up: Declarations 4655 4656 3.8.4 The Collection of Value Types 4657 ----------------------------------- 4658 4659 The `%union' declaration specifies the entire collection of possible 4660 data types for semantic values. The keyword `%union' is followed by 4661 braced code containing the same thing that goes inside a `union' in C. 4662 4663 For example: 4664 4665 %union { 4666 double val; 4667 symrec *tptr; 4668 } 4669 4670 This says that the two alternative types are `double' and `symrec *'. 4671 They are given names `val' and `tptr'; these names are used in the 4672 `%token' and `%type' declarations to pick one of the types for a 4673 terminal or nonterminal symbol (*note Nonterminal Symbols: Type Decl.). 4674 4675 As an extension to POSIX, a tag is allowed after the `union'. For 4676 example: 4677 4678 %union value { 4679 double val; 4680 symrec *tptr; 4681 } 4682 4683 specifies the union tag `value', so the corresponding C type is `union 4684 value'. If you do not specify a tag, it defaults to `YYSTYPE'. 4685 4686 As another extension to POSIX, you may specify multiple `%union' 4687 declarations; their contents are concatenated. However, only the first 4688 `%union' declaration can specify a tag. 4689 4690 Note that, unlike making a `union' declaration in C, you need not 4691 write a semicolon after the closing brace. 4692 4693 Instead of `%union', you can define and use your own union type 4694 `YYSTYPE' if your grammar contains at least one `<TYPE>' tag. For 4695 example, you can put the following into a header file `parser.h': 4696 4697 union YYSTYPE { 4698 double val; 4699 symrec *tptr; 4700 }; 4701 typedef union YYSTYPE YYSTYPE; 4702 4703 and then your grammar can use the following instead of `%union': 4704 4705 %{ 4706 #include "parser.h" 4707 %} 4708 %type <val> expr 4709 %token <tptr> ID 4710 4711 4712 File: bison.info, Node: Type Decl, Next: Initial Action Decl, Prev: Union Decl, Up: Declarations 4713 4714 3.8.5 Nonterminal Symbols 4715 ------------------------- 4716 4717 When you use `%union' to specify multiple value types, you must declare 4718 the value type of each nonterminal symbol for which values are used. 4719 This is done with a `%type' declaration, like this: 4720 4721 %type <TYPE> NONTERMINAL... 4722 4723 Here NONTERMINAL is the name of a nonterminal symbol, and TYPE is the 4724 name given in the `%union' to the alternative that you want (*note The 4725 Collection of Value Types: Union Decl.). You can give any number of 4726 nonterminal symbols in the same `%type' declaration, if they have the 4727 same value type. Use spaces to separate the symbol names. 4728 4729 You can also declare the value type of a terminal symbol. To do 4730 this, use the same `<TYPE>' construction in a declaration for the 4731 terminal symbol. All kinds of token declarations allow `<TYPE>'. 4732 4733 4734 File: bison.info, Node: Initial Action Decl, Next: Destructor Decl, Prev: Type Decl, Up: Declarations 4735 4736 3.8.6 Performing Actions before Parsing 4737 --------------------------------------- 4738 4739 Sometimes your parser needs to perform some initializations before 4740 parsing. The `%initial-action' directive allows for such arbitrary 4741 code. 4742 4743 -- Directive: %initial-action { CODE } 4744 Declare that the braced CODE must be invoked before parsing each 4745 time `yyparse' is called. The CODE may use `$$' (or `$<TAG>$') 4746 and `@$' -- initial value and location of the lookahead -- and the 4747 `%parse-param'. 4748 4749 For instance, if your locations use a file name, you may use 4750 4751 %parse-param { char const *file_name }; 4752 %initial-action 4753 { 4754 @$.initialize (file_name); 4755 }; 4756 4757 4758 File: bison.info, Node: Destructor Decl, Next: Printer Decl, Prev: Initial Action Decl, Up: Declarations 4759 4760 3.8.7 Freeing Discarded Symbols 4761 ------------------------------- 4762 4763 During error recovery (*note Error Recovery::), symbols already pushed 4764 on the stack and tokens coming from the rest of the file are discarded 4765 until the parser falls on its feet. If the parser runs out of memory, 4766 or if it returns via `YYABORT' or `YYACCEPT', all the symbols on the 4767 stack must be discarded. Even if the parser succeeds, it must discard 4768 the start symbol. 4769 4770 When discarded symbols convey heap based information, this memory is 4771 lost. While this behavior can be tolerable for batch parsers, such as 4772 in traditional compilers, it is unacceptable for programs like shells or 4773 protocol implementations that may parse and execute indefinitely. 4774 4775 The `%destructor' directive defines code that is called when a 4776 symbol is automatically discarded. 4777 4778 -- Directive: %destructor { CODE } SYMBOLS 4779 Invoke the braced CODE whenever the parser discards one of the 4780 SYMBOLS. Within CODE, `$$' (or `$<TAG>$') designates the semantic 4781 value associated with the discarded symbol, and `@$' designates 4782 its location. The additional parser parameters are also available 4783 (*note The Parser Function `yyparse': Parser Function.). 4784 4785 When a symbol is listed among SYMBOLS, its `%destructor' is called 4786 a per-symbol `%destructor'. You may also define a per-type 4787 `%destructor' by listing a semantic type tag among SYMBOLS. In 4788 that case, the parser will invoke this CODE whenever it discards 4789 any grammar symbol that has that semantic type tag unless that 4790 symbol has its own per-symbol `%destructor'. 4791 4792 Finally, you can define two different kinds of default 4793 `%destructor's. (These default forms are experimental. More user 4794 feedback will help to determine whether they should become 4795 permanent features.) You can place each of `<*>' and `<>' in the 4796 SYMBOLS list of exactly one `%destructor' declaration in your 4797 grammar file. The parser will invoke the CODE associated with one 4798 of these whenever it discards any user-defined grammar symbol that 4799 has no per-symbol and no per-type `%destructor'. The parser uses 4800 the CODE for `<*>' in the case of such a grammar symbol for which 4801 you have formally declared a semantic type tag (`%type' counts as 4802 such a declaration, but `$<tag>$' does not). The parser uses the 4803 CODE for `<>' in the case of such a grammar symbol that has no 4804 declared semantic type tag. 4805 4806 For example: 4807 4808 %union { char *string; } 4809 %token <string> STRING1 4810 %token <string> STRING2 4811 %type <string> string1 4812 %type <string> string2 4813 %union { char character; } 4814 %token <character> CHR 4815 %type <character> chr 4816 %token TAGLESS 4817 4818 %destructor { } <character> 4819 %destructor { free ($$); } <*> 4820 %destructor { free ($$); printf ("%d", @$.first_line); } STRING1 string1 4821 %destructor { printf ("Discarding tagless symbol.\n"); } <> 4822 4823 guarantees that, when the parser discards any user-defined symbol that 4824 has a semantic type tag other than `<character>', it passes its 4825 semantic value to `free' by default. However, when the parser discards 4826 a `STRING1' or a `string1', it also prints its line number to `stdout'. 4827 It performs only the second `%destructor' in this case, so it invokes 4828 `free' only once. Finally, the parser merely prints a message whenever 4829 it discards any symbol, such as `TAGLESS', that has no semantic type 4830 tag. 4831 4832 A Bison-generated parser invokes the default `%destructor's only for 4833 user-defined as opposed to Bison-defined symbols. For example, the 4834 parser will not invoke either kind of default `%destructor' for the 4835 special Bison-defined symbols `$accept', `$undefined', or `$end' (*note 4836 Bison Symbols: Table of Symbols.), none of which you can reference in 4837 your grammar. It also will not invoke either for the `error' token 4838 (*note error: Table of Symbols.), which is always defined by Bison 4839 regardless of whether you reference it in your grammar. However, it 4840 may invoke one of them for the end token (token 0) if you redefine it 4841 from `$end' to, for example, `END': 4842 4843 %token END 0 4844 4845 Finally, Bison will never invoke a `%destructor' for an unreferenced 4846 mid-rule semantic value (*note Actions in Mid-Rule: Mid-Rule Actions.). 4847 That is, Bison does not consider a mid-rule to have a semantic value if 4848 you do not reference `$$' in the mid-rule's action or `$N' (where N is 4849 the right-hand side symbol position of the mid-rule) in any later 4850 action in that rule. However, if you do reference either, the 4851 Bison-generated parser will invoke the `<>' `%destructor' whenever it 4852 discards the mid-rule symbol. 4853 4854 4855 "Discarded symbols" are the following: 4856 4857 * stacked symbols popped during the first phase of error recovery, 4858 4859 * incoming terminals during the second phase of error recovery, 4860 4861 * the current lookahead and the entire stack (except the current 4862 right-hand side symbols) when the parser returns immediately, and 4863 4864 * the current lookahead and the entire stack (including the current 4865 right-hand side symbols) when the C++ parser (`lalr1.cc') catches 4866 an exception in `parse', 4867 4868 * the start symbol, when the parser succeeds. 4869 4870 The parser can "return immediately" because of an explicit call to 4871 `YYABORT' or `YYACCEPT', or failed error recovery, or memory exhaustion. 4872 4873 Right-hand side symbols of a rule that explicitly triggers a syntax 4874 error via `YYERROR' are not discarded automatically. As a rule of 4875 thumb, destructors are invoked only when user actions cannot manage the 4876 memory. 4877 4878 4879 File: bison.info, Node: Printer Decl, Next: Expect Decl, Prev: Destructor Decl, Up: Declarations 4880 4881 3.8.8 Printing Semantic Values 4882 ------------------------------ 4883 4884 When run-time traces are enabled (*note Tracing Your Parser: Tracing.), 4885 the parser reports its actions, such as reductions. When a symbol 4886 involved in an action is reported, only its kind is displayed, as the 4887 parser cannot know how semantic values should be formatted. 4888 4889 The `%printer' directive defines code that is called when a symbol is 4890 reported. Its syntax is the same as `%destructor' (*note Freeing 4891 Discarded Symbols: Destructor Decl.). 4892 4893 -- Directive: %printer { CODE } SYMBOLS 4894 Invoke the braced CODE whenever the parser displays one of the 4895 SYMBOLS. Within CODE, `yyoutput' denotes the output stream (a 4896 `FILE*' in C, and an `std::ostream&' in C++), `$$' (or `$<TAG>$') 4897 designates the semantic value associated with the symbol, and `@$' 4898 its location. The additional parser parameters are also available 4899 (*note The Parser Function `yyparse': Parser Function.). 4900 4901 The SYMBOLS are defined as for `%destructor' (*note Freeing 4902 Discarded Symbols: Destructor Decl.): they can be per-type (e.g., 4903 `<ival>'), per-symbol (e.g., `exp', `NUM', `"float"'), typed 4904 per-default (i.e., `<*>', or untyped per-default (i.e., `<>'). 4905 4906 For example: 4907 4908 %union { char *string; } 4909 %token <string> STRING1 4910 %token <string> STRING2 4911 %type <string> string1 4912 %type <string> string2 4913 %union { char character; } 4914 %token <character> CHR 4915 %type <character> chr 4916 %token TAGLESS 4917 4918 %printer { fprintf (yyoutput, "'%c'", $$); } <character> 4919 %printer { fprintf (yyoutput, "&%p", $$); } <*> 4920 %printer { fprintf (yyoutput, "\"%s\"", $$); } STRING1 string1 4921 %printer { fprintf (yyoutput, "<>"); } <> 4922 4923 guarantees that, when the parser print any symbol that has a semantic 4924 type tag other than `<character>', it display the address of the 4925 semantic value by default. However, when the parser displays a 4926 `STRING1' or a `string1', it formats it as a string in double quotes. 4927 It performs only the second `%printer' in this case, so it prints only 4928 once. Finally, the parser print `<>' for any symbol, such as `TAGLESS', 4929 that has no semantic type tag. See also 4930 4931 4932 File: bison.info, Node: Expect Decl, Next: Start Decl, Prev: Printer Decl, Up: Declarations 4933 4934 3.8.9 Suppressing Conflict Warnings 4935 ----------------------------------- 4936 4937 Bison normally warns if there are any conflicts in the grammar (*note 4938 Shift/Reduce Conflicts: Shift/Reduce.), but most real grammars have 4939 harmless shift/reduce conflicts which are resolved in a predictable way 4940 and would be difficult to eliminate. It is desirable to suppress the 4941 warning about these conflicts unless the number of conflicts changes. 4942 You can do this with the `%expect' declaration. 4943 4944 The declaration looks like this: 4945 4946 %expect N 4947 4948 Here N is a decimal integer. The declaration says there should be N 4949 shift/reduce conflicts and no reduce/reduce conflicts. Bison reports 4950 an error if the number of shift/reduce conflicts differs from N, or if 4951 there are any reduce/reduce conflicts. 4952 4953 For deterministic parsers, reduce/reduce conflicts are more serious, 4954 and should be eliminated entirely. Bison will always report 4955 reduce/reduce conflicts for these parsers. With GLR parsers, however, 4956 both kinds of conflicts are routine; otherwise, there would be no need 4957 to use GLR parsing. Therefore, it is also possible to specify an 4958 expected number of reduce/reduce conflicts in GLR parsers, using the 4959 declaration: 4960 4961 %expect-rr N 4962 4963 In general, using `%expect' involves these steps: 4964 4965 * Compile your grammar without `%expect'. Use the `-v' option to 4966 get a verbose list of where the conflicts occur. Bison will also 4967 print the number of conflicts. 4968 4969 * Check each of the conflicts to make sure that Bison's default 4970 resolution is what you really want. If not, rewrite the grammar 4971 and go back to the beginning. 4972 4973 * Add an `%expect' declaration, copying the number N from the number 4974 which Bison printed. With GLR parsers, add an `%expect-rr' 4975 declaration as well. 4976 4977 Now Bison will report an error if you introduce an unexpected 4978 conflict, but will keep silent otherwise. 4979 4980 4981 File: bison.info, Node: Start Decl, Next: Pure Decl, Prev: Expect Decl, Up: Declarations 4982 4983 3.8.10 The Start-Symbol 4984 ----------------------- 4985 4986 Bison assumes by default that the start symbol for the grammar is the 4987 first nonterminal specified in the grammar specification section. The 4988 programmer may override this restriction with the `%start' declaration 4989 as follows: 4990 4991 %start SYMBOL 4992 4993 4994 File: bison.info, Node: Pure Decl, Next: Push Decl, Prev: Start Decl, Up: Declarations 4995 4996 3.8.11 A Pure (Reentrant) Parser 4997 -------------------------------- 4998 4999 A "reentrant" program is one which does not alter in the course of 5000 execution; in other words, it consists entirely of "pure" (read-only) 5001 code. Reentrancy is important whenever asynchronous execution is 5002 possible; for example, a nonreentrant program may not be safe to call 5003 from a signal handler. In systems with multiple threads of control, a 5004 nonreentrant program must be called only within interlocks. 5005 5006 Normally, Bison generates a parser which is not reentrant. This is 5007 suitable for most uses, and it permits compatibility with Yacc. (The 5008 standard Yacc interfaces are inherently nonreentrant, because they use 5009 statically allocated variables for communication with `yylex', 5010 including `yylval' and `yylloc'.) 5011 5012 Alternatively, you can generate a pure, reentrant parser. The Bison 5013 declaration `%define api.pure' says that you want the parser to be 5014 reentrant. It looks like this: 5015 5016 %define api.pure full 5017 5018 The result is that the communication variables `yylval' and `yylloc' 5019 become local variables in `yyparse', and a different calling convention 5020 is used for the lexical analyzer function `yylex'. *Note Calling 5021 Conventions for Pure Parsers: Pure Calling, for the details of this. 5022 The variable `yynerrs' becomes local in `yyparse' in pull mode but it 5023 becomes a member of yypstate in push mode. (*note The Error Reporting 5024 Function `yyerror': Error Reporting.). The convention for calling 5025 `yyparse' itself is unchanged. 5026 5027 Whether the parser is pure has nothing to do with the grammar rules. 5028 You can generate either a pure parser or a nonreentrant parser from any 5029 valid grammar. 5030 5031 5032 File: bison.info, Node: Push Decl, Next: Decl Summary, Prev: Pure Decl, Up: Declarations 5033 5034 3.8.12 A Push Parser 5035 -------------------- 5036 5037 (The current push parsing interface is experimental and may evolve. 5038 More user feedback will help to stabilize it.) 5039 5040 A pull parser is called once and it takes control until all its input 5041 is completely parsed. A push parser, on the other hand, is called each 5042 time a new token is made available. 5043 5044 A push parser is typically useful when the parser is part of a main 5045 event loop in the client's application. This is typically a 5046 requirement of a GUI, when the main event loop needs to be triggered 5047 within a certain time period. 5048 5049 Normally, Bison generates a pull parser. The following Bison 5050 declaration says that you want the parser to be a push parser (*note 5051 api.push-pull: %define Summary.): 5052 5053 %define api.push-pull push 5054 5055 In almost all cases, you want to ensure that your push parser is also 5056 a pure parser (*note A Pure (Reentrant) Parser: Pure Decl.). The only 5057 time you should create an impure push parser is to have backwards 5058 compatibility with the impure Yacc pull mode interface. Unless you know 5059 what you are doing, your declarations should look like this: 5060 5061 %define api.pure full 5062 %define api.push-pull push 5063 5064 There is a major notable functional difference between the pure push 5065 parser and the impure push parser. It is acceptable for a pure push 5066 parser to have many parser instances, of the same type of parser, in 5067 memory at the same time. An impure push parser should only use one 5068 parser at a time. 5069 5070 When a push parser is selected, Bison will generate some new symbols 5071 in the generated parser. `yypstate' is a structure that the generated 5072 parser uses to store the parser's state. `yypstate_new' is the 5073 function that will create a new parser instance. `yypstate_delete' 5074 will free the resources associated with the corresponding parser 5075 instance. Finally, `yypush_parse' is the function that should be 5076 called whenever a token is available to provide the parser. A trivial 5077 example of using a pure push parser would look like this: 5078 5079 int status; 5080 yypstate *ps = yypstate_new (); 5081 do { 5082 status = yypush_parse (ps, yylex (), NULL); 5083 } while (status == YYPUSH_MORE); 5084 yypstate_delete (ps); 5085 5086 If the user decided to use an impure push parser, a few things about 5087 the generated parser will change. The `yychar' variable becomes a 5088 global variable instead of a variable in the `yypush_parse' function. 5089 For this reason, the signature of the `yypush_parse' function is 5090 changed to remove the token as a parameter. A nonreentrant push parser 5091 example would thus look like this: 5092 5093 extern int yychar; 5094 int status; 5095 yypstate *ps = yypstate_new (); 5096 do { 5097 yychar = yylex (); 5098 status = yypush_parse (ps); 5099 } while (status == YYPUSH_MORE); 5100 yypstate_delete (ps); 5101 5102 That's it. Notice the next token is put into the global variable 5103 `yychar' for use by the next invocation of the `yypush_parse' function. 5104 5105 Bison also supports both the push parser interface along with the 5106 pull parser interface in the same generated parser. In order to get 5107 this functionality, you should replace the `%define api.push-pull push' 5108 declaration with the `%define api.push-pull both' declaration. Doing 5109 this will create all of the symbols mentioned earlier along with the 5110 two extra symbols, `yyparse' and `yypull_parse'. `yyparse' can be used 5111 exactly as it normally would be used. However, the user should note 5112 that it is implemented in the generated parser by calling 5113 `yypull_parse'. This makes the `yyparse' function that is generated 5114 with the `%define api.push-pull both' declaration slower than the normal 5115 `yyparse' function. If the user calls the `yypull_parse' function it 5116 will parse the rest of the input stream. It is possible to 5117 `yypush_parse' tokens to select a subgrammar and then `yypull_parse' 5118 the rest of the input stream. If you would like to switch back and 5119 forth between between parsing styles, you would have to write your own 5120 `yypull_parse' function that knows when to quit looking for input. An 5121 example of using the `yypull_parse' function would look like this: 5122 5123 yypstate *ps = yypstate_new (); 5124 yypull_parse (ps); /* Will call the lexer */ 5125 yypstate_delete (ps); 5126 5127 Adding the `%define api.pure full' declaration does exactly the same 5128 thing to the generated parser with `%define api.push-pull both' as it 5129 did for `%define api.push-pull push'. 5130 5131 5132 File: bison.info, Node: Decl Summary, Next: %define Summary, Prev: Push Decl, Up: Declarations 5133 5134 3.8.13 Bison Declaration Summary 5135 -------------------------------- 5136 5137 Here is a summary of the declarations used to define a grammar: 5138 5139 -- Directive: %union 5140 Declare the collection of data types that semantic values may have 5141 (*note The Collection of Value Types: Union Decl.). 5142 5143 -- Directive: %token 5144 Declare a terminal symbol (token type name) with no precedence or 5145 associativity specified (*note Token Type Names: Token Decl.). 5146 5147 -- Directive: %right 5148 Declare a terminal symbol (token type name) that is 5149 right-associative (*note Operator Precedence: Precedence Decl.). 5150 5151 -- Directive: %left 5152 Declare a terminal symbol (token type name) that is 5153 left-associative (*note Operator Precedence: Precedence Decl.). 5154 5155 -- Directive: %nonassoc 5156 Declare a terminal symbol (token type name) that is nonassociative 5157 (*note Operator Precedence: Precedence Decl.). Using it in a way 5158 that would be associative is a syntax error. 5159 5160 -- Directive: %type 5161 Declare the type of semantic values for a nonterminal symbol 5162 (*note Nonterminal Symbols: Type Decl.). 5163 5164 -- Directive: %start 5165 Specify the grammar's start symbol (*note The Start-Symbol: Start 5166 Decl.). 5167 5168 -- Directive: %expect 5169 Declare the expected number of shift-reduce conflicts (*note 5170 Suppressing Conflict Warnings: Expect Decl.). 5171 5172 5173 In order to change the behavior of `bison', use the following 5174 directives: 5175 5176 -- Directive: %code {CODE} 5177 -- Directive: %code QUALIFIER {CODE} 5178 Insert CODE verbatim into the output parser source at the default 5179 location or at the location specified by QUALIFIER. *Note %code 5180 Summary::. 5181 5182 -- Directive: %debug 5183 In the parser implementation file, define the macro `YYDEBUG' (or 5184 `PREFIXDEBUG' with `%define api.prefix PREFIX', see *note Multiple 5185 Parsers in the Same Program: Multiple Parsers.) to 1 if it is not 5186 already defined, so that the debugging facilities are compiled. 5187 *Note Tracing Your Parser: Tracing. 5188 5189 -- Directive: %define VARIABLE 5190 -- Directive: %define VARIABLE VALUE 5191 -- Directive: %define VARIABLE "VALUE" 5192 Define a variable to adjust Bison's behavior. *Note %define 5193 Summary::. 5194 5195 -- Directive: %defines 5196 Write a parser header file containing macro definitions for the 5197 token type names defined in the grammar as well as a few other 5198 declarations. If the parser implementation file is named `NAME.c' 5199 then the parser header file is named `NAME.h'. 5200 5201 For C parsers, the parser header file declares `YYSTYPE' unless 5202 `YYSTYPE' is already defined as a macro or you have used a 5203 `<TYPE>' tag without using `%union'. Therefore, if you are using 5204 a `%union' (*note More Than One Value Type: Multiple Types.) with 5205 components that require other definitions, or if you have defined 5206 a `YYSTYPE' macro or type definition (*note Data Types of Semantic 5207 Values: Value Type.), you need to arrange for these definitions to 5208 be propagated to all modules, e.g., by putting them in a 5209 prerequisite header that is included both by your parser and by any 5210 other module that needs `YYSTYPE'. 5211 5212 Unless your parser is pure, the parser header file declares 5213 `yylval' as an external variable. *Note A Pure (Reentrant) 5214 Parser: Pure Decl. 5215 5216 If you have also used locations, the parser header file declares 5217 `YYLTYPE' and `yylloc' using a protocol similar to that of the 5218 `YYSTYPE' macro and `yylval'. *Note Tracking Locations::. 5219 5220 This parser header file is normally essential if you wish to put 5221 the definition of `yylex' in a separate source file, because 5222 `yylex' typically needs to be able to refer to the above-mentioned 5223 declarations and to the token type codes. *Note Semantic Values 5224 of Tokens: Token Values. 5225 5226 If you have declared `%code requires' or `%code provides', the 5227 output header also contains their code. *Note %code Summary::. 5228 5229 The generated header is protected against multiple inclusions with 5230 a C preprocessor guard: `YY_PREFIX_FILE_INCLUDED', where PREFIX 5231 and FILE are the prefix (*note Multiple Parsers in the Same 5232 Program: Multiple Parsers.) and generated file name turned 5233 uppercase, with each series of non alphanumerical characters 5234 converted to a single underscore. 5235 5236 For instance with `%define api.prefix "calc"' and `%defines 5237 "lib/parse.h"', the header will be guarded as follows. 5238 #ifndef YY_CALC_LIB_PARSE_H_INCLUDED 5239 # define YY_CALC_LIB_PARSE_H_INCLUDED 5240 ... 5241 #endif /* ! YY_CALC_LIB_PARSE_H_INCLUDED */ 5242 5243 -- Directive: %defines DEFINES-FILE 5244 Same as above, but save in the file DEFINES-FILE. 5245 5246 -- Directive: %destructor 5247 Specify how the parser should reclaim the memory associated to 5248 discarded symbols. *Note Freeing Discarded Symbols: Destructor 5249 Decl. 5250 5251 -- Directive: %file-prefix "PREFIX" 5252 Specify a prefix to use for all Bison output file names. The names 5253 are chosen as if the grammar file were named `PREFIX.y'. 5254 5255 -- Directive: %language "LANGUAGE" 5256 Specify the programming language for the generated parser. 5257 Currently supported languages include C, C++, and Java. LANGUAGE 5258 is case-insensitive. 5259 5260 5261 -- Directive: %locations 5262 Generate the code processing the locations (*note Special Features 5263 for Use in Actions: Action Features.). This mode is enabled as 5264 soon as the grammar uses the special `@N' tokens, but if your 5265 grammar does not use it, using `%locations' allows for more 5266 accurate syntax error messages. 5267 5268 -- Directive: %no-lines 5269 Don't generate any `#line' preprocessor commands in the parser 5270 implementation file. Ordinarily Bison writes these commands in the 5271 parser implementation file so that the C compiler and debuggers 5272 will associate errors and object code with your source file (the 5273 grammar file). This directive causes them to associate errors 5274 with the parser implementation file, treating it as an independent 5275 source file in its own right. 5276 5277 -- Directive: %output "FILE" 5278 Specify FILE for the parser implementation file. 5279 5280 -- Directive: %pure-parser 5281 Deprecated version of `%define api.pure' (*note api.pure: %define 5282 Summary.), for which Bison is more careful to warn about 5283 unreasonable usage. 5284 5285 -- Directive: %require "VERSION" 5286 Require version VERSION or higher of Bison. *Note Require a 5287 Version of Bison: Require Decl. 5288 5289 -- Directive: %skeleton "FILE" 5290 Specify the skeleton to use. 5291 5292 If FILE does not contain a `/', FILE is the name of a skeleton 5293 file in the Bison installation directory. If it does, FILE is an 5294 absolute file name or a file name relative to the directory of the 5295 grammar file. This is similar to how most shells resolve commands. 5296 5297 -- Directive: %token-table 5298 Generate an array of token names in the parser implementation file. 5299 The name of the array is `yytname'; `yytname[I]' is the name of 5300 the token whose internal Bison token code number is I. The first 5301 three elements of `yytname' correspond to the predefined tokens 5302 `"$end"', `"error"', and `"$undefined"'; after these come the 5303 symbols defined in the grammar file. 5304 5305 The name in the table includes all the characters needed to 5306 represent the token in Bison. For single-character literals and 5307 literal strings, this includes the surrounding quoting characters 5308 and any escape sequences. For example, the Bison single-character 5309 literal `'+'' corresponds to a three-character name, represented 5310 in C as `"'+'"'; and the Bison two-character literal string `"\\/"' 5311 corresponds to a five-character name, represented in C as 5312 `"\"\\\\/\""'. 5313 5314 When you specify `%token-table', Bison also generates macro 5315 definitions for macros `YYNTOKENS', `YYNNTS', and `YYNRULES', and 5316 `YYNSTATES': 5317 5318 `YYNTOKENS' 5319 The highest token number, plus one. 5320 5321 `YYNNTS' 5322 The number of nonterminal symbols. 5323 5324 `YYNRULES' 5325 The number of grammar rules, 5326 5327 `YYNSTATES' 5328 The number of parser states (*note Parser States::). 5329 5330 -- Directive: %verbose 5331 Write an extra output file containing verbose descriptions of the 5332 parser states and what is done for each type of lookahead token in 5333 that state. *Note Understanding Your Parser: Understanding, for 5334 more information. 5335 5336 -- Directive: %yacc 5337 Pretend the option `--yacc' was given, i.e., imitate Yacc, 5338 including its naming conventions. *Note Bison Options::, for more. 5339 5340 5341 File: bison.info, Node: %define Summary, Next: %code Summary, Prev: Decl Summary, Up: Declarations 5342 5343 3.8.14 %define Summary 5344 ---------------------- 5345 5346 There are many features of Bison's behavior that can be controlled by 5347 assigning the feature a single value. For historical reasons, some 5348 such features are assigned values by dedicated directives, such as 5349 `%start', which assigns the start symbol. However, newer such features 5350 are associated with variables, which are assigned by the `%define' 5351 directive: 5352 5353 -- Directive: %define VARIABLE 5354 -- Directive: %define VARIABLE VALUE 5355 -- Directive: %define VARIABLE "VALUE" 5356 Define VARIABLE to VALUE. 5357 5358 VALUE must be placed in quotation marks if it contains any 5359 character other than a letter, underscore, period, or non-initial 5360 dash or digit. Omitting `"VALUE"' entirely is always equivalent 5361 to specifying `""'. 5362 5363 It is an error if a VARIABLE is defined by `%define' multiple 5364 times, but see *note -D NAME[=VALUE]: Bison Options. 5365 5366 The rest of this section summarizes variables and values that 5367 `%define' accepts. 5368 5369 Some VARIABLEs take Boolean values. In this case, Bison will 5370 complain if the variable definition does not meet one of the following 5371 four conditions: 5372 5373 1. `VALUE' is `true' 5374 5375 2. `VALUE' is omitted (or `""' is specified). This is equivalent to 5376 `true'. 5377 5378 3. `VALUE' is `false'. 5379 5380 4. VARIABLE is never defined. In this case, Bison selects a default 5381 value. 5382 5383 What VARIABLEs are accepted, as well as their meanings and default 5384 values, depend on the selected target language and/or the parser 5385 skeleton (*note %language: Decl Summary, *note %skeleton: Decl 5386 Summary.). Unaccepted VARIABLEs produce an error. Some of the 5387 accepted VARIABLEs are: 5388 5389 * `api.location.type' 5390 5391 * Language(s): C++, Java 5392 5393 * Purpose: Define the location type. *Note User Defined 5394 Location Type::. 5395 5396 * Accepted Values: String 5397 5398 * Default Value: none 5399 5400 * History: introduced in Bison 2.7 5401 5402 * `api.prefix' 5403 5404 * Language(s): All 5405 5406 * Purpose: Rename exported symbols. *Note Multiple Parsers in 5407 the Same Program: Multiple Parsers. 5408 5409 * Accepted Values: String 5410 5411 * Default Value: `yy' 5412 5413 * History: introduced in Bison 2.6 5414 5415 * `api.pure' 5416 5417 * Language(s): C 5418 5419 * Purpose: Request a pure (reentrant) parser program. *Note A 5420 Pure (Reentrant) Parser: Pure Decl. 5421 5422 * Accepted Values: `true', `false', `full' 5423 5424 The value may be omitted: this is equivalent to specifying 5425 `true', as is the case for Boolean values. 5426 5427 When `%define api.pure full' is used, the parser is made 5428 reentrant. This changes the signature for `yylex' (*note Pure 5429 Calling::), and also that of `yyerror' when the tracking of 5430 locations has been activated, as shown below. 5431 5432 The `true' value is very similar to the `full' value, the only 5433 difference is in the signature of `yyerror' on Yacc parsers 5434 without `%parse-param', for historical reasons. 5435 5436 I.e., if `%locations %define api.pure' is passed then the 5437 prototypes for `yyerror' are: 5438 5439 void yyerror (char const *msg); // Yacc parsers. 5440 void yyerror (YYLTYPE *locp, char const *msg); // GLR parsers. 5441 5442 But if `%locations %define api.pure %parse-param {int 5443 *nastiness}' is used, then both parsers have the same 5444 signature: 5445 5446 void yyerror (YYLTYPE *llocp, int *nastiness, char const *msg); 5447 5448 (*note The Error Reporting Function `yyerror': Error 5449 Reporting.) 5450 5451 * Default Value: `false' 5452 5453 * History: the `full' value was introduced in Bison 2.7 5454 5455 * `api.push-pull' 5456 5457 * Language(s): C (deterministic parsers only) 5458 5459 * Purpose: Request a pull parser, a push parser, or both. 5460 *Note A Push Parser: Push Decl. (The current push parsing 5461 interface is experimental and may evolve. More user feedback 5462 will help to stabilize it.) 5463 5464 * Accepted Values: `pull', `push', `both' 5465 5466 * Default Value: `pull' 5467 5468 * `lr.default-reductions' 5469 5470 * Language(s): all 5471 5472 * Purpose: Specify the kind of states that are permitted to 5473 contain default reductions. *Note Default Reductions::. 5474 (The ability to specify where default reductions should be 5475 used is experimental. More user feedback will help to 5476 stabilize it.) 5477 5478 * Accepted Values: `most', `consistent', `accepting' 5479 5480 * Default Value: 5481 * `accepting' if `lr.type' is `canonical-lr'. 5482 5483 * `most' otherwise. 5484 5485 * `lr.keep-unreachable-states' 5486 5487 * Language(s): all 5488 5489 * Purpose: Request that Bison allow unreachable parser states to 5490 remain in the parser tables. *Note Unreachable States::. 5491 5492 * Accepted Values: Boolean 5493 5494 * Default Value: `false' 5495 5496 * `lr.type' 5497 5498 * Language(s): all 5499 5500 * Purpose: Specify the type of parser tables within the LR(1) 5501 family. *Note LR Table Construction::. (This feature is 5502 experimental. More user feedback will help to stabilize it.) 5503 5504 * Accepted Values: `lalr', `ielr', `canonical-lr' 5505 5506 * Default Value: `lalr' 5507 5508 * `namespace' 5509 5510 * Languages(s): C++ 5511 5512 * Purpose: Specify the namespace for the parser class. For 5513 example, if you specify: 5514 5515 %define namespace "foo::bar" 5516 5517 Bison uses `foo::bar' verbatim in references such as: 5518 5519 foo::bar::parser::semantic_type 5520 5521 However, to open a namespace, Bison removes any leading `::' 5522 and then splits on any remaining occurrences: 5523 5524 namespace foo { namespace bar { 5525 class position; 5526 class location; 5527 } } 5528 5529 * Accepted Values: Any absolute or relative C++ namespace 5530 reference without a trailing `"::"'. For example, `"foo"' or 5531 `"::foo::bar"'. 5532 5533 * Default Value: The value specified by `%name-prefix', which 5534 defaults to `yy'. This usage of `%name-prefix' is for 5535 backward compatibility and can be confusing since 5536 `%name-prefix' also specifies the textual prefix for the 5537 lexical analyzer function. Thus, if you specify 5538 `%name-prefix', it is best to also specify `%define 5539 namespace' so that `%name-prefix' _only_ affects the lexical 5540 analyzer function. For example, if you specify: 5541 5542 %define namespace "foo" 5543 %name-prefix "bar::" 5544 5545 The parser namespace is `foo' and `yylex' is referenced as 5546 `bar::lex'. 5547 5548 * `parse.lac' 5549 5550 * Languages(s): C (deterministic parsers only) 5551 5552 * Purpose: Enable LAC (lookahead correction) to improve syntax 5553 error handling. *Note LAC::. 5554 5555 * Accepted Values: `none', `full' 5556 5557 * Default Value: `none' 5558 5559 5560 File: bison.info, Node: %code Summary, Prev: %define Summary, Up: Declarations 5561 5562 3.8.15 %code Summary 5563 -------------------- 5564 5565 The `%code' directive inserts code verbatim into the output parser 5566 source at any of a predefined set of locations. It thus serves as a 5567 flexible and user-friendly alternative to the traditional Yacc 5568 prologue, `%{CODE%}'. This section summarizes the functionality of 5569 `%code' for the various target languages supported by Bison. For a 5570 detailed discussion of how to use `%code' in place of `%{CODE%}' for 5571 C/C++ and why it is advantageous to do so, *note Prologue 5572 Alternatives::. 5573 5574 -- Directive: %code {CODE} 5575 This is the unqualified form of the `%code' directive. It inserts 5576 CODE verbatim at a language-dependent default location in the 5577 parser implementation. 5578 5579 For C/C++, the default location is the parser implementation file 5580 after the usual contents of the parser header file. Thus, the 5581 unqualified form replaces `%{CODE%}' for most purposes. 5582 5583 For Java, the default location is inside the parser class. 5584 5585 -- Directive: %code QUALIFIER {CODE} 5586 This is the qualified form of the `%code' directive. QUALIFIER 5587 identifies the purpose of CODE and thus the location(s) where 5588 Bison should insert it. That is, if you need to specify 5589 location-sensitive CODE that does not belong at the default 5590 location selected by the unqualified `%code' form, use this form 5591 instead. 5592 5593 For any particular qualifier or for the unqualified form, if there 5594 are multiple occurrences of the `%code' directive, Bison concatenates 5595 the specified code in the order in which it appears in the grammar file. 5596 5597 Not all qualifiers are accepted for all target languages. Unaccepted 5598 qualifiers produce an error. Some of the accepted qualifiers are: 5599 5600 * requires 5601 5602 * Language(s): C, C++ 5603 5604 * Purpose: This is the best place to write dependency code 5605 required for `YYSTYPE' and `YYLTYPE'. In other words, it's 5606 the best place to define types referenced in `%union' 5607 directives, and it's the best place to override Bison's 5608 default `YYSTYPE' and `YYLTYPE' definitions. 5609 5610 * Location(s): The parser header file and the parser 5611 implementation file before the Bison-generated `YYSTYPE' and 5612 `YYLTYPE' definitions. 5613 5614 * provides 5615 5616 * Language(s): C, C++ 5617 5618 * Purpose: This is the best place to write additional 5619 definitions and declarations that should be provided to other 5620 modules. 5621 5622 * Location(s): The parser header file and the parser 5623 implementation file after the Bison-generated `YYSTYPE', 5624 `YYLTYPE', and token definitions. 5625 5626 * top 5627 5628 * Language(s): C, C++ 5629 5630 * Purpose: The unqualified `%code' or `%code requires' should 5631 usually be more appropriate than `%code top'. However, 5632 occasionally it is necessary to insert code much nearer the 5633 top of the parser implementation file. For example: 5634 5635 %code top { 5636 #define _GNU_SOURCE 5637 #include <stdio.h> 5638 } 5639 5640 * Location(s): Near the top of the parser implementation file. 5641 5642 * imports 5643 5644 * Language(s): Java 5645 5646 * Purpose: This is the best place to write Java import 5647 directives. 5648 5649 * Location(s): The parser Java file after any Java package 5650 directive and before any class definitions. 5651 5652 Though we say the insertion locations are language-dependent, they 5653 are technically skeleton-dependent. Writers of non-standard skeletons 5654 however should choose their locations consistently with the behavior of 5655 the standard Bison skeletons. 5656 5657 5658 File: bison.info, Node: Multiple Parsers, Prev: Declarations, Up: Grammar File 5659 5660 3.9 Multiple Parsers in the Same Program 5661 ======================================== 5662 5663 Most programs that use Bison parse only one language and therefore 5664 contain only one Bison parser. But what if you want to parse more than 5665 one language with the same program? Then you need to avoid name 5666 conflicts between different definitions of functions and variables such 5667 as `yyparse', `yylval'. To use different parsers from the same 5668 compilation unit, you also need to avoid conflicts on types and macros 5669 (e.g., `YYSTYPE') exported in the generated header. 5670 5671 The easy way to do this is to define the `%define' variable 5672 `api.prefix'. With different `api.prefix's it is guaranteed that 5673 headers do not conflict when included together, and that compiled 5674 objects can be linked together too. Specifying `%define api.prefix 5675 PREFIX' (or passing the option `-Dapi.prefix=PREFIX', see *note 5676 Invoking Bison: Invocation.) renames the interface functions and 5677 variables of the Bison parser to start with PREFIX instead of `yy', and 5678 all the macros to start by PREFIX (i.e., PREFIX upper-cased) instead of 5679 `YY'. 5680 5681 The renamed symbols include `yyparse', `yylex', `yyerror', 5682 `yynerrs', `yylval', `yylloc', `yychar' and `yydebug'. If you use a 5683 push parser, `yypush_parse', `yypull_parse', `yypstate', `yypstate_new' 5684 and `yypstate_delete' will also be renamed. The renamed macros include 5685 `YYSTYPE', `YYLTYPE', and `YYDEBUG', which is treated specifically -- 5686 more about this below. 5687 5688 For example, if you use `%define api.prefix c', the names become 5689 `cparse', `clex', ..., `CSTYPE', `CLTYPE', and so on. 5690 5691 The `%define' variable `api.prefix' works in two different ways. In 5692 the implementation file, it works by adding macro definitions to the 5693 beginning of the parser implementation file, defining `yyparse' as 5694 `PREFIXparse', and so on: 5695 5696 #define YYSTYPE CTYPE 5697 #define yyparse cparse 5698 #define yylval clval 5699 ... 5700 YYSTYPE yylval; 5701 int yyparse (void); 5702 5703 This effectively substitutes one name for the other in the entire 5704 parser implementation file, thus the "original" names (`yylex', 5705 `YYSTYPE', ...) are also usable in the parser implementation file. 5706 5707 However, in the parser header file, the symbols are defined renamed, 5708 for instance: 5709 5710 extern CSTYPE clval; 5711 int cparse (void); 5712 5713 The macro `YYDEBUG' is commonly used to enable the tracing support in 5714 parsers. To comply with this tradition, when `api.prefix' is used, 5715 `YYDEBUG' (not renamed) is used as a default value: 5716 5717 /* Enabling traces. */ 5718 #ifndef CDEBUG 5719 # if defined YYDEBUG 5720 # if YYDEBUG 5721 # define CDEBUG 1 5722 # else 5723 # define CDEBUG 0 5724 # endif 5725 # else 5726 # define CDEBUG 0 5727 # endif 5728 #endif 5729 #if CDEBUG 5730 extern int cdebug; 5731 #endif 5732 5733 5734 5735 Prior to Bison 2.6, a feature similar to `api.prefix' was provided by 5736 the obsolete directive `%name-prefix' (*note Bison Symbols: Table of 5737 Symbols.) and the option `--name-prefix' (*note Bison Options::). 5738 5739 5740 File: bison.info, Node: Interface, Next: Algorithm, Prev: Grammar File, Up: Top 5741 5742 4 Parser C-Language Interface 5743 ***************************** 5744 5745 The Bison parser is actually a C function named `yyparse'. Here we 5746 describe the interface conventions of `yyparse' and the other functions 5747 that it needs to use. 5748 5749 Keep in mind that the parser uses many C identifiers starting with 5750 `yy' and `YY' for internal purposes. If you use such an identifier 5751 (aside from those in this manual) in an action or in epilogue in the 5752 grammar file, you are likely to run into trouble. 5753 5754 * Menu: 5755 5756 * Parser Function:: How to call `yyparse' and what it returns. 5757 * Push Parser Function:: How to call `yypush_parse' and what it returns. 5758 * Pull Parser Function:: How to call `yypull_parse' and what it returns. 5759 * Parser Create Function:: How to call `yypstate_new' and what it returns. 5760 * Parser Delete Function:: How to call `yypstate_delete' and what it returns. 5761 * Lexical:: You must supply a function `yylex' 5762 which reads tokens. 5763 * Error Reporting:: You must supply a function `yyerror'. 5764 * Action Features:: Special features for use in actions. 5765 * Internationalization:: How to let the parser speak in the user's 5766 native language. 5767 5768 5769 File: bison.info, Node: Parser Function, Next: Push Parser Function, Up: Interface 5770 5771 4.1 The Parser Function `yyparse' 5772 ================================= 5773 5774 You call the function `yyparse' to cause parsing to occur. This 5775 function reads tokens, executes actions, and ultimately returns when it 5776 encounters end-of-input or an unrecoverable syntax error. You can also 5777 write an action which directs `yyparse' to return immediately without 5778 reading further. 5779 5780 -- Function: int yyparse (void) 5781 The value returned by `yyparse' is 0 if parsing was successful 5782 (return is due to end-of-input). 5783 5784 The value is 1 if parsing failed because of invalid input, i.e., 5785 input that contains a syntax error or that causes `YYABORT' to be 5786 invoked. 5787 5788 The value is 2 if parsing failed due to memory exhaustion. 5789 5790 In an action, you can cause immediate return from `yyparse' by using 5791 these macros: 5792 5793 -- Macro: YYACCEPT 5794 Return immediately with value 0 (to report success). 5795 5796 -- Macro: YYABORT 5797 Return immediately with value 1 (to report failure). 5798 5799 If you use a reentrant parser, you can optionally pass additional 5800 parameter information to it in a reentrant way. To do so, use the 5801 declaration `%parse-param': 5802 5803 -- Directive: %parse-param {ARGUMENT-DECLARATION} 5804 Declare that an argument declared by the braced-code 5805 ARGUMENT-DECLARATION is an additional `yyparse' argument. The 5806 ARGUMENT-DECLARATION is used when declaring functions or 5807 prototypes. The last identifier in ARGUMENT-DECLARATION must be 5808 the argument name. 5809 5810 Here's an example. Write this in the parser: 5811 5812 %parse-param {int *nastiness} 5813 %parse-param {int *randomness} 5814 5815 Then call the parser like this: 5816 5817 { 5818 int nastiness, randomness; 5819 ... /* Store proper data in `nastiness' and `randomness'. */ 5820 value = yyparse (&nastiness, &randomness); 5821 ... 5822 } 5823 5824 In the grammar actions, use expressions like this to refer to the data: 5825 5826 exp: ... { ...; *randomness += 1; ... } 5827 5828 Using the following: 5829 %parse-param {int *randomness} 5830 5831 Results in these signatures: 5832 void yyerror (int *randomness, const char *msg); 5833 int yyparse (int *randomness); 5834 5835 Or, if both `%define api.pure full' (or just `%define api.pure') and 5836 `%locations' are used: 5837 5838 void yyerror (YYLTYPE *llocp, int *randomness, const char *msg); 5839 int yyparse (int *randomness); 5840 5841 5842 File: bison.info, Node: Push Parser Function, Next: Pull Parser Function, Prev: Parser Function, Up: Interface 5843 5844 4.2 The Push Parser Function `yypush_parse' 5845 =========================================== 5846 5847 (The current push parsing interface is experimental and may evolve. 5848 More user feedback will help to stabilize it.) 5849 5850 You call the function `yypush_parse' to parse a single token. This 5851 function is available if either the `%define api.push-pull push' or 5852 `%define api.push-pull both' declaration is used. *Note A Push Parser: 5853 Push Decl. 5854 5855 -- Function: int yypush_parse (yypstate *yyps) 5856 The value returned by `yypush_parse' is the same as for yyparse 5857 with the following exception: it returns `YYPUSH_MORE' if more 5858 input is required to finish parsing the grammar. 5859 5860 5861 File: bison.info, Node: Pull Parser Function, Next: Parser Create Function, Prev: Push Parser Function, Up: Interface 5862 5863 4.3 The Pull Parser Function `yypull_parse' 5864 =========================================== 5865 5866 (The current push parsing interface is experimental and may evolve. 5867 More user feedback will help to stabilize it.) 5868 5869 You call the function `yypull_parse' to parse the rest of the input 5870 stream. This function is available if the `%define api.push-pull both' 5871 declaration is used. *Note A Push Parser: Push Decl. 5872 5873 -- Function: int yypull_parse (yypstate *yyps) 5874 The value returned by `yypull_parse' is the same as for `yyparse'. 5875 5876 5877 File: bison.info, Node: Parser Create Function, Next: Parser Delete Function, Prev: Pull Parser Function, Up: Interface 5878 5879 4.4 The Parser Create Function `yystate_new' 5880 ============================================ 5881 5882 (The current push parsing interface is experimental and may evolve. 5883 More user feedback will help to stabilize it.) 5884 5885 You call the function `yypstate_new' to create a new parser instance. 5886 This function is available if either the `%define api.push-pull push' or 5887 `%define api.push-pull both' declaration is used. *Note A Push Parser: 5888 Push Decl. 5889 5890 -- Function: yypstate* yypstate_new (void) 5891 The function will return a valid parser instance if there was 5892 memory available or 0 if no memory was available. In impure mode, 5893 it will also return 0 if a parser instance is currently allocated. 5894 5895 5896 File: bison.info, Node: Parser Delete Function, Next: Lexical, Prev: Parser Create Function, Up: Interface 5897 5898 4.5 The Parser Delete Function `yystate_delete' 5899 =============================================== 5900 5901 (The current push parsing interface is experimental and may evolve. 5902 More user feedback will help to stabilize it.) 5903 5904 You call the function `yypstate_delete' to delete a parser instance. 5905 function is available if either the `%define api.push-pull push' or 5906 `%define api.push-pull both' declaration is used. *Note A Push Parser: 5907 Push Decl. 5908 5909 -- Function: void yypstate_delete (yypstate *yyps) 5910 This function will reclaim the memory associated with a parser 5911 instance. After this call, you should no longer attempt to use 5912 the parser instance. 5913 5914 5915 File: bison.info, Node: Lexical, Next: Error Reporting, Prev: Parser Delete Function, Up: Interface 5916 5917 4.6 The Lexical Analyzer Function `yylex' 5918 ========================================= 5919 5920 The "lexical analyzer" function, `yylex', recognizes tokens from the 5921 input stream and returns them to the parser. Bison does not create 5922 this function automatically; you must write it so that `yyparse' can 5923 call it. The function is sometimes referred to as a lexical scanner. 5924 5925 In simple programs, `yylex' is often defined at the end of the Bison 5926 grammar file. If `yylex' is defined in a separate source file, you 5927 need to arrange for the token-type macro definitions to be available 5928 there. To do this, use the `-d' option when you run Bison, so that it 5929 will write these macro definitions into the separate parser header 5930 file, `NAME.tab.h', which you can include in the other source files 5931 that need it. *Note Invoking Bison: Invocation. 5932 5933 * Menu: 5934 5935 * Calling Convention:: How `yyparse' calls `yylex'. 5936 * Token Values:: How `yylex' must return the semantic value 5937 of the token it has read. 5938 * Token Locations:: How `yylex' must return the text location 5939 (line number, etc.) of the token, if the 5940 actions want that. 5941 * Pure Calling:: How the calling convention differs in a pure parser 5942 (*note A Pure (Reentrant) Parser: Pure Decl.). 5943 5944 5945 File: bison.info, Node: Calling Convention, Next: Token Values, Up: Lexical 5946 5947 4.6.1 Calling Convention for `yylex' 5948 ------------------------------------ 5949 5950 The value that `yylex' returns must be the positive numeric code for 5951 the type of token it has just found; a zero or negative value signifies 5952 end-of-input. 5953 5954 When a token is referred to in the grammar rules by a name, that name 5955 in the parser implementation file becomes a C macro whose definition is 5956 the proper numeric code for that token type. So `yylex' can use the 5957 name to indicate that type. *Note Symbols::. 5958 5959 When a token is referred to in the grammar rules by a character 5960 literal, the numeric code for that character is also the code for the 5961 token type. So `yylex' can simply return that character code, possibly 5962 converted to `unsigned char' to avoid sign-extension. The null 5963 character must not be used this way, because its code is zero and that 5964 signifies end-of-input. 5965 5966 Here is an example showing these things: 5967 5968 int 5969 yylex (void) 5970 { 5971 ... 5972 if (c == EOF) /* Detect end-of-input. */ 5973 return 0; 5974 ... 5975 if (c == '+' || c == '-') 5976 return c; /* Assume token type for `+' is '+'. */ 5977 ... 5978 return INT; /* Return the type of the token. */ 5979 ... 5980 } 5981 5982 This interface has been designed so that the output from the `lex' 5983 utility can be used without change as the definition of `yylex'. 5984 5985 If the grammar uses literal string tokens, there are two ways that 5986 `yylex' can determine the token type codes for them: 5987 5988 * If the grammar defines symbolic token names as aliases for the 5989 literal string tokens, `yylex' can use these symbolic names like 5990 all others. In this case, the use of the literal string tokens in 5991 the grammar file has no effect on `yylex'. 5992 5993 * `yylex' can find the multicharacter token in the `yytname' table. 5994 The index of the token in the table is the token type's code. The 5995 name of a multicharacter token is recorded in `yytname' with a 5996 double-quote, the token's characters, and another double-quote. 5997 The token's characters are escaped as necessary to be suitable as 5998 input to Bison. 5999 6000 Here's code for looking up a multicharacter token in `yytname', 6001 assuming that the characters of the token are stored in 6002 `token_buffer', and assuming that the token does not contain any 6003 characters like `"' that require escaping. 6004 6005 for (i = 0; i < YYNTOKENS; i++) 6006 { 6007 if (yytname[i] != 0 6008 && yytname[i][0] == '"' 6009 && ! strncmp (yytname[i] + 1, token_buffer, 6010 strlen (token_buffer)) 6011 && yytname[i][strlen (token_buffer) + 1] == '"' 6012 && yytname[i][strlen (token_buffer) + 2] == 0) 6013 break; 6014 } 6015 6016 The `yytname' table is generated only if you use the 6017 `%token-table' declaration. *Note Decl Summary::. 6018 6019 6020 File: bison.info, Node: Token Values, Next: Token Locations, Prev: Calling Convention, Up: Lexical 6021 6022 4.6.2 Semantic Values of Tokens 6023 ------------------------------- 6024 6025 In an ordinary (nonreentrant) parser, the semantic value of the token 6026 must be stored into the global variable `yylval'. When you are using 6027 just one data type for semantic values, `yylval' has that type. Thus, 6028 if the type is `int' (the default), you might write this in `yylex': 6029 6030 ... 6031 yylval = value; /* Put value onto Bison stack. */ 6032 return INT; /* Return the type of the token. */ 6033 ... 6034 6035 When you are using multiple data types, `yylval''s type is a union 6036 made from the `%union' declaration (*note The Collection of Value 6037 Types: Union Decl.). So when you store a token's value, you must use 6038 the proper member of the union. If the `%union' declaration looks like 6039 this: 6040 6041 %union { 6042 int intval; 6043 double val; 6044 symrec *tptr; 6045 } 6046 6047 then the code in `yylex' might look like this: 6048 6049 ... 6050 yylval.intval = value; /* Put value onto Bison stack. */ 6051 return INT; /* Return the type of the token. */ 6052 ... 6053 6054 6055 File: bison.info, Node: Token Locations, Next: Pure Calling, Prev: Token Values, Up: Lexical 6056 6057 4.6.3 Textual Locations of Tokens 6058 --------------------------------- 6059 6060 If you are using the `@N'-feature (*note Tracking Locations::) in 6061 actions to keep track of the textual locations of tokens and groupings, 6062 then you must provide this information in `yylex'. The function 6063 `yyparse' expects to find the textual location of a token just parsed 6064 in the global variable `yylloc'. So `yylex' must store the proper data 6065 in that variable. 6066 6067 By default, the value of `yylloc' is a structure and you need only 6068 initialize the members that are going to be used by the actions. The 6069 four members are called `first_line', `first_column', `last_line' and 6070 `last_column'. Note that the use of this feature makes the parser 6071 noticeably slower. 6072 6073 The data type of `yylloc' has the name `YYLTYPE'. 6074 6075 6076 File: bison.info, Node: Pure Calling, Prev: Token Locations, Up: Lexical 6077 6078 4.6.4 Calling Conventions for Pure Parsers 6079 ------------------------------------------ 6080 6081 When you use the Bison declaration `%define api.pure full' to request a 6082 pure, reentrant parser, the global communication variables `yylval' and 6083 `yylloc' cannot be used. (*Note A Pure (Reentrant) Parser: Pure Decl.) 6084 In such parsers the two global variables are replaced by pointers 6085 passed as arguments to `yylex'. You must declare them as shown here, 6086 and pass the information back by storing it through those pointers. 6087 6088 int 6089 yylex (YYSTYPE *lvalp, YYLTYPE *llocp) 6090 { 6091 ... 6092 *lvalp = value; /* Put value onto Bison stack. */ 6093 return INT; /* Return the type of the token. */ 6094 ... 6095 } 6096 6097 If the grammar file does not use the `@' constructs to refer to 6098 textual locations, then the type `YYLTYPE' will not be defined. In 6099 this case, omit the second argument; `yylex' will be called with only 6100 one argument. 6101 6102 If you wish to pass the additional parameter data to `yylex', use 6103 `%lex-param' just like `%parse-param' (*note Parser Function::). 6104 6105 -- Directive: lex-param {ARGUMENT-DECLARATION} 6106 Declare that the braced-code ARGUMENT-DECLARATION is an additional 6107 `yylex' argument declaration. 6108 6109 For instance: 6110 6111 %lex-param {int *nastiness} 6112 6113 results in the following signature: 6114 6115 int yylex (int *nastiness); 6116 6117 If `%define api.pure full' (or just `%define api.pure') is added: 6118 6119 int yylex (YYSTYPE *lvalp, int *nastiness); 6120 6121 6122 File: bison.info, Node: Error Reporting, Next: Action Features, Prev: Lexical, Up: Interface 6123 6124 4.7 The Error Reporting Function `yyerror' 6125 ========================================== 6126 6127 The Bison parser detects a "syntax error" or "parse error" whenever it 6128 reads a token which cannot satisfy any syntax rule. An action in the 6129 grammar can also explicitly proclaim an error, using the macro 6130 `YYERROR' (*note Special Features for Use in Actions: Action Features.). 6131 6132 The Bison parser expects to report the error by calling an error 6133 reporting function named `yyerror', which you must supply. It is 6134 called by `yyparse' whenever a syntax error is found, and it receives 6135 one argument. For a syntax error, the string is normally 6136 `"syntax error"'. 6137 6138 If you invoke the directive `%error-verbose' in the Bison 6139 declarations section (*note The Bison Declarations Section: Bison 6140 Declarations.), then Bison provides a more verbose and specific error 6141 message string instead of just plain `"syntax error"'. However, that 6142 message sometimes contains incorrect information if LAC is not enabled 6143 (*note LAC::). 6144 6145 The parser can detect one other kind of error: memory exhaustion. 6146 This can happen when the input contains constructions that are very 6147 deeply nested. It isn't likely you will encounter this, since the Bison 6148 parser normally extends its stack automatically up to a very large 6149 limit. But if memory is exhausted, `yyparse' calls `yyerror' in the 6150 usual fashion, except that the argument string is `"memory exhausted"'. 6151 6152 In some cases diagnostics like `"syntax error"' are translated 6153 automatically from English to some other language before they are 6154 passed to `yyerror'. *Note Internationalization::. 6155 6156 The following definition suffices in simple programs: 6157 6158 void 6159 yyerror (char const *s) 6160 { 6161 fprintf (stderr, "%s\n", s); 6162 } 6163 6164 After `yyerror' returns to `yyparse', the latter will attempt error 6165 recovery if you have written suitable error recovery grammar rules 6166 (*note Error Recovery::). If recovery is impossible, `yyparse' will 6167 immediately return 1. 6168 6169 Obviously, in location tracking pure parsers, `yyerror' should have 6170 an access to the current location. With `%define api.pure', this is 6171 indeed the case for the GLR parsers, but not for the Yacc parser, for 6172 historical reasons, and this is the why `%define api.pure full' should 6173 be prefered over `%define api.pure'. 6174 6175 When `%locations %define api.pure full' is used, `yyerror' has the 6176 following signature: 6177 6178 void yyerror (YYLTYPE *locp, char const *msg); 6179 6180 The prototypes are only indications of how the code produced by Bison 6181 uses `yyerror'. Bison-generated code always ignores the returned 6182 value, so `yyerror' can return any type, including `void'. Also, 6183 `yyerror' can be a variadic function; that is why the message is always 6184 passed last. 6185 6186 Traditionally `yyerror' returns an `int' that is always ignored, but 6187 this is purely for historical reasons, and `void' is preferable since 6188 it more accurately describes the return type for `yyerror'. 6189 6190 The variable `yynerrs' contains the number of syntax errors reported 6191 so far. Normally this variable is global; but if you request a pure 6192 parser (*note A Pure (Reentrant) Parser: Pure Decl.) then it is a 6193 local variable which only the actions can access. 6194 6195 6196 File: bison.info, Node: Action Features, Next: Internationalization, Prev: Error Reporting, Up: Interface 6197 6198 4.8 Special Features for Use in Actions 6199 ======================================= 6200 6201 Here is a table of Bison constructs, variables and macros that are 6202 useful in actions. 6203 6204 -- Variable: $$ 6205 Acts like a variable that contains the semantic value for the 6206 grouping made by the current rule. *Note Actions::. 6207 6208 -- Variable: $N 6209 Acts like a variable that contains the semantic value for the Nth 6210 component of the current rule. *Note Actions::. 6211 6212 -- Variable: $<TYPEALT>$ 6213 Like `$$' but specifies alternative TYPEALT in the union specified 6214 by the `%union' declaration. *Note Data Types of Values in 6215 Actions: Action Types. 6216 6217 -- Variable: $<TYPEALT>N 6218 Like `$N' but specifies alternative TYPEALT in the union specified 6219 by the `%union' declaration. *Note Data Types of Values in 6220 Actions: Action Types. 6221 6222 -- Macro: YYABORT `;' 6223 Return immediately from `yyparse', indicating failure. *Note The 6224 Parser Function `yyparse': Parser Function. 6225 6226 -- Macro: YYACCEPT `;' 6227 Return immediately from `yyparse', indicating success. *Note The 6228 Parser Function `yyparse': Parser Function. 6229 6230 -- Macro: YYBACKUP (TOKEN, VALUE)`;' 6231 Unshift a token. This macro is allowed only for rules that reduce 6232 a single value, and only when there is no lookahead token. It is 6233 also disallowed in GLR parsers. It installs a lookahead token 6234 with token type TOKEN and semantic value VALUE; then it discards 6235 the value that was going to be reduced by this rule. 6236 6237 If the macro is used when it is not valid, such as when there is a 6238 lookahead token already, then it reports a syntax error with a 6239 message `cannot back up' and performs ordinary error recovery. 6240 6241 In either case, the rest of the action is not executed. 6242 6243 -- Macro: YYEMPTY 6244 Value stored in `yychar' when there is no lookahead token. 6245 6246 -- Macro: YYEOF 6247 Value stored in `yychar' when the lookahead is the end of the input 6248 stream. 6249 6250 -- Macro: YYERROR `;' 6251 Cause an immediate syntax error. This statement initiates error 6252 recovery just as if the parser itself had detected an error; 6253 however, it does not call `yyerror', and does not print any 6254 message. If you want to print an error message, call `yyerror' 6255 explicitly before the `YYERROR;' statement. *Note Error 6256 Recovery::. 6257 6258 -- Macro: YYRECOVERING 6259 The expression `YYRECOVERING ()' yields 1 when the parser is 6260 recovering from a syntax error, and 0 otherwise. *Note Error 6261 Recovery::. 6262 6263 -- Variable: yychar 6264 Variable containing either the lookahead token, or `YYEOF' when the 6265 lookahead is the end of the input stream, or `YYEMPTY' when no 6266 lookahead has been performed so the next token is not yet known. 6267 Do not modify `yychar' in a deferred semantic action (*note GLR 6268 Semantic Actions::). *Note Lookahead Tokens: Lookahead. 6269 6270 -- Macro: yyclearin `;' 6271 Discard the current lookahead token. This is useful primarily in 6272 error rules. Do not invoke `yyclearin' in a deferred semantic 6273 action (*note GLR Semantic Actions::). *Note Error Recovery::. 6274 6275 -- Macro: yyerrok `;' 6276 Resume generating error messages immediately for subsequent syntax 6277 errors. This is useful primarily in error rules. *Note Error 6278 Recovery::. 6279 6280 -- Variable: yylloc 6281 Variable containing the lookahead token location when `yychar' is 6282 not set to `YYEMPTY' or `YYEOF'. Do not modify `yylloc' in a 6283 deferred semantic action (*note GLR Semantic Actions::). *Note 6284 Actions and Locations: Actions and Locations. 6285 6286 -- Variable: yylval 6287 Variable containing the lookahead token semantic value when 6288 `yychar' is not set to `YYEMPTY' or `YYEOF'. Do not modify 6289 `yylval' in a deferred semantic action (*note GLR Semantic 6290 Actions::). *Note Actions: Actions. 6291 6292 -- Value: @$ 6293 Acts like a structure variable containing information on the 6294 textual location of the grouping made by the current rule. *Note 6295 Tracking Locations::. 6296 6297 6298 -- Value: @N 6299 Acts like a structure variable containing information on the 6300 textual location of the Nth component of the current rule. *Note 6301 Tracking Locations::. 6302 6303 6304 File: bison.info, Node: Internationalization, Prev: Action Features, Up: Interface 6305 6306 4.9 Parser Internationalization 6307 =============================== 6308 6309 A Bison-generated parser can print diagnostics, including error and 6310 tracing messages. By default, they appear in English. However, Bison 6311 also supports outputting diagnostics in the user's native language. To 6312 make this work, the user should set the usual environment variables. 6313 *Note The User's View: (gettext)Users. For example, the shell command 6314 `export LC_ALL=fr_CA.UTF-8' might set the user's locale to French 6315 Canadian using the UTF-8 encoding. The exact set of available locales 6316 depends on the user's installation. 6317 6318 The maintainer of a package that uses a Bison-generated parser 6319 enables the internationalization of the parser's output through the 6320 following steps. Here we assume a package that uses GNU Autoconf and 6321 GNU Automake. 6322 6323 1. Into the directory containing the GNU Autoconf macros used by the 6324 package --often called `m4'-- copy the `bison-i18n.m4' file 6325 installed by Bison under `share/aclocal/bison-i18n.m4' in Bison's 6326 installation directory. For example: 6327 6328 cp /usr/local/share/aclocal/bison-i18n.m4 m4/bison-i18n.m4 6329 6330 2. In the top-level `configure.ac', after the `AM_GNU_GETTEXT' 6331 invocation, add an invocation of `BISON_I18N'. This macro is 6332 defined in the file `bison-i18n.m4' that you copied earlier. It 6333 causes `configure' to find the value of the `BISON_LOCALEDIR' 6334 variable, and it defines the source-language symbol `YYENABLE_NLS' 6335 to enable translations in the Bison-generated parser. 6336 6337 3. In the `main' function of your program, designate the directory 6338 containing Bison's runtime message catalog, through a call to 6339 `bindtextdomain' with domain name `bison-runtime'. For example: 6340 6341 bindtextdomain ("bison-runtime", BISON_LOCALEDIR); 6342 6343 Typically this appears after any other call `bindtextdomain 6344 (PACKAGE, LOCALEDIR)' that your package already has. Here we rely 6345 on `BISON_LOCALEDIR' to be defined as a string through the 6346 `Makefile'. 6347 6348 4. In the `Makefile.am' that controls the compilation of the `main' 6349 function, make `BISON_LOCALEDIR' available as a C preprocessor 6350 macro, either in `DEFS' or in `AM_CPPFLAGS'. For example: 6351 6352 DEFS = @DEFS@ -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"' 6353 6354 or: 6355 6356 AM_CPPFLAGS = -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"' 6357 6358 5. Finally, invoke the command `autoreconf' to generate the build 6359 infrastructure. 6360 6361 6362 File: bison.info, Node: Algorithm, Next: Error Recovery, Prev: Interface, Up: Top 6363 6364 5 The Bison Parser Algorithm 6365 **************************** 6366 6367 As Bison reads tokens, it pushes them onto a stack along with their 6368 semantic values. The stack is called the "parser stack". Pushing a 6369 token is traditionally called "shifting". 6370 6371 For example, suppose the infix calculator has read `1 + 5 *', with a 6372 `3' to come. The stack will have four elements, one for each token 6373 that was shifted. 6374 6375 But the stack does not always have an element for each token read. 6376 When the last N tokens and groupings shifted match the components of a 6377 grammar rule, they can be combined according to that rule. This is 6378 called "reduction". Those tokens and groupings are replaced on the 6379 stack by a single grouping whose symbol is the result (left hand side) 6380 of that rule. Running the rule's action is part of the process of 6381 reduction, because this is what computes the semantic value of the 6382 resulting grouping. 6383 6384 For example, if the infix calculator's parser stack contains this: 6385 6386 1 + 5 * 3 6387 6388 and the next input token is a newline character, then the last three 6389 elements can be reduced to 15 via the rule: 6390 6391 expr: expr '*' expr; 6392 6393 Then the stack contains just these three elements: 6394 6395 1 + 15 6396 6397 At this point, another reduction can be made, resulting in the single 6398 value 16. Then the newline token can be shifted. 6399 6400 The parser tries, by shifts and reductions, to reduce the entire 6401 input down to a single grouping whose symbol is the grammar's 6402 start-symbol (*note Languages and Context-Free Grammars: Language and 6403 Grammar.). 6404 6405 This kind of parser is known in the literature as a bottom-up parser. 6406 6407 * Menu: 6408 6409 * Lookahead:: Parser looks one token ahead when deciding what to do. 6410 * Shift/Reduce:: Conflicts: when either shifting or reduction is valid. 6411 * Precedence:: Operator precedence works by resolving conflicts. 6412 * Contextual Precedence:: When an operator's precedence depends on context. 6413 * Parser States:: The parser is a finite-state-machine with stack. 6414 * Reduce/Reduce:: When two rules are applicable in the same situation. 6415 * Mysterious Conflicts:: Conflicts that look unjustified. 6416 * Tuning LR:: How to tune fundamental aspects of LR-based parsing. 6417 * Generalized LR Parsing:: Parsing arbitrary context-free grammars. 6418 * Memory Management:: What happens when memory is exhausted. How to avoid it. 6419 6420 6421 File: bison.info, Node: Lookahead, Next: Shift/Reduce, Up: Algorithm 6422 6423 5.1 Lookahead Tokens 6424 ==================== 6425 6426 The Bison parser does _not_ always reduce immediately as soon as the 6427 last N tokens and groupings match a rule. This is because such a 6428 simple strategy is inadequate to handle most languages. Instead, when a 6429 reduction is possible, the parser sometimes "looks ahead" at the next 6430 token in order to decide what to do. 6431 6432 When a token is read, it is not immediately shifted; first it 6433 becomes the "lookahead token", which is not on the stack. Now the 6434 parser can perform one or more reductions of tokens and groupings on 6435 the stack, while the lookahead token remains off to the side. When no 6436 more reductions should take place, the lookahead token is shifted onto 6437 the stack. This does not mean that all possible reductions have been 6438 done; depending on the token type of the lookahead token, some rules 6439 may choose to delay their application. 6440 6441 Here is a simple case where lookahead is needed. These three rules 6442 define expressions which contain binary addition operators and postfix 6443 unary factorial operators (`!'), and allow parentheses for grouping. 6444 6445 expr: 6446 term '+' expr 6447 | term 6448 ; 6449 6450 term: 6451 '(' expr ')' 6452 | term '!' 6453 | "number" 6454 ; 6455 6456 Suppose that the tokens `1 + 2' have been read and shifted; what 6457 should be done? If the following token is `)', then the first three 6458 tokens must be reduced to form an `expr'. This is the only valid 6459 course, because shifting the `)' would produce a sequence of symbols 6460 `term ')'', and no rule allows this. 6461 6462 If the following token is `!', then it must be shifted immediately so 6463 that `2 !' can be reduced to make a `term'. If instead the parser were 6464 to reduce before shifting, `1 + 2' would become an `expr'. It would 6465 then be impossible to shift the `!' because doing so would produce on 6466 the stack the sequence of symbols `expr '!''. No rule allows that 6467 sequence. 6468 6469 The lookahead token is stored in the variable `yychar'. Its 6470 semantic value and location, if any, are stored in the variables 6471 `yylval' and `yylloc'. *Note Special Features for Use in Actions: 6472 Action Features. 6473 6474 6475 File: bison.info, Node: Shift/Reduce, Next: Precedence, Prev: Lookahead, Up: Algorithm 6476 6477 5.2 Shift/Reduce Conflicts 6478 ========================== 6479 6480 Suppose we are parsing a language which has if-then and if-then-else 6481 statements, with a pair of rules like this: 6482 6483 if_stmt: 6484 "if" expr "then" stmt 6485 | "if" expr "then" stmt "else" stmt 6486 ; 6487 6488 Here `"if"', `"then"' and `"else"' are terminal symbols for specific 6489 keyword tokens. 6490 6491 When the `"else"' token is read and becomes the lookahead token, the 6492 contents of the stack (assuming the input is valid) are just right for 6493 reduction by the first rule. But it is also legitimate to shift the 6494 `"else"', because that would lead to eventual reduction by the second 6495 rule. 6496 6497 This situation, where either a shift or a reduction would be valid, 6498 is called a "shift/reduce conflict". Bison is designed to resolve 6499 these conflicts by choosing to shift, unless otherwise directed by 6500 operator precedence declarations. To see the reason for this, let's 6501 contrast it with the other alternative. 6502 6503 Since the parser prefers to shift the `"else"', the result is to 6504 attach the else-clause to the innermost if-statement, making these two 6505 inputs equivalent: 6506 6507 if x then if y then win; else lose; 6508 6509 if x then do; if y then win; else lose; end; 6510 6511 But if the parser chose to reduce when possible rather than shift, 6512 the result would be to attach the else-clause to the outermost 6513 if-statement, making these two inputs equivalent: 6514 6515 if x then if y then win; else lose; 6516 6517 if x then do; if y then win; end; else lose; 6518 6519 The conflict exists because the grammar as written is ambiguous: 6520 either parsing of the simple nested if-statement is legitimate. The 6521 established convention is that these ambiguities are resolved by 6522 attaching the else-clause to the innermost if-statement; this is what 6523 Bison accomplishes by choosing to shift rather than reduce. (It would 6524 ideally be cleaner to write an unambiguous grammar, but that is very 6525 hard to do in this case.) This particular ambiguity was first 6526 encountered in the specifications of Algol 60 and is called the 6527 "dangling `else'" ambiguity. 6528 6529 To avoid warnings from Bison about predictable, legitimate 6530 shift/reduce conflicts, you can use the `%expect N' declaration. There 6531 will be no warning as long as the number of shift/reduce conflicts is 6532 exactly N, and Bison will report an error if there is a different 6533 number. *Note Suppressing Conflict Warnings: Expect Decl. However, we 6534 don't recommend the use of `%expect' (except `%expect 0'!), as an equal 6535 number of conflicts does not mean that they are the _same_. When 6536 possible, you should rather use precedence directives to _fix_ the 6537 conflicts explicitly (*note Using Precedence For Non Operators: Non 6538 Operators.). 6539 6540 The definition of `if_stmt' above is solely to blame for the 6541 conflict, but the conflict does not actually appear without additional 6542 rules. Here is a complete Bison grammar file that actually manifests 6543 the conflict: 6544 6545 %% 6546 stmt: 6547 expr 6548 | if_stmt 6549 ; 6550 6551 if_stmt: 6552 "if" expr "then" stmt 6553 | "if" expr "then" stmt "else" stmt 6554 ; 6555 6556 expr: 6557 "identifier" 6558 ; 6559 6560 6561 File: bison.info, Node: Precedence, Next: Contextual Precedence, Prev: Shift/Reduce, Up: Algorithm 6562 6563 5.3 Operator Precedence 6564 ======================= 6565 6566 Another situation where shift/reduce conflicts appear is in arithmetic 6567 expressions. Here shifting is not always the preferred resolution; the 6568 Bison declarations for operator precedence allow you to specify when to 6569 shift and when to reduce. 6570 6571 * Menu: 6572 6573 * Why Precedence:: An example showing why precedence is needed. 6574 * Using Precedence:: How to specify precedence in Bison grammars. 6575 * Precedence Examples:: How these features are used in the previous example. 6576 * How Precedence:: How they work. 6577 * Non Operators:: Using precedence for general conflicts. 6578 6579 6580 File: bison.info, Node: Why Precedence, Next: Using Precedence, Up: Precedence 6581 6582 5.3.1 When Precedence is Needed 6583 ------------------------------- 6584 6585 Consider the following ambiguous grammar fragment (ambiguous because the 6586 input `1 - 2 * 3' can be parsed in two different ways): 6587 6588 expr: 6589 expr '-' expr 6590 | expr '*' expr 6591 | expr '<' expr 6592 | '(' expr ')' 6593 ... 6594 ; 6595 6596 Suppose the parser has seen the tokens `1', `-' and `2'; should it 6597 reduce them via the rule for the subtraction operator? It depends on 6598 the next token. Of course, if the next token is `)', we must reduce; 6599 shifting is invalid because no single rule can reduce the token 6600 sequence `- 2 )' or anything starting with that. But if the next token 6601 is `*' or `<', we have a choice: either shifting or reduction would 6602 allow the parse to complete, but with different results. 6603 6604 To decide which one Bison should do, we must consider the results. 6605 If the next operator token OP is shifted, then it must be reduced first 6606 in order to permit another opportunity to reduce the difference. The 6607 result is (in effect) `1 - (2 OP 3)'. On the other hand, if the 6608 subtraction is reduced before shifting OP, the result is 6609 `(1 - 2) OP 3'. Clearly, then, the choice of shift or reduce should 6610 depend on the relative precedence of the operators `-' and OP: `*' 6611 should be shifted first, but not `<'. 6612 6613 What about input such as `1 - 2 - 5'; should this be `(1 - 2) - 5' 6614 or should it be `1 - (2 - 5)'? For most operators we prefer the 6615 former, which is called "left association". The latter alternative, 6616 "right association", is desirable for assignment operators. The choice 6617 of left or right association is a matter of whether the parser chooses 6618 to shift or reduce when the stack contains `1 - 2' and the lookahead 6619 token is `-': shifting makes right-associativity. 6620 6621 6622 File: bison.info, Node: Using Precedence, Next: Precedence Examples, Prev: Why Precedence, Up: Precedence 6623 6624 5.3.2 Specifying Operator Precedence 6625 ------------------------------------ 6626 6627 Bison allows you to specify these choices with the operator precedence 6628 declarations `%left' and `%right'. Each such declaration contains a 6629 list of tokens, which are operators whose precedence and associativity 6630 is being declared. The `%left' declaration makes all those operators 6631 left-associative and the `%right' declaration makes them 6632 right-associative. A third alternative is `%nonassoc', which declares 6633 that it is a syntax error to find the same operator twice "in a row". 6634 6635 The relative precedence of different operators is controlled by the 6636 order in which they are declared. The first `%left' or `%right' 6637 declaration in the file declares the operators whose precedence is 6638 lowest, the next such declaration declares the operators whose 6639 precedence is a little higher, and so on. 6640 6641 6642 File: bison.info, Node: Precedence Examples, Next: How Precedence, Prev: Using Precedence, Up: Precedence 6643 6644 5.3.3 Precedence Examples 6645 ------------------------- 6646 6647 In our example, we would want the following declarations: 6648 6649 %left '<' 6650 %left '-' 6651 %left '*' 6652 6653 In a more complete example, which supports other operators as well, 6654 we would declare them in groups of equal precedence. For example, 6655 `'+'' is declared with `'-'': 6656 6657 %left '<' '>' '=' "!=" "<=" ">=" 6658 %left '+' '-' 6659 %left '*' '/' 6660 6661 6662 File: bison.info, Node: How Precedence, Next: Non Operators, Prev: Precedence Examples, Up: Precedence 6663 6664 5.3.4 How Precedence Works 6665 -------------------------- 6666 6667 The first effect of the precedence declarations is to assign precedence 6668 levels to the terminal symbols declared. The second effect is to assign 6669 precedence levels to certain rules: each rule gets its precedence from 6670 the last terminal symbol mentioned in the components. (You can also 6671 specify explicitly the precedence of a rule. *Note Context-Dependent 6672 Precedence: Contextual Precedence.) 6673 6674 Finally, the resolution of conflicts works by comparing the 6675 precedence of the rule being considered with that of the lookahead 6676 token. If the token's precedence is higher, the choice is to shift. 6677 If the rule's precedence is higher, the choice is to reduce. If they 6678 have equal precedence, the choice is made based on the associativity of 6679 that precedence level. The verbose output file made by `-v' (*note 6680 Invoking Bison: Invocation.) says how each conflict was resolved. 6681 6682 Not all rules and not all tokens have precedence. If either the 6683 rule or the lookahead token has no precedence, then the default is to 6684 shift. 6685 6686 6687 File: bison.info, Node: Non Operators, Prev: How Precedence, Up: Precedence 6688 6689 5.3.5 Using Precedence For Non Operators 6690 ---------------------------------------- 6691 6692 Using properly precedence and associativity directives can help fixing 6693 shift/reduce conflicts that do not involve arithmetics-like operators. 6694 For instance, the "dangling `else'" problem (*note Shift/Reduce 6695 Conflicts: Shift/Reduce.) can be solved elegantly in two different ways. 6696 6697 In the present case, the conflict is between the token `"else"' 6698 willing to be shifted, and the rule `if_stmt: "if" expr "then" stmt', 6699 asking for reduction. By default, the precedence of a rule is that of 6700 its last token, here `"then"', so the conflict will be solved 6701 appropriately by giving `"else"' a precedence higher than that of 6702 `"then"', for instance as follows: 6703 6704 %nonassoc "then" 6705 %nonassoc "else" 6706 6707 Alternatively, you may give both tokens the same precedence, in 6708 which case associativity is used to solve the conflict. To preserve 6709 the shift action, use right associativity: 6710 6711 %right "then" "else" 6712 6713 Neither solution is perfect however. Since Bison does not provide, 6714 so far, support for "scoped" precedence, both force you to declare the 6715 precedence of these keywords with respect to the other operators your 6716 grammar. Therefore, instead of being warned about new conflicts you 6717 would be unaware of (e.g., a shift/reduce conflict due to `if test then 6718 1 else 2 + 3' being ambiguous: `if test then 1 else (2 + 3)' or `(if 6719 test then 1 else 2) + 3'?), the conflict will be already "fixed". 6720 6721 6722 File: bison.info, Node: Contextual Precedence, Next: Parser States, Prev: Precedence, Up: Algorithm 6723 6724 5.4 Context-Dependent Precedence 6725 ================================ 6726 6727 Often the precedence of an operator depends on the context. This sounds 6728 outlandish at first, but it is really very common. For example, a minus 6729 sign typically has a very high precedence as a unary operator, and a 6730 somewhat lower precedence (lower than multiplication) as a binary 6731 operator. 6732 6733 The Bison precedence declarations, `%left', `%right' and 6734 `%nonassoc', can only be used once for a given token; so a token has 6735 only one precedence declared in this way. For context-dependent 6736 precedence, you need to use an additional mechanism: the `%prec' 6737 modifier for rules. 6738 6739 The `%prec' modifier declares the precedence of a particular rule by 6740 specifying a terminal symbol whose precedence should be used for that 6741 rule. It's not necessary for that symbol to appear otherwise in the 6742 rule. The modifier's syntax is: 6743 6744 %prec TERMINAL-SYMBOL 6745 6746 and it is written after the components of the rule. Its effect is to 6747 assign the rule the precedence of TERMINAL-SYMBOL, overriding the 6748 precedence that would be deduced for it in the ordinary way. The 6749 altered rule precedence then affects how conflicts involving that rule 6750 are resolved (*note Operator Precedence: Precedence.). 6751 6752 Here is how `%prec' solves the problem of unary minus. First, 6753 declare a precedence for a fictitious terminal symbol named `UMINUS'. 6754 There are no tokens of this type, but the symbol serves to stand for its 6755 precedence: 6756 6757 ... 6758 %left '+' '-' 6759 %left '*' 6760 %left UMINUS 6761 6762 Now the precedence of `UMINUS' can be used in specific rules: 6763 6764 exp: 6765 ... 6766 | exp '-' exp 6767 ... 6768 | '-' exp %prec UMINUS 6769 6770 6771 File: bison.info, Node: Parser States, Next: Reduce/Reduce, Prev: Contextual Precedence, Up: Algorithm 6772 6773 5.5 Parser States 6774 ================= 6775 6776 The function `yyparse' is implemented using a finite-state machine. 6777 The values pushed on the parser stack are not simply token type codes; 6778 they represent the entire sequence of terminal and nonterminal symbols 6779 at or near the top of the stack. The current state collects all the 6780 information about previous input which is relevant to deciding what to 6781 do next. 6782 6783 Each time a lookahead token is read, the current parser state 6784 together with the type of lookahead token are looked up in a table. 6785 This table entry can say, "Shift the lookahead token." In this case, 6786 it also specifies the new parser state, which is pushed onto the top of 6787 the parser stack. Or it can say, "Reduce using rule number N." This 6788 means that a certain number of tokens or groupings are taken off the 6789 top of the stack, and replaced by one grouping. In other words, that 6790 number of states are popped from the stack, and one new state is pushed. 6791 6792 There is one other alternative: the table can say that the lookahead 6793 token is erroneous in the current state. This causes error processing 6794 to begin (*note Error Recovery::). 6795 6796 6797 File: bison.info, Node: Reduce/Reduce, Next: Mysterious Conflicts, Prev: Parser States, Up: Algorithm 6798 6799 5.6 Reduce/Reduce Conflicts 6800 =========================== 6801 6802 A reduce/reduce conflict occurs if there are two or more rules that 6803 apply to the same sequence of input. This usually indicates a serious 6804 error in the grammar. 6805 6806 For example, here is an erroneous attempt to define a sequence of 6807 zero or more `word' groupings. 6808 6809 sequence: 6810 /* empty */ { printf ("empty sequence\n"); } 6811 | maybeword 6812 | sequence word { printf ("added word %s\n", $2); } 6813 ; 6814 6815 maybeword: 6816 /* empty */ { printf ("empty maybeword\n"); } 6817 | word { printf ("single word %s\n", $1); } 6818 ; 6819 6820 The error is an ambiguity: there is more than one way to parse a single 6821 `word' into a `sequence'. It could be reduced to a `maybeword' and 6822 then into a `sequence' via the second rule. Alternatively, 6823 nothing-at-all could be reduced into a `sequence' via the first rule, 6824 and this could be combined with the `word' using the third rule for 6825 `sequence'. 6826 6827 There is also more than one way to reduce nothing-at-all into a 6828 `sequence'. This can be done directly via the first rule, or 6829 indirectly via `maybeword' and then the second rule. 6830 6831 You might think that this is a distinction without a difference, 6832 because it does not change whether any particular input is valid or 6833 not. But it does affect which actions are run. One parsing order runs 6834 the second rule's action; the other runs the first rule's action and 6835 the third rule's action. In this example, the output of the program 6836 changes. 6837 6838 Bison resolves a reduce/reduce conflict by choosing to use the rule 6839 that appears first in the grammar, but it is very risky to rely on 6840 this. Every reduce/reduce conflict must be studied and usually 6841 eliminated. Here is the proper way to define `sequence': 6842 6843 sequence: 6844 /* empty */ { printf ("empty sequence\n"); } 6845 | sequence word { printf ("added word %s\n", $2); } 6846 ; 6847 6848 Here is another common error that yields a reduce/reduce conflict: 6849 6850 sequence: 6851 /* empty */ 6852 | sequence words 6853 | sequence redirects 6854 ; 6855 6856 words: 6857 /* empty */ 6858 | words word 6859 ; 6860 6861 redirects: 6862 /* empty */ 6863 | redirects redirect 6864 ; 6865 6866 The intention here is to define a sequence which can contain either 6867 `word' or `redirect' groupings. The individual definitions of 6868 `sequence', `words' and `redirects' are error-free, but the three 6869 together make a subtle ambiguity: even an empty input can be parsed in 6870 infinitely many ways! 6871 6872 Consider: nothing-at-all could be a `words'. Or it could be two 6873 `words' in a row, or three, or any number. It could equally well be a 6874 `redirects', or two, or any number. Or it could be a `words' followed 6875 by three `redirects' and another `words'. And so on. 6876 6877 Here are two ways to correct these rules. First, to make it a 6878 single level of sequence: 6879 6880 sequence: 6881 /* empty */ 6882 | sequence word 6883 | sequence redirect 6884 ; 6885 6886 Second, to prevent either a `words' or a `redirects' from being 6887 empty: 6888 6889 sequence: 6890 /* empty */ 6891 | sequence words 6892 | sequence redirects 6893 ; 6894 6895 words: 6896 word 6897 | words word 6898 ; 6899 6900 redirects: 6901 redirect 6902 | redirects redirect 6903 ; 6904 6905 Yet this proposal introduces another kind of ambiguity! The input 6906 `word word' can be parsed as a single `words' composed of two `word's, 6907 or as two one-`word' `words' (and likewise for `redirect'/`redirects'). 6908 However this ambiguity is now a shift/reduce conflict, and therefore it 6909 can now be addressed with precedence directives. 6910 6911 To simplify the matter, we will proceed with `word' and `redirect' 6912 being tokens: `"word"' and `"redirect"'. 6913 6914 To prefer the longest `words', the conflict between the token 6915 `"word"' and the rule `sequence: sequence words' must be resolved as a 6916 shift. To this end, we use the same techniques as exposed above, see 6917 *note Using Precedence For Non Operators: Non Operators. One solution 6918 relies on precedences: use `%prec' to give a lower precedence to the 6919 rule: 6920 6921 %nonassoc "word" 6922 %nonassoc "sequence" 6923 %% 6924 sequence: 6925 /* empty */ 6926 | sequence word %prec "sequence" 6927 | sequence redirect %prec "sequence" 6928 ; 6929 6930 words: 6931 word 6932 | words "word" 6933 ; 6934 6935 Another solution relies on associativity: provide both the token and 6936 the rule with the same precedence, but make them right-associative: 6937 6938 %right "word" "redirect" 6939 %% 6940 sequence: 6941 /* empty */ 6942 | sequence word %prec "word" 6943 | sequence redirect %prec "redirect" 6944 ; 6945 6946 6947 File: bison.info, Node: Mysterious Conflicts, Next: Tuning LR, Prev: Reduce/Reduce, Up: Algorithm 6948 6949 5.7 Mysterious Conflicts 6950 ======================== 6951 6952 Sometimes reduce/reduce conflicts can occur that don't look warranted. 6953 Here is an example: 6954 6955 %% 6956 def: param_spec return_spec ','; 6957 param_spec: 6958 type 6959 | name_list ':' type 6960 ; 6961 return_spec: 6962 type 6963 | name ':' type 6964 ; 6965 type: "id"; 6966 name: "id"; 6967 name_list: 6968 name 6969 | name ',' name_list 6970 ; 6971 6972 It would seem that this grammar can be parsed with only a single 6973 token of lookahead: when a `param_spec' is being read, an `"id"' is a 6974 `name' if a comma or colon follows, or a `type' if another `"id"' 6975 follows. In other words, this grammar is LR(1). 6976 6977 However, for historical reasons, Bison cannot by default handle all 6978 LR(1) grammars. In this grammar, two contexts, that after an `"id"' at 6979 the beginning of a `param_spec' and likewise at the beginning of a 6980 `return_spec', are similar enough that Bison assumes they are the same. 6981 They appear similar because the same set of rules would be active--the 6982 rule for reducing to a `name' and that for reducing to a `type'. Bison 6983 is unable to determine at that stage of processing that the rules would 6984 require different lookahead tokens in the two contexts, so it makes a 6985 single parser state for them both. Combining the two contexts causes a 6986 conflict later. In parser terminology, this occurrence means that the 6987 grammar is not LALR(1). 6988 6989 For many practical grammars (specifically those that fall into the 6990 non-LR(1) class), the limitations of LALR(1) result in difficulties 6991 beyond just mysterious reduce/reduce conflicts. The best way to fix 6992 all these problems is to select a different parser table construction 6993 algorithm. Either IELR(1) or canonical LR(1) would suffice, but the 6994 former is more efficient and easier to debug during development. *Note 6995 LR Table Construction::, for details. (Bison's IELR(1) and canonical 6996 LR(1) implementations are experimental. More user feedback will help 6997 to stabilize them.) 6998 6999 If you instead wish to work around LALR(1)'s limitations, you can 7000 often fix a mysterious conflict by identifying the two parser states 7001 that are being confused, and adding something to make them look 7002 distinct. In the above example, adding one rule to `return_spec' as 7003 follows makes the problem go away: 7004 7005 ... 7006 return_spec: 7007 type 7008 | name ':' type 7009 | "id" "bogus" /* This rule is never used. */ 7010 ; 7011 7012 This corrects the problem because it introduces the possibility of an 7013 additional active rule in the context after the `"id"' at the beginning 7014 of `return_spec'. This rule is not active in the corresponding context 7015 in a `param_spec', so the two contexts receive distinct parser states. 7016 As long as the token `"bogus"' is never generated by `yylex', the added 7017 rule cannot alter the way actual input is parsed. 7018 7019 In this particular example, there is another way to solve the 7020 problem: rewrite the rule for `return_spec' to use `"id"' directly 7021 instead of via `name'. This also causes the two confusing contexts to 7022 have different sets of active rules, because the one for `return_spec' 7023 activates the altered rule for `return_spec' rather than the one for 7024 `name'. 7025 7026 param_spec: 7027 type 7028 | name_list ':' type 7029 ; 7030 return_spec: 7031 type 7032 | "id" ':' type 7033 ; 7034 7035 For a more detailed exposition of LALR(1) parsers and parser 7036 generators, *note DeRemer 1982: Bibliography. 7037 7038 7039 File: bison.info, Node: Tuning LR, Next: Generalized LR Parsing, Prev: Mysterious Conflicts, Up: Algorithm 7040 7041 5.8 Tuning LR 7042 ============= 7043 7044 The default behavior of Bison's LR-based parsers is chosen mostly for 7045 historical reasons, but that behavior is often not robust. For 7046 example, in the previous section, we discussed the mysterious conflicts 7047 that can be produced by LALR(1), Bison's default parser table 7048 construction algorithm. Another example is Bison's `%error-verbose' 7049 directive, which instructs the generated parser to produce verbose 7050 syntax error messages, which can sometimes contain incorrect 7051 information. 7052 7053 In this section, we explore several modern features of Bison that 7054 allow you to tune fundamental aspects of the generated LR-based 7055 parsers. Some of these features easily eliminate shortcomings like 7056 those mentioned above. Others can be helpful purely for understanding 7057 your parser. 7058 7059 Most of the features discussed in this section are still 7060 experimental. More user feedback will help to stabilize them. 7061 7062 * Menu: 7063 7064 * LR Table Construction:: Choose a different construction algorithm. 7065 * Default Reductions:: Disable default reductions. 7066 * LAC:: Correct lookahead sets in the parser states. 7067 * Unreachable States:: Keep unreachable parser states for debugging. 7068 7069 7070 File: bison.info, Node: LR Table Construction, Next: Default Reductions, Up: Tuning LR 7071 7072 5.8.1 LR Table Construction 7073 --------------------------- 7074 7075 For historical reasons, Bison constructs LALR(1) parser tables by 7076 default. However, LALR does not possess the full language-recognition 7077 power of LR. As a result, the behavior of parsers employing LALR 7078 parser tables is often mysterious. We presented a simple example of 7079 this effect in *note Mysterious Conflicts::. 7080 7081 As we also demonstrated in that example, the traditional approach to 7082 eliminating such mysterious behavior is to restructure the grammar. 7083 Unfortunately, doing so correctly is often difficult. Moreover, merely 7084 discovering that LALR causes mysterious behavior in your parser can be 7085 difficult as well. 7086 7087 Fortunately, Bison provides an easy way to eliminate the possibility 7088 of such mysterious behavior altogether. You simply need to activate a 7089 more powerful parser table construction algorithm by using the `%define 7090 lr.type' directive. 7091 7092 -- Directive: %define lr.type TYPE 7093 Specify the type of parser tables within the LR(1) family. The 7094 accepted values for TYPE are: 7095 7096 * `lalr' (default) 7097 7098 * `ielr' 7099 7100 * `canonical-lr' 7101 7102 (This feature is experimental. More user feedback will help to 7103 stabilize it.) 7104 7105 For example, to activate IELR, you might add the following directive 7106 to you grammar file: 7107 7108 %define lr.type ielr 7109 7110 For the example in *note Mysterious Conflicts::, the mysterious 7111 conflict is then eliminated, so there is no need to invest time in 7112 comprehending the conflict or restructuring the grammar to fix it. If, 7113 during future development, the grammar evolves such that all mysterious 7114 behavior would have disappeared using just LALR, you need not fear that 7115 continuing to use IELR will result in unnecessarily large parser tables. 7116 That is, IELR generates LALR tables when LALR (using a deterministic 7117 parsing algorithm) is sufficient to support the full 7118 language-recognition power of LR. Thus, by enabling IELR at the start 7119 of grammar development, you can safely and completely eliminate the 7120 need to consider LALR's shortcomings. 7121 7122 While IELR is almost always preferable, there are circumstances 7123 where LALR or the canonical LR parser tables described by Knuth (*note 7124 Knuth 1965: Bibliography.) can be useful. Here we summarize the 7125 relative advantages of each parser table construction algorithm within 7126 Bison: 7127 7128 * LALR 7129 7130 There are at least two scenarios where LALR can be worthwhile: 7131 7132 * GLR without static conflict resolution. 7133 7134 When employing GLR parsers (*note GLR Parsers::), if you do 7135 not resolve any conflicts statically (for example, with 7136 `%left' or `%prec'), then the parser explores all potential 7137 parses of any given input. In this case, the choice of 7138 parser table construction algorithm is guaranteed not to alter 7139 the language accepted by the parser. LALR parser tables are 7140 the smallest parser tables Bison can currently construct, so 7141 they may then be preferable. Nevertheless, once you begin to 7142 resolve conflicts statically, GLR behaves more like a 7143 deterministic parser in the syntactic contexts where those 7144 conflicts appear, and so either IELR or canonical LR can then 7145 be helpful to avoid LALR's mysterious behavior. 7146 7147 * Malformed grammars. 7148 7149 Occasionally during development, an especially malformed 7150 grammar with a major recurring flaw may severely impede the 7151 IELR or canonical LR parser table construction algorithm. 7152 LALR can be a quick way to construct parser tables in order 7153 to investigate such problems while ignoring the more subtle 7154 differences from IELR and canonical LR. 7155 7156 * IELR 7157 7158 IELR (Inadequacy Elimination LR) is a minimal LR algorithm. That 7159 is, given any grammar (LR or non-LR), parsers using IELR or 7160 canonical LR parser tables always accept exactly the same set of 7161 sentences. However, like LALR, IELR merges parser states during 7162 parser table construction so that the number of parser states is 7163 often an order of magnitude less than for canonical LR. More 7164 importantly, because canonical LR's extra parser states may contain 7165 duplicate conflicts in the case of non-LR grammars, the number of 7166 conflicts for IELR is often an order of magnitude less as well. 7167 This effect can significantly reduce the complexity of developing 7168 a grammar. 7169 7170 * Canonical LR 7171 7172 While inefficient, canonical LR parser tables can be an 7173 interesting means to explore a grammar because they possess a 7174 property that IELR and LALR tables do not. That is, if 7175 `%nonassoc' is not used and default reductions are left disabled 7176 (*note Default Reductions::), then, for every left context of 7177 every canonical LR state, the set of tokens accepted by that state 7178 is guaranteed to be the exact set of tokens that is syntactically 7179 acceptable in that left context. It might then seem that an 7180 advantage of canonical LR parsers in production is that, under the 7181 above constraints, they are guaranteed to detect a syntax error as 7182 soon as possible without performing any unnecessary reductions. 7183 However, IELR parsers that use LAC are also able to achieve this 7184 behavior without sacrificing `%nonassoc' or default reductions. 7185 For details and a few caveats of LAC, *note LAC::. 7186 7187 For a more detailed exposition of the mysterious behavior in LALR 7188 parsers and the benefits of IELR, *note Denny 2008 March: Bibliography, 7189 and *note Denny 2010 November: Bibliography. 7190 7191 7192 File: bison.info, Node: Default Reductions, Next: LAC, Prev: LR Table Construction, Up: Tuning LR 7193 7194 5.8.2 Default Reductions 7195 ------------------------ 7196 7197 After parser table construction, Bison identifies the reduction with the 7198 largest lookahead set in each parser state. To reduce the size of the 7199 parser state, traditional Bison behavior is to remove that lookahead 7200 set and to assign that reduction to be the default parser action. Such 7201 a reduction is known as a "default reduction". 7202 7203 Default reductions affect more than the size of the parser tables. 7204 They also affect the behavior of the parser: 7205 7206 * Delayed `yylex' invocations. 7207 7208 A "consistent state" is a state that has only one possible parser 7209 action. If that action is a reduction and is encoded as a default 7210 reduction, then that consistent state is called a "defaulted 7211 state". Upon reaching a defaulted state, a Bison-generated parser 7212 does not bother to invoke `yylex' to fetch the next token before 7213 performing the reduction. In other words, whether default 7214 reductions are enabled in consistent states determines how soon a 7215 Bison-generated parser invokes `yylex' for a token: immediately 7216 when it _reaches_ that token in the input or when it eventually 7217 _needs_ that token as a lookahead to determine the next parser 7218 action. Traditionally, default reductions are enabled, and so the 7219 parser exhibits the latter behavior. 7220 7221 The presence of defaulted states is an important consideration when 7222 designing `yylex' and the grammar file. That is, if the behavior 7223 of `yylex' can influence or be influenced by the semantic actions 7224 associated with the reductions in defaulted states, then the delay 7225 of the next `yylex' invocation until after those reductions is 7226 significant. For example, the semantic actions might pop a scope 7227 stack that `yylex' uses to determine what token to return. Thus, 7228 the delay might be necessary to ensure that `yylex' does not look 7229 up the next token in a scope that should already be considered 7230 closed. 7231 7232 * Delayed syntax error detection. 7233 7234 When the parser fetches a new token by invoking `yylex', it checks 7235 whether there is an action for that token in the current parser 7236 state. The parser detects a syntax error if and only if either 7237 (1) there is no action for that token or (2) the action for that 7238 token is the error action (due to the use of `%nonassoc'). 7239 However, if there is a default reduction in that state (which 7240 might or might not be a defaulted state), then it is impossible 7241 for condition 1 to exist. That is, all tokens have an action. 7242 Thus, the parser sometimes fails to detect the syntax error until 7243 it reaches a later state. 7244 7245 While default reductions never cause the parser to accept 7246 syntactically incorrect sentences, the delay of syntax error 7247 detection can have unexpected effects on the behavior of the 7248 parser. However, the delay can be caused anyway by parser state 7249 merging and the use of `%nonassoc', and it can be fixed by another 7250 Bison feature, LAC. We discuss the effects of delayed syntax 7251 error detection and LAC more in the next section (*note LAC::). 7252 7253 For canonical LR, the only default reduction that Bison enables by 7254 default is the accept action, which appears only in the accepting 7255 state, which has no other action and is thus a defaulted state. 7256 However, the default accept action does not delay any `yylex' 7257 invocation or syntax error detection because the accept action ends the 7258 parse. 7259 7260 For LALR and IELR, Bison enables default reductions in nearly all 7261 states by default. There are only two exceptions. First, states that 7262 have a shift action on the `error' token do not have default reductions 7263 because delayed syntax error detection could then prevent the `error' 7264 token from ever being shifted in that state. However, parser state 7265 merging can cause the same effect anyway, and LAC fixes it in both 7266 cases, so future versions of Bison might drop this exception when LAC 7267 is activated. Second, GLR parsers do not record the default reduction 7268 as the action on a lookahead token for which there is a conflict. The 7269 correct action in this case is to split the parse instead. 7270 7271 To adjust which states have default reductions enabled, use the 7272 `%define lr.default-reductions' directive. 7273 7274 -- Directive: %define lr.default-reductions WHERE 7275 Specify the kind of states that are permitted to contain default 7276 reductions. The accepted values of WHERE are: 7277 * `most' (default for LALR and IELR) 7278 7279 * `consistent' 7280 7281 * `accepting' (default for canonical LR) 7282 7283 (The ability to specify where default reductions are permitted is 7284 experimental. More user feedback will help to stabilize it.) 7285 7286 7287 File: bison.info, Node: LAC, Next: Unreachable States, Prev: Default Reductions, Up: Tuning LR 7288 7289 5.8.3 LAC 7290 --------- 7291 7292 Canonical LR, IELR, and LALR can suffer from a couple of problems upon 7293 encountering a syntax error. First, the parser might perform additional 7294 parser stack reductions before discovering the syntax error. Such 7295 reductions can perform user semantic actions that are unexpected because 7296 they are based on an invalid token, and they cause error recovery to 7297 begin in a different syntactic context than the one in which the 7298 invalid token was encountered. Second, when verbose error messages are 7299 enabled (*note Error Reporting::), the expected token list in the 7300 syntax error message can both contain invalid tokens and omit valid 7301 tokens. 7302 7303 The culprits for the above problems are `%nonassoc', default 7304 reductions in inconsistent states (*note Default Reductions::), and 7305 parser state merging. Because IELR and LALR merge parser states, they 7306 suffer the most. Canonical LR can suffer only if `%nonassoc' is used 7307 or if default reductions are enabled for inconsistent states. 7308 7309 LAC (Lookahead Correction) is a new mechanism within the parsing 7310 algorithm that solves these problems for canonical LR, IELR, and LALR 7311 without sacrificing `%nonassoc', default reductions, or state merging. 7312 You can enable LAC with the `%define parse.lac' directive. 7313 7314 -- Directive: %define parse.lac VALUE 7315 Enable LAC to improve syntax error handling. 7316 * `none' (default) 7317 7318 * `full' 7319 (This feature is experimental. More user feedback will help to 7320 stabilize it. Moreover, it is currently only available for 7321 deterministic parsers in C.) 7322 7323 Conceptually, the LAC mechanism is straight-forward. Whenever the 7324 parser fetches a new token from the scanner so that it can determine 7325 the next parser action, it immediately suspends normal parsing and 7326 performs an exploratory parse using a temporary copy of the normal 7327 parser state stack. During this exploratory parse, the parser does not 7328 perform user semantic actions. If the exploratory parse reaches a 7329 shift action, normal parsing then resumes on the normal parser stacks. 7330 If the exploratory parse reaches an error instead, the parser reports a 7331 syntax error. If verbose syntax error messages are enabled, the parser 7332 must then discover the list of expected tokens, so it performs a 7333 separate exploratory parse for each token in the grammar. 7334 7335 There is one subtlety about the use of LAC. That is, when in a 7336 consistent parser state with a default reduction, the parser will not 7337 attempt to fetch a token from the scanner because no lookahead is 7338 needed to determine the next parser action. Thus, whether default 7339 reductions are enabled in consistent states (*note Default 7340 Reductions::) affects how soon the parser detects a syntax error: 7341 immediately when it _reaches_ an erroneous token or when it eventually 7342 _needs_ that token as a lookahead to determine the next parser action. 7343 The latter behavior is probably more intuitive, so Bison currently 7344 provides no way to achieve the former behavior while default reductions 7345 are enabled in consistent states. 7346 7347 Thus, when LAC is in use, for some fixed decision of whether to 7348 enable default reductions in consistent states, canonical LR and IELR 7349 behave almost exactly the same for both syntactically acceptable and 7350 syntactically unacceptable input. While LALR still does not support 7351 the full language-recognition power of canonical LR and IELR, LAC at 7352 least enables LALR's syntax error handling to correctly reflect LALR's 7353 language-recognition power. 7354 7355 There are a few caveats to consider when using LAC: 7356 7357 * Infinite parsing loops. 7358 7359 IELR plus LAC does have one shortcoming relative to canonical LR. 7360 Some parsers generated by Bison can loop infinitely. LAC does not 7361 fix infinite parsing loops that occur between encountering a 7362 syntax error and detecting it, but enabling canonical LR or 7363 disabling default reductions sometimes does. 7364 7365 * Verbose error message limitations. 7366 7367 Because of internationalization considerations, Bison-generated 7368 parsers limit the size of the expected token list they are willing 7369 to report in a verbose syntax error message. If the number of 7370 expected tokens exceeds that limit, the list is simply dropped 7371 from the message. Enabling LAC can increase the size of the list 7372 and thus cause the parser to drop it. Of course, dropping the 7373 list is better than reporting an incorrect list. 7374 7375 * Performance. 7376 7377 Because LAC requires many parse actions to be performed twice, it 7378 can have a performance penalty. However, not all parse actions 7379 must be performed twice. Specifically, during a series of default 7380 reductions in consistent states and shift actions, the parser 7381 never has to initiate an exploratory parse. Moreover, the most 7382 time-consuming tasks in a parse are often the file I/O, the 7383 lexical analysis performed by the scanner, and the user's semantic 7384 actions, but none of these are performed during the exploratory 7385 parse. Finally, the base of the temporary stack used during an 7386 exploratory parse is a pointer into the normal parser state stack 7387 so that the stack is never physically copied. In our experience, 7388 the performance penalty of LAC has proved insignificant for 7389 practical grammars. 7390 7391 While the LAC algorithm shares techniques that have been recognized 7392 in the parser community for years, for the publication that introduces 7393 LAC, *note Denny 2010 May: Bibliography. 7394 7395 7396 File: bison.info, Node: Unreachable States, Prev: LAC, Up: Tuning LR 7397 7398 5.8.4 Unreachable States 7399 ------------------------ 7400 7401 If there exists no sequence of transitions from the parser's start 7402 state to some state S, then Bison considers S to be an "unreachable 7403 state". A state can become unreachable during conflict resolution if 7404 Bison disables a shift action leading to it from a predecessor state. 7405 7406 By default, Bison removes unreachable states from the parser after 7407 conflict resolution because they are useless in the generated parser. 7408 However, keeping unreachable states is sometimes useful when trying to 7409 understand the relationship between the parser and the grammar. 7410 7411 -- Directive: %define lr.keep-unreachable-states VALUE 7412 Request that Bison allow unreachable states to remain in the 7413 parser tables. VALUE must be a Boolean. The default is `false'. 7414 7415 There are a few caveats to consider: 7416 7417 * Missing or extraneous warnings. 7418 7419 Unreachable states may contain conflicts and may use rules not 7420 used in any other state. Thus, keeping unreachable states may 7421 induce warnings that are irrelevant to your parser's behavior, and 7422 it may eliminate warnings that are relevant. Of course, the 7423 change in warnings may actually be relevant to a parser table 7424 analysis that wants to keep unreachable states, so this behavior 7425 will likely remain in future Bison releases. 7426 7427 * Other useless states. 7428 7429 While Bison is able to remove unreachable states, it is not 7430 guaranteed to remove other kinds of useless states. Specifically, 7431 when Bison disables reduce actions during conflict resolution, 7432 some goto actions may become useless, and thus some additional 7433 states may become useless. If Bison were to compute which goto 7434 actions were useless and then disable those actions, it could 7435 identify such states as unreachable and then remove those states. 7436 However, Bison does not compute which goto actions are useless. 7437 7438 7439 File: bison.info, Node: Generalized LR Parsing, Next: Memory Management, Prev: Tuning LR, Up: Algorithm 7440 7441 5.9 Generalized LR (GLR) Parsing 7442 ================================ 7443 7444 Bison produces _deterministic_ parsers that choose uniquely when to 7445 reduce and which reduction to apply based on a summary of the preceding 7446 input and on one extra token of lookahead. As a result, normal Bison 7447 handles a proper subset of the family of context-free languages. 7448 Ambiguous grammars, since they have strings with more than one possible 7449 sequence of reductions cannot have deterministic parsers in this sense. 7450 The same is true of languages that require more than one symbol of 7451 lookahead, since the parser lacks the information necessary to make a 7452 decision at the point it must be made in a shift-reduce parser. 7453 Finally, as previously mentioned (*note Mysterious Conflicts::), there 7454 are languages where Bison's default choice of how to summarize the 7455 input seen so far loses necessary information. 7456 7457 When you use the `%glr-parser' declaration in your grammar file, 7458 Bison generates a parser that uses a different algorithm, called 7459 Generalized LR (or GLR). A Bison GLR parser uses the same basic 7460 algorithm for parsing as an ordinary Bison parser, but behaves 7461 differently in cases where there is a shift-reduce conflict that has not 7462 been resolved by precedence rules (*note Precedence::) or a 7463 reduce-reduce conflict. When a GLR parser encounters such a situation, 7464 it effectively _splits_ into a several parsers, one for each possible 7465 shift or reduction. These parsers then proceed as usual, consuming 7466 tokens in lock-step. Some of the stacks may encounter other conflicts 7467 and split further, with the result that instead of a sequence of states, 7468 a Bison GLR parsing stack is what is in effect a tree of states. 7469 7470 In effect, each stack represents a guess as to what the proper parse 7471 is. Additional input may indicate that a guess was wrong, in which case 7472 the appropriate stack silently disappears. Otherwise, the semantics 7473 actions generated in each stack are saved, rather than being executed 7474 immediately. When a stack disappears, its saved semantic actions never 7475 get executed. When a reduction causes two stacks to become equivalent, 7476 their sets of semantic actions are both saved with the state that 7477 results from the reduction. We say that two stacks are equivalent when 7478 they both represent the same sequence of states, and each pair of 7479 corresponding states represents a grammar symbol that produces the same 7480 segment of the input token stream. 7481 7482 Whenever the parser makes a transition from having multiple states 7483 to having one, it reverts to the normal deterministic parsing 7484 algorithm, after resolving and executing the saved-up actions. At this 7485 transition, some of the states on the stack will have semantic values 7486 that are sets (actually multisets) of possible actions. The parser 7487 tries to pick one of the actions by first finding one whose rule has 7488 the highest dynamic precedence, as set by the `%dprec' declaration. 7489 Otherwise, if the alternative actions are not ordered by precedence, 7490 but there the same merging function is declared for both rules by the 7491 `%merge' declaration, Bison resolves and evaluates both and then calls 7492 the merge function on the result. Otherwise, it reports an ambiguity. 7493 7494 It is possible to use a data structure for the GLR parsing tree that 7495 permits the processing of any LR(1) grammar in linear time (in the size 7496 of the input), any unambiguous (not necessarily LR(1)) grammar in 7497 quadratic worst-case time, and any general (possibly ambiguous) 7498 context-free grammar in cubic worst-case time. However, Bison currently 7499 uses a simpler data structure that requires time proportional to the 7500 length of the input times the maximum number of stacks required for any 7501 prefix of the input. Thus, really ambiguous or nondeterministic 7502 grammars can require exponential time and space to process. Such badly 7503 behaving examples, however, are not generally of practical interest. 7504 Usually, nondeterminism in a grammar is local--the parser is "in doubt" 7505 only for a few tokens at a time. Therefore, the current data structure 7506 should generally be adequate. On LR(1) portions of a grammar, in 7507 particular, it is only slightly slower than with the deterministic 7508 LR(1) Bison parser. 7509 7510 For a more detailed exposition of GLR parsers, *note Scott 2000: 7511 Bibliography. 7512 7513 7514 File: bison.info, Node: Memory Management, Prev: Generalized LR Parsing, Up: Algorithm 7515 7516 5.10 Memory Management, and How to Avoid Memory Exhaustion 7517 ========================================================== 7518 7519 The Bison parser stack can run out of memory if too many tokens are 7520 shifted and not reduced. When this happens, the parser function 7521 `yyparse' calls `yyerror' and then returns 2. 7522 7523 Because Bison parsers have growing stacks, hitting the upper limit 7524 usually results from using a right recursion instead of a left 7525 recursion, see *note Recursive Rules: Recursion. 7526 7527 By defining the macro `YYMAXDEPTH', you can control how deep the 7528 parser stack can become before memory is exhausted. Define the macro 7529 with a value that is an integer. This value is the maximum number of 7530 tokens that can be shifted (and not reduced) before overflow. 7531 7532 The stack space allowed is not necessarily allocated. If you 7533 specify a large value for `YYMAXDEPTH', the parser normally allocates a 7534 small stack at first, and then makes it bigger by stages as needed. 7535 This increasing allocation happens automatically and silently. 7536 Therefore, you do not need to make `YYMAXDEPTH' painfully small merely 7537 to save space for ordinary inputs that do not need much stack. 7538 7539 However, do not allow `YYMAXDEPTH' to be a value so large that 7540 arithmetic overflow could occur when calculating the size of the stack 7541 space. Also, do not allow `YYMAXDEPTH' to be less than `YYINITDEPTH'. 7542 7543 The default value of `YYMAXDEPTH', if you do not define it, is 10000. 7544 7545 You can control how much stack is allocated initially by defining the 7546 macro `YYINITDEPTH' to a positive integer. For the deterministic 7547 parser in C, this value must be a compile-time constant unless you are 7548 assuming C99 or some other target language or compiler that allows 7549 variable-length arrays. The default is 200. 7550 7551 Do not allow `YYINITDEPTH' to be greater than `YYMAXDEPTH'. 7552 7553 Because of semantic differences between C and C++, the deterministic 7554 parsers in C produced by Bison cannot grow when compiled by C++ 7555 compilers. In this precise case (compiling a C parser as C++) you are 7556 suggested to grow `YYINITDEPTH'. The Bison maintainers hope to fix 7557 this deficiency in a future release. 7558 7559 7560 File: bison.info, Node: Error Recovery, Next: Context Dependency, Prev: Algorithm, Up: Top 7561 7562 6 Error Recovery 7563 **************** 7564 7565 It is not usually acceptable to have a program terminate on a syntax 7566 error. For example, a compiler should recover sufficiently to parse the 7567 rest of the input file and check it for errors; a calculator should 7568 accept another expression. 7569 7570 In a simple interactive command parser where each input is one line, 7571 it may be sufficient to allow `yyparse' to return 1 on error and have 7572 the caller ignore the rest of the input line when that happens (and 7573 then call `yyparse' again). But this is inadequate for a compiler, 7574 because it forgets all the syntactic context leading up to the error. 7575 A syntax error deep within a function in the compiler input should not 7576 cause the compiler to treat the following line like the beginning of a 7577 source file. 7578 7579 You can define how to recover from a syntax error by writing rules to 7580 recognize the special token `error'. This is a terminal symbol that is 7581 always defined (you need not declare it) and reserved for error 7582 handling. The Bison parser generates an `error' token whenever a 7583 syntax error happens; if you have provided a rule to recognize this 7584 token in the current context, the parse can continue. 7585 7586 For example: 7587 7588 stmts: 7589 /* empty string */ 7590 | stmts '\n' 7591 | stmts exp '\n' 7592 | stmts error '\n' 7593 7594 The fourth rule in this example says that an error followed by a 7595 newline makes a valid addition to any `stmts'. 7596 7597 What happens if a syntax error occurs in the middle of an `exp'? The 7598 error recovery rule, interpreted strictly, applies to the precise 7599 sequence of a `stmts', an `error' and a newline. If an error occurs in 7600 the middle of an `exp', there will probably be some additional tokens 7601 and subexpressions on the stack after the last `stmts', and there will 7602 be tokens to read before the next newline. So the rule is not 7603 applicable in the ordinary way. 7604 7605 But Bison can force the situation to fit the rule, by discarding 7606 part of the semantic context and part of the input. First it discards 7607 states and objects from the stack until it gets back to a state in 7608 which the `error' token is acceptable. (This means that the 7609 subexpressions already parsed are discarded, back to the last complete 7610 `stmts'.) At this point the `error' token can be shifted. Then, if 7611 the old lookahead token is not acceptable to be shifted next, the 7612 parser reads tokens and discards them until it finds a token which is 7613 acceptable. In this example, Bison reads and discards input until the 7614 next newline so that the fourth rule can apply. Note that discarded 7615 symbols are possible sources of memory leaks, see *note Freeing 7616 Discarded Symbols: Destructor Decl, for a means to reclaim this memory. 7617 7618 The choice of error rules in the grammar is a choice of strategies 7619 for error recovery. A simple and useful strategy is simply to skip the 7620 rest of the current input line or current statement if an error is 7621 detected: 7622 7623 stmt: error ';' /* On error, skip until ';' is read. */ 7624 7625 It is also useful to recover to the matching close-delimiter of an 7626 opening-delimiter that has already been parsed. Otherwise the 7627 close-delimiter will probably appear to be unmatched, and generate 7628 another, spurious error message: 7629 7630 primary: 7631 '(' expr ')' 7632 | '(' error ')' 7633 ... 7634 ; 7635 7636 Error recovery strategies are necessarily guesses. When they guess 7637 wrong, one syntax error often leads to another. In the above example, 7638 the error recovery rule guesses that an error is due to bad input 7639 within one `stmt'. Suppose that instead a spurious semicolon is 7640 inserted in the middle of a valid `stmt'. After the error recovery 7641 rule recovers from the first error, another syntax error will be found 7642 straightaway, since the text following the spurious semicolon is also 7643 an invalid `stmt'. 7644 7645 To prevent an outpouring of error messages, the parser will output 7646 no error message for another syntax error that happens shortly after 7647 the first; only after three consecutive input tokens have been 7648 successfully shifted will error messages resume. 7649 7650 Note that rules which accept the `error' token may have actions, just 7651 as any other rules can. 7652 7653 You can make error messages resume immediately by using the macro 7654 `yyerrok' in an action. If you do this in the error rule's action, no 7655 error messages will be suppressed. This macro requires no arguments; 7656 `yyerrok;' is a valid C statement. 7657 7658 The previous lookahead token is reanalyzed immediately after an 7659 error. If this is unacceptable, then the macro `yyclearin' may be used 7660 to clear this token. Write the statement `yyclearin;' in the error 7661 rule's action. *Note Special Features for Use in Actions: Action 7662 Features. 7663 7664 For example, suppose that on a syntax error, an error handling 7665 routine is called that advances the input stream to some point where 7666 parsing should once again commence. The next symbol returned by the 7667 lexical scanner is probably correct. The previous lookahead token 7668 ought to be discarded with `yyclearin;'. 7669 7670 The expression `YYRECOVERING ()' yields 1 when the parser is 7671 recovering from a syntax error, and 0 otherwise. Syntax error 7672 diagnostics are suppressed while recovering from a syntax error. 7673 7674 7675 File: bison.info, Node: Context Dependency, Next: Debugging, Prev: Error Recovery, Up: Top 7676 7677 7 Handling Context Dependencies 7678 ******************************* 7679 7680 The Bison paradigm is to parse tokens first, then group them into larger 7681 syntactic units. In many languages, the meaning of a token is affected 7682 by its context. Although this violates the Bison paradigm, certain 7683 techniques (known as "kludges") may enable you to write Bison parsers 7684 for such languages. 7685 7686 * Menu: 7687 7688 * Semantic Tokens:: Token parsing can depend on the semantic context. 7689 * Lexical Tie-ins:: Token parsing can depend on the syntactic context. 7690 * Tie-in Recovery:: Lexical tie-ins have implications for how 7691 error recovery rules must be written. 7692 7693 (Actually, "kludge" means any technique that gets its job done but is 7694 neither clean nor robust.) 7695 7696 7697 File: bison.info, Node: Semantic Tokens, Next: Lexical Tie-ins, Up: Context Dependency 7698 7699 7.1 Semantic Info in Token Types 7700 ================================ 7701 7702 The C language has a context dependency: the way an identifier is used 7703 depends on what its current meaning is. For example, consider this: 7704 7705 foo (x); 7706 7707 This looks like a function call statement, but if `foo' is a typedef 7708 name, then this is actually a declaration of `x'. How can a Bison 7709 parser for C decide how to parse this input? 7710 7711 The method used in GNU C is to have two different token types, 7712 `IDENTIFIER' and `TYPENAME'. When `yylex' finds an identifier, it 7713 looks up the current declaration of the identifier in order to decide 7714 which token type to return: `TYPENAME' if the identifier is declared as 7715 a typedef, `IDENTIFIER' otherwise. 7716 7717 The grammar rules can then express the context dependency by the 7718 choice of token type to recognize. `IDENTIFIER' is accepted as an 7719 expression, but `TYPENAME' is not. `TYPENAME' can start a declaration, 7720 but `IDENTIFIER' cannot. In contexts where the meaning of the 7721 identifier is _not_ significant, such as in declarations that can 7722 shadow a typedef name, either `TYPENAME' or `IDENTIFIER' is 7723 accepted--there is one rule for each of the two token types. 7724 7725 This technique is simple to use if the decision of which kinds of 7726 identifiers to allow is made at a place close to where the identifier is 7727 parsed. But in C this is not always so: C allows a declaration to 7728 redeclare a typedef name provided an explicit type has been specified 7729 earlier: 7730 7731 typedef int foo, bar; 7732 int baz (void) 7733 { 7734 static bar (bar); /* redeclare `bar' as static variable */ 7735 extern foo foo (foo); /* redeclare `foo' as function */ 7736 return foo (bar); 7737 } 7738 7739 Unfortunately, the name being declared is separated from the 7740 declaration construct itself by a complicated syntactic structure--the 7741 "declarator". 7742 7743 As a result, part of the Bison parser for C needs to be duplicated, 7744 with all the nonterminal names changed: once for parsing a declaration 7745 in which a typedef name can be redefined, and once for parsing a 7746 declaration in which that can't be done. Here is a part of the 7747 duplication, with actions omitted for brevity: 7748 7749 initdcl: 7750 declarator maybeasm '=' init 7751 | declarator maybeasm 7752 ; 7753 7754 notype_initdcl: 7755 notype_declarator maybeasm '=' init 7756 | notype_declarator maybeasm 7757 ; 7758 7759 Here `initdcl' can redeclare a typedef name, but `notype_initdcl' 7760 cannot. The distinction between `declarator' and `notype_declarator' 7761 is the same sort of thing. 7762 7763 There is some similarity between this technique and a lexical tie-in 7764 (described next), in that information which alters the lexical analysis 7765 is changed during parsing by other parts of the program. The 7766 difference is here the information is global, and is used for other 7767 purposes in the program. A true lexical tie-in has a special-purpose 7768 flag controlled by the syntactic context. 7769 7770 7771 File: bison.info, Node: Lexical Tie-ins, Next: Tie-in Recovery, Prev: Semantic Tokens, Up: Context Dependency 7772 7773 7.2 Lexical Tie-ins 7774 =================== 7775 7776 One way to handle context-dependency is the "lexical tie-in": a flag 7777 which is set by Bison actions, whose purpose is to alter the way tokens 7778 are parsed. 7779 7780 For example, suppose we have a language vaguely like C, but with a 7781 special construct `hex (HEX-EXPR)'. After the keyword `hex' comes an 7782 expression in parentheses in which all integers are hexadecimal. In 7783 particular, the token `a1b' must be treated as an integer rather than 7784 as an identifier if it appears in that context. Here is how you can do 7785 it: 7786 7787 %{ 7788 int hexflag; 7789 int yylex (void); 7790 void yyerror (char const *); 7791 %} 7792 %% 7793 ... 7794 expr: 7795 IDENTIFIER 7796 | constant 7797 | HEX '(' { hexflag = 1; } 7798 expr ')' { hexflag = 0; $$ = $4; } 7799 | expr '+' expr { $$ = make_sum ($1, $3); } 7800 ... 7801 ; 7802 7803 constant: 7804 INTEGER 7805 | STRING 7806 ; 7807 7808 Here we assume that `yylex' looks at the value of `hexflag'; when it is 7809 nonzero, all integers are parsed in hexadecimal, and tokens starting 7810 with letters are parsed as integers if possible. 7811 7812 The declaration of `hexflag' shown in the prologue of the grammar 7813 file is needed to make it accessible to the actions (*note The 7814 Prologue: Prologue.). You must also write the code in `yylex' to obey 7815 the flag. 7816 7817 7818 File: bison.info, Node: Tie-in Recovery, Prev: Lexical Tie-ins, Up: Context Dependency 7819 7820 7.3 Lexical Tie-ins and Error Recovery 7821 ====================================== 7822 7823 Lexical tie-ins make strict demands on any error recovery rules you 7824 have. *Note Error Recovery::. 7825 7826 The reason for this is that the purpose of an error recovery rule is 7827 to abort the parsing of one construct and resume in some larger 7828 construct. For example, in C-like languages, a typical error recovery 7829 rule is to skip tokens until the next semicolon, and then start a new 7830 statement, like this: 7831 7832 stmt: 7833 expr ';' 7834 | IF '(' expr ')' stmt { ... } 7835 ... 7836 | error ';' { hexflag = 0; } 7837 ; 7838 7839 If there is a syntax error in the middle of a `hex (EXPR)' 7840 construct, this error rule will apply, and then the action for the 7841 completed `hex (EXPR)' will never run. So `hexflag' would remain set 7842 for the entire rest of the input, or until the next `hex' keyword, 7843 causing identifiers to be misinterpreted as integers. 7844 7845 To avoid this problem the error recovery rule itself clears 7846 `hexflag'. 7847 7848 There may also be an error recovery rule that works within 7849 expressions. For example, there could be a rule which applies within 7850 parentheses and skips to the close-parenthesis: 7851 7852 expr: 7853 ... 7854 | '(' expr ')' { $$ = $2; } 7855 | '(' error ')' 7856 ... 7857 7858 If this rule acts within the `hex' construct, it is not going to 7859 abort that construct (since it applies to an inner level of parentheses 7860 within the construct). Therefore, it should not clear the flag: the 7861 rest of the `hex' construct should be parsed with the flag still in 7862 effect. 7863 7864 What if there is an error recovery rule which might abort out of the 7865 `hex' construct or might not, depending on circumstances? There is no 7866 way you can write the action to determine whether a `hex' construct is 7867 being aborted or not. So if you are using a lexical tie-in, you had 7868 better make sure your error recovery rules are not of this kind. Each 7869 rule must be such that you can be sure that it always will, or always 7870 won't, have to clear the flag. 7871 7872 7873 File: bison.info, Node: Debugging, Next: Invocation, Prev: Context Dependency, Up: Top 7874 7875 8 Debugging Your Parser 7876 *********************** 7877 7878 Developing a parser can be a challenge, especially if you don't 7879 understand the algorithm (*note The Bison Parser Algorithm: 7880 Algorithm.). This chapter explains how understand and debug a parser. 7881 7882 The first sections focus on the static part of the parser: its 7883 structure. They explain how to generate and read the detailed 7884 description of the automaton. There are several formats available: 7885 - as text, see *note Understanding Your Parser: Understanding.; 7886 7887 - as a graph, see *note Visualizing Your Parser: Graphviz.; 7888 7889 - or as a markup report that can be turned, for instance, into HTML, 7890 see *note Visualizing your parser in multiple formats: Xml. 7891 7892 The last section focuses on the dynamic part of the parser: how to 7893 enable and understand the parser run-time traces (*note Tracing Your 7894 Parser: Tracing.). 7895 7896 * Menu: 7897 7898 * Understanding:: Understanding the structure of your parser. 7899 * Graphviz:: Getting a visual representation of the parser. 7900 * Xml:: Getting a markup representation of the parser. 7901 * Tracing:: Tracing the execution of your parser. 7902 7903 7904 File: bison.info, Node: Understanding, Next: Graphviz, Up: Debugging 7905 7906 8.1 Understanding Your Parser 7907 ============================= 7908 7909 As documented elsewhere (*note The Bison Parser Algorithm: Algorithm.) 7910 Bison parsers are "shift/reduce automata". In some cases (much more 7911 frequent than one would hope), looking at this automaton is required to 7912 tune or simply fix a parser. 7913 7914 The textual file is generated when the options `--report' or 7915 `--verbose' are specified, see *note Invoking Bison: Invocation. Its 7916 name is made by removing `.tab.c' or `.c' from the parser 7917 implementation file name, and adding `.output' instead. Therefore, if 7918 the grammar file is `foo.y', then the parser implementation file is 7919 called `foo.tab.c' by default. As a consequence, the verbose output 7920 file is called `foo.output'. 7921 7922 The following grammar file, `calc.y', will be used in the sequel: 7923 7924 %token NUM STR 7925 %left '+' '-' 7926 %left '*' 7927 %% 7928 exp: 7929 exp '+' exp 7930 | exp '-' exp 7931 | exp '*' exp 7932 | exp '/' exp 7933 | NUM 7934 ; 7935 useless: STR; 7936 %% 7937 7938 `bison' reports: 7939 7940 calc.y: warning: 1 nonterminal useless in grammar 7941 calc.y: warning: 1 rule useless in grammar 7942 calc.y:12.1-7: warning: nonterminal useless in grammar: useless 7943 calc.y:12.10-12: warning: rule useless in grammar: useless: STR 7944 calc.y: conflicts: 7 shift/reduce 7945 7946 When given `--report=state', in addition to `calc.tab.c', it creates 7947 a file `calc.output' with contents detailed below. The order of the 7948 output and the exact presentation might vary, but the interpretation is 7949 the same. 7950 7951 The first section reports useless tokens, nonterminals and rules. 7952 Useless nonterminals and rules are removed in order to produce a 7953 smaller parser, but useless tokens are preserved, since they might be 7954 used by the scanner (note the difference between "useless" and "unused" 7955 below): 7956 7957 Nonterminals useless in grammar 7958 useless 7959 7960 Terminals unused in grammar 7961 STR 7962 7963 Rules useless in grammar 7964 6 useless: STR 7965 7966 The next section lists states that still have conflicts. 7967 7968 State 8 conflicts: 1 shift/reduce 7969 State 9 conflicts: 1 shift/reduce 7970 State 10 conflicts: 1 shift/reduce 7971 State 11 conflicts: 4 shift/reduce 7972 7973 Then Bison reproduces the exact grammar it used: 7974 7975 Grammar 7976 7977 0 $accept: exp $end 7978 7979 1 exp: exp '+' exp 7980 2 | exp '-' exp 7981 3 | exp '*' exp 7982 4 | exp '/' exp 7983 5 | NUM 7984 7985 and reports the uses of the symbols: 7986 7987 Terminals, with rules where they appear 7988 7989 $end (0) 0 7990 '*' (42) 3 7991 '+' (43) 1 7992 '-' (45) 2 7993 '/' (47) 4 7994 error (256) 7995 NUM (258) 5 7996 STR (259) 7997 7998 Nonterminals, with rules where they appear 7999 8000 $accept (9) 8001 on left: 0 8002 exp (10) 8003 on left: 1 2 3 4 5, on right: 0 1 2 3 4 8004 8005 Bison then proceeds onto the automaton itself, describing each state 8006 with its set of "items", also known as "pointed rules". Each item is a 8007 production rule together with a point (`.') marking the location of the 8008 input cursor. 8009 8010 State 0 8011 8012 0 $accept: . exp $end 8013 8014 NUM shift, and go to state 1 8015 8016 exp go to state 2 8017 8018 This reads as follows: "state 0 corresponds to being at the very 8019 beginning of the parsing, in the initial rule, right before the start 8020 symbol (here, `exp'). When the parser returns to this state right 8021 after having reduced a rule that produced an `exp', the control flow 8022 jumps to state 2. If there is no such transition on a nonterminal 8023 symbol, and the lookahead is a `NUM', then this token is shifted onto 8024 the parse stack, and the control flow jumps to state 1. Any other 8025 lookahead triggers a syntax error." 8026 8027 Even though the only active rule in state 0 seems to be rule 0, the 8028 report lists `NUM' as a lookahead token because `NUM' can be at the 8029 beginning of any rule deriving an `exp'. By default Bison reports the 8030 so-called "core" or "kernel" of the item set, but if you want to see 8031 more detail you can invoke `bison' with `--report=itemset' to list the 8032 derived items as well: 8033 8034 State 0 8035 8036 0 $accept: . exp $end 8037 1 exp: . exp '+' exp 8038 2 | . exp '-' exp 8039 3 | . exp '*' exp 8040 4 | . exp '/' exp 8041 5 | . NUM 8042 8043 NUM shift, and go to state 1 8044 8045 exp go to state 2 8046 8047 In the state 1... 8048 8049 State 1 8050 8051 5 exp: NUM . 8052 8053 $default reduce using rule 5 (exp) 8054 8055 the rule 5, `exp: NUM;', is completed. Whatever the lookahead token 8056 (`$default'), the parser will reduce it. If it was coming from State 8057 0, then, after this reduction it will return to state 0, and will jump 8058 to state 2 (`exp: go to state 2'). 8059 8060 State 2 8061 8062 0 $accept: exp . $end 8063 1 exp: exp . '+' exp 8064 2 | exp . '-' exp 8065 3 | exp . '*' exp 8066 4 | exp . '/' exp 8067 8068 $end shift, and go to state 3 8069 '+' shift, and go to state 4 8070 '-' shift, and go to state 5 8071 '*' shift, and go to state 6 8072 '/' shift, and go to state 7 8073 8074 In state 2, the automaton can only shift a symbol. For instance, 8075 because of the item `exp: exp . '+' exp', if the lookahead is `+' it is 8076 shifted onto the parse stack, and the automaton jumps to state 4, 8077 corresponding to the item `exp: exp '+' . exp'. Since there is no 8078 default action, any lookahead not listed triggers a syntax error. 8079 8080 The state 3 is named the "final state", or the "accepting state": 8081 8082 State 3 8083 8084 0 $accept: exp $end . 8085 8086 $default accept 8087 8088 the initial rule is completed (the start symbol and the end-of-input 8089 were read), the parsing exits successfully. 8090 8091 The interpretation of states 4 to 7 is straightforward, and is left 8092 to the reader. 8093 8094 State 4 8095 8096 1 exp: exp '+' . exp 8097 8098 NUM shift, and go to state 1 8099 8100 exp go to state 8 8101 8102 8103 State 5 8104 8105 2 exp: exp '-' . exp 8106 8107 NUM shift, and go to state 1 8108 8109 exp go to state 9 8110 8111 8112 State 6 8113 8114 3 exp: exp '*' . exp 8115 8116 NUM shift, and go to state 1 8117 8118 exp go to state 10 8119 8120 8121 State 7 8122 8123 4 exp: exp '/' . exp 8124 8125 NUM shift, and go to state 1 8126 8127 exp go to state 11 8128 8129 As was announced in beginning of the report, `State 8 conflicts: 1 8130 shift/reduce': 8131 8132 State 8 8133 8134 1 exp: exp . '+' exp 8135 1 | exp '+' exp . 8136 2 | exp . '-' exp 8137 3 | exp . '*' exp 8138 4 | exp . '/' exp 8139 8140 '*' shift, and go to state 6 8141 '/' shift, and go to state 7 8142 8143 '/' [reduce using rule 1 (exp)] 8144 $default reduce using rule 1 (exp) 8145 8146 Indeed, there are two actions associated to the lookahead `/': 8147 either shifting (and going to state 7), or reducing rule 1. The 8148 conflict means that either the grammar is ambiguous, or the parser lacks 8149 information to make the right decision. Indeed the grammar is 8150 ambiguous, as, since we did not specify the precedence of `/', the 8151 sentence `NUM + NUM / NUM' can be parsed as `NUM + (NUM / NUM)', which 8152 corresponds to shifting `/', or as `(NUM + NUM) / NUM', which 8153 corresponds to reducing rule 1. 8154 8155 Because in deterministic parsing a single decision can be made, Bison 8156 arbitrarily chose to disable the reduction, see *note Shift/Reduce 8157 Conflicts: Shift/Reduce. Discarded actions are reported between square 8158 brackets. 8159 8160 Note that all the previous states had a single possible action: 8161 either shifting the next token and going to the corresponding state, or 8162 reducing a single rule. In the other cases, i.e., when shifting _and_ 8163 reducing is possible or when _several_ reductions are possible, the 8164 lookahead is required to select the action. State 8 is one such state: 8165 if the lookahead is `*' or `/' then the action is shifting, otherwise 8166 the action is reducing rule 1. In other words, the first two items, 8167 corresponding to rule 1, are not eligible when the lookahead token is 8168 `*', since we specified that `*' has higher precedence than `+'. More 8169 generally, some items are eligible only with some set of possible 8170 lookahead tokens. When run with `--report=lookahead', Bison specifies 8171 these lookahead tokens: 8172 8173 State 8 8174 8175 1 exp: exp . '+' exp 8176 1 | exp '+' exp . [$end, '+', '-', '/'] 8177 2 | exp . '-' exp 8178 3 | exp . '*' exp 8179 4 | exp . '/' exp 8180 8181 '*' shift, and go to state 6 8182 '/' shift, and go to state 7 8183 8184 '/' [reduce using rule 1 (exp)] 8185 $default reduce using rule 1 (exp) 8186 8187 Note however that while `NUM + NUM / NUM' is ambiguous (which 8188 results in the conflicts on `/'), `NUM + NUM * NUM' is not: the 8189 conflict was solved thanks to associativity and precedence directives. 8190 If invoked with `--report=solved', Bison includes information about the 8191 solved conflicts in the report: 8192 8193 Conflict between rule 1 and token '+' resolved as reduce (%left '+'). 8194 Conflict between rule 1 and token '-' resolved as reduce (%left '-'). 8195 Conflict between rule 1 and token '*' resolved as shift ('+' < '*'). 8196 8197 The remaining states are similar: 8198 8199 State 9 8200 8201 1 exp: exp . '+' exp 8202 2 | exp . '-' exp 8203 2 | exp '-' exp . 8204 3 | exp . '*' exp 8205 4 | exp . '/' exp 8206 8207 '*' shift, and go to state 6 8208 '/' shift, and go to state 7 8209 8210 '/' [reduce using rule 2 (exp)] 8211 $default reduce using rule 2 (exp) 8212 8213 State 10 8214 8215 1 exp: exp . '+' exp 8216 2 | exp . '-' exp 8217 3 | exp . '*' exp 8218 3 | exp '*' exp . 8219 4 | exp . '/' exp 8220 8221 '/' shift, and go to state 7 8222 8223 '/' [reduce using rule 3 (exp)] 8224 $default reduce using rule 3 (exp) 8225 8226 State 11 8227 8228 1 exp: exp . '+' exp 8229 2 | exp . '-' exp 8230 3 | exp . '*' exp 8231 4 | exp . '/' exp 8232 4 | exp '/' exp . 8233 8234 '+' shift, and go to state 4 8235 '-' shift, and go to state 5 8236 '*' shift, and go to state 6 8237 '/' shift, and go to state 7 8238 8239 '+' [reduce using rule 4 (exp)] 8240 '-' [reduce using rule 4 (exp)] 8241 '*' [reduce using rule 4 (exp)] 8242 '/' [reduce using rule 4 (exp)] 8243 $default reduce using rule 4 (exp) 8244 8245 Observe that state 11 contains conflicts not only due to the lack of 8246 precedence of `/' with respect to `+', `-', and `*', but also because 8247 the associativity of `/' is not specified. 8248 8249 Bison may also produce an HTML version of this output, via an XML 8250 file and XSLT processing (*note Visualizing your parser in multiple 8251 formats: Xml.). 8252 8253 8254 File: bison.info, Node: Graphviz, Next: Xml, Prev: Understanding, Up: Debugging 8255 8256 8.2 Visualizing Your Parser 8257 =========================== 8258 8259 As another means to gain better understanding of the shift/reduce 8260 automaton corresponding to the Bison parser, a DOT file can be 8261 generated. Note that debugging a real grammar with this is tedious at 8262 best, and impractical most of the times, because the generated files 8263 are huge (the generation of a PDF or PNG file from it will take very 8264 long, and more often than not it will fail due to memory exhaustion). 8265 This option was rather designed for beginners, to help them understand 8266 LR parsers. 8267 8268 This file is generated when the `--graph' option is specified (*note 8269 Invoking Bison: Invocation.). Its name is made by removing `.tab.c' or 8270 `.c' from the parser implementation file name, and adding `.dot' 8271 instead. If the grammar file is `foo.y', the Graphviz output file is 8272 called `foo.dot'. A DOT file may also be produced via an XML file and 8273 XSLT processing (*note Visualizing your parser in multiple formats: 8274 Xml.). 8275 8276 The following grammar file, `rr.y', will be used in the sequel: 8277 8278 %% 8279 exp: a ";" | b "."; 8280 a: "0"; 8281 b: "0"; 8282 8283 The graphical output is very similar to the textual one, and as such 8284 it is easier understood by making direct comparisons between them. 8285 *Note Debugging Your Parser: Debugging, for a detailled analysis of the 8286 textual report. 8287 8288 Graphical Representation of States 8289 ---------------------------------- 8290 8291 The items (pointed rules) for each state are grouped together in graph 8292 nodes. Their numbering is the same as in the verbose file. See the 8293 following points, about transitions, for examples 8294 8295 When invoked with `--report=lookaheads', the lookahead tokens, when 8296 needed, are shown next to the relevant rule between square brackets as a 8297 comma separated list. This is the case in the figure for the 8298 representation of reductions, below. 8299 8300 8301 The transitions are represented as directed edges between the 8302 current and the target states. 8303 8304 Graphical Representation of Shifts 8305 ---------------------------------- 8306 8307 Shifts are shown as solid arrows, labelled with the lookahead token for 8308 that shift. The following describes a reduction in the `rr.output' file: 8309 8310 State 3 8311 8312 1 exp: a . ";" 8313 8314 ";" shift, and go to state 6 8315 8316 A Graphviz rendering of this portion of the graph could be: 8317 8318 [image src="figs/example-shift.png" text=".----------------. 8319 | State 3 | 8320 | 1 exp: a . \";\" | 8321 `----------------' 8322 | 8323 | \";\" 8324 | 8325 v 8326 .----------------. 8327 | State 6 | 8328 | 1 exp: a \";\" . | 8329 `----------------' 8330 "] 8331 8332 Graphical Representation of Reductions 8333 -------------------------------------- 8334 8335 Reductions are shown as solid arrows, leading to a diamond-shaped node 8336 bearing the number of the reduction rule. The arrow is labelled with the 8337 appropriate comma separated lookahead tokens. If the reduction is the 8338 default action for the given state, there is no such label. 8339 8340 This is how reductions are represented in the verbose file 8341 `rr.output': 8342 State 1 8343 8344 3 a: "0" . [";"] 8345 4 b: "0" . ["."] 8346 8347 "." reduce using rule 4 (b) 8348 $default reduce using rule 3 (a) 8349 8350 A Graphviz rendering of this portion of the graph could be: 8351 8352 [image src="figs/example-reduce.png" text=" .------------------. 8353 | State 1 | 8354 | 3 a: \"0\" . [\";\"] | 8355 | 4 b: \"0\" . [\".\"] | 8356 `------------------' 8357 / \\ 8358 / \\ [\".\"] 8359 / \\ 8360 v v 8361 / \\ / \\ 8362 / R \\ / R \\ 8363 (green) \\ 3 / \\ 4 / (green) 8364 \\ / \\ / 8365 "] 8366 8367 When unresolved conflicts are present, because in deterministic parsing 8368 a single decision can be made, Bison can arbitrarily choose to disable a 8369 reduction, see *note Shift/Reduce Conflicts: Shift/Reduce. Discarded 8370 actions are distinguished by a red filling color on these nodes, just 8371 like how they are reported between square brackets in the verbose file. 8372 8373 The reduction corresponding to the rule number 0 is the acceptation 8374 state. It is shown as a blue diamond, labelled "Acc". 8375 8376 Graphical representation of go tos 8377 ---------------------------------- 8378 8379 The `go to' jump transitions are represented as dotted lines bearing 8380 the name of the rule being jumped to. 8381 8382 8383 File: bison.info, Node: Xml, Next: Tracing, Prev: Graphviz, Up: Debugging 8384 8385 8.3 Visualizing your parser in multiple formats 8386 =============================================== 8387 8388 Bison supports two major report formats: textual output (*note 8389 Understanding Your Parser: Understanding.) when invoked with option 8390 `--verbose', and DOT (*note Visualizing Your Parser: Graphviz.) when 8391 invoked with option `--graph'. However, another alternative is to 8392 output an XML file that may then be, with `xsltproc', rendered as 8393 either a raw text format equivalent to the verbose file, or as an HTML 8394 version of the same file, with clickable transitions, or even as a DOT. 8395 The `.output' and DOT files obtained via XSLT have no difference 8396 whatsoever with those obtained by invoking `bison' with options 8397 `--verbose' or `--graph'. 8398 8399 The XML file is generated when the options `-x' or `--xml[=FILE]' 8400 are specified, see *note Invoking Bison: Invocation. If not specified, 8401 its name is made by removing `.tab.c' or `.c' from the parser 8402 implementation file name, and adding `.xml' instead. For instance, if 8403 the grammar file is `foo.y', the default XML output file is `foo.xml'. 8404 8405 Bison ships with a `data/xslt' directory, containing XSL 8406 Transformation files to apply to the XML file. Their names are 8407 non-ambiguous: 8408 8409 `xml2dot.xsl' 8410 Used to output a copy of the DOT visualization of the automaton. 8411 8412 `xml2text.xsl' 8413 Used to output a copy of the `.output' file. 8414 8415 `xml2xhtml.xsl' 8416 Used to output an xhtml enhancement of the `.output' file. 8417 8418 Sample usage (requires `xsltproc'): 8419 $ bison -x gr.y 8420 $ bison --print-datadir 8421 /usr/local/share/bison 8422 $ xsltproc /usr/local/share/bison/xslt/xml2xhtml.xsl gr.xml >gr.html 8423 8424 8425 File: bison.info, Node: Tracing, Prev: Xml, Up: Debugging 8426 8427 8.4 Tracing Your Parser 8428 ======================= 8429 8430 When a Bison grammar compiles properly but parses "incorrectly", the 8431 `yydebug' parser-trace feature helps figuring out why. 8432 8433 * Menu: 8434 8435 * Enabling Traces:: Activating run-time trace support 8436 * Mfcalc Traces:: Extending `mfcalc' to support traces 8437 * The YYPRINT Macro:: Obsolete interface for semantic value reports 8438 8439 8440 File: bison.info, Node: Enabling Traces, Next: Mfcalc Traces, Up: Tracing 8441 8442 8.4.1 Enabling Traces 8443 --------------------- 8444 8445 There are several means to enable compilation of trace facilities: 8446 8447 the macro `YYDEBUG' 8448 Define the macro `YYDEBUG' to a nonzero value when you compile the 8449 parser. This is compliant with POSIX Yacc. You could use 8450 `-DYYDEBUG=1' as a compiler option or you could put `#define 8451 YYDEBUG 1' in the prologue of the grammar file (*note The 8452 Prologue: Prologue.). 8453 8454 If the `%define' variable `api.prefix' is used (*note Multiple 8455 Parsers in the Same Program: Multiple Parsers.), for instance 8456 `%define api.prefix x', then if `CDEBUG' is defined, its value 8457 controls the tracing feature (enabled if and only if nonzero); 8458 otherwise tracing is enabled if and only if `YYDEBUG' is nonzero. 8459 8460 the option `-t' (POSIX Yacc compliant) 8461 the option `--debug' (Bison extension) 8462 Use the `-t' option when you run Bison (*note Invoking Bison: 8463 Invocation.). With `%define api.prefix c', it defines `CDEBUG' to 8464 1, otherwise it defines `YYDEBUG' to 1. 8465 8466 the directive `%debug' 8467 Add the `%debug' directive (*note Bison Declaration Summary: Decl 8468 Summary.). This is a Bison extension, especially useful for 8469 languages that don't use a preprocessor. Unless POSIX and Yacc 8470 portability matter to you, this is the preferred solution. 8471 8472 We suggest that you always enable the debug option so that debugging 8473 is always possible. 8474 8475 The trace facility outputs messages with macro calls of the form 8476 `YYFPRINTF (stderr, FORMAT, ARGS)' where FORMAT and ARGS are the usual 8477 `printf' format and variadic arguments. If you define `YYDEBUG' to a 8478 nonzero value but do not define `YYFPRINTF', `<stdio.h>' is 8479 automatically included and `YYFPRINTF' is defined to `fprintf'. 8480 8481 Once you have compiled the program with trace facilities, the way to 8482 request a trace is to store a nonzero value in the variable `yydebug'. 8483 You can do this by making the C code do it (in `main', perhaps), or you 8484 can alter the value with a C debugger. 8485 8486 Each step taken by the parser when `yydebug' is nonzero produces a 8487 line or two of trace information, written on `stderr'. The trace 8488 messages tell you these things: 8489 8490 * Each time the parser calls `yylex', what kind of token was read. 8491 8492 * Each time a token is shifted, the depth and complete contents of 8493 the state stack (*note Parser States::). 8494 8495 * Each time a rule is reduced, which rule it is, and the complete 8496 contents of the state stack afterward. 8497 8498 To make sense of this information, it helps to refer to the automaton 8499 description file (*note Understanding Your Parser: Understanding.). 8500 This file shows the meaning of each state in terms of positions in 8501 various rules, and also what each state will do with each possible 8502 input token. As you read the successive trace messages, you can see 8503 that the parser is functioning according to its specification in the 8504 listing file. Eventually you will arrive at the place where something 8505 undesirable happens, and you will see which parts of the grammar are to 8506 blame. 8507 8508 The parser implementation file is a C/C++/Java program and you can 8509 use debuggers on it, but it's not easy to interpret what it is doing. 8510 The parser function is a finite-state machine interpreter, and aside 8511 from the actions it executes the same code over and over. Only the 8512 values of variables show where in the grammar it is working. 8513 8514 8515 File: bison.info, Node: Mfcalc Traces, Next: The YYPRINT Macro, Prev: Enabling Traces, Up: Tracing 8516 8517 8.4.2 Enabling Debug Traces for `mfcalc' 8518 ---------------------------------------- 8519 8520 The debugging information normally gives the token type of each token 8521 read, but not its semantic value. The `%printer' directive allows 8522 specify how semantic values are reported, see *note Printing Semantic 8523 Values: Printer Decl. For backward compatibility, Yacc like C parsers 8524 may also use the `YYPRINT' (*note The `YYPRINT' Macro: The YYPRINT 8525 Macro.), but its use is discouraged. 8526 8527 As a demonstration of `%printer', consider the multi-function 8528 calculator, `mfcalc' (*note Multi-function Calc::). To enable run-time 8529 traces, and semantic value reports, insert the following directives in 8530 its prologue: 8531 8532 /* Generate the parser description file. */ 8533 %verbose 8534 /* Enable run-time traces (yydebug). */ 8535 %define parse.trace 8536 8537 /* Formatting semantic values. */ 8538 %printer { fprintf (yyoutput, "%s", $$->name); } VAR; 8539 %printer { fprintf (yyoutput, "%s()", $$->name); } FNCT; 8540 %printer { fprintf (yyoutput, "%g", $$); } <val>; 8541 8542 The `%define' directive instructs Bison to generate run-time trace 8543 support. Then, activation of these traces is controlled at run-time by 8544 the `yydebug' variable, which is disabled by default. Because these 8545 traces will refer to the "states" of the parser, it is helpful to ask 8546 for the creation of a description of that parser; this is the purpose 8547 of (admittedly ill-named) `%verbose' directive. 8548 8549 The set of `%printer' directives demonstrates how to format the 8550 semantic value in the traces. Note that the specification can be done 8551 either on the symbol type (e.g., `VAR' or `FNCT'), or on the type tag: 8552 since `<val>' is the type for both `NUM' and `exp', this printer will 8553 be used for them. 8554 8555 Here is a sample of the information provided by run-time traces. 8556 The traces are sent onto standard error. 8557 8558 $ echo 'sin(1-1)' | ./mfcalc -p 8559 Starting parse 8560 Entering state 0 8561 Reducing stack by rule 1 (line 34): 8562 -> $$ = nterm input () 8563 Stack now 0 8564 Entering state 1 8565 8566 This first batch shows a specific feature of this grammar: the first 8567 rule (which is in line 34 of `mfcalc.y' can be reduced without even 8568 having to look for the first token. The resulting left-hand symbol 8569 (`$$') is a valueless (`()') `input' non terminal (`nterm'). 8570 8571 Then the parser calls the scanner. 8572 Reading a token: Next token is token FNCT (sin()) 8573 Shifting token FNCT (sin()) 8574 Entering state 6 8575 8576 That token (`token') is a function (`FNCT') whose value is `sin' as 8577 formatted per our `%printer' specification: `sin()'. The parser stores 8578 (`Shifting') that token, and others, until it can do something about it. 8579 8580 Reading a token: Next token is token '(' () 8581 Shifting token '(' () 8582 Entering state 14 8583 Reading a token: Next token is token NUM (1.000000) 8584 Shifting token NUM (1.000000) 8585 Entering state 4 8586 Reducing stack by rule 6 (line 44): 8587 $1 = token NUM (1.000000) 8588 -> $$ = nterm exp (1.000000) 8589 Stack now 0 1 6 14 8590 Entering state 24 8591 8592 The previous reduction demonstrates the `%printer' directive for 8593 `<val>': both the token `NUM' and the resulting nonterminal `exp' have 8594 `1' as value. 8595 8596 Reading a token: Next token is token '-' () 8597 Shifting token '-' () 8598 Entering state 17 8599 Reading a token: Next token is token NUM (1.000000) 8600 Shifting token NUM (1.000000) 8601 Entering state 4 8602 Reducing stack by rule 6 (line 44): 8603 $1 = token NUM (1.000000) 8604 -> $$ = nterm exp (1.000000) 8605 Stack now 0 1 6 14 24 17 8606 Entering state 26 8607 Reading a token: Next token is token ')' () 8608 Reducing stack by rule 11 (line 49): 8609 $1 = nterm exp (1.000000) 8610 $2 = token '-' () 8611 $3 = nterm exp (1.000000) 8612 -> $$ = nterm exp (0.000000) 8613 Stack now 0 1 6 14 8614 Entering state 24 8615 8616 The rule for the subtraction was just reduced. The parser is about to 8617 discover the end of the call to `sin'. 8618 8619 Next token is token ')' () 8620 Shifting token ')' () 8621 Entering state 31 8622 Reducing stack by rule 9 (line 47): 8623 $1 = token FNCT (sin()) 8624 $2 = token '(' () 8625 $3 = nterm exp (0.000000) 8626 $4 = token ')' () 8627 -> $$ = nterm exp (0.000000) 8628 Stack now 0 1 8629 Entering state 11 8630 8631 Finally, the end-of-line allow the parser to complete the computation, 8632 and display its result. 8633 8634 Reading a token: Next token is token '\n' () 8635 Shifting token '\n' () 8636 Entering state 22 8637 Reducing stack by rule 4 (line 40): 8638 $1 = nterm exp (0.000000) 8639 $2 = token '\n' () 8640 => 0 8641 -> $$ = nterm line () 8642 Stack now 0 1 8643 Entering state 10 8644 Reducing stack by rule 2 (line 35): 8645 $1 = nterm input () 8646 $2 = nterm line () 8647 -> $$ = nterm input () 8648 Stack now 0 8649 Entering state 1 8650 8651 The parser has returned into state 1, in which it is waiting for the 8652 next expression to evaluate, or for the end-of-file token, which causes 8653 the completion of the parsing. 8654 8655 Reading a token: Now at end of input. 8656 Shifting token $end () 8657 Entering state 2 8658 Stack now 0 1 2 8659 Cleanup: popping token $end () 8660 Cleanup: popping nterm input () 8661 8662 8663 File: bison.info, Node: The YYPRINT Macro, Prev: Mfcalc Traces, Up: Tracing 8664 8665 8.4.3 The `YYPRINT' Macro 8666 ------------------------- 8667 8668 Before `%printer' support, semantic values could be displayed using the 8669 `YYPRINT' macro, which works only for terminal symbols and only with 8670 the `yacc.c' skeleton. 8671 8672 -- Macro: YYPRINT (STREAM, TOKEN, VALUE); 8673 If you define `YYPRINT', it should take three arguments. The 8674 parser will pass a standard I/O stream, the numeric code for the 8675 token type, and the token value (from `yylval'). 8676 8677 For `yacc.c' only. Obsoleted by `%printer'. 8678 8679 Here is an example of `YYPRINT' suitable for the multi-function 8680 calculator (*note Declarations for `mfcalc': Mfcalc Declarations.): 8681 8682 %{ 8683 static void print_token_value (FILE *, int, YYSTYPE); 8684 #define YYPRINT(File, Type, Value) \ 8685 print_token_value (File, Type, Value) 8686 %} 8687 8688 ... %% ... %% ... 8689 8690 static void 8691 print_token_value (FILE *file, int type, YYSTYPE value) 8692 { 8693 if (type == VAR) 8694 fprintf (file, "%s", value.tptr->name); 8695 else if (type == NUM) 8696 fprintf (file, "%d", value.val); 8697 } 8698 8699 8700 File: bison.info, Node: Invocation, Next: Other Languages, Prev: Debugging, Up: Top 8701 8702 9 Invoking Bison 8703 **************** 8704 8705 The usual way to invoke Bison is as follows: 8706 8707 bison INFILE 8708 8709 Here INFILE is the grammar file name, which usually ends in `.y'. 8710 The parser implementation file's name is made by replacing the `.y' 8711 with `.tab.c' and removing any leading directory. Thus, the `bison 8712 foo.y' file name yields `foo.tab.c', and the `bison hack/foo.y' file 8713 name yields `foo.tab.c'. It's also possible, in case you are writing 8714 C++ code instead of C in your grammar file, to name it `foo.ypp' or 8715 `foo.y++'. Then, the output files will take an extension like the 8716 given one as input (respectively `foo.tab.cpp' and `foo.tab.c++'). This 8717 feature takes effect with all options that manipulate file names like 8718 `-o' or `-d'. 8719 8720 For example : 8721 8722 bison -d INFILE.YXX 8723 will produce `infile.tab.cxx' and `infile.tab.hxx', and 8724 8725 bison -d -o OUTPUT.C++ INFILE.Y 8726 will produce `output.c++' and `outfile.h++'. 8727 8728 For compatibility with POSIX, the standard Bison distribution also 8729 contains a shell script called `yacc' that invokes Bison with the `-y' 8730 option. 8731 8732 * Menu: 8733 8734 * Bison Options:: All the options described in detail, 8735 in alphabetical order by short options. 8736 * Option Cross Key:: Alphabetical list of long options. 8737 * Yacc Library:: Yacc-compatible `yylex' and `main'. 8738 8739 8740 File: bison.info, Node: Bison Options, Next: Option Cross Key, Up: Invocation 8741 8742 9.1 Bison Options 8743 ================= 8744 8745 Bison supports both traditional single-letter options and mnemonic long 8746 option names. Long option names are indicated with `--' instead of 8747 `-'. Abbreviations for option names are allowed as long as they are 8748 unique. When a long option takes an argument, like `--file-prefix', 8749 connect the option name and the argument with `='. 8750 8751 Here is a list of options that can be used with Bison, alphabetized 8752 by short option. It is followed by a cross key alphabetized by long 8753 option. 8754 8755 Operations modes: 8756 `-h' 8757 `--help' 8758 Print a summary of the command-line options to Bison and exit. 8759 8760 `-V' 8761 `--version' 8762 Print the version number of Bison and exit. 8763 8764 `--print-localedir' 8765 Print the name of the directory containing locale-dependent data. 8766 8767 `--print-datadir' 8768 Print the name of the directory containing skeletons and XSLT. 8769 8770 `-y' 8771 `--yacc' 8772 Act more like the traditional Yacc command. This can cause 8773 different diagnostics to be generated, and may change behavior in 8774 other minor ways. Most importantly, imitate Yacc's output file 8775 name conventions, so that the parser implementation file is called 8776 `y.tab.c', and the other outputs are called `y.output' and 8777 `y.tab.h'. Also, if generating a deterministic parser in C, 8778 generate `#define' statements in addition to an `enum' to associate 8779 token numbers with token names. Thus, the following shell script 8780 can substitute for Yacc, and the Bison distribution contains such 8781 a script for compatibility with POSIX: 8782 8783 #! /bin/sh 8784 bison -y "$@" 8785 8786 The `-y'/`--yacc' option is intended for use with traditional Yacc 8787 grammars. If your grammar uses a Bison extension like 8788 `%glr-parser', Bison might not be Yacc-compatible even if this 8789 option is specified. 8790 8791 `-W [CATEGORY]' 8792 `--warnings[=CATEGORY]' 8793 Output warnings falling in CATEGORY. CATEGORY can be one of: 8794 `midrule-values' 8795 Warn about mid-rule values that are set but not used within 8796 any of the actions of the parent rule. For example, warn 8797 about unused `$2' in: 8798 8799 exp: '1' { $$ = 1; } '+' exp { $$ = $1 + $4; }; 8800 8801 Also warn about mid-rule values that are used but not set. 8802 For example, warn about unset `$$' in the mid-rule action in: 8803 8804 exp: '1' { $1 = 1; } '+' exp { $$ = $2 + $4; }; 8805 8806 These warnings are not enabled by default since they 8807 sometimes prove to be false alarms in existing grammars 8808 employing the Yacc constructs `$0' or `$-N' (where N is some 8809 positive integer). 8810 8811 `yacc' 8812 Incompatibilities with POSIX Yacc. 8813 8814 `conflicts-sr' 8815 `conflicts-rr' 8816 S/R and R/R conflicts. These warnings are enabled by 8817 default. However, if the `%expect' or `%expect-rr' directive 8818 is specified, an unexpected number of conflicts is an error, 8819 and an expected number of conflicts is not reported, so `-W' 8820 and `--warning' then have no effect on the conflict report. 8821 8822 `other' 8823 All warnings not categorized above. These warnings are 8824 enabled by default. 8825 8826 This category is provided merely for the sake of 8827 completeness. Future releases of Bison may move warnings 8828 from this category to new, more specific categories. 8829 8830 `all' 8831 All the warnings. 8832 8833 `none' 8834 Turn off all the warnings. 8835 8836 `error' 8837 Treat warnings as errors. 8838 8839 A category can be turned off by prefixing its name with `no-'. For 8840 instance, `-Wno-yacc' will hide the warnings about POSIX Yacc 8841 incompatibilities. 8842 8843 `-f [FEATURE]' 8844 `--feature[=FEATURE]' 8845 Activate miscellaneous FEATURE. FEATURE can be one of: 8846 `caret' 8847 `diagnostics-show-caret' 8848 Show caret errors, in a manner similar to GCC's 8849 `-fdiagnostics-show-caret', or Clang's `-fcaret-diagnotics'. 8850 The location provided with the message is used to quote the 8851 corresponding line of the source file, underlining the 8852 important part of it with carets (^). Here is an example, 8853 using the following file `in.y': 8854 8855 %type <ival> exp 8856 %% 8857 exp: exp '+' exp { $exp = $1 + $2; }; 8858 8859 When invoked with `-fcaret', Bison will report: 8860 8861 in.y:3.20-23: error: ambiguous reference: '$exp' 8862 exp: exp '+' exp { $exp = $1 + $2; }; 8863 ^^^^ 8864 in.y:3.1-3: refers to: $exp at $$ 8865 exp: exp '+' exp { $exp = $1 + $2; }; 8866 ^^^ 8867 in.y:3.6-8: refers to: $exp at $1 8868 exp: exp '+' exp { $exp = $1 + $2; }; 8869 ^^^ 8870 in.y:3.14-16: refers to: $exp at $3 8871 exp: exp '+' exp { $exp = $1 + $2; }; 8872 ^^^ 8873 in.y:3.32-33: error: $2 of 'exp' has no declared type 8874 exp: exp '+' exp { $exp = $1 + $2; }; 8875 ^^ 8876 8877 8878 Tuning the parser: 8879 8880 `-t' 8881 `--debug' 8882 In the parser implementation file, define the macro `YYDEBUG' to 1 8883 if it is not already defined, so that the debugging facilities are 8884 compiled. *Note Tracing Your Parser: Tracing. 8885 8886 `-D NAME[=VALUE]' 8887 `--define=NAME[=VALUE]' 8888 `-F NAME[=VALUE]' 8889 `--force-define=NAME[=VALUE]' 8890 Each of these is equivalent to `%define NAME "VALUE"' (*note 8891 %define Summary::) except that Bison processes multiple 8892 definitions for the same NAME as follows: 8893 8894 * Bison quietly ignores all command-line definitions for NAME 8895 except the last. 8896 8897 * If that command-line definition is specified by a `-D' or 8898 `--define', Bison reports an error for any `%define' 8899 definition for NAME. 8900 8901 * If that command-line definition is specified by a `-F' or 8902 `--force-define' instead, Bison quietly ignores all `%define' 8903 definitions for NAME. 8904 8905 * Otherwise, Bison reports an error if there are multiple 8906 `%define' definitions for NAME. 8907 8908 You should avoid using `-F' and `--force-define' in your make 8909 files unless you are confident that it is safe to quietly ignore 8910 any conflicting `%define' that may be added to the grammar file. 8911 8912 `-L LANGUAGE' 8913 `--language=LANGUAGE' 8914 Specify the programming language for the generated parser, as if 8915 `%language' was specified (*note Bison Declaration Summary: Decl 8916 Summary.). Currently supported languages include C, C++, and Java. 8917 LANGUAGE is case-insensitive. 8918 8919 `--locations' 8920 Pretend that `%locations' was specified. *Note Decl Summary::. 8921 8922 `-p PREFIX' 8923 `--name-prefix=PREFIX' 8924 Pretend that `%name-prefix "PREFIX"' was specified (*note Decl 8925 Summary::). Obsoleted by `-Dapi.prefix=PREFIX'. *Note Multiple 8926 Parsers in the Same Program: Multiple Parsers. 8927 8928 `-l' 8929 `--no-lines' 8930 Don't put any `#line' preprocessor commands in the parser 8931 implementation file. Ordinarily Bison puts them in the parser 8932 implementation file so that the C compiler and debuggers will 8933 associate errors with your source file, the grammar file. This 8934 option causes them to associate errors with the parser 8935 implementation file, treating it as an independent source file in 8936 its own right. 8937 8938 `-S FILE' 8939 `--skeleton=FILE' 8940 Specify the skeleton to use, similar to `%skeleton' (*note Bison 8941 Declaration Summary: Decl Summary.). 8942 8943 If FILE does not contain a `/', FILE is the name of a skeleton 8944 file in the Bison installation directory. If it does, FILE is an 8945 absolute file name or a file name relative to the current working 8946 directory. This is similar to how most shells resolve commands. 8947 8948 `-k' 8949 `--token-table' 8950 Pretend that `%token-table' was specified. *Note Decl Summary::. 8951 8952 Adjust the output: 8953 8954 `--defines[=FILE]' 8955 Pretend that `%defines' was specified, i.e., write an extra output 8956 file containing macro definitions for the token type names defined 8957 in the grammar, as well as a few other declarations. *Note Decl 8958 Summary::. 8959 8960 `-d' 8961 This is the same as `--defines' except `-d' does not accept a FILE 8962 argument since POSIX Yacc requires that `-d' can be bundled with 8963 other short options. 8964 8965 `-b FILE-PREFIX' 8966 `--file-prefix=PREFIX' 8967 Pretend that `%file-prefix' was specified, i.e., specify prefix to 8968 use for all Bison output file names. *Note Decl Summary::. 8969 8970 `-r THINGS' 8971 `--report=THINGS' 8972 Write an extra output file containing verbose description of the 8973 comma separated list of THINGS among: 8974 8975 `state' 8976 Description of the grammar, conflicts (resolved and 8977 unresolved), and parser's automaton. 8978 8979 `itemset' 8980 Implies `state' and augments the description of the automaton 8981 with the full set of items for each state, instead of its 8982 core only. 8983 8984 `lookahead' 8985 Implies `state' and augments the description of the automaton 8986 with each rule's lookahead set. 8987 8988 `solved' 8989 Implies `state'. Explain how conflicts were solved thanks to 8990 precedence and associativity directives. 8991 8992 `all' 8993 Enable all the items. 8994 8995 `none' 8996 Do not generate the report. 8997 8998 `--report-file=FILE' 8999 Specify the FILE for the verbose description. 9000 9001 `-v' 9002 `--verbose' 9003 Pretend that `%verbose' was specified, i.e., write an extra output 9004 file containing verbose descriptions of the grammar and parser. 9005 *Note Decl Summary::. 9006 9007 `-o FILE' 9008 `--output=FILE' 9009 Specify the FILE for the parser implementation file. 9010 9011 The other output files' names are constructed from FILE as 9012 described under the `-v' and `-d' options. 9013 9014 `-g [FILE]' 9015 `--graph[=FILE]' 9016 Output a graphical representation of the parser's automaton 9017 computed by Bison, in Graphviz (http://www.graphviz.org/) DOT 9018 (http://www.graphviz.org/doc/info/lang.html) format. `FILE' is 9019 optional. If omitted and the grammar file is `foo.y', the output 9020 file will be `foo.dot'. 9021 9022 `-x [FILE]' 9023 `--xml[=FILE]' 9024 Output an XML report of the parser's automaton computed by Bison. 9025 `FILE' is optional. If omitted and the grammar file is `foo.y', 9026 the output file will be `foo.xml'. (The current XML schema is 9027 experimental and may evolve. More user feedback will help to 9028 stabilize it.) 9029 9030 9031 File: bison.info, Node: Option Cross Key, Next: Yacc Library, Prev: Bison Options, Up: Invocation 9032 9033 9.2 Option Cross Key 9034 ==================== 9035 9036 Here is a list of options, alphabetized by long option, to help you find 9037 the corresponding short option and directive. 9038 9039 Long Option Short Option Bison Directive 9040 --------------------------------------------------------------------------------- 9041 `--debug' `-t' `%debug' 9042 `--define=NAME[=VALUE]' `-D NAME[=VALUE]' `%define NAME ["VALUE"]' 9043 `--defines[=FILE]' `-d' `%defines ["FILE"]' 9044 `--feature[=FEATURE]' `-f [FEATURE]' 9045 `--file-prefix=PREFIX' `-b PREFIX' `%file-prefix "PREFIX"' 9046 `--force-define=NAME[=VALUE]' `-F NAME[=VALUE]' `%define NAME ["VALUE"]' 9047 `--graph[=FILE]' `-g [FILE]' 9048 `--help' `-h' 9049 `--language=LANGUAGE' `-L LANGUAGE' `%language "LANGUAGE"' 9050 `--locations' `%locations' 9051 `--name-prefix=PREFIX' `-p PREFIX' `%name-prefix "PREFIX"' 9052 `--no-lines' `-l' `%no-lines' 9053 `--output=FILE' `-o FILE' `%output "FILE"' 9054 `--print-datadir' 9055 `--print-localedir' 9056 `--report-file=FILE' 9057 `--report=THINGS' `-r THINGS' 9058 `--skeleton=FILE' `-S FILE' `%skeleton "FILE"' 9059 `--token-table' `-k' `%token-table' 9060 `--verbose' `-v' `%verbose' 9061 `--version' `-V' 9062 `--warnings[=CATEGORY]' `-W [CATEGORY]' 9063 `--xml[=FILE]' `-x [FILE]' 9064 `--yacc' `-y' `%yacc' 9065 9066 9067 File: bison.info, Node: Yacc Library, Prev: Option Cross Key, Up: Invocation 9068 9069 9.3 Yacc Library 9070 ================ 9071 9072 The Yacc library contains default implementations of the `yyerror' and 9073 `main' functions. These default implementations are normally not 9074 useful, but POSIX requires them. To use the Yacc library, link your 9075 program with the `-ly' option. Note that Bison's implementation of the 9076 Yacc library is distributed under the terms of the GNU General Public 9077 License (*note Copying::). 9078 9079 If you use the Yacc library's `yyerror' function, you should declare 9080 `yyerror' as follows: 9081 9082 int yyerror (char const *); 9083 9084 Bison ignores the `int' value returned by this `yyerror'. If you 9085 use the Yacc library's `main' function, your `yyparse' function should 9086 have the following type signature: 9087 9088 int yyparse (void); 9089 9090 9091 File: bison.info, Node: Other Languages, Next: FAQ, Prev: Invocation, Up: Top 9092 9093 10 Parsers Written In Other Languages 9094 ************************************* 9095 9096 * Menu: 9097 9098 * C++ Parsers:: The interface to generate C++ parser classes 9099 * Java Parsers:: The interface to generate Java parser classes 9100 9101 9102 File: bison.info, Node: C++ Parsers, Next: Java Parsers, Up: Other Languages 9103 9104 10.1 C++ Parsers 9105 ================ 9106 9107 * Menu: 9108 9109 * C++ Bison Interface:: Asking for C++ parser generation 9110 * C++ Semantic Values:: %union vs. C++ 9111 * C++ Location Values:: The position and location classes 9112 * C++ Parser Interface:: Instantiating and running the parser 9113 * C++ Scanner Interface:: Exchanges between yylex and parse 9114 * A Complete C++ Example:: Demonstrating their use 9115 9116 9117 File: bison.info, Node: C++ Bison Interface, Next: C++ Semantic Values, Up: C++ Parsers 9118 9119 10.1.1 C++ Bison Interface 9120 -------------------------- 9121 9122 The C++ deterministic parser is selected using the skeleton directive, 9123 `%skeleton "lalr1.cc"', or the synonymous command-line option 9124 `--skeleton=lalr1.cc'. *Note Decl Summary::. 9125 9126 When run, `bison' will create several entities in the `yy' namespace. Use 9127 the `%define namespace' directive to change the namespace name, see 9128 *note namespace: %define Summary. The various classes are generated in 9129 the following files: 9130 9131 `position.hh' 9132 `location.hh' 9133 The definition of the classes `position' and `location', used for 9134 location tracking. These files are not generated if the `%define' 9135 variable `api.location.type' is defined. *Note C++ Location 9136 Values::. 9137 9138 `stack.hh' 9139 An auxiliary class `stack' used by the parser. 9140 9141 `FILE.hh' 9142 `FILE.cc' 9143 (Assuming the extension of the grammar file was `.yy'.) The 9144 declaration and implementation of the C++ parser class. The 9145 basename and extension of these two files follow the same rules as 9146 with regular C parsers (*note Invocation::). 9147 9148 The header is _mandatory_; you must either pass `-d'/`--defines' 9149 to `bison', or use the `%defines' directive. 9150 9151 All these files are documented using Doxygen; run `doxygen' for a 9152 complete and accurate documentation. 9153 9154 9155 File: bison.info, Node: C++ Semantic Values, Next: C++ Location Values, Prev: C++ Bison Interface, Up: C++ Parsers 9156 9157 10.1.2 C++ Semantic Values 9158 -------------------------- 9159 9160 The `%union' directive works as for C, see *note The Collection of 9161 Value Types: Union Decl. In particular it produces a genuine 9162 `union'(1), which have a few specific features in C++. 9163 - The type `YYSTYPE' is defined but its use is discouraged: rather 9164 you should refer to the parser's encapsulated type 9165 `yy::parser::semantic_type'. 9166 9167 - Non POD (Plain Old Data) types cannot be used. C++ forbids any 9168 instance of classes with constructors in unions: only _pointers_ 9169 to such objects are allowed. 9170 9171 Because objects have to be stored via pointers, memory is not 9172 reclaimed automatically: using the `%destructor' directive is the only 9173 means to avoid leaks. *Note Freeing Discarded Symbols: Destructor Decl. 9174 9175 ---------- Footnotes ---------- 9176 9177 (1) In the future techniques to allow complex types within 9178 pseudo-unions (similar to Boost variants) might be implemented to 9179 alleviate these issues. 9180 9181 9182 File: bison.info, Node: C++ Location Values, Next: C++ Parser Interface, Prev: C++ Semantic Values, Up: C++ Parsers 9183 9184 10.1.3 C++ Location Values 9185 -------------------------- 9186 9187 When the directive `%locations' is used, the C++ parser supports 9188 location tracking, see *note Tracking Locations::. 9189 9190 By default, two auxiliary classes define a `position', a single point 9191 in a file, and a `location', a range composed of a pair of `position's 9192 (possibly spanning several files). But if the `%define' variable 9193 `api.location.type' is defined, then these classes will not be 9194 generated, and the user defined type will be used. 9195 9196 In this section `uint' is an abbreviation for `unsigned int': in 9197 genuine code only the latter is used. 9198 9199 * Menu: 9200 9201 * C++ position:: One point in the source file 9202 * C++ location:: Two points in the source file 9203 * User Defined Location Type:: Required interface for locations 9204 9205 9206 File: bison.info, Node: C++ position, Next: C++ location, Up: C++ Location Values 9207 9208 10.1.3.1 C++ `position' 9209 ....................... 9210 9211 -- Constructor on position: position (std::string* FILE = 0, uint 9212 LINE = 1, uint COL = 1) 9213 Create a `position' denoting a given point. Note that `file' is 9214 not reclaimed when the `position' is destroyed: memory managed 9215 must be handled elsewhere. 9216 9217 -- Method on position: void initialize (std::string* FILE = 0, uint 9218 LINE = 1, uint COL = 1) 9219 Reset the position to the given values. 9220 9221 -- Instance Variable of position: std::string* file 9222 The name of the file. It will always be handled as a pointer, the 9223 parser will never duplicate nor deallocate it. As an experimental 9224 feature you may change it to `TYPE*' using `%define filename_type 9225 "TYPE"'. 9226 9227 -- Instance Variable of position: uint line 9228 The line, starting at 1. 9229 9230 -- Method on position: uint lines (int HEIGHT = 1) 9231 Advance by HEIGHT lines, resetting the column number. 9232 9233 -- Instance Variable of position: uint column 9234 The column, starting at 1. 9235 9236 -- Method on position: uint columns (int WIDTH = 1) 9237 Advance by WIDTH columns, without changing the line number. 9238 9239 -- Method on position: position& operator+= (int WIDTH) 9240 -- Method on position: position operator+ (int WIDTH) 9241 -- Method on position: position& operator-= (int WIDTH) 9242 -- Method on position: position operator- (int WIDTH) 9243 Various forms of syntactic sugar for `columns'. 9244 9245 -- Method on position: bool operator== (const position& THAT) 9246 -- Method on position: bool operator!= (const position& THAT) 9247 Whether `*this' and `that' denote equal/different positions. 9248 9249 -- Function: std::ostream& operator<< (std::ostream& O, const 9250 position& P) 9251 Report P on O like this: `FILE:LINE.COLUMN', or `LINE.COLUMN' if 9252 FILE is null. 9253 9254 9255 File: bison.info, Node: C++ location, Next: User Defined Location Type, Prev: C++ position, Up: C++ Location Values 9256 9257 10.1.3.2 C++ `location' 9258 ....................... 9259 9260 -- Constructor on location: location (const position& BEGIN, const 9261 position& END) 9262 Create a `Location' from the endpoints of the range. 9263 9264 -- Constructor on location: location (const position& POS = 9265 position()) 9266 -- Constructor on location: location (std::string* FILE, uint LINE, 9267 uint COL) 9268 Create a `Location' denoting an empty range located at a given 9269 point. 9270 9271 -- Method on location: void initialize (std::string* FILE = 0, uint 9272 LINE = 1, uint COL = 1) 9273 Reset the location to an empty range at the given values. 9274 9275 -- Instance Variable of location: position begin 9276 -- Instance Variable of location: position end 9277 The first, inclusive, position of the range, and the first beyond. 9278 9279 -- Method on location: uint columns (int WIDTH = 1) 9280 -- Method on location: uint lines (int HEIGHT = 1) 9281 Advance the `end' position. 9282 9283 -- Method on location: location operator+ (const location& END) 9284 -- Method on location: location operator+ (int WIDTH) 9285 -- Method on location: location operator+= (int WIDTH) 9286 Various forms of syntactic sugar. 9287 9288 -- Method on location: void step () 9289 Move `begin' onto `end'. 9290 9291 -- Method on location: bool operator== (const location& THAT) 9292 -- Method on location: bool operator!= (const location& THAT) 9293 Whether `*this' and `that' denote equal/different ranges of 9294 positions. 9295 9296 -- Function: std::ostream& operator<< (std::ostream& O, const 9297 location& P) 9298 Report P on O, taking care of special cases such as: no `filename' 9299 defined, or equal filename/line or column. 9300 9301 9302 File: bison.info, Node: User Defined Location Type, Prev: C++ location, Up: C++ Location Values 9303 9304 10.1.3.3 User Defined Location Type 9305 ................................... 9306 9307 Instead of using the built-in types you may use the `%define' variable 9308 `api.location.type' to specify your own type: 9309 9310 %define api.location.type LOCATIONTYPE 9311 9312 The requirements over your LOCATIONTYPE are: 9313 * it must be copyable; 9314 9315 * in order to compute the (default) value of `@$' in a reduction, the 9316 parser basically runs 9317 @$.begin = @$1.begin; 9318 @$.end = @$N.end; // The location of last right-hand side symbol. 9319 so there must be copyable `begin' and `end' members; 9320 9321 * alternatively you may redefine the computation of the default 9322 location, in which case these members are not required (*note 9323 Location Default Action::); 9324 9325 * if traces are enabled, then there must exist an `std::ostream& 9326 operator<< (std::ostream& o, const LOCATIONTYPE& s)' function. 9327 9328 9329 In programs with several C++ parsers, you may also use the `%define' 9330 variable `api.location.type' to share a common set of built-in 9331 definitions for `position' and `location'. For instance, one parser 9332 `master/parser.yy' might use: 9333 9334 %defines 9335 %locations 9336 %define namespace "master::" 9337 9338 to generate the `master/position.hh' and `master/location.hh' files, 9339 reused by other parsers as follows: 9340 9341 %define api.location.type "master::location" 9342 %code requires { #include <master/location.hh> } 9343 9344 9345 File: bison.info, Node: C++ Parser Interface, Next: C++ Scanner Interface, Prev: C++ Location Values, Up: C++ Parsers 9346 9347 10.1.4 C++ Parser Interface 9348 --------------------------- 9349 9350 The output files `OUTPUT.hh' and `OUTPUT.cc' declare and define the 9351 parser class in the namespace `yy'. The class name defaults to 9352 `parser', but may be changed using `%define parser_class_name "NAME"'. 9353 The interface of this class is detailed below. It can be extended 9354 using the `%parse-param' feature: its semantics is slightly changed 9355 since it describes an additional member of the parser class, and an 9356 additional argument for its constructor. 9357 9358 -- Type of parser: semantic_type 9359 -- Type of parser: location_type 9360 The types for semantics value and locations. 9361 9362 -- Type of parser: token 9363 A structure that contains (only) the `yytokentype' enumeration, 9364 which defines the tokens. To refer to the token `FOO', use 9365 `yy::parser::token::FOO'. The scanner can use `typedef 9366 yy::parser::token token;' to "import" the token enumeration (*note 9367 Calc++ Scanner::). 9368 9369 -- Method on parser: parser (TYPE1 ARG1, ...) 9370 Build a new parser object. There are no arguments by default, 9371 unless `%parse-param {TYPE1 ARG1}' was used. 9372 9373 -- Method on parser: int parse () 9374 Run the syntactic analysis, and return 0 on success, 1 otherwise. 9375 9376 The whole function is wrapped in a `try'/`catch' block, so that 9377 when an exception is thrown, the `%destructor's are called to 9378 release the lookahead symbol, and the symbols pushed on the stack. 9379 9380 -- Method on parser: std::ostream& debug_stream () 9381 -- Method on parser: void set_debug_stream (std::ostream& O) 9382 Get or set the stream used for tracing the parsing. It defaults to 9383 `std::cerr'. 9384 9385 -- Method on parser: debug_level_type debug_level () 9386 -- Method on parser: void set_debug_level (debug_level L) 9387 Get or set the tracing level. Currently its value is either 0, no 9388 trace, or nonzero, full tracing. 9389 9390 -- Method on parser: void error (const location_type& L, const 9391 std::string& M) 9392 The definition for this member function must be supplied by the 9393 user: the parser uses it to report a parser error occurring at L, 9394 described by M. 9395 9396 9397 File: bison.info, Node: C++ Scanner Interface, Next: A Complete C++ Example, Prev: C++ Parser Interface, Up: C++ Parsers 9398 9399 10.1.5 C++ Scanner Interface 9400 ---------------------------- 9401 9402 The parser invokes the scanner by calling `yylex'. Contrary to C 9403 parsers, C++ parsers are always pure: there is no point in using the 9404 `%define api.pure full' directive. Therefore the interface is as 9405 follows. 9406 9407 -- Method on parser: int yylex (semantic_type* YYLVAL, location_type* 9408 YYLLOC, TYPE1 ARG1, ...) 9409 Return the next token. Its type is the return value, its semantic 9410 value and location being YYLVAL and YYLLOC. Invocations of 9411 `%lex-param {TYPE1 ARG1}' yield additional arguments. 9412 9413 9414 File: bison.info, Node: A Complete C++ Example, Prev: C++ Scanner Interface, Up: C++ Parsers 9415 9416 10.1.6 A Complete C++ Example 9417 ----------------------------- 9418 9419 This section demonstrates the use of a C++ parser with a simple but 9420 complete example. This example should be available on your system, 9421 ready to compile, in the directory "../bison/examples/calc++". It 9422 focuses on the use of Bison, therefore the design of the various C++ 9423 classes is very naive: no accessors, no encapsulation of members etc. 9424 We will use a Lex scanner, and more precisely, a Flex scanner, to 9425 demonstrate the various interaction. A hand written scanner is 9426 actually easier to interface with. 9427 9428 * Menu: 9429 9430 * Calc++ --- C++ Calculator:: The specifications 9431 * Calc++ Parsing Driver:: An active parsing context 9432 * Calc++ Parser:: A parser class 9433 * Calc++ Scanner:: A pure C++ Flex scanner 9434 * Calc++ Top Level:: Conducting the band 9435 9436 9437 File: bison.info, Node: Calc++ --- C++ Calculator, Next: Calc++ Parsing Driver, Up: A Complete C++ Example 9438 9439 10.1.6.1 Calc++ -- C++ Calculator 9440 ................................. 9441 9442 Of course the grammar is dedicated to arithmetics, a single expression, 9443 possibly preceded by variable assignments. An environment containing 9444 possibly predefined variables such as `one' and `two', is exchanged 9445 with the parser. An example of valid input follows. 9446 9447 three := 3 9448 seven := one + two * three 9449 seven * seven 9450 9451 9452 File: bison.info, Node: Calc++ Parsing Driver, Next: Calc++ Parser, Prev: Calc++ --- C++ Calculator, Up: A Complete C++ Example 9453 9454 10.1.6.2 Calc++ Parsing Driver 9455 .............................. 9456 9457 To support a pure interface with the parser (and the scanner) the 9458 technique of the "parsing context" is convenient: a structure 9459 containing all the data to exchange. Since, in addition to simply 9460 launch the parsing, there are several auxiliary tasks to execute (open 9461 the file for parsing, instantiate the parser etc.), we recommend 9462 transforming the simple parsing context structure into a fully blown 9463 "parsing driver" class. 9464 9465 The declaration of this driver class, `calc++-driver.hh', is as 9466 follows. The first part includes the CPP guard and imports the 9467 required standard library components, and the declaration of the parser 9468 class. 9469 9470 #ifndef CALCXX_DRIVER_HH 9471 # define CALCXX_DRIVER_HH 9472 # include <string> 9473 # include <map> 9474 # include "calc++-parser.hh" 9475 9476 Then comes the declaration of the scanning function. Flex expects the 9477 signature of `yylex' to be defined in the macro `YY_DECL', and the C++ 9478 parser expects it to be declared. We can factor both as follows. 9479 9480 // Tell Flex the lexer's prototype ... 9481 # define YY_DECL \ 9482 yy::calcxx_parser::token_type \ 9483 yylex (yy::calcxx_parser::semantic_type* yylval, \ 9484 yy::calcxx_parser::location_type* yylloc, \ 9485 calcxx_driver& driver) 9486 // ... and declare it for the parser's sake. 9487 YY_DECL; 9488 9489 The `calcxx_driver' class is then declared with its most obvious 9490 members. 9491 9492 // Conducting the whole scanning and parsing of Calc++. 9493 class calcxx_driver 9494 { 9495 public: 9496 calcxx_driver (); 9497 virtual ~calcxx_driver (); 9498 9499 std::map<std::string, int> variables; 9500 9501 int result; 9502 9503 To encapsulate the coordination with the Flex scanner, it is useful to 9504 have two members function to open and close the scanning phase. 9505 9506 // Handling the scanner. 9507 void scan_begin (); 9508 void scan_end (); 9509 bool trace_scanning; 9510 9511 Similarly for the parser itself. 9512 9513 // Run the parser. Return 0 on success. 9514 int parse (const std::string& f); 9515 std::string file; 9516 bool trace_parsing; 9517 9518 To demonstrate pure handling of parse errors, instead of simply dumping 9519 them on the standard error output, we will pass them to the compiler 9520 driver using the following two member functions. Finally, we close the 9521 class declaration and CPP guard. 9522 9523 // Error handling. 9524 void error (const yy::location& l, const std::string& m); 9525 void error (const std::string& m); 9526 }; 9527 #endif // ! CALCXX_DRIVER_HH 9528 9529 The implementation of the driver is straightforward. The `parse' 9530 member function deserves some attention. The `error' functions are 9531 simple stubs, they should actually register the located error messages 9532 and set error state. 9533 9534 #include "calc++-driver.hh" 9535 #include "calc++-parser.hh" 9536 9537 calcxx_driver::calcxx_driver () 9538 : trace_scanning (false), trace_parsing (false) 9539 { 9540 variables["one"] = 1; 9541 variables["two"] = 2; 9542 } 9543 9544 calcxx_driver::~calcxx_driver () 9545 { 9546 } 9547 9548 int 9549 calcxx_driver::parse (const std::string &f) 9550 { 9551 file = f; 9552 scan_begin (); 9553 yy::calcxx_parser parser (*this); 9554 parser.set_debug_level (trace_parsing); 9555 int res = parser.parse (); 9556 scan_end (); 9557 return res; 9558 } 9559 9560 void 9561 calcxx_driver::error (const yy::location& l, const std::string& m) 9562 { 9563 std::cerr << l << ": " << m << std::endl; 9564 } 9565 9566 void 9567 calcxx_driver::error (const std::string& m) 9568 { 9569 std::cerr << m << std::endl; 9570 } 9571 9572 9573 File: bison.info, Node: Calc++ Parser, Next: Calc++ Scanner, Prev: Calc++ Parsing Driver, Up: A Complete C++ Example 9574 9575 10.1.6.3 Calc++ Parser 9576 ...................... 9577 9578 The grammar file `calc++-parser.yy' starts by asking for the C++ 9579 deterministic parser skeleton, the creation of the parser header file, 9580 and specifies the name of the parser class. Because the C++ skeleton 9581 changed several times, it is safer to require the version you designed 9582 the grammar for. 9583 9584 %skeleton "lalr1.cc" /* -*- C++ -*- */ 9585 %require "2.7" 9586 %defines 9587 %define parser_class_name "calcxx_parser" 9588 9589 Then come the declarations/inclusions needed to define the `%union'. 9590 Because the parser uses the parsing driver and reciprocally, both 9591 cannot include the header of the other. Because the driver's header 9592 needs detailed knowledge about the parser class (in particular its 9593 inner types), it is the parser's header which will simply use a forward 9594 declaration of the driver. *Note %code Summary::. 9595 9596 %code requires { 9597 # include <string> 9598 class calcxx_driver; 9599 } 9600 9601 The driver is passed by reference to the parser and to the scanner. 9602 This provides a simple but effective pure interface, not relying on 9603 global variables. 9604 9605 // The parsing context. 9606 %parse-param { calcxx_driver& driver } 9607 %lex-param { calcxx_driver& driver } 9608 9609 Then we request the location tracking feature, and initialize the first 9610 location's file name. Afterward new locations are computed relatively 9611 to the previous locations: the file name will be automatically 9612 propagated. 9613 9614 %locations 9615 %initial-action 9616 { 9617 // Initialize the initial location. 9618 @$.begin.filename = @$.end.filename = &driver.file; 9619 }; 9620 9621 Use the two following directives to enable parser tracing and verbose 9622 error messages. However, verbose error messages can contain incorrect 9623 information (*note LAC::). 9624 9625 %debug 9626 %error-verbose 9627 9628 Semantic values cannot use "real" objects, but only pointers to them. 9629 9630 // Symbols. 9631 %union 9632 { 9633 int ival; 9634 std::string *sval; 9635 }; 9636 9637 The code between `%code {' and `}' is output in the `*.cc' file; it 9638 needs detailed knowledge about the driver. 9639 9640 %code { 9641 # include "calc++-driver.hh" 9642 } 9643 9644 The token numbered as 0 corresponds to end of file; the following line 9645 allows for nicer error messages referring to "end of file" instead of 9646 "$end". Similarly user friendly named are provided for each symbol. 9647 Note that the tokens names are prefixed by `TOKEN_' to avoid name 9648 clashes. 9649 9650 %token END 0 "end of file" 9651 %token ASSIGN ":=" 9652 %token <sval> IDENTIFIER "identifier" 9653 %token <ival> NUMBER "number" 9654 %type <ival> exp 9655 9656 To enable memory deallocation during error recovery, use `%destructor'. 9657 9658 %printer { yyoutput << *$$; } "identifier" 9659 %destructor { delete $$; } "identifier" 9660 9661 %printer { yyoutput << $$; } <ival> 9662 9663 The grammar itself is straightforward. 9664 9665 %% 9666 %start unit; 9667 unit: assignments exp { driver.result = $2; }; 9668 9669 assignments: 9670 /* Nothing. */ {} 9671 | assignments assignment {}; 9672 9673 assignment: 9674 "identifier" ":=" exp 9675 { driver.variables[*$1] = $3; delete $1; }; 9676 9677 %left '+' '-'; 9678 %left '*' '/'; 9679 exp: exp '+' exp { $$ = $1 + $3; } 9680 | exp '-' exp { $$ = $1 - $3; } 9681 | exp '*' exp { $$ = $1 * $3; } 9682 | exp '/' exp { $$ = $1 / $3; } 9683 | "identifier" { $$ = driver.variables[*$1]; delete $1; } 9684 | "number" { $$ = $1; }; 9685 %% 9686 9687 Finally the `error' member function registers the errors to the driver. 9688 9689 void 9690 yy::calcxx_parser::error (const yy::calcxx_parser::location_type& l, 9691 const std::string& m) 9692 { 9693 driver.error (l, m); 9694 } 9695 9696 9697 File: bison.info, Node: Calc++ Scanner, Next: Calc++ Top Level, Prev: Calc++ Parser, Up: A Complete C++ Example 9698 9699 10.1.6.4 Calc++ Scanner 9700 ....................... 9701 9702 The Flex scanner first includes the driver declaration, then the 9703 parser's to get the set of defined tokens. 9704 9705 %{ /* -*- C++ -*- */ 9706 # include <cstdlib> 9707 # include <cerrno> 9708 # include <climits> 9709 # include <string> 9710 # include "calc++-driver.hh" 9711 # include "calc++-parser.hh" 9712 9713 /* Work around an incompatibility in flex (at least versions 9714 2.5.31 through 2.5.33): it generates code that does 9715 not conform to C89. See Debian bug 333231 9716 <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=333231>. */ 9717 # undef yywrap 9718 # define yywrap() 1 9719 9720 /* By default yylex returns int, we use token_type. 9721 Unfortunately yyterminate by default returns 0, which is 9722 not of token_type. */ 9723 #define yyterminate() return token::END 9724 %} 9725 9726 Because there is no `#include'-like feature we don't need `yywrap', we 9727 don't need `unput' either, and we parse an actual file, this is not an 9728 interactive session with the user. Finally we enable the scanner 9729 tracing features. 9730 9731 %option noyywrap nounput batch debug 9732 9733 Abbreviations allow for more readable rules. 9734 9735 id [a-zA-Z][a-zA-Z_0-9]* 9736 int [0-9]+ 9737 blank [ \t] 9738 9739 The following paragraph suffices to track locations accurately. Each 9740 time `yylex' is invoked, the begin position is moved onto the end 9741 position. Then when a pattern is matched, the end position is advanced 9742 of its width. In case it matched ends of lines, the end cursor is 9743 adjusted, and each time blanks are matched, the begin cursor is moved 9744 onto the end cursor to effectively ignore the blanks preceding tokens. 9745 Comments would be treated equally. 9746 9747 %{ 9748 # define YY_USER_ACTION yylloc->columns (yyleng); 9749 %} 9750 %% 9751 %{ 9752 yylloc->step (); 9753 %} 9754 {blank}+ yylloc->step (); 9755 [\n]+ yylloc->lines (yyleng); yylloc->step (); 9756 9757 The rules are simple, just note the use of the driver to report errors. 9758 It is convenient to use a typedef to shorten 9759 `yy::calcxx_parser::token::identifier' into `token::identifier' for 9760 instance. 9761 9762 %{ 9763 typedef yy::calcxx_parser::token token; 9764 %} 9765 /* Convert ints to the actual type of tokens. */ 9766 [-+*/] return yy::calcxx_parser::token_type (yytext[0]); 9767 9768 ":=" return token::ASSIGN; 9769 9770 {int} { 9771 errno = 0; 9772 long n = strtol (yytext, NULL, 10); 9773 if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE)) 9774 driver.error (*yylloc, "integer is out of range"); 9775 yylval->ival = n; 9776 return token::NUMBER; 9777 } 9778 9779 {id} { 9780 yylval->sval = new std::string (yytext); 9781 return token::IDENTIFIER; 9782 } 9783 9784 . driver.error (*yylloc, "invalid character"); 9785 %% 9786 9787 Finally, because the scanner related driver's member function depend on 9788 the scanner's data, it is simpler to implement them in this file. 9789 9790 void 9791 calcxx_driver::scan_begin () 9792 { 9793 yy_flex_debug = trace_scanning; 9794 if (file.empty () || file == "-") 9795 yyin = stdin; 9796 else if (!(yyin = fopen (file.c_str (), "r"))) 9797 { 9798 error ("cannot open " + file + ": " + strerror(errno)); 9799 exit (EXIT_FAILURE); 9800 } 9801 } 9802 9803 void 9804 calcxx_driver::scan_end () 9805 { 9806 fclose (yyin); 9807 } 9808 9809 9810 File: bison.info, Node: Calc++ Top Level, Prev: Calc++ Scanner, Up: A Complete C++ Example 9811 9812 10.1.6.5 Calc++ Top Level 9813 ......................... 9814 9815 The top level file, `calc++.cc', poses no problem. 9816 9817 #include <iostream> 9818 #include "calc++-driver.hh" 9819 9820 int 9821 main (int argc, char *argv[]) 9822 { 9823 calcxx_driver driver; 9824 for (int i = 1; i < argc; ++i) 9825 if (argv[i] == std::string ("-p")) 9826 driver.trace_parsing = true; 9827 else if (argv[i] == std::string ("-s")) 9828 driver.trace_scanning = true; 9829 else if (!driver.parse (argv[i])) 9830 std::cout << driver.result << std::endl; 9831 } 9832 9833 9834 File: bison.info, Node: Java Parsers, Prev: C++ Parsers, Up: Other Languages 9835 9836 10.2 Java Parsers 9837 ================= 9838 9839 * Menu: 9840 9841 * Java Bison Interface:: Asking for Java parser generation 9842 * Java Semantic Values:: %type and %token vs. Java 9843 * Java Location Values:: The position and location classes 9844 * Java Parser Interface:: Instantiating and running the parser 9845 * Java Scanner Interface:: Specifying the scanner for the parser 9846 * Java Action Features:: Special features for use in actions 9847 * Java Differences:: Differences between C/C++ and Java Grammars 9848 * Java Declarations Summary:: List of Bison declarations used with Java 9849 9850 9851 File: bison.info, Node: Java Bison Interface, Next: Java Semantic Values, Up: Java Parsers 9852 9853 10.2.1 Java Bison Interface 9854 --------------------------- 9855 9856 (The current Java interface is experimental and may evolve. More user 9857 feedback will help to stabilize it.) 9858 9859 The Java parser skeletons are selected using the `%language "Java"' 9860 directive or the `-L java'/`--language=java' option. 9861 9862 When generating a Java parser, `bison BASENAME.y' will create a 9863 single Java source file named `BASENAME.java' containing the parser 9864 implementation. Using a grammar file without a `.y' suffix is 9865 currently broken. The basename of the parser implementation file can 9866 be changed by the `%file-prefix' directive or the `-p'/`--name-prefix' 9867 option. The entire parser implementation file name can be changed by 9868 the `%output' directive or the `-o'/`--output' option. The parser 9869 implementation file contains a single class for the parser. 9870 9871 You can create documentation for generated parsers using Javadoc. 9872 9873 Contrary to C parsers, Java parsers do not use global variables; the 9874 state of the parser is always local to an instance of the parser class. 9875 Therefore, all Java parsers are "pure", and the `%pure-parser' and 9876 `%define api.pure full' directives does not do anything when used in 9877 Java. 9878 9879 Push parsers are currently unsupported in Java and `%define 9880 api.push-pull' have no effect. 9881 9882 GLR parsers are currently unsupported in Java. Do not use the 9883 `glr-parser' directive. 9884 9885 No header file can be generated for Java parsers. Do not use the 9886 `%defines' directive or the `-d'/`--defines' options. 9887 9888 Currently, support for debugging and verbose errors are always 9889 compiled in. Thus the `%debug' and `%token-table' directives and the 9890 `-t'/`--debug' and `-k'/`--token-table' options have no effect. This 9891 may change in the future to eliminate unused code in the generated 9892 parser, so use `%debug' and `%verbose-error' explicitly if needed. 9893 Also, in the future the `%token-table' directive might enable a public 9894 interface to access the token names and codes. 9895 9896 9897 File: bison.info, Node: Java Semantic Values, Next: Java Location Values, Prev: Java Bison Interface, Up: Java Parsers 9898 9899 10.2.2 Java Semantic Values 9900 --------------------------- 9901 9902 There is no `%union' directive in Java parsers. Instead, the semantic 9903 values' types (class names) should be specified in the `%type' or 9904 `%token' directive: 9905 9906 %type <Expression> expr assignment_expr term factor 9907 %type <Integer> number 9908 9909 By default, the semantic stack is declared to have `Object' members, 9910 which means that the class types you specify can be of any class. To 9911 improve the type safety of the parser, you can declare the common 9912 superclass of all the semantic values using the `%define stype' 9913 directive. For example, after the following declaration: 9914 9915 %define stype "ASTNode" 9916 9917 any `%type' or `%token' specifying a semantic type which is not a 9918 subclass of ASTNode, will cause a compile-time error. 9919 9920 Types used in the directives may be qualified with a package name. 9921 Primitive data types are accepted for Java version 1.5 or later. Note 9922 that in this case the autoboxing feature of Java 1.5 will be used. 9923 Generic types may not be used; this is due to a limitation in the 9924 implementation of Bison, and may change in future releases. 9925 9926 Java parsers do not support `%destructor', since the language adopts 9927 garbage collection. The parser will try to hold references to semantic 9928 values for as little time as needed. 9929 9930 Java parsers do not support `%printer', as `toString()' can be used 9931 to print the semantic values. This however may change (in a 9932 backwards-compatible way) in future versions of Bison. 9933 9934 9935 File: bison.info, Node: Java Location Values, Next: Java Parser Interface, Prev: Java Semantic Values, Up: Java Parsers 9936 9937 10.2.3 Java Location Values 9938 --------------------------- 9939 9940 When the directive `%locations' is used, the Java parser supports 9941 location tracking, see *note Tracking Locations::. An auxiliary 9942 user-defined class defines a "position", a single point in a file; 9943 Bison itself defines a class representing a "location", a range 9944 composed of a pair of positions (possibly spanning several files). The 9945 location class is an inner class of the parser; the name is `Location' 9946 by default, and may also be renamed using `%define api.location.type 9947 "CLASS-NAME"'. 9948 9949 The location class treats the position as a completely opaque value. 9950 By default, the class name is `Position', but this can be changed with 9951 `%define api.position.type "CLASS-NAME"'. This class must be supplied 9952 by the user. 9953 9954 -- Instance Variable of Location: Position begin 9955 -- Instance Variable of Location: Position end 9956 The first, inclusive, position of the range, and the first beyond. 9957 9958 -- Constructor on Location: Location (Position LOC) 9959 Create a `Location' denoting an empty range located at a given 9960 point. 9961 9962 -- Constructor on Location: Location (Position BEGIN, Position END) 9963 Create a `Location' from the endpoints of the range. 9964 9965 -- Method on Location: String toString () 9966 Prints the range represented by the location. For this to work 9967 properly, the position class should override the `equals' and 9968 `toString' methods appropriately. 9969 9970 9971 File: bison.info, Node: Java Parser Interface, Next: Java Scanner Interface, Prev: Java Location Values, Up: Java Parsers 9972 9973 10.2.4 Java Parser Interface 9974 ---------------------------- 9975 9976 The name of the generated parser class defaults to `YYParser'. The 9977 `YY' prefix may be changed using the `%name-prefix' directive or the 9978 `-p'/`--name-prefix' option. Alternatively, use `%define 9979 parser_class_name "NAME"' to give a custom name to the class. The 9980 interface of this class is detailed below. 9981 9982 By default, the parser class has package visibility. A declaration 9983 `%define public' will change to public visibility. Remember that, 9984 according to the Java language specification, the name of the `.java' 9985 file should match the name of the class in this case. Similarly, you 9986 can use `abstract', `final' and `strictfp' with the `%define' 9987 declaration to add other modifiers to the parser class. 9988 9989 The Java package name of the parser class can be specified using the 9990 `%define package' directive. The superclass and the implemented 9991 interfaces of the parser class can be specified with the `%define 9992 extends' and `%define implements' directives. 9993 9994 The parser class defines an inner class, `Location', that is used 9995 for location tracking (see *note Java Location Values::), and a inner 9996 interface, `Lexer' (see *note Java Scanner Interface::). Other than 9997 these inner class/interface, and the members described in the interface 9998 below, all the other members and fields are preceded with a `yy' or 9999 `YY' prefix to avoid clashes with user code. 10000 10001 The parser class can be extended using the `%parse-param' directive. 10002 Each occurrence of the directive will add a `protected final' field to 10003 the parser class, and an argument to its constructor, which initialize 10004 them automatically. 10005 10006 Token names defined by `%token' and the predefined `EOF' token name 10007 are added as constant fields to the parser class. 10008 10009 -- Constructor on YYParser: YYParser (LEX_PARAM, ..., PARSE_PARAM, 10010 ...) 10011 Build a new parser object with embedded `%code lexer'. There are 10012 no parameters, unless `%parse-param's and/or `%lex-param's are 10013 used. 10014 10015 -- Constructor on YYParser: YYParser (Lexer LEXER, PARSE_PARAM, ...) 10016 Build a new parser object using the specified scanner. There are 10017 no additional parameters unless `%parse-param's are used. 10018 10019 If the scanner is defined by `%code lexer', this constructor is 10020 declared `protected' and is called automatically with a scanner 10021 created with the correct `%lex-param's. 10022 10023 -- Method on YYParser: boolean parse () 10024 Run the syntactic analysis, and return `true' on success, `false' 10025 otherwise. 10026 10027 -- Method on YYParser: boolean recovering () 10028 During the syntactic analysis, return `true' if recovering from a 10029 syntax error. *Note Error Recovery::. 10030 10031 -- Method on YYParser: java.io.PrintStream getDebugStream () 10032 -- Method on YYParser: void setDebugStream (java.io.printStream O) 10033 Get or set the stream used for tracing the parsing. It defaults to 10034 `System.err'. 10035 10036 -- Method on YYParser: int getDebugLevel () 10037 -- Method on YYParser: void setDebugLevel (int L) 10038 Get or set the tracing level. Currently its value is either 0, no 10039 trace, or nonzero, full tracing. 10040 10041 10042 File: bison.info, Node: Java Scanner Interface, Next: Java Action Features, Prev: Java Parser Interface, Up: Java Parsers 10043 10044 10.2.5 Java Scanner Interface 10045 ----------------------------- 10046 10047 There are two possible ways to interface a Bison-generated Java parser 10048 with a scanner: the scanner may be defined by `%code lexer', or defined 10049 elsewhere. In either case, the scanner has to implement the `Lexer' 10050 inner interface of the parser class. 10051 10052 In the first case, the body of the scanner class is placed in `%code 10053 lexer' blocks. If you want to pass parameters from the parser 10054 constructor to the scanner constructor, specify them with `%lex-param'; 10055 they are passed before `%parse-param's to the constructor. 10056 10057 In the second case, the scanner has to implement the `Lexer' 10058 interface, which is defined within the parser class (e.g., 10059 `YYParser.Lexer'). The constructor of the parser object will then 10060 accept an object implementing the interface; `%lex-param' is not used 10061 in this case. 10062 10063 In both cases, the scanner has to implement the following methods. 10064 10065 -- Method on Lexer: void yyerror (Location LOC, String MSG) 10066 This method is defined by the user to emit an error message. The 10067 first parameter is omitted if location tracking is not active. 10068 Its type can be changed using `%define api.location.type 10069 "CLASS-NAME".' 10070 10071 -- Method on Lexer: int yylex () 10072 Return the next token. Its type is the return value, its semantic 10073 value and location are saved and returned by the their methods in 10074 the interface. 10075 10076 Use `%define lex_throws' to specify any uncaught exceptions. 10077 Default is `java.io.IOException'. 10078 10079 -- Method on Lexer: Position getStartPos () 10080 -- Method on Lexer: Position getEndPos () 10081 Return respectively the first position of the last token that 10082 `yylex' returned, and the first position beyond it. These methods 10083 are not needed unless location tracking is active. 10084 10085 The return type can be changed using `%define api.position.type 10086 "CLASS-NAME".' 10087 10088 -- Method on Lexer: Object getLVal () 10089 Return the semantic value of the last token that yylex returned. 10090 10091 The return type can be changed using `%define stype "CLASS-NAME".' 10092 10093 10094 File: bison.info, Node: Java Action Features, Next: Java Differences, Prev: Java Scanner Interface, Up: Java Parsers 10095 10096 10.2.6 Special Features for Use in Java Actions 10097 ----------------------------------------------- 10098 10099 The following special constructs can be uses in Java actions. Other 10100 analogous C action features are currently unavailable for Java. 10101 10102 Use `%define throws' to specify any uncaught exceptions from parser 10103 actions, and initial actions specified by `%initial-action'. 10104 10105 -- Variable: $N 10106 The semantic value for the Nth component of the current rule. 10107 This may not be assigned to. *Note Java Semantic Values::. 10108 10109 -- Variable: $<TYPEALT>N 10110 Like `$N' but specifies a alternative type TYPEALT. *Note Java 10111 Semantic Values::. 10112 10113 -- Variable: $$ 10114 The semantic value for the grouping made by the current rule. As a 10115 value, this is in the base type (`Object' or as specified by 10116 `%define stype') as in not cast to the declared subtype because 10117 casts are not allowed on the left-hand side of Java assignments. 10118 Use an explicit Java cast if the correct subtype is needed. *Note 10119 Java Semantic Values::. 10120 10121 -- Variable: $<TYPEALT>$ 10122 Same as `$$' since Java always allow assigning to the base type. 10123 Perhaps we should use this and `$<>$' for the value and `$$' for 10124 setting the value but there is currently no easy way to distinguish 10125 these constructs. *Note Java Semantic Values::. 10126 10127 -- Variable: @N 10128 The location information of the Nth component of the current rule. 10129 This may not be assigned to. *Note Java Location Values::. 10130 10131 -- Variable: @$ 10132 The location information of the grouping made by the current rule. 10133 *Note Java Location Values::. 10134 10135 -- Statement: return YYABORT `;' 10136 Return immediately from the parser, indicating failure. *Note 10137 Java Parser Interface::. 10138 10139 -- Statement: return YYACCEPT `;' 10140 Return immediately from the parser, indicating success. *Note 10141 Java Parser Interface::. 10142 10143 -- Statement: return YYERROR `;' 10144 Start error recovery (without printing an error message). *Note 10145 Error Recovery::. 10146 10147 -- Function: boolean recovering () 10148 Return whether error recovery is being done. In this state, the 10149 parser reads token until it reaches a known state, and then 10150 restarts normal operation. *Note Error Recovery::. 10151 10152 -- Function: protected void yyerror (String msg) 10153 -- Function: protected void yyerror (Position pos, String msg) 10154 -- Function: protected void yyerror (Location loc, String msg) 10155 Print an error message using the `yyerror' method of the scanner 10156 instance in use. 10157 10158 10159 File: bison.info, Node: Java Differences, Next: Java Declarations Summary, Prev: Java Action Features, Up: Java Parsers 10160 10161 10.2.7 Differences between C/C++ and Java Grammars 10162 -------------------------------------------------- 10163 10164 The different structure of the Java language forces several differences 10165 between C/C++ grammars, and grammars designed for Java parsers. This 10166 section summarizes these differences. 10167 10168 * Java lacks a preprocessor, so the `YYERROR', `YYACCEPT', `YYABORT' 10169 symbols (*note Table of Symbols::) cannot obviously be macros. 10170 Instead, they should be preceded by `return' when they appear in 10171 an action. The actual definition of these symbols is opaque to 10172 the Bison grammar, and it might change in the future. The only 10173 meaningful operation that you can do, is to return them. *Note 10174 Java Action Features::. 10175 10176 Note that of these three symbols, only `YYACCEPT' and `YYABORT' 10177 will cause a return from the `yyparse' method(1). 10178 10179 * Java lacks unions, so `%union' has no effect. Instead, semantic 10180 values have a common base type: `Object' or as specified by 10181 `%define stype'. Angle brackets on `%token', `type', `$N' and 10182 `$$' specify subtypes rather than fields of an union. The type of 10183 `$$', even with angle brackets, is the base type since Java casts 10184 are not allow on the left-hand side of assignments. Also, `$N' 10185 and `@N' are not allowed on the left-hand side of assignments. 10186 *Note Java Semantic Values::, and *note Java Action Features::. 10187 10188 * The prologue declarations have a different meaning than in C/C++ 10189 code. 10190 `%code imports' 10191 blocks are placed at the beginning of the Java source code. 10192 They may include copyright notices. For a `package' 10193 declarations, it is suggested to use `%define package' 10194 instead. 10195 10196 unqualified `%code' 10197 blocks are placed inside the parser class. 10198 10199 `%code lexer' 10200 blocks, if specified, should include the implementation of the 10201 scanner. If there is no such block, the scanner can be any 10202 class that implements the appropriate interface (*note Java 10203 Scanner Interface::). 10204 10205 Other `%code' blocks are not supported in Java parsers. In 10206 particular, `%{ ... %}' blocks should not be used and may give an 10207 error in future versions of Bison. 10208 10209 The epilogue has the same meaning as in C/C++ code and it can be 10210 used to define other classes used by the parser _outside_ the 10211 parser class. 10212 10213 ---------- Footnotes ---------- 10214 10215 (1) Java parsers include the actions in a separate method than 10216 `yyparse' in order to have an intuitive syntax that corresponds to 10217 these C macros. 10218 10219 10220 File: bison.info, Node: Java Declarations Summary, Prev: Java Differences, Up: Java Parsers 10221 10222 10.2.8 Java Declarations Summary 10223 -------------------------------- 10224 10225 This summary only include declarations specific to Java or have special 10226 meaning when used in a Java parser. 10227 10228 -- Directive: %language "Java" 10229 Generate a Java class for the parser. 10230 10231 -- Directive: %lex-param {TYPE NAME} 10232 A parameter for the lexer class defined by `%code lexer' _only_, 10233 added as parameters to the lexer constructor and the parser 10234 constructor that _creates_ a lexer. Default is none. *Note Java 10235 Scanner Interface::. 10236 10237 -- Directive: %name-prefix "PREFIX" 10238 The prefix of the parser class name `PREFIXParser' if `%define 10239 parser_class_name' is not used. Default is `YY'. *Note Java 10240 Bison Interface::. 10241 10242 -- Directive: %parse-param {TYPE NAME} 10243 A parameter for the parser class added as parameters to 10244 constructor(s) and as fields initialized by the constructor(s). 10245 Default is none. *Note Java Parser Interface::. 10246 10247 -- Directive: %token <TYPE> TOKEN ... 10248 Declare tokens. Note that the angle brackets enclose a Java 10249 _type_. *Note Java Semantic Values::. 10250 10251 -- Directive: %type <TYPE> NONTERMINAL ... 10252 Declare the type of nonterminals. Note that the angle brackets 10253 enclose a Java _type_. *Note Java Semantic Values::. 10254 10255 -- Directive: %code { CODE ... } 10256 Code appended to the inside of the parser class. *Note Java 10257 Differences::. 10258 10259 -- Directive: %code imports { CODE ... } 10260 Code inserted just after the `package' declaration. *Note Java 10261 Differences::. 10262 10263 -- Directive: %code lexer { CODE ... } 10264 Code added to the body of a inner lexer class within the parser 10265 class. *Note Java Scanner Interface::. 10266 10267 -- Directive: %% CODE ... 10268 Code (after the second `%%') appended to the end of the file, 10269 _outside_ the parser class. *Note Java Differences::. 10270 10271 -- Directive: %{ CODE ... %} 10272 Not supported. Use `%code import' instead. *Note Java 10273 Differences::. 10274 10275 -- Directive: %define abstract 10276 Whether the parser class is declared `abstract'. Default is false. 10277 *Note Java Bison Interface::. 10278 10279 -- Directive: %define extends "SUPERCLASS" 10280 The superclass of the parser class. Default is none. *Note Java 10281 Bison Interface::. 10282 10283 -- Directive: %define final 10284 Whether the parser class is declared `final'. Default is false. 10285 *Note Java Bison Interface::. 10286 10287 -- Directive: %define implements "INTERFACES" 10288 The implemented interfaces of the parser class, a comma-separated 10289 list. Default is none. *Note Java Bison Interface::. 10290 10291 -- Directive: %define lex_throws "EXCEPTIONS" 10292 The exceptions thrown by the `yylex' method of the lexer, a 10293 comma-separated list. Default is `java.io.IOException'. *Note 10294 Java Scanner Interface::. 10295 10296 -- Directive: %define api.location.type "CLASS" 10297 The name of the class used for locations (a range between two 10298 positions). This class is generated as an inner class of the 10299 parser class by `bison'. Default is `Location'. Formerly named 10300 `location_type'. *Note Java Location Values::. 10301 10302 -- Directive: %define package "PACKAGE" 10303 The package to put the parser class in. Default is none. *Note 10304 Java Bison Interface::. 10305 10306 -- Directive: %define parser_class_name "NAME" 10307 The name of the parser class. Default is `YYParser' or 10308 `NAME-PREFIXParser'. *Note Java Bison Interface::. 10309 10310 -- Directive: %define api.position.type "CLASS" 10311 The name of the class used for positions. This class must be 10312 supplied by the user. Default is `Position'. Formerly named 10313 `position_type'. *Note Java Location Values::. 10314 10315 -- Directive: %define public 10316 Whether the parser class is declared `public'. Default is false. 10317 *Note Java Bison Interface::. 10318 10319 -- Directive: %define stype "CLASS" 10320 The base type of semantic values. Default is `Object'. *Note 10321 Java Semantic Values::. 10322 10323 -- Directive: %define strictfp 10324 Whether the parser class is declared `strictfp'. Default is false. 10325 *Note Java Bison Interface::. 10326 10327 -- Directive: %define throws "EXCEPTIONS" 10328 The exceptions thrown by user-supplied parser actions and 10329 `%initial-action', a comma-separated list. Default is none. 10330 *Note Java Parser Interface::. 10331 10332 10333 File: bison.info, Node: FAQ, Next: Table of Symbols, Prev: Other Languages, Up: Top 10334 10335 11 Frequently Asked Questions 10336 ***************************** 10337 10338 Several questions about Bison come up occasionally. Here some of them 10339 are addressed. 10340 10341 * Menu: 10342 10343 * Memory Exhausted:: Breaking the Stack Limits 10344 * How Can I Reset the Parser:: `yyparse' Keeps some State 10345 * Strings are Destroyed:: `yylval' Loses Track of Strings 10346 * Implementing Gotos/Loops:: Control Flow in the Calculator 10347 * Multiple start-symbols:: Factoring closely related grammars 10348 * Secure? Conform?:: Is Bison POSIX safe? 10349 * I can't build Bison:: Troubleshooting 10350 * Where can I find help?:: Troubleshouting 10351 * Bug Reports:: Troublereporting 10352 * More Languages:: Parsers in C++, Java, and so on 10353 * Beta Testing:: Experimenting development versions 10354 * Mailing Lists:: Meeting other Bison users 10355 10356 10357 File: bison.info, Node: Memory Exhausted, Next: How Can I Reset the Parser, Up: FAQ 10358 10359 11.1 Memory Exhausted 10360 ===================== 10361 10362 My parser returns with error with a `memory exhausted' message. 10363 What can I do? 10364 10365 This question is already addressed elsewhere, see *note Recursive 10366 Rules: Recursion. 10367 10368 10369 File: bison.info, Node: How Can I Reset the Parser, Next: Strings are Destroyed, Prev: Memory Exhausted, Up: FAQ 10370 10371 11.2 How Can I Reset the Parser 10372 =============================== 10373 10374 The following phenomenon has several symptoms, resulting in the 10375 following typical questions: 10376 10377 I invoke `yyparse' several times, and on correct input it works 10378 properly; but when a parse error is found, all the other calls fail 10379 too. How can I reset the error flag of `yyparse'? 10380 10381 or 10382 10383 My parser includes support for an `#include'-like feature, in 10384 which case I run `yyparse' from `yyparse'. This fails although I 10385 did specify `%define api.pure full'. 10386 10387 These problems typically come not from Bison itself, but from 10388 Lex-generated scanners. Because these scanners use large buffers for 10389 speed, they might not notice a change of input file. As a 10390 demonstration, consider the following source file, `first-line.l': 10391 10392 %{ 10393 #include <stdio.h> 10394 #include <stdlib.h> 10395 %} 10396 %% 10397 .*\n ECHO; return 1; 10398 %% 10399 int 10400 yyparse (char const *file) 10401 { 10402 yyin = fopen (file, "r"); 10403 if (!yyin) 10404 { 10405 perror ("fopen"); 10406 exit (EXIT_FAILURE); 10407 } 10408 /* One token only. */ 10409 yylex (); 10410 if (fclose (yyin) != 0) 10411 { 10412 perror ("fclose"); 10413 exit (EXIT_FAILURE); 10414 } 10415 return 0; 10416 } 10417 10418 int 10419 main (void) 10420 { 10421 yyparse ("input"); 10422 yyparse ("input"); 10423 return 0; 10424 } 10425 10426 If the file `input' contains 10427 10428 input:1: Hello, 10429 input:2: World! 10430 10431 then instead of getting the first line twice, you get: 10432 10433 $ flex -ofirst-line.c first-line.l 10434 $ gcc -ofirst-line first-line.c -ll 10435 $ ./first-line 10436 input:1: Hello, 10437 input:2: World! 10438 10439 Therefore, whenever you change `yyin', you must tell the 10440 Lex-generated scanner to discard its current buffer and switch to the 10441 new one. This depends upon your implementation of Lex; see its 10442 documentation for more. For Flex, it suffices to call 10443 `YY_FLUSH_BUFFER' after each change to `yyin'. If your Flex-generated 10444 scanner needs to read from several input streams to handle features 10445 like include files, you might consider using Flex functions like 10446 `yy_switch_to_buffer' that manipulate multiple input buffers. 10447 10448 If your Flex-generated scanner uses start conditions (*note Start 10449 conditions: (flex)Start conditions.), you might also want to reset the 10450 scanner's state, i.e., go back to the initial start condition, through 10451 a call to `BEGIN (0)'. 10452 10453 10454 File: bison.info, Node: Strings are Destroyed, Next: Implementing Gotos/Loops, Prev: How Can I Reset the Parser, Up: FAQ 10455 10456 11.3 Strings are Destroyed 10457 ========================== 10458 10459 My parser seems to destroy old strings, or maybe it loses track of 10460 them. Instead of reporting `"foo", "bar"', it reports `"bar", 10461 "bar"', or even `"foo\nbar", "bar"'. 10462 10463 This error is probably the single most frequent "bug report" sent to 10464 Bison lists, but is only concerned with a misunderstanding of the role 10465 of the scanner. Consider the following Lex code: 10466 10467 %{ 10468 #include <stdio.h> 10469 char *yylval = NULL; 10470 %} 10471 %% 10472 .* yylval = yytext; return 1; 10473 \n /* IGNORE */ 10474 %% 10475 int 10476 main () 10477 { 10478 /* Similar to using $1, $2 in a Bison action. */ 10479 char *fst = (yylex (), yylval); 10480 char *snd = (yylex (), yylval); 10481 printf ("\"%s\", \"%s\"\n", fst, snd); 10482 return 0; 10483 } 10484 10485 If you compile and run this code, you get: 10486 10487 $ flex -osplit-lines.c split-lines.l 10488 $ gcc -osplit-lines split-lines.c -ll 10489 $ printf 'one\ntwo\n' | ./split-lines 10490 "one 10491 two", "two" 10492 10493 this is because `yytext' is a buffer provided for _reading_ in the 10494 action, but if you want to keep it, you have to duplicate it (e.g., 10495 using `strdup'). Note that the output may depend on how your 10496 implementation of Lex handles `yytext'. For instance, when given the 10497 Lex compatibility option `-l' (which triggers the option `%array') Flex 10498 generates a different behavior: 10499 10500 $ flex -l -osplit-lines.c split-lines.l 10501 $ gcc -osplit-lines split-lines.c -ll 10502 $ printf 'one\ntwo\n' | ./split-lines 10503 "two", "two" 10504 10505 10506 File: bison.info, Node: Implementing Gotos/Loops, Next: Multiple start-symbols, Prev: Strings are Destroyed, Up: FAQ 10507 10508 11.4 Implementing Gotos/Loops 10509 ============================= 10510 10511 My simple calculator supports variables, assignments, and 10512 functions, but how can I implement gotos, or loops? 10513 10514 Although very pedagogical, the examples included in the document blur 10515 the distinction to make between the parser--whose job is to recover the 10516 structure of a text and to transmit it to subsequent modules of the 10517 program--and the processing (such as the execution) of this structure. 10518 This works well with so called straight line programs, i.e., precisely 10519 those that have a straightforward execution model: execute simple 10520 instructions one after the others. 10521 10522 If you want a richer model, you will probably need to use the parser 10523 to construct a tree that does represent the structure it has recovered; 10524 this tree is usually called the "abstract syntax tree", or "AST" for 10525 short. Then, walking through this tree, traversing it in various ways, 10526 will enable treatments such as its execution or its translation, which 10527 will result in an interpreter or a compiler. 10528 10529 This topic is way beyond the scope of this manual, and the reader is 10530 invited to consult the dedicated literature. 10531 10532 10533 File: bison.info, Node: Multiple start-symbols, Next: Secure? Conform?, Prev: Implementing Gotos/Loops, Up: FAQ 10534 10535 11.5 Multiple start-symbols 10536 =========================== 10537 10538 I have several closely related grammars, and I would like to share 10539 their implementations. In fact, I could use a single grammar but 10540 with multiple entry points. 10541 10542 Bison does not support multiple start-symbols, but there is a very 10543 simple means to simulate them. If `foo' and `bar' are the two pseudo 10544 start-symbols, then introduce two new tokens, say `START_FOO' and 10545 `START_BAR', and use them as switches from the real start-symbol: 10546 10547 %token START_FOO START_BAR; 10548 %start start; 10549 start: 10550 START_FOO foo 10551 | START_BAR bar; 10552 10553 These tokens prevents the introduction of new conflicts. As far as 10554 the parser goes, that is all that is needed. 10555 10556 Now the difficult part is ensuring that the scanner will send these 10557 tokens first. If your scanner is hand-written, that should be 10558 straightforward. If your scanner is generated by Lex, them there is 10559 simple means to do it: recall that anything between `%{ ... %}' after 10560 the first `%%' is copied verbatim in the top of the generated `yylex' 10561 function. Make sure a variable `start_token' is available in the 10562 scanner (e.g., a global variable or using `%lex-param' etc.), and use 10563 the following: 10564 10565 /* Prologue. */ 10566 %% 10567 %{ 10568 if (start_token) 10569 { 10570 int t = start_token; 10571 start_token = 0; 10572 return t; 10573 } 10574 %} 10575 /* The rules. */ 10576 10577 10578 File: bison.info, Node: Secure? Conform?, Next: I can't build Bison, Prev: Multiple start-symbols, Up: FAQ 10579 10580 11.6 Secure? Conform? 10581 ====================== 10582 10583 Is Bison secure? Does it conform to POSIX? 10584 10585 If you're looking for a guarantee or certification, we don't provide 10586 it. However, Bison is intended to be a reliable program that conforms 10587 to the POSIX specification for Yacc. If you run into problems, please 10588 send us a bug report. 10589 10590 10591 File: bison.info, Node: I can't build Bison, Next: Where can I find help?, Prev: Secure? Conform?, Up: FAQ 10592 10593 11.7 I can't build Bison 10594 ======================== 10595 10596 I can't build Bison because `make' complains that `msgfmt' is not 10597 found. What should I do? 10598 10599 Like most GNU packages with internationalization support, that 10600 feature is turned on by default. If you have problems building in the 10601 `po' subdirectory, it indicates that your system's internationalization 10602 support is lacking. You can re-configure Bison with `--disable-nls' to 10603 turn off this support, or you can install GNU gettext from 10604 `ftp://ftp.gnu.org/gnu/gettext/' and re-configure Bison. See the file 10605 `ABOUT-NLS' for more information. 10606 10607 10608 File: bison.info, Node: Where can I find help?, Next: Bug Reports, Prev: I can't build Bison, Up: FAQ 10609 10610 11.8 Where can I find help? 10611 =========================== 10612 10613 I'm having trouble using Bison. Where can I find help? 10614 10615 First, read this fine manual. Beyond that, you can send mail to 10616 <help-bison (a] gnu.org>. This mailing list is intended to be populated 10617 with people who are willing to answer questions about using and 10618 installing Bison. Please keep in mind that (most of) the people on the 10619 list have aspects of their lives which are not related to Bison (!), so 10620 you may not receive an answer to your question right away. This can be 10621 frustrating, but please try not to honk them off; remember that any 10622 help they provide is purely voluntary and out of the kindness of their 10623 hearts. 10624 10625 10626 File: bison.info, Node: Bug Reports, Next: More Languages, Prev: Where can I find help?, Up: FAQ 10627 10628 11.9 Bug Reports 10629 ================ 10630 10631 I found a bug. What should I include in the bug report? 10632 10633 Before you send a bug report, make sure you are using the latest 10634 version. Check `ftp://ftp.gnu.org/pub/gnu/bison/' or one of its 10635 mirrors. Be sure to include the version number in your bug report. If 10636 the bug is present in the latest version but not in a previous version, 10637 try to determine the most recent version which did not contain the bug. 10638 10639 If the bug is parser-related, you should include the smallest grammar 10640 you can which demonstrates the bug. The grammar file should also be 10641 complete (i.e., I should be able to run it through Bison without having 10642 to edit or add anything). The smaller and simpler the grammar, the 10643 easier it will be to fix the bug. 10644 10645 Include information about your compilation environment, including 10646 your operating system's name and version and your compiler's name and 10647 version. If you have trouble compiling, you should also include a 10648 transcript of the build session, starting with the invocation of 10649 `configure'. Depending on the nature of the bug, you may be asked to 10650 send additional files as well (such as `config.h' or `config.cache'). 10651 10652 Patches are most welcome, but not required. That is, do not 10653 hesitate to send a bug report just because you cannot provide a fix. 10654 10655 Send bug reports to <bug-bison (a] gnu.org>. 10656 10657 10658 File: bison.info, Node: More Languages, Next: Beta Testing, Prev: Bug Reports, Up: FAQ 10659 10660 11.10 More Languages 10661 ==================== 10662 10663 Will Bison ever have C++ and Java support? How about INSERT YOUR 10664 FAVORITE LANGUAGE HERE? 10665 10666 C++ and Java support is there now, and is documented. We'd love to 10667 add other languages; contributions are welcome. 10668 10669 10670 File: bison.info, Node: Beta Testing, Next: Mailing Lists, Prev: More Languages, Up: FAQ 10671 10672 11.11 Beta Testing 10673 ================== 10674 10675 What is involved in being a beta tester? 10676 10677 It's not terribly involved. Basically, you would download a test 10678 release, compile it, and use it to build and run a parser or two. After 10679 that, you would submit either a bug report or a message saying that 10680 everything is okay. It is important to report successes as well as 10681 failures because test releases eventually become mainstream releases, 10682 but only if they are adequately tested. If no one tests, development is 10683 essentially halted. 10684 10685 Beta testers are particularly needed for operating systems to which 10686 the developers do not have easy access. They currently have easy 10687 access to recent GNU/Linux and Solaris versions. Reports about other 10688 operating systems are especially welcome. 10689 10690 10691 File: bison.info, Node: Mailing Lists, Prev: Beta Testing, Up: FAQ 10692 10693 11.12 Mailing Lists 10694 =================== 10695 10696 How do I join the help-bison and bug-bison mailing lists? 10697 10698 See `http://lists.gnu.org/'. 10699 10700 10701 File: bison.info, Node: Table of Symbols, Next: Glossary, Prev: FAQ, Up: Top 10702 10703 Appendix A Bison Symbols 10704 ************************ 10705 10706 -- Variable: @$ 10707 In an action, the location of the left-hand side of the rule. 10708 *Note Tracking Locations::. 10709 10710 -- Variable: @N 10711 -- Symbol: @N 10712 In an action, the location of the N-th symbol of the right-hand 10713 side of the rule. *Note Tracking Locations::. 10714 10715 In a grammar, the Bison-generated nonterminal symbol for a 10716 mid-rule action with a semantical value. *Note Mid-Rule Action 10717 Translation::. 10718 10719 -- Variable: @NAME 10720 -- Variable: @[NAME] 10721 In an action, the location of a symbol addressed by NAME. *Note 10722 Tracking Locations::. 10723 10724 -- Symbol: $@N 10725 In a grammar, the Bison-generated nonterminal symbol for a 10726 mid-rule action with no semantical value. *Note Mid-Rule Action 10727 Translation::. 10728 10729 -- Variable: $$ 10730 In an action, the semantic value of the left-hand side of the rule. 10731 *Note Actions::. 10732 10733 -- Variable: $N 10734 In an action, the semantic value of the N-th symbol of the 10735 right-hand side of the rule. *Note Actions::. 10736 10737 -- Variable: $NAME 10738 -- Variable: $[NAME] 10739 In an action, the semantic value of a symbol addressed by NAME. 10740 *Note Actions::. 10741 10742 -- Delimiter: %% 10743 Delimiter used to separate the grammar rule section from the Bison 10744 declarations section or the epilogue. *Note The Overall Layout of 10745 a Bison Grammar: Grammar Layout. 10746 10747 -- Delimiter: %{CODE%} 10748 All code listed between `%{' and `%}' is copied verbatim to the 10749 parser implementation file. Such code forms the prologue of the 10750 grammar file. *Note Outline of a Bison Grammar: Grammar Outline. 10751 10752 -- Construct: /* ... */ 10753 -- Construct: // ... 10754 Comments, as in C/C++. 10755 10756 -- Delimiter: : 10757 Separates a rule's result from its components. *Note Syntax of 10758 Grammar Rules: Rules. 10759 10760 -- Delimiter: ; 10761 Terminates a rule. *Note Syntax of Grammar Rules: Rules. 10762 10763 -- Delimiter: | 10764 Separates alternate rules for the same result nonterminal. *Note 10765 Syntax of Grammar Rules: Rules. 10766 10767 -- Directive: <*> 10768 Used to define a default tagged `%destructor' or default tagged 10769 `%printer'. 10770 10771 This feature is experimental. More user feedback will help to 10772 determine whether it should become a permanent feature. 10773 10774 *Note Freeing Discarded Symbols: Destructor Decl. 10775 10776 -- Directive: <> 10777 Used to define a default tagless `%destructor' or default tagless 10778 `%printer'. 10779 10780 This feature is experimental. More user feedback will help to 10781 determine whether it should become a permanent feature. 10782 10783 *Note Freeing Discarded Symbols: Destructor Decl. 10784 10785 -- Symbol: $accept 10786 The predefined nonterminal whose only rule is `$accept: START 10787 $end', where START is the start symbol. *Note The Start-Symbol: 10788 Start Decl. It cannot be used in the grammar. 10789 10790 -- Directive: %code {CODE} 10791 -- Directive: %code QUALIFIER {CODE} 10792 Insert CODE verbatim into the output parser source at the default 10793 location or at the location specified by QUALIFIER. *Note %code 10794 Summary::. 10795 10796 -- Directive: %debug 10797 Equip the parser for debugging. *Note Decl Summary::. 10798 10799 -- Directive: %define VARIABLE 10800 -- Directive: %define VARIABLE VALUE 10801 -- Directive: %define VARIABLE "VALUE" 10802 Define a variable to adjust Bison's behavior. *Note %define 10803 Summary::. 10804 10805 -- Directive: %defines 10806 Bison declaration to create a parser header file, which is usually 10807 meant for the scanner. *Note Decl Summary::. 10808 10809 -- Directive: %defines DEFINES-FILE 10810 Same as above, but save in the file DEFINES-FILE. *Note Decl 10811 Summary::. 10812 10813 -- Directive: %destructor 10814 Specify how the parser should reclaim the memory associated to 10815 discarded symbols. *Note Freeing Discarded Symbols: Destructor 10816 Decl. 10817 10818 -- Directive: %dprec 10819 Bison declaration to assign a precedence to a rule that is used at 10820 parse time to resolve reduce/reduce conflicts. *Note Writing GLR 10821 Parsers: GLR Parsers. 10822 10823 -- Symbol: $end 10824 The predefined token marking the end of the token stream. It 10825 cannot be used in the grammar. 10826 10827 -- Symbol: error 10828 A token name reserved for error recovery. This token may be used 10829 in grammar rules so as to allow the Bison parser to recognize an 10830 error in the grammar without halting the process. In effect, a 10831 sentence containing an error may be recognized as valid. On a 10832 syntax error, the token `error' becomes the current lookahead 10833 token. Actions corresponding to `error' are then executed, and 10834 the lookahead token is reset to the token that originally caused 10835 the violation. *Note Error Recovery::. 10836 10837 -- Directive: %error-verbose 10838 Bison declaration to request verbose, specific error message 10839 strings when `yyerror' is called. *Note Error Reporting::. 10840 10841 -- Directive: %file-prefix "PREFIX" 10842 Bison declaration to set the prefix of the output files. *Note 10843 Decl Summary::. 10844 10845 -- Directive: %glr-parser 10846 Bison declaration to produce a GLR parser. *Note Writing GLR 10847 Parsers: GLR Parsers. 10848 10849 -- Directive: %initial-action 10850 Run user code before parsing. *Note Performing Actions before 10851 Parsing: Initial Action Decl. 10852 10853 -- Directive: %language 10854 Specify the programming language for the generated parser. *Note 10855 Decl Summary::. 10856 10857 -- Directive: %left 10858 Bison declaration to assign left associativity to token(s). *Note 10859 Operator Precedence: Precedence Decl. 10860 10861 -- Directive: %lex-param {ARGUMENT-DECLARATION} 10862 Bison declaration to specifying an additional parameter that 10863 `yylex' should accept. *Note Calling Conventions for Pure 10864 Parsers: Pure Calling. 10865 10866 -- Directive: %merge 10867 Bison declaration to assign a merging function to a rule. If 10868 there is a reduce/reduce conflict with a rule having the same 10869 merging function, the function is applied to the two semantic 10870 values to get a single result. *Note Writing GLR Parsers: GLR 10871 Parsers. 10872 10873 -- Directive: %name-prefix "PREFIX" 10874 Obsoleted by the `%define' variable `api.prefix' (*note Multiple 10875 Parsers in the Same Program: Multiple Parsers.). 10876 10877 Rename the external symbols (variables and functions) used in the 10878 parser so that they start with PREFIX instead of `yy'. Contrary to 10879 `api.prefix', do no rename types and macros. 10880 10881 The precise list of symbols renamed in C parsers is `yyparse', 10882 `yylex', `yyerror', `yynerrs', `yylval', `yychar', `yydebug', and 10883 (if locations are used) `yylloc'. If you use a push parser, 10884 `yypush_parse', `yypull_parse', `yypstate', `yypstate_new' and 10885 `yypstate_delete' will also be renamed. For example, if you use 10886 `%name-prefix "c_"', the names become `c_parse', `c_lex', and so 10887 on. For C++ parsers, see the `%define namespace' documentation in 10888 this section. 10889 10890 -- Directive: %no-lines 10891 Bison declaration to avoid generating `#line' directives in the 10892 parser implementation file. *Note Decl Summary::. 10893 10894 -- Directive: %nonassoc 10895 Bison declaration to assign nonassociativity to token(s). *Note 10896 Operator Precedence: Precedence Decl. 10897 10898 -- Directive: %output "FILE" 10899 Bison declaration to set the name of the parser implementation 10900 file. *Note Decl Summary::. 10901 10902 -- Directive: %parse-param {ARGUMENT-DECLARATION} 10903 Bison declaration to specifying an additional parameter that 10904 `yyparse' should accept. *Note The Parser Function `yyparse': 10905 Parser Function. 10906 10907 -- Directive: %prec 10908 Bison declaration to assign a precedence to a specific rule. 10909 *Note Context-Dependent Precedence: Contextual Precedence. 10910 10911 -- Directive: %pure-parser 10912 Deprecated version of `%define api.pure' (*note api.pure: %define 10913 Summary.), for which Bison is more careful to warn about 10914 unreasonable usage. 10915 10916 -- Directive: %require "VERSION" 10917 Require version VERSION or higher of Bison. *Note Require a 10918 Version of Bison: Require Decl. 10919 10920 -- Directive: %right 10921 Bison declaration to assign right associativity to token(s). 10922 *Note Operator Precedence: Precedence Decl. 10923 10924 -- Directive: %skeleton 10925 Specify the skeleton to use; usually for development. *Note Decl 10926 Summary::. 10927 10928 -- Directive: %start 10929 Bison declaration to specify the start symbol. *Note The 10930 Start-Symbol: Start Decl. 10931 10932 -- Directive: %token 10933 Bison declaration to declare token(s) without specifying 10934 precedence. *Note Token Type Names: Token Decl. 10935 10936 -- Directive: %token-table 10937 Bison declaration to include a token name table in the parser 10938 implementation file. *Note Decl Summary::. 10939 10940 -- Directive: %type 10941 Bison declaration to declare nonterminals. *Note Nonterminal 10942 Symbols: Type Decl. 10943 10944 -- Symbol: $undefined 10945 The predefined token onto which all undefined values returned by 10946 `yylex' are mapped. It cannot be used in the grammar, rather, use 10947 `error'. 10948 10949 -- Directive: %union 10950 Bison declaration to specify several possible data types for 10951 semantic values. *Note The Collection of Value Types: Union Decl. 10952 10953 -- Macro: YYABORT 10954 Macro to pretend that an unrecoverable syntax error has occurred, 10955 by making `yyparse' return 1 immediately. The error reporting 10956 function `yyerror' is not called. *Note The Parser Function 10957 `yyparse': Parser Function. 10958 10959 For Java parsers, this functionality is invoked using `return 10960 YYABORT;' instead. 10961 10962 -- Macro: YYACCEPT 10963 Macro to pretend that a complete utterance of the language has been 10964 read, by making `yyparse' return 0 immediately. *Note The Parser 10965 Function `yyparse': Parser Function. 10966 10967 For Java parsers, this functionality is invoked using `return 10968 YYACCEPT;' instead. 10969 10970 -- Macro: YYBACKUP 10971 Macro to discard a value from the parser stack and fake a lookahead 10972 token. *Note Special Features for Use in Actions: Action Features. 10973 10974 -- Variable: yychar 10975 External integer variable that contains the integer value of the 10976 lookahead token. (In a pure parser, it is a local variable within 10977 `yyparse'.) Error-recovery rule actions may examine this variable. 10978 *Note Special Features for Use in Actions: Action Features. 10979 10980 -- Variable: yyclearin 10981 Macro used in error-recovery rule actions. It clears the previous 10982 lookahead token. *Note Error Recovery::. 10983 10984 -- Macro: YYDEBUG 10985 Macro to define to equip the parser with tracing code. *Note 10986 Tracing Your Parser: Tracing. 10987 10988 -- Variable: yydebug 10989 External integer variable set to zero by default. If `yydebug' is 10990 given a nonzero value, the parser will output information on input 10991 symbols and parser action. *Note Tracing Your Parser: Tracing. 10992 10993 -- Macro: yyerrok 10994 Macro to cause parser to recover immediately to its normal mode 10995 after a syntax error. *Note Error Recovery::. 10996 10997 -- Macro: YYERROR 10998 Cause an immediate syntax error. This statement initiates error 10999 recovery just as if the parser itself had detected an error; 11000 however, it does not call `yyerror', and does not print any 11001 message. If you want to print an error message, call `yyerror' 11002 explicitly before the `YYERROR;' statement. *Note Error 11003 Recovery::. 11004 11005 For Java parsers, this functionality is invoked using `return 11006 YYERROR;' instead. 11007 11008 -- Function: yyerror 11009 User-supplied function to be called by `yyparse' on error. *Note 11010 The Error Reporting Function `yyerror': Error Reporting. 11011 11012 -- Macro: YYERROR_VERBOSE 11013 An obsolete macro that you define with `#define' in the prologue 11014 to request verbose, specific error message strings when `yyerror' 11015 is called. It doesn't matter what definition you use for 11016 `YYERROR_VERBOSE', just whether you define it. Supported by the C 11017 skeletons only; using `%error-verbose' is preferred. *Note Error 11018 Reporting::. 11019 11020 -- Macro: YYFPRINTF 11021 Macro used to output run-time traces. *Note Enabling Traces::. 11022 11023 -- Macro: YYINITDEPTH 11024 Macro for specifying the initial size of the parser stack. *Note 11025 Memory Management::. 11026 11027 -- Function: yylex 11028 User-supplied lexical analyzer function, called with no arguments 11029 to get the next token. *Note The Lexical Analyzer Function 11030 `yylex': Lexical. 11031 11032 -- Macro: YYLEX_PARAM 11033 An obsolete macro for specifying an extra argument (or list of 11034 extra arguments) for `yyparse' to pass to `yylex'. The use of this 11035 macro is deprecated, and is supported only for Yacc like parsers. 11036 *Note Calling Conventions for Pure Parsers: Pure Calling. 11037 11038 -- Variable: yylloc 11039 External variable in which `yylex' should place the line and column 11040 numbers associated with a token. (In a pure parser, it is a local 11041 variable within `yyparse', and its address is passed to `yylex'.) 11042 You can ignore this variable if you don't use the `@' feature in 11043 the grammar actions. *Note Textual Locations of Tokens: Token 11044 Locations. In semantic actions, it stores the location of the 11045 lookahead token. *Note Actions and Locations: Actions and 11046 Locations. 11047 11048 -- Type: YYLTYPE 11049 Data type of `yylloc'; by default, a structure with four members. 11050 *Note Data Types of Locations: Location Type. 11051 11052 -- Variable: yylval 11053 External variable in which `yylex' should place the semantic value 11054 associated with a token. (In a pure parser, it is a local 11055 variable within `yyparse', and its address is passed to `yylex'.) 11056 *Note Semantic Values of Tokens: Token Values. In semantic 11057 actions, it stores the semantic value of the lookahead token. 11058 *Note Actions: Actions. 11059 11060 -- Macro: YYMAXDEPTH 11061 Macro for specifying the maximum size of the parser stack. *Note 11062 Memory Management::. 11063 11064 -- Variable: yynerrs 11065 Global variable which Bison increments each time it reports a 11066 syntax error. (In a pure parser, it is a local variable within 11067 `yyparse'. In a pure push parser, it is a member of yypstate.) 11068 *Note The Error Reporting Function `yyerror': Error Reporting. 11069 11070 -- Function: yyparse 11071 The parser function produced by Bison; call this function to start 11072 parsing. *Note The Parser Function `yyparse': Parser Function. 11073 11074 -- Macro: YYPRINT 11075 Macro used to output token semantic values. For `yacc.c' only. 11076 Obsoleted by `%printer'. *Note The `YYPRINT' Macro: The YYPRINT 11077 Macro. 11078 11079 -- Function: yypstate_delete 11080 The function to delete a parser instance, produced by Bison in 11081 push mode; call this function to delete the memory associated with 11082 a parser. *Note The Parser Delete Function `yypstate_delete': 11083 Parser Delete Function. (The current push parsing interface is 11084 experimental and may evolve. More user feedback will help to 11085 stabilize it.) 11086 11087 -- Function: yypstate_new 11088 The function to create a parser instance, produced by Bison in 11089 push mode; call this function to create a new parser. *Note The 11090 Parser Create Function `yypstate_new': Parser Create Function. 11091 (The current push parsing interface is experimental and may evolve. 11092 More user feedback will help to stabilize it.) 11093 11094 -- Function: yypull_parse 11095 The parser function produced by Bison in push mode; call this 11096 function to parse the rest of the input stream. *Note The Pull 11097 Parser Function `yypull_parse': Pull Parser Function. (The 11098 current push parsing interface is experimental and may evolve. 11099 More user feedback will help to stabilize it.) 11100 11101 -- Function: yypush_parse 11102 The parser function produced by Bison in push mode; call this 11103 function to parse a single token. *Note The Push Parser Function 11104 `yypush_parse': Push Parser Function. (The current push parsing 11105 interface is experimental and may evolve. More user feedback will 11106 help to stabilize it.) 11107 11108 -- Macro: YYPARSE_PARAM 11109 An obsolete macro for specifying the name of a parameter that 11110 `yyparse' should accept. The use of this macro is deprecated, and 11111 is supported only for Yacc like parsers. *Note Calling 11112 Conventions for Pure Parsers: Pure Calling. 11113 11114 -- Macro: YYRECOVERING 11115 The expression `YYRECOVERING ()' yields 1 when the parser is 11116 recovering from a syntax error, and 0 otherwise. *Note Special 11117 Features for Use in Actions: Action Features. 11118 11119 -- Macro: YYSTACK_USE_ALLOCA 11120 Macro used to control the use of `alloca' when the deterministic 11121 parser in C needs to extend its stacks. If defined to 0, the 11122 parser will use `malloc' to extend its stacks. If defined to 1, 11123 the parser will use `alloca'. Values other than 0 and 1 are 11124 reserved for future Bison extensions. If not defined, 11125 `YYSTACK_USE_ALLOCA' defaults to 0. 11126 11127 In the all-too-common case where your code may run on a host with a 11128 limited stack and with unreliable stack-overflow checking, you 11129 should set `YYMAXDEPTH' to a value that cannot possibly result in 11130 unchecked stack overflow on any of your target hosts when `alloca' 11131 is called. You can inspect the code that Bison generates in order 11132 to determine the proper numeric values. This will require some 11133 expertise in low-level implementation details. 11134 11135 -- Type: YYSTYPE 11136 Data type of semantic values; `int' by default. *Note Data Types 11137 of Semantic Values: Value Type. 11138 11139 11140 File: bison.info, Node: Glossary, Next: Copying This Manual, Prev: Table of Symbols, Up: Top 11141 11142 Appendix B Glossary 11143 ******************* 11144 11145 Accepting state 11146 A state whose only action is the accept action. The accepting 11147 state is thus a consistent state. *Note Understanding Your 11148 Parser: Understanding. 11149 11150 Backus-Naur Form (BNF; also called "Backus Normal Form") 11151 Formal method of specifying context-free grammars originally 11152 proposed by John Backus, and slightly improved by Peter Naur in 11153 his 1960-01-02 committee document contributing to what became the 11154 Algol 60 report. *Note Languages and Context-Free Grammars: 11155 Language and Grammar. 11156 11157 Consistent state 11158 A state containing only one possible action. *Note Default 11159 Reductions::. 11160 11161 Context-free grammars 11162 Grammars specified as rules that can be applied regardless of 11163 context. Thus, if there is a rule which says that an integer can 11164 be used as an expression, integers are allowed _anywhere_ an 11165 expression is permitted. *Note Languages and Context-Free 11166 Grammars: Language and Grammar. 11167 11168 Default reduction 11169 The reduction that a parser should perform if the current parser 11170 state contains no other action for the lookahead token. In 11171 permitted parser states, Bison declares the reduction with the 11172 largest lookahead set to be the default reduction and removes that 11173 lookahead set. *Note Default Reductions::. 11174 11175 Defaulted state 11176 A consistent state with a default reduction. *Note Default 11177 Reductions::. 11178 11179 Dynamic allocation 11180 Allocation of memory that occurs during execution, rather than at 11181 compile time or on entry to a function. 11182 11183 Empty string 11184 Analogous to the empty set in set theory, the empty string is a 11185 character string of length zero. 11186 11187 Finite-state stack machine 11188 A "machine" that has discrete states in which it is said to exist 11189 at each instant in time. As input to the machine is processed, the 11190 machine moves from state to state as specified by the logic of the 11191 machine. In the case of the parser, the input is the language 11192 being parsed, and the states correspond to various stages in the 11193 grammar rules. *Note The Bison Parser Algorithm: Algorithm. 11194 11195 Generalized LR (GLR) 11196 A parsing algorithm that can handle all context-free grammars, 11197 including those that are not LR(1). It resolves situations that 11198 Bison's deterministic parsing algorithm cannot by effectively 11199 splitting off multiple parsers, trying all possible parsers, and 11200 discarding those that fail in the light of additional right 11201 context. *Note Generalized LR Parsing: Generalized LR Parsing. 11202 11203 Grouping 11204 A language construct that is (in general) grammatically divisible; 11205 for example, `expression' or `declaration' in C. *Note Languages 11206 and Context-Free Grammars: Language and Grammar. 11207 11208 IELR(1) (Inadequacy Elimination LR(1)) 11209 A minimal LR(1) parser table construction algorithm. That is, 11210 given any context-free grammar, IELR(1) generates parser tables 11211 with the full language-recognition power of canonical LR(1) but 11212 with nearly the same number of parser states as LALR(1). This 11213 reduction in parser states is often an order of magnitude. More 11214 importantly, because canonical LR(1)'s extra parser states may 11215 contain duplicate conflicts in the case of non-LR(1) grammars, the 11216 number of conflicts for IELR(1) is often an order of magnitude 11217 less as well. This can significantly reduce the complexity of 11218 developing a grammar. *Note LR Table Construction::. 11219 11220 Infix operator 11221 An arithmetic operator that is placed between the operands on 11222 which it performs some operation. 11223 11224 Input stream 11225 A continuous flow of data between devices or programs. 11226 11227 LAC (Lookahead Correction) 11228 A parsing mechanism that fixes the problem of delayed syntax error 11229 detection, which is caused by LR state merging, default 11230 reductions, and the use of `%nonassoc'. Delayed syntax error 11231 detection results in unexpected semantic actions, initiation of 11232 error recovery in the wrong syntactic context, and an incorrect 11233 list of expected tokens in a verbose syntax error message. *Note 11234 LAC::. 11235 11236 Language construct 11237 One of the typical usage schemas of the language. For example, 11238 one of the constructs of the C language is the `if' statement. 11239 *Note Languages and Context-Free Grammars: Language and Grammar. 11240 11241 Left associativity 11242 Operators having left associativity are analyzed from left to 11243 right: `a+b+c' first computes `a+b' and then combines with `c'. 11244 *Note Operator Precedence: Precedence. 11245 11246 Left recursion 11247 A rule whose result symbol is also its first component symbol; for 11248 example, `expseq1 : expseq1 ',' exp;'. *Note Recursive Rules: 11249 Recursion. 11250 11251 Left-to-right parsing 11252 Parsing a sentence of a language by analyzing it token by token 11253 from left to right. *Note The Bison Parser Algorithm: Algorithm. 11254 11255 Lexical analyzer (scanner) 11256 A function that reads an input stream and returns tokens one by 11257 one. *Note The Lexical Analyzer Function `yylex': Lexical. 11258 11259 Lexical tie-in 11260 A flag, set by actions in the grammar rules, which alters the way 11261 tokens are parsed. *Note Lexical Tie-ins::. 11262 11263 Literal string token 11264 A token which consists of two or more fixed characters. *Note 11265 Symbols::. 11266 11267 Lookahead token 11268 A token already read but not yet shifted. *Note Lookahead Tokens: 11269 Lookahead. 11270 11271 LALR(1) 11272 The class of context-free grammars that Bison (like most other 11273 parser generators) can handle by default; a subset of LR(1). 11274 *Note Mysterious Conflicts::. 11275 11276 LR(1) 11277 The class of context-free grammars in which at most one token of 11278 lookahead is needed to disambiguate the parsing of any piece of 11279 input. 11280 11281 Nonterminal symbol 11282 A grammar symbol standing for a grammatical construct that can be 11283 expressed through rules in terms of smaller constructs; in other 11284 words, a construct that is not a token. *Note Symbols::. 11285 11286 Parser 11287 A function that recognizes valid sentences of a language by 11288 analyzing the syntax structure of a set of tokens passed to it 11289 from a lexical analyzer. 11290 11291 Postfix operator 11292 An arithmetic operator that is placed after the operands upon 11293 which it performs some operation. 11294 11295 Reduction 11296 Replacing a string of nonterminals and/or terminals with a single 11297 nonterminal, according to a grammar rule. *Note The Bison Parser 11298 Algorithm: Algorithm. 11299 11300 Reentrant 11301 A reentrant subprogram is a subprogram which can be in invoked any 11302 number of times in parallel, without interference between the 11303 various invocations. *Note A Pure (Reentrant) Parser: Pure Decl. 11304 11305 Reverse polish notation 11306 A language in which all operators are postfix operators. 11307 11308 Right recursion 11309 A rule whose result symbol is also its last component symbol; for 11310 example, `expseq1: exp ',' expseq1;'. *Note Recursive Rules: 11311 Recursion. 11312 11313 Semantics 11314 In computer languages, the semantics are specified by the actions 11315 taken for each instance of the language, i.e., the meaning of each 11316 statement. *Note Defining Language Semantics: Semantics. 11317 11318 Shift 11319 A parser is said to shift when it makes the choice of analyzing 11320 further input from the stream rather than reducing immediately some 11321 already-recognized rule. *Note The Bison Parser Algorithm: 11322 Algorithm. 11323 11324 Single-character literal 11325 A single character that is recognized and interpreted as is. 11326 *Note From Formal Rules to Bison Input: Grammar in Bison. 11327 11328 Start symbol 11329 The nonterminal symbol that stands for a complete valid utterance 11330 in the language being parsed. The start symbol is usually listed 11331 as the first nonterminal symbol in a language specification. 11332 *Note The Start-Symbol: Start Decl. 11333 11334 Symbol table 11335 A data structure where symbol names and associated data are stored 11336 during parsing to allow for recognition and use of existing 11337 information in repeated uses of a symbol. *Note Multi-function 11338 Calc::. 11339 11340 Syntax error 11341 An error encountered during parsing of an input stream due to 11342 invalid syntax. *Note Error Recovery::. 11343 11344 Token 11345 A basic, grammatically indivisible unit of a language. The symbol 11346 that describes a token in the grammar is a terminal symbol. The 11347 input of the Bison parser is a stream of tokens which comes from 11348 the lexical analyzer. *Note Symbols::. 11349 11350 Terminal symbol 11351 A grammar symbol that has no rules in the grammar and therefore is 11352 grammatically indivisible. The piece of text it represents is a 11353 token. *Note Languages and Context-Free Grammars: Language and 11354 Grammar. 11355 11356 Unreachable state 11357 A parser state to which there does not exist a sequence of 11358 transitions from the parser's start state. A state can become 11359 unreachable during conflict resolution. *Note Unreachable 11360 States::. 11361 11362 11363 File: bison.info, Node: Copying This Manual, Next: Bibliography, Prev: Glossary, Up: Top 11364 11365 Appendix C Copying This Manual 11366 ****************************** 11367 11368 Version 1.3, 3 November 2008 11369 11370 Copyright (C) 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc. 11371 `http://fsf.org/' 11372 11373 Everyone is permitted to copy and distribute verbatim copies 11374 of this license document, but changing it is not allowed. 11375 11376 0. PREAMBLE 11377 11378 The purpose of this License is to make a manual, textbook, or other 11379 functional and useful document "free" in the sense of freedom: to 11380 assure everyone the effective freedom to copy and redistribute it, 11381 with or without modifying it, either commercially or 11382 noncommercially. Secondarily, this License preserves for the 11383 author and publisher a way to get credit for their work, while not 11384 being considered responsible for modifications made by others. 11385 11386 This License is a kind of "copyleft", which means that derivative 11387 works of the document must themselves be free in the same sense. 11388 It complements the GNU General Public License, which is a copyleft 11389 license designed for free software. 11390 11391 We have designed this License in order to use it for manuals for 11392 free software, because free software needs free documentation: a 11393 free program should come with manuals providing the same freedoms 11394 that the software does. But this License is not limited to 11395 software manuals; it can be used for any textual work, regardless 11396 of subject matter or whether it is published as a printed book. 11397 We recommend this License principally for works whose purpose is 11398 instruction or reference. 11399 11400 1. APPLICABILITY AND DEFINITIONS 11401 11402 This License applies to any manual or other work, in any medium, 11403 that contains a notice placed by the copyright holder saying it 11404 can be distributed under the terms of this License. Such a notice 11405 grants a world-wide, royalty-free license, unlimited in duration, 11406 to use that work under the conditions stated herein. The 11407 "Document", below, refers to any such manual or work. Any member 11408 of the public is a licensee, and is addressed as "you". You 11409 accept the license if you copy, modify or distribute the work in a 11410 way requiring permission under copyright law. 11411 11412 A "Modified Version" of the Document means any work containing the 11413 Document or a portion of it, either copied verbatim, or with 11414 modifications and/or translated into another language. 11415 11416 A "Secondary Section" is a named appendix or a front-matter section 11417 of the Document that deals exclusively with the relationship of the 11418 publishers or authors of the Document to the Document's overall 11419 subject (or to related matters) and contains nothing that could 11420 fall directly within that overall subject. (Thus, if the Document 11421 is in part a textbook of mathematics, a Secondary Section may not 11422 explain any mathematics.) The relationship could be a matter of 11423 historical connection with the subject or with related matters, or 11424 of legal, commercial, philosophical, ethical or political position 11425 regarding them. 11426 11427 The "Invariant Sections" are certain Secondary Sections whose 11428 titles are designated, as being those of Invariant Sections, in 11429 the notice that says that the Document is released under this 11430 License. If a section does not fit the above definition of 11431 Secondary then it is not allowed to be designated as Invariant. 11432 The Document may contain zero Invariant Sections. If the Document 11433 does not identify any Invariant Sections then there are none. 11434 11435 The "Cover Texts" are certain short passages of text that are 11436 listed, as Front-Cover Texts or Back-Cover Texts, in the notice 11437 that says that the Document is released under this License. A 11438 Front-Cover Text may be at most 5 words, and a Back-Cover Text may 11439 be at most 25 words. 11440 11441 A "Transparent" copy of the Document means a machine-readable copy, 11442 represented in a format whose specification is available to the 11443 general public, that is suitable for revising the document 11444 straightforwardly with generic text editors or (for images 11445 composed of pixels) generic paint programs or (for drawings) some 11446 widely available drawing editor, and that is suitable for input to 11447 text formatters or for automatic translation to a variety of 11448 formats suitable for input to text formatters. A copy made in an 11449 otherwise Transparent file format whose markup, or absence of 11450 markup, has been arranged to thwart or discourage subsequent 11451 modification by readers is not Transparent. An image format is 11452 not Transparent if used for any substantial amount of text. A 11453 copy that is not "Transparent" is called "Opaque". 11454 11455 Examples of suitable formats for Transparent copies include plain 11456 ASCII without markup, Texinfo input format, LaTeX input format, 11457 SGML or XML using a publicly available DTD, and 11458 standard-conforming simple HTML, PostScript or PDF designed for 11459 human modification. Examples of transparent image formats include 11460 PNG, XCF and JPG. Opaque formats include proprietary formats that 11461 can be read and edited only by proprietary word processors, SGML or 11462 XML for which the DTD and/or processing tools are not generally 11463 available, and the machine-generated HTML, PostScript or PDF 11464 produced by some word processors for output purposes only. 11465 11466 The "Title Page" means, for a printed book, the title page itself, 11467 plus such following pages as are needed to hold, legibly, the 11468 material this License requires to appear in the title page. For 11469 works in formats which do not have any title page as such, "Title 11470 Page" means the text near the most prominent appearance of the 11471 work's title, preceding the beginning of the body of the text. 11472 11473 The "publisher" means any person or entity that distributes copies 11474 of the Document to the public. 11475 11476 A section "Entitled XYZ" means a named subunit of the Document 11477 whose title either is precisely XYZ or contains XYZ in parentheses 11478 following text that translates XYZ in another language. (Here XYZ 11479 stands for a specific section name mentioned below, such as 11480 "Acknowledgements", "Dedications", "Endorsements", or "History".) 11481 To "Preserve the Title" of such a section when you modify the 11482 Document means that it remains a section "Entitled XYZ" according 11483 to this definition. 11484 11485 The Document may include Warranty Disclaimers next to the notice 11486 which states that this License applies to the Document. These 11487 Warranty Disclaimers are considered to be included by reference in 11488 this License, but only as regards disclaiming warranties: any other 11489 implication that these Warranty Disclaimers may have is void and 11490 has no effect on the meaning of this License. 11491 11492 2. VERBATIM COPYING 11493 11494 You may copy and distribute the Document in any medium, either 11495 commercially or noncommercially, provided that this License, the 11496 copyright notices, and the license notice saying this License 11497 applies to the Document are reproduced in all copies, and that you 11498 add no other conditions whatsoever to those of this License. You 11499 may not use technical measures to obstruct or control the reading 11500 or further copying of the copies you make or distribute. However, 11501 you may accept compensation in exchange for copies. If you 11502 distribute a large enough number of copies you must also follow 11503 the conditions in section 3. 11504 11505 You may also lend copies, under the same conditions stated above, 11506 and you may publicly display copies. 11507 11508 3. COPYING IN QUANTITY 11509 11510 If you publish printed copies (or copies in media that commonly 11511 have printed covers) of the Document, numbering more than 100, and 11512 the Document's license notice requires Cover Texts, you must 11513 enclose the copies in covers that carry, clearly and legibly, all 11514 these Cover Texts: Front-Cover Texts on the front cover, and 11515 Back-Cover Texts on the back cover. Both covers must also clearly 11516 and legibly identify you as the publisher of these copies. The 11517 front cover must present the full title with all words of the 11518 title equally prominent and visible. You may add other material 11519 on the covers in addition. Copying with changes limited to the 11520 covers, as long as they preserve the title of the Document and 11521 satisfy these conditions, can be treated as verbatim copying in 11522 other respects. 11523 11524 If the required texts for either cover are too voluminous to fit 11525 legibly, you should put the first ones listed (as many as fit 11526 reasonably) on the actual cover, and continue the rest onto 11527 adjacent pages. 11528 11529 If you publish or distribute Opaque copies of the Document 11530 numbering more than 100, you must either include a 11531 machine-readable Transparent copy along with each Opaque copy, or 11532 state in or with each Opaque copy a computer-network location from 11533 which the general network-using public has access to download 11534 using public-standard network protocols a complete Transparent 11535 copy of the Document, free of added material. If you use the 11536 latter option, you must take reasonably prudent steps, when you 11537 begin distribution of Opaque copies in quantity, to ensure that 11538 this Transparent copy will remain thus accessible at the stated 11539 location until at least one year after the last time you 11540 distribute an Opaque copy (directly or through your agents or 11541 retailers) of that edition to the public. 11542 11543 It is requested, but not required, that you contact the authors of 11544 the Document well before redistributing any large number of 11545 copies, to give them a chance to provide you with an updated 11546 version of the Document. 11547 11548 4. MODIFICATIONS 11549 11550 You may copy and distribute a Modified Version of the Document 11551 under the conditions of sections 2 and 3 above, provided that you 11552 release the Modified Version under precisely this License, with 11553 the Modified Version filling the role of the Document, thus 11554 licensing distribution and modification of the Modified Version to 11555 whoever possesses a copy of it. In addition, you must do these 11556 things in the Modified Version: 11557 11558 A. Use in the Title Page (and on the covers, if any) a title 11559 distinct from that of the Document, and from those of 11560 previous versions (which should, if there were any, be listed 11561 in the History section of the Document). You may use the 11562 same title as a previous version if the original publisher of 11563 that version gives permission. 11564 11565 B. List on the Title Page, as authors, one or more persons or 11566 entities responsible for authorship of the modifications in 11567 the Modified Version, together with at least five of the 11568 principal authors of the Document (all of its principal 11569 authors, if it has fewer than five), unless they release you 11570 from this requirement. 11571 11572 C. State on the Title page the name of the publisher of the 11573 Modified Version, as the publisher. 11574 11575 D. Preserve all the copyright notices of the Document. 11576 11577 E. Add an appropriate copyright notice for your modifications 11578 adjacent to the other copyright notices. 11579 11580 F. Include, immediately after the copyright notices, a license 11581 notice giving the public permission to use the Modified 11582 Version under the terms of this License, in the form shown in 11583 the Addendum below. 11584 11585 G. Preserve in that license notice the full lists of Invariant 11586 Sections and required Cover Texts given in the Document's 11587 license notice. 11588 11589 H. Include an unaltered copy of this License. 11590 11591 I. Preserve the section Entitled "History", Preserve its Title, 11592 and add to it an item stating at least the title, year, new 11593 authors, and publisher of the Modified Version as given on 11594 the Title Page. If there is no section Entitled "History" in 11595 the Document, create one stating the title, year, authors, 11596 and publisher of the Document as given on its Title Page, 11597 then add an item describing the Modified Version as stated in 11598 the previous sentence. 11599 11600 J. Preserve the network location, if any, given in the Document 11601 for public access to a Transparent copy of the Document, and 11602 likewise the network locations given in the Document for 11603 previous versions it was based on. These may be placed in 11604 the "History" section. You may omit a network location for a 11605 work that was published at least four years before the 11606 Document itself, or if the original publisher of the version 11607 it refers to gives permission. 11608 11609 K. For any section Entitled "Acknowledgements" or "Dedications", 11610 Preserve the Title of the section, and preserve in the 11611 section all the substance and tone of each of the contributor 11612 acknowledgements and/or dedications given therein. 11613 11614 L. Preserve all the Invariant Sections of the Document, 11615 unaltered in their text and in their titles. Section numbers 11616 or the equivalent are not considered part of the section 11617 titles. 11618 11619 M. Delete any section Entitled "Endorsements". Such a section 11620 may not be included in the Modified Version. 11621 11622 N. Do not retitle any existing section to be Entitled 11623 "Endorsements" or to conflict in title with any Invariant 11624 Section. 11625 11626 O. Preserve any Warranty Disclaimers. 11627 11628 If the Modified Version includes new front-matter sections or 11629 appendices that qualify as Secondary Sections and contain no 11630 material copied from the Document, you may at your option 11631 designate some or all of these sections as invariant. To do this, 11632 add their titles to the list of Invariant Sections in the Modified 11633 Version's license notice. These titles must be distinct from any 11634 other section titles. 11635 11636 You may add a section Entitled "Endorsements", provided it contains 11637 nothing but endorsements of your Modified Version by various 11638 parties--for example, statements of peer review or that the text 11639 has been approved by an organization as the authoritative 11640 definition of a standard. 11641 11642 You may add a passage of up to five words as a Front-Cover Text, 11643 and a passage of up to 25 words as a Back-Cover Text, to the end 11644 of the list of Cover Texts in the Modified Version. Only one 11645 passage of Front-Cover Text and one of Back-Cover Text may be 11646 added by (or through arrangements made by) any one entity. If the 11647 Document already includes a cover text for the same cover, 11648 previously added by you or by arrangement made by the same entity 11649 you are acting on behalf of, you may not add another; but you may 11650 replace the old one, on explicit permission from the previous 11651 publisher that added the old one. 11652 11653 The author(s) and publisher(s) of the Document do not by this 11654 License give permission to use their names for publicity for or to 11655 assert or imply endorsement of any Modified Version. 11656 11657 5. COMBINING DOCUMENTS 11658 11659 You may combine the Document with other documents released under 11660 this License, under the terms defined in section 4 above for 11661 modified versions, provided that you include in the combination 11662 all of the Invariant Sections of all of the original documents, 11663 unmodified, and list them all as Invariant Sections of your 11664 combined work in its license notice, and that you preserve all 11665 their Warranty Disclaimers. 11666 11667 The combined work need only contain one copy of this License, and 11668 multiple identical Invariant Sections may be replaced with a single 11669 copy. If there are multiple Invariant Sections with the same name 11670 but different contents, make the title of each such section unique 11671 by adding at the end of it, in parentheses, the name of the 11672 original author or publisher of that section if known, or else a 11673 unique number. Make the same adjustment to the section titles in 11674 the list of Invariant Sections in the license notice of the 11675 combined work. 11676 11677 In the combination, you must combine any sections Entitled 11678 "History" in the various original documents, forming one section 11679 Entitled "History"; likewise combine any sections Entitled 11680 "Acknowledgements", and any sections Entitled "Dedications". You 11681 must delete all sections Entitled "Endorsements." 11682 11683 6. COLLECTIONS OF DOCUMENTS 11684 11685 You may make a collection consisting of the Document and other 11686 documents released under this License, and replace the individual 11687 copies of this License in the various documents with a single copy 11688 that is included in the collection, provided that you follow the 11689 rules of this License for verbatim copying of each of the 11690 documents in all other respects. 11691 11692 You may extract a single document from such a collection, and 11693 distribute it individually under this License, provided you insert 11694 a copy of this License into the extracted document, and follow 11695 this License in all other respects regarding verbatim copying of 11696 that document. 11697 11698 7. AGGREGATION WITH INDEPENDENT WORKS 11699 11700 A compilation of the Document or its derivatives with other 11701 separate and independent documents or works, in or on a volume of 11702 a storage or distribution medium, is called an "aggregate" if the 11703 copyright resulting from the compilation is not used to limit the 11704 legal rights of the compilation's users beyond what the individual 11705 works permit. When the Document is included in an aggregate, this 11706 License does not apply to the other works in the aggregate which 11707 are not themselves derivative works of the Document. 11708 11709 If the Cover Text requirement of section 3 is applicable to these 11710 copies of the Document, then if the Document is less than one half 11711 of the entire aggregate, the Document's Cover Texts may be placed 11712 on covers that bracket the Document within the aggregate, or the 11713 electronic equivalent of covers if the Document is in electronic 11714 form. Otherwise they must appear on printed covers that bracket 11715 the whole aggregate. 11716 11717 8. TRANSLATION 11718 11719 Translation is considered a kind of modification, so you may 11720 distribute translations of the Document under the terms of section 11721 4. Replacing Invariant Sections with translations requires special 11722 permission from their copyright holders, but you may include 11723 translations of some or all Invariant Sections in addition to the 11724 original versions of these Invariant Sections. You may include a 11725 translation of this License, and all the license notices in the 11726 Document, and any Warranty Disclaimers, provided that you also 11727 include the original English version of this License and the 11728 original versions of those notices and disclaimers. In case of a 11729 disagreement between the translation and the original version of 11730 this License or a notice or disclaimer, the original version will 11731 prevail. 11732 11733 If a section in the Document is Entitled "Acknowledgements", 11734 "Dedications", or "History", the requirement (section 4) to 11735 Preserve its Title (section 1) will typically require changing the 11736 actual title. 11737 11738 9. TERMINATION 11739 11740 You may not copy, modify, sublicense, or distribute the Document 11741 except as expressly provided under this License. Any attempt 11742 otherwise to copy, modify, sublicense, or distribute it is void, 11743 and will automatically terminate your rights under this License. 11744 11745 However, if you cease all violation of this License, then your 11746 license from a particular copyright holder is reinstated (a) 11747 provisionally, unless and until the copyright holder explicitly 11748 and finally terminates your license, and (b) permanently, if the 11749 copyright holder fails to notify you of the violation by some 11750 reasonable means prior to 60 days after the cessation. 11751 11752 Moreover, your license from a particular copyright holder is 11753 reinstated permanently if the copyright holder notifies you of the 11754 violation by some reasonable means, this is the first time you have 11755 received notice of violation of this License (for any work) from 11756 that copyright holder, and you cure the violation prior to 30 days 11757 after your receipt of the notice. 11758 11759 Termination of your rights under this section does not terminate 11760 the licenses of parties who have received copies or rights from 11761 you under this License. If your rights have been terminated and 11762 not permanently reinstated, receipt of a copy of some or all of 11763 the same material does not give you any rights to use it. 11764 11765 10. FUTURE REVISIONS OF THIS LICENSE 11766 11767 The Free Software Foundation may publish new, revised versions of 11768 the GNU Free Documentation License from time to time. Such new 11769 versions will be similar in spirit to the present version, but may 11770 differ in detail to address new problems or concerns. See 11771 `http://www.gnu.org/copyleft/'. 11772 11773 Each version of the License is given a distinguishing version 11774 number. If the Document specifies that a particular numbered 11775 version of this License "or any later version" applies to it, you 11776 have the option of following the terms and conditions either of 11777 that specified version or of any later version that has been 11778 published (not as a draft) by the Free Software Foundation. If 11779 the Document does not specify a version number of this License, 11780 you may choose any version ever published (not as a draft) by the 11781 Free Software Foundation. If the Document specifies that a proxy 11782 can decide which future versions of this License can be used, that 11783 proxy's public statement of acceptance of a version permanently 11784 authorizes you to choose that version for the Document. 11785 11786 11. RELICENSING 11787 11788 "Massive Multiauthor Collaboration Site" (or "MMC Site") means any 11789 World Wide Web server that publishes copyrightable works and also 11790 provides prominent facilities for anybody to edit those works. A 11791 public wiki that anybody can edit is an example of such a server. 11792 A "Massive Multiauthor Collaboration" (or "MMC") contained in the 11793 site means any set of copyrightable works thus published on the MMC 11794 site. 11795 11796 "CC-BY-SA" means the Creative Commons Attribution-Share Alike 3.0 11797 license published by Creative Commons Corporation, a not-for-profit 11798 corporation with a principal place of business in San Francisco, 11799 California, as well as future copyleft versions of that license 11800 published by that same organization. 11801 11802 "Incorporate" means to publish or republish a Document, in whole or 11803 in part, as part of another Document. 11804 11805 An MMC is "eligible for relicensing" if it is licensed under this 11806 License, and if all works that were first published under this 11807 License somewhere other than this MMC, and subsequently 11808 incorporated in whole or in part into the MMC, (1) had no cover 11809 texts or invariant sections, and (2) were thus incorporated prior 11810 to November 1, 2008. 11811 11812 The operator of an MMC Site may republish an MMC contained in the 11813 site under CC-BY-SA on the same site at any time before August 1, 11814 2009, provided the MMC is eligible for relicensing. 11815 11816 11817 ADDENDUM: How to use this License for your documents 11818 ==================================================== 11819 11820 To use this License in a document you have written, include a copy of 11821 the License in the document and put the following copyright and license 11822 notices just after the title page: 11823 11824 Copyright (C) YEAR YOUR NAME. 11825 Permission is granted to copy, distribute and/or modify this document 11826 under the terms of the GNU Free Documentation License, Version 1.3 11827 or any later version published by the Free Software Foundation; 11828 with no Invariant Sections, no Front-Cover Texts, and no Back-Cover 11829 Texts. A copy of the license is included in the section entitled ``GNU 11830 Free Documentation License''. 11831 11832 If you have Invariant Sections, Front-Cover Texts and Back-Cover 11833 Texts, replace the "with...Texts." line with this: 11834 11835 with the Invariant Sections being LIST THEIR TITLES, with 11836 the Front-Cover Texts being LIST, and with the Back-Cover Texts 11837 being LIST. 11838 11839 If you have Invariant Sections without Cover Texts, or some other 11840 combination of the three, merge those two alternatives to suit the 11841 situation. 11842 11843 If your document contains nontrivial examples of program code, we 11844 recommend releasing these examples in parallel under your choice of 11845 free software license, such as the GNU General Public License, to 11846 permit their use in free software. 11847 11848 11849 File: bison.info, Node: Bibliography, Next: Index of Terms, Prev: Copying This Manual, Up: Top 11850 11851 Bibliography 11852 ************ 11853 11854 [Denny 2008] 11855 Joel E. Denny and Brian A. Malloy, IELR(1): Practical LR(1) Parser 11856 Tables for Non-LR(1) Grammars with Conflict Resolution, in 11857 `Proceedings of the 2008 ACM Symposium on Applied Computing' 11858 (SAC'08), ACM, New York, NY, USA, pp. 240-245. 11859 `http://dx.doi.org/10.1145/1363686.1363747' 11860 11861 [Denny 2010 May] 11862 Joel E. Denny, PSLR(1): Pseudo-Scannerless Minimal LR(1) for the 11863 Deterministic Parsing of Composite Languages, Ph.D. Dissertation, 11864 Clemson University, Clemson, SC, USA (May 2010). 11865 `http://proquest.umi.com/pqdlink?did=2041473591&Fmt=7&clientId=79356&RQT=309&VName=PQD' 11866 11867 [Denny 2010 November] 11868 Joel E. Denny and Brian A. Malloy, The IELR(1) Algorithm for 11869 Generating Minimal LR(1) Parser Tables for Non-LR(1) Grammars with 11870 Conflict Resolution, in `Science of Computer Programming', Vol. 11871 75, Issue 11 (November 2010), pp. 943-979. 11872 `http://dx.doi.org/10.1016/j.scico.2009.08.001' 11873 11874 [DeRemer 1982] 11875 Frank DeRemer and Thomas Pennello, Efficient Computation of LALR(1) 11876 Look-Ahead Sets, in `ACM Transactions on Programming Languages and 11877 Systems', Vol. 4, No. 4 (October 1982), pp. 615-649. 11878 `http://dx.doi.org/10.1145/69622.357187' 11879 11880 [Knuth 1965] 11881 Donald E. Knuth, On the Translation of Languages from Left to 11882 Right, in `Information and Control', Vol. 8, Issue 6 (December 11883 1965), pp. 607-639. 11884 `http://dx.doi.org/10.1016/S0019-9958(65)90426-2' 11885 11886 [Scott 2000] 11887 Elizabeth Scott, Adrian Johnstone, and Shamsa Sadaf Hussain, 11888 `Tomita-Style Generalised LR Parsers', Royal Holloway, University 11889 of London, Department of Computer Science, TR-00-12 (December 11890 2000). 11891 `http://www.cs.rhul.ac.uk/research/languages/publications/tomita_style_1.ps' 11892 11893 11894 File: bison.info, Node: Index of Terms, Prev: Bibliography, Up: Top 11895 11896 Index of Terms 11897 ************** 11898 11899 [index] 11900 * Menu: 11901 11902 * $ <1>: Table of Symbols. (line 25) 11903 * $ <2>: Java Action Features. 11904 (line 13) 11905 * $ <3>: Table of Symbols. (line 39) 11906 * $ <4>: Action Features. (line 14) 11907 * $: Table of Symbols. (line 34) 11908 * $$ <1>: Java Action Features. 11909 (line 21) 11910 * $$ <2>: Action Features. (line 10) 11911 * $$ <3>: Table of Symbols. (line 30) 11912 * $$: Actions. (line 6) 11913 * $< <1>: Java Action Features. 11914 (line 17) 11915 * $< <2>: Action Features. (line 18) 11916 * $<: Java Action Features. 11917 (line 29) 11918 * $@N: Mid-Rule Action Translation. 11919 (line 6) 11920 * $[NAME]: Actions. (line 6) 11921 * $accept: Table of Symbols. (line 86) 11922 * $end: Table of Symbols. (line 124) 11923 * $N: Actions. (line 6) 11924 * $NAME: Actions. (line 6) 11925 * $undefined: Table of Symbols. (line 245) 11926 * % <1>: Java Declarations Summary. 11927 (line 53) 11928 * %: Table of Symbols. (line 48) 11929 * %% <1>: Table of Symbols. (line 43) 11930 * %%: Java Declarations Summary. 11931 (line 49) 11932 * %code <1>: Table of Symbols. (line 92) 11933 * %code <2>: %code Summary. (line 6) 11934 * %code <3>: Calc++ Parser. (line 65) 11935 * %code <4>: Prologue Alternatives. 11936 (line 6) 11937 * %code <5>: Decl Summary. (line 46) 11938 * %code <6>: Java Declarations Summary. 11939 (line 37) 11940 * %code <7>: Table of Symbols. (line 91) 11941 * %code: Decl Summary. (line 47) 11942 * %code imports <1>: %code Summary. (line 83) 11943 * %code imports: Java Declarations Summary. 11944 (line 41) 11945 * %code lexer: Java Declarations Summary. 11946 (line 45) 11947 * %code provides <1>: %code Summary. (line 55) 11948 * %code provides <2>: Prologue Alternatives. 11949 (line 6) 11950 * %code provides: Decl Summary. (line 95) 11951 * %code requires <1>: Decl Summary. (line 95) 11952 * %code requires <2>: %code Summary. (line 41) 11953 * %code requires <3>: Calc++ Parser. (line 17) 11954 * %code requires: Prologue Alternatives. 11955 (line 6) 11956 * %code top <1>: Prologue Alternatives. 11957 (line 6) 11958 * %code top: %code Summary. (line 67) 11959 * %debug <1>: Decl Summary. (line 52) 11960 * %debug <2>: Table of Symbols. (line 97) 11961 * %debug: Enabling Traces. (line 28) 11962 * %define <1>: %define Summary. (line 16) 11963 * %define <2>: Decl Summary. (line 59) 11964 * %define <3>: Table of Symbols. (line 100) 11965 * %define <4>: Decl Summary. (line 60) 11966 * %define: %define Summary. (line 14) 11967 * %define abstract: Java Declarations Summary. 11968 (line 57) 11969 * %define api.location.type <1>: %define Summary. (line 49) 11970 * %define api.location.type <2>: Java Declarations Summary. 11971 (line 78) 11972 * %define api.location.type: User Defined Location Type. 11973 (line 6) 11974 * %define api.position.type: Java Declarations Summary. 11975 (line 92) 11976 * %define api.prefix: %define Summary. (line 62) 11977 * %define api.pure <1>: Pure Decl. (line 6) 11978 * %define api.pure: %define Summary. (line 75) 11979 * %define api.push-pull <1>: Push Decl. (line 6) 11980 * %define api.push-pull: %define Summary. (line 115) 11981 * %define extends: Java Declarations Summary. 11982 (line 61) 11983 * %define final: Java Declarations Summary. 11984 (line 65) 11985 * %define implements: Java Declarations Summary. 11986 (line 69) 11987 * %define lex_throws: Java Declarations Summary. 11988 (line 73) 11989 * %define lr.default-reductions <1>: %define Summary. (line 128) 11990 * %define lr.default-reductions: Default Reductions. (line 6) 11991 * %define lr.keep-unreachable-states <1>: Unreachable States. (line 6) 11992 * %define lr.keep-unreachable-states <2>: %define Summary. (line 145) 11993 * %define lr.keep-unreachable-states: Unreachable States. (line 17) 11994 * %define lr.type <1>: LR Table Construction. 11995 (line 6) 11996 * %define lr.type <2>: %define Summary. (line 156) 11997 * %define lr.type: LR Table Construction. 11998 (line 24) 11999 * %define namespace <1>: %define Summary. (line 168) 12000 * %define namespace: C++ Bison Interface. (line 10) 12001 * %define package: Java Declarations Summary. 12002 (line 84) 12003 * %define parse.lac <1>: %define Summary. (line 208) 12004 * %define parse.lac: LAC. (line 29) 12005 * %define parser_class_name: Java Declarations Summary. 12006 (line 88) 12007 * %define public: Java Declarations Summary. 12008 (line 97) 12009 * %define strictfp: Java Declarations Summary. 12010 (line 105) 12011 * %define stype: Java Declarations Summary. 12012 (line 101) 12013 * %define throws: Java Declarations Summary. 12014 (line 109) 12015 * %defines <1>: Decl Summary. (line 113) 12016 * %defines: Table of Symbols. (line 110) 12017 * %destructor <1>: Table of Symbols. (line 114) 12018 * %destructor <2>: Destructor Decl. (line 22) 12019 * %destructor <3>: Using Mid-Rule Actions. 12020 (line 76) 12021 * %destructor <4>: Destructor Decl. (line 6) 12022 * %destructor <5>: Decl Summary. (line 116) 12023 * %destructor: Destructor Decl. (line 22) 12024 * %dprec <1>: Table of Symbols. (line 119) 12025 * %dprec: Merging GLR Parses. (line 6) 12026 * %error-verbose <1>: Table of Symbols. (line 138) 12027 * %error-verbose: Error Reporting. (line 17) 12028 * %expect <1>: Expect Decl. (line 6) 12029 * %expect: Decl Summary. (line 38) 12030 * %expect-rr <1>: Simple GLR Parsers. (line 6) 12031 * %expect-rr: Expect Decl. (line 6) 12032 * %file-prefix <1>: Decl Summary. (line 121) 12033 * %file-prefix: Table of Symbols. (line 142) 12034 * %glr-parser <1>: Simple GLR Parsers. (line 6) 12035 * %glr-parser <2>: GLR Parsers. (line 6) 12036 * %glr-parser: Table of Symbols. (line 146) 12037 * %initial-action <1>: Initial Action Decl. (line 11) 12038 * %initial-action <2>: Table of Symbols. (line 150) 12039 * %initial-action: Initial Action Decl. (line 6) 12040 * %language <1>: Decl Summary. (line 125) 12041 * %language: Table of Symbols. (line 154) 12042 * %language "Java": Java Declarations Summary. 12043 (line 10) 12044 * %left <1>: Table of Symbols. (line 158) 12045 * %left <2>: Decl Summary. (line 21) 12046 * %left: Using Precedence. (line 6) 12047 * %lex-param <1>: Pure Calling. (line 31) 12048 * %lex-param <2>: Table of Symbols. (line 162) 12049 * %lex-param: Java Declarations Summary. 12050 (line 13) 12051 * %locations: Decl Summary. (line 131) 12052 * %merge <1>: Merging GLR Parses. (line 6) 12053 * %merge: Table of Symbols. (line 167) 12054 * %name-prefix <1>: Table of Symbols. (line 174) 12055 * %name-prefix: Java Declarations Summary. 12056 (line 19) 12057 * %no-lines <1>: Decl Summary. (line 138) 12058 * %no-lines: Table of Symbols. (line 191) 12059 * %nonassoc <1>: Decl Summary. (line 25) 12060 * %nonassoc <2>: Table of Symbols. (line 195) 12061 * %nonassoc <3>: LR Table Construction. 12062 (line 103) 12063 * %nonassoc <4>: Using Precedence. (line 6) 12064 * %nonassoc: Default Reductions. (line 6) 12065 * %output <1>: Decl Summary. (line 147) 12066 * %output: Table of Symbols. (line 199) 12067 * %parse-param <1>: Table of Symbols. (line 203) 12068 * %parse-param <2>: Parser Function. (line 36) 12069 * %parse-param <3>: Java Declarations Summary. 12070 (line 24) 12071 * %parse-param: Parser Function. (line 36) 12072 * %prec <1>: Contextual Precedence. 12073 (line 6) 12074 * %prec: Table of Symbols. (line 208) 12075 * %printer: Printer Decl. (line 6) 12076 * %pure-parser <1>: Table of Symbols. (line 212) 12077 * %pure-parser: Decl Summary. (line 150) 12078 * %require <1>: Decl Summary. (line 155) 12079 * %require <2>: Table of Symbols. (line 217) 12080 * %require: Require Decl. (line 6) 12081 * %right <1>: Decl Summary. (line 17) 12082 * %right <2>: Using Precedence. (line 6) 12083 * %right: Table of Symbols. (line 221) 12084 * %skeleton <1>: Decl Summary. (line 159) 12085 * %skeleton: Table of Symbols. (line 225) 12086 * %start <1>: Start Decl. (line 6) 12087 * %start <2>: Table of Symbols. (line 229) 12088 * %start: Decl Summary. (line 34) 12089 * %token <1>: Table of Symbols. (line 233) 12090 * %token <2>: Java Declarations Summary. 12091 (line 29) 12092 * %token <3>: Token Decl. (line 6) 12093 * %token: Decl Summary. (line 13) 12094 * %token-table <1>: Decl Summary. (line 167) 12095 * %token-table: Table of Symbols. (line 237) 12096 * %type <1>: Type Decl. (line 6) 12097 * %type <2>: Table of Symbols. (line 241) 12098 * %type <3>: Java Declarations Summary. 12099 (line 33) 12100 * %type: Decl Summary. (line 30) 12101 * %union <1>: Table of Symbols. (line 250) 12102 * %union <2>: Decl Summary. (line 9) 12103 * %union: Union Decl. (line 6) 12104 * %verbose: Decl Summary. (line 200) 12105 * %yacc: Decl Summary. (line 206) 12106 * /*: Table of Symbols. (line 53) 12107 * /* ... */: Grammar Outline. (line 6) 12108 * //: Table of Symbols. (line 54) 12109 * // ...: Grammar Outline. (line 6) 12110 * :: Table of Symbols. (line 57) 12111 * ;: Table of Symbols. (line 61) 12112 * <*> <1>: Printer Decl. (line 6) 12113 * <*> <2>: Destructor Decl. (line 6) 12114 * <*>: Table of Symbols. (line 68) 12115 * <> <1>: Table of Symbols. (line 77) 12116 * <> <2>: Printer Decl. (line 6) 12117 * <>: Destructor Decl. (line 6) 12118 * @$ <1>: Action Features. (line 98) 12119 * @$ <2>: Table of Symbols. (line 7) 12120 * @$ <3>: Actions and Locations. 12121 (line 6) 12122 * @$: Java Action Features. 12123 (line 39) 12124 * @[: Table of Symbols. (line 21) 12125 * @[NAME]: Actions and Locations. 12126 (line 6) 12127 * @N <1>: Action Features. (line 104) 12128 * @N <2>: Table of Symbols. (line 12) 12129 * @N <3>: Action Features. (line 104) 12130 * @N <4>: Actions and Locations. 12131 (line 6) 12132 * @N <5>: Mid-Rule Action Translation. 12133 (line 6) 12134 * @N: Java Action Features. 12135 (line 35) 12136 * @NAME <1>: Actions and Locations. 12137 (line 6) 12138 * @NAME: Table of Symbols. (line 20) 12139 * abstract syntax tree: Implementing Gotos/Loops. 12140 (line 17) 12141 * accepting state: Understanding. (line 177) 12142 * action: Actions. (line 6) 12143 * action data types: Action Types. (line 6) 12144 * action features summary: Action Features. (line 6) 12145 * actions in mid-rule <1>: Mid-Rule Actions. (line 6) 12146 * actions in mid-rule: Destructor Decl. (line 88) 12147 * actions, location: Actions and Locations. 12148 (line 6) 12149 * actions, semantic: Semantic Actions. (line 6) 12150 * additional C code section: Epilogue. (line 6) 12151 * algorithm of parser: Algorithm. (line 6) 12152 * ambiguous grammars <1>: Generalized LR Parsing. 12153 (line 6) 12154 * ambiguous grammars: Language and Grammar. 12155 (line 34) 12156 * associativity: Why Precedence. (line 34) 12157 * AST: Implementing Gotos/Loops. 12158 (line 17) 12159 * Backus-Naur form: Language and Grammar. 12160 (line 16) 12161 * begin of Location: Java Location Values. 12162 (line 21) 12163 * begin of location: C++ location. (line 22) 12164 * Bison declaration summary: Decl Summary. (line 6) 12165 * Bison declarations: Declarations. (line 6) 12166 * Bison declarations (introduction): Bison Declarations. (line 6) 12167 * Bison grammar: Grammar in Bison. (line 6) 12168 * Bison invocation: Invocation. (line 6) 12169 * Bison parser: Bison Parser. (line 6) 12170 * Bison parser algorithm: Algorithm. (line 6) 12171 * Bison symbols, table of: Table of Symbols. (line 6) 12172 * Bison utility: Bison Parser. (line 6) 12173 * bison-i18n.m4: Internationalization. 12174 (line 20) 12175 * bison-po: Internationalization. 12176 (line 6) 12177 * BISON_I18N: Internationalization. 12178 (line 27) 12179 * BISON_LOCALEDIR: Internationalization. 12180 (line 27) 12181 * BNF: Language and Grammar. 12182 (line 16) 12183 * braced code: Rules. (line 29) 12184 * C code, section for additional: Epilogue. (line 6) 12185 * C-language interface: Interface. (line 6) 12186 * calc: Infix Calc. (line 6) 12187 * calculator, infix notation: Infix Calc. (line 6) 12188 * calculator, location tracking: Location Tracking Calc. 12189 (line 6) 12190 * calculator, multi-function: Multi-function Calc. (line 6) 12191 * calculator, simple: RPN Calc. (line 6) 12192 * canonical LR <1>: Mysterious Conflicts. 12193 (line 43) 12194 * canonical LR: LR Table Construction. 12195 (line 6) 12196 * character token: Symbols. (line 37) 12197 * column of position: C++ position. (line 29) 12198 * columns on location: C++ location. (line 26) 12199 * columns on position: C++ position. (line 32) 12200 * comment: Grammar Outline. (line 6) 12201 * compiling the parser: Rpcalc Compile. (line 6) 12202 * conflicts <1>: Merging GLR Parses. (line 6) 12203 * conflicts <2>: Shift/Reduce. (line 6) 12204 * conflicts <3>: GLR Parsers. (line 6) 12205 * conflicts: Simple GLR Parsers. (line 6) 12206 * conflicts, reduce/reduce: Reduce/Reduce. (line 6) 12207 * conflicts, suppressing warnings of: Expect Decl. (line 6) 12208 * consistent states: Default Reductions. (line 17) 12209 * context-dependent precedence: Contextual Precedence. 12210 (line 6) 12211 * context-free grammar: Language and Grammar. 12212 (line 6) 12213 * controlling function: Rpcalc Main. (line 6) 12214 * core, item set: Understanding. (line 124) 12215 * dangling else: Shift/Reduce. (line 6) 12216 * data type of locations: Location Type. (line 6) 12217 * data types in actions: Action Types. (line 6) 12218 * data types of semantic values: Value Type. (line 6) 12219 * debug_level on parser: C++ Parser Interface. 12220 (line 42) 12221 * debug_stream on parser: C++ Parser Interface. 12222 (line 37) 12223 * debugging: Tracing. (line 6) 12224 * declaration summary: Decl Summary. (line 6) 12225 * declarations: Prologue. (line 6) 12226 * declarations section: Prologue. (line 6) 12227 * declarations, Bison: Declarations. (line 6) 12228 * declarations, Bison (introduction): Bison Declarations. (line 6) 12229 * declaring literal string tokens: Token Decl. (line 6) 12230 * declaring operator precedence: Precedence Decl. (line 6) 12231 * declaring the start symbol: Start Decl. (line 6) 12232 * declaring token type names: Token Decl. (line 6) 12233 * declaring value types: Union Decl. (line 6) 12234 * declaring value types, nonterminals: Type Decl. (line 6) 12235 * default action: Actions. (line 62) 12236 * default data type: Value Type. (line 6) 12237 * default location type: Location Type. (line 6) 12238 * default reductions: Default Reductions. (line 6) 12239 * default stack limit: Memory Management. (line 30) 12240 * default start symbol: Start Decl. (line 6) 12241 * defaulted states: Default Reductions. (line 17) 12242 * deferred semantic actions: GLR Semantic Actions. 12243 (line 6) 12244 * defining language semantics: Semantics. (line 6) 12245 * delayed syntax error detection <1>: Default Reductions. (line 43) 12246 * delayed syntax error detection: LR Table Construction. 12247 (line 103) 12248 * delayed yylex invocations: Default Reductions. (line 17) 12249 * discarded symbols: Destructor Decl. (line 98) 12250 * discarded symbols, mid-rule actions: Using Mid-Rule Actions. 12251 (line 76) 12252 * dot: Graphviz. (line 6) 12253 * else, dangling: Shift/Reduce. (line 6) 12254 * end of Location: Java Location Values. 12255 (line 22) 12256 * end of location: C++ location. (line 23) 12257 * epilogue: Epilogue. (line 6) 12258 * error <1>: Table of Symbols. (line 128) 12259 * error: Error Recovery. (line 20) 12260 * error on parser: C++ Parser Interface. 12261 (line 48) 12262 * error recovery: Error Recovery. (line 6) 12263 * error recovery, mid-rule actions: Using Mid-Rule Actions. 12264 (line 76) 12265 * error recovery, simple: Simple Error Recovery. 12266 (line 6) 12267 * error reporting function: Error Reporting. (line 6) 12268 * error reporting routine: Rpcalc Error. (line 6) 12269 * examples, simple: Examples. (line 6) 12270 * exceptions: C++ Parser Interface. 12271 (line 32) 12272 * exercises: Exercises. (line 6) 12273 * file format: Grammar Layout. (line 6) 12274 * file of position: C++ position. (line 17) 12275 * finite-state machine: Parser States. (line 6) 12276 * formal grammar: Grammar in Bison. (line 6) 12277 * format of grammar file: Grammar Layout. (line 6) 12278 * freeing discarded symbols: Destructor Decl. (line 6) 12279 * frequently asked questions: FAQ. (line 6) 12280 * generalized LR (GLR) parsing <1>: Language and Grammar. 12281 (line 34) 12282 * generalized LR (GLR) parsing <2>: Generalized LR Parsing. 12283 (line 6) 12284 * generalized LR (GLR) parsing: GLR Parsers. (line 6) 12285 * generalized LR (GLR) parsing, ambiguous grammars: Merging GLR Parses. 12286 (line 6) 12287 * generalized LR (GLR) parsing, unambiguous grammars: Simple GLR Parsers. 12288 (line 6) 12289 * getDebugLevel on YYParser: Java Parser Interface. 12290 (line 67) 12291 * getDebugStream on YYParser: Java Parser Interface. 12292 (line 62) 12293 * getEndPos on Lexer: Java Scanner Interface. 12294 (line 40) 12295 * getLVal on Lexer: Java Scanner Interface. 12296 (line 48) 12297 * getStartPos on Lexer: Java Scanner Interface. 12298 (line 39) 12299 * gettext: Internationalization. 12300 (line 6) 12301 * glossary: Glossary. (line 6) 12302 * GLR parsers and inline: Compiler Requirements. 12303 (line 6) 12304 * GLR parsers and yychar: GLR Semantic Actions. 12305 (line 10) 12306 * GLR parsers and yyclearin: GLR Semantic Actions. 12307 (line 18) 12308 * GLR parsers and YYERROR: GLR Semantic Actions. 12309 (line 28) 12310 * GLR parsers and yylloc: GLR Semantic Actions. 12311 (line 10) 12312 * GLR parsers and YYLLOC_DEFAULT: Location Default Action. 12313 (line 6) 12314 * GLR parsers and yylval: GLR Semantic Actions. 12315 (line 10) 12316 * GLR parsing <1>: Generalized LR Parsing. 12317 (line 6) 12318 * GLR parsing <2>: GLR Parsers. (line 6) 12319 * GLR parsing: Language and Grammar. 12320 (line 34) 12321 * GLR parsing, ambiguous grammars: Merging GLR Parses. (line 6) 12322 * GLR parsing, unambiguous grammars: Simple GLR Parsers. (line 6) 12323 * GLR with LALR: LR Table Construction. 12324 (line 65) 12325 * grammar file: Grammar Layout. (line 6) 12326 * grammar rule syntax: Rules. (line 6) 12327 * grammar rules section: Grammar Rules. (line 6) 12328 * grammar, Bison: Grammar in Bison. (line 6) 12329 * grammar, context-free: Language and Grammar. 12330 (line 6) 12331 * grouping, syntactic: Language and Grammar. 12332 (line 48) 12333 * Header guard: Decl Summary. (line 98) 12334 * i18n: Internationalization. 12335 (line 6) 12336 * IELR <1>: LR Table Construction. 12337 (line 6) 12338 * IELR: Mysterious Conflicts. 12339 (line 43) 12340 * IELR grammars: Language and Grammar. 12341 (line 22) 12342 * infix notation calculator: Infix Calc. (line 6) 12343 * initialize on location: C++ location. (line 19) 12344 * initialize on position: C++ position. (line 14) 12345 * inline: Compiler Requirements. 12346 (line 6) 12347 * interface: Interface. (line 6) 12348 * internationalization: Internationalization. 12349 (line 6) 12350 * introduction: Introduction. (line 6) 12351 * invoking Bison: Invocation. (line 6) 12352 * item: Understanding. (line 102) 12353 * item set core: Understanding. (line 124) 12354 * kernel, item set: Understanding. (line 124) 12355 * LAC <1>: LR Table Construction. 12356 (line 103) 12357 * LAC <2>: LAC. (line 6) 12358 * LAC: Default Reductions. (line 54) 12359 * LALR <1>: LR Table Construction. 12360 (line 6) 12361 * LALR: Mysterious Conflicts. 12362 (line 31) 12363 * LALR grammars: Language and Grammar. 12364 (line 22) 12365 * language semantics, defining: Semantics. (line 6) 12366 * layout of Bison grammar: Grammar Layout. (line 6) 12367 * left recursion: Recursion. (line 17) 12368 * lex-param: Pure Calling. (line 31) 12369 * lexical analyzer: Lexical. (line 6) 12370 * lexical analyzer, purpose: Bison Parser. (line 6) 12371 * lexical analyzer, writing: Rpcalc Lexer. (line 6) 12372 * lexical tie-in: Lexical Tie-ins. (line 6) 12373 * line of position: C++ position. (line 23) 12374 * lines on location: C++ location. (line 27) 12375 * lines on position: C++ position. (line 26) 12376 * literal string token: Symbols. (line 59) 12377 * literal token: Symbols. (line 37) 12378 * location <1>: Locations. (line 6) 12379 * location: Tracking Locations. (line 6) 12380 * location actions: Actions and Locations. 12381 (line 6) 12382 * location on location: C++ location. (line 8) 12383 * Location on Location: Java Location Values. 12384 (line 25) 12385 * location on location: C++ location. (line 12) 12386 * location tracking calculator: Location Tracking Calc. 12387 (line 6) 12388 * location, textual <1>: Tracking Locations. (line 6) 12389 * location, textual: Locations. (line 6) 12390 * location_type: C++ Parser Interface. 12391 (line 16) 12392 * lookahead correction: LAC. (line 6) 12393 * lookahead token: Lookahead. (line 6) 12394 * LR: Mysterious Conflicts. 12395 (line 31) 12396 * LR grammars: Language and Grammar. 12397 (line 22) 12398 * ltcalc: Location Tracking Calc. 12399 (line 6) 12400 * main function in simple example: Rpcalc Main. (line 6) 12401 * memory exhaustion: Memory Management. (line 6) 12402 * memory management: Memory Management. (line 6) 12403 * mfcalc: Multi-function Calc. (line 6) 12404 * mid-rule actions <1>: Destructor Decl. (line 88) 12405 * mid-rule actions: Mid-Rule Actions. (line 6) 12406 * multi-function calculator: Multi-function Calc. (line 6) 12407 * multicharacter literal: Symbols. (line 59) 12408 * mutual recursion: Recursion. (line 34) 12409 * Mysterious Conflict: LR Table Construction. 12410 (line 6) 12411 * Mysterious Conflicts: Mysterious Conflicts. 12412 (line 6) 12413 * named references: Named References. (line 6) 12414 * NLS: Internationalization. 12415 (line 6) 12416 * nondeterministic parsing <1>: Language and Grammar. 12417 (line 34) 12418 * nondeterministic parsing: Generalized LR Parsing. 12419 (line 6) 12420 * nonterminal symbol: Symbols. (line 6) 12421 * nonterminal, useless: Understanding. (line 48) 12422 * operator precedence: Precedence. (line 6) 12423 * operator precedence, declaring: Precedence Decl. (line 6) 12424 * operator!= on location: C++ location. (line 39) 12425 * operator!= on position: C++ position. (line 42) 12426 * operator+ on location: C++ location. (line 31) 12427 * operator+ on position: C++ position. (line 36) 12428 * operator+= on location: C++ location. (line 32) 12429 * operator+= on position: C++ position. (line 35) 12430 * operator- on position: C++ position. (line 38) 12431 * operator-= on position: C++ position. (line 37) 12432 * operator<< <1>: C++ position. (line 46) 12433 * operator<<: C++ location. (line 44) 12434 * operator== on location: C++ location. (line 38) 12435 * operator== on position: C++ position. (line 41) 12436 * options for invoking Bison: Invocation. (line 6) 12437 * overflow of parser stack: Memory Management. (line 6) 12438 * parse error: Error Reporting. (line 6) 12439 * parse on parser: C++ Parser Interface. 12440 (line 30) 12441 * parse on YYParser: Java Parser Interface. 12442 (line 54) 12443 * parser: Bison Parser. (line 6) 12444 * parser on parser: C++ Parser Interface. 12445 (line 26) 12446 * parser stack: Algorithm. (line 6) 12447 * parser stack overflow: Memory Management. (line 6) 12448 * parser state: Parser States. (line 6) 12449 * pointed rule: Understanding. (line 102) 12450 * polish notation calculator: RPN Calc. (line 6) 12451 * position on position: C++ position. (line 8) 12452 * precedence declarations: Precedence Decl. (line 6) 12453 * precedence of operators: Precedence. (line 6) 12454 * precedence, context-dependent: Contextual Precedence. 12455 (line 6) 12456 * precedence, unary operator: Contextual Precedence. 12457 (line 6) 12458 * preventing warnings about conflicts: Expect Decl. (line 6) 12459 * printing semantic values: Printer Decl. (line 6) 12460 * Prologue <1>: %code Summary. (line 6) 12461 * Prologue: Prologue. (line 6) 12462 * Prologue Alternatives: Prologue Alternatives. 12463 (line 6) 12464 * pure parser: Pure Decl. (line 6) 12465 * push parser: Push Decl. (line 6) 12466 * questions: FAQ. (line 6) 12467 * recovering: Java Action Features. 12468 (line 55) 12469 * recovering on YYParser: Java Parser Interface. 12470 (line 58) 12471 * recovery from errors: Error Recovery. (line 6) 12472 * recursive rule: Recursion. (line 6) 12473 * reduce/reduce conflict: Reduce/Reduce. (line 6) 12474 * reduce/reduce conflicts <1>: Simple GLR Parsers. (line 6) 12475 * reduce/reduce conflicts <2>: GLR Parsers. (line 6) 12476 * reduce/reduce conflicts: Merging GLR Parses. (line 6) 12477 * reduction: Algorithm. (line 6) 12478 * reentrant parser: Pure Decl. (line 6) 12479 * requiring a version of Bison: Require Decl. (line 6) 12480 * reverse polish notation: RPN Calc. (line 6) 12481 * right recursion: Recursion. (line 17) 12482 * rpcalc: RPN Calc. (line 6) 12483 * rule syntax: Rules. (line 6) 12484 * rule, pointed: Understanding. (line 102) 12485 * rule, useless: Understanding. (line 48) 12486 * rules section for grammar: Grammar Rules. (line 6) 12487 * running Bison (introduction): Rpcalc Generate. (line 6) 12488 * semantic actions: Semantic Actions. (line 6) 12489 * semantic value: Semantic Values. (line 6) 12490 * semantic value type: Value Type. (line 6) 12491 * semantic_type: C++ Parser Interface. 12492 (line 15) 12493 * set_debug_level on parser: C++ Parser Interface. 12494 (line 43) 12495 * set_debug_stream on parser: C++ Parser Interface. 12496 (line 38) 12497 * setDebugLevel on YYParser: Java Parser Interface. 12498 (line 68) 12499 * setDebugStream on YYParser: Java Parser Interface. 12500 (line 63) 12501 * shift/reduce conflicts <1>: Simple GLR Parsers. (line 6) 12502 * shift/reduce conflicts <2>: GLR Parsers. (line 6) 12503 * shift/reduce conflicts: Shift/Reduce. (line 6) 12504 * shifting: Algorithm. (line 6) 12505 * simple examples: Examples. (line 6) 12506 * single-character literal: Symbols. (line 37) 12507 * stack overflow: Memory Management. (line 6) 12508 * stack, parser: Algorithm. (line 6) 12509 * stages in using Bison: Stages. (line 6) 12510 * start symbol: Language and Grammar. 12511 (line 97) 12512 * start symbol, declaring: Start Decl. (line 6) 12513 * state (of parser): Parser States. (line 6) 12514 * step on location: C++ location. (line 35) 12515 * string token: Symbols. (line 59) 12516 * summary, action features: Action Features. (line 6) 12517 * summary, Bison declaration: Decl Summary. (line 6) 12518 * suppressing conflict warnings: Expect Decl. (line 6) 12519 * symbol: Symbols. (line 6) 12520 * symbol table example: Mfcalc Symbol Table. (line 6) 12521 * symbols (abstract): Language and Grammar. 12522 (line 48) 12523 * symbols in Bison, table of: Table of Symbols. (line 6) 12524 * syntactic grouping: Language and Grammar. 12525 (line 48) 12526 * syntax error: Error Reporting. (line 6) 12527 * syntax of grammar rules: Rules. (line 6) 12528 * terminal symbol: Symbols. (line 6) 12529 * textual location <1>: Tracking Locations. (line 6) 12530 * textual location: Locations. (line 6) 12531 * token <1>: Language and Grammar. 12532 (line 48) 12533 * token: C++ Parser Interface. 12534 (line 19) 12535 * token type: Symbols. (line 6) 12536 * token type names, declaring: Token Decl. (line 6) 12537 * token, useless: Understanding. (line 48) 12538 * toString on Location: Java Location Values. 12539 (line 32) 12540 * tracing the parser: Tracing. (line 6) 12541 * uint: C++ Location Values. (line 15) 12542 * unary operator precedence: Contextual Precedence. 12543 (line 6) 12544 * unreachable states: Unreachable States. (line 6) 12545 * useless nonterminal: Understanding. (line 48) 12546 * useless rule: Understanding. (line 48) 12547 * useless token: Understanding. (line 48) 12548 * using Bison: Stages. (line 6) 12549 * value type, semantic: Value Type. (line 6) 12550 * value types, declaring: Union Decl. (line 6) 12551 * value types, nonterminals, declaring: Type Decl. (line 6) 12552 * value, semantic: Semantic Values. (line 6) 12553 * version requirement: Require Decl. (line 6) 12554 * warnings, preventing: Expect Decl. (line 6) 12555 * writing a lexical analyzer: Rpcalc Lexer. (line 6) 12556 * xml: Xml. (line 6) 12557 * YYABORT <1>: Parser Function. (line 29) 12558 * YYABORT <2>: Table of Symbols. (line 254) 12559 * YYABORT <3>: Action Features. (line 28) 12560 * YYABORT: Java Action Features. 12561 (line 43) 12562 * YYACCEPT <1>: Table of Symbols. (line 263) 12563 * YYACCEPT <2>: Parser Function. (line 26) 12564 * YYACCEPT <3>: Java Action Features. 12565 (line 47) 12566 * YYACCEPT <4>: Parser Function. (line 26) 12567 * YYACCEPT: Action Features. (line 32) 12568 * YYBACKUP <1>: Table of Symbols. (line 271) 12569 * YYBACKUP: Action Features. (line 36) 12570 * yychar <1>: Action Features. (line 69) 12571 * yychar <2>: GLR Semantic Actions. 12572 (line 10) 12573 * yychar <3>: Lookahead. (line 49) 12574 * yychar: Table of Symbols. (line 275) 12575 * yyclearin <1>: GLR Semantic Actions. 12576 (line 18) 12577 * yyclearin <2>: Error Recovery. (line 99) 12578 * yyclearin <3>: Action Features. (line 76) 12579 * yyclearin: Table of Symbols. (line 281) 12580 * YYDEBUG: Enabling Traces. (line 9) 12581 * yydebug: Tracing. (line 6) 12582 * YYDEBUG: Table of Symbols. (line 285) 12583 * yydebug: Table of Symbols. (line 289) 12584 * YYEMPTY: Action Features. (line 49) 12585 * YYENABLE_NLS: Internationalization. 12586 (line 27) 12587 * YYEOF: Action Features. (line 52) 12588 * yyerrok <1>: Action Features. (line 81) 12589 * yyerrok <2>: Table of Symbols. (line 294) 12590 * yyerrok: Error Recovery. (line 94) 12591 * yyerror: Java Action Features. 12592 (line 60) 12593 * YYERROR <1>: Java Action Features. 12594 (line 51) 12595 * YYERROR: Action Features. (line 56) 12596 * yyerror <1>: Error Reporting. (line 6) 12597 * yyerror: Table of Symbols. (line 309) 12598 * YYERROR <1>: Table of Symbols. (line 298) 12599 * YYERROR: GLR Semantic Actions. 12600 (line 28) 12601 * yyerror: Java Action Features. 12602 (line 61) 12603 * yyerror on Lexer: Java Scanner Interface. 12604 (line 25) 12605 * YYERROR_VERBOSE: Table of Symbols. (line 313) 12606 * YYFPRINTF <1>: Enabling Traces. (line 36) 12607 * YYFPRINTF: Table of Symbols. (line 321) 12608 * YYINITDEPTH <1>: Memory Management. (line 32) 12609 * YYINITDEPTH: Table of Symbols. (line 324) 12610 * yylex <1>: Lexical. (line 6) 12611 * yylex: Table of Symbols. (line 328) 12612 * yylex on Lexer: Java Scanner Interface. 12613 (line 31) 12614 * yylex on parser: C++ Scanner Interface. 12615 (line 13) 12616 * YYLEX_PARAM: Table of Symbols. (line 333) 12617 * yylloc <1>: GLR Semantic Actions. 12618 (line 10) 12619 * yylloc <2>: Actions and Locations. 12620 (line 67) 12621 * yylloc <3>: Lookahead. (line 49) 12622 * yylloc <4>: Action Features. (line 86) 12623 * yylloc <5>: Table of Symbols. (line 339) 12624 * yylloc: Token Locations. (line 6) 12625 * YYLLOC_DEFAULT: Location Default Action. 12626 (line 6) 12627 * YYLTYPE <1>: Table of Symbols. (line 349) 12628 * YYLTYPE: Token Locations. (line 19) 12629 * yylval <1>: Token Values. (line 6) 12630 * yylval <2>: Lookahead. (line 49) 12631 * yylval <3>: Table of Symbols. (line 353) 12632 * yylval <4>: Actions. (line 87) 12633 * yylval <5>: GLR Semantic Actions. 12634 (line 10) 12635 * yylval: Action Features. (line 92) 12636 * YYMAXDEPTH <1>: Memory Management. (line 14) 12637 * YYMAXDEPTH: Table of Symbols. (line 361) 12638 * yynerrs <1>: Table of Symbols. (line 365) 12639 * yynerrs: Error Reporting. (line 69) 12640 * yyoutput: Printer Decl. (line 16) 12641 * yyparse <1>: Parser Function. (line 13) 12642 * yyparse: Table of Symbols. (line 371) 12643 * YYPARSE_PARAM: Table of Symbols. (line 409) 12644 * YYParser on YYParser: Java Parser Interface. 12645 (line 41) 12646 * YYPRINT <1>: The YYPRINT Macro. (line 6) 12647 * YYPRINT <2>: Table of Symbols. (line 375) 12648 * YYPRINT: The YYPRINT Macro. (line 11) 12649 * yypstate_delete <1>: Table of Symbols. (line 380) 12650 * yypstate_delete: Parser Delete Function. 12651 (line 15) 12652 * yypstate_new <1>: Parser Create Function. 12653 (line 15) 12654 * yypstate_new <2>: Table of Symbols. (line 388) 12655 * yypstate_new: Parser Create Function. 12656 (line 6) 12657 * yypull_parse <1>: Table of Symbols. (line 395) 12658 * yypull_parse: Pull Parser Function. 12659 (line 14) 12660 * yypush_parse <1>: Table of Symbols. (line 402) 12661 * yypush_parse: Push Parser Function. 12662 (line 6) 12663 * YYRECOVERING <1>: Error Recovery. (line 111) 12664 * YYRECOVERING <2>: Action Features. (line 64) 12665 * YYRECOVERING: Table of Symbols. (line 415) 12666 * YYSTACK_USE_ALLOCA: Table of Symbols. (line 420) 12667 * YYSTYPE: Table of Symbols. (line 436) 12668 * | <1>: Rules. (line 48) 12669 * |: Table of Symbols. (line 64) 12670 12671 12672 12673 Tag Table: 12674 Node: Top1060 12675 Node: Introduction15110 12676 Node: Conditions16781 12677 Node: Copying18692 12678 Node: Concepts56230 12679 Node: Language and Grammar57422 12680 Node: Grammar in Bison63357 12681 Node: Semantic Values65271 12682 Node: Semantic Actions67377 12683 Node: GLR Parsers68551 12684 Node: Simple GLR Parsers71296 12685 Node: Merging GLR Parses77709 12686 Node: GLR Semantic Actions82251 12687 Node: Compiler Requirements84145 12688 Node: Locations84897 12689 Node: Bison Parser86350 12690 Node: Stages89471 12691 Node: Grammar Layout90759 12692 Node: Examples92091 12693 Node: RPN Calc93287 12694 Node: Rpcalc Declarations94289 12695 Node: Rpcalc Rules96217 12696 Node: Rpcalc Input97950 12697 Node: Rpcalc Line99421 12698 Node: Rpcalc Expr100542 12699 Node: Rpcalc Lexer102431 12700 Node: Rpcalc Main105033 12701 Node: Rpcalc Error105440 12702 Node: Rpcalc Generate106473 12703 Node: Rpcalc Compile107709 12704 Node: Infix Calc108633 12705 Node: Simple Error Recovery111347 12706 Node: Location Tracking Calc113229 12707 Node: Ltcalc Declarations113925 12708 Node: Ltcalc Rules115014 12709 Node: Ltcalc Lexer116879 12710 Node: Multi-function Calc119202 12711 Node: Mfcalc Declarations120778 12712 Node: Mfcalc Rules122790 12713 Node: Mfcalc Symbol Table124112 12714 Node: Exercises130956 12715 Node: Grammar File131470 12716 Node: Grammar Outline132378 12717 Node: Prologue133228 12718 Node: Prologue Alternatives135032 12719 Node: Bison Declarations144604 12720 Node: Grammar Rules145032 12721 Node: Epilogue145503 12722 Node: Symbols146548 12723 Node: Rules153623 12724 Node: Recursion156043 12725 Node: Semantics157718 12726 Node: Value Type158826 12727 Node: Multiple Types159661 12728 Node: Actions160828 12729 Node: Action Types164641 12730 Node: Mid-Rule Actions165935 12731 Node: Using Mid-Rule Actions166519 12732 Node: Mid-Rule Action Translation170599 12733 Node: Mid-Rule Conflicts172475 12734 Node: Tracking Locations175101 12735 Node: Location Type175765 12736 Node: Actions and Locations176785 12737 Node: Location Default Action179235 12738 Node: Named References182733 12739 Node: Declarations185369 12740 Node: Require Decl187106 12741 Node: Token Decl187425 12742 Node: Precedence Decl189851 12743 Node: Union Decl191861 12744 Node: Type Decl193635 12745 Node: Initial Action Decl194561 12746 Node: Destructor Decl195347 12747 Node: Printer Decl201012 12748 Node: Expect Decl203308 12749 Node: Start Decl205304 12750 Node: Pure Decl205694 12751 Node: Push Decl207449 12752 Node: Decl Summary211940 12753 Node: %define Summary220627 12754 Node: %code Summary227657 12755 Node: Multiple Parsers231375 12756 Node: Interface234444 12757 Node: Parser Function235762 12758 Node: Push Parser Function238161 12759 Node: Pull Parser Function238947 12760 Node: Parser Create Function239596 12761 Node: Parser Delete Function240416 12762 Node: Lexical241183 12763 Node: Calling Convention242626 12764 Node: Token Values245601 12765 Node: Token Locations246765 12766 Node: Pure Calling247649 12767 Node: Error Reporting249196 12768 Node: Action Features252491 12769 Node: Internationalization256792 12770 Node: Algorithm259335 12771 Node: Lookahead261765 12772 Node: Shift/Reduce263942 12773 Node: Precedence267119 12774 Node: Why Precedence267837 12775 Node: Using Precedence269673 12776 Node: Precedence Examples270650 12777 Node: How Precedence271167 12778 Node: Non Operators272346 12779 Node: Contextual Precedence273904 12780 Node: Parser States275682 12781 Node: Reduce/Reduce276926 12782 Node: Mysterious Conflicts281576 12783 Node: Tuning LR285082 12784 Node: LR Table Construction286390 12785 Node: Default Reductions292078 12786 Node: LAC296912 12787 Node: Unreachable States302464 12788 Node: Generalized LR Parsing304455 12789 Node: Memory Management308829 12790 Node: Error Recovery311061 12791 Node: Context Dependency316316 12792 Node: Semantic Tokens317165 12793 Node: Lexical Tie-ins320157 12794 Node: Tie-in Recovery321595 12795 Node: Debugging323689 12796 Node: Understanding324928 12797 Node: Graphviz335514 12798 Node: Xml339875 12799 Node: Tracing341593 12800 Node: Enabling Traces342027 12801 Node: Mfcalc Traces345489 12802 Node: The YYPRINT Macro350761 12803 Node: Invocation351924 12804 Node: Bison Options353339 12805 Node: Option Cross Key363832 12806 Node: Yacc Library365755 12807 Node: Other Languages366580 12808 Node: C++ Parsers366907 12809 Node: C++ Bison Interface367404 12810 Node: C++ Semantic Values368793 12811 Ref: C++ Semantic Values-Footnote-1369735 12812 Node: C++ Location Values369888 12813 Node: C++ position370813 12814 Node: C++ location372689 12815 Node: User Defined Location Type374445 12816 Node: C++ Parser Interface375944 12817 Node: C++ Scanner Interface378176 12818 Node: A Complete C++ Example378877 12819 Node: Calc++ --- C++ Calculator379819 12820 Node: Calc++ Parsing Driver380333 12821 Node: Calc++ Parser384114 12822 Node: Calc++ Scanner387938 12823 Node: Calc++ Top Level391463 12824 Node: Java Parsers392119 12825 Node: Java Bison Interface392796 12826 Node: Java Semantic Values394847 12827 Node: Java Location Values396461 12828 Node: Java Parser Interface398017 12829 Node: Java Scanner Interface401255 12830 Node: Java Action Features403452 12831 Node: Java Differences406075 12832 Ref: Java Differences-Footnote-1408642 12833 Node: Java Declarations Summary408792 12834 Node: FAQ413114 12835 Node: Memory Exhausted414061 12836 Node: How Can I Reset the Parser414374 12837 Node: Strings are Destroyed416915 12838 Node: Implementing Gotos/Loops418588 12839 Node: Multiple start-symbols419871 12840 Node: Secure? Conform?421418 12841 Node: I can't build Bison421866 12842 Node: Where can I find help?422580 12843 Node: Bug Reports423373 12844 Node: More Languages424833 12845 Node: Beta Testing425191 12846 Node: Mailing Lists426065 12847 Node: Table of Symbols426276 12848 Node: Glossary443560 12849 Node: Copying This Manual452564 12850 Node: Bibliography477699 12851 Node: Index of Terms479590 12852 12853 End Tag Table 12854