This is bison.info, produced by makeinfo version 4.8 from bison.texinfo.

   This manual is for GNU Bison (version 2.3, 30 May 2006), the GNU
parser generator.

   Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999,
2000, 2001, 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc.

     Permission is granted to copy, distribute and/or modify this
     document under the terms of the GNU Free Documentation License,
     Version 1.2 or any later version published by the Free Software
     Foundation; with no Invariant Sections, with the Front-Cover texts
     being "A GNU Manual," and with the Back-Cover Texts as in (a)
     below. A copy of the license is included in the section entitled
     "GNU Free Documentation License."

     (a) The FSF's Back-Cover Text is: "You have freedom to copy and
     modify this GNU Manual, like GNU software. Copies published by
     the Free Software Foundation raise funds for GNU development."

INFO-DIR-SECTION Software development
START-INFO-DIR-ENTRY
* bison: (bison).       GNU parser generator (Yacc replacement).
END-INFO-DIR-ENTRY


File: bison.info, Node: Top, Next: Introduction, Up: (dir)

Bison
*****

* Menu:

* Introduction::
* Conditions::
* Copying::             The GNU General Public License says
                          how you can copy and share Bison

Tutorial sections:
* Concepts::            Basic concepts for understanding Bison.
* Examples::            Three simple explained examples of using Bison.

Reference sections:
* Grammar File::        Writing Bison declarations and rules.
* Interface::           C-language interface to the parser function `yyparse'.
* Algorithm::           How the Bison parser works at run-time.
* Error Recovery::      Writing rules for error recovery.
* Context Dependency::  What to do if your language syntax is too
                          messy for Bison to handle straightforwardly.
* Debugging::           Understanding or debugging Bison parsers.
* Invocation::          How to run Bison (to produce the parser source file).
* C++ Language Interface:: Creating C++ parser objects.
* FAQ::                 Frequently Asked Questions
* Table of Symbols::    All the keywords of the Bison language are explained.
* Glossary::            Basic concepts are explained.
* Copying This Manual:: License for copying this manual.
* Index::               Cross-references to the text.

 --- The Detailed Node Listing ---

The Concepts of Bison

* Language and Grammar:: Languages and context-free grammars,
                           as mathematical ideas.
* Grammar in Bison::     How we represent grammars for Bison's sake.
* Semantic Values::      Each token or syntactic grouping can have
                           a semantic value (the value of an integer,
                           the name of an identifier, etc.).
* Semantic Actions::     Each rule can have an action containing C code.
* GLR Parsers::          Writing parsers for general context-free languages.
* Locations Overview::   Tracking Locations.
* Bison Parser::         What are Bison's input and output,
                           how is the output used?
* Stages::               Stages in writing and running Bison grammars.
* Grammar Layout::       Overall structure of a Bison grammar file.

Writing GLR Parsers

* Simple GLR Parsers::     Using GLR parsers on unambiguous grammars.
* Merging GLR Parses::     Using GLR parsers to resolve ambiguities.
* GLR Semantic Actions::   Deferred semantic actions have special concerns.
* Compiler Requirements::  GLR parsers require a modern C compiler.

Examples

* RPN Calc::               Reverse Polish notation calculator;
                             a first example with no operator precedence.
* Infix Calc::             Infix (algebraic) notation calculator.
                             Operator precedence is introduced.
* Simple Error Recovery::  Continuing after syntax errors.
* Location Tracking Calc:: Demonstrating the use of @N and @$.
* Multi-function Calc::    Calculator with memory and trig functions.
                             It uses multiple data-types for semantic values.
* Exercises::              Ideas for improving the multi-function calculator.

Reverse Polish Notation Calculator

* Decls: Rpcalc Decls.     Prologue (declarations) for rpcalc.
* Rules: Rpcalc Rules.     Grammar Rules for rpcalc, with explanation.
* Lexer: Rpcalc Lexer.     The lexical analyzer.
* Main: Rpcalc Main.       The controlling function.
* Error: Rpcalc Error.     The error reporting function.
* Gen: Rpcalc Gen.         Running Bison on the grammar file.
* Comp: Rpcalc Compile.    Run the C compiler on the output code.

Grammar Rules for `rpcalc'

* Rpcalc Input::
* Rpcalc Line::
* Rpcalc Expr::

Location Tracking Calculator: `ltcalc'

* Decls: Ltcalc Decls.     Bison and C declarations for ltcalc.
* Rules: Ltcalc Rules.     Grammar rules for ltcalc, with explanations.
* Lexer: Ltcalc Lexer.     The lexical analyzer.

Multi-Function Calculator: `mfcalc'

* Decl: Mfcalc Decl.       Bison declarations for multi-function calculator.
* Rules: Mfcalc Rules.     Grammar rules for the calculator.
* Symtab: Mfcalc Symtab.   Symbol table management subroutines.

Bison Grammar Files

* Grammar Outline::   Overall layout of the grammar file.
* Symbols::           Terminal and nonterminal symbols.
* Rules::             How to write grammar rules.
* Recursion::         Writing recursive rules.
* Semantics::         Semantic values and actions.
* Locations::         Locations and actions.
* Declarations::      All kinds of Bison declarations are described here.
* Multiple Parsers::  Putting more than one Bison parser in one program.

Outline of a Bison Grammar

* Prologue::            Syntax and usage of the prologue.
* Bison Declarations::  Syntax and usage of the Bison declarations section.
* Grammar Rules::       Syntax and usage of the grammar rules section.
* Epilogue::            Syntax and usage of the epilogue.

Defining Language Semantics

* Value Type::        Specifying one data type for all semantic values.
* Multiple Types::    Specifying several alternative data types.
* Actions::           An action is the semantic definition of a grammar rule.
* Action Types::      Specifying data types for actions to operate on.
* Mid-Rule Actions::  Most actions go at the end of a rule.
                        This says when, why and how to use the exceptional
                        action in the middle of a rule.

Tracking Locations

* Location Type::            Specifying a data type for locations.
* Actions and Locations::    Using locations in actions.
* Location Default Action::  Defining a general way to compute locations.

Bison Declarations

* Require Decl::         Requiring a Bison version.
* Token Decl::           Declaring terminal symbols.
* Precedence Decl::      Declaring terminals with precedence and associativity.
* Union Decl::           Declaring the set of all semantic value types.
* Type Decl::            Declaring the choice of type for a nonterminal symbol.
* Initial Action Decl::  Code run before parsing starts.
* Destructor Decl::      Declaring how symbols are freed.
* Expect Decl::          Suppressing warnings about parsing conflicts.
* Start Decl::           Specifying the start symbol.
* Pure Decl::            Requesting a reentrant parser.
* Decl Summary::         Table of all Bison declarations.

Parser C-Language Interface

* Parser Function::       How to call `yyparse' and what it returns.
* Lexical::               You must supply a function `yylex'
                            which reads tokens.
* Error Reporting::       You must supply a function `yyerror'.
* Action Features::       Special features for use in actions.
* Internationalization::  How to let the parser speak in the user's
                            native language.

The Lexical Analyzer Function `yylex'

* Calling Convention::  How `yyparse' calls `yylex'.
* Token Values::        How `yylex' must return the semantic value
                          of the token it has read.
* Token Locations::     How `yylex' must return the text location
                          (line number, etc.) of the token, if the
                          actions want that.
* Pure Calling::        How the calling convention differs
                          in a pure parser (*note A Pure (Reentrant) Parser: Pure Decl.).

The Bison Parser Algorithm

* Look-Ahead::              Parser looks one token ahead when deciding what to do.
* Shift/Reduce::            Conflicts: when either shifting or reduction is valid.
* Precedence::              Operator precedence works by resolving conflicts.
* Contextual Precedence::   When an operator's precedence depends on context.
* Parser States::           The parser is a finite-state machine with a stack.
* Reduce/Reduce::           When two rules are applicable in the same situation.
* Mystery Conflicts::       Reduce/reduce conflicts that look unjustified.
* Generalized LR Parsing::  Parsing arbitrary context-free grammars.
* Memory Management::       What happens when memory is exhausted. How to avoid it.

Operator Precedence

* Why Precedence::       An example showing why precedence is needed.
* Using Precedence::     How to specify precedence in Bison grammars.
* Precedence Examples::  How these features are used in the previous example.
* How Precedence::       How they work.

Handling Context Dependencies

* Semantic Tokens::   Token parsing can depend on the semantic context.
* Lexical Tie-ins::   Token parsing can depend on the syntactic context.
* Tie-in Recovery::   Lexical tie-ins have implications for how
                        error recovery rules must be written.

Debugging Your Parser

* Understanding::     Understanding the structure of your parser.
* Tracing::           Tracing the execution of your parser.

Invoking Bison

* Bison Options::     All the options described in detail,
                        in alphabetical order by short options.
* Option Cross Key::  Alphabetical list of long options.
* Yacc Library::      Yacc-compatible `yylex' and `main'.

C++ Language Interface

* C++ Parsers::             The interface to generate C++ parser classes
* A Complete C++ Example::  Demonstrating their use

C++ Parsers

* C++ Bison Interface::    Asking for C++ parser generation
* C++ Semantic Values::    %union vs. C++
* C++ Location Values::    The position and location classes
* C++ Parser Interface::   Instantiating and running the parser
* C++ Scanner Interface::  Exchanges between yylex and parse

A Complete C++ Example

* Calc++ --- C++ Calculator::  The specifications
* Calc++ Parsing Driver::      An active parsing context
* Calc++ Parser::              A parser class
* Calc++ Scanner::             A pure C++ Flex scanner
* Calc++ Top Level::           Conducting the band

Frequently Asked Questions

* Memory Exhausted::            Breaking the Stack Limits
* How Can I Reset the Parser::  `yyparse' Keeps some State
* Strings are Destroyed::       `yylval' Loses Track of Strings
* Implementing Gotos/Loops::    Control Flow in the Calculator
* Multiple start-symbols::      Factoring closely related grammars
* Secure? Conform?::            Is Bison POSIX safe?
* I can't build Bison::         Troubleshooting
* Where can I find help?::      Troubleshouting
* Bug Reports::                 Troublereporting
* Other Languages::             Parsers in Java and others
* Beta Testing::                Experimenting with development versions
* Mailing Lists::               Meeting other Bison users

Copying This Manual

* GNU Free Documentation License::  License for copying this manual.


File: bison.info, Node: Introduction, Next: Conditions, Prev: Top, Up: Top

Introduction
************

"Bison" is a general-purpose parser generator that converts an
annotated context-free grammar into an LALR(1) or GLR parser for that
grammar. Once you are proficient with Bison, you can use it to develop
a wide range of language parsers, from those used in simple desk
calculators to complex programming languages.

   Bison is upward compatible with Yacc: all properly-written Yacc
grammars ought to work with Bison with no change. Anyone familiar with
Yacc should be able to use Bison with little trouble. You need to be
fluent in C or C++ programming in order to use Bison or to understand
this manual.

   We begin with tutorial chapters that explain the basic concepts of
using Bison and show three explained examples, each building on the
last. If you don't know Bison or Yacc, start by reading these
chapters. Reference chapters follow which describe specific aspects of
Bison in detail.

   Bison was written primarily by Robert Corbett; Richard Stallman made
it Yacc-compatible. Wilfred Hansen of Carnegie Mellon University added
multi-character string literals and other features.

   This edition corresponds to version 2.3 of Bison.


File: bison.info, Node: Conditions, Next: Copying, Prev: Introduction, Up: Top

Conditions for Using Bison
**************************

The distribution terms for Bison-generated parsers permit using the
parsers in nonfree programs. Before Bison version 2.2, these extra
permissions applied only when Bison was generating LALR(1) parsers in
C. And before Bison version 1.24, Bison-generated parsers could be
used only in programs that were free software.

   The other GNU programming tools, such as the GNU C compiler, have
never had such a requirement. They could always be used for nonfree
software. The reason Bison was different was not due to a special
policy decision; it resulted from applying the usual General Public
License to all of the Bison source code.

   The output of the Bison utility--the Bison parser file--contains a
verbatim copy of a sizable piece of Bison, which is the code for the
parser's implementation. (The actions from your grammar are inserted
into this implementation at one point, but most of the rest of the
implementation is not changed.) When we applied the GPL terms to the
skeleton code for the parser's implementation, the effect was to
restrict the use of Bison output to free software.

   We didn't change the terms because of sympathy for people who want to
make software proprietary. *Software should be free.* But we
concluded that limiting Bison's use to free software was doing little to
encourage people to make other software free. So we decided to make the
practical conditions for using Bison match the practical conditions for
using the other GNU tools.

   This exception applies when Bison is generating code for a parser.
You can tell whether the exception applies to a Bison output file by
inspecting the file for text beginning with "As a special
exception...".
The text spells out the exact terms of the exception.


File: bison.info, Node: Copying, Next: Concepts, Prev: Conditions, Up: Top

GNU GENERAL PUBLIC LICENSE
**************************

                         Version 2, June 1991

     Copyright (C) 1989, 1991 Free Software Foundation, Inc.
     51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA

     Everyone is permitted to copy and distribute verbatim copies
     of this license document, but changing it is not allowed.

Preamble
========

The licenses for most software are designed to take away your freedom
to share and change it. By contrast, the GNU General Public License is
intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.) You can apply it to
your programs, too.

   When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it in
new free programs; and that you know you can do these things.

   To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.

   For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.

   We protect your rights with two steps: (1) copyright the software,
and (2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.

   Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.

   Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.

   The precise terms and conditions for copying, distribution and
modification follow.

    TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
  0. This License applies to any program or other work which contains a
     notice placed by the copyright holder saying it may be distributed
     under the terms of this General Public License. The "Program",
     below, refers to any such program or work, and a "work based on
     the Program" means either the Program or any derivative work under
     copyright law: that is to say, a work containing the Program or a
     portion of it, either verbatim or with modifications and/or
     translated into another language.
     (Hereinafter, translation is
     included without limitation in the term "modification".) Each
     licensee is addressed as "you".

     Activities other than copying, distribution and modification are
     not covered by this License; they are outside its scope. The act
     of running the Program is not restricted, and the output from the
     Program is covered only if its contents constitute a work based on
     the Program (independent of having been made by running the
     Program). Whether that is true depends on what the Program does.

  1. You may copy and distribute verbatim copies of the Program's
     source code as you receive it, in any medium, provided that you
     conspicuously and appropriately publish on each copy an appropriate
     copyright notice and disclaimer of warranty; keep intact all the
     notices that refer to this License and to the absence of any
     warranty; and give any other recipients of the Program a copy of
     this License along with the Program.

     You may charge a fee for the physical act of transferring a copy,
     and you may at your option offer warranty protection in exchange
     for a fee.

  2. You may modify your copy or copies of the Program or any portion
     of it, thus forming a work based on the Program, and copy and
     distribute such modifications or work under the terms of Section 1
     above, provided that you also meet all of these conditions:

       a. You must cause the modified files to carry prominent notices
          stating that you changed the files and the date of any change.

       b. You must cause any work that you distribute or publish, that
          in whole or in part contains or is derived from the Program
          or any part thereof, to be licensed as a whole at no charge
          to all third parties under the terms of this License.

       c. If the modified program normally reads commands interactively
          when run, you must cause it, when started running for such
          interactive use in the most ordinary way, to print or display
          an announcement including an appropriate copyright notice and
          a notice that there is no warranty (or else, saying that you
          provide a warranty) and that users may redistribute the
          program under these conditions, and telling the user how to
          view a copy of this License. (Exception: if the Program
          itself is interactive but does not normally print such an
          announcement, your work based on the Program is not required
          to print an announcement.)

     These requirements apply to the modified work as a whole. If
     identifiable sections of that work are not derived from the
     Program, and can be reasonably considered independent and separate
     works in themselves, then this License, and its terms, do not
     apply to those sections when you distribute them as separate
     works. But when you distribute the same sections as part of a
     whole which is a work based on the Program, the distribution of
     the whole must be on the terms of this License, whose permissions
     for other licensees extend to the entire whole, and thus to each
     and every part regardless of who wrote it.

     Thus, it is not the intent of this section to claim rights or
     contest your rights to work written entirely by you; rather, the
     intent is to exercise the right to control the distribution of
     derivative or collective works based on the Program.

     In addition, mere aggregation of another work not based on the
     Program with the Program (or with a work based on the Program) on
     a volume of a storage or distribution medium does not bring the
     other work under the scope of this License.

  3. You may copy and distribute the Program (or a work based on it,
     under Section 2) in object code or executable form under the terms
     of Sections 1 and 2 above provided that you also do one of the
     following:

       a. Accompany it with the complete corresponding machine-readable
          source code, which must be distributed under the terms of
          Sections 1 and 2 above on a medium customarily used for
          software interchange; or,

       b. Accompany it with a written offer, valid for at least three
          years, to give any third party, for a charge no more than your
          cost of physically performing source distribution, a complete
          machine-readable copy of the corresponding source code, to be
          distributed under the terms of Sections 1 and 2 above on a
          medium customarily used for software interchange; or,

       c. Accompany it with the information you received as to the offer
          to distribute corresponding source code. (This alternative is
          allowed only for noncommercial distribution and only if you
          received the program in object code or executable form with
          such an offer, in accord with Subsection b above.)

     The source code for a work means the preferred form of the work for
     making modifications to it. For an executable work, complete
     source code means all the source code for all modules it contains,
     plus any associated interface definition files, plus the scripts
     used to control compilation and installation of the executable.
     However, as a special exception, the source code distributed need
     not include anything that is normally distributed (in either
     source or binary form) with the major components (compiler,
     kernel, and so on) of the operating system on which the executable
     runs, unless that component itself accompanies the executable.

     If distribution of executable or object code is made by offering
     access to copy from a designated place, then offering equivalent
     access to copy the source code from the same place counts as
     distribution of the source code, even though third parties are not
     compelled to copy the source along with the object code.

  4. You may not copy, modify, sublicense, or distribute the Program
     except as expressly provided under this License. Any attempt
     otherwise to copy, modify, sublicense or distribute the Program is
     void, and will automatically terminate your rights under this
     License. However, parties who have received copies, or rights,
     from you under this License will not have their licenses
     terminated so long as such parties remain in full compliance.

  5. You are not required to accept this License, since you have not
     signed it. However, nothing else grants you permission to modify
     or distribute the Program or its derivative works. These actions
     are prohibited by law if you do not accept this License.
     Therefore, by modifying or distributing the Program (or any work
     based on the Program), you indicate your acceptance of this
     License to do so, and all its terms and conditions for copying,
     distributing or modifying the Program or works based on it.

  6. Each time you redistribute the Program (or any work based on the
     Program), the recipient automatically receives a license from the
     original licensor to copy, distribute or modify the Program
     subject to these terms and conditions. You may not impose any
     further restrictions on the recipients' exercise of the rights
     granted herein. You are not responsible for enforcing compliance
     by third parties to this License.

  7. If, as a consequence of a court judgment or allegation of patent
     infringement or for any other reason (not limited to patent
     issues), conditions are imposed on you (whether by court order,
     agreement or otherwise) that contradict the conditions of this
     License, they do not excuse you from the conditions of this
     License. If you cannot distribute so as to satisfy simultaneously
     your obligations under this License and any other pertinent
     obligations, then as a consequence you may not distribute the
     Program at all. For example, if a patent license would not permit
     royalty-free redistribution of the Program by all those who
     receive copies directly or indirectly through you, then the only
     way you could satisfy both it and this License would be to refrain
     entirely from distribution of the Program.

     If any portion of this section is held invalid or unenforceable
     under any particular circumstance, the balance of the section is
     intended to apply and the section as a whole is intended to apply
     in other circumstances.

     It is not the purpose of this section to induce you to infringe any
     patents or other property right claims or to contest validity of
     any such claims; this section has the sole purpose of protecting
     the integrity of the free software distribution system, which is
     implemented by public license practices. Many people have made
     generous contributions to the wide range of software distributed
     through that system in reliance on consistent application of that
     system; it is up to the author/donor to decide if he or she is
     willing to distribute software through any other system and a
     licensee cannot impose that choice.

     This section is intended to make thoroughly clear what is believed
     to be a consequence of the rest of this License.

  8. If the distribution and/or use of the Program is restricted in
     certain countries either by patents or by copyrighted interfaces,
     the original copyright holder who places the Program under this
     License may add an explicit geographical distribution limitation
     excluding those countries, so that distribution is permitted only
     in or among countries not thus excluded. In such case, this
     License incorporates the limitation as if written in the body of
     this License.

  9. The Free Software Foundation may publish revised and/or new
     versions of the General Public License from time to time. Such
     new versions will be similar in spirit to the present version, but
     may differ in detail to address new problems or concerns.

     Each version is given a distinguishing version number. If the
     Program specifies a version number of this License which applies
     to it and "any later version", you have the option of following
     the terms and conditions either of that version or of any later
     version published by the Free Software Foundation. If the Program
     does not specify a version number of this License, you may choose
     any version ever published by the Free Software Foundation.

 10. If you wish to incorporate parts of the Program into other free
     programs whose distribution conditions are different, write to the
     author to ask for permission. For software which is copyrighted
     by the Free Software Foundation, write to the Free Software
     Foundation; we sometimes make exceptions for this. Our decision
     will be guided by the two goals of preserving the free status of
     all derivatives of our free software and of promoting the sharing
     and reuse of software generally.

                                NO WARRANTY
 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO
     WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE
     LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
     HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT
     WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT
     NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
     FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE
     QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
     PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY
     SERVICING, REPAIR OR CORRECTION.

 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
     WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY
     MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE
     LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL,
     INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR
     INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
     DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU
     OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY
     OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN
     ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

                      END OF TERMS AND CONDITIONS
Appendix: How to Apply These Terms to Your New Programs
=======================================================

If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these
terms.

   To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

     ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.
     Copyright (C) YYYY NAME OF AUTHOR

     This program is free software; you can redistribute it and/or modify
     it under the terms of the GNU General Public License as published by
     the Free Software Foundation; either version 2 of the License, or
     (at your option) any later version.

     This program is distributed in the hope that it will be useful,
     but WITHOUT ANY WARRANTY; without even the implied warranty of
     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
     GNU General Public License for more details.

     You should have received a copy of the GNU General Public License
     along with this program; if not, write to the Free Software
     Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

   Also add information on how to contact you by electronic and paper
mail.

   If the program is interactive, make it output a short notice like
this when it starts in an interactive mode:

     Gnomovision version 69, Copyright (C) 19YY NAME OF AUTHOR
     Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
     This is free software, and you are welcome to redistribute it
     under certain conditions; type `show c' for details.

   The hypothetical commands `show w' and `show c' should show the
appropriate parts of the General Public License. Of course, the
commands you use may be called something other than `show w' and `show
c'; they could even be mouse-clicks or menu items--whatever suits your
program.

   You should also get your employer (if you work as a programmer) or
your school, if any, to sign a "copyright disclaimer" for the program,
if necessary. Here is a sample; alter the names:

     Yoyodyne, Inc., hereby disclaims all copyright interest in the program
     `Gnomovision' (which makes passes at compilers) written by James Hacker.

     SIGNATURE OF TY COON, 1 April 1989
     Ty Coon, President of Vice

   This General Public License does not permit incorporating your
program into proprietary programs. If your program is a subroutine
library, you may consider it more useful to permit linking proprietary
applications with the library. If this is what you want to do, use the
GNU Library General Public License instead of this License.


File: bison.info,  Node: Concepts,  Next: Examples,  Prev: Copying,  Up: Top

1 The Concepts of Bison
***********************

This chapter introduces many of the basic concepts without which the
details of Bison will not make sense. If you do not already know how to
use Bison or Yacc, we suggest you start by reading this chapter
carefully.

* Menu:

* Language and Grammar:: Languages and context-free grammars,
                           as mathematical ideas.
* Grammar in Bison::     How we represent grammars for Bison's sake.
* Semantic Values::      Each token or syntactic grouping can have
                           a semantic value (the value of an integer,
                           the name of an identifier, etc.).
* Semantic Actions::     Each rule can have an action containing C code.
* GLR Parsers::          Writing parsers for general context-free languages.
* Locations Overview::   Tracking Locations.
* Bison Parser::         What are Bison's input and output,
                           how is the output used?
* Stages::               Stages in writing and running Bison grammars.
* Grammar Layout::       Overall structure of a Bison grammar file.


File: bison.info,  Node: Language and Grammar,  Next: Grammar in Bison,  Up: Concepts

1.1 Languages and Context-Free Grammars
=======================================

In order for Bison to parse a language, it must be described by a
"context-free grammar". This means that you specify one or more
"syntactic groupings" and give rules for constructing them from their
parts.
For example, in the C language, one kind of grouping is called
an `expression'. One rule for making an expression might be, "An
expression can be made of a minus sign and another expression".
Another would be, "An expression can be an integer". As you can see,
rules are often recursive, but there must be at least one rule which
leads out of the recursion.

   The most common formal system for presenting such rules for humans
to read is "Backus-Naur Form" or "BNF", which was developed in order to
specify the language Algol 60. Any grammar expressed in BNF is a
context-free grammar. The input to Bison is essentially
machine-readable BNF.

   There are various important subclasses of context-free grammar.
Although it can handle almost all context-free grammars, Bison is
optimized for what are called LALR(1) grammars. In brief, in these
grammars, it must be possible to tell how to parse any portion of an
input string with just a single token of look-ahead. Strictly
speaking, that is a description of an LR(1) grammar, and LALR(1)
involves additional restrictions that are hard to explain simply; but
it is rare in actual practice to find an LR(1) grammar that fails to be
LALR(1). *Note Mysterious Reduce/Reduce Conflicts: Mystery Conflicts,
for more information on this.

   Parsers for LALR(1) grammars are "deterministic", meaning roughly
that the next grammar rule to apply at any point in the input is
uniquely determined by the preceding input and a fixed, finite portion
(called a "look-ahead") of the remaining input. A context-free grammar
can be "ambiguous", meaning that there are multiple ways to apply the
grammar rules to get the same inputs. Even unambiguous grammars can be
"nondeterministic", meaning that no fixed look-ahead always suffices to
determine the next grammar rule to apply.
With the proper declarations, Bison is also able to parse these more
general context-free grammars, using a technique known as GLR parsing
(for Generalized LR). Bison's GLR parsers are able to handle any
context-free grammar for which the number of possible parses of any
given string is finite.

   In the formal grammatical rules for a language, each kind of
syntactic unit or grouping is named by a "symbol". Those which are
built by grouping smaller constructs according to grammatical rules are
called "nonterminal symbols"; those which can't be subdivided are called
"terminal symbols" or "token types". We call a piece of input
corresponding to a single terminal symbol a "token", and a piece
corresponding to a single nonterminal symbol a "grouping".

   We can use the C language as an example of what symbols, terminal and
nonterminal, mean. The tokens of C are identifiers, constants (numeric
and string), and the various keywords, arithmetic operators and
punctuation marks. So the terminal symbols of a grammar for C include
`identifier', `number', `string', plus one symbol for each keyword,
operator or punctuation mark: `if', `return', `const', `static', `int',
`char', `plus-sign', `open-brace', `close-brace', `comma' and many more.
(These tokens can be subdivided into characters, but that is a matter of
lexicography, not grammar.)

   Here is a simple C function subdivided into tokens:

     int             /* keyword `int' */
     square (int x)  /* identifier, open-paren, keyword `int',
                        identifier, close-paren */
     {               /* open-brace */
       return x * x; /* keyword `return', identifier, asterisk,
                        identifier, semicolon */
     }               /* close-brace */

   The syntactic groupings of C include the expression, the statement,
the declaration, and the function definition.
These are represented in the grammar of C by nonterminal symbols
`expression', `statement', `declaration' and `function definition'.
The full grammar uses dozens of additional language constructs, each
with its own nonterminal symbol, in order to express the meanings of
these four. The example above is a function definition; it contains
one declaration, and one statement. In the statement, each `x' is an
expression and so is `x * x'.

   Each nonterminal symbol must have grammatical rules showing how it
is made out of simpler constructs. For example, one kind of C
statement is the `return' statement; this would be described with a
grammar rule which reads informally as follows:

     A `statement' can be made of a `return' keyword, an `expression'
     and a `semicolon'.

There would be many other rules for `statement', one for each kind of
statement in C.

   One nonterminal symbol must be distinguished as the special one which
defines a complete utterance in the language. It is called the "start
symbol". In a compiler, this means a complete input program. In the C
language, the nonterminal symbol `sequence of definitions and
declarations' plays this role.

   For example, `1 + 2' is a valid C expression--a valid part of a C
program--but it is not valid as an _entire_ C program. In the
context-free grammar of C, this follows from the fact that `expression'
is not the start symbol.

   The Bison parser reads a sequence of tokens as its input, and groups
the tokens using the grammar rules. If the input is valid, the end
result is that the entire token sequence reduces to a single grouping
whose symbol is the grammar's start symbol. If we use a grammar for C,
the entire input must be a `sequence of definitions and declarations'.
If not, the parser reports a syntax error.
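
   The two informal rules quoted earlier in this section ("an
expression can be made of a minus sign and another expression" and "an
expression can be an integer") can be checked by hand with a few lines
of C. The sketch below is only an illustration of a recursive rule and
the rule that leads out of the recursion; it is not how a parser
generated by Bison works internally. The names `tok_type',
`match_expression' and `is_expression' are assumptions made for this
example.

```c
#include <stddef.h>

/* Hypothetical token types for the toy grammar
     expression: MINUS expression | INTEGER
   TOK_END marks the end of the token sequence.  */
enum tok_type { TOK_MINUS, TOK_INTEGER, TOK_END };

/* Try to match one `expression' starting at toks[*pos].  On success,
   advance *pos past the expression and return 1; otherwise return 0.  */
int
match_expression (const enum tok_type *toks, size_t *pos)
{
  if (toks[*pos] == TOK_MINUS)
    {
      ++*pos;                               /* consume the minus sign */
      return match_expression (toks, pos);  /* the recursive rule */
    }
  if (toks[*pos] == TOK_INTEGER)
    {
      ++*pos;                               /* the rule that ends the recursion */
      return 1;
    }
  return 0;
}

/* Return 1 if the whole TOK_END-terminated sequence is exactly one
   expression (the start symbol of the toy grammar), 0 otherwise.  */
int
is_expression (const enum tok_type *toks)
{
  size_t pos = 0;
  return match_expression (toks, &pos) && toks[pos] == TOK_END;
}
```

   Under these assumptions, the sequence MINUS MINUS INTEGER is
accepted, while MINUS alone is rejected because the recursion never
reaches the integer rule.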


File: bison.info,  Node: Grammar in Bison,  Next: Semantic Values,  Prev: Language and Grammar,  Up: Concepts

1.2 From Formal Rules to Bison Input
====================================

A formal grammar is a mathematical construct. To define the language
for Bison, you must write a file expressing the grammar in Bison syntax:
a "Bison grammar" file. *Note Bison Grammar Files: Grammar File.

   A nonterminal symbol in the formal grammar is represented in Bison
input as an identifier, like an identifier in C. By convention, it
should be in lower case, such as `expr', `stmt' or `declaration'.

   The Bison representation for a terminal symbol is also called a
"token type". Token types as well can be represented as C-like
identifiers. By convention, these identifiers should be upper case to
distinguish them from nonterminals: for example, `INTEGER',
`IDENTIFIER', `IF' or `RETURN'. A terminal symbol that stands for a
particular keyword in the language should be named after that keyword
converted to upper case. The terminal symbol `error' is reserved for
error recovery. *Note Symbols::.

   A terminal symbol can also be represented as a character literal,
just like a C character constant. You should do this whenever a token
is just a single character (parenthesis, plus-sign, etc.): use that
same character in a literal as the terminal symbol for that token.

   A third way to represent a terminal symbol is with a C string
constant containing several characters. *Note Symbols::, for more
information.

   The grammar rules also have an expression in Bison syntax. For
example, here is the Bison rule for a C `return' statement. The
semicolon in quotes is a literal character token, representing part of
the C syntax for the statement; the naked semicolon, and the colon, are
Bison punctuation used in every rule.

     stmt:   RETURN expr ';'
             ;

*Note Syntax of Grammar Rules: Rules.


File: bison.info,  Node: Semantic Values,  Next: Semantic Actions,  Prev: Grammar in Bison,  Up: Concepts

1.3 Semantic Values
===================

A formal grammar selects tokens only by their classifications: for
example, if a rule mentions the terminal symbol `integer constant', it
means that _any_ integer constant is grammatically valid in that
position. The precise value of the constant is irrelevant to how to
parse the input: if `x+4' is grammatical then `x+1' or `x+3989' is
equally grammatical.

   But the precise value is very important for what the input means
once it is parsed. A compiler is useless if it fails to distinguish
between 4, 1 and 3989 as constants in the program! Therefore, each
token in a Bison grammar has both a token type and a "semantic value".
*Note Defining Language Semantics: Semantics, for details.

   The token type is a terminal symbol defined in the grammar, such as
`INTEGER', `IDENTIFIER' or `',''. It tells everything you need to know
to decide where the token may validly appear and how to group it with
other tokens. The grammar rules know nothing about tokens except their
types.

   The semantic value has all the rest of the information about the
meaning of the token, such as the value of an integer, or the name of an
identifier. (A token such as `','' which is just punctuation doesn't
need to have any semantic value.)

   For example, an input token might be classified as token type
`INTEGER' and have the semantic value 4. Another input token might
have the same token type `INTEGER' but value 3989. When a grammar rule
says that `INTEGER' is allowed, either of these tokens is acceptable
because each is an `INTEGER'. When the parser accepts the token, it
keeps track of the token's semantic value.
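
   The split between token type and semantic value can be pictured as a
pair of fields. The following C sketch is an illustration only: the
`struct token' layout and the function `rule_accepts' are assumptions
made for this example, not data structures that Bison itself defines.

```c
/* Hypothetical token types; in a real grammar these come from
   `%token' declarations.  */
enum token_type { INTEGER, IDENTIFIER };

/* A token as described above: a type plus a semantic value.  */
struct token
{
  enum token_type type;  /* the only part the grammar rules look at */
  int value;             /* semantic value; used here only for INTEGER */
};

/* Return 1 if a rule position that allows WANTED accepts TOK.
   The semantic value is deliberately ignored.  */
int
rule_accepts (enum token_type wanted, struct token tok)
{
  return tok.type == wanted;
}
```

   With these definitions, a token of type `INTEGER' with value 4 and
another with value 3989 are equally acceptable wherever `INTEGER' is
allowed; only the stored values differ.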

   Each grouping can also have a semantic value as well as its
nonterminal symbol. For example, in a calculator, an expression
typically has a semantic value that is a number. In a compiler for a
programming language, an expression typically has a semantic value that
is a tree structure describing the meaning of the expression.


File: bison.info,  Node: Semantic Actions,  Next: GLR Parsers,  Prev: Semantic Values,  Up: Concepts

1.4 Semantic Actions
====================

In order to be useful, a program must do more than parse input; it must
also produce some output based on the input. In a Bison grammar, a
grammar rule can have an "action" made up of C statements. Each time
the parser recognizes a match for that rule, the action is executed.
*Note Actions::.

   Most of the time, the purpose of an action is to compute the
semantic value of the whole construct from the semantic values of its
parts. For example, suppose we have a rule which says an expression
can be the sum of two expressions. When the parser recognizes such a
sum, each of the subexpressions has a semantic value which describes
how it was built up. The action for this rule should create a similar
sort of value for the newly recognized larger expression.

   For example, here is a rule that says an expression can be the sum of
two subexpressions:

     expr: expr '+' expr   { $$ = $1 + $3; }
             ;

The action says how to produce the semantic value of the sum expression
from the values of the two subexpressions.


File: bison.info,  Node: GLR Parsers,  Next: Locations Overview,  Prev: Semantic Actions,  Up: Concepts

1.5 Writing GLR Parsers
=======================

In some grammars, Bison's standard LALR(1) parsing algorithm cannot
decide whether to apply a certain grammar rule at a given point.
That is, it may not be able to decide (on the basis of the input read
so far) which of two possible reductions (applications of a grammar
rule) applies, or whether to apply a reduction or read more of the
input and apply a reduction later in the input. These are known
respectively as "reduce/reduce" conflicts (*note Reduce/Reduce::), and
"shift/reduce" conflicts (*note Shift/Reduce::).

   To use a grammar that is not easily modified to be LALR(1), a more
general parsing algorithm is sometimes necessary. If you include
`%glr-parser' among the Bison declarations in your file (*note Grammar
Outline::), the result is a Generalized LR (GLR) parser. These parsers
handle Bison grammars that contain no unresolved conflicts (i.e., after
applying precedence declarations) identically to LALR(1) parsers.
However, when faced with unresolved shift/reduce and reduce/reduce
conflicts, GLR parsers use the simple expedient of doing both,
effectively cloning the parser to follow both possibilities. Each of
the resulting parsers can again split, so that at any given time, there
can be any number of possible parses being explored. The parsers
proceed in lockstep; that is, all of them consume (shift) a given input
symbol before any of them proceed to the next. Each of the cloned
parsers eventually meets one of two possible fates: either it runs into
a parsing error, in which case it simply vanishes, or it merges with
another parser, because the two of them have reduced the input to an
identical set of symbols.

   During the time that there are multiple parsers, semantic actions are
recorded, but not performed. When a parser disappears, its recorded
semantic actions disappear as well, and are never performed. When a
reduction makes two parsers identical, causing them to merge, Bison
records both sets of semantic actions.
Whenever the last two parsers merge, reverting to the single-parser
case, Bison resolves all the outstanding actions either by precedences
given to the grammar rules involved, or by performing both actions, and
then calling a designated user-defined function on the resulting values
to produce an arbitrary merged result.

* Menu:

* Simple GLR Parsers::      Using GLR parsers on unambiguous grammars.
* Merging GLR Parses::      Using GLR parsers to resolve ambiguities.
* GLR Semantic Actions::    Deferred semantic actions have special concerns.
* Compiler Requirements::   GLR parsers require a modern C compiler.


File: bison.info,  Node: Simple GLR Parsers,  Next: Merging GLR Parses,  Up: GLR Parsers

1.5.1 Using GLR on Unambiguous Grammars
---------------------------------------

In the simplest cases, you can use the GLR algorithm to parse grammars
that are unambiguous, but fail to be LALR(1). Such grammars typically
require more than one symbol of look-ahead, or (in rare cases) fall
into the category of grammars in which the LALR(1) algorithm throws
away too much information (they are in LR(1), but not LALR(1), *Note
Mystery Conflicts::).

   Consider a problem that arises in the declaration of enumerated and
subrange types in the programming language Pascal. Here are some
examples:

     type subrange = lo .. hi;
     type enum = (a, b, c);

   The original language standard allows only numeric literals and
constant identifiers for the subrange bounds (`lo' and `hi'), but
Extended Pascal (ISO/IEC 10206) and many other Pascal implementations
allow arbitrary expressions there. This gives rise to the following
situation, containing a superfluous pair of parentheses:

     type subrange = (a) .. b;

Compare this to the following declaration of an enumerated type with
only one value:

     type enum = (a);

(These declarations are contrived, but they are syntactically valid,
and more-complicated cases can come up in practical programs.)

   These two declarations look identical until the `..' token. With
normal LALR(1) one-token look-ahead it is not possible to decide
between the two forms when the identifier `a' is parsed. It is,
however, desirable for a parser to decide this, since in the latter case
`a' must become a new identifier to represent the enumeration value,
while in the former case `a' must be evaluated with its current
meaning, which may be a constant or even a function call.

   You could parse `(a)' as an "unspecified identifier in parentheses",
to be resolved later, but this typically requires substantial
contortions in both semantic actions and large parts of the grammar,
where the parentheses are nested in the recursive rules for expressions.

   You might think of using the lexer to distinguish between the two
forms by returning different tokens for currently defined and undefined
identifiers. But if these declarations occur in a local scope, and `a'
is defined in an outer scope, then both forms are possible--either
locally redefining `a', or using the value of `a' from the outer scope.
So this approach cannot work.

   A simple solution to this problem is to declare the parser to use
the GLR algorithm. When the GLR parser reaches the critical state, it
merely splits into two branches and pursues both syntax rules
simultaneously. Sooner or later, one of them runs into a parsing
error. If there is a `..' token before the next `;', the rule for
enumerated types fails since it cannot accept `..' anywhere; otherwise,
the subrange type rule fails since it requires a `..' token.
So one of the branches fails silently, and the other one continues
normally, performing all the intermediate actions that were postponed
during the split.

   If the input is syntactically incorrect, both branches fail and the
parser reports a syntax error as usual.

   The effect of all this is that the parser seems to "guess" the
correct branch to take, or in other words, it seems to use more
look-ahead than the underlying LALR(1) algorithm actually allows for.
In this example, LALR(2) would suffice, but also some cases that are
not LALR(k) for any k can be handled this way.

   In general, a GLR parser can take quadratic or cubic worst-case time,
and the current Bison parser even takes exponential time and space for
some grammars. In practice, this rarely happens, and for many grammars
it is possible to prove that it cannot happen. The present example
contains only one conflict between two rules, and the type-declaration
context containing the conflict cannot be nested. So the number of
branches that can exist at any time is limited by the constant 2, and
the parsing time is still linear.

   Here is a Bison grammar corresponding to the example above. It
parses a vastly simplified form of Pascal type declarations.

     %token TYPE DOTDOT ID

     %left '+' '-'
     %left '*' '/'

     %%

     type_decl : TYPE ID '=' type ';'
          ;

     type : '(' id_list ')'
          | expr DOTDOT expr
          ;

     id_list : ID
          | id_list ',' ID
          ;

     expr : '(' expr ')'
          | expr '+' expr
          | expr '-' expr
          | expr '*' expr
          | expr '/' expr
          | ID
          ;

   When used as a normal LALR(1) grammar, Bison correctly complains
about one reduce/reduce conflict. In the conflicting situation the
parser chooses one of the alternatives, arbitrarily the one declared
first.
Therefore the following correct input is not recognized:

     type t = (a) .. b;

   The parser can be turned into a GLR parser, while also telling Bison
to be silent about the one known reduce/reduce conflict, by adding
these two declarations to the Bison input file (before the first `%%'):

     %glr-parser
     %expect-rr 1

No change in the grammar itself is required. Now the parser recognizes
all valid declarations, according to the limited syntax above,
transparently. In fact, the user does not even notice when the parser
splits.

   So here we have a case where we can use the benefits of GLR, almost
without disadvantages. Even in simple cases like this, however, there
are at least two potential problems to beware. First, always analyze
the conflicts reported by Bison to make sure that GLR splitting is only
done where it is intended. A GLR parser splitting inadvertently may
cause problems less obvious than an LALR parser statically choosing the
wrong alternative in a conflict. Second, consider interactions with
the lexer (*note Semantic Tokens::) with great care. Since a split
parser consumes tokens without performing any actions during the split,
the lexer cannot obtain information via parser actions. Some cases of
lexer interactions can be eliminated by using GLR to shift the
complications from the lexer to the parser. You must check the
remaining cases for correctness.

   In our example, it would be safe for the lexer to return tokens
based on their current meanings in some symbol table, because no new
symbols are defined in the middle of a type declaration. Though it is
possible for a parser to define the enumeration constants as they are
parsed, before the type declaration is completed, it actually makes no
difference since they cannot be used within the same enumerated type
declaration.
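
   One way to sidestep the lexer interactions mentioned above is to
keep the lexer independent of parser actions altogether. The C sketch
below tokenizes the simplified Pascal type declarations of this section
using the input text alone, so it behaves the same whether or not the
GLR parser is split. It is a sketch under stated assumptions: the
function name `next_token', its calling convention, and the numeric
token codes are all invented for this example (in a real build, Bison
generates the codes for `TYPE', `DOTDOT' and `ID', and the parser calls
`yylex').

```c
#include <ctype.h>
#include <string.h>

/* Stand-ins for the token codes Bison would generate from
   `%token TYPE DOTDOT ID'; the numeric values are assumptions.  */
enum { TYPE = 258, DOTDOT, ID };

/* Scan one token from *TEXT and advance *TEXT past it.  Return the
   token code, the character itself for a single-character token,
   or 0 at end of input.  No symbol table is consulted.  */
int
next_token (const char **text)
{
  const char *p = *text;
  const char *start;
  int code;

  while (isspace ((unsigned char) *p))
    p++;

  if (*p == '\0')
    code = 0;
  else if (p[0] == '.' && p[1] == '.')
    {
      p += 2;
      code = DOTDOT;
    }
  else if (isalpha ((unsigned char) *p))
    {
      start = p;
      while (isalnum ((unsigned char) *p))
        p++;
      /* The only keyword in this tiny language is `type'.  */
      code = (p - start == 4 && strncmp (start, "type", 4) == 0) ? TYPE : ID;
    }
  else
    code = (unsigned char) *p++;

  *text = p;
  return code;
}
```

   Applied repeatedly to `type t = (a) .. b;', this sketch yields
`TYPE', `ID', `'='', `'('', `ID', `')'', `DOTDOT', `ID', `';'' and then
0. The choice between an enumerated type and a subrange is left
entirely to the parser.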


File: bison.info,  Node: Merging GLR Parses,  Next: GLR Semantic Actions,  Prev: Simple GLR Parsers,  Up: GLR Parsers

1.5.2 Using GLR to Resolve Ambiguities
--------------------------------------

Let's consider an example, vastly simplified from a C++ grammar.

     %{
       #include <stdio.h>
       #define YYSTYPE char const *
       int yylex (void);
       void yyerror (char const *);
     %}

     %token TYPENAME ID

     %right '='
     %left '+'

     %glr-parser

     %%

     prog :
          | prog stmt   { printf ("\n"); }
          ;

     stmt : expr ';'  %dprec 1
          | decl      %dprec 2
          ;

     expr : ID               { printf ("%s ", $$); }
          | TYPENAME '(' expr ')'
                             { printf ("%s <cast> ", $1); }
          | expr '+' expr    { printf ("+ "); }
          | expr '=' expr    { printf ("= "); }
          ;

     decl : TYPENAME declarator ';'
                             { printf ("%s <declare> ", $1); }
          | TYPENAME declarator '=' expr ';'
                             { printf ("%s <init-declare> ", $1); }
          ;

     declarator : ID         { printf ("\"%s\" ", $1); }
          | '(' declarator ')'
          ;

This models a problematic part of the C++ grammar--the ambiguity between
certain declarations and statements. For example,

     T (x) = y+z;

parses as either an `expr' followed by `;' or a `decl' (assuming that
`T' is recognized as a `TYPENAME' and `x' as an `ID'). Bison detects
this as a reduce/reduce conflict between the rules `expr : ID' and
`declarator : ID', which it cannot resolve at the time it encounters
`x' in the example above. Since this is a GLR parser, it therefore
splits the problem into two parses, one for each choice of resolving
the reduce/reduce conflict. Unlike the example from the previous
section (*note Simple GLR Parsers::), however, neither of these parses
"dies," because the grammar as it stands is ambiguous.
One of the parsers eventually reduces `stmt : expr ';'' and the other
reduces `stmt : decl', after which both parsers are in an identical
state: they've seen `prog stmt' and have the same unprocessed input
remaining. We say that these parses have "merged."

   At this point, the GLR parser requires a specification in the
grammar of how to choose between the competing parses. In the example
above, the two `%dprec' declarations specify that Bison is to give
precedence to the parse that interprets the example as a `decl', which
implies that `x' is a declarator. The parser therefore prints

     "x" y z + T <init-declare>

   The `%dprec' declarations only come into play when more than one
parse survives. Consider a different input string for this parser:

     T (x) + y;

This is another example of using GLR to parse an unambiguous construct,
as shown in the previous section (*note Simple GLR Parsers::). Here,
there is no ambiguity (this cannot be parsed as a declaration).
However, at the time the Bison parser encounters `x', it does not have
enough information to resolve the reduce/reduce conflict (again,
between `x' as an `expr' or a `declarator'). In this case, no
precedence declaration is used. Again, the parser splits into two, one
assuming that `x' is an `expr', and the other assuming `x' is a
`declarator'. The second of these parsers then vanishes when it sees
`+', and the parser prints

     x T <cast> y +

   Suppose that instead of resolving the ambiguity, you wanted to see
all the possibilities. For this purpose, you must merge the semantic
actions of the two possible parsers, rather than choosing one over the
other.
To do so, you could change the declaration of `stmt' as follows:

     stmt : expr ';'  %merge <stmtMerge>
          | decl      %merge <stmtMerge>
          ;

and define the `stmtMerge' function as:

     static YYSTYPE
     stmtMerge (YYSTYPE x0, YYSTYPE x1)
     {
       printf ("<OR> ");
       return "";
     }

with an accompanying forward declaration in the C declarations at the
beginning of the file:

     %{
       #define YYSTYPE char const *
       static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1);
     %}

With these declarations, the resulting parser parses the first example
as both an `expr' and a `decl', and prints

     "x" y z + T <init-declare> x T <cast> y z + = <OR>

   Bison requires that all of the productions that participate in any
particular merge have identical `%merge' clauses. Otherwise, the
ambiguity would be unresolvable, and the parser will report an error
during any parse that results in the offending merge.


File: bison.info,  Node: GLR Semantic Actions,  Next: Compiler Requirements,  Prev: Merging GLR Parses,  Up: GLR Parsers

1.5.3 GLR Semantic Actions
--------------------------

By definition, a deferred semantic action is not performed at the same
time as the associated reduction. This raises caveats for several
Bison features you might use in a semantic action in a GLR parser.

   In any semantic action, you can examine `yychar' to determine the
type of the look-ahead token present at the time of the associated
reduction. After checking that `yychar' is not set to `YYEMPTY' or
`YYEOF', you can then examine `yylval' and `yylloc' to determine the
look-ahead token's semantic value and location, if any. In a
nondeferred semantic action, you can also modify any of these variables
to influence syntax analysis. *Note Look-Ahead Tokens: Look-Ahead.

   In a deferred semantic action, it's too late to influence syntax
analysis. In this case, `yychar', `yylval', and `yylloc' are set to
shallow copies of the values they had at the time of the associated
reduction. For this reason alone, modifying them is dangerous.
Moreover, the result of modifying them is undefined and subject to
change with future versions of Bison. For example, if a semantic
action might be deferred, you should never write it to invoke
`yyclearin' (*note Action Features::) or to attempt to free memory
referenced by `yylval'.

   Another Bison feature requiring special consideration is `YYERROR'
(*note Action Features::), which you can invoke in a semantic action to
initiate error recovery. During deterministic GLR operation, the
effect of `YYERROR' is the same as its effect in an LALR(1) parser. In
a deferred semantic action, its effect is undefined.

   Also, see *Note Default Action for Locations: Location Default
Action, which describes a special usage of `YYLLOC_DEFAULT' in GLR
parsers.


File: bison.info,  Node: Compiler Requirements,  Prev: GLR Semantic Actions,  Up: GLR Parsers

1.5.4 Considerations when Compiling GLR Parsers
-----------------------------------------------

The GLR parsers require a compiler for ISO C89 or later. In addition,
they use the `inline' keyword, which is not C89, but is C99 and is a
common extension in pre-C99 compilers. It is up to the user of these
parsers to handle portability issues. For instance, if using Autoconf
and the Autoconf macro `AC_C_INLINE', a mere

     %{
       #include <config.h>
     %}

will suffice. Otherwise, we suggest

     %{
       #if __STDC_VERSION__ < 199901 && ! defined __GNUC__ && ! defined inline
        #define inline
       #endif
     %}


File: bison.info,  Node: Locations Overview,  Next: Bison Parser,  Prev: GLR Parsers,  Up: Concepts

1.6 Locations
=============

Many applications, like interpreters or compilers, have to produce
verbose and useful error messages. To achieve this, one must be able
to keep track of the "textual location", or "location", of each
syntactic construct. Bison provides a mechanism for handling these
locations.

   Each token has a semantic value. In a similar fashion, each token
has an associated location, but the type of locations is the same for
all tokens and groupings. Moreover, the output parser is equipped with
a default data structure for storing locations (*note Locations::, for
more details).

   Like semantic values, locations can be reached in actions using a
dedicated set of constructs. In the rule `expr: expr '+' expr' shown
earlier (*note Semantic Actions::), the location of the whole grouping
is `@$', while the locations of the subexpressions are `@1' and `@3'.

   When a rule is matched, a default action is used to compute the
semantic value of its left hand side (*note Actions::). In the same
way, another default action is used for locations. However, the action
for locations is general enough for most cases, meaning there is
usually no need to describe for each rule how `@$' should be formed.
When building a new location for a given grouping, the default behavior
of the output parser is to take the beginning of the first symbol, and
the end of the last symbol.


File: bison.info,  Node: Bison Parser,  Next: Stages,  Prev: Locations Overview,  Up: Concepts

1.7 Bison Output: the Parser File
=================================

When you run Bison, you give it a Bison grammar file as input. The
output is a C source file that parses the language described by the
grammar.
This file is called a "Bison parser". Keep in mind that the
Bison utility and the Bison parser are two distinct programs: the Bison
utility is a program whose output is the Bison parser that becomes part
of your program.

   The job of the Bison parser is to group tokens into groupings
according to the grammar rules--for example, to build identifiers and
operators into expressions. As it does this, it runs the actions for
the grammar rules it uses.

   The tokens come from a function called the "lexical analyzer" that
you must supply in some fashion (such as by writing it in C). The Bison
parser calls the lexical analyzer each time it wants a new token. It
doesn't know what is "inside" the tokens (though their semantic values
may reflect this). Typically the lexical analyzer makes the tokens by
parsing characters of text, but Bison does not depend on this. *Note
The Lexical Analyzer Function `yylex': Lexical.

   The Bison parser file is C code which defines a function named
`yyparse' which implements that grammar. This function does not make a
complete C program: you must supply some additional functions. One is
the lexical analyzer. Another is an error-reporting function which the
parser calls to report an error. In addition, a complete C program must
start with a function called `main'; you have to provide this, and
arrange for it to call `yyparse' or the parser will never run. *Note
Parser C-Language Interface: Interface.

   Aside from the token type names and the symbols in the actions you
write, all symbols defined in the Bison parser file itself begin with
`yy' or `YY'. This includes interface functions such as the lexical
analyzer function `yylex', the error reporting function `yyerror' and
the parser function `yyparse' itself. This also includes numerous
identifiers used for internal purposes.
Therefore, you should avoid
using C identifiers starting with `yy' or `YY' in the Bison grammar
file except for the ones defined in this manual. Also, you should
avoid using the C identifiers `malloc' and `free' for anything other
than their usual meanings.

   In some cases the Bison parser file includes system headers, and in
those cases your code should respect the identifiers reserved by those
headers. On some non-GNU hosts, `<alloca.h>', `<malloc.h>',
`<stddef.h>', and `<stdlib.h>' are included as needed to declare memory
allocators and related types. `<libintl.h>' is included if message
translation is in use (*note Internationalization::). Other system
headers may be included if you define `YYDEBUG' to a nonzero value
(*note Tracing Your Parser: Tracing.).


File: bison.info, Node: Stages, Next: Grammar Layout, Prev: Bison Parser, Up: Concepts

1.8 Stages in Using Bison
=========================

The actual language-design process using Bison, from grammar
specification to a working compiler or interpreter, has these parts:

  1. Formally specify the grammar in a form recognized by Bison (*note
     Bison Grammar Files: Grammar File.). For each grammatical rule in
     the language, describe the action that is to be taken when an
     instance of that rule is recognized. The action is described by a
     sequence of C statements.

  2. Write a lexical analyzer to process input and pass tokens to the
     parser. The lexical analyzer may be written by hand in C (*note
     The Lexical Analyzer Function `yylex': Lexical.). It could also
     be produced using Lex, but the use of Lex is not discussed in this
     manual.

  3. Write a controlling function that calls the Bison-produced parser.

  4. Write error-reporting routines.
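
   The hand-written parts in the list above (2, 3 and 4) can be
sketched in plain C as follows. This is only an illustration under
stated assumptions, not code that Bison generates or requires: every
`demo_' name and the token code `DEMO_NUM' are hypothetical, the input
comes from a string rather than from standard input, and `demo_parse'
is a trivial stand-in for the Bison-produced parser of part 1. In a
real program the corresponding functions are `yylex', `yyerror' and
`yyparse'.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token code; a real parser gets its codes from Bison. */
enum { DEMO_NUM = 258 };

static char const *demo_input = "";  /* Text not yet scanned. */
static double demo_lval;             /* Value of the latest DEMO_NUM. */

/* Part 2: the lexical analyzer.  Returns a token code, or 0 at the
   end of the input. */
static int
demo_lex (void)
{
  /* Skip blanks and tabs. */
  while (*demo_input == ' ' || *demo_input == '\t')
    ++demo_input;
  if (*demo_input == '\0')
    return 0;
  /* A digit starts a number; strtod reports where the number ends. */
  if (isdigit ((unsigned char) *demo_input))
    {
      char *end;
      demo_lval = strtod (demo_input, &end);
      demo_input = end;
      return DEMO_NUM;
    }
  /* Any other character is its own token code. */
  return (unsigned char) *demo_input++;
}

/* Part 4: the error-reporting routine. */
static void
demo_error (char const *msg)
{
  fprintf (stderr, "%s\n", msg);
}

/* Stand-in for the Bison-produced parser: it accepts any sequence of
   numbers and the characters + - * / ^ n, and reports anything else.
   Like yyparse, it returns 0 on success and nonzero on error. */
static int
demo_parse (void)
{
  int token;
  while ((token = demo_lex ()) != 0)
    if (token != DEMO_NUM && strchr ("+-*/^n", token) == NULL)
      {
        demo_error ("syntax error");
        return 1;
      }
  return 0;
}

/* Part 3: the controlling function. */
int
demo_run (char const *text)
{
  demo_input = text;
  return demo_parse ();
}
```

   With these definitions, `demo_run ("4 9 +")' returns 0, while
`demo_run ("4 ? 9")' prints `syntax error' on the standard error stream
and returns 1. The `demo_' prefix is deliberate: it keeps the sketch
clear of the `yy' and `YY' names that the parser file reserves.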

   To turn this source code as written into a runnable program, you
must follow these steps:

  1. Run Bison on the grammar to produce the parser.

  2. Compile the code output by Bison, as well as any other source
     files.

  3. Link the object files to produce the finished product.


File: bison.info, Node: Grammar Layout, Prev: Stages, Up: Concepts

1.9 The Overall Layout of a Bison Grammar
=========================================

The input file for the Bison utility is a "Bison grammar file". The
general form of a Bison grammar file is as follows:

     %{
     PROLOGUE
     %}

     BISON DECLARATIONS

     %%
     GRAMMAR RULES
     %%
     EPILOGUE

The `%%', `%{' and `%}' are punctuation that appears in every Bison
grammar file to separate the sections.

   The prologue may define types and variables used in the actions.
You can also use preprocessor commands to define macros used there, and
use `#include' to include header files that do any of these things.
You need to declare the lexical analyzer `yylex' and the error printer
`yyerror' here, along with any other global identifiers used by the
actions in the grammar rules.

   The Bison declarations declare the names of the terminal and
nonterminal symbols, and may also describe operator precedence and the
data types of semantic values of various symbols.

   The grammar rules define how to construct each nonterminal symbol
from its parts.

   The epilogue can contain any code you want to use. Often the
definitions of functions declared in the prologue go here. In a simple
program, all the rest of the program can go here.
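
   To make the four sections of this layout concrete, here is a sketch
of a complete, very small grammar file. It is an illustration only:
the token `WORD', the nonterminal `words' and the behavior (printing
`word' once for each blank-separated word on standard input) are
assumptions chosen for brevity.

```
%{
  /* Prologue: declarations needed by the actions and the parser. */
  #include <stdio.h>
  int yylex (void);
  void yyerror (char const *);
%}

/* Bison declarations. */
%token WORD

%%
/* Grammar rules: a possibly empty sequence of WORD tokens. */
words:    /* empty */
        | words WORD   { puts ("word"); }
;
%%
/* Epilogue: definitions of the functions declared in the prologue. */
#include <ctype.h>

int
yylex (void)
{
  int c;

  /* Skip white space, including newlines. */
  while ((c = getchar ()) == ' ' || c == '\t' || c == '\n')
    ;
  if (c == EOF)
    return 0;
  /* Consume the rest of the word. */
  while ((c = getchar ()) != EOF && !isspace (c))
    ;
  ungetc (c, stdin);
  return WORD;
}

void
yyerror (char const *s)
{
  fprintf (stderr, "%s\n", s);
}

int
main (void)
{
  return yyparse ();
}
```

   Everything the parser needs is in this one file, which is the
"simple program" case mentioned above: the epilogue holds the lexical
analyzer, the error routine and `main'.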


File: bison.info, Node: Examples, Next: Grammar File, Prev: Concepts, Up: Top

2 Examples
**********

Now we show and explain three sample programs written using Bison: a
reverse polish notation calculator, an algebraic (infix) notation
calculator, and a multi-function calculator. All three have been tested
under BSD Unix 4.3; each produces a usable, though limited, interactive
desk-top calculator.

   These examples are simple, but Bison grammars for real programming
languages are written the same way. You can copy these examples into a
source file to try them.

* Menu:

* RPN Calc::               Reverse polish notation calculator;
                           a first example with no operator precedence.
* Infix Calc::             Infix (algebraic) notation calculator.
                           Operator precedence is introduced.
* Simple Error Recovery::  Continuing after syntax errors.
* Location Tracking Calc:: Demonstrating the use of @N and @$.
* Multi-function Calc::    Calculator with memory and trig functions.
                           It uses multiple data-types for semantic values.
* Exercises::              Ideas for improving the multi-function calculator.


File: bison.info, Node: RPN Calc, Next: Infix Calc, Up: Examples

2.1 Reverse Polish Notation Calculator
======================================

The first example is that of a simple double-precision "reverse polish
notation" calculator (a calculator using postfix operators). This
example provides a good starting point, since operator precedence is
not an issue. The second example will illustrate how operator
precedence is handled.

   The source code for this calculator is named `rpcalc.y'. The `.y'
extension is a convention used for Bison input files.

* Menu:

* Decls: Rpcalc Decls.        Prologue (declarations) for rpcalc.
* Rules: Rpcalc Rules.        Grammar Rules for rpcalc, with explanation.
* Lexer: Rpcalc Lexer.        The lexical analyzer.
* Main: Rpcalc Main.          The controlling function.
* Error: Rpcalc Error.        The error reporting function.
* Gen: Rpcalc Gen.            Running Bison on the grammar file.
* Comp: Rpcalc Compile.       Run the C compiler on the output code.


File: bison.info, Node: Rpcalc Decls, Next: Rpcalc Rules, Up: RPN Calc

2.1.1 Declarations for `rpcalc'
-------------------------------

Here are the C and Bison declarations for the reverse polish notation
calculator. As in C, comments are placed between `/*...*/'.

     /* Reverse polish notation calculator. */

     %{
       #define YYSTYPE double
       #include <math.h>
       #include <stdio.h>
       int yylex (void);
       void yyerror (char const *);
     %}

     %token NUM

     %% /* Grammar rules and actions follow. */

   The declarations section (*note The prologue: Prologue.) contains
three preprocessor directives and two forward declarations.

   The `#define' directive defines the macro `YYSTYPE', thus specifying
the C data type for semantic values of both tokens and groupings (*note
Data Types of Semantic Values: Value Type.). The Bison parser will use
whatever type `YYSTYPE' is defined as; if you don't define it, `int' is
the default. Because we specify `double', each token and each
expression has an associated value, which is a floating point number.

   The `#include' directives declare the exponentiation function `pow'
(from `<math.h>') and the output function `printf' (from `<stdio.h>'),
both of which are used in the grammar actions.

   The forward declarations for `yylex' and `yyerror' are needed
because the C language requires that functions be declared before they
are used. These functions will be defined in the epilogue, but the
parser calls them so they must be declared in the prologue.

   The second section, Bison declarations, provides information to Bison
about the token types (*note The Bison Declarations Section: Bison
Declarations.).
Each terminal symbol that is not a single-character
literal must be declared here. (Single-character literals normally
don't need to be declared.) In this example, all the arithmetic
operators are designated by single-character literals, so the only
terminal symbol that needs to be declared is `NUM', the token type for
numeric constants.


File: bison.info, Node: Rpcalc Rules, Next: Rpcalc Lexer, Prev: Rpcalc Decls, Up: RPN Calc

2.1.2 Grammar Rules for `rpcalc'
--------------------------------

Here are the grammar rules for the reverse polish notation calculator.

     input:    /* empty */
             | input line
     ;

     line:     '\n'
             | exp '\n'      { printf ("\t%.10g\n", $1); }
     ;

     exp:      NUM           { $$ = $1;           }
             | exp exp '+'   { $$ = $1 + $2;      }
             | exp exp '-'   { $$ = $1 - $2;      }
             | exp exp '*'   { $$ = $1 * $2;      }
             | exp exp '/'   { $$ = $1 / $2;      }
              /* Exponentiation */
             | exp exp '^'   { $$ = pow ($1, $2); }
              /* Unary minus    */
             | exp 'n'       { $$ = -$1;          }
     ;
     %%

   The groupings of the rpcalc "language" defined here are the
expression (given the name `exp'), the line of input (`line'), and the
complete input transcript (`input'). Each of these nonterminal symbols
has several alternate rules, joined by the vertical bar `|' which is
read as "or". The following sections explain what these rules mean.

   The semantics of the language is determined by the actions taken
when a grouping is recognized. The actions are the C code that appears
inside braces. *Note Actions::.

   You must specify these actions in C, but Bison provides the means for
passing semantic values between the rules. In each action, the
pseudo-variable `$$' stands for the semantic value for the grouping
that the rule is going to construct. Assigning a value to `$$' is the
main job of most actions.
The semantic values of the components of the
rule are referred to as `$1', `$2', and so on.

* Menu:

* Rpcalc Input::
* Rpcalc Line::
* Rpcalc Expr::


File: bison.info, Node: Rpcalc Input, Next: Rpcalc Line, Up: Rpcalc Rules

2.1.2.1 Explanation of `input'
..............................

Consider the definition of `input':

     input:    /* empty */
             | input line
     ;

   This definition reads as follows: "A complete input is either an
empty string, or a complete input followed by an input line". Notice
that "complete input" is defined in terms of itself. This definition
is said to be "left recursive" since `input' appears always as the
leftmost symbol in the sequence. *Note Recursive Rules: Recursion.

   The first alternative is empty because there are no symbols between
the colon and the first `|'; this means that `input' can match an empty
string of input (no tokens). We write the rules this way because it is
legitimate to type `Ctrl-d' right after you start the calculator. It's
conventional to put an empty alternative first and write the comment
`/* empty */' in it.

   The second alternate rule (`input line') handles all nontrivial
input. It means, "After reading any number of lines, read one more
line if possible." The left recursion makes this rule into a loop.
Since the first alternative matches empty input, the loop can be
executed zero or more times.

   The parser function `yyparse' continues to process input until a
grammatical error is seen or the lexical analyzer says there are no more
input tokens; we will arrange for the latter to happen at end-of-input.


File: bison.info, Node: Rpcalc Line, Next: Rpcalc Expr, Prev: Rpcalc Input, Up: Rpcalc Rules

2.1.2.2 Explanation of `line'
.............................

Now consider the definition of `line':

     line:     '\n'
             | exp '\n'  { printf ("\t%.10g\n", $1); }
     ;

   The first alternative is a token which is a newline character; this
means that rpcalc accepts a blank line (and ignores it, since there is
no action). The second alternative is an expression followed by a
newline. This is the alternative that makes rpcalc useful. The
semantic value of the `exp' grouping is the value of `$1' because the
`exp' in question is the first symbol in the alternative. The action
prints this value, which is the result of the computation the user
asked for.

   This action is unusual because it does not assign a value to `$$'.
As a consequence, the semantic value associated with the `line' is
uninitialized (its value will be unpredictable). This would be a bug if
that value were ever used, but we don't use it: once rpcalc has printed
the value of the user's input line, that value is no longer needed.


File: bison.info, Node: Rpcalc Expr, Prev: Rpcalc Line, Up: Rpcalc Rules

2.1.2.3 Explanation of `exp'
............................

The `exp' grouping has several rules, one for each kind of expression.
The first rule handles the simplest expressions: those that are just
numbers. The second handles an addition-expression, which looks like
two expressions followed by a plus-sign. The third handles
subtraction, and so on.

     exp:      NUM
             | exp exp '+'     { $$ = $1 + $2;    }
             | exp exp '-'     { $$ = $1 - $2;    }
             ...
             ;

   We have used `|' to join all the rules for `exp', but we could
equally well have written them separately:

     exp:      NUM ;
     exp:      exp exp '+'     { $$ = $1 + $2;    } ;
     exp:      exp exp '-'     { $$ = $1 - $2;    } ;
             ...

   Most of the rules have actions that compute the value of the
expression in terms of the value of its parts.
For example, in the
rule for addition, `$1' refers to the first component `exp' and `$2'
refers to the second one. The third component, `'+'', has no meaningful
associated semantic value, but if it had one you could refer to it as
`$3'. When `yyparse' recognizes a sum expression using this rule, the
sum of the two subexpressions' values is produced as the value of the
entire expression. *Note Actions::.

   You don't have to give an action for every rule. When a rule has no
action, Bison by default copies the value of `$1' into `$$'. This is
what happens in the first rule (the one that uses `NUM').

   The formatting shown here is the recommended convention, but Bison
does not require it. You can add or change white space as much as you
wish. For example, this:

     exp   : NUM | exp exp '+' {$$ = $1 + $2; } | ... ;

means the same thing as this:

     exp:      NUM
             | exp exp '+'    { $$ = $1 + $2; }
             | ...
     ;

The latter, however, is much more readable.


File: bison.info, Node: Rpcalc Lexer, Next: Rpcalc Main, Prev: Rpcalc Rules, Up: RPN Calc

2.1.3 The `rpcalc' Lexical Analyzer
-----------------------------------

The lexical analyzer's job is low-level parsing: converting characters
or sequences of characters into tokens. The Bison parser gets its
tokens by calling the lexical analyzer. *Note The Lexical Analyzer
Function `yylex': Lexical.

   Only a simple lexical analyzer is needed for the RPN calculator.
This lexical analyzer skips blanks and tabs, then reads in numbers as
`double' and returns them as `NUM' tokens. Any other character that
isn't part of a number is a separate token. Note that the token-code
for such a single-character token is the character itself.

   The return value of the lexical analyzer function is a numeric code
which represents a token type.
The same text used in Bison rules to
stand for this token type is also a C expression for the numeric code
for the type. This works in two ways. If the token type is a
character literal, then its numeric code is that of the character; you
can use the same character literal in the lexical analyzer to express
the number. If the token type is an identifier, that identifier is
defined by Bison as a C macro whose definition is the appropriate
number. In this example, therefore, `NUM' becomes a macro for `yylex'
to use.

   The semantic value of the token (if it has one) is stored into the
global variable `yylval', which is where the Bison parser will look for
it. (The C data type of `yylval' is `YYSTYPE', which was defined at
the beginning of the grammar; *note Declarations for `rpcalc': Rpcalc
Decls.)

   A token type code of zero is returned if the end-of-input is
encountered. (Bison recognizes any nonpositive value as indicating
end-of-input.)

   Here is the code for the lexical analyzer:

     /* The lexical analyzer returns a double floating point
        number on the stack and the token NUM, or the numeric code
        of the character read if not a number. It skips all blanks
        and tabs, and returns 0 for end-of-input. */

     #include <ctype.h>
     #include <stdio.h>

     int
     yylex (void)
     {
       int c;

       /* Skip white space. */
       while ((c = getchar ()) == ' ' || c == '\t')
         ;
       /* Process numbers. */
       if (c == '.' || isdigit (c))
         {
           ungetc (c, stdin);
           scanf ("%lf", &yylval);
           return NUM;
         }
       /* Return end-of-input. */
       if (c == EOF)
         return 0;
       /* Return a single char. */
       return c;
     }


File: bison.info, Node: Rpcalc Main, Next: Rpcalc Error, Prev: Rpcalc Lexer, Up: RPN Calc

2.1.4 The Controlling Function
------------------------------

In keeping with the spirit of this example, the controlling function is
kept to the bare minimum. The only requirement is that it call
`yyparse' to start the process of parsing.

     int
     main (void)
     {
       return yyparse ();
     }


File: bison.info, Node: Rpcalc Error, Next: Rpcalc Gen, Prev: Rpcalc Main, Up: RPN Calc

2.1.5 The Error Reporting Routine
---------------------------------

When `yyparse' detects a syntax error, it calls the error reporting
function `yyerror' to print an error message (usually but not always
`"syntax error"'). It is up to the programmer to supply `yyerror'
(*note Parser C-Language Interface: Interface.), so here is the
definition we will use:

     #include <stdio.h>

     /* Called by yyparse on error. */
     void
     yyerror (char const *s)
     {
       fprintf (stderr, "%s\n", s);
     }

   After `yyerror' returns, the Bison parser may recover from the error
and continue parsing if the grammar contains a suitable error rule
(*note Error Recovery::). Otherwise, `yyparse' returns nonzero. We
have not written any error rules in this example, so any invalid input
will cause the calculator program to exit. This is not clean behavior
for a real calculator, but it is adequate for the first example.


File: bison.info, Node: Rpcalc Gen, Next: Rpcalc Compile, Prev: Rpcalc Error, Up: RPN Calc

2.1.6 Running Bison to Make the Parser
--------------------------------------

Before running Bison to produce a parser, we need to decide how to
arrange all the source code in one or more source files.
For such a
simple example, the easiest thing is to put everything in one file. The
definitions of `yylex', `yyerror' and `main' go at the end, in the
epilogue of the file (*note The Overall Layout of a Bison Grammar:
Grammar Layout.).

   For a large project, you would probably have several source files,
and use `make' to arrange to recompile them.

   With all the source in a single file, you use the following command
to convert it into a parser file:

     bison FILE.y

In this example the file was called `rpcalc.y' (for "Reverse Polish
CALCulator"). Bison produces a file named `FILE.tab.c', removing the
`.y' from the original file name. The file output by Bison contains
the source code for `yyparse'. The additional functions in the input
file (`yylex', `yyerror' and `main') are copied verbatim to the output.


File: bison.info, Node: Rpcalc Compile, Prev: Rpcalc Gen, Up: RPN Calc

2.1.7 Compiling the Parser File
-------------------------------

Here is how to compile and run the parser file:

     # List files in current directory.
     $ ls
     rpcalc.tab.c  rpcalc.y

     # Compile the Bison parser.
     # `-lm' tells compiler to search math library for `pow';
     # it follows the source file so that the linker resolves `pow'.
     $ cc -o rpcalc rpcalc.tab.c -lm

     # List files again.
     $ ls
     rpcalc  rpcalc.tab.c  rpcalc.y

   The file `rpcalc' now contains the executable code. Here is an
example session using `rpcalc'.

     $ rpcalc
     4 9 +
     13
     3 7 + 3 4 5 *+-
     -13
     3 7 + 3 4 5 * + - n              Note the unary minus, `n'
     13
     5 6 / 4 n +
     -3.166666667
     3 4 ^                            Exponentiation
     81
     ^D                               End-of-file indicator
     $


File: bison.info, Node: Infix Calc, Next: Simple Error Recovery, Prev: RPN Calc, Up: Examples

2.2 Infix Notation Calculator: `calc'
=====================================

We now modify rpcalc to handle infix operators instead of postfix.
Infix notation involves the concept of operator precedence and the need
for parentheses nested to arbitrary depth. Here is the Bison code for
`calc.y', an infix desk-top calculator.

     /* Infix notation calculator. */

     %{
       #define YYSTYPE double
       #include <math.h>
       #include <stdio.h>
       int yylex (void);
       void yyerror (char const *);
     %}

     /* Bison declarations. */
     %token NUM
     %left '-' '+'
     %left '*' '/'
     %left NEG     /* negation--unary minus */
     %right '^'    /* exponentiation */

     %% /* The grammar follows. */
     input:    /* empty */
             | input line
     ;

     line:     '\n'
             | exp '\n'  { printf ("\t%.10g\n", $1); }
     ;

     exp:      NUM                { $$ = $1;         }
             | exp '+' exp        { $$ = $1 + $3;    }
             | exp '-' exp        { $$ = $1 - $3;    }
             | exp '*' exp        { $$ = $1 * $3;    }
             | exp '/' exp        { $$ = $1 / $3;    }
             | '-' exp  %prec NEG { $$ = -$2;        }
             | exp '^' exp        { $$ = pow ($1, $3); }
             | '(' exp ')'        { $$ = $2;         }
     ;
     %%

The functions `yylex', `yyerror' and `main' can be the same as before.

   There are two important new features shown in this code.

   In the second section (Bison declarations), `%left' declares token
types and says they are left-associative operators. The declarations
`%left' and `%right' (right associativity) take the place of `%token'
which is used to declare a token type name without associativity.
(These tokens are single-character literals, which ordinarily don't
need to be declared. We declare them here to specify the
associativity.)

   Operator precedence is determined by the line ordering of the
declarations; the higher the line number of the declaration (lower on
the page or screen), the higher the precedence. Hence, exponentiation
has the highest precedence, unary minus (`NEG') is next, followed by
`*' and `/', and so on. *Note Operator Precedence: Precedence.

   The other important new feature is the `%prec' in the grammar
section for the unary minus operator. The `%prec' simply instructs
Bison that the rule `| '-' exp' has the same precedence as `NEG'--in
this case the next-to-highest. *Note Context-Dependent Precedence:
Contextual Precedence.

   Here is a sample run of `calc.y':

     $ calc
     4 + 4.5 - (34/(8*3+-3))
     6.880952381
     -56 + 2
     -54
     3 ^ 2
     9


File: bison.info, Node: Simple Error Recovery, Next: Location Tracking Calc, Prev: Infix Calc, Up: Examples

2.3 Simple Error Recovery
=========================

Up to this point, this manual has not addressed the issue of "error
recovery"--how to continue parsing after the parser detects a syntax
error. All we have handled is error reporting with `yyerror'. Recall
that by default `yyparse' returns after calling `yyerror'. This means
that an erroneous input line causes the calculator program to exit.
Now we show how to rectify this deficiency.

   The Bison language itself includes the reserved word `error', which
may be included in the grammar rules.
In the example below it has been
added to one of the alternatives for `line':

     line:     '\n'
             | exp '\n'   { printf ("\t%.10g\n", $1); }
             | error '\n' { yyerrok;                  }
     ;

   This addition to the grammar allows for simple error recovery in the
event of a syntax error. If an expression that cannot be evaluated is
read, the error will be recognized by the third rule for `line', and
parsing will continue. (The `yyerror' function is still called upon to
print its message as well.) The action executes the statement
`yyerrok', a macro defined automatically by Bison; its meaning is that
error recovery is complete (*note Error Recovery::). Note the
difference between `yyerrok' and `yyerror'; neither one is a misprint.

   This form of error recovery deals with syntax errors. There are
other kinds of errors; for example, division by zero, which raises an
exception signal that is normally fatal. A real calculator program
must handle this signal and use `longjmp' to return to `main' and
resume parsing input lines; it would also have to discard the rest of
the current line of input. We won't discuss this issue further because
it is not specific to Bison programs.


File: bison.info, Node: Location Tracking Calc, Next: Multi-function Calc, Prev: Simple Error Recovery, Up: Examples

2.4 Location Tracking Calculator: `ltcalc'
==========================================

This example extends the infix notation calculator with location
tracking. This feature will be used to improve the error messages. For
the sake of clarity, this example is a simple integer calculator, since
most of the work needed to use locations will be done in the lexical
analyzer.

* Menu:

* Decls: Ltcalc Decls.  Bison and C declarations for ltcalc.
* Rules: Ltcalc Rules.  Grammar rules for ltcalc, with explanations.
* Lexer: Ltcalc Lexer.  The lexical analyzer.


File: bison.info, Node: Ltcalc Decls, Next: Ltcalc Rules, Up: Location Tracking Calc

2.4.1 Declarations for `ltcalc'
-------------------------------

The C and Bison declarations for the location tracking calculator are
the same as the declarations for the infix notation calculator, except
that `YYSTYPE' is defined as `int'.

     /* Location tracking calculator. */

     %{
       #define YYSTYPE int
       #include <math.h>
       #include <stdio.h>
       int yylex (void);
       void yyerror (char const *);
     %}

     /* Bison declarations. */
     %token NUM

     %left '-' '+'
     %left '*' '/'
     %left NEG
     %right '^'

     %% /* The grammar follows. */

Note there are no declarations specific to locations. Defining a data
type for storing locations is not needed: we will use the type provided
by default (*note Data Types of Locations: Location Type.), which is a
four member structure with the following integer fields: `first_line',
`first_column', `last_line' and `last_column'.


File: bison.info, Node: Ltcalc Rules, Next: Ltcalc Lexer, Prev: Ltcalc Decls, Up: Location Tracking Calc

2.4.2 Grammar Rules for `ltcalc'
--------------------------------

Whether or not you handle locations has no effect on the syntax of your
language. Therefore, grammar rules for this example will be very close
to those of the previous example: we will only modify them to benefit
from the new information.

   Here, we will use locations to report divisions by zero, and locate
the wrong expressions or subexpressions.
2164 2165 input : /* empty */ 2166 | input line 2167 ; 2168 2169 line : '\n' 2170 | exp '\n' { printf ("%d\n", $1); } 2171 ; 2172 2173 exp : NUM { $$ = $1; } 2174 | exp '+' exp { $$ = $1 + $3; } 2175 | exp '-' exp { $$ = $1 - $3; } 2176 | exp '*' exp { $$ = $1 * $3; } 2177 | exp '/' exp 2178 { 2179 if ($3) 2180 $$ = $1 / $3; 2181 else 2182 { 2183 $$ = 1; 2184 fprintf (stderr, "%d.%d-%d.%d: division by zero", 2185 @3.first_line, @3.first_column, 2186 @3.last_line, @3.last_column); 2187 } 2188 } 2189 | '-' exp %preg NEG { $$ = -$2; } 2190 | exp '^' exp { $$ = pow ($1, $3); } 2191 | '(' exp ')' { $$ = $2; } 2192 2193 This code shows how to reach locations inside of semantic actions, by 2194 using the pseudo-variables `@N' for rule components, and the 2195 pseudo-variable `@$' for groupings. 2196 2197 We don't need to assign a value to `@$': the output parser does it 2198 automatically. By default, before executing the C code of each action, 2199 `@$' is set to range from the beginning of `@1' to the end of `@N', for 2200 a rule with N components. This behavior can be redefined (*note 2201 Default Action for Locations: Location Default Action.), and for very 2202 specific rules, `@$' can be computed by hand. 2203 2204 2205 File: bison.info, Node: Ltcalc Lexer, Prev: Ltcalc Rules, Up: Location Tracking Calc 2206 2207 2.4.3 The `ltcalc' Lexical Analyzer. 2208 ------------------------------------ 2209 2210 Until now, we relied on Bison's defaults to enable location tracking. 2211 The next step is to rewrite the lexical analyzer, and make it able to 2212 feed the parser with the token locations, as it already does for 2213 semantic values. 2214 2215 To this end, we must take into account every single character of the 2216 input text, to avoid the computed locations of being fuzzy or wrong: 2217 2218 int 2219 yylex (void) 2220 { 2221 int c; 2222 2223 /* Skip white space. */ 2224 while ((c = getchar ()) == ' ' || c == '\t') 2225 ++yylloc.last_column; 2226 2227 /* Step. 
*/ 2228 yylloc.first_line = yylloc.last_line; 2229 yylloc.first_column = yylloc.last_column; 2230 2231 /* Process numbers. */ 2232 if (isdigit (c)) 2233 { 2234 yylval = c - '0'; 2235 ++yylloc.last_column; 2236 while (isdigit (c = getchar ())) 2237 { 2238 ++yylloc.last_column; 2239 yylval = yylval * 10 + c - '0'; 2240 } 2241 ungetc (c, stdin); 2242 return NUM; 2243 } 2244 2245 /* Return end-of-input. */ 2246 if (c == EOF) 2247 return 0; 2248 2249 /* Return a single char, and update location. */ 2250 if (c == '\n') 2251 { 2252 ++yylloc.last_line; 2253 yylloc.last_column = 0; 2254 } 2255 else 2256 ++yylloc.last_column; 2257 return c; 2258 } 2259 2260 Basically, the lexical analyzer performs the same processing as 2261 before: it skips blanks and tabs, and reads numbers or single-character 2262 tokens. In addition, it updates `yylloc', the global variable (of type 2263 `YYLTYPE') containing the token's location. 2264 2265 Now, each time this function returns a token, the parser has its 2266 number as well as its semantic value, and its location in the text. 2267 The last needed change is to initialize `yylloc', for example in the 2268 controlling function: 2269 2270 int 2271 main (void) 2272 { 2273 yylloc.first_line = yylloc.last_line = 1; 2274 yylloc.first_column = yylloc.last_column = 0; 2275 return yyparse (); 2276 } 2277 2278 Remember that computing locations is not a matter of syntax. Every 2279 character must be associated to a location update, whether it is in 2280 valid input, in comments, in literal strings, and so on. 2281 2282 2283 File: bison.info, Node: Multi-function Calc, Next: Exercises, Prev: Location Tracking Calc, Up: Examples 2284 2285 2.5 Multi-Function Calculator: `mfcalc' 2286 ======================================= 2287 2288 Now that the basics of Bison have been discussed, it is time to move on 2289 to a more advanced problem. The above calculators provided only five 2290 functions, `+', `-', `*', `/' and `^'. 
It would be nice to have a 2291 calculator that provides other mathematical functions such as `sin', 2292 `cos', etc. 2293 2294 It is easy to add new operators to the infix calculator as long as 2295 they are only single-character literals. The lexical analyzer `yylex' 2296 passes back all nonnumeric characters as tokens, so new grammar rules 2297 suffice for adding a new operator. But we want something more 2298 flexible: built-in functions whose syntax has this form: 2299 2300 FUNCTION_NAME (ARGUMENT) 2301 2302 At the same time, we will add memory to the calculator, by allowing you 2303 to create named variables, store values in them, and use them later. 2304 Here is a sample session with the multi-function calculator: 2305 2306 $ mfcalc 2307 pi = 3.141592653589 2308 3.1415926536 2309 sin(pi) 2310 0.0000000000 2311 alpha = beta1 = 2.3 2312 2.3000000000 2313 alpha 2314 2.3000000000 2315 ln(alpha) 2316 0.8329091229 2317 exp(ln(beta1)) 2318 2.3000000000 2319 $ 2320 2321 Note that multiple assignment and nested function calls are 2322 permitted. 2323 2324 * Menu: 2325 2326 * Decl: Mfcalc Decl. Bison declarations for multi-function calculator. 2327 * Rules: Mfcalc Rules. Grammar rules for the calculator. 2328 * Symtab: Mfcalc Symtab. Symbol table management subroutines. 2329 2330 2331 File: bison.info, Node: Mfcalc Decl, Next: Mfcalc Rules, Up: Multi-function Calc 2332 2333 2.5.1 Declarations for `mfcalc' 2334 ------------------------------- 2335 2336 Here are the C and Bison declarations for the multi-function calculator. 2337 2338 %{ 2339 #include <math.h> /* For math functions, cos(), sin(), etc. */ 2340 #include "calc.h" /* Contains definition of `symrec'. */ 2341 int yylex (void); 2342 void yyerror (char const *); 2343 %} 2344 %union { 2345 double val; /* For returning numbers. */ 2346 symrec *tptr; /* For returning symbol-table pointers. */ 2347 } 2348 %token <val> NUM /* Simple double precision number. */ 2349 %token <tptr> VAR FNCT /* Variable and Function. 
*/ 2350 %type <val> exp 2351 2352 %right '=' 2353 %left '-' '+' 2354 %left '*' '/' 2355 %left NEG /* negation--unary minus */ 2356 %right '^' /* exponentiation */ 2357 %% /* The grammar follows. */ 2358 2359 The above grammar introduces only two new features of the Bison 2360 language. These features allow semantic values to have various data 2361 types (*note More Than One Value Type: Multiple Types.). 2362 2363 The `%union' declaration specifies the entire list of possible types; 2364 this is instead of defining `YYSTYPE'. The allowable types are now 2365 double-floats (for `exp' and `NUM') and pointers to entries in the 2366 symbol table. *Note The Collection of Value Types: Union Decl. 2367 2368 Since values can now have various types, it is necessary to 2369 associate a type with each grammar symbol whose semantic value is used. 2370 These symbols are `NUM', `VAR', `FNCT', and `exp'. Their declarations 2371 are augmented with information about their data type (placed between 2372 angle brackets). 2373 2374 The Bison construct `%type' is used for declaring nonterminal 2375 symbols, just as `%token' is used for declaring token types. We have 2376 not used `%type' before because nonterminal symbols are normally 2377 declared implicitly by the rules that define them. But `exp' must be 2378 declared explicitly so we can specify its value type. *Note 2379 Nonterminal Symbols: Type Decl. 2380 2381 2382 File: bison.info, Node: Mfcalc Rules, Next: Mfcalc Symtab, Prev: Mfcalc Decl, Up: Multi-function Calc 2383 2384 2.5.2 Grammar Rules for `mfcalc' 2385 -------------------------------- 2386 2387 Here are the grammar rules for the multi-function calculator. Most of 2388 them are copied directly from `calc'; three rules, those which mention 2389 `VAR' or `FNCT', are new. 
2390 2391 input: /* empty */ 2392 | input line 2393 ; 2394 2395 line: 2396 '\n' 2397 | exp '\n' { printf ("\t%.10g\n", $1); } 2398 | error '\n' { yyerrok; } 2399 ; 2400 2401 exp: NUM { $$ = $1; } 2402 | VAR { $$ = $1->value.var; } 2403 | VAR '=' exp { $$ = $3; $1->value.var = $3; } 2404 | FNCT '(' exp ')' { $$ = (*($1->value.fnctptr))($3); } 2405 | exp '+' exp { $$ = $1 + $3; } 2406 | exp '-' exp { $$ = $1 - $3; } 2407 | exp '*' exp { $$ = $1 * $3; } 2408 | exp '/' exp { $$ = $1 / $3; } 2409 | '-' exp %prec NEG { $$ = -$2; } 2410 | exp '^' exp { $$ = pow ($1, $3); } 2411 | '(' exp ')' { $$ = $2; } 2412 ; 2413 /* End of grammar. */ 2414 %% 2415 2416 2417 File: bison.info, Node: Mfcalc Symtab, Prev: Mfcalc Rules, Up: Multi-function Calc 2418 2419 2.5.3 The `mfcalc' Symbol Table 2420 ------------------------------- 2421 2422 The multi-function calculator requires a symbol table to keep track of 2423 the names and meanings of variables and functions. This doesn't affect 2424 the grammar rules (except for the actions) or the Bison declarations, 2425 but it requires some additional C functions for support. 2426 2427 The symbol table itself consists of a linked list of records. Its 2428 definition, which is kept in the header `calc.h', is as follows. It 2429 provides for either functions or variables to be placed in the table. 2430 2431 /* Function type. */ 2432 typedef double (*func_t) (double); 2433 2434 /* Data type for links in the chain of symbols. */ 2435 struct symrec 2436 { 2437 char *name; /* name of symbol */ 2438 int type; /* type of symbol: either VAR or FNCT */ 2439 union 2440 { 2441 double var; /* value of a VAR */ 2442 func_t fnctptr; /* value of a FNCT */ 2443 } value; 2444 struct symrec *next; /* link field */ 2445 }; 2446 2447 typedef struct symrec symrec; 2448 2449 /* The symbol table: a chain of `struct symrec'. 
*/ 2450 extern symrec *sym_table; 2451 2452 symrec *putsym (char const *, int); 2453 symrec *getsym (char const *); 2454 2455 The new version of `main' includes a call to `init_table', a 2456 function that initializes the symbol table. Here it is, and 2457 `init_table' as well: 2458 2459 #include <stdio.h> 2460 2461 /* Called by yyparse on error. */ 2462 void 2463 yyerror (char const *s) 2464 { 2465 printf ("%s\n", s); 2466 } 2467 2468 struct init 2469 { 2470 char const *fname; 2471 double (*fnct) (double); 2472 }; 2473 2474 struct init const arith_fncts[] = 2475 { 2476 "sin", sin, 2477 "cos", cos, 2478 "atan", atan, 2479 "ln", log, 2480 "exp", exp, 2481 "sqrt", sqrt, 2482 0, 0 2483 }; 2484 2485 /* The symbol table: a chain of `struct symrec'. */ 2486 symrec *sym_table; 2487 2488 /* Put arithmetic functions in table. */ 2489 void 2490 init_table (void) 2491 { 2492 int i; 2493 symrec *ptr; 2494 for (i = 0; arith_fncts[i].fname != 0; i++) 2495 { 2496 ptr = putsym (arith_fncts[i].fname, FNCT); 2497 ptr->value.fnctptr = arith_fncts[i].fnct; 2498 } 2499 } 2500 2501 int 2502 main (void) 2503 { 2504 init_table (); 2505 return yyparse (); 2506 } 2507 2508 By simply editing the initialization list and adding the necessary 2509 include files, you can add additional functions to the calculator. 2510 2511 Two important functions allow look-up and installation of symbols in 2512 the symbol table. The function `putsym' is passed a name and the type 2513 (`VAR' or `FNCT') of the object to be installed. The object is linked 2514 to the front of the list, and a pointer to the object is returned. The 2515 function `getsym' is passed the name of the symbol to look up. If 2516 found, a pointer to that symbol is returned; otherwise zero is returned. 
2517 2518 symrec * 2519 putsym (char const *sym_name, int sym_type) 2520 { 2521 symrec *ptr; 2522 ptr = (symrec *) malloc (sizeof (symrec)); 2523 ptr->name = (char *) malloc (strlen (sym_name) + 1); 2524 strcpy (ptr->name,sym_name); 2525 ptr->type = sym_type; 2526 ptr->value.var = 0; /* Set value to 0 even if fctn. */ 2527 ptr->next = (struct symrec *)sym_table; 2528 sym_table = ptr; 2529 return ptr; 2530 } 2531 2532 symrec * 2533 getsym (char const *sym_name) 2534 { 2535 symrec *ptr; 2536 for (ptr = sym_table; ptr != (symrec *) 0; 2537 ptr = (symrec *)ptr->next) 2538 if (strcmp (ptr->name,sym_name) == 0) 2539 return ptr; 2540 return 0; 2541 } 2542 2543 The function `yylex' must now recognize variables, numeric values, 2544 and the single-character arithmetic operators. Strings of alphanumeric 2545 characters with a leading letter are recognized as either variables or 2546 functions depending on what the symbol table says about them. 2547 2548 The string is passed to `getsym' for look up in the symbol table. If 2549 the name appears in the table, a pointer to its location and its type 2550 (`VAR' or `FNCT') is returned to `yyparse'. If it is not already in 2551 the table, then it is installed as a `VAR' using `putsym'. Again, a 2552 pointer and its type (which must be `VAR') is returned to `yyparse'. 2553 2554 No change is needed in the handling of numeric values and arithmetic 2555 operators in `yylex'. 2556 2557 #include <ctype.h> 2558 2559 int 2560 yylex (void) 2561 { 2562 int c; 2563 2564 /* Ignore white space, get first nonwhite character. */ 2565 while ((c = getchar ()) == ' ' || c == '\t'); 2566 2567 if (c == EOF) 2568 return 0; 2569 2570 /* Char starts a number => parse the number. */ 2571 if (c == '.' || isdigit (c)) 2572 { 2573 ungetc (c, stdin); 2574 scanf ("%lf", &yylval.val); 2575 return NUM; 2576 } 2577 2578 /* Char starts an identifier => read the name. 
*/ 2579 if (isalpha (c)) 2580 { 2581 symrec *s; 2582 static char *symbuf = 0; 2583 static int length = 0; 2584 int i; 2585 2586 /* Initially make the buffer long enough 2587 for a 40-character symbol name. */ 2588 if (length == 0) 2589 length = 40, symbuf = (char *)malloc (length + 1); 2590 2591 i = 0; 2592 do 2593 { 2594 /* If buffer is full, make it bigger. */ 2595 if (i == length) 2596 { 2597 length *= 2; 2598 symbuf = (char *) realloc (symbuf, length + 1); 2599 } 2600 /* Add this character to the buffer. */ 2601 symbuf[i++] = c; 2602 /* Get another character. */ 2603 c = getchar (); 2604 } 2605 while (isalnum (c)); 2606 2607 ungetc (c, stdin); 2608 symbuf[i] = '\0'; 2609 2610 s = getsym (symbuf); 2611 if (s == 0) 2612 s = putsym (symbuf, VAR); 2613 yylval.tptr = s; 2614 return s->type; 2615 } 2616 2617 /* Any other character is a token by itself. */ 2618 return c; 2619 } 2620 2621 This program is both powerful and flexible. You may easily add new 2622 functions, and it is a simple job to modify this code to install 2623 predefined variables such as `pi' or `e' as well. 2624 2625 2626 File: bison.info, Node: Exercises, Prev: Multi-function Calc, Up: Examples 2627 2628 2.6 Exercises 2629 ============= 2630 2631 1. Add some new functions from `math.h' to the initialization list. 2632 2633 2. Add another array that contains constants and their values. Then 2634 modify `init_table' to add these constants to the symbol table. 2635 It will be easiest to give the constants type `VAR'. 2636 2637 3. Make the program report an error if the user refers to an 2638 uninitialized variable in any way except to store a value in it. 2639 2640 2641 File: bison.info, Node: Grammar File, Next: Interface, Prev: Examples, Up: Top 2642 2643 3 Bison Grammar Files 2644 ********************* 2645 2646 Bison takes as input a context-free grammar specification and produces a 2647 C-language function that recognizes correct instances of the grammar. 
2648 2649 The Bison grammar input file conventionally has a name ending in 2650 `.y'. *Note Invoking Bison: Invocation. 2651 2652 * Menu: 2653 2654 * Grammar Outline:: Overall layout of the grammar file. 2655 * Symbols:: Terminal and nonterminal symbols. 2656 * Rules:: How to write grammar rules. 2657 * Recursion:: Writing recursive rules. 2658 * Semantics:: Semantic values and actions. 2659 * Locations:: Locations and actions. 2660 * Declarations:: All kinds of Bison declarations are described here. 2661 * Multiple Parsers:: Putting more than one Bison parser in one program. 2662 2663 2664 File: bison.info, Node: Grammar Outline, Next: Symbols, Up: Grammar File 2665 2666 3.1 Outline of a Bison Grammar 2667 ============================== 2668 2669 A Bison grammar file has four main sections, shown here with the 2670 appropriate delimiters: 2671 2672 %{ 2673 PROLOGUE 2674 %} 2675 2676 BISON DECLARATIONS 2677 2678 %% 2679 GRAMMAR RULES 2680 %% 2681 2682 EPILOGUE 2683 2684 Comments enclosed in `/* ... */' may appear in any of the sections. 2685 As a GNU extension, `//' introduces a comment that continues until end 2686 of line. 2687 2688 * Menu: 2689 2690 * Prologue:: Syntax and usage of the prologue. 2691 * Bison Declarations:: Syntax and usage of the Bison declarations section. 2692 * Grammar Rules:: Syntax and usage of the grammar rules section. 2693 * Epilogue:: Syntax and usage of the epilogue. 2694 2695 2696 File: bison.info, Node: Prologue, Next: Bison Declarations, Up: Grammar Outline 2697 2698 3.1.1 The prologue 2699 ------------------ 2700 2701 The PROLOGUE section contains macro definitions and declarations of 2702 functions and variables that are used in the actions in the grammar 2703 rules. These are copied to the beginning of the parser file so that 2704 they precede the definition of `yyparse'. You can use `#include' to 2705 get the declarations from a header file. 
If you don't need any C declarations, you may omit the `%{' and `%}'
delimiters that bracket this section.

   The PROLOGUE section is terminated by the first occurrence of `%}'
that is outside a comment, a string literal, or a character constant.

   You may have more than one PROLOGUE section, intermixed with the
BISON DECLARATIONS.  This allows you to have C and Bison declarations
that refer to each other.  For example, the `%union' declaration may
use types defined in a header file, and you may wish to prototype
functions that take arguments of type `YYSTYPE'.  This can be done with
two PROLOGUE blocks, one before and one after the `%union' declaration.

     %{
       #include <stdio.h>
       #include "ptypes.h"
     %}

     %union {
       long int n;
       tree t;  /* `tree' is defined in `ptypes.h'. */
     }

     %{
       static void print_token_value (FILE *, int, YYSTYPE);
       #define YYPRINT(F, N, L) print_token_value (F, N, L)
     %}

     ...


File: bison.info,  Node: Bison Declarations,  Next: Grammar Rules,  Prev: Prologue,  Up: Grammar Outline

3.1.2 The Bison Declarations Section
------------------------------------

The BISON DECLARATIONS section contains declarations that define
terminal and nonterminal symbols, specify precedence, and so on.  In
some simple grammars you may not need any declarations.  *Note Bison
Declarations: Declarations.


File: bison.info,  Node: Grammar Rules,  Next: Epilogue,  Prev: Bison Declarations,  Up: Grammar Outline

3.1.3 The Grammar Rules Section
-------------------------------

The "grammar rules" section contains one or more Bison grammar rules,
and nothing else.  *Note Syntax of Grammar Rules: Rules.
2756 2757 There must always be at least one grammar rule, and the first `%%' 2758 (which precedes the grammar rules) may never be omitted even if it is 2759 the first thing in the file. 2760 2761 2762 File: bison.info, Node: Epilogue, Prev: Grammar Rules, Up: Grammar Outline 2763 2764 3.1.4 The epilogue 2765 ------------------ 2766 2767 The EPILOGUE is copied verbatim to the end of the parser file, just as 2768 the PROLOGUE is copied to the beginning. This is the most convenient 2769 place to put anything that you want to have in the parser file but 2770 which need not come before the definition of `yyparse'. For example, 2771 the definitions of `yylex' and `yyerror' often go here. Because C 2772 requires functions to be declared before being used, you often need to 2773 declare functions like `yylex' and `yyerror' in the Prologue, even if 2774 you define them in the Epilogue. *Note Parser C-Language Interface: 2775 Interface. 2776 2777 If the last section is empty, you may omit the `%%' that separates it 2778 from the grammar rules. 2779 2780 The Bison parser itself contains many macros and identifiers whose 2781 names start with `yy' or `YY', so it is a good idea to avoid using any 2782 such names (except those documented in this manual) in the epilogue of 2783 the grammar file. 2784 2785 2786 File: bison.info, Node: Symbols, Next: Rules, Prev: Grammar Outline, Up: Grammar File 2787 2788 3.2 Symbols, Terminal and Nonterminal 2789 ===================================== 2790 2791 "Symbols" in Bison grammars represent the grammatical classifications 2792 of the language. 2793 2794 A "terminal symbol" (also known as a "token type") represents a 2795 class of syntactically equivalent tokens. You use the symbol in grammar 2796 rules to mean that a token in that class is allowed. The symbol is 2797 represented in the Bison parser by a numeric code, and the `yylex' 2798 function returns a token type code to indicate what kind of token has 2799 been read. 
You don't need to know what the code value is; you can use 2800 the symbol to stand for it. 2801 2802 A "nonterminal symbol" stands for a class of syntactically 2803 equivalent groupings. The symbol name is used in writing grammar rules. 2804 By convention, it should be all lower case. 2805 2806 Symbol names can contain letters, digits (not at the beginning), 2807 underscores and periods. Periods make sense only in nonterminals. 2808 2809 There are three ways of writing terminal symbols in the grammar: 2810 2811 * A "named token type" is written with an identifier, like an 2812 identifier in C. By convention, it should be all upper case. Each 2813 such name must be defined with a Bison declaration such as 2814 `%token'. *Note Token Type Names: Token Decl. 2815 2816 * A "character token type" (or "literal character token") is written 2817 in the grammar using the same syntax used in C for character 2818 constants; for example, `'+'' is a character token type. A 2819 character token type doesn't need to be declared unless you need to 2820 specify its semantic value data type (*note Data Types of Semantic 2821 Values: Value Type.), associativity, or precedence (*note Operator 2822 Precedence: Precedence.). 2823 2824 By convention, a character token type is used only to represent a 2825 token that consists of that particular character. Thus, the token 2826 type `'+'' is used to represent the character `+' as a token. 2827 Nothing enforces this convention, but if you depart from it, your 2828 program will confuse other readers. 2829 2830 All the usual escape sequences used in character literals in C can 2831 be used in Bison as well, but you must not use the null character 2832 as a character literal because its numeric code, zero, signifies 2833 end-of-input (*note Calling Convention for `yylex': Calling 2834 Convention.). Also, unlike standard C, trigraphs have no special 2835 meaning in Bison character literals, nor is backslash-newline 2836 allowed. 
2837 2838 * A "literal string token" is written like a C string constant; for 2839 example, `"<="' is a literal string token. A literal string token 2840 doesn't need to be declared unless you need to specify its semantic 2841 value data type (*note Value Type::), associativity, or precedence 2842 (*note Precedence::). 2843 2844 You can associate the literal string token with a symbolic name as 2845 an alias, using the `%token' declaration (*note Token 2846 Declarations: Token Decl.). If you don't do that, the lexical 2847 analyzer has to retrieve the token number for the literal string 2848 token from the `yytname' table (*note Calling Convention::). 2849 2850 *Warning*: literal string tokens do not work in Yacc. 2851 2852 By convention, a literal string token is used only to represent a 2853 token that consists of that particular string. Thus, you should 2854 use the token type `"<="' to represent the string `<=' as a token. 2855 Bison does not enforce this convention, but if you depart from 2856 it, people who read your program will be confused. 2857 2858 All the escape sequences used in string literals in C can be used 2859 in Bison as well, except that you must not use a null character 2860 within a string literal. Also, unlike Standard C, trigraphs have 2861 no special meaning in Bison string literals, nor is 2862 backslash-newline allowed. A literal string token must contain 2863 two or more characters; for a token containing just one character, 2864 use a character token (see above). 2865 2866 How you choose to write a terminal symbol has no effect on its 2867 grammatical meaning. That depends only on where it appears in rules and 2868 on when the parser function returns that symbol. 2869 2870 The value returned by `yylex' is always one of the terminal symbols, 2871 except that a zero or negative value signifies end-of-input. Whichever 2872 way you write the token type in the grammar rules, you write it the 2873 same way in the definition of `yylex'. 
The numeric code for a 2874 character token type is simply the positive numeric code of the 2875 character, so `yylex' can use the identical value to generate the 2876 requisite code, though you may need to convert it to `unsigned char' to 2877 avoid sign-extension on hosts where `char' is signed. Each named token 2878 type becomes a C macro in the parser file, so `yylex' can use the name 2879 to stand for the code. (This is why periods don't make sense in 2880 terminal symbols.) *Note Calling Convention for `yylex': Calling 2881 Convention. 2882 2883 If `yylex' is defined in a separate file, you need to arrange for the 2884 token-type macro definitions to be available there. Use the `-d' 2885 option when you run Bison, so that it will write these macro definitions 2886 into a separate header file `NAME.tab.h' which you can include in the 2887 other source files that need it. *Note Invoking Bison: Invocation. 2888 2889 If you want to write a grammar that is portable to any Standard C 2890 host, you must use only nonnull character tokens taken from the basic 2891 execution character set of Standard C. This set consists of the ten 2892 digits, the 52 lower- and upper-case English letters, and the 2893 characters in the following C-language string: 2894 2895 "\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_{|}~" 2896 2897 The `yylex' function and Bison must use a consistent character set 2898 and encoding for character tokens. For example, if you run Bison in an 2899 ASCII environment, but then compile and run the resulting program in an 2900 environment that uses an incompatible character set like EBCDIC, the 2901 resulting program may not work because the tables generated by Bison 2902 will assume ASCII numeric values for character tokens. 
It is standard 2903 practice for software distributions to contain C source files that were 2904 generated by Bison in an ASCII environment, so installers on platforms 2905 that are incompatible with ASCII must rebuild those files before 2906 compiling them. 2907 2908 The symbol `error' is a terminal symbol reserved for error recovery 2909 (*note Error Recovery::); you shouldn't use it for any other purpose. 2910 In particular, `yylex' should never return this value. The default 2911 value of the error token is 256, unless you explicitly assigned 256 to 2912 one of your tokens with a `%token' declaration. 2913 2914 2915 File: bison.info, Node: Rules, Next: Recursion, Prev: Symbols, Up: Grammar File 2916 2917 3.3 Syntax of Grammar Rules 2918 =========================== 2919 2920 A Bison grammar rule has the following general form: 2921 2922 RESULT: COMPONENTS... 2923 ; 2924 2925 where RESULT is the nonterminal symbol that this rule describes, and 2926 COMPONENTS are various terminal and nonterminal symbols that are put 2927 together by this rule (*note Symbols::). 2928 2929 For example, 2930 2931 exp: exp '+' exp 2932 ; 2933 2934 says that two groupings of type `exp', with a `+' token in between, can 2935 be combined into a larger grouping of type `exp'. 2936 2937 White space in rules is significant only to separate symbols. You 2938 can add extra white space as you wish. 2939 2940 Scattered among the components can be ACTIONS that determine the 2941 semantics of the rule. An action looks like this: 2942 2943 {C STATEMENTS} 2944 2945 This is an example of "braced code", that is, C code surrounded by 2946 braces, much like a compound statement in C. Braced code can contain 2947 any sequence of C tokens, so long as its braces are balanced. Bison 2948 does not check the braced code for correctness directly; it merely 2949 copies the code to the output file, where the C compiler can check it. 
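
   To make the notion of balanced braces concrete, here is a minimal C
sketch of a scan that finds where a piece of braced code ends.  It is
an illustration only, not Bison's actual scanner: the function name
`braced_code_end' is hypothetical, and the sketch assumes that braces
inside comments, string literals and character constants should not
count.  It makes no attempt to handle digraphs or trigraphs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Bison.  TEXT should begin with '{'.
   Return the index just past the '}' that balances it, or 0 if the
   braces never balance.  Braces inside comments, string literals and
   character constants are ignored.  Digraphs and trigraphs are not
   handled in this sketch.  */
static size_t
braced_code_end (const char *text)
{
  size_t i = 0;
  int depth = 0;

  assert (text != NULL);

  while (text[i] != '\0')
    {
      char c = text[i];

      if (c == '/' && text[i + 1] == '*')
        {
          /* Skip a block comment; give up if it is unterminated.  */
          i += 2;
          while (text[i] != '\0' && !(text[i] == '*' && text[i + 1] == '/'))
            i++;
          if (text[i] == '\0')
            return 0;
          i += 2;
          continue;
        }
      if (c == '/' && text[i + 1] == '/')
        {
          /* Skip a line comment up to (not including) the newline.  */
          while (text[i] != '\0' && text[i] != '\n')
            i++;
          continue;
        }
      if (c == '"' || c == '\'')
        {
          /* Skip a string literal or character constant, honoring
             backslash escapes.  */
          char quote = c;
          i++;
          while (text[i] != '\0' && text[i] != quote)
            {
              if (text[i] == '\\' && text[i + 1] != '\0')
                i++;
              i++;
            }
          if (text[i] == '\0')
            return 0;
          i++;
          continue;
        }
      if (c == '{')
        depth++;
      else if (c == '}')
        {
          if (depth == 0)
            return 0;           /* A closing brace with no opener.  */
          depth--;
          if (depth == 0)
            return i + 1;
        }
      i++;
    }
  return 0;                     /* Ran out of text while still nested.  */
}
```

   Called on the text `{ /* } */ } tail', such a scan stops just after
the second closing brace, because the first one sits inside a comment.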
2950 2951 Within braced code, the balanced-brace count is not affected by 2952 braces within comments, string literals, or character constants, but it 2953 is affected by the C digraphs `<%' and `%>' that represent braces. At 2954 the top level braced code must be terminated by `}' and not by a 2955 digraph. Bison does not look for trigraphs, so if braced code uses 2956 trigraphs you should ensure that they do not affect the nesting of 2957 braces or the boundaries of comments, string literals, or character 2958 constants. 2959 2960 Usually there is only one action and it follows the components. 2961 *Note Actions::. 2962 2963 Multiple rules for the same RESULT can be written separately or can 2964 be joined with the vertical-bar character `|' as follows: 2965 2966 RESULT: RULE1-COMPONENTS... 2967 | RULE2-COMPONENTS... 2968 ... 2969 ; 2970 2971 They are still considered distinct rules even when joined in this way. 2972 2973 If COMPONENTS in a rule is empty, it means that RESULT can match the 2974 empty string. For example, here is how to define a comma-separated 2975 sequence of zero or more `exp' groupings: 2976 2977 expseq: /* empty */ 2978 | expseq1 2979 ; 2980 2981 expseq1: exp 2982 | expseq1 ',' exp 2983 ; 2984 2985 It is customary to write a comment `/* empty */' in each rule with no 2986 components. 2987 2988 2989 File: bison.info, Node: Recursion, Next: Semantics, Prev: Rules, Up: Grammar File 2990 2991 3.4 Recursive Rules 2992 =================== 2993 2994 A rule is called "recursive" when its RESULT nonterminal appears also 2995 on its right hand side. Nearly all Bison grammars need to use 2996 recursion, because that is the only way to define a sequence of any 2997 number of a particular thing. 
Consider this recursive definition of a 2998 comma-separated sequence of one or more expressions: 2999 3000 expseq1: exp 3001 | expseq1 ',' exp 3002 ; 3003 3004 Since the recursive use of `expseq1' is the leftmost symbol in the 3005 right hand side, we call this "left recursion". By contrast, here the 3006 same construct is defined using "right recursion": 3007 3008 expseq1: exp 3009 | exp ',' expseq1 3010 ; 3011 3012 Any kind of sequence can be defined using either left recursion or right 3013 recursion, but you should always use left recursion, because it can 3014 parse a sequence of any number of elements with bounded stack space. 3015 Right recursion uses up space on the Bison stack in proportion to the 3016 number of elements in the sequence, because all the elements must be 3017 shifted onto the stack before the rule can be applied even once. *Note 3018 The Bison Parser Algorithm: Algorithm, for further explanation of this. 3019 3020 "Indirect" or "mutual" recursion occurs when the result of the rule 3021 does not appear directly on its right hand side, but does appear in 3022 rules for other nonterminals which do appear on its right hand side. 3023 3024 For example: 3025 3026 expr: primary 3027 | primary '+' primary 3028 ; 3029 3030 primary: constant 3031 | '(' expr ')' 3032 ; 3033 3034 defines two mutually-recursive nonterminals, since each refers to the 3035 other. 3036 3037 3038 File: bison.info, Node: Semantics, Next: Locations, Prev: Recursion, Up: Grammar File 3039 3040 3.5 Defining Language Semantics 3041 =============================== 3042 3043 The grammar rules for a language determine only the syntax. The 3044 semantics are determined by the semantic values associated with various 3045 tokens and groupings, and by the actions taken when various groupings 3046 are recognized. 
3047 3048 For example, the calculator calculates properly because the value 3049 associated with each expression is the proper number; it adds properly 3050 because the action for the grouping `X + Y' is to add the numbers 3051 associated with X and Y. 3052 3053 * Menu: 3054 3055 * Value Type:: Specifying one data type for all semantic values. 3056 * Multiple Types:: Specifying several alternative data types. 3057 * Actions:: An action is the semantic definition of a grammar rule. 3058 * Action Types:: Specifying data types for actions to operate on. 3059 * Mid-Rule Actions:: Most actions go at the end of a rule. 3060 This says when, why and how to use the exceptional 3061 action in the middle of a rule. 3062 3063 3064 File: bison.info, Node: Value Type, Next: Multiple Types, Up: Semantics 3065 3066 3.5.1 Data Types of Semantic Values 3067 ----------------------------------- 3068 3069 In a simple program it may be sufficient to use the same data type for 3070 the semantic values of all language constructs. This was true in the 3071 RPN and infix calculator examples (*note Reverse Polish Notation 3072 Calculator: RPN Calc.). 3073 3074 Bison's default is to use type `int' for all semantic values. To 3075 specify some other type, define `YYSTYPE' as a macro, like this: 3076 3077 #define YYSTYPE double 3078 3079 `YYSTYPE''s replacement list should be a type name that does not 3080 contain parentheses or square brackets. This macro definition must go 3081 in the prologue of the grammar file (*note Outline of a Bison Grammar: 3082 Grammar Outline.). 3083 3084 3085 File: bison.info, Node: Multiple Types, Next: Actions, Prev: Value Type, Up: Semantics 3086 3087 3.5.2 More Than One Value Type 3088 ------------------------------ 3089 3090 In most programs, you will need different data types for different kinds 3091 of tokens and groupings. 
For example, a numeric constant may need type 3092 `int' or `long int', while a string constant needs type `char *', and 3093 an identifier might need a pointer to an entry in the symbol table. 3094 3095 To use more than one data type for semantic values in one parser, 3096 Bison requires you to do two things: 3097 3098 * Specify the entire collection of possible data types, with the 3099 `%union' Bison declaration (*note The Collection of Value Types: 3100 Union Decl.). 3101 3102 * Choose one of those types for each symbol (terminal or 3103 nonterminal) for which semantic values are used. This is done for 3104 tokens with the `%token' Bison declaration (*note Token Type 3105 Names: Token Decl.) and for groupings with the `%type' Bison 3106 declaration (*note Nonterminal Symbols: Type Decl.). 3107 3108 3109 File: bison.info, Node: Actions, Next: Action Types, Prev: Multiple Types, Up: Semantics 3110 3111 3.5.3 Actions 3112 ------------- 3113 3114 An action accompanies a syntactic rule and contains C code to be 3115 executed each time an instance of that rule is recognized. The task of 3116 most actions is to compute a semantic value for the grouping built by 3117 the rule from the semantic values associated with tokens or smaller 3118 groupings. 3119 3120 An action consists of braced code containing C statements, and can be 3121 placed at any position in the rule; it is executed at that position. 3122 Most rules have just one action at the end of the rule, following all 3123 the components. Actions in the middle of a rule are tricky and used 3124 only for special purposes (*note Actions in Mid-Rule: Mid-Rule 3125 Actions.). 3126 3127 The C code in an action can refer to the semantic values of the 3128 components matched by the rule with the construct `$N', which stands for 3129 the value of the Nth component. The semantic value for the grouping 3130 being constructed is `$$'. 
Bison translates both of these constructs 3131 into expressions of the appropriate type when it copies the actions 3132 into the parser file. `$$' is translated to a modifiable lvalue, so it 3133 can be assigned to. 3134 3135 Here is a typical example: 3136 3137 exp: ... 3138 | exp '+' exp 3139 { $$ = $1 + $3; } 3140 3141 This rule constructs an `exp' from two smaller `exp' groupings 3142 connected by a plus-sign token. In the action, `$1' and `$3' refer to 3143 the semantic values of the two component `exp' groupings, which are the 3144 first and third symbols on the right hand side of the rule. The sum is 3145 stored into `$$' so that it becomes the semantic value of the 3146 addition-expression just recognized by the rule. If there were a 3147 useful semantic value associated with the `+' token, it could be 3148 referred to as `$2'. 3149 3150 Note that the vertical-bar character `|' is really a rule separator, 3151 and actions are attached to a single rule. This is a difference with 3152 tools like Flex, for which `|' stands for either "or", or "the same 3153 action as that of the next rule". In the following example, the action 3154 is triggered only when `b' is found: 3155 3156 a-or-b: 'a'|'b' { a_or_b_found = 1; }; 3157 3158 If you don't specify an action for a rule, Bison supplies a default: 3159 `$$ = $1'. Thus, the value of the first symbol in the rule becomes the 3160 value of the whole rule. Of course, the default action is valid only 3161 if the two data types match. There is no meaningful default action for 3162 an empty rule; every empty rule must have an explicit action unless the 3163 rule's value does not matter. 3164 3165 `$N' with N zero or negative is allowed for reference to tokens and 3166 groupings on the stack _before_ those that match the current rule. 3167 This is a very risky practice, and to use it reliably you must be 3168 certain of the context in which the rule is applied. 
Here is a case in
which you can use this reliably:

     foo:      expr bar '+' expr  { ... }
             | expr bar '-' expr  { ... }
             ;

     bar:      /* empty */
             { previous_expr = $0; }
             ;

   As long as `bar' is used only in the fashion shown here, `$0' always
refers to the `expr' which precedes `bar' in the definition of `foo'.

   It is also possible to access the semantic value of the look-ahead
token, if any, from a semantic action. This semantic value is stored
in `yylval'. *Note Special Features for Use in Actions: Action
Features.


File: bison.info, Node: Action Types, Next: Mid-Rule Actions, Prev: Actions, Up: Semantics

3.5.4 Data Types of Values in Actions
-------------------------------------

If you have chosen a single data type for semantic values, the `$$' and
`$N' constructs always have that data type.

   If you have used `%union' to specify a variety of data types, then
you must declare a choice among these types for each terminal or
nonterminal symbol that can have a semantic value. Then each time you
use `$$' or `$N', its data type is determined by which symbol it refers
to in the rule. In this example,

     exp:    ...
             | exp '+' exp
                 { $$ = $1 + $3; }

`$1' and `$3' refer to instances of `exp', so they both have the data
type declared for the nonterminal symbol `exp'. If `$2' were used, it
would have the data type declared for the terminal symbol `'+'',
whatever that might be.

   Alternatively, you can specify the data type when you refer to the
value, by inserting `<TYPE>' after the `$' at the beginning of the
reference. For example, if you have defined types as shown here:

     %union {
       int itype;
       double dtype;
     }

then you can write `$<itype>1' to refer to the first component of the
rule as an integer, or `$<dtype>1' to refer to it as a double.
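
   As a rough illustration of what the `$<TYPE>N' notation selects, the
following C sketch models a value stack of unions by hand. The union
mirrors the `%union' shown above; the names `as_int' and `as_double'
and the `value_stack' parameter are hypothetical and are not part of
the code Bison generates.

```c
#include <assert.h>

/* Hand-written stand-in for the union that `%union' would produce.  */
typedef union
{
  int itype;
  double dtype;
} YYSTYPE;

/* Hypothetical helper: read component N (counted from 1) as an int,
   in the spirit of `$<itype>N'.  */
static int
as_int (YYSTYPE const *value_stack, int n)
{
  assert (n >= 1);
  return value_stack[n - 1].itype;
}

/* Hypothetical helper: read component N as a double,
   in the spirit of `$<dtype>N'.  */
static double
as_double (YYSTYPE const *value_stack, int n)
{
  assert (n >= 1);
  return value_stack[n - 1].dtype;
}
```

The compiler does not check which member was stored last, so the type
named in `$<TYPE>N' has to match whatever the earlier action or `yylex'
actually stored in that component.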
3222 3223 3224 File: bison.info, Node: Mid-Rule Actions, Prev: Action Types, Up: Semantics 3225 3226 3.5.5 Actions in Mid-Rule 3227 ------------------------- 3228 3229 Occasionally it is useful to put an action in the middle of a rule. 3230 These actions are written just like usual end-of-rule actions, but they 3231 are executed before the parser even recognizes the following components. 3232 3233 A mid-rule action may refer to the components preceding it using 3234 `$N', but it may not refer to subsequent components because it is run 3235 before they are parsed. 3236 3237 The mid-rule action itself counts as one of the components of the 3238 rule. This makes a difference when there is another action later in 3239 the same rule (and usually there is another at the end): you have to 3240 count the actions along with the symbols when working out which number 3241 N to use in `$N'. 3242 3243 The mid-rule action can also have a semantic value. The action can 3244 set its value with an assignment to `$$', and actions later in the rule 3245 can refer to the value using `$N'. Since there is no symbol to name 3246 the action, there is no way to declare a data type for the value in 3247 advance, so you must use the `$<...>N' construct to specify a data type 3248 each time you refer to this value. 3249 3250 There is no way to set the value of the entire rule with a mid-rule 3251 action, because assignments to `$$' do not have that effect. The only 3252 way to set the value for the entire rule is with an ordinary action at 3253 the end of the rule. 3254 3255 Here is an example from a hypothetical compiler, handling a `let' 3256 statement that looks like `let (VARIABLE) STATEMENT' and serves to 3257 create a variable named VARIABLE temporarily for the duration of 3258 STATEMENT. To parse this construct, we must put VARIABLE into the 3259 symbol table while STATEMENT is parsed, then remove it afterward. 
Here is how it is done:

     stmt:   LET '(' var ')'
                     { $<context>$ = push_context ();
                       declare_variable ($3); }
             stmt    { $$ = $6;
                       pop_context ($<context>5); }

As soon as `let (VARIABLE)' has been recognized, the first action is
run. It saves a copy of the current semantic context (the list of
accessible variables) as its semantic value, using alternative
`context' in the data-type union. Then it calls `declare_variable' to
add the new variable to that list. Once the first action is finished,
the embedded statement `stmt' can be parsed. Note that the mid-rule
action is component number 5, so the `stmt' is component number 6.

   After the embedded statement is parsed, its semantic value becomes
the value of the entire `let'-statement. Then the semantic value from
the earlier action is used to restore the prior list of variables. This
removes the temporary `let'-variable from the list so that it won't
appear to exist while the rest of the program is parsed.

   In the above example, if the parser initiates error recovery (*note
Error Recovery::) while parsing the tokens in the embedded statement
`stmt', it might discard the previous semantic context `$<context>5'
without restoring it. Thus, `$<context>5' needs a destructor (*note
Freeing Discarded Symbols: Destructor Decl.). However, Bison currently
provides no means to declare a destructor for a mid-rule action's
semantic value.

   One solution is to bury the mid-rule action inside a nonterminal
symbol and to declare a destructor for that symbol:

     %type <context> let
     %destructor { pop_context ($$); } let

     %%

     stmt:  let stmt
                    { $$ = $2;
                      pop_context ($1); }
            ;

     let:   LET '(' var ')'
                    { $$ = push_context ();
                      declare_variable ($3); }
            ;

Note that the action is now at the end of its rule.
Any mid-rule 3309 action can be converted to an end-of-rule action in this way, and this 3310 is what Bison actually does to implement mid-rule actions. 3311 3312 Taking action before a rule is completely recognized often leads to 3313 conflicts since the parser must commit to a parse in order to execute 3314 the action. For example, the following two rules, without mid-rule 3315 actions, can coexist in a working parser because the parser can shift 3316 the open-brace token and look at what follows before deciding whether 3317 there is a declaration or not: 3318 3319 compound: '{' declarations statements '}' 3320 | '{' statements '}' 3321 ; 3322 3323 But when we add a mid-rule action as follows, the rules become 3324 nonfunctional: 3325 3326 compound: { prepare_for_local_variables (); } 3327 '{' declarations statements '}' 3328 | '{' statements '}' 3329 ; 3330 3331 Now the parser is forced to decide whether to run the mid-rule action 3332 when it has read no farther than the open-brace. In other words, it 3333 must commit to using one rule or the other, without sufficient 3334 information to do it correctly. (The open-brace token is what is called 3335 the "look-ahead" token at this time, since the parser is still deciding 3336 what to do about it. *Note Look-Ahead Tokens: Look-Ahead.) 3337 3338 You might think that you could correct the problem by putting 3339 identical actions into the two rules, like this: 3340 3341 compound: { prepare_for_local_variables (); } 3342 '{' declarations statements '}' 3343 | { prepare_for_local_variables (); } 3344 '{' statements '}' 3345 ; 3346 3347 But this does not help, because Bison does not realize that the two 3348 actions are identical. (Bison never tries to understand the C code in 3349 an action.) 
3350 3351 If the grammar is such that a declaration can be distinguished from a 3352 statement by the first token (which is true in C), then one solution 3353 which does work is to put the action after the open-brace, like this: 3354 3355 compound: '{' { prepare_for_local_variables (); } 3356 declarations statements '}' 3357 | '{' statements '}' 3358 ; 3359 3360 Now the first token of the following declaration or statement, which 3361 would in any case tell Bison which rule to use, can still do so. 3362 3363 Another solution is to bury the action inside a nonterminal symbol 3364 which serves as a subroutine: 3365 3366 subroutine: /* empty */ 3367 { prepare_for_local_variables (); } 3368 ; 3369 3370 compound: subroutine 3371 '{' declarations statements '}' 3372 | subroutine 3373 '{' statements '}' 3374 ; 3375 3376 Now Bison can execute the action in the rule for `subroutine' without 3377 deciding which rule for `compound' it will eventually use. 3378 3379 3380 File: bison.info, Node: Locations, Next: Declarations, Prev: Semantics, Up: Grammar File 3381 3382 3.6 Tracking Locations 3383 ====================== 3384 3385 Though grammar rules and semantic actions are enough to write a fully 3386 functional parser, it can be useful to process some additional 3387 information, especially symbol locations. 3388 3389 The way locations are handled is defined by providing a data type, 3390 and actions to take when rules are matched. 3391 3392 * Menu: 3393 3394 * Location Type:: Specifying a data type for locations. 3395 * Actions and Locations:: Using locations in actions. 3396 * Location Default Action:: Defining a general way to compute locations. 3397 3398 3399 File: bison.info, Node: Location Type, Next: Actions and Locations, Up: Locations 3400 3401 3.6.1 Data Type of Locations 3402 ---------------------------- 3403 3404 Defining a data type for locations is much simpler than for semantic 3405 values, since all tokens and groupings always use the same type. 

   You can specify the type of locations by defining a macro called
`YYLTYPE', just as you can specify the semantic value type by defining
`YYSTYPE' (*note Value Type::). When `YYLTYPE' is not defined, Bison
uses a default structure type with four members:

     typedef struct YYLTYPE
     {
       int first_line;
       int first_column;
       int last_line;
       int last_column;
     } YYLTYPE;


File: bison.info, Node: Actions and Locations, Next: Location Default Action, Prev: Location Type, Up: Locations

3.6.2 Actions and Locations
---------------------------

Actions are not only useful for defining language semantics, but also
for describing the behavior of the output parser with locations.

   The most obvious way to build locations of syntactic groupings
is very similar to the way semantic values are computed. In a given
rule, several constructs can be used to access the locations of the
elements being matched. The location of the Nth component of the right
hand side is `@N', while the location of the left hand side grouping is
`@$'.

   Here is a basic example using the default data type for locations:

     exp:    ...
             | exp '/' exp
                 {
                   @$.first_column = @1.first_column;
                   @$.first_line = @1.first_line;
                   @$.last_column = @3.last_column;
                   @$.last_line = @3.last_line;
                   if ($3)
                     $$ = $1 / $3;
                   else
                     {
                       $$ = 1;
                       fprintf (stderr,
                                "Division by zero, l%d,c%d-l%d,c%d",
                                @3.first_line, @3.first_column,
                                @3.last_line, @3.last_column);
                     }
                 }

   As for semantic values, there is a default action for locations that
is run each time a rule is matched. It sets the beginning of `@$' to
the beginning of the first symbol, and the end of `@$' to the end of the
last symbol.

   With this default action, the location tracking can be fully
automatic.
The example above simply rewrites this way: 3464 3465 exp: ... 3466 | exp '/' exp 3467 { 3468 if ($3) 3469 $$ = $1 / $3; 3470 else 3471 { 3472 $$ = 1; 3473 fprintf (stderr, 3474 "Division by zero, l%d,c%d-l%d,c%d", 3475 @3.first_line, @3.first_column, 3476 @3.last_line, @3.last_column); 3477 } 3478 } 3479 3480 It is also possible to access the location of the look-ahead token, 3481 if any, from a semantic action. This location is stored in `yylloc'. 3482 *Note Special Features for Use in Actions: Action Features. 3483 3484 3485 File: bison.info, Node: Location Default Action, Prev: Actions and Locations, Up: Locations 3486 3487 3.6.3 Default Action for Locations 3488 ---------------------------------- 3489 3490 Actually, actions are not the best place to compute locations. Since 3491 locations are much more general than semantic values, there is room in 3492 the output parser to redefine the default action to take for each rule. 3493 The `YYLLOC_DEFAULT' macro is invoked each time a rule is matched, 3494 before the associated action is run. It is also invoked while 3495 processing a syntax error, to compute the error's location. Before 3496 reporting an unresolvable syntactic ambiguity, a GLR parser invokes 3497 `YYLLOC_DEFAULT' recursively to compute the location of that ambiguity. 3498 3499 Most of the time, this macro is general enough to suppress location 3500 dedicated code from semantic actions. 3501 3502 The `YYLLOC_DEFAULT' macro takes three parameters. The first one is 3503 the location of the grouping (the result of the computation). When a 3504 rule is matched, the second parameter identifies locations of all right 3505 hand side elements of the rule being matched, and the third parameter 3506 is the size of the rule's right hand side. When a GLR parser reports 3507 an ambiguity, which of multiple candidate right hand sides it passes to 3508 `YYLLOC_DEFAULT' is undefined. 
When processing a syntax error, the
second parameter identifies locations of the symbols that were
discarded during error processing, and the third parameter is the
number of discarded symbols.

   By default, `YYLLOC_DEFAULT' is defined this way:

     # define YYLLOC_DEFAULT(Current, Rhs, N)                                \
         do                                                                  \
           if (N)                                                            \
             {                                                               \
               (Current).first_line   = YYRHSLOC(Rhs, 1).first_line;         \
               (Current).first_column = YYRHSLOC(Rhs, 1).first_column;       \
               (Current).last_line    = YYRHSLOC(Rhs, N).last_line;          \
               (Current).last_column  = YYRHSLOC(Rhs, N).last_column;        \
             }                                                               \
           else                                                              \
             {                                                               \
               (Current).first_line   = (Current).last_line   =              \
                 YYRHSLOC(Rhs, 0).last_line;                                 \
               (Current).first_column = (Current).last_column =              \
                 YYRHSLOC(Rhs, 0).last_column;                               \
             }                                                               \
         while (0)

   where `YYRHSLOC (rhs, k)' is the location of the Kth symbol in RHS
when K is positive, and the location of the symbol just before the
reduction when K and N are both zero.

   When defining `YYLLOC_DEFAULT', you should consider that:

   * All arguments are free of side-effects. However, only the first
     one (the result) should be modified by `YYLLOC_DEFAULT'.

   * For consistency with semantic actions, valid indexes within the
     right hand side range from 1 to N. When N is zero, only 0 is a
     valid index, and it refers to the symbol just before the reduction.
     During error processing N is always positive.

   * Your macro should parenthesize its arguments, if need be, since the
     actual arguments may not be surrounded by parentheses. Also, your
     macro should expand to something that can be used as a single
     statement when it is followed by a semicolon.
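
   To make the indexing convention concrete, here is a small
stand-alone C sketch that applies the default definition quoted above
to a hand-built array of locations. It assumes that `YYRHSLOC (Rhs, K)'
is simply `(Rhs)[K]'; the helper name `reduce_location' is hypothetical
and is not part of Bison.

```c
#include <assert.h>

typedef struct YYLTYPE
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

/* Assumption for this sketch: slot K of RHS holds the location of the
   Kth symbol, and slot 0 holds the symbol just before the reduction.  */
#define YYRHSLOC(Rhs, K) ((Rhs)[K])

#define YYLLOC_DEFAULT(Current, Rhs, N)                                 \
    do                                                                  \
      if (N)                                                            \
        {                                                               \
          (Current).first_line   = YYRHSLOC (Rhs, 1).first_line;        \
          (Current).first_column = YYRHSLOC (Rhs, 1).first_column;      \
          (Current).last_line    = YYRHSLOC (Rhs, N).last_line;         \
          (Current).last_column  = YYRHSLOC (Rhs, N).last_column;       \
        }                                                               \
      else                                                              \
        {                                                               \
          (Current).first_line   = (Current).last_line   =              \
            YYRHSLOC (Rhs, 0).last_line;                                \
          (Current).first_column = (Current).last_column =              \
            YYRHSLOC (Rhs, 0).last_column;                              \
        }                                                               \
    while (0)

/* Hypothetical helper: compute the location of a grouping whose right
   hand side has N symbols stored in RHS[1..N].  */
static YYLTYPE
reduce_location (YYLTYPE const *rhs, int n)
{
  YYLTYPE result;
  assert (n >= 0);
  YYLLOC_DEFAULT (result, rhs, n);
  return result;
}
```

With three symbols the result spans from the start of the first to the
end of the third. With N equal to zero the result is an empty range
placed at the end of the symbol in slot 0.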
3551 3552 3553 File: bison.info, Node: Declarations, Next: Multiple Parsers, Prev: Locations, Up: Grammar File 3554 3555 3.7 Bison Declarations 3556 ====================== 3557 3558 The "Bison declarations" section of a Bison grammar defines the symbols 3559 used in formulating the grammar and the data types of semantic values. 3560 *Note Symbols::. 3561 3562 All token type names (but not single-character literal tokens such as 3563 `'+'' and `'*'') must be declared. Nonterminal symbols must be 3564 declared if you need to specify which data type to use for the semantic 3565 value (*note More Than One Value Type: Multiple Types.). 3566 3567 The first rule in the file also specifies the start symbol, by 3568 default. If you want some other symbol to be the start symbol, you 3569 must declare it explicitly (*note Languages and Context-Free Grammars: 3570 Language and Grammar.). 3571 3572 * Menu: 3573 3574 * Require Decl:: Requiring a Bison version. 3575 * Token Decl:: Declaring terminal symbols. 3576 * Precedence Decl:: Declaring terminals with precedence and associativity. 3577 * Union Decl:: Declaring the set of all semantic value types. 3578 * Type Decl:: Declaring the choice of type for a nonterminal symbol. 3579 * Initial Action Decl:: Code run before parsing starts. 3580 * Destructor Decl:: Declaring how symbols are freed. 3581 * Expect Decl:: Suppressing warnings about parsing conflicts. 3582 * Start Decl:: Specifying the start symbol. 3583 * Pure Decl:: Requesting a reentrant parser. 3584 * Decl Summary:: Table of all Bison declarations. 3585 3586 3587 File: bison.info, Node: Require Decl, Next: Token Decl, Up: Declarations 3588 3589 3.7.1 Require a Version of Bison 3590 -------------------------------- 3591 3592 You may require the minimum version of Bison to process the grammar. If 3593 the requirement is not met, `bison' exits with an error (exit status 3594 63). 

     %require "VERSION"


File: bison.info, Node: Token Decl, Next: Precedence Decl, Prev: Require Decl, Up: Declarations

3.7.2 Token Type Names
----------------------

The basic way to declare a token type name (terminal symbol) is as
follows:

     %token NAME

   Bison will convert this into a `#define' directive in the parser, so
that the function `yylex' (if it is in this file) can use the name NAME
to stand for this token type's code.

   Alternatively, you can use `%left', `%right', or `%nonassoc' instead
of `%token', if you wish to specify associativity and precedence.
*Note Operator Precedence: Precedence Decl.

   You can explicitly specify the numeric code for a token type by
appending a decimal or hexadecimal integer value in the field
immediately following the token name:

     %token NUM 300
     %token XNUM 0x12d // a GNU extension

It is generally best, however, to let Bison choose the numeric codes for
all token types. Bison will automatically select codes that don't
conflict with each other or with normal characters.

   In the event that the stack type is a union, you must augment the
`%token' or other token declaration to include the data type
alternative delimited by angle-brackets (*note More Than One Value
Type: Multiple Types.).

   For example:

     %union {              /* define stack type */
       double val;
       symrec *tptr;
     }
     %token <val> NUM      /* define token NUM and its type */

   You can associate a literal string token with a token type name by
writing the literal string at the end of a `%token' declaration which
declares the name.
For example: 3644 3645 %token arrow "=>" 3646 3647 For example, a grammar for the C language might specify these names with 3648 equivalent literal string tokens: 3649 3650 %token <operator> OR "||" 3651 %token <operator> LE 134 "<=" 3652 %left OR "<=" 3653 3654 Once you equate the literal string and the token name, you can use them 3655 interchangeably in further declarations or the grammar rules. The 3656 `yylex' function can use the token name or the literal string to obtain 3657 the token type code number (*note Calling Convention::). 3658 3659 3660 File: bison.info, Node: Precedence Decl, Next: Union Decl, Prev: Token Decl, Up: Declarations 3661 3662 3.7.3 Operator Precedence 3663 ------------------------- 3664 3665 Use the `%left', `%right' or `%nonassoc' declaration to declare a token 3666 and specify its precedence and associativity, all at once. These are 3667 called "precedence declarations". *Note Operator Precedence: 3668 Precedence, for general information on operator precedence. 3669 3670 The syntax of a precedence declaration is the same as that of 3671 `%token': either 3672 3673 %left SYMBOLS... 3674 3675 or 3676 3677 %left <TYPE> SYMBOLS... 3678 3679 And indeed any of these declarations serves the purposes of `%token'. 3680 But in addition, they specify the associativity and relative precedence 3681 for all the SYMBOLS: 3682 3683 * The associativity of an operator OP determines how repeated uses 3684 of the operator nest: whether `X OP Y OP Z' is parsed by grouping 3685 X with Y first or by grouping Y with Z first. `%left' specifies 3686 left-associativity (grouping X with Y first) and `%right' 3687 specifies right-associativity (grouping Y with Z first). 3688 `%nonassoc' specifies no associativity, which means that `X OP Y 3689 OP Z' is considered a syntax error. 3690 3691 * The precedence of an operator determines how it nests with other 3692 operators. 
     All the tokens declared in a single precedence
     declaration have equal precedence and nest together according to
     their associativity. When two tokens declared in different
     precedence declarations associate, the one declared later has the
     higher precedence and is grouped first.


File: bison.info, Node: Union Decl, Next: Type Decl, Prev: Precedence Decl, Up: Declarations

3.7.4 The Collection of Value Types
-----------------------------------

The `%union' declaration specifies the entire collection of possible
data types for semantic values. The keyword `%union' is followed by
braced code containing the same thing that goes inside a `union' in C.

   For example:

     %union {
       double val;
       symrec *tptr;
     }

This says that the two alternative types are `double' and `symrec *'.
They are given names `val' and `tptr'; these names are used in the
`%token' and `%type' declarations to pick one of the types for a
terminal or nonterminal symbol (*note Nonterminal Symbols: Type Decl.).

   As an extension to POSIX, a tag is allowed after the `union'. For
example:

     %union value {
       double val;
       symrec *tptr;
     }

specifies the union tag `value', so the corresponding C type is `union
value'. If you do not specify a tag, it defaults to `YYSTYPE'.

   As another extension to POSIX, you may specify multiple `%union'
declarations; their contents are concatenated. However, only the first
`%union' declaration can specify a tag.

   Note that, unlike making a `union' declaration in C, you need not
write a semicolon after the closing brace.
3737 3738 3739 File: bison.info, Node: Type Decl, Next: Initial Action Decl, Prev: Union Decl, Up: Declarations 3740 3741 3.7.5 Nonterminal Symbols 3742 ------------------------- 3743 3744 When you use `%union' to specify multiple value types, you must declare 3745 the value type of each nonterminal symbol for which values are used. 3746 This is done with a `%type' declaration, like this: 3747 3748 %type <TYPE> NONTERMINAL... 3749 3750 Here NONTERMINAL is the name of a nonterminal symbol, and TYPE is the 3751 name given in the `%union' to the alternative that you want (*note The 3752 Collection of Value Types: Union Decl.). You can give any number of 3753 nonterminal symbols in the same `%type' declaration, if they have the 3754 same value type. Use spaces to separate the symbol names. 3755 3756 You can also declare the value type of a terminal symbol. To do 3757 this, use the same `<TYPE>' construction in a declaration for the 3758 terminal symbol. All kinds of token declarations allow `<TYPE>'. 3759 3760 3761 File: bison.info, Node: Initial Action Decl, Next: Destructor Decl, Prev: Type Decl, Up: Declarations 3762 3763 3.7.6 Performing Actions before Parsing 3764 --------------------------------------- 3765 3766 Sometimes your parser needs to perform some initializations before 3767 parsing. The `%initial-action' directive allows for such arbitrary 3768 code. 3769 3770 -- Directive: %initial-action { CODE } 3771 Declare that the braced CODE must be invoked before parsing each 3772 time `yyparse' is called. The CODE may use `$$' and `@$' -- 3773 initial value and location of the look-ahead -- and the 3774 `%parse-param'. 

     For instance, if your locations use a file name, you may use

          %parse-param { char const *file_name };
          %initial-action
          {
            @$.initialize (file_name);
          };


File: bison.info, Node: Destructor Decl, Next: Expect Decl, Prev: Initial Action Decl, Up: Declarations

3.7.7 Freeing Discarded Symbols
-------------------------------

During error recovery (*note Error Recovery::), symbols already pushed
on the stack and tokens coming from the rest of the file are discarded
until the parser falls on its feet. If the parser runs out of memory,
or if it returns via `YYABORT' or `YYACCEPT', all the symbols on the
stack must be discarded. Even if the parser succeeds, it must discard
the start symbol.

   When discarded symbols hold heap-allocated data, that memory is
lost. While this behavior can be tolerable for batch parsers, such as
in traditional compilers, it is unacceptable for programs like shells or
protocol implementations that may parse and execute indefinitely.

   The `%destructor' directive defines code that is called when a
symbol is automatically discarded.

 -- Directive: %destructor { CODE } SYMBOLS
     Invoke the braced CODE whenever the parser discards one of the
     SYMBOLS. Within CODE, `$$' designates the semantic value
     associated with the discarded symbol. The additional parser
     parameters are also available (*note The Parser Function
     `yyparse': Parser Function.).

   For instance:

     %union
     {
       char *string;
     }
     %token <string> STRING
     %type  <string> string
     %destructor { free ($$); } STRING string

guarantees that when a `STRING' or a `string' is discarded, its
associated memory will be freed.
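
   The effect of the `%destructor' above can be pictured with a small C
model of a parser discarding its stack. This is only a sketch under
stated assumptions: `destroy_string' and `discard_stack' are
hypothetical names, and a real Bison parser does the equivalent work
internally rather than through functions like these.

```c
#include <assert.h>
#include <stdlib.h>

/* Hand-written stand-in for the union declared with `%union'.  */
typedef union
{
  char *string;
} YYSTYPE;

/* Plays the role of the braced code in `%destructor { free ($$); }'.  */
static void
destroy_string (YYSTYPE *value)
{
  free (value->string);
  value->string = NULL;
}

/* Hypothetical model: discard COUNT stacked values, topmost first,
   running the destructor on each.  Returns the number destroyed.  */
static int
discard_stack (YYSTYPE *stack, int count)
{
  int destroyed = 0;
  assert (count >= 0);
  while (count > 0)
    {
      destroy_string (&stack[--count]);
      destroyed++;
    }
  return destroyed;
}
```

Without the destructor step, each discarded `char *' would be
unreachable after the stack is unwound, which is the leak the directive
is meant to prevent.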


   "Discarded symbols" are the following:

   * stacked symbols popped during the first phase of error recovery,

   * incoming terminals during the second phase of error recovery,

   * the current look-ahead and the entire stack (except the current
     right-hand side symbols) when the parser returns immediately, and

   * the start symbol, when the parser succeeds.

   The parser can "return immediately" because of an explicit call to
`YYABORT' or `YYACCEPT', or failed error recovery, or memory exhaustion.

   Right-hand side symbols of a rule that explicitly triggers a syntax
error via `YYERROR' are not discarded automatically. As a rule of
thumb, destructors are invoked only when user actions cannot manage the
memory.


File: bison.info, Node: Expect Decl, Next: Start Decl, Prev: Destructor Decl, Up: Declarations

3.7.8 Suppressing Conflict Warnings
-----------------------------------

Bison normally warns if there are any conflicts in the grammar (*note
Shift/Reduce Conflicts: Shift/Reduce.), but most real grammars have
harmless shift/reduce conflicts which are resolved in a predictable way
and would be difficult to eliminate. It is desirable to suppress the
warning about these conflicts unless the number of conflicts changes.
You can do this with the `%expect' declaration.

   The declaration looks like this:

     %expect N

   Here N is a decimal integer. The declaration says there should be N
shift/reduce conflicts and no reduce/reduce conflicts. Bison reports
an error if the number of shift/reduce conflicts differs from N, or if
there are any reduce/reduce conflicts.

   For normal LALR(1) parsers, reduce/reduce conflicts are more
serious, and should be eliminated entirely. Bison will always report
reduce/reduce conflicts for these parsers.
With GLR parsers, however, 3870 both kinds of conflicts are routine; otherwise, there would be no need 3871 to use GLR parsing. Therefore, it is also possible to specify an 3872 expected number of reduce/reduce conflicts in GLR parsers, using the 3873 declaration: 3874 3875 %expect-rr N 3876 3877 In general, using `%expect' involves these steps: 3878 3879 * Compile your grammar without `%expect'. Use the `-v' option to 3880 get a verbose list of where the conflicts occur. Bison will also 3881 print the number of conflicts. 3882 3883 * Check each of the conflicts to make sure that Bison's default 3884 resolution is what you really want. If not, rewrite the grammar 3885 and go back to the beginning. 3886 3887 * Add an `%expect' declaration, copying the number N from the number 3888 which Bison printed. With GLR parsers, add an `%expect-rr' 3889 declaration as well. 3890 3891 Now Bison will warn you if you introduce an unexpected conflict, but 3892 will keep silent otherwise. 3893 3894 3895 File: bison.info, Node: Start Decl, Next: Pure Decl, Prev: Expect Decl, Up: Declarations 3896 3897 3.7.9 The Start-Symbol 3898 ---------------------- 3899 3900 Bison assumes by default that the start symbol for the grammar is the 3901 first nonterminal specified in the grammar specification section. The 3902 programmer may override this restriction with the `%start' declaration 3903 as follows: 3904 3905 %start SYMBOL 3906 3907 3908 File: bison.info, Node: Pure Decl, Next: Decl Summary, Prev: Start Decl, Up: Declarations 3909 3910 3.7.10 A Pure (Reentrant) Parser 3911 -------------------------------- 3912 3913 A "reentrant" program is one which does not alter in the course of 3914 execution; in other words, it consists entirely of "pure" (read-only) 3915 code. Reentrancy is important whenever asynchronous execution is 3916 possible; for example, a nonreentrant program may not be safe to call 3917 from a signal handler. 
In systems with multiple threads of control, a 3918 nonreentrant program must be called only within interlocks. 3919 3920 Normally, Bison generates a parser which is not reentrant. This is 3921 suitable for most uses, and it permits compatibility with Yacc. (The 3922 standard Yacc interfaces are inherently nonreentrant, because they use 3923 statically allocated variables for communication with `yylex', 3924 including `yylval' and `yylloc'.) 3925 3926 Alternatively, you can generate a pure, reentrant parser. The Bison 3927 declaration `%pure-parser' says that you want the parser to be 3928 reentrant. It looks like this: 3929 3930 %pure-parser 3931 3932 The result is that the communication variables `yylval' and `yylloc' 3933 become local variables in `yyparse', and a different calling convention 3934 is used for the lexical analyzer function `yylex'. *Note Calling 3935 Conventions for Pure Parsers: Pure Calling, for the details of this. 3936 The variable `yynerrs' also becomes local in `yyparse' (*note The Error 3937 Reporting Function `yyerror': Error Reporting.). The convention for 3938 calling `yyparse' itself is unchanged. 3939 3940 Whether the parser is pure has nothing to do with the grammar rules. 3941 You can generate either a pure parser or a nonreentrant parser from any 3942 valid grammar. 3943 3944 3945 File: bison.info, Node: Decl Summary, Prev: Pure Decl, Up: Declarations 3946 3947 3.7.11 Bison Declaration Summary 3948 -------------------------------- 3949 3950 Here is a summary of the declarations used to define a grammar: 3951 3952 -- Directive: %union 3953 Declare the collection of data types that semantic values may have 3954 (*note The Collection of Value Types: Union Decl.). 3955 3956 -- Directive: %token 3957 Declare a terminal symbol (token type name) with no precedence or 3958 associativity specified (*note Token Type Names: Token Decl.). 
3959 3960 -- Directive: %right 3961 Declare a terminal symbol (token type name) that is 3962 right-associative (*note Operator Precedence: Precedence Decl.). 3963 3964 -- Directive: %left 3965 Declare a terminal symbol (token type name) that is 3966 left-associative (*note Operator Precedence: Precedence Decl.). 3967 3968 -- Directive: %nonassoc 3969 Declare a terminal symbol (token type name) that is nonassociative 3970 (*note Operator Precedence: Precedence Decl.). Using it in a way 3971 that would be associative is a syntax error. 3972 3973 -- Directive: %type 3974 Declare the type of semantic values for a nonterminal symbol 3975 (*note Nonterminal Symbols: Type Decl.). 3976 3977 -- Directive: %start 3978 Specify the grammar's start symbol (*note The Start-Symbol: Start 3979 Decl.). 3980 3981 -- Directive: %expect 3982 Declare the expected number of shift-reduce conflicts (*note 3983 Suppressing Conflict Warnings: Expect Decl.). 3984 3985 3986 In order to change the behavior of `bison', use the following 3987 directives: 3988 3989 -- Directive: %debug 3990 In the parser file, define the macro `YYDEBUG' to 1 if it is not 3991 already defined, so that the debugging facilities are compiled. 3992 *Note Tracing Your Parser: Tracing. 3993 3994 -- Directive: %defines 3995 Write a header file containing macro definitions for the token type 3996 names defined in the grammar as well as a few other declarations. 3997 If the parser output file is named `NAME.c' then this file is 3998 named `NAME.h'. 3999 4000 Unless `YYSTYPE' is already defined as a macro, the output header 4001 declares `YYSTYPE'. Therefore, if you are using a `%union' (*note 4002 More Than One Value Type: Multiple Types.) 
     with components that
     require other definitions, or if you have defined a `YYSTYPE' macro
     (*note Data Types of Semantic Values: Value Type.), you need to
     arrange for these definitions to be propagated to all modules,
     e.g., by putting them in a prerequisite header that is included
     both by your parser and by any other module that needs `YYSTYPE'.

     Unless your parser is pure, the output header declares `yylval' as
     an external variable. *Note A Pure (Reentrant) Parser: Pure Decl.

     If you have also used locations, the output header declares
     `YYLTYPE' and `yylloc' using a protocol similar to that of
     `YYSTYPE' and `yylval'. *Note Tracking Locations: Locations.

     This output file is normally essential if you wish to put the
     definition of `yylex' in a separate source file, because `yylex'
     typically needs to be able to refer to the above-mentioned
     declarations and to the token type codes. *Note Semantic Values
     of Tokens: Token Values.

 -- Directive: %destructor
     Specify how the parser should reclaim the memory associated with
     discarded symbols. *Note Freeing Discarded Symbols: Destructor
     Decl.

 -- Directive: %file-prefix="PREFIX"
     Specify a prefix to use for all Bison output file names. The
     names are chosen as if the input file were named `PREFIX.y'.

 -- Directive: %locations
     Generate the code processing the locations (*note Special Features
     for Use in Actions: Action Features.). This mode is enabled as
     soon as the grammar uses the special `@N' tokens, but if your
     grammar does not use them, using `%locations' allows for more
     accurate syntax error messages.

 -- Directive: %name-prefix="PREFIX"
     Rename the external symbols used in the parser so that they start
     with PREFIX instead of `yy'.
     The precise list of symbols renamed
     in C parsers is `yyparse', `yylex', `yyerror', `yynerrs',
     `yylval', `yychar', `yydebug', and (if locations are used)
     `yylloc'. For example, if you use `%name-prefix="c_"', the names
     become `c_parse', `c_lex', and so on. In C++ parsers, it is only
     the surrounding namespace which is named PREFIX instead of `yy'.
     *Note Multiple Parsers in the Same Program: Multiple Parsers.

 -- Directive: %no-parser
     Do not include any C code in the parser file; generate tables
     only. The parser file contains just `#define' directives and
     static variable declarations.

     This option also tells Bison to write the C code for the grammar
     actions into a file named `FILE.act', in the form of a
     brace-surrounded body fit for a `switch' statement.

 -- Directive: %no-lines
     Don't generate any `#line' preprocessor commands in the parser
     file. Ordinarily Bison writes these commands in the parser file
     so that the C compiler and debuggers will associate errors and
     object code with your source file (the grammar file). This
     directive causes them to associate errors with the parser file,
     treating it as an independent source file in its own right.

 -- Directive: %output="FILE"
     Specify FILE for the parser file.

 -- Directive: %pure-parser
     Request a pure (reentrant) parser program (*note A Pure
     (Reentrant) Parser: Pure Decl.).

 -- Directive: %require "VERSION"
     Require version VERSION or higher of Bison. *Note Require a
     Version of Bison: Require Decl.

 -- Directive: %token-table
     Generate an array of token names in the parser file. The name of
     the array is `yytname'; `yytname[I]' is the name of the token
     whose internal Bison token code number is I.
     The first three
     elements of `yytname' correspond to the predefined tokens `"$end"',
     `"error"', and `"$undefined"'; after these come the symbols
     defined in the grammar file.

     The name in the table includes all the characters needed to
     represent the token in Bison. For single-character literals and
     literal strings, this includes the surrounding quoting characters
     and any escape sequences. For example, the Bison single-character
     literal `'+'' corresponds to a three-character name, represented
     in C as `"'+'"'; and the Bison two-character literal string `"\\/"'
     corresponds to a five-character name, represented in C as
     `"\"\\\\/\""'.

     When you specify `%token-table', Bison also generates macro
     definitions for the macros `YYNTOKENS', `YYNNTS', `YYNRULES', and
     `YYNSTATES':

    `YYNTOKENS'
          The highest token number, plus one.

    `YYNNTS'
          The number of nonterminal symbols.

    `YYNRULES'
          The number of grammar rules.

    `YYNSTATES'
          The number of parser states (*note Parser States::).

 -- Directive: %verbose
     Write an extra output file containing verbose descriptions of the
     parser states and what is done for each type of look-ahead token in
     that state. *Note Understanding Your Parser: Understanding, for
     more information.

 -- Directive: %yacc
     Pretend the option `--yacc' was given, i.e., imitate Yacc,
     including its naming conventions. *Note Bison Options::, for more.


File: bison.info, Node: Multiple Parsers, Prev: Declarations, Up: Grammar File

3.8 Multiple Parsers in the Same Program
========================================

Most programs that use Bison parse only one language and therefore
contain only one Bison parser. But what if you want to parse more than
one language with the same program?
Then you need to avoid a name
conflict between different definitions of `yyparse', `yylval', and so
on.

   The easy way to do this is to use the option `-p PREFIX' (*note
Invoking Bison: Invocation.). This renames the interface functions and
variables of the Bison parser to start with PREFIX instead of `yy'.
You can use this to give each parser distinct names that do not
conflict.

   The precise list of symbols renamed is `yyparse', `yylex',
`yyerror', `yynerrs', `yylval', `yylloc', `yychar' and `yydebug'. For
example, if you use `-p c', the names become `cparse', `clex', and so
on.

   *All the other variables and macros associated with Bison are not
renamed.* These others are not global; there is no conflict if the same
name is used in different parsers. For example, `YYSTYPE' is not
renamed, but defining this in different ways in different parsers causes
no trouble (*note Data Types of Semantic Values: Value Type.).

   The `-p' option works by adding macro definitions to the beginning
of the parser source file, defining `yyparse' as `PREFIXparse', and so
on. This effectively substitutes one name for the other in the entire
parser file.


File: bison.info, Node: Interface, Next: Algorithm, Prev: Grammar File, Up: Top

4 Parser C-Language Interface
*****************************

The Bison parser is actually a C function named `yyparse'. Here we
describe the interface conventions of `yyparse' and the other functions
that it needs to use.

   Keep in mind that the parser uses many C identifiers starting with
`yy' and `YY' for internal purposes. If you use such an identifier
(aside from those in this manual) in an action or in the epilogue of
the grammar file, you are likely to run into trouble.

* Menu:

* Parser Function::     How to call `yyparse' and what it returns.
4171 * Lexical:: You must supply a function `yylex' 4172 which reads tokens. 4173 * Error Reporting:: You must supply a function `yyerror'. 4174 * Action Features:: Special features for use in actions. 4175 * Internationalization:: How to let the parser speak in the user's 4176 native language. 4177 4178 4179 File: bison.info, Node: Parser Function, Next: Lexical, Up: Interface 4180 4181 4.1 The Parser Function `yyparse' 4182 ================================= 4183 4184 You call the function `yyparse' to cause parsing to occur. This 4185 function reads tokens, executes actions, and ultimately returns when it 4186 encounters end-of-input or an unrecoverable syntax error. You can also 4187 write an action which directs `yyparse' to return immediately without 4188 reading further. 4189 4190 -- Function: int yyparse (void) 4191 The value returned by `yyparse' is 0 if parsing was successful 4192 (return is due to end-of-input). 4193 4194 The value is 1 if parsing failed because of invalid input, i.e., 4195 input that contains a syntax error or that causes `YYABORT' to be 4196 invoked. 4197 4198 The value is 2 if parsing failed due to memory exhaustion. 4199 4200 In an action, you can cause immediate return from `yyparse' by using 4201 these macros: 4202 4203 -- Macro: YYACCEPT 4204 Return immediately with value 0 (to report success). 4205 4206 -- Macro: YYABORT 4207 Return immediately with value 1 (to report failure). 4208 4209 If you use a reentrant parser, you can optionally pass additional 4210 parameter information to it in a reentrant way. To do so, use the 4211 declaration `%parse-param': 4212 4213 -- Directive: %parse-param {ARGUMENT-DECLARATION} 4214 Declare that an argument declared by the braced-code 4215 ARGUMENT-DECLARATION is an additional `yyparse' argument. The 4216 ARGUMENT-DECLARATION is used when declaring functions or 4217 prototypes. The last identifier in ARGUMENT-DECLARATION must be 4218 the argument name. 4219 4220 Here's an example. 
Write this in the parser: 4221 4222 %parse-param {int *nastiness} 4223 %parse-param {int *randomness} 4224 4225 Then call the parser like this: 4226 4227 { 4228 int nastiness, randomness; 4229 ... /* Store proper data in `nastiness' and `randomness'. */ 4230 value = yyparse (&nastiness, &randomness); 4231 ... 4232 } 4233 4234 In the grammar actions, use expressions like this to refer to the data: 4235 4236 exp: ... { ...; *randomness += 1; ... } 4237 4238 4239 File: bison.info, Node: Lexical, Next: Error Reporting, Prev: Parser Function, Up: Interface 4240 4241 4.2 The Lexical Analyzer Function `yylex' 4242 ========================================= 4243 4244 The "lexical analyzer" function, `yylex', recognizes tokens from the 4245 input stream and returns them to the parser. Bison does not create 4246 this function automatically; you must write it so that `yyparse' can 4247 call it. The function is sometimes referred to as a lexical scanner. 4248 4249 In simple programs, `yylex' is often defined at the end of the Bison 4250 grammar file. If `yylex' is defined in a separate source file, you 4251 need to arrange for the token-type macro definitions to be available 4252 there. To do this, use the `-d' option when you run Bison, so that it 4253 will write these macro definitions into a separate header file 4254 `NAME.tab.h' which you can include in the other source files that need 4255 it. *Note Invoking Bison: Invocation. 4256 4257 * Menu: 4258 4259 * Calling Convention:: How `yyparse' calls `yylex'. 4260 * Token Values:: How `yylex' must return the semantic value 4261 of the token it has read. 4262 * Token Locations:: How `yylex' must return the text location 4263 (line number, etc.) of the token, if the 4264 actions want that. 4265 * Pure Calling:: How the calling convention differs 4266 in a pure parser (*note A Pure (Reentrant) Parser: Pure Decl.). 
4267 4268 4269 File: bison.info, Node: Calling Convention, Next: Token Values, Up: Lexical 4270 4271 4.2.1 Calling Convention for `yylex' 4272 ------------------------------------ 4273 4274 The value that `yylex' returns must be the positive numeric code for 4275 the type of token it has just found; a zero or negative value signifies 4276 end-of-input. 4277 4278 When a token is referred to in the grammar rules by a name, that name 4279 in the parser file becomes a C macro whose definition is the proper 4280 numeric code for that token type. So `yylex' can use the name to 4281 indicate that type. *Note Symbols::. 4282 4283 When a token is referred to in the grammar rules by a character 4284 literal, the numeric code for that character is also the code for the 4285 token type. So `yylex' can simply return that character code, possibly 4286 converted to `unsigned char' to avoid sign-extension. The null 4287 character must not be used this way, because its code is zero and that 4288 signifies end-of-input. 4289 4290 Here is an example showing these things: 4291 4292 int 4293 yylex (void) 4294 { 4295 ... 4296 if (c == EOF) /* Detect end-of-input. */ 4297 return 0; 4298 ... 4299 if (c == '+' || c == '-') 4300 return c; /* Assume token type for `+' is '+'. */ 4301 ... 4302 return INT; /* Return the type of the token. */ 4303 ... 4304 } 4305 4306 This interface has been designed so that the output from the `lex' 4307 utility can be used without change as the definition of `yylex'. 4308 4309 If the grammar uses literal string tokens, there are two ways that 4310 `yylex' can determine the token type codes for them: 4311 4312 * If the grammar defines symbolic token names as aliases for the 4313 literal string tokens, `yylex' can use these symbolic names like 4314 all others. In this case, the use of the literal string tokens in 4315 the grammar file has no effect on `yylex'. 4316 4317 * `yylex' can find the multicharacter token in the `yytname' table. 
4318 The index of the token in the table is the token type's code. The 4319 name of a multicharacter token is recorded in `yytname' with a 4320 double-quote, the token's characters, and another double-quote. 4321 The token's characters are escaped as necessary to be suitable as 4322 input to Bison. 4323 4324 Here's code for looking up a multicharacter token in `yytname', 4325 assuming that the characters of the token are stored in 4326 `token_buffer', and assuming that the token does not contain any 4327 characters like `"' that require escaping. 4328 4329 for (i = 0; i < YYNTOKENS; i++) 4330 { 4331 if (yytname[i] != 0 4332 && yytname[i][0] == '"' 4333 && ! strncmp (yytname[i] + 1, token_buffer, 4334 strlen (token_buffer)) 4335 && yytname[i][strlen (token_buffer) + 1] == '"' 4336 && yytname[i][strlen (token_buffer) + 2] == 0) 4337 break; 4338 } 4339 4340 The `yytname' table is generated only if you use the 4341 `%token-table' declaration. *Note Decl Summary::. 4342 4343 4344 File: bison.info, Node: Token Values, Next: Token Locations, Prev: Calling Convention, Up: Lexical 4345 4346 4.2.2 Semantic Values of Tokens 4347 ------------------------------- 4348 4349 In an ordinary (nonreentrant) parser, the semantic value of the token 4350 must be stored into the global variable `yylval'. When you are using 4351 just one data type for semantic values, `yylval' has that type. Thus, 4352 if the type is `int' (the default), you might write this in `yylex': 4353 4354 ... 4355 yylval = value; /* Put value onto Bison stack. */ 4356 return INT; /* Return the type of the token. */ 4357 ... 4358 4359 When you are using multiple data types, `yylval''s type is a union 4360 made from the `%union' declaration (*note The Collection of Value 4361 Types: Union Decl.). So when you store a token's value, you must use 4362 the proper member of the union. 
If the `%union' declaration looks like 4363 this: 4364 4365 %union { 4366 int intval; 4367 double val; 4368 symrec *tptr; 4369 } 4370 4371 then the code in `yylex' might look like this: 4372 4373 ... 4374 yylval.intval = value; /* Put value onto Bison stack. */ 4375 return INT; /* Return the type of the token. */ 4376 ... 4377 4378 4379 File: bison.info, Node: Token Locations, Next: Pure Calling, Prev: Token Values, Up: Lexical 4380 4381 4.2.3 Textual Locations of Tokens 4382 --------------------------------- 4383 4384 If you are using the `@N'-feature (*note Tracking Locations: 4385 Locations.) in actions to keep track of the textual locations of tokens 4386 and groupings, then you must provide this information in `yylex'. The 4387 function `yyparse' expects to find the textual location of a token just 4388 parsed in the global variable `yylloc'. So `yylex' must store the 4389 proper data in that variable. 4390 4391 By default, the value of `yylloc' is a structure and you need only 4392 initialize the members that are going to be used by the actions. The 4393 four members are called `first_line', `first_column', `last_line' and 4394 `last_column'. Note that the use of this feature makes the parser 4395 noticeably slower. 4396 4397 The data type of `yylloc' has the name `YYLTYPE'. 4398 4399 4400 File: bison.info, Node: Pure Calling, Prev: Token Locations, Up: Lexical 4401 4402 4.2.4 Calling Conventions for Pure Parsers 4403 ------------------------------------------ 4404 4405 When you use the Bison declaration `%pure-parser' to request a pure, 4406 reentrant parser, the global communication variables `yylval' and 4407 `yylloc' cannot be used. (*Note A Pure (Reentrant) Parser: Pure Decl.) 4408 In such parsers the two global variables are replaced by pointers 4409 passed as arguments to `yylex'. You must declare them as shown here, 4410 and pass the information back by storing it through those pointers. 

     int
     yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
     {
       ...
       *lvalp = value;  /* Put value onto Bison stack. */
       return INT;      /* Return the type of the token. */
       ...
     }

   If the grammar file does not use the `@' constructs to refer to
textual locations, then the type `YYLTYPE' will not be defined. In
this case, omit the second argument; `yylex' will be called with only
one argument.

   If you wish to pass the additional parameter data to `yylex', use
`%lex-param' just like `%parse-param' (*note Parser Function::).

 -- Directive: %lex-param {ARGUMENT-DECLARATION}
     Declare that the braced-code ARGUMENT-DECLARATION is an additional
     `yylex' argument declaration.

   For instance:

     %parse-param {int *nastiness}
     %lex-param {int *nastiness}
     %parse-param {int *randomness}

results in the following signatures:

     int yylex (int *nastiness);
     int yyparse (int *nastiness, int *randomness);

   If `%pure-parser' is added:

     int yylex (YYSTYPE *lvalp, int *nastiness);
     int yyparse (int *nastiness, int *randomness);

and finally, if both `%pure-parser' and `%locations' are used:

     int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness);
     int yyparse (int *nastiness, int *randomness);


File: bison.info, Node: Error Reporting, Next: Action Features, Prev: Lexical, Up: Interface

4.3 The Error Reporting Function `yyerror'
==========================================

The Bison parser detects a "syntax error" or "parse error" whenever it
reads a token which cannot satisfy any syntax rule. An action in the
grammar can also explicitly proclaim an error, using the macro
`YYERROR' (*note Special Features for Use in Actions: Action Features.).

   The Bison parser expects to report the error by calling an error
reporting function named `yyerror', which you must supply. It is
called by `yyparse' whenever a syntax error is found, and it receives
one argument, the message string. For a syntax error, the string is
normally `"syntax error"'.

   If you invoke the directive `%error-verbose' in the Bison
declarations section (*note The Bison Declarations Section: Bison
Declarations.), then Bison provides a more verbose and specific error
message string instead of just plain `"syntax error"'.

   The parser can detect one other kind of error: memory exhaustion.
This can happen when the input contains constructions that are very
deeply nested. It isn't likely you will encounter this, since the Bison
parser normally extends its stack automatically up to a very large
limit. But if memory is exhausted, `yyparse' calls `yyerror' in the
usual fashion, except that the argument string is `"memory exhausted"'.

   In some cases diagnostics like `"syntax error"' are translated
automatically from English to some other language before they are
passed to `yyerror'. *Note Internationalization::.

   The following definition suffices in simple programs:

     void
     yyerror (char const *s)
     {
       fprintf (stderr, "%s\n", s);
     }

   After `yyerror' returns to `yyparse', the latter will attempt error
recovery if you have written suitable error recovery grammar rules
(*note Error Recovery::). If recovery is impossible, `yyparse' will
immediately return 1.

   Obviously, in location tracking pure parsers, `yyerror' should have
access to the current location. This is indeed the case for the GLR
parsers, but not for the Yacc parser, for historical reasons.
I.e., if 4503 `%locations %pure-parser' is passed then the prototypes for `yyerror' 4504 are: 4505 4506 void yyerror (char const *msg); /* Yacc parsers. */ 4507 void yyerror (YYLTYPE *locp, char const *msg); /* GLR parsers. */ 4508 4509 If `%parse-param {int *nastiness}' is used, then: 4510 4511 void yyerror (int *nastiness, char const *msg); /* Yacc parsers. */ 4512 void yyerror (int *nastiness, char const *msg); /* GLR parsers. */ 4513 4514 Finally, GLR and Yacc parsers share the same `yyerror' calling 4515 convention for absolutely pure parsers, i.e., when the calling 4516 convention of `yylex' _and_ the calling convention of `%pure-parser' 4517 are pure. I.e.: 4518 4519 /* Location tracking. */ 4520 %locations 4521 /* Pure yylex. */ 4522 %pure-parser 4523 %lex-param {int *nastiness} 4524 /* Pure yyparse. */ 4525 %parse-param {int *nastiness} 4526 %parse-param {int *randomness} 4527 4528 results in the following signatures for all the parser kinds: 4529 4530 int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness); 4531 int yyparse (int *nastiness, int *randomness); 4532 void yyerror (YYLTYPE *locp, 4533 int *nastiness, int *randomness, 4534 char const *msg); 4535 4536 The prototypes are only indications of how the code produced by Bison 4537 uses `yyerror'. Bison-generated code always ignores the returned 4538 value, so `yyerror' can return any type, including `void'. Also, 4539 `yyerror' can be a variadic function; that is why the message is always 4540 passed last. 4541 4542 Traditionally `yyerror' returns an `int' that is always ignored, but 4543 this is purely for historical reasons, and `void' is preferable since 4544 it more accurately describes the return type for `yyerror'. 4545 4546 The variable `yynerrs' contains the number of syntax errors reported 4547 so far. Normally this variable is global; but if you request a pure 4548 parser (*note A Pure (Reentrant) Parser: Pure Decl.) then it is a 4549 local variable which only the actions can access. 
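
   As an illustration of the fully pure `yyerror' signature shown earlier in this section, here is a minimal sketch of an error reporter that prefixes the message with the position of the error. It assumes the default four-member location structure; the `YYLTYPE' typedef below is only a stand-in so that the fragment compiles on its own (in a real parser the definition comes from Bison). The helper `format_error' and the use of `nastiness' as an error counter are hypothetical and not part of Bison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the default location type, assumed for this sketch. */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

/* Hypothetical helper: write "LINE.COLUMN: MSG" into BUF and return
   the value returned by snprintf. */
static int
format_error (char *buf, size_t size, YYLTYPE const *locp, char const *msg)
{
  assert (locp != NULL && msg != NULL);
  return snprintf (buf, size, "%d.%d: %s",
                   locp->first_line, locp->first_column, msg);
}

/* Error reporter for %locations, %pure-parser and two %parse-param
   arguments; the message comes last, as in the prototypes above. */
void
yyerror (YYLTYPE *locp, int *nastiness, int *randomness, char const *msg)
{
  char buf[256];
  (void) randomness;            /* Unused in this sketch. */
  format_error (buf, sizeof buf, locp, msg);
  fprintf (stderr, "%s\n", buf);
  if (nastiness != NULL)
    ++*nastiness;               /* Hypothetical use: count reported errors. */
}
```

   With a location whose first line is 3 and first column is 7, the message `syntax error' is reported as `3.7: syntax error'.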
4550 4551 4552 File: bison.info, Node: Action Features, Next: Internationalization, Prev: Error Reporting, Up: Interface 4553 4554 4.4 Special Features for Use in Actions 4555 ======================================= 4556 4557 Here is a table of Bison constructs, variables and macros that are 4558 useful in actions. 4559 4560 -- Variable: $$ 4561 Acts like a variable that contains the semantic value for the 4562 grouping made by the current rule. *Note Actions::. 4563 4564 -- Variable: $N 4565 Acts like a variable that contains the semantic value for the Nth 4566 component of the current rule. *Note Actions::. 4567 4568 -- Variable: $<TYPEALT>$ 4569 Like `$$' but specifies alternative TYPEALT in the union specified 4570 by the `%union' declaration. *Note Data Types of Values in 4571 Actions: Action Types. 4572 4573 -- Variable: $<TYPEALT>N 4574 Like `$N' but specifies alternative TYPEALT in the union specified 4575 by the `%union' declaration. *Note Data Types of Values in 4576 Actions: Action Types. 4577 4578 -- Macro: YYABORT; 4579 Return immediately from `yyparse', indicating failure. *Note The 4580 Parser Function `yyparse': Parser Function. 4581 4582 -- Macro: YYACCEPT; 4583 Return immediately from `yyparse', indicating success. *Note The 4584 Parser Function `yyparse': Parser Function. 4585 4586 -- Macro: YYBACKUP (TOKEN, VALUE); 4587 Unshift a token. This macro is allowed only for rules that reduce 4588 a single value, and only when there is no look-ahead token. It is 4589 also disallowed in GLR parsers. It installs a look-ahead token 4590 with token type TOKEN and semantic value VALUE; then it discards 4591 the value that was going to be reduced by this rule. 4592 4593 If the macro is used when it is not valid, such as when there is a 4594 look-ahead token already, then it reports a syntax error with a 4595 message `cannot back up' and performs ordinary error recovery. 4596 4597 In either case, the rest of the action is not executed. 
4598 4599 -- Macro: YYEMPTY 4600 Value stored in `yychar' when there is no look-ahead token. 4601 4602 -- Macro: YYEOF 4603 Value stored in `yychar' when the look-ahead is the end of the 4604 input stream. 4605 4606 -- Macro: YYERROR; 4607 Cause an immediate syntax error. This statement initiates error 4608 recovery just as if the parser itself had detected an error; 4609 however, it does not call `yyerror', and does not print any 4610 message. If you want to print an error message, call `yyerror' 4611 explicitly before the `YYERROR;' statement. *Note Error 4612 Recovery::. 4613 4614 -- Macro: YYRECOVERING 4615 The expression `YYRECOVERING ()' yields 1 when the parser is 4616 recovering from a syntax error, and 0 otherwise. *Note Error 4617 Recovery::. 4618 4619 -- Variable: yychar 4620 Variable containing either the look-ahead token, or `YYEOF' when 4621 the look-ahead is the end of the input stream, or `YYEMPTY' when 4622 no look-ahead has been performed so the next token is not yet 4623 known. Do not modify `yychar' in a deferred semantic action 4624 (*note GLR Semantic Actions::). *Note Look-Ahead Tokens: 4625 Look-Ahead. 4626 4627 -- Macro: yyclearin; 4628 Discard the current look-ahead token. This is useful primarily in 4629 error rules. Do not invoke `yyclearin' in a deferred semantic 4630 action (*note GLR Semantic Actions::). *Note Error Recovery::. 4631 4632 -- Macro: yyerrok; 4633 Resume generating error messages immediately for subsequent syntax 4634 errors. This is useful primarily in error rules. *Note Error 4635 Recovery::. 4636 4637 -- Variable: yylloc 4638 Variable containing the look-ahead token location when `yychar' is 4639 not set to `YYEMPTY' or `YYEOF'. Do not modify `yylloc' in a 4640 deferred semantic action (*note GLR Semantic Actions::). *Note 4641 Actions and Locations: Actions and Locations. 4642 4643 -- Variable: yylval 4644 Variable containing the look-ahead token semantic value when 4645 `yychar' is not set to `YYEMPTY' or `YYEOF'. 
     Do not modify `yylval' in a deferred semantic action (*note GLR
     Semantic Actions::). *Note Actions: Actions.

 -- Value: @$
     Acts like a structure variable containing information on the
     textual location of the grouping made by the current rule. *Note
     Tracking Locations: Locations.


 -- Value: @N
     Acts like a structure variable containing information on the
     textual location of the Nth component of the current rule. *Note
     Tracking Locations: Locations.


File: bison.info, Node: Internationalization, Prev: Action Features, Up: Interface

4.5 Parser Internationalization
===============================

A Bison-generated parser can print diagnostics, including error and
tracing messages. By default, they appear in English. However, Bison
also supports outputting diagnostics in the user's native language. To
make this work, the user should set the usual environment variables.
*Note The User's View: (gettext)Users. For example, the shell command
`export LC_ALL=fr_CA.UTF-8' might set the user's locale to French
Canadian using the UTF-8 encoding. The exact set of available locales
depends on the user's installation.

   The maintainer of a package that uses a Bison-generated parser
enables the internationalization of the parser's output through the
following steps. Here we assume a package that uses GNU Autoconf and
GNU Automake.

  1. Into the directory containing the GNU Autoconf macros used by the
     package--often called `m4'--copy the `bison-i18n.m4' file
     installed by Bison under `share/aclocal/bison-i18n.m4' in Bison's
     installation directory. For example:

          cp /usr/local/share/aclocal/bison-i18n.m4 m4/bison-i18n.m4

  2. In the top-level `configure.ac', after the `AM_GNU_GETTEXT'
     invocation, add an invocation of `BISON_I18N'.
     This macro is defined in the file `bison-i18n.m4' that you copied
     earlier. It causes `configure' to find the value of the
     `BISON_LOCALEDIR' variable, and it defines the source-language
     symbol `YYENABLE_NLS' to enable translations in the
     Bison-generated parser.

  3. In the `main' function of your program, designate the directory
     containing Bison's runtime message catalog, through a call to
     `bindtextdomain' with domain name `bison-runtime'. For example:

          bindtextdomain ("bison-runtime", BISON_LOCALEDIR);

     Typically this appears after any other call `bindtextdomain
     (PACKAGE, LOCALEDIR)' that your package already has. Here we rely
     on `BISON_LOCALEDIR' to be defined as a string through the
     `Makefile'.

  4. In the `Makefile.am' that controls the compilation of the `main'
     function, make `BISON_LOCALEDIR' available as a C preprocessor
     macro, either in `DEFS' or in `AM_CPPFLAGS'. For example:

          DEFS = @DEFS@ -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"'

     or:

          AM_CPPFLAGS = -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"'

  5. Finally, invoke the command `autoreconf' to generate the build
     infrastructure.


File: bison.info, Node: Algorithm, Next: Error Recovery, Prev: Interface, Up: Top

5 The Bison Parser Algorithm
****************************

As Bison reads tokens, it pushes them onto a stack along with their
semantic values. The stack is called the "parser stack". Pushing a
token is traditionally called "shifting".

   For example, suppose the infix calculator has read `1 + 5 *', with a
`3' to come. The stack will have four elements, one for each token
that was shifted.

   But the stack does not always have an element for each token read.
When the last N tokens and groupings shifted match the components of a
grammar rule, they can be combined according to that rule.
This is called "reduction". Those tokens and groupings are replaced on
the stack by a single grouping whose symbol is the result (left hand
side) of that rule. Running the rule's action is part of the process of
reduction, because this is what computes the semantic value of the
resulting grouping.

   For example, if the infix calculator's parser stack contains this:

     1 + 5 * 3

and the next input token is a newline character, then the last three
elements can be reduced to 15 via the rule:

     expr: expr '*' expr;

Then the stack contains just these three elements:

     1 + 15

At this point, another reduction can be made, resulting in the single
value 16. Then the newline token can be shifted.

   The parser tries, by shifts and reductions, to reduce the entire
input down to a single grouping whose symbol is the grammar's
start-symbol (*note Languages and Context-Free Grammars: Language and
Grammar.).

   This kind of parser is known in the literature as a bottom-up parser.

* Menu:

* Look-Ahead::        Parser looks one token ahead when deciding what to do.
* Shift/Reduce::      Conflicts: when either shifting or reduction is valid.
* Precedence::        Operator precedence works by resolving conflicts.
* Contextual Precedence::  When an operator's precedence depends on context.
* Parser States::     The parser is a finite-state-machine with stack.
* Reduce/Reduce::     When two rules are applicable in the same situation.
* Mystery Conflicts::  Reduce/reduce conflicts that look unjustified.
* Generalized LR Parsing::  Parsing arbitrary context-free grammars.
* Memory Management::  What happens when memory is exhausted. How to avoid it.
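
   The shifts and reductions described in the introduction to this
chapter can be sketched in C. This is only an illustration: the types
`struct elem' and `struct stack' and the functions `stack_push',
`stack_reduce_binary' and `replay_example' are hypothetical names
invented here. A parser generated by Bison stores parser states on its
stack and is organized quite differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature parser stack; none of these names come
   from Bison's generated code.  */
enum kind { NUM, OP };

struct elem
{
  enum kind kind;
  int value;   /* semantic value when kind == NUM */
  char op;     /* operator character when kind == OP */
};

struct stack
{
  struct elem items[16];
  size_t depth;
};

/* Push one element; shifting a token amounts to this.  */
void
stack_push (struct stack *s, struct elem e)
{
  assert (s->depth < 16);
  s->items[s->depth++] = e;
}

/* Reduce by `expr: expr OP expr': replace the top three elements
   with one grouping that carries the computed semantic value.
   Only '*' and '+' are handled in this sketch.  */
void
stack_reduce_binary (struct stack *s)
{
  assert (s->depth >= 3);
  struct elem rhs = s->items[s->depth - 1];
  struct elem op = s->items[s->depth - 2];
  struct elem lhs = s->items[s->depth - 3];
  assert (lhs.kind == NUM && op.kind == OP && rhs.kind == NUM);
  struct elem result = { NUM, 0, 0 };
  result.value = (op.op == '*'
                  ? lhs.value * rhs.value
                  : lhs.value + rhs.value);
  s->depth -= 3;
  stack_push (s, result);
}

/* Replay the example from the text, starting from the stack
   contents 1 + 5 * 3, and return the final semantic value.  */
int
replay_example (void)
{
  struct stack s = { .depth = 0 };
  stack_push (&s, (struct elem) { NUM, 1, 0 });
  stack_push (&s, (struct elem) { OP, 0, '+' });
  stack_push (&s, (struct elem) { NUM, 5, 0 });
  stack_push (&s, (struct elem) { OP, 0, '*' });
  stack_push (&s, (struct elem) { NUM, 3, 0 });
  stack_reduce_binary (&s);   /* stack is now 1 + 15 */
  stack_reduce_binary (&s);   /* stack is now the single value 16 */
  return s.items[0].value;
}
```

   The sketch always reduces the top three elements; deciding when a
reduction is appropriate is the subject of the sections listed in the
menu above.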


File: bison.info, Node: Look-Ahead, Next: Shift/Reduce, Up: Algorithm

5.1 Look-Ahead Tokens
=====================

The Bison parser does _not_ always reduce immediately as soon as the
last N tokens and groupings match a rule. This is because such a
simple strategy is inadequate to handle most languages. Instead, when a
reduction is possible, the parser sometimes "looks ahead" at the next
token in order to decide what to do.

   When a token is read, it is not immediately shifted; first it
becomes the "look-ahead token", which is not on the stack. Now the
parser can perform one or more reductions of tokens and groupings on
the stack, while the look-ahead token remains off to the side. When no
more reductions should take place, the look-ahead token is shifted onto
the stack. This does not mean that all possible reductions have been
done; depending on the token type of the look-ahead token, some rules
may choose to delay their application.

   Here is a simple case where look-ahead is needed. These three rules
define expressions which contain binary addition operators and postfix
unary factorial operators (`!'), and allow parentheses for grouping.

     expr:     term '+' expr
             | term
             ;

     term:     '(' expr ')'
             | term '!'
             | NUMBER
             ;

   Suppose that the tokens `1 + 2' have been read and shifted; what
should be done? If the following token is `)', then the first three
tokens must be reduced to form an `expr'. This is the only valid
course, because shifting the `)' would produce a sequence of symbols
`term ')'', and no rule allows this.

   If the following token is `!', then it must be shifted immediately so
that `2 !' can be reduced to make a `term'. If instead the parser were
to reduce before shifting, `1 + 2' would become an `expr'. It would
then be impossible to shift the `!'
because doing so would produce on the stack the sequence of symbols
`expr '!''. No rule allows that sequence.

   The look-ahead token is stored in the variable `yychar'. Its
semantic value and location, if any, are stored in the variables
`yylval' and `yylloc'. *Note Special Features for Use in Actions:
Action Features.


File: bison.info, Node: Shift/Reduce, Next: Precedence, Prev: Look-Ahead, Up: Algorithm

5.2 Shift/Reduce Conflicts
==========================

Suppose we are parsing a language which has if-then and if-then-else
statements, with a pair of rules like this:

     if_stmt:
               IF expr THEN stmt
             | IF expr THEN stmt ELSE stmt
             ;

Here we assume that `IF', `THEN' and `ELSE' are terminal symbols for
specific keyword tokens.

   When the `ELSE' token is read and becomes the look-ahead token, the
contents of the stack (assuming the input is valid) are just right for
reduction by the first rule. But it is also legitimate to shift the
`ELSE', because that would lead to eventual reduction by the second
rule.

   This situation, where either a shift or a reduction would be valid,
is called a "shift/reduce conflict". Bison is designed to resolve
these conflicts by choosing to shift, unless otherwise directed by
operator precedence declarations. To see the reason for this, let's
contrast it with the other alternative.

   Since the parser prefers to shift the `ELSE', the result is to attach
the else-clause to the innermost if-statement, making these two inputs
equivalent:

     if x then if y then win (); else lose;

     if x then do; if y then win (); else lose; end;

   But if the parser chose to reduce when possible rather than shift,
the result would be to attach the else-clause to the outermost
if-statement, making these two inputs equivalent:

     if x then if y then win (); else lose;

     if x then do; if y then win (); end; else lose;

   The conflict exists because the grammar as written is ambiguous:
either parsing of the simple nested if-statement is legitimate. The
established convention is that these ambiguities are resolved by
attaching the else-clause to the innermost if-statement; this is what
Bison accomplishes by choosing to shift rather than reduce. (It would
ideally be cleaner to write an unambiguous grammar, but that is very
hard to do in this case.) This particular ambiguity was first
encountered in the specifications of Algol 60 and is called the
"dangling `else'" ambiguity.

   To avoid warnings from Bison about predictable, legitimate
shift/reduce conflicts, use the `%expect N' declaration. There will be
no warning as long as the number of shift/reduce conflicts is exactly N.
*Note Suppressing Conflict Warnings: Expect Decl.

   The definition of `if_stmt' above is solely to blame for the
conflict, but the conflict does not actually appear without additional
rules.
Here is a complete Bison input file that actually manifests the
conflict:

     %token IF THEN ELSE variable
     %%
     stmt:     expr
             | if_stmt
             ;

     if_stmt:
               IF expr THEN stmt
             | IF expr THEN stmt ELSE stmt
             ;

     expr:     variable
             ;


File: bison.info, Node: Precedence, Next: Contextual Precedence, Prev: Shift/Reduce, Up: Algorithm

5.3 Operator Precedence
=======================

Another situation where shift/reduce conflicts appear is in arithmetic
expressions. Here shifting is not always the preferred resolution; the
Bison declarations for operator precedence allow you to specify when to
shift and when to reduce.

* Menu:

* Why Precedence::    An example showing why precedence is needed.
* Using Precedence::  How to specify precedence in Bison grammars.
* Precedence Examples::  How these features are used in the previous example.
* How Precedence::    How they work.


File: bison.info, Node: Why Precedence, Next: Using Precedence, Up: Precedence

5.3.1 When Precedence is Needed
-------------------------------

Consider the following ambiguous grammar fragment (ambiguous because the
input `1 - 2 * 3' can be parsed in two different ways):

     expr:     expr '-' expr
             | expr '*' expr
             | expr '<' expr
             | '(' expr ')'
             ...
             ;

Suppose the parser has seen the tokens `1', `-' and `2'; should it
reduce them via the rule for the subtraction operator? It depends on
the next token. Of course, if the next token is `)', we must reduce;
shifting is invalid because no single rule can reduce the token
sequence `- 2 )' or anything starting with that. But if the next token
is `*' or `<', we have a choice: either shifting or reduction would
allow the parse to complete, but with different results.
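
   The difference between the two choices can be made concrete with a
small C sketch. It is not Bison output; the function names `apply',
`reduce_first' and `shift_first' are invented for this illustration.
Each function computes the value that results when the parser has seen
`A - B' and then meets the operator OP followed by C.

```c
#include <assert.h>

/* Apply one binary operator of the example grammar.  */
int
apply (int lhs, char op, int rhs)
{
  switch (op)
    {
    case '-':
      return lhs - rhs;
    case '*':
      return lhs * rhs;
    case '<':
      return lhs < rhs;
    default:
      assert (!"unsupported operator");
      return 0;
    }
}

/* The parser reduces `a - b' before shifting OP,
   so the grouping is (a - b) OP c.  */
int
reduce_first (int a, int b, char op, int c)
{
  return apply (apply (a, '-', b), op, c);
}

/* The parser shifts OP, so `b OP c' is reduced before the
   subtraction and the grouping is a - (b OP c).  */
int
shift_first (int a, int b, char op, int c)
{
  return apply (a, '-', apply (b, op, c));
}
```

   For the input `1 - 2 * 3', `shift_first (1, 2, '*', 3)' yields -5
and `reduce_first (1, 2, '*', 3)' yields -3. For `1 - 2 < 3' the two
functions yield 0 and 1 respectively, so the choice matters for both
operators.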

   To decide which one Bison should do, we must consider the results.
If the next operator token OP is shifted, then it must be reduced first
in order to permit another opportunity to reduce the difference. The
result is (in effect) `1 - (2 OP 3)'. On the other hand, if the
subtraction is reduced before shifting OP, the result is
`(1 - 2) OP 3'. Clearly, then, the choice of shift or reduce should
depend on the relative precedence of the operators `-' and OP: `*'
should be shifted first, but not `<'.

   What about input such as `1 - 2 - 5'; should this be `(1 - 2) - 5'
or should it be `1 - (2 - 5)'? For most operators we prefer the
former, which is called "left association". The latter alternative,
"right association", is desirable for assignment operators. The choice
of left or right association is a matter of whether the parser chooses
to shift or reduce when the stack contains `1 - 2' and the look-ahead
token is `-': shifting yields right association, and reducing yields
left association.


File: bison.info, Node: Using Precedence, Next: Precedence Examples, Prev: Why Precedence, Up: Precedence

5.3.2 Specifying Operator Precedence
------------------------------------

Bison allows you to specify these choices with the operator precedence
declarations `%left' and `%right'. Each such declaration contains a
list of tokens, which are operators whose precedence and associativity
are being declared. The `%left' declaration makes all those operators
left-associative and the `%right' declaration makes them
right-associative. A third alternative is `%nonassoc', which declares
that it is a syntax error to find the same operator twice "in a row".

   The relative precedence of different operators is controlled by the
order in which they are declared.
The first `%left' or `%right' declaration in the file declares the
operators whose precedence is lowest, the next such declaration
declares the operators whose precedence is a little higher, and so on.


File: bison.info, Node: Precedence Examples, Next: How Precedence, Prev: Using Precedence, Up: Precedence

5.3.3 Precedence Examples
-------------------------

In our example, we would want the following declarations:

     %left '<'
     %left '-'
     %left '*'

   In a more complete example, which supports other operators as well,
we would declare them in groups of equal precedence. For example,
`'+'' is declared with `'-'':

     %left '<' '>' '=' NE LE GE
     %left '+' '-'
     %left '*' '/'

(Here `NE' and so on stand for the operators for "not equal" and so on.
We assume that these tokens are more than one character long and
therefore are represented by names, not character literals.)


File: bison.info, Node: How Precedence, Prev: Precedence Examples, Up: Precedence

5.3.4 How Precedence Works
--------------------------

The first effect of the precedence declarations is to assign precedence
levels to the terminal symbols declared. The second effect is to assign
precedence levels to certain rules: each rule gets its precedence from
the last terminal symbol mentioned in the components. (You can also
specify explicitly the precedence of a rule. *Note Context-Dependent
Precedence: Contextual Precedence.)

   Finally, the resolution of conflicts works by comparing the
precedence of the rule being considered with that of the look-ahead
token. If the token's precedence is higher, the choice is to shift.
If the rule's precedence is higher, the choice is to reduce. If they
have equal precedence, the choice is made based on the associativity of
that precedence level.
The verbose output file made by `-v' (*note Invoking Bison:
Invocation.) says how each conflict was resolved.

   Not all rules and not all tokens have precedence. If either the
rule or the look-ahead token has no precedence, then the default is to
shift.


File: bison.info, Node: Contextual Precedence, Next: Parser States, Prev: Precedence, Up: Algorithm

5.4 Context-Dependent Precedence
================================

Often the precedence of an operator depends on the context. This sounds
outlandish at first, but it is really very common. For example, a minus
sign typically has a very high precedence as a unary operator, and a
somewhat lower precedence (lower than multiplication) as a binary
operator.

   The Bison precedence declarations, `%left', `%right' and
`%nonassoc', can only be used once for a given token; so a token has
only one precedence declared in this way. For context-dependent
precedence, you need to use an additional mechanism: the `%prec'
modifier for rules.

   The `%prec' modifier declares the precedence of a particular rule by
specifying a terminal symbol whose precedence should be used for that
rule. It's not necessary for that symbol to appear otherwise in the
rule. The modifier's syntax is:

     %prec TERMINAL-SYMBOL

and it is written after the components of the rule. Its effect is to
assign the rule the precedence of TERMINAL-SYMBOL, overriding the
precedence that would be deduced for it in the ordinary way. The
altered rule precedence then affects how conflicts involving that rule
are resolved (*note Operator Precedence: Precedence.).

   Here is how `%prec' solves the problem of unary minus. First,
declare a precedence for a fictitious terminal symbol named `UMINUS'.
There are no tokens of this type, but the symbol serves to stand for its
precedence:

     ...
     %left '+' '-'
     %left '*'
     %left UMINUS

   Now the precedence of `UMINUS' can be used in specific rules:

     exp:    ...
             | exp '-' exp
             ...
             | '-' exp %prec UMINUS


File: bison.info, Node: Parser States, Next: Reduce/Reduce, Prev: Contextual Precedence, Up: Algorithm

5.5 Parser States
=================

The function `yyparse' is implemented using a finite-state machine.
The values pushed on the parser stack are not simply token type codes;
they represent the entire sequence of terminal and nonterminal symbols
at or near the top of the stack. The current state collects all the
information about previous input which is relevant to deciding what to
do next.

   Each time a look-ahead token is read, the current parser state
together with the type of look-ahead token are looked up in a table.
This table entry can say, "Shift the look-ahead token." In this case,
it also specifies the new parser state, which is pushed onto the top of
the parser stack. Or it can say, "Reduce using rule number N." This
means that a certain number of tokens or groupings are taken off the
top of the stack, and replaced by one grouping. In other words, that
number of states are popped from the stack, and one new state is pushed.

   There is one other alternative: the table can say that the
look-ahead token is erroneous in the current state. This causes error
processing to begin (*note Error Recovery::).


File: bison.info, Node: Reduce/Reduce, Next: Mystery Conflicts, Prev: Parser States, Up: Algorithm

5.6 Reduce/Reduce Conflicts
===========================

A reduce/reduce conflict occurs if there are two or more rules that
apply to the same sequence of input.
This usually indicates a serious error in the grammar.

   For example, here is an erroneous attempt to define a sequence of
zero or more `word' groupings.

     sequence: /* empty */
                     { printf ("empty sequence\n"); }
             | maybeword
             | sequence word
                     { printf ("added word %s\n", $2); }
             ;

     maybeword: /* empty */
                     { printf ("empty maybeword\n"); }
             | word
                     { printf ("single word %s\n", $1); }
             ;

The error is an ambiguity: there is more than one way to parse a single
`word' into a `sequence'. It could be reduced to a `maybeword' and
then into a `sequence' via the second rule. Alternatively,
nothing-at-all could be reduced into a `sequence' via the first rule,
and this could be combined with the `word' using the third rule for
`sequence'.

   There is also more than one way to reduce nothing-at-all into a
`sequence'. This can be done directly via the first rule, or
indirectly via `maybeword' and then the second rule.

   You might think that this is a distinction without a difference,
because it does not change whether any particular input is valid or
not. But it does affect which actions are run. One parsing order runs
the second rule's action; the other runs the first rule's action and
the third rule's action. In this example, the output of the program
changes.

   Bison resolves a reduce/reduce conflict by choosing to use the rule
that appears first in the grammar, but it is very risky to rely on
this. Every reduce/reduce conflict must be studied and usually
eliminated.
Here is the proper way to define `sequence':

     sequence: /* empty */
                     { printf ("empty sequence\n"); }
             | sequence word
                     { printf ("added word %s\n", $2); }
             ;

   Here is another common error that yields a reduce/reduce conflict:

     sequence: /* empty */
             | sequence words
             | sequence redirects
             ;

     words:    /* empty */
             | words word
             ;

     redirects:/* empty */
             | redirects redirect
             ;

The intention here is to define a sequence which can contain either
`word' or `redirect' groupings. The individual definitions of
`sequence', `words' and `redirects' are error-free, but the three
together make a subtle ambiguity: even an empty input can be parsed in
infinitely many ways!

   Consider: nothing-at-all could be a `words'. Or it could be two
`words' in a row, or three, or any number. It could equally well be a
`redirects', or two, or any number. Or it could be a `words' followed
by three `redirects' and another `words'. And so on.

   Here are two ways to correct these rules. First, to make it a
single level of sequence:

     sequence: /* empty */
             | sequence word
             | sequence redirect
             ;

   Second, to prevent either a `words' or a `redirects' from being
empty:

     sequence: /* empty */
             | sequence words
             | sequence redirects
             ;

     words:    word
             | words word
             ;

     redirects:redirect
             | redirects redirect
             ;


File: bison.info, Node: Mystery Conflicts, Next: Generalized LR Parsing, Prev: Reduce/Reduce, Up: Algorithm

5.7 Mysterious Reduce/Reduce Conflicts
======================================

Sometimes reduce/reduce conflicts can occur that don't look warranted.
Here is an example:

     %token ID

     %%
     def:    param_spec return_spec ','
             ;
     param_spec:
                  type
             |    name_list ':' type
             ;
     return_spec:
                  type
             |    name ':' type
             ;
     type:        ID
             ;
     name:        ID
             ;
     name_list:
                  name
             |    name ',' name_list
             ;

   It would seem that this grammar can be parsed with only a single
token of look-ahead: when a `param_spec' is being read, an `ID' is a
`name' if a comma or colon follows, or a `type' if another `ID'
follows. In other words, this grammar is LR(1).

   However, Bison, like most parser generators, cannot actually handle
all LR(1) grammars. In this grammar, two contexts, that after an `ID'
at the beginning of a `param_spec' and likewise at the beginning of a
`return_spec', are similar enough that Bison assumes they are the same.
They appear similar because the same set of rules would be active--the
rule for reducing to a `name' and that for reducing to a `type'. Bison
is unable to determine at that stage of processing that the rules would
require different look-ahead tokens in the two contexts, so it makes a
single parser state for them both. Combining the two contexts causes a
conflict later. In parser terminology, this occurrence means that the
grammar is not LALR(1).

   In general, it is better to fix deficiencies than to document them.
But this particular deficiency is intrinsically hard to fix; parser
generators that can handle LR(1) grammars are hard to write and tend to
produce parsers that are very large. In practice, Bison is more useful
as it is now.

   When the problem arises, you can often fix it by identifying the two
parser states that are being confused, and adding something to make them
look distinct.
In the above example, adding one rule to `return_spec' as follows makes
the problem go away:

     %token BOGUS
     ...
     %%
     ...
     return_spec:
                  type
             |    name ':' type
             /* This rule is never used. */
             |    ID BOGUS
             ;

   This corrects the problem because it introduces the possibility of an
additional active rule in the context after the `ID' at the beginning of
`return_spec'. This rule is not active in the corresponding context in
a `param_spec', so the two contexts receive distinct parser states. As
long as the token `BOGUS' is never generated by `yylex', the added rule
cannot alter the way actual input is parsed.

   In this particular example, there is another way to solve the
problem: rewrite the rule for `return_spec' to use `ID' directly
instead of via `name'. This also causes the two confusing contexts to
have different sets of active rules, because the one for `return_spec'
activates the altered rule for `return_spec' rather than the one for
`name'.

     param_spec:
                  type
             |    name_list ':' type
             ;
     return_spec:
                  type
             |    ID ':' type
             ;

   For a more detailed exposition of LALR(1) parsers and parser
generators, please see: Frank DeRemer and Thomas Pennello, Efficient
Computation of LALR(1) Look-Ahead Sets, `ACM Transactions on
Programming Languages and Systems', Vol. 4, No. 4 (October 1982), pp.
615-649 `http://doi.acm.org/10.1145/69622.357187'.


File: bison.info, Node: Generalized LR Parsing, Next: Memory Management, Prev: Mystery Conflicts, Up: Algorithm

5.8 Generalized LR (GLR) Parsing
================================

Bison produces _deterministic_ parsers that choose uniquely when to
reduce and which reduction to apply based on a summary of the preceding
input and on one extra token of look-ahead.
As a result, normal Bison handles a proper subset of the family of
context-free languages. Ambiguous grammars, since they have strings
with more than one possible sequence of reductions, cannot have
deterministic parsers in this sense. The same is true of languages
that require more than one symbol of look-ahead, since the parser lacks
the information necessary to make a decision at the point it must be
made in a shift-reduce parser. Finally, as previously mentioned (*note
Mystery Conflicts::), there are languages where Bison's particular
choice of how to summarize the input seen so far loses necessary
information.

   When you use the `%glr-parser' declaration in your grammar file,
Bison generates a parser that uses a different algorithm, called
Generalized LR (or GLR). A Bison GLR parser uses the same basic
algorithm for parsing as an ordinary Bison parser, but behaves
differently in cases where there is a shift-reduce conflict that has not
been resolved by precedence rules (*note Precedence::) or a
reduce-reduce conflict. When a GLR parser encounters such a situation,
it effectively _splits_ into several parsers, one for each possible
shift or reduction. These parsers then proceed as usual, consuming
tokens in lock-step. Some of the stacks may encounter other conflicts
and split further, with the result that instead of a sequence of states,
a Bison GLR parsing stack is in effect a tree of states.

   In effect, each stack represents a guess as to what the proper parse
is. Additional input may indicate that a guess was wrong, in which case
the appropriate stack silently disappears. Otherwise, the semantic
actions generated in each stack are saved, rather than being executed
immediately. When a stack disappears, its saved semantic actions never
get executed.
When a reduction causes two stacks to become equivalent, their sets of
semantic actions are both saved with the state that results from the
reduction. We say that two stacks are equivalent when they both
represent the same sequence of states, and each pair of corresponding
states represents a grammar symbol that produces the same segment of
the input token stream.

   Whenever the parser makes a transition from having multiple states
to having one, it reverts to the normal LALR(1) parsing algorithm,
after resolving and executing the saved-up actions. At this
transition, some of the states on the stack will have semantic values
that are sets (actually multisets) of possible actions. The parser
tries to pick one of the actions by first finding one whose rule has
the highest dynamic precedence, as set by the `%dprec' declaration.
Otherwise, if the alternative actions are not ordered by precedence,
but the same merging function is declared for both rules by the
`%merge' declaration, Bison resolves and evaluates both and then calls
the merge function on the result. Otherwise, it reports an ambiguity.

   It is possible to use a data structure for the GLR parsing tree that
permits the processing of any LALR(1) grammar in linear time (in the
size of the input), any unambiguous (not necessarily LALR(1)) grammar in
quadratic worst-case time, and any general (possibly ambiguous)
context-free grammar in cubic worst-case time. However, Bison currently
uses a simpler data structure that requires time proportional to the
length of the input times the maximum number of stacks required for any
prefix of the input. Thus, really ambiguous or nondeterministic
grammars can require exponential time and space to process. Such badly
behaving examples, however, are not generally of practical interest.
Usually, nondeterminism in a grammar is local--the parser is "in doubt"
only for a few tokens at a time. Therefore, the current data structure
should generally be adequate. On LALR(1) portions of a grammar, in
particular, it is only slightly slower than with the default Bison
parser.

   For a more detailed exposition of GLR parsers, please see: Elizabeth
Scott, Adrian Johnstone and Shamsa Sadaf Hussain, Tomita-Style
Generalised LR Parsers, Royal Holloway, University of London,
Department of Computer Science, TR-00-12,
`http://www.cs.rhul.ac.uk/research/languages/publications/tomita_style_1.ps',
(2000-12-24).


File: bison.info, Node: Memory Management, Prev: Generalized LR Parsing, Up: Algorithm

5.9 Memory Management, and How to Avoid Memory Exhaustion
=========================================================

The Bison parser stack can run out of memory if too many tokens are
shifted and not reduced. When this happens, the parser function
`yyparse' calls `yyerror' and then returns 2.

   Because Bison parsers have growing stacks, hitting the upper limit
usually results from using right recursion instead of left recursion.
*Note Recursive Rules: Recursion.

   By defining the macro `YYMAXDEPTH', you can control how deep the
parser stack can become before memory is exhausted. Define the macro
with a value that is an integer. This value is the maximum number of
tokens that can be shifted (and not reduced) before overflow.

   The stack space allowed is not necessarily allocated. If you
specify a large value for `YYMAXDEPTH', the parser normally allocates a
small stack at first, and then makes it bigger by stages as needed.
This increasing allocation happens automatically and silently.
Therefore, you do not need to make `YYMAXDEPTH' painfully small merely
to save space for ordinary inputs that do not need much stack.

   However, do not allow `YYMAXDEPTH' to be a value so large that
arithmetic overflow could occur when calculating the size of the stack
space. Also, do not allow `YYMAXDEPTH' to be less than `YYINITDEPTH'.

   The default value of `YYMAXDEPTH', if you do not define it, is 10000.

   You can control how much stack is allocated initially by defining the
macro `YYINITDEPTH' to a positive integer. For the C LALR(1) parser,
this value must be a compile-time constant unless you are assuming C99
or some other target language or compiler that allows variable-length
arrays. The default is 200.

   Do not allow `YYINITDEPTH' to be greater than `YYMAXDEPTH'.

   Because of semantic differences between C and C++, the stacks of
LALR(1) parsers in C produced by Bison cannot grow when compiled by C++
compilers. In this precise case (compiling a C parser as C++), we
suggest that you increase `YYINITDEPTH'. The Bison maintainers hope to
fix this deficiency in a future release.


File: bison.info, Node: Error Recovery, Next: Context Dependency, Prev: Algorithm, Up: Top

6 Error Recovery
****************

It is not usually acceptable to have a program terminate on a syntax
error. For example, a compiler should recover sufficiently to parse the
rest of the input file and check it for errors; a calculator should
accept another expression.

   In a simple interactive command parser where each input is one line,
it may be sufficient to allow `yyparse' to return 1 on error and have
the caller ignore the rest of the input line when that happens (and
then call `yyparse' again). But this is inadequate for a compiler,
because it forgets all the syntactic context leading up to the error.
5454 A syntax error deep within a function in the compiler input should not 5455 cause the compiler to treat the following line like the beginning of a 5456 source file. 5457 5458 You can define how to recover from a syntax error by writing rules to 5459 recognize the special token `error'. This is a terminal symbol that is 5460 always defined (you need not declare it) and reserved for error 5461 handling. The Bison parser generates an `error' token whenever a 5462 syntax error happens; if you have provided a rule to recognize this 5463 token in the current context, the parse can continue. 5464 5465 For example: 5466 5467 stmnts: /* empty string */ 5468 | stmnts '\n' 5469 | stmnts exp '\n' 5470 | stmnts error '\n' 5471 5472 The fourth rule in this example says that an error followed by a 5473 newline makes a valid addition to any `stmnts'. 5474 5475 What happens if a syntax error occurs in the middle of an `exp'? The 5476 error recovery rule, interpreted strictly, applies to the precise 5477 sequence of a `stmnts', an `error' and a newline. If an error occurs in 5478 the middle of an `exp', there will probably be some additional tokens 5479 and subexpressions on the stack after the last `stmnts', and there will 5480 be tokens to read before the next newline. So the rule is not 5481 applicable in the ordinary way. 5482 5483 But Bison can force the situation to fit the rule, by discarding 5484 part of the semantic context and part of the input. First it discards 5485 states and objects from the stack until it gets back to a state in 5486 which the `error' token is acceptable. (This means that the 5487 subexpressions already parsed are discarded, back to the last complete 5488 `stmnts'.) At this point the `error' token can be shifted. Then, if 5489 the old look-ahead token is not acceptable to be shifted next, the 5490 parser reads tokens and discards them until it finds a token which is 5491 acceptable. 
In this example, Bison reads and discards input until the 5492 next newline so that the fourth rule can apply. Note that discarded 5493 symbols are possible sources of memory leaks, see *Note Freeing 5494 Discarded Symbols: Destructor Decl, for a means to reclaim this memory. 5495 5496 The choice of error rules in the grammar is a choice of strategies 5497 for error recovery. A simple and useful strategy is simply to skip the 5498 rest of the current input line or current statement if an error is 5499 detected: 5500 5501 stmnt: error ';' /* On error, skip until ';' is read. */ 5502 5503 It is also useful to recover to the matching close-delimiter of an 5504 opening-delimiter that has already been parsed. Otherwise the 5505 close-delimiter will probably appear to be unmatched, and generate 5506 another, spurious error message: 5507 5508 primary: '(' expr ')' 5509 | '(' error ')' 5510 ... 5511 ; 5512 5513 Error recovery strategies are necessarily guesses. When they guess 5514 wrong, one syntax error often leads to another. In the above example, 5515 the error recovery rule guesses that an error is due to bad input 5516 within one `stmnt'. Suppose that instead a spurious semicolon is 5517 inserted in the middle of a valid `stmnt'. After the error recovery 5518 rule recovers from the first error, another syntax error will be found 5519 straightaway, since the text following the spurious semicolon is also 5520 an invalid `stmnt'. 5521 5522 To prevent an outpouring of error messages, the parser will output 5523 no error message for another syntax error that happens shortly after 5524 the first; only after three consecutive input tokens have been 5525 successfully shifted will error messages resume. 5526 5527 Note that rules which accept the `error' token may have actions, just 5528 as any other rules can. 5529 5530 You can make error messages resume immediately by using the macro 5531 `yyerrok' in an action. 
If you do this in the error rule's action, no 5532 error messages will be suppressed. This macro requires no arguments; 5533 `yyerrok;' is a valid C statement. 5534 5535 The previous look-ahead token is reanalyzed immediately after an 5536 error. If this is unacceptable, then the macro `yyclearin' may be used 5537 to clear this token. Write the statement `yyclearin;' in the error 5538 rule's action. *Note Special Features for Use in Actions: Action 5539 Features. 5540 5541 For example, suppose that on a syntax error, an error handling 5542 routine is called that advances the input stream to some point where 5543 parsing should once again commence. The next symbol returned by the 5544 lexical scanner is probably correct. The previous look-ahead token 5545 ought to be discarded with `yyclearin;'. 5546 5547 The expression `YYRECOVERING ()' yields 1 when the parser is 5548 recovering from a syntax error, and 0 otherwise. Syntax error 5549 diagnostics are suppressed while recovering from a syntax error. 5550 5551 5552 File: bison.info, Node: Context Dependency, Next: Debugging, Prev: Error Recovery, Up: Top 5553 5554 7 Handling Context Dependencies 5555 ******************************* 5556 5557 The Bison paradigm is to parse tokens first, then group them into larger 5558 syntactic units. In many languages, the meaning of a token is affected 5559 by its context. Although this violates the Bison paradigm, certain 5560 techniques (known as "kludges") may enable you to write Bison parsers 5561 for such languages. 5562 5563 * Menu: 5564 5565 * Semantic Tokens:: Token parsing can depend on the semantic context. 5566 * Lexical Tie-ins:: Token parsing can depend on the syntactic context. 5567 * Tie-in Recovery:: Lexical tie-ins have implications for how 5568 error recovery rules must be written. 5569 5570 (Actually, "kludge" means any technique that gets its job done but is 5571 neither clean nor robust.) 
5572 5573 5574 File: bison.info, Node: Semantic Tokens, Next: Lexical Tie-ins, Up: Context Dependency 5575 5576 7.1 Semantic Info in Token Types 5577 ================================ 5578 5579 The C language has a context dependency: the way an identifier is used 5580 depends on what its current meaning is. For example, consider this: 5581 5582 foo (x); 5583 5584 This looks like a function call statement, but if `foo' is a typedef 5585 name, then this is actually a declaration of `x'. How can a Bison 5586 parser for C decide how to parse this input? 5587 5588 The method used in GNU C is to have two different token types, 5589 `IDENTIFIER' and `TYPENAME'. When `yylex' finds an identifier, it 5590 looks up the current declaration of the identifier in order to decide 5591 which token type to return: `TYPENAME' if the identifier is declared as 5592 a typedef, `IDENTIFIER' otherwise. 5593 5594 The grammar rules can then express the context dependency by the 5595 choice of token type to recognize. `IDENTIFIER' is accepted as an 5596 expression, but `TYPENAME' is not. `TYPENAME' can start a declaration, 5597 but `IDENTIFIER' cannot. In contexts where the meaning of the 5598 identifier is _not_ significant, such as in declarations that can 5599 shadow a typedef name, either `TYPENAME' or `IDENTIFIER' is 5600 accepted--there is one rule for each of the two token types. 5601 5602 This technique is simple to use if the decision of which kinds of 5603 identifiers to allow is made at a place close to where the identifier is 5604 parsed. 
But in C this is not always so: C allows a declaration to
redeclare a typedef name provided an explicit type has been specified
earlier:

     typedef int foo, bar;
     int baz (void)
     {
       static bar (bar);      /* redeclare `bar' as static variable */
       extern foo foo (foo);  /* redeclare `foo' as function */
       return foo (bar);
     }

   Unfortunately, the name being declared is separated from the
declaration construct itself by a complicated syntactic structure--the
"declarator".

   As a result, part of the Bison parser for C needs to be duplicated,
with all the nonterminal names changed: once for parsing a declaration
in which a typedef name can be redefined, and once for parsing a
declaration in which that can't be done. Here is a part of the
duplication, with actions omitted for brevity:

     initdcl:
               declarator maybeasm '='
               init
             | declarator maybeasm
             ;

     notype_initdcl:
               notype_declarator maybeasm '='
               init
             | notype_declarator maybeasm
             ;

Here `initdcl' can redeclare a typedef name, but `notype_initdcl'
cannot. The distinction between `declarator' and `notype_declarator'
is the same sort of thing.

   There is some similarity between this technique and a lexical tie-in
(described next), in that information which alters the lexical analysis
is changed during parsing by other parts of the program. The
difference is that here the information is global, and is used for
other purposes in the program. A true lexical tie-in has a
special-purpose flag controlled by the syntactic context.
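
   The token-type decision described in this section can be sketched in
C as follows. This is only an illustration: the token codes, the
fixed-size table, and the helper names `declare_typedef' and
`classify_identifier' are assumptions made for the example, not part of
Bison or of GNU C.

```c
#include <string.h>

/* Hypothetical token codes; a real parser would take these from the
   header that Bison generates.  */
enum { IDENTIFIER = 258, TYPENAME = 259 };

/* Toy table of names currently declared as typedefs (an assumption;
   a real compiler would consult its scoped symbol table).  */
#define MAX_TYPEDEFS 64
static const char *typedef_names[MAX_TYPEDEFS];
static int n_typedefs;

/* Record NAME as a typedef name.  Meant to be called from the action
   that handles a typedef declaration.  Returns 0 if the table is
   full, 1 otherwise.  */
static int
declare_typedef (const char *name)
{
  if (n_typedefs == MAX_TYPEDEFS)
    return 0;
  typedef_names[n_typedefs++] = name;
  return 1;
}

/* Choose the token type for an identifier the scanner has just read,
   based on its current declaration.  */
static int
classify_identifier (const char *name)
{
  int i;
  for (i = 0; i < n_typedefs; i++)
    if (strcmp (typedef_names[i], name) == 0)
      return TYPENAME;
  return IDENTIFIER;
}
```

   In such a design, `yylex' would call `classify_identifier' on each
identifier it reads and return the result as the token type. A real C
compiler must also track scopes, so that an inner declaration can hide
a typedef name; this flat table does not attempt that.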
5648 5649 5650 File: bison.info, Node: Lexical Tie-ins, Next: Tie-in Recovery, Prev: Semantic Tokens, Up: Context Dependency 5651 5652 7.2 Lexical Tie-ins 5653 =================== 5654 5655 One way to handle context-dependency is the "lexical tie-in": a flag 5656 which is set by Bison actions, whose purpose is to alter the way tokens 5657 are parsed. 5658 5659 For example, suppose we have a language vaguely like C, but with a 5660 special construct `hex (HEX-EXPR)'. After the keyword `hex' comes an 5661 expression in parentheses in which all integers are hexadecimal. In 5662 particular, the token `a1b' must be treated as an integer rather than 5663 as an identifier if it appears in that context. Here is how you can do 5664 it: 5665 5666 %{ 5667 int hexflag; 5668 int yylex (void); 5669 void yyerror (char const *); 5670 %} 5671 %% 5672 ... 5673 expr: IDENTIFIER 5674 | constant 5675 | HEX '(' 5676 { hexflag = 1; } 5677 expr ')' 5678 { hexflag = 0; 5679 $$ = $4; } 5680 | expr '+' expr 5681 { $$ = make_sum ($1, $3); } 5682 ... 5683 ; 5684 5685 constant: 5686 INTEGER 5687 | STRING 5688 ; 5689 5690 Here we assume that `yylex' looks at the value of `hexflag'; when it is 5691 nonzero, all integers are parsed in hexadecimal, and tokens starting 5692 with letters are parsed as integers if possible. 5693 5694 The declaration of `hexflag' shown in the prologue of the parser file 5695 is needed to make it accessible to the actions (*note The Prologue: 5696 Prologue.). You must also write the code in `yylex' to obey the flag. 5697 5698 5699 File: bison.info, Node: Tie-in Recovery, Prev: Lexical Tie-ins, Up: Context Dependency 5700 5701 7.3 Lexical Tie-ins and Error Recovery 5702 ====================================== 5703 5704 Lexical tie-ins make strict demands on any error recovery rules you 5705 have. *Note Error Recovery::. 
5706 5707 The reason for this is that the purpose of an error recovery rule is 5708 to abort the parsing of one construct and resume in some larger 5709 construct. For example, in C-like languages, a typical error recovery 5710 rule is to skip tokens until the next semicolon, and then start a new 5711 statement, like this: 5712 5713 stmt: expr ';' 5714 | IF '(' expr ')' stmt { ... } 5715 ... 5716 error ';' 5717 { hexflag = 0; } 5718 ; 5719 5720 If there is a syntax error in the middle of a `hex (EXPR)' 5721 construct, this error rule will apply, and then the action for the 5722 completed `hex (EXPR)' will never run. So `hexflag' would remain set 5723 for the entire rest of the input, or until the next `hex' keyword, 5724 causing identifiers to be misinterpreted as integers. 5725 5726 To avoid this problem the error recovery rule itself clears 5727 `hexflag'. 5728 5729 There may also be an error recovery rule that works within 5730 expressions. For example, there could be a rule which applies within 5731 parentheses and skips to the close-parenthesis: 5732 5733 expr: ... 5734 | '(' expr ')' 5735 { $$ = $2; } 5736 | '(' error ')' 5737 ... 5738 5739 If this rule acts within the `hex' construct, it is not going to 5740 abort that construct (since it applies to an inner level of parentheses 5741 within the construct). Therefore, it should not clear the flag: the 5742 rest of the `hex' construct should be parsed with the flag still in 5743 effect. 5744 5745 What if there is an error recovery rule which might abort out of the 5746 `hex' construct or might not, depending on circumstances? There is no 5747 way you can write the action to determine whether a `hex' construct is 5748 being aborted or not. So if you are using a lexical tie-in, you had 5749 better make sure your error recovery rules are not of this kind. Each 5750 rule must be such that you can be sure that it always will, or always 5751 won't, have to clear the flag. 
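
   The scanner's side of the `hexflag' tie-in used in this chapter
might look like the following sketch. The helper name `classify_word',
the token codes, and the assumption that the scanner has already
collected a nonempty run of letters and digits are all hypothetical;
only `hexflag' comes from the examples above.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token codes; normally these come from the header that
   Bison generates.  */
enum { IDENTIFIER = 258, INTEGER = 259 };

/* The tie-in flag, set and cleared by the parser's actions.  */
int hexflag;

/* Classify WORD, a nonempty run of letters and digits that the
   scanner has already collected.  When the result is INTEGER, the
   numeric value is stored in *VALUE; otherwise *VALUE is not
   meaningful.  */
static int
classify_word (const char *word, long *value)
{
  char *end;
  int base = hexflag ? 16 : 10;

  /* Outside `hex (...)', only a word that starts with a digit can be
     an integer.  */
  if (!hexflag && !isdigit ((unsigned char) word[0]))
    return IDENTIFIER;

  *value = strtol (word, &end, base);

  /* The whole word must have been consumed for it to be an integer.  */
  if (end != word && *end == '\0')
    return INTEGER;
  return IDENTIFIER;
}
```

   With `hexflag' set, the word `a1b' is classified as an integer; with
it clear, the same word is an identifier. This is why an error
recovery rule that leaves the flag in the wrong state changes how all
later tokens are read.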


File: bison.info, Node: Debugging, Next: Invocation, Prev: Context Dependency, Up: Top

8 Debugging Your Parser
***********************

Developing a parser can be a challenge, especially if you don't
understand the algorithm (*note The Bison Parser Algorithm:
Algorithm.). Even so, sometimes a detailed description of the automaton
can help (*note Understanding Your Parser: Understanding.), or tracing
the execution of the parser can give some insight on why it behaves
improperly (*note Tracing Your Parser: Tracing.).

* Menu:

* Understanding::     Understanding the structure of your parser.
* Tracing::           Tracing the execution of your parser.


File: bison.info, Node: Understanding, Next: Tracing, Up: Debugging

8.1 Understanding Your Parser
=============================

As documented elsewhere (*note The Bison Parser Algorithm: Algorithm.)
Bison parsers are "shift/reduce automata". In some cases (much more
frequent than one would hope), looking at this automaton is required to
tune or simply fix a parser. Bison provides two different
representations of it, either textually or graphically (as a VCG file).

   The textual file is generated when the options `--report' or
`--verbose' are specified, see *Note Invoking Bison: Invocation. Its
name is made by removing `.tab.c' or `.c' from the parser output file
name, and adding `.output' instead. Therefore, if the input file is
`foo.y', then the parser file is called `foo.tab.c' by default. As a
consequence, the verbose output file is called `foo.output'.

   The following grammar file, `calc.y', will be used in the sequel:

     %token NUM STR
     %left '+' '-'
     %left '*'
     %%
     exp: exp '+' exp
        | exp '-' exp
        | exp '*' exp
        | exp '/' exp
        | NUM
        ;
     useless: STR;
     %%

   `bison' reports:

     calc.y: warning: 1 useless nonterminal and 1 useless rule
     calc.y:11.1-7: warning: useless nonterminal: useless
     calc.y:11.10-12: warning: useless rule: useless: STR
     calc.y: conflicts: 7 shift/reduce

   When given `--report=state', in addition to `calc.tab.c', it creates
a file `calc.output' with contents detailed below. The order of the
output and the exact presentation might vary, but the interpretation is
the same.

   The first section includes details on conflicts that were solved
thanks to precedence and/or associativity:

     Conflict in state 8 between rule 2 and token '+' resolved as reduce.
     Conflict in state 8 between rule 2 and token '-' resolved as reduce.
     Conflict in state 8 between rule 2 and token '*' resolved as shift.
     ...

The next section lists states that still have conflicts.

     State 8 conflicts: 1 shift/reduce
     State 9 conflicts: 1 shift/reduce
     State 10 conflicts: 1 shift/reduce
     State 11 conflicts: 4 shift/reduce

The next section reports useless tokens, nonterminals and rules.
Useless nonterminals and rules are removed in order to produce a
smaller parser, but useless tokens are preserved, since they might be
used by the scanner (note the difference between "useless" and "not
used" below):

     Useless nonterminals:
        useless

     Terminals which are not used:
        STR

     Useless rules:
     #6     useless: STR;

The next section reproduces the exact grammar that Bison used:

     Grammar

       Number, Line, Rule
         0   5 $accept -> exp $end
         1   5 exp -> exp '+' exp
         2   6 exp -> exp '-' exp
         3   7 exp -> exp '*' exp
         4   8 exp -> exp '/' exp
         5   9 exp -> NUM

and reports the uses of the symbols:

     Terminals, with rules where they appear

     $end (0) 0
     '*' (42) 3
     '+' (43) 1
     '-' (45) 2
     '/' (47) 4
     error (256)
     NUM (258) 5

     Nonterminals, with rules where they appear

     $accept (8)
         on left: 0
     exp (9)
         on left: 1 2 3 4 5, on right: 0 1 2 3 4

Bison then proceeds onto the automaton itself, describing each state
with its set of "items", also known as "pointed rules". Each item is a
production rule together with a point (marked by `.') that is the input
cursor.

     state 0

         $accept  ->  . exp $   (rule 0)

         NUM         shift, and go to state 1

         exp         go to state 2

   This reads as follows: "state 0 corresponds to being at the very
beginning of the parsing, in the initial rule, right before the start
symbol (here, `exp'). When the parser returns to this state right
after having reduced a rule that produced an `exp', the control flow
jumps to state 2. If there is no such transition on a nonterminal
symbol, and the look-ahead is a `NUM', then this token is shifted on
the parse stack, and the control flow jumps to state 1. Any other
look-ahead triggers a syntax error."

   Even though the only active rule in state 0 seems to be rule 0, the
report lists `NUM' as a look-ahead token because `NUM' can be at the
beginning of any rule deriving an `exp'. By default Bison reports the
so-called "core" or "kernel" of the item set, but if you want to see
more detail you can invoke `bison' with `--report=itemset' to list all
the items, including those that can be derived:

     state 0

         $accept  ->  . exp $   (rule 0)
         exp  ->  . exp '+' exp   (rule 1)
         exp  ->  . exp '-' exp   (rule 2)
         exp  ->  . exp '*' exp   (rule 3)
         exp  ->  . exp '/' exp   (rule 4)
         exp  ->  . NUM   (rule 5)

         NUM         shift, and go to state 1

         exp         go to state 2

In the state 1...

     state 1

         exp  ->  NUM .   (rule 5)

         $default    reduce using rule 5 (exp)

the rule 5, `exp: NUM;', is completed. Whatever the look-ahead token
(`$default'), the parser will reduce it. If it was coming from state
0, then, after this reduction it will return to state 0, and will jump
to state 2 (`exp: go to state 2').

     state 2

         $accept  ->  exp . $   (rule 0)
         exp  ->  exp . '+' exp   (rule 1)
         exp  ->  exp . '-' exp   (rule 2)
         exp  ->  exp . '*' exp   (rule 3)
         exp  ->  exp . '/' exp   (rule 4)

         $           shift, and go to state 3
         '+'         shift, and go to state 4
         '-'         shift, and go to state 5
         '*'         shift, and go to state 6
         '/'         shift, and go to state 7

In state 2, the automaton can only shift a symbol. For instance,
because of the item `exp -> exp . '+' exp', if the look-ahead is `+',
it will be shifted on the parse stack, and the automaton control will
jump to state 4, corresponding to the item `exp -> exp '+' . exp'.
Since there is no default action, any other token than those listed
above will trigger a syntax error.

   The state 3 is named the "final state", or the "accepting state":

     state 3

         $accept  ->  exp $ .   (rule 0)

         $default    accept

the initial rule is completed (the start symbol and the end of input
were read), the parsing exits successfully.

   The interpretation of states 4 to 7 is straightforward, and is left
to the reader.

     state 4

         exp  ->  exp '+' . exp   (rule 1)

         NUM         shift, and go to state 1

         exp         go to state 8

     state 5

         exp  ->  exp '-' . exp   (rule 2)

         NUM         shift, and go to state 1

         exp         go to state 9

     state 6

         exp  ->  exp '*' . exp   (rule 3)

         NUM         shift, and go to state 1

         exp         go to state 10

     state 7

         exp  ->  exp '/' . exp   (rule 4)

         NUM         shift, and go to state 1

         exp         go to state 11

   As was announced in the beginning of the report, `State 8 conflicts:
1 shift/reduce':

     state 8

         exp  ->  exp . '+' exp   (rule 1)
         exp  ->  exp '+' exp .   (rule 1)
         exp  ->  exp . '-' exp   (rule 2)
         exp  ->  exp . '*' exp   (rule 3)
         exp  ->  exp . '/' exp   (rule 4)

         '*'         shift, and go to state 6
         '/'         shift, and go to state 7

         '/'         [reduce using rule 1 (exp)]
         $default    reduce using rule 1 (exp)

   Indeed, there are two actions associated with the look-ahead `/':
either shifting (and going to state 7), or reducing rule 1. The
conflict means that either the grammar is ambiguous, or the parser lacks
information to make the right decision. Indeed the grammar is
ambiguous: since we did not specify the precedence of `/', the
sentence `NUM + NUM / NUM' can be parsed as `NUM + (NUM / NUM)', which
corresponds to shifting `/', or as `(NUM + NUM) / NUM', which
corresponds to reducing rule 1.
6025 6026 Because in LALR(1) parsing a single decision can be made, Bison 6027 arbitrarily chose to disable the reduction, see *Note Shift/Reduce 6028 Conflicts: Shift/Reduce. Discarded actions are reported in between 6029 square brackets. 6030 6031 Note that all the previous states had a single possible action: 6032 either shifting the next token and going to the corresponding state, or 6033 reducing a single rule. In the other cases, i.e., when shifting _and_ 6034 reducing is possible or when _several_ reductions are possible, the 6035 look-ahead is required to select the action. State 8 is one such 6036 state: if the look-ahead is `*' or `/' then the action is shifting, 6037 otherwise the action is reducing rule 1. In other words, the first two 6038 items, corresponding to rule 1, are not eligible when the look-ahead 6039 token is `*', since we specified that `*' has higher precedence than 6040 `+'. More generally, some items are eligible only with some set of 6041 possible look-ahead tokens. When run with `--report=look-ahead', Bison 6042 specifies these look-ahead tokens: 6043 6044 state 8 6045 6046 exp -> exp . '+' exp [$, '+', '-', '/'] (rule 1) 6047 exp -> exp '+' exp . [$, '+', '-', '/'] (rule 1) 6048 exp -> exp . '-' exp (rule 2) 6049 exp -> exp . '*' exp (rule 3) 6050 exp -> exp . '/' exp (rule 4) 6051 6052 '*' shift, and go to state 6 6053 '/' shift, and go to state 7 6054 6055 '/' [reduce using rule 1 (exp)] 6056 $default reduce using rule 1 (exp) 6057 6058 The remaining states are similar: 6059 6060 state 9 6061 6062 exp -> exp . '+' exp (rule 1) 6063 exp -> exp . '-' exp (rule 2) 6064 exp -> exp '-' exp . (rule 2) 6065 exp -> exp . '*' exp (rule 3) 6066 exp -> exp . '/' exp (rule 4) 6067 6068 '*' shift, and go to state 6 6069 '/' shift, and go to state 7 6070 6071 '/' [reduce using rule 2 (exp)] 6072 $default reduce using rule 2 (exp) 6073 6074 state 10 6075 6076 exp -> exp . '+' exp (rule 1) 6077 exp -> exp . '-' exp (rule 2) 6078 exp -> exp . 
'*' exp (rule 3) 6079 exp -> exp '*' exp . (rule 3) 6080 exp -> exp . '/' exp (rule 4) 6081 6082 '/' shift, and go to state 7 6083 6084 '/' [reduce using rule 3 (exp)] 6085 $default reduce using rule 3 (exp) 6086 6087 state 11 6088 6089 exp -> exp . '+' exp (rule 1) 6090 exp -> exp . '-' exp (rule 2) 6091 exp -> exp . '*' exp (rule 3) 6092 exp -> exp . '/' exp (rule 4) 6093 exp -> exp '/' exp . (rule 4) 6094 6095 '+' shift, and go to state 4 6096 '-' shift, and go to state 5 6097 '*' shift, and go to state 6 6098 '/' shift, and go to state 7 6099 6100 '+' [reduce using rule 4 (exp)] 6101 '-' [reduce using rule 4 (exp)] 6102 '*' [reduce using rule 4 (exp)] 6103 '/' [reduce using rule 4 (exp)] 6104 $default reduce using rule 4 (exp) 6105 6106 Observe that state 11 contains conflicts not only due to the lack of 6107 precedence of `/' with respect to `+', `-', and `*', but also because 6108 the associativity of `/' is not specified. 6109 6110 6111 File: bison.info, Node: Tracing, Prev: Understanding, Up: Debugging 6112 6113 8.2 Tracing Your Parser 6114 ======================= 6115 6116 If a Bison grammar compiles properly but doesn't do what you want when 6117 it runs, the `yydebug' parser-trace feature can help you figure out why. 6118 6119 There are several means to enable compilation of trace facilities: 6120 6121 the macro `YYDEBUG' 6122 Define the macro `YYDEBUG' to a nonzero value when you compile the 6123 parser. This is compliant with POSIX Yacc. You could use 6124 `-DYYDEBUG=1' as a compiler option or you could put `#define 6125 YYDEBUG 1' in the prologue of the grammar file (*note The 6126 Prologue: Prologue.). 6127 6128 the option `-t', `--debug' 6129 Use the `-t' option when you run Bison (*note Invoking Bison: 6130 Invocation.). This is POSIX compliant too. 6131 6132 the directive `%debug' 6133 Add the `%debug' directive (*note Bison Declaration Summary: Decl 6134 Summary.). 
     This is a Bison extension, which will prove useful
     when Bison outputs parsers for languages that don't use a
     preprocessor. Unless POSIX and Yacc portability matter to you,
     this is the preferred solution.

   We suggest that you always enable the debug option so that debugging
is always possible.

   The trace facility outputs messages with macro calls of the form
`YYFPRINTF (stderr, FORMAT, ARGS)' where FORMAT and ARGS are the usual
`printf' format and arguments. If you define `YYDEBUG' to a nonzero
value but do not define `YYFPRINTF', `<stdio.h>' is automatically
included and `YYFPRINTF' is defined to `fprintf'.

   Once you have compiled the program with trace facilities, the way to
request a trace is to store a nonzero value in the variable `yydebug'.
You can do this by making the C code do it (in `main', perhaps), or you
can alter the value with a C debugger.

   Each step taken by the parser when `yydebug' is nonzero produces a
line or two of trace information, written on `stderr'. The trace
messages tell you these things:

   * Each time the parser calls `yylex', what kind of token was read.

   * Each time a token is shifted, the depth and complete contents of
     the state stack (*note Parser States::).

   * Each time a rule is reduced, which rule it is, and the complete
     contents of the state stack afterward.

   To make sense of this information, it helps to refer to the listing
file produced by the Bison `-v' option (*note Invoking Bison:
Invocation.). This file shows the meaning of each state in terms of
positions in various rules, and also what each state will do with each
possible input token. As you read the successive trace messages, you
can see that the parser is functioning according to its specification in
the listing file.
Eventually you will arrive at the place where
something undesirable happens, and you will see which parts of the
grammar are to blame.

   The parser file is a C program and you can use C debuggers on it,
but it's not easy to interpret what it is doing. The parser function
is a finite-state machine interpreter, and aside from the actions it
executes the same code over and over. Only the values of variables
show where in the grammar it is working.

   The debugging information normally gives the token type of each token
read, but not its semantic value. You can optionally define a macro
named `YYPRINT' to provide a way to print the value. If you define
`YYPRINT', it should take three arguments. The parser will pass a
standard I/O stream, the numeric code for the token type, and the token
value (from `yylval').

   Here is an example of `YYPRINT' suitable for the multi-function
calculator (*note Declarations for `mfcalc': Mfcalc Decl.):

     %{
       static void print_token_value (FILE *, int, YYSTYPE);
       #define YYPRINT(file, type, value) print_token_value (file, type, value)
     %}

     ... %% ... %% ...

     static void
     print_token_value (FILE *file, int type, YYSTYPE value)
     {
       if (type == VAR)
         fprintf (file, "%s", value.tptr->name);
       else if (type == NUM)
         /* `val' is a `double' in `mfcalc', so print it with `%g'.  */
         fprintf (file, "%g", value.val);
     }


File: bison.info, Node: Invocation, Next: C++ Language Interface, Prev: Debugging, Up: Top

9 Invoking Bison
****************

The usual way to invoke Bison is as follows:

     bison INFILE

   Here INFILE is the grammar file name, which usually ends in `.y'.
The parser file's name is made by replacing the `.y' with `.tab.c' and
removing any leading directory. Thus, `bison foo.y' yields
`foo.tab.c', and `bison hack/foo.y' also yields `foo.tab.c'.
It's also possible, in case you are writing C++ code
instead of C in your grammar file, to name it `foo.ypp' or `foo.y++'.
Then, the output files will take an extension like the given one as
input (respectively `foo.tab.cpp' and `foo.tab.c++'). This feature
takes effect with all options that manipulate file names like `-o' or
`-d'.

   For example:

     bison -d INFILE.YXX
   will produce `infile.tab.cxx' and `infile.tab.hxx', and

     bison -d -o OUTPUT.C++ INFILE.Y
   will produce `output.c++' and `output.h++'.

   For compatibility with POSIX, the standard Bison distribution also
contains a shell script called `yacc' that invokes Bison with the `-y'
option.

* Menu:

* Bison Options::     All the options described in detail,
                        in alphabetical order by short options.
* Option Cross Key::  Alphabetical list of long options.
* Yacc Library::      Yacc-compatible `yylex' and `main'.


File: bison.info, Node: Bison Options, Next: Option Cross Key, Up: Invocation

9.1 Bison Options
=================

Bison supports both traditional single-letter options and mnemonic long
option names. Long option names are indicated with `--' instead of
`-'. Abbreviations for option names are allowed as long as they are
unique. When a long option takes an argument, like `--file-prefix',
connect the option name and the argument with `='.

   Here is a list of options that can be used with Bison, alphabetized
by short option. It is followed by a cross key alphabetized by long
option.

Operation modes:
`-h'
`--help'
     Print a summary of the command-line options to Bison and exit.

`-V'
`--version'
     Print the version number of Bison and exit.

`--print-localedir'
     Print the name of the directory containing locale-dependent data.

`-y'
`--yacc'
     Act more like the traditional Yacc command.  This can cause
     different diagnostics to be generated, and may change behavior in
     other minor ways.  Most importantly, imitate Yacc's output file
     name conventions, so that the parser output file is called
     `y.tab.c', and the other outputs are called `y.output' and
     `y.tab.h'.  Thus, the following shell script can substitute for
     Yacc, and the Bison distribution contains such a script for
     compatibility with POSIX:

          #! /bin/sh
          bison -y "$@"

     The `-y'/`--yacc' option is intended for use with traditional Yacc
     grammars.  If your grammar uses a Bison extension like
     `%glr-parser', Bison might not be Yacc-compatible even if this
     option is specified.


Tuning the parser:

`-S FILE'
`--skeleton=FILE'
     Specify the skeleton to use.  You probably don't need this option
     unless you are developing Bison.

`-t'
`--debug'
     In the parser file, define the macro `YYDEBUG' to 1 if it is not
     already defined, so that the debugging facilities are compiled.
     *Note Tracing Your Parser: Tracing.

`--locations'
     Pretend that `%locations' was specified.  *Note Decl Summary::.

`-p PREFIX'
`--name-prefix=PREFIX'
     Pretend that `%name-prefix="PREFIX"' was specified.  *Note Decl
     Summary::.

`-l'
`--no-lines'
     Don't put any `#line' preprocessor commands in the parser file.
     Ordinarily Bison puts them in the parser file so that the C
     compiler and debuggers will associate errors with your source
     file, the grammar file.  This option causes them to associate
     errors with the parser file, treating it as an independent source
     file in its own right.

`-n'
`--no-parser'
     Pretend that `%no-parser' was specified.  *Note Decl Summary::.

`-k'
`--token-table'
     Pretend that `%token-table' was specified.
     *Note Decl Summary::.

Adjust the output:

`-d'
`--defines'
     Pretend that `%defines' was specified, i.e., write an extra output
     file containing macro definitions for the token type names defined
     in the grammar, as well as a few other declarations.  *Note Decl
     Summary::.

`--defines=DEFINES-FILE'
     Same as above, but save in the file DEFINES-FILE.

`-b PREFIX'
`--file-prefix=PREFIX'
     Pretend that `%file-prefix' was specified, i.e., specify the prefix
     to use for all Bison output file names.  *Note Decl Summary::.

`-r THINGS'
`--report=THINGS'
     Write an extra output file containing a verbose description of the
     comma-separated list of THINGS among:

    `state'
          Description of the grammar, conflicts (resolved and
          unresolved), and LALR automaton.

    `look-ahead'
          Implies `state' and augments the description of the automaton
          with each rule's look-ahead set.

    `itemset'
          Implies `state' and augments the description of the automaton
          with the full set of items for each state, instead of its
          core only.

`-v'
`--verbose'
     Pretend that `%verbose' was specified, i.e., write an extra output
     file containing verbose descriptions of the grammar and parser.
     *Note Decl Summary::.

`-o FILE'
`--output=FILE'
     Specify the FILE for the parser file.

     The other output files' names are constructed from FILE as
     described under the `-v' and `-d' options.

`-g'
     Output a VCG definition of the LALR(1) grammar automaton computed
     by Bison.  If the grammar file is `foo.y', the VCG output file will
     be `foo.vcg'.

`--graph=GRAPH-FILE'
     Behaves like `-g'.  The only difference is that it has an optional
     argument, which is the name of the output graph file.
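
   As a rough illustration of the default file naming rules described
at the start of this chapter, the following C++ sketch guesses the
names that `bison -d' gives the parser and header files when neither
`-o' nor `-b' is used.  The function name `default_outputs' is
hypothetical, and the treatment of input names that lack a `.y...'
extension is an assumption made for the sketch, not documented
behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>

// Sketch only: return the {parser, header} file names expected from
// `bison -d INFILE' when neither `-o' nor `-b' is given.
std::pair<std::string, std::string>
default_outputs (const std::string& infile)
{
  assert (!infile.empty ());

  // Remove any leading directory.
  std::string::size_type slash = infile.rfind ('/');
  std::string base
    = slash == std::string::npos ? infile : infile.substr (slash + 1);

  // Split off a `.y...' extension such as `.y', `.ypp' or `.y++'.
  // Assumption: any other name keeps its full base and is treated
  // as if it ended in `.y'.
  std::string ext = ".y";
  std::string::size_type dot = base.rfind ('.');
  if (dot != std::string::npos
      && dot + 1 < base.size () && base[dot + 1] == 'y')
    {
      ext = base.substr (dot);
      base.erase (dot);
    }

  // Turn each `y' of the extension into `c' (parser) or `h' (header).
  std::string src = ext, hdr = ext;
  std::replace (src.begin (), src.end (), 'y', 'c');
  std::replace (hdr.begin (), hdr.end (), 'y', 'h');
  return std::make_pair (base + ".tab" + src, base + ".tab" + hdr);
}
```

   Under these assumptions, `default_outputs ("hack/foo.ypp")' gives
`foo.tab.cpp' and `foo.tab.hpp'.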


File: bison.info, Node: Option Cross Key, Next: Yacc Library, Prev: Bison Options, Up: Invocation

9.2 Option Cross Key
====================

Here is a list of options, alphabetized by long option, to help you find
the corresponding short option.

Long Option                   Short Option
-------------------------------------------------
`--debug'                     `-t'
`--defines=DEFINES-FILE'      `-d'
`--file-prefix=PREFIX'        `-b PREFIX'
`--graph=GRAPH-FILE'          `-g'
`--help'                      `-h'
`--name-prefix=PREFIX'        `-p PREFIX'
`--no-lines'                  `-l'
`--no-parser'                 `-n'
`--output=OUTFILE'            `-o OUTFILE'
`--print-localedir'
`--token-table'               `-k'
`--verbose'                   `-v'
`--version'                   `-V'
`--yacc'                      `-y'


File: bison.info, Node: Yacc Library, Prev: Option Cross Key, Up: Invocation

9.3 Yacc Library
================

The Yacc library contains default implementations of the `yyerror' and
`main' functions.  These default implementations are normally not
useful, but POSIX requires them.  To use the Yacc library, link your
program with the `-ly' option.  Note that Bison's implementation of the
Yacc library is distributed under the terms of the GNU General Public
License (*note Copying::).

   If you use the Yacc library's `yyerror' function, you should declare
`yyerror' as follows:

     int yyerror (char const *);

   Bison ignores the `int' value returned by this `yyerror'.
If you
use the Yacc library's `main' function, your `yyparse' function should
have the following type signature:

     int yyparse (void);


File: bison.info, Node: C++ Language Interface, Next: FAQ, Prev: Invocation, Up: Top

10 C++ Language Interface
*************************

* Menu:

* C++ Parsers::                 The interface to generate C++ parser classes
* A Complete C++ Example::      Demonstrating their use


File: bison.info, Node: C++ Parsers, Next: A Complete C++ Example, Up: C++ Language Interface

10.1 C++ Parsers
================

* Menu:

* C++ Bison Interface::         Asking for C++ parser generation
* C++ Semantic Values::         %union vs. C++
* C++ Location Values::         The position and location classes
* C++ Parser Interface::        Instantiating and running the parser
* C++ Scanner Interface::       Exchanges between yylex and parse


File: bison.info, Node: C++ Bison Interface, Next: C++ Semantic Values, Up: C++ Parsers

10.1.1 C++ Bison Interface
--------------------------

The C++ parser LALR(1) skeleton is named `lalr1.cc'.  To select it, you
may either pass the option `--skeleton=lalr1.cc' to Bison, or include
the directive `%skeleton "lalr1.cc"' in the grammar preamble.  When
run, `bison' will create several entities in the `yy' namespace.  Use
the `%name-prefix' directive to change the namespace name, see *Note
Decl Summary::.  The various classes are generated in the following
files:

`position.hh'
`location.hh'
     The definition of the classes `position' and `location', used for
     location tracking.  *Note C++ Location Values::.

`stack.hh'
     An auxiliary class `stack' used by the parser.

`FILE.hh'
`FILE.cc'
     (Assuming the extension of the input file was `.yy'.)  The
     declaration and implementation of the C++ parser class.
     The basename and extension of these two files follow the same
     rules as with regular C parsers (*note Invocation::).

     The header is _mandatory_; you must either pass `-d'/`--defines'
     to `bison', or use the `%defines' directive.

   All these files are documented using Doxygen; run `doxygen' for
complete and accurate documentation.


File: bison.info, Node: C++ Semantic Values, Next: C++ Location Values, Prev: C++ Bison Interface, Up: C++ Parsers

10.1.2 C++ Semantic Values
--------------------------

The `%union' directive works as for C, see *Note The Collection of
Value Types: Union Decl.  In particular it produces a genuine
`union'(1), which has a few specific features in C++.
   - The type `YYSTYPE' is defined but its use is discouraged: rather
     you should refer to the parser's encapsulated type
     `yy::parser::semantic_type'.

   - Non-POD (Plain Old Data) types cannot be used.  C++ forbids any
     instance of classes with constructors in unions: only _pointers_
     to such objects are allowed.

   Because objects have to be stored via pointers, memory is not
reclaimed automatically: using the `%destructor' directive is the only
means to avoid leaks.  *Note Freeing Discarded Symbols: Destructor Decl.

   ---------- Footnotes ----------

   (1) In the future, techniques to allow complex types within
pseudo-unions (similar to Boost variants) might be implemented to
alleviate these issues.


File: bison.info, Node: C++ Location Values, Next: C++ Parser Interface, Prev: C++ Semantic Values, Up: C++ Parsers

10.1.3 C++ Location Values
--------------------------

When the directive `%locations' is used, the C++ parser supports
location tracking, see *Note Locations Overview: Locations.
Two
auxiliary classes define a `position', a single point in a file, and a
`location', a range composed of a pair of `position's (possibly
spanning several files).

 -- Method on position: std::string* file
     The name of the file.  It will always be handled as a pointer; the
     parser will never duplicate or deallocate it.  As an experimental
     feature you may change it to `TYPE*' using `%define
     "filename_type" "TYPE"'.

 -- Method on position: unsigned int line
     The line, starting at 1.

 -- Method on position: unsigned int lines (int HEIGHT = 1)
     Advance by HEIGHT lines, resetting the column number.

 -- Method on position: unsigned int column
     The column, starting at 0.

 -- Method on position: unsigned int columns (int WIDTH = 1)
     Advance by WIDTH columns, without changing the line number.

 -- Method on position: position& operator+= (position& POS, int WIDTH)
 -- Method on position: position operator+ (const position& POS, int
          WIDTH)
 -- Method on position: position& operator-= (const position& POS, int
          WIDTH)
 -- Method on position: position operator- (position& POS, int WIDTH)
     Various forms of syntactic sugar for `columns'.

 -- Method on position: std::ostream& operator<< (std::ostream& O,
          const position& P)
     Report P on O like this: `FILE:LINE.COLUMN', or `LINE.COLUMN' if
     FILE is null.

 -- Method on location: position begin
 -- Method on location: position end
     The first, inclusive, position of the range, and the first beyond.

 -- Method on location: unsigned int columns (int WIDTH = 1)
 -- Method on location: unsigned int lines (int HEIGHT = 1)
     Advance the `end' position.

 -- Method on location: location operator+ (const location& BEGIN,
          const location& END)
 -- Method on location: location operator+ (const location& BEGIN, int
          WIDTH)
 -- Method on location: location operator+= (const location& LOC, int
          WIDTH)
     Various forms of syntactic sugar.

 -- Method on location: void step ()
     Move `begin' onto `end'.


File: bison.info, Node: C++ Parser Interface, Next: C++ Scanner Interface, Prev: C++ Location Values, Up: C++ Parsers

10.1.4 C++ Parser Interface
---------------------------

The output files `OUTPUT.hh' and `OUTPUT.cc' declare and define the
parser class in the namespace `yy'.  The class name defaults to
`parser', but may be changed using `%define "parser_class_name"
"NAME"'.  The interface of this class is detailed below.  It can be
extended using the `%parse-param' feature: its semantics differ
slightly, since it describes an additional member of the parser class
and an additional argument for its constructor.

 -- Type of parser: semantic_type
 -- Type of parser: location_type
     The types for semantic values and locations.

 -- Method on parser: parser (TYPE1 ARG1, ...)
     Build a new parser object.  There are no arguments by default,
     unless `%parse-param {TYPE1 ARG1}' was used.

 -- Method on parser: int parse ()
     Run the syntactic analysis, and return 0 on success, 1 otherwise.

 -- Method on parser: std::ostream& debug_stream ()
 -- Method on parser: void set_debug_stream (std::ostream& O)
     Get or set the stream used for tracing the parsing.  It defaults to
     `std::cerr'.

 -- Method on parser: debug_level_type debug_level ()
 -- Method on parser: void set_debug_level (debug_level_type L)
     Get or set the tracing level.  Currently its value is either 0, no
     trace, or nonzero, full tracing.

 -- Method on parser: void error (const location_type& L, const
          std::string& M)
     The definition for this member function must be supplied by the
     user: the parser uses it to report a parser error occurring at L,
     described by M.


File: bison.info, Node: C++ Scanner Interface, Prev: C++ Parser Interface, Up: C++ Parsers

10.1.5 C++ Scanner Interface
----------------------------

The parser invokes the scanner by calling `yylex'.  Contrary to C
parsers, C++ parsers are always pure: there is no point in using the
`%pure-parser' directive.  Therefore the interface is as follows.

 -- Method on parser: int yylex (semantic_type* YYLVAL,
          location_type* YYLLOC, TYPE1 ARG1, ...)
     Return the next token.  Its type is the return value, its semantic
     value and location being stored through YYLVAL and YYLLOC.
     Invocations of `%lex-param {TYPE1 ARG1}' yield additional
     arguments.


File: bison.info, Node: A Complete C++ Example, Prev: C++ Parsers, Up: C++ Language Interface

10.2 A Complete C++ Example
===========================

This section demonstrates the use of a C++ parser with a simple but
complete example.  This example should be available on your system,
ready to compile, in the directory "../bison/examples/calc++".  It
focuses on the use of Bison, therefore the design of the various C++
classes is very naive: no accessors, no encapsulation of members etc.
We will use a Lex scanner, and more precisely, a Flex scanner, to
demonstrate the various interactions.  A hand-written scanner is
actually easier to interface with.
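
   To give a feel for that last remark, here is a minimal sketch of a
hand-written scanner for the calculator's tokens.  It is independent of
the generated parser: the `tok' enumeration, the `token' structure and
the `next_token' function are hypothetical names invented for this
illustration, whereas a real scanner would return the parser's own
token type and fill in the semantic value and location it is given.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical token kinds, standing in for the generated ones.
enum class tok { end, assign, identifier, number, op, invalid };

struct token
{
  tok kind;
  std::string text;   // Spelling of the token.
};

// Return the token starting at or after POS in INPUT and advance POS
// past it.  Blanks and newlines are skipped.
token
next_token (const std::string& input, std::size_t& pos)
{
  assert (pos <= input.size ());
  while (pos < input.size ()
         && std::isspace (static_cast<unsigned char> (input[pos])))
    ++pos;
  if (pos == input.size ())
    return { tok::end, "" };

  const std::size_t start = pos;
  const unsigned char c = input[pos];
  if (std::isalpha (c))
    {
      // Identifier: a letter followed by letters, digits or `_'.
      while (pos < input.size ()
             && (std::isalnum (static_cast<unsigned char> (input[pos]))
                 || input[pos] == '_'))
        ++pos;
      return { tok::identifier, input.substr (start, pos - start) };
    }
  if (std::isdigit (c))
    {
      // Number: a run of digits.
      while (pos < input.size ()
             && std::isdigit (static_cast<unsigned char> (input[pos])))
        ++pos;
      return { tok::number, input.substr (start, pos - start) };
    }
  if (input.compare (pos, 2, ":=") == 0)
    {
      pos += 2;
      return { tok::assign, ":=" };
    }
  // Single-character operator, or an invalid character.
  ++pos;
  if (c == '+' || c == '-' || c == '*' || c == '/')
    return { tok::op, std::string (1, c) };
  return { tok::invalid, std::string (1, c) };
}
```

   Calling `next_token' repeatedly on `seven := one + 2' yields an
identifier, the assignment sign, another identifier, an operator, a
number, and finally the end token.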

* Menu:

* Calc++ --- C++ Calculator::   The specifications
* Calc++ Parsing Driver::       An active parsing context
* Calc++ Parser::               A parser class
* Calc++ Scanner::              A pure C++ Flex scanner
* Calc++ Top Level::            Conducting the band


File: bison.info, Node: Calc++ --- C++ Calculator, Next: Calc++ Parsing Driver, Up: A Complete C++ Example

10.2.1 Calc++ -- C++ Calculator
-------------------------------

Of course the grammar is dedicated to arithmetic: a single expression,
possibly preceded by variable assignments.  An environment containing
possibly predefined variables, such as `one' and `two', is exchanged
with the parser.  An example of valid input follows.

     three := 3
     seven := one + two * three
     seven * seven


File: bison.info, Node: Calc++ Parsing Driver, Next: Calc++ Parser, Prev: Calc++ --- C++ Calculator, Up: A Complete C++ Example

10.2.2 Calc++ Parsing Driver
----------------------------

To support a pure interface with the parser (and the scanner) the
technique of the "parsing context" is convenient: a structure
containing all the data to exchange.  Since, in addition to simply
launching the parser, there are several auxiliary tasks to execute
(open the file for parsing, instantiate the parser etc.), we recommend
transforming the simple parsing context structure into a full-blown
"parsing driver" class.

   The declaration of this driver class, `calc++-driver.hh', is as
follows.  The first part includes the CPP guard and imports the
required standard library components, and the declaration of the parser
class.

     #ifndef CALCXX_DRIVER_HH
     # define CALCXX_DRIVER_HH
     # include <string>
     # include <map>
     # include "calc++-parser.hh"

Then comes the declaration of the scanning function.
Flex expects the
signature of `yylex' to be defined in the macro `YY_DECL', and the C++
parser expects it to be declared.  We can factor both as follows.

     // Announce to Flex the prototype we want for lexing function, ...
     # define YY_DECL                                        \
       yy::calcxx_parser::token_type                         \
       yylex (yy::calcxx_parser::semantic_type* yylval,      \
              yy::calcxx_parser::location_type* yylloc,      \
              calcxx_driver& driver)
     // ... and declare it for the parser's sake.
     YY_DECL;

The `calcxx_driver' class is then declared with its most obvious
members.

     // Conducting the whole scanning and parsing of Calc++.
     class calcxx_driver
     {
     public:
       calcxx_driver ();
       virtual ~calcxx_driver ();

       std::map<std::string, int> variables;

       int result;

To encapsulate the coordination with the Flex scanner, it is useful to
have two member functions to open and close the scanning phase.

       // Handling the scanner.
       void scan_begin ();
       void scan_end ();
       bool trace_scanning;

Similarly for the parser itself.

       // Handling the parser.
       void parse (const std::string& f);
       std::string file;
       bool trace_parsing;

To demonstrate pure handling of parse errors, instead of simply dumping
them on the standard error output, we will pass them to the compiler
driver using the following two member functions.  Finally, we close the
class declaration and CPP guard.

       // Error handling.
       void error (const yy::location& l, const std::string& m);
       void error (const std::string& m);
     };
     #endif // ! CALCXX_DRIVER_HH

   The implementation of the driver is straightforward.  The `parse'
member function deserves some attention.  The `error' functions are
simple stubs; they should actually register the located error messages
and set an error state.

     #include "calc++-driver.hh"
     #include "calc++-parser.hh"

     calcxx_driver::calcxx_driver ()
       : trace_scanning (false), trace_parsing (false)
     {
       variables["one"] = 1;
       variables["two"] = 2;
     }

     calcxx_driver::~calcxx_driver ()
     {
     }

     void
     calcxx_driver::parse (const std::string &f)
     {
       file = f;
       scan_begin ();
       yy::calcxx_parser parser (*this);
       parser.set_debug_level (trace_parsing);
       parser.parse ();
       scan_end ();
     }

     void
     calcxx_driver::error (const yy::location& l, const std::string& m)
     {
       std::cerr << l << ": " << m << std::endl;
     }

     void
     calcxx_driver::error (const std::string& m)
     {
       std::cerr << m << std::endl;
     }


File: bison.info, Node: Calc++ Parser, Next: Calc++ Scanner, Prev: Calc++ Parsing Driver, Up: A Complete C++ Example

10.2.3 Calc++ Parser
--------------------

The parser definition file `calc++-parser.yy' starts by asking for the
C++ LALR(1) skeleton and the creation of the parser header file, and
specifies the name of the parser class.  Because the C++ skeleton
changed several times, it is safer to require the version you designed
the grammar for.

     %skeleton "lalr1.cc"                          /*  -*- C++ -*- */
     %require "2.1a"
     %defines
     %define "parser_class_name" "calcxx_parser"

Then come the declarations/inclusions needed to define the `%union'.
Because the parser uses the parsing driver and vice versa, they cannot
both include each other's header.  Because the driver's header needs
detailed knowledge about the parser class (in particular its inner
types), it is the parser's header which will simply use a forward
declaration of the driver.

     %{
     # include <string>
     class calcxx_driver;
     %}

The driver is passed by reference to the parser and to the scanner.
This provides a simple but effective pure interface, not relying on
global variables.

     // The parsing context.
     %parse-param { calcxx_driver& driver }
     %lex-param   { calcxx_driver& driver }

Then we request the location tracking feature, and initialize the first
location's file name.  Afterwards new locations are computed relative
to the previous locations: the file name will be automatically
propagated.

     %locations
     %initial-action
     {
       // Initialize the initial location.
       @$.begin.filename = @$.end.filename = &driver.file;
     };

Use the two following directives to enable parser tracing and verbose
error messages.

     %debug
     %error-verbose

Semantic values cannot use "real" objects, but only pointers to them.

     // Symbols.
     %union
     {
       int          ival;
       std::string *sval;
     };

The code between `%{' and `%}' after the introduction of the `%union'
is output in the `*.cc' file; it needs detailed knowledge about the
driver.

     %{
     # include "calc++-driver.hh"
     %}

The token numbered as 0 corresponds to end of file; the following line
allows for nicer error messages referring to "end of file" instead of
"$end".  Similarly, user-friendly names are provided for each symbol.
Note that in the scanner the token names are reached through the
parser's `token' type (e.g., `token::ASSIGN'), which avoids name
clashes.

     %token        END      0 "end of file"
     %token        ASSIGN     ":="
     %token <sval> IDENTIFIER "identifier"
     %token <ival> NUMBER     "number"
     %type  <ival> exp        "expression"

To enable memory deallocation during error recovery, use `%destructor'.

     %printer    { debug_stream () << *$$; } "identifier"
     %destructor { delete $$; } "identifier"

     %printer    { debug_stream () << $$; } "number" "expression"

The grammar itself is straightforward.

     %%
     %start unit;
     unit: assignments exp  { driver.result = $2; };

     assignments: assignments assignment {}
                | /* Nothing.  */        {};

     assignment: "identifier" ":=" exp { driver.variables[*$1] = $3; };

     %left '+' '-';
     %left '*' '/';
     exp: exp '+' exp   { $$ = $1 + $3; }
        | exp '-' exp   { $$ = $1 - $3; }
        | exp '*' exp   { $$ = $1 * $3; }
        | exp '/' exp   { $$ = $1 / $3; }
        | "identifier"  { $$ = driver.variables[*$1]; }
        | "number"      { $$ = $1; };
     %%

Finally the `error' member function registers the errors to the driver.

     void
     yy::calcxx_parser::error (const yy::calcxx_parser::location_type& l,
                               const std::string& m)
     {
       driver.error (l, m);
     }


File: bison.info, Node: Calc++ Scanner, Next: Calc++ Top Level, Prev: Calc++ Parser, Up: A Complete C++ Example

10.2.4 Calc++ Scanner
---------------------

The Flex scanner first includes the driver declaration, then the
parser's to get the set of defined tokens.

     %{                                            /* -*- C++ -*- */
     # include <cstdlib>
     # include <errno.h>
     # include <limits.h>
     # include <string>
     # include "calc++-driver.hh"
     # include "calc++-parser.hh"

     /* Work around an incompatibility in flex (at least versions
        2.5.31 through 2.5.33): it generates code that does
        not conform to C89.  See Debian bug 333231
        <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=333231>.  */
     # undef yywrap
     # define yywrap() 1

     /* By default yylex returns int, we use token_type.
        Unfortunately yyterminate by default returns 0, which is
        not of token_type.  */
     #define yyterminate() return token::END
     %}

Because there is no `#include'-like feature, we don't need `yywrap'.
We don't need `unput' either.  We also parse an actual file: this is
not an interactive session with the user.
Finally we enable the scanner
tracing features.

     %option noyywrap nounput batch debug

Abbreviations allow for more readable rules.

     id    [a-zA-Z][a-zA-Z_0-9]*
     int   [0-9]+
     blank [ \t]

The following paragraph suffices to track locations accurately.  Each
time `yylex' is invoked, the begin position is moved onto the end
position.  Then when a pattern is matched, the end position is advanced
by its width.  In case it matched ends of lines, the end cursor is
adjusted, and each time blanks are matched, the begin cursor is moved
onto the end cursor to effectively ignore the blanks preceding tokens.
Comments would be treated equally.

     %{
     # define YY_USER_ACTION  yylloc->columns (yyleng);
     %}
     %%
     %{
       yylloc->step ();
     %}
     {blank}+   yylloc->step ();
     [\n]+      yylloc->lines (yyleng); yylloc->step ();

The rules are simple, just note the use of the driver to report errors.
It is convenient to use a typedef to shorten
`yy::calcxx_parser::token::identifier' into `token::identifier' for
instance.

     %{
       typedef yy::calcxx_parser::token token;
     %}
                /* Convert ints to the actual type of tokens.  */
     [-+*/]     return yy::calcxx_parser::token_type (yytext[0]);
     ":="       return token::ASSIGN;
     {int}      {
       errno = 0;
       long n = strtol (yytext, NULL, 10);
       if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE))
         driver.error (*yylloc, "integer is out of range");
       yylval->ival = n;
       return token::NUMBER;
     }
     {id}       yylval->sval = new std::string (yytext); return token::IDENTIFIER;
     .          driver.error (*yylloc, "invalid character");
     %%

Finally, because the driver's scanner-related member functions depend
on the scanner's data, it is simpler to implement them in this file.

     void
     calcxx_driver::scan_begin ()
     {
       yy_flex_debug = trace_scanning;
       if (!(yyin = fopen (file.c_str (), "r")))
         error (std::string ("cannot open ") + file);
     }

     void
     calcxx_driver::scan_end ()
     {
       fclose (yyin);
     }


File: bison.info, Node: Calc++ Top Level, Prev: Calc++ Scanner, Up: A Complete C++ Example

10.2.5 Calc++ Top Level
-----------------------

The top level file, `calc++.cc', poses no problem.

     #include <iostream>
     #include "calc++-driver.hh"

     int
     main (int argc, char *argv[])
     {
       calcxx_driver driver;
       for (++argv; argv[0]; ++argv)
         if (*argv == std::string ("-p"))
           driver.trace_parsing = true;
         else if (*argv == std::string ("-s"))
           driver.trace_scanning = true;
         else
           {
             driver.parse (*argv);
             std::cout << driver.result << std::endl;
           }
     }


File: bison.info, Node: FAQ, Next: Table of Symbols, Prev: C++ Language Interface, Up: Top

11 Frequently Asked Questions
*****************************

Several questions about Bison come up occasionally.  Here some of them
are addressed.

* Menu:

* Memory Exhausted::           Breaking the Stack Limits
* How Can I Reset the Parser:: `yyparse' Keeps some State
* Strings are Destroyed::      `yylval' Loses Track of Strings
* Implementing Gotos/Loops::   Control Flow in the Calculator
* Multiple start-symbols::     Factoring closely related grammars
* Secure?  Conform?::          Is Bison POSIX safe?
* I can't build Bison::        Troubleshooting
* Where can I find help?::     Troubleshooting
* Bug Reports::                Troublereporting
* Other Languages::            Parsers in Java and others
* Beta Testing::               Experimenting with development versions
* Mailing Lists::              Meeting other Bison users


File: bison.info, Node: Memory Exhausted, Next: How Can I Reset the Parser, Up: FAQ

11.1 Memory Exhausted
=====================

     My parser returns with an error and a `memory exhausted'
     message.  What can I do?

   This question is already addressed elsewhere, *Note Recursive Rules:
Recursion.


File: bison.info, Node: How Can I Reset the Parser, Next: Strings are Destroyed, Prev: Memory Exhausted, Up: FAQ

11.2 How Can I Reset the Parser
===============================

The following phenomenon has several symptoms, resulting in the
following typical questions:

     I invoke `yyparse' several times, and on correct input it works
     properly; but when a parse error is found, all the other calls fail
     too.  How can I reset the error flag of `yyparse'?

or

     My parser includes support for an `#include'-like feature, in
     which case I run `yyparse' from `yyparse'.  This fails
     although I did specify I needed a `%pure-parser'.

   These problems typically come not from Bison itself, but from
Lex-generated scanners.  Because these scanners use large buffers for
speed, they might not notice a change of input file.  As a
demonstration, consider the following source file, `first-line.l':


     %{
     #include <stdio.h>
     #include <stdlib.h>
     %}
     %%
     .*\n    ECHO; return 1;
     %%
     int
     yyparse (char const *file)
     {
       yyin = fopen (file, "r");
       if (!yyin)
         exit (2);
       /* One token only.
       */
       yylex ();
       if (fclose (yyin) != 0)
         exit (3);
       return 0;
     }

     int
     main (void)
     {
       yyparse ("input");
       yyparse ("input");
       return 0;
     }

If the file `input' contains


     input:1: Hello,
     input:2: World!

then instead of getting the first line twice, you get:

     $ flex -ofirst-line.c first-line.l
     $ gcc  -ofirst-line   first-line.c -ll
     $ ./first-line
     input:1: Hello,
     input:2: World!

   Therefore, whenever you change `yyin', you must tell the
Lex-generated scanner to discard its current buffer and switch to the
new one.  This depends upon your implementation of Lex; see its
documentation for more.  For Flex, it suffices to call
`YY_FLUSH_BUFFER' after each change to `yyin'.  If your Flex-generated
scanner needs to read from several input streams to handle features
like include files, you might consider using Flex functions like
`yy_switch_to_buffer' that manipulate multiple input buffers.

   If your Flex-generated scanner uses start conditions (*note Start
conditions: (flex)Start conditions.), you might also want to reset the
scanner's state, i.e., go back to the initial start condition, through
a call to `BEGIN (0)'.


File: bison.info, Node: Strings are Destroyed, Next: Implementing Gotos/Loops, Prev: How Can I Reset the Parser, Up: FAQ

11.3 Strings are Destroyed
==========================

     My parser seems to destroy old strings, or maybe it loses track of
     them.  Instead of reporting `"foo", "bar"', it reports
     `"bar", "bar"', or even `"foo\nbar", "bar"'.

   This error is probably the single most frequent "bug report" sent to
Bison lists, but it merely reflects a misunderstanding of the role
of the scanner.
Consider the following Lex code:


     %{
     #include <stdio.h>
     char *yylval = NULL;
     %}
     %%
     .*    yylval = yytext; return 1;
     \n    /* IGNORE */
     %%
     int
     main ()
     {
       /* Similar to using $1, $2 in a Bison action.  */
       char *fst = (yylex (), yylval);
       char *snd = (yylex (), yylval);
       printf ("\"%s\", \"%s\"\n", fst, snd);
       return 0;
     }

   If you compile and run this code, you get:

     $ flex -osplit-lines.c split-lines.l
     $ gcc  -osplit-lines   split-lines.c -ll
     $ printf 'one\ntwo\n' | ./split-lines
     "one
     two", "two"

This is because `yytext' is a buffer provided for _reading_ in the
action; if you want to keep its contents, you have to duplicate them
(e.g., using `strdup').  Note that the output may depend on how your
implementation of Lex handles `yytext'.  For instance, when given the
Lex compatibility option `-l' (which triggers the option `%array') Flex
generates a different behavior:

     $ flex -l -osplit-lines.c split-lines.l
     $ gcc     -osplit-lines   split-lines.c -ll
     $ printf 'one\ntwo\n' | ./split-lines
     "two", "two"


File: bison.info, Node: Implementing Gotos/Loops, Next: Multiple start-symbols, Prev: Strings are Destroyed, Up: FAQ

11.4 Implementing Gotos/Loops
=============================

     My simple calculator supports variables, assignments, and functions,
     but how can I implement gotos, or loops?

   Although very pedagogical, the examples included in the document blur
the distinction between the parser--whose job is to recover the
structure of a text and to transmit it to subsequent modules of the
program--and the processing (such as the execution) of this structure.
This works well with so-called straight line programs, i.e., precisely
those that have a straightforward execution model: execute simple
instructions one after the other.

If you want a richer model, you will probably need to use the parser
to construct a tree that does represent the structure it has recovered;
this tree is usually called the "abstract syntax tree", or "AST" for
short. Then, walking through this tree, traversing it in various ways,
will enable treatments such as its execution or its translation, which
will result in an interpreter or a compiler.

This topic is well beyond the scope of this manual, and the reader is
invited to consult the dedicated literature.


File: bison.info,  Node: Multiple start-symbols,  Next: Secure? Conform?,  Prev: Implementing Gotos/Loops,  Up: FAQ

11.5 Multiple start-symbols
===========================

     I have several closely related grammars, and I would like to share their
     implementations. In fact, I could use a single grammar but with
     multiple entry points.

Bison does not support multiple start-symbols, but there is a very
simple means to simulate them. If `foo' and `bar' are the two pseudo
start-symbols, then introduce two new tokens, say `START_FOO' and
`START_BAR', and use them as switches from the real start-symbol:

     %token START_FOO START_BAR;
     %start start;
     start: START_FOO foo
          | START_BAR bar;

These tokens prevent the introduction of new conflicts. As far as
the parser goes, that is all that is needed.

Now the difficult part is ensuring that the scanner will send these
tokens first. If your scanner is hand-written, that should be
straightforward. If your scanner is generated by Lex, then there is a
simple means to do it: recall that anything between `%{ ... %}' after
the first `%%' is copied verbatim to the top of the generated `yylex'
function.
Make sure a variable `start_token' is available in the
scanner (e.g., a global variable or using `%lex-param' etc.), and use
the following:

     /* Prologue. */
     %%
     %{
       if (start_token)
         {
           int t = start_token;
           start_token = 0;
           return t;
         }
     %}
     /* The rules. */


File: bison.info,  Node: Secure? Conform?,  Next: I can't build Bison,  Prev: Multiple start-symbols,  Up: FAQ

11.6 Secure? Conform?
======================

     Is Bison secure? Does it conform to POSIX?

If you're looking for a guarantee or certification, we don't provide
it. However, Bison is intended to be a reliable program that conforms
to the POSIX specification for Yacc. If you run into problems, please
send us a bug report.


File: bison.info,  Node: I can't build Bison,  Next: Where can I find help?,  Prev: Secure? Conform?,  Up: FAQ

11.7 I can't build Bison
========================

     I can't build Bison because `make' complains that
     `msgfmt' is not found.
     What should I do?

Like most GNU packages with internationalization support, Bison turns
that feature on by default. If you have problems building in the
`po' subdirectory, it indicates that your system's internationalization
support is lacking. You can re-configure Bison with `--disable-nls' to
turn off this support, or you can install GNU gettext from
`ftp://ftp.gnu.org/gnu/gettext/' and re-configure Bison. See the file
`ABOUT-NLS' for more information.


File: bison.info,  Node: Where can I find help?,  Next: Bug Reports,  Prev: I can't build Bison,  Up: FAQ

11.8 Where can I find help?
===========================

     I'm having trouble using Bison. Where can I find help?

First, read this fine manual. Beyond that, you can send mail to
<help-bison@gnu.org>.
This mailing list is intended to be populated
with people who are willing to answer questions about using and
installing Bison. Please keep in mind that (most of) the people on the
list have aspects of their lives which are not related to Bison (!), so
you may not receive an answer to your question right away. This can be
frustrating, but please try not to honk them off; remember that any
help they provide is purely voluntary and out of the kindness of their
hearts.


File: bison.info,  Node: Bug Reports,  Next: Other Languages,  Prev: Where can I find help?,  Up: FAQ

11.9 Bug Reports
================

     I found a bug. What should I include in the bug report?

Before you send a bug report, make sure you are using the latest
version. Check `ftp://ftp.gnu.org/pub/gnu/bison/' or one of its
mirrors. Be sure to include the version number in your bug report. If
the bug is present in the latest version but not in a previous version,
try to determine the most recent version which did not contain the bug.

If the bug is parser-related, you should include the smallest grammar
you can which demonstrates the bug. The grammar file should also be
complete (i.e., I should be able to run it through Bison without having
to edit or add anything). The smaller and simpler the grammar, the
easier it will be to fix the bug.

Include information about your compilation environment, including
your operating system's name and version and your compiler's name and
version. If you have trouble compiling, you should also include a
transcript of the build session, starting with the invocation of
`configure'. Depending on the nature of the bug, you may be asked to
send additional files as well (such as `config.h' or `config.cache').

Patches are most welcome, but not required.
That is, do not
hesitate to send a bug report just because you cannot provide a fix.

Send bug reports to <bug-bison@gnu.org>.


File: bison.info,  Node: Other Languages,  Next: Beta Testing,  Prev: Bug Reports,  Up: FAQ

11.10 Other Languages
=====================

     Will Bison ever have C++ support? How about Java or INSERT YOUR
     FAVORITE LANGUAGE HERE?

C++ support is there now, and is documented. We'd love to add other
languages; contributions are welcome.


File: bison.info,  Node: Beta Testing,  Next: Mailing Lists,  Prev: Other Languages,  Up: FAQ

11.11 Beta Testing
==================

     What is involved in being a beta tester?

It's not terribly involved. Basically, you would download a test
release, compile it, and use it to build and run a parser or two. After
that, you would submit either a bug report or a message saying that
everything is okay. It is important to report successes as well as
failures because test releases eventually become mainstream releases,
but only if they are adequately tested. If no one tests, development is
essentially halted.

Beta testers are particularly needed for operating systems to which
the developers do not have easy access. They currently have easy
access to recent GNU/Linux and Solaris versions. Reports about other
operating systems are especially welcome.


File: bison.info,  Node: Mailing Lists,  Prev: Beta Testing,  Up: FAQ

11.12 Mailing Lists
===================

     How do I join the help-bison and bug-bison mailing lists?

See `http://lists.gnu.org/'.


File: bison.info,  Node: Table of Symbols,  Next: Glossary,  Prev: FAQ,  Up: Top

Appendix A Bison Symbols
************************

 -- Variable: @$
     In an action, the location of the left-hand side of the rule.
     *Note Locations Overview: Locations.

 -- Variable: @N
     In an action, the location of the N-th symbol of the right-hand
     side of the rule. *Note Locations Overview: Locations.

 -- Variable: $$
     In an action, the semantic value of the left-hand side of the rule.
     *Note Actions::.

 -- Variable: $N
     In an action, the semantic value of the N-th symbol of the
     right-hand side of the rule. *Note Actions::.

 -- Delimiter: %%
     Delimiter used to separate the grammar rule section from the Bison
     declarations section or the epilogue. *Note The Overall Layout of
     a Bison Grammar: Grammar Layout.

 -- Delimiter: %{CODE%}
     All code listed between `%{' and `%}' is copied directly to the
     output file uninterpreted. Such code forms the prologue of the
     input file. *Note Outline of a Bison Grammar: Grammar Outline.

 -- Construct: /*...*/
     Comment delimiters, as in C.

 -- Delimiter: :
     Separates a rule's result from its components. *Note Syntax of
     Grammar Rules: Rules.

 -- Delimiter: ;
     Terminates a rule. *Note Syntax of Grammar Rules: Rules.

 -- Delimiter: |
     Separates alternate rules for the same result nonterminal. *Note
     Syntax of Grammar Rules: Rules.

 -- Symbol: $accept
     The predefined nonterminal whose only rule is `$accept: START
     $end', where START is the start symbol. *Note The Start-Symbol:
     Start Decl. It cannot be used in the grammar.

 -- Directive: %debug
     Equip the parser for debugging. *Note Decl Summary::.

 -- Directive: %defines
     Bison declaration to create a header file meant for the scanner.
     *Note Decl Summary::.

 -- Directive: %destructor
     Specify how the parser should reclaim the memory associated with
     discarded symbols. *Note Freeing Discarded Symbols: Destructor
     Decl.

 -- Directive: %dprec
     Bison declaration to assign a precedence to a rule that is used at
     parse time to resolve reduce/reduce conflicts. *Note Writing GLR
     Parsers: GLR Parsers.

 -- Symbol: $end
     The predefined token marking the end of the token stream. It
     cannot be used in the grammar.

 -- Symbol: error
     A token name reserved for error recovery. This token may be used
     in grammar rules so as to allow the Bison parser to recognize an
     error in the grammar without halting the process. In effect, a
     sentence containing an error may be recognized as valid. On a
     syntax error, the token `error' becomes the current look-ahead
     token. Actions corresponding to `error' are then executed, and
     the look-ahead token is reset to the token that originally caused
     the violation. *Note Error Recovery::.

 -- Directive: %error-verbose
     Bison declaration to request verbose, specific error message
     strings when `yyerror' is called.

 -- Directive: %file-prefix="PREFIX"
     Bison declaration to set the prefix of the output files. *Note
     Decl Summary::.

 -- Directive: %glr-parser
     Bison declaration to produce a GLR parser. *Note Writing GLR
     Parsers: GLR Parsers.

 -- Directive: %initial-action
     Run user code before parsing. *Note Performing Actions before
     Parsing: Initial Action Decl.

 -- Directive: %left
     Bison declaration to assign left associativity to token(s). *Note
     Operator Precedence: Precedence Decl.

 -- Directive: %lex-param {ARGUMENT-DECLARATION}
     Bison declaration to specify an additional parameter that
     `yylex' should accept. *Note Calling Conventions for Pure
     Parsers: Pure Calling.

 -- Directive: %merge
     Bison declaration to assign a merging function to a rule.
     If there is a reduce/reduce conflict with a rule having the same
     merging function, the function is applied to the two semantic
     values to get a single result. *Note Writing GLR Parsers: GLR
     Parsers.

 -- Directive: %name-prefix="PREFIX"
     Bison declaration to rename the external symbols. *Note Decl
     Summary::.

 -- Directive: %no-lines
     Bison declaration to avoid generating `#line' directives in the
     parser file. *Note Decl Summary::.

 -- Directive: %nonassoc
     Bison declaration to assign nonassociativity to token(s). *Note
     Operator Precedence: Precedence Decl.

 -- Directive: %output="FILE"
     Bison declaration to set the name of the parser file. *Note Decl
     Summary::.

 -- Directive: %parse-param {ARGUMENT-DECLARATION}
     Bison declaration to specify an additional parameter that
     `yyparse' should accept. *Note The Parser Function `yyparse':
     Parser Function.

 -- Directive: %prec
     Bison declaration to assign a precedence to a specific rule.
     *Note Context-Dependent Precedence: Contextual Precedence.

 -- Directive: %pure-parser
     Bison declaration to request a pure (reentrant) parser. *Note A
     Pure (Reentrant) Parser: Pure Decl.

 -- Directive: %require "VERSION"
     Require version VERSION or higher of Bison. *Note Require a
     Version of Bison: Require Decl.

 -- Directive: %right
     Bison declaration to assign right associativity to token(s).
     *Note Operator Precedence: Precedence Decl.

 -- Directive: %start
     Bison declaration to specify the start symbol. *Note The
     Start-Symbol: Start Decl.

 -- Directive: %token
     Bison declaration to declare token(s) without specifying
     precedence. *Note Token Type Names: Token Decl.

 -- Directive: %token-table
     Bison declaration to include a token name table in the parser file.
     *Note Decl Summary::.

 -- Directive: %type
     Bison declaration to declare nonterminals. *Note Nonterminal
     Symbols: Type Decl.

 -- Symbol: $undefined
     The predefined token onto which all undefined values returned by
     `yylex' are mapped. It cannot be used in the grammar; rather, use
     `error'.

 -- Directive: %union
     Bison declaration to specify several possible data types for
     semantic values. *Note The Collection of Value Types: Union Decl.

 -- Macro: YYABORT
     Macro to pretend that an unrecoverable syntax error has occurred,
     by making `yyparse' return 1 immediately. The error reporting
     function `yyerror' is not called. *Note The Parser Function
     `yyparse': Parser Function.

 -- Macro: YYACCEPT
     Macro to pretend that a complete utterance of the language has been
     read, by making `yyparse' return 0 immediately. *Note The Parser
     Function `yyparse': Parser Function.

 -- Macro: YYBACKUP
     Macro to discard a value from the parser stack and fake a
     look-ahead token. *Note Special Features for Use in Actions:
     Action Features.

 -- Variable: yychar
     External integer variable that contains the integer value of the
     look-ahead token. (In a pure parser, it is a local variable within
     `yyparse'.) Error-recovery rule actions may examine this variable.
     *Note Special Features for Use in Actions: Action Features.

 -- Macro: yyclearin
     Macro used in error-recovery rule actions. It clears the previous
     look-ahead token. *Note Error Recovery::.

 -- Macro: YYDEBUG
     Macro to define to equip the parser with tracing code. *Note
     Tracing Your Parser: Tracing.

 -- Variable: yydebug
     External integer variable set to zero by default. If `yydebug' is
     given a nonzero value, the parser will output information on input
     symbols and parser action. *Note Tracing Your Parser: Tracing.

 -- Macro: yyerrok
     Macro to cause parser to recover immediately to its normal mode
     after a syntax error. *Note Error Recovery::.

 -- Macro: YYERROR
     Macro to pretend that a syntax error has just been detected: call
     `yyerror' and then perform normal error recovery if possible
     (*note Error Recovery::), or (if recovery is impossible) make
     `yyparse' return 1. *Note Error Recovery::.

 -- Function: yyerror
     User-supplied function to be called by `yyparse' on error. *Note
     The Error Reporting Function `yyerror': Error Reporting.

 -- Macro: YYERROR_VERBOSE
     An obsolete macro that you define with `#define' in the prologue
     to request verbose, specific error message strings when `yyerror'
     is called. It doesn't matter what definition you use for
     `YYERROR_VERBOSE', just whether you define it. Using
     `%error-verbose' is preferred.

 -- Macro: YYINITDEPTH
     Macro for specifying the initial size of the parser stack. *Note
     Memory Management::.

 -- Function: yylex
     User-supplied lexical analyzer function, called with no arguments
     to get the next token. *Note The Lexical Analyzer Function
     `yylex': Lexical.

 -- Macro: YYLEX_PARAM
     An obsolete macro for specifying an extra argument (or list of
     extra arguments) for `yyparse' to pass to `yylex'. The use of this
     macro is deprecated, and is supported only for Yacc-like parsers.
     *Note Calling Conventions for Pure Parsers: Pure Calling.

 -- Variable: yylloc
     External variable in which `yylex' should place the line and column
     numbers associated with a token. (In a pure parser, it is a local
     variable within `yyparse', and its address is passed to `yylex'.)
     You can ignore this variable if you don't use the `@' feature in
     the grammar actions. *Note Textual Locations of Tokens: Token
     Locations.
     In semantic actions, it stores the location of the
     look-ahead token. *Note Actions and Locations: Actions and
     Locations.

 -- Type: YYLTYPE
     Data type of `yylloc'; by default, a structure with four members.
     *Note Data Types of Locations: Location Type.

 -- Variable: yylval
     External variable in which `yylex' should place the semantic value
     associated with a token. (In a pure parser, it is a local
     variable within `yyparse', and its address is passed to `yylex'.)
     *Note Semantic Values of Tokens: Token Values. In semantic
     actions, it stores the semantic value of the look-ahead token.
     *Note Actions: Actions.

 -- Macro: YYMAXDEPTH
     Macro for specifying the maximum size of the parser stack. *Note
     Memory Management::.

 -- Variable: yynerrs
     Global variable which Bison increments each time it reports a
     syntax error. (In a pure parser, it is a local variable within
     `yyparse'.) *Note The Error Reporting Function `yyerror': Error
     Reporting.

 -- Function: yyparse
     The parser function produced by Bison; call this function to start
     parsing. *Note The Parser Function `yyparse': Parser Function.

 -- Macro: YYPARSE_PARAM
     An obsolete macro for specifying the name of a parameter that
     `yyparse' should accept. The use of this macro is deprecated, and
     is supported only for Yacc-like parsers. *Note Calling
     Conventions for Pure Parsers: Pure Calling.

 -- Macro: YYRECOVERING
     The expression `YYRECOVERING ()' yields 1 when the parser is
     recovering from a syntax error, and 0 otherwise. *Note Special
     Features for Use in Actions: Action Features.

 -- Macro: YYSTACK_USE_ALLOCA
     Macro used to control the use of `alloca' when the C LALR(1)
     parser needs to extend its stacks. If defined to 0, the parser
     will use `malloc' to extend its stacks. If defined to 1, the
     parser will use `alloca'.
     Values other than 0 and 1 are reserved
     for future Bison extensions. If not defined, `YYSTACK_USE_ALLOCA'
     defaults to 0.

     In the all-too-common case where your code may run on a host with a
     limited stack and with unreliable stack-overflow checking, you
     should set `YYMAXDEPTH' to a value that cannot possibly result in
     unchecked stack overflow on any of your target hosts when `alloca'
     is called. You can inspect the code that Bison generates in order
     to determine the proper numeric values. This will require some
     expertise in low-level implementation details.

 -- Type: YYSTYPE
     Data type of semantic values; `int' by default. *Note Data Types
     of Semantic Values: Value Type.


File: bison.info,  Node: Glossary,  Next: Copying This Manual,  Prev: Table of Symbols,  Up: Top

Appendix B Glossary
*******************

Backus-Naur Form (BNF; also called "Backus Normal Form")
     Formal method of specifying context-free grammars originally
     proposed by John Backus, and slightly improved by Peter Naur in
     his 1960-01-02 committee document contributing to what became the
     Algol 60 report. *Note Languages and Context-Free Grammars:
     Language and Grammar.

Context-free grammars
     Grammars specified as rules that can be applied regardless of
     context. Thus, if there is a rule which says that an integer can
     be used as an expression, integers are allowed _anywhere_ an
     expression is permitted. *Note Languages and Context-Free
     Grammars: Language and Grammar.

Dynamic allocation
     Allocation of memory that occurs during execution, rather than at
     compile time or on entry to a function.

Empty string
     Analogous to the empty set in set theory, the empty string is a
     character string of length zero.

Finite-state stack machine
     A "machine" that has discrete states in which it is said to exist
     at each instant in time. As input to the machine is processed, the
     machine moves from state to state as specified by the logic of the
     machine. In the case of the parser, the input is the language
     being parsed, and the states correspond to various stages in the
     grammar rules. *Note The Bison Parser Algorithm: Algorithm.

Generalized LR (GLR)
     A parsing algorithm that can handle all context-free grammars,
     including those that are not LALR(1). It resolves situations that
     Bison's usual LALR(1) algorithm cannot by effectively splitting
     off multiple parsers, trying all possible parsers, and discarding
     those that fail in the light of additional right context. *Note
     Generalized LR Parsing: Generalized LR Parsing.

Grouping
     A language construct that is (in general) grammatically divisible;
     for example, `expression' or `declaration' in C. *Note Languages
     and Context-Free Grammars: Language and Grammar.

Infix operator
     An arithmetic operator that is placed between the operands on
     which it performs some operation.

Input stream
     A continuous flow of data between devices or programs.

Language construct
     One of the typical usage schemas of the language. For example,
     one of the constructs of the C language is the `if' statement.
     *Note Languages and Context-Free Grammars: Language and Grammar.

Left associativity
     Operators having left associativity are analyzed from left to
     right: `a+b+c' first computes `a+b' and then combines with `c'.
     *Note Operator Precedence: Precedence.

Left recursion
     A rule whose result symbol is also its first component symbol; for
     example, `expseq1 : expseq1 ',' exp;'. *Note Recursive Rules:
     Recursion.

Left-to-right parsing
     Parsing a sentence of a language by analyzing it token by token
     from left to right. *Note The Bison Parser Algorithm: Algorithm.

Lexical analyzer (scanner)
     A function that reads an input stream and returns tokens one by
     one. *Note The Lexical Analyzer Function `yylex': Lexical.

Lexical tie-in
     A flag, set by actions in the grammar rules, which alters the way
     tokens are parsed. *Note Lexical Tie-ins::.

Literal string token
     A token which consists of two or more fixed characters. *Note
     Symbols::.

Look-ahead token
     A token already read but not yet shifted. *Note Look-Ahead
     Tokens: Look-Ahead.

LALR(1)
     The class of context-free grammars that Bison (like most other
     parser generators) can handle; a subset of LR(1). *Note
     Mysterious Reduce/Reduce Conflicts: Mystery Conflicts.

LR(1)
     The class of context-free grammars in which at most one token of
     look-ahead is needed to disambiguate the parsing of any piece of
     input.

Nonterminal symbol
     A grammar symbol standing for a grammatical construct that can be
     expressed through rules in terms of smaller constructs; in other
     words, a construct that is not a token. *Note Symbols::.

Parser
     A function that recognizes valid sentences of a language by
     analyzing the syntax structure of a set of tokens passed to it
     from a lexical analyzer.

Postfix operator
     An arithmetic operator that is placed after the operands upon
     which it performs some operation.

Reduction
     Replacing a string of nonterminals and/or terminals with a single
     nonterminal, according to a grammar rule. *Note The Bison Parser
     Algorithm: Algorithm.

Reentrant
     A reentrant subprogram is a subprogram which can be invoked any
     number of times in parallel, without interference between the
     various invocations. *Note A Pure (Reentrant) Parser: Pure Decl.

Reverse Polish notation
     A language in which all operators are postfix operators.

Right recursion
     A rule whose result symbol is also its last component symbol; for
     example, `expseq1: exp ',' expseq1;'. *Note Recursive Rules:
     Recursion.

Semantics
     In computer languages, the semantics are specified by the actions
     taken for each instance of the language, i.e., the meaning of each
     statement. *Note Defining Language Semantics: Semantics.

Shift
     A parser is said to shift when it makes the choice of analyzing
     further input from the stream rather than reducing immediately some
     already-recognized rule. *Note The Bison Parser Algorithm:
     Algorithm.

Single-character literal
     A single character that is recognized and interpreted as is.
     *Note From Formal Rules to Bison Input: Grammar in Bison.

Start symbol
     The nonterminal symbol that stands for a complete valid utterance
     in the language being parsed. The start symbol is usually listed
     as the first nonterminal symbol in a language specification.
     *Note The Start-Symbol: Start Decl.

Symbol table
     A data structure where symbol names and associated data are stored
     during parsing to allow for recognition and use of existing
     information in repeated uses of a symbol. *Note Multi-function
     Calc::.

Syntax error
     An error encountered during parsing of an input stream due to
     invalid syntax. *Note Error Recovery::.

Token
     A basic, grammatically indivisible unit of a language. The symbol
     that describes a token in the grammar is a terminal symbol.
     The input of the Bison parser is a stream of tokens which comes from
     the lexical analyzer. *Note Symbols::.

Terminal symbol
     A grammar symbol that has no rules in the grammar and therefore is
     grammatically indivisible. The piece of text it represents is a
     token. *Note Languages and Context-Free Grammars: Language and
     Grammar.


File: bison.info,  Node: Copying This Manual,  Next: Index,  Prev: Glossary,  Up: Top

Appendix C Copying This Manual
******************************

* Menu:

* GNU Free Documentation License::  License for copying this manual.


File: bison.info,  Node: GNU Free Documentation License,  Up: Copying This Manual

C.1 GNU Free Documentation License
==================================

                      Version 1.2, November 2002

     Copyright (C) 2000,2001,2002 Free Software Foundation, Inc.
     51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA

     Everyone is permitted to copy and distribute verbatim copies
     of this license document, but changing it is not allowed.

  0. PREAMBLE

     The purpose of this License is to make a manual, textbook, or other
     functional and useful document "free" in the sense of freedom: to
     assure everyone the effective freedom to copy and redistribute it,
     with or without modifying it, either commercially or
     noncommercially. Secondarily, this License preserves for the
     author and publisher a way to get credit for their work, while not
     being considered responsible for modifications made by others.

     This License is a kind of "copyleft", which means that derivative
     works of the document must themselves be free in the same sense.
     It complements the GNU General Public License, which is a copyleft
     license designed for free software.

     We have designed this License in order to use it for manuals for
     free software, because free software needs free documentation: a
     free program should come with manuals providing the same freedoms
     that the software does. But this License is not limited to
     software manuals; it can be used for any textual work, regardless
     of subject matter or whether it is published as a printed book.
     We recommend this License principally for works whose purpose is
     instruction or reference.

  1. APPLICABILITY AND DEFINITIONS

     This License applies to any manual or other work, in any medium,
     that contains a notice placed by the copyright holder saying it
     can be distributed under the terms of this License. Such a notice
     grants a world-wide, royalty-free license, unlimited in duration,
     to use that work under the conditions stated herein. The
     "Document", below, refers to any such manual or work. Any member
     of the public is a licensee, and is addressed as "you". You
     accept the license if you copy, modify or distribute the work in a
     way requiring permission under copyright law.

     A "Modified Version" of the Document means any work containing the
     Document or a portion of it, either copied verbatim, or with
     modifications and/or translated into another language.

     A "Secondary Section" is a named appendix or a front-matter section
     of the Document that deals exclusively with the relationship of the
     publishers or authors of the Document to the Document's overall
     subject (or to related matters) and contains nothing that could
     fall directly within that overall subject. (Thus, if the Document
     is in part a textbook of mathematics, a Secondary Section may not
     explain any mathematics.)
     The relationship could be a matter of
     historical connection with the subject or with related matters, or
     of legal, commercial, philosophical, ethical or political position
     regarding them.

     The "Invariant Sections" are certain Secondary Sections whose
     titles are designated, as being those of Invariant Sections, in
     the notice that says that the Document is released under this
     License. If a section does not fit the above definition of
     Secondary then it is not allowed to be designated as Invariant.
     The Document may contain zero Invariant Sections. If the Document
     does not identify any Invariant Sections then there are none.

     The "Cover Texts" are certain short passages of text that are
     listed, as Front-Cover Texts or Back-Cover Texts, in the notice
     that says that the Document is released under this License. A
     Front-Cover Text may be at most 5 words, and a Back-Cover Text may
     be at most 25 words.

     A "Transparent" copy of the Document means a machine-readable copy,
     represented in a format whose specification is available to the
     general public, that is suitable for revising the document
     straightforwardly with generic text editors or (for images
     composed of pixels) generic paint programs or (for drawings) some
     widely available drawing editor, and that is suitable for input to
     text formatters or for automatic translation to a variety of
     formats suitable for input to text formatters. A copy made in an
     otherwise Transparent file format whose markup, or absence of
     markup, has been arranged to thwart or discourage subsequent
     modification by readers is not Transparent. An image format is
     not Transparent if used for any substantial amount of text. A
     copy that is not "Transparent" is called "Opaque".
8012 8013 Examples of suitable formats for Transparent copies include plain 8014 ASCII without markup, Texinfo input format, LaTeX input format, 8015 SGML or XML using a publicly available DTD, and 8016 standard-conforming simple HTML, PostScript or PDF designed for 8017 human modification. Examples of transparent image formats include 8018 PNG, XCF and JPG. Opaque formats include proprietary formats that 8019 can be read and edited only by proprietary word processors, SGML or 8020 XML for which the DTD and/or processing tools are not generally 8021 available, and the machine-generated HTML, PostScript or PDF 8022 produced by some word processors for output purposes only. 8023 8024 The "Title Page" means, for a printed book, the title page itself, 8025 plus such following pages as are needed to hold, legibly, the 8026 material this License requires to appear in the title page. For 8027 works in formats which do not have any title page as such, "Title 8028 Page" means the text near the most prominent appearance of the 8029 work's title, preceding the beginning of the body of the text. 8030 8031 A section "Entitled XYZ" means a named subunit of the Document 8032 whose title either is precisely XYZ or contains XYZ in parentheses 8033 following text that translates XYZ in another language. (Here XYZ 8034 stands for a specific section name mentioned below, such as 8035 "Acknowledgements", "Dedications", "Endorsements", or "History".) 8036 To "Preserve the Title" of such a section when you modify the 8037 Document means that it remains a section "Entitled XYZ" according 8038 to this definition. 8039 8040 The Document may include Warranty Disclaimers next to the notice 8041 which states that this License applies to the Document. 
These 8042 Warranty Disclaimers are considered to be included by reference in 8043 this License, but only as regards disclaiming warranties: any other 8044 implication that these Warranty Disclaimers may have is void and 8045 has no effect on the meaning of this License. 8046 8047 2. VERBATIM COPYING 8048 8049 You may copy and distribute the Document in any medium, either 8050 commercially or noncommercially, provided that this License, the 8051 copyright notices, and the license notice saying this License 8052 applies to the Document are reproduced in all copies, and that you 8053 add no other conditions whatsoever to those of this License. You 8054 may not use technical measures to obstruct or control the reading 8055 or further copying of the copies you make or distribute. However, 8056 you may accept compensation in exchange for copies. If you 8057 distribute a large enough number of copies you must also follow 8058 the conditions in section 3. 8059 8060 You may also lend copies, under the same conditions stated above, 8061 and you may publicly display copies. 8062 8063 3. COPYING IN QUANTITY 8064 8065 If you publish printed copies (or copies in media that commonly 8066 have printed covers) of the Document, numbering more than 100, and 8067 the Document's license notice requires Cover Texts, you must 8068 enclose the copies in covers that carry, clearly and legibly, all 8069 these Cover Texts: Front-Cover Texts on the front cover, and 8070 Back-Cover Texts on the back cover. Both covers must also clearly 8071 and legibly identify you as the publisher of these copies. The 8072 front cover must present the full title with all words of the 8073 title equally prominent and visible. You may add other material 8074 on the covers in addition. Copying with changes limited to the 8075 covers, as long as they preserve the title of the Document and 8076 satisfy these conditions, can be treated as verbatim copying in 8077 other respects. 
8078 8079 If the required texts for either cover are too voluminous to fit 8080 legibly, you should put the first ones listed (as many as fit 8081 reasonably) on the actual cover, and continue the rest onto 8082 adjacent pages. 8083 8084 If you publish or distribute Opaque copies of the Document 8085 numbering more than 100, you must either include a 8086 machine-readable Transparent copy along with each Opaque copy, or 8087 state in or with each Opaque copy a computer-network location from 8088 which the general network-using public has access to download 8089 using public-standard network protocols a complete Transparent 8090 copy of the Document, free of added material. If you use the 8091 latter option, you must take reasonably prudent steps, when you 8092 begin distribution of Opaque copies in quantity, to ensure that 8093 this Transparent copy will remain thus accessible at the stated 8094 location until at least one year after the last time you 8095 distribute an Opaque copy (directly or through your agents or 8096 retailers) of that edition to the public. 8097 8098 It is requested, but not required, that you contact the authors of 8099 the Document well before redistributing any large number of 8100 copies, to give them a chance to provide you with an updated 8101 version of the Document. 8102 8103 4. MODIFICATIONS 8104 8105 You may copy and distribute a Modified Version of the Document 8106 under the conditions of sections 2 and 3 above, provided that you 8107 release the Modified Version under precisely this License, with 8108 the Modified Version filling the role of the Document, thus 8109 licensing distribution and modification of the Modified Version to 8110 whoever possesses a copy of it. In addition, you must do these 8111 things in the Modified Version: 8112 8113 A. 
Use in the Title Page (and on the covers, if any) a title 8114 distinct from that of the Document, and from those of 8115 previous versions (which should, if there were any, be listed 8116 in the History section of the Document). You may use the 8117 same title as a previous version if the original publisher of 8118 that version gives permission. 8119 8120 B. List on the Title Page, as authors, one or more persons or 8121 entities responsible for authorship of the modifications in 8122 the Modified Version, together with at least five of the 8123 principal authors of the Document (all of its principal 8124 authors, if it has fewer than five), unless they release you 8125 from this requirement. 8126 8127 C. State on the Title page the name of the publisher of the 8128 Modified Version, as the publisher. 8129 8130 D. Preserve all the copyright notices of the Document. 8131 8132 E. Add an appropriate copyright notice for your modifications 8133 adjacent to the other copyright notices. 8134 8135 F. Include, immediately after the copyright notices, a license 8136 notice giving the public permission to use the Modified 8137 Version under the terms of this License, in the form shown in 8138 the Addendum below. 8139 8140 G. Preserve in that license notice the full lists of Invariant 8141 Sections and required Cover Texts given in the Document's 8142 license notice. 8143 8144 H. Include an unaltered copy of this License. 8145 8146 I. Preserve the section Entitled "History", Preserve its Title, 8147 and add to it an item stating at least the title, year, new 8148 authors, and publisher of the Modified Version as given on 8149 the Title Page. If there is no section Entitled "History" in 8150 the Document, create one stating the title, year, authors, 8151 and publisher of the Document as given on its Title Page, 8152 then add an item describing the Modified Version as stated in 8153 the previous sentence. 8154 8155 J. 
Preserve the network location, if any, given in the Document 8156 for public access to a Transparent copy of the Document, and 8157 likewise the network locations given in the Document for 8158 previous versions it was based on. These may be placed in 8159 the "History" section. You may omit a network location for a 8160 work that was published at least four years before the 8161 Document itself, or if the original publisher of the version 8162 it refers to gives permission. 8163 8164 K. For any section Entitled "Acknowledgements" or "Dedications", 8165 Preserve the Title of the section, and preserve in the 8166 section all the substance and tone of each of the contributor 8167 acknowledgements and/or dedications given therein. 8168 8169 L. Preserve all the Invariant Sections of the Document, 8170 unaltered in their text and in their titles. Section numbers 8171 or the equivalent are not considered part of the section 8172 titles. 8173 8174 M. Delete any section Entitled "Endorsements". Such a section 8175 may not be included in the Modified Version. 8176 8177 N. Do not retitle any existing section to be Entitled 8178 "Endorsements" or to conflict in title with any Invariant 8179 Section. 8180 8181 O. Preserve any Warranty Disclaimers. 8182 8183 If the Modified Version includes new front-matter sections or 8184 appendices that qualify as Secondary Sections and contain no 8185 material copied from the Document, you may at your option 8186 designate some or all of these sections as invariant. To do this, 8187 add their titles to the list of Invariant Sections in the Modified 8188 Version's license notice. These titles must be distinct from any 8189 other section titles. 
8190 8191 You may add a section Entitled "Endorsements", provided it contains 8192 nothing but endorsements of your Modified Version by various 8193 parties--for example, statements of peer review or that the text 8194 has been approved by an organization as the authoritative 8195 definition of a standard. 8196 8197 You may add a passage of up to five words as a Front-Cover Text, 8198 and a passage of up to 25 words as a Back-Cover Text, to the end 8199 of the list of Cover Texts in the Modified Version. Only one 8200 passage of Front-Cover Text and one of Back-Cover Text may be 8201 added by (or through arrangements made by) any one entity. If the 8202 Document already includes a cover text for the same cover, 8203 previously added by you or by arrangement made by the same entity 8204 you are acting on behalf of, you may not add another; but you may 8205 replace the old one, on explicit permission from the previous 8206 publisher that added the old one. 8207 8208 The author(s) and publisher(s) of the Document do not by this 8209 License give permission to use their names for publicity for or to 8210 assert or imply endorsement of any Modified Version. 8211 8212 5. COMBINING DOCUMENTS 8213 8214 You may combine the Document with other documents released under 8215 this License, under the terms defined in section 4 above for 8216 modified versions, provided that you include in the combination 8217 all of the Invariant Sections of all of the original documents, 8218 unmodified, and list them all as Invariant Sections of your 8219 combined work in its license notice, and that you preserve all 8220 their Warranty Disclaimers. 8221 8222 The combined work need only contain one copy of this License, and 8223 multiple identical Invariant Sections may be replaced with a single 8224 copy. 
If there are multiple Invariant Sections with the same name 8225 but different contents, make the title of each such section unique 8226 by adding at the end of it, in parentheses, the name of the 8227 original author or publisher of that section if known, or else a 8228 unique number. Make the same adjustment to the section titles in 8229 the list of Invariant Sections in the license notice of the 8230 combined work. 8231 8232 In the combination, you must combine any sections Entitled 8233 "History" in the various original documents, forming one section 8234 Entitled "History"; likewise combine any sections Entitled 8235 "Acknowledgements", and any sections Entitled "Dedications". You 8236 must delete all sections Entitled "Endorsements." 8237 8238 6. COLLECTIONS OF DOCUMENTS 8239 8240 You may make a collection consisting of the Document and other 8241 documents released under this License, and replace the individual 8242 copies of this License in the various documents with a single copy 8243 that is included in the collection, provided that you follow the 8244 rules of this License for verbatim copying of each of the 8245 documents in all other respects. 8246 8247 You may extract a single document from such a collection, and 8248 distribute it individually under this License, provided you insert 8249 a copy of this License into the extracted document, and follow 8250 this License in all other respects regarding verbatim copying of 8251 that document. 8252 8253 7. AGGREGATION WITH INDEPENDENT WORKS 8254 8255 A compilation of the Document or its derivatives with other 8256 separate and independent documents or works, in or on a volume of 8257 a storage or distribution medium, is called an "aggregate" if the 8258 copyright resulting from the compilation is not used to limit the 8259 legal rights of the compilation's users beyond what the individual 8260 works permit. 
When the Document is included in an aggregate, this 8261 License does not apply to the other works in the aggregate which 8262 are not themselves derivative works of the Document. 8263 8264 If the Cover Text requirement of section 3 is applicable to these 8265 copies of the Document, then if the Document is less than one half 8266 of the entire aggregate, the Document's Cover Texts may be placed 8267 on covers that bracket the Document within the aggregate, or the 8268 electronic equivalent of covers if the Document is in electronic 8269 form. Otherwise they must appear on printed covers that bracket 8270 the whole aggregate. 8271 8272 8. TRANSLATION 8273 8274 Translation is considered a kind of modification, so you may 8275 distribute translations of the Document under the terms of section 8276 4. Replacing Invariant Sections with translations requires special 8277 permission from their copyright holders, but you may include 8278 translations of some or all Invariant Sections in addition to the 8279 original versions of these Invariant Sections. You may include a 8280 translation of this License, and all the license notices in the 8281 Document, and any Warranty Disclaimers, provided that you also 8282 include the original English version of this License and the 8283 original versions of those notices and disclaimers. In case of a 8284 disagreement between the translation and the original version of 8285 this License or a notice or disclaimer, the original version will 8286 prevail. 8287 8288 If a section in the Document is Entitled "Acknowledgements", 8289 "Dedications", or "History", the requirement (section 4) to 8290 Preserve its Title (section 1) will typically require changing the 8291 actual title. 8292 8293 9. TERMINATION 8294 8295 You may not copy, modify, sublicense, or distribute the Document 8296 except as expressly provided for under this License. 
Any other 8297 attempt to copy, modify, sublicense or distribute the Document is 8298 void, and will automatically terminate your rights under this 8299 License. However, parties who have received copies, or rights, 8300 from you under this License will not have their licenses 8301 terminated so long as such parties remain in full compliance. 8302 8303 10. FUTURE REVISIONS OF THIS LICENSE 8304 8305 The Free Software Foundation may publish new, revised versions of 8306 the GNU Free Documentation License from time to time. Such new 8307 versions will be similar in spirit to the present version, but may 8308 differ in detail to address new problems or concerns. See 8309 `http://www.gnu.org/copyleft/'. 8310 8311 Each version of the License is given a distinguishing version 8312 number. If the Document specifies that a particular numbered 8313 version of this License "or any later version" applies to it, you 8314 have the option of following the terms and conditions either of 8315 that specified version or of any later version that has been 8316 published (not as a draft) by the Free Software Foundation. If 8317 the Document does not specify a version number of this License, 8318 you may choose any version ever published (not as a draft) by the 8319 Free Software Foundation. 8320 8321 C.1.1 ADDENDUM: How to use this License for your documents 8322 ---------------------------------------------------------- 8323 8324 To use this License in a document you have written, include a copy of 8325 the License in the document and put the following copyright and license 8326 notices just after the title page: 8327 8328 Copyright (C) YEAR YOUR NAME. 8329 Permission is granted to copy, distribute and/or modify this document 8330 under the terms of the GNU Free Documentation License, Version 1.2 8331 or any later version published by the Free Software Foundation; 8332 with no Invariant Sections, no Front-Cover Texts, and no Back-Cover 8333 Texts. 
A copy of the license is included in the section entitled ``GNU 8334 Free Documentation License''. 8335 8336 If you have Invariant Sections, Front-Cover Texts and Back-Cover 8337 Texts, replace the "with...Texts." line with this: 8338 8339 with the Invariant Sections being LIST THEIR TITLES, with 8340 the Front-Cover Texts being LIST, and with the Back-Cover Texts 8341 being LIST. 8342 8343 If you have Invariant Sections without Cover Texts, or some other 8344 combination of the three, merge those two alternatives to suit the 8345 situation. 8346 8347 If your document contains nontrivial examples of program code, we 8348 recommend releasing these examples in parallel under your choice of 8349 free software license, such as the GNU General Public License, to 8350 permit their use in free software. 8351 8352 8353 File: bison.info, Node: Index, Prev: Copying This Manual, Up: Top 8354 8355 Index 8356 ***** 8357 8358 [index] 8359 * Menu: 8360 8361 * $ <1>: Table of Symbols. (line 19) 8362 * $: Action Features. (line 14) 8363 * $$ <1>: Table of Symbols. (line 15) 8364 * $$ <2>: Action Features. (line 10) 8365 * $$: Actions. (line 6) 8366 * $<: Action Features. (line 18) 8367 * $accept: Table of Symbols. (line 47) 8368 * $end: Table of Symbols. (line 69) 8369 * $N: Actions. (line 6) 8370 * $undefined: Table of Symbols. (line 168) 8371 * %: Table of Symbols. (line 28) 8372 * %%: Table of Symbols. (line 23) 8373 * %debug <1>: Table of Symbols. (line 52) 8374 * %debug <2>: Tracing. (line 23) 8375 * %debug: Decl Summary. (line 46) 8376 * %defines <1>: Table of Symbols. (line 55) 8377 * %defines: Decl Summary. (line 51) 8378 * %destructor <1>: Table of Symbols. (line 59) 8379 * %destructor <2>: Decl Summary. (line 79) 8380 * %destructor <3>: Destructor Decl. (line 6) 8381 * %destructor: Mid-Rule Actions. (line 59) 8382 * %dprec <1>: Table of Symbols. (line 64) 8383 * %dprec: Merging GLR Parses. (line 6) 8384 * %error-verbose <1>: Table of Symbols. 
(line 83) 8385 * %error-verbose: Error Reporting. (line 17) 8386 * %expect <1>: Decl Summary. (line 38) 8387 * %expect: Expect Decl. (line 6) 8388 * %expect-rr <1>: Expect Decl. (line 6) 8389 * %expect-rr: Simple GLR Parsers. (line 6) 8390 * %file-prefix=" <1>: Table of Symbols. (line 87) 8391 * %file-prefix=": Decl Summary. (line 84) 8392 * %glr-parser <1>: Table of Symbols. (line 91) 8393 * %glr-parser <2>: Simple GLR Parsers. (line 6) 8394 * %glr-parser: GLR Parsers. (line 6) 8395 * %initial-action <1>: Table of Symbols. (line 95) 8396 * %initial-action: Initial Action Decl. (line 6) 8397 * %left <1>: Table of Symbols. (line 99) 8398 * %left <2>: Using Precedence. (line 6) 8399 * %left: Decl Summary. (line 21) 8400 * %lex-param <1>: Table of Symbols. (line 103) 8401 * %lex-param: Pure Calling. (line 31) 8402 * %locations: Decl Summary. (line 88) 8403 * %merge <1>: Table of Symbols. (line 108) 8404 * %merge: Merging GLR Parses. (line 6) 8405 * %name-prefix=" <1>: Table of Symbols. (line 115) 8406 * %name-prefix=": Decl Summary. (line 95) 8407 * %no-lines <1>: Table of Symbols. (line 119) 8408 * %no-lines: Decl Summary. (line 114) 8409 * %no-parser: Decl Summary. (line 105) 8410 * %nonassoc <1>: Table of Symbols. (line 123) 8411 * %nonassoc <2>: Using Precedence. (line 6) 8412 * %nonassoc: Decl Summary. (line 25) 8413 * %output=" <1>: Table of Symbols. (line 127) 8414 * %output=": Decl Summary. (line 122) 8415 * %parse-param <1>: Table of Symbols. (line 131) 8416 * %parse-param: Parser Function. (line 36) 8417 * %prec <1>: Table of Symbols. (line 136) 8418 * %prec: Contextual Precedence. 8419 (line 6) 8420 * %pure-parser <1>: Table of Symbols. (line 140) 8421 * %pure-parser <2>: Decl Summary. (line 125) 8422 * %pure-parser: Pure Decl. (line 6) 8423 * %require <1>: Table of Symbols. (line 144) 8424 * %require <2>: Decl Summary. (line 129) 8425 * %require: Require Decl. (line 6) 8426 * %right <1>: Table of Symbols. (line 148) 8427 * %right <2>: Using Precedence. 
(line 6) 8428 * %right: Decl Summary. (line 17) 8429 * %start <1>: Table of Symbols. (line 152) 8430 * %start <2>: Decl Summary. (line 34) 8431 * %start: Start Decl. (line 6) 8432 * %token <1>: Table of Symbols. (line 156) 8433 * %token <2>: Decl Summary. (line 13) 8434 * %token: Token Decl. (line 6) 8435 * %token-table <1>: Table of Symbols. (line 160) 8436 * %token-table: Decl Summary. (line 133) 8437 * %type <1>: Table of Symbols. (line 164) 8438 * %type <2>: Decl Summary. (line 30) 8439 * %type: Type Decl. (line 6) 8440 * %union <1>: Table of Symbols. (line 173) 8441 * %union <2>: Decl Summary. (line 9) 8442 * %union: Union Decl. (line 6) 8443 * %verbose: Decl Summary. (line 166) 8444 * %yacc: Decl Summary. (line 172) 8445 * /*: Table of Symbols. (line 33) 8446 * :: Table of Symbols. (line 36) 8447 * ;: Table of Symbols. (line 40) 8448 * @$ <1>: Table of Symbols. (line 7) 8449 * @$ <2>: Action Features. (line 99) 8450 * @$: Actions and Locations. 8451 (line 6) 8452 * @N <1>: Table of Symbols. (line 11) 8453 * @N <2>: Action Features. (line 105) 8454 * @N: Actions and Locations. 8455 (line 6) 8456 * abstract syntax tree: Implementing Gotos/Loops. 8457 (line 17) 8458 * action: Actions. (line 6) 8459 * action data types: Action Types. (line 6) 8460 * action features summary: Action Features. (line 6) 8461 * actions in mid-rule: Mid-Rule Actions. (line 6) 8462 * actions, location: Actions and Locations. 8463 (line 6) 8464 * actions, semantic: Semantic Actions. (line 6) 8465 * additional C code section: Epilogue. (line 6) 8466 * algorithm of parser: Algorithm. (line 6) 8467 * ambiguous grammars <1>: Generalized LR Parsing. 8468 (line 6) 8469 * ambiguous grammars: Language and Grammar. 8470 (line 33) 8471 * associativity: Why Precedence. (line 33) 8472 * AST: Implementing Gotos/Loops. 8473 (line 17) 8474 * Backus-Naur form: Language and Grammar. 8475 (line 16) 8476 * begin on location: C++ Location Values. (line 44) 8477 * Bison declaration summary: Decl Summary. 
(line 6) 8478 * Bison declarations: Declarations. (line 6) 8479 * Bison declarations (introduction): Bison Declarations. (line 6) 8480 * Bison grammar: Grammar in Bison. (line 6) 8481 * Bison invocation: Invocation. (line 6) 8482 * Bison parser: Bison Parser. (line 6) 8483 * Bison parser algorithm: Algorithm. (line 6) 8484 * Bison symbols, table of: Table of Symbols. (line 6) 8485 * Bison utility: Bison Parser. (line 6) 8486 * bison-i18n.m4: Internationalization. 8487 (line 20) 8488 * bison-po: Internationalization. 8489 (line 6) 8490 * BISON_I18N: Internationalization. 8491 (line 27) 8492 * BISON_LOCALEDIR: Internationalization. 8493 (line 27) 8494 * BNF: Language and Grammar. 8495 (line 16) 8496 * braced code: Rules. (line 31) 8497 * C code, section for additional: Epilogue. (line 6) 8498 * C-language interface: Interface. (line 6) 8499 * calc: Infix Calc. (line 6) 8500 * calculator, infix notation: Infix Calc. (line 6) 8501 * calculator, location tracking: Location Tracking Calc. 8502 (line 6) 8503 * calculator, multi-function: Multi-function Calc. (line 6) 8504 * calculator, simple: RPN Calc. (line 6) 8505 * character token: Symbols. (line 31) 8506 * column on position: C++ Location Values. (line 25) 8507 * columns on location: C++ Location Values. (line 48) 8508 * columns on position: C++ Location Values. (line 28) 8509 * compiling the parser: Rpcalc Compile. (line 6) 8510 * conflicts <1>: Shift/Reduce. (line 6) 8511 * conflicts <2>: Merging GLR Parses. (line 6) 8512 * conflicts <3>: Simple GLR Parsers. (line 6) 8513 * conflicts: GLR Parsers. (line 6) 8514 * conflicts, reduce/reduce: Reduce/Reduce. (line 6) 8515 * conflicts, suppressing warnings of: Expect Decl. (line 6) 8516 * context-dependent precedence: Contextual Precedence. 8517 (line 6) 8518 * context-free grammar: Language and Grammar. 8519 (line 6) 8520 * controlling function: Rpcalc Main. (line 6) 8521 * core, item set: Understanding. (line 129) 8522 * dangling else: Shift/Reduce. 
(line 6) 8523 * data type of locations: Location Type. (line 6) 8524 * data types in actions: Action Types. (line 6) 8525 * data types of semantic values: Value Type. (line 6) 8526 * debug_level on parser: C++ Parser Interface. 8527 (line 31) 8528 * debug_stream on parser: C++ Parser Interface. 8529 (line 26) 8530 * debugging: Tracing. (line 6) 8531 * declaration summary: Decl Summary. (line 6) 8532 * declarations: Prologue. (line 6) 8533 * declarations section: Prologue. (line 6) 8534 * declarations, Bison: Declarations. (line 6) 8535 * declarations, Bison (introduction): Bison Declarations. (line 6) 8536 * declaring literal string tokens: Token Decl. (line 6) 8537 * declaring operator precedence: Precedence Decl. (line 6) 8538 * declaring the start symbol: Start Decl. (line 6) 8539 * declaring token type names: Token Decl. (line 6) 8540 * declaring value types: Union Decl. (line 6) 8541 * declaring value types, nonterminals: Type Decl. (line 6) 8542 * default action: Actions. (line 50) 8543 * default data type: Value Type. (line 6) 8544 * default location type: Location Type. (line 6) 8545 * default stack limit: Memory Management. (line 30) 8546 * default start symbol: Start Decl. (line 6) 8547 * deferred semantic actions: GLR Semantic Actions. 8548 (line 6) 8549 * defining language semantics: Semantics. (line 6) 8550 * discarded symbols: Destructor Decl. (line 42) 8551 * discarded symbols, mid-rule actions: Mid-Rule Actions. (line 59) 8552 * else, dangling: Shift/Reduce. (line 6) 8553 * end on location: C++ Location Values. (line 45) 8554 * epilogue: Epilogue. (line 6) 8555 * error <1>: Table of Symbols. (line 73) 8556 * error: Error Recovery. (line 20) 8557 * error on parser: C++ Parser Interface. 8558 (line 37) 8559 * error recovery: Error Recovery. (line 6) 8560 * error recovery, mid-rule actions: Mid-Rule Actions. (line 59) 8561 * error recovery, simple: Simple Error Recovery. 8562 (line 6) 8563 * error reporting function: Error Reporting. 
(line 6) 8564 * error reporting routine: Rpcalc Error. (line 6) 8565 * examples, simple: Examples. (line 6) 8566 * exercises: Exercises. (line 6) 8567 * FDL, GNU Free Documentation License: GNU Free Documentation License. 8568 (line 6) 8569 * file format: Grammar Layout. (line 6) 8570 * file on position: C++ Location Values. (line 13) 8571 * finite-state machine: Parser States. (line 6) 8572 * formal grammar: Grammar in Bison. (line 6) 8573 * format of grammar file: Grammar Layout. (line 6) 8574 * freeing discarded symbols: Destructor Decl. (line 6) 8575 * frequently asked questions: FAQ. (line 6) 8576 * generalized LR (GLR) parsing <1>: Generalized LR Parsing. 8577 (line 6) 8578 * generalized LR (GLR) parsing <2>: GLR Parsers. (line 6) 8579 * generalized LR (GLR) parsing: Language and Grammar. 8580 (line 33) 8581 * generalized LR (GLR) parsing, ambiguous grammars: Merging GLR Parses. 8582 (line 6) 8583 * generalized LR (GLR) parsing, unambiguous grammars: Simple GLR Parsers. 8584 (line 6) 8585 * gettext: Internationalization. 8586 (line 6) 8587 * glossary: Glossary. (line 6) 8588 * GLR parsers and inline: Compiler Requirements. 8589 (line 6) 8590 * GLR parsers and yychar: GLR Semantic Actions. 8591 (line 10) 8592 * GLR parsers and yyclearin: GLR Semantic Actions. 8593 (line 18) 8594 * GLR parsers and YYERROR: GLR Semantic Actions. 8595 (line 28) 8596 * GLR parsers and yylloc: GLR Semantic Actions. 8597 (line 10) 8598 * GLR parsers and YYLLOC_DEFAULT: Location Default Action. 8599 (line 6) 8600 * GLR parsers and yylval: GLR Semantic Actions. 8601 (line 10) 8602 * GLR parsing <1>: Generalized LR Parsing. 8603 (line 6) 8604 * GLR parsing <2>: GLR Parsers. (line 6) 8605 * GLR parsing: Language and Grammar. 8606 (line 33) 8607 * GLR parsing, ambiguous grammars: Merging GLR Parses. (line 6) 8608 * GLR parsing, unambiguous grammars: Simple GLR Parsers. (line 6) 8609 * grammar file: Grammar Layout. (line 6) 8610 * grammar rule syntax: Rules. 
(line 6) 8611 * grammar rules section: Grammar Rules. (line 6) 8612 * grammar, Bison: Grammar in Bison. (line 6) 8613 * grammar, context-free: Language and Grammar. 8614 (line 6) 8615 * grouping, syntactic: Language and Grammar. 8616 (line 47) 8617 * i18n: Internationalization. 8618 (line 6) 8619 * infix notation calculator: Infix Calc. (line 6) 8620 * inline: Compiler Requirements. 8621 (line 6) 8622 * interface: Interface. (line 6) 8623 * internationalization: Internationalization. 8624 (line 6) 8625 * introduction: Introduction. (line 6) 8626 * invoking Bison: Invocation. (line 6) 8627 * item: Understanding. (line 107) 8628 * item set core: Understanding. (line 129) 8629 * kernel, item set: Understanding. (line 129) 8630 * LALR(1): Mystery Conflicts. (line 36) 8631 * LALR(1) grammars: Language and Grammar. 8632 (line 22) 8633 * language semantics, defining: Semantics. (line 6) 8634 * layout of Bison grammar: Grammar Layout. (line 6) 8635 * left recursion: Recursion. (line 16) 8636 * lex-param: Pure Calling. (line 31) 8637 * lexical analyzer: Lexical. (line 6) 8638 * lexical analyzer, purpose: Bison Parser. (line 6) 8639 * lexical analyzer, writing: Rpcalc Lexer. (line 6) 8640 * lexical tie-in: Lexical Tie-ins. (line 6) 8641 * line on position: C++ Location Values. (line 19) 8642 * lines on location: C++ Location Values. (line 49) 8643 * lines on position: C++ Location Values. (line 22) 8644 * literal string token: Symbols. (line 53) 8645 * literal token: Symbols. (line 31) 8646 * location <1>: Locations. (line 6) 8647 * location: Locations Overview. (line 6) 8648 * location actions: Actions and Locations. 8649 (line 6) 8650 * location tracking calculator: Location Tracking Calc. 8651 (line 6) 8652 * location, textual <1>: Locations. (line 6) 8653 * location, textual: Locations Overview. (line 6) 8654 * location_value_type: C++ Parser Interface. 8655 (line 16) 8656 * look-ahead token: Look-Ahead. (line 6) 8657 * LR(1): Mystery Conflicts. 
(line 36) 8658 * LR(1) grammars: Language and Grammar. 8659 (line 22) 8660 * ltcalc: Location Tracking Calc. 8661 (line 6) 8662 * main function in simple example: Rpcalc Main. (line 6) 8663 * memory exhaustion: Memory Management. (line 6) 8664 * memory management: Memory Management. (line 6) 8665 * mfcalc: Multi-function Calc. (line 6) 8666 * mid-rule actions: Mid-Rule Actions. (line 6) 8667 * multi-function calculator: Multi-function Calc. (line 6) 8668 * multicharacter literal: Symbols. (line 53) 8669 * mutual recursion: Recursion. (line 32) 8670 * NLS: Internationalization. 8671 (line 6) 8672 * nondeterministic parsing <1>: Generalized LR Parsing. 8673 (line 6) 8674 * nondeterministic parsing: Language and Grammar. 8675 (line 33) 8676 * nonterminal symbol: Symbols. (line 6) 8677 * nonterminal, useless: Understanding. (line 62) 8678 * operator precedence: Precedence. (line 6) 8679 * operator precedence, declaring: Precedence Decl. (line 6) 8680 * operator+ on location: C++ Location Values. (line 53) 8681 * operator+ on position: C++ Location Values. (line 33) 8682 * operator+= on location: C++ Location Values. (line 57) 8683 * operator+= on position: C++ Location Values. (line 31) 8684 * operator- on position: C++ Location Values. (line 36) 8685 * operator-= on position: C++ Location Values. (line 35) 8686 * operator<< on position: C++ Location Values. (line 40) 8687 * options for invoking Bison: Invocation. (line 6) 8688 * overflow of parser stack: Memory Management. (line 6) 8689 * parse error: Error Reporting. (line 6) 8690 * parse on parser: C++ Parser Interface. 8691 (line 23) 8692 * parser: Bison Parser. (line 6) 8693 * parser on parser: C++ Parser Interface. 8694 (line 19) 8695 * parser stack: Algorithm. (line 6) 8696 * parser stack overflow: Memory Management. (line 6) 8697 * parser state: Parser States. (line 6) 8698 * pointed rule: Understanding. (line 107) 8699 * polish notation calculator: RPN Calc. 
(line 6)
* precedence declarations: Precedence Decl. (line 6)
* precedence of operators: Precedence. (line 6)
* precedence, context-dependent: Contextual Precedence. (line 6)
* precedence, unary operator: Contextual Precedence. (line 6)
* preventing warnings about conflicts: Expect Decl. (line 6)
* Prologue: Prologue. (line 6)
* pure parser: Pure Decl. (line 6)
* questions: FAQ. (line 6)
* recovery from errors: Error Recovery. (line 6)
* recursive rule: Recursion. (line 6)
* reduce/reduce conflict: Reduce/Reduce. (line 6)
* reduce/reduce conflicts <1>: Merging GLR Parses. (line 6)
* reduce/reduce conflicts <2>: Simple GLR Parsers. (line 6)
* reduce/reduce conflicts: GLR Parsers. (line 6)
* reduction: Algorithm. (line 6)
* reentrant parser: Pure Decl. (line 6)
* requiring a version of Bison: Require Decl. (line 6)
* reverse polish notation: RPN Calc. (line 6)
* right recursion: Recursion. (line 16)
* rpcalc: RPN Calc. (line 6)
* rule syntax: Rules. (line 6)
* rule, pointed: Understanding. (line 107)
* rule, useless: Understanding. (line 62)
* rules section for grammar: Grammar Rules. (line 6)
* running Bison (introduction): Rpcalc Gen. (line 6)
* semantic actions: Semantic Actions. (line 6)
* semantic value: Semantic Values. (line 6)
* semantic value type: Value Type. (line 6)
* semantic_value_type: C++ Parser Interface. (line 15)
* set_debug_level on parser: C++ Parser Interface. (line 32)
* set_debug_stream on parser: C++ Parser Interface. (line 27)
* shift/reduce conflicts <1>: Shift/Reduce. (line 6)
* shift/reduce conflicts <2>: Simple GLR Parsers. (line 6)
* shift/reduce conflicts: GLR Parsers. (line 6)
* shifting: Algorithm. (line 6)
* simple examples: Examples. (line 6)
* single-character literal: Symbols.
(line 31)
* stack overflow: Memory Management. (line 6)
* stack, parser: Algorithm. (line 6)
* stages in using Bison: Stages. (line 6)
* start symbol: Language and Grammar. (line 96)
* start symbol, declaring: Start Decl. (line 6)
* state (of parser): Parser States. (line 6)
* step on location: C++ Location Values. (line 60)
* string token: Symbols. (line 53)
* summary, action features: Action Features. (line 6)
* summary, Bison declaration: Decl Summary. (line 6)
* suppressing conflict warnings: Expect Decl. (line 6)
* symbol: Symbols. (line 6)
* symbol table example: Mfcalc Symtab. (line 6)
* symbols (abstract): Language and Grammar. (line 47)
* symbols in Bison, table of: Table of Symbols. (line 6)
* syntactic grouping: Language and Grammar. (line 47)
* syntax error: Error Reporting. (line 6)
* syntax of grammar rules: Rules. (line 6)
* terminal symbol: Symbols. (line 6)
* textual location <1>: Locations. (line 6)
* textual location: Locations Overview. (line 6)
* token: Language and Grammar. (line 47)
* token type: Symbols. (line 6)
* token type names, declaring: Token Decl. (line 6)
* token, useless: Understanding. (line 62)
* tracing the parser: Tracing. (line 6)
* unary operator precedence: Contextual Precedence. (line 6)
* useless nonterminal: Understanding. (line 62)
* useless rule: Understanding. (line 62)
* useless token: Understanding. (line 62)
* using Bison: Stages. (line 6)
* value type, semantic: Value Type. (line 6)
* value types, declaring: Union Decl. (line 6)
* value types, nonterminals, declaring: Type Decl. (line 6)
* value, semantic: Semantic Values. (line 6)
* version requirement: Require Decl. (line 6)
* warnings, preventing: Expect Decl. (line 6)
* writing a lexical analyzer: Rpcalc Lexer.
(line 6)
* YYABORT <1>: Table of Symbols. (line 177)
* YYABORT: Parser Function. (line 29)
* YYABORT;: Action Features. (line 28)
* YYACCEPT <1>: Table of Symbols. (line 183)
* YYACCEPT: Parser Function. (line 26)
* YYACCEPT;: Action Features. (line 32)
* YYBACKUP <1>: Table of Symbols. (line 188)
* YYBACKUP: Action Features. (line 36)
* yychar <1>: Table of Symbols. (line 193)
* yychar <2>: Look-Ahead. (line 47)
* yychar <3>: Action Features. (line 69)
* yychar: GLR Semantic Actions. (line 10)
* yyclearin <1>: Table of Symbols. (line 199)
* yyclearin <2>: Error Recovery. (line 97)
* yyclearin: GLR Semantic Actions. (line 18)
* yyclearin;: Action Features. (line 77)
* yydebug: Table of Symbols. (line 207)
* YYDEBUG <1>: Table of Symbols. (line 203)
* YYDEBUG: Tracing. (line 12)
* yydebug: Tracing. (line 6)
* YYEMPTY: Action Features. (line 49)
* YYENABLE_NLS: Internationalization. (line 27)
* YYEOF: Action Features. (line 52)
* yyerrok <1>: Table of Symbols. (line 212)
* yyerrok: Error Recovery. (line 92)
* yyerrok;: Action Features. (line 82)
* yyerror: Table of Symbols. (line 222)
* YYERROR <1>: Table of Symbols. (line 216)
* YYERROR: Action Features. (line 56)
* yyerror: Error Reporting. (line 6)
* YYERROR: GLR Semantic Actions. (line 28)
* YYERROR;: Action Features. (line 56)
* YYERROR_VERBOSE: Table of Symbols. (line 226)
* YYINITDEPTH <1>: Table of Symbols. (line 233)
* YYINITDEPTH: Memory Management. (line 32)
* yylex <1>: Table of Symbols. (line 237)
* yylex: Lexical. (line 6)
* yylex on parser: C++ Scanner Interface. (line 12)
* YYLEX_PARAM: Table of Symbols. (line 242)
* yylloc <1>: Table of Symbols. (line 248)
* yylloc <2>: Look-Ahead. (line 47)
* yylloc <3>: Action Features. (line 87)
* yylloc <4>: Token Locations.
(line 6)
* yylloc <5>: Actions and Locations. (line 60)
* yylloc: GLR Semantic Actions. (line 10)
* YYLLOC_DEFAULT: Location Default Action. (line 6)
* YYLTYPE <1>: Table of Symbols. (line 258)
* YYLTYPE: Token Locations. (line 19)
* yylval <1>: Table of Symbols. (line 262)
* yylval <2>: Look-Ahead. (line 47)
* yylval <3>: Action Features. (line 93)
* yylval <4>: Token Values. (line 6)
* yylval <5>: Actions. (line 74)
* yylval: GLR Semantic Actions. (line 10)
* YYMAXDEPTH <1>: Table of Symbols. (line 270)
* YYMAXDEPTH: Memory Management. (line 14)
* yynerrs <1>: Table of Symbols. (line 274)
* yynerrs: Error Reporting. (line 92)
* yyparse <1>: Table of Symbols. (line 280)
* yyparse: Parser Function. (line 6)
* YYPARSE_PARAM: Table of Symbols. (line 284)
* YYPRINT: Tracing. (line 71)
* YYRECOVERING <1>: Table of Symbols. (line 290)
* YYRECOVERING <2>: Error Recovery. (line 109)
* YYRECOVERING: Action Features. (line 64)
* YYSTACK_USE_ALLOCA: Table of Symbols. (line 295)
* YYSTYPE: Table of Symbols. (line 311)
* | <1>: Table of Symbols. (line 43)
* |: Rules.
(line 49)