ANTLR v3.0.1 C Runtime
ANTLR 3.0.1
January 1, 2008

At the moment, the use of the C runtime engine for the parser is not generally
for the inexperienced C programmer. However, this is mainly because of the lack
of documentation on use, which will be corrected shortly. The C runtime
code itself is however well documented with doxygen style comments and a
reasonably experienced C programmer should be able to piece it together. You
can visit the documentation at: http://www.antlr.org/api/C/index.html

The general makeup is that everything is implemented as a pseudo class/object
initialized with pointers to its 'member' functions and data. All objects are
(usually) created by factories, which auto manage the memory allocation and
release and generally make life easier. If you remember this rule, everything
should fall into place.

Jim Idle - Portland Oregon, Jan 2008
jimi idle ws

===============================================================================

Terence Parr, parrt at cs usfca edu
ANTLR project lead and supreme dictator for life
University of San Francisco

INTRODUCTION

Welcome to ANTLR v3! I've been working on this for nearly 4 years and it's
almost ready! I plan no feature additions between this beta and first
3.0 release. I have lots of features to add later, but this will be
the first set. Ultimately, I need to rewrite ANTLR v3 in itself (it's
written in 2.7.7 at the moment and also needs StringTemplate 3.0 or
later).

You should use v3 in conjunction with ANTLRWorks:

    http://www.antlr.org/works/index.html

WARNING: We have bits of documentation started, but nothing super-complete
yet. The book will be printed May 2007:

    http://www.pragmaticprogrammer.com/titles/tpantlr/index.html

but we should have a beta PDF available on that page in Feb 2007.

You also have the examples plus the source to guide you.

See the new wiki FAQ:

    http://www.antlr.org/wiki/display/ANTLR3/ANTLR+v3+FAQ

and general doc root:

    http://www.antlr.org/wiki/display/ANTLR3/ANTLR+3+Wiki+Home

Please help add/update FAQ entries.

I have made very little effort at this point to deal well with
erroneous input (e.g., bad syntax might make ANTLR crash). I will clean
this up after I've rewritten v3 in v3.

Per the license in LICENSE.txt, this software is not guaranteed to
work and might even destroy all life on this planet:

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

EXAMPLES

ANTLR v3 sample grammars:

    http://www.antlr.org/download/examples-v3.tar.gz

contains the following examples: LL-star, cminus, dynamic-scope,
fuzzy, hoistedPredicates, island-grammar, java, python, scopes,
simplecTreeParser, treeparser, tweak, xmlLexer.

Also check out Mantra Programming Language for a prototype (work in
progress) using v3:

    http://www.linguamantra.org/

----------------------------------------------------------------------

What is ANTLR?

ANTLR stands for (AN)other (T)ool for (L)anguage (R)ecognition and was
originally known as PCCTS.
ANTLR is a language tool that provides a
framework for constructing recognizers, compilers, and translators
from grammatical descriptions containing actions. Target language list:

    http://www.antlr.org/wiki/display/ANTLR3/Code+Generation+Targets

----------------------------------------------------------------------

How is ANTLR v3 different than ANTLR v2?

See migration guide:
    http://www.antlr.org/wiki/display/ANTLR3/Migrating+from+ANTLR+2+to+ANTLR+3

ANTLR v3 has a far superior parsing algorithm called LL(*) that
handles many more grammars than v2 does. In practice, it means you
can throw almost any grammar at ANTLR that is non-left-recursive and
unambiguous (an ambiguous grammar lets the same input be matched by
multiple rules); the cost is
perhaps a tiny bit of backtracking, but with a DFA not a full parser.
You can manually set the max lookahead k as an option for any decision
though. The LL(*) algorithm ramps up to use more lookahead when it
needs to and is much more efficient than normal LL backtracking. There
is support for syntactic predicates (full LL backtracking) when LL(*)
fails.

Lexers are much easier due to the LL(*) algorithm as well. Previously
these two lexer rules would cause trouble because ANTLR couldn't
distinguish between them with finite lookahead to see the decimal
point:

    INT : ('0'..'9')+ ;
    FLOAT : INT '.' INT ;

The syntax is almost identical for features in common, but you should
note that labels are always '=' not ':'. So do id=ID not id:ID.

You can do combined lexer/parser grammars again (a la PCCTS): both lexer
and parser rules are defined in the same file. See the examples.
Really nice. You can reference strings and characters in the grammar
and ANTLR will generate the lexer for you.

The attribute structure has been enhanced. Rules may have multiple
return values, for example.
Further, there are dynamically scoped
attributes whereby a rule may define a value usable by any rule it
invokes directly or indirectly w/o having to pass a parameter all the
way down.

ANTLR v3 tree construction is far superior--it provides tree rewrite
rules where the right hand side is simply the tree grammar fragment
describing the tree you want to build:

    formalArgs
        : typename declarator (',' typename declarator )*
          -> ^(ARG typename declarator)+
        ;

That builds tree sequences like:

    ^(ARG int v1) ^(ARG int v2)

ANTLR v3 also incorporates StringTemplate:

    http://www.stringtemplate.org

just like AST support. It is useful for generating output. For
example this rule creates a template called 'import' for each import
definition found in the input stream:

    grammar Java;
    options {
      output=template;
    }
    ...
    importDefinition
        : 'import' identifierStar SEMI
          -> import(name={$identifierStar.st},
                    begin={$identifierStar.start},
                    end={$identifierStar.stop})
        ;

The attributes are set via assignments in the argument list. The
arguments are actions with arbitrary expressions in the target
language. The .st label property is the result template from a rule
reference. There is a nice shorthand in actions too:

    %foo(a={},b={},...)       ctor
    %({name-expr})(a={},...)  indirect template ctor reference
    %{string-expr}            anonymous template from string expr
    %{expr}.y = z;            set template attribute y of StringTemplate-typed
                              expr to z
    %x.y = z;                 set template attribute y of x (always set never
                              get attr) to z [languages like python without
                              ';' must still use the ';' which the code
                              generator is free to remove during code gen]
                              Same as '(x).setAttribute("y", z);'

For ANTLR v3 I decided to make the most common tasks easy by default.
This means that some of the basic objects are heavier weight
than some speed demons would like, but they are free to pare it down
leaving most programmers the luxury of having it "just work." For
example, reading in some input, tweaking it, and writing it back out
while preserving whitespace is easy in v3.

The ANTLR source code is much prettier. You'll also note that the
run-time classes are conveniently encapsulated in the
org.antlr.runtime package.

----------------------------------------------------------------------

How do I install this damn thing?

Just untar and you'll get:

antlr-3.0b6/README.txt (this file)
antlr-3.0b6/LICENSE.txt
antlr-3.0b6/src/org/antlr/...
antlr-3.0b6/lib/stringtemplate-3.0.jar (3.0b6 needs 3.0)
antlr-3.0b6/lib/antlr-2.7.7.jar
antlr-3.0b6/lib/antlr-3.0b6.jar

Then you need to add all the jars in lib to your CLASSPATH.

----------------------------------------------------------------------

How do I use ANTLR v3?

[I am assuming you are only using the command-line (and not the
ANTLRWorks GUI)].

Running ANTLR with no parameters shows you:

ANTLR Parser Generator Early Access Version 3.0b6 (Jan 31, 2007) 1989-2007
usage: java org.antlr.Tool [args] file.g [file2.g file3.g ...]
  -o outputDir          specify output directory where all output is generated
  -lib dir              specify location of token files
  -report               print out a report about the grammar(s) processed
  -print                print out the grammar without actions
  -debug                generate a parser that emits debugging events
  -profile              generate a parser that computes profiling information
  -nfa                  generate an NFA for each rule
  -dfa                  generate a DFA for each decision point
  -message-format name  specify output style for messages
  -X                    display extended argument list

For example, consider how to make the LL-star example from the examples
tarball you can get at http://www.antlr.org/download/examples-v3.tar.gz

$ cd examples/java/LL-star
$ java org.antlr.Tool simplec.g
$ jikes *.java

For input:

char c;
int x;
void bar(int x);
int foo(int y, char d) {
  int i;
  for (i=0; i<3; i=i+1) {
    x=3;
    y=5;
  }
}

you will see output as follows:

$ java Main input
bar is a declaration
foo is a definition

What if I want to test my parser without generating code? Easy. Just
run ANTLR in interpreter mode. It can't execute your actions, but it
can create a parse tree from your input to show you how it would be
matched. Use the org.antlr.tool.Interp main class. In the following,
I interpret simplec.g on t.c, which contains "int x;"

$ java org.antlr.tool.Interp simplec.g WS program t.c
( <grammar SimpleC>
  ( program
    ( declaration
      ( variable
        ( type [@0,0:2='int',<14>,1:0] )
        ( declarator [@2,4:4='x',<2>,1:4] )
        [@3,5:5=';',<5>,1:5]
      )
    )
  )
)

where I have formatted the output to make it more readable. I have
told it to ignore all WS tokens.

----------------------------------------------------------------------

How do I rebuild ANTLR v3?

Make sure the following two jars are in your CLASSPATH

antlr-3.0b6/lib/stringtemplate-3.0.jar
antlr-3.0b6/lib/antlr-2.7.7.jar
junit.jar [if you want to build the test directories]

then jump into antlr-3.0b6/src directory and then type:

$ javac -d . org/antlr/Tool.java org/antlr/*/*.java org/antlr/*/*/*.java

Takes 9 seconds on my 1Ghz laptop or 4 seconds with jikes. Later I'll
have a real build mechanism, though I must admit the one-liner appeals
to me. I use Intellij so I never type anything actually to build.

There is also an ANT build.xml file, but I know nothing of ANT; contributed
by others (I'm opposed to any tool with an XML interface for Humans).

-----------------------------------------------------------------------
C# Target Notes

1. Auto-generated lexers do not inherit parent parser's @namespace
   {...} value. Use @lexer::namespace{...}.

-----------------------------------------------------------------------

CHANGES

March 17, 2007

* Jonathan DeKlotz updated C# templates to be 3.0b6 current

March 14, 2007

* Manually-specified (...)=> force backtracking eval of that predicate.
  backtracking=true mode does not however. Added unit test.

March 14, 2007

* Fixed bug in lexer where ~T didn't compute the set from rule T.

* Added -Xnoinlinedfa to make all DFA with tables; no inline prediction
  with IFs

* Fixed http://www.antlr.org:8888/browse/ANTLR-80.
  Sem pred states didn't define lookahead vars.

* Fixed http://www.antlr.org:8888/browse/ANTLR-91.
  When forcing some acyclic DFA to be state tables, they broke.
  Forcing all DFA to be state tables should give same results.

March 12, 2007

* setTokenSource in CommonTokenStream didn't clear tokens list.
  setCharStream calls reset in Lexer.

* Altered -depend.
  No longer printing grammar files for multiple input
  files with -depend. Doesn't show T__.g temp file anymore. Added
  TLexer.tokens. Added .h files if defined.

February 11, 2007

* Added -depend command-line option that, instead of processing files,
  shows you what files the input grammar(s) depend on and what files
  they generate. For combined grammar T.g:

  $ java org.antlr.Tool -depend T.g

  You get:

  TParser.java : T.g
  T.tokens : T.g
  T__.g : T.g

  Now, assuming U.g is a tree grammar that references T's tokens:

  $ java org.antlr.Tool -depend T.g U.g

  TParser.java : T.g
  T.tokens : T.g
  T__.g : T.g
  U.g: T.tokens
  U.java : U.g
  U.tokens : U.g

  Handles spaces by escaping them. Pays attention to -o, -fo and -lib.
  Dir 'x y' is a valid dir in current dir.

  $ java org.antlr.Tool -depend -lib /usr/local/lib -o 'x y' T.g U.g
  x\ y/TParser.java : T.g
  x\ y/T.tokens : T.g
  x\ y/T__.g : T.g
  U.g: /usr/local/lib/T.tokens
  x\ y/U.java : U.g
  x\ y/U.tokens : U.g

  You have API access via org.antlr.tool.BuildDependencyGenerator class:
  getGeneratedFileList(), getDependenciesFileList(). You can also access
  the output template: getDependencies(). The file
  org/antlr/tool/templates/depend.stg contains the template. You can
  modify as you want. File objects go in so you can play with path etc...

February 10, 2007

* no more .gl files generated. All .g all the time.

* changed @finally to be @after and added a finally clause to the
  exception stuff. I also removed the superfluous "exception"
  keyword.
  Here's what the new syntax looks like:

  a
  @after { System.out.println("ick"); }
    : 'a'
    ;
    catch[RecognitionException e] { System.out.println("foo"); }
    catch[IOException e] { System.out.println("io"); }
    finally { System.out.println("foobar"); }

  @after executes after bookkeeping to set $rule.stop, $rule.tree but
  before scopes pop and any memoization happens. Dynamic scopes and
  memoization are still in generated finally block because they must
  exec even if error in rule. The @after action and tree setting
  stuff can technically be skipped upon syntax error in rule. [Later
  we might add something to finally to stick an ERROR token in the
  tree and set the return value.] Sequence goes: set $stop, $tree (if
  any), @after (if any), pop scopes (if any), memoize (if needed),
  grammar finally clause. Last 3 are in generated code's finally
  clause.

3.0b6 - January 31, 2007

January 30, 2007

* Fixed bug in IntervalSet.and: it returned the same empty set all the time
  rather than new empty set. Code altered the same empty set.

* Made analysis terminate faster upon a decision that takes too long;
  it seemed to keep doing work for a while. Refactored some names
  and updated comments. Also made it terminate when it realizes it's
  non-LL(*) due to recursion. just added terminate conditions to loop
  in convert().

* Sometimes fatal non-LL(*) messages didn't appear; instead you got
  "antlr couldn't analyze", which is actually untrue. I had the
  order of some prints wrong in the DecisionProbe.

* The code generator incorrectly detected when it could use a fixed,
  acyclic inline DFA (i.e., using an IF). Upon non-LL(*) decisions
  with predicates, analysis made cyclic DFA. But this stops
  the computation detecting whether they are cyclic. I just added
  a protection in front of the acyclic DFA generator to avoid if
  non-LL(*).
  Updated comments.

January 23, 2007

* Made tree node streams use adaptor to create navigation nodes.
  Thanks to Emond Papegaaij.

January 22, 2007

* Added lexer rule properties: start, stop

January 1, 2007

* analysis failsafe is back on; if a decision takes too long, it bails out
  and uses k=1

January 1, 2007

* += labels for rules only work for output option; previously elements
  of list were the return value structs, but are now either the tree or
  StringTemplate return value. You can label different rules now
  x+=a x+=b.

December 30, 2006

* Allow \" to work correctly in "..." template.

December 28, 2006

* errors that are now warnings: missing AST label type in trees.
  Also "no start rule detected" is warning.

* tree grammars also can do rewrite=true for output=template.
  Only works for alts with single node or tree as alt elements.
  If you are going to use $text in a tree grammar or do rewrite=true
  for templates, you must use in your main:

  nodes.setTokenStream(tokens);

* You get a warning for tree grammars that do rewrite=true and
  output=template and have -> for alts that are not simple nodes
  or simple trees. new unit tests in TestRewriteTemplates at end.

December 27, 2006

* Error message appears when you use -> in tree grammar with
  output=template and rewrite=true for alt that is not simple
  node or tree ref.

* no more $stop attribute for tree parsers; meaningless/useless.
  Removed from TreeRuleReturnScope also.

* rule text attribute in tree parser must pull from token buffer.
  Makes no sense otherwise. added getTokenStream to TreeNodeStream
  so rule $text attr works. CommonTreeNodeStream etc... now let
  you set the token stream so you can access later from tree parser.
  $text is not well-defined for rules like

    slist : stat+ ;

  because stat is not a single node nor rooted with a single node.
  $slist.text will get only first stat. I need to add a warning about
  this...

* Fixed http://www.antlr.org:8888/browse/ANTLR-76 for Java.
  Enhanced TokenRewriteStream so it accepts any object; converts
  to string at last second. Allows you to rewrite with StringTemplate
  templates now :)

* added rewrite option that makes -> template rewrites do replace ops for
  TokenRewriteStream input stream. In output=template and rewrite=true mode
  same as before 'cept that the parser does

    ((TokenRewriteStream)input).replace(
        ((Token)retval.start).getTokenIndex(),
        input.LT(-1).getTokenIndex(),
        retval.st);

  after each rewrite so that the input stream is altered. Later refs to
  $text will have rewrites. Here's a sample test program for grammar Rew.

    FileReader groupFileR = new FileReader("Rew.stg");
    StringTemplateGroup templates = new StringTemplateGroup(groupFileR);
    ANTLRInputStream input = new ANTLRInputStream(System.in);
    RewLexer lexer = new RewLexer(input);
    TokenRewriteStream tokens = new TokenRewriteStream(lexer);
    RewParser parser = new RewParser(tokens);
    parser.setTemplateLib(templates);
    parser.program();
    System.out.println(tokens.toString());
    groupFileR.close();

December 26, 2006

* BaseTree.dupTree didn't dup recursively.

December 24, 2006

* Cleaned up some comments and removed field treeNode
  from MismatchedTreeNodeException class. It is "node" in
  RecognitionException.

* Changed type from Object to BitSet for expecting fields in
  MismatchedSetException and MismatchedNotSetException

* Cleaned up error printing in lexers and the messages that it creates.

* Added this to TreeAdaptor:
    /** Return the token object from which this node was created.
     * Currently used only for printing an error message.
     * The error display routine in BaseRecognizer needs to
     * display where in the input the error occurred. If your
     * tree implementation does not store information that can
     * lead you to the token, you can create a token filled with
     * the appropriate information and pass that back. See
     * BaseRecognizer.getErrorMessage().
     */
    public Token getToken(Object t);

December 23, 2006

* made BaseRecognizer.displayRecognitionError nonstatic so people can
  override it. Not sure why it was static before.

* Removed state/decision message that comes out of no
  viable alternative exceptions, as that was too much.
  removed the decision number from the early exit exception
  also. During development, you can simply override
  displayRecognitionError from BaseRecognizer to add the stuff
  back in if you want.

* made output go to an output method you can override: emitErrorMessage()

* general cleanup of the error emitting code in BaseRecognizer. Lots
  more stuff you can override: getErrorHeader, getTokenErrorDisplay,
  emitErrorMessage, getErrorMessage.

December 22, 2006

* Altered TreeParser.matchAny() so that it skips entire trees if
  node has children otherwise skips one node. Now this works to
  skip entire body of function if single-rooted subtree:
  ^(FUNC name=ID arg=ID .)

* Added "reverse index" from node to stream index. Override
  fillReverseIndex() in CommonTreeNodeStream if you want to change.
  Use getNodeIndex(node) to find stream index for a specific tree node.
  See getNodeIndex(), reverseIndex(Set tokenTypes),
  reverseIndex(int tokenType), fillReverseIndex(). The indexing
  costs time and memory to fill, but pulling stuff out will be lots
  faster as it can jump from a node ptr straight to a stream index.

* Added TreeNodeStream.get(index) to make it easier for interpreters to
  jump around in tree node stream.

* New CommonTreeNodeStream buffers all nodes in stream for fast jumping
  around. It now has push/pop methods to invoke other locations in
  the stream for building interpreters.

* Moved CommonTreeNodeStream to UnBufferedTreeNodeStream and removed
  Iterator implementation. moved toNodesOnlyString() to TestTreeNodeStream

* [BREAKS ANY TREE IMPLEMENTATION]
  made CommonTreeNodeStream work with any tree node type. TreeAdaptor
  now implements isNil so must add; trivial, but does break backward
  compatibility.

December 17, 2006

* Added traceIn/Out methods to recognizers so that you can override them;
  previously they were in-line print statements. The message has also
  been slightly improved.

* Factored BuildParseTree into debug package; cleaned stuff up. Fixed
  unit tests.

December 15, 2006

* [BREAKS ANY TREE IMPLEMENTATION]
  org.antlr.runtime.tree.Tree; needed to add get/set for token start/stop
  index so CommonTreeAdaptor can assume Tree interface not CommonTree
  implementation. Otherwise, no way to create your own nodes that satisfy
  Tree because CommonTreeAdaptor was doing

    public int getTokenStartIndex(Object t) {
        return ((CommonTree)t).startIndex;
    }

  Added to Tree:

    /** What is the smallest token index (indexing from 0) for this node
     * and its children?
     */
    int getTokenStartIndex();

    void setTokenStartIndex(int index);

    /** What is the largest token index (indexing from 0) for this node
     * and its children?
     */
    int getTokenStopIndex();

    void setTokenStopIndex(int index);

December 13, 2006

* Added org.antlr.runtime.tree.DOTTreeGenerator so you can generate DOT
  diagrams easily from trees.

    CharStream input = new ANTLRInputStream(System.in);
    TLexer lex = new TLexer(input);
    CommonTokenStream tokens = new CommonTokenStream(lex);
    TParser parser = new TParser(tokens);
    TParser.e_return r = parser.e();
    Tree t = (Tree)r.tree;
    System.out.println(t.toStringTree());
    DOTTreeGenerator gen = new DOTTreeGenerator();
    StringTemplate st = gen.toDOT(t);
    System.out.println(st);

* Changed the way mark()/rewind() work in CommonTreeNodeStream to mirror
  more flexible solution in ANTLRStringStream. Forgot to set lastMarker
  anyway. Now you can rewind to non-most-recent marker.

December 12, 2006

* Temp lexer files now end in .gl (T__.gl, for example)

* TreeParser suffix no longer generated for tree grammars

* Defined reset for lexer, parser, tree parser; rewinds the input stream also

December 10, 2006

* Made Grammar.abortNFAToDFAConversion() abort in middle of a DFA.

December 9, 2006

* fixed bug in OrderedHashSet.add(). It didn't track elements correctly.

December 6, 2006

* updated build.xml for future Ant compatibility, thanks to Matt Benson.

* various tests in TestRewriteTemplate and TestSyntacticPredicateEvaluation
  were using the old 'channel' vs. new '$channel' notation.
  TestInterpretedParsing didn't pick up an earlier change to CommonToken.
  Reported by Matt Benson.

* fixed platform dependent test failures in TestTemplates, supplied by Matt
  Benson.

November 29, 2006

* optimized semantic predicate evaluation so that p||!p yields true.

November 22, 2006

* fixed bug that prevented var = $rule.some_retval from working in anything
  but the first alternative of a rule or subrule.

* attribute names containing digits were not allowed, this is now fixed,
  allowing attributes like 'name1' but not '1name1'.
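
  One way to picture the attribute-name rule in the last item above (digits
  are fine, just not as the first character) is a small regular expression
  check. This is only a sketch: the class name, the method name, and the
  exact pattern (including allowing '_') are assumptions made for
  illustration and are not taken from ANTLR's action translator.

```java
import java.util.regex.Pattern;

public class AttributeNameCheck {
    // Assumed rule for illustration: a letter or underscore first,
    // then any mix of letters, digits, and underscores.
    private static final Pattern NAME =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Returns true if s looks like a legal attribute name under the assumed rule. */
    public static boolean isValidAttributeName(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidAttributeName("name1"));  // prints "true"
        System.out.println(isValidAttributeName("1name1")); // prints "false"
    }
}
```

  With this check, 'name1' passes and '1name1' is rejected, matching the
  two examples given in the changelog entry.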

November 19, 2006

* Removed LeftRecursionMessage and apparatus because it seems that I check
  for left recursion upfront before analysis and everything gets specified as
  recursion cycles at this point.

November 16, 2006

* TokenRewriteStream.replace was not passing programName to next method.

November 15, 2006

* updated DOT files for DFA generation to make smaller circles.

* made epsilon edges italics in the NFA diagrams.

3.0b5 - November 15, 2006

The biggest thing is that your grammar file names must match the grammar name
inside (your generated class names will also be different) and we use
$channel=HIDDEN now instead of channel=99 inside lexer actions.
Should be compatible other than that. Please look at complete list of
changes.

November 14, 2006

* Force token index to be -1 for CommonIndex in case not set.

November 11, 2006

* getUniqueID for TreeAdaptor now uses identityHashCode instead of hashCode.

November 10, 2006

* No grammar nondeterminism warning now when wildcard '.' is final alt.
  Examples:

    a : A | B | . ;

    A : 'a'
      | .
      ;

    SL_COMMENT
        : '//' (options {greedy=false;} : .)* '\r'? '\n'
        ;

    SL_COMMENT2
        : '//' (options {greedy=false;} : 'x'|.)* '\r'? '\n'
        ;


November 8, 2006

* Syntactic predicates did not get hoisted properly upon non-LL(*)
  decision. Other hoisting issues fixed. Cleaned up code.

* Removed failsafe that checked to see if I'm spending too much time on
  a single DFA; I don't think we need it anymore.

November 3, 2006

* $text, $line, etc... were not working in assignments. Fixed and added
  test case.

* $label.text translated to label.getText in lexer even if label was on a char

November 2, 2006

* Added error if you don't specify what the AST type is; actions in tree
  grammar won't work without it.

  $ cat x.g
  tree grammar x;
  a : ID {String s = $ID.text;} ;

  ANTLR Parser Generator Early Access Version 3.0b5 (??, 2006) 1989-2006
  error: x.g:0:0: (152) tree grammar x has no ASTLabelType option

November 1, 2006

* $text, $line, etc... were not working properly within lexer rule.

October 32, 2006

* Finally actions now execute before dynamic scopes are popped in the
  rule. Previously it was not possible to access the rule's scoped variables
  in a finally action.

October 29, 2006

* Altered ActionTranslator to emit errors on setting read-only attributes
  such as $start, $stop, $text in a rule. Also forbid setting any attributes
  in rules/tokens referenced by a label or name.
  Setting dynamic scopes' attributes and your own parameter attributes
  is legal.

October 27, 2006

* Altered how ANTLR figures out what decision is associated with which
  block of grammar. Makes ANTLRWorks correctly find DFA for a block.

October 26, 2006

* Fixed bug where EOT transitions led to no NFA configs in a DFA state,
  yielding an error in DFA table generation.

* renamed action.g to ActionTranslator.g
  the ActionTranslator class is now called ActionTranslatorLexer, as ANTLR
  generates this classname now. Fixed rest of codebase accordingly.

* added rules recognizing setting of scopes' attributes to ActionTranslator.g
  the Objective C target needed access to the right-hand side of the assignment
  in order to generate correct code

* changed ANTLRCore.sti to reflect the new mandatory templates to support the above
  namely: scopeSetAttributeRef, returnSetAttributeRef and the ruleSetPropertyRef_*
  templates, with the exception of ruleSetPropertyRef_text. we cannot set this attribute

October 19, 2006

* Fixed 2 bugs in DFA conversion that caused exceptions.
  altered functionality of getMinElement so it ignores elements<0.

October 18, 2006

* moved resetStateNumbersToBeContiguous() to after issuing of warnings;
  an internal error in that routine should make more sense as issues
  with decision will appear first.

* fixed cut/paste bug I introduced when I fixed EOF in min/max
  bug. Prevented C grammar from working briefly.

October 17, 2006

* Removed a failsafe that seems to be unnecessary that ensured DFA didn't
  get too big. It was resulting in some failures in code generation that
  led me on quite a strange debugging trip.

October 16, 2006

* Use channel=HIDDEN not channel=99 to put tokens on hidden channel.

October 12, 2006

* ANTLR now has a customizable message format for errors and warnings,
  to make it easier to fulfill requirements by IDEs and such.
  The format to be used can be specified via the '-message-format name'
  command line switch. The default for name is 'antlr', also available
  at the moment is 'gnu'. This is done via StringTemplate, for details
  on the requirements look in org/antlr/tool/templates/messages/formats/

* line numbers for lexers in combined grammars are now reported correctly.

September 29, 2006

* ANTLRReaderStream improperly checked for end of input.

September 28, 2006

* For ANTLRStringStream, LA(-1) was off by one...gave you LA(-2).

3.0b4 - August 24, 2006

* error when no rules in grammar. doesn't crash now.

* Token is now an interface.

* remove dependence on non runtime classes in runtime package.

* filename and grammar name must be same Foo in Foo.g. Generates FooParser,
  FooLexer, ... Combined grammar Foo generates Foo$Lexer.g which generates
  FooLexer.java. tree grammars generate FooTreeParser.java

August 24, 2006

* added C# target to lib, codegen, templates

August 11, 2006

* added tree arg to navigation methods in treeadaptor

August 07, 2006

* fixed bug related to (a|)+ on end of lexer rules. crashed instead
  of warning.

* added warning that interpreter doesn't do synpreds yet

* allow different source of classloader:
    ClassLoader cl = Thread.currentThread().getContextClassLoader();
    if ( cl==null ) {
        cl = this.getClass().getClassLoader();
    }


July 26, 2006

* compressed DFA edge tables significantly. All edge tables are
  unique. The transition table can reuse arrays. Look like this now:

    public static readonly short[] DFA30_transition0 =
        new short[] { 46, 46, -1, 46, 46, -1, -1, -1, -1, -1, -1, -1,...};
    public static readonly short[] DFA30_transition1 =
        new short[] { 21 };
    public static readonly short[][] DFA30_transition = {
        DFA30_transition0,
        DFA30_transition0,
        DFA30_transition1,
        ...
    };

* If you defined both a label like EQ and '=', sometimes the '=' was
  used instead of the EQ label.

* made headerFile template have same arg list as outputFile for consistency

* outputFile, lexer, genericParser, parser, treeParser templates
  reference cyclicDFAs attribute which was no longer used after I
  started the new table-based DFA.
I made cyclicDFADescriptors 922 an argument to outputFile and headerFile (only). I think this is 923 correct as only OO languages will want the DFA in the recognizer. 924 At the top level, C and friends can use it. Changed name to use 925 cyclicDFAs again as it's a better name probably. Removed parameter 926 from the lexer, ... For example, my parser template says this now: 927 928 <cyclicDFAs:cyclicDFA()> <! dump tables for all DFA !> 929 930 * made all token ref token types go thru code gen's 931 getTokenTypeAsTargetLabel() 932 933 * no more computing DFA transition tables for acyclic DFA. 934 935 July 25, 2006 936 937 * fixed a place where I was adding syn predicates into rewrite stuff. 938 939 * turned off invalid token index warning in AW support; had a problem. 940 941 * bad location event generated with -debug for synpreds in autobacktrack mode. 942 943 July 24, 2006 944 945 * changed runtime.DFA so that it treats all chars and token types as 946 char (unsigned 16 bit int). -1 then becomes '\uFFFF', or 65535. 947 948 * changed MAX_STATE_TRANSITIONS_FOR_TABLE to be 65534 by default 949 now. This means that all states can use a table to do transitions. 950 951 * was not making synpreds on (C)* type loops with backtrack=true 952 953 * was copying tree stuff and actions into synpreds with backtrack=true 954 955 * was making synpreds on even single alt rules / blocks with backtrack=true 956 957 3.0b3 - July 21, 2006 958 959 * ANTLR fails to analyze complex decisions much less frequently. It 960 turns out that the set of decisions for which ANTLR fails (times 961 out) is the same set (so far) of non-LL(*) decisions. Moreover, I'm 962 able to detect this situation quickly and report rather than timing 963 out. Errors look like: 964 965 java.g:468:23: [fatal] rule concreteDimensions has non-LL(*) 966 decision due to recursive rule invocations in alts 1,2. Resolve 967 by left-factoring or using syntactic predicates with fixed k 968 lookahead or use backtrack=true option.
969 970 This message only appears when k=*. 971 972 * Shortened no viable alt messages to not include decision 973 description: 974 975 [compilationUnit, declaration]: line 8:8 decision=<<67:1: declaration 976 : ( ( fieldDeclaration )=> fieldDeclaration | ( methodDeclaration )=> 977 methodDeclaration | ( constructorDeclaration )=> 978 constructorDeclaration | ( classDeclaration )=> classDeclaration | ( 979 interfaceDeclaration )=> interfaceDeclaration | ( blockDeclaration )=> 980 blockDeclaration | emptyDeclaration );>> state 3 (decision=14) no 981 viable alt; token=[@1,184:187='java',<122>,8:8] 982 983 too long and hard to read. 984 985 July 19, 2006 986 987 * Code gen bug: states with no emanating edges were ignored by ST. 988 Now an empty list is used. 989 990 * Added grammar parameter to recognizer templates so they can access 991 properties like getName(), ... 992 993 July 10, 2006 994 995 * Fixed the gated pred merged state bug. Added unit test. 996 997 * added new method to Target: getTokenTypeAsTargetLabel() 998 999 July 7, 2006 1000 1001 * I was doing an AND instead of OR in the gated predicate stuff. 1002 Thanks to Stephen Kou! 1003 1004 * Reduce op for combining predicates was insanely slow sometimes and 1005 didn't actually work well. Now it's fast and works. 1006 1007 * There is a bug in merging of DFA stop states related to gated 1008 preds...turned it off for now. 1009 1010 3.0b2 - July 5, 2006 1011 1012 July 5, 2006 1013 1014 * token emission not properly protected in lexer filter mode. 1015 1016 * EOT, EOT DFA state transition tables should be init'd to -1 (only 1017 was doing this for compressed tables). Fixed. 1018 1019 * in trace mode, exit method not shown for memoized rules 1020 1021 * added -Xmaxdfaedges to allow you to increase number of edges allowed 1022 for a single DFA state before it becomes "special" and can't fit in 1023 a simple table. 1024 1025 * Bug in tables. Short are signed so min/max tables for DFA are now 1026 char[]. Bizarre. 
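The signedness problem behind the "Bug in tables" entry above can be reproduced in plain Java. The sketch below is illustrative only: DfaTableSignedness and its methods are hypothetical names, not part of the ANTLR runtime. It shows that a Java short is a signed 16-bit value, so a table entry such as 65535 wraps to -1, while a char is an unsigned 16-bit value and keeps the full 0..65535 range.

```java
// Illustrative sketch only; DfaTableSignedness is a hypothetical name, not an ANTLR class.
class DfaTableSignedness {
    /** Stores a value in a short slot and reads it back as an int. */
    static int roundTripShort(int value) {
        short slot = (short) value; // signed 16 bit: values above 32767 wrap negative
        return slot;
    }

    /** Stores a value in a char slot and reads it back as an int. */
    static int roundTripChar(int value) {
        char slot = (char) value;   // unsigned 16 bit: 0..65535 survives intact
        return slot;
    }

    public static void main(String[] args) {
        System.out.println(roundTripShort(65535)); // -1
        System.out.println(roundTripChar(65535));  // 65535
        System.out.println(roundTripChar(-1));     // 65535, i.e. '\uFFFF'
    }
}
```

The last call matches the July 24 note that -1 becomes '\uFFFF' once everything is treated as char.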
1027 1028 July 3, 2006 1029 1030 * Added a method to reset the tool error state for the current thread. 1031 See ErrorManager.java 1032 1033 * [Got this working properly today] backtrack mode that lets you type 1034 in any old crap and ANTLR will backtrack if it can't figure out what 1035 you meant. No errors are reported by antlr during analysis. It 1036 implicitly adds a syn pred in front of every production, using them 1037 only if static grammar LL(*) analysis fails. Syn pred code is not 1038 generated if the pred is not used in a decision. 1039 1040 This is essentially a rapid prototyping mode. 1041 1042 * Added backtracking report to the -report option 1043 1044 * Added NFA->DFA conversion early termination report to the -report option 1045 1046 * Added grammar level k and backtrack options to -report 1047 1048 * Added a dozen unit tests to test autobacktrack NFA construction. 1049 1050 * If you are using filter mode, you must manually use option 1051 memoize=true now. 1052 1053 July 2, 2006 1054 1055 * Added k=* option so you can set k=2, for example, on whole grammar, 1056 but an individual decision can be LL(*). 1057 1058 * memoize option for grammars, rules, blocks. Removed -nomemo cmd-line option 1059 1060 * bug in DOT generator for DFA; fixed. 1061 1062 * runtime.DFA reported errors even when backtracking 1063 1064 July 1, 2006 1065 1066 * Added -X option list to help 1067 1068 * Syn preds were being hoisted into other rules, causing lots of extra 1069 backtracking. 1070 1071 June 29, 2006 1072 1073 * unnecessary files removed during build. 1074 1075 * Matt Benson updated build.xml 1076 1077 * Detecting use of synpreds in analysis now instead of codegen. In 1078 this way, I can avoid analyzing decisions in synpreds for synpreds 1079 not used in a DFA for a real rule. This is used to optimize things 1080 for the backtrack option.
1081 1082 * Code gen must add _fragment or whatever to end of pred name in 1083 template synpredRule to avoid having ANTLR know anything about 1084 method names. 1085 1086 * Added -IdbgST option to emit ST delimiters at start/stop of all 1087 templates spit out. 1088 1089 June 28, 2006 1090 1091 * Tweaked message when ANTLR cannot handle analysis. 1092 1093 3.0b1 - June 27, 2006 1094 1095 June 24, 2006 1096 1097 * syn preds no longer generate little static classes; they also don't 1098 generate a whole bunch of extra crap in the rules built to test syn 1099 preds. Removed GrammarFragmentPointer class from runtime. 1100 1101 June 23-24, 2006 1102 1103 * added output option to -report output. 1104 1105 * added profiling info: 1106 Number of rule invocations in "guessing" mode 1107 number of rule memoization cache hits 1108 number of rule memoization cache misses 1109 1110 * made DFA DOT diagrams go left to right not top to bottom 1111 1112 * I try to handle recursion overflow states now by resolving these states 1113 with semantic/syntactic predicates if they exist. The DFA is then 1114 deterministic rather than simply resolving by choosing first 1115 nondeterministic alt. I used to generate errors: 1116 1117 ~/tmp $ java org.antlr.Tool -dfa t.g 1118 ANTLR Parser Generator Early Access Version 3.0b2 (July 5, 2006) 1989-2006 1119 t.g:2:5: Alternative 1: after matching input such as A A A A A decision cannot predict what comes next due to recursion overflow to b from b 1120 t.g:2:5: Alternative 2: after matching input such as A A A A A decision cannot predict what comes next due to recursion overflow to b from b 1121 1122 Now, ANTLR uses predicates if available and emits no warnings. 1123 1124 * made sem preds share accept states. Previously, multiple preds in a 1125 decision forked new accepts each time for each nondet state. 1126 1127 June 19, 2006 1128 1129 * Need parens around the prediction expressions in templates.
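The entry above does not say exactly how the missing parens went wrong, so the following is only one plausible illustration, written as a self-contained Java sketch with hypothetical names (PredicateParens, token type A). If a predicate such as p1||p2 is spliced into a lookahead test without parentheses, Java's precedence rules (&& binds tighter than ||) change the meaning of the whole test.

```java
// Illustrative sketch only; names and the token type value are hypothetical.
class PredicateParens {
    static final int A = 4; // stand-in token type

    /** Predicate text "p1 || p2" pasted into the test without parentheses. */
    static boolean withoutParens(int la, boolean p1, boolean p2) {
        return la == A && p1 || p2;   // parsed as (la == A && p1) || p2
    }

    /** Same test with the predicate wrapped in parentheses. */
    static boolean withParens(int la, boolean p1, boolean p2) {
        return la == A && (p1 || p2);
    }

    public static void main(String[] args) {
        int la = 5; // lookahead is NOT token A
        System.out.println(withoutParens(la, false, true)); // true: predicts the alt anyway
        System.out.println(withParens(la, false, true));    // false: lookahead still gates the alt
    }
}
```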
1130 1131 * Referencing $ID.text in an action forced bad code gen in lexer rule ID. 1132 1133 * Fixed a bug in how predicates are collected. The definition of 1134 "last predicated alternative" was incorrect in the analysis. Further, 1135 gated predicates incorrectly missed a case where an edge should become 1136 true (a tautology). 1137 1138 * Removed an unnecessary input.consume() reference in the runtime/DFA class. 1139 1140 June 14, 2006 1141 1142 * -> ($rulelabel)? didn't generate proper code for ASTs. 1143 1144 * bug in code gen (did not compile) 1145 a : ID -> ID 1146 | ID -> ID 1147 ; 1148 Problem is repeated ref to ID from left side. Juergen pointed this out. 1149 1150 * use of tokenVocab with missing file yielded exception 1151 1152 * (A|B)=> foo yielded an exception as (A|B) is a set not a block. Fixed. 1153 1154 * Didn't set ID1= and INT1= for this alt: 1155 | ^(ID INT+ {System.out.print(\"^(\"+$ID+\" \"+$INT+\")\");}) 1156 1157 * Fixed so repeated dangling state errors only occur once like: 1158 t.g:4:17: the decision cannot distinguish between alternative(s) 2,1 for at least one input sequence 1159 1160 * tracking of rule elements was on (making list defs at start of 1161 method) with templates instead of just with ASTs. Turned off. 1162 1163 * Doesn't crash when you give it a missing file now. 1164 1165 * -report: add output info: how many LL(1) decisions. 1166 1167 June 13, 2006 1168 1169 * ^(ROOT ID?) Didn't work; nor did any other nullable child list such as 1170 ^(ROOT ID* INT?). Now, I check to see if child list is nullable using 1171 Grammar.LOOK() and, if so, I generate an "IF lookahead is DOWN" gate 1172 around the child list so the whole thing is optional. 1173 1174 * Fixed a bug in LOOK that made it not look through nullable rules. 1175 1176 * Using AST suffixes or -> rewrite syntax now gives an error w/o a grammar 1177 output option. Used to crash ;) 1178 1179 * References to EOF ended up with improper -1 refs instead of EOF in output. 
1180 1181 * didn't warn of ambig ref to $expr in rewrite; fixed. 1182 list 1183 : '[' expr 'for' type ID 'in' expr ']' 1184 -> comprehension(expr={$expr.st},type={},list={},i={}) 1185 ; 1186 1187 June 12, 2006 1188 1189 * EOF works in the parser as a token name. 1190 1191 * Rule b:(A B?)*; didn't display properly in AW due to the way ANTLR 1192 generated NFA. 1193 1194 * "scope x;" in a rule for unknown x gives no error. Fixed. Added unit test. 1195 1196 * Label type for refs to start/stop in tree parser and other parsers were 1197 not used. Lots of casting. Ick. Fixed. 1198 1199 * couldn't refer to $tokenlabel in isolation; but need so we can test if 1200 something was matched. Fixed. 1201 1202 * Lots of little bugs fixed in $x.y, %... translation due to new 1203 action translator. 1204 1205 * Improperly tracking block nesting level; result was that you couldn't 1206 see $ID in action of rule "a : A+ | ID {Token t = $ID;} | C ;" 1207 1208 * a : ID ID {$ID.text;} ; did not get a warning about ambiguous $ID ref. 1209 1210 * No error was found on $COMMENT.text: 1211 1212 COMMENT 1213 : '/*' (options {greedy=false;} : . )* '*/' 1214 {System.out.println("found method "+$COMMENT.text);} 1215 ; 1216 1217 $enclosinglexerrule scope does not exist. Use text or setText() here. 1218 1219 June 11, 2006 1220 1221 * Single return values are initialized now to default or to your spec. 1222 1223 * cleaned up input stream stuff. Added ANTLRReaderStream, ANTLRInputStream 1224 and refactored. You can specify encodings now on ANTLRFileStream (and 1225 ANTLRInputStream) now. 1226 1227 * You can set text local var now in a lexer rule and token gets that text. 1228 start/stop indexes are still set for the token. 1229 1230 * Changed lexer slightly. Calling a nonfragment rule from a 1231 nonfragment rule does not set the overall token. 1232 1233 June 10, 2006 1234 1235 * Fixed bug where unnecessary escapes yield char==0 like '\{'. 1236 1237 * Fixed analysis bug. 
This grammar didn't report a recursion warning: 1238 x : y X 1239 | y Y 1240 ; 1241 y : L y R 1242 | B 1243 ; 1244 The DFAState.equals() method was messed up. 1245 1246 * Added @synpredgate {...} action so you can tell ANTLR how to gate actions 1247 in/out during syntactic predicate evaluation. 1248 1249 * Fuzzy parsing should be more efficient. It should backtrack over a rule 1250 and then rewind and do it again "with feeling" to exec actions. It was 1251 actually doing it 3x not 2x. 1252 1253 June 9, 2006 1254 1255 * Gutted and rebuilt the action translator for $x.y, $x::y, ... 1256 Uses ANTLR v3 now for the first time inside v3 source. :) 1257 ActionTranslator.java 1258 1259 * Fixed a bug where referencing a return value on a rule didn't work 1260 because later a ref to that rule's predefined properties didn't 1261 properly force a return value struct to be built. Added unit test. 1262 1263 June 6, 2006 1264 1265 * New DFA mechanisms. Cyclic DFA are implemented as state tables, 1266 encoded via strings as java cannot handle large static arrays :( 1267 States with edges emanating that have predicates are specially 1268 treated. A method is generated to do these states. The DFA 1269 simulation routine uses the "special" array to figure out if the 1270 state is special. See March 25, 2006 entry for description: 1271 http://www.antlr.org/blog/antlr3/codegen.tml. analysis.DFA now has 1272 all the state tables generated for code gen. CyclicCodeGenerator.java 1273 disappeared as it's unneeded code. :) 1274 1275 * Internal general clean up of the DFA.states vs uniqueStates thing. 1276 Fixed lookahead decisions no longer fill uniqueStates. Waste of 1277 time. Also noted that when adding sem pred edges, I didn't check 1278 for state reuse. Fixed. 1279 1280 June 4, 2006 1281 1282 * When resolving ambig DFA states predicates, I did not add the new states 1283 to the list of unique DFA states. 
No observable effect on output except 1284 that DFA state numbers were not always contiguous for predicated decisions. 1285 I needed this fix for new DFA tables. 1286 1287 3.0ea10 - June 2, 2006 1288 1289 June 2, 2006 1290 1291 * Improved grammar stats and added syntactic pred tracking. 1292 1293 June 1, 2006 1294 1295 * Due to a type mismatch, the DebugParser.recoverFromMismatchedToken() 1296 method was not called. Debug events for mismatched token error 1297 notification were not sent to ANTLRWorks probably 1298 1299 * Added getBacktrackingLevel() for any recognizer; needed for profiler. 1300 1301 * Only writes profiling data for antlr grammar analysis with -profile set 1302 1303 * Major update and bug fix to (runtime) Profiler. 1304 1305 May 27, 2006 1306 1307 * Added Lexer.skip() to force lexer to ignore current token and look for 1308 another; no token is created for current rule and is not passed on to 1309 parser (or other consumer of the lexer). 1310 1311 * Parsers are much faster now. I removed use of java.util.Stack for pushing 1312 follow sets and use a hardcoded array stack instead. Dropped from 1313 5900ms to 3900ms for parse+lex time parsing entire java 1.4.2 source. Lex 1314 time alone was about 1500ms. Just looking at parse time, we get about 2x 1315 speed improvement. :) 1316 1317 May 26, 2006 1318 1319 * Fixed NFA construction so it generates NFA for (A*)* such that ANTLRWorks 1320 can display it properly. 1321 1322 May 25, 2006 1323 1324 * added abort method to Grammar so AW can terminate the conversion if it's 1325 taking too long. 1326 1327 May 24, 2006 1328 1329 * added method to get left recursive rules from grammar without doing full 1330 grammar analysis. 1331 1332 * analysis, code gen not attempted if serious error (like 1333 left-recursion or missing rule definition) occurred while reading 1334 the grammar in and defining symbols. 
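The May 27 entry above credits part of the parser speedup to replacing java.util.Stack with a hardcoded array stack for follow sets. A minimal sketch of that idea follows, assuming a growable array plus an integer stack pointer; FollowStack and its members are hypothetical names chosen for illustration and are not taken from the runtime source. One likely source of the saving is that java.util.Stack extends Vector, whose methods are synchronized.

```java
// Illustrative sketch only; FollowStack is a hypothetical name, not the runtime's class.
class FollowStack {
    private Object[] items = new Object[4]; // small initial capacity so growth is exercised
    private int sp = -1;                    // index of the top element; -1 means empty

    /** Pushes a follow set, doubling the backing array when it is full. */
    void push(Object followSet) {
        if (sp + 1 >= items.length) {
            Object[] bigger = new Object[items.length * 2];
            System.arraycopy(items, 0, bigger, 0, items.length);
            items = bigger;
        }
        items[++sp] = followSet;
    }

    /** Pops and returns the most recently pushed follow set. */
    Object pop() {
        if (sp < 0) throw new IllegalStateException("stack is empty");
        Object top = items[sp];
        items[sp--] = null; // drop the reference so it can be garbage collected
        return top;
    }

    int size() { return sp + 1; }

    public static void main(String[] args) {
        FollowStack stack = new FollowStack();
        for (int i = 0; i < 10; i++) stack.push("follow" + i);
        System.out.println(stack.pop());  // follow9
        System.out.println(stack.size()); // 9
    }
}
```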
1335 1336 * added amazing optimization; reduces analysis time by 90% for java 1337 grammar; simple IF statement addition! 1338 1339 3.0ea9 - May 20, 2006 1340 1341 * added global k value for grammar to limit lookahead for all decisions unless 1342 overridden in a particular decision. 1343 1344 * added failsafe so that any decision taking longer than 2 seconds to create 1345 the DFA will fall back on k=1. Use -ImaxtimeforDFA n (in ms) to set the time. 1346 1347 * added an option (turned off for now) to use multiple threads to 1348 perform grammar analysis. Not much help on a 2-CPU computer as 1349 garbage collection seems to peg the 2nd CPU already. :( Gotta wait for 1350 a 4 CPU box ;) 1351 1352 * switched from #src to // $ANTLR src directive. 1353 1354 * CommonTokenStream.getTokens() looked past end of buffer sometimes. fixed. 1355 1356 * unicode literals didn't really work in DOT output and generated code. fixed. 1357 1358 * fixed the unit test rig so it compiles nicely with Java 1.5 1359 1360 * Added ant build.xml file (reads build.properties file) 1361 1362 * predicates sometimes failed to compile/eval properly due to missing (...) 1363 in IF expressions. Forced (..) 1364 1365 * (...)? with only one alt were not optimized. Was: 1366 1367 // t.g:4:7: ( B )? 1368 int alt1=2; 1369 int LA1_0 = input.LA(1); 1370 if ( LA1_0==B ) { 1371 alt1=1; 1372 } 1373 else if ( LA1_0==-1 ) { 1374 alt1=2; 1375 } 1376 else { 1377 NoViableAltException nvae = 1378 new NoViableAltException("4:7: ( B )?", 1, 0, input); 1379 throw nvae; 1380 } 1381 1382 is now: 1383 1384 // t.g:4:7: ( B )? 1385 int alt1=2; 1386 int LA1_0 = input.LA(1); 1387 if ( LA1_0==B ) { 1388 alt1=1; 1389 } 1390 1391 Smaller, faster and more readable. 1392 1393 * Allow manual init of return values now: 1394 functionHeader returns [int x=3*4, char (*f)()=null] : ... 
; 1395 1396 * Added optimization for DFAs that fixed a codegen bug with rules in lexer: 1397 EQ : '=' ; 1398 ASSIGNOP : '=' | '+=' ; 1399 EQ is a subset of the other rule. It did not give an error, which is 1400 correct, but generated bad code. 1401 1402 * ANTLR was sending column not char position to ANTLRWorks. 1403 1404 * Bug fix: location 0, 0 emitted for synpreds and empty alts. 1405 1406 * debugging event handshake now sends grammar file name. Added getGrammarFileName() to recognizers. Java.stg generates it: 1407 1408 public String getGrammarFileName() { return "<fileName>"; } 1409 1410 * tree parsers can do arbitrary lookahead now including backtracking. I 1411 updated CommonTreeNodeStream. 1412 1413 * added events for debugging tree parsers: 1414 1415 /** Input for a tree parser is an AST, but we know nothing for sure 1416 * about a node except its type and text (obtained from the adaptor). 1417 * This is the analog of the consumeToken method. Again, the ID is 1418 * the hashCode usually of the node so it only works if hashCode is 1419 * not implemented. 1420 */ 1421 public void consumeNode(int ID, String text, int type); 1422 1423 /** The tree parser looked ahead */ 1424 public void LT(int i, int ID, String text, int type); 1425 1426 /** The tree parser has popped back up from the child list to the 1427 * root node. 1428 */ 1429 public void goUp(); 1430 1431 /** The tree parser has descended to the first child of the current 1432 * root node. 1433 */ 1434 public void goDown(); 1435 1436 * Added DebugTreeNodeStream and DebugTreeParser classes 1437 1438 * Added ctor because the debug tree node stream will need to ask questions about nodes and since nodes are just Object, it needs an adaptor to decode the nodes and get text/type info for the debugger.
1439 1440 public CommonTreeNodeStream(TreeAdaptor adaptor, Tree tree); 1441 1442 * added getter to TreeNodeStream: 1443 public TreeAdaptor getTreeAdaptor(); 1444 1445 * Implemented getText/getType in CommonTreeAdaptor. 1446 1447 * Added TraceDebugEventListener that can dump all events to stdout. 1448 1449 * I broke down and made Tree implement getText 1450 1451 * tree rewrites now gen location debug events. 1452 1453 * added AST debug events to listener; added blank listener for convenience 1454 1455 * updated debug events to send begin/end backtrack events for debugging 1456 1457 * with a : (b->b) ('+' b -> ^(PLUS $a b))* ; you get b[0] each time as 1458 there is no loop in rewrite rule itself. Need to know context that 1459 the -> is inside the rule and hence b means last value of b not all 1460 values. 1461 1462 * Bug in TokenRewriteStream; ops at indexes < start index blocked proper op. 1463 1464 * Actions in ST rewrites "-> ({$op})()" were not translated 1465 1466 * Added new action name: 1467 1468 @rulecatch { 1469 catch (RecognitionException re) { 1470 reportError(re); 1471 recover(input,re); 1472 } 1473 catch (Throwable t) { 1474 System.err.println(t); 1475 } 1476 } 1477 Overrides rule catch stuff. 1478 1479 * Isolated $ refs caused exception 1480 1481 3.0ea8 - March 11, 2006 1482 1483 * added @finally {...} action like @init for rules. Executes in 1484 finally block (java target) after all other stuff like rule memoization. 1485 No code changes needed; ST just refs a new action: 1486 <ruleDescriptor.actions.finally> 1487 1488 * hideous bug fixed: PLUS='+' didn't result in '+' rule in lexer 1489 1490 * TokenRewriteStream didn't do toString() right when no rewrites had been done.
1491 1492 * lexer errors in interpreter were not printed properly 1493 1494 * bitsets are dumped in hex not decimal now for FOLLOW sets 1495 1496 * /* epsilon */ is not printed now when printing out grammars with empty alts 1497 1498 * Fixed another bug in tree rewrite stuff where it was checking that elements 1499 had at least one element. Strange...commented out for now to see if I can remember what's up. 1500 1501 * Tree rewrites had problems when you didn't have x+=FOO variables. Rules 1502 like this work now: 1503 1504 a : (x=ID)? y=ID -> ($x $y)?; 1505 1506 * filter=true for lexers turns on k=1 and backtracking for every token 1507 alternative. Put the rules in priority order. 1508 1509 * added getLine() etc... to Tree to support better error reporting for 1510 trees. Added MismatchedTreeNodeException. 1511 1512 * $templates::foo() is gone. added % as special template symbol. 1513 %foo(a={},b={},...) ctor (even shorter than $templates::foo(...)) 1514 %({name-expr})(a={},...) indirect template ctor reference 1515 1516 The above are parsed by antlr.g and translated by codegen.g 1517 The following are parsed manually here: 1518 1519 %{string-expr} anonymous template from string expr 1520 %{expr}.y = z; set template attribute y of StringTemplate-typed expr to z 1521 %x.y = z; set template attribute y of x (always set never get attr) 1522 to z [languages like python without ';' must still use the 1523 ';' which the code generator is free to remove during code gen] 1524 1525 * -> ({expr})(a={},...) notation for indirect template rewrite. 1526 expr is the name of the template. 1527 1528 * $x[i]::y and $x[-i]::y notation for accessing absolute scope stack 1529 indexes and relative negative scopes. $x[-1]::y is the y attribute 1530 of the previous scope (stack top - 1).
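The two indexing forms in the $x[i]::y entry above can be pictured with an ordinary list used as a stack. This is a rough sketch under stated assumptions, not the code ANTLR generates: ScopeStackIndexing and its methods are hypothetical, the bottom of the stack is taken to be index 0, and $x[-i] is read as "i levels below the top", which is consistent with $x[-1] meaning stack top - 1.

```java
// Illustrative sketch only; not the generated code. Index 0 is assumed to be the
// bottom (outermost) scope and the last element the current scope.
class ScopeStackIndexing {
    /** $x[i]::y style access: absolute position counted from the bottom of the stack. */
    static <T> T absolute(java.util.List<T> stack, int i) {
        return stack.get(i);
    }

    /** $x[-i]::y style access: i levels below the top of the stack. */
    static <T> T relative(java.util.List<T> stack, int i) {
        return stack.get(stack.size() - 1 - i);
    }

    public static void main(String[] args) {
        java.util.List<String> scopes = new java.util.ArrayList<String>();
        scopes.add("outer");
        scopes.add("middle");
        scopes.add("inner"); // current scope (stack top)
        System.out.println(absolute(scopes, 0)); // outer
        System.out.println(relative(scopes, 1)); // middle, i.e. the previous scope
    }
}
```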
1531 1532 * filter=true mode for lexers; can do this now...upon mismatch, just 1533 consumes a char and tries again: 1534 lexer grammar FuzzyJava; 1535 options {filter=true;} 1536 1537 FIELD 1538 : TYPE WS? name=ID WS? (';'|'=') 1539 {System.out.println("found var "+$name.text);} 1540 ; 1541 1542 * refactored char streams so ANTLRFileStream is now a subclass of 1543 ANTLRStringStream. 1544 1545 * char streams for lexer now allow nested backtracking in lexer. 1546 1547 * added TokenLabelType for lexer/parser for all token labels 1548 1549 * line numbers for error messages were not updated properly in antlr.g 1550 for strings, char literals and <<...>> 1551 1552 * init action in lexer rules was before the type,start,line,... decls. 1553 1554 * Tree grammars can now specify output; I've only tested output=template 1555 though. 1556 1557 * You can reference EOF now in the parser and lexer. It's just token type 1558 or char value -1. 1559 1560 * Bug fix: $ID refs in the *lexer* were all messed up. Cleaned up the 1561 set of properties available... 1562 1563 * Bug fix: .st not found in rule ref when rule has scope: 1564 field 1565 scope { 1566 StringTemplate funcDef; 1567 } 1568 : ... 1569 {$field::funcDef = $field.st;} 1570 ; 1571 it gets field_stack.st instead 1572 1573 * return in backtracking must return retval or null if return value. 1574 1575 * $property within a rule now works like $text, $st, ... 1576 1577 * AST/Template Rewrites were not gated by backtracking==0 so they 1578 executed even when guessing. Auto AST construction is now gated also. 1579 1580 * CommonTokenStream was somehow returning tokens not text in toString() 1581 1582 * added useful methods to runtime.BitSet and also to CommonToken so you can 1583 update the text. Added nice Token stream method: 1584 1585 /** Given a start and stop index, return a List of all tokens in 1586 * the token type BitSet. Return null if no tokens were found. This 1587 * method looks at both on and off channel tokens.
1588 */ 1589 public List getTokens(int start, int stop, BitSet types); 1590 1591 * literals are now passed in the .tokens files so you can ref them in 1592 tree parsers, for example. 1593 1594 * added basic exception handling; no labels, just general catches: 1595 1596 a : {;}A | B ; 1597 exception 1598 catch[RecognitionException re] { 1599 System.out.println("recog error"); 1600 } 1601 catch[Exception e] { 1602 System.out.println("error"); 1603 } 1604 1605 * Added method to TokenStream: 1606 public String toString(Token start, Token stop); 1607 1608 * antlr generates #src lines in lexer grammars generated from combined grammars 1609 so error messages refer to original file. 1610 1611 * lexers generated from combined grammars now use the original formatting. 1612 1613 * predicates have $x.y stuff translated now. Warning: predicates might be 1614 hoisted out of context. 1615 1616 * return values in return val structs are now public. 1617 1618 * output=template with return values on rules was broken. I assume return values with ASTs were broken too. Fixed. 1619 1620 3.0ea7 - December 14, 2005 1621 1622 * Added -print option to print out grammar w/o actions 1623 1624 * Renamed BaseParser to be BaseRecognizer and even made Lexer derive from 1625 this; nice as it now shares backtracking support code. 1626 1627 * Added syntactic predicates (...)=>. See December 4, 2005 entry: 1628 1629 http://www.antlr.org/blog/antlr3/lookahead.tml 1630 1631 Note that we have a new option for turning off rule memoization during 1632 backtracking: 1633 1634 -nomemo when backtracking don't generate memoization code 1635 1636 * Predicates are now tested in the order that you specify the alts. If you 1637 leave the last alt "naked" (w/o pred), it will assume a true pred rather 1638 than the union of other preds.
1639 1640 * Added gated predicates "{p}?=>" that literally turn off a production whereas 1641 disambiguating predicates are only hoisted into the predictor when syntax alone 1642 is not sufficient to uniquely predict alternatives. 1643 1644 A : {p}? => "a" ; 1645 B : {!p}? => ("a"|"b")+ ; 1646 1647 * bug fixed related to predicates in predictor 1648 lexer grammar w; 1649 A : {p}? "a" ; 1650 B : {!p}? ("a"|"b")+ ; 1651 DFA is correct. A state splits for input "a" on the pred. 1652 Generated code though was hosed. No pred tests in prediction code! 1653 I added testLexerPreds() and others in TestSemanticPredicateEvaluation.java 1654 1655 * added execAction template in case we want to do something in front of 1656 each action execution or something. 1657 1658 * left-recursive cycles from rules w/o decisions were not detected. 1659 1660 * undefined lexer rules were not announced! fixed. 1661 1662 * unreachable messages for Tokens rule now indicate rule name not alt. E.g., 1663 1664 Ruby.lexer.g:24:1: The following token definitions are unreachable: IVAR 1665 1666 * nondeterminism warnings improved for Tokens rule: 1667 1668 Ruby.lexer.g:10:1: Multiple token rules can match input such as ""0".."9"": INT, FLOAT 1669 As a result, tokens(s) FLOAT were disabled for that input 1670 1671 1672 * DOT diagrams didn't show escaped char properly. 1673 1674 * Char/string literals are now all 'abc' not "abc". 1675 1676 * action syntax changed "@scope::actionname {action}" where scope defaults 1677 to "parser" if parser grammar or combined grammar, "lexer" if lexer grammar, 1678 and "treeparser" if tree grammar. The code generation targets decide 1679 what scopes are available. Each "scope" yields a hashtable for use in 1680 the output templates. The scopes full of actions are sent to all output 1681 file templates (currently headerFile and outputFile) as attribute actions. 
1682 Then you can reference <actions.scope> to get the map of actions associated 1683 with scope and <actions.parser.header> to get the parser's header action 1684 for example. This should be very flexible. The target should only have 1685 to define which scopes are valid, but the action names should be variable 1686 so we don't have to recompile ANTLR to add actions to code gen templates. 1687 1688 grammar T; 1689 options {language=Java;} 1690 @header { package foo; } 1691 @parser::stuff { int i; } // names within scope not checked; target dependent 1692 @members { int i; } 1693 @lexer::header {head} 1694 @lexer::members { int j; } 1695 @headerfile::blort {...} // error: this target doesn't have headerfile 1696 @treeparser::members {...} // error: this is not a tree parser 1697 a 1698 @init {int i;} 1699 : ID 1700 ; 1701 ID : 'a'..'z'; 1702 1703 For now, the Java target uses members and header as a valid name. Within a 1704 rule, the init action name is valid. 1705 1706 * changed $dynamicscope.value to $dynamicscope::value even if value is defined 1707 in same rule such as $function::name where rule function defines name. 1708 1709 * $dynamicscope gets you the stack 1710 1711 * rule scopes go like this now: 1712 1713 rule 1714 scope {...} 1715 scope slist,Symbols; 1716 : ... 1717 ; 1718 1719 * Created RuleReturnScope as a generic rule return value. Makes it easier 1720 to do this: 1721 RuleReturnScope r = parser.program(); 1722 System.out.println(r.getTemplate().toString()); 1723 1724 * $template, $tree, $start, etc... 1725 1726 * $r.x in current rule. $r is ignored as fully-qualified name. $r.start works too 1727 1728 * added warning about $r referring to both return value of rule and dynamic scope of rule 1729 1730 * integrated StringTemplate in a very simple manner 1731 1732 Syntax: 1733 -> template(arglist) "..." 
1734 -> template(arglist) <<...>> 1735 -> namedTemplate(arglist) 1736 -> {free expression} 1737 -> // empty 1738 1739 Predicate syntax: 1740 a : A B -> {p1}? foo(a={$A.text}) 1741 -> {p2}? foo(a={$B.text}) 1742 -> // return nothing 1743 1744 An arg list is just a list of template attribute assignments to actions in curlies. 1745 1746 There is a setTemplateLib() method for you to use with named template rewrites. 1747 1748 Use a new option: 1749 1750 grammar t; 1751 options {output=template;} 1752 ... 1753 1754 This all should work for tree grammars too, but I'm still testing. 1755 1756 * fixed bugs where strings were improperly escaped in exceptions, comments, etc.. For example, newlines came out as newlines not the escaped version 1757 1758 3.0ea6 - November 13, 2005 1759 1760 * turned off -debug/-profile, which was on by default 1761 1762 * completely refactored the output templates; added some missing templates. 1763 1764 * dramatically improved infinite recursion error messages (actually 1765 left-recursion never even was printed out before). 1766 1767 * wasn't printing dangling state messages when it reanalyzes with k=1. 1768 1769 * fixed a nasty bug in the analysis engine dealing with infinite recursion. 1770 Spent all day thinking about it and cleaned up the code dramatically. 1771 Bug fixed and software is more powerful and I understand it better! :) 1772 1773 * improved verbose DFA nodes; organized by alt 1774 1775 * got much better random phrase generation. For example: 1776 1777 $ java org.antlr.tool.RandomPhrase simple.g program 1778 int Ktcdn ';' method wh '(' ')' '{' return 5 ';' '}' 1779 1780 * empty rules like "a : ;" generated code that didn't compile due to 1781 try/catch for RecognitionException. Generated code couldn't possibly 1782 throw that exception. 1783 1784 * when printing out a grammar, such as in comments in generated code, 1785 ANTLR didn't print ast suffix stuff back out for literals. 

* This never exited loop:
  DATA : (options {greedy=false;}: .* '\n' )* '\n' '.' ;
  and now it works due to new default nongreedy .*  Also this works:
  DATA : (options {greedy=false;}: .* '\n' )* '.' ;

* Dot star ".*" syntax didn't work; in lexer it is nongreedy by
  default.  In parser it is on greedy but also k=1 by default.  Added
  unit tests.  Added blog entry to describe.

* ~T where T is the only token yielded an empty set but no error

* Used to generate unreachable message here:

  parser grammar t;
  a : ID a
    | ID
    ;

  z.g:3:11: The following alternatives are unreachable: 2

  In fact it should really be an error; now it generates:

  no start rule in grammar t (no rule can obviously be followed by EOF)

  Per next change item, ANTLR cannot know that EOF follows rule 'a'.

* added error message indicating that ANTLR can't figure out what your
  start rule is.  Required to properly generate code in some cases.

* validating semantic predicates now work (if they are false, they
  throw a new FailedPredicateException)

* two hideous bug fixes in the IntervalSet, which made analysis go wrong
  in a few cases.  Thanks to Oliver Zeigermann for finding lots of bugs
  and making suggested fixes (including the next two items)!

* cyclic DFAs are now nonstatic and hence can access instance variables

* labels are now allowed on lexical elements (in the lexer)

* added some internal debugging options

* ~'a'* and ~('a')* were not working properly; refactored antlr.g grammar

3.0ea5 - July 5, 2005

* Using '\n' in a parser grammar resulted in a nonescaped version of '\n' in
  the token names table, making compilation fail.  I fixed this by
  reorganizing/cleaning up the portion of ANTLR that deals with literals.
  See the comment in org.antlr.codegen.Target.

* Target.getMaxCharValue() did not use the appropriate max value constant.

* ALLCHAR was a constant when it should use the Target max value def.  Set
  complement for wildcard also didn't use the Target def.  Generally cleaned
  up the max char value stuff.

* Code gen didn't deal with ASTLabelType properly...I think even the 3.0ea7
  example tree parser was broken! :(

* Added a few more unit tests dealing with escaped literals

3.0ea4 - June 29, 2005

* tree parsers work; added CommonTreeNodeStream.  See simplecTreeParser
  example in examples-v3 tarball.

* added superClass and ASTLabelType options

* refactored Parser to have a BaseParser and added TreeParser

* bug fix: actions being dumped in description strings; compile errors
  resulted

3.0ea3 - June 23, 2005

Enhancements

* Automatic tree construction operators are in: ! ^ ^^

* Tree construction rewrite rules are in
  -> {pred1}? rewrite1
  -> {pred2}? rewrite2
  ...
  -> rewriteN

  The rewrite rules may be elements like ID, expr, $label, {node expr}
  and trees ^( <root> <children> ).  You can have (...)?, (...)*, (...)+
  subrules as well.

  You may have rewrites in subrules not just at outer level of rule, but
  any -> rewrite forces auto AST construction off for that alternative
  of that rule.

  To avoid cycles, copy semantics are used:

  r : INT -> INT INT ;

  means make two new nodes from the same INT token.

  Repeated references to a rule element imply a copy for at least one
  tree:

  a : atom -> ^(atom atom) ; // NOT CYCLE! (dup atom tree)

* $ruleLabel.tree refers to tree created by matching the labeled element.

* A description of the blocks/alts is generated as a comment in output code

* A timestamp / signature is put at top of each generated code file

3.0ea2 - June 12, 2005

Bug fixes

* Some error messages were missing the stackTrace parameter

* Removed the file locking mechanism as it's not cross platform

* Some absolute vs relative path name problems with writing output
  files.  Rules are now more concrete.  -o option takes precedence
  // -o /tmp /var/lib/t.g => /tmp/T.java
  // -o subdir/output /usr/lib/t.g => subdir/output/T.java
  // -o . /usr/lib/t.g => ./T.java
  // -o /tmp subdir/t.g => /tmp/subdir/t.g
  // If no -o dir was specified, just write to the location
  // where the grammar is, absolute or relative

* does error checking on unknown option names now

* Using just language code not locale name for error message file.  I.e.,
  the default (and for any English speaking locale) is en.stg not en_US.stg
  anymore.

* The error manager now asks the Tool to panic rather than simply doing
  a System.exit().

* Lots of refactoring concerning grammar, rule, subrule options.  Now
  detects invalid options.

3.0ea1 - June 1, 2005

Initial early access release
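
Illustration of the -o rules listed under 3.0ea2: the following is a minimal
sketch, not ANTLR's actual code.  The class OutputDirSketch and the method
outputDirectory() are hypothetical names made up for this example.  It assumes
Unix-style paths, and it reads the fourth rule as "a relative grammar
directory is appended to the -o directory".

```java
import java.io.File;

// Hypothetical helper; it only mirrors the -o examples in the 3.0ea2 notes.
public class OutputDirSketch {

    /**
     * Returns the directory that generated files would be written to.
     * outputDir is the -o argument, or null if -o was not given.
     */
    public static String outputDirectory(String outputDir, String grammarPath) {
        File grammar = new File(grammarPath);
        File grammarDir = grammar.getParentFile(); // null for a bare "t.g"

        if (outputDir == null) {
            // No -o: write next to the grammar, absolute or relative.
            return grammarDir != null ? grammarDir.getPath() : ".";
        }
        if (grammar.isAbsolute() || grammarDir == null) {
            // Absolute grammar path (or no directory part): -o wins outright.
            return new File(outputDir).getPath();
        }
        // Relative grammar path: keep its directory underneath the -o dir.
        return new File(outputDir, grammarDir.getPath()).getPath();
    }

    public static void main(String[] args) {
        System.out.println(outputDirectory("/tmp", "/var/lib/t.g"));
        System.out.println(outputDirectory("/tmp", "subdir/t.g"));
        System.out.println(outputDirectory(null, "subdir/t.g"));
    }
}
```

Returning only the directory keeps the sketch independent of how the tool
names generated files (T.java in the examples above).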