1 ANTLR v3.0.1 C Runtime
2 ANTLR 3.0.1
3 January 1, 2008
4
At the moment, the C runtime engine for the parser is not really suited to
the inexperienced C programmer. This is mainly because documentation on its
use is lacking, which will be corrected shortly. The C runtime code itself
is, however, well documented with doxygen-style comments, and a reasonably
experienced C programmer should be able to piece it together. You can visit
the documentation at: http://www.antlr.org/api/C/index.html
11
The general makeup is that everything is implemented as a pseudo class/object
initialized with pointers to its 'member' functions and data. All objects are
(usually) created by factories, which automatically manage memory allocation
and release and generally make life easier. If you remember this rule,
everything should fall into place.
17
18 Jim Idle - Portland Oregon, Jan 2008
19 jimi idle ws
20
21 ===============================================================================
22
23 Terence Parr, parrt at cs usfca edu
24 ANTLR project lead and supreme dictator for life
25 University of San Francisco
26
27 INTRODUCTION
28
29 Welcome to ANTLR v3! I've been working on this for nearly 4 years and it's
30 almost ready! I plan no feature additions between this beta and first
31 3.0 release. I have lots of features to add later, but this will be
32 the first set. Ultimately, I need to rewrite ANTLR v3 in itself (it's
33 written in 2.7.7 at the moment and also needs StringTemplate 3.0 or
34 later).
35
36 You should use v3 in conjunction with ANTLRWorks:
37
38 http://www.antlr.org/works/index.html
39
40 WARNING: We have bits of documentation started, but nothing super-complete
41 yet. The book will be printed May 2007:
42
43 http://www.pragmaticprogrammer.com/titles/tpantlr/index.html
44
45 but we should have a beta PDF available on that page in Feb 2007.
46
47 You also have the examples plus the source to guide you.
48
49 See the new wiki FAQ:
50
51 http://www.antlr.org/wiki/display/ANTLR3/ANTLR+v3+FAQ
52
53 and general doc root:
54
55 http://www.antlr.org/wiki/display/ANTLR3/ANTLR+3+Wiki+Home
56
57 Please help add/update FAQ entries.
58
59 I have made very little effort at this point to deal well with
60 erroneous input (e.g., bad syntax might make ANTLR crash). I will clean
61 this up after I've rewritten v3 in v3.
62
63 Per the license in LICENSE.txt, this software is not guaranteed to
64 work and might even destroy all life on this planet:
65
66 THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
67 IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
68 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
69 DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
70 INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
71 (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
72 SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
73 HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
74 STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
75 IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
76 POSSIBILITY OF SUCH DAMAGE.
77
78 EXAMPLES
79
80 ANTLR v3 sample grammars:
81
82 http://www.antlr.org/download/examples-v3.tar.gz
83
84 contains the following examples: LL-star, cminus, dynamic-scope,
85 fuzzy, hoistedPredicates, island-grammar, java, python, scopes,
86 simplecTreeParser, treeparser, tweak, xmlLexer.
87
88 Also check out Mantra Programming Language for a prototype (work in
89 progress) using v3:
90
91 http://www.linguamantra.org/
92
93 ----------------------------------------------------------------------
94
95 What is ANTLR?
96
97 ANTLR stands for (AN)other (T)ool for (L)anguage (R)ecognition and was
98 originally known as PCCTS. ANTLR is a language tool that provides a
99 framework for constructing recognizers, compilers, and translators
100 from grammatical descriptions containing actions. Target language list:
101
102 http://www.antlr.org/wiki/display/ANTLR3/Code+Generation+Targets
103
104 ----------------------------------------------------------------------
105
106 How is ANTLR v3 different than ANTLR v2?
107
108 See migration guide:
109 http://www.antlr.org/wiki/display/ANTLR3/Migrating+from+ANTLR+2+to+ANTLR+3
110
ANTLR v3 has a far superior parsing algorithm called LL(*) that
handles many more grammars than v2 does. In practice, it means you
can throw almost any grammar at ANTLR that is non-left-recursive and
unambiguous (a grammar is ambiguous when the same input can be matched
by multiple rules); the cost is perhaps a tiny bit of backtracking, but
with a DFA, not a full parser. You can still manually set the max
lookahead k as an option for any decision. The LL(*) algorithm ramps up
to use more lookahead when it needs to and is much more efficient than
normal LL backtracking. There is support for syntactic predicates (full
LL backtracking) when LL(*) fails.
121
122 Lexers are much easier due to the LL(*) algorithm as well. Previously
123 these two lexer rules would cause trouble because ANTLR couldn't
124 distinguish between them with finite lookahead to see the decimal
125 point:
126
127 INT : ('0'..'9')+ ;
128 FLOAT : INT '.' INT ;
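
To see why no fixed lookahead is enough here: both rules start with an
arbitrarily long run of digits, so the decision can only be made after
scanning past all of them. The following is a hand-written Java sketch of
that decision, not code that ANTLR generates; the class and method names
are made up for illustration.

```java
/** Hand-written illustration of the INT-vs-FLOAT decision; not ANTLR output. */
final class NumberKindSketch {
    enum Kind { INT, FLOAT, NONE }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /** Classify the number starting at offset p by scanning past all digits. */
    static Kind classify(String input, int p) {
        int i = p;
        // Consume the common prefix ('0'..'9')+ shared by both rules.
        while (i < input.length() && isDigit(input.charAt(i))) {
            i++;
        }
        if (i == p) {
            return Kind.NONE; // no digits at all, so neither rule matches
        }
        // Only here, after an unbounded number of digits, can the two be told apart.
        boolean dotThenDigit = i + 1 < input.length()
                && input.charAt(i) == '.'
                && isDigit(input.charAt(i + 1));
        return dotThenDigit ? Kind.FLOAT : Kind.INT;
    }
}
```

The loop plays the role of the cyclic DFA that LL(*) builds for this
decision: it only looks ahead and never runs the rules themselves.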
129
130 The syntax is almost identical for features in common, but you should
131 note that labels are always '=' not ':'. So do id=ID not id:ID.
132
You can do combined lexer/parser grammars again (a la PCCTS), where both
lexer and parser rules are defined in the same file. See the examples.
Really nice. You can reference strings and characters in the grammar
and ANTLR will generate the lexer for you.
137
138 The attribute structure has been enhanced. Rules may have multiple
139 return values, for example. Further, there are dynamically scoped
140 attributes whereby a rule may define a value usable by any rule it
141 invokes directly or indirectly w/o having to pass a parameter all the
142 way down.
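
One way to picture dynamically scoped attributes is as a stack that the
defining rule pushes on entry and pops on exit, with every rule it calls
reading the top entry. The Java sketch below models only that idea with
plain methods standing in for rules; it is not the code ANTLR generates,
and all names in it are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Models a dynamically scoped attribute with an explicit stack; names are hypothetical. */
final class DynamicScopeSketch {
    /** The top entry belongs to the innermost active invocation of function(). */
    private static final Deque<String> currentFunction = new ArrayDeque<>();

    /** "Rule" that defines the scoped value for everything it invokes. */
    static String function(String name) {
        currentFunction.push(name);
        try {
            return block();
        } finally {
            currentFunction.pop(); // the scope ends when the defining rule returns
        }
    }

    /** Intermediate "rule": it neither receives nor passes the name. */
    static String block() {
        return statement();
    }

    /** Deeply nested "rule" that still sees the value set by function(). */
    static String statement() {
        return "statement inside " + currentFunction.peek();
    }

    /** Number of scopes currently active. */
    static int depth() {
        return currentFunction.size();
    }
}
```

Note that block() has no parameter for the name at all, which is the
point: the value travels on the scope stack rather than through arguments.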
143
144 ANTLR v3 tree construction is far superior--it provides tree rewrite
145 rules where the right hand side is simply the tree grammar fragment
146 describing the tree you want to build:
147
148 formalArgs
149 : typename declarator (',' typename declarator )*
150 -> ^(ARG typename declarator)+
151 ;
152
153 That builds tree sequences like:
154
155 ^(ARG int v1) ^(ARG int v2)
156
157 ANTLR v3 also incorporates StringTemplate:
158
159 http://www.stringtemplate.org
160
161 just like AST support. It is useful for generating output. For
162 example this rule creates a template called 'import' for each import
163 definition found in the input stream:
164
165 grammar Java;
166 options {
167 output=template;
168 }
169 ...
170 importDefinition
171 : 'import' identifierStar SEMI
172 -> import(name={$identifierStar.st},
173 begin={$identifierStar.start},
174 end={$identifierStar.stop})
175 ;
176
177 The attributes are set via assignments in the argument list. The
178 arguments are actions with arbitrary expressions in the target
179 language. The .st label property is the result template from a rule
180 reference. There is a nice shorthand in actions too:
181
182 %foo(a={},b={},...) ctor
183 %({name-expr})(a={},...) indirect template ctor reference
184 %{string-expr} anonymous template from string expr
%{expr}.y = z; set template attribute y of StringTemplate-typed expr to z
186 %x.y = z; set template attribute y of x (always set never get attr)
187 to z [languages like python without ';' must still use the
188 ';' which the code generator is free to remove during code gen]
189 Same as '(x).setAttribute("y", z);'
190
For ANTLR v3 I decided to make the most common tasks easy by default.
This means that some of the basic objects are heavier weight than some
speed demons would like, but they are free to pare it down, leaving most
programmers the luxury of having it "just work." For example, reading in
some input, tweaking it, and writing it back out while preserving
whitespace is easy in v3.
197
198 The ANTLR source code is much prettier. You'll also note that the
199 run-time classes are conveniently encapsulated in the
200 org.antlr.runtime package.
201
202 ----------------------------------------------------------------------
203
204 How do I install this damn thing?
205
206 Just untar and you'll get:
207
208 antlr-3.0b6/README.txt (this file)
209 antlr-3.0b6/LICENSE.txt
210 antlr-3.0b6/src/org/antlr/...
211 antlr-3.0b6/lib/stringtemplate-3.0.jar (3.0b6 needs 3.0)
212 antlr-3.0b6/lib/antlr-2.7.7.jar
213 antlr-3.0b6/lib/antlr-3.0b6.jar
214
215 Then you need to add all the jars in lib to your CLASSPATH.
216
217 ----------------------------------------------------------------------
218
219 How do I use ANTLR v3?
220
221 [I am assuming you are only using the command-line (and not the
222 ANTLRWorks GUI)].
223
224 Running ANTLR with no parameters shows you:
225
226 ANTLR Parser Generator Early Access Version 3.0b6 (Jan 31, 2007) 1989-2007
227 usage: java org.antlr.Tool [args] file.g [file2.g file3.g ...]
228 -o outputDir specify output directory where all output is generated
229 -lib dir specify location of token files
230 -report print out a report about the grammar(s) processed
231 -print print out the grammar without actions
232 -debug generate a parser that emits debugging events
233 -profile generate a parser that computes profiling information
234 -nfa generate an NFA for each rule
235 -dfa generate a DFA for each decision point
236 -message-format name specify output style for messages
237 -X display extended argument list
238
For example, consider how to build the LL-star example from the examples
tarball, which you can get at http://www.antlr.org/download/examples-v3.tar.gz:
241
242 $ cd examples/java/LL-star
243 $ java org.antlr.Tool simplec.g
244 $ jikes *.java
245
246 For input:
247
248 char c;
249 int x;
250 void bar(int x);
251 int foo(int y, char d) {
252 int i;
253 for (i=0; i<3; i=i+1) {
254 x=3;
255 y=5;
256 }
257 }
258
259 you will see output as follows:
260
261 $ java Main input
262 bar is a declaration
263 foo is a definition
264
265 What if I want to test my parser without generating code? Easy. Just
266 run ANTLR in interpreter mode. It can't execute your actions, but it
267 can create a parse tree from your input to show you how it would be
268 matched. Use the org.antlr.tool.Interp main class. In the following,
269 I interpret simplec.g on t.c, which contains "int x;"
270
271 $ java org.antlr.tool.Interp simplec.g WS program t.c
272 ( <grammar SimpleC>
273 ( program
274 ( declaration
275 ( variable
276 ( type [@0,0:2='int',<14>,1:0] )
277 ( declarator [@2,4:4='x',<2>,1:4] )
278 [@3,5:5=';',<5>,1:5]
279 )
280 )
281 )
282 )
283
284 where I have formatted the output to make it more readable. I have
285 told it to ignore all WS tokens.
286
287 ----------------------------------------------------------------------
288
289 How do I rebuild ANTLR v3?
290
Make sure the following jars are in your CLASSPATH:
292
293 antlr-3.0b6/lib/stringtemplate-3.0.jar
294 antlr-3.0b6/lib/antlr-2.7.7.jar
295 junit.jar [if you want to build the test directories]
296
297 then jump into antlr-3.0b6/src directory and then type:
298
299 $ javac -d . org/antlr/Tool.java org/antlr/*/*.java org/antlr/*/*/*.java
300
Takes 9 seconds on my 1GHz laptop or 4 seconds with jikes. Later I'll
have a real build mechanism, though I must admit the one-liner appeals
to me. I use IntelliJ, so I never actually type anything to build.

There is also an Ant build.xml file, contributed by others, but I know
nothing of Ant (I'm opposed to any tool with an XML interface for Humans).
307
308 -----------------------------------------------------------------------
309 C# Target Notes
310
311 1. Auto-generated lexers do not inherit parent parser's @namespace
312 {...} value. Use @lexer::namespace{...}.
313
314 -----------------------------------------------------------------------
315
316 CHANGES
317
318 March 17, 2007
319
320 * Jonathan DeKlotz updated C# templates to be 3.0b6 current
321
322 March 14, 2007
323
324 * Manually-specified (...)=> force backtracking eval of that predicate.
325 backtracking=true mode does not however. Added unit test.
326
327 March 14, 2007
328
329 * Fixed bug in lexer where ~T didn't compute the set from rule T.
330
331 * Added -Xnoinlinedfa make all DFA with tables; no inline prediction with IFs
332
333 * Fixed http://www.antlr.org:8888/browse/ANTLR-80.
334 Sem pred states didn't define lookahead vars.
335
336 * Fixed http://www.antlr.org:8888/browse/ANTLR-91.
337 When forcing some acyclic DFA to be state tables, they broke.
338 Forcing all DFA to be state tables should give same results.
339
340 March 12, 2007
341
342 * setTokenSource in CommonTokenStream didn't clear tokens list.
343 setCharStream calls reset in Lexer.
344
345 * Altered -depend. No longer printing grammar files for multiple input
346 files with -depend. Doesn't show T__.g temp file anymore. Added
347 TLexer.tokens. Added .h files if defined.
348
349 February 11, 2007
350
* Added -depend command-line option that, instead of processing files,
shows you what files the input grammar(s) depend on and what files
they generate. For combined grammar T.g:
354
355 $ java org.antlr.Tool -depend T.g
356
357 You get:
358
359 TParser.java : T.g
360 T.tokens : T.g
361 T__.g : T.g
362
Now, assuming U.g is a tree grammar that references T's tokens:
364
365 $ java org.antlr.Tool -depend T.g U.g
366
367 TParser.java : T.g
368 T.tokens : T.g
369 T__.g : T.g
370 U.g: T.tokens
371 U.java : U.g
372 U.tokens : U.g
373
374 Handles spaces by escaping them. Pays attention to -o, -fo and -lib.
375 Dir 'x y' is a valid dir in current dir.
376
377 $ java org.antlr.Tool -depend -lib /usr/local/lib -o 'x y' T.g U.g
378 x\ y/TParser.java : T.g
379 x\ y/T.tokens : T.g
380 x\ y/T__.g : T.g
381 U.g: /usr/local/lib/T.tokens
382 x\ y/U.java : U.g
383 x\ y/U.tokens : U.g
384
385 You have API access via org.antlr.tool.BuildDependencyGenerator class:
386 getGeneratedFileList(), getDependenciesFileList(). You can also access
387 the output template: getDependencies(). The file
388 org/antlr/tool/templates/depend.stg contains the template. You can
389 modify as you want. File objects go in so you can play with path etc...
390
391 February 10, 2007
392
393 * no more .gl files generated. All .g all the time.
394
395 * changed @finally to be @after and added a finally clause to the
396 exception stuff. I also removed the superfluous "exception"
397 keyword. Here's what the new syntax looks like:
398
399 a
400 @after { System.out.println("ick"); }
401 : 'a'
402 ;
403 catch[RecognitionException e] { System.out.println("foo"); }
404 catch[IOException e] { System.out.println("io"); }
405 finally { System.out.println("foobar"); }
406
407 @after executes after bookkeeping to set $rule.stop, $rule.tree but
408 before scopes pop and any memoization happens. Dynamic scopes and
409 memoization are still in generated finally block because they must
410 exec even if error in rule. The @after action and tree setting
411 stuff can technically be skipped upon syntax error in rule. [Later
412 we might add something to finally to stick an ERROR token in the
413 tree and set the return value.] Sequence goes: set $stop, $tree (if
414 any), @after (if any), pop scopes (if any), memoize (if needed),
415 grammar finally clause. Last 3 are in generated code's finally
416 clause.
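
The ordering described above can be pictured as a method with a
try/catch/finally. The Java sketch below only records the order of the
steps named in this entry; it is a model of that sequence under the
assumption that a syntax error surfaces as an exception, not the code
ANTLR actually emits, and the class name is made up.

```java
import java.util.ArrayList;
import java.util.List;

/** Records the order of a rule's bookkeeping steps; a model, not generated code. */
final class RuleEpilogueSketch {
    static List<String> runRule(boolean syntaxError) {
        List<String> events = new ArrayList<>();
        try {
            events.add("match rule body");
            if (syntaxError) {
                // Stand-in for a RecognitionException thrown while matching.
                throw new IllegalStateException("simulated syntax error");
            }
            events.add("set $stop and $tree");
            events.add("@after action");
        } catch (IllegalStateException e) {
            events.add("catch clause");
        } finally {
            // These three always run, even after an error in the rule.
            events.add("pop dynamic scopes");
            events.add("memoize");
            events.add("grammar finally clause");
        }
        return events;
    }
}
```

With syntaxError set to true, the $stop/$tree and @after steps are skipped
while the last three steps still run, matching the description above.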
417
418 3.0b6 - January 31, 2007
419
420 January 30, 2007
421
422 * Fixed bug in IntervalSet.and: it returned the same empty set all the time
423 rather than new empty set. Code altered the same empty set.
424
425 * Made analysis terminate faster upon a decision that takes too long;
426 it seemed to keep doing work for a while. Refactored some names
427 and updated comments. Also made it terminate when it realizes it's
428 non-LL(*) due to recursion. just added terminate conditions to loop
429 in convert().
430
431 * Sometimes fatal non-LL(*) messages didn't appear; instead you got
432 "antlr couldn't analyze", which is actually untrue. I had the
433 order of some prints wrong in the DecisionProbe.
434
435 * The code generator incorrectly detected when it could use a fixed,
436 acyclic inline DFA (i.e., using an IF). Upon non-LL(*) decisions
437 with predicates, analysis made cyclic DFA. But this stops
438 the computation detecting whether they are cyclic. I just added
439 a protection in front of the acyclic DFA generator to avoid if
440 non-LL(*). Updated comments.
441
442 January 23, 2007
443
444 * Made tree node streams use adaptor to create navigation nodes.
445 Thanks to Emond Papegaaij.
446
447 January 22, 2007
448
449 * Added lexer rule properties: start, stop
450
451 January 1, 2007
452
453 * analysis failsafe is back on; if a decision takes too long, it bails out
454 and uses k=1
455
456 January 1, 2007
457
458 * += labels for rules only work for output option; previously elements
459 of list were the return value structs, but are now either the tree or
460 StringTemplate return value. You can label different rules now
461 x+=a x+=b.
462
463 December 30, 2006
464
465 * Allow \" to work correctly in "..." template.
466
467 December 28, 2006
468
469 * errors that are now warnings: missing AST label type in trees.
470 Also "no start rule detected" is warning.
471
472 * tree grammars also can do rewrite=true for output=template.
473 Only works for alts with single node or tree as alt elements.
474 If you are going to use $text in a tree grammar or do rewrite=true
475 for templates, you must use in your main:
476
477 nodes.setTokenStream(tokens);
478
479 * You get a warning for tree grammars that do rewrite=true and
480 output=template and have -> for alts that are not simple nodes
481 or simple trees. new unit tests in TestRewriteTemplates at end.
482
483 December 27, 2006
484
485 * Error message appears when you use -> in tree grammar with
486 output=template and rewrite=true for alt that is not simple
487 node or tree ref.
488
489 * no more $stop attribute for tree parsers; meaningless/useless.
490 Removed from TreeRuleReturnScope also.
491
492 * rule text attribute in tree parser must pull from token buffer.
493 Makes no sense otherwise. added getTokenStream to TreeNodeStream
494 so rule $text attr works. CommonTreeNodeStream etc... now let
495 you set the token stream so you can access later from tree parser.
496 $text is not well-defined for rules like
497
498 slist : stat+ ;
499
500 because stat is not a single node nor rooted with a single node.
501 $slist.text will get only first stat. I need to add a warning about
502 this...
503
504 * Fixed http://www.antlr.org:8888/browse/ANTLR-76 for Java.
505 Enhanced TokenRewriteStream so it accepts any object; converts
506 to string at last second. Allows you to rewrite with StringTemplate
507 templates now :)
508
509 * added rewrite option that makes -> template rewrites do replace ops for
510 TokenRewriteStream input stream. In output=template and rewrite=true mode
511 same as before 'cept that the parser does
512
513 ((TokenRewriteStream)input).replace(
514 ((Token)retval.start).getTokenIndex(),
515 input.LT(-1).getTokenIndex(),
516 retval.st);
517
518 after each rewrite so that the input stream is altered. Later refs to
519 $text will have rewrites. Here's a sample test program for grammar Rew.
520
521 FileReader groupFileR = new FileReader("Rew.stg");
522 StringTemplateGroup templates = new StringTemplateGroup(groupFileR);
523 ANTLRInputStream input = new ANTLRInputStream(System.in);
524 RewLexer lexer = new RewLexer(input);
525 TokenRewriteStream tokens = new TokenRewriteStream(lexer);
526 RewParser parser = new RewParser(tokens);
527 parser.setTemplateLib(templates);
528 parser.program();
529 System.out.println(tokens.toString());
530 groupFileR.close();
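
If you want a feel for why rewriting preserves everything you did not
touch, here is a toy model of the idea in plain Java: the token list is
never modified, replacements are recorded on the side, and objects are
turned into strings only when the buffer is printed. This is a sketch
under simplifying assumptions (no overlapping replacements, tokens are
plain strings); it is not the TokenRewriteStream implementation, and the
class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a rewritable token buffer; not the real TokenRewriteStream. */
final class RewriteBufferSketch {
    /** A pending replacement of tokens start..stop (inclusive) with some object. */
    private static final class Replacement {
        final int stop;
        final Object text;
        Replacement(int stop, Object text) {
            this.stop = stop;
            this.text = text;
        }
    }

    private final List<String> tokens;
    private final Map<Integer, Replacement> ops = new HashMap<>();

    RewriteBufferSketch(List<String> tokens) {
        this.tokens = new ArrayList<>(tokens);
    }

    /** Record a replacement; the original tokens are never modified. */
    void replace(int start, int stop, Object text) {
        if (start < 0 || stop < start || stop >= tokens.size()) {
            throw new IllegalArgumentException("bad range " + start + ".." + stop);
        }
        ops.put(start, new Replacement(stop, text));
    }

    /** Render the buffer, applying replacements and copying everything else. */
    @Override
    public String toString() {
        StringBuilder buf = new StringBuilder();
        int i = 0;
        while (i < tokens.size()) {
            Replacement op = ops.get(i);
            if (op != null) {
                buf.append(op.text); // the object becomes a string only here
                i = op.stop + 1;
            } else {
                buf.append(tokens.get(i));
                i++;
            }
        }
        return buf.toString();
    }
}
```

Whitespace tokens sit in the list like any other token, so they come back
out untouched unless a replacement covers them.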
531
532 December 26, 2006
533
534 * BaseTree.dupTree didn't dup recursively.
535
536 December 24, 2006
537
538 * Cleaned up some comments and removed field treeNode
539 from MismatchedTreeNodeException class. It is "node" in
540 RecognitionException.
541
542 * Changed type from Object to BitSet for expecting fields in
543 MismatchedSetException and MismatchedNotSetException
544
545 * Cleaned up error printing in lexers and the messages that it creates.
546
547 * Added this to TreeAdaptor:
548 /** Return the token object from which this node was created.
549 * Currently used only for printing an error message.
* The error display routine in BaseRecognizer needs to
* display where in the input the error occurred. If your
* tree implementation does not store information that can
* lead you to the token, you can create a token filled with
554 * the appropriate information and pass that back. See
555 * BaseRecognizer.getErrorMessage().
556 */
557 public Token getToken(Object t);
558
559 December 23, 2006
560
561 * made BaseRecognizer.displayRecognitionError nonstatic so people can
562 override it. Not sure why it was static before.
563
564 * Removed state/decision message that comes out of no
565 viable alternative exceptions, as that was too much.
566 removed the decision number from the early exit exception
567 also. During development, you can simply override
568 displayRecognitionError from BaseRecognizer to add the stuff
569 back in if you want.
570
571 * made output go to an output method you can override: emitErrorMessage()
572
573 * general cleanup of the error emitting code in BaseRecognizer. Lots
574 more stuff you can override: getErrorHeader, getTokenErrorDisplay,
575 emitErrorMessage, getErrorMessage.
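
The pattern here is that one display method calls several small hooks,
so you override only the piece you care about. The sketch below shows
that shape with a stand-in base class; the method names are borrowed from
the list above, but the parameter lists are simplified assumptions and do
not match the real BaseRecognizer signatures.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in classes showing the override points; signatures are simplified assumptions. */
final class ErrorHookSketch {
    /** Plays the role of the base recognizer. */
    static class Recognizer {
        void displayRecognitionError(int line, int column, String problem) {
            emitErrorMessage(getErrorHeader(line, column) + " " + getErrorMessage(problem));
        }

        String getErrorHeader(int line, int column) {
            return "line " + line + ":" + column;
        }

        String getErrorMessage(String problem) {
            return problem;
        }

        void emitErrorMessage(String message) {
            System.err.println(message);
        }
    }

    /** Overrides only the output hook, collecting messages instead of printing them. */
    static class CollectingRecognizer extends Recognizer {
        final List<String> errors = new ArrayList<>();

        @Override
        void emitErrorMessage(String message) {
            errors.add(message);
        }
    }
}
```

An IDE plugin or a unit test could use something like CollectingRecognizer
to capture messages without touching how they are formatted.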
576
577 December 22, 2006
578
* Altered TreeParser.matchAny() so that it skips entire trees if
580 node has children otherwise skips one node. Now this works to
581 skip entire body of function if single-rooted subtree:
582 ^(FUNC name=ID arg=ID .)
583
584 * Added "reverse index" from node to stream index. Override
585 fillReverseIndex() in CommonTreeNodeStream if you want to change.
586 Use getNodeIndex(node) to find stream index for a specific tree node.
587 See getNodeIndex(), reverseIndex(Set tokenTypes),
588 reverseIndex(int tokenType), fillReverseIndex(). The indexing
589 costs time and memory to fill, but pulling stuff out will be lots
590 faster as it can jump from a node ptr straight to a stream index.
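
The time/memory trade-off is the usual one for a reverse map. Below is a
minimal Java sketch of the idea, assuming nodes are compared by identity
and the index is built lazily on first lookup; it reuses the method names
getNodeIndex and fillReverseIndex for readability but is not the
CommonTreeNodeStream code.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a node-to-stream-index map; not the CommonTreeNodeStream implementation. */
final class ReverseIndexSketch {
    private final List<Object> nodes = new ArrayList<>();
    private Map<Object, Integer> reverseIndex; // built on first lookup

    void add(Object node) {
        nodes.add(node);
        reverseIndex = null; // the index is stale once the stream changes
    }

    /** Return the stream index of this exact node object, or -1 if absent. */
    int getNodeIndex(Object node) {
        if (reverseIndex == null) {
            fillReverseIndex();
        }
        Integer index = reverseIndex.get(node);
        return index == null ? -1 : index;
    }

    /** One pass over the stream: costs time and memory once, then lookups are cheap. */
    private void fillReverseIndex() {
        reverseIndex = new IdentityHashMap<>();
        for (int i = 0; i < nodes.size(); i++) {
            reverseIndex.put(nodes.get(i), i);
        }
    }
}
```

IdentityHashMap is used so that two distinct nodes that happen to be
equal() still map to their own positions.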
591
592 * Added TreeNodeStream.get(index) to make it easier for interpreters to
593 jump around in tree node stream.
594
595 * New CommonTreeNodeStream buffers all nodes in stream for fast jumping
596 around. It now has push/pop methods to invoke other locations in
597 the stream for building interpreters.
598
599 * Moved CommonTreeNodeStream to UnBufferedTreeNodeStream and removed
600 Iterator implementation. moved toNodesOnlyString() to TestTreeNodeStream
601
602 * [BREAKS ANY TREE IMPLEMENTATION]
603 made CommonTreeNodeStream work with any tree node type. TreeAdaptor
604 now implements isNil so must add; trivial, but does break back
605 compatibility.
606
607 December 17, 2006
608
609 * Added traceIn/Out methods to recognizers so that you can override them;
610 previously they were in-line print statements. The message has also
611 been slightly improved.
612
613 * Factored BuildParseTree into debug package; cleaned stuff up. Fixed
614 unit tests.
615
616 December 15, 2006
617
618 * [BREAKS ANY TREE IMPLEMENTATION]
619 org.antlr.runtime.tree.Tree; needed to add get/set for token start/stop
620 index so CommonTreeAdaptor can assume Tree interface not CommonTree
621 implementation. Otherwise, no way to create your own nodes that satisfy
622 Tree because CommonTreeAdaptor was doing
623
624 public int getTokenStartIndex(Object t) {
625 return ((CommonTree)t).startIndex;
626 }
627
628 Added to Tree:
629
630 /** What is the smallest token index (indexing from 0) for this node
631 * and its children?
632 */
633 int getTokenStartIndex();
634
635 void setTokenStartIndex(int index);
636
637 /** What is the largest token index (indexing from 0) for this node
638 * and its children?
639 */
640 int getTokenStopIndex();
641
642 void setTokenStopIndex(int index);
643
644 December 13, 2006
645
646 * Added org.antlr.runtime.tree.DOTTreeGenerator so you can generate DOT
647 diagrams easily from trees.
648
649 CharStream input = new ANTLRInputStream(System.in);
650 TLexer lex = new TLexer(input);
651 CommonTokenStream tokens = new CommonTokenStream(lex);
652 TParser parser = new TParser(tokens);
653 TParser.e_return r = parser.e();
654 Tree t = (Tree)r.tree;
655 System.out.println(t.toStringTree());
656 DOTTreeGenerator gen = new DOTTreeGenerator();
657 StringTemplate st = gen.toDOT(t);
658 System.out.println(st);
659
* Changed the way mark()/rewind() work in CommonTreeNodeStream to mirror
661 more flexible solution in ANTLRStringStream. Forgot to set lastMarker
662 anyway. Now you can rewind to non-most-recent marker.
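
Rewinding to a marker that is not the most recent one just means keeping
every saved position, not only the last. A small Java sketch of that
scheme over a string follows; it assumes markers are 1-based depths and
that rewinding releases the marker and anything nested inside it. The
class is hypothetical and much simpler than the runtime streams.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of nested mark()/rewind(marker) over a string; not the runtime code. */
final class MarkRewindSketch {
    private final String data;
    private int p = 0;
    private final List<Integer> markers = new ArrayList<>();

    MarkRewindSketch(String data) {
        this.data = data;
    }

    void consume() {
        if (p < data.length()) {
            p++;
        }
    }

    int index() {
        return p;
    }

    /** Save the current position and return a 1-based marker for it. */
    int mark() {
        markers.add(p);
        return markers.size();
    }

    /** Go back to any outstanding marker, releasing it and all nested markers. */
    void rewind(int marker) {
        if (marker < 1 || marker > markers.size()) {
            throw new IllegalArgumentException("unknown marker " + marker);
        }
        p = markers.get(marker - 1);
        while (markers.size() >= marker) {
            markers.remove(markers.size() - 1);
        }
    }
}
```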
663
664 December 12, 2006
665
* Temp lexers now end in .gl (T__.gl, for example)
667
668 * TreeParser suffix no longer generated for tree grammars
669
670 * Defined reset for lexer, parser, tree parser; rewinds the input stream also
671
672 December 10, 2006
673
674 * Made Grammar.abortNFAToDFAConversion() abort in middle of a DFA.
675
676 December 9, 2006
677
678 * fixed bug in OrderedHashSet.add(). It didn't track elements correctly.
679
680 December 6, 2006
681
682 * updated build.xml for future Ant compatibility, thanks to Matt Benson.
683
684 * various tests in TestRewriteTemplate and TestSyntacticPredicateEvaluation
685 were using the old 'channel' vs. new '$channel' notation.
686 TestInterpretedParsing didn't pick up an earlier change to CommonToken.
687 Reported by Matt Benson.
688
689 * fixed platform dependent test failures in TestTemplates, supplied by Matt
690 Benson.
691
692 November 29, 2006
693
694 * optimized semantic predicate evaluation so that p||!p yields true.
695
696 November 22, 2006
697
698 * fixed bug that prevented var = $rule.some_retval from working in anything
699 but the first alternative of a rule or subrule.
700
701 * attribute names containing digits were not allowed, this is now fixed,
702 allowing attributes like 'name1' but not '1name1'.
703
704 November 19, 2006
705
706 * Removed LeftRecursionMessage and apparatus because it seems that I check
707 for left recursion upfront before analysis and everything gets specified as
708 recursion cycles at this point.
709
710 November 16, 2006
711
712 * TokenRewriteStream.replace was not passing programName to next method.
713
714 November 15, 2006
715
716 * updated DOT files for DFA generation to make smaller circles.
717
718 * made epsilon edges italics in the NFA diagrams.
719
720 3.0b5 - November 15, 2006
721
722 The biggest thing is that your grammar file names must match the grammar name
723 inside (your generated class names will also be different) and we use
724 $channel=HIDDEN now instead of channel=99 inside lexer actions.
725 Should be compatible other than that. Please look at complete list of
726 changes.
727
728 November 14, 2006
729
* Force token index to be -1 for CommonToken in case not set.
731
732 November 11, 2006
733
734 * getUniqueID for TreeAdaptor now uses identityHashCode instead of hashCode.
735
736 November 10, 2006
737
738 * No grammar nondeterminism warning now when wildcard '.' is final alt.
739 Examples:
740
741 a : A | B | . ;
742
743 A : 'a'
744 | .
745 ;
746
747 SL_COMMENT
748 : '//' (options {greedy=false;} : .)* '\r'? '\n'
749 ;
750
751 SL_COMMENT2
752 : '//' (options {greedy=false;} : 'x'|.)* '\r'? '\n'
753 ;
754
755
756 November 8, 2006
757
* Syntactic predicates did not get hoisted properly upon non-LL(*)
decision. Other hoisting issues fixed. Cleaned up code.

* Removed failsafe that checked to see if I'm spending too much time on a
single DFA; I don't think we need it anymore.
761
762 November 3, 2006
763
764 * $text, $line, etc... were not working in assignments. Fixed and added
765 test case.
766
767 * $label.text translated to label.getText in lexer even if label was on a char
768
769 November 2, 2006
770
771 * Added error if you don't specify what the AST type is; actions in tree
772 grammar won't work without it.
773
774 $ cat x.g
775 tree grammar x;
776 a : ID {String s = $ID.text;} ;
777
778 ANTLR Parser Generator Early Access Version 3.0b5 (??, 2006) 1989-2006
779 error: x.g:0:0: (152) tree grammar x has no ASTLabelType option
780
781 November 1, 2006
782
783 * $text, $line, etc... were not working properly within lexer rule.
784
October 31, 2006
786
* Finally actions now execute before dynamic scopes are popped in the
rule. Previously it was not possible to access the rule's scoped variables
in a finally action.
790
791 October 29, 2006
792
793 * Altered ActionTranslator to emit errors on setting read-only attributes
794 such as $start, $stop, $text in a rule. Also forbid setting any attributes
795 in rules/tokens referenced by a label or name.
Setting dynamic scopes' attributes and your own parameter attributes
797 is legal.
798
799 October 27, 2006
800
801 * Altered how ANTLR figures out what decision is associated with which
802 block of grammar. Makes ANTLRWorks correctly find DFA for a block.
803
804 October 26, 2006
805
806 * Fixed bug where EOT transitions led to no NFA configs in a DFA state,
807 yielding an error in DFA table generation.
808
809 * renamed action.g to ActionTranslator.g
810 the ActionTranslator class is now called ActionTranslatorLexer, as ANTLR
811 generates this classname now. Fixed rest of codebase accordingly.
812
813 * added rules recognizing setting of scopes' attributes to ActionTranslator.g
814 the Objective C target needed access to the right-hand side of the assignment
815 in order to generate correct code
816
817 * changed ANTLRCore.sti to reflect the new mandatory templates to support the above
818 namely: scopeSetAttributeRef, returnSetAttributeRef and the ruleSetPropertyRef_*
819 templates, with the exception of ruleSetPropertyRef_text. we cannot set this attribute
820
821 October 19, 2006
822
823 * Fixed 2 bugs in DFA conversion that caused exceptions.
824 altered functionality of getMinElement so it ignores elements<0.
825
826 October 18, 2006
827
828 * moved resetStateNumbersToBeContiguous() to after issuing of warnings;
829 an internal error in that routine should make more sense as issues
830 with decision will appear first.
831
* fixed cut/paste bug I introduced when I fixed the EOF in min/max
bug. Prevented C grammar from working briefly.
834
835 October 17, 2006
836
* Removed a failsafe that seems to be unnecessary; it ensured a DFA didn't
get too big. It was resulting in some failures in code generation that
led me on quite a strange debugging trip.
840
841 October 16, 2006
842
843 * Use channel=HIDDEN not channel=99 to put tokens on hidden channel.
844
845 October 12, 2006
846
847 * ANTLR now has a customizable message format for errors and warnings,
848 to make it easier to fulfill requirements by IDEs and such.
849 The format to be used can be specified via the '-message-format name'
850 command line switch. The default for name is 'antlr', also available
851 at the moment is 'gnu'. This is done via StringTemplate, for details
852 on the requirements look in org/antlr/tool/templates/messages/formats/
853
854 * line numbers for lexers in combined grammars are now reported correctly.
855
856 September 29, 2006
857
858 * ANTLRReaderStream improperly checked for end of input.
859
860 September 28, 2006
861
862 * For ANTLRStringStream, LA(-1) was off by one...gave you LA(-2).
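
The convention is that LA(1) is the next character to be consumed and
LA(-1) is the one just consumed, which is where the off-by-one crept in.
Here is a small Java sketch of that indexing over a char array; it is
written from the convention just stated, is not copied from
ANTLRStringStream, and treats LA(0) as undefined by returning 0.

```java
/** Sketch of LA(i) indexing over a char array; not the ANTLRStringStream source. */
final class LookaheadSketch {
    static final int EOF = -1;

    private final char[] data;
    private int p = 0; // index of the next char to consume

    LookaheadSketch(String input) {
        this.data = input.toCharArray();
    }

    void consume() {
        if (p < data.length) {
            p++;
        }
    }

    /** LA(1) is data[p]; LA(-1) is data[p-1], the char just consumed. */
    int LA(int i) {
        if (i == 0) {
            return 0; // undefined
        }
        // Positive lookahead is 1-based from p; negative lookback has no gap at 0.
        int index = (i > 0) ? p + i - 1 : p + i;
        if (index < 0 || index >= data.length) {
            return EOF;
        }
        return data[index];
    }
}
```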
863
864 3.0b4 - August 24, 2006
865
866 * error when no rules in grammar. doesn't crash now.
867
868 * Token is now an interface.
869
870 * remove dependence on non runtime classes in runtime package.
871
872 * filename and grammar name must be same Foo in Foo.g. Generates FooParser,
873 FooLexer, ... Combined grammar Foo generates Foo$Lexer.g which generates
874 FooLexer.java. tree grammars generate FooTreeParser.java
875
876 August 24, 2006
877
878 * added C# target to lib, codegen, templates
879
880 August 11, 2006
881
882 * added tree arg to navigation methods in treeadaptor
883
884 August 07, 2006
885
886 * fixed bug related to (a|)+ on end of lexer rules. crashed instead
887 of warning.
888
889 * added warning that interpreter doesn't do synpreds yet
890
891 * allow different source of classloader:
892 ClassLoader cl = Thread.currentThread().getContextClassLoader();
893 if ( cl==null ) {
894 cl = this.getClass().getClassLoader();
895 }
896
897
898 July 26, 2006
899
900 * compressed DFA edge tables significantly. All edge tables are
901 unique. The transition table can reuse arrays. Look like this now:
902
903 public static readonly DFA30_transition0 =
904 new short[] { 46, 46, -1, 46, 46, -1, -1, -1, -1, -1, -1, -1,...};
905 public static readonly DFA30_transition1 =
906 new short[] { 21 };
907 public static readonly short[][] DFA30_transition = {
908 DFA30_transition0,
909 DFA30_transition0,
910 DFA30_transition1,
911 ...
912 };
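The sharing idea can be sketched in plain Java as follows. The class name, row contents and the next() helper are made up for illustration; this is not the generated code, only the shape of it: identical edge rows are stored once and referenced from several states.

```java
// Hypothetical sketch of row sharing in a transition table.
final class SharedRowsSketch {
    // Made-up rows; each distinct edge table exists exactly once.
    static final short[] ROW_A = { 46, 46, -1, 46 };
    static final short[] ROW_B = { 21 };

    // States 0 and 1 have identical edges, so both point at ROW_A.
    static final short[][] TRANSITION = { ROW_A, ROW_A, ROW_B };

    /** Next state for (state, offset), or -1 if there is no edge. */
    static int next(int state, int offset) {
        short[] row = TRANSITION[state];
        return (offset >= 0 && offset < row.length) ? row[offset] : -1;
    }
}
```

Because TRANSITION[0] and TRANSITION[1] are the same array object, adding more states with the same edges costs one reference each rather than a full copy of the row.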
913
914 * If you defined both a label like EQ and '=', sometimes the '=' was
915 used instead of the EQ label.
916
917 * made headerFile template have same arg list as outputFile for consistency
918
919 * outputFile, lexer, genericParser, parser, treeParser templates
920 reference cyclicDFAs attribute which was no longer used after I
921 started the new table-based DFA. I made cyclicDFADescriptors
922 argument to outputFile and headerFile (only). I think this is
923 correct as only OO languages will want the DFA in the recognizer.
924 At the top level, C and friends can use it. Changed name to use
925 cyclicDFAs again as it's a better name probably. Removed parameter
926 from the lexer, ... For example, my parser template says this now:
927
928 <cyclicDFAs:cyclicDFA()> <! dump tables for all DFA !>
929
930 * made all token ref token types go thru code gen's
931 getTokenTypeAsTargetLabel()
932
933 * no more computing DFA transition tables for acyclic DFA.
934
935 July 25, 2006
936
937 * fixed a place where I was adding syn predicates into rewrite stuff.
938
939 * turned off invalid token index warning in AW support; had a problem.
940
941 * bad location event generated with -debug for synpreds in autobacktrack mode.
942
943 July 24, 2006
944
945 * changed runtime.DFA so that it treats all chars and token types as
946 char (unsigned 16 bit int). -1 becomes '\uFFFF' then or 65535.
947
948 * changed MAX_STATE_TRANSITIONS_FOR_TABLE to be 65534 by default
949 now. This means that all states can use a table to do transitions.
950
951 * was not making synpreds on (C)* type loops with backtrack=true
952
953 * was copying tree stuff and actions into synpreds with backtrack=true
954
955 * was making synpreds on even single alt rules / blocks with backtrack=true
956
957 3.0b3 - July 21, 2006
958
959 * ANTLR fails to analyze complex decisions much less frequently. It
960 turns out that the set of decisions for which ANTLR fails (times
961 out) is the same set (so far) of non-LL(*) decisions. Moreover, I'm
962 able to detect this situation quickly and report rather than timing
963 out. Errors look like:
964
965 java.g:468:23: [fatal] rule concreteDimensions has non-LL(*)
966 decision due to recursive rule invocations in alts 1,2. Resolve
967 by left-factoring or using syntactic predicates with fixed k
968 lookahead or use backtrack=true option.
969
970 This message only appears when k=*.
971
972 * Shortened no viable alt messages to not include decision
973 description:
974
975 [compilationUnit, declaration]: line 8:8 decision=<<67:1: declaration
976 : ( ( fieldDeclaration )=> fieldDeclaration | ( methodDeclaration )=>
977 methodDeclaration | ( constructorDeclaration )=>
978 constructorDeclaration | ( classDeclaration )=> classDeclaration | (
979 interfaceDeclaration )=> interfaceDeclaration | ( blockDeclaration )=>
980 blockDeclaration | emptyDeclaration );>> state 3 (decision=14) no
981 viable alt; token=[@1,184:187='java',<122>,8:8]
982
983 too long and hard to read.
984
985 July 19, 2006
986
987 * Code gen bug: states with no emanating edges were ignored by ST.
988 Now an empty list is used.
989
990 * Added grammar parameter to recognizer templates so they can access
991 properties like getName(), ...
992
993 July 10, 2006
994
995 * Fixed the gated pred merged state bug. Added unit test.
996
997 * added new method to Target: getTokenTypeAsTargetLabel()
998
999 July 7, 2006
1000
1001 * I was doing an AND instead of OR in the gated predicate stuff.
1002 Thanks to Stephen Kou!
1003
1004 * Reduce op for combining predicates was insanely slow sometimes and
1005 didn't actually work well. Now it's fast and works.
1006
1007 * There is a bug in merging of DFA stop states related to gated
1008 preds...turned it off for now.
1009
1010 3.0b2 - July 5, 2006
1011
1012 July 5, 2006
1013
1014 * token emission not properly protected in lexer filter mode.
1015
1016 * EOT, EOT DFA state transition tables should be init'd to -1 (only
1017 was doing this for compressed tables). Fixed.
1018
1019 * in trace mode, exit method not shown for memoized rules
1020
1021 * added -Xmaxdfaedges to allow you to increase number of edges allowed
1022 for a single DFA state before it becomes "special" and can't fit in
1023 a simple table.
1024
1025 * Bug in tables. Short are signed so min/max tables for DFA are now
1026 char[]. Bizarre.
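A small sketch of why signed shorts break a min/max range check while chars do not. MinMaxSketch and its two helpers are hypothetical names, not runtime code; the only facts relied on are that Java's short is signed 16 bit and char is unsigned 16 bit.

```java
// Hypothetical illustration of signed vs unsigned 16-bit bounds.
final class MinMaxSketch {
    /** Range check with char bounds: char is unsigned, so 0xFFFF stays 65535. */
    static boolean inRangeChar(char min, char max, int c) {
        return c >= min && c <= max;
    }

    /** Same check with short bounds: (short) 0xFFFF is -1, so the upper bound collapses. */
    static boolean inRangeShort(short min, short max, int c) {
        return c >= min && c <= max;
    }
}
```

With bounds 'a'..0xFFFF, inRangeChar accepts 'z', but inRangeShort rejects it because the max bound has become -1 after narrowing.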
1027
1028 July 3, 2006
1029
1030 * Added a method to reset the tool error state for current thread.
1031 See ErrorManager.java
1032
1033 * [Got this working properly today] backtrack mode that lets you type
1034 in any old crap and ANTLR will backtrack if it can't figure out what
1035 you meant. No errors are reported by antlr during analysis. It
1036 implicitly adds a syn pred in front of every production, using them
1037 only if static grammar LL(*) analysis fails. Syn pred code is not
1038 generated if the pred is not used in a decision.
1039
1040 This is essentially a rapid prototyping mode.
1041
1042 * Added backtracking report to the -report option
1043
1044 * Added NFA->DFA conversion early termination report to the -report option
1045
1046 * Added grammar level k and backtrack options to -report
1047
1048 * Added a dozen unit tests to test autobacktrack NFA construction.
1049
1050 * If you are using filter mode, you must manually use option
1051 memoize=true now.
1052
1053 July 2, 2006
1054
1055 * Added k=* option so you can set k=2, for example, on whole grammar,
1056 but an individual decision can be LL(*).
1057
1058 * memoize option for grammars, rules, blocks. Remove -nomemo cmd-line option
1059
1060 * bug in DOT generator for DFA; fixed.
1061
1062 * runtime.DFA reported errors even when backtracking
1063
1064 July 1, 2006
1065
1066 * Added -X option list to help
1067
1068 * Syn preds were being hoisted into other rules, causing lots of extra
1069 backtracking.
1070
1071 June 29, 2006
1072
1073 * unnecessary files removed during build.
1074
1075 * Matt Benson updated build.xml
1076
1077 * Detecting use of synpreds in analysis now instead of codegen. In
1078 this way, I can avoid analyzing decisions in synpreds for synpreds
1079 not used in a DFA for a real rule. This is used to optimize things
1080 for backtrack option.
1081
1082 * Code gen must add _fragment or whatever to end of pred name in
1083 template synpredRule to avoid having ANTLR know anything about
1084 method names.
1085
1086 * Added -IdbgST option to emit ST delimiters at start/stop of all
1087 templates spit out.
1088
1089 June 28, 2006
1090
1091 * Tweaked message when ANTLR cannot handle analysis.
1092
1093 3.0b1 - June 27, 2006
1094
1095 June 24, 2006
1096
1097 * syn preds no longer generate little static classes; they also don't
1098 generate a whole bunch of extra crap in the rules built to test syn
1099 preds. Removed GrammarFragmentPointer class from runtime.
1100
1101 June 23-24, 2006
1102
1103 * added output option to -report output.
1104
1105 * added profiling info:
1106 Number of rule invocations in "guessing" mode
1107 number of rule memoization cache hits
1108 number of rule memoization cache misses
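For readers unfamiliar with what a memoization cache hit or miss means here, the following is a rough sketch of a rule memo table with counters. MemoSketch, its constants and method names are all hypothetical; the real runtime and Profiler may organize this differently. The assumption is that a backtracking parser records, per rule and start index, where the rule stopped or that it failed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical memo table with hit/miss counters; not the runtime's code.
final class MemoSketch {
    static final int UNKNOWN = -1; // rule not yet attempted at this index
    static final int FAILED = -2;  // rule attempted here and failed

    private final Map<Integer, Map<Integer, Integer>> memo = new HashMap<>();
    int hits;
    int misses;

    /** Stop index recorded for (rule, start), FAILED, or UNKNOWN on a miss. */
    int lookup(int rule, int start) {
        Map<Integer, Integer> byStart = memo.get(rule);
        Integer stop = (byStart == null) ? null : byStart.get(start);
        if (stop == null) {
            misses++;
            return UNKNOWN;
        }
        hits++;
        return stop;
    }

    /** Remember the outcome of parsing 'rule' at 'start'. */
    void record(int rule, int start, int stopOrFailed) {
        memo.computeIfAbsent(rule, r -> new HashMap<>()).put(start, stopOrFailed);
    }
}
```

A hit lets the parser skip re-parsing the rule while guessing; a miss means the rule has to be run and its result recorded.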
1109
1110 * made DFA DOT diagrams go left to right not top to bottom
1111
1112 * I now try to fix recursion overflow states by resolving these states
1113 with semantic/syntactic predicates if they exist. The DFA is then
1114 deterministic rather than simply resolving by choosing first
1115 nondeterministic alt. I used to generate errors:
1116
1117 ~/tmp $ java org.antlr.Tool -dfa t.g
1118 ANTLR Parser Generator Early Access Version 3.0b2 (July 5, 2006) 1989-2006
1119 t.g:2:5: Alternative 1: after matching input such as A A A A A decision cannot predict what comes next due to recursion overflow to b from b
1120 t.g:2:5: Alternative 2: after matching input such as A A A A A decision cannot predict what comes next due to recursion overflow to b from b
1121
1122 Now, I use predicates if available and emit no warnings.
1123
1124 * made sem preds share accept states. Previously, multiple preds in a
1125 decision forked new accepts each time for each nondet state.
1126
1127 June 19, 2006
1128
1129 * Need parens around the prediction expressions in templates.
1130
1131 * Referencing $ID.text in an action forced bad code gen in lexer rule ID.
1132
1133 * Fixed a bug in how predicates are collected. The definition of
1134 "last predicated alternative" was incorrect in the analysis. Further,
1135 gated predicates incorrectly missed a case where an edge should become
1136 true (a tautology).
1137
1138 * Removed an unnecessary input.consume() reference in the runtime/DFA class.
1139
1140 June 14, 2006
1141
1142 * -> ($rulelabel)? didn't generate proper code for ASTs.
1143
1144 * bug in code gen (did not compile)
1145 a : ID -> ID
1146 | ID -> ID
1147 ;
1148 Problem is repeated ref to ID from left side. Juergen pointed this out.
1149
1150 * use of tokenVocab with missing file yielded exception
1151
1152 * (A|B)=> foo yielded an exception as (A|B) is a set not a block. Fixed.
1153
1154 * Didn't set ID1= and INT1= for this alt:
1155 | ^(ID INT+ {System.out.print(\"^(\"+$ID+\" \"+$INT+\")\");})
1156
1157 * Fixed so repeated dangling state errors only occur once like:
1158 t.g:4:17: the decision cannot distinguish between alternative(s) 2,1 for at least one input sequence
1159
1160 * tracking of rule elements was on (making list defs at start of
1161 method) with templates instead of just with ASTs. Turned off.
1162
1163 * Doesn't crash when you give it a missing file now.
1164
1165 * -report: add output info: how many LL(1) decisions.
1166
1167 June 13, 2006
1168
1169 * ^(ROOT ID?) Didn't work; nor did any other nullable child list such as
1170 ^(ROOT ID* INT?). Now, I check to see if child list is nullable using
1171 Grammar.LOOK() and, if so, I generate an "IF lookahead is DOWN" gate
1172 around the child list so the whole thing is optional.
1173
1174 * Fixed a bug in LOOK that made it not look through nullable rules.
1175
1176 * Using AST suffixes or -> rewrite syntax now gives an error w/o a grammar
1177 output option. Used to crash ;)
1178
1179 * References to EOF ended up with improper -1 refs instead of EOF in output.
1180
1181 * didn't warn of ambig ref to $expr in rewrite; fixed.
1182 list
1183 : '[' expr 'for' type ID 'in' expr ']'
1184 -> comprehension(expr={$expr.st},type={},list={},i={})
1185 ;
1186
1187 June 12, 2006
1188
1189 * EOF works in the parser as a token name.
1190
1191 * Rule b:(A B?)*; didn't display properly in AW due to the way ANTLR
1192 generated NFA.
1193
1194 * "scope x;" in a rule for unknown x gives no error. Fixed. Added unit test.
1195
1196 * Label type for refs to start/stop in tree parser and other parsers were
1197 not used. Lots of casting. Ick. Fixed.
1198
1199 * couldn't refer to $tokenlabel in isolation; but need so we can test if
1200 something was matched. Fixed.
1201
1202 * Lots of little bugs fixed in $x.y, %... translation due to new
1203 action translator.
1204
1205 * Improperly tracking block nesting level; result was that you couldn't
1206 see $ID in action of rule "a : A+ | ID {Token t = $ID;} | C ;"
1207
1208 * a : ID ID {$ID.text;} ; did not get a warning about ambiguous $ID ref.
1209
1210 * No error was found on $COMMENT.text:
1211
1212 COMMENT
1213 : '/*' (options {greedy=false;} : . )* '*/'
1214 {System.out.println("found method "+$COMMENT.text);}
1215 ;
1216
1217 $enclosinglexerrule scope does not exist. Use text or setText() here.
1218
1219 June 11, 2006
1220
1221 * Single return values are initialized now to default or to your spec.
1222
1223 * cleaned up input stream stuff. Added ANTLRReaderStream, ANTLRInputStream
1224 and refactored. You can specify encodings now on ANTLRFileStream (and
1225 ANTLRInputStream) now.
1226
1227 * You can set text local var now in a lexer rule and token gets that text.
1228 start/stop indexes are still set for the token.
1229
1230 * Changed lexer slightly. Calling a nonfragment rule from a
1231 nonfragment rule does not set the overall token.
1232
1233 June 10, 2006
1234
1235 * Fixed bug where unnecessary escapes yield char==0 like '\{'.
1236
1237 * Fixed analysis bug. This grammar didn't report a recursion warning:
1238 x : y X
1239 | y Y
1240 ;
1241 y : L y R
1242 | B
1243 ;
1244 The DFAState.equals() method was messed up.
1245
1246 * Added @synpredgate {...} action so you can tell ANTLR how to gate actions
1247 in/out during syntactic predicate evaluation.
1248
1249 * Fuzzy parsing should be more efficient. It should backtrack over a rule
1250 and then rewind and do it again "with feeling" to exec actions. It was
1251 actually doing it 3x not 2x.
1252
1253 June 9, 2006
1254
1255 * Gutted and rebuilt the action translator for $x.y, $x::y, ...
1256 Uses ANTLR v3 now for the first time inside v3 source. :)
1257 ActionTranslator.java
1258
1259 * Fixed a bug where referencing a return value on a rule didn't work
1260 because later a ref to that rule's predefined properties didn't
1261 properly force a return value struct to be built. Added unit test.
1262
1263 June 6, 2006
1264
1265 * New DFA mechanisms. Cyclic DFA are implemented as state tables,
1266 encoded via strings as java cannot handle large static arrays :(
1267 States with edges emanating that have predicates are specially
1268 treated. A method is generated to do these states. The DFA
1269 simulation routine uses the "special" array to figure out if the
1270 state is special. See March 25, 2006 entry for description:
1271 http://www.antlr.org/blog/antlr3/codegen.tml. analysis.DFA now has
1272 all the state tables generated for code gen. CyclicCodeGenerator.java
1273 disappeared as it's unneeded code. :)
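As a rough illustration of encoding a table in a string, here is a sketch that expands (count, value) char pairs into a short[]. The pair layout and the names PackedTableSketch and unpack are assumptions made for this example; the encoding actually used by the generated code may differ. The motivation is that a Java array initializer compiles to code in the static initializer, which is subject to the per-method bytecode size limit, whereas a string literal is stored as a constant.

```java
// Hypothetical run-length decoder; the (count, value) layout is an assumption.
final class PackedTableSketch {
    /** Expand (count, value) char pairs into a short[] table. Expects an even length. */
    static short[] unpack(String encoded) {
        int size = 0;
        for (int i = 0; i < encoded.length(); i += 2) {
            size += encoded.charAt(i);
        }
        short[] data = new short[size];
        int di = 0;
        for (int i = 0; i < encoded.length(); i += 2) {
            char count = encoded.charAt(i);
            char value = encoded.charAt(i + 1);
            for (int j = 0; j < count; j++) {
                data[di++] = (short) value; // 0xFFFF narrows to -1
            }
        }
        return data;
    }
}
```

For example, the four chars 3, 46, 1, 0xFFFF unpack to the table {46, 46, 46, -1}.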
1274
1275 * Internal general clean up of the DFA.states vs uniqueStates thing.
1276 Fixed lookahead decisions no longer fill uniqueStates. Waste of
1277 time. Also noted that when adding sem pred edges, I didn't check
1278 for state reuse. Fixed.
1279
1280 June 4, 2006
1281
1282 * When resolving ambig DFA states predicates, I did not add the new states
1283 to the list of unique DFA states. No observable effect on output except
1284 that DFA state numbers were not always contiguous for predicated decisions.
1285 I needed this fix for new DFA tables.
1286
1287 3.0ea10 - June 2, 2006
1288
1289 June 2, 2006
1290
1291 * Improved grammar stats and added syntactic pred tracking.
1292
1293 June 1, 2006
1294
1295 * Due to a type mismatch, the DebugParser.recoverFromMismatchedToken()
1296 method was not called. Debug events for mismatched token error
1297 notification were not sent to ANTLRWorks probably
1298
1299 * Added getBacktrackingLevel() for any recognizer; needed for profiler.
1300
1301 * Only writes profiling data for antlr grammar analysis with -profile set
1302
1303 * Major update and bug fix to (runtime) Profiler.
1304
1305 May 27, 2006
1306
1307 * Added Lexer.skip() to force lexer to ignore current token and look for
1308 another; no token is created for current rule and is not passed on to
1309 parser (or other consumer of the lexer).
1310
1311 * Parsers are much faster now. I removed use of java.util.Stack for pushing
1312 follow sets and use a hardcoded array stack instead. Dropped from
1313 5900ms to 3900ms for parse+lex time parsing entire java 1.4.2 source. Lex
1314 time alone was about 1500ms. Just looking at parse time, we get about 2x
1315 speed improvement. :)
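A minimal sketch of the kind of array stack meant here. FollowStackSketch is a hypothetical class, not the runtime code, and the initial capacity is arbitrary. One likely reason for the speedup is that java.util.Stack extends Vector, whose methods are synchronized; a plain array with an index avoids that overhead.

```java
import java.util.Arrays;

// Hypothetical array-backed stack for follow sets; not the runtime's code.
final class FollowStackSketch {
    private Object[] following = new Object[4]; // grows on demand
    private int sp = -1;                        // index of the top element

    void push(Object followSet) {
        if (sp + 1 >= following.length) {
            following = Arrays.copyOf(following, following.length * 2);
        }
        following[++sp] = followSet;
    }

    /** Caller must not pop an empty stack. */
    Object pop() {
        Object top = following[sp];
        following[sp--] = null; // drop the reference so it can be collected
        return top;
    }

    int size() { return sp + 1; }
}
```

In a recursive-descent parser a push happens on every rule invocation and a pop on every return, so even small per-call savings add up over a large input.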
1316
1317 May 26, 2006
1318
1319 * Fixed NFA construction so it generates NFA for (A*)* such that ANTLRWorks
1320 can display it properly.
1321
1322 May 25, 2006
1323
1324 * added abort method to Grammar so AW can terminate the conversion if it's
1325 taking too long.
1326
1327 May 24, 2006
1328
1329 * added method to get left recursive rules from grammar without doing full
1330 grammar analysis.
1331
1332 * analysis, code gen not attempted if serious error (like
1333 left-recursion or missing rule definition) occurred while reading
1334 the grammar in and defining symbols.
1335
1336 * added amazing optimization; reduces analysis time by 90% for java
1337 grammar; simple IF statement addition!
1338
1339 3.0ea9 - May 20, 2006
1340
1341 * added global k value for grammar to limit lookahead for all decisions unless
1342 overridden in a particular decision.
1343
1344 * added failsafe so that any decision taking longer than 2 seconds to create
1345 the DFA will fall back on k=1. Use -ImaxtimeforDFA n (in ms) to set the time.
1346
1347 * added an option (turned off for now) to use multiple threads to
1348 perform grammar analysis. Not much help on a 2-CPU computer as
1349 garbage collection seems to peg the 2nd CPU already. :( Gotta wait for
1350 a 4 CPU box ;)
1351
1352 * switched from #src to // $ANTLR src directive.
1353
1354 * CommonTokenStream.getTokens() looked past end of buffer sometimes. fixed.
1355
1356 * unicode literals didn't really work in DOT output and generated code. fixed.
1357
1358 * fixed the unit test rig so it compiles nicely with Java 1.5
1359
1360 * Added ant build.xml file (reads build.properties file)
1361
1362 * predicates sometimes failed to compile/eval properly due to missing (...)
1363 in IF expressions. Forced (..)
1364
1365 * (...)? with only one alt were not optimized. Was:
1366
1367 // t.g:4:7: ( B )?
1368 int alt1=2;
1369 int LA1_0 = input.LA(1);
1370 if ( LA1_0==B ) {
1371 alt1=1;
1372 }
1373 else if ( LA1_0==-1 ) {
1374 alt1=2;
1375 }
1376 else {
1377 NoViableAltException nvae =
1378 new NoViableAltException("4:7: ( B )?", 1, 0, input);
1379 throw nvae;
1380 }
1381
1382 is now:
1383
1384 // t.g:4:7: ( B )?
1385 int alt1=2;
1386 int LA1_0 = input.LA(1);
1387 if ( LA1_0==B ) {
1388 alt1=1;
1389 }
1390
1391 Smaller, faster and more readable.
1392
1393 * Allow manual init of return values now:
1394 functionHeader returns [int x=3*4, char (*f)()=null] : ... ;
1395
1396 * Added optimization for DFAs that fixed a codegen bug with rules in lexer:
1397 EQ : '=' ;
1398 ASSIGNOP : '=' | '+=' ;
1399 EQ is a subset of the other rule. It did not give an error, which is
1400 correct, but generated bad code.
1401
1402 * ANTLR was sending column not char position to ANTLRWorks.
1403
1404 * Bug fix: location 0, 0 emitted for synpreds and empty alts.
1405
1406 * debugging event handshake now sends grammar file name. Added getGrammarFileName() to recognizers. Java.stg generates it:
1407
1408 public String getGrammarFileName() { return "<fileName>"; }
1409
1410 * tree parsers can do arbitrary lookahead now including backtracking. I
1411 updated CommonTreeNodeStream.
1412
1413 * added events for debugging tree parsers:
1414
1415 /** Input for a tree parser is an AST, but we know nothing for sure
1416 * about a node except its type and text (obtained from the adaptor).
1417 * This is the analog of the consumeToken method. Again, the ID is
1418 * the hashCode usually of the node so it only works if hashCode is
1419 * not implemented.
1420 */
1421 public void consumeNode(int ID, String text, int type);
1422
1423 /** The tree parser looked ahead */
1424 public void LT(int i, int ID, String text, int type);
1425
1426 /** The tree parser has popped back up from the child list to the
1427 * root node.
1428 */
1429 public void goUp();
1430
1431 /** The tree parser has descended to the first child of the current
1432 * root node.
1433 */
1434 public void goDown();
1435
1436 * Added DebugTreeNodeStream and DebugTreeParser classes
1437
1438 * Added ctor because the debug tree node stream will need to ask questions about nodes and since nodes are just Object, it needs an adaptor to decode the nodes and get text/type info for the debugger.
1439
1440 public CommonTreeNodeStream(TreeAdaptor adaptor, Tree tree);
1441
1442 * added getter to TreeNodeStream:
1443 public TreeAdaptor getTreeAdaptor();
1444
1445 * Implemented getText/getType in CommonTreeAdaptor.
1446
1447 * Added TraceDebugEventListener that can dump all events to stdout.
1448
1449 * I broke down and made Tree implement getText
1450
1451 * tree rewrites now gen location debug events.
1452
1453 * added AST debug events to listener; added blank listener for convenience
1454
1455 * updated debug events to send begin/end backtrack events for debugging
1456
1457 * with a : (b->b) ('+' b -> ^(PLUS $a b))* ; you get b[0] each time as
1458 there is no loop in rewrite rule itself. Need to know context that
1459 the -> is inside the rule and hence b means last value of b not all
1460 values.
1461
1462 * Bug in TokenRewriteStream; ops at indexes < start index blocked proper op.
1463
1464 * Actions in ST rewrites "-> ({$op})()" were not translated
1465
1466 * Added new action name:
1467
1468 @rulecatch {
1469 catch (RecognitionException re) {
1470 reportError(re);
1471 recover(input,re);
1472 }
1473 catch (Throwable t) {
1474 System.err.println(t);
1475 }
1476 }
1477 Overrides rule catch stuff.
1478
1479 * Isolated $ refs caused exception
1480
1481 3.0ea8 - March 11, 2006
1482
1483 * added @finally {...} action like @init for rules. Executes in
1484 finally block (java target) after all other stuff like rule memoization.
1485 No code changes needed; ST just refs a new action:
1486 <ruleDescriptor.actions.finally>
1487
1488 * hideous bug fixed: PLUS='+' didn't result in '+' rule in lexer
1489
1490 * TokenRewriteStream didn't do toString() right when no rewrites had been done.
1491
1492 * lexer errors in interpreter were not printed properly
1493
1494 * bitsets are dumped in hex not decimal now for FOLLOW sets
1495
1496 * /* epsilon */ is not printed now when printing out grammars with empty alts
1497
1498 * Fixed another bug in tree rewrite stuff where it was checking that elements
1499 had at least one element. Strange...commented out for now to see if I can remember what's up.
1500
1501 * Tree rewrites had problems when you didn't have x+=FOO variables. Rules
1502 like this work now:
1503
1504 a : (x=ID)? y=ID -> ($x $y)?;
1505
1506 * filter=true for lexers turns on k=1 and backtracking for every token
1507 alternative. Put the rules in priority order.
1508
1509 * added getLine() etc... to Tree to support better error reporting for
1510 trees. Added MismatchedTreeNodeException.
1511
1512 * $templates::foo() is gone. added % as special template symbol.
1513 %foo(a={},b={},...) ctor (even shorter than $templates::foo(...))
1514 %({name-expr})(a={},...) indirect template ctor reference
1515
1516 The above are parsed by antlr.g and translated by codegen.g
1517 The following are parsed manually here:
1518
1519 %{string-expr} anonymous template from string expr
1520 %{expr}.y = z; template attribute y of StringTemplate-typed expr to z
1521 %x.y = z; set template attribute y of x (always set never get attr)
1522 to z [languages like python without ';' must still use the
1523 ';' which the code generator is free to remove during code gen]
1524
1525 * -> ({expr})(a={},...) notation for indirect template rewrite.
1526 expr is the name of the template.
1527
1528 * $x[i]::y and $x[-i]::y notation for accessing absolute scope stack
1529 indexes and relative negative scopes. $x[-1]::y is the y attribute
1530 of the previous scope (stack top - 1).
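The indexing rule can be sketched like this. ScopeStackSketch and at() are hypothetical names; the sketch only encodes the convention stated above, that a non-negative index is absolute from the bottom of the stack and a negative index is relative to the top, with -1 meaning top - 1.

```java
import java.util.List;

// Hypothetical helper showing absolute vs relative scope stack indexes.
final class ScopeStackSketch {
    /** i >= 0: absolute index from the bottom; i < 0: relative to the top (-1 is top - 1). */
    static <T> T at(List<T> stack, int i) {
        int top = stack.size() - 1;
        return stack.get(i >= 0 ? i : top + i);
    }
}
```

With a stack holding scopes a, b, c (c on top), at(stack, 0) is a, at(stack, -1) is b, and at(stack, -2) is a.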
1531
1532 * filter=true mode for lexers; can do this now...upon mismatch, just
1533 consumes a char and tries again:
1534 lexer grammar FuzzyJava;
1535 options {filter=true;}
1536
1537 FIELD
1538 : TYPE WS? name=ID WS? (';'|'=')
1539 {System.out.println("found var "+$name.text);}
1540 ;
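The "consume a char and try again" loop can be imitated in plain Java with a regex. This is only a sketch of the scanning strategy, not what ANTLR generates; FilterScanSketch, the FIELD pattern and the two hard-coded type names are assumptions standing in for the TYPE, WS and ID rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical fuzzy scanner; the pattern is a stand-in for the FIELD rule.
final class FilterScanSketch {
    private static final Pattern FIELD =
        Pattern.compile("(?:int|String)\\s+([A-Za-z_]\\w*)\\s*[;=]");

    /** Collect field names, skipping over anything that does not match. */
    static List<String> fieldNames(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        int p = 0;
        while (p < input.length()) {
            m.region(p, input.length());
            if (m.lookingAt()) {   // rule matches at p: record it and jump past it
                names.add(m.group(1));
                p = m.end();
            } else {
                p++;               // mismatch: consume one char and try again
            }
        }
        return names;
    }
}
```

Like any fuzzy scan, this can match inside unrelated text (for example "int" at the end of a longer identifier), which is the price of not parsing the whole input.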
1541
1542 * refactored char streams so ANTLRFileStream is now a subclass of
1543 ANTLRStringStream.
1544
1545 * char streams for lexer now allow nested backtracking in lexer.
1546
1547 * added TokenLabelType for lexer/parser for all token labels
1548
1549 * line numbers for error messages were not updated properly in antlr.g
1550 for strings, char literals and <<...>>
1551
1552 * init action in lexer rules was before the type,start,line,... decls.
1553
1554 * Tree grammars can now specify output; I've only tested output=template
1555 though.
1556
1557 * You can reference EOF now in the parser and lexer. It's just token type
1558 or char value -1.
1559
1560 * Bug fix: $ID refs in the *lexer* were all messed up. Cleaned up the
1561 set of properties available...
1562
1563 * Bug fix: .st not found in rule ref when rule has scope:
1564 field
1565 scope {
1566 StringTemplate funcDef;
1567 }
1568 : ...
1569 {$field::funcDef = $field.st;}
1570 ;
1571 it gets field_stack.st instead
1572
1573 * return in backtracking must return retval or null if return value.
1574
1575 * $property within a rule now works like $text, $st, ...
1576
1577 * AST/Template Rewrites were not gated by backtracking==0 so they
1578 executed even when guessing. Auto AST construction is now gated also.
1579
1580 * CommonTokenStream was somehow returning tokens not text in toString()
1581
1582 * added useful methods to runtime.BitSet and also to CommonToken so you can
1583 update the text. Added nice Token stream method:
1584
1585 /** Given a start and stop index, return a List of all tokens in
1586 * the token type BitSet. Return null if no tokens were found. This
1587 * method looks at both on and off channel tokens.
1588 */
1589 public List getTokens(int start, int stop, BitSet types);
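A sketch of what such a filter might look like, written against java.util.BitSet and a made-up Tok class rather than the runtime's own BitSet and Token types. Clamping the range to the buffer and the assumption that token types are non-negative are choices made for this example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical token filter; uses java.util.BitSet, not the ANTLR runtime BitSet.
final class TokenFilterSketch {
    /** Hypothetical token: just a type and text. */
    static final class Tok {
        final int type;
        final String text;
        Tok(int type, String text) { this.type = type; this.text = text; }
    }

    /** Tokens in [start, stop] whose type is in 'types'; null if none match. */
    static List<Tok> getTokens(List<Tok> tokens, int start, int stop, BitSet types) {
        if (start < 0) start = 0;
        if (stop >= tokens.size()) stop = tokens.size() - 1;
        List<Tok> out = new ArrayList<>();
        for (int i = start; i <= stop; i++) {
            Tok t = tokens.get(i);
            if (types.get(t.type)) out.add(t); // assumes type >= 0
        }
        return out.isEmpty() ? null : out;
    }
}
```

Returning null rather than an empty list follows the comment on the method above; callers have to check for it.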
1590
1591 * literals are now passed in the .tokens files so you can ref them in
1592 tree parses, for example.
1593
1594 * added basic exception handling; no labels, just general catches:
1595
1596 a : {;}A | B ;
1597 exception
1598 catch[RecognitionException re] {
1599 System.out.println("recog error");
1600 }
1601 catch[Exception e] {
1602 System.out.println("error");
1603 }
1604
1605 * Added method to TokenStream:
1606 public String toString(Token start, Token stop);
1607
1608 * antlr generates #src lines in lexer grammars generated from combined grammars
1609 so error messages refer to original file.
1610
1611 * lexers generated from combined grammars now use original formatting.
1612
1613 * predicates have $x.y stuff translated now. Warning: predicates might be
1614 hoisted out of context.
1615
1616 * return values in return val structs are now public.
1617
1618 * output=template with return values on rules was broken. I assume return values with ASTs were broken too. Fixed.
1619
1620 3.0ea7 - December 14, 2005
1621
1622 * Added -print option to print out grammar w/o actions
1623
1624 * Renamed BaseParser to be BaseRecognizer and even made Lexer derive from
1625 this; nice as it now shares backtracking support code.
1626
1627 * Added syntactic predicates (...)=>. See December 4, 2005 entry:
1628
1629 http://www.antlr.org/blog/antlr3/lookahead.tml
1630
1631 Note that we have a new option for turning off rule memoization during
1632 backtracking:
1633
1634 -nomemo when backtracking don't generate memoization code
1635
1636 * Predicates are now tested in order that you specify the alts. If you
1637 leave the last alt "naked" (w/o pred), it will assume a true pred rather
1638 than union of other preds.
1639
1640 * Added gated predicates "{p}?=>" that literally turn off a production whereas
1641 disambiguating predicates are only hoisted into the predictor when syntax alone
1642 is not sufficient to uniquely predict alternatives.
1643
1644 A : {p}? => "a" ;
1645 B : {!p}? => ("a"|"b")+ ;
1646
1647 * bug fixed related to predicates in predictor
1648 lexer grammar w;
1649 A : {p}? "a" ;
1650 B : {!p}? ("a"|"b")+ ;
1651 DFA is correct. A state splits for input "a" on the pred.
1652 Generated code though was hosed. No pred tests in prediction code!
1653 I added testLexerPreds() and others in TestSemanticPredicateEvaluation.java
1654
1655 * added execAction template in case we want to do something in front of
1656 each action execution or something.
1657
1658 * left-recursive cycles from rules w/o decisions were not detected.
1659
1660 * undefined lexer rules were not announced! fixed.
1661
1662 * unreachable messages for Tokens rule now indicate rule name not alt. E.g.,
1663
1664 Ruby.lexer.g:24:1: The following token definitions are unreachable: IVAR
1665
1666 * nondeterminism warnings improved for Tokens rule:
1667
1668 Ruby.lexer.g:10:1: Multiple token rules can match input such as ""0".."9"": INT, FLOAT
1669 As a result, tokens(s) FLOAT were disabled for that input
1670
1671
1672 * DOT diagrams didn't show escaped char properly.
1673
1674 * Char/string literals are now all 'abc' not "abc".
1675
1676 * action syntax changed "@scope::actionname {action}" where scope defaults
1677 to "parser" if parser grammar or combined grammar, "lexer" if lexer grammar,
1678 and "treeparser" if tree grammar. The code generation targets decide
1679 what scopes are available. Each "scope" yields a hashtable for use in
1680 the output templates. The scopes full of actions are sent to all output
1681 file templates (currently headerFile and outputFile) as attribute actions.
1682 Then you can reference <actions.scope> to get the map of actions associated
1683 with scope and <actions.parser.header> to get the parser's header action
1684 for example. This should be very flexible. The target should only have
1685 to define which scopes are valid, but the action names should be variable
1686 so we don't have to recompile ANTLR to add actions to code gen templates.
1687
1688 grammar T;
1689 options {language=Java;}
1690 @header { package foo; }
1691 @parser::stuff { int i; } // names within scope not checked; target dependent
1692 @members { int i; }
1693 @lexer::header {head}
1694 @lexer::members { int j; }
1695 @headerfile::blort {...} // error: this target doesn't have headerfile
1696 @treeparser::members {...} // error: this is not a tree parser
1697 a
1698 @init {int i;}
1699 : ID
1700 ;
1701 ID : 'a'..'z';
1702
1703 For now, the Java target uses members and header as a valid name. Within a
1704 rule, the init action name is valid.
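The scope-to-actions structure described in this entry can be pictured as a map of maps. The sketch below is illustrative only; ActionScopesSketch and its methods are hypothetical names and this is not how the tool itself stores actions. It assumes an action written without a scope lands in the grammar's default scope.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "@scope::actionname {action}" storage.
final class ActionScopesSketch {
    private final String defaultScope; // e.g. "parser" for a parser or combined grammar
    private final Map<String, Map<String, String>> actions = new HashMap<>();

    ActionScopesSketch(String defaultScope) { this.defaultScope = defaultScope; }

    /** Define @scope::name {code}; a null scope means the grammar's default scope. */
    void define(String scope, String name, String code) {
        String s = (scope == null) ? defaultScope : scope;
        actions.computeIfAbsent(s, k -> new HashMap<>()).put(name, code);
    }

    /** Mirrors a template reference such as <actions.parser.header>; null if undefined. */
    String get(String scope, String name) {
        Map<String, String> byName = actions.get(scope);
        return byName == null ? null : byName.get(name);
    }
}
```

For the grammar T above, define(null, "header", ...) would be stored under "parser", while define("lexer", "members", ...) goes under "lexer". Checking whether a scope or action name is valid would be left to the target, as the entry says.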
1705
1706 * changed $dynamicscope.value to $dynamicscope::value even if value is defined
1707 in same rule such as $function::name where rule function defines name.
1708
1709 * $dynamicscope gets you the stack
1710
1711 * rule scopes go like this now:
1712
1713 rule
1714 scope {...}
1715 scope slist,Symbols;
1716 : ...
1717 ;
1718
1719 * Created RuleReturnScope as a generic rule return value. Makes it easier
1720 to do this:
1721 RuleReturnScope r = parser.program();
1722 System.out.println(r.getTemplate().toString());
1723
1724 * $template, $tree, $start, etc...
1725
1726 * $r.x in current rule. $r is ignored as fully-qualified name. $r.start works too
1727
1728 * added warning about $r referring to both return value of rule and dynamic scope of rule
1729
1730 * integrated StringTemplate in a very simple manner
1731
1732 Syntax:
1733 -> template(arglist) "..."
1734 -> template(arglist) <<...>>
1735 -> namedTemplate(arglist)
1736 -> {free expression}
1737 -> // empty
1738
1739 Predicate syntax:
1740 a : A B -> {p1}? foo(a={$A.text})
1741 -> {p2}? foo(a={$B.text})
1742 -> // return nothing
1743
1744 An arg list is just a list of template attribute assignments to actions in curlies.
1745
1746 There is a setTemplateLib() method for you to use with named template rewrites.
1747
1748 Use a new option:
1749
1750 grammar t;
1751 options {output=template;}
1752 ...
1753
1754 This all should work for tree grammars too, but I'm still testing.
1755
* fixed bugs where strings were improperly escaped in exceptions, comments, etc. For example, newlines came out as raw newlines rather than the escaped version.
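
As an illustration of the escaping fix in the last item, here is a minimal
sketch of turning a raw string into text that is safe to embed in a generated
string literal or single-line comment. The class and method names
(LiteralEscaper, escape) are made up for this example and are not ANTLR's own
code; the set of escaped characters is an assumption.

```java
final class LiteralEscaper {
    /** Replace characters that cannot appear raw in a Java string literal. */
    public static String escape(String s) {
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\n': buf.append("\\n");  break; // newline -> backslash, n
                case '\r': buf.append("\\r");  break;
                case '\t': buf.append("\\t");  break;
                case '"':  buf.append("\\\""); break;
                case '\\': buf.append("\\\\"); break;
                default:   buf.append(c);
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        // The escaped text stays on one line, so the comment cannot spill over.
        System.out.println("// literal: " + escape("line1\nline2"));
    }
}
```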
1757
1758 3.0ea6 - November 13, 2005
1759
1760 * turned off -debug/-profile, which was on by default
1761
1762 * completely refactored the output templates; added some missing templates.
1763
* dramatically improved infinite recursion error messages (left-recursion
was never even reported before).
1766
1767 * wasn't printing dangling state messages when it reanalyzes with k=1.
1768
1769 * fixed a nasty bug in the analysis engine dealing with infinite recursion.
1770 Spent all day thinking about it and cleaned up the code dramatically.
1771 Bug fixed and software is more powerful and I understand it better! :)
1772
1773 * improved verbose DFA nodes; organized by alt
1774
1775 * got much better random phrase generation. For example:
1776
1777 $ java org.antlr.tool.RandomPhrase simple.g program
1778 int Ktcdn ';' method wh '(' ')' '{' return 5 ';' '}'
1779
1780 * empty rules like "a : ;" generated code that didn't compile due to
1781 try/catch for RecognitionException. Generated code couldn't possibly
1782 throw that exception.
1783
1784 * when printing out a grammar, such as in comments in generated code,
1785 ANTLR didn't print ast suffix stuff back out for literals.
1786
1787 * This never exited loop:
1788 DATA : (options {greedy=false;}: .* '\n' )* '\n' '.' ;
and now it works due to the new default nongreedy .* behavior. Also this works:
1790 DATA : (options {greedy=false;}: .* '\n' )* '.' ;
1791
* Dot star ".*" syntax didn't work; in the lexer it is nongreedy by
default. In the parser it is nongreedy but also k=1 by default. Added
unit tests. Added a blog entry to describe it.
1795
1796 * ~T where T is the only token yielded an empty set but no error
1797
1798 * Used to generate unreachable message here:
1799
1800 parser grammar t;
1801 a : ID a
1802 | ID
1803 ;
1804
1805 z.g:3:11: The following alternatives are unreachable: 2
1806
1807 In fact it should really be an error; now it generates:
1808
1809 no start rule in grammar t (no rule can obviously be followed by EOF)
1810
1811 Per next change item, ANTLR cannot know that EOF follows rule 'a'.
1812
1813 * added error message indicating that ANTLR can't figure out what your
1814 start rule is. Required to properly generate code in some cases.
1815
* validating semantic predicates now work (if they are false, they
throw a new FailedPredicateException)
1818
1819 * two hideous bug fixes in the IntervalSet, which made analysis go wrong
1820 in a few cases. Thanks to Oliver Zeigermann for finding lots of bugs
1821 and making suggested fixes (including the next two items)!
1822
1823 * cyclic DFAs are now nonstatic and hence can access instance variables
1824
1825 * labels are now allowed on lexical elements (in the lexer)
1826
1827 * added some internal debugging options
1828
1829 * ~'a'* and ~('a')* were not working properly; refactored antlr.g grammar
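
The validating semantic predicate item above says that a false predicate
throws a new FailedPredicateException. The sketch below shows the general
shape of such a check for a hypothetical rule "decl : {allowDecl}? ID ;".
The exception class here is a local stand-in with an assumed constructor, not
the runtime's class, and the rule and predicate names are invented for the
example.

```java
final class ValidatingPredicateDemo {
    /** Local stand-in for the runtime exception; constructor shape is assumed. */
    static class FailedPredicateException extends Exception {
        final String ruleName;
        final String predicateText;

        FailedPredicateException(String ruleName, String predicateText) {
            super("rule " + ruleName + " failed predicate: {" + predicateText + "}?");
            this.ruleName = ruleName;
            this.predicateText = predicateText;
        }
    }

    /** Mimics what a rule method might do for: decl : {allowDecl}? ID ; */
    public static String decl(boolean allowDecl, String id) throws FailedPredicateException {
        if (!allowDecl) {
            // The predicate is checked before matching; false means failure.
            throw new FailedPredicateException("decl", "allowDecl");
        }
        return id; // stands in for matching the ID token
    }

    public static void main(String[] args) {
        try {
            System.out.println(decl(true, "x"));
            decl(false, "y");
        } catch (FailedPredicateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```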
1830
1831 3.0ea5 - July 5, 2005
1832
* Using '\n' in a parser grammar resulted in a nonescaped version of '\n' in the token names table, making compilation fail. I fixed this by reorganizing/cleaning up the portion of ANTLR that deals with literals. See the comment in org.antlr.codegen.Target.
1834
1835 * Target.getMaxCharValue() did not use the appropriate max value constant.
1836
* ALLCHAR was a constant when it should use the Target max value def. Set complement for wildcard also didn't use the Target def. Generally cleaned up the max char value stuff.
1838
1839 * Code gen didn't deal with ASTLabelType properly...I think even the 3.0ea7 example tree parser was broken! :(
1840
1841 * Added a few more unit tests dealing with escaped literals
1842
1843 3.0ea4 - June 29, 2005
1844
1845 * tree parsers work; added CommonTreeNodeStream. See simplecTreeParser
1846 example in examples-v3 tarball.
1847
1848 * added superClass and ASTLabelType options
1849
1850 * refactored Parser to have a BaseParser and added TreeParser
1851
1852 * bug fix: actions being dumped in description strings; compile errors
1853 resulted
1854
1855 3.0ea3 - June 23, 2005
1856
1857 Enhancements
1858
1859 * Automatic tree construction operators are in: ! ^ ^^
1860
1861 * Tree construction rewrite rules are in
1862 -> {pred1}? rewrite1
1863 -> {pred2}? rewrite2
1864 ...
1865 -> rewriteN
1866
The rewrite rules may be elements like ID, expr, $label, {node expr}
and trees ^( <root> <children> ). You can have (...)?, (...)*, (...)+
subrules as well.
1870
1871 You may have rewrites in subrules not just at outer level of rule, but
1872 any -> rewrite forces auto AST construction off for that alternative
1873 of that rule.
1874
1875 To avoid cycles, copy semantics are used:
1876
1877 r : INT -> INT INT ;
1878
1879 means make two new nodes from the same INT token.
1880
Repeated references to a rule element imply a copy for at least one
tree:
1883
1884 a : atom -> ^(atom atom) ; // NOT CYCLE! (dup atom tree)
1885
1886 * $ruleLabel.tree refers to tree created by matching the labeled element.
1887
1888 * A description of the blocks/alts is generated as a comment in output code
1889
1890 * A timestamp / signature is put at top of each generated code file
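
The copy semantics described under the rewrite rules item can be pictured
with a small sketch. Token, Node and the two rewrite methods below are
hypothetical types written for this example, not the ANTLR runtime classes.
The sketch copies both references to atom for simplicity, whereas the text
only requires a copy for at least one of them.

```java
import java.util.ArrayList;
import java.util.List;

final class RewriteCopyDemo {
    static final class Token {
        final String text;
        Token(String text) { this.text = text; }
    }

    static final class Node {
        final Token token;
        final List<Node> children = new ArrayList<>();
        Node(Token token) { this.token = token; }

        /** Deep copy of this subtree; tokens are shared, nodes are new. */
        Node dup() {
            Node copy = new Node(token);
            for (Node child : children) {
                copy.children.add(child.dup());
            }
            return copy;
        }
    }

    /** r : INT -> INT INT ;  makes two new nodes from the same INT token. */
    public static List<Node> rewriteIntInt(Token intToken) {
        List<Node> result = new ArrayList<>();
        result.add(new Node(intToken));
        result.add(new Node(intToken));
        return result;
    }

    /** a : atom -> ^(atom atom) ;  root and child are separate copies, so no cycle. */
    public static Node rewriteAtomAtom(Node atom) {
        Node root = atom.dup();
        root.children.add(atom.dup());
        return root;
    }

    public static void main(String[] args) {
        List<Node> nodes = rewriteIntInt(new Token("42"));
        System.out.println(nodes.get(0) != nodes.get(1)); // distinct node objects
    }
}
```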
1891
1892 3.0ea2 - June 12, 2005
1893
1894 Bug fixes
1895
1896 * Some error messages were missing the stackTrace parameter
1897
1898 * Removed the file locking mechanism as it's not cross platform
1899
1900 * Some absolute vs relative path name problems with writing output
1901 files. Rules are now more concrete. -o option takes precedence
1902 // -o /tmp /var/lib/t.g => /tmp/T.java
1903 // -o subdir/output /usr/lib/t.g => subdir/output/T.java
1904 // -o . /usr/lib/t.g => ./T.java
// -o /tmp subdir/t.g => /tmp/subdir/T.java
// If they didn't specify a -o dir, just write to the location
// where the grammar is, absolute or relative
1908
1909 * does error checking on unknown option names now
1910
1911 * Using just language code not locale name for error message file. I.e.,
1912 the default (and for any English speaking locale) is en.stg not en_US.stg
1913 anymore.
1914
1915 * The error manager now asks the Tool to panic rather than simply doing
1916 a System.exit().
1917
1918 * Lots of refactoring concerning grammar, rule, subrule options. Now
1919 detects invalid options.
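
The -o path rules in the list above can be summarized in a few lines of code.
This is only a sketch under stated assumptions: the class and method names are
invented, Unix-style paths are assumed, the generated file name (T.java in the
examples) is passed in rather than derived from the grammar, and the fourth
rule is read as placing the generated file under /tmp/subdir. It is not the
Tool's actual implementation.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class OutputLocation {
    /**
     * outputDir     value of -o, or null when no -o was given
     * grammarFile   grammar path as given on the command line
     * generatedName name of the generated file, e.g. "T.java"
     */
    public static Path resolve(String outputDir, String grammarFile, String generatedName) {
        Path grammar = Paths.get(grammarFile);
        Path grammarDir = grammar.getParent(); // null when there is no directory part

        if (outputDir == null) {
            // No -o: write next to the grammar, absolute or relative.
            return grammarDir == null ? Paths.get(generatedName)
                                      : grammarDir.resolve(generatedName);
        }
        Path out = Paths.get(outputDir);
        if (grammar.isAbsolute() || grammarDir == null) {
            // -o takes precedence; an absolute grammar directory is dropped.
            return out.resolve(generatedName);
        }
        // A relative grammar directory is kept underneath the -o directory.
        return out.resolve(grammarDir).resolve(generatedName);
    }

    public static void main(String[] args) {
        System.out.println(resolve("/tmp", "/var/lib/t.g", "T.java"));
        System.out.println(resolve("/tmp", "subdir/t.g", "T.java"));
        System.out.println(resolve(null, "/usr/lib/t.g", "T.java"));
    }
}
```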
1920
1921 3.0ea1 - June 1, 2005
1922
1923 Initial early access release
1924
1925