CHANGES FROM 1.31

This file contains the migration of PCCTS from 1.31 in the order that
changes were made.  1.32b7 is the last beta before full 1.32.
Terence Parr, Parr Research Corporation 1995.


======================================================================
1.32b1
Added Russell Quong to banner, changed banner for output slightly
Fixed it so that you have before / after actions for C++ in class def
Fixed bug in optimizer that made it sometimes forget to set internal
        token pointers.  Only showed up when a {...} was in the "wrong spot".

======================================================================
1.32b2
Added fixes by Dave Seidel for PC compilers in 32 bit mode (config.h
and set.h).

======================================================================
1.32b3
Fixed hideous bug in code generator for wildcard and for ~token op.

From Dave Seidel:

   Added pcnames.bat
   1. in antlr/main.c: change strcasecmp() to stricmp()

   2. in dlg/output.c: use DLEXER_C instead of "DLexer.C"

   3. in h/PBlackBox.h: use <iostream.h> instead of <stream.h>

======================================================================
1.32b4
When the -ft option was used, any path prefix screwed up
the gate on the .h files.

Fixed yet another bug due to the optimizer.

The exception handling thing was a bit wacko:

a : ( A B )? A B
  | A C
  ;
  exception ...

caused an exception if "A C" was the input.  In other words,
it found that A C didn't match the (A B)? pred and caused
an exception rather than trying the next alt.  All I did
was to change the zzmatch_wsig() macros.

Fixed some problems in gen.c relating to the name of token
class bit sets in the output.

Added the tremendously cool generalized predicate.  For the
moment, I'll give this brief description.

a : <<predicate>>? blah
  | foo
  ;

This implies that (assuming blah and foo are syntactically
ambiguous) "predicate" indicates the semantic validity of
applying "blah".  If "predicate" is false, "foo" is attempted.

Previously, you had to say:

a : <<LA(1)==ID ? predicate : 1>>? ID
  | ID
  ;

Now, you can simply use "predicate" without the ?: operator
if you turn on ANTLR command line option: "-prc on".  This
tells ANTLR to compute the lookahead context all by itself.  It
computes n tokens of lookahead where LT(n) or LATEXT(n) is the
farthest ahead you look.
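
As a rough illustration of what that guard amounts to, here is a
self-contained C++ sketch.  Tok, isTypeName() and guardedPredicate() are
hypothetical names invented for this example; this is not code that
ANTLR emits.  It only shows the hand-written "LA(1)==ID ? predicate : 1"
idea: the predicate is consulted in the ID context and counts as true
for any other lookahead.

```cpp
#include <string>

// Hypothetical stand-ins for illustration; none of these names come from PCCTS.
enum TokenType { ID, INT };

struct Tok {
    TokenType type;
    std::string text;
};

// Example semantic predicate: in this sketch only "T" is a type name.
inline bool isTypeName(const std::string &s) { return s == "T"; }

// Hand-written form of <<LA(1)==ID ? predicate : 1>>?
// The predicate is evaluated only when the lookahead is an ID;
// any other lookahead leaves the alternative enabled.
inline bool guardedPredicate(const Tok &la1)
{
    return la1.type == ID ? isTypeName(la1.text) : true;
}
```

Per the description above, "-prc on" asks ANTLR to work out the ID
context itself, so only the bare predicate has to be written.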

If you give a predicate using "-prc on" that is followed
by a construct that can recognize more than one n-sequence,
you will get a warning from ANTLR.  For example,

a : <<isTypeName(LT(1)->getText())>>? (ID|INT)
  ;

This is wrong because the predicate will be applied to INTs
as well as IDs.  You should use this syntax to make
the predicate more specific:

a : (ID)? => <<isTypeName(LT(1)->getText())>>? (ID|INT)
  ;

which says "don't apply the predicate unless ID is the
current lookahead context".

You cannot currently have anything in the "(context)? =>"
except sequences such as:

( LPAREN ID | LPAREN SCOPE )? => <<pred>>?

I haven't tested this THAT much, but it does work for the
C++ grammar.
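
A minimal C++ sketch of how a "(context)? =>" guard over token
sequences might be pictured, assuming a two-token context of
LPAREN ID or LPAREN SCOPE.  The functions inContext() and
contextGuarded() are hypothetical and exist only for this example;
the code ANTLR really generates will look different.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical token types for illustration only.
enum TokenType { LPAREN, ID, SCOPE, INT };

// True when the first two lookahead tokens are LPAREN ID or LPAREN SCOPE.
inline bool inContext(const std::vector<TokenType> &la)
{
    return la.size() >= 2 && la[0] == LPAREN &&
           (la[1] == ID || la[1] == SCOPE);
}

// (context)? => <<pred>>? : the predicate result matters only inside
// the context; outside it the alternative is not restricted.
inline bool contextGuarded(const std::vector<TokenType> &la, bool pred)
{
    return inContext(la) ? pred : true;
}
```

A lookahead of LPAREN INT falls outside both listed sequences, so a
false predicate does not block the alternative there.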

======================================================================
1.32b5

Added getLine() to the ANTLRTokenBase and DLGBasedToken classes;
left line() for backward compatibility.
----
Removed SORCERER_TRANSFORM from the ast.h stuff.
----
Fixed bug in code gen of ANTLR such that nested syn preds work more
efficiently now.  The ANTLRTokenBuffer was getting very large
with nested predicates.
----
Memory leak is now gone from ANTLRTokenBuffer; all tokens are deleted.
For backward compatibility reasons, you have to say parser->deleteTokens()
or mytokenbuffer->deleteTokens() but later it will be the default mode.
Say this after the parser is constructed.  E.g.,

    ParserBlackBox<DLGLexer, MyParser, ANTLRToken> p(stdin);
    p.parser()->deleteTokens();
    p.parser()->start_symbol();


======================================================================
1.32b6

Changed so that deleteTokens() will do a delete ((ANTLRTokenBase *))
on the ptr.  This invokes the virtual destructor.

Fixed some weird things in the C++ header files (a few return types).

Made the AST routines correspond to the book and SORCERER stuff.

New token stuff:  See testcpp/14/test.g

ANTLR accepts a #pragma gc_tokens which says
[1]     Generate label = copy(LT(1)) instead of label=LT(1) for
        all labeled token references.
[2]     User now has to define ANTLRTokenPtr (as a class or a typedef
        to just a pointer) as well as the ANTLRToken class itself.
        See the example.

To delete tokens in token buffer, use deleteTokens() message on parser.

        All tokens that fall off the ANTLRTokenBuffer get deleted
        which is what currently happens when deleteTokens() message
        has been sent to token buffer.

We always generate ANTLRTokenPtr instead of 'ANTLRToken *' now.
Then if no pragma set, ANTLR generates a

        class ANTLRToken;
        typedef ANTLRToken *ANTLRTokenPtr;

in each file.

Made a warning for x:rule_ref <<$x>>; still no warning for $i's, however.
For example,

class BB {

a : x:b y:A <<$x
$y>>
  ;

b : B;

}

generates

Antlr parser generator   Version 1.32b6   1989-1995
test.g, line 3: error: There are no token ptrs for rule references: '$x'

======================================================================
1.32b7:

[With respect to token object garbage collection (GC), 1.32b7
 backtracks from 1.32b6, but results in better and less intrusive GC.
 This is the last beta version before full 1.32.]

BIGGEST CHANGES:

o   The "#pragma gc_tokens" is no longer used.

o   .C files are now .cpp files (hence, makefiles will have to
    be changed; or you can rerun genmk).  This is a good move,
    but causes some backward incompatibility problems.  You can
    avoid this by changing CPP_FILE_SUFFIX to ".C" in pccts/h/config.h.

o   The token object class hierarchy has been flattened to include
    only three classes: ANTLRAbstractToken, ANTLRCommonToken, and
    ANTLRCommonNoRefCountToken.  The common token now does garbage
    collection via ref counting.

o   "Smart" pointers are now used for garbage collection.  That is,
    ANTLRTokenPtr is used instead of "ANTLRToken *".

o   The antlr.1 man page has been cleaned up slightly.

o   The SUN C++ compiler now complains less about C++ support code.

o   Grammars which subclass ANTLRCommonToken must wrap all token
    pointer references in mytoken(token_ptr).  This is the only
    serious backward incompatibility.  See below.


MINOR CHANGES:

--------------------------------------------------------
1   deleteTokens()

The deleteTokens() message to the parser or token buffer has been
replaced by one of:

    void noGarbageCollectTokens()   { inputTokens->noGarbageCollectTokens(); }
    void garbageCollectTokens()     { inputTokens->garbageCollectTokens(); }

The token buffer deletes all non-referenced tokens by default now.
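
To make the default concrete, here is a minimal sketch of a buffer
that deletes unreferenced tokens as they fall off, with the two
switches named above.  Tok and TokBuffer are hypothetical stand-ins
written for this example; the real ANTLRTokenBuffer is more involved
and its internals are not shown in these notes.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical token carrying only a reference count.
struct Tok {
    int refs = 0;
};

// Hypothetical stand-in for a token buffer holding at most `cap` tokens.
class TokBuffer {
    std::deque<Tok *> buf;
    std::size_t cap;
    bool gc = true;        // collection is on by default
    int deleted = 0;       // how many tokens this buffer has deleted
public:
    explicit TokBuffer(std::size_t n) : cap(n) {}
    void noGarbageCollectTokens() { gc = false; }
    void garbageCollectTokens()   { gc = true; }
    int deletedCount() const      { return deleted; }

    // Append a token.  When the buffer is full the oldest token falls
    // off; it is deleted only if collection is on and nothing refers to it.
    void push(Tok *t)
    {
        if (!buf.empty() && buf.size() >= cap) {
            Tok *old = buf.front();
            buf.pop_front();
            if (gc && old->refs == 0) {
                delete old;
                ++deleted;
            }
        }
        buf.push_back(t);
    }
};
```

In this sketch a token with a nonzero count survives falling off the
buffer, which mirrors the "non-referenced tokens" wording above.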

--------------------------------------------------------
2   makeToken()

The makeToken() message returns a new type.  The function should look
like:

    virtual ANTLRAbstractToken *makeToken(ANTLRTokenType tt,
                                          ANTLRChar *txt,
                                          int line)
    {
        ANTLRAbstractToken *t = new ANTLRCommonToken(tt,txt);
        t->setLine(line);
        return t;
    }

--------------------------------------------------------
3   TokenType

Changed TokenType to ANTLRTokenType (this often forces changes in AST
definitions, because the #[] constructor becomes a call to
AST(tokentype, string)).

--------------------------------------------------------
4   AST()

You must define AST(ANTLRTokenPtr t) now in your AST class definition.
You might also have to include ATokPtr.h above the definition; e.g.,
if AST is defined in a separate file, such as AST.h, it's a good idea
to include ATOKPTR_H (ATokPtr.h).  For example,

    #include ATOKPTR_H
    class AST : public ASTBase {
    protected:
        ANTLRTokenPtr token;
    public:
        AST(ANTLRTokenPtr t) { token = t; }
        void preorder_action() {
            char *s = token->getText();
            printf(" %s", s);
        }
    };

Note the use of smart pointers rather than "ANTLRToken *".

--------------------------------------------------------
5   SUN C++

From Bob Bailey <robertb oakhill.sps.mot.com>.  Changed ANTLR C++ output
to avoid an error in Sun C++ 3.0.1.  Made the structs created to hold
multiple return values "public".

--------------------------------------------------------
6   genmk

Fixed genmk so that target List.* is not included anymore.  It's
called SList.* anyway.

--------------------------------------------------------
7   \r vs \n

Scott Vorthmann <vorth cmu.edu> fixed antlr.g in ANTLR so that \r
is allowed as the return character as well as \n.

--------------------------------------------------------
8   Exceptions

Fixed a bug in exceptions attached to labeled token/tokclass references:
ANTLR didn't generate code for the exceptions.  This didn't work:

a : "help" x:ID
  ;
        exception[x]
        catch MismatchedToken : <<printf("eh?\n");>>

Now ANTLR generates (which is kinda big, but necessary):

        if ( !_match_wsig(ID) ) {
                if ( guessing ) goto fail;
                _signal=MismatchedToken;
                switch ( _signal ) {
                case MismatchedToken :
                        printf("eh?\n");
                        _signal = NoSignal;
                        break;
                default :
                        goto _handler;
                }
        }

which implies that you can recover and continue parsing after a missing/bad
token reference.

--------------------------------------------------------
9   genmk

genmk now correctly uses config file for CPP_FILE_SUFFIX stuff.

--------------------------------------------------------
10  general cleanup / PURIFY

Anthony Green <green vizbiz.com> suggested a bunch of good general
clean up things for the code; he also suggested a few things to
help out the "PURIFY" memory allocation checker.

--------------------------------------------------------
11  $-variable references.

Manuel ORNATO indicated that a $-variable outside of a rule caused
ANTLR to crash.  I fixed this.

--------------------------------------------------------
12  Tom Moog suggestion

Fail action of semantic predicate needs "{}" envelope.  FIXED.

--------------------------------------------------------
13  references to LT(1).

I have enclosed all assignments such as:

             _t22 = (ANTLRTokenPtr)LT(1);

in "if ( !guessing )" so that during backtracking the reference count
for token objects is not increased.
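
A small sketch of why the guard matters, assuming a bare-bones
counting pointer.  Tok, TokPtr and matchLabel() are hypothetical names
for this example only (deletion at zero is left out to keep it short).
Assigning the lookahead token to a label bumps its count, so skipping
the assignment while guessing leaves the count alone.

```cpp
// Hypothetical token carrying only a reference count.
struct Tok {
    int refs = 0;
};

// Hypothetical counting pointer; it adjusts refs but never deletes.
class TokPtr {
    Tok *p;
public:
    TokPtr(Tok *t = nullptr) : p(t) { if (p) ++p->refs; }
    TokPtr(const TokPtr &o) : p(o.p) { if (p) ++p->refs; }
    TokPtr &operator=(const TokPtr &o)
    {
        if (o.p) ++o.p->refs;   // take the new reference first
        if (p) --p->refs;       // then drop the old one
        p = o.p;
        return *this;
    }
    ~TokPtr() { if (p) --p->refs; }
};

// Same shape as: if ( !guessing ) _t22 = (ANTLRTokenPtr)LT(1);
// Returns the reference count observed while the label is alive.
inline int matchLabel(Tok &lt1, bool guessing)
{
    TokPtr label;
    if (!guessing) label = TokPtr(&lt1);
    return lt1.refs;
}
```

While guessing, the function reports a count of zero; otherwise the
label holds one reference until it goes out of scope.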


TOKEN OBJECT GARBAGE COLLECTION

1   INTRODUCTION

The class ANTLRCommonToken is now garbage collected through a "smart"
pointer called ANTLRTokenPtr using reference counting.  Any token
object not referenced by your grammar actions is destroyed by the
ANTLRTokenBuffer when it must make room for more token objects.
Referenced tokens are then destroyed in your parser when local
ANTLRTokenPtr objects are deleted.  For example,

a : label:ID ;

would be converted to something like:

void yourclass::a(void)
{
        zzRULE;
        ANTLRTokenPtr label=NULL;       // used to be ANTLRToken *label;
        zzmatch(ID);
        label = (ANTLRTokenPtr)LT(1);
        consume();
        ...
}

When the "label" object is destroyed (it's just a pointer to your
input token object LT(1)), it decrements the reference count on the
object created for the ID.  If the count goes to zero, the object
pointed to by label is deleted.
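
The counting scheme just described can be sketched in a few lines of
C++.  This is an illustration under assumptions, not the PCCTS source:
Tok and TokPtr are hypothetical, and the "destroyed" flag exists only
so the effect can be observed from outside.

```cpp
// Hypothetical token: counts references and flags its own destruction.
struct Tok {
    int refs = 0;
    bool *destroyed;
    explicit Tok(bool *flag) : destroyed(flag) {}
    ~Tok() { *destroyed = true; }
};

// Hypothetical smart pointer: the last pointer to go away deletes the token.
class TokPtr {
    Tok *p;
    void release() { if (p && --p->refs == 0) delete p; }
public:
    TokPtr(Tok *t = nullptr) : p(t) { if (p) ++p->refs; }
    TokPtr(const TokPtr &o) : p(o.p) { if (p) ++p->refs; }
    TokPtr &operator=(const TokPtr &o)
    {
        if (o.p) ++o.p->refs;   // take the new reference first (safe for self-assignment)
        release();              // then drop the old one
        p = o.p;
        return *this;
    }
    ~TokPtr() { release(); }
    Tok *operator->() const { return p; }
};
```

Copying a TokPtr raises the count, destroying one lowers it, and the
token is deleted when the count reaches zero.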

To correctly manage the garbage collection, you should use
ANTLRTokenPtr instead of "ANTLRToken *".  Most ANTLR support code
(visible to the user) has been modified to use the smart pointers.

***************************************************************
Remember that any local objects that you create are not deleted when a
longjmp() is executed.  Unfortunately, the syntactic predicates (...)?
use setjmp()/longjmp().  There are some situations when a few tokens
will "leak".
***************************************************************

2   DETAILS

o   The default is to perform token object garbage collection.
    You may use parser->noGarbageCollectTokens() to turn off
    garbage collection.

o   The type ANTLRTokenPtr is always defined now (automatically).
    If you do not wish to use smart pointers, you will have to
    redefine ANTLRTokenPtr by subclassing, changing the header
    file or changing ANTLR's code generation (easy enough to
    do in gen.c).

o   If you don't use ParserBlackBox, the new initialization sequence is:

        ANTLRTokenPtr aToken = new ANTLRToken;
        scan.setToken(mytoken(aToken));

    where mytoken(aToken) gets an ANTLRToken * from the smart pointer.

o   Define C++ preprocessor symbol DBG_REFCOUNTTOKEN to see a bunch of
    debugging stuff for reference counting if you suspect something.


3   WHY DO I HAVE TO TYPECAST ALL MY TOKEN POINTERS NOW??????

If you subclass ANTLRCommonToken and then attempt to refer to one of
your token members via a token pointer in your grammar actions, the
C++ compiler will complain that your token object does not have that
member.  For example, if you used to do this:

<<
class ANTLRToken : public ANTLRCommonToken {
        int muck;
        ...
};
>>

class Foo {
a : t:ID << t->muck = ...; >> ;
}

Now, you must change the t->muck reference to:

a : t:ID << mytoken(t)->muck = ...; >> ;

in order to downcast 't' to be an "ANTLRToken *" not the
"ANTLRAbstractToken *" resulting from ANTLRTokenPtr::operator->().
The macro is defined as:

/*
 * Since you cannot redefine operator->() to return one of the user's
 * token object types, we must down cast.  This is a drag.  Here's
 * a macro that helps.  template: "mytoken(a-smart-ptr)->myfield".
 */
#define mytoken(tp) ((ANTLRToken *)(tp.operator->()))

You have to use macro mytoken(grammar-label) now because smart
pointers are not specific to a parser's token objects.  In other
words, the ANTLRTokenPtr class has a pointer to a generic
ANTLRAbstractToken not your ANTLRToken; the ANTLR support code must
use smart pointers too, but be able to work with any kind of
ANTLRToken.  Sorry about this, but it's C++'s fault not mine.  Some
nebulous future version of the C++ compilers should obviate the need
to downcast smart pointers with runtime type checking (and by allowing
different return types for overridden functions).
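
A self-contained sketch of the downcast problem, using hypothetical
stand-ins (AbstractToken, MyToken, TokPtr) in place of
ANTLRAbstractToken, your ANTLRToken subclass and ANTLRTokenPtr.  The
macro has the same shape as the one quoted above, except that the
argument is parenthesized; that change is mine.

```cpp
// Hypothetical stand-in for the abstract token base class.
struct AbstractToken {
    virtual ~AbstractToken() {}
};

// Hypothetical user token with an extra member.
struct MyToken : AbstractToken {
    int muck = 0;
};

// Hypothetical smart pointer that only knows the abstract base type.
class TokPtr {
    AbstractToken *p;
public:
    explicit TokPtr(AbstractToken *t) : p(t) {}
    AbstractToken *operator->() const { return p; }
};

// Downcast helper: "mytoken(a-smart-ptr)->myfield".
#define mytoken(tp) ((MyToken *)((tp).operator->()))
```

With these definitions "t->muck" does not compile, because
operator->() yields an AbstractToken *, while "mytoken(t)->muck" does.
Where RTTI is available, dynamic_cast<MyToken *> would add a runtime
check to the same downcast.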

A way to have backward compatible code is to shut off the token object
garbage collection; i.e., use parser->noGarbageCollectTokens() and
change the definition of ANTLRTokenPtr (that's why you get source code
<wink>).


PARSER EXCEPTION HANDLING

I've noticed some weird stuff with the exception handling.  I intend
to give this top priority for the "book release" of ANTLR.

======================================================================
1.32 Full Release

o   Changed Token class hierarchy to be (Thanks to Tom Moog):

        ANTLRAbstractToken
          ANTLRRefCountToken
             ANTLRCommonToken
          ANTLRNoRefCountCommonToken

o   Added virtual panic() to ANTLRAbstractToken.  Made ANTLRParser::panic()
    virtual also.

o   Cleaned up the dup() stuff in AST hierarchy to use shallowCopy() to
    make node copies.  John Farr at Medtronic suggested this.  I.e.,
    if you want to use dup() with either ANTLR or SORCERER or -transform
    mode with SORCERER, you must define shallowCopy() as:

    virtual PCCTS_AST *shallowCopy()
    {
        AST *p = new AST;
        p->setDown(NULL);
        p->setRight(NULL);
        return p;
    }

    or

    virtual PCCTS_AST *shallowCopy()
    {
        return new AST(*this);
    }

    if you have defined a copy constructor such as

    AST(const AST &t)       // shallow copy constructor
    {
        token = t.token;
        iconst = t.iconst;
        setDown(NULL);
        setRight(NULL);
    }

o   Added a warning when -CC and -gk are used together.  This combination
    is broken, hence a warning is appropriate.

o   Added warning when #-stuff is used w/o -gt option.

o   Updated MPW installation.

o   "Miller, Philip W." <MILLERPW f1groups.fsd.jhuapl.edu> suggested
    that genmk use RENAME_OBJ_FLAG and RENAME_EXE_FLAG instead of
    hardcoding "-o" in genmk.c.

o   Made all exit() calls use EXIT_SUCCESS or EXIT_FAILURE.
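
To illustrate the shallowCopy()/dup() item in the list above, here is
a minimal sketch of a deep copy built from shallow node copies.  Node
is a hypothetical stand-in for a PCCTS_AST subclass, and the way dup()
walks children and siblings here is an assumption made for the
example, not a transcription of the PCCTS implementation.

```cpp
// Hypothetical tree node standing in for a PCCTS_AST subclass.
struct Node {
    int value;
    Node *down = nullptr;    // first child
    Node *right = nullptr;   // next sibling

    explicit Node(int v) : value(v) {}
    // Shallow copy constructor: copy the payload, drop the links.
    Node(const Node &n) : value(n.value), down(nullptr), right(nullptr) {}
    virtual ~Node() {}

    // One node, no children or siblings attached.
    virtual Node *shallowCopy() const { return new Node(*this); }

    // Deep copy assembled from shallow copies of every node reached
    // through down and right links.
    Node *dup() const
    {
        Node *n = shallowCopy();
        if (down)  n->down  = down->dup();
        if (right) n->right = right->dup();
        return n;
    }
};
```

Because dup() only ever calls shallowCopy() to create nodes, a
subclass that overrides shallowCopy() gets copies of its own type
without touching the traversal.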

======================================================================
1.33

EXIT_FAILURE and EXIT_SUCCESS were not always defined.  I had to modify
a bunch of files to use PCCTS_EXIT_XXX, which forces a new version.  Sorry
about that.
    523