CHANGES FROM 1.31

This file contains the migration of PCCTS from 1.31 in the order that
changes were made.  1.32b7 is the last beta before full 1.32.
Terence Parr, Parr Research Corporation 1995.


======================================================================
1.32b1
Added Russell Quong to banner, changed banner for output slightly
Fixed it so that you have before / after actions for C++ in class def
Fixed bug in optimizer that made it sometimes forget to set internal
    token pointers.  Only showed up when a {...} was in the "wrong spot".

======================================================================
1.32b2
Added fixes by Dave Seidel for PC compilers in 32 bit mode (config.h
and set.h).

======================================================================
1.32b3
Fixed hideous bug in code generator for wildcard and for ~token op.

from Dave Seidel

    Added pcnames.bat
    1. in antlr/main.c: change strcasecmp() to stricmp()

    2. in dlg/output.c: use DLEXER_C instead of "DLexer.C"

    3. in h/PBlackBox.h: use <iostream.h> instead of <stream.h>

======================================================================
1.32b4
When the -ft option was used, any path prefix screwed up
the gate on the .h files

Fixed yet another bug due to the optimizer.

The exception handling thing was a bit wacko:

    a : ( A B )? A B
      | A C
      ;
      exception ...

caused an exception if "A C" was the input.  In other words,
it found that A C didn't match the (A B)? pred and caused
an exception rather than trying the next alt.  All I did
was to change the zzmatch_wsig() macros.

Fixed some problems in gen.c relating to the name of token
class bit sets in the output.

Added the tremendously cool generalized predicate.  For the
moment, I'll give this brief description.

    a : <<predicate>>?
        blah
      | foo
      ;

This implies that (assuming blah and foo are syntactically
ambiguous) "predicate" indicates the semantic validity of
applying "blah".  If "predicate" is false, "foo" is attempted.

Previously, you had to say:

    a : <<LA(1)==ID ? predicate : 1>>? ID
      | ID
      ;

Now, you can simply use "predicate" without the ?: operator
if you turn on ANTLR command line option: "-prc on".  This
tells ANTLR to compute that all by itself.  It computes n
tokens of lookahead where LT(n) or LATEXT(n) is the farthest
ahead you look.

If you give a predicate using "-prc on" that is followed
by a construct that can recognize more than one n-sequence,
you will get a warning from ANTLR.  For example,

    a : <<isTypeName(LT(1)->getText())>>? (ID|INT)
      ;

This is wrong because the predicate will be applied to INTs
as well as IDs.  You should use this syntax to make
the predicate more specific:

    a : (ID)? => <<isTypeName(LT(1)->getText())>>? (ID|INT)
      ;

which says "don't apply the predicate unless ID is the
current lookahead context".

You cannot currently have anything in the "(context)? =>"
except sequences such as:

    ( LPAREN ID | LPAREN SCOPE )? => <<pred>>?

I haven't tested this THAT much, but it does work for the
C++ grammar.

======================================================================
1.32b5

Added getLine() to the ANTLRTokenBase and DLGBasedToken classes;
left line() for backward compatibility.
----
Removed SORCERER_TRANSFORM from the ast.h stuff.
-------
Fixed bug in code gen of ANTLR such that nested syn preds work more
efficiently now.  The ANTLRTokenBuffer was getting very large
with nested predicates.
------
Memory leak is now gone from ANTLRTokenBuffer; all tokens are deleted.
For backward compatibility reasons, you have to say parser->deleteTokens()
or mytokenbuffer->deleteTokens() but later it will be the default mode.
Say this after the parser is constructed.  E.g.,

    ParserBlackBox<DLGLexer, MyParser, ANTLRToken> p(stdin);
    p.parser()->deleteTokens();
    p.parser()->start_symbol();


==============================
1.32b6

Changed so that deleteTokens() will do a delete ((ANTLRTokenBase *))
on the ptr.  This gets the virtual destructor.

Fixed some weird things in the C++ header files (a few return types).

Made the AST routines correspond to the book and SORCERER stuff.

New token stuff: See testcpp/14/test.g

ANTLR accepts a #pragma gc_tokens which says
[1] Generate label = copy(LT(1)) instead of label=LT(1) for
    all labeled token references.
[2] User now has to define ANTLRTokenPtr (as a class or a typedef
    to just a pointer) as well as the ANTLRToken class itself.
    See the example.

To delete tokens in token buffer, use deleteTokens() message on parser.

    All tokens that fall off the ANTLRTokenBuffer get deleted
    which is what currently happens when deleteTokens() message
    has been sent to token buffer.

We always generate ANTLRTokenPtr instead of 'ANTLRToken *' now.
Then if no pragma is set, ANTLR generates a

    class ANTLRToken;
    typedef ANTLRToken *ANTLRTokenPtr;

in each file.

Made a warning for x:rule_ref <<$x>>; still no warning for $i's, however.

    class BB {

    a : x:b y:A <<$x
    $y>>
      ;

    b : B;

    }

generates

    Antlr parser generator   Version 1.32b6   1989-1995
    test.g, line 3: error: There are no token ptrs for rule references: '$x'

===================
1.32b7:

[With respect to token object garbage collection (GC), 1.32b7
 backtracks from 1.32b6, but results in better and less intrusive GC.
 This is the last beta version before full 1.32.]

BIGGEST CHANGES:

o   The "#pragma gc_tokens" is no longer used.

o   .C files are now .cpp files (hence, makefiles will have to
    be changed; or you can rerun genmk).  This is a good move,
    but causes some backward incompatibility problems.  You can
    avoid this by changing CPP_FILE_SUFFIX to ".C" in pccts/h/config.h.

o   The token object class hierarchy has been flattened to include
    only three classes: ANTLRAbstractToken, ANTLRCommonToken, and
    ANTLRCommonNoRefCountToken.  The common token now does garbage
    collection via ref counting.

o   "Smart" pointers are now used for garbage collection.  That is,
    ANTLRTokenPtr is used instead of "ANTLRToken *".

o   The antlr.1 man page has been cleaned up slightly.

o   The SUN C++ compiler now complains less about C++ support code.

o   Grammars which subclass ANTLRCommonToken must wrap all token
    pointer references in mytoken(token_ptr).  This is the only
    serious backward incompatibility.  See below.


MINOR CHANGES:

--------------------------------------------------------
1   deleteTokens()

The deleteTokens() message to the parser or token buffer has been changed
to one of:

    void noGarbageCollectTokens() { inputTokens->noGarbageCollectTokens(); }
    void garbageCollectTokens()   { inputTokens->garbageCollectTokens(); }

The token buffer deletes all non-referenced tokens by default now.

--------------------------------------------------------
2   makeToken()

The makeToken() message returns a new type.
The function should look like:

    virtual ANTLRAbstractToken *makeToken(ANTLRTokenType tt,
                                          ANTLRChar *txt,
                                          int line)
    {
        ANTLRAbstractToken *t = new ANTLRCommonToken(tt,txt);
        t->setLine(line);
        return t;
    }

--------------------------------------------------------
3   TokenType

Changed TokenType to ANTLRTokenType (this often forces changes in AST defs
because the #[] constructor calls AST(tokentype, string)).

--------------------------------------------------------
4   AST()

You must define AST(ANTLRTokenPtr t) now in your AST class definition.
You might also have to include ATokPtr.h above the definition; e.g.,
if AST is defined in a separate file, such as AST.h, it's a good idea
to include ATOKPTR_H (ATokPtr.h).  For example,

    #include ATOKPTR_H
    class AST : public ASTBase {
    protected:
        ANTLRTokenPtr token;
    public:
        AST(ANTLRTokenPtr t) { token = t; }
        void preorder_action() {
            char *s = token->getText();
            printf(" %s", s);
        }
    };

Note the use of smart pointers rather than "ANTLRToken *".

--------------------------------------------------------
5   SUN C++

From robertb oakhill.sps.mot.com Bob Bailey.  Changed ANTLR C++ output
to avoid an error in Sun C++ 3.0.1.  Made the structs created to hold
multiple return values public.

--------------------------------------------------------
6   genmk

Fixed genmk so that target List.* is not included anymore.  It's
called SList.* anyway.

--------------------------------------------------------
7   \r vs \n

Scott Vorthmann <vorth cmu.edu> fixed antlr.g in ANTLR so that \r
is allowed as the return character as well as \n.

--------------------------------------------------------
8   Exceptions

Bug in exceptions attached to labeled token/tokclass references.  Didn't gen
code for exceptions.
This didn't work:

    a : "help" x:ID
      ;
      exception[x]
      catch MismatchedToken : <<printf("eh?\n");>>

Now ANTLR generates (which is kinda big, but necessary):

    if ( !_match_wsig(ID) ) {
        if ( guessing ) goto fail;
        _signal=MismatchedToken;
        switch ( _signal ) {
        case MismatchedToken :
            printf("eh?\n");
            _signal = NoSignal;
            break;
        default :
            goto _handler;
        }
    }

which implies that you can recover and continue parsing after a missing/bad
token reference.

--------------------------------------------------------
9   genmk

genmk now correctly uses config file for CPP_FILE_SUFFIX stuff.

--------------------------------------------------------
10  general cleanup / PURIFY

Anthony Green <green vizbiz.com> suggested a bunch of good general
clean up things for the code; he also suggested a few things to
help out the "PURIFY" memory allocation checker.

--------------------------------------------------------
11  $-variable references.

Manuel ORNATO indicated that a $-variable outside of a rule caused
ANTLR to crash.  I fixed this.

--------------------------------------------------------
12  Tom Moog suggestion

Fail action of semantic predicate needs "{}" envelope.  FIXED.

--------------------------------------------------------
13  references to LT(1).

I have enclosed all assignments such as:

    _t22 = (ANTLRTokenPtr)LT(1);

in "if ( !guessing )" so that during backtracking the reference count
for token objects is not increased.


TOKEN OBJECT GARBAGE COLLECTION

1   INTRODUCTION

The class ANTLRCommonToken is now garbage collected through a "smart"
pointer called ANTLRTokenPtr using reference counting.  Any token
object not referenced by your grammar actions is destroyed by the
ANTLRTokenBuffer when it must make room for more token objects.
Referenced tokens are then destroyed in your parser when local
ANTLRTokenPtr objects are deleted.
For example,

    a : label:ID ;

would be converted to something like:

    void yourclass::a(void)
    {
        zzRULE;
        ANTLRTokenPtr label=NULL;   // used to be ANTLRToken *label;
        zzmatch(ID);
        label = (ANTLRTokenPtr)LT(1);
        consume();
        ...
    }

When the "label" object is destroyed (it's just a pointer to your
input token object LT(1)), it decrements the reference count on the
object created for the ID.  If the count goes to zero, the object
pointed to by label is deleted.

To correctly manage the garbage collection, you should use
ANTLRTokenPtr instead of "ANTLRToken *".  Most ANTLR support code
(visible to the user) has been modified to use the smart pointers.

***************************************************************
Remember that any local objects that you create are not deleted when a
longjmp() is executed.  Unfortunately, the syntactic predicates (...)?
use setjmp()/longjmp().  There are some situations when a few tokens
will "leak".
***************************************************************

2   DETAILS

o   The default is to perform token object garbage collection.
    You may use parser->noGarbageCollectTokens() to turn off
    garbage collection.


o   The type ANTLRTokenPtr is always defined now (automatically).
    If you do not wish to use smart pointers, you will have to
    redefine ANTLRTokenPtr by subclassing, changing the header
    file or changing ANTLR's code generation (easy enough to
    do in gen.c).

o   If you don't use ParserBlackBox, the new initialization sequence is:

        ANTLRTokenPtr aToken = new ANTLRToken;
        scan.setToken(mytoken(aToken));

    where mytoken(aToken) gets an ANTLRToken * from the smart pointer.

o   Define C++ preprocessor symbol DBG_REFCOUNTTOKEN to see a bunch of
    debugging stuff for reference counting if you suspect something.
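
To make the reference counting described above concrete, here is a
minimal sketch of the idea.  It is NOT the PCCTS source: Token, TokenPtr
and liveWhileLabelled() are hypothetical stand-ins for a ref-counted
token, ANTLRTokenPtr and a rule with a labeled token.  It only shows why
a labeled token survives after the buffer lets go of it, and why it is
deleted once the last smart pointer goes away.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a reference-counted token (not a PCCTS class).
class Token {
public:
    static int live;                  // how many Token objects exist right now
    Token() : refs_(0) { ++live; }
    ~Token() { --live; }
    void ref() { ++refs_; }
    // Drop one reference; returns true if that was the last one.
    bool deref() { assert(refs_ > 0); return --refs_ == 0; }
private:
    int refs_;
};
int Token::live = 0;

// Hypothetical stand-in for ANTLRTokenPtr: a counting "smart" pointer.
class TokenPtr {
public:
    TokenPtr(Token *t = NULL) : p_(t) { acquire(); }
    TokenPtr(const TokenPtr &o) : p_(o.p_) { acquire(); }
    ~TokenPtr() { release(); }
    TokenPtr &operator=(const TokenPtr &o) {
        if (p_ != o.p_) { release(); p_ = o.p_; acquire(); }
        return *this;
    }
    Token *operator->() const { return p_; }
private:
    void acquire() { if (p_) p_->ref(); }
    // Give up our reference; delete the token if nobody else holds it.
    void release() { if (p_ && p_->deref()) delete p_; p_ = NULL; }
    Token *p_;
};

// Returns the number of live tokens while a label still refers to one.
int liveWhileLabelled() {
    TokenPtr buffer(new Token);       // the token buffer's reference
    TokenPtr label = buffer;          // what "label:ID" would hold
    buffer = TokenPtr();              // buffer makes room, drops its reference
    return Token::live;               // the label keeps the token alive
}
```

Calling liveWhileLabelled() reports one live token inside the function;
once it returns, the local "label" has been destroyed and Token::live is
back to zero.  Note that a longjmp() past such locals skips their
destructors, which is exactly the leak warned about above.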


3   WHY DO I HAVE TO TYPECAST ALL MY TOKEN POINTERS NOW??????

If you subclass ANTLRCommonToken and then attempt to refer to one of
your token members via a token pointer in your grammar actions, the
C++ compiler will complain that your token object does not have that
member.  For example, if you used to do this

    <<
    class ANTLRToken : public ANTLRCommonToken {
        int muck;
        ...
    };
    >>

    class Foo {
    a : t:ID << t->muck = ...; >> ;
    }

Now, you must change the t->muck reference to:

    a : t:ID << mytoken(t)->muck = ...; >> ;

in order to downcast 't' to be an "ANTLRToken *" not the
"ANTLRAbstractToken *" resulting from ANTLRTokenPtr::operator->().
The macro is defined as:

    /*
     * Since you cannot redefine operator->() to return one of the user's
     * token object types, we must down cast.  This is a drag.  Here's
     * a macro that helps.  template: "mytoken(a-smart-ptr)->myfield".
     */
    #define mytoken(tp) ((ANTLRToken *)(tp.operator->()))

You have to use macro mytoken(grammar-label) now because smart
pointers are not specific to a parser's token objects.  In other
words, the ANTLRTokenPtr class has a pointer to a generic
ANTLRAbstractToken not your ANTLRToken; the ANTLR support code must
use smart pointers too, but be able to work with any kind of
ANTLRToken.  Sorry about this, but it's C++'s fault not mine.  Some
nebulous future version of the C++ compilers should obviate the need
to downcast smart pointers with runtime type checking (and by allowing
different return types for overridden functions).

A way to have backward compatible code is to shut off the token object
garbage collection; i.e., use parser->noGarbageCollectTokens() and
change the definition of ANTLRTokenPtr (that's why you get source code
<wink>).
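
The downcast problem can be reproduced outside of PCCTS in a few lines.
The sketch below uses hypothetical stand-ins (AbstractToken, MyToken,
TokenPtr, my_token, checkedToken, setMuck), not the real PCCTS
declarations; only the shape of the macro follows the definition quoted
above.  The checked variant assumes a compiler with RTTI, which is the
"runtime type checking" alluded to above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins, not the PCCTS declarations.
class AbstractToken {
public:
    virtual ~AbstractToken() {}
};

class MyToken : public AbstractToken {    // user's token with an extra member
public:
    int muck;
    MyToken() : muck(0) {}
};

class TokenPtr {                           // only knows the abstract base class
public:
    explicit TokenPtr(AbstractToken *t) : p_(t) {}
    AbstractToken *operator->() const { return p_; }
private:
    AbstractToken *p_;
};

// Same shape as the mytoken() macro, using the hypothetical token type.
#define my_token(tp) ((MyToken *)((tp).operator->()))

// Checked alternative: asserts that the token really is a MyToken.
inline MyToken *checkedToken(const TokenPtr &tp) {
    MyToken *m = dynamic_cast<MyToken *>(tp.operator->());
    assert(m != NULL);
    return m;
}

// Stores a value in the user-defined member through the smart pointer.
int setMuck(int value) {
    MyToken tok;
    TokenPtr t(&tok);
    // t->muck = value;         // would not compile: AbstractToken has no 'muck'
    my_token(t)->muck = value;  // downcast first, then reach the member
    return checkedToken(t)->muck;
}
```

The commented-out line is the error you get after upgrading; the macro
(or the checked function) is the workaround.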


PARSER EXCEPTION HANDLING

I've noticed some weird stuff with the exception handling.  I intend
to give this top priority for the "book release" of ANTLR.

==========
1.32 Full Release

o   Changed Token class hierarchy to be (Thanks to Tom Moog):

        ANTLRAbstractToken
          ANTLRRefCountToken
            ANTLRCommonToken
          ANTLRNoRefCountCommonToken

o   Added virtual panic() to ANTLRAbstractToken.  Made ANTLRParser::panic()
    virtual also.

o   Cleaned up the dup() stuff in AST hierarchy to use shallowCopy() to
    make node copies.  John Farr at Medtronic suggested this.  I.e.,
    if you want to use dup() with either ANTLR or SORCERER or -transform
    mode with SORCERER, you must define shallowCopy() as:

        virtual PCCTS_AST *shallowCopy()
        {
            AST *p = new AST;
            p->setDown(NULL);
            p->setRight(NULL);
            return p;
        }

    or

        virtual PCCTS_AST *shallowCopy()
        {
            return new AST(*this);
        }

    if you have defined a copy constructor such as

        AST(const AST &t)   // shallow copy constructor
        {
            token = t.token;
            iconst = t.iconst;
            setDown(NULL);
            setRight(NULL);
        }

o   Added a warning when -CC and -gk are used together.  This is broken,
    hence a warning is appropriate.

o   Added warning when #-stuff is used w/o -gt option.

o   Updated MPW installation.

o   "Miller, Philip W." <MILLERPW f1groups.fsd.jhuapl.edu> suggested
    that genmk use RENAME_OBJ_FLAG and RENAME_EXE_FLAG instead of
    hardcoding "-o" in genmk.c.

o   Made all exit() calls use EXIT_SUCCESS or EXIT_FAILURE.

===========================================================================
1.33

EXIT_FAILURE and EXIT_SUCCESS were not always defined.  I had to modify
a bunch of files to use PCCTS_EXIT_XXX, which forces a new version.  Sorry
about that.