The following changes (change numbers refer to perforce) were
made from version 3.1.1 to 3.1.2

Runtime
-------

Change 5641 on 2009/02/20 by jimi@jimi.jimi.antlr3

Release version 3.1.2 of the ANTLR C runtime.

Updated documents and release notes will have to follow later.

Change 5639 on 2009/02/20 by jimi@jimi.jimi.antlr3

Fixed: ANTLR-356

Ensure that code generation for C++ does not require casts

Change 5577 on 2009/02/12 by jimi@jimi.jimi.antlr3

C Runtime - Bug fixes.

o Having moved to use an extract directly from a vector for returning
  tokens, it exposed a bug whereby the EOF boundary calculation in
  tokLT was incorrectly checking > rather than >=.
o Changing to API initialization of tokens rather than memcmp()
  incorrectly forgot to set the input stream pointer for the
  manufactured tokens in the token factory;
o Rewrite streams for rewriting tree parsers did not check whether the
  rewrite stream was ever assigned before trying to free it; it is now
  in line with the ordinary parser code.

Change 5576 on 2009/02/11 by jimi@jimi.jimi.antlr3

C Runtime: Ensure that when we manufacture a new token for a missing
token, the user-supplied custom information (if any) is copied
from the current token.

Change 5575 on 2009/02/08 by jimi@jimi.jimi.antlr3

C Runtime - Vastly improve the reuse of allocated memory for nodes in
tree rewriting.

A problem for all targets at the moment is that the rewrite logic
generated by ANTLR makes no attempt to reuse any resources; it merely
guarantees that the tree shape at the end is correct. To some extent
this is mitigated by the garbage collection systems of Java and .Net,
even though it is still an overhead to keep creating so many nodes.

This change implements the first of two C runtime changes that make
best efforts to track when a node has become orphaned and will never
be reused, based on inherent knowledge of the rewrite logic (which in
the long term is not a great solution).

Much of the rewrite logic consists of creating a nilNode into which
child nodes are appended. At rulePost processing time, when a rewrite
stream is closed, and when becomeRoot is called, there are many
situations where the nilNode at the root of the tree that will be
manipulated, or is finished with (in the case of rewrite streams),
was just a temporary creation for the sake of the rewrite itself.

In these cases we can see that the nilNode would just be left to rot in
the node factory that tracks all the tree nodes. Rather than leave
these in the factory to rot, we now keep a reuse stack and always reuse
any node on this stack before claiming a new node from the factory pool.

This single change alone reduces memory usage in the test case (a 20,604
line C program and a GNU C parser) from nearly a GB to 276MB. This is
still way more memory than we should need to do this operation, even on
such a large input file, but the reduction results in a huge performance
increase and greatly reduced system time spent on allocations.
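
The reuse stack described above can be sketched roughly as follows. This
is only an illustration of the idea, not the runtime's actual code: the
type and function names (node_t, node_factory_t, factory_new_node,
factory_release_node) and the chunk size are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the ANTLR3 C runtime API. */
#define POOL_CHUNK 64              /* nodes per malloc'ed chunk (assumed value) */

typedef struct node {
    struct node *next_free;        /* links orphaned nodes on the reuse stack */
    int          payload;
} node_t;

typedef struct chunk {
    struct chunk *next;
    node_t        nodes[POOL_CHUNK];
} chunk_t;

typedef struct node_factory {
    chunk_t *chunks;               /* every chunk, released together at the end */
    size_t   used;                 /* nodes handed out from the newest chunk */
    node_t  *reuse;                /* stack of nodes known to be orphaned */
    size_t   claimed;              /* count of fresh nodes taken from the pool */
} node_factory_t;

void factory_init(node_factory_t *f)
{
    f->chunks  = NULL;
    f->used    = POOL_CHUNK;       /* forces a chunk allocation on first use */
    f->reuse   = NULL;
    f->claimed = 0;
}

/* Pop the reuse stack first; claim from the pool only when it is empty. */
node_t *factory_new_node(node_factory_t *f)
{
    node_t *n;

    if (f->reuse != NULL) {
        n = f->reuse;
        f->reuse = n->next_free;
    } else {
        if (f->used == POOL_CHUNK) {
            chunk_t *c = malloc(sizeof *c);
            if (c == NULL) {
                return NULL;
            }
            c->next   = f->chunks;
            f->chunks = c;
            f->used   = 0;
        }
        n = &f->chunks->nodes[f->used++];
        f->claimed++;
    }
    n->next_free = NULL;
    n->payload   = 0;              /* recycled nodes carry stale data */
    return n;
}

/* Called when a temporary node (such as a nil root) is finished with. */
void factory_release_node(node_factory_t *f, node_t *n)
{
    assert(n != NULL);
    n->next_free = f->reuse;
    f->reuse     = n;
}

/* Free every chunk; all nodes become invalid at once. */
void factory_free(node_factory_t *f)
{
    while (f->chunks != NULL) {
        chunk_t *c = f->chunks;
        f->chunks  = c->next;
        free(c);
    }
    f->reuse = NULL;
    f->used  = POOL_CHUNK;
}
```

In this sketch a node released with factory_release_node is the next one
returned by factory_new_node, so short-lived temporaries stop consuming
fresh pool entries.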

After this optimization, comparison with gcc yields:

    time gcc -S a.c
    a.c:1026: warning: conflicting types for built-in function vsprintf
    a.c:1030: warning: conflicting types for built-in function vsnprintf
    a.c:1041: warning: conflicting types for built-in function vsscanf
    0.21user 0.01system 0:00.22elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+240outputs (0major+8345minor)pagefaults 0swaps

and

    time ./jimi
    Reading a.c
    0.28user 0.11system 0:00.39elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+66609minor)pagefaults 0swaps

And we can now infer that the only major difference is now the huge
disparity in memory allocations. A future optimization of vector
pooling, to separate node reuse from vector reuse, currently looks
promising for further reuse of memory.

Finally, a static analysis of the rewrite code, plus a realtime analysis
of the heap at runtime, may well give us a reasonable memory usage
pattern. In reality though, it is the generated rewrite logic
that must become optimal at not continuously rewriting things that it
need not, as it ascends the rule chain.

Change 5563 on 2009/01/28 by jimi@jimi.jimi.antlr3

Allow rewrite streams to use the base adaptor's vector factory and not
try to malloc new vectors themselves.

Change 5562 on 2009/01/28 by jimi@jimi.jimi.antlr3

Don't use CALLOC to allocate tree pools, use malloc as there is no need
for calloc.

Change 5561 on 2009/01/28 by jimi@jimi.jimi.antlr3

Prevent warnings about retval.stop not being initialized when a rule
returns early because it is in backtracking mode

Change 5558 on 2009/01/28 by jimi@jimi.jimi.antlr3

Lots of optimizations (though the next one to be checked in is the huge
win) for AST building and vector factories.

A large part of tree rewriting was the creation of vectors to hold AST
nodes. Although I had created a vector factory, for some reason I never got
around to creating a proper one that pre-allocated the vectors in chunks and
so on. I guess I just forgot to. Hence a big win here is prevention of calling
malloc lots and lots of times to create vectors.

A second improvement was to change the vector definition such that it
holds a certain number of elements within the vector structure itself, rather
than malloc'ing and freeing these. Currently this is set to 8, but may increase.
For AST construction, this is generally a big win because AST nodes don't often
have many individual children unless there has not been any shaping going on in
the parser. But if you are not shaping, then you don't really need a tree.

Other performance improvements here include not calling functions
indirectly within token stream and common token stream. Hence tokens are
claimed directly from the vectors. Users can override these functions of course,
and all this means is that if you override tokenstreams then you pretty much
have to provide all the methods, but then I think you would have to anyway (and
I don't know of anyone that has wanted to do this, as you can carry your own
structure around with the tokens anyway and that is much easier).

Change 5555 on 2009/01/26 by jimi@jimi.jimi.antlr3

Fixed: ANTLR-288
Correct the interpretation of the skip token such that channel, start
index, char pos in line, start line and text are correctly reset to the start of
the new token when the one that we just traversed was marked as being skipped.

This correctly excludes the text that was matched as part of the
SKIP()ed token from the next token in the token stream and so has the side
effect that asking for $text of a rule no longer includes the text that should
be skipped, but DOES include the text of tokens that were merely placed off the
default channel.

Change 5551 on 2009/01/25 by jimi@jimi.jimi.antlr3

Fixed: ANTLR-287
Most of the source files did not include the BSD license. This might
not be that big a deal given that I don't care what people do with it
other than take my name off it, but having the license reproduced
everywhere at least makes things perfectly clear. Hence this mass
change of sources and templates to include the license.

Change 5550 on 2009/01/25 by jimi@jimi.jimi.antlr3

Fixed: ANTLR-365
Ensure that as soon as we know about an input stream on the lexer,
we borrow its string factory and use it in our EOF token in case
anyone tries to make it a string, such as in error messages for
instance.

Change 5548 on 2009/01/25 by jimi@jimi.jimi.antlr3

Fixed: ANTLR-363
At some point the Java runtime default changed from discarding off-channel
tokens to preserving them. The fix is to make the C runtime also
default to preserving off-channel tokens.

Change 5544 on 2009/01/24 by jimi@jimi.jimi.antlr3

Fixed: ANTLR-360
Ensure that the fillBuffer function does not call any methods
that require the cached buffer size to be recorded before we
have actually recorded it.

Change 5543 on 2009/01/24 by jimi@jimi.jimi.antlr3

Fixed: ANTLR-362
Some users have started using string factories themselves and
exposed a flaw in the destroy method, which is intended to remove
a string that was created by the factory and is no longer needed.
The string was correctly removed from the vector that tracks them
but, after the first removal, all the remaining strings were numbered
incorrectly. Hence the destroy method has been recoded to reindex
the strings in the factory after one is removed and everything is once
more hunky dory.
User suggested fix rejected.

Change 5542 on 2009/01/24 by jimi@jimi.jimi.antlr3

Fixed: ANTLR-366
The recognizer state now ensures that all fields are set to NULL upon
creation and the reset does not overwrite the tokenname array

Change 5527 on 2009/01/15 by jimi@jimi.jimi.antlr3

Add the C runtime for 3.1.2 beta2 to perforce

Change 5526 on 2009/01/15 by jimi@jimi.jimivista.antlr3

Correctly define the MEMMOVE macro which was inadvertently left to be
memcpy.

Change 5503 on 2008/12/12 by jimi@jimi.jimi.antlr3

Change C runtime release number to 3.1.2 beta

Change 5473 on 2008/12/01 by jimi@jimi.jimivista.antlr3

Fixed: ANTLR-350 - C runtime use of memcpy
Prior change to use memcpy instead of memmove in all cases missed the
fact that the string factory can be in a situation where overlaps occur. We now
have ANTLR3_MEMCPY and ANTLR3_MEMMOVE and use the two appropriately.

Change 5471 on 2008/12/01 by jimi@jimi.jimivista.antlr3

Fixed: ANTLR-361
- Ensure that ANTLR3_BOOLEAN is typedef'ed correctly when building for
  MingW

Templates
---------

Change 5637 on 2009/02/20 by jimi@jimi.jimi.antlr3

C runtime - make sure that ADAPTOR results are cast to the tree type on
a rewrite

Change 5620 on 2009/02/18 by jimi@jimi.jimi.antlr3

Rename/Move:
From: //depot/code/antlr/main/src/org/antlr/codegen/templates/...
To: //depot/code/antlr/main/src/main/resources/org/antlr/codegen/templates/...

Relocate the code generating templates to exist in the directory set
that maven expects.

When checking in your templates, you may find it easiest to make a copy
of what you have, revert the change in perforce, then just check out the
template in the new location, and copy the changes back over. Nobody has more
than two files open at the moment.

Change 5578 on 2009/02/12 by jimi@jimi.jimi.antlr3

Correct the string template escape sequences for generating scope
code in the C templates.

Change 5577 on 2009/02/12 by jimi@jimi.jimi.antlr3

C Runtime - Bug fixes.

o Having moved to use an extract directly from a vector for returning
  tokens, it exposed a bug whereby the EOF boundary calculation in
  tokLT was incorrectly checking > rather than >=.
o Changing to API initialization of tokens rather than memcmp()
  incorrectly forgot to set the input stream pointer for the
  manufactured tokens in the token factory;
o Rewrite streams for rewriting tree parsers did not check whether the
  rewrite stream was ever assigned before trying to free it; it is now
  in line with the ordinary parser code.

Change 5567 on 2009/01/29 by jimi@jimi.jimi.antlr3

C Runtime - Further Optimizations

Within grammars that used scopes and were intended to parse large
inputs with many rule nests, the creation and deletion of the scopes
themselves became significant. Careful analysis shows that for most
grammars, while a parse could create and delete 20,000 scopes, the
maximum depth of any scope was only 8.

This change therefore changes the scope implementation so that it does
not free scope memory when it is popped but just tracks it in a C
runtime stack, eventually freeing it when the stack is freed.
This change caused the allocation of only 12 scope structures instead
of 20,000 for the extreme example case.

This change means that scope users must be careful (as ever in C) to
initialize their scope elements correctly as:

1) If not you may inherit values from a prior use of the scope
   structure;
2) Scope structures are now allocated with malloc and not calloc;

Also, when using a custom free function to clean a scope when it is
popped, it is probably a good idea to set any free'd pointers to NULL
(this is generally good C programming practice in any case)

Change 5566 on 2009/01/29 by jimi@jimi.jimi.antlr3

Remove redundant BACKTRACK checking so that MSVC9 does not get confused
about possibly uninitialized variables

Change 5565 on 2009/01/28 by jimi@jimi.jimi.antlr3

Use malloc rather than calloc to allocate memory for new scopes. Note
that this means users will have to be careful to initialize any values in their
scopes that they expect to be 0 or NULL and I must document this.

Change 5564 on 2009/01/28 by jimi@jimi.jimi.antlr3

Use malloc rather than calloc for copying list label tokens for
rewrites.

Change 5561 on 2009/01/28 by jimi@jimi.jimi.antlr3

Prevent warnings about retval.stop not being initialized when a rule
returns early because it is in backtracking mode

Change 5560 on 2009/01/28 by jimi@jimi.jimi.antlr3

Add a NULL check before freeing rewrite streams used in AST rewrites
rather than auto-rewrites.

While the NULL check is redundant as the free cannot be called unless
it is assigned, Visual Studio C 2008 gets it wrong and thinks that
there is a path that can arrive at the free without it being assigned
and that is too annoying to ignore.
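
The scope advice in changes 5567 and 5565 above might look something
like the following in user code. This is a sketch under assumptions: the
scope type expr_scope_t and the helpers expr_scope_init and
expr_scope_free are hypothetical names, not anything the C target
generates.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scope type and helpers; not generated ANTLR code. */
typedef struct expr_scope {
    int   depth;
    char *name;                    /* heap memory owned by the scope */
} expr_scope_t;

/*
 * Scope memory comes from malloc or from a previously popped scope, so it
 * may hold arbitrary old values. Set every field explicitly on entry.
 */
void expr_scope_init(expr_scope_t *s)
{
    assert(s != NULL);
    s->depth = 0;
    s->name  = NULL;
}

/*
 * Custom free function run when the scope is popped. The structure itself
 * is kept for reuse, so clear the pointer after freeing what it owns.
 */
void expr_scope_free(expr_scope_t *s)
{
    assert(s != NULL);
    free(s->name);
    s->name = NULL;                /* a later free or reuse sees NULL */
}
```

Because free(NULL) is a no-op in standard C, resetting the pointer makes
a second call to the free function on a recycled structure harmless.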

Change 5559 on 2009/01/28 by jimi@jimi.jimi.antlr3

C target Tree rewrite optimization

There is only one optimization in this change, but it is a huge one.

The code generation templates were set up so that at the start of a rule,
any rewrite streams mentioned in the rule were pre-created. However, this
is a massive overhead for rules where only one or two of the streams are
actually used, as we create them then free them without ever using them.
This was basically copied from the Java templates.
This caused literally millions of extra calls and vector allocations
in the case of the GNU C parser given to me for testing with a 20,000
line program.

After this change, the following comparison is available against the gcc
compiler:

Before (different machines here so use the relative difference for
comparison):

gcc:

    real 0m0.425s
    user 0m0.384s
    sys 0m0.036s

ANTLR C:

    real 0m1.958s
    user 0m1.284s
    sys 0m0.656s

After the previous optimizations for vector pooling via a factory,
plus this huge win in removing redundant code, we have the following
(different machine to the one above):

gcc:

    0.21user 0.01system 0:00.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+328outputs (0major+9922minor)pagefaults 0swaps

ANTLR C:

    0.37user 0.26system 0:00.64elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+130944minor)pagefaults 0swaps

The extra system time comes from the fact that although the tree
rewriting is now optimal in terms of not allocating things it does
not need, there is still a lot more overhead in a parser that is generated
for generic use, including much more use of structures for tokens and extra
copying and so on.
I will continue to work on improving things where I can, but the next big
improvement will come from Ter's optimization of the actual code structures we
generate, including not doing things with rewrite streams that we do not need to
do at all.

The second machine I used is about twice as fast CPU wise as the system
that was used originally by the user that asked about this performance.

Change 5558 on 2009/01/28 by jimi@jimi.jimi.antlr3

Lots of optimizations (though the next one to be checked in is the huge
win) for AST building and vector factories.

A large part of tree rewriting was the creation of vectors to hold AST
nodes. Although I had created a vector factory, for some reason I never got
around to creating a proper one that pre-allocated the vectors in chunks and
so on. I guess I just forgot to. Hence a big win here is prevention of calling
malloc lots and lots of times to create vectors.

A second improvement was to change the vector definition such that it
holds a certain number of elements within the vector structure itself, rather
than malloc'ing and freeing these. Currently this is set to 8, but may increase.
For AST construction, this is generally a big win because AST nodes don't often
have many individual children unless there has not been any shaping going on in
the parser. But if you are not shaping, then you don't really need a tree.

Other performance improvements here include not calling functions
indirectly within token stream and common token stream. Hence tokens are
claimed directly from the vectors.
Users can override these functions of course,
and all this means is that if you override tokenstreams then you pretty much
have to provide all the methods, but then I think you would have to anyway (and
I don't know of anyone that has wanted to do this, as you can carry your own
structure around with the tokens anyway and that is much easier).

Change 5554 on 2009/01/26 by jimi@jimi.jimi.antlr3

Fixed: ANTLR-379
For some reason in the past, the ruleMemoization() template had required
that the name parameter be set to the rule name. This does not seem to be a
requirement any more. The name=xxx override when invoking the template was
causing all the scope names derived when cleaning up in memoization to be
called after the rule name, which was not correct. However, this only affected
the output when in output=AST mode.

This template invocation is now corrected.

Change 5553 on 2009/01/26 by jimi@jimi.jimi.antlr3

Fixed: ANTLR-330
Managed to get the one rule that could not see the ASTLabelType to call
back in to the super template C.stg and ask it to construct the name. I am not
100% sure that this fixes all cases, but I cannot find any that fail. Please
let me know if you find any examples of being unable to default the
ASTLabelType option in the C target.

Change 5552 on 2009/01/25 by jimi@jimi.jimi.antlr3

Progress: ANTLR-327
Fix debug code generation templates when output=AST such that code
can at least be generated and I can debug the output code correctly.
Note that this checkin does not implement the debugging requirements
for tree generating parsers.

Change 5551 on 2009/01/25 by jimi@jimi.jimi.antlr3

Fixed: ANTLR-287
Most of the source files did not include the BSD license.
This might
not be that big a deal given that I don't care what people do with it
other than take my name off it, but having the license reproduced
everywhere at least makes things perfectly clear. Hence this mass change of
sources and templates to include the license.

Change 5549 on 2009/01/25 by jimi@jimi.jimi.antlr3

Fixed: ANTLR-354
Using 0.0D as the default initialization value for a double caused the
VS 2003 C compiler to bomb out. There seems to be no reason other
than force of habit to set this to 0.0D so I have dropped the D so
that older compilers do not complain.

Change 5547 on 2009/01/25 by jimi@jimi.jimi.antlr3

Fixed: ANTLR-282
All references are now unadorned with any type of NULL check for the
following reasons:

1) A NULL reference means that there is a problem with the
   grammar and we need the program to fail immediately so
   that the programmer can work out where the problem occurred;
2) Most of the time, the only sensible value that can be
   returned is NULL or 0 which obviates the NULL check in the
   first place;
3) If we replace a NULL reference with some value such as 0,
   then the program may blithely continue but just do something
   logically wrong, which will be very difficult for the
   grammar programmer to detect and correct.

Change 5545 on 2009/01/24 by jimi@jimi.jimi.antlr3

Fixed: ANTLR-357
The bug report was correct in that the types of references to things
like $start were being incorrectly cast as they were not changed from
Java style casts (and the casts are unnecessary). This is now fixed
and references are referencing the correct, uncast, types.
However, the bug report was wrong in that the reference in the book to
$start.pos will only work for Java and really, it is incorrect in the
book because it should not access the .pos member directly but should
be using $start.getCharPositionInLine().
Because there is no access qualification in C, one could use
$start.charPosition, however
really this should be $start->getCharPositionInLine($start);

Change 5541 on 2009/01/24 by jimi@jimi.jimi.antlr3

Fixed: ANTLR-367
The code generation for the free method of a recognizer was not
distinguishing tree parsers from parsers when it came to calling delegate free
functions.
This is now corrected.

Change 5540 on 2009/01/24 by jimi@jimi.jimi.antlr3

Fixed: ANTLR-355
Ensure that we do not attempt to free any memory that we did not
actually allocate because the parser rule was being executed in
backtracking mode.

Change 5539 on 2009/01/24 by jimi@jimi.jimivista.antlr3

Fixed: ANTLR-355
When a C targeted parser is producing in backtracking mode, then the
creation of new stream rewrite structures should not happen if the rule is
currently backtracking

Change 5502 on 2008/12/11 by jimi@jimi.jimi.antlr3

Fixed: ANTLR-349 Ensure that all marker labels in the lexer are 64 bit
compatible

Change 5473 on 2008/12/01 by jimi@jimi.jimivista.antlr3

Fixed: ANTLR-350 - C runtime use of memcpy
Prior change to use memcpy instead of memmove in all cases missed the
fact that the string factory can be in a situation where overlaps occur. We now
have ANTLR3_MEMCPY and ANTLR3_MEMMOVE and use the two appropriately.

Change 5387 on 2008/11/05 by parrt@parrt.spork

Fixed x+=.
issue with tree grammars; added unit test

Change 5325 on 2008/10/23 by parrt@parrt.spork

We were all ref'ing backtracking==0 hardcoded instead of checking the
@synpredgate action.
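
As a footnote to change 5473 (ANTLR-350) above: assuming ANTLR3_MEMCPY
and ANTLR3_MEMMOVE wrap the standard memcpy and memmove, the distinction
can be sketched as below. The helper names remove_chars and copy_chars
are hypothetical and are not part of the string factory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers; these are not ANTLR3 string factory functions. */

/*
 * Delete count characters starting at start by sliding the tail of the
 * string down over them. Source and destination lie in the same buffer
 * and can overlap, which memcpy does not allow, so memmove is used.
 */
void remove_chars(char *buf, size_t start, size_t count)
{
    size_t len = strlen(buf);

    assert(start <= len && count <= len - start);
    /* +1 moves the terminating NUL along with the tail */
    memmove(buf + start, buf + start + count, len - start - count + 1);
}

/*
 * Copy n bytes between two distinct buffers. The caller guarantees the
 * regions do not overlap, so memcpy is sufficient here.
 */
void copy_chars(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
}
```

Using memcpy in remove_chars would be undefined behaviour whenever the
tail overlaps the gap being closed, which is the kind of situation the
change description attributes to the string factory.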