    =======================================================
    Known Problems In PCCTS - Last revised 14 November 1998
    =======================================================

#17. The dlg fix for handling characters up to 255 is incorrect.

    See item #207.

    Reported by Frank Hartmann.

#16. A note about "&&" predicates (Mike Dimmick)

    Mike Dimmick has pointed out a potential pitfall in the use of the
    "&&" style predicate.  Consider:

        r0: (g)? => <<P>>?  r1
            | ...
            ;
        r1: A | B;

    If the context guard g is not a subset of the lookahead context for r1
    (in other words g is neither A nor B) then the code may execute r1
    even when the lookahead context is not satisfied.  This is an error
    by the person coding the grammar, and the error should be reported to
    the user, but it isn't.  Some examples I've run seem to indicate that
    such an error actually results in the rule becoming unreachable.

    When g is properly coded the code is correct; the problem arises
    only when g is not properly coded.

    A second problem reported by Mike Dimmick is that the test for a
    failed validation predicate is equivalent to a test on the predicate
    alone.  In other words, if the "&&" has not been hoisted then it may
    falsely report a validation error.

#15. (Changed in MR23) Warning for LT(i), LATEXT(i) in token match actions

    A bug (or at least an oddity) is that a reference to LT(1), LA(1),
    or LATEXT(1) in an action which immediately follows a token match
    in a rule refers to the token matched, not the token which is in
    the lookahead buffer.  Consider:

        r : abc <<action alpha>> D <<action beta>> E;

    In this case LT(1) in action alpha will refer to the next token in
    the lookahead buffer ("D"), but LT(1) in action beta will refer to
    the token matched by D - the preceding token.
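
    The ordering behind this can be sketched with a toy model in C.
    This is not code generated by antlr: the names Toy, toy_lt1,
    toy_match, toy_consume and toy_rule_r are hypothetical, and the
    rule abc is assumed to match a single token A.  The only point
    illustrated is that the action after a token match runs before the
    matched token is consumed.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a one-token lookahead buffer.  NOT antlr output; every
 * name here is hypothetical and exists only for illustration. */
enum Tok { TOK_A, TOK_D, TOK_E, TOK_EOF };

struct Toy {
    const enum Tok *input;  /* token stream, terminated by TOK_EOF */
    size_t pos;             /* index of the token LT(1) would return */
};

/* What a reference to LT(1) would see at this moment. */
static enum Tok toy_lt1(const struct Toy *p) { return p->input[p->pos]; }

/* Check the current token without advancing (the "match" step). */
static void toy_match(const struct Toy *p, enum Tok t) { assert(toy_lt1(p) == t); }

/* Advance to the next token (the "consume" step); stops at TOK_EOF. */
static void toy_consume(struct Toy *p) {
    if (p->input[p->pos] != TOK_EOF) p->pos++;
}

/* Walk  r : abc <<alpha>> D <<beta>> E;  assuming abc matches one A.
 * Records what LT(1) yields inside each action. */
static void toy_rule_r(struct Toy *p, enum Tok *alpha_sees, enum Tok *beta_sees) {
    /* rule abc: its token is matched and consumed before abc returns */
    toy_match(p, TOK_A);
    toy_consume(p);
    *alpha_sees = toy_lt1(p);   /* action alpha: the next token, D */

    toy_match(p, TOK_D);
    *beta_sees = toy_lt1(p);    /* action beta runs before consume: still D */
    toy_consume(p);

    toy_match(p, TOK_E);
    toy_consume(p);
}
```

    With the input A D E, both actions in this model observe D.  A user
    who expects action beta to see E is surprised; calling toy_consume
    before recording beta_sees would make it see E instead.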

    A warning has been added that alerts users to this when an action
    following a token match contains a reference to LT(1), LA(1), or LATEXT(1).

    This behavior should be changed, but it appears in too many programs
    now.  Another problem, perhaps more significant, is that the obvious
    fix (moving the consume() call to before the action) could change the
    order in which input is requested and output appears in existing programs.

    This problem was reported, along with a fix, by Benjamin Mandel
    (beny (a] sd.co.il).  However, I felt that changing the behavior was too
    dangerous for existing code.

#14. Parsing bug in dlg

    THM: I have been unable to reproduce this problem.

    Reported by Rick Howard, Mijenix Corporation (rickh (a] mijenix.com).

    The regular expression parser (in rexpr.c) fails while
    trying to parse the following regular expression:

        {[a-zA-Z]:}(\\\\[a-zA-Z0-9]*)+

    See my comment in the following excerpt from rexpr.c:

        /*
         * <regExpr> ::= <andExpr> ( '|' {<andExpr>} )*
         *
         * Return -1 if syntax error
         * Return  0 if none found
         * Return  1 if a regExpr was found
         */
        static
        regExpr(g)
        GraphPtr g;
        {
            Graph g1, g2;

            if ( andExpr(&g1) == -1 )
            {
                return -1;
            }

            while ( token == '|' )
            {
                int a;
                next();
                a = andExpr(&g2);
                if ( a == -1 ) return -1;   /* syntax error below */
                else if ( !a ) return 1;    /* empty alternative */
                g1 = BuildNFA_AorB(g1, g2);
            }

            if ( token!='\0' ) return -1;
        *****
        *****   It appears to fail here because token is 125 - the closing '}'
        *****   If I change it to:
        *****       if ( token!='\0' && token!='}' && token!= ')' ) return -1;
        *****
        *****   It succeeds, but I'm not sure this is the correct approach.
        *****
            *g = g1;
            return 1;
        }

#13. dlg reports an invalid range for: [\0x00-\0xff]

    Diagnosed by Piotr Eljasiak (eljasiak (a] no-spam.zt.gdansk.tpsa.pl).

    Fixed in MR16.

#12. Strings containing comment actions

    Sequences that look like C-style comments inside string
    literals are improperly parsed by antlr/dlg.

        << fprintf(out," /* obsolete */ "); >>

    For this case use:

        << fprintf(out," \/\* obsolete \*\/ "); >>

    Reported by K.J. Cummings (cummings (a] peritus.com).

#11. User hook for deallocation of variables on guess fail

    The mechanism outlined in Item #108 works only for
    heap allocated variables.

#10. Label re-initialization in ( X {y:Y} )*

    If a label assignment is optional and appears in a
    (...)* or (...)+ block it will not be reset to NULL
    when it is skipped by a subsequent iteration.

    Consider the example:

        ( X { y:Y })* Z

    with input:

        X Y X Z

    The first time through the block Y will be matched and
    y will be set to point to the token.  On the second
    iteration of the (...)* block there is no match for Y.
    But y will not be reset to NULL, as the user might
    expect; it will still contain a reference to the Y that
    was matched in the first iteration.

    The work-around is to manually reset y:

        ( X << y = NULL; >> { y:Y } )* Z

    or

        ( X ( y:Y | << y = NULL; >> /* epsilon */ ) )* Z

    Reported by Jeff Vincent (JVincent (a] novell.com).

#9. PCCTAST.h PCCTSAST::setType() is a no-op

#8. #tokdefs with ~Token and .

    THM: I have been unable to reproduce this problem.

    When antlr uses #tokdefs to define tokens the fields of
    #errclass and #tokclass do not get properly defined.
    When it subsequently attempts to take the complement of
    the set of tokens (using ~Token or .) it can refer to
    tokens which don't have names, generating a fatal error.

#7. DLG crashes on some invalid inputs

    THM: In MR20 I have fixed the most common cases.

    The following token definition will cause DLG to crash.

        #token "()"

    Reported by Mengue Olivier (dolmen (a] bigfoot.com).

#6. On MS systems \n\r is treated as two new lines

    Fixed.

#5. Token expressions in #tokclass

    #errclass does not support TOK1..TOK2 or ~TOK syntax.
    #tokclass does not support ~TOKEN syntax.

    A workaround for #errclass TOK1..TOK2 is to use a
    #tokclass.

    Reported by Dave Watola (dwatola (a] amtsun.jpl.nasa.gov).

#4. A #tokdefs must appear "early" in the grammar file.

    The "early" section of the grammar file is the only
    place where the following directives may appear:

        #header
        #first
        #tokdefs
        #parser

    Any other kind of statement signifies the end of the
    "early" section.

#3. Use of PURIFY macro for C++ mode

    Item #93 of CHANGES_FROM_1.33 describes the use of
    the PURIFY macro to zero arguments to be passed by
    upward inheritance.

        #define PURIFY(r, s)  memset((char *) &(r), '\0', (s));

    This may not be the right thing to do for C++ objects that
    have constructors.  Reported by Bonny Rais (bonny (a] werple.net.au).

    For those cases one should #define PURIFY to be an empty macro
    in the #header or #first actions.

#2. Fixed in 1.33MR10 - See CHANGES_FROM_1.33 Item #80.

#1. The quality of support for systems with 8.3 file names leaves
    much to be desired.  Since the kit is distributed using the
    long file names and the make file uses long file names it requires
    some effort to generate.  This will probably not be changed due
    to the large number of systems already written using the long
    file names.