    =======================================================
    Known Problems In PCCTS - Last revised 14 November 1998
    =======================================================

#17. The dlg fix for handling characters up to 255 is incorrect.

    See item #207.

    Reported by Frank Hartmann.

#16. A note about "&&" predicates (Mike Dimmick)

    Mike Dimmick has pointed out a potential pitfall in the use of the
    "&&" style predicate.  Consider:

         r0: (g)? => <<P>>?  r1
             | ...
             ;
         r1: A | B;

    If the context guard g is not a subset of the lookahead context for r1
    (in other words g is neither A nor B) then the code may execute r1
    even when the lookahead context is not satisfied.  This is an error
    by the person coding the grammar, and the error should be reported to
    the user, but it isn't.  Some examples I've run seem to indicate that
    such an error actually results in the rule becoming unreachable.

    When g is properly coded the code is correct; the problem arises only
    when g is not properly coded.

    A second problem reported by Mike Dimmick is that the test for a
    failed validation predicate is equivalent to a test on the predicate
    alone.  In other words, if the "&&" has not been hoisted then it may
    falsely report a validation error.

#15. (Changed in MR23) Warning for LT(i), LATEXT(i) in token match actions

    A bug (or at least an oddity) is that a reference to LT(1), LA(1),
    or LATEXT(1) in an action which immediately follows a token match
    in a rule refers to the token matched, not the token which is in
    the lookahead buffer.  Consider:

        r : abc <<action alpha>> D <<action beta>> E;

    In this case LT(1) in action alpha will refer to the next token in
    the lookahead buffer ("D"), but LT(1) in action beta will refer to
    the token matched by D - the preceding token.

    A warning has been added to alert users when an action following a
    token match contains a reference to LT(1), LA(1), or LATEXT(1).

    This behavior should be changed, but it appears in too many programs
    now.  Another problem, perhaps more significant, is that the obvious
    fix (moving the consume() call to before the action) could change the
    order in which input is requested and output appears in existing programs.

    This problem was reported, along with a fix, by Benjamin Mandel
    (beny (a] sd.co.il).  However, I felt that changing the behavior was too
    dangerous for existing code.

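    The ordering issue can be illustrated with a small C sketch.  This
    is not the code generated by antlr: ToyStream, toy_lt1(),
    toy_consume() and the two lt1_in_action_*() functions are
    hypothetical names invented for the illustration.  It assumes only
    what is described above, namely that the action currently runs
    before consume() is called.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy token stream; not the PCCTS runtime. */
typedef struct {
    const char *tokens;  /* one character per token, '\0' ends the input */
    size_t pos;          /* index of the token that LT(1) reports */
} ToyStream;

/* Toy stand-in for LT(1): the token at the current position. */
static char toy_lt1(const ToyStream *s)
{
    assert(s != NULL);
    return s->tokens[s->pos];
}

/* Toy stand-in for consume(): advance unless already at end of input. */
static void toy_consume(ToyStream *s)
{
    assert(s != NULL);
    if (s->tokens[s->pos] != '\0') s->pos++;
}

/*
 * Current ordering: the token has been tested, the action runs, and
 * only then is the token consumed.  The action sees the matched token.
 */
static char lt1_in_action_consume_after(ToyStream *s)
{
    char seen = toy_lt1(s);   /* what the action observes */
    toy_consume(s);
    return seen;
}

/*
 * "Obvious fix" ordering: consume first, then run the action.
 * The action sees the real lookahead token.
 */
static char lt1_in_action_consume_before(ToyStream *s)
{
    toy_consume(s);
    return toy_lt1(s);        /* what the action observes */
}
```

    With the toy stream "DE" positioned at D, the first function
    yields 'D' (what action beta sees today) and the second yields 'E'.
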
#14. Parsing bug in dlg

    THM: I have been unable to reproduce this problem.

    Reported by Rick Howard, Mijenix Corporation (rickh (a] mijenix.com).

    The regular expression parser (in rexpr.c) fails while
    trying to parse the following regular expression:

            {[a-zA-Z]:}(\\\\[a-zA-Z0-9]*)+

    See my comment in the following excerpt from rexpr.c:

    /*
     * <regExpr>        ::= <andExpr> ( '|' {<andExpr>} )*
     *
     * Return -1 if syntax error
     * Return  0 if none found
     * Return  1 if a regExpr was found
     */
    static
    regExpr(g)
    GraphPtr g;
    {
        Graph g1, g2;

        if ( andExpr(&g1) == -1 )
        {
            return -1;
        }

        while ( token == '|' )
        {
            int a;
            next();
            a = andExpr(&g2);
            if ( a == -1 ) return -1;   /* syntax error below */
            else if ( !a ) return 1;    /* empty alternative */
            g1 = BuildNFA_AorB(g1, g2);
        }

        if ( token!='\0' ) return -1;
    *****
    ***** It appears to fail here because token is 125 - the closing '}'
    ***** If I change it to:
    *****    if ( token!='\0' && token!='}' && token!= ')' ) return -1;
    *****
    ***** It succeeds, but I'm not sure this is the correct approach.
    *****
        *g = g1;
        return 1;
    }

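    The effect of the end-of-expression test can be seen in a toy
    recognizer.  The sketch below is not rexpr.c: ToyParser and the
    toy_*() functions are hypothetical, it builds no NFA, and it
    handles only ordinary characters, '|', '*', '+', (...) and {...}.
    It assumes that a nested expression is parsed by a recursive call,
    so that the call returns with the closing delimiter as the current
    token.  Whether relaxing the test is the right fix for rexpr.c
    itself remains an open question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy recognizer state; not taken from rexpr.c. */
typedef struct {
    const char *p;    /* *p plays the role of the current token */
    int relaxed_end;  /* nonzero: also accept '}' and ')' after a regExpr */
} ToyParser;

static int toy_reg_expr(ToyParser *ps);

/* atom ::= '(' regExpr ')' | '{' regExpr '}' | ordinary character */
static int toy_atom(ToyParser *ps)
{
    char open = *ps->p;
    if (open == '(' || open == '{') {
        char close = (open == '(') ? ')' : '}';
        ps->p++;
        if (toy_reg_expr(ps) != 1) return -1;
        if (*ps->p != close) return -1;
        ps->p++;
        return 1;
    }
    if (open == '\0' || open == '|' || open == ')' || open == '}')
        return 0;                      /* nothing found here */
    ps->p++;
    return 1;
}

/* andExpr ::= ( atom [ '*' | '+' ] )+ ; returns 0 if no atom was found */
static int toy_and_expr(ToyParser *ps)
{
    int found = 0;
    for (;;) {
        int a = toy_atom(ps);
        if (a == -1) return -1;
        if (a == 0) return found;
        found = 1;
        if (*ps->p == '*' || *ps->p == '+') ps->p++;
    }
}

/* regExpr ::= andExpr ( '|' andExpr )* followed by the end test */
static int toy_reg_expr(ToyParser *ps)
{
    if (toy_and_expr(ps) == -1) return -1;
    while (*ps->p == '|') {
        ps->p++;
        if (toy_and_expr(ps) == -1) return -1;
    }
    if (*ps->p == '\0') return 1;      /* the strict test */
    if (ps->relaxed_end && (*ps->p == '}' || *ps->p == ')'))
        return 1;                      /* the relaxed test */
    return -1;
}

/* Returns 1 if the whole string is accepted, 0 otherwise. */
static int toy_accepts(const char *text, int relaxed_end)
{
    ToyParser ps;
    assert(text != NULL);
    ps.p = text;
    ps.relaxed_end = relaxed_end;
    return toy_reg_expr(&ps) == 1 && *ps.p == '\0';
}
```

    In this toy, toy_accepts("{a}b", 0) is rejected because the nested
    call stops at '}', while toy_accepts("{a}b", 1) succeeds.  A stray
    closer such as "a)" is still rejected in relaxed mode, but only
    because the top-level caller checks that all input was consumed.
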
#13. dlg reports an invalid range for: [\0x00-\0xff]

    Diagnosed by Piotr Eljasiak (eljasiak (a] no-spam.zt.gdansk.tpsa.pl).

    Fixed in MR16.

#12. Strings containing comment actions

     Sequences that look like C style comments and appear in string
     literals are improperly parsed by antlr/dlg.

        << fprintf(out," /* obsolete */ ");

     For this case use:

        << fprintf(out," \/\* obsolete \*\/ ");

     Reported by K.J. Cummings (cummings (a] peritus.com).

#11. User hook for deallocation of variables on guess fail

     The mechanism outlined in Item #108 works only for
     heap allocated variables.

#10. Label re-initialization in ( X {y:Y} )*

     If a label assignment is optional and appears in a
     (...)* or (...)+ block it will not be reset to NULL
     when it is skipped by a subsequent iteration.

     Consider the example:

            ( X { y:Y })* Z

     with input:

            X Y X Z

     The first time through the block Y will be matched and
     y will be set to point to the token.  On the second
     iteration of the (...)* block there is no match for Y.
     But y will not be reset to NULL, as the user might
     expect; it will still contain a reference to the Y that
     was matched in the first iteration.

     The work-around is to manually reset y:

            ( X << y = NULL; >> { y:Y } )* Z

        or

            ( X ( y:Y | << y = NULL; >> /* epsilon */ ) )* Z

     Reported by Jeff Vincent (JVincent (a] novell.com).

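     The stale label can be reproduced with a hand-written loop that
     mimics the block.  This is only a sketch, not code generated by
     antlr: toy_last_label() is a hypothetical function, and tokens
     are represented as single characters in a string.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical loop mimicking ( X { y:Y } )* Z over a string in which
 * each character is one token.  Returns the value of the label y after
 * the last iteration: a pointer to the matched Y, or NULL.
 */
static const char *toy_last_label(const char *tokens,
                                  int reset_each_iteration)
{
    const char *y = NULL;   /* plays the role of the label y */
    const char *p = tokens;

    assert(tokens != NULL);
    while (*p == 'X') {
        p++;                                  /* match X */
        if (reset_each_iteration) y = NULL;   /* << y = NULL; >> */
        if (*p == 'Y') {                      /* optional { y:Y } */
            y = p;
            p++;
        }
    }
    return y;   /* the trailing Z is not checked in this sketch */
}
```

     For the input "XYXZ" the loop without the reset still returns a
     pointer to the Y from the first iteration.  With the reset, which
     corresponds to the first work-around, it returns NULL.
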
#9. PCCTSAST.h PCCTSAST::setType() is a no-op

#8. #tokdefs with ~Token and .

    THM: I have been unable to reproduce this problem.

    When antlr uses #tokdefs to define tokens the fields of
    #errclass and #tokclass do not get properly defined.
    When it subsequently attempts to take the complement of
    the set of tokens (using ~Token or .) it can refer to
    tokens which don't have names, generating a fatal error.

#7. DLG crashes on some invalid inputs

    THM: In MR20 I have fixed the most common cases.

    The following token definition will cause DLG to crash.

        #token "()"

    Reported by Mengue Olivier (dolmen (a] bigfoot.com).

#6. On MS systems \n\r is treated as two new lines

    Fixed.

#5. Token expressions in #tokclass

    #errclass does not support TOK1..TOK2 or ~TOK syntax.
    #tokclass does not support ~TOKEN syntax.

    A workaround for #errclass TOK1..TOK2 is to use a
    #tokclass.

    Reported by Dave Watola (dwatola (a] amtsun.jpl.nasa.gov).

#4. A #tokdefs must appear "early" in the grammar file.

    The "early" section of the grammar file is the only
    place where the following directives may appear:

        #header
        #first
        #tokdefs
        #parser

    Any other kind of statement signifies the end of the
    "early" section.

#3. Use of PURIFY macro for C++ mode

    Item #93 of the CHANGES_FROM_1.33 describes the use of
    the PURIFY macro to zero arguments to be passed by
    upward inheritance.

        #define PURIFY(r, s) memset((char *) &(r), '\0', (s));

    This may not be the right thing to do for C++ objects that
    have constructors.  Reported by Bonny Rais (bonny (a] werple.net.au).

    For those cases one should #define PURIFY to be an empty macro
    in the #header or #first actions.

#2. Fixed in 1.33MR10 - See CHANGES_FROM_1.33 Item #80.

#1. The quality of support for systems with 8.3 file names leaves
    much to be desired.  Since the kit is distributed using the
    long file names and the make file uses long file names, it takes
    some effort to build the kit on such systems.  This will probably
    not be changed due to the large number of systems already written
    using the long file names.