ANTLR(1)                 PCCTS Manual Pages                 ANTLR(1)

NAME
     antlr - ANother Tool for Language Recognition

SYNTAX
     antlr [options] grammar_files

DESCRIPTION
     Antlr converts an extended form of context-free grammar into
     a set of C functions which directly implement an efficient
     form of deterministic recursive-descent LL(k) parser.
     Context-free grammars may be augmented with predicates to
     allow semantics to influence parsing; this allows a form of
     context-sensitive parsing.  Selective backtracking is also
     available to handle non-LL(k) and even non-LALR(k)
     constructs.  Antlr also produces a definition of a lexer
     which can be automatically converted into C code for a
     DFA-based lexer by dlg.  Hence, antlr serves a function much
     like that of yacc; however, it is notably more flexible and
     is more integrated with a lexer generator (antlr directly
     generates dlg code, whereas yacc and lex are given
     independent descriptions).  Unlike yacc, which accepts
     LALR(1) grammars, antlr accepts LL(k) grammars in an extended
     BNF notation, which eliminates the need for precedence rules.

     Like yacc grammars, antlr grammars can use automatically
     maintained symbol attribute values referenced as dollar
     variables.  Further, because antlr generates top-down
     parsers, arbitrary values may be inherited from parent rules
     (passed like function parameters).  Antlr also has a
     mechanism for creating and manipulating abstract syntax
     trees.

     There are various other niceties in antlr, including the
     ability to spread one grammar over multiple files or to place
     multiple grammars in a single file, the ability to generate
     a version of the grammar with actions stripped out (for
     documentation purposes), and lots more.
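
     To make "a set of C functions which directly implement a
     recursive-descent parser" concrete, the following is a
     hand-written sketch of an LL(1) parser with one C function
     per rule.  It is not antlr output; the grammar notation in
     the comment is generic EBNF, and all names (la1, parse_expr,
     and so on) are hypothetical.  It also shows how the EBNF
     loops encode operator precedence without precedence rules.

```c
#include <ctype.h>

/* Generic EBNF (illustrative; not necessarily exact antlr input syntax):
 *   expr   : term   ( '+' term   )* ;
 *   term   : factor ( '*' factor )* ;
 *   factor : NUMBER | '(' expr ')' ;
 */

static const char *cur;   /* next unread input character */
static int failed;        /* set to 1 on the first syntax error */

static int expr(void);

/* One symbol of lookahead: skip blanks and peek at the next character. */
static int la1(void)
{
    while (*cur == ' ') cur++;
    return (unsigned char)*cur;
}

/* factor : NUMBER | '(' expr ')' ; the lookahead picks the alternative. */
static int factor(void)
{
    int c = la1();
    if (isdigit(c)) {
        int v = 0;
        while (isdigit((unsigned char)*cur)) v = v * 10 + (*cur++ - '0');
        return v;
    }
    if (c == '(') {
        int v;
        cur++;                      /* consume '(' */
        v = expr();
        if (la1() == ')') cur++;    /* consume ')' */
        else failed = 1;
        return v;
    }
    failed = 1;                     /* no alternative matches */
    return 0;
}

/* term : factor ( '*' factor )* ; the (...)* loop becomes a while loop. */
static int term(void)
{
    int v = factor();
    while (!failed && la1() == '*') { cur++; v *= factor(); }
    return v;
}

/* expr : term ( '+' term )* ; '+' binds looser because it sits higher. */
static int expr(void)
{
    int v = term();
    while (!failed && la1() == '+') { cur++; v += term(); }
    return v;
}

/* Hypothetical entry point: returns 0 on success, -1 on a syntax error. */
int parse_expr(const char *text, int *value)
{
    cur = text;
    failed = 0;
    *value = expr();
    if (la1() != '\0') failed = 1;  /* trailing input is an error */
    return failed ? -1 : 0;
}
```

     Calling parse_expr("2+3*4", &v) stores 14 in v, because the
     multiplication is recognized one rule level below the
     addition.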

OPTIONS
     -ck n
          Use up to n symbols of lookahead when using compressed
          (linear approximation) lookahead.  This type of
          lookahead is very cheap to compute and is attempted
          before full LL(k) lookahead, which is of exponential
          complexity in the worst case.  In general, the
          compressed lookahead can be much deeper (e.g., -ck 10)
          than the full lookahead (which usually must be less
          than 4).

     -CC  Generate C++ output from both ANTLR and DLG.

     -cr  Generate a cross-reference for all rules.  For each
          rule, print a list of all other rules that reference
          it.

     -e1  Ambiguities/errors shown in low detail (default).

     -e2  Ambiguities/errors shown in more detail.

     -e3  Ambiguities/errors shown in excruciating detail.

     -fe file
          Rename err.c to file.

     -fh file
          Rename stdpccts.h header (turns on -gh) to file.

     -fl file
          Rename lexical output, parser.dlg, to file.

     -fm file
          Rename file with lexical mode definitions, mode.h, to
          file.

     -fr file
          Rename file which remaps globally visible symbols,
          remap.h, to file.

     -ft file
          Rename tokens.h to file.

     -ga  Generate ANSI-compatible code (default case).  This has
          not been rigorously tested to be ANSI X3J11 C compliant,
          but it is close.  The normal output of antlr is
          currently compilable under K&R C, ANSI C, and C++; this
          option does nothing because antlr generates a bunch of
          #ifdef's to do the right thing depending on the
          language.

     -gc  Indicates that antlr should generate no C code, i.e.,
          only perform analysis on the grammar.

     -gd  C code is inserted in each of the antlr-generated
          parsing functions to provide for user-defined handling
          of a detailed parse trace.  The inserted code consists
          of calls to the user-supplied macros or functions
          called zzTRACEIN and zzTRACEOUT.
          The only argument is a char * pointing to a C-style
          string that names the grammar rule recognized by the
          current parsing function.  If no definition is given
          for the trace functions, a message will be printed upon
          rule entry and exit indicating that a particular rule
          has been entered or exited.

     -ge  Generate an error class for each non-terminal.

     -gh  Generate stdpccts.h for non-ANTLR-generated files to
          include.  This file contains all defines needed to
          describe the type of parser generated by antlr (e.g.,
          how much lookahead is used and whether or not trees are
          constructed) and contains the header action specified
          by the user.

     -gk  Generate parsers that delay lookahead fetches until
          needed.  Without this option, antlr generates parsers
          which always have k tokens of lookahead available.

     -gl  Generate line info about grammar actions in the C
          parser, of the form # line "file".  This makes error
          messages from the C/C++ compiler more sensible, as they
          will point into the grammar file rather than the
          resulting C file.  Debugging is easier as well, because
          you will step through the grammar, not the C file.

     -gs  Do not generate sets for token expression lists;
          instead generate a ||-separated sequence of
          LA(1)==token_number tests.  The default is to generate
          sets.

     -gt  Generate code for abstract syntax trees.

     -gx  Do not create the lexical analyzer files (dlg-related).
          This option should be given when the user wishes to
          provide a customized lexical analyzer.  It may also be
          used in make scripts to cause only the parser to be
          rebuilt when a change not affecting the lexical
          structure is made to the input grammars.

     -k n Set k of LL(k) to n; i.e., set the number of tokens of
          lookahead (default==1).

     -o dir
          Directory where output files should go (default=".").
          This is very nice for keeping the source directory
          clear of ANTLR and DLG spawn.

     -p   The complete grammar, collected from all input grammar
          files and stripped of all comments and embedded
          actions, is listed to stdout.  This is intended to aid
          in viewing the entire grammar as a whole and to
          eliminate the need to keep actions concisely stated so
          that the grammar is easier to read.  Hence, it is
          preferable to embed even complex actions directly in
          the grammar, rather than to call them as subroutines,
          since the subroutine call overhead will be saved.

     -pa  This option is the same as -p except that the output is
          annotated with the first sets determined from grammar
          analysis.

     -prc on
          Turn on the computation and hoisting of predicate
          context.

     -prc off
          Turn off the computation and hoisting of predicate
          context.  This option makes 1.10 behave like the 1.06
          release with option -pr on.  Context computation is off
          by default.

     -rl n
          Limit the maximum number of tree nodes used by grammar
          analysis to n.  Occasionally, antlr is unable to
          analyze a grammar submitted by the user.  This rare
          situation can only occur when the grammar is large and
          the amount of lookahead is greater than one.  A
          non-linear analysis algorithm is used by PCCTS to
          handle the general case of LL(k) parsing.  The average
          complexity of analysis, however, is near linear due to
          some fancy footwork in the implementation which reduces
          the number of calls to the full LL(k) algorithm.  If
          this limit is reached, an error message will be
          displayed which indicates the grammar construct being
          analyzed when antlr hit a non-linearity.  Use this
          option if antlr seems to go out to lunch and your disk
          starts thrashing; try n=10000 to start.
          Once the offending construct has been identified, try
          to remove the ambiguity that antlr was trying to
          overcome with large lookahead analysis.  The
          introduction of (...)? backtracking blocks eliminates
          some of these problems; antlr does not analyze
          alternatives that begin with (...)? (it simply
          backtracks, if necessary, at run time).

     -w1  Set low warning level.  Do not warn if semantic
          predicates and/or (...)? blocks are assumed to cover
          ambiguous alternatives.

     -w2  Ambiguous parsing decisions yield warnings even if
          semantic predicates or (...)? blocks are used.  Warn if
          predicate context is computed and semantic predicates
          incompletely disambiguate alternative productions.

     -    Read grammar from standard input and generate stdin.c
          as the parser file.

SPECIAL CONSIDERATIONS
     Antlr works... we think.  There is no implicit guarantee of
     anything.  We reserve no legal rights to the software known
     as the Purdue Compiler Construction Tool Set (PCCTS); PCCTS
     is in the public domain.  An individual or company may do
     whatever they wish with source code distributed with PCCTS
     or the code generated by PCCTS, including the incorporation
     of PCCTS, or its output, into commercial software.  We
     encourage users to develop software with PCCTS.  However, we
     do ask that credit is given to us for developing PCCTS.  By
     "credit", we mean that if you incorporate our source code
     into one of your programs (commercial product, research
     project, or otherwise) that you acknowledge this fact
     somewhere in the documentation, research report, etc.  If
     you like PCCTS and have developed a nice tool with the
     output, please mention that you developed it using PCCTS.
     As long as these guidelines are followed, we expect to
     continue enhancing this system and expect to make other
     tools available as they are completed.

FILES
     *.c  output C parser.

     *.cpp
          output C++ parser when C++ mode is used.

     parser.dlg
          output dlg lexical analyzer.

     err.c
          token string array, error sets and error support
          routines.  Not used in C++ mode.

     remap.h
          file that redefines all globally visible parser
          symbols.  The use of the #parser directive creates this
          file.  Not used in C++ mode.

     stdpccts.h
          list of definitions needed by C files, not generated by
          PCCTS, that reference PCCTS objects.  This is not
          generated by default.  Not used in C++ mode.

     tokens.h
          output #defines for tokens used and function prototypes
          for functions generated for rules.

SEE ALSO
     dlg(1), pccts(1)
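
EXAMPLES
     The -gd option expects user-supplied zzTRACEIN and
     zzTRACEOUT macros or functions whose only argument is a
     char * naming the current rule.  The following is a minimal
     sketch of such definitions, which record an indented trace
     in a buffer.  Only the two macro names and the argument type
     come from this page; the buffer, the helper trace_note, the
     driver run_traced_parse, and the hand-written expr and term
     functions standing in for generated parsing functions are
     all hypothetical, as is the exact placement of the calls.

```c
#include <stdio.h>
#include <string.h>

static char trace_log[256];  /* hypothetical buffer collecting the trace */
static int  trace_depth;     /* current rule nesting depth */

/* Append one indented line such as "  enter term" to trace_log. */
static void trace_note(const char *what, const char *rule)
{
    size_t used = strlen(trace_log);
    snprintf(trace_log + used, sizeof trace_log - used,
             "%*s%s %s\n", trace_depth * 2, "", what, rule);
}

/* User-supplied trace hooks; the single argument is the rule name. */
#define zzTRACEIN(rule) \
    do { trace_note("enter", (rule)); trace_depth++; } while (0)
#define zzTRACEOUT(rule) \
    do { trace_depth--; trace_note("exit", (rule)); } while (0)

/* Hand-written stand-ins for generated parsing functions (assumed shape). */
static void term(void)
{
    zzTRACEIN("term");
    /* ... the tokens of rule term would be matched here ... */
    zzTRACEOUT("term");
}

static void expr(void)
{
    zzTRACEIN("expr");
    term();
    zzTRACEOUT("expr");
}

/* Hypothetical driver: run the start rule and return the collected trace. */
const char *run_traced_parse(void)
{
    trace_log[0] = '\0';
    trace_depth = 0;
    expr();
    return trace_log;
}
```

     Printing the string returned by run_traced_parse() shows
     each rule entry and exit on its own line, with nested rules
     indented by two spaces per level.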