README
re2c
----

Version 0.9.1
Originally written by Peter Bumbulis (peterr (a] csg.uwaterloo.ca)
Currently maintained by Brian Young (bayoung (a] acm.org)

The re2c distribution can be found at:

http://www.tildeslash.org/re2c/index.html

The source distribution is available from:

http://www.tildeslash.org/re2c/re2c-0.9.1.tar.gz

This distribution, maintained by me (Brian Young), is a cleaned-up
version of the 0.5 release. Several bugs were fixed, and the code was
cleaned up to compile without warnings. It has been developed and
tested with egcs 1.0.2 and gcc 2.7.2.3 on Linux x86. Peter Bumbulis'
original release can be found at:

ftp://csg.uwaterloo.ca/pub/peterr/re2c.0.5.tar.gz

re2c is a great tool for writing fast and flexible lexers. It has
served many people well for many years and it deserves to be
maintained more actively. An re2c-based scanner is on the order of
2-3 times faster than a flex-based scanner, and its input model is
much more flexible.

Patches and requests for features will be entertained. Areas of
particular interest to me are porting (a Solaris and an NT
version will be forthcoming) and wide-character support. Note
that the code is already quite portable and should be buildable
on any platform with minor makefile changes.

Peter's original version 0.5 ANNOUNCE and README follow.

Brian

--

re2c is a tool for generating C-based recognizers from regular
expressions. re2c-based scanners are efficient: for programming
languages, given similar specifications, an re2c-based scanner is
typically almost twice as fast as a flex-based scanner with little or no
increase in size (possibly a decrease on CISC architectures). Indeed,
re2c-based scanners are quite competitive with hand-crafted ones.

Unlike flex, re2c does not generate complete scanners: the user must
supply some interface code. While this code is not bulky (about 50-100
lines for a flex-like scanner; see the man page and examples in the
distribution), careful coding is required for efficiency (and
correctness). One advantage of this arrangement is that the generated
code is not tied to any particular input model. For example,
re2c-generated code can be used to scan data from a null-byte terminated
buffer as illustrated below.

Given the following source

    #define NULL ((char*) 0)
    char *scan(char *p){
    char *q;
    #define YYCTYPE char
    #define YYCURSOR p
    #define YYLIMIT p
    #define YYMARKER q
    #define YYFILL(n)
    /*!re2c
            [0-9]+ {return YYCURSOR;}
            [\000-\377] {return NULL;}
    */
    }

re2c will generate

    /* Generated by re2c on Sat Apr 16 11:40:58 1994 */
    #line 1 "simple.re"
    #define NULL ((char*) 0)
    char *scan(char *p){
    char *q;
    #define YYCTYPE char
    #define YYCURSOR p
    #define YYLIMIT p
    #define YYMARKER q
    #define YYFILL(n)
    {
            YYCTYPE yych;
            unsigned int yyaccept;
            goto yy0;
    yy1:    ++YYCURSOR;
    yy0:
            if((YYLIMIT - YYCURSOR) < 2) YYFILL(2);
            yych = *YYCURSOR;
            if(yych <= '/') goto yy4;
            if(yych >= ':') goto yy4;
    yy2:    yych = *++YYCURSOR;
            goto yy7;
    yy3:
    #line 10
            {return YYCURSOR;}
    yy4:    yych = *++YYCURSOR;
    yy5:
    #line 11
            {return NULL;}
    yy6:    ++YYCURSOR;
            if(YYLIMIT == YYCURSOR) YYFILL(1);
            yych = *YYCURSOR;
    yy7:    if(yych <= '/') goto yy3;
            if(yych <= '9') goto yy6;
            goto yy3;
    }
    #line 12

    }

Note that most compilers will perform dead-code elimination to remove
all comparisons between YYCURSOR and YYLIMIT (both expand to p here, so
the results are known at compile time).
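
To show how a caller might drive such a scanner over a null-byte
terminated buffer, here is a minimal sketch. It is not part of the
distribution. The scan() below is a hand-written stand-in that follows
the same contract as the generated function (an assumption made so the
sketch compiles without running re2c), and count_numbers() is a
hypothetical helper introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hand-written stand-in for the generated scan() (an assumption, not
 * re2c output).  Same contract: return a pointer just past a leading
 * run of digits, or NULL if the first character is not a digit.
 */
const char *scan(const char *p)
{
    assert(p != NULL);
    if (*p < '0' || *p > '9')
        return NULL;            /* second rule: any other character */
    do {
        ++p;
    } while (*p >= '0' && *p <= '9');
    return p;                   /* first rule: [0-9]+ */
}

/*
 * Hypothetical helper: count the digit runs in a null-byte terminated
 * buffer.  The terminating null byte doubles as the end-of-input
 * marker, so no separate length or limit pointer is needed.
 */
size_t count_numbers(const char *buf)
{
    size_t n = 0;

    while (*buf != '\0') {
        const char *end = scan(buf);
        if (end != NULL) {
            ++n;
            buf = end;          /* resume right after the number */
        } else {
            ++buf;              /* skip one non-digit character */
        }
    }
    return n;
}
```

With these definitions, count_numbers("12 ab 345,6") returns 3. The
caller never consults a buffer limit, which is why defining YYLIMIT as
p and YYFILL(n) as empty is workable for this input model.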

re2c was developed for a particular project (constructing a fast REXX
scanner of all things!) and so while it has some rough edges, it should
be quite usable. More information about re2c can be found in the
(admittedly skimpy) man page; the algorithms and heuristics used are
described in an upcoming LOPLAS article (included in the distribution).
Probably the best way to find out more about re2c is to try the supplied
examples. re2c is written in C++, and is currently being developed
under Linux using gcc 2.5.8.

Peter

--

re2c is distributed with no warranty whatever. The code is certain to
contain errors. Neither the author nor any contributor takes
responsibility for any consequences of its use.

re2c is in the public domain. The data structures and algorithms used
in re2c are all either taken from documents available to the general
public or are inventions of the author. Programs generated by re2c may
be distributed freely. re2c itself may be distributed freely, in source
or binary, unchanged or modified. Distributors may charge whatever fees
they can obtain for re2c.

If you do make use of re2c, or incorporate it into a larger project, an
acknowledgement somewhere (documentation, research report, etc.) would
be appreciated.

Please send bug reports and feedback (including suggestions for
improving the distribution) to

peterr (a] csg.uwaterloo.ca

Include a small example and the banner from parser.y with bug reports.