README
This directory contains some examples illustrating techniques for extracting
high performance from flex scanners. Each program implements a simplified
version of the Unix "wc" tool: read text from stdin and print the number of
characters, words, and lines present in the text. All programs were compiled
using gcc (version unavailable, sorry) with the -O flag, and run on a
SPARCstation 1+. The input used was a PostScript file, mainly containing
figures, with the following "wc" counts:

	lines   words   characters
	214217  635954  2592172


The basic principles illustrated by these programs are:

	- match as much text with each rule as possible
	- adding rules does not slow you down!
	- avoid backing up

and the big caveat that comes with them is:

	- you buy performance with decreased maintainability; make
	  sure you really need it before applying the above techniques.

See the "Performance Considerations" section of flexdoc for more
details regarding these principles.


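To make "backing up" concrete: a scanner backs up when it reads past the end of its last complete match, hoping for a longer one, and then has to return those characters to the input. The sketch below is a hypothetical hand-coded C scanner with only two rules, "foo" and "foobar" (neither comes from the wc examples); longest_match(), next_state(), and is_accepting() are illustrative names, not flex internals. On the input "foobaz" it reads "fooba", gets stuck on 'z', and must back up two characters to the end of "foo".

```c
#include <assert.h>
#include <stddef.h>

/* Toy DFA for two hypothetical rules, "foo" and "foobar".
   State k means the first k characters of "foobar" have been read. */
static const char rule_text[] = "foobar";

static int next_state(int state, char c)
{
    if (state < 6 && c == rule_text[state])
        return state + 1;
    return -1;                          /* no transition: scanner is stuck */
}

static int is_accepting(int state)
{
    return state == 3 || state == 6;    /* after "foo" or after "foobar" */
}

/* Return the length of the longest rule match at the start of s
   (0 if none). *backed_up is set to the number of characters that
   were read past that match and must be given back to the input. */
size_t longest_match(const char *s, size_t *backed_up)
{
    int state = 0;
    size_t nread = 0, last_accept = 0;

    assert(s != NULL && backed_up != NULL);
    while (s[nread] != '\0') {
        int next = next_state(state, s[nread]);
        if (next < 0)
            break;
        state = next;
        nread++;
        if (is_accepting(state))
            last_accept = nread;        /* remember the last accepting position */
    }
    *backed_up = nread - last_accept;
    return last_accept;
}
```

With this sketch, longest_match("foobar", &n) and longest_match("foo!", &n) need no backing up, while longest_match("foobaz", &n) returns 3 with n set to 2. The per-character check for an accepting state is the kind of bookkeeping a scanner with no backing-up states can avoid.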
The different versions of "wc":

	mywc.c	a simple but fairly efficient C version

	wc1.l	a naive flex "wc" implementation

	wc2.l	somewhat faster; adds rules to match multiple tokens at once

	wc3.l	faster still; adds more rules to match longer runs of tokens

	wc4.l	fastest; still more rules added; hard to do much better
		using flex (or, I suspect, hand-coding)

	wc5.l	identical to wc3.l except one rule has been slightly
		shortened, introducing backing up

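For comparison with the flex versions, here is a minimal sketch of a straightforward C counter in the spirit of mywc.c. It is not the actual mywc.c, whose source is not reproduced here; struct wc_counts, wc_buffer(), and wc_stream() are hypothetical names. It reads input in large blocks with fread(), treats any isspace() character as a word separator, counts characters as bytes, and carries the in-word state across blocks so a word split between two reads is counted once.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical counters; field names are not taken from mywc.c. */
struct wc_counts {
    long chars, words, lines;
};

/* Count one block of input. *in_word carries the "inside a word"
   state across calls, so a word split between blocks is counted once. */
void wc_buffer(const char *buf, size_t n, struct wc_counts *c, int *in_word)
{
    assert(c != NULL && in_word != NULL && (buf != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        unsigned char ch = (unsigned char) buf[i];
        if (ch == '\n')
            c->lines++;
        if (isspace(ch)) {
            *in_word = 0;
        } else if (!*in_word) {
            *in_word = 1;               /* first character of a new word */
            c->words++;
        }
    }
    c->chars += (long) n;               /* characters counted as bytes */
}

/* Read fp in large blocks and return the totals. */
struct wc_counts wc_stream(FILE *fp)
{
    static char buf[1 << 16];
    struct wc_counts c = {0, 0, 0};
    int in_word = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        wc_buffer(buf, n, &c, &in_word);
    return c;
}
```

A caller would typically do "struct wc_counts c = wc_stream(stdin);" and then print c.lines, c.words, and c.chars in the same order as the counts table above.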
Timing results (all times in user CPU seconds):

	program   time   notes
	-------   ----   -----
	wc1       16.4   default flex table compression (= -Cem)
	wc1        6.7   -Cf compression option
	/bin/wc    5.8   Sun's standard "wc" tool
	mywc       4.6   simple but better C implementation!
	wc2        4.6   as good as the C implementation; built using -Cf
	wc3        3.8   -Cf
	wc4        3.3   -Cf
	wc5        5.7   -Cf; ouch, backing up is expensive
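The README does not say how these times were collected. As a hedged illustration of what "user CPU seconds" means, the sketch below reads the calling process's user CPU time with the POSIX getrusage() call; user_cpu_seconds() and time_user_cpu() are hypothetical helper names. User time excludes system time (for example, time the kernel spends servicing read() calls), so it mostly reflects the work done in the scanner or counting loop itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

/* User CPU time consumed so far by this process, in seconds.
   Returns -1.0 if getrusage() fails. */
double user_cpu_seconds(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;
    return (double) ru.ru_utime.tv_sec
         + (double) ru.ru_utime.tv_usec / 1e6;
}

/* Run fn once and return the user CPU seconds it consumed
   (hypothetical helper; resolution depends on the system). */
double time_user_cpu(void (*fn)(void))
{
    assert(fn != NULL);
    double start = user_cpu_seconds();
    fn();
    return user_cpu_seconds() - start;
}
```

Timing a whole program from the shell, for example with the "time" command, reports the same user/system split without modifying the program.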