      1 \input texinfo
      2 @c %**start of header
      3 @setfilename flex.info
      4 @settitle Flex - a scanner generator
      5 @c @finalout
      6 @c @setchapternewpage odd
      7 @c %**end of header
      8 
      9 @set EDITION 2.5
     10 @set UPDATED March 1995
     11 @set VERSION 2.5
     12 
     13 @c FIXME - Reread a printed copy with a red pen and patience.
     14 @c FIXME - Modify all "See ..." references and replace with @xref's.
     15 
     16 @ifinfo
     17 @format
     18 START-INFO-DIR-ENTRY
     19 * Flex: (flex).         A fast scanner generator.
     20 END-INFO-DIR-ENTRY
     21 @end format
     22 @end ifinfo
     23 
     24 @c Define new indices for commands, filenames, and options.
     25 @c @defcodeindex cm
     26 @c @defcodeindex fl
     27 @c @defcodeindex op
     28 
     29 @c Put everything in one index (arbitrarily chosen to be the concept index).
     30 @c @syncodeindex cm cp
     31 @c @syncodeindex fl cp
     32 @syncodeindex fn cp
     33 @syncodeindex ky cp
     34 @c @syncodeindex op cp
     35 @syncodeindex pg cp
     36 @syncodeindex vr cp
     37 
     38 @ifinfo
     39 This file documents Flex.
     40 
     41 Copyright (c) 1990 The Regents of the University of California.
     42 All rights reserved.
     43 
     44 This code is derived from software contributed to Berkeley by
     45 Vern Paxson.
     46 
     47 The United States Government has rights in this work pursuant
     48 to contract no. DE-AC03-76SF00098 between the United States
     49 Department of Energy and the University of California.
     50 
     51 Redistribution and use in source and binary forms with or without
     52 modification are permitted provided that: (1) source distributions
     53 retain this entire copyright notice and comment, and (2)
     54 distributions including binaries display the following
     55 acknowledgement:  ``This product includes software developed by the
     56 University of California, Berkeley and its contributors'' in the
     57 documentation or other materials provided with the distribution and
     58 in all advertising materials mentioning features or use of this
     59 software.  Neither the name of the University nor the names of its
     60 contributors may be used to endorse or promote products derived
     61 from this software without specific prior written permission.
     62 
     63 THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
     64 IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
     65 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
     66 PURPOSE.
     67 
     68 @ignore
     69 Permission is granted to process this file through TeX and print the
     70 results, provided the printed document carries copying permission
     71 notice identical to this one except for the removal of this paragraph
     72 (this paragraph not being relevant to the printed manual).
     73 
     74 @end ignore
     75 @end ifinfo
     76 
     77 @titlepage
     78 @title Flex, version @value{VERSION}
     79 @subtitle A fast scanner generator
     80 @subtitle Edition @value{EDITION}, @value{UPDATED}
     81 @author Vern Paxson
     82 
     83 @page
     84 @vskip 0pt plus 1filll
     85 Copyright @copyright{} 1990 The Regents of the University of California.
     86 All rights reserved.
     87 
     88 This code is derived from software contributed to Berkeley by
     89 Vern Paxson.
     90 
     91 The United States Government has rights in this work pursuant
     92 to contract no. DE-AC03-76SF00098 between the United States
     93 Department of Energy and the University of California.
     94 
     95 Redistribution and use in source and binary forms with or without
     96 modification are permitted provided that: (1) source distributions
     97 retain this entire copyright notice and comment, and (2)
     98 distributions including binaries display the following
     99 acknowledgement:  ``This product includes software developed by the
    100 University of California, Berkeley and its contributors'' in the
    101 documentation or other materials provided with the distribution and
    102 in all advertising materials mentioning features or use of this
    103 software.  Neither the name of the University nor the names of its
    104 contributors may be used to endorse or promote products derived
    105 from this software without specific prior written permission.
    106 
    107 THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
    108 IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
    109 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
    110 PURPOSE.
    111 @end titlepage
    112 
    113 @ifinfo
    114 
    115 @node Top, Name, (dir), (dir)
    116 @top flex
    117 
    118 @cindex scanner generator
    119 
    120 This manual documents @code{flex}.  It covers release @value{VERSION}.
    121 
    122 @menu
    123 * Name::                        Name
    124 * Synopsis::                    Synopsis
    125 * Overview::                    Overview
    126 * Description::                 Description
    127 * Examples::                    Some simple examples
    128 * Format::                      Format of the input file
    129 * Patterns::                    Patterns
    130 * Matching::                    How the input is matched
    131 * Actions::                     Actions
    132 * Generated scanner::           The generated scanner
    133 * Start conditions::            Start conditions
    134 * Multiple buffers::            Multiple input buffers
    135 * End-of-file rules::           End-of-file rules
    136 * Miscellaneous::               Miscellaneous macros
    137 * User variables::              Values available to the user
    138 * YACC interface::              Interfacing with @code{yacc}
    139 * Options::                     Options
    140 * Performance::                 Performance considerations
    141 * C++::                         Generating C++ scanners
    142 * Incompatibilities::           Incompatibilities with @code{lex} and POSIX
    143 * Diagnostics::                 Diagnostics
    144 * Files::                       Files
    145 * Deficiencies::                Deficiencies / Bugs
    146 * See also::                    See also
    147 * Author::                      Author
    148 @c * Index::                       Index
    149 @end menu
    150 
    151 @end ifinfo
    152 
    153 @node Name, Synopsis, Top, Top
    154 @section Name
    155 
    156 flex - fast lexical analyzer generator
    157 
    158 @node Synopsis, Overview, Name, Top
    159 @section Synopsis
    160 
    161 @example
    162 flex [-bcdfhilnpstvwBFILTV78+? -C[aefFmr] -ooutput -Pprefix -Sskeleton]
    163 [--help --version] [@var{filename} @dots{}]
    164 @end example
    165 
    166 @node Overview, Description, Synopsis, Top
    167 @section Overview
    168 
    169 This manual describes @code{flex}, a tool for generating programs
    170 that perform pattern-matching on text.  The manual
    171 includes both tutorial and reference sections:
    172 
    173 @table @asis
    174 @item Description
    175 a brief overview of the tool
    176 
    177 @item Some Simple Examples
    178 
    179 @item Format Of The Input File
    180 
    181 @item Patterns
    182 the extended regular expressions used by flex
    183 
    184 @item How The Input Is Matched
    185 the rules for determining what has been matched
    186 
    187 @item Actions
    188 how to specify what to do when a pattern is matched
    189 
    190 @item The Generated Scanner
    191 details regarding the scanner that flex produces;
    192 how to control the input source
    193 
    194 @item Start Conditions
    195 introducing context into your scanners, and
    196 managing "mini-scanners"
    197 
    198 @item Multiple Input Buffers
    199 how to manipulate multiple input sources; how to
    200 scan from strings instead of files
    201 
    202 @item End-of-file Rules
    203 special rules for matching the end of the input
    204 
    205 @item Miscellaneous Macros
    206 a summary of macros available to the actions
    207 
    208 @item Values Available To The User
    209 a summary of values available to the actions
    210 
    211 @item Interfacing With Yacc
    212 connecting flex scanners together with yacc parsers
    213 
    214 @item Options
    215 flex command-line options, and the "%option"
    216 directive
    217 
    218 @item Performance Considerations
    219 how to make your scanner go as fast as possible
    220 
    221 @item Generating C++ Scanners
    222 the (experimental) facility for generating C++
    223 scanner classes
    224 
    225 @item Incompatibilities With Lex And POSIX
    226 how flex differs from AT&T lex and the POSIX lex
    227 standard
    228 
    229 @item Diagnostics
    230 those error messages produced by flex (or scanners
    231 it generates) whose meanings might not be apparent
    232 
    233 @item Files
    234 files used by flex
    235 
    236 @item Deficiencies / Bugs
    237 known problems with flex
    238 
    239 @item See Also
    240 other documentation, related tools
    241 
    242 @item Author
    243 includes contact information
    244 @end table
    245 
    246 @node Description, Examples, Overview, Top
    247 @section Description
    248 
@code{flex} is a tool for generating @dfn{scanners}: programs which
recognize lexical patterns in text.  @code{flex} reads the given
    251 input files, or its standard input if no file names are
    252 given, for a description of a scanner to generate.  The
    253 description is in the form of pairs of regular expressions
    254 and C code, called @dfn{rules}. @code{flex} generates as output a C
    255 source file, @file{lex.yy.c}, which defines a routine @samp{yylex()}.
    256 This file is compiled and linked with the @samp{-lfl} library to
    257 produce an executable.  When the executable is run, it
    258 analyzes its input for occurrences of the regular
    259 expressions.  Whenever it finds one, it executes the
    260 corresponding C code.
    261 
    262 @node Examples, Format, Description, Top
    263 @section Some simple examples
    264 
First, some simple examples to give the flavor of how one
uses @code{flex}.  The following @code{flex} input specifies a scanner
which, whenever it encounters the string "username",
replaces it with the user's login name:
    269 
    270 @example
    271 %%
    272 username    printf( "%s", getlogin() );
    273 @end example
    274 
    275 By default, any text not matched by a @code{flex} scanner is
    276 copied to the output, so the net effect of this scanner is
    277 to copy its input file to its output with each occurrence
    278 of "username" expanded.  In this input, there is just one
    279 rule.  "username" is the @var{pattern} and the "printf" is the
    280 @var{action}.  The "%%" marks the beginning of the rules.
    281 
    282 Here's another simple example:
    283 
    284 @example
    285         int num_lines = 0, num_chars = 0;
    286 
    287 %%
    288 \n      ++num_lines; ++num_chars;
    289 .       ++num_chars;
    290 
    291 %%
int main( void )
        @{
        yylex();
        printf( "# of lines = %d, # of chars = %d\n",
                num_lines, num_chars );
        return 0;
        @}
    298 @end example
    299 
    300 This scanner counts the number of characters and the
    301 number of lines in its input (it produces no output other
    302 than the final report on the counts).  The first line
    303 declares two globals, "num_lines" and "num_chars", which
    304 are accessible both inside @samp{yylex()} and in the @samp{main()}
    305 routine declared after the second "%%".  There are two rules,
    306 one which matches a newline ("\n") and increments both the
    307 line count and the character count, and one which matches
    308 any character other than a newline (indicated by the "."
    309 regular expression).
    310 
    311 A somewhat more complicated example:
    312 
    313 @example
    314 /* scanner for a toy Pascal-like language */
    315 
    316 %@{
/* need this for the calls to atoi() and atof() below */
#include <stdlib.h>
    319 %@}
    320 
    321 DIGIT    [0-9]
    322 ID       [a-z][a-z0-9]*
    323 
    324 %%
    325 
    326 @{DIGIT@}+    @{
    327             printf( "An integer: %s (%d)\n", yytext,
    328                     atoi( yytext ) );
    329             @}
    330 
    331 @{DIGIT@}+"."@{DIGIT@}*        @{
    332             printf( "A float: %s (%g)\n", yytext,
    333                     atof( yytext ) );
    334             @}
    335 
    336 if|then|begin|end|procedure|function        @{
    337             printf( "A keyword: %s\n", yytext );
    338             @}
    339 
    340 @{ID@}        printf( "An identifier: %s\n", yytext );
    341 
    342 "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );
    343 
    344 "@{"[^@}\n]*"@}"     /* eat up one-line comments */
    345 
    346 [ \t\n]+          /* eat up whitespace */
    347 
    348 .           printf( "Unrecognized character: %s\n", yytext );
    349 
    350 %%
    351 
int main( int argc, char **argv )
    @{
    ++argv, --argc;  /* skip over program name */
    if ( argc > 0 )
            yyin = fopen( argv[0], "r" );
    else
            yyin = stdin;

    yylex();
    return 0;
    @}
    364 @end example
    365 
This is the beginning of a simple scanner for a language
    367 like Pascal.  It identifies different types of @var{tokens} and
    368 reports on what it has seen.
    369 
    370 The details of this example will be explained in the
    371 following sections.
    372 
    373 @node Format, Patterns, Examples, Top
    374 @section Format of the input file
    375 
    376 The @code{flex} input file consists of three sections, separated
    377 by a line with just @samp{%%} in it:
    378 
    379 @example
    380 definitions
    381 %%
    382 rules
    383 %%
    384 user code
    385 @end example
    386 
    387 The @dfn{definitions} section contains declarations of simple
    388 @dfn{name} definitions to simplify the scanner specification,
    389 and declarations of @dfn{start conditions}, which are explained
    390 in a later section.
    391 Name definitions have the form:
    392 
    393 @example
    394 name definition
    395 @end example
    396 
    397 The "name" is a word beginning with a letter or an
    398 underscore ('_') followed by zero or more letters, digits, '_',
    399 or '-' (dash).  The definition is taken to begin at the
    400 first non-white-space character following the name and
    401 continuing to the end of the line.  The definition can
    402 subsequently be referred to using "@{name@}", which will
    403 expand to "(definition)".  For example,
    404 
    405 @example
    406 DIGIT    [0-9]
    407 ID       [a-z][a-z0-9]*
    408 @end example
    409 
    410 @noindent
    411 defines "DIGIT" to be a regular expression which matches a
    412 single digit, and "ID" to be a regular expression which
    413 matches a letter followed by zero-or-more
    414 letters-or-digits.  A subsequent reference to
    415 
    416 @example
    417 @{DIGIT@}+"."@{DIGIT@}*
    418 @end example
    419 
    420 @noindent
    421 is identical to
    422 
    423 @example
    424 ([0-9])+"."([0-9])*
    425 @end example
    426 
    427 @noindent
    428 and matches one-or-more digits followed by a '.' followed
    429 by zero-or-more digits.
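
The expansion rule can be sketched in plain C.  This is only an
illustration, not code taken from @code{flex}: the @code{name_def}
structure and the @code{expand_names()} function are hypothetical, and
the sketch ignores quoting and definitions that refer to other
definitions.  It shows that each @{name@} reference is replaced by the
definition wrapped in parentheses, while other brace groups such as
repeat counts are left alone.

@verbatim
```c
#include <stddef.h>
#include <string.h>

/* One name definition from the definitions section (hypothetical type). */
struct name_def {
    const char *name;
    const char *definition;
};

/*
 * Copy `pattern` into `out`, replacing each {name} found in `defs`
 * with "(definition)".  Text in braces that is not a known name, such
 * as the repeat count in r{2,5}, is copied unchanged.  Returns 0 on
 * success and -1 if the result does not fit in `out`.
 */
int expand_names(const char *pattern, const struct name_def *defs,
                 size_t ndefs, char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0)
        return -1;

    while (*pattern != '\0') {
        const char *close = (*pattern == '{') ? strchr(pattern, '}') : NULL;
        const char *text = pattern;   /* default: copy one character */
        size_t len = 1;
        int parenthesize = 0;

        if (close != NULL) {
            size_t namelen = (size_t)(close - pattern - 1);
            for (size_t i = 0; i < ndefs; i++) {
                if (strlen(defs[i].name) == namelen &&
                    strncmp(defs[i].name, pattern + 1, namelen) == 0) {
                    text = defs[i].definition;
                    len = strlen(text);
                    parenthesize = 1;
                    break;
                }
            }
        }

        /* Room for the text, optional parentheses, and the final NUL. */
        if (used + len + 2 * (size_t)parenthesize + 1 > outsize)
            return -1;
        if (parenthesize)
            out[used++] = '(';
        memcpy(out + used, text, len);
        used += len;
        if (parenthesize)
            out[used++] = ')';

        pattern = parenthesize ? close + 1 : pattern + 1;
    }
    out[used] = '\0';
    return 0;
}
```
@end verbatim

The parentheses matter because they keep a following operator such as
@samp{+} applied to the whole definition rather than to its last element.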
    430 
    431 The @var{rules} section of the @code{flex} input contains a series of
    432 rules of the form:
    433 
    434 @example
    435 pattern   action
    436 @end example
    437 
    438 @noindent
    439 where the pattern must be unindented and the action must
    440 begin on the same line.
    441 
    442 See below for a further description of patterns and
    443 actions.
    444 
    445 Finally, the user code section is simply copied to
    446 @file{lex.yy.c} verbatim.  It is used for companion routines
    447 which call or are called by the scanner.  The presence of
    448 this section is optional; if it is missing, the second @samp{%%}
    449 in the input file may be skipped, too.
    450 
    451 In the definitions and rules sections, any @emph{indented} text or
    452 text enclosed in @samp{%@{} and @samp{%@}} is copied verbatim to the
    453 output (with the @samp{%@{@}}'s removed).  The @samp{%@{@}}'s must
    454 appear unindented on lines by themselves.
    455 
    456 In the rules section, any indented or %@{@} text appearing
    457 before the first rule may be used to declare variables
    458 which are local to the scanning routine and (after the
    459 declarations) code which is to be executed whenever the
    460 scanning routine is entered.  Other indented or %@{@} text
    461 in the rule section is still copied to the output, but its
    462 meaning is not well-defined and it may well cause
    463 compile-time errors (this feature is present for @code{POSIX} compliance;
    464 see below for other such features).
    465 
    466 In the definitions section (but not in the rules section),
    467 an unindented comment (i.e., a line beginning with "/*")
    468 is also copied verbatim to the output up to the next "*/".
    469 
    470 @node Patterns, Matching, Format, Top
    471 @section Patterns
    472 
    473 The patterns in the input are written using an extended
    474 set of regular expressions.  These are:
    475 
    476 @table @samp
    477 @item x
    478 match the character @samp{x}
    479 @item .
    480 any character (byte) except newline
    481 @item [xyz]
    482 a "character class"; in this case, the pattern
    483 matches either an @samp{x}, a @samp{y}, or a @samp{z}
    484 @item [abj-oZ]
    485 a "character class" with a range in it; matches
    486 an @samp{a}, a @samp{b}, any letter from @samp{j} through @samp{o},
    487 or a @samp{Z}
    488 @item [^A-Z]
    489 a "negated character class", i.e., any character
    490 but those in the class.  In this case, any
    491 character EXCEPT an uppercase letter.
    492 @item [^A-Z\n]
    493 any character EXCEPT an uppercase letter or
    494 a newline
    495 @item @var{r}*
    496 zero or more @var{r}'s, where @var{r} is any regular expression
    497 @item @var{r}+
    498 one or more @var{r}'s
    499 @item @var{r}?
    500 zero or one @var{r}'s (that is, "an optional @var{r}")
    501 @item @var{r}@{2,5@}
    502 anywhere from two to five @var{r}'s
    503 @item @var{r}@{2,@}
    504 two or more @var{r}'s
    505 @item @var{r}@{4@}
    506 exactly 4 @var{r}'s
    507 @item @{@var{name}@}
    508 the expansion of the "@var{name}" definition
    509 (see above)
    510 @item "[xyz]\"foo"
    511 the literal string: @samp{[xyz]"foo}
    512 @item \@var{x}
    513 if @var{x} is an @samp{a}, @samp{b}, @samp{f}, @samp{n}, @samp{r}, @samp{t}, or @samp{v},
    514 then the ANSI-C interpretation of \@var{x}.
    515 Otherwise, a literal @samp{@var{x}} (used to escape
    516 operators such as @samp{*})
    517 @item \0
    518 a NUL character (ASCII code 0)
    519 @item \123
    520 the character with octal value 123
    521 @item \x2a
    522 the character with hexadecimal value @code{2a}
    523 @item (@var{r})
    524 match an @var{r}; parentheses are used to override
    525 precedence (see below)
    526 @item @var{r}@var{s}
    527 the regular expression @var{r} followed by the
    528 regular expression @var{s}; called "concatenation"
    529 @item @var{r}|@var{s}
    530 either an @var{r} or an @var{s}
    531 @item @var{r}/@var{s}
    532 an @var{r} but only if it is followed by an @var{s}.  The text
    533 matched by @var{s} is included when determining whether this rule is
    534 the @dfn{longest match}, but is then returned to the input before
    535 the action is executed.  So the action only sees the text matched
    536 by @var{r}.  This type of pattern is called @dfn{trailing context}.
    537 (There are some combinations of @samp{@var{r}/@var{s}} that @code{flex}
    538 cannot match correctly; see notes in the Deficiencies / Bugs section
    539 below regarding "dangerous trailing context".)
@item ^@var{r}
an @var{r}, but only at the beginning of a line (i.e.,
when just starting to scan, or right after a
newline has been scanned).
    544 @item @var{r}$
    545 an @var{r}, but only at the end of a line (i.e., just
    546 before a newline).  Equivalent to "@var{r}/\n".
    547 
    548 Note that flex's notion of "newline" is exactly
    549 whatever the C compiler used to compile flex
    550 interprets '\n' as; in particular, on some DOS
    551 systems you must either filter out \r's in the
    552 input yourself, or explicitly use @var{r}/\r\n for "r$".
    553 @item <@var{s}>@var{r}
    554 an @var{r}, but only in start condition @var{s} (see
    555 below for discussion of start conditions)
@item <@var{s1},@var{s2},@var{s3}>@var{r}
    557 same, but in any of start conditions @var{s1},
    558 @var{s2}, or @var{s3}
    559 @item <*>@var{r}
    560 an @var{r} in any start condition, even an exclusive one.
    561 @item <<EOF>>
    562 an end-of-file
@item <@var{s1},@var{s2}><<EOF>>
    564 an end-of-file when in start condition @var{s1} or @var{s2}
    565 @end table
    566 
    567 Note that inside of a character class, all regular
    568 expression operators lose their special meaning except escape
    569 ('\') and the character class operators, '-', ']', and, at
    570 the beginning of the class, '^'.
    571 
    572 The regular expressions listed above are grouped according
    573 to precedence, from highest precedence at the top to
    574 lowest at the bottom.  Those grouped together have equal
    575 precedence.  For example,
    576 
    577 @example
    578 foo|bar*
    579 @end example
    580 
    581 @noindent
    582 is the same as
    583 
    584 @example
    585 (foo)|(ba(r*))
    586 @end example
    587 
    588 @noindent
    589 since the '*' operator has higher precedence than
    590 concatenation, and concatenation higher than alternation ('|').
    591 This pattern therefore matches @emph{either} the string "foo" @emph{or}
    592 the string "ba" followed by zero-or-more r's.  To match
    593 "foo" or zero-or-more "bar"'s, use:
    594 
    595 @example
    596 foo|(bar)*
    597 @end example
    598 
    599 @noindent
    600 and to match zero-or-more "foo"'s-or-"bar"'s:
    601 
    602 @example
    603 (foo|bar)*
    604 @end example
    605 
    606 In addition to characters and ranges of characters,
    607 character classes can also contain character class
@dfn{expressions}.  These are expressions enclosed inside @samp{[:} and @samp{:]}
    609 delimiters (which themselves must appear between the '['
    610 and ']' of the character class; other elements may occur
    611 inside the character class, too).  The valid expressions
    612 are:
    613 
    614 @example
    615 [:alnum:] [:alpha:] [:blank:]
    616 [:cntrl:] [:digit:] [:graph:]
    617 [:lower:] [:print:] [:punct:]
    618 [:space:] [:upper:] [:xdigit:]
    619 @end example
    620 
    621 These expressions all designate a set of characters
    622 equivalent to the corresponding standard C @samp{isXXX} function.  For
example, @samp{[:alnum:]} designates those characters for which
@samp{isalnum()} returns true, i.e., any alphabetic or numeric character.
    625 Some systems don't provide @samp{isblank()}, so flex defines
    626 @samp{[:blank:]} as a blank or a tab.
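
The correspondence with the C library can be sketched as follows.  This
is a minimal illustration rather than code from @code{flex}; the function
name @code{in_class_expr()} is hypothetical, and the results depend on
the current C locale.  It maps each expression name to the matching
@samp{isXXX} function and treats @samp{[:blank:]} as a blank or a tab,
as described above.

@verbatim
```c
#include <ctype.h>
#include <string.h>

/*
 * Return 1 if character `c` belongs to the class expression `name`
 * (written without the "[:" and ":]" delimiters), 0 if it does not,
 * and -1 if `name` is not one of the twelve valid expressions.
 */
int in_class_expr(const char *name, int c)
{
    /* The isXXX() functions require a value representable as unsigned char. */
    unsigned char ch = (unsigned char)c;

    if (strcmp(name, "alnum") == 0)  return isalnum(ch) != 0;
    if (strcmp(name, "alpha") == 0)  return isalpha(ch) != 0;
    /* Defined directly, since some systems lack isblank(). */
    if (strcmp(name, "blank") == 0)  return ch == ' ' || ch == '\t';
    if (strcmp(name, "cntrl") == 0)  return iscntrl(ch) != 0;
    if (strcmp(name, "digit") == 0)  return isdigit(ch) != 0;
    if (strcmp(name, "graph") == 0)  return isgraph(ch) != 0;
    if (strcmp(name, "lower") == 0)  return islower(ch) != 0;
    if (strcmp(name, "print") == 0)  return isprint(ch) != 0;
    if (strcmp(name, "punct") == 0)  return ispunct(ch) != 0;
    if (strcmp(name, "space") == 0)  return isspace(ch) != 0;
    if (strcmp(name, "upper") == 0)  return isupper(ch) != 0;
    if (strcmp(name, "xdigit") == 0) return isxdigit(ch) != 0;
    return -1;
}
```
@end verbatim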
    627 
    628 For example, the following character classes are all
    629 equivalent:
    630 
    631 @example
    632 [[:alnum:]]
[[:alpha:][:digit:]]
    634 [[:alpha:]0-9]
    635 [a-zA-Z0-9]
    636 @end example
    637 
    638 If your scanner is case-insensitive (the @samp{-i} flag), then
    639 @samp{[:upper:]} and @samp{[:lower:]} are equivalent to @samp{[:alpha:]}.
    640 
    641 Some notes on patterns:
    642 
    643 @itemize -
    644 @item
    645 A negated character class such as the example
    646 "[^A-Z]" above @emph{will match a newline} unless "\n" (or an
    647 equivalent escape sequence) is one of the
    648 characters explicitly present in the negated character
    649 class (e.g., "[^A-Z\n]").  This is unlike how many
    650 other regular expression tools treat negated
    651 character classes, but unfortunately the inconsistency
    652 is historically entrenched.  Matching newlines
    653 means that a pattern like [^"]* can match the
    654 entire input unless there's another quote in the
    655 input.
    656 
    657 @item
    658 A rule can have at most one instance of trailing
    659 context (the '/' operator or the '$' operator).
    660 The start condition, '^', and "<<EOF>>" patterns
    661 can only occur at the beginning of a pattern, and,
    662 as well as with '/' and '$', cannot be grouped
    663 inside parentheses.  A '^' which does not occur at
    664 the beginning of a rule or a '$' which does not
    665 occur at the end of a rule loses its special
    666 properties and is treated as a normal character.
    667 
    668 The following are illegal:
    669 
    670 @example
    671 foo/bar$
    672 <sc1>foo<sc2>bar
    673 @end example
    674 
Note that the first of these can be written
"foo/bar\n".
    677 
    678 The following will result in '$' or '^' being
    679 treated as a normal character:
    680 
    681 @example
    682 foo|(bar$)
    683 foo|^bar
    684 @end example
    685 
    686 If what's wanted is a "foo" or a
    687 bar-followed-by-a-newline, the following could be used (the special
    688 '|' action is explained below):
    689 
    690 @example
    691 foo      |
    692 bar$     /* action goes here */
    693 @end example
    694 
    695 A similar trick will work for matching a foo or a
    696 bar-at-the-beginning-of-a-line.
    697 @end itemize
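
The first note above can be checked outside @code{flex}.  The following
C sketch is not part of @code{flex}; it uses the standard
@code{strcspn()} function to measure how much text the patterns
@samp{[^"]*} and @samp{[^"\n]*} would match at the start of a string.
Both function names are made up for this illustration.

@verbatim
```c
#include <string.h>

/* Length of the text that [^"]* matches at the start of `s`. */
size_t match_not_quote(const char *s)
{
    /* Newlines are not excluded, so the match runs across lines. */
    return strcspn(s, "\"");
}

/* Length of the text that [^"\n]* matches at the start of `s`. */
size_t match_not_quote_or_newline(const char *s)
{
    /* Listing \n in the negated class stops the match at end of line. */
    return strcspn(s, "\"\n");
}
```
@end verbatim

With two lines of text followed by a quoted word, the first function
counts both lines and their newlines, while the second stops at the end
of the first line.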
    698 
    699 @node Matching, Actions, Patterns, Top
    700 @section How the input is matched
    701 
    702 When the generated scanner is run, it analyzes its input
    703 looking for strings which match any of its patterns.  If
    704 it finds more than one match, it takes the one matching
    705 the most text (for trailing context rules, this includes
    706 the length of the trailing part, even though it will then
    707 be returned to the input).  If it finds two or more
    708 matches of the same length, the rule listed first in the
    709 @code{flex} input file is chosen.
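
The selection rule can be sketched in C as follows.  This is a
simplified model, not the algorithm the generated scanner actually
runs: it assumes the length matched by each rule is already known, and
the function name @code{choose_rule()} is hypothetical.  A length of
zero is taken to mean that the rule does not match.

@verbatim
```c
#include <stddef.h>

/*
 * match_len[i] is the number of characters rule i matches at the
 * current input position (0 if it does not match), counting any
 * trailing context.  Rules are indexed in the order in which they
 * appear in the input file.  Returns the index of the chosen rule,
 * or -1 if no rule matches.
 */
int choose_rule(const size_t *match_len, size_t nrules)
{
    int best = -1;
    size_t best_len = 0;

    for (size_t i = 0; i < nrules; i++) {
        /* Only a strictly longer match replaces an earlier rule,
           so ties go to the rule listed first. */
        if (match_len[i] > best_len) {
            best = (int)i;
            best_len = match_len[i];
        }
    }
    return best;
}
```
@end verbatim

In the toy Pascal scanner shown earlier, the input "if" is matched with
equal length by the keyword rule and the identifier rule, so the keyword
rule wins because it is listed first.  The input "ifx" goes to the
identifier rule, which matches more text.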
    710 
    711 Once the match is determined, the text corresponding to
    712 the match (called the @var{token}) is made available in the
    713 global character pointer @code{yytext}, and its length in the
    714 global integer @code{yyleng}.  The @var{action} corresponding to the
    715 matched pattern is then executed (a more detailed
    716 description of actions follows), and then the remaining input is
    717 scanned for another match.
    718 
    719 If no match is found, then the @dfn{default rule} is executed:
    720 the next character in the input is considered matched and
    721 copied to the standard output.  Thus, the simplest legal
    722 @code{flex} input is:
    723 
    724 @example
    725 %%
    726 @end example
    727 
    728 which generates a scanner that simply copies its input
    729 (one character at a time) to its output.
    730 
    731 Note that @code{yytext} can be defined in two different ways:
    732 either as a character @emph{pointer} or as a character @emph{array}.
    733 You can control which definition @code{flex} uses by including
    734 one of the special directives @samp{%pointer} or @samp{%array} in the
    735 first (definitions) section of your flex input.  The
    736 default is @samp{%pointer}, unless you use the @samp{-l} lex
    737 compatibility option, in which case @code{yytext} will be an array.  The
    738 advantage of using @samp{%pointer} is substantially faster
    739 scanning and no buffer overflow when matching very large
    740 tokens (unless you run out of dynamic memory).  The
    741 disadvantage is that you are restricted in how your actions can
modify @code{yytext} (see the next section), and calls to the
@samp{unput()} function destroy the present contents of @code{yytext},
    744 which can be a considerable porting headache when moving
    745 between different @code{lex} versions.
    746 
    747 The advantage of @samp{%array} is that you can then modify @code{yytext}
    748 to your heart's content, and calls to @samp{unput()} do not
    749 destroy @code{yytext} (see below).  Furthermore, existing @code{lex}
    750 programs sometimes access @code{yytext} externally using
    751 declarations of the form:
    752 @example
    753 extern char yytext[];
    754 @end example
This declaration is erroneous when used with @samp{%pointer}, but
    756 correct for @samp{%array}.
    757 
    758 @samp{%array} defines @code{yytext} to be an array of @code{YYLMAX} characters,
    759 which defaults to a fairly large value.  You can change
    760 the size by simply #define'ing @code{YYLMAX} to a different value
    761 in the first section of your @code{flex} input.  As mentioned
    762 above, with @samp{%pointer} yytext grows dynamically to
    763 accommodate large tokens.  While this means your @samp{%pointer} scanner
    764 can accommodate very large tokens (such as matching entire
    765 blocks of comments), bear in mind that each time the
    766 scanner must resize @code{yytext} it also must rescan the entire
    767 token from the beginning, so matching such tokens can
    768 prove slow.  @code{yytext} presently does @emph{not} dynamically grow if
    769 a call to @samp{unput()} results in too much text being pushed
    770 back; instead, a run-time error results.
    771 
    772 Also note that you cannot use @samp{%array} with C++ scanner
    773 classes (the @code{c++} option; see below).
    774 
    775 @node Actions, Generated scanner, Matching, Top
    776 @section Actions
    777 
    778 Each pattern in a rule has a corresponding action, which
    779 can be any arbitrary C statement.  The pattern ends at the
    780 first non-escaped whitespace character; the remainder of
    781 the line is its action.  If the action is empty, then when
    782 the pattern is matched the input token is simply
    783 discarded.  For example, here is the specification for a
    784 program which deletes all occurrences of "zap me" from its
    785 input:
    786 
    787 @example
    788 %%
    789 "zap me"
    790 @end example
    791 
    792 (It will copy all other characters in the input to the
    793 output since they will be matched by the default rule.)
    794 
    795 Here is a program which compresses multiple blanks and
    796 tabs down to a single blank, and throws away whitespace
    797 found at the end of a line:
    798 
    799 @example
    800 %%
    801 [ \t]+        putchar( ' ' );
    802 [ \t]+$       /* ignore this token */
    803 @end example
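
To see why the second rule takes effect at the end of a line, recall
that trailing context counts toward the length of a match, so
@samp{[ \t]+$} beats @samp{[ \t]+} whenever a newline follows the run of
blanks.  The C function below is a hand-written sketch of the behavior
one would expect from that scanner, not output produced by @code{flex};
the name @code{squeeze_blanks()} is hypothetical, and it works on a
string rather than on @code{yyin}.

@verbatim
```c
/*
 * Write to `out` what the scanner above would print for the
 * NUL-terminated string `in`.  `out` must have room for
 * strlen(in) + 1 bytes; the output is never longer than the input.
 */
void squeeze_blanks(const char *in, char *out)
{
    while (*in != '\0') {
        if (*in == ' ' || *in == '\t') {
            /* Consume the longest run of blanks and tabs. */
            while (*in == ' ' || *in == '\t')
                in++;
            /* "[ \t]+$" wins only when a newline follows the run;
               otherwise "[ \t]+" prints a single blank. */
            if (*in != '\n')
                *out++ = ' ';
        } else {
            /* Default rule: copy the character unchanged. */
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```
@end verbatim

Note that blanks at the very end of the input, with no newline after
them, are still reduced to one blank rather than removed, because
@samp{$} requires a following newline.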
    804 
If the action contains a @samp{@{}, then the action extends until
the balancing @samp{@}} is found, and may span multiple lines.
@code{flex} knows about C strings and comments and won't be fooled
by braces found within them.  It also allows an action to begin with
@samp{%@{}, in which case the action is all the text up to the next
@samp{%@}} (regardless of ordinary braces inside the action).
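
As a brief sketch of a multi-line action (the pattern and the message
text are illustrative only), the action below extends to its balancing
brace, and the brace inside the string literal does not end it:

@example
%%
[a-z]+    @{
          /* the closing brace in the string below is ignored */
          printf( "word @} %s\n", yytext );
          @}
@end example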
    812 
An action consisting solely of a vertical bar (@samp{|}) means
"same as the action for the next rule."  See the @code{REJECT}
example below for an illustration.
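
For instance, in the following sketch (the patterns are illustrative),
integers and decimal numbers share a single action:

@example
%%
[0-9]+            |
[0-9]+"."[0-9]+   printf( "number: %s\n", yytext );
@end example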
    816 
    817 Actions can include arbitrary C code, including @code{return}
    818 statements to return a value to whatever routine called
    819 @samp{yylex()}.  Each time @samp{yylex()} is called it continues
    820 processing tokens from where it last left off until it either
    821 reaches the end of the file or executes a return.
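
A minimal sketch of this calling pattern follows.  The token codes
@code{NUMBER} and @code{WORD} are hypothetical names chosen for the
illustration; a real scanner would normally take them from its parser.
Each call to @samp{yylex()} resumes after the previously returned token,
and the loop ends when @samp{yylex()} returns 0 at end-of-file:

@example
%@{
/* hypothetical token codes, for illustration only */
#define NUMBER 257
#define WORD   258
%@}
%option noyywrap
%%
[0-9]+      return NUMBER;
[a-z]+      return WORD;
.|\n        /* skip everything else */
%%
int main( void )
    @{
    int tok;

    while ( (tok = yylex()) != 0 )
        printf( "token %d: %s\n", tok, yytext );
    return 0;
    @}
@end example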
    822 
    823 Actions are free to modify @code{yytext} except for lengthening
    824 it (adding characters to its end--these will overwrite
    825 later characters in the input stream).  This however does
    826 not apply when using @samp{%array} (see above); in that case,
    827 @code{yytext} may be freely modified in any way.
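
As an illustration of a safe in-place modification (a sketch only), the
following action upper-cases each word before echoing it.  It changes
characters within the token but never writes past its end:

@example
%@{
#include <ctype.h>
%@}
%%
[a-z]+    @{
          int i;

          for ( i = 0; i < yyleng; ++i )
              yytext[i] = toupper( (unsigned char) yytext[i] );
          ECHO;
          @}
@end example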
    828 
    829 Actions are free to modify @code{yyleng} except they should not
    830 do so if the action also includes use of @samp{yymore()} (see
    831 below).
    832 
    833 There are a number of special directives which can be
    834 included within an action:
    835 
    836 @itemize -
    837 @item
@samp{ECHO} copies @code{yytext} to the scanner's output.
    839 
    840 @item
    841 @code{BEGIN} followed by the name of a start condition
    842 places the scanner in the corresponding start
    843 condition (see below).
    844 
    845 @item
    846 @code{REJECT} directs the scanner to proceed on to the
    847 "second best" rule which matched the input (or a
    848 prefix of the input).  The rule is chosen as
    849 described above in "How the Input is Matched", and
    850 @code{yytext} and @code{yyleng} set up appropriately.  It may
    851 either be one which matched as much text as the
    852 originally chosen rule but came later in the @code{flex}
    853 input file, or one which matched less text.  For
    854 example, the following will both count the words in
the input and call the routine @samp{special()} whenever
    856 "frob" is seen:
    857 
    858 @example
    859         int word_count = 0;
    860 %%
    861 
    862 frob        special(); REJECT;
    863 [^ \t\n]+   ++word_count;
    864 @end example
    865 
Without the @code{REJECT}, occurrences of "frob" in the input would
not be counted as words, since the scanner normally
executes only one action per token.  Multiple uses of
@code{REJECT} are allowed, each one finding the next
best choice to the currently active rule.  For
example, when the following scanner scans the token
"abcd", it will write "abcdabcaba" to the output:
    873 
    874 @example
    875 %%
    876 a        |
    877 ab       |
    878 abc      |
    879 abcd     ECHO; REJECT;
    880 .|\n     /* eat up any unmatched character */
    881 @end example
    882 
    883 (The first three rules share the fourth's action
    884 since they use the special '|' action.)  @code{REJECT} is
    885 a particularly expensive feature in terms of
    886 scanner performance; if it is used in @emph{any} of the
    887 scanner's actions it will slow down @emph{all} of the
    888 scanner's matching.  Furthermore, @code{REJECT} cannot be used
    889 with the @samp{-Cf} or @samp{-CF} options (see below).
    890 
    891 Note also that unlike the other special actions,
    892 @code{REJECT} is a @emph{branch}; code immediately following it
    893 in the action will @emph{not} be executed.
    894 
    895 @item
    896 @samp{yymore()} tells the scanner that the next time it
    897 matches a rule, the corresponding token should be
    898 @emph{appended} onto the current value of @code{yytext} rather
    899 than replacing it.  For example, given the input
    900 "mega-kludge" the following will write
    901 "mega-mega-kludge" to the output:
    902 
    903 @example
    904 %%
    905 mega-    ECHO; yymore();
    906 kludge   ECHO;
    907 @end example
    908 
    909 First "mega-" is matched and echoed to the output.
    910 Then "kludge" is matched, but the previous "mega-"
    911 is still hanging around at the beginning of @code{yytext}
    912 so the @samp{ECHO} for the "kludge" rule will actually
    913 write "mega-kludge".
    914 @end itemize
    915 
    916 Two notes regarding use of @samp{yymore()}.  First, @samp{yymore()}
    917 depends on the value of @code{yyleng} correctly reflecting the
    918 size of the current token, so you must not modify @code{yyleng}
    919 if you are using @samp{yymore()}.  Second, the presence of
    920 @samp{yymore()} in the scanner's action entails a minor
    921 performance penalty in the scanner's matching speed.
    922 
    923 @itemize -
    924 @item
    925 @samp{yyless(n)} returns all but the first @var{n} characters of
    926 the current token back to the input stream, where
    927 they will be rescanned when the scanner looks for
the next match.  @code{yytext} and @code{yyleng} are adjusted
appropriately (e.g., @code{yyleng} will now be equal to @var{n}).
For example, on the input "foobar" the
following will write out "foobarbar":
    932 
    933 @example
    934 %%
    935 foobar    ECHO; yyless(3);
    936 [a-z]+    ECHO;
    937 @end example
    938 
    939 An argument of 0 to @code{yyless} will cause the entire
    940 current input string to be scanned again.  Unless
    941 you've changed how the scanner will subsequently
    942 process its input (using @code{BEGIN}, for example), this
    943 will result in an endless loop.
    944 
    945 Note that @code{yyless} is a macro and can only be used in the
    946 flex input file, not from other source files.
    947 
    948 @item
    949 @samp{unput(c)} puts the character @code{c} back onto the input
    950 stream.  It will be the next character scanned.
    951 The following action will take the current token
    952 and cause it to be rescanned enclosed in
    953 parentheses.
    954 
    955 @example
    956 @{
    957 int i;
    958 /* Copy yytext because unput() trashes yytext */
    959 char *yycopy = strdup( yytext );
    960 unput( ')' );
    961 for ( i = yyleng - 1; i >= 0; --i )
    962     unput( yycopy[i] );
    963 unput( '(' );
    964 free( yycopy );
    965 @}
    966 @end example
    967 
Note that since each @samp{unput()} puts the given
character back at the @emph{beginning} of the input stream,
pushing back strings must be done back-to-front.

An important potential problem when using @samp{unput()} is that
if you are using @samp{%pointer} (the default), a call to @samp{unput()}
@emph{destroys} the contents of @code{yytext}, starting with its
rightmost character and devouring one character to the left
with each call.  If you need the value of @code{yytext} preserved
after a call to @samp{unput()} (as in the above example), you
must either first copy it elsewhere, or build your scanner
using @samp{%array} instead (see How The Input Is Matched).
    979 
    980 Finally, note that you cannot put back @code{EOF} to attempt to
    981 mark the input stream with an end-of-file.
    982 
    983 @item
    984 @samp{input()} reads the next character from the input
    985 stream.  For example, the following is one way to
    986 eat up C comments:
    987 
    988 @example
    989 %%
    990 "/*"        @{
    991             register int c;
    992 
    993             for ( ; ; )
    994                 @{
    995                 while ( (c = input()) != '*' &&
    996                         c != EOF )
    997                     ;    /* eat up text of comment */
    998 
    999                 if ( c == '*' )
   1000                     @{
   1001                     while ( (c = input()) == '*' )
   1002                         ;
   1003                     if ( c == '/' )
   1004                         break;    /* found the end */
   1005                     @}
   1006 
   1007                 if ( c == EOF )
   1008                     @{
   1009                     error( "EOF in comment" );
   1010                     break;
   1011                     @}
   1012                 @}
   1013             @}
   1014 @end example
   1015 
   1016 (Note that if the scanner is compiled using @samp{C++},
   1017 then @samp{input()} is instead referred to as @samp{yyinput()},
   1018 in order to avoid a name clash with the @samp{C++} stream
   1019 by the name of @code{input}.)
   1020 
@item
@code{YY_FLUSH_BUFFER} flushes the scanner's internal buffer so that
the next time the scanner attempts to match a token, it will first
refill the buffer using @code{YY_INPUT} (see The Generated Scanner,
below).  This action is a special case of the more general
@samp{yy_flush_buffer()} function, described below in the section
Multiple Input Buffers.
   1027 
   1028 @item
   1029 @samp{yyterminate()} can be used in lieu of a return
   1030 statement in an action.  It terminates the scanner
   1031 and returns a 0 to the scanner's caller, indicating
   1032 "all done".  By default, @samp{yyterminate()} is also
   1033 called when an end-of-file is encountered.  It is a
   1034 macro and may be redefined.
   1035 @end itemize
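
As a small sketch of @samp{yyterminate()} in use, the rule below stops
the scanner when it sees an end marker on a line of its own.  The
marker text @samp{__END__} is a hypothetical convention chosen for the
illustration:

@example
%%
^"__END__"\n    yyterminate();   /* yylex() returns 0 to its caller */
@end example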
   1036 
   1037 @node Generated scanner, Start conditions, Actions, Top
   1038 @section The generated scanner
   1039 
   1040 The output of @code{flex} is the file @file{lex.yy.c}, which contains
   1041 the scanning routine @samp{yylex()}, a number of tables used by
   1042 it for matching tokens, and a number of auxiliary routines
   1043 and macros.  By default, @samp{yylex()} is declared as follows:
   1044 
   1045 @example
   1046 int yylex()
   1047     @{
   1048     @dots{} various definitions and the actions in here @dots{}
   1049     @}
   1050 @end example
   1051 
(If your environment supports function prototypes, then it
will be @samp{int yylex( void )}.)  This definition may be
changed by defining the @code{YY_DECL} macro.  For example, you
could use:
   1056 
   1057 @example
   1058 #define YY_DECL float lexscan( a, b ) float a, b;
   1059 @end example
   1060 
   1061 to give the scanning routine the name @code{lexscan}, returning a
   1062 float, and taking two floats as arguments.  Note that if
   1063 you give arguments to the scanning routine using a
   1064 K&R-style/non-prototyped function declaration, you must
   1065 terminate the definition with a semi-colon (@samp{;}).
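
With a compiler that supports function prototypes, the same routine can
be declared in prototype form, in which case no trailing semi-colon is
needed.  The following is a sketch of that variant, reusing the
@code{lexscan} name from the example above:

@example
%@{
#define YY_DECL float lexscan( float a, float b )
%@}
@end example

@noindent
Code that calls the scanner would then declare
@samp{float lexscan( float a, float b );} itself.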
   1066 
   1067 Whenever @samp{yylex()} is called, it scans tokens from the
   1068 global input file @code{yyin} (which defaults to stdin).  It
   1069 continues until it either reaches an end-of-file (at which
   1070 point it returns the value 0) or one of its actions
   1071 executes a @code{return} statement.
   1072 
   1073 If the scanner reaches an end-of-file, subsequent calls are undefined
   1074 unless either @code{yyin} is pointed at a new input file (in which case
   1075 scanning continues from that file), or @samp{yyrestart()} is called.
   1076 @samp{yyrestart()} takes one argument, a @samp{FILE *} pointer (which
   1077 can be nil, if you've set up @code{YY_INPUT} to scan from a source
   1078 other than @code{yyin}), and initializes @code{yyin} for scanning from
   1079 that file.  Essentially there is no difference between just assigning
   1080 @code{yyin} to a new input file or using @samp{yyrestart()} to do so;
   1081 the latter is available for compatibility with previous versions of
   1082 @code{flex}, and because it can be used to switch input files in the
   1083 middle of scanning.  It can also be used to throw away the current
input buffer, by calling it with an argument of @code{yyin}; but
it is better to use @code{YY_FLUSH_BUFFER} (see above).  Note that
   1086 @samp{yyrestart()} does @emph{not} reset the start condition to
   1087 @code{INITIAL} (see Start Conditions, below).
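
The following sketch shows one way @samp{yyrestart()} might be used to
scan each file named on the command line in turn.  The @code{main}
routine and its handling of unreadable files are assumptions made for
the illustration, not part of @code{flex}:

@example
%option noyywrap
%%
@dots{}rules@dots{}
%%
int main( int argc, char **argv )
    @{
    int i;

    for ( i = 1; i < argc; ++i )
        @{
        FILE *f = fopen( argv[i], "r" );

        if ( ! f )
            continue;   /* skip files that cannot be opened */

        yyrestart( f );
        while ( yylex() != 0 )
            ;           /* scan until end-of-file */
        fclose( f );
        @}
    return 0;
    @}
@end example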
   1088 
   1089 
   1090 If @samp{yylex()} stops scanning due to executing a @code{return}
   1091 statement in one of the actions, the scanner may then be called
   1092 again and it will resume scanning where it left off.
   1093 
   1094 By default (and for purposes of efficiency), the scanner
   1095 uses block-reads rather than simple @samp{getc()} calls to read
   1096 characters from @code{yyin}.  The nature of how it gets its input
   1097 can be controlled by defining the @code{YY_INPUT} macro.
The calling sequence of @code{YY_INPUT} is
@samp{YY_INPUT(buf,result,max_size)}.  Its action is to place
up to @var{max_size} characters in the character array @var{buf} and
return in the integer variable @var{result} either the number of
characters read or the constant @code{YY_NULL} (0 on Unix
systems) to indicate EOF.  The default @code{YY_INPUT} reads from
the global file-pointer @code{yyin}.
   1105 
A sample definition of @code{YY_INPUT} (in the definitions
   1107 section of the input file):
   1108 
   1109 @example
   1110 %@{
   1111 #define YY_INPUT(buf,result,max_size) \
   1112     @{ \
   1113     int c = getchar(); \
   1114     result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
   1115     @}
   1116 %@}
   1117 @end example
   1118 
   1119 This definition will change the input processing to occur
   1120 one character at a time.
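
As a further sketch, @code{YY_INPUT} can hand the scanner a block of
characters from memory instead of a file.  The helper
@code{fill_from_string} and the variable @code{src} are hypothetical
names introduced for this illustration.  (The @samp{yy_scan_string()}
routine mentioned below is usually the simpler route for in-memory
input.)

@example
%@{
#include <string.h>

/* Hypothetical input source: the text still to be scanned. */
static const char *src = "some text to scan\n";

/* Copy up to max_size remaining characters into buf and
 * return how many were copied (0 once the text is used up).
 */
static int fill_from_string( char *buf, int max_size )
    @{
    int n = (int) strlen( src );

    if ( n > max_size )
        n = max_size;
    memcpy( buf, src, n );
    src += n;
    return n;
    @}

#define YY_INPUT(buf,result,max_size) \
    @{ \
    int n = fill_from_string( buf, (int) (max_size) ); \
    result = (n > 0) ? n : YY_NULL; \
    @}
%@}
@end example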
   1121 
When the scanner receives an end-of-file indication from
@code{YY_INPUT}, it then checks the @samp{yywrap()} function.  If
   1124 @samp{yywrap()} returns false (zero), then it is assumed that the
   1125 function has gone ahead and set up @code{yyin} to point to
   1126 another input file, and scanning continues.  If it returns
   1127 true (non-zero), then the scanner terminates, returning 0
   1128 to its caller.  Note that in either case, the start
   1129 condition remains unchanged; it does @emph{not} revert to @code{INITIAL}.
   1130 
   1131 If you do not supply your own version of @samp{yywrap()}, then you
   1132 must either use @samp{%option noyywrap} (in which case the scanner
   1133 behaves as though @samp{yywrap()} returned 1), or you must link with
   1134 @samp{-lfl} to obtain the default version of the routine, which always
   1135 returns 1.
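
A minimal sketch of a user-supplied @samp{yywrap()} follows.  It
assumes a hypothetical variable @code{next_file}, set up elsewhere
(for example in @code{main}) to point at a NULL-terminated list of
further file names:

@example
%@{
/* Hypothetical NULL-terminated list of further files to scan. */
static char **next_file;
%@}
%%
@dots{}rules@dots{}
%%
int yywrap( void )
    @{
    while ( next_file && *next_file )
        @{
        FILE *f = fopen( *next_file++, "r" );

        if ( f )
            @{
            yyin = f;
            return 0;    /* more input: scanning continues */
            @}
        @}
    return 1;            /* no more files: yylex() returns 0 */
    @}
@end example

@noindent
For brevity the sketch does not close the previous @code{yyin}; a real
scanner would probably want to do so.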
   1136 
   1137 Three routines are available for scanning from in-memory
   1138 buffers rather than files: @samp{yy_scan_string()},
   1139 @samp{yy_scan_bytes()}, and @samp{yy_scan_buffer()}.  See the discussion
   1140 of them below in the section Multiple Input Buffers.
   1141 
   1142 The scanner writes its @samp{ECHO} output to the @code{yyout} global
   1143 (default, stdout), which may be redefined by the user
   1144 simply by assigning it to some other @code{FILE} pointer.
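
For example, the following sketch sends @samp{ECHO} output to a file
rather than to stdout.  The file name @file{scan.out} is a hypothetical
choice for the illustration:

@example
%option noyywrap
%%
@dots{}rules@dots{}
%%
int main( void )
    @{
    FILE *out = fopen( "scan.out", "w" );

    if ( out )
        yyout = out;    /* otherwise yyout defaults to stdout */
    yylex();
    if ( out )
        fclose( out );
    return 0;
    @}
@end example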
   1145 
   1146 @node Start conditions, Multiple buffers, Generated scanner, Top
   1147 @section Start conditions
   1148 
   1149 @code{flex} provides a mechanism for conditionally activating
   1150 rules.  Any rule whose pattern is prefixed with "<sc>"
   1151 will only be active when the scanner is in the start
   1152 condition named "sc".  For example,
   1153 
   1154 @example
   1155 <STRING>[^"]*        @{ /* eat up the string body ... */
   1156             @dots{}
   1157             @}
   1158 @end example
   1159 
   1160 @noindent
   1161 will be active only when the scanner is in the "STRING"
   1162 start condition, and
   1163 
   1164 @example
   1165 <INITIAL,STRING,QUOTE>\.        @{ /* handle an escape ... */
   1166             @dots{}
   1167             @}
   1168 @end example
   1169 
   1170 @noindent
   1171 will be active only when the current start condition is
   1172 either "INITIAL", "STRING", or "QUOTE".
   1173 
   1174 Start conditions are declared in the definitions (first)
   1175 section of the input using unindented lines beginning with
   1176 either @samp{%s} or @samp{%x} followed by a list of names.  The former
   1177 declares @emph{inclusive} start conditions, the latter @emph{exclusive}
   1178 start conditions.  A start condition is activated using
   1179 the @code{BEGIN} action.  Until the next @code{BEGIN} action is
   1180 executed, rules with the given start condition will be active
   1181 and rules with other start conditions will be inactive.
   1182 If the start condition is @emph{inclusive}, then rules with no
   1183 start conditions at all will also be active.  If it is
   1184 @emph{exclusive}, then @emph{only} rules qualified with the start
   1185 condition will be active.  A set of rules contingent on the
   1186 same exclusive start condition describe a scanner which is
   1187 independent of any of the other rules in the @code{flex} input.
   1188 Because of this, exclusive start conditions make it easy
   1189 to specify "mini-scanners" which scan portions of the
   1190 input that are syntactically different from the rest
   1191 (e.g., comments).
   1192 
   1193 If the distinction between inclusive and exclusive start
   1194 conditions is still a little vague, here's a simple
   1195 example illustrating the connection between the two.  The set
   1196 of rules:
   1197 
   1198 @example
   1199 %s example
   1200 %%
   1201 
   1202 <example>foo   do_something();
   1203 
   1204 bar            something_else();
   1205 @end example
   1206 
   1207 @noindent
   1208 is equivalent to
   1209 
   1210 @example
   1211 %x example
   1212 %%
   1213 
   1214 <example>foo   do_something();
   1215 
   1216 <INITIAL,example>bar    something_else();
   1217 @end example
   1218 
   1219 Without the @samp{<INITIAL,example>} qualifier, the @samp{bar} pattern
   1220 in the second example wouldn't be active (i.e., couldn't match) when
   1221 in start condition @samp{example}.  If we just used @samp{<example>}
   1222 to qualify @samp{bar}, though, then it would only be active in
   1223 @samp{example} and not in @code{INITIAL}, while in the first example
it's active in both, because in the first example @samp{example}
is an @emph{inclusive} (@samp{%s}) start condition.
   1226 
Also note that the special start-condition specifier @samp{<*>}
matches every start condition.  Thus, the above example
could also have been written:
   1230 
   1231 @example
   1232 %x example
   1233 %%
   1234 
   1235 <example>foo   do_something();
   1236 
   1237 <*>bar    something_else();
   1238 @end example
   1239 
   1240 The default rule (to @samp{ECHO} any unmatched character) remains
   1241 active in start conditions.  It is equivalent to:
   1242 
   1243 @example
<*>.|\n     ECHO;
   1245 @end example
   1246 
   1247 @samp{BEGIN(0)} returns to the original state where only the
   1248 rules with no start conditions are active.  This state can
   1249 also be referred to as the start-condition "INITIAL", so
   1250 @samp{BEGIN(INITIAL)} is equivalent to @samp{BEGIN(0)}.  (The
   1251 parentheses around the start condition name are not required but
   1252 are considered good style.)
   1253 
   1254 @code{BEGIN} actions can also be given as indented code at the
   1255 beginning of the rules section.  For example, the
   1256 following will cause the scanner to enter the "SPECIAL" start
   1257 condition whenever @samp{yylex()} is called and the global
   1258 variable @code{enter_special} is true:
   1259 
   1260 @example
   1261         int enter_special;
   1262 
   1263 %x SPECIAL
   1264 %%
   1265         if ( enter_special )
   1266             BEGIN(SPECIAL);
   1267 
   1268 <SPECIAL>blahblahblah
   1269 @dots{}more rules follow@dots{}
   1270 @end example
   1271 
   1272 To illustrate the uses of start conditions, here is a
   1273 scanner which provides two different interpretations of a
string like "123.456".  By default it will treat it as
   1275 three tokens, the integer "123", a dot ('.'), and the
   1276 integer "456".  But if the string is preceded earlier in
   1277 the line by the string "expect-floats" it will treat it as
   1278 a single token, the floating-point number 123.456:
   1279 
   1280 @example
   1281 %@{
   1282 #include <math.h>
   1283 %@}
   1284 %s expect
   1285 
   1286 %%
   1287 expect-floats        BEGIN(expect);
   1288 
   1289 <expect>[0-9]+"."[0-9]+      @{
   1290             printf( "found a float, = %f\n",
   1291                     atof( yytext ) );
   1292             @}
   1293 <expect>\n           @{
   1294             /* that's the end of the line, so
             * we need another "expect-floats"
   1296              * before we'll recognize any more
   1297              * numbers
   1298              */
   1299             BEGIN(INITIAL);
   1300             @}
   1301 
   1302 [0-9]+      @{
   1303 
   1306             printf( "found an integer, = %d\n",
   1307                     atoi( yytext ) );
   1308             @}
   1309 
   1310 "."         printf( "found a dot\n" );
   1311 @end example
   1312 
   1313 Here is a scanner which recognizes (and discards) C
   1314 comments while maintaining a count of the current input line.
   1315 
   1316 @example
   1317 %x comment
   1318 %%
   1319         int line_num = 1;
   1320 
   1321 "/*"         BEGIN(comment);
   1322 
   1323 <comment>[^*\n]*        /* eat anything that's not a '*' */
   1324 <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
   1325 <comment>\n             ++line_num;
   1326 <comment>"*"+"/"        BEGIN(INITIAL);
   1327 @end example
   1328 
   1329 This scanner goes to a bit of trouble to match as much
   1330 text as possible with each rule.  In general, when
attempting to write a high-speed scanner try to match as
much as possible in each rule, as it's a big win.

Note that start condition names are really integer values
   1335 and can be stored as such.  Thus, the above could be
   1336 extended in the following fashion:
   1337 
   1338 @example
   1339 %x comment foo
   1340 %%
   1341         int line_num = 1;
   1342         int comment_caller;
   1343 
   1344 "/*"         @{
   1345              comment_caller = INITIAL;
   1346              BEGIN(comment);
   1347              @}
   1348 
   1349 @dots{}
   1350 
   1351 <foo>"/*"    @{
   1352              comment_caller = foo;
   1353              BEGIN(comment);
   1354              @}
   1355 
   1356 <comment>[^*\n]*        /* eat anything that's not a '*' */
   1357 <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
   1358 <comment>\n             ++line_num;
   1359 <comment>"*"+"/"        BEGIN(comment_caller);
   1360 @end example
   1361 
   1362 Furthermore, you can access the current start condition
   1363 using the integer-valued @code{YY_START} macro.  For example, the
   1364 above assignments to @code{comment_caller} could instead be
   1365 written
   1366 
   1367 @example
   1368 comment_caller = YY_START;
   1369 @end example
   1370 
   1371 Flex provides @code{YYSTATE} as an alias for @code{YY_START} (since that
   1372 is what's used by AT&T @code{lex}).
   1373 
Note that start conditions do not have their own
name-space; @samp{%s} and @samp{%x} lines declare names in the
same fashion as @samp{#define} directives.
   1377 
   1378 Finally, here's an example of how to match C-style quoted
   1379 strings using exclusive start conditions, including
   1380 expanded escape sequences (but not including checking for
   1381 a string that's too long):
   1382 
   1383 @example
   1384 %x str
   1385 
   1386 %%
   1387         char string_buf[MAX_STR_CONST];
   1388         char *string_buf_ptr;
   1389 
   1390 \"      string_buf_ptr = string_buf; BEGIN(str);
   1391 
   1392 <str>\"        @{ /* saw closing quote - all done */
   1393         BEGIN(INITIAL);
   1394         *string_buf_ptr = '\0';
   1395         /* return string constant token type and
   1396          * value to parser
   1397          */
   1398         @}
   1399 
   1400 <str>\n        @{
   1401         /* error - unterminated string constant */
   1402         /* generate error message */
   1403         @}
   1404 
<str>\\[0-7]@{1,3@} @{
        /* octal escape sequence */
        unsigned int result;    /* "%o" expects an unsigned int */

        (void) sscanf( yytext + 1, "%o", &result );

        if ( result > 0xff )
                @{
                /* error, constant is out-of-bounds */
                @}
        else
                *string_buf_ptr++ = result;
        @}
   1416 
   1417 <str>\\[0-9]+ @{
   1418         /* generate error - bad escape sequence; something
   1419          * like '\48' or '\0777777'
   1420          */
   1421         @}
   1422 
   1423 <str>\\n  *string_buf_ptr++ = '\n';
   1424 <str>\\t  *string_buf_ptr++ = '\t';
   1425 <str>\\r  *string_buf_ptr++ = '\r';
   1426 <str>\\b  *string_buf_ptr++ = '\b';
   1427 <str>\\f  *string_buf_ptr++ = '\f';
   1428 
   1429 <str>\\(.|\n)  *string_buf_ptr++ = yytext[1];
   1430 
   1431 <str>[^\\\n\"]+        @{
   1432         char *yptr = yytext;
   1433 
   1434         while ( *yptr )
   1435                 *string_buf_ptr++ = *yptr++;
   1436         @}
   1437 @end example
   1438 
   1439 Often, such as in some of the examples above, you wind up
   1440 writing a whole bunch of rules all preceded by the same
   1441 start condition(s).  Flex makes this a little easier and
   1442 cleaner by introducing a notion of start condition @dfn{scope}.
   1443 A start condition scope is begun with:
   1444 
   1445 @example
   1446 <SCs>@{
   1447 @end example
   1448 
   1449 @noindent
   1450 where SCs is a list of one or more start conditions.
   1451 Inside the start condition scope, every rule automatically
   1452 has the prefix @samp{<SCs>} applied to it, until a @samp{@}} which
   1453 matches the initial @samp{@{}.  So, for example,
   1454 
   1455 @example
   1456 <ESC>@{
   1457     "\\n"   return '\n';
   1458     "\\r"   return '\r';
   1459     "\\f"   return '\f';
   1460     "\\0"   return '\0';
   1461 @}
   1462 @end example
   1463 
   1464 @noindent
   1465 is equivalent to:
   1466 
   1467 @example
   1468 <ESC>"\\n"  return '\n';
   1469 <ESC>"\\r"  return '\r';
   1470 <ESC>"\\f"  return '\f';
   1471 <ESC>"\\0"  return '\0';
   1472 @end example
   1473 
   1474 Start condition scopes may be nested.
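
As a sketch of how a scope can tidy up an earlier example, the C
comment scanner shown above could be written with its @samp{<comment>}
rules grouped together:

@example
%x comment
%%
        int line_num = 1;

"/*"         BEGIN(comment);

<comment>@{
    [^*\n]*        /* eat anything that's not a '*' */
    "*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
    \n             ++line_num;
    "*"+"/"        BEGIN(INITIAL);
@}
@end example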
   1475 
   1476 Three routines are available for manipulating stacks of
   1477 start conditions:
   1478 
   1479 @table @samp
   1480 @item void yy_push_state(int new_state)
   1481 pushes the current start condition onto the top of
   1482 the start condition stack and switches to @var{new_state}
   1483 as though you had used @samp{BEGIN new_state} (recall that
   1484 start condition names are also integers).
   1485 
   1486 @item void yy_pop_state()
   1487 pops the top of the stack and switches to it via
   1488 @code{BEGIN}.
   1489 
   1490 @item int yy_top_state()
   1491 returns the top of the stack without altering the
   1492 stack's contents.
   1493 @end table
   1494 
   1495 The start condition stack grows dynamically and so has no
   1496 built-in size limitation.  If memory is exhausted, program
   1497 execution aborts.
   1498 
   1499 To use start condition stacks, your scanner must include a
   1500 @samp{%option stack} directive (see Options below).
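
The following sketch revisits the earlier comment example, using the
start condition stack in place of the @code{comment_caller} variable.
The start condition @samp{foo} is the same placeholder used above:

@example
%option stack
%x comment foo
%%
<INITIAL,foo>"/*"    yy_push_state( comment );

<comment>@{
    [^*\n]*          /* eat anything that's not a '*' */
    "*"+[^*/\n]*     /* eat up '*'s not followed by '/'s */
    \n               /* eat newlines */
    "*"+"/"          yy_pop_state();
@}
@end example

@noindent
When the comment ends, @samp{yy_pop_state()} returns the scanner to
whichever start condition was active when the comment began.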
   1501 
   1502 @node Multiple buffers, End-of-file rules, Start conditions, Top
   1503 @section Multiple input buffers
   1504 
   1505 Some scanners (such as those which support "include"
   1506 files) require reading from several input streams.  As
   1507 @code{flex} scanners do a large amount of buffering, one cannot
   1508 control where the next input will be read from by simply
   1509 writing a @code{YY_INPUT} which is sensitive to the scanning
   1510 context.  @code{YY_INPUT} is only called when the scanner reaches
   1511 the end of its buffer, which may be a long time after
   1512 scanning a statement such as an "include" which requires
   1513 switching the input source.
   1514 
   1515 To negotiate these sorts of problems, @code{flex} provides a
   1516 mechanism for creating and switching between multiple
   1517 input buffers.  An input buffer is created by using:
   1518 
   1519 @example
   1520 YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
   1521 @end example
   1522 
   1523 @noindent
   1524 which takes a @code{FILE} pointer and a size and creates a buffer
   1525 associated with the given file and large enough to hold
   1526 @var{size} characters (when in doubt, use @code{YY_BUF_SIZE} for the
   1527 size).  It returns a @code{YY_BUFFER_STATE} handle, which may
   1528 then be passed to other routines (see below).  The
   1529 @code{YY_BUFFER_STATE} type is a pointer to an opaque @code{struct}
   1530 @code{yy_buffer_state} structure, so you may safely initialize
   1531 YY_BUFFER_STATE variables to @samp{((YY_BUFFER_STATE) 0)} if you
   1532 wish, and also refer to the opaque structure in order to
   1533 correctly declare input buffers in source files other than
   1534 that of your scanner.  Note that the @code{FILE} pointer in the
   1535 call to @code{yy_create_buffer} is only used as the value of @code{yyin}
   1536 seen by @code{YY_INPUT}; if you redefine @code{YY_INPUT} so it no longer
   1537 uses @code{yyin}, then you can safely pass a nil @code{FILE} pointer to
   1538 @code{yy_create_buffer}.  You select a particular buffer to scan
   1539 from using:
   1540 
   1541 @example
   1542 void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
   1543 @end example
   1544 
@noindent
This switches the scanner's input buffer so subsequent tokens
will come from @var{new_buffer}.  Note that
   1547 @samp{yy_switch_to_buffer()} may be used by @samp{yywrap()} to set
   1548 things up for continued scanning, instead of opening a new
   1549 file and pointing @code{yyin} at it.  Note also that switching
   1550 input sources via either @samp{yy_switch_to_buffer()} or @samp{yywrap()}
   1551 does @emph{not} change the start condition.
   1552 
   1553 @example
   1554 void yy_delete_buffer( YY_BUFFER_STATE buffer )
   1555 @end example
   1556 
   1557 @noindent
   1558 is used to reclaim the storage associated with a buffer.
   1559 You can also clear the current contents of a buffer using:
   1560 
   1561 @example
   1562 void yy_flush_buffer( YY_BUFFER_STATE buffer )
   1563 @end example
   1564 
   1565 This function discards the buffer's contents, so the next time the
   1566 scanner attempts to match a token from the buffer, it will first fill
   1567 the buffer anew using @code{YY_INPUT}.
   1568 
   1569 @samp{yy_new_buffer()} is an alias for @samp{yy_create_buffer()},
   1570 provided for compatibility with the C++ use of @code{new} and @code{delete}
   1571 for creating and destroying dynamic objects.
   1572 
   1573 Finally, the @code{YY_CURRENT_BUFFER} macro returns a
   1574 @code{YY_BUFFER_STATE} handle to the current buffer.
   1575 
   1576 Here is an example of using these features for writing a
   1577 scanner which expands include files (the @samp{<<EOF>>} feature
   1578 is discussed below):
   1579 
@example
/* the "incl" state is used for picking up the name
 * of an include file
 */
%x incl

%@{
#define MAX_INCLUDE_DEPTH 10
YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
int include_stack_ptr = 0;
%@}

%%
include             BEGIN(incl);

[a-z]+              ECHO;
[^a-z\n]*\n?        ECHO;

<incl>[ \t]*      /* eat the whitespace */
<incl>[^ \t\n]+   @{ /* got the include file name */
        if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
            @{
            fprintf( stderr, "Includes nested too deeply\n" );
            exit( 1 );
            @}

        /* remember the buffer we are leaving */
        include_stack[include_stack_ptr++] =
            YY_CURRENT_BUFFER;

        yyin = fopen( yytext, "r" );

        if ( ! yyin )
            error( @dots{} );

        yy_switch_to_buffer(
            yy_create_buffer( yyin, YY_BUF_SIZE ) );

        BEGIN(INITIAL);
        @}

<<EOF>> @{
        if ( --include_stack_ptr < 0 )
            @{
            /* end of the outermost file */
            yyterminate();
            @}

        else
            @{
            /* resume the file that did the including */
            yy_delete_buffer( YY_CURRENT_BUFFER );
            yy_switch_to_buffer(
                 include_stack[include_stack_ptr] );
            @}
        @}
@end example
   1634 
   1635 Three routines are available for setting up input buffers
   1636 for scanning in-memory strings instead of files.  All of
   1637 them create a new input buffer for scanning the string,
   1638 and return a corresponding @code{YY_BUFFER_STATE} handle (which
   1639 you should delete with @samp{yy_delete_buffer()} when done with
   1640 it).  They also switch to the new buffer using
   1641 @samp{yy_switch_to_buffer()}, so the next call to @samp{yylex()} will
   1642 start scanning the string.
   1643 
   1644 @table @samp
   1645 @item yy_scan_string(const char *str)
   1646 scans a NUL-terminated string.
   1647 
   1648 @item yy_scan_bytes(const char *bytes, int len)
scans @code{len} bytes (possibly including NULs) starting
   1650 at location @var{bytes}.
   1651 @end table
   1652 
   1653 Note that both of these functions create and scan a @emph{copy}
   1654 of the string or bytes.  (This may be desirable, since
   1655 @samp{yylex()} modifies the contents of the buffer it is
   1656 scanning.) You can avoid the copy by using:
   1657 
   1658 @table @samp
   1659 @item yy_scan_buffer(char *base, yy_size_t size)
   1660 which scans in place the buffer starting at @var{base},
   1661 consisting of @var{size} bytes, the last two bytes of
   1662 which @emph{must} be @code{YY_END_OF_BUFFER_CHAR} (ASCII NUL).
These last two bytes (@samp{base[size-2]} and @samp{base[size-1]})
are not scanned; thus, scanning consists of @samp{base[0]}
through @samp{base[size-3]}, inclusive.
   1666 
   1667 If you fail to set up @var{base} in this manner (i.e.,
   1668 forget the final two @code{YY_END_OF_BUFFER_CHAR} bytes),
   1669 then @samp{yy_scan_buffer()} returns a nil pointer instead
   1670 of creating a new input buffer.
   1671 
   1672 The type @code{yy_size_t} is an integral type to which you
   1673 can cast an integer expression reflecting the size
   1674 of the buffer.
   1675 @end table
   1676 
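The following sketch shows both styles of string scanning, written in
the user code section of a scanner.  The rules and the helper names
@code{scan_copy} and @code{scan_in_place} are made up for the example,
and @samp{%option noyywrap} (see Options below) is assumed so that no
@samp{yywrap()} has to be supplied.  Note that the caller of
@code{scan_in_place} must leave room after the text for the two
terminating bytes.

@example
%option noyywrap
%%
[0-9]+      printf( "number: %s\n", yytext );
[a-z]+      printf( "word: %s\n", yytext );
.|\n        /* ignore everything else */
%%
/* scan a NUL-terminated string; flex works on its own copy */
void scan_copy( const char *str )
    @{
    YY_BUFFER_STATE b = yy_scan_string( str );

    yylex();
    yy_delete_buffer( b );
    @}

/* scan len bytes in place; base must hold len + 2 bytes */
int scan_in_place( char *base, int len )
    @{
    YY_BUFFER_STATE b;

    base[len] = base[len + 1] = YY_END_OF_BUFFER_CHAR;
    b = yy_scan_buffer( base, (yy_size_t) (len + 2) );

    if ( ! b )
        return -1;      /* buffer was not set up as required */

    yylex();
    yy_delete_buffer( b );
    return 0;
    @}
@end example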
   1677 @node End-of-file rules, Miscellaneous, Multiple buffers, Top
   1678 @section End-of-file rules
   1679 
The special rule @samp{<<EOF>>} indicates actions which are to
be taken when an end-of-file is encountered and @samp{yywrap()}
returns non-zero (i.e., indicates no further files to
process).  The action must finish by doing one of four
things:
   1685 
   1686 @itemize -
   1687 @item
   1688 assigning @code{yyin} to a new input file (in previous
   1689 versions of flex, after doing the assignment you
   1690 had to call the special action @code{YY_NEW_FILE}; this is
   1691 no longer necessary);
   1692 
   1693 @item
   1694 executing a @code{return} statement;
   1695 
   1696 @item
   1697 executing the special @samp{yyterminate()} action;
   1698 
   1699 @item
   1700 or, switching to a new buffer using
   1701 @samp{yy_switch_to_buffer()} as shown in the example
   1702 above.
   1703 @end itemize
   1704 
@samp{<<EOF>>} rules may not be used with other patterns; they
may only be qualified with a list of start conditions.  If
an unqualified @samp{<<EOF>>} rule is given, it applies to @emph{all}
start conditions which do not already have @samp{<<EOF>>}
actions.  To specify an @samp{<<EOF>>} rule for only the initial
start condition, use
   1711 
   1712 @example
   1713 <INITIAL><<EOF>>
   1714 @end example
   1715 
These rules are useful for catching things like unclosed
comments or unterminated quotes.  An example:
   1718 
   1719 @example
   1720 %x quote
   1721 %%
   1722 
   1723 @dots{}other rules for dealing with quotes@dots{}
   1724 
   1725 <quote><<EOF>>   @{
   1726          error( "unterminated quote" );
   1727          yyterminate();
   1728          @}
   1729 <<EOF>>  @{
   1730          if ( *++filelist )
   1731              yyin = fopen( *filelist, "r" );
   1732          else
   1733             yyterminate();
   1734          @}
   1735 @end example
   1736 
   1737 @node Miscellaneous, User variables, End-of-file rules, Top
   1738 @section Miscellaneous macros
   1739 
   1740 The macro @code{YY_USER_ACTION} can be defined to provide an
   1741 action which is always executed prior to the matched
   1742 rule's action.  For example, it could be #define'd to call
   1743 a routine to convert yytext to lower-case.  When
   1744 @code{YY_USER_ACTION} is invoked, the variable @code{yy_act} gives the
   1745 number of the matched rule (rules are numbered starting
   1746 with 1).  Suppose you want to profile how often each of
   1747 your rules is matched.  The following would do the trick:
   1748 
@example
#define YY_USER_ACTION ++ctr[yy_act];
@end example
   1752 
@noindent
where @code{ctr} is an array to hold the counts for the different
rules.  Note that the macro @code{YY_NUM_RULES} gives the total number
of rules (including the default rule, even if you use @samp{-s}), so
a correct declaration for @code{ctr} is:
   1757 
@example
/* rule numbers start at 1, so index YY_NUM_RULES must be valid */
int ctr[YY_NUM_RULES + 1];
@end example
   1761 
   1762 The macro @code{YY_USER_INIT} may be defined to provide an action
   1763 which is always executed before the first scan (and before
   1764 the scanner's internal initializations are done).  For
   1765 example, it could be used to call a routine to read in a
   1766 data table or open a logging file.
   1767 
   1768 The macro @samp{yy_set_interactive(is_interactive)} can be used
   1769 to control whether the current buffer is considered
   1770 @emph{interactive}.  An interactive buffer is processed more slowly,
   1771 but must be used when the scanner's input source is indeed
   1772 interactive to avoid problems due to waiting to fill
   1773 buffers (see the discussion of the @samp{-I} flag below).  A
   1774 non-zero value in the macro invocation marks the buffer as
   1775 interactive, a zero value as non-interactive.  Note that
   1776 use of this macro overrides @samp{%option always-interactive} or
   1777 @samp{%option never-interactive} (see Options below).
   1778 @samp{yy_set_interactive()} must be invoked prior to beginning to
   1779 scan the buffer that is (or is not) to be considered
   1780 interactive.
   1781 
The macro @samp{yy_set_bol(at_bol)} can be used to control
whether the current buffer's scanning context for the next
token match is done as though at the beginning of a line.
A non-zero macro argument makes rules anchored with @samp{^}
active, while a zero argument makes @samp{^} rules inactive.
   1786 
   1787 The macro @samp{YY_AT_BOL()} returns true if the next token
   1788 scanned from the current buffer will have '^' rules
   1789 active, false otherwise.
   1790 
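As an illustration, the sketch below treats a backslash followed by a
newline as a line continuation, so the text after it should not count
as the start of a line.  The rules themselves are invented for this
example; only @samp{yy_set_bol()} is the facility described above.

@example
%%
^"#".*      printf( "directive: %s\n", yytext );
\\\n        @{
        /* a continued line does not begin a new line,
         * so keep '^' rules inactive for the next token
         */
        yy_set_bol( 0 );
        @}
.|\n        ECHO;
@end example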
In the generated scanner, the actions are all gathered in
one large switch statement and separated using @code{YY_BREAK},
which may be redefined.  By default, it is simply a
@code{break}, to separate each rule's action from the following
rule's.  Redefining @code{YY_BREAK} allows, for example, C++
users to #define YY_BREAK to do nothing (while being very
careful that every rule ends with a @code{break} or a
@code{return}!) to avoid unreachable-statement
warnings: when a rule's action ends with @code{return},
the @code{YY_BREAK} that follows it can never be reached.
   1801 
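A minimal sketch of such a redefinition follows.  The token code
@code{TOK_NUMBER} and its value are hypothetical.  Every action is
given an explicit @code{break} or @code{return}, and the rules are
written to cover all possible input so that the default rule is never
needed.

@example
%@{
#define TOK_NUMBER 258  /* hypothetical token code */
#define YY_BREAK        /* empty: no break after each action */
%@}
%%
[0-9]+      return TOK_NUMBER;
[ \t\n]+    break;      /* must now be written explicitly */
.           return yytext[0];
@end example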
   1802 @node User variables, YACC interface, Miscellaneous, Top
   1803 @section Values available to the user
   1804 
   1805 This section summarizes the various values available to
   1806 the user in the rule actions.
   1807 
   1808 @itemize -
   1809 @item
   1810 @samp{char *yytext} holds the text of the current token.
   1811 It may be modified but not lengthened (you cannot
   1812 append characters to the end).
   1813 
   1814 If the special directive @samp{%array} appears in the
   1815 first section of the scanner description, then
   1816 @code{yytext} is instead declared @samp{char yytext[YYLMAX]},
   1817 where @code{YYLMAX} is a macro definition that you can
   1818 redefine in the first section if you don't like the
   1819 default value (generally 8KB).  Using @samp{%array}
   1820 results in somewhat slower scanners, but the value
   1821 of @code{yytext} becomes immune to calls to @samp{input()} and
   1822 @samp{unput()}, which potentially destroy its value when
   1823 @code{yytext} is a character pointer.  The opposite of
   1824 @samp{%array} is @samp{%pointer}, which is the default.
   1825 
   1826 You cannot use @samp{%array} when generating C++ scanner
   1827 classes (the @samp{-+} flag).
   1828 
   1829 @item
   1830 @samp{int yyleng} holds the length of the current token.
   1831 
   1832 @item
   1833 @samp{FILE *yyin} is the file which by default @code{flex} reads
   1834 from.  It may be redefined but doing so only makes
   1835 sense before scanning begins or after an EOF has
   1836 been encountered.  Changing it in the midst of
   1837 scanning will have unexpected results since @code{flex}
   1838 buffers its input; use @samp{yyrestart()} instead.  Once
   1839 scanning terminates because an end-of-file has been
seen, you can point @code{yyin} at the new input file and
   1841 then call the scanner again to continue scanning.
   1842 
   1843 @item
   1844 @samp{void yyrestart( FILE *new_file )} may be called to
   1845 point @code{yyin} at the new input file.  The switch-over
   1846 to the new file is immediate (any previously
   1847 buffered-up input is lost).  Note that calling
   1848 @samp{yyrestart()} with @code{yyin} as an argument thus throws
   1849 away the current input buffer and continues
   1850 scanning the same input file.
   1851 
   1852 @item
   1853 @samp{FILE *yyout} is the file to which @samp{ECHO} actions are
   1854 done.  It can be reassigned by the user.
   1855 
   1856 @item
   1857 @code{YY_CURRENT_BUFFER} returns a @code{YY_BUFFER_STATE} handle
   1858 to the current buffer.
   1859 
   1860 @item
   1861 @code{YY_START} returns an integer value corresponding to
   1862 the current start condition.  You can subsequently
   1863 use this value with @code{BEGIN} to return to that start
   1864 condition.
   1865 @end itemize
   1866 
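The sketch below puts several of these values to work: a scanner that
counts lines across all files named on the command line, calling
@samp{yyrestart()} before each one.  The counter @code{num_lines} and
the @samp{main()} routine are written for this example, and
@samp{%option noyywrap} (see Options below) is assumed so that each
call to @samp{yylex()} returns at end-of-file.

@example
%option noyywrap
%@{
int num_lines = 0;
%@}
%%
\n      ++num_lines;
.       /* ignore */
%%
int main( int argc, char **argv )
    @{
    int i;

    for ( i = 1; i < argc; ++i )
        @{
        FILE *f = fopen( argv[i], "r" );

        if ( ! f )
            @{
            perror( argv[i] );
            continue;
            @}

        yyrestart( f );     /* point yyin at the new file */
        yylex();            /* returns at end-of-file */
        fclose( f );
        @}

    printf( "%d lines\n", num_lines );
    return 0;
    @}
@end example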
   1867 @node YACC interface, Options, User variables, Top
   1868 @section Interfacing with @code{yacc}
   1869 
   1870 One of the main uses of @code{flex} is as a companion to the @code{yacc}
   1871 parser-generator.  @code{yacc} parsers expect to call a routine
   1872 named @samp{yylex()} to find the next input token.  The routine
   1873 is supposed to return the type of the next token as well
   1874 as putting any associated value in the global @code{yylval}.  To
   1875 use @code{flex} with @code{yacc}, one specifies the @samp{-d} option to @code{yacc} to
   1876 instruct it to generate the file @file{y.tab.h} containing
   1877 definitions of all the @samp{%tokens} appearing in the @code{yacc} input.
   1878 This file is then included in the @code{flex} scanner.  For
   1879 example, if one of the tokens is "TOK_NUMBER", part of the
   1880 scanner might look like:
   1881 
   1882 @example
   1883 %@{
   1884 #include "y.tab.h"
   1885 %@}
   1886 
   1887 %%
   1888 
   1889 [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
   1890 @end example
   1891 
   1892 @node Options, Performance, YACC interface, Top
   1893 @section Options
   1894 @code{flex} has the following options:
   1895 
   1896 @table @samp
   1897 @item -b
   1898 Generate backing-up information to @file{lex.backup}.
   1899 This is a list of scanner states which require
   1900 backing up and the input characters on which they
   1901 do so.  By adding rules one can remove backing-up
   1902 states.  If @emph{all} backing-up states are eliminated
   1903 and @samp{-Cf} or @samp{-CF} is used, the generated scanner will
   1904 run faster (see the @samp{-p} flag).  Only users who wish
   1905 to squeeze every last cycle out of their scanners
   1906 need worry about this option.  (See the section on
   1907 Performance Considerations below.)
   1908 
   1909 @item -c
   1910 is a do-nothing, deprecated option included for
   1911 POSIX compliance.
   1912 
   1913 @item -d
   1914 makes the generated scanner run in @dfn{debug} mode.
   1915 Whenever a pattern is recognized and the global
   1916 @code{yy_flex_debug} is non-zero (which is the default),
   1917 the scanner will write to @code{stderr} a line of the
   1918 form:
   1919 
   1920 @example
   1921 --accepting rule at line 53 ("the matched text")
   1922 @end example
   1923 
   1924 The line number refers to the location of the rule
   1925 in the file defining the scanner (i.e., the file
   1926 that was fed to flex).  Messages are also generated
   1927 when the scanner backs up, accepts the default
   1928 rule, reaches the end of its input buffer (or
   1929 encounters a NUL; at this point, the two look the
   1930 same as far as the scanner's concerned), or reaches
   1931 an end-of-file.
   1932 
   1933 @item -f
   1934 specifies @dfn{fast scanner}.  No table compression is
   1935 done and stdio is bypassed.  The result is large
   1936 but fast.  This option is equivalent to @samp{-Cfr} (see
   1937 below).
   1938 
   1939 @item -h
generates a "help" summary of @code{flex}'s options to
   1941 @code{stdout} and then exits.  @samp{-?} and @samp{--help} are synonyms
   1942 for @samp{-h}.
   1943 
   1944 @item -i
   1945 instructs @code{flex} to generate a @emph{case-insensitive}
   1946 scanner.  The case of letters given in the @code{flex} input
   1947 patterns will be ignored, and tokens in the input
   1948 will be matched regardless of case.  The matched
text given in @code{yytext} will preserve the case of the input
   1950 (i.e., it will not be folded).
   1951 
   1952 @item -l
   1953 turns on maximum compatibility with the original
   1954 AT&T @code{lex} implementation.  Note that this does not
   1955 mean @emph{full} compatibility.  Use of this option costs
   1956 a considerable amount of performance, and it cannot
be used with the @samp{-+}, @samp{-f}, @samp{-F}, @samp{-Cf}, or @samp{-CF} options.
   1958 For details on the compatibilities it provides, see
   1959 the section "Incompatibilities With Lex And POSIX"
   1960 below.  This option also results in the name
   1961 @code{YY_FLEX_LEX_COMPAT} being #define'd in the generated
   1962 scanner.
   1963 
   1964 @item -n
   1965 is another do-nothing, deprecated option included
   1966 only for POSIX compliance.
   1967 
   1968 @item -p
   1969 generates a performance report to stderr.  The
   1970 report consists of comments regarding features of
   1971 the @code{flex} input file which will cause a serious loss
   1972 of performance in the resulting scanner.  If you
   1973 give the flag twice, you will also get comments
   1974 regarding features that lead to minor performance
   1975 losses.
   1976 
   1977 Note that the use of @code{REJECT}, @samp{%option yylineno} and
   1978 variable trailing context (see the Deficiencies / Bugs section below)
   1979 entails a substantial performance penalty; use of @samp{yymore()},
   1980 the @samp{^} operator, and the @samp{-I} flag entail minor performance
   1981 penalties.
   1982 
   1983 @item -s
   1984 causes the @dfn{default rule} (that unmatched scanner
   1985 input is echoed to @code{stdout}) to be suppressed.  If
   1986 the scanner encounters input that does not match
   1987 any of its rules, it aborts with an error.  This
   1988 option is useful for finding holes in a scanner's
   1989 rule set.
   1990 
   1991 @item -t
   1992 instructs @code{flex} to write the scanner it generates to
   1993 standard output instead of @file{lex.yy.c}.
   1994 
   1995 @item -v
   1996 specifies that @code{flex} should write to @code{stderr} a
   1997 summary of statistics regarding the scanner it
   1998 generates.  Most of the statistics are meaningless to
   1999 the casual @code{flex} user, but the first line identifies
   2000 the version of @code{flex} (same as reported by @samp{-V}), and
   2001 the next line the flags used when generating the
   2002 scanner, including those that are on by default.
   2003 
   2004 @item -w
   2005 suppresses warning messages.
   2006 
   2007 @item -B
   2008 instructs @code{flex} to generate a @emph{batch} scanner, the
   2009 opposite of @emph{interactive} scanners generated by @samp{-I}
   2010 (see below).  In general, you use @samp{-B} when you are
   2011 @emph{certain} that your scanner will never be used
   2012 interactively, and you want to squeeze a @emph{little} more
   2013 performance out of it.  If your goal is instead to
   2014 squeeze out a @emph{lot} more performance, you should be
   2015 using the @samp{-Cf} or @samp{-CF} options (discussed below),
   2016 which turn on @samp{-B} automatically anyway.
   2017 
   2018 @item -F
   2019 specifies that the @dfn{fast} scanner table
   2020 representation should be used (and stdio bypassed).  This
   2021 representation is about as fast as the full table
representation (@samp{-f}), and for some sets of patterns
   2023 will be considerably smaller (and for others,
   2024 larger).  In general, if the pattern set contains
   2025 both "keywords" and a catch-all, "identifier" rule,
   2026 such as in the set:
   2027 
   2028 @example
   2029 "case"    return TOK_CASE;
   2030 "switch"  return TOK_SWITCH;
   2031 ...
   2032 "default" return TOK_DEFAULT;
   2033 [a-z]+    return TOK_ID;
   2034 @end example
   2035 
   2036 @noindent
   2037 then you're better off using the full table
   2038 representation.  If only the "identifier" rule is
   2039 present and you then use a hash table or some such to
   2040 detect the keywords, you're better off using @samp{-F}.
   2041 
   2042 This option is equivalent to @samp{-CFr} (see below).  It
   2043 cannot be used with @samp{-+}.
   2044 
   2045 @item -I
   2046 instructs @code{flex} to generate an @emph{interactive} scanner.
   2047 An interactive scanner is one that only looks ahead
   2048 to decide what token has been matched if it
   2049 absolutely must.  It turns out that always looking one
   2050 extra character ahead, even if the scanner has
   2051 already seen enough text to disambiguate the
   2052 current token, is a bit faster than only looking ahead
   2053 when necessary.  But scanners that always look
   2054 ahead give dreadful interactive performance; for
   2055 example, when a user types a newline, it is not
   2056 recognized as a newline token until they enter
   2057 @emph{another} token, which often means typing in another
   2058 whole line.
   2059 
   2060 @code{Flex} scanners default to @emph{interactive} unless you use
   2061 the @samp{-Cf} or @samp{-CF} table-compression options (see
   2062 below).  That's because if you're looking for
   2063 high-performance you should be using one of these
   2064 options, so if you didn't, @code{flex} assumes you'd
   2065 rather trade off a bit of run-time performance for
   2066 intuitive interactive behavior.  Note also that you
   2067 @emph{cannot} use @samp{-I} in conjunction with @samp{-Cf} or @samp{-CF}.
   2068 Thus, this option is not really needed; it is on by
   2069 default for all those cases in which it is allowed.
   2070 
   2071 You can force a scanner to @emph{not} be interactive by
   2072 using @samp{-B} (see above).
   2073 
   2074 @item -L
   2075 instructs @code{flex} not to generate @samp{#line} directives.
   2076 Without this option, @code{flex} peppers the generated
   2077 scanner with #line directives so error messages in
   2078 the actions will be correctly located with respect
   2079 to either the original @code{flex} input file (if the
   2080 errors are due to code in the input file), or
@file{lex.yy.c} (if the errors are @code{flex}'s fault -- you
   2082 should report these sorts of errors to the email
   2083 address given below).
   2084 
   2085 @item -T
makes @code{flex} run in @dfn{trace} mode.  It will generate a
   2087 lot of messages to @code{stderr} concerning the form of
   2088 the input and the resultant non-deterministic and
   2089 deterministic finite automata.  This option is
   2090 mostly for use in maintaining @code{flex}.
   2091 
   2092 @item -V
   2093 prints the version number to @code{stdout} and exits.
   2094 @samp{--version} is a synonym for @samp{-V}.
   2095 
   2096 @item -7
   2097 instructs @code{flex} to generate a 7-bit scanner, i.e.,
one which can only recognize 7-bit characters in
   2099 its input.  The advantage of using @samp{-7} is that the
   2100 scanner's tables can be up to half the size of
   2101 those generated using the @samp{-8} option (see below).
   2102 The disadvantage is that such scanners often hang
   2103 or crash if their input contains an 8-bit
   2104 character.
   2105 
   2106 Note, however, that unless you generate your
   2107 scanner using the @samp{-Cf} or @samp{-CF} table compression options,
   2108 use of @samp{-7} will save only a small amount of table
   2109 space, and make your scanner considerably less
portable.  @code{Flex}'s default behavior is to generate
an 8-bit scanner unless you use @samp{-Cf} or @samp{-CF}, in
   2112 which case @code{flex} defaults to generating 7-bit
   2113 scanners unless your site was always configured to
   2114 generate 8-bit scanners (as will often be the case
   2115 with non-USA sites).  You can tell whether flex
   2116 generated a 7-bit or an 8-bit scanner by inspecting
   2117 the flag summary in the @samp{-v} output as described
   2118 above.
   2119 
   2120 Note that if you use @samp{-Cfe} or @samp{-CFe} (those table
   2121 compression options, but also using equivalence
classes as discussed below), flex still
   2123 defaults to generating an 8-bit scanner, since
   2124 usually with these compression options full 8-bit
   2125 tables are not much more expensive than 7-bit
   2126 tables.
   2127 
   2128 @item -8
   2129 instructs @code{flex} to generate an 8-bit scanner, i.e.,
   2130 one which can recognize 8-bit characters.  This
   2131 flag is only needed for scanners generated using
   2132 @samp{-Cf} or @samp{-CF}, as otherwise flex defaults to
   2133 generating an 8-bit scanner anyway.
   2134 
   2135 See the discussion of @samp{-7} above for flex's default
   2136 behavior and the tradeoffs between 7-bit and 8-bit
   2137 scanners.
   2138 
   2139 @item -+
   2140 specifies that you want flex to generate a C++
   2141 scanner class.  See the section on Generating C++
   2142 Scanners below for details.
   2143 
   2144 @item -C[aefFmr]
   2145 controls the degree of table compression and, more
   2146 generally, trade-offs between small scanners and
   2147 fast scanners.
   2148 
   2149 @samp{-Ca} ("align") instructs flex to trade off larger
   2150 tables in the generated scanner for faster
   2151 performance because the elements of the tables are better
   2152 aligned for memory access and computation.  On some
   2153 RISC architectures, fetching and manipulating
   2154 long-words is more efficient than with smaller-sized
   2155 units such as shortwords.  This option can double
   2156 the size of the tables used by your scanner.
   2157 
   2158 @samp{-Ce} directs @code{flex} to construct @dfn{equivalence classes},
   2159 i.e., sets of characters which have identical
   2160 lexical properties (for example, if the only appearance
   2161 of digits in the @code{flex} input is in the character
   2162 class "[0-9]" then the digits '0', '1', @dots{}, '9'
   2163 will all be put in the same equivalence class).
   2164 Equivalence classes usually give dramatic
   2165 reductions in the final table/object file sizes
   2166 (typically a factor of 2-5) and are pretty cheap
   2167 performance-wise (one array look-up per character
   2168 scanned).
   2169 
   2170 @samp{-Cf} specifies that the @emph{full} scanner tables should
   2171 be generated - @code{flex} should not compress the tables
by taking advantage of similar transition
   2173 functions for different states.
   2174 
   2175 @samp{-CF} specifies that the alternate fast scanner
   2176 representation (described above under the @samp{-F} flag)
   2177 should be used.  This option cannot be used with
   2178 @samp{-+}.
   2179 
   2180 @samp{-Cm} directs @code{flex} to construct @dfn{meta-equivalence
   2181 classes}, which are sets of equivalence classes (or
   2182 characters, if equivalence classes are not being
   2183 used) that are commonly used together.
   2184 Meta-equivalence classes are often a big win when using
   2185 compressed tables, but they have a moderate
   2186 performance impact (one or two "if" tests and one array
   2187 look-up per character scanned).
   2188 
   2189 @samp{-Cr} causes the generated scanner to @emph{bypass} use of
   2190 the standard I/O library (stdio) for input.
   2191 Instead of calling @samp{fread()} or @samp{getc()}, the scanner
   2192 will use the @samp{read()} system call, resulting in a
   2193 performance gain which varies from system to
   2194 system, but in general is probably negligible unless
   2195 you are also using @samp{-Cf} or @samp{-CF}.  Using @samp{-Cr} can cause
   2196 strange behavior if, for example, you read from
   2197 @code{yyin} using stdio prior to calling the scanner
   2198 (because the scanner will miss whatever text your
   2199 previous reads left in the stdio input buffer).
   2200 
   2201 @samp{-Cr} has no effect if you define @code{YY_INPUT} (see The
   2202 Generated Scanner above).
   2203 
   2204 A lone @samp{-C} specifies that the scanner tables should
   2205 be compressed but neither equivalence classes nor
   2206 meta-equivalence classes should be used.
   2207 
   2208 The options @samp{-Cf} or @samp{-CF} and @samp{-Cm} do not make sense
   2209 together - there is no opportunity for
   2210 meta-equivalence classes if the table is not being
   2211 compressed.  Otherwise the options may be freely
   2212 mixed, and are cumulative.
   2213 
   2214 The default setting is @samp{-Cem}, which specifies that
   2215 @code{flex} should generate equivalence classes and
   2216 meta-equivalence classes.  This setting provides the
   2217 highest degree of table compression.  You can trade
   2218 off faster-executing scanners at the cost of larger
   2219 tables with the following generally being true:
   2220 
   2221 @example
   2222 slowest & smallest
   2223       -Cem
   2224       -Cm
   2225       -Ce
   2226       -C
   2227       -C@{f,F@}e
   2228       -C@{f,F@}
   2229       -C@{f,F@}a
   2230 fastest & largest
   2231 @end example
   2232 
   2233 Note that scanners with the smallest tables are
   2234 usually generated and compiled the quickest, so
   2235 during development you will usually want to use the
   2236 default, maximal compression.
   2237 
   2238 @samp{-Cfe} is often a good compromise between speed and
   2239 size for production scanners.
   2240 
   2241 @item -ooutput
directs flex to write the scanner to the file @file{output}
instead of @file{lex.yy.c}.  If you combine @samp{-o} with
the @samp{-t} option, then the scanner is written to
@code{stdout} but its @samp{#line} directives (see the @samp{-L} option
above) refer to the file @file{output}.
   2247 
   2248 @item -Pprefix
   2249 changes the default @samp{yy} prefix used by @code{flex} for all
   2250 globally-visible variable and function names to
   2251 instead be @var{prefix}.  For example, @samp{-Pfoo} changes the
name of @code{yytext} to @code{footext}.  It also changes the
   2253 name of the default output file from @file{lex.yy.c} to
   2254 @file{lex.foo.c}.  Here are all of the names affected:
   2255 
   2256 @example
   2257 yy_create_buffer
   2258 yy_delete_buffer
   2259 yy_flex_debug
   2260 yy_init_buffer
   2261 yy_flush_buffer
   2262 yy_load_buffer_state
   2263 yy_switch_to_buffer
   2264 yyin
   2265 yyleng
   2266 yylex
   2267 yylineno
   2268 yyout
   2269 yyrestart
   2270 yytext
   2271 yywrap
   2272 @end example
   2273 
   2274 (If you are using a C++ scanner, then only @code{yywrap}
   2275 and @code{yyFlexLexer} are affected.) Within your scanner
   2276 itself, you can still refer to the global variables
   2277 and functions using either version of their name;
   2278 but externally, they have the modified name.
   2279 
   2280 This option lets you easily link together multiple
   2281 @code{flex} programs into the same executable.  Note,
   2282 though, that using this option also renames
   2283 @samp{yywrap()}, so you now @emph{must} either provide your own
   2284 (appropriately-named) version of the routine for
   2285 your scanner, or use @samp{%option noyywrap}, as linking
   2286 with @samp{-lfl} no longer provides one for you by
   2287 default.
   2288 
   2289 @item -Sskeleton_file
   2290 overrides the default skeleton file from which @code{flex}
   2291 constructs its scanners.  You'll never need this
   2292 option unless you are doing @code{flex} maintenance or
   2293 development.
   2294 @end table
   2295 
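To make the effect of @samp{-P} concrete, here is a sketch of a
scanner meant to be linked alongside another one.  The file names, the
prefix @samp{cfg}, and the routine @code{scan_config} are all
hypothetical; the scanner is assumed to be generated with
@samp{-Pcfg}, and @samp{%option noyywrap} is used so that no
@samp{cfgwrap()} has to be written.

@example
/* config.l -- assumed to be processed with "flex -Pcfg" */
%option noyywrap
%%
[a-z]+      printf( "key: %s\n", yytext );
.|\n        /* ignore */
@end example

Code outside the scanner then has to use the renamed symbols:

@example
/* main.c -- refers to the scanner by its external names */
#include <stdio.h>

extern FILE *cfgin;             /* yyin under the new prefix */
extern int cfglex( void );      /* yylex() under the new prefix */

int scan_config( const char *path )
    @{
    cfgin = fopen( path, "r" );

    if ( ! cfgin )
        return -1;

    cfglex();
    fclose( cfgin );
    return 0;
    @}
@end example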
   2296 @code{flex} also provides a mechanism for controlling options
   2297 within the scanner specification itself, rather than from
   2298 the flex command-line.  This is done by including @samp{%option}
   2299 directives in the first section of the scanner
   2300 specification.  You can specify multiple options with a single
   2301 @samp{%option} directive, and multiple directives in the first
   2302 section of your flex input file.  Most options are given
   2303 simply as names, optionally preceded by the word "no"
   2304 (with no intervening whitespace) to negate their meaning.
   2305 A number are equivalent to flex flags or their negation:
   2306 
   2307 @example
   2308 7bit            -7 option
   2309 8bit            -8 option
   2310 align           -Ca option
   2311 backup          -b option
   2312 batch           -B option
   2313 c++             -+ option
   2314 
   2315 caseful or
   2316 case-sensitive  opposite of -i (default)
   2317 
   2318 case-insensitive or
   2319 caseless        -i option
   2320 
   2321 debug           -d option
   2322 default         opposite of -s option
   2323 ecs             -Ce option
   2324 fast            -F option
   2325 full            -f option
   2326 interactive     -I option
   2327 lex-compat      -l option
   2328 meta-ecs        -Cm option
   2329 perf-report     -p option
   2330 read            -Cr option
   2331 stdout          -t option
   2332 verbose         -v option
   2333 warn            opposite of -w option
   2334                 (use "%option nowarn" for -w)
   2335 
   2336 array           equivalent to "%array"
   2337 pointer         equivalent to "%pointer" (default)
   2338 @end example
   2339 
Some @samp{%option}s provide features otherwise not available:
   2341 
   2342 @table @samp
   2343 @item always-interactive
   2344 instructs flex to generate a scanner which always
   2345 considers its input "interactive".  Normally, on
   2346 each new input file the scanner calls @samp{isatty()} in
   2347 an attempt to determine whether the scanner's input
   2348 source is interactive and thus should be read a
   2349 character at a time.  When this option is used,
   2350 however, then no such call is made.
   2351 
   2352 @item main
   2353 directs flex to provide a default @samp{main()} program
   2354 for the scanner, which simply calls @samp{yylex()}.  This
   2355 option implies @code{noyywrap} (see below).
   2356 
   2357 @item never-interactive
   2358 instructs flex to generate a scanner which never
   2359 considers its input "interactive" (again, no call
   2360 made to @samp{isatty()}).  This is the opposite of
   2361 @samp{always-interactive}.
   2362 
   2363 @item stack
   2364 enables the use of start condition stacks (see
   2365 Start Conditions above).
   2366 
   2367 @item stdinit
   2368 if unset (i.e., @samp{%option nostdinit}) initializes @code{yyin}
   2369 and @code{yyout} to nil @code{FILE} pointers, instead of @code{stdin}
   2370 and @code{stdout}.
   2371 
   2372 @item yylineno
   2373 directs @code{flex} to generate a scanner that maintains the number
   2374 of the current line read from its input in the global variable
   2375 @code{yylineno}.  This option is implied by @samp{%option lex-compat}.
   2376 
   2377 @item yywrap
   2378 if unset (i.e., @samp{%option noyywrap}), makes the
   2379 scanner not call @samp{yywrap()} upon an end-of-file, but
   2380 simply assume that there are no more files to scan
   2381 (until the user points @code{yyin} at a new file and calls
   2382 @samp{yylex()} again).
   2383 @end table
   2384 
   2385 @code{flex} scans your rule actions to determine whether you use
   2386 the @code{REJECT} or @samp{yymore()} features.  The @code{reject} and @code{yymore}
   2387 options are available to override its decision as to
   2388 whether you use the features, either by setting them (e.g.,
   2389 @samp{%option reject}) to indicate the feature is indeed used, or
   2390 unsetting them to indicate it actually is not used (e.g.,
   2391 @samp{%option noyymore}).
   2392 
   2393 Three options take string-delimited values, offset with '=':
   2394 
   2395 @example
   2396 %option outfile="ABC"
   2397 @end example
   2398 
   2399 @noindent
   2400 is equivalent to @samp{-oABC}, and
   2401 
   2402 @example
   2403 %option prefix="XYZ"
   2404 @end example
   2405 
   2406 @noindent
   2407 is equivalent to @samp{-PXYZ}.
   2408 
   2409 Finally,
   2410 
   2411 @example
   2412 %option yyclass="foo"
   2413 @end example
   2414 
   2415 @noindent
   2416 only applies when generating a C++ scanner (@samp{-+} option).  It
   2417 informs @code{flex} that you have derived @samp{foo} as a subclass of
   2418 @code{yyFlexLexer} so @code{flex} will place your actions in the member
   2419 function @samp{foo::yylex()} instead of @samp{yyFlexLexer::yylex()}.
   2420 It also generates a @samp{yyFlexLexer::yylex()} member function that
   2421 emits a run-time error (by invoking @samp{yyFlexLexer::LexerError()})
   2422 if called.  See Generating C++ Scanners, below, for additional
   2423 information.
   2424 
   2425 A number of options are available for lint purists who
   2426 want to suppress the appearance of unneeded routines in
   2427 the generated scanner.  Each of the following, if unset,
   2428 results in the corresponding routine not appearing in the
   2429 generated scanner:
   2430 
   2431 @example
   2432 input, unput
   2433 yy_push_state, yy_pop_state, yy_top_state
   2434 yy_scan_buffer, yy_scan_bytes, yy_scan_string
   2435 @end example
   2436 
   2437 @noindent
   2438 (though @samp{yy_push_state()} and friends won't appear anyway
   2439 unless you use @samp{%option stack}).
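
For instance, a scanner whose actions never call @samp{unput()} or
@samp{input()} might (as an illustrative sketch) suppress both routines
with:

@example
%option nounput noinput
@end example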
   2440 
   2441 @node Performance, C++, Options, Top
   2442 @section Performance considerations
   2443 
   2444 The main design goal of @code{flex} is that it generate
   2445 high-performance scanners.  It has been optimized for dealing
   2446 well with large sets of rules.  Aside from the effects on
   2447 scanner speed of the table compression @samp{-C} options outlined
   2448 above, there are a number of options/actions which degrade
   2449 performance.  These are, from most expensive to least:
   2450 
   2451 @example
   2452 REJECT
   2453 %option yylineno
   2454 arbitrary trailing context
   2455 
   2456 pattern sets that require backing up
   2457 %array
   2458 %option interactive
   2459 %option always-interactive
   2460 
   2461 '^' beginning-of-line operator
   2462 yymore()
   2463 @end example
   2464 
   2465 with the first three all being quite expensive and the
   2466 last two being quite cheap.  Note also that @samp{unput()} is
   2467 implemented as a routine call that potentially does quite
   2468 a bit of work, while @samp{yyless()} is a quite-cheap macro; so
   2469 if just putting back some excess text you scanned, use
   2470 @samp{yyless()}.
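
As a sketch of that advice (the pattern and the count of 3 are chosen
purely for illustration, and @code{TOK_KEYWORD} is assumed to be
defined elsewhere), an action that has matched "foobar" but wants to
keep only "foo" can return the rest to the input with:

@example
%%
foobar    @{
          /* keep "foo"; "bar" will be rescanned */
          yyless( 3 );
          return TOK_KEYWORD;
          @}
@end example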
   2471 
   2472 @code{REJECT} should be avoided at all costs when performance is
   2473 important.  It is a particularly expensive feature.
   2474 
   2475 Getting rid of backing up is messy and often may be an
   2476 enormous amount of work for a complicated scanner.  In
   2477 principle, one begins by using the @samp{-b} flag to generate a
   2478 @file{lex.backup} file.  For example, on the input
   2479 
   2480 @example
   2481 %%
   2482 foo        return TOK_KEYWORD;
   2483 foobar     return TOK_KEYWORD;
   2484 @end example
   2485 
   2486 @noindent
   2487 the file looks like:
   2488 
   2489 @example
   2490 State #6 is non-accepting -
   2491  associated rule line numbers:
   2492        2       3
   2493  out-transitions: [ o ]
   2494  jam-transitions: EOF [ \001-n  p-\177 ]
   2495 
   2496 State #8 is non-accepting -
   2497  associated rule line numbers:
   2498        3
   2499  out-transitions: [ a ]
   2500  jam-transitions: EOF [ \001-`  b-\177 ]
   2501 
   2502 State #9 is non-accepting -
   2503  associated rule line numbers:
   2504        3
   2505  out-transitions: [ r ]
   2506  jam-transitions: EOF [ \001-q  s-\177 ]
   2507 
   2508 Compressed tables always back up.
   2509 @end example
   2510 
   2511 The first few lines tell us that there's a scanner state
   2512 in which it can make a transition on an 'o' but not on any
   2513 other character, and that in that state the currently
   2514 scanned text does not match any rule.  The state occurs
   2515 when trying to match the rules found at lines 2 and 3 in
   2516 the input file.  If the scanner is in that state and then
   2517 reads something other than an 'o', it will have to back up
   2518 to find a rule which is matched.  With a bit of
   2519 head-scratching one can see that this must be the state it's in
   2520 when it has seen "fo".  When this has happened, if
   2521 anything other than another 'o' is seen, the scanner will
   2522 have to back up to simply match the 'f' (by the default
   2523 rule).
   2524 
   2525 The comment regarding State #8 indicates there's a problem
   2526 when "foob" has been scanned.  Indeed, on any character
   2527 other than an 'a', the scanner will have to back up to
   2528 accept "foo".  Similarly, the comment for State #9
   2529 concerns when "fooba" has been scanned and an 'r' does not
   2530 follow.
   2531 
   2532 The final comment reminds us that there's no point going
   2533 to all the trouble of removing backing up from the rules
   2534 unless we're using @samp{-Cf} or @samp{-CF}, since there's no
   2535 performance gain doing so with compressed scanners.
   2536 
   2537 The way to remove the backing up is to add "error" rules:
   2538 
   2539 @example
   2540 %%
   2541 foo         return TOK_KEYWORD;
   2542 foobar      return TOK_KEYWORD;
   2543 
   2544 fooba       |
   2545 foob        |
   2546 fo          @{
   2547             /* false alarm, not really a keyword */
   2548             return TOK_ID;
   2549             @}
   2550 @end example
   2551 
   2552 Eliminating backing up among a list of keywords can also
   2553 be done using a "catch-all" rule:
   2554 
   2555 @example
   2556 %%
   2557 foo         return TOK_KEYWORD;
   2558 foobar      return TOK_KEYWORD;
   2559 
   2560 [a-z]+      return TOK_ID;
   2561 @end example
   2562 
   2563 This is usually the best solution when appropriate.
   2564 
   2565 Backing up messages tend to cascade.  With a complicated
   2566 set of rules it's not uncommon to get hundreds of
   2567 messages.  If one can decipher them, though, it often only
   2568 takes a dozen or so rules to eliminate the backing up
   2569 (though it's easy to make a mistake and have an error rule
   2570 accidentally match a valid token.  A possible future @code{flex}
   2571 feature will be to automatically add rules to eliminate
   2572 backing up).
   2573 
   2574 It's important to keep in mind that you gain the benefits
   2575 of eliminating backing up only if you eliminate @emph{every}
   2576 instance of backing up.  Leaving just one means you gain
   2577 nothing.
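
One way to check progress, sketched here with a hypothetical input
file name @file{scanner.l}, is to rerun @code{flex} with @samp{-b}
after each change and inspect @file{lex.backup} again:

@example
flex -b -Cf scanner.l
@end example

@noindent
The @samp{-Cf} flag is included only because, as noted above, removing
backing up pays off only for @samp{-Cf} or @samp{-CF} scanners.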
   2578 
   2579 @emph{Variable} trailing context (where both the leading and
   2580 trailing parts do not have a fixed length) entails almost
   2581 the same performance loss as @code{REJECT} (i.e., substantial).
   2582 So when possible a rule like:
   2583 
   2584 @example
   2585 %%
   2586 mouse|rat/(cat|dog)   run();
   2587 @end example
   2588 
   2589 @noindent
   2590 is better written:
   2591 
   2592 @example
   2593 %%
   2594 mouse/cat|dog         run();
   2595 rat/cat|dog           run();
   2596 @end example
   2597 
   2598 @noindent
   2599 or as
   2600 
   2601 @example
   2602 %%
   2603 mouse|rat/cat         run();
   2604 mouse|rat/dog         run();
   2605 @end example
   2606 
   2607 Note that here the special '|' action does @emph{not} provide any
   2608 savings, and can even make things worse (see Deficiencies
   2609 / Bugs below).
   2610 
   2611 Another area where the user can increase a scanner's
   2612 performance (and one that's easier to implement) arises from
   2613 the fact that the longer the tokens matched, the faster
   2614 the scanner will run.  This is because with long tokens
   2615 the processing of most input characters takes place in the
   2616 (short) inner scanning loop, and does not often have to go
   2617 through the additional work of setting up the scanning
   2618 environment (e.g., @code{yytext}) for the action.  Recall the
   2619 scanner for C comments:
   2620 
   2621 @example
   2622 %x comment
   2623 %%
   2624         int line_num = 1;
   2625 
   2626 "/*"         BEGIN(comment);
   2627 
   2628 <comment>[^*\n]*
   2629 <comment>"*"+[^*/\n]*
   2630 <comment>\n             ++line_num;
   2631 <comment>"*"+"/"        BEGIN(INITIAL);
   2632 @end example
   2633 
   2634 This could be sped up by writing it as:
   2635 
   2636 @example
   2637 %x comment
   2638 %%
   2639         int line_num = 1;
   2640 
   2641 "/*"         BEGIN(comment);
   2642 
   2643 <comment>[^*\n]*
   2644 <comment>[^*\n]*\n      ++line_num;
   2645 <comment>"*"+[^*/\n]*
   2646 <comment>"*"+[^*/\n]*\n ++line_num;
   2647 <comment>"*"+"/"        BEGIN(INITIAL);
   2648 @end example
   2649 
   2650 Now instead of each newline requiring the processing of
   2651 another action, recognizing the newlines is "distributed"
   2652 over the other rules to keep the matched text as long as
   2653 possible.  Note that @emph{adding} rules does @emph{not} slow down the
   2654 scanner!  The speed of the scanner is independent of the
   2655 number of rules or (modulo the considerations given at the
   2656 beginning of this section) how complicated the rules are
   2657 with regard to operators such as '*' and '|'.
   2658 
   2659 A final example in speeding up a scanner: suppose you want
   2660 to scan through a file containing identifiers and
   2661 keywords, one per line and with no other extraneous
   2662 characters, and recognize all the keywords.  A natural first
   2663 approach is:
   2664 
   2665 @example
   2666 %%
   2667 asm      |
   2668 auto     |
   2669 break    |
   2670 @dots{} etc @dots{}
   2671 volatile |
   2672 while    /* it's a keyword */
   2673 
   2674 .|\n     /* it's not a keyword */
   2675 @end example
   2676 
   2677 To eliminate the backing up, introduce a catch-all
   2678 rule:
   2679 
   2680 @example
   2681 %%
   2682 asm      |
   2683 auto     |
   2684 break    |
   2685 @dots{} etc @dots{}
   2686 volatile |
   2687 while    /* it's a keyword */
   2688 
   2689 [a-z]+   |
   2690 .|\n     /* it's not a keyword */
   2691 @end example
   2692 
   2693 Now, if it's guaranteed that there's exactly one word per
   2694 line, then we can reduce the total number of matches by a
   2695 half by merging in the recognition of newlines with that
   2696 of the other tokens:
   2697 
   2698 @example
   2699 %%
   2700 asm\n    |
   2701 auto\n   |
   2702 break\n  |
   2703 @dots{} etc @dots{}
   2704 volatile\n |
   2705 while\n  /* it's a keyword */
   2706 
   2707 [a-z]+\n |
   2708 .|\n     /* it's not a keyword */
   2709 @end example
   2710 
   2711 One has to be careful here, as we have now reintroduced
   2712 backing up into the scanner.  In particular, while @emph{we} know
   2713 that there will never be any characters in the input
   2714 stream other than letters or newlines, @code{flex} can't figure
   2715 this out, and it will plan for possibly needing to back up
   2716 when it has scanned a token like "auto" and then the next
   2717 character is something other than a newline or a letter.
   2718 Previously it would then just match the "auto" rule and be
   2719 done, but now it has no "auto" rule, only an "auto\n" rule.
   2720 To eliminate the possibility of backing up, we could
   2721 either duplicate all rules but without final newlines, or,
   2722 since we never expect to encounter such an input and
   2723 therefore don't care how it's classified, we can introduce one
   2724 more catch-all rule, this one which doesn't include a
   2725 newline:
   2726 
   2727 @example
   2728 %%
   2729 asm\n    |
   2730 auto\n   |
   2731 break\n  |
   2732 @dots{} etc @dots{}
   2733 volatile\n |
   2734 while\n  /* it's a keyword */
   2735 
   2736 [a-z]+\n |
   2737 [a-z]+   |
   2738 .|\n     /* it's not a keyword */
   2739 @end example
   2740 
   2741 Compiled with @samp{-Cf}, this is about as fast as one can get a
   2742 @code{flex} scanner to go for this particular problem.
   2743 
   2744 A final note: @code{flex} is slow when matching NUL's,
   2745 particularly when a token contains multiple NUL's.  It's best to
   2746 write rules which match @emph{short} amounts of text if it's
   2747 anticipated that the text will often include NUL's.
   2748 
   2749 Another final note regarding performance: as mentioned
   2750 above in the section How the Input is Matched, dynamically
   2751 resizing @code{yytext} to accommodate huge tokens is a slow
   2752 process because it presently requires that the (huge) token
   2753 be rescanned from the beginning.  Thus if performance is
   2754 vital, you should attempt to match "large" quantities of
   2755 text but not "huge" quantities, where the cutoff between
   2756 the two is at about 8K characters/token.
   2757 
   2758 @node C++, Incompatibilities, Performance, Top
   2759 @section Generating C++ scanners
   2760 
   2761 @code{flex} provides two different ways to generate scanners for
   2762 use with C++.  The first way is to simply compile a
   2763 scanner generated by @code{flex} using a C++ compiler instead of a C
   2764 compiler.  You should not encounter any compilation
   2765 errors (please report any you find to the email address
   2766 given in the Author section below).  You can then use C++
   2767 code in your rule actions instead of C code.  Note that
   2768 the default input source for your scanner remains @code{yyin},
   2769 and default echoing is still done to @code{yyout}.  Both of these
   2770 remain @samp{FILE *} variables and not C++ @code{streams}.
   2771 
   2772 You can also use @code{flex} to generate a C++ scanner class, using
   2773 the @samp{-+} option (or, equivalently, @samp{%option c++}), which
   2774 is automatically specified if the name of the flex executable ends
   2775 in a @samp{+}, such as @code{flex++}.  When using this option, flex
   2776 defaults to generating the scanner to the file @file{lex.yy.cc} instead
   2777 of @file{lex.yy.c}.  The generated scanner includes the header file
   2778 @file{FlexLexer.h}, which defines the interface to two C++ classes.
   2779 
   2780 The first class, @code{FlexLexer}, provides an abstract base
   2781 class defining the general scanner class interface.  It
   2782 provides the following member functions:
   2783 
   2784 @table @samp
   2785 @item const char* YYText()
   2786 returns the text of the most recently matched
   2787 token, the equivalent of @code{yytext}.
   2788 
   2789 @item int YYLeng()
   2790 returns the length of the most recently matched
   2791 token, the equivalent of @code{yyleng}.
   2792 
   2793 @item int lineno() const
   2794 returns the current input line number (see @samp{%option yylineno}),
   2795 or 1 if @samp{%option yylineno} was not used.
   2796 
   2797 @item void set_debug( int flag )
   2798 sets the debugging flag for the scanner, equivalent to assigning to
   2799 @code{yy_flex_debug} (see the Options section above).  Note that you
   2800 must build the scanner using @samp{%option debug} to include debugging
   2801 information in it.
   2802 
   2803 @item int debug() const
   2804 returns the current setting of the debugging flag.
   2805 @end table
   2806 
   2807 Also provided are member functions equivalent to
   2808 @samp{yy_switch_to_buffer()}, @samp{yy_create_buffer()} (though the
   2809 first argument is an @samp{istream*} object pointer and not a
   2810 @samp{FILE*}), @samp{yy_flush_buffer()}, @samp{yy_delete_buffer()},
   2811 and @samp{yyrestart()} (again, the first argument is an @samp{istream*}
   2812 object pointer).
   2813 
   2814 The second class defined in @file{FlexLexer.h} is @code{yyFlexLexer},
   2815 which is derived from @code{FlexLexer}.  It defines the following
   2816 additional member functions:
   2817 
   2818 @table @samp
   2819 @item yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )
   2820 constructs a @code{yyFlexLexer} object using the given
   2821 streams for input and output.  If not specified,
   2822 the streams default to @code{cin} and @code{cout}, respectively.
   2823 
   2824 @item virtual int yylex()
   2825 performs the same role as @samp{yylex()} does for ordinary
   2826 flex scanners: it scans the input stream, consuming
   2827 tokens, until a rule's action returns a value.  If you derive a subclass
   2828 @var{S}
   2829 from @code{yyFlexLexer}
   2830 and want to access the member functions and variables of
   2831 @var{S}
   2832 inside @samp{yylex()},
   2833 then you need to use @samp{%option yyclass="@var{S}"}
   2834 to inform @code{flex}
   2835 that you will be using that subclass instead of @code{yyFlexLexer}.
   2836 In this case, rather than generating @samp{yyFlexLexer::yylex()},
   2837 @code{flex} generates @samp{@var{S}::yylex()}
   2838 (and also generates a dummy @samp{yyFlexLexer::yylex()}
   2839 that calls @samp{yyFlexLexer::LexerError()}
   2840 if called).
   2841 
   2842 @item virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)
   2843 reassigns @code{yyin} to @code{new_in}
   2844 (if non-nil)
   2845 and @code{yyout} to @code{new_out}
   2846 (ditto), deleting the previous input buffer if @code{yyin}
   2847 is reassigned.
   2848 
   2849 @item int yylex( istream* new_in = 0, ostream* new_out = 0 )
   2850 first switches the input streams via @samp{switch_streams( new_in, new_out )}
   2851 and then returns the value of @samp{yylex()}.
   2852 @end table
   2853 
   2854 In addition, @code{yyFlexLexer} defines the following protected
   2855 virtual functions which you can redefine in derived
   2856 classes to tailor the scanner:
   2857 
   2858 @table @samp
   2859 @item virtual int LexerInput( char* buf, int max_size )
   2860 reads up to @samp{max_size} characters into @var{buf} and
   2861 returns the number of characters read.  To indicate
   2862 end-of-input, return 0 characters.  Note that
   2863 "interactive" scanners (see the @samp{-B} and @samp{-I} flags)
   2864 define the macro @code{YY_INTERACTIVE}.  If you redefine
   2865 @code{LexerInput()} and need to take different actions
   2866 depending on whether or not the scanner might be
   2867 scanning an interactive input source, you can test
   2868 for the presence of this name via @samp{#ifdef}.
   2869 
   2870 @item virtual void LexerOutput( const char* buf, int size )
   2871 writes out @var{size} characters from the buffer @var{buf},
   2872 which, while NUL-terminated, may also contain
   2873 "internal" NUL's if the scanner's rules can match
   2874 text with NUL's in them.
   2875 
   2876 @item virtual void LexerError( const char* msg )
   2877 reports a fatal error message.  The default version
   2878 of this function writes the message to the stream
   2879 @code{cerr} and exits.
   2880 @end table
   2881 
   2882 Note that a @code{yyFlexLexer} object contains its @emph{entire}
   2883 scanning state.  Thus you can use such objects to create
   2884 reentrant scanners.  You can instantiate multiple instances of
   2885 the same @code{yyFlexLexer} class, and you can also combine
   2886 multiple C++ scanner classes together in the same program
   2887 using the @samp{-P} option discussed above.
   2888 Finally, note that the @samp{%array} feature is not available to
   2889 C++ scanner classes; you must use @samp{%pointer} (the default).
   2890 
   2891 Here is an example of a simple C++ scanner:
   2892 
   2893 @example
   2894     // An example of using the flex C++ scanner class.
   2895 
   2896 %@{
   2897 int mylineno = 0;
   2898 %@}
   2899 
   2900 string  \"[^\n"]+\"
   2901 
   2902 ws      [ \t]+
   2903 
   2904 alpha   [A-Za-z]
   2905 dig     [0-9]
   2906 name    (@{alpha@}|@{dig@}|\$)(@{alpha@}|@{dig@}|[_.\-/$])*
   2907 num1    [-+]?@{dig@}+\.?([eE][-+]?@{dig@}+)?
   2908 num2    [-+]?@{dig@}*\.@{dig@}+([eE][-+]?@{dig@}+)?
   2909 number  @{num1@}|@{num2@}
   2910 
   2911 %%
   2912 
   2913 @{ws@}    /* skip blanks and tabs */
   2914 
   2915 "/*"    @{
   2916         int c;
   2917 
   2918         while((c = yyinput()) != 0)
   2919             @{
   2920             if(c == '\n')
   2921                 ++mylineno;
   2922 
   2923             else if(c == '*')
   2924                 @{
   2925                 if((c = yyinput()) == '/')
   2926                     break;
   2927                 else
   2928                     unput(c);
   2929                 @}
   2930             @}
   2931         @}
   2932 
   2933 @{number@}  cout << "number " << YYText() << '\n';
   2934 
   2935 \n        mylineno++;
   2936 
   2937 @{name@}    cout << "name " << YYText() << '\n';
   2938 
   2939 @{string@}  cout << "string " << YYText() << '\n';
   2940 
   2941 %%
   2942 
   2945 int main( int /* argc */, char** /* argv */ )
   2946     @{
   2947     FlexLexer* lexer = new yyFlexLexer;
   2948     while(lexer->yylex() != 0)
   2949         ;
   2950     return 0;
   2951     @}
   2952 @end example
   2953 
   2954 If you want to create multiple (different) lexer classes,
   2955 you use the @samp{-P} flag (or the @samp{prefix=} option) to rename each
   2956 @code{yyFlexLexer} to some other @code{xxFlexLexer}.  You then can
   2957 include @samp{<FlexLexer.h>} in your other sources once per lexer
   2958 class, first renaming @code{yyFlexLexer} as follows:
   2959 
   2960 @example
   2961 #undef yyFlexLexer
   2962 #define yyFlexLexer xxFlexLexer
   2963 #include <FlexLexer.h>
   2964 
   2965 #undef yyFlexLexer
   2966 #define yyFlexLexer zzFlexLexer
   2967 #include <FlexLexer.h>
   2968 @end example
   2969 
   2970 if, for example, you used @samp{%option prefix="xx"} for one of
   2971 your scanners and @samp{%option prefix="zz"} for the other.
   2972 
   2973 IMPORTANT: the present form of the scanning class is
   2974 @emph{experimental} and may change considerably between major
   2975 releases.
   2976 
   2977 @node Incompatibilities, Diagnostics, C++, Top
   2978 @section Incompatibilities with @code{lex} and POSIX
   2979 
   2980 @code{flex} is a rewrite of the AT&T Unix @code{lex} tool (the two
   2981 implementations do not share any code, though), with some
   2982 extensions and incompatibilities, both of which are of
   2983 concern to those who wish to write scanners acceptable to
   2984 either implementation.  Flex is fully compliant with the
   2985 POSIX @code{lex} specification, except that when using @samp{%pointer}
   2986 (the default), a call to @samp{unput()} destroys the contents of
   2987 @code{yytext}, which is counter to the POSIX specification.
   2988 
   2989 In this section we discuss all of the known areas of
   2990 incompatibility between flex, AT&T lex, and the POSIX
   2991 specification.
   2992 
   2993 @code{flex}'s @samp{-l} option turns on maximum compatibility with the
   2994 original AT&T @code{lex} implementation, at the cost of a major
   2995 loss in the generated scanner's performance.  We note
   2996 below which incompatibilities can be overcome using the @samp{-l}
   2997 option.
   2998 
   2999 @code{flex} is fully compatible with @code{lex} with the following
   3000 exceptions:
   3001 
   3002 @itemize -
   3003 @item
   3004 The undocumented @code{lex} scanner internal variable @code{yylineno}
   3005 is not supported unless @samp{-l} or @samp{%option yylineno} is used.
   3006 @code{yylineno} should be maintained on a per-buffer basis, rather
   3007 than a per-scanner (single global variable) basis.  @code{yylineno} is
   3008 not part of the POSIX specification.
   3009 
   3010 @item
   3011 The @samp{input()} routine is not redefinable, though it
   3012 may be called to read characters following whatever
   3013 has been matched by a rule.  If @samp{input()} encounters
   3014 an end-of-file the normal @samp{yywrap()} processing is
   3015 done.  A ``real'' end-of-file is returned by
   3016 @samp{input()} as @code{EOF}.
   3017 
   3018 Input is instead controlled by defining the
   3019 @code{YY_INPUT} macro.
   3020 
   3021 The @code{flex} restriction that @samp{input()} cannot be
   3022 redefined is in accordance with the POSIX
   3023 specification, which simply does not specify any way of
   3024 controlling the scanner's input other than by making
   3025 an initial assignment to @code{yyin}.
   3026 
   3027 @item
   3028 The @samp{unput()} routine is not redefinable.  This
   3029 restriction is in accordance with POSIX.
   3030 
   3031 @item
   3032 @code{flex} scanners are not as reentrant as @code{lex} scanners.
   3033 In particular, if you have an interactive scanner
   3034 and an interrupt handler which long-jumps out of
   3035 the scanner, and the scanner is subsequently called
   3036 again, you may get the following message:
   3037 
   3038 @example
   3039 fatal flex scanner internal error--end of buffer missed
   3040 @end example
   3041 
   3042 To reenter the scanner, first use
   3043 
   3044 @example
   3045 yyrestart( yyin );
   3046 @end example
   3047 
   3048 Note that this call will throw away any buffered
   3049 input; usually this isn't a problem with an
   3050 interactive scanner.
   3051 
   3052 Also note that flex C++ scanner classes @emph{are}
   3053 reentrant, so if using C++ is an option for you, you
   3054 should use them instead.  See "Generating C++
   3055 Scanners" above for details.
   3056 
   3057 @item
   3058 @samp{output()} is not supported.  Output from the @samp{ECHO}
   3059 macro is done to the file-pointer @code{yyout} (default
   3060 @code{stdout}).
   3061 
   3062 @samp{output()} is not part of the POSIX specification.
   3063 
   3064 @item
   3065 @code{lex} does not support exclusive start conditions
   3066 (%x), though they are in the POSIX specification.
   3067 
   3068 @item
   3069 When definitions are expanded, @code{flex} encloses them
   3070 in parentheses.  With lex, the following:
   3071 
   3072 @example
   3073 NAME    [A-Z][A-Z0-9]*
   3074 %%
   3075 foo@{NAME@}?      printf( "Found it\n" );
   3076 %%
   3077 @end example
   3078 
   3079 will not match the string "foo" because when the
   3080 macro is expanded the rule is equivalent to
   3081 "foo[A-Z][A-Z0-9]*?" and the precedence is such that the
   3082 '?' is associated with "[A-Z0-9]*".  With @code{flex}, the
   3083 rule will be expanded to "foo([A-Z][A-Z0-9]*)?" and
   3084 so the string "foo" will match.
   3085 
   3086 Note that if the definition begins with @samp{^} or ends
   3087 with @samp{$} then it is @emph{not} expanded with parentheses, to
   3088 allow these operators to appear in definitions
   3089 without losing their special meanings.  But the
   3090 @samp{<s>}, @samp{/}, and @samp{<<EOF>>} operators cannot be used in a
   3091 @code{flex} definition.
   3092 
   3093 Using @samp{-l} results in the @code{lex} behavior of no
   3094 parentheses around the definition.
   3095 
   3096 The POSIX specification is that the definition be enclosed in
   3097 parentheses.
   3098 
   3099 @item
   3100 Some implementations of @code{lex} allow a rule's action to begin on
   3101 a separate line, if the rule's pattern has trailing whitespace:
   3102 
   3103 @example
   3104 %%
   3105 foo|bar<space here>
   3106   @{ foobar_action(); @}
   3107 @end example
   3108 
   3109 @code{flex} does not support this feature.
   3110 
   3111 @item
   3112 The @code{lex} @samp{%r} (generate a Ratfor scanner) option is
   3113 not supported.  It is not part of the POSIX
   3114 specification.
   3115 
   3116 @item
   3117 After a call to @samp{unput()}, @code{yytext} is undefined until
   3118 the next token is matched, unless the scanner was
   3119 built using @samp{%array}.  This is not the case with @code{lex}
   3120 or the POSIX specification.  The @samp{-l} option does
   3121 away with this incompatibility.
   3122 
   3123 @item
   3124 The precedence of the @samp{@{@}} (numeric range) operator
   3125 is different.  @code{lex} interprets "abc@{1,3@}" as "match
   3126 one, two, or three occurrences of 'abc'", whereas
   3127 @code{flex} interprets it as "match 'ab' followed by one,
   3128 two, or three occurrences of 'c'".  The latter is
   3129 in agreement with the POSIX specification.
   3130 
   3131 @item
   3132 The precedence of the @samp{^} operator is different.  @code{lex}
   3133 interprets "^foo|bar" as "match either 'foo' at the
   3134 beginning of a line, or 'bar' anywhere", whereas
   3135 @code{flex} interprets it as "match either 'foo' or 'bar'
   3136 if they come at the beginning of a line".  The
   3137 latter is in agreement with the POSIX specification.
   3138 
   3139 @item
   3140 The special table-size declarations such as @samp{%a}
   3141 supported by @code{lex} are not required by @code{flex} scanners;
   3142 @code{flex} ignores them.
   3143 
   3144 @item
   3145 The name @code{FLEX_SCANNER} is #define'd so scanners may
   3146 be written for use with either @code{flex} or @code{lex}.
   3147 Scanners also include @code{YY_FLEX_MAJOR_VERSION} and
   3148 @code{YY_FLEX_MINOR_VERSION} indicating which version of
   3149 @code{flex} generated the scanner (for example, for the
   3150 2.5 release, these defines would be 2 and 5
   3151 respectively).
   3152 @end itemize
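
As a sketch of how some of the differences listed above can be
accommodated in a specification meant for both tools (the macro name
@code{RESET_SCANNER} is hypothetical, and the @code{lex} branch simply
assumes that nothing needs to be done there):

@example
%@{
#ifdef FLEX_SCANNER
/* flex: discard buffered input before reentering the scanner */
#define RESET_SCANNER() yyrestart( yyin )
#else
#define RESET_SCANNER()
#endif
%@}
NAME    ([A-Z][A-Z0-9]*)
%%
foo@{NAME@}?      printf( "Found it\n" );
@end example

@noindent
Writing the parentheses explicitly in the definition of @samp{NAME}
makes the rule expand the same way whether or not the tool adds its
own.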

The following @code{flex} features are not included in @code{lex} or the
POSIX specification:

@example
C++ scanners
%option
start condition scopes
start condition stacks
interactive/non-interactive scanners
yy_scan_string() and friends
yyterminate()
yy_set_interactive()
yy_set_bol()
YY_AT_BOL()
<<EOF>>
<*>
YY_DECL
YY_START
YY_USER_ACTION
YY_USER_INIT
#line directives
%@{@}'s around actions
multiple actions on a line
@end example

@noindent
plus almost all of the @code{flex} flags.  The last feature in
the list refers to the fact that with @code{flex} you can put
multiple actions on the same line, separated with
semicolons, while with @code{lex}, the following

@example
foo    handle_foo(); ++num_foos_seen;
@end example

@noindent
is (rather surprisingly) truncated to

@example
foo    handle_foo();
@end example

@code{flex} does not truncate the action.  Actions that are not
enclosed in braces are simply terminated at the end of the
line.

@node Diagnostics, Files, Incompatibilities, Top
@section Diagnostics

@table @samp
@item warning, rule cannot be matched
indicates that the given
rule cannot be matched because it follows other rules that
will always match the same text as it.  For example, in
the following @samp{foo} cannot be matched because it comes
after an identifier ``catch-all'' rule:

@example
[a-z]+    got_identifier();
foo       got_foo();
@end example

Using @code{REJECT} in a scanner suppresses this warning.

@item warning, -s option given but default rule can be matched
means that it is possible (perhaps only in a particular
start condition) that the default rule (match any single
character) is the only one that will match a particular
input.  Since @samp{-s} was given, presumably this is not
intended.

@item reject_used_but_not_detected undefined
@itemx yymore_used_but_not_detected undefined
These errors can
occur at compile time.  They indicate that the scanner
uses @code{REJECT} or @samp{yymore()} but that @code{flex} failed to notice the
fact, meaning that @code{flex} scanned the first two sections
looking for occurrences of these actions and failed to
find any, but somehow you snuck some in (via a @code{#include}
file, for example).  Use @samp{%option reject} or @samp{%option yymore}
to indicate to @code{flex} that you really do use these features.

@item flex scanner jammed
a scanner compiled with @samp{-s} has
encountered an input string which wasn't matched by any of
its rules.  This error can also occur due to internal
problems.

@item token too large, exceeds YYLMAX
your scanner uses @samp{%array}
and one of its rules matched a string longer than the
@code{YYLMAX} constant (8K bytes by default).  You can increase the
value by @code{#define}ing @code{YYLMAX} in the definitions section of
your @code{flex} input.

@item scanner requires -8 flag to use the character '@var{x}'
Your
scanner specification includes recognizing the 8-bit
character @var{x} and you did not specify the @samp{-8} flag, and your
scanner defaulted to 7-bit because you used the @samp{-Cf} or @samp{-CF}
table compression options.  See the discussion of the @samp{-7}
flag for details.

@item flex scanner push-back overflow
you used @samp{unput()} to push
back so much text that the scanner's buffer could not hold
both the pushed-back text and the current token in @code{yytext}.
Ideally the scanner should dynamically resize the buffer
in this case, but at present it does not.

@item input buffer overflow, can't enlarge buffer because scanner uses REJECT
the scanner was working on matching an
extremely large token and needed to expand the input
buffer.  This doesn't work with scanners that use @code{REJECT}.

@item fatal flex scanner internal error--end of buffer missed
This can occur in a scanner which is reentered after a
long-jump has jumped out of (or over) the scanner's
activation frame.  Before reentering the scanner, use:

@example
yyrestart( yyin );
@end example

@noindent
or, as noted above, switch to using the C++ scanner class.

@item too many start conditions in <> construct!
you listed
more start conditions in a <> construct than exist (so you
must have listed at least one of them twice).
@end table
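
The recovery step for the ``end of buffer missed'' diagnostic above can
be sketched in C as follows.  So that the fragment compiles on its own,
@code{yyin}, @code{yylex()} and @code{yyrestart()} are stand-ins written
for this sketch (a real program gets them from the generated scanner),
and @code{scan_with_recovery}, @code{recover}, @code{calls} and
@code{restarts} are hypothetical names.

```c
#include <setjmp.h>
#include <stdio.h>

/* Stand-ins so the sketch compiles alone.  In a real program these
   three come from the flex-generated scanner, not from here. */
static FILE *yyin;
static int restarts;                 /* counts yyrestart() calls */
static void yyrestart(FILE *input_file)
{
    yyin = input_file;
    ++restarts;
}

static jmp_buf recover;              /* hypothetical recovery point */
static int calls;                    /* counts yylex() calls */
static int yylex(void)
{
    if (++calls == 1)
        longjmp(recover, 1);         /* simulate an action bailing out */
    return 0;                        /* 0 signals end of input */
}

/* Scans `in` to the end, resetting the scanner whenever control
   comes back through the long-jump.  Returns the restart count. */
static int scan_with_recovery(FILE *in)
{
    yyin = in;
    if (setjmp(recover) != 0) {
        /* Arrived here via longjmp out of yylex(): reset the
           scanner's buffer state before reentering it. */
        yyrestart(yyin);
    }
    while (yylex() != 0)
        ;                            /* tokens would be handled here */
    return restarts;
}
```

The point of the arrangement is only that @code{yyrestart( yyin )} runs
on the path taken after the long-jump and before the next call to
@code{yylex()}.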

@node Files, Deficiencies, Diagnostics, Top
@section Files

@table @file
@item -lfl
library with which scanners must be linked.

@item lex.yy.c
generated scanner (called @file{lexyy.c} on some systems).

@item lex.yy.cc
generated C++ scanner class, when using @samp{-+}.

@item <FlexLexer.h>
header file defining the C++ scanner base class,
@code{FlexLexer}, and its derived class, @code{yyFlexLexer}.

@item flex.skl
skeleton scanner.  This file is only used when
building flex, not when flex executes.

@item lex.backup
backing-up information for @samp{-b} flag (called @file{lex.bck}
on some systems).
@end table

@node Deficiencies, See also, Files, Top
@section Deficiencies / Bugs

Some trailing context patterns cannot be properly matched
and generate warning messages (``dangerous trailing
context'').  These are patterns where the ending of the first
part of the rule matches the beginning of the second part,
such as @samp{zx*/xy*}, where the @samp{x*} matches the @samp{x} at the
beginning of the trailing context.  (Note that the POSIX
draft states that the text matched by such patterns is
undefined.)

For some trailing context rules, parts which are actually
fixed-length are not recognized as such, leading to the
above-mentioned performance loss.  In particular, parts
using @samp{|} or @samp{@{n@}} (such as @samp{foo@{3@}}) are always considered
variable-length.

Combining trailing context with the special @samp{|} action can
result in @emph{fixed} trailing context being turned into the
more expensive @emph{variable} trailing context.  This happens,
for example, in the following:

@example
%%
abc      |
xyz/def
@end example

Use of @samp{unput()} invalidates @code{yytext} and @code{yyleng}, unless the
@samp{%array} directive or the @samp{-l} option has been used.

Pattern-matching of NUL's is substantially slower than
matching other characters.

Dynamic resizing of the input buffer is slow, as it
entails rescanning all the text matched so far by the
current (generally huge) token.

Due to both buffering of input and read-ahead, you cannot
intermix calls to @file{<stdio.h>} routines, such as
@samp{getchar()}, with @code{flex} rules and expect it to work.
Call @samp{input()} instead.

The total number of table entries listed by the @samp{-v} flag excludes the
number of table entries needed to determine what rule has
been matched.  The number of such entries is equal to the
number of DFA states if the scanner does not use @code{REJECT}, and
somewhat greater than the number of states if it does.

@code{REJECT} cannot be used with the @samp{-f} or @samp{-F} options.

The @code{flex} internal algorithms need documentation.

@node See also, Author, Deficiencies, Top
@section See also

@code{lex}(1), @code{yacc}(1), @code{sed}(1), @code{awk}(1).

John Levine, Tony Mason, and Doug Brown: @cite{Lex & Yacc};
O'Reilly and Associates.  Be sure to get the 2nd edition.

M. E. Lesk and E. Schmidt, @cite{LEX - Lexical Analyzer Generator}.

Alfred Aho, Ravi Sethi and Jeffrey Ullman: @cite{Compilers:
Principles, Techniques and Tools}; Addison-Wesley (1986).
Describes the pattern-matching techniques used by @code{flex}
(deterministic finite automata).

@node Author,  , See also, Top
@section Author

Vern Paxson, with the help of many ideas and much inspiration from
Van Jacobson.  Original version by Jef Poskanzer.  The fast table
representation is a partial implementation of a design done by Van
Jacobson.  The implementation was done by Kevin Gong and Vern Paxson.

Thanks to the many @code{flex} beta-testers, feedbackers, and
contributors, especially Francois Pinard, Casey Leedom, Stan
Adermann, Terry Allen, David Barker-Plummer, John Basrai, Nelson
H.F. Beebe, @samp{benson@@odi.com}, Karl Berry, Peter A. Bigot,
Simon Blanchard, Keith Bostic, Frederic Brehm, Ian Brockbank, Kin
Cho, Nick Christopher, Brian Clapper, J.T. Conklin, Jason Coughlin,
Bill Cox, Nick Cropper, Dave Curtis, Scott David Daniels, Chris
G. Demetriou, Theo Deraadt, Mike Donahue, Chuck Doucette, Tom Epperly,
Leo Eskin, Chris Faylor, Chris Flatters, Jon Forrest, Joe Gayda, Kaveh
R. Ghazi, Eric Goldman, Christopher M. Gould, Ulrich Grepel, Peer
Griebel, Jan Hajic, Charles Hemphill, NORO Hideo, Jarkko Hietaniemi,
Scott Hofmann, Jeff Honig, Dana Hudes, Eric Hughes, John Interrante,
Ceriel Jacobs, Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones,
Henry Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane,
Amir Katz, @samp{ken@@ken.hilco.com}, Kevin B. Kenny, Steve Kirsch,
Winfried Koenig, Marq Kole, Ronald Lamprecht, Greg Lee, Rohan Lenard,
Craig Leres, John Levine, Steve Liddle, Mike Long, Mohamed el Lozy,
Brian Madsen, Malte, Joe Marshall, Bengt Martensson, Chris Metcalf,
Luke Mewburn, Jim Meyering, R. Alexander Milowski, Erik Naggum,
G.T. Nicol, Landon Noll, James Nordby, Marc Nozell, Richard Ohnemus,
Karsten Pahnke, Sven Panne, Roland Pesch, Walter Pelissero, Gaumond
Pierre, Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Jarmo Raiha, Frederic
Raimbault, Pat Rankin, Rick Richardson, Kevin Rodgers, Kai Uwe Rommel,
Jim Roskind, Alberto Santini, Andreas Scherer, Darrell Schiebel, Raf
Schietekat, Doug Schmidt, Philippe Schnoebelen, Andreas Schwab, Alex
Siegel, Eckehard Stolz, Jan-Erik Str@"omquist, Mike Stump, Paul Stuart,
Dave Tallman, Ian Lance Taylor, Chris Thewalt, Richard M. Timoney,
Jodi Tsai, Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms,
Kent Williams, Ken Yap, Ron Zellar, Nathan Zelle, David Zuhn, and
those whose names have slipped my marginal mail-archiving skills but
whose contributions are appreciated all the same.

Thanks to Keith Bostic, Jon Forrest, Noah Friedman, John Gilmore,
Craig Leres, John Levine, Bob Mulcahy, G.T. Nicol, Francois Pinard,
Rich Salz, and Richard Stallman for help with various distribution
headaches.

Thanks to Esmond Pitt and Earle Horton for 8-bit character support;
to Benson Margulies and Fred Burke for C++ support; to Kent Williams
and Tom Epperly for C++ class support; to Ove Ewerlid for support of
NUL's; and to Eric Hughes for support of multiple buffers.

This work was primarily done when I was with the Real Time Systems
Group at the Lawrence Berkeley Laboratory in Berkeley, CA.  Many thanks
to all there for the support I received.

Send comments to @samp{vern@@ee.lbl.gov}.

@c @node Index,  , Top, Top
@c @unnumbered Index
@c
@c @printindex cp

@contents
@bye

@c Local variables:
@c texinfo-column-for-description: 32
@c End: