      1 This is Info file flex.info, produced by Makeinfo-1.55 from the input
      2 file flex.texi.
      3 
      4 START-INFO-DIR-ENTRY
      5 * Flex: (flex).         A fast scanner generator.
      6 END-INFO-DIR-ENTRY
      7 
      8    This file documents Flex.
      9 
     10    Copyright (c) 1990 The Regents of the University of California.  All
     11 rights reserved.
     12 
     13    This code is derived from software contributed to Berkeley by Vern
     14 Paxson.
     15 
     16    The United States Government has rights in this work pursuant to
     17 contract no. DE-AC03-76SF00098 between the United States Department of
     18 Energy and the University of California.
     19 
     20    Redistribution and use in source and binary forms with or without
     21 modification are permitted provided that: (1) source distributions
     22 retain this entire copyright notice and comment, and (2) distributions
     23 including binaries display the following acknowledgement:  "This
     24 product includes software developed by the University of California,
     25 Berkeley and its contributors" in the documentation or other materials
     26 provided with the distribution and in all advertising materials
     27 mentioning features or use of this software.  Neither the name of the
     28 University nor the names of its contributors may be used to endorse or
     29 promote products derived from this software without specific prior
     30 written permission.
     31 
     32    THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
     33 WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
     34 MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
     35 
     36 
     37 File: flex.info,  Node: Top,  Next: Name,  Prev: (dir),  Up: (dir)
     38 
     39 flex
     40 ****
     41 
     42    This manual documents `flex'.  It covers release 2.5.
     43 
     44 * Menu:
     45 
     46 * Name::                        Name
     47 * Synopsis::                    Synopsis
     48 * Overview::                    Overview
     49 * Description::                 Description
     50 * Examples::                    Some simple examples
     51 * Format::                      Format of the input file
     52 * Patterns::                    Patterns
     53 * Matching::                    How the input is matched
     54 * Actions::                     Actions
     55 * Generated scanner::           The generated scanner
     56 * Start conditions::            Start conditions
     57 * Multiple buffers::            Multiple input buffers
     58 * End-of-file rules::           End-of-file rules
     59 * Miscellaneous::               Miscellaneous macros
     60 * User variables::              Values available to the user
     61 * YACC interface::              Interfacing with `yacc'
     62 * Options::                     Options
     63 * Performance::                 Performance considerations
     64 * C++::                         Generating C++ scanners
     65 * Incompatibilities::           Incompatibilities with `lex' and POSIX
     66 * Diagnostics::                 Diagnostics
     67 * Files::                       Files
     68 * Deficiencies::                Deficiencies / Bugs
     69 * See also::                    See also
     70 * Author::                      Author
     71 
     72 
     73 File: flex.info,  Node: Name,  Next: Synopsis,  Prev: Top,  Up: Top
     74 
     75 Name
     76 ====
     77 
     78    flex - fast lexical analyzer generator
     79 
     80 
     81 File: flex.info,  Node: Synopsis,  Next: Overview,  Prev: Name,  Up: Top
     82 
     83 Synopsis
     84 ========
     85 
     86      flex [-bcdfhilnpstvwBFILTV78+? -C[aefFmr] -ooutput -Pprefix -Sskeleton]
     87      [--help --version] [FILENAME ...]
     88 
     89 
     90 File: flex.info,  Node: Overview,  Next: Description,  Prev: Synopsis,  Up: Top
     91 
     92 Overview
     93 ========
     94 
     95    This manual describes `flex', a tool for generating programs that
     96 perform pattern-matching on text.  The manual includes both tutorial
     97 and reference sections:
     98 
     99 Description
    100      a brief overview of the tool
    101 
    102 Some Simple Examples
    103 Format Of The Input File
    104 Patterns
    105      the extended regular expressions used by flex
    106 
    107 How The Input Is Matched
    108      the rules for determining what has been matched
    109 
    110 Actions
    111      how to specify what to do when a pattern is matched
    112 
    113 The Generated Scanner
    114      details regarding the scanner that flex produces; how to control
    115      the input source
    116 
    117 Start Conditions
    118      introducing context into your scanners, and managing
    119      "mini-scanners"
    120 
    121 Multiple Input Buffers
    122      how to manipulate multiple input sources; how to scan from strings
    123      instead of files
    124 
    125 End-of-file Rules
    126      special rules for matching the end of the input
    127 
    128 Miscellaneous Macros
    129      a summary of macros available to the actions
    130 
    131 Values Available To The User
    132      a summary of values available to the actions
    133 
    134 Interfacing With Yacc
    135      connecting flex scanners together with yacc parsers
    136 
    137 Options
    138      flex command-line options, and the "%option" directive
    139 
    140 Performance Considerations
    141      how to make your scanner go as fast as possible
    142 
    143 Generating C++ Scanners
    144      the (experimental) facility for generating C++ scanner classes
    145 
    146 Incompatibilities With Lex And POSIX
    147      how flex differs from AT&T lex and the POSIX lex standard
    148 
    149 Diagnostics
    150      those error messages produced by flex (or scanners it generates)
    151      whose meanings might not be apparent
    152 
    153 Files
    154      files used by flex
    155 
    156 Deficiencies / Bugs
    157      known problems with flex
    158 
    159 See Also
    160      other documentation, related tools
    161 
    162 Author
    163      includes contact information
    164 
    165 
    166 File: flex.info,  Node: Description,  Next: Examples,  Prev: Overview,  Up: Top
    167 
    168 Description
    169 ===========
    170 
   `flex' is a tool for generating "scanners": programs which
recognize lexical patterns in text.  `flex' reads the given input
files, or its standard input if no file names are given, for a
description of a scanner to generate.  The description is in the form
of pairs of regular expressions and C code, called "rules".  `flex'
generates as output a C source file, `lex.yy.c', which defines a
routine `yylex()'.  This file is compiled and linked with the `-lfl'
library to produce an executable.  When the executable is run, it
analyzes its input for occurrences of the regular expressions.
Whenever it finds one, it executes the corresponding C code.
    181 
    182 
    183 File: flex.info,  Node: Examples,  Next: Format,  Prev: Description,  Up: Top
    184 
    185 Some simple examples
    186 ====================
    187 
    188    First some simple examples to get the flavor of how one uses `flex'.
    189 The following `flex' input specifies a scanner which whenever it
    190 encounters the string "username" will replace it with the user's login
    191 name:
    192 
    193      %%
    194      username    printf( "%s", getlogin() );
    195 
    196    By default, any text not matched by a `flex' scanner is copied to
    197 the output, so the net effect of this scanner is to copy its input file
    198 to its output with each occurrence of "username" expanded.  In this
    199 input, there is just one rule.  "username" is the PATTERN and the
    200 "printf" is the ACTION.  The "%%" marks the beginning of the rules.
    201 
    202    Here's another simple example:
    203 
             int num_lines = 0, num_chars = 0;
     
     %%
     \n      ++num_lines; ++num_chars;
     .       ++num_chars;
     
     %%
     int main( void )
             {
             yylex();
             printf( "# of lines = %d, # of chars = %d\n",
                     num_lines, num_chars );
             return 0;
             }
    217 
    218    This scanner counts the number of characters and the number of lines
    219 in its input (it produces no output other than the final report on the
    220 counts).  The first line declares two globals, "num_lines" and
    221 "num_chars", which are accessible both inside `yylex()' and in the
    222 `main()' routine declared after the second "%%".  There are two rules,
    223 one which matches a newline ("\n") and increments both the line count
    224 and the character count, and one which matches any character other than
    225 a newline (indicated by the "." regular expression).
    226 
    227    A somewhat more complicated example:
    228 
     /* scanner for a toy Pascal-like language */
     
     %{
     /* need this for the calls to atoi() and atof() below */
     #include <stdlib.h>
     %}
     
     DIGIT    [0-9]
     ID       [a-z][a-z0-9]*
     
     %%
     
     {DIGIT}+    {
                 printf( "An integer: %s (%d)\n", yytext,
                         atoi( yytext ) );
                 }
     
     {DIGIT}+"."{DIGIT}*        {
                 printf( "A float: %s (%g)\n", yytext,
                         atof( yytext ) );
                 }
     
     if|then|begin|end|procedure|function        {
                 printf( "A keyword: %s\n", yytext );
                 }
     
     {ID}        printf( "An identifier: %s\n", yytext );
     
     "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );
     
     "{"[^}\n]*"}"     /* eat up one-line comments */
     
     [ \t\n]+          /* eat up whitespace */
     
     .           printf( "Unrecognized character: %s\n", yytext );
     
     %%
     
     int main( int argc, char **argv )
         {
         ++argv, --argc;  /* skip over program name */
         if ( argc > 0 )
                 yyin = fopen( argv[0], "r" );
         else
                 yyin = stdin;
     
         yylex();
         return 0;
         }
    279 
    280    This is the beginnings of a simple scanner for a language like
    281 Pascal.  It identifies different types of TOKENS and reports on what it
    282 has seen.
    283 
    284    The details of this example will be explained in the following
    285 sections.
    286 
    287 
    288 File: flex.info,  Node: Format,  Next: Patterns,  Prev: Examples,  Up: Top
    289 
    290 Format of the input file
    291 ========================
    292 
    293    The `flex' input file consists of three sections, separated by a
    294 line with just `%%' in it:
    295 
    296      definitions
    297      %%
    298      rules
    299      %%
    300      user code
    301 
    302    The "definitions" section contains declarations of simple "name"
    303 definitions to simplify the scanner specification, and declarations of
    304 "start conditions", which are explained in a later section.  Name
    305 definitions have the form:
    306 
    307      name definition
    308 
    309    The "name" is a word beginning with a letter or an underscore ('_')
    310 followed by zero or more letters, digits, '_', or '-' (dash).  The
    311 definition is taken to begin at the first non-white-space character
    312 following the name and continuing to the end of the line.  The
    313 definition can subsequently be referred to using "{name}", which will
    314 expand to "(definition)".  For example,
    315 
    316      DIGIT    [0-9]
    317      ID       [a-z][a-z0-9]*
    318 
    319 defines "DIGIT" to be a regular expression which matches a single
    320 digit, and "ID" to be a regular expression which matches a letter
    321 followed by zero-or-more letters-or-digits.  A subsequent reference to
    322 
    323      {DIGIT}+"."{DIGIT}*
    324 
    325 is identical to
    326 
    327      ([0-9])+"."([0-9])*
    328 
    329 and matches one-or-more digits followed by a '.' followed by
    330 zero-or-more digits.
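
   The substitution described above can be sketched in plain C.  This
is only an illustration of the "{name}" to "(definition)" rewriting,
not code taken from `flex': the helper `expand_name' is hypothetical,
and it ignores quoted strings and escapes, which a real scanner
generator has to respect.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not part of flex): copy `pattern' to `out',
 * replacing every "{name}" with "(definition)".
 * Returns 0 on success, -1 if `out' (capacity `outsize') is too small.
 * Simplification: quoted strings and escapes get no special treatment.
 */
int expand_name(const char *pattern, const char *name,
                const char *definition, char *out, size_t outsize)
{
    size_t name_len = strlen(name);
    size_t def_len = strlen(definition);
    size_t used = 0;

    assert(outsize > 0);
    while (*pattern != '\0') {
        /* A reference is '{', the exact name, then '}'. */
        if (pattern[0] == '{' &&
            strncmp(pattern + 1, name, name_len) == 0 &&
            pattern[1 + name_len] == '}') {
            /* Room for '(' + definition + ')' + final '\0'? */
            if (used + def_len + 2 >= outsize)
                return -1;
            out[used++] = '(';
            memcpy(out + used, definition, def_len);
            used += def_len;
            out[used++] = ')';
            pattern += name_len + 2;
        } else {
            if (used + 1 >= outsize)
                return -1;
            out[used++] = *pattern++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

   The added parentheses matter when a definition contains
alternation: with a definition of `a|b', the reference `{X}c' becomes
`(a|b)c' rather than `a|bc'.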
    331 
    332    The RULES section of the `flex' input contains a series of rules of
    333 the form:
    334 
    335      pattern   action
    336 
    337 where the pattern must be unindented and the action must begin on the
    338 same line.
    339 
    340    See below for a further description of patterns and actions.
    341 
    342    Finally, the user code section is simply copied to `lex.yy.c'
    343 verbatim.  It is used for companion routines which call or are called
    344 by the scanner.  The presence of this section is optional; if it is
    345 missing, the second `%%' in the input file may be skipped, too.
    346 
    347    In the definitions and rules sections, any *indented* text or text
    348 enclosed in `%{' and `%}' is copied verbatim to the output (with the
    349 `%{}''s removed).  The `%{}''s must appear unindented on lines by
    350 themselves.
    351 
    352    In the rules section, any indented or %{} text appearing before the
    353 first rule may be used to declare variables which are local to the
    354 scanning routine and (after the declarations) code which is to be
    355 executed whenever the scanning routine is entered.  Other indented or
    356 %{} text in the rule section is still copied to the output, but its
    357 meaning is not well-defined and it may well cause compile-time errors
    358 (this feature is present for `POSIX' compliance; see below for other
    359 such features).
    360 
    361    In the definitions section (but not in the rules section), an
    362 unindented comment (i.e., a line beginning with "/*") is also copied
    363 verbatim to the output up to the next "*/".
    364 
    365 
    366 File: flex.info,  Node: Patterns,  Next: Matching,  Prev: Format,  Up: Top
    367 
    368 Patterns
    369 ========
    370 
    371    The patterns in the input are written using an extended set of
    372 regular expressions.  These are:
    373 
    374 `x'
    375      match the character `x'
    376 
    377 `.'
    378      any character (byte) except newline
    379 
    380 `[xyz]'
    381      a "character class"; in this case, the pattern matches either an
    382      `x', a `y', or a `z'
    383 
    384 `[abj-oZ]'
    385      a "character class" with a range in it; matches an `a', a `b', any
    386      letter from `j' through `o', or a `Z'
    387 
    388 `[^A-Z]'
    389      a "negated character class", i.e., any character but those in the
    390      class.  In this case, any character EXCEPT an uppercase letter.
    391 
    392 `[^A-Z\n]'
    393      any character EXCEPT an uppercase letter or a newline
    394 
    395 `R*'
    396      zero or more R's, where R is any regular expression
    397 
    398 `R+'
    399      one or more R's
    400 
    401 `R?'
    402      zero or one R's (that is, "an optional R")
    403 
    404 `R{2,5}'
    405      anywhere from two to five R's
    406 
    407 `R{2,}'
    408      two or more R's
    409 
    410 `R{4}'
    411      exactly 4 R's
    412 
    413 `{NAME}'
    414      the expansion of the "NAME" definition (see above)
    415 
    416 `"[xyz]\"foo"'
    417      the literal string: `[xyz]"foo'
    418 
    419 `\X'
    420      if X is an `a', `b', `f', `n', `r', `t', or `v', then the ANSI-C
    421      interpretation of \X.  Otherwise, a literal `X' (used to escape
    422      operators such as `*')
    423 
    424 `\0'
    425      a NUL character (ASCII code 0)
    426 
    427 `\123'
    428      the character with octal value 123
    429 
    430 `\x2a'
    431      the character with hexadecimal value `2a'
    432 
    433 `(R)'
    434      match an R; parentheses are used to override precedence (see below)
    435 
    436 `RS'
    437      the regular expression R followed by the regular expression S;
    438      called "concatenation"
    439 
    440 `R|S'
    441      either an R or an S
    442 
    443 `R/S'
    444      an R but only if it is followed by an S.  The text matched by S is
    445      included when determining whether this rule is the "longest
    446      match", but is then returned to the input before the action is
    447      executed.  So the action only sees the text matched by R.  This
    448      type of pattern is called "trailing context".  (There are some
    449      combinations of `R/S' that `flex' cannot match correctly; see
    450      notes in the Deficiencies / Bugs section below regarding
    451      "dangerous trailing context".)
    452 
`^R'
     an R, but only at the beginning of a line (i.e., when just
     starting to scan, or right after a newline has been scanned).
    456 
    457 `R$'
    458      an R, but only at the end of a line (i.e., just before a newline).
    459      Equivalent to "R/\n".
    460 
     Note that flex's notion of "newline" is exactly whatever the C
     compiler used to compile flex interprets '\n' as; in particular,
     on some DOS systems you must either filter out \r's in the input
     yourself, or explicitly use R/\r\n for "R$".
    465 
`<S>R'
     an R, but only in start condition S (see below for discussion of
     start conditions)

`<S1,S2,S3>R'
     same, but in any of start conditions S1, S2, or S3
    470 
    471 `<*>R'
    472      an R in any start condition, even an exclusive one.
    473 
`<<EOF>>'
     an end-of-file

`<S1,S2><<EOF>>'
     an end-of-file when in start condition S1 or S2
    477 
    478    Note that inside of a character class, all regular expression
    479 operators lose their special meaning except escape ('\') and the
    480 character class operators, '-', ']', and, at the beginning of the
    481 class, '^'.
    482 
    483    The regular expressions listed above are grouped according to
    484 precedence, from highest precedence at the top to lowest at the bottom.
    485 Those grouped together have equal precedence.  For example,
    486 
    487      foo|bar*
    488 
    489 is the same as
    490 
    491      (foo)|(ba(r*))
    492 
    493 since the '*' operator has higher precedence than concatenation, and
    494 concatenation higher than alternation ('|').  This pattern therefore
    495 matches *either* the string "foo" *or* the string "ba" followed by
    496 zero-or-more r's.  To match "foo" or zero-or-more "bar"'s, use:
    497 
    498      foo|(bar)*
    499 
    500 and to match zero-or-more "foo"'s-or-"bar"'s:
    501 
    502      (foo|bar)*
    503 
   In addition to characters and ranges of characters, character
classes can also contain character class "expressions".  These are
expressions enclosed inside `[:' and `:]' delimiters (which themselves
must appear between the '[' and ']' of the character class; other
elements may occur inside the character class, too).  The valid
expressions are:
    510 
    511      [:alnum:] [:alpha:] [:blank:]
    512      [:cntrl:] [:digit:] [:graph:]
    513      [:lower:] [:print:] [:punct:]
    514      [:space:] [:upper:] [:xdigit:]
    515 
    516    These expressions all designate a set of characters equivalent to
    517 the corresponding standard C `isXXX' function.  For example,
    518 `[:alnum:]' designates those characters for which `isalnum()' returns
    519 true - i.e., any alphabetic or numeric.  Some systems don't provide
    520 `isblank()', so flex defines `[:blank:]' as a blank or a tab.
    521 
    522    For example, the following character classes are all equivalent:
    523 
     [[:alnum:]]
     [[:alpha:][:digit:]]
     [[:alpha:]0-9]
     [a-zA-Z0-9]
    528 
    529    If your scanner is case-insensitive (the `-i' flag), then
    530 `[:upper:]' and `[:lower:]' are equivalent to `[:alpha:]'.
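
   The correspondence between character class expressions and the C
`isXXX' functions can be checked with a few lines of C.  The sketch
below is not part of `flex'; the function names are made up for
illustration, and it assumes the default "C" locale and an ASCII
character set, where `[[:alnum:]]' and `[a-zA-Z0-9]' cover the same
characters.

```c
#include <assert.h>
#include <ctype.h>

/* [:blank:] as the text defines it: a blank or a tab. */
int class_blank(int c)
{
    return c == ' ' || c == '\t';
}

/* [[:alnum:]] expressed through isalnum(). */
int class_alnum(int c)
{
    return isalnum((unsigned char)c) != 0;
}

/* [a-zA-Z0-9] spelled out; assumes ASCII-contiguous letter ranges. */
int class_alnum_ranges(int c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}

/* Count the 7-bit characters on which two class predicates disagree. */
int count_disagreements(int (*f)(int), int (*g)(int))
{
    int c, n = 0;

    assert(f != 0 && g != 0);
    for (c = 0; c < 128; c++)
        if ((f(c) != 0) != (g(c) != 0))
            n++;
    return n;
}
```

   Under those assumptions,
`count_disagreements(class_alnum, class_alnum_ranges)' is zero.  In
other locales `isalnum()' may accept additional characters, so the
spelled-out range is not guaranteed to be equivalent there.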
    531 
    532    Some notes on patterns:
    533 
    534    - A negated character class such as the example "[^A-Z]" above *will
    535      match a newline* unless "\n" (or an equivalent escape sequence) is
    536      one of the characters explicitly present in the negated character
    537      class (e.g., "[^A-Z\n]").  This is unlike how many other regular
    538      expression tools treat negated character classes, but
    539      unfortunately the inconsistency is historically entrenched.
    540      Matching newlines means that a pattern like [^"]* can match the
    541      entire input unless there's another quote in the input.
    542 
    543    - A rule can have at most one instance of trailing context (the '/'
    544      operator or the '$' operator).  The start condition, '^', and
    545      "<<EOF>>" patterns can only occur at the beginning of a pattern,
    546      and, as well as with '/' and '$', cannot be grouped inside
    547      parentheses.  A '^' which does not occur at the beginning of a
    548      rule or a '$' which does not occur at the end of a rule loses its
    549      special properties and is treated as a normal character.
    550 
    551      The following are illegal:
    552 
    553           foo/bar$
    554           <sc1>foo<sc2>bar
    555 
     Note that the first of these can be written "foo/bar\n".
    557 
    558      The following will result in '$' or '^' being treated as a normal
    559      character:
    560 
    561           foo|(bar$)
    562           foo|^bar
    563 
    564      If what's wanted is a "foo" or a bar-followed-by-a-newline, the
    565      following could be used (the special '|' action is explained
    566      below):
    567 
    568           foo      |
    569           bar$     /* action goes here */
    570 
    571      A similar trick will work for matching a foo or a
    572      bar-at-the-beginning-of-a-line.
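
   The first note above, that a negated character class matches a
newline unless "\n" is listed, can be made concrete with a small C
model.  This is a hand-written sketch, not `flex' output: the
predicates stand in for the classes `[^"]' and `[^"\n]', and
`span_while' is a hypothetical helper that measures how much text a
starred class would consume.

```c
#include <assert.h>
#include <stddef.h>

/* Models the class [^"]: anything but a double quote. */
int not_quote(int c)
{
    return c != '"';
}

/* Models the class [^"\n]: anything but a double quote or newline. */
int not_quote_or_newline(int c)
{
    return c != '"' && c != '\n';
}

/* Length of the longest prefix of `s' made only of characters in the
   class, i.e. what CLASS* would match at the start of `s'. */
size_t span_while(const char *s, int (*in_class)(int))
{
    size_t n = 0;

    assert(s != NULL && in_class != NULL);
    while (s[n] != '\0' && in_class((unsigned char)s[n]))
        n++;
    return n;
}
```

   On the text `abc', newline, `def"ghi', the `[^"]*' model consumes
seven characters and runs across the line break, while the `[^"\n]*'
model stops after three.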
    573 
    574 
    575 File: flex.info,  Node: Matching,  Next: Actions,  Prev: Patterns,  Up: Top
    576 
    577 How the input is matched
    578 ========================
    579 
    580    When the generated scanner is run, it analyzes its input looking for
    581 strings which match any of its patterns.  If it finds more than one
    582 match, it takes the one matching the most text (for trailing context
    583 rules, this includes the length of the trailing part, even though it
    584 will then be returned to the input).  If it finds two or more matches
    585 of the same length, the rule listed first in the `flex' input file is
    586 chosen.
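
   The selection rule just described (longest match wins, ties go to
the rule listed first) is small enough to sketch in C.  The function
below is illustrative only and is not how `flex' is implemented; it
assumes the match length of each rule at the current position is
already known, and it treats a length of zero as "no match".

```c
#include <assert.h>
#include <stddef.h>

/*
 * match_len[i] is the number of characters rule i matches at the
 * current input position (0 meaning no match), with rules indexed in
 * the order they appear in the input file.
 * Returns the index of the chosen rule, or -1 if no rule matches.
 */
int choose_rule(const size_t *match_len, int num_rules)
{
    int best = -1;
    int i;

    assert(match_len != NULL && num_rules >= 0);
    for (i = 0; i < num_rules; i++) {
        /* Strictly greater: a later rule of equal length never
           displaces an earlier one. */
        if (match_len[i] > 0 &&
            (best < 0 || match_len[i] > match_len[best]))
            best = i;
    }
    return best;
}
```

   In the toy Pascal scanner shown earlier, the input "if" is matched
with length 2 by both the keyword rule and the `{ID}' rule, so the
keyword rule wins because it is listed first.  On "ifx" the `{ID}'
rule matches three characters and is chosen instead.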
    587 
    588    Once the match is determined, the text corresponding to the match
    589 (called the TOKEN) is made available in the global character pointer
    590 `yytext', and its length in the global integer `yyleng'.  The ACTION
    591 corresponding to the matched pattern is then executed (a more detailed
    592 description of actions follows), and then the remaining input is
    593 scanned for another match.
    594 
    595    If no match is found, then the "default rule" is executed: the next
    596 character in the input is considered matched and copied to the standard
    597 output.  Thus, the simplest legal `flex' input is:
    598 
    599      %%
    600 
    601    which generates a scanner that simply copies its input (one
    602 character at a time) to its output.
    603 
   Note that `yytext' can be defined in two different ways: either as a
character *pointer* or as a character *array*.  You can control which
definition `flex' uses by including one of the special directives
`%pointer' or `%array' in the first (definitions) section of your flex
input.  The default is `%pointer', unless you use the `-l' lex
compatibility option, in which case `yytext' will be an array.  The
advantage of using `%pointer' is substantially faster scanning and no
buffer overflow when matching very large tokens (unless you run out of
dynamic memory).  The disadvantage is that you are restricted in how
your actions can modify `yytext' (see the next section), and calls to
the `unput()' function destroy the present contents of `yytext', which
can be a considerable porting headache when moving between different
`lex' versions.
    617 
   The advantage of `%array' is that you can then modify `yytext' to
your heart's content, and calls to `unput()' do not destroy `yytext'
(see below).  Furthermore, existing `lex' programs sometimes access
`yytext' externally using declarations of the form:

     extern char yytext[];

This definition is erroneous when used with `%pointer', but correct
for `%array'.
    625 
    626    `%array' defines `yytext' to be an array of `YYLMAX' characters,
    627 which defaults to a fairly large value.  You can change the size by
    628 simply #define'ing `YYLMAX' to a different value in the first section
    629 of your `flex' input.  As mentioned above, with `%pointer' yytext grows
    630 dynamically to accommodate large tokens.  While this means your
    631 `%pointer' scanner can accommodate very large tokens (such as matching
    632 entire blocks of comments), bear in mind that each time the scanner
    633 must resize `yytext' it also must rescan the entire token from the
    634 beginning, so matching such tokens can prove slow.  `yytext' presently
    635 does *not* dynamically grow if a call to `unput()' results in too much
    636 text being pushed back; instead, a run-time error results.
    637 
    638    Also note that you cannot use `%array' with C++ scanner classes (the
    639 `c++' option; see below).
    640 
    641 
    642 File: flex.info,  Node: Actions,  Next: Generated scanner,  Prev: Matching,  Up: Top
    643 
    644 Actions
    645 =======
    646 
    647    Each pattern in a rule has a corresponding action, which can be any
    648 arbitrary C statement.  The pattern ends at the first non-escaped
    649 whitespace character; the remainder of the line is its action.  If the
    650 action is empty, then when the pattern is matched the input token is
    651 simply discarded.  For example, here is the specification for a program
    652 which deletes all occurrences of "zap me" from its input:
    653 
    654      %%
    655      "zap me"
    656 
    657    (It will copy all other characters in the input to the output since
    658 they will be matched by the default rule.)
    659 
    660    Here is a program which compresses multiple blanks and tabs down to
    661 a single blank, and throws away whitespace found at the end of a line:
    662 
    663      %%
    664      [ \t]+        putchar( ' ' );
    665      [ \t]+$       /* ignore this token */
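
   For comparison, here is a hand-written C function intended to
perform the same transformation as the two rules above.  It is a
sketch under stated assumptions, not generated code: the name
`squeeze_blanks' is made up, the input is a NUL-terminated string
rather than a stream, and "end of a line" means "followed by a
newline", matching the definition of `$' given earlier.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy `in' to `out', replacing each run of blanks and tabs with one
 * blank and dropping runs that are immediately followed by a newline.
 * `out' must have room for strlen(in) + 1 bytes.
 * Returns the length of the result.
 */
size_t squeeze_blanks(const char *in, char *out)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (in[i] != '\0') {
        if (in[i] == ' ' || in[i] == '\t') {
            /* Consume the whole run, as [ \t]+ would. */
            while (in[i] == ' ' || in[i] == '\t')
                i++;
            /* The [ \t]+$ rule applies only before a newline;
               otherwise emit the single blank. */
            if (in[i] != '\n')
                out[o++] = ' ';
        } else {
            /* Default rule: unmatched text is copied through. */
            out[o++] = in[i++];
        }
    }
    out[o] = '\0';
    return o;
}
```

   In the scanner, the second rule takes priority at the end of a line
because its trailing context makes the match one character longer than
that of the first rule.  The C version has to spell out that decision
with an explicit test.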
    666 
    667    If the action contains a '{', then the action spans till the
    668 balancing '}' is found, and the action may cross multiple lines.
    669 `flex' knows about C strings and comments and won't be fooled by braces
    670 found within them, but also allows actions to begin with `%{' and will
    671 consider the action to be all the text up to the next `%}' (regardless
    672 of ordinary braces inside the action).
    673 
    674    An action consisting solely of a vertical bar ('|') means "same as
    675 the action for the next rule." See below for an illustration.
    676 
    677    Actions can include arbitrary C code, including `return' statements
    678 to return a value to whatever routine called `yylex()'.  Each time
    679 `yylex()' is called it continues processing tokens from where it last
    680 left off until it either reaches the end of the file or executes a
    681 return.
    682 
   Actions are free to modify `yytext' except for lengthening it
(adding characters to its end; these will overwrite later characters in
the input stream).  This however does not apply when using `%array'
(see above); in that case, `yytext' may be freely modified in any way.
    687 
    688    Actions are free to modify `yyleng' except they should not do so if
    689 the action also includes use of `yymore()' (see below).
    690 
    691    There are a number of special directives which can be included
    692 within an action:
    693 
    694    - `ECHO' copies yytext to the scanner's output.
    695 
    696    - `BEGIN' followed by the name of a start condition places the
    697      scanner in the corresponding start condition (see below).
    698 
    699    - `REJECT' directs the scanner to proceed on to the "second best"
    700      rule which matched the input (or a prefix of the input).  The rule
    701      is chosen as described above in "How the Input is Matched", and
    702      `yytext' and `yyleng' set up appropriately.  It may either be one
    703      which matched as much text as the originally chosen rule but came
    704      later in the `flex' input file, or one which matched less text.
    705      For example, the following will both count the words in the input
    706      and call the routine special() whenever "frob" is seen:
    707 
    708                   int word_count = 0;
    709           %%
    710           
    711           frob        special(); REJECT;
    712           [^ \t\n]+   ++word_count;
    713 
     Without the `REJECT', any "frob"'s in the input would not be
     counted as words, since the scanner normally executes only one
     action per token.  Multiple `REJECT's are allowed, each one
     finding the next best choice to the currently active rule.  For
     example, when the following scanner scans the token "abcd", it
     will write "abcdabcaba" to the output:
    720 
    721           %%
    722           a        |
    723           ab       |
    724           abc      |
    725           abcd     ECHO; REJECT;
    726           .|\n     /* eat up any unmatched character */
    727 
    728      (The first three rules share the fourth's action since they use
    729      the special '|' action.)  `REJECT' is a particularly expensive
    730      feature in terms of scanner performance; if it is used in *any* of
    731      the scanner's actions it will slow down *all* of the scanner's
    732      matching.  Furthermore, `REJECT' cannot be used with the `-Cf' or
    733      `-CF' options (see below).
    734 
    735      Note also that unlike the other special actions, `REJECT' is a
    736      *branch*; code immediately following it in the action will *not*
    737      be executed.
    738 
    739    - `yymore()' tells the scanner that the next time it matches a rule,
    740      the corresponding token should be *appended* onto the current
    741      value of `yytext' rather than replacing it.  For example, given
    742      the input "mega-kludge" the following will write
    743      "mega-mega-kludge" to the output:
    744 
    745           %%
    746           mega-    ECHO; yymore();
    747           kludge   ECHO;
    748 
    749      First "mega-" is matched and echoed to the output.  Then "kludge"
    750      is matched, but the previous "mega-" is still hanging around at
    751      the beginning of `yytext' so the `ECHO' for the "kludge" rule will
    752      actually write "mega-kludge".
    753 
    754    Two notes regarding the use of `yymore()'.  First, `yymore()' depends
    755 on the value of `yyleng' correctly reflecting the size of the current
    756 token, so you must not modify `yyleng' if you are using `yymore()'.
    757 Second, the presence of `yymore()' in any of the scanner's actions
    758 entails a minor performance penalty in the scanner's matching speed.
    759 
    760    - `yyless(n)' returns all but the first N characters of the current
    761      token back to the input stream, where they will be rescanned when
    762      the scanner looks for the next match.  `yytext' and `yyleng' are
    763      adjusted appropriately (e.g., `yyleng' will now be equal to N).
    764      For example, on the input "foobar" the following will write out
    765      "foobarbar":
    766 
    767           %%
    768           foobar    ECHO; yyless(3);
    769           [a-z]+    ECHO;
    770 
    771      An argument of 0 to `yyless' will cause the entire current input
    772      string to be scanned again.  Unless you've changed how the scanner
    773      will subsequently process its input (using `BEGIN', for example),
    774      this will result in an endless loop.
    775 
    776      Note that `yyless' is a macro and can only be used in the flex
    777      input file, not from other source files.
    778 
    779    - `unput(c)' puts the character `c' back onto the input stream.  It
    780      will be the next character scanned.  The following action will
    781      take the current token and cause it to be rescanned enclosed in
    782      parentheses.
    783 
    784           {
    785           int i;
    786           /* Copy yytext because unput() trashes yytext */
    787           char *yycopy = strdup( yytext );
    788           unput( ')' );
    789           for ( i = yyleng - 1; i >= 0; --i )
    790               unput( yycopy[i] );
    791           unput( '(' );
    792           free( yycopy );
    793           }
    794 
    795      Note that since each `unput()' puts the given character back at
    796      the *beginning* of the input stream, pushing back strings must be
    797      done back-to-front.  An important potential problem when using
    798      `unput()' is that if you are using `%pointer' (the default), a
    799      call to `unput()' *destroys* the contents of `yytext', starting
    800      with its rightmost character and devouring one character to the
    801      left with each call.  If you need the value of `yytext' preserved
    802      after a call to `unput()' (as in the above example), you must
    803      either first copy it elsewhere, or build your scanner using
    804      `%array' instead (see How The Input Is Matched).
    805 
    806      Finally, note that you cannot put back `EOF' to attempt to mark
    807      the input stream with an end-of-file.
    808 
    809    - `input()' reads the next character from the input stream.  For
    810      example, the following is one way to eat up C comments:
    811 
    812           %%
    813           "/*"        {
    814                       register int c;
    815           
    816                       for ( ; ; )
    817                           {
    818                           while ( (c = input()) != '*' &&
    819                                   c != EOF )
    820                               ;    /* eat up text of comment */
    821           
    822                           if ( c == '*' )
    823                               {
    824                               while ( (c = input()) == '*' )
    825                                   ;
    826                               if ( c == '/' )
    827                                   break;    /* found the end */
    828                               }
    829           
    830                           if ( c == EOF )
    831                               {
    832                               error( "EOF in comment" );
    833                               break;
    834                               }
    835                           }
    836                       }
    837 
    838      (Note that if the scanner is compiled using `C++', then `input()'
    839      is instead referred to as `yyinput()', in order to avoid a name
    840      clash with the `C++' stream by the name of `input'.)
    841 
    842    - `YY_FLUSH_BUFFER' flushes the scanner's internal buffer so that the
    843      next time the scanner attempts to match a token, it will first
    844      refill the buffer using `YY_INPUT' (see The Generated Scanner,
    845      below).  This action is a special case of the more general
    846      `yy_flush_buffer()' function, described below in the section
    847      Multiple Input Buffers.
    848 
    849    - `yyterminate()' can be used in lieu of a return statement in an
    850      action.  It terminates the scanner and returns a 0 to the
    851      scanner's caller, indicating "all done".  By default,
    852      `yyterminate()' is also called when an end-of-file is encountered.
    853      It is a macro and may be redefined.
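
   The back-to-front rule for `unput()' described above can be seen
with a small stand-alone model.  The following is a minimal sketch in
plain C, not flex's own implementation: the names `sketch_unput()',
`sketch_input()' and `push_parenthesized()' are hypothetical, and the
input stream is modeled as nothing more than a last-in, first-out
array.

```c
#include <assert.h>
#include <string.h>

#define PUSHBACK_MAX 64

/* Hypothetical stand-in for the scanner's input stream: characters
 * that have been pushed back, most recent last. */
static char pushback[PUSHBACK_MAX];
static int pushback_len = 0;

/* Analogue of unput(c): c becomes the next character to be read. */
static void sketch_unput(int c)
{
    assert(pushback_len < PUSHBACK_MAX);
    pushback[pushback_len++] = (char) c;
}

/* Analogue of input(): next pushed-back character, or -1 if none. */
static int sketch_input(void)
{
    if (pushback_len == 0)
        return -1;
    return (unsigned char) pushback[--pushback_len];
}

/* Arrange for "(" token ")" to be read next, in that order. */
static void push_parenthesized(const char *token)
{
    size_t i = strlen(token);

    sketch_unput(')');              /* read last, so pushed first */
    while (i > 0)
        sketch_unput(token[--i]);   /* token, right to left */
    sketch_unput('(');              /* read first, so pushed last */
}
```

   Because the most recently pushed character is read first, calling
`push_parenthesized("abc")' and then draining `sketch_input()' yields
the characters of "(abc)" in left-to-right order.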
    854 
    855 
    856 File: flex.info,  Node: Generated scanner,  Next: Start conditions,  Prev: Actions,  Up: Top
    857 
    858 The generated scanner
    859 =====================
    860 
    861    The output of `flex' is the file `lex.yy.c', which contains the
    862 scanning routine `yylex()', a number of tables used by it for matching
    863 tokens, and a number of auxiliary routines and macros.  By default,
    864 `yylex()' is declared as follows:
    865 
    866      int yylex()
    867          {
    868          ... various definitions and the actions in here ...
    869          }
    870 
    871    (If your environment supports function prototypes, then it will be
    872 "int yylex( void )".)  This definition may be changed by defining
    873 the "YY_DECL" macro.  For example, you could use:
    874 
    875      #define YY_DECL float lexscan( a, b ) float a, b;
    876 
    877    to give the scanning routine the name `lexscan', returning a float,
    878 and taking two floats as arguments.  Note that if you give arguments to
    879 the scanning routine using a K&R-style/non-prototyped function
    880 declaration, you must terminate the definition with a semi-colon (`;').
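
   Where function prototypes are available, the same declaration can
be written in prototype form, in which case no trailing semi-colon is
needed.  The following is only a sketch under that assumption: the
function body shown is a stand-in so that the fragment compiles by
itself, whereas in a real scanner `flex' supplies the body of the
routine and only the `#define' line belongs in your input file.

```c
/* Prototype-style declaration of the scanning routine. */
#define YY_DECL float lexscan( float a, float b )

/* Stand-in body for illustration only; flex generates the real one. */
YY_DECL
    {
    return a + b;
    }
```

   Source files that call `lexscan()' need a matching declaration such
as "extern float lexscan( float a, float b );".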
    881 
    882    Whenever `yylex()' is called, it scans tokens from the global input
    883 file `yyin' (which defaults to stdin).  It continues until it either
    884 reaches an end-of-file (at which point it returns the value 0) or one
    885 of its actions executes a `return' statement.
    886 
    887    If the scanner reaches an end-of-file, subsequent calls are undefined
    888 unless either `yyin' is pointed at a new input file (in which case
    889 scanning continues from that file), or `yyrestart()' is called.
    890 `yyrestart()' takes one argument, a `FILE *' pointer (which can be nil,
    891 if you've set up `YY_INPUT' to scan from a source other than `yyin'),
    892 and initializes `yyin' for scanning from that file.  Essentially there
    893 is no difference between just assigning `yyin' to a new input file or
    894 using `yyrestart()' to do so; the latter is available for compatibility
    895 with previous versions of `flex', and because it can be used to switch
    896 input files in the middle of scanning.  It can also be used to throw
    897 away the current input buffer, by calling it with an argument of
    898 `yyin'; but it is better to use `YY_FLUSH_BUFFER' (see above).  Note that
    899 `yyrestart()' does *not* reset the start condition to `INITIAL' (see
    900 Start Conditions, below).
    901 
    902    If `yylex()' stops scanning due to executing a `return' statement in
    903 one of the actions, the scanner may then be called again and it will
    904 resume scanning where it left off.
    905 
    906    By default (and for purposes of efficiency), the scanner uses
    907 block-reads rather than simple `getc()' calls to read characters from
    908 `yyin'.  The nature of how it gets its input can be controlled by
    909 defining the `YY_INPUT' macro.  YY_INPUT's calling sequence is
    910 "YY_INPUT(buf,result,max_size)".  Its action is to place up to MAX_SIZE
    911 characters in the character array BUF and return in the integer
    912 variable RESULT either the number of characters read or the constant
    913 YY_NULL (0 on Unix systems) to indicate EOF.  The default YY_INPUT
    914 reads from the global file-pointer "yyin".
    915 
    916    A sample definition of YY_INPUT (in the definitions section of the
    917 input file):
    918 
    919      %{
    920      #define YY_INPUT(buf,result,max_size) \
    921          { \
    922          int c = getchar(); \
    923          result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
    924          }
    925      %}
    926 
    927    This definition will change the input processing to occur one
    928 character at a time.
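
   `YY_INPUT' need not read from a file at all.  As a further
illustration, here is a minimal sketch of a definition that hands the
scanner blocks of an in-memory string.  The names `src_ptr',
`src_left' and `set_source()' are hypothetical, and `YY_NULL' is
repeated under `#ifndef' only so that the fragment compiles outside a
generated scanner.  (The `yy_scan_string()' family mentioned later is
usually the simpler way to scan strings; the point here is only the
calling sequence.)

```c
#include <assert.h>
#include <string.h>

#ifndef YY_NULL
#define YY_NULL 0
#endif

/* Hypothetical state: the text still to be delivered to the scanner. */
static const char *src_ptr = "";
static size_t src_left = 0;

static void set_source(const char *text)
{
    assert(text != NULL);
    src_ptr = text;
    src_left = strlen(text);
}

/* Copy up to max_size characters into buf; YY_NULL signals EOF. */
#define YY_INPUT(buf,result,max_size) \
    { \
    size_t n_ = src_left < (size_t) (max_size) ? \
                src_left : (size_t) (max_size); \
    memcpy( (buf), src_ptr, n_ ); \
    src_ptr += n_; \
    src_left -= n_; \
    (result) = n_ ? (int) n_ : YY_NULL; \
    }
```

   Unlike the one-character version, this definition fills as much of
BUF as it can on each call, which keeps the benefit of block reads.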
    929 
    930    When the scanner receives an end-of-file indication from YY_INPUT,
    931 it then checks the `yywrap()' function.  If `yywrap()' returns false
    932 (zero), then it is assumed that the function has gone ahead and set up
    933 `yyin' to point to another input file, and scanning continues.  If it
    934 returns true (non-zero), then the scanner terminates, returning 0 to
    935 its caller.  Note that in either case, the start condition remains
    936 unchanged; it does *not* revert to `INITIAL'.
    937 
    938    If you do not supply your own version of `yywrap()', then you must
    939 either use `%option noyywrap' (in which case the scanner behaves as
    940 though `yywrap()' returned 1), or you must link with `-lfl' to obtain
    941 the default version of the routine, which always returns 1.
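
   If you do supply your own `yywrap()', a common use is to step
through a list of input files.  The following is a minimal sketch
under stated assumptions: `next_file' and `set_input_files()' are
hypothetical names, files that cannot be opened are silently skipped,
and `yyin' is defined in the fragment only so that it compiles on its
own (in a real scanner `flex' defines `yyin' and that line must be
removed).

```c
#include <assert.h>
#include <stdio.h>

/* Defined by flex in a real scanner; present here only so that the
 * sketch compiles alone.  Remove this line in an actual scanner. */
FILE *yyin = NULL;

/* Hypothetical: NULL-terminated list of file names still to scan. */
static const char **next_file = NULL;

void set_input_files(const char **names)
{
    assert(names != NULL);
    next_file = names;
}

int yywrap(void)
{
    /* Done with the previous file, if there was one. */
    if (yyin != NULL && yyin != stdin)
        fclose(yyin);
    yyin = NULL;

    while (next_file != NULL && *next_file != NULL) {
        yyin = fopen(*next_file++, "r");
        if (yyin != NULL)
            return 0;       /* more input: scanning continues */
    }
    return 1;               /* nothing left: the scanner terminates */
}
```

   Returning 0 only after `yyin' points at an open file matches the
contract described above; returning 1 ends the scan.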
    942 
    943    Three routines are available for scanning from in-memory buffers
    944 rather than files: `yy_scan_string()', `yy_scan_bytes()', and
    945 `yy_scan_buffer()'.  See the discussion of them below in the section
    946 Multiple Input Buffers.
    947 
    948    The scanner writes its `ECHO' output to the `yyout' global (default,
    949 stdout), which may be redefined by the user simply by assigning it to
    950 some other `FILE' pointer.
    951 
    952 
    953 File: flex.info,  Node: Start conditions,  Next: Multiple buffers,  Prev: Generated scanner,  Up: Top
    954 
    955 Start conditions
    956 ================
    957 
    958    `flex' provides a mechanism for conditionally activating rules.  Any
    959 rule whose pattern is prefixed with "<sc>" will only be active when the
    960 scanner is in the start condition named "sc".  For example,
    961 
    962      <STRING>[^"]*        { /* eat up the string body ... */
    963                  ...
    964                  }
    965 
    966 will be active only when the scanner is in the "STRING" start
    967 condition, and
    968 
    969      <INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */
    970                  ...
    971                  }
    972 
    973 will be active only when the current start condition is either
    974 "INITIAL", "STRING", or "QUOTE".
    975 
    976    Start conditions are declared in the definitions (first) section of
    977 the input using unindented lines beginning with either `%s' or `%x'
    978 followed by a list of names.  The former declares *inclusive* start
    979 conditions, the latter *exclusive* start conditions.  A start condition
    980 is activated using the `BEGIN' action.  Until the next `BEGIN' action is
    981 executed, rules with the given start condition will be active and rules
    982 with other start conditions will be inactive.  If the start condition
    983 is *inclusive*, then rules with no start conditions at all will also be
    984 active.  If it is *exclusive*, then *only* rules qualified with the
    985 start condition will be active.  A set of rules contingent on the same
    986 exclusive start condition describes a scanner which is independent of
    987 any of the other rules in the `flex' input.  Because of this, exclusive
    988 start conditions make it easy to specify "mini-scanners" which scan
    989 portions of the input that are syntactically different from the rest
    990 (e.g., comments).
    991 
    992    If the distinction between inclusive and exclusive start conditions
    993 is still a little vague, here's a simple example illustrating the
    994 connection between the two.  The set of rules:
    995 
    996      %s example
    997      %%
    998      
    999      <example>foo   do_something();
   1000      
   1001      bar            something_else();
   1002 
   1003 is equivalent to
   1004 
   1005      %x example
   1006      %%
   1007      
   1008      <example>foo   do_something();
   1009      
   1010      <INITIAL,example>bar    something_else();
   1011 
   1012    Without the `<INITIAL,example>' qualifier, the `bar' pattern in the
   1013 second example wouldn't be active (i.e., couldn't match) when in start
   1014 condition `example'.  If we just used `<example>' to qualify `bar',
   1015 though, then it would only be active in `example' and not in `INITIAL',
   1016 while in the first example it's active in both, because in the first
   1017 example the `example' starting condition is an *inclusive* (`%s') start
   1018 condition.
   1019 
   1020    Also note that the special start-condition specifier `<*>' matches
   1021 every start condition.  Thus, the above example could also have been
   1022 written:
   1023 
   1024      %x example
   1025      %%
   1026      
   1027      <example>foo   do_something();
   1028      
   1029      <*>bar    something_else();
   1030 
   1031    The default rule (to `ECHO' any unmatched character) remains active
   1032 in start conditions.  It is equivalent to:
   1033 
   1034      <*>.|\n     ECHO;
   1035 
   1036    `BEGIN(0)' returns to the original state where only the rules with
   1037 no start conditions are active.  This state can also be referred to as
   1038 the start-condition "INITIAL", so `BEGIN(INITIAL)' is equivalent to
   1039 `BEGIN(0)'.  (The parentheses around the start condition name are not
   1040 required but are considered good style.)
   1041 
   1042    `BEGIN' actions can also be given as indented code at the beginning
   1043 of the rules section.  For example, the following will cause the
   1044 scanner to enter the "SPECIAL" start condition whenever `yylex()' is
   1045 called and the global variable `enter_special' is true:
   1046 
   1047              int enter_special;
   1048      
   1049      %x SPECIAL
   1050      %%
   1051              if ( enter_special )
   1052                  BEGIN(SPECIAL);
   1053      
   1054      <SPECIAL>blahblahblah
   1055      ...more rules follow...
   1056 
   1057    To illustrate the uses of start conditions, here is a scanner which
   1058 provides two different interpretations of a string like "123.456".  By
   1059 default it will treat it as three tokens, the integer "123", a dot
   1060 ('.'), and the integer "456".  But if the string is preceded earlier in
   1061 the line by the string "expect-floats" it will treat it as a single
   1062 token, the floating-point number 123.456:
   1063 
   1064      %{
   1065      #include <math.h>
   1066      %}
   1067      %s expect
   1068      
   1069      %%
   1070      expect-floats        BEGIN(expect);
   1071      
   1072      <expect>[0-9]+"."[0-9]+      {
   1073                  printf( "found a float, = %f\n",
   1074                          atof( yytext ) );
   1075                  }
   1076      <expect>\n           {
   1077                  /* that's the end of the line, so
   1078                   * we need another "expect-floats"
   1079                   * before we'll recognize any more
   1080                   * numbers
   1081                   */
   1082                  BEGIN(INITIAL);
   1083                  }
   1084      
   1085      [0-9]+      {
   1089                  printf( "found an integer, = %d\n",
   1090                          atoi( yytext ) );
   1091                  }
   1092      
   1093      "."         printf( "found a dot\n" );
   1094 
   1095    Here is a scanner which recognizes (and discards) C comments while
   1096 maintaining a count of the current input line.
   1097 
   1098      %x comment
   1099      %%
   1100              int line_num = 1;
   1101      
   1102      "/*"         BEGIN(comment);
   1103      
   1104      <comment>[^*\n]*        /* eat anything that's not a '*' */
   1105      <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
   1106      <comment>\n             ++line_num;
   1107      <comment>"*"+"/"        BEGIN(INITIAL);
   1108 
   1109    This scanner goes to a bit of trouble to match as much text as
   1110 possible with each rule.  In general, when attempting to write a
   1111 high-speed scanner, try to match as much as possible in each rule, as it's
   1112 a big win.
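
   To make clear what the four `<comment>' rules accomplish between
them, here is the same job written by hand as a single C function.
This is a minimal sketch, not code that `flex' generates; the name
`skip_comment()' and its interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* p points just past the opening slash-star of a comment.  Returns a
 * pointer just past the closing star-slash, or NULL if the text ends
 * first.  *line_num is incremented once for every newline skipped. */
static const char *skip_comment(const char *p, int *line_num)
{
    assert(p != NULL && line_num != NULL);

    while (*p != '\0') {
        if (*p == '\n')
            ++*line_num;            /* same as the <comment>\n rule */
        else if (p[0] == '*' && p[1] == '/')
            return p + 2;           /* same as returning to INITIAL */
        ++p;
    }
    return NULL;                    /* unterminated comment */
}
```

   The hand-written loop examines one character at a time, whereas the
rules above let the scanner consume long runs of comment text with a
single match.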
   1113 
   1114    Note that start condition names are really integer values and can
   1115 be stored as such.  Thus, the above could be extended in the following
   1116 fashion:
   1117 
   1118      %x comment foo
   1119      %%
   1120              int line_num = 1;
   1121              int comment_caller;
   1122      
   1123      "/*"         {
   1124                   comment_caller = INITIAL;
   1125                   BEGIN(comment);
   1126                   }
   1127      
   1128      ...
   1129      
   1130      <foo>"/*"    {
   1131                   comment_caller = foo;
   1132                   BEGIN(comment);
   1133                   }
   1134      
   1135      <comment>[^*\n]*        /* eat anything that's not a '*' */
   1136      <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
   1137      <comment>\n             ++line_num;
   1138      <comment>"*"+"/"        BEGIN(comment_caller);
   1139 
   1140    Furthermore, you can access the current start condition using the
   1141 integer-valued `YY_START' macro.  For example, the above assignments to
   1142 `comment_caller' could instead be written
   1143 
   1144      comment_caller = YY_START;
   1145 
   1146    Flex provides `YYSTATE' as an alias for `YY_START' (since that is
   1147 what's used by AT&T `lex').
   1148 
   1149    Note that start conditions do not have their own name-space; %s's
   1150 and %x's declare names in the same fashion as #define's.
   1151 
   1152    Finally, here's an example of how to match C-style quoted strings
   1153 using exclusive start conditions, including expanded escape sequences
   1154 (but not including checking for a string that's too long):
   1155 
   1156      %x str
   1157      
   1158      %%
   1159              char string_buf[MAX_STR_CONST];
   1160              char *string_buf_ptr;
   1161      
   1162      \"      string_buf_ptr = string_buf; BEGIN(str);
   1163      
   1164      <str>\"        { /* saw closing quote - all done */
   1165              BEGIN(INITIAL);
   1166              *string_buf_ptr = '\0';
   1167              /* return string constant token type and
   1168               * value to parser
   1169               */
   1170              }
   1171      
   1172      <str>\n        {
   1173              /* error - unterminated string constant */
   1174              /* generate error message */
   1175              }
   1176      
   1177      <str>\\[0-7]{1,3} {
   1178              /* octal escape sequence */
   1179              unsigned int result;  /* %o needs an unsigned int */
   1180      
   1181              (void) sscanf( yytext + 1, "%o", &result );
   1182      
   1183              if ( result > 0xff )
   1184                      { /* error, constant is out-of-bounds */ }
   1185      
   1186              *string_buf_ptr++ = result;
   1187              }
   1188      
   1189      <str>\\[0-9]+ {
   1190              /* generate error - bad escape sequence; something
   1191               * like '\48' or '\0777777'
   1192               */
   1193              }
   1194      
   1195      <str>\\n  *string_buf_ptr++ = '\n';
   1196      <str>\\t  *string_buf_ptr++ = '\t';
   1197      <str>\\r  *string_buf_ptr++ = '\r';
   1198      <str>\\b  *string_buf_ptr++ = '\b';
   1199      <str>\\f  *string_buf_ptr++ = '\f';
   1200      
   1201      <str>\\(.|\n)  *string_buf_ptr++ = yytext[1];
   1202      
   1203      <str>[^\\\n\"]+        {
   1204              char *yptr = yytext;
   1205      
   1206              while ( *yptr )
   1207                      *string_buf_ptr++ = *yptr++;
   1208              }
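
   The octal-escape step of the example can be tried in isolation.  The
following is a minimal stand-alone sketch; the function name
`decode_octal_escape()' and its convention of returning -1 on error
are assumptions made for illustration.  Note that `%o' stores through
a pointer to `unsigned int', and that an out-of-range value is
rejected rather than stored.

```c
#include <assert.h>
#include <stdio.h>

/* text is an escape such as "\101", backslash included.  Returns the
 * character value (0 to 255), or -1 if the escape is malformed or
 * the value does not fit in one byte. */
static int decode_octal_escape(const char *text)
{
    unsigned int value;

    assert(text != NULL);

    if (text[0] != '\\' || sscanf(text + 1, "%3o", &value) != 1)
        return -1;          /* no octal digits after the backslash */

    if (value > 0xff)
        return -1;          /* e.g. \777: constant is out-of-bounds */

    return (int) value;
}
```

   In a scanner action the argument would be `yytext', and a result of
-1 would lead to an error message instead of a store into the string
buffer.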
   1209 
   1210    Often, such as in some of the examples above, you wind up writing a
   1211 whole bunch of rules all preceded by the same start condition(s).  Flex
   1212 makes this a little easier and cleaner by introducing a notion of start
   1213 condition "scope".  A start condition scope is begun with:
   1214 
   1215      <SCs>{
   1216 
   1217 where SCs is a list of one or more start conditions.  Inside the start
   1218 condition scope, every rule automatically has the prefix `<SCs>'
   1219 applied to it, until a `}' which matches the initial `{'.  So, for
   1220 example,
   1221 
   1222      <ESC>{
   1223          "\\n"   return '\n';
   1224          "\\r"   return '\r';
   1225          "\\f"   return '\f';
   1226          "\\0"   return '\0';
   1227      }
   1228 
   1229 is equivalent to:
   1230 
   1231      <ESC>"\\n"  return '\n';
   1232      <ESC>"\\r"  return '\r';
   1233      <ESC>"\\f"  return '\f';
   1234      <ESC>"\\0"  return '\0';
   1235 
   1236    Start condition scopes may be nested.
   1237 
   1238    Three routines are available for manipulating stacks of start
   1239 conditions:
   1240 
   1241 `void yy_push_state(int new_state)'
   1242      pushes the current start condition onto the top of the start
   1243      condition stack and switches to NEW_STATE as though you had used
   1244      `BEGIN new_state' (recall that start condition names are also
   1245      integers).
   1246 
   1247 `void yy_pop_state()'
   1248      pops the top of the stack and switches to it via `BEGIN'.
   1249 
   1250 `int yy_top_state()'
   1251      returns the top of the stack without altering the stack's contents.
   1252 
   1253    The start condition stack grows dynamically and so has no built-in
   1254 size limitation.  If memory is exhausted, program execution aborts.
   1255 
   1256    To use start condition stacks, your scanner must include a `%option
   1257 stack' directive (see Options below).
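
   The behavior of the three stack routines can be summarized by a
small model in plain C.  This is a sketch of the semantics described
above, not the code `flex' generates: the `model_' names and the
variable `current_state' are hypothetical, and 0 stands in for
`INITIAL'.

```c
#include <assert.h>
#include <stdlib.h>

static int current_state = 0;       /* 0 plays the role of INITIAL */
static int *stack = NULL;
static size_t depth = 0, capacity = 0;

/* Like yy_push_state(): save the current condition, then switch. */
static void model_push_state(int new_state)
{
    if (depth == capacity) {
        size_t new_cap = capacity ? capacity * 2 : 8;
        int *p = realloc(stack, new_cap * sizeof *p);

        if (p == NULL)
            abort();        /* memory exhausted: execution aborts */
        stack = p;
        capacity = new_cap;
    }
    stack[depth++] = current_state;
    current_state = new_state;
}

/* Like yy_pop_state(): switch back to the most recently saved one. */
static void model_pop_state(void)
{
    assert(depth > 0);
    current_state = stack[--depth];
}

/* Like yy_top_state(): peek at the saved condition, change nothing. */
static int model_top_state(void)
{
    assert(depth > 0);
    return stack[depth - 1];
}
```

   Note that the value on top of the stack is the condition that was
active *before* the most recent push, not the one that is active now.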
   1258 
   1259 
   1260 File: flex.info,  Node: Multiple buffers,  Next: End-of-file rules,  Prev: Start conditions,  Up: Top
   1261 
   1262 Multiple input buffers
   1263 ======================
   1264 
   1265    Some scanners (such as those which support "include" files) require
   1266 reading from several input streams.  As `flex' scanners do a large
   1267 amount of buffering, one cannot control where the next input will be
   1268 read from by simply writing a `YY_INPUT' which is sensitive to the
   1269 scanning context.  `YY_INPUT' is only called when the scanner reaches
   1270 the end of its buffer, which may be a long time after scanning a
   1271 statement such as an "include" which requires switching the input
   1272 source.
   1273 
   1274    To negotiate these sorts of problems, `flex' provides a mechanism
   1275 for creating and switching between multiple input buffers.  An input
   1276 buffer is created by using:
   1277 
   1278      YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
   1279 
   1280 which takes a `FILE' pointer and a size and creates a buffer associated
   1281 with the given file and large enough to hold SIZE characters (when in
   1282 doubt, use `YY_BUF_SIZE' for the size).  It returns a `YY_BUFFER_STATE'
   1283 handle, which may then be passed to other routines (see below).  The
   1284 `YY_BUFFER_STATE' type is a pointer to an opaque
   1285 `struct yy_buffer_state' structure, so you may safely initialize
   1286 YY_BUFFER_STATE variables to `((YY_BUFFER_STATE) 0)' if you wish, and
   1287 also refer to the opaque structure in order to correctly declare input
   1288 buffers in source files other than that of your scanner.  Note that the
   1289 `FILE' pointer in the call to `yy_create_buffer' is only used as the
   1290 value of `yyin' seen by `YY_INPUT'; if you redefine `YY_INPUT' so it no
   1291 longer uses `yyin', then you can safely pass a nil `FILE' pointer to
   1292 `yy_create_buffer'.  You select a particular buffer to scan from using:
   1293 
   1294      void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
   1295 
   1296    This switches the scanner's input buffer so subsequent tokens will come
   1297 from NEW_BUFFER.  Note that `yy_switch_to_buffer()' may be used by
   1298 `yywrap()' to set things up for continued scanning, instead of opening
   1299 a new file and pointing `yyin' at it.  Note also that switching input
   1300 sources via either `yy_switch_to_buffer()' or `yywrap()' does *not*
   1301 change the start condition.
   1302 
   1303      void yy_delete_buffer( YY_BUFFER_STATE buffer )
   1304 
   1305 is used to reclaim the storage associated with a buffer.  You can also
   1306 clear the current contents of a buffer using:
   1307 
   1308      void yy_flush_buffer( YY_BUFFER_STATE buffer )
   1309 
   1310    This function discards the buffer's contents, so the next time the
   1311 scanner attempts to match a token from the buffer, it will first fill
   1312 the buffer anew using `YY_INPUT'.
   1313 
   1314    `yy_new_buffer()' is an alias for `yy_create_buffer()', provided for
   1315 compatibility with the C++ use of `new' and `delete' for creating and
   1316 destroying dynamic objects.
   1317 
   1318    Finally, the `YY_CURRENT_BUFFER' macro returns a `YY_BUFFER_STATE'
   1319 handle to the current buffer.
   1320 
   1321    Here is an example of using these features for writing a scanner
   1322 which expands include files (the `<<EOF>>' feature is discussed below):
   1323 
   1324      /* the "incl" state is used for picking up the name
   1325       * of an include file
   1326       */
   1327      %x incl
   1328      
   1329      %{
   1330      #define MAX_INCLUDE_DEPTH 10
   1331      YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
   1332      int include_stack_ptr = 0;
   1333      %}
   1334      
   1335      %%
   1336      include             BEGIN(incl);
   1337      
   1338      [a-z]+              ECHO;
   1339      [^a-z\n]*\n?        ECHO;
   1340      
   1341      <incl>[ \t]*      /* eat the whitespace */
   1342      <incl>[^ \t\n]+   { /* got the include file name */
   1343              if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
   1344                  {
   1345                  fprintf( stderr, "Includes nested too deeply\n" );
   1346                  exit( 1 );
   1347                  }
   1348      
   1349              include_stack[include_stack_ptr++] =
   1350                  YY_CURRENT_BUFFER;
   1351      
   1352              yyin = fopen( yytext, "r" );
   1353      
   1354              if ( ! yyin )
   1355                  error( ... );
   1356      
   1357              yy_switch_to_buffer(
   1358                  yy_create_buffer( yyin, YY_BUF_SIZE ) );
   1359      
   1360              BEGIN(INITIAL);
   1361              }
   1362      
   1363      <<EOF>> {
   1364              if ( --include_stack_ptr < 0 )
   1365                  {
   1366                  yyterminate();
   1367                  }
   1368      
   1369              else
   1370                  {
   1371                  yy_delete_buffer( YY_CURRENT_BUFFER );
   1372                  yy_switch_to_buffer(
   1373                       include_stack[include_stack_ptr] );
   1374                  }
   1375              }
   1376 
   1377    Three routines are available for setting up input buffers for
   1378 scanning in-memory strings instead of files.  All of them create a new
   1379 input buffer for scanning the string, and return a corresponding
   1380 `YY_BUFFER_STATE' handle (which you should delete with
   1381 `yy_delete_buffer()' when done with it).  They also switch to the new
   1382 buffer using `yy_switch_to_buffer()', so the next call to `yylex()' will
   1383 start scanning the string.
   1384 
   1385 `yy_scan_string(const char *str)'
   1386      scans a NUL-terminated string.
   1387 
   1388 `yy_scan_bytes(const char *bytes, int len)'
   1389      scans `len' bytes (including possibly NUL's) starting at location
   1390      BYTES.
   1391 
   1392    Note that both of these functions create and scan a *copy* of the
   1393 string or bytes.  (This may be desirable, since `yylex()' modifies the
   1394 contents of the buffer it is scanning.) You can avoid the copy by using:
   1395 
   1396 `yy_scan_buffer(char *base, yy_size_t size)'
   1397      which scans in place the buffer starting at BASE, consisting of
   1398      SIZE bytes, the last two bytes of which *must* be
   1399      `YY_END_OF_BUFFER_CHAR' (ASCII NUL).  These last two bytes are not
   1400      scanned; thus, scanning consists of `base[0]' through
   1401      `base[size-3]', inclusive.
   1402 
   1403      If you fail to set up BASE in this manner (i.e., forget the final
   1404      two `YY_END_OF_BUFFER_CHAR' bytes), then `yy_scan_buffer()'
   1405      returns a nil pointer instead of creating a new input buffer.
   1406 
   1407      The type `yy_size_t' is an integral type to which you can cast an
   1408      integer expression reflecting the size of the buffer.
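
   Preparing a buffer in the form that `yy_scan_buffer()' expects can
be sketched as follows.  The helper name `make_scan_buffer()' is
hypothetical; it simply copies the text and appends the two
terminating NUL bytes, and it assumes that `YY_END_OF_BUFFER_CHAR' is
ASCII NUL as stated above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'ed buffer holding text[0] through text[len-1]
 * followed by two NUL bytes, and stores the total size (len + 2) in
 * *size_out.  Returns NULL if memory cannot be obtained. */
static char *make_scan_buffer(const char *text, size_t len,
                              size_t *size_out)
{
    char *base;

    assert(text != NULL && size_out != NULL);

    base = malloc(len + 2);
    if (base == NULL)
        return NULL;

    memcpy(base, text, len);
    base[len] = '\0';           /* first end-of-buffer byte */
    base[len + 1] = '\0';       /* second end-of-buffer byte */
    *size_out = len + 2;
    return base;
}
```

   The result would then be handed to the scanner with a call such as
"yy_scan_buffer( base, (yy_size_t) size )".  Since the scan is done in
place, the buffer has to remain allocated until scanning is finished;
releasing it afterwards is assumed here to be the caller's job.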
   1409 
   1410 
   1411 File: flex.info,  Node: End-of-file rules,  Next: Miscellaneous,  Prev: Multiple buffers,  Up: Top
   1412 
   1413 End-of-file rules
   1414 =================
   1415 
   1416    The special rule "<<EOF>>" indicates actions which are to be taken
   1417 when an end-of-file is encountered and yywrap() returns non-zero (i.e.,
   1418 indicates no further files to process).  The action must finish by
   1419 doing one of four things:
   1420 
   1421    - assigning `yyin' to a new input file (in previous versions of
   1422      flex, after doing the assignment you had to call the special
   1423      action `YY_NEW_FILE'; this is no longer necessary);
   1424 
   1425    - executing a `return' statement;
   1426 
   1427    - executing the special `yyterminate()' action;
   1428 
   1429    - or, switching to a new buffer using `yy_switch_to_buffer()' as
   1430      shown in the example above.
   1431 
   1432    <<EOF>> rules may not be used with other patterns; they may only be
   1433 qualified with a list of start conditions.  If an unqualified <<EOF>>
   1434 rule is given, it applies to *all* start conditions which do not
   1435 already have <<EOF>> actions.  To specify an <<EOF>> rule for only the
   1436 initial start condition, use
   1437 
   1438      <INITIAL><<EOF>>
   1439 
   1440    These rules are useful for catching things like unclosed comments.
   1441 An example:
   1442 
   1443      %x quote
   1444      %%
   1445      
   1446      ...other rules for dealing with quotes...
   1447      
   1448      <quote><<EOF>>   {
   1449               error( "unterminated quote" );
   1450               yyterminate();
   1451               }
   1452      <<EOF>>  {
   1453               if ( *++filelist )
   1454                   yyin = fopen( *filelist, "r" );
   1455               else
   1456                  yyterminate();
   1457               }
   1458 
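The file-switching branch of the example above leaves `yyin' nil if
fopen() fails.  As a minimal sketch of one way to guard against that, the
helper below skips names that cannot be opened.  `open_next_input()' is a
hypothetical name, not part of flex, and its argument is assumed to be a
NULL-terminated list positioned at the file just finished (as `filelist'
is in the example).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of flex.  *cursor points into a
 * NULL-terminated list of file names, at the entry that has just been
 * scanned.  The cursor is advanced to the next name that can be opened
 * and that stream is returned; NULL is returned when the list is
 * exhausted (the cursor then rests on the terminating NULL). */
static FILE *open_next_input(char ***cursor)
{
    char **list;
    FILE *fp = NULL;

    assert(cursor != NULL && *cursor != NULL);
    list = *cursor;

    /* Never step past the terminating NULL entry. */
    while (*list != NULL && *++list != NULL) {
        fp = fopen(*list, "r");
        if (fp != NULL)
            break;
        perror(*list);      /* report the failure and try the next name */
    }

    *cursor = list;
    return fp;
}
```

Under these assumptions the <<EOF>> action could assign the result of
`open_next_input( &filelist )' to `yyin' and call `yyterminate()' when
that result is nil.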
   1459 
   1460 File: flex.info,  Node: Miscellaneous,  Next: User variables,  Prev: End-of-file rules,  Up: Top
   1461 
   1462 Miscellaneous macros
   1463 ====================
   1464 
   1465    The macro `YY_USER_ACTION' can be defined to provide an action which
   1466 is always executed prior to the matched rule's action.  For example, it
   1467 could be #define'd to call a routine to convert yytext to lower-case.
   1468 When `YY_USER_ACTION' is invoked, the variable `yy_act' gives the
   1469 number of the matched rule (rules are numbered starting with 1).
   1470 Suppose you want to profile how often each of your rules is matched.
   1471 The following would do the trick:
   1472 
   1473      #define YY_USER_ACTION ++ctr[yy_act]
   1474 
   1475    where `ctr' is an array to hold the counts for the different rules.
   1476 Note that the macro `YY_NUM_RULES' gives the total number of rules
   1477 (including the default rule, even if you use `-s'), so a correct
   1478 declaration for `ctr' is:
   1479 
   1480      int ctr[YY_NUM_RULES];
   1481 
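As a rough illustration of how such counts might be reported once
scanning has finished, the sketch below prints one line per rule that
matched.  `report_rule_counts()' and its parameters are hypothetical
names, not part of flex.  The array passed in is assumed to be indexed
directly by rule number, so slot 0 is unused and `slots' is the total
number of elements.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of flex.  Prints how often each rule
 * matched and returns the total number of matches.  ctr[] is assumed
 * to be indexed by rule number (rules are numbered from 1, so slot 0
 * is unused) and to contain `slots' elements in all. */
static long report_rule_counts(const int *ctr, int slots, FILE *out)
{
    long total = 0;
    int rule;

    assert(ctr != NULL && out != NULL && slots >= 1);

    for (rule = 1; rule < slots; ++rule) {
        if (ctr[rule] > 0)
            fprintf(out, "rule %d: %d match(es)\n", rule, ctr[rule]);
        total += ctr[rule];
    }
    return total;
}
```

A call such as `report_rule_counts( ctr, n, stderr )' after `yylex()'
returns would then summarize the profile, with `n' being however many
elements the caller allocated.
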
   1482    The macro `YY_USER_INIT' may be defined to provide an action which
   1483 is always executed before the first scan (and before the scanner's
   1484 internal initializations are done).  For example, it could be used to
   1485 call a routine to read in a data table or open a logging file.
   1486 
   1487    The macro `yy_set_interactive(is_interactive)' can be used to
   1488 control whether the current buffer is considered *interactive*.  An
   1489 interactive buffer is processed more slowly, but must be used when the
   1490 scanner's input source is indeed interactive to avoid problems due to
   1491 waiting to fill buffers (see the discussion of the `-I' flag below).  A
   1492 non-zero value in the macro invocation marks the buffer as interactive,
   1493 a zero value as non-interactive.  Note that use of this macro overrides
   1494 `%option always-interactive' or `%option never-interactive' (see
   1495 Options below).  `yy_set_interactive()' must be invoked prior to
   1496 beginning to scan the buffer that is (or is not) to be considered
   1497 interactive.
   1498 
   1499    The macro `yy_set_bol(at_bol)' can be used to control whether the
   1500 next token match in the current buffer is scanned as though at the
   1501 beginning of a line.  A non-zero macro argument makes rules anchored
   1502 with '^' active, while a zero argument makes '^' rules inactive.
   1503 
   1504    The macro `YY_AT_BOL()' returns true if the next token scanned from
   1505 the current buffer will have '^' rules active, false otherwise.
   1506 
   1507    In the generated scanner, the actions are all gathered in one large
   1508 switch statement and separated using `YY_BREAK', which may be
   1509 redefined.  By default, it is simply a "break", to separate each rule's
   1510 action from the following rule's.  Redefining `YY_BREAK' allows, for
   1511 example, C++ users to #define YY_BREAK to do nothing (while being very
   1512 careful that every rule ends with a "break" or a "return"!) to avoid
   1513 the unreachable-statement warnings that arise when a rule's action
   1514 ends with "return", leaving the `YY_BREAK' after it inaccessible.
   1515 
   1516 
   1517 File: flex.info,  Node: User variables,  Next: YACC interface,  Prev: Miscellaneous,  Up: Top
   1518 
   1519 Values available to the user
   1520 ============================
   1521 
   1522    This section summarizes the various values available to the user in
   1523 the rule actions.
   1524 
   1525    - `char *yytext' holds the text of the current token.  It may be
   1526      modified but not lengthened (you cannot append characters to the
   1527      end).
   1528 
   1529      If the special directive `%array' appears in the first section of
   1530      the scanner description, then `yytext' is instead declared `char
   1531      yytext[YYLMAX]', where `YYLMAX' is a macro definition that you can
   1532      redefine in the first section if you don't like the default value
   1533      (generally 8KB).  Using `%array' results in somewhat slower
   1534      scanners, but the value of `yytext' becomes immune to calls to
   1535      `input()' and `unput()', which potentially destroy its value when
   1536      `yytext' is a character pointer.  The opposite of `%array' is
   1537      `%pointer', which is the default.
   1538 
   1539      You cannot use `%array' when generating C++ scanner classes (the
   1540      `-+' flag).
   1541 
   1542    - `int yyleng' holds the length of the current token.
   1543 
   1544    - `FILE *yyin' is the file which by default `flex' reads from.  It
   1545      may be redefined but doing so only makes sense before scanning
   1546      begins or after an EOF has been encountered.  Changing it in the
   1547      midst of scanning will have unexpected results since `flex'
   1548      buffers its input; use `yyrestart()' instead.  Once scanning
   1549      terminates because an end-of-file has been seen, you can point
   1550      `yyin' at the new input file and then call the scanner again to
   1551      continue scanning.
   1552 
   1553    - `void yyrestart( FILE *new_file )' may be called to point `yyin'
   1554      at the new input file.  The switch-over to the new file is
   1555      immediate (any previously buffered-up input is lost).  Note that
   1556      calling `yyrestart()' with `yyin' as an argument thus throws away
   1557      the current input buffer and continues scanning the same input
   1558      file.
   1559 
   1560    - `FILE *yyout' is the file to which `ECHO' actions are done.  It
   1561      can be reassigned by the user.
   1562 
   1563    - `YY_CURRENT_BUFFER' returns a `YY_BUFFER_STATE' handle to the
   1564      current buffer.
   1565 
   1566    - `YY_START' returns an integer value corresponding to the current
   1567      start condition.  You can subsequently use this value with `BEGIN'
   1568      to return to that start condition.
   1569 
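As an illustration of the last item, the fragment below records the
start condition that was active when a comment began and returns to it
when the comment ends.  The `comment' start condition and the variable
`comment_caller' are names chosen for this sketch only.

```lex
%x comment
%{
static int comment_caller;   /* hypothetical: start condition to resume */
%}
%%
"/*"            { comment_caller = YY_START; BEGIN(comment); }
<comment>"*/"   { BEGIN(comment_caller); }
<comment>.|\n   { /* discard the text of the comment */ }
```

Because `comment' is declared exclusive with `%x', the first rule is not
active while a comment is being skipped.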
   1570 
   1571 File: flex.info,  Node: YACC interface,  Next: Options,  Prev: User variables,  Up: Top
   1572 
   1573 Interfacing with `yacc'
   1574 =======================
   1575 
   1576    One of the main uses of `flex' is as a companion to the `yacc'
   1577 parser-generator.  `yacc' parsers expect to call a routine named
   1578 `yylex()' to find the next input token.  The routine is supposed to
   1579 return the type of the next token as well as putting any associated
   1580 value in the global `yylval'.  To use `flex' with `yacc', one specifies
   1581 the `-d' option to `yacc' to instruct it to generate the file `y.tab.h'
   1582 containing definitions of all the `%tokens' appearing in the `yacc'
   1583 input.  This file is then included in the `flex' scanner.  For example,
   1584 if one of the tokens is "TOK_NUMBER", part of the scanner might look
   1585 like:
   1586 
   1587      %{
   1588      #include "y.tab.h"
   1589      %}
   1590      
   1591      %%
   1592      
   1593      [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
   1594 
   1595 
   1596 File: flex.info,  Node: Options,  Next: Performance,  Prev: YACC interface,  Up: Top
   1597 
   1598 Options
   1599 =======
   1600 
   1601    `flex' has the following options:
   1602 
   1603 `-b'
   1604      Generate backing-up information to `lex.backup'.  This is a list
   1605      of scanner states which require backing up and the input
   1606      characters on which they do so.  By adding rules one can remove
   1607      backing-up states.  If *all* backing-up states are eliminated and
   1608      `-Cf' or `-CF' is used, the generated scanner will run faster (see
   1609      the `-p' flag).  Only users who wish to squeeze every last cycle
   1610      out of their scanners need worry about this option.  (See the
   1611      section on Performance Considerations below.)
   1612 
   1613 `-c'
   1614      is a do-nothing, deprecated option included for POSIX compliance.
   1615 
   1616 `-d'
   1617      makes the generated scanner run in "debug" mode.  Whenever a
   1618      pattern is recognized and the global `yy_flex_debug' is non-zero
   1619      (which is the default), the scanner will write to `stderr' a line
   1620      of the form:
   1621 
   1622           --accepting rule at line 53 ("the matched text")
   1623 
   1624      The line number refers to the location of the rule in the file
   1625      defining the scanner (i.e., the file that was fed to flex).
   1626      Messages are also generated when the scanner backs up, accepts the
   1627      default rule, reaches the end of its input buffer (or encounters a
   1628      NUL; at this point, the two look the same as far as the scanner's
   1629      concerned), or reaches an end-of-file.
   1630 
   1631 `-f'
   1632      specifies "fast scanner".  No table compression is done and stdio
   1633      is bypassed.  The result is large but fast.  This option is
   1634      equivalent to `-Cfr' (see below).
   1635 
   1636 `-h'
   1637      generates a "help" summary of `flex's' options to `stdout' and
   1638      then exits.  `-?' and `--help' are synonyms for `-h'.
   1639 
   1640 `-i'
   1641      instructs `flex' to generate a *case-insensitive* scanner.  The
   1642      case of letters given in the `flex' input patterns will be
   1643      ignored, and tokens in the input will be matched regardless of
   1644      case.  The matched text given in `yytext' will have its case
   1645      preserved (i.e., it will not be folded).
   1646 
   1647 `-l'
   1648      turns on maximum compatibility with the original AT&T `lex'
   1649      implementation.  Note that this does not mean *full*
   1650      compatibility.  Use of this option costs a considerable amount of
   1651      performance, and it cannot be used with the `-+, -f, -F, -Cf', or
   1652      `-CF' options.  For details on the compatibilities it provides, see
   1653      the section "Incompatibilities With Lex And POSIX" below.  This
   1654      option also results in the name `YY_FLEX_LEX_COMPAT' being
   1655      #define'd in the generated scanner.
   1656 
   1657 `-n'
   1658      is another do-nothing, deprecated option included only for POSIX
   1659      compliance.
   1660 
   1661 `-p'
   1662      generates a performance report to stderr.  The report consists of
   1663      comments regarding features of the `flex' input file which will
   1664      cause a serious loss of performance in the resulting scanner.  If
   1665      you give the flag twice, you will also get comments regarding
   1666      features that lead to minor performance losses.
   1667 
   1668      Note that the use of `REJECT', `%option yylineno' and variable
   1669      trailing context (see the Deficiencies / Bugs section below)
   1670      entails a substantial performance penalty; use of `yymore()', the
   1671      `^' operator, and the `-I' flag entail minor performance penalties.
   1672 
   1673 `-s'
   1674      causes the "default rule" (that unmatched scanner input is echoed
   1675      to `stdout') to be suppressed.  If the scanner encounters input
   1676      that does not match any of its rules, it aborts with an error.
   1677      This option is useful for finding holes in a scanner's rule set.
   1678 
   1679 `-t'
   1680      instructs `flex' to write the scanner it generates to standard
   1681      output instead of `lex.yy.c'.
   1682 
   1683 `-v'
   1684      specifies that `flex' should write to `stderr' a summary of
   1685      statistics regarding the scanner it generates.  Most of the
   1686      statistics are meaningless to the casual `flex' user, but the
   1687      first line identifies the version of `flex' (same as reported by
   1688      `-V'), and the next line the flags used when generating the
   1689      scanner, including those that are on by default.
   1690 
   1691 `-w'
   1692      suppresses warning messages.
   1693 
   1694 `-B'
   1695      instructs `flex' to generate a *batch* scanner, the opposite of
   1696      *interactive* scanners generated by `-I' (see below).  In general,
   1697      you use `-B' when you are *certain* that your scanner will never
   1698      be used interactively, and you want to squeeze a *little* more
   1699      performance out of it.  If your goal is instead to squeeze out a
   1700      *lot* more performance, you should be using the `-Cf' or `-CF'
   1701      options (discussed below), which turn on `-B' automatically anyway.
   1702 
   1703 `-F'
   1704      specifies that the "fast" scanner table representation should be
   1705      used (and stdio bypassed).  This representation is about as fast
   1706      as the full table representation `(-f)', and for some sets of
   1707      patterns will be considerably smaller (and for others, larger).
   1708      In general, if the pattern set contains both "keywords" and a
   1709      catch-all, "identifier" rule, such as in the set:
   1710 
   1711           "case"    return TOK_CASE;
   1712           "switch"  return TOK_SWITCH;
   1713           ...
   1714           "default" return TOK_DEFAULT;
   1715           [a-z]+    return TOK_ID;
   1716 
   1717      then you're better off using the full table representation.  If
   1718      only the "identifier" rule is present and you then use a hash
   1719      table or some such to detect the keywords, you're better off using
   1720      `-F'.
   1721 
   1722      This option is equivalent to `-CFr' (see below).  It cannot be
   1723      used with `-+'.
   1724 
   1725 `-I'
   1726      instructs `flex' to generate an *interactive* scanner.  An
   1727      interactive scanner is one that only looks ahead to decide what
   1728      token has been matched if it absolutely must.  It turns out that
   1729      always looking one extra character ahead, even if the scanner has
   1730      already seen enough text to disambiguate the current token, is a
   1731      bit faster than only looking ahead when necessary.  But scanners
   1732      that always look ahead give dreadful interactive performance; for
   1733      example, when a user types a newline, it is not recognized as a
   1734      newline token until they enter *another* token, which often means
   1735      typing in another whole line.
   1736 
   1737      `Flex' scanners default to *interactive* unless you use the `-Cf'
   1738      or `-CF' table-compression options (see below).  That's because if
   1739      you're looking for high-performance you should be using one of
   1740      these options, so if you didn't, `flex' assumes you'd rather trade
   1741      off a bit of run-time performance for intuitive interactive
   1742      behavior.  Note also that you *cannot* use `-I' in conjunction
   1743      with `-Cf' or `-CF'.  Thus, this option is not really needed; it
   1744      is on by default for all those cases in which it is allowed.
   1745 
   1746      You can force a scanner to *not* be interactive by using `-B' (see
   1747      above).
   1748 
   1749 `-L'
   1750      instructs `flex' not to generate `#line' directives.  Without this
   1751      option, `flex' peppers the generated scanner with #line directives
   1752      so error messages in the actions will be correctly located with
   1753      respect to either the original `flex' input file (if the errors
   1754      are due to code in the input file), or `lex.yy.c' (if the errors
   1755      are `flex's' fault - you should report these sorts of errors to
   1756      the email address given below).
   1757 
   1758 `-T'
   1759      makes `flex' run in `trace' mode.  It will generate a lot of
   1760      messages to `stderr' concerning the form of the input and the
   1761      resultant non-deterministic and deterministic finite automata.
   1762      This option is mostly for use in maintaining `flex'.
   1763 
   1764 `-V'
   1765      prints the version number to `stdout' and exits.  `--version' is a
   1766      synonym for `-V'.
   1767 
   1768 `-7'
   1769      instructs `flex' to generate a 7-bit scanner, i.e., one which can
   1770      only recognize 7-bit characters in its input.  The advantage of
   1771      using `-7' is that the scanner's tables can be up to half the size
   1772      of those generated using the `-8' option (see below).  The
   1773      disadvantage is that such scanners often hang or crash if their
   1774      input contains an 8-bit character.
   1775 
   1776      Note, however, that unless you generate your scanner using the
   1777      `-Cf' or `-CF' table compression options, use of `-7' will save
   1778      only a small amount of table space, and make your scanner
   1779      considerably less portable.  `Flex's' default behavior is to
   1780      generate an 8-bit scanner unless you use `-Cf' or `-CF', in
   1781      which case `flex' defaults to generating 7-bit scanners unless
   1782      your site was always configured to generate 8-bit scanners (as
   1783      will often be the case with non-USA sites).  You can tell whether
   1784      flex generated a 7-bit or an 8-bit scanner by inspecting the flag
   1785      summary in the `-v' output as described above.
   1786 
   1787      Note that if you use `-Cfe' or `-CFe' (those table compression
   1788      options, but also using equivalence classes as discussed
   1789      below), flex still defaults to generating an 8-bit scanner, since
   1790      usually with these compression options full 8-bit tables are not
   1791      much more expensive than 7-bit tables.
   1792 
   1793 `-8'
   1794      instructs `flex' to generate an 8-bit scanner, i.e., one which can
   1795      recognize 8-bit characters.  This flag is only needed for scanners
   1796      generated using `-Cf' or `-CF', as otherwise flex defaults to
   1797      generating an 8-bit scanner anyway.
   1798 
   1799      See the discussion of `-7' above for flex's default behavior and
   1800      the tradeoffs between 7-bit and 8-bit scanners.
   1801 
   1802 `-+'
   1803      specifies that you want flex to generate a C++ scanner class.  See
   1804      the section on Generating C++ Scanners below for details.
   1805 
   1806 `-C[aefFmr]'
   1807      controls the degree of table compression and, more generally,
   1808      trade-offs between small scanners and fast scanners.
   1809 
   1810      `-Ca' ("align") instructs flex to trade off larger tables in the
   1811      generated scanner for faster performance because the elements of
   1812      the tables are better aligned for memory access and computation.
   1813      On some RISC architectures, fetching and manipulating long-words
   1814      is more efficient than with smaller-sized units such as
   1815      shortwords.  This option can double the size of the tables used by
   1816      your scanner.
   1817 
   1818      `-Ce' directs `flex' to construct "equivalence classes", i.e.,
   1819      sets of characters which have identical lexical properties (for
   1820      example, if the only appearance of digits in the `flex' input is
   1821      in the character class "[0-9]" then the digits '0', '1', ..., '9'
   1822      will all be put in the same equivalence class).  Equivalence
   1823      classes usually give dramatic reductions in the final table/object
   1824      file sizes (typically a factor of 2-5) and are pretty cheap
   1825      performance-wise (one array look-up per character scanned).
   1826 
   1827      `-Cf' specifies that the *full* scanner tables should be generated
   1828      - `flex' should not compress the tables by taking advantage of
   1829      similar transition functions for different states.
   1830 
   1831      `-CF' specifies that the alternate fast scanner representation
   1832      (described above under the `-F' flag) should be used.  This option
   1833      cannot be used with `-+'.
   1834 
   1835      `-Cm' directs `flex' to construct "meta-equivalence classes",
   1836      which are sets of equivalence classes (or characters, if
   1837      equivalence classes are not being used) that are commonly used
   1838      together.  Meta-equivalence classes are often a big win when using
   1839      compressed tables, but they have a moderate performance impact
   1840      (one or two "if" tests and one array look-up per character
   1841      scanned).
   1842 
   1843      `-Cr' causes the generated scanner to *bypass* use of the standard
   1844      I/O library (stdio) for input.  Instead of calling `fread()' or
   1845      `getc()', the scanner will use the `read()' system call, resulting
   1846      in a performance gain which varies from system to system, but in
   1847      general is probably negligible unless you are also using `-Cf' or
   1848      `-CF'.  Using `-Cr' can cause strange behavior if, for example,
   1849      you read from `yyin' using stdio prior to calling the scanner
   1850      (because the scanner will miss whatever text your previous reads
   1851      left in the stdio input buffer).
   1852 
   1853      `-Cr' has no effect if you define `YY_INPUT' (see The Generated
   1854      Scanner above).
   1855 
   1856      A lone `-C' specifies that the scanner tables should be compressed
   1857      but neither equivalence classes nor meta-equivalence classes
   1858      should be used.
   1859 
   1860      The options `-Cf' or `-CF' and `-Cm' do not make sense together -
   1861      there is no opportunity for meta-equivalence classes if the table
   1862      is not being compressed.  Otherwise the options may be freely
   1863      mixed, and are cumulative.
   1864 
   1865      The default setting is `-Cem', which specifies that `flex' should
   1866      generate equivalence classes and meta-equivalence classes.  This
   1867      setting provides the highest degree of table compression.  You can
   1868      trade off faster-executing scanners at the cost of larger tables
   1869      with the following generally being true:
   1870 
   1871           slowest & smallest
   1872                 -Cem
   1873                 -Cm
   1874                 -Ce
   1875                 -C
   1876                 -C{f,F}e
   1877                 -C{f,F}
   1878                 -C{f,F}a
   1879           fastest & largest
   1880 
   1881      Note that scanners with the smallest tables are usually generated
   1882      and compiled the quickest, so during development you will usually
   1883      want to use the default, maximal compression.
   1884 
   1885      `-Cfe' is often a good compromise between speed and size for
   1886      production scanners.
   1887 
   1888 `-ooutput'
   1889      directs flex to write the scanner to the file `output' instead
   1890      of `lex.yy.c'.  If you combine `-o' with the `-t' option, then the
   1891      scanner is written to `stdout' but its `#line' directives (see the
   1892      `-L' option above) refer to the file `output'.
   1893 
   1894 `-Pprefix'
   1895      changes the default `yy' prefix used by `flex' for all
   1896      globally-visible variable and function names to instead be PREFIX.
   1897      For example, `-Pfoo' changes the name of `yytext' to `footext'.
   1898      It also changes the name of the default output file from
   1899      `lex.yy.c' to `lex.foo.c'.  Here are all of the names affected:
   1900 
   1901           yy_create_buffer
   1902           yy_delete_buffer
   1903           yy_flex_debug
   1904           yy_init_buffer
   1905           yy_flush_buffer
   1906           yy_load_buffer_state
   1907           yy_switch_to_buffer
   1908           yyin
   1909           yyleng
   1910           yylex
   1911           yylineno
   1912           yyout
   1913           yyrestart
   1914           yytext
   1915           yywrap
   1916 
   1917      (If you are using a C++ scanner, then only `yywrap' and
   1918      `yyFlexLexer' are affected.) Within your scanner itself, you can
   1919      still refer to the global variables and functions using either
   1920      version of their name; but externally, they have the modified name.
   1921 
   1922      This option lets you easily link together multiple `flex' programs
   1923      into the same executable.  Note, though, that using this option
   1924      also renames `yywrap()', so you now *must* either provide your own
   1925      (appropriately-named) version of the routine for your scanner, or
   1926      use `%option noyywrap', as linking with `-lfl' no longer provides
   1927      one for you by default.
   1928 
   1929 `-Sskeleton_file'
   1930      overrides the default skeleton file from which `flex' constructs
   1931      its scanners.  You'll never need this option unless you are doing
   1932      `flex' maintenance or development.
   1933 
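The equivalence classes built by `-Ce' (described under `-C' above) can
be pictured with a small sketch.  The code below is not flex's own
algorithm; it is a simplified illustration which assumes that two
characters are equivalent exactly when they belong to the same
character sets.  `build_equiv_classes()' and `NCHARS' are hypothetical
names, classes are numbered from 0 here, and at most 32 sets are
supported.

```c
#include <assert.h>
#include <string.h>

#define NCHARS 256   /* one slot per possible 8-bit input character */

/* Hypothetical helper, not flex's implementation.  Each entry of sets[]
 * is a string listing the members of one character set.  On return,
 * ec[c] holds the class number of character c, and the number of
 * distinct classes is returned. */
static int build_equiv_classes(const char *const sets[], int nsets,
                               unsigned char ec[NCHARS])
{
    unsigned long sig[NCHARS];   /* bit s set => character is in sets[s] */
    int c, d, s, nclasses = 0;

    assert(nsets >= 0 && nsets <= 32);

    /* Compute a membership signature for every character. */
    for (c = 0; c < NCHARS; ++c) {
        sig[c] = 0;
        for (s = 0; s < nsets; ++s)
            /* skip NUL: strchr() would match the string terminator */
            if (c != 0 && strchr(sets[s], c) != NULL)
                sig[c] |= 1UL << s;
    }

    /* Characters with identical signatures share a class number. */
    for (c = 0; c < NCHARS; ++c) {
        for (d = 0; d < c; ++d)
            if (sig[d] == sig[c])
                break;
        ec[c] = (d < c) ? ec[d] : (unsigned char)nclasses++;
    }
    return nclasses;
}
```

A transition table indexed by `ec[c]' rather than by the character
itself needs only as many columns as there are classes, at the cost of
the one extra array look-up per character mentioned above.
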
   1934    `flex' also provides a mechanism for controlling options within the
   1935 scanner specification itself, rather than from the flex command-line.
   1936 This is done by including `%option' directives in the first section of
   1937 the scanner specification.  You can specify multiple options with a
   1938 single `%option' directive, and multiple directives in the first
   1939 section of your flex input file.  Most options are given simply as
   1940 names, optionally preceded by the word "no" (with no intervening
   1941 whitespace) to negate their meaning.  A number are equivalent to flex
   1942 flags or their negation:
   1943 
   1944      7bit            -7 option
   1945      8bit            -8 option
   1946      align           -Ca option
   1947      backup          -b option
   1948      batch           -B option
   1949      c++             -+ option
   1950      
   1951      caseful or
   1952      case-sensitive  opposite of -i (default)
   1953      
   1954      case-insensitive or
   1955      caseless        -i option
   1956      
   1957      debug           -d option
   1958      default         opposite of -s option
   1959      ecs             -Ce option
   1960      fast            -F option
   1961      full            -f option
   1962      interactive     -I option
   1963      lex-compat      -l option
   1964      meta-ecs        -Cm option
   1965      perf-report     -p option
   1966      read            -Cr option
   1967      stdout          -t option
   1968      verbose         -v option
   1969      warn            opposite of -w option
   1970                      (use "%option nowarn" for -w)
   1971      
   1972      array           equivalent to "%array"
   1973      pointer         equivalent to "%pointer" (default)
   1974 
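For illustration, a first section that combines several of the names
above might begin as follows.  The particular choice of options is
arbitrary and the rules are placeholders.

```lex
%option 8bit nodefault
%option noyywrap

%%

[0-9]+    { /* placeholder action for numbers */ }
.|\n      { /* catch-all: with nodefault, unmatched input is an error */ }
```

Here `nodefault' is the negated form of `default', and so corresponds to
the `-s' flag.
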
   1975    Some `%option's' provide features otherwise not available:
   1976 
   1977 `always-interactive'
   1978      instructs flex to generate a scanner which always considers its
   1979      input "interactive".  Normally, on each new input file the scanner
   1980      calls `isatty()' in an attempt to determine whether the scanner's
   1981      input source is interactive and thus should be read a character at
   1982      a time.  When this option is used, however, then no such call is
   1983      made.
   1984 
   1985 `main'
   1986      directs flex to provide a default `main()' program for the
   1987      scanner, which simply calls `yylex()'.  This option implies
   1988      `noyywrap' (see below).
   1989 
   1990 `never-interactive'
   1991      instructs flex to generate a scanner which never considers its
   1992      input "interactive" (again, no call is made to `isatty()').  This
   1993      is the opposite of `always-interactive'.
   1994 
   1995 `stack'
   1996      enables the use of start condition stacks (see Start Conditions
   1997      above).
   1998 
   1999 `stdinit'
   2000      if unset (i.e., `%option nostdinit') initializes `yyin' and
   2001      `yyout' to nil `FILE' pointers, instead of `stdin' and `stdout'.
   2002 
   2003 `yylineno'
   2004      directs `flex' to generate a scanner that maintains the number of
   2005      the current line read from its input in the global variable
   2006      `yylineno'.  This option is implied by `%option lex-compat'.
   2007 
   2008 `yywrap'
   2009      if unset (i.e., `%option noyywrap'), makes the scanner not call
   2010      `yywrap()' upon an end-of-file, but simply assume that there are
   2011      no more files to scan (until the user points `yyin' at a new file
   2012      and calls `yylex()' again).
   2013 
   2014    `flex' scans your rule actions to determine whether you use the
   2015 `REJECT' or `yymore()' features.  The `reject' and `yymore' options are
   2016 available to override its decision as to whether you use these features,
   2017 either by setting them (e.g., `%option reject') to indicate the feature
   2018 is indeed used, or unsetting them to indicate it actually is not used
   2019 (e.g., `%option noyymore').
   2020 
   2021    Three options take string-delimited values, offset with '=':
   2022 
   2023      %option outfile="ABC"
   2024 
   2025 is equivalent to `-oABC', and
   2026 
   2027      %option prefix="XYZ"
   2028 
   2029 is equivalent to `-PXYZ'.
   2030 
   2031    Finally,
   2032 
   2033      %option yyclass="foo"
   2034 
   2035 only applies when generating a C++ scanner (`-+' option).  It informs
   2036 `flex' that you have derived `foo' as a subclass of `yyFlexLexer' so
   2037 `flex' will place your actions in the member function `foo::yylex()'
   2038 instead of `yyFlexLexer::yylex()'.  It also generates a
   2039 `yyFlexLexer::yylex()' member function that emits a run-time error (by
   2040 invoking `yyFlexLexer::LexerError()') if called.  See Generating C++
   2041 Scanners, below, for additional information.
   2042 
   2043    A number of options are available for lint purists who want to
   2044 suppress the appearance of unneeded routines in the generated scanner.
   2045 Each of the following, if unset, results in the corresponding routine
   2046 not appearing in the generated scanner:
   2047 
   2048      input, unput
   2049      yy_push_state, yy_pop_state, yy_top_state
   2050      yy_scan_buffer, yy_scan_bytes, yy_scan_string
   2051 
   2052 (though `yy_push_state()' and friends won't appear anyway unless you
   2053 use `%option stack').
   2054 
   2055 
   2056 File: flex.info,  Node: Performance,  Next: C++,  Prev: Options,  Up: Top
   2057 
   2058 Performance considerations
   2059 ==========================
   2060 
   2061    The main design goal of `flex' is that it generate high-performance
   2062 scanners.  It has been optimized for dealing well with large sets of
   2063 rules.  Aside from the effects on scanner speed of the table
   2064 compression `-C' options outlined above, there are a number of
   2065 options/actions which degrade performance.  These are, from most
   2066 expensive to least:
   2067 
   2068      REJECT
   2069      %option yylineno
   2070      arbitrary trailing context
   2071      
   2072      pattern sets that require backing up
   2073      %array
   2074      %option interactive
   2075      %option always-interactive
   2076      
   2077      '^' beginning-of-line operator
   2078      yymore()
   2079 
   2080    with the first three all being quite expensive and the last two
   2081 being quite cheap.  Note also that `unput()' is implemented as a
   2082 routine call that potentially does quite a bit of work, while
   2083 `yyless()' is a quite-cheap macro; so if you are just putting back
   2084 some excess text you scanned, use `yyless()'.
   2085 
   2086    `REJECT' should be avoided at all costs when performance is
   2087 important.  It is a particularly expensive option.
   2088 
   2089    Getting rid of backing up is messy and often may be an enormous
   2090 amount of work for a complicated scanner.  In principle, one begins by
   2091 using the `-b' flag to generate a `lex.backup' file.  For example, on
   2092 the input
   2093 
   2094      %%
   2095      foo        return TOK_KEYWORD;
   2096      foobar     return TOK_KEYWORD;
   2097 
   2098 the file looks like:
   2099 
   2100      State #6 is non-accepting -
   2101       associated rule line numbers:
   2102             2       3
   2103       out-transitions: [ o ]
   2104       jam-transitions: EOF [ \001-n  p-\177 ]
   2105      
   2106      State #8 is non-accepting -
   2107       associated rule line numbers:
   2108             3
   2109       out-transitions: [ a ]
   2110       jam-transitions: EOF [ \001-`  b-\177 ]
   2111      
   2112      State #9 is non-accepting -
   2113       associated rule line numbers:
   2114             3
   2115       out-transitions: [ r ]
   2116       jam-transitions: EOF [ \001-q  s-\177 ]
   2117      
   2118      Compressed tables always back up.
   2119 
   2120    The first few lines tell us that there's a scanner state in which it
   2121 can make a transition on an 'o' but not on any other character, and
   2122 that in that state the currently scanned text does not match any rule.
   2123 The state occurs when trying to match the rules found at lines 2 and 3
   2124 in the input file.  If the scanner is in that state and then reads
   2125 something other than an 'o', it will have to back up to find a rule
   2126 which is matched.  With a bit of head-scratching one can see that this
   2127 must be the state it's in when it has seen "fo".  When this has
   2128 happened, if anything other than another 'o' is seen, the scanner will
   2129 have to back up to simply match the 'f' (by the default rule).
   2130 
   2131    The comment regarding State #8 indicates there's a problem when
   2132 "foob" has been scanned.  Indeed, on any character other than an 'a',
   2133 the scanner will have to back up to accept "foo".  Similarly, the
   2134 comment for State #9 concerns when "fooba" has been scanned and an 'r'
   2135 does not follow.
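
The backing up described above can be made concrete with a small
hand-written matcher.  The following is a minimal sketch in C, not the
code that `flex' generates: `scan_token()' and the `TOK_*' constants are
hypothetical names, and the matcher knows only the rules "foo" and
"foobar" plus the default single-character rule.  It records how many
characters were consumed and the length of the last accepted match;
whenever the first exceeds the second, the scanner has had to back up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes, for illustration only. */
enum { TOK_DEFAULT = 0, TOK_FOO = 1, TOK_FOOBAR = 2 };

/*
 * Matches one token at the start of input using the rules "foo" and
 * "foobar" plus the default rule (any single character).  Stores the
 * matched length in *match_len and sets *backed_up to 1 if more
 * characters were consumed than were finally matched.
 */
int scan_token(const char *input, size_t *match_len, int *backed_up)
{
    static const char keyword[] = "foobar";
    const size_t keyword_len = sizeof keyword - 1;
    size_t consumed = 1;             /* the first character is always read */
    size_t last_accept_len = 1;      /* ...and accepted by the default rule */
    int last_accept_tok = TOK_DEFAULT;

    assert(input != NULL && input[0] != '\0');

    if (input[0] == 'f') {
        /* Follow the keyword path while the input agrees with it. */
        while (consumed < keyword_len && input[consumed] == keyword[consumed]) {
            ++consumed;
            if (consumed == 3) {             /* "foo" is an accepting state */
                last_accept_len = 3;
                last_accept_tok = TOK_FOO;
            } else if (consumed == 6) {      /* so is "foobar" */
                last_accept_len = 6;
                last_accept_tok = TOK_FOOBAR;
            }
        }
    }

    /* Stopping in a non-accepting state ("fo", "foob", "fooba") means
       returning to the last accepting position. */
    *backed_up = consumed > last_accept_len;
    *match_len = last_accept_len;
    return last_accept_tok;
}
```

On the input "fox" the sketch consumes "fo", fails on the 'x', and
falls back to the single character 'f', which is the situation reported
for State #6.  On "foobaz" it falls back from "fooba" to "foo", the
situation reported for States #8 and #9.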
   2136 
   2137    The final comment reminds us that there's no point going to all the
   2138 trouble of removing backing up from the rules unless we're using `-Cf'
   2139 or `-CF', since there's no performance gain doing so with compressed
   2140 scanners.
   2141 
   2142    The way to remove the backing up is to add "error" rules:
   2143 
   2144      %%
   2145      foo         return TOK_KEYWORD;
   2146      foobar      return TOK_KEYWORD;
   2147      
   2148      fooba       |
   2149      foob        |
   2150      fo          {
   2151                  /* false alarm, not really a keyword */
   2152                  return TOK_ID;
   2153                  }
   2154 
   2155    Eliminating backing up among a list of keywords can also be done
   2156 using a "catch-all" rule:
   2157 
   2158      %%
   2159      foo         return TOK_KEYWORD;
   2160      foobar      return TOK_KEYWORD;
   2161      
   2162      [a-z]+      return TOK_ID;
   2163 
   2164    This is usually the best solution when appropriate.
   2165 
   2166    Backing up messages tend to cascade.  With a complicated set of
   2167 rules it's not uncommon to get hundreds of messages.  If one can
   2168 decipher them, though, it often only takes a dozen or so rules to
   2169 eliminate the backing up (though it's easy to make a mistake and have
   2170 an error rule accidentally match a valid token.  A possible future
   2171 `flex' feature will be to automatically add rules to eliminate backing
   2172 up).
   2173 
   2174    It's important to keep in mind that you gain the benefits of
   2175 eliminating backing up only if you eliminate *every* instance of
   2176 backing up.  Leaving just one means you gain nothing.
   2177 
   2178    VARIABLE trailing context (where both the leading and trailing parts
   2179 do not have a fixed length) entails almost the same performance loss as
   2180 `REJECT' (i.e., substantial).  So when possible a rule like:
   2181 
   2182      %%
   2183      mouse|rat/(cat|dog)   run();
   2184 
   2185 is better written:
   2186 
   2187      %%
   2188      mouse/cat|dog         run();
   2189      rat/cat|dog           run();
   2190 
   2191 or as
   2192 
   2193      %%
   2194      mouse|rat/cat         run();
   2195      mouse|rat/dog         run();
   2196 
   2197    Note that here the special '|' action does *not* provide any
   2198 savings, and can even make things worse (see Deficiencies / Bugs below).
   2199 
   2200    Another area where the user can increase a scanner's performance
   2201 (and one that's easier to implement) arises from the fact that the
   2202 longer the tokens matched, the faster the scanner will run.  This is
   2203 because with long tokens the processing of most input characters takes
   2204 place in the (short) inner scanning loop, and does not often have to go
   2205 through the additional work of setting up the scanning environment
   2206 (e.g., `yytext') for the action.  Recall the scanner for C comments:
   2207 
   2208      %x comment
   2209      %%
   2210              int line_num = 1;
   2211      
   2212      "/*"         BEGIN(comment);
   2213      
   2214      <comment>[^*\n]*
   2215      <comment>"*"+[^*/\n]*
   2216      <comment>\n             ++line_num;
   2217      <comment>"*"+"/"        BEGIN(INITIAL);
   2218 
   2219    This could be sped up by writing it as:
   2220 
   2221      %x comment
   2222      %%
   2223              int line_num = 1;
   2224      
   2225      "/*"         BEGIN(comment);
   2226      
   2227      <comment>[^*\n]*
   2228      <comment>[^*\n]*\n      ++line_num;
   2229      <comment>"*"+[^*/\n]*
   2230      <comment>"*"+[^*/\n]*\n ++line_num;
   2231      <comment>"*"+"/"        BEGIN(INITIAL);
   2232 
   2233    Now instead of each newline requiring the processing of another
   2234 action, recognizing the newlines is "distributed" over the other rules
   2235 to keep the matched text as long as possible.  Note that *adding* rules
   2236 does *not* slow down the scanner!  The speed of the scanner is
   2237 independent of the number of rules or (modulo the considerations given
   2238 at the beginning of this section) how complicated the rules are with
   2239 regard to operators such as '*' and '|'.
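
The saving from distributing the newlines can be estimated by counting
how many actions run for a given comment.  The following C sketch is a
hand-written model of the two rule sets above, not `flex' output;
`count_comment_actions()' and its `merge_newlines' parameter are
hypothetical names introduced for illustration.  It walks the text that
follows the opening of a comment and counts one action per match,
including the final match that closes the comment.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Counts the rule actions executed while scanning a comment body, up to
 * and including the closing star-slash.  merge_newlines == 0 models the
 * first rule set (newline is its own rule); merge_newlines != 0 models
 * the second (a trailing newline is folded into the preceding match).
 */
int count_comment_actions(const char *body, int merge_newlines)
{
    size_t p = 0;
    int actions = 0;

    assert(body != NULL);
    while (body[p] != '\0') {
        ++actions;                        /* each match runs one action */
        if (body[p] == '\n') {            /* a newline with nothing before it */
            ++p;
            continue;
        }
        if (body[p] == '*') {
            while (body[p] == '*')        /* one or more stars */
                ++p;
            if (body[p] == '/')           /* stars then slash: comment ends */
                return actions;
            while (body[p] != '\0' && body[p] != '*' &&
                   body[p] != '/' && body[p] != '\n')
                ++p;                      /* text up to a star, slash or newline */
        } else {
            while (body[p] != '\0' && body[p] != '*' && body[p] != '\n')
                ++p;                      /* text up to a star or newline */
        }
        if (merge_newlines && body[p] == '\n')
            ++p;                          /* second rule set: newline joins the match */
    }
    return actions;
}
```

For the body "line one\nline two\n*/" the model counts five actions
with the first rule set and three with the second, since each line and
its newline are then matched together.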
   2240 
   2241    A final example in speeding up a scanner: suppose you want to scan
   2242 through a file containing identifiers and keywords, one per line and
   2243 with no other extraneous characters, and recognize all the keywords.  A
   2244 natural first approach is:
   2245 
   2246      %%
   2247      asm      |
   2248      auto     |
   2249      break    |
   2250      ... etc ...
   2251      volatile |
   2252      while    /* it's a keyword */
   2253      
   2254      .|\n     /* it's not a keyword */
   2255 
   2256    To eliminate the backing up, introduce a catch-all rule:
   2257 
   2258      %%
   2259      asm      |
   2260      auto     |
   2261      break    |
   2262      ... etc ...
   2263      volatile |
   2264      while    /* it's a keyword */
   2265      
   2266      [a-z]+   |
   2267      .|\n     /* it's not a keyword */
   2268 
   2269    Now, if it's guaranteed that there's exactly one word per line, then
   2270 we can reduce the total number of matches by a half by merging in the
   2271 recognition of newlines with that of the other tokens:
   2272 
   2273      %%
   2274      asm\n    |
   2275      auto\n   |
   2276      break\n  |
   2277      ... etc ...
   2278      volatile\n |
   2279      while\n  /* it's a keyword */
   2280      
   2281      [a-z]+\n |
   2282      .|\n     /* it's not a keyword */
   2283 
   2284    One has to be careful here, as we have now reintroduced backing up
   2285 into the scanner.  In particular, while *we* know that there will never
   2286 be any characters in the input stream other than letters or newlines,
   2287 `flex' can't figure this out, and it will plan for possibly needing to
   2288 back up when it has scanned a token like "auto" and then the next
   2289 character is something other than a newline or a letter.  Previously it
   2290 would then just match the "auto" rule and be done, but now it has no
   2291 "auto" rule, only a "auto\n" rule.  To eliminate the possibility of
   2292 backing up, we could either duplicate all rules but without final
   2293 newlines, or, since we never expect to encounter such an input and
   2294 therefore don't care how it's classified, we can introduce one more
   2295 catch-all rule, this one which doesn't include a newline:
   2296 
   2297      %%
   2298      asm\n    |
   2299      auto\n   |
   2300      break\n  |
   2301      ... etc ...
   2302      volatile\n |
   2303      while\n  /* it's a keyword */
   2304      
   2305      [a-z]+\n |
   2306      [a-z]+   |
   2307      .|\n     /* it's not a keyword */
   2308 
   2309    Compiled with `-Cf', this is about as fast as one can get a `flex'
   2310 scanner to go for this particular problem.
   2311 
   2312    A final note: `flex' is slow when matching NUL's, particularly when
   2313 a token contains multiple NUL's.  It's best to write rules which match
   2314 *short* amounts of text if it's anticipated that the text will often
   2315 include NUL's.
   2316 
   2317    Another final note regarding performance: as mentioned above in the
   2318 section How the Input is Matched, dynamically resizing `yytext' to
   2319 accommodate huge tokens is a slow process because it presently requires
   2320 that the (huge) token be rescanned from the beginning.  Thus if
   2321 performance is vital, you should attempt to match "large" quantities of
   2322 text but not "huge" quantities, where the cutoff between the two is at
   2323 about 8K characters/token.
   2324 
   2325 
   2326 File: flex.info,  Node: C++,  Next: Incompatibilities,  Prev: Performance,  Up: Top
   2327 
   2328 Generating C++ scanners
   2329 =======================
   2330 
   2331    `flex' provides two different ways to generate scanners for use with
   2332 C++.  The first way is to simply compile a scanner generated by `flex'
   2333 using a C++ compiler instead of a C compiler.  You should not encounter
   2334 any compilation errors (please report any you find to the email address
   2335 given in the Author section below).  You can then use C++ code in your
   2336 rule actions instead of C code.  Note that the default input source for
   2337 your scanner remains `yyin', and default echoing is still done to
   2338 `yyout'.  Both of these remain `FILE *' variables and not C++ `streams'.
   2339 
   2340    You can also use `flex' to generate a C++ scanner class, using the
   2341 `-+' option (or, equivalently, `%option c++'), which is automatically
   2342 specified if the name of the flex executable ends in a `+', such as
   2343 `flex++'.  When using this option, flex defaults to generating the
   2344 scanner to the file `lex.yy.cc' instead of `lex.yy.c'.  The generated
   2345 scanner includes the header file `FlexLexer.h', which defines the
   2346 interface to two C++ classes.
   2347 
   2348    The first class, `FlexLexer', provides an abstract base class
   2349 defining the general scanner class interface.  It provides the
   2350 following member functions:
   2351 
   2352 `const char* YYText()'
   2353      returns the text of the most recently matched token, the
   2354      equivalent of `yytext'.
   2355 
   2356 `int YYLeng()'
   2357      returns the length of the most recently matched token, the
   2358      equivalent of `yyleng'.
   2359 
   2360 `int lineno() const'
   2361      returns the current input line number (see `%option yylineno'), or
   2362      1 if `%option yylineno' was not used.
   2363 
   2364 `void set_debug( int flag )'
   2365      sets the debugging flag for the scanner, equivalent to assigning to
   2366      `yy_flex_debug' (see the Options section above).  Note that you
   2367      must build the scanner using `%option debug' to include debugging
   2368      information in it.
   2369 
   2370 `int debug() const'
   2371      returns the current setting of the debugging flag.
   2372 
   2373    Also provided are member functions equivalent to
   2374 `yy_switch_to_buffer()', `yy_create_buffer()' (though the first argument
   2375 is an `istream*' object pointer and not a `FILE*'), `yy_flush_buffer()',
   2376 `yy_delete_buffer()', and `yyrestart()' (again, the first argument is an
   2377 `istream*' object pointer).
   2378 
   2379    The second class defined in `FlexLexer.h' is `yyFlexLexer', which is
   2380 derived from `FlexLexer'.  It defines the following additional member
   2381 functions:
   2382 
   2383 `yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )'
   2384      constructs a `yyFlexLexer' object using the given streams for
   2385      input and output.  If not specified, the streams default to `cin'
   2386      and `cout', respectively.
   2387 
   2388 `virtual int yylex()'
   2389      performs the same role as `yylex()' does for ordinary flex
   2390      scanners: it scans the input stream, consuming tokens, until a
   2391      rule's action returns a value.  If you derive a subclass S from
   2392      `yyFlexLexer' and want to access the member functions and
   2393      variables of S inside `yylex()', then you need to use `%option
   2394      yyclass="S"' to inform `flex' that you will be using that subclass
   2395      instead of `yyFlexLexer'.  In this case, rather than generating
   2396      `yyFlexLexer::yylex()', `flex' generates `S::yylex()' (and also
   2397      generates a dummy `yyFlexLexer::yylex()' that calls
   2398      `yyFlexLexer::LexerError()' if called).
   2399 
   2400 `virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)'
   2401      reassigns `yyin' to `new_in' (if non-nil) and `yyout' to `new_out'
   2402      (ditto), deleting the previous input buffer if `yyin' is
   2403      reassigned.
   2404 
   2405 `int yylex( istream* new_in = 0, ostream* new_out = 0 )'
   2406      first switches the input streams via `switch_streams( new_in,
   2407      new_out )' and then returns the value of `yylex()'.
   2408 
   2409    In addition, `yyFlexLexer' defines the following protected virtual
   2410 functions which you can redefine in derived classes to tailor the
   2411 scanner:
   2412 
   2413 `virtual int LexerInput( char* buf, int max_size )'
   2414      reads up to `max_size' characters into BUF and returns the number
   2415      of characters read.  To indicate end-of-input, return 0
   2416      characters.  Note that "interactive" scanners (see the `-B' and
   2417      `-I' flags) define the macro `YY_INTERACTIVE'.  If you redefine
   2418      `LexerInput()' and need to take different actions depending on
   2419      whether or not the scanner might be scanning an interactive input
   2420      source, you can test for the presence of this name via `#ifdef'.
   2421 
   2422 `virtual void LexerOutput( const char* buf, int size )'
   2423      writes out SIZE characters from the buffer BUF, which, while
   2424      NUL-terminated, may also contain "internal" NUL's if the scanner's
   2425      rules can match text with NUL's in them.
   2426 
   2427 `virtual void LexerError( const char* msg )'
   2428      reports a fatal error message.  The default version of this
   2429      function writes the message to the stream `cerr' and exits.
   2430 
   2431    Note that a `yyFlexLexer' object contains its *entire* scanning
   2432 state.  Thus you can use such objects to create reentrant scanners.
   2433 You can instantiate multiple instances of the same `yyFlexLexer' class,
   2434 and you can also combine multiple C++ scanner classes together in the
   2435 same program using the `-P' option discussed above.  Finally, note that
   2436 the `%array' feature is not available to C++ scanner classes; you must
   2437 use `%pointer' (the default).
   2438 
   2439    Here is an example of a simple C++ scanner:
   2440 
   2441          // An example of using the flex C++ scanner class.
   2442      
   2443      %{
   2444      int mylineno = 0;
   2445      %}
   2446      
   2447      string  \"[^\n"]+\"
   2448      
   2449      ws      [ \t]+
   2450      
   2451      alpha   [A-Za-z]
   2452      dig     [0-9]
   2453      name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
   2454      num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
   2455      num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
   2456      number  {num1}|{num2}
   2457      
   2458      %%
   2459      
   2460      {ws}    /* skip blanks and tabs */
   2461      
   2462      "/*"    {
   2463              int c;
   2464      
   2465              while((c = yyinput()) != 0)
   2466                  {
   2467                  if(c == '\n')
   2468                      ++mylineno;
   2469      
   2470                  else if(c == '*')
   2471                      {
   2472                      if((c = yyinput()) == '/')
   2473                          break;
   2474                      else
   2475                          unput(c);
   2476                      }
   2477                  }
   2478              }
   2479      
   2480      {number}  cout << "number " << YYText() << '\n';
   2481      
   2482      \n        mylineno++;
   2483      
   2484      {name}    cout << "name " << YYText() << '\n';
   2485      
   2486      {string}  cout << "string " << YYText() << '\n';
   2487      
   2488      %%
   2489      
   2492      int main( int /* argc */, char** /* argv */ )
   2493          {
   2494          FlexLexer* lexer = new yyFlexLexer;
   2495          while(lexer->yylex() != 0)
   2496              ;
   2497          return 0;
   2498          }
   2499 
   2500    If you want to create multiple (different) lexer classes, you use
   2501 the `-P' flag (or the `prefix=' option) to rename each `yyFlexLexer' to
   2502 some other `xxFlexLexer'.  You then can include `<FlexLexer.h>' in your
   2503 other sources once per lexer class, first renaming `yyFlexLexer' as
   2504 follows:
   2505 
   2506      #undef yyFlexLexer
   2507      #define yyFlexLexer xxFlexLexer
   2508      #include <FlexLexer.h>
   2509      
   2510      #undef yyFlexLexer
   2511      #define yyFlexLexer zzFlexLexer
   2512      #include <FlexLexer.h>
   2513 
   2514    if, for example, you used `%option prefix="xx"' for one of your
   2515 scanners and `%option prefix="zz"' for the other.
   2516 
   2517    IMPORTANT: the present form of the scanning class is *experimental*
   2518 and may change considerably between major releases.
   2519 
   2520 
   2521 File: flex.info,  Node: Incompatibilities,  Next: Diagnostics,  Prev: C++,  Up: Top
   2522 
   2523 Incompatibilities with `lex' and POSIX
   2524 ======================================
   2525 
   2526    `flex' is a rewrite of the AT&T Unix `lex' tool (the two
   2527 implementations do not share any code, though), with some extensions
   2528 and incompatibilities, both of which are of concern to those who wish
   2529 to write scanners acceptable to either implementation.  Flex is fully
   2530 compliant with the POSIX `lex' specification, except that when using
   2531 `%pointer' (the default), a call to `unput()' destroys the contents of
   2532 `yytext', which is counter to the POSIX specification.
   2533 
   2534    In this section we discuss all of the known areas of incompatibility
   2535 between flex, AT&T lex, and the POSIX specification.
   2536 
   2537    `flex's' `-l' option turns on maximum compatibility with the
   2538 original AT&T `lex' implementation, at the cost of a major loss in the
   2539 generated scanner's performance.  We note below which incompatibilities
   2540 can be overcome using the `-l' option.
   2541 
   2542    `flex' is fully compatible with `lex' with the following exceptions:
   2543 
   2544    - The undocumented `lex' scanner internal variable `yylineno' is not
   2545      supported unless `-l' or `%option yylineno' is used.  `yylineno'
   2546      should be maintained on a per-buffer basis, rather than a
   2547      per-scanner (single global variable) basis.  `yylineno' is not
   2548      part of the POSIX specification.
   2549 
   2550    - The `input()' routine is not redefinable, though it may be called
   2551      to read characters following whatever has been matched by a rule.
   2552      If `input()' encounters an end-of-file the normal `yywrap()'
   2553      processing is done.  A "real" end-of-file is returned by `input()'
   2554      as `EOF'.
   2555 
   2556      Input is instead controlled by defining the `YY_INPUT' macro.
   2557 
   2558      The `flex' restriction that `input()' cannot be redefined is in
   2559      accordance with the POSIX specification, which simply does not
   2560      specify any way of controlling the scanner's input other than by
   2561      making an initial assignment to `yyin'.
   2562 
   2563    - The `unput()' routine is not redefinable.  This restriction is in
   2564      accordance with POSIX.
   2565 
   2566    - `flex' scanners are not as reentrant as `lex' scanners.  In
   2567      particular, if you have an interactive scanner and an interrupt
   2568      handler which long-jumps out of the scanner, and the scanner is
   2569      subsequently called again, you may get the following message:
   2570 
   2571           fatal flex scanner internal error--end of buffer missed
   2572 
   2573      To reenter the scanner, first use
   2574 
   2575           yyrestart( yyin );
   2576 
   2577      Note that this call will throw away any buffered input; usually
   2578      this isn't a problem with an interactive scanner.
   2579 
   2580      Also note that flex C++ scanner classes *are* reentrant, so if
   2581      using C++ is an option for you, you should use them instead.  See
   2582      "Generating C++ Scanners" above for details.
   2583 
   2584    - `output()' is not supported.  Output from the `ECHO' macro is done
   2585      to the file-pointer `yyout' (default `stdout').
   2586 
   2587      `output()' is not part of the POSIX specification.
   2588 
   2589    - `lex' does not support exclusive start conditions (%x), though
   2590      they are in the POSIX specification.
   2591 
   2592    - When definitions are expanded, `flex' encloses them in
   2593      parentheses.  With lex, the following:
   2594 
   2595           NAME    [A-Z][A-Z0-9]*
   2596           %%
   2597           foo{NAME}?      printf( "Found it\n" );
   2598           %%
   2599 
   2600      will not match the string "foo" because when the macro is expanded
   2601      the rule is equivalent to "foo[A-Z][A-Z0-9]*?" and the precedence
   2602      is such that the '?' is associated with "[A-Z0-9]*".  With `flex',
   2603      the rule will be expanded to "foo([A-Z][A-Z0-9]*)?" and so the
   2604      string "foo" will match.
   2605 
   2606      Note that if the definition begins with `^' or ends with `$' then
   2607      it is *not* expanded with parentheses, to allow these operators to
   2608      appear in definitions without losing their special meanings.  But
   2609      the `<s>', `/', and `<<EOF>>' operators cannot be used in a `flex'
   2610      definition.
   2611 
   2612      Using `-l' results in the `lex' behavior of no parentheses around
   2613      the definition.
   2614 
   2615      The POSIX specification is that the definition be enclosed in
   2616      parentheses.
   2617 
   2618    - Some implementations of `lex' allow a rule's action to begin on a
   2619      separate line, if the rule's pattern has trailing whitespace:
   2620 
   2621           %%
   2622           foo|bar<space here>
   2623             { foobar_action(); }
   2624 
   2625      `flex' does not support this feature.
   2626 
   2627    - The `lex' `%r' (generate a Ratfor scanner) option is not
   2628      supported.  It is not part of the POSIX specification.
   2629 
   2630    - After a call to `unput()', `yytext' is undefined until the next
   2631      token is matched, unless the scanner was built using `%array'.
   2632      This is not the case with `lex' or the POSIX specification.  The
   2633      `-l' option does away with this incompatibility.
   2634 
   2635    - The precedence of the `{}' (numeric range) operator is different.
   2636      `lex' interprets "abc{1,3}" as "match one, two, or three
   2637      occurrences of 'abc'", whereas `flex' interprets it as "match 'ab'
   2638      followed by one, two, or three occurrences of 'c'".  The latter is
   2639      in agreement with the POSIX specification.
   2640 
   2641    - The precedence of the `^' operator is different.  `lex' interprets
   2642      "^foo|bar" as "match either 'foo' at the beginning of a line, or
   2643      'bar' anywhere", whereas `flex' interprets it as "match either
   2644      'foo' or 'bar' if they come at the beginning of a line".  The
   2645      latter is in agreement with the POSIX specification.
   2646 
   2647    - The special table-size declarations such as `%a' supported by
   2648      `lex' are not required by `flex' scanners; `flex' ignores them.
   2649 
   2650    - The name FLEX_SCANNER is #define'd so scanners may be written for
   2651      use with either `flex' or `lex'.  Scanners also include
   2652      `YY_FLEX_MAJOR_VERSION' and `YY_FLEX_MINOR_VERSION' indicating
   2653      which version of `flex' generated the scanner (for example, for the
   2654      2.5 release, these defines would be 2 and 5 respectively).
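
The macros named in the last item can be tested with the preprocessor.
The following C sketch assumes it is compiled as part of a generated
scanner (for example, in the user code section of the scanner
specification), where those macros are visible; compiled on its own it
simply takes the `#else' branch.  The function `describe_generator()'
is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Writes a short description of the scanner generator into buf and
 * returns the value of snprintf().  FLEX_SCANNER, YY_FLEX_MAJOR_VERSION
 * and YY_FLEX_MINOR_VERSION are only defined inside a scanner that
 * flex generated.
 */
int describe_generator(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
#ifdef FLEX_SCANNER
    return snprintf(buf, size, "flex %d.%d",
                    YY_FLEX_MAJOR_VERSION, YY_FLEX_MINOR_VERSION);
#else
    return snprintf(buf, size, "not generated by flex");
#endif
}
```

The same `#ifdef FLEX_SCANNER' test can guard any code that relies on a
`flex'-only feature in a specification that must also be accepted by
`lex'.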
   2655 
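The two precedence differences in the list above (parentheses around
expanded definitions, and the scope of the `{}' operator) can be checked
by writing out each reading as an explicit pattern.  The following C
sketch uses the POSIX `regcomp()' and `regexec()' functions rather than
`lex' or `flex' themselves, so it only demonstrates which strings the
two readings accept.  The helper `matches_whole()' and the macro names
are hypothetical, and a system providing `<regex.h>' is assumed.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/*
 * Returns 1 if the whole of text matches the POSIX extended regular
 * expression in pattern, 0 if it does not, and -1 if the pattern is
 * too long or fails to compile.
 */
int matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    assert(pattern != NULL && text != NULL);
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* The rule foo{NAME}? with NAME defined as [A-Z][A-Z0-9]* : */
#define LEX_EXPANSION  "foo[A-Z]([A-Z0-9]*)?"  /* '?' applies to the last item */
#define FLEX_EXPANSION "foo([A-Z][A-Z0-9]*)?"  /* definition parenthesized */

/* The rule abc{1,3} : */
#define LEX_RANGE  "(abc){1,3}"                /* range applies to "abc" */
#define FLEX_RANGE "abc{1,3}"                  /* range applies to 'c' only */
```

With these definitions, `matches_whole(LEX_EXPANSION, "foo")' reports no
match while `matches_whole(FLEX_EXPANSION, "foo")' reports a match.
Likewise "abcabc" is accepted only by `LEX_RANGE' and "abccc" only by
`FLEX_RANGE'.
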
   2656    The following `flex' features are not included in `lex' or the POSIX
   2657 specification:
   2658 
   2659      C++ scanners
   2660      %option
   2661      start condition scopes
   2662      start condition stacks
   2663      interactive/non-interactive scanners
   2664      yy_scan_string() and friends
   2665      yyterminate()
   2666      yy_set_interactive()
   2667      yy_set_bol()
   2668      YY_AT_BOL()
   2669      <<EOF>>
   2670      <*>
   2671      YY_DECL
   2672      YY_START
   2673      YY_USER_ACTION
   2674      YY_USER_INIT
   2675      #line directives
   2676      %{}'s around actions
   2677      multiple actions on a line
   2678 
   2679 plus almost all of the flex flags.  The last feature in the list refers
   2680 to the fact that with `flex' you can put multiple actions on the same
   2681 line, separated with semicolons, while with `lex', the following
   2682 
   2683      foo    handle_foo(); ++num_foos_seen;
   2684 
   2685 is (rather surprisingly) truncated to
   2686 
   2687      foo    handle_foo();
   2688 
   2689    `flex' does not truncate the action.  Actions that are not enclosed
   2690 in braces are simply terminated at the end of the line.
   2691 
   2692 
   2693 File: flex.info,  Node: Diagnostics,  Next: Files,  Prev: Incompatibilities,  Up: Top
   2694 
   2695 Diagnostics
   2696 ===========
   2697 
   2698 `warning, rule cannot be matched'
   2699      indicates that the given rule cannot be matched because it follows
   2700      other rules that will always match the same text as it.  For
   2701      example, in the following "foo" cannot be matched because it comes
   2702      after an identifier "catch-all" rule:
   2703 
   2704           [a-z]+    got_identifier();
   2705           foo       got_foo();
   2706 
   2707      Using `REJECT' in a scanner suppresses this warning.
   2708 
   2709 `warning, -s option given but default rule can be matched'
   2710      means that it is possible (perhaps only in a particular start
   2711      condition) that the default rule (match any single character) is
   2712      the only one that will match a particular input.  Since `-s' was
   2713      given, presumably this is not intended.
   2714 
   2715 `reject_used_but_not_detected undefined'
   2716 `yymore_used_but_not_detected undefined'
   2717      These errors can occur at compile time.  They indicate that the
   2718      scanner uses `REJECT' or `yymore()' but that `flex' failed to
   2719      notice the fact, meaning that `flex' scanned the first two sections
   2720      looking for occurrences of these actions and failed to find any,
   2721      but somehow you snuck some in (via a #include file, for example).
   2722      Use `%option reject' or `%option yymore' to indicate to flex that
   2723      you really do use these features.
   2724 
   2725 `flex scanner jammed'
   2726      a scanner compiled with `-s' has encountered an input string which
   2727      wasn't matched by any of its rules.  This error can also occur due
   2728      to internal problems.
   2729 
   2730 `token too large, exceeds YYLMAX'
   2731      your scanner uses `%array' and one of its rules matched a string
   2732      longer than the `YYLMAX' constant (8K bytes by default).  You
   2733      can increase the value by #define'ing `YYLMAX' in the definitions
   2734      section of your `flex' input.
   2735 
   2736 `scanner requires -8 flag to use the character 'X''
   2737      Your scanner specification includes recognizing the 8-bit
   2738      character X and you did not specify the -8 flag, and your scanner
   2739      defaulted to 7-bit because you used the `-Cf' or `-CF' table
   2740      compression options.  See the discussion of the `-7' flag for
   2741      details.
   2742 
   2743 `flex scanner push-back overflow'
   2744      you used `unput()' to push back so much text that the scanner's
   2745      buffer could not hold both the pushed-back text and the current
   2746      token in `yytext'.  Ideally the scanner should dynamically resize
   2747      the buffer in this case, but at present it does not.
   2748 
   2749 `input buffer overflow, can't enlarge buffer because scanner uses REJECT'
   2750      the scanner was working on matching an extremely large token and
   2751      needed to expand the input buffer.  This doesn't work with
   2752      scanners that use `REJECT'.
   2753 
   2754 `fatal flex scanner internal error--end of buffer missed'
   2755      This can occur in a scanner which is reentered after a long-jump
   2756      has jumped out of (or over) the scanner's activation frame.  Before
   2757      reentering the scanner, use:
   2758 
   2759           yyrestart( yyin );
   2760 
   2761      or, as noted above, switch to using the C++ scanner class.
   2762 
   2763 `too many start conditions in <> construct!'
   2764      you listed more start conditions in a <> construct than exist (so
   2765      you must have listed at least one of them twice).
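
The `rule cannot be matched' warning described at the start of this
section follows from how a scanner chooses among rules: the longest
match wins, and when two rules match text of the same length, the rule
listed first wins.  The following C sketch models that choice for the
two rules in the example; it is an illustration, not the algorithm
`flex' uses internally, and `choose_rule()' and the two matcher
functions are hypothetical names.  ASCII letters are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the longest prefix of s matching [a-z]+ (0 if none). */
static size_t match_identifier(const char *s)
{
    size_t n = 0;

    while (s[n] >= 'a' && s[n] <= 'z')
        ++n;
    return n;
}

/* Length of the prefix of s matching the literal "foo" (0 if none). */
static size_t match_foo(const char *s)
{
    return strncmp(s, "foo", 3) == 0 ? 3 : 0;
}

/*
 * Returns the index of the rule chosen for the start of s (0 for the
 * identifier rule, 1 for the "foo" rule), or -1 if neither matches.
 * The matched length is stored in *match_len.
 */
int choose_rule(const char *s, size_t *match_len)
{
    size_t (*const rules[])(const char *) = { match_identifier, match_foo };
    size_t best_len = 0;
    int best_rule = -1;
    size_t i;

    assert(s != NULL && match_len != NULL);
    for (i = 0; i < sizeof rules / sizeof rules[0]; ++i) {
        size_t len = rules[i](s);

        if (len > best_len) {    /* strict '>' keeps the earlier rule on a tie */
            best_len = len;
            best_rule = (int)i;
        }
    }
    *match_len = best_len;
    return best_rule;
}
```

Whatever the "foo" rule matches, the identifier rule matches at least
as much text and is listed first, so `choose_rule()' never returns 1.
Listing the "foo" rule first would let it win the tie on the input
"foo".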
   2766 
   2767 
   2768 File: flex.info,  Node: Files,  Next: Deficiencies,  Prev: Diagnostics,  Up: Top
   2769 
   2770 Files
   2771 =====
   2772 
   2773 `-lfl'
   2774      library with which scanners must be linked.
   2775 
   2776 `lex.yy.c'
   2777      generated scanner (called `lexyy.c' on some systems).
   2778 
   2779 `lex.yy.cc'
   2780      generated C++ scanner class, when using `-+'.
   2781 
   2782 `<FlexLexer.h>'
   2783      header file defining the C++ scanner base class, `FlexLexer', and
   2784      its derived class, `yyFlexLexer'.
   2785 
   2786 `flex.skl'
   2787      skeleton scanner.  This file is only used when building flex, not
   2788      when flex executes.
   2789 
   2790 `lex.backup'
   2791      backing-up information for `-b' flag (called `lex.bck' on some
   2792      systems).
   2793 
   2794 
   2795 File: flex.info,  Node: Deficiencies,  Next: See also,  Prev: Files,  Up: Top
   2796 
   2797 Deficiencies / Bugs
   2798 ===================
   2799 
   2800    Some trailing context patterns cannot be properly matched and
   2801 generate warning messages ("dangerous trailing context").  These are
   2802 patterns where the ending of the first part of the rule matches the
   2803 beginning of the second part, such as "zx*/xy*", where the 'x*' matches
   2804 the 'x' at the beginning of the trailing context.  (Note that the POSIX
   2805 draft states that the text matched by such patterns is undefined.)
   2806 
   2807    For some trailing context rules, parts which are actually
   2808 fixed-length are not recognized as such, leading to the abovementioned
   2809 performance loss.  In particular, parts using '|' or {n} (such as
   2810 "foo{3}") are always considered variable-length.
   2811 
   2812    Combining trailing context with the special '|' action can result in
   2813 *fixed* trailing context being turned into the more expensive *variable*
   2814 trailing context.  For example, this happens in the following:
   2815 
   2816      %%
   2817      abc      |
   2818      xyz/def
   2819 
   2820    Use of `unput()' invalidates yytext and yyleng, unless the `%array'
   2821 directive or the `-l' option has been used.
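
   To illustrate why the token should be copied before pushing text
back, here is a toy model in C.  It is not taken from the flex sources:
`yytext', `yyleng' and `unput()' are stand-ins, and `buf', `next_char'
and `push_back_parenthesized()' are hypothetical names.  The only
assumption carried over from the real scanner is that each `unput()'
overwrites the token starting from its rightmost character.

```c
#include <stdlib.h>
#include <string.h>

/* Toy model, not flex internals: the token and the push-back area share
   one buffer, so each unput() overwrites the token from its right end. */
static char buf[16] = "....abc";
static char *yytext = buf + 4;      /* stand-in: current token "abc" */
static int yyleng = 3;              /* stand-in: its length */
static char *next_char = buf + 7;   /* hypothetical: next char to scan */

static void unput(int c) { *--next_char = (char)c; }

/* Pushes the current token back wrapped in parentheses and returns the
   text the scanner would see next, or NULL if the copy cannot be made. */
static const char *push_back_parenthesized(void)
{
    int i;
    char *copy = malloc((size_t)yyleng + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, yytext, (size_t)yyleng);   /* save the token first */
    copy[yyleng] = '\0';

    unput(')');                     /* this already clobbers yytext[2] */
    for (i = yyleng - 1; i >= 0; --i)
        unput(copy[i]);             /* read from the copy, not yytext */
    unput('(');

    free(copy);
    return next_char;
}
```

   Reading `yytext[i]' instead of `copy[i]' inside the loop would push
back characters that earlier calls to `unput()' had already overwritten.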
   2822 
   2823    Pattern-matching of NUL's is substantially slower than matching
   2824 other characters.
   2825 
   2826    Dynamic resizing of the input buffer is slow, as it entails
   2827 rescanning all the text matched so far by the current (generally huge)
   2828 token.
   2829 
   2830    Due to both buffering of input and read-ahead, you cannot intermix
   2831 calls to <stdio.h> routines, such as, for example, `getchar()', with
   2832 `flex' rules and expect it to work.  Call `input()' instead.
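
   The effect of read-ahead can be seen in the following sketch, which
uses plain <stdio.h> calls to imitate a scanner filling its buffer.  It
does not use flex at all; `char_seen_by_stdio()' and the four-byte
`block' buffer are hypothetical and chosen only to keep the example
small.

```c
#include <stdio.h>

/* Returns the character a direct stdio read sees after a scanner-style
   block read, or -1 if the temporary file cannot be set up. */
static int char_seen_by_stdio(void)
{
    char block[4];                  /* hypothetical, tiny scanner buffer */
    int c;
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    fputs("abcdef", f);
    rewind(f);

    /* The scanner fills its buffer with one block read, although the
       rule being matched may have consumed only the leading 'a'. */
    if (fread(block, 1, sizeof block, f) != sizeof block) {
        fclose(f);
        return -1;
    }

    /* A direct stdio call now skips "bcd", which sit unread in the
       scanner's buffer, and returns 'e'. */
    c = fgetc(f);
    fclose(f);
    return c;
}
```

   A call to `input()' inside an action would instead return the next
character from the scanner's own buffer.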
   2833 
   2834    The total number of table entries listed by the `-v' flag excludes
   2835 the entries needed to determine which rule has been matched.  The
   2836 number of these entries is equal to the number of DFA states if the
   2837 scanner does not use `REJECT', and somewhat greater than the number of
   2838 states if it does.
   2839 
   2840    `REJECT' cannot be used with the `-f' or `-F' options.
   2841 
   2842    The `flex' internal algorithms need documentation.
   2843 
   2844 
   2845 File: flex.info,  Node: See also,  Next: Author,  Prev: Deficiencies,  Up: Top
   2846 
   2847 See also
   2848 ========
   2849 
   2850    `lex'(1), `yacc'(1), `sed'(1), `awk'(1).
   2851 
   2852    John Levine, Tony Mason, and Doug Brown: Lex & Yacc; O'Reilly and
   2853 Associates.  Be sure to get the 2nd edition.
   2854 
   2855    M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator.
   2856 
   2857    Alfred Aho, Ravi Sethi and Jeffrey Ullman: Compilers: Principles,
   2858 Techniques and Tools; Addison-Wesley (1986).  Describes the
   2859 pattern-matching techniques used by `flex' (deterministic finite
   2860 automata).
   2861 
   2862 
   2863 File: flex.info,  Node: Author,  Prev: See also,  Up: Top
   2864 
   2865 Author
   2866 ======
   2867 
   2868    Vern Paxson, with the help of many ideas and much inspiration from
   2869 Van Jacobson.  Original version by Jef Poskanzer.  The fast table
   2870 representation is a partial implementation of a design done by Van
   2871 Jacobson.  The implementation was done by Kevin Gong and Vern Paxson.
   2872 
   2873    Thanks to the many `flex' beta-testers, feedbackers, and
   2874 contributors, especially Francois Pinard, Casey Leedom, Stan Adermann,
   2875 Terry Allen, David Barker-Plummer, John Basrai, Nelson H.F. Beebe,
   2876 `benson@odi.com', Karl Berry, Peter A. Bigot, Simon Blanchard, Keith
   2877 Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick Christopher, Brian
   2878 Clapper, J.T. Conklin, Jason Coughlin, Bill Cox, Nick Cropper, Dave
   2879 Curtis, Scott David Daniels, Chris G. Demetriou, Theo Deraadt, Mike
   2880 Donahue, Chuck Doucette, Tom Epperly, Leo Eskin, Chris Faylor, Chris
   2881 Flatters, Jon Forrest, Joe Gayda, Kaveh R. Ghazi, Eric Goldman,
   2882 Christopher M. Gould, Ulrich Grepel, Peer Griebel, Jan Hajic, Charles
   2883 Hemphill, NORO Hideo, Jarkko Hietaniemi, Scott Hofmann, Jeff Honig,
   2884 Dana Hudes, Eric Hughes, John Interrante, Ceriel Jacobs, Michal
   2885 Jaegermann, Sakari Jalovaara, Jeffrey R. Jones, Henry Juengst, Klaus
   2886 Kaempf, Jonathan I. Kamens, Terrence O Kane, Amir Katz,
   2887 `ken@ken.hilco.com', Kevin B. Kenny, Steve Kirsch, Winfried Koenig,
   2888 Marq Kole, Ronald Lamprecht, Greg Lee, Rohan Lenard, Craig Leres, John
   2889 Levine, Steve Liddle, Mike Long, Mohamed el Lozy, Brian Madsen, Malte,
   2890 Joe Marshall, Bengt Martensson, Chris Metcalf, Luke Mewburn, Jim
   2891 Meyering, R. Alexander Milowski, Erik Naggum, G.T. Nicol, Landon Noll,
   2892 James Nordby, Marc Nozell, Richard Ohnemus, Karsten Pahnke, Sven Panne,
   2893 Roland Pesch, Walter Pelissero, Gaumond Pierre, Esmond Pitt, Jef
   2894 Poskanzer, Joe Rahmeh, Jarmo Raiha, Frederic Raimbault, Pat Rankin,
   2895 Rick Richardson, Kevin Rodgers, Kai Uwe Rommel, Jim Roskind, Alberto
   2896 Santini, Andreas Scherer, Darrell Schiebel, Raf Schietekat, Doug
   2897 Schmidt, Philippe Schnoebelen, Andreas Schwab, Alex Siegel, Eckehard
   2898 Stolz, Jan-Erik Strömquist, Mike Stump, Paul Stuart, Dave Tallman, Ian
   2899 Lance Taylor, Chris Thewalt, Richard M. Timoney, Jodi Tsai, Paul
   2900 Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams, Ken
   2901 Yap, Ron Zellar, Nathan Zelle, David Zuhn, and those whose names have
   2902 slipped my marginal mail-archiving skills but whose contributions are
   2903 appreciated all the same.
   2904 
   2905    Thanks to Keith Bostic, Jon Forrest, Noah Friedman, John Gilmore,
   2906 Craig Leres, John Levine, Bob Mulcahy, G.T. Nicol, Francois Pinard,
   2907 Rich Salz, and Richard Stallman for help with various distribution
   2908 headaches.
   2909 
   2910    Thanks to Esmond Pitt and Earle Horton for 8-bit character support;
   2911 to Benson Margulies and Fred Burke for C++ support; to Kent Williams
   2912 and Tom Epperly for C++ class support; to Ove Ewerlid for support of
   2913 NUL's; and to Eric Hughes for support of multiple buffers.
   2914 
   2915    This work was primarily done when I was with the Real Time Systems
   2916 Group at the Lawrence Berkeley Laboratory in Berkeley, CA.  Many thanks
   2917 to all there for the support I received.
   2918 
   2919    Send comments to `vern@ee.lbl.gov'.
   2920 
   2921 
   2922 
   2923 Tag Table:
   2924 Node: Top1430
   2925 Node: Name2808
   2926 Node: Synopsis2933
   2927 Node: Overview3145
   2928 Node: Description4986
   2929 Node: Examples5748
   2930 Node: Format8896
   2931 Node: Patterns11637
   2932 Node: Matching18138
   2933 Node: Actions21438
   2934 Node: Generated scanner30560
   2935 Node: Start conditions34988
   2936 Node: Multiple buffers45069
   2937 Node: End-of-file rules50975
   2938 Node: Miscellaneous52508
   2939 Node: User variables55279
   2940 Node: YACC interface57651
   2941 Node: Options58542
   2942 Node: Performance78234
   2943 Node: C++87532
   2944 Node: Incompatibilities94993
   2945 Node: Diagnostics101853
   2946 Node: Files105094
   2947 Node: Deficiencies105715
   2948 Node: See also107684
   2949 Node: Author108216
   2950 
   2951 End Tag Table
   2952