      4 FLEX(1)                  USER COMMANDS                    FLEX(1)
      5 
      6 
      7 
      8 NAME
      9      flex - fast lexical analyzer generator
     10 
     11 SYNOPSIS
     12      flex [-bcdfhilnpstvwBFILTV78+? -C[aefFmr] -ooutput  -Pprefix
     13      -Sskeleton] [--help --version] [filename ...]
     14 
     15 OVERVIEW
     16      This manual describes flex, a tool for  generating  programs
     17      that  perform pattern-matching on text.  The manual includes
     18      both tutorial and reference sections:
     19 
     20          Description
     21              a brief overview of the tool
     22 
     23          Some Simple Examples
     24 
     25          Format Of The Input File
     26 
     27          Patterns
     28              the extended regular expressions used by flex
     29 
     30          How The Input Is Matched
     31              the rules for determining what has been matched
     32 
     33          Actions
     34              how to specify what to do when a pattern is matched
     35 
     36          The Generated Scanner
     37              details regarding the scanner that flex produces;
     38              how to control the input source
     39 
     40          Start Conditions
     41              introducing context into your scanners, and
     42              managing "mini-scanners"
     43 
     44          Multiple Input Buffers
     45              how to manipulate multiple input sources; how to
     46              scan from strings instead of files
     47 
     48          End-of-file Rules
     49              special rules for matching the end of the input
     50 
     51          Miscellaneous Macros
     52              a summary of macros available to the actions
     53 
     54          Values Available To The User
     55              a summary of values available to the actions
     56 
     57          Interfacing With Yacc
     58              connecting flex scanners together with yacc parsers
     59 
     60 
     61 
     62 
     63 Version 2.5          Last change: April 1995                    1
     64 
     74          Options
     75              flex command-line options, and the "%option"
     76              directive
     77 
     78          Performance Considerations
     79              how to make your scanner go as fast as possible
     80 
     81          Generating C++ Scanners
     82              the (experimental) facility for generating C++
     83              scanner classes
     84 
     85          Incompatibilities With Lex And POSIX
     86              how flex differs from AT&T lex and the POSIX lex
     87              standard
     88 
     89          Diagnostics
     90              those error messages produced by flex (or scanners
     91              it generates) whose meanings might not be apparent
     92 
     93          Files
     94              files used by flex
     95 
     96          Deficiencies / Bugs
     97              known problems with flex
     98 
     99          See Also
    100              other documentation, related tools
    101 
    102          Author
    103              includes contact information
    104 
    105 
    106 DESCRIPTION
    107      flex is a  tool  for  generating  scanners:  programs  which
              recognize  lexical  patterns  in text.  flex reads the given
    109      input files, or its standard input  if  no  file  names  are
    110      given,  for  a  description  of  a scanner to generate.  The
    111      description is in the form of pairs of  regular  expressions
    112      and  C  code,  called  rules.  flex  generates as output a C
    113      source file, lex.yy.c, which defines a routine yylex(). This
    114      file is compiled and linked with the -lfl library to produce
    115      an executable.  When the executable is run, it analyzes  its
    116      input  for occurrences of the regular expressions.  Whenever
    117      it finds one, it executes the corresponding C code.
    118 
    119 SOME SIMPLE EXAMPLES
    120      First some simple examples to get the flavor of how one uses
    121      flex.  The  following  flex  input specifies a scanner which
    122      whenever it encounters the string "username" will replace it
    123      with the user's login name:
    124 
    125          %%
    140          username    printf( "%s", getlogin() );
    141 
              By default, any text not matched by a flex scanner is copied
              to the output, so the net effect of this scanner is to copy
              its input file to its output with each occurrence of
              "username" expanded.  In this input, there is just one rule.
              "username" is the pattern and the "printf" is the action.
              The "%%" marks the beginning of the rules.
    148 
    149      Here's another simple example:
    150 
    151                  int num_lines = 0, num_chars = 0;
    152 
    153          %%
    154          \n      ++num_lines; ++num_chars;
    155          .       ++num_chars;
    156 
    157          %%
                  int main( void )
                          {
                          yylex();
                          printf( "# of lines = %d, # of chars = %d\n",
                                  num_lines, num_chars );
                          return 0;
                          }
    164 
    165      This scanner counts the number of characters and the  number
    166      of  lines in its input (it produces no output other than the
    167      final report on the counts).  The first  line  declares  two
    168      globals,  "num_lines"  and "num_chars", which are accessible
    169      both inside yylex() and in the main() routine declared after
    170      the  second  "%%".  There are two rules, one which matches a
    171      newline ("\n") and increments both the line  count  and  the
    172      character  count,  and one which matches any character other
    173      than a newline (indicated by the "." regular expression).
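
              For comparison, the same counts can be produced by a short
              hand-written C function.  This is only a sketch of what the
              two rules do, not the code that flex generates; the name
              count_input() and its parameters are assumptions made for
              this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the two rules of the scanner above:
 * a newline bumps both counters, any other character (the "." rule)
 * bumps only the character counter. */
void count_input(const char *text, int *num_lines, int *num_chars)
{
    const char *p;

    assert(text != NULL && num_lines != NULL && num_chars != NULL);
    *num_lines = 0;
    *num_chars = 0;
    for (p = text; *p != '\0'; ++p) {
        if (*p == '\n')
            ++*num_lines;   /* rule:  \n   ++num_lines; ++num_chars; */
        ++*num_chars;       /* both rules count the character        */
    }
}
```

              Calling count_input( "ab\ncd\n", &lines, &chars ) leaves 2
              in lines and 6 in chars.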
    174 
    175      A somewhat more complicated example:
    176 
    177          /* scanner for a toy Pascal-like language */
    178 
    179          %{
                  /* need this for the calls to atoi() and atof() below */
                  #include <stdlib.h>
    182          %}
    183 
    184          DIGIT    [0-9]
    185          ID       [a-z][a-z0-9]*
    186 
    187          %%
    188 
    189          {DIGIT}+    {
    190                      printf( "An integer: %s (%d)\n", yytext,
    191                              atoi( yytext ) );
    206                      }
    207 
    208          {DIGIT}+"."{DIGIT}*        {
    209                      printf( "A float: %s (%g)\n", yytext,
    210                              atof( yytext ) );
    211                      }
    212 
    213          if|then|begin|end|procedure|function        {
    214                      printf( "A keyword: %s\n", yytext );
    215                      }
    216 
    217          {ID}        printf( "An identifier: %s\n", yytext );
    218 
    219          "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );
    220 
    221          "{"[^}\n]*"}"     /* eat up one-line comments */
    222 
    223          [ \t\n]+          /* eat up whitespace */
    224 
    225          .           printf( "Unrecognized character: %s\n", yytext );
    226 
    227          %%
    228 
                  int main( int argc, char **argv )
                      {
                      ++argv, --argc;  /* skip over program name */
                      if ( argc > 0 )
                              yyin = fopen( argv[0], "r" );
                      else
                              yyin = stdin;

                      yylex();
                      return 0;
                      }
    241 
              This is the beginning of a simple scanner for a language
              like Pascal.  It identifies different types of tokens and
              reports on what it has seen.
    245 
    246      The details of this example will be explained in the follow-
    247      ing sections.
    248 
    249 FORMAT OF THE INPUT FILE
    250      The flex input file consists of three sections, separated by
    251      a line with just %% in it:
    252 
    253          definitions
    254          %%
    255          rules
    256          %%
    257          user code
    258 
    272      The definitions section contains declarations of simple name
    273      definitions  to  simplify  the  scanner  specification,  and
    274      declarations of start conditions, which are explained  in  a
    275      later section.
    276 
    277      Name definitions have the form:
    278 
    279          name definition
    280 
    281      The "name" is a word beginning with a letter  or  an  under-
    282      score  ('_')  followed by zero or more letters, digits, '_',
    283      or '-' (dash).  The definition is  taken  to  begin  at  the
    284      first  non-white-space character following the name and con-
    285      tinuing to the end of the line.  The definition  can  subse-
    286      quently  be referred to using "{name}", which will expand to
    287      "(definition)".  For example,
    288 
    289          DIGIT    [0-9]
    290          ID       [a-z][a-z0-9]*
    291 
    292      defines "DIGIT" to be a regular expression which  matches  a
    293      single  digit,  and  "ID"  to  be a regular expression which
    294      matches a letter followed by zero-or-more letters-or-digits.
    295      A subsequent reference to
    296 
    297          {DIGIT}+"."{DIGIT}*
    298 
    299      is identical to
    300 
    301          ([0-9])+"."([0-9])*
    302 
    303      and matches one-or-more digits followed by a '.' followed by
    304      zero-or-more digits.
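
              The expanded pattern can be read as three steps in sequence.
              A minimal sketch in C, assuming a hypothetical helper
              match_digits_dot_digits() that returns the length of the
              text matched at the start of a string (0 for no match); it
              is hand-written for illustration and is not produced by
              flex:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length of the prefix of s that matches
 * ([0-9])+"."([0-9])*, or 0 if s does not start with such text. */
size_t match_digits_dot_digits(const char *s)
{
    size_t i = 0;

    assert(s != NULL);
    /* ([0-9])+ : one or more digits */
    while (isdigit((unsigned char)s[i]))
        ++i;
    if (i == 0)
        return 0;
    /* "." : a literal dot */
    if (s[i] != '.')
        return 0;
    ++i;
    /* ([0-9])* : zero or more digits */
    while (isdigit((unsigned char)s[i]))
        ++i;
    return i;
}
```

              For "3.14 rest" the helper returns 4; for "42" it returns 0,
              since the '.' is required.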
    305 
    306      The rules section of the flex input  contains  a  series  of
    307      rules of the form:
    308 
    309          pattern   action
    310 
    311      where the pattern must be unindented  and  the  action  must
    312      begin on the same line.
    313 
    314      See below for a further description of patterns and actions.
    315 
    316      Finally, the user code section is simply copied to  lex.yy.c
    317      verbatim.   It  is used for companion routines which call or
    318      are called by the scanner.  The presence of this section  is
    319      optional;  if it is missing, the second %% in the input file
    320      may be skipped, too.
    321 
    322      In the definitions and rules sections, any indented text  or
    323      text  enclosed in %{ and %} is copied verbatim to the output
    338      (with the %{}'s removed).  The %{}'s must appear  unindented
    339      on lines by themselves.
    340 
    341      In the rules section, any indented  or  %{}  text  appearing
    342      before the first rule may be used to declare variables which
    343      are local to the scanning routine and  (after  the  declara-
    344      tions)  code  which  is to be executed whenever the scanning
    345      routine is entered.  Other indented or %{} text in the  rule
    346      section  is  still  copied to the output, but its meaning is
    347      not well-defined and it may well cause  compile-time  errors
    348      (this feature is present for POSIX compliance; see below for
    349      other such features).
    350 
    351      In the definitions section (but not in the  rules  section),
    352      an  unindented comment (i.e., a line beginning with "/*") is
    353      also copied verbatim to the output up to the next "*/".
    354 
    355 PATTERNS
    356      The patterns in the input are written using an extended  set
    357      of regular expressions.  These are:
    358 
    359          x          match the character 'x'
    360          .          any character (byte) except newline
    361          [xyz]      a "character class"; in this case, the pattern
    362                       matches either an 'x', a 'y', or a 'z'
    363          [abj-oZ]   a "character class" with a range in it; matches
    364                       an 'a', a 'b', any letter from 'j' through 'o',
    365                       or a 'Z'
    366          [^A-Z]     a "negated character class", i.e., any character
    367                       but those in the class.  In this case, any
    368                       character EXCEPT an uppercase letter.
    369          [^A-Z\n]   any character EXCEPT an uppercase letter or
    370                       a newline
    371          r*         zero or more r's, where r is any regular expression
    372          r+         one or more r's
    373          r?         zero or one r's (that is, "an optional r")
    374          r{2,5}     anywhere from two to five r's
    375          r{2,}      two or more r's
    376          r{4}       exactly 4 r's
    377          {name}     the expansion of the "name" definition
    378                     (see above)
    379          "[xyz]\"foo"
    380                     the literal string: [xyz]"foo
    381          \X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
    382                       then the ANSI-C interpretation of \x.
    383                       Otherwise, a literal 'X' (used to escape
    384                       operators such as '*')
    385          \0         a NUL character (ASCII code 0)
    386          \123       the character with octal value 123
    387          \x2a       the character with hexadecimal value 2a
    388          (r)        match an r; parentheses are used to override
    389                       precedence (see below)
    390 
    404          rs         the regular expression r followed by the
    405                       regular expression s; called "concatenation"
    406 
    407 
    408          r|s        either an r or an s
    409 
    410 
    411          r/s        an r but only if it is followed by an s.  The
    412                       text matched by s is included when determining
    413                       whether this rule is the "longest match",
    414                       but is then returned to the input before
    415                       the action is executed.  So the action only
    416                       sees the text matched by r.  This type
                               of pattern is called "trailing context".
    418                       (There are some combinations of r/s that flex
    419                       cannot match correctly; see notes in the
    420                       Deficiencies / Bugs section below regarding
    421                       "dangerous trailing context".)
    422          ^r         an r, but only at the beginning of a line (i.e.,
                               when just starting to scan, or right after a
    424                       newline has been scanned).
    425          r$         an r, but only at the end of a line (i.e., just
    426                       before a newline).  Equivalent to "r/\n".
    427 
    428                     Note that flex's notion of "newline" is exactly
    429                     whatever the C compiler used to compile flex
    430                     interprets '\n' as; in particular, on some DOS
    431                     systems you must either filter out \r's in the
    432                     input yourself, or explicitly use r/\r\n for "r$".
    433 
    434 
    435          <s>r       an r, but only in start condition s (see
    436                       below for discussion of start conditions)
    437          <s1,s2,s3>r
    438                     same, but in any of start conditions s1,
    439                       s2, or s3
    440          <*>r       an r in any start condition, even an exclusive one.
    441 
    442 
    443          <<EOF>>    an end-of-file
    444          <s1,s2><<EOF>>
    445                     an end-of-file when in start condition s1 or s2
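
              The r/s entry above can be sketched in C for the simple case
              where both r and s are literal strings.  The helper
              match_trailing() and its parameters are hypothetical; a real
              flex scanner handles arbitrary patterns.  The sketch shows
              the two lengths involved: the token that the action sees
              (r only) and the length used when competing with other rules
              (r and s together).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for literal r and s only.  Returns the length of
 * the token (the text matched by r) if input starts with r followed by
 * s, and 0 otherwise.  *compare_len receives the length that counts
 * when looking for the longest match: r and s together. */
size_t match_trailing(const char *r, const char *s, const char *input,
                      size_t *compare_len)
{
    size_t rlen, slen;

    assert(r != NULL && s != NULL && input != NULL && compare_len != NULL);
    rlen = strlen(r);
    slen = strlen(s);
    *compare_len = 0;
    /* input + rlen is only examined once r itself has matched */
    if (strncmp(input, r, rlen) != 0 ||
        strncmp(input + rlen, s, slen) != 0)
        return 0;
    *compare_len = rlen + slen;
    return rlen;   /* s is returned to the input, so the token is r */
}
```

              With r = "end" and s = "\n" (the equivalent of "end$"), the
              input "end\nnext" yields a token of length 3 and a
              comparison length of 4.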
    446 
    447      Note that inside of a character class, all  regular  expres-
    448      sion  operators  lose  their  special  meaning except escape
    449      ('\') and the character class operators, '-', ']',  and,  at
    450      the beginning of the class, '^'.
    451 
    452      The regular expressions listed above are  grouped  according
    453      to  precedence, from highest precedence at the top to lowest
    454      at the bottom.   Those  grouped  together  have  equal  pre-
    455      cedence.  For example,
    456 
    470          foo|bar*
    471 
    472      is the same as
    473 
    474          (foo)|(ba(r*))
    475 
    476      since the '*' operator has higher precedence than concatena-
    477      tion, and concatenation higher than alternation ('|').  This
    478      pattern therefore matches either the  string  "foo"  or  the
    479      string "ba" followed by zero-or-more r's.  To match "foo" or
    480      zero-or-more "bar"'s, use:
    481 
    482          foo|(bar)*
    483 
    484      and to match zero-or-more "foo"'s-or-"bar"'s:
    485 
    486          (foo|bar)*
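
              These precedence rules can be checked experimentally.  The
              sketch below uses the POSIX regcomp() and regexec()
              functions as a stand-in for flex; POSIX extended regular
              expressions are not the same language as flex patterns, but
              they give '*', concatenation and '|' the same relative
              precedence, so the three patterns above behave alike.  The
              helper matches_whole() is an assumption of this example.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if text as a whole matches the POSIX
 * extended regular expression pattern, 0 otherwise (also 0 when the
 * pattern is too long for the buffer or does not compile). */
int matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, found;

    assert(pattern != NULL && text != NULL);
    /* Anchor at both ends so only a match of the whole text counts. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return 0;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    found = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```

              matches_whole( "foo|bar*", "barbar" ) returns 0, while
              matches_whole( "foo|(bar)*", "barbar" ) returns 1.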
    487 
    488 
    489      In addition to characters and ranges of characters,  charac-
    490      ter  classes  can  also contain character class expressions.
    491      These are expressions enclosed inside [: and  :]  delimiters
    492      (which themselves must appear between the '[' and ']' of the
    493      character class; other elements may occur inside the charac-
    494      ter class, too).  The valid expressions are:
    495 
    496          [:alnum:] [:alpha:] [:blank:]
    497          [:cntrl:] [:digit:] [:graph:]
    498          [:lower:] [:print:] [:punct:]
    499          [:space:] [:upper:] [:xdigit:]
    500 
    501      These  expressions  all  designate  a  set   of   characters
    502      equivalent  to  the corresponding standard C isXXX function.
    503      For example, [:alnum:] designates those characters for which
    504      isalnum()  returns  true  - i.e., any alphabetic or numeric.
    505      Some  systems  don't  provide  isblank(),  so  flex  defines
    506      [:blank:] as a blank or a tab.
    507 
    508      For  example,  the  following  character  classes  are   all
    509      equivalent:
    510 
    511          [[:alnum:]]
                  [[:alpha:][:digit:]]
    513          [[:alpha:]0-9]
    514          [a-zA-Z0-9]
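
              The equivalence can be checked byte by byte with the
              <ctype.h> functions.  This sketch assumes the default "C"
              locale and an ASCII character set (so that the ranges a-z,
              A-Z and 0-9 are contiguous); the function names are invented
              for the illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

/* [[:alnum:]] */
int in_alnum(int c)
{
    assert(c >= 0 && c <= UCHAR_MAX);
    return isalnum(c) != 0;
}

/* [[:alpha:][:digit:]] */
int in_alpha_or_digit(int c)
{
    assert(c >= 0 && c <= UCHAR_MAX);
    return isalpha(c) != 0 || isdigit(c) != 0;
}

/* [a-zA-Z0-9], assuming ASCII */
int in_ranges(int c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}

/* [[:blank:]] as flex defines it: a blank or a tab */
int in_blank(int c)
{
    return c == ' ' || c == '\t';
}

/* Number of byte values on which the three classes disagree. */
int count_disagreements(void)
{
    int c, n = 0;

    for (c = 0; c <= UCHAR_MAX; ++c)
        if (in_alnum(c) != in_alpha_or_digit(c) ||
            in_alnum(c) != in_ranges(c))
            ++n;
    return n;
}
```

              Under the stated assumptions count_disagreements() returns
              0.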
    515 
    516      If your scanner is  case-insensitive  (the  -i  flag),  then
    517      [:upper:] and [:lower:] are equivalent to [:alpha:].
    518 
    519      Some notes on patterns:
    520 
    521      -    A negated character class such as the example  "[^A-Z]"
    536           above   will   match  a  newline  unless  "\n"  (or  an
    537           equivalent escape sequence) is one  of  the  characters
    538           explicitly  present  in  the  negated  character  class
    539           (e.g., "[^A-Z\n]").  This is unlike how many other reg-
    540           ular  expression tools treat negated character classes,
    541           but unfortunately  the  inconsistency  is  historically
    542           entrenched.   Matching  newlines  means  that a pattern
    543           like [^"]* can match the entire  input  unless  there's
    544           another quote in the input.
    545 
              -    A rule can have at most one instance of trailing context
                   (the '/' operator or the '$' operator).  The start
                   condition, '^', and "<<EOF>>" patterns can only occur
                   at the beginning of a pattern and, like '/' and '$',
                   cannot be grouped inside parentheses.  A '^' which
                   does not occur at the beginning of a rule, or a '$'
                   which does not occur at the end of a rule, loses its
                   special properties and is treated as a normal
                   character.
    555 
    556           The following are illegal:
    557 
    558               foo/bar$
    559               <sc1>foo<sc2>bar
    560 
                   Note that the first of these can be written
                   "foo/bar\n".
    563 
    564           The following will result in '$' or '^'  being  treated
    565           as a normal character:
    566 
    567               foo|(bar$)
    568               foo|^bar
    569 
    570           If what's wanted is a  "foo"  or  a  bar-followed-by-a-
    571           newline,  the  following could be used (the special '|'
    572           action is explained below):
    573 
    574               foo      |
    575               bar$     /* action goes here */
    576 
    577           A similar trick will work for matching a foo or a  bar-
    578           at-the-beginning-of-a-line.
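
              The first note above, on negated character classes, can be
              illustrated in C.  The standard strcspn() function returns
              the number of leading characters that are not in a given
              set, which is what a negated character class followed by '*'
              matches.  The two helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the text that [^"]* matches at the start of s: everything
 * up to, but not including, the next '"'.  A newline is not special
 * here, so it is consumed like any other character. */
size_t match_not_quote(const char *s)
{
    assert(s != NULL);
    return strcspn(s, "\"");
}

/* Length of the text that [^"\n]* matches: the newline is listed in
 * the negated class, so the match stops in front of it. */
size_t match_not_quote_or_newline(const char *s)
{
    assert(s != NULL);
    return strcspn(s, "\"\n");
}
```

              For the input "ab\ncd\"x" the first helper returns 5 and
              the second returns 2.  When the input contains no further
              quote, the first helper returns the length of the entire
              input.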
    579 
    580 HOW THE INPUT IS MATCHED
    581      When the generated scanner is run,  it  analyzes  its  input
    582      looking  for strings which match any of its patterns.  If it
    583      finds more than one match, it takes  the  one  matching  the
    584      most  text  (for  trailing  context rules, this includes the
    585      length of the trailing part, even though  it  will  then  be
    586      returned  to the input).  If it finds two or more matches of
    587      the same length, the rule listed first  in  the  flex  input
    602      file is chosen.
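
              The two selection rules (longest match first, then earliest
              rule) can be sketched as follows.  To keep the example short
              it assumes that every pattern is a literal string, which
              real flex patterns are not; choose_rule() is a hypothetical
              name and says nothing about how flex implements matching
              internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper.  patterns[] holds literal strings in the order
 * the rules appear in the input file.  Returns the index of the rule
 * that wins at the start of input, or -1 if none matches; *match_len
 * receives the length of the winning match (0 if none). */
int choose_rule(const char *const *patterns, size_t npatterns,
                const char *input, size_t *match_len)
{
    int best = -1;
    size_t best_len = 0;
    size_t i;

    assert(patterns != NULL && input != NULL && match_len != NULL);
    for (i = 0; i < npatterns; ++i) {
        size_t len = strlen(patterns[i]);

        if (len == 0 || strncmp(input, patterns[i], len) != 0)
            continue;               /* this rule does not match here */
        /* Only a strictly longer match replaces the current winner,
         * so on a tie the rule listed first is kept. */
        if (best < 0 || len > best_len) {
            best = (int)i;
            best_len = len;
        }
    }
    *match_len = best_len;
    return best;
}
```

              With the rules "if", "iffy", "i", "if" (in that order), the
              input "iffy code" selects rule 1 with length 4, and the
              input "if x" selects rule 0 with length 2 even though rule 3
              matches the same text.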
    603 
    604      Once the match is determined, the text corresponding to  the
    605      match  (called  the  token)  is made available in the global
    606      character pointer yytext,  and  its  length  in  the  global
    607      integer yyleng. The action corresponding to the matched pat-
    608      tern is  then  executed  (a  more  detailed  description  of
    609      actions  follows),  and  then the remaining input is scanned
    610      for another match.
    611 
    612      If no match is found, then the default rule is executed: the
    613      next character in the input is considered matched and copied
    614      to the standard output.  Thus, the simplest legal flex input
    615      is:
    616 
    617          %%
    618 
    619      which generates a scanner that simply copies its input  (one
    620      character at a time) to its output.
    621 
    622      Note that yytext can  be  defined  in  two  different  ways:
    623      either  as  a character pointer or as a character array. You
    624      can control which definition flex uses by including  one  of
    625      the  special  directives  %pointer  or  %array  in the first
    626      (definitions) section of your flex input.   The  default  is
    627      %pointer, unless you use the -l lex compatibility option, in
    628      which case yytext will be an array.  The advantage of  using
    629      %pointer  is  substantially  faster  scanning  and no buffer
    630      overflow when matching very large tokens (unless you run out
              of dynamic memory).  The disadvantage is that you are
              restricted in how your actions can modify yytext (see the
              next section), and calls to the unput() function destroy
              the present contents of yytext, which can be a considerable
              porting headache when moving between different lex versions.
    636 
    637      The advantage of %array is that you can then  modify  yytext
    638      to your heart's content, and calls to unput() do not destroy
    639      yytext (see  below).   Furthermore,  existing  lex  programs
    640      sometimes access yytext externally using declarations of the
    641      form:
    642          extern char yytext[];
    643      This definition is erroneous when used  with  %pointer,  but
    644      correct for %array.
    645 
    646      %array defines yytext to be an array of  YYLMAX  characters,
    647      which  defaults to a fairly large value.  You can change the
    648      size by simply #define'ing YYLMAX to a  different  value  in
    649      the  first  section of your flex input.  As mentioned above,
    650      with %pointer yytext grows dynamically to accommodate  large
    651      tokens.  While this means your %pointer scanner can accommo-
    652      date very large tokens (such as matching  entire  blocks  of
    653      comments),  bear  in  mind  that  each time the scanner must
    668      resize yytext it also must rescan the entire token from  the
    669      beginning,  so  matching such tokens can prove slow.  yytext
    670      presently does not dynamically grow if  a  call  to  unput()
    671      results  in too much text being pushed back; instead, a run-
    672      time error results.
    673 
    674      Also note that  you  cannot  use  %array  with  C++  scanner
    675      classes (the c++ option; see below).
    676 
    677 ACTIONS
    678      Each pattern in a rule has a corresponding action, which can
    679      be any arbitrary C statement.  The pattern ends at the first
    680      non-escaped whitespace character; the remainder of the  line
    681      is  its  action.  If the action is empty, then when the pat-
    682      tern is matched the input token is  simply  discarded.   For
    683      example,  here  is  the  specification  for  a program which
    684      deletes all occurrences of "zap me" from its input:
    685 
    686          %%
    687          "zap me"
    688 
    689      (It will copy all other characters in the input to the  out-
    690      put since they will be matched by the default rule.)
    691 
    692      Here is a program which compresses multiple blanks and  tabs
    693      down  to a single blank, and throws away whitespace found at
    694      the end of a line:
    695 
    696          %%
    697          [ \t]+        putchar( ' ' );
    698          [ \t]+$       /* ignore this token */
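
              At the end of a line both rules match the same run of
              blanks, but the second one wins because its trailing context
              (the newline implied by '$') counts toward the length of the
              match.  A hand-written C sketch of the resulting behavior,
              assuming a hypothetical function squeeze_blanks() that works
              on strings rather than on yyin:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper.  Copies in to out (capacity out_size bytes,
 * including the terminating NUL), replacing each run of blanks and
 * tabs by one blank and dropping runs that end a line. */
void squeeze_blanks(const char *in, char *out, size_t out_size)
{
    size_t o = 0;

    assert(in != NULL && out != NULL && out_size > 0);
    while (*in != '\0' && o + 1 < out_size) {
        if (*in == ' ' || *in == '\t') {
            /* [ \t]+ : take the whole run of blanks and tabs */
            in += strspn(in, " \t");
            /* [ \t]+$ : a run just before a newline is discarded */
            if (*in != '\n')
                out[o++] = ' ';
        } else {
            /* default rule: copy any other character unchanged */
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
}
```

              Given "a \t b  \nc\t\td   \n" the function produces
              "a b\nc d\n".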
    699 
    700 
    701      If the action contains a '{', then the action spans till the
    702      balancing  '}'  is  found, and the action may cross multiple
    703      lines.  flex knows about C strings and comments and won't be
    704      fooled  by braces found within them, but also allows actions
    705      to begin with %{ and will consider the action to be all  the
    706      text up to the next %} (regardless of ordinary braces inside
    707      the action).
    708 
    709      An action consisting solely of a vertical  bar  ('|')  means
    710      "same  as  the  action for the next rule."  See below for an
    711      illustration.
    712 
    713      Actions can  include  arbitrary  C  code,  including  return
    714      statements  to  return  a  value  to whatever routine called
    715      yylex(). Each time yylex() is called it continues processing
    716      tokens  from  where it last left off until it either reaches
    717      the end of the file or executes a return.
    718 
    734      Actions are free to modify yytext except for lengthening  it
    735      (adding  characters  to  its end--these will overwrite later
    736      characters in the input  stream).   This  however  does  not
    737      apply  when  using  %array (see above); in that case, yytext
    738      may be freely modified in any way.
    739 
    740      Actions are free to modify yyleng except they should not  do
    741      so if the action also includes use of yymore() (see below).
    742 
    743      There are a  number  of  special  directives  which  can  be
    744      included within an action:
    745 
    746      -    ECHO copies yytext to the scanner's output.
    747 
    748      -    BEGIN followed by the name of a start condition  places
    749           the  scanner  in the corresponding start condition (see
    750           below).
    751 
    752      -    REJECT directs the scanner to proceed on to the "second
    753           best"  rule which matched the input (or a prefix of the
    754           input).  The rule is chosen as described above in  "How
    755           the  Input  is  Matched",  and yytext and yyleng set up
    756           appropriately.  It may either be one which  matched  as
    757           much  text as the originally chosen rule but came later
    758           in the flex input file, or one which matched less text.
    759           For example, the following will both count the words in
    760           the input  and  call  the  routine  special()  whenever
    761           "frob" is seen:
    762 
    763                       int word_count = 0;
    764               %%
    765 
    766               frob        special(); REJECT;
    767               [^ \t\n]+   ++word_count;
    768 
    769           Without the REJECT, occurrences of "frob" in the input
    770           would not be counted as words, since the scanner
    771           normally executes only one action per token.  Multiple
    772           REJECTs are allowed, each one finding the next best
    773           choice to the currently active rule.  For example, when
    774           the following scanner scans the token "abcd", it will
    775           write "abcdabcaba" to the output:
    776 
    777               %%
    778               a        |
    779               ab       |
    780               abc      |
    781               abcd     ECHO; REJECT;
    782               .|\n     /* eat up any unmatched character */
    783 
    784           (The first three rules share the fourth's action  since
    785           they   use   the  special  '|'  action.)  REJECT  is  a
    800           particularly expensive feature in terms of scanner per-
    801           formance; if it is used in any of the scanner's actions
    802           it will  slow  down  all  of  the  scanner's  matching.
    803           Furthermore,  REJECT cannot be used with the -Cf or -CF
    804           options (see below).
    805 
    806           Note also that unlike the other special actions, REJECT
    807           is  a  branch;  code  immediately  following  it in the
    808           action will not be executed.
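
                  The order in which REJECT visits candidates can
                  be traced with a small stand-alone model.  The C
                  sketch below is hand-written for the four-rule
                  scanner above; it is not code that flex
                  generates, and the names model_reject and
                  reject_rules are invented for the illustration.
                  Candidates are tried from longest match to
                  shortest, and each REJECT falls through to the
                  next one:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Literal rules in file order; each one's action is "ECHO; REJECT;".
 * This is a model of the example scanner, not flex output. */
static const char *const reject_rules[] = { "a", "ab", "abc", "abcd" };
enum { N_REJECT_RULES = 4 };

/* Scan input, appending every ECHO to out (capacity cap, always
 * NUL-terminated).  Returns the number of characters echoed. */
size_t model_reject(const char *input, char *out, size_t cap)
{
    size_t used = 0;

    assert(input != NULL && out != NULL && cap > 0);
    out[0] = '\0';

    for (size_t pos = 0; input[pos] != '\0'; ++pos) {
        size_t left = strlen(input + pos);

        /* Try candidates from the longest match down to the shortest.
         * No two literal rules here match the same length, so the
         * "earlier rule wins a tie" part of the ordering only matters
         * for the catch-all rule handled after this loop. */
        for (size_t len = left; len >= 1; --len) {
            for (int r = 0; r < N_REJECT_RULES; ++r) {
                if (strlen(reject_rules[r]) != len ||
                    strncmp(input + pos, reject_rules[r], len) != 0)
                    continue;
                /* ECHO: copy the matched text to the output. */
                if (used + len < cap) {
                    memcpy(out + used, input + pos, len);
                    used += len;
                    out[used] = '\0';
                }
                /* REJECT: carry on with the next-best candidate. */
            }
        }
        /* All literal rules have rejected (or none matched), so the
         * catch-all ".|\n" rule accepts one character and discards
         * it.  The ++pos of the outer loop models that. */
    }
    return used;
}
```

                  Under these assumptions, model_reject("abcd",
                  buf, sizeof buf) leaves "abcdabcaba" in buf.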
    809 
    810      -    yymore() tells  the  scanner  that  the  next  time  it
    811           matches  a  rule,  the  corresponding  token  should be
    812           appended onto the current value of yytext  rather  than
    813           replacing it.  For example, given the input
    814           "mega-kludge" the following will write
    815           "mega-mega-kludge" to the output:
    816 
    817               %%
    818               mega-    ECHO; yymore();
    819               kludge   ECHO;
    820 
    821           First "mega-" is matched  and  echoed  to  the  output.
    822           Then  "kludge"  is matched, but the previous "mega-" is
    823           still hanging around at the beginning of yytext so  the
    824           ECHO for the "kludge" rule will actually write
    825           "mega-kludge".
    826 
    827      Two notes regarding use of yymore(). First, yymore() depends
    828      on  the value of yyleng correctly reflecting the size of the
    829      current token, so you must not  modify  yyleng  if  you  are
    830      using  yymore().  Second,  the  presence  of yymore() in the
    831      scanner's action entails a minor performance penalty in  the
    832      scanner's matching speed.
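
             How yytext accumulates across a yymore() call can be
             traced with a small model.  The C sketch below
             hard-codes the two rules of the example above; it is
             not flex output, model_yymore is a hypothetical name,
             and unmatched characters are assumed to go through the
             default rule, which echoes them:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of:   mega-    ECHO; yymore();
 *             kludge   ECHO;
 * Everything echoed is appended to out (capacity cap). */
void model_yymore(const char *input, char *out, size_t cap)
{
    char yytext[128];       /* model of the token buffer           */
    size_t keep = 0;        /* bytes of yytext kept by yymore()    */
    size_t used = 0;        /* bytes already written to out        */

    assert(input != NULL && out != NULL && cap > 0);
    out[0] = '\0';

    while (*input != '\0') {
        size_t len;         /* length of the new match             */
        int more = 0;       /* did this action call yymore()?      */

        if (strncmp(input, "mega-", 5) == 0) {
            len = 5;
            more = 1;
        } else if (strncmp(input, "kludge", 6) == 0) {
            len = 6;
        } else {
            len = 1;        /* default rule: echo one character    */
        }

        if (keep + len >= sizeof yytext || used + keep + len >= cap)
            break;          /* model buffers are full              */

        /* The new match is appended after the kept prefix. */
        memcpy(yytext + keep, input, len);
        input += len;

        /* ECHO writes all of yytext, kept prefix included. */
        memcpy(out + used, yytext, keep + len);
        used += keep + len;
        out[used] = '\0';

        /* yymore() keeps the whole current token for the next match. */
        keep = more ? keep + len : 0;
    }
}
```

             With the input "mega-kludge" the model writes
             "mega-mega-kludge", matching the description above.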
    833 
    834      -    yyless(n) returns all but the first n characters of the
    835           current token back to the input stream, where they will
    836           be rescanned when the scanner looks for the next match.
    837           yytext  and  yyleng  are  adjusted appropriately (e.g.,
    838           yyleng will now be equal to n).  For example, on the
    839           input "foobar" the following will write out
    840           "foobarbar":
    841 
    842               %%
    843               foobar    ECHO; yyless(3);
    844               [a-z]+    ECHO;
    845 
    846           An argument of  0  to  yyless  will  cause  the  entire
    847           current  input  string  to  be  scanned  again.  Unless
    848           you've changed how the scanner will  subsequently  pro-
    849           cess  its  input  (using BEGIN, for example), this will
    850           result in an endless loop.
    851 
    866      Note that yyless is a macro and can only be used in the flex
    867      input file, not from other source files.
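
             The effect of yyless(3) in the example above can be
             reproduced with a short model.  This C sketch is an
             illustration only (model_yyless is a hypothetical
             name, and [a-z] is tested as an ASCII range); it
             applies the usual longest-match rule, lets "foobar"
             win a tie because it is listed first, and then resumes
             scanning after the three characters that were kept:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of:   foobar    ECHO; yyless(3);
 *             [a-z]+    ECHO;
 * plus the default rule, which echoes any other character.
 * Returns the number of characters written to out. */
size_t model_yyless(const char *input, char *out, size_t cap)
{
    size_t used = 0;

    assert(input != NULL && out != NULL && cap > 0);
    out[0] = '\0';

    while (*input != '\0') {
        size_t run = 0;                 /* longest [a-z]+ match here */
        while (input[run] >= 'a' && input[run] <= 'z')
            ++run;

        /* No letters at all: the default rule matches one character. */
        size_t yyleng = run > 0 ? run : 1;

        if (used + yyleng >= cap)
            break;                      /* output buffer is full     */
        memcpy(out + used, input, yyleng);      /* ECHO              */
        used += yyleng;
        out[used] = '\0';

        /* "foobar" is chosen only when it ties the longest match. */
        if (run == 6 && strncmp(input, "foobar", 6) == 0)
            yyleng = 3;     /* yyless(3): keep "foo", return "bar"   */

        input += yyleng;    /* returned text is scanned again        */
    }
    return used;
}
```

             For the input "foobar" the model produces "foobarbar".
             For "foobarx" it produces "foobarx", because [a-z]+
             then has the longer match and yyless() is never
             reached.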
    868 
    869      -    unput(c) puts the  character  c  back  onto  the  input
    870           stream.   It  will  be the next character scanned.  The
    871           following action will take the current token and  cause
    872           it to be rescanned enclosed in parentheses.
    873 
    874               {
    875               int i;
    876               /* Copy yytext because unput() trashes yytext */
    877               char *yycopy = strdup( yytext );
    878               unput( ')' );
    879               for ( i = yyleng - 1; i >= 0; --i )
    880                   unput( yycopy[i] );
    881               unput( '(' );
    882               free( yycopy );
    883               }
    884 
    885           Note that since each unput() puts the  given  character
    886           back at the beginning of the input stream, pushing back
    887           strings must be done back-to-front.
    888 
    889      An important potential problem when using unput() is that if
    890      you are using %pointer (the default), a call to unput() des-
    891      troys the contents of yytext, starting  with  its  rightmost
    892      character  and devouring one character to the left with each
    893      call.  If you need the value of  yytext  preserved  after  a
    894      call  to  unput() (as in the above example), you must either
    895      first copy it elsewhere, or build your scanner using  %array
    896      instead (see How The Input Is Matched).
    897 
    898      Finally, note that you cannot put back  EOF  to  attempt  to
    899      mark the input stream with an end-of-file.
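
             The back-to-front ordering can be checked with a small
             pushback model.  The C sketch below is not how flex
             implements unput(); the pb_* names and the fixed-size
             stack are assumptions made for the illustration.
             Because the pushed-back characters live in their own
             array here, the model does not reproduce the
             clobbering of yytext described above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { PUSHBACK_MAX = 64 };

struct pb_stream {
    const char *rest;           /* unread part of the original input */
    char stack[PUSHBACK_MAX];   /* pushed-back characters, LIFO      */
    size_t depth;
};

/* Like unput(c): c becomes the next character read. */
int pb_unput(struct pb_stream *s, char c)
{
    if (s->depth == PUSHBACK_MAX)
        return -1;
    s->stack[s->depth++] = c;
    return 0;
}

/* Like input(): returns the next character, or -1 at end of input. */
int pb_input(struct pb_stream *s)
{
    if (s->depth > 0)
        return (unsigned char)s->stack[--s->depth];
    if (*s->rest == '\0')
        return -1;
    return (unsigned char)*s->rest++;
}

/* Push back token enclosed in parentheses.  The last character to be
 * read must be pushed first, so the string goes in back-to-front. */
int pb_unput_parenthesized(struct pb_stream *s, const char *token)
{
    size_t len = strlen(token);

    if (s->depth + len + 2 > PUSHBACK_MAX)
        return -1;
    pb_unput(s, ')');
    for (size_t i = len; i > 0; --i)
        pb_unput(s, token[i - 1]);
    pb_unput(s, '(');
    return 0;
}

/* Read everything that is left into out; returns the count. */
size_t pb_drain(struct pb_stream *s, char *out, size_t cap)
{
    size_t n = 0;
    int c;

    assert(out != NULL && cap > 0);
    while (n + 1 < cap && (c = pb_input(s)) != -1)
        out[n++] = (char)c;
    out[n] = '\0';
    return n;
}
```

             Pushing back "abc" in front of the remaining input
             " tail" and then draining the stream yields
             "(abc) tail".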
    900 
    901      -    input() reads the next character from the input stream.
    902           For  example, the following is one way to eat up C com-
    903           ments:
    904 
    905               %%
    906               "/*"        {
    907                           register int c;
    908 
    909                           for ( ; ; )
    910                               {
    911                               while ( (c = input()) != '*' &&
    912                                       c != EOF )
    913                                   ;    /* eat up text of comment */
    914 
    915                               if ( c == '*' )
    916                                   {
    917                                   while ( (c = input()) == '*' )
    932                                       ;
    933                                   if ( c == '/' )
    934                                       break;    /* found the end */
    935                                   }
    936 
    937                               if ( c == EOF )
    938                                   {
    939                                   error( "EOF in comment" );
    940                                   break;
    941                                   }
    942                               }
    943                           }
    944 
    945           (Note that if the scanner is compiled using  C++,  then
    946           input()  is  instead referred to as yyinput(), in order
    947           to avoid a name clash with the C++ stream by  the  name
    948           of input.)
    949 
    950      -    YY_FLUSH_BUFFER flushes the scanner's  internal  buffer
    951           so  that  the next time the scanner attempts to match a
    952           token, it will first refill the buffer  using  YY_INPUT
    953           (see  The  Generated Scanner, below).  This action is a
    954           special case  of  the  more  general  yy_flush_buffer()
    955           function, described below in the section Multiple Input
    956           Buffers.
    957 
    958      -    yyterminate() can be used in lieu of a return statement
    959           in  an action.  It terminates the scanner and returns a
    960           0 to the scanner's caller, indicating "all  done".   By
    961           default,  yyterminate()  is also called when an end-of-
    962           file is encountered.  It is a macro and  may  be  rede-
    963           fined.
    964 
    965 THE GENERATED SCANNER
    966      The output of flex is the file lex.yy.c, which contains  the
    967      scanning  routine yylex(), a number of tables used by it for
    968      matching tokens, and a number of auxiliary routines and mac-
    969      ros.  By default, yylex() is declared as follows:
    970 
    971          int yylex()
    972              {
    973              ... various definitions and the actions in here ...
    974              }
    975 
    976      (If your environment supports function prototypes,  then  it
    977      will  be  "int  yylex(  void  )".)   This  definition may be
    978      changed by defining the "YY_DECL" macro.  For  example,  you
    979      could use:
    980 
    981          #define YY_DECL float lexscan( a, b ) float a, b;
    982 
    983      to give the scanning routine the name lexscan,  returning  a
    998      float, and taking two floats as arguments.  Note that if you
    999      give  arguments  to  the  scanning  routine  using  a   K&R-
   1000      style/non-prototyped  function  declaration,  you  must ter-
   1001      minate the definition with a semi-colon (;).
   1002 
   1003      Whenever yylex() is called, it scans tokens from the  global
   1004      input  file  yyin  (which  defaults to stdin).  It continues
   1005      until it either reaches an end-of-file (at  which  point  it
   1006      returns the value 0) or one of its actions executes a return
   1007      statement.
   1008 
   1009      If the scanner reaches an end-of-file, subsequent calls  are
   1010      undefined  unless either yyin is pointed at a new input file
   1011      (in which case scanning continues from that file), or yyres-
   1012      tart()  is called.  yyrestart() takes one argument, a FILE *
   1013      pointer (which can be nil, if you've set up YY_INPUT to scan
   1014      from  a  source  other  than yyin), and initializes yyin for
   1015      scanning from that file.  Essentially there is no difference
   1016      between  just  assigning  yyin  to a new input file or using
   1017      yyrestart() to do so; the latter is available  for  compati-
   1018      bility with previous versions of flex, and because it can be
   1019      used to switch input files in the middle  of  scanning.   It
   1020      can  also be used to throw away the current input buffer, by
   1021      calling it with an argument of yyin; but better  is  to  use
   1022      YY_FLUSH_BUFFER (see above).  Note that yyrestart() does not
   1023      reset the start condition to INITIAL (see Start  Conditions,
   1024      below).
   1025 
   1026      If yylex() stops scanning due to executing a  return  state-
   1027      ment  in  one of the actions, the scanner may then be called
   1028      again and it will resume scanning where it left off.
   1029 
   1030      By default (and for purposes  of  efficiency),  the  scanner
   1031      uses  block-reads  rather  than  simple getc() calls to read
   1032      characters from yyin. The nature of how it  gets  its  input
   1033      can   be   controlled   by   defining  the  YY_INPUT  macro.
   1034      YY_INPUT's           calling           sequence           is
   1035      "YY_INPUT(buf,result,max_size)".   Its action is to place up
   1036      to max_size characters in the character array buf and return
   1037      in  the integer variable result either the number of charac-
   1038      ters read or the constant YY_NULL (0  on  Unix  systems)  to
   1039      indicate  EOF.   The  default YY_INPUT reads from the global
   1040      file-pointer "yyin".
   1041 
   1042      A sample definition of YY_INPUT (in the definitions  section
   1043      of the input file):
   1044 
   1045          %{
   1046          #define YY_INPUT(buf,result,max_size) \
   1047              { \
   1048              int c = getchar(); \
   1049              result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
   1064              }
   1065          %}
   1066 
   1067      This definition will change the input  processing  to  occur
   1068      one character at a time.
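
             The same calling sequence can deliver more than one
             character per call.  The sketch below reads from an
             in-memory string and can be compiled without flex, so
             YY_NULL is given a stand-in definition; mem_source_set
             and mem_read_all are hypothetical helpers, and the use
             of int for result is an assumption.  In a real scanner
             only the YY_INPUT definition would go in the
             definitions section:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifndef YY_NULL
#define YY_NULL 0   /* stand-in; a flex scanner defines this itself */
#endif

/* Hypothetical in-memory source; these names are not part of flex. */
static const char *mem_src = "";
static size_t mem_left = 0;

void mem_source_set(const char *text)
{
    mem_src = text;
    mem_left = strlen(text);
}

/* Copy up to max_size characters into buf and set result to the
 * number copied, or to YY_NULL once the source is exhausted. */
#define YY_INPUT(buf, result, max_size) \
    do { \
        size_t n_ = (size_t)(max_size); \
        if (n_ > mem_left) \
            n_ = mem_left; \
        memcpy((buf), mem_src, n_); \
        mem_src += n_; \
        mem_left -= n_; \
        (result) = (n_ == 0) ? YY_NULL : (int)n_; \
    } while (0)

/* Stand-in for the scanner's refill loop: keep calling YY_INPUT with
 * at most chunk characters until it reports end of input. */
size_t mem_read_all(char *out, size_t cap, int chunk)
{
    size_t total = 0;

    assert(out != NULL && cap > 0 && chunk > 0);
    for (;;) {
        int got;
        size_t room = cap - 1 - total;
        size_t want = room < (size_t)chunk ? room : (size_t)chunk;

        if (want == 0)
            break;                      /* out is full              */
        YY_INPUT(out + total, got, want);
        if (got == YY_NULL)
            break;                      /* end of input             */
        total += (size_t)got;
    }
    out[total] = '\0';
    return total;
}
```

             After mem_source_set("hello, flex"), a call to
             mem_read_all(out, sizeof out, 4) collects the whole
             string in chunks of at most four characters, and a
             second call returns 0 because the source is exhausted.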
   1069 
   1070      When the scanner receives  an  end-of-file  indication  from
   1071      YY_INPUT, it then checks the yywrap() function.  If yywrap()
   1072      returns false (zero), then it is assumed that  the  function
   1073      has  gone  ahead  and  set up yyin to point to another input
   1074      file, and scanning continues.   If  it  returns  true  (non-
   1075      zero),  then  the  scanner  terminates,  returning  0 to its
   1076      caller.  Note that  in  either  case,  the  start  condition
   1077      remains unchanged; it does not revert to INITIAL.
   1078 
   1079      If you do not supply your own version of yywrap(), then  you
   1080      must  either use %option noyywrap (in which case the scanner
   1081      behaves as though yywrap() returned 1),  or  you  must  link
   1082      with  -lfl  to  obtain  the  default version of the routine,
   1083      which always returns 1.
   1084 
   1085      Three routines are available  for  scanning  from  in-memory
   1086      buffers     rather     than     files:     yy_scan_string(),
   1087      yy_scan_bytes(), and yy_scan_buffer(). See the discussion of
   1088      them below in the section Multiple Input Buffers.
   1089 
   1090      The scanner writes its  ECHO  output  to  the  yyout  global
   1091      (default, stdout), which the user may redirect simply by
   1092      assigning some other FILE pointer to it.
   1093 
   1094 START CONDITIONS
   1095      flex  provides  a  mechanism  for  conditionally  activating
   1096      rules.   Any rule whose pattern is prefixed with "<sc>" will
   1097      only be active when the scanner is in  the  start  condition
   1098      named "sc".  For example,
   1099 
   1100          <STRING>[^"]*        { /* eat up the string body ... */
   1101                      ...
   1102                      }
   1103 
   1104      will be active only when the  scanner  is  in  the  "STRING"
   1105      start condition, and
   1106 
   1107          <INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */
   1108                      ...
   1109                      }
   1110 
   1111      will be active only when  the  current  start  condition  is
   1112      either "INITIAL", "STRING", or "QUOTE".
   1113 
   1114      Start conditions are declared  in  the  definitions  (first)
   1115      section  of  the input using unindented lines beginning with
   1130      either %s or %x followed by a list  of  names.   The  former
   1131      declares  inclusive  start  conditions, the latter exclusive
   1132      start conditions.  A start condition is activated using  the
   1133      BEGIN  action.   Until  the  next  BEGIN action is executed,
   1134      rules with the given start  condition  will  be  active  and
   1135      rules  with other start conditions will be inactive.  If the
   1136      start condition is inclusive, then rules with no start  con-
   1137      ditions  at  all  will  also be active.  If it is exclusive,
   1138      then only rules qualified with the start condition  will  be
   1139      active.   A  set  of  rules contingent on the same exclusive
   1140      start condition describes a scanner which is independent of
   1141      any  of the other rules in the flex input.  Because of this,
   1142      exclusive start conditions make it easy  to  specify  "mini-
   1143      scanners"  which scan portions of the input that are syntac-
   1144      tically different from the rest (e.g., comments).
   1145 
   1146      If the distinction between  inclusive  and  exclusive  start
   1147      conditions  is still a little vague, here's a simple example
   1148      illustrating the connection between the  two.   The  set  of
   1149      rules:
   1150 
   1151          %s example
   1152          %%
   1153 
   1154          <example>foo   do_something();
   1155 
   1156          bar            something_else();
   1157 
   1158      is equivalent to
   1159 
   1160          %x example
   1161          %%
   1162 
   1163          <example>foo   do_something();
   1164 
   1165          <INITIAL,example>bar    something_else();
   1166 
   1167      Without the <INITIAL,example> qualifier, the bar pattern  in
   1168      the second example wouldn't be active (i.e., couldn't match)
   1169      when in start condition example. If we just  used  <example>
   1170      to  qualify  bar,  though,  then  it would only be active in
   1171      example and not in INITIAL, while in the first example  it's
   1172      active  in  both,  because  in the first example the example
   1173      start condition is an inclusive (%s) start condition.
   1174 
   1175      Also note that the  special  start-condition  specifier  <*>
   1176      matches  every  start  condition.   Thus,  the above example
   1177      could also have been written:
   1178 
   1179          %x example
   1180          %%
   1181 
   1196          <example>foo   do_something();
   1197 
   1198          <*>bar    something_else();
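
             The activation rules can be written down as a single
             predicate.  The C sketch below is a model for
             illustration, not flex's implementation; the SC_*
             names and rule_is_active are hypothetical.  Start
             conditions are represented as small integers with
             INITIAL as 0, and a rule's qualifier as a bit set:

```c
#include <assert.h>
#include <stdbool.h>

/* Two start conditions: INITIAL and the "example" condition. */
enum { SC_INITIAL = 0, SC_EXAMPLE = 1, SC_COUNT = 2 };

#define SC_BIT(sc)  (1u << (sc))
#define SC_NONE     0u                        /* rule has no <...> prefix */
#define SC_ALL      ((1u << SC_COUNT) - 1u)   /* rule is prefixed by <*>  */

/* Is a rule qualified with the set rule_scs active while the scanner
 * is in start condition current?  The set exclusive has a bit for
 * every condition that was declared with %x rather than %s. */
bool rule_is_active(unsigned rule_scs, int current, unsigned exclusive)
{
    assert(current >= 0 && current < SC_COUNT);

    if (rule_scs == SC_NONE) {
        /* Unqualified rules are active in INITIAL and in every
         * inclusive condition, but in no exclusive one. */
        return (exclusive & SC_BIT(current)) == 0;
    }
    /* Qualified rules are active only in the conditions they name. */
    return (rule_scs & SC_BIT(current)) != 0;
}
```

             With "%s example" the exclusive set is empty, so the
             unqualified bar rule is active in example.  With
             "%x example" it is not, unless the rule is written as
             <INITIAL,example>bar or <*>bar.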
   1199 
   1200 
   1201      The default rule (to ECHO any unmatched  character)  remains
   1202      active in start conditions.  It is equivalent to:
   1203 
   1204          <*>.|\n     ECHO;
   1205 
   1206 
   1207      BEGIN(0) returns to the original state where only the  rules
   1208      with no start conditions are active.  This state can also be
   1209      referred   to   as   the   start-condition   "INITIAL",   so
   1210      BEGIN(INITIAL)  is  equivalent to BEGIN(0). (The parentheses
   1211      around the start condition name are  not  required  but  are
   1212      considered good style.)
   1213 
   1214      BEGIN actions can also be given  as  indented  code  at  the
   1215      beginning  of the rules section.  For example, the following
   1216      will cause the scanner to enter the "SPECIAL"  start  condi-
   1217      tion  whenever  yylex()  is  called  and the global variable
   1218      enter_special is true:
   1219 
   1220                  int enter_special;
   1221 
   1222          %x SPECIAL
   1223          %%
   1224                  if ( enter_special )
   1225                      BEGIN(SPECIAL);
   1226 
   1227          <SPECIAL>blahblahblah
   1228          ...more rules follow...
   1229 
   1230 
   1231      To illustrate the  uses  of  start  conditions,  here  is  a
   1232      scanner  which  provides  two different interpretations of a
   1233      string like "123.456".  By default it will treat it as three
   1234      tokens,  the  integer  "123",  a  dot ('.'), and the integer
   1235      "456".  But if the string is preceded earlier in the line by
   1236      the  string  "expect-floats"  it  will  treat it as a single
   1237      token, the floating-point number 123.456:
   1238 
   1239          %{
   1240          #include <math.h>
   1241          %}
   1242          %s expect
   1243 
   1244          %%
   1245          expect-floats        BEGIN(expect);
   1246 
   1247          <expect>[0-9]+"."[0-9]+      {
   1262                      printf( "found a float, = %f\n",
   1263                              atof( yytext ) );
   1264                      }
   1265          <expect>\n           {
   1266                      /* that's the end of the line, so
   1267                       * we need another "expect-floats"
   1268                       * before we'll recognize any more
   1269                       * numbers
   1270                       */
   1271                      BEGIN(INITIAL);
   1272                      }
   1273 
   1274          [0-9]+      {
   1275                      printf( "found an integer, = %d\n",
   1276                              atoi( yytext ) );
   1277                      }
   1278 
   1279          "."         printf( "found a dot\n" );
   1280 
   1281      Here is a scanner which recognizes (and discards) C comments
   1282      while maintaining a count of the current input line.
   1283 
   1284          %x comment
   1285          %%
   1286                  int line_num = 1;
   1287 
   1288          "/*"         BEGIN(comment);
   1289 
   1290          <comment>[^*\n]*        /* eat anything that's not a '*' */
   1291          <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
   1292          <comment>\n             ++line_num;
   1293          <comment>"*"+"/"        BEGIN(INITIAL);
   1294 
   1295      This scanner goes to a bit of trouble to match as much  text
   1296      as  possible with each rule.  In general, when attempting to
   1297      write a high-speed scanner, try to match as much as possible
   1298      in each rule, as it's a big win.
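
             An exclusive start condition behaves much like a state
             variable in a hand-written scanner.  For comparison,
             here is a character-at-a-time C sketch of the comment
             scanner above; it is not what flex generates, and
             strip_comments and the ST_* names are hypothetical.
             Text outside comments is copied through, as the
             default rule would do, and only newlines inside
             comments are counted, as in the rules above:

```c
#include <assert.h>
#include <stddef.h>

enum scan_state { ST_INITIAL, ST_COMMENT };

/* Copy src to out without its comments (out has capacity cap).
 * Returns line_num: 1 plus the number of newlines seen inside
 * comments. */
int strip_comments(const char *src, char *out, size_t cap)
{
    enum scan_state state = ST_INITIAL;
    int line_num = 1;
    size_t n = 0;

    assert(src != NULL && out != NULL && cap > 0);

    while (*src != '\0') {
        if (state == ST_INITIAL) {
            if (src[0] == '/' && src[1] == '*') {
                state = ST_COMMENT;         /* BEGIN(comment)        */
                src += 2;
            } else {
                if (n + 1 < cap)
                    out[n++] = *src;        /* default rule: ECHO    */
                ++src;
            }
        } else {
            if (src[0] == '*' && src[1] == '/') {
                state = ST_INITIAL;         /* BEGIN(INITIAL)        */
                src += 2;
            } else {
                if (*src == '\n')
                    ++line_num;             /* <comment>\n           */
                ++src;                      /* eat comment text      */
            }
        }
    }
    out[n] = '\0';
    return line_num;
}
```

             The flex version is normally faster than a loop of
             this kind because each rule consumes a long run of
             characters per match instead of one character per
             iteration.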
   1299 
   1300      Note that start-condition names are really integer values
   1301      and  can  be  stored  as  such.   Thus,  the  above could be
   1302      extended in the following fashion:
   1303 
   1304          %x comment foo
   1305          %%
   1306                  int line_num = 1;
   1307                  int comment_caller;
   1308 
   1309          "/*"         {
   1310                       comment_caller = INITIAL;
   1311                       BEGIN(comment);
   1312                       }
   1313 
   1328          ...
   1329 
   1330          <foo>"/*"    {
   1331                       comment_caller = foo;
   1332                       BEGIN(comment);
   1333                       }
   1334 
   1335          <comment>[^*\n]*        /* eat anything that's not a '*' */
   1336          <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
   1337          <comment>\n             ++line_num;
   1338          <comment>"*"+"/"        BEGIN(comment_caller);
   1339 
   1340      Furthermore, you can  access  the  current  start  condition
   1341      using  the  integer-valued YY_START macro.  For example, the
   1342      above assignments to comment_caller could instead be written
   1343 
   1344          comment_caller = YY_START;
   1345 
   1346      Flex provides YYSTATE as an alias for YY_START  (since  that
   1347      is what's used by AT&T lex).
   1348 
   1349      Note that start conditions do not have their own name-space;
   1350      %s's   and  %x's  declare  names  in  the  same  fashion  as
   1351      #define's.
   1352 
   1353      Finally, here's an example of how to  match  C-style  quoted
   1354      strings using exclusive start conditions, including expanded
   1355      escape sequences (but not including checking  for  a  string
   1356      that's too long):
   1357 
   1358          %x str
   1359 
   1360          %%
   1361                  char string_buf[MAX_STR_CONST];
   1362                  char *string_buf_ptr;
   1363 
   1364 
   1365          \"      string_buf_ptr = string_buf; BEGIN(str);
   1366 
   1367          <str>\"        { /* saw closing quote - all done */
   1368                  BEGIN(INITIAL);
   1369                  *string_buf_ptr = '\0';
   1370                  /* return string constant token type and
   1371                   * value to parser
   1372                   */
   1373                  }
   1374 
   1375          <str>\n        {
   1376                  /* error - unterminated string constant */
   1377                  /* generate error message */
   1378                  }
   1379 
   1394          <str>\\[0-7]{1,3} {
   1395                  /* octal escape sequence */
   1396                  unsigned int result;  /* %o expects unsigned int */
   1397 
   1398                  (void) sscanf( yytext + 1, "%o", &result );
   1399 
   1400                  if ( result > 0xff )
   1401                          { /* error, constant is out-of-bounds */ }
   1402                  else
   1403                          *string_buf_ptr++ = result;
   1404                  }
   1405 
   1406          <str>\\[0-9]+ {
   1407                  /* generate error - bad escape sequence; something
   1408                   * like '\48' or '\0777777'
   1409                   */
   1410                  }
   1411 
   1412          <str>\\n  *string_buf_ptr++ = '\n';
   1413          <str>\\t  *string_buf_ptr++ = '\t';
   1414          <str>\\r  *string_buf_ptr++ = '\r';
   1415          <str>\\b  *string_buf_ptr++ = '\b';
   1416          <str>\\f  *string_buf_ptr++ = '\f';
   1417 
   1418          <str>\\(.|\n)  *string_buf_ptr++ = yytext[1];
   1419 
   1420          <str>[^\\\n\"]+        {
   1421                  char *yptr = yytext;
   1422 
   1423                  while ( *yptr )
   1424                          *string_buf_ptr++ = *yptr++;
   1425                  }
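
             The scanner above leaves out the check for a string
             that is too long.  One way to see where such a check
             belongs is a stand-alone decoder with the same escape
             handling.  The C sketch below is an illustration, not
             part of flex; decode_c_string is a hypothetical name.
             It follows the rules above, including the longest-match
             consequence that a backslash followed by more than
             three digits, or by any non-octal digit, is treated as
             a bad escape:

```c
#include <assert.h>
#include <stddef.h>

/* Decode the text between the quotes of a C-style string into out
 * (capacity cap, NUL-terminated on success).  Returns the decoded
 * length, or -1 for a bad escape, an unescaped newline, a trailing
 * backslash, or a result that does not fit in out. */
long decode_c_string(const char *body, char *out, size_t cap)
{
    size_t n = 0;

    assert(body != NULL && out != NULL);

    while (*body != '\0') {
        unsigned value;

        if (*body == '\n')
            return -1;              /* unterminated string constant */

        if (*body != '\\') {
            value = (unsigned char)*body++;
        } else if (body[1] >= '0' && body[1] <= '9') {
            /* The error rule \\[0-9]+ matches the whole digit run,
             * so the octal rule wins only for 1-3 octal digits. */
            size_t run = 0;
            int all_octal = 1;

            value = 0;
            for (++body; *body >= '0' && *body <= '9'; ++body, ++run) {
                if (*body > '7')
                    all_octal = 0;
                value = value * 8 + (unsigned)(*body - '0');
            }
            if (!all_octal || run > 3 || value > 0xff)
                return -1;          /* bad or out-of-bounds escape  */
        } else if (body[1] == '\0') {
            return -1;              /* backslash at end of input    */
        } else {
            switch (body[1]) {
            case 'n': value = '\n'; break;
            case 't': value = '\t'; break;
            case 'r': value = '\r'; break;
            case 'b': value = '\b'; break;
            case 'f': value = '\f'; break;
            default:  value = (unsigned char)body[1]; break;
            }
            body += 2;
        }

        if (n + 1 >= cap)
            return -1;              /* string too long for out      */
        out[n++] = (char)value;
    }
    if (cap == 0)
        return -1;
    out[n] = '\0';
    return (long)n;
}
```

             In the scanner itself the equivalent test would compare
             string_buf_ptr against the end of string_buf before
             each store.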
   1426 
   1427 
   1428      Often, such as in some of the examples above,  you  wind  up
   1429      writing  a  whole  bunch  of  rules all preceded by the same
   1430      start condition(s).  Flex makes this  a  little  easier  and
   1431      cleaner  by introducing a notion of start condition scope. A
   1432      start condition scope is begun with:
   1433 
   1434          <SCs>{
   1435 
   1436      where SCs is a list of one or more start conditions.  Inside
   1437      the  start condition scope, every rule automatically has the
   1438      prefix <SCs> applied to it, until a '}'  which  matches  the
   1439      initial '{'. So, for example,
   1440 
   1441          <ESC>{
   1442              "\\n"   return '\n';
   1443              "\\r"   return '\r';
   1444              "\\f"   return '\f';
   1445              "\\0"   return '\0';
   1460          }
   1461 
   1462      is equivalent to:
   1463 
   1464          <ESC>"\\n"  return '\n';
   1465          <ESC>"\\r"  return '\r';
   1466          <ESC>"\\f"  return '\f';
   1467          <ESC>"\\0"  return '\0';
   1468 
   1469      Start condition scopes may be nested.
   1470 
   1471      Three routines are  available  for  manipulating  stacks  of
   1472      start conditions:
   1473 
   1474      void yy_push_state(int new_state)
   1475           pushes the current start condition onto the top of  the
   1476           start  condition  stack  and  switches  to new_state as
   1477           though you had used BEGIN new_state (recall that  start
   1478           condition names are also integers).
   1479 
   1480      void yy_pop_state()
   1481           pops the top of the stack and switches to it via BEGIN.
   1482 
   1483      int yy_top_state()
   1484           returns the top  of  the  stack  without  altering  the
   1485           stack's contents.
   1486 
   1487      The start condition stack grows dynamically and  so  has  no
   1488      built-in  size  limitation.  If memory is exhausted, program
   1489      execution aborts.
   1490 
   1491      To use start condition stacks, your scanner must  include  a
   1492      %option stack directive (see Options below).
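
             The behavior of the three routines can be summarized
             by a small model.  The C sketch below is not the code
             flex generates; the model_* names, the doubling growth
             policy, and the use of abort() on memory exhaustion
             are assumptions made for the illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static int     sc_current = 0;  /* current start condition; 0 is INITIAL */
static int    *sc_stack = NULL; /* saved start conditions                */
static size_t  sc_depth = 0;    /* number of saved entries               */
static size_t  sc_cap = 0;      /* allocated entries                     */

/* Save the current start condition, then switch to new_state. */
void model_push_state(int new_state)
{
    if (sc_depth == sc_cap) {
        /* Grow on demand, so there is no built-in size limit. */
        size_t new_cap = sc_cap ? sc_cap * 2 : 8;
        int *p = realloc(sc_stack, new_cap * sizeof *p);

        if (p == NULL) {
            fputs("out of memory growing start-condition stack\n", stderr);
            abort();
        }
        sc_stack = p;
        sc_cap = new_cap;
    }
    sc_stack[sc_depth++] = sc_current;
    sc_current = new_state;             /* as though BEGIN new_state */
}

/* Pop the most recently saved start condition and switch to it. */
void model_pop_state(void)
{
    assert(sc_depth > 0);               /* popping an empty stack is a bug */
    sc_current = sc_stack[--sc_depth];
}

/* Return the most recently saved start condition; the stack is
 * left unchanged. */
int model_top_state(void)
{
    assert(sc_depth > 0);
    return sc_stack[sc_depth - 1];
}

/* Accessor for the current start condition (compare YY_START). */
int model_current_state(void)
{
    return sc_current;
}
```

             Note that the top of the stack is the condition that
             will be returned to, not the condition the scanner is
             currently in.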
   1493 
   1494 MULTIPLE INPUT BUFFERS
   1495      Some scanners (such as those which support "include"  files)
   1496      require   reading  from  several  input  streams.   As  flex
   1497      scanners do a large amount of buffering, one cannot  control
   1498      where  the  next input will be read from by simply writing a
   1499      YY_INPUT  which  is  sensitive  to  the  scanning   context.
   1500      YY_INPUT  is only called when the scanner reaches the end of
   1501      its buffer, which may be a long time after scanning a state-
   1502      ment such as an "include" which requires switching the input
   1503      source.
   1504 
   1505      To negotiate  these  sorts  of  problems,  flex  provides  a
   1506      mechanism  for creating and switching between multiple input
   1507      buffers.  An input buffer is created by using:
   1508 
   1509          YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
   1510 
   1511      which takes a FILE pointer and a size and creates  a  buffer
   1526      associated with the given file and large enough to hold size
   1527      characters (when in doubt, use YY_BUF_SIZE  for  the  size).
   1528      It  returns  a  YY_BUFFER_STATE  handle,  which  may then be
   1529      passed to other routines (see below).   The  YY_BUFFER_STATE
   1530      type is a pointer to an opaque struct yy_buffer_state struc-
   1531      ture, so you may safely initialize YY_BUFFER_STATE variables
   1532      to  ((YY_BUFFER_STATE) 0) if you wish, and also refer to the
   1533      opaque structure in order to correctly declare input buffers
   1534      in  source files other than that of your scanner.  Note that
   1535      the FILE pointer in the call  to  yy_create_buffer  is  only
   1536      used  as the value of yyin seen by YY_INPUT; if you redefine
   1537      YY_INPUT so it no longer uses yyin, then you can safely pass
   1538      a nil FILE pointer to yy_create_buffer. You select a partic-
   1539      ular buffer to scan from using:
   1540 
   1541          void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
   1542 
   1543      It switches the scanner's input buffer so subsequent tokens
   1544      will  come  from new_buffer. Note that yy_switch_to_buffer()
   1545      may be used by yywrap() to set things up for continued scan-
   1546      ning, instead of opening a new file and pointing yyin at it.
   1547      Note  also  that  switching   input   sources   via   either
   1548      yy_switch_to_buffer()  or yywrap() does not change the start
   1549      condition.
   1550 
     To reclaim the storage associated with a buffer, use:

         void yy_delete_buffer( YY_BUFFER_STATE buffer )

     (buffer can be nil, in which case the routine does nothing.)
     You can also clear the current contents of a buffer using:
   1556 
   1557          void yy_flush_buffer( YY_BUFFER_STATE buffer )
   1558 
   1559      This function discards the buffer's contents,  so  the  next
   1560      time  the scanner attempts to match a token from the buffer,
   1561      it will first fill the buffer anew using YY_INPUT.
   1562 
   1563      yy_new_buffer() is an alias for yy_create_buffer(), provided
   1564      for  compatibility  with  the  C++ use of new and delete for
   1565      creating and destroying dynamic objects.
   1566 
   1567      Finally,   the    YY_CURRENT_BUFFER    macro    returns    a
   1568      YY_BUFFER_STATE handle to the current buffer.
   1569 
   1570      Here is an example of using these  features  for  writing  a
   1571      scanner  which expands include files (the <<EOF>> feature is
   1572      discussed below):
   1573 
   1574          /* the "incl" state is used for picking up the name
   1575           * of an include file
   1576           */
   1577          %x incl
   1578 
   1592          %{
   1593          #define MAX_INCLUDE_DEPTH 10
   1594          YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
   1595          int include_stack_ptr = 0;
   1596          %}
   1597 
   1598          %%
   1599          include             BEGIN(incl);
   1600 
   1601          [a-z]+              ECHO;
   1602          [^a-z\n]*\n?        ECHO;
   1603 
   1604          <incl>[ \t]*      /* eat the whitespace */
   1605          <incl>[^ \t\n]+   { /* got the include file name */
   1606                  if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
   1607                      {
                     fprintf( stderr, "Includes nested too deeply\n" );
   1609                      exit( 1 );
   1610                      }
   1611 
   1612                  include_stack[include_stack_ptr++] =
   1613                      YY_CURRENT_BUFFER;
   1614 
   1615                  yyin = fopen( yytext, "r" );
   1616 
   1617                  if ( ! yyin )
   1618                      error( ... );
   1619 
   1620                  yy_switch_to_buffer(
   1621                      yy_create_buffer( yyin, YY_BUF_SIZE ) );
   1622 
   1623                  BEGIN(INITIAL);
   1624                  }
   1625 
   1626          <<EOF>> {
   1627                  if ( --include_stack_ptr < 0 )
   1628                      {
   1629                      yyterminate();
   1630                      }
   1631 
   1632                  else
   1633                      {
   1634                      yy_delete_buffer( YY_CURRENT_BUFFER );
   1635                      yy_switch_to_buffer(
   1636                           include_stack[include_stack_ptr] );
   1637                      }
   1638                  }
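
     The bookkeeping in this example can be isolated from flex.
     The following is a minimal sketch in plain C, assuming an
     opaque handle type in place of YY_BUFFER_STATE;
     buffer_handle, push_include() and pop_include() are
     hypothetical names used only for illustration.  Unlike the
     rule above, the sketch reports overflow to the caller
     instead of exiting.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INCLUDE_DEPTH 10

/* Stand-in for YY_BUFFER_STATE, which is a pointer to an opaque struct. */
typedef void *buffer_handle;

static buffer_handle include_stack[MAX_INCLUDE_DEPTH];
static int include_stack_ptr = 0;

/* Save the buffer that is being suspended.
 * Returns 0 on success, -1 if includes are nested too deeply. */
int push_include(buffer_handle current)
{
    assert(current != NULL);

    if (include_stack_ptr >= MAX_INCLUDE_DEPTH)
        return -1;
    include_stack[include_stack_ptr++] = current;
    return 0;
}

/* Fetch the buffer to resume when an included file ends.
 * Returns NULL when the stack is empty, i.e. the outermost
 * input has ended. */
buffer_handle pop_include(void)
{
    if (include_stack_ptr == 0)
        return NULL;
    return include_stack[--include_stack_ptr];
}
```

     In a scanner, push_include() would receive
     YY_CURRENT_BUFFER before the switch to the included file,
     and a non-nil result from pop_include() would be handed to
     yy_switch_to_buffer() in the <<EOF>> rule.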
   1639 
   1640      Three routines are available for setting  up  input  buffers
   1641      for  scanning  in-memory  strings  instead of files.  All of
   1642      them create a new input buffer for scanning the string,  and
   1643      return  a  corresponding  YY_BUFFER_STATE  handle (which you
   1658      should delete with yy_delete_buffer() when  done  with  it).
   1659      They    also    switch    to    the    new    buffer   using
   1660      yy_switch_to_buffer(), so the  next  call  to  yylex()  will
   1661      start scanning the string.
   1662 
   1663      yy_scan_string(const char *str)
   1664           scans a NUL-terminated string.
   1665 
   1666      yy_scan_bytes(const char *bytes, int len)
   1667           scans len bytes (including possibly NUL's) starting  at
   1668           location bytes.
   1669 
   1670      Note that both of these functions create and scan a copy  of
   1671      the  string or bytes.  (This may be desirable, since yylex()
   1672      modifies the contents of the buffer it  is  scanning.)   You
   1673      can avoid the copy by using:
   1674 
   1675      yy_scan_buffer(char *base, yy_size_t size)
          which scans in place the buffer starting at base,
          consisting of size bytes, the last two bytes of which
          must be YY_END_OF_BUFFER_CHAR (ASCII NUL).  These last
          two bytes are not scanned; thus, scanning consists of
          base[0] through base[size-3], inclusive.
   1681 
   1682           If you fail to set up base in this manner (i.e., forget
   1683           the   final   two  YY_END_OF_BUFFER_CHAR  bytes),  then
   1684           yy_scan_buffer()  returns  a  nil  pointer  instead  of
   1685           creating a new input buffer.
   1686 
   1687           The type yy_size_t is an integral type to which you can
   1688           cast  an  integer expression reflecting the size of the
   1689           buffer.
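
     The layout that yy_scan_buffer() requires can be prepared
     ahead of time.  The following sketch, in plain C, builds
     such a buffer from an arbitrary byte range.
     make_scan_buffer() is a hypothetical helper, not a flex
     routine, and because it copies the data it illustrates only
     the layout, not the saving of a copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of flex): copy len bytes of text into
 * a freshly allocated buffer followed by two NUL bytes, the layout
 * described for yy_scan_buffer().
 * On success *size_out receives the total size, including the two
 * trailing bytes.  Returns NULL if allocation fails. */
char *make_scan_buffer(const char *text, size_t len, size_t *size_out)
{
    char *base;

    assert(text != NULL && size_out != NULL);

    base = malloc(len + 2);
    if (base == NULL)
        return NULL;

    memcpy(base, text, len);
    base[len] = '\0';        /* first end-of-buffer byte */
    base[len + 1] = '\0';    /* second end-of-buffer byte */
    *size_out = len + 2;
    return base;
}
```

     The returned pointer and the value stored in *size_out are
     what a caller would pass as base and size.  The caller
     remains responsible for releasing the memory with free().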
   1690 
   1691 END-OF-FILE RULES
   1692      The special rule "<<EOF>>" indicates actions which are to be
   1693      taken  when  an  end-of-file  is  encountered  and  yywrap()
   1694      returns non-zero (i.e., indicates no further files  to  pro-
   1695      cess).  The action must finish by doing one of four things:
   1696 
   1697      -    assigning yyin to a new input file  (in  previous  ver-
   1698           sions  of  flex,  after doing the assignment you had to
   1699           call the special action YY_NEW_FILE; this is no  longer
   1700           necessary);
   1701 
   1702      -    executing a return statement;
   1703 
   1704      -    executing the special yyterminate() action;
   1705 
   1706      -    or,    switching    to    a    new     buffer     using
   1707           yy_switch_to_buffer() as shown in the example above.
   1708 
   1724      <<EOF>> rules may not be used with other patterns; they  may
   1725      only  be  qualified  with a list of start conditions.  If an
   1726      unqualified <<EOF>> rule is given, it applies to  all  start
   1727      conditions  which  do  not already have <<EOF>> actions.  To
   1728      specify an <<EOF>> rule for only the  initial  start  condi-
   1729      tion, use
   1730 
   1731          <INITIAL><<EOF>>
   1732 
   1733 
   1734      These rules are useful for  catching  things  like  unclosed
   1735      comments.  An example:
   1736 
   1737          %x quote
   1738          %%
   1739 
   1740          ...other rules for dealing with quotes...
   1741 
   1742          <quote><<EOF>>   {
   1743                   error( "unterminated quote" );
   1744                   yyterminate();
   1745                   }
   1746          <<EOF>>  {
   1747                   if ( *++filelist )
   1748                       yyin = fopen( *filelist, "r" );
   1749                   else
   1750                      yyterminate();
   1751                   }
   1752 
   1753 
   1754 MISCELLANEOUS MACROS
   1755      The macro YY_USER_ACTION can be defined to provide an action
   1756      which is always executed prior to the matched rule's action.
   1757      For example, it could be #define'd to call a routine to con-
   1758      vert  yytext to lower-case.  When YY_USER_ACTION is invoked,
   1759      the variable yy_act gives the number  of  the  matched  rule
   1760      (rules  are  numbered starting with 1).  Suppose you want to
   1761      profile how often each of your rules is matched.   The  fol-
   1762      lowing would do the trick:
   1763 
   1764          #define YY_USER_ACTION ++ctr[yy_act]
   1765 
   1766      where ctr is an array to hold the counts for  the  different
   1767      rules.   Note  that  the  macro YY_NUM_RULES gives the total
   1768      number of rules (including the default rule, even if you use
   1769      -s), so a correct declaration for ctr is:
   1770 
   1771          int ctr[YY_NUM_RULES];
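
     A fuller version of the counting idea, as a sketch in plain
     C.  NUM_RULES stands in for YY_NUM_RULES, and count_rule()
     and report_counts() are hypothetical helpers.  Since rule
     numbers start at 1, the sketch allocates one slot more than
     the number of rules so that the highest rule number is a
     valid index; that sizing is a defensive choice made here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for YY_NUM_RULES; a real scanner gets the
 * value from flex. */
#define NUM_RULES 4

/* One extra slot so that every rule number from 1 to NUM_RULES is a
 * valid index; slot 0 is unused.  This sizing is the sketch's own
 * assumption. */
static unsigned long ctr[NUM_RULES + 1];

/* What a YY_USER_ACTION definition such as ++ctr[yy_act] amounts to. */
void count_rule(int act)
{
    assert(act >= 1 && act <= NUM_RULES);
    ++ctr[act];
}

/* Print one line for each rule that matched at least once. */
void report_counts(FILE *out)
{
    for (int i = 1; i <= NUM_RULES; i++)
        if (ctr[i] > 0)
            fprintf(out, "rule %d: %lu matches\n", i, ctr[i]);
}
```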
   1772 
   1773 
   1774      The macro YY_USER_INIT may be defined to provide  an  action
   1775      which  is  always executed before the first scan (and before
   1790      the scanner's internal initializations are done).  For exam-
   1791      ple,  it  could  be used to call a routine to read in a data
   1792      table or open a logging file.
   1793 
   1794      The macro yy_set_interactive(is_interactive) can be used  to
   1795      control  whether  the  current buffer is considered interac-
   1796      tive. An interactive buffer is processed  more  slowly,  but
   1797      must  be  used  when  the  scanner's  input source is indeed
   1798      interactive to avoid problems due to waiting to fill buffers
   1799      (see the discussion of the -I flag below).  A non-zero value
   1800      in the macro invocation marks the buffer as  interactive,  a
   1801      zero  value as non-interactive.  Note that use of this macro
   1802      overrides  %option  always-interactive  or  %option   never-
   1803      interactive  (see Options below).  yy_set_interactive() must
   1804      be invoked prior to beginning to scan the buffer that is (or
   1805      is not) to be considered interactive.
   1806 
     The macro yy_set_bol(at_bol) can be used to control whether
     the current buffer's scanning context for the next token
     match is done as though at the beginning of a line.  A
     non-zero macro argument makes rules anchored with '^'
     active, while a zero argument makes '^' rules inactive.
   1811 
   1812      The macro YY_AT_BOL() returns true if the next token scanned
   1813      from  the  current  buffer will have '^' rules active, false
   1814      otherwise.
   1815 
   1816      In the generated scanner, the actions are  all  gathered  in
   1817      one  large  switch  statement  and separated using YY_BREAK,
   1818      which may be redefined.  By default, it is simply a "break",
   1819      to  separate  each  rule's action from the following rule's.
     Redefining YY_BREAK allows, for example, C++ users to
     #define YY_BREAK to do nothing (while being very careful
     that every rule ends with a "break" or a "return"!).  This
     avoids unreachable-statement warnings in cases where a
     rule's action ends with "return" and the YY_BREAK after it
     can never be reached.
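
     The effect of an empty YY_BREAK can be seen in miniature in
     plain C.  In this sketch ACTION_BREAK is a hypothetical
     stand-in for YY_BREAK and run_action() stands in for the
     generated switch statement; neither name comes from flex.

```c
#include <assert.h>

/* Hypothetical stand-in for YY_BREAK.  It is defined as empty here,
 * so each action must end with its own "break" or "return". */
#define ACTION_BREAK

/* A miniature version of the switch that holds the rule actions. */
int run_action(int act)
{
    int result = 0;

    assert(act >= 0);

    switch (act) {
    case 1:
        return 10;      /* ends with "return": nothing follows it */
        ACTION_BREAK
    case 2:
        result = 20;
        break;          /* must now be written out by hand */
        ACTION_BREAK
    default:
        result = -1;
        break;
        ACTION_BREAK
    }
    return result;
}
```

     Had ACTION_BREAK been defined as "break;", the copy after
     the "return" in case 1 would be the kind of unreachable
     statement that some compilers warn about.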
   1826 
   1827 VALUES AVAILABLE TO THE USER
   1828      This section summarizes the various values available to  the
   1829      user in the rule actions.
   1830 
   1831      -    char *yytext holds the text of the current  token.   It
   1832           may  be  modified but not lengthened (you cannot append
   1833           characters to the end).
   1834 
   1835           If the special directive %array appears  in  the  first
   1836           section  of  the  scanner  description,  then yytext is
   1837           instead declared char yytext[YYLMAX], where YYLMAX is a
   1838           macro  definition  that  you  can redefine in the first
   1839           section if you don't like the default value  (generally
   1840           8KB).    Using   %array   results  in  somewhat  slower
   1841           scanners, but the value of  yytext  becomes  immune  to
   1856           calls to input() and unput(), which potentially destroy
   1857           its value when yytext  is  a  character  pointer.   The
   1858           opposite of %array is %pointer, which is the default.
   1859 
   1860           You cannot  use  %array  when  generating  C++  scanner
   1861           classes (the -+ flag).
   1862 
   1863      -    int yyleng holds the length of the current token.
   1864 
   1865      -    FILE *yyin is the file  which  by  default  flex  reads
   1866           from.   It  may  be  redefined  but doing so only makes
   1867           sense before scanning begins or after an EOF  has  been
   1868           encountered.  Changing it in the midst of scanning will
   1869           have unexpected results since flex buffers  its  input;
          use yyrestart() instead.  Once scanning terminates
          because an end-of-file has been seen, you can point
          yyin at the new input file and then call the scanner
          again to continue scanning.
   1874 
   1875      -    void yyrestart( FILE *new_file ) may be called to point
   1876           yyin at the new input file.  The switch-over to the new
   1877           file is immediate (any previously buffered-up input  is
   1878           lost).   Note  that calling yyrestart() with yyin as an
   1879           argument thus throws away the current input buffer  and
   1880           continues scanning the same input file.
   1881 
   1882      -    FILE *yyout is the file to which ECHO actions are done.
   1883           It can be reassigned by the user.
   1884 
   1885      -    YY_CURRENT_BUFFER returns a YY_BUFFER_STATE  handle  to
   1886           the current buffer.
   1887 
   1888      -    YY_START returns an integer value corresponding to  the
   1889           current start condition.  You can subsequently use this
   1890           value with BEGIN to return to that start condition.
   1891 
   1892 INTERFACING WITH YACC
   1893      One of the main uses of flex is as a companion to  the  yacc
   1894      parser-generator.   yacc  parsers  expect  to call a routine
   1895      named yylex() to find the next input token.  The routine  is
   1896      supposed  to  return  the  type of the next token as well as
   1897      putting any associated value in the global  yylval.  To  use
   1898      flex  with  yacc,  one  specifies  the  -d option to yacc to
   1899      instruct it to generate the file y.tab.h containing  defini-
   1900      tions  of all the %tokens appearing in the yacc input.  This
   1901      file is then included in the flex scanner.  For example,  if
   1902      one of the tokens is "TOK_NUMBER", part of the scanner might
   1903      look like:
   1904 
   1905          %{
   1906          #include "y.tab.h"
   1907          %}
   1908 
   1922          %%
   1923 
   1924          [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
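
     The contract between parser and scanner can be shown
     without either tool.  The following plain C sketch is a
     hand-written stand-in for yylex(): it returns the type of
     the next token and leaves the associated value in a global.
     The names sketch_yylex and sketch_yylval and the numeric
     value of TOK_NUMBER are assumptions made for illustration;
     a real scanner takes TOK_NUMBER from y.tab.h.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token code; with yacc -d the real value comes from
 * y.tab.h. */
enum { TOK_NUMBER = 258 };

/* Stand-in for the global yylval that the parser reads. */
int sketch_yylval;

/* Hand-written stand-in for yylex().  Skips whitespace, then returns
 * the type of the next token and stores its value in sketch_yylval.
 * Returns 0 at end of input; any other single character is returned
 * as its own token type.  *input is advanced past the token. */
int sketch_yylex(const char **input)
{
    const char *p;

    assert(input != NULL && *input != NULL);
    p = *input;

    while (isspace((unsigned char)*p))
        p++;

    if (*p == '\0') {
        *input = p;
        return 0;
    }

    if (isdigit((unsigned char)*p)) {
        char *end;
        sketch_yylval = (int)strtol(p, &end, 10);
        *input = end;
        return TOK_NUMBER;
    }

    *input = p + 1;
    return (unsigned char)*p;
}
```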
   1925 
   1926 
   1927 OPTIONS
   1928      flex has the following options:
   1929 
   1930      -b   Generate backing-up information to lex.backup. This  is
   1931           a  list  of scanner states which require backing up and
   1932           the input characters on which they do  so.   By  adding
   1933           rules   one  can  remove  backing-up  states.   If  all
   1934           backing-up states are eliminated  and  -Cf  or  -CF  is
   1935           used, the generated scanner will run faster (see the -p
   1936           flag).  Only users who wish to squeeze every last cycle
   1937           out  of  their  scanners  need worry about this option.
   1938           (See the section on Performance Considerations below.)
   1939 
   1940      -c   is a do-nothing, deprecated option included  for  POSIX
   1941           compliance.
   1942 
   1943      -d   makes the generated scanner run in debug  mode.   When-
   1944           ever   a   pattern   is   recognized   and  the  global
   1945           yy_flex_debug is non-zero (which is the  default),  the
   1946           scanner will write to stderr a line of the form:
   1947 
   1948               --accepting rule at line 53 ("the matched text")
   1949 
   1950           The line number refers to the location of the  rule  in
   1951           the  file defining the scanner (i.e., the file that was
   1952           fed to flex).  Messages are  also  generated  when  the
   1953           scanner backs up, accepts the default rule, reaches the
   1954           end of its input buffer (or encounters a NUL;  at  this
   1955           point,  the  two  look the same as far as the scanner's
   1956           concerned), or reaches an end-of-file.
   1957 
   1958      -f   specifies fast scanner. No table  compression  is  done
   1959           and  stdio  is bypassed.  The result is large but fast.
   1960           This option is equivalent to -Cfr (see below).
   1961 
   1962      -h   generates a "help" summary of flex's options to  stdout
   1963           and then exits.  -? and --help are synonyms for -h.
   1964 
   1965      -i   instructs flex to generate a case-insensitive  scanner.
   1966           The  case  of  letters given in the flex input patterns
   1967           will be ignored,  and  tokens  in  the  input  will  be
   1968           matched  regardless of case.  The matched text given in
   1969           yytext will have the preserved case (i.e., it will  not
   1970           be folded).
   1971 
   1972      -l   turns on maximum compatibility with the  original  AT&T
   1973           lex  implementation.  Note that this does not mean full
   1988           compatibility.  Use of this option costs a considerable
   1989           amount  of  performance, and it cannot be used with the
   1990           -+, -f, -F, -Cf, or -CF options.  For  details  on  the
   1991           compatibilities  it provides, see the section "Incompa-
   1992           tibilities With Lex And POSIX" below.  This option also
   1993           results  in the name YY_FLEX_LEX_COMPAT being #define'd
   1994           in the generated scanner.
   1995 
   1996      -n   is another do-nothing, deprecated option included  only
   1997           for POSIX compliance.
   1998 
   1999      -p   generates a performance report to stderr.   The  report
   2000           consists  of  comments  regarding  features of the flex
   2001           input file which will cause a serious loss  of  perfor-
   2002           mance  in  the resulting scanner.  If you give the flag
   2003           twice, you will also get  comments  regarding  features
   2004           that lead to minor performance losses.
   2005 
   2006           Note that the use  of  REJECT,  %option  yylineno,  and
   2007           variable  trailing context (see the Deficiencies / Bugs
   2008           section  below)  entails  a   substantial   performance
   2009           penalty;  use  of  yymore(), the ^ operator, and the -I
   2010           flag entail minor performance penalties.
   2011 
   2012      -s   causes the default rule (that unmatched  scanner  input
   2013           is  echoed to stdout) to be suppressed.  If the scanner
   2014           encounters input that does not match any of its  rules,
   2015           it  aborts  with  an  error.  This option is useful for
   2016           finding holes in a scanner's rule set.
   2017 
   2018      -t   instructs flex to write the  scanner  it  generates  to
   2019           standard output instead of lex.yy.c.
   2020 
   2021      -v   specifies that flex should write to stderr a summary of
   2022           statistics regarding the scanner it generates.  Most of
   2023           the statistics are meaningless to the casual flex user,
   2024           but the first line identifies the version of flex (same
   2025           as reported by -V), and the next line  the  flags  used
   2026           when  generating  the scanner, including those that are
   2027           on by default.
   2028 
   2029      -w   suppresses warning messages.
   2030 
   2031      -B   instructs flex to generate a batch scanner,  the  oppo-
   2032           site  of  interactive  scanners  generated  by  -I (see
   2033           below).  In general, you use -B when  you  are  certain
   2034           that your scanner will never be used interactively, and
   2035           you want to squeeze a little more  performance  out  of
   2036           it.   If your goal is instead to squeeze out a lot more
   2037           performance, you  should   be  using  the  -Cf  or  -CF
   2038           options  (discussed  below), which turn on -B automati-
   2039           cally anyway.
   2040 
   2054      -F   specifies that the fast  scanner  table  representation
   2055           should  be used (and stdio bypassed).  This representa-
   2056           tion is about as fast as the full table  representation
   2057           (-f),  and  for some sets of patterns will be consider-
   2058           ably smaller (and for others, larger).  In general,  if
   2059           the  pattern  set contains both "keywords" and a catch-
   2060           all, "identifier" rule, such as in the set:
   2061 
   2062               "case"    return TOK_CASE;
   2063               "switch"  return TOK_SWITCH;
   2064               ...
   2065               "default" return TOK_DEFAULT;
   2066               [a-z]+    return TOK_ID;
   2067 
   2068           then you're better off using the full table representa-
   2069           tion.  If only the "identifier" rule is present and you
   2070           then use a hash table or some such to detect  the  key-
   2071           words, you're better off using -F.
   2072 
   2073           This option is equivalent to -CFr (see below).  It can-
   2074           not be used with -+.
   2075 
   2076      -I   instructs flex to generate an interactive scanner.   An
   2077           interactive  scanner  is  one  that only looks ahead to
   2078           decide what token has been  matched  if  it  absolutely
   2079           must.  It turns out that always looking one extra char-
   2080           acter ahead, even  if  the  scanner  has  already  seen
   2081           enough text to disambiguate the current token, is a bit
   2082           faster than only looking  ahead  when  necessary.   But
   2083           scanners  that always look ahead give dreadful interac-
   2084           tive performance; for example, when a user types a new-
   2085           line,  it  is  not  recognized as a newline token until
   2086           they enter another token, which often means  typing  in
   2087           another whole line.
   2088 
   2089           Flex scanners default to interactive unless you use the
   2090           -Cf  or  -CF  table-compression  options  (see  below).
   2091           That's because if you're looking  for  high-performance
   2092           you  should  be  using  one of these options, so if you
   2093           didn't, flex assumes you'd rather trade off  a  bit  of
   2094           run-time    performance   for   intuitive   interactive
   2095           behavior.  Note also that you cannot use -I in conjunc-
   2096           tion  with  -Cf or -CF. Thus, this option is not really
   2097           needed; it is on by default  for  all  those  cases  in
   2098           which it is allowed.
   2099 
   2100           You can force a scanner to not be interactive by  using
   2101           -B (see above).
   2102 
   2103      -L   instructs  flex  not  to  generate  #line   directives.
   2104           Without this option, flex peppers the generated scanner
   2105           with #line directives so error messages in the  actions
   2120           will  be  correctly  located with respect to either the
   2121           original flex input file (if the errors are due to code
   2122           in  the  input  file),  or  lex.yy.c (if the errors are
   2123           flex's fault -- you should report these sorts of errors
   2124           to the email address given below).
   2125 
   2126      -T   makes flex run in trace mode.  It will generate  a  lot
   2127           of  messages to stderr concerning the form of the input
   2128           and the resultant non-deterministic  and  deterministic
   2129           finite  automata.   This  option  is  mostly for use in
   2130           maintaining flex.
   2131 
     -V   prints the version number to stdout and exits.
          --version is a synonym for -V.
   2134 
   2135      -7   instructs flex to generate a 7-bit scanner,  i.e.,  one
          which can only recognize 7-bit characters in its
   2137           input.  The advantage of using -7 is that the scanner's
   2138           tables  can  be  up to half the size of those generated
   2139           using the -8 option (see below).  The  disadvantage  is
   2140           that  such  scanners often hang or crash if their input
   2141           contains an 8-bit character.
   2142 
   2143           Note, however, that unless you  generate  your  scanner
   2144           using  the -Cf or -CF table compression options, use of
   2145           -7 will save only a small amount of  table  space,  and
          make your scanner considerably less portable.  Flex's
          default behavior is to generate an 8-bit scanner unless
          you use the -Cf or -CF options, in which case flex
          defaults to generating 7-bit scanners unless your site
          was always configured to generate 8-bit scanners (as will
          often be
   2151           the case with non-USA sites).   You  can  tell  whether
   2152           flex  generated a 7-bit or an 8-bit scanner by inspect-
   2153           ing the flag summary in  the  -v  output  as  described
   2154           above.
   2155 
          Note that if you use -Cfe or -CFe (those table
          compression options, but also using equivalence classes
          as discussed below), flex still defaults to generating
   2159           an  8-bit scanner, since usually with these compression
   2160           options full 8-bit tables are not much  more  expensive
   2161           than 7-bit tables.
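
          Where a 7-bit scanner must be used, one way to guard
          against unexpected input is to check the data before
          scanning it.  The following is a sketch in plain C;
          has_8bit_char() is a hypothetical helper and not part
          of flex.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pre-check (not part of flex): returns 1 if any of the
 * len bytes at data has its high bit set, i.e. falls outside the
 * 7-bit range, and 0 otherwise. */
int has_8bit_char(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);

    for (size_t i = 0; i < len; i++)
        if (data[i] & 0x80)
            return 1;
    return 0;
}
```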
   2162 
   2163      -8   instructs flex to generate an 8-bit scanner, i.e.,  one
   2164           which  can  recognize  8-bit  characters.  This flag is
   2165           only needed for scanners generated using -Cf or -CF, as
   2166           otherwise  flex defaults to generating an 8-bit scanner
   2167           anyway.
   2168 
   2169           See the discussion  of  -7  above  for  flex's  default
   2170           behavior  and  the  tradeoffs  between  7-bit and 8-bit
   2171           scanners.
   2172 
   2186      -+   specifies that you want flex to generate a C++  scanner
   2187           class.   See  the  section  on  Generating C++ Scanners
   2188           below for details.
   2189 
   2190      -C[aefFmr]
   2191           controls the degree of table compression and, more gen-
   2192           erally,  trade-offs  between  small  scanners  and fast
   2193           scanners.
   2194 
   2195           -Ca ("align") instructs flex to trade off larger tables
   2196           in the generated scanner for faster performance because
   2197           the elements of  the  tables  are  better  aligned  for
   2198           memory  access and computation.  On some RISC architec-
   2199           tures, fetching  and  manipulating  longwords  is  more
   2200           efficient  than with smaller-sized units such as short-
   2201           words.  This option can double the size of  the  tables
   2202           used by your scanner.
   2203 
   2204           -Ce directs  flex  to  construct  equivalence  classes,
   2205           i.e.,  sets  of characters which have identical lexical
   2206           properties (for example,  if  the  only  appearance  of
   2207           digits  in  the  flex  input  is in the character class
   2208           "[0-9]" then the digits '0', '1', ..., '9' will all  be
   2209           put   in  the  same  equivalence  class).   Equivalence
   2210           classes usually give dramatic reductions in  the  final
   2211           table/object file sizes (typically a factor of 2-5) and
   2212           are pretty cheap performance-wise  (one  array  look-up
   2213           per character scanned).
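
          The idea can be sketched in plain C.  The function
          below groups characters by the set of character
          classes they belong to, which is a simplification of
          what flex does; build_equiv_classes(), its table
          layout and the limit of 32 classes are assumptions of
          the sketch.

```c
#include <assert.h>

#define NCHARS 256

/* Assign an equivalence-class number to every character.
 * member[k][c] is non-zero when character c belongs to the k-th
 * character class used by the patterns; nclasses must be at most 32.
 * Two characters land in the same equivalence class exactly when
 * they belong to the same set of character classes.
 * ec[c] receives the class number (starting at 1) of character c.
 * Returns the number of equivalence classes found. */
int build_equiv_classes(unsigned char member[][NCHARS], int nclasses,
                        int ec[NCHARS])
{
    unsigned long sig[NCHARS];   /* bit k set: c is in class k */
    int count = 0;

    assert(nclasses >= 0 && nclasses <= 32);

    for (int c = 0; c < NCHARS; c++) {
        sig[c] = 0;
        for (int k = 0; k < nclasses; k++)
            if (member[k][c])
                sig[c] |= 1UL << k;
    }

    for (int c = 0; c < NCHARS; c++) {
        ec[c] = 0;
        for (int d = 0; d < c; d++) {
            if (sig[d] == sig[c]) {   /* signature already seen */
                ec[c] = ec[d];
                break;
            }
        }
        if (ec[c] == 0)
            ec[c] = ++count;          /* first member of a new class */
    }
    return count;
}
```

          With the two classes [0-9] and [a-z] as input, the
          ten digits share one class number, the lowercase
          letters share another, and all remaining characters
          fall into a third, so a transition table needs three
          columns instead of 256.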
   2214 
          -Cf specifies that the full scanner tables should be
          generated - flex should not compress the tables by
          taking advantage of similar transition functions for
          different states.
   2219 
   2220           -CF specifies that the alternate fast scanner represen-
   2221           tation  (described  above  under the -F flag) should be
   2222           used.  This option cannot be used with -+.
   2223 
   2224           -Cm directs flex to construct meta-equivalence classes,
   2225           which  are  sets of equivalence classes (or characters,
   2226           if equivalence classes are not  being  used)  that  are
   2227           commonly  used  together.  Meta-equivalence classes are
   2228           often a big win when using compressed tables, but  they
   2229           have  a  moderate  performance  impact (one or two "if"
   2230           tests and one array look-up per character scanned).
   2231 
   2232           -Cr causes the generated scanner to bypass use  of  the
   2233           standard  I/O  library  (stdio)  for input.  Instead of
   2234           calling fread() or getc(), the  scanner  will  use  the
   2235           read()  system  call,  resulting  in a performance gain
   2236           which varies from system to system, but in  general  is
   2237           probably  negligible  unless  you are also using -Cf or
   2252           -CF. Using -Cr can cause strange behavior if, for exam-
   2253           ple,  you  read  from yyin using stdio prior to calling
   2254           the scanner (because the  scanner  will  miss  whatever
   2255           text  your  previous  reads  left  in  the  stdio input
   2256           buffer).
   2257 
   2258           -Cr has no effect if you define YY_INPUT (see The  Gen-
   2259           erated Scanner above).
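
                  The stdio hazard can be reproduced outside flex.  The
                  following is a minimal sketch, not flex code: the
                  function name stdio_hidden_bytes and the sample text
                  are made up for illustration, and it assumes a POSIX
                  system where read() and fileno() are available.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* read(); POSIX */

/*
 * Writes a short text to a temporary file, consumes one character through
 * stdio, then drains the same descriptor with read().  Returns how many
 * bytes read() never saw because stdio had already pulled them into its
 * own buffer, or -1 on error.
 */
long stdio_hidden_bytes(void)
{
    static const char text[] = "first second third\n";
    const long total = (long)strlen(text);
    char buf[64];
    long seen = 0;
    ssize_t n;

    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    if (fwrite(text, 1, (size_t)total, fp) != (size_t)total) {
        fclose(fp);
        return -1;
    }
    rewind(fp);                 /* flush and seek back to the start */

    if (getc(fp) == EOF) {      /* stdio fills its buffer here */
        fclose(fp);
        return -1;
    }

    /* What a -Cr style scanner would do next: bypass stdio entirely. */
    while ((n = read(fileno(fp), buf, sizeof buf)) > 0)
        seen += (long)n;
    fclose(fp);
    if (n < 0)
        return -1;

    /* One byte was returned by getc(); the rest of the shortfall is text
       stranded in the stdio buffer. */
    return total - 1 - seen;
}
```

                  Where the temporary file is fully buffered, the
                  result is greater than zero: those bytes sit in the
                  stdio buffer, where read() cannot reach them.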
   2260 
   2261           A lone -C specifies that the scanner tables  should  be
   2262           compressed  but  neither  equivalence classes nor meta-
   2263           equivalence classes should be used.
   2264 
   2265           The options -Cf or  -CF  and  -Cm  do  not  make  sense
   2266           together - there is no opportunity for meta-equivalence
   2267           classes if the table is not being  compressed.   Other-
   2268           wise  the  options may be freely mixed, and are cumula-
   2269           tive.
   2270 
   2271           The default setting is -Cem, which specifies that  flex
   2272           should   generate   equivalence   classes   and   meta-
   2273           equivalence classes.  This setting provides the highest
   2274           degree of table compression.  You can obtain
   2275           faster-executing scanners at the cost of larger tables,
   2276           with the following generally being true:
   2277 
   2278               slowest & smallest
   2279                     -Cem
   2280                     -Cm
   2281                     -Ce
   2282                     -C
   2283                     -C{f,F}e
   2284                     -C{f,F}
   2285                     -C{f,F}a
   2286               fastest & largest
   2287 
   2288           Note that scanners with the smallest tables are usually
   2289           generated and compiled the quickest, so during develop-
   2290           ment you will usually want to use the default,  maximal
   2291           compression.
   2292 
   2293           -Cfe is often a good compromise between speed and  size
   2294           for production scanners.
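
                  As a rough illustration of what -Ce and -Cf mean for
                  the tables, here is a minimal C sketch.  It is not
                  flex's own algorithm or table layout:
                  build_equiv_classes, next_state and the flat
                  membership array are assumptions made for the
                  example.

```c
#define NCHARS 256

/*
 * Groups the 256 byte values into equivalence classes.
 * sets[i * NCHARS + c] is nonzero when byte c belongs to the i-th
 * character class used by the rules.  Two bytes share an equivalence
 * class exactly when they belong to the same sets.  Fills ec[] with class
 * numbers starting at 0 and returns the number of classes.
 */
int build_equiv_classes(const unsigned char *sets, int nsets, int ec[NCHARS])
{
    int rep[NCHARS];      /* representative byte of each class found so far */
    int nclasses = 0;

    for (int c = 0; c < NCHARS; c++) {
        int k;
        for (k = 0; k < nclasses; k++) {
            int same = 1;
            for (int i = 0; i < nsets && same; i++)
                same = (!sets[i * NCHARS + c] == !sets[i * NCHARS + rep[k]]);
            if (same)
                break;
        }
        if (k == nclasses)
            rep[nclasses++] = c;   /* c starts a new class */
        ec[c] = k;
    }
    return nclasses;
}

/* One scanning step with an uncompressed table: one look-up maps the byte
   to its class, a second fetches the next state. */
int next_state(const int *table, int nclasses, const int ec[NCHARS],
               int state, unsigned char c)
{
    return table[state * nclasses + ec[c]];
}
```

                  With "[0-9]" as the only character class, the sketch
                  yields two classes (digits and everything else), so
                  each table row needs 2 entries instead of 256.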
   2295 
   2296      -ooutput
   2297           directs flex to write the scanner to  the  file  output
   2298           instead  of  lex.yy.c.  If  you  combine -o with the -t
   2299           option, then the scanner is written to stdout  but  its
   2300           #line directives (see the -L option above) refer to the
   2301           file output.
   2302 
   2303      -Pprefix
   2318           changes the default yy prefix  used  by  flex  for  all
   2319           globally-visible variable and function names to instead
   2320           be prefix. For  example,  -Pfoo  changes  the  name  of
   2321           yytext  to  footext.  It  also  changes the name of the
   2322           default output file from lex.yy.c  to  lex.foo.c.  Here
   2323           are all of the names affected:
   2324 
   2325               yy_create_buffer
   2326               yy_delete_buffer
   2327               yy_flex_debug
   2328               yy_init_buffer
   2329               yy_flush_buffer
   2330               yy_load_buffer_state
   2331               yy_switch_to_buffer
   2332               yyin
   2333               yyleng
   2334               yylex
   2335               yylineno
   2336               yyout
   2337               yyrestart
   2338               yytext
   2339               yywrap
   2340 
   2341           (If you are using a C++ scanner, then only  yywrap  and
   2342           yyFlexLexer  are affected.) Within your scanner itself,
   2343           you can still refer to the global variables  and  func-
   2344           tions  using  either  version of their name; but exter-
   2345           nally, they have the modified name.
   2346 
   2347           This option lets you easily link together multiple flex
   2348           programs  into the same executable.  Note, though, that
   2349           using this option also renames  yywrap(),  so  you  now
   2350           must either provide your own (appropriately-named) ver-
   2351           sion of the routine for your scanner,  or  use  %option
   2352           noyywrap,  as  linking with -lfl no longer provides one
   2353           for you by default.
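
                  For instance, a scanner built with -Pfoo that reads a
                  single input could supply the wrap routine itself.
                  A minimal sketch: the name foowrap follows from the
                  -Pfoo example above, and returning 1 is the usual
                  yywrap() way of saying there is no more input.

```c
/* Wrap routine for a scanner generated with -Pfoo.  The scanner calls it
   at end-of-file; returning 1 reports that there is no further input. */
int foowrap(void)
{
    return 1;
}
```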
   2354 
   2355      -Sskeleton_file
   2356           overrides the default skeleton  file  from  which  flex
   2357           constructs its scanners.  You'll never need this option
   2358           unless you are doing flex maintenance or development.
   2359 
   2360      flex also  provides  a  mechanism  for  controlling  options
   2361      within  the  scanner  specification itself, rather than from
   2362      the flex command-line.  This is done  by  including  %option
   2363      directives  in  the  first section of the scanner specifica-
   2364      tion.  You  can  specify  multiple  options  with  a  single
   2365      %option directive, and multiple directives in the first sec-
   2366      tion of your flex input file.
   2367 
   2368      Most options are given simply as names, optionally  preceded
   2369      by  the word "no" (with no intervening whitespace) to negate
   2384      their meaning.  A number are equivalent  to  flex  flags  or
   2385      their negation:
   2386 
   2387          7bit            -7 option
   2388          8bit            -8 option
   2389          align           -Ca option
   2390          backup          -b option
   2391          batch           -B option
   2392          c++             -+ option
   2393 
   2394          caseful or
   2395          case-sensitive  opposite of -i (default)
   2396 
   2397          case-insensitive or
   2398          caseless        -i option
   2399 
   2400          debug           -d option
   2401          default         opposite of -s option
   2402          ecs             -Ce option
   2403          fast            -F option
   2404          full            -f option
   2405          interactive     -I option
   2406          lex-compat      -l option
   2407          meta-ecs        -Cm option
   2408          perf-report     -p option
   2409          read            -Cr option
   2410          stdout          -t option
   2411          verbose         -v option
   2412          warn            opposite of -w option
   2413                          (use "%option nowarn" for -w)
   2414 
   2415          array           equivalent to "%array"
   2416          pointer         equivalent to "%pointer" (default)
   2417 
   2418      Some %option's provide features otherwise not available:
   2419 
   2420      always-interactive
   2421           instructs flex to generate a scanner which always  con-
   2422           siders  its input "interactive".  Normally, on each new
   2423           input file the scanner calls isatty() in an attempt  to
   2424           determine   whether   the  scanner's  input  source  is
   2425           interactive and thus should be read a  character  at  a
   2426           time.   When this option is used, however, then no such
   2427           call is made.
   2428 
   2429      main directs flex to provide a default  main()  program  for
   2430           the  scanner,  which  simply calls yylex(). This option
   2431           implies noyywrap (see below).
   2432 
   2433      never-interactive
   2434           instructs flex to generate a scanner which  never  con-
   2435           siders  its input "interactive" (again, no call made to
   2450           isatty()). This is the opposite of always-interactive.
   2451 
   2452      stack
   2453           enables the use of start condition  stacks  (see  Start
   2454           Conditions above).
   2455 
   2456      stdinit
   2457           if set (i.e., %option  stdinit)  initializes  yyin  and
   2458           yyout  to  stdin  and stdout, instead of the default of
   2459           nil.  Some  existing  lex  programs  depend   on   this
   2460           behavior,  even though it is not compliant with ANSI C,
   2461           which does not require stdin and stdout to be  compile-
   2462           time constant.
   2463 
   2464      yylineno
   2465           directs flex to generate a scanner that  maintains  the
   2466           number  of  the current line read from its input in the
   2467           global variable yylineno. This  option  is  implied  by
   2468           %option lex-compat.
   2469 
   2470      yywrap
   2471           if unset (i.e., %option noyywrap),  makes  the  scanner
   2472           not  call  yywrap()  upon  an  end-of-file,  but simply
   2473           assume that there are no more files to scan (until  the
   2474           user  points  yyin  at  a  new  file  and calls yylex()
   2475           again).
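
             Two of the run-time behaviors described in this list (the
             isatty() test behind the interactive options, and filling
             in a nil input pointer when stdinit is not used) can be
             sketched in C.  This is an illustration only, not code
             from the generated scanner: demo_in, demo_input and
             demo_is_interactive are hypothetical names, and isatty()
             and fileno() assume a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>   /* isatty(); POSIX */

static FILE *demo_in = NULL;   /* stands in for yyin; not a flex name */

/* Without stdinit the input pointer starts out nil, so the default is
   filled in at run time, on first use. */
FILE *demo_input(void)
{
    if (demo_in == NULL)
        demo_in = stdin;
    return demo_in;
}

/* The kind of test a scanner can make on each new input file to decide
   whether to read it a character at a time. */
int demo_is_interactive(FILE *fp)
{
    return fp != NULL && isatty(fileno(fp)) > 0;
}
```

             A scanner built with always-interactive or
             never-interactive skips the second test and uses a fixed
             answer instead.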
   2476 
   2477      flex scans your rule actions to determine  whether  you  use
   2478      the  REJECT  or  yymore()  features.   The reject and yymore
   2479      options are available to override its decision as to whether
   2480      you use the features, either by setting them (e.g., %option
   2481      reject) to indicate the feature is indeed used, or unsetting
   2482      them  to  indicate  it  actually  is not used (e.g., %option
   2483      noyymore).
   2484 
   2485      Three options take string-delimited values, offset with '=':
   2486 
   2487          %option outfile="ABC"
   2488 
   2489      is equivalent to -oABC, and
   2490 
   2491          %option prefix="XYZ"
   2492 
   2493      is equivalent to -PXYZ. Finally,
   2494 
   2495          %option yyclass="foo"
   2496 
   2497      only applies when generating a C++ scanner ( -+ option).  It
   2498      informs  flex  that  you  have  derived foo as a subclass of
   2499      yyFlexLexer, so flex will place your actions in  the  member
   2500      function  foo::yylex()  instead  of yyFlexLexer::yylex(). It
   2501      also generates a yyFlexLexer::yylex() member  function  that
   2516      emits a run-time error (by invoking
   2517      yyFlexLexer::LexerError()) if called.   See  Generating  C++
   2518      Scanners, below, for additional information.
   2519 
   2520      A number of options are available for lint purists who  want
   2521      to suppress the appearance of unneeded routines in the
   2522      generated scanner.  Each of the following, if unset (e.g.,
   2523      %option nounput), results in the corresponding routine not
   2524      appearing in the generated scanner:
   2525 
   2526          input, unput
   2527          yy_push_state, yy_pop_state, yy_top_state
   2528          yy_scan_buffer, yy_scan_bytes, yy_scan_string
   2529 
   2530      (though yy_push_state()  and  friends  won't  appear  anyway
   2531      unless you use %option stack).
   2532 
   2533 PERFORMANCE CONSIDERATIONS
   2534      The main design goal of  flex  is  that  it  generate  high-
   2535      performance  scanners.   It  has  been optimized for dealing
   2536      well with large sets of rules.  Aside from  the  effects  on
   2537      scanner  speed  of the table compression -C options outlined
   2538      above, there are a number of options/actions  which  degrade
   2539      performance.  These are, from most expensive to least:
   2540 
   2541          REJECT
   2542          %option yylineno
   2543          arbitrary trailing context
   2544 
   2545          pattern sets that require backing up
   2546          %array
   2547          %option interactive
   2548          %option always-interactive
   2549 
   2550          '^' beginning-of-line operator
   2551          yymore()
   2552 
   2553      with the first three all being quite expensive and the  last
   2554      two being quite cheap.  Note also that unput() is implemented
   2555      as a routine call that potentially does quite a bit of work,
   2556      while yyless() is a quite cheap macro; so if you are just
   2557      putting back some excess text you scanned, use yyless().
   2558 
   2559      REJECT should be avoided at all costs  when  performance  is
   2560      important.  It is a particularly expensive option.
   2561 
   2562      Getting rid of backing up is messy and often may be an
   2563      enormous amount of work for a complicated scanner.  In
   2564      principle, one begins by using the -b flag to generate a
   2565      lex.backup file.  For example, on the input
   2566 
   2567          %%
   2582          foo        return TOK_KEYWORD;
   2583          foobar     return TOK_KEYWORD;
   2584 
   2585      the file looks like:
   2586 
   2587          State #6 is non-accepting -
   2588           associated rule line numbers:
   2589                 2       3
   2590           out-transitions: [ o ]
   2591           jam-transitions: EOF [ \001-n  p-\177 ]
   2592 
   2593          State #8 is non-accepting -
   2594           associated rule line numbers:
   2595                 3
   2596           out-transitions: [ a ]
   2597           jam-transitions: EOF [ \001-`  b-\177 ]
   2598 
   2599          State #9 is non-accepting -
   2600           associated rule line numbers:
   2601                 3
   2602           out-transitions: [ r ]
   2603           jam-transitions: EOF [ \001-q  s-\177 ]
   2604 
   2605          Compressed tables always back up.
   2606 
   2607      The first few lines tell us that there's a scanner state  in
   2608      which  it  can  make  a  transition on an 'o' but not on any
   2609      other character,  and  that  in  that  state  the  currently
   2610      scanned text does not match any rule.  The state occurs when
   2611      trying to match the rules found at lines  2  and  3  in  the
   2612      input  file.  If the scanner is in that state and then reads
   2613      something other than an 'o', it will have to back up to find
   2614      a  rule  which is matched.  With a bit of headscratching one
   2615      can see that this must be the state it's in when it has seen
   2616      "fo".   When  this  has  happened,  if  anything  other than
   2617      another 'o' is seen, the scanner will have  to  back  up  to
   2618      simply match the 'f' (by the default rule).
   2619 
   2620      The comment regarding State #8 indicates there's  a  problem
   2621      when  "foob"  has  been  scanned.   Indeed, on any character
   2622      other than an 'a', the scanner  will  have  to  back  up  to
   2623      accept  "foo".  Similarly, the comment for State #9 concerns
   2624      when "fooba" has been scanned and an 'r' does not follow.
   2625 
   2626      The final comment reminds us that there's no point going  to
   2627      all the trouble of removing backing up from the rules unless
   2628      we're using -Cf or -CF, since there's  no  performance  gain
   2629      doing so with compressed scanners.
   2630 
   2631      The way to remove the backing up is to add "error" rules:
   2632 
   2633          %%
   2648          foo         return TOK_KEYWORD;
   2649          foobar      return TOK_KEYWORD;
   2650 
   2651          fooba       |
   2652          foob        |
   2653          fo          {
   2654                      /* false alarm, not really a keyword */
   2655                      return TOK_ID;
   2656                      }
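
             The backing up described above, and the effect of the
             error rules, can be imitated with a small C sketch.  It
             is not how flex works internally (flex builds a DFA, this
             compares strings); scan_literals and struct scan_result
             are hypothetical names, and only literal rules plus the
             default single-character rule are modeled.

```c
#include <stddef.h>
#include <string.h>

/* Result of one scan in this sketch (not a flex data structure). */
struct scan_result {
    size_t len;        /* length of the token finally matched */
    size_t backed_up;  /* characters consumed past it, then given back */
};

/*
 * Longest-match scan of `in` against literal rules, plus the default
 * rule that matches any single character.
 */
struct scan_result scan_literals(const char *const *rules, size_t nrules,
                                 const char *in)
{
    size_t pos = 0;          /* characters consumed on valid transitions */
    size_t last_accept = 0;  /* length of the longest rule matched so far */
    struct scan_result r;

    while (in[pos] != '\0') {
        int extends = 0, accepts = 0;
        for (size_t k = 0; k < nrules; k++) {
            size_t klen = strlen(rules[k]);
            if (klen > pos && strncmp(rules[k], in, pos + 1) == 0) {
                extends = 1;             /* some rule continues with in[pos] */
                if (klen == pos + 1)
                    accepts = 1;         /* and one rule ends exactly here */
            }
        }
        if (!extends)
            break;                       /* jam: no transition on in[pos] */
        pos++;
        if (accepts)
            last_accept = pos;
    }

    if (last_accept > 0)
        r.len = last_accept;
    else
        r.len = (in[0] != '\0') ? 1 : 0; /* fall back to the default rule */
    r.backed_up = (pos > r.len) ? pos - r.len : 0;
    return r;
}
```

             With only the rules foo and foobar, scanning "foobaz"
             gives a token of length 3 after backing up over 2
             characters.  With fooba, foob and fo added as rules, the
             same input gives a token of length 5 and no backing up.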
   2657 
   2658 
   2659      Eliminating backing up among a list of keywords can also  be
   2660      done using a "catch-all" rule:
   2661 
   2662          %%
   2663          foo         return TOK_KEYWORD;
   2664          foobar      return TOK_KEYWORD;
   2665 
   2666          [a-z]+      return TOK_ID;
   2667 
   2668      This is usually the best solution when appropriate.
   2669 
   2670      Backing up messages tend to cascade.  With a complicated set
   2671      of  rules it's not uncommon to get hundreds of messages.  If
   2672      one can decipher them, though, it often only takes  a  dozen
   2673      or so rules to eliminate the backing up (though it's easy to
   2674      make a mistake and have an error rule accidentally  match  a
   2675      valid  token.   A  possible  future  flex feature will be to
   2676      automatically add rules to eliminate backing up).
   2677 
   2678      It's important to keep in mind that you gain the benefits of
   2679      eliminating  backing up only if you eliminate every instance
   2680      of backing up.  Leaving just one means you gain nothing.
   2681 
   2682      Variable trailing context (where both the leading and trail-
   2683      ing  parts  do  not  have a fixed length) entails almost the
   2684      same performance loss as  REJECT  (i.e.,  substantial).   So
   2685      when possible a rule like:
   2686 
   2687          %%
   2688          mouse|rat/(cat|dog)   run();
   2689 
   2690      is better written:
   2691 
   2692          %%
   2693          mouse/cat|dog         run();
   2694          rat/cat|dog           run();
   2695 
   2696      or as
   2697 
   2698          %%
   2699          mouse|rat/cat         run();
   2714          mouse|rat/dog         run();
   2715 
   2716      Note that here the special '|' action does not  provide  any
   2717      savings,  and can even make things worse (see Deficiencies /
   2718      Bugs below).
   2719 
   2720      Another area where the user can increase a scanner's perfor-
   2721      mance  (and  one that's easier to implement) arises from the
   2722      fact that the longer the  tokens  matched,  the  faster  the
   2723      scanner will run.  This is because with long tokens the pro-
   2724      cessing of most input characters takes place in the  (short)
   2725      inner  scanning  loop, and does not often have to go through
   2726      the additional work of setting up the  scanning  environment
   2727      (e.g.,  yytext)  for  the  action.  Recall the scanner for C
   2728      comments:
   2729 
   2730          %x comment
   2731          %%
   2732                  int line_num = 1;
   2733 
   2734          "/*"         BEGIN(comment);
   2735 
   2736          <comment>[^*\n]*
   2737          <comment>"*"+[^*/\n]*
   2738          <comment>\n             ++line_num;
   2739          <comment>"*"+"/"        BEGIN(INITIAL);
   2740 
   2741      This could be sped up by writing it as:
   2742 
   2743          %x comment
   2744          %%
   2745                  int line_num = 1;
   2746 
   2747          "/*"         BEGIN(comment);
   2748 
   2749          <comment>[^*\n]*
   2750          <comment>[^*\n]*\n      ++line_num;
   2751          <comment>"*"+[^*/\n]*
   2752          <comment>"*"+[^*/\n]*\n ++line_num;
   2753          <comment>"*"+"/"        BEGIN(INITIAL);
   2754 
   2755      Now instead of each  newline  requiring  the  processing  of
   2756      another  action,  recognizing  the newlines is "distributed"
   2757      over the other rules to keep the matched  text  as  long  as
   2758      possible.   Note  that  adding  rules does not slow down the
   2759      scanner!  The speed of the scanner  is  independent  of  the
   2760      number  of  rules or (modulo the considerations given at the
   2761      beginning of this section) how  complicated  the  rules  are
   2762      with regard to operators such as '*' and '|'.
   2763 
   2764      A final example in speeding up a scanner: suppose  you  want
   2765      to  scan through a file containing identifiers and keywords,
   2780      one per line and with no other  extraneous  characters,  and
   2781      recognize all the keywords.  A natural first approach is:
   2782 
   2783          %%
   2784          asm      |
   2785          auto     |
   2786          break    |
   2787          ... etc ...
   2788          volatile |
   2789          while    /* it's a keyword */
   2790 
   2791          .|\n     /* it's not a keyword */
   2792 
   2793      To eliminate the back-tracking, introduce a catch-all rule:
   2794 
   2795          %%
   2796          asm      |
   2797          auto     |
   2798          break    |
   2799          ... etc ...
   2800          volatile |
   2801          while    /* it's a keyword */
   2802 
   2803          [a-z]+   |
   2804          .|\n     /* it's not a keyword */
   2805 
   2806      Now, if it's guaranteed that there's exactly  one  word  per
   2807      line,  then  we  can reduce the total number of matches by a
   2808      half by merging in the recognition of newlines with that  of
   2809      the other tokens:
   2810 
   2811          %%
   2812          asm\n    |
   2813          auto\n   |
   2814          break\n  |
   2815          ... etc ...
   2816          volatile\n |
   2817          while\n  /* it's a keyword */
   2818 
   2819          [a-z]+\n |
   2820          .|\n     /* it's not a keyword */
   2821 
   2822      One has to be careful here,  as  we  have  now  reintroduced
   2823      backing  up  into the scanner.  In particular, while we know
   2824      that there will never be any characters in the input  stream
   2825      other  than letters or newlines, flex can't figure this out,
   2826      and it will plan for possibly needing to back up when it has
   2827      scanned  a  token like "auto" and then the next character is
   2828      something other than a newline or a letter.   Previously  it
   2829      would  then  just match the "auto" rule and be done, but now
   2830      it has no "auto" rule, only an "auto\n" rule.  To  eliminate
   2831      the possibility of backing up, we could either duplicate all
   2846      rules but without final newlines, or, since we never  expect
   2847      to encounter such an input and therefore don't care how it's
   2848      classified, we can introduce one more catch-all  rule,  this
   2849      one which doesn't include a newline:
   2850 
   2851          %%
   2852          asm\n    |
   2853          auto\n   |
   2854          break\n  |
   2855          ... etc ...
   2856          volatile\n |
   2857          while\n  /* it's a keyword */
   2858 
   2859          [a-z]+\n |
   2860          [a-z]+   |
   2861          .|\n     /* it's not a keyword */
   2862 
   2863      Compiled with -Cf, this is about as fast as one  can  get  a
   2864      flex scanner to go for this particular problem.
   2865 
   2866      A final note: flex is slow when matching NUL's, particularly
   2867      when  a  token  contains multiple NUL's.  It's best to write
   2868      rules which match short amounts of text if it's  anticipated
   2869      that the text will often include NUL's.
   2870 
   2871      Another final note regarding performance: as mentioned above
   2872      in  the section How the Input is Matched, dynamically resiz-
   2873      ing yytext to accommodate huge  tokens  is  a  slow  process
   2874      because  it presently requires that the (huge) token be res-
   2875      canned from the beginning.  Thus if  performance  is  vital,
   2876      you  should  attempt to match "large" quantities of text but
   2877      not "huge" quantities, where the cutoff between the  two  is
   2878      at about 8K characters/token.
   2879 
   2880 GENERATING C++ SCANNERS
   2881      flex provides two different ways to  generate  scanners  for
   2882      use  with C++.  The first way is to simply compile a scanner
   2883      generated by flex using a C++ compiler instead of a C
   2884      compiler.  You should not encounter any compilation errors
   2885      (please report any you find to the email  address  given  in
   2886      the  Author  section  below).   You can then use C++ code in
   2887      your rule actions instead of C code.  Note that the  default
   2888      input  source  for  your  scanner  remains yyin, and default
   2889      echoing is still done to yyout. Both of these remain FILE  *
   2890      variables and not C++ streams.
   2891 
   2892      You can also use flex to generate a C++ scanner class, using
   2893      the  -+  option  (or,  equivalently,  %option c++), which is
   2894      automatically specified if the name of the  flex  executable
   2895      ends  in a '+', such as flex++. When using this option, flex
   2896      defaults to generating the scanner  to  the  file  lex.yy.cc
   2897      instead  of  lex.yy.c.  The  generated  scanner includes the
   2912      header file FlexLexer.h, which defines the interface to  two
   2913      C++ classes.
   2914 
   2915      The first class, FlexLexer, provides an abstract base  class
   2916      defining  the  general scanner class interface.  It provides
   2917      the following member functions:
   2918 
   2919      const char* YYText()
   2920           returns the text of the most  recently  matched  token,
   2921           the equivalent of yytext.
   2922 
   2923      int YYLeng()
   2924           returns the length of the most recently matched  token,
   2925           the equivalent of yyleng.
   2926 
   2927      int lineno() const
   2928           returns the current  input  line  number  (see  %option
   2929           yylineno), or 1 if %option yylineno was not used.
   2930 
   2931      void set_debug( int flag )
   2932           sets the debugging flag for the scanner, equivalent  to
   2933           assigning  to  yy_flex_debug  (see  the Options section
   2934           above).  Note that you must  build  the  scanner  using
   2935           %option debug to include debugging information in it.
   2936 
   2937      int debug() const
   2938           returns the current setting of the debugging flag.
   2939 
   2940      Also   provided   are   member   functions   equivalent   to
   2941      yy_switch_to_buffer(),  yy_create_buffer() (though the first
   2942      argument is an istream* object pointer  and  not  a  FILE*),
   2943      yy_flush_buffer(),   yy_delete_buffer(),   and   yyrestart()
   2944      (again, the first argument is an istream* object pointer).
   2945 
   2946      The second class  defined  in  FlexLexer.h  is  yyFlexLexer,
   2947      which  is  derived  from FlexLexer. It defines the following
   2948      additional member functions:
   2949 
   2950      yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )
   2951           constructs a yyFlexLexer object using the given streams
   2952           for  input  and  output.  If not specified, the streams
   2953           default to cin and cout, respectively.
   2954 
   2955      virtual int yylex()
   2956           performs the same role as  yylex()  does  for  ordinary
   2957           flex  scanners:  it  scans  the input stream, consuming
   2958           tokens, until a rule's action returns a value.  If  you
   2959           derive a subclass S from yyFlexLexer and want to access
   2960           the member functions and variables of S inside yylex(),
   2961           then you need to use %option yyclass="S" to inform flex
   2962           that you will be using that subclass instead of yyFlex-
   2963           Lexer.   In   this   case,   rather   than   generating
   2978           yyFlexLexer::yylex(), flex  generates  S::yylex()  (and
   2979           also  generates a dummy yyFlexLexer::yylex() that calls
   2980           yyFlexLexer::LexerError() if called).
   2981 
   2982      virtual void switch_streams( istream* new_in = 0,
   2983                                   ostream* new_out = 0 )
   2984           reassigns yyin to new_in (if non-nil) and yyout to new_out
   2985           (ditto), deleting the previous input buffer if yyin is reassigned.
   2986 
   2987      int yylex( istream* new_in, ostream* new_out = 0 )
   2988           first switches the input  streams  via  switch_streams(
   2989           new_in,  new_out  )  and  then  returns  the  value  of
   2990           yylex().
   2991 
   2992      In addition, yyFlexLexer  defines  the  following  protected
   2993      virtual  functions which you can redefine in derived classes
   2994      to tailor the scanner:
   2995 
   2996      virtual int LexerInput( char* buf, int max_size )
   2997           reads up to max_size characters into  buf  and  returns
   2998           the  number  of  characters  read.  To indicate end-of-
   2999           input, return 0 characters.   Note  that  "interactive"
   3000           scanners  (see  the  -B  and -I flags) define the macro
   3001           YY_INTERACTIVE. If you redefine LexerInput()  and  need
   3002           to  take  different actions depending on whether or not
   3003           the scanner might  be  scanning  an  interactive  input
   3004           source,  you can test for the presence of this name via
   3005           #ifdef.
   3006 
   3007      virtual void LexerOutput( const char* buf, int size )
   3008           writes out size characters from the buffer buf,  which,
   3009           while NUL-terminated, may also contain "internal" NUL's
   3010           if the scanner's rules can match  text  with  NUL's  in
   3011           them.
   3012 
   3013      virtual void LexerError( const char* msg )
   3014           reports a fatal error message.  The default version  of
   3015           this function writes the message to the stream cerr and
   3016           exits.
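
The LexerInput() contract described above (copy at most max_size characters, return the count, return 0 at end-of-input) can be sketched as follows. This is only an illustration under stated assumptions: StringSource is a hypothetical free-standing class, written that way so the sketch compiles without <FlexLexer.h>. In a real scanner the same body would go in an override of LexerInput() in a class derived from yyFlexLexer.

```cpp
#include <algorithm>
#include <cstring>
#include <string>

// Hypothetical stand-in for a class derived from yyFlexLexer; it is
// free-standing so that the sketch compiles without <FlexLexer.h>.
class StringSource
    {
public:
    explicit StringSource( const std::string& text )
        : text_( text ), pos_( 0 ) {}

    // Same contract as LexerInput(): copy up to max_size characters
    // into buf and return how many were copied; 0 means end-of-input.
    int LexerInput( char* buf, int max_size )
        {
        if ( max_size <= 0 || pos_ >= text_.size() )
            return 0;

        std::size_t n = std::min( text_.size() - pos_,
                                  static_cast<std::size_t>( max_size ) );
        std::memcpy( buf, text_.data() + pos_, n );
        pos_ += n;
        return static_cast<int>( n );
        }

private:
    std::string text_;   // the text to be scanned
    std::size_t pos_;    // index of the next unread character
    };
```

Calling LexerInput() repeatedly on a StringSource built from "hello" with max_size 3 yields 3 characters, then 2, then 0.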
   3017 
   3018      Note that a yyFlexLexer object contains its entire  scanning
   3019      state.   Thus  you  can use such objects to create reentrant
   3020      scanners.  You can instantiate  multiple  instances  of  the
   3021      same  yyFlexLexer  class,  and you can also combine multiple
   3022      C++ scanner classes together in the same program  using  the
   3023      -P option discussed above.
   3024 
   3025      Finally, note that the %array feature is  not  available  to
   3026      C++ scanner classes; you must use %pointer (the default).
   3027 
   3028      Here is an example of a simple C++ scanner:
   3029 
   3044              // An example of using the flex C++ scanner class.
   3045 
   3046          %{
   3047          int mylineno = 0;
   3048          %}
   3049 
   3050          string  \"[^\n"]+\"
   3051 
   3052          ws      [ \t]+
   3053 
   3054          alpha   [A-Za-z]
   3055          dig     [0-9]
   3056          name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
   3057          num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
   3058          num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
   3059          number  {num1}|{num2}
   3060 
   3061          %%
   3062 
   3063          {ws}    /* skip blanks and tabs */
   3064 
   3065          "/*"    {
   3066                  int c;
   3067 
   3068                  while((c = yyinput()) != 0)
   3069                      {
   3070                      if(c == '\n')
   3071                          ++mylineno;
   3072 
   3073                      else if(c == '*')
   3074                          {
   3075                          if((c = yyinput()) == '/')
   3076                              break;
   3077                          else
   3078                              unput(c);
   3079                          }
   3080                      }
   3081                  }
   3082 
   3083          {number}  cout << "number " << YYText() << '\n';
   3084 
   3085          \n        mylineno++;
   3086 
   3087          {name}    cout << "name " << YYText() << '\n';
   3088 
   3089          {string}  cout << "string " << YYText() << '\n';
   3090 
   3091          %%
   3092 
   3093          int main( int /* argc */, char** /* argv */ )
   3094              {
   3095              FlexLexer* lexer = new yyFlexLexer;
   3110              while(lexer->yylex() != 0)
   3111                  ;
   3112              return 0;
   3113              }

   3114      If you want to create multiple  (different)  lexer  classes,
   3115      you  use  the -P flag (or the prefix= option) to rename each
   3116      yyFlexLexer to some other xxFlexLexer. You then can  include
   3117      <FlexLexer.h>  in  your  other sources once per lexer class,
   3118      first renaming yyFlexLexer as follows:
   3119 
   3120          #undef yyFlexLexer
   3121          #define yyFlexLexer xxFlexLexer
   3122          #include <FlexLexer.h>
   3123 
   3124          #undef yyFlexLexer
   3125          #define yyFlexLexer zzFlexLexer
   3126          #include <FlexLexer.h>
   3127 
   3128      if, for example, you used %option  prefix="xx"  for  one  of
   3129      your scanners and %option prefix="zz" for the other.
   3130 
   3131      IMPORTANT: the present form of the scanning class is experi-
   3132      mental and may change considerably between major releases.
   3133 
   3134 INCOMPATIBILITIES WITH LEX AND POSIX
   3135      flex is a rewrite of the AT&T Unix lex tool (the two  imple-
   3136      mentations  do not share any code, though), with some exten-
   3137      sions and incompatibilities, both of which are of concern to
   3138      those who wish to write scanners acceptable to either imple-
   3139      mentation.  Flex is  fully  compliant  with  the  POSIX  lex
   3140      specification,   except   that   when  using  %pointer  (the
   3141      default), a call to unput() destroys the contents of yytext,
   3142      which is counter to the POSIX specification.
   3143 
   3144      In this section we discuss all of the known areas of  incom-
   3145      patibility  between flex, AT&T lex, and the POSIX specifica-
   3146      tion.
   3147 
   3148      flex's -l option turns on  maximum  compatibility  with  the
   3149      original  AT&T  lex  implementation,  at the cost of a major
   3150      loss in the generated scanner's performance.  We note  below
   3151      which incompatibilities can be overcome using the -l option.
   3152 
   3153      flex is fully compatible with lex with the following  excep-
   3154      tions:
   3155 
   3156      -    The undocumented lex scanner internal variable yylineno
   3157           is not supported unless -l or %option yylineno is used.
   3158 
   3159           yylineno should be maintained on  a  per-buffer  basis,
   3160           rather  than  a  per-scanner  (single  global variable)
   3161           basis.
   3162 
   3176           yylineno is not part of the POSIX specification.
   3177 
   3178      -    The input() routine is not redefinable, though  it  may
   3179           be  called  to  read  characters following whatever has
   3180           been matched by a rule.  If input() encounters an  end-
   3181           of-file  the  normal  yywrap()  processing  is done.  A
   3182           ``real'' end-of-file is returned by input() as EOF.
   3183 
   3184           Input is instead controlled by  defining  the  YY_INPUT
   3185           macro.
   3186 
   3187           The flex restriction that input() cannot  be  redefined
   3188           is  in  accordance  with the POSIX specification, which
   3189           simply does not specify  any  way  of  controlling  the
   3190           scanner's input other than by making an initial assign-
   3191           ment to yyin.
   3192 
   3193      -    The unput() routine is not redefinable.  This  restric-
   3194           tion is in accordance with POSIX.
   3195 
   3196      -    flex scanners are not as reentrant as lex scanners.  In
   3197           particular,  if  you have an interactive scanner and an
   3198           interrupt handler which long-jumps out of the  scanner,
   3199           and  the  scanner is subsequently called again, you may
   3200           get the following message:
   3201 
   3202               fatal flex scanner internal error--end of buffer missed
   3203 
   3204           To reenter the scanner, first use
   3205 
   3206               yyrestart( yyin );
   3207 
   3208           Note that this call will throw away any buffered input;
   3209           usually  this  isn't  a  problem  with  an  interactive
   3210           scanner.
   3211 
   3212           Also note that flex C++ scanner classes are  reentrant,
   3213           so  if  using  C++ is an option for you, you should use
   3214           them instead.  See "Generating C++ Scanners" above  for
   3215           details.
   3216 
   3217      -    output() is not supported.  Output from the ECHO  macro
   3218           is done to the file-pointer yyout (default stdout).
   3219 
   3220           output() is not part of the POSIX specification.
   3221 
   3222      -    lex does not support exclusive start  conditions  (%x),
   3223           though they are in the POSIX specification.
   3224 
   3225      -    When definitions are expanded, flex  encloses  them  in
   3226           parentheses.  With lex, the following:
   3227 
   3242               NAME    [A-Z][A-Z0-9]*
   3243               %%
   3244               foo{NAME}?      printf( "Found it\n" );
   3245               %%
   3246 
   3247           will not match the string "foo" because when the  macro
   3248           is  expanded  the rule is equivalent to "foo[A-Z][A-Z0-
   3249           9]*?" and the precedence is such that the '?' is  asso-
   3250           ciated  with  "[A-Z0-9]*".  With flex, the rule will be
   3251           expanded to "foo([A-Z][A-Z0-9]*)?" and  so  the  string
   3252           "foo" will match.
   3253 
   3254           Note that if the definition begins with ^ or ends  with
   3255           $  then  it  is not expanded with parentheses, to allow
   3256           these operators to appear in definitions without losing
   3257           their  special  meanings.   But the <s>, /, and <<EOF>>
   3258           operators cannot be used in a flex definition.
   3259 
   3260           Using -l results in the lex behavior of no  parentheses
   3261           around the definition.
   3262 
   3263           The POSIX  specification  is  that  the  definition  be
   3264           enclosed in parentheses.
   3265 
   3266      -    Some implementations of lex allow a  rule's  action  to
   3267           begin  on  a  separate  line, if the rule's pattern has
   3268           trailing whitespace:
   3269 
   3270               %%
   3271               foo|bar<space here>
   3272                 { foobar_action(); }
   3273 
   3274           flex does not support this feature.
   3275 
   3276      -    The lex %r (generate a Ratfor scanner)  option  is  not
   3277           supported.  It is not part of the POSIX specification.
   3278 
   3279      -    After a call to unput(), yytext is undefined until  the
   3280           next  token  is  matched,  unless the scanner was built
   3281           using %array. This is not the  case  with  lex  or  the
   3282           POSIX specification.  The -l option does away with this
   3283           incompatibility.
   3284 
   3285      -    The precedence of the {} (numeric  range)  operator  is
   3286           different.   lex  interprets  "abc{1,3}" as "match one,
   3287           two, or  three  occurrences  of  'abc'",  whereas  flex
   3288           interprets  it  as "match 'ab' followed by one, two, or
   3289           three occurrences of 'c'".  The latter is in  agreement
   3290           with the POSIX specification.
   3291 
   3292      -    The precedence of the ^  operator  is  different.   lex
   3293           interprets  "^foo|bar"  as  "match  either 'foo' at the
   3308           beginning of a line, or 'bar' anywhere",  whereas  flex
   3309           interprets  it  as "match either 'foo' or 'bar' if they
   3310           come at the beginning of a line".   The  latter  is  in
   3311           agreement with the POSIX specification.
   3312 
   3313      -    The special table-size declarations  such  as  %a  sup-
   3314           ported  by  lex are not required by flex scanners; flex
   3315           ignores them.
   3316 
   3317      -    The name FLEX_SCANNER is #define'd so scanners  may  be
   3318           written  for use with either flex or lex. Scanners also
   3319           include YY_FLEX_MAJOR_VERSION and YY_FLEX_MINOR_VERSION
   3320           indicating  which version of flex generated the scanner
   3321           (for example, for the 2.5 release, these defines  would
   3322           be 2 and 5 respectively).
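
The three operator-precedence differences listed above (definition expansion in foo{NAME}?, the {} range operator, and the ^ operator) can be tried out with ordinary regular expressions. The sketch below is not flex or lex code: it uses std::regex, and it spells out each interpretation with explicit parentheses so that the grouping does not depend on the regex engine. The helper and constant names are made up for illustration.

```cpp
#include <regex>
#include <string>

// True if the whole of text matches pattern.
bool whole_match( const std::string& pattern, const std::string& text )
    {
    return std::regex_match( text, std::regex( pattern ) );
    }

// True if pattern matches somewhere in text.
bool found_in( const std::string& pattern, const std::string& text )
    {
    return std::regex_search( text, std::regex( pattern ) );
    }

// foo{NAME}? with NAME defined as [A-Z][A-Z0-9]*
const char* const lex_expansion  = "foo[A-Z]([A-Z0-9]*)?";  // '?' binds to the last piece
const char* const flex_expansion = "foo([A-Z][A-Z0-9]*)?";  // '?' covers the definition

// abc{1,3}
const char* const lex_range  = "(abc){1,3}";  // repeat all of "abc"
const char* const flex_range = "ab(c{1,3})";  // repeat only 'c'

// ^foo|bar
const char* const lex_anchor  = "(^foo)|bar";  // anchor applies to "foo" alone
const char* const flex_anchor = "^(foo|bar)";  // anchor applies to both alternatives
```

With these definitions, whole_match( flex_expansion, "foo" ) is true while whole_match( lex_expansion, "foo" ) is false, "abcabc" matches only lex_range, and found_in( lex_anchor, "xbar" ) is true while found_in( flex_anchor, "xbar" ) is false.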
   3323 
   3324      The following flex features are not included in lex  or  the
   3325      POSIX specification:
   3326 
   3327          C++ scanners
   3328          %option
   3329          start condition scopes
   3330          start condition stacks
   3331          interactive/non-interactive scanners
   3332          yy_scan_string() and friends
   3333          yyterminate()
   3334          yy_set_interactive()
   3335          yy_set_bol()
   3336          YY_AT_BOL()
   3337          <<EOF>>
   3338          <*>
   3339          YY_DECL
   3340          YY_START
   3341          YY_USER_ACTION
   3342          YY_USER_INIT
   3343          #line directives
   3344          %{}'s around actions
   3345          multiple actions on a line
   3346 
   3347      plus almost all of the flex flags.  The last feature in  the
   3348      list  refers to the fact that with flex you can put multiple
   3349      actions on the same line, separated with semi-colons,  while
   3350      with lex, the following
   3351 
   3352          foo    handle_foo(); ++num_foos_seen;
   3353 
   3354      is (rather surprisingly) truncated to
   3355 
   3356          foo    handle_foo();
   3357 
   3358      flex does not truncate the action.   Actions  that  are  not
   3359      enclosed  in  braces are simply terminated at the end of the
   3374      line.
   3375 
   3376 DIAGNOSTICS
   3377      warning, rule cannot be matched  indicates  that  the  given
   3378      rule  cannot  be matched because it follows other rules that
   3379      will always match the same text as it.  For example, in  the
   3380      following  "foo" cannot be matched because it comes after an
   3381      identifier "catch-all" rule:
   3382 
   3383          [a-z]+    got_identifier();
   3384          foo       got_foo();
   3385 
   3386      Using REJECT in a scanner suppresses this warning.
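
To see why "foo" can never be selected in the example above, recall that the scanner prefers the longest match and, among equally long matches, the rule listed first. The following is a rough model of that selection and not flex's actual algorithm: pick_rule() is a hypothetical helper built on std::regex, which happens to give the longest match for simple patterns like these but is not a longest-match engine in general.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the index of the rule chosen at the start of input: the
// longest match wins, and ties go to the rule listed first.
// Returns -1 if no rule matches there.
int pick_rule( const std::vector<std::string>& rules,
               const std::string& input )
    {
    int best = -1;
    std::size_t best_len = 0;

    for ( std::size_t i = 0; i < rules.size(); ++i )
        {
        std::smatch m;
        std::regex re( rules[i] );

        // match_continuous restricts the search to matches that
        // begin at the first character of input.
        if ( std::regex_search( input, m, re,
                                std::regex_constants::match_continuous ) )
            {
            std::size_t len = static_cast<std::size_t>( m.length( 0 ) );

            // Strictly longer only, so an earlier rule keeps a tie.
            if ( best == -1 || len > best_len )
                {
                best = static_cast<int>( i );
                best_len = len;
                }
            }
        }

    return best;
    }
```

With the rules in the order { "[a-z]+", "foo" }, pick_rule() returns 0 for "foo" because the tie goes to the first rule, so rule 1 is unreachable. Listing "foo" first makes it reachable, while longer identifiers such as "foobar" still go to the catch-all rule.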
   3387 
   3388      warning, -s option given but default  rule  can  be  matched
   3389      means  that  it  is  possible  (perhaps only in a particular
   3390      start condition) that the default  rule  (match  any  single
   3391      character)  is  the  only  one  that will match a particular
   3392      input.  Since -s was given, presumably this is not intended.
   3393 
   3394      reject_used_but_not_detected undefined or
   3395      yymore_used_but_not_detected undefined - These errors can
   3396      occur at compile time.  They indicate that the scanner  uses
   3397      REJECT  or yymore() but that flex failed to notice the fact,
   3398      meaning that flex scanned the first two sections looking for
   3399      occurrences  of  these  actions  and failed to find any, but
   3400      somehow you snuck some in (via a #include  file,  for  exam-
   3401      ple).   Use  %option reject or %option yymore to indicate to
   3402      flex that you really do use these features.
   3403 
   3404      flex scanner jammed - a scanner compiled with -s has encoun-
   3405      tered  an  input  string  which wasn't matched by any of its
   3406      rules.  This error can also occur due to internal problems.
   3407 
   3408      token too large, exceeds YYLMAX - your scanner  uses  %array
   3409      and one of its rules matched a string longer than the YYLMAX
   3410      constant (8K bytes by default).  You can increase the  value
   3411      by  #define'ing  YYLMAX  in  the definitions section of your
   3412      flex input.
   3413 
   3414      scanner requires -8 flag to use the  character  'x'  -  Your
   3415      scanner specification includes recognizing the 8-bit charac-
   3416      ter 'x' and you did  not  specify  the  -8  flag,  and  your
   3417      scanner  defaulted  to 7-bit because you used the -Cf or -CF
   3418      table compression options.  See the  discussion  of  the  -7
   3419      flag for details.
   3420 
   3421      flex scanner push-back overflow - you used unput()  to  push
   3422      back  so  much text that the scanner's buffer could not hold
   3423      both the pushed-back text and the current token  in  yytext.
   3424      Ideally  the scanner should dynamically resize the buffer in
   3425      this case, but at present it does not.
   3426 
   3440      input buffer overflow, can't enlarge buffer because  scanner
   3441      uses  REJECT  -  the  scanner  was  working  on  matching an
   3442      extremely large token and needed to expand the input buffer.
   3443      This doesn't work with scanners that use REJECT.
   3444 
   3445      fatal flex scanner internal error--end of  buffer  missed  -
   3446      This  can  occur  in  a  scanner  which is reentered after a
   3447      long-jump has jumped out (or over) the scanner's  activation
   3448      frame.  Before reentering the scanner, use:
   3449 
   3450          yyrestart( yyin );
   3451 
   3452      or, as noted above, switch to using the C++ scanner class.
   3453 
   3454      too many start conditions in <> - you listed more start
   3455      conditions in a <> construct than exist (so you must have listed
   3456      at least one of them twice).
   3457 
   3458 FILES
   3459      -lfl library with which scanners must be linked.
   3460 
   3461      lex.yy.c
   3462           generated scanner (called lexyy.c on some systems).
   3463 
   3464      lex.yy.cc
   3465           generated C++ scanner class, when using -+.
   3466 
   3467      <FlexLexer.h>
   3468           header file defining the C++ scanner base class,  Flex-
   3469           Lexer, and its derived class, yyFlexLexer.
   3470 
   3471      flex.skl
   3472           skeleton scanner.  This file is only used when building
   3473           flex, not when flex executes.
   3474 
   3475      lex.backup
   3476           backing-up information for -b flag (called  lex.bck  on
   3477           some systems).
   3478 
   3479 DEFICIENCIES / BUGS
   3480      Some trailing context patterns cannot  be  properly  matched
   3481      and  generate  warning  messages  ("dangerous  trailing con-
   3482      text").  These are patterns where the ending  of  the  first
   3483      part  of  the rule matches the beginning of the second part,
   3484      such as "zx*/xy*", where the 'x*' matches  the  'x'  at  the
   3485      beginning  of  the  trailing  context.  (Note that the POSIX
   3486      draft states that the text matched by such patterns is unde-
   3487      fined.)
   3488 
   3489      For some trailing context rules, parts  which  are  actually
   3490      fixed-length  are  not  recognized  as  such, leading to the
   3491      abovementioned performance loss.  In particular, parts using
   3506      '|'   or  {n}  (such  as  "foo{3}")  are  always  considered
   3507      variable-length.
   3508 
   3509      Combining trailing context with the special '|'  action  can
   3510      result  in fixed trailing context being turned into the more
   3511      expensive variable trailing context.  In the following, the
   3512      fixed trailing context "def" is treated as variable:
   3513 
   3514          %%
   3515          abc      |
   3516          xyz/def
   3517 
   3518 
   3519      Use of unput() invalidates yytext  and  yyleng,  unless  the
   3520      %array directive or the -l option has been used.
   3521 
   3522      Pattern-matching  of  NUL's  is  substantially  slower  than
   3523      matching other characters.
   3524 
   3525      Dynamic resizing of the input buffer is slow, as it  entails
   3526      rescanning  all the text matched so far by the current (gen-
   3527      erally huge) token.
   3528 
   3529      Due to both buffering of input and  read-ahead,  you  cannot
   3530      intermix  calls to <stdio.h> routines, such as, for example,
   3531      getchar(), with flex rules and  expect  it  to  work.   Call
   3532      input() instead.
   3533 
   3534      The total table entries listed by the -v flag  excludes  the
   3535      number  of  table  entries needed to determine what rule has
   3536      been matched.  The number of entries is equal to the  number
   3537      of  DFA states if the scanner does not use REJECT, and some-
   3538      what greater than the number of states if it does.
   3539 
   3540      REJECT cannot be used with the -f or -F options.
   3541 
   3542      The flex internal algorithms need documentation.
   3543 
   3544 SEE ALSO
   3545      lex(1), yacc(1), sed(1), awk(1).
   3546 
   3547      John Levine,  Tony  Mason,  and  Doug  Brown,  Lex  &  Yacc,
   3548      O'Reilly and Associates.  Be sure to get the 2nd edition.
   3549 
   3550      M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator
   3551 
   3552      Alfred Aho, Ravi Sethi and Jeffrey Ullman, Compilers:  Prin-
   3553      ciples,   Techniques   and   Tools,  Addison-Wesley  (1986).
   3554      Describes  the  pattern-matching  techniques  used  by  flex
   3555      (deterministic finite automata).
   3556 
   3572 AUTHOR
   3573      Vern Paxson, with the help of many ideas and  much  inspira-
   3574      tion  from Van Jacobson.  Original version by Jef Poskanzer.
   3575      The fast table representation is a partial implementation of
   3576      a  design done by Van Jacobson.  The implementation was done
   3577      by Kevin Gong and Vern Paxson.
   3578 
   3579      Thanks to the many flex beta-testers, feedbackers, and  con-
   3580      tributors,  especially Francois Pinard, Casey Leedom, Robert
   3581      Abramovitz,  Stan  Adermann,  Terry  Allen,  David   Barker-
   3582      Plummer,  John  Basrai,  Neal  Becker,  Nelson  H.F.  Beebe,
   3583      benson@odi.com, Karl Berry, Peter A. Bigot, Simon Blanchard,
   3584      Keith  Bostic,  Frederic Brehm, Ian Brockbank, Kin Cho, Nick
   3585      Christopher, Brian Clapper, J.T.  Conklin,  Jason  Coughlin,
   3586      Bill  Cox,  Nick  Cropper, Dave Curtis, Scott David Daniels,
   3587      Chris  G.  Demetriou,  Theo  Deraadt,  Mike  Donahue,  Chuck
   3588      Doucette,  Tom  Epperly,  Leo  Eskin,  Chris  Faylor,  Chris
   3589      Flatters, Jon Forrest, Jeffrey Friedl, Joe Gayda,  Kaveh  R.
   3590      Ghazi,  Wolfgang  Glunz, Eric Goldman, Christopher M. Gould,
   3591      Ulrich Grepel, Peer Griebel, Jan  Hajic,  Charles  Hemphill,
   3592      NORO  Hideo,  Jarkko  Hietaniemi, Scott Hofmann, Jeff Honig,
   3593      Dana Hudes, Eric Hughes,  John  Interrante,  Ceriel  Jacobs,
   3594      Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones, Henry
   3595      Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O  Kane,
   3596      Amir  Katz, ken@ken.hilco.com, Kevin B. Kenny, Steve Kirsch,
   3597      Winfried Koenig, Marq  Kole,  Ronald  Lamprecht,  Greg  Lee,
   3598      Rohan  Lenard, Craig Leres, John Levine, Steve Liddle, David
   3599      Loffredo, Mike Long, Mohamed el Lozy, Brian  Madsen,  Malte,
   3600      Joe Marshall, Bengt Martensson, Chris Metcalf, Luke Mewburn,
   3601      Jim Meyering,  R.  Alexander  Milowski,  Erik  Naggum,  G.T.
   3602      Nicol,  Landon  Noll,  James  Nordby,  Marc  Nozell, Richard
   3603      Ohnemus, Karsten Pahnke, Sven Panne,  Roland  Pesch,  Walter
   3604      Pelissero,  Gaumond  Pierre, Esmond Pitt, Jef Poskanzer, Joe
   3605      Rahmeh, Jarmo Raiha, Frederic Raimbault,  Pat  Rankin,  Rick
   3606      Richardson,  Kevin  Rodgers,  Kai  Uwe  Rommel, Jim Roskind,
   3607      Alberto Santini,  Andreas  Scherer,  Darrell  Schiebel,  Raf
   3608      Schietekat,  Doug  Schmidt,  Philippe  Schnoebelen,  Andreas
   3609      Schwab, Larry Schwimmer, Alex Siegel, Eckehard Stolz,
   3610      Jan-Erik Strömquist, Mike Stump, Paul Stuart, Dave Tallman, Ian
   3611      Lance Taylor, Chris Thewalt, Richard M. Timoney, Jodi  Tsai,
   3612      Paul  Tuinenga,  Gary  Weik, Frank Whaley, Gerhard Wilhelms,
   3613      Kent Williams, Ken Yap,  Ron  Zellar,  Nathan  Zelle,  David
   3614      Zuhn,  and  those whose names have slipped my marginal mail-
   3615      archiving skills but whose contributions are appreciated all
   3616      the same.
   3617 
   3618      Thanks to Keith Bostic, Jon  Forrest,  Noah  Friedman,  John
   3619      Gilmore, Craig Leres, John Levine, Bob Mulcahy, G.T.  Nicol,
   3620      Francois Pinard, Rich Salz, and Richard  Stallman  for  help
   3621      with various distribution headaches.
   3622 
   3638      Thanks to Esmond Pitt and Earle Horton for  8-bit  character
   3639      support; to Benson Margulies and Fred Burke for C++ support;
   3640      to Kent Williams and Tom Epperly for C++ class  support;  to
   3641      Ove  Ewerlid  for  support  of NUL's; and to Eric Hughes for
   3642      support of multiple buffers.
   3643 
   3644      This work was primarily done when I was with the  Real  Time
   3645      Systems  Group at the Lawrence Berkeley Laboratory in Berke-
   3646      ley, CA.  Many  thanks  to  all  there  for  the  support  I
   3647      received.
   3648 
   3649      Send comments to vern@ee.lbl.gov.