\input texinfo
@c %**start of header
@setfilename flex.info
@settitle Flex - a scanner generator
@c @finalout
@c @setchapternewpage odd
@c %**end of header

@set EDITION 2.5
@set UPDATED March 1995
@set VERSION 2.5

@c FIXME - Reread a printed copy with a red pen and patience.
@c FIXME - Modify all "See ..." references and replace with @xref's.

@ifinfo
@format
START-INFO-DIR-ENTRY
* Flex: (flex).         A fast scanner generator.
END-INFO-DIR-ENTRY
@end format
@end ifinfo

@c Define new indices for commands, filenames, and options.
@c @defcodeindex cm
@c @defcodeindex fl
@c @defcodeindex op

@c Put everything in one index (arbitrarily chosen to be the concept index).
@c @syncodeindex cm cp
@c @syncodeindex fl cp
@syncodeindex fn cp
@syncodeindex ky cp
@c @syncodeindex op cp
@syncodeindex pg cp
@syncodeindex vr cp

@ifinfo
This file documents Flex.

Copyright (c) 1990 The Regents of the University of California.
All rights reserved.

This code is derived from software contributed to Berkeley by
Vern Paxson.

The United States Government has rights in this work pursuant
to contract no. DE-AC03-76SF00098 between the United States
Department of Energy and the University of California.

Redistribution and use in source and binary forms with or without
modification are permitted provided that: (1) source distributions
retain this entire copyright notice and comment, and (2)
distributions including binaries display the following
acknowledgement: ``This product includes software developed by the
University of California, Berkeley and its contributors'' in the
documentation or other materials provided with the distribution and
in all advertising materials mentioning features or use of this
software.
Neither the name of the University nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE.

@ignore
Permission is granted to process this file through TeX and print the
results, provided the printed document carries copying permission
notice identical to this one except for the removal of this paragraph
(this paragraph not being relevant to the printed manual).

@end ignore
@end ifinfo

@titlepage
@title Flex, version @value{VERSION}
@subtitle A fast scanner generator
@subtitle Edition @value{EDITION}, @value{UPDATED}
@author Vern Paxson

@page
@vskip 0pt plus 1filll
Copyright @copyright{} 1990 The Regents of the University of California.
All rights reserved.

This code is derived from software contributed to Berkeley by
Vern Paxson.

The United States Government has rights in this work pursuant
to contract no. DE-AC03-76SF00098 between the United States
Department of Energy and the University of California.

Redistribution and use in source and binary forms with or without
modification are permitted provided that: (1) source distributions
retain this entire copyright notice and comment, and (2)
distributions including binaries display the following
acknowledgement: ``This product includes software developed by the
University of California, Berkeley and its contributors'' in the
documentation or other materials provided with the distribution and
in all advertising materials mentioning features or use of this
software.
Neither the name of the University nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE.
@end titlepage

@ifinfo

@node Top, Name, (dir), (dir)
@top flex

@cindex scanner generator

This manual documents @code{flex}. It covers release @value{VERSION}.

@menu
* Name::                        Name
* Synopsis::                    Synopsis
* Overview::                    Overview
* Description::                 Description
* Examples::                    Some simple examples
* Format::                      Format of the input file
* Patterns::                    Patterns
* Matching::                    How the input is matched
* Actions::                     Actions
* Generated scanner::           The generated scanner
* Start conditions::            Start conditions
* Multiple buffers::            Multiple input buffers
* End-of-file rules::           End-of-file rules
* Miscellaneous::               Miscellaneous macros
* User variables::              Values available to the user
* YACC interface::              Interfacing with @code{yacc}
* Options::                     Options
* Performance::                 Performance considerations
* C++::                         Generating C++ scanners
* Incompatibilities::           Incompatibilities with @code{lex} and POSIX
* Diagnostics::                 Diagnostics
* Files::                       Files
* Deficiencies::                Deficiencies / Bugs
* See also::                    See also
* Author::                      Author
@c * Index::                    Index
@end menu

@end ifinfo

@node Name, Synopsis, Top, Top
@section Name

flex - fast lexical analyzer generator

@node Synopsis, Overview, Name, Top
@section Synopsis

@example
flex [-bcdfhilnpstvwBFILTV78+?
-C[aefFmr] -ooutput -Pprefix -Sskeleton]
[--help --version] [@var{filename} @dots{}]
@end example

@node Overview, Description, Synopsis, Top
@section Overview

This manual describes @code{flex}, a tool for generating programs
that perform pattern-matching on text. The manual
includes both tutorial and reference sections:

@table @asis
@item Description
a brief overview of the tool

@item Some Simple Examples

@item Format Of The Input File

@item Patterns
the extended regular expressions used by flex

@item How The Input Is Matched
the rules for determining what has been matched

@item Actions
how to specify what to do when a pattern is matched

@item The Generated Scanner
details regarding the scanner that flex produces;
how to control the input source

@item Start Conditions
introducing context into your scanners, and
managing "mini-scanners"

@item Multiple Input Buffers
how to manipulate multiple input sources; how to
scan from strings instead of files

@item End-of-file Rules
special rules for matching the end of the input

@item Miscellaneous Macros
a summary of macros available to the actions

@item Values Available To The User
a summary of values available to the actions

@item Interfacing With Yacc
connecting flex scanners together with yacc parsers

@item Options
flex command-line options, and the "%option"
directive

@item Performance Considerations
how to make your scanner go as fast as possible

@item Generating C++ Scanners
the (experimental) facility for generating C++
scanner classes

@item Incompatibilities With Lex And POSIX
how flex differs from AT&T lex and the POSIX lex
standard

@item Diagnostics
those error messages produced by flex (or scanners
it generates) whose meanings might not be
apparent

@item Files
files used by flex

@item Deficiencies / Bugs
known problems with flex

@item See Also
other documentation, related tools

@item Author
includes contact information
@end table

@node Description, Examples, Overview, Top
@section Description

@code{flex} is a tool for generating @dfn{scanners}: programs which
recognize lexical patterns in text. @code{flex} reads the given
input files, or its standard input if no file names are
given, for a description of a scanner to generate. The
description is in the form of pairs of regular expressions
and C code, called @dfn{rules}. @code{flex} generates as output a C
source file, @file{lex.yy.c}, which defines a routine @samp{yylex()}.
This file is compiled and linked with the @samp{-lfl} library to
produce an executable. When the executable is run, it
analyzes its input for occurrences of the regular
expressions. Whenever it finds one, it executes the
corresponding C code.

@node Examples, Format, Description, Top
@section Some simple examples

First some simple examples to get the flavor of how one
uses @code{flex}. The following @code{flex} input specifies a scanner
which, whenever it encounters the string "username", will
replace it with the user's login name:

@example
%%
username    printf( "%s", getlogin() );
@end example

By default, any text not matched by a @code{flex} scanner is
copied to the output, so the net effect of this scanner is
to copy its input file to its output with each occurrence
of "username" expanded. In this input, there is just one
rule. "username" is the @var{pattern} and the "printf" is the
@var{action}. The "%%" marks the beginning of the rules.

Here's another simple example:

@example
        int num_lines = 0, num_chars = 0;

%%
\n      ++num_lines; ++num_chars;
.
        ++num_chars;

%%
int main( void )
        @{
        yylex();
        printf( "# of lines = %d, # of chars = %d\n",
                num_lines, num_chars );
        return 0;
        @}
@end example

This scanner counts the number of characters and the
number of lines in its input (it produces no output other
than the final report on the counts). The first line
declares two globals, "num_lines" and "num_chars", which
are accessible both inside @samp{yylex()} and in the @samp{main()}
routine declared after the second "%%". There are two rules,
one which matches a newline ("\n") and increments both the
line count and the character count, and one which matches
any character other than a newline (indicated by the "."
regular expression).

A somewhat more complicated example:

@example
/* scanner for a toy Pascal-like language */

%@{
/* need this for the calls to atoi() and atof() below */
#include <stdlib.h>
%@}

DIGIT    [0-9]
ID       [a-z][a-z0-9]*

%%

@{DIGIT@}+    @{
            printf( "An integer: %s (%d)\n", yytext,
                    atoi( yytext ) );
            @}

@{DIGIT@}+"."@{DIGIT@}*        @{
            printf( "A float: %s (%g)\n", yytext,
                    atof( yytext ) );
            @}

if|then|begin|end|procedure|function        @{
            printf( "A keyword: %s\n", yytext );
            @}

@{ID@}        printf( "An identifier: %s\n", yytext );

"+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );

"@{"[^@}\n]*"@}"     /* eat up one-line comments */

[ \t\n]+          /* eat up whitespace */

.           printf( "Unrecognized character: %s\n", yytext );

%%

int main( int argc, char **argv )
    @{
    ++argv, --argc;  /* skip over program name */
    if ( argc > 0 )
            yyin = fopen( argv[0], "r" );
    else
            yyin = stdin;

    yylex();
    return 0;
    @}
@end example

This is the beginning of a simple scanner for a language
like Pascal. It identifies different types of @var{tokens} and
reports on what it has seen.
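
As a rough sketch of how such a scanner might be built and run, assume
the specification is saved in a hypothetical file @file{toy.l}, that the
C compiler is invoked as @code{cc}, and that @file{program.pas} is any
input file of your own (all three names are illustrative and not part of
@code{flex}):

@example
flex toy.l
cc lex.yy.c -o toy -lfl
./toy program.pas
@end example

The first command writes @file{lex.yy.c}; the second compiles it and links
with the @samp{-lfl} library, as outlined in the Description section.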

The details of this example will be explained in the
following sections.

@node Format, Patterns, Examples, Top
@section Format of the input file

The @code{flex} input file consists of three sections, separated
by a line with just @samp{%%} in it:

@example
definitions
%%
rules
%%
user code
@end example

The @dfn{definitions} section contains declarations of simple
@dfn{name} definitions to simplify the scanner specification,
and declarations of @dfn{start conditions}, which are explained
in a later section.
Name definitions have the form:

@example
name definition
@end example

The "name" is a word beginning with a letter or an
underscore ('_') followed by zero or more letters, digits, '_',
or '-' (dash). The definition is taken to begin at the
first non-white-space character following the name and
to continue to the end of the line. The definition can
subsequently be referred to using "@{name@}", which will
expand to "(definition)". For example,

@example
DIGIT    [0-9]
ID       [a-z][a-z0-9]*
@end example

@noindent
defines "DIGIT" to be a regular expression which matches a
single digit, and "ID" to be a regular expression which
matches a letter followed by zero-or-more
letters-or-digits. A subsequent reference to

@example
@{DIGIT@}+"."@{DIGIT@}*
@end example

@noindent
is identical to

@example
([0-9])+"."([0-9])*
@end example

@noindent
and matches one-or-more digits followed by a '.' followed
by zero-or-more digits.

The @var{rules} section of the @code{flex} input contains a series of
rules of the form:

@example
pattern   action
@end example

@noindent
where the pattern must be unindented and the action must
begin on the same line.
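
To see the three sections side by side, here is a minimal sketch of a
complete input file. The name @samp{DIGIT}, the message text, and the
choice to discard unmatched input are illustrative assumptions, and the
result is assumed to be linked with @samp{-lfl}:

@example
DIGIT    [0-9]

%%
@{DIGIT@}+    printf( "number: %s\n", yytext );
.|\n        /* discard everything else */

%%
int main( void )
    @{
    yylex();
    return 0;
    @}
@end example

The single name definition sits before the first @samp{%%}, the two
unindented rules follow it, and the companion @samp{main()} routine
comes after the second @samp{%%}.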

See below for a further description of patterns and
actions.

Finally, the user code section is simply copied to
@file{lex.yy.c} verbatim. It is used for companion routines
which call or are called by the scanner. The presence of
this section is optional; if it is missing, the second @samp{%%}
in the input file may be skipped, too.

In the definitions and rules sections, any @emph{indented} text or
text enclosed in @samp{%@{} and @samp{%@}} is copied verbatim to the
output (with the @samp{%@{@}}'s removed). The @samp{%@{@}}'s must
appear unindented on lines by themselves.

In the rules section, any indented or %@{@} text appearing
before the first rule may be used to declare variables
which are local to the scanning routine and (after the
declarations) code which is to be executed whenever the
scanning routine is entered. Other indented or %@{@} text
in the rule section is still copied to the output, but its
meaning is not well-defined and it may well cause
compile-time errors (this feature is present for @code{POSIX} compliance;
see below for other such features).

In the definitions section (but not in the rules section),
an unindented comment (i.e., a line beginning with "/*")
is also copied verbatim to the output up to the next "*/".

@node Patterns, Matching, Format, Top
@section Patterns

The patterns in the input are written using an extended
set of regular expressions. These are:

@table @samp
@item x
match the character @samp{x}
@item .
any character (byte) except newline
@item [xyz]
a "character class"; in this case, the pattern
matches either an @samp{x}, a @samp{y}, or a @samp{z}
@item [abj-oZ]
a "character class" with a range in it; matches
an @samp{a}, a @samp{b}, any letter from @samp{j} through @samp{o},
or a @samp{Z}
@item [^A-Z]
a "negated character class", i.e., any character
but those in the class. In this case, any
character EXCEPT an uppercase letter.
@item [^A-Z\n]
any character EXCEPT an uppercase letter or
a newline
@item @var{r}*
zero or more @var{r}'s, where @var{r} is any regular expression
@item @var{r}+
one or more @var{r}'s
@item @var{r}?
zero or one @var{r}'s (that is, "an optional @var{r}")
@item @var{r}@{2,5@}
anywhere from two to five @var{r}'s
@item @var{r}@{2,@}
two or more @var{r}'s
@item @var{r}@{4@}
exactly 4 @var{r}'s
@item @{@var{name}@}
the expansion of the "@var{name}" definition
(see above)
@item "[xyz]\"foo"
the literal string: @samp{[xyz]"foo}
@item \@var{x}
if @var{x} is an @samp{a}, @samp{b}, @samp{f}, @samp{n}, @samp{r}, @samp{t}, or @samp{v},
then the ANSI-C interpretation of \@var{x}.
Otherwise, a literal @samp{@var{x}} (used to escape
operators such as @samp{*})
@item \0
a NUL character (ASCII code 0)
@item \123
the character with octal value 123
@item \x2a
the character with hexadecimal value @code{2a}
@item (@var{r})
match an @var{r}; parentheses are used to override
precedence (see below)
@item @var{r}@var{s}
the regular expression @var{r} followed by the
regular expression @var{s}; called "concatenation"
@item @var{r}|@var{s}
either an @var{r} or an @var{s}
@item @var{r}/@var{s}
an @var{r} but only if it is followed by an @var{s}.
The text
matched by @var{s} is included when determining whether this rule is
the @dfn{longest match}, but is then returned to the input before
the action is executed. So the action only sees the text matched
by @var{r}. This type of pattern is called @dfn{trailing context}.
(There are some combinations of @samp{@var{r}/@var{s}} that @code{flex}
cannot match correctly; see notes in the Deficiencies / Bugs section
below regarding "dangerous trailing context".)
@item ^@var{r}
an @var{r}, but only at the beginning of a line (i.e.,
when just starting to scan, or right after a
newline has been scanned).
@item @var{r}$
an @var{r}, but only at the end of a line (i.e., just
before a newline). Equivalent to "@var{r}/\n".

Note that flex's notion of "newline" is exactly
whatever the C compiler used to compile flex
interprets '\n' as; in particular, on some DOS
systems you must either filter out \r's in the
input yourself, or explicitly use @var{r}/\r\n for "r$".
@item <@var{s}>@var{r}
an @var{r}, but only in start condition @var{s} (see
below for discussion of start conditions)
@item <@var{s1},@var{s2},@var{s3}>@var{r}
same, but in any of start conditions @var{s1},
@var{s2}, or @var{s3}
@item <*>@var{r}
an @var{r} in any start condition, even an exclusive one.
@item <<EOF>>
an end-of-file
@item <@var{s1},@var{s2}><<EOF>>
an end-of-file when in start condition @var{s1} or @var{s2}
@end table

Note that inside of a character class, all regular
expression operators lose their special meaning except escape
('\') and the character class operators, '-', ']', and, at
the beginning of the class, '^'.

The regular expressions listed above are grouped according
to precedence, from highest precedence at the top to
lowest at the bottom. Those grouped together have equal
precedence.
For example,

@example
foo|bar*
@end example

@noindent
is the same as

@example
(foo)|(ba(r*))
@end example

@noindent
since the '*' operator has higher precedence than
concatenation, and concatenation higher than alternation ('|').
This pattern therefore matches @emph{either} the string "foo" @emph{or}
the string "ba" followed by zero-or-more r's. To match
"foo" or zero-or-more "bar"'s, use:

@example
foo|(bar)*
@end example

@noindent
and to match zero-or-more "foo"'s-or-"bar"'s:

@example
(foo|bar)*
@end example

In addition to characters and ranges of characters,
character classes can also contain character class
@dfn{expressions}. These are expressions enclosed inside @samp{[:} and @samp{:]}
delimiters (which themselves must appear between the '['
and ']' of the character class; other elements may occur
inside the character class, too). The valid expressions
are:

@example
[:alnum:] [:alpha:] [:blank:]
[:cntrl:] [:digit:] [:graph:]
[:lower:] [:print:] [:punct:]
[:space:] [:upper:] [:xdigit:]
@end example

These expressions all designate a set of characters
equivalent to the corresponding standard C @samp{isXXX} function. For
example, @samp{[:alnum:]} designates those characters for which
@samp{isalnum()} returns true - i.e., any alphabetic or numeric.
Some systems don't provide @samp{isblank()}, so flex defines
@samp{[:blank:]} as a blank or a tab.

For example, the following character classes are all
equivalent:

@example
[[:alnum:]]
[[:alpha:][:digit:]]
[[:alpha:]0-9]
[a-zA-Z0-9]
@end example

If your scanner is case-insensitive (the @samp{-i} flag), then
@samp{[:upper:]} and @samp{[:lower:]} are equivalent to @samp{[:alpha:]}.
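
As an illustration of character class expressions inside rules, the
following sketch mixes them with ordinary characters in the same class.
The token categories and messages are assumptions chosen only for this
example:

@example
%%
[[:alpha:]_][[:alnum:]_]*    printf( "identifier: %s\n", yytext );
[[:digit:]]+                 printf( "number: %s\n", yytext );
[[:blank:]]+                 /* skip blanks and tabs */
@end example

Note that each expression appears inside the brackets of an enclosing
character class, and that the first rule adds a literal '_' alongside
the @samp{[:alpha:]} and @samp{[:alnum:]} expressions.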

Some notes on patterns:

@itemize -
@item
A negated character class such as the example
"[^A-Z]" above @emph{will match a newline} unless "\n" (or an
equivalent escape sequence) is one of the
characters explicitly present in the negated character
class (e.g., "[^A-Z\n]"). This is unlike how many
other regular expression tools treat negated
character classes, but unfortunately the inconsistency
is historically entrenched. Matching newlines
means that a pattern like [^"]* can match the
entire input unless there's another quote in the
input.

@item
A rule can have at most one instance of trailing
context (the '/' operator or the '$' operator).
The start condition, '^', and "<<EOF>>" patterns
can only occur at the beginning of a pattern, and,
like '/' and '$', cannot be grouped
inside parentheses. A '^' which does not occur at
the beginning of a rule or a '$' which does not
occur at the end of a rule loses its special
properties and is treated as a normal character.

The following are illegal:

@example
foo/bar$
<sc1>foo<sc2>bar
@end example

Note that the first of these can be written
"foo/bar\n".

The following will result in '$' or '^' being
treated as a normal character:

@example
foo|(bar$)
foo|^bar
@end example

If what's wanted is a "foo" or a
bar-followed-by-a-newline, the following could be used (the special
'|' action is explained below):

@example
foo      |
bar$     /* action goes here */
@end example

A similar trick will work for matching a foo or a
bar-at-the-beginning-of-a-line.
@end itemize

@node Matching, Actions, Patterns, Top
@section How the input is matched

When the generated scanner is run, it analyzes its input
looking for strings which match any of its patterns.
If it finds more than one match, it takes the one matching
the most text (for trailing context rules, this includes
the length of the trailing part, even though it will then
be returned to the input). If it finds two or more
matches of the same length, the rule listed first in the
@code{flex} input file is chosen.

Once the match is determined, the text corresponding to
the match (called the @var{token}) is made available in the
global character pointer @code{yytext}, and its length in the
global integer @code{yyleng}. The @var{action} corresponding to the
matched pattern is then executed (a more detailed
description of actions follows), and then the remaining input is
scanned for another match.

If no match is found, then the @dfn{default rule} is executed:
the next character in the input is considered matched and
copied to the standard output. Thus, the simplest legal
@code{flex} input is:

@example
%%
@end example

which generates a scanner that simply copies its input
(one character at a time) to its output.

Note that @code{yytext} can be defined in two different ways:
either as a character @emph{pointer} or as a character @emph{array}.
You can control which definition @code{flex} uses by including
one of the special directives @samp{%pointer} or @samp{%array} in the
first (definitions) section of your flex input. The
default is @samp{%pointer}, unless you use the @samp{-l} lex
compatibility option, in which case @code{yytext} will be an array. The
advantage of using @samp{%pointer} is substantially faster
scanning and no buffer overflow when matching very large
tokens (unless you run out of dynamic memory).
The disadvantage is that you are restricted in how your actions can
modify @code{yytext} (see the next section), and calls to the
@samp{unput()} function destroy the present contents of @code{yytext},
which can be a considerable porting headache when moving
between different @code{lex} versions.

The advantage of @samp{%array} is that you can then modify @code{yytext}
to your heart's content, and calls to @samp{unput()} do not
destroy @code{yytext} (see below). Furthermore, existing @code{lex}
programs sometimes access @code{yytext} externally using
declarations of the form:
@example
extern char yytext[];
@end example
This definition is erroneous when used with @samp{%pointer}, but
correct for @samp{%array}.

@samp{%array} defines @code{yytext} to be an array of @code{YYLMAX} characters,
which defaults to a fairly large value. You can change
the size by simply #define'ing @code{YYLMAX} to a different value
in the first section of your @code{flex} input. As mentioned
above, with @samp{%pointer} yytext grows dynamically to
accommodate large tokens. While this means your @samp{%pointer} scanner
can accommodate very large tokens (such as matching entire
blocks of comments), bear in mind that each time the
scanner must resize @code{yytext} it also must rescan the entire
token from the beginning, so matching such tokens can
prove slow. @code{yytext} presently does @emph{not} dynamically grow if
a call to @samp{unput()} results in too much text being pushed
back; instead, a run-time error results.

Also note that you cannot use @samp{%array} with C++ scanner
classes (the @code{c++} option; see below).

@node Actions, Generated scanner, Matching, Top
@section Actions

Each pattern in a rule has a corresponding action, which
can be any arbitrary C statement.
The pattern ends at the
first non-escaped whitespace character; the remainder of
the line is its action. If the action is empty, then when
the pattern is matched the input token is simply
discarded. For example, here is the specification for a
program which deletes all occurrences of "zap me" from its
input:

@example
%%
"zap me"
@end example

(It will copy all other characters in the input to the
output since they will be matched by the default rule.)

Here is a program which compresses multiple blanks and
tabs down to a single blank, and throws away whitespace
found at the end of a line:

@example
%%
[ \t]+        putchar( ' ' );
[ \t]+$       /* ignore this token */
@end example

If the action contains a '@{', then the action spans till
the balancing '@}' is found, and the action may cross
multiple lines. @code{flex} knows about C strings and comments and
won't be fooled by braces found within them, but also
allows actions to begin with @samp{%@{} and will consider the
action to be all the text up to the next @samp{%@}} (regardless of
ordinary braces inside the action).

An action consisting solely of a vertical bar ('|') means
"same as the action for the next rule." See below for an
illustration.

Actions can include arbitrary C code, including @code{return}
statements to return a value to whatever routine called
@samp{yylex()}. Each time @samp{yylex()} is called it continues
processing tokens from where it last left off until it either
reaches the end of the file or executes a return.

Actions are free to modify @code{yytext} except for lengthening
it (adding characters to its end--these will overwrite
later characters in the input stream). This however does
not apply when using @samp{%array} (see above); in that case,
@code{yytext} may be freely modified in any way.
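
As a small sketch of an action that modifies @code{yytext} without
lengthening it, the following rule rewrites each lowercase word in
place before printing it. The use of @samp{toupper()} from
@file{ctype.h} and the choice of pattern are assumptions made for
illustration:

@example
%@{
#include <ctype.h>
%@}

%%
[a-z]+    @{
          int i;

          /* overwrite characters in place; the token length is unchanged */
          for ( i = 0; i < yyleng; ++i )
              yytext[i] = toupper( (unsigned char) yytext[i] );
          printf( "%s", yytext );
          @}
@end example

Because only existing characters are overwritten, this stays within the
restriction described above and behaves the same with either
@samp{%pointer} or @samp{%array}.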

Actions are free to modify @code{yyleng} except they should not
do so if the action also includes use of @samp{yymore()} (see
below).

There are a number of special directives which can be
included within an action:

@itemize -
@item
@samp{ECHO} copies yytext to the scanner's output.

@item
@code{BEGIN} followed by the name of a start condition
places the scanner in the corresponding start
condition (see below).

@item
@code{REJECT} directs the scanner to proceed on to the
"second best" rule which matched the input (or a
prefix of the input). The rule is chosen as
described above in "How the Input is Matched", and
@code{yytext} and @code{yyleng} set up appropriately. It may
either be one which matched as much text as the
originally chosen rule but came later in the @code{flex}
input file, or one which matched less text. For
example, the following will both count the words in
the input and call the routine special() whenever
"frob" is seen:

@example
        int word_count = 0;
%%

frob        special(); REJECT;
[^ \t\n]+   ++word_count;
@end example

Without the @code{REJECT}, any "frob"'s in the input would
not be counted as words, since the scanner normally
executes only one action per token. Multiple
@code{REJECT}s are allowed, each one finding the next
best choice to the currently active rule. For
example, when the following scanner scans the token
"abcd", it will write "abcdabcaba" to the output:

@example
%%
a        |
ab       |
abc      |
abcd     ECHO; REJECT;
.|\n     /* eat up any unmatched character */
@end example

(The first three rules share the fourth's action
since they use the special '|' action.)
@code{REJECT} is
a particularly expensive feature in terms of
scanner performance; if it is used in @emph{any} of the
scanner's actions it will slow down @emph{all} of the
scanner's matching. Furthermore, @code{REJECT} cannot be used
with the @samp{-Cf} or @samp{-CF} options (see below).

Note also that unlike the other special actions,
@code{REJECT} is a @emph{branch}; code immediately following it
in the action will @emph{not} be executed.

@item
@samp{yymore()} tells the scanner that the next time it
matches a rule, the corresponding token should be
@emph{appended} onto the current value of @code{yytext} rather
than replacing it. For example, given the input
"mega-kludge" the following will write
"mega-mega-kludge" to the output:

@example
%%
mega-    ECHO; yymore();
kludge   ECHO;
@end example

First "mega-" is matched and echoed to the output.
Then "kludge" is matched, but the previous "mega-"
is still hanging around at the beginning of @code{yytext}
so the @samp{ECHO} for the "kludge" rule will actually
write "mega-kludge".
@end itemize

Two notes regarding use of @samp{yymore()}. First, @samp{yymore()}
depends on the value of @code{yyleng} correctly reflecting the
size of the current token, so you must not modify @code{yyleng}
if you are using @samp{yymore()}. Second, the presence of
@samp{yymore()} in the scanner's action entails a minor
performance penalty in the scanner's matching speed.

@itemize -
@item
@samp{yyless(n)} returns all but the first @var{n} characters of
the current token back to the input stream, where
they will be rescanned when the scanner looks for
the next match. @code{yytext} and @code{yyleng} are adjusted
appropriately (e.g., @code{yyleng} will now be equal to @var{n}).
For example, on the input "foobar" the 931 following will write out "foobarbar": 932 933 @example 934 %% 935 foobar ECHO; yyless(3); 936 [a-z]+ ECHO; 937 @end example 938 939 An argument of 0 to @code{yyless} will cause the entire 940 current input string to be scanned again. Unless 941 you've changed how the scanner will subsequently 942 process its input (using @code{BEGIN}, for example), this 943 will result in an endless loop. 944 945 Note that @code{yyless} is a macro and can only be used in the 946 flex input file, not from other source files. 947 948 @item 949 @samp{unput(c)} puts the character @code{c} back onto the input 950 stream. It will be the next character scanned. 951 The following action will take the current token 952 and cause it to be rescanned enclosed in 953 parentheses. 954 955 @example 956 @{ 957 int i; 958 /* Copy yytext because unput() trashes yytext */ 959 char *yycopy = strdup( yytext ); 960 unput( ')' ); 961 for ( i = yyleng - 1; i >= 0; --i ) 962 unput( yycopy[i] ); 963 unput( '(' ); 964 free( yycopy ); 965 @} 966 @end example 967 968 Note that since each @samp{unput()} puts the given 969 character back at the @emph{beginning} of the input stream, 970 pushing back strings must be done back-to-front. 971 An important potential problem when using @samp{unput()} is that 972 if you are using @samp{%pointer} (the default), a call to @samp{unput()} 973 @emph{destroys} the contents of @code{yytext}, starting with its 974 rightmost character and devouring one character to the left 975 with each call. If you need the value of yytext preserved 976 after a call to @samp{unput()} (as in the above example), you 977 must either first copy it elsewhere, or build your scanner 978 using @samp{%array} instead (see How The Input Is Matched). 979 980 Finally, note that you cannot put back @code{EOF} to attempt to 981 mark the input stream with an end-of-file. 982 983 @item 984 @samp{input()} reads the next character from the input 985 stream. 
For example, the following is one way to 986 eat up C comments: 987 988 @example 989 %% 990 "/*" @{ 991 register int c; 992 993 for ( ; ; ) 994 @{ 995 while ( (c = input()) != '*' && 996 c != EOF ) 997 ; /* eat up text of comment */ 998 999 if ( c == '*' ) 1000 @{ 1001 while ( (c = input()) == '*' ) 1002 ; 1003 if ( c == '/' ) 1004 break; /* found the end */ 1005 @} 1006 1007 if ( c == EOF ) 1008 @{ 1009 error( "EOF in comment" ); 1010 break; 1011 @} 1012 @} 1013 @} 1014 @end example 1015 1016 (Note that if the scanner is compiled using @samp{C++}, 1017 then @samp{input()} is instead referred to as @samp{yyinput()}, 1018 in order to avoid a name clash with the @samp{C++} stream 1019 by the name of @code{input}.) 1020 1021 @item YY_FLUSH_BUFFER 1022 flushes the scanner's internal buffer so that the next time the scanner 1023 attempts to match a token, it will first refill the buffer using 1024 @code{YY_INPUT} (see The Generated Scanner, below). This action is 1025 a special case of the more general @samp{yy_flush_buffer()} function, 1026 described below in the section Multiple Input Buffers. 1027 1028 @item 1029 @samp{yyterminate()} can be used in lieu of a return 1030 statement in an action. It terminates the scanner 1031 and returns a 0 to the scanner's caller, indicating 1032 "all done". By default, @samp{yyterminate()} is also 1033 called when an end-of-file is encountered. It is a 1034 macro and may be redefined. 1035 @end itemize 1036 1037 @node Generated scanner, Start conditions, Actions, Top 1038 @section The generated scanner 1039 1040 The output of @code{flex} is the file @file{lex.yy.c}, which contains 1041 the scanning routine @samp{yylex()}, a number of tables used by 1042 it for matching tokens, and a number of auxiliary routines 1043 and macros. 
By default, @samp{yylex()} is declared as follows: 1044 1045 @example 1046 int yylex() 1047 @{ 1048 @dots{} various definitions and the actions in here @dots{} 1049 @} 1050 @end example 1051 1052 (If your environment supports function prototypes, then it 1053 will be "int yylex( void )".) This definition may be 1054 changed by defining the "YY_DECL" macro. For example, you 1055 could use: 1056 1057 @example 1058 #define YY_DECL float lexscan( a, b ) float a, b; 1059 @end example 1060 1061 to give the scanning routine the name @code{lexscan}, returning a 1062 float, and taking two floats as arguments. Note that if 1063 you give arguments to the scanning routine using a 1064 K&R-style/non-prototyped function declaration, you must 1065 terminate the definition with a semi-colon (@samp{;}). 1066 1067 Whenever @samp{yylex()} is called, it scans tokens from the 1068 global input file @code{yyin} (which defaults to stdin). It 1069 continues until it either reaches an end-of-file (at which 1070 point it returns the value 0) or one of its actions 1071 executes a @code{return} statement. 1072 1073 If the scanner reaches an end-of-file, subsequent calls are undefined 1074 unless either @code{yyin} is pointed at a new input file (in which case 1075 scanning continues from that file), or @samp{yyrestart()} is called. 1076 @samp{yyrestart()} takes one argument, a @samp{FILE *} pointer (which 1077 can be nil, if you've set up @code{YY_INPUT} to scan from a source 1078 other than @code{yyin}), and initializes @code{yyin} for scanning from 1079 that file. Essentially there is no difference between just assigning 1080 @code{yyin} to a new input file or using @samp{yyrestart()} to do so; 1081 the latter is available for compatibility with previous versions of 1082 @code{flex}, and because it can be used to switch input files in the 1083 middle of scanning. 
It can also be used to throw away the current 1084 input buffer, by calling it with an argument of @code{yyin}; but 1085 better is to use @code{YY_FLUSH_BUFFER} (see above). Note that 1086 @samp{yyrestart()} does @emph{not} reset the start condition to 1087 @code{INITIAL} (see Start Conditions, below). 1088 1089 1090 If @samp{yylex()} stops scanning due to executing a @code{return} 1091 statement in one of the actions, the scanner may then be called 1092 again and it will resume scanning where it left off. 1093 1094 By default (and for purposes of efficiency), the scanner 1095 uses block-reads rather than simple @samp{getc()} calls to read 1096 characters from @code{yyin}. The nature of how it gets its input 1097 can be controlled by defining the @code{YY_INPUT} macro. 1098 YY_INPUT's calling sequence is 1099 "YY_INPUT(buf,result,max_size)". Its action is to place 1100 up to @var{max_size} characters in the character array @var{buf} and 1101 return in the integer variable @var{result} either the number of 1102 characters read or the constant YY_NULL (0 on Unix 1103 systems) to indicate EOF. The default YY_INPUT reads from 1104 the global file-pointer "yyin". 1105 1106 A sample definition of YY_INPUT (in the definitions 1107 section of the input file): 1108 1109 @example 1110 %@{ 1111 #define YY_INPUT(buf,result,max_size) \ 1112 @{ \ 1113 int c = getchar(); \ 1114 result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \ 1115 @} 1116 %@} 1117 @end example 1118 1119 This definition will change the input processing to occur 1120 one character at a time. 1121 1122 When the scanner receives an end-of-file indication from 1123 YY_INPUT, it then checks the @samp{yywrap()} function. If 1124 @samp{yywrap()} returns false (zero), then it is assumed that the 1125 function has gone ahead and set up @code{yyin} to point to 1126 another input file, and scanning continues. If it returns 1127 true (non-zero), then the scanner terminates, returning 0 1128 to its caller. 
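
As a minimal sketch of the @samp{yywrap()} protocol just described, a
user-supplied version that steps through a list of input files might
look like the following.  The @code{filelist} variable is hypothetical:
it is assumed to be a nil-terminated array of file names which the
program sets up before scanning starts, with the entry it points at
already opened as @code{yyin}.  It is not something @code{flex} provides.

@example
%@{
#include <stdio.h>

/* Hypothetical: nil-terminated list of input file names; the
 * entry it points at is the file currently open as yyin.
 */
char **filelist;
%@}

%%
@dots{}rules@dots{}
%%

int yywrap()
    @{
    FILE *next;

    if ( ! *filelist || ! *++filelist )
        return 1;   /* no more files: the scanner returns 0 */

    next = fopen( *filelist, "r" );
    if ( ! next )
        return 1;   /* treat an unreadable file as end of input */

    fclose( yyin );
    yyin = next;
    return 0;       /* yyin points at new input: keep scanning */
    @}
@end example

Giving up on the first file that cannot be opened is only one possible
policy; a real scanner might instead report the error and try the next
name in the list.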
Note that in either case, the start 1129 condition remains unchanged; it does @emph{not} revert to @code{INITIAL}. 1130 1131 If you do not supply your own version of @samp{yywrap()}, then you 1132 must either use @samp{%option noyywrap} (in which case the scanner 1133 behaves as though @samp{yywrap()} returned 1), or you must link with 1134 @samp{-lfl} to obtain the default version of the routine, which always 1135 returns 1. 1136 1137 Three routines are available for scanning from in-memory 1138 buffers rather than files: @samp{yy_scan_string()}, 1139 @samp{yy_scan_bytes()}, and @samp{yy_scan_buffer()}. See the discussion 1140 of them below in the section Multiple Input Buffers. 1141 1142 The scanner writes its @samp{ECHO} output to the @code{yyout} global 1143 (default, stdout), which may be redefined by the user 1144 simply by assigning it to some other @code{FILE} pointer. 1145 1146 @node Start conditions, Multiple buffers, Generated scanner, Top 1147 @section Start conditions 1148 1149 @code{flex} provides a mechanism for conditionally activating 1150 rules. Any rule whose pattern is prefixed with "<sc>" 1151 will only be active when the scanner is in the start 1152 condition named "sc". For example, 1153 1154 @example 1155 <STRING>[^"]* @{ /* eat up the string body ... */ 1156 @dots{} 1157 @} 1158 @end example 1159 1160 @noindent 1161 will be active only when the scanner is in the "STRING" 1162 start condition, and 1163 1164 @example 1165 <INITIAL,STRING,QUOTE>\. @{ /* handle an escape ... */ 1166 @dots{} 1167 @} 1168 @end example 1169 1170 @noindent 1171 will be active only when the current start condition is 1172 either "INITIAL", "STRING", or "QUOTE". 1173 1174 Start conditions are declared in the definitions (first) 1175 section of the input using unindented lines beginning with 1176 either @samp{%s} or @samp{%x} followed by a list of names. The former 1177 declares @emph{inclusive} start conditions, the latter @emph{exclusive} 1178 start conditions. 
A start condition is activated using 1179 the @code{BEGIN} action. Until the next @code{BEGIN} action is 1180 executed, rules with the given start condition will be active 1181 and rules with other start conditions will be inactive. 1182 If the start condition is @emph{inclusive}, then rules with no 1183 start conditions at all will also be active. If it is 1184 @emph{exclusive}, then @emph{only} rules qualified with the start 1185 condition will be active. A set of rules contingent on the 1186 same exclusive start condition describe a scanner which is 1187 independent of any of the other rules in the @code{flex} input. 1188 Because of this, exclusive start conditions make it easy 1189 to specify "mini-scanners" which scan portions of the 1190 input that are syntactically different from the rest 1191 (e.g., comments). 1192 1193 If the distinction between inclusive and exclusive start 1194 conditions is still a little vague, here's a simple 1195 example illustrating the connection between the two. The set 1196 of rules: 1197 1198 @example 1199 %s example 1200 %% 1201 1202 <example>foo do_something(); 1203 1204 bar something_else(); 1205 @end example 1206 1207 @noindent 1208 is equivalent to 1209 1210 @example 1211 %x example 1212 %% 1213 1214 <example>foo do_something(); 1215 1216 <INITIAL,example>bar something_else(); 1217 @end example 1218 1219 Without the @samp{<INITIAL,example>} qualifier, the @samp{bar} pattern 1220 in the second example wouldn't be active (i.e., couldn't match) when 1221 in start condition @samp{example}. If we just used @samp{<example>} 1222 to qualify @samp{bar}, though, then it would only be active in 1223 @samp{example} and not in @code{INITIAL}, while in the first example 1224 it's active in both, because in the first example the @samp{example} 1225 starting condition is an @emph{inclusive} (@samp{%s}) start condition. 1226 1227 Also note that the special start-condition specifier @samp{<*>} 1228 matches every start condition. 
Thus, the above example
could also have been written:

@example
%x example
%%

<example>foo   do_something();

<*>bar    something_else();
@end example

The default rule (to @samp{ECHO} any unmatched character) remains
active in start conditions.  It is equivalent to:

@example
<*>.|\n     ECHO;
@end example

@samp{BEGIN(0)} returns to the original state where only the
rules with no start conditions are active.  This state can
also be referred to as the start-condition "INITIAL", so
@samp{BEGIN(INITIAL)} is equivalent to @samp{BEGIN(0)}.  (The
parentheses around the start condition name are not required but
are considered good style.)

@code{BEGIN} actions can also be given as indented code at the
beginning of the rules section.  For example, the
following will cause the scanner to enter the "SPECIAL" start
condition whenever @samp{yylex()} is called and the global
variable @code{enter_special} is true:

@example
        int enter_special;

%x SPECIAL
%%
        if ( enter_special )
            BEGIN(SPECIAL);

<SPECIAL>blahblahblah
@dots{}more rules follow@dots{}
@end example

To illustrate the uses of start conditions, here is a
scanner which provides two different interpretations of a
string like "123.456".  By default it will treat it as
three tokens, the integer "123", a dot ('.'), and the
integer "456".
But if the string is preceded earlier in
the line by the string "expect-floats" it will treat it as
a single token, the floating-point number 123.456:

@example
%@{
#include <stdlib.h>   /* atof(), atoi() */
%@}
%s expect

%%
expect-floats        BEGIN(expect);

<expect>[0-9]+"."[0-9]+      @{
            printf( "found a float, = %f\n",
                    atof( yytext ) );
            @}
<expect>\n           @{
            /* that's the end of the line, so
             * we need another "expect-floats"
             * before we'll recognize any more
             * numbers
             */
            BEGIN(INITIAL);
            @}

[0-9]+      @{
            printf( "found an integer, = %d\n",
                    atoi( yytext ) );
            @}

"."         printf( "found a dot\n" );
@end example

Here is a scanner which recognizes (and discards) C
comments while maintaining a count of the current input line.

@example
%x comment
%%
        int line_num = 1;

"/*"         BEGIN(comment);

<comment>[^*\n]*        /* eat anything that's not a '*' */
<comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
<comment>\n             ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);
@end example

This scanner goes to a bit of trouble to match as much
text as possible with each rule.  In general, when
attempting to write a high-speed scanner try to match as
much as possible in each rule, as it's a big win.

Note that start-condition names are really integer values
and can be stored as such.
Thus, the above could be 1336 extended in the following fashion: 1337 1338 @example 1339 %x comment foo 1340 %% 1341 int line_num = 1; 1342 int comment_caller; 1343 1344 "/*" @{ 1345 comment_caller = INITIAL; 1346 BEGIN(comment); 1347 @} 1348 1349 @dots{} 1350 1351 <foo>"/*" @{ 1352 comment_caller = foo; 1353 BEGIN(comment); 1354 @} 1355 1356 <comment>[^*\n]* /* eat anything that's not a '*' */ 1357 <comment>"*"+[^*/\n]* /* eat up '*'s not followed by '/'s */ 1358 <comment>\n ++line_num; 1359 <comment>"*"+"/" BEGIN(comment_caller); 1360 @end example 1361 1362 Furthermore, you can access the current start condition 1363 using the integer-valued @code{YY_START} macro. For example, the 1364 above assignments to @code{comment_caller} could instead be 1365 written 1366 1367 @example 1368 comment_caller = YY_START; 1369 @end example 1370 1371 Flex provides @code{YYSTATE} as an alias for @code{YY_START} (since that 1372 is what's used by AT&T @code{lex}). 1373 1374 Note that start conditions do not have their own 1375 name-space; %s's and %x's declare names in the same fashion as 1376 #define's. 
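
As a sketch of how these pieces can be combined (one possible
arrangement, not the only one), a single rule can enter @samp{comment}
from either @code{INITIAL} or @samp{foo} by recording @code{YY_START},
instead of repeating the rule once per start condition.  Here
@code{comment_caller} is declared at file scope, on the assumption that
its value should survive even if some action were to @code{return} from
@samp{yylex()} in the meantime.

@example
%@{
/* Start condition to resume once the comment ends. */
static int comment_caller;
%@}

%x comment foo
%%

<INITIAL,foo>"/*"       @{
                        comment_caller = YY_START;
                        BEGIN(comment);
                        @}

@dots{}

<comment>[^*\n]*        /* eat anything that's not a '*' */
<comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
<comment>\n             /* newlines are part of the comment too */
<comment>"*"+"/"        BEGIN(comment_caller);
@end example

Because @code{YY_START} is read at the moment the comment opener is
matched, the closing rule returns to whichever of the two start
conditions was active, without either one being named in the action.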

Finally, here's an example of how to match C-style quoted
strings using exclusive start conditions, including
expanded escape sequences (but not including checking for
a string that's too long):

@example
%x str

%%
        char string_buf[MAX_STR_CONST];
        char *string_buf_ptr;

\"      string_buf_ptr = string_buf; BEGIN(str);

<str>\"        @{ /* saw closing quote - all done */
        BEGIN(INITIAL);
        *string_buf_ptr = '\0';
        /* return string constant token type and
         * value to parser
         */
        @}

<str>\n        @{
        /* error - unterminated string constant */
        /* generate error message */
        @}

<str>\\[0-7]@{1,3@} @{
        /* octal escape sequence */
        unsigned int result;   /* "%o" expects an unsigned int */

        (void) sscanf( yytext + 1, "%o", &result );

        if ( result > 0xff )
                @{
                /* error, constant is out-of-bounds */
                @}
        else
                *string_buf_ptr++ = result;
        @}

<str>\\[0-9]+ @{
        /* generate error - bad escape sequence; something
         * like '\48' or '\0777777'
         */
        @}

<str>\\n  *string_buf_ptr++ = '\n';
<str>\\t  *string_buf_ptr++ = '\t';
<str>\\r  *string_buf_ptr++ = '\r';
<str>\\b  *string_buf_ptr++ = '\b';
<str>\\f  *string_buf_ptr++ = '\f';

<str>\\(.|\n)  *string_buf_ptr++ = yytext[1];

<str>[^\\\n\"]+        @{
        char *yptr = yytext;

        while ( *yptr )
                *string_buf_ptr++ = *yptr++;
        @}
@end example

Often, such as in some of the examples above, you wind up
writing a whole bunch of rules all preceded by the same
start condition(s).  Flex makes this a little easier and
cleaner by introducing a notion of start condition @dfn{scope}.
A start condition scope is begun with:

@example
<SCs>@{
@end example

@noindent
where SCs is a list of one or more start conditions.
1451 Inside the start condition scope, every rule automatically 1452 has the prefix @samp{<SCs>} applied to it, until a @samp{@}} which 1453 matches the initial @samp{@{}. So, for example, 1454 1455 @example 1456 <ESC>@{ 1457 "\\n" return '\n'; 1458 "\\r" return '\r'; 1459 "\\f" return '\f'; 1460 "\\0" return '\0'; 1461 @} 1462 @end example 1463 1464 @noindent 1465 is equivalent to: 1466 1467 @example 1468 <ESC>"\\n" return '\n'; 1469 <ESC>"\\r" return '\r'; 1470 <ESC>"\\f" return '\f'; 1471 <ESC>"\\0" return '\0'; 1472 @end example 1473 1474 Start condition scopes may be nested. 1475 1476 Three routines are available for manipulating stacks of 1477 start conditions: 1478 1479 @table @samp 1480 @item void yy_push_state(int new_state) 1481 pushes the current start condition onto the top of 1482 the start condition stack and switches to @var{new_state} 1483 as though you had used @samp{BEGIN new_state} (recall that 1484 start condition names are also integers). 1485 1486 @item void yy_pop_state() 1487 pops the top of the stack and switches to it via 1488 @code{BEGIN}. 1489 1490 @item int yy_top_state() 1491 returns the top of the stack without altering the 1492 stack's contents. 1493 @end table 1494 1495 The start condition stack grows dynamically and so has no 1496 built-in size limitation. If memory is exhausted, program 1497 execution aborts. 1498 1499 To use start condition stacks, your scanner must include a 1500 @samp{%option stack} directive (see Options below). 1501 1502 @node Multiple buffers, End-of-file rules, Start conditions, Top 1503 @section Multiple input buffers 1504 1505 Some scanners (such as those which support "include" 1506 files) require reading from several input streams. As 1507 @code{flex} scanners do a large amount of buffering, one cannot 1508 control where the next input will be read from by simply 1509 writing a @code{YY_INPUT} which is sensitive to the scanning 1510 context. 
@code{YY_INPUT} is only called when the scanner reaches 1511 the end of its buffer, which may be a long time after 1512 scanning a statement such as an "include" which requires 1513 switching the input source. 1514 1515 To negotiate these sorts of problems, @code{flex} provides a 1516 mechanism for creating and switching between multiple 1517 input buffers. An input buffer is created by using: 1518 1519 @example 1520 YY_BUFFER_STATE yy_create_buffer( FILE *file, int size ) 1521 @end example 1522 1523 @noindent 1524 which takes a @code{FILE} pointer and a size and creates a buffer 1525 associated with the given file and large enough to hold 1526 @var{size} characters (when in doubt, use @code{YY_BUF_SIZE} for the 1527 size). It returns a @code{YY_BUFFER_STATE} handle, which may 1528 then be passed to other routines (see below). The 1529 @code{YY_BUFFER_STATE} type is a pointer to an opaque @code{struct} 1530 @code{yy_buffer_state} structure, so you may safely initialize 1531 YY_BUFFER_STATE variables to @samp{((YY_BUFFER_STATE) 0)} if you 1532 wish, and also refer to the opaque structure in order to 1533 correctly declare input buffers in source files other than 1534 that of your scanner. Note that the @code{FILE} pointer in the 1535 call to @code{yy_create_buffer} is only used as the value of @code{yyin} 1536 seen by @code{YY_INPUT}; if you redefine @code{YY_INPUT} so it no longer 1537 uses @code{yyin}, then you can safely pass a nil @code{FILE} pointer to 1538 @code{yy_create_buffer}. You select a particular buffer to scan 1539 from using: 1540 1541 @example 1542 void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer ) 1543 @end example 1544 1545 switches the scanner's input buffer so subsequent tokens 1546 will come from @var{new_buffer}. Note that 1547 @samp{yy_switch_to_buffer()} may be used by @samp{yywrap()} to set 1548 things up for continued scanning, instead of opening a new 1549 file and pointing @code{yyin} at it. 
Note also that switching 1550 input sources via either @samp{yy_switch_to_buffer()} or @samp{yywrap()} 1551 does @emph{not} change the start condition. 1552 1553 @example 1554 void yy_delete_buffer( YY_BUFFER_STATE buffer ) 1555 @end example 1556 1557 @noindent 1558 is used to reclaim the storage associated with a buffer. 1559 You can also clear the current contents of a buffer using: 1560 1561 @example 1562 void yy_flush_buffer( YY_BUFFER_STATE buffer ) 1563 @end example 1564 1565 This function discards the buffer's contents, so the next time the 1566 scanner attempts to match a token from the buffer, it will first fill 1567 the buffer anew using @code{YY_INPUT}. 1568 1569 @samp{yy_new_buffer()} is an alias for @samp{yy_create_buffer()}, 1570 provided for compatibility with the C++ use of @code{new} and @code{delete} 1571 for creating and destroying dynamic objects. 1572 1573 Finally, the @code{YY_CURRENT_BUFFER} macro returns a 1574 @code{YY_BUFFER_STATE} handle to the current buffer. 1575 1576 Here is an example of using these features for writing a 1577 scanner which expands include files (the @samp{<<EOF>>} feature 1578 is discussed below): 1579 1580 @example 1581 /* the "incl" state is used for picking up the name 1582 * of an include file 1583 */ 1584 %x incl 1585 1586 %@{ 1587 #define MAX_INCLUDE_DEPTH 10 1588 YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH]; 1589 int include_stack_ptr = 0; 1590 %@} 1591 1592 %% 1593 include BEGIN(incl); 1594 1595 [a-z]+ ECHO; 1596 [^a-z\n]*\n? ECHO; 1597 1598 <incl>[ \t]* /* eat the whitespace */ 1599 <incl>[^ \t\n]+ @{ /* got the include file name */ 1600 if ( include_stack_ptr >= MAX_INCLUDE_DEPTH ) 1601 @{ 1602 fprintf( stderr, "Includes nested too deeply" ); 1603 exit( 1 ); 1604 @} 1605 1606 include_stack[include_stack_ptr++] = 1607 YY_CURRENT_BUFFER; 1608 1609 yyin = fopen( yytext, "r" ); 1610 1611 if ( ! 
yyin ) 1612 error( @dots{} ); 1613 1614 yy_switch_to_buffer( 1615 yy_create_buffer( yyin, YY_BUF_SIZE ) ); 1616 1617 BEGIN(INITIAL); 1618 @} 1619 1620 <<EOF>> @{ 1621 if ( --include_stack_ptr < 0 ) 1622 @{ 1623 yyterminate(); 1624 @} 1625 1626 else 1627 @{ 1628 yy_delete_buffer( YY_CURRENT_BUFFER ); 1629 yy_switch_to_buffer( 1630 include_stack[include_stack_ptr] ); 1631 @} 1632 @} 1633 @end example 1634 1635 Three routines are available for setting up input buffers 1636 for scanning in-memory strings instead of files. All of 1637 them create a new input buffer for scanning the string, 1638 and return a corresponding @code{YY_BUFFER_STATE} handle (which 1639 you should delete with @samp{yy_delete_buffer()} when done with 1640 it). They also switch to the new buffer using 1641 @samp{yy_switch_to_buffer()}, so the next call to @samp{yylex()} will 1642 start scanning the string. 1643 1644 @table @samp 1645 @item yy_scan_string(const char *str) 1646 scans a NUL-terminated string. 1647 1648 @item yy_scan_bytes(const char *bytes, int len) 1649 scans @code{len} bytes (including possibly NUL's) starting 1650 at location @var{bytes}. 1651 @end table 1652 1653 Note that both of these functions create and scan a @emph{copy} 1654 of the string or bytes. (This may be desirable, since 1655 @samp{yylex()} modifies the contents of the buffer it is 1656 scanning.) You can avoid the copy by using: 1657 1658 @table @samp 1659 @item yy_scan_buffer(char *base, yy_size_t size) 1660 which scans in place the buffer starting at @var{base}, 1661 consisting of @var{size} bytes, the last two bytes of 1662 which @emph{must} be @code{YY_END_OF_BUFFER_CHAR} (ASCII NUL). 1663 These last two bytes are not scanned; thus, 1664 scanning consists of @samp{base[0]} through @samp{base[size-2]}, 1665 inclusive. 
1666 1667 If you fail to set up @var{base} in this manner (i.e., 1668 forget the final two @code{YY_END_OF_BUFFER_CHAR} bytes), 1669 then @samp{yy_scan_buffer()} returns a nil pointer instead 1670 of creating a new input buffer. 1671 1672 The type @code{yy_size_t} is an integral type to which you 1673 can cast an integer expression reflecting the size 1674 of the buffer. 1675 @end table 1676 1677 @node End-of-file rules, Miscellaneous, Multiple buffers, Top 1678 @section End-of-file rules 1679 1680 The special rule "<<EOF>>" indicates actions which are to 1681 be taken when an end-of-file is encountered and yywrap() 1682 returns non-zero (i.e., indicates no further files to 1683 process). The action must finish by doing one of four 1684 things: 1685 1686 @itemize - 1687 @item 1688 assigning @code{yyin} to a new input file (in previous 1689 versions of flex, after doing the assignment you 1690 had to call the special action @code{YY_NEW_FILE}; this is 1691 no longer necessary); 1692 1693 @item 1694 executing a @code{return} statement; 1695 1696 @item 1697 executing the special @samp{yyterminate()} action; 1698 1699 @item 1700 or, switching to a new buffer using 1701 @samp{yy_switch_to_buffer()} as shown in the example 1702 above. 1703 @end itemize 1704 1705 <<EOF>> rules may not be used with other patterns; they 1706 may only be qualified with a list of start conditions. If 1707 an unqualified <<EOF>> rule is given, it applies to @emph{all} 1708 start conditions which do not already have <<EOF>> 1709 actions. To specify an <<EOF>> rule for only the initial 1710 start condition, use 1711 1712 @example 1713 <INITIAL><<EOF>> 1714 @end example 1715 1716 These rules are useful for catching things like unclosed 1717 comments. 
An example:

@example
%x quote
%%

@dots{}other rules for dealing with quotes@dots{}

<quote><<EOF>>   @{
         error( "unterminated quote" );
         yyterminate();
         @}
<<EOF>>  @{
         if ( *++filelist )
             yyin = fopen( *filelist, "r" );
         else
            yyterminate();
         @}
@end example

@node Miscellaneous, User variables, End-of-file rules, Top
@section Miscellaneous macros

The macro @code{YY_USER_ACTION} can be defined to provide an
action which is always executed prior to the matched
rule's action.  For example, it could be #define'd to call
a routine to convert @code{yytext} to lower-case.  When
@code{YY_USER_ACTION} is invoked, the variable @code{yy_act} gives the
number of the matched rule (rules are numbered starting
with 1).  Suppose you want to profile how often each of
your rules is matched.  The following would do the trick:

@example
#define YY_USER_ACTION ++ctr[yy_act]
@end example

@noindent
where @code{ctr} is an array to hold the counts for the different
rules.  Note that the macro @code{YY_NUM_RULES} gives the total number
of rules (including the default rule, even if you use @samp{-s}), so
a correct declaration for @code{ctr} is:

@example
int ctr[YY_NUM_RULES];
@end example

The macro @code{YY_USER_INIT} may be defined to provide an action
which is always executed before the first scan (and before
the scanner's internal initializations are done).  For
example, it could be used to call a routine to read in a
data table or open a logging file.

The macro @samp{yy_set_interactive(is_interactive)} can be used
to control whether the current buffer is considered
@emph{interactive}.
An interactive buffer is processed more slowly,
but must be used when the scanner's input source is indeed
interactive to avoid problems due to waiting to fill
buffers (see the discussion of the @samp{-I} flag below).  A
non-zero value in the macro invocation marks the buffer as
interactive, a zero value as non-interactive.  Note that
use of this macro overrides @samp{%option always-interactive} or
@samp{%option never-interactive} (see Options below).
@samp{yy_set_interactive()} must be invoked prior to beginning to
scan the buffer that is (or is not) to be considered
interactive.

The macro @samp{yy_set_bol(at_bol)} can be used to control
whether the current buffer's scanning context for the next
token match is done as though at the beginning of a line.
A non-zero macro argument makes rules anchored with '^'
active, while a zero argument makes '^' rules inactive.

The macro @samp{YY_AT_BOL()} returns true if the next token
scanned from the current buffer will have '^' rules
active, false otherwise.

In the generated scanner, the actions are all gathered in
one large switch statement and separated using @code{YY_BREAK},
which may be redefined.  By default, it is simply a
"break", to separate each rule's action from the following
rule's.  Redefining @code{YY_BREAK} allows, for example, C++
users to #define YY_BREAK to do nothing (while being very
careful that every rule ends with a "break" or a
"return"!) to avoid unreachable-statement warnings in cases
where, because a rule's action ends with "return",
the @code{YY_BREAK} is inaccessible.

@node User variables, YACC interface, Miscellaneous, Top
@section Values available to the user

This section summarizes the various values available to
the user in the rule actions.

@itemize -
@item
@samp{char *yytext} holds the text of the current token.
1811 It may be modified but not lengthened (you cannot 1812 append characters to the end). 1813 1814 If the special directive @samp{%array} appears in the 1815 first section of the scanner description, then 1816 @code{yytext} is instead declared @samp{char yytext[YYLMAX]}, 1817 where @code{YYLMAX} is a macro definition that you can 1818 redefine in the first section if you don't like the 1819 default value (generally 8KB). Using @samp{%array} 1820 results in somewhat slower scanners, but the value 1821 of @code{yytext} becomes immune to calls to @samp{input()} and 1822 @samp{unput()}, which potentially destroy its value when 1823 @code{yytext} is a character pointer. The opposite of 1824 @samp{%array} is @samp{%pointer}, which is the default. 1825 1826 You cannot use @samp{%array} when generating C++ scanner 1827 classes (the @samp{-+} flag). 1828 1829 @item 1830 @samp{int yyleng} holds the length of the current token. 1831 1832 @item 1833 @samp{FILE *yyin} is the file which by default @code{flex} reads 1834 from. It may be redefined but doing so only makes 1835 sense before scanning begins or after an EOF has 1836 been encountered. Changing it in the midst of 1837 scanning will have unexpected results since @code{flex} 1838 buffers its input; use @samp{yyrestart()} instead. Once 1839 scanning terminates because an end-of-file has been 1840 seen, you can assign @code{yyin} at the new input file and 1841 then call the scanner again to continue scanning. 1842 1843 @item 1844 @samp{void yyrestart( FILE *new_file )} may be called to 1845 point @code{yyin} at the new input file. The switch-over 1846 to the new file is immediate (any previously 1847 buffered-up input is lost). Note that calling 1848 @samp{yyrestart()} with @code{yyin} as an argument thus throws 1849 away the current input buffer and continues 1850 scanning the same input file. 1851 1852 @item 1853 @samp{FILE *yyout} is the file to which @samp{ECHO} actions are 1854 done. It can be reassigned by the user. 

@item
@code{YY_CURRENT_BUFFER} returns a @code{YY_BUFFER_STATE} handle
to the current buffer.

@item
@code{YY_START} returns an integer value corresponding to
the current start condition. You can subsequently
use this value with @code{BEGIN} to return to that start
condition.
@end itemize

@node YACC interface, Options, User variables, Top
@section Interfacing with @code{yacc}

One of the main uses of @code{flex} is as a companion to the @code{yacc}
parser-generator. @code{yacc} parsers expect to call a routine
named @samp{yylex()} to find the next input token. The routine
is supposed to return the type of the next token as well
as putting any associated value in the global @code{yylval}. To
use @code{flex} with @code{yacc}, one specifies the @samp{-d} option to @code{yacc} to
instruct it to generate the file @file{y.tab.h} containing
definitions of all the @samp{%tokens} appearing in the @code{yacc} input.
This file is then included in the @code{flex} scanner. For
example, if one of the tokens is "TOK_NUMBER", part of the
scanner might look like:

@example
%@{
#include <stdlib.h>   /* for atoi() */
#include "y.tab.h"
%@}

%%

[0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
@end example

@node Options, Performance, YACC interface, Top
@section Options
@code{flex} has the following options:

@table @samp
@item -b
Generate backing-up information to @file{lex.backup}.
This is a list of scanner states which require
backing up and the input characters on which they
do so. By adding rules one can remove backing-up
states. If @emph{all} backing-up states are eliminated
and @samp{-Cf} or @samp{-CF} is used, the generated scanner will
run faster (see the @samp{-p} flag).
Only users who wish
to squeeze every last cycle out of their scanners
need worry about this option. (See the section on
Performance Considerations below.)

@item -c
is a do-nothing, deprecated option included for
POSIX compliance.

@item -d
makes the generated scanner run in @dfn{debug} mode.
Whenever a pattern is recognized and the global
@code{yy_flex_debug} is non-zero (which is the default),
the scanner will write to @code{stderr} a line of the
form:

@example
--accepting rule at line 53 ("the matched text")
@end example

The line number refers to the location of the rule
in the file defining the scanner (i.e., the file
that was fed to flex). Messages are also generated
when the scanner backs up, accepts the default
rule, reaches the end of its input buffer (or
encounters a NUL; at this point, the two look the
same as far as the scanner's concerned), or reaches
an end-of-file.

@item -f
specifies @dfn{fast scanner}. No table compression is
done and stdio is bypassed. The result is large
but fast. This option is equivalent to @samp{-Cfr} (see
below).

@item -h
generates a "help" summary of @code{flex}'s options to
@code{stdout} and then exits. @samp{-?} and @samp{--help} are synonyms
for @samp{-h}.

@item -i
instructs @code{flex} to generate a @emph{case-insensitive}
scanner. The case of letters given in the @code{flex} input
patterns will be ignored, and tokens in the input
will be matched regardless of case. The matched
text given in @code{yytext} will keep its original case
(i.e., it will not be folded).

@item -l
turns on maximum compatibility with the original
AT&T @code{lex} implementation. Note that this does not
mean @emph{full} compatibility.
Use of this option costs
a considerable amount of performance, and it cannot
be used with the @samp{-+}, @samp{-f}, @samp{-F}, @samp{-Cf}, or @samp{-CF} options.
For details on the compatibilities it provides, see
the section "Incompatibilities With Lex And POSIX"
below. This option also results in the name
@code{YY_FLEX_LEX_COMPAT} being #define'd in the generated
scanner.

@item -n
is another do-nothing, deprecated option included
only for POSIX compliance.

@item -p
generates a performance report to stderr. The
report consists of comments regarding features of
the @code{flex} input file which will cause a serious loss
of performance in the resulting scanner. If you
give the flag twice, you will also get comments
regarding features that lead to minor performance
losses.

Note that the use of @code{REJECT}, @samp{%option yylineno} and
variable trailing context (see the Deficiencies / Bugs section below)
entails a substantial performance penalty; use of @samp{yymore()},
the @samp{^} operator, and the @samp{-I} flag entails minor performance
penalties.

@item -s
causes the @dfn{default rule} (that unmatched scanner
input is echoed to @code{stdout}) to be suppressed. If
the scanner encounters input that does not match
any of its rules, it aborts with an error. This
option is useful for finding holes in a scanner's
rule set.

@item -t
instructs @code{flex} to write the scanner it generates to
standard output instead of @file{lex.yy.c}.

@item -v
specifies that @code{flex} should write to @code{stderr} a
summary of statistics regarding the scanner it
generates.
Most of the statistics are meaningless to
the casual @code{flex} user, but the first line identifies
the version of @code{flex} (same as reported by @samp{-V}), and
the next line the flags used when generating the
scanner, including those that are on by default.

@item -w
suppresses warning messages.

@item -B
instructs @code{flex} to generate a @emph{batch} scanner, the
opposite of @emph{interactive} scanners generated by @samp{-I}
(see below). In general, you use @samp{-B} when you are
@emph{certain} that your scanner will never be used
interactively, and you want to squeeze a @emph{little} more
performance out of it. If your goal is instead to
squeeze out a @emph{lot} more performance, you should be
using the @samp{-Cf} or @samp{-CF} options (discussed below),
which turn on @samp{-B} automatically anyway.

@item -F
specifies that the @dfn{fast} scanner table
representation should be used (and stdio bypassed). This
representation is about as fast as the full table
representation (@samp{-f}), and for some sets of patterns
will be considerably smaller (and for others,
larger). In general, if the pattern set contains
both "keywords" and a catch-all, "identifier" rule,
such as in the set:

@example
"case"    return TOK_CASE;
"switch"  return TOK_SWITCH;
...
"default" return TOK_DEFAULT;
[a-z]+    return TOK_ID;
@end example

@noindent
then you're better off using the full table
representation. If only the "identifier" rule is
present and you then use a hash table or some such to
detect the keywords, you're better off using @samp{-F}.

This option is equivalent to @samp{-CFr} (see below). It
cannot be used with @samp{-+}.

@item -I
instructs @code{flex} to generate an @emph{interactive} scanner.
An interactive scanner is one that only looks ahead
to decide what token has been matched if it
absolutely must. It turns out that always looking one
extra character ahead, even if the scanner has
already seen enough text to disambiguate the
current token, is a bit faster than only looking ahead
when necessary. But scanners that always look
ahead give dreadful interactive performance; for
example, when a user types a newline, it is not
recognized as a newline token until they enter
@emph{another} token, which often means typing in another
whole line.

@code{Flex} scanners default to @emph{interactive} unless you use
the @samp{-Cf} or @samp{-CF} table-compression options (see
below). That's because if you're looking for
high-performance you should be using one of these
options, so if you didn't, @code{flex} assumes you'd
rather trade off a bit of run-time performance for
intuitive interactive behavior. Note also that you
@emph{cannot} use @samp{-I} in conjunction with @samp{-Cf} or @samp{-CF}.
Thus, this option is not really needed; it is on by
default for all those cases in which it is allowed.

You can force a scanner to @emph{not} be interactive by
using @samp{-B} (see above).

@item -L
instructs @code{flex} not to generate @samp{#line} directives.
Without this option, @code{flex} peppers the generated
scanner with @samp{#line} directives so error messages in
the actions will be correctly located with respect
to either the original @code{flex} input file (if the
errors are due to code in the input file), or
@file{lex.yy.c} (if the errors are @code{flex}'s fault -- you
should report these sorts of errors to the email
address given below).

@item -T
makes @code{flex} run in @code{trace} mode.
It will generate a
lot of messages to @code{stderr} concerning the form of
the input and the resultant non-deterministic and
deterministic finite automata. This option is
mostly for use in maintaining @code{flex}.

@item -V
prints the version number to @code{stdout} and exits.
@samp{--version} is a synonym for @samp{-V}.

@item -7
instructs @code{flex} to generate a 7-bit scanner, i.e.,
one which can only recognize 7-bit characters in
its input. The advantage of using @samp{-7} is that the
scanner's tables can be up to half the size of
those generated using the @samp{-8} option (see below).
The disadvantage is that such scanners often hang
or crash if their input contains an 8-bit
character.

Note, however, that unless you generate your
scanner using the @samp{-Cf} or @samp{-CF} table compression options,
use of @samp{-7} will save only a small amount of table
space, and make your scanner considerably less
portable. @code{Flex}'s default behavior is to generate
an 8-bit scanner unless you use the @samp{-Cf} or @samp{-CF} options, in
which case @code{flex} defaults to generating 7-bit
scanners unless your site was always configured to
generate 8-bit scanners (as will often be the case
with non-USA sites). You can tell whether flex
generated a 7-bit or an 8-bit scanner by inspecting
the flag summary in the @samp{-v} output as described
above.

Note that if you use @samp{-Cfe} or @samp{-CFe} (those table
compression options, but also using equivalence
classes as discussed below), flex still
defaults to generating an 8-bit scanner, since
usually with these compression options full 8-bit
tables are not much more expensive than 7-bit
tables.

@item -8
instructs @code{flex} to generate an 8-bit scanner, i.e.,
one which can recognize 8-bit characters.
This
flag is only needed for scanners generated using
@samp{-Cf} or @samp{-CF}, as otherwise flex defaults to
generating an 8-bit scanner anyway.

See the discussion of @samp{-7} above for flex's default
behavior and the tradeoffs between 7-bit and 8-bit
scanners.

@item -+
specifies that you want flex to generate a C++
scanner class. See the section on Generating C++
Scanners below for details.

@item -C[aefFmr]
controls the degree of table compression and, more
generally, trade-offs between small scanners and
fast scanners.

@samp{-Ca} ("align") instructs flex to trade off larger
tables in the generated scanner for faster
performance because the elements of the tables are better
aligned for memory access and computation. On some
RISC architectures, fetching and manipulating
long-words is more efficient than with smaller-sized
units such as shortwords. This option can double
the size of the tables used by your scanner.

@samp{-Ce} directs @code{flex} to construct @dfn{equivalence classes},
i.e., sets of characters which have identical
lexical properties (for example, if the only appearance
of digits in the @code{flex} input is in the character
class "[0-9]" then the digits '0', '1', @dots{}, '9'
will all be put in the same equivalence class).
Equivalence classes usually give dramatic
reductions in the final table/object file sizes
(typically a factor of 2-5) and are pretty cheap
performance-wise (one array look-up per character
scanned).

@samp{-Cf} specifies that the @emph{full} scanner tables should
be generated - @code{flex} should not compress the tables
by taking advantage of similar transition
functions for different states.

@samp{-CF} specifies that the alternate fast scanner
representation (described above under the @samp{-F} flag)
should be used. This option cannot be used with
@samp{-+}.

@samp{-Cm} directs @code{flex} to construct @dfn{meta-equivalence
classes}, which are sets of equivalence classes (or
characters, if equivalence classes are not being
used) that are commonly used together.
Meta-equivalence classes are often a big win when using
compressed tables, but they have a moderate
performance impact (one or two "if" tests and one array
look-up per character scanned).

@samp{-Cr} causes the generated scanner to @emph{bypass} use of
the standard I/O library (stdio) for input.
Instead of calling @samp{fread()} or @samp{getc()}, the scanner
will use the @samp{read()} system call, resulting in a
performance gain which varies from system to
system, but in general is probably negligible unless
you are also using @samp{-Cf} or @samp{-CF}. Using @samp{-Cr} can cause
strange behavior if, for example, you read from
@code{yyin} using stdio prior to calling the scanner
(because the scanner will miss whatever text your
previous reads left in the stdio input buffer).

@samp{-Cr} has no effect if you define @code{YY_INPUT} (see The
Generated Scanner above).

A lone @samp{-C} specifies that the scanner tables should
be compressed but neither equivalence classes nor
meta-equivalence classes should be used.

The options @samp{-Cf} or @samp{-CF} and @samp{-Cm} do not make sense
together - there is no opportunity for
meta-equivalence classes if the table is not being
compressed. Otherwise the options may be freely
mixed, and are cumulative.

The default setting is @samp{-Cem}, which specifies that
@code{flex} should generate equivalence classes and
meta-equivalence classes.
This setting provides the
highest degree of table compression. You can trade
off faster-executing scanners at the cost of larger
tables with the following generally being true:

@example
slowest & smallest
      -Cem
      -Cm
      -Ce
      -C
      -C@{f,F@}e
      -C@{f,F@}
      -C@{f,F@}a
fastest & largest
@end example

Note that scanners with the smallest tables are
usually generated and compiled the quickest, so
during development you will usually want to use the
default, maximal compression.

@samp{-Cfe} is often a good compromise between speed and
size for production scanners.

@item -ooutput
directs flex to write the scanner to the file
@file{output} instead of @file{lex.yy.c}. If you combine @samp{-o} with
the @samp{-t} option, then the scanner is written to
@code{stdout} but its @samp{#line} directives (see the @samp{-L} option
above) refer to the file @file{output}.

@item -Pprefix
changes the default @samp{yy} prefix used by @code{flex} for all
globally-visible variable and function names to
instead be @var{prefix}. For example, @samp{-Pfoo} changes the
name of @code{yytext} to @code{footext}. It also changes the
name of the default output file from @file{lex.yy.c} to
@file{lex.foo.c}. Here are all of the names affected:

@example
yy_create_buffer
yy_delete_buffer
yy_flex_debug
yy_init_buffer
yy_flush_buffer
yy_load_buffer_state
yy_switch_to_buffer
yyin
yyleng
yylex
yylineno
yyout
yyrestart
yytext
yywrap
@end example

(If you are using a C++ scanner, then only @code{yywrap}
and @code{yyFlexLexer} are affected.) Within your scanner
itself, you can still refer to the global variables
and functions using either version of their name;
but externally, they have the modified name.
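
As a rough sketch of what the external renaming means in practice,
a separately compiled driver for a scanner generated with @samp{-Pfoo}
might look something like the following. The file names @file{driver.c}
and @file{input.txt} are invented for this illustration, and the
declarations assume an ordinary C (not C++) scanner; the driver must
of course be linked with the generated scanner:

@example
/* driver.c: hypothetical caller of a scanner built with -Pfoo */
#include <stdio.h>

extern FILE *fooin;         /* yyin, under the "foo" prefix */
extern int foolex( void );  /* yylex, under the "foo" prefix */

int main( void )
    @{
    fooin = fopen( "input.txt", "r" );
    if ( ! fooin )
        return 1;

    while ( foolex() != 0 )
        ;   /* ignore the token values in this sketch */

    fclose( fooin );
    return 0;
    @}
@end example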

This option lets you easily link together multiple
@code{flex} programs into the same executable. Note,
though, that using this option also renames
@samp{yywrap()}, so you now @emph{must} either provide your own
(appropriately-named) version of the routine for
your scanner, or use @samp{%option noyywrap}, as linking
with @samp{-lfl} no longer provides one for you by
default.

@item -Sskeleton_file
overrides the default skeleton file from which @code{flex}
constructs its scanners. You'll never need this
option unless you are doing @code{flex} maintenance or
development.
@end table

@code{flex} also provides a mechanism for controlling options
within the scanner specification itself, rather than from
the flex command-line. This is done by including @samp{%option}
directives in the first section of the scanner
specification. You can specify multiple options with a single
@samp{%option} directive, and multiple directives in the first
section of your flex input file. Most options are given
simply as names, optionally preceded by the word "no"
(with no intervening whitespace) to negate their meaning.
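
For instance, a first section along the following lines would be one
way to combine several options over two directives. The particular
choice of options is purely illustrative and is not taken from any
real scanner; the individual option names are listed below:

@example
%option 8bit batch
%option noyywrap nodefault
@end example

@noindent
Here @samp{noyywrap} and @samp{nodefault} are the negated forms of the
@samp{yywrap} and @samp{default} options.
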
A number are equivalent to flex flags or their negation:

@example
7bit            -7 option
8bit            -8 option
align           -Ca option
backup          -b option
batch           -B option
c++             -+ option

caseful or
case-sensitive  opposite of -i (default)

case-insensitive or
caseless        -i option

debug           -d option
default         opposite of -s option
ecs             -Ce option
fast            -F option
full            -f option
interactive     -I option
lex-compat      -l option
meta-ecs        -Cm option
perf-report     -p option
read            -Cr option
stdout          -t option
verbose         -v option
warn            opposite of -w option
                (use "%option nowarn" for -w)

array           equivalent to "%array"
pointer         equivalent to "%pointer" (default)
@end example

Some @samp{%option}s provide features otherwise not available:

@table @samp
@item always-interactive
instructs flex to generate a scanner which always
considers its input "interactive". Normally, on
each new input file the scanner calls @samp{isatty()} in
an attempt to determine whether the scanner's input
source is interactive and thus should be read a
character at a time. When this option is used,
however, no such call is made.

@item main
directs flex to provide a default @samp{main()} program
for the scanner, which simply calls @samp{yylex()}. This
option implies @code{noyywrap} (see below).

@item never-interactive
instructs flex to generate a scanner which never
considers its input "interactive" (again, no call
made to @samp{isatty()}). This is the opposite of
@samp{always-interactive}.

@item stack
enables the use of start condition stacks (see
Start Conditions above).
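
A small sketch of how the stack routines might be used once this
option is set; the @code{comment} start condition is just an
illustrative name:

@example
%option stack
%x comment
%%
"/*"            yy_push_state( comment );
<comment>"*/"   yy_pop_state();
<comment>.|\n   /* discard the comment text */
@end example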

@item stdinit
if unset (i.e., @samp{%option nostdinit}) initializes @code{yyin}
and @code{yyout} to nil @code{FILE} pointers, instead of @code{stdin}
and @code{stdout}.

@item yylineno
directs @code{flex} to generate a scanner that maintains the number
of the current line read from its input in the global variable
@code{yylineno}. This option is implied by @samp{%option lex-compat}.

@item yywrap
if unset (i.e., @samp{%option noyywrap}), makes the
scanner not call @samp{yywrap()} upon an end-of-file, but
simply assume that there are no more files to scan
(until the user points @code{yyin} at a new file and calls
@samp{yylex()} again).
@end table

@code{flex} scans your rule actions to determine whether you use
the @code{REJECT} or @samp{yymore()} features. The @code{reject} and @code{yymore}
options are available to override its decision as to
whether you use the features, either by setting them (e.g.,
@samp{%option reject}) to indicate the feature is indeed used, or
unsetting them to indicate it actually is not used (e.g.,
@samp{%option noyymore}).

Three options take string-delimited values, offset with '=':

@example
%option outfile="ABC"
@end example

@noindent
is equivalent to @samp{-oABC}, and

@example
%option prefix="XYZ"
@end example

@noindent
is equivalent to @samp{-PXYZ}.

Finally,

@example
%option yyclass="foo"
@end example

@noindent
only applies when generating a C++ scanner (@samp{-+} option). It
informs @code{flex} that you have derived @samp{foo} as a subclass of
@code{yyFlexLexer} so @code{flex} will place your actions in the member
function @samp{foo::yylex()} instead of @samp{yyFlexLexer::yylex()}.
It also generates a @samp{yyFlexLexer::yylex()} member function that
emits a run-time error (by invoking @samp{yyFlexLexer::LexerError()})
if called. See Generating C++ Scanners, below, for additional
information.

A number of options are available for lint purists who
want to suppress the appearance of unneeded routines in
the generated scanner. Each of the following, if unset,
results in the corresponding routine not appearing in the
generated scanner:

@example
input, unput
yy_push_state, yy_pop_state, yy_top_state
yy_scan_buffer, yy_scan_bytes, yy_scan_string
@end example

@noindent
(though @samp{yy_push_state()} and friends won't appear anyway
unless you use @samp{%option stack}).

@node Performance, C++, Options, Top
@section Performance considerations

The main design goal of @code{flex} is that it generate
high-performance scanners. It has been optimized for dealing
well with large sets of rules. Aside from the effects on
scanner speed of the table compression @samp{-C} options outlined
above, there are a number of options/actions which degrade
performance. These are, from most expensive to least:

@example
REJECT
%option yylineno
arbitrary trailing context

pattern sets that require backing up
%array
%option interactive
%option always-interactive

'^' beginning-of-line operator
yymore()
@end example

with the first three all being quite expensive and the
last two being quite cheap. Note also that @samp{unput()} is
implemented as a routine call that potentially does quite
a bit of work, while @samp{yyless()} is a quite-cheap macro; so
if just putting back some excess text you scanned, use
@samp{yyless()}.

@code{REJECT} should be avoided at all costs when performance is
important.
It is a particularly expensive option.

Getting rid of backing up is messy and often may be an
enormous amount of work for a complicated scanner. In
principle, one begins by using the @samp{-b} flag to generate a
@file{lex.backup} file. For example, on the input

@example
%%
foo        return TOK_KEYWORD;
foobar     return TOK_KEYWORD;
@end example

@noindent
the file looks like:

@example
State #6 is non-accepting -
 associated rule line numbers:
       2       3
 out-transitions: [ o ]
 jam-transitions: EOF [ \001-n  p-\177 ]

State #8 is non-accepting -
 associated rule line numbers:
       3
 out-transitions: [ a ]
 jam-transitions: EOF [ \001-`  b-\177 ]

State #9 is non-accepting -
 associated rule line numbers:
       3
 out-transitions: [ r ]
 jam-transitions: EOF [ \001-q  s-\177 ]

Compressed tables always back up.
@end example

The first few lines tell us that there's a scanner state
in which it can make a transition on an 'o' but not on any
other character, and that in that state the currently
scanned text does not match any rule. The state occurs
when trying to match the rules found at lines 2 and 3 in
the input file. If the scanner is in that state and then
reads something other than an 'o', it will have to back up
to find a rule which is matched. With a bit of
head-scratching one can see that this must be the state it's in
when it has seen "fo". When this has happened, if
anything other than another 'o' is seen, the scanner will
have to back up to simply match the 'f' (by the default
rule).

The comment regarding State #8 indicates there's a problem
when "foob" has been scanned. Indeed, on any character
other than an 'a', the scanner will have to back up to
accept "foo".
Similarly, the comment for State #9
concerns when "fooba" has been scanned and an 'r' does not
follow.

The final comment reminds us that there's no point going
to all the trouble of removing backing up from the rules
unless we're using @samp{-Cf} or @samp{-CF}, since there's no
performance gain doing so with compressed scanners.

The way to remove the backing up is to add "error" rules:

@example
%%
foo         return TOK_KEYWORD;
foobar      return TOK_KEYWORD;

fooba       |
foob        |
fo          @{
            /* false alarm, not really a keyword */
            return TOK_ID;
            @}
@end example

Eliminating backing up among a list of keywords can also
be done using a "catch-all" rule:

@example
%%
foo         return TOK_KEYWORD;
foobar      return TOK_KEYWORD;

[a-z]+      return TOK_ID;
@end example

This is usually the best solution when appropriate.

Backing up messages tend to cascade. With a complicated
set of rules it's not uncommon to get hundreds of
messages. If one can decipher them, though, it often only
takes a dozen or so rules to eliminate the backing up
(though it's easy to make a mistake and have an error rule
accidentally match a valid token). A possible future @code{flex}
feature will be to automatically add rules to eliminate
backing up.

It's important to keep in mind that you gain the benefits
of eliminating backing up only if you eliminate @emph{every}
instance of backing up. Leaving just one means you gain
nothing.

@emph{Variable} trailing context (where both the leading and
trailing parts do not have a fixed length) entails almost
the same performance loss as @code{REJECT} (i.e., substantial).
So when possible a rule like:

@example
%%
mouse|rat/(cat|dog)   run();
@end example

@noindent
is better written:

@example
%%
mouse/cat|dog         run();
rat/cat|dog           run();
@end example

@noindent
or as

@example
%%
mouse|rat/cat         run();
mouse|rat/dog         run();
@end example

Note that here the special '|' action does @emph{not} provide any
savings, and can even make things worse (see Deficiencies
/ Bugs below).

Another area where the user can increase a scanner's
performance (and one that's easier to implement) arises from
the fact that the longer the tokens matched, the faster
the scanner will run. This is because with long tokens
the processing of most input characters takes place in the
(short) inner scanning loop, and does not often have to go
through the additional work of setting up the scanning
environment (e.g., @code{yytext}) for the action. Recall the
scanner for C comments:

@example
%x comment
%%
        int line_num = 1;

"/*"         BEGIN(comment);

<comment>[^*\n]*
<comment>"*"+[^*/\n]*
<comment>\n             ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);
@end example

This could be sped up by writing it as:

@example
%x comment
%%
        int line_num = 1;

"/*"         BEGIN(comment);

<comment>[^*\n]*
<comment>[^*\n]*\n      ++line_num;
<comment>"*"+[^*/\n]*
<comment>"*"+[^*/\n]*\n ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);
@end example

Now instead of each newline requiring the processing of
another action, recognizing the newlines is "distributed"
over the other rules to keep the matched text as long as
possible. Note that @emph{adding} rules does @emph{not} slow down the
scanner!
The speed of the scanner is independent of the
number of rules or (modulo the considerations given at the
beginning of this section) how complicated the rules are
with regard to operators such as '*' and '|'.

A final example in speeding up a scanner: suppose you want
to scan through a file containing identifiers and
keywords, one per line and with no other extraneous
characters, and recognize all the keywords. A natural first
approach is:

@example
%%
asm      |
auto     |
break    |
@dots{} etc @dots{}
volatile |
while    /* it's a keyword */

.|\n     /* it's not a keyword */
@end example

To eliminate the back-tracking, introduce a catch-all
rule:

@example
%%
asm      |
auto     |
break    |
@dots{} etc @dots{}
volatile |
while    /* it's a keyword */

[a-z]+   |
.|\n     /* it's not a keyword */
@end example

Now, if it's guaranteed that there's exactly one word per
line, then we can reduce the total number of matches by a
half by merging in the recognition of newlines with that
of the other tokens:

@example
%%
asm\n      |
auto\n     |
break\n    |
@dots{} etc @dots{}
volatile\n |
while\n    /* it's a keyword */

[a-z]+\n   |
.|\n       /* it's not a keyword */
@end example

One has to be careful here, as we have now reintroduced
backing up into the scanner. In particular, while @emph{we} know
that there will never be any characters in the input
stream other than letters or newlines, @code{flex} can't figure
this out, and it will plan for possibly needing to back up
when it has scanned a token like "auto" and then the next
character is something other than a newline or a letter.
Previously it would then just match the "auto" rule and be
done, but now it has no "auto" rule, only an "auto\n" rule.
To eliminate the possibility of backing up, we could
either duplicate all rules but without final newlines, or,
since we never expect to encounter such an input and
therefore don't care how it's classified, we can introduce one
more catch-all rule, this one which doesn't include a
newline:

@example
%%
asm\n      |
auto\n     |
break\n    |
@dots{} etc @dots{}
volatile\n |
while\n    /* it's a keyword */

[a-z]+\n   |
[a-z]+     |
.|\n       /* it's not a keyword */
@end example

Compiled with @samp{-Cf}, this is about as fast as one can get a
@code{flex} scanner to go for this particular problem.

A final note: @code{flex} is slow when matching NUL's,
particularly when a token contains multiple NUL's. It's best to
write rules which match @emph{short} amounts of text if it's
anticipated that the text will often include NUL's.

Another final note regarding performance: as mentioned
above in the section How the Input is Matched, dynamically
resizing @code{yytext} to accommodate huge tokens is a slow
process because it presently requires that the (huge) token
be rescanned from the beginning. Thus if performance is
vital, you should attempt to match "large" quantities of
text but not "huge" quantities, where the cutoff between
the two is at about 8K characters/token.

@node C++, Incompatibilities, Performance, Top
@section Generating C++ scanners

@code{flex} provides two different ways to generate scanners for
use with C++. The first way is to simply compile a
scanner generated by @code{flex} using a C++ compiler instead of a C
compiler. You should not encounter any compilation
errors (please report any you find to the email address
given in the Author section below). You can then use C++
code in your rule actions instead of C code.
Note that
the default input source for your scanner remains @code{yyin},
and default echoing is still done to @code{yyout}. Both of these
remain @samp{FILE *} variables and not C++ @code{streams}.

You can also use @code{flex} to generate a C++ scanner class, using
the @samp{-+} option (or, equivalently, @samp{%option c++}), which
is automatically specified if the name of the flex executable ends
in a @samp{+}, such as @code{flex++}. When using this option, flex
defaults to generating the scanner to the file @file{lex.yy.cc} instead
of @file{lex.yy.c}. The generated scanner includes the header file
@file{FlexLexer.h}, which defines the interface to two C++ classes.

The first class, @code{FlexLexer}, provides an abstract base
class defining the general scanner class interface. It
provides the following member functions:

@table @samp
@item const char* YYText()
returns the text of the most recently matched
token, the equivalent of @code{yytext}.

@item int YYLeng()
returns the length of the most recently matched
token, the equivalent of @code{yyleng}.

@item int lineno() const
returns the current input line number (see @samp{%option yylineno}),
or 1 if @samp{%option yylineno} was not used.

@item void set_debug( int flag )
sets the debugging flag for the scanner, equivalent to assigning to
@code{yy_flex_debug} (see the Options section above). Note that you
must build the scanner using @samp{%option debug} to include debugging
information in it.

@item int debug() const
returns the current setting of the debugging flag.
@end table

Also provided are member functions equivalent to
@samp{yy_switch_to_buffer()}, @samp{yy_create_buffer()} (though the
first argument is an @samp{istream*} object pointer and not a
@samp{FILE*}), @samp{yy_flush_buffer()}, @samp{yy_delete_buffer()},
and @samp{yyrestart()} (again, the first argument is an @samp{istream*}
object pointer).

The second class defined in @file{FlexLexer.h} is @code{yyFlexLexer},
which is derived from @code{FlexLexer}. It defines the following
additional member functions:

@table @samp
@item yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )
constructs a @code{yyFlexLexer} object using the given
streams for input and output. If not specified,
the streams default to @code{cin} and @code{cout}, respectively.

@item virtual int yylex()
performs the same role as @samp{yylex()} does for ordinary
flex scanners: it scans the input stream, consuming
tokens, until a rule's action returns a value. If you derive a subclass
@var{S}
from @code{yyFlexLexer}
and want to access the member functions and variables of
@var{S}
inside @samp{yylex()},
then you need to use @samp{%option yyclass="@var{S}"}
to inform @code{flex}
that you will be using that subclass instead of @code{yyFlexLexer}.
In this case, rather than generating @samp{yyFlexLexer::yylex()},
@code{flex} generates @samp{@var{S}::yylex()}
(and also generates a dummy @samp{yyFlexLexer::yylex()}
that calls @samp{yyFlexLexer::LexerError()}
if called).

@item virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)
reassigns @code{yyin} to @code{new_in}
(if non-nil)
and @code{yyout} to @code{new_out}
(ditto), deleting the previous input buffer if @code{yyin}
is reassigned.

@item int yylex( istream* new_in = 0, ostream* new_out = 0 )
first switches the input streams via @samp{switch_streams( new_in, new_out )}
and then returns the value of @samp{yylex()}.
@end table

In addition, @code{yyFlexLexer} defines the following protected
virtual functions which you can redefine in derived
classes to tailor the scanner:

@table @samp
@item virtual int LexerInput( char* buf, int max_size )
reads up to @samp{max_size} characters into @var{buf} and
returns the number of characters read. To indicate
end-of-input, return 0 characters. Note that
"interactive" scanners (see the @samp{-B} and @samp{-I} flags)
define the macro @code{YY_INTERACTIVE}. If you redefine
@code{LexerInput()} and need to take different actions
depending on whether or not the scanner might be
scanning an interactive input source, you can test
for the presence of this name via @samp{#ifdef}.

@item virtual void LexerOutput( const char* buf, int size )
writes out @var{size} characters from the buffer @var{buf},
which, while NUL-terminated, may also contain
"internal" NUL's if the scanner's rules can match
text with NUL's in them.

@item virtual void LexerError( const char* msg )
reports a fatal error message. The default version
of this function writes the message to the stream
@code{cerr} and exits.
@end table

Note that a @code{yyFlexLexer} object contains its @emph{entire}
scanning state. Thus you can use such objects to create
reentrant scanners. You can instantiate multiple instances of
the same @code{yyFlexLexer} class, and you can also combine
multiple C++ scanner classes together in the same program
using the @samp{-P} option discussed above.
Finally, note that the @samp{%array} feature is not available to
C++ scanner classes; you must use @samp{%pointer} (the default).
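
As a rough illustration of the @code{LexerInput()} contract described
above (fill @var{buf} with at most @samp{max_size} characters, return
the count, return 0 at end-of-input), here is a minimal sketch. The
class @code{LowercaseSource} and the helper @code{drain()} are
hypothetical names invented for this illustration. The class is
deliberately standalone so that it compiles without @file{FlexLexer.h};
in a real scanner the body of @code{LexerInput()} would presumably sit
in a class derived from @code{yyFlexLexer}.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a class derived from yyFlexLexer.
// Only the LexerInput() contract is modelled here.
class LowercaseSource {
public:
    explicit LowercaseSource(std::istream& in) : in_(in) {}

    // Reads up to max_size characters into buf, folds them to lower
    // case, and returns the number of characters read (0 = end-of-input).
    int LexerInput(char* buf, int max_size) {
        if (max_size <= 0)
            return 0;
        in_.read(buf, max_size);
        const int got = static_cast<int>(in_.gcount());
        for (int i = 0; i < got; ++i)
            buf[i] = static_cast<char>(
                std::tolower(static_cast<unsigned char>(buf[i])));
        return got;
    }

private:
    std::istream& in_;
};

// Hypothetical helper: calls LexerInput() repeatedly, the way a scanner
// would refill its buffer, and collects everything that was delivered.
std::string drain(LowercaseSource& src, int chunk) {
    std::string out;
    std::vector<char> buf(chunk > 0 ? chunk : 1);
    int n;
    while ((n = src.LexerInput(buf.data(), chunk)) > 0)
        out.append(buf.data(), n);
    return out;
}
```

Because the transformation never changes the number of characters,
a return value of 0 can only mean that the underlying stream is
exhausted. A filter that drops characters would need to keep reading
until it has something to deliver, so that it does not signal
end-of-input by accident.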

Here is an example of a simple C++ scanner:

@example
    // An example of using the flex C++ scanner class.

%@{
int mylineno = 0;
%@}

string  \"[^\n"]+\"

ws      [ \t]+

alpha   [A-Za-z]
dig     [0-9]
name    (@{alpha@}|@{dig@}|\$)(@{alpha@}|@{dig@}|[_.\-/$])*
num1    [-+]?@{dig@}+\.?([eE][-+]?@{dig@}+)?
num2    [-+]?@{dig@}*\.@{dig@}+([eE][-+]?@{dig@}+)?
number  @{num1@}|@{num2@}

%%

@{ws@}    /* skip blanks and tabs */

"/*"    @{
        int c;

        while((c = yyinput()) != 0)
            @{
            if(c == '\n')
                ++mylineno;

            else if(c == '*')
                @{
                if((c = yyinput()) == '/')
                    break;
                else
                    unput(c);
                @}
            @}
        @}

@{number@}  cout << "number " << YYText() << '\n';

\n        mylineno++;

@{name@}    cout << "name " << YYText() << '\n';

@{string@}  cout << "string " << YYText() << '\n';

%%

int main( int /* argc */, char** /* argv */ )
    @{
    FlexLexer* lexer = new yyFlexLexer;
    while(lexer->yylex() != 0)
        ;
    return 0;
    @}
@end example

If you want to create multiple (different) lexer classes,
you use the @samp{-P} flag (or the @samp{prefix=} option) to rename each
@code{yyFlexLexer} to some other @code{xxFlexLexer}. You can then
include @samp{<FlexLexer.h>} in your other sources once per lexer
class, first renaming @code{yyFlexLexer} as follows:

@example
#undef yyFlexLexer
#define yyFlexLexer xxFlexLexer
#include <FlexLexer.h>

#undef yyFlexLexer
#define yyFlexLexer zzFlexLexer
#include <FlexLexer.h>
@end example

if, for example, you used @samp{%option prefix="xx"} for one of
your scanners and @samp{%option prefix="zz"} for the other.

IMPORTANT: the present form of the scanning class is
@emph{experimental} and may change considerably between major
releases.

@node Incompatibilities, Diagnostics, C++, Top
@section Incompatibilities with @code{lex} and POSIX

@code{flex} is a rewrite of the AT&T Unix @code{lex} tool (the two
implementations do not share any code, though), with some
extensions and incompatibilities, both of which are of
concern to those who wish to write scanners acceptable to
either implementation. Flex is fully compliant with the
POSIX @code{lex} specification, except that when using @samp{%pointer}
(the default), a call to @samp{unput()} destroys the contents of
@code{yytext}, which is counter to the POSIX specification.

In this section we discuss all of the known areas of
incompatibility between flex, AT&T lex, and the POSIX
specification.

The @samp{-l} option of @code{flex} turns on maximum compatibility with the
original AT&T @code{lex} implementation, at the cost of a major
loss in the generated scanner's performance. We note
below which incompatibilities can be overcome using the @samp{-l}
option.

@code{flex} is fully compatible with @code{lex} with the following
exceptions:

@itemize -
@item
The undocumented @code{lex} scanner internal variable @code{yylineno}
is not supported unless @samp{-l} or @samp{%option yylineno} is used.
@code{yylineno} should be maintained on a per-buffer basis, rather
than a per-scanner (single global variable) basis. @code{yylineno} is
not part of the POSIX specification.

@item
The @samp{input()} routine is not redefinable, though it
may be called to read characters following whatever
has been matched by a rule. If @samp{input()} encounters
an end-of-file the normal @samp{yywrap()} processing is
done.
A ``real'' end-of-file is returned by
@samp{input()} as @code{EOF}.

Input is instead controlled by defining the
@code{YY_INPUT} macro.

The @code{flex} restriction that @samp{input()} cannot be
redefined is in accordance with the POSIX
specification, which simply does not specify any way of
controlling the scanner's input other than by making
an initial assignment to @code{yyin}.

@item
The @samp{unput()} routine is not redefinable. This
restriction is in accordance with POSIX.

@item
@code{flex} scanners are not as reentrant as @code{lex} scanners.
In particular, if you have an interactive scanner
and an interrupt handler which long-jumps out of
the scanner, and the scanner is subsequently called
again, you may get the following message:

@example
fatal flex scanner internal error--end of buffer missed
@end example

To reenter the scanner, first use

@example
yyrestart( yyin );
@end example

Note that this call will throw away any buffered
input; usually this isn't a problem with an
interactive scanner.

Also note that flex C++ scanner classes @emph{are}
reentrant, so if using C++ is an option for you, you
should use them instead. See "Generating C++
Scanners" above for details.

@item
@samp{output()} is not supported. Output from the @samp{ECHO}
macro is done to the file-pointer @code{yyout} (default
@code{stdout}).

@samp{output()} is not part of the POSIX specification.

@item
@code{lex} does not support exclusive start conditions
(%x), though they are in the POSIX specification.

@item
When definitions are expanded, @code{flex} encloses them
in parentheses. With lex, the following:

@example
NAME    [A-Z][A-Z0-9]*
%%
foo@{NAME@}?      printf( "Found it\n" );
%%
@end example

will not match the string "foo" because when the
macro is expanded the rule is equivalent to
"foo[A-Z][A-Z0-9]*?" and the precedence is such that the
'?' is associated with "[A-Z0-9]*". With @code{flex}, the
rule will be expanded to "foo([A-Z][A-Z0-9]*)?" and
so the string "foo" will match.

Note that if the definition begins with @samp{^} or ends
with @samp{$} then it is @emph{not} expanded with parentheses, to
allow these operators to appear in definitions
without losing their special meanings. But the
@samp{<s>}, @samp{/}, and @samp{<<EOF>>} operators cannot be used in a
@code{flex} definition.

Using @samp{-l} results in the @code{lex} behavior of no
parentheses around the definition.

The POSIX specification is that the definition be enclosed in
parentheses.

@item
Some implementations of @code{lex} allow a rule's action to begin on
a separate line, if the rule's pattern has trailing whitespace:

@example
%%
foo|bar<space here>
  @{ foobar_action(); @}
@end example

@code{flex} does not support this feature.

@item
The @code{lex} @samp{%r} (generate a Ratfor scanner) option is
not supported. It is not part of the POSIX
specification.

@item
After a call to @samp{unput()}, @code{yytext} is undefined until
the next token is matched, unless the scanner was
built using @samp{%array}. This is not the case with @code{lex}
or the POSIX specification. The @samp{-l} option does
away with this incompatibility.

@item
The precedence of the @samp{@{@}} (numeric range) operator
is different. @code{lex} interprets "abc@{1,3@}" as "match
one, two, or three occurrences of 'abc'", whereas
@code{flex} interprets it as "match 'ab' followed by one,
two, or three occurrences of 'c'".
The latter is
in agreement with the POSIX specification.

@item
The precedence of the @samp{^} operator is different. @code{lex}
interprets "^foo|bar" as "match either 'foo' at the
beginning of a line, or 'bar' anywhere", whereas
@code{flex} interprets it as "match either 'foo' or 'bar'
if they come at the beginning of a line". The
latter is in agreement with the POSIX specification.

@item
The special table-size declarations such as @samp{%a}
supported by @code{lex} are not required by @code{flex} scanners;
@code{flex} ignores them.

@item
The name FLEX_SCANNER is #define'd so scanners may
be written for use with either @code{flex} or @code{lex}.
Scanners also include @code{YY_FLEX_MAJOR_VERSION} and
@code{YY_FLEX_MINOR_VERSION} indicating which version of
@code{flex} generated the scanner (for example, for the
2.5 release, these defines would be 2 and 5
respectively).
@end itemize

The following @code{flex} features are not included in @code{lex} or the
POSIX specification:

@example
C++ scanners
%option
start condition scopes
start condition stacks
interactive/non-interactive scanners
yy_scan_string() and friends
yyterminate()
yy_set_interactive()
yy_set_bol()
YY_AT_BOL()
<<EOF>>
<*>
YY_DECL
YY_START
YY_USER_ACTION
YY_USER_INIT
#line directives
%@{@}'s around actions
multiple actions on a line
@end example

@noindent
plus almost all of the flex flags.
The last feature in
the list refers to the fact that with @code{flex} you can put
multiple actions on the same line, separated with
semicolons, while with @code{lex}, the following

@example
foo    handle_foo(); ++num_foos_seen;
@end example

@noindent
is (rather surprisingly) truncated to

@example
foo    handle_foo();
@end example

@code{flex} does not truncate the action. Actions that are not
enclosed in braces are simply terminated at the end of the
line.

@node Diagnostics, Files, Incompatibilities, Top
@section Diagnostics

@table @samp
@item warning, rule cannot be matched
indicates that the given
rule cannot be matched because it follows other rules that
will always match the same text as it. For example, in
the following "foo" cannot be matched because it comes
after an identifier "catch-all" rule:

@example
[a-z]+    got_identifier();
foo       got_foo();
@end example

Using @code{REJECT} in a scanner suppresses this warning.

@item warning, -s option given but default rule can be matched
means that it is possible (perhaps only in a particular
start condition) that the default rule (match any single
character) is the only one that will match a particular
input. Since @samp{-s} was given, presumably this is not
intended.

@item reject_used_but_not_detected undefined
@itemx yymore_used_but_not_detected undefined
These errors can
occur at compile time. They indicate that the scanner
uses @code{REJECT} or @samp{yymore()} but that @code{flex} failed to notice the
fact, meaning that @code{flex} scanned the first two sections
looking for occurrences of these actions and failed to
find any, but somehow you snuck some in (via a #include
file, for example).
Use @samp{%option reject} or @samp{%option yymore}
to indicate to flex that you really do use these features.

@item flex scanner jammed
a scanner compiled with @samp{-s} has
encountered an input string which wasn't matched by any of
its rules. This error can also occur due to internal
problems.

@item token too large, exceeds YYLMAX
your scanner uses @samp{%array}
and one of its rules matched a string longer than the
@code{YYLMAX} constant (8K bytes by default). You can increase the
value by #define'ing @code{YYLMAX} in the definitions section of
your @code{flex} input.

@item scanner requires -8 flag to use the character '@var{x}'
Your
scanner specification includes recognizing the 8-bit
character @var{x} and you did not specify the @samp{-8} flag, and your
scanner defaulted to 7-bit because you used the @samp{-Cf} or @samp{-CF}
table compression options. See the discussion of the @samp{-7}
flag for details.

@item flex scanner push-back overflow
you used @samp{unput()} to push
back so much text that the scanner's buffer could not hold
both the pushed-back text and the current token in @code{yytext}.
Ideally the scanner should dynamically resize the buffer
in this case, but at present it does not.

@item input buffer overflow, can't enlarge buffer because scanner uses REJECT
the scanner was working on matching an
extremely large token and needed to expand the input
buffer. This doesn't work with scanners that use @code{REJECT}.

@item fatal flex scanner internal error--end of buffer missed
This can occur in a scanner which is reentered after a
long-jump has jumped out (or over) the scanner's
activation frame.
Before reentering the scanner, use:

@example
yyrestart( yyin );
@end example

@noindent
or, as noted above, switch to using the C++ scanner class.

@item too many start conditions in <> construct!
you listed
more start conditions in a <> construct than exist (so you
must have listed at least one of them twice).
@end table

@node Files, Deficiencies, Diagnostics, Top
@section Files

@table @file
@item -lfl
library with which scanners must be linked.

@item lex.yy.c
generated scanner (called @file{lexyy.c} on some systems).

@item lex.yy.cc
generated C++ scanner class, when using @samp{-+}.

@item <FlexLexer.h>
header file defining the C++ scanner base class,
@code{FlexLexer}, and its derived class, @code{yyFlexLexer}.

@item flex.skl
skeleton scanner. This file is only used when
building flex, not when flex executes.

@item lex.backup
backing-up information for @samp{-b} flag (called @file{lex.bck}
on some systems).
@end table

@node Deficiencies, See also, Files, Top
@section Deficiencies / Bugs

Some trailing context patterns cannot be properly matched
and generate warning messages ("dangerous trailing
context"). These are patterns where the ending of the first
part of the rule matches the beginning of the second part,
such as "zx*/xy*", where the 'x*' matches the 'x' at the
beginning of the trailing context. (Note that the POSIX
draft states that the text matched by such patterns is
undefined.)

For some trailing context rules, parts which are actually
fixed-length are not recognized as such, leading to the
abovementioned performance loss. In particular, parts
using '|' or @{n@} (such as "foo@{3@}") are always considered
variable-length.

Combining trailing context with the special '|' action can
result in @emph{fixed} trailing context being turned into the
more expensive @emph{variable} trailing context. This happens,
for example, in the following:

@example
%%
abc      |
xyz/def
@end example

Use of @samp{unput()} invalidates yytext and yyleng, unless the
@samp{%array} directive or the @samp{-l} option has been used.

Pattern-matching of NUL's is substantially slower than
matching other characters.

Dynamic resizing of the input buffer is slow, as it
entails rescanning all the text matched so far by the
current (generally huge) token.

Due to both buffering of input and read-ahead, you cannot
intermix calls to <stdio.h> routines, such as, for
example, @samp{getchar()}, with @code{flex} rules and expect it to work.
Call @samp{input()} instead.

The total number of table entries listed by the @samp{-v} flag excludes the
number of table entries needed to determine what rule has
been matched. The number of entries is equal to the
number of DFA states if the scanner does not use @code{REJECT}, and
somewhat greater than the number of states if it does.

@code{REJECT} cannot be used with the @samp{-f} or @samp{-F} options.

The @code{flex} internal algorithms need documentation.

@node See also, Author, Deficiencies, Top
@section See also

@code{lex}(1), @code{yacc}(1), @code{sed}(1), @code{awk}(1).

John Levine, Tony Mason, and Doug Brown: Lex & Yacc;
O'Reilly and Associates. Be sure to get the 2nd edition.

M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator.

Alfred Aho, Ravi Sethi and Jeffrey Ullman: Compilers:
Principles, Techniques and Tools; Addison-Wesley (1986).
Describes the pattern-matching techniques used by @code{flex}
(deterministic finite automata).

@node Author, , See also, Top
@section Author

Vern Paxson, with the help of many ideas and much inspiration from
Van Jacobson. Original version by Jef Poskanzer. The fast table
representation is a partial implementation of a design done by Van
Jacobson. The implementation was done by Kevin Gong and Vern Paxson.

Thanks to the many @code{flex} beta-testers, feedbackers, and
contributors, especially Francois Pinard, Casey Leedom, Stan
Adermann, Terry Allen, David Barker-Plummer, John Basrai, Nelson
H.F. Beebe, @samp{benson@@odi.com}, Karl Berry, Peter A. Bigot,
Simon Blanchard, Keith Bostic, Frederic Brehm, Ian Brockbank, Kin
Cho, Nick Christopher, Brian Clapper, J.T. Conklin, Jason Coughlin,
Bill Cox, Nick Cropper, Dave Curtis, Scott David Daniels, Chris
G. Demetriou, Theo Deraadt, Mike Donahue, Chuck Doucette, Tom Epperly,
Leo Eskin, Chris Faylor, Chris Flatters, Jon Forrest, Joe Gayda, Kaveh
R. Ghazi, Eric Goldman, Christopher M. Gould, Ulrich Grepel, Peer
Griebel, Jan Hajic, Charles Hemphill, NORO Hideo, Jarkko Hietaniemi,
Scott Hofmann, Jeff Honig, Dana Hudes, Eric Hughes, John Interrante,
Ceriel Jacobs, Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones,
Henry Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane,
Amir Katz, @samp{ken@@ken.hilco.com}, Kevin B. Kenny, Steve Kirsch,
Winfried Koenig, Marq Kole, Ronald Lamprecht, Greg Lee, Rohan Lenard,
Craig Leres, John Levine, Steve Liddle, Mike Long, Mohamed el Lozy,
Brian Madsen, Malte, Joe Marshall, Bengt Martensson, Chris Metcalf,
Luke Mewburn, Jim Meyering, R. Alexander Milowski, Erik Naggum,
G.T.
Nicol, Landon Noll, James Nordby, Marc Nozell, Richard Ohnemus,
Karsten Pahnke, Sven Panne, Roland Pesch, Walter Pelissero, Gaumond
Pierre, Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Jarmo Raiha, Frederic
Raimbault, Pat Rankin, Rick Richardson, Kevin Rodgers, Kai Uwe Rommel,
Jim Roskind, Alberto Santini, Andreas Scherer, Darrell Schiebel, Raf
Schietekat, Doug Schmidt, Philippe Schnoebelen, Andreas Schwab, Alex
Siegel, Eckehard Stolz, Jan-Erik Strvmquist, Mike Stump, Paul Stuart,
Dave Tallman, Ian Lance Taylor, Chris Thewalt, Richard M. Timoney,
Jodi Tsai, Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms,
Kent Williams, Ken Yap, Ron Zellar, Nathan Zelle, David Zuhn, and
those whose names have slipped my marginal mail-archiving skills but
whose contributions are appreciated all the same.

Thanks to Keith Bostic, Jon Forrest, Noah Friedman, John Gilmore,
Craig Leres, John Levine, Bob Mulcahy, G.T. Nicol, Francois Pinard,
Rich Salz, and Richard Stallman for help with various distribution
headaches.

Thanks to Esmond Pitt and Earle Horton for 8-bit character support;
to Benson Margulies and Fred Burke for C++ support; to Kent Williams
and Tom Epperly for C++ class support; to Ove Ewerlid for support of
NUL's; and to Eric Hughes for support of multiple buffers.

This work was primarily done when I was with the Real Time Systems
Group at the Lawrence Berkeley Laboratory in Berkeley, CA. Many thanks
to all there for the support I received.

Send comments to @samp{vern@@ee.lbl.gov}.

@c @node Index, , Top, Top
@c @unnumbered Index
@c
@c @printindex cp

@contents
@bye

@c Local variables:
@c texinfo-column-for-description: 32
@c End: