      1 #!/usr/bin/ruby
      2 # encoding: utf-8
      3 
      4 =begin LICENSE
      5 
      6 [The "BSD licence"]
      7 Copyright (c) 2009-2010 Kyle Yetter
      8 All rights reserved.
      9 
     10 Redistribution and use in source and binary forms, with or without
     11 modification, are permitted provided that the following conditions
     12 are met:
     13 
     14  1. Redistributions of source code must retain the above copyright
     15     notice, this list of conditions and the following disclaimer.
     16  2. Redistributions in binary form must reproduce the above copyright
     17     notice, this list of conditions and the following disclaimer in the
     18     documentation and/or other materials provided with the distribution.
     19  3. The name of the author may not be used to endorse or promote products
     20     derived from this software without specific prior written permission.
     21 
     22 THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
     23 IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
     24 OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
     25 IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
     26 INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
     27 NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
     28 DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
     29 THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
     30 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
     31 THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
     32 
     33 =end
     34 
     35 module ANTLR3
     36 
     37 
     38 =begin rdoc ANTLR3::Stream
     39 
     40 = ANTLR3 Streams
     41 
     42 This documentation first covers the general concept of streams as used by ANTLR
     43 recognizers, and then discusses the specific <tt>ANTLR3::Stream</tt> module.
     44 
     45 == ANTLR Stream Classes
     46 
     47 ANTLR recognizers need a way to walk through input data in a serialized IO-style
     48 fashion. They also need some book-keeping about the input to provide useful
     49 information to developers, such as current line number and column. Furthermore,
     50 to implement backtracking and various error recovery techniques, recognizers
     51 need a way to record various locations in the input at a number of points in the
     52 recognition process so the input state may be restored back to a prior state.
     53 
     54 ANTLR bundles all of this functionality into a number of Stream classes, each
     55 designed to be used by recognizers for a specific recognition task. Most of the
     56 Stream hierarchy is implemented in antlr3/stream.rb, which is loaded by default
     57 when 'antlr3' is required.
     58 
     59 ---
     60 
     61 Here's a brief overview of the various stream classes and their respective
     62 purpose:
     63 
     64 StringStream::
     65   Similar to StringIO from the standard Ruby library, StringStream wraps raw
     66   String data in a Stream interface for use by ANTLR lexers.
     67 FileStream::
     68   A subclass of StringStream, FileStream simply wraps data read from an IO or
     69   File object for use by lexers.
     70 CommonTokenStream::
     71   The job of a TokenStream is to read lexer output and then provide ANTLR
     72   parsers with the means to sequentially walk through a series of tokens.
     73   CommonTokenStream is the default TokenStream implementation.
     74 TokenRewriteStream::
     75   A subclass of CommonTokenStream, TokenRewriteStreams provide rewriting-parsers
     76   the ability to produce new output text from an input token-sequence by
     77   managing rewrite "programs" on top of the stream.
     78 CommonTreeNodeStream::
     79   In a similar fashion to CommonTokenStream, CommonTreeNodeStream feeds tokens
     80   to recognizers in a sequential fashion. However, the stream object serializes
     81   an Abstract Syntax Tree into a flat, one-dimensional sequence, but preserves
     82   the two-dimensional shape of the tree using special UP and DOWN tokens. The
     83   sequence is primarily used by ANTLR Tree Parsers. *note* -- this is not
     84   defined in antlr3/stream.rb, but antlr3/tree.rb
     85 
     86 ---
     87 
     88 The next few sections cover the most significant methods of all stream classes. 
     89 
     90 === consume / look / peek
     91 
     92 <tt>stream.consume</tt> is used to advance a stream one unit. StringStreams are
     93 advanced by one character and TokenStreams are advanced by one token.
     94 
     95 <tt>stream.peek(k = 1)</tt> is used to quickly retrieve the object of interest
     96 to a recognizer at look-ahead position specified by <tt>k</tt>. For
     97 <b>StringStreams</b>, this is the <i>integer value of the character</i>
     98 <tt>k</tt> characters ahead of the stream cursor. For <b>TokenStreams</b>, this
     99 is the <i>integer token type of the token</i> <tt>k</tt> tokens ahead of the
    100 stream cursor.
    101 
    102 <tt>stream.look(k = 1)</tt> is used to retrieve the full object of interest at
    103 look-ahead position specified by <tt>k</tt>. While <tt>peek</tt> provides the
    104 <i>bare-minimum lightweight information</i> that the recognizer needs,
    105 <tt>look</tt> provides the <i>full object of concern</i> in the stream. For
    106 <b>StringStreams</b>, this is a <i>string object containing the single
    107 character</i> <tt>k</tt> characters ahead of the stream cursor. For
    108 <b>TokenStreams</b>, this is the <i>full token structure</i> <tt>k</tt> tokens
    109 ahead of the stream cursor.
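
As a rough illustration of how the three calls differ on character input, here
is a minimal stand-alone sketch. <tt>MiniCharStream</tt> and its <tt>EOF</tt>
value of -1 are hypothetical stand-ins written for this example, not the
runtime's own classes, and only positive values of <tt>k</tt> are handled.

```ruby
# Hypothetical stand-in for a character stream (not ANTLR3::StringStream).
class MiniCharStream
  EOF = -1  # assumed end-of-input sentinel for this sketch

  def initialize(text)
    @string   = text
    @data     = text.codepoints.to_a
    @position = 0
  end

  # advance one character and return its integer value
  def consume
    c = @data[@position] || EOF
    @position += 1 if @position < @data.length
    c
  end

  # integer value of the character k positions ahead (k = 1 is the current one)
  def peek(k = 1)
    @data[@position + k - 1] || EOF
  end

  # one-character String k positions ahead, or nil past the end of input
  def look(k = 1)
    @string[@position + k - 1]
  end
end

stream = MiniCharStream.new("ab")
stream.peek      # => 97
stream.look      # => "a"
stream.consume   # => 97
stream.look      # => "b"
stream.peek(2)   # => -1
```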
    110 
    111 <b>Note:</b> in most ANTLR runtime APIs for other languages, <tt>peek</tt> is
    112 implemented by some method with a name like <tt>LA(k)</tt> and <tt>look</tt> is
    113 implemented by some method with a name like <tt>LT(k)</tt>. When writing this
    114 Ruby runtime API, I found this naming practice confusing, ambiguous, and
    115 un-Ruby-like. Thus, I chose <tt>peek</tt> and <tt>look</tt> to represent a
    116 quick-look (peek) and a full-fledged look-ahead operation (look). If this causes
    117 confusion or any sort of compatibility strife for developers using this
    118 implementation, all apologies.
    119 
    120 === mark / rewind / release
    121 
    122 <tt>marker = stream.mark</tt> causes the stream to record important information
    123 about the current stream state, place the data in an internal memory table, and
    124 return a memento, <tt>marker</tt>. The marker object is typically an integer key
    125 to the stream's internal memory table.
    126 
    127 Used in tandem with <tt>stream.rewind(marker = last_marker)</tt>, the marker can
    128 be used to restore the stream to an earlier state. This is used by recognizers
    129 to perform tasks such as backtracking and error recovery.
    130 
    131 <tt>stream.release(marker = last_marker)</tt> can be used to release an existing
    132 state marker from the memory table.
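
The backtracking pattern can be sketched roughly as follows.
<tt>MiniMarkStream</tt> is a hypothetical, simplified class written for this
example: it saves only the position, whereas StringStream also records line and
column, and its <tt>rewind</tt> does not release the marker automatically.

```ruby
# Hypothetical, simplified marker table (not the runtime's implementation).
class MiniMarkStream
  attr_reader :position

  def initialize(text)
    @data     = text.codepoints.to_a
    @position = 0
    @markers  = []
  end

  def consume
    @position += 1 if @position < @data.length
  end

  # remember the current position; the returned integer is the memento
  def mark
    @markers << @position
    @markers.length - 1
  end

  def last_marker
    @markers.length - 1
  end

  # jump back to the position saved under +marker+, if that marker exists
  def rewind(marker = last_marker)
    saved = @markers[marker]
    @position = saved if saved
    self
  end

  # drop +marker+ and every marker created after it
  def release(marker = last_marker)
    if marker.between?(0, @markers.length - 1)
      @markers.pop(@markers.length - marker)
    end
    self
  end

  def mark_depth
    @markers.length
  end
end

stream = MiniMarkStream.new("abcdef")
stream.consume                 # position is now 1
marker = stream.mark           # => 0
3.times { stream.consume }     # a speculative match moves ahead to position 4
stream.rewind(marker)          # the match failed, so backtrack
stream.position                # => 1
stream.release(marker)
stream.mark_depth              # => 0
```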
    133 
    134 === seek
    135 
    136 <tt>stream.seek(position)</tt> moves the stream cursor to an absolute position
    137 within the stream, basically like typical Ruby <tt>IO#seek</tt> style methods.
    138 However, unlike <tt>IO#seek</tt>, ANTLR streams currently always use absolute
    139 position seeking.
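
A minimal sketch of absolute seeking, assuming out-of-range positions are
clamped to the bounds of the input as StringStream#seek does.
<tt>MiniSeekStream</tt> is a hypothetical name used only for this example, and
it ignores the line and column book-keeping that a real character stream
performs.

```ruby
# Hypothetical class showing absolute, clamped seeking only.
class MiniSeekStream
  attr_reader :position

  def initialize(text)
    @data     = text.codepoints.to_a
    @position = 0
  end

  # move the cursor to +index+, kept within 0 .. @data.length
  def seek(index)
    @position = [[index, 0].max, @data.length].min
    nil
  end
end

stream = MiniSeekStream.new("hello")
stream.seek(3)
stream.position   # => 3
stream.seek(99)   # past the end: clamped to the input length
stream.position   # => 5
```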
    140 
    141 == The Stream Module
    142 
    143 <tt>ANTLR3::Stream</tt> is an abstract-ish base mixin for all IO-like stream
    144 classes used by ANTLR recognizers.
    145 
    146 The module doesn't do much on its own besides define arguably annoying
    147 ``abstract'' pseudo-methods that demand implementation when it is mixed in to a
    148 class that wants to be a Stream. Right now this exists as an artifact of porting
    149 the ANTLR Java/Python runtime library to Ruby. In Java, of course, this is
    150 represented as an interface. In Ruby, however, objects are duck-typed and
    151 interfaces aren't that useful as programmatic entities -- in fact, it's mildly
    152 wasteful to have a module like this hanging out. Thus, I may axe it.
    153 
    154 When mixed in, it does give the class #size and #source_name attribute
    155 methods.
    156 
    157 Except in a small handful of places, most of the ANTLR runtime library uses
    158 duck-typing and not type checking on objects. This means that the methods which
    159 manipulate stream objects don't usually bother checking that the object is a
    160 Stream and assume that the object implements the proper stream interface. Thus,
    161 it is not strictly necessary that custom stream objects include ANTLR3::Stream,
    162 though it isn't a bad idea.
    163 
    164 =end
    165 
    166 module Stream
    167   include ANTLR3::Constants
    168   extend ClassMacros
    169   
    170   ##
    171   # :method: consume
    172   # used to advance a stream one unit (such as character or token)
    173   abstract :consume
    174   
    175   ##
    176   # :method: peek( k = 1 )
    177   # used to quickly retrieve the object of interest to a recognizer at lookahead
    178   # position specified by <tt>k</tt> (such as integer value of a character or an
    179   # integer token type)
    180   abstract :peek
    181   
    182   ##
    183   # :method: look( k = 1 )
    184   # used to retrieve the full object of interest at lookahead position specified
    185   # by <tt>k</tt> (such as a character string or a token structure)
    186   abstract :look
    187   
    188   ##
    189   # :method: mark
    190   # saves the current position for the purposes of backtracking and
    191   # returns a value to pass to #rewind at a later time
    192   abstract :mark
    193   
    194   ##
    195   # :method: index
    196   # returns the current position of the stream
    197   abstract :index
    198   
    199   ##
    200   # :method: rewind( marker = last_marker )
    201   # restores the stream position using the state information previously saved
    202   # by the given marker
    203   abstract :rewind
    204   
    205   ##
    206   # :method: release( marker = last_marker )
    207   # clears the saved state information associated with the given marker value
    208   abstract :release
    209   
    210   ##
    211   # :method: seek( position )
    212   # move the stream to the absolute index given by +position+
    213   abstract :seek
    214   
    215   ##
    216   # the total number of symbols in the stream
    217   attr_reader :size
    218   
    219   ##
    220   # indicates an identifying name for the stream -- usually the file path of the input
    221   attr_accessor :source_name
    222 end
    223 
    224 =begin rdoc ANTLR3::CharacterStream
    225 
    226 CharacterStream further extends the abstract-ish base mixin Stream to add
    227 methods specific to navigating character-based input data. Thus, it serves as an
    228 imitation of the Java interface for text-based streams, which are primarily
    229 used by lexers.
    230 
    231 It adds the ``abstract'' method, <tt>substring(start, stop)</tt>, which must be
    232 implemented to return a slice of the input string from position <tt>start</tt>
    233 to position <tt>stop</tt>. It also adds attribute accessor methods <tt>line</tt>
    234 and <tt>column</tt>, which are expected to indicate the current line number and
    235 position within the current line, respectively.
    236 
    237 == A Word About <tt>line</tt> and <tt>column</tt> attributes
    238 
    239 Presumably, the concept of <tt>line</tt> and <tt>column</tt> attributes of text
    240 is familiar to most developers. Line numbers of text are indexed from number 1
    241 up (not 0). Column numbers are indexed from 0 up. Thus, examining sample text:
    242 
    243   Hey this is the first line.
    244   Oh, and this is the second line.
    245 
    246 Line 1 is the string "Hey this is the first line.\\n". If a character stream is at
    247 line 2, character 0, the stream cursor is sitting between the characters "\\n"
    248 and "O".
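
The indexing convention can be checked with a short stand-alone sketch.
<tt>cursor_locations</tt> is a hypothetical helper written for this example; it
reports the line and column of the cursor just before each character of the
sample text.

```ruby
# Hypothetical helper: [line, column] of the cursor before each character.
def cursor_locations(text)
  line, column = 1, 0
  text.each_char.map do |ch|
    here = [line, column]
    if ch == "\n"
      line  += 1   # a newline starts the next line ...
      column = 0   # ... and resets the column to 0
    else
      column += 1
    end
    here
  end
end

text = "Hey this is the first line.\nOh, and this is the second line.\n"
locations = cursor_locations(text)
locations[0]    # => [1, 0]    before "H"
locations[27]   # => [1, 27]   before the newline that ends line 1
locations[28]   # => [2, 0]    before "O"
```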
    249 
    250 *Note:* most ANTLR runtime APIs for other languages refer to <tt>column</tt>
    251 with the more-precise, but lengthy name <tt>charPositionInLine</tt>. I preferred
    252 to keep it simple and familiar in this Ruby runtime API.
    253 
    254 =end
    255 
    256 module CharacterStream
    257   include Stream
    258   extend ClassMacros
    259   include Constants
    260   
    261   ##
    262   # :method: substring(start,stop)
    263   abstract :substring
    264   
    265   attr_accessor :line
    266   attr_accessor :column
    267 end
    268 
    269 
    270 =begin rdoc ANTLR3::TokenStream
    271 
    272 TokenStream further extends the abstract-ish base mixin Stream to add methods
    273 specific to navigating token sequences. Thus, it serves as an imitation of the
    274 Java interface for token-based streams, which are used by many different
    275 components in ANTLR, including parsers and tree parsers.
    276 
    277 == Token Streams
    278 
    279 Token streams wrap a sequence of token objects produced by some token source,
    280 usually a lexer. They provide the operations required by higher-level
    281 recognizers, such as parsers and tree parsers for navigating through the
    282 sequence of tokens. Unlike simple character-based streams, such as StringStream,
    283 token-based streams have an additional level of complexity because they must
    284 manage the task of "tuning" to a specific token channel.
    285 
    286 One of the main advantages of ANTLR-based recognition is the token
    287 <i>channel</i> feature, which allows you to hold on to all tokens of interest
    288 while only presenting a specific set of interesting tokens to a parser. For
    289 example, if you need to hide whitespace and comments from a parser, but hang on
    290 to them for some other purpose, you have the lexer assign the comments and
    291 whitespace to channel value HIDDEN as it creates the tokens.
    292 
    293 When you create a token stream, you can tune it to some specific channel value.
    294 Then, all <tt>peek</tt>, <tt>look</tt>, and <tt>consume</tt> operations only
    295 yield tokens that have the same value for <tt>channel</tt>. The stream skips
    296 over any non-matching tokens in between.
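
The tuning behaviour might be sketched as follows. <tt>MiniToken</tt>,
<tt>MiniTokenStream</tt>, and the channel values <tt>:default</tt> and
<tt>:hidden</tt> are hypothetical stand-ins chosen for this example; token types
are shown as symbols for readability, whereas the runtime uses integer token
types.

```ruby
# Hypothetical minimal token and stream, for illustration only.
MiniToken = Struct.new(:type, :text, :channel)

class MiniTokenStream
  attr_accessor :channel

  def initialize(tokens, channel = :default)
    @tokens   = tokens
    @channel  = channel
    @position = 0
    skip_off_channel
  end

  # full token +k+ on-channel tokens ahead (k >= 1), or nil past the end
  def look(k = 1)
    @tokens[@position..-1].select { |t| t.channel == @channel }[k - 1]
  end

  # only the token type at the same look-ahead position
  def peek(k = 1)
    token = look(k)
    token && token.type
  end

  # step past the current token, then past any off-channel tokens after it
  def consume
    token = @tokens[@position]
    @position += 1 if token
    skip_off_channel
    token
  end

  private

  def skip_off_channel
    while (token = @tokens[@position]) && token.channel != @channel
      @position += 1
    end
  end
end

tokens = [
  MiniToken.new(:INT,  "35", :default),
  MiniToken.new(:WS,   " ",  :hidden),
  MiniToken.new(:MULT, "*",  :default),
  MiniToken.new(:WS,   " ",  :hidden),
  MiniToken.new(:INT,  "4",  :default)
]

stream = MiniTokenStream.new(tokens)
stream.peek            # => :INT
stream.look(2).text    # => "*"   the whitespace in between is skipped
stream.consume
stream.look.text       # => "*"

hidden = MiniTokenStream.new(tokens, :hidden)
hidden.look.text       # => " "
```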
    297 
    298 == The TokenStream Interface
    299 
    300 In addition to the abstract methods and attribute methods provided by the base
    301 Stream module, TokenStream adds a number of additional method implementation
    302 requirements and attributes.
    303 
    304 =end
    305 
    306 module TokenStream
    307   include Stream
    308   extend ClassMacros
    309   
    310   ##
    311   # expected to return the token source object (such as a lexer) from which
    312   # all tokens in the stream were retrieved
    313   attr_reader :token_source
    314   
    315   ##
    316   # expected to return the value of the last marker produced by a call to 
    317   # <tt>stream.mark</tt>
    318   attr_reader :last_marker
    319   
    320   ##
    321   # expected to return the integer index of the stream cursor
    322   attr_reader :position
    323   
    324   ##
    325   # the integer channel value to which the stream is ``tuned''
    326   attr_accessor :channel
    327   
    328   ##
    329   # :method: to_s(start=0,stop=tokens.length-1)
    330   # should take the tokens between start and stop in the sequence, extract their text
    331   # and return the concatenation of all the text chunks
    332   abstract :to_s
    333   
    334   ##
    335   # :method: at( i )
    336   # return the stream symbol at index +i+
    337   abstract :at
    338 end
    339 
    340 =begin rdoc ANTLR3::StringStream
    341 
    342 A StringStream's purpose is to wrap the basic, naked text input of a recognition
    343 system. Like all other stream types, it provides serial navigation of the input;
    344 a recognizer can arbitrarily step forward and backward through the stream's
    345 symbols as it requires. StringStream and its subclasses are the main way to
    346 feed text input into an ANTLR Lexer for token processing.
    347 
    348 The stream's symbols of interest, of course, are character values. Thus, the
    349 #peek method returns the integer character value at look-ahead position
    350 <tt>k</tt> and the #look method returns the character value as a +String+. They
    351 also track various pieces of information such as the line and column numbers at
    352 the current position.
    353 
    354 === Note About Text Encoding
    355 
    356 This version of the runtime library primarily targets ruby version 1.8, which
    357 does not have strong built-in support for multi-byte character encodings. Thus,
    358 characters are assumed to be represented by a single byte -- an integer between
    359 0 and 255. Ruby 1.9 does provide built-in encoding support for multi-byte
    360 characters, but currently this library does not provide any streams to handle
    361 non-ASCII encoding. However, encoding-savvy recognition code is a future
    362 development goal for this project.
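
To make the byte-versus-character distinction concrete, here is a small sketch
for Ruby 1.9 or later. <tt>stream_symbols</tt> is a hypothetical helper that
mirrors the expression used by the 1.9 constructor below to build its symbol
array; it is not part of the library.

```ruby
# Hypothetical helper mirroring how the 1.9 constructor derives its symbols.
def stream_symbols(text)
  text.to_s.encode(Encoding::UTF_8).codepoints.to_a
end

text = "n\u00e9"        # "n" followed by e-acute: two characters, three bytes

text.bytes.to_a         # => [110, 195, 169]   what a byte-per-symbol view sees
stream_symbols(text)    # => [110, 233]        one integer per character
```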
    363 
    364 =end
    365 
    366 class StringStream
    367   NEWLINE = ?\n.ord
    368   
    369   include CharacterStream
    370   
    371   # current integer character index of the stream
    372   attr_reader :position
    373   
    374   # the current line number of the input, indexed upward from 1
    375   attr_reader :line
    376   
    377   # the current character position within the current line, indexed upward from 0
    378   attr_reader :column
    379   
    380   # the name associated with the stream -- usually a file name
    381   # defaults to <tt>"(string)"</tt>
    382   attr_accessor :name
    383   
    384   # the entire string that is wrapped by the stream
    385   attr_reader :data
    386   attr_reader :string
    387   
    388   if RUBY_VERSION >= '1.9'   # also selects this branch on Ruby 2.x and later
    389     
    390     # creates a new StringStream object where +data+ is the string data to stream.
    391     # accepts the following options in a symbol-to-value hash:
    392     #
    393     # [:file or :name] the (file) name to associate with the stream; default: <tt>'(string)'</tt>
    394     # [:line] the initial line number; default: +1+
    395     # [:column] the initial column number; default: +0+
    396     # 
    397     def initialize( data, options = {} )      # for 1.9
    398       @string   = data.to_s.encode( Encoding::UTF_8 ).freeze
    399       @data     = @string.codepoints.to_a.freeze
    400       @position = options.fetch :position, 0
    401       @line     = options.fetch :line, 1
    402       @column   = options.fetch :column, 0
    403       @markers  = []
    404       @name   ||= options[ :file ] || options[ :name ] # || '(string)'
    405       mark
    406     end
    407     
    408     #
    409     # identical to #peek, except it returns the character value as a String
    410     # 
    411     def look( k = 1 )               # for 1.9
    412       k == 0 and return nil
    413       k += 1 if k < 0
    414       
    415       index = @position + k - 1
    416       index < 0 and return nil
    417       
    418       @string[ index ]
    419     end
    420     
    421   else
    422     
    423     # creates a new StringStream object where +data+ is the string data to stream.
    424     # accepts the following options in a symbol-to-value hash:
    425     #
    426     # [:file or :name] the (file) name to associate with the stream; default: <tt>'(string)'</tt>
    427     # [:line] the initial line number; default: +1+
    428     # [:column] the initial column number; default: +0+
    429     # 
    430     def initialize( data, options = {} )    # for 1.8
    431       @data = data.to_s
    432       @data.equal?( data ) and @data = @data.clone
    433       @data.freeze
    434       @string = @data
    435       @position = options.fetch :position, 0
    436       @line = options.fetch :line, 1
    437       @column = options.fetch :column, 0
    438       @markers = []
    439       @name ||= options[ :file ] || options[ :name ] # || '(string)'
    440       mark
    441     end
    442     
    443     #
    444     # identical to #peek, except it returns the character value as a String
    445     # 
    446     def look( k = 1 )                        # for 1.8
    447       k == 0 and return nil
    448       k += 1 if k < 0
    449       
    450       index = @position + k - 1
    451       index < 0 and return nil
    452       
    453       c = @data[ index ] and c.chr
    454     end
    455     
    456   end
    457   
    458   def size
    459     @data.length
    460   end
    461   
    462   alias length size
    463   
    464   # 
    465   # rewinds the stream back to the start and clears out any existing marker entries
    466   # 
    467   def reset
    468     initial_location = @markers.first
    469     @position, @line, @column = initial_location
    470     @markers.clear
    471     @markers << initial_location
    472     return self
    473   end
    474   
    475   #
    476   # advance the stream by one character; returns the character consumed
    477   # 
    478   def consume
    479     c = @data[ @position ] || EOF
    480     if @position < @data.length
    481       @column += 1
    482       if c == NEWLINE
    483         @line += 1
    484         @column = 0
    485       end
    486       @position += 1
    487     end
    488     return( c )
    489   end
    490   
    491   #
    492   # return the character at look-ahead distance +k+ as an integer. <tt>k = 1</tt> represents
    493   # the current character. +k+ greater than 1 represents upcoming characters. A negative
    494   # value of +k+ returns previous characters consumed, where <tt>k = -1</tt> is the last
    495   # character consumed. <tt>k = 0</tt> has undefined behavior and returns +nil+
    496   # 
    497   def peek( k = 1 )
    498     k == 0 and return nil
    499     k += 1 if k < 0
    500     index = @position + k - 1
    501     index < 0 and return nil
    502     @data[ index ] or EOF
    503   end
    504   
    505   #
    506   # return a substring around the stream cursor at a distance +k+
    507   # if <tt>k >= 0</tt>, return the next k characters
    508   # if <tt>k < 0</tt>, return the previous <tt>|k|</tt> characters
    509   # 
    510   def through( k )
    511     if k >= 0 then @string[ @position, k ] else
    512       start = ( @position + k ).at_least( 0 ) # start cannot be negative or index will wrap around
    513       @string[ start ... @position ]
    514     end
    515   end
    516   
    517   # operator style look-ahead
    518   alias >> look
    519   
    520   # operator style look-behind
    521   def <<( k )
    522     self >> -k   # delegate to look-ahead with a negated distance
    523   end
    524   
    525   alias index position
    526   alias character_index position
    527   
    528   alias source_name name
    529   
    530   #
    531   # Returns true if the stream appears to be at the beginning of a new line.
    532   # This is an extra utility method for use inside lexer actions if needed.
    533   # 
    534   def beginning_of_line?
    535     @position.zero? or @data[ @position - 1 ] == NEWLINE
    536   end
    537   
    538   #
    539   # Returns true if the stream appears to be at the end of a line.
    540   # This is an extra utility method for use inside lexer actions if needed.
    541   # 
    542   def end_of_line?
    543     @data[ @position ] == NEWLINE #if @position < @data.length
    544   end
    545   
    546   #
    547   # Returns true if the stream has been exhausted.
    548   # This is an extra utility method for use inside lexer actions if needed.
    549   # 
    550   def end_of_string?
    551     @position >= @data.length
    552   end
    553 
    554   #
    555   # Returns true if the stream appears to be at the beginning of a stream (position = 0).
    556   # This is an extra utility method for use inside lexer actions if needed.
    557   # 
    558   def beginning_of_string?
    559     @position == 0
    560   end
    561   
    562   alias eof? end_of_string?
    563   alias bof? beginning_of_string?
    564   
    565   #
    566   # record the current stream location parameters in the stream's marker table and
    567   # return an integer-valued bookmark that may be used to restore the stream's
    568   # position with the #rewind method. This method is used to implement backtracking.
    569   # 
    570   def mark
    571     state = [ @position, @line, @column ].freeze
    572     @markers << state
    573     return @markers.length - 1
    574   end
    575   
    576   #
    577   # restore the stream to an earlier location recorded by #mark. If no marker value is
    578   # provided, the last marker generated by #mark will be used.
    579   # 
    580   def rewind( marker = @markers.length - 1, release = true )
    581     ( marker >= 0 and location = @markers[ marker ] ) or return( self )
    582     @position, @line, @column = location
    583     release( marker ) if release
    584     return self
    585   end
    586   
    587   #
    588   # the total number of markers currently in existence
    589   # 
    590   def mark_depth
    591     @markers.length
    592   end
    593   
    594   #
    595   # the last marker value created by a call to #mark
    596   # 
    597   def last_marker
    598     @markers.length - 1
    599   end
    600   
    601   #
    602   # let go of the bookmark data for the marker and all marker
    603   # values created after the marker.
    604   # 
    605   def release( marker = @markers.length - 1 )
    606     marker.between?( 1, @markers.length - 1 ) or return
    607     @markers.pop( @markers.length - marker )
    608     return self
    609   end
    610   
    611   #
    612   # jump to the absolute position value given by +index+.
    613   # note: if +index+ is before the current position, the +line+ and +column+
    614   #       attributes of the stream will probably be incorrect
    615   # 
    616   def seek( index )
    617     index = index.bound( 0, @data.length )  # ensures index is within the stream's range
    618     if index > @position
    619       skipped = through( index - @position )
    620       if lc = skipped.count( "\n" ) and lc.zero?
    621         @column += skipped.length
    622       else
    623         @line += lc
    624         @column = skipped.length - skipped.rindex( "\n" ) - 1
    625       end
    626     end
    627     @position = index
    628     return nil
    629   end
    630   
    631   # 
    632   # customized object inspection that shows:
    633   # * the stream class
    634   # * the stream's location in <tt>index / line:column</tt> format
    635   # * +before_chars+ characters before the cursor (6 characters by default)
    636   # * +after_chars+ characters after the cursor (10 characters by default)
    637   # 
    638   def inspect( before_chars = 6, after_chars = 10 )
    639     before = through( -before_chars ).inspect
    640     @position - before_chars > 0 and before.insert( 0, '... ' )
    641     
    642     after = through( after_chars ).inspect
    643     @position + after_chars + 1 < @data.length and after << ' ...'
    644     
    645     location = "#@position / line #@line:#@column"
    646     "#<#{ self.class }: #{ before } | #{ after } @ #{ location }>"
    647   end
    648   
    649   #
    650   # return the string slice between position +start+ and +stop+
    651   # 
    652   def substring( start, stop )
    653     @string[ start, stop - start + 1 ]
    654   end
    655   
    656   #
    657   # identical to String#[]
    658   # 
    659   def []( start, *args )
    660     @string[ start, *args ]
    661   end
    662 end
    663 
    664 
    665 =begin rdoc ANTLR3::FileStream
    666 
    667 FileStream is a character stream that uses data stored in some external file. It
    668 is nearly identical to StringStream, except that it reads its data from a file
    669 and automatically sets up the +source_name+ and +line+ parameters. It does
    670 not actually use any buffered IO operations throughout the stream navigation
    671 process. Instead, it reads the file data once when the stream is initialized.
    672 
    673 =end
    674 
    675 class FileStream < StringStream
    676   
    677   #
    678   # creates a new FileStream object using the given +file+ object.
    679   # If +file+ is a path string, the file will be read and the contents
    680   # will be used and the +name+ attribute will be set to the path.
    681   # If +file+ is an IO-like object (that responds to :read),
    682   # the content of the object will be used and the stream will
    683   # attempt to set its +name+ attribute, first trying the method #name
    684   # on the object, then trying the method #path on the object.
    685   #
    686   # see StringStream.new for a list of additional options
    687   # the constructor accepts
    688   # 
    689   def initialize( file, options = {} )
    690     case file
    691     when $stdin then
    692       data = $stdin.read
    693       @name = '(stdin)'
    694     when ARGF
    695       data = file.read
    696       @name = file.path
    697     when ::File then
    698       file = file.clone
    699       file.reopen( file.path, 'r' )
    700       @name = file.path
    701       data = file.read
    702       file.close
    703     else
    704       if file.respond_to?( :read )
    705         data = file.read
    706         if file.respond_to?( :name ) then @name = file.name
    707         elsif file.respond_to?( :path ) then @name = file.path
    708         end
    709       else
    710         @name = file.to_s
    711         if test( ?f, @name ) then data = File.read( @name )
    712         else raise ArgumentError, "could not find an existing file at %p" % @name
    713         end
    714       end
    715     end
    716     super( data, options )
    717   end
    718   
    719 end
    720 
    721 =begin rdoc ANTLR3::CommonTokenStream
    722 
    723 CommonTokenStream serves as the primary token stream implementation for feeding
    724 sequential token input into parsers.
    725 
    726 Using some TokenSource (such as a lexer), the stream collects a token sequence,
    727 setting the token's <tt>index</tt> attribute to indicate the token's position
    728 within the stream. The streams may be tuned to some channel value; off-channel
    729 tokens will be filtered out by the #peek, #look, and #consume methods.
    730 
    731 === Sample Usage
    732 
    733   
    734   source_input = ANTLR3::StringStream.new("35 * 4 - 1")
    735   lexer = Calculator::Lexer.new(source_input)
    736   tokens = ANTLR3::CommonTokenStream.new(lexer)
    737   
    738   # assume this grammar defines whitespace as tokens on channel HIDDEN
    739   # and numbers and operations as tokens on channel DEFAULT
    740   tokens.look         # => 0 INT["35"] @ line 1 col 0 (0..1)
    741   tokens.look(2)      # => 2 MULT["*"] @ line 1 col 3 (3..3)
    742   tokens.tokens(0, 2)
    743     # => [0 INT["35"] @ line 1 col 0 (0..1),
    744     #     1 WS[" "] @ line 1 col 2 (2..2),
    745     #     2 MULT["*"] @ line 1 col 3 (3..3)]
    746     # notice the #tokens method does not filter off-channel tokens
    747   
    748   lexer.reset
    749   hidden_tokens = 
    750     ANTLR3::CommonTokenStream.new(lexer, :channel => ANTLR3::HIDDEN)
    751   hidden_tokens.look # => 1 WS[" "] @ line 1 col 2 (2..2)
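
=== Channel Filtering Sketch

The look-ahead filtering described above can be illustrated without the antlr3
library. The following is a minimal sketch, not the stream's actual
implementation: +Tok+, +TOKENS+, the channel symbols, and +on_channel_index+
are hypothetical names invented for this example. It assumes the starting
position already sits on an on-channel token, as the stream arranges at
construction.

```ruby
# Hypothetical stand-ins, not part of ANTLR3: a tiny token struct and
# two channel markers.
Tok     = Struct.new(:text, :channel)
DEFAULT = :default
HIDDEN  = :hidden

# Returns the index of the k-th on-channel token counting from +position+
# (k = 1 is the token at +position+ itself), or tokens.length when the
# buffer runs out before k on-channel tokens are found.
def on_channel_index(tokens, position, channel, k = 1)
  cursor = position
  (k - 1).times do
    loop do
      cursor += 1
      tk = tokens[cursor] or return tokens.length
      break if tk.channel == channel   # stop on the next on-channel token
    end
  end
  cursor
end

# Token buffer for the input "35 * 4", with whitespace on the hidden channel.
TOKENS = [
  Tok.new("35", DEFAULT), Tok.new(" ", HIDDEN),
  Tok.new("*",  DEFAULT), Tok.new(" ", HIDDEN),
  Tok.new("4",  DEFAULT)
]

puts on_channel_index(TOKENS, 0, DEFAULT, 2)
```

With +k+ = 2, the helper steps over the hidden whitespace token at index 1 and
lands on index 2, which mirrors <tt>tokens.look(2)</tt> in the sample usage
returning the MULT token.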
    752 
    753 =end
    754 
    755 class CommonTokenStream
    756   include TokenStream
    757   include Enumerable
    758   
    759   #
    760   # constructs a new token stream using the +token_source+ provided. +token_source+ is
    761   # usually a lexer, but can be any object that implements +next_token+ and includes
    762   # ANTLR3::TokenSource.
    763   #
    764   # If a block is provided, each token harvested will be yielded and if the block
    765   # returns a +nil+ or +false+ value, the token will not be added to the stream --
    766   # it will be discarded.
    767   #
    768   # === Options
    769   # [:channel] The channel value the stream should be tuned to initially
    770   # [:source_name] The source name (file name) attribute of the stream
    771   # 
    772   # === Example
    773   #
    774   #   # create a new token stream that is tuned to channel :comment, and
    775   #   # discard all WHITE_SPACE tokens
    776   #   ANTLR3::CommonTokenStream.new(lexer, :channel => :comment) do |token|
    777   #     token.name != 'WHITE_SPACE'
    778   #   end
    779   # 
    780   def initialize( token_source, options = {} )
    781     case token_source
    782     when CommonTokenStream
    783       # this is useful in cases where you want to convert a CommonTokenStream
    784       # to a RewriteTokenStream or other variation of the standard token stream
    785       stream = token_source
    786       @token_source = stream.token_source
    787       @channel = options.fetch( :channel ) { stream.channel or DEFAULT_CHANNEL }
    788       @source_name = options.fetch( :source_name ) { stream.source_name }
    789       tokens = stream.tokens.map { | t | t.dup }
    790     else
    791       @token_source = token_source
    792       @channel = options.fetch( :channel, DEFAULT_CHANNEL )
    793       @source_name = options.fetch( :source_name ) {  @token_source.source_name rescue nil }
    794       tokens = @token_source.to_a
    795     end
    796     @last_marker = nil
    797     @tokens = block_given? ? tokens.select { | t | yield( t, self ) } : tokens
    798     @tokens.each_with_index { |t, i| t.index = i }
    799     @position = 
    800       if first_token = @tokens.find { |t| t.channel == @channel }
    801         @tokens.index( first_token )
    802       else @tokens.length
    803       end
    804   end
    805   
    806   #
    807   # resets the token stream and rebuilds it with a potentially new token source.
    808   # If no +token_source+ value is provided, the stream will attempt to reset the
    809   # current +token_source+ by calling +reset+ on the object. The stream will
    810   # then clear the token buffer and attempt to harvest new tokens. As with
    811   # CommonTokenStream.new, if a block is provided, each token will be yielded
    812   # and discarded if the block returns a +false+ or +nil+ value.
    813   # 
    814   def rebuild( token_source = nil )
    815     if token_source.nil?
    816       @token_source.reset rescue nil
    817     else @token_source = token_source
    818     end
    819     @tokens = block_given? ? @token_source.select { |token| yield( token ) } :   
    820                              @token_source.to_a
    821     @tokens.each_with_index { |t, i| t.index = i }
    822     @last_marker = nil
    823     @position = 
    824       if first_token = @tokens.find { |t| t.channel == @channel }
    825         @tokens.index( first_token )
    826       else @tokens.length
    827       end
    828     return self
    829   end
    830   
    831   #
    832   # tune the stream to a new channel value
    833   # 
    834   def tune_to( channel )
    835     @channel = channel
    836   end
    837   
    838   def token_class
    839     @token_source.token_class
    840   rescue NoMethodError
    841     @position == -1 and fill_buffer
    842     @tokens.empty? ? CommonToken : @tokens.first.class
    843   end
    844   
    845   alias index position
    846   
    847   def size
    848     @tokens.length
    849   end
    850   
    851   alias length size
    852   
    853   ###### State-Control ################################################
    854   
    855   #
    856   # rewind the stream to its initial state
    857   # 
    858   def reset
    859     @position = 0
    860     @position += 1 while token = @tokens[ @position ] and
    861                          token.channel != @channel
    862     @last_marker = nil
    863     return self
    864   end
    865   
    866   #
    867   # bookmark the current position of the input stream
    868   # 
    869   def mark
    870     @last_marker = @position
    871   end
    872   
    873   def release( marker = nil )
    874     # no-op: a marker is just a saved position, so there is nothing to release
    875   end
    876   
    877   
    878   def rewind( marker = @last_marker, release = true )
    879     seek( marker )
    880   end
    881   
    882   #
    883   # saves the current stream position, yields to the block,
    884   # and then ensures the stream's position is restored before
    885   # returning the value of the block
    886   #  
    887   def hold( pos = @position )
    888     block_given? or return enum_for( :hold, pos )
    889     begin
    890       yield
    891     ensure
    892       seek( pos )
    893     end
    894   end
    895   
    896   ###### Stream Navigation ###########################################
    897   
    898   #
    899   # advance the stream one step to the next on-channel token
    900   # 
    901   def consume
    902     token = @tokens[ @position ] || EOF_TOKEN
    903     if @position < @tokens.length
    904       @position = future?( 2 ) || @tokens.length
    905     end
    906     return( token )
    907   end
    908   
    909   #
    910   # jump to the stream position specified by +index+
    911   # note: seek does not check whether or not the
    912   #       token at the specified position is on-channel
    913   #
    914   def seek( index )
    915     @position = index.to_i.bound( 0, @tokens.length )
    916     return self
    917   end
    918   
    919   #
    920   # return the type of the on-channel token at look-ahead distance +k+. <tt>k = 1</tt> represents
    921   # the current token. +k+ greater than 1 represents upcoming on-channel tokens. A negative
    922   # value of +k+ returns previous on-channel tokens consumed, where <tt>k = -1</tt> is the last
    923   # on-channel token consumed. <tt>k = 0</tt> is not a valid distance and returns +nil+.
    924   # 
    925   def peek( k = 1 )
    926     tk = look( k ) and return( tk.type )
    927   end
    928   
    929   #
    930   # operates similarly to #peek, but returns the full token object at look-ahead position +k+
    931   #
    932   def look( k = 1 )
    933     index = future?( k ) or return nil
    934     @tokens.fetch( index, EOF_TOKEN )
    935   end
    936   
    937   alias >> look
    938   def << k
    939     self >> -k
    940   end
    941   
    942   #
    943   # returns the index of the on-channel token at look-ahead position +k+ or nil if no other
    944   # on-channel tokens exist
    945   # 
    946   def future?( k = 1 )
    947     @position == -1 and fill_buffer
    948     
    949     case
    950     when k == 0 then nil
    951     when k < 0 then past?( -k )
    952     when k == 1 then @position
    953     else
    954       # since the stream only yields on-channel
    955       # tokens, the stream can't just go to the
    956       # next position, but rather must skip
    957       # over off-channel tokens
    958       ( k - 1 ).times.inject( @position ) do |cursor, |
    959         begin
    960           tk = @tokens.at( cursor += 1 ) or return( cursor )
    961           # ^- returns early if tk is nil (i.e. cursor is outside array limits)
    962         end until tk.channel == @channel
    963         cursor
    964       end
    965     end
    966   end
    967   
    968   #
    969   # returns the index of the on-channel token at look-behind position +k+ or nil if no other
    970   # on-channel tokens exist before the current token
    971   # 
    972   def past?( k = 1 )
    973     @position == -1 and fill_buffer
    974     
    975     case
    976     when k == 0 then nil
    977     when @position - k < 0 then nil
    978     else
    979       
    980       k.times.inject( @position ) do |cursor, |
    981         begin
    982           cursor <= 0 and return( nil )
    983           tk = @tokens.at( cursor -= 1 ) or return( nil )
    984         end until tk.channel == @channel
    985         cursor
    986       end
    987       
    988     end
    989   end
    990   
    991   #
    992   # yields each token in the stream (including off-channel tokens)
    993   # If no block is provided, the method returns an Enumerator object.
    994   # #each accepts the same arguments as #tokens
    995   # 
    996   def each( *args )
    997     block_given? or return enum_for( :each, *args )
    998     tokens( *args ).each { |token| yield( token ) }
    999   end
   1000   
   1001   
   1002   #
   1003   # yields each token in the stream with the given channel value
   1004   # If no channel value is given, the stream's tuned channel value will be used.
   1005   # If no block is given, an enumerator will be returned. 
   1006   # 
   1007   def each_on_channel( channel = @channel )
   1008     block_given? or return enum_for( :each_on_channel, channel )
   1009     for token in @tokens
   1010       token.channel == channel and yield( token )
   1011     end
   1012   end
   1013   
   1014   #
   1015   # iterates through the token stream, yielding each on-channel token along the way.
   1016   # After iteration has completed, the stream's position will be restored to where
   1017   # it was before #walk was called. While #each and #each_on_channel do not change
   1018   # the stream's position during iteration, #walk advances through the stream. This
   1019   # makes it possible to look ahead and behind the current token during iteration.
   1020   # If no block is given, an enumerator will be returned. 
   1021   # 
   1022   def walk
   1023     block_given? or return enum_for( :walk )
   1024     initial_position = @position
   1025     begin
   1026       while token = look and token.type != EOF
   1027         consume
   1028         yield( token )
   1029       end
   1030       return self
   1031     ensure
   1032       @position = initial_position
   1033     end
   1034   end
   1035   
   1036   # 
   1037   # returns a copy of the token buffer. If +start+ and +stop+ are provided, tokens
   1038   # returns a slice of the token buffer from <tt>start..stop</tt>. The parameters
   1039   # are converted to integers with their <tt>to_i</tt> methods, and thus tokens
   1040   # can be provided to specify start and stop. If a block is provided, tokens are
   1041   # yielded and filtered out of the return array if the block returns a +false+
   1042   # or +nil+ value. 
   1043   # 
   1044   def tokens( start = nil, stop = nil )
   1045     stop = stop.nil? ? @tokens.length - 1 : stop.to_i.at_most( @tokens.length - 1 )
   1046     start = start.nil? ? 0 : start.to_i.at_least( 0 )
   1047     tokens = @tokens[ start..stop ]
   1048     
   1049     if block_given?
   1050       tokens.delete_if { |t| not yield( t ) }
   1051     end
   1052     
   1053     return( tokens )
   1054   end
   1055   
   1056   
   1057   def at( i )
   1058     @tokens.at i
   1059   end
   1060   
   1061   #
   1062   # identical to Array#[], as applied to the stream's token buffer
   1063   # 
   1064   def []( i, *args )
   1065     @tokens[ i, *args ]
   1066   end
   1067   
   1068   ###### Standard Conversion Methods ###############################
   1069   def inspect
   1070     string = "#<%p: @token_source=%p @ %p/%p" %
   1071       [ self.class, @token_source.class, @position, @tokens.length ]
   1072     tk = look( -1 ) and string << " #{ tk.inspect } <--"
   1073     tk = look( 1 ) and string << " --> #{ tk.inspect }"
   1074     string << '>'
   1075   end
   1076   
   1077   #
   1078   # fetches the text content of all tokens between +start+ and +stop+ and
   1079   # joins the chunks into a single string
   1080   # 
   1081   def extract_text( start = 0, stop = @tokens.length - 1 )
   1082     start = start.to_i.at_least( 0 )
   1083     stop = stop.to_i.at_most( @tokens.length )
   1084     @tokens[ start..stop ].map! { |t| t.text }.join( '' )
   1085   end
   1086   
   1087   alias to_s extract_text
   1088   
   1089 end
   1090 
   1091 end
   1092