:mod:`parser` --- Access Python parse trees
===========================================

.. module:: parser
   :synopsis: Access parse trees for Python source code.

.. moduleauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>

.. Copyright 1995 Virginia Polytechnic Institute and State University and Fred
   L. Drake, Jr.  This copyright notice must be distributed on all copies, but
   this document otherwise may be distributed as part of the Python
   distribution.  No fee may be charged for this document in any representation,
   either on paper or electronically.  This restriction does not affect other
   elements in a distributed package in any way.

.. index:: single: parsing; Python source code

--------------

The :mod:`parser` module provides an interface to Python's internal parser and
byte-code compiler.  The primary purpose for this interface is to allow Python
code to edit the parse tree of a Python expression and create executable code
from this.  This is better than trying to parse and modify an arbitrary Python
code fragment as a string because parsing is performed in a manner identical to
the code forming the application.  It is also faster.

.. note::

   From Python 2.5 onward, it's much more convenient to cut in at the Abstract
   Syntax Tree (AST) generation and compilation stage, using the :mod:`ast`
   module.

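As a rough illustration of the :mod:`ast` approach mentioned in the note, the
following sketch parses an expression, rewrites one constant, and compiles the
result.  The helper name ``replace_constant`` is invented for this example, and
the code assumes Python 3.8 or later, where numeric literals appear as
:class:`ast.Constant` nodes::

   import ast

   def replace_constant(source, old, new):
       """Compile *source* in 'eval' mode after swapping constant *old* for *new*."""
       tree = ast.parse(source, mode='eval')
       for node in ast.walk(tree):
           # Rewrite matching literal values in place.
           if isinstance(node, ast.Constant) and node.value == old:
               node.value = new
       return compile(tree, '<ast>', 'eval')

   code = replace_constant('a + 5', 5, 7)
   result = eval(code, {'a': 5})   # evaluates 'a + 7' with a = 5
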
There are a few things to note about this module which are important to making
use of the data structures created.  This is not a tutorial on editing the parse
trees for Python code, but some examples of using the :mod:`parser` module are
presented.

Most importantly, a good understanding of the Python grammar processed by the
internal parser is required.  For full information on the language syntax, refer
to :ref:`reference-index`.  The parser
itself is created from a grammar specification defined in the file
:file:`Grammar/Grammar` in the standard Python distribution.  The parse trees
stored in the ST objects created by this module are the actual output from the
internal parser when created by the :func:`expr` or :func:`suite` functions,
described below.  The ST objects created by :func:`sequence2st` faithfully
simulate those structures.  Be aware that the values of the sequences which are
considered "correct" will vary from one version of Python to another as the
formal grammar for the language is revised.  However, transporting code from one
Python version to another as source text will always allow correct parse trees
to be created in the target version, with the only restriction being that
migrating to an older version of the interpreter will not support more recent
language constructs.  The parse trees are not typically compatible from one
version to another, whereas source code has always been forward-compatible.

Each element of the sequences returned by :func:`st2list` or :func:`st2tuple`
has a simple form.  Sequences representing non-terminal elements in the grammar
always have a length greater than one.  The first element is an integer which
identifies a production in the grammar.  These integers are given symbolic names
in the C header file :file:`Include/graminit.h` and the Python module
:mod:`symbol`.  Each additional element of the sequence represents a component
of the production as recognized in the input string: these are always sequences
which have the same form as the parent.  An important aspect of this structure
which should be noted is that keywords used to identify the parent node type,
such as the keyword :keyword:`if` in an :const:`if_stmt`, are included in the
node tree without any special treatment.  For example, the :keyword:`if` keyword
is represented by the tuple ``(1, 'if')``, where ``1`` is the numeric value
associated with all :const:`NAME` tokens, including variable and function names
defined by the user.  In an alternate form returned when line number information
is requested, the same token might be represented as ``(1, 'if', 12)``, where
the ``12`` represents the line number at which the terminal symbol was found.

Terminal elements are represented in much the same way, but without any child
elements and the addition of the source text which was identified.  The example
of the :keyword:`if` keyword above is representative.  The various types of
terminal symbols are defined in the C header file :file:`Include/token.h` and
the Python module :mod:`token`.

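The structure described above can be walked with a few lines of code.  The
sketch below collects the terminal elements of a tuple tree using
:func:`token.ISTERMINAL`; the tree is written by hand for illustration, and the
non-terminal numbers ``300`` and ``301`` are placeholders rather than real
values from :mod:`symbol`::

   import token

   def terminals(tree):
       """Yield (token name, source text) for each terminal element of *tree*."""
       if token.ISTERMINAL(tree[0]):
           # Terminal: (type, text) or (type, text, line number).
           yield token.tok_name[tree[0]], tree[1]
       else:
           # Non-terminal: the remaining elements are child sequences.
           for child in tree[1:]:
               yield from terminals(child)

   # Hand-written tree; 300 and 301 stand in for grammar production numbers.
   sample = (300,
             (token.NAME, 'if', 12),
             (301, (token.NAME, 'x', 12)),
             (token.COLON, ':', 12))

   found = list(terminals(sample))
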
The ST objects are not required to support the functionality of this module,
but are provided for three purposes: to allow an application to amortize the
cost of processing complex parse trees, to provide a parse tree representation
which conserves memory space when compared to the Python list or tuple
representation, and to ease the creation of additional modules in C which
manipulate parse trees.  A simple "wrapper" class may be created in Python to
hide the use of ST objects.

The :mod:`parser` module defines functions for a few distinct purposes.  The
most important purposes are to create ST objects and to convert ST objects to
other representations such as parse trees and compiled code objects, but there
are also functions which serve to query the type of parse tree represented by an
ST object.


.. seealso::

   Module :mod:`symbol`
      Useful constants representing internal nodes of the parse tree.

   Module :mod:`token`
      Useful constants representing leaf nodes of the parse tree and functions for
      testing node values.


.. _creating-sts:

Creating ST Objects
-------------------

ST objects may be created from source code or from a parse tree. When creating
an ST object from source, different functions are used to create the ``'eval'``
and ``'exec'`` forms.


.. function:: expr(source)

   The :func:`expr` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'eval')``.  If the parse succeeds, an ST object
   is created to hold the internal parse tree representation, otherwise an
   appropriate exception is raised.


.. function:: suite(source)

   The :func:`suite` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'exec')``.  If the parse succeeds, an ST object
   is created to hold the internal parse tree representation, otherwise an
   appropriate exception is raised.

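The distinction between the two forms mirrors the *mode* argument of the
built-in :func:`compile`, so it can be sketched without the :mod:`parser`
module.  The helper ``accepts`` below is a name made up for this example::

   def accepts(source, mode):
       """Return True if *source* compiles in the given mode ('eval' or 'exec')."""
       try:
           compile(source, 'file.py', mode)
       except SyntaxError:
           return False
       return True

   # An expression is valid in both forms; a statement only as a suite.
   expression_as_eval = accepts('a + 5', 'eval')
   statement_as_eval = accepts('a = 5', 'eval')
   statement_as_exec = accepts('a = 5', 'exec')
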

.. function:: sequence2st(sequence)

   This function accepts a parse tree represented as a sequence and builds an
   internal representation if possible.  If it can validate that the tree conforms
   to the Python grammar and all nodes are valid node types in the host version of
   Python, an ST object is created from the internal representation and returned
   to the caller.  If there is a problem creating the internal representation, or
   if the tree cannot be validated, a :exc:`ParserError` exception is raised.  An
   ST object created this way should not be assumed to compile correctly; normal
   exceptions raised by compilation may still be initiated when the ST object is
   passed to :func:`compilest`.  This may indicate problems not related to syntax
   (such as a :exc:`MemoryError` exception), but may also be due to constructs such
   as the result of parsing ``del f(0)``, which escapes the Python parser but is
   checked by the bytecode compiler.

   Sequences representing terminal tokens may be represented as either two-element
   sequences of the form ``(1, 'name')`` or as three-element sequences of the form
   ``(1, 'name', 56)``.  If the third element is present, it is assumed to be a
   valid line number.  The line number may be specified for any subset of the
   terminal symbols in the input tree.


.. function:: tuple2st(sequence)

   This is the same function as :func:`sequence2st`.  This entry point is
   maintained for backward compatibility.


.. _converting-sts:

Converting ST Objects
---------------------

ST objects, regardless of the input used to create them, may be converted to
parse trees represented as list- or tuple-trees, or may be compiled into
executable code objects.  Parse trees may be extracted with or without line
numbering information.


.. function:: st2list(st, line_info=False, col_info=False)

   This function accepts an ST object from the caller in *st* and returns a
   Python list representing the equivalent parse tree.  The resulting list
   representation can be used for inspection or the creation of a new parse tree in
   list form.  This function does not fail so long as memory is available to build
   the list representation.  If the parse tree will only be used for inspection,
   :func:`st2tuple` should be used instead to reduce memory consumption and
   fragmentation.  When the list representation is required, this function is
   significantly faster than retrieving a tuple representation and converting that
   to nested lists.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the list representing the token.  Note
   that the line number provided specifies the line on which the token *ends*.
   This information is omitted if the flag is false or omitted.

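To make the two terminal layouts concrete, the following sketch converts a
tuple tree into nested lists and drops the optional line numbers along the way.
The function ``strip_line_info`` and the sample tree are illustrative
assumptions (``300`` is a placeholder production number), not part of the
module::

   import token

   def strip_line_info(tree):
       """Return *tree* as nested lists, without line numbers on terminals."""
       if token.ISTERMINAL(tree[0]):
           # Keep only the token type and its source text.
           return [tree[0], tree[1]]
       return [tree[0]] + [strip_line_info(child) for child in tree[1:]]

   with_lines = (300, (token.NAME, 'a', 1), (token.PLUS, '+', 1),
                 (token.NUMBER, '5', 1))
   without_lines = strip_line_info(with_lines)
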

.. function:: st2tuple(st, line_info=False, col_info=False)

   This function accepts an ST object from the caller in *st* and returns a
   Python tuple representing the equivalent parse tree.  Other than returning a
   tuple instead of a list, this function is identical to :func:`st2list`.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the tuple representing the token.  This
   information is omitted if the flag is false or omitted.


.. function:: compilest(st, filename='<syntax-tree>')

   .. index::
      builtin: exec
      builtin: eval

   The Python byte compiler can be invoked on an ST object to produce code objects
   which can be used as part of a call to the built-in :func:`exec` or :func:`eval`
   functions. This function provides the interface to the compiler, passing the
   internal parse tree from *st* to the compiler, using the source file name
   specified by the *filename* parameter. The default value supplied for *filename*
   indicates that the source was an ST object.

   Compiling an ST object may result in exceptions related to compilation; an
   example would be a :exc:`SyntaxError` caused by the parse tree for ``del f(0)``:
   this statement is considered legal within the formal grammar for Python but is
   not a legal language construct.  The :exc:`SyntaxError` raised for this
   condition is actually generated by the Python byte-compiler normally, which is
   why it can be raised at this point by the :mod:`parser` module.  Most causes of
   compilation failure can be diagnosed programmatically by inspection of the parse
   tree.

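The ``del f(0)`` case can be observed with the built-in :func:`compile`, which
reports the same kind of error.  This is only a sketch: which stage of the
toolchain rejects the statement depends on the Python version, and the helper
name ``compile_error`` is invented here::

   def compile_error(source):
       """Return the SyntaxError message for *source*, or None if it compiles."""
       try:
           compile(source, '<string>', 'exec')
       except SyntaxError as exc:
           return exc.msg
       return None

   rejected = compile_error('del f(0)')   # a message string; wording varies by version
   accepted = compile_error('del x')      # None: deleting a name is legal
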

.. _querying-sts:

Queries on ST Objects
---------------------

Two functions are provided which allow an application to determine if an ST was
created as an expression or a suite.  Neither of these functions can be used to
determine if an ST was created from source code via :func:`expr` or
:func:`suite` or from a parse tree via :func:`sequence2st`.


.. function:: isexpr(st)

   .. index:: builtin: compile

   When *st* represents an ``'eval'`` form, this function returns true, otherwise
   it returns false.  This is useful, since code objects normally cannot be queried
   for this information using existing built-in functions.  Note that the code
   objects created by :func:`compilest` cannot be queried like this either, and
   are identical to those created by the built-in :func:`compile` function.


.. function:: issuite(st)

   This function mirrors :func:`isexpr` in that it reports whether an ST object
   represents an ``'exec'`` form, commonly known as a "suite."  It is not safe to
   assume that this function is equivalent to ``not isexpr(st)``, as additional
   syntactic fragments may be supported in the future.

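Because the two queries are not guaranteed to be complementary, code that
dispatches on the kind of tree should test both.  A minimal sketch, where
``st_kind`` is a hypothetical helper and ``FakeST`` is a stand-in used only so
the example runs without the :mod:`parser` module; it relies solely on the
``isexpr()`` and ``issuite()`` methods that ST objects provide::

   def st_kind(st):
       """Classify *st* as 'eval', 'exec', or 'unknown' using its query methods."""
       if st.isexpr():
           return 'eval'
       if st.issuite():
           return 'exec'
       # Neither query matched: do not assume one implies the other.
       return 'unknown'

   class FakeST:
       """Stand-in exposing the same query methods as an ST object."""
       def __init__(self, expr, suite):
           self._expr, self._suite = expr, suite
       def isexpr(self):
           return self._expr
       def issuite(self):
           return self._suite

   kinds = [st_kind(FakeST(True, False)),
            st_kind(FakeST(False, True)),
            st_kind(FakeST(False, False))]
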

.. _st-errors:

Exceptions and Error Handling
-----------------------------

The parser module defines a single exception, but may also pass other built-in
exceptions from other portions of the Python runtime environment.  See each
function for information about the exceptions it can raise.


.. exception:: ParserError

   Exception raised when a failure occurs within the parser module.  This is
   generally produced for validation failures rather than the built-in
   :exc:`SyntaxError` raised during normal parsing. The exception argument is
   either a string describing the reason for the failure or a tuple containing a
   sequence causing the failure from a parse tree passed to :func:`sequence2st`
   and an explanatory string.  Calls to :func:`sequence2st` need to be able to
   handle either form of the exception argument, while calls to other functions
   in the module will only need to be aware of the simple string values.

Note that the functions :func:`compilest`, :func:`expr`, and :func:`suite` may
raise exceptions which are normally raised by the parsing and compilation
process.  These include the built-in exceptions :exc:`MemoryError`,
:exc:`OverflowError`, :exc:`SyntaxError`, and :exc:`SystemError`.  In these
cases, these exceptions carry all the meaning normally associated with them.
Refer to the descriptions of each function for detailed information.

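Code that calls :func:`sequence2st` can normalize the two argument forms before
reporting an error.  The sketch below assumes the tuple form arrives as
``exc.args == (sequence, message)`` and the string form as
``exc.args == (message,)``; that layout is an assumption, the messages are
invented, and ``FakeParserError`` is a stand-in so the example runs without the
:mod:`parser` module::

   class FakeParserError(Exception):
       """Stand-in for parser.ParserError, used only for illustration."""

   def describe_failure(exc):
       """Return (message, offending sequence or None) for a parser failure."""
       if len(exc.args) == 2:
           # Assumed tuple form: the failing sequence plus an explanation.
           sequence, message = exc.args
           return message, sequence
       # Assumed string form: only an explanation is available.
       return str(exc), None

   simple = describe_failure(FakeParserError('invalid tree'))
   detailed = describe_failure(FakeParserError((1, 'x'), 'bad node'))
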

.. _st-objects:

ST Objects
----------

Ordered and equality comparisons are supported between ST objects. Pickling of
ST objects (using the :mod:`pickle` module) is also supported.


.. data:: STType

   The type of the objects returned by :func:`expr`, :func:`suite` and
   :func:`sequence2st`.

ST objects have the following methods:


.. method:: ST.compile(filename='<syntax-tree>')

   Same as ``compilest(st, filename)``.


.. method:: ST.isexpr()

   Same as ``isexpr(st)``.


.. method:: ST.issuite()

   Same as ``issuite(st)``.


.. method:: ST.tolist(line_info=False, col_info=False)

   Same as ``st2list(st, line_info, col_info)``.


.. method:: ST.totuple(line_info=False, col_info=False)

   Same as ``st2tuple(st, line_info, col_info)``.


Example: Emulation of :func:`compile`
-------------------------------------

While many useful operations may take place between parsing and bytecode
generation, the simplest operation is to do nothing.  For this purpose, using
the :mod:`parser` module to produce an intermediate data structure is equivalent
to the code ::

   >>> code = compile('a + 5', 'file.py', 'eval')
   >>> a = 5
   >>> eval(code)
   10

The equivalent operation using the :mod:`parser` module is somewhat longer, and
allows the intermediate internal parse tree to be retained as an ST object::

   >>> import parser
   >>> st = parser.expr('a + 5')
   >>> code = st.compile('file.py')
   >>> a = 5
   >>> eval(code)
   10

An application which needs both ST and code objects can package this code into
readily available functions::

   import parser

   def load_suite(source_string):
       st = parser.suite(source_string)
       return st, st.compile()

   def load_expression(source_string):
       st = parser.expr(source_string)
       return st, st.compile()
