:mod:`parser` --- Access Python parse trees
===========================================

.. module:: parser
   :synopsis: Access parse trees for Python source code.
.. moduleauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. Copyright 1995 Virginia Polytechnic Institute and State University and Fred
   L. Drake, Jr.  This copyright notice must be distributed on all copies, but
   this document otherwise may be distributed as part of the Python
   distribution.  No fee may be charged for this document in any representation,
   either on paper or electronically.  This restriction does not affect other
   elements in a distributed package in any way.

.. index:: single: parsing; Python source code

The :mod:`parser` module provides an interface to Python's internal parser and
byte-code compiler.  The primary purpose for this interface is to allow Python
code to edit the parse tree of a Python expression and create executable code
from it.  This is better than trying to parse and modify an arbitrary Python
code fragment as a string because parsing is performed in a manner identical to
the code forming the application.  It is also faster.

.. note::

   From Python 2.5 onward, it's much more convenient to cut in at the Abstract
   Syntax Tree (AST) generation and compilation stage, using the :mod:`ast`
   module.

   The :mod:`parser` module exports the names documented here also with "st"
   replaced by "ast"; this is a legacy from the time when there was no other
   AST and has nothing to do with the AST found in Python 2.5.  This is also the
   reason for the functions' keyword arguments being called *ast*, not *st*.
   The "ast" functions have been removed in Python 3.

There are a few things to note about this module which are important to making
use of the data structures created.  This is not a tutorial on editing the parse
trees for Python code, but some examples of using the :mod:`parser` module are
presented.

Most importantly, a good understanding of the Python grammar processed by the
internal parser is required.  For full information on the language syntax, refer
to :ref:`reference-index`.  The parser itself is created from a grammar
specification defined in the file :file:`Grammar/Grammar` in the standard Python
distribution.  The parse trees stored in the ST objects created by this module
are the actual output from the internal parser when created by the :func:`expr`
or :func:`suite` functions, described below.  The ST objects created by
:func:`sequence2st` faithfully simulate those structures.  Be aware that the
values of the sequences which are considered "correct" will vary from one
version of Python to another as the formal grammar for the language is revised.
However, transporting code from one Python version to another as source text
will always allow correct parse trees to be created in the target version, with
the only restriction being that an older version of the interpreter will not
support more recent language constructs.  The parse trees are not typically
compatible from one version to another, whereas source code has always been
forward-compatible.

Each element of the sequences returned by :func:`st2list` or :func:`st2tuple`
has a simple form.  Sequences representing non-terminal elements in the grammar
always have a length greater than one.  The first element is an integer which
identifies a production in the grammar.  These integers are given symbolic names
in the C header file :file:`Include/graminit.h` and the Python module
:mod:`symbol`.  Each additional element of the sequence represents a component
of the production as recognized in the input string: these are always sequences
which have the same form as the parent.  An important aspect of this structure
is that keywords used to identify the parent node type, such as the keyword
:keyword:`if` in an :const:`if_stmt`, are included in the node tree without any
special treatment.  For example, the :keyword:`if` keyword is represented by the
tuple ``(1, 'if')``, where ``1`` is the numeric value associated with all
:const:`NAME` tokens, including variable and function names defined by the user.
In an alternate form returned when line number information is requested, the
same token might be represented as ``(1, 'if', 12)``, where the ``12``
represents the line number at which the terminal symbol was found.

Terminal elements are represented in much the same way, but without any child
elements and with the addition of the source text which was identified.  The
example of the :keyword:`if` keyword above is representative.  The various types
of terminal symbols are defined in the C header file :file:`Include/token.h` and
the Python module :mod:`token`.
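
The node layout described above can be explored with a short recursive walk.
The following is a minimal sketch and not part of the :mod:`parser` module: the
helper names ``is_terminal`` and ``iter_terminals`` are hypothetical, and
``SAMPLE_TREE`` is a hand-written stand-in for the output of :func:`st2tuple`
whose non-terminal numbers (``300`` and ``301``) are placeholders rather than
real grammar symbols.  It relies only on what the text states: a terminal node
carries its source text as its second element, while every element after the
first in a non-terminal node is itself a node.

```python
import token

# Hand-written stand-in for a tuple tree; 300 and 301 are placeholder
# non-terminal numbers, not values taken from any real grammar.
SAMPLE_TREE = (
    300,
    (token.NAME, 'if', 12),
    (301, (token.NAME, 'x', 12)),
    (token.NEWLINE, '', 12),
)


def is_terminal(node):
    """Return True if *node* is a terminal: its second element is source text."""
    return len(node) > 1 and isinstance(node[1], str)


def iter_terminals(node):
    """Yield every terminal node of the tree rooted at *node*, left to right."""
    if is_terminal(node):
        yield node
    else:
        # Every element after the production number is a child node.
        for child in node[1:]:
            for leaf in iter_terminals(child):
                yield leaf


# Collect the source text of all NAME tokens, keywords included.
names = [leaf[1] for leaf in iter_terminals(SAMPLE_TREE)
         if leaf[0] == token.NAME]
```

Both the keyword ``if`` and the identifier ``x`` end up in ``names``, since
keywords are ordinary :const:`NAME` tokens in the tree.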

The ST objects are not required to support the functionality of this module,
but are provided for three purposes: to allow an application to amortize the
cost of processing complex parse trees, to provide a parse tree representation
which conserves memory space when compared to the Python list or tuple
representation, and to ease the creation of additional modules in C which
manipulate parse trees.  A simple "wrapper" class may be created in Python to
hide the use of ST objects.

The :mod:`parser` module defines functions for a few distinct purposes.  The
most important purposes are to create ST objects and to convert ST objects to
other representations such as parse trees and compiled code objects, but there
are also functions which serve to query the type of parse tree represented by an
ST object.


.. seealso::

   Module :mod:`symbol`
      Useful constants representing internal nodes of the parse tree.

   Module :mod:`token`
      Useful constants representing leaf nodes of the parse tree and functions for
      testing node values.


.. _creating-sts:

Creating ST Objects
-------------------

ST objects may be created from source code or from a parse tree. When creating
an ST object from source, different functions are used to create the ``'eval'``
and ``'exec'`` forms.


.. function:: expr(source)

   The :func:`expr` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'eval')``.  If the parse succeeds, an ST object
   is created to hold the internal parse tree representation; otherwise, an
   appropriate exception is raised.


.. function:: suite(source)

   The :func:`suite` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'exec')``.  If the parse succeeds, an ST object
   is created to hold the internal parse tree representation; otherwise, an
   appropriate exception is raised.


.. function:: sequence2st(sequence)

   This function accepts a parse tree represented as a sequence and builds an
   internal representation if possible.  If it can validate that the tree conforms
   to the Python grammar and all nodes are valid node types in the host version of
   Python, an ST object is created from the internal representation and returned
   to the caller.  If there is a problem creating the internal representation, or
   if the tree cannot be validated, a :exc:`ParserError` exception is raised.  An
   ST object created this way should not be assumed to compile correctly;
   exceptions normally raised by compilation may still occur when the ST object is
   passed to :func:`compilest`.  This may indicate problems not related to syntax
   (such as a :exc:`MemoryError` exception), but may also be due to constructs such
   as the result of parsing ``del f(0)``, which escapes the Python parser but is
   checked by the bytecode compiler.

   Sequences representing terminal tokens may be represented as either two-element
   sequences of the form ``(1, 'name')`` or as three-element sequences of the form
   ``(1, 'name', 56)``.  If the third element is present, it is assumed to be a
   valid line number.  The line number may be specified for any subset of the
   terminal symbols in the input tree.
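
Because the line number is optional on each terminal, code that compares or
rebuilds trees may want to normalize them first.  The sketch below uses a
hypothetical helper, ``strip_line_numbers``, to return a copy of a tuple tree
with every terminal reduced to the two-element form.  The input tree is written
by hand for illustration, and ``300`` is a placeholder non-terminal number
rather than a real grammar symbol.

```python
def strip_line_numbers(node):
    """Return a copy of *node* in which every terminal has exactly two elements."""
    if len(node) > 1 and isinstance(node[1], str):
        # Terminal: keep the token number and source text, drop any line number.
        return (node[0], node[1])
    # Non-terminal: keep the production number and rebuild each child.
    return (node[0],) + tuple(strip_line_numbers(child) for child in node[1:])


# Placeholder tree mixing both terminal forms; 300 is not a real grammar symbol.
mixed = (300, (1, 'name', 56), (1, 'other'))
plain = strip_line_numbers(mixed)
```

The original tree is left untouched; ``plain`` holds the normalized copy.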


.. function:: tuple2st(sequence)

   This is the same function as :func:`sequence2st`.  This entry point is
   maintained for backward compatibility.


.. _converting-sts:

Converting ST Objects
---------------------

ST objects, regardless of the input used to create them, may be converted to
parse trees represented as nested lists or tuples, or may be compiled into
executable code objects.  Parse trees may be extracted with or without line
numbering information.


.. function:: st2list(ast[, line_info])

   This function accepts an ST object from the caller in *ast* and returns a
   Python list representing the equivalent parse tree.  The resulting list
   representation can be used for inspection or the creation of a new parse tree in
   list form.  This function does not fail so long as memory is available to build
   the list representation.  If the parse tree will only be used for inspection,
   :func:`st2tuple` should be used instead to reduce memory consumption and
   fragmentation.  When the list representation is required, this function is
   significantly faster than retrieving a tuple representation and converting that
   to nested lists.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the list representing the token.  Note
   that the line number provided specifies the line on which the token *ends*.
   This information is omitted if the flag is false or omitted.
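
A list tree is mutable, which is what makes it suitable as the starting point
for a modified parse tree.  As a rough illustration, the hypothetical helper
``rename_name`` below rewrites matching :const:`NAME` tokens in place.  The
input is a hand-written nested list standing in for the result of
:func:`st2list`, with ``300`` as a placeholder non-terminal number.  The helper
makes no attempt to check that the new text is a valid identifier or that the
edited tree is still grammatical.

```python
import token


def rename_name(node, old, new):
    """Replace the text of NAME terminals equal to *old* with *new*, in place.

    Returns the number of terminals changed.
    """
    if len(node) > 1 and isinstance(node[1], str):
        if node[0] == token.NAME and node[1] == old:
            node[1] = new
            return 1
        return 0
    # Non-terminal: recurse into every child after the production number.
    return sum(rename_name(child, old, new) for child in node[1:])


# Placeholder list tree with line numbers; 300 is not a real grammar symbol.
tree = [300, [token.NAME, 'a', 1], [token.PLUS, '+', 1], [token.NUMBER, '5', 1]]
changed = rename_name(tree, 'a', 'total')
```

Any line number stored as the third element of a terminal is left as it was.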


.. function:: st2tuple(ast[, line_info])

   This function accepts an ST object from the caller in *ast* and returns a
   Python tuple representing the equivalent parse tree.  Other than returning a
   tuple instead of a list, this function is identical to :func:`st2list`.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the tuple representing the token.  This
   information is omitted if the flag is false or omitted.


.. function:: compilest(ast, filename='<syntax-tree>')

   .. index:: builtin: eval

   The Python byte compiler can be invoked on an ST object to produce code objects
   which can be used as part of an :keyword:`exec` statement or a call to the
   built-in :func:`eval` function. This function provides the interface to the
   compiler, passing the internal parse tree from *ast* to the compiler, using the
   source file name specified by the *filename* parameter. The default value
   supplied for *filename* indicates that the source was an ST object.

   Compiling an ST object may result in exceptions related to compilation; an
   example would be a :exc:`SyntaxError` caused by the parse tree for ``del f(0)``:
   this statement is considered legal within the formal grammar for Python but is
   not a legal language construct.  The :exc:`SyntaxError` raised for this
   condition is actually generated by the Python byte-compiler normally, which is
   why it can be raised at this point by the :mod:`parser` module.  Most causes of
   compilation failure can be diagnosed programmatically by inspection of the parse
   tree.


.. _querying-sts:

Queries on ST Objects
---------------------

Two functions are provided which allow an application to determine if an ST was
created as an expression or a suite.  Neither of these functions can be used to
determine if an ST was created from source code via :func:`expr` or
:func:`suite` or from a parse tree via :func:`sequence2st`.


.. function:: isexpr(ast)

   .. index:: builtin: compile

   When *ast* represents an ``'eval'`` form, this function returns true; otherwise
   it returns false.  This is useful, since code objects normally cannot be queried
   for this information using existing built-in functions.  Note that the code
   objects created by :func:`compilest` cannot be queried like this either, and
   are identical to those created by the built-in :func:`compile` function.


.. function:: issuite(ast)

   This function mirrors :func:`isexpr` in that it reports whether an ST object
   represents an ``'exec'`` form, commonly known as a "suite."  It is not safe to
   assume that this function is equivalent to ``not isexpr(ast)``, as additional
   syntactic fragments may be supported in the future.
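
Since the two predicates are not guaranteed to be complementary, defensive code
can test each one explicitly and treat anything else as unknown.  The following
is a small sketch that calls the ``isexpr()`` and ``issuite()`` methods of an ST
object.  The function name ``compile_mode`` and the choice to raise
:exc:`ValueError` for an unrecognized form are illustrative assumptions, not
behavior of the :mod:`parser` module.

```python
def compile_mode(st):
    """Return the compile() mode string matching *st*: 'eval' or 'exec'."""
    if st.isexpr():
        return 'eval'
    if st.issuite():
        return 'exec'
    # Neither predicate matched: do not guess, report the unknown form.
    raise ValueError('ST object is neither an expression nor a suite')
```

Checking :func:`issuite` directly, rather than inferring it from a false
:func:`isexpr`, keeps the helper correct if further forms are ever added.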


.. _st-errors:

Exceptions and Error Handling
-----------------------------

The parser module defines a single exception, but may also pass other built-in
exceptions from other portions of the Python runtime environment.  See each
function for information about the exceptions it can raise.


.. exception:: ParserError

   Exception raised when a failure occurs within the parser module.  This is
   generally produced for validation failures rather than the built-in
   :exc:`SyntaxError` raised during normal parsing. The exception argument is
   either a string describing the reason for the failure or a tuple containing a
   sequence causing the failure from a parse tree passed to :func:`sequence2st`
   and an explanatory string.  Calls to :func:`sequence2st` need to be able to
   handle either form of the exception argument, while calls to other functions
   in the module will only need to be aware of the simple string values.
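
A handler that has to cope with both forms of the exception argument might
normalize them first.  The sketch below is one way to do that with a
hypothetical helper, ``split_parser_error``.  It inspects only the generic
``args`` attribute of the exception, and the exact layout of ``args`` for the
tuple form is an assumption here: the helper accepts the
``(sequence, explanation)`` pair either as ``args`` itself or wrapped as its
single element.

```python
def split_parser_error(exc):
    """Return ``(message, sequence)`` for *exc*; *sequence* may be None."""
    args = exc.args
    # Assumed layout: the (sequence, explanation) pair may arrive either as
    # the argument tuple itself or wrapped as its single element.
    if len(args) == 1 and isinstance(args[0], tuple) and len(args[0]) == 2:
        args = args[0]
    if len(args) == 2:
        sequence, message = args
        if isinstance(message, str) and not isinstance(sequence, str):
            return message, sequence
    # Simple form: a single explanatory string (or nothing at all).
    return (str(args[0]) if args else ''), None
```

A caller of :func:`sequence2st` could pass the caught exception to this helper
and report the offending sequence only when one is present.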

Note that the functions :func:`compilest`, :func:`expr`, and :func:`suite` may
raise exceptions which are normally raised by the parsing and compilation
process.  These include the built-in exceptions :exc:`MemoryError`,
:exc:`OverflowError`, :exc:`SyntaxError`, and :exc:`SystemError`.  In these
cases, these exceptions carry all the meaning normally associated with them.
Refer to the descriptions of each function for detailed information.


.. _st-objects:

ST Objects
----------

Ordered and equality comparisons are supported between ST objects. Pickling of
ST objects (using the :mod:`pickle` module) is also supported.


.. data:: STType

   The type of the objects returned by :func:`expr`, :func:`suite` and
   :func:`sequence2st`.

ST objects have the following methods:


.. method:: ST.compile([filename])

   Same as ``compilest(st, filename)``.


.. method:: ST.isexpr()

   Same as ``isexpr(st)``.


.. method:: ST.issuite()

   Same as ``issuite(st)``.


.. method:: ST.tolist([line_info])

   Same as ``st2list(st, line_info)``.


.. method:: ST.totuple([line_info])

   Same as ``st2tuple(st, line_info)``.


Example: Emulation of :func:`compile`
-------------------------------------

While many useful operations may take place between parsing and bytecode
generation, the simplest operation is to do nothing.  For this purpose, using
the :mod:`parser` module to produce an intermediate data structure is equivalent
to the code ::

   >>> code = compile('a + 5', 'file.py', 'eval')
   >>> a = 5
   >>> eval(code)
   10

The equivalent operation using the :mod:`parser` module is somewhat longer, and
allows the intermediate internal parse tree to be retained as an ST object::

   >>> import parser
   >>> st = parser.expr('a + 5')
   >>> code = st.compile('file.py')
   >>> a = 5
   >>> eval(code)
   10

An application which needs both ST and code objects can package this code into
readily available functions::

   import parser

   def load_suite(source_string):
       # Parse as an 'exec' form; return the ST object and its code object.
       st = parser.suite(source_string)
       return st, st.compile()

   def load_expression(source_string):
       # Parse as an 'eval' form; return the ST object and its code object.
       st = parser.expr(source_string)
       return st, st.compile()