:mod:`parser` --- Access Python parse trees
===========================================

.. module:: parser
   :synopsis: Access parse trees for Python source code.

.. moduleauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>

.. Copyright 1995 Virginia Polytechnic Institute and State University and Fred
   L. Drake, Jr.  This copyright notice must be distributed on all copies, but
   this document otherwise may be distributed as part of the Python
   distribution.  No fee may be charged for this document in any representation,
   either on paper or electronically.  This restriction does not affect other
   elements in a distributed package in any way.

.. index:: single: parsing; Python source code

--------------

The :mod:`parser` module provides an interface to Python's internal parser and
byte-code compiler.  The primary purpose for this interface is to allow Python
code to edit the parse tree of a Python expression and create executable code
from it.  This is better than trying to parse and modify an arbitrary Python
code fragment as a string, because parsing is performed in exactly the same way
as for the code that forms the application.  It is also faster.

.. note::

   From Python 2.5 onward, it's much more convenient to cut in at the Abstract
   Syntax Tree (AST) generation and compilation stage, using the :mod:`ast`
   module.

A few things about this module are important for making use of the data
structures it creates.  This is not a tutorial on editing the parse trees for
Python code, but some examples of using the :mod:`parser` module are presented.

Most importantly, a good understanding of the Python grammar processed by the
internal parser is required.  For full information on the language syntax, refer
to :ref:`reference-index`.
The parser itself is created from a grammar specification defined in the file
:file:`Grammar/Grammar` in the standard Python distribution.  The parse trees
stored in the ST objects created by this module are the actual output from the
internal parser when created by the :func:`expr` or :func:`suite` functions,
described below.  The ST objects created by :func:`sequence2st` faithfully
simulate those structures.  Be aware that the sequences considered "correct"
will vary from one version of Python to another as the formal grammar for the
language is revised.  However, transporting code from one Python version to
another as source text will always allow correct parse trees to be created in
the target version, the only restriction being that an older interpreter will
not support more recent language constructs.  Parse trees are not typically
compatible from one version to another, whereas source code has always been
forward-compatible.

Each element of the sequences returned by :func:`st2list` or :func:`st2tuple`
has a simple form.  Sequences representing non-terminal elements in the grammar
always have a length greater than one.  The first element is an integer which
identifies a production in the grammar.  These integers are given symbolic names
in the C header file :file:`Include/graminit.h` and the Python module
:mod:`symbol`.  Each additional element of the sequence represents a component
of the production as recognized in the input string: these are always sequences
which have the same form as the parent.  Note that keywords used to identify the
parent node type, such as the keyword :keyword:`if` in an :const:`if_stmt`, are
included in the node tree without any special treatment.
For example, the :keyword:`!if` keyword
is represented by the tuple ``(1, 'if')``, where ``1`` is the numeric value
associated with all :const:`NAME` tokens, including variable and function names
defined by the user.  In an alternate form returned when line number information
is requested, the same token might be represented as ``(1, 'if', 12)``, where
the ``12`` represents the line number at which the terminal symbol was found.

Terminal elements are represented in much the same way, except that they have no
child elements and instead carry the source text that was recognized.  The
example of the :keyword:`if` keyword above is representative.  The various types
of terminal symbols are defined in the C header file :file:`Include/token.h` and
the Python module :mod:`token`.

The ST objects are not required to support the functionality of this module,
but are provided for three purposes: to allow an application to amortize the
cost of processing complex parse trees, to provide a parse tree representation
which conserves memory space when compared to the Python list or tuple
representation, and to ease the creation of additional modules in C which
manipulate parse trees.  A simple "wrapper" class may be created in Python to
hide the use of ST objects.

The :mod:`parser` module defines functions for a few distinct purposes.  The
most important purposes are to create ST objects and to convert ST objects to
other representations such as parse trees and compiled code objects, but there
are also functions which serve to query the type of parse tree represented by an
ST object.


.. seealso::

   Module :mod:`symbol`
      Useful constants representing internal nodes of the parse tree.

   Module :mod:`token`
      Useful constants representing leaf nodes of the parse tree and functions for
      testing node values.
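
To make the nested-sequence layout described above concrete, here is a minimal
sketch that walks a tree in the tuple form returned by :func:`st2tuple` and
yields its terminal tokens in source order.  It does not import :mod:`parser`
(the module was removed in Python 3.10); instead it uses a small hand-built
tree whose non-terminal numbers ``300`` and ``301`` are hypothetical
placeholders, not real :mod:`symbol` values.  The only library names it relies
on are :data:`token.NAME`, :data:`token.PLUS`, :data:`token.NUMBER` and
:func:`token.ISTERMINAL`; the names ``TREE`` and ``terminals`` are illustrative::

   import token

   # A hand-built tree shaped like st2tuple() output for ``a + 5``.
   # The non-terminal numbers (300, 301) are hypothetical placeholders; real
   # values come from the symbol module and differ between Python versions.
   TREE = (300,
           (301, (token.NAME, 'a')),
           (token.PLUS, '+'),
           (301, (token.NUMBER, '5')))

   def terminals(node):
       """Yield (token_type, text) for every terminal node, left to right."""
       node_type = node[0]
       if token.ISTERMINAL(node_type):
           # Terminal: (type, text) or (type, text, lineno); any line
           # number is ignored because only node[1] is read.
           yield node_type, node[1]
       else:
           # Non-terminal: (type, child, child, ...); recurse into each child.
           for child in node[1:]:
               yield from terminals(child)

With the sample tree, ``[text for _, text in terminals(TREE)]`` evaluates to
``['a', '+', '5']``.  Because the check is simply whether the first element is
below the non-terminal offset, the same function also accepts terminals in the
line-numbered form such as ``(1, 'if', 12)``.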
.. _creating-sts:

Creating ST Objects
-------------------

ST objects may be created from source code or from a parse tree.  When creating
an ST object from source, different functions are used to create the ``'eval'``
and ``'exec'`` forms.


.. function:: expr(source)

   The :func:`expr` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'eval')``.  If the parse succeeds, an ST object
   is created to hold the internal parse tree representation, otherwise an
   appropriate exception is raised.


.. function:: suite(source)

   The :func:`suite` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'exec')``.  If the parse succeeds, an ST object
   is created to hold the internal parse tree representation, otherwise an
   appropriate exception is raised.


.. function:: sequence2st(sequence)

   This function accepts a parse tree represented as a sequence and builds an
   internal representation if possible.  If it can validate that the tree conforms
   to the Python grammar and all nodes are valid node types in the host version of
   Python, an ST object is created from the internal representation and returned
   to the caller.  If there is a problem creating the internal representation, or
   if the tree cannot be validated, a :exc:`ParserError` exception is raised.  An
   ST object created this way should not be assumed to compile correctly; normal
   exceptions raised by compilation may still be raised when the ST object is
   passed to :func:`compilest`.  This may indicate problems not related to syntax
   (such as a :exc:`MemoryError` exception), but may also be due to constructs such
   as the result of parsing ``del f(0)``, which escapes the Python parser but is
   checked by the bytecode compiler.

   Terminal tokens may be given either as two-element sequences of the form
   ``(1, 'name')`` or as three-element sequences of the form ``(1, 'name', 56)``.
   If the third element is present, it is assumed to be a valid line number.  The
   line number may be specified for any subset of the terminal symbols in the
   input tree.


.. function:: tuple2st(sequence)

   This is the same function as :func:`sequence2st`.  This entry point is
   maintained for backward compatibility.


.. _converting-sts:

Converting ST Objects
---------------------

ST objects, regardless of the input used to create them, may be converted to
parse trees represented as nested lists or tuples, or may be compiled into
executable code objects.  Parse trees may be extracted with or without line
numbering information.


.. function:: st2list(st, line_info=False, col_info=False)

   This function accepts an ST object from the caller in *st* and returns a
   Python list representing the equivalent parse tree.  The resulting list
   representation can be used for inspection or the creation of a new parse tree
   in list form.  This function does not fail so long as memory is available to
   build the list representation.  If the parse tree will only be used for
   inspection, :func:`st2tuple` should be used instead to reduce memory
   consumption and fragmentation.  When the list representation is required, this
   function is significantly faster than retrieving a tuple representation and
   converting that to nested lists.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the list representing the token.  Note
   that the line number provided specifies the line on which the token *ends*.
   This information is omitted if the flag is false or omitted.
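
Trees extracted with *line_info* set to true cannot be compared directly with
trees extracted without it, because each terminal carries an extra element.  As
a sketch, assuming only the nested tuple or list layout described above, the
hypothetical helper ``strip_line_info`` below rebuilds a tree so that every
terminal keeps just its type and source text.  It works on plain sequences and
does not call into :mod:`parser`::

   import token

   def strip_line_info(node):
       """Return a copy of *node* with terminals reduced to (type, text).

       Works on the nested tuple or list form produced by st2tuple() or
       st2list(); the container type of each node is preserved.
       """
       node_type = node[0]
       if token.ISTERMINAL(node_type):
           # Terminal: keep the token type and source text, drop the rest
           # (for example the line number added by line_info=True).
           return type(node)(node[:2])
       # Non-terminal: keep the production number and rebuild each child.
       return type(node)([node_type] +
                         [strip_line_info(child) for child in node[1:]])

Keeping the container type means a list tree stays a list tree, so the result
can still be edited in place before being handed back to :func:`sequence2st`.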
.. function:: st2tuple(st, line_info=False, col_info=False)

   This function accepts an ST object from the caller in *st* and returns a
   Python tuple representing the equivalent parse tree.  Other than returning a
   tuple instead of a list, this function is identical to :func:`st2list`.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the tuple representing the token.  This
   information is omitted if the flag is false or omitted.


.. function:: compilest(st, filename='<syntax-tree>')

   .. index::
      builtin: exec
      builtin: eval

   The Python byte compiler can be invoked on an ST object to produce code objects
   which can be used as part of a call to the built-in :func:`exec` or :func:`eval`
   functions.  This function provides the interface to the compiler, passing the
   internal parse tree from *st* to the compiler, using the source file name
   specified by the *filename* parameter.  The default value supplied for
   *filename* indicates that the source was an ST object.

   Compiling an ST object may result in exceptions related to compilation; an
   example would be a :exc:`SyntaxError` caused by the parse tree for ``del f(0)``:
   this statement is considered legal within the formal grammar for Python but is
   not a legal language construct.  The :exc:`SyntaxError` raised for this
   condition is the one normally generated by the Python byte-compiler, which is
   why the :mod:`parser` module can raise it at this point.  Most causes of
   compilation failure can be diagnosed programmatically by inspection of the
   parse tree.


.. _querying-sts:

Queries on ST Objects
---------------------

Two functions are provided which allow an application to determine if an ST was
created as an expression or a suite.
Neither of these functions can be used to
determine if an ST was created from source code via :func:`expr` or
:func:`suite` or from a parse tree via :func:`sequence2st`.


.. function:: isexpr(st)

   .. index:: builtin: compile

   When *st* represents an ``'eval'`` form, this function returns true, otherwise
   it returns false.  This is useful, since code objects normally cannot be queried
   for this information using existing built-in functions.  Note that the code
   objects created by :func:`compilest` cannot be queried like this either, and
   are identical to those created by the built-in :func:`compile` function.


.. function:: issuite(st)

   This function mirrors :func:`isexpr` in that it reports whether an ST object
   represents an ``'exec'`` form, commonly known as a "suite."  It is not safe to
   assume that this function is equivalent to ``not isexpr(st)``, as additional
   syntactic fragments may be supported in the future.


.. _st-errors:

Exceptions and Error Handling
-----------------------------

The parser module defines a single exception, but may also pass other built-in
exceptions from other portions of the Python runtime environment.  See each
function for information about the exceptions it can raise.


.. exception:: ParserError

   Exception raised when a failure occurs within the parser module.  This is
   generally produced for validation failures rather than the built-in
   :exc:`SyntaxError` raised during normal parsing.  The exception argument is
   either a string describing the reason for the failure or a tuple containing
   the offending sequence from a parse tree passed to :func:`sequence2st` and an
   explanatory string.  Calls to :func:`sequence2st` need to be able to handle
   either form of the exception argument, while calls to other functions in the
   module will only need to be aware of the simple string values.
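
Because the :exc:`ParserError` argument comes in two shapes, code that calls
:func:`sequence2st` may want to normalize it before reporting.  The sketch below
is a hypothetical helper, ``describe_parser_error``, that takes an exception's
``args`` tuple.  It assumes the two documented forms arrive either as
``('reason',)`` or as ``(sequence, 'reason')``, and also tolerates the whole
``(sequence, 'reason')`` pair being passed as a single argument, since the exact
packing is an assumption here rather than something this page states::

   def describe_parser_error(args):
       """Return (message, offending_sequence_or_None) for ParserError args."""
       # Simple form: a single string describing the failure.
       if len(args) == 1 and isinstance(args[0], str):
           return args[0], None
       # Tolerate the (sequence, explanation) pair arriving as one argument.
       if len(args) == 1:
           args = args[0]
       # Tree form: the offending sequence followed by an explanatory string.
       sequence, message = args
       return message, sequence

In a handler this would be used as ``describe_parser_error(err.args)`` inside an
``except parser.ParserError as err:`` clause.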

Note that the functions :func:`compilest`, :func:`expr`, and :func:`suite` may
raise exceptions which are normally raised by the parsing and compilation
process.  These include the built-in exceptions :exc:`MemoryError`,
:exc:`OverflowError`, :exc:`SyntaxError`, and :exc:`SystemError`.  In these
cases, the exceptions carry all the meaning normally associated with them.
Refer to the descriptions of each function for detailed information.


.. _st-objects:

ST Objects
----------

Ordered and equality comparisons are supported between ST objects.  Pickling of
ST objects (using the :mod:`pickle` module) is also supported.


.. data:: STType

   The type of the objects returned by :func:`expr`, :func:`suite` and
   :func:`sequence2st`.

ST objects have the following methods:


.. method:: ST.compile(filename='<syntax-tree>')

   Same as ``compilest(st, filename)``.


.. method:: ST.isexpr()

   Same as ``isexpr(st)``.


.. method:: ST.issuite()

   Same as ``issuite(st)``.


.. method:: ST.tolist(line_info=False, col_info=False)

   Same as ``st2list(st, line_info, col_info)``.


.. method:: ST.totuple(line_info=False, col_info=False)

   Same as ``st2tuple(st, line_info, col_info)``.


Example: Emulation of :func:`compile`
-------------------------------------

While many useful operations may take place between parsing and bytecode
generation, the simplest operation is to do nothing.
For this purpose, using
the :mod:`parser` module to produce an intermediate data structure is equivalent
to the following code, which compiles the source directly::

   >>> code = compile('a + 5', 'file.py', 'eval')
   >>> a = 5
   >>> eval(code)
   10

The equivalent operation using the :mod:`parser` module is somewhat longer, and
allows the intermediate internal parse tree to be retained as an ST object::

   >>> import parser
   >>> st = parser.expr('a + 5')
   >>> code = st.compile('file.py')
   >>> a = 5
   >>> eval(code)
   10

An application which needs both ST and code objects can package this code into
readily available functions::

   import parser

   def load_suite(source_string):
       # Parse as a module body (the 'exec' form), then compile the ST.
       st = parser.suite(source_string)
       return st, st.compile()

   def load_expression(source_string):
       # Parse as a single expression (the 'eval' form), then compile the ST.
       st = parser.expr(source_string)
       return st, st.compile()
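
The note at the top of this page points to the :mod:`ast` module as the more
convenient entry point.  As a sketch only, and not part of the :mod:`parser`
API, the two helpers above could be written with :func:`ast.parse` and the
built-in :func:`compile`, which accepts an AST object in place of source text.
The names ``load_suite_ast`` and ``load_expression_ast`` and the default
filename ``'<ast>'`` are hypothetical.  Unlike the :mod:`parser` versions, these
also run on Python 3.10 and later, where :mod:`parser` no longer exists::

   import ast

   def load_suite_ast(source_string, filename='<ast>'):
       # 'exec' mode yields an ast.Module, the counterpart of parser.suite().
       tree = ast.parse(source_string, filename, 'exec')
       return tree, compile(tree, filename, 'exec')

   def load_expression_ast(source_string, filename='<ast>'):
       # 'eval' mode yields an ast.Expression, the counterpart of parser.expr().
       tree = ast.parse(source_string, filename, 'eval')
       return tree, compile(tree, filename, 'eval')

For example, after ``tree, code = load_expression_ast('a + 5')``, evaluating
``eval(code, {'a': 5})`` gives ``10``, and ``isinstance(tree, ast.Expression)``
plays roughly the role that :func:`isexpr` plays for ST objects.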