:mod:`parser` --- Access Python parse trees
===========================================

.. module:: parser
   :synopsis: Access parse trees for Python source code.
.. moduleauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. Copyright 1995 Virginia Polytechnic Institute and State University and Fred
   L. Drake, Jr. This copyright notice must be distributed on all copies, but
   this document otherwise may be distributed as part of the Python
   distribution. No fee may be charged for this document in any representation,
   either on paper or electronically. This restriction does not affect other
   elements in a distributed package in any way.

.. index:: single: parsing; Python source code

The :mod:`parser` module provides an interface to Python's internal parser and
byte-code compiler. The primary purpose for this interface is to allow Python
code to edit the parse tree of a Python expression and create executable code
from this. This is better than trying to parse and modify an arbitrary Python
code fragment as a string because parsing is performed in a manner identical to
the code forming the application. It is also faster.

.. note::

   From Python 2.5 onward, it's much more convenient to cut in at the Abstract
   Syntax Tree (AST) generation and compilation stage, using the :mod:`ast`
   module.

   The :mod:`parser` module exports the names documented here also with "st"
   replaced by "ast"; this is a legacy from the time when there was no other
   AST and has nothing to do with the AST found in Python 2.5. This is also the
   reason for the functions' keyword arguments being called *ast*, not *st*.
   The "ast" functions have been removed in Python 3.

There are a few things to note about this module which are important to making
use of the data structures created.
This is not a tutorial on editing the parse trees for
Python code, but some examples of using the :mod:`parser` module are presented.

Most importantly, a good understanding of the Python grammar processed by the
internal parser is required. For full information on the language syntax, refer
to :ref:`reference-index`. The parser itself is created from a grammar
specification defined in the file :file:`Grammar/Grammar` in the standard Python
distribution. The parse trees stored in the ST objects created by this module
are the actual output from the internal parser when created by the :func:`expr`
or :func:`suite` functions, described below. The ST objects created by
:func:`sequence2st` faithfully simulate those structures. Be aware that the
values of the sequences which are considered "correct" will vary from one
version of Python to another as the formal grammar for the language is revised.
However, transporting code from one Python version to another as source text
will always allow correct parse trees to be created in the target version, with
the only restriction being that migrating to an older version of the interpreter
will not support more recent language constructs. The parse trees are not
typically compatible from one version to another, whereas source code has always
been forward-compatible.

Each element of the sequences returned by :func:`st2list` or :func:`st2tuple`
has a simple form. Sequences representing non-terminal elements in the grammar
always have a length greater than one. The first element is an integer which
identifies a production in the grammar. These integers are given symbolic names
in the C header file :file:`Include/graminit.h` and the Python module
:mod:`symbol`. Each additional element of the sequence represents a component
of the production as recognized in the input string: these are always sequences
which have the same form as the parent.
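
As an illustration of this layout, the following minimal sketch walks a tuple (or
list) tree and yields its terminal tokens in source order. It relies on the
terminal format described in the next paragraphs, ``(type, text)`` or
``(type, text, lineno)``, and on the :mod:`token` module's convention that
terminal type numbers are below ``token.NT_OFFSET`` (the test performed by
``token.ISTERMINAL``). The helper name ``terminals`` and the sample tree are
hypothetical; the production numbers ``300`` and ``301`` are placeholders, not
values taken from :mod:`symbol`.

```python
import token


def terminals(node):
    """Yield (token_name, text) for each terminal in a parse tree, in order.

    Hypothetical helper: works on nested tuple or list trees of the shape
    returned by st2tuple() or st2list(), with or without line numbers.
    """
    kind = node[0]
    if token.ISTERMINAL(kind):
        # Terminal: (type, text) or (type, text, lineno); any line number is ignored.
        yield token.tok_name[kind], node[1]
    else:
        # Non-terminal: every element after the production number is a child node.
        for child in node[1:]:
            for item in terminals(child):
                yield item


# Hand-built, abbreviated tree for "a + 5"; 300 and 301 are placeholder
# production numbers, not real values from the symbol module.
sample = (300,
          (301, (token.NAME, 'a')),
          (token.PLUS, '+'),
          (301, (token.NUMBER, '5', 1)))  # this terminal carries a line number

print(list(terminals(sample)))  # [('NAME', 'a'), ('PLUS', '+'), ('NUMBER', '5')]
```

Because the walker tells terminals from non-terminals only by their numbers, it
does not depend on the production numbers of any particular Python version.
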

An important aspect of this structure which should be noted is that keywords
used to identify the parent node type, such as the keyword :keyword:`if` in an
:const:`if_stmt`, are included in the node tree without any special treatment.
For example, the :keyword:`if` keyword is represented by the tuple
``(1, 'if')``, where ``1`` is the numeric value associated with all
:const:`NAME` tokens, including variable and function names defined by the user.
In an alternate form returned when line number information is requested, the
same token might be represented as ``(1, 'if', 12)``, where the ``12``
represents the line number at which the terminal symbol was found.

Terminal elements are represented in much the same way, but without any child
elements and with the addition of the source text which was identified. The
example of the :keyword:`if` keyword above is representative. The various types
of terminal symbols are defined in the C header file :file:`Include/token.h` and
the Python module :mod:`token`.

The ST objects are not required to support the functionality of this module,
but are provided for three purposes: to allow an application to amortize the
cost of processing complex parse trees, to provide a parse tree representation
which conserves memory space when compared to the Python list or tuple
representation, and to ease the creation of additional modules in C which
manipulate parse trees. A simple "wrapper" class may be created in Python to
hide the use of ST objects.

The :mod:`parser` module defines functions for a few distinct purposes. The
most important purposes are to create ST objects and to convert ST objects to
other representations such as parse trees and compiled code objects, but there
are also functions which serve to query the type of parse tree represented by an
ST object.


.. seealso::

   Module :mod:`symbol`
      Useful constants representing internal nodes of the parse tree.

   Module :mod:`token`
      Useful constants representing leaf nodes of the parse tree and functions for
      testing node values.


.. _creating-sts:

Creating ST Objects
-------------------

ST objects may be created from source code or from a parse tree. When creating
an ST object from source, different functions are used to create the ``'eval'``
and ``'exec'`` forms.


.. function:: expr(source)

   The :func:`expr` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'eval')``. If the parse succeeds, an ST object
   is created to hold the internal parse tree representation, otherwise an
   appropriate exception is raised.


.. function:: suite(source)

   The :func:`suite` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'exec')``. If the parse succeeds, an ST object
   is created to hold the internal parse tree representation, otherwise an
   appropriate exception is raised.


.. function:: sequence2st(sequence)

   This function accepts a parse tree represented as a sequence and builds an
   internal representation if possible. If it can validate that the tree conforms
   to the Python grammar and all nodes are valid node types in the host version of
   Python, an ST object is created from the internal representation and returned
   to the caller. If there is a problem creating the internal representation, or
   if the tree cannot be validated, a :exc:`ParserError` exception is raised. An
   ST object created this way should not be assumed to compile correctly; normal
   exceptions raised by compilation may still be initiated when the ST object is
   passed to :func:`compilest`.
   This may indicate problems not related to syntax (such as a
   :exc:`MemoryError` exception), but may also be due to constructs such as the
   result of parsing ``del f(0)``, which escapes the Python parser but is checked
   by the bytecode compiler.

   Sequences representing terminal tokens may be represented as either
   two-element lists of the form ``(1, 'name')`` or as three-element lists of the
   form ``(1, 'name', 56)``. If the third element is present, it is assumed to be
   a valid line number. The line number may be specified for any subset of the
   terminal symbols in the input tree.


.. function:: tuple2st(sequence)

   This is the same function as :func:`sequence2st`. This entry point is
   maintained for backward compatibility.


.. _converting-sts:

Converting ST Objects
---------------------

ST objects, regardless of the input used to create them, may be converted to
parse trees represented as list or tuple trees, or may be compiled into
executable code objects. Parse trees may be extracted with or without line
numbering information.


.. function:: st2list(ast[, line_info])

   This function accepts an ST object from the caller in *ast* and returns a
   Python list representing the equivalent parse tree. The resulting list
   representation can be used for inspection or the creation of a new parse tree
   in list form. This function does not fail so long as memory is available to
   build the list representation. If the parse tree will only be used for
   inspection, :func:`st2tuple` should be used instead to reduce memory
   consumption and fragmentation. When the list representation is required, this
   function is significantly faster than retrieving a tuple representation and
   converting that to nested lists.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the list representing the token. Note
   that the line number provided specifies the line on which the token *ends*.
   This information is omitted if the flag is false or omitted.


.. function:: st2tuple(ast[, line_info])

   This function accepts an ST object from the caller in *ast* and returns a
   Python tuple representing the equivalent parse tree. Other than returning a
   tuple instead of a list, this function is identical to :func:`st2list`.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the tuple representing the token. This
   information is omitted if the flag is false or omitted.


.. function:: compilest(ast, filename='<syntax-tree>')

   .. index:: builtin: eval

   The Python byte compiler can be invoked on an ST object to produce code objects
   which can be used as part of an :keyword:`exec` statement or a call to the
   built-in :func:`eval` function. This function provides that interface: it
   passes the internal parse tree from *ast* to the compiler, using the source
   file name specified by the *filename* parameter. The default value supplied
   for *filename* indicates that the source was an ST object.

   Compiling an ST object may result in exceptions related to compilation; an
   example would be a :exc:`SyntaxError` caused by the parse tree for
   ``del f(0)``: this statement is considered legal within the formal grammar for
   Python but is not a legal language construct. The :exc:`SyntaxError` raised
   for this condition is normally generated by the Python byte-compiler, which is
   why it can be raised at this point by the :mod:`parser` module. Most causes of
   compilation failure can be diagnosed programmatically by inspection of the
   parse tree.


.. _querying-sts:

Queries on ST Objects
---------------------

Two functions are provided which allow an application to determine if an ST was
created as an expression or a suite. Neither of these functions can be used to
determine if an ST was created from source code via :func:`expr` or
:func:`suite` or from a parse tree via :func:`sequence2st`.


.. function:: isexpr(ast)

   .. index:: builtin: compile

   When *ast* represents an ``'eval'`` form, this function returns true, otherwise
   it returns false. This is useful, since code objects normally cannot be queried
   for this information using existing built-in functions. Note that the code
   objects created by :func:`compilest` cannot be queried like this either, and
   are identical to those created by the built-in :func:`compile` function.


.. function:: issuite(ast)

   This function mirrors :func:`isexpr` in that it reports whether an ST object
   represents an ``'exec'`` form, commonly known as a "suite." It is not safe to
   assume that this function is equivalent to ``not isexpr(ast)``, as additional
   syntactic fragments may be supported in the future.


.. _st-errors:

Exceptions and Error Handling
-----------------------------

The parser module defines a single exception, but may also pass other built-in
exceptions from other portions of the Python runtime environment. See each
function for information about the exceptions it can raise.


.. exception:: ParserError

   Exception raised when a failure occurs within the parser module. This is
   generally produced for validation failures rather than the built-in
   :exc:`SyntaxError` raised during normal parsing. The exception argument is
   either a string describing the reason for the failure or a tuple containing a
   sequence causing the failure from a parse tree passed to :func:`sequence2st`
   and an explanatory string.
   Calls to :func:`sequence2st` need to be able to handle either form of the
   exception argument, while calls to other functions in the module will only
   need to be aware of the simple string values.

Note that the functions :func:`compilest`, :func:`expr`, and :func:`suite` may
raise exceptions which are normally raised by the parsing and compilation
process. These include the built-in exceptions :exc:`MemoryError`,
:exc:`OverflowError`, :exc:`SyntaxError`, and :exc:`SystemError`. In these
cases, these exceptions carry all the meaning normally associated with them.
Refer to the descriptions of each function for detailed information.


.. _st-objects:

ST Objects
----------

Ordered and equality comparisons are supported between ST objects. Pickling of
ST objects (using the :mod:`pickle` module) is also supported.


.. data:: STType

   The type of the objects returned by :func:`expr`, :func:`suite` and
   :func:`sequence2st`.

ST objects have the following methods:


.. method:: ST.compile([filename])

   Same as ``compilest(st, filename)``.


.. method:: ST.isexpr()

   Same as ``isexpr(st)``.


.. method:: ST.issuite()

   Same as ``issuite(st)``.


.. method:: ST.tolist([line_info])

   Same as ``st2list(st, line_info)``.


.. method:: ST.totuple([line_info])

   Same as ``st2tuple(st, line_info)``.


Example: Emulation of :func:`compile`
-------------------------------------

While many useful operations may take place between parsing and bytecode
generation, the simplest operation is to do nothing.
For this purpose, using the
:mod:`parser` module to produce an intermediate data structure is equivalent to
the code ::

   >>> code = compile('a + 5', 'file.py', 'eval')
   >>> a = 5
   >>> eval(code)
   10

The equivalent operation using the :mod:`parser` module is somewhat longer, and
allows the intermediate internal parse tree to be retained as an ST object::

   >>> import parser
   >>> st = parser.expr('a + 5')
   >>> code = st.compile('file.py')
   >>> a = 5
   >>> eval(code)
   10

An application which needs both ST and code objects can package this code into
readily available functions::

   import parser

   def load_suite(source_string):
       # Parse the source in the 'exec' form and compile the resulting ST.
       st = parser.suite(source_string)
       return st, st.compile()

   def load_expression(source_string):
       # Parse the source in the 'eval' form and compile the resulting ST.
       st = parser.expr(source_string)
       return st, st.compile()
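
For comparison, the :mod:`ast` module recommended in the note at the top of this
page supports the same pattern, and is the natural replacement on interpreters
where :mod:`parser` is no longer available (it was removed in Python 3.10). The
following is a minimal sketch under that assumption: it reuses the function
names above purely for illustration, pairs :func:`ast.parse` with the built-in
:func:`compile`, and returns an AST node rather than an ST object. The
``'<ast>'`` file name is an arbitrary choice.

```python
import ast


def load_suite(source_string):
    # Parse as statements (the 'exec' form, like parser.suite) and compile.
    tree = ast.parse(source_string, mode='exec')
    return tree, compile(tree, '<ast>', 'exec')


def load_expression(source_string):
    # Parse as a single expression (the 'eval' form, like parser.expr) and compile.
    tree = ast.parse(source_string, mode='eval')
    return tree, compile(tree, '<ast>', 'eval')


tree, code = load_expression('a + 5')
print(eval(code, {'a': 5}))  # 10

tree, code = load_suite('x = 2 * 3')
namespace = {}
exec(code, namespace)
print(namespace['x'])  # 6
```

The returned tree can be inspected or rewritten (for example with
:class:`ast.NodeTransformer`) before it is compiled, which covers the
tree-editing use case described at the top of this page.
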