:mod:`xml.parsers.expat` --- Fast XML parsing using Expat
=========================================================

.. module:: xml.parsers.expat
   :synopsis: An interface to the Expat non-validating XML parser.

.. moduleauthor:: Paul Prescod <paul@prescod.net>

--------------

.. Markup notes:

   Many of the attributes of the XMLParser objects are callbacks.  Since
   signature information must be presented, these are described using the method
   directive.  Since they are attributes which are set by client code, in-text
   references to these attributes should be marked using the :member: role.


.. warning::

   The :mod:`pyexpat` module is not secure against maliciously
   constructed data.  If you need to parse untrusted or unauthenticated data see
   :ref:`xml-vulnerabilities`.


.. index:: single: Expat

The :mod:`xml.parsers.expat` module is a Python interface to the Expat
non-validating XML parser. The module provides a single extension type,
:class:`xmlparser`, that represents the current state of an XML parser.  After
an :class:`xmlparser` object has been created, various attributes of the object
can be set to handler functions.  When an XML document is then fed to the
parser, the handler functions are called for the character data and markup in
the XML document.

.. index:: module: pyexpat

This module uses the :mod:`pyexpat` module to provide access to the Expat
parser.  Direct use of the :mod:`pyexpat` module is deprecated.

This module provides one exception and one type object:


.. exception:: ExpatError

   The exception raised when Expat reports an error.  See section
   :ref:`expaterror-objects` for more information on interpreting Expat errors.


.. exception:: error

   Alias for :exc:`ExpatError`.


.. data:: XMLParserType

   The type of the return values from the :func:`ParserCreate` function.

The :mod:`xml.parsers.expat` module contains two functions:


.. function:: ErrorString(errno)

   Returns an explanatory string for a given error number *errno*.

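As a minimal sketch of how ``ErrorString`` relates to the ``errors`` submodule described later in this document, the snippet below looks up the numeric code for one error constant and converts it back to its message. The choice of ``XML_ERROR_SYNTAX`` is only illustrative.

```python
from xml.parsers.expat import ErrorString, errors

# The constants in ``errors`` are message strings; ``errors.codes`` maps
# each message to its numeric error code.
code = errors.codes[errors.XML_ERROR_SYNTAX]

# ErrorString() goes the other way: numeric code -> explanatory string.
message = ErrorString(code)
print(code, message)
```

The same lookup is available through the ``errors.messages`` dictionary.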

.. function:: ParserCreate(encoding=None, namespace_separator=None)

   Creates and returns a new :class:`xmlparser` object.  *encoding*, if specified,
   must be a string naming the encoding used by the XML data.  Expat doesn't
   support as many encodings as Python does, and its repertoire of encodings can't
   be extended; it supports UTF-8, UTF-16, ISO-8859-1 (Latin1), and ASCII.  If
   *encoding* [#]_ is given it will override the implicit or explicit encoding of the
   document.

   Expat can optionally do XML namespace processing for you, enabled by providing a
   value for *namespace_separator*.  The value must be a one-character string; a
   :exc:`ValueError` will be raised if the string has an illegal length (``None``
   is considered the same as omission).  When namespace processing is enabled,
   element type names and attribute names that belong to a namespace will be
   expanded.  The element name passed to the element handlers
   :attr:`StartElementHandler` and :attr:`EndElementHandler` will be the
   concatenation of the namespace URI, the namespace separator character, and the
   local part of the name.  If the namespace separator is a zero byte (``chr(0)``)
   then the namespace URI and the local part will be concatenated without any
   separator.

   For example, if *namespace_separator* is set to a space character (``' '``) and
   the following document is parsed:

   .. code-block:: xml

      <?xml version="1.0"?>
      <root xmlns    = "http://default-namespace.org/"
            xmlns:py = "http://www.python.org/ns/">
        <py:elem1 />
        <elem2 xmlns="" />
      </root>

   :attr:`StartElementHandler` will receive the following strings for each
   element::

      http://default-namespace.org/ root
      http://www.python.org/ns/ elem1
      elem2

   Due to limitations in the ``Expat`` library used by :mod:`pyexpat`,
   the :class:`xmlparser` instance returned can only be used to parse a single
   XML document.  Call ``ParserCreate`` for each document to provide unique
   parser instances.

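The namespace example above can be reproduced with a short script. This is only a sketch: the ``names`` list and the lambda handler are illustrative helpers, not part of the module.

```python
from xml.parsers.expat import ParserCreate

document = """<?xml version="1.0"?>
<root xmlns    = "http://default-namespace.org/"
      xmlns:py = "http://www.python.org/ns/">
  <py:elem1 />
  <elem2 xmlns="" />
</root>"""

names = []  # element names as reported to StartElementHandler

# A space is used as the separator between namespace URI and local name.
parser = ParserCreate(namespace_separator=" ")
parser.StartElementHandler = lambda name, attrs: names.append(name)
parser.Parse(document, True)

for name in names:
    print(name)
```

Because a parser instance handles only one document, a new ``ParserCreate()`` call would be needed to parse a second one.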

.. seealso::

   `The Expat XML Parser <http://www.libexpat.org/>`_
      Home page of the Expat project.


.. _xmlparser-objects:

XMLParser Objects
-----------------

:class:`xmlparser` objects have the following methods:


.. method:: xmlparser.Parse(data[, isfinal])

   Parses the contents of the string *data*, calling the appropriate handler
   functions to process the parsed data.  *isfinal* must be true on the final call
   to this method; it allows the parsing of a single file in fragments,
   not the submission of multiple files.
   *data* can be the empty string at any time.

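A minimal sketch of parsing one document in fragments with ``Parse``: the chunk boundaries below are arbitrary and deliberately fall in the middle of tags and text to show that Expat reassembles them.

```python
from xml.parsers.expat import ParserCreate

# One document, split at arbitrary points (for example, network reads).
chunks = ["<log><entry>fir", "st</entry><ent", "ry>second</entry></log>"]

started = []
parser = ParserCreate()
parser.StartElementHandler = lambda name, attrs: started.append(name)

for chunk in chunks:
    parser.Parse(chunk, False)  # isfinal is false: more data will follow
parser.Parse("", True)          # empty final call marks the end of input

print(started)
```

The final call with an empty string is one way to signal the end of input when the last chunk is not known in advance.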

.. method:: xmlparser.ParseFile(file)

   Parse XML data reading from the object *file*.  *file* only needs to provide
   the ``read(nbytes)`` method, returning the empty string when there's no more
   data.

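As a sketch, any object with a suitable ``read()`` method can be passed to ``ParseFile``. Here an ``io.BytesIO`` object stands in for a file opened in binary mode; the element names are made up for the example.

```python
import io
from xml.parsers.expat import ParserCreate

# BytesIO stands in for a real file opened with mode "rb".
stream = io.BytesIO(b"<items><item/><item/></items>")

items = []
parser = ParserCreate()

def start_element(name, attrs):
    # Record only the <item> elements.
    if name == "item":
        items.append(name)

parser.StartElementHandler = start_element
parser.ParseFile(stream)  # reads until read() returns no more data

print(len(items))
```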

.. method:: xmlparser.SetBase(base)

   Sets the base to be used for resolving relative URIs in system identifiers in
   declarations.  Resolving relative identifiers is left to the application: this
   value will be passed through as the *base* argument to the
   :func:`ExternalEntityRefHandler`, :func:`NotationDeclHandler`, and
   :func:`UnparsedEntityDeclHandler` functions.


.. method:: xmlparser.GetBase()

   Returns a string containing the base set by a previous call to :meth:`SetBase`,
   or ``None`` if :meth:`SetBase` hasn't been called.


.. method:: xmlparser.GetInputContext()

   Returns the input data that generated the current event as a string. The data is
   in the encoding of the entity which contains the text. When called while an
   event handler is not active, the return value is ``None``.


.. method:: xmlparser.ExternalEntityParserCreate(context[, encoding])

   Create a "child" parser which can be used to parse an external parsed entity
   referred to by content parsed by the parent parser.  The *context* parameter
   should be the string passed to the :meth:`ExternalEntityRefHandler` handler
   function, described below. The child parser is created with the
   :attr:`ordered_attributes` and :attr:`specified_attributes` set to the values of
   this parser.

.. method:: xmlparser.SetParamEntityParsing(flag)

   Control parsing of parameter entities (including the external DTD subset).
   Possible *flag* values are :const:`XML_PARAM_ENTITY_PARSING_NEVER`,
   :const:`XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE` and
   :const:`XML_PARAM_ENTITY_PARSING_ALWAYS`.  Return true if setting the flag
   was successful.

.. method:: xmlparser.UseForeignDTD([flag])

   Calling this with a true value for *flag* (the default) will cause Expat to call
   the :attr:`ExternalEntityRefHandler` with :const:`None` for all arguments to
   allow an alternate DTD to be loaded.  If the document does not contain a
   document type declaration, the :attr:`ExternalEntityRefHandler` will still be
   called, but the :attr:`StartDoctypeDeclHandler` and
   :attr:`EndDoctypeDeclHandler` will not be called.

   Passing a false value for *flag* will cancel a previous call that passed a true
   value, but otherwise has no effect.

   This method can only be called before the :meth:`Parse` or :meth:`ParseFile`
   methods are called; calling it after either of those has been called causes
   :exc:`ExpatError` to be raised with the :attr:`code` attribute set to
   ``errors.codes[errors.XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING]``.

:class:`xmlparser` objects have the following attributes:


.. attribute:: xmlparser.buffer_size

   The size of the buffer used when :attr:`buffer_text` is true.
   A new buffer size can be set by assigning a new integer value
   to this attribute.
   When the size is changed, the buffer will be flushed.


.. attribute:: xmlparser.buffer_text

   Setting this to true causes the :class:`xmlparser` object to buffer textual
   content returned by Expat to avoid multiple calls to the
   :meth:`CharacterDataHandler` callback whenever possible.  This can improve
   performance substantially since Expat normally breaks character data into chunks
   at every line ending.  This attribute is false by default, and may be changed at
   any time.

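The effect of ``buffer_text`` can be observed by counting calls to the character data handler. The helper ``collect_chunks`` below is a hypothetical name used only for this sketch, and the exact number of unbuffered calls may depend on the Expat version.

```python
from xml.parsers.expat import ParserCreate

document = "<text>line one\nline two\nline three</text>"

def collect_chunks(buffer_text):
    """Return the strings passed to CharacterDataHandler for ``document``."""
    chunks = []
    parser = ParserCreate()
    parser.buffer_text = buffer_text
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return chunks

unbuffered = collect_chunks(False)  # typically several small pieces
buffered = collect_chunks(True)     # one piece for the whole text run

print(len(unbuffered), len(buffered))
```

In both cases the concatenation of the reported pieces is the same text; only the number of handler calls differs.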

.. attribute:: xmlparser.buffer_used

   If :attr:`buffer_text` is enabled, the number of bytes stored in the buffer.
   These bytes represent UTF-8 encoded text.  This attribute has no meaningful
   interpretation when :attr:`buffer_text` is false.


.. attribute:: xmlparser.ordered_attributes

   Setting this attribute to a non-zero integer causes the attributes to be
   reported as a list rather than a dictionary.  The attributes are presented in
   the order found in the document text.  For each attribute, two list entries are
   presented: the attribute name and the attribute value.  (Older versions of this
   module also used this format.)  By default, this attribute is false; it may be
   changed at any time.

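A short sketch of the list format produced when ``ordered_attributes`` is enabled. The element and attribute names are arbitrary, and pairing the flat list back into tuples is just one possible way to consume it.

```python
from xml.parsers.expat import ParserCreate

seen = []
parser = ParserCreate()
parser.ordered_attributes = True
parser.StartElementHandler = lambda name, attrs: seen.append((name, attrs))
parser.Parse('<point y="2" x="1"/>', True)

name, attrs = seen[0]
print(attrs)  # flat list: name, value, name, value, ...

# Pair up names and values while keeping document order.
pairs = list(zip(attrs[0::2], attrs[1::2]))
print(pairs)
```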

.. attribute:: xmlparser.specified_attributes

   If set to a non-zero integer, the parser will report only those attributes which
   were specified in the document instance and not those which were derived from
   attribute declarations.  Applications which set this need to be especially
   careful to use what additional information is available from the declarations as
   needed to comply with the standards for the behavior of XML processors.  By
   default, this attribute is false; it may be changed at any time.


The following attributes contain values relating to the most recent error
encountered by an :class:`xmlparser` object, and will only have correct values
once a call to :meth:`Parse` or :meth:`ParseFile` has raised an
:exc:`xml.parsers.expat.ExpatError` exception.


.. attribute:: xmlparser.ErrorByteIndex

   Byte index at which an error occurred.


.. attribute:: xmlparser.ErrorCode

   Numeric code specifying the problem.  This value can be passed to the
   :func:`ErrorString` function, or compared to one of the constants defined in the
   ``errors`` object.


.. attribute:: xmlparser.ErrorColumnNumber

   Column number at which an error occurred.


.. attribute:: xmlparser.ErrorLineNumber

   Line number at which an error occurred.

The following attributes contain values relating to the current parse location
in an :class:`xmlparser` object.  During a callback reporting a parse event they
indicate the location of the first of the sequence of characters that generated
the event.  When called outside of a callback, the position indicated will be
just past the last parse event (regardless of whether there was an associated
callback).


.. attribute:: xmlparser.CurrentByteIndex

   Current byte index in the parser input.


.. attribute:: xmlparser.CurrentColumnNumber

   Current column number in the parser input.


.. attribute:: xmlparser.CurrentLineNumber

   Current line number in the parser input.

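As an illustration of the location attributes, the following sketch records the line and column of each start tag from inside a handler. The ``positions`` list and the sample document are assumptions made for the example.

```python
from xml.parsers.expat import ParserCreate

document = "<doc>\n  <item/>\n  <item/>\n</doc>"

positions = []  # (element name, line, column) for every start tag
parser = ParserCreate()

def start_element(name, attrs):
    # Inside a callback, the attributes point at the start of the event.
    positions.append(
        (name, parser.CurrentLineNumber, parser.CurrentColumnNumber)
    )

parser.StartElementHandler = start_element
parser.Parse(document, True)

for entry in positions:
    print(entry)
```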
Here is the list of handlers that can be set.  To set a handler on an
:class:`xmlparser` object *o*, use ``o.handlername = func``.  *handlername* must
be taken from the following list, and *func* must be a callable object accepting
the correct number of arguments.  The arguments are all strings, unless
otherwise stated.


.. method:: xmlparser.XmlDeclHandler(version, encoding, standalone)

   Called when the XML declaration is parsed.  The XML declaration is the
   (optional) declaration of the applicable version of the XML recommendation, the
   encoding of the document text, and an optional "standalone" declaration.
   *version* and *encoding* will be strings, and *standalone* will be ``1`` if the
   document is declared standalone, ``0`` if it is declared not to be standalone,
   or ``-1`` if the standalone clause was omitted. This is only available with
   Expat version 1.95.0 or newer.

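The values passed to ``XmlDeclHandler`` can be inspected with a small helper. ``read_declaration`` is a hypothetical function written for this sketch; it returns the handler arguments for the first XML declaration found.

```python
from xml.parsers.expat import ParserCreate

def read_declaration(data):
    """Return (version, encoding, standalone) from the XML declaration."""
    seen = []
    parser = ParserCreate()
    parser.XmlDeclHandler = (
        lambda version, encoding, standalone:
            seen.append((version, encoding, standalone))
    )
    parser.Parse(data, True)
    return seen[0]

full = read_declaration(
    b'<?xml version="1.0" encoding="utf-8" standalone="yes"?><r/>'
)
bare = read_declaration(b'<?xml version="1.0"?><r/>')

print(full)  # standalone declared "yes" is reported as 1
print(bare)  # omitted standalone clause is reported as -1
```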

.. method:: xmlparser.StartDoctypeDeclHandler(doctypeName, systemId, publicId, has_internal_subset)

   Called when Expat begins parsing the document type declaration (``<!DOCTYPE
   ...``).  The *doctypeName* is provided exactly as presented.  The *systemId* and
   *publicId* parameters give the system and public identifiers if specified, or
   ``None`` if omitted.  *has_internal_subset* will be true if the document
   contains an internal document declaration subset. This requires Expat version
   1.2 or newer.


.. method:: xmlparser.EndDoctypeDeclHandler()

   Called when Expat is done parsing the document type declaration. This requires
   Expat version 1.2 or newer.


.. method:: xmlparser.ElementDeclHandler(name, model)

   Called once for each element type declaration.  *name* is the name of the
   element type, and *model* is a representation of the content model.


.. method:: xmlparser.AttlistDeclHandler(elname, attname, type, default, required)

   Called for each declared attribute for an element type.  If an attribute list
   declaration declares three attributes, this handler is called three times, once
   for each attribute.  *elname* is the name of the element to which the
   declaration applies and *attname* is the name of the attribute declared.  The
   attribute type is a string passed as *type*; the possible values are
   ``'CDATA'``, ``'ID'``, ``'IDREF'``, ... *default* gives the default value for
   the attribute used when the attribute is not specified by the document instance,
   or ``None`` if there is no default value (``#IMPLIED`` values).  If the
   attribute is required to be given in the document instance, *required* will be
   true. This requires Expat version 1.95.0 or newer.


.. method:: xmlparser.StartElementHandler(name, attributes)

   Called for the start of every element.  *name* is a string containing the
   element name, and *attributes* is the element attributes. If
   :attr:`ordered_attributes` is true, this is a list (see
   :attr:`ordered_attributes` for a full description). Otherwise it's a
   dictionary mapping names to values.


.. method:: xmlparser.EndElementHandler(name)

   Called for the end of every element.


.. method:: xmlparser.ProcessingInstructionHandler(target, data)

   Called for every processing instruction.


.. method:: xmlparser.CharacterDataHandler(data)

   Called for character data.  This will be called for normal character data, CDATA
   marked content, and ignorable whitespace.  Applications which must distinguish
   these cases can use the :attr:`StartCdataSectionHandler`,
   :attr:`EndCdataSectionHandler`, and :attr:`ElementDeclHandler` callbacks to
   collect the required information.

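One way to tell CDATA content apart from ordinary character data, as suggested above, is to track a flag between the CDATA section handlers. This is a sketch; the ``state`` dictionary and ``pieces`` list are illustrative names, and ``buffer_text`` is enabled so that each text run arrives in one call.

```python
from xml.parsers.expat import ParserCreate

pieces = []                  # (inside_cdata, text) tuples
state = {"in_cdata": False}  # updated by the CDATA section handlers

def start_cdata():
    state["in_cdata"] = True

def end_cdata():
    state["in_cdata"] = False

def char_data(data):
    pieces.append((state["in_cdata"], data))

parser = ParserCreate()
parser.buffer_text = True
parser.StartCdataSectionHandler = start_cdata
parser.EndCdataSectionHandler = end_cdata
parser.CharacterDataHandler = char_data
parser.Parse("<r>plain<![CDATA[<raw>]]>more</r>", True)

for inside, text in pieces:
    print(inside, repr(text))
```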

.. method:: xmlparser.UnparsedEntityDeclHandler(entityName, base, systemId, publicId, notationName)

   Called for unparsed (NDATA) entity declarations.  This is only present for
   version 1.2 of the Expat library; for more recent versions, use
   :attr:`EntityDeclHandler` instead.  (The underlying function in the Expat
   library has been declared obsolete.)


.. method:: xmlparser.EntityDeclHandler(entityName, is_parameter_entity, value, base, systemId, publicId, notationName)

   Called for all entity declarations.  For parameter and internal entities,
   *value* will be a string giving the declared contents of the entity; this will
   be ``None`` for external entities.  The *notationName* parameter will be
   ``None`` for parsed entities, and the name of the notation for unparsed
   entities. *is_parameter_entity* will be true if the entity is a parameter entity
   or false for general entities (most applications only need to be concerned with
   general entities). This is only available starting with version 1.95.0 of the
   Expat library.


.. method:: xmlparser.NotationDeclHandler(notationName, base, systemId, publicId)

   Called for notation declarations.  *notationName*, *base*, *systemId*, and
   *publicId* are strings if given.  If the public identifier is omitted,
   *publicId* will be ``None``.


.. method:: xmlparser.StartNamespaceDeclHandler(prefix, uri)

   Called when an element contains a namespace declaration.  Namespace declarations
   are processed before the :attr:`StartElementHandler` is called for the element
   on which declarations are placed.


.. method:: xmlparser.EndNamespaceDeclHandler(prefix)

   Called when the closing tag is reached for an element that contained a
   namespace declaration.  This is called once for each namespace declaration on
   the element in the reverse of the order for which the
   :attr:`StartNamespaceDeclHandler` was called to indicate the start of each
   namespace declaration's scope.  Calls to this handler are made after the
   corresponding :attr:`EndElementHandler` for the end of the element.

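The ordering of namespace declaration events relative to element events can be seen by logging all four handlers. This sketch assumes namespace processing is enabled with a space separator; the ``events`` list and the event labels are made up for the example.

```python
from xml.parsers.expat import ParserCreate

events = []
parser = ParserCreate(namespace_separator=" ")
parser.StartNamespaceDeclHandler = (
    lambda prefix, uri: events.append(("start-ns", prefix, uri))
)
parser.EndNamespaceDeclHandler = (
    lambda prefix: events.append(("end-ns", prefix))
)
parser.StartElementHandler = (
    lambda name, attrs: events.append(("start", name))
)
parser.EndElementHandler = lambda name: events.append(("end", name))

parser.Parse('<py:a xmlns:py="http://www.python.org/ns/"/>', True)

for event in events:
    print(event)
```

The namespace declaration is reported before the element starts, and its end is reported after the element ends.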

.. method:: xmlparser.CommentHandler(data)

   Called for comments.  *data* is the text of the comment, excluding the leading
   ``'<!-``\ ``-'`` and trailing ``'-``\ ``->'``.


.. method:: xmlparser.StartCdataSectionHandler()

   Called at the start of a CDATA section.  This and :attr:`EndCdataSectionHandler`
   are needed to be able to identify the syntactical start and end for CDATA
   sections.


.. method:: xmlparser.EndCdataSectionHandler()

   Called at the end of a CDATA section.


.. method:: xmlparser.DefaultHandler(data)

   Called for any characters in the XML document for which no applicable handler
   has been specified.  This means characters that are part of a construct which
   could be reported, but for which no handler has been supplied.


.. method:: xmlparser.DefaultHandlerExpand(data)

   This is the same as the :func:`DefaultHandler`, but doesn't inhibit expansion
   of internal entities. The entity reference will not be passed to the default
   handler.


.. method:: xmlparser.NotStandaloneHandler()

   Called if the XML document hasn't been declared as being a standalone document.
   This happens when there is an external subset or a reference to a parameter
   entity, but the XML declaration does not set standalone to ``yes``.  If this
   handler returns ``0``, then the parser will raise an
   :const:`XML_ERROR_NOT_STANDALONE` error.  If this handler is not set, no
   exception is raised by the parser for this condition.


.. method:: xmlparser.ExternalEntityRefHandler(context, base, systemId, publicId)

   Called for references to external entities.  *base* is the current base, as set
   by a previous call to :meth:`SetBase`.  The system and public identifiers,
   *systemId* and *publicId*, are strings if given; if the public identifier is not
   given, *publicId* will be ``None``.  The *context* value is opaque and should
   only be used as described below.

   For external entities to be parsed, this handler must be implemented. It is
   responsible for creating the sub-parser using
   ``ExternalEntityParserCreate(context)``, initializing it with the appropriate
   callbacks, and parsing the entity.  This handler should return an integer; if it
   returns ``0``, the parser will raise an
   :const:`XML_ERROR_EXTERNAL_ENTITY_HANDLING` error, otherwise parsing will
   continue.

   If this handler is not provided, external entities are reported by the
   :attr:`DefaultHandler` callback, if provided.

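The steps described for ``ExternalEntityRefHandler`` can be sketched as follows. The ``ENTITIES`` dictionary is a hypothetical in-memory stand-in for resolving system identifiers; a real application would decide for itself how (and whether) to fetch external content, keeping the security warning at the top of this document in mind.

```python
from xml.parsers.expat import ParserCreate

# Hypothetical resolver data: system identifier -> entity content.
ENTITIES = {"greeting.xml": "<greeting>hello</greeting>"}

document = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ENTITY greeting SYSTEM "greeting.xml">
]>
<doc>&greeting;</doc>"""

text = []
parser = ParserCreate()
parser.CharacterDataHandler = text.append

def external_entity(context, base, system_id, public_id):
    # Create the child parser from the opaque context value,
    # set its callbacks, and parse the entity content.
    child = parser.ExternalEntityParserCreate(context)
    child.CharacterDataHandler = text.append
    child.Parse(ENTITIES[system_id], True)
    return 1  # non-zero: continue parsing the parent document

parser.ExternalEntityRefHandler = external_entity
parser.Parse(document, True)

print("".join(text))
```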

.. _expaterror-objects:

ExpatError Exceptions
---------------------

.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


:exc:`ExpatError` exceptions have a number of interesting attributes:


.. attribute:: ExpatError.code

   Expat's internal error number for the specific error.  The
   :data:`errors.messages <xml.parsers.expat.errors.messages>` dictionary maps
   these error numbers to Expat's error messages.  For example::

      from xml.parsers.expat import ParserCreate, ExpatError, errors

      # Placeholder input; deliberately not well-formed so that an error occurs.
      some_xml_document = "<root><unclosed></root>"

      p = ParserCreate()
      try:
          p.Parse(some_xml_document)
      except ExpatError as err:
          # Translate the numeric code into Expat's message text.
          print("Error:", errors.messages[err.code])

   The :mod:`~xml.parsers.expat.errors` module also provides error message
   constants and a dictionary :data:`~xml.parsers.expat.errors.codes` mapping
   these messages back to the error codes, see below.


.. attribute:: ExpatError.lineno

   Line number on which the error was detected.  The first line is numbered ``1``.


.. attribute:: ExpatError.offset

   Character offset into the line where the error occurred.  The first column is
   numbered ``0``.

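A sketch of building a readable error report from the three exception attributes. The sample document and the ``report`` tuple are assumptions for illustration only.

```python
from xml.parsers.expat import ParserCreate, ExpatError, errors

# <child> is never closed, so the end tag on line 3 does not match.
document = "<root>\n  <child>\n</root>"

parser = ParserCreate()
try:
    parser.Parse(document, True)
except ExpatError as err:
    # code -> message text; lineno is 1-based, offset is 0-based.
    report = (errors.messages[err.code], err.lineno, err.offset)
    print("%s at line %d, column %d" % report)
```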

.. _expat-example:

Example
-------

The following program defines three handlers that just print out their
arguments. ::

   import xml.parsers.expat

   # 3 handler functions
   def start_element(name, attrs):
       print('Start element:', name, attrs)
   def end_element(name):
       print('End element:', name)
   def char_data(data):
       print('Character data:', repr(data))

   p = xml.parsers.expat.ParserCreate()

   p.StartElementHandler = start_element
   p.EndElementHandler = end_element
   p.CharacterDataHandler = char_data

   p.Parse("""<?xml version="1.0"?>
   <parent id="top"><child1 name="paul">Text goes here</child1>
   <child2 name="fred">More text</child2>
   </parent>""", 1)

The output from this program is::

   Start element: parent {'id': 'top'}
   Start element: child1 {'name': 'paul'}
   Character data: 'Text goes here'
   End element: child1
   Character data: '\n'
   Start element: child2 {'name': 'fred'}
   Character data: 'More text'
   End element: child2
   Character data: '\n'
   End element: parent


.. _expat-content-models:

Content Model Descriptions
--------------------------

.. module:: xml.parsers.expat.model

.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>

Content models are described using nested tuples.  Each tuple contains four
values: the type, the quantifier, the name, and a tuple of children.  Children
are simply additional content model descriptions.

The values of the first two fields are constants defined in the
:mod:`xml.parsers.expat.model` module.  These constants can be collected in two
groups: the model type group and the quantifier group.

The constants in the model type group are:


.. data:: XML_CTYPE_ANY
   :noindex:

   The element named by the model name was declared to have a content model of
   ``ANY``.


.. data:: XML_CTYPE_CHOICE
   :noindex:

   The named element allows a choice from a number of options; this is used for
   content models such as ``(A | B | C)``.


.. data:: XML_CTYPE_EMPTY
   :noindex:

   Elements which are declared to be ``EMPTY`` have this model type.


.. data:: XML_CTYPE_MIXED
   :noindex:


.. data:: XML_CTYPE_NAME
   :noindex:


.. data:: XML_CTYPE_SEQ
   :noindex:

   Models which represent a series of models which follow one after the other are
   indicated with this model type.  This is used for models such as ``(A, B, C)``.

The constants in the quantifier group are:


.. data:: XML_CQUANT_NONE
   :noindex:

   No modifier is given, so it can appear exactly once, as for ``A``.


.. data:: XML_CQUANT_OPT
   :noindex:

   The model is optional: it can appear once or not at all, as for ``A?``.


.. data:: XML_CQUANT_PLUS
   :noindex:

   The model must occur one or more times (like ``A+``).


.. data:: XML_CQUANT_REP
   :noindex:

   The model must occur zero or more times, as for ``A*``.

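To make the nested-tuple layout concrete, the sketch below captures the content models reported to ``ElementDeclHandler`` for a small DTD and unpacks one of them. The DTD and the ``models`` dictionary are invented for this example.

```python
from xml.parsers.expat import ParserCreate, model

document = """<!DOCTYPE note [
  <!ELEMENT note (to, body?)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>x</to></note>"""

models = {}  # element name -> content model tuple
parser = ParserCreate()

def element_decl(name, content_model):
    models[name] = content_model

parser.ElementDeclHandler = element_decl
parser.Parse(document, True)

# Each model is (type, quantifier, name, children).
kind, quant, name, children = models["note"]
print(kind == model.XML_CTYPE_SEQ)           # "(to, body?)" is a sequence
print([child[2] for child in children])      # names of the child models
print(children[1][1] == model.XML_CQUANT_OPT)  # "body?" is optional
```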

.. _expat-errors:

Expat error constants
---------------------

.. module:: xml.parsers.expat.errors

The following constants are provided in the :mod:`xml.parsers.expat.errors`
module.  These constants are useful in interpreting some of the attributes of
the :exc:`ExpatError` exception objects raised when an error has occurred.
For backwards compatibility, the value of each constant is the error
*message* rather than the numeric error *code*.  To test for a specific error,
compare the exception's :attr:`code` attribute with
:samp:`errors.codes[errors.XML_ERROR_{CONSTANT_NAME}]`.

The ``errors`` module has the following attributes:

.. data:: codes

   A dictionary mapping string descriptions to their error codes.

   .. versionadded:: 3.2


.. data:: messages

   A dictionary mapping numeric error codes to their string descriptions.

   .. versionadded:: 3.2

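A minimal sketch of the comparison described above, using ``errors.codes`` to test for one specific error. ``is_tag_mismatch`` is a hypothetical helper name chosen for this example.

```python
from xml.parsers.expat import ParserCreate, ExpatError, errors

def is_tag_mismatch(data):
    """Return True if parsing ``data`` fails with a tag mismatch error."""
    parser = ParserCreate()
    try:
        parser.Parse(data, True)
    except ExpatError as err:
        # The constant holds the message; codes[] turns it into the number.
        return err.code == errors.codes[errors.XML_ERROR_TAG_MISMATCH]
    return False

print(is_tag_mismatch("<a><b></a>"))   # end tag does not match <b>
print(is_tag_mismatch("<a><b/></a>"))  # well-formed, no error raised
```

The two dictionaries are inverses of each other, so ``errors.messages[errors.codes[m]]`` gives back the message ``m``.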

.. data:: XML_ERROR_ASYNC_ENTITY


.. data:: XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF

   An entity reference in an attribute value referred to an external entity instead
   of an internal entity.


.. data:: XML_ERROR_BAD_CHAR_REF

   A character reference referred to a character which is illegal in XML (for
   example, character ``0``, or '``&#0;``').


.. data:: XML_ERROR_BINARY_ENTITY_REF

   An entity reference referred to an entity which was declared with a notation, so
   cannot be parsed.


.. data:: XML_ERROR_DUPLICATE_ATTRIBUTE

   An attribute was used more than once in a start tag.


.. data:: XML_ERROR_INCORRECT_ENCODING


.. data:: XML_ERROR_INVALID_TOKEN

   Raised when an input byte could not properly be assigned to a character; for
   example, a NUL byte (value ``0``) in a UTF-8 input stream.


.. data:: XML_ERROR_JUNK_AFTER_DOC_ELEMENT

   Something other than whitespace occurred after the document element.


.. data:: XML_ERROR_MISPLACED_XML_PI

   An XML declaration was found somewhere other than the start of the input data.


.. data:: XML_ERROR_NO_ELEMENTS

   The document contains no elements (XML requires all documents to contain exactly
   one top-level element).


.. data:: XML_ERROR_NO_MEMORY

   Expat was not able to allocate memory internally.


.. data:: XML_ERROR_PARAM_ENTITY_REF

   A parameter entity reference was found where it was not allowed.


.. data:: XML_ERROR_PARTIAL_CHAR

   An incomplete character was found in the input.


.. data:: XML_ERROR_RECURSIVE_ENTITY_REF

   An entity reference contained another reference to the same entity; possibly via
   a different name, and possibly indirectly.


.. data:: XML_ERROR_SYNTAX

   Some unspecified syntax error was encountered.


.. data:: XML_ERROR_TAG_MISMATCH

   An end tag did not match the innermost open start tag.


.. data:: XML_ERROR_UNCLOSED_TOKEN

   Some token (such as a start tag) was not closed before the end of the stream or
   the next token was encountered.


.. data:: XML_ERROR_UNDEFINED_ENTITY

   A reference was made to an entity which was not defined.


.. data:: XML_ERROR_UNKNOWN_ENCODING

   The document encoding is not supported by Expat.


.. data:: XML_ERROR_UNCLOSED_CDATA_SECTION

   A CDATA marked section was not closed.


.. data:: XML_ERROR_EXTERNAL_ENTITY_HANDLING


.. data:: XML_ERROR_NOT_STANDALONE

   The parser determined that the document was not "standalone" though it declared
   itself to be in the XML declaration, and the :attr:`NotStandaloneHandler` was
   set and returned ``0``.


.. data:: XML_ERROR_UNEXPECTED_STATE


.. data:: XML_ERROR_ENTITY_DECLARED_IN_PE


.. data:: XML_ERROR_FEATURE_REQUIRES_XML_DTD

   An operation was requested that requires DTD support to be compiled in, but
   Expat was configured without DTD support.  This should never be reported by a
   standard build of the :mod:`xml.parsers.expat` module.


.. data:: XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING

   A behavioral change was requested after parsing started that can only be changed
   before parsing has started.  This is (currently) only raised by
   :meth:`UseForeignDTD`.


.. data:: XML_ERROR_UNBOUND_PREFIX

   An undeclared prefix was found when namespace processing was enabled.


.. data:: XML_ERROR_UNDECLARING_PREFIX

   The document attempted to remove the namespace declaration associated with a
   prefix.


.. data:: XML_ERROR_INCOMPLETE_PE

   A parameter entity contained incomplete markup.


.. data:: XML_ERROR_XML_DECL

   The XML declaration was not well-formed.


.. data:: XML_ERROR_TEXT_DECL

   There was an error parsing a text declaration in an external entity.


.. data:: XML_ERROR_PUBLICID

   Characters were found in the public id that are not allowed.


.. data:: XML_ERROR_SUSPENDED

   The requested operation was made on a suspended parser, but isn't allowed.  This
   includes attempts to provide additional input or to stop the parser.


.. data:: XML_ERROR_NOT_SUSPENDED

   An attempt to resume the parser was made when the parser had not been suspended.


.. data:: XML_ERROR_ABORTED

   This should not be reported to Python applications.


.. data:: XML_ERROR_FINISHED

   The requested operation was made on a parser which was finished parsing input,
   but isn't allowed.  This includes attempts to provide additional input or to
   stop the parser.


.. data:: XML_ERROR_SUSPEND_PE


.. rubric:: Footnotes

.. [#] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and https://www.iana.org/assignments/character-sets/character-sets.xhtml.
    877