:mod:`xml.sax.handler` --- Base classes for SAX handlers
========================================================

.. module:: xml.sax.handler
   :synopsis: Base classes for SAX event handlers.
.. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>


.. versionadded:: 2.0

The SAX API defines four kinds of handlers: content handlers, DTD handlers,
error handlers, and entity resolvers. Applications normally only need to
implement those interfaces whose events they are interested in; they can
implement the interfaces in a single object or in multiple objects. Handler
implementations should inherit from the base classes provided in the module
:mod:`xml.sax.handler`, so that all methods get default implementations.


.. class:: ContentHandler

   This is the main callback interface in SAX, and the one most important to
   applications. The order of events in this interface mirrors the order of the
   information in the document.


.. class:: DTDHandler

   Handle DTD events.

   This interface specifies only those DTD events required for basic parsing
   (unparsed entities and attributes).


.. class:: EntityResolver

   Basic interface for resolving entities. If you create an object implementing
   this interface and register it with your parser, the parser will call
   the method in your object to resolve all external entities.


.. class:: ErrorHandler

   Interface used by the parser to present error and warning messages to the
   application.  The methods of this object control whether errors are immediately
   converted to exceptions or are handled in some other way.

In addition to these classes, :mod:`xml.sax.handler` provides symbolic constants
for the feature and property names.


.. data:: feature_namespaces

   | value: ``"http://xml.org/sax/features/namespaces"``
   | true: Perform Namespace processing.
   | false: Optionally do not perform Namespace processing (implies
     namespace-prefixes; default).
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_namespace_prefixes

   | value: ``"http://xml.org/sax/features/namespace-prefixes"``
   | true: Report the original prefixed names and attributes used for Namespace
     declarations.
   | false: Do not report attributes used for Namespace declarations, and
     optionally do not report original prefixed names (default).
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_string_interning

   | value: ``"http://xml.org/sax/features/string-interning"``
   | true: All element names, prefixes, attribute names, Namespace URIs, and
     local names are interned using the built-in intern function.
   | false: Names are not necessarily interned, although they may be (default).
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_validation

   | value: ``"http://xml.org/sax/features/validation"``
   | true: Report all validation errors (implies external-general-entities and
     external-parameter-entities).
   | false: Do not report validation errors.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_external_ges

   | value: ``"http://xml.org/sax/features/external-general-entities"``
   | true: Include all external general (text) entities.
   | false: Do not include external general entities.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_external_pes

   | value: ``"http://xml.org/sax/features/external-parameter-entities"``
   | true: Include all external parameter entities, including the external DTD
     subset.
   | false: Do not include any external parameter entities, even the external
     DTD subset.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: all_features

   List of all features.


.. data:: property_lexical_handler

   | value: ``"http://xml.org/sax/properties/lexical-handler"``
   | data type: xml.sax.sax2lib.LexicalHandler (not supported in Python 2)
   | description: An optional extension handler for lexical events like
     comments.
   | access: read/write


.. data:: property_declaration_handler

   | value: ``"http://xml.org/sax/properties/declaration-handler"``
   | data type: xml.sax.sax2lib.DeclHandler (not supported in Python 2)
   | description: An optional extension handler for DTD-related events other
     than notations and unparsed entities.
   | access: read/write


.. data:: property_dom_node

   | value: ``"http://xml.org/sax/properties/dom-node"``
   | data type: org.w3c.dom.Node (not supported in Python 2)
   | description: When parsing, the current DOM node being visited if this is
     a DOM iterator; when not parsing, the root DOM node for iteration.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: property_xml_string

   | value: ``"http://xml.org/sax/properties/xml-string"``
   | data type: String
   | description: The literal string of characters that was the source for the
     current event.
   | access: read-only


.. data:: all_properties

   List of all known property names.

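As a minimal sketch of how the feature constants above are typically used, the
following example passes them to the ``setFeature()`` and ``getFeature()``
methods of a reader obtained from :func:`xml.sax.make_parser`. It assumes that
the default expat-based reader is the one returned; other readers may accept or
reject a different set of features.

```python
import xml.sax
from xml.sax import SAXNotRecognizedException, SAXNotSupportedException
from xml.sax.handler import feature_namespaces, feature_validation

# Assumption: make_parser() returns the expat-based reader.
parser = xml.sax.make_parser()

# Features are read/write only while no parse is in progress.
parser.setFeature(feature_namespaces, True)
namespaces_enabled = parser.getFeature(feature_namespaces)

# A reader may refuse a feature it does not implement, so guard the call.
try:
    parser.setFeature(feature_validation, True)
    validation_supported = True
except (SAXNotSupportedException, SAXNotRecognizedException):
    validation_supported = False

print("namespace processing enabled:", namespaces_enabled)
print("validation supported:", validation_supported)
```

Catching both exception types keeps the sketch independent of which reader is
installed.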

.. _content-handler-objects:

ContentHandler Objects
----------------------

Users are expected to subclass :class:`ContentHandler` to support their
application.  The following methods are called by the parser on the appropriate
events in the input document:


.. method:: ContentHandler.setDocumentLocator(locator)

   Called by the parser to give the application a locator for locating the origin
   of document events.

   SAX parsers are strongly encouraged (though not absolutely required) to supply a
   locator. A parser that does so must supply the locator to the application by
   invoking this method before invoking any of the other methods in the
   :class:`ContentHandler` interface.

   The locator allows the application to determine the end position of any
   document-related event, even if the parser is not reporting an error. Typically,
   the application will use this information for reporting its own errors (such as
   character content that does not match an application's business rules). The
   information returned by the locator is probably not sufficient for use with a
   search engine.

   Note that the locator will return correct information only during the invocation
   of the events in this interface. The application should not attempt to use it at
   any other time.

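The following sketch shows one way an application might keep the locator and
query it while handling an event. The class name ``LineReporter`` and the sample
document are illustrative assumptions; the locator is only consulted inside a
callback, as required above.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class LineReporter(ContentHandler):
    """Record the line number reported by the locator for each element."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.locator = None
        self.positions = []

    def setDocumentLocator(self, locator):
        # Called before any other event if the parser supplies a locator.
        self.locator = locator

    def startElement(self, name, attrs):
        # The locator is only valid while an event is being delivered.
        line = self.locator.getLineNumber() if self.locator else None
        self.positions.append((name, line))


reporter = LineReporter()
xml.sax.parseString(b"<root>\n  <item/>\n  <item/>\n</root>", reporter)
for name, line in reporter.positions:
    print(name, line)
```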

.. method:: ContentHandler.startDocument()

   Receive notification of the beginning of a document.

   The SAX parser will invoke this method only once, before any other methods in
   this interface or in DTDHandler (except for :meth:`setDocumentLocator`).


.. method:: ContentHandler.endDocument()

   Receive notification of the end of a document.

   The SAX parser will invoke this method only once, and it will be the last method
   invoked during the parse. The parser shall not invoke this method until it has
   either abandoned parsing (because of an unrecoverable error) or reached the end
   of input.


.. method:: ContentHandler.startPrefixMapping(prefix, uri)

   Begin the scope of a prefix-URI Namespace mapping.

   The information from this event is not necessary for normal Namespace
   processing: the SAX XML reader will automatically replace prefixes for element
   and attribute names when the ``feature_namespaces`` feature is enabled.

   There are cases, however, when applications need to use prefixes in character
   data or in attribute values, where they cannot safely be expanded automatically;
   the :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events supply the
   information to the application to expand prefixes in those contexts itself, if
   necessary.

   Note that :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events are not
   guaranteed to be properly nested relative to each other: all
   :meth:`startPrefixMapping` events will occur before the corresponding
   :meth:`startElement` event, and all :meth:`endPrefixMapping` events will occur
   after the corresponding :meth:`endElement` event, but their order is not
   guaranteed.


.. method:: ContentHandler.endPrefixMapping(prefix)

   End the scope of a prefix-URI mapping.

   See :meth:`startPrefixMapping` for details. This event will always occur after
   the corresponding :meth:`endElement` event, but the order of
   :meth:`endPrefixMapping` events is not otherwise guaranteed.


.. method:: ContentHandler.startElement(name, attrs)

   Signals the start of an element in non-namespace mode.

   The *name* parameter contains the raw XML 1.0 name of the element type as a
   string and the *attrs* parameter holds an object of the
   :class:`~xml.sax.xmlreader.Attributes`
   interface (see :ref:`attributes-objects`) containing the attributes of
   the element.  The object passed as *attrs* may be re-used by the parser; holding
   on to a reference to it is not a reliable way to keep a copy of the attributes.
   To keep a copy of the attributes, use the :meth:`copy` method of the *attrs*
   object.


.. method:: ContentHandler.endElement(name)

   Signals the end of an element in non-namespace mode.

   The *name* parameter contains the name of the element type, just as with the
   :meth:`startElement` event.

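A minimal sketch of a handler for non-namespace mode follows. The class name
``AttributeCollector`` and the sample document are assumptions made for
illustration. Because the *attrs* object may be re-used by the parser, the
handler stores a copy rather than the object itself.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class AttributeCollector(ContentHandler):
    """Track nesting depth and keep a private copy of each attribute set."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.depth = 0
        self.seen = []

    def startElement(self, name, attrs):
        self.depth += 1
        # copy() detaches the attributes from the parser-owned object.
        self.seen.append((name, dict(attrs.copy().items())))

    def endElement(self, name):
        self.depth -= 1


collector = AttributeCollector()
xml.sax.parseString(b'<doc id="1"><p class="x"/></doc>', collector)
for name, attributes in collector.seen:
    print(name, attributes)
```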

.. method:: ContentHandler.startElementNS(name, qname, attrs)

   Signals the start of an element in namespace mode.

   The *name* parameter contains the name of the element type as a ``(uri,
   localname)`` tuple, the *qname* parameter contains the raw XML 1.0 name used in
   the source document, and the *attrs* parameter holds an instance of the
   :class:`~xml.sax.xmlreader.AttributesNS` interface (see
   :ref:`attributes-ns-objects`)
   containing the attributes of the element.  If no namespace is associated with
   the element, the *uri* component of *name* will be ``None``.  The object passed
   as *attrs* may be re-used by the parser; holding on to a reference to it is not
   a reliable way to keep a copy of the attributes.  To keep a copy of the
   attributes, use the :meth:`copy` method of the *attrs* object.

   Parsers may set the *qname* parameter to ``None``, unless the
   ``feature_namespace_prefixes`` feature is activated.


.. method:: ContentHandler.endElementNS(name, qname)

   Signals the end of an element in namespace mode.

   The *name* parameter contains the name of the element type, just as with the
   :meth:`startElementNS` method, likewise the *qname* parameter.

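The namespace-mode callbacks and the prefix-mapping events can be observed
together with a sketch like the one below. It assumes the expat-based reader,
enables ``feature_namespaces`` explicitly, and uses a made-up namespace URI
(``urn:example:a``) and class name (``NamespaceLogger``).

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NamespaceLogger(ContentHandler):
    """Log prefix mappings and namespace-qualified element events."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.events = []

    def startPrefixMapping(self, prefix, uri):
        self.events.append(("map", prefix, uri))

    def endPrefixMapping(self, prefix):
        self.events.append(("unmap", prefix))

    def startElementNS(self, name, qname, attrs):
        uri, localname = name  # uri is None for elements without a namespace
        self.events.append(("start", uri, localname))

    def endElementNS(self, name, qname):
        uri, localname = name
        self.events.append(("end", uri, localname))


ns_parser = xml.sax.make_parser()
ns_parser.setFeature(feature_namespaces, True)  # switch to namespace mode
ns_logger = NamespaceLogger()
ns_parser.setContentHandler(ns_logger)
ns_parser.parse(io.BytesIO(b'<a:doc xmlns:a="urn:example:a"><plain/></a:doc>'))
for event in ns_logger.events:
    print(event)
```

With namespace processing enabled, the reader calls :meth:`startElementNS` and
:meth:`endElementNS` instead of :meth:`startElement` and :meth:`endElement`.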

.. method:: ContentHandler.characters(content)

   Receive notification of character data.

   The Parser will call this method to report each chunk of character data. SAX
   parsers may return all contiguous character data in a single chunk, or they may
   split it into several chunks; however, all of the characters in any single event
   must come from the same external entity so that the Locator provides useful
   information.

   *content* may be a Unicode string or a byte string; the ``expat`` reader module
   always produces Unicode strings.

   .. note::

      The earlier SAX 1 interface provided by the Python XML Special Interest Group
      used a more Java-like interface for this method.  Since most parsers used from
      Python did not take advantage of the older interface, the simpler signature was
      chosen to replace it.  To convert old code to the new interface, use *content*
      instead of slicing content with the old *offset* and *length* parameters.

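Because character data may arrive in several chunks, a handler usually buffers
the chunks and joins them when the element ends. The sketch below illustrates
that pattern under simple assumptions: the class name ``TextCollector`` is
invented, and the handler only copes with elements that contain text and no
child elements.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Join the character chunks reported for each leaf element."""

    def __init__(self):
        ContentHandler.__init__(self)
        self._chunks = []
        self.texts = []

    def startElement(self, name, attrs):
        self._chunks = []  # start a fresh buffer for this element

    def characters(self, content):
        # One text run may be delivered as any number of chunks.
        self._chunks.append(content)

    def endElement(self, name):
        self.texts.append((name, "".join(self._chunks)))
        self._chunks = []


text_collector = TextCollector()
xml.sax.parseString(b"<msg>fish &amp; chips</msg>", text_collector)
print(text_collector.texts)
```

Joining at :meth:`endElement` gives the same result whether the reader reports
the text in one chunk or splits it around the entity reference.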

.. method:: ContentHandler.ignorableWhitespace(whitespace)

   Receive notification of ignorable whitespace in element content.

   Validating Parsers must use this method to report each chunk of ignorable
   whitespace (see the W3C XML 1.0 recommendation, section 2.10): non-validating
   parsers may also use this method if they are capable of parsing and using
   content models.

   SAX parsers may return all contiguous whitespace in a single chunk, or they may
   split it into several chunks; however, all of the characters in any single event
   must come from the same external entity, so that the Locator provides useful
   information.


.. method:: ContentHandler.processingInstruction(target, data)

   Receive notification of a processing instruction.

   The Parser will invoke this method once for each processing instruction found:
   note that processing instructions may occur before or after the main document
   element.

   A SAX parser should never report an XML declaration (XML 1.0, section 2.8) or a
   text declaration (XML 1.0, section 4.3.1) using this method.

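As an illustration, the sketch below collects processing instructions that
appear before and after the document element. The class name ``PICollector``
and the sample targets are assumptions; note that the XML declaration at the
start of the input is not reported through this callback.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class PICollector(ContentHandler):
    """Collect (target, data) pairs for every processing instruction."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.instructions = []

    def processingInstruction(self, target, data):
        self.instructions.append((target, data))


pi_document = (
    b'<?xml version="1.0"?>\n'
    b'<?xml-stylesheet href="style.css"?>\n'
    b'<doc/>\n'
    b'<?done now?>'
)
pi_collector = PICollector()
xml.sax.parseString(pi_document, pi_collector)
for target, data in pi_collector.instructions:
    print(target, data)
```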

.. method:: ContentHandler.skippedEntity(name)

   Receive notification of a skipped entity.

   The Parser will invoke this method once for each entity skipped. Non-validating
   processors may skip entities if they have not seen the declarations (because,
   for example, the entity was declared in an external DTD subset). All processors
   may skip external entities, depending on the values of the
   ``feature_external_ges`` and the ``feature_external_pes`` features.


.. _dtd-handler-objects:

DTDHandler Objects
------------------

:class:`DTDHandler` instances provide the following methods:


.. method:: DTDHandler.notationDecl(name, publicId, systemId)

   Handle a notation declaration event.


.. method:: DTDHandler.unparsedEntityDecl(name, publicId, systemId, ndata)

   Handle an unparsed entity declaration event.

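A hedged sketch of a DTD handler follows. It assumes the expat-based reader,
which reports declarations found in the internal DTD subset. The class name
``DeclarationCollector`` and the notation and entity names are invented for the
example.

```python
import io
import xml.sax
from xml.sax.handler import DTDHandler


class DeclarationCollector(DTDHandler):
    """Record notation and unparsed entity declarations."""

    def __init__(self):
        self.notations = []
        self.unparsed = []

    def notationDecl(self, name, publicId, systemId):
        self.notations.append((name, publicId, systemId))

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.unparsed.append((name, publicId, systemId, ndata))


dtd_document = (
    b'<!DOCTYPE doc [\n'
    b'  <!NOTATION gif SYSTEM "image/gif">\n'
    b'  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>\n'
    b']>\n'
    b'<doc/>'
)
dtd_parser = xml.sax.make_parser()
declarations = DeclarationCollector()
dtd_parser.setDTDHandler(declarations)  # registered separately from content
dtd_parser.parse(io.BytesIO(dtd_document))
print(declarations.notations)
print(declarations.unparsed)
```

The DTD handler is registered with ``setDTDHandler()`` on the reader, so this
sketch uses :func:`xml.sax.make_parser` rather than the ``parseString()``
convenience function, which only accepts content and error handlers.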

.. _entity-resolver-objects:

EntityResolver Objects
----------------------


.. method:: EntityResolver.resolveEntity(publicId, systemId)

   Resolve the system identifier of an entity and return either the system
   identifier to read from as a string, or an InputSource to read from. The default
   implementation returns *systemId*.

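The sketch below shows a resolver that returns an ``InputSource`` backed by
in-memory data instead of letting the reader open the system identifier. It
assumes the expat-based reader and enables ``feature_external_ges`` explicitly,
since external general entities may be disabled by default. The class names,
the entity name, and ``greeting.xml`` are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource


class InMemoryResolver(EntityResolver):
    """Serve every external entity from a fixed in-memory byte string."""

    def __init__(self):
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append((publicId, systemId))
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(b"<greeting>hello</greeting>"))
        return source


class Recorder(ContentHandler):
    """Record element names and character data."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.elements = []
        self.text = []

    def startElement(self, name, attrs):
        self.elements.append(name)

    def characters(self, content):
        self.text.append(content)


entity_document = (
    b'<!DOCTYPE doc [<!ENTITY ext SYSTEM "greeting.xml">]>'
    b'<doc>&ext;</doc>'
)
entity_parser = xml.sax.make_parser()
entity_parser.setFeature(feature_external_ges, True)
resolver = InMemoryResolver()
recorder = Recorder()
entity_parser.setEntityResolver(resolver)
entity_parser.setContentHandler(recorder)
entity_parser.parse(io.BytesIO(entity_document))
print(recorder.elements, "".join(recorder.text))
```

Returning an ``InputSource`` keeps the reader from fetching the entity itself,
which is one way to control where external content comes from.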

.. _sax-error-handler:

ErrorHandler Objects
--------------------

Objects with this interface are used to receive error and warning information
from the :class:`~xml.sax.xmlreader.XMLReader`.  If you create an object that
implements this interface and register it with your
:class:`~xml.sax.xmlreader.XMLReader`, the parser
will call the methods in your object to report all warnings and errors. There
are three levels of errors available: warnings, (possibly) recoverable errors,
and unrecoverable errors.  All methods take a :exc:`SAXParseException` as the
only parameter.  Errors and warnings may be converted to an exception by raising
the passed-in exception object.


.. method:: ErrorHandler.error(exception)

   Called when the parser encounters a recoverable error.  If this method does not
   raise an exception, parsing may continue, but further document information
   should not be expected by the application.  Allowing the parser to continue may
   allow additional errors to be discovered in the input document.


.. method:: ErrorHandler.fatalError(exception)

   Called when the parser encounters an error it cannot recover from; parsing is
   expected to terminate when this method returns.


.. method:: ErrorHandler.warning(exception)

   Called when the parser presents minor warning information to the application.
   Parsing is expected to continue when this method returns, and document
   information will continue to be passed to the application. Raising an exception
   in this method will cause parsing to end.

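A minimal sketch of a custom error handler is shown below. It records warnings
and recoverable errors without stopping the parse, and converts fatal errors
into exceptions by raising the object it receives. The class name
``CollectingErrorHandler`` and the malformed sample document are assumptions
for illustration.

```python
import xml.sax
from xml.sax import SAXParseException
from xml.sax.handler import ContentHandler, ErrorHandler


class CollectingErrorHandler(ErrorHandler):
    """Record every report; only fatal errors stop the parse."""

    def __init__(self):
        self.warnings = []
        self.errors = []
        self.fatal = []

    def warning(self, exception):
        self.warnings.append(exception)

    def error(self, exception):
        self.errors.append(exception)  # not raised, so parsing may continue

    def fatalError(self, exception):
        self.fatal.append((exception.getLineNumber(), exception.getMessage()))
        raise exception  # convert the report into an exception


error_log = CollectingErrorHandler()
try:
    # The <unclosed> element is never closed, so the input is not well-formed.
    xml.sax.parseString(b"<doc>\n<unclosed>\n</doc>", ContentHandler(), error_log)
    parse_failed = False
except SAXParseException:
    parse_failed = True

print("parse failed:", parse_failed)
print("fatal reports:", error_log.fatal)
```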
    415