:mod:`xml.sax.xmlreader` --- Interface for XML parsers
======================================================

.. module:: xml.sax.xmlreader
   :synopsis: Interface which SAX-compliant XML parsers must implement.
.. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>


.. versionadded:: 2.0

SAX parsers implement the :class:`XMLReader` interface. They are implemented in
a Python module, which must provide a function :func:`create_parser`. This
function is invoked by :func:`xml.sax.make_parser` with no arguments to create
a new parser object.

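As a minimal sketch of the paragraph above, the following obtains a parser
through :func:`xml.sax.make_parser` and checks which interfaces it implements.
It assumes the default driver shipped with the standard library is available;
the variable names are only illustrative.

```python
import xml.sax
from xml.sax import xmlreader

# make_parser() locates a driver module and calls its create_parser()
# function with no arguments to obtain a new parser object.
parser = xml.sax.make_parser()

# The returned object implements the XMLReader interface; the default
# driver is assumed here to implement IncrementalParser as well.
is_reader = isinstance(parser, xmlreader.XMLReader)
is_incremental = isinstance(parser, xmlreader.IncrementalParser)
print(is_reader, is_incremental)
```

Application code normally does not import a driver module directly; going
through :func:`xml.sax.make_parser` keeps it independent of the driver.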

.. class:: XMLReader()

   Base class which can be inherited by SAX parsers.


.. class:: IncrementalParser()

   In some cases, it is desirable not to parse an input source all at once, but to
   feed chunks of the document as they become available. Note that the reader will
   normally not read the entire file at once, but read it in chunks as well; still,
   :meth:`parse` won't return until the entire document is processed. So these
   interfaces should be used if the blocking behaviour of :meth:`parse` is not
   desirable.

   When the parser is instantiated, it is ready to begin accepting data from the
   :meth:`feed` method immediately. After parsing has been finished with a call to
   :meth:`close`, the :meth:`reset` method must be called to make the parser ready
   to accept new data, either from :meth:`feed` or using the :meth:`parse` method.

   Note that these methods must *not* be called during parsing, that is, after
   :meth:`parse` has been called and before it returns.

   By default, the class also implements the :meth:`parse` method of the
   :class:`XMLReader` interface using the :meth:`feed`, :meth:`close` and
   :meth:`reset` methods of the :class:`IncrementalParser` interface as a
   convenience to SAX 2.0 driver writers.


.. class:: Locator()

   Interface for associating a SAX event with a document location. A locator object
   will return valid results only during calls to DocumentHandler methods; at any
   other time, the results are unpredictable. If information is not available,
   methods may return ``None``.


.. class:: InputSource([systemId])

   Encapsulation of the information needed by the :class:`XMLReader` to read
   entities.

   This class may include information about the public identifier, system
   identifier, byte stream (possibly with character encoding information) and/or
   the character stream of an entity.

   Applications will create objects of this class for use in the
   :meth:`XMLReader.parse` method and for returning from
   :meth:`EntityResolver.resolveEntity`.

   An :class:`InputSource` belongs to the application; the :class:`XMLReader` is
   not allowed to modify :class:`InputSource` objects passed to it from the
   application, although it may make copies and modify those.


.. class:: AttributesImpl(attrs)

   This is an implementation of the :class:`Attributes` interface (see section
   :ref:`attributes-objects`).  This is a dictionary-like object which
   represents the element attributes in a :meth:`startElement` call. In addition
   to the most useful dictionary operations, it supports a number of other
   methods as described by the interface. Objects of this class should be
   instantiated by readers; *attrs* must be a dictionary-like object containing
   a mapping from attribute names to attribute values.

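Although readers normally create these objects, building one by hand can help
when testing a handler. The sketch below constructs an :class:`AttributesImpl`
from a plain dictionary and queries it; the attribute names and values are made
up for illustration.

```python
from xml.sax.xmlreader import AttributesImpl

# attrs maps attribute names to attribute values (illustrative data).
attrs = AttributesImpl({"id": "42", "lang": "en"})

count = attrs.getLength()          # number of attributes
names = sorted(attrs.getNames())   # attribute names, sorted for stable output
value = attrs.getValue("id")       # value looked up by name
attr_type = attrs.getType("id")    # this implementation reports 'CDATA'
fallback = attrs.get("missing", "n/a")  # dictionary-style access with default

print(count, names, value, attr_type, fallback)
```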

.. class:: AttributesNSImpl(attrs, qnames)

   Namespace-aware variant of :class:`AttributesImpl`, which will be passed to
   :meth:`startElementNS`. It is derived from :class:`AttributesImpl`, but
   understands attribute names as two-tuples of *namespaceURI* and
   *localname*. In addition, it provides a number of methods expecting qualified
   names as they appear in the original document.  This class implements the
   :class:`AttributesNS` interface (see section :ref:`attributes-ns-objects`).


.. _xmlreader-objects:

XMLReader Objects
-----------------

The :class:`XMLReader` interface supports the following methods:


.. method:: XMLReader.parse(source)

   Process an input source, producing SAX events. The *source* object can be a
   system identifier (a string identifying the input source -- typically a file
   name or a URL), a file-like object, or an :class:`InputSource` object. When
   :meth:`parse` returns, the input is completely processed, and the parser object
   can be discarded or reset. As a limitation, the current implementation only
   accepts byte streams; processing of character streams is for further study.

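The following sketch shows one way :meth:`XMLReader.parse` might be used with a
file-like object. The ``ElementCounter`` handler and the sample document are
hypothetical and exist only for this example; the source is passed as a byte
stream.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Hypothetical handler that counts start tags by element name."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


document = b"<library><book/><book/><dvd/></library>"

handler = ElementCounter()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# parse() blocks until the whole input has been processed.
parser.parse(io.BytesIO(document))

print(handler.counts)
```

Because :meth:`parse` only returns once the input is exhausted, the counts are
complete as soon as the call finishes.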

.. method:: XMLReader.getContentHandler()

   Return the current :class:`~xml.sax.handler.ContentHandler`.


.. method:: XMLReader.setContentHandler(handler)

   Set the current :class:`~xml.sax.handler.ContentHandler`.  If no
   :class:`~xml.sax.handler.ContentHandler` is set, content events will be
   discarded.


.. method:: XMLReader.getDTDHandler()

   Return the current :class:`~xml.sax.handler.DTDHandler`.


.. method:: XMLReader.setDTDHandler(handler)

   Set the current :class:`~xml.sax.handler.DTDHandler`.  If no
   :class:`~xml.sax.handler.DTDHandler` is set, DTD
   events will be discarded.


.. method:: XMLReader.getEntityResolver()

   Return the current :class:`~xml.sax.handler.EntityResolver`.


.. method:: XMLReader.setEntityResolver(handler)

   Set the current :class:`~xml.sax.handler.EntityResolver`.  If no
   :class:`~xml.sax.handler.EntityResolver` is set,
   attempts to resolve an external entity will result in opening the system
   identifier for the entity, and fail if it is not available.


.. method:: XMLReader.getErrorHandler()

   Return the current :class:`~xml.sax.handler.ErrorHandler`.


.. method:: XMLReader.setErrorHandler(handler)

   Set the current error handler.  If no :class:`~xml.sax.handler.ErrorHandler`
   is set, errors will be raised as exceptions, and warnings will be printed.

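To illustrate :meth:`XMLReader.setErrorHandler`, the sketch below installs a
hypothetical ``RecordingErrorHandler`` that notes each report before re-raising
fatal errors. The class name, the record format, and the deliberately malformed
document are assumptions made for this example.

```python
import io
import xml.sax
from xml.sax.handler import ErrorHandler


class RecordingErrorHandler(ErrorHandler):
    """Hypothetical handler that records reports instead of printing them."""

    def __init__(self):
        self.records = []

    def warning(self, exception):
        self.records.append(("warning", exception.getMessage()))

    def error(self, exception):
        self.records.append(("error", exception.getMessage()))

    def fatalError(self, exception):
        self.records.append(("fatalError", exception.getMessage()))
        raise exception  # stop parsing after a fatal error


parser = xml.sax.make_parser()
error_handler = RecordingErrorHandler()
parser.setErrorHandler(error_handler)

raised = False
try:
    # The <unclosed> element is never closed, so the document is malformed.
    parser.parse(io.BytesIO(b"<root><unclosed></root>"))
except xml.sax.SAXParseException:
    raised = True

print(raised, error_handler.records)
```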

.. method:: XMLReader.setLocale(locale)

   Allow an application to set the locale for errors and warnings.

   SAX parsers are not required to provide localization for errors and warnings; if
   they cannot support the requested locale, however, they must raise a SAX
   exception.  Applications may request a locale change in the middle of a parse.


.. method:: XMLReader.getFeature(featurename)

   Return the current setting for feature *featurename*.  If the feature is not
   recognized, :exc:`SAXNotRecognizedException` is raised. The well-known
   feature names are listed in the module :mod:`xml.sax.handler`.


.. method:: XMLReader.setFeature(featurename, value)

   Set the *featurename* to *value*. If the feature is not recognized,
   :exc:`SAXNotRecognizedException` is raised. If the feature or its setting is not
   supported by the parser, :exc:`SAXNotSupportedException` is raised.

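The sketch below exercises :meth:`getFeature` and :meth:`setFeature` on the
parser returned by :func:`xml.sax.make_parser`. It assumes the default
expat-based driver, which is non-validating; the unknown feature URI is
hypothetical, and other drivers may respond differently.

```python
import xml.sax
from xml.sax import SAXNotRecognizedException, SAXNotSupportedException
from xml.sax.handler import feature_namespaces, feature_validation

parser = xml.sax.make_parser()

# Turn on namespace processing and read the setting back.
parser.setFeature(feature_namespaces, True)
namespaces_enabled = bool(parser.getFeature(feature_namespaces))

# A feature name the parser has never heard of (hypothetical URI).
try:
    parser.getFeature("http://example.org/hypothetical-feature")
    unknown_recognized = True
except SAXNotRecognizedException:
    unknown_recognized = False

# A recognized feature whose requested setting the driver cannot honour
# (assumption: the default driver does not validate).
try:
    parser.setFeature(feature_validation, True)
    validation_supported = True
except SAXNotSupportedException:
    validation_supported = False

print(namespaces_enabled, unknown_recognized, validation_supported)
```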

.. method:: XMLReader.getProperty(propertyname)

   Return the current setting for property *propertyname*. If the property is not
   recognized, a :exc:`SAXNotRecognizedException` is raised. The well-known
   property names are listed in the module :mod:`xml.sax.handler`.


.. method:: XMLReader.setProperty(propertyname, value)

   Set the *propertyname* to *value*. If the property is not recognized,
   :exc:`SAXNotRecognizedException` is raised. If the property or its setting is
   not supported by the parser, :exc:`SAXNotSupportedException` is raised.


.. _incremental-parser-objects:

IncrementalParser Objects
-------------------------

Instances of :class:`IncrementalParser` offer the following additional methods:


.. method:: IncrementalParser.feed(data)

   Process a chunk of *data*.


.. method:: IncrementalParser.close()

   Assume the end of the document. This will check well-formedness conditions that
   can be checked only at the end, invoke handlers, and may clean up resources
   allocated during parsing.


.. method:: IncrementalParser.reset()

   This method is called after :meth:`close` has been called to reset the parser
   so that it is ready to parse new documents. The results of calling
   :meth:`parse` or :meth:`feed` after :meth:`close` without calling
   :meth:`reset` are undefined.

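The :meth:`feed`, :meth:`close` and :meth:`reset` cycle can be sketched as
follows. The ``TextCollector`` handler and the ``parse_in_chunks`` helper are
hypothetical names introduced for this example, and the parser is assumed to be
the default one, which implements :class:`IncrementalParser`.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical handler that gathers character data."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_in_chunks(parser, chunks):
    """Feed a document piece by piece and return its text content."""
    handler = TextCollector()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)   # process data as it becomes available
    parser.close()           # signal the end of the document
    parser.reset()           # make the parser ready for a new document
    return "".join(handler.chunks)


parser = xml.sax.make_parser()

# Chunk boundaries may fall anywhere, even in the middle of a tag.
first = parse_in_chunks(parser, [b"<greeting>Hel", b"lo</greet", b"ing>"])
second = parse_in_chunks(parser, [b"<greeting>again</greeting>"])

print(first, second)
```

Calling :meth:`reset` inside the helper is what allows the same parser object
to be reused for the second document.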

.. _locator-objects:

Locator Objects
---------------

Instances of :class:`Locator` provide these methods:


.. method:: Locator.getColumnNumber()

   Return the column number where the current event begins.


.. method:: Locator.getLineNumber()

   Return the line number where the current event begins.


.. method:: Locator.getPublicId()

   Return the public identifier for the current event.


.. method:: Locator.getSystemId()

   Return the system identifier for the current event.

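A handler usually receives its locator through ``setDocumentLocator`` and
queries it while an event is being delivered. The sketch below records where
each element starts; ``PositionRecorder`` and the sample document are
hypothetical, and the reported positions depend on the underlying parser.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class PositionRecorder(ContentHandler):
    """Hypothetical handler that records where each element starts."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.locator = None
        self.positions = []

    def setDocumentLocator(self, locator):
        # Called by the parser before any other event is reported.
        self.locator = locator

    def startElement(self, name, attrs):
        # The locator is only queried here, during a handler callback.
        self.positions.append(
            (name, self.locator.getLineNumber(), self.locator.getColumnNumber())
        )


document = b"<root>\n  <child/>\n</root>"
handler = PositionRecorder()
xml.sax.parse(io.BytesIO(document), handler)

lines = {name: line for name, line, column in handler.positions}
print(lines)
```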

.. _input-source-objects:

InputSource Objects
-------------------


.. method:: InputSource.setPublicId(id)

   Set the public identifier of this :class:`InputSource`.


.. method:: InputSource.getPublicId()

   Return the public identifier of this :class:`InputSource`.


.. method:: InputSource.setSystemId(id)

   Set the system identifier of this :class:`InputSource`.


.. method:: InputSource.getSystemId()

   Return the system identifier of this :class:`InputSource`.


.. method:: InputSource.setEncoding(encoding)

   Set the character encoding of this :class:`InputSource`.

   The encoding must be a string acceptable for an XML encoding declaration (see
   section 4.3.3 of the XML recommendation).

   The encoding attribute of the :class:`InputSource` is ignored if the
   :class:`InputSource` also contains a character stream.


.. method:: InputSource.getEncoding()

   Get the character encoding of this :class:`InputSource`.


.. method:: InputSource.setByteStream(bytefile)

   Set the byte stream (a Python file-like object which does not perform
   byte-to-character conversion) for this input source.

   The SAX parser will ignore this if there is also a character stream specified,
   but it will use a byte stream in preference to opening a URI connection itself.

   If the application knows the character encoding of the byte stream, it should
   set it with the :meth:`setEncoding` method.


.. method:: InputSource.getByteStream()

   Get the byte stream for this input source.

   The :meth:`getEncoding` method will return the character encoding for this
   byte stream, or ``None`` if unknown.


.. method:: InputSource.setCharacterStream(charfile)

   Set the character stream for this input source. (The stream must be a Python 1.6
   Unicode-wrapped file-like object that performs conversion to Unicode strings.)

   If there is a character stream specified, the SAX parser will ignore any byte
   stream and will not attempt to open a URI connection to the system identifier.


.. method:: InputSource.getCharacterStream()

   Get the character stream for this input source.

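As a sketch of how these accessors fit together, the example below fills in an
:class:`InputSource` by hand and passes it to a parser. The system and public
identifiers are hypothetical placeholders; because a byte stream is supplied,
the parser is expected to read from it rather than open the system identifier.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class RootNameHandler(ContentHandler):
    """Hypothetical handler that remembers the name of the root element."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.root = None

    def startElement(self, name, attrs):
        if self.root is None:
            self.root = name


source = InputSource("example.xml")                    # hypothetical system id
source.setPublicId("-//Example//Hypothetical Doc//EN")  # hypothetical public id
source.setEncoding("utf-8")
source.setByteStream(io.BytesIO(b"<note>hi</note>"))

handler = RootNameHandler()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)
parser.parse(source)

print(source.getSystemId(), source.getEncoding(), handler.root)
```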

.. _attributes-objects:

The :class:`Attributes` Interface
---------------------------------

:class:`Attributes` objects implement a portion of the mapping protocol,
including the methods :meth:`~collections.Mapping.copy`,
:meth:`~collections.Mapping.get`,
:meth:`~collections.Mapping.has_key`,
:meth:`~collections.Mapping.items`,
:meth:`~collections.Mapping.keys`,
and :meth:`~collections.Mapping.values`.  The following methods
are also provided:


.. method:: Attributes.getLength()

   Return the number of attributes.


.. method:: Attributes.getNames()

   Return the names of the attributes.


.. method:: Attributes.getType(name)

   Return the type of the attribute *name*, which is normally ``'CDATA'``.


.. method:: Attributes.getValue(name)

   Return the value of attribute *name*.

.. getValueByQName, getNameByQName, getQNameByName, getQNames available
.. here already, but documented only for derived class.


.. _attributes-ns-objects:

The :class:`AttributesNS` Interface
-----------------------------------

This interface is a subtype of the :class:`Attributes` interface (see section
:ref:`attributes-objects`).  All methods supported by that interface are also
available on :class:`AttributesNS` objects.

The following methods are also available:


.. method:: AttributesNS.getValueByQName(name)

   Return the value for a qualified name.


.. method:: AttributesNS.getNameByQName(name)

   Return the ``(namespace, localname)`` pair for a qualified *name*.


.. method:: AttributesNS.getQNameByName(name)

   Return the qualified name for a ``(namespace, localname)`` pair.


.. method:: AttributesNS.getQNames()

   Return the qualified names of all attributes.

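The qualified-name methods can be tried out on a hand-built
:class:`AttributesNSImpl`. In this sketch the namespace URI, prefix, and
attribute value are hypothetical; *attrs* maps ``(namespace, localname)`` pairs
to values, and *qnames* maps the same pairs to qualified names.

```python
from xml.sax.xmlreader import AttributesNSImpl

NS = "http://example.org/ns"   # hypothetical namespace URI
name = (NS, "id")              # (namespace, localname) pair

attrs = AttributesNSImpl({name: "42"}, {name: "ex:id"})

value = attrs.getValueByQName("ex:id")    # look up a value by qualified name
resolved = attrs.getNameByQName("ex:id")  # qualified name -> (namespace, localname)
qname = attrs.getQNameByName(name)        # (namespace, localname) -> qualified name
qnames = list(attrs.getQNames())          # all qualified names

print(value, resolved, qname, qnames)
```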
    395