.. _xml:

XML Processing Modules
======================

.. module:: xml
   :synopsis: Package containing XML processing modules
.. sectionauthor:: Christian Heimes <christian@python.org>
.. sectionauthor:: Georg Brandl <georg@python.org>


Python's interfaces for processing XML are grouped in the ``xml`` package.

.. warning::

   The XML modules are not secure against erroneous or maliciously
   constructed data.  If you need to parse untrusted or unauthenticated data,
   see :ref:`xml-vulnerabilities`.

It is important to note that modules in the :mod:`xml` package require that
there be at least one SAX-compliant XML parser available. The Expat parser is
included with Python, so the :mod:`xml.parsers.expat` module will always be
available.

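Because Expat is always available, :mod:`xml.parsers.expat` can be used directly. The following minimal sketch creates a parser with ``ParserCreate()``, registers three handler callbacks and feeds it a small document in a single call. The handler function names and the sample document are illustrative choices, not part of the module.

```python
import xml.parsers.expat

events = []  # (kind, ...) tuples recorded by the handlers below


def start_element(name, attrs):
    # attrs is a dict mapping attribute names to their values
    events.append(("start", name, attrs))


def end_element(name):
    events.append(("end", name))


def char_data(data):
    # Expat may deliver text in several chunks, so record each one
    events.append(("text", data))


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element
parser.CharacterDataHandler = char_data

# The second argument (isfinal=True) tells Expat this is the whole document
parser.Parse('<greeting lang="en">hello</greeting>', True)

for event in events:
    print(event)
```

Joining the ``"text"`` entries gives the complete character data of an element, which is safer than assuming Expat reports it in one piece.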

The documentation for the :mod:`xml.dom` and :mod:`xml.sax` packages is the
definition of the Python bindings for the DOM and SAX interfaces.

The XML handling submodules are:

* :mod:`xml.etree.ElementTree`: the ElementTree API, a simple and lightweight
  XML processor

..

* :mod:`xml.dom`: the DOM API definition
* :mod:`xml.dom.minidom`: a minimal DOM implementation
* :mod:`xml.dom.pulldom`: support for building partial DOM trees

..

* :mod:`xml.sax`: SAX2 base classes and convenience functions
* :mod:`xml.parsers.expat`: the Expat parser binding

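To give a feel for how the submodules differ, this sketch parses the same small, trusted document three ways: with :mod:`xml.etree.ElementTree`, with :mod:`xml.dom.minidom`, and with a :mod:`xml.sax` content handler. The sample document and the ``BookCounter`` class are made up for illustration.

```python
import xml.etree.ElementTree as ET
import xml.sax
from xml.dom import minidom

DOC = "<catalog><book id='b1'>Dune</book><book id='b2'>Emma</book></catalog>"

# ElementTree: build a tree of Element objects and query it with findall()
root = ET.fromstring(DOC)
et_titles = [(book.get("id"), book.text) for book in root.findall("book")]

# minidom: build a W3C-style DOM and walk it with DOM methods
dom = minidom.parseString(DOC)
dom_titles = [
    (book.getAttribute("id"), book.firstChild.data)
    for book in dom.getElementsByTagName("book")
]


# SAX: no tree at all, just callbacks while the parser streams the input
class BookCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "book":
            self.count += 1


counter = BookCounter()
xml.sax.parseString(DOC.encode("utf-8"), counter)

print(et_titles)      # [('b1', 'Dune'), ('b2', 'Emma')]
print(dom_titles)     # [('b1', 'Dune'), ('b2', 'Emma')]
print(counter.count)  # 2
```

ElementTree and minidom both hold the whole document in memory, while the SAX handler only sees events as they stream past, which is the usual trade-off when choosing between them.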


.. _xml-vulnerabilities:

XML vulnerabilities
===================

The XML processing modules are not secure against maliciously constructed data.
An attacker can abuse vulnerabilities to mount denial of service attacks,
access local files, generate network connections to other machines, or
circumvent firewalls. The attacks on XML abuse unfamiliar features such as an
inline `DTD`_ (document type definition) with entities.

The following table gives an overview of the known attacks and whether the
various modules are vulnerable to them.

=========================  ==============   ===============   ==============   ==============   ==============
kind                       sax              etree             minidom          pulldom          xmlrpc
=========================  ==============   ===============   ==============   ==============   ==============
billion laughs             **Vulnerable**   **Vulnerable**    **Vulnerable**   **Vulnerable**   **Vulnerable**
quadratic blowup           **Vulnerable**   **Vulnerable**    **Vulnerable**   **Vulnerable**   **Vulnerable**
external entity expansion  **Vulnerable**   Safe    (1)       Safe    (2)      **Vulnerable**   Safe    (3)
`DTD`_ retrieval           **Vulnerable**   Safe              Safe             **Vulnerable**   Safe
decompression bomb         Safe             Safe              Safe             Safe             **Vulnerable**
=========================  ==============   ===============   ==============   ==============   ==============

1. :mod:`xml.etree.ElementTree` doesn't expand external entities and raises a
   :exc:`~xml.etree.ElementTree.ParseError` when an entity occurs.
2. :mod:`xml.dom.minidom` doesn't expand external entities and simply returns
   the unexpanded entity verbatim.
3. :mod:`xmlrpc.client` doesn't expand external entities and omits them.


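Note 1 can be observed directly. The sketch below hands :func:`xml.etree.ElementTree.fromstring` a document that declares an external entity. The entity name ``secret`` and the ``file:///etc/passwd`` system identifier are arbitrary examples; the file is never opened, because ElementTree rejects the reference instead of expanding it. ``try_parse`` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A document declaring an external entity that points at a local file.
DOC = """<!DOCTYPE note [
  <!ENTITY secret SYSTEM "file:///etc/passwd">
]>
<note>&secret;</note>"""


def try_parse(text):
    """Return a short status string instead of letting ParseError escape."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        # ElementTree raises here instead of fetching the external resource
        return f"rejected: {exc}"
    return "parsed"


result = try_parse(DOC)
print(result)
```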

billion laughs / exponential entity expansion
  The `Billion Laughs`_ attack -- also known as exponential entity expansion --
  uses multiple levels of nested entities. Each entity refers to another entity
  several times, and the final entity definition contains a small string. The
  exponential expansion eventually turns the small string into several
  gigabytes of text and consumes a lot of CPU time, too.

quadratic blowup entity expansion
  A quadratic blowup attack is similar to a `Billion Laughs`_ attack; it abuses
  entity expansion, too. Instead of nested entities it repeats one large entity
  with a couple of thousand characters over and over again. The attack isn't as
  efficient as the exponential case, but it avoids triggering countermeasures
  of parsers against heavily nested entities.

external entity expansion
  Entity declarations can contain more than just text for replacement. They can
  also point to external resources by public identifiers or system identifiers.
  System identifiers are standard URIs or can refer to local files. The XML
  parser retrieves the resource with, for example, HTTP or FTP requests and
  embeds the content into the XML document.

`DTD`_ retrieval
  Some XML libraries, such as Python's :mod:`xml.dom.pulldom`, retrieve document
  type definitions from remote or local locations. The feature has similar
  implications to the external entity expansion issue.

decompression bomb
  The issue of decompression bombs (aka `ZIP bomb`_) applies to all XML
  libraries that can parse compressed XML streams, such as gzipped HTTP streams
  or LZMA-compressed files. For an attacker, it can reduce the amount of
  transmitted data by three orders of magnitude or more.

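The size of an entity expansion can be computed without performing it, which makes the difference between the two expansion attacks above easy to see. The sketch below assumes a simplified entity model (replacement text made only of literal characters and ``&name;`` references, with no cycles) and uses hypothetical helper names. The first table is modelled on the widely cited ``lol`` ... ``lol9`` example; the second repeats one 10,000-character entity 10,000 times, with the document body modelled as a pseudo-entity named ``doc``. Nothing is fed to an XML parser.

```python
import re
from functools import lru_cache

ENTITY_REF = re.compile(r"&(\w+);")


def make_length_fn(entities):
    """Return a function giving the fully expanded length of an entity.

    `entities` maps entity names to replacement text that may reference
    other entities as &name;. Lengths are memoized, so no text is ever
    expanded. Recursive (cyclic) definitions are not supported.
    """

    @lru_cache(maxsize=None)
    def length(name):
        text = entities[name]
        total, pos = 0, 0
        for match in ENTITY_REF.finditer(text):
            total += match.start() - pos     # literal text before the ref
            total += length(match.group(1))  # expanded size of the ref
            pos = match.end()
        return total + len(text) - pos       # trailing literal text

    return length


# Exponential: each level references the previous one ten times.
laughs = {"lol": "lol"}
for level in range(1, 10):
    previous = "lol" if level == 1 else f"lol{level - 1}"
    laughs[f"lol{level}"] = f"&{previous};" * 10

# Quadratic: one large entity referenced many times from the document body.
quadratic = {"a": "x" * 10_000, "doc": "&a;" * 10_000}

laughs_len = make_length_fn(laughs)
quadratic_len = make_length_fn(quadratic)

# Characters of replacement text the attacker sends (DTD syntax not counted)
laughs_input = sum(len(text) for text in laughs.values())
quadratic_input = sum(len(text) for text in quadratic.values())

print(laughs_input, "->", laughs_len("lol9"))       # 533 -> 3000000000
print(quadratic_input, "->", quadratic_len("doc"))  # 40000 -> 100000000
```

The exponential table turns 533 characters of replacement text into three billion characters, while the quadratic one needs 40,000 characters to reach 100 million, which is why the quadratic variant is described as less efficient.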
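For the external entity expansion and DTD retrieval issues, the SAX parser that :mod:`xml.sax` and :mod:`xml.dom.pulldom` use can be told not to process external general entities through the ``feature_external_ges`` feature from :mod:`xml.sax.handler`. This is a minimal sketch with hypothetical helper names (``make_hardened_parser``, ``TextCollector``). It assumes that, as in CPython's expat-based reader, the same flag also stops an external DTD subset from being fetched; verify that on the Python version you use. The ``file:`` and ``http://example.invalid`` identifiers are placeholders.

```python
import io
import xml.sax
from xml.dom import pulldom
from xml.sax.handler import feature_external_ges


def make_hardened_parser():
    """Return a SAX parser that does not load external general entities.

    Assumption: CPython's expat-based reader consults the same flag before
    fetching an external DTD subset.
    """
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, False)
    return parser


class TextCollector(xml.sax.ContentHandler):
    """Record all character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


# External entity expansion: the reference to &ext; is skipped, not fetched.
ENTITY_DOC = b"""<!DOCTYPE r [
  <!ENTITY ext SYSTEM "file:///nonexistent/secret.txt">
]>
<r>a&ext;b</r>"""

sax_parser = make_hardened_parser()
collector = TextCollector()
sax_parser.setContentHandler(collector)
sax_parser.parse(io.BytesIO(ENTITY_DOC))
text = "".join(collector.chunks)

# DTD retrieval: pulldom accepts a custom SAX parser via its `parser` argument.
DTD_DOC = '<!DOCTYPE r SYSTEM "http://example.invalid/r.dtd"><r>x</r>'
events = pulldom.parseString(DTD_DOC, parser=make_hardened_parser())
names = [node.tagName for event, node in events if event == pulldom.START_ELEMENT]

print(repr(text))  # 'ab'
print(names)       # ['r']
```

A fresh parser is created for each document because features cannot be changed while a parser is in use.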
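A common mitigation for decompression bombs, shown here as an illustration rather than something the XML modules provide, is to cap the decompressed size before any XML parsing happens. This sketch assumes gzip-compressed input held in memory; ``MAX_XML_BYTES`` and ``parse_gzipped_xml`` are hypothetical names and the 1 MB limit is arbitrary.

```python
import gzip
import io
import xml.etree.ElementTree as ET

MAX_XML_BYTES = 1_000_000  # hypothetical limit, tune for the application


def parse_gzipped_xml(compressed, limit=MAX_XML_BYTES):
    """Decompress at most `limit` bytes of gzip data and parse them.

    Asking for limit + 1 bytes is enough to detect an oversized payload
    without inflating the whole of it.
    """
    with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as stream:
        data = stream.read(limit + 1)
    if len(data) > limit:
        raise ValueError(f"decompressed XML exceeds {limit} bytes")
    return ET.fromstring(data)


small = gzip.compress(b"<r>ok</r>")
# 10 MB of repetitive text shrinks to roughly 10 KB when compressed.
bomb = gzip.compress(b"<r>" + b"A" * 10_000_000 + b"</r>")

print(parse_gzipped_xml(small).text)  # ok
print(len(bomb))
try:
    parse_gzipped_xml(bomb)
except ValueError as exc:
    print("rejected:", exc)
```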

The documentation of `defusedxml`_ on PyPI has further information about
all known attack vectors with examples and references.

defused packages
----------------

These external packages are recommended for any code that parses
untrusted XML data.

`defusedxml`_ is a pure Python package with modified subclasses of all stdlib
XML parsers that prevent any potentially malicious operation. The
package also ships with example exploits and extended documentation on more
XML exploits such as XPath injection.

`defusedexpat`_ provides a modified libexpat and a patched replacement
:mod:`pyexpat` extension module with countermeasures against entity expansion
DoS attacks. It still allows a sane and configurable number of entity
expansions. The modifications will be merged into future releases of Python.

The workarounds and modifications are not included in patch releases as they
break backward compatibility. After all, inline DTD and entity expansion are
well-defined XML features.


.. _defusedxml: https://pypi.python.org/pypi/defusedxml/
.. _defusedexpat: https://pypi.python.org/pypi/defusedexpat/
.. _Billion Laughs: https://en.wikipedia.org/wiki/Billion_laughs
.. _ZIP bomb: https://en.wikipedia.org/wiki/Zip_bomb
.. _DTD: https://en.wikipedia.org/wiki/Document_type_definition