      1 :mod:`urllib.request` --- Extensible library for opening URLs
      2 =============================================================
      3 
      4 .. module:: urllib.request
      5    :synopsis: Extensible library for opening URLs.
      6 
.. moduleauthor:: Jeremy Hylton <jeremy@alum.mit.edu>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>
.. sectionauthor:: Senthil Kumaran <senthil@uthcode.com>
     10 
     11 **Source code:** :source:`Lib/urllib/request.py`
     12 
     13 --------------
     14 
     15 The :mod:`urllib.request` module defines functions and classes which help in
     16 opening URLs (mostly HTTP) in a complex world --- basic and digest
     17 authentication, redirections, cookies and more.
     18 
     19 .. seealso::
     20 
     21     The `Requests package <http://docs.python-requests.org/>`_
     22     is recommended for a higher-level HTTP client interface.
     23 
     24 
     25 The :mod:`urllib.request` module defines the following functions:
     26 
     27 
     28 .. function:: urlopen(url, data=None[, timeout], *, cafile=None, capath=None, cadefault=False, context=None)
     29 
     30    Open the URL *url*, which can be either a string or a
     31    :class:`Request` object.
     32 
     33    *data* must be an object specifying additional data to be sent to the
     34    server, or ``None`` if no such data is needed.  See :class:`Request`
     35    for details.
     36 
   The :mod:`urllib.request` module uses HTTP/1.1 and includes a
   ``Connection: close`` header in its HTTP requests.
     39 
   The optional *timeout* parameter specifies a timeout in seconds for
   blocking operations like the connection attempt (if not specified,
   the global default timeout setting will be used).  The timeout
   only applies to HTTP, HTTPS and FTP connections.
     44 
     45    If *context* is specified, it must be a :class:`ssl.SSLContext` instance
     46    describing the various SSL options. See :class:`~http.client.HTTPSConnection`
     47    for more details.
     48 
     49    The optional *cafile* and *capath* parameters specify a set of trusted
     50    CA certificates for HTTPS requests.  *cafile* should point to a single
     51    file containing a bundle of CA certificates, whereas *capath* should
     52    point to a directory of hashed certificate files.  More information can
     53    be found in :meth:`ssl.SSLContext.load_verify_locations`.
     54 
     55    The *cadefault* parameter is ignored.
     56 
     57    This function always returns an object which can work as a
     58    :term:`context manager` and has methods such as
     59 
     60    * :meth:`~urllib.response.addinfourl.geturl` --- return the URL of the resource retrieved,
     61      commonly used to determine if a redirect was followed
     62 
   * :meth:`~urllib.response.addinfourl.info` --- return the meta-information of the page, such as headers,
     in the form of the object returned by :func:`email.message_from_string` (see
     `Quick Reference to HTTP Headers <http://jkorpela.fi/http.html>`_)
     66 
     67    * :meth:`~urllib.response.addinfourl.getcode` -- return the HTTP status code of the response.
     68 
   For HTTP and HTTPS URLs, this function returns a slightly modified
   :class:`http.client.HTTPResponse` object.  In addition to the three
   methods above, the ``msg`` attribute contains the same information as
   the :attr:`~http.client.HTTPResponse.reason` attribute (the reason
   phrase returned by the server) rather than the response headers
   described in the documentation for :class:`~http.client.HTTPResponse`.
     76 
     77    For FTP, file, and data URLs and requests explicitly handled by legacy
     78    :class:`URLopener` and :class:`FancyURLopener` classes, this function
     79    returns a :class:`urllib.response.addinfourl` object.
     80 
     81    Raises :exc:`~urllib.error.URLError` on protocol errors.
     82 
     83    Note that ``None`` may be returned if no handler handles the request (though
     84    the default installed global :class:`OpenerDirector` uses
     85    :class:`UnknownHandler` to ensure this never happens).
     86 
   In addition, if proxy settings are detected (for example, when a ``*_proxy``
   environment variable like :envvar:`http_proxy` is set),
   :class:`ProxyHandler` is installed by default and makes sure the requests
   are handled through the proxy.
     91 
     92    The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been
     93    discontinued; :func:`urllib.request.urlopen` corresponds to the old
     94    ``urllib2.urlopen``.  Proxy handling, which was done by passing a dictionary
     95    parameter to ``urllib.urlopen``, can be obtained by using
     96    :class:`ProxyHandler` objects.
     97 
     98    .. versionchanged:: 3.2
     99       *cafile* and *capath* were added.
    100 
    101    .. versionchanged:: 3.2
    102       HTTPS virtual hosts are now supported if possible (that is, if
    103       :data:`ssl.HAS_SNI` is true).
    104 
    105    .. versionadded:: 3.2
    106       *data* can be an iterable object.
    107 
    108    .. versionchanged:: 3.3
    109       *cadefault* was added.
    110 
    111    .. versionchanged:: 3.4.3
    112       *context* was added.
    113 
    114    .. deprecated:: 3.6
    115 
    116        *cafile*, *capath* and *cadefault* are deprecated in favor of *context*.
    117        Please use :meth:`ssl.SSLContext.load_cert_chain` instead, or let
    118        :func:`ssl.create_default_context` select the system's trusted CA
    119        certificates for you.
    120 
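The behaviour described above can be sketched without network access by opening a ``data:`` URL. This is only an illustration: the URL, its payload and the made-up ``nosuchscheme`` scheme are assumptions chosen so the example runs offline, and an HTTP URL would return an :class:`http.client.HTTPResponse` instead.

```python
import urllib.error
import urllib.request

# A data: URL keeps the example self-contained (no network needed).
url = "data:text/plain;charset=utf-8,hello%20world"

# The returned object works as a context manager and is closed on exit.
with urllib.request.urlopen(url, timeout=10) as response:
    body = response.read()
    final_url = response.geturl()
    content_type = response.info()["Content-Type"]

print(body, final_url, content_type)

# With the default opener, an unknown scheme is reported as URLError.
try:
    urllib.request.urlopen("nosuchscheme://example.invalid/")
except urllib.error.URLError as exc:
    error_reason = str(exc.reason)
    print("failed:", error_reason)
```

Catching :exc:`~urllib.error.URLError` around the call is usually enough to cover both connection problems and unsupported URL types.
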
    121 .. function:: install_opener(opener)
    122 
    123    Install an :class:`OpenerDirector` instance as the default global opener.
    124    Installing an opener is only necessary if you want urlopen to use that
    125    opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
    126    :func:`~urllib.request.urlopen`.  The code does not check for a real
    127    :class:`OpenerDirector`, and any class with the appropriate interface will
    128    work.
    129 
    130 
    131 .. function:: build_opener([handler, ...])
    132 
    133    Return an :class:`OpenerDirector` instance, which chains the handlers in the
    134    order given. *handler*\s can be either instances of :class:`BaseHandler`, or
    135    subclasses of :class:`BaseHandler` (in which case it must be possible to call
    136    the constructor without any parameters).  Instances of the following classes
    137    will be in front of the *handler*\s, unless the *handler*\s contain them,
    138    instances of them or subclasses of them: :class:`ProxyHandler` (if proxy
    139    settings are detected), :class:`UnknownHandler`, :class:`HTTPHandler`,
    140    :class:`HTTPDefaultErrorHandler`, :class:`HTTPRedirectHandler`,
    141    :class:`FTPHandler`, :class:`FileHandler`, :class:`HTTPErrorProcessor`.
    142 
    143    If the Python installation has SSL support (i.e., if the :mod:`ssl` module
    144    can be imported), :class:`HTTPSHandler` will also be added.
    145 
    146    A :class:`BaseHandler` subclass may also change its :attr:`handler_order`
    147    attribute to modify its position in the handlers list.
    148 
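As a small sketch of :func:`build_opener` and :func:`install_opener`, the example below passes one handler as an instance and one as a class. The choice of handlers and the ``data:`` URLs are assumptions made so that the code runs offline.

```python
import urllib.request

# Handlers may be given as instances or as classes; a class is
# instantiated by build_opener() without arguments.
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),        # empty mapping: no proxy autodetection
    urllib.request.HTTPCookieProcessor,     # passed as a class
)

# The opener can be used directly...
with opener.open("data:,ok") as response:
    body = response.read()

# ...or installed so that plain urlopen() calls go through it.
urllib.request.install_opener(opener)
with urllib.request.urlopen("data:,again") as response:
    body_again = response.read()

print(body, body_again)
```

Calling ``opener.open()`` directly avoids changing global state, which is often preferable in library code.
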
    149 
    150 .. function:: pathname2url(path)
    151 
    152    Convert the pathname *path* from the local syntax for a path to the form used in
    153    the path component of a URL.  This does not produce a complete URL.  The return
    154    value will already be quoted using the :func:`~urllib.parse.quote` function.
    155 
    156 
    157 .. function:: url2pathname(path)
    158 
    159    Convert the path component *path* from a percent-encoded URL to the local syntax for a
    160    path.  This does not accept a complete URL.  This function uses
    161    :func:`~urllib.parse.unquote` to decode *path*.
    162 
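The two conversion functions are inverses of each other for ordinary paths. The following sketch uses a made-up relative path; the exact URL form depends on the platform, so only the round trip and the quoting of the space are relied upon here.

```python
import os.path
from urllib.request import pathname2url, url2pathname

# A relative path in the local syntax; the names are hypothetical.
local_path = os.path.join("reports", "annual summary.txt")

url_path = pathname2url(local_path)    # quoted: the space becomes %20
round_trip = url2pathname(url_path)    # back to the local syntax

print(url_path)
print(round_trip)
```
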
    163 .. function:: getproxies()
    164 
   This helper function returns a dictionary of scheme to proxy server URL
   mappings.  On all operating systems it first scans the environment for
   variables named ``<scheme>_proxy``, matching the name case-insensitively.
   If none are found, it looks for proxy information in the Mac OS X System
   Configuration on Mac OS X and in the Windows registry on Windows.
   If both lowercase and uppercase environment variables exist (and disagree),
   lowercase is preferred.
    172 
    173    .. note::
    174 
    175       If the environment variable ``REQUEST_METHOD`` is set, which usually
    176       indicates your script is running in a CGI environment, the environment
    177       variable ``HTTP_PROXY`` (uppercase ``_PROXY``) will be ignored. This is
    178       because that variable can be injected by a client using the "Proxy:" HTTP
    179       header. If you need to use an HTTP proxy in a CGI environment, either use
    180       ``ProxyHandler`` explicitly, or make sure the variable name is in
    181       lowercase (or at least the ``_proxy`` suffix).
    182 
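The lookup order can be observed by setting the environment variables temporarily. This is a sketch only: both proxy addresses are hypothetical, and the environment is restored afterwards.

```python
import os
from urllib.request import getproxies

# Remember the current values so the environment can be restored.
saved = {name: os.environ.get(name) for name in ("HTTP_PROXY", "http_proxy")}

# Hypothetical proxy addresses; the lowercase variable is set last.
os.environ["HTTP_PROXY"] = "http://upper.proxy.example:3128"
os.environ["http_proxy"] = "http://lower.proxy.example:3128"
try:
    proxies = getproxies()
finally:
    for name, value in saved.items():
        if value is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = value

# The lowercase variable takes precedence when both are present.
print(proxies.get("http"))
```
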
    183 
    184 The following classes are provided:
    185 
    186 .. class:: Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)
    187 
    188    This class is an abstraction of a URL request.
    189 
    190    *url* should be a string containing a valid URL.
    191 
    192    *data* must be an object specifying additional data to send to the
    193    server, or ``None`` if no such data is needed.  Currently HTTP
    194    requests are the only ones that use *data*.  The supported object
   types include bytes, file-like objects, and iterables.  If neither a
   ``Content-Length`` nor a ``Transfer-Encoding`` header field
    197    has been provided, :class:`HTTPHandler` will set these headers according
    198    to the type of *data*.  ``Content-Length`` will be used to send
    199    bytes objects, while ``Transfer-Encoding: chunked`` as specified in
    200    :rfc:`7230`, Section 3.3.1 will be used to send files and other iterables.
    201 
    202    For an HTTP POST request method, *data* should be a buffer in the
    203    standard :mimetype:`application/x-www-form-urlencoded` format.  The
    204    :func:`urllib.parse.urlencode` function takes a mapping or sequence
    205    of 2-tuples and returns an ASCII string in this format. It should
    206    be encoded to bytes before being used as the *data* parameter.
    207 
    208    *headers* should be a dictionary, and will be treated as if
    209    :meth:`add_header` was called with each key and value as arguments.
    210    This is often used to "spoof" the ``User-Agent`` header value, which is
    211    used by a browser to identify itself -- some HTTP servers only
    212    allow requests coming from common browsers as opposed to scripts.
    213    For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
    214    (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while
    215    :mod:`urllib`'s default user agent string is
    216    ``"Python-urllib/2.6"`` (on Python 2.6).
    217 
    218    An appropriate ``Content-Type`` header should be included if the *data*
    219    argument is present.  If this header has not been provided and *data*
    220    is not None, ``Content-Type: application/x-www-form-urlencoded`` will
    221    be added as a default.
    222 
    223    The final two arguments are only of interest for correct handling
    224    of third-party HTTP cookies:
    225 
    226    *origin_req_host* should be the request-host of the origin
    227    transaction, as defined by :rfc:`2965`.  It defaults to
    228    ``http.cookiejar.request_host(self)``.  This is the host name or IP
    229    address of the original request that was initiated by the user.
    230    For example, if the request is for an image in an HTML document,
    231    this should be the request-host of the request for the page
    232    containing the image.
    233 
    234    *unverifiable* should indicate whether the request is unverifiable,
    235    as defined by :rfc:`2965`.  It defaults to ``False``.  An unverifiable
    236    request is one whose URL the user did not have the option to
    237    approve.  For example, if the request is for an image in an HTML
    238    document, and the user had no option to approve the automatic
    239    fetching of the image, this should be true.
    240 
    241    *method* should be a string that indicates the HTTP request method that
    242    will be used (e.g. ``'HEAD'``).  If provided, its value is stored in the
    243    :attr:`~Request.method` attribute and is used by :meth:`get_method()`.
    244    The default is ``'GET'`` if *data* is ``None`` or ``'POST'`` otherwise.
    245    Subclasses may indicate a different default method by setting the
    246    :attr:`~Request.method` attribute in the class itself.
    247 
    248    .. note::
    249       The request will not work as expected if the data object is unable
    250       to deliver its content more than once (e.g. a file or an iterable
    251       that can produce the content only once) and the request is retried
    252       for HTTP redirects or authentication.  The *data* is sent to the
    253       HTTP server right away after the headers.  There is no support for
    254       a 100-continue expectation in the library.
    255 
   .. versionchanged:: 3.3
      The *method* argument was added to the :class:`Request` class.
    258 
    259    .. versionchanged:: 3.4
    260       Default :attr:`Request.method` may be indicated at the class level.
    261 
   .. versionchanged:: 3.6
      No longer raises an error if ``Content-Length`` has not been
      provided and *data* is neither ``None`` nor a bytes object.
      Falls back to chunked transfer encoding instead.
    266 
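A minimal sketch of building a POST request as described above follows. The URL, the form fields and the ``User-Agent`` string are placeholders, and nothing is sent over the network; the request object is only constructed and inspected.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Form fields are encoded as application/x-www-form-urlencoded text,
# then converted to bytes before being used as *data*.
form = {"q": "python docs", "page": 2}
payload = urlencode(form).encode("ascii")

request = Request(
    "http://www.example.com/search",                 # placeholder URL
    data=payload,
    headers={"User-Agent": "example-client/0.1"},    # made-up agent string
)

print(request.data)
print(request.get_method())       # POST, because data is not None

# An explicit method overrides the GET/POST default.
head_request = Request("http://www.example.com/", method="HEAD")
print(head_request.get_method())  # HEAD
```

Passing the resulting object to :func:`urlopen` would perform the actual request.
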
    267 .. class:: OpenerDirector()
    268 
    269    The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
    270    together. It manages the chaining of handlers, and recovery from errors.
    271 
    272 
    273 .. class:: BaseHandler()
    274 
    275    This is the base class for all registered handlers --- and handles only the
    276    simple mechanics of registration.
    277 
    278 
    279 .. class:: HTTPDefaultErrorHandler()
    280 
    281    A class which defines a default handler for HTTP error responses; all responses
    282    are turned into :exc:`~urllib.error.HTTPError` exceptions.
    283 
    284 
    285 .. class:: HTTPRedirectHandler()
    286 
    287    A class to handle redirections.
    288 
    289 
    290 .. class:: HTTPCookieProcessor(cookiejar=None)
    291 
    292    A class to handle HTTP Cookies.
    293 
    294 
    295 .. class:: ProxyHandler(proxies=None)
    296 
    297    Cause requests to go through a proxy. If *proxies* is given, it must be a
    298    dictionary mapping protocol names to URLs of proxies. The default is to read
    299    the list of proxies from the environment variables
    300    ``<protocol>_proxy``.  If no proxy environment variables are set, then
    301    in a Windows environment proxy settings are obtained from the registry's
    302    Internet Settings section, and in a Mac OS X environment proxy information
    303    is retrieved from the OS X System Configuration Framework.
    304 
   To disable autodetected proxies, pass an empty dictionary.
    306 
    307    The :envvar:`no_proxy` environment variable can be used to specify hosts
    308    which shouldn't be reached via proxy; if set, it should be a comma-separated
    309    list of hostname suffixes, optionally with ``:port`` appended, for example
    310    ``cern.ch,ncsa.uiuc.edu,some.host:8080``.
    311 
   .. note::

      ``HTTP_PROXY`` will be ignored if a variable ``REQUEST_METHOD`` is set;
      see the documentation on :func:`~urllib.request.getproxies`.
    316 
    317 
    318 .. class:: HTTPPasswordMgr()
    319 
    320    Keep a database of  ``(realm, uri) -> (user, password)`` mappings.
    321 
    322 
    323 .. class:: HTTPPasswordMgrWithDefaultRealm()
    324 
    325    Keep a database of  ``(realm, uri) -> (user, password)`` mappings. A realm of
    326    ``None`` is considered a catch-all realm, which is searched if no other realm
    327    fits.
    328 
    329 
    330 .. class:: HTTPPasswordMgrWithPriorAuth()
    331 
    332    A variant of :class:`HTTPPasswordMgrWithDefaultRealm` that also has a
    333    database of ``uri -> is_authenticated`` mappings.  Can be used by a
    334    BasicAuth handler to determine when to send authentication credentials
    335    immediately instead of waiting for a ``401`` response first.
    336 
    337    .. versionadded:: 3.5
    338 
    339 
    340 .. class:: AbstractBasicAuthHandler(password_mgr=None)
    341 
    342    This is a mixin class that helps with HTTP authentication, both to the remote
    343    host and to a proxy. *password_mgr*, if given, should be something that is
    344    compatible with :class:`HTTPPasswordMgr`; refer to section
    345    :ref:`http-password-mgr` for information on the interface that must be
   supported.  If *password_mgr* also provides ``is_authenticated`` and
   ``update_authenticated`` methods (see
   :ref:`http-password-mgr-with-prior-auth`), then the handler will use the
   ``is_authenticated`` result for a given URI to determine whether or not to
   send authentication credentials with the request.  If ``is_authenticated``
   returns ``True`` for the URI, credentials are sent.  If ``is_authenticated``
   returns ``False``, credentials are not sent, and then if a ``401`` response
   is received the request is re-sent with the authentication credentials.  If
   authentication succeeds, ``update_authenticated`` is called to set
   ``is_authenticated`` to ``True`` for the URI, so that subsequent requests to
   the URI or any of its super-URIs will automatically include the
   authentication credentials.
    358 
    359    .. versionadded:: 3.5
    360       Added ``is_authenticated`` support.
    361 
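The ``is_authenticated`` mechanism can be sketched with :class:`HTTPPasswordMgrWithPriorAuth`. The URI and credentials below are placeholders, and no request is sent; the example only shows how the password manager answers for a URI below the registered one.

```python
from urllib.request import (
    HTTPBasicAuthHandler,
    HTTPPasswordMgrWithPriorAuth,
    build_opener,
)

api_root = "http://www.example.com/api/"   # placeholder URI

password_mgr = HTTPPasswordMgrWithPriorAuth()
# is_authenticated=True asks the handler to send credentials up front
# instead of waiting for a 401 response.
password_mgr.add_password(None, api_root, "alice", "s3cret",
                          is_authenticated=True)

# Requests made through this opener would consult the password manager.
opener = build_opener(HTTPBasicAuthHandler(password_mgr))

preauth = password_mgr.is_authenticated(api_root + "items")
print(preauth)
```
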
    362 
    363 .. class:: HTTPBasicAuthHandler(password_mgr=None)
    364 
    365    Handle authentication with the remote host. *password_mgr*, if given, should
    366    be something that is compatible with :class:`HTTPPasswordMgr`; refer to
    367    section :ref:`http-password-mgr` for information on the interface that must
   be supported.  :class:`HTTPBasicAuthHandler` raises a :exc:`ValueError` when
   presented with an unsupported authentication scheme.
    370 
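A common way to wire up basic authentication is sketched below, assuming a password manager with a catch-all realm. The host names and credentials are made up, and the opener is built but not used to contact a server.

```python
from urllib.request import (
    HTTPBasicAuthHandler,
    HTTPPasswordMgrWithDefaultRealm,
    build_opener,
)

password_mgr = HTTPPasswordMgrWithDefaultRealm()
# A realm of None acts as the catch-all realm for this URI prefix.
password_mgr.add_password(None, "http://www.example.com/private/",
                          "alice", "s3cret")

opener = build_opener(HTTPBasicAuthHandler(password_mgr))

# The manager can be queried directly to see which credentials apply.
match = password_mgr.find_user_password(
    "Any Realm", "http://www.example.com/private/report.html")
no_match = password_mgr.find_user_password(
    "Any Realm", "http://other.example.com/")
print(match, no_match)
```
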
    371 
    372 .. class:: ProxyBasicAuthHandler(password_mgr=None)
    373 
    374    Handle authentication with the proxy. *password_mgr*, if given, should be
    375    something that is compatible with :class:`HTTPPasswordMgr`; refer to section
    376    :ref:`http-password-mgr` for information on the interface that must be
    377    supported.
    378 
    379 
    380 .. class:: AbstractDigestAuthHandler(password_mgr=None)
    381 
    382    This is a mixin class that helps with HTTP authentication, both to the remote
    383    host and to a proxy. *password_mgr*, if given, should be something that is
    384    compatible with :class:`HTTPPasswordMgr`; refer to section
    385    :ref:`http-password-mgr` for information on the interface that must be
    386    supported.
    387 
    388 
    389 .. class:: HTTPDigestAuthHandler(password_mgr=None)
    390 
    391    Handle authentication with the remote host. *password_mgr*, if given, should
    392    be something that is compatible with :class:`HTTPPasswordMgr`; refer to
    393    section :ref:`http-password-mgr` for information on the interface that must
   be supported.  When both the digest and the basic authentication handlers
   are added, digest authentication is always tried first.  If digest
   authentication returns a 40x response again, the response is passed to the
   basic authentication handler.  This handler raises a
   :exc:`ValueError` when presented with an authentication scheme other than
   Digest or Basic.
    400 
    401    .. versionchanged:: 3.3
    402       Raise :exc:`ValueError` on unsupported Authentication Scheme.
    403 
    404 
    405 
    406 .. class:: ProxyDigestAuthHandler(password_mgr=None)
    407 
    408    Handle authentication with the proxy. *password_mgr*, if given, should be
    409    something that is compatible with :class:`HTTPPasswordMgr`; refer to section
    410    :ref:`http-password-mgr` for information on the interface that must be
    411    supported.
    412 
    413 
    414 .. class:: HTTPHandler()
    415 
    416    A class to handle opening of HTTP URLs.
    417 
    418 
    419 .. class:: HTTPSHandler(debuglevel=0, context=None, check_hostname=None)
    420 
    421    A class to handle opening of HTTPS URLs.  *context* and *check_hostname*
    422    have the same meaning as in :class:`http.client.HTTPSConnection`.
    423 
    424    .. versionchanged:: 3.2
    425       *context* and *check_hostname* were added.
    426 
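One possible way to supply a *context* is sketched here, assuming the interpreter was built with SSL support. The commented-out URL is a placeholder and would need network access.

```python
import ssl
from urllib.request import HTTPSHandler, build_opener

# The default context verifies certificates and host names using the
# system's trusted CA certificates.
context = ssl.create_default_context()

# Pass the context to the handler, then build an opener around it.
opener = build_opener(HTTPSHandler(context=context))

print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
# response = opener.open("https://www.example.com/")  # needs network access
```
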
    427 
    428 .. class:: FileHandler()
    429 
    430    Open local files.
    431 
    432 .. class:: DataHandler()
    433 
    434    Open data URLs.
    435 
    436    .. versionadded:: 3.4
    437 
    438 .. class:: FTPHandler()
    439 
    440    Open FTP URLs.
    441 
    442 
    443 .. class:: CacheFTPHandler()
    444 
    445    Open FTP URLs, keeping a cache of open FTP connections to minimize delays.
    446 
    447 
    448 .. class:: UnknownHandler()
    449 
    450    A catch-all class to handle unknown URLs.
    451 
    452 
    453 .. class:: HTTPErrorProcessor()
    454 
    455    Process HTTP error responses.
    456 
    457 
    458 .. _request-objects:
    459 
    460 Request Objects
    461 ---------------
    462 
    463 The following methods describe :class:`Request`'s public interface,
    464 and so all may be overridden in subclasses.  It also defines several
    465 public attributes that can be used by clients to inspect the parsed
    466 request.
    467 
    468 .. attribute:: Request.full_url
    469 
    470    The original URL passed to the constructor.
    471 
   .. versionchanged:: 3.4
      :attr:`Request.full_url` is a property with a setter, a getter and a
      deleter.  Getting :attr:`~Request.full_url` returns the original request
      URL with the fragment, if it was present.
    477 
    478 .. attribute:: Request.type
    479 
    480    The URI scheme.
    481 
    482 .. attribute:: Request.host
    483 
    484    The URI authority, typically a host, but may also contain a port
    485    separated by a colon.
    486 
    487 .. attribute:: Request.origin_req_host
    488 
    489    The original host for the request, without port.
    490 
    491 .. attribute:: Request.selector
    492 
    493    The URI path.  If the :class:`Request` uses a proxy, then selector
    494    will be the full URL that is passed to the proxy.
    495 
    496 .. attribute:: Request.data
    497 
    498    The entity body for the request, or ``None`` if not specified.
    499 
    500    .. versionchanged:: 3.4
    501       Changing value of :attr:`Request.data` now deletes "Content-Length"
    502       header if it was previously set or calculated.
    503 
    504 .. attribute:: Request.unverifiable
    505 
   A boolean indicating whether the request is unverifiable as defined
   by :rfc:`2965`.
    508 
    509 .. attribute:: Request.method
    510 
    511    The HTTP request method to use.  By default its value is :const:`None`,
    512    which means that :meth:`~Request.get_method` will do its normal computation
    513    of the method to be used.  Its value can be set (thus overriding the default
    514    computation in :meth:`~Request.get_method`) either by providing a default
    515    value by setting it at the class level in a :class:`Request` subclass, or by
    516    passing a value in to the :class:`Request` constructor via the *method*
    517    argument.
    518 
    519    .. versionadded:: 3.3
    520 
    521    .. versionchanged:: 3.4
    522       A default value can now be set in subclasses; previously it could only
    523       be set via the constructor argument.
    524 
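The attributes above can be inspected on a freshly constructed request. The URL is a placeholder chosen to exercise the port, query string and fragment.

```python
from urllib.request import Request

request = Request("http://www.example.com:8080/docs/page.html?lang=en#intro")

print(request.full_url)         # http://www.example.com:8080/docs/page.html?lang=en#intro
print(request.type)             # http
print(request.host)             # www.example.com:8080
print(request.origin_req_host)  # www.example.com
print(request.selector)         # /docs/page.html?lang=en
print(request.data)             # None
print(request.unverifiable)     # False
```
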
    525 
    526 .. method:: Request.get_method()
    527 
    528    Return a string indicating the HTTP request method.  If
    529    :attr:`Request.method` is not ``None``, return its value, otherwise return
    530    ``'GET'`` if :attr:`Request.data` is ``None``, or ``'POST'`` if it's not.
    531    This is only meaningful for HTTP requests.
    532 
    533    .. versionchanged:: 3.3
    534       get_method now looks at the value of :attr:`Request.method`.
    535 
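The rules for choosing the method can be illustrated as follows. ``PutRequest`` is a hypothetical subclass introduced only for this example, and the URL is a placeholder.

```python
from urllib.request import Request

class PutRequest(Request):
    """Hypothetical subclass whose default HTTP method is PUT."""
    method = "PUT"

url = "http://www.example.com/resource"   # placeholder URL

print(Request(url).get_method())                     # GET
print(Request(url, data=b"x=1").get_method())        # POST
print(PutRequest(url, data=b"x=1").get_method())     # PUT
print(PutRequest(url, method="PATCH").get_method())  # PATCH
```
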
    536 
    537 .. method:: Request.add_header(key, val)
    538 
    539    Add another header to the request.  Headers are currently ignored by all
    540    handlers except HTTP handlers, where they are added to the list of headers sent
    541    to the server.  Note that there cannot be more than one header with the same
    542    name, and later calls will overwrite previous calls in case the *key* collides.
    543    Currently, this is no loss of HTTP functionality, since all headers which have
    544    meaning when used more than once have a (header-specific) way of gaining the
    545    same functionality using only one header.
    546 
    547 
    548 .. method:: Request.add_unredirected_header(key, header)
    549 
    550    Add a header that will not be added to a redirected request.
    551 
    552 
    553 .. method:: Request.has_header(header)
    554 
    555    Return whether the instance has the named header (checks both regular and
    556    unredirected).
    557 
    558 
    559 .. method:: Request.remove_header(header)
    560 
    561    Remove named header from the request instance (both from regular and
    562    unredirected headers).
    563 
    564    .. versionadded:: 3.4
    565 
    566 
    567 .. method:: Request.get_full_url()
    568 
    569    Return the URL given in the constructor.
    570 
   .. versionchanged:: 3.4
      Returns :attr:`Request.full_url`.
    574 
    575 
    576 .. method:: Request.set_proxy(host, type)
    577 
    578    Prepare the request by connecting to a proxy server. The *host* and *type* will
    579    replace those of the instance, and the instance's selector will be the original
    580    URL given in the constructor.
    581 
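The effect of :meth:`Request.set_proxy` on a plain HTTP request can be observed directly on the object. The proxy address below is made up, and no connection is opened by this sketch.

```python
from urllib.request import Request

request = Request("http://www.example.com/index.html")
before = (request.host, request.selector)

# "proxy.example:3128" is a hypothetical proxy address.
request.set_proxy("proxy.example:3128", "http")
after = (request.host, request.selector)

print(before)   # ('www.example.com', '/index.html')
print(after)    # ('proxy.example:3128', 'http://www.example.com/index.html')
```
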
    582 
    583 .. method:: Request.get_header(header_name, default=None)
    584 
    585    Return the value of the given header. If the header is not present, return
    586    the default value.
    587 
    588 
    589 .. method:: Request.header_items()
    590 
   Return a list of ``(header_name, header_value)`` tuples for the Request headers.
    592 
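The header methods above can be exercised together in a short sketch. The ``X-Trace-Id`` header is invented for the example. Note the spelling used for lookups: CPython stores header names through ``str.capitalize()``, which is an implementation detail rather than a documented guarantee.

```python
from urllib.request import Request

request = Request("http://www.example.com/")   # placeholder URL

request.add_header("Accept-Language", "en")
request.add_header("Accept-Language", "fr")              # replaces the earlier value
request.add_unredirected_header("X-Trace-Id", "abc123")  # made-up header

# Lookups use the stored spelling ("Accept-language").
language = request.get_header("Accept-language")
missing = request.get_header("X-missing", "fallback")
had_trace = request.has_header("X-trace-id")

request.remove_header("X-trace-id")
has_trace_now = request.has_header("X-trace-id")

print(language, missing, had_trace, has_trace_now)
print(request.header_items())
```
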
    593 .. versionchanged:: 3.4
    594    The request methods add_data, has_data, get_data, get_type, get_host,
    595    get_selector, get_origin_req_host and is_unverifiable that were deprecated
    596    since 3.3 have been removed.
    597 
    598 
    599 .. _opener-director-objects:
    600 
    601 OpenerDirector Objects
    602 ----------------------
    603 
    604 :class:`OpenerDirector` instances have the following methods:
    605 
    606 
    607 .. method:: OpenerDirector.add_handler(handler)
    608 
    609    *handler* should be an instance of :class:`BaseHandler`.  The following methods
    610    are searched, and added to the possible chains (note that HTTP errors are a
    611    special case).
    612 
    613    * :meth:`protocol_open` --- signal that the handler knows how to open *protocol*
    614      URLs.
    615 
    616    * :meth:`http_error_type` --- signal that the handler knows how to handle HTTP
    617      errors with HTTP error code *type*.
    618 
    619    * :meth:`protocol_error` --- signal that the handler knows how to handle errors
    620      from (non-\ ``http``) *protocol*.
    621 
    622    * :meth:`protocol_request` --- signal that the handler knows how to pre-process
    623      *protocol* requests.
    624 
    625    * :meth:`protocol_response` --- signal that the handler knows how to
    626      post-process *protocol* responses.
    627 
    628 
    629 .. method:: OpenerDirector.open(url, data=None[, timeout])
    630 
    631    Open the given *url* (which can be a request object or a string), optionally
    632    passing the given *data*. Arguments, return values and exceptions raised are
    633    the same as those of :func:`urlopen` (which simply calls the :meth:`open`
    634    method on the currently installed global :class:`OpenerDirector`).  The
    635    optional *timeout* parameter specifies a timeout in seconds for blocking
    636    operations like the connection attempt (if not specified, the global default
   timeout setting will be used).  The timeout only applies to
   HTTP, HTTPS and FTP connections.
    639 
    640 
    641 .. method:: OpenerDirector.error(proto, *args)
    642 
    643    Handle an error of the given protocol.  This will call the registered error
    644    handlers for the given protocol with the given arguments (which are protocol
    645    specific).  The HTTP protocol is a special case which uses the HTTP response
    646    code to determine the specific error handler; refer to the :meth:`http_error_\*`
    647    methods of the handler classes.
    648 
    649    Return values and exceptions raised are the same as those of :func:`urlopen`.
    650 
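The dispatch on the HTTP response code can be sketched by calling :meth:`OpenerDirector.error` by hand. Normally the library calls it after reading a real response; the direct call, the ``NotFoundFallback`` class and the URL are assumptions made so the example needs no network.

```python
import email
import io
from urllib.error import HTTPError
from urllib.request import (
    BaseHandler,
    HTTPDefaultErrorHandler,
    OpenerDirector,
    Request,
)
from urllib.response import addinfourl

class NotFoundFallback(BaseHandler):
    """Hypothetical handler that turns a 404 into a canned response."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        return addinfourl(io.BytesIO(b"fallback page"), hdrs, req.full_url, code)

opener = OpenerDirector()
opener.add_handler(NotFoundFallback())
opener.add_handler(HTTPDefaultErrorHandler())

request = Request("http://www.example.com/missing")   # placeholder URL
headers = email.message_from_string("")               # empty header block

# Code 404 is routed to http_error_404().
response = opener.error("http", request, io.BytesIO(b""), 404, "Not Found", headers)
fallback_body = response.read()

# No handler claims 500, so http_error_default() raises HTTPError.
try:
    opener.error("http", request, io.BytesIO(b""), 500, "Server Error", headers)
except HTTPError as exc:
    raised_code = exc.code

print(fallback_body, raised_code)
```
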
:class:`OpenerDirector` objects open URLs in three stages.  Within each
stage, the order in which the handler methods are called is determined by
sorting the handler instances.
    655 
    656 #. Every handler with a method named like :meth:`protocol_request` has that
    657    method called to pre-process the request.
    658 
#. Handlers with a method named like :meth:`protocol_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually
   :exc:`~urllib.error.URLError`).  Exceptions are allowed to propagate.
    663 
    664    In fact, the above algorithm is first tried for methods named
    665    :meth:`default_open`.  If all such methods return :const:`None`, the algorithm
    666    is repeated for methods named like :meth:`protocol_open`.  If all such methods
    667    return :const:`None`, the algorithm is repeated for methods named
    668    :meth:`unknown_open`.
    669 
    670    Note that the implementation of these methods may involve calls of the parent
    671    :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and
    672    :meth:`~OpenerDirector.error` methods.
    673 
    674 #. Every handler with a method named like :meth:`protocol_response` has that
    675    method called to post-process the response.
    676 
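The three stages can be traced with two small handlers for an invented ``demo`` scheme. The scheme, the class names and the canned response are all hypothetical; the point is only the order in which the methods are called.

```python
import email
import io
from urllib.request import BaseHandler, OpenerDirector
from urllib.response import addinfourl

calls = []   # records the order in which the methods run

class TraceProcessor(BaseHandler):
    """Pre- and post-processes requests for the made-up 'demo' scheme."""

    def demo_request(self, req):
        calls.append("demo_request")
        return req

    def demo_response(self, req, response):
        calls.append("demo_response")
        return response

class DemoHandler(BaseHandler):
    """Opens 'demo' URLs by returning a canned in-memory response."""

    def demo_open(self, req):
        calls.append("demo_open")
        headers = email.message_from_string("")
        return addinfourl(io.BytesIO(b"payload"), headers, req.full_url)

opener = OpenerDirector()
opener.add_handler(TraceProcessor())
opener.add_handler(DemoHandler())

with opener.open("demo://host/resource") as response:
    body = response.read()

print(calls)   # ['demo_request', 'demo_open', 'demo_response']
```
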
    677 
    678 .. _base-handler-objects:
    679 
    680 BaseHandler Objects
    681 -------------------
    682 
    683 :class:`BaseHandler` objects provide a couple of methods that are directly
    684 useful, and others that are meant to be used by derived classes.  These are
    685 intended for direct use:
    686 
    687 
    688 .. method:: BaseHandler.add_parent(director)
    689 
    690    Add a director as parent.
    691 
    692 
    693 .. method:: BaseHandler.close()
    694 
    695    Remove any parents.
    696 
    697 The following attribute and methods should only be used by classes derived from
    698 :class:`BaseHandler`.
    699 
    700 .. note::
    701 
    702    The convention has been adopted that subclasses defining
    703    :meth:`protocol_request` or :meth:`protocol_response` methods are named
    704    :class:`\*Processor`; all others are named :class:`\*Handler`.
    705 
    706 
    707 .. attribute:: BaseHandler.parent
    708 
    709    A valid :class:`OpenerDirector`, which can be used to open using a different
    710    protocol, or handle errors.
    711 
    712 
    713 .. method:: BaseHandler.default_open(req)
    714 
    715    This method is *not* defined in :class:`BaseHandler`, but subclasses should
    716    define it if they want to catch all URLs.
    717 
    718    This method, if implemented, will be called by the parent
    719    :class:`OpenerDirector`.  It should return a file-like object as described in
    720    the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
    721    It should raise :exc:`~urllib.error.URLError`, unless a truly exceptional
    722    thing happens (for example, :exc:`MemoryError` should not be mapped to
    723    :exc:`URLError`).
    724 
    725    This method will be called before any protocol-specific open method.
    726 
    727 
    728 .. method:: BaseHandler.protocol_open(req)
    729    :noindex:
    730 
    731    This method is *not* defined in :class:`BaseHandler`, but subclasses should
    732    define it if they want to handle URLs with the given protocol.
    733 
    734    This method, if defined, will be called by the parent :class:`OpenerDirector`.
    735    Return values should be the same as for  :meth:`default_open`.
    736 
    737 
    738 .. method:: BaseHandler.unknown_open(req)
    739 
    740    This method is *not* defined in :class:`BaseHandler`, but subclasses should
    741    define it if they want to catch all URLs with no specific registered handler to
    742    open it.
    743 
    744    This method, if implemented, will be called by the :attr:`parent`
    745    :class:`OpenerDirector`.  Return values should be the same as for
    746    :meth:`default_open`.
    747 
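The relationship between :meth:`default_open`, :meth:`protocol_open` and
:meth:`unknown_open` can be sketched as follows.  The ``FallbackHandler`` class
and the ``demo`` and ``other`` schemes are hypothetical and exist only for this
illustration; the returned :class:`io.BytesIO` objects are simplified stand-ins
for real responses.

```python
import io
import urllib.request

trace = []  # (method name, URL scheme) pairs in call order


class FallbackHandler(urllib.request.BaseHandler):
    def default_open(self, req):
        # Consulted first for every URL; returning None passes the request on.
        trace.append(("default_open", req.type))
        return None

    def demo_open(self, req):
        # Consulted only for URLs whose scheme is "demo".
        trace.append(("demo_open", req.type))
        return io.BytesIO(b"demo response")

    def unknown_open(self, req):
        # Consulted last, when no scheme-specific method produced a response.
        trace.append(("unknown_open", req.type))
        return io.BytesIO(b"fallback response")


opener = urllib.request.OpenerDirector()
opener.add_handler(FallbackHandler())

demo_body = opener.open("demo://example/a").read()
other_body = opener.open("other://example/b").read()
print(trace)
```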
    748 
    749 .. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)
    750 
    751    This method is *not* defined in :class:`BaseHandler`, but subclasses should
    752    override it if they intend to provide a catch-all for otherwise unhandled HTTP
    753    errors.  It will be called automatically by the  :class:`OpenerDirector` getting
    754    the error, and should not normally be called in other circumstances.
    755 
    756    *req* will be a :class:`Request` object, *fp* will be a file-like object with
    757    the HTTP error body, *code* will be the three-digit code of the error, *msg*
    758    will be the user-visible explanation of the code and *hdrs* will be a mapping
    759    object with the headers of the error.
    760 
    761    Return values and exceptions raised should be the same as those of
    762    :func:`urlopen`.
    763 
    764 
    765 .. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)
    766 
    767    *nnn* should be a three-digit HTTP error code.  This method is also not defined
    768    in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
    769    subclass, when an HTTP error with code *nnn* occurs.
    770 
    771    Subclasses should override this method to handle specific HTTP errors.
    772 
    773    Arguments, return values and exceptions raised should be the same as for
    774    :meth:`http_error_default`.
    775 
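A minimal sketch of how :meth:`http_error_nnn` and :meth:`http_error_default`
are selected is shown below.  ``ErrorPageHandler`` is a hypothetical class, and
:meth:`OpenerDirector.error` is called directly only to make the dispatch
visible without a network connection; normally the library calls it for you.

```python
import email.message
import io
import urllib.request


class ErrorPageHandler(urllib.request.BaseHandler):
    def http_error_404(self, req, fp, code, msg, hdrs):
        # Chosen because the error code is exactly 404.
        return io.BytesIO(b"substitute page for " + req.full_url.encode("ascii"))

    def http_error_default(self, req, fp, code, msg, hdrs):
        # Catch-all for HTTP errors without a code-specific method.
        return io.BytesIO(b"unhandled error %d" % code)


opener = urllib.request.OpenerDirector()
opener.add_handler(ErrorPageHandler())

req = urllib.request.Request("http://www.example.com/missing")
hdrs = email.message.Message()  # empty stand-in for the error headers

not_found = opener.error("http", req, io.BytesIO(b""), 404, "Not Found", hdrs)
server_error = opener.error("http", req, io.BytesIO(b""), 500,
                            "Internal Server Error", hdrs)

not_found_body = not_found.read()
server_error_body = server_error.read()
```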
    776 
    777 .. method:: BaseHandler.protocol_request(req)
    778    :noindex:
    779 
    780    This method is *not* defined in :class:`BaseHandler`, but subclasses should
    781    define it if they want to pre-process requests of the given protocol.
    782 
    783    This method, if defined, will be called by the parent :class:`OpenerDirector`.
    784    *req* will be a :class:`Request` object. The return value should be a
    785    :class:`Request` object.
    786 
    787 
    788 .. method:: BaseHandler.protocol_response(req, response)
    789    :noindex:
    790 
    791    This method is *not* defined in :class:`BaseHandler`, but subclasses should
    792    define it if they want to post-process responses of the given protocol.
    793 
    794    This method, if defined, will be called by the parent :class:`OpenerDirector`.
    795    *req* will be a :class:`Request` object. *response* will be an object
    796    implementing the same interface as the return value of :func:`urlopen`.  The
    797    return value should implement the same interface as the return value of
    798    :func:`urlopen`.
    799 
    800 
    801 .. _http-redirect-handler:
    802 
    803 HTTPRedirectHandler Objects
    804 ---------------------------
    805 
    806 .. note::
    807 
    808    Some HTTP redirections require action from this module's client code.  If this
    809    is the case, :exc:`~urllib.error.HTTPError` is raised.  See :rfc:`2616` for
    810    details of the precise meanings of the various redirection codes.
    811 
   An :class:`HTTPError` exception is raised as a security consideration if the
   :class:`HTTPRedirectHandler` is presented with a redirected URL which is not
   an HTTP, HTTPS or FTP URL.
    815 
    816 
    817 .. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)
    818 
    819    Return a :class:`Request` or ``None`` in response to a redirect. This is called
    820    by the default implementations of the :meth:`http_error_30\*` methods when a
    821    redirection is received from the server.  If a redirection should take place,
    822    return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
    823    redirect to *newurl*.  Otherwise, raise :exc:`~urllib.error.HTTPError` if
    824    no other handler should try to handle this URL, or return ``None`` if you
    825    can't but another handler might.
    826 
    827    .. note::
    828 
    829       The default implementation of this method does not strictly follow :rfc:`2616`,
    830       which says that 301 and 302 responses to ``POST`` requests must not be
    831       automatically redirected without confirmation by the user.  In reality, browsers
    832       do allow automatic redirection of these responses, changing the POST to a
    833       ``GET``, and the default implementation reproduces this behavior.
    834 
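The default behavior of :meth:`redirect_request` for a redirected ``POST``, and
one way of overriding it, might look like the sketch below.
``StrictRedirectHandler`` is a hypothetical subclass written for this
illustration, and the method is called directly so that no request is actually
sent.

```python
import urllib.error
import urllib.request


class StrictRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Refuse to turn a redirected POST into a GET."""

    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        if req.get_method() == "POST":
            raise urllib.error.HTTPError(req.full_url, code, msg, hdrs, fp)
        return super().redirect_request(req, fp, code, msg, hdrs, newurl)


post = urllib.request.Request("http://www.example.com/form", data=b"spam=1")
new_url = "http://www.example.com/done"

# Default implementation: a 302 response to a POST yields a new GET request.
default_result = urllib.request.HTTPRedirectHandler().redirect_request(
    post, None, 302, "Found", {}, new_url)
default_method = default_result.get_method()

# The stricter subclass raises HTTPError instead of following the redirect.
try:
    StrictRedirectHandler().redirect_request(
        post, None, 302, "Found", {}, new_url)
    strict_refused = False
except urllib.error.HTTPError:
    strict_refused = True
```

To use such a subclass for real requests, pass an instance to
:func:`build_opener`, which then uses it in place of the default
:class:`HTTPRedirectHandler`.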
    835 
    836 .. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)
    837 
    838    Redirect to the ``Location:`` or ``URI:`` URL.  This method is called by the
    839    parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.
    840 
    841 
    842 .. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)
    843 
    844    The same as :meth:`http_error_301`, but called for the 'found' response.
    845 
    846 
    847 .. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)
    848 
    849    The same as :meth:`http_error_301`, but called for the 'see other' response.
    850 
    851 
    852 .. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)
    853 
    854    The same as :meth:`http_error_301`, but called for the 'temporary redirect'
    855    response.
    856 
    857 
    858 .. _http-cookie-processor:
    859 
    860 HTTPCookieProcessor Objects
    861 ---------------------------
    862 
    863 :class:`HTTPCookieProcessor` instances have one attribute:
    864 
    865 .. attribute:: HTTPCookieProcessor.cookiejar
    866 
    867    The :class:`http.cookiejar.CookieJar` in which cookies are stored.
    868 
    869 
    870 .. _proxy-handler:
    871 
    872 ProxyHandler Objects
    873 --------------------
    874 
    875 
    876 .. method:: ProxyHandler.protocol_open(request)
    877    :noindex:
    878 
    879    The :class:`ProxyHandler` will have a method :meth:`protocol_open` for every
    880    *protocol* which has a proxy in the *proxies* dictionary given in the
    881    constructor.  The method will modify requests to go through the proxy, by
    882    calling ``request.set_proxy()``, and call the next handler in the chain to
    883    actually execute the protocol.
    884 
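The per-protocol methods can be inspected on an instance, as in the following
sketch.  The proxy URL is a placeholder, and calling ``http_open`` by hand is
done here only for illustration; an :class:`OpenerDirector` normally does this.

```python
import urllib.request

handler = urllib.request.ProxyHandler(
    {"http": "http://proxy.example.com:3128/"})

# A method exists only for protocols listed in the proxies dictionary.
has_http_open = hasattr(handler, "http_open")
has_ftp_open = hasattr(handler, "ftp_open")

req = urllib.request.Request("http://www.example.com/")
# The method returns None so that the next handler in the chain sends the
# request.  Unless the host is excluded from proxying (for example through
# the no_proxy environment variable), the request now targets the proxy.
result = handler.http_open(req)
print(has_http_open, has_ftp_open, result, req.host)
```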
    885 
    886 .. _http-password-mgr:
    887 
    888 HTTPPasswordMgr Objects
    889 -----------------------
    890 
    891 These methods are available on :class:`HTTPPasswordMgr` and
    892 :class:`HTTPPasswordMgrWithDefaultRealm` objects.
    893 
    894 
    895 .. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)
    896 
    897    *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
    898    *passwd* must be strings. This causes ``(user, passwd)`` to be used as
    899    authentication tokens when authentication for *realm* and a super-URI of any of
    900    the given URIs is given.
    901 
    902 
    903 .. method:: HTTPPasswordMgr.find_user_password(realm, authuri)
    904 
    905    Get user/password for given realm and URI, if any.  This method will return
    906    ``(None, None)`` if there is no matching user/password.
    907 
    908    For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
    909    searched if the given *realm* has no matching user/password.
    910 
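A short sketch of how lookups behave, using made-up realms, URIs and
credentials:

```python
import urllib.request

mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# Credentials for one realm, valid for everything below /private/.
mgr.add_password("Admin", "http://www.example.com/private/", "alice", "s3cret")
# Credentials stored under the realm None act as the default.
mgr.add_password(None, "http://www.example.com/", "guest", "guest")

# A URI below the registered one matches the "Admin" entry.
admin = mgr.find_user_password(
    "Admin", "http://www.example.com/private/report.html")
# An unknown realm falls back to the entry stored under None.
fallback = mgr.find_user_password(
    "Other", "http://www.example.com/index.html")
# A different host matches nothing at all.
missing = mgr.find_user_password("Admin", "http://other.example.org/")
print(admin, fallback, missing)
```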
    911 
    912 .. _http-password-mgr-with-prior-auth:
    913 
    914 HTTPPasswordMgrWithPriorAuth Objects
    915 ------------------------------------
    916 
    917 This password manager extends :class:`HTTPPasswordMgrWithDefaultRealm` to support
    918 tracking URIs for which authentication credentials should always be sent.
    919 
    920 
    921 .. method:: HTTPPasswordMgrWithPriorAuth.add_password(realm, uri, user, \
    922             passwd, is_authenticated=False)
    923 
    924    *realm*, *uri*, *user*, *passwd* are as for
    925    :meth:`HTTPPasswordMgr.add_password`.  *is_authenticated* sets the initial
    926    value of the ``is_authenticated`` flag for the given URI or list of URIs.
    927    If *is_authenticated* is specified as ``True``, *realm* is ignored.
    928 
    929 
.. method:: HTTPPasswordMgrWithPriorAuth.find_user_password(realm, authuri)

   Same as for :class:`HTTPPasswordMgrWithDefaultRealm` objects.
    933 
    934 
.. method:: HTTPPasswordMgrWithPriorAuth.update_authenticated(uri, \
            is_authenticated=False)
    937 
    938    Update the ``is_authenticated`` flag for the given *uri* or list
    939    of URIs.
    940 
    941 
.. method:: HTTPPasswordMgrWithPriorAuth.is_authenticated(authuri)

   Return the current state of the ``is_authenticated`` flag for
   the given URI.
    946 
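The ``is_authenticated`` flag can be exercised as in the following sketch; the
URI and credentials are placeholders.

```python
import urllib.request

mgr = urllib.request.HTTPPasswordMgrWithPriorAuth()
mgr.add_password(None, "http://www.example.com/api/", "alice", "s3cret",
                 is_authenticated=True)

# The flag applies to the registered URI and to URIs below it.
before = mgr.is_authenticated("http://www.example.com/api/items")
mgr.update_authenticated("http://www.example.com/api/", False)
after = mgr.is_authenticated("http://www.example.com/api/items")

# The stored credentials are unaffected by the flag.
credentials = mgr.find_user_password(None, "http://www.example.com/api/items")

# The manager is typically handed to an authentication handler.
auth_handler = urllib.request.HTTPBasicAuthHandler(mgr)
opener = urllib.request.build_opener(auth_handler)
```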
    947 
    948 .. _abstract-basic-auth-handler:
    949 
    950 AbstractBasicAuthHandler Objects
    951 --------------------------------
    952 
    953 
    954 .. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)
    955 
    956    Handle an authentication request by getting a user/password pair, and re-trying
    957    the request.  *authreq* should be the name of the header where the information
    958    about the realm is included in the request, *host* specifies the URL and path to
    959    authenticate for, *req* should be the (failed) :class:`Request` object, and
    960    *headers* should be the error headers.
    961 
   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``). In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).
    966 
    967 
    968 .. _http-basic-auth-handler:
    969 
    970 HTTPBasicAuthHandler Objects
    971 ----------------------------
    972 
    973 
    974 .. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code,  msg, hdrs)
    975 
    976    Retry the request with authentication information, if available.
    977 
    978 
    979 .. _proxy-basic-auth-handler:
    980 
    981 ProxyBasicAuthHandler Objects
    982 -----------------------------
    983 
    984 
    985 .. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code,  msg, hdrs)
    986 
    987    Retry the request with authentication information, if available.
    988 
    989 
    990 .. _abstract-digest-auth-handler:
    991 
    992 AbstractDigestAuthHandler Objects
    993 ---------------------------------
    994 
    995 
    996 .. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)
    997 
    998    *authreq* should be the name of the header where the information about the realm
    999    is included in the request, *host* should be the host to authenticate to, *req*
   1000    should be the (failed) :class:`Request` object, and *headers* should be the
   1001    error headers.
   1002 
   1003 
   1004 .. _http-digest-auth-handler:
   1005 
   1006 HTTPDigestAuthHandler Objects
   1007 -----------------------------
   1008 
   1009 
   1010 .. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code,  msg, hdrs)
   1011 
   1012    Retry the request with authentication information, if available.
   1013 
   1014 
   1015 .. _proxy-digest-auth-handler:
   1016 
   1017 ProxyDigestAuthHandler Objects
   1018 ------------------------------
   1019 
   1020 
   1021 .. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code,  msg, hdrs)
   1022 
   1023    Retry the request with authentication information, if available.
   1024 
   1025 
   1026 .. _http-handler-objects:
   1027 
   1028 HTTPHandler Objects
   1029 -------------------
   1030 
   1031 
   1032 .. method:: HTTPHandler.http_open(req)
   1033 
   Send an HTTP request, which can be either GET or POST, depending on
   whether ``req.data`` is ``None``.
   1036 
   1037 
   1038 .. _https-handler-objects:
   1039 
   1040 HTTPSHandler Objects
   1041 --------------------
   1042 
   1043 
   1044 .. method:: HTTPSHandler.https_open(req)
   1045 
   Send an HTTPS request, which can be either GET or POST, depending on
   whether ``req.data`` is ``None``.
   1048 
   1049 
   1050 .. _file-handler-objects:
   1051 
   1052 FileHandler Objects
   1053 -------------------
   1054 
   1055 
   1056 .. method:: FileHandler.file_open(req)
   1057 
   1058    Open the file locally, if there is no host name, or the host name is
   1059    ``'localhost'``.
   1060 
   1061    .. versionchanged:: 3.2
   1062       This method is applicable only for local hostnames.  When a remote
   1063       hostname is given, an :exc:`~urllib.error.URLError` is raised.
   1064 
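A local file can be opened through a ``file:`` URL as sketched below.  The
temporary directory and the file name ``note.txt`` are created only for this
illustration.

```python
import pathlib
import tempfile
import urllib.request

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "note.txt"
    path.write_bytes(b"local content")

    # Path.as_uri() builds a file: URL without a host name.
    with urllib.request.urlopen(path.as_uri()) as resp:
        body = resp.read()
        length = resp.headers["Content-Length"]

print(body, length)
```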
   1065 
   1066 .. _data-handler-objects:
   1067 
   1068 DataHandler Objects
   1069 -------------------
   1070 
   1071 .. method:: DataHandler.data_open(req)
   1072 
   Read a data URL. This kind of URL contains the content encoded in the URL
   itself. The data URL syntax is specified in :rfc:`2397`. This implementation
   ignores whitespace in base64-encoded data URLs, so the URL may be wrapped
   in whatever source file it comes from. However, although some browsers
   tolerate missing padding at the end of a base64-encoded data URL, this
   implementation will raise a :exc:`ValueError` in that case.
   1079 
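As a small sketch, a base64-encoded data URL can be read with :func:`urlopen`
without any network access.  The payload below encodes the text
``Hello, world!`` and was chosen for illustration.

```python
import urllib.request

url = "data:text/plain;charset=utf-8;base64,SGVsbG8sIHdvcmxkIQ=="
with urllib.request.urlopen(url) as resp:
    body = resp.read()
    content_type = resp.headers["Content-Type"]

# The same payload without its trailing "==" padding is rejected.
try:
    urllib.request.urlopen("data:text/plain;base64,SGVsbG8sIHdvcmxkIQ")
    padding_error = False
except ValueError:
    padding_error = True

print(body, content_type, padding_error)
```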
   1080 
   1081 .. _ftp-handler-objects:
   1082 
   1083 FTPHandler Objects
   1084 ------------------
   1085 
   1086 
   1087 .. method:: FTPHandler.ftp_open(req)
   1088 
   1089    Open the FTP file indicated by *req*. The login is always done with empty
   1090    username and password.
   1091 
   1092 
   1093 .. _cacheftp-handler-objects:
   1094 
   1095 CacheFTPHandler Objects
   1096 -----------------------
   1097 
   1098 :class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
   1099 following additional methods:
   1100 
   1101 
   1102 .. method:: CacheFTPHandler.setTimeout(t)
   1103 
   1104    Set timeout of connections to *t* seconds.
   1105 
   1106 
   1107 .. method:: CacheFTPHandler.setMaxConns(m)
   1108 
   1109    Set maximum number of cached connections to *m*.
   1110 
   1111 
   1112 .. _unknown-handler-objects:
   1113 
   1114 UnknownHandler Objects
   1115 ----------------------
   1116 
   1117 
   1118 .. method:: UnknownHandler.unknown_open()
   1119 
   1120    Raise a :exc:`~urllib.error.URLError` exception.
   1121 
   1122 
   1123 .. _http-error-processor-objects:
   1124 
   1125 HTTPErrorProcessor Objects
   1126 --------------------------
   1127 
   1128 .. method:: HTTPErrorProcessor.http_response(request, response)
   1129 
   1130    Process HTTP error responses.
   1131 
   For successful (2xx) status codes, the response object is returned
   immediately.

   For all other status codes, this simply passes the job on to the
   :meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`HTTPDefaultErrorHandler` will raise an
   :exc:`~urllib.error.HTTPError` if no other handler handles the error.
   1138 
   1139 
   1140 .. method:: HTTPErrorProcessor.https_response(request, response)
   1141 
   1142    Process HTTPS error responses.
   1143 
   The behavior is the same as :meth:`http_response`.
   1145 
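The hand-off from :class:`HTTPErrorProcessor` to
:class:`HTTPDefaultErrorHandler` can be sketched without a server.
``FakeResponse`` and ``FakeHTTPHandler`` are hypothetical stand-ins written for
this illustration; they provide only the attributes the processor reads
(``code``, ``msg`` and ``info()``).

```python
import email.message
import io
import urllib.error
import urllib.request


class FakeResponse(io.BytesIO):
    """Stand-in for an HTTP response with just enough attributes."""

    def __init__(self, code, msg, body=b""):
        super().__init__(body)
        self.code = code
        self.msg = msg
        self.headers = email.message.Message()

    def info(self):
        return self.headers


class FakeHTTPHandler(urllib.request.BaseHandler):
    def http_open(self, req):
        # Pretend that only the path "/ok" exists.
        if req.selector == "/ok":
            return FakeResponse(200, "OK", b"fine")
        return FakeResponse(404, "Not Found")


opener = urllib.request.OpenerDirector()
for handler in (FakeHTTPHandler(),
                urllib.request.HTTPErrorProcessor(),
                urllib.request.HTTPDefaultErrorHandler()):
    opener.add_handler(handler)

# A 200 response is returned unchanged.
ok_body = opener.open("http://www.example.com/ok").read()

# A 404 response is routed to the error handlers, which raise HTTPError.
try:
    opener.open("http://www.example.com/missing")
    error_code = None
except urllib.error.HTTPError as err:
    error_code = err.code
```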
   1146 
   1147 .. _urllib-request-examples:
   1148 
   1149 Examples
   1150 --------
   1151 
   1152 In addition to the examples below, more examples are given in
   1153 :ref:`urllib-howto`.
   1154 
   1155 This example gets the python.org main page and displays the first 300 bytes of
   1156 it. ::
   1157 
   1158    >>> import urllib.request
   1159    >>> with urllib.request.urlopen('http://www.python.org/') as f:
   1160    ...     print(f.read(300))
   1161    ...
   1162    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   1163    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
   1164    xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
   1165    <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
   1166    <title>Python Programming '
   1167 
Note that the data read from the object returned by :func:`urlopen` is a bytes
object.  This is because there is no way for :func:`urlopen` to automatically
determine the encoding of the byte stream it receives from the HTTP server.  In
general, a program will decode the returned bytes object to a string once it
determines or guesses the appropriate encoding.
   1173 
   1174 The following W3C document, https://www.w3.org/International/O-charset\ , lists
   1175 the various ways in which an (X)HTML or an XML document could have specified its
   1176 encoding information.
   1177 
   1178 As the python.org website uses *utf-8* encoding as specified in its meta tag, we
   1179 will use the same for decoding the bytes object. ::
   1180 
   1181    >>> with urllib.request.urlopen('http://www.python.org/') as f:
   1182    ...     print(f.read(100).decode('utf-8'))
   1183    ...
   1184    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   1185    "http://www.w3.org/TR/xhtml1/DTD/xhtm
   1186 
   1187 It is also possible to achieve the same result without using the
   1188 :term:`context manager` approach. ::
   1189 
   1190    >>> import urllib.request
   1191    >>> f = urllib.request.urlopen('http://www.python.org/')
   1192    >>> print(f.read(100).decode('utf-8'))
   1193    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   1194    "http://www.w3.org/TR/xhtml1/DTD/xhtm
   1195 
   1196 In the following example, we are sending a data-stream to the stdin of a CGI
   1197 and reading the data it returns to us. Note that this example will only work
   1198 when the Python installation supports SSL. ::
   1199 
   1200    >>> import urllib.request
   1201    >>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
   1202    ...                       data=b'This data is passed to stdin of the CGI')
   1203    >>> with urllib.request.urlopen(req) as f:
   1204    ...     print(f.read().decode('utf-8'))
   1205    ...
   1206    Got Data: "This data is passed to stdin of the CGI"
   1207 
   1208 The code for the sample CGI used in the above example is::
   1209 
   1210    #!/usr/bin/env python
   1211    import sys
   1212    data = sys.stdin.read()
   1213    print('Content-type: text/plain\n\nGot Data: "%s"' % data)
   1214 
   1215 Here is an example of doing a ``PUT`` request using :class:`Request`::
   1216 
    import urllib.request
    DATA = b'some data'
    req = urllib.request.Request(url='http://localhost:8080', data=DATA, method='PUT')
    with urllib.request.urlopen(req) as f:
        pass
    print(f.status)
    print(f.reason)
   1223     print(f.reason)
   1224 
   1225 Use of Basic HTTP Authentication::
   1226 
   1227    import urllib.request
   1228    # Create an OpenerDirector with support for Basic HTTP Authentication...
   1229    auth_handler = urllib.request.HTTPBasicAuthHandler()
   1230    auth_handler.add_password(realm='PDQ Application',
   1231                              uri='https://mahler:8092/site-updates.py',
   1232                              user='klem',
   1233                              passwd='kadidd!ehopper')
   1234    opener = urllib.request.build_opener(auth_handler)
   1235    # ...and install it globally so it can be used with urlopen.
   1236    urllib.request.install_opener(opener)
   1237    urllib.request.urlopen('http://www.example.com/login.html')
   1238 
   1239 :func:`build_opener` provides many handlers by default, including a
   1240 :class:`ProxyHandler`.  By default, :class:`ProxyHandler` uses the environment
   1241 variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
   1242 involved.  For example, the :envvar:`http_proxy` environment variable is read to
   1243 obtain the HTTP proxy's URL.
   1244 
   1245 This example replaces the default :class:`ProxyHandler` with one that uses
   1246 programmatically-supplied proxy URLs, and adds proxy authorization support with
   1247 :class:`ProxyBasicAuthHandler`. ::
   1248 
   1249    proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
   1250    proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
   1251    proxy_auth_handler.add_password('realm', 'host', 'username', 'password')
   1252 
   1253    opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
   1254    # This time, rather than install the OpenerDirector, we use it directly:
   1255    opener.open('http://www.example.com/login.html')
   1256 
   1257 Adding HTTP headers:
   1258 
   1259 Use the *headers* argument to the :class:`Request` constructor, or::
   1260 
   1261    import urllib.request
   1262    req = urllib.request.Request('http://www.example.com/')
   1263    req.add_header('Referer', 'http://www.python.org/')
   1264    # Customize the default User-Agent header value:
   1265    req.add_header('User-Agent', 'urllib-example/0.1 (Contact: . . .)')
   1266    r = urllib.request.urlopen(req)
   1267 
   1268 :class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
   1269 every :class:`Request`.  To change this::
   1270 
   1271    import urllib.request
   1272    opener = urllib.request.build_opener()
   1273    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   1274    opener.open('http://www.example.com/')
   1275 
   1276 Also, remember that a few standard headers (:mailheader:`Content-Length`,
   1277 :mailheader:`Content-Type` and :mailheader:`Host`)
   1278 are added when the :class:`Request` is passed to :func:`urlopen` (or
   1279 :meth:`OpenerDirector.open`).
   1280 
   1281 .. _urllib-examples:
   1282 
   1283 Here is an example session that uses the ``GET`` method to retrieve a URL
   1284 containing parameters::
   1285 
   1286    >>> import urllib.request
   1287    >>> import urllib.parse
   1288    >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   1289    >>> url = "http://www.musi-cal.com/cgi-bin/query?%s" % params
   1290    >>> with urllib.request.urlopen(url) as f:
   1291    ...     print(f.read().decode('utf-8'))
   1292    ...
   1293 
   1294 The following example uses the ``POST`` method instead. Note that params output
   1295 from urlencode is encoded to bytes before it is sent to urlopen as data::
   1296 
   1297    >>> import urllib.request
   1298    >>> import urllib.parse
   1299    >>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   1300    >>> data = data.encode('ascii')
   1301    >>> with urllib.request.urlopen("http://requestb.in/xrbl82xr", data) as f:
   1302    ...     print(f.read().decode('utf-8'))
   1303    ...
   1304 
   1305 The following example uses an explicitly specified HTTP proxy, overriding
   1306 environment settings::
   1307 
   1308    >>> import urllib.request
   1309    >>> proxies = {'http': 'http://proxy.example.com:8080/'}
   1310    >>> opener = urllib.request.FancyURLopener(proxies)
   1311    >>> with opener.open("http://www.python.org") as f:
   1312    ...     f.read().decode('utf-8')
   1313    ...
   1314 
   1315 The following example uses no proxies at all, overriding environment settings::
   1316 
   1317    >>> import urllib.request
   1318    >>> opener = urllib.request.FancyURLopener({})
   1319    >>> with opener.open("http://www.python.org/") as f:
   1320    ...     f.read().decode('utf-8')
   1321    ...
   1322 
   1323 
   1324 Legacy interface
   1325 ----------------
   1326 
   1327 The following functions and classes are ported from the Python 2 module
   1328 ``urllib`` (as opposed to ``urllib2``).  They might become deprecated at
   1329 some point in the future.
   1330 
   1331 .. function:: urlretrieve(url, filename=None, reporthook=None, data=None)
   1332 
   Copy a network object denoted by a URL to a local file. If the URL
   points to a local file, the object will not be copied unless *filename* is
   supplied. Return a tuple ``(filename, headers)`` where *filename* is the
   local file name under which the object can be found, and *headers* is whatever
   the :meth:`info` method of the object returned by :func:`urlopen` returned (for
   a remote object). Exceptions are the same as for :func:`urlopen`.
   1339 
   The second argument, if present, specifies the file location to copy to (if
   absent, the location will be a tempfile with a generated name). The third
   argument, if present, is a callable that will be called once on
   establishment of the network connection and once after each block read
   thereafter.  The callable will be passed three arguments: a count of blocks
   transferred so far, a block size in bytes, and the total size of the file.  The
   total size (the third argument passed to the callable) may be ``-1`` on older
   FTP servers which do not return a file size in response to a retrieval
   request.
   1348 
   1349    The following example illustrates the most common usage scenario::
   1350 
   1351       >>> import urllib.request
   1352       >>> local_filename, headers = urllib.request.urlretrieve('http://python.org/')
   1353       >>> html = open(local_filename)
   1354       >>> html.close()
   1355 
   1356    If the *url* uses the :file:`http:` scheme identifier, the optional *data*
   1357    argument may be given to specify a ``POST`` request (normally the request
   1358    type is ``GET``).  The *data* argument must be a bytes object in standard
   1359    :mimetype:`application/x-www-form-urlencoded` format; see the
   1360    :func:`urllib.parse.urlencode` function.
   1361 
   :func:`urlretrieve` will raise :exc:`ContentTooShortError` when it detects that
   the amount of data available was less than the expected amount (which is the
   size reported by a *Content-Length* header). This can occur, for example, when
   the download is interrupted.

   The *Content-Length* is treated as a lower bound: if there's more data to read,
   :func:`urlretrieve` reads more data, but if less data is available, it raises
   the exception.

   You can still retrieve the downloaded data in this case; it is stored in the
   :attr:`content` attribute of the exception instance.

   If no *Content-Length* header was supplied, :func:`urlretrieve` cannot check
   the size of the data it has downloaded, and just returns it.  In this case you
   just have to assume that the download was successful.
   1377 
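The *reporthook* callable described for :func:`urlretrieve` can be tried out
without a network connection, as in this sketch.  The ``report`` function, the
``data:`` URL and the target file name are made up for illustration.

```python
import os
import tempfile
import urllib.request

progress = []  # one (block_count, block_size, total_size) tuple per call


def report(block_count, block_size, total_size):
    # Called once before the first block and once after each block read.
    progress.append((block_count, block_size, total_size))


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "hello.txt")
    filename, headers = urllib.request.urlretrieve(
        "data:text/plain,hello%20world", target, reporthook=report)
    with open(filename, "rb") as f:
        content = f.read()

block_counts = [count for count, _, _ in progress]
total_size = progress[-1][2]
print(block_counts, total_size)
```

Since *total_size* can be ``-1``, a hook that computes a percentage should
check for that value before dividing.
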
   1378 .. function:: urlcleanup()
   1379 
   1380    Cleans up temporary files that may have been left behind by previous
   1381    calls to :func:`urlretrieve`.
   1382 
   1383 .. class:: URLopener(proxies=None, **x509)
   1384 
   1385    .. deprecated:: 3.3
   1386 
   1387    Base class for opening and reading URLs.  Unless you need to support opening
   1388    objects using schemes other than :file:`http:`, :file:`ftp:`, or :file:`file:`,
   1389    you probably want to use :class:`FancyURLopener`.
   1390 
   1391    By default, the :class:`URLopener` class sends a :mailheader:`User-Agent` header
   1392    of ``urllib/VVV``, where *VVV* is the :mod:`urllib` version number.
   1393    Applications can define their own :mailheader:`User-Agent` header by subclassing
   1394    :class:`URLopener` or :class:`FancyURLopener` and setting the class attribute
   1395    :attr:`version` to an appropriate string value in the subclass definition.
   1396 
   1397    The optional *proxies* parameter should be a dictionary mapping scheme names to
   1398    proxy URLs, where an empty dictionary turns proxies off completely.  Its default
   1399    value is ``None``, in which case environmental proxy settings will be used if
   1400    present, as discussed in the definition of :func:`urlopen`, above.

   Additional keyword parameters, collected in *x509*, may be used for
   authentication of the client when using the :file:`https:` scheme.  The keywords
   *key_file* and *cert_file* are supported to provide an SSL key and certificate;
   both are needed to support client authentication.

   :class:`URLopener` objects will raise an :exc:`OSError` exception if the server
   returns an error code.

   .. method:: open(fullurl, data=None)

      Open *fullurl* using the appropriate protocol.  This method sets up cache and
      proxy information, then calls the appropriate open method with its input
      arguments.  If the scheme is not recognized, :meth:`open_unknown` is called.
      The *data* argument has the same meaning as the *data* argument of
      :func:`urlopen`.


   .. method:: open_unknown(fullurl, data=None)

      Overridable interface to open unknown URL types.


   .. method:: retrieve(url, filename=None, reporthook=None, data=None)

      Retrieve the contents of *url* and place them in *filename*.  The return value
      is a tuple consisting of a local filename and either an
      :class:`email.message.Message` object containing the response headers (for remote
      URLs) or ``None`` (for local URLs).  The caller must then open and read the
      contents of *filename*.  If *filename* is not given and the URL refers to a
      local file, the input filename is returned.  If the URL is non-local and
      *filename* is not given, the filename is the output of :func:`tempfile.mktemp`
      with a suffix that matches the suffix of the last path component of the input
      URL.  If *reporthook* is given, it must be a function accepting three numeric
      parameters: a chunk number, the maximum size chunks are read in, and the
      total size of the download (-1 if unknown).  It will be called once at the
      start and after each chunk of data is read from the network.  *reporthook*
      is ignored for local URLs.
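
      A *reporthook* with the three-argument shape described above might look
      like the following sketch.  The names ``make_progress_hook`` and
      ``report`` are hypothetical, and the loop only simulates the calls that
      a download would make, so no network access is involved.

      .. code-block:: python

         def make_progress_hook(report):
             """Return a reporthook that passes a percentage (or None) to *report*."""

             def hook(block_number, block_size, total_size):
                 if total_size <= 0:
                     # The total size is unknown (-1), so no percentage exists.
                     report(None)
                     return
                 # The last chunk is usually short, so cap the running total.
                 received = min(block_number * block_size, total_size)
                 report(100.0 * received / total_size)

             return hook


         # Simulate a 2500-byte download read in 1024-byte chunks: one call
         # at the start (chunk 0) and one call after each chunk.
         percentages = []
         hook = make_progress_hook(percentages.append)
         for block_number in range(4):
             hook(block_number, 1024, 2500)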

      If the *url* uses the :file:`http:` scheme identifier, the optional *data*
      argument may be given to specify a ``POST`` request (normally the request type
      is ``GET``).  The *data* argument must be in standard
      :mimetype:`application/x-www-form-urlencoded` format; see the
      :func:`urllib.parse.urlencode` function.
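
      A value for *data* in that format can be prepared along these lines.
      This is a minimal sketch: the form fields are made up, and encoding the
      result to bytes reflects what :func:`urlopen` expects for its *data*
      argument.

      .. code-block:: python

         import urllib.parse

         # Hypothetical form fields; urlencode() quotes them into a query string.
         fields = {"name": "Guido van Rossum", "lang": "python"}
         query = urllib.parse.urlencode(fields)

         # The request body is sent as bytes, so encode the ASCII-only result.
         data = query.encode("ascii")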


   .. attribute:: version

      Variable that specifies the user agent of the opener object.  To get
      :mod:`urllib` to tell servers that it is a particular user agent, set this in a
      subclass as a class variable or in the constructor before calling the base
      constructor.
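
      Setting the attribute in a subclass could be sketched as follows.  The
      class name ``AppURLopener`` and the agent string are illustrative only.
      Because :class:`FancyURLopener` is deprecated and may be absent from
      newer Python releases, the sketch looks it up defensively rather than
      importing it directly.

      .. code-block:: python

         import urllib.request

         # FancyURLopener is deprecated and may be missing from newer Python
         # releases, so look it up instead of importing it directly.
         base = getattr(urllib.request, "FancyURLopener", None)

         if base is not None:

             class AppURLopener(base):
                 # Sent as the User-Agent header in place of the urllib default.
                 version = "ExampleApp/1.0"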


.. class:: FancyURLopener(...)

   .. deprecated:: 3.3

   :class:`FancyURLopener` subclasses :class:`URLopener`, providing default handling
   for the following HTTP response codes: 301, 302, 303, 307 and 401.  For the 30x
   response codes listed above, the :mailheader:`Location` header is used to fetch
   the actual URL.  For 401 response codes (authentication required), basic HTTP
   authentication is performed.  For the 30x response codes, recursion is bounded
   by the value of the *maxtries* attribute, which defaults to 10.

   For all other response codes, the method :meth:`http_error_default` is called,
   which you can override in subclasses to handle the error appropriately.

   .. note::

      According to the letter of :rfc:`2616`, 301 and 302 responses to POST requests
      must not be automatically redirected without confirmation by the user.  In
      reality, browsers do allow automatic redirection of these responses, changing
      the POST to a GET, and :mod:`urllib` reproduces this behaviour.

   The parameters to the constructor are the same as those for :class:`URLopener`.

   .. note::

      When performing basic authentication, a :class:`FancyURLopener` instance calls
      its :meth:`prompt_user_passwd` method.  The default implementation asks the
      user for the required information on the controlling terminal.  A subclass may
      override this method to support more appropriate behavior if needed.

   The :class:`FancyURLopener` class offers one additional method that should be
   overridden to provide the appropriate behavior:

   .. method:: prompt_user_passwd(host, realm)

      Return information needed to authenticate the user at the given host in the
      specified security realm.  The return value should be a tuple, ``(user,
      password)``, which can be used for basic authentication.

      The implementation prompts for this information on the terminal; an application
      should override this method to use an appropriate interaction model in the local
      environment.
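
      One possible override, sketched under a few assumptions, answers from
      an in-memory table instead of prompting.  The class name, the
      ``credentials`` table and its contents are hypothetical, and returning
      ``(None, None)`` when nothing matches is an assumption based on how the
      CPython implementation treats an interrupted prompt.

      .. code-block:: python

         import urllib.request

         # FancyURLopener is deprecated and may be missing from newer Python
         # releases; fall back to object so the sketch still imports there.
         _Base = getattr(urllib.request, "FancyURLopener", object)


         class StoredCredentialsOpener(_Base):
             """Answer authentication requests from a table, not the terminal."""

             # Hypothetical (host, realm) -> (user, password) table.
             credentials = {("www.example.com", "Private"): ("alice", "s3cret")}

             def prompt_user_passwd(self, host, realm):
                 # (None, None) is assumed to mean "no credentials available".
                 return self.credentials.get((host, realm), (None, None))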


:mod:`urllib.request` Restrictions
----------------------------------

  .. index::
     pair: HTTP; protocol
     pair: FTP; protocol

* Currently, only the following protocols are supported: HTTP (versions 0.9 and
  1.0), FTP, local files, and data URLs.

  .. versionchanged:: 3.4 Added support for data URLs.

* The caching feature of :func:`urlretrieve` has been disabled until someone
  finds the time to hack proper processing of Expiration time headers.

* There should be a function to query whether a particular URL is in the cache.

* For backward compatibility, if a URL appears to point to a local file but the
  file can't be opened, the URL is re-interpreted using the FTP protocol.  This
  can sometimes cause confusing error messages.

* The :func:`urlopen` and :func:`urlretrieve` functions can cause arbitrarily
  long delays while waiting for a network connection to be set up.  This means
  that it is difficult to build an interactive Web client using these functions
  without using threads.

  .. index::
     single: HTML
     pair: HTTP; protocol

* The data returned by :func:`urlopen` or :func:`urlretrieve` is the raw data
  returned by the server.  This may be binary data (such as an image), plain text
  or (for example) HTML.  The HTTP protocol provides type information in the reply
  header, which can be inspected by looking at the :mailheader:`Content-Type`
  header.  If the returned data is HTML, you can use the module
  :mod:`html.parser` to parse it.

  .. index:: single: FTP

* The code handling the FTP protocol cannot differentiate between a file and a
  directory.  This can lead to unexpected behavior when attempting to read a URL
  that points to a file that is not accessible.  If the URL ends in a ``/``, it is
  assumed to refer to a directory and will be handled accordingly.  But if an
  attempt to read a file leads to a 550 error (meaning the URL cannot be found or
  is not accessible, often for permission reasons), then the path is treated as a
  directory in order to handle the case when a directory is specified by a URL but
  the trailing ``/`` has been left off.  This can cause misleading results when
  you try to fetch a file whose read permissions make it inaccessible; the FTP
  code will try to read it, fail with a 550 error, and then perform a directory
  listing for the unreadable file.  If fine-grained control is needed, consider
  using the :mod:`ftplib` module, subclassing :class:`FancyURLopener`, or changing
  *_urlopener* to meet your needs.
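
Two of the restrictions above, the blocking connection setup and the raw,
untyped response data, can be handled along the lines of the sketch below.
It runs :func:`urlopen` in a worker thread, checks the
:mailheader:`Content-Type` header, and feeds HTML to :mod:`html.parser`.  The
names ``fetch`` and ``TextCollector`` are hypothetical, and a data URL stands
in for a real HTTP URL so that the example needs no network.

.. code-block:: python

   import urllib.request
   from concurrent.futures import ThreadPoolExecutor
   from html.parser import HTMLParser


   class TextCollector(HTMLParser):
       """Collect the text nodes of an HTML document."""

       def __init__(self):
           super().__init__()
           self.chunks = []

       def handle_data(self, data):
           self.chunks.append(data)


   def fetch(url, timeout=10):
       """Return (content type, charset, raw bytes) for *url*."""
       with urllib.request.urlopen(url, timeout=timeout) as response:
           headers = response.info()
           return (headers.get_content_type(),
                   headers.get_content_charset(),
                   response.read())


   # A data URL keeps the sketch offline; a real client would use an HTTP URL.
   url = "data:text/html;charset=utf-8,%3Cp%3EHello%3C%2Fp%3E"

   with ThreadPoolExecutor(max_workers=1) as pool:
       future = pool.submit(fetch, url)
       # The calling thread is free to do other work until result() is called.
       content_type, charset, raw = future.result()

   text = []
   if content_type == "text/html":
       parser = TextCollector()
       parser.feed(raw.decode(charset or "utf-8"))
       parser.close()
       text = parser.chunks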



:mod:`urllib.response` --- Response classes used by urllib
==========================================================

.. module:: urllib.response
   :synopsis: Response classes used by urllib.

The :mod:`urllib.response` module defines functions and classes which define a
minimal file-like interface, including ``read()`` and ``readline()``.  The
typical response object is an ``addinfourl`` instance, which defines an ``info()``
method that returns headers and a ``geturl()`` method that returns the URL.
Functions defined by this module are used internally by the
:mod:`urllib.request` module.
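
The interface described above can be tried without a network connection by
building an ``addinfourl`` object by hand, as in the sketch below.  The
headers, body and URL are made-up values; in normal use such objects are
created by :mod:`urllib.request` rather than by application code.

.. code-block:: python

   import email
   import io
   import urllib.response

   # Hypothetical headers and URL; a real response carries the server's values.
   headers = email.message_from_string("Content-Type: text/plain\n\n")
   url = "http://www.example.com/greeting.txt"

   response = urllib.response.addinfourl(
       io.BytesIO(b"first line\nsecond line\n"), headers, url)

   first = response.readline()    # file-like reads are delegated to the stream
   rest = response.read()
   content_type = response.info()["Content-Type"]
   final_url = response.geturl()
   response.close()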
   1566