:mod:`urllib2` --- extensible library for opening URLs
======================================================

.. module:: urllib2
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>


.. note::
   The :mod:`urllib2` module has been split across several modules in
   Python 3 named :mod:`urllib.request` and :mod:`urllib.error`.
   The :term:`2to3` tool will automatically adapt imports when converting
   your sources to Python 3.


The :mod:`urllib2` module defines functions and classes which help in opening
URLs (mostly HTTP) in a complex world --- basic and digest authentication,
redirections, cookies and more.

.. seealso::

    The `Requests package <http://requests.readthedocs.org/>`_
    is recommended for a higher-level HTTP client interface.


The :mod:`urllib2` module defines the following functions:


.. function:: urlopen(url[, data[, timeout[, cafile[, capath[, cadefault[, context]]]]]])

   Open the URL *url*, which can be either a string or a :class:`Request` object.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed.  Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided.  *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format.  The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format.  The :mod:`urllib2` module sends HTTP/1.1
   requests with a ``Connection: close`` header included.

   The optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used).  In practice, the timeout only applies to HTTP,
   HTTPS and FTP connections.

   If *context* is specified, it must be a :class:`ssl.SSLContext` instance
   describing the various SSL options.  See :class:`~httplib.HTTPSConnection` for
   more details.

   The optional *cafile* and *capath* parameters specify a set of trusted CA
   certificates for HTTPS requests.  *cafile* should point to a single file
   containing a bundle of CA certificates, whereas *capath* should point to a
   directory of hashed certificate files.  More information can be found in
   :meth:`ssl.SSLContext.load_verify_locations`.

   The *cadefault* parameter is ignored.

   This function returns a file-like object with three additional methods:

   * :meth:`geturl` --- return the URL of the resource retrieved, commonly used to
     determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers,
     in the form of an :class:`mimetools.Message` instance
     (see `Quick Reference to HTTP Headers <https://www.cs.tut.fi/~jkorpela/http.html>`_)

   * :meth:`getcode` --- return the HTTP status code of the response.

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though the
   default installed global :class:`OpenerDirector` uses :class:`UnknownHandler` to
   ensure this never happens).

   In addition, if proxy settings are detected (for example, when a ``*_proxy``
   environment variable like :envvar:`http_proxy` is set), a
   :class:`ProxyHandler` is installed by default and makes sure the requests are
   handled through the proxy.

   .. versionchanged:: 2.6
      *timeout* was added.

   .. versionchanged:: 2.7.9
      *cafile*, *capath*, *cadefault*, and *context* were added.
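
The following sketch shows the file-like object returned by :func:`urlopen` and its :meth:`geturl` and :meth:`info` methods.  To avoid any network access it opens a ``file:`` URL pointing at a temporary file, and it assumes a POSIX absolute path that needs no quoting.  The ``try``/``except ImportError`` import is an assumption of this sketch that lets the same code run on Python 3, where the function lives in :mod:`urllib.request`.

```python
import os
import tempfile

try:  # Python 2
    from urllib2 import urlopen
except ImportError:  # Python 3: urlopen lives in urllib.request
    from urllib.request import urlopen

# Write a small local file so the example needs no network access.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'hello, urllib2')

# Assumes a POSIX absolute path (e.g. /tmp/tmpk3x9.txt) that needs no quoting.
url = 'file://' + path

response = urlopen(url)
try:
    body = response.read()                       # the file contents
    final_url = response.geturl()                # URL of the resource retrieved
    length = response.info()['Content-length']   # a header from info()
finally:
    response.close()
    os.remove(path)

print('%s: %s bytes' % (final_url, length))
```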


.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`urlopen`.  The code does not check for a real :class:`OpenerDirector`,
   and any class with the appropriate interface will work.
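
A minimal sketch of :func:`install_opener`, assuming a hypothetical ``EchoHandler`` for a made-up ``echo:`` scheme so that nothing touches the network.  After the opener is installed, plain :func:`urlopen` calls go through it; in CPython, installing ``None`` afterwards makes :func:`urlopen` build a fresh default opener on its next call.  The Python 3 fallback import is an assumption of the sketch.

```python
import io

try:  # Python 2
    import urllib2 as request
except ImportError:  # Python 3
    import urllib.request as request


class EchoHandler(request.BaseHandler):
    """Hypothetical handler for a made-up 'echo:' scheme.

    It answers every echo: URL with the URL itself as the body, so the
    example runs without any network access.
    """

    def echo_open(self, req):
        return io.BytesIO(req.get_full_url().encode('ascii'))


opener = request.build_opener(EchoHandler)   # the class is instantiated for us
request.install_opener(opener)               # urlopen() now uses `opener`
try:
    body = request.urlopen('echo:hello').read()
finally:
    # Reset the global opener; urlopen() builds a default one again on demand.
    request.install_opener(None)

print(body)
```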


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given.  *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters).  Instances of the following classes
   are placed in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them, or subclasses of them: :class:`ProxyHandler` (if proxy
   settings are detected), :class:`UnknownHandler`, :class:`HTTPHandler`,
   :class:`HTTPDefaultErrorHandler`, :class:`HTTPRedirectHandler`,
   :class:`FTPHandler`, :class:`FileHandler`, :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module
   can be imported), :class:`HTTPSHandler` will also be added.

   Beginning in Python 2.3, a :class:`BaseHandler` subclass may also change its
   :attr:`handler_order` attribute to modify its position in the handlers
   list.

The following exceptions are raised as appropriate:


.. exception:: URLError

   The handlers raise this exception (or derived exceptions) when they run into a
   problem.  It is a subclass of :exc:`IOError`.

   .. attribute:: reason

      The reason for this error.  It can be a message string or another exception
      instance (:exc:`socket.error` for remote URLs, :exc:`OSError` for local
      URLs).


.. exception:: HTTPError

   Although it is an exception (a subclass of :exc:`URLError`), an :exc:`HTTPError`
   can also function as a non-exceptional file-like return value (the same thing
   that :func:`urlopen` returns).  This is useful when handling exotic HTTP
   errors, such as requests for authentication.

   .. attribute:: code

      An HTTP status code as defined in `RFC 2616 <http://www.faqs.org/rfcs/rfc2616.html>`_.
      This numeric value corresponds to a key in the dictionary of codes found in
      :attr:`BaseHTTPServer.BaseHTTPRequestHandler.responses`.

   .. attribute:: reason

      The reason for this error.  It can be a message string or another exception
      instance.
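
A sketch of the usual way to handle both exceptions.  Because :exc:`HTTPError` is a subclass of :exc:`URLError`, it has to be caught first.  The example triggers a :exc:`URLError` without network access by opening a ``file:`` URL that does not exist; as described above, its :attr:`reason` is then an :exc:`OSError` instance.  The ``fetch`` helper and the Python 3 fallback imports are assumptions of the sketch.

```python
import os
import tempfile

try:  # Python 2
    from urllib2 import urlopen, HTTPError, URLError
except ImportError:  # Python 3: the exceptions live in urllib.error
    from urllib.request import urlopen
    from urllib.error import HTTPError, URLError


def fetch(url):
    """Hypothetical helper returning (kind, detail) instead of raising."""
    try:
        response = urlopen(url)
    except HTTPError as e:   # first: HTTPError is a URLError subclass
        return 'http-error', e.code
    except URLError as e:
        return 'url-error', e.reason
    try:
        return 'ok', response.read()
    finally:
        response.close()


# Opening a file: URL that does not exist raises URLError locally.
tmpdir = tempfile.mkdtemp()
missing = os.path.join(tmpdir, 'no-such-file.txt')
kind, detail = fetch('file://' + missing)
os.rmdir(tmpdir)

print('%s: %r' % (kind, detail))
```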

The following classes are provided:


.. class:: Request(url[, data][, headers][, origin_req_host][, unverifiable])

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed.  Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided.  *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format.  The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format.

   *headers* should be a dictionary, and will be treated as if :meth:`add_header`
   was called with each key and value as arguments.  This is often used to "spoof"
   the ``User-Agent`` header value, which a browser uses to identify itself; some
   HTTP servers only allow requests coming from common browsers as opposed to
   scripts.  For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib2`'s
   default user agent string is ``"Python-urllib/2.6"`` (on Python 2.6).

   The final two arguments are only of interest for correct handling of third-party
   HTTP cookies:

   *origin_req_host* should be the request-host of the origin transaction, as
   defined by :rfc:`2965`.  It defaults to ``cookielib.request_host(self)``.  This
   is the host name or IP address of the original request that was initiated by the
   user.  For example, if the request is for an image in an HTML document, this
   should be the request-host of the request for the page containing the image.

   *unverifiable* should indicate whether the request is unverifiable, as defined
   by :rfc:`2965`.  It defaults to ``False``.  An unverifiable request is one whose
   URL the user did not have the option to approve.  For example, if the request is
   for an image in an HTML document, and the user had no option to approve the
   automatic fetching of the image, this should be ``True``.
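
A short sketch of building requests, assuming placeholder URLs and a made-up ``User-Agent`` value; nothing is sent over the network.  Supplying *data* turns the request into a ``POST``.  Header names are stored capitalized (``User-agent``), so that is the spelling used to read the value back.  The Python 3 fallback imports are an assumption of the sketch.

```python
try:  # Python 2
    from urllib2 import Request
    from urllib import urlencode
except ImportError:  # Python 3
    from urllib.request import Request
    from urllib.parse import urlencode

# Form fields in application/x-www-form-urlencoded format.  A list of
# 2-tuples keeps the field order stable.  Encoding gives bytes on Python 3
# and leaves the str unchanged on Python 2.
data = urlencode([('user', 'alice'), ('lang', 'en')]).encode('ascii')

# Placeholder URL and User-Agent string; nothing is sent.
post_req = Request('http://example.com/login', data,
                   headers={'User-Agent': 'my-script/0.1'})
get_req = Request('http://example.com/login')

print('%s %s' % (post_req.get_method(), get_req.get_method()))
```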


.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together.  It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor([cookiejar])

   A class to handle HTTP cookies.  *cookiejar*, if given, should be a
   :class:`cookielib.CookieJar` instance; otherwise a new one is created.


.. class:: ProxyHandler([proxies])

   Cause requests to go through a proxy.  If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies.  The default is to read
   the list of proxies from the environment variables
   :envvar:`<protocol>_proxy`.  If no proxy environment variables are set, then
   in a Windows environment proxy settings are obtained from the registry's
   Internet Settings section, and in a Mac OS X environment proxy information
   is retrieved from the OS X System Configuration Framework.

   To disable proxy autodetection, pass an empty dictionary.

   .. note::

      ``HTTP_PROXY`` will be ignored if a variable ``REQUEST_METHOD`` is set;
      see the documentation on :func:`~urllib.getproxies`.


.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.  A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: AbstractBasicAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyBasicAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyDigestAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler([debuglevel[, context]])

   A class to handle opening of HTTPS URLs. *context* has the same meaning as
   for :class:`httplib.HTTPSConnection`.

   .. versionchanged:: 2.7.9
      *context* added.


.. class:: FileHandler()

   Open local files.


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. class:: HTTPErrorProcessor()

   Process HTTP error responses.


.. _request-objects:

Request Objects
---------------

The following methods describe all of :class:`Request`'s public interface, and
so all may be overridden in subclasses.


.. method:: Request.add_data(data)

   Set the :class:`Request` data to *data*.  This is ignored by all handlers except
   HTTP handlers --- and there it should be a byte string, and will change the
   request to be ``POST`` rather than ``GET``.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method.  This is only meaningful for
   HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.


.. method:: Request.has_data()

   Return whether the instance has non-\ ``None`` data.


.. method:: Request.get_data()

   Return the instance's data.


.. method:: Request.add_header(key, val)

   Add another header to the request.  Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server.  Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.

   .. versionadded:: 2.4


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).

   .. versionadded:: 2.4
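
The following sketch exercises the header methods on a request that is never sent.  It relies on the overwrite behavior described for :meth:`add_header` (names are normalized with :meth:`str.capitalize`, so ``'accept'`` and ``'Accept'`` collide) and on the fact that unredirected headers are kept apart from the request's ``headers`` dictionary.  The URL and token value are placeholders, and the Python 3 fallback import is an assumption of the sketch.

```python
try:  # Python 2
    from urllib2 import Request
except ImportError:  # Python 3
    from urllib.request import Request

req = Request('http://example.com/')   # placeholder URL; never sent

# Header names pass through str.capitalize(), so both calls set the same
# header and the second value replaces the first.
req.add_header('Accept', 'text/html')
req.add_header('accept', 'application/json')

# Kept apart from req.headers, so it is not copied onto a redirected request.
req.add_unredirected_header('Authorization', 'Bearer placeholder-token')

print(sorted(req.header_items()))
```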


.. method:: Request.get_full_url()

   Return the URL given in the constructor.


.. method:: Request.get_type()

   Return the type of the URL --- also known as the scheme.


.. method:: Request.get_host()

   Return the host to which a connection will be made.


.. method:: Request.get_selector()

   Return the selector --- the part of the URL that is sent to the server.


.. method:: Request.get_header(header_name, default=None)

   Return the value of the given header. If the header is not present, return
   the default value.


.. method:: Request.header_items()

   Return a list of ``(header_name, header_value)`` tuples for the Request headers.


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.


.. method:: Request.get_origin_req_host()

   Return the request-host of the origin transaction, as defined by :rfc:`2965`.
   See the documentation for the :class:`Request` constructor.


.. method:: Request.is_unverifiable()

   Return whether the request is unverifiable, as defined by :rfc:`2965`.  See the
   documentation for the :class:`Request` constructor.


.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`.  The handler is
   searched for methods with the following names, and any that are found are
   added to the possible chains (note that HTTP errors are a special case).

   * :samp:`{protocol}_open` --- signal that the handler knows how to open
     *protocol* URLs.

   * :samp:`http_error_{type}` --- signal that the handler knows how to handle
     HTTP errors with HTTP error code *type*.

   * :samp:`{protocol}_error` --- signal that the handler knows how to handle
     errors from (non-\ ``http``) *protocol*.

   * :samp:`{protocol}_request` --- signal that the handler knows how to
     pre-process *protocol* requests.

   * :samp:`{protocol}_response` --- signal that the handler knows how to
     post-process *protocol* responses.


.. method:: OpenerDirector.open(url[, data][, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*.  Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`).  The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used).  The timeout feature only applies to HTTP,
   HTTPS and FTP connections.

   .. versionchanged:: 2.6
      *timeout* was added.


.. method:: OpenerDirector.error(proto[, arg[, ...]])

   Handle an error of the given protocol.  This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific).  The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`http_error_\*`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.

OpenerDirector objects open URLs in three stages.  The order in which the
handler methods are called within each stage is determined by sorting the
handler instances (by their :attr:`handler_order` attribute).

#. Every handler with a method named like :samp:`{protocol}_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :samp:`{protocol}_open` are called to handle
   the request.  This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`.  If all such methods return :const:`None`, the
   algorithm is repeated for methods named like :samp:`{protocol}_open`.  If all
   such methods return :const:`None`, the algorithm is repeated for methods
   named :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and
   :meth:`~OpenerDirector.error` methods.

#. Every handler with a method named like :samp:`{protocol}_response` has that
   method called to post-process the response.
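
The three stages can be traced with a small handler.  ``TracingProcessor`` and the ``echo:`` scheme are hypothetical: the handler defines :samp:`{protocol}_request`, :samp:`{protocol}_open` and :samp:`{protocol}_response` methods for that scheme and records the order in which :class:`OpenerDirector` calls them, with no network access.  The Python 3 fallback import is an assumption of the sketch.

```python
import io

try:  # Python 2
    import urllib2 as request
except ImportError:  # Python 3
    import urllib.request as request


class TracingProcessor(request.BaseHandler):
    """Hypothetical handler for a made-up 'echo:' scheme that takes part in
    all three stages and records the order in which they run."""

    def __init__(self):
        self.calls = []

    def echo_request(self, req):              # stage 1: pre-process
        self.calls.append('request')
        req.add_header('X-trace', 'on')       # hypothetical header
        return req

    def echo_open(self, req):                 # stage 2: open
        self.calls.append('open')
        return io.BytesIO(req.get_header('X-trace').encode('ascii'))

    def echo_response(self, req, response):   # stage 3: post-process
        self.calls.append('response')
        return response


tracer = TracingProcessor()
opener = request.build_opener(tracer)
body = opener.open('echo:anything').read()

print(tracer.calls)
```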


.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes.  These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following attributes and methods should only be used by classes derived from
:class:`BaseHandler`.

.. note::

   The convention has been adopted that subclasses defining
   :meth:`protocol_request` or :meth:`protocol_response` methods are named
   :class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`.  It should return a file-like object as described for
   the return value of :meth:`OpenerDirector.open`, or ``None``.
   It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
   example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).

   This method will be called before any protocol-specific open method.
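
Because :meth:`default_open` runs before any protocol-specific open method, it can serve as a test double that answers every URL.  The ``StubHandler`` below is a hypothetical sketch of that idea: once it is in the opener, neither :class:`FileHandler` nor :class:`FTPHandler` is reached, so even a missing local file and an FTP URL return the canned body without touching the disk or the network.

```python
import io

try:  # Python 2
    import urllib2 as request
except ImportError:  # Python 3
    import urllib.request as request


class StubHandler(request.BaseHandler):
    """Hypothetical test double that answers every URL with canned bytes."""

    def __init__(self, body):
        self.body = body
        self.seen = []

    def default_open(self, req):
        # Called before any <protocol>_open method, so FileHandler and
        # FTPHandler never see these requests.
        self.seen.append(req.get_full_url())
        return io.BytesIO(self.body)


stub = StubHandler(b'stubbed')
opener = request.build_opener(stub)

file_body = opener.open('file:///no/such/file').read()
ftp_body = opener.open('ftp://ftp.example.com/pub/file.txt').read()

print(stub.seen)
```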


.. method:: BaseHandler.protocol_open(req)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for :meth:`default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open them.

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`.  Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   override it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors.  It will be called automatically by the :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.


.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code.  This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.

   Subclasses should override this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`http_error_default`.
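
The sketch below shows the dispatch on the status code without a real server: it calls :meth:`OpenerDirector.error` directly with simulated arguments, as :class:`HTTPErrorProcessor` does for a real error response.  The hypothetical ``NotFoundFallback`` handler defines only ``http_error_404``, so a 500 response falls through to :meth:`http_error_default`, which in the default opener raises :exc:`HTTPError`.  The URL is a placeholder and the Python 3 fallback import is an assumption of the sketch.

```python
import io

try:  # Python 2
    import urllib2 as request
except ImportError:  # Python 3
    import urllib.request as request


class NotFoundFallback(request.BaseHandler):
    """Hypothetical handler that answers any 404 with a fallback body."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        return io.BytesIO(b'fallback for ' + req.get_full_url().encode('ascii'))


opener = request.build_opener(NotFoundFallback)
req = request.Request('http://example.com/missing')   # placeholder URL

# Simulate an error response instead of contacting a server.  For 'http',
# OpenerDirector.error() looks for http_error_<code> first, then falls back
# to http_error_default.
result = opener.error('http', req, io.BytesIO(b''), 404, 'Not Found', {})
body = result.read()

unhandled_code = None
try:
    opener.error('http', req, io.BytesIO(b''), 500, 'Server Error', {})
except request.HTTPError as e:
    unhandled_code = e.code   # no http_error_500, so HTTPDefaultErrorHandler raised

print(body)
```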


.. method:: BaseHandler.protocol_request(req)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. method:: BaseHandler.protocol_response(req, response)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`.  The
   return value should implement the same interface as the return value of
   :func:`urlopen`.


.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code.  If this
   is the case, :exc:`HTTPError` is raised.  See :rfc:`2616` for details of the
   precise meanings of the various redirection codes.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server.  If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*.  Otherwise, raise :exc:`HTTPError` if no other handler
   should try to handle this URL, or return ``None`` if you can't but another
   handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user.  In reality, browsers
      do allow automatic redirection of these responses, changing the ``POST`` to a
      ``GET``, and the default implementation reproduces this behavior.
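
The default behavior described in the note can be checked by calling :meth:`redirect_request` directly, with no server involved.  The URLs are placeholders, and *fp* is passed as ``None`` because the default implementation only forwards it to :exc:`HTTPError`.  A 302 response to a ``POST`` yields a new ``GET`` request without the body-related headers, while a 307 response to a ``POST`` is refused with :exc:`HTTPError`.  The Python 3 fallback import is an assumption of the sketch.

```python
try:  # Python 2
    import urllib2 as request
except ImportError:  # Python 3
    import urllib.request as request

handler = request.HTTPRedirectHandler()

# A POST with form data and a Content-Type header (URLs are placeholders).
post = request.Request('http://example.com/form', b'a=1',
                       headers={'Content-Type':
                                'application/x-www-form-urlencoded'})

# 302 in response to a POST: the default implementation follows it,
# turning the request into a GET and dropping the body-related headers.
new_req = handler.redirect_request(post, None, 302, 'Found', {},
                                   'http://example.com/done')

# 307 in response to a POST: not followed automatically, HTTPError instead.
refused_code = None
try:
    handler.redirect_request(post, None, 307, 'Temporary Redirect', {},
                             'http://example.com/done')
except request.HTTPError as e:
    refused_code = e.code

print('%s %s' % (new_req.get_method(), new_req.get_full_url()))
```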


.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` or ``URI:`` URL.  This method is called by the
   parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.


.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

.. versionadded:: 2.4

:class:`HTTPCookieProcessor` instances have one attribute:


.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`cookielib.CookieJar` in which cookies are stored.
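
A minimal sketch of wiring a cookie jar into an opener.  Responses fetched through ``opener`` would store their cookies in ``jar``; no request is made here.  On Python 3 :mod:`cookielib` is named :mod:`http.cookiejar`, which the fallback import assumes.

```python
try:  # Python 2
    import urllib2 as request
    import cookielib
except ImportError:  # Python 3: cookielib was renamed http.cookiejar
    import urllib.request as request
    import http.cookiejar as cookielib

jar = cookielib.CookieJar()
processor = request.HTTPCookieProcessor(jar)
opener = request.build_opener(processor)
# Cookies from responses fetched with opener.open(...) would be stored in
# `jar` and sent back on later matching requests; nothing is fetched here.

default_processor = request.HTTPCookieProcessor()   # creates its own jar

print(len(jar))
```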


.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.protocol_open(request)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   The :class:`ProxyHandler` will have a method :samp:`{protocol}_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor.  The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.
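
A short sketch of these per-protocol methods: a :class:`ProxyHandler` built from
a dictionary gains one :samp:`{protocol}_open` method per key, while one built
from an empty dictionary gains none, and passing that one to
:func:`build_opener` stops the opener from using the ``*_proxy`` environment
variables.  ``proxy.example.com`` is a placeholder, and the ``ImportError``
fallback is an addition for Python 3::

   try:
       import urllib2                      # Python 2
   except ImportError:                     # assumption: Python 3 fallback
       import urllib.request as urllib2

   # Placeholder proxy URL; no connection is made in this sketch.
   handler = urllib2.ProxyHandler({'http': 'http://proxy.example.com:3128/'})
   print(hasattr(handler, 'http_open'))   # True: 'http' has a proxy
   print(hasattr(handler, 'ftp_open'))    # False: no 'ftp' entry

   # An explicitly empty mapping means "no proxies".  Passing it to
   # build_opener replaces the default ProxyHandler, which would read
   # the environment.
   no_proxy = urllib2.ProxyHandler({})
   opener = urllib2.build_opener(no_proxy)
   print(hasattr(no_proxy, 'http_open'))  # False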


.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication is requested for *realm* and for a
   URI that is one of the given URIs or lies below one of them.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get the user/password for the given realm and URI, if any.  This method will
   return ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.
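
A sketch of both lookup rules, using :class:`HTTPPasswordMgrWithDefaultRealm`
with the catch-all realm ``None`` and a plain :class:`HTTPPasswordMgr` that needs
an exact realm match.  The host, paths, realm names and credentials are
placeholders; the ``ImportError`` fallback is an addition for Python 3::

   try:
       import urllib2                      # Python 2
   except ImportError:                     # assumption: Python 3 fallback
       import urllib.request as urllib2

   mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
   # Realm None: use these credentials whatever realm the server names.
   mgr.add_password(None, 'http://www.example.com/protected/', 'klem', 'secret')

   # A URI below the registered one matches, even for an unknown realm...
   found = mgr.find_user_password('Some Realm',
                                  'http://www.example.com/protected/page.html')
   # ...but a URI outside the registered path does not.
   missing = mgr.find_user_password('Some Realm',
                                    'http://www.example.com/public/')
   print(found)      # ('klem', 'secret')
   print(missing)    # (None, None)

   # A plain HTTPPasswordMgr only answers for the realm it was given.
   strict = urllib2.HTTPPasswordMgr()
   strict.add_password('PDQ Application', 'http://www.example.com/',
                       'klem', 'secret')
   same_realm = strict.find_user_password('PDQ Application',
                                          'http://www.example.com/page')
   other_realm = strict.find_user_password('Other Realm',
                                           'http://www.example.com/page')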


.. _abstract-basic-auth-handler:

AbstractBasicAuthHandler Objects
--------------------------------


.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair, and retrying
   the request.  *authreq* should be the name of the header where the information
   about the realm is included in the response, *host* specifies the URL and path to
   authenticate for, *req* should be the (failed) :class:`Request` object, and
   *headers* should be the error headers.

   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``). In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).


.. _http-basic-auth-handler:

HTTPBasicAuthHandler Objects
----------------------------


.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information about the realm
   is included in the response, *host* should be the host to authenticate to, *req*
   should be the (failed) :class:`Request` object, and *headers* should be the
   error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.
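
A sketch of sharing one password manager between the Basic and Digest
handlers, so the same credentials answer whichever kind of ``401`` challenge a
server sends.  It relies on each handler accepting a password manager in its
constructor and exposing that manager's ``add_password``, as the Basic
authentication example later on this page does.  The URL and credentials are
placeholders, nothing is fetched, and the ``ImportError`` fallback is an
addition for Python 3::

   try:
       import urllib2                      # Python 2
   except ImportError:                     # assumption: Python 3 fallback
       import urllib.request as urllib2

   mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
   basic = urllib2.HTTPBasicAuthHandler(mgr)
   digest = urllib2.HTTPDigestAuthHandler(mgr)

   # add_password on either handler stores into the shared manager.
   basic.add_password(None, 'https://www.example.com/', 'klem', 'secret')

   # Both handlers go into one opener; the one whose scheme matches the
   # server's authentication challenge retries the request.
   opener = urllib2.build_opener(basic, digest)

   creds = mgr.find_user_password('Any Realm', 'https://www.example.com/app/')
   print(creds)    # ('klem', 'secret')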


.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on
   ``req.has_data()``.
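
Whether the handler sends ``GET`` or ``POST`` follows from the
:class:`Request` itself: a request with *data* is sent as ``POST``.  A sketch
that builds both kinds without sending them; the URL and form fields are
placeholders, and the ``ImportError`` fallbacks are additions for Python 3
(where *data* must be bytes, hence the ``encode`` call)::

   try:
       import urllib2                      # Python 2
       from urllib import urlencode
   except ImportError:                     # assumption: Python 3 fallback
       import urllib.request as urllib2
       from urllib.parse import urlencode

   url = 'http://www.example.com/cgi-bin/register.cgi'   # placeholder

   get_req = urllib2.Request(url)
   body = urlencode([('name', 'Somebody Here'),
                     ('language', 'Python')]).encode('ascii')
   post_req = urllib2.Request(url, data=body)

   print(get_req.get_method())     # GET
   print(post_req.get_method())    # POST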


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally if there is no host name or the host name is
   ``'localhost'``. Otherwise, change the protocol to ``ftp`` and retry opening it
   using :attr:`parent`.
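
A sketch of the no-host-name case: a ``file:`` URL with an empty host is read
straight from the local file system.  It assumes a POSIX-style absolute path,
so percent-quoting the path and prefixing ``file://`` yields a ``file:///...``
URL; the temporary file exists only for the example, and the ``ImportError``
fallback is an addition for Python 3::

   import os
   import tempfile

   try:
       import urllib2                      # Python 2
       from urllib import quote
   except ImportError:                     # assumption: Python 3 fallback
       import urllib.request as urllib2
       from urllib.parse import quote

   # Write a small file to read back through a file: URL.
   fd, path = tempfile.mkstemp(suffix='.txt')
   with os.fdopen(fd, 'wb') as f:
       f.write(b'hello from a local file\n')

   # POSIX assumption: /tmp/x.txt becomes file:///tmp/x.txt (empty host).
   url = 'file://' + quote(os.path.abspath(path))
   resp = urllib2.urlopen(url)
   try:
       content = resp.read()
   finally:
       resp.close()
       os.remove(path)

   print(content)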


.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. The login is always done with an empty
   username and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set the timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set the maximum number of cached connections to *m*.
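
A sketch of configuring the cache before handing the handler to
:func:`build_opener`.  Because :class:`CacheFTPHandler` is a subclass of
:class:`FTPHandler`, passing it keeps :func:`build_opener` from adding its own
default :class:`FTPHandler`.  The values are arbitrary, nothing is fetched, and
the ``ImportError`` fallback is an addition for Python 3::

   try:
       import urllib2                      # Python 2
   except ImportError:                     # assumption: Python 3 fallback
       import urllib.request as urllib2

   cache = urllib2.CacheFTPHandler()
   cache.setTimeout(30)     # connection timeout of 30 seconds
   cache.setMaxConns(4)     # at most 4 cached connections

   # ftp: URLs opened through this opener reuse cached connections.
   opener = urllib2.build_opener(cache)

   # opener.handlers is the opener's internal handler list.
   ftp_handlers = [h for h in opener.handlers
                   if isinstance(h, urllib2.FTPHandler)]
   print(len(ftp_handlers))   # 1: only the caching handler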


.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open()

   Raise a :exc:`URLError` exception.
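
A sketch that triggers this fallback with a made-up scheme that no installed
handler understands.  ``nosuchscheme`` is hypothetical, :class:`ProxyHandler`
is given an empty dictionary so environment proxy settings cannot interfere,
and the ``ImportError`` fallback is an addition for Python 3::

   try:
       import urllib2                      # Python 2
   except ImportError:                     # assumption: Python 3 fallback
       import urllib.request as urllib2

   opener = urllib2.build_opener(urllib2.ProxyHandler({}))

   try:
       # No handler defines nosuchscheme_open, so the request falls
       # through to UnknownHandler.unknown_open.
       opener.open('nosuchscheme://www.example.com/')
       reason = None
   except urllib2.URLError as e:
       reason = e.reason

   print(reason)    # unknown url type: nosuchscheme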


.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. versionadded:: 2.4


.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For 2xx (success) status codes, the response object is returned immediately.

   For other status codes, this simply passes the job on to the
   :samp:`http_error_{nnn}` handler methods, via
   :meth:`OpenerDirector.error`.  Eventually,
   :class:`urllib2.HTTPDefaultErrorHandler` will raise an :exc:`HTTPError` if no
   other handler handles the error.

.. method:: HTTPErrorProcessor.https_response(request, response)

   Process HTTPS error responses.

   The behavior is the same as :meth:`http_response`.
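
A sketch of this processing without a network.  A hypothetical
``FakeHTTPHandler`` (a subclass of :class:`HTTPHandler`, so
:func:`build_opener` uses it instead of the real one) answers from a fixed
table, and a hypothetical ``FakeResponse`` stands in for the server's reply.  A
``200`` response comes back unchanged, while a ``404`` ends in
:exc:`HTTPError`.  The ``ImportError`` fallback is an addition for Python 3::

   import io

   try:
       import urllib2                      # Python 2
   except ImportError:                     # assumption: Python 3 fallback
       import urllib.request as urllib2


   class FakeResponse(io.BytesIO):
       """Hypothetical stand-in for an HTTP response object."""

       def __init__(self, url, code, msg, body=b''):
           io.BytesIO.__init__(self, body)
           self.url, self.code, self.msg = url, code, msg

       def info(self):
           return {}               # no response headers in this sketch

       def geturl(self):
           return self.url

       def getcode(self):
           return self.code


   class FakeHTTPHandler(urllib2.HTTPHandler):
       """Hypothetical handler that never touches the network."""

       routes = {'http://www.example.com/ok': (200, 'OK'),
                 'http://www.example.com/missing': (404, 'Not Found')}

       def http_open(self, req):
           url = req.get_full_url()
           code, msg = self.routes[url]
           return FakeResponse(url, code, msg, body=b'payload')


   opener = urllib2.build_opener(urllib2.ProxyHandler({}), FakeHTTPHandler())

   resp = opener.open('http://www.example.com/ok')
   print(resp.getcode())      # 200: passed through unchanged

   try:
       opener.open('http://www.example.com/missing')
       error_code = None
   except urllib2.HTTPError as e:
       error_code = e.code    # raised by HTTPDefaultErrorHandler
   print(error_code)          # 404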


.. _urllib2-examples:

Examples
--------

In addition to the examples below, more examples are given in
:ref:`urllib-howto`.

This example gets the python.org main page and displays the first 100 bytes of
it::

   >>> import urllib2
   >>> f = urllib2.urlopen('http://www.python.org/')
   >>> print f.read(100)
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   <?xml-stylesheet href="./css/ht2html

Here we are sending a data stream to the stdin of a CGI script and reading the
data it returns to us. Note that this example will only work when the Python
installation supports SSL. ::

   >>> import urllib2
   >>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data='This data is passed to stdin of the CGI')
   >>> f = urllib2.urlopen(req)
   >>> print f.read()
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print 'Content-type: text/plain\n\nGot Data: "%s"' % data

Use of Basic HTTP Authentication::

   import urllib2
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib2.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib2.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib2.install_opener(opener)
   # The credentials are only sent for the URI registered above (or below it).
   urllib2.urlopen('https://mahler:8092/site-updates.py')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`.  By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved.  For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   import urllib2

   proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib2
   req = urllib2.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   # Customize the default User-Agent header value:
   req.add_header('User-Agent', 'urllib-example/0.1 (Contact: . . .)')
   r = urllib2.urlopen(req)

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`.  To change this::

   import urllib2
   opener = urllib2.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')
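
A sketch of how :class:`Request` stores header names: each name is passed
through ``str.capitalize()``, so ``'User-Agent'`` is kept as ``'User-agent'``,
the spelling used with ``addheaders`` above.  The header values are
placeholders, nothing is sent, and the ``ImportError`` fallback is an addition
for Python 3::

   try:
       import urllib2                      # Python 2
   except ImportError:                     # assumption: Python 3 fallback
       import urllib.request as urllib2

   req = urllib2.Request('http://www.example.com/',
                         headers={'User-Agent': 'urllib-example/0.1'})
   req.add_header('referer', 'http://www.python.org/')

   print(req.get_header('User-agent'))   # urllib-example/0.1
   print(req.get_header('User-Agent'))   # None: lookups use the stored name
   print(req.get_header('Referer'))      # http://www.python.org/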

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).
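
A sketch that makes these headers visible without sending anything.  It calls
the handler's ``http_request`` preprocessing hook directly, which
:meth:`OpenerDirector.open` normally does for you; application code does not
usually need to.  The handler is first attached to an opener because the hook
also copies the opener's ``addheaders`` (such as :mailheader:`User-Agent`).  The
URL and body are placeholders, and the ``ImportError`` fallback is an addition
for Python 3::

   try:
       import urllib2                      # Python 2
   except ImportError:                     # assumption: Python 3 fallback
       import urllib.request as urllib2

   handler = urllib2.HTTPHandler()
   # Attaching the handler sets handler.parent, which http_request reads.
   urllib2.build_opener(urllib2.ProxyHandler({}), handler)

   req = urllib2.Request('http://www.example.com/submit', data=b'name=test')
   print(req.has_header('Content-type'))    # False before preprocessing

   req = handler.http_request(req)
   for name in ('Content-type', 'Content-length', 'Host', 'User-agent'):
       print('%s: %s' % (name, req.get_header(name)))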