:mod:`urllib2` --- extensible library for opening URLs
======================================================

.. module:: urllib2
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>


.. note::
   The :mod:`urllib2` module has been split across several modules in
   Python 3 named :mod:`urllib.request` and :mod:`urllib.error`.
   The :term:`2to3` tool will automatically adapt imports when converting
   your sources to Python 3.


The :mod:`urllib2` module defines functions and classes which help in opening
URLs (mostly HTTP) in a complex world --- basic and digest authentication,
redirections, cookies and more.

.. seealso::

   The `Requests package <http://requests.readthedocs.org/>`_
   is recommended for a higher-level HTTP client interface.


The :mod:`urllib2` module defines the following functions:


.. function:: urlopen(url[, data[, timeout[, cafile[, capath[, cadefault[, context]]]]]])

   Open the URL *url*, which can be either a string or a :class:`Request` object.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format. The :mod:`urllib2` module sends HTTP/1.1
   requests with the ``Connection:close`` header included.

   The optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout applies only to HTTP, HTTPS and
   FTP connections.

   If *context* is specified, it must be a :class:`ssl.SSLContext` instance
   describing the various SSL options. See :class:`~httplib.HTTPSConnection` for
   more details.

   The optional *cafile* and *capath* parameters specify a set of trusted CA
   certificates for HTTPS requests. *cafile* should point to a single file
   containing a bundle of CA certificates, whereas *capath* should point to a
   directory of hashed certificate files. More information can be found in
   :meth:`ssl.SSLContext.load_verify_locations`.

   The *cadefault* parameter is ignored.

   This function returns a file-like object with three additional methods:

   * :meth:`geturl` --- return the URL of the resource retrieved, commonly used to
     determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers,
     in the form of an :class:`mimetools.Message` instance
     (see `Quick Reference to HTTP Headers <https://www.cs.tut.fi/~jkorpela/http.html>`_)

   * :meth:`getcode` --- return the HTTP status code of the response.

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though the
   default global :class:`OpenerDirector` uses :class:`UnknownHandler` to
   ensure this never happens).

   In addition, if proxy settings are detected (for example, when a ``*_proxy``
   environment variable like :envvar:`http_proxy` is set),
   :class:`ProxyHandler` is installed by default and makes sure the requests are
   handled through the proxy.

   .. versionchanged:: 2.6
      *timeout* was added.

   .. versionchanged:: 2.7.9
      *cafile*, *capath*, *cadefault*, and *context* were added.


.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want :func:`urlopen` to use that
   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
   :func:`urlopen`. The code does not check for a real :class:`OpenerDirector`,
   and any class with the appropriate interface will work.


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters). Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler` (if proxy
   settings are detected),
   :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
   :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
   :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module can be imported),
   :class:`HTTPSHandler` will also be added.

   Beginning in Python 2.3, a :class:`BaseHandler` subclass may also change its
   :attr:`handler_order` attribute to modify its position in the handlers
   list.

The following exceptions are raised as appropriate:


.. exception:: URLError

   The handlers raise this exception (or derived exceptions) when they run into a
   problem. It is a subclass of :exc:`IOError`.

   .. attribute:: reason

      The reason for this error. It can be a message string or another exception
      instance (:exc:`socket.error` for remote URLs, :exc:`OSError` for local
      URLs).


.. exception:: HTTPError

   Although it is an exception (a subclass of :exc:`URLError`), an :exc:`HTTPError`
   can also function as a non-exceptional file-like return value (the same thing
   that :func:`urlopen` returns). This is useful when handling exotic HTTP
   errors, such as requests for authentication.

   .. attribute:: code

      An HTTP status code as defined in `RFC 2616 <http://www.faqs.org/rfcs/rfc2616.html>`_.
      This numeric value corresponds to a value found in the dictionary of
      codes as found in :attr:`BaseHTTPServer.BaseHTTPRequestHandler.responses`.

   .. attribute:: reason

      The reason for this error. It can be a message string or another exception
      instance.

The following classes are provided:


.. class:: Request(url[, data][, headers][, origin_req_host][, unverifiable])

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed. Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided. *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format. The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format.

   *headers* should be a dictionary, and will be treated as if :meth:`add_header`
   was called with each key and value as arguments. This is often used to "spoof"
   the ``User-Agent`` header value, which is used by a browser to identify itself --
   some HTTP servers only allow requests coming from common browsers as opposed
   to scripts. For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib2`'s
   default user agent string is ``"Python-urllib/2.6"`` (on Python 2.6).

   The final two arguments are only of interest for correct handling of third-party
   HTTP cookies:

   *origin_req_host* should be the request-host of the origin transaction, as
   defined by :rfc:`2965`. It defaults to ``cookielib.request_host(self)``. This
   is the host name or IP address of the original request that was initiated by the
   user. For example, if the request is for an image in an HTML document, this
   should be the request-host of the request for the page containing the image.

   *unverifiable* should indicate whether the request is unverifiable, as defined
   by RFC 2965. It defaults to ``False``. An unverifiable request is one whose URL
   the user did not have the option to approve. For example, if the request is for
   an image in an HTML document, and the user had no option to approve the
   automatic fetching of the image, this should be true.


.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor([cookiejar])

   A class to handle HTTP Cookies.


.. class:: ProxyHandler([proxies])

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read
   the list of proxies from the environment variables
   :envvar:`<protocol>_proxy`. If no proxy environment variables are set, then
   in a Windows environment proxy settings are obtained from the registry's
   Internet Settings section, and in a Mac OS X environment proxy information
   is retrieved from the OS X System Configuration Framework.

   To disable autodetected proxies, pass an empty dictionary.

   .. note::

      ``HTTP_PROXY`` will be ignored if a variable ``REQUEST_METHOD`` is set;
      see the documentation on :func:`~urllib.getproxies`.


.. class:: HTTPPasswordMgr()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: AbstractBasicAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyBasicAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyDigestAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler([debuglevel[, context]])

   A class to handle opening of HTTPS URLs. *context* has the same meaning as
   for :class:`httplib.HTTPSConnection`.

   .. versionchanged:: 2.7.9
      *context* added.


.. class:: FileHandler()

   Open local files.


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. class:: HTTPErrorProcessor()

   Process HTTP error responses.


.. _request-objects:

Request Objects
---------------

The following methods describe all of :class:`Request`'s public interface, and
so all must be overridden in subclasses.


.. method:: Request.add_data(data)

   Set the :class:`Request` data to *data*. This is ignored by all handlers except
   HTTP handlers --- and there it should be a byte string, and will change the
   request to be ``POST`` rather than ``GET``.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method. This is only meaningful for
   HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.


.. method:: Request.has_data()

   Return whether the instance has non-\ ``None`` data.


.. method:: Request.get_data()

   Return the instance's data.


.. method:: Request.add_header(key, val)

   Add another header to the request. Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server. Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.

   .. versionadded:: 2.4


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).

   .. versionadded:: 2.4


.. method:: Request.get_full_url()

   Return the URL given in the constructor.


.. method:: Request.get_type()

   Return the type of the URL --- also known as the scheme.
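
As a rough illustration of the :class:`Request` methods documented in this
section, the following sketch builds a request and inspects it without opening
a connection. The URL, the header value and the variable names are
placeholders, and the fallback import is an assumption added only so that the
sketch also runs where :mod:`urllib2` has been replaced by
:mod:`urllib.request`.

```python
# Minimal sketch: build a Request and inspect it; nothing is sent over the network.
try:
    import urllib2  # Python 2, the module documented here
except ImportError:
    # Assumption: fall back to the Python 3 successor so the sketch stays runnable.
    import urllib.request as urllib2

# Placeholder URL; no connection is made until the request is opened.
req = urllib2.Request('http://www.example.com/search')

full_url = req.get_full_url()            # the URL given in the constructor
method_without_data = req.get_method()   # 'GET' while no data is attached

# add_header() stores the key via str.capitalize(), so look it up in that form.
req.add_header('User-agent', 'example-client/0.1')
has_agent = req.has_header('User-agent')
agent = req.get_header('User-agent')
missing = req.get_header('X-missing', 'fallback')  # default for an absent header
headers = dict(req.header_items())

# Attaching data changes the method reported by get_method() to POST.
post_req = urllib2.Request('http://www.example.com/search', data=b'q=python')
method_with_data = post_req.get_method()
```

Because header names are stored in capitalized form, a lookup such as
``req.has_header('User-Agent')`` would not match the header added above.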


.. method:: Request.get_host()

   Return the host to which a connection will be made.


.. method:: Request.get_selector()

   Return the selector --- the part of the URL that is sent to the server.


.. method:: Request.get_header(header_name, default=None)

   Return the value of the given header. If the header is not present, return
   the default value.


.. method:: Request.header_items()

   Return a list of tuples (header_name, header_value) of the Request headers.


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.


.. method:: Request.get_origin_req_host()

   Return the request-host of the origin transaction, as defined by :rfc:`2965`.
   See the documentation for the :class:`Request` constructor.


.. method:: Request.is_unverifiable()

   Return whether the request is unverifiable, as defined by RFC 2965. See the
   documentation for the :class:`Request` constructor.


.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`. The following
   methods are searched, and added to the possible chains (note that HTTP errors
   are a special case).

   * :samp:`{protocol}_open` --- signal that the handler knows how to open
     *protocol* URLs.

   * :samp:`http_error_{type}` --- signal that the handler knows how to handle
     HTTP errors with HTTP error code *type*.

   * :samp:`{protocol}_error` --- signal that the handler knows how to handle
     errors from (non-\ ``http``) *protocol*.

   * :samp:`{protocol}_request` --- signal that the handler knows how to
     pre-process *protocol* requests.

   * :samp:`{protocol}_response` --- signal that the handler knows how to
     post-process *protocol* responses.


.. method:: OpenerDirector.open(url[, data][, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`). The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout applies only to HTTP, HTTPS and
   FTP connections.

   .. versionchanged:: 2.6
      *timeout* was added.


.. method:: OpenerDirector.error(proto[, arg[, ...]])

   Handle an error of the given protocol. This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific). The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`http_error_\*`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.

OpenerDirector objects open URLs in three stages:

The order in which these methods are called within each stage is determined by
sorting the handler instances.

#. Every handler with a method named like :samp:`{protocol}_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :samp:`{protocol}_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e., a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`. If all such methods return :const:`None`, the
   algorithm is repeated for methods named like :samp:`{protocol}_open`. If all
   such methods return :const:`None`, the algorithm is repeated for methods
   named :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`~OpenerDirector.open` and
   :meth:`~OpenerDirector.error` methods.

#. Every handler with a method named like :samp:`{protocol}_response` has that
   method called to post-process the response.


.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes. These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following attributes and methods should only be used by classes derived from
:class:`BaseHandler`.

.. note::

   The convention has been adopted that subclasses defining
   :meth:`protocol_request` or :meth:`protocol_response` methods are named
   :class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`. It should return a file-like object as described in
   the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
   It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
   example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).

   This method will be called before any protocol-specific open method.


.. method:: BaseHandler.protocol_open(req)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for :meth:`default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open it.

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`. Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   override it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors. It will be called automatically by the :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.
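
The naming convention for :samp:`{protocol}_open` and :meth:`unknown_open`
described above can be sketched with two small handlers. Everything specific
here is hypothetical: the ``echo`` scheme, the class names ``EchoHandler`` and
``CatchAllHandler``, and the returned payloads exist only for illustration, and
the fallback import is an assumption so that the sketch also runs where
:mod:`urllib2` has been replaced by :mod:`urllib.request`. No network access is
involved.

```python
import io

try:
    import urllib2  # Python 2, the module documented here
except ImportError:
    # Assumption: fall back to the Python 3 successor so the sketch stays runnable.
    import urllib.request as urllib2


class EchoHandler(urllib2.BaseHandler):
    """Hypothetical handler for a made-up ``echo`` scheme."""

    def echo_open(self, req):
        # Picked up by OpenerDirector.add_handler() because of the
        # "<protocol>_open" name. Returning a non-None file-like object
        # ends the "open" stage for this request.
        return io.BytesIO(req.get_full_url().encode('ascii'))


class CatchAllHandler(urllib2.BaseHandler):
    """Hypothetical fallback, consulted only if no other handler answered."""

    def unknown_open(self, req):
        return io.BytesIO(b'no handler for this scheme')


# A bare OpenerDirector starts with no handlers, so only ours are consulted.
opener = urllib2.OpenerDirector()
opener.add_handler(EchoHandler())
opener.add_handler(CatchAllHandler())

echoed = opener.open('echo://host/path').read()     # served by echo_open
fallback = opener.open('other://host/path').read()  # falls through to unknown_open
```

Using a bare :class:`OpenerDirector` rather than :func:`build_opener` keeps the
default handlers out of the chain, which makes the lookup order easier to
follow; real code would more likely pass such handlers to :func:`build_opener`.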


.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code. This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.

   Subclasses should override this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`http_error_default`.


.. method:: BaseHandler.protocol_request(req)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. method:: BaseHandler.protocol_response(req, response)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given *protocol*.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`. The
   return value should implement the same interface as the return value of
   :func:`urlopen`.


.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code. If this
   is the case, :exc:`HTTPError` is raised. See :rfc:`2616` for details of the
   precise meanings of the various redirection codes.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server. If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*. Otherwise, raise :exc:`HTTPError` if no other handler
   should try to handle this URL, or return ``None`` if you can't but another
   handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user. In reality, browsers
      do allow automatic redirection of these responses, changing the POST to a
      ``GET``, and the default implementation reproduces this behavior.


.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` or ``URI:`` URL. This method is called by the
   parent :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.


.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

.. versionadded:: 2.4

:class:`HTTPCookieProcessor` instances have one attribute:


.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`cookielib.CookieJar` in which cookies are stored.


.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.protocol_open(request)
   :noindex:

   ("protocol" is to be replaced by the protocol name.)

   The :class:`ProxyHandler` will have a method :samp:`{protocol}_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor. The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.


.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication for *realm* and a super-URI of any of
   the given URIs is given.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get user/password for given realm and URI, if any. This method will return
   ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.


.. _abstract-basic-auth-handler:

AbstractBasicAuthHandler Objects
--------------------------------


.. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   Handle an authentication request by getting a user/password pair, and re-trying
   the request. *authreq* should be the name of the header where the information
   about the realm is included in the request, *host* specifies the URL and path to
   authenticate for, *req* should be the (failed) :class:`Request` object, and
   *headers* should be the error headers.

   *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
   authority component (e.g. ``"http://python.org/"``). In either case, the
   authority must not contain a userinfo component (so, ``"python.org"`` and
   ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).


.. _http-basic-auth-handler:

HTTPBasicAuthHandler Objects
----------------------------


.. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-basic-auth-handler:

ProxyBasicAuthHandler Objects
-----------------------------


.. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _abstract-digest-auth-handler:

AbstractDigestAuthHandler Objects
---------------------------------


.. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

   *authreq* should be the name of the header where the information about the realm
   is included in the request, *host* should be the host to authenticate to, *req*
   should be the (failed) :class:`Request` object, and *headers* should be the
   error headers.


.. _http-digest-auth-handler:

HTTPDigestAuthHandler Objects
-----------------------------


.. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _proxy-digest-auth-handler:

ProxyDigestAuthHandler Objects
------------------------------


.. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

   Retry the request with authentication information, if available.


.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
   using :attr:`parent`.


.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. The login is always done with empty
   username and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.


.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open()

   Raise a :exc:`URLError` exception.


.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. versionadded:: 2.4


.. method:: HTTPErrorProcessor.http_response()

   Process HTTP error responses.

   For 200 error codes, the response object is returned immediately.

   For non-200 error codes, this simply passes the job on to the
   :samp:`{protocol}_error_code` handler methods, via
   :meth:`OpenerDirector.error`. Eventually,
   :class:`urllib2.HTTPDefaultErrorHandler` will raise an :exc:`HTTPError` if no
   other handler handles the error.

.. method:: HTTPErrorProcessor.https_response()

   Process HTTPS error responses.

   The behavior is the same as :meth:`http_response`.


.. _urllib2-examples:

Examples
--------

In addition to the examples below, more examples are given in
:ref:`urllib-howto`.

This example gets the python.org main page and displays the first 100 bytes of
it::

   >>> import urllib2
   >>> f = urllib2.urlopen('http://www.python.org/')
   >>> print f.read(100)
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   <?xml-stylesheet href="./css/ht2html

Here we are sending a data-stream to the stdin of a CGI and reading the data it
returns to us. Note that this example will only work when the Python
installation supports SSL. ::

   >>> import urllib2
   >>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data='This data is passed to stdin of the CGI')
   >>> f = urllib2.urlopen(req)
   >>> print f.read()
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print 'Content-type: text/plain\n\nGot Data: "%s"' % data

Use of Basic HTTP Authentication::

   import urllib2
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib2.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib2.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib2.install_opener(opener)
   urllib2.urlopen('http://www.example.com/login.html')

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically-supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`.
:: 1011 1012 proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'}) 1013 proxy_auth_handler = urllib2.ProxyBasicAuthHandler() 1014 proxy_auth_handler.add_password('realm', 'host', 'username', 'password') 1015 1016 opener = urllib2.build_opener(proxy_handler, proxy_auth_handler) 1017 # This time, rather than install the OpenerDirector, we use it directly: 1018 opener.open('http://www.example.com/login.html') 1019 1020 Adding HTTP headers: 1021 1022 Use the *headers* argument to the :class:`Request` constructor, or:: 1023 1024 import urllib2 1025 req = urllib2.Request('http://www.example.com/') 1026 req.add_header('Referer', 'http://www.python.org/') 1027 # Customize the default User-Agent header value: 1028 req.add_header('User-Agent', 'urllib-example/0.1 (Contact: . . .)') 1029 r = urllib2.urlopen(req) 1030 1031 :class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to 1032 every :class:`Request`. To change this:: 1033 1034 import urllib2 1035 opener = urllib2.build_opener() 1036 opener.addheaders = [('User-agent', 'Mozilla/5.0')] 1037 opener.open('http://www.example.com/') 1038 1039 Also, remember that a few standard headers (:mailheader:`Content-Length`, 1040 :mailheader:`Content-Type` and :mailheader:`Host`) are added when the 1041 :class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`). 1042 1043
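
The *headers* argument to the :class:`Request` constructor can be tried
without any network access. The following is a minimal sketch rather than a
definitive recipe: it builds a :class:`Request` with headers supplied up front
and inspects it before it is opened. The ``try``/``except`` import fallback is
an assumption added only so the sketch also runs where the module is available
as :mod:`urllib.request`, and the header values are illustrative.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # assumption: fallback for Python 3

# Supply headers up front instead of calling add_header() afterwards.
req = urllib2.Request('http://www.example.com/',
                      headers={'Referer': 'http://www.python.org/',
                               'User-Agent': 'urllib-example/0.1'})

# Header names are stored capitalized ("User-agent"), so look them up that way.
referer = req.get_header('Referer')
has_agent = req.has_header('User-agent')

# Nothing has been sent yet, so Host has not been added to the request.
has_host = req.has_header('Host')

print(referer)
print(has_agent)
print(has_host)
```

Because the request is only constructed here and never opened, the standard
headers mentioned above are not yet present on the object.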