.. _urllib-howto:

***********************************************************
  HOWTO Fetch Internet Resources Using The urllib Package
***********************************************************

:Author: `Michael Foord <http://www.voidspace.org.uk/python/index.shtml>`_

.. note::

    There is a French translation of an earlier revision of this
    HOWTO, available at `urllib2 - Le Manuel manquant
    <http://www.voidspace.org.uk/python/articles/urllib2_francais.shtml>`_.



Introduction
============

.. sidebar:: Related Articles

    You may also find useful the following article on fetching web resources
    with Python:

    * `Basic Authentication <http://www.voidspace.org.uk/python/articles/authentication.shtml>`_

        A tutorial on *Basic Authentication*, with examples in Python.

**urllib.request** is a Python module for fetching URLs
(Uniform Resource Locators). It offers a very simple interface, in the form of
the *urlopen* function. This is capable of fetching URLs using a variety of
different protocols. It also offers a slightly more complex interface for
handling common situations - like basic authentication, cookies, proxies and so
on. These are provided by objects called handlers and openers.
urllib.request supports fetching URLs for many "URL schemes" (identified by the
string before the ``":"`` in the URL - for example ``"ftp"`` is the URL scheme of
``"ftp://python.org/"``) using their associated network protocols (e.g. FTP, HTTP).
This tutorial focuses on the most common case, HTTP.

For straightforward situations *urlopen* is very easy to use. But as soon as you
encounter errors or non-trivial cases when opening HTTP URLs, you will need some
understanding of the HyperText Transfer Protocol. The most comprehensive and
authoritative reference to HTTP is :rfc:`2616`. This is a technical document and
not intended to be easy to read. This HOWTO aims to illustrate using *urllib*,
with enough detail about HTTP to help you through. It is not intended to replace
the :mod:`urllib.request` docs, but is supplementary to them.


Fetching URLs
=============

The simplest way to use urllib.request is as follows::

    import urllib.request
    with urllib.request.urlopen('http://python.org/') as response:
        html = response.read()
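
``read()`` returns the raw bytes of the resource. To work with the data as
text you must decode it; a minimal sketch, falling back to UTF-8 when the
server does not declare a charset in its headers::

    import urllib.request

    with urllib.request.urlopen('http://python.org/') as response:
        # the charset, if any, is declared in the Content-Type header
        charset = response.headers.get_content_charset() or 'utf-8'
        html = response.read().decode(charset)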

If you wish to retrieve a resource via URL and store it in a temporary
location, you can do so via the :func:`shutil.copyfileobj` and
:func:`tempfile.NamedTemporaryFile` functions::

    import shutil
    import tempfile
    import urllib.request

    with urllib.request.urlopen('http://python.org/') as response:
        with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
            shutil.copyfileobj(response, tmp_file)

    with open(tmp_file.name) as html:
        pass

Many uses of urllib will be that simple (note that instead of an 'http:' URL we
could have used a URL starting with 'ftp:', 'file:', etc.).  However, it's the
purpose of this tutorial to explain the more complicated cases, concentrating on
HTTP.

HTTP is based on requests and responses - the client makes requests and servers
send responses. urllib.request mirrors this with a ``Request`` object which represents
the HTTP request you are making. In its simplest form you create a Request
object that specifies the URL you want to fetch. Calling ``urlopen`` with this
Request object returns a response object for the URL requested. This response is
a file-like object, which means you can for example call ``.read()`` on the
response::

    import urllib.request

    req = urllib.request.Request('http://www.voidspace.org.uk')
    with urllib.request.urlopen(req) as response:
        the_page = response.read()

Note that urllib.request makes use of the same Request interface to handle all URL
schemes.  For example, you can make an FTP request like so::

    req = urllib.request.Request('ftp://example.com/')
In the case of HTTP, there are two extra things that Request objects allow you
to do: First, you can pass data to be sent to the server.  Second, you can pass
extra information ("metadata") *about* the data or about the request itself, to
the server - this information is sent as HTTP "headers".  Let's look at each of
these in turn.

Data
----

Sometimes you want to send data to a URL (often the URL will refer to a CGI
(Common Gateway Interface) script or other web application). With HTTP,
this is often done using what's known as a **POST** request. This is often what
your browser does when you submit an HTML form that you filled in on the web. Not
all POSTs have to come from forms: you can use a POST to transmit arbitrary data
to your own application. In the common case of HTML forms, the data needs to be
encoded in a standard way, and then passed to the Request object as the ``data``
argument. The encoding is done using a function from the :mod:`urllib.parse`
library. ::

    import urllib.parse
    import urllib.request

    url = 'http://www.someserver.com/cgi-bin/register.cgi'
    values = {'name' : 'Michael Foord',
              'location' : 'Northampton',
              'language' : 'Python' }

    data = urllib.parse.urlencode(values)
    data = data.encode('ascii') # data should be bytes
    req = urllib.request.Request(url, data)
    with urllib.request.urlopen(req) as response:
        the_page = response.read()

Note that other encodings are sometimes required (e.g. for file upload from HTML
forms - see `HTML Specification, Form Submission
<https://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13>`_ for more
details).

If you do not pass the ``data`` argument, urllib uses a **GET** request. One
way in which GET and POST requests differ is that POST requests often have
"side-effects": they change the state of the system in some way (for example by
placing an order with the website for a hundredweight of tinned spam to be
delivered to your door).  Though the HTTP standard makes it clear that POSTs are
intended to *always* cause side-effects, and GET requests *never* to cause
side-effects, nothing prevents a GET request from having side-effects, nor a
POST request from having no side-effects. Data can also be passed in an HTTP
GET request by encoding it in the URL itself.

This is done as follows::

    >>> import urllib.request
    >>> import urllib.parse
    >>> data = {}
    >>> data['name'] = 'Somebody Here'
    >>> data['location'] = 'Northampton'
    >>> data['language'] = 'Python'
    >>> url_values = urllib.parse.urlencode(data)
    >>> print(url_values)  # The order may differ from below.  #doctest: +SKIP
    name=Somebody+Here&language=Python&location=Northampton
    >>> url = 'http://www.example.com/example.cgi'
    >>> full_url = url + '?' + url_values
    >>> data = urllib.request.urlopen(full_url)

Notice that the full URL is created by adding a ``?`` to the URL, followed by
the encoded values.
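
``urlencode`` also takes care of percent-encoding characters that are not safe
to put in a URL, so you do not need to escape the values yourself::

    >>> urllib.parse.urlencode({'q': 'spam & eggs'})
    'q=spam+%26+eggs'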

Headers
-------

We'll discuss here one particular HTTP header, to illustrate how to add headers
to your HTTP request.

Some websites [#]_ dislike being browsed by programs, or send different versions
to different browsers [#]_. By default urllib identifies itself as
``Python-urllib/x.y`` (where ``x`` and ``y`` are the major and minor version
numbers of the Python release,
e.g. ``Python-urllib/2.5``), which may confuse the site, or just plain
not work. The way a browser identifies itself is through the
``User-Agent`` header [#]_. When you create a Request object you can
pass a dictionary of headers in. The following example makes the same
request as above, but identifies itself as a version of Internet
Explorer [#]_. ::

    import urllib.parse
    import urllib.request

    url = 'http://www.someserver.com/cgi-bin/register.cgi'
    user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
    values = {'name': 'Michael Foord',
              'location': 'Northampton',
              'language': 'Python' }
    headers = {'User-Agent': user_agent}

    data = urllib.parse.urlencode(values)
    data = data.encode('ascii')
    req = urllib.request.Request(url, data, headers)
    with urllib.request.urlopen(req) as response:
        the_page = response.read()
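
You can also set or replace headers on an existing ``Request`` with its
``add_header`` method (the URL below is just a placeholder)::

    import urllib.request

    req = urllib.request.Request('http://www.example.com/')
    # a later header with the same name replaces an earlier one
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)')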

The response also has two useful methods. See the section on `info and geturl`_,
which comes after we take a look at what happens when things go wrong.


Handling Exceptions
===================

*urlopen* raises :exc:`URLError` when it cannot handle a response (though as
usual with Python APIs, built-in exceptions such as :exc:`ValueError`,
:exc:`TypeError` etc. may also be raised).

:exc:`HTTPError` is the subclass of :exc:`URLError` raised in the specific case of
HTTP URLs.

The exception classes are exported from the :mod:`urllib.error` module.

URLError
--------

Often, URLError is raised because there is no network connection (no route to
the specified server), or the specified server doesn't exist.  In this case, the
exception raised will have a 'reason' attribute, which is a tuple containing an
error code and a text error message.

e.g. ::

    >>> req = urllib.request.Request('http://www.pretend_server.org')
    >>> try: urllib.request.urlopen(req)
    ... except urllib.error.URLError as e:
    ...     print(e.reason)      #doctest: +SKIP
    ...
    (4, 'getaddrinfo failed')


HTTPError
---------

Every HTTP response from the server contains a numeric "status code". Sometimes
the status code indicates that the server is unable to fulfil the request. The
default handlers will handle some of these responses for you (for example, if
the response is a "redirection" that requests the client fetch the document from
a different URL, urllib will handle that for you). For those it can't handle,
urlopen will raise an :exc:`HTTPError`. Typical errors include '404' (page not
found), '403' (request forbidden), and '401' (authentication required).

See section 10 of :rfc:`2616` for a reference on all the HTTP error codes.

The :exc:`HTTPError` instance raised will have an integer 'code' attribute, which
corresponds to the error sent by the server.

Error Codes
~~~~~~~~~~~

Because the default handlers handle redirects (codes in the 300 range), and
codes in the 100--299 range indicate success, you will usually only see error
codes in the 400--599 range.

:attr:`http.server.BaseHTTPRequestHandler.responses` is a useful dictionary of
response codes that shows all the response codes used by :rfc:`2616`. The
dictionary is reproduced here for convenience ::

    # Table mapping response codes to messages; entries have the
    # form {code: (shortmessage, longmessage)}.
    responses = {
        100: ('Continue', 'Request received, please continue'),
        101: ('Switching Protocols',
              'Switching to new protocol; obey Upgrade header'),

        200: ('OK', 'Request fulfilled, document follows'),
        201: ('Created', 'Document created, URL follows'),
        202: ('Accepted',
              'Request accepted, processing continues off-line'),
        203: ('Non-Authoritative Information', 'Request fulfilled from cache'),
        204: ('No Content', 'Request fulfilled, nothing follows'),
        205: ('Reset Content', 'Clear input form for further input.'),
        206: ('Partial Content', 'Partial content follows.'),

        300: ('Multiple Choices',
              'Object has several resources -- see URI list'),
        301: ('Moved Permanently', 'Object moved permanently -- see URI list'),
        302: ('Found', 'Object moved temporarily -- see URI list'),
        303: ('See Other', 'Object moved -- see Method and URL list'),
        304: ('Not Modified',
              'Document has not changed since given time'),
        305: ('Use Proxy',
              'You must use proxy specified in Location to access this '
              'resource.'),
        307: ('Temporary Redirect',
              'Object moved temporarily -- see URI list'),

        400: ('Bad Request',
              'Bad request syntax or unsupported method'),
        401: ('Unauthorized',
              'No permission -- see authorization schemes'),
        402: ('Payment Required',
              'No payment -- see charging schemes'),
        403: ('Forbidden',
              'Request forbidden -- authorization will not help'),
        404: ('Not Found', 'Nothing matches the given URI'),
        405: ('Method Not Allowed',
              'Specified method is invalid for this server.'),
        406: ('Not Acceptable', 'URI not available in preferred format.'),
        407: ('Proxy Authentication Required', 'You must authenticate with '
              'this proxy before proceeding.'),
        408: ('Request Timeout', 'Request timed out; try again later.'),
        409: ('Conflict', 'Request conflict.'),
        410: ('Gone',
              'URI no longer exists and has been permanently removed.'),
        411: ('Length Required', 'Client must specify Content-Length.'),
        412: ('Precondition Failed', 'Precondition in headers is false.'),
        413: ('Request Entity Too Large', 'Entity is too large.'),
        414: ('Request-URI Too Long', 'URI is too long.'),
        415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
        416: ('Requested Range Not Satisfiable',
              'Cannot satisfy request range.'),
        417: ('Expectation Failed',
              'Expect condition could not be satisfied.'),

        500: ('Internal Server Error', 'Server got itself in trouble'),
        501: ('Not Implemented',
              'Server does not support this operation'),
        502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
        503: ('Service Unavailable',
              'The server cannot process the request due to a high load'),
        504: ('Gateway Timeout',
              'The gateway server did not receive a timely response'),
        505: ('HTTP Version Not Supported', 'Cannot fulfill request.'),
        }
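
The table doubles as a quick lookup from a status code to a short description.
For example (the exact wording may vary between Python versions)::

    >>> from http.server import BaseHTTPRequestHandler
    >>> BaseHTTPRequestHandler.responses[404]
    ('Not Found', 'Nothing matches the given URI')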

When an error is raised the server responds by returning an HTTP error code
*and* an error page. You can use the :exc:`HTTPError` instance as a response
object for the page returned. This means that as well as the code attribute, it
also has read, geturl, and info methods as returned by the ``urllib.response``
module::

    >>> req = urllib.request.Request('http://www.python.org/fish.html')
    >>> try:
    ...     urllib.request.urlopen(req)
    ... except urllib.error.HTTPError as e:
    ...     print(e.code)
    ...     print(e.read())  #doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
    ...
    404
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
      ...
      <title>Page Not Found</title>\n
      ...

Wrapping it Up
--------------

So if you want to be prepared for :exc:`HTTPError` *or* :exc:`URLError` there are two
basic approaches. I prefer the second approach.

Number 1
~~~~~~~~

::

    from urllib.request import Request, urlopen
    from urllib.error import URLError, HTTPError

    req = Request(someurl)
    try:
        response = urlopen(req)
    except HTTPError as e:
        print('The server couldn\'t fulfill the request.')
        print('Error code: ', e.code)
    except URLError as e:
        print('We failed to reach a server.')
        print('Reason: ', e.reason)
    else:
        # everything is fine
        pass


.. note::

    The ``except HTTPError`` *must* come first, otherwise ``except URLError``
    will *also* catch an :exc:`HTTPError`.

Number 2
~~~~~~~~

::

    from urllib.request import Request, urlopen
    from urllib.error import URLError

    req = Request(someurl)
    try:
        response = urlopen(req)
    except URLError as e:
        if hasattr(e, 'reason'):
            print('We failed to reach a server.')
            print('Reason: ', e.reason)
        elif hasattr(e, 'code'):
            print('The server couldn\'t fulfill the request.')
            print('Error code: ', e.code)
    else:
        # everything is fine
        pass


info and geturl
===============

The response returned by urlopen (or the :exc:`HTTPError` instance) has two
useful methods, :meth:`info` and :meth:`geturl`, and is defined in the module
:mod:`urllib.response`.

**geturl** - this returns the real URL of the page fetched. This is useful
because ``urlopen`` (or the opener object used) may have followed a
redirect. The URL of the page fetched may not be the same as the URL requested.

**info** - this returns a dictionary-like object that describes the page
fetched, particularly the headers sent by the server. It is currently an
:class:`http.client.HTTPMessage` instance.

Typical headers include 'Content-length', 'Content-type', and so on. See the
`Quick Reference to HTTP Headers <http://jkorpela.fi/http.html>`_
for a useful listing of HTTP headers with brief explanations of their meaning
and use.
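
For example, a short sketch (the URL is a placeholder, and the headers you see
back will depend on the server)::

    import urllib.request

    with urllib.request.urlopen('http://www.example.com/') as response:
        print(response.geturl())   # the URL actually fetched, after redirects
        print(response.info()['Content-Type'])   # e.g. 'text/html; charset=UTF-8'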


Openers and Handlers
====================

When you fetch a URL you use an opener (an instance of the perhaps
confusingly-named :class:`urllib.request.OpenerDirector`). Normally we have been using
the default opener - via ``urlopen`` - but you can create custom
openers. Openers use handlers. All the "heavy lifting" is done by the
handlers. Each handler knows how to open URLs for a particular URL scheme (http,
ftp, etc.), or how to handle an aspect of URL opening, for example HTTP
redirections or HTTP cookies.

You will want to create openers if you want to fetch URLs with specific handlers
installed, for example to get an opener that handles cookies, or to get an
opener that does not handle redirections.
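
For instance, here is a minimal sketch of an opener that handles cookies, built
from an :class:`http.cookiejar.CookieJar` and the ``HTTPCookieProcessor``
handler (the URL is a placeholder)::

    import http.cookiejar
    import urllib.request

    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar))

    # cookies the server sets are stored in cookie_jar and sent back
    # on subsequent requests made through this opener
    with opener.open('http://www.example.com/') as response:
        the_page = response.read()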

To create an opener, instantiate an ``OpenerDirector``, and then call
``.add_handler(some_handler_instance)`` repeatedly.

Alternatively, you can use ``build_opener``, which is a convenience function for
creating opener objects with a single function call.  ``build_opener`` adds
several handlers by default, but provides a quick way to add more and/or
override the default handlers.

Other sorts of handlers you might want can handle proxies, authentication,
and other common but slightly specialised situations.

``install_opener`` can be used to make an ``opener`` object the (global) default
opener. This means that calls to ``urlopen`` will use the opener you have
installed.

Opener objects have an ``open`` method, which can be called directly to fetch
URLs in the same way as the ``urlopen`` function: there's no need to call
``install_opener``, except as a convenience.


Basic Authentication
====================

To illustrate creating and installing a handler we will use the
``HTTPBasicAuthHandler``. For a more detailed discussion of this subject --
including an explanation of how Basic Authentication works -- see the `Basic
Authentication Tutorial
<http://www.voidspace.org.uk/python/articles/authentication.shtml>`_.

When authentication is required, the server sends a header (as well as the 401
error code) requesting authentication.  This specifies the authentication scheme
and a 'realm'. The header looks like: ``WWW-Authenticate: SCHEME
realm="REALM"``.

e.g.

.. code-block:: none

    WWW-Authenticate: Basic realm="cPanel Users"


The client should then retry the request with the appropriate name and password
for the realm included as a header in the request. This is 'basic
authentication'. In order to simplify this process we can create an instance of
``HTTPBasicAuthHandler`` and an opener to use this handler.
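
Under the hood, basic authentication is just an ``Authorization`` header
containing the base64-encoded ``user:password`` pair. A minimal sketch of doing
it by hand, with made-up credentials and URL::

    import base64
    import urllib.request

    credentials = base64.b64encode(b'joe:password').decode('ascii')
    req = urllib.request.Request('http://example.com/protected/')
    req.add_header('Authorization', 'Basic ' + credentials)

The handler described below does this for you, and only sends the credentials
when the server actually asks for them.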

The ``HTTPBasicAuthHandler`` uses an object called a password manager to handle
the mapping of URLs and realms to passwords and usernames. If you know what the
realm is (from the authentication header sent by the server), then you can use an
``HTTPPasswordMgr``. Frequently one doesn't care what the realm is. In that
case, it is convenient to use ``HTTPPasswordMgrWithDefaultRealm``. This allows
you to specify a default username and password for a URL, which will be supplied
unless you provide an alternative combination for a specific realm. We indicate
this by providing ``None`` as the realm argument to the ``add_password`` method.

The top-level URL is the first URL that requires authentication. URLs "deeper"
than the URL you pass to .add_password() will also match. ::

    # create a password manager
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

    # Add the username and password.
    # If we knew the realm, we could use it instead of None.
    top_level_url = "http://example.com/foo/"
    password_mgr.add_password(None, top_level_url, username, password)

    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

    # create "opener" (OpenerDirector instance)
    opener = urllib.request.build_opener(handler)

    # use the opener to fetch a URL
    opener.open(a_url)

    # Install the opener.
    # Now all calls to urllib.request.urlopen use our opener.
    urllib.request.install_opener(opener)

.. note::

    In the above example we only supplied our ``HTTPBasicAuthHandler`` to
    ``build_opener``. By default openers have the handlers for normal situations
    -- ``ProxyHandler`` (if a proxy setting such as an :envvar:`http_proxy`
    environment variable is set), ``UnknownHandler``, ``HTTPHandler``,
    ``HTTPDefaultErrorHandler``, ``HTTPRedirectHandler``, ``FTPHandler``,
    ``FileHandler``, ``DataHandler``, ``HTTPErrorProcessor``.

``top_level_url`` is in fact *either* a full URL (including the 'http:' scheme
component and the hostname and optionally the port number)
e.g. ``"http://example.com/"`` *or* an "authority" (i.e. the hostname,
optionally including the port number) e.g. ``"example.com"`` or ``"example.com:8080"``
(the latter example includes a port number).  The authority, if present, must
NOT contain the "userinfo" component - for example ``"joe:password@example.com"`` is
not correct.


Proxies
=======

**urllib** will auto-detect your proxy settings and use those. This is through
the ``ProxyHandler``, which is part of the normal handler chain when a proxy
setting is detected.  Normally that's a good thing, but there are occasions
when it may not be helpful [#]_. One way to disable automatic proxy handling is
to set up our own ``ProxyHandler``, with no proxies defined. This is done using
similar steps to setting up a `Basic Authentication`_ handler: ::

    >>> proxy_support = urllib.request.ProxyHandler({})
    >>> opener = urllib.request.build_opener(proxy_support)
    >>> urllib.request.install_opener(opener)
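
Conversely, passing a dictionary mapping schemes to proxy URLs forces requests
through a specific proxy (the proxy address here is invented)::

    >>> proxy_support = urllib.request.ProxyHandler(
    ...     {'http': 'http://proxy.example.com:3128/'})
    >>> opener = urllib.request.build_opener(proxy_support)
    >>> urllib.request.install_opener(opener)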

.. note::

    Currently ``urllib.request`` *does not* support fetching of ``https`` locations
    through a proxy.  However, this can be enabled by extending urllib.request as
    shown in the recipe [#]_.

.. note::

    ``HTTP_PROXY`` will be ignored if a variable ``REQUEST_METHOD`` is set; see
    the documentation on :func:`~urllib.request.getproxies`.


Sockets and Layers
==================

The Python support for fetching resources from the web is layered.  urllib uses
the :mod:`http.client` library, which in turn uses the socket library.

As of Python 2.3 you can specify how long a socket should wait for a response
before timing out. This can be useful in applications which have to fetch web
pages. By default the socket module has *no timeout* and can hang. You can pass
a ``timeout`` argument to ``urlopen`` for an individual request; alternatively,
you can set the default timeout globally for all sockets using ::

    import socket
    import urllib.request

    # timeout in seconds
    timeout = 10
    socket.setdefaulttimeout(timeout)

    # this call to urllib.request.urlopen now uses the default timeout
    # we have set in the socket module
    req = urllib.request.Request('http://www.voidspace.org.uk')
    response = urllib.request.urlopen(req)
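
Or, to set a timeout for a single request without touching the global socket
default::

    import urllib.request

    # raises urllib.error.URLError if the server does not respond in time
    response = urllib.request.urlopen('http://www.voidspace.org.uk', timeout=10)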


-------


Footnotes
=========

This document was reviewed and revised by John Lee.

.. [#] Google for example.
.. [#] Browser sniffing is a very bad practice for website design - building
       sites using web standards is much more sensible. Unfortunately a lot of
       sites still send different versions to different browsers.
.. [#] The user agent for MSIE 6 is
       *'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)'*
.. [#] For details of more HTTP request headers, see
       `Quick Reference to HTTP Headers`_.
.. [#] In my case I have to use a proxy to access the internet at work. If you
       attempt to fetch *localhost* URLs through this proxy it blocks them. IE
       is set to use the proxy, which urllib picks up on. In order to test
       scripts with a localhost server, I have to prevent urllib from using
       the proxy.
.. [#] urllib opener for SSL proxy (CONNECT method): `ASPN Cookbook Recipe
       <https://code.activestate.com/recipes/456195/>`_.