:mod:`cgi` --- Common Gateway Interface support
===============================================

.. module:: cgi
   :synopsis: Helpers for running Python scripts via the Common Gateway Interface.


.. index::
   pair: WWW; server
   pair: CGI; protocol
   pair: HTTP; protocol
   pair: MIME; headers
   single: URL
   single: Common Gateway Interface

**Source code:** :source:`Lib/cgi.py`

--------------

Support module for Common Gateway Interface (CGI) scripts.

This module defines a number of utilities for use by CGI scripts written in
Python.


Introduction
------------

.. _cgi-intro:

A CGI script is invoked by an HTTP server, usually to process user input
submitted through an HTML ``<FORM>`` or ``<ISINDEX>`` element.

Most often, CGI scripts live in the server's special :file:`cgi-bin` directory.
The HTTP server places all sorts of information about the request (such as the
client's hostname, the requested URL, the query string, and lots of other
goodies) in the script's shell environment, executes the script, and sends the
script's output back to the client.

The script's input is connected to the client too, and sometimes the form data
is read this way; at other times the form data is passed via the "query string"
part of the URL.  This module is intended to take care of the different cases
and provide a simpler interface to the Python script.  It also provides a number
of utilities that help in debugging scripts, and the latest addition is support
for file uploads from a form (if your browser supports it).
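
The two delivery routes described above can be sketched roughly as follows.
This is only an illustration, not what the module does internally: the helper
name ``read_raw_form_data`` is hypothetical, it takes the environment and the
input stream as arguments so that it can be tried without a server, and it
relies only on the standard CGI variables ``REQUEST_METHOD``, ``QUERY_STRING``
and ``CONTENT_LENGTH``.

```python
import io


def read_raw_form_data(environ, stdin):
    """Return the undecoded form data for one CGI request (illustrative only)."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the server connects the request body to the script's input.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length)
    # Otherwise the data travels in the "query string" part of the URL.
    return environ.get("QUERY_STRING", "")


# Simulated requests; a real script would pass os.environ and sys.stdin.
get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "name=Joe+Blow&addr=At+Home"}
post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "8"}

from_query = read_raw_form_data(get_env, io.BytesIO(b""))
from_body = read_raw_form_data(post_env, io.BytesIO(b"name=Joe&ignored"))
```

Only ``CONTENT_LENGTH`` bytes are read in the POST case, which is why the
trailing ``&ignored`` never reaches ``from_body``.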

The output of a CGI script should consist of two sections, separated by a blank
line.  The first section contains a number of headers, telling the client what
kind of data is following.  Python code to generate a minimal header section
looks like this::

   print "Content-Type: text/html"     # HTML is following
   print                               # blank line, end of headers

The second section is usually HTML, which allows the client software to display
nicely formatted text with header, in-line images, etc. Here's Python code that
prints a simple piece of HTML::

   print "<TITLE>CGI script output</TITLE>"
   print "<H1>This is my first CGI script</H1>"
   print "Hello, world!"

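For testing, it can help to produce the same two-part output through a function
that writes to any stream instead of printing directly.  The following is a
minimal sketch under that assumption; ``write_response`` is a hypothetical
helper, and an in-memory buffer stands in for standard output.

```python
import io


def write_response(out, body, content_type="text/html"):
    """Write a header section, the separating blank line, then the body."""
    out.write(u"Content-Type: %s\n" % content_type)  # first section: headers
    out.write(u"\n")                                  # blank line ends the headers
    out.write(body)                                   # second section: the document


buf = io.StringIO()
write_response(buf, u"<TITLE>CGI script output</TITLE>\n<H1>Hello</H1>\n")

# Splitting at the first blank line recovers the two sections.
headers, _, document = buf.getvalue().partition(u"\n\n")
```

In a real script the stream would be ``sys.stdout``.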

.. _using-the-cgi-module:

Using the cgi module
--------------------

Begin by writing ``import cgi``.  Do not use ``from cgi import *`` --- the
module defines all sorts of names for its own use or for backward compatibility
that you don't want in your namespace.

When you write a new script, consider adding these lines::

   import cgitb
   cgitb.enable()

This activates a special exception handler that will display detailed reports in
the Web browser if any errors occur.  If you'd rather not show the guts of your
program to users of your script, you can have the reports saved to files
instead, with code like this::

   import cgitb
   cgitb.enable(display=0, logdir="/path/to/logdir")

It's very helpful to use this feature during script development. The reports
produced by :mod:`cgitb` provide information that can save you a lot of time in
tracking down bugs.  You can always remove the ``cgitb`` line later when you
have tested your script and are confident that it works correctly.

To get at submitted form data, it's best to use the :class:`FieldStorage` class.
The other classes defined in this module are provided mostly for backward
compatibility.  Instantiate :class:`FieldStorage` exactly once, without
arguments.  This reads the form contents from standard input or the environment
(depending on the value of various environment variables set according to the
CGI standard).  Because doing so may consume standard input, create only one
instance per request; a second one could find nothing left to read.

The :class:`FieldStorage` instance can be indexed like a Python dictionary.
It allows membership testing with the :keyword:`in` operator, and also supports
the standard dictionary method :meth:`~dict.keys` and the built-in function
:func:`len`.  Form fields containing empty strings are ignored and do not appear
in the dictionary; to keep such values, provide a true value for the optional
*keep_blank_values* keyword parameter when creating the :class:`FieldStorage`
instance.

For instance, the following code (which assumes that the
:mailheader:`Content-Type` header and blank line have already been printed)
checks that the fields ``name`` and ``addr`` are both set to a non-empty
string::

   form = cgi.FieldStorage()
   if "name" not in form or "addr" not in form:
       print "<H1>Error</H1>"
       print "Please fill in the name and addr fields."
   else:
       print "<p>name:", form["name"].value
       print "<p>addr:", form["addr"].value
       # ...further form processing here...

Here the fields, accessed through ``form[key]``, are themselves instances of
:class:`FieldStorage` (or :class:`MiniFieldStorage`, depending on the form
encoding). The :attr:`~FieldStorage.value` attribute of the instance yields
the string value of the field.  The :meth:`~FieldStorage.getvalue` method
returns this string value directly; it also accepts an optional second argument
as a default to return if the requested key is not present.

If the submitted form data contains more than one field with the same name, the
object retrieved by ``form[key]`` is not a :class:`FieldStorage` or
:class:`MiniFieldStorage` instance but a list of such instances.  Similarly, in
this situation, ``form.getvalue(key)`` would return a list of strings. If you
expect this possibility (when your HTML form contains multiple fields with the
same name), use the :meth:`~FieldStorage.getlist` method, which always returns
a list of values (so that you do not need to special-case the single item
case).  For example, this code concatenates any number of username fields,
separated by commas::

   value = form.getlist("username")
   usernames = ",".join(value)

If a field represents an uploaded file, accessing the value via the
:attr:`~FieldStorage.value` attribute or the :meth:`~FieldStorage.getvalue`
method reads the entire file into memory as a string.  This may not be what you
want.  You can test for an uploaded file by testing either the
:attr:`~FieldStorage.filename` attribute or the :attr:`~FieldStorage.file`
attribute.  You can then read the data at leisure from the :attr:`!file`
attribute::

   fileitem = form["userfile"]
   if fileitem.file:
       # It's an uploaded file; count lines
       linecount = 0
       while True:
           line = fileitem.file.readline()
           if not line:
               break
           linecount += 1
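
Reading "at leisure" also makes it possible to copy an upload to disk in
fixed-size chunks rather than holding it all in memory.  The sketch below
assumes only that the object is file-like, as the :attr:`!file` attribute is;
``save_upload`` and the chunk size are hypothetical, and an ``io.BytesIO``
buffer stands in for ``fileitem.file`` so that the code runs without a server.

```python
import io
import os
import tempfile


def save_upload(fileobj, dest_path, chunk_size=64 * 1024):
    """Copy a file-like object to dest_path in chunks; return bytes written."""
    total = 0
    with open(dest_path, "wb") as dest:
        while True:
            chunk = fileobj.read(chunk_size)
            if not chunk:          # empty read means end of the upload
                break
            dest.write(chunk)
            total += len(chunk)
    return total


# io.BytesIO plays the role of fileitem.file in this illustration.
fake_upload = io.BytesIO(b"line one\nline two\n")
dest_path = os.path.join(tempfile.mkdtemp(), "upload.bin")
written = save_upload(fake_upload, dest_path, chunk_size=4)
```

A deliberately tiny ``chunk_size`` is used here to exercise the loop; a real
script would keep a larger value.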

If an error is encountered when obtaining the contents of an uploaded file
(for example, when the user interrupts the form submission by clicking on
a Back or Cancel button) the :attr:`~FieldStorage.done` attribute of the
object for the field will be set to the value -1.

The file upload draft standard entertains the possibility of uploading multiple
files from one field (using a recursive :mimetype:`multipart/\*` encoding).
When this occurs, the item will be a dictionary-like :class:`FieldStorage` item.
This can be determined by testing its :attr:`!type` attribute, which should be
:mimetype:`multipart/form-data` (or perhaps another MIME type matching
:mimetype:`multipart/\*`).  In this case, it can be iterated over recursively
just like the top-level form object.

When a form is submitted in the "old" format (as the query string or as a single
data part of type :mimetype:`application/x-www-form-urlencoded`), the items will
actually be instances of the class :class:`MiniFieldStorage`.  In this case, the
:attr:`!list`, :attr:`!file`, and :attr:`filename` attributes are always ``None``.

A form submitted via POST that also has a query string will contain both
:class:`FieldStorage` and :class:`MiniFieldStorage` items.

Higher Level Interface
----------------------

.. versionadded:: 2.2

The previous section explains how to read CGI form data using the
:class:`FieldStorage` class.  This section describes a higher level interface
which was added to this class to allow one to do it in a more readable and
intuitive way.  The interface doesn't make the techniques described in previous
sections obsolete --- they are still useful to process file uploads efficiently,
for example.

.. XXX: Is this true ?

The interface consists of two simple methods. Using the methods you can process
form data in a generic way, without the need to worry whether only one or more
values were posted under one name.

In the previous section, you learned to write the following code whenever you
expected a user to post more than one value under one name::

   item = form.getvalue("item")
   if isinstance(item, list):
       # The user is requesting more than one item.
       pass
   else:
       # The user is requesting only one item.
       pass

This situation is common for example when a form contains a group of multiple
checkboxes with the same name::

   <input type="checkbox" name="item" value="1" />
   <input type="checkbox" name="item" value="2" />

In most situations, however, there's only one form control with a particular
name in a form and then you expect and need only one value associated with this
name.  So you write a script containing for example this code::

   user = form.getvalue("user").upper()

The problem with the code is that you should never expect that a client will
provide valid input to your scripts.  For example, if a curious user appends
another ``user=foo`` pair to the query string, then the script would crash,
because in this situation the ``getvalue("user")`` method call returns a list
instead of a string.  Calling the :meth:`~str.upper` method on a list is not valid
(since lists do not have a method of this name) and results in an
:exc:`AttributeError` exception.

Therefore, the appropriate way to read form data values was to always use the
code which checks whether the obtained value is a single value or a list of
values.  That's annoying and leads to less readable scripts.

A more convenient approach is to use the methods :meth:`~FieldStorage.getfirst`
and :meth:`~FieldStorage.getlist` provided by this higher level interface.


.. method:: FieldStorage.getfirst(name[, default])

   This method always returns only one value associated with form field *name*.
   If more than one value was posted under that name, only the first is
   returned.  Please note that the order in which the values are received
   may vary from browser to browser and should not be counted on. [#]_  If no such
   form field or value exists then the method returns the value specified by the
   optional parameter *default*.  This parameter defaults to ``None`` if not
   specified.


.. method:: FieldStorage.getlist(name)

   This method always returns a list of values associated with form field *name*.
   The method returns an empty list if no such form field or value exists for
   *name*.  It returns a list consisting of one item if only one such value exists.

Using these methods you can write nice compact code::

   import cgi
   form = cgi.FieldStorage()
   user = form.getfirst("user", "").upper()    # This way it's safe.
   for item in form.getlist("item"):
       do_something(item)

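The difference between ``getvalue``, ``getfirst`` and ``getlist`` can be seen
without a web server by mimicking the three accessors on top of
:func:`urlparse.parse_qs`.  The ``QueryFields`` class below is a hypothetical
stand-in written for this illustration; it is not part of the :mod:`cgi` module
and only follows the behaviour described above.  The fallback import from
``urllib.parse`` is an assumption added so the sketch also runs on Python 3.

```python
try:
    from urlparse import parse_qs        # Python 2
except ImportError:
    from urllib.parse import parse_qs    # Python 3 (assumption, see above)


class QueryFields(object):
    """Hypothetical stand-in mimicking three FieldStorage accessors."""

    def __init__(self, query_string):
        self._fields = parse_qs(query_string)   # name -> list of values

    def getvalue(self, name, default=None):
        values = self._fields.get(name)
        if not values:
            return default
        # One value comes back as a string, several as a list.
        return values[0] if len(values) == 1 else values

    def getfirst(self, name, default=None):
        values = self._fields.get(name)
        return values[0] if values else default

    def getlist(self, name):
        return list(self._fields.get(name, []))


# A "curious user" has appended a second user=... pair.
form = QueryFields("user=joe&user=foo&item=1")

risky = form.getvalue("user")              # a list: .upper() would fail
safe = form.getfirst("user", "").upper()   # always a string
items = form.getlist("item")               # always a list
```

With this input ``risky`` is a list, so calling ``upper()`` on it would raise
:exc:`AttributeError`, while ``safe`` and ``items`` have predictable types.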

Old classes
-----------

.. deprecated:: 2.6

   These classes, present in earlier versions of the :mod:`cgi` module, are
   still supported for backward compatibility.  New applications should use the
   :class:`FieldStorage` class.

:class:`SvFormContentDict` stores single value form content as dictionary; it
assumes each field name occurs in the form only once.

:class:`FormContentDict` stores multiple value form content as a dictionary (the
form items are lists of values).  Useful if your form contains multiple fields
with the same name.

Other classes (:class:`FormContent`, :class:`InterpFormContentDict`) are present
for backwards compatibility with really old applications only.


.. _functions-in-cgi-module:

Functions
---------

These are useful if you want more control, or if you want to employ some of the
algorithms implemented in this module in other circumstances.


.. function:: parse(fp[, environ[, keep_blank_values[, strict_parsing]]])

   Parse a query in the environment or from a file (the file defaults to
   ``sys.stdin`` and environment defaults to ``os.environ``).  The
   *keep_blank_values* and *strict_parsing* parameters are passed to
   :func:`urlparse.parse_qs` unchanged.


.. function:: parse_qs(qs[, keep_blank_values[, strict_parsing]])

   This function is deprecated in this module. Use :func:`urlparse.parse_qs`
   instead. It is maintained here only for backward compatibility.

.. function:: parse_qsl(qs[, keep_blank_values[, strict_parsing]])

   This function is deprecated in this module. Use :func:`urlparse.parse_qsl`
   instead. It is maintained here only for backward compatibility.
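
As a brief illustration of the :mod:`urlparse` replacements named above, the
following sketch parses one query string three ways.  The sample query is made
up, and the fallback import from ``urllib.parse`` is an assumption added so the
example also runs on Python 3.

```python
try:
    from urlparse import parse_qs, parse_qsl        # Python 2
except ImportError:
    from urllib.parse import parse_qs, parse_qsl    # Python 3 (assumption)

query = "name=Joe+Blow&item=1&item=2&note="

# Dictionary form: every value is a list; blank values are dropped by default.
as_dict = parse_qs(query)

# Same call, but keeping the empty "note" field.
with_blanks = parse_qs(query, keep_blank_values=True)

# List-of-pairs form, which preserves the order of the fields.
as_pairs = parse_qsl(query)
```

Here ``as_dict`` has no ``note`` key, whereas ``with_blanks["note"]`` is a list
holding one empty string.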

.. function:: parse_multipart(fp, pdict)

   Parse input of type :mimetype:`multipart/form-data` (for file uploads).
   Arguments are *fp* for the input file and *pdict* for a dictionary containing
   other parameters in the :mailheader:`Content-Type` header.

   Returns a dictionary just like :func:`urlparse.parse_qs`: keys are the field
   names, and each value is a list of values for that field.  This is easy to
   use but not much good if you are expecting megabytes to be uploaded; in that
   case, use the :class:`FieldStorage` class instead, which is much more
   flexible.

   Note that this does not parse nested multipart parts; use
   :class:`FieldStorage` for that.


.. function:: parse_header(string)

   Parse a MIME header (such as :mailheader:`Content-Type`) into a main value and a
   dictionary of parameters.


.. function:: test()

   Robust test CGI script, usable as main program. Writes minimal HTTP headers and
   formats all information provided to the script in HTML form.


.. function:: print_environ()

   Format the shell environment in HTML.


.. function:: print_form(form)

   Format a form in HTML.


.. function:: print_directory()

   Format the current directory in HTML.


.. function:: print_environ_usage()

   Print a list of useful (used by CGI) environment variables in HTML.


.. function:: escape(s[, quote])

   Convert the characters ``'&'``, ``'<'`` and ``'>'`` in string *s* to HTML-safe
   sequences.  Use this if you need to display text that might contain such
   characters in HTML.  If the optional flag *quote* is true, the quotation mark
   character (``"``) is also translated; this helps for inclusion in an HTML
   attribute value delimited by double quotes, as in ``<a href="...">``.  Note
   that single quotes are never translated.

   If the value to be quoted might include single- or double-quote characters,
   or both, consider using the :func:`~xml.sax.saxutils.quoteattr` function in the
   :mod:`xml.sax.saxutils` module instead.

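The translation described for ``escape`` can be sketched as a few string
replacements.  ``escape_html`` below is a hypothetical re-statement of the
documented behaviour for illustration, not the module's own code.

```python
def escape_html(s, quote=False):
    """Replace &, < and > (and, if quote is true, ") with HTML entities."""
    # The ampersand must go first, or the entities added below would
    # themselves be escaped a second time.
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s


text = escape_html('<a href="x">Tom & Jerry\'s</a>')
attr = escape_html('say "hi" & \'bye\'', quote=True)
```

In both results the single quotes pass through unchanged, which is the
limitation the note about ``quoteattr`` refers to.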

.. _cgi-security:

Caring about security
---------------------

.. index:: pair: CGI; security

There's one important rule: if you invoke an external program (via the
:func:`os.system` or :func:`os.popen` functions, or others with similar
functionality), make very sure you don't pass arbitrary strings received from
the client to the shell.  This is a well-known security hole whereby clever
hackers anywhere on the Web can exploit a gullible CGI script to invoke
arbitrary shell commands.  Even parts of the URL or field names cannot be
trusted, since the request doesn't have to come from your form!

To be on the safe side, if you must pass a string gotten from a form to a shell
command, you should make sure the string contains only alphanumeric characters,
dashes, underscores, and periods.

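The character rule above can be expressed as a whitelist check.  The following
is a minimal sketch under two assumptions: "alphanumeric" is read as ASCII
letters and digits, and the external program is started through
:mod:`subprocess` with an argument list so that no shell parses the value.  The
names ``is_safe_argument`` and ``echo_via_child`` are hypothetical.

```python
import re
import subprocess
import sys

# \A and \Z anchor the whole string; "$" alone would accept a trailing newline.
SAFE_ARGUMENT = re.compile(r"\A[A-Za-z0-9._-]+\Z")


def is_safe_argument(value):
    """True if value holds only ASCII letters, digits, '.', '_' and '-'."""
    return bool(SAFE_ARGUMENT.match(value))


def echo_via_child(value):
    """Pass a checked value to a child process without involving a shell."""
    if not is_safe_argument(value):
        raise ValueError("refusing to pass %r to an external program" % (value,))
    # The child here is just the Python interpreter echoing its argument.
    child = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.argv[1])", value]
    return subprocess.check_output(child)


accepted = is_safe_argument("report-2024_v1.txt")
rejected = is_safe_argument("x; rm -rf /")
echoed = echo_via_child("joe.txt")
```

A value such as ``-rf`` still passes this check, so a program that treats
leading dashes as options needs additional care.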

Installing your CGI script on a Unix system
-------------------------------------------

Read the documentation for your HTTP server and check with your local system
administrator to find the directory where CGI scripts should be installed;
usually this is in a directory :file:`cgi-bin` in the server tree.

Make sure that your script is readable and executable by "others"; the Unix file
mode should be ``0755`` octal (use ``chmod 0755 filename``).  Make sure that the
first line of the script contains ``#!`` starting in column 1 followed by the
pathname of the Python interpreter, for instance::

   #!/usr/local/bin/python

Make sure the Python interpreter exists and is executable by "others".

Make sure that any files your script needs to read or write are readable or
writable, respectively, by "others" --- their mode should be ``0644`` for
readable and ``0666`` for writable.  This is because, for security reasons, the
HTTP server executes your script as user "nobody", without any special
privileges.  It can only read (write, execute) files that everybody can read
(write, execute).  The current directory at execution time is also different (it
is usually the server's cgi-bin directory) and the set of environment variables
is also different from what you get when you log in.  In particular, don't count
on the shell's search path for executables (:envvar:`PATH`) or the Python module
search path (:envvar:`PYTHONPATH`) to be set to anything interesting.
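
The "others" permission bits mentioned above can be inspected from Python with
:func:`os.stat`.  This is a small sketch for a Unix system; ``others_can`` is a
hypothetical helper, and a temporary file stands in for a real script or data
file.

```python
import os
import stat
import tempfile


def others_can(path):
    """Report which of read/write/execute the "others" class has on path."""
    mode = os.stat(path).st_mode
    return {
        "read": bool(mode & stat.S_IROTH),
        "write": bool(mode & stat.S_IWOTH),
        "execute": bool(mode & stat.S_IXOTH),
    }


handle, sample_path = tempfile.mkstemp(suffix=".py")
os.close(handle)

os.chmod(sample_path, 0o755)          # the mode suggested for the script
script_access = others_can(sample_path)

os.chmod(sample_path, 0o644)          # the mode suggested for readable files
data_access = others_can(sample_path)

os.remove(sample_path)
```

Such a check only reflects the file's own mode; the directories leading to it
must be accessible to the server's user as well.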

If you need to load modules from a directory which is not on Python's default
module search path, you can change the path in your script, before importing
other modules.  For example::

   import sys
   sys.path.insert(0, "/usr/home/joe/lib/python")
   sys.path.insert(0, "/usr/local/lib/python")

(This way, the directory inserted last will be searched first!)

Instructions for non-Unix systems will vary; check your HTTP server's
documentation (it will usually have a section on CGI scripts).


Testing your CGI script
-----------------------

Unfortunately, a CGI script will generally not run when you try it from the
command line, and a script that works perfectly from the command line may fail
mysteriously when run from the server.  There's one reason why you should still
test your script from the command line: if it contains a syntax error, the
Python interpreter won't execute it at all, and the HTTP server will most likely
send a cryptic error to the client.

Assuming your script has no syntax errors, yet it does not work, you have no
choice but to read the next section.


Debugging CGI scripts
---------------------

.. index:: pair: CGI; debugging

First of all, check for trivial installation errors --- reading the section
above on installing your CGI script carefully can save you a lot of time.  If
you wonder whether you have understood the installation procedure correctly, try
installing a copy of this module file (:file:`cgi.py`) as a CGI script.  When
invoked as a script, the file will dump its environment and the contents of the
form in HTML form. Give it the right mode etc, and send it a request.  If it's
installed in the standard :file:`cgi-bin` directory, it should be possible to
send it a request by entering a URL into your browser of the form:

.. code-block:: none

   http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home

If this gives an error of type 404, the server cannot find the script -- perhaps
you need to install it in a different directory.  If it gives another error,
there's an installation problem that you should fix before trying to go any
further.  If you get a nicely formatted listing of the environment and form
content (in this example, the fields should be listed as "addr" with value "At
Home" and "name" with value "Joe Blow"), the :file:`cgi.py` script has been
installed correctly.  If you follow the same procedure for your own script, you
should now be able to debug it.

The next step could be to call the :mod:`cgi` module's :func:`test` function
from your script: replace its main code with the single statement ::

   cgi.test()

This should produce the same results as those gotten from installing the
:file:`cgi.py` file itself.

When an ordinary Python script raises an unhandled exception (for whatever
reason: a typo in a module name, a file that can't be opened, etc.), the
Python interpreter prints a nice traceback and exits.  While the Python
interpreter will still do this when your CGI script raises an exception, most
likely the traceback will end up in one of the HTTP server's log files, or be
discarded altogether.

Fortunately, once you have managed to get your script to execute *some* code,
you can easily send tracebacks to the Web browser using the :mod:`cgitb` module.
If you haven't done so already, just add the lines::

   import cgitb
   cgitb.enable()

to the top of your script.  Then try running it again; when a problem occurs,
you should see a detailed report that will likely make apparent the cause of the
crash.

If you suspect that there may be a problem in importing the :mod:`cgitb` module,
you can use an even more robust approach (which only uses built-in modules)::

   import sys
   sys.stderr = sys.stdout
   print "Content-Type: text/plain"
   print
   ...your code here...

This relies on the Python interpreter to print the traceback.  The content type
of the output is set to plain text, which disables all HTML processing.  If your
script works, the raw HTML will be displayed by your client.  If it raises an
exception, most likely after the first two lines have been printed, a traceback
will be displayed. Because no HTML interpretation is going on, the traceback
will be readable.

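A variation on the same idea is to catch the exception explicitly and write the
traceback into the plain-text response.  This is a sketch of an alternative,
not the approach shown above: ``run_with_plain_traceback`` and ``broken_main``
are hypothetical names, it uses the :mod:`traceback` module rather than
redirecting ``sys.stderr``, and an in-memory buffer stands in for standard
output.

```python
import io
import traceback


def run_with_plain_traceback(main, out):
    """Run main(out); on failure, append the traceback as plain text."""
    out.write(u"Content-Type: text/plain\n\n")
    try:
        main(out)
    except Exception:
        # format_exc() returns the same text the interpreter would print.
        out.write(u"%s" % traceback.format_exc())


def broken_main(out):
    out.write(u"starting\n")
    raise RuntimeError("something went wrong")


report = io.StringIO()
run_with_plain_traceback(broken_main, report)
page = report.getvalue()
```

Output written before the failure is kept, and the traceback follows it in the
same response.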

Common problems and solutions
-----------------------------

* Most HTTP servers buffer the output from CGI scripts until the script is
  completed.  This means that it is not possible to display a progress report on
  the client's display while the script is running.

* Check the installation instructions above.

* Check the HTTP server's log files.  (``tail -f logfile`` in a separate window
  may be useful!)

* Always check a script for syntax errors first, by doing something like
  ``python script.py``.

* If your script does not have any syntax errors, try adding ``import cgitb;
  cgitb.enable()`` to the top of the script.

* When invoking external programs, make sure they can be found. Usually, this
  means using absolute path names --- :envvar:`PATH` is usually not set to a very
  useful value in a CGI script.

* When reading or writing external files, make sure they can be read or written
  by the userid under which your CGI script will be running: this is typically the
  userid under which the web server is running, or some explicitly specified
  userid for a web server's ``suexec`` feature.

* Don't try to give a CGI script a set-uid mode.  This doesn't work on most
  systems, and is a security liability as well.

.. rubric:: Footnotes

.. [#] Note that some recent versions of the HTML specification do state what order the
   field values should be supplied in, but knowing whether a request was
   received from a conforming browser, or even from a browser at all, is tedious
   and error-prone.
    547