:mod:`cgi` --- Common Gateway Interface support
===============================================

.. module:: cgi
   :synopsis: Helpers for running Python scripts via the Common Gateway Interface.


.. index::
   pair: WWW; server
   pair: CGI; protocol
   pair: HTTP; protocol
   pair: MIME; headers
   single: URL
   single: Common Gateway Interface

**Source code:** :source:`Lib/cgi.py`

--------------

Support module for Common Gateway Interface (CGI) scripts.

This module defines a number of utilities for use by CGI scripts written in
Python.


Introduction
------------

.. _cgi-intro:

A CGI script is invoked by an HTTP server, usually to process user input
submitted through an HTML ``<FORM>`` or ``<ISINDEX>`` element.

Most often, CGI scripts live in the server's special :file:`cgi-bin` directory.
The HTTP server places all sorts of information about the request (such as the
client's hostname, the requested URL, the query string, and lots of other
goodies) in the script's shell environment, executes the script, and sends the
script's output back to the client.

The script's input is connected to the client too, and sometimes the form data
is read this way; at other times the form data is passed via the "query string"
part of the URL. This module is intended to take care of the different cases
and provide a simpler interface to the Python script. It also provides a number
of utilities that help in debugging scripts, and the latest addition is support
for file uploads from a form (if your browser supports it).

The output of a CGI script should consist of two sections, separated by a blank
line. The first section contains a number of headers, telling the client what
kind of data is following.
Python code to generate a minimal header section looks like this::

   print "Content-Type: text/html"    # HTML is following
   print                              # blank line, end of headers

The second section is usually HTML, which allows the client software to display
nicely formatted text with headers, in-line images, etc. Here's Python code that
prints a simple piece of HTML::

   print "<TITLE>CGI script output</TITLE>"
   print "<H1>This is my first CGI script</H1>"
   print "Hello, world!"


.. _using-the-cgi-module:

Using the cgi module
--------------------

Begin by writing ``import cgi``. Do not use ``from cgi import *`` --- the
module defines all sorts of names for its own use or for backward compatibility
that you don't want in your namespace.

When you write a new script, consider adding these lines::

   import cgitb
   cgitb.enable()

This activates a special exception handler that will display detailed reports in
the Web browser if any errors occur. If you'd rather not show the guts of your
program to users of your script, you can have the reports saved to files
instead, with code like this::

   import cgitb
   cgitb.enable(display=0, logdir="/path/to/logdir")

It's very helpful to use this feature during script development. The reports
produced by :mod:`cgitb` provide information that can save you a lot of time in
tracking down bugs. You can always remove the ``cgitb`` line later when you
have tested your script and are confident that it works correctly.

To get at submitted form data, use the :class:`FieldStorage` class; the other
classes defined in this module are provided mostly for backward compatibility.
Instantiate :class:`FieldStorage` exactly once, without arguments. This reads
the form contents from standard input or the environment (depending on the value
of various environment variables set according to the CGI standard).
Because instantiation may consume standard input, a second instance could find
no form data left to read.

The :class:`FieldStorage` instance can be indexed like a Python dictionary.
It allows membership testing with the :keyword:`in` operator, and also supports
the standard dictionary method :meth:`~dict.keys` and the built-in function
:func:`len`. Form fields containing empty strings are ignored and do not appear
in the dictionary; to keep such values, provide a true value for the optional
*keep_blank_values* keyword parameter when creating the :class:`FieldStorage`
instance.

For instance, the following code (which assumes that the
:mailheader:`Content-Type` header and blank line have already been printed)
checks that the fields ``name`` and ``addr`` are both set to a non-empty
string::

   import cgi

   form = cgi.FieldStorage()
   if "name" not in form or "addr" not in form:
       print "<H1>Error</H1>"
       print "Please fill in the name and addr fields."
   else:
       print "<p>name:", form["name"].value
       print "<p>addr:", form["addr"].value
       # ...further form processing here...

Here the fields, accessed through ``form[key]``, are themselves instances of
:class:`FieldStorage` (or :class:`MiniFieldStorage`, depending on the form
encoding). The :attr:`~FieldStorage.value` attribute of the instance yields
the string value of the field. The :meth:`~FieldStorage.getvalue` method
returns this string value directly; it also accepts an optional second argument
as a default to return if the requested key is not present.

If the submitted form data contains more than one field with the same name, the
object retrieved by ``form[key]`` is not a :class:`FieldStorage` or
:class:`MiniFieldStorage` instance but a list of such instances. Similarly, in
this situation, ``form.getvalue(key)`` would return a list of strings.
If you expect this possibility (when your HTML form contains multiple fields
with the same name), use the :meth:`~FieldStorage.getlist` method, which always
returns a list of values (so that you do not need to special-case the
single-item case). For example, this code concatenates any number of username
fields, separated by commas::

   value = form.getlist("username")
   usernames = ",".join(value)

If a field represents an uploaded file, accessing the value via the
:attr:`~FieldStorage.value` attribute or the :meth:`~FieldStorage.getvalue`
method reads the entire file into memory as a string. This may not be what you
want. You can test for an uploaded file by testing either the
:attr:`~FieldStorage.filename` attribute or the :attr:`~FieldStorage.file`
attribute. You can then read the data at leisure from the :attr:`!file`
attribute::

   fileitem = form["userfile"]
   if fileitem.file:
       # It's an uploaded file; count lines
       linecount = 0
       while True:
           line = fileitem.file.readline()
           if not line:
               break
           linecount += 1

If an error is encountered when obtaining the contents of an uploaded file
(for example, when the user interrupts the form submission by clicking on
a Back or Cancel button), the :attr:`~FieldStorage.done` attribute of the
object for the field will be set to the value -1.

The file upload draft standard entertains the possibility of uploading multiple
files from one field (using a recursive :mimetype:`multipart/\*` encoding).
When this occurs, the item will be a dictionary-like :class:`FieldStorage` item.
This can be determined by testing its :attr:`!type` attribute, which should be
:mimetype:`multipart/form-data` (or perhaps another MIME type matching
:mimetype:`multipart/\*`). In this case, it can be iterated over recursively
just like the top-level form object.
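
Because :class:`FieldStorage` reads its input from a live CGI request, the
following sketch illustrates the same multiple-value and blank-value rules
offline with :func:`urlparse.parse_qs`, the function to which :func:`parse`
passes its parameters. The query string is made up, and the import falls back so
that the snippet runs under Python 2 (where the module is named ``urlparse``, as
in this document) and under Python 3 (where it is ``urllib.parse``). Treat it as
an approximation of how URL-encoded form data is split up, not as the
implementation of :class:`FieldStorage`.

```python
try:
    from urllib.parse import parse_qs  # Python 3
except ImportError:
    from urlparse import parse_qs  # Python 2, the module this document names

# A made-up query string: "username" repeats and "comment" is blank.
query = "username=alice&username=bob&comment=&lang=en"

# By default blank values are dropped and every value comes back as a list.
fields = parse_qs(query)

# keep_blank_values=True keeps the empty "comment" field as [''].
fields_with_blanks = parse_qs(query, keep_blank_values=True)

# Same idea as the getlist() example: join however many usernames arrived.
usernames = ",".join(fields.get("username", []))
```

Note that :func:`urlparse.parse_qs` always returns lists, even for a field that
appears once, whereas :meth:`FieldStorage.getvalue` returns a bare string in
that case.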

When a form is submitted in the "old" format (as the query string or as a single
data part of type :mimetype:`application/x-www-form-urlencoded`), the items will
actually be instances of the class :class:`MiniFieldStorage`. In this case, the
:attr:`!list`, :attr:`!file`, and :attr:`!filename` attributes are always ``None``.

A form submitted via POST that also has a query string will contain both
:class:`FieldStorage` and :class:`MiniFieldStorage` items.


Higher Level Interface
----------------------

.. versionadded:: 2.2

The previous section explains how to read CGI form data using the
:class:`FieldStorage` class. This section describes a higher level interface
which was added to this class to allow one to do it in a more readable and
intuitive way. The interface doesn't make the techniques described in previous
sections obsolete --- they are still useful to process file uploads efficiently,
for example.

.. XXX: Is this true ?

The interface consists of two simple methods. Using the methods you can process
form data in a generic way, without the need to worry whether only one or more
values were posted under one name.

Without this interface, you would have to write code like the following any
time you expected a user to post more than one value under one name::

   item = form.getvalue("item")
   if isinstance(item, list):
       # The user is requesting more than one item.
       pass
   else:
       # The user is requesting only one item.
       pass

This situation is common, for example, when a form contains a group of multiple
checkboxes with the same name::

   <input type="checkbox" name="item" value="1" />
   <input type="checkbox" name="item" value="2" />

In most situations, however, there's only one form control with a particular
name in a form, so you expect and need only one value associated with this
name.
So you write a script containing, for example, this code::

   user = form.getvalue("user").upper()

The problem with the code is that you should never assume that a client will
provide valid input to your scripts. For example, if a curious user appends
another ``user=foo`` pair to the query string, then the script would crash,
because in this situation the ``getvalue("user")`` method call returns a list
instead of a string. Calling the :meth:`~str.upper` method on a list is not valid
(since lists do not have a method of this name) and results in an
:exc:`AttributeError` exception.

Therefore, without the higher level interface, the only safe way to read form
data values is to always check whether the obtained value is a single value or a
list of values. That's annoying and leads to less readable scripts.

A more convenient approach is to use the methods :meth:`~FieldStorage.getfirst`
and :meth:`~FieldStorage.getlist` provided by this higher level interface.


.. method:: FieldStorage.getfirst(name[, default])

   This method always returns only one value associated with form field *name*.
   If more than one value was posted under that name, the method returns only
   the first one. Please note that the order in which the values are received
   may vary from browser to browser and should not be counted on. [#]_ If no such
   form field or value exists, the method returns the value specified by the
   optional parameter *default*. This parameter defaults to ``None`` if not
   specified.


.. method:: FieldStorage.getlist(name)

   This method always returns a list of values associated with form field *name*.
   The method returns an empty list if no such form field or value exists for
   *name*. It returns a list consisting of one item if only one such value exists.
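
To see what these two methods guarantee without running a server, the sketch
below re-implements their documented behavior as two hypothetical helper
functions over the dictionary of lists returned by :func:`urlparse.parse_qs`.
The helper names mirror the methods but are not part of the :mod:`cgi` module,
and they ignore details such as file uploads.

```python
try:
    from urllib.parse import parse_qs  # Python 3
except ImportError:
    from urlparse import parse_qs  # Python 2


def getfirst(fields, name, default=None):
    """Hypothetical stand-in for FieldStorage.getfirst on a parse_qs() dict."""
    values = fields.get(name)
    return values[0] if values else default


def getlist(fields, name):
    """Hypothetical stand-in for FieldStorage.getlist on a parse_qs() dict."""
    return list(fields.get(name, []))


# A "curious user" has appended a second user=foo pair to the query string.
fields = parse_qs("user=joe&user=foo&item=1&item=2")

user = getfirst(fields, "user", "").upper()  # first value only, never a list
items = getlist(fields, "item")              # both checkbox values
missing = getlist(fields, "color")           # an absent field gives []
```

Because the ``getfirst`` stand-in always yields a single string (or the
default), the ``.upper()`` call from the crashing example above is safe even
when a field is repeated.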

Using these methods you can write nice compact code::

   import cgi
   form = cgi.FieldStorage()
   user = form.getfirst("user", "").upper()    # This way it's safe.
   for item in form.getlist("item"):
       do_something(item)    # do_something() stands for your own processing


Old classes
-----------

.. deprecated:: 2.6

   These classes, present in earlier versions of the :mod:`cgi` module, are
   still supported for backward compatibility. New applications should use the
   :class:`FieldStorage` class.

:class:`SvFormContentDict` stores single value form content as a dictionary; it
assumes each field name occurs in the form only once.

:class:`FormContentDict` stores multiple value form content as a dictionary (the
form items are lists of values). Useful if your form contains multiple fields
with the same name.

Other classes (:class:`FormContent`, :class:`InterpFormContentDict`) are present
for backwards compatibility with really old applications only.


.. _functions-in-cgi-module:

Functions
---------

These are useful if you want more control, or if you want to employ some of the
algorithms implemented in this module in other circumstances.


.. function:: parse(fp[, environ[, keep_blank_values[, strict_parsing]]])

   Parse a query in the environment or from a file (the file defaults to
   ``sys.stdin`` and the environment defaults to ``os.environ``). The
   *keep_blank_values* and *strict_parsing* parameters are passed to
   :func:`urlparse.parse_qs` unchanged.


.. function:: parse_qs(qs[, keep_blank_values[, strict_parsing]])

   This function is deprecated in this module. Use :func:`urlparse.parse_qs`
   instead. It is maintained here only for backward compatibility.

.. function:: parse_qsl(qs[, keep_blank_values[, strict_parsing]])

   This function is deprecated in this module. Use :func:`urlparse.parse_qsl`
   instead.
   It is maintained here only for backward compatibility.

.. function:: parse_multipart(fp, pdict)

   Parse input of type :mimetype:`multipart/form-data` (for file uploads).
   Arguments are *fp* for the input file and *pdict* for a dictionary containing
   other parameters in the :mailheader:`Content-Type` header.

   Returns a dictionary just like :func:`urlparse.parse_qs`: keys are the field
   names, and each value is a list of values for that field. This is easy to use
   but not much good if you are expecting megabytes to be uploaded --- in that
   case, use the :class:`FieldStorage` class instead, which is much more flexible.

   Note that this does not parse nested multipart parts --- use
   :class:`FieldStorage` for that.


.. function:: parse_header(string)

   Parse a MIME header (such as :mailheader:`Content-Type`) into a main value and a
   dictionary of parameters.


.. function:: test()

   Robust test CGI script, usable as main program. Writes minimal HTTP headers and
   formats all information provided to the script in HTML form.


.. function:: print_environ()

   Format the shell environment in HTML.


.. function:: print_form(form)

   Format a form in HTML.


.. function:: print_directory()

   Format the current directory in HTML.


.. function:: print_environ_usage()

   Print a list of useful environment variables (those used by CGI) in HTML.


.. function:: escape(s[, quote])

   Convert the characters ``'&'``, ``'<'`` and ``'>'`` in string *s* to HTML-safe
   sequences. Use this if you need to display text that might contain such
   characters in HTML. If the optional flag *quote* is true, the quotation mark
   character (``"``) is also translated; this helps for inclusion in an HTML
   attribute value delimited by double quotes, as in ``<a href="...">``. Note
   that single quotes are never translated.

   If the value to be quoted might include single- or double-quote characters,
   or both, consider using the :func:`~xml.sax.saxutils.quoteattr` function in the
   :mod:`xml.sax.saxutils` module instead.


.. _cgi-security:

Caring about security
---------------------

.. index:: pair: CGI; security

There's one important rule: if you invoke an external program (via the
:func:`os.system` or :func:`os.popen` functions, or others with similar
functionality), make very sure you don't pass arbitrary strings received from
the client to the shell. This is a well-known security hole whereby clever
hackers anywhere on the Web can exploit a gullible CGI script to invoke
arbitrary shell commands. Even parts of the URL or field names cannot be
trusted, since the request doesn't have to come from your form!

To be on the safe side, if you must pass a string obtained from a form to a
shell command, you should make sure the string contains only alphanumeric
characters, dashes, underscores, and periods.


Installing your CGI script on a Unix system
-------------------------------------------

Read the documentation for your HTTP server and check with your local system
administrator to find the directory where CGI scripts should be installed;
usually this is in a directory :file:`cgi-bin` in the server tree.

Make sure that your script is readable and executable by "others"; the Unix file
mode should be ``0755`` octal (use ``chmod 0755 filename``). Make sure that the
first line of the script contains ``#!`` starting in column 1 followed by the
pathname of the Python interpreter, for instance::

   #!/usr/local/bin/python

Make sure the Python interpreter exists and is executable by "others".
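
Putting the pieces of this document together, a complete script ready for
installation might look like the following minimal sketch. The interpreter path
is the example path from above and must be adjusted to your system, and the page
content is made up. It writes with ``sys.stdout.write`` instead of the ``print``
statements used elsewhere in this document so that the same file runs under
Python 2 and Python 3; that is this sketch's choice, not a requirement of the
:mod:`cgi` module.

```python
#!/usr/local/bin/python
# Hypothetical minimal CGI script; the interpreter path above is an example.
import sys


def render_page():
    """Build the whole CGI response: headers, a blank line, then HTML."""
    headers = "Content-Type: text/html\n"
    body = ("<TITLE>CGI script output</TITLE>\n"
            "<H1>This is my first CGI script</H1>\n"
            "Hello, world!\n")
    # The empty line between headers and body ends the header section.
    return headers + "\n" + body


if __name__ == "__main__":
    sys.stdout.write(render_page())
```

Once it is copied into :file:`cgi-bin` with mode ``0755``, requesting it through
the server should display the "Hello, world!" page.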

Make sure that any files your script needs to read or write are readable or
writable, respectively, by "others" --- their mode should be ``0644`` for
readable and ``0666`` for writable. This is because, for security reasons, the
HTTP server usually executes your script as user "nobody", without any special
privileges. It can only read (write, execute) files that everybody can read
(write, execute). The current directory at execution time is also different (it
is usually the server's cgi-bin directory) and the set of environment variables
is also different from what you get when you log in. In particular, don't count
on the shell's search path for executables (:envvar:`PATH`) or the Python module
search path (:envvar:`PYTHONPATH`) to be set to anything interesting.

If you need to load modules from a directory which is not on Python's default
module search path, you can change the path in your script, before importing
other modules. For example::

   import sys
   sys.path.insert(0, "/usr/home/joe/lib/python")
   sys.path.insert(0, "/usr/local/lib/python")

(This way, the directory inserted last will be searched first!)

Instructions for non-Unix systems will vary; check your HTTP server's
documentation (it will usually have a section on CGI scripts).


Testing your CGI script
-----------------------

Unfortunately, a CGI script will generally not run when you try it from the
command line, and a script that works perfectly from the command line may fail
mysteriously when run from the server. There's one reason why you should still
test your script from the command line: if it contains a syntax error, the
Python interpreter won't execute it at all, and the HTTP server will most likely
send a cryptic error to the client.

If your script has no syntax errors yet still does not work, you have no choice
but to read the next section.
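
Checking for syntax errors does not require running the script at all. As a
minimal sketch, the hypothetical helper below passes a script's source to the
built-in :func:`compile` function, which parses and byte-compiles it without
executing it, so no CGI environment is needed.

```python
def has_syntax_error(source, filename="<cgi script>"):
    """Return a description of the syntax error compile() reports, or None.

    compile() only parses and byte-compiles the source; it does not run it.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:  # also covers IndentationError
        return "%s, line %s: %s" % (filename, exc.lineno, exc.msg)
    return None


# Hypothetical usage on a script file:
# with open("script.py") as f:
#     print(has_syntax_error(f.read(), "script.py"))
```

Run such a check with the same Python version that the server uses: a script
written for Python 2 that contains ``print`` statements, for example, is
reported as a syntax error by Python 3.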


Debugging CGI scripts
---------------------

.. index:: pair: CGI; debugging

First of all, check for trivial installation errors --- reading the section
above on installing your CGI script carefully can save you a lot of time. If
you wonder whether you have understood the installation procedure correctly, try
installing a copy of this module file (:file:`cgi.py`) as a CGI script. When
invoked as a script, the file will dump its environment and the contents of the
form, formatted as HTML. Give it the right mode, etc., and send it a request. If
it's installed in the standard :file:`cgi-bin` directory, it should be possible
to send it a request by entering a URL into your browser of the form:

.. code-block:: none

   http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home

If this gives a 404 error, the server cannot find the script; perhaps you need
to install it in a different directory. If it gives another error, there's an
installation problem that you should fix before trying to go any further. If you
get a nicely formatted listing of the environment and form content (in this
example, the fields should be listed as "addr" with value "At Home" and "name"
with value "Joe Blow"), the :file:`cgi.py` script has been installed correctly.
If you follow the same procedure for your own script, you should now be able to
debug it.

The next step could be to call the :mod:`cgi` module's :func:`test` function
from your script: replace its main code with the single statement ::

   cgi.test()

This should produce the same results as those obtained from installing the
:file:`cgi.py` file itself.

When an ordinary Python script raises an unhandled exception (for whatever
reason: a typo in a module name, a file that can't be opened, etc.), the
Python interpreter prints a nice traceback and exits.
While the Python interpreter will still do this when your CGI script raises an
exception, most likely the traceback will end up in one of the HTTP server's
log files, or be discarded altogether.

Fortunately, once you have managed to get your script to execute *some* code,
you can easily send tracebacks to the Web browser using the :mod:`cgitb` module.
If you haven't done so already, just add the lines::

   import cgitb
   cgitb.enable()

to the top of your script. Then try running it again; when a problem occurs,
you should see a detailed report that will likely make the cause of the crash
apparent.

If you suspect that there may be a problem in importing the :mod:`cgitb` module,
you can use an even more robust approach (which only uses built-in modules)::

   import sys
   sys.stderr = sys.stdout
   print "Content-Type: text/plain"
   print
   # ...your code here...

This relies on the Python interpreter to print the traceback. The content type
of the output is set to plain text, which disables all HTML processing. If your
script works, the raw HTML will be displayed by your client. If it raises an
exception, most likely after the first two lines have been printed, a traceback
will be displayed. Because no HTML interpretation is going on, the traceback
will be readable.


Common problems and solutions
-----------------------------

* Most HTTP servers buffer the output from CGI scripts until the script is
  completed. This means that it is not possible to display a progress report on
  the client's display while the script is running.

* Check the installation instructions above.

* Check the HTTP server's log files. (``tail -f logfile`` in a separate window
  may be useful!)

* Always check a script for syntax errors first, by doing something like
  ``python script.py``.

* If your script does not have any syntax errors, try adding
  ``import cgitb; cgitb.enable()`` to the top of the script.

* When invoking external programs, make sure they can be found. Usually, this
  means using absolute path names --- :envvar:`PATH` is often not set to a very
  useful value in a CGI script.

* When reading or writing external files, make sure they can be read or written
  by the userid under which your CGI script will be running: this is typically
  the userid under which the web server is running, or some explicitly specified
  userid for a web server's ``suexec`` feature.

* Don't try to give a CGI script a set-uid mode. This doesn't work on most
  systems, and is a security liability as well.

.. rubric:: Footnotes

.. [#] Note that some recent versions of the HTML specification do state what
   order the field values should be supplied in, but knowing whether a request
   was received from a conforming browser, or even from a browser at all, is
   tedious and error-prone.