Comment Example
===============

.. contents::

Introduction
------------

This is an example of how to write WSGI middleware with WebOb.  The
specific example adds a simple comment form to HTML web pages; any
page served through the middleware that is HTML gets a comment form
added to it, and shows any existing comments.

Code
----

The finished code for this is available in
`docs/comment-example-code/example.py
<https://github.com/Pylons/webob/blob/master/docs/comment-example-code/example.py>`_
-- you can run that file as a script to try it out.

Instantiating Middleware
------------------------

Middleware of any complexity at all is usually best created as a
class with its configuration as arguments to that class.

Every middleware needs an application (``app``) that it wraps.  This
middleware also needs a location to store the comments; we'll put them
all in a single directory.

.. code-block:: python

    import os

    class Commenter(object):
        def __init__(self, app, storage_dir):
            self.app = app
            self.storage_dir = storage_dir
            if not os.path.exists(storage_dir):
                os.makedirs(storage_dir)

When you use this middleware, you'll use it like:

.. code-block:: python

    app = ... make the application ...
    app = Commenter(app, storage_dir='./comments')

For our application we'll use a simple static file server that is
included with `Paste <http://pythonpaste.org>`_ (``pip install
Paste`` installs it).  The setup is all at the bottom of
``example.py``, and looks like this:

.. code-block:: python

    if __name__ == '__main__':
        import optparse
        parser = optparse.OptionParser(
            usage='%prog --port=PORT BASE_DIRECTORY'
            )
        parser.add_option(
            '-p', '--port',
            default='8080',
            dest='port',
            type='int',
            help='Port to serve on (default 8080)')
        parser.add_option(
            '--comment-data',
            default='./comments',
            dest='comment_data',
            help='Place to put comment data into (default ./comments/)')
        options, args = parser.parse_args()
        if not args:
            parser.error('You must give a BASE_DIRECTORY')
        base_dir = args[0]
        from paste.urlparser import StaticURLParser
        app = StaticURLParser(base_dir)
        app = Commenter(app, options.comment_data)
        from wsgiref.simple_server import make_server
        httpd = make_server('localhost', options.port, app)
        print 'Serving on http://localhost:%s' % options.port
        try:
            httpd.serve_forever()
        except KeyboardInterrupt:
            print '^C'

I won't explain it in detail here, but it takes some options, creates
an application that serves static files
(``StaticURLParser(base_dir)``), wraps it with ``Commenter(app,
options.comment_data)``, and then serves that.

The Middleware
--------------

While we've created the class structure for the middleware, it doesn't
actually do anything.  Here's a kind of minimal version of the
middleware (using WebOb):

.. code-block:: python

    import os

    from webob import Request

    class Commenter(object):

        def __init__(self, app, storage_dir):
            self.app = app
            self.storage_dir = storage_dir
            if not os.path.exists(storage_dir):
                os.makedirs(storage_dir)

        def __call__(self, environ, start_response):
            req = Request(environ)
            resp = req.get_response(self.app)
            return resp(environ, start_response)

This doesn't modify the response in any way.  You could write it like
this without WebOb:

.. code-block:: python

    class Commenter(object):
        ...
        def __call__(self, environ, start_response):
            return self.app(environ, start_response)

But it won't be as convenient later.  First, let's create a little bit
of infrastructure for our middleware.  We need to save and load
per-URL data (the comments themselves).  We'll keep them in pickles,
where each URL has a pickle named after the URL (but URL-quoted, so
``http://localhost:8080/index.html`` becomes
``http%3A%2F%2Flocalhost%3A8080%2Findex.html``).

.. code-block:: python

    import urllib
    from cPickle import load, dump

    class Commenter(object):
        ...

        def get_data(self, url):
            filename = self.url_filename(url)
            if not os.path.exists(filename):
                return []
            else:
                f = open(filename, 'rb')
                data = load(f)
                f.close()
                return data

        def save_data(self, url, data):
            filename = self.url_filename(url)
            f = open(filename, 'wb')
            dump(data, f)
            f.close()

        def url_filename(self, url):
            # URL-quoting with no "safe" characters makes the filename safe
            return os.path.join(self.storage_dir, urllib.quote(url, ''))

You can get the full request URL with ``req.url``, so to get the
comment data with these methods you do ``data =
self.get_data(req.url)``.

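The naming scheme can be tried on its own.  What follows is a minimal
sketch, assuming Python 3 and only the standard library
(``urllib.parse.quote`` stands in for the ``urllib.quote`` call above).
The functions mirror the methods above but are written as free
functions that take an explicit ``storage_dir``, and ``demo`` is a
hypothetical helper added purely for illustration:

.. code-block:: python

    import os
    import pickle
    import tempfile
    from urllib.parse import quote


    def url_filename(storage_dir, url):
        # safe='' quotes every reserved character, so the URL becomes a
        # single path component with no slashes or colons in it.
        return os.path.join(storage_dir, quote(url, safe=''))


    def save_data(storage_dir, url, data):
        with open(url_filename(storage_dir, url), 'wb') as f:
            pickle.dump(data, f)


    def get_data(storage_dir, url):
        filename = url_filename(storage_dir, url)
        if not os.path.exists(filename):
            return []  # nothing stored for this URL yet
        with open(filename, 'rb') as f:
            return pickle.load(f)


    def demo():
        url = 'http://localhost:8080/index.html'
        with tempfile.TemporaryDirectory() as storage_dir:
            before = get_data(storage_dir, url)
            save_data(storage_dir, url, [{'name': 'John Doe'}])
            after = get_data(storage_dir, url)
            name = os.path.basename(url_filename(storage_dir, url))
        return before, after, name

Calling ``demo()`` round-trips one comment list through a temporary
directory and also returns the quoted file name used for the URL.
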
Now we'll update the ``__call__`` method to filter *some* responses,
and get the comment data for those.  We don't want to change responses
that were error responses (anything but ``200``), nor do we want to
filter responses that aren't HTML.  So we get:

.. code-block:: python

    class Commenter(object):
        ...

        def __call__(self, environ, start_response):
            req = Request(environ)
            resp = req.get_response(self.app)
            if resp.content_type != 'text/html' or resp.status_code != 200:
                return resp(environ, start_response)
            data = self.get_data(req.url)
            # ... do stuff with data, update resp ...
            return resp(environ, start_response)

So far we're punting on actually adding the comments to the page.  We
also haven't defined what ``data`` will hold.  Let's say it's a list
of dictionaries, where each dictionary looks like ``{'name': 'John
Doe', 'homepage': 'http://blog.johndoe.com', 'comments': 'Great
site!'}``.  Each dictionary also carries a ``time`` entry (the
``time.gmtime()`` value recorded when the comment is saved), which the
display code formats.

We'll also need a simple method to add stuff to the page.  We'll use a
regular expression to find the end of the page and put text in:

.. code-block:: python

    import re

    class Commenter(object):
        ...

        _end_body_re = re.compile(r'</body.*?>', re.I|re.S)

        def add_to_end(self, html, extra_html):
            """
            Adds extra_html to the end of the html page (before </body>)
            """
            match = self._end_body_re.search(html)
            if not match:
                return html + extra_html
            else:
                return html[:match.start()] + extra_html + html[match.start():]

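To see both branches of ``add_to_end`` at work, here is a small
standalone sketch, assuming Python 3 and text (``str``) input.  It
reuses the same regular expression as a module-level function rather
than a method; the sample pages are made up for illustration:

.. code-block:: python

    import re

    _end_body_re = re.compile(r'</body.*?>', re.I | re.S)


    def add_to_end(html, extra_html):
        """Insert extra_html before </body>, or append it if there is none."""
        match = _end_body_re.search(html)
        if not match:
            return html + extra_html
        return html[:match.start()] + extra_html + html[match.start():]


    # The closing tag is matched case-insensitively.
    with_body = add_to_end('<html><body><p>Hi</p></BODY></html>', '<hr>')
    # A fragment without </body> simply gets the extra HTML appended.
    without_body = add_to_end('<p>Hi</p>', '<hr>')

Here ``with_body`` ends up with the ``<hr>`` placed just before
``</BODY>``, while ``without_body`` has it at the very end.
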
And then we'll use it like:

.. code-block:: python

    data = self.get_data(req.url)
    body = resp.body
    body = self.add_to_end(body, self.format_comments(data))
    resp.body = body
    return resp(environ, start_response)

We get the body, update it, and put it back in the response.  This
also updates ``Content-Length``.  Then we define:

.. code-block:: python

    import time

    from webob import html_escape

    class Commenter(object):
        ...

        def format_comments(self, comments):
            if not comments:
                return ''
            text = []
            text.append('<hr>')
            text.append('<h2><a name="comment-area"></a>Comments (%s):</h2>' % len(comments))
            for comment in comments:
                text.append('<h3><a href="%s">%s</a> at %s:</h3>' % (
                    html_escape(comment['homepage']), html_escape(comment['name']),
                    time.strftime('%c', comment['time'])))
                # Susceptible to XSS attacks!:
                text.append(comment['comments'])
            return ''.join(text)

We put in a header (with an anchor we'll use later), and a section for
each comment.  Note that ``html_escape`` is the same as ``cgi.escape``
and just turns ``&`` into ``&amp;``, etc.

Because we put in the comment text without quoting it, the page is
susceptible to a `Cross-Site Scripting
<http://en.wikipedia.org/wiki/Cross-site_scripting>`_ attack.  Fixing
that is beyond the scope of this tutorial; you could quote it or clean
it with something like `lxml.html.clean
<http://codespeak.net/lxml/lxmlhtml.html#cleaning-up-html>`_.

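As a rough sketch of the "quote it" option, assuming Python 3 and the
standard library's ``html.escape`` (rather than WebOb's
``html_escape``), the comment text could be escaped before it is added
to the page.  The helper name ``format_comment_body`` is hypothetical:

.. code-block:: python

    import html


    def format_comment_body(comment):
        # Escape the user-supplied text so that any markup in it is shown
        # literally instead of being interpreted by the browser.
        return '<p>%s</p>' % html.escape(comment['comments'])


    safe = format_comment_body({'comments': '<script>alert("hi")</script>'})

This disables *all* markup in comments, which is stricter than cleaning
the HTML but much simpler to get right.
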
Accepting Comments
------------------

All of those pieces *display* comments, but still no one can actually
make comments.  To handle this we'll take a little piece of the URL
space for our own, everything under ``/.comments``, so when someone
POSTs there it will add a comment.

When the request comes in there are two parts to the path:
``SCRIPT_NAME`` and ``PATH_INFO``.  Everything in ``SCRIPT_NAME`` has
already been parsed, and everything in ``PATH_INFO`` has yet to be
parsed.  That means that the URL *without* ``PATH_INFO`` is the path
to the middleware; we can intercept anything else below
``SCRIPT_NAME`` but nothing above it.  The name for the URL without
``PATH_INFO`` is ``req.application_url``.  We have to capture it early
to make sure it doesn't change (since the WSGI application we are
wrapping may update ``SCRIPT_NAME`` and ``PATH_INFO``).

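The way a wrapped application can shift a segment from ``PATH_INFO``
into ``SCRIPT_NAME`` can be illustrated with the standard library's
``wsgiref.util``.  This is only a sketch under stated assumptions:
Python 3, with ``application_uri`` playing a role comparable to
``req.application_url``, and ``peek_first_segment`` as a hypothetical
stand-in for ``req.path_info_peek()``:

.. code-block:: python

    from wsgiref.util import (
        application_uri, setup_testing_defaults, shift_path_info)


    def peek_first_segment(environ):
        """Return the next PATH_INFO segment without consuming it."""
        segments = [s for s in environ.get('PATH_INFO', '').split('/') if s]
        return segments[0] if segments else None


    environ = {'HTTP_HOST': 'localhost:8080',
               'SCRIPT_NAME': '',
               'PATH_INFO': '/.comments/extra'}
    setup_testing_defaults(environ)  # fill in the other required WSGI keys

    peeked = peek_first_segment(environ)    # leaves environ untouched
    base_before = application_uri(environ)  # URL up to SCRIPT_NAME

    consumed = shift_path_info(environ)     # moves a segment to SCRIPT_NAME
    base_after = application_uri(environ)   # the base URL has now changed

After the shift, ``SCRIPT_NAME`` is ``/.comments`` and ``PATH_INFO`` is
``/extra``, so the base URL computed afterwards differs from the one
captured before.  That is why the middleware records its base URL
before calling the wrapped application.
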
So here's what this all looks like:

.. code-block:: python

    class Commenter(object):
        ...

        def __call__(self, environ, start_response):
            req = Request(environ)
            if req.path_info_peek() == '.comments':
                return self.process_comment(req)(environ, start_response)
            # This is the base path of *this* middleware:
            base_url = req.application_url
            resp = req.get_response(self.app)
            if resp.content_type != 'text/html' or resp.status_code != 200:
                # Not an HTML response, we don't want to
                # do anything to it
                return resp(environ, start_response)
            # Make sure the content isn't gzipped:
            resp.decode_content()
            comments = self.get_data(req.url)
            body = resp.body
            body = self.add_to_end(body, self.format_comments(comments))
            body = self.add_to_end(body, self.submit_form(base_url, req))
            resp.body = body
            return resp(environ, start_response)

``base_url`` is the path where the middleware is located (if you run
the example server, it will be ``http://localhost:PORT``).  We use
``req.path_info_peek()`` to look at the next segment of the URL --
what comes after ``base_url``.  If it is ``.comments`` then we handle
it internally and don't pass the request on.

We also put in a little guard, ``resp.decode_content()``, in case the
application returns a gzipped response.

Then we get the data, add the comments, add the *form* to make new
comments, and return the result.

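The reason for the gzip guard is that a compressed body has to be
decompressed before the closing ``</body>`` tag can be located in it.
The sketch below illustrates the idea with Python 3's ``gzip`` module;
``decode_body`` is a hypothetical helper and is not how WebOb
implements ``decode_content()``:

.. code-block:: python

    import gzip


    def decode_body(body, content_encoding):
        """Return the plain bytes of a response body."""
        if content_encoding == 'gzip':
            return gzip.decompress(body)
        return body  # already plain, nothing to undo


    page = b'<html><body><p>Hi</p></body></html>'
    compressed = gzip.compress(page)
    restored = decode_body(compressed, 'gzip')

Only ``restored`` is suitable for the kind of search that
``add_to_end`` performs.
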
submit_form
~~~~~~~~~~~

Here's what the form looks like:

.. code-block:: python

    class Commenter(object):
        ...

        def submit_form(self, base_path, req):
            return '''<h2>Leave a comment:</h2>
            <form action="%s/.comments" method="POST">
             <input type="hidden" name="url" value="%s">
             <table width="100%%">
              <tr><td>Name:</td>
                  <td><input type="text" name="name" style="width: 100%%"></td></tr>
              <tr><td>URL:</td>
                  <td><input type="text" name="homepage" style="width: 100%%"></td></tr>
             </table>
             Comments:<br>
             <textarea name="comments" rows=10 style="width: 100%%"></textarea><br>
             <input type="submit" value="Submit comment">
            </form>
            ''' % (base_path, html_escape(req.url))

Nothing too exciting.  It submits a form with the keys ``url`` (the
URL being commented on), ``name``, ``homepage``, and ``comments``.

process_comment
~~~~~~~~~~~~~~~

If you look at the method call, what we do is call the method then
treat the result as a WSGI application:

.. code-block:: python

    return self.process_comment(req)(environ, start_response)

You could write this as:

.. code-block:: python

    response = self.process_comment(req)
    return response(environ, start_response)

A common pattern in WSGI middleware that *doesn't* use WebOb is to
just do:

.. code-block:: python

    return self.process_comment(environ, start_response)

But the WebOb style makes it easier to modify the response if you want
to; modifying a traditional WSGI response/application output requires
changing your logic flow considerably.

Here's the actual processing code:

.. code-block:: python

    import time

    from webob import exc
    from webob import Response

    class Commenter(object):
        ...

        def process_comment(self, req):
            try:
                url = req.params['url']
                name = req.params['name']
                homepage = req.params['homepage']
                comments = req.params['comments']
            except KeyError, e:
                resp = exc.HTTPBadRequest('Missing parameter: %s' % e)
                return resp
            data = self.get_data(url)
            data.append(dict(
                name=name,
                homepage=homepage,
                comments=comments,
                time=time.gmtime()))
            self.save_data(url, data)
            resp = exc.HTTPSeeOther(location=url+'#comment-area')
            return resp

We either give a Bad Request response (if the form submission is
somehow malformed), or a redirect back to the original page.

The classes in ``webob.exc`` (like ``HTTPBadRequest`` and
``HTTPSeeOther``) are Response subclasses that can be used to quickly
create responses for these non-200 cases where the response body
usually doesn't matter much.

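For comparison, here is roughly what building such a redirect by hand
looks like in plain WSGI, without ``webob.exc``.  This is a sketch
assuming Python 3; ``see_other`` and ``fake_start_response`` are
hypothetical names used only for illustration:

.. code-block:: python

    def see_other(location):
        """Return a tiny WSGI application that redirects to location."""
        def app(environ, start_response):
            body = b'See other: ' + location.encode('utf-8')
            start_response('303 See Other', [
                ('Location', location),
                ('Content-Type', 'text/plain; charset=utf-8'),
                ('Content-Length', str(len(body))),
            ])
            return [body]
        return app


    captured = {}


    def fake_start_response(status, headers):
        # Record what the application sends instead of writing to a socket.
        captured['status'] = status
        captured['headers'] = dict(headers)


    app = see_other('http://localhost:8080/index.html#comment-area')
    body = b''.join(app({}, fake_start_response))

The status line, the ``Location`` header, and the body all have to be
assembled manually here, which is the bookkeeping the ``webob.exc``
classes take care of.
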
Conclusion
----------

This shows how to make response-modifying middleware, which is
probably the most difficult kind of middleware to write with WSGI --
modifying the request is quite simple in comparison, as you simply
update ``environ``.

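As an illustration of that last point, a request-modifying middleware
can be as small as the following sketch.  It assumes Python 3 and
plain WSGI; the class ``SetEnvironKey``, the application ``show_key``,
and the key ``example.greeting`` are all hypothetical:

.. code-block:: python

    class SetEnvironKey(object):
        """Middleware that sets one environ key on every request."""

        def __init__(self, app, key, value):
            self.app = app
            self.key = key
            self.value = value

        def __call__(self, environ, start_response):
            environ[self.key] = self.value            # modify the request...
            return self.app(environ, start_response)  # ...and pass it on


    def show_key(environ, start_response):
        # Inner application: echoes the value the middleware put in environ.
        body = environ.get('example.greeting', 'missing').encode('utf-8')
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [body]


    wrapped = SetEnvironKey(show_key, 'example.greeting', 'hello')
    result = b''.join(wrapped({}, lambda status, headers: None))

The response passes through untouched, so there is no body to parse,
no headers to fix up, and no ``Content-Length`` to recompute.
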