A Do-It-Yourself Framework
++++++++++++++++++++++++++

:author: Ian Bicking <ianb@colorstudy.com>
:revision: $Rev$
:date: $LastChangedDate$

This tutorial has been translated `into Portuguese
<http://montegasppa.blogspot.com/2007/06/um-framework-faa-voc-mesmo.html>`_.

A newer version of this article is available `using WebOb
<http://pythonpaste.org/webob/do-it-yourself.html>`_.

.. contents::

.. comments:

   Explain SCRIPT_NAME/PATH_INFO better

Introduction and Audience
=========================

This short tutorial is meant to teach you a little about WSGI, and as
an example a bit about the architecture that Paste has enabled and
encourages.

This isn't an introduction to all the parts of Paste -- in fact, we'll
only use a few, and explain each part.  This isn't to encourage
everyone to go off and make their own framework (though honestly I
wouldn't mind).  The goal is that when you have finished reading this
you feel more comfortable with some of the frameworks built using this
architecture, and a little more secure that you will understand the
internals if you look under the hood.

What is WSGI?
=============

At its simplest, WSGI is an interface between web servers and web
applications.  We'll explain the mechanics of WSGI below, but a
higher-level view is to say that WSGI lets code pass around web
requests in a fairly formal way.  But there's more!  WSGI is more than
just HTTP.  It might seem like it is just *barely* more than HTTP, but
that little bit is important:

* You pass around a CGI-like environment, which means data like
  ``REMOTE_USER`` (the logged-in username) can be securely passed
  about.

* A CGI-like environment can be passed around with more context --
  specifically, instead of just one path you have two: ``SCRIPT_NAME``
  (how we got here) and ``PATH_INFO`` (what we have left).

* You can -- and often should -- put your own extensions into the WSGI
  environment.  This allows for callbacks, extra information,
  arbitrary Python objects, or whatever you want.  These are things
  you can't put in custom HTTP headers.

This means that WSGI can be used not just between a web server and an
application, but at all levels for communication.  This allows web
applications to become more like libraries -- well encapsulated and
reusable, but still with rich functionality.
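
As a rough illustration of the first and third points, here's a minimal Python 3 sketch (under PEP 3333 body chunks are bytes, unlike the plain strings used elsewhere in this tutorial).  A tiny middleware stores an arbitrary value under a custom environ key, and the wrapped application reads it back alongside ``REMOTE_USER``.  The key ``myapp.greeting`` and the function names are made up for the example.

```python
def greeting_app(environ, start_response):
    # Read an extension key that an outer layer may have added to the
    # environ (the 'myapp.greeting' name is made up for this sketch).
    greeting = environ.get('myapp.greeting', 'Hello')
    # REMOTE_USER is the CGI-style key for the logged-in user, if any.
    user = environ.get('REMOTE_USER', 'stranger')
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [('%s, %s!' % (greeting, user)).encode('utf-8')]


def add_greeting(app, greeting):
    # A tiny middleware: put an arbitrary Python object into the
    # environ, then pass the request on unchanged.
    def middleware(environ, start_response):
        environ['myapp.greeting'] = greeting
        return app(environ, start_response)
    return middleware


wrapped = add_greeting(greeting_app, 'Howdy')
```

Calling ``wrapped`` with an environ containing ``REMOTE_USER`` set to ``'bob'`` produces ``b'Howdy, bob!'``; the extra value travels through the environ, not through HTTP headers.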

Writing a WSGI Application
==========================

The first part is about how to use `WSGI
<http://www.python.org/peps/pep-0333.html>`_ at its most basic.  You
can read the spec, but I'll do a very brief summary:

* You will be writing a *WSGI application*.  That's an object that
  responds to requests.  An application is just a callable object
  (like a function) that takes two arguments: ``environ`` and
  ``start_response``.

* The environment looks a lot like a CGI environment, with keys like
  ``REQUEST_METHOD``, ``HTTP_HOST``, etc.

* The environment also has some special keys like ``wsgi.input`` (the
  input stream, like the body of a POST request).

* ``start_response`` is a function that starts the response -- you
  give the status and headers here.

* Lastly the application returns an iterable with the response body
  (commonly this is just a list of strings, or just a list containing
  one string that is the entire body).
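
To make those points concrete, here's a small Python 3 sketch of an application that touches each piece: it reads the CGI-style ``REQUEST_METHOD`` key, reads the request body from ``wsgi.input``, calls ``start_response``, and returns an iterable of byte strings.  The name ``echo_app`` is just for illustration.

```python
def echo_app(environ, start_response):
    # CGI-style key: the HTTP method of the request.
    method = environ['REQUEST_METHOD']
    # CONTENT_LENGTH may be absent or empty; treat that as no body.
    try:
        length = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        length = 0
    # wsgi.input is a file-like stream holding the request body.
    body = environ['wsgi.input'].read(length) if length > 0 else b''
    # Status line plus a list of (name, value) header tuples.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    # Any iterable of bytes works; a list is the simplest choice.
    return [method.encode('ascii'), b' ', body]
```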

So, here's a simple application::

    def app(environ, start_response):
        start_response('200 OK', [('content-type', 'text/html')])
        return ['Hello world!']

Well... that's unsatisfying.  Sure, you can imagine what it does, but
you can't exactly point your web browser at it.

There are other, cleaner ways to do this, but this tutorial isn't about
*clean*, it's about *easy-to-understand*.  So just add this to the
bottom of your file::

    if __name__ == '__main__':
        from paste import httpserver
        httpserver.serve(app, host='127.0.0.1', port='8080')

Now visit http://localhost:8080 and you should see your new app.
If you want to understand how a WSGI server works, I'd recommend
looking at the `CGI WSGI server
<http://www.python.org/peps/pep-0333.html#the-server-gateway-side>`_
in the WSGI spec.
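
The standard library ships a gateway in that same CGI style, ``wsgiref.handlers.BaseCGIHandler``, which takes the input/output streams and the environ explicitly.  As a sketch (Python 3; the helper name ``run_like_cgi`` is made up), you can push one request through it into an in-memory buffer and look at exactly what a CGI-style server would write out:

```python
import io
from wsgiref.handlers import BaseCGIHandler
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    # The hello-world app from above, with a bytes body for Python 3.
    start_response('200 OK', [('content-type', 'text/html')])
    return [b'Hello world!']


def run_like_cgi(application, environ, request_body=b''):
    """Run a single request through wsgiref's CGI-style gateway and
    return the raw bytes it writes (status, headers, blank line, body)."""
    setup_testing_defaults(environ)  # fill in REQUEST_METHOD, SERVER_NAME, ...
    stdin = io.BytesIO(request_body)
    stdout = io.BytesIO()
    stderr = io.StringIO()  # error tracebacks, if any, end up here
    handler = BaseCGIHandler(stdin, stdout, stderr, environ)
    handler.run(application)
    return stdout.getvalue()
```

For this app the output begins with a ``Status: 200 OK`` line, as CGI expects, followed by the headers (``wsgiref`` adds ``Content-Length`` for a one-element list), a blank line, and the body.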

An Interactive App
------------------

That last app wasn't very interesting.  Let's at least make it
interactive.  To do that we'll give a form, and then parse the form
fields::

    from paste.request import parse_formvars

    def app(environ, start_response):
        fields = parse_formvars(environ)
        if environ['REQUEST_METHOD'] == 'POST':
            start_response('200 OK', [('content-type', 'text/html')])
            return ['Hello, ', fields['name'], '!']
        else:
            start_response('200 OK', [('content-type', 'text/html')])
            return ['<form method="POST">Name: <input type="text" '
                    'name="name"><input type="submit"></form>']

The ``parse_formvars`` function just takes the WSGI environment and
calls the `cgi <http://python.org/doc/current/lib/module-cgi.html>`_
module (the ``FieldStorage`` class) and turns that into a MultiDict.
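
If you'd rather see what that parsing involves without Paste, here's a rough stand-in built on the standard library's ``urllib.parse.parse_qs``.  It's a sketch under stated assumptions: ``parse_form`` is a made-up name, it only understands the query string and ``application/x-www-form-urlencoded`` POST bodies (no multipart file uploads), it assumes UTF-8, and it returns a plain dict shaped roughly like ``MultiDict.mixed()`` (a single value for a field that appears once, a list for a repeated field).

```python
from urllib.parse import parse_qs


def parse_form(environ):
    """Hypothetical stand-in for paste.request.parse_formvars (Python 3)."""
    # Fields from the query string, e.g. ?id=5
    fields = parse_qs(environ.get('QUERY_STRING', ''), keep_blank_values=True)
    content_type = environ.get('CONTENT_TYPE', '')
    if (environ.get('REQUEST_METHOD') == 'POST'
            and content_type.startswith('application/x-www-form-urlencoded')):
        # Read exactly CONTENT_LENGTH bytes from the body stream.
        length = int(environ.get('CONTENT_LENGTH') or 0)
        body = environ['wsgi.input'].read(length).decode('utf-8')
        for name, values in parse_qs(body, keep_blank_values=True).items():
            fields.setdefault(name, []).extend(values)
    # Collapse single-item lists, keep lists for repeated fields.
    return {name: values[0] if len(values) == 1 else values
            for name, values in fields.items()}
```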

Now For a Framework
===================

Now, this probably feels a bit crude.  After all, we're testing for
things like ``REQUEST_METHOD`` to handle more than one thing, and it's
unclear how you can have more than one page.

We want to build a framework, which is just a kind of generic
application.  In this tutorial we'll implement an *object publisher*,
which is something you may have seen in Zope, Quixote, or CherryPy.

Object Publishing
-----------------

In a typical Python object publisher you translate ``/`` to ``.``.  So
``/articles/view?id=5`` turns into ``root.articles.view(id=5)``.  We
have to start with some root object, of course, which we'll pass in...

::

    class ObjectPublisher(object):

        def __init__(self, root):
            self.root = root

        def __call__(self, environ, start_response):
            ...

    app = ObjectPublisher(my_root_object)

We define ``__call__`` to make instances of ``ObjectPublisher``
callable objects, just like a function, and just like WSGI
applications.  Now all we have to do is translate that ``environ``
into the thing we are publishing, then call that thing, then turn the
response into what WSGI wants.
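
Here's a stripped-down sketch of just that translation step, separate from any WSGI plumbing: each ``/``-separated segment is looked up as an attribute with ``getattr``.  The ``Articles`` and ``Root`` classes and the ``resolve`` helper are made-up names for illustration.

```python
class Articles(object):
    def view(self, id):
        return 'Article #%s' % id


class Root(object):
    articles = Articles()


def resolve(root, path):
    """Hypothetical helper: '/articles/view' -> root.articles.view."""
    obj = root
    for name in path.strip('/').split('/'):
        if not name:
            # '/' (or an empty path) means the root object itself.
            continue
        if name.startswith('_'):
            # Don't let URLs reach private attributes.
            raise LookupError('refusing private name %r' % name)
        obj = getattr(obj, name)
    return obj


# /articles/view?id=5  ->  root.articles.view(id='5')
print(resolve(Root(), '/articles/view')(id='5'))  # Article #5
```

Query-string values arrive as strings, which is why ``id`` is ``'5'`` here rather than the integer ``5``.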

The Path
--------

WSGI puts the requested path into two variables: ``SCRIPT_NAME`` and
``PATH_INFO``.  ``SCRIPT_NAME`` is everything that was used up
*getting here*.  ``PATH_INFO`` is everything left over -- it's
the part the framework should be using to find the object.  If you put
the two back together, you get the full path used to get to where we
are right now; this is very useful for generating correct URLs, and
we'll make sure we preserve this.
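
The standard library's ``wsgiref.util.shift_path_info`` does this same bookkeeping: it moves one segment from ``PATH_INFO`` onto the end of ``SCRIPT_NAME`` and returns that segment.  A short Python 3 sketch (the ``walk_path`` name is made up) shows the two variables changing while their concatenation stays the same:

```python
from wsgiref.util import shift_path_info


def walk_path(environ):
    """Consume PATH_INFO one segment at a time, the way the publisher
    below descends through attributes, recording each step as
    (segment, SCRIPT_NAME, PATH_INFO)."""
    steps = []
    while environ.get('PATH_INFO'):
        name = shift_path_info(environ)
        steps.append((name, environ['SCRIPT_NAME'], environ['PATH_INFO']))
    return steps


env = {'SCRIPT_NAME': '/app', 'PATH_INFO': '/articles/view'}
for step in walk_path(env):
    print(step)
# ('articles', '/app/articles', '/view')
# ('view', '/app/articles/view', '')
```

Note that ``shift_path_info`` also normalizes the path (dropping empty and ``.`` segments, for instance), so for unusual paths the pieces may not add back up character for character.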

So here's how we might implement ``__call__``::

    def __call__(self, environ, start_response):
        fields = parse_formvars(environ)
        obj = self.find_object(self.root, environ)
        response_body = obj(**fields.mixed())
        start_response('200 OK', [('content-type', 'text/html')])
        return [response_body]

    def find_object(self, obj, environ):
        path_info = environ.get('PATH_INFO', '')
        if not path_info or path_info == '/':
            # We've arrived!
            return obj
        # PATH_INFO always starts with a /, so we'll get rid of it:
        path_info = path_info.lstrip('/')
        # Then split the path into the "next" chunk, and everything
        # after it ("rest"):
        parts = path_info.split('/', 1)
        next = parts[0]
        if len(parts) == 1:
            rest = ''
        else:
            rest = '/' + parts[1]
        # Hide private methods/attributes:
        assert not next.startswith('_')
        # Now we get the attribute; getattr(a, 'b') is equivalent
        # to a.b...
        next_obj = getattr(obj, next)
        # Now fix up SCRIPT_NAME and PATH_INFO (SCRIPT_NAME may be
        # missing from the environ when it would be empty)...
        environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + '/' + next
        environ['PATH_INFO'] = rest
        # and now parse the remaining part of the URL...
        return self.find_object(next_obj, environ)

And that's it, we've got a framework.
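
If you want to try the publisher without installing Paste, here's a rough Python 3 port of the same ``ObjectPublisher``.  It's a sketch, not a drop-in replacement: it swaps ``parse_formvars`` for the standard library's ``urllib.parse.parse_qs`` and only reads the query string (POST bodies are ignored), keeps just the last value of a repeated field, and encodes the returned string as UTF-8 because Python 3 WSGI bodies must be bytes.

```python
from urllib.parse import parse_qs


class ObjectPublisher(object):

    def __init__(self, root):
        self.root = root

    def __call__(self, environ, start_response):
        # Assumption: fields come from the query string only.
        query = parse_qs(environ.get('QUERY_STRING', ''))
        fields = {name: values[-1] for name, values in query.items()}
        obj = self.find_object(self.root, environ)
        response_body = obj(**fields)
        start_response('200 OK', [('Content-Type', 'text/html; charset=utf-8')])
        return [response_body.encode('utf-8')]

    def find_object(self, obj, environ):
        path_info = environ.get('PATH_INFO', '')
        if not path_info or path_info == '/':
            return obj
        # 'welcome/extra' -> name='welcome', rest='/extra'
        name, sep, tail = path_info.lstrip('/').partition('/')
        rest = sep + tail
        # Hide private methods/attributes:
        assert not name.startswith('_')
        next_obj = getattr(obj, name)
        environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + '/' + name
        environ['PATH_INFO'] = rest
        return self.find_object(next_obj, environ)
```

Hand it a ``Root`` object like the one in the next section and call it with an environ whose ``PATH_INFO`` is ``/welcome`` and ``QUERY_STRING`` is ``name=Bob`` to get ``b'Hello Bob!'`` back.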

Taking It For a Ride
--------------------

Now, let's write a little application.  Put that ``ObjectPublisher``
class (along with ``from paste.request import parse_formvars``) into a
module ``objectpub``::

    from objectpub import ObjectPublisher

    class Root(object):

        # The "index" method:
        def __call__(self):
            return '''
            <form action="welcome">
            Name: <input type="text" name="name">
            <input type="submit">
            </form>
            '''

        def welcome(self, name):
            return 'Hello %s!' % name

    app = ObjectPublisher(Root())

    if __name__ == '__main__':
        from paste import httpserver
        httpserver.serve(app, host='127.0.0.1', port='8080')

Alright, done!  Oh, wait.  There are still some big missing features,
like how do you set headers?  And instead of giving ``404 Not Found``
responses in some places, you'll just get an ``AttributeError``.
We'll fix those up in a later installment...

Give Me More!
-------------

You'll notice some things are missing here.  Most specifically,
there's no way to set the output headers, and the information on the
request is a little slim.

::

    from paste.request import parse_formvars
    # This is just a dictionary-like object that has case-
    # insensitive keys:
    from paste.response import HeaderDict

    class Request(object):
        def __init__(self, environ):
            self.environ = environ
            self.fields = parse_formvars(environ)

    class Response(object):
        def __init__(self):
            self.headers = HeaderDict(
                {'content-type': 'text/html'})

Now I'll teach you a little trick.  We don't want to change the
signature of the methods.  But we can't put the request and response
objects in normal global variables, because we want to be
thread-friendly, and all threads see the same global variables (even
if they are processing different requests).

But Python 2.4 introduced a concept of "thread-local values".  That's
a value that just this one thread can see.  This is in the
`threading.local <http://docs.python.org/lib/module-threading.html>`_
object.  When you create an instance of ``local``, any attributes you
set on that object can only be seen by the thread you set them in.  So
we'll attach the request and response objects here.
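
Here's a small Python 3 sketch of that isolation.  Several threads write to the same ``threading.local`` instance at the same time; each one reads back only its own value, and the main thread, which never set anything, sees no attribute at all.  The ``threading.Barrier`` is only there to make sure all the writes happen before any of the reads; ``handle`` and ``demo`` are made-up names.

```python
import threading

webinfo = threading.local()


def handle(request_id, results, barrier):
    # Each thread stores "its" request on the shared local object...
    webinfo.request = request_id
    # ...waits until every other thread has stored one too...
    barrier.wait()
    # ...and still reads back only the value it set itself.
    results[request_id] = webinfo.request


def demo(n=4):
    results = {}
    barrier = threading.Barrier(n)
    threads = [threading.Thread(target=handle, args=(i, results, barrier))
               for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

After ``demo()`` returns, every thread has recorded its own id, and ``hasattr(webinfo, 'request')`` is still false in the main thread.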

So, let's remind ourselves of what the ``__call__`` function looked
like::

    class ObjectPublisher(object):
        ...

        def __call__(self, environ, start_response):
            fields = parse_formvars(environ)
            obj = self.find_object(self.root, environ)
            response_body = obj(**fields.mixed())
            start_response('200 OK', [('content-type', 'text/html')])
            return [response_body]

Let's update that::

    import threading
    webinfo = threading.local()

    class ObjectPublisher(object):
        ...

        def __call__(self, environ, start_response):
            webinfo.request = Request(environ)
            webinfo.response = Response()
            obj = self.find_object(self.root, environ)
            response_body = obj(**webinfo.request.fields.mixed())
            start_response('200 OK', list(webinfo.response.headers.items()))
            return [response_body]

Now in our method we might do::

    class Root:
        def rss(self):
            webinfo.response.headers['content-type'] = 'text/xml'
            ...

If we were being fancier we would do things like handle `cookies
<http://python.org/doc/current/lib/module-Cookie.html>`_ in these
objects.  But we aren't going to do that now.  You have a framework,
be happy!

WSGI Middleware
===============

`Middleware
<http://www.python.org/peps/pep-0333.html#middleware-components-that-play-both-sides>`_
is where people get a little intimidated by WSGI and Paste.

What is middleware?  Middleware is software that serves as an
intermediary: to the server it looks like an application, and to the
application it wraps it looks like a server.

So let's write one.  We'll write an authentication middleware, so that
you can keep your greeting from being seen by just anyone.

Let's use HTTP authentication, which also can mystify people a bit.
HTTP authentication is fairly simple:

* When authentication is required, we give a ``401 Authentication
  Required`` status with a ``WWW-Authenticate: Basic realm="This
  Realm"`` header

* The client then sends back a header ``Authorization: Basic
  encoded_info``

* The "encoded_info" is a base64-encoded version of
  ``username:password``
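
In code, the encoding step looks roughly like this.  It's a Python 3 sketch: the two function names are made up, and decoding the credentials as UTF-8 is an assumption (the original Basic scheme doesn't pin down a character set).  The ``Aladdin``/``open sesame`` pair is the classic example from the HTTP authentication RFCs.

```python
import base64


def make_basic_auth_header(username, password):
    # Client side: base64-encode "username:password".
    raw = ('%s:%s' % (username, password)).encode('utf-8')
    return 'Basic ' + base64.b64encode(raw).decode('ascii')


def parse_basic_auth_header(header):
    """Server side: return (username, password), or None if the header
    is missing or isn't a well-formed Basic credential."""
    if not header:
        return None
    parts = header.split(None, 1)
    if len(parts) != 2 or parts[0].lower() != 'basic':
        return None
    try:
        decoded = base64.b64decode(parts[1], validate=True).decode('utf-8')
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
    # Split on the first colon only; passwords may contain colons.
    username, sep, password = decoded.partition(':')
    if not sep:
        return None
    return username, password


print(make_basic_auth_header('Aladdin', 'open sesame'))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```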

So how does this work?  Well, we're writing "middleware", which means
we'll typically pass the request on to another application.  We could
change the request, or change the response, but in this case sometimes
we *won't* pass the request on (like, when we need to give that 401
response).

To give an example of a really, really simple middleware, here's one
that capitalizes the response::

    class Capitalizer(object):

        # We generally pass in the application to be wrapped to
        # the middleware constructor:
        def __init__(self, wrap_app):
            self.wrap_app = wrap_app

        def __call__(self, environ, start_response):
            # We call the application we are wrapping with the
            # same arguments we get...
            response_iter = self.wrap_app(environ, start_response)
            # then change the response...
            response_string = ''.join(response_iter)
            return [response_string.upper()]

Technically this isn't quite right, because an application can send
its body two ways (by returning an iterable, or by calling the
``write()`` function that ``start_response`` returns), and this only
handles the first; it also never calls ``close()`` on the iterable.
But we're skimming bits.
`paste.wsgilib.intercept_output
<http://pythonpaste.org/module-paste.wsgilib.html#intercept_output>`_
is a somewhat more thorough implementation of this.
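
For comparison, here's a Python 3 sketch of a capitalizer that copes with both body paths, roughly the idea behind ``intercept_output`` (this is not that function's code, and ``capitalize`` is a made-up name).  It hands the wrapped app a substitute ``start_response`` whose ``write()`` just collects chunks, buffers the whole body, calls ``close()`` on the returned iterable as the spec requires, and recomputes ``Content-Length`` before passing the real status and headers on.

```python
def capitalize(app):
    """Middleware sketch: upper-case the body of any WSGI app."""

    def middleware(environ, start_response):
        captured = {}
        written = []  # chunks the app pushes through write()

        def capture_start_response(status, headers, exc_info=None):
            # Remember the response start instead of sending it yet.
            captured['status'] = status
            captured['headers'] = headers
            captured['exc_info'] = exc_info
            # Whatever the app passes to write() is just collected.
            return written.append

        app_iter = app(environ, capture_start_response)
        try:
            # Iterating can itself call start_response (generator apps).
            chunks = list(app_iter)
        finally:
            # PEP 3333: call close() if the iterable provides one.
            if hasattr(app_iter, 'close'):
                app_iter.close()

        body = b''.join(written + chunks).upper()
        # The body may have changed size, so replace any Content-Length.
        headers = [(name, value) for name, value in captured['headers']
                   if name.lower() != 'content-length']
        headers.append(('Content-Length', str(len(body))))
        start_response(captured['status'], headers, captured['exc_info'])
        return [body]

    return middleware
```

Buffering the whole body is the simple choice here; it means the wrapper can't stream, which is fine for a toy like this.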

.. note::

   This, like a lot of parts of this (now fairly old) tutorial, is
   better, more thorough, and easier using `WebOb
   <http://pythonpaste.org/webob/>`_.  This particular example looks
   like::

       from webob import Request

       class Capitalizer(object):
           def __init__(self, app):
               self.app = app
           def __call__(self, environ, start_response):
               req = Request(environ)
               resp = req.get_response(self.app)
               resp.body = resp.body.upper()
               return resp(environ, start_response)

So here's some code that does something useful, authentication::

    import base64

    class AuthMiddleware(object):

        def __init__(self, wrap_app):
            self.wrap_app = wrap_app

        def __call__(self, environ, start_response):
            if not self.authorized(environ.get('HTTP_AUTHORIZATION')):
                # Essentially self.auth_required is a WSGI application
                # that only knows how to respond with 401...
                return self.auth_required(environ, start_response)
            # But if everything is okay, then pass everything through
            # to the application we are wrapping...
            return self.wrap_app(environ, start_response)

        def authorized(self, auth_header):
            if not auth_header:
                # If they didn't give a header, they'd better log in...
                return False
            # .split(None, 1) means split in two parts on whitespace:
            auth_type, encoded_info = auth_header.split(None, 1)
            if auth_type.lower() != 'basic':
                # Some other scheme (Digest, etc.) that we don't handle:
                return False
            unencoded_info = base64.b64decode(encoded_info).decode('latin-1')
            username, password = unencoded_info.split(':', 1)
            return self.check_password(username, password)

        def check_password(self, username, password):
            # Not very high security authentication...
            return username == password

        def auth_required(self, environ, start_response):
            start_response('401 Authentication Required',
                [('Content-type', 'text/html'),
                 ('WWW-Authenticate', 'Basic realm="this realm"')])
            return ["""
            <html>
             <head><title>Authentication Required</title></head>
             <body>
              <h1>Authentication Required</h1>
              If you can't get in, then stay out.
             </body>
            </html>"""]

.. note::

   Again, here's the same thing with WebOb::

       import base64

       from webob import Request, Response

       class AuthMiddleware(object):
           def __init__(self, app):
               self.app = app
           def __call__(self, environ, start_response):
               req = Request(environ)
               if not self.authorized(req.headers.get('Authorization')):
                   resp = self.auth_required(req)
               else:
                   resp = self.app
               return resp(environ, start_response)
           def authorized(self, header):
               if not header:
                   return False
               auth_type, encoded = header.split(None, 1)
               if not auth_type.lower() == 'basic':
                   return False
               decoded = base64.b64decode(encoded).decode('latin-1')
               username, password = decoded.split(':', 1)
               return self.check_password(username, password)
           def check_password(self, username, password):
               return username == password
           def auth_required(self, req):
               return Response(status=401, headers={'WWW-Authenticate': 'Basic realm="this realm"'},
                               body="""\
            <html>
             <head><title>Authentication Required</title></head>
             <body>
              <h1>Authentication Required</h1>
              If you can't get in, then stay out.
             </body>
            </html>""")

So, how do we use this?

::

    app = ObjectPublisher(Root())
    wrapped_app = AuthMiddleware(app)

    if __name__ == '__main__':
        from paste import httpserver
        httpserver.serve(wrapped_app, host='127.0.0.1', port='8080')

Now you have middleware!  Hurrah!
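
To check the wiring without a browser, here's a Python 3 port of the ``AuthMiddleware`` above, exercised against a stand-in app with hand-built environ dictionaries.  It's a sketch: ``hello_app`` and the ``call`` helper are made up for testing, bodies are bytes, and a malformed ``Authorization`` header is treated as "not logged in" instead of raising.

```python
import base64


class AuthMiddleware(object):

    def __init__(self, wrap_app):
        self.wrap_app = wrap_app

    def __call__(self, environ, start_response):
        if not self.authorized(environ.get('HTTP_AUTHORIZATION')):
            return self.auth_required(environ, start_response)
        return self.wrap_app(environ, start_response)

    def authorized(self, auth_header):
        if not auth_header:
            return False
        try:
            auth_type, encoded_info = auth_header.split(None, 1)
            if auth_type.lower() != 'basic':
                return False
            unencoded_info = base64.b64decode(encoded_info).decode('latin-1')
        except ValueError:
            # No credentials after "Basic", or broken base64.
            return False
        username, _, password = unencoded_info.partition(':')
        return self.check_password(username, password)

    def check_password(self, username, password):
        # The same toy rule as above: the password is the username.
        return username == password

    def auth_required(self, environ, start_response):
        start_response('401 Authentication Required',
                       [('Content-Type', 'text/html'),
                        ('WWW-Authenticate', 'Basic realm="this realm"')])
        return [b'<h1>Authentication Required</h1>']


def hello_app(environ, start_response):
    # Stand-in for ObjectPublisher(Root()).
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello!']


def call(app, auth_header=None):
    # Minimal fake server: returns (status, body) for one GET request.
    environ = {'REQUEST_METHOD': 'GET', 'PATH_INFO': '/'}
    if auth_header is not None:
        environ['HTTP_AUTHORIZATION'] = auth_header
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen['status'] = status
        return lambda data: None

    body = b''.join(app(environ, start_response))
    return seen['status'], body


wrapped_app = AuthMiddleware(hello_app)
bob = 'Basic ' + base64.b64encode(b'bob:bob').decode('ascii')
print(call(wrapped_app))       # the 401 status and login page
print(call(wrapped_app, bob))  # ('200 OK', b'Hello!')
```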

Give Me More Middleware!
------------------------

It's even easier to use other people's middleware than to make your
own, because then you don't have to program.  If you've been following
along, you've probably encountered a few exceptions, and had to look
at the console to see the exception reports.  Let's make that a little
easier, and show the exceptions in the browser...

::

    app = ObjectPublisher(Root())
    wrapped_app = AuthMiddleware(app)
    from paste.exceptions.errormiddleware import ErrorMiddleware
    exc_wrapped_app = ErrorMiddleware(wrapped_app)

Easy!  But let's make it *more* fancy...

::

    app = ObjectPublisher(Root())
    wrapped_app = AuthMiddleware(app)
    from paste.evalexception import EvalException
    exc_wrapped_app = EvalException(wrapped_app)

Serve ``exc_wrapped_app`` instead of ``wrapped_app``, then go make an
error.  Hit the little +'s.  And type stuff into the boxes.
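
It's worth knowing roughly what that kind of middleware does under the covers.  Here's a deliberately minimal Python 3 sketch, *not* Paste's implementation (``ErrorMiddleware`` and ``EvalException`` do a good deal more): it buffers the wrapped app's body, and if anything raises along the way it writes the traceback to ``wsgi.errors`` and returns a 500 page instead.  ``catch_errors`` and its ``show_traceback`` flag are made-up names; sending tracebacks to the browser is only sensible during development.

```python
import sys
import traceback


def catch_errors(app, show_traceback=True):
    """Hypothetical minimal error-catching middleware."""

    def middleware(environ, start_response):
        try:
            app_iter = app(environ, start_response)
            try:
                # Buffer the body so errors raised while iterating
                # (e.g. inside a generator app) are caught as well.
                return [b''.join(app_iter)]
            finally:
                if hasattr(app_iter, 'close'):
                    app_iter.close()
        except Exception:
            tb = traceback.format_exc()
            # Log the traceback where the server expects error output.
            environ['wsgi.errors'].write(tb)
            message = tb if show_traceback else 'Internal Server Error'
            # Passing exc_info lets the server replace a status the app
            # already set, as long as no output has been sent yet (if it
            # has, the server re-raises the error instead).
            start_response('500 Internal Server Error',
                           [('Content-Type', 'text/plain; charset=utf-8')],
                           sys.exc_info())
            return [message.encode('utf-8')]

    return middleware
```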

Conclusion
==========

Now you've created your framework and application (I'm sure it's
much nicer than the one I've given so far).  You might keep writing it
(many people have), but even if you don't, you should be able to
recognize these components in other frameworks now, and you'll have a
better understanding of how they probably work under the covers.

Also check out the version of this tutorial written `using WebOb
<http://pythonpaste.org/webob/do-it-yourself.html>`_.  That tutorial
includes things like **testing** and **pattern-matching dispatch**
(instead of object publishing).