A Do-It-Yourself Framework
++++++++++++++++++++++++++

:author: Ian Bicking <ianb (a] colorstudy.com>
:revision: $Rev$
:date: $LastChangedDate$

This tutorial has been translated `into Portuguese
<http://montegasppa.blogspot.com/2007/06/um-framework-faa-voc-mesmo.html>`_.

A newer version of this article is available `using WebOb
<http://pythonpaste.org/webob/do-it-yourself.html>`_.

.. contents::

.. comments:

   Explain SCRIPT_NAME/PATH_INFO better

Introduction and Audience
=========================

This short tutorial is meant to teach you a little about WSGI, and as
an example a bit about the architecture that Paste has enabled and
encourages.

This isn't an introduction to all the parts of Paste -- in fact, we'll
only use a few, and explain each part.  This isn't to encourage
everyone to go off and make their own framework (though honestly I
wouldn't mind).  The goal is that when you have finished reading this
you feel more comfortable with some of the frameworks built using this
architecture, and a little more secure that you will understand the
internals if you look under the hood.

What is WSGI?
=============

At its simplest WSGI is an interface between web servers and web
applications.  We'll explain the mechanics of WSGI below, but a higher
level view is to say that WSGI lets code pass around web requests in a
fairly formal way.  But there's more!  WSGI is more than just HTTP.
It might seem like it is just *barely* more than HTTP, but that little
bit is important:

* You pass around a CGI-like environment, which means data like
  ``REMOTE_USER`` (the logged-in username) can be securely passed
  about.

* A CGI-like environment can be passed around with more context --
  specifically, instead of just one path you get two: ``SCRIPT_NAME``
  (how we got here) and ``PATH_INFO`` (what we have left).

* You can -- and often should -- put your own extensions into the WSGI
  environment.  This allows for callbacks, extra information,
  arbitrary Python objects, or whatever you want.  These are things
  you can't put in custom HTTP headers.

This means that WSGI can be used not just between a web server and an
application, but can be used at all levels for communication.  This
allows web applications to become more like libraries -- well
encapsulated and reusable, but still with rich reusable functionality.

Writing a WSGI Application
==========================

The first part is about how to use `WSGI
<http://www.python.org/peps/pep-0333.html>`_ at its most basic.  You
can read the spec, but I'll do a very brief summary:

* You will be writing a *WSGI application*.  That's an object that
  responds to requests.  An application is just a callable object
  (like a function) that takes two arguments: ``environ`` and
  ``start_response``.

* The environment looks a lot like a CGI environment, with keys like
  ``REQUEST_METHOD``, ``HTTP_HOST``, etc.

* The environment also has some special keys like ``wsgi.input`` (the
  input stream, like the body of a POST request).

* ``start_response`` is a function that starts the response -- you
  give the status and headers here.

* Lastly the application returns an iterable with the response body
  (commonly this is just a list of strings, or just a list containing
  one string that is the entire body).

So, here's a simple application::

    def app(environ, start_response):
        start_response('200 OK', [('content-type', 'text/html')])
        return ['Hello world!']

Well... that's unsatisfying.  Sure, you can imagine what it does, but
you can't exactly point your web browser at it.

There are other, cleaner ways to do this, but this tutorial isn't about
*clean*, it's about *easy-to-understand*.
So just add this to the
bottom of your file::

    if __name__ == '__main__':
        from paste import httpserver
        httpserver.serve(app, host='127.0.0.1', port='8080')

Now visit http://localhost:8080 and you should see your new app.
If you want to understand how a WSGI server works, I'd recommend
looking at the `CGI WSGI server
<http://www.python.org/peps/pep-0333.html#the-server-gateway-side>`_
in the WSGI spec.

An Interactive App
------------------

That last app wasn't very interesting.  Let's at least make it
interactive.  To do that we'll give a form, and then parse the form
fields::

    from paste.request import parse_formvars

    def app(environ, start_response):
        fields = parse_formvars(environ)
        if environ['REQUEST_METHOD'] == 'POST':
            start_response('200 OK', [('content-type', 'text/html')])
            return ['Hello, ', fields['name'], '!']
        else:
            start_response('200 OK', [('content-type', 'text/html')])
            return ['<form method="POST">Name: <input type="text" '
                    'name="name"><input type="submit"></form>']

The ``parse_formvars`` function just takes the WSGI environment and
calls the `cgi <http://python.org/doc/current/lib/module-cgi.html>`_
module (the ``FieldStorage`` class) and turns that into a MultiDict.

Now For a Framework
===================

Now, this probably feels a bit crude.  After all, we're testing for
things like ``REQUEST_METHOD`` to handle more than one thing, and it's
unclear how you can have more than one page.

We want to build a framework, which is just a kind of generic
application.  In this tutorial we'll implement an *object publisher*,
which is something you may have seen in Zope, Quixote, or CherryPy.

Object Publishing
-----------------

In a typical Python object publisher you translate ``/`` to ``.``.  So
``/articles/view?id=5`` turns into ``root.articles.view(id=5)``.
We have to start with some root object, of course, which we'll pass in...

::

    class ObjectPublisher(object):

        def __init__(self, root):
            self.root = root

        def __call__(self, environ, start_response):
            ...

    app = ObjectPublisher(my_root_object)

We define ``__call__`` to make instances of ``ObjectPublisher``
callable objects, just like a function, and just like WSGI
applications.  Now all we have to do is translate that ``environ``
into the thing we are publishing, then call that thing, then turn the
response into what WSGI wants.

The Path
--------

WSGI puts the requested path into two variables: ``SCRIPT_NAME`` and
``PATH_INFO``.  ``SCRIPT_NAME`` is everything that was used up
*getting here*.  ``PATH_INFO`` is everything left over -- it's
the part the framework should be using to find the object.  If you put
the two back together, you get the full path used to get to where we
are right now; this is very useful for generating correct URLs, and
we'll make sure we preserve this.

So here's how we might implement ``__call__``::

    def __call__(self, environ, start_response):
        fields = parse_formvars(environ)
        obj = self.find_object(self.root, environ)
        response_body = obj(**fields.mixed())
        start_response('200 OK', [('content-type', 'text/html')])
        return [response_body]

    def find_object(self, obj, environ):
        path_info = environ.get('PATH_INFO', '')
        if not path_info or path_info == '/':
            # We've arrived!
            return obj
        # PATH_INFO always starts with a /, so we'll get rid of it:
        path_info = path_info.lstrip('/')
        # Then split the path into the "next" chunk, and everything
        # after it ("rest"):
        parts = path_info.split('/', 1)
        next = parts[0]
        if len(parts) == 1:
            rest = ''
        else:
            rest = '/' + parts[1]
        # Hide private methods/attributes:
        assert not next.startswith('_')
        # Now we get the attribute; getattr(a, 'b') is equivalent
        # to a.b...
        next_obj = getattr(obj, next)
        # Now fix up SCRIPT_NAME and PATH_INFO...
        environ['SCRIPT_NAME'] += '/' + next
        environ['PATH_INFO'] = rest
        # and now parse the remaining part of the URL...
        return self.find_object(next_obj, environ)

And that's it, we've got a framework.

Taking It For a Ride
--------------------

Now, let's write a little application.  Put that ``ObjectPublisher``
class into a module ``objectpub``::

    from objectpub import ObjectPublisher

    class Root(object):

        # The "index" method:
        def __call__(self):
            return '''
            <form action="welcome">
            Name: <input type="text" name="name">
            <input type="submit">
            </form>
            '''

        def welcome(self, name):
            return 'Hello %s!' % name

    app = ObjectPublisher(Root())

    if __name__ == '__main__':
        from paste import httpserver
        httpserver.serve(app, host='127.0.0.1', port='8080')

Alright, done!  Oh, wait.  There are still some big missing features,
like how do you set headers?  And instead of giving ``404 Not Found``
responses in some places, you'll just get an attribute error.  We'll
fix those up in a later installment...

Give Me More!
-------------

You'll notice some things are missing here.  Most specifically,
there's no way to set the output headers, and the information on the
request is a little slim.

::

    # HeaderDict is just a dictionary-like object that has
    # case-insensitive keys:
    from paste.response import HeaderDict

    class Request(object):
        def __init__(self, environ):
            self.environ = environ
            self.fields = parse_formvars(environ)

    class Response(object):
        def __init__(self):
            self.headers = HeaderDict(
                {'content-type': 'text/html'})

Now I'll teach you a little trick.  We don't want to change the
signature of the methods.  But we can't put the request and response
objects in normal global variables, because we want to be
thread-friendly, and all threads see the same global variables (even
if they are processing different requests).

But Python 2.4 introduced a concept of "thread-local values".  That's
a value that just this one thread can see.  This is in the
`threading.local <http://docs.python.org/lib/module-threading.html>`_
object.  When you create an instance of ``local`` any attributes you
set on that object can only be seen by the thread you set them in.  So
we'll attach the request and response objects here.

So, let's remind ourselves of what the ``__call__`` function looked
like::

    class ObjectPublisher(object):
        ...

        def __call__(self, environ, start_response):
            fields = parse_formvars(environ)
            obj = self.find_object(self.root, environ)
            response_body = obj(**fields.mixed())
            start_response('200 OK', [('content-type', 'text/html')])
            return [response_body]

Let's update that::

    import threading
    webinfo = threading.local()

    class ObjectPublisher(object):
        ...

        def __call__(self, environ, start_response):
            webinfo.request = Request(environ)
            webinfo.response = Response()
            obj = self.find_object(self.root, environ)
            response_body = obj(**dict(webinfo.request.fields))
            start_response('200 OK', webinfo.response.headers.items())
            return [response_body]

Now in our method we might do::

    class Root:
        def rss(self):
            webinfo.response.headers['content-type'] = 'text/xml'
            ...

If we were being fancier we would do things like handle `cookies
<http://python.org/doc/current/lib/module-Cookie.html>`_ in these
objects.  But we aren't going to do that now.  You have a framework,
be happy!

WSGI Middleware
===============

`Middleware
<http://www.python.org/peps/pep-0333.html#middleware-components-that-play-both-sides>`_
is where people get a little intimidated by WSGI and Paste.

What is middleware?  Middleware is software that serves as an
intermediary.

So let's write one.  We'll write an authentication middleware, so that
you can keep your greeting from being seen by just anyone.

Let's use HTTP authentication, which also can mystify people a bit.
HTTP authentication is fairly simple:

* When authentication is required, we give a ``401 Authentication
  Required`` status with a ``WWW-Authenticate: Basic realm="This
  Realm"`` header

* The client then sends back a header ``Authorization: Basic
  encoded_info``

* The "encoded_info" is a base-64 encoded version of
  ``username:password``

So how does this work?  Well, we're writing "middleware", which means
we'll typically pass the request on to another application.  We could
change the request, or change the response, but in this case sometimes
we *won't* pass the request on (like, when we need to give that 401
response).
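
To make the header format described above concrete, here's a minimal
sketch of building and parsing that ``Authorization`` header.  It is
written for Python 3 and uses only the standard ``base64`` module.  The
helper names ``make_basic_auth_header`` and ``parse_basic_auth_header``
are made up for this illustration; they aren't part of Paste or WSGI.

```python
import base64


def make_basic_auth_header(username, password):
    """Build the value of an Authorization header for Basic auth.

    Hypothetical helper, for illustration only.
    """
    raw = ('%s:%s' % (username, password)).encode('utf-8')
    return 'Basic ' + base64.b64encode(raw).decode('ascii')


def parse_basic_auth_header(header):
    """Return (username, password), or None if this isn't usable Basic auth.

    Hypothetical helper, for illustration only.
    """
    if not header:
        return None
    # .split(None, 1) splits into at most two parts on whitespace:
    parts = header.split(None, 1)
    if len(parts) != 2 or parts[0].lower() != 'basic':
        return None
    try:
        decoded = base64.b64decode(parts[1]).decode('utf-8')
    except ValueError:
        # Covers malformed base-64 and undecodable bytes alike.
        return None
    if ':' not in decoded:
        return None
    # Only split on the first colon; passwords may contain colons.
    username, password = decoded.split(':', 1)
    return username, password


header = make_basic_auth_header('alice', 'secret')
print(header)
print(parse_basic_auth_header(header))
```

Note that base-64 is an encoding, not encryption: anyone who sees the
header can recover the password, so this scheme only makes sense over
a connection you trust.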

To give an example of a really really simple middleware, here's one
that capitalizes the response::

    class Capitalizer(object):

        # We generally pass in the application to be wrapped to
        # the middleware constructor:
        def __init__(self, wrap_app):
            self.wrap_app = wrap_app

        def __call__(self, environ, start_response):
            # We call the application we are wrapping with the
            # same arguments we get...
            response_iter = self.wrap_app(environ, start_response)
            # then change the response...
            response_string = ''.join(response_iter)
            return [response_string.upper()]

Technically this isn't quite right, because there are two ways to
return the response body (the returned iterable, and the ``write``
callable that ``start_response`` returns), but we're skimming bits.
`paste.wsgilib.intercept_output
<http://pythonpaste.org/module-paste.wsgilib.html#intercept_output>`_
is a somewhat more thorough implementation of this.

.. note::

   This, like a lot of parts of this (now fairly old) tutorial is
   better, more thorough, and easier using `WebOb
   <http://pythonpaste.org/webob/>`_.  This particular example looks
   like::

       from webob import Request

       class Capitalizer(object):
           def __init__(self, app):
               self.app = app
           def __call__(self, environ, start_response):
               req = Request(environ)
               resp = req.get_response(self.app)
               resp.body = resp.body.upper()
               return resp(environ, start_response)

So here's some code that does something useful, authentication::

    class AuthMiddleware(object):

        def __init__(self, wrap_app):
            self.wrap_app = wrap_app

        def __call__(self, environ, start_response):
            if not self.authorized(environ.get('HTTP_AUTHORIZATION')):
                # Essentially self.auth_required is a WSGI application
                # that only knows how to respond with 401...
                return self.auth_required(environ, start_response)
            # But if everything is okay, then pass everything through
            # to the application we are wrapping...
            return self.wrap_app(environ, start_response)

        def authorized(self, auth_header):
            if not auth_header:
                # If they didn't give a header, they better login...
                return False
            # .split(None, 1) means split in two parts on whitespace:
            auth_type, encoded_info = auth_header.split(None, 1)
            assert auth_type.lower() == 'basic'
            # str.decode('base64') is a Python 2 idiom; on Python 3
            # you would use base64.b64decode() instead:
            unencoded_info = encoded_info.decode('base64')
            username, password = unencoded_info.split(':', 1)
            return self.check_password(username, password)

        def check_password(self, username, password):
            # Not very high security authentication...
            return username == password

        def auth_required(self, environ, start_response):
            start_response('401 Authentication Required',
                [('Content-type', 'text/html'),
                 ('WWW-Authenticate', 'Basic realm="this realm"')])
            return ["""
            <html>
             <head><title>Authentication Required</title></head>
             <body>
              <h1>Authentication Required</h1>
              If you can't get in, then stay out.
             </body>
            </html>"""]

.. note::

   Again, here's the same thing with WebOb::

       from webob import Request, Response

       class AuthMiddleware(object):
           def __init__(self, app):
               self.app = app
           def __call__(self, environ, start_response):
               req = Request(environ)
               # .get() returns None when the header is missing,
               # instead of raising KeyError:
               if not self.authorized(req.headers.get('authorization')):
                   resp = self.auth_required(req)
               else:
                   resp = self.app
               return resp(environ, start_response)
           def authorized(self, header):
               if not header:
                   return False
               auth_type, encoded = header.split(None, 1)
               if auth_type.lower() != 'basic':
                   return False
               username, password = encoded.decode('base64').split(':', 1)
               return self.check_password(username, password)
           def check_password(self, username, password):
               return username == password
           def auth_required(self, req):
               return Response(status=401, headers={'WWW-Authenticate': 'Basic realm="this realm"'},
                               body="""\
       <html>
        <head><title>Authentication Required</title></head>
        <body>
         <h1>Authentication Required</h1>
         If you can't get in, then stay out.
        </body>
       </html>""")

So, how do we use this?

::

    app = ObjectPublisher(Root())
    wrapped_app = AuthMiddleware(app)

    if __name__ == '__main__':
        from paste import httpserver
        httpserver.serve(wrapped_app, host='127.0.0.1', port='8080')

Now you have middleware!  Hurrah!

Give Me More Middleware!
------------------------

It's even easier to use other people's middleware than to make your
own, because then you don't have to program.  If you've been following
along, you've probably encountered a few exceptions, and had to look
at the console to see the exception reports.  Let's make that a little
easier, and show the exceptions in the browser...

::

    app = ObjectPublisher(Root())
    wrapped_app = AuthMiddleware(app)
    from paste.exceptions.errormiddleware import ErrorMiddleware
    # debug=True asks for tracebacks to be shown in the browser:
    exc_wrapped_app = ErrorMiddleware(wrapped_app, debug=True)

Easy!  But let's make it *more* fancy...

::

    app = ObjectPublisher(Root())
    wrapped_app = AuthMiddleware(app)
    from paste.evalexception import EvalException
    exc_wrapped_app = EvalException(wrapped_app)

So go make an error now.  And hit the little +'s.  And type stuff in
to the boxes.

Conclusion
==========

Now you've created your framework and application (I'm sure it's
much nicer than the one I've given so far).  You might keep writing it
(many people have so far), but even if you don't you should be able to
recognize these components in other frameworks now, and you'll have a
better understanding of how they probably work under the covers.

Also check out the version of this tutorial written `using WebOb
<http://pythonpaste.org/webob/do-it-yourself.html>`_.  That tutorial
includes things like **testing** and **pattern-matching dispatch**
(instead of object publishing).
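
As a parting sketch, here's one way to poke at the pieces from this
tutorial without running a server at all: play the server's role
yourself and call the application by hand.  This is written for
Python 3 (where WSGI response bodies are bytes) and uses
``wsgiref.util.setup_testing_defaults`` from the standard library to
fill in the environment.  ``hello_app`` and ``call_app`` are names
invented for this illustration, and ``Capitalizer`` is a bytes-based
variation on the middleware shown earlier, not the same code.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    # On Python 3, WSGI response bodies are bytes, not str.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello world!']


class Capitalizer(object):
    """Middleware: wraps another WSGI app and upper-cases its body."""

    def __init__(self, wrap_app):
        self.wrap_app = wrap_app

    def __call__(self, environ, start_response):
        # Pass the request through unchanged, then alter the response.
        body = b''.join(self.wrap_app(environ, start_response))
        return [body.upper()]


def call_app(app, path='/'):
    """Call a WSGI app once, playing the server's role by hand.

    Hypothetical helper, for illustration only.
    """
    environ = {'PATH_INFO': path}
    # Fill in the remaining keys a WSGI app may expect to find:
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers

    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body


print(call_app(hello_app))
print(call_app(Capitalizer(hello_app)))
```

Because an application and a piece of middleware are both just
callables taking ``environ`` and ``start_response``, the same
``call_app`` helper works on either one, wrapped or not.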