The Paste HTTP Server Thread Pool
=================================

This document describes how the thread pool in ``paste.httpserver``
works, and how it can adapt to problems.

Note that all of the configuration parameters listed here are prefixed
with ``threadpool_`` when running through a Paste Deploy configuration.

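The prefix convention can be pictured with a small sketch.  This is
only an illustration of the naming rule, not Paste's own code: the
helper ``split_threadpool_options`` and the ``port`` key are
hypothetical, and the two option names are taken from this document.

.. code-block:: python

    def split_threadpool_options(conf, prefix="threadpool_"):
        """Split a flat config dict into (pool options, everything else).

        Hypothetical helper: keys that start with the prefix are treated
        as thread pool options and have the prefix removed.
        """
        pool, other = {}, {}
        for key, value in conf.items():
            if key.startswith(prefix):
                pool[key[len(prefix):]] = value
            else:
                other[key] = value
        return pool, other


    # Values as they might appear in a Paste Deploy section (all strings).
    conf = {
        "port": "8080",  # hypothetical non-pool setting
        "threadpool_spawn_if_under": "5",
        "threadpool_hung_thread_limit": "30",
    }
    pool_options, server_options = split_threadpool_options(conf)

Here ``pool_options`` ends up keyed by the unprefixed names used in
the rest of this document.
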
Error Cases
-----------

When a WSGI application is called, it's possible that it will block
indefinitely.  There are two basic ways you can manage threads:

* Start a thread on every request, and close it down when the request
  finishes

* Start a pool of threads, and reuse those threads for subsequent
  requests

In both cases things can go wrong.  If you start a thread on every
request you will have an explosion of threads, and with it growing
memory use and a loss of performance.  This can culminate in really
high loads, swapping, and the whole site grinding to a halt.

If you are using a pool of threads, all the threads can simply be used
up.  New requests go into a queue to be processed, but since that
queue never moves forward everyone will just block.  The site
basically freezes, though memory usage doesn't generally get worse.

Paste Thread Pool
-----------------

The thread pool in Paste has some options to walk the razor's edge
between the two techniques, and to try to respond usefully in most
cases.

The pool tracks all worker threads.  Threads can be in a few states:

* Idle, waiting for a request ("idle")

* Working on a request

  * For a reasonable amount of time ("busy")

  * For an unreasonably long amount of time ("hung")

* Thread that should die

  * An exception has been injected that should kill the thread, but it
    hasn't happened yet ("dying")

  * An exception has been injected, but the thread has persisted for
    an unreasonable amount of time ("zombie")

When a request comes in and there are no idle worker threads waiting,
the server looks at the workers; at that point all of them are either
busy or hung.  If too many are hung, another thread is opened up: this
happens when there are fewer than ``spawn_if_under`` busy threads.  So
if you have 10 workers, ``spawn_if_under`` is 5, and there are 6 hung
threads and 4 busy threads, another thread will be opened (bringing
the number of busy threads back to 5).  Later those extra threads may
be collected again if some of the hung threads become un-hung.  A
thread is hung if it has been working for longer than
``hung_thread_limit`` (default 30 seconds).

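The spawn rule can be written down as a short sketch.  This is a
simplified model of the decision described above, not the code in
``paste.httpserver``; the function names and the list of elapsed times
are assumptions made for illustration.

.. code-block:: python

    def classify_working(elapsed, hung_thread_limit=30):
        """Label a working thread 'busy' or 'hung' from its elapsed seconds."""
        return "hung" if elapsed > hung_thread_limit else "busy"


    def should_spawn(elapsed_times, spawn_if_under, hung_thread_limit=30):
        """Return True if an extra worker should be opened.

        elapsed_times holds, for every working thread, the seconds spent
        on its current request.  Only meaningful when no worker is idle.
        """
        busy = sum(
            1 for elapsed in elapsed_times
            if classify_working(elapsed, hung_thread_limit) == "busy"
        )
        return busy < spawn_if_under


    # The example from the text: 10 workers, 6 hung and 4 busy.
    workers = [45, 60, 90, 120, 300, 31] + [1, 2, 5, 10]
    spawn = should_spawn(workers, spawn_if_under=5)

With 4 busy threads and ``spawn_if_under`` set to 5, ``spawn`` is
true; with 5 busy threads it would be false.
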
Every so often, the server will check all the threads for error
conditions.  This happens every ``hung_check_period`` requests
(default 100).  At this time, if there are more threads than needed
(because extra ones were spawned under ``spawn_if_under``), some
threads may be collected.  If any thread has been working for longer
than ``kill_thread_limit`` (default 1800 seconds, i.e., 30 minutes)
then that thread will be killed.

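A rough sketch of the request-counted check follows.  The class
``PeriodicChecker`` and its bookkeeping are hypothetical and only model
the kill decision; collecting surplus threads is left out.

.. code-block:: python

    class PeriodicChecker(object):
        """Hypothetical helper: check once every hung_check_period requests."""

        def __init__(self, hung_check_period=100, kill_thread_limit=1800):
            self.hung_check_period = hung_check_period
            self.kill_thread_limit = kill_thread_limit
            self.requests_since_check = 0

        def on_request(self, started_at, now):
            """Count one request; on every Nth, return thread ids to kill.

            started_at maps a thread id to the time its current request
            began.  Returns None when no check was due.
            """
            self.requests_since_check += 1
            if self.requests_since_check < self.hung_check_period:
                return None
            self.requests_since_check = 0
            return sorted(
                tid for tid, start in started_at.items()
                if now - start > self.kill_thread_limit
            )


    # A short period so the check fires on the third request.
    checker = PeriodicChecker(hung_check_period=3)
    started_at = {1: 0.0, 2: 1900.0}
    results = [checker.on_request(started_at, now=2000.0) for _ in range(3)]

Because the check is driven by incoming requests rather than by a
timer, a quiet server checks its threads less often.
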
To kill a thread the ``ctypes`` module must be installed.  Killing
raises an exception (``SystemExit``) in the thread, which should cause
the thread to stop.  It can take quite a while for this to actually
take effect, sometimes on the order of several minutes.  This uses a
non-public API (hence the ``ctypes`` requirement), and so it might not
work in all cases.  I've tried it in pure Python code and with a hung
socket, and in both cases it worked.  As soon as the thread is killed
(before it is actually dead) another worker is added to the pool.

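The general technique looks roughly like the sketch below, which calls
CPython's ``PyThreadState_SetAsyncExc`` through ``ctypes``.  It assumes
CPython 3 (where the thread id is passed as an unsigned long); the
names ``async_raise`` and ``kill_demo`` are hypothetical, and this is
not a copy of what ``paste.httpserver`` does.

.. code-block:: python

    import ctypes
    import threading
    import time


    def async_raise(thread_id, exc_type):
        """Ask CPython to raise exc_type in the thread with the given id."""
        modified = ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_ulong(thread_id), ctypes.py_object(exc_type))
        if modified == 0:
            raise ValueError("no thread with id %r" % (thread_id,))
        if modified > 1:
            # More than one thread state was touched: undo and complain.
            ctypes.pythonapi.PyThreadState_SetAsyncExc(
                ctypes.c_ulong(thread_id), None)
            raise SystemError("async exception affected several threads")


    def kill_demo():
        """Start a looping worker, inject SystemExit, report the outcome."""
        started = threading.Event()
        cleaned_up = threading.Event()

        def worker():
            try:
                started.set()
                while True:  # stands in for a request that never ends
                    time.sleep(0.01)
            finally:
                cleaned_up.set()  # finally blocks still get executed

        thread = threading.Thread(target=worker)
        thread.daemon = True
        thread.start()
        started.wait()  # inject only once the worker is inside its try block
        async_raise(thread.ident, SystemExit)
        thread.join(5)
        return thread.is_alive(), cleaned_up.is_set()

The exception is only delivered when the target thread next runs
Python bytecode, which is why a thread blocked in a long C call can
take a long time to die.
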
If the killed thread lives longer than ``dying_thread_limit`` (default
300 seconds, i.e., 5 minutes) then it is considered a zombie.

Zombie threads are not handled specially unless you set
``max_zombies_before_die``.  If you set this and there are more than
this many zombie threads, then the entire process will be killed.
This is useful if you are running the server under some process
monitor, such as ``start-stop-daemon``, ``daemontools``, ``runit``, or
with ``paster serve --monitor``.  To make the process die, it may run
``os._exit``, which is considered an impolite way to exit a process
(akin to ``kill -9``).  It *will* try to run the functions registered
with ``atexit`` (except for the thread cleanup functions, which are
the ones that will block so long as there are living threads).

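The zombie rule can be sketched as two small functions.  The function
names and the ``killed_at`` mapping are hypothetical, and treating a
falsy ``max_zombies_before_die`` as "not set" is an assumption of this
sketch.

.. code-block:: python

    def zombie_ids(killed_at, now, dying_thread_limit=300):
        """Return ids of killed threads that outlived dying_thread_limit.

        killed_at maps a thread id to the time the kill was requested.
        """
        return sorted(
            tid for tid, when in killed_at.items()
            if now - when > dying_thread_limit
        )


    def should_exit_process(killed_at, now, max_zombies_before_die,
                            dying_thread_limit=300):
        """True if there are more zombies than max_zombies_before_die."""
        if not max_zombies_before_die:
            return False  # assumption: unset means zombies are ignored
        zombies = zombie_ids(killed_at, now, dying_thread_limit)
        return len(zombies) > max_zombies_before_die


    # Threads 1 and 2 were killed long ago; thread 3 only 50 seconds ago.
    killed_at = {1: 0, 2: 100, 3: 650}
    zombies = zombie_ids(killed_at, now=700)
    exit_with_limit_1 = should_exit_process(killed_at, 700, 1)
    exit_with_limit_2 = should_exit_process(killed_at, 700, 2)

Note that the comparison is "more than", so a limit of 2 tolerates
exactly two zombies.
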
Notification
------------

If you set ``error_email`` (including setting it globally in a Paste
Deploy ``[DEFAULT]`` section) then you will be notified of two error
conditions: when hung threads are killed, and when the process is
killed due to too many zombie threads.

Missed Cases
------------

If you have a worker pool size of 10, and 11 slow or hung requests
come in, the first 10 will get handed off but the server won't know
yet that they will hang.  The last request will stay stuck in a queue
until another request comes in.  When a later request arrives (after
``hung_thread_limit`` seconds) the server will notice the problem and
add more threads, and the 11th request will come through.

If a trickle of bad requests keeps coming in, the number of hung
threads will keep increasing.  With ``hung_check_period`` at its
default of 100 requests, the periodic check may not clean them up
fast enough.

Killing threads is not something Python really supports.  Corruption
of the process, memory leaks, or who knows what might occur.  For the
most part the threads seem to be killed in a fairly simple manner:
an exception is raised, and ``finally`` blocks do get executed.  But
this hasn't been tried much in production, so there's not much
experience with it.

watch_threads
-------------

If you want to see what's going on in your process, you can install
the application ``egg:Paste#watch_threads`` (in the
``paste.debug.watchthreads`` module).  This lets you see requests and
how long they have been running.  In Python 2.5 and later you can see
tracebacks of the running requests; before that you can only see
request data (URLs, User-Agent, etc.).  If you set
``allow_kill = true`` then you can also kill threads from the
application.  The thread pool is intended to run reliably without
intervention, but this can help debug problems or give you some
feeling for what causes problems in the site.

This does open up privacy problems, as it gives you access to all the
request data in the site, including cookies, IP addresses, etc.  It
shouldn't be left on in a public setting.

socket_timeout
--------------

The HTTP server (not the thread pool) also accepts an argument
``socket_timeout``.  It is turned off by default.  You might find it
helpful to turn it on.

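As a reminder of what a socket timeout does in general, here is a
minimal standard-library sketch.  It shows plain Python socket
behaviour, not how ``paste.httpserver`` applies ``socket_timeout``;
the function ``recv_with_timeout`` is hypothetical.

.. code-block:: python

    import socket


    def recv_with_timeout(timeout):
        """Read from a peer that never sends, giving up after timeout seconds."""
        reader, silent_peer = socket.socketpair()
        try:
            reader.settimeout(timeout)
            try:
                reader.recv(1)
            except socket.timeout:
                return "timed out"
            return "got data"
        finally:
            reader.close()
            silent_peer.close()


    outcome = recv_with_timeout(0.1)

Without a timeout the ``recv`` call above would block forever, which
is one way a worker thread ends up hung.
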