# Combiner Explanation
## Talk by ctiller, notes by vjpai

The typical way of doing a critical section:

```
mu.lock()
do_stuff()
mu.unlock()
```

An alternative way of doing it is

```
class combiner {
  run(f) {
    mu.lock()
    f()
    mu.unlock()
  }
  mutex mu;
}

combiner.run(do_stuff)
```

If you have two threads calling the combiner, there will be some kind of
queuing in place. It's called `combiner` because you can pass in more
than one `do_stuff` at once, and they will all run under a common `mu`.

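As a concrete sketch of this blocking combiner (a C++ rendering of the pseudocode above; the class and method names are illustrative, not gRPC's actual API):

```cpp
#include <functional>
#include <mutex>

// Blocking combiner: every callback runs under a shared mutex, so all
// work is serialized. If two threads call Run at once, the second simply
// blocks on the mutex until the first finishes.
class Combiner {
 public:
  void Run(const std::function<void()>& f) {
    std::lock_guard<std::mutex> lock(mu_);
    f();
  }

 private:
  std::mutex mu_;
};
```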
The implementation described above has the problem that it blocks a thread
for a period of time, and this is considered harmful because it's an
application thread that you're blocking.

Instead, we want a combiner with a new set of properties:
* Keep things running in serial execution
* Never sleep the thread
* But maybe allow things to end up running on a different thread from where they were started
* This means that `do_stuff` doesn't necessarily run to completion by the time `combiner.run` returns

```
class combiner {
  mpscq q; // multi-producer single-consumer queue; can be made non-blocking
  state s; // is it empty or executing

  run(f) {
    if (q.push(f)) {
      // q.push returns true if it's the first thing
      while (q.pop(&f)) { // modulo some extra work to avoid races
        f();
      }
    }
  }
}
```

The basic idea is that the first caller to push onto the combiner
executes the work, and then keeps executing functions from the queue
until the combiner is drained.

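The first-pusher-drains idea can be sketched as runnable C++. This is a hedged approximation, not gRPC's implementation: a mutex-guarded deque stands in for the lock-free mpscq, and an atomic counter plays the role of `q.push` reporting whether the queue was empty.

```cpp
#include <atomic>
#include <deque>
#include <functional>
#include <mutex>

// First-pusher-drains combiner. Whoever moves the pending count from 0 to 1
// becomes the drainer and runs callbacks serially until the count hits 0;
// everyone else just enqueues and returns immediately.
class Combiner {
 public:
  void Run(std::function<void()> f) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      q_.push_back(std::move(f));
    }
    if (count_.fetch_add(1) != 0) return;  // another thread is draining
    do {
      std::function<void()> next;
      {
        std::lock_guard<std::mutex> lock(mu_);
        next = std::move(q_.front());
        q_.pop_front();
      }
      next();  // runs serially; may itself call Run, which only enqueues
    } while (count_.fetch_sub(1) != 1);  // stop once everything queued has run
  }

 private:
  std::mutex mu_;
  std::deque<std::function<void()>> q_;
  std::atomic<int> count_{0};
};
```

Note that a callback may call `Run` re-entrantly: the nested call just enqueues, and the current drainer picks the work up afterwards, preserving serial execution without deadlock.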
Our combiner does some additional work, motivated by write-batching.

We have a second tier of `run` called `run_finally`. Anything queued
onto `run_finally` runs after we have drained the main queue, so there is
essentially a finally-queue. This is not guaranteed to be final, only
best-effort: in the process of running a finally item, we might put
something onto the main combiner queue, and then we'll need to re-enter
the drain loop.

`chttp2` runs all ops in the run state, except that when it sees a write it
puts the write into a finally. That way, anything else that gets put into
the combiner can add to that write.

```
class combiner {
  mpscq q; // multi-producer single-consumer queue; can be made non-blocking
  state s; // is it empty or executing
  queue finally; // you can only run_finally when already running something from the combiner

  run(f) {
    if (q.push(f)) {
      // q.push returns true if it's the first thing
      loop:
      while (q.pop(&f)) { // modulo some extra work to avoid races
        f();
      }
      while (finally.pop(&f)) {
        f();
      }
      if (!q.empty()) goto loop; // a finally item may have queued more work
    }
  }
}
```

     92 
     93 So that explains how combiners work in general. In gRPC, there is
     94 `start_batch(..., tag)` and then work only gets activated by somebody
     95 calling `cq::next` which returns a tag. This gives an API-level
     96 guarantee that there will be a thread doing polling to actually make
     97 work happen. However, some operations are not covered by a poller
     98 thread, such as cancellation that doesn't have a completion. Other
     99 callbacks that don't have a completion are the internal work that gets
    100 done before the batch gets completed. We need a condition called
    101 `covered_by_poller` that means that the item will definitely need some
    102 thread at some point to call `cq::next` . This includes those
    103 callbacks that directly cause a completion but also those that are
    104 indirectly required before getting a completion. If we can't tell for
    105 sure for a specific path, we have to assumed it is not covered by
    106 poller.
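One illustrative way to track this condition (purely a sketch; the name `CoverageTracker` and this counter scheme are assumptions, not gRPC's actual bookkeeping):

```cpp
#include <atomic>

// Illustrative accounting: each queued item declares whether some poller
// thread is guaranteed to eventually call cq::next for it. If we cannot
// prove coverage for a path, we conservatively count it as not covered.
class CoverageTracker {
 public:
  void OnQueued(bool covered_by_poller) {
    if (covered_by_poller) covered_.fetch_add(1);
  }
  void OnExecuted(bool covered_by_poller) {
    if (covered_by_poller) covered_.fetch_sub(1);
  }
  // True if at least one still-queued item is covered, i.e. some poller
  // thread will keep the work moving if we return control now.
  bool SomeQueuedItemCovered() const { return covered_.load() > 0; }

 private:
  std::atomic<int> covered_{0};
};
```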

The combiner above has the problem that it keeps draining for a
potentially unbounded amount of time, and that can lead to a huge tail
latency for some operations. So we can tweak it by returning to the
application if we know that it is valid to do so:

```
while (q.pop(&f)) {
  f();
  if (control_can_be_returned && some_still_queued_thing_is_covered_by_poller) {
    offload_combiner_work_to_some_other_thread();
  }
}
```

    121 
    122 `offload` is more than `break`; it does `break` but also causes some
    123 other thread that is currently waiting on a poll to break out of its
    124 poll. This is done by setting up a per-polling-island work-queue
    125 (distributor) wakeup FD. The work-queue is the converse of the combiner; it
    126 tries to spray events onto as many threads as possible to get as much concurrency as possible.
    127 
    128 So `offload` really does:
    129 
    130 ``` 
    131   workqueue.run(continue_from_while_loop);
    132   break;
    133 ```
    134 
    135 This needs us to add another class variable for a `workqueue`
    136 (which is really conceptually a distributor).
    137 
```
workqueue::run(f) {
  q.push(f);
  eventfd.wakeup();
}

workqueue::readable() {
  eventfd.consume();
  q.pop(&f);
  f();
  if (!q.empty()) {
    eventfd.wakeup(); // spray across as many threads as are waiting on this workqueue
  }
}
```

    153 
    154 In principle, `run_finally` could get starved, but this hasn't
    155 happened in practice. If we were concerned about this, we could put a
    156 limit on how many things come off the regular `q` before the `finally`
    157 queue gets processed.
    158 
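One way such a limit could look, as a hedged single-threaded sketch (the constant `kMaxBeforeFinally` and the function name are illustrative, not part of gRPC):

```cpp
#include <deque>
#include <functional>
#include <utility>

// Bound the main-queue work done per pass so the finally queue cannot be
// starved: after at most kMaxBeforeFinally items, drain the finally queue.
constexpr int kMaxBeforeFinally = 16;  // illustrative limit

void DrainOnce(std::deque<std::function<void()>>& q,
               std::deque<std::function<void()>>& finally_q) {
  int ran = 0;
  while (!q.empty() && ran < kMaxBeforeFinally) {
    auto f = std::move(q.front());
    q.pop_front();
    f();
    ++ran;
  }
  while (!finally_q.empty()) {
    auto f = std::move(finally_q.front());
    finally_q.pop_front();
    f();
  }
}
```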
    159