# Output pipelines in gemmlowp

In gemmlowp, the "output pipeline" is the process that takes a final `int32`
accumulator value (the output of the compute/kernel stage), and processes it to
obtain the final value (typically a `uint8` value) and write it to the
destination matrix.
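
As a rough illustration of this definition, the sketch below takes one `int32`
accumulator through an offset, an integer multiplication, a rounding right
shift, and a saturating cast to `uint8`. It is a standalone model, not gemmlowp
code: the names `RequantizeParams` and `RequantizeToUint8` are hypothetical, and
the arithmetic is only assumed to resemble a scale-based quantize-down step. The
actual stages are the ones defined by the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical parameters of a scale-based quantize-down step.
struct RequantizeParams {
  std::int32_t offset;      // added to the accumulator first
  std::int32_t multiplier;  // integer multiplier applied next
  int shift;                // right shift applied last, must be >= 0
};

// Maps one int32 accumulator to uint8: offset, multiply, rounding right
// shift, then clamp to [0, 255]. The intermediate math is done in 64 bits
// so that the multiplication cannot overflow.
inline std::uint8_t RequantizeToUint8(std::int32_t acc,
                                      const RequantizeParams& p) {
  assert(p.shift >= 0 && p.shift < 63);
  const std::int64_t scaled =
      (static_cast<std::int64_t>(acc) + p.offset) * p.multiplier;
  // Adding half of the divisor before shifting rounds to nearest.
  const std::int64_t rounding =
      p.shift > 0 ? (std::int64_t{1} << (p.shift - 1)) : 0;
  // Assumes an arithmetic right shift for negative values, which is what
  // mainstream compilers implement.
  const std::int64_t shifted = (scaled + rounding) >> p.shift;
  const std::int64_t clamped =
      std::min<std::int64_t>(std::max<std::int64_t>(shifted, 0), 255);
  return static_cast<std::uint8_t>(clamped);
}
```

With an offset of 0, a multiplier of 1 and a shift of 2, an accumulator of 1000
maps to 250, while negative accumulators saturate to 0 and large ones to 255.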

gemmlowp is generic in which arithmetic transformations take place in the
output pipeline, so that different users can implement different quantization
paradigms. See [low-precision.md](low-precision.md) and
[quantization.md](quantization.md).

Besides implementing a quantization paradigm, output pipelines are also useful
for implementing fused operations, where a matrix multiplication feeds into
other operations applied to its result, without additional array traversals.
For instance, when implementing neural network inference, one might have a
Convolutional layer with a bias-addition and an activation. One then wants to
feed the result of the matrix multiplication implementing the Convolutional
operator itself directly into the bias-addition and activation function.
gemmlowp's output pipelines allow implementing that: the bias-addition and
activation function are just additional stages in the output pipeline.
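
To make the "no additional array traversals" point concrete, here is a minimal
standalone sketch of a naive matrix multiplication whose bias-addition and
activation are fused into the store of each accumulator. It does not use
gemmlowp. The function name `MatMulWithFusedBiasAndClamp`, the row-major
layout, the per-row bias and the clamp used as an activation are all
assumptions made for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major matrix product C = A * B with a fused epilogue: each int32
// accumulator gets a per-row bias and a clamp ("activation") applied
// before it is stored, so the result is never traversed a second time.
// Hypothetical helper for illustration; not part of gemmlowp.
std::vector<std::int32_t> MatMulWithFusedBiasAndClamp(
    const std::vector<std::int32_t>& a,     // rows x depth
    const std::vector<std::int32_t>& b,     // depth x cols
    const std::vector<std::int32_t>& bias,  // one entry per result row
    int rows, int depth, int cols,
    std::int32_t clamp_min, std::int32_t clamp_max) {
  assert(static_cast<int>(a.size()) == rows * depth);
  assert(static_cast<int>(b.size()) == depth * cols);
  assert(static_cast<int>(bias.size()) == rows);
  std::vector<std::int32_t> c(static_cast<std::size_t>(rows) * cols);
  for (int r = 0; r < rows; ++r) {
    for (int col = 0; col < cols; ++col) {
      std::int32_t acc = 0;  // "kernel" stage: plain accumulation
      for (int d = 0; d < depth; ++d) {
        acc += a[r * depth + d] * b[d * cols + col];
      }
      acc += bias[r];                                       // bias-addition
      acc = std::min(std::max(acc, clamp_min), clamp_max);  // activation
      c[r * cols + col] = acc;  // single write to the destination
    }
  }
  return c;
}
```

The unfused alternative would store the raw accumulators first and then run
separate passes over the result for the bias and the activation, which costs
extra memory traffic for the same values.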

## Usage

The gemmlowp entry point that accepts an arbitrary output pipeline is
`GemmWithOutputPipeline` in [public/gemmlowp.h](../public/gemmlowp.h).

The output pipeline is specified as a `std::tuple` of "output stages", each of
which defines an elementary arithmetic transformation.

All available output stages are defined in
[public/output_stages.h](../public/output_stages.h).
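
The tuple-of-stages idea can be modeled in a few lines of standalone C++. The
sketch below is not gemmlowp code and does not reproduce its API: the stage
types `AddConstantStage` and `ClampStage`, their `Eval` method and the
`RunPipeline` helper are hypothetical names chosen for the illustration. It only
shows how a `std::tuple` of small parameter structs can describe a sequence of
elementary transformations applied in order to one accumulator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <type_traits>

// Hypothetical stage: adds a constant to the value.
struct AddConstantStage {
  std::int32_t value;
  std::int32_t Eval(std::int32_t x) const { return x + value; }
};

// Hypothetical stage: clamps the value to [lo, hi].
struct ClampStage {
  std::int32_t lo;
  std::int32_t hi;
  std::int32_t Eval(std::int32_t x) const {
    assert(lo <= hi);
    return x < lo ? lo : (x > hi ? hi : x);
  }
};

// Base case: every stage has been applied, return the value unchanged.
template <std::size_t I = 0, typename... Stages>
typename std::enable_if<I == sizeof...(Stages), std::int32_t>::type
RunPipeline(const std::tuple<Stages...>&, std::int32_t x) {
  return x;
}

// Recursive case: apply stage I, then hand the result to stage I + 1.
template <std::size_t I = 0, typename... Stages>
typename std::enable_if<(I < sizeof...(Stages)), std::int32_t>::type
RunPipeline(const std::tuple<Stages...>& pipeline, std::int32_t x) {
  return RunPipeline<I + 1>(pipeline, std::get<I>(pipeline).Eval(x));
}
```

For example, `RunPipeline(std::make_tuple(AddConstantStage{-20}, ClampStage{0, 255}), 300)`
evaluates to 255. Swapping the two stages gives 235, since the order of the
tuple elements is the order of evaluation. In this sketch every stage maps
`int32` to `int32`; narrowing to the destination type is left out to keep the
recursion short.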

## Example usage

The best place to see examples of using various output pipelines is the unit
test,

```
test/test.cc
```

specifically in this function:

```
TestOutputStages
```

Separately, a self-contained example showing how to use gemmlowp to compute a
quantized matrix multiplication with a sound quantization paradigm is here:

[doc/quantization_example.cc](quantization_example.cc)