## Introduction

In this directory we publish simple tools to analyze backward reference distance distributions in LZ77 compression. We developed these tools to find more efficient ways of encoding distances in large-window brotli. In large-window compression the average cost of a backward reference distance is higher, which may allow more advanced encoding strategies, such as delta coding or an increase in context size, to bring significant compression density improvements. Our tools visualize the backward references as histogram images: each pixel shows how many distances of a certain range occur at a certain locality in the data. The human visual system is excellent at pattern detection, so we tried to roughly identify patterns visually before moving on to more quantitative analysis. These tools may also be useful in the development of other LZ77-based compressors, and we hope you try them out.
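As a rough illustration of that pixel mapping, the sketch below bins (position, distance) pairs into a grid of counters. The binning scheme is an assumption and is not taken from `draw_histogram`: columns split the file into `width` equal slices by position, and rows group distances by `floor(log2(distance))`, with longer distances clamped into the last row. The names `Ref`, `FloorLog2` and `BinReferences` are hypothetical.

```c++
#include <cstddef>
#include <cstdint>
#include <vector>

// One backward reference: where it occurs and how far back it points.
struct Ref {
  int64_t pos;
  int64_t dist;
};

// floor(log2(v)) for v >= 1, computed by counting right shifts.
int FloorLog2(uint64_t v) {
  int r = -1;
  while (v != 0) {
    v >>= 1;
    ++r;
  }
  return r;
}

// Hypothetical binning, not necessarily what draw_histogram does:
// column = which of `width` equal slices of the file contains `pos`,
// row = floor(log2(dist)), clamped to the last row.
// Returns a row-major grid of width * height counters.
std::vector<uint32_t> BinReferences(const std::vector<Ref>& refs,
                                    int64_t file_size, int width, int height) {
  std::vector<uint32_t> grid(static_cast<size_t>(width) * height, 0);
  for (const Ref& r : refs) {
    // Skip references that fall outside the file or have no valid distance.
    if (r.pos < 0 || r.pos >= file_size || r.dist < 1) continue;
    int x = static_cast<int>(r.pos * width / file_size);
    int y = FloorLog2(static_cast<uint64_t>(r.dist));
    if (y >= height) y = height - 1;
    ++grid[static_cast<size_t>(y) * width + x];
  }
  return grid;
}
```

A logarithmic row scale keeps both short and very long distances visible in one image; a linear scale would be an equally valid choice for a different kind of analysis.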

## Tools
### find\_opt\_references

This tool generates optimal (match-length-wise) backward references for every position in the input file and stores them in a `*.dist` file, whose format is described below.

Example usage:

    find_opt_references input.txt output.dist
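To make "optimal (match-length-wise)" concrete, here is a deliberately naive sketch that finds, for every position, the longest match starting at any earlier position, preferring the nearest one on ties. It is only an illustration: the tie-breaking rule and the names `Match` and `LongestMatches` are assumptions, and the brute-force search is far too slow for real inputs, where an index such as a suffix array is the usual choice.

```c++
#include <cstddef>
#include <string>
#include <vector>

// Longest earlier match for one position; length == 0 means no match.
struct Match {
  size_t length;
  size_t distance;
};

// Brute-force search: for every position i, compare against every earlier
// start j (nearest first) and keep the longest match. Because only a strictly
// longer match replaces the current one, ties go to the smallest distance.
// Matches may run past position i (overlapping copies), as in LZ77.
std::vector<Match> LongestMatches(const std::string& data) {
  const size_t n = data.size();
  std::vector<Match> result(n, Match{0, 0});
  for (size_t i = 0; i < n; ++i) {
    for (size_t j = i; j-- > 0;) {  // j runs from i - 1 down to 0
      size_t len = 0;
      while (i + len < n && data[j + len] == data[i + len]) ++len;
      if (len > result[i].length) result[i] = Match{len, i - j};
    }
  }
  return result;
}
```

For `"abcabcabc"`, position 3 gets a match of length 6 at distance 3: the copy overlaps the bytes it produces, which LZ77 permits.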

### draw\_histogram

This tool generates a visualization of the distribution of backward references stored in a `*.dist` file. The original file size has to be specified as the second parameter. The output is a grayscale PGM (binary) image.

Example usage:

    draw_histogram input.dist 65536 output.pgm

Here's an example of the resulting image:

![](img/enwik9_brotli.png)
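A minimal sketch of the rendering step, assuming a row-major grid of per-pixel counts: it emits a binary PGM, i.e. a `P5` header with width, height and a maximum gray value of 255, followed by one byte per pixel. The shading (white for empty cells, black for the fullest cell, log-scaled in between) and the name `CountsToPgm` are hypothetical choices, not necessarily what `draw_histogram` does.

```c++
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Renders a row-major grid of counts as a binary PGM (P5) image held in a
// string. Hypothetical shading: empty cells are white (255), the fullest
// cell is black (0), and counts are log-scaled so sparse cells stay visible.
std::string CountsToPgm(const std::vector<uint32_t>& counts, int width,
                        int height) {
  std::string out = "P5\n" + std::to_string(width) + " " +
                    std::to_string(height) + "\n255\n";
  uint32_t max_count = 0;
  for (uint32_t c : counts) max_count = c > max_count ? c : max_count;
  const double denom = std::log1p(static_cast<double>(max_count));
  for (size_t i = 0; i < static_cast<size_t>(width) * height; ++i) {
    // level is 0.0 for an empty cell and 1.0 for the fullest cell.
    double level =
        max_count == 0 ? 0.0 : std::log1p(static_cast<double>(counts[i])) / denom;
    long gray = 255 - std::lround(255.0 * level);
    out.push_back(static_cast<char>(static_cast<unsigned char>(gray)));
  }
  return out;
}
```

The returned string can be written to disk with an `std::ofstream` opened in `std::ios::binary` mode.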

### draw\_diff

This tool generates a diff PPM (binary) image from two input 8-bit PGM (binary) images. The input images must be of the same size. It is useful for comparing different backward reference distributions for the same input file, and is normally used to compare output images of the `draw_histogram` tool.

Example usage:

    draw_diff image1.pgm image2.pgm diff.ppm

For example, the diff of this image

![](img/enwik9_brotli.png)

and this image

![](img/enwik9_opt.png)

looks like this:

![](img/enwik9_diff.png)
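The sketch below shows one plausible way to build such a diff, assuming both inputs are binary PGMs with a maximum gray value of 255 and no `#` comments in the header. The color scheme is an assumption and is not taken from `draw_diff`: the red channel comes from the first image and the green and blue channels from the second, so identical pixels stay gray, pixels brighter in the first image turn reddish, and pixels brighter in the second turn cyan. `Gray`, `ParsePgm` and `DiffPgm` are hypothetical names.

```c++
#include <cstddef>
#include <sstream>
#include <string>

// A decoded 8-bit grayscale image.
struct Gray {
  int width = 0;
  int height = 0;
  std::string pixels;  // width * height bytes, row-major
};

// Parses a binary PGM (P5) with maxval 255. Assumes no '#' comments.
bool ParsePgm(const std::string& data, Gray* img) {
  std::istringstream in(data);
  std::string magic;
  int maxval = 0;
  if (!(in >> magic >> img->width >> img->height >> maxval)) return false;
  if (magic != "P5" || img->width <= 0 || img->height <= 0 || maxval != 255)
    return false;
  in.get();  // exactly one whitespace byte separates the header from pixels
  const size_t n = static_cast<size_t>(img->width) * img->height;
  img->pixels.assign(n, '\0');
  in.read(&img->pixels[0], static_cast<std::streamsize>(n));
  return static_cast<size_t>(in.gcount()) == n;
}

// Builds a binary PPM (P6) diff of two same-sized PGMs: R from image 1,
// G and B from image 2. Returns false on parse errors or a size mismatch.
bool DiffPgm(const std::string& pgm1, const std::string& pgm2,
             std::string* ppm) {
  Gray a, b;
  if (!ParsePgm(pgm1, &a) || !ParsePgm(pgm2, &b)) return false;
  if (a.width != b.width || a.height != b.height) return false;
  *ppm = "P6\n" + std::to_string(a.width) + " " + std::to_string(a.height) +
         "\n255\n";
  for (size_t i = 0; i < a.pixels.size(); ++i) {
    ppm->push_back(a.pixels[i]);  // red
    ppm->push_back(b.pixels[i]);  // green
    ppm->push_back(b.pixels[i]);  // blue
  }
  return true;
}
```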

## Backward distance file format

The format of `*.dist` files is as follows:

    [[     0| match length][     1|position|distance]...]
     [1 byte|      4 bytes][1 byte| 4 bytes| 4 bytes]

More verbose explanation: each backward reference is stored as a position-distance pair, optionally preceded by a copy (match) length. A copy length is prefixed with flag byte 0, and a position-distance pair is prefixed with flag byte 1. Each number is a 32-bit integer. A copy length always comes before its position-distance pair. A standalone copy length (one not followed by a position-distance pair) is allowed; in this case it is ignored.
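The rules above can be exercised with a small in-memory writer and parser. This is a sketch under assumptions: each 32-bit integer is stored in the host's native byte order, a missing copy length is reported as `-1`, and `BackwardRef`, `PutInt`, `WriteRef` and `ParseRefs` are hypothetical names rather than the API of `read_dist.h`.

```c++
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One decoded reference; copy == -1 means no copy length preceded it.
struct BackwardRef {
  int32_t copy;
  int32_t pos;
  int32_t dist;
};

// Appends a 32-bit integer in host byte order (an assumption of this sketch).
static void PutInt(std::vector<uint8_t>* out, int32_t v) {
  uint8_t bytes[4];
  std::memcpy(bytes, &v, 4);
  out->insert(out->end(), bytes, bytes + 4);
}

// Writes [0][copy] (5 bytes) if copy >= 0, then [1][pos][dist] (9 bytes).
void WriteRef(std::vector<uint8_t>* out, const BackwardRef& r) {
  if (r.copy >= 0) {
    out->push_back(0);
    PutInt(out, r.copy);
  }
  out->push_back(1);
  PutInt(out, r.pos);
  PutInt(out, r.dist);
}

// Parses a whole buffer. A copy length applies only to the position-distance
// pair right after it, so a standalone copy length is dropped. Returns false
// on an unknown flag byte or truncated data.
bool ParseRefs(const std::vector<uint8_t>& in, std::vector<BackwardRef>* refs) {
  size_t i = 0;
  int32_t pending_copy = -1;
  auto get_int = [&](int32_t* v) {
    if (in.size() - i < 4) return false;
    std::memcpy(v, &in[i], 4);
    i += 4;
    return true;
  };
  while (i < in.size()) {
    uint8_t flag = in[i++];
    if (flag == 0) {
      // A second copy length overwrites an unused (standalone) one.
      if (!get_int(&pending_copy)) return false;
    } else if (flag == 1) {
      BackwardRef r;
      if (!get_int(&r.pos) || !get_int(&r.dist)) return false;
      r.copy = pending_copy;
      pending_copy = -1;
      refs->push_back(r);
    } else {
      return false;
    }
  }
  return true;
}
```

With this layout a reference that carries a copy length takes 5 + 9 = 14 bytes, while a bare position-distance pair takes 9.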
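
Here's an example of how to read from a `*.dist` file:

```c++
#include <cstdio>
#include "read_dist.h"

int main(int argc, char** argv) {
  std::FILE* fin = argc > 1 ? std::fopen(argv[1], "rb") : nullptr;
  if (fin == nullptr) return 1;
  int copy = -1, pos = -1, dist = -1;
  while (ReadBackwardReference(fin, &copy, &pos, &dist)) {
    // Process one backward reference; here we just print it.
    std::printf("%d %d %d\n", copy, pos, dist);
  }
  std::fclose(fin);
  return 0;
}
```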