# Entropy coder

This module contains a range encoder and a range decoder, which encode integer
data into a string using cumulative distribution functions (CDFs).

## Data and CDF values

The data to be encoded should be non-negative integers in the half-open interval
`[0, m)`. A CDF is then represented as an integer vector of length `m + 1`,
where `CDF(i) = f(Pr(X < i) * 2^precision)` for `i = 0, 1, ..., m`, and
`precision` is an attribute in the range `0 < precision <= 16`. The function `f`
maps real values to integers, e.g., round or floor. To encode a number `i`,
`CDF(i + 1) - CDF(i)` must be nonzero.

Note that the definition uses `Pr(X < i)`, not `Pr(X <= i)`, and therefore
`CDF(0) = 0` always.

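The definition above can be sketched in plain Python. This is only an illustration of the formula, not code from this module: the helper names `quantize_cdf` and `unencodable_symbols` are hypothetical, and round-half-up is assumed as the function `f`.

```python
from itertools import accumulate


def quantize_cdf(pmf, precision):
    """Return [CDF(0), ..., CDF(m)] with CDF(i) = round(Pr(X < i) * 2^precision)."""
    if not 0 < precision <= 16:
        raise ValueError("precision must satisfy 0 < precision <= 16")
    # Pr(X < i) for i = 0..m; the leading 0.0 is Pr(X < 0).
    cumulative = [0.0] + list(accumulate(pmf))
    scale = 1 << precision
    # f is round-half-up here (an assumption); floor is another valid choice.
    return [int(p * scale + 0.5) for p in cumulative]


def unencodable_symbols(cdf):
    """Symbols i whose quantized width CDF(i + 1) - CDF(i) is zero."""
    return [i for i in range(len(cdf) - 1) if cdf[i + 1] == cdf[i]]


cdf = quantize_cdf([0.5, 0.25, 0.25], precision=4)
print(cdf)                       # [0, 8, 12, 16]
print(unencodable_symbols(cdf))  # []

# A rare symbol can round to zero width when precision is low.
coarse = quantize_cdf([0.97, 0.02, 0.01], precision=4)
print(unencodable_symbols(coarse))  # [1, 2]
```

In the second case symbols 1 and 2 could not be encoded with this CDF, so a real pipeline would need a higher `precision` or some adjustment that keeps every width positive.
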
## RangeEncode: data shapes and CDF shapes

A CDF has to be provided for each data element. Therefore the shape of CDF
should be `data.shape + (m + 1,)` in NumPy-like notation. For example, if `data`
is a 2-D tensor of shape (10, 10) and its elements are in `[0, 64)`, then the
CDF tensor should have shape (10, 10, 65).

This may make the CDF tensor too large, and in many applications all data
elements have the same probability distribution. To handle this, `RangeEncode`
supports limited broadcasting of CDF into data. Broadcasting is limited in the
following sense:

- All CDF axes but the last one are broadcast into data, but not the other way
  around,
- The number of CDF axes does not extend, i.e., `CDF.ndim == data.ndim + 1`.

In the previous example where data has shape (10, 10), the following are
acceptable CDF shapes:

- (10, 10, 65)
- (1, 10, 65)
- (10, 1, 65)
- (1, 1, 65)

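The broadcasting rule can be written as a small shape check. This is a sketch of the rule as stated in this section, not the validation the op itself performs; the function name `cdf_shape_is_valid` is hypothetical.

```python
def cdf_shape_is_valid(data_shape, cdf_shape, m):
    """Check the limited-broadcasting rule for data in [0, m)."""
    data_shape, cdf_shape = tuple(data_shape), tuple(cdf_shape)
    # The CDF must have exactly one more axis than data.
    if len(cdf_shape) != len(data_shape) + 1:
        return False
    # The last CDF axis holds the m + 1 CDF values.
    if cdf_shape[-1] != m + 1:
        return False
    # Each leading CDF axis either matches the data axis or is 1.
    # CDF is broadcast into data, never data into CDF.
    return all(c == d or c == 1 for d, c in zip(data_shape, cdf_shape[:-1]))


for shape in [(10, 10, 65), (1, 10, 65), (10, 1, 65), (1, 1, 65)]:
    assert cdf_shape_is_valid((10, 10), shape, m=64)

# Too few axes, and data that would have to broadcast into CDF, are rejected.
assert not cdf_shape_is_valid((10, 10), (65,), m=64)
assert not cdf_shape_is_valid((1, 10), (10, 10, 65), m=64)
```
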
## RangeDecode

`RangeEncode` encodes neither the data shape nor a termination character.
Therefore the decoder must know how many symbols are encoded into the string,
and `RangeDecode` takes the encoded data shape as its second argument. The same
shape restrictions as for `RangeEncode` inputs apply here.

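To see why the decoder needs the element count, consider a toy coder that uses exact fractions. It is not the fixed-precision range coder implemented in this module, only a minimal sketch of the same idea; `toy_encode` and `toy_decode` are hypothetical names, and every symbol is assumed to have a nonzero CDF width.

```python
from fractions import Fraction


def toy_encode(symbols, cdf):
    """Narrow the interval [low, low + width) once per symbol."""
    total = cdf[-1]
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        low += width * Fraction(cdf[s], total)
        width *= Fraction(cdf[s + 1] - cdf[s], total)
    return low  # any value in [low, low + width) identifies the sequence


def toy_decode(code, count, cdf):
    """Read exactly `count` symbols back out of `code`."""
    total = cdf[-1]
    out = []
    for _ in range(count):
        scaled = code * total
        # The symbol s satisfies cdf[s] <= scaled < cdf[s + 1].
        s = max(i for i in range(len(cdf) - 1) if cdf[i] <= scaled)
        out.append(s)
        # Rescale so the next symbol can be read the same way.
        code = (scaled - cdf[s]) / (cdf[s + 1] - cdf[s])
    return out


cdf = [0, 8, 12, 16]
code = toy_encode([0, 2, 1], cdf)
print(toy_decode(code, 3, cdf))  # [0, 2, 1]
print(toy_decode(code, 5, cdf))  # [0, 2, 1, 0, 0]
```

The code value alone does not say where the sequence ends: asking for five symbols yields a different, equally plausible result, so the count (here, the data shape) has to be supplied separately.
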
## Example

```python
import numpy as np
import tensorflow as tf
from tensorflow.contrib import coder

data = tf.random_uniform((128, 128), 0, 10, dtype=tf.int32)

histogram = tf.bincount(data, minlength=10, maxlength=10)
cdf = tf.cumsum(histogram, exclusive=False)
# CDF should have length m + 1.
cdf = tf.pad(cdf, [[1, 0]])
# CDF axis count must be one more than data.
cdf = tf.reshape(cdf, [1, 1, -1])

# Note that data has 2^14 elements, and therefore the last CDF value (the total
# count) is 2^14, which matches precision=14.
data = tf.cast(data, tf.int16)
encoded = coder.range_encode(data, cdf, precision=14)
decoded = coder.range_decode(encoded, tf.shape(data), cdf, precision=14)

# data and decoded should be the same.
sess = tf.Session()
x, y = sess.run((data, decoded))
assert np.all(x == y)
```

## Authors
Sung Jin Hwang (github: [ssjhv](https://github.com/ssjhv)) and Nick Johnston
(github: [nmjohn](https://github.com/nmjohn))