/external/tensorflow/tensorflow/contrib/slim/python/slim/data

| Name | Date | Size |
| --- | --- | --- |
| BUILD | 21-Aug-2018 | 5.3K |
| data_decoder.py | 21-Aug-2018 | 2.2K |
| data_provider.py | 21-Aug-2018 | 4K |
| dataset.py | 21-Aug-2018 | 2.4K |
| dataset_data_provider.py | 21-Aug-2018 | 4.2K |
| dataset_data_provider_test.py | 21-Aug-2018 | 4.9K |
| parallel_reader.py | 21-Aug-2018 | 11.3K |
| parallel_reader_test.py | 21-Aug-2018 | 7.7K |
| prefetch_queue.py | 21-Aug-2018 | 3.5K |
| prefetch_queue_test.py | 21-Aug-2018 | 8.4K |
| README.md | 21-Aug-2018 | 6.2K |
| test_utils.py | 21-Aug-2018 | 3.7K |
| tfexample_decoder.py | 21-Aug-2018 | 18.7K |
| tfexample_decoder_test.py | 21-Aug-2018 | 35.2K |

README.md

# TensorFlow-Slim Data

TF-Slim provides a data loading library that facilitates reading data from
various formats. TF-Slim's data modules are composed of several layers of
abstraction, which makes them flexible enough to support multiple file storage
types (such as TFRecords or text files), data encodings, and feature naming
schemes.

# Overview

The task of loading data has two main components: (1) a specification of how
a dataset is represented, so that it can be read and interpreted, and (2)
instructions for providing the data to consumers of the dataset.

For the second component, one must specify how the data is actually provided
and housed in memory. For example, if the data is sharded over many sources,
should it be read in parallel from these sources? Should it be read serially?
Should the data be shuffled in memory?

# Dataset Specification

TF-Slim defines a dataset to be a set of files (that may or may not be encoded)
representing a finite set of samples, and which can be read to provide a
predefined set of entities or `items`. For example, a dataset might be stored
over thousands of files or in a single file. The files might store the data in
clear text or in some advanced encoding scheme. The dataset might provide a
single `item`, like an image, or several `items`, like an image, a class label
and a scene label.

More concretely, TF-Slim's
[dataset](https://www.tensorflow.org/code/tensorflow/contrib/slim/python/slim/data/dataset.py)
is a tuple that encapsulates the following elements of a dataset specification:

* `data_sources`: A list of file paths that together make up the dataset.
* `reader`: A TensorFlow
[Reader](https://www.tensorflow.org/api_docs/python/io_ops.html#ReaderBase)
appropriate for the file type in `data_sources`.
* `decoder`: A TF-Slim
[data_decoder](https://www.tensorflow.org/code/tensorflow/contrib/slim/python/slim/data/data_decoder.py)
class which is used to decode the content of the read dataset files.
* `num_samples`: The number of samples in the dataset.
* `items_to_descriptions`: A map from the items provided by the dataset to
descriptions of each.

In a nutshell, a dataset is read by (a) opening the files specified by
`data_sources` using the given `reader` class, (b) decoding the files using
the given `decoder`, and (c) allowing the user to request a list of `items` to
be returned as `Tensors`.

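The three reading steps above can be sketched in plain Python. This is only an
illustration and does not import TensorFlow: `DatasetSpec`, `read_lines`,
`split_decoder` and `read_dataset` are hypothetical stand-ins for the dataset
tuple, a `Reader`, a `data_decoder` and the reading loop, and the in-memory
`FAKE_FILES` dictionary stands in for real files.

```python
import collections

# Hypothetical stand-in for the TF-Slim dataset tuple. The field names follow
# the list above, but this is not the real class.
DatasetSpec = collections.namedtuple(
    'DatasetSpec',
    ['data_sources', 'reader', 'decoder', 'num_samples',
     'items_to_descriptions'])

# In-memory "files" so that the sketch is self-contained (assumption).
FAKE_FILES = {
    'shard-0.txt': 'cat,0\ndog,1',
    'shard-1.txt': 'bird,2',
}


def read_lines(source):
  """(a) 'Reader': yields one raw record per line of a source."""
  for line in FAKE_FILES[source].splitlines():
    yield line


def split_decoder(record, items):
  """(b) 'Decoder': turns a raw record into the requested items."""
  name, label = record.split(',')
  decoded = {'name': name, 'label': int(label)}
  return [decoded[item] for item in items]


spec = DatasetSpec(
    data_sources=['shard-0.txt', 'shard-1.txt'],
    reader=read_lines,
    decoder=split_decoder,
    num_samples=3,
    items_to_descriptions={
        'name': 'The class name as a string.',
        'label': 'The integer class id.',
    })


def read_dataset(spec, items):
  """(c) Returns the requested items for every sample in the dataset."""
  samples = []
  for source in spec.data_sources:
    for record in spec.reader(source):
      samples.append(spec.decoder(record, items))
  return samples


samples = read_dataset(spec, ['label', 'name'])
```

In this sketch the caller chooses both which `items` to receive and their
order, while the specification alone determines where the records come from
and how they are decoded.
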
## Data Decoders

A
[data_decoder](https://www.tensorflow.org/code/tensorflow/contrib/slim/python/slim/data/data_decoder.py)
is a class which is given some (possibly serialized/encoded) data and returns a
list of `Tensors`. In particular, a given data decoder is able to decode a
predefined list of `items` and can return a subset or all of them, when
requested:

```python
# Load the data
my_encoded_data = ...
data_decoder = MyDataDecoder()

# Decode the inputs and labels:
decoded_input, decoded_labels = data_decoder.decode(
    my_encoded_data, ['input', 'labels'])

# Decode just the inputs (a list is returned, so unpack its single element):
[decoded_input] = data_decoder.decode(my_encoded_data, ['input'])

# Check which items a data decoder knows how to decode:
for item in data_decoder.list_items():
  print(item)
```

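To make this contract concrete, here is a minimal sketch of a decoder with the
two methods used above. `DataDecoderSketch` and `CsvDecoder` are hypothetical,
pure-Python stand-ins; the record format is invented for the example, and the
methods return plain Python values rather than `Tensors`.

```python
import abc


class DataDecoderSketch(abc.ABC):
  """Hypothetical stand-in for the abstract data decoder interface."""

  @abc.abstractmethod
  def decode(self, data, items):
    """Returns a list of decoded values, one per requested item."""

  @abc.abstractmethod
  def list_items(self):
    """Returns the names of the items this decoder can produce."""


class CsvDecoder(DataDecoderSketch):
  """Decodes records of the form '<input>,<label>' (illustrative format)."""

  def decode(self, data, items):
    raw_input, raw_label = data.split(',')
    decoded = {'input': float(raw_input), 'labels': int(raw_label)}
    unknown = [item for item in items if item not in decoded]
    if unknown:
      raise ValueError('Unknown items: %s' % unknown)
    # Preserve the order in which the caller asked for the items.
    return [decoded[item] for item in items]

  def list_items(self):
    return ['input', 'labels']


csv_decoder = CsvDecoder()
both = csv_decoder.decode('0.5,3', ['labels', 'input'])
only_input = csv_decoder.decode('0.5,3', ['input'])
```

Note that requesting a single item still yields a one-element list, which is
why a caller would unpack the result.
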
## Example: TFExampleDecoder

The
[tfexample_decoder.py](https://www.tensorflow.org/code/tensorflow/contrib/slim/python/slim/data/tfexample_decoder.py)
module provides a data decoder which decodes serialized `TFExample` protocol
buffers. A `TFExample` protocol buffer is a map from keys (strings) to
features, each of which is parsed as either a `tf.FixedLenFeature` or a
`tf.VarLenFeature`. Consequently, to decode a `TFExample`, one must provide a
mapping from one or more `TFExample` fields to each of the `items` that the
`tfexample_decoder` can provide. For example, a dataset of `TFExamples` might
store images in various formats and each `TFExample` might contain an
`encoding` key and a `format` key which can be used to decode the image using
the appropriate decoder (jpg, png, etc.).

To make this possible, the `tfexample_decoder` is constructed by specifying
a map of `TFExample` keys to either `tf.FixedLenFeature` or
`tf.VarLenFeature`, as well as a set of `ItemHandlers`. An `ItemHandler`
provides a mapping from `TFExample` keys to the item being provided. Because a
`tfexample_decoder` might return multiple `items`, one often constructs a
`tfexample_decoder` using multiple `ItemHandlers`.

`tfexample_decoder` provides some predefined `ItemHandlers` which take care
of the common cases of mapping `TFExamples` to images, `Tensors` and
`SparseTensors`. For example, the following specification might be
used to decode a dataset of images:

```python
import tensorflow as tf
from tensorflow.contrib.slim.python.slim.data import tfexample_decoder

# How each key of the serialized TFExample should be parsed.
keys_to_features = {
    'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
    'image/format': tf.FixedLenFeature((), tf.string, default_value='raw'),
    'image/class/label': tf.FixedLenFeature(
        [1], tf.int64, default_value=tf.zeros([1], dtype=tf.int64)),
}

# How the parsed keys are combined into the items exposed to the user.
items_to_handlers = {
    'image': tfexample_decoder.Image(
        image_key='image/encoded',
        format_key='image/format',
        shape=[28, 28],
        channels=1),
    'label': tfexample_decoder.Tensor('image/class/label'),
}

decoder = tfexample_decoder.TFExampleDecoder(
    keys_to_features, items_to_handlers)
```

Notice that the `TFExample` is parsed using three keys: `image/encoded`,
`image/format` and `image/class/label`. Additionally, the first two keys are
mapped to a single `item` named 'image'. As defined, this `data_decoder`
provides two `items` named 'image' and 'label'.

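The relationship between keys and `items` can also be sketched without
TensorFlow. In the illustration below a parsed example is a plain dictionary,
and `PassThroughHandler`, `ImageHandler`, `ExampleDecoderSketch` and the
`keys_to_item` method are hypothetical stand-ins for the real `ItemHandler`
classes and `TFExampleDecoder`. The byte "decoding" is a placeholder, not real
image decoding.

```python
class PassThroughHandler(object):
  """Maps a single key directly to an item."""

  def __init__(self, key):
    self.keys = [key]

  def keys_to_item(self, keys_to_values):
    return keys_to_values[self.keys[0]]


class ImageHandler(object):
  """Maps two keys (encoded bytes and format) to a single 'image' item."""

  def __init__(self, image_key, format_key):
    self.keys = [image_key, format_key]

  def keys_to_item(self, keys_to_values):
    image_key, format_key = self.keys
    encoded = keys_to_values[image_key]
    image_format = keys_to_values[format_key]
    # Placeholder for format-specific decoding (jpg, png, raw, ...).
    return {'format': image_format, 'pixels': list(encoded)}


class ExampleDecoderSketch(object):
  """Hypothetical decoder that dispatches each item to its handler."""

  def __init__(self, items_to_handlers):
    self._items_to_handlers = items_to_handlers

  def list_items(self):
    return sorted(self._items_to_handlers)

  def decode(self, parsed_example, items=None):
    items = items or self.list_items()
    outputs = []
    for item in items:
      handler = self._items_to_handlers[item]
      # Hand each handler only the keys it declared.
      subset = {key: parsed_example[key] for key in handler.keys}
      outputs.append(handler.keys_to_item(subset))
    return outputs


sketch_decoder = ExampleDecoderSketch({
    'image': ImageHandler('image/encoded', 'image/format'),
    'label': PassThroughHandler('image/class/label'),
})
parsed = {
    'image/encoded': b'\x00\x01\x02',
    'image/format': 'raw',
    'image/class/label': [7],
}
image, label = sketch_decoder.decode(parsed, ['image', 'label'])
```

As in the specification above, three keys go in and two `items` come out,
because the image handler consumes two keys.
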
# Data Provision

A
[data_provider](https://www.tensorflow.org/code/tensorflow/contrib/slim/python/slim/data/data_provider.py)
is a class which provides `Tensors` for each item requested:

```python
my_data_provider = ...
image, class_label, bounding_box = my_data_provider.get(
    ['image', 'label', 'bb'])
```

The
[dataset_data_provider](https://www.tensorflow.org/code/tensorflow/contrib/slim/python/slim/data/dataset_data_provider.py)
is a `data_provider` that provides data from a given `dataset` specification:

```python
dataset = GetDataset(...)
data_provider = dataset_data_provider.DatasetDataProvider(
    dataset, common_queue_capacity=32, common_queue_min=8)
```

The `dataset_data_provider` enables control over several elements of data
provision:

* How many concurrent readers are used.
* Whether the data is shuffled as it's loaded into its queue.
* Whether to take a single pass over the data or read data indefinitely.

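These three controls can be illustrated with a small pure-Python generator.
The sketch below is not the TF-Slim implementation, which relies on TensorFlow
queues. The function `provide` and its parameters `num_readers`, `shuffle`,
`num_epochs` and `seed` are names chosen for this illustration, and the
"concurrent" readers are simply interleaved round-robin rather than run on
real threads.

```python
import itertools
import random

_MISSING = object()  # Fill value for shards that run out early.


def provide(shards, num_readers=1, shuffle=False, num_epochs=None, seed=0):
  """Yields records from `shards`, a list with one record list per source.

  Args:
    shards: one list of records per data source.
    num_readers: how many shards are read together (interleaved here).
    shuffle: whether to shuffle each interleaved group of records.
    num_epochs: number of passes over the data; None repeats indefinitely.
    seed: seed for the shuffling, so that runs are repeatable.
  """
  rng = random.Random(seed)
  epochs = itertools.count() if num_epochs is None else range(num_epochs)
  for _ in epochs:
    # Process the shards in groups of `num_readers`.
    for start in range(0, len(shards), num_readers):
      group = shards[start:start + num_readers]
      # Round-robin over the group, skipping shards that ran out.
      interleaved = itertools.zip_longest(*group, fillvalue=_MISSING)
      records = [r for row in interleaved for r in row if r is not _MISSING]
      if shuffle:
        rng.shuffle(records)
      for record in records:
        yield record


shards = [['a0', 'a1', 'a2'], ['b0', 'b1'], ['c0']]

serial = list(provide(shards, num_readers=1, num_epochs=1))
parallel = list(provide(shards, num_readers=2, num_epochs=1))
shuffled = list(provide(shards, num_readers=3, shuffle=True, num_epochs=1))
# With num_epochs=None the generator never ends, so take a finite slice.
endless = list(itertools.islice(provide(shards, num_epochs=None), 8))
```

With a single reader the shards are consumed one after another, with two
readers the first two shards are interleaved, and leaving `num_epochs` unset
makes the stream wrap around to the first shard once the data is exhausted.
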