# Image Recognition

Our brains make vision seem easy. It doesn't take any effort for humans to
tell apart a lion and a jaguar, read a sign, or recognize a human's face.
But these are actually hard problems to solve with a computer: they only
seem easy because our brains are incredibly good at understanding images.

In the last few years, the field of machine learning has made tremendous
progress on addressing these difficult problems. In particular, we've
found that a kind of model called a deep
[convolutional neural network](https://colah.github.io/posts/2014-07-Conv-Nets-Modular/)
can achieve reasonable performance on hard visual recognition tasks,
matching or exceeding human performance in some domains.

Researchers have demonstrated steady progress
in computer vision by validating their work against
[ImageNet](http://www.image-net.org), an academic benchmark for computer vision.
Successive models continue to show improvements, each time achieving
a new state-of-the-art result:
[QuocNet], [AlexNet], [Inception (GoogLeNet)], [BN-Inception-v2].
Researchers both internal and external to Google have published papers describing all
these models, but the results are still hard to reproduce.
We're now taking the next step by releasing code for running image recognition
on our latest model, [Inception-v3].
     25 
[QuocNet]: https://static.googleusercontent.com/media/research.google.com/en//archive/unsupervised_icml2012.pdf
[AlexNet]: https://www.cs.toronto.edu/~fritz/absps/imagenet.pdf
[Inception (GoogLeNet)]: https://arxiv.org/abs/1409.4842
[BN-Inception-v2]: https://arxiv.org/abs/1502.03167
[Inception-v3]: https://arxiv.org/abs/1512.00567

Inception-v3 is trained for the [ImageNet] Large Visual Recognition Challenge
using the data from 2012. This is a standard task in computer vision,
where models try to classify entire
images into [1000 classes], like "Zebra", "Dalmatian", and "Dishwasher".
For example, here are the results from [AlexNet] classifying some images:

<div style="width:50%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="https://www.tensorflow.org/images/AlexClassification.png">
</div>
     41 
To compare models, we examine how often a model fails to predict the
correct answer as one of its top 5 guesses, termed the "top-5 error rate".
[AlexNet] achieved a top-5 error rate of 15.3% on the 2012
validation data set; [Inception (GoogLeNet)] achieved 6.67%;
[BN-Inception-v2] achieved 4.9%; [Inception-v3] reaches 3.46%.
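
As a rough illustration of the metric, the sketch below computes a top-k error rate from per-image class scores. The function name `TopKErrorRate` and the data layout (one score vector per image) are assumptions made for this example; it is not taken from any ImageNet evaluation tool.

```cpp
#include <cstddef>
#include <vector>

// Returns the fraction of examples whose true label is NOT among the k
// highest-scoring classes. scores[i] holds one score per class for example i,
// and labels[i] is the index of the correct class. Hypothetical helper.
double TopKErrorRate(const std::vector<std::vector<float>>& scores,
                     const std::vector<int>& labels, std::size_t k) {
  if (scores.empty()) return 0.0;
  std::size_t misses = 0;
  for (std::size_t i = 0; i < scores.size(); ++i) {
    const std::vector<float>& row = scores[i];
    // Count the classes that score strictly higher than the true class. The
    // true class is in the top k exactly when fewer than k classes beat it.
    const float true_score = row[labels[i]];
    std::size_t better = 0;
    for (float s : row) {
      if (s > true_score) ++better;
    }
    if (better >= k) ++misses;
  }
  return static_cast<double>(misses) / scores.size();
}
```

Calling this with `k = 5` over a validation set gives the top-5 error rate quoted above; ties are resolved in the model's favor in this sketch.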
     47 
> How well do humans do on the ImageNet Challenge? There's a [blog post] by
Andrej Karpathy, who attempted to measure his own performance. He reached
a 5.1% top-5 error rate.

[ImageNet]: http://image-net.org/
[1000 classes]: http://image-net.org/challenges/LSVRC/2014/browse-synsets
[blog post]: https://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/

This tutorial will teach you how to use [Inception-v3]. You'll learn how to
classify images into [1000 classes] in Python or C++. We'll also discuss how to
extract higher-level features from this model, which may be reused for other
vision tasks.

We're excited to see what the community will do with this model.
     62 
     63 
## Usage with the Python API

`classify_image.py` downloads the trained model from `tensorflow.org`
when the program is run for the first time. You'll need about 200 MB of free space
available on your hard disk.

Start by cloning the [TensorFlow models repo](https://github.com/tensorflow/models) from GitHub. Then run the following commands:

    cd models/tutorials/image/imagenet
    python classify_image.py

The above command will classify a supplied image of a panda bear.

<div style="width:15%; margin:auto; margin-bottom:10px; margin-top:20px;">
  <img style="width:100%" src="https://www.tensorflow.org/images/cropped_panda.jpg">
</div>

If the model runs correctly, the script will produce the following output:

    giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.88493)
    indri, indris, Indri indri, Indri brevicaudatus (score = 0.00878)
    lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00317)
    custard apple (score = 0.00149)
    earthstar (score = 0.00127)

If you wish to supply other JPEG images, you may do so by setting
the `--image_file` argument.

> If you download the model data to a different directory, you
will need to point `--model_dir` to the directory used.
     94 
## Usage with the C++ API

You can run the same [Inception-v3] model in C++ for use in production
environments. You can download the archive containing the GraphDef that defines
the model like this (running from the root directory of the TensorFlow
repository):

```bash
curl -L "https://storage.googleapis.com/download.tensorflow.org/models/inception_v3_2016_08_28_frozen.pb.tar.gz" |
  tar -C tensorflow/examples/label_image/data -xz
```

Next, we need to compile the C++ binary that includes the code to load and run the graph.
If you've followed
@{$install_sources$the instructions to download the source installation of TensorFlow}
for your platform, you should be able to build the example by
running this command from your shell terminal:

```bash
bazel build tensorflow/examples/label_image/...
```

That should create a binary executable that you can then run like this:

```bash
bazel-bin/tensorflow/examples/label_image/label_image
```

This uses the default example image that ships with the framework, and should
output something similar to this:

```
I tensorflow/examples/label_image/main.cc:206] military uniform (653): 0.834306
I tensorflow/examples/label_image/main.cc:206] mortarboard (668): 0.0218692
I tensorflow/examples/label_image/main.cc:206] academic gown (401): 0.0103579
I tensorflow/examples/label_image/main.cc:206] pickelhaube (716): 0.00800814
I tensorflow/examples/label_image/main.cc:206] bulletproof vest (466): 0.00535088
```

In this case, we're using the default image of
[Admiral Grace Hopper](https://en.wikipedia.org/wiki/Grace_Hopper), and you can
see the network correctly identifies she's wearing a military uniform, with a high
score of about 0.83.
    137 
    138 
<div style="width:45%; margin:auto; margin-bottom:10px; margin-top:20px;">
  <img style="width:100%" src="https://www.tensorflow.org/images/grace_hopper.jpg">
</div>

Next, try it out on your own images by supplying the `--image=` argument, for example:

```bash
bazel-bin/tensorflow/examples/label_image/label_image --image=my_image.png
```
    148 
If you look inside the [`tensorflow/examples/label_image/main.cc`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/label_image/main.cc)
file, you can find out
how it works. We hope this code will help you integrate TensorFlow into
your own applications, so we will walk step by step through the main functions.

The command line flags control where the files are loaded from, and properties of the input images.
The model expects to get square 299x299 RGB images, which is what the `input_width`
and `input_height` flags specify. We also need to scale the pixel values from integers that
are between 0 and 255 to the floating point values that the graph operates on.
We control the scaling with the `input_mean` and `input_std` flags: we first subtract
`input_mean` from each pixel value, then divide the result by `input_std`.

These values probably look somewhat magical, but they were simply chosen by the
original model author to match the input images used for
training. If you have a graph that you've trained yourself, you'll just need
to adjust the values to match whatever you used during your training process.
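
The arithmetic behind those two flags is simple enough to sketch without TensorFlow. The helper below, whose name `NormalizePixels` is made up for this example, applies the same subtract-then-divide step to a buffer of 8-bit pixel values. It is only an illustration of the math, not the code path `main.cc` uses.

```cpp
#include <cstdint>
#include <vector>

// Converts 8-bit pixel values to floats using (value - mean) / std, following
// the subtract-then-divide step described in the text. Hypothetical helper,
// not part of TensorFlow.
std::vector<float> NormalizePixels(const std::vector<std::uint8_t>& pixels,
                                   float input_mean, float input_std) {
  std::vector<float> out;
  out.reserve(pixels.size());
  for (std::uint8_t p : pixels) {
    out.push_back((static_cast<float>(p) - input_mean) / input_std);
  }
  return out;
}
```

With an illustrative mean and standard deviation of 128, for instance, the input range 0 to 255 maps to roughly -1 to 1. These numbers are chosen for the example and are not necessarily the defaults in `main.cc`.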
    165 
You can see how they're applied to an image in the
[`ReadTensorFromImageFile()`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/label_image/main.cc#L88)
function.

```C++
// Given an image file name, read in the data, try to decode it as an image,
// resize it to the requested size, and then scale the values as desired.
Status ReadTensorFromImageFile(string file_name, const int input_height,
                               const int input_width, const float input_mean,
                               const float input_std,
                               std::vector<Tensor>* out_tensors) {
  tensorflow::GraphDefBuilder b;
```
We start by creating a `GraphDefBuilder`, which is an object we can use to
specify a model to run or load.

```C++
  string input_name = "file_reader";
  string output_name = "normalized";
  tensorflow::Node* file_reader =
      tensorflow::ops::ReadFile(tensorflow::ops::Const(file_name, b.opts()),
                                b.opts().WithName(input_name));
```
We then start creating nodes for the small model that loads, resizes, and
scales the pixel values into the form the main model
expects as its input. The first node we create is just a `Const` op that holds a
tensor with the file name of the image we want to load. That's then passed as the
first input to the `ReadFile` op. You might notice we're passing `b.opts()` as the last
argument to all the op creation functions. The argument ensures that the node is added to
the model definition held in the `GraphDefBuilder`. We also name the `ReadFile`
operator by making the `WithName()` call to `b.opts()`. This gives a name to the node,
which isn't strictly necessary since an automatic name will be assigned if you don't
do this, but it does make debugging a bit easier.
    199 
```C++
  // Now try to figure out what kind of file it is and decode it.
  const int wanted_channels = 3;
  tensorflow::Node* image_reader;
  if (tensorflow::StringPiece(file_name).ends_with(".png")) {
    image_reader = tensorflow::ops::DecodePng(
        file_reader,
        b.opts().WithAttr("channels", wanted_channels).WithName("png_reader"));
  } else {
    // Assume if it's not a PNG then it must be a JPEG.
    image_reader = tensorflow::ops::DecodeJpeg(
        file_reader,
        b.opts().WithAttr("channels", wanted_channels).WithName("jpeg_reader"));
  }
  // Now cast the image data to float so we can do normal math on it.
  tensorflow::Node* float_caster = tensorflow::ops::Cast(
      image_reader, tensorflow::DT_FLOAT, b.opts().WithName("float_caster"));
  // The convention for image ops in TensorFlow is that all images are expected
  // to be in batches, so that they're four-dimensional arrays with indices of
  // [batch, height, width, channel]. Because we only have a single image, we
  // have to add a batch dimension of 1 to the start with ExpandDims().
  tensorflow::Node* dims_expander = tensorflow::ops::ExpandDims(
      float_caster, tensorflow::ops::Const(0, b.opts()), b.opts());
  // Bilinearly resize the image to fit the required dimensions.
  tensorflow::Node* resized = tensorflow::ops::ResizeBilinear(
      dims_expander, tensorflow::ops::Const({input_height, input_width},
                                            b.opts().WithName("size")),
      b.opts());
  // Subtract the mean and divide by the scale.
  tensorflow::ops::Div(
      tensorflow::ops::Sub(
          resized, tensorflow::ops::Const({input_mean}, b.opts()), b.opts()),
      tensorflow::ops::Const({input_std}, b.opts()),
      b.opts().WithName(output_name));
```
We then keep adding more nodes, to decode the file data as an image, to cast the
integers into floating point values, to resize it, and then finally to run the
subtraction and division operations on the pixel values.
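
The decoder choice in the snippet above depends only on the file extension. A standalone version of that check, using a hypothetical `HasSuffix` helper and `ImageFormat` enum in place of `StringPiece::ends_with` and the TensorFlow decode ops, could look like the sketch below.

```cpp
#include <string>

// Illustrative enum; the walkthrough code picks a decode op instead.
enum class ImageFormat { kPng, kJpeg };

// True when `name` ends with `suffix`. The comparison is case-sensitive.
bool HasSuffix(const std::string& name, const std::string& suffix) {
  return name.size() >= suffix.size() &&
         name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Mirrors the branch in the walkthrough: PNG if the name ends in ".png",
// otherwise assume JPEG.
ImageFormat GuessImageFormat(const std::string& file_name) {
  return HasSuffix(file_name, ".png") ? ImageFormat::kPng : ImageFormat::kJpeg;
}
```

Because the check in this sketch is case-sensitive, a file named `photo.PNG` would be treated as a JPEG here.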
    238 
```C++
  // This runs the GraphDef network definition that we've just constructed, and
  // returns the results in the output tensor.
  tensorflow::GraphDef graph;
  TF_RETURN_IF_ERROR(b.ToGraphDef(&graph));
```
At the end of this we have
a model definition stored in the `b` variable, which we turn into a full graph
definition with the `ToGraphDef()` function.

```C++
  std::unique_ptr<tensorflow::Session> session(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_RETURN_IF_ERROR(session->Create(graph));
  TF_RETURN_IF_ERROR(session->Run({}, {output_name}, {}, out_tensors));
  return Status::OK();
}
```
Then we create a @{tf.Session}
object, which is the interface to actually running the graph, and run it,
specifying which node we want to get the output from, and where to put the
output data.

This gives us a vector of `Tensor` objects, which in this case we know will only be a
single object long. You can think of a `Tensor` as a multi-dimensional array in this
context, and it holds a 299 pixel high, 299 pixel wide, 3 channel image as float
values. If you have your own image-processing framework in your product already, you
should be able to use that instead, as long as you apply the same transformations
before you feed images into the main graph.
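
To make the `[batch, height, width, channel]` layout concrete, the sketch below computes where one value sits in a flat, row-major buffer. The function name `NhwcOffset` is an assumption for this example. It only illustrates the indexing convention and says nothing about how `Tensor` exposes its data.

```cpp
#include <cstddef>

// Flat offset of element [b, h, w, c] in a row-major buffer with shape
// [batch, height, width, channels]. The batch size itself is not needed
// because batch is the outermost dimension.
std::size_t NhwcOffset(std::size_t b, std::size_t h, std::size_t w,
                       std::size_t c, std::size_t height, std::size_t width,
                       std::size_t channels) {
  return ((b * height + h) * width + w) * channels + c;
}
```

For the single 299x299 RGB image described above, such a buffer would hold 1 * 299 * 299 * 3 = 268203 floats, so the last valid offset is 268202.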
    267 
This is a simple example of creating a small TensorFlow graph dynamically in C++,
but for the pre-trained Inception model we want to load a much larger definition from
a file. You can see how we do that in the `LoadGraph()` function.

```C++
// Reads a model graph definition from disk, and creates a session object you
// can use to run it.
Status LoadGraph(string graph_file_name,
                 std::unique_ptr<tensorflow::Session>* session) {
  tensorflow::GraphDef graph_def;
  Status load_graph_status =
      ReadBinaryProto(tensorflow::Env::Default(), graph_file_name, &graph_def);
  if (!load_graph_status.ok()) {
    return tensorflow::errors::NotFound("Failed to load compute graph at '",
                                        graph_file_name, "'");
  }
```
If you've looked through the image loading code, a lot of the terms should seem familiar. Rather than
using a `GraphDefBuilder` to produce a `GraphDef` object, we load a protobuf file that
directly contains the `GraphDef`.

```C++
  session->reset(tensorflow::NewSession(tensorflow::SessionOptions()));
  Status session_create_status = (*session)->Create(graph_def);
  if (!session_create_status.ok()) {
    return session_create_status;
  }
  return Status::OK();
}
```
Then we create a `Session` object from that `GraphDef` and
pass it back to the caller so that they can run it at a later time.
    300 
The `GetTopLabels()` function is a lot like the image loading, except that in this case
we want to take the results of running the main graph, and turn them into a sorted list
of the highest-scoring labels. Just like the image loader, it creates a
`GraphDefBuilder`, adds a couple of nodes to it, and then runs the short graph to get a
pair of output tensors. In this case they represent the sorted scores and index
positions of the highest results.

```C++
// Analyzes the output of the Inception graph to retrieve the highest scores and
// their positions in the tensor, which correspond to categories.
Status GetTopLabels(const std::vector<Tensor>& outputs, int how_many_labels,
                    Tensor* indices, Tensor* scores) {
  tensorflow::GraphDefBuilder b;
  string output_name = "top_k";
  tensorflow::ops::TopK(tensorflow::ops::Const(outputs[0], b.opts()),
                        how_many_labels, b.opts().WithName(output_name));
  // This runs the GraphDef network definition that we've just constructed, and
  // returns the results in the output tensors.
  tensorflow::GraphDef graph;
  TF_RETURN_IF_ERROR(b.ToGraphDef(&graph));
  std::unique_ptr<tensorflow::Session> session(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_RETURN_IF_ERROR(session->Create(graph));
  // The TopK node returns two outputs, the scores and their original indices,
  // so we have to append :0 and :1 to specify them both.
  std::vector<Tensor> out_tensors;
  TF_RETURN_IF_ERROR(session->Run({}, {output_name + ":0", output_name + ":1"},
                                  {}, &out_tensors));
  *scores = out_tensors[0];
  *indices = out_tensors[1];
  return Status::OK();
}
```
The `PrintTopLabels()` function takes those sorted results, and prints them out in a
friendly way. The `CheckTopLabel()` function is very similar, but just makes sure that
the top label is the one we expect, for debugging purposes.
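
For intuition about what a top-k step computes, here is a plain C++ sketch that returns the k largest scores together with their original indices. The name `TopKWithIndices`, the tie-breaking rule, and the use of `std::partial_sort` are choices made for this example; TensorFlow's own `TopK` kernel may be implemented differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Returns up to k (index, score) pairs ordered from highest to lowest score.
// Ties are broken in favor of the lower index (an assumption for this sketch).
std::vector<std::pair<int, float>> TopKWithIndices(
    const std::vector<float>& scores, std::size_t k) {
  k = std::min(k, scores.size());
  // Sort a list of indices rather than the scores themselves, so the original
  // positions (the class ids) are preserved.
  std::vector<int> order(scores.size());
  std::iota(order.begin(), order.end(), 0);
  std::partial_sort(order.begin(),
                    order.begin() + static_cast<std::ptrdiff_t>(k),
                    order.end(), [&scores](int a, int b) {
                      if (scores[a] != scores[b]) return scores[a] > scores[b];
                      return a < b;
                    });
  std::vector<std::pair<int, float>> top;
  top.reserve(k);
  for (std::size_t i = 0; i < k; ++i) {
    top.emplace_back(order[i], scores[order[i]]);
  }
  return top;
}
```

The indices play the role of the label ids shown in the sample output earlier, such as 653 for "military uniform"; mapping them back to text is the job of `PrintTopLabels()`.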
    336 
At the end, [`main()`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/label_image/main.cc#L252)
ties together all of these calls.

```C++
int main(int argc, char* argv[]) {
  // We need to call this to set up global state for TensorFlow.
  tensorflow::port::InitMain(argv[0], &argc, &argv);
  Status s = tensorflow::ParseCommandLineFlags(&argc, argv);
  if (!s.ok()) {
    LOG(ERROR) << "Error parsing command line flags: " << s.ToString();
    return -1;
  }

  // First we load and initialize the model.
  std::unique_ptr<tensorflow::Session> session;
  string graph_path = tensorflow::io::JoinPath(FLAGS_root_dir, FLAGS_graph);
  Status load_graph_status = LoadGraph(graph_path, &session);
  if (!load_graph_status.ok()) {
    LOG(ERROR) << load_graph_status;
    return -1;
  }
```
First we load the main graph.

```C++
  // Get the image from disk as a float array of numbers, resized and normalized
  // to the specifications the main graph expects.
  std::vector<Tensor> resized_tensors;
  string image_path = tensorflow::io::JoinPath(FLAGS_root_dir, FLAGS_image);
  Status read_tensor_status = ReadTensorFromImageFile(
      image_path, FLAGS_input_height, FLAGS_input_width, FLAGS_input_mean,
      FLAGS_input_std, &resized_tensors);
  if (!read_tensor_status.ok()) {
    LOG(ERROR) << read_tensor_status;
    return -1;
  }
  const Tensor& resized_tensor = resized_tensors[0];
```
Next we load, resize, and process the input image.

```C++
  // Actually run the image through the model.
  std::vector<Tensor> outputs;
  Status run_status = session->Run({{FLAGS_input_layer, resized_tensor}},
                                   {FLAGS_output_layer}, {}, &outputs);
  if (!run_status.ok()) {
    LOG(ERROR) << "Running model failed: " << run_status;
    return -1;
  }
```
Here we run the loaded graph with the image as an input.
    388 
```C++
  // This is for automated testing to make sure we get the expected result with
  // the default settings. We know that label 653 (military uniform) should be
  // the top label for the Admiral Hopper image.
  if (FLAGS_self_test) {
    bool expected_matches;
    Status check_status = CheckTopLabel(outputs, 653, &expected_matches);
    if (!check_status.ok()) {
      LOG(ERROR) << "Running check failed: " << check_status;
      return -1;
    }
    if (!expected_matches) {
      LOG(ERROR) << "Self-test failed!";
      return -1;
    }
  }
```
For testing purposes we can check to make sure we get the output we expect here.

```C++
  // Do something interesting with the results we've generated.
  Status print_status = PrintTopLabels(outputs, FLAGS_labels);
```
Finally we print the labels we found.

```C++
  if (!print_status.ok()) {
    LOG(ERROR) << "Running print failed: " << print_status;
    return -1;
  }

  return 0;
}
```

The error handling here uses TensorFlow's `Status`
object, which is convenient because its `ok()` method tells you whether an error has
occurred, and the object can be printed out to give a readable error
message.
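
The early-return pattern can be sketched without TensorFlow. The `SimpleStatus` class, the `RETURN_IF_ERROR` macro, and the `LoadStep` and `RunPipeline` functions below are stand-ins invented for this example. They mimic the style of `Status` and `TF_RETURN_IF_ERROR` but are not the library's definitions.

```cpp
#include <string>
#include <utility>

// Minimal stand-in for a status object: an empty message means success.
class SimpleStatus {
 public:
  SimpleStatus() = default;
  explicit SimpleStatus(std::string message) : message_(std::move(message)) {}
  bool ok() const { return message_.empty(); }
  const std::string& message() const { return message_; }

 private:
  std::string message_;
};

// Returns from the enclosing function as soon as `expr` reports an error.
#define RETURN_IF_ERROR(expr)            \
  do {                                   \
    const SimpleStatus status_ = (expr); \
    if (!status_.ok()) return status_;   \
  } while (0)

// Hypothetical step used only to exercise the macro.
SimpleStatus LoadStep(bool succeed) {
  return succeed ? SimpleStatus() : SimpleStatus("load failed");
}

// Runs the step and records how far the pipeline got before returning.
SimpleStatus RunPipeline(bool load_succeeds, int* steps_completed) {
  *steps_completed = 0;
  RETURN_IF_ERROR(LoadStep(load_succeeds));
  *steps_completed = 1;
  return SimpleStatus();
}
```

A caller checks `ok()` once at the top level, as `main()` does above, instead of threading error codes through every intermediate call.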
    425 
In this case we are demonstrating object recognition, but you should be able to
use very similar code on other models you've found or trained yourself, across
all sorts of domains. We hope this small example gives you some ideas on how to use
TensorFlow within your own products.

> **EXERCISE**: Transfer learning is the idea that, if you know how to solve a task well, you
should be able to transfer some of that understanding to solving related
problems. One way to perform transfer learning is to remove the final
classification layer of the network and extract
the [next-to-last layer of the CNN](https://arxiv.org/abs/1310.1531), in this case a 2048-dimensional vector.
There's a guide to doing this @{$image_retraining$in the how-to section}.
    438 
    439 
## Resources for Learning More

To learn about neural networks in general, Michael Nielsen's
[free online book](http://neuralnetworksanddeeplearning.com/chap1.html)
is an excellent resource. For convolutional neural networks in particular,
Chris Olah has some
[nice blog posts](https://colah.github.io/posts/2014-07-Conv-Nets-Modular/),
and Michael Nielsen's book has a
[great chapter](http://neuralnetworksanddeeplearning.com/chap6.html)
covering them.

To find out more about implementing convolutional neural networks, you can jump
to the TensorFlow @{$deep_cnn$deep convolutional networks tutorial},
or start a bit more gently with our @{$layers$MNIST starter tutorial}.
Finally, if you want to get up to speed on research in this area, you can
read the papers referenced in this tutorial.