# TensorFlow evaluation metrics and summary statistics

## Evaluation metrics

Metrics are used in evaluation to assess the quality of a model. Most are
"streaming" ops: they create variables that accumulate a running total, and
they return both an update tensor, which updates those variables, and a value
tensor, which reads the accumulated value. Example:

    value, update_op = metrics.streaming_mean_squared_error(
        predictions, targets, weight)
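
For concreteness, here is a minimal end-to-end sketch of this pattern. It
assumes TensorFlow 1.x, where `metrics` refers to `tf.contrib.metrics`, and
uses made-up placeholder data; the accumulator variables created by the op are
local variables, so they are initialized with `tf.local_variables_initializer()`:

```python
import tensorflow as tf  # assumes TF 1.x, where tf.contrib is available

metrics = tf.contrib.metrics

predictions = tf.placeholder(tf.float32, shape=[None])
targets = tf.placeholder(tf.float32, shape=[None])
value, update_op = metrics.streaming_mean_squared_error(predictions, targets)

with tf.Session() as sess:
  # The op's accumulators (total and count) are local variables.
  sess.run(tf.local_variables_initializer())
  for batch_predictions, batch_targets in [([0.0, 1.0], [0.0, 2.0]),
                                           ([2.0, 3.0], [2.0, 3.0])]:
    sess.run(update_op, feed_dict={predictions: batch_predictions,
                                   targets: batch_targets})
  # Running mean squared error over all four samples: (0 + 1 + 0 + 0) / 4.
  print(sess.run(value))  # 0.25
```
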
Most metric functions take a pair of tensors: `predictions` and the ground-truth
`targets` (`streaming_mean` is an exception; it takes a single value tensor,
usually a loss). The shape of both tensors is assumed to be of the form
`[batch_size, d1, ... dN]`, where `batch_size` is the number of samples in the
batch and `d1` ... `dN` are the remaining dimensions.

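`streaming_mean`, by contrast, is fed a single value tensor, typically a
per-batch loss. A minimal sketch under the same TF 1.x assumption, with a
hypothetical scalar `loss` placeholder standing in for a real loss op:

```python
import tensorflow as tf  # assumes TF 1.x

metrics = tf.contrib.metrics

# Hypothetical scalar loss; in a real model this would be the loss op's output.
loss = tf.placeholder(tf.float32, shape=[])
mean_loss, update_op = metrics.streaming_mean(loss)

with tf.Session() as sess:
  sess.run(tf.local_variables_initializer())
  for batch_loss in [0.9, 0.7, 0.5]:
    sess.run(update_op, feed_dict={loss: batch_loss})
  print(sess.run(mean_loss))  # ~0.7, the running mean of the batch losses
```
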
The `weight` parameter can be used to adjust the relative weight of samples
within the batch. The result is then a scalar weighted average of the
per-sample values; samples with zero weight do not contribute.

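To illustrate the effect of zero weights, a short sketch (same TF 1.x
assumption; the constant inputs are made up for the example):

```python
import tensorflow as tf  # assumes TF 1.x

metrics = tf.contrib.metrics

predictions = tf.constant([0.0, 1.0, 2.0, 3.0])
targets = tf.constant([0.0, 2.0, 2.0, 5.0])
# A zero weight drops the corresponding sample from the running average.
weight = tf.constant([1.0, 1.0, 1.0, 0.0])

value, update_op = metrics.streaming_mean_squared_error(
    predictions, targets, weight)

with tf.Session() as sess:
  sess.run(tf.local_variables_initializer())
  sess.run(update_op)
  # Squared errors are [0, 1, 0, 4]; the last sample is masked out, so the
  # result is (0 + 1 + 0) / 3.
  print(sess.run(value))  # ~0.333
```
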
The result is two tensors that should be used as follows on each evaluation
run:

```python
predictions = ...
labels = ...
value, update_op = some_metric(predictions, labels)

# The metric's accumulators are local variables; initialize them before use.
tf.local_variables_initializer().run()

for step_num in range(max_steps):
  update_op.run()

print("evaluation score: ", value.eval())
```