# K-FAC: Kronecker-Factored Approximate Curvature

**K-FAC in TensorFlow** is a TensorFlow implementation of [K-FAC][kfac-paper],
an approximate second-order optimization method. When applied to feedforward
and convolutional neural networks, K-FAC can converge `>3.5x` faster, in
`>14x` fewer iterations, than SGD with Momentum.

[kfac-paper]: https://arxiv.org/abs/1503.05671

## What is K-FAC?

K-FAC, short for "Kronecker-Factored Approximate Curvature", is an approximation
to the [Natural Gradient][natural_gradient] algorithm designed specifically for
neural networks. It maintains a block-diagonal approximation to the [Fisher
Information matrix][fisher_information], whose inverse is used to precondition
the gradient.
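
The key trick is that each layer's Fisher block is approximated as a Kronecker
product of two small matrices (one built from the layer's input activations,
one from the gradients of its pre-activations), and a Kronecker product can be
inverted factor by factor. A minimal NumPy sketch of that identity,
`(A ⊗ G)⁻¹ = A⁻¹ ⊗ G⁻¹`, using random matrices purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(n):
    """Random symmetric positive-definite matrix (stand-in for a Kronecker factor)."""
    m = rng.standard_normal((n, n))
    return m @ m.T + n * np.eye(n)

# A: input-activation factor; G: pre-activation-gradient factor.
A = random_spd(3)
G = random_spd(4)

# Inverting the full 12x12 Kronecker product directly...
direct = np.linalg.inv(np.kron(A, G))

# ...equals the Kronecker product of the two small inverses.
factored = np.kron(np.linalg.inv(A), np.linalg.inv(G))

assert np.allclose(direct, factored)
```

This is why K-FAC only ever inverts matrices the size of a single layer's
input or output dimension, rather than the full parameter-sized Fisher matrix.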

K-FAC can be used in place of SGD, Adam, and other `Optimizer` implementations.
Experimentally, K-FAC converges `>3.5x` faster than well-tuned SGD.

Unlike most optimizers, K-FAC exploits structure in the model itself (e.g. "What
are the weights for layer i?"). As such, you must add some additional code while
constructing your model in order to use K-FAC.

[natural_gradient]: http://www.mitpressjournals.org/doi/abs/10.1162/089976698300017746
[fisher_information]: https://en.wikipedia.org/wiki/Fisher_information#Matrix_form

## Why should I use K-FAC?

K-FAC can take advantage of the curvature of the optimization problem, resulting
in **faster training**. For an 8-layer Autoencoder, K-FAC converges to the same
loss as SGD with Momentum in 3.8x less wall-clock time and 14.7x fewer updates.
See how training loss changes as a function of number of epochs, steps, and
seconds:

![autoencoder](g3doc/autoencoder.png)

## Is K-FAC for me?

If you have a feedforward or convolutional model for classification that is
converging too slowly, K-FAC is for you. K-FAC can be used in your model if:

*   Your model defines a posterior distribution.
*   Your model uses only fully-connected or convolutional layers (residual
    connections are OK).
*   You are training on CPU or GPU.
*   You can modify model code to register layers with K-FAC.

## How do I use K-FAC?

Using K-FAC requires three steps:

1.  Registering layer inputs, weights, and pre-activations with a
    `LayerCollection`.
1.  Minimizing the loss with a `KfacOptimizer`.
1.  Keeping K-FAC's preconditioner updated.

```python
# LayerCollection and KfacOptimizer are provided by the K-FAC package and are
# assumed to be imported into scope alongside TensorFlow.

# Build model.
w = tf.get_variable("w", ...)
b = tf.get_variable("b", ...)
logits = tf.matmul(x, w) + b
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))

# Register layers: each layer's parameters, inputs, and pre-activations, plus
# the model's predictive distribution.
layer_collection = LayerCollection()
layer_collection.register_fully_connected((w, b), x, logits)
layer_collection.register_categorical_predictive_distribution(logits)

# Construct training ops.
optimizer = KfacOptimizer(..., layer_collection=layer_collection)
train_op = optimizer.minimize(loss)

# Minimize loss. Running cov_update_op and inv_update_op alongside train_op
# keeps the preconditioner's statistics and inverses up to date.
with tf.Session() as sess:
  ...
  sess.run([train_op, optimizer.cov_update_op, optimizer.inv_update_op])
```
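
The covariance and inverse updates need not run on every step: a common
pattern is to refresh the covariance statistics frequently and recompute the
(more expensive) inverses less often. A framework-free sketch of such a
schedule; the frequencies here are illustrative tuning choices, not part of
the K-FAC API:

```python
# Illustrative update schedule; both frequencies are hypothetical choices.
COV_UPDATE_EVERY = 1    # refresh covariance statistics every step
INV_UPDATE_EVERY = 20   # recompute the expensive inverses less often

cov_updates = inv_updates = 0
for step in range(100):
    # train_op would run here on every step.
    if step % COV_UPDATE_EVERY == 0:
        cov_updates += 1  # e.g. sess.run(optimizer.cov_update_op)
    if step % INV_UPDATE_EVERY == 0:
        inv_updates += 1  # e.g. sess.run(optimizer.inv_update_op)

print(cov_updates, inv_updates)  # 100 covariance updates, 5 inverse updates
```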

See [`examples/`](https://www.tensorflow.org/code/tensorflow/contrib/kfac/examples/)
for runnable, end-to-end illustrations.

## Authors

- Alok Aggarwal
- Daniel Duckworth
- James Martens
- Matthew Johnson
- Olga Wichrowska
- Roger Grosse