# K-FAC: Kronecker-Factored Approximate Curvature

**K-FAC in TensorFlow** is a TensorFlow implementation of [K-FAC][kfac-paper],
an approximate second-order optimization method. When applied to feedforward
and convolutional neural networks, K-FAC can converge `>3.5x` faster and in
`>14x` fewer iterations than SGD with Momentum.

[kfac-paper]: https://arxiv.org/abs/1503.05671
## What is K-FAC?

K-FAC, short for "Kronecker-factored Approximate Curvature", is an approximation
to the [Natural Gradient][natural_gradient] algorithm designed specifically for
neural networks. It maintains a block-diagonal approximation to the [Fisher
Information matrix][fisher_information], whose inverse preconditions the
gradient.
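
Concretely, the update looks like the following (a sketch following the K-FAC
paper; the symbols here are notation for this sketch, not identifiers from the
library):

```latex
% Natural-gradient update (parameters \theta, learning rate \eta, loss L):
\theta \leftarrow \theta - \eta \, F^{-1} \nabla_\theta L
% K-FAC approximates each layer's block F_l of the block-diagonal Fisher by a
% Kronecker product of two small matrices, where a_{l-1} denotes the layer's
% input activations and g_l the gradients w.r.t. its pre-activations:
F_l \approx A_{l-1} \otimes G_l,
\qquad
A_{l-1} = \mathbb{E}\!\left[a_{l-1} a_{l-1}^\top\right],
\quad
G_l = \mathbb{E}\!\left[g_l g_l^\top\right]
% Inverting a Kronecker product only requires inverting its factors,
% which is what makes the preconditioner cheap to apply:
(A \otimes G)^{-1} = A^{-1} \otimes G^{-1}
```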

K-FAC can be used in place of SGD, Adam, and other `Optimizer` implementations.
Experimentally, K-FAC converges `>3.5x` faster than well-tuned SGD.

Unlike most optimizers, K-FAC exploits structure in the model itself (e.g. "What
are the weights for layer i?"). As such, to use K-FAC you must add some
additional code while constructing your model.

[natural_gradient]: http://www.mitpressjournals.org/doi/abs/10.1162/089976698300017746
[fisher_information]: https://en.wikipedia.org/wiki/Fisher_information#Matrix_form
## Why should I use K-FAC?

K-FAC can take advantage of the curvature of the optimization problem, resulting
in **faster training**. For an 8-layer Autoencoder, K-FAC converges to the same
loss as SGD with Momentum in 3.8x fewer seconds and 14.7x fewer updates. See how
training loss changes as a function of number of epochs, steps, and seconds:

![autoencoder](g3doc/autoencoder.png)
## Is K-FAC for me?

If you have a feedforward or convolutional model for classification that is
converging too slowly, K-FAC is for you. K-FAC can be used in your model if:

*   Your model defines a posterior distribution.
*   Your model uses only fully-connected or convolutional layers (residual
    connections OK).
*   You are training on CPU or GPU.
*   You can modify model code to register layers with K-FAC.
## How do I use K-FAC?

Using K-FAC requires three steps:

1.  Registering layer inputs, weights, and pre-activations with a
    `LayerCollection`.
1.  Minimizing the loss with a `KfacOptimizer`.
1.  Keeping K-FAC's preconditioner updated.
```python
import tensorflow as tf

# NOTE: these aliases assume the contrib API layout
# (tf.contrib.kfac.layer_collection and tf.contrib.kfac.optimizer).
LayerCollection = tf.contrib.kfac.layer_collection.LayerCollection
KfacOptimizer = tf.contrib.kfac.optimizer.KfacOptimizer

# Build model.
w = tf.get_variable("w", ...)
b = tf.get_variable("b", ...)
logits = tf.matmul(x, w) + b
loss = tf.reduce_mean(
  tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))

# Register layers: the layer's inputs, parameters, and pre-activations,
# plus the model's predictive distribution.
layer_collection = LayerCollection()
layer_collection.register_fully_connected((w, b), x, logits)
layer_collection.register_categorical_predictive_distribution(logits)

# Construct training ops.
optimizer = KfacOptimizer(..., layer_collection=layer_collection)
train_op = optimizer.minimize(loss)

# Minimize loss. cov_update_op refreshes the Kronecker-factor covariance
# statistics; inv_update_op recomputes their inverses.
with tf.Session() as sess:
  ...
  sess.run([train_op, optimizer.cov_update_op, optimizer.inv_update_op])
```
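
The preconditioner ops need not all run every step: covariance statistics are
cheap to refresh, while recomputing inverses is the expensive part and is
commonly amortized over several steps. A minimal sketch, assuming the graph
built above; the every-step / every-20-steps schedule and the `num_steps`
loop bound are illustrative, not library defaults:

```python
invert_every = 20  # illustrative period, not a library default

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for step in range(num_steps):  # num_steps comes from your training setup
    # Take an optimization step and refresh covariance statistics every step.
    sess.run([train_op, optimizer.cov_update_op])
    # Recompute the (expensive) factor inverses only periodically.
    if step % invert_every == 0:
      sess.run(optimizer.inv_update_op)
```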

See [`examples/`](https://www.tensorflow.org/code/tensorflow/contrib/kfac/examples/) for runnable, end-to-end illustrations.
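
The quickstart above registers a fully-connected layer; convolutional layers
are registered analogously via `LayerCollection.register_conv2d`. A sketch,
assuming a kernel/bias variable pair and the usual `strides`/`padding`
arguments (check `python/ops/layer_collection.py` for the exact signature):

```python
# Hypothetical conv layer: `kernel`, `bias`, and `inputs` stand in for your
# own model's variables and tensors; `outputs` are the pre-activations.
outputs = tf.nn.conv2d(inputs, kernel,
                       strides=[1, 1, 1, 1], padding="SAME") + bias
layer_collection.register_conv2d(
    params=(kernel, bias),
    strides=[1, 1, 1, 1],
    padding="SAME",
    inputs=inputs,
    outputs=outputs)
```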

## Authors

- Alok Aggarwal
- Daniel Duckworth
- James Martens
- Matthew Johnson
- Olga Wichrowska
- Roger Grosse