.. _trainers:
========
Trainers
========
The most common method for training a neural network model is to use a
stochastic gradient-based optimizer. In ``theanets`` many of these algorithms
are available by interfacing with the ``downhill`` package:
- ``sgd``: `Stochastic gradient descent`_
- ``nag``: `Nesterov's accelerated gradient`_
- ``rprop``: `Resilient backpropagation`_
- ``rmsprop``: RMSProp_
- ``adadelta``: ADADELTA_
- ``esgd``: `Equilibrated SGD`_
- ``adam``: Adam_
.. _Stochastic gradient descent: http://downhill.readthedocs.org/en/stable/generated/downhill.first_order.SGD.html
.. _Nesterov's accelerated gradient: http://downhill.readthedocs.org/en/stable/generated/downhill.first_order.NAG.html
.. _Resilient backpropagation: http://downhill.readthedocs.org/en/stable/generated/downhill.adaptive.RProp.html
.. _RMSProp: http://downhill.readthedocs.org/en/stable/generated/downhill.adaptive.RMSProp.html
.. _ADADELTA: http://downhill.readthedocs.org/en/stable/generated/downhill.adaptive.ADADELTA.html
.. _Equilibrated SGD: http://downhill.readthedocs.org/en/stable/generated/downhill.adaptive.ESGD.html
.. _Adam: http://downhill.readthedocs.org/en/stable/generated/downhill.adaptive.Adam.html
In addition to the optimization algorithms provided by ``downhill``,
``theanets`` defines a few algorithms that are more specific to neural networks.
These trainers tend to take advantage of the layered structure of the loss
function for a network.
- ``sample``: :class:`Sample trainer `
This trainer sets model parameters directly to samples drawn from the training
data. This is a very fast "training" algorithm since all updates take place at
once; however, often features derived directly from the training data require
further tuning to perform well.
- ``layerwise``: :class:`Layerwise (supervised) pretrainer `
Greedy supervised layerwise pre-training: This trainer applies RMSProp to each
layer sequentially.
- ``pretrain``: :class:`Unsupervised pretrainer `
Greedy unsupervised layerwise pre-training: This trainer applies RMSProp to a
tied-weights "shadow" autoencoder using an unlabeled dataset, and then transfers
the learned autoencoder weights to the model being trained.