Trainers
The most common method for training a neural network model is to use a stochastic gradient-based optimizer. In theanets, many of these algorithms are available by interfacing with the downhill package (a short usage sketch follows the list):
sgd: Stochastic gradient descent
nag: Nesterov’s accelerated gradient
rprop: Resilient backpropagation
rmsprop: RMSProp
adadelta: ADADELTA
esgd: Equilibrated SGD
adam: Adam
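Any of these algorithms can be selected by name when training a model. The snippet below is a minimal sketch assuming the theanets train() interface, where the algo keyword picks the optimizer and the remaining keyword arguments (learning rate, momentum, and so on) are forwarded to downhill; the layer sizes and random data are purely illustrative.

    import numpy as np
    import theanets

    # Illustrative data: 1000 samples with 100 features, 10 classes.
    X = np.random.randn(1000, 100).astype('f')
    y = np.random.randint(0, 10, 1000).astype('i')

    # A small feedforward classifier; the layer sizes are arbitrary.
    net = theanets.Classifier([100, 50, 10])

    # Choose a downhill-backed algorithm by name; extra keyword arguments
    # are passed through to the optimizer.
    net.train([X, y], algo='nag', learning_rate=1e-3, momentum=0.9)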
In addition to the optimization algorithms provided by downhill, theanets defines a few algorithms that are more specific to neural networks. These trainers tend to take advantage of the layered structure of the loss function for a network (a usage sketch follows the list).
sample: Sample trainer
This trainer sets model parameters directly to samples drawn from the training data. This is a very fast “training” algorithm, since all updates take place at once; however, features derived directly from the training data often require further tuning to perform well.
layerwise: Layerwise (supervised) pretrainer
Greedy supervised layerwise pre-training: This trainer applies RMSProp to each layer sequentially.
pretrain: Unsupervised pretrainer
Greedy unsupervised layerwise pre-training: This trainer applies RMSProp to a tied-weights “shadow” autoencoder using an unlabeled dataset, and then transfers the learned autoencoder weights to the model being trained.
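These trainers are typically used to initialize a model before fine-tuning it with one of the downhill-backed algorithms above. The sketch below assumes the same train() interface and uses the same illustrative data and layer sizes as the earlier example; the exact dataset layout expected by each pretrainer may differ in practice.

    import numpy as np
    import theanets

    X = np.random.randn(1000, 100).astype('f')
    y = np.random.randint(0, 10, 1000).astype('i')

    net = theanets.Classifier([100, 64, 32, 10])

    # Greedy supervised layerwise pre-training, then fine-tune the whole
    # network with a downhill-backed optimizer.
    net.train([X, y], algo='layerwise')
    net.train([X, y], algo='rmsprop')

    # The 'sample' and 'pretrain' trainers follow the same pattern:
    # initialize the parameters cheaply (algo='sample' or algo='pretrain'),
    # then fine-tune with a gradient-based algorithm.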