class theanets.trainer.SupervisedPretrainer(algo, network)

This trainer adapts parameters using a supervised pretraining approach.

In this variant, we create “taps” at increasing depths into the original network weights, training only those weights that are below the tap. So, for a hypothetical binary classifier network with layers [3, 4, 5, 6, 2], we would first insert a tap after the first hidden layer (effectively a binary classifier in a [3, 4, (2)] configuration, where (2) indicates that the corresponding layer is the tap, not present in the original) and train just that network. Then we insert a tap at the next layer (effectively training a [3, 4, 5, (2)] classifier, re-using the trained weights for the 3 x 4 layer), and so forth. When we get to training the last layer, i.e., [3, 4, 5, 6, 2], then we just train all of the layers in the original network.
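The sequence of tapped sub-networks described above can be sketched in plain Python. This is only an illustration of the schedule, not the trainer's actual implementation; the helper name `tap_schedule` is hypothetical and nothing here calls into theanets.

```python
def tap_schedule(layers):
    """Return the layer sizes trained at each pretraining stage.

    `layers` lists the sizes of the original network, input first and
    output last. Every stage except the last ends in a temporary tap
    with the same size as the original output layer.
    (Hypothetical helper, for illustration only.)
    """
    layers = list(layers)
    if len(layers) < 3:
        raise ValueError("need an input, at least one hidden layer, and an output")
    output_size = layers[-1]
    stages = []
    # Insert a tap after each hidden layer except the last one.
    for depth in range(1, len(layers) - 2):
        stages.append(layers[:depth + 1] + [output_size])
    # The final stage is the original network itself, with no tap.
    stages.append(layers)
    return stages


schedule = tap_schedule([3, 4, 5, 6, 2])
for stage in schedule:
    print(stage)
# [3, 4, 2]
# [3, 4, 5, 2]
# [3, 4, 5, 6, 2]
```

In the first two stages the trailing 2 stands for the tap, written as (2) in the text; in the last stage it is the network's own output layer.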

For autoencoder networks with tied weights, consider an example with layers [3, 4, 5, 6, 5’, 4’, 3’], where the prime indicates that the layer is tied. In cases like this, we train the “outermost” pair of layers first, then add the next pair of layers inward, and so on. The training for our example would start with [3, 4, 3’], then proceed to [3, 4, 5, 4’, 3’], and then finish by training all the layers in the original network.
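The outside-in order for a tied-weight autoencoder can be sketched the same way. The helper `tied_schedule` below is hypothetical and works only on layer sizes; it assumes the network is described by its encoder half, with the tied decoder mirroring it.

```python
def tied_schedule(encoder_sizes):
    """Return the layers trained at each stage, outermost pair first.

    `encoder_sizes` lists the encoder half of the network, input first.
    Tied (decoder) layers are marked with a trailing prime.
    (Hypothetical helper, for illustration only.)
    """
    sizes = list(encoder_sizes)
    stages = []
    for depth in range(1, len(sizes)):
        encoder = [str(n) for n in sizes[:depth + 1]]
        # The decoder mirrors every encoder layer except the innermost one.
        decoder = [f"{n}'" for n in reversed(sizes[:depth])]
        stages.append(encoder + decoder)
    return stages


for stage in tied_schedule([3, 4, 5, 6]):
    print(stage)
# ['3', '4', "3'"]
# ['3', '4', '5', "4'", "3'"]
# ['3', '4', '5', '6', "5'", "4'", "3'"]
```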

By using layers from the original network whenever possible, we preserve all of the relevant settings of noise, dropouts, loss function and the like, in addition to removing the need for copying trained weights around between different Network instances.



Y. Bengio, P. Lamblin, D. Popovici, & H. Larochelle. (NIPS 2006) “Greedy Layer-Wise Training of Deep Networks”

The Appendix also contains pseudocode for the approaches.

__init__(algo, network)


itertrain(train, valid=None, **kwargs)

Train a model using a training and validation set.

This method yields a series of monitor values to the caller. After every iteration, a pair of monitor dictionaries is generated: one evaluated on the training dataset, and another evaluated on the validation dataset. The validation monitors might not be updated during every training iteration; in this case, the most recent validation monitors will be yielded along with the training monitors.


train : Dataset

A set of training data for computing updates to model parameters.

valid : Dataset, optional

A set of validation data for computing monitor values and determining when the loss has stopped improving.
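The yield pattern of `itertrain` can be illustrated without a real model. In the sketch below, `fake_itertrain` is a hypothetical stand-in generator, and both the `"loss"` monitor key and the `validate_every` parameter are assumptions made for the example. It shows only how a caller would consume the pairs and how the most recent validation monitors are repeated between validation runs.

```python
def fake_itertrain(num_iterations, validate_every=3):
    """Hypothetical stand-in that mimics the documented yield pattern."""
    valid_monitors = None
    for i in range(num_iterations):
        # Training monitors are refreshed on every iteration.
        train_monitors = {"loss": 1.0 / (i + 1)}
        # Validation monitors are refreshed only occasionally; otherwise
        # the most recent ones are yielded again.
        if i % validate_every == 0:
            valid_monitors = {"loss": 1.5 / (i + 1)}
        yield train_monitors, valid_monitors


history = []
for train_monitors, valid_monitors in fake_itertrain(5):
    history.append((train_monitors["loss"], valid_monitors["loss"]))

for train_loss, valid_loss in history:
    print(f"train={train_loss:.3f} valid={valid_loss:.3f}")
```

Because the method yields after every iteration, the caller decides when to stop, for example by breaking out of the loop once the validation loss stops improving.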