theanets.regularizers.Contractive

class theanets.regularizers.Contractive(pattern=None, weight=0.0, wrt='*')

Penalize the derivative of hidden layers with respect to their inputs.

Parameters:

pattern : str, optional

A glob-style pattern that specifies the graph outputs to which the regularizer applies. By default, all hidden layer outputs are included.

weight : float, optional

The weight of the regularization term in the network's loss. Defaults to 0.

wrt : str, optional

A glob-style pattern that specifies the inputs with respect to which the derivative should be computed. Defaults to '*', which matches all inputs.

Notes

This regularizer implements the loss() method to add the following term to the network’s loss function:

\[\frac{1}{|\Omega|} \sum_{i \in \Omega} \left\| \frac{\partial Z_i}{\partial x} \right\|_F^2\]

where \(\Omega\) is the set of “matching” graph output indices, \(Z_i\) is output \(i\) of the network graph, \(x\) is the input to the network graph, and \(\|\cdot\|_F\) is the Frobenius norm (the sum of the squared elements in the array).
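For intuition, here is a small NumPy sketch of this penalty for a single sigmoid layer \(h = \sigma(Wx)\), whose Jacobian with respect to the input has the closed form \(\mathrm{diag}(h \odot (1-h))\,W\) [Rif11]. The sizes and weights below are illustrative and not tied to theanets:

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> W = rng.normal(scale=0.1, size=(16, 64))   # hidden-by-input weight matrix
>>> x = rng.normal(size=64)                    # a single input example
>>> h = 1.0 / (1.0 + np.exp(-W.dot(x)))        # sigmoid hidden layer (zero bias)
>>> J = (h * (1 - h))[:, None] * W             # Jacobian dh/dx = diag(h*(1-h)) W
>>> penalty = np.sum(J ** 2)                   # squared Frobenius norm of J
>>> fast = np.sum((h * (1 - h)) ** 2 * np.sum(W ** 2, axis=1))
>>> bool(np.isclose(penalty, fast))            # the two computations agree
True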

This regularizer attempts to make the derivative of the hidden representation flat with respect to the input. In theory, this encourages the network to learn features that are insensitive to small changes in the input (that is, features that are mostly perpendicular to the input manifold) [Rif11].

Like the HiddenL1 regularizer, this acts indirectly to force a model to cover the space of its input dataset using as few features as possible; this pressure often causes features to be duplicated with slight variations to “tile” the input space in a very different way than a non-regularized model does.

References

[Rif11]

S. Rifai, P. Vincent, X. Muller, X. Glorot, & Y. Bengio. “Contractive auto-encoders: Explicit invariance during feature extraction.” ICML 2011.

http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2011Rifai_455.pdf

Examples

This regularizer can be specified at training or test time by providing the contractive keyword argument:

>>> net = theanets.Regression(...)

To use this regularizer at training time:

>>> net.train(..., contractive=0.1)

By default, all hidden layer outputs are included. To include only some graph outputs, and to take the derivative with respect to a particular input:

>>> net.train(..., contractive=dict(weight=0.1, pattern='hid3:out', wrt='in'))

To use this regularizer when running the model forward to generate a prediction:

>>> net.predict(..., contractive=0.1)

The value associated with the keyword argument can be a scalar, in which case it provides the weight for the regularizer, or a dictionary, in which case the dictionary is passed directly to the constructor as keyword arguments.
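Putting the pieces together, here is a minimal end-to-end sketch of both forms. The Autoencoder layer sizes, the random training data, and the 'hid1:out' layer name are illustrative assumptions rather than requirements of this regularizer:

>>> import numpy as np
>>> import theanets
>>> data = np.random.randn(1000, 64).astype('f')   # illustrative training data
>>> net = theanets.Autoencoder(layers=[64, 16, 64])
>>> net.train([data], contractive=0.1)             # scalar: just the weight
>>> net.train([data], contractive=dict(weight=0.1, pattern='hid1:out', wrt='in'))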

__init__(pattern=None, weight=0.0, wrt='*')

Methods

__init__([pattern, weight, wrt])
log()                      Log some diagnostic info about this regularizer.
loss(layer_list, outputs)  Add the regularization term from the Notes to the network's loss.
modify_graph(outputs)      Modify the outputs of a particular layer in the computation graph.
log()
log()

Log some diagnostic info about this regularizer.