theanets.feedforward.Autoencoder¶

class theanets.feedforward.Autoencoder(layers, loss='mse', weighted=False, rng=13)¶

An autoencoder network attempts to reproduce its input.
Notes

Autoencoder models default to a mean squared error (MSE) loss. To use a different loss, provide a non-default value for the loss keyword argument when constructing your model.

Formally, an autoencoder defines a parametric mapping from a data space to the same space:

\[F_\theta: \mathcal{S} \to \mathcal{S}\]

Often, this mapping can be decomposed into an “encoding” stage \(f_\alpha(\cdot)\) and a corresponding “decoding” stage \(g_\beta(\cdot)\) to and from some latent space \(\mathcal{Z} = \mathbb{R}^{n_z}\):

\[\begin{eqnarray*} f_\alpha &:& \mathcal{S} \to \mathcal{Z} \\ g_\beta &:& \mathcal{Z} \to \mathcal{S} \end{eqnarray*}\]

Autoencoders form an interesting class of models for several reasons. They:
- require only “unlabeled” data (which is typically easy to obtain),
- are generalizations of many popular density estimation techniques, and
- can be used to model the “manifold” or density of a dataset.
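With the default mse loss, training amounts (roughly) to finding parameters that minimize the mean squared reconstruction error over the \(m\) training examples; regularizers such as the sparsity penalty used below add further terms to this objective:

\[\mathcal{L}(\theta) = \frac{1}{m} \sum_{i=1}^m \left\| x_i - F_\theta(x_i) \right\|_2^2\]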
Many extremely common dimensionality reduction techniques can be expressed as autoencoders. For instance, Principal Component Analysis (PCA) can be expressed as a model with two tied, linear layers:
>>> pca = theanets.Autoencoder([10, (5, 'linear'), (10, 'tied')])
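Training this model then uses the same call as any other theanets model; here inputs is assumed to be a (num_examples, 10) float array like the one built in the Data section below:

>>> pca.train([inputs])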
Similarly, Independent Component Analysis (ICA) can be expressed as the same model, but trained with a sparsity penalty on the hidden-layer activations:

>>> ica = pca
>>> ica.train([inputs], hidden_l1=0.1)
In this light, “nonlinear PCA” is quite easy to formulate as well!
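One plausible sketch simply replaces the linear hidden layer with a nonlinear one; the choice of 'sigmoid' activation here is an illustrative assumption, not a prescription:

>>> nlpca = theanets.Autoencoder([10, (5, 'sigmoid'), (10, 'tied')])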
Examples
To create an autoencoder, just create a new model instance. Often you’ll provide the layer configuration at this time:
>>> model = theanets.Autoencoder([10, 20, 10])
If you want to create an autoencoder with tied weights, specify that layer type when creating the model:
>>> model = theanets.Autoencoder([10, 20, (10, 'tied')])
See Creating a Model for more information.
Data
Training data for an autoencoder takes the form of a two-dimensional array. The shape of this array is (num_examples, num_variables): the first axis enumerates data points in a batch, and the second enumerates the variables in the model.
For instance, to create a training dataset containing 1000 examples:

>>> import numpy as np
>>> inputs = np.random.randn(1000, 10).astype('f')
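If the model was constructed with weighted=True, each training batch also carries a weight array; the sketch below assumes the weights share the shape of the inputs, which is stated here as an assumption rather than documented behavior:

>>> weighted_model = theanets.Autoencoder([10, 20, 10], weighted=True)
>>> weights = np.ones((1000, 10), 'f')  # assumed shape: one weight per input variable
>>> weighted_model.train([inputs, weights])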
Training
Training the model can be as simple as calling the train() method:

>>> model.train([inputs])
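For more control over the process, the itertrain() method listed below yields a pair of monitor dictionaries after each training iteration; this minimal sketch assumes the default training algorithm and the 'loss' monitor key:

>>> for train_monitors, valid_monitors in model.itertrain([inputs]):
...     print('loss:', train_monitors['loss'])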
See Training a Model for more information about training.
Use
A model can be used to predict() the output of some input data points:

>>> test = np.random.randn(3, 10).astype('f')
>>> print(model.predict(test))
Additionally, autoencoders can encode() a set of input data points:

>>> enc = model.encode(test)
>>> enc.shape
(3, 20)
The model can also decode() a set of encoded data:

>>> model.decode(enc)
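Since encoding followed by decoding is just one forward pass through the whole network, the round trip should match predict() up to floating-point noise; this check is a sketch, not a documented API guarantee:

>>> np.allclose(model.decode(model.encode(test)), model.predict(test))
True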
See Using a Model for more information about using models.

__init__(layers, loss='mse', weighted=False, rng=13)¶
Methods

__init__(layers[, loss, weighted, rng])
add_layer([layer, is_output])
    Add a layer to our network graph.
add_loss([loss])
    Add a loss function to the model.
build_graph([regularizers])
    Connect the layers in this network to form a computation graph.
decode(z[, layer])
    Decode an encoded dataset by computing the output layer activation.
encode(x[, layer, sample])
    Encode a dataset using the hidden layer activations of our network.
feed_forward(x, **kwargs)
    Compute a forward pass of all layers from the given input.
find(which, param)
    Get a parameter from a layer in the network.
itertrain(train[, valid, algo, subalgo, ...])
    Train our network, one batch at a time.
load(filename)
    Load a saved network from disk.
loss(**kwargs)
    Return a variable representing the regularized loss for this network.
monitors(**kwargs)
    Return expressions that should be computed to monitor training.
predict(x, **kwargs)
    Compute a forward pass of the inputs, returning the network output.
save(filename)
    Save the state of this network to a pickle file on disk.
score(x[, w])
    Compute R^2 coefficient of determination for a given input.
set_loss(*args, **kwargs)
    Clear the current loss functions from the network and add a new one.
train(*args, **kwargs)
    Train the network until the trainer converges.
updates(**kwargs)
    Return expressions to run as updates during network training.

Attributes
DEFAULT_OUTPUT_ACTIVATION
INPUT_NDIM
OUTPUT_NDIM
inputs
    A list of Theano variables for feedforward computations.
num_params
    Number of parameters in the entire network model.
params
    A list of the learnable Theano parameters for this network.
variables
    A list of Theano variables for loss computations.
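As a quick sketch of the persistence methods in the table above (the filename is hypothetical, and invoking load() on the class rather than an instance is an assumption about the API):

>>> model.save('model.pkl')  # pickle the network state to disk
>>> model = theanets.Autoencoder.load('model.pkl')  # restore it (assumed class-level call)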
decode(z, layer=None)¶

Decode an encoded dataset by computing the output layer activation.

Parameters:
    z : ndarray
        A matrix containing encoded data from this autoencoder.
    layer : int or str or Layer, optional
        The index or name of the hidden layer that was used to encode z.

Returns:
    decoded : ndarray
        The decoded dataset.
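A hypothetical usage sketch, where 'hid1' stands in for whatever name or index identifies the encoding layer in your model:

>>> decoded = model.decode(enc, layer='hid1')  # 'hid1' is a hypothetical layer name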

encode(x, layer=None, sample=False)¶

Encode a dataset using the hidden layer activations of our network.

Parameters:
    x : ndarray
        A dataset to encode. Rows of this dataset capture individual data points, while columns represent the variables in each data point.
    layer : str, optional
        The name of the hidden layer output to use. By default, we use the “middle” hidden layer; for example, for a 4,2,4 or 4,3,2,3,4 autoencoder, we use the layer with size 2.
    sample : bool, optional
        If True, then draw a sample using the hidden activations as independent Bernoulli probabilities for the encoded data. This assumes the hidden layer has a logistic sigmoid activation function.

Returns:
    ndarray :
        The given dataset, encoded by the appropriate hidden layer activation.
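As a usage sketch, stochastic binary codes can be drawn via the documented sample flag; this assumes the middle hidden layer of your model uses a logistic sigmoid activation:

>>> bits = model.encode(test, sample=True)  # Bernoulli samples from hidden activations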

score(x, w=None)¶

Compute R^2 coefficient of determination for a given input.

Parameters:
    x : ndarray (num_examples, num_inputs)
        An array containing data to be fed into the network. Multiple examples are arranged as rows in this array, with columns containing the variables for each example.

Returns:
    r2 : float
        The R^2 correlation between the prediction of this network and its input. This can serve as one measure of the information loss of the autoencoder.
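For instance, scoring a trained model on its inputs gives a quick summary of reconstruction quality; values near 1.0 indicate little information loss:

>>> r2 = model.score(inputs)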