theanets.feedforward.Autoencoder¶

class theanets.feedforward.Autoencoder(layers, loss='mse', weighted=False, rng=13)¶

An autoencoder network attempts to reproduce its input.
Notes

Autoencoder models default to a mean squared error (MSE) loss. To use a different loss, provide a non-default value for the loss keyword argument when constructing your model.

Formally, an autoencoder defines a parametric mapping from a data space to the same space:

\[F_\theta: \mathcal{S} \to \mathcal{S}\]

Often, this mapping can be decomposed into an “encoding” stage \(f_\alpha(\cdot)\) and a corresponding “decoding” stage \(g_\beta(\cdot)\) to and from some latent space \(\mathcal{Z} = \mathbb{R}^{n_z}\):

\[\begin{eqnarray*} f_\alpha &:& \mathcal{S} \to \mathcal{Z} \\ g_\beta &:& \mathcal{Z} \to \mathcal{S} \end{eqnarray*}\]

Autoencoders form an interesting class of models for several reasons. They:
- require only “unlabeled” data (which is typically easy to obtain),
- are generalizations of many popular density estimation techniques, and
- can be used to model the “manifold” or density of a dataset.
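With the default mse loss, training amounts (roughly) to finding parameters that minimize the mean squared reconstruction error over the \(m\) training examples; regularizers such as the sparsity penalty used below add further terms to this objective:

\[\mathcal{L}(\theta) = \frac{1}{m} \sum_{i=1}^m \left\| x_i - F_\theta(x_i) \right\|_2^2\]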
Many extremely common dimensionality reduction techniques can be expressed as autoencoders. For instance, Principal Component Analysis (PCA) can be expressed as a model with two tied, linear layers:
>>> pca = theanets.Autoencoder([10, (5, 'linear'), (10, 'tied')])
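Training this model then uses the same call as any other theanets model; here inputs is assumed to be a (num_examples, 10) float array like the one built in the Data section below:

>>> pca.train([inputs])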
Similarly, Independent Component Analysis (ICA) can be expressed as the same model, but trained with a sparsity penalty on the hidden-layer activations:

>>> ica = pca
>>> ica.train([inputs], hidden_l1=0.1)
In this light, “nonlinear PCA” is quite easy to formulate as well!
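One plausible sketch simply replaces the linear hidden layer with a nonlinear one; the choice of 'sigmoid' activation here is an illustrative assumption, not a prescription:

>>> nlpca = theanets.Autoencoder([10, (5, 'sigmoid'), (10, 'tied')])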
Examples
To create an autoencoder, just create a new model instance. Often you’ll provide the layer configuration at this time:
>>> model = theanets.Autoencoder([10, 20, 10])
If you want to create an autoencoder with tied weights, specify that layer type when creating the model:
>>> model = theanets.Autoencoder([10, 20, (10, 'tied')])
See Creating a Model for more information.
Data
Training data for an autoencoder takes the form of a two-dimensional array. The shape of this array is (num_examples, num_variables): the first axis enumerates data points in a batch, and the second enumerates the variables in the model.
For instance, to create a training dataset containing 1000 examples:

>>> import numpy as np
>>> inputs = np.random.randn(1000, 10).astype('f')
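If the model was constructed with weighted=True, each training batch also carries a weight array; the sketch below assumes the weights share the shape of the inputs, which is stated here as an assumption rather than documented behavior:

>>> weighted_model = theanets.Autoencoder([10, 20, 10], weighted=True)
>>> weights = np.ones((1000, 10), 'f')  # assumed shape: one weight per input variable
>>> weighted_model.train([inputs, weights])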
Training
Training the model can be as simple as calling the train() method:

>>> model.train([inputs])
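For more control over the process, the itertrain() method listed below yields a pair of monitor dictionaries after each training iteration; this minimal sketch assumes the default training algorithm and the 'loss' monitor key:

>>> for train_monitors, valid_monitors in model.itertrain([inputs]):
...     print('loss:', train_monitors['loss'])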
See Training a Model for more information about training.
Use
A model can be used to predict() the output of some input data points:

>>> test = np.random.randn(3, 10).astype('f')
>>> print(model.predict(test))
Additionally, autoencoders can encode() a set of input data points:

>>> enc = model.encode(test)
>>> enc.shape
(3, 20)
The model can also decode() a set of encoded data:

>>> model.decode(enc)
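Since encoding followed by decoding is just one forward pass through the whole network, the round trip should match predict() up to floating-point noise; this check is a sketch, not a documented API guarantee:

>>> np.allclose(model.decode(model.encode(test)), model.predict(test))
True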
See Using a Model for more information about using models.

__init__(layers, loss='mse', weighted=False, rng=13)¶
Methods

__init__(layers[, loss, weighted, rng])
add_layer([layer, is_output])
    Add a layer to our network graph.
add_loss([loss])
    Add a loss function to the model.
build_graph([regularizers])
    Connect the layers in this network to form a computation graph.
decode(z[, layer])
    Decode an encoded dataset by computing the output layer activation.
encode(x[, layer, sample])
    Encode a dataset using the hidden layer activations of our network.
feed_forward(x, **kwargs)
    Compute a forward pass of all layers from the given input.
find(which, param)
    Get a parameter from a layer in the network.
itertrain(train[, valid, algo, subalgo, ...])
    Train our network, one batch at a time.
load(filename)
    Load a saved network from disk.
loss(**kwargs)
    Return a variable representing the regularized loss for this network.
monitors(**kwargs)
    Return expressions that should be computed to monitor training.
predict(x, **kwargs)
    Compute a forward pass of the inputs, returning the network output.
save(filename)
    Save the state of this network to a pickle file on disk.
score(x[, w])
    Compute R^2 coefficient of determination for a given input.
set_loss(*args, **kwargs)
    Clear the current loss functions from the network and add a new one.
train(*args, **kwargs)
    Train the network until the trainer converges.
updates(**kwargs)
    Return expressions to run as updates during network training.

Attributes
DEFAULT_OUTPUT_ACTIVATION
INPUT_NDIM
OUTPUT_NDIM
inputs
    A list of Theano variables for feedforward computations.
num_params
    Number of parameters in the entire network model.
params
    A list of the learnable Theano parameters for this network.
variables
    A list of Theano variables for loss computations.
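As a quick sketch of the persistence methods in the table above (the filename is hypothetical, and invoking load() on the class rather than an instance is an assumption about the API):

>>> model.save('model.pkl')  # pickle the network state to disk
>>> model = theanets.Autoencoder.load('model.pkl')  # restore it (assumed class-level call)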
decode(z, layer=None)¶

Decode an encoded dataset by computing the output layer activation.

Parameters:
    z : ndarray
        A matrix containing encoded data from this autoencoder.
    layer : int or str or Layer, optional
        The index or name of the hidden layer that was used to encode z.

Returns:
    decoded : ndarray
        The decoded dataset.
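A hypothetical usage sketch, where 'hid1' stands in for whatever name or index identifies the encoding layer in your model:

>>> decoded = model.decode(enc, layer='hid1')  # 'hid1' is a hypothetical layer name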

encode(x, layer=None, sample=False)¶

Encode a dataset using the hidden layer activations of our network.

Parameters:
    x : ndarray
        A dataset to encode. Rows of this dataset capture individual data points, while columns represent the variables in each data point.
    layer : str, optional
        The name of the hidden layer output to use. By default, we use the “middle” hidden layer; for example, for a 4,2,4 or 4,3,2,3,4 autoencoder, we use the layer with size 2.
    sample : bool, optional
        If True, then draw a sample using the hidden activations as independent Bernoulli probabilities for the encoded data. This assumes the hidden layer has a logistic sigmoid activation function.

Returns:
    ndarray :
        The given dataset, encoded by the appropriate hidden layer activation.
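As a usage sketch, stochastic binary codes can be drawn via the documented sample flag; this assumes the middle hidden layer of your model uses a logistic sigmoid activation:

>>> bits = model.encode(test, sample=True)  # Bernoulli samples from hidden activations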

score(x, w=None)¶

Compute R^2 coefficient of determination for a given input.

Parameters:
    x : ndarray (num_examples, num_inputs)
        An array containing data to be fed into the network. Multiple examples are arranged as rows in this array, with columns containing the variables for each example.

Returns:
    r2 : float
        The R^2 correlation between the prediction of this network and its input. This can serve as one measure of the information loss of the autoencoder.
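For instance, scoring a trained model on its inputs gives a quick summary of reconstruction quality; values near 1.0 indicate little information loss:

>>> r2 = model.score(inputs)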