theanets.layers.recurrent.LSTM

class theanets.layers.recurrent.LSTM(c_0=None, **kwargs)[source]

Long Short-Term Memory (LSTM) layer.

An LSTM layer is composed of a number of “cells” that are explicitly designed to store information over extended periods of time. Each cell’s stored value is “guarded” by three gates that control how the value is modified and exposed:

  • The “input” gate turns on when the input to the LSTM layer should influence the cell’s value.
  • The “output” gate turns on when the cell’s stored value should propagate to the next layer.
  • The “forget” gate turns on when the cell’s stored value should be reset.

Notes

The output \(h_t\) of the LSTM layer at time \(t\) is given as a function of the input \(x_t\) and the previous states of the layer \(h_{t-1}\) and the internal cell \(c_{t-1}\) by:

\[
\begin{aligned}
i_t &= \sigma(x_t W_{xi} + h_{t-1} W_{hi} + c_{t-1} W_{ci} + b_i) && (1) \\
f_t &= \sigma(x_t W_{xf} + h_{t-1} W_{hf} + c_{t-1} W_{cf} + b_f) && (2) \\
c_t &= f_t c_{t-1} + i_t \tanh(x_t W_{xc} + h_{t-1} W_{hc} + b_c) && (3) \\
o_t &= \sigma(x_t W_{xo} + h_{t-1} W_{ho} + c_t W_{co} + b_o) && (4) \\
h_t &= o_t \tanh(c_t) && (5)
\end{aligned}
\]

where the \(W_{ab}\) are weight matrices and the \(b_{*}\) are bias vectors. Equations (1), (2), and (4) give the activations of the three gates in the LSTM unit; each gate uses the logistic sigmoid, so its activity is confined to the open interval (0, 1). Equation (3) updates the cell value as a weighted sum of the previous cell value and a new candidate value, with the weights given by the forget and input gate activations, respectively. Equation (5) gives the output of the unit: the squashed cell value, weighted by the activation of the output gate.
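
To make the recurrence concrete, here is a sketch (not part of theanets) of one time step of these equations in plain NumPy. All names and shapes are assumptions for illustration: the input x_t has length n, the layer has m cells, each W['x?'] is an (n, m) matrix, each W['h?'] is (m, m), and the peephole weights W['c?'] and the biases are length-m vectors, so the peephole terms are elementwise products (the cell-to-gate weights are diagonal in the specification of [Gra13a], which is why theanets stores them as vectors; see the Parameters list below).

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, b):
        """One LSTM time step following equations (1)-(5); names are illustrative only."""
        i_t = sigmoid(x_t @ W['xi'] + h_prev @ W['hi'] + c_prev * W['ci'] + b['i'])    # (1) input gate
        f_t = sigmoid(x_t @ W['xf'] + h_prev @ W['hf'] + c_prev * W['cf'] + b['f'])    # (2) forget gate
        c_t = f_t * c_prev + i_t * np.tanh(x_t @ W['xc'] + h_prev @ W['hc'] + b['c'])  # (3) cell update
        o_t = sigmoid(x_t @ W['xo'] + h_prev @ W['ho'] + c_t * W['co'] + b['o'])       # (4) output gate
        h_t = o_t * np.tanh(c_t)                                                       # (5) layer output
        return h_t, c_t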

The LSTM cell has become quite popular in recurrent neural network models. It works remarkably well across a wide variety of tasks and is relatively stable during training. This performance comes at the cost of a large number of trainable parameters: each gate, as well as the cell, receives input from the current input, the previous state of all cells in the LSTM layer, and the previous output of the LSTM layer.

The implementation details for this layer come from the specification given on page 5 of [Gra13a].

Parameters

  • b — vector of bias values for each hidden unit
  • ci — vector of peephole input weights
  • cf — vector of peephole forget weights
  • co — vector of peephole output weights
  • xh — matrix connecting inputs to four gates
  • hh — matrix connecting hiddens to four gates
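
The shapes implied by this list give a rough way to count the layer’s trainable parameters, as mentioned in the Notes above. Assuming an input of size n and a layer of m cells, xh is an (n, 4m) matrix, hh is (m, 4m), the bias vector b packs the four gate/cell biases (length 4m, as the four-gate weight matrices suggest), and the three peephole vectors are length m each. The helper below is a hypothetical sanity check, not part of theanets:

    def lstm_param_count(n, m):
        """Approximate number of trainable values in an LSTM layer: input size n, m cells."""
        return n * 4 * m + m * 4 * m + 4 * m + 3 * m

    # The 100-cell layer on 28-dimensional input used in the examples below:
    # 28*400 + 100*400 + 400 + 300 = 51,900 values, versus 28*100 + 100 = 2,900
    # for a plain feedforward layer of the same size.
    lstm_param_count(28, 100)  # 51900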

Outputs

  • out — the post-activation state of the layer
  • cell — the state of the hidden “cell”

References

[Hoc97] S. Hochreiter & J. Schmidhuber. (1997) “Long short-term memory.” Neural Computation, 9(8), 1735-1780.
[Gra13a] A. Graves. (2013) “Generating Sequences with Recurrent Neural Networks.” http://arxiv.org/pdf/1308.0850v5.pdf

Examples

LSTM layers can be incorporated into classification models:

>>> cls = theanets.recurrent.Classifier((28, (100, 'lstm'), 10))

or regression models:

>>> reg = theanets.recurrent.Regressor((28, dict(size=100, form='lstm'), 10))

This layer’s parameters can be retrieved using find; for example, using the classifier defined above:

>>> bias = cls.find('hid1', 'b')
>>> ci = cls.find('hid1', 'ci')
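
Because find returns a Theano shared variable, the underlying numpy array is available through get_value(); a quick, hypothetical way to inspect a parameter is to check its shape (the exact shapes depend on the layer’s size and input):

>>> bias.get_value().shape   # bias vector covering the gates and cell
>>> ci.get_value().shape     # one peephole input weight per cell
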
__init__(c_0=None, **kwargs)[source]

x.__init__(…) initializes x; see help(type(x)) for signature

Methods

  • __init__([c_0]) — x.__init__(…) initializes x; see help(type(x)) for signature
  • add_bias(name, size[, mean, std]) — Helper method to create a new bias vector.
  • add_weights(name, nin, nout[, mean, std, …]) — Helper method to create a new weight matrix.
  • bind(graph[, reset, initialize]) — Bind this layer into a computation graph.
  • connect(inputs) — Create Theano variables representing the outputs of this layer.
  • find(key) — Get a shared variable for a parameter by name.
  • full_name(name) — Return a fully-scoped name for the given layer output.
  • log() — Log some information about this layer.
  • log_params() — Log information about this layer’s parameters.
  • resolve_inputs(layers) — Resolve the names of inputs for this layer into shape tuples.
  • resolve_outputs() — Resolve the names of outputs for this layer into shape tuples.
  • setup() — Set up the parameters and initial values for this layer.
  • to_spec() — Create a specification dictionary for this layer.
  • transform(inputs) — Transform the inputs for this layer into an output for the layer.

Attributes

  • input_name — Name of layer input (for layers with one input).
  • input_shape — Shape of layer input (for layers with one input).
  • input_size — Size of layer input (for layers with one input).
  • output_name — Full name of the default output for this layer.
  • output_shape — Shape of default output from this layer.
  • output_size — Number of “neurons” in this layer’s default output.
  • params — A list of all parameters in this layer.
resolve_inputs(layers)[source]

Resolve the names of inputs for this layer into shape tuples.

Parameters:
layers : list of Layer

A list of the layers that are available for resolving inputs.

Raises:
theanets.util.ConfigurationError :

If an input cannot be resolved.

setup()[source]

Set up the parameters and initial values for this layer.

to_spec()[source]

Create a specification dictionary for this layer.

Returns:
spec : dict

A dictionary specifying the configuration of this layer.

transform(inputs)[source]

Transform the inputs for this layer into an output for the layer.

Parameters:
inputs : dict of Theano expressions

Symbolic inputs to this layer, given as a dictionary mapping string names to Theano expressions. See Layer.connect().

Returns:
output : Theano expression

The output expression for this layer.

updates : list

A list of updates (possibly empty) to apply when computing this layer’s outputs.