class theanets.layers.recurrent.LSTM(size, inputs, name=None, activation='relu', **kwargs)

Long Short-Term Memory (LSTM) layer.

An LSTM layer is composed of a number of “cells” that are explicitly designed to store information for a certain period of time. Each cell’s stored value is “guarded” by three gates that permit or deny modification of the cell’s value:

  • The “input” gate turns on when the input to the LSTM layer should influence the cell’s value.
  • The “output” gate turns on when the cell’s stored value should propagate to the next layer.
  • The “forget” gate controls how much of the cell’s stored value is carried over to the next time step; when it is close to zero, the stored value is reset.


The output \(h_t\) of the LSTM layer at time \(t\) is given as a function of the input \(x_t\) and the previous states of the layer \(h_{t-1}\) and the internal cell \(c_{t-1}\) by:

\[\begin{aligned}
i_t &= \sigma(x_t W_{xi} + h_{t-1} W_{hi} + c_{t-1} W_{ci} + b_i) && (1) \\
f_t &= \sigma(x_t W_{xf} + h_{t-1} W_{hf} + c_{t-1} W_{cf} + b_f) && (2) \\
c_t &= f_t c_{t-1} + i_t \tanh(x_t W_{xc} + h_{t-1} W_{hc} + b_c) && (3) \\
o_t &= \sigma(x_t W_{xo} + h_{t-1} W_{ho} + c_t W_{co} + b_o) && (4) \\
h_t &= o_t \tanh(c_t) && (5)
\end{aligned}\]

where the \(W_{ab}\) are weight matrix parameters and the \(b_x\) are bias vectors. Equations (1), (2), and (4) give the activations for the three gates in the LSTM unit; these gates are activated using the logistic sigmoid so that their activities are confined to the open interval (0, 1). Equation (3) updates the value of the cell as a weighted sum of the previous cell value and a new candidate value, where the weights are given by the forget and input gate activations, respectively. The output of the unit, equation (5), is the cell value passed through tanh and weighted by the activation of the output gate.
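
The five update equations can be traced by hand for a single cell. The following is a minimal sketch in plain Python, not the Theano code used by this layer: it assumes one cell and a scalar input so that every weight is a scalar, and the dictionary keys ('xi', 'bf', and so on) are hypothetical names chosen to mirror the subscripts in the equations.

```python
import math


def sigmoid(z):
    """Logistic sigmoid; squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """Advance a single-cell LSTM by one time step.

    p maps hypothetical names such as 'xi' or 'bf' to scalar values that
    play the roles of W_xi, b_f, and so on in the equations.
    Returns the new output h_t and the new cell value c_t.
    """
    i = sigmoid(x * p['xi'] + h_prev * p['hi'] + c_prev * p['ci'] + p['bi'])  # (1)
    f = sigmoid(x * p['xf'] + h_prev * p['hf'] + c_prev * p['cf'] + p['bf'])  # (2)
    c = f * c_prev + i * math.tanh(x * p['xc'] + h_prev * p['hc'] + p['bc'])  # (3)
    o = sigmoid(x * p['xo'] + h_prev * p['ho'] + c * p['co'] + p['bo'])       # (4)
    h = o * math.tanh(c)                                                      # (5)
    return h, c


# Illustrative values only. All weights are zero except 'xc', so the
# candidate value is tanh(x). A very negative input-gate bias keeps that
# gate closed, and a very positive forget-gate bias keeps f_t close to 1.
NAMES = ['xi', 'hi', 'ci', 'bi', 'xf', 'hf', 'cf', 'bf',
         'xc', 'hc', 'bc', 'xo', 'ho', 'co', 'bo']
params = dict.fromkeys(NAMES, 0.0)
params['xc'] = 1.0
params['bi'] = -20.0
params['bf'] = 20.0

h, c = 0.0, 0.8  # start with a value already stored in the cell
for x in [1.0, -2.0, 3.0]:
    h, c = lstm_step(x, h, c, params)
```

With these settings the cell value stays very close to 0.8 across all three steps regardless of the inputs, which is the "storage" behavior the gates are meant to provide. Because the output-gate weights and bias are all zero here, the output gate sits at 0.5 and the output is half of tanh(c).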

The LSTM cell has become quite popular in recurrent neural network models. It works well across a wide variety of tasks and is relatively stable during training. This performance comes at the cost of a large number of trainable parameters: each gate, as well as the cell, receives input from the current input, the previous state of all cells in the LSTM layer, and the previous output of the LSTM layer.
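
To make the parameter cost concrete, the sketch below counts the trainable values in one LSTM layer and in a plain recurrent layer of the same size. The shapes are assumptions based on the parameter names listed for this layer (xh, hh, b, ci, cf, co): xh and hh are taken to stack four blocks side by side, b to hold one bias per block per cell, and each peephole vector to hold one weight per cell. The function names are hypothetical helpers, not part of theanets.

```python
def lstm_param_count(n_in, size):
    """Count trainable values in one LSTM layer.

    Assumed shapes: xh is n_in x (4 * size), hh is size x (4 * size),
    b has 4 * size entries, and ci, cf, co each have size entries.
    """
    xh = n_in * 4 * size
    hh = size * 4 * size
    b = 4 * size
    peepholes = 3 * size
    return xh + hh + b + peepholes


def rnn_param_count(n_in, size):
    """Count trainable values in a plain recurrent layer with one input
    matrix, one hidden-to-hidden matrix, and one bias vector."""
    return n_in * size + size * size + size


# 28 inputs and 100 cells, matching the model examples on this page.
lstm_total = lstm_param_count(28, 100)
rnn_total = rnn_param_count(28, 100)
```

Under these assumptions, a layer with 28 inputs and 100 cells has 51900 trainable values, compared with 12900 for the plain recurrent layer, or roughly four times as many.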

The implementation details for this layer come from the specification given on page 5 of [Gra13a].

Parameters:


  • b — vector of bias values for each hidden unit
  • ci — vector of peephole input weights
  • cf — vector of peephole forget weights
  • co — vector of peephole output weights
  • xh — matrix connecting inputs to four gates
  • hh — matrix connecting hiddens to four gates

Outputs:


  • out — the post-activation state of the layer
  • cell — the state of the hidden “cell”


[Hoc97] S. Hochreiter & J. Schmidhuber. (1997) “Long short-term memory.” Neural Computation, 9(8), 1735-1780.
[Gra13a] A. Graves. (2013) “Generating Sequences with Recurrent Neural Networks.”


LSTM layers can be incorporated into classification models:

>>> cls = theanets.recurrent.Classifier((28, (100, 'lstm'), 10))

or regression models:

>>> reg = theanets.recurrent.Regressor((28, dict(size=100, form='lstm'), 10))

This layer’s parameters can be retrieved from a network (called net here; for example, cls or reg above) using find:

>>> bias = net.find('hid1', 'b')
>>> ci = net.find('hid1', 'ci')

__init__(size, inputs, name=None, activation='relu', **kwargs)


Methods:

__init__(size, inputs[, name, activation])
add_bias(name, size[, mean, std])               Helper method to create a new bias vector.
add_weights(name, nin, nout[, mean, std, ...])  Helper method to create a new weight matrix.
connect(inputs)                                 Create Theano variables representing the outputs of this layer.
find(key)                                       Get a shared variable for a parameter by name.
initial_state(name, batch_size)                 Return an array suitable for representing initial state.
log()                                           Log some information about this layer.
output_name([name])                             Return a fully-scoped name for the given layer output.
setup()                                         Set up the parameters and initial values for this layer.
to_spec()                                       Create a specification dictionary for this layer.
transform(inputs)                               Transform the inputs for this layer into an output for the layer.


Attributes:

input_size  For networks with one input, get the input size.
num_params  Total number of learnable parameters in this layer.
params      A list of all parameters in this layer.

setup()

Set up the parameters and initial values for this layer.


transform(inputs)

Transform the inputs for this layer into an output for the layer.

Parameters:

inputs : dict of Theano expressions
    Symbolic inputs to this layer, given as a dictionary mapping string names to Theano expressions. See base.Layer.connect().

Returns:

outputs : dict of Theano expressions
    A map from string output names to Theano expressions for the outputs from this layer. This layer type generates a “cell” output that gives the value of each hidden cell in the layer, and an “out” output that gives the actual gated output from the layer.
updates : list of update pairs
    A sequence of updates to apply inside a Theano function.