theanets.layers.recurrent.LSTM

class theanets.layers.recurrent.LSTM(c_0=None, **kwargs)

Long Short-Term Memory (LSTM) layer.
An LSTM layer is composed of a number of “cells” that are explicitly designed to store information for a certain period of time. Each cell’s stored value is “guarded” by three gates that permit or deny modification of the cell’s value:
- The “input” gate turns on when the input to the LSTM layer should influence the cell’s value.
- The “output” gate turns on when the cell’s stored value should propagate to the next layer.
- The “forget” gate turns on when the cell’s stored value should be reset.
Notes
The output \(h_t\) of the LSTM layer at time \(t\) is given as a function of the input \(x_t\) and the previous states of the layer \(h_{t-1}\) and the internal cell \(c_{t-1}\) by:
\[\begin{split}\begin{eqnarray}
i_t &=& \sigma(x_t W_{xi} + h_{t-1} W_{hi} + c_{t-1} W_{ci} + b_i) \\
f_t &=& \sigma(x_t W_{xf} + h_{t-1} W_{hf} + c_{t-1} W_{cf} + b_f) \\
c_t &=& f_t c_{t-1} + i_t \tanh(x_t W_{xc} + h_{t-1} W_{hc} + b_c) \\
o_t &=& \sigma(x_t W_{xo} + h_{t-1} W_{ho} + c_t W_{co} + b_o) \\
h_t &=& o_t \tanh(c_t)
\end{eqnarray}\end{split}\]

where the \(W_{ab}\) are weight matrix parameters and the \(b_a\) are bias vectors. Equations (1), (2), and (4) give the activations for the three gates in the LSTM unit; these gates are activated using the logistic sigmoid so that their activities are confined to the open interval (0, 1). The value of the cell is updated by equation (3): it is the weighted sum of the previous cell value and a new candidate value, where the weights are given by the forget and input gate activations, respectively. The output of the unit is the cell value weighted by the activation of the output gate.
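As a concrete illustration (a sketch for exposition, not the theanets implementation itself), a single step of this recurrence can be written in NumPy. Following [Gra13a], the peephole weights \(W_{ci}\), \(W_{cf}\), \(W_{co}\) are treated as diagonal matrices, so they are stored as vectors and applied elementwise:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of the LSTM recurrence above.

    W maps illustrative names like 'xi' (input-to-input-gate) to
    arrays; the peephole weights 'ci', 'cf', 'co' are vectors applied
    elementwise. These names are for exposition only and do not match
    the theanets parameter layout, which packs all four gates into
    the single matrices ``xh`` and ``hh``.
    """
    i = sigmoid(x_t @ W['xi'] + h_prev @ W['hi'] + c_prev * W['ci'] + b['i'])
    f = sigmoid(x_t @ W['xf'] + h_prev @ W['hf'] + c_prev * W['cf'] + b['f'])
    c = f * c_prev + i * np.tanh(x_t @ W['xc'] + h_prev @ W['hc'] + b['c'])
    o = sigmoid(x_t @ W['xo'] + h_prev @ W['ho'] + c * W['co'] + b['o'])
    h = o * np.tanh(c)
    return h, c
```

Note that because \(o_t \in (0, 1)\) and \(|\tanh(c_t)| < 1\), the layer output \(h_t\) always lies in \((-1, 1)\), no matter how large the stored cell value \(c_t\) grows.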
The LSTM cell has become quite popular in recurrent neural network models. It works remarkably well across a wide variety of tasks and is relatively stable during training. The cost of this performance is a large number of trainable parameters: each gate, as well as the cell, receives input from the current input, the previous state of all cells in the LSTM layer, and the previous output of the LSTM layer.
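To make that parameter count concrete, here is a small helper that tallies the parameters for a layer with \(n\) inputs and \(m\) cells. It assumes the layout described in the Parameters section: the gate matrices xh and hh each feed all four gates (so their second dimension is \(4m\)), the bias b likewise covers all four gates, and the three peephole weights are vectors:

```python
def lstm_param_count(n, m):
    """Trainable parameters for an LSTM layer with n inputs and m cells.

    Assumes: xh is (n, 4m), hh is (m, 4m), the bias b covers all
    four gates (4m values), and the peephole weights ci, cf, co
    are vectors of length m each.
    """
    return n * 4 * m + m * 4 * m + 4 * m + 3 * m

# A 100-cell LSTM over 28-dimensional inputs, as in the Examples below:
print(lstm_param_count(28, 100))  # 51900
```

For comparison, a plain recurrent layer of the same size needs only \(nm + m^2 + m\) parameters, so the gating machinery costs roughly a factor of four.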
The implementation details for this layer come from the specification given on page 5 of [Gra13a].
Parameters

- b — vector of bias values for each hidden unit
- ci — vector of peephole input weights
- cf — vector of peephole forget weights
- co — vector of peephole output weights
- xh — matrix connecting inputs to four gates
- hh — matrix connecting hiddens to four gates
Outputs

- out — the post-activation state of the layer
- cell — the state of the hidden “cell”
References

[Hoc97] S. Hochreiter & J. Schmidhuber. (1997) “Long short-term memory.” Neural Computation, 9(8), 1735-1780.
[Gra13a] A. Graves. (2013) “Generating Sequences with Recurrent Neural Networks.” http://arxiv.org/pdf/1308.0850v5.pdf

Examples
LSTM layers can be incorporated into classification models:
>>> cls = theanets.recurrent.Classifier((28, (100, 'lstm'), 10))
or regression models:
>>> reg = theanets.recurrent.Regressor((28, dict(size=100, form='lstm'), 10))
This layer’s parameters can be retrieved using find:

>>> bias = net.find('hid1', 'b')
>>> ci = net.find('hid1', 'ci')
Methods

- __init__([c_0]) — x.__init__(…) initializes x; see help(type(x)) for signature
- add_bias(name, size[, mean, std]) — Helper method to create a new bias vector.
- add_weights(name, nin, nout[, mean, std, …]) — Helper method to create a new weight matrix.
- bind(graph[, reset, initialize]) — Bind this layer into a computation graph.
- connect(inputs) — Create Theano variables representing the outputs of this layer.
- find(key) — Get a shared variable for a parameter by name.
- full_name(name) — Return a fully-scoped name for the given layer output.
- log() — Log some information about this layer.
- log_params() — Log information about this layer’s parameters.
- resolve_inputs(layers) — Resolve the names of inputs for this layer into shape tuples.
- resolve_outputs() — Resolve the names of outputs for this layer into shape tuples.
- setup() — Set up the parameters and initial values for this layer.
- to_spec() — Create a specification dictionary for this layer.
- transform(inputs) — Transform the inputs for this layer into an output for the layer.

Attributes

- input_name — Name of layer input (for layers with one input).
- input_shape — Shape of layer input (for layers with one input).
- input_size — Size of layer input (for layers with one input).
- output_name — Full name of the default output for this layer.
- output_shape — Shape of default output from this layer.
- output_size — Number of “neurons” in this layer’s default output.
- params — A list of all parameters in this layer.
resolve_inputs(layers)

Resolve the names of inputs for this layer into shape tuples.

Parameters:
- layers : list of Layer — A list of the layers that are available for resolving inputs.

Raises:
- theanets.util.ConfigurationError — If an input cannot be resolved.
to_spec()

Create a specification dictionary for this layer.

Returns:
- spec : dict — A dictionary specifying the configuration of this layer.
transform(inputs)

Transform the inputs for this layer into an output for the layer.

Parameters:
- inputs : dict of Theano expressions — Symbolic inputs to this layer, given as a dictionary mapping string names to Theano expressions. See Layer.connect().

Returns:
- output : Theano expression — The output for this layer is the same as the input.
- updates : list — An empty updates list.