theanets.layers.recurrent.SCRN

class theanets.layers.recurrent.SCRN(rate='vector', **kwargs)

Simple Contextual Recurrent Network layer.

Notes

A Simple Contextual Recurrent Network augments a simple recurrent network with an explicitly slow-moving hidden context layer.

The update equations in this layer are largely those given in [Mik15], pages 4 and 5, except that this implementation adds a bias term to the output of the layer:

\[
\begin{aligned}
s_t &= r \odot x_t W_{xs} + (1 - r) \odot s_{t-1} \\
h_t &= \sigma(x_t W_{xh} + h_{t-1} W_{hh} + s_t W_{sh}) \\
o_t &= g\left(h_t W_{ho} + s_t W_{so} + b\right)
\end{aligned}
\]

Here, \(g(\cdot)\) is the activation function for the layer, \(\sigma(\cdot)\) is the logistic sigmoid, and \(\odot\) denotes elementwise multiplication. The rate values \(r\) are computed as \(r = \sigma(\hat{r})\), which confines them to the open interval (0, 1).
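
As a concrete illustration, here is a minimal NumPy sketch of a single time step of these updates. The function and its argument names are hypothetical (they mirror the parameters listed below, with \(\hat{r}\) written as r_hat); the layer's actual implementation is symbolic Theano code.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def scrn_step(x_t, h_prev, s_prev, W_xs, W_xh, W_sh, W_hh,
                  W_ho, W_so, b, r_hat, g=np.tanh):
        """One SCRN time step (illustrative sketch, not the layer's code)."""
        r = sigmoid(r_hat)                         # rates confined to (0, 1)
        s_t = r * (x_t @ W_xs) + (1 - r) * s_prev  # slow-moving state units
        h_t = sigmoid(x_t @ W_xh + h_prev @ W_hh + s_t @ W_sh)  # hidden units
        pre = h_t @ W_ho + s_t @ W_so + b          # pre-activation output
        return g(pre), h_t, s_t                    # "out", "hid", "state"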

Parameters

  • xs — matrix connecting inputs to state units (called B in the paper)
  • xh — matrix connecting inputs to hidden units (A)
  • sh — matrix connecting state to hiddens (P)
  • hh — matrix connecting hiddens to hiddens (R)
  • ho — matrix connecting hiddens to output (U)
  • so — matrix connecting state to output (V)
  • b — vector of output bias values (not in original paper)

Additionally, if rate is specified as 'vector' (the default), then we also have:

  • r — vector of learned rate values for the state units
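
By way of example, a layer of this type is typically requested by its form name when building a theanets model; the layer sizes here are arbitrary, and the 'scrn' form key and rate keyword are assumed to be forwarded to this class:

    import theanets

    # Hypothetical model: 16 inputs, one 64-unit SCRN layer with a
    # learned per-unit rate vector, and 4 outputs.
    net = theanets.recurrent.Regressor(
        layers=[16, dict(form='scrn', size=64, rate='vector'), 4])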

Outputs

  • out — the post-activation state of the layer
  • pre — the pre-activation state of the layer
  • hid — the state of the layer’s hidden units
  • state — the state of the layer’s state units
  • rate — the rate values of the state units
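
Each of these outputs is addressed by a fully scoped name of the form layer:output (see output_name() below). For example, assuming the model sketched above and a layer auto-named 'hid1':

    # Hypothetical: look up the scoped names of this layer's outputs.
    layer = net.layers[1]
    print(layer.output_name('state'))  # -> 'hid1:state'
    print(layer.output_name('rate'))   # -> 'hid1:rate'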

References

[Mik15] T. Mikolov, A. Joulin, S. Chopra, M. Mathieu, & M. Ranzato. "Learning Longer Memory in Recurrent Neural Networks." ICLR 2015. http://arxiv.org/abs/1412.7753
__init__(rate='vector', **kwargs)

Methods

__init__([rate])
add_bias(name, size[, mean, std]) Helper method to create a new bias vector.
add_weights(name, nin, nout[, mean, std, ...]) Helper method to create a new weight matrix.
connect(inputs) Create Theano variables representing the outputs of this layer.
find(key) Get a shared variable for a parameter by name.
initial_state(name, batch_size) Return an array suitable for representing initial state.
log() Log some information about this layer.
output_name([name]) Return a fully-scoped name for the given layer output.
setup()
to_spec() Create a specification dictionary for this layer.
transform(inputs) Transform inputs to this layer into outputs for the layer.

Attributes

input_size For networks with one input, get the input size.
num_params Total number of learnable parameters in this layer.
params A list of all parameters in this layer.
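
As a quick illustration of the parameter-related members, and continuing the hypothetical layer handle from above:

    # Hypothetical: inspect this layer's learned parameters.
    print(layer.num_params)   # total number of learnable parameters
    r = layer.find('r')       # shared variable for the rate parameter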
transform(inputs)

Transform inputs to this layer into outputs for the layer.

Parameters:

inputs : dict of theano expressions

Symbolic inputs to this layer, given as a dictionary mapping string names to Theano expressions. See base.Layer.connect().

Returns:

outputs : dict of theano expressions

A map from string output names to Theano expressions for the outputs from this layer. This layer type generates a "pre" output giving the unit activity before the layer's activation function is applied, a "hid" output giving the post-activation values before rate mixing, an "out" output giving the overall output, and "state" and "rate" outputs giving the values of the state units and their mixing rates (see Outputs above).

updates : sequence of update pairs

A sequence of updates to apply to this layer's state inside a Theano function.
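
As a rough sketch of how these return values fit together, the outputs and updates would ordinarily be handed to theano.function when compiling the network; the input key 'in:out' and the symbolic variable below are assumptions for illustration:

    import theano
    import theano.tensor as TT

    # Hypothetical: compile the layer's overall output into a callable.
    x = TT.tensor3('x')  # (time, batch, features) input sequence
    outputs, updates = layer.transform({'in:out': x})
    f = theano.function([x], outputs['out'], updates=updates)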