theanets.layers.recurrent.SCRN
class theanets.layers.recurrent.SCRN(rate='vector', **kwargs)
Simple Contextual Recurrent Network layer.
Notes
A Simple Contextual Recurrent Network augments a simple recurrent network with an explicitly slow-moving hidden context layer.
The update equations in this layer are largely those given by [Mik15], pages 4 and 5, but this implementation adds a bias term for the output of the layer. The update equations are thus:
\[\begin{aligned}
s_t &= r \odot x_t W_{xs} + (1 - r) \odot s_{t-1} \\
h_t &= \sigma(x_t W_{xh} + h_{t-1} W_{hh} + s_t W_{sh}) \\
o_t &= g\left(h_t W_{ho} + s_t W_{so} + b\right)
\end{aligned}\]
Here, \(g(\cdot)\) is the activation function for the layer, \(\odot\) is elementwise multiplication, and \(\sigma(\cdot)\) is the logistic sigmoid. The rate values \(r\) are computed using \(r = \sigma(\hat{r})\) so that they are limited to the open interval (0, 1).
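The update equations above can be sketched as a single time step in NumPy. This is an illustrative sketch under stated assumptions, not the layer's Theano implementation: the function name `scrn_step`, the dictionary layout, the choice of tanh for \(g\), the zero initial states, and the use of one shared unit count `n` for the state, hidden, and output units are all assumptions made for the example. The dictionary keys follow the parameter names documented for this layer.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, written sigma(.) in the equations."""
    return 1.0 / (1.0 + np.exp(-z))


def scrn_step(x_t, h_prev, s_prev, p, g=np.tanh):
    """One SCRN time step (hypothetical helper, not part of theanets).

    x_t: (batch, n_in) inputs; h_prev, s_prev: (batch, n) previous states.
    p: dict of parameter arrays keyed by the documented names.
    g: output activation; tanh is an arbitrary choice for illustration.
    """
    r = sigmoid(p['r'])                                # rates in (0, 1)
    s_t = r * (x_t @ p['xs']) + (1 - r) * s_prev       # slow context state
    h_t = sigmoid(x_t @ p['xh'] + h_prev @ p['hh'] + s_t @ p['sh'])
    pre = h_t @ p['ho'] + s_t @ p['so'] + p['b']       # pre-activation output
    return g(pre), pre, h_t, s_t, r


# Toy sizes and random parameters, assumed purely for illustration.
rng = np.random.default_rng(0)
n_in, n, batch = 3, 4, 2
params = {
    'xs': rng.normal(size=(n_in, n)),
    'xh': rng.normal(size=(n_in, n)),
    'sh': rng.normal(size=(n, n)),
    'hh': rng.normal(size=(n, n)),
    'ho': rng.normal(size=(n, n)),
    'so': rng.normal(size=(n, n)),
    'b': np.zeros(n),
    'r': np.zeros(n),   # r_hat = 0 gives a rate of exactly 0.5
}
x = rng.normal(size=(batch, n_in))
h0 = np.zeros((batch, n))
s0 = np.zeros((batch, n))
out, pre, h1, s1, r = scrn_step(x, h0, s0, params)
```

The five returned arrays correspond to the `out`, `pre`, `hid`, `state`, and `rate` outputs of the layer. A rate close to 0 makes the state units change slowly, since most of \(s_{t-1}\) is carried forward.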
Parameters
xs: matrix connecting inputs to state units (called B in the paper)
xh: matrix connecting inputs to hidden units (A)
sh: matrix connecting state to hiddens (P)
hh: matrix connecting hiddens to hiddens (R)
ho: matrix connecting hiddens to output (U)
so: matrix connecting state to output (V)
b: vector of output bias values (not in original paper)
Additionally, if rate is specified as 'vector' (the default), then we also have:
r: vector of learned rate values for the state units
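As a rough check on the size of this layer, the parameter list can be turned into a count of learnable values. The sketch below assumes that the state, hidden, and output units all share one size `n` and that every weight matrix has shape (inputs, outputs); neither assumption is stated on this page, and the helper names `scrn_param_shapes` and `count_params` are hypothetical.

```python
import math


def scrn_param_shapes(n_in, n, rate='vector'):
    """Assumed shapes for the documented SCRN parameters (illustrative only)."""
    shapes = {
        'xs': (n_in, n),   # inputs -> state
        'xh': (n_in, n),   # inputs -> hidden
        'sh': (n, n),      # state -> hidden
        'hh': (n, n),      # hidden -> hidden
        'ho': (n, n),      # hidden -> output
        'so': (n, n),      # state -> output
        'b': (n,),         # output bias
    }
    if rate == 'vector':
        shapes['r'] = (n,)  # learned rate values, one per state unit
    return shapes


def count_params(shapes):
    """Total number of scalar values across all parameter arrays."""
    return sum(math.prod(shape) for shape in shapes.values())


total = count_params(scrn_param_shapes(n_in=3, n=4))
```

Under these assumptions the count is 2 * n_in * n + 4 * n * n + 2 * n, which can be compared against the layer's num_params attribute.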
Outputs
out: the post-activation state of the layer
pre: the pre-activation state of the layer
hid: the state of the layer’s hidden units
state: the state of the layer’s state units
rate: the rate values of the state units
References
[Mik15] T. Mikolov, A. Joulin, S. Chopra, M. Mathieu, & M. Ranzato (ICLR 2015) “Learning Longer Memory in Recurrent Neural Networks.” http://arxiv.org/abs/1412.7753
__init__(rate='vector', **kwargs)
Methods
__init__([rate])
add_bias(name, size[, mean, std]): Helper method to create a new bias vector.
add_weights(name, nin, nout[, mean, std, ...]): Helper method to create a new weight matrix.
connect(inputs): Create Theano variables representing the outputs of this layer.
find(key): Get a shared variable for a parameter by name.
initial_state(name, batch_size): Return an array suitable for representing initial state.
log(): Log some information about this layer.
output_name([name]): Return a fully-scoped name for the given layer output.
setup()
to_spec(): Create a specification dictionary for this layer.
transform(inputs): Transform inputs to this layer into outputs for the layer.
Attributes
input_size: For networks with one input, get the input size.
num_params: Total number of learnable parameters in this layer.
params: A list of all parameters in this layer.
transform(inputs)
Transform inputs to this layer into outputs for the layer.
Parameters:
inputs : dict of theano expressions
Symbolic inputs to this layer, given as a dictionary mapping string names to Theano expressions. See base.Layer.connect().
Returns:
outputs : dict of theano expressions
A map from string output names to Theano expressions for the outputs from this layer. This layer type generates a “pre” output that gives the unit activity before applying the layer’s activation function, a “hid” output that gives the post-activation values before applying the rate mixing, and an “out” output that gives the overall output.
updates : sequence of update pairs
A sequence of updates to apply to this layer’s state inside a Theano function.