theanets.layers.recurrent.SCRN

class theanets.layers.recurrent.SCRN(rate='vector', **kwargs)

Simple Contextual Recurrent Network layer.
Notes
A Simple Contextual Recurrent Network augments a simple recurrent network with an explicitly slow-moving hidden context layer.

The update equations in this layer are largely those given by [Mik15], pages 4 and 5, but this implementation adds a bias term for the output of the layer:
\[\begin{split}\begin{eqnarray}
s_t &=& r \odot x_t W_{xs} + (1 - r) \odot s_{t-1} \\
h_t &=& \sigma(x_t W_{xh} + h_{t-1} W_{hh} + s_t W_{sh}) \\
o_t &=& g\left(h_t W_{ho} + s_t W_{so} + b\right)
\end{eqnarray}\end{split}\]

Here, \(g(\cdot)\) is the activation function for the layer and \(\odot\) is elementwise multiplication. The rate values \(r\) are computed as \(r = \sigma(\hat{r})\), where \(\sigma(\cdot)\) is the logistic sigmoid, so that the rates are confined to the open interval (0, 1).
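The updates are easy to trace in a few lines of NumPy. This is a minimal sketch of the equations above, not the theanets implementation; all sizes are illustrative, and tanh stands in for the activation g:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.RandomState(0)
    n_in, n_hid, n_state, n_out = 8, 12, 4, 3
    W_xs = rng.randn(n_in, n_state) * 0.1   # B in the paper
    W_xh = rng.randn(n_in, n_hid) * 0.1     # A
    W_sh = rng.randn(n_state, n_hid) * 0.1  # P
    W_hh = rng.randn(n_hid, n_hid) * 0.1    # R
    W_ho = rng.randn(n_hid, n_out) * 0.1    # U
    W_so = rng.randn(n_state, n_out) * 0.1  # V
    b = np.zeros(n_out)                     # output bias (added by this layer)
    r = sigmoid(rng.randn(n_state))         # rates in (0, 1): r = sigmoid(r_hat)

    x_t = rng.randn(n_in)                   # input at time t
    s_prev, h_prev = np.zeros(n_state), np.zeros(n_hid)

    s_t = r * (x_t @ W_xs) + (1 - r) * s_prev               # slow context state
    h_t = sigmoid(x_t @ W_xh + h_prev @ W_hh + s_t @ W_sh)  # hidden state
    o_t = np.tanh(h_t @ W_ho + s_t @ W_so + b)              # layer output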
Parameters

    xs — matrix connecting inputs to state units (called B in the paper)
    xh — matrix connecting inputs to hidden units (A)
    sh — matrix connecting state to hiddens (P)
    hh — matrix connecting hiddens to hiddens (R)
    ho — matrix connecting hiddens to output (U)
    so — matrix connecting state to output (V)
    b — vector of output bias values (not in original paper)

Additionally, if rate is specified as 'vector' (the default), then we also have:

    r — vector of learned rate values for the state units
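These parameters are created automatically when the layer is built as part of a model. The sketch below shows a hypothetical construction; the 'scrn' form string and the layer-spec dict follow theanets' usual layer conventions but should be treated as assumptions, not verified API:

    import theanets

    # Hypothetical usage: an 8-input, 3-output recurrent regressor with one
    # SCRN hidden layer of 12 units; its xs, xh, sh, hh, ho, so, b, and r
    # parameters are created automatically.
    net = theanets.recurrent.Regressor(layers=[
        8,
        dict(form='scrn', size=12, rate='vector'),
        3,
    ])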
Outputs

    out — the post-activation state of the layer
    pre — the pre-activation state of the layer
    hid — the state of the layer's hidden units
    state — the state of the layer's state units
    rate — the rate values of the state units
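These outputs can be referred to by fully scoped names elsewhere in a network (see output_name() in the methods below). A hedged sketch, assuming the hypothetical model construction shown above and theanets' usual 'layername:output' naming, which is an assumption here:

    import theanets

    net = theanets.recurrent.Regressor([8, dict(form='scrn', size=12), 3])
    scrn = net.layers[1]
    print(scrn.output_name('rate'))  # e.g. 'hid1:rate'
    print(scrn.output_name())        # the default 'out' output, e.g. 'hid1:out'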
References

[Mik15] T. Mikolov, A. Joulin, S. Chopra, M. Mathieu, & M. Ranzato (ICLR 2015) "Learning Longer Memory in Recurrent Neural Networks." http://arxiv.org/abs/1412.7753

__init__(rate='vector', **kwargs)
Methods

__init__([rate])
add_bias(name, size[, mean, std])
    Helper method to create a new bias vector.
add_weights(name, nin, nout[, mean, std, ...])
    Helper method to create a new weight matrix.
connect(inputs)
    Create Theano variables representing the outputs of this layer.
find(key)
    Get a shared variable for a parameter by name.
initial_state(name, batch_size)
    Return an array suitable for representing initial state.
log()
    Log some information about this layer.
output_name([name])
    Return a fully-scoped name for the given layer output.
setup()
to_spec()
    Create a specification dictionary for this layer.
transform(inputs)
    Transform inputs to this layer into outputs for the layer.

Attributes
input_size
    For networks with one input, get the input size.
num_params
    Total number of learnable parameters in this layer.
params
    A list of all parameters in this layer.
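As an illustration of the parameter helpers listed above (again assuming the hypothetical model construction from the earlier sketches):

    import theanets

    net = theanets.recurrent.Regressor([8, dict(form='scrn', size=12), 3])
    scrn = net.layers[1]
    print(scrn.num_params)                    # total learnable parameter count
    print(scrn.find('xh').get_value().shape)  # shared variable for the A matrix
    print([p.name for p in scrn.params])      # all parameters in this layer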
transform(inputs)

Transform inputs to this layer into outputs for the layer.

Parameters:

    inputs : dict of Theano expressions
        Symbolic inputs to this layer, given as a dictionary mapping string names to Theano expressions. See base.Layer.connect().

Returns:

    outputs : dict of Theano expressions
        A map from string output names to Theano expressions for the outputs from this layer. This layer type generates a "pre" output that gives the unit activity before applying the layer's activation function, a "hid" output that gives the post-activation values before applying the rate mixing, and an "out" output that gives the overall output.
    updates : sequence of update pairs
        A sequence of updates to apply to this layer's state inside a Theano function.
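The shape of the return value can be pictured with a standalone Theano sketch. This is not the theanets source, just a minimal reconstruction of the state and hidden updates showing how the symbolic outputs and the scan updates pair up; all names and sizes are illustrative:

    import numpy as np
    import theano
    import theano.tensor as TT

    floatX = theano.config.floatX
    rng = np.random.RandomState(3)
    n_in, n_hid, n_state = 8, 12, 4

    def shared(rows, cols, name):
        return theano.shared((rng.randn(rows, cols) * 0.1).astype(floatX), name=name)

    W_xs, W_xh = shared(n_in, n_state, 'xs'), shared(n_in, n_hid, 'xh')
    W_sh, W_hh = shared(n_state, n_hid, 'sh'), shared(n_hid, n_hid, 'hh')
    r_hat = theano.shared(np.zeros(n_state, floatX), name='r')

    x = TT.tensor3('x')  # (time, batch, n_in) input sequence

    def step(x_t, h_prev, s_prev):
        r = TT.nnet.sigmoid(r_hat)  # rates confined to (0, 1)
        s_t = r * TT.dot(x_t, W_xs) + (1 - r) * s_prev
        h_t = TT.nnet.sigmoid(
            TT.dot(x_t, W_xh) + TT.dot(h_prev, W_hh) + TT.dot(s_t, W_sh))
        return h_t, s_t

    (hid, state), updates = theano.scan(
        step, sequences=x,
        outputs_info=[TT.zeros((x.shape[1], n_hid)),
                      TT.zeros((x.shape[1], n_state))])

    outputs = {'hid': hid, 'state': state}  # transform() returns such a map
    f = theano.function([x], outputs['hid'], updates=updates)
    print(f(np.zeros((5, 2, n_in), floatX)).shape)  # (5, 2, 12)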