theanets.layers.recurrent.MRNN
class theanets.layers.recurrent.MRNN(factors=None, **kwargs)
A recurrent network layer with multiplicative dynamics.
Notes
The formulation of MRNN implemented here uses a factored dynamics matrix. To understand the motivation for a factored dynamics, imagine for a moment a vanilla recurrent layer with one binary input, whose hidden dynamics depend on the input, so that \(W_{hh}^0\) is used if the input is 0, and \(W_{hh}^1\) is used if the input is 1:
\[h_t = \sigma(h_{t-1} W_{hh}^{x_t} + x_t W_{xh} + b)\]
This generalizes to the idea that there might be an entire collection of \(W_{hh}^i\) matrices that govern the hidden dynamics of the network, one for each of \(N\) input-dependent modes \(0 \le i < N\). In the general case, however, storing this weight tensor would be prohibitively expensive; in addition, there are probably many shared hidden dynamics that one might want to learn across all of these runtime “modes.”
The MRNN solves this problem by factoring the weight tensor idea into two 2-dimensional arrays. The hidden state is mapped to and from “factor space” by \(W_{hf}\) and \(W_{fh}\), respectively, and the latent factors are modulated by the input using \(W_{xf}\).
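One way to read this factorization, written here as a sketch in the row-vector convention of the equation above (the \(\operatorname{diag}\) notation is introduced for illustration and is not taken from the layer's code), is as an input-dependent dynamics matrix whose only input-dependent part is a diagonal scaling in factor space:

\[W_{hh}^{x_t} \approx W_{hf} \, \operatorname{diag}(x_t W_{xf}) \, W_{fh}\]

Multiplying a row vector by a diagonal matrix scales it elementwise, so \(h_{t-1} W_{hf} \operatorname{diag}(x_t W_{xf}) = x_t W_{xf} \odot h_{t-1} W_{hf}\), where \(\odot\) is the elementwise product.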
The overall hidden activation for the MRNN model, then, looks like:
\[h_t = \sigma((x_t W_{xf} \odot h_{t-1} W_{hf}) W_{fh} + x_t W_{xh} + b)\]
where \(\odot\) represents the elementwise product of two vectors.
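The update above can be sketched for a single time step in plain NumPy. This is an illustration under stated assumptions rather than the layer's Theano implementation: the function name `mrnn_step`, the array shapes, the random weights, and the choice of a logistic \(\sigma\) are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, standing in for the layer's activation (assumed)."""
    return 1.0 / (1.0 + np.exp(-z))


def mrnn_step(x_t, h_prev, W_xf, W_hf, W_fh, W_xh, b):
    """One MRNN update for a batch of row vectors.

    Assumed shapes: x_t (batch, n_in), h_prev (batch, n_hid),
    W_xf (n_in, n_fac), W_hf (n_hid, n_fac), W_fh (n_fac, n_hid),
    W_xh (n_in, n_hid), b (n_hid,).
    """
    f_x = x_t @ W_xf       # input mapped to factor space
    f_h = h_prev @ W_hf    # previous hidden state mapped to factor space
    pre = (f_x * f_h) @ W_fh + x_t @ W_xh + b
    return sigmoid(pre), pre


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
n_in, n_hid, n_fac, batch = 3, 4, 5, 2
W_xf = rng.normal(size=(n_in, n_fac))
W_hf = rng.normal(size=(n_hid, n_fac))
W_fh = rng.normal(size=(n_fac, n_hid))
W_xh = rng.normal(size=(n_in, n_hid))
b = np.zeros(n_hid)

x_t = rng.normal(size=(batch, n_in))
h_prev = rng.normal(size=(batch, n_hid))
h_t, pre = mrnn_step(x_t, h_prev, W_xf, W_hf, W_fh, W_xh, b)
print(h_t.shape)  # (2, 4)
```

The elementwise product never materializes a separate hidden-to-hidden matrix per input; the input only rescales the factor activations before they are mapped back to the hidden units.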
Parameters
b — vector of bias values for each hidden unit
xf — matrix connecting inputs to factors
xh — matrix connecting inputs to hiddens
hf — matrix connecting hiddens to factors
fh — matrix connecting factors to hiddens
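As a rough illustration of why the factored form is cheaper, the parameter list above implies the following count, compared with storing one hidden-to-hidden matrix per mode. The function names, the assumed parameter shapes, and the layer sizes are hypothetical and chosen only for the example.

```python
def mrnn_param_count(n_in, n_hid, n_fac):
    """Learnable values implied by the parameter list (shapes assumed)."""
    return (
        n_hid              # b
        + n_in * n_fac     # xf
        + n_in * n_hid     # xh
        + n_hid * n_fac    # hf
        + n_fac * n_hid    # fh
    )


def full_tensor_param_count(n_modes, n_in, n_hid):
    """Unfactored alternative: one hidden-to-hidden matrix per mode."""
    return n_hid + n_in * n_hid + n_modes * n_hid * n_hid


# Hypothetical sizes: 100 one-hot input symbols, 200 hidden units, 200 factors.
factored = mrnn_param_count(n_in=100, n_hid=200, n_fac=200)
unfactored = full_tensor_param_count(n_modes=100, n_in=100, n_hid=200)
print(factored, unfactored)  # 120200 4020200
```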
Outputs
out — the post-activation state of the layer
pre — the pre-activation state of the layer
factors — the activations of the latent factors
References
[Sut11] I. Sutskever, J. Martens, & G. E. Hinton. (ICML 2011) “Generating text with recurrent neural networks.” http://www.icml-2011.org/papers/524_icmlpaper.pdf
__init__(factors=None, **kwargs)
Methods
__init__([factors])
add_weights(name, nin, nout[, mean, std, ...])    Helper method to create a new weight matrix.
initial_state(name, batch_size)    Return an array suitable for representing initial state.
setup()    Set up the parameters and initial values for this layer.
to_spec()    Create a specification dictionary for this layer.
transform(inputs)    Transform the inputs for this layer into an output for the layer.
Attributes
input_size    For networks with one input, get the input size.
num_params    Total number of learnable parameters in this layer.
params    A list of all parameters in this layer.
setup()
Set up the parameters and initial values for this layer.
to_spec()
Create a specification dictionary for this layer.
Returns:
    spec : dict
        A dictionary specifying the configuration of this layer.
transform(inputs)
Transform the inputs for this layer into an output for the layer.
Parameters:
    inputs : dict of theano expressions
        Symbolic inputs to this layer, given as a dictionary mapping string names to Theano expressions. See base.Layer.connect().
Returns:
    outputs : dict of theano expressions
        A map from string output names to Theano expressions for the outputs from this layer. This layer type generates a “factors” output that gives the activation of the hidden weight factors given the input data (but not incorporating influence from the hidden states), a “pre” output that gives the unit activity before applying the layer’s activation function, and an “out” output that gives the post-activation output.
    updates : list of update pairs
        A sequence of updates to apply inside a Theano function.
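The relationship between the three named outputs can be sketched with a plain NumPy loop over time. This is a stand-in for illustration, not the layer's code: the real method returns symbolic Theano expressions plus an updates list, whereas the hypothetical `mrnn_transform` below returns concrete arrays, assumes a (time, batch, features) layout, and assumes a logistic activation.

```python
import numpy as np


def mrnn_transform(x, params, h0):
    """Unroll the MRNN recurrence over time with a plain Python loop.

    x has assumed shape (time, batch, n_in); h0 has shape (batch, n_hid).
    params is a dict of arrays stored under the names xf, xh, hf, fh, and b.
    """
    h = h0
    factors, pres, outs = [], [], []
    for x_t in x:
        f = x_t @ params["xf"]  # depends on the input only, not on the hidden state
        pre = (f * (h @ params["hf"])) @ params["fh"] + x_t @ params["xh"] + params["b"]
        h = 1.0 / (1.0 + np.exp(-pre))  # logistic activation (assumed)
        factors.append(f)
        pres.append(pre)
        outs.append(h)
    return {"factors": np.stack(factors), "pre": np.stack(pres), "out": np.stack(outs)}


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(1)
T, B, n_in, n_hid, n_fac = 6, 2, 3, 4, 5
params = {
    "xf": rng.normal(size=(n_in, n_fac)),
    "xh": rng.normal(size=(n_in, n_hid)),
    "hf": rng.normal(size=(n_hid, n_fac)),
    "fh": rng.normal(size=(n_fac, n_hid)),
    "b": np.zeros(n_hid),
}
x = rng.normal(size=(T, B, n_in))
outputs = mrnn_transform(x, params, h0=np.zeros((B, n_hid)))
for name, value in outputs.items():
    print(name, value.shape)
```

In this sketch, “factors” has one column per latent factor, while “pre” and “out” have one column per hidden unit.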