theanets.layers.recurrent.Clockwork

class theanets.layers.recurrent.Clockwork(periods, **kwargs)

    A Clockwork RNN layer updates “modules” of neurons at specific rates.
Parameters:
    periods : sequence of int
        The periods for the modules in this clockwork layer. The number of values in this sequence specifies the number of modules in the layer. The layer size must be an integer multiple of the number of modules given in this sequence.
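The divisibility constraint on the layer size can be illustrated with a small pure-Python helper (a sketch only; `module_size` is a hypothetical name, not part of the theanets API):

```python
def module_size(size, periods):
    """Number of hidden units per module, given the layer size and the
    sequence of clock periods (one module per period)."""
    n_modules = len(periods)
    if size % n_modules != 0:
        raise ValueError('layer size must be an integer multiple '
                         'of the number of modules')
    return size // n_modules

# A 12-unit layer with periods (1, 2, 4) splits into three 4-unit modules.
print(module_size(12, (1, 2, 4)))  # -> 4
```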
Notes
In a vanilla RNN layer, all neurons in the hidden pool are updated at every time step by mixing an affine transformation of the input with an affine transformation of the state of the hidden pool neurons at the previous time step:

\[h_t = g(x_tW_{xh} + h_{t-1}W_{hh} + b_h)\]

In a Clockwork RNN layer, neurons in the hidden pool are split into \(M\) “modules” of equal size (\(h^i\) for \(i = 1, \dots, M\)), each of which has an associated clock period (a positive integer \(T_i\) for \(i = 1, \dots, M\)). The neurons in module \(i\) are updated only when the time index \(t\) of the input \(x_t\) is an integer multiple of \(T_i\). Thus some modules (those with large \(T\)) respond only to “slow” features in the input, and others (those with small \(T\)) respond to “fast” features.
Furthermore, “fast” modules with small periods receive inputs from “slow” modules with large periods, but not vice-versa: this allows the “slow” features to influence the “fast” features, but not the other way around.
The state \(h_t^i\) of module \(i\) at time step \(t\) is thus governed by the following relation:

\[h_t^i = \begin{cases} g\left( x_tW_{xh}^i + b_h^i + \sum_{j=i}^M h_{t-1}^jW_{hh}^j \right) & \text{if } t \bmod T_i = 0 \\ h_{t-1}^i & \text{otherwise.} \end{cases}\]

Here, the \(M\) modules have been ordered such that \(T_i < T_j\) for \(i < j\) – that is, the modules are ordered from “fastest” to “slowest.”
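The update rule above can be sketched in pure Python. This is an illustrative re-implementation of the equation, not the layer's actual Theano code, and the name `clockwork_step` is invented here:

```python
def clockwork_step(t, h_prev, x, W_xh, W_hh, b, periods, g):
    """One clockwork time step.  h_prev is split into len(periods) equal
    modules, ordered fastest (smallest period) to slowest.  Weight
    matrices are indexed [from_unit][to_unit]."""
    size = len(h_prev)
    m = size // len(periods)          # units per module
    h = list(h_prev)                  # modules that do not fire keep their state
    for i, T in enumerate(periods):
        if t % T != 0:
            continue                  # module i is "asleep" at this step
        for u in range(i * m, (i + 1) * m):
            pre = b[u] + sum(x[k] * W_xh[k][u] for k in range(len(x)))
            # recurrence only from module i and slower modules (j >= i)
            pre += sum(h_prev[v] * W_hh[v][u] for v in range(i * m, size))
            h[u] = g(pre)
    return h
```

At a time step \(t\), only the modules whose period divides \(t\) fire; the remaining modules simply copy their previous state forward.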
Note that, unlike in the original paper, the hidden-hidden weight matrix is stored in full (i.e., it is size × size); the module separation is enforced by masking this weight matrix with zeros in the appropriate places. This implementation runs much faster on a GPU than an approach that uses dedicated module parameters.

Parameters

    b — vector of bias values for each hidden unit
    xh — matrix connecting inputs to hidden units
    hh — matrix connecting hiddens to hiddens
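The zero-masking of the full hh matrix described in the notes can be sketched in pure Python. This is an illustration of the idea; `clockwork_mask` is a hypothetical helper, not part of theanets:

```python
def clockwork_mask(size, n_modules):
    """Binary mask for the full size-by-size hh matrix: unit `col` in
    module i may receive input from unit `row` only if row's module j
    satisfies j >= i (same or slower, with fastest modules first)."""
    m = size // n_modules                   # units per module
    module = [u // m for u in range(size)]  # module index of each unit
    return [[1 if module[row] >= module[col] else 0
             for col in range(size)]
            for row in range(size)]

# Two modules of two units each: slow units feed every unit,
# fast units feed only units in their own (fast) module.
for row in clockwork_mask(4, 2):
    print(row)
```

Multiplying this mask elementwise into hh zeroes exactly the connections from “fast” modules to “slow” ones, which is how the block structure of the per-module formulation is recovered from a single dense matrix.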
Outputs

    out — the post-activation state of the layer
    pre — the pre-activation state of the layer
References

[Kou14] J. Koutník, K. Greff, F. Gomez, & J. Schmidhuber. (2014) “A Clockwork RNN.” http://arxiv.org/abs/1402.3511

Methods
__init__(periods, **kwargs)
    x.__init__(…) initializes x; see help(type(x)) for signature.
add_bias(name, size[, mean, std])
    Helper method to create a new bias vector.
add_weights(name, nin, nout[, mean, std, …])
    Helper method to create a new weight matrix.
bind(*args, **kwargs)
    Bind this layer into a computation graph.
connect(inputs)
    Create Theano variables representing the outputs of this layer.
find(key)
    Get a shared variable for a parameter by name.
full_name(name)
    Return a fully-scoped name for the given layer output.
log()
    Log some information about this layer.
log_params()
    Log information about this layer’s parameters.
resolve_inputs(layers)
    Resolve the names of inputs for this layer into shape tuples.
resolve_outputs()
    Resolve the names of outputs for this layer into shape tuples.
setup()
    Set up the parameters and initial values for this layer.
to_spec()
    Create a specification dictionary for this layer.
transform(inputs)
    Transform the inputs for this layer into an output for the layer.

Attributes
input_name
    Name of layer input (for layers with one input).
input_shape
    Shape of layer input (for layers with one input).
input_size
    Size of layer input (for layers with one input).
output_name
    Full name of the default output for this layer.
output_shape
    Shape of default output from this layer.
output_size
    Number of “neurons” in this layer’s default output.
params
    A list of all parameters in this layer.