theanets.layers.recurrent.MUT1

class theanets.layers.recurrent.MUT1(h_0=None, **kwargs)

“MUT1” evolved recurrent layer.

Notes

This layer is a close cousin of the GRU, which updates the state of the hidden units by linearly interpolating the state from the previous time step with a “target” state. Unlike the GRU, however, this layer omits a dependency on the hidden state for the “rate gate”, and the current input is piped through the tanh function before influencing the target hidden state.
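
For comparison, a standard GRU formulation, written in the same notation as this layer's update equations, computes the rate gate and the target state roughly as follows. The matrix $$W_{hz}$$ is an assumed name for the GRU's hidden-to-rate weights; it has no counterpart among this layer's parameters.

$\begin{aligned} z_t &= \sigma(x_t W_{xz} + h_{t-1} W_{hz} + b_z) \\ \hat{h}_t &= \tanh\left(x_t W_{xh} + (r_t \odot h_{t-1}) W_{hh} + b_h\right) \end{aligned}$

MUT1 drops the $$h_{t-1} W_{hz}$$ term from the rate gate and wraps $$x_t W_{xh}$$ in an extra $$\tanh$$ inside the target state.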

The update equations in this layer are mostly those given by [Joz15], page 7:

$\begin{aligned} r_t &= \sigma(x_t W_{xr} + h_{t-1} W_{hr} + b_r) \\ z_t &= \sigma(x_t W_{xz} + b_z) \\ \hat{h}_t &= \tanh\left(\tanh(x_t W_{xh}) + (r_t \odot h_{t-1}) W_{hh} + b_h\right) \\ h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t. \end{aligned}$

Here, the layer activation is always set to $$\tanh$$, and $$\sigma(\cdot)$$ is the logistic sigmoid, which ensures that the two gates in the layer are limited to the open interval (0, 1). The symbol $$\odot$$ indicates elementwise multiplication.
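
The update equations can be traced by hand with a minimal plain-Python sketch of a single time step. This is not the theanets implementation, which builds a Theano graph; the helper names (`matvec`, `add`, `mul`, `mut1_step`, `zeros`), the layer sizes, and the all-zero parameter values are hypothetical and chosen only for illustration. The mapping of `pre` to the argument of the outer tanh is likewise an assumption.

```python
import math

def matvec(x, W):
    """Multiply row vector x (length n) by matrix W (n rows, m columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def add(*vectors):
    """Elementwise sum of equally sized vectors."""
    return [sum(items) for items in zip(*vectors)]

def mul(a, b):
    """Elementwise (Hadamard) product."""
    return [p * q for p, q in zip(a, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def tanh(v):
    return [math.tanh(a) for a in v]

def mut1_step(x_t, h_prev, p):
    """One MUT1 time step; p maps parameter names (xr, hr, br, ...) to values."""
    # Reset gate: depends on both the input and the previous hidden state.
    r = sigmoid(add(matvec(x_t, p["xr"]), matvec(h_prev, p["hr"]), p["br"]))
    # Rate gate: depends on the input only (no h_prev term, unlike the GRU).
    z = sigmoid(add(matvec(x_t, p["xz"]), p["bz"]))
    # Target state: the input contribution passes through its own tanh first.
    pre = add(tanh(matvec(x_t, p["xh"])), matvec(mul(r, h_prev), p["hh"]), p["bh"])
    hid = tanh(pre)
    # Interpolate between the previous state and the target state.
    out = add(mul([1.0 - a for a in z], h_prev), mul(z, hid))
    return {"out": out, "pre": pre, "hid": hid, "rate": z}

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

# Hypothetical sizes: 3 inputs, 2 hidden units, every parameter set to zero.
n_in, n_hid = 3, 2
params = {
    "xh": zeros(n_in, n_hid), "xr": zeros(n_in, n_hid), "xz": zeros(n_in, n_hid),
    "hh": zeros(n_hid, n_hid), "hr": zeros(n_hid, n_hid),
    "bh": [0.0] * n_hid, "br": [0.0] * n_hid, "bz": [0.0] * n_hid,
}

h = [1.0, -2.0]  # hypothetical initial hidden state
for x_t in ([0.1, 0.2, 0.3], [0.0, -0.5, 1.0]):
    step = mut1_step(x_t, h, params)
    h = step["out"]
```

With all parameters at zero, both gates evaluate to 0.5 and the target state is 0, so each step simply halves the hidden state. Replacing the zeros with nonzero weights exercises the full equations.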

Parameters

• bh — vector of bias values for each hidden unit
• br — vector of reset biases
• bz — vector of rate biases
• xh — matrix connecting inputs to hidden units
• xr — matrix connecting inputs to reset gates
• xz — matrix connecting inputs to rate gates
• hh — matrix connecting hiddens to hiddens
• hr — matrix connecting hiddens to reset gates

Outputs

• out — the post-activation state of the layer
• pre — the pre-activation state of the layer
• hid — the pre-rate-mixing hidden state
• rate — the rate values

References

[Joz15] R. Jozefowicz, W. Zaremba, & I. Sutskever (2015) “An Empirical Exploration of Recurrent Network Architectures.” http://jmlr.org/proceedings/papers/v37/jozefowicz15.pdf

Methods

• __init__([h_0]): x.__init__(…) initializes x; see help(type(x)) for signature
• add_bias(name, size[, mean, std]): Helper method to create a new bias vector.
• add_weights(name, nin, nout[, mean, std, …]): Helper method to create a new weight matrix.
• bind(graph[, reset, initialize]): Bind this layer into a computation graph.
• connect(inputs): Create Theano variables representing the outputs of this layer.
• find(key): Get a shared variable for a parameter by name.
• full_name(name): Return a fully-scoped name for the given layer output.
• log(): Log some information about this layer.
• log_params(): Log information about this layer’s parameters.
• resolve_inputs(layers): Resolve the names of inputs for this layer into shape tuples.
• resolve_outputs(): Resolve the names of outputs for this layer into shape tuples.
• setup(): Set up the parameters and initial values for this layer.
• to_spec(): Create a specification dictionary for this layer.
• transform(inputs): Transform the inputs for this layer into an output for the layer.

Attributes

• input_name: Name of layer input (for layers with one input).
• input_shape: Shape of layer input (for layers with one input).
• input_size: Size of layer input (for layers with one input).
• output_name: Full name of the default output for this layer.
• output_shape: Shape of default output from this layer.
• output_size: Number of “neurons” in this layer’s default output.
• params: A list of all parameters in this layer.

setup()

Set up the parameters and initial values for this layer.

transform(inputs)

Transform the inputs for this layer into an output for the layer.

Parameters

• inputs (dict of Theano expressions): Symbolic inputs to this layer, given as a dictionary mapping string names to Theano expressions. See Layer.connect().

Returns

• output (Theano expression): The output for this layer is the same as the input.
• updates (list): An empty updates list.