theanets.layers.recurrent.MUT1

class theanets.layers.recurrent.MUT1(size, inputs, name=None, activation='relu', **kwargs)

“MUT1” evolved recurrent layer.

Notes

This layer is a close cousin of the GRU, which updates the state of the hidden units by linearly interpolating the state from the previous time step with a “target” state. Unlike the GRU, however, this layer omits a dependency on the hidden state for the “rate gate”, and the current input is piped through the tanh function before influencing the target hidden state.

The update equations in this layer are mostly those given by [Joz15], page 7:

\[\begin{aligned}
r_t &= \sigma(x_t W_{xr} + h_{t-1} W_{hr} + b_r) \\
z_t &= \sigma(x_t W_{xz} + b_z) \\
\hat{h}_t &= \tanh\left(\tanh(x_t W_{xh}) + (r_t \odot h_{t-1}) W_{hh} + b_h\right) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t.
\end{aligned}\]

Here, the layer activation is always set to \(\tanh\), and \(\sigma(\cdot)\) is the logistic sigmoid, which ensures that the two gates in the layer are limited to the open interval (0, 1). The symbol \(\odot\) indicates elementwise multiplication.
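The update equations can be traced by hand with a small pure-Python sketch of a single time step for one example. This is an illustration of the equations as written here, not the library's Theano implementation. The helper names (`sigmoid`, `vecmat`, `mut1_step`) and the parameter dictionary `p` are hypothetical, and the returned keys simply mirror the output names documented for this layer.

```python
import math


def sigmoid(v):
    """Elementwise logistic sigmoid of a list of floats."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def vecmat(v, m):
    """Row vector v (length n) times matrix m (n rows, k columns)."""
    return [sum(v[i] * m[i][j] for i in range(len(v)))
            for j in range(len(m[0]))]


def mut1_step(x, h_prev, p):
    """One MUT1 update; p maps parameter names (xr, hr, br, ...) to values."""
    # Reset gate: depends on the input and the previous hidden state.
    r = sigmoid([a + b + c for a, b, c in
                 zip(vecmat(x, p['xr']), vecmat(h_prev, p['hr']), p['br'])])
    # Rate gate: depends on the input only (no hidden-state term).
    z = sigmoid([a + b for a, b in zip(vecmat(x, p['xz']), p['bz'])])
    # Target state: the input term passes through tanh before being added.
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    pre = [math.tanh(a) + b + c for a, b, c in
           zip(vecmat(x, p['xh']), vecmat(gated, p['hh']), p['bh'])]
    hid = [math.tanh(a) for a in pre]
    # Interpolate between the previous state and the target state.
    out = [(1 - zi) * hp + zi * hi for zi, hp, hi in zip(z, h_prev, hid)]
    return {'pre': pre, 'hid': hid, 'rate': z, 'out': out}
```

With every weight and bias set to zero, both gates evaluate to 0.5 and the target state is zero, so the sketch returns half of the previous hidden state. This is a quick sanity check on the interpolation in the last equation.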

Parameters

bh — vector of bias values for each hidden unit
br — vector of reset biases
bz — vector of rate biases
xh — matrix connecting inputs to hidden units
xr — matrix connecting inputs to reset gates
xz — matrix connecting inputs to rate gates
hh — matrix connecting hiddens to hiddens
hr — matrix connecting hiddens to reset gates
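As a rough check on the size of this parameter list, the sketch below counts the learnable values it implies. It assumes each input matrix has shape (input size, layer size) and each hidden matrix has shape (layer size, layer size), as the update equations suggest. The function name `mut1_param_count` is hypothetical, and the result has not been compared against the layer's `num_params` attribute.

```python
def mut1_param_count(input_size, size):
    """Count learnable values implied by the parameter list for one MUT1 layer."""
    biases = 3 * size                      # bh, br, bz
    input_weights = 3 * input_size * size  # xh, xr, xz
    hidden_weights = 2 * size * size       # hh, hr (no hidden-to-rate matrix)
    return biases + input_weights + hidden_weights


# Example: 3 input features feeding a layer of 4 hidden units.
count = mut1_param_count(3, 4)
```

There are only two hidden-to-hidden matrices because the rate gate has no hidden-state term, which is the main structural difference from a GRU noted above.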

Outputs

out — the post-activation state of the layer
pre — the pre-activation state of the layer
hid — the pre-rate-mixing hidden state
rate — the rate values

References

[Joz15]

R. Jozefowicz, W. Zaremba, & I. Sutskever (2015) “An Empirical Exploration of Recurrent Network Architectures.” http://jmlr.org/proceedings/papers/v37/jozefowicz15.pdf

__init__(size, inputs, name=None, activation='relu', **kwargs)

Methods

`__init__`(size, inputs[, name, activation])
`add_bias`(name, size[, mean, std])	Helper method to create a new bias vector.
`add_weights`(name, nin, nout[, mean, std, ...])	Helper method to create a new weight matrix.
`connect`(inputs)	Create Theano variables representing the outputs of this layer.
`find`(key)	Get a shared variable for a parameter by name.
`initial_state`(name, batch_size)	Return an array suitable for representing initial state.
`log`()	Log some information about this layer.
`output_name`([name])	Return a fully-scoped name for the given layer output.
`setup`()
`to_spec`()	Create a specification dictionary for this layer.
`transform`(inputs)	Transform inputs to this layer into outputs for the layer.

Attributes

`input_size`	For networks with one input, get the input size.
`num_params`	Total number of learnable parameters in this layer.
`params`	A list of all parameters in this layer.

transform(inputs)

Transform inputs to this layer into outputs for the layer.

Parameters:

inputs : dict of theano expressions

Symbolic inputs to this layer, given as a dictionary mapping string names to Theano expressions. See base.Layer.connect().

Returns:

outputs : dict of theano expressions

A map from string output names to Theano expressions for the outputs from this layer. This layer type generates a “pre” output that gives the unit activity before applying the layer’s activation function, a “hid” output that gives the post-activation values before applying the rate mixing, a “rate” output that gives the rate values, and an “out” output that gives the overall output.

updates : sequence of update pairs

A sequence of updates to apply to this layer’s state inside a Theano function.