class theanets.layers.recurrent.GRU(size, inputs, name=None, activation='relu', **kwargs)

Gated Recurrent Unit layer.


The Gated Recurrent Unit lies somewhere between the LSTM and the RRNN in complexity. Like the RRNN, its hidden state is updated at each time step to be a linear interpolation between the previous hidden state, \(h_{t-1}\), and the “target” hidden state, \(\hat{h}_t\). The interpolation is modulated by an “update gate” that serves the same purpose as the rate gates in the RRNN. Like the LSTM, the target hidden state can also be reset using a dedicated gate. All gates in this layer are activated based on the current input as well as the previous hidden state.

The update equations in this layer are largely those given by [Chu14], page 4, except for the addition of a hidden bias term. They are:

\[\begin{aligned} r_t &= \sigma(x_t W_{xr} + h_{t-1} W_{hr} + b_r) \\ z_t &= \sigma(x_t W_{xz} + h_{t-1} W_{hz} + b_z) \\ \hat{h}_t &= g\left(x_t W_{xh} + (r_t \odot h_{t-1}) W_{hh} + b_h\right) \\ h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t. \end{aligned}\]

Here, \(g(\cdot)\) is the activation function for the layer, and \(\sigma(\cdot)\) is the logistic sigmoid, which ensures that the two gates in the layer are limited to the open interval (0, 1). The symbol \(\odot\) indicates elementwise multiplication.
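The four update equations can be sketched as a single time step in plain NumPy. This is only an illustration of the arithmetic, not the layer's Theano implementation: the function names `sigmoid`, `relu`, and `gru_step`, and the convention of passing the weights as a dictionary `p` keyed by the parameter names, are assumptions made for this example.

```python
import numpy as np


def sigmoid(a):
    """Logistic sigmoid, mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))


def relu(a):
    """Rectified linear activation, standing in for g."""
    return np.maximum(a, 0.0)


def gru_step(x_t, h_prev, p, g=relu):
    """One GRU update (hypothetical helper, not part of theanets).

    x_t    : (batch, nin) input at time t
    h_prev : (batch, size) previous hidden state h_{t-1}
    p      : dict of arrays keyed 'xr', 'hr', 'br', 'xz', 'hz', 'bz',
             'xh', 'hh', 'bh'
    """
    # Reset gate r_t and update gate z_t both see the input and h_{t-1}.
    r_t = sigmoid(x_t @ p['xr'] + h_prev @ p['hr'] + p['br'])
    z_t = sigmoid(x_t @ p['xz'] + h_prev @ p['hz'] + p['bz'])
    # Target state: the reset gate scales h_{t-1} before the hh product.
    h_hat = g(x_t @ p['xh'] + (r_t * h_prev) @ p['hh'] + p['bh'])
    # Elementwise interpolation between the old state and the target.
    h_t = (1.0 - z_t) * h_prev + z_t * h_hat
    return h_t, z_t, h_hat
```

With all weights and biases set to zero, both gates evaluate to 0.5 and the target state is zero, so the new state is exactly half of the previous one. This makes a convenient sanity check of the interpolation.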


Parameters

  • bh — vector of bias values for each hidden unit
  • br — vector of reset gate biases
  • bz — vector of rate (update gate) biases
  • xh — matrix connecting inputs to hidden units
  • xr — matrix connecting inputs to reset gates
  • xz — matrix connecting inputs to rate (update) gates
  • hh — matrix connecting hiddens to hiddens
  • hr — matrix connecting hiddens to reset gates
  • hz — matrix connecting hiddens to rate (update) gates
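As a rough guide to how these nine parameters scale, the sketch below derives their shapes from the update equations, assuming inputs of width `nin` and a layer of `size` hidden units. The helpers `gru_param_shapes` and `count_params` are hypothetical and written only for this example. The shapes are inferred from the equations rather than read from the library, so treat the resulting count as an expectation, not a guarantee about `num_params`.

```python
def gru_param_shapes(nin, size):
    """Shapes implied by the update equations (an assumption, not the API)."""
    shapes = {}
    for gate in ('h', 'r', 'z'):
        shapes['x' + gate] = (nin, size)   # input-to-gate matrix
        shapes['h' + gate] = (size, size)  # hidden-to-gate matrix
        shapes['b' + gate] = (size,)       # bias vector
    return shapes


def count_params(shapes):
    """Total number of scalar values across all parameter arrays."""
    total = 0
    for shape in shapes.values():
        n = 1
        for dim in shape:
            n *= dim
        total += n
    return total
```

Under these assumptions the total is `3 * size * (nin + size + 1)`: three gate-like groups, each with one input matrix, one recurrent matrix, and one bias vector.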


Outputs

  • out — the post-activation state of the layer, \(h_t\)
  • pre — the pre-activation state of the layer (the argument of \(g\))
  • hid — the pre-rate-mixing hidden state, \(\hat{h}_t\)
  • rate — the rate (update gate) values, \(z_t\)


[Chu14] J. Chung, C. Gulcehre, K. H. Cho, & Y. Bengio (2014), “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling”
__init__(size, inputs, name=None, activation='relu', **kwargs)


__init__(size, inputs[, name, activation])
add_bias(name, size[, mean, std]) Helper method to create a new bias vector.
add_weights(name, nin, nout[, mean, std, ...]) Helper method to create a new weight matrix.
connect(inputs) Create Theano variables representing the outputs of this layer.
find(key) Get a shared variable for a parameter by name.
initial_state(name, batch_size) Return an array suitable for representing initial state.
log() Log some information about this layer.
output_name([name]) Return a fully-scoped name for the given layer output.
to_spec() Create a specification dictionary for this layer.
transform(inputs) Transform inputs to this layer into outputs for the layer.


input_size For networks with one input, get the input size.
num_params Total number of learnable parameters in this layer.
params A list of all parameters in this layer.

transform(inputs)

Transform inputs to this layer into outputs for the layer.


inputs : dict of theano expressions

Symbolic inputs to this layer, given as a dictionary mapping string names to Theano expressions. See base.Layer.connect().


outputs : dict of theano expressions

A map from string output names to Theano expressions for the outputs from this layer. This layer type generates a “pre” output that gives the unit activity before applying the layer’s activation function, a “hid” output that gives the post-activation values before applying the rate mixing, a “rate” output that gives the update gate values, and an “out” output that gives the overall output.

updates : sequence of update pairs

A sequence of updates to apply to this layer’s state inside a theano function.
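To make the relationship between the named outputs concrete, here is a NumPy sketch that runs the recurrence over a whole sequence and collects a dictionary of outputs. It is not the Theano graph that the layer builds. The function `gru_transform`, the `(time, batch, features)` input layout, the all-zero initial state, and the use of a rectifier for the activation are all assumptions made for the example.

```python
import numpy as np


def sigmoid(a):
    """Logistic sigmoid, mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))


def gru_transform(x, p, g=lambda a: np.maximum(a, 0.0)):
    """Run a GRU over a sequence (hypothetical helper, not theanets code).

    x : (time, batch, nin) array; this layout is assumed for the sketch.
    p : dict of arrays keyed by the parameter names 'xh', 'hh', 'bh', etc.
    Returns a dict of (time, batch, size) arrays.
    """
    size = p['bh'].shape[0]
    h = np.zeros((x.shape[1], size))  # assumed all-zero initial state
    names = ('pre', 'hid', 'rate', 'out')
    outs = {name: [] for name in names}
    for x_t in x:
        r = sigmoid(x_t @ p['xr'] + h @ p['hr'] + p['br'])
        z = sigmoid(x_t @ p['xz'] + h @ p['hz'] + p['bz'])
        pre = x_t @ p['xh'] + (r * h) @ p['hh'] + p['bh']
        hid = g(pre)                  # target state before rate mixing
        h = (1.0 - z) * h + z * hid   # overall output for this step
        for name, value in zip(names, (pre, hid, z, h)):
            outs[name].append(value)
    return {name: np.stack(values) for name, values in outs.items()}
```

In this sketch "hid" is always the activation applied to "pre", and "out" at each step is the "rate"-weighted mix of the previous "out" and the current "hid".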