class theanets.activations.Maxout(*args, **kwargs)

Arbitrary piecewise linear activation.

This activation is unusual in that it requires a parameter at initialization time: the number of linear pieces to use. Consider, for the moment, a layer with just one unit. A maxout activation with \(k\) pieces uses a slope \(m_k\) and an intercept \(b_k\) for each linear piece, and transforms the input to the maximum over all of the pieces:

\[f(x) = \max_k \left( m_k x + b_k \right)\]

The parameters \(m_k\) and \(b_k\) are learnable.

For layers with more than one unit, the maxout activation allocates a slope \(m_{ki}\) and an intercept \(b_{ki}\) for each unit \(i\) and each piece \(k\). The activation for unit \(i\), with input \(x_i\), is:

\[f(x_i) = \max_k \left( m_{ki} x_i + b_{ki} \right)\]

Again, the slope and intercept parameters are learnable.
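The per-unit transformation above can be sketched in a few lines of NumPy. This is an illustrative forward pass only, not theanets' internal implementation; the function name and array layout are assumptions made for this example:

```python
import numpy as np

def maxout(x, slopes, intercepts):
    """Maxout forward pass: elementwise max over k linear pieces.

    x          : shape (n,)    input, one value per unit
    slopes     : shape (k, n)  learnable slopes m_ki
    intercepts : shape (k, n)  learnable intercepts b_ki
    """
    # Broadcasting computes m_ki * x_i + b_ki for every piece k and
    # unit i; the max over axis 0 selects the best piece per unit.
    return np.max(slopes * x + intercepts, axis=0)

# Two units, two pieces with slopes (0, 1) and zero intercepts:
# this reduces maxout to max(0, x), i.e. a rectified linear unit.
x = np.array([-1.0, 2.0])
slopes = np.array([[0.0, 0.0],
                   [1.0, 1.0]])
intercepts = np.zeros((2, 2))
print(maxout(x, slopes, intercepts))  # [0. 2.]
```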

This activation is a generalization of the rectified linear activations; to see how, allocate two pieces and set both intercepts to 0. The slopes of the relu activation are then given by \(m = (0, 1)\), those of the Prelu activation by \(m = (r, 1)\), and those of the LGrelu activation by \(m = (r, g)\), where \(r\) is the leak rate parameter and \(g\) is a gain parameter.
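These special cases can be checked numerically. The sketch below reuses a small stand-in `maxout` function (an assumption for this example, not theanets code) with two pieces and zero intercepts:

```python
import numpy as np

def maxout(x, slopes, intercepts):
    # Max over k pieces of m_ki * x_i + b_ki.
    return np.max(slopes * x + intercepts, axis=0)

x = np.array([-2.0, 3.0])
zeros = np.zeros((2, 2))

# relu: slopes m = (0, 1) give max(0, x).
relu = maxout(x, np.array([[0.0, 0.0], [1.0, 1.0]]), zeros)

# Prelu with leak rate r = 0.1: slopes m = (r, 1) give max(r*x, x).
prelu = maxout(x, np.array([[0.1, 0.1], [1.0, 1.0]]), zeros)

# LGrelu with r = 0.1, gain g = 0.5: slopes m = (r, g) give max(r*x, g*x).
lgrelu = maxout(x, np.array([[0.1, 0.1], [0.5, 0.5]]), zeros)

print(relu)    # [0. 3.]
print(prelu)   # max(0.1*x, x) per unit
print(lgrelu)  # max(0.1*x, 0.5*x) per unit
```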


To use this activation in a network layer specification, provide an activation string of the form 'maxout:k', where k is an integer giving the number of linear pieces.

For example, the layer tuple (100, 'rnn', 'maxout:10') specifies a vanilla RNN layer with 100 units and a maxout activation with 10 pieces.
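The 'maxout:k' string splits on the colon into an activation name and a piece count. The helper below is hypothetical, written only to illustrate the format; theanets performs this parsing internally when it builds a layer:

```python
def parse_activation(spec):
    """Split an activation string like 'maxout:10' into (name, pieces).

    Hypothetical helper for illustration; returns None for the piece
    count when the string carries no ':k' suffix.
    """
    name, _, arg = spec.partition(':')
    return name, (int(arg) if arg else None)

print(parse_activation('maxout:10'))  # ('maxout', 10)
print(parse_activation('relu'))       # ('relu', None)
```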


pieces : int

Number of linear pieces to use in the activation.

__init__(*args, **kwargs)