theanets.activations.Maxout

class theanets.activations.Maxout(*args, **kwargs)
Arbitrary piecewise linear activation.
This activation is unusual in that it requires a parameter at initialization time: the number of linear pieces to use. Consider, for the moment, a layer with just one unit. A maxout activation with \(k\) pieces uses a slope \(m_k\) and an intercept \(b_k\) for each linear piece. It then maps the input to the maximum over all of the pieces:
\[f(x) = \max_k \left( m_k x + b_k \right)\]
The parameters \(m_k\) and \(b_k\) are learnable.
For layers with more than one unit, the maxout activation allocates a slope \(m_{ki}\) and intercept \(b_{ki}\) for each unit \(i\) and each piece \(k\). The activation for unit \(x_i\) is:
\[f(x_i) = \max_k \left( m_{ki} x_i + b_{ki} \right)\]
Again, the slope and intercept parameters are learnable.
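The per-unit formula above can be sketched in plain NumPy. This is a minimal illustration, not the theanets implementation: the function name `maxout` and the array layout (pieces along the first axis of the parameter arrays) are assumptions chosen for readability.

```python
import numpy as np

def maxout(x, slopes, intercepts):
    """Per-unit maxout (illustrative sketch, not the theanets code).

    x:          array of shape (batch, units)
    slopes:     array of shape (pieces, units), the m_{ki}
    intercepts: array of shape (pieces, units), the b_{ki}
    """
    # Broadcast to (batch, pieces, units): one linear piece per (k, i) pair.
    lines = slopes[None, :, :] * x[:, None, :] + intercepts[None, :, :]
    # For every unit, keep the largest of its linear pieces.
    return lines.max(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))   # batch of 4 inputs, layer with 3 units
m = rng.normal(size=(5, 3))   # 5 pieces per unit
b = rng.normal(size=(5, 3))
y = maxout(x, m, b)
print(y.shape)
```

Each unit only sees its own input \(x_i\), so the output has the same shape as the input; the number of pieces changes the number of parameters, not the layer width.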
This activation is actually a generalization of the rectified linear activations; to see how, just allocate 2 pieces and set the intercepts to 0. The slopes of the relu activation are given by \(m = (0, 1)\), those of the Prelu function are given by \(m = (r, 1)\), and those of the LGrelu are given by \(m = (r, g)\), where \(r\) is the leak rate parameter and \(g\) is a gain parameter.
Note
To use this activation in a network layer specification, provide an activation string of the form 'maxout:k', where k is an integer giving the number of linear pieces. For example, the layer tuple (100, 'rnn', 'maxout:10') specifies a vanilla RNN layer with 100 units and a maxout activation with 10 pieces.
Parameters:
pieces : int
Number of linear pieces to use in the activation.
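The reduction to the rectified linear activations described earlier (2 pieces, zero intercepts) can be checked numerically. The sketch below is an illustration under stated assumptions: the reference functions are written directly from the slope pairs given in the text rather than taken from theanets, the helper name `maxout_two_pieces` is hypothetical, and the values of `r` and `g` are arbitrary. The equivalence relies on \(r < 1\) for the Prelu case and \(r < g\) for the LGrelu case.

```python
import numpy as np

def maxout_two_pieces(x, slopes):
    """Two-piece maxout with zero intercepts: max(m_0 * x, m_1 * x)."""
    return np.maximum(slopes[0] * x, slopes[1] * x)

x = np.linspace(-2.0, 2.0, 9)
r, g = 0.1, 1.5  # illustrative leak rate and gain, chosen so that r < 1 and r < g

# Reference definitions written from the slope pairs in the text (assumed forms).
relu = np.maximum(x, 0.0)
prelu = np.where(x > 0, x, r * x)
lgrelu = np.where(x > 0, g * x, r * x)

print(np.allclose(maxout_two_pieces(x, (0.0, 1.0)), relu))
print(np.allclose(maxout_two_pieces(x, (r, 1.0)), prelu))
print(np.allclose(maxout_two_pieces(x, (r, g)), lgrelu))
```

For negative inputs the smaller slope gives the larger value, and for positive inputs the larger slope does, which is why taking the maximum selects the leak slope on the left and the gain slope on the right.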
__init__(*args, **kwargs)
Methods
__init__(*args, **kwargs)