Activation Functions

An activation function (sometimes also called a transfer function) specifies how the final output of a layer is computed from the weighted sums of the inputs.

By default, hidden layers in theanets use a rectified linear activation function: \(g(z) = \max(0, z)\).

Output layers in theanets.Regressor and theanets.Autoencoder models use linear activations (i.e., the output is just the weighted sum of the inputs from the previous layer: \(g(z) = z\)), and the output layer in theanets.Classifier models uses a softmax activation: \(g(z) = \exp(z) / \sum\exp(z)\).
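Taken together, these defaults mean that leaving out activation keys is the same as writing them explicitly. The sketch below assumes the tuple syntax described in the next paragraph and simply restates the documented defaults:

import theanets

# These two specifications are equivalent: hidden layers default to relu,
# and the Regressor output layer defaults to a linear activation.
net = theanets.Regressor([10, 20, 3])
net = theanets.Regressor([10, (20, 'relu'), (3, 'linear')])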

To specify a different activation function for a layer, include an activation key chosen from the table below, or create a custom activation. As described in Specifying Layers, the activation key can be given in your model specification either by passing the activation keyword argument in a layer dictionary, or by including the key in a tuple with the layer size:

net = theanets.Regressor([10, (10, 'tanh'), 10])
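The same layer can also be specified with a layer dictionary; this sketch assumes the dictionary takes a size key alongside the activation keyword argument mentioned above:

net = theanets.Regressor([10, dict(size=10, activation='tanh'), 10])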

The activations that theanets provides are:

Key        Description                   \(g(z) =\)
linear     linear                        \(z\)
sigmoid    logistic sigmoid              \((1 + \exp(-z))^{-1}\)
logistic   logistic sigmoid              \((1 + \exp(-z))^{-1}\)
tanh       hyperbolic tangent            \(\tanh(z)\)
softplus   smooth relu approximation     \(\log(1 + \exp(z))\)
softmax    categorical distribution      \(\exp(z) / \sum\exp(z)\)
relu       rectified linear              \(\max(0, z)\)
trel       truncated rectified linear    \(\max(0, \min(1, z))\)
trec       thresholded rectified linear  \(z \mbox{ if } z > 1 \mbox{ else } 0\)
tlin       thresholded linear            \(z \mbox{ if } |z| > 1 \mbox{ else } 0\)
rect:min   truncation                    \(\min(1, z)\)
rect:max   rectification                 \(\max(0, z)\)
norm:mean  mean-normalization            \(z - \bar{z}\)
norm:max   max-normalization             \(z / \max |z|\)
norm:std   variance-normalization        \(z / \mathbb{E}[(z-\bar{z})^2]\)
norm:z     z-score normalization         \((z-\bar{z}) / \mathbb{E}[(z-\bar{z})^2]\)
prelu      relu with parametric leak     \(\max(0, z) - \max(0, -rz)\)
lgrelu     relu with leak and gain       \(\max(0, gz) - \max(0, -rz)\)
maxout     piecewise linear              \(\max_i m_i z\)
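For intuition only, here are element-wise NumPy analogues of a few of these functions; theanets itself builds its activations as Theano expressions rather than NumPy code:

import numpy as np

z = np.array([-2.0, -0.5, 0.5, 2.0])

relu = np.maximum(0, z)               # max(0, z)
softplus = np.log1p(np.exp(z))        # log(1 + exp(z))
trel = np.clip(z, 0, 1)               # max(0, min(1, z))
tlin = np.where(np.abs(z) > 1, z, 0)  # z if |z| > 1 else 0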

Composition

Activation functions can also be composed by joining multiple function names together with a +. For example, to create a layer that uses a batch-normalized hyperbolic tangent activation:

net = theanets.Regressor([10, (10, 'tanh+norm:z'), 10])

As with mathematical function composition, the order of the components matters. Unlike mathematical composition notation, however, the functions are applied from left to right: in 'tanh+norm:z', the hyperbolic tangent is computed first and its output is then z-score normalized.
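A NumPy sketch of the left-to-right convention, using the simpler composition 'tanh+norm:mean' from the table above (this illustrates only the ordering, not the library's implementation):

import numpy as np

z = np.array([-2.0, -0.5, 0.5, 2.0])  # pre-activation layer outputs

# 'tanh+norm:mean': apply tanh first, then subtract the mean of the result.
a = np.tanh(z)
a = a - a.mean()

# 'norm:mean+tanh': subtract the mean first, then apply tanh.
b = np.tanh(z - z.mean())

# In general a != b, which is why the order of the components matters.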

Custom Activations

To define a custom activation, create a subclass of theanets.Activation and implement its __call__ method so that instances of the class are callable. The callable will be given one argument: the array of layer outputs to activate.
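As a minimal sketch, the class below squares the layer outputs; the class name and the squaring function are invented for illustration, and x is the symbolic array of layer outputs that is passed to __call__:

import theanets

class Square(theanets.Activation):
    '''Hypothetical activation computing g(z) = z ** 2.'''

    def __call__(self, x):
        # x is the array of layer outputs to activate; return the
        # activated values element-wise.
        return x * x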