Reference

Computation graphs

This module contains a base class for modeling computation graphs.

Neural networks are really just a concise, computational way of describing a mathematical model, expressed as a computation graph, that operates on a particular set of data.

At a high level, a neural network is a computation graph that describes a parametric mapping

\[F_\theta: \mathcal{S} \to \mathcal{T}\]

between a source space \(\mathcal{S}\) and a target space \(\mathcal{T}\), using parameters \(\theta\). For example, suppose we are processing vectors representing the MNIST handwritten digits. We could think of \(\mathcal{S} = \mathbb{R}^{28 \times 28} = \mathbb{R}^{784}\) (i.e., the space of all 28×28 images), and for classifying the MNIST digits we could think of \(\mathcal{T} = \mathbb{R}^{10}\).
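
To make the shapes concrete, here is a minimal numpy sketch of one such parametric mapping from \(\mathbb{R}^{784}\) to \(\mathbb{R}^{10}\); the parameter layout and names are purely illustrative and are not part of this library's API.

import numpy as np

# A toy parametric mapping F_theta : R^784 -> R^10 (illustrative only).
rng = np.random.default_rng(0)
theta = {
    'W': rng.normal(scale=0.01, size=(784, 10)),  # weights
    'b': np.zeros(10),                            # biases
}

def F(theta, x):
    """Map a flattened 28x28 image to a distribution over the 10 digits."""
    scores = x @ theta['W'] + theta['b']
    scores -= scores.max()          # for numerical stability
    e = np.exp(scores)
    return e / e.sum()              # softmax over the 10 classes

x = rng.random(784)                 # stand-in for a flattened MNIST image
print(F(theta, x).shape)            # (10,)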

This mapping is assumed to be fairly complex. If it were not – if you could capture the mapping using a simple expression like \(F_{\{a\}}(x) = ax^2\) – then we would just use the expression directly and not need to deal with an entire network. Since the mapping is complex, we do two things to make our problem tractable. First, we assume some structure for \(F_\theta\). Second, we fit our model to some set of data that we have obtained, so that our parameters \(\theta\) are tuned to the problem at hand.
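
As a worked illustration of fitting \(\theta\) to data, the quadratic example above is simple enough to fit directly: the sketch below estimates the single parameter \(a\) of \(F_{\{a\}}(x) = ax^2\) by least squares on synthetic data.

import numpy as np

# Fit F_a(x) = a * x**2 to noisy observations by least squares.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=100)
y = 3.0 * x**2 + rng.normal(scale=0.1, size=100)   # data generated with a = 3

# Minimizing sum_i (y_i - a * x_i**2)**2 over a has a closed-form solution.
a_hat = (x**2 @ y) / (x**2 @ x**2)
print(round(a_hat, 2))                             # close to 3.0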

Graph structure

The mapping \(F_\theta\) is implemented in neural networks by assuming a specific, layered form. Computation nodes – also called units or (sometimes) neurons – are arranged in a \((k+1)\)-partite graph, with layer \(i\) containing \(n_i\) nodes. The number of input nodes in the graph is referred to as \(n_0\).

Most layers are connected together using a set of weights. A weight matrix \(W^k \in \mathbb{R}^{n_{k-1} \times n_k}\) specifies the strength of the connection between nodes in layer \(k\) and those in layer \(k-1\) – all other pairs of nodes are typically not connected. Each layer of nodes also typically has a bias vector \(b^k \in \mathbb{R}^{n_k}\) that determines the offset of each node from the origin. Together, the parameters \(\theta\) of the model are these \(k\) weight matrices and \(k\) bias vectors (there are no weights or biases for the input nodes in the graph).
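
To make the bookkeeping concrete, the sketch below builds the \(k\) weight matrices and \(k\) bias vectors for layer sizes \(n_0 = 784\), \(n_1 = 100\), \(n_2 = 10\) in plain numpy; the initialization scheme is only an example.

import numpy as np

sizes = [784, 100, 10]              # n_0, n_1, ..., n_k (here k = 2)
rng = np.random.default_rng(0)

weights = []                        # W^k has shape (n_{k-1}, n_k)
biases = []                         # b^k has shape (n_k,)
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    weights.append(rng.normal(scale=1 / np.sqrt(n_in), size=(n_in, n_out)))
    biases.append(np.zeros(n_out))

# theta is the collection of all weight matrices and bias vectors;
# the input layer contributes no parameters.
print([W.shape for W in weights])   # [(784, 100), (100, 10)]
print([b.shape for b in biases])    # [(100,), (10,)]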

Network(layers[, weighted, sparse_input]) The network class encapsulates a network computation graph.

Feedforward networks

Autoencoder(layers[, weighted, sparse_input]) An autoencoder attempts to reproduce its input.
Classifier(layers[, weighted, sparse_input]) A classifier attempts to match a 1-hot target output.
Regressor(layers[, weighted, sparse_input]) A regression model attempts to produce a target output.
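
Based only on the constructor signatures listed above, creating one of these models might look like the sketch below. The import path is hypothetical, and the keyword usage is an assumption inferred from the signatures rather than verified behavior.

# Hypothetical import; substitute the module that actually provides these classes.
from mypackage.feedforward import Autoencoder, Classifier, Regressor

# Each constructor takes a sequence of layer sizes (input, hidden..., output).
net = Classifier(layers=[784, 100, 10])   # 10-way classifier for 784-dim inputs
ae = Autoencoder(layers=[784, 64, 784])   # tries to reproduce its own input
reg = Regressor(layers=[13, 32, 1])       # maps 13 features to a scalar target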

Recurrent networks

This module contains recurrent network structures.

Autoencoder(layers[, weighted, sparse_input]) An autoencoder network attempts to reproduce its input.
Classifier(layers[, weighted, sparse_input]) A classifier attempts to match a 1-hot target output.
Predictor(layers[, weighted, sparse_input]) A predictor network attempts to predict its next time step.
Regressor(layers[, weighted, sparse_input]) A regressor attempts to produce a target output.

Recurrent helpers

batches(samples[, labels, steps, ...]) Return a callable that generates samples from a dataset.
Text(text[, alpha, min_count, unknown]) A class for handling sequential text data.

Layer types

This module contains classes for different types of network layers.

In a standard feedforward neural network layer, each node \(i\) in layer \(k\) receives inputs from all nodes in layer \(k-1\), then transforms the weighted sum of these inputs:

\[z_i^k = \sigma\left( b_i^k + \sum_{j=1}^{n_{k-1}} w^k_{ji} z_j^{k-1} \right)\]

where \(\sigma: \mathbb{R} \to \mathbb{R}\) is an activation function.
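
In vectorized form, treating \(z^{k-1}\) as a row vector, this is \(z^k = \sigma(b^k + z^{k-1} W^k)\). Here is a minimal numpy sketch of a single layer's transform, using the logistic activation as an example:

import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

def feedforward_layer(z_prev, W, b, sigma=logistic):
    """Compute z^k = sigma(b^k + z^{k-1} @ W^k) for one layer."""
    return sigma(b + z_prev @ W)

rng = np.random.default_rng(0)
z0 = rng.random(784)                          # activations of layer k-1
W1 = rng.normal(scale=0.05, size=(784, 100))  # W^k, shape (n_{k-1}, n_k)
b1 = np.zeros(100)                            # b^k, shape (n_k,)
print(feedforward_layer(z0, W1, b1).shape)    # (100,)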

In addition to standard feedforward layers, several other types of layers are commonly used; these are described in the sections below.

build(layer, *args, **kwargs) Construct a layer by name.
Layer(size, inputs[, name, activation]) Layers in network graphs derive from this base class.

Feedforward layers

Feedforward layers for neural network computation graphs.

Classifier(**kwargs) A classifier layer performs a softmax over a linear input transform.
Feedforward(size, inputs[, name, activation]) A feedforward neural network layer performs a transform of its input.
Input(**kwargs) The input of a network is a special type of layer with no parameters.
Tied(partner, **kwargs) A tied-weights feedforward layer shadows weights from another layer.

Convolution layers

Convolutional layers for neural network computation graphs.

Convolutional layers are characterized by computations that convolve a set of learnable filters over their input arrays, applying the same filter weights at every position in the input; a generic sketch of the operation follows the table below.

Convolution(filter_shape[, stride, border_mode]) Convolution layers convolve filters over the input arrays.
Conv1(filter_size[, stride, border_mode]) 1-dimensional convolutions run over one data axis.
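
As a generic illustration of the operation (not this library's implementation), the sketch below slides a bank of 1-dimensional filters over the single data axis of an input array, with "valid" border handling:

import numpy as np

def conv1d_valid(x, filters):
    """Slide each filter along the single data axis of x.

    x       : array of shape (steps, n_in)
    filters : array of shape (filter_size, n_in, n_out)
    returns : array of shape (steps - filter_size + 1, n_out)
    """
    size, n_in, n_out = filters.shape
    steps = x.shape[0] - size + 1
    out = np.zeros((steps, n_out))
    for t in range(steps):
        # Weighted sum over the filter window and the input channels.
        out[t] = np.einsum('fi,fio->o', x[t:t + size], filters)
    return out

rng = np.random.default_rng(0)
x = rng.random((50, 8))                 # 50 positions, 8 input channels
filters = rng.normal(size=(5, 8, 16))   # filter_size=5, 16 output channels
print(conv1d_valid(x, filters).shape)   # (46, 16)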

Recurrent layers

Recurrent layers for neural network computation graphs.

Recurrent layers are defined by the presence of explicitly modeled time dependencies in the computation graph: the output of a recurrent layer at one time step depends on its own output at previous time steps. The sketch after the table below makes this recurrence concrete.

Recurrent(**kwargs) A recurrent network layer incorporates some dependency on past values.
RNN(**kwargs) Standard recurrent network layer.
ARRNN(**kwargs) An adaptive-rate RNN defines per-hidden-unit accumulation rates.
LRRNN(**kwargs) A learned-rate RNN defines per-hidden-unit accumulation rates.
GRU(**kwargs) Gated Recurrent Unit layer.
LSTM(**kwargs) Long Short-Term Memory (LSTM) layer.
Clockwork(periods, **kwargs) A Clockwork RNN layer updates “modules” of neurons at specific rates.
MRNN([factors]) Define a recurrent network layer using multiplicative dynamics.
Bidirectional([worker]) A bidirectional recurrent layer runs worker models forward and backward.
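
The sketch below shows the textbook form of this recurrence for a plain recurrent layer (generic numpy, not the exact parametrization of any class above): the hidden state at step \(t\) depends on the input at step \(t\) and on the hidden state at step \(t-1\).

import numpy as np

def rnn_forward(x, W_x, W_h, b, h0=None):
    """h_t = tanh(x_t @ W_x + h_{t-1} @ W_h + b) for t = 1..T."""
    steps = x.shape[0]
    n_hidden = W_h.shape[0]
    h = np.zeros(n_hidden) if h0 is None else h0
    outputs = np.zeros((steps, n_hidden))
    for t in range(steps):
        h = np.tanh(x[t] @ W_x + h @ W_h + b)   # dependency on the previous step
        outputs[t] = h
    return outputs

rng = np.random.default_rng(0)
x = rng.random((20, 8))                          # 20 time steps, 8 features
W_x = rng.normal(scale=0.1, size=(8, 32))
W_h = rng.normal(scale=0.1, size=(32, 32))
b = np.zeros(32)
print(rnn_forward(x, W_x, W_h, b).shape)         # (20, 32)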

Activations

Activation functions for network layers.

Activation functions are normally constructed using the build() function. Commonly available functions are:

  • “linear”
  • “logistic” (or “sigmoid”)
  • “tanh”
  • “softmax” (typically used for classifier output layers)
  • “relu” (or “rect:max”)
  • “rect:min”
  • “rect:minmax”
  • “softplus” (continuous approximation of “relu”)
  • “norm:mean”: subtractive (mean) batch normalization
  • “norm:max”: divisive (max) batch normalization
  • “norm:std”: divisive (standard deviation) batch normalization
  • “norm:z”: z-score batch normalization

Additionally, the names of all classes defined in this module can be used as keys when building an activation function.
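
For reference, here are generic numpy definitions of a few of the named functions; the library's implementations may differ in details such as the normalization axis or broadcasting behavior.

import numpy as np

def linear(a):                          # "linear"
    return a

def logistic(a):                        # "logistic" / "sigmoid"
    return 1.0 / (1.0 + np.exp(-a))

def relu(a):                            # "relu" / "rect:max"
    return np.maximum(0.0, a)

def softplus(a):                        # smooth approximation of relu
    return np.log1p(np.exp(a))

def softmax(a):                         # typical classifier output activation
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

print(softmax(np.array([1.0, 2.0, 3.0])).round(3))   # [0.09  0.245 0.665]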

build(name, layer, **kwargs) Construct an activation function by name.
Activation(name, layer, **kwargs) An activation function for a neural network layer.
Prelu(*args, **kwargs) Parametric rectified linear activation with learnable leak rate.
LGrelu(*args, **kwargs) Rectified linear activation with learnable leak rate and gain.
Maxout(*args, **kwargs) Arbitrary piecewise linear activation.

Training strategies

This module contains optimization methods for neural networks.

Many of the optimization methods here are general-purpose routines that happen to work quite well for training neural networks; these are provided by downhill. The other methods — SampleTrainer, SupervisedPretrainer, and UnsupervisedPretrainer — are more specific to neural networks, often taking advantage of the layered structure of many common network architectures. A generic illustration of the gradient-based optimization loop follows the table below.

DownhillTrainer(algo, network) Wrapper for using trainers from downhill.
SampleTrainer(network) This trainer replaces network weights with samples from the input.
SupervisedPretrainer(algo, network) This trainer adapts parameters using a supervised pretraining approach.
UnsupervisedPretrainer(algo, network) Train a classification model using an unsupervised pre-training step.
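
The sketch below is a generic example of the kind of gradient-based optimization loop these trainers wrap, applied to a toy softmax classifier in plain numpy; it does not reproduce the behavior of any trainer listed above.

import numpy as np

# A generic gradient-descent loop for a toy softmax classifier (784 -> 10).
rng = np.random.default_rng(0)
X = rng.random((256, 784))                      # stand-in inputs
y = rng.integers(0, 10, size=256)               # stand-in integer labels
W = np.zeros((784, 10))
b = np.zeros(10)

def softmax(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

for step in range(100):
    p = softmax(X @ W + b)                                  # forward pass
    loss = -np.log(p[np.arange(len(y)), y]).mean()          # cross-entropy
    p[np.arange(len(y)), y] -= 1.0                          # d(loss)/d(logits)
    W -= 0.5 * (X.T @ p) / len(y)                           # gradient step
    b -= 0.5 * p.mean(axis=0)

print(round(loss, 3))   # the loss decreases over the course of the loop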

Drivers

This module contains some glue code encapsulating a “main” process.

The code here is aimed at wrapping the most common tasks involved in creating and, especially, training a neural network model.

Experiment(network, *args, **kwargs) This class encapsulates tasks for training and evaluating a network.
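
Based only on the signature above, driving a model through Experiment might look like the following. The import path is hypothetical, and the assumption that extra keyword arguments are forwarded to the network constructor is exactly that, an assumption.

# Hypothetical import; substitute the package's actual module layout.
from mypackage import Experiment
from mypackage.feedforward import Classifier

# Assumes Experiment(network, *args, **kwargs) forwards its extra arguments
# to the network class it is given.
exp = Experiment(Classifier, layers=[784, 100, 10])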