theanets.feedforward.Classifier

class theanets.feedforward.Classifier(layers, weighted=False, sparse_input=False)

A classifier attempts to match a one-hot target output.

Classification models in theanets are trained by optimizing a (possibly regularized) loss that centers around the categorical cross-entropy. This loss measures the difference between the distribution generated by the classification model and the empirical distribution of the labeled data.

If we have a labeled dataset containing \(m\) \(d\)-dimensional input samples \(X \in \mathbb{R}^{m \times d}\) and \(m\) paired target outputs \(Y \in \{0,1,\dots,K-1\}^m\), then the loss that the Classifier model optimizes with respect to the model parameters \(\theta\) is:

\[\mathcal{L}(X, Y, \theta) = R(X, \theta) - \frac{1}{m} \sum_{i=1}^m \sum_{k=0}^{K-1} p(k | y_i) \log q_\theta(k | x_i)\]

Here, \(p(k|y_i)\) is the probability that example \(i\) is labeled with class \(k\); in theanets classification models, this is 1 if \(k = y_i\) and 0 otherwise—so, in practice, the sum over classes reduces to a single term. Next, \(q_\theta(k|x_i)\) is the probability that the model assigns to class \(k\) given input \(x_i\); this corresponds to the relevant softmax output from the model. Finally, \(R\) is a regularization function.
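As a minimal numpy sketch (not the theanets implementation), the cross-entropy term of the loss can be computed directly from the model's softmax outputs; the regularizer \(R\) is omitted, and the values below are hypothetical:

```python
import numpy as np

# Hypothetical softmax outputs q_theta(k | x_i) for m = 4 examples, K = 3 classes.
q = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
])
y = np.array([0, 1, 2, 1])  # integer class labels

# Because p(k | y_i) is 1 only when k == y_i, the inner sum over classes
# reduces to picking out log q_theta(y_i | x_i) for each example i.
loss = -np.mean(np.log(q[np.arange(len(y)), y]))
```

Note how the indexing `q[np.arange(len(y)), y]` implements the collapse of the double sum described above: each example contributes only the log-probability of its true class.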

A classifier model requires the following inputs at training time:

  • x: A two-dimensional array of input data. Each row of x is expected to be one data item. Each column of x holds the measurements of a particular input variable across all data items.
  • labels: A one-dimensional array of target labels. Each element of labels is expected to be the class index for a single data item.

The number of rows in x must match the number of elements in the labels vector. Additionally, the values in labels are expected to range from 0 to one less than the number of classes in the data being modeled. For example, for the MNIST digits dataset, which represents digits 0 through 9, the labels array contains integer class labels 0 through 9.
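The expected shapes and label range can be illustrated with a toy dataset sized like MNIST; the array contents here are random placeholders, not real data:

```python
import numpy as np

# Hypothetical training arrays: 100 examples, 784 input variables, 10 classes.
x = np.random.randn(100, 784).astype('float32')
labels = np.random.randint(0, 10, size=100).astype('int32')

# The invariants described above:
assert x.shape[0] == labels.shape[0]            # one label per row of x
assert labels.min() >= 0 and labels.max() <= 9  # labels in [0, num_classes)
```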

__init__(layers, weighted=False, sparse_input=False)

Methods

accuracy(outputs) Build a theano expression for computing the network accuracy.
classify(x) Compute a greedy classification for the given set of data (alias of predict).
error(outputs) Build a theano expression for computing the network error.
monitors(**kwargs) Return expressions that should be computed to monitor training.
predict(x) Compute a greedy classification for the given set of data.
predict_logit(x) Compute the logit values that underlie the softmax output.
predict_proba(x) Compute class posterior probabilities for the given set of data.
score(x, y[, w]) Compute the mean accuracy on a set of labeled data.

Attributes

DEFAULT_OUTPUT_ACTIVATION
num_params Number of parameters in the entire network model.
params A list of the learnable theano parameters for this network.

accuracy(outputs)

Build a theano expression for computing the network accuracy.

Parameters:

outputs : dict mapping str to theano expression

A dictionary of all outputs generated by the layers in this network.

Returns:

acc : theano expression

A theano expression representing the network accuracy.

error(outputs)

Build a theano expression for computing the network error.

Parameters:

outputs : dict mapping str to theano expression

A dictionary of all outputs generated by the layers in this network.

Returns:

error : theano expression

A theano expression representing the network error.

monitors(**kwargs)

Return expressions that should be computed to monitor training.

Returns:

monitors : list of (name, expression) pairs

A list of named monitor expressions to compute for this network.

predict(x)

Compute a greedy classification for the given set of data.

Parameters:

x : ndarray (num-examples, num-variables)

An array containing examples to classify. Examples are given as the rows in this array.

Returns:

k : ndarray (num-examples, )

A vector of class index values, one per row of input data.
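A "greedy" classification simply takes, for each example, the class with the highest posterior probability. A numpy sketch of that reduction (with hypothetical probability values) is:

```python
import numpy as np

# Hypothetical class posterior probabilities for two examples, three classes.
proba = np.array([
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
])

# Greedy classification: argmax over the class axis.
k = proba.argmax(axis=1)  # -> array([1, 0])
```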

predict_logit(x)

Compute the logit values that underlie the softmax output.

Parameters:

x : ndarray (num-examples, num-variables)

An array containing examples to classify. Examples are given as the rows in this array.

Returns:

l : ndarray (num-examples, num-classes)

An array of posterior class logit values, one row of logit values per row of input data.

predict_proba(x)

Compute class posterior probabilities for the given set of data.

Parameters:

x : ndarray (num-examples, num-variables)

An array containing examples to predict. Examples are given as the rows in this array.

Returns:

p : ndarray (num-examples, num-classes)

An array of class posterior probability values, one per row of input data.
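The relationship between predict_logit and predict_proba is the softmax function: the posterior probabilities are the softmax of the logits. A self-contained numpy sketch (with hypothetical logit values, not taken from a real model):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax along the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits for two examples and three classes.
l = np.array([[2.0, 1.0, 0.1],
              [0.5, 2.5, 0.5]])
p = softmax(l)

# Each row of p sums to 1, and the greedy prediction is the row argmax,
# which is unchanged by the monotonic softmax transform.
```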

score(x, y, w=None)

Compute the mean accuracy on a set of labeled data.

Parameters:

x : ndarray (num-examples, num-variables)

An array containing examples to classify. Examples are given as the rows in this array.

y : ndarray (num-examples, )

A vector of integer class labels, one for each row of input data.

w : ndarray (num-examples, ), optional

A vector of weights, one for each row of input data. If omitted, all examples are weighted equally.

Returns:

score : float

The (possibly weighted) mean accuracy of the model on the data.
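The "(possibly weighted) mean accuracy" can be sketched in numpy: correctness is a 0/1 indicator per example, averaged either plainly or with the weights w rescaling each example's contribution. The predictions and labels below are hypothetical:

```python
import numpy as np

# Hypothetical model predictions vs. true labels.
pred = np.array([0, 1, 2, 1])
y    = np.array([0, 1, 1, 1])
correct = (pred == y).astype(float)   # [1, 1, 0, 1]

# Unweighted: plain mean of the per-example correctness.
score_unweighted = correct.mean()                # 0.75

# Weighted: each example contributes in proportion to its weight.
w = np.array([1.0, 1.0, 2.0, 0.0])
score_weighted = np.average(correct, weights=w)  # (1 + 1 + 0 + 0) / 4 = 0.5
```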