theanets.losses.MaximumMeanDiscrepancy

class theanets.losses.MaximumMeanDiscrepancy(kernel=1, **kwargs)

Maximum Mean Discrepancy (MMD) loss function.

Parameters:

kernel : callable or numeric, optional

A kernel function to call for computing pairwise kernel values. If this is a callable, it should take two Theano arrays as arguments and return a Theano array of pairwise kernel values. If it is a numeric value, the kernel will be a Gaussian with the given value as the bandwidth parameter. Defaults to 1.
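For illustration, a custom kernel matching this contract might look like the sketch below. The laplacian function is hypothetical (not part of theanets), and the sketch assumes the callable should return the full matrix of pairwise kernel values for two mini-batches whose rows are samples:

    import theano
    import theano.tensor as TT

    def laplacian(x, y):
        # Pairwise L1 distances between rows of x and rows of y, then
        # k(x_i, y_j) = exp(-||x_i - y_j||_1) for every pair of rows.
        dist = abs(x.dimshuffle(0, 'x', 1) - y.dimshuffle('x', 0, 1)).sum(axis=2)
        return TT.exp(-dist)

    # Quick symbolic check that the kernel compiles.
    x, y = TT.matrix('x'), TT.matrix('y')
    k = theano.function([x, y], laplacian(x, y))

Such a function would then be passed as kernel=laplacian when constructing the loss.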

Notes

This loss computes the discrepancy between a predicted distribution (samples generated by a network) and an observed distribution (samples of data within a mini-batch) [Gre07]. Writing \(x\) for the observed samples, \(y\) for the predicted samples, and \(\phi\) for a feature map, the loss is given by:

\[\mathcal{L}(x, y) = \| \sum_{j=1}^N \phi(y_j) - \sum_{i=1}^N \phi(x_i) \|_2^2\]

This can be expanded to

\[\mathcal{L}(x, y) = \sum_{j=1}^N \sum_{j'=1}^N \phi(y_j)^\top \phi(y_{j'}) - 2 \sum_{j=1}^N \sum_{i=1}^N \phi(y_j)^\top \phi(x_i) + \sum_{i=1}^N \sum_{i'=1}^N \phi(x_i)^\top \phi(x_{i'})\]

and then the kernel trick can be applied,

\[\mathcal{L}(x, y) = \sum_{j=1}^N \sum_{j'=1}^N k(y_j, y_{j'}) - 2 \sum_{j=1}^N \sum_{i=1}^N k(y_j, x_i) + \sum_{i=1}^N \sum_{i'=1}^N k(x_i, x_{i'})\]
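As a concrete check of this last form, a NumPy sketch (hypothetical helper names; the sums run over full kernel matrices exactly as in the equation) could compute the loss between two mini-batches:

    import numpy as np

    def gaussian_gram(a, b, bw=1.0):
        # k(a_i, b_j) = exp(-||a_i - b_j||^2 / bw) for every pair of rows.
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq / bw)

    def mmd(x, y, bw=1.0):
        # The three double sums from the expanded loss above.
        return (gaussian_gram(y, y, bw).sum()
                - 2 * gaussian_gram(y, x, bw).sum()
                + gaussian_gram(x, x, bw).sum())

    rng = np.random.RandomState(13)
    x = rng.randn(100, 5)            # "observed" mini-batch
    same = rng.randn(100, 5)         # drawn from the same distribution as x
    shifted = rng.randn(100, 5) + 1  # mean-shifted distribution
    print(mmd(x, same), mmd(x, shifted))  # the shifted pair scores much higher

Because the loss is a squared norm in the feature space, it is always non-negative, and it is small exactly when the two batches are hard to distinguish under the kernel.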

By default the loss here uses the Gaussian kernel

\[k(x, x') = \exp(-\|x - x'\|_2^2/\sigma)\]

where \(\sigma\) is a scalar bandwidth parameter. However, other kernels can be provided when constructing the loss. Matching distributions with this kernelized loss is the training criterion of generative moment matching networks [Li15].
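As a sketch of what this default could look like under the callable contract described above (illustrative only, not the library source; the gaussian(bw) method listed below presumably builds the same thing):

    import theano.tensor as TT

    def gaussian(bw):
        def kernel(x, y):
            # Pairwise squared Euclidean distances between rows, using
            # ||x_i - y_j||^2 = ||x_i||^2 - 2 x_i . y_j + ||y_j||^2.
            xx = (x * x).sum(axis=1).dimshuffle(0, 'x')
            yy = (y * y).sum(axis=1).dimshuffle('x', 0)
            sq = xx - 2 * TT.dot(x, y.T) + yy
            return TT.exp(-sq / bw)
        return kernel

Passing a numeric value such as kernel=0.5 to the constructor selects this kernel with \(\sigma = 0.5\).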

References

[Gre07] A. Gretton, K. M. Borgwardt, M. Rasch, B. Schölkopf & A. J. Smola (NIPS 2007) “A Kernel Method for the Two-Sample-Problem.” http://papers.nips.cc/paper/3110-a-kernel-method-for-the-two-sample-problem.pdf
[Li15] Y. Li, K. Swersky & R. Zemel (ICML 2015) “Generative Moment Matching Networks.” http://jmlr.org/proceedings/papers/v37/li15.pdf
__init__(kernel=1, **kwargs)

Methods

__init__([kernel]) Initialize the loss with an optional kernel (see the parameters above).
gaussian(bw) Return a Gaussian kernel with bandwidth bw, used when kernel is given as a numeric value.
log() Log some diagnostic info about this loss.

Attributes

variables A list of Theano variables used in this loss.