class theanets.losses.MaximumMeanDiscrepancy(kernel=1, **kwargs)

Maximum Mean Discrepancy (MMD) loss function.


Parameters:

kernel : callable or numeric, optional

A kernel function used to compute pairwise kernel values. If this is a callable, it should take two Theano arrays as arguments and return a Theano array of kernel values. If it is a numeric value, the kernel will be a Gaussian with that value as its bandwidth parameter \(\sigma\). Defaults to 1.
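For illustration, a custom kernel callable follows the same shape contract as the default: it takes two arrays and returns a kernel value. The Laplacian kernel below is a hypothetical example sketched with NumPy in place of the Theano arrays the real callable would receive:

```python
import numpy as np

def laplacian_kernel(x, y):
    # A hypothetical custom kernel, not part of theanets: takes two
    # arrays (stand-ins for Theano arrays) and returns the scalar
    # kernel value exp(-||x - y||_1).
    d = np.asarray(x) - np.asarray(y)
    return np.exp(-np.abs(d).sum())
```

A callable like this could be passed as the ``kernel`` argument in place of a numeric bandwidth.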


This loss computes the difference between a predicted distribution (generated by a network) and an observed distribution (the data within a mini-batch); see [Gre07] for the underlying two-sample test and [Li15] for its use in training generative networks. The loss is given by:

\[\mathcal{L}(x, y) = \left\| \sum_{j=1}^N \phi(y_j) - \sum_{i=1}^N \phi(x_i) \right\|_2^2\]

where \(x\) and \(y\) are mini-batches of \(N\) samples each and \(\phi\) is the feature map associated with the kernel \(k\).

This can be expanded to

\[\mathcal{L}(x, y) = \sum_{j=1}^N \sum_{j'=1}^N \phi(y_j)^\top \phi(y_{j'}) - 2 \sum_{j=1}^N \sum_{i=1}^N \phi(y_j)^\top \phi(x_i) + \sum_{i=1}^N \sum_{i'=1}^N \phi(x_i)^\top \phi(x_{i'})\]

and then the kernel trick can be applied, replacing each inner product \(\phi(a)^\top \phi(b)\) with a kernel evaluation \(k(a, b)\):

\[\mathcal{L}(x, y) = \sum_{j=1}^N \sum_{j'=1}^N k(y_j, y_{j'}) - 2 \sum_{j=1}^N \sum_{i=1}^N k(y_j, x_i) + \sum_{i=1}^N \sum_{i'=1}^N k(x_i, x_{i'})\]
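The kernel-trick form above can be sketched directly in NumPy (a toy stand-in for the symbolic Theano implementation; `mmd_loss` and `linear` are illustrative names, not part of theanets):

```python
import numpy as np

def mmd_loss(x, y, kernel):
    """Unnormalized squared MMD between mini-batches x and y,
    computed as the three double sums in the kernel-trick form."""
    k_yy = sum(kernel(yj, yk) for yj in y for yk in y)
    k_yx = sum(kernel(yj, xi) for yj in y for xi in x)
    k_xx = sum(kernel(xi, xk) for xi in x for xk in x)
    return k_yy - 2.0 * k_yx + k_xx

# With the linear kernel k(a, b) = a . b, the feature map phi is the
# identity, so the loss reduces exactly to || sum(y) - sum(x) ||_2^2 --
# a handy sanity check for the three-sum expansion.
def linear(a, b):
    return float(np.dot(a, b))
```

Running this with small batches confirms that the double-sum form agrees with the direct squared-norm definition when \(\phi\) is the identity.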

By default the loss uses the Gaussian kernel

\[k(x, x') = \exp\left(-\|x - x'\|_2^2 / \sigma\right)\]

where \(\sigma\) is a scalar bandwidth parameter. However, other kernels can be provided when constructing the loss.
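The default kernel formula translates directly to NumPy; the real loss evaluates it symbolically with Theano, and `gaussian_kernel` below is an illustrative name:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, x') = exp(-||x - x'||^2 / sigma); sigma is the scalar
    # bandwidth -- the numeric value passed as the ``kernel`` argument.
    d = np.asarray(x) - np.asarray(y)
    return np.exp(-np.dot(d, d) / sigma)
```

Note that the kernel equals 1 when its arguments coincide and decays toward 0 as they move apart, with \(\sigma\) controlling how quickly.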


References

[Gre07]	A. Gretton, K. M. Borgwardt, M. Rasch, B. Schölkopf, & A. J. Smola. (NIPS 2007) “A Kernel Method for the Two-Sample-Problem.”
[Li15]	Y. Li, K. Swersky, & R. Zemel. (ICML 2015) “Generative Moment Matching Networks.”
__init__(kernel=1, **kwargs)

Methods

log()	Log some diagnostic info about this loss.

Attributes

variables	A list of Theano variables used in this loss.