# theanets.losses.MaximumMeanDiscrepancy¶

class theanets.losses.MaximumMeanDiscrepancy(kernel=1, **kwargs)

Maximum Mean Discrepancy (MMD) loss function.

Parameters:

kernel : callable or numeric, optional
    A kernel function to call for computing pairwise kernel values. If this is a callable, it should take two Theano arrays as arguments and return a Theano array. If it is a numeric value, the kernel will be a Gaussian with the given value as the bandwidth parameter. Defaults to 1.
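As a rough sketch of what a kernel callable computes, here is the pairwise Gaussian kernel written in plain NumPy rather than Theano (the name `gaussian_kernel` and the `bw` parameter are illustrative, not part of the theanets API; a real callable for this loss would use Theano operations):

```python
import numpy as np

def gaussian_kernel(x, y, bw=1.0):
    """Pairwise Gaussian kernel values between rows of x and rows of y.

    x: (n, d) array of samples; y: (m, d) array of samples.
    Returns an (n, m) array with entry (i, j) = exp(-||x_i - y_j||^2 / bw).
    """
    # Squared Euclidean distances between every pair of rows, via broadcasting.
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / bw)
```

A Theano version would have the same structure, with `theano.tensor` operations in place of the NumPy calls.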

Notes

This loss computes the discrepancy between a predicted distribution (generated by a network) and an observed distribution (of data within a mini-batch). The loss is given by:

$\mathcal{L}(x, y) = \| \sum_{j=1}^N \phi(y_j) - \sum_{i=1}^N \phi(x_i) \|_2^2$

This can be expanded to

$\mathcal{L}(x, y) = \sum_{j=1}^N \sum_{j'=1}^N \phi(y_j)^\top \phi(y_{j'}) - 2 \sum_{j=1}^N \sum_{i=1}^N \phi(y_j)^\top \phi(x_i) + \sum_{i=1}^N \sum_{i'=1}^N \phi(x_i)^\top \phi(x_{i'})$

and then the kernel trick can be applied,

$\mathcal{L}(x, y) = \sum_{j=1}^N \sum_{j'=1}^N k(y_j, y_{j'}) - 2 \sum_{j=1}^N \sum_{i=1}^N k(y_j, x_i) + \sum_{i=1}^N \sum_{i'=1}^N k(x_i, x_{i'})$
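The kernel-trick form above translates directly into code. The following is a minimal NumPy sketch (the names `gaussian_gram` and `mmd2` are illustrative, not part of the theanets API), computing each of the three kernel sums:

```python
import numpy as np

def gaussian_gram(a, b, bw=1.0):
    # Pairwise Gaussian kernel values k(a_i, b_j) = exp(-||a_i - b_j||^2 / bw).
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / bw)

def mmd2(x, y, bw=1.0):
    # Squared MMD via the kernel trick:
    #   sum_{j,j'} k(y_j, y_j') - 2 sum_{j,i} k(y_j, x_i) + sum_{i,i'} k(x_i, x_i')
    return (gaussian_gram(y, y, bw).sum()
            - 2.0 * gaussian_gram(y, x, bw).sum()
            + gaussian_gram(x, x, bw).sum())
```

Like the equations above, this sketch omits normalization by the sample counts; see [Gre07] for normalized and unbiased estimators. Identical sample sets yield a loss of zero, since the three sums cancel.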

By default the loss here uses the Gaussian kernel

$k(x, x') = \exp(-\|x - x'\|^2 / \sigma)$

where $\sigma$ is a scalar bandwidth parameter. However, other kernels can be provided when constructing the loss.

References

[Gre07] A. Gretton, K. M. Borgwardt, M. Rasch, B. Schölkopf, & A. J. Smola (NIPS 2007) “A Kernel Method for the Two-Sample-Problem.” http://papers.nips.cc/paper/3110-a-kernel-method-for-the-two-sample-problem.pdf

[Li15] Y. Li, K. Swersky, & R. Zemel (ICML 2015) “Generative Moment Matching Networks.” http://jmlr.org/proceedings/papers/v37/li15.pdf

__init__(kernel=1, **kwargs)

Methods

- __init__([kernel])
- gaussian(bw)
- log(): Log some diagnostic info about this loss.

Attributes

- variables: A list of Theano variables used in this loss.