theanets.losses.MaximumMeanDiscrepancy

class theanets.losses.MaximumMeanDiscrepancy(kernel=1, **kwargs)

Maximum Mean Discrepancy (MMD) loss function.
Parameters:

kernel : callable or numeric, optional
    A kernel function to call for computing pairwise kernel values. If this is a callable, it should take two Theano arrays as arguments and return a Theano array. If it is a numeric value, the kernel will be a Gaussian with the given value as the bandwidth parameter. Defaults to 1.
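For example, a custom kernel can be supplied as a callable. The sketch below defines a hypothetical Laplacian-style kernel on Theano arrays; the kernel itself and the constructor call are illustrative, not taken from the library:

    import theano.tensor as TT

    def laplacian_kernel(x, y):
        # A custom kernel must take two Theano arrays and return a
        # Theano array of kernel values.
        return TT.exp(-abs(x - y).sum(axis=-1))

    # Passing the callable uses it for all pairwise kernel evaluations;
    # passing a number such as 0.5 instead selects a Gaussian kernel with
    # bandwidth 0.5. (Assumption: any remaining keyword arguments are
    # forwarded to the base Loss class.)
    # loss = theanets.losses.MaximumMeanDiscrepancy(kernel=laplacian_kernel)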
Notes
This loss computes the discrepancy between a predicted distribution (generated by a network) and an observed distribution (the data within a mini-batch). The loss is given by:
\[\mathcal{L}(x, y) = \Big\| \sum_{j=1}^N \phi(y_j) - \sum_{i=1}^N \phi(x_i) \Big\|_2^2\]

This can be expanded to
\[\mathcal{L}(x, y) = \sum_{j=1}^N \sum_{j'=1}^N \phi(y_j)^\top \phi(y_{j'}) - 2 \sum_{j=1}^N \sum_{i=1}^N \phi(y_j)^\top \phi(x_i) + \sum_{i=1}^N \sum_{i'=1}^N \phi(x_i)^\top \phi(x_{i'})\]

and then the kernel trick can be applied,
\[\mathcal{L}(x, y) = \sum_{j=1}^N \sum_{j'=1}^N k(y_j, y_{j'}) - 2 \sum_{j=1}^N \sum_{i=1}^N k(y_j, x_i) + \sum_{i=1}^N \sum_{i'=1}^N k(x_i, x_{i'})\]
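As a worked illustration of this kernel-trick form, here is a minimal NumPy sketch (it mirrors the math above and is not the theanets implementation):

    import numpy as np

    def gaussian_kernel(a, b, sigma=1.0):
        # Pairwise kernel matrix with entries exp(-||a_i - b_j||^2 / sigma).
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / sigma)

    def mmd2(x, y, kernel=gaussian_kernel):
        # Squared MMD via the kernel trick:
        # sum_jj' k(y_j, y_j') - 2 sum_ji k(y_j, x_i) + sum_ii' k(x_i, x_i')
        return kernel(y, y).sum() - 2 * kernel(y, x).sum() + kernel(x, x).sum()

    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 3))        # samples from the "observed" distribution
    y = rng.normal(size=(100, 3)) + 0.5  # samples from a shifted "predicted" distribution
    print(mmd2(x, y))  # grows as the two sample distributions diverge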
By default the loss here uses the Gaussian kernel

\[k(x, x') = \exp(-(x-x')^2/\sigma)\]

where \(\sigma\) is a scalar bandwidth parameter. However, other kernels can be provided when constructing the loss.
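The Methods list below includes a gaussian static method; assuming it builds such a kernel callable from a bandwidth value (this usage is an inference from its signature, not documented behavior), it might be used as:

    import theanets

    # Assumption: gaussian(bw) returns a kernel callable k(x, x') with
    # bandwidth bw, matching the default Gaussian kernel described above.
    kernel = theanets.losses.MaximumMeanDiscrepancy.gaussian(0.5)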
References

[Gre07] A. Gretton, K. M. Borgwardt, M. Rasch, B. Schölkopf, & A. J. Smola (NIPS 2007) “A Kernel Method for the Two-Sample-Problem.” http://papers.nips.cc/paper/3110-a-kernel-method-for-the-two-sample-problem.pdf

[Li15] Y. Li, K. Swersky, & R. Zemel (ICML 2015) “Generative Moment Matching Networks.” http://jmlr.org/proceedings/papers/v37/li15.pdf
__init__(kernel=1, **kwargs)
Methods

__init__([kernel])
gaussian(bw)
log()  Log some diagnostic info about this loss.

Attributes
variables
    A list of Theano variables used in this loss.