“Minibatch Discrimination”, 2016-06-10:
…One of the main failure modes for GANs is for the Generator to collapse to a parameter setting where it always emits the same point. When collapse to a single mode is imminent, the gradient of the Discriminator may point in similar directions for many similar points. Because the Discriminator processes each example independently, there is no coordination between its gradients, and thus no mechanism to tell the outputs of the Generator to become more dissimilar to each other. Instead, all outputs race toward a single point that the Discriminator currently believes is highly realistic. After collapse has occurred, the Discriminator learns that this single point comes from the Generator, but gradient descent is unable to separate the identical outputs. The gradients of the Discriminator then push the single point produced by the Generator around space forever, and the algorithm cannot converge to a distribution with the correct amount of entropy.
An obvious strategy to avoid this type of failure is to allow the Discriminator to look at multiple data examples in combination, and perform what we call minibatch discrimination.
The concept of minibatch discrimination is quite general: any Discriminator model that looks at multiple examples in combination, rather than in isolation, could potentially help avoid collapse of the Generator. In fact, the successful application of batch normalization in the Discriminator by Ioffe & Szegedy 2015 is well explained from this perspective.
…We compute these minibatch features [internal model embedding] separately for samples from the Generator and from the training data. As before, the Discriminator is still required to output a single number for each example indicating how likely it is to come from the training data. The task of the Discriminator is thus effectively still to classify single examples as real data or generated data, but it is now able to use the other examples in the minibatch as side information. Minibatch discrimination allows us to generate visually appealing samples very quickly, and in this regard it is superior to feature matching (§6).
Interestingly, however, feature matching was found to work much better if the goal is to obtain a strong classifier using the approach to semi-supervised learning described in §5.
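The excerpt elides the actual feature computation, but the full paper defines it concretely: intermediate Discriminator features f(xᵢ) ∈ ℝᴬ are multiplied by a learned tensor T ∈ ℝ^(A×B×C) to give matrices Mᵢ; per-pair similarities c_b(xᵢ, xⱼ) = exp(−‖M_{i,b} − M_{j,b}‖₁) are summed over the minibatch to give B extra features per example, which are concatenated onto f(xᵢ). A minimal numpy sketch of that computation (the function name and example shapes are ours, not the paper's):

```python
import numpy as np

def minibatch_features(f, T):
    """Minibatch-discrimination features (per Salimans et al. 2016).

    f: (N, A) intermediate Discriminator features for a minibatch of N examples.
    T: (A, B, C) learned tensor; B output features, each via a C-dim projection.
    Returns o: (N, B), where o[i, b] sums the similarity of example i to every
    example in the minibatch under projection b (including itself).
    """
    # M[i] = f[i] @ T, viewed as a (B, C) matrix per example
    M = np.tensordot(f, T, axes=([1], [0]))                         # (N, B, C)
    # Pairwise L1 distances between rows M[i, b] and M[j, b]
    dist = np.abs(M[:, None, :, :] - M[None, :, :, :]).sum(axis=3)  # (N, N, B)
    c = np.exp(-dist)       # similarity kernel: 1 for identical rows, -> 0 apart
    o = c.sum(axis=1)       # (N, B); identical Generator outputs inflate this
    return o                # concatenated with f before the Discriminator head
```

Because a collapsed Generator makes every Mᵢ identical, the summed similarities o(xᵢ) spike toward N for generated batches, giving the Discriminator an easy signal that the excerpt's per-example classification alone cannot provide.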
See Also:
Neural networks trained with SGD learn distributions of increasing complexity
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
Top-K Training of GANs: Improving GAN Performance by Throwing Away Bad Samples
Generator Knows What Discriminator Should Learn in Unconditional GANs