
Novelty Nets: Classifier Anti-Guidance

Generative modeling proposal for increasing diversity of samples by a helper NN memorizing past samples and ‘repelling’ new samples away from old ones.

How can we avoid generative models always creating ‘same-y’ samples, particularly when prompts don’t work well? Novelty search approaches typically operate ‘outside’ the generative model, and so are hamstrung by the inherent non-novelty of the generative model’s usual sampling.

I propose novelty nets: small neural net adapter layers which are trained online during sampling to memorize the history of all previous samples, producing a ‘probability this is not novel’, and thus enable gradient descent to minimize that probability and yield a meaningfully-different new sample each time. This systematically increases the diversity and improves exploration & variation, as one no longer struggles to fight a model stubbornly insisting on generating extremely similar samples because that is just what it considers highly-likely or high-quality.

Novelty nets could be particularly useful for image generation, both at the user & service-level, as the nets push all samples collectively away from each other, reducing the esthetically unpleasant ‘same-y-ness’ of AI-generated images.

‘Saith He is terrible: watch His feats in proof!
…Please Him and hinder this?—What Prosper does?
Aha, if He would tell me how! Not He!
There is the sport: discover how or die!
All need not die, for of the things o’ the isle
Some flee afar, some dive, some run up trees;
Those at His mercy,—why, they please Him most
When . . . when . . . well, never try the same way twice!
Repeat what act has pleased, He may grow wroth.

Robert Browning, “Caliban Upon Setebos” (1864)

Zooming up to the high-level view, one of the biggest problems of generative model services as of 2024 is the horrible ‘same-y-ness’: each sample is good, possibly flawless, but somehow, they all look the same. Better prompting can help by pushing samples out of the default ‘look’, but the problem repeats itself fractally: the cyberpunk noir samples all look the same, the steampunk Sung China samples all look the same… Mere words struggle to avoid the repetition.

And this problem is worsened by the “I know it when I see it” reality of a lot of creative work: not only do I lack the words for what I want (and there might not be any words), I don’t know what I want to begin with. What I want is to generate many samples, as different as possible in as many respects as possible, where I can go “aha!” or “interesting” or “wow”. Once I begin to recognize what I’m looking for or see better ideas, I can then home in on them and vary them with normal techniques.

But I need to have a good sample to look at first. And this is the same problem that generative models like LLMs face in trying to create good but novel/diverse samples: they are better at recognizing an interesting finished sample than they are at creating one, which means that while they are in the middle of creating a sample, how can they tell if it’s novel or just bad? (If you already could tell it’s both novel & good partway through, then you probably would’ve generated it before…) Which leads to conservatively safe ‘good’ outputs, which collectively are useless.

Sampling Diminishing Returns

This seems like a good candidate for better sampling from generative models: the models are much more diverse than what we see, and the problem is the sampling. In standard sampling, the random samples are all closely ‘concentrated’ around some center point. This generally means that if you generate 100 samples, you don’t get much more diversity in any particular respect than you did with 10 samples; the extremes are hard to sample from by brute force. (And if you try, you run into the practical problem of diminishing returns: those 100 samples will take 10× longer to review than the 10 samples did, and if you go to 1,000 or 10,000…)
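A toy simulation makes the concentration concrete. As a sketch only, assume that one ‘respect’ in which samples vary can be modeled as a single standard-normal number (a simplifying assumption of mine, as is the hypothetical helper `mean_extreme`), and ask how extreme the most extreme of n samples is on average:

```python
import numpy as np

def mean_extreme(n_samples, n_trials=2000, seed=0):
    """Average of the largest value among n_samples standard-normal draws.

    Assumption: a 1-D Gaussian stands in for one 'respect' in which
    generated samples can differ; real generative models are not this simple.
    """
    rng = np.random.default_rng(seed)
    draws = rng.standard_normal((n_trials, n_samples))
    # For each simulated batch, keep only its most extreme (largest) sample.
    return float(draws.max(axis=1).mean())

for n in (10, 100, 1000):
    print(f"{n:>5} samples: most extreme is about {mean_extreme(n):.2f} SDs out")
```

Under that assumption, the most extreme of 10 samples lands around 1.5 standard deviations out, of 100 around 2.5, and of 1,000 around 3.2: each 10× increase in samples to review buys less than one additional standard deviation.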

In evolutionary computation & reinforcement learning, this sort of loosely-directed exploration problem has an appealing solution in novelty search: a search phase in which the ultimate ‘reward’ is ignored temporarily, and instead, the ‘reward’ is defined as ‘different from any other result’. This lets a large ‘library’ of ‘interesting’ results be accumulated, which can then be explored, analyzed, or ranked on the true reward.
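The loop can be sketched in a few lines. This is a minimal illustration rather than any particular published implementation: it scores novelty as the mean Euclidean distance to the k nearest archive entries (a common choice, assumed here), treats the generator as a black-box `sample_fn`, and uses a 2D Gaussian as a stand-in ‘generator’; all names are hypothetical.

```python
import numpy as np

def novelty(candidate, archive, k=3):
    """Mean Euclidean distance from candidate to its k nearest archive entries."""
    dists = np.linalg.norm(archive - candidate, axis=1)
    k = min(k, len(dists))  # the archive may still be smaller than k
    return float(np.sort(dists)[:k].mean())

def novelty_search(sample_fn, n_rounds=20, n_candidates=50, k=3, seed=0):
    """Build a 'library' by repeatedly keeping the most novel candidate.

    The true reward is ignored here; ranking the library on it comes later.
    """
    rng = np.random.default_rng(seed)
    archive = [sample_fn(rng)]  # seed the archive with one ordinary sample
    for _ in range(n_rounds):
        candidates = [sample_fn(rng) for _ in range(n_candidates)]
        scores = [novelty(c, np.array(archive), k) for c in candidates]
        archive.append(candidates[int(np.argmax(scores))])
    return np.array(archive)

def toy_generator(rng):
    """Stand-in for a generative model: samples cluster around the origin."""
    return rng.standard_normal(2)

library = novelty_search(toy_generator)
print(library.shape)  # one row per archived sample
```

Note that this search only selects among what the generator already emits, so it inherits the generator's concentration: it operates ‘outside’ the model.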


  1. Where ‘dissimilar’ is defined as something like Euclidean distance in an embedding, and you are taking the k-medoids.

  2. One way to think of it would be to imagine defining every sampled image as an embedded ‘tag’, and adding all prior sampled images as (very weak) negative tags to the guidance. This would probably be far more expensive, though.

  3. Which creates a collective action problem: my lazy use of Midjourney for blog spam decoration damages your carefully-developed game assets by making everyone ‘allergic’ to the traces of ‘Midjourney look’ in your art.