Experimenting with CLIP as a feature extractor for NIMA (Neural Image Assessment)-style aesthetic prediction... a simple linear regression yielded a direction in CLIP latent space that can increase the "aestheticness" of an output. Here's "a datura witch" and a couple of Starry Night variants:

Dec 18, 2021 · 10:36 PM UTC

Replying to @RiversHaveWings
wow! how did you train the linear weights, just good and bad examples? and do you have a notebook with it somewhere? :D
I computed ViT-B/16 embeddings for everything in the AVA train set (using the prefiltered CSVs from github.com/kentsyx/Neural-IM…). Then I fit a linear regression against their mean ratings in scikit-learn and copied the weights and bias into a PyTorch nn.Linear layer.
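The regression-to-nn.Linear step above can be sketched roughly like this. The arrays here are random stand-ins, not the real AVA embeddings or ratings, and the embedding dimension (512 for ViT-B/16) is the only assumption taken from the thread:

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import LinearRegression

# Stand-in data: real inputs would be CLIP ViT-B/16 image embeddings
# for the AVA train set, paired with each image's mean 1-10 rating.
rng = np.random.default_rng(0)
embeds = rng.standard_normal((1000, 512)).astype(np.float32)
mean_ratings = rng.uniform(1, 10, size=1000).astype(np.float32)

# Ordinary least squares on the embeddings vs. the mean ratings.
reg = LinearRegression().fit(embeds, mean_ratings)

# Copy the learned weights and bias into an nn.Linear so the predictor
# can run inside a PyTorch generation loop (e.g. as a CLIP-space score).
head = nn.Linear(512, 1)
with torch.no_grad():
    head.weight.copy_(torch.from_numpy(reg.coef_).reshape(1, -1))
    head.bias.copy_(torch.tensor([reg.intercept_], dtype=torch.float32))

# Sanity check: the PyTorch head reproduces scikit-learn's predictions.
x = torch.from_numpy(embeds[:5])
ref = torch.from_numpy(reg.predict(embeds[:5]).astype(np.float32))
print(torch.allclose(head(x).squeeze(1), ref, atol=1e-3))
```

The weight vector of this layer is the "aestheticness direction" mentioned above: moving a CLIP embedding along it raises the predicted rating.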
Replying to @RiversHaveWings
You should try an SVM with an RBF kernel. With about 1k labels it gets really good on CLIP latents.
Huh! I have around 220k CLIP embeddings, each with a distribution of ratings over the range 1-10. It looks like scikit-learn's kernel SVM will have trouble scaling to that many samples?
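The scaling concern is real: exact kernel SVMs in scikit-learn scale poorly past tens of thousands of samples. Not from the thread, but one standard workaround is to approximate the RBF kernel with Nystroem features and fit a fast linear model on top; a sketch on stand-in data:

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Stand-in for ~220k CLIP embeddings and their mean ratings.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 512))
y = rng.uniform(1, 10, size=2000)

# Nystroem maps inputs into an approximate RBF feature space with a
# fixed number of components, so the downstream model stays linear
# and training remains tractable at large sample counts.
model = make_pipeline(
    Nystroem(kernel="rbf", n_components=256, random_state=0),
    Ridge(alpha=1.0),
)
model.fit(X, y)
print(model.predict(X[:3]).shape)  # (3,)
```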
This is cool ☮️
Replying to @RiversHaveWings
That's a fantastic idea