Experimenting with CLIP as a feature extractor for NIMA (Neural Image Assessment)-style aesthetic prediction... a simple linear regression yielded a direction in CLIP latent space that can increase the "aestheticness" of an output. Here's "a datura witch" and a couple of Starry Night variants:

Dec 18, 2021 · 10:36 PM UTC

Replying to @RiversHaveWings
wow! how did you train the linear weights, just good and bad examples? and do you have a notebook with it somewhere? :D
I computed ViT-B/16 embeddings for everything in the AVA train set (using the prefiltered CSVs from github.com/kentsyx/Neural-IM…). Then I fit a linear regression against their mean ratings in scikit-learn and copied the weights and bias into a PyTorch nn.Linear layer.
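The regression-to-nn.Linear step above can be sketched roughly like this. The arrays here are random stand-ins, not the real AVA embeddings or ratings, and the embedding dimension (512 for ViT-B/16) is the only assumption taken from the thread:

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import LinearRegression

# Stand-in data: real inputs would be CLIP ViT-B/16 image embeddings
# for the AVA train set, paired with each image's mean 1-10 rating.
rng = np.random.default_rng(0)
embeds = rng.standard_normal((1000, 512)).astype(np.float32)
mean_ratings = rng.uniform(1, 10, size=1000).astype(np.float32)

# Ordinary least squares on the embeddings vs. the mean ratings.
reg = LinearRegression().fit(embeds, mean_ratings)

# Copy the learned weights and bias into an nn.Linear so the predictor
# can run inside a PyTorch generation loop (e.g. as a CLIP-space score).
head = nn.Linear(512, 1)
with torch.no_grad():
    head.weight.copy_(torch.from_numpy(reg.coef_).reshape(1, -1))
    head.bias.copy_(torch.tensor([reg.intercept_], dtype=torch.float32))

# Sanity check: the PyTorch head reproduces scikit-learn's predictions.
x = torch.from_numpy(embeds[:5])
ref = torch.from_numpy(reg.predict(embeds[:5]).astype(np.float32))
print(torch.allclose(head(x).squeeze(1), ref, atol=1e-3))
```

The weight vector of this layer is the "aestheticness direction" mentioned above: moving a CLIP embedding along it raises the predicted rating.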
Replying to @RiversHaveWings
You should try an SVM with an RBF kernel. With about 1k labels it gets really good on CLIP latents.
Huh! I have around 220k CLIP embeddings, each with a distribution of ratings over the range 1-10. It looks like scikit-learn's kernel SVM will have trouble scaling to that many samples?
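The scaling concern is real: exact kernel SVMs in scikit-learn scale poorly past tens of thousands of samples. Not from the thread, but one standard workaround is to approximate the RBF kernel with Nystroem features and fit a fast linear model on top; a sketch on stand-in data:

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Stand-in for ~220k CLIP embeddings and their mean ratings.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 512))
y = rng.uniform(1, 10, size=2000)

# Nystroem maps inputs into an approximate RBF feature space with a
# fixed number of components, so the downstream model stays linear
# and training remains tractable at large sample counts.
model = make_pipeline(
    Nystroem(kernel="rbf", n_components=256, random_state=0),
    Ridge(alpha=1.0),
)
model.fit(X, y)
print(model.predict(X[:3]).shape)  # (3,)
```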
This is cool ☮️
Replying to @RiversHaveWings
That's a fantastic idea