I implemented RLHF for diffusion models two weeks ago, replicating the DDPO paper. Here's a before/after comparison from training with an aesthetics reward. This is actually the first RL algorithm I've coded from scratch! Code linked in the next tweet. Explanatory blog post coming soon!

Jul 6, 2023 · 9:49 AM UTC

Oh, and here is the reward curve during training. As you can see, it goes up, which is good! 😄
Replying to @iScienceLuvr
Looking forward to seeing the process you used. It seems like the tuned model mostly eliminates legs.
I guess legs are ugly/not very aesthetic? 😄
Replying to @iScienceLuvr
Awesome stuff! And looking forward to the post!
Replying to @iScienceLuvr
Is this with LAION aesthetics as the reward? The "after" images look a lot like their dataset 🤔
Yes, the "after" on the right is Stable Diffusion v1.4 RL fine-tuned with the LAION aesthetics classifier as the reward function.
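For anyone curious what the DDPO objective looks like, here is a minimal NumPy sketch of its loss pieces. It assumes the paper's framing: each denoising step is a Gaussian policy p(x_{t-1} | x_t, c), the reward (here, an aesthetics score) arrives only for the final image, and rewards are normalized per prompt into advantages. It shows the score-function variant and a PPO-style clipped importance-sampling variant. All function names and the `clip_range` default are hypothetical, and this is not the code linked in the thread; a real implementation would compute these in an autodiff framework so gradients flow into the UNet.

```python
import numpy as np


def gaussian_log_prob(x_prev, mean, std):
    """Log-density of x_prev under N(mean, std^2 I), summed over all non-batch dims.

    Models one denoising step p(x_{t-1} | x_t, c) as an isotropic Gaussian policy.
    """
    log_p = (
        -((x_prev - mean) ** 2) / (2.0 * std**2)
        - np.log(std)
        - 0.5 * np.log(2.0 * np.pi)
    )
    return log_p.reshape(log_p.shape[0], -1).sum(axis=1)


def per_prompt_advantages(rewards, prompt_ids, eps=1e-8):
    """Normalize rewards within each prompt group: (r - mean) / (std + eps)."""
    rewards = np.asarray(rewards, dtype=float)
    prompt_ids = np.asarray(prompt_ids)
    adv = np.empty_like(rewards)
    for pid in np.unique(prompt_ids):
        mask = prompt_ids == pid
        group = rewards[mask]
        adv[mask] = (group - group.mean()) / (group.std() + eps)
    return adv


def ddpo_sf_loss(step_log_probs, advantages):
    """Score-function surrogate: -mean_i[A_i * sum_t log p(x_{t-1} | x_t, c)].

    step_log_probs: (batch, T) log-probs of the sampled denoising steps.
    advantages:     (batch,) normalized final-image rewards, shared by every step.
    """
    return -np.mean(advantages * step_log_probs.sum(axis=1))


def ddpo_is_loss(new_log_probs, old_log_probs, advantages, clip_range=1e-4):
    """PPO-style clipped surrogate for one denoising step, averaged over the batch.

    clip_range is a hypothetical default here; tune it for your own setup.
    """
    ratio = np.exp(new_log_probs - old_log_probs)
    unclipped = -advantages * ratio
    clipped = -advantages * np.clip(ratio, 1.0 - clip_range, 1.0 + clip_range)
    return np.mean(np.maximum(unclipped, clipped))
```

In a training loop, the old log-probs would be recorded while sampling images, the new ones recomputed from the model being fine-tuned on the same stored trajectory, and the per-step loss averaged over timesteps before the optimizer step.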