Here's a video of a conditional model I trained last March. The text at the bottom is encoded into a vector and fed into StyleGAN's mapping network. The right side is a heatmap of the dlatents. I guess I can say I was working on text-to-image synthesis before it was cool. 😉

Jan 9, 2021 · 11:42 PM UTC

Here's the notebook, which also includes an editor. colab.research.google.com/dr…
One sorta hidden feature of the notebook is the ability to have it try to replicate a real image from the training data by copying the tags from the real image and using them as input to the generator (instead of the tags in the textbox). Just drag the Danbooru ID slider.
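
For anyone curious how the tag-copying trick could work, here's a rough sketch. It's not the notebook's actual code: the vocabulary, the ID-to-tags table, the multi-hot encoding, and the function names are all made up for illustration, and the real model may encode text quite differently.

```python
# Hypothetical sketch only: none of these names or values come from the notebook.
from typing import Dict, List, Optional

# Toy tag vocabulary (assumption; the real tag set isn't given in the thread).
VOCAB: List[str] = ["1girl", "blue_hair", "smile", "hat", "outdoors"]
TAG_INDEX: Dict[str, int] = {tag: i for i, tag in enumerate(VOCAB)}

# Toy stand-in for training metadata: image ID -> tags of that real image.
TRAINING_TAGS: Dict[int, List[str]] = {
    101: ["1girl", "smile", "hat"],
    102: ["1girl", "blue_hair", "outdoors"],
}


def encode_tags(tags: List[str]) -> List[float]:
    """Turn a list of tags into a multi-hot vector; unknown tags are ignored."""
    vec = [0.0] * len(VOCAB)
    for tag in tags:
        idx = TAG_INDEX.get(tag)
        if idx is not None:
            vec[idx] = 1.0
    return vec


def conditioning_vector(textbox_tags: List[str],
                        image_id: Optional[int] = None) -> List[float]:
    """Pick the conditioning input for the generator.

    If image_id matches a known training image, its tags replace the
    textbox tags; otherwise the textbox tags are used.
    """
    if image_id is not None and image_id in TRAINING_TAGS:
        return encode_tags(TRAINING_TAGS[image_id])
    return encode_tags(textbox_tags)
```

In this sketch, `conditioning_vector(["smile"], image_id=101)` ignores the textbox and encodes the tags stored for image 101, which is the vector a conditional mapping network would then receive.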