Bibliography (5):

  1. Microsoft COCO: Common Objects in Context

  2. ImageInWords: Unlocking Hyper-Detailed Image Descriptions

  3. Evaluating Text-to-Visual Generation with Image-to-Text Generation

  4. https://github.com/google-deepmind/proactive_t2i_agents

  5. https://www.youtube.com/watch?v=HQgjLWp4Lo8