Bibliography (5):

  1. https://www.wikipedia.org/

  2. Generating Diverse High-Fidelity Images with VQ-VAE-2

  3. https://openai.com/blog/dall-e/

  4. Unigram LM: Byte Pair Encoding is Suboptimal for Language Model Pretraining

  5. WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training