Bibliography (39):

  1. https://openai.com/index/whisper/

  2. https://github.com/openai/whisper

  3. https://github.com/openai/whisper/blob/main/model-card.md

  4. https://cookbook.openai.com/examples/whisper_prompting_guide

  5. https://github.com/alphacep/whisper-prompts

  6. https://www.lesswrong.com/posts/thePw6qdyabD8XR4y/interpreting-openai-s-whisper

  7. https://www.lesswrong.com/posts/thePw6qdyabD8XR4y/interpreting-openai-s-whisper#3_1__Whisper_learns_language_modelling_bigrams

  8. Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

  9. ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition

  10. Why YouTube Could Give Google an Edge in AI

  11. How Tech Giants Cut Corners to Harvest Data for AI: OpenAI, Google and Meta ignored corporate policies, altered their own rules and discussed skirting copyright law as they sought online information to train their newest artificial intelligence systems

  12. VoxLingua107: a Dataset for Spoken Language Recognition

  13. Attention Is All You Need

  14. Using the Output Embedding to Improve Language Models

  15. 2022-radford-figure1-overviewofwhispertransformerarchitecture.png

  16. BPEs: Neural Machine Translation of Rare Words with Subword Units

  17. Language Models are Unsupervised Multitask Learners

  18. https://arxiv.org/pdf/2212.04356#page=4&org=openai

  19. ‘dynamic evaluation (NN)’ directory

  20. https://arxiv.org/pdf/2212.04356#page=28&org=openai

  21. FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech

  22. CoVoST 2 and Massively Multilingual Speech-to-Text Translation

  23. XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

  24. 2022-radford-figure4-correlationofpretraininglanguagedatawithtranslationperformance.jpg

  25. https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_en_conformer_ctc_large

  26. Conformer: Convolution-augmented Transformer for Speech Recognition

  27. 2022-radford-figure6-whisperbenchmarksagainstrivalsacrossotherdatasets.png

  28. 2022-radford-figure8-whisperscalingbymodelsize.png

  29. Chinchilla: Training Compute-Optimal Large Language Models