How Tech Giants Cut Corners to Harvest Data for AI: OpenAI, Google and Meta ignored corporate policies, altered their own rules and discussed skirting copyright law as they sought online information to train their newest artificial intelligence systems
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
Whisper: Robust Speech Recognition via Large-Scale Weak Supervision
ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition
SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network
2022-radford-figure1-overviewofwhispertransformerarchitecture.png
2022-radford-figure3-correlationofpretrainingdataperlanguagewithlanguageperformance.jpg
2022-radford-figure4-correlationofpretraininglanguagedatawithtranslationperformance.jpg
2022-radford-figure6-whisperbenchmarksagainstrivalsacrossotherdatasets.png
2022-radford-figure7-whispervsprofessionalhumantranscribersonkincaid46.jpg
2022-radford-figure9-crossoverinmonolingualvsmultilingualtrainingscalingshowseventualtransfer.jpg
https://cookbook.openai.com/examples/whisper_prompting_guide
https://github.com/openai/whisper/discussions/1762#discussion-5819873
https://openai.com/blog/introducing-chatgpt-and-whisper-apis
https://www.lesswrong.com/posts/KbRxdBCcJqwtbiPzm/whisper-s-wild-implications-1
https://www.lesswrong.com/posts/thePw6qdyabD8XR4y/interpreting-openai-s-whisper
How Tech Giants Cut Corners to Harvest Data for AI: OpenAI, Google and Meta ignored corporate policies, altered their own rules and discussed skirting copyright law as they sought online information to train their newest artificial intelligence systems
https://www.nytimes.com/2024/04/06/technology/tech-giants-harvest-data-artificial-intelligence.html
https://www.theinformation.com/articles/why-youtube-could-give-google-an-edge-in-ai
Whisper: Robust Speech Recognition via Large-Scale Weak Supervision
https://arxiv.org/abs/2212.04356#openai
ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition
https://arxiv.org/abs/2210.13352#huggingface
Wikipedia Bibliography: