“Navigating the Challenges and Opportunities of Synthetic Voices: We’re Sharing Lessons from a Small Scale Preview of Voice Engine, a Model for Creating Custom Voices.”, 2024-03-29 (; backlinks):
OpenAI is committed to developing safe and broadly beneficial AI. Today we are sharing preliminary insights and results from a small-scale preview of a model called Voice Engine, which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker. It is notable that a small model with a single 15-second sample can create emotive and realistic voices.
…Building Voice Engine safely: …Finally, we have implemented a set of safety measures, including watermarking to trace the origin of any audio generated by Voice Engine, as well as proactive monitoring of how it’s being used.
We believe that any broad deployment of synthetic voice technology should be accompanied by voice authentication experiences that verify that the original speaker is knowingly adding their voice to the service and a no-go voice list that detects and prevents the creation of voices that are too similar to prominent figures. [But not Scarlett Johansson?]