Montreal-based Lyrebird is launching a new service. The company says its API will let you synthesize speech in anyone’s voice from just a minute-long recording, which means you could, for instance, generate a clip of President Trump declaring war on Canada.
Lyrebird has posted some audio examples that sound pretty convincing. The company says the speaker doesn’t need to have said the words you want the generated audio to contain, and that the system will also be able to produce different intonations.
If any of this sounds familiar, it might be because you’re thinking of Adobe’s demo of its similar tech last November. But while Adobe’s Project VoCo requires 20 minutes of audio and appears to use system resources for speech synthesis, Lyrebird only needs a minute-long recording and says it’s close to launching its cloud-based API to process audio and spit out results.
On the risk of misuse, Lyrebird says: “By releasing our technology publicly and making it available to anyone, we want to ensure that there will be no such risks. We hope that everyone will soon be aware that such technology exists and that copying the voice of someone else is possible. More generally, we want to raise attention about the lack of evidence that audio recordings may represent in the near future.”
Lyrebird might be on to something there: the widespread availability of image manipulation tools has led to people questioning the veracity of photographs that are circulated in the press and on the web, as well as the integrity of their sources. But there’s still a huge risk of people falling prey to scams and misinformation through tampered audio.
And we’re not just talking about copying the voices of world leaders: people could be duped into handing over sensitive data when they think they’re speaking with a significant other or a family member, and company employees could find themselves following counter-productive orders from someone on the phone who happens to sound an awful lot like their boss.