Presentation of the Amazon Echo Dot in 2018. Photo: Grant Hindsley/AFP
“Alexa,” a boy asks in the video, “can Grandma read me the end of The Wizard of Oz?” In her typical, slightly robotic voice, Alexa briefly confirms the request, and then her voice changes.
She now sounds like the boy's late grandmother.
At least that's what Alexa research director Rohit Prasad claims.
The scene comes from a short video that Prasad showed at Amazon's re:Mars conference in Las Vegas on Wednesday.
According to Prasad, one minute of training material, that is, just a short recording of the supposed grandmother, was enough to synthesize her voice and make it sound authentic.
This was possible because Amazon approaches speech synthesis differently from previous providers of the technology.
Amazon's whitepaper on the subject states: “We divided voice generation into two tasks: speech content and speaker identity.”
This makes it less complex to create an authentic-sounding voice from little material.
Comparable models would otherwise require 30 times as much training material.
“Preserving the memories of people”
Prasad did not say when all Alexa devices will get this ability.
»Preserving memories of people«
When all Alexa devices will get this ability, Prasad didn't say.
According to Amazon, the technology is still in the development phase.
However, Prasad has already outlined what it is ultimately meant to do: many people lost loved ones during the pandemic.
“Artificial intelligence cannot take away the pain of loss, but it can definitely preserve the memories of these people.”
Voice synthesis is not a new technique per se.
Many large companies and small start-ups already offer it in one form or another.
Back in 2016, Adobe presented VoCo, a kind of Photoshop for voices, but the project then went quiet.
A little later, the Canadian start-up Lyrebird released software that let users synthesize their own voice, albeit with rather mediocre results.
Even then, the developers dreamed that in the future everyone would be able to have their navigation system speak in any voice, or have audiobooks read aloud by any voice.
The technology was also meant to give a voice back to people who can no longer speak for themselves, for example because of ALS.
Today Lyrebird is part of Descript.
Its software is used to synthesize podcasters' voices, which simplifies production, for example when a podcast needs to be edited after it has been recorded.
Google, Microsoft and Nuance, among others, now offer various voice generators, though not to consumers but to corporate customers, for example for their service hotlines.
That is no surprise: in the wrong hands, such software has some potential for abuse.
If you can make any voice say anything with just a little training material, you could use it to spread disinformation on social networks or to attempt fraud over the phone, a kind of grandparent scam turbocharged by AI.
Amazon would have to take appropriate precautions to prevent this before making the technology available to all Alexa users.