Over the past year, advances in artificial intelligence have revealed an unprecedented ability to produce realistic images, fake videos and coherent texts. Voice cloning now joins the list, and security experts anticipate a new wave of online scams.
AI-spoofed voices still have a long way to go before they are indistinguishable from human ones, but in a short video call they are capable of deceiving even the most experienced users.
This is especially true when the person who appears on the mobile screen asking for a deposit because of some misfortune (accident, theft or emergency) matches the voice and appearance of the person they claim to be.
This type of scam has its epicenter in China and is spreading rapidly to other regions. Various Asian media outlets report what sounds like science fiction: people robbed by a digitized avatar.
Calls from digital avatars asking victims for money are growing.
For this reason, the Cyberspace Administration is warning the public through posts on Weibo, the "Chinese Twitter", to "be cautious when giving biometric information and refrain from sharing videos and other images on the Internet."
However, the authorities do not have enough resources to stop this type of fraud, because it is difficult to identify the fraudster from a call and because the technology is so new that the legal precedent needed to act quickly does not yet exist.
According to Xinhua, a businessman named Wan Guo transferred $ 3,<> from his account after receiving a communication on WeChat from a woman whose face and voice closely resembled those of his wife.
The excuse was that she had been in an accident with her vehicle and had to pay for the damage caused to another driver. To head off any suspicion, she insisted that the payment be made immediately.
"Although there was something artificial in her way of expressing herself, it did not occur to me to be distrustful, since the facial expressions and even the voice were those of my wife," the businessman told the Global Times.
An expanding scourge
The danger posed by voice cloning through AI.
Investigators in the case discovered that the scammers knew the couple's habits. They also knew that the woman had a cooking channel on a social network, and from there they took the images of her face and the samples of her voice used to build the deepfake.
Sensing that something was off, the businessman messaged his wife a few moments later, and she denied the story. Guo immediately notified his bank, which blocked the operation and prevented the transfer.
Several similar cases have been reported by the Washington Post, which notes that, according to data from the Federal Trade Commission (FTC), this type of fraud is the second most frequent, with more than 36,000 complaints.
The fear is that some scammers have begun using this technology to clone the faces of influencers and streamers in order to claim products and royalties from the companies that hire them.
In most attempts it is difficult to detect the trap, especially because the tone of urgency leaves the victim less time to think. The problem is worse if the person receiving the call is older and unfamiliar with these technologies.
How deception is generated
Introducing Eleven Multilingual v1: Our New Speech Synthesis Model!
We're thrilled to launch our new model supporting seven new languages: French, German, Hindi, Italian, Polish, Portuguese, and Spanish.
Try it out now at: https://t.co/KEGnUGJ82j pic.twitter.com/ocHMm9tgCj
— ElevenLabs (@elevenlabsio) April 27, 2023
Just a year ago, AI-generated images looked like imperfect sketches; now they are realistic enough to fool millions of people. Something similar is happening with voice.
Some speech synthesis startups, such as ElevenLabs or Respeecher, use AI to replicate any voice from an audio sample of just a few seconds, something very easy to obtain on social networks.
Speech generation software analyzes the patterns that make a person's voice unique and searches a vast database for a similar tone. It then recreates the timbre and individual sounds to produce a similar effect.
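For readers curious about what "searching a database for a similar tone" might look like, here is a deliberately simplified sketch in Python. It is not how ElevenLabs, Respeecher or any real product works: the three-number "voice profiles", the speaker names and the `closest_voice` helper are all invented for illustration, and real systems rely on far richer representations of a voice. The sketch only shows the general idea of comparing a target voice against stored ones and picking the closest match, here using cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two equal-length vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero profile carries no information
    return dot / (norm_a * norm_b)


def closest_voice(target, database):
    """Return (name, score) for the stored profile most similar to target."""
    scored = (
        (name, cosine_similarity(target, profile))
        for name, profile in database.items()
    )
    return max(scored, key=lambda pair: pair[1])


# Hypothetical voice profiles: the numbers are made up and stand in for
# whatever features a real system would extract from audio.
VOICE_DATABASE = {
    "speaker_a": [0.9, 0.1, 0.3],
    "speaker_b": [0.2, 0.8, 0.5],
    "speaker_c": [0.4, 0.4, 0.9],
}

# Profile taken (in this toy scenario) from a few seconds of sample audio.
target_profile = [0.25, 0.75, 0.55]

name, score = closest_voice(target_profile, VOICE_DATABASE)
print(f"Closest stored voice: {name} (similarity {score:.2f})")
```

In this toy example the target lands closest to `speaker_b`, whose made-up profile points in almost the same direction. The takeaway is simply that a short sample can be enough to locate a close match, which is why publicly posted audio is so useful to scammers.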
Security experts point out that the artificial voice does not always sound as melodious as the original. The tone can be monotonous and robotic, with strange stutters and synthetic artifacts.
SL