When faces and voices can be altered so easily, what can we rely on to confirm someone's identity?

Editor’s note: This article is from WeChat public account “Big Data Digest” (ID: BigDataDigest), author Liu Junxi.

Europe's first AI scam: 220,000 stolen using a CEO's voice, and realistic voice synthesis needs just 1 minute of recording

There is more to think about here than AI fraud alone. Today's AI technology is so advanced that swapping a face is easy. What about a voice?

When it comes to voice-changing technology, the first thing that comes to mind is the bow-tie voice changer used by Conan in Detective Conan. With this invention of Dr. Agasa, Conan carried the "Sleeping Kogoro", Kogoro Mouri, to the pinnacle of the detective world.

But imagine someone using this technology for fraud. Does it still seem so cool?

According to The Wall Street Journal, in March of this year criminals used similar AI technology to imitate the voice of the chief executive of a British energy company's German parent company, and defrauded the firm of 220,000 euros. [...]

[...] The input data is processed, and hierarchical features are used to extract common features from a large amount of sample data.

Among the first models to generate natural-sounding human speech directly with a neural network was WaveNet, released by Google's DeepMind research lab.

Next, we take WaveNet as an example to briefly introduce how AI synthesizes speech through neural networks and machine learning.

Paper link:

https://arxiv.org/abs/1609.03499

WaveNet is an audio generation model based on PixelCNN. In this generative model, each audio sample is conditioned on all of the audio samples that came before it. The conditional probabilities are modeled by a stack of convolutional layers. The network has no pooling layers, and the output of the model has the same time dimension as the input.
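The idea of conditioning each sample on the ones before it can be sketched with a toy model. In the example below, `toy_conditional` is a hypothetical stand-in for the convolutional network (it is not WaveNet's actual predictor), and the "audio" has only two possible values; the point is only that the probability of a whole sequence is the product of per-sample conditional probabilities.

```python
import math


def toy_conditional(previous, candidate):
    """Hypothetical stand-in for the network: P(candidate | previous samples).

    Samples take the values 0 or 1. The toy rule favours repeating the
    most recent sample; a real model would learn this from data.
    """
    if not previous:
        return 0.5  # no history yet: both values equally likely
    return 0.8 if candidate == previous[-1] else 0.2


def sequence_log_prob(samples):
    """Log-probability of a sequence as a sum of conditional log-probabilities."""
    log_prob = 0.0
    for t, value in enumerate(samples):
        # Each sample is scored given only the samples that precede it.
        log_prob += math.log(toy_conditional(samples[:t], value))
    return log_prob


log_p = sequence_log_prob([0, 0, 1, 1])  # log(0.5 * 0.8 * 0.2 * 0.8)
```

Summing logarithms instead of multiplying probabilities is a common choice because the product of many small numbers quickly underflows.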

Using causal convolutions in the model architecture ensures that the model does not violate the ordering of the data: a prediction at a given time step cannot depend on future time steps. When generating audio, each predicted sample is fed back into the network to help predict the next one. Training is typically faster than for an RNN because causal convolutions have no recurrent connections.
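A minimal sketch of a causal convolution in plain Python may make the ordering constraint concrete. The function name, the kernel values, and the zero padding at the start are illustrative assumptions, not details taken from the WaveNet paper.

```python
def causal_conv1d(signal, kernel):
    """1-D causal convolution: output[t] uses signal[t] and earlier samples only.

    kernel[0] weights the current sample, kernel[1] the previous one, and so on.
    Positions before the start of the signal are treated as zeros, so the
    output has the same length as the input.
    """
    output = []
    for t in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            index = t - k          # step backwards in time, never forwards
            if index >= 0:         # implicit zero padding before the signal
                total += weight * signal[index]
        output.append(total)
    return output


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [0.5, 0.5]                # average of the current and previous sample
smoothed = causal_conv1d(signal, kernel)

# Changing a future sample must leave earlier outputs untouched.
altered = causal_conv1d([1.0, 2.0, 3.0, 4.0, 99.0], kernel)
```

Here `smoothed[:4]` and `altered[:4]` are identical, which is the property that keeps the model from "seeing the future".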

One of the main challenges of using causal convolutions is that many layers are needed to enlarge the receptive field. To solve this problem, the authors use dilated convolutions, which allow a network with only a few layers to have a much larger receptive field. The model uses a softmax distribution to model the conditional distribution of each individual audio sample.
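The effect of dilation on the receptive field can be checked with a little arithmetic. The sketch below assumes a kernel size of 2 and dilations that double at every layer (1, 2, 4, ..., 512), which is the pattern described in the WaveNet paper; the helper names are illustrative. It also includes a small softmax, the function that turns the network's raw scores into a probability distribution over possible sample values.

```python
import math


def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Ten ordinary causal layers versus ten layers with doubling dilation.
plain = receptive_field(2, [1] * 10)                        # 11 samples
dilated = receptive_field(2, [2 ** i for i in range(10)])   # 1024 samples


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])  # toy scores for three possible sample values
```

With the same number of layers, the dilated stack covers 1024 input samples instead of 11, which is why only a few layers are needed.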

The model was evaluated on multi-speaker speech generation, text-to-speech conversion, and music audio modeling. The tests used the mean opinion score (MOS), a measure of sound quality. In essence, it is a human listener's rating of how good the audio sounds, on a scale from 1 to 5, with 5 indicating the best quality.
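As a rough illustration of how such a score is computed, the sketch below averages listener ratings on the 1 to 5 scale. The ratings are made-up example values, not results from the paper, and the function name is an assumption.

```python
def mean_opinion_score(ratings):
    """Average of listener ratings, each between 1 (worst) and 5 (best)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return sum(ratings) / len(ratings)


# Hypothetical ratings from five listeners for one synthesized clip.
ratings = [4, 5, 3, 4, 4]
mos = mean_opinion_score(ratings)
```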

Detection technology is being researched

Irakli Beridze, director of the Center for Artificial Intelligence and Robotics at the United Nations Interregional Crime and Justice Research Institute, said that using machine learning to fake voices makes cybercrime easier.

The UN center is researching technologies for detecting fake videos, which Mr. Beridze said could be an even more useful tool for hackers. “Imagine a video call with the CEO’s voice and the facial expressions you are familiar with. Then you wouldn’t have any doubts at all,” he said.

Some Twitter users have weighed in on this, arguing that AI can be used to counter the problems AI creates. This may become one of the main ways of dealing with similar problems in the future.

Searching Baidu for keywords such as “voice scam” and “recognition” turns up related posts on Baidu Experience. These posts are fairly old, which shows that people have been fighting this kind of scam for quite a long time.

In any case, I hope that effective detection technology can be developed soon.

Have you encountered a similar voice fraud incident? What is the best way to respond when it happens? You are welcome to leave a comment and join the discussion.

Related reports:

https://www.wsj.com/articles/fraudsters-use-ai-to-mimic-ceos-voice-in-unusual-cybercrime-case-11567157402