His or her voice has never been so close to you

Editor’s note: This article comes from the WeChat public account “Brain polar body” (ID: unity007), author: wind speech away.

In 2016, James Vlahos, a US science and technology journalist, did something that touched countless people.

A few months before his father’s death, he resolved to preserve his father’s voice and teachings forever. James, who had no technical background, taught himself speech synthesis and machine learning with the help of an AI project. With his father’s cooperation, he recorded his father’s voice for one to two hours a day, collecting more than 90,000 words to train an AI model. In the end, a Siri-like voice assistant built from his father’s voice gave James a place to lay his grief.

AI voice customization will bring three possibilities to 2020

This story touched countless families around the world, and it also showed AI developers and engineers why AI voice customization matters. Many families are clearly eager for something similar. Whether it is recording the voices of elderly relatives so that they can keep their children company for longer, or sharing voices between partners, the family is becoming the main battlefield for applying AI voice technology.

This demand is receiving growing attention from the industry. In recent years, technologies such as speech synthesis and voice cloning have matured one after another, and natural language processing as a whole has advanced by leaps and bounds. Customizing an AI voice no longer takes months of machine learning training on tens of thousands of words; it has truly “flown into the homes of ordinary people”.

In early March, Baidu, which has been investing heavily in AI technology, launched a voice customization function for its Xiaodu smart speakers. In the Xiaodu app, the “parents and children tell stories” feature lets users record voice packs for themselves and their family members.


This is the first time user voice customization has appeared in conversational AI hardware. When users can create their own voice packs and have a smart speaker keep speaking in their own voice, many industry rules seem to be changing.

Let’s look at three changes that AI voice customization may bring in 2020, at the point where speech synthesis, conversational AI, and intelligent voice hardware converge.

The threshold is gone: AI voice enters the era of customization

In fact, AI voice customization has long been highly anticipated by both the AI industry and users. On the one hand, having AI simulate a user’s voice touches on many social and emotional factors such as family, companionship, and memory. On the other hand, familiar voices may inspire many new applications. For example, you may be too lazy to open an audio course, but if your favorite idol or celebrity crush were the one giving the lessons, you might stay up all night listening.

Therefore, the engineering and commercial application of AI voice customization has always been eagerly awaited. This line of technology could be a welcome boost for the continued development of AI voice hardware such as smart speakers and smart screens.

Over the past few years, the technologies behind AI voice customization have seen their barriers to entry keep falling and their scale of application keep growing. James Vlahos used more than 90,000 words of recordings for machine learning training, but now it takes only a few minutes to train a custom speech model whose semantic understanding and natural language processing far exceed Siri’s.


In recent years, as the technology has improved, the industry has kept exploring ways to customize users’ voices. For example, a public welfare project called Revoice hopes to help people with ALS preserve their own voices. Cerence, an automotive AI company, launched a function last year that creates a voice assistant from the user’s voice. Microsoft’s Custom Voice service can, to some extent, turn the user’s voice into Xiaobing’s voice. Also last year, a “voice customization function” was applied to map scenarios: by recording 20 sentences in the Baidu Maps app, users could generate a complete personal voice pack.

Today, voice customization has reached the most complex AI scenario: conversational AI devices.

With Xiaodu’s voice customization function, users open the Xiaodu app and record their own voice pack in the “parents and children tell stories” feature. The process is simple and takes 3 to 5 minutes. The recorded voice can then narrate a large library of stories, and with Baidu’s AI voice capabilities behind it, the tone, intonation, and cadence sound very realistic.

This means that AI voice customization now has essentially no barrier for users. They do not need to learn complex technology, spend large amounts of time, or endure repeated failures. Users can apply intelligent voice customization in home scenarios in a very simple way, and the path to industrializing voice customization has opened.

Seen along another trajectory, this is part of the overall evolution of intelligent voice assistants and conversational AI hardware.

Since the birth of Amazon’s Echo in 2015, voice assistants have been stuck at the basic ability of answering questions in a machine-like tone. Users often find little motivation to keep using them, and the question-and-answer mode bears little resemblance to real-life interaction.

In 2019, Xiaodu Assistant introduced a full-duplex, wake-word-free capability that supports multiple interactions after a single wake-up. This finally made multi-round dialogue possible in hardware, and chatting began to feel like talking with a real person.

AI voice customization may be regarded as the next upgrade to intelligent voice assistants and related hardware in 2020. Users can use it to make AI hardware personal to each of them, developers gain a new foundation to build on, and knock-on effects across the industry chain may follow.

Fan circles & family: AI hardware may take off in two scenarios

The first change brought by AI voice customization is that users may start to rethink how they use conversational AI hardware and why they buy related products.

With AI voice customization, two business scenarios change in obvious ways. The first is the home, where the ability to customize a family member’s voice is crucial. A family member’s voice stands for companionship, dependence, and warmth; this is human nature and does not change. Parents’ voices can tell children stories and teach them things, and children’s voices can keep parents company through a smart speaker, announcing the time and reading the news. These warm applications meet a common need among Chinese families and are a practical choice for people with busy urban jobs.

The current situation is a good example. The epidemic delayed the return to work, which gave many parents more time with their children and produced an “epidemic parent dependence”. But once work resumes, what should parents do when they have to leave their children again? In the home smart speaker scenario, voice customization offers one option.

On the other hand, the bigger dividend from AI voice customization may come from fan circles. The energy of fandom these days has been an eye-opener for society as a whole. Imagine an idol’s voice not only appearing in map navigation but staying in smart products all the time, talking with you, chatting, telling stories, and playing games. The purchasing power and potential for further development this could generate are almost too large to imagine.

These two scenarios are the most likely to take off quickly with AI voice customization. On that basis, a new wave of opportunities for developers may soon follow.

Generalized customization: AI voice development gets a new ticket

As the AI voice industry matures and technical support for developers becomes more complete, more and more voice bloggers and AI developers are joining the AI voice ecosystem. With the launch of AI voice customization, developers’ basic capabilities have made a breakthrough, and a conversational AI device with “a thousand voices for a thousand users” is no longer just something the industry imagines.

AI voice developers may soon get new opportunities for “generalized customization” through voice customization. It is foreseeable that AI voice customization will affect the development space and industrial value of AI voice in the following ways:

1. Skill customization develops rapidly. Customizing a voice skill with a family member’s voice, or even building voice skills exclusive to family members, couples, or fans, opens up a broad field for the industry to imagine. Many voice skills will change completely once users can choose the voice, which may affect scenarios such as entertainment, family, education, and companionship.

2. Customizing everyday scenarios becomes the highlight. Hearing the voices of loved ones and idols in smart homes, smartphone assistants, and smart wearable devices leaves room for many kinds of experiences. Developers can use different hardware forms to express their ideas for AI voice customization.

3. Countless new uses of “voice copyright”. As mentioned above, the emergence and spread of AI voice customization will turn “high-value voices” into a new kind of copyright asset. The voices of celebrities, idols, public figures, and even well-known people in specific fields will spread across all kinds of hardware through AI interaction, creating another vertical growth opportunity for the content and technology industries.

Broadly customized AI voice applications, hardware, and exclusive services that can be deployed at scale are a new form that brings together users, idols, software developers, and hardware brands, stimulating both the desire to buy and opportunities for platform development. This may become a distinctive feature of 2020.


4. The social value and meaning of AI speech are re-evaluated. James Vlahos’s story shows that AI voice customization carries deep and lasting family affection. People cannot be together forever, but making each other’s voices intelligent can extend many important moments and the sense of companionship. Developers of AI voice customization will likely take on more exploration of family, society, and companionship. In this way, the influence of AI voice customization will grow from technical value to social value.

AI voice customization is becoming a new driver of the conversational AI hardware market. A close look at the conversational AI hardware and AI voice markets over the past three years shows that their uneven growth has tracked technological breakthroughs closely. When a hardware form is still in its early, untamed stage, bursts of commercial energy triggered by technology are the industry norm.

In other words, the hardware market opened up by conversational AI follows a simple logic: a breakthrough in technological capability means a better user experience, which in turn directly produces market feedback. In 2019, after Xiaodu introduced its full-duplex, wake-word-free capability, the AI voice hardware market briefly broke out of its three-way standoff, with Xiaodu pulling ahead on its own. AI voice customization, as a breakthrough even more closely tied to developers, the skill ecosystem, and the content ecosystem, will likely sustain this technological lead and bring more market feedback, bringing a qualitative change in some markets closer.

But no matter which platform ultimately prevails, for AI developers the industrial opportunities brought by voice customization have only just begun. Hardware personalized for every user, ever-changing applications, and every possible technological breakthrough are what we ultimately hope to see in this new hardware form.