After the PC and the mobile phone, what is the next gateway to the Internet? The car is probably one of the most realistic answers.

As an important part of the Internet of Things, the Internet of Vehicles has become a new battleground for technology and Internet giants.

In 2014, Apple launched CarPlay, its iPhone-based smart car system; Google released its counterpart, Android Auto, the same year. The rivalry between the two dominant smartphone operating systems thus carried over into the smart in-vehicle system market.

On top of the underlying systems, application developers from the mobile era have also begun to rethink app interaction logic for the driving scenario. In August of this year, Tencent released an in-car version of WeChat. Car WeChat retains only a few of WeChat’s core functions: viewing messages, sending messages, and voice calls. For the driver’s safety, its interaction is limited to voice commands and steering-wheel buttons.

In the era of the connected car, vehicles are becoming more intelligent, and interaction between people and cars will grow more diverse and frequent. Operating systems and upper-layer applications for smart cars are booming, but beneath the software layer, how to realize efficient, safe, and easy-to-use human-computer interaction between people and vehicles remains a major problem for all practitioners.

From traditional mechanical buttons to modern touch screens, voice control, and gesture control, the ways people and vehicles interact keep multiplying. In the future, active perception by smart cars may become the ultimate goal of human-car interaction.

The era of multimodal interaction is coming

From the first Benz motorcar of 1885 to today’s pure electric vehicles, the car’s technical architecture and product form have undergone earth-shaking changes. But for automakers, packing as many features as possible into a car may be the one constant pursuit.

If a 19th-century driver were transported to the present, he would likely be amazed by the myriad buttons, knobs, and touch screens in a modern car. Seat adjustment, air conditioning, radio, music playback: modern cars integrate more and more functional modules. These modules enrich the driving experience, but they also add to the driver’s control burden.

Against the backdrop of automotive intelligence, the driver’s “cognitive overload” seems inevitable. Improving the human-car interactive experience so the driver can focus on driving itself, freed from trivia such as seat adjustment, route navigation, and picking songs, and thereby enhancing both the interactive experience and driving safety, will be the primary goal of the next generation of human-car interaction products.

[Image: multi-modal car interaction, the AI-aware evolution of smart cars]

The traditional human-car interaction model, with “touch + voice” at its core, needs to change

Today, the traditional human-car interaction model built around “touch + voice” has nearly reached the ceiling of the in-car interactive experience. To achieve more complex human-car interaction while still ensuring driving safety, “multi-modal interaction” is imperative.

What is multimodal interaction?

It is AI-driven interaction that combines multiple senses such as vision and hearing. Driver and passengers can command the vehicle through several modes, including voice and gesture, while the vehicle itself gains smart sensing capabilities that let it judge the user’s intent more accurately.

Take music playback as an example. Traditionally, the driver selects and controls music through buttons, knobs, or the touch screen. With single-mode voice interaction, the user can control playback by speech alone. With multi-modality added, the vehicle can identify the person issuing the command through face and voiceprint recognition, and serve a customized playlist according to that user’s personal preferences and the current scene.

Multi-modal interaction also improves the experience in many other practical scenarios: adjusting individual windows based on lip reading plus voice, proactive photo capture based on emotion recognition, voice reminders based on attention detection, and so on.
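The music example above can be sketched in a few lines. This is a minimal illustration, not any vendor’s API: all names (`UserProfile`, `identify_user`, `recommend_playlist`) and the rule that both modalities must agree are assumptions made for the sketch.

```python
# Sketch: fusing a face ID and a voiceprint ID to personalize a music request.
# Profiles, catalog, and the "both modalities must agree" rule are illustrative.
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    name: str
    favorite_genres: list = field(default_factory=list)

# Toy "database" of known cabin occupants.
PROFILES = {
    "face_001": UserProfile("driver", ["jazz", "classical"]),
    "face_002": UserProfile("passenger", ["pop"]),
}

def identify_user(face_id, voiceprint_id):
    """Trust the identification only when both modalities agree."""
    if face_id == voiceprint_id and face_id in PROFILES:
        return face_id
    return None

def recommend_playlist(user_key, scene):
    """Pick songs by personal preference, then filter by driving scene."""
    profile = PROFILES[user_key]
    catalog = {
        "jazz": ["night drive jazz"],
        "classical": ["morning strings"],
        "pop": ["city pop mix"],
    }
    songs = [s for g in profile.favorite_genres for s in catalog.get(g, [])]
    if scene == "night":  # toy scene-aware rule: calmer music at night
        songs = [s for s in songs if "night" in s or "strings" in s]
    return songs
```

Requiring the two identifications to agree is one simple way to make a fused decision more robust than either modality alone.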

In fact, the significance of multi-modal interaction goes far beyond freeing up interaction inside the car. Through the chain of perception, recommendation, and interaction, it gives the car a kind of life and intelligence, enables the car to think proactively, and continuously improves in-car services and scenario iteration.

Multi-modal voice interaction will turn the car into a provider of active services

Multi-modal interaction lets the device combine users’ behavioral habits to judge intent more accurately, achieving rich, multi-dimensional intelligent interaction in the AI era. None of this remains at the concept stage: some domestic technology providers have already begun shipping multi-modal human-computer interaction products.

Prospects and challenges of putting the technology into practice

But realizing multi-modal human-car interaction technically is not easy.

The trend toward multi-modal interaction aligns with Horizon’s approach. In Horizon’s multi-modal human-car intelligent interaction solution, the camera, the microphone array, the chip platform, and the CAN bus form the core hardware. The camera handles face and gesture recognition, the microphone array captures voice, and the chip and CAN bus are responsible for communication and for passing instructions down to the output layer. This requires fusing not only technologies of different forms but also data from different sources.
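The data flow just described, perception from camera and microphone fused into an intent and sent down over the CAN bus, can be sketched as below. The message names, intents, and CAN IDs are invented for illustration; a real system would use the vehicle’s own CAN matrix.

```python
# Illustrative data flow: camera + microphone -> fused intent -> CAN frame.
# All intents, arbitration IDs, and payloads here are made up for the sketch.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Perception:
    gesture: Optional[str]  # from camera, e.g. "point_left"
    speech: Optional[str]   # from microphone array, e.g. "open the window"

@dataclass
class CanFrame:
    arbitration_id: int
    data: bytes

def fuse_intent(p: Perception) -> Optional[str]:
    # Fusion of different sources: a pointing gesture disambiguates
    # which window a spoken command refers to.
    if p.speech == "open the window":
        if p.gesture == "point_left":
            return "open_window_left"
        if p.gesture == "point_right":
            return "open_window_right"
        return "open_window_driver"  # default when no gesture is seen
    return None

def to_can(intent: str) -> CanFrame:
    # Map each high-level intent to a (hypothetical) CAN frame.
    table = {
        "open_window_left":   CanFrame(0x2A0, b"\x01"),
        "open_window_right":  CanFrame(0x2A0, b"\x02"),
        "open_window_driver": CanFrame(0x2A0, b"\x00"),
    }
    return table[intent]
```

The point of the sketch is the layering: perception modules stay independent, fusion resolves ambiguity across them, and only the final instruction reaches the bus.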


Horizon hopes to fuse vision and speech through multi-modal interaction to achieve a “1+1>2” effect.

Traditional single-mode interaction, such as voice alone, has significant limitations. Compared with a smart speaker at home, the cabin is noisy and crowded, making speech recognition harder. Perceiving the different voice commands issued by the driver and multiple passengers, and accurately attributing each instruction to the right person, has long been a problem. Ideal, a new Chinese car brand, adopted Horizon’s multi-zone recognition scheme in its first product, the Ideal ONE, using sound source localization, blind source separation, and noise-reduction algorithms to enable multi-person voice interaction in the in-car environment.
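A crude intuition for multi-zone attribution: with one microphone near each seat zone (the Ideal ONE uses four), a command can be attributed to the zone whose microphone receives the most energy. Real systems rely on sound source localization and blind source separation rather than this toy energy comparison; the zone names below are assumptions.

```python
# Toy zone attribution: pick the seat zone whose microphone hears the
# loudest signal. A stand-in for real sound source localization.

def rms(samples):
    """Root-mean-square energy of a list of audio samples."""
    return (sum(x * x for x in samples) / len(samples)) ** 0.5

def locate_zone(mic_signals):
    """mic_signals: dict mapping zone name -> list of samples.
    Returns the zone with the highest signal energy."""
    return max(mic_signals, key=lambda zone: rms(mic_signals[zone]))
```

A production pipeline would instead use inter-microphone time differences and separation algorithms, but the output is the same kind of decision: which occupant issued the command.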

But for Horizon, this is not enough.

In a recent collaboration with an OEM, Horizon provided a multi-modal command-word scheme. The scheme combines visual recognition with speech recognition: by jointly analyzing lip movement and speech, it reduces, at the level of the underlying logic, both the false wake-up rate and the false-recognition rate when no speech is present. The head of Horizon’s multi-modal products said: “Multi-modal command words are the industry’s first speech recognition algorithm to combine lip features with acoustic features. Jointly learning from the two modalities can effectively improve the command-word recognition rate in high-noise environments and reduce misrecognition when no speech is audible.”
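The intuition behind combining the two modalities can be shown with a deliberately simplified late-fusion rule: accept a command word only when both the lip-movement score and the acoustic score clear their thresholds. Horizon’s actual approach jointly learns the two modalities in one model; the scores and thresholds here are invented for illustration.

```python
# Simplified late fusion for a command word: both the camera's lip-movement
# score and the microphone's acoustic score must clear a threshold.
# Thresholds are arbitrary illustration values, not tuned parameters.

def accept_command(lip_score, speech_score,
                   lip_thresh=0.5, speech_thresh=0.6):
    # Requiring visible lip movement suppresses false wake-ups from the
    # radio or rear-seat chatter, cases where audio alone would fire.
    return lip_score >= lip_thresh and speech_score >= speech_thresh
```

Even this crude AND-gate shows why fusion lowers false alarms: speech from the radio produces a high acoustic score but no lip movement, so the command is rejected.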


The four microphones in the Ideal ONE cabin

In visual interaction, take seat adjustment as an example. Over the history of the automobile, the seat-adjustment function has kept evolving: from nothing to something, from mechanical to electric, from manual adjustment to intelligent memory. Seat memory based on face recognition can free the user’s hands completely: once the car detects that a known user has entered, it automatically adjusts the seat to that user’s stored settings, with no action required.
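In outline, face-recognition seat memory is a lookup: a recognized face ID maps to a stored seat profile that the car then applies. The profile fields, face IDs, and default values below are hypothetical.

```python
# Sketch of face-recognition seat memory: recognized face ID -> stored
# seat profile. Field names and values are invented for illustration.

SEAT_PROFILES = {
    "face_alice": {"height_mm": 320, "recline_deg": 105, "distance_mm": 540},
    "face_bob":   {"height_mm": 290, "recline_deg": 100, "distance_mm": 600},
}

FACTORY_DEFAULT = {"height_mm": 300, "recline_deg": 103, "distance_mm": 570}

def on_occupant_detected(face_id):
    """Return the seat settings to apply when an occupant sits down;
    fall back to the factory default for unknown faces."""
    return SEAT_PROFILES.get(face_id, FACTORY_DEFAULT)
```

The interesting design question is what happens for unrecognized occupants; here the sketch simply falls back to a default rather than guessing.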

In addition, as noted above, in the era of multi-modal interaction the core experience of a smart car is not limited to personalization. Proactivity is another keyword of multi-modal interaction. For example, when in-car sensors detect that an occupant is happy, the system can proactively take a snapshot. When it detects signs of dangerous driving, such as closed eyes, yawning, or playing with a phone, the vehicle can issue a reminder, and even classify the degree of fatigue to provide graded reminders or alarms.
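Graded fatigue response of the kind described above is often built on PERCLOS, the fraction of recent frames in which the driver’s eyes are closed, a common drowsiness proxy. The thresholds, the yawning bonus, and the action names below are assumptions for the sketch, not any vendor’s calibration.

```python
# Sketch of graded fatigue response using PERCLOS (fraction of frames with
# eyes closed over a sliding window). Thresholds and actions are illustrative.

def perclos(eye_closed_frames):
    """eye_closed_frames: list of booleans, one per recent video frame."""
    return sum(eye_closed_frames) / len(eye_closed_frames)

def fatigue_action(eye_closed_frames, yawning=False):
    """Map a fatigue score to a graded response, from none up to an alarm."""
    score = perclos(eye_closed_frames) + (0.15 if yawning else 0.0)
    if score >= 0.5:
        return "alarm"            # severe fatigue: audible alarm
    if score >= 0.25:
        return "strong_reminder"  # moderate: explicit voice reminder
    if score >= 0.1:
        return "gentle_reminder"  # mild: subtle cue
    return "none"
```

Combining eyelid closure with a second cue such as yawning is one simple way to escalate earlier than either signal would alone.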

Until driverless cars become widespread, the car’s intervention in dangerous driving behavior will greatly reduce the accident rate.

The iterative evolution of chip computing power and algorithms is a key driver

There is no doubt that the automotive industry is undergoing profound and rapid change. A car is no longer just a means of transportation; it is also a lifestyle, even a fashion statement, with unlimited room for service innovation. It is not only autonomous and assisted driving that are making travel safer and better. In business models there are time-sharing rentals and shared mobility; in frontier technology there are the high-speed connectivity of the Internet of Vehicles backed by 5G, and vehicle-road coordination driving infrastructure upgrades; and in in-car human-computer interaction, smarter and more advanced interactive technologies are constantly being applied. This change is like the turn from the PC era to the mobile era that Apple’s touch-screen interaction brought about.

In the near future, we will need to build a rich artificial intelligence software framework on top of the underlying computing power. It will have environmental awareness, using several kinds of sensors to build a dynamic three-dimensional model of the surrounding environment, detecting all surrounding targets and, in particular, predicting the behavior of dynamic ones for decision-making and planning. At the same time, intelligent cockpit technologies such as multi-modal human-computer interaction will deliver a richer experience: clearer displays, natural voice and visual interaction. But whether it is perception of the environment outside the car or the intelligent cockpit inside it, both place ever-growing demands on chip computing power and algorithms.

The role of artificial intelligence companies such as Horizon in the automotive industry chain is to act as a partner providing the underlying computing power, in the form of edge AI chips, to support upper-layer application software.

The rapid development of the automotive industry is driven by the development of the entire information industry, including sensors, AI chips, and state-of-the-art artificial intelligence software systems.

At present, the industry generally agrees that before the era of fully automatic driving arrives, intelligent driving functions such as ADAS (advanced driver assistance), DMS (driver monitoring systems), and multi-modal interaction will become standard modules for smart cars in the transition phase. The car’s AI perception will continue to evolve, and how to use artificial intelligence to make cars smarter and more interactive will be an important goal for car manufacturers and related technology companies.