“What we are really commercializing is the technology behind Magi: open information extraction based on transfer learning.”

Text | Huamu Sanchang

Editor | 汝晴

Peak Labs, which we have previously covered, recently released the public version of its artificial intelligence system Magi at magi.com. Through this search engine, users can enter keywords to obtain the structured knowledge that Magi has learned autonomously from Internet text, along with ordinary web search results. Each structured result is accompanied by its source links and a credibility score.

magi.com search example

Magi is a machine learning-based information extraction and retrieval system. It does not rely on semi-structured features such as HTML tags; it processes natural language text directly, with no need for preset fields or keywords. It can turn public text on the Internet and private data inside an enterprise into structured knowledge, giving users a self-updating, quantifiable, resolvable, and traceable knowledge system. The system also supports lifelong machine learning: it can openly acquire and independently learn from information on the Internet, continuously improving its ability to process natural language text.

The quality of Internet corpora is uneven. Plagiarism, automatic generation, and malicious tampering introduce many factual errors and can even make a model progressively worse as it keeps learning. Earlier systems that processed web corpora often relied on whitelists to sidestep this problem, but a whitelist discards a great deal of valuable information along with the unreliable sources. Magi instead developed its own web-wide search engine, which lets it bring the statistical signals of traditional search to bear on assessing information quality.

“In academia, the more often a paper is cited, the more influential it is; in web search, the more backlinks a URL has, the more important the page is. The same goes for knowledge: a fact that is expressed in more contexts is more likely to be correct and widely circulated,” Peak Labs founder Ji Yichao said. “Magi gives more weight to facts that come from higher-quality sources and appear in multiple contexts and expressions, because different contexts and expressions indicate that the content has been restated or interpreted from several angles, and cross-validating multiple inputs reduces the risk of the AI making mistakes.”

On the final results page, magi.com gives each result a credibility score and uses color to indicate it: green for higher credibility and red for lower.

magi.com uses color to distinguish credibility
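The cross-validation idea Ji Yichao describes can be illustrated with a small Python sketch. This is not Magi's actual scoring method, which the article does not disclose. The `Mention` structure, the noisy-OR combination rule, the `source_quality` values, and the 0.6 color threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One place where a fact was found (hypothetical structure)."""
    source_url: str
    sentence: str          # the wording used to express the fact
    source_quality: float  # assumed to lie in [0, 1]


def credibility(mentions):
    """Combine distinct wordings with a noisy-OR rule: 1 - prod(1 - quality).

    Verbatim repeats of a sentence are counted once (keeping the best source),
    so copied text adds no extra support.
    """
    best_by_wording = {}
    for m in mentions:
        key = " ".join(m.sentence.lower().split())  # normalize case and spacing
        best_by_wording[key] = max(best_by_wording.get(key, 0.0), m.source_quality)
    doubt = 1.0
    for quality in best_by_wording.values():
        doubt *= 1.0 - quality
    return 1.0 - doubt


def color_for(score, threshold=0.6):
    """Map a score to a display color; the threshold is an assumption."""
    return "green" if score >= threshold else "red"


copied = [
    Mention("a.example", "Paris is the capital of France.", 0.5),
    Mention("b.example", "Paris is the capital of France.", 0.5),
]
varied = [
    Mention("a.example", "Paris is the capital of France.", 0.5),
    Mention("c.example", "France's capital city is Paris.", 0.5),
]

print(credibility(copied), color_for(credibility(copied)))  # 0.5 red
print(credibility(varied), color_for(credibility(varied)))  # 0.75 green
```

In this toy setup, two verbatim copies score no higher than one source, while two independent wordings raise the score, which mirrors the point that varied contexts are stronger evidence than repetition.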

“In addition, it is important to emphasize that magi.com is a public showcase of our technology and the database of its background knowledge. What we are really commercializing is the technology behind Magi: open information extraction based on natural language understanding (NLU) and transfer learning,” Ji Yichao said.

Ji Yichao said that the services Magi can provide to corporate customers include:

  1. Structured data and knowledge systems. Magi learns the general background knowledge found on the Internet, which previously existed only as text that AI could not easily use directly. This service is aimed at companies that need structured data, such as makers of voice assistants and decision engines, which can retrieve information from Magi’s database through a DSL or in vectorized form to improve their products’ performance.

  2. Customized natural language understanding solutions and RPA assistance for enterprises. Building on transfer learning, Peak Labs uses its own pre-training data and the data accumulated by Magi to improve the performance of its information extraction service. Customers in vertical industries such as finance, healthcare, and consulting only need to provide a small number of samples to get a customized natural language understanding solution. In the travel industry, for example, Magi can be customized to automatically read user-written travel notes and discover points of interest (POIs) and their attributes.

Magi Customized Service Training Interface

Data is the “fuel” for AI training and a prerequisite for AI to deliver value. According to IDC statistics, the amount of data produced globally will grow from 16.1 ZB in 2016 to 163 ZB in 2025, of which 80% to 90% is unstructured data such as text, images, audio, and video. Unstructured data cannot be read directly by AI, so it must first be processed into structured data. This processing is part of what China calls artificial intelligence basic data services.

Earlier, iResearch released the “2019 China Artificial Intelligence Basic Data Service Research Report”. According to the report, China’s artificial intelligence basic data service market was worth 2.586 billion yuan in 2018, of which data resource customization services accounted for 86.2%. The market is projected to exceed 11.3 billion yuan in 2023, and the industry’s compound annual growth rate is put at 23.5%. For start-ups, there is still room to grow in this industry.

On the other hand, many vertical sub-sectors lack enough structured data to train AI models, so training AI on small samples has become a trend. “We found that the scarcity of structured data greatly limits the application of artificial intelligence in niche industries. Building a customized natural language understanding solution from scratch requires professionals and a great deal of time. It is unimaginable to ask busy doctors to set aside several months for crowdsourced annotation just to develop an AI for the medical industry,” Ji Yichao said.

From a technical point of view, the advantage of Magi’s transfer-learning NLU algorithm is that an AI engine trained mainly on general data can be adapted to a specialized vertical field. Magi is first pre-trained on Internet knowledge and its own data, so a task in a specialized vertical field needs only a small amount of manual data annotation to achieve the effect of training on large-scale data. For enterprises, this technology also lowers the cost of customizing AI.
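The pre-train-then-adapt pattern described above can be sketched in a few lines of Python. This is a toy stand-in, not Magi's NLU model: it uses a three-feature logistic regression, synthetic data, and two made-up labeling rules (`general_rule` for the plentiful general task, `vertical_rule` for a related task with only ten labels). All names and numbers here are assumptions for illustration.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() in a safe range
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, features):
    return sum(w * x for w, x in zip(weights, features))


def train(weights, examples, epochs, lr=0.5):
    """Logistic-regression SGD that starts from `weights`; returns a new list."""
    w = list(weights)
    for _ in range(epochs):
        for features, label in examples:
            error = label - sigmoid(score(w, features))
            for i, x in enumerate(features):
                w[i] += lr * error * x
    return w


def accuracy(weights, examples):
    hits = sum((score(weights, x) > 0) == (y == 1) for x, y in examples)
    return hits / len(examples)


def make_examples(rng, n, rule):
    """Draw n random 3-feature points and label them with `rule`."""
    points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n)]
    return [(p, 1 if rule(p) else 0) for p in points]


def general_rule(p):   # plentiful "general" task (synthetic)
    return p[0] + p[1] > 0


def vertical_rule(p):  # related "vertical" task (synthetic)
    return p[0] + p[1] + 0.3 * p[2] > 0


rng = random.Random(0)
general_data = make_examples(rng, 2000, general_rule)
few_labels = make_examples(rng, 10, vertical_rule)   # the small annotated sample
held_out = make_examples(rng, 1000, vertical_rule)

pretrained = train([0.0, 0.0, 0.0], general_data, epochs=5)
fine_tuned = train(pretrained, few_labels, epochs=20)          # warm start
from_scratch = train([0.0, 0.0, 0.0], few_labels, epochs=20)   # no pre-training

print("fine-tuned  :", accuracy(fine_tuned, held_out))
print("from scratch:", accuracy(from_scratch, held_out))
```

The only difference between `fine_tuned` and `from_scratch` is the starting weights, which is the essence of the warm-start form of transfer learning. The printed accuracies depend on the random seed and on how closely the two synthetic tasks are related.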

“We hope that Magi, like ImageNet, can help companies reduce the cost of customizing AI,” said Ji Yichao.