Fact-checking the claims in a paper and re-reviewing the prior literature are essential tasks for scientific researchers, but quickly verifying a claim against the countless papers already published is no simple task.

Researchers at the University of Washington and the Allen Institute for Artificial Intelligence say they have developed an AI system called VeriSci that can automatically fact-check scientific claims. Their paper, “Fact or Fiction: Verifying Scientific Claims,” was posted to the preprint site arXiv on May 1. According to the paper, the system can not only identify abstracts that support or refute a research claim, but also extract evidence from those abstracts to justify its predictions.

Fact-checking the claims in papers has another important role: it can help address the reproducibility problem in the scientific literature. Reproducing published studies has proven very difficult; in a 2016 survey of 1,500 scientists, 70% said they had tried to reproduce other researchers' experiments but failed.

Specifically, the researchers first built the SciFact corpus. The corpus contains scientific claims, abstracts that support or refute those claims, and rationale annotations. The claims were derived from citation sentences in the scientific literature, and the researchers then trained the system on models based on BERT (Bidirectional Encoder Representations from Transformers, a natural-language-processing model released by Google) so that it can identify evidence sentences and assign a label to each claim.
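To make the structure of such a corpus concrete, here is a minimal Python sketch of what one claim-plus-evidence record might look like. The class and field names (claim_id, label, rationale_sentences, and so on) are illustrative assumptions made for this article, not the schema of the released SciFact files.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative sketch only: names are assumptions, not the released SciFact schema.
@dataclass
class Claim:
    claim_id: int
    text: str  # the scientific claim, e.g. rewritten from a citation sentence

@dataclass
class EvidenceAnnotation:
    abstract_id: int   # which abstract in the corpus the evidence comes from
    label: str         # "SUPPORTS", "REFUTES", or "NOT_ENOUGH_INFO"
    rationale_sentences: List[int] = field(default_factory=list)  # sentence indices that justify the label

claim = Claim(claim_id=1, text="Drug X reduces the risk of condition Y.")
evidence = EvidenceAnnotation(abstract_id=42, label="SUPPORTS", rationale_sentences=[2, 5])
print(claim, evidence)
```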

According to the paper, the SciFact dataset contains 1,409 scientific claims fact-checked against a corpus of 5,183 abstracts. The abstracts are collected from S2ORC, a public database of millions of scientific papers. To ensure that only high-quality papers were included, the research team removed articles with fewer than 10 citations and articles missing full text, and sampled randomly from a set of well-regarded journals covering basic science and clinical medicine.
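As a rough illustration of that filtering step, the sketch below drops records with fewer than 10 citations or without full text. The field names (citation_count, has_full_text) are hypothetical; the team's actual S2ORC preprocessing may differ.

```python
# Hypothetical filtering step; field names are assumptions for illustration.
def keep_abstract(record: dict) -> bool:
    """Keep only well-cited papers whose full text is available."""
    return record.get("citation_count", 0) >= 10 and record.get("has_full_text", False)

corpus = [
    {"id": 1, "citation_count": 25, "has_full_text": True},
    {"id": 2, "citation_count": 3, "has_full_text": True},    # dropped: fewer than 10 citations
    {"id": 3, "citation_count": 40, "has_full_text": False},  # dropped: full text missing
]
print([r["id"] for r in corpus if keep_abstract(r)])  # -> [1]
```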

Meanwhile, to label SciFact, the researchers recruited a group of annotators. Each annotator's job was to find a citation sentence in the context of the original article and rewrite it into up to three claims, making sure each claim preserved the original meaning. Annotators with natural-language-processing expertise were responsible for negating claims in order to obtain examples of abstracts that refute a claim, taking care not to introduce obviously biased claims. Annotators then labeled each paper abstract as supporting, refuting, or providing insufficient information, and marked the rationale sentences behind each supporting or refuting label. The researchers also introduced distractors and avoided having the same citation sentence appear in different passages of the same article.



Once the SciFact dataset was built, the VeriSci model was trained as a pipeline with three parts: abstract retrieval, which retrieves the abstracts most similar to a given claim; rationale selection, which identifies the rationale sentences in each candidate abstract; and label prediction, which makes the final label decision. In their experiments, the researchers report that the system correctly identifies the supporting or refuting label and provides reasonable evidence about half (46.5%) of the time.
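As a rough sketch of that three-stage pipeline, the toy Python below retrieves candidate abstracts, selects rationale sentences, and predicts a label. The word-overlap similarity and keyword rules are simple stand-ins for the trained BERT components described in the paper, and all function names and data are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List

def retrieve_abstracts(claim: str, corpus: Dict[int, List[str]], k: int = 3) -> List[int]:
    """Abstract retrieval: return ids of the k abstracts most similar to the claim.
    Naive word overlap stands in for a real similarity measure."""
    def overlap(sentences: List[str]) -> int:
        words = set(claim.lower().split())
        return sum(len(words & set(s.lower().split())) for s in sentences)
    return sorted(corpus, key=lambda i: overlap(corpus[i]), reverse=True)[:k]

def select_rationales(claim: str, sentences: List[str]) -> List[int]:
    """Rationale selection: pick sentences that look relevant to the claim.
    A real system would score each (claim, sentence) pair with a BERT-style model."""
    words = set(claim.lower().split())
    return [i for i, s in enumerate(sentences) if len(words & set(s.lower().split())) >= 2]

def predict_label(claim: str, rationales: List[str]) -> str:
    """Label prediction: decide SUPPORTS / REFUTES / NOT_ENOUGH_INFO from the rationales.
    Keyword matching here is only a placeholder for a trained classifier."""
    if not rationales:
        return "NOT_ENOUGH_INFO"
    text = " ".join(rationales).lower()
    return "REFUTES" if any(w in text for w in ("no effect", "not", "fails")) else "SUPPORTS"

# Tiny end-to-end run on toy data.
corpus = {
    0: ["Drug X lowered blood pressure in a randomized trial.", "Participants were followed for a year."],
    1: ["Dietary fiber improves gut health.", "No effect of drug X on blood pressure was observed."],
}
claim = "Drug X lowers blood pressure."
for abstract_id in retrieve_abstracts(claim, corpus):
    idx = select_rationales(claim, corpus[abstract_id])
    rationales = [corpus[abstract_id][i] for i in idx]
    print(abstract_id, predict_label(claim, rationales), rationales)
```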

To demonstrate the system's versatility, the research team ran an experiment on scientific papers about the novel coronavirus. According to the paper, most of the verdicts VeriSci produced for coronavirus-related claims (23 out of 36) were judged plausible by annotators with medical expertise, indicating that the model can successfully retrieve evidence and classify claims.

But VeriSci is not perfect: it is often confused by context, struggles to synthesize arguments, and fails to integrate information from different sources when making a judgment.

“Scientific fact verification poses a set of unique challenges that push the limits of neural models in complex language understanding and reasoning,” the researchers write in the paper. “Despite its small size, training VeriSci on SciFact works better than training on fact-checking datasets built from Wikipedia articles and political news. This offers hope, but our findings show that additional work is needed to improve the performance of end-to-end fact-checking systems.”