This article is reposted from the WeChat public account “Machine’s Ability”.

The outbreak of new coronavirus pneumonia in Wuhan continues to spread. As of 7:00 on January 30, 2020, the number of confirmed cases had reached 7,201, surpassing the number recorded for SARS in 2003. As confirmed cases increase, it is necessary to identify, as soon as possible, the potential hosts and intermediate hosts that may be infected with the Wuhan 2019 new coronavirus (2019-nCoV), and to cut off the chain of virus transmission.

A recent research paper, using a deep learning-based method for predicting viral hosts, reports that bats and mink may be two potential hosts of the new coronavirus, and that mink may be an intermediate host.

This method differs from traditional detection methods and can be regarded as a major breakthrough for AI technology in virus detection.

A study by Zhu Huaiqiu, a professor at Peking University School of Engineering, entitled “Deep Learning Algorithms Predicting the Host and Infectivity of New Coronaviruses,” was published on the bioRxiv preprint platform on January 25.

This study proposes a deep learning-based virus host prediction method that takes a DNA sequence as input and predicts which hosts the virus can infect. The method is then applied to the Wuhan 2019 new coronavirus (2019-nCoV).

To construct the viral host prediction (VHP) model, Zhu Huaiqiu’s team used a bi-path convolutional neural network (BiPathCNN), in which each viral sequence is represented by two one-hot matrices, one of its bases and one of its codons.
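As a minimal sketch of what a base-level and a codon-level matrix representation could look like, the following encodes each position as a row with a single 1 marking the base or codon found there. The function names, the codon ordering, the single reading frame, and the all-zero rows for ambiguous symbols are illustrative assumptions, not details taken from the paper.

```python
from itertools import product

BASES = "ACGT"
# All 64 codons in a fixed order (an illustrative choice, not from the paper).
CODONS = ["".join(p) for p in product(BASES, repeat=3)]


def one_hot(symbols, alphabet):
    """Encode each symbol as a row with a single 1; unknown symbols stay all-zero."""
    index = {s: i for i, s in enumerate(alphabet)}
    rows = []
    for symbol in symbols:
        row = [0] * len(alphabet)
        if symbol in index:
            row[index[symbol]] = 1
        rows.append(row)
    return rows


def encode_sequence(seq, frame=0):
    """Return (base_matrix, codon_matrix) for one nucleotide sequence.

    Codons are read as non-overlapping triplets starting at `frame`
    (hypothetical parameter); a trailing partial codon is dropped.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return one_hot(seq, BASES), one_hot(codons, CODONS)


base_matrix, codon_matrix = encode_sequence("ATGAAATAGN")
print(len(base_matrix), "x", len(base_matrix[0]))    # positions x 4 bases
print(len(codon_matrix), "x", len(codon_matrix[0]))  # codons x 64 codon types
```

In a two-path design, each matrix would feed its own convolutional path before the features are merged; that wiring is not shown here.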

The idea behind a bi-path convolutional neural network is that convolutional networks of the same structure, fed with the same data set, can still extract different features, and the model exploits this difference. The approach was originally proposed as a model for image classification.

To account for differences in input sequence length, the study built two networks, BiPathCNN-A and BiPathCNN-B, which predict hosts for viral sequences of 100 bp to 400 bp and 400 bp to 800 bp, respectively.
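The length-based division between the two networks can be sketched as a simple routing rule. The function name is hypothetical, and sending a sequence of exactly 400 bp to BiPathCNN-B is an assumption, since the article gives overlapping range boundaries. How sequences outside 100 bp to 800 bp are handled is not described in the article, so the sketch returns None for them.

```python
def choose_model(seq_length):
    """Pick the network for a sequence of the given length in base pairs.

    Assumption: 400 bp goes to BiPathCNN-B; the article does not say
    which network handles the shared boundary.
    """
    if 100 <= seq_length < 400:
        return "BiPathCNN-A"
    if 400 <= seq_length <= 800:
        return "BiPathCNN-B"
    return None  # outside the ranges described in the article


for length in (150, 400, 650, 1200):
    print(length, choose_model(length))
```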

Professor Zhu Huaiqiu, Vice Dean, School of Engineering, Peking University

Zhu Huaiqiu’s team divided the hosts of viruses into five categories: plants, bacteria, invertebrates, vertebrates, and humans.

In practical use, given a viral nucleotide sequence as input, VHP outputs a score for each host type, reflecting the infectivity within that host type.

The research suggests that bat coronaviruses have an infectivity pattern more similar to that of the new coronavirus than coronaviruses infecting other vertebrates do. In addition, a comparison of the infectivity patterns of viruses from all vertebrate hosts found that the pattern of mink viruses is closest to that of the new coronavirus.

The research shows that the six genomes of the new coronavirus are highly likely to infect humans. The predictions suggest that the new coronavirus is about as infectious as SARS-CoV, bat SARS-like CoV, and Middle East Respiratory Syndrome Coronavirus (MERS-CoV).

Deep learning-based methods for inferring virus hosts have in fact already been applied in recent years. They can reduce repetitive work in the virus detection process and can be regarded as an important breakthrough for AI in fighting epidemics.

In November 2018, a University of Glasgow research team released a new artificial intelligence study: scientists can use a new machine learning algorithm to predict, at the genetic level, the natural hosts of viruses such as Ebola and Zika, and take measures to prevent these viruses from spreading to humans.

Of course, human understanding of diseases is still relatively limited. Given the complexity of viruses and disease types, it is difficult for artificial intelligence to fully replace human judgment at this stage. In most cases AI has an advantage in processing complex data, but its conclusions cannot be fully guaranteed, and the final diagnosis and judgment still need to be confirmed by humans.

The following is the main content of the paper published by Zhu Huaiqiu, a professor at the School of Engineering, Peking University.

Report name: Deep learning algorithm predicts host and infectivity of new coronavirus

Report version: The report was published on January 25 on the preprint platform bioRxiv

Research findings:


The study suggests that bat coronaviruses have an infectivity pattern more similar to that of the new coronavirus than coronaviruses infecting other vertebrates do. In addition, a comparison of the infectivity patterns of viruses from all vertebrate hosts found that the pattern of mink viruses is closest to that of the new coronavirus.

The research shows that the six genomes of the new coronavirus are highly likely to infect humans. The predictions suggest that the new coronavirus is about as infectious as SARS-CoV, bat SARS-like CoV, and Middle East Respiratory Syndrome Coronavirus (MERS-CoV).

Research methods:


The study uses the VHP (Virus Host Prediction) method to report predictions of the hosts of 2019-nCoV.

Virus sequence data released before 2018 are used to build the training set, while data released after 2018 are used for testing. The data set for training and testing includes the genomes of all DNA viruses, the coding sequences of all RNA viruses, and their host information in GenBank. In VHP's predictions for 2019-nCoV, the score values reflect the infectivity of the new virus, and the score pattern and p-value pattern reflect its infectivity pattern.
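A split by release date like the one described above can be sketched as follows. The record layout (a dict with a "released" date) and the exact cutoff of January 1, 2018 are assumptions made for illustration; the article only says "before 2018" and "after 2018".

```python
from datetime import date

# Assumed boundary: the article does not give an exact cutoff date.
CUTOFF = date(2018, 1, 1)


def temporal_split(records, cutoff=CUTOFF):
    """Split records so the model is tested only on sequences released later."""
    train = [r for r in records if r["released"] < cutoff]
    test = [r for r in records if r["released"] >= cutoff]
    return train, test


# Hypothetical records; real entries would come from GenBank metadata.
records = [
    {"id": "virus_a", "released": date(2015, 6, 1), "host": "vertebrates"},
    {"id": "virus_b", "released": date(2017, 12, 31), "host": "plants"},
    {"id": "virus_c", "released": date(2019, 3, 15), "host": "humans"},
]
train_set, test_set = temporal_split(records)
print([r["id"] for r in train_set], [r["id"] for r in test_set])
```

Splitting by date rather than at random mimics the real task: the model sees only older viruses during training and is then asked about ones that appeared afterwards.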

After the whole genome sequences were published online, Zhu Huaiqiu’s team predicted potential hosts for 2019-nCoV, as well as for 44 other coronaviruses in NCBI RefSeq and 4 bat SARS-like coronaviruses in GenBank. The results show that the six genomes of 2019-nCoV are highly likely to infect humans (p-value < 0.05).

In addition, most coronaviruses known to infect humans receive the lowest p-values among the VHP predictions. The similar probabilities obtained for 2019-nCoV and other human coronaviruses indicate the high risk posed by 2019-nCoV.

VHP method and algorithm verification:

To construct the VHP model, Zhu Huaiqiu’s team used a bi-path convolutional neural network (BiPathCNN), in which each viral sequence is represented by two one-hot matrices, one of its bases and one of its codons.

To account for differences in input sequence length, two networks (BiPathCNN-A and BiPathCNN-B) were built to predict hosts for viral sequences of 100 bp to 400 bp and 400 bp to 800 bp, respectively.

The data set used for training and testing includes the genomes of all DNA viruses, the coding sequences of all RNA viruses, and their host information in GenBank. To evaluate how well the method predicts the potential host types of new viruses, the training set was built from virus sequence data released before 2018, and testing used virus sequence data released after 2018.

The hosts of viruses are divided into five categories: plants, bacteria, invertebrates, vertebrates, and humans.

The figure above details the host subtypes included in these five types. In practical use, given a viral nucleotide sequence as input, VHP outputs a score for each host type, reflecting the infectivity within that host type. In addition, VHP provides five p-values, which measure statistically how distinct an infection is from non-infection events.
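The article does not say how VHP computes its p-values. One common way to express how far a score stands out from non-infection events is an empirical p-value, sketched below under that assumption: the share of background scores (from viruses known not to infect the host type) that are at least as high as the observed score. The function name and the sample numbers are hypothetical.

```python
def empirical_p_value(observed_score, background_scores):
    """Fraction of non-infection scores at least as high as the observed one.

    The +1 in numerator and denominator keeps the estimate above zero,
    a standard correction for empirical p-values. This is an assumed
    stand-in, not the calculation used by VHP.
    """
    if not background_scores:
        raise ValueError("background_scores must not be empty")
    at_least = sum(1 for s in background_scores if s >= observed_score)
    return (at_least + 1) / (len(background_scores) + 1)


# Hypothetical scores of non-infecting viruses for one host type.
background = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.31, 0.40, 0.45]
print(empirical_p_value(0.95, background))  # small value: score stands out
print(empirical_p_value(0.15, background))  # larger value: score is ordinary
```

Under this reading, a low p-value for the human category means the virus scores higher than nearly all viruses that are known not to infect humans.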

To evaluate the performance of VHP, Zhu Huaiqiu’s team compared the AUC (area under the ROC curve) of BLAST and VHP. The comparison shows that the average AUC of VHP is high (see the figure below).
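For readers unfamiliar with AUC, a minimal sketch of the metric follows: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The labels and scores are made-up values, not results from the paper, and the pairwise loop is chosen for clarity rather than speed.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = virus infects the host type, 0 = it does not.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auc(labels, scores))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, which is why the metric allows two methods with different score scales to be compared.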

The report predicts the likelihood that 2019-nCoV infects humans and points to the risk that 2019-nCoV poses.

The report also shows that the VHP model can play an important role in public health services: by providing reliable predictions of a virus's hosts and of its potential to infect humans, it offers a powerful tool for guarding against new viruses that may infect humans.