This article is from the WeChat public account Heart of the Machine (ID: almosthuman2014). Author: Joseph Bullock et al.; compiled by Almost Human.

With the COVID-19 epidemic under way, researchers around the world are working to mitigate it. Their research focuses include tracking virus transmission, facilitating virus detection, developing vaccines, finding new treatments, and understanding the socio-economic impact of the epidemic. In this review article, researchers from Durham University, the University of Montreal, WHO, and other institutions explore the role of AI-related technologies in the epidemic and summarize applications of AI at three levels: medical, molecular, and social.

Specifically, the molecular level covers research such as drug discovery; the medical level covers the diagnosis and treatment of individual patients; and the social level covers epidemiological and infodemic research. In addition, the paper reviews currently available open source datasets and other resources.

The purpose of this review is not to evaluate the importance of the technologies it describes, nor to recommend them, but to show readers the scope of application of current AI technology in the fight against the epidemic.

Paper link: https://drive.google.com/file/d/1vDcb6HeS-hufNgqH0dDhIEGjuJpnnkzT/view

Medical level: from diagnosis to outcome prediction

To date, applications of AI to COVID-19 at the medical level have mostly focused on diagnosis from medical imaging. Besides AI-assisted CT diagnosis, recent literature also covers non-invasive detection methods for disease monitoring and methods that use patient medical data to predict disease progression.

Medical imaging diagnosis

The RT-PCR test is the key method for diagnosing COVID-19, but it has limitations in sample collection and analysis time, so attention has increasingly turned to medical imaging for COVID-19 diagnosis. COVID-19 has distinctive radiological features and image patterns. These can be observed on CT scans, but identifying them is time-consuming for radiology staff. Machine learning methods are therefore a natural fit for CT-based diagnosis.

Many studies treat diagnosis as a binary classification problem: “healthy” versus “COVID-19 positive.”

Wang et al. used a modified Inception neural network architecture, trained on regions identified by doctors, to perform binary classification of healthy patients and COVID-19 patients. Using a dataset of approximately 1,000 image slices from 259 patients, the researchers trained a model that identifies suspected COVID-19 cases and passes the results to doctors for further diagnosis.

Chen et al. also found that a UNet++ neural network trained on more than 6,000 CT image slices labeled by professional doctors achieves performance close to that of professional doctors. The model was later deployed at Wuhan University People’s Hospital to help doctors speed up case analysis and diagnosis, and it is now open source.

Other machine learning methods frame diagnosis as a three-class task: healthy, COVID-19 patients, and patients with other types of pneumonia.

In the studies by Xu et al. and Song et al., the classic ResNet architecture is used for feature extraction. Xu et al. added several fully connected layers for classification, while Song et al. added a Feature Pyramid Network and an attention module, which make the network more complex but improve its performance on fine-grained image details.

Both studies show that this approach can distinguish cases accurately even when several diagnoses are plausible (including non-COVID-19 types of pneumonia).

Furthermore, some studies have adopted a hybrid approach, combining existing software with specific machine learning methods for greater accuracy.

In the study by Gozes et al., a commercial medical imaging program processes the raw images, and its output is then used in conjunction with an ML pipeline. This two-step approach includes a U-Net architecture trained on medical images of lung abnormalities and a ResNet-50 pretrained on ImageNet and fine-tuned to classify images as “Coronavirus” or “Healthy”.

In the study by Shan et al., a “human-in-the-loop” approach was adopted to reduce the labeling time required for machine learning architectures. The researchers use a small amount of manually labeled data to train an initial model based on the V-Net architecture.

This model segments new CT scan images, a professional doctor corrects the segmentations, and the corrections are fed back to the model continuously over successive iterations. This approach allows deep learning systems to be used for automatic segmentation and quantification of infected areas, as well as for assessing the severity of a patient’s COVID-19 condition.

The study shows that the model’s performance improves gradually. After training on 200 annotated samples, the manual time required to analyze a new image falls from more than 30 minutes to a little over 5 minutes. This method combines the advantages of machine learning with human expertise and is a promising research direction.
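The iterative loop described for Shan et al. can be sketched roughly as follows. This is only an illustration of the control flow, not the authors’ code: the `train` and `correct` callables, the `Scan` and `Mask` types, and the batch structure are all hypothetical placeholders.

```python
from typing import Callable, List, Tuple

# All names below are hypothetical placeholders, not the authors' code.
Scan = List[float]   # stand-in for a CT volume
Mask = List[int]     # stand-in for a segmentation mask


def human_in_the_loop(
    train: Callable[[List[Tuple[Scan, Mask]]], Callable[[Scan], Mask]],
    correct: Callable[[Scan, Mask], Mask],
    seed_data: List[Tuple[Scan, Mask]],
    batches: List[List[Scan]],
) -> Callable[[Scan], Mask]:
    """Train on a small labeled seed set, then grow it with expert-corrected predictions."""
    labeled = list(seed_data)
    model = train(labeled)
    for batch in batches:
        for scan in batch:
            proposal = model(scan)                 # model segments a new scan
            fixed = correct(scan, proposal)        # expert corrects the proposal
            labeled.append((scan, fixed))
        model = train(labeled)                     # retrain on the enlarged set
    return model
```

The point of the design is that the expert only edits a proposed mask instead of drawing one from scratch, which is where the reported time savings would come from.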

Non-invasive measurements for disease tracking

Another novel method, which requires no specialized medical imaging equipment, uses a Kinect depth camera to identify a patient’s breathing pattern.

This method is based on recent clinical findings on the symptoms of COVID-19 patients, namely that their breathing pattern, marked by shortness of breath, differs from that of patients with influenza or the common cold.

Based on this clinical information, the researchers developed a bidirectional GRU neural network with an attention mechanism and used it to identify abnormal breathing patterns.
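To make the attention step concrete, here is a minimal sketch of attention pooling over a sequence of per-time-step feature vectors, such as the outputs a recurrent network would produce. The GRU itself is omitted, and the dot-product scoring against a single `query` vector is an assumption for illustration; the paper’s attention layer may take a different form.

```python
import math
from typing import List


def attention_pool(hidden_states: List[List[float]], query: List[float]) -> List[float]:
    """Weight each time step by softmax(h_t . query) and return the weighted sum."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    peak = max(scores)                               # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]              # attention weights sum to 1
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
```

With a zero query every time step gets equal weight, so the result is a plain average; a learned query lets the model emphasize the parts of the breathing signal that matter most.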

The researchers trained the model on real data from 20 participants plus a large amount of simulated data based on real recordings. Although abnormal breathing patterns do not necessarily indicate a true COVID-19 diagnosis, predicting shortness of breath can serve as a first-pass diagnostic feature to help screen potential patients on a large scale.

Other solutions use mobile phones to detect COVID-19: some use embedded sensors to identify COVID-19 symptoms, and others flag high-risk patients based on their answers to key questions in a mobile phone questionnaire. Although these are all important attempts in mobile technology, current research is insufficient to evaluate their feasibility and performance.

Patient outcome prediction

Yan et al. proposed a prediction method based on patients’ clinical data and features from blood sample tests. The method can help clinicians identify high-risk patients as early as possible, with the aim of improving prognosis and reducing mortality among critically ill patients.

This research uses a prediction model based on the XGBoost algorithm to predict the risk of death and to identify key measurable features that can be tested in hospitals. Using data from 375 patients, the authors screened out three key clinical indicators from more than 300 input features, providing a clinically useful heuristic for predicting patient mortality. A major advantage of this method is its interpretability: the three selected indicators relate to several of the most important factors in the pathological progression of COVID-19, namely cell damage, cellular immunity, and inflammation.
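The idea of screening a few predictive indicators out of many input features can be illustrated with a much simpler stand-in than XGBoost: score each feature by how well a single threshold on it separates the two outcomes, then rank the features. This is a toy sketch, not the authors’ method; XGBoost builds ensembles of trees and derives feature importance from them, and the feature names and data used with these functions would be hypothetical.

```python
from typing import List, Sequence, Tuple


def best_stump_accuracy(values: Sequence[float], labels: Sequence[int]) -> float:
    """Best accuracy of a one-threshold rule on a single feature (either direction)."""
    best = 0.0
    for threshold in sorted(set(values)):
        predicted = [1 if v >= threshold else 0 for v in values]
        hits = sum(p == y for p, y in zip(predicted, labels))
        accuracy = max(hits, len(labels) - hits) / len(labels)  # allow the flipped rule
        best = max(best, accuracy)
    return best


def rank_features(rows: List[Sequence[float]], labels: Sequence[int],
                  names: Sequence[str]) -> List[Tuple[str, float]]:
    """Score every feature column with a stump and sort from most to least predictive."""
    scored = [
        (name, best_stump_accuracy([row[i] for row in rows], labels))
        for i, name in enumerate(names)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

A ranking like this is easy to explain to clinicians, which mirrors the interpretability argument made for the three selected indicators.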

A complementary study trained a U-Net variant on semi-automatically labeled CT images, with the aim of predicting whether COVID-19 patients would need a long hospital stay. This means that once the initial diagnosis is complete, machine learning can still be used to predict the severity of the patient’s condition and whether a long-term hospital stay is required.

Molecular level: from proteins to drug discovery

Prediction of protein structure

The 3D structure of a protein is determined by its genetic sequence, and that structure affects the protein’s role and function. Protein structure is generally determined by experimental methods such as X-ray crystallography, but these methods are expensive and time-consuming.

Computational models have recently been used to predict protein structure. There are two main approaches: template-based modeling, which predicts a protein’s structure using similar proteins as templates, and template-free modeling, which predicts the structure of proteins that have no known similar structures.

At the end of 2018, Google DeepMind launched AlphaFold, which predicts protein structure from the genetic sequence. Given a new protein, AlphaFold uses neural networks to predict the distances between pairs of amino acids and the angles between the chemical bonds that connect them. Building on these two physical properties, DeepMind also trained a neural network to predict a separate distribution of distances between every pair of residues in the protein; these probabilities can be combined into a score that estimates the accuracy of a proposed protein structure. So far, AlphaFold has been used to predict the structures of 6 proteins related to SARS-CoV-2: the SARS-CoV-2 membrane protein, protein 3a, Nsp2, Nsp4, Nsp6, and papain-like protease.
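One simple way to combine per-pair distance probabilities into a single score is to sum the negative log-probability that each predicted distribution assigns to the distance found in a candidate structure. The sketch below assumes this form purely for illustration; the binning scheme, the `bin_edges` units, and the function name are assumptions, not AlphaFold’s actual implementation.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def structure_score(
    predicted: Dict[Pair, List[float]],  # per residue pair: probability of each distance bin
    bin_edges: List[float],              # upper edge of each bin, in angstroms (assumed)
    distances: Dict[Pair, float],        # pairwise distances in a candidate structure
) -> float:
    """Sum of negative log-probabilities; lower means a better fit to the predictions."""
    total = 0.0
    for pair, probs in predicted.items():
        d = distances[pair]
        # index of the first bin whose upper edge covers d; overflow goes to the last bin
        idx = next((i for i, edge in enumerate(bin_edges) if d <= edge), len(probs) - 1)
        total += -math.log(max(probs[idx], 1e-9))   # clamp to avoid log(0)
    return total
```

A candidate structure whose residue pairs sit at distances the network considers likely gets a lower score than one that places them in improbable bins.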

Improving viral nucleic acid testing

Machine learning and new genomic technologies are also being used to improve on RT-PCR testing. Metsky et al. used CRISPR (a tool that uses an enzyme to cut the genetic code at a specific site and edit the genome) to design assays for detecting 67 respiratory viruses, including SARS-CoV-2. In addition, ML models can speed up the design of assays that are predicted to be sensitive and specific and to cover multiple genomes.

Drug repurposing

One way to discover existing drugs that could be used to treat COVID-19 is the biomedical knowledge graph. Biomedical knowledge graphs are networks that capture the links between different entities, such as proteins and drugs, so that researchers can better understand how they relate to one another.

Richardson et al. used a biomedical knowledge graph to identify baricitinib, a drug commonly used to treat arthritis. Because it can inhibit AP2-associated protein kinase 1 (AAK1), making it harder for the virus to enter host cells, the drug may be suitable for treating COVID-19.

Ge et al. proposed a similar method, constructing a knowledge graph of human proteins, viral proteins, and drugs from datasets that capture the relationships between these entities. The knowledge graph is used to predict drug candidates that may be effective. The authors identified the poly-ADP-ribose polymerase 1 (PARP1) inhibitor CVL218, which is currently in clinical trials.
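The kind of query a knowledge graph supports can be shown with a tiny example: store facts as (head, relation, tail) triples and look for drugs that inhibit a human protein the virus depends on. This is a hand-written two-hop lookup, not the learned link prediction used in the studies above, and every entity and relation name in `TOY_TRIPLES` is a made-up placeholder.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

# Entirely hypothetical toy facts, for illustration only.
TOY_TRIPLES: List[Triple] = [
    ("virus_X", "uses", "protein_A"),
    ("drug_1", "inhibits", "protein_A"),
    ("drug_2", "inhibits", "protein_B"),
]


def candidate_drugs(triples: List[Triple], virus: str) -> Dict[str, Set[str]]:
    """Return drugs that inhibit a protein the virus relies on, with those proteins."""
    used_by_virus = {t for h, r, t in triples if h == virus and r == "uses"}
    hits: Dict[str, Set[str]] = defaultdict(set)
    for head, relation, tail in triples:
        if relation == "inhibits" and tail in used_by_virus:
            hits[head].add(tail)
    return dict(hits)
```

Calling `candidate_drugs(TOY_TRIPLES, "virus_X")` returns only `drug_1`, because `drug_2` targets a protein the toy virus does not use.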

Other studies have approached drug repurposing with models built to predict protein-ligand binding affinity. Hu et al. used a multitask neural network to make general affinity predictions. The authors identified a series of SARS-CoV-2-related proteins, such as RNA-dependent RNA polymerase, 3C-like protease, helicase, and envelope protein, as therapeutic targets, and screened a dataset of 4,895 drugs against them. They recommend 10 drugs that might work, along with their target proteins and binding affinity scores. To improve the interpretability of the model, they also predicted the precise sites where binding to each target protein might occur.

Similarly, Beck et al. used their Molecule Transformer-Drug Target Interaction (MT-DTI) binding affinity model to identify antiviral drugs approved by the US Food and Drug Administration (FDA) that may act on 6 coronavirus proteins (3C-like protease, RNA-dependent RNA polymerase, helicase, 3'-to-5' exonuclease, endoRNAse, and 2'-O-ribose methyltransferase). The MT-DTI model takes string data as input, in the form of SMILES strings and amino acid sequences, and applies a text modeling method borrowed from the BERT algorithm.
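Treating molecules and proteins as text starts with tokenization. The sketch below shows one plausible way to split a SMILES string and an amino acid sequence into tokens before they are fed to a text model. The regular expression and function names are assumptions for illustration and are not taken from MT-DTI; the pattern is simplified and, for example, does not handle two-digit ring closures written with `%`.

```python
import re
from typing import List

# Bracket atoms, two-letter halogens, then any single character (atoms, bonds, ring digits).
_SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into atom, bond, and ring-closure tokens."""
    return _SMILES_TOKEN.findall(smiles)


def tokenize_protein(sequence: str) -> List[str]:
    """Treat each one-letter amino acid code as a token."""
    return list(sequence.upper())
```

Once both inputs are token sequences, they can be mapped to integer IDs and processed with the same machinery used for natural-language text.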

Finally, Zhang et al. used a dense fully connected neural network trained on the PDBbind dataset to predict binding affinity, in order to identify potential inhibitors of the 3C-like protease. They built a homology (template-based) model using a SARS virus variant and searched existing compound libraries (such as ChemDiv and TargetMol) and tripeptide datasets for therapeutics that target the protein.

Drug discovery

Some researchers are trying to find new compounds to treat COVID-19. Zhavoronkov et al. (2020a) used a proprietary pipeline to find inhibitors of the 3C-like protease. Their models use three inputs: the protein crystal structure, co-crystallized ligands, and a homology model of the protein. For each input type, the researchers fit 28 different models, including generative autoencoders and generative adversarial networks. They use reinforcement learning to explore potential drug candidates, with a reward function tied to criteria such as drug-likeness, novelty, and diversity. They also confirmed that the identified candidate compounds differ from existing compounds, indicating that the drugs they found are indeed new.

Tang et al. (2020) also used reinforcement learning for drug discovery. The researchers compiled 284 known molecules that inhibit SARS-like viruses, split these molecules into 316 fragments, and then combined the fragments using advanced deep Q-learning for drug design. The reinforcement learning reward function has three components: a drug-likeness score, the inclusion of predefined “favorable” fragments, and the presence of known pharmacophores (specific structures related to a compound’s efficacy).
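A reward built from these three components could look like the following sketch. The weighted-sum form, the default weights, and all parameter names are hypothetical choices made for illustration; the source does not say how the components are actually combined.

```python
from typing import Iterable, Set, Tuple


def fragment_reward(
    drug_likeness: float,            # score in [0, 1], e.g. from an external estimator
    fragments: Iterable[str],        # fragments making up the candidate molecule
    favored_fragments: Set[str],     # predefined set of preferred fragments
    has_pharmacophore: bool,         # whether a known pharmacophore is present
    weights: Tuple[float, float, float] = (1.0, 0.5, 1.0),  # hypothetical weights
) -> float:
    """Combine the three criteria into one scalar reward for the RL agent."""
    w_drug, w_frag, w_pharm = weights
    n_favored = sum(1 for f in fragments if f in favored_fragments)
    return w_drug * drug_likeness + w_frag * n_favored + w_pharm * float(has_pharmacophore)
```

In a deep Q-learning setup, a value like this would be returned after each fragment-level edit so that the agent learns which modifications move a molecule toward more promising candidates.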

As a result, 4,922 candidates were filtered by heuristic search. Finally, the top 47 compounds were evaluated in molecular simulations. The researchers select the compounds most likely to be effective for synthesis and testing.

Social level: epidemiology and infodemiology

Epidemiology

Epidemiological research covers a very wide area. The scale and importance of the epidemic, and the availability of data updated in real time, have led to many kinds of modeling. This review, however, focuses on cases in which machine learning is used for epidemiological modeling.

Given how quickly the epidemic spreads, short-term real-time forecasting is an important source of information. At the same time, models must be flexible enough to adapt to changing protocols and procedures.

Hu et al. (2020b)† used data collected by WHO and others from January 11 to February 27, 2020 to build a new dataset of cumulative and newly confirmed cases in China. This dataset is used to train a modified autoencoder (MAE) to forecast new cases in real time and to estimate the severity and duration of the epidemic.

Similarly, Al-qaness et al. used an adaptive neuro-fuzzy inference system (ANFIS) (Jang, 1993), with the flower pollination algorithm (FPA) (Yang, 2012) and the salp swarm algorithm (SSA) (Mirjalili et al., 2017) to optimize the parameters of the model.

Mizumoto et al. (2020) used ML methods on infection data collected from the Diamond Princess cruise ship to understand the incidence of asymptomatic cases. The authors modeled the time series with Bayesian analysis, using Hamiltonian Monte Carlo (HMC) with the No-U-Turn Sampler (Hoffman & Gelman, 2014) to fit the model parameters and estimate the probability of asymptomatic infection. Although analysis in this closed environment is important, it remains to be seen whether the results carry over to the wider population.
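The Bayesian idea behind estimating an asymptomatic proportion can be shown with a much simpler model than the one in the study. The sketch below swaps the time-series model and HMC/NUTS sampling for a conjugate Beta-Binomial update, which has a closed-form posterior; it ignores reporting delays and censoring, and any counts passed to it here are toy values rather than data from the cruise ship.

```python
from typing import Tuple


def asymptomatic_posterior(asymptomatic: int, infected: int,
                           prior_a: float = 1.0, prior_b: float = 1.0) -> Tuple[float, float]:
    """Beta posterior parameters for the asymptomatic proportion under a Beta prior."""
    if not 0 <= asymptomatic <= infected:
        raise ValueError("asymptomatic must be between 0 and infected")
    return prior_a + asymptomatic, prior_b + (infected - asymptomatic)


def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)
```

For example, with a uniform prior and toy counts of 20 asymptomatic cases among 98 infections, the posterior is Beta(21, 79). Samplers such as HMC become necessary once the model includes latent quantities, like unobserved symptom onset times, for which no closed form exists.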

Infodemiology

Social media and online platforms have become the main channels for spreading outbreak-related information, so the review also looks at the “infodemic”, that is, the spread of misinformation and rumors.

Cinelli et al. (2020)† analyzed COVID-19-related content on social media. Using COVID-19 keywords, the authors collected 8 million comments and posts from Twitter, Instagram, YouTube, Reddit, and Gab, dated from January 1 to February 14, 2020. They estimate engagement with the COVID-19 topic and compare how the topic evolved across platforms. Engagement is measured as the cumulative number of posts and reactions to posts (e.g. comments, likes) over 45 days. The authors use a phenomenological model (Fisman et al., 2013) and the classic SIR model to characterize how information spreads and reproduces.
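For readers unfamiliar with it, the classic SIR model splits a population into susceptible (S), infected (I), and recovered (R) groups and evolves them with two rates. The sketch below integrates the standard equations with a simple forward-Euler scheme; the parameter values a caller passes in are illustrative assumptions, and the study’s own fitting procedure is not reproduced here.

```python
from typing import List, Tuple


def simulate_sir(beta: float, gamma: float, s0: float, i0: float, r0: float,
                 days: int, dt: float = 0.1) -> List[Tuple[float, float, float]]:
    """Forward-Euler integration of dS/dt=-beta*S*I/N, dI/dt=beta*S*I/N-gamma*I, dR/dt=gamma*I."""
    n = s0 + i0 + r0
    s, i, r = s0, i0, r0
    steps_per_day = round(1 / dt)
    trajectory = [(s, i, r)]
    for _day in range(days):
        for _step in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))   # one record per simulated day
    return trajectory
```

Applied to information rather than disease, S can be read as users who have not yet engaged with a topic, I as users actively posting about it, and R as users who have lost interest; the ratio `beta / gamma` then plays the role of a reproduction number for the topic.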

Similarly, Mejova & Kalimeri (2020)† studied Facebook ads with virus-related content. They searched all ads for “COVID-19” and other keywords across 34 countries and regions, collecting more than 923 results. Most of the ads were in the U.S. and the EU, and 5% of them were highly misleading.

In addition, some researchers have begun curating news content specific to the coronavirus, with manual and automatic verification of authenticity and relevance. Pandey et al. (2020)† developed a pipeline to assess the similarity between daily news headlines and WHO recommendations. If the similarity is above a certain threshold, the news article appears on the user’s timeline together with the related WHO recommendation. The similarity threshold is set by manual review and is continuously updated according to user feedback. When information conflicts, this method can help the public identify accurate and reliable news reports; it can also help important guidance articles reach a wider audience and promote attention to official recommendations.
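The threshold-based matching step can be sketched as follows. This example uses plain bag-of-words cosine similarity as a stand-in for whatever text representation the pipeline actually uses, and the default threshold of 0.5 and the function names are assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


def _bag_of_words(text: str) -> Counter:
    """Lowercase the text and count alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between word-count vectors of two texts."""
    va, vb = _bag_of_words(a), _bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def match_recommendation(headline: str, recommendations: List[str],
                         threshold: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return the best-matching recommendation if it clears the threshold, else None."""
    if not recommendations:
        return None
    best = max(recommendations, key=lambda rec: cosine_similarity(headline, rec))
    score = cosine_similarity(headline, best)
    return (best, score) if score >= threshold else None
```

A headline that shares most of its words with a recommendation is paired with it, while an unrelated headline returns `None`. Raising or lowering `threshold` trades precision against coverage, which is why the source describes it as tuned by manual review and user feedback.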

Data sets and other resources

Using AI to fight the coronavirus depends on open source datasets and other resources. This section focuses on currently available case data, text data, and biomedical data.

Case data

Case data refers to the number and geographical distribution of cases. Such data is important for tracking the spread of the COVID-19 epidemic. The case data listed in this review include:

  • WHO COVID-19 situation reports: https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports

  • Johns Hopkins CSSE: https://github.com/CSSEGISandData/COVID-19

  • nCoV2019 GitHub project: https://github.com/beoutbreakprepared/nCoV2019

  • Humanitarian Data Exchange: https://data.humdata.org/event/covid-19

  • CHIME, a project developed for medical experts: https://github.com/CodeForPhilly/chime

  • Mobility change data after the lockdown in Italy: https://covid19mm.github.io/in-progress/2020/03/13/first-report-assessment.html

Text data

NLP methods have played an important role in research on this epidemic. The large volume of text that these techniques can process helps us understand what is currently known (such as virus transmission, environmental stability, and risk factors). The data in this category include:

  • WHO global research literature database on the novel coronavirus: https://www.who.int/emergencies/diseases/novel-coronavirus-2019/global-research-on-novel-coronavirus-2019-ncov

  • CORD-19, currently the largest open source dataset of COVID-19-related literature: https://pages.semanticscholar.org/coronavirus-research

  • Kaggle open source dataset challenge: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

  • Other open source datasets: https://www.ncbi.nlm.nih.gov/research/coronavirus/; https://covid-19.dimensions.ai/

  • Social media datasets: https://github.com/echen102/COVID-19-TweetIDs; https://www.kaggle.com/smid80/coronavirus-covid19-tweets

Biomedical data

Currently, there are not many open source datasets and models for diagnosis. Some of the CT scan models mentioned above are available, but the data used to train them has not been systematically released. Current efforts in this direction include:

  • Covid Chest X-Ray Dataset: https://github.com/ieee8023/covid-chestxray-dataset

  • Data Against Covid-19: https://www.data-against-covid.org/

For genome sequencing and drug discovery, there are several datasets, some based on pre-existing initiatives and some created from scratch for COVID-19. Notable projects include:

  • GISAID Initiative: https://www.gisaid.org/epiflu-applications/next-hcov-19-app/

  • RCSB Protein Data Bank: http://www.rcsb.org/news?year=2020&article=5e3c4bcba5007a04a313edcc

  • Drug discovery information sharing website: https://ghddi-ailab.github.io/Targeting2019-nCoV/

  • Nextstrain, which tracks the genetic diversity of the coronavirus: https://nextstrain.org/

  • Foldit (protein folding): https://fold.it/

At the end of the article, the researchers call on the community to pursue more interdisciplinary cooperation and data sharing, so that the epidemic can be fought with the combined strength of the international community.
