Artificial intelligence has obvious advantages for processing massive genetic data.

Editor’s note: This article comes from the WeChat public account “machine it can” (ID: almosthuman2017); authors: Fu Haitian and Fan Xiaofang.

With the rapid advancement of detection technology and the sharp drop in detection costs, genetic testing services have gradually entered the public eye. However, although gene detection technology is mature enough for widespread clinical use, the computation of genetic data has become its biggest bottleneck. Sequencing a biological genome usually involves processing up to terabytes of data, which places extremely high demands on data processing and analysis techniques.

Artificial intelligence has obvious advantages in processing massive genetic data, and has been widely used in the quantitative analysis of genetic indicators, the construction of gene-drug databases, the construction of genetic disease knowledge bases, and the interpretation of genetic test report data. The deep integration of artificial intelligence and gene detection technology is expected to help genetic testing services automate, batch, and personalize gene analysis, and to improve the accuracy and speed of genetic data analysis.

I. Overview of genetic testing

1.1 Main applications of gene detection in the medical field

① Tumor screening (individualized medication / companion diagnostics) ② New drug development ③ Genetic disease detection ④ Cardiovascular disease ⑤ Reproductive health (newborn disease screening / preimplantation testing / non-invasive prenatal screening) ⑥ Pharmacogenomics and basic medical research

1.2 Mainstream technologies for gene detection

① Polymerase chain reaction (PCR) ② Single molecule sequencing ③ High-throughput sequencing technology (NGS) ④ Gene chip

1.3 Genetic testing service process

[Figure: genetic testing service process]

1.4 AI-related technologies applicable to genetic testing

Machine learning: Machine learning techniques identify patterns in large sets of genetic data. These patterns can be used to predict the likelihood that an individual will develop certain diseases, or to help design potential treatments.
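To make the idea of predicting disease likelihood concrete, here is a minimal sketch of a logistic risk score over risk-allele counts. The variant names, weights, and intercept are invented for illustration and do not come from any real study; in practice such parameters would be fitted to labeled patient data.

```python
import math

# Hypothetical per-variant weights (log-odds per risk allele) and baseline.
# These numbers are made up for illustration only.
WEIGHTS = {"variant_1": 0.4, "variant_2": 0.9, "variant_3": 0.2}
INTERCEPT = -2.0


def risk_probability(allele_counts, weights, intercept):
    """Map risk-allele counts (0, 1 or 2 per variant) to a probability.

    The linear score is passed through the logistic function, which is
    what a fitted logistic regression model would do at prediction time.
    """
    linear = intercept + sum(
        weight * allele_counts.get(variant, 0)
        for variant, weight in weights.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


if __name__ == "__main__":
    carrier = {"variant_1": 2, "variant_2": 1}
    print(f"estimated risk: {risk_probability(carrier, WEIGHTS, INTERCEPT):.3f}")
```

The sketch only covers scoring; choosing the variants and estimating the weights is the actual learning step and is not shown.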

Deep learning: CNN, RNN and other network models can be used to identify different components of genes, such as exons, introns, promoters, enhancers, splice sites, non-transcribed regions, etc.
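As a rough illustration of how a convolutional model reads a gene sequence, the sketch below one-hot encodes DNA and slides a single filter along it. The filter is hand-set to respond to the dinucleotide "GT" purely as an example; a real CNN learns many filters from labeled data, and the function names here are hypothetical.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of four values in (A, C, G, T) order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d(encoded, kernel):
    """Slide `kernel` (width x 4) along the sequence; one score per position."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        scores.append(
            sum(window[i][j] * kernel[i][j] for i in range(width) for j in range(4))
        )
    return scores


# Hand-set filter (an assumption, not a trained weight matrix):
# position 0 rewards G, position 1 rewards T.
GT_FILTER = [
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def motif_positions(seq, kernel=GT_FILTER):
    """Return start positions where the filter matches at every position."""
    scores = conv1d(one_hot(seq), kernel)
    return [i for i, score in enumerate(scores) if score == len(kernel)]


if __name__ == "__main__":
    print(motif_positions("ACGTAGGT"))
```

A trained network stacks many such filters and nonlinear layers, so that combinations of short motifs can indicate larger elements such as splice sites or promoters.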

Blockchain: Using a decentralized consensus approach, a storage network built on genetic big data can provide value quantification and equity returns for data contributors, genetic science workers, technology developers, and community ecosystem participants.

Data mining: Can be used to study correlations involving gene expression, such as the correlation between expression and methylation, between expression and mutation, between expression and SNP sites, and between expression and DNA copy number.
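One common way to quantify such a relationship is the Pearson correlation coefficient. The sketch below computes it for a single gene across a handful of samples; the expression and methylation values are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up values for one gene across four samples.
    expression = [8.0, 6.0, 4.0, 2.0]    # e.g. normalized expression level
    methylation = [0.1, 0.3, 0.5, 0.7]   # e.g. fraction of methylated sites
    print(f"r = {pearson(expression, methylation):.2f}")
```

A coefficient near -1 in data like this would be consistent with higher methylation accompanying lower expression, though correlation alone does not establish a causal link.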

II. Gene testing industry and market overview

2.1 Genetic testing market size and policy background

According to Foresight.com, from 2007 to 2017 the market of the Chinese gene sequencing industry grew faster than the global market, with a compound annual growth rate of 47.5%. In 2018, the global gene sequencing market was around US$11.7 billion, of which the domestic gene sequencing market reached 8 billion yuan; it is estimated to reach 9.8 billion yuan by 2020.

In May 2017, the Ministry of Science and Technology’s “13th Five-Year Special Plan for Biotechnology Innovation” called for the development of next-generation gene sequencing technology, and emphasized the application of single-molecule technology and the analysis and interpretation of sequencing data. In July 2017, the State Council released the “New Generation Artificial Intelligence Development Plan”, which proposes research on large-scale genome recognition, proteomics, metabolomics, and new drug development based on artificial intelligence.
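For readers who want to check the growth figures, the following sketch applies the standard compound annual growth rate formula to the numbers quoted above. It assumes that 2007 to 2017 counts as ten compounding periods and 2018 to 2020 as two; the source does not state this explicitly.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` periods apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def growth_multiple(rate, years):
    """Total growth factor implied by compounding `rate` for `years` periods."""
    return (1.0 + rate) ** years


if __name__ == "__main__":
    # 8 billion yuan (2018) to a projected 9.8 billion yuan (2020).
    print(f"implied 2018-2020 CAGR: {cagr(8.0, 9.8, 2):.1%}")
    # 47.5% per year, compounded over an assumed ten periods.
    print(f"implied 2007-2017 multiple: {growth_multiple(0.475, 10):.1f}x")
```

Under these assumptions, the projection for 2018 to 2020 implies growth of roughly 10.7% per year, far below the 47.5% reported for the previous decade, which in turn would correspond to a market roughly 49 times its 2007 size.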

2.2 Genetic Testing Industry Chain

[Figure: genetic testing industry chain]

2.3 Business Model of Gene Detection

Research-grade genetic testing: solutions that cover multiple modules, including research program design, gene sequencing, data mining, and functional verification.

Clinical-grade genetic testing: includes microbiological, genetic disease, and tumor testing, and supports medication guidance and treatment decision-making.

Consumer-grade genetic testing: includes ancestry analysis, alcohol metabolism, nutritional metabolism, skin characteristics, health risks, and other test items.

III. Segmented application scenarios and representative organizations in the field of gene detection

[Figure: segmented application scenarios and representative organizations in the field of gene detection]

IV. Representative AI products / solutions and application cases in genetic testing

Emedgene-AI Assistant: The company has developed a natural language processing (NLP) engine that automatically reads newly published scientific literature and incorporates it into Emedgene’s knowledge base. Emedgene’s genomics AI assistant automatically collects the logic used in interpreting genetic cases and incorporates it into its own reasoning. When new genetic information is entered, the AI assistant looks for similar cases and applies the corresponding interpretation logic.

When the AI assistant encounters a new pathogenic entity, the Emedgene Genome Research Department develops algorithms for it and adds them to the AI assistant, so that the result is displayed the next time a similar situation occurs.

Zinovis-iGenomeCloud: This is an enterprise-level big data analysis platform for tumor immune genomics. It addresses pain points such as WES-oriented quantitative analysis of immune indicators, construction of a baseline database of immune indicators, construction of an immune knowledge base, and auxiliary interpretation logic in reports.

The platform leaves room for customized redevelopment. Many modules can be tailored to customer needs to expand functions and computing throughput, including IT hardware configuration, initialization and iteration of the mutation detection AI model, the LIMS interface, advanced quality control early warning, task management and scheduling, the report generation process, and the data analysis management system.

Kyle Bio-Kyle Deep Map System: Kyle Bio uses RNA-seq technology for salivary transcriptomics analysis and has developed the Kyle Deep Map System, which improves prediction accuracy through optimized training and validation of AI models. At present, the company’s self-developed Kyle Deep Map artificial intelligence early cancer screening system has obtained independent and complete intellectual property protection, and has passed medical device registration inspection.

The Kyle Deep Map System’s AI-based RNA gene detection has specificity and sensitivity close to or greater than 80% for early screening of a variety of cancers. Its detection cycle is usually 1-3 days, compared with 7-20 days for conventional early cancer screening products based on DNA sequencing.
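For reference, sensitivity and specificity are computed from a screening test's confusion matrix as sketched below. The counts are invented to illustrate the arithmetic and are not results from the Kyle system.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of people with the disease whom the test flags as positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of people without the disease whom the test clears."""
    return true_negatives / (true_negatives + false_positives)


if __name__ == "__main__":
    # Hypothetical cohort: 100 patients with cancer, 200 without.
    tp, fn = 80, 20    # patients with cancer: detected vs. missed
    tn, fp = 170, 30   # people without cancer: cleared vs. falsely flagged
    print(f"sensitivity = {sensitivity(tp, fn):.0%}")
    print(f"specificity = {specificity(tn, fp):.0%}")
```

In this made-up cohort, a sensitivity of 80% means one in five cancers is missed, which is why screening results are normally followed by confirmatory testing.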

DeepDiagnos-Driver mutation screening algorithm: This algorithm can quickly analyze a patient’s whole-genome data and find the driver mutations in it. The algorithm model has two main parts. The first part assesses whether a tumor is present: an algorithm selects a list of mutated genes, and the likelihood of a tumor is judged from these mutations.

The second part builds a model for each disease, scores the detected data against each model, and ranks the results by score; the disease with the highest score is the most likely. The algorithm is currently not ideal for diagnosing stage I tumors and is more suitable for early tumor screening.
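The two-part structure described above can be sketched as follows. This is not DeepDiagnos's actual model, whose details the source does not give: the screening gene list, the per-disease weights, the disease labels, and the simple weighted-sum scoring are all assumptions chosen to keep the example short.

```python
# Hypothetical screening list and per-disease weights (illustrative only).
SCREENING_GENES = {"TP53", "KRAS", "EGFR", "BRCA1"}
DISEASE_MODELS = {
    "disease_A": {"EGFR": 0.6, "KRAS": 0.3, "TP53": 0.1},
    "disease_B": {"BRCA1": 0.7, "TP53": 0.2},
}


def tumor_signal(patient_mutations, screening_genes, min_hits=1):
    """Part 1: flag a sample if enough mutated genes are on the screening list."""
    return len(set(patient_mutations) & set(screening_genes)) >= min_hits


def rank_diseases(patient_mutations, disease_models):
    """Part 2: score the sample against each disease model, highest score first."""
    mutated = set(patient_mutations)
    scores = {
        disease: sum(w for gene, w in weights.items() if gene in mutated)
        for disease, weights in disease_models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    patient = {"TP53", "KRAS"}
    if tumor_signal(patient, SCREENING_GENES):
        for disease, score in rank_diseases(patient, DISEASE_MODELS):
            print(f"{disease}: {score:.2f}")
```

Separating the screening step from the ranking step means the per-disease models only run on samples that already show a tumor signal.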

Google-DeepVariant: This is variant detection software built on deep convolutional neural networks. DeepVariant mimics the way a human inspects sequencing alignment data; it has no prior knowledge of genomics and makes no statistical assumptions about the sequencing data. It is trained by supervised learning on a large number of labeled snapshot images of genome alignment data, using a deep convolutional neural network (CNN) image recognition model built on the TensorFlow deep learning framework. The trained model can find gene mutations in high-throughput sequencing data and perform genotyping. Compared with traditional bioinformatics methods, its advantages include independence from the sequencing platform, cross-species mutation detection, and high versatility.
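To give a feel for what a "snapshot image" of alignment data might look like as model input, the sketch below turns a few aligned reads into a small integer matrix, one row per read and one column per reference position. This is a deliberately simplified illustration and not DeepVariant's actual encoding; the base codes, the function name, and the read format are assumptions.

```python
# Illustrative base codes; 0 marks positions a read does not cover.
BASE_CODE = {"A": 1, "C": 2, "G": 3, "T": 4}


def pileup_matrix(reads, window_start, window_len):
    """Build a read-by-position matrix for a reference window.

    `reads` is a list of (start_position, sequence) pairs, assumed to be
    already aligned to the reference without insertions or deletions.
    """
    matrix = []
    for start, seq in reads:
        row = [0] * window_len
        for offset, base in enumerate(seq):
            col = start + offset - window_start
            if 0 <= col < window_len:
                row[col] = BASE_CODE.get(base.upper(), 0)
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    reads = [(100, "ACGT"), (102, "GTTA")]
    for row in pileup_matrix(reads, window_start=101, window_len=4):
        print(row)
```

An image-style classifier would receive matrices like this, typically with additional channels for information such as base quality, and output a genotype call for the candidate position.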

IBM-Watson for Genomics (WfG): WfG can extract the required information from structured and unstructured sources at scale in a short time, and then apply machine learning to it. WfG can also read and interpret the specific mutations and pathology of a tumor, rebuild the knowledge base, and identify potential treatment options, helping doctors save time and effort when making treatment decisions.

At present, the WfG solution supports a variety of tumor types, including but not limited to common solid tumors such as lung cancer and breast cancer, hematological tumors such as leukemia, lymphoma, and myeloma, tumors of unknown origin, and rare tumors.

WfG has also established cooperation with expert teams from 14 cancer centers and independent medical laboratories in the United States to make clinical interpretation more standardized. Clinical studies show that, for 1,018 enrolled patients, WfG completed the clinical interpretation for each patient within 3 minutes after targeted exon sequencing and biomarker analysis.

V. Limitations of the application of artificial intelligence technology in gene detection

1. China lacks genetic testing products built on core domestic intellectual property, and imported equipment is expensive.

2. There is a shortage of professionals who can analyze and interpret genetic data, and the talent gap is large.

3. Genetic data carries a risk of discrimination in employment and insurance, which can harm the legitimate rights and interests of individuals.

4. The speed of interpretation and deep mining of genetic data is far below the speed of data production.

5. Genetic data combined with pathological data can easily be matched and traced to specific individuals, which risks violating personal privacy.

VI. Development trends of intelligent genetic testing

1. The overall gene industry will extend from gene detection to gene editing and gene therapy.

2. The price of gene sequencing has fallen rapidly, driving the rapid development of the consumer-grade gene testing market.

3. Combining multiple technologies, such as gene detection and immunotherapy, has become a trend in clinical diagnosis and treatment.

4. Advances in pharmacogenomics drive the development of genomics-based drugs and advance precision medicine.

5. Genetic data is integrated into clinical workflows and systems to assist doctors’ decision-making processes.