Overpromising makes research susceptible to misinterpretation by the media and the public.

Editor’s note: This article is reproduced from the WeChat official account “Academic Headlines” (https://mp.weixin.qq.com/s/De9YFqHu9VJndsCG1Dt9Rg); author: He Jing.

The digitization of society means that we are accumulating data at an unprecedented rate, and medical care is no exception. According to IBM estimates, each person will accumulate approximately 1 terabyte of data in their lifetime, and the total amount of global healthcare data doubles every few years.

To make sense of this flood of data, more and more clinicians are collaborating with computer scientists and researchers in related disciplines, using artificial intelligence (AI) techniques to help detect meaningful signals in noisy data. A recent forecast projects that the medical AI market will grow from USD 2 billion in 2018 to roughly USD 36 billion in 2025, a compound annual growth rate of about 50%.
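As a sanity check on forecasts like this, the compound annual growth rate (CAGR) can be computed directly from the start and end values. The sketch below is illustrative only, using the article's 2018 baseline; note that compounding $2B at 50% per year over the seven years from 2018 to 2025 yields roughly $34B, which is why the end figure must be in the tens of billions.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: the constant yearly rate that
    grows start_value into end_value over the given number of years."""
    return (end_value / start_value) ** (1 / years) - 1

# 2018 -> 2025 spans 7 years of compounding.
# $2B at 50% per year: 2 * 1.5**7
projected = 2.0 * (1 + 0.50) ** 7
print(f"projected 2025 value: ${projected:.1f}B")   # roughly $34B

# Implied rate for $2B -> $36B over 7 years:
print(f"implied CAGR: {cagr(2.0, 36.0, 7):.1%}")    # roughly 51%
```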

AI is an innovative and fast-growing field with the potential to improve patient care and ease the heavy burden on medical services. Deep learning, a branch of artificial intelligence, has shown particular promise in medical imaging. As more and more results are published, interest in deep learning research in medical imaging and related fields is growing across academia, industry, and the public.

Is AI surpassing the doctor?

In the past year or two, headlines such as “Google AI detects lung cancer a year earlier than doctors” and “AI is better at diagnosing skin cancer than doctors” have become common. Media coverage has greatly increased public and commercial interest in AI in healthcare and has accelerated the deployment of the technology. Yet the research methodology and risk of bias behind these headlines have rarely been examined in detail. Researchers writing in The BMJ recently warned that although many studies and media reports claim that artificial intelligence interprets medical images as well as or better than human experts, many of those studies are of poor quality and their claims arguably exaggerated, which poses a risk to patient safety.

To investigate this issue, researchers at Imperial College London systematically reviewed studies published over the past decade, examining study design, adherence to reporting standards, and risk of bias, and comparing the performance of deep learning algorithms in medical imaging with that of clinical experts. They searched Medline, Embase, the Cochrane Central Register of Controlled Trials, and the World Health Organization’s trial registry from 2010 to June 2019, screening 7,334 study records and 968 trial registrations. Randomized trial registrations and non-randomized studies were assessed against recognized reporting standards, and the deep learning algorithms’ performance on medical images was compared with that of multiple clinical experts.

In a randomized trial, subjects are randomly allocated to groups, a control group is included, and blinding is applied so that neither the researchers nor the subjects know the group assignments. CONSORT (Consolidated Standards of Reporting Trials) is the reporting standard for randomized controlled trials: it covers items such as the number of patients in each group and how patients were allocated to treatment, and helps medical staff understand a trial’s background, purpose, interventions, randomization methods, and statistical analysis. In non-randomized studies, by contrast, subjects are grouped according to the preferences of the investigator or the patient; for these, TRIPOD was used as the reporting standard.
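The random-allocation step described above can be sketched in a few lines. This is a minimal illustration, not how any of the reviewed trials actually randomized patients: real trials typically use stratification or block randomization, and blinding is a separate operational step.

```python
import random

def randomize(patient_ids, seed=0):
    """Randomly allocate patients 1:1 to a control arm and an
    intervention arm, as in a simple unstratified randomized trial."""
    rng = random.Random(seed)  # fixed seed only so the demo is reproducible
    ids = list(patient_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"control": ids[:half], "intervention": ids[half:]}

arms = randomize(range(10))
print(arms)  # each patient appears in exactly one arm
```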

Among randomized clinical trials, the researchers found only 10 registered deep learning trials. Two had published results, one in ophthalmology and one in gastroenterology; the remaining eight are currently recruiting, or will soon recruit, patients.

AI’s performance is “exaggerated”

The first trial recruited 350 pediatric patients at an eye clinic in China, who received cataract assessment, diagnosis, and treatment recommendations with or without an AI platform. The researchers found that the AI’s diagnostic accuracy was 87%, significantly lower than the 99% achieved by expert doctors, although the AI platform’s average diagnosis time was faster than the experts’.
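To see how much uncertainty sits behind headline accuracy figures like these, one can attach a confidence interval to each proportion. The counts below are back-calculated illustrations (87% and 99% of 350), not data from the trial, and the interval uses a simple normal approximation.

```python
from math import sqrt

def accuracy_ci(correct: int, total: int, z: float = 1.96):
    """Point estimate and normal-approximation 95% confidence
    interval for a diagnostic accuracy proportion."""
    p = correct / total
    se = sqrt(p * (1 - p) / total)
    return p, (max(0.0, p - z * se), min(1.0, p + z * se))

# Illustrative counts only: 0.87 * 350 ~ 305 and 0.99 * 350 ~ 346 correct.
for label, correct in [("AI", 305), ("experts", 346)]:
    p, (lo, hi) = accuracy_ci(correct, 350)
    print(f"{label}: {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

Even with 350 patients the AI's interval spans several percentage points, which is one reason the reviewers emphasize small human comparator groups.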

The second completed trial recruited patients undergoing colonoscopy and found that the AI system detected significantly more polyps; it had a low risk of bias and adhered well to reporting standards. Among the 81 non-randomized clinical studies, only 9 were prospective, and only 6 of those were tested in a real-world clinical setting. The abstracts of 77 of the 81 studies included a comparison between AI and clinician performance, with 30% of studies stating that AI was superior to clinicians.

Access to the raw data and code that would allow the results to be reviewed independently was severely restricted: only one study made its raw labelled data and code available. Evaluating the studies with a risk-of-bias tool, the researchers also found that more than two-thirds were judged to be at high risk of bias, and adherence to accepted reporting standards was often poor. Three-quarters of the studies claimed that the AI’s performance was comparable to or better than that of clinicians, while only 38% indicated that further prospective studies or trials were needed.

In summary, prospective deep learning studies and randomized trials in medical imaging remain scarce. Most non-randomized studies are not prospective, carry a high risk of bias, and deviate from existing reporting standards. Most studies lack available data and code, and the human comparator groups are often very small. The researchers also noted some limitations of their review, such as the possibility of missed studies and its focus on deep learning research in medical imaging, so the findings may not apply to other types of artificial intelligence.

Nevertheless, they said, “Many arguably exaggerated claims exist about equivalence with, or superiority over, clinicians, which presents a potential risk for patient safety and population health at the societal level.” They also warned that overpromising language leaves studies susceptible to misinterpretation by the media and the public, with the result that patients may receive care that is not in their best interests and does not maximize their safety; the best safeguard, they argued, is to ensure a high-quality and transparently reported evidence base.

Reference:

[1] https://www.eurekalert.org/emb_releases/2020-03/b-co032320.php

[2] https://www.bmj.com/content/368/bmj.m689