Early in the pandemic, researchers, startups, and institutions developed AI systems that claimed to diagnose COVID-19 from the sound of a person’s cough. We were initially enthusiastic about this prospect, calling cough-scrutinizing AI ‘promising.’ However, a recent study suggests that some cough-analyzing algorithms are less accurate than we and the public were led to believe.
A Cautionary Tale for Machine Learning in Healthcare
The study, conducted by researchers from The Alan Turing Institute and Royal Statistical Society and commissioned by the U.K. Health Security Agency, reviewed audio-based AI tech as a COVID-19 screening tool. The research team found that even the most accurate cough-detecting model performed worse than a model based on user-reported symptoms and demographic data, such as age and gender.
The Implications: A Blow to Commercial Efforts
"The implications are that the AI models used by many apps add little or no value over and above the predictive accuracy offered by user-reported symptoms," the co-authors of the report told TechCrunch in an email interview. For instance, Fujitsu’s Cough in a Box app, funded by the U.K.’s Department of Health and Social Care to collect and analyze audio recordings of COVID-19 symptoms, may be less effective than initially claimed.
A Study of Over 67,000 Participants
The researchers examined data from over 67,000 people recruited through the National Health Service’s Test and Trace and REACT-1 programs. These participants were asked to send back nose and throat swab test results for COVID-19 along with recordings of them coughing, breathing, and talking.
A Poor Predictor: Coughs in Diagnosing COVID-19
The AI model’s diagnostic accuracy was not much better than chance when controlling for confounders. Partly to blame was recruitment bias in the Test and Trace system, which required participants to have at least one COVID-19 symptom to take part.
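To see how a confounder can flatter a model, consider a minimal sketch in Python. Everything in it is an assumption made for illustration: the counts are invented, not taken from the study, and the toy "audio model" simply flags anyone who sounds symptomatic. Scored on the whole cohort, it looks useful; scored within each symptom group, it is no better than a coin flip.

```python
from collections import namedtuple

# Each row is a slice of a hypothetical cohort. The counts are invented
# for illustration and are NOT taken from the study.
Group = namedtuple("Group", "symptomatic covid_positive count")

cohort = [
    Group(symptomatic=True, covid_positive=True, count=400),
    Group(symptomatic=True, covid_positive=False, count=100),
    Group(symptomatic=False, covid_positive=True, count=100),
    Group(symptomatic=False, covid_positive=False, count=400),
]


def predict(group):
    """Toy stand-in for an audio model: it only detects symptoms."""
    return group.symptomatic


def balanced_accuracy(groups):
    """Mean of sensitivity and specificity; 0.5 corresponds to chance."""
    tp = sum(g.count for g in groups if g.covid_positive and predict(g))
    fn = sum(g.count for g in groups if g.covid_positive and not predict(g))
    tn = sum(g.count for g in groups if not g.covid_positive and not predict(g))
    fp = sum(g.count for g in groups if not g.covid_positive and predict(g))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Scored on everyone, the symptom detector looks like a COVID-19 detector.
overall = balanced_accuracy(cohort)

# Controlling for the confounder: score within each symptom group.
symptomatic_only = balanced_accuracy([g for g in cohort if g.symptomatic])
asymptomatic_only = balanced_accuracy([g for g in cohort if not g.symptomatic])

print(f"overall: {overall:.2f}")
print(f"symptomatic only: {symptomatic_only:.2f}")
print(f"asymptomatic only: {asymptomatic_only:.2f}")
```

In this invented cohort, the apparent skill comes entirely from the link between symptoms and infection, which is the kind of effect that recruiting only symptomatic participants can hide.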
A Blow to Scientific Claims: A 98.5% Accuracy Claim
One paper co-authored by researchers at the Massachusetts Institute of Technology pegged the accuracy of a cough-analyzing COVID-19 algorithm at 98.5%. However, this claim seems dubiously high in retrospect.
The Future of Cough Detection: Will It Work for Other Respiratory Viruses?
Professor Chris Holmes, lead author of the study and program director for health and medical science at The Alan Turing Institute, leaves open the possibility that the tech may work for other respiratory viruses in the future. However, this would not be the first time healthcare AI has overpromised and underdelivered.
A History of Overpromising and Underdelivering: Healthcare AI
In 2018, STAT reported that IBM’s Watson supercomputer spat out erroneous cancer treatment advice, the result of training on a small number of synthetic cases. In 2021, an audit of Epic’s AI algorithm for predicting sepsis found that it missed roughly two-thirds of cases.
Conclusion: A Cautionary Tale for Machine Learning in Healthcare
The recent study highlights the importance of rigorously testing and validating AI-powered diagnosis tools before deploying them in clinical settings. We must be cautious not to overpromise and underdeliver when it comes to machine learning in healthcare, lest we risk compromising patient care.