A new $14 million National Institutes of Health project, Voice as a Biomarker of Health, is building a database of vocal biomarkers that could help diagnose cancer. The research project is part of the NIH Common Fund’s Bridge2AI program and includes 12 research institutions led by the University of South Florida and Weill Cornell Medicine. French artificial intelligence (AI) start-up Owkin provides the technology that secures the project’s sensitive voice data and protects its privacy.
According to Dr. Yaël Bensoussan, USF Health Morsani College of Medicine and co-principal project investigator, acoustic analysis is a science that has existed for decades.
“Speech pathologists and acoustic engineers are known as the experts in that field and have been doing incredible research by extracting acoustic features from the voice signal,” said Bensoussan. “The common ones are pitch or amplitude and their derivatives. For example, we know that men with Parkinson’s can have a decrease in volume (amplitude) and an increase in pitch with the disease.”
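The classic features Bensoussan mentions can be computed with a few lines of signal processing. The sketch below is purely illustrative (not the project’s actual pipeline): it estimates amplitude as RMS energy and pitch by autocorrelation, using a synthetic tone as a stand-in for a voice recording. The function names and the 16 kHz sample rate are assumptions.

```python
import math

SAMPLE_RATE = 16_000  # Hz; an assumed recording rate for this sketch

def make_tone(freq_hz, seconds=0.25, amp=0.3):
    """Generate a pure sine tone as a stand-in for a recorded voice sample."""
    n = int(SAMPLE_RATE * seconds)
    return [amp * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
            for t in range(n)]

def rms_amplitude(samples):
    """Root-mean-square amplitude: a simple proxy for vocal loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def estimate_pitch(samples, fmin=50, fmax=400):
    """Estimate fundamental frequency by autocorrelation: find the lag,
    within the plausible voice range, where the signal best matches a
    shifted copy of itself."""
    lag_min = SAMPLE_RATE // fmax
    lag_max = SAMPLE_RATE // fmin
    best_lag, best_corr = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        corr = sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return SAMPLE_RATE / best_lag

tone = make_tone(200.0)              # a 200 Hz "voice"
print(round(estimate_pitch(tone)))   # 200
print(round(rms_amplitude(tone), 3))
```

In a real pipeline these features would be computed frame by frame over short windows rather than over a whole recording, so that changes across an utterance can be tracked.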
Bensoussan says there are also biomarkers in the way we speak. For example, the cadence or words per minute can be slowed in certain diseases. “With emerging ML technology, we can train models by providing a raw acoustic sample without necessarily extracting these features.”
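As a toy illustration of the cadence biomarker, speech rate in words per minute can be derived from a transcript and its recording length. The function and values below are hypothetical, not from the project:

```python
def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Speech rate from a transcript and the recording's length."""
    word_count = len(transcript.split())
    return word_count * 60.0 / duration_seconds

# Hypothetical 10-second utterance containing 20 words:
sample = " ".join(["word"] * 20)
print(words_per_minute(sample, 10.0))  # 120.0
```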
“With natural language processing technology, we can also analyze the content of speech when previously someone had to manually transcribe or count words or sentences,” said Bensoussan. “The emerging technology will change how we use voice as a biomarker.”
For example, head and neck cancer can change many different features in your voice and speech.
“With vocal cord cancer, the first symptom is usually a change in voice because even the small cancers affect how your vocal cords vibrate. The quality of the sound will be rough,” said Bensoussan. “As cancer grows, it can make the breathing passage smaller. In that case, you now hear a very rough voice with a breathing sound on inspiration called stridor due to changes in airflow.”
But Bensoussan says another example is oropharyngeal cancer, which is more common in men in their 40s due to HPV. “The masses in the back of the throat change the way the voice resonates, and the sound of the voice becomes muffled, like a hot potato voice.”
“Experts such as head and neck cancer surgeons can usually hear these cues, and we can usually have a high suspicion of cancer because we are used to hearing these changes in the voice of that population,” said Bensoussan.
Using AI, the project wants to reproduce that expertise by training ML models to pick up those acoustic cues and spot diseases by detecting changes in the human voice so physicians and providers in low-resource settings can access the technology.
According to Bensoussan, the research team will use ML to analyze speech and voice in three ways.
“The first is by using raw audio clips and converting them to visual Mel spectrograms the model can analyze; second, we will extract [..] acoustic features such as frequency (pitch changes), formants, amplitude, and others for the models to learn from that data,” said Bensoussan. “The third is natural language processing, which allows us to analyze the content of the speech without someone having to manually transcribe the words on a piece of paper.”
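The Mel spectrograms Bensoussan refers to are built on the mel scale, a perceptual pitch scale that spaces frequency bins roughly the way listeners hear them: linearly below about 1 kHz and logarithmically above. A minimal sketch of the common Hz-to-mel mapping (the standard 2595·log10(1 + f/700) formula) and one way filterbank edges might be placed with it; the 8-filter, 0–8 kHz setup is an arbitrary example, not the project’s configuration:

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Standard mel-scale mapping: roughly linear below ~1 kHz,
    logarithmic above, mirroring perceived pitch spacing."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse mapping, used to place mel filterbank edges back on the Hz axis."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Band edges for a hypothetical 8-filter mel filterbank covering 0-8 kHz:
# 8 filters need 10 equally spaced edges on the mel axis (incl. endpoints).
low, high = hz_to_mel(0.0), hz_to_mel(8000.0)
edges_hz = [round(mel_to_hz(low + i * (high - low) / 9)) for i in range(10)]
print(edges_hz)
```

Because the edges are equally spaced in mels, they come out closely packed at low frequencies and widely spaced at high ones, which is what makes the resulting spectrogram a compact, perception-aligned input for an ML model.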
In a press statement, Dr. Thomas Clozel, co-founder and CEO of Owkin, said that by using AI to analyze minute changes in the human voice, they hope to help doctors diagnose and treat diseases ranging from cancer to depression.
“Vocal biomarkers are set to play an increasingly important role in healthcare,” said Clozel. “We are excited to be using federated learning, our privacy-preserving AI framework, to connect the medical world in the pursuit of improving outcomes for patients.”
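Federated learning, the privacy-preserving approach Clozel describes, keeps each site’s raw recordings local and shares only model parameters, which a central coordinator averages. Below is a minimal FedAvg-style sketch on a one-parameter least-squares model with made-up data from three hypothetical hospitals; it is a conceptual illustration of the technique, not Owkin’s implementation:

```python
from statistics import mean

def local_update(global_w, site_data, lr=0.05):
    """One pass of gradient descent on a site's private data for the
    1-parameter model y = w * x; the raw data never leaves the site."""
    w = global_w
    for x, y in site_data:
        grad = 2 * (w * x - y) * x
        w -= lr * grad
    return w

def federated_round(global_w, sites):
    """FedAvg-style round: each site trains locally, and only the
    updated weights are sent back and averaged centrally."""
    return mean(local_update(global_w, data) for data in sites)

# Three hypothetical hospitals, each holding samples of y = 2x:
sites = [[(1, 2), (2, 4)], [(3, 6)], [(1, 2), (4, 8)]]
w = 0.0
for _ in range(20):
    w = federated_round(w, sites)
print(round(w, 2))  # 2.0 -- the model learns y = 2x without pooling the data
```

Real deployments add secure aggregation and differential-privacy noise on top of this scheme, but the core privacy property is the same: only weights, never patient voice recordings, cross institutional boundaries.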
Bensoussan says that voice is one of the cheapest and lowest-resource biomarkers to collect.
“When you think about genomic biomarkers, the current technology for analysis is quite costly. If you think about imaging testing like scans, MRIs, etc., they are resource intensive and pose risks to patients,” said Bensoussan. “We have known for decades that voice and speech change with different diseases, but it used to take expensive microphones and hardware to extract acoustic features and analyze them.”
According to Bensoussan, with smartphones and tablets now integrating high-quality microphone recorders, collecting voice becomes easy and accessible in remote or low-resource settings with no laboratories or expensive machines.
“We also know that companies like Apple, Alexa, or Amazon [..] are investing in voice as a biomarker as they record customers’ voices all day,” added Bensoussan. “Our role as clinicians and researchers funded by the NIH is to ensure accuracy and diversity of the data and protect patients’ rights when they decide to participate in our research.”
NIH’s Bridge2AI program is focused on creating an extensive database of 30,000 voices, including speech and breathing sounds, over the next four years within a user-friendly, safe IT cloud infrastructure and platform. “We want other researchers with clinical questions to be able to use this database to answer them, and we want to provide them with the tools to collect voice data easily and link it to health information if they want to contribute to the database,” said Bensoussan.