Continuous and discrete decoding of overt speech with scalp electroencephalography (EEG)

Alexander Craik, Heather R Dial, Jose L Contreras-Vidal

Journal of Neural Engineering, published 2024-10-30. DOI: 10.1088/1741-2552/ad8d0a
Abstract
Neurological disorders affecting speech production adversely impact quality of life for over 7 million individuals in the US. Traditional speech interfaces, such as eye-tracking devices and P300 spellers, are slow and unnatural for these patients. An alternative solution, the speech Brain-Computer Interface (BCI), directly decodes speech characteristics, offering a more natural communication mechanism. This research explores the feasibility of decoding speech features using non-invasive EEG. Nine neurologically intact participants were equipped with a 63-channel EEG system, with additional sensors used to remove eye artifacts. Participants read aloud on-screen sentences selected for their phonetic similarity to the English language.
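The abstract does not detail the artifact-removal pipeline. As one illustration of how dedicated ocular sensors can be used, here is a minimal ICA-based eye-artifact removal sketch with the MNE-Python library; the recording file name, filter settings, and component count are assumptions, not details taken from the paper.

```python
# Minimal sketch of ICA-based eye-artifact removal with MNE-Python.
# File name, filter cutoffs, and component count are illustrative
# assumptions; the recording is assumed to include EOG channels.
import mne
from mne.preprocessing import ICA

raw = mne.io.read_raw_brainvision("session.vhdr", preload=True)  # hypothetical file
raw.filter(l_freq=1.0, h_freq=None)  # high-pass filtering aids ICA decomposition

ica = ICA(n_components=20, random_state=0)
ica.fit(raw)

# Use the dedicated EOG sensors to flag ocular components, then remove them.
eog_indices, eog_scores = ica.find_bads_eog(raw)
ica.exclude = eog_indices
raw_clean = ica.apply(raw.copy())
```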
Deep learning models, including Convolutional Neural Networks and Recurrent Neural Networks with and without attention modules, were optimized with a focus on minimizing trainable parameters and on small input window sizes suitable for real-time application. These models were employed for discrete and continuous speech decoding tasks, achieving statistically significant participant-independent decoding performance for discrete classes and for continuous characteristics of the produced audio signal.
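The exact architectures are not given in the abstract. The PyTorch sketch below illustrates the general shape of such a model: a small convolutional front end feeding a recurrent layer with an attention-pooling module, operating on a short multi-channel EEG window. All layer sizes, the class count, and the window length are illustrative assumptions rather than the reported architecture.

```python
# Sketch of a compact CNN-GRU decoder with attention pooling over time,
# sized for short EEG windows. Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class CompactSpeechDecoder(nn.Module):
    def __init__(self, n_channels=63, n_classes=5, hidden=32):
        super().__init__()
        # Temporal convolution, then a spatial filter across channels;
        # this keeps the trainable-parameter count small.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=(1, 16), padding=(0, 8)),
            nn.BatchNorm2d(8),
            nn.ELU(),
            nn.Conv2d(8, 16, kernel_size=(n_channels, 1)),  # spatial mixing
            nn.BatchNorm2d(16),
            nn.ELU(),
            nn.AvgPool2d((1, 4)),
        )
        self.gru = nn.GRU(16, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)   # additive attention scores
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (batch, 1, channels, time)
        feats = self.cnn(x)                # (batch, 16, 1, time')
        feats = feats.squeeze(2).transpose(1, 2)    # (batch, time', 16)
        out, _ = self.gru(feats)           # (batch, time', hidden)
        weights = torch.softmax(self.attn(out), dim=1)
        context = (weights * out).sum(dim=1)        # attention-pooled summary
        return self.head(context)

model = CompactSpeechDecoder()
window = torch.randn(4, 1, 63, 200)  # e.g. 4 windows of 63 channels x 200 samples
logits = model(window)               # (4, n_classes)
```

For continuous decoding, the same backbone could end in a regression head predicting audio characteristics per window instead of class logits.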
A frequency sub-band analysis highlighted the significance of the delta, theta, and gamma bands for decoding performance, and a perturbation analysis was used to identify the most informative channels.
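The abstract does not spell out the perturbation procedure; one common variant scores each channel by the drop in decoding performance when that channel is replaced with noise. A schematic version follows, in which `model`, `evaluate`, and the data tensors are hypothetical placeholders.

```python
# Sketch of a channel-perturbation importance analysis: noise replaces one
# channel at a time and the resulting performance drop is recorded.
# `model`, `evaluate`, X, and y are hypothetical placeholders.
import torch

def channel_importance(model, X, y, evaluate):
    """X: (n_windows, 1, n_channels, n_samples) EEG windows."""
    baseline = evaluate(model, X, y)
    n_channels = X.shape[2]
    drops = torch.zeros(n_channels)
    for ch in range(n_channels):
        X_pert = X.clone()
        # Replace the channel with Gaussian noise matched to its variance.
        noise = torch.randn_like(X_pert[:, :, ch, :]) * X_pert[:, :, ch, :].std()
        X_pert[:, :, ch, :] = noise
        drops[ch] = baseline - evaluate(model, X_pert, y)
    return drops  # larger drop => more informative channel
```

A sub-band analysis can be run in the same spirit by band-pass filtering the inputs to one band at a time (for example with mne.filter.filter_data) before evaluation.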
The channel selection methods assessed did not significantly improve performance, suggesting that speech information is encoded in a distributed representation across the EEG signals. Leave-one-out training demonstrated the feasibility of utilizing speech neural correlates common across participants, reducing data collection requirements for individual participants.
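A leave-one-participant-out evaluation of this kind holds each participant out in turn and trains on the remaining eight; scikit-learn's LeaveOneGroupOut expresses the split directly. The `fit_and_score` helper below is a hypothetical placeholder for the actual training and evaluation loop.

```python
# Sketch of leave-one-participant-out cross-validation with scikit-learn.
# `fit_and_score` stands in for the real training/evaluation routine.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

def loo_participant_scores(X, y, participant_ids, fit_and_score):
    logo = LeaveOneGroupOut()
    scores = []
    for train_idx, test_idx in logo.split(X, y, groups=participant_ids):
        # Train on eight participants, test on the held-out ninth.
        scores.append(fit_and_score(X[train_idx], y[train_idx],
                                    X[test_idx], y[test_idx]))
    return np.asarray(scores)
```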