Speaker identification in auditory attention decoding (SI-AAD) aims to identify the attended speaker from electroencephalography (EEG) signals. However, its application to hearing-impaired individuals is limited: existing methods rarely account for the altered auditory perception produced by hearing-assistive devices under mixed-speech conditions, relevant datasets are lacking, and robust EEG-speech correspondences are difficult to learn because of weak cross-modal alignment and insufficient feature extraction. We therefore construct five mixed-speech AAD datasets (MS-AAD), the first EEG benchmark to simulate typical device-induced acoustic alterations without spatial cues. To enhance modality alignment, we propose a timbre-enhanced latent alignment (TELA) framework that jointly models latent embeddings and perceptual speaker cues via contrastive learning and auxiliary timbre classification. To further improve EEG feature extraction, we design FCTNet, a frequency-channel-temporal attention-based EEG encoder that captures rich neural patterns across multiple domains. Experiments on MS-AAD demonstrate that TELA and FCTNet jointly achieve 89.5% SI-AAD accuracy across diverse hearing conditions, highlighting the critical roles of device-simulated acoustic dataset design and perceptually guided representation learning with advanced EEG encoding in mixed-speech SI-AAD for hearing-assistive applications.
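To make the TELA idea concrete, the following is a minimal sketch, not the authors' released code, of how an EEG-speech contrastive alignment objective combined with an auxiliary timbre-classification head could be structured in PyTorch. All module names (LatentAligner, eeg_proj, speech_proj, timbre_head), dimensions, and the simple projection encoders are illustrative assumptions; the paper's actual FCTNet EEG encoder and speech front-end are not reproduced here.

```python
# Hedged sketch of a TELA-style objective: contrastive EEG-speech latent
# alignment plus an auxiliary timbre (speaker) classification loss.
# Module names and dimensions are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentAligner(nn.Module):
    def __init__(self, eeg_dim=64, speech_dim=80, latent_dim=128,
                 n_speakers=2, temperature=0.07):
        super().__init__()
        # Placeholder projection encoders; the paper uses FCTNet for EEG.
        self.eeg_proj = nn.Sequential(
            nn.Linear(eeg_dim, latent_dim), nn.ReLU(), nn.Linear(latent_dim, latent_dim))
        self.speech_proj = nn.Sequential(
            nn.Linear(speech_dim, latent_dim), nn.ReLU(), nn.Linear(latent_dim, latent_dim))
        # Auxiliary head predicting the attended speaker's timbre class
        # from the EEG latent (the perceptual speaker cue).
        self.timbre_head = nn.Linear(latent_dim, n_speakers)
        self.temperature = temperature

    def forward(self, eeg_feat, speech_feat, timbre_label):
        # eeg_feat: (B, eeg_dim) pooled EEG features
        # speech_feat: (B, speech_dim) pooled features of the attended speech
        # timbre_label: (B,) integer speaker/timbre labels
        z_eeg = F.normalize(self.eeg_proj(eeg_feat), dim=-1)
        z_sp = F.normalize(self.speech_proj(speech_feat), dim=-1)

        # InfoNCE-style contrastive loss: matched EEG/speech pairs on the
        # diagonal are positives; other in-batch pairs are negatives.
        logits = z_eeg @ z_sp.t() / self.temperature
        targets = torch.arange(logits.size(0), device=logits.device)
        loss_contrast = 0.5 * (F.cross_entropy(logits, targets)
                               + F.cross_entropy(logits.t(), targets))

        # Auxiliary timbre classification on the EEG latent.
        loss_timbre = F.cross_entropy(self.timbre_head(z_eeg), timbre_label)
        return loss_contrast + loss_timbre


if __name__ == "__main__":
    model = LatentAligner()
    eeg = torch.randn(8, 64)
    speech = torch.randn(8, 80)
    labels = torch.randint(0, 2, (8,))
    print(model(eeg, speech, labels).item())
```

Under these assumptions, the contrastive term pulls EEG and attended-speech embeddings together in a shared latent space, while the timbre loss injects the perceptual speaker cue described in the abstract; the relative weighting of the two terms is a design choice not specified here.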
