Speech is one of the most abundant and natural sources of acoustic data, carrying both prosodic and spectral information. Acoustic features can support the diagnosis of mental and emotional health conditions, and in recent years several researchers have examined speech features as a way to detect depression. However, most existing frameworks perform well only on the data on which they were trained and fail to generalize to new speakers, recording devices, or languages. This research aims to identify reliable and interpretable acoustic features that serve as stable indicators of depression across speech datasets.
This study used two publicly available datasets, E-DAIC and MODMA. A total of 107 handcrafted prosodic, spectral, and voice-quality acoustic features were extracted from 4-second segments, with a 1-second overlap for long recordings and padding for short clips. Subject-aware pre-processing was used to prevent speaker-level overlap. Five feature selection algorithms were applied, and their outputs were integrated using a consensus-based rank aggregation framework to identify depression-related features that are consistent across both datasets. The resulting feature set was evaluated with four classifier architectures through a K-sweep analysis. Correlation alignment (CORAL) domain adaptation was used to reduce distribution mismatch by aligning second-order statistics between the source and target domains, allowing robust cross-dataset transfer evaluation. Bidirectional cross-dataset evaluation demonstrated effective generalization in both transfer directions. Models trained on E-DAIC achieved F1 = 0.49–0.52 on MODMA (92%–94% of within-dataset performance), while models trained on MODMA achieved F1 = 0.34–0.35 on E-DAIC, exceeding the E-DAIC within-dataset baseline. The negative domain loss observed on E-DAIC (domain loss = −0.22 to −0.24), meaning that cross-dataset performance was above the within-dataset baseline, reflects high intra-dataset heterogeneity from naturalistic recording conditions rather than poor generalizability. These findings demonstrate that robust acoustic depression biomarkers can be learned from diverse datasets, enabling cross-linguistic depression detection.
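The segmentation scheme described above (4-second windows, 1-second overlap, padding for short clips) can be illustrated with a minimal sketch. The function name `segment_bounds` and its parameters are hypothetical, not taken from the study. The handling of a trailing remainder shorter than one hop is not stated in the abstract, so dropping it here is an assumption.

```python
def segment_bounds(duration_s, win_s=4.0, overlap_s=1.0):
    """Return (start_s, end_s, pad_s) tuples for one recording.

    Hypothetical helper: a 4 s window with 1 s overlap gives a 3 s hop.
    Clips no longer than one window yield a single segment that must be
    padded by pad_s seconds. Assumption: a trailing remainder that does
    not fill a whole window is dropped (the abstract does not say).
    """
    hop_s = win_s - overlap_s
    if duration_s <= win_s:
        # Short clip: one segment, padded up to the window length.
        return [(0.0, duration_s, win_s - duration_s)]
    # Number of full windows that fit when stepping by hop_s.
    n_full = int((duration_s - win_s) // hop_s) + 1
    return [(i * hop_s, i * hop_s + win_s, 0.0) for i in range(n_full)]


if __name__ == "__main__":
    print(segment_bounds(10.0))  # three overlapping 4 s windows
    print(segment_bounds(2.5))   # one window needing 1.5 s of padding
```

Multiplying the returned times by the sampling rate gives sample indices for slicing the waveform before feature extraction.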
