Subject Harmonization of Digital Biomarkers: Improved Detection of Mild Cognitive Impairment from Language Markers.

Pacific Symposium on Biocomputing, vol. 29, pp. 187-200. Published 2024-01-01. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017207/pdf/

Bao Hoang, Yijiang Pang, Hiroko H Dodge, Jiayu Zhou

Abstract

Mild cognitive impairment (MCI) represents the early stage of dementia, including Alzheimer's disease (AD), and is a crucial stage for therapeutic interventions and treatment. Early detection of MCI offers opportunities for early intervention and significantly benefits cohort enrichment for clinical trials. Imaging biomarkers and in vivo biomarkers in plasma and cerebrospinal fluid achieve high detection performance, yet their prohibitive cost and intrusiveness call for more affordable and accessible alternatives. Recent advances in digital biomarkers, especially language markers, have shown great potential: variables informative of MCI are derived from linguistic and/or speech data and then used for predictive modeling. A major challenge in modeling language markers is the variability in how each person speaks. Because extensive data collection efforts keep the cohort size of language studies small, this between-subject variability makes language markers hard to generalize to unseen subjects. In this paper, we propose a novel subject harmonization tool that addresses the distributional differences in language markers across subjects, thereby enhancing the generalization performance of machine learning models. Our empirical results show that machine learning models built on our harmonized features achieve improved prediction performance on unseen data. The source code and experiment scripts are available at https://github.com/illidanlab/subject_harmonization.
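Measuring generalization to unseen subjects requires that no subject contribute samples to both the training and the test set. The sketch below illustrates such a subject-disjoint split in plain Python, assuming each sample (for example, one recorded conversation) is tagged with a subject identifier. The function name `subject_disjoint_split` and its parameters are hypothetical and are not taken from the authors' repository; this is a generic evaluation sketch, not the paper's harmonization method.

```python
import random
from collections import defaultdict


def subject_disjoint_split(subject_ids, test_fraction=0.25, seed=0):
    """Split sample indices so that no subject appears in both train and test.

    subject_ids: sequence with one subject identifier per sample
        (hypothetical layout, e.g. one entry per recorded conversation).
    Returns (train_idx, test_idx) as sorted lists of sample indices.
    """
    # Group sample indices by the subject they came from.
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)

    subjects = sorted(by_subject)
    if len(subjects) < 2:
        raise ValueError("need at least two subjects for a subject-disjoint split")

    # Shuffle subjects (not samples) reproducibly, then hold out whole subjects,
    # always leaving at least one subject for training.
    random.Random(seed).shuffle(subjects)
    n_test = min(len(subjects) - 1, max(1, round(len(subjects) * test_fraction)))
    test_subjects = set(subjects[:n_test])

    train_idx = [i for s in subjects if s not in test_subjects for i in by_subject[s]]
    test_idx = [i for s in subjects if s in test_subjects for i in by_subject[s]]
    return sorted(train_idx), sorted(test_idx)


# Usage: four subjects with two samples each; one whole subject is held out.
ids = ["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"]
train, test = subject_disjoint_split(ids, test_fraction=0.25, seed=0)
print(len(train), len(test))  # 6 2
```

A random sample-level split would let the same speaker appear on both sides, so a model could exploit an individual's speaking style rather than MCI-related markers and overstate performance on new subjects. In practice, scikit-learn's `GroupShuffleSplit` and `GroupKFold` (with a `groups` argument) give the same subject-disjoint guarantee.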