A machine learning approach for identifying anatomical biomarkers of early mild cognitive impairment.

IF 2.3 | CAS Zone 3 (Biology) | JCR Q2, Multidisciplinary Sciences | PeerJ | Pub Date: 2024-12-13 | eCollection Date: 2024-01-01 | DOI: 10.7717/peerj.18490

Alwani Liyana Ahmad, Jose M Sanchez-Bornot, Roberto C Sotero, Damien Coyle, Zamzuri Idris, Ibrahima Faye

PeerJ 12: e18490 (2024). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648692/pdf/

Citations: 0

Abstract

Background: Alzheimer's disease (AD) is a neurodegenerative disorder that poses a major challenge, and early detection is critical for effective intervention. Magnetic resonance imaging (MRI) is a key tool in AD research because it is widely available and cost-effective in clinical settings.

Objective: This study aims to conduct a comprehensive analysis of machine learning (ML) methods for MRI-based biomarker selection and classification to investigate early cognitive decline in AD. The focus is on discriminating healthy control (HC) participants who remained stable from those who developed mild cognitive impairment (MCI) within five years (unstable HC, or uHC).

Methods: 3-Tesla (3T) MRI data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Open Access Series of Imaging Studies 3 (OASIS-3) were used, focusing on the HC and uHC groups. FreeSurfer's recon-all and other tools were used to extract anatomical biomarkers from subcortical and cortical brain regions. ML techniques were applied for feature selection and classification, using the MATLAB Classification Learner (MCL) app for initial analysis, followed by advanced methods such as nested cross-validation and Bayesian optimization, which were evaluated within a Monte Carlo replication analysis implemented in our customized pipeline. Additionally, polynomial regression-based data harmonization techniques were used to enhance the ML and statistical analyses. ML classifiers were evaluated using performance metrics such as accuracy (Acc), area under the receiver operating characteristic curve (AROC), F1-score, and a normalized Matthews correlation coefficient (MCC').
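
The abstract names polynomial regression-based harmonization (and later z-score harmonization) without giving its details. The sketch below is one plausible reading, not the authors' pipeline: it assumes the nuisance covariate is age (a hypothetical choice), fits a degree-2 polynomial of each feature on age using only the reference HC subjects, and expresses every subject's feature as its residual divided by the HC residual standard deviation. The helper names `polyfit`, `polyval`, and `zscore_harmonize` are hypothetical and written in plain Python for self-containment.

```python
"""Sketch of polynomial-regression z-score harmonization (assumed design)."""
from statistics import mean, pstdev


def polyfit(x, y, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients c[0..degree] such that y ~ sum(c[k] * x**k).
    """
    n = degree + 1
    # Normal equations A c = b with A = X^T X and b = X^T y.
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * n
    for row in reversed(range(n)):
        rest = sum(a[row][k] * coef[k] for k in range(row + 1, n))
        coef[row] = (b[row] - rest) / a[row][row]
    return coef


def polyval(coef, x):
    return sum(c * x ** k for k, c in enumerate(coef))


def zscore_harmonize(feature, age, is_reference, degree=2):
    """Remove the age trend fitted on reference (HC) subjects only, then scale
    every subject's residual by the reference residual standard deviation."""
    ref_age = [a for a, r in zip(age, is_reference) if r]
    ref_feat = [f for f, r in zip(feature, is_reference) if r]
    center = mean(ref_age)  # centering ages keeps the normal equations well conditioned
    coef = polyfit([a - center for a in ref_age], ref_feat, degree)
    ref_resid = [f - polyval(coef, a - center) for a, f in zip(ref_age, ref_feat)]
    sd = pstdev(ref_resid)
    return [(f - polyval(coef, a - center)) / sd for f, a in zip(feature, age)]
```

Fitting the trend on the reference group only is a common design choice because it prevents the disease effect in the uHC group from being absorbed into the covariate model; after this transform, the HC values have mean 0 and standard deviation 1 by construction.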

Results: Feature selection identified consistent biomarkers across ADNI and OASIS-3, with the entorhinal, hippocampal, lateral ventricle, and lateral orbitofrontal regions being the most affected. Classification results varied between balanced and imbalanced datasets and between ADNI and OASIS-3. For the ADNI balanced datasets, the naïve Bayes model using z-score harmonization and ReliefF feature selection performed best (Acc = 69.17%, AROC = 77.73%, F1 = 69.21%, MCC' = 69.28%). For the OASIS-3 balanced datasets, a support vector machine (SVM) with z-score-corrected data outperformed the other models (Acc = 66.58%, AROC = 72.01%, MCC' = 66.78%), while logistic regression had the best F1-score (66.68%). On imbalanced data, RUSBoost showed the strongest overall performance on ADNI (F1 = 50.60%, AROC = 81.54%) and OASIS-3 (MCC' = 63.31%). SVM excelled on ADNI in terms of Acc (82.93%) and MCC' (70.21%), while naïve Bayes performed best on OASIS-3 by F1 (42.54%) and AROC (70.33%).
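
On the imbalanced datasets the metrics rank models differently (for example, SVM leads on ADNI by Acc while RUSBoost leads by F1 and AROC), which is easier to interpret with the definitions in hand. The abstract does not define MCC'; the sketch below assumes the common normalization MCC' = (MCC + 1) / 2, which maps MCC from [-1, 1] to [0, 1]. It also assumes uHC is the positive class (label 1) and computes AROC as the probability that a randomly chosen positive scores higher than a randomly chosen negative. Function names are hypothetical.

```python
"""Binary metrics with uHC as the positive class (assumed labeling)."""
from math import sqrt


def confusion(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    acc = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Assumed normalization of MCC from [-1, 1] to [0, 1].
    return {"acc": acc, "f1": f1, "mcc": mcc, "mcc_norm": (mcc + 1) / 2}


def auroc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced case: 90 stable HC (0) and 10 uHC (1). Predicting "stable"
# for everyone yields high accuracy but no information about uHC.
y_true = [0] * 90 + [1] * 10
y_all_stable = [0] * 100
print(metrics(y_true, y_all_stable))
# {'acc': 0.9, 'f1': 0.0, 'mcc': 0.0, 'mcc_norm': 0.5}
```

The toy case shows why accuracy alone can flatter a classifier on imbalanced data: F1 and MCC' both expose that the minority (uHC) class is never detected.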

Conclusion: Data harmonization significantly improved the consistency and performance of feature selection and ML classification, with z-score harmonization yielding the best results. This study also highlights the importance of nested cross-validation (CV) to control overfitting and the potential of a semi-automatic pipeline for early AD detection using MRI, with future applications integrating other neuroimaging data to enhance prediction.
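
Nested CV controls overfitting by tuning hyperparameters only on data that the final performance estimate never sees. The sketch below shows that structure under loudly stated assumptions: a plain grid search stands in for the Bayesian optimization the study used, a k-nearest-neighbour classifier with `k` as its only hyperparameter is a hypothetical stand-in model, and the fold counts (5 outer, 3 inner) and 10 Monte Carlo replications (fresh random reshuffles of the folds) are illustrative. It is not the authors' MATLAB pipeline.

```python
"""Nested cross-validation with Monte Carlo replications (illustrative)."""
import random
from statistics import mean


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    votes = sum(train_y[i] for i in order[:k])
    return 1 if 2 * votes > k else 0


def kfold_indices(n, n_folds, rng):
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def accuracy(train_x, train_y, test_x, test_y, k):
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


def split(x, y, fold):
    held = set(fold)
    tr = [i for i in range(len(y)) if i not in held]
    return ([x[i] for i in tr], [y[i] for i in tr],
            [x[i] for i in fold], [y[i] for i in fold])


def inner_select_k(x, y, grid, n_folds, rng):
    """Inner loop: pick k by CV on the outer training set only."""
    folds = kfold_indices(len(y), n_folds, rng)

    def cv_score(k):
        return mean(accuracy(*split(x, y, fold), k) for fold in folds)

    return max(grid, key=cv_score)


def nested_cv(x, y, grid=(1, 3, 5, 7), outer=5, inner=3, replications=10, seed=0):
    """Outer loop scores a model tuned without seeing its test fold; each
    Monte Carlo replication reshuffles the data into new folds."""
    rng = random.Random(seed)
    rep_scores = []
    for _ in range(replications):
        fold_scores = []
        for fold in kfold_indices(len(y), outer, rng):
            tr_x, tr_y, te_x, te_y = split(x, y, fold)
            k = inner_select_k(tr_x, tr_y, grid, inner, rng)
            fold_scores.append(accuracy(tr_x, tr_y, te_x, te_y, k))
        rep_scores.append(mean(fold_scores))
    return mean(rep_scores), rep_scores


# Synthetic two-class data (not study data): class 1 is shifted by +2.
gen = random.Random(42)
x = ([(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(20)]
     + [(gen.gauss(2, 1), gen.gauss(2, 1)) for _ in range(20)])
y = [0] * 20 + [1] * 20
score, per_rep = nested_cv(x, y)
print(f"mean outer accuracy over replications: {score:.3f}")
```

Reporting the spread of `per_rep` alongside the mean is one way to show how much the estimate depends on the particular random split, which is the point of the Monte Carlo replication.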

Source journal: PeerJ (Multidisciplinary Sciences)

CiteScore: 4.70

Self-citation rate: 3.70%

Articles published: 1665

Review time: 10 weeks

Journal description: PeerJ is an open-access, peer-reviewed scientific journal covering research in the biological and medical sciences. At PeerJ, authors take out a lifetime publication plan (for as little as $99) that allows them to publish articles in the journal free of charge for life. PeerJ has five Nobel Prize winners on its board, has won several industry and media awards, and is widely recognized as one of the most interesting recent developments in academic publishing.