Title: Natural language processing in Alzheimer's disease research: Systematic review of methods, data, and efficacy.
Authors: Arezo Shakeri, Mina Farmanbar
Journal: Alzheimer's and Dementia: Diagnosis, Assessment and Disease Monitoring, 17(1), e70082
Publication date: 2025-02-11
Publication type: Journal Article
DOI: https://doi.org/10.1002/dad2.70082
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11812127/pdf/
Impact factor: 4.0 (JCR Q1, Clinical Neurology)
Citation count: 0
Abstract
Introduction: Alzheimer's disease (AD) prevalence is increasing, with no current cure. Natural language processing (NLP) offers the potential for non-invasive diagnostics, social burden assessment, and research advancements in AD.
Method: A systematic review using Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines explored NLP applications in AD, focusing on dataset types, sources, research foci, methods, and effectiveness. Searches were conducted across six databases (ACM, Embase, IEEE, PubMed, Scopus, and Web of Science) from January 2020 to July 2024.
Results: Of 1740 records, 79 studies were selected. Frequently used datasets included speech and electronic health records (EHR), along with social media and scientific publications. Machine learning and neural networks were primarily applied to speech, EHR, and social media data, while rule-based methods were used to analyze literature datasets.
Discussion: NLP has proven effective in various aspects of AD research, including diagnosis, monitoring, social burden assessment, biomarker analysis, and research. However, there are opportunities for improvement in dataset diversity, model interpretability, multilingual capabilities, and addressing ethical concerns.
Highlights: This review systematically analyzed 79 studies from six major databases, focusing on the advancements and applications of natural language processing (NLP) in Alzheimer's disease (AD) research. The study highlights the need for models focusing on remote monitoring of AD patients using speech analysis, offering a cost-effective alternative to traditional methods such as brain imaging and aiding clinicians in both prediagnosis and post-diagnosis periods. The use of pretrained multilingual models is recommended to improve AD detection across different languages by leveraging diverse speech features and utilizing publicly available datasets.
About the journal:
Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring (DADM) is an open access, peer-reviewed journal from the Alzheimer's Association® that will publish new research that reports the discovery, development, and validation of instruments, technologies, algorithms, and innovative processes. Papers will cover a range of topics related to the early and accurate detection of individuals with memory complaints and/or asymptomatic individuals at elevated risk for various forms of memory disorders. Published papers will be expected to translate fundamental knowledge about the neurobiology of the disease into practical reports that describe both the conceptual and methodological aspects of the submitted scientific inquiry. Published topics will explore the development of biomarkers, surrogate markers, and conceptual/methodological challenges. Publication priority will be given to papers that 1) describe putative surrogate markers that accurately track disease progression, 2) describe biomarkers that fulfill international regulatory requirements, 3) report on large, well-characterized population-based cohorts that capture the heterogeneity and diversity of asymptomatic individuals, and 4) present algorithmic development that considers multi-marker arrays (e.g., integrated omics, genetics, biofluids, imaging) and advanced computational analytics and technologies.