Pub Date : 2026-01-27 DOI: 10.64898/2026.01.26.26344845
Mirage Modi, Jordan E Krull, Donte Johnson, Xiaoying Wang, Timothy D Gauntner, Mingjia Li, Hao Cheng, Anjun Ma, Ping Zhang, Daniel G Stover, Zihai Li, Qin Ma
Background: Medical large language models (LLMs) achieving high benchmark accuracy exhibit unexplained variability in clinical tasks, producing errors that clinicians cannot safeguard against. Sparse autoencoders offer a mechanistic interpretability approach to reveal how models represent medical knowledge and why they fail.
Methods: We evaluated clinical reasoning stability in GPT-5, MedGemma-27B-Text-IT, and OpenBioLLM-Llama3-70B using 355 systematic perturbations in physician-validated oncology cases, comparing staging and treatment against National Comprehensive Cancer Network (NCCN) and American Joint Committee on Cancer (AJCC) 8/9th edition guidelines. We trained sparse autoencoders on 1 billion tokens from 50,000 Medical Information Mart for Intensive Care (MIMIC-IV) clinical notes to analyze how each architecture encodes polysemous medical terms, performed 850 ablation experiments, and tested a two-stage retrieval intervention for sense disambiguation.
Results: Models exhibited dramatic reasoning instability. OpenBioLLM staging accuracy shifted from 45.9% to 99.1% based solely on prompt format. For cases with intentionally insufficient information, MedGemma generated definitive staging 100% of the time while GPT-5 appropriately indicated uncertainty, yet GPT-5 was only 7% accurate in staging IVB tumors. Overall, AJCC concordance was 97.5% for GPT-5 versus 26.8% for MedGemma and 31.0% for OpenBioLLM. Sparse autoencoder analysis revealed MedGemma's top-10 features showed 77.8% overlap across word senses (e.g., "cardiac arrest" versus "respiratory arrest") versus 13.6% for OpenBioLLM, indicating hierarchical versus distributed encoding. Ablating context-encoding features degraded prediction by 16.4 nats in MedGemma versus 6.7 nats in OpenBioLLM. A retrieval intervention improved MedGemma disambiguation by 10.2% (p=0.022, paired t-test) but harmed OpenBioLLM by 2.0% (p=0.021).
Conclusions: Medical AI systems exhibit clinical reasoning fragility not captured by benchmark performance. We find that architecturally distinct models encode medical concepts differently, and interventions effective for one architecture may harm another. We recommend that safety validation be architecture-specific, as benchmark equivalence does not imply functional equivalence.
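The top-10 feature overlap reported in the Results can be illustrated with a minimal Python sketch. It assumes (the abstract does not say) that overlap is the share of feature indices common to the two top-k sets, ranked by mean activation. The function names and the toy activation vectors are hypothetical and are not the authors' pipeline.

```python
def top_k_features(activations, k=10):
    """Return the indices of the k largest activations as a set."""
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i], reverse=True)
    return set(ranked[:k])


def top_k_overlap(acts_sense_a, acts_sense_b, k=10):
    """Fraction of the top-k feature indices shared by two word senses.

    Assumption: overlap = |A ∩ B| / k. The source abstract does not
    define its overlap measure, so this is one plausible reading.
    """
    shared = top_k_features(acts_sense_a, k) & top_k_features(acts_sense_b, k)
    return len(shared) / k


# Hypothetical mean activations of six autoencoder features for two senses.
sense_a = [0.9, 0.1, 0.8, 0.0, 0.7, 0.2]   # top-3 indices: 0, 2, 4
sense_b = [0.8, 0.6, 0.1, 0.0, 0.9, 0.3]   # top-3 indices: 4, 0, 1
overlap = top_k_overlap(sense_a, sense_b, k=3)
print(f"top-3 overlap: {overlap:.3f}")
```

Under this reading, a high overlap means the same features fire for both senses, and a low overlap means each sense has largely its own features.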
{"title":"Why Large Language Models' Clinical Reasoning Fails: Insights from Explainable Deep Learning.","authors":"Mirage Modi, Jordan E Krull, Donte Johnson, Xiaoying Wang, Timothy D Gauntner, Mingjia Li, Hao Cheng, Anjun Ma, Ping Zhang, Daniel G Stover, Zihai Li, Qin Ma","doi":"10.64898/2026.01.26.26344845","DOIUrl":"https://doi.org/10.64898/2026.01.26.26344845","url":null,"PeriodicalId":94281,"journal":{"name":"medRxiv : the preprint server for health sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12870575/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146128303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-26 DOI: 10.64898/2026.01.03.26343378
Neil R Smalheiser, Joe D Menke, Arthur W Holt
Objective: Our goal is to unify the 72 biomedical publication types and study designs (collectively, PTs) into a single rubric and hierarchy.
Materials and methods: This is carried out in a data-driven manner by computing pairwise similarities of each PT against all others to form a similarity matrix. By performing hierarchical clustering we place each PT in a specific category and collect these into broader categories.
Results: Spearman correlations among PT pairs ranged from strongly negative to strongly positive (-0.732 to +0.997), with a mean of 0.176. Overall, we obtained 13 clusters of PTs and 5 more general categories: Observational Clinical Research, Qualitative and Genetic Methods, Clinical Evaluation and Validation, Interventional Trial Research, and Scholarly Synthesis and Discourse. These were then utilized to construct a unified hierarchy of PT terms.
Discussion: The rubric provides a flexible classification scheme for publication types and study designs that can accommodate new PTs as they are added over time.
Conclusion: The similarity metric has the potential to improve the modeling, implementation and evaluation of automated indexing systems. The PT rubric provides an overview that complements the existing NIH MeSH Hierarchy trees, and the unified hierarchy permits proper automated expansion for PT indexing and PubMed user queries involving PT terms.
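The similarity-matrix and clustering steps described in the Materials and methods can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: it assumes each PT is represented by a score vector over a shared set of articles, converts Spearman correlation to a distance of 1 - rho, and cuts a single-linkage hierarchy at a fixed distance. The abstract does not specify the linkage method, the cut level, or the underlying features, and all scores below are invented.

```python
from itertools import combinations


def ranks(values):
    """Average 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def single_linkage_clusters(scores, max_distance):
    """Cut a single-linkage hierarchy at max_distance (distance = 1 - rho).

    Cutting single linkage at a threshold is equivalent to taking the
    connected components of the graph whose edges join pairs closer than
    the threshold, which is what the union-find below computes.
    """
    parent = {name: name for name in scores}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(scores, 2):
        if 1 - spearman(scores[a], scores[b]) <= max_distance:
            parent[find(a)] = find(b)

    clusters = {}
    for name in scores:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical per-article scores for four PTs over six articles.
pt_scores = {
    "Randomized Controlled Trial": [0.9, 0.8, 0.1, 0.2, 0.7, 0.05],
    "Clinical Trial": [0.85, 0.9, 0.2, 0.15, 0.6, 0.1],
    "Review": [0.1, 0.05, 0.9, 0.8, 0.2, 0.7],
    "Systematic Review": [0.15, 0.1, 0.8, 0.9, 0.25, 0.6],
}
clusters = single_linkage_clusters(pt_scores, max_distance=0.5)
print(clusters)
```

With these toy scores the two trial types fall into one cluster and the two review types into another. Repeating the cut at a larger distance would merge clusters into broader categories, mirroring the two-level structure the abstract describes.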
{"title":"A Similarity Metric, Rubric and Unified Hierarchy for Biomedical Publication Types and Study Designs.","authors":"Neil R Smalheiser, Joe D Menke, Arthur W Holt","doi":"10.64898/2026.01.03.26343378","DOIUrl":"10.64898/2026.01.03.26343378","url":null,"PeriodicalId":94281,"journal":{"name":"medRxiv : the preprint server for health sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12772678/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145919698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-26 DOI: 10.64898/2026.01.24.25343227
Enrique Perez-Benavides, Jueqi Wang, Zijian Chen, Stefen Beeler-Duden, Zachary Jackokes, John D Van Horn, Michael C Schatz, Kevin A Pelphrey, Archana Venkataraman
Standardized behavioral assessments for Autism Spectrum Disorder provide quantitative measures of symptoms, yet their reliability and consistency have not been systematically evaluated. We present the first large-scale comparative analysis of four widely used assessments. We analyzed behavioral assessments across three autism cohorts using correlations, clustering, and diagnostic agreement analyses. We related behavioral variation to genetic and imaging data to evaluate biomarker associations. Sentence-level embeddings generated by large language models reveal substantial semantic overlap across instruments. Nonetheless, behavioral scores are weakly correlated (0.26 ± 0.21), and diagnostic classification shows only 65-80% agreement between tests. These patterns hold across three datasets comprising N = 1,954. None of the assessments show consistent associations with widely studied MRI or genetic biomarkers. These findings expose critical inconsistencies among widely used autism assessments and underscore the need for more reliable tools to support precision phenotyping, biomarker discovery, and individualized care. Rather than diminishing the utility of behavioral assessment in autism, the inconsistencies identified here highlight a critical opportunity to refine how behavioral phenotypes are defined and operationalized.
{"title":"Behavioral Assessment Reliability in Clinical Phenotyping and Biomarker Research for Autism.","authors":"Enrique Perez-Benavides, Jueqi Wang, Zijian Chen, Stefen Beeler-Duden, Zachary Jackokes, John D Van Horn, Michael C Schatz, Kevin A Pelphrey, Archana Venkataraman","doi":"10.64898/2026.01.24.25343227","DOIUrl":"https://doi.org/10.64898/2026.01.24.25343227","url":null,"PeriodicalId":94281,"journal":{"name":"medRxiv : the preprint server for health sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12870647/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146128007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-26 DOI: 10.64898/2026.01.23.26344650
Linda K McEvoy, Bowei Zhang, Steve Nguyen, Adam X Maihofer, Caroline M Nievergelt, Casanova Ramon, Steve Horvath, Ake T Lu, Christos Davatzikos, Guray Erus, Susan M Resnick, Mark A Espeland, Stephen R Rapp, Kenneth Beckman, Luigi Ferrucci, Andrea Z LaCroix, Aladdin H Shadyab
Epigenetic clocks of biological aging have been associated with cognitive impairment and dementia. Less is known about whether they are associated with an older-appearing brain or with an atrophy pattern associated with dementia. We examined associations of five epigenetic clocks measured at baseline with the Spatial Pattern of Atrophy for Recognition of Brain Aging (SPARE-BA) and the Alzheimer's Disease Pattern Similarity Score (AD-PS) derived from structural MRIs obtained an average of 8 years later among 1,196 older women. Using linear regression models adjusting for relevant covariates, we observed no associations between any epigenetic clock and accelerated brain aging based on SPARE-BA. We observed a significant association between AgeAccelGrim2 and AD-PS (β = 0.015; 95% CI 0.004 to 0.027; p = 0.01). This association appeared to be primarily driven by the association of a DNA methylation marker of smoking pack years with frontal and temporal lobe volumes. AgeAccelGrim2 was not associated with volumes in regions implicated in early AD (hippocampus and entorhinal cortex). Taken together with prior findings, these results suggest that measures of epigenetic and brain age acceleration capture different aspects of biological aging, and that AgeAccelGrim2 is predictive of neurodegenerative changes associated with smoking that increase risk of dementia.
{"title":"Association of epigenetic age acceleration with MRI biomarkers of aging and Alzheimer's disease neurodegeneration.","authors":"Linda K McEvoy, Bowei Zhang, Steve Nguyen, Adam X Maihofer, Caroline M Nievergelt, Casanova Ramon, Steve Horvath, Ake T Lu, Christos Davatzikos, Guray Erus, Susan M Resnick, Mark A Espeland, Stephen R Rapp, Kenneth Beckman, Luigi Ferrucci, Andrea Z LaCroix, Aladdin H Shadyab","doi":"10.64898/2026.01.23.26344650","DOIUrl":"https://doi.org/10.64898/2026.01.23.26344650","url":null,"PeriodicalId":94281,"journal":{"name":"medRxiv : the preprint server for health sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12870710/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146127986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-26 DOI: 10.64898/2026.01.23.26344710
Andrew J Vickers, Adam Brentnall
Objective: To determine the potential effect of implementing population-based prostate-specific antigen (PSA) screening in England on overdiagnosis and testing rates compared with the current opportunistic testing policy.
Design: Statistical modeling study. English data on rates of prostate cancer by stage, symptomatic and asymptomatic PSA testing, and life expectancy were merged with epidemiological assumptions on lead time to evaluate plausible overdiagnosis and PSA testing rates from an organized population-based program, in comparison with the current opportunistic policy. In the base-case scenario, organized screening increased the rate of asymptomatic PSA testing (screening) in men aged 50-69 years and decreased PSA testing in older men. An alternative modeling approach estimated the change in overdiagnosis using data from the CAP trial and current asymptomatic cancer detection rates.
Setting: England, 2018/19.
Participants: Adult men.
Main outcome measures: Rates of PSA testing, early-stage prostate cancer incidence, and overdiagnosis (prostate cancer that would not be diagnosed in a man's lifetime but for the PSA test).
Results: In the base scenario, introduction of population-based screening led to an approximate 25% reduction in both PSA testing and overdiagnosis rates in the target population compared with the current policy. This was due to the anticipated decrease in PSA testing and overdiagnosis in men aged 70+ years being larger than the projected increase in PSA testing and overdiagnosis in men aged 50-69 years. The overall incidence of early-stage prostate cancer was similar. Population-based screening was found to detect more early-stage cancers that were not overdiagnosed, and is therefore likely to have a greater impact on prostate-cancer morbidity and mortality than current policy. Findings were robust in sensitivity analyses, including an entirely separate modeling approach.
Conclusion: Opportunistic screening policies in England have led to high rates of overdiagnosis and PSA testing. In comparison with current policy, a risk-adapted, population-based prostate cancer screening program would likely reduce the number of PSA tests and overdiagnoses, and increase the benefits of PSA testing from reduced prostate-cancer mortality. Population health in England could be improved by either adopting an organized program or by prohibiting PSA testing of asymptomatic men in primary care.
{"title":"Effect of implementing population-based prostate-specific antigen screening on testing rates and prostate cancer overdiagnosis in England: a statistical modelling study.","authors":"Andrew J Vickers, Adam Brentnall","doi":"10.64898/2026.01.23.26344710","DOIUrl":"https://doi.org/10.64898/2026.01.23.26344710","url":null,"PeriodicalId":94281,"journal":{"name":"medRxiv : the preprint server for health sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12870584/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146128159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-26 DOI: 10.64898/2026.01.24.26344771
Platon Lukyanenko, Sunil Ghelani, Yuting Yang, Bohan Jiang, Timothy Miller, David Harrild, Nao Sasaki, Francesca Sperotto, Danielle Sganga, John Triedman, Andrew J Powell, Tal Geva, William G La Cava, Joshua Mayourian
Background: Delayed or missed diagnosis of congenital heart disease (CHD) contributes to excess pediatric mortality worldwide. Echocardiography (echo) is central to diagnosing and triaging CHD, yet expert interpretation remains a scarce and maldistributed global resource. Artificial intelligence (AI) offers the potential to democratize diagnostics and extend expert-level interpretation beyond large academic centers, but its application in CHD remains underexplored.
Methods: We developed EchoFocus-CHD, an AI-enabled model for automated detection of 12 critical and 8 non-critical CHD lesions, individually and as composites. The composite critical CHD outcome was the primary endpoint. The model expands on a multi-task, view-agnostic architecture (PanEcho) with a transformer encoder to improve focus on relevant echo views. The model was trained (80%) and tested (20%) on the first echo per patient from Boston Children's Hospital (BCH), with external validation on US and international studies from patients referred to BCH.
Results: The internal and external cohorts included 3.4 million videos from 54,727 echos (median age at echo 7.1 [IQR, 0.2-15.0] years; 5.8% critical CHD; 23.6% non-critical CHD) and 167,484 videos from 3,356 echos (median age at echo 2.5 [IQR, 0.3-9.4] years; 29.4% critical CHD; 45.6% non-critical CHD), respectively. EchoFocus-CHD showed excellent internal ability to detect the composite critical CHD outcome (AUROC 0.94, LR+ 7.50, LR- 0.14) and individual critical lesions (AUROC 0.83-1.00), as well as composite non-critical CHD (AUROC 0.90, LR+ 5.00, LR- 0.23) and individual non-critical lesions (AUROC 0.70-0.96). Performance declined during external validation to detect critical CHD (AUROC 0.77), coinciding with greater expert disagreement on external cases (κ=0.72 versus 0.82 for internal cases). Explainability analyses demonstrated that the model prioritized the same clinically relevant views (parasternal long-axis, parasternal short-axis, and subxiphoid long-axis) across internal and external cohorts, while UMAP analysis revealed a domain shift between cohorts. Retraining on all available US patients attenuated domain shift, improving international critical CHD detection (AUROC 0.87) and calibration.
Conclusions: EchoFocus-CHD shows promise for automated CHD detection and highlights the need to address domain shift for real-world deployment. By identifying high-risk CHD lesions, this approach could support triage, prioritize expert review, and optimize resource allocation, advancing more equitable global cardiovascular care.
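The likelihood ratios quoted in the Results can be related to sensitivity, specificity, and post-test probability using the standard definitions. The Python sketch below is illustrative only: the function names are hypothetical, the sensitivity and specificity in the first call are made-up values, and using the internal cohort's 5.8% critical CHD prevalence as the pre-test probability is an assumption made here, not an analysis reported by the authors.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Made-up operating point, only to show the definitions.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.80)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")

# Reported internal values for composite critical CHD: LR+ 7.50, LR- 0.14.
# Assumption: pre-test probability = 5.8% (internal cohort prevalence).
after_positive = post_test_probability(0.058, 7.50)   # roughly 0.32
after_negative = post_test_probability(0.058, 0.14)   # roughly 0.009
print(f"after a positive result: {after_positive:.3f}")
print(f"after a negative result: {after_negative:.4f}")
```

Under that assumed pre-test probability, a positive model output would raise the probability of critical CHD to roughly one in three, and a negative output would lower it to under 1%. In a referral population with higher prevalence the same ratios would give different post-test probabilities.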
{"title":"Automated Echocardiographic Detection of Congenital Heart Disease Using Artificial Intelligence.","authors":"Platon Lukyanenko, Sunil Ghelani, Yuting Yang, Bohan Jiang, Timothy Miller, David Harrild, Nao Sasaki, Francesca Sperotto, Danielle Sganga, John Triedman, Andrew J Powell, Tal Geva, William G La Cava, Joshua Mayourian","doi":"10.64898/2026.01.24.26344771","DOIUrl":"https://doi.org/10.64898/2026.01.24.26344771","url":null,"PeriodicalId":94281,"journal":{"name":"medRxiv : the preprint server for health sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2026-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12870692/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146128029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-26 DOI: 10.64898/2026.01.22.26344669
Youjin Lee, Seonguk Kim, Sangil Kim, Yeona Kang
Background: Distinguishing individuals with cognitive decline (CD), including early Alzheimer's disease, from cognitively normal (CN) individuals is essential for improving diagnostic accuracy and enabling timely intervention. Positron emission tomography (PET) captures functional brain alterations associated with CD, but its broader application is often limited by cost and radiation exposure. To enhance the clinical utility of PET while addressing data limitations, we propose a multi-representational learning framework that leverages both imaging data and region-level quantification in a data-efficient manner.
Methods: Voxel-level features were extracted from [¹⁸F]FDG PET images using convolutional neural networks (CNN) or principal component analysis networks (PCANet). Region-level features were derived from standardized uptake value ratio measurements across predefined brain regions and processed using a deep neural network (DNN). The voxel- and region-level features were integrated through direct concatenation. For the final prediction, several machine learning models and an ensemble technique were applied. The models were trained and validated using 5-fold cross-validation on PET scans from 252 participants in the Alzheimer's Disease Neuroimaging Initiative (ADNI), comprising 118 CN and 134 CD subjects. A correlation analysis and a disease classification comparison with the Mini-Mental State Examination (MMSE) were also performed.
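The fusion step described in the Methods can be illustrated with a minimal sketch. It assumes each subject already has one voxel-level and one region-level feature vector; the function names `fuse_features` and `k_fold_indices`, the feature dimensions (8 and 4), and the random values are hypothetical and are not taken from the study.

```python
import random

def fuse_features(voxel_feats, region_feats):
    """Concatenate per-subject voxel-level and region-level feature vectors."""
    if len(voxel_feats) != len(region_feats):
        raise ValueError("both feature sets must cover the same subjects")
    return [list(v) + list(r) for v, r in zip(voxel_feats, region_feats)]

def k_fold_indices(n_subjects, k=5, seed=0):
    """Shuffle subject indices and split them into k near-equal validation folds."""
    order = list(range(n_subjects))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]

# Hypothetical data: 252 subjects (as in the abstract), with assumed
# dimensions of 8 voxel-level and 4 region-level features per subject.
rng = random.Random(42)
voxel = [[rng.random() for _ in range(8)] for _ in range(252)]
region = [[rng.random() for _ in range(4)] for _ in range(252)]

fused = fuse_features(voxel, region)
folds = k_fold_indices(len(fused), k=5)

print(len(fused), len(fused[0]))  # subjects, fused feature length
print([len(f) for f in folds])    # validation fold sizes
```

Direct concatenation keeps both representations intact and leaves it to the downstream classifier to weight them. The sketch does not stratify folds by diagnosis, which a real analysis might do.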
Results: In 5-fold cross-validation, the CNN, PCANet, and DNN models achieved classification accuracies of 0.69 ± 0.04, 0.69 ± 0.06, and 0.82 ± 0.06, respectively. The integrated DNN-CNN model using direct concatenation yielded the highest accuracy (0.87 ± 0.05), a 6.10% relative improvement in accuracy with a reduced standard deviation compared with the DNN-only model. Recall increased by 14.29% (0.77 to 0.88) and F1-score by 7.32% (0.82 to 0.88). The model output was also significantly associated with MMSE, and it outperformed MMSE-based classification in accuracy, recall, and F1-score, but not in precision.
Conclusion: Combining PET imaging with region-level quantification and deep learning improves diagnostic performance over single-feature-based models. Notably, fusion-based approaches enhanced sensitivity to cognitive decline. This multimodal strategy offers a more data-efficient and accurate approach for classifying cognitive decline and supports broader PET application in clinical settings.
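The percentage gains quoted in the Results are consistent with relative (not absolute) changes between the rounded scores. A short check, assuming the rounded values from the abstract as inputs (the helper name `relative_change_pct` is illustrative):

```python
def relative_change_pct(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Rounded scores as reported in the abstract: (DNN-only, fused DNN-CNN).
reported = {
    "accuracy": (0.82, 0.87),
    "recall": (0.77, 0.88),
    "f1": (0.82, 0.88),
}

for metric, (before, after) in reported.items():
    print(f"{metric}: {relative_change_pct(before, after):.2f}%")
```

In absolute terms the same gains are 5, 11, and 6 percentage points.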
{"title":"Feature Integration of [¹⁸F]FDG PET Brain Imaging Using Deep Learning for Sensitive Cognitive Decline Detection.","doi":"10.64898/2026.01.22.26344669","DOIUrl":"https://doi.org/10.64898/2026.01.22.26344669","journal":{"name":"medRxiv : the preprint server for health sciences"},"publicationDate":"2026-01-26","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12870617/pdf/","platform":"Semanticscholar","paperid":"146128360"}
Pub Date : 2026-01-26 DOI: 10.64898/2026.01.25.26344776
Yaqi Huang, Meng Hao, Shuai Jiang, Xiangnan Li, Yulong Tang, Zixin Hu, Xiaofeng Wang, Li Han, Yi Li, Hui Zhang
Importance: Frailty is a multisystem syndrome that reflects age-related physiological decline, underscoring the need for more biologically informed risk stratification within frailty assessments. Frailty and heart stress (HS) are individually associated with increased mortality risk, but their combined effects remain largely unexplored.
Objective: To evaluate whether combined exposure to frailty and HS is associated with an increased risk of mortality.
Design, setting, and participants: This prospective cohort study used data from the US National Health and Nutrition Examination Survey (NHANES) and the Health and Retirement Study (HRS). Participants with complete data on frailty and HS were included. Analyses were performed between May 2025 and October 2025.
Exposure: Frailty was assessed using three frailty indices (FI) based on self-reported items (FI-Self-report), blood biomarkers (FI-Lab), and their combination (FI-Combined). HS was defined by age-adjusted elevation in N-terminal pro-B-type natriuretic peptide (NT-proBNP) levels. Participants were classified into four groups according to baseline frailty and HS status.
Main outcomes and measures: The primary outcome was all-cause mortality. Cox proportional hazards models were used to calculate hazard ratios (HRs) and 95% confidence intervals (CIs).
Results: A total of 12,252 participants from NHANES (mean age 49.91 years, 52.18% female) and 9,488 participants from HRS (mean age 69.16 years, 58.97% female) were included. Compared with those having neither frailty nor HS, participants with frailty and/or HS showed significantly elevated mortality risk in both cohorts, with HRs ranging from 1.81 to 5.54. The highest mortality risk was observed in participants with both frailty and HS: in NHANES, the HRs were 3.58 (95% CI: 3.20-4.01) for FI-Self-report, 3.43 (95% CI: 3.04-3.86) for FI-Lab, and 4.15 (95% CI: 3.70-4.67) for FI-Combined; the corresponding HRs in HRS were 5.02 (95% CI: 4.38-5.76), 4.73 (95% CI: 4.13-5.41), and 5.54 (95% CI: 4.84-6.35), respectively.
Conclusions and relevance: The co-occurrence of frailty and HS is common and is associated with increased mortality risk in the general population. These findings support integrating HS into frailty assessments to improve mortality risk stratification and guide targeted interventions.
Key points: Question: Is the combination of frailty and heart stress (HS) associated with increased mortality risk? Findings: In this prospective cohort study including 12,252 participants from the US National Health and Nutrition Examination Survey (NHANES) and 9,488 participants from the Health and Retirement Study (HRS), participants with frailty and/or HS exhibited a higher risk of all-cause mortality. The greatest mortality risk was found among partici
{"title":"Heart Stress, Frailty and Mortality Risk in two prospective cohorts.","doi":"10.64898/2026.01.25.26344776","DOIUrl":"https://doi.org/10.64898/2026.01.25.26344776","journal":{"name":"medRxiv : the preprint server for health sciences"},"publicationDate":"2026-01-26","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12870719/pdf/","platform":"Semanticscholar","paperid":"146127995"}
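The hazard ratios in the frailty and heart stress abstract above can be sanity-checked with a little arithmetic. The sketch below assumes the 95% CIs are Wald intervals, symmetric on the log-hazard scale. That is the usual convention for Cox models, but the abstract does not confirm it. Under this assumption the point estimate should sit near the geometric mean of the bounds, and the standard error of the log-HR can be recovered from the interval width. The helper names are illustrative.

```python
import math

def log_hr_se(lower, upper, z=1.96):
    """Standard error of log(HR) implied by a Wald CI (assumed symmetric on the log scale)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def ci_geometric_mean(lower, upper):
    """Point estimate implied by a CI that is symmetric on the log scale."""
    return math.sqrt(lower * upper)

# HRs for participants with both frailty and HS, as reported in the abstract:
# (label, reported HR, CI lower bound, CI upper bound)
reported = [
    ("NHANES FI-Self-report", 3.58, 3.20, 4.01),
    ("NHANES FI-Combined", 4.15, 3.70, 4.67),
    ("HRS FI-Combined", 5.54, 4.84, 6.35),
]

for label, hr, lo, hi in reported:
    implied = ci_geometric_mean(lo, hi)
    se = log_hr_se(lo, hi)
    print(f"{label}: reported HR {hr:.2f}, implied {implied:.2f}, SE(log HR) {se:.3f}")
```

Agreement between the reported and implied values only shows that the intervals are internally consistent with a log-scale construction; it says nothing about model adequacy.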
Pub Date : 2026-01-26 DOI: 10.64898/2026.01.22.26343677
Niveditha Ravindra, Justin Lack, Clifton L Dalgard, Eddy vanCollenburg, Adam Corner, Lan Beppu, Harry Erba, Megan Othus, Jerald P Radich, Laura W Dillon, Christopher S Hourigan
Post-treatment measurable residual disease (MRD) in acute myeloid leukemia (AML) patients is associated with adverse clinical outcomes. Validated molecular methods for AML MRD are preferable to flow cytometry assays but are not available for all patients. The limit of detection (LOD) of next-generation sequencing (NGS) assays for single nucleotide variants is restricted by technical error rates. Structural alterations are common genetic features of AML, but MRD approaches for detecting this class of variants have primarily relied on RNA. However, RNA has suboptimal stability, not all structural alterations are expressed as transcripts, and the impact of anti-leukemic therapy on transcription may make leukemic disease burden quantification inaccurate. In this study, we demonstrate a whole genome sequencing (WGS)-based approach to identify genomic DNA breakpoints of chromosomal rearrangements that allowed design of highly sensitive patient-personalized digital droplet PCR (ddPCR) MRD assays. Acute myeloid leukemia (AML) is an aggressive malignancy of the hematopoietic precursor cells that predominantly affects older individuals. Oncogenic transformation occurring through the acquisition of structural chromosomal aberrations is noted in 35% of AML cases, and can result in the formation of fusion proteins that confer proliferation and survival advantages (1). When compared to classical cytogenetics for the identification of structural variants at diagnosis, newer techniques such as optical genome mapping can identify clinically pertinent aberrations that may be cryptic or smaller than the resolution of conventional karyotyping and FISH (2). Similarly, short-read whole genome sequencing (WGS) has been shown to increase diagnostic yield and better refine risk stratification when compared to traditional cytogenetic testing in myeloid malignancies (3). 
Additionally, WGS can be used to identify the genomic breakpoints of chromosomal rearrangements at base-pair (bp) resolution.
{"title":"Whole Genome Sequencing Informed Patient Personalized Measurable Residual Disease Assays for Acute Myeloid Leukemia.","doi":"10.64898/2026.01.22.26343677","DOIUrl":"https://doi.org/10.64898/2026.01.22.26343677","journal":{"name":"medRxiv : the preprint server for health sciences"},"publicationDate":"2026-01-26","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12870688/pdf/","platform":"Semanticscholar","paperid":"146128333"}
Pub Date : 2026-01-26 DOI: 10.64898/2026.01.25.26344738
Konstantinos Chiotis, Ganna Blazhenets, David N Soleimani-Meigooni, Paul S Aisen, Alinda Amuiri, Alireza Atri, Laurel Beckett, Michael Brickhouse, David G Clark, Jeffrey L Dage, Gregory S Day, Ranjan Duara, Ani Eloyan, Tatiana Foroud, Neill R Graff-Radford, Ian M Grant, Dustin B Hammers, Lawrence S Honig, Erik C B Johnson, David T Jones, Kala Kirby, Robert Koeppe, Joel H Kramer, Walter A Kukull, Julien Lagarde, Antoine Leuzy, Piyush Maiti, Joseph C Masdeu, Mario F Mendez, Erik Musiek, Kelly N Nudelman, Chiadi U Onyike, Meghan Riddle, Salma Rocha, Emily Rogalski, Stephen Salloway, Daniel R Schonhaut, Sharon Sha, Ranjani Shankar, Alexander Taurone, Maryanne Thangarajah, Arthur W Toga, Alexandra Touroutoglou, Raymond Scott Turner, Prashanthi Vemuri, Thomas S Wingo, David A Wolk, Kyle Womack, Jiaxiuxiu Zhang, Maria C Carrillo, Bradford C Dickerson, Liana G Apostolova, Renaud La Joie, Gil D Rabinovici
Early-onset Alzheimer's disease (EOAD) and late-onset AD (LOAD) differ in clinical presentation and rate of progression. We aimed to compare baseline and longitudinal tau PET burden, and their relationship with clinical variables, in amyloid-PET positive, cognitively impaired participants from the Longitudinal Early-Onset Alzheimer's Disease Study (EOAD; n=390) and the Alzheimer's Disease Neuroimaging Initiative (LOAD; n=211). Patients with EOAD showed higher baseline tau PET retention, broader neuroanatomical involvement, and faster accumulation over time than patients with LOAD, after adjusting for amyloid load and clinical stage. Tau PET showed stronger correlations with baseline amyloid burden and with clinical measures of global cognition and function in EOAD than in LOAD. We conclude that earlier age of onset in AD is linked to a more aggressive tauopathy, which in turn is a primary driver of clinical decline. These findings suggest that optimal therapeutic targets and strategies may differ between EOAD and LOAD.
One-sentence summary: Younger patients with Alzheimer's disease show more aggressive tau spread, suggesting age of onset defines distinct disease pathways with key clinical implications.
{"title":"Distinct Tau PET Dynamics in Early vs. Late Age-of-Onset Alzheimer's disease.","doi":"10.64898/2026.01.25.26344738","DOIUrl":"https://doi.org/10.64898/2026.01.25.26344738","journal":{"name":"medRxiv : the preprint server for health sciences"},"publicationDate":"2026-01-26","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12870685/pdf/","platform":"Semanticscholar","paperid":"146128040"}