
Journal of Biomedical Informatics: Latest Articles

Multi-scale cancer driver gene prediction by flexible data selection and network topology guidance
IF 4.5 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-01 | DOI: 10.1016/j.jbi.2025.104961
Jian Liu, Yingzan Ren, Guodong Xiao, Ponian Li, Chuanqi Sun, Jiaxin Chen, Fubin Ma, Rui Gao, Jia Mi, Haiyan Cong, Mingyi Wang, Yusen Zhang

Objective

Efficient and comprehensive prioritization of cancer driver genes in individual patients, in cancer cohorts, and across cancers (pan-cancer) is crucial for advancing cancer diagnosis and treatment. Existing methods are effective, but their accuracy gains appear to have plateaued, and they lack joint analysis across scales, flexibility in adapting to different cancers, and interpretability.

Methods

Here, we introduce GenMorw, a heterogeneous network framework that defines a novel association score between patients and their mutated genes, making it possible to estimate the likelihood that each mutated gene acts as a driver in a given patient. GenMorw flexibly integrates, or fully utilizes, the collected mutation, gene/miRNA expression, and methylation data and PPI networks to classify patient groups based on data-specific characteristics and to identify potential drivers at the individual, cancer, and pan-cancer levels.
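The abstract does not say how GenMorw computes its patient–gene score. Purely as an illustration of scoring a patient's mutated genes on a heterogeneous network, the sketch below uses a generic random walk with restart (RWR) started from one patient on a toy patient–gene graph with an added gene–gene (PPI) edge. The choice of RWR, the restart probability, the function names, and the toy cohort are all assumptions, not the authors' method.

```python
def build_graph(mutations, ppi_edges):
    """Undirected patient-gene graph plus gene-gene (PPI) edges."""
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for patient, genes in mutations.items():
        for gene in genes:
            link(patient, gene)
    for a, b in ppi_edges:
        link(a, b)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Visiting probabilities of a walker that jumps back to `seed` with prob. `restart`."""
    p0 = {node: 1.0 if node == seed else 0.0 for node in adj}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {node: restart * p0[node] for node in adj}
        for node, nbrs in adj.items():
            # Every node built by build_graph has at least one neighbour.
            share = (1.0 - restart) * p[node] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        converged = sum(abs(nxt[n] - p[n]) for n in adj) < tol
        p = nxt
        if converged:
            break
    return p


def patient_gene_scores(mutations, ppi_edges, patient, restart=0.3):
    """Rank the genes mutated in `patient` by RWR probability (illustrative score only)."""
    p = random_walk_with_restart(build_graph(mutations, ppi_edges), patient, restart)
    return sorted(((g, p[g]) for g in mutations[patient]), key=lambda item: -item[1])


# Hypothetical toy cohort: three patients, four genes, one made-up PPI edge.
mutations = {"P1": ["g1", "g2"], "P2": ["g1", "g3"], "P3": ["g1", "g4"]}
ppi_edges = [("g2", "g3")]
print(patient_gene_scores(mutations, ppi_edges, "P1"))
```

Because the walk restarts at the patient node, genes that are both close to that patient and well connected to the rest of the network receive more probability mass, which is one simple way to mix patient-specific and population-wide signals.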

Results

GenMorw outperforms existing algorithms, with an average per-cohort AUC improvement of 17.66%, and achieves higher overall accuracy through a cumulative ranking strategy in patient–gene heterogeneous networks. Beyond AUC, a range of other comparative evaluations consistently show GenMorw outperforming other algorithms across multiple cancers. Several genes predicted only by GenMorw, such as ANK3, CENPF, and COL7A1, which are absent from standard databases and were not identified by other methods, were validated as highly cancer-related through literature review and survival analysis. Strongly connected components and cliques extracted from the GenMorw-derived heterogeneous networks capture most of the predicted or known driver genes, further aiding driver gene prediction.
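The abstract reports that strongly connected components (SCCs) and cliques extracted from the GenMorw networks concentrate driver genes, but it does not give the extraction procedure. A minimal pure-Python sketch of the two textbook operations, Kosaraju's SCC algorithm on a directed graph and Bron-Kerbosch (with pivoting) maximal-clique enumeration on its undirected version, might look like this. The gene graph is hypothetical, and treating edges as undirected for clique search is an assumption.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm. graph: dict node -> iterable of successor nodes."""
    nodes = set(graph)
    for succs in graph.values():
        nodes.update(succs)
    # Pass 1: iterative DFS recording nodes in order of completion.
    visited, order = set(), []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(sorted(graph.get(start, ()))))]
        while stack:
            node, it = stack[-1]
            nxt = next((m for m in it if m not in visited), None)
            if nxt is None:
                stack.pop()
                order.append(node)
            else:
                visited.add(nxt)
                stack.append((nxt, iter(sorted(graph.get(nxt, ())))))
    # Pass 2: DFS on the reversed graph in reverse completion order.
    reverse = {n: [] for n in nodes}
    for u, succs in graph.items():
        for v in succs:
            reverse[v].append(u)
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        comp, stack = [], [start]
        assigned.add(start)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in reverse[u]:
                if v not in assigned:
                    assigned.add(v)
                    stack.append(v)
        components.append(sorted(comp))
    return components


def to_undirected(graph):
    """Symmetrize a directed graph, dropping self-loops."""
    adj = {n: set() for n in graph}
    for u, succs in graph.items():
        for v in succs:
            adj.setdefault(v, set())
            if u != v:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting. adj: dict node -> set of neighbours."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical directed gene graph (edges are made up for illustration).
graph = {"g1": ["g2"], "g2": ["g3"], "g3": ["g1", "g4"], "g4": ["g5"], "g5": ["g4"], "g6": []}
print(strongly_connected_components(graph))
print(maximal_cliques(to_undirected(graph)))
```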

Conclusion

We conclude that GenMorw, with its novel gene–patient score mechanism, offers a significant advance in cancer driver gene discovery by capturing both population-wide and patient-specific network signals, thereby improving predictive power and enabling deeper insights into cancer heterogeneity.
Journal of Biomedical Informatics, vol. 172, Article 104961
Citations: 0
Corrigendum to “Drug repositioning with metapath guidance and adaptive negative sampling enhancement” [J. Biomed. Inform. 171 (2025) 104916]
IF 4.5 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-01 | DOI: 10.1016/j.jbi.2025.104953
Yaozheng Zhou, Xingyu Shi, Lingfeng Wang, Jin Xu, Demin Li, Congzhou Chen
Journal of Biomedical Informatics, vol. 172, Article 104953
Citations: 0
Study on multimodal spatially-constrained contrastive learning for knee osteoarthritis severity grading
IF 4.5 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-01 | DOI: 10.1016/j.jbi.2025.104962
YuHao Wu, Zhijie Xiang, Yuzhe Tan, Jiayue Hu, Desheng Chen, Jing Zhao, Haicheng Wei
To address the limited feature coverage of single modalities and the class imbalance in knee osteoarthritis (KOA) classification, this study proposes a Multimodal Spatial-constraint Contrastive Learning (MSCL) model. Dynamic and static plantar pressure data and human keypoint trajectories are first acquired synchronously. The model feeds the dynamic plantar pressure and keypoint data into a multimodal spatial–temporal fusion branch, where graph convolutional networks extract spatial–temporal representations of the human keypoints and Transformers extract those of the dynamic pressure patterns; the two representations are then fused by cross-attention. Next, static plantar pressure is processed by a pyramid CNN to generate coarse-grained spatial constraint vectors, which serve as anatomical priors to regularize the fused representations. Finally, a contrastive learning framework establishes an explicit mapping between the enhanced representations and the Kellgren–Lawrence (KL) grading system, enabling precise stratification of KOA severity. Experiments show that MSCL achieves a macro-average accuracy of 0.94 in KL grading and improves F1-scores by 7% for minority categories with limited samples. This work establishes a novel paradigm for accurate KOA assessment through multimodal gait analysis.
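The abstract does not name the contrastive objective used to tie representations to KL grades. One common choice for pulling together samples that share a label is the supervised contrastive (SupCon) loss; the sketch below treats KL grades as labels. The temperature, the function names, and the toy 2-D embeddings are assumptions for illustration only.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean SupCon loss over anchors that have at least one same-label partner."""
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-grade partner contributes nothing
        # Scaled cosine similarity of anchor i to every other sample.
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        # Average negative log-probability of picking each positive.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Hypothetical KL grades and embeddings: same-grade samples close together ...
grades = [0, 0, 1, 1]
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# ... versus same-grade samples pointing in different directions.
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(supervised_contrastive_loss(clustered, grades), supervised_contrastive_loss(mixed, grades))
```

Lower loss means embeddings of the same grade cluster more tightly, which is the kind of label-aware structure a grade-mapping contrastive stage aims for.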
Journal of Biomedical Informatics, vol. 172, Article 104962
Citations: 0
SemNovel – A new approach to detecting semantic novelty of biomedical publications using embeddings of large language models
IF 4.5 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-01 | DOI: 10.1016/j.jbi.2025.104952
Xueqing Peng, Yutong Xie, Huan He, Brian Ondov, Kalpana Raja, Qijia Liu, Qiaozhu Mei, Hua Xu

Objective

The rapid growth of scientific literature necessitates robust methods for identifying novel contributions. However, there is currently no widely recognized measure of novelty in biomedical research. Existing approaches typically quantify novelty using isolated article features, such as keywords, MeSH terms, or references, potentially losing important context and nuance carried by the semantic content of the text.

Methods

We propose SemNovel, a semantic novelty detection framework that leverages embeddings from Large Language Models (LLMs) to capture richer semantic content. Specifically, we construct the semantic universe with LLM-embedder (BAAI/llm-embedder), a unified embedding model that uses Llama2-7B-Chat as its foundation and BGE base as its embedding backbone. We employ t-distributed Stochastic Neighbor Embedding (t-SNE) to project the entire PubMed library into a two-dimensional “semantic universe” for visualization. Each article receives a SemNovel score based on its distance from prior publications. We validated SemNovel’s effectiveness through its correlation with future research impact and its ability to distinguish groundbreaking studies, and further explored its potential for analyzing trends in research trajectories and interdisciplinary collaboration. To enhance usability, we developed an interactive interface for users to analyze SemNovel scores.
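The abstract says only that the score is "based on its distance from prior publications". As a hedged sketch, one plausible distance-based score is the mean Euclidean distance from an article to its k nearest earlier articles in the 2-D map. The value of k, the use of 2-D coordinates rather than full embeddings, the function name, and the toy points are all assumptions, not the published SemNovel formula.

```python
import math


def novelty_scores(papers, k=2):
    """papers: list of (paper_id, year, (x, y)) points in the 2-D semantic map.

    Hypothetical score: mean distance to the k nearest papers published in
    strictly earlier years. Papers with no earlier neighbours get None.
    """
    scores = {}
    for pid, year, point in papers:
        prior = [math.dist(point, q) for _, y, q in papers if y < year]
        if not prior:
            scores[pid] = None
            continue
        nearest = sorted(prior)[:k]
        scores[pid] = sum(nearest) / len(nearest)
    return scores


# Toy map: "c" lands inside the earlier cluster, "d" lands far away from it.
papers = [
    ("a", 2000, (0.0, 0.0)),
    ("b", 2000, (1.0, 0.0)),
    ("c", 2001, (0.5, 0.1)),
    ("d", 2001, (5.0, 5.0)),
]
print(novelty_scores(papers))
```

Restricting neighbours to earlier years keeps the score causal: an article is compared only with what already existed when it appeared.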

Results

The SemNovel score exhibited a positive correlation with future research impact, as measured by citation counts (ρ = 0.1782, p < 0.001, Spearman rank correlation), independent of factors such as journal impact factors (JIFs), publication years, and author counts, and outperformed previous semantic novelty indicators. It effectively identified highly novel papers, including Nobel Prize-winning studies (p < 0.001, Kolmogorov-Smirnov test). SemNovel also revealed trends in the evolution of scientific research, exemplified in the PD-1/PD-L1 field, and underscored the role of interdisciplinary collaboration in enhancing biomedical research novelty.
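Spearman's ρ, as reported above, is the Pearson correlation computed on ranks, with tied values given the average of their positions. The short sketch below shows that computation on made-up numbers; it does not reproduce the study's data or its ρ = 0.1782, and the function names are illustrative.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Made-up novelty scores and citation counts (not from the study).
novelty = [0.2, 0.5, 0.9, 0.4, 0.7]
citations = [3, 10, 12, 10, 40]
print(spearman_rho(novelty, citations))
```

Because only ranks enter the formula, ρ captures any monotone relationship and is insensitive to the heavy right tail typical of citation counts.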

Conclusion

SemNovel represents a scalable and robust method for quantifying semantic novelty in biomedical literature. It provides a powerful tool for uncovering groundbreaking research, tracking scientific progress, and analyzing trends in innovation.
Journal of Biomedical Informatics, vol. 172, Article 104952
Citations: 0
Caption-augmented reasoning model with Hierarchical rank LoRA finetuning for medical visual question Answering
IF 4.5 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-01 | DOI: 10.1016/j.jbi.2025.104964
Yong Li, Jianping Man, Yi Zhou, Likeng Liang

Objective

Medical Visual Question Answering (VQA) is a quintessential application of biomedical Multimodal Large Language Models (MLLMs). Previous studies have focused mainly on the input image–question pairs, neglecting the rich medical knowledge contained in the captions of the pretraining datasets. This limits the model’s reasoning capability and leads to overfitting. This paper aims to exploit these captions effectively to address both issues.

Methods

This paper proposes a Caption-Augmented Reasoning Model (CARM), which introduces three components that leverage the captions during finetuning: (1) a Cross-Modal Visual Augmentation (CMVA) module that enriches image feature representations through semantic alignment with retrieved captions; (2) a Retrieval Cross-Modal Attention (RCMA) mechanism that establishes explicit connections between visual features and domain-specific medical knowledge; and (3) a Hierarchical Rank Low-Rank Adaptation (HR-LoRA) module that optimizes parameter-efficient finetuning through rank-adaptive decomposition in both the unimodal encoders and the multimodal fusion layers.
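HR-LoRA builds on standard LoRA, in which a frozen weight W is adapted as y = W x + (alpha / r) B A x with a small trainable rank r. The abstract does not give HR-LoRA's rank assignment, so the per-layer rank schedule below (smaller ranks in the unimodal encoders, a larger one in fusion) is a hypothetical example, as are the class and function names. The sketch uses plain Python lists to stay dependency-free.

```python
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A.

    Forward: y = W x + (alpha / r) * B (A x). B starts at zero, so the
    adapted layer initially reproduces the frozen layer exactly.
    """

    def __init__(self, weight, rank, alpha=16.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen, never updated here
        self.rank = rank
        self.scale = alpha / rank
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def trainable_params(self):
        # r * d_in entries in A plus d_out * r entries in B.
        return self.rank * (len(self.weight[0]) + len(self.weight))

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]


# Hypothetical hierarchical rank schedule (not taken from the paper).
RANKS = {"image_encoder": 4, "text_encoder": 4, "fusion": 16}


def build_adapters(d_model, ranks=RANKS):
    identity = [[1.0 if i == j else 0.0 for j in range(d_model)] for i in range(d_model)]
    return {name: LoRALinear(identity, r) for name, r in ranks.items()}


adapters = build_adapters(32)
print({name: layer.trainable_params() for name, layer in adapters.items()})
```

Giving different layers different ranks trades adaptation capacity against trainable-parameter count layer by layer, which is the general idea a rank-adaptive scheme exploits.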

Results

The proposed CARM achieved state-of-the-art performance across three benchmark datasets, with accuracy scores of 0.798 on VQA-RAD, 0.867 on VQA-SLAKE, and 0.718 on VQA-Med-2019, respectively, outperforming existing medical VQA models. Qualitative evaluations revealed that our caption-based augmentation effectively directed model attention to the image regions related to a question.

Conclusions

The proposed CARM effectively improves visual grounding and reasoning accuracy by systematically integrating medical captions, while HR-LoRA alleviates overfitting and improves training efficiency.
Journal of Biomedical Informatics, vol. 172, Article 104964
Citations: 0
Domain adaptation of stable diffusion for ultrasound inpainting: a synthetic data approach for enhanced thyroid nodule segmentation
IF 4.5 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-11-30 | DOI: 10.1016/j.jbi.2025.104963
Antonin Prochazka, Jan Zeman

Objective

To enhance the cross-domain generalization of thyroid-nodule segmentation models by augmenting limited ultrasound training data with synthetic images generated by a fine-tuned Stable Diffusion model.

Methods

Three public thyroid ultrasound datasets with heterogeneous acquisition characteristics were used: TN3K (training + testing), TDID, and TUCC. The denoising UNet inside Stable Diffusion v1.4 was fine-tuned on 2303 TN3K nodules and then used to synthesize realistic thyroid nodules. Using the model’s inpainting capability, an equal number of synthetic nodules was inserted into original ultrasound images. The combined data were then used to train ResUNet, DeepLabV3+, and MITUNet segmentation networks with identical hyperparameters. Models trained on native data only and on native + synthetic data were compared using the Dice similarity coefficient (Dice score) and Intersection-over-Union (IoU).
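For reference, the two metrics compare a predicted mask P with the ground-truth mask G: Dice = 2|P ∩ G| / (|P| + |G|) and IoU = |P ∩ G| / |P ∪ G|. The minimal sketch below computes both for one pair of binary masks. Returning 1.0 when both masks are empty is an assumed convention, and the abstract does not state whether scores are averaged per image or pooled over the dataset.

```python
def dice_and_iou(pred, target):
    """pred, target: flat sequences of 0/1 pixel labels of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    p_sum = sum(1 for p in pred if p)
    t_sum = sum(1 for t in target if t)
    if p_sum + t_sum == 0:
        return 1.0, 1.0  # assumed convention: two empty masks agree perfectly
    union = p_sum + t_sum - inter
    return 2 * inter / (p_sum + t_sum), inter / union


# Toy 5-pixel masks: 2 overlapping pixels, 3 predicted, 3 true.
print(dice_and_iou([1, 1, 1, 0, 0], [0, 1, 1, 1, 0]))
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank models identically on a single image, although their averages over many images can differ.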

Results

On the in-domain TN3K test set (n = 614), performance gains were modest, the largest being +2.2% in Dice score for DeepLabV3+. In contrast, substantial gains were observed on the external datasets. On the TDID dataset (n = 462), DeepLabV3+ improved from 38.2% to 59.1% Dice (+20.9 percentage points), while MITUNet and ResUNet gained 7.1% and 6.9%, respectively. On the TUCC dataset (n = 192), Dice improved by 11.4% for DeepLabV3+, 6.9% for MITUNet, and 3.1% for ResUNet. All improvements except those on the in-domain TN3K set were statistically significant (p < 0.01, paired t-test or Wilcoxon signed-rank test), confirming that synthetic images generated by Stable Diffusion enhance cross-domain segmentation robustness.

Conclusion

Augmenting ultrasound datasets with synthetic images generated by a task-specific Stable Diffusion model substantially improves the robustness of thyroid nodule segmentation across datasets acquired with different devices, at different institutions, and by different operators.
Journal of Biomedical Informatics, vol. 173, Article 104963
Citations: 0
MDD-MARF: a multimodal depression detection model based on multi-level attention mechanism and residual fusion
IF 4.5 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-11-29 | DOI: 10.1016/j.jbi.2025.104965
Jianghai Zhou, Jike Ge, Zuqin Chen, Jie Tan, You Li

Objective

Depression is a serious mental disorder that significantly affects patients’ work ability and social functioning. With the rapid development of artificial intelligence, researchers have begun to explore automatic depression detection methods based on multimodal data. However, multimodal data often contain a large amount of noise. Existing methods typically pass extracted features directly to downstream tasks without sufficient feature screening, which may limit the model’s generalization ability. In addition, current multimodal fusion strategies still face several challenges.

Methods

To address these challenges, we propose a novel multimodal depression detection model that integrates three modalities: audio, vision, and text. The model extracts depression-related key features through a multi-level attention mechanism and achieves efficient multimodal feature fusion using skip connections with a residual structure.
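The abstract does not give the equations of the multi-level attention or the residual fusion. As a deliberately simplified, single-level sketch of the general idea, the code below weights per-modality feature vectors with softmax attention scores and adds the unweighted modality mean back as a skip (residual) path. The scoring vector, the mean-based skip, the function names, and the toy features are all hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_residual_fusion(features, query):
    """features: dict modality -> vector (all the same length).
    query: scoring vector (it would be learned in a real model).

    Returns (fused vector, attention weight per modality), where
    fused = sum_m w_m * h_m + mean_m h_m  (the mean acts as a skip path).
    """
    names = sorted(features)
    scores = [sum(q * h for q, h in zip(query, features[n])) for n in names]
    weights = softmax(scores)
    dim = len(query)
    attended = [sum(w * features[n][d] for w, n in zip(weights, names)) for d in range(dim)]
    skip = [sum(features[n][d] for n in names) / len(names) for d in range(dim)]
    fused = [a + s for a, s in zip(attended, skip)]
    return fused, dict(zip(names, weights))


# Hypothetical 2-D features for the three modalities named in the abstract.
features = {"audio": [1.0, 0.0], "text": [0.0, 1.0], "vision": [1.0, 1.0]}
fused, weights = attention_residual_fusion(features, query=[5.0, 0.0])
print(fused, weights)
```

The skip path keeps every modality's information in the output even when attention assigns it a small weight, which is one common motivation for pairing attention with residual connections.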

Results

Experiments conducted on the DAIC-WOZ dataset showed that the proposed method achieved a mean absolute error (MAE) of 3.13 and a root mean square error (RMSE) of 3.59, outperforming existing state-of-the-art models. The generalization ability of the model was further validated on the E-DAIC dataset, demonstrating its effectiveness and robustness.
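The two error metrics reported above are MAE = mean |y − ŷ| and RMSE = sqrt(mean (y − ŷ)²). The short sketch below computes both on made-up severity scores (not DAIC-WOZ data). Because RMSE squares the errors before averaging, it is always at least as large as MAE and grows faster when a few predictions are far off; the reported pair (3.13, 3.59) is consistent with that.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up true and predicted severity scores: errors are 2, 0, 3, 1.
y_true = [10, 4, 15, 0]
y_pred = [12, 4, 12, 1]
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```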

Conclusion

The proposed method provides an efficient and reliable solution for depression detection using multimodal data and multi-level attention mechanisms. The findings highlight the significant value of multimodal learning in the medical field and offer strong support for the development of AI-assisted clinical decision-making systems.
Journal of Biomedical Informatics, Volume 173, Article 104965
Citations: 0
Predicting drug-target interactions based on multivariate information fusion and graph contrast learning
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-11-18 DOI: 10.1016/j.jbi.2025.104960
Siying Yang , Ping-an He , Pan Zeng , Yajie Meng , Zilong Zhang , Feifei Cui , Yuhua Yao , Jialiang Yang , Junlin Xu
Drug-target interaction (DTI) prediction is of great significance for stimulating innovation and research in the medical field. Traditional experimental methods for identifying DTIs have proven time-consuming and costly, so machine learning methods have been extensively applied to improve DTI prediction. However, sparse inter-node connections often result in insufficiently learned node representations. Furthermore, many methods do not take into account the topological similarity between nodes when integrating similarities. This study proposes MGCLDTI, a model that integrates multiple sources of information and uses Graph Contrastive Learning (GCL) to predict potential drug-target interactions. First, MGCLDTI employs the DeepWalk algorithm to extract global topological representations from a heterogeneous graph that incorporates multi-view information on drugs, targets, and diseases. Next, a densification strategy is applied to alleviate the noise arising from the sparsity of the DTI matrix. A GCL model with node masking is then used to enhance local structural awareness and optimize the embeddings of drugs and targets. Finally, DTI scores are predicted using the LightGBM algorithm. Comparisons with state-of-the-art methods demonstrate that MGCLDTI achieves superior predictive performance. Ablation studies confirm the contribution of each component, and case studies provide compelling evidence of MGCLDTI’s accuracy in identifying potential DTIs.
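DeepWalk learns node embeddings by treating truncated random walks as "sentences" for a skip-gram model. The sketch below shows only the walk-generation step on a small heterogeneous graph given as an adjacency list. The node names in `EXAMPLE_GRAPH` are hypothetical, and the skip-gram training step is omitted. Nothing here reproduces MGCLDTI's densification, node-masking GCL, or LightGBM stages.

```python
import random
from typing import Dict, List

Graph = Dict[str, List[str]]

# Hypothetical undirected drug-target-disease graph (each edge listed in both directions).
EXAMPLE_GRAPH: Graph = {
    "drug:D1": ["target:T1", "disease:S1"],
    "drug:D2": ["target:T2"],
    "target:T1": ["drug:D1", "disease:S1"],
    "target:T2": ["drug:D2", "disease:S1"],
    "disease:S1": ["drug:D1", "target:T1", "target:T2"],
}


def random_walks(graph: Graph, walks_per_node: int, walk_length: int, seed: int = 0) -> List[List[str]]:
    """Generate DeepWalk-style truncated uniform random walks.

    Each pass shuffles the start nodes, then from every start node hops to a
    uniformly chosen neighbour until walk_length nodes have been visited or a
    node with no neighbours is reached.
    """
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks: List[List[str]] = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph.get(walk[-1], [])
                if not neighbours:
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks
```

Each returned walk is a list of node names in which consecutive entries are adjacent in the graph. These sequences would then be passed to a word2vec-style trainer to obtain embeddings.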
Journal of Biomedical Informatics, Volume 172, Article 104960
Citations: 0
Unveiling novel bladder cancer associations from multicentred primary and secondary care electronic health records by machine learning: a case-control study
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-11-15 DOI: 10.1016/j.jbi.2025.104959
Xu Wang , Andrea Preston , Jonathan Aning , Shang-Ming Zhou

Objective

The rising incidence and mortality of bladder cancer (BC) underscore the importance of identifying associated features. Current reliance on haematuria as the primary indicator of BC has proved inadequate. While mining electronic health records (EHRs) offers the potential to identify BC-related signals, traditional data-driven methods struggle with high-dimensional datasets. This study aims to uncover novel BC-associated clinical signals by developing the Parsimony-driven cAtegory-balaNced binary Signal extractor for Primary Care EHRs (PanSPICE), tailored to extremely high-dimensional data linked from multiple centres.

Methods

We collected BC cases and control patients (n = 64,884) linked at patient level from Welsh nationwide databases, yielding 48,261 features in primary care settings. The PanSPICE approach first pre-ranks features by information gain. It then applies Retentive Stickiness Binary Particle Swarm Optimisation (RSBPSO) combined with a C5.0 classification tree to overcome computational barriers in feature selection. A two-layer optimisation treated clinical signals in care processes (POC), diagnoses (DIAG), and medications (MED) separately to prevent feature masking. A fitness function tailored for RSBPSO simultaneously optimised model performance and feature sparsity. Associations of the selected features were interpreted using logistic regression models adjusted for deprivation indices.
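The information-gain pre-ranking step can be sketched for binary (present/absent) EHR features and a binary case/control label. This is a generic sketch with hypothetical feature names. It does not reproduce RSBPSO, the C5.0 tree, or PanSPICE's two-layer optimisation.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[int], labels: Sequence[int]) -> float:
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v)."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(features: Dict[str, Sequence[int]], labels: Sequence[int]) -> List[Tuple[str, float]]:
    """Return (feature name, information gain) pairs, highest gain first."""
    scores = [(name, information_gain(col, labels)) for name, col in features.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

A feature that perfectly separates cases from controls scores H(Y), which is 1 bit for a balanced label. A feature independent of the label scores 0. In a pipeline like the one described, only the top-ranked features would be handed to the subsequent search.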

Results

PanSPICE identified 38 optimal features (area under the curve (AUC) = 0.81, 95 % CI: 0.80–0.82). These included a positive association with urinary tract infections (OR = 2.19, 95 % CI: 2.05–2.14) and inverse associations with stroke (OR = 0.64, 95 % CI: 0.54–0.74) and dementia (OR = 0.25, 95 % CI: 0.17–0.35). Gender stratification revealed a female-specific association with urine glucose testing (OR = 1.24, 95 % CI: 1.08–1.43). Certain medications, such as trimethoprim, were positively associated with BC, while others, including ramipril and prednisolone, showed protective effects.
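An odds ratio (OR) above 1 means an exposure is more common among cases than controls. An OR below 1, as for stroke and dementia here, indicates an inverse association. A 95% CI that excludes 1 is conventionally read as statistically significant.

The sketch below computes an unadjusted OR with a Wald interval from a 2x2 table. The counts in the usage note are hypothetical. The study's estimates came from logistic regression adjusted for deprivation indices, which this simple calculation does not replicate.

```python
import math
from typing import Tuple


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

        exposed cases   = a    exposed controls   = b
        unexposed cases = c    unexposed controls = d

    OR = (a * d) / (b * c)
    CI = exp(ln(OR) +/- z * sqrt(1/a + 1/b + 1/c + 1/d))
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper
```

For example, `odds_ratio_wald_ci(20, 10, 10, 20)` returns an OR of 4.0, since 20*20 / (10*10) = 4. The interval is symmetric on the log scale, so the product of its bounds equals the OR squared and the point estimate always lies inside it.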

Conclusion

PanSPICE enables efficient analysis of high-dimensional EHRs, revealing under-recognised potential BC risk profiles and protective comorbidities. Gender-specific differences in BC associations highlight the importance of gender-stratified analyses, while the computational advances provide a template for EHR-based clinical discovery. The findings warrant further mechanistic research into neurological protective pathways.
Journal of Biomedical Informatics, Volume 172, Article 104959
Citations: 0
Multi-objective optimization formulation for Alzheimer’s disease trial patient selection
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-11-15 DOI: 10.1016/j.jbi.2025.104955
Alireza Moayedikia , Sara Fin , Uffe Kock Wiil

Objective:

Clinical trial recruitment faces critical challenges with screen failure rates exceeding 80% in Alzheimer’s disease (AD) trials. Traditional patient selection relies on expert consensus without systematic evaluation of trade-offs between statistical power, recruitment feasibility, safety, and cost. We developed a multi-objective optimization framework to systematically identify optimal eligibility criteria configurations that balance competing objectives in AD clinical trial design.

Methods:

We implemented the Non-dominated Sorting Genetic Algorithm III (NSGA-III) to optimize patient selection criteria across three objectives: patient identification accuracy (F1 score), recruitment balance, and economic efficiency. The framework utilized National Alzheimer’s Coordinating Center data comprising 2,743 participants with comprehensive clinical assessments and cerebrospinal fluid biomarker measurements. We optimized 14 eligibility parameters including age boundaries, cognitive thresholds, biomarker criteria, and comorbidity management policies. Statistical validation employed Monte Carlo simulation with 10,000 iterations, bootstrap analysis, and SHAP interpretability analysis.
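NSGA-III, like NSGA-II, ranks candidate solutions by Pareto dominance before applying its reference-point-based niching. The sketch below shows only the dominance test and the sorting into non-dominated fronts, with every objective maximised. The candidate names and objective values in the usage note are hypothetical. The reference-point selection that distinguishes NSGA-III is not implemented.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[str, Sequence[float]]  # (unique name, objective values)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_sort(candidates: List[Candidate]) -> List[List[str]]:
    """Split candidates into successive Pareto fronts (simple O(n^2) per front).

    Front 1 holds candidates no other candidate dominates; front 2 holds those
    dominated only by front-1 members, and so on. Candidate names must be unique.
    """
    remaining = list(candidates)
    fronts: List[List[str]] = []
    while remaining:
        front = [c for c in remaining if not any(dominates(o[1], c[1]) for o in remaining)]
        names = {name for name, _ in front}
        fronts.append([name for name, _ in front])
        remaining = [c for c in remaining if c[0] not in names]
    return fronts
```

For instance, three hypothetical criteria sets scored on (F1, recruitment balance, cost efficiency) might be A = (0.99, 0.5, 0.7), B = (0.98, 0.6, 0.7) and C = (0.97, 0.5, 0.6). Sorting them gives fronts `[["A", "B"], ["C"]]`: A and B trade off against each other, while C is dominated by both. The first front corresponds to what the abstract calls Pareto-optimal solutions.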

Results:

Optimization identified 11 Pareto-optimal solutions spanning F1 scores from 0.979 to 0.995 and eligible patient pools from 108 to 327. Compared to standard criteria selecting 101 participants, optimized approaches identified 102 participants with no significant demographic or clinical differences after multiple comparison correction. Monte Carlo simulation revealed mean cost savings of $1,048 per patient (95% CI: -$1,251 to $3,492), with 80.7% probability of positive savings but 19.3% risk of cost increases (SD = $1,208). Cross-validation demonstrated high precision (95.1%) with strategic selectivity (9.4% recall). SHAP analysis identified biomarker requirements as the dominant cost driver. Optimization algorithms converged toward solutions similar to expert-designed criteria, validating both computational and clinical approaches.
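The sketch below summarises Monte Carlo draws of per-patient savings into a mean, a percentile 95% interval, and the probability of positive savings. Purely for illustration, it draws 10,000 values from a normal distribution with the mean ($1,048) and SD ($1,208) quoted above. That distribution is an assumption: the reported interval (-$1,251 to $3,492) is not symmetric around the mean, so the study's simulated distribution was presumably not exactly normal. The function names are hypothetical.

```python
import random
import statistics
from typing import List


def simulate(n_iter: int = 10_000, mean: float = 1048.0, sd: float = 1208.0, seed: int = 42) -> List[float]:
    """Draw hypothetical per-patient savings from an assumed normal distribution."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n_iter)]


def summarise_savings(draws: List[float]) -> dict:
    """Mean, SD, percentile 95% interval and P(savings > 0) for simulated draws."""
    ordered = sorted(draws)
    n = len(ordered)
    lower = ordered[int(0.025 * (n - 1))]
    upper = ordered[int(0.975 * (n - 1))]
    return {
        "mean": statistics.fmean(draws),
        "sd": statistics.stdev(draws),
        "ci95": (lower, upper),
        "p_positive": sum(d > 0 for d in draws) / n,
    }
```

Under this normal assumption, P(savings > 0) is roughly Φ(1048 / 1208) ≈ 0.81, close to the 80.7% reported. The complementary share of draws is the risk of cost increases.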

Conclusion:

Multi-objective optimization provides meaningful but incremental value through systematic validation and probabilistic efficiency enhancement rather than revolutionary transformation. The convergence toward established practice demonstrates that computational approaches serve as sophisticated validation tools that identify concrete yet uncertain efficiency improvements within existing frameworks. The substantial variability in projected outcomes establishes realistic expectations and highlights the importance of site-specific evaluation, particularly regarding recruitment infrastructure quality as the dominant determinant of success. This establishes a mature paradigm for evidence-based trial design optimization that enhances rather than replaces clinical expertise.
Journal of Biomedical Informatics, Volume 172, Article 104955
Citations: 0