Pub Date: 2025-10-08 | DOI: 10.1016/j.jbi.2025.104923
Qiuyang Feng, Xiao Huang
Drug–drug interactions (DDIs) are a major concern in healthcare, as concurrent drug use can cause severe adverse effects. Existing machine learning methods often neglect data imbalance and DDI directionality, limiting clinical reliability. To overcome these issues, we employed the GPT-4o large language model to convert free-text DDI descriptions into structured triplets for directionality analysis, and applied SMOTE to alleviate class imbalance. Using four key drug features (molecular fingerprints, enzymes, pathways, targets), our deep neural network (DNN) achieved 88.9% accuracy and showed an average AUPR gain of 0.68 for minority classes attributable to SMOTE. An attention-based feature-importance analysis showed that the most influential feature in the DNN model was supported by pharmacological evidence. These results demonstrate the effectiveness of our framework for accurate and robust DDI prediction. The source code and data are available at https://github.com/FrankFengF/Drug-drug-interaction-prediction-
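The class-rebalancing step can be illustrated with a minimal SMOTE sketch (plain NumPy, not the authors' implementation; in practice a library such as imbalanced-learn would typically be used): each synthetic minority sample is an interpolation between a minority sample and one of its k nearest minority-class neighbours.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolating
    between each sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-matches
    neighbours = np.argsort(d, axis=1)[:, :k]
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)                     # random minority sample
        j = rng.choice(neighbours[i])           # one of its k neighbours
        gap = rng.random()                      # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)

# toy minority class: 6 points in a 2-D feature space
X_min = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [2., 1.], [1., 2.]])
X_new = smote_oversample(X_min, n_new=10, k=3)
```

Because the new points are interpolations, they stay inside the region spanned by the real minority samples rather than duplicating them exactly.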
Title: Multi-feature machine learning for enhanced drug–drug interaction prediction. Journal of Biomedical Informatics, vol. 171, Article 104923.
Pub Date: 2025-10-04 | DOI: 10.1016/j.jbi.2025.104925
Luke Stevens, Nan Kennedy, Rob J. Taylor, Adam Lewis, Frank E. Harrell Jr, Matthew S. Shotwell, Emily S. Serdoz, Gordon R. Bernard, Wesley H. Self, Christopher J. Lindsell, Paul A. Harris, Jonathan D. Casey
Objective
Since 2012, the electronic data capture platform REDCap has included an embedded randomization module allowing a single randomization per study record with the ability to stratify by variables such as study site and participant sex at birth. In recent years, platform, adaptive, decentralized, and pragmatic trials have gained popularity. These trial designs often require approaches to randomization not supported by the original REDCap randomization module, including randomizing patients into multiple domains or at multiple points in time, changing allocation tables to add or drop study groups, or adaptively changing allocation ratios based on data from previously enrolled participants. Our team aimed to develop new randomization functions to address these issues.
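The kind of stratified, block-balanced allocation that randomization modules like this provide can be sketched as follows (a generic permuted-block illustration, not REDCap's actual algorithm; the stratum labels and block size are hypothetical):

```python
import random

def permuted_block_schedule(groups, block_size, n_blocks, seed):
    """Build an allocation sequence from shuffled blocks so that group
    counts stay balanced after every completed block."""
    assert block_size % len(groups) == 0
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_blocks):
        block = groups * (block_size // len(groups))
        rng.shuffle(block)                 # randomize order within the block
        sequence.extend(block)
    return sequence

# one independent allocation table per stratum (e.g. site x sex at birth)
strata = [(site, sex) for site in ("site-A", "site-B") for sex in ("F", "M")]
tables = {s: permuted_block_schedule(["treatment", "control"], 4, 25, seed=i)
          for i, s in enumerate(strata)}
```

Each enrolling participant is assigned the next unused entry from their stratum's table; blocks of four guarantee the two arms never differ by more than two within a stratum.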
Methods
A collaborative process facilitated by the NIH-funded Trial Innovation Network was initiated to modernize the randomization module in REDCap, incorporating feedback from clinical trialists, biostatisticians, technologists, and other experts.
Results
This effort led to the development of an advanced randomization module within the REDCap platform. In addition to supporting platform, adaptive, decentralized, and pragmatic trials, the new module introduces several new features, such as improved support for blinded randomization, additional randomization metadata capture (e.g., user identity and timestamp), additional tools allowing REDCap administrators to support investigators using the randomization module, and the ability for clinicians participating in pragmatic or decentralized trials to perform randomization through a survey without needing log-in access to the study database. As of June 19, 2025, multiple randomizations have been used in 211 projects from 55 institutions, randomizations with real-time trigger logic in 108 projects from 64 institutions, and blinded group allocation in 24 projects from 17 institutions.
Conclusion
The new randomization module aims to streamline the randomization process, improve trial efficiency, and ensure robust data integrity, thereby supporting the conduct of more sophisticated and adaptive clinical trials.
Title: A REDCap advanced randomization module to meet the needs of modern trials. Journal of Biomedical Informatics, vol. 171, Article 104925.
Pub Date: 2025-10-01 | DOI: 10.1016/j.jbi.2025.104920
Şeyma Selcan Mağara, Noah Dietrich, Ali Burak Ünal, Mete Akgün
Objective:
Record linkage is essential for integrating data from multiple sources, with diverse applications in real-world healthcare and research. Probabilistic Privacy-Preserving Record Linkage (PPRL) enables this integration while protecting sensitive information from unauthorized access, especially when datasets lack exact identifiers. As privacy regulations evolve and multi-institutional collaborations expand globally, there is a growing demand for methods that effectively balance security, accuracy, and efficiency. However, ensuring both privacy and scalability in large-scale record linkage remains a key challenge.
Method:
This paper presents a novel and efficient PPRL method based on a secure three-party multi-party computation (MPC) framework. Our approach allows multiple parties to compute linkage results without exposing their private inputs and significantly improves the speed of the linkage process compared to existing PPRL solutions.
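As a rough illustration of the probabilistic comparison underlying such pipelines, here is a plaintext Dice-coefficient comparison of bigram Bloom filters, a common encoding in probabilistic PPRL. This is only the non-private building block; the paper's contribution is evaluating such comparisons under three-party MPC, and whether it uses Bloom-filter encodings specifically is an assumption here.

```python
import hashlib

def bloom_encode(name, m=256, k=2):
    """Encode the character bigrams of a name into an m-bit Bloom filter,
    represented as the set of bit positions that are switched on."""
    bits = set()
    padded = f"_{name.lower()}_"
    for bigram in (padded[i:i + 2] for i in range(len(padded) - 1)):
        for salt in range(k):  # k independent (salted) hash functions
            h = hashlib.sha256(f"{salt}:{bigram}".encode()).digest()
            bits.add(int.from_bytes(h[:4], "big") % m)
    return bits

def dice(a, b):
    """Dice coefficient of two bit sets: 2|A∩B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b))

s = dice(bloom_encode("Johnson"), bloom_encode("Jonson"))    # similar names
t = dice(bloom_encode("Johnson"), bloom_encode("Martinez"))  # different names
```

Similar spellings share most bigrams and thus most bit positions, so their Dice score is high, while unrelated names overlap only through rare hash collisions.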
Result:
Our method preserves the linkage quality of a state-of-the-art (SOTA) MPC-based PPRL method while running up to 14 times faster. For example, linking a record against a database of 10,000 records takes just 8.74 s in a realistic network with 700 Mbps bandwidth and 60 ms latency, compared to 92.32 s with the SOTA method. Even on a slower connection with 100 Mbps bandwidth and 60 ms latency, the linkage completes in 28 s, whereas the SOTA method requires 287.96 s. These results demonstrate the significant scalability and efficiency improvements of our approach.
Conclusion:
Our novel PPRL method, based on secure 3-party computation, offers an efficient and scalable solution for large-scale record linkage while ensuring privacy protection. The approach demonstrates significant performance improvements, making it a promising tool for secure data integration in privacy-sensitive sectors.
Title: Accelerating probabilistic privacy-preserving medical record linkage: A three-party MPC approach. Journal of Biomedical Informatics, vol. 171, Article 104920.
Pub Date: 2025-09-30 | DOI: 10.1016/j.jbi.2025.104921
Biyang Zeng, Shikui Tu, Lei Xu
Predicting the synergy of drug combinations is crucial for cancer treatment and drug development. Accurate prediction requires the integration of multiple types of data, including molecular structures of individual drugs, available synergy scores between drugs, and gene expression information from different cancer cell lines. The first two types contain multi-scale information within or between drugs, while the cell lines serve as the contextual background for drug interactions. Existing machine learning methods fail to fully exploit and integrate this information, leading to suboptimal performance. To address this issue, we introduce GraphFusion, an innovative approach that combines molecular graphs and drug synergy graphs with cell line contextual information. By employing novel GCN and Graphormer modules capable of accepting and utilizing external information, GraphFusion integrates these two levels of graph information. Specifically, the molecular graphs pass fine-grained structural information to the synergy graphs, while the synergy graphs convey global drug interaction data to the molecular graphs. Additionally, cell line information is incorporated as contextual background. This comprehensive integration enables GraphFusion to achieve state-of-the-art results on the O’Neil and NCI-ALMANAC datasets.
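The graph-convolutional building block can be sketched in a few lines (a generic one-layer GCN with symmetric normalization in plain NumPy; the paper's GCN and Graphormer modules additionally accept external information from the other graph level, which is omitted here):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN propagation step: ReLU(D^{-1/2} (A + I) D^{-1/2} X W)."""
    A_hat = A + np.eye(len(A))                  # add self-loops
    d = A_hat.sum(axis=1)                       # degrees incl. self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

# toy "synergy graph": 4 drugs, edges = known synergistic pairs
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
X = np.eye(4)                                   # one-hot node features
W = np.full((4, 2), 0.5)                        # toy weight matrix
H = gcn_layer(A, X, W)                          # 2-D embedding per drug
```

In a fusion architecture such as the one described, the feature matrix `X` would carry drug-level embeddings produced from the molecular graphs, optionally concatenated with a cell-line context vector.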
Title: GraphFusion: Integrative prediction of drug synergy using multi-scale graph representations and cell line contexts. Journal of Biomedical Informatics, vol. 171, Article 104921.
Pub Date: 2025-09-27 | DOI: 10.1016/j.jbi.2025.104919
Radovan Tomášik, Šimon Koňár, Niina Eklund, Cäcilia Engels, Zdenka Dudova, Radoslava Kacová, Roman Hrstka, Petr Holub
Objective
Biobanks and biomolecular resources are increasingly central to data-driven biomedical research, encompassing not only metadata but also granular, sample-related data from diverse sources such as healthcare systems, national registries, and research outputs. However, the lack of a standardised, machine-readable format for representing such data limits interoperability, data reuse and integration into clinical and research environments. While MIABIS provides a conceptual model for biobank data, its abstract nature and reliance on heterogeneous implementations create barriers to practical, scalable adoption. This study presents a pragmatic, operational implementation of MIABIS focused on enabling real-world exchange and integration of sample-level data.
Methods
We systematically evaluated established data exchange standards, comparing HL7 FHIR and OMOP CDM with respect to their suitability for structuring sample-related data in a semantically robust and machine-readable form. Based on this analysis, we developed a FHIR-based representation of MIABIS that supports complex biobank structures and enables integration with federated data infrastructures. Supporting tools, including a Python library and an implementation guide, were created to ensure usability across diverse research and clinical contexts.
Results
We created nine interoperable FHIR profiles covering core MIABIS entities, ensuring consistency with FHIR standards. To support adoption, we developed an open-source Python library that abstracts FHIR interactions and provides schema validation for MIABIS-compliant data. The library was integrated into an ETL tool in operation at the Czech Node of BBMRI-ERIC (the European Biobanking and Biomolecular Resources Research Infrastructure) to demonstrate usability with real-world sample-related data. Separately, we validated the representation of MIABIS entities at the organisational level by converting the data structures of the BBMRI-ERIC Directory into FHIR, demonstrating compatibility with federated data infrastructures.
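To give a flavour of what a machine-readable, FHIR-based sample record looks like, here is a minimal sketch of assembling and schema-checking a Specimen resource. The profile URL, code system, and extension layout below are placeholders, not the published MIABIS-on-FHIR profiles, and the helper functions are hypothetical rather than part of the authors' library.

```python
def make_specimen(specimen_id, material_code, collected, organization_ref):
    """Assemble a minimal FHIR R4 Specimen resource as a plain dict.
    Profile/extension URLs are illustrative placeholders."""
    return {
        "resourceType": "Specimen",
        "id": specimen_id,
        "meta": {"profile": ["https://example.org/fhir/StructureDefinition/miabis-sample"]},
        "type": {"coding": [{"system": "https://example.org/fhir/CodeSystem/material-type",
                             "code": material_code}]},
        "collection": {"collectedDateTime": collected},
        "extension": [{"url": "https://example.org/fhir/StructureDefinition/custodian",
                       "valueReference": {"reference": organization_ref}}],
    }

def validate(resource, required=("resourceType", "id", "type")):
    """Schema-style check that required top-level elements are present."""
    missing = [f for f in required if f not in resource]
    if missing:
        raise ValueError(f"missing elements: {missing}")
    return True

sample = make_specimen("sample-001", "blood-plasma",
                       "2024-05-01", "Organization/biobank-cz")
validate(sample)
```

A wrapper library of this shape lets an ETL pipeline emit profile-conformant resources without each caller touching raw FHIR JSON.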
Conclusion
This work delivers a machine-readable, interoperable implementation of MIABIS, enabling the exchange of both organisational and sample-level data across biobanks and health information systems. By integrating MIABIS with HL7 FHIR, we provide a host of reusable tools and mechanisms for further evolution of the data model. Combined, these benefits can help with the integration into clinical and research workflows, supporting data discoverability, reuse, and cross-institutional collaboration in biomedical research.
Title: Definitions to data flow: Operationalizing MIABIS in HL7 FHIR. Journal of Biomedical Informatics, vol. 171, Article 104919.
Pub Date: 2025-09-26 | DOI: 10.1016/j.jbi.2025.104897
Christina A. van Hal, Elmer V. Bernstam, Todd R. Johnson
Objective:
Randomized Controlled Trials (RCTs) are the gold standard for clinical evidence, but ethical and practical constraints sometimes necessitate or warrant the use of observational data. The aim of this study is to identify informatics tools that support the design and conduct of Target Trial Emulations (TTEs), a framework for designing observational studies that closely emulate RCTs so as to minimize biases that often arise when using real-world evidence (RWE) to estimate causal effects.
Methods:
We divided the process of conducting TTEs into three phases and seven steps. We then systematically reviewed the literature to identify currently available tools that support one or more of the seven steps required to conduct a TTE. For each tool, we noted which step or steps the tool supports.
Results:
Our initial review screened 7625 papers, of which 76 met our inclusion criteria. The review identified 24 distinct tools applicable to the three phases of TTE. Specifically, 3 tools support the Design Phase, 5 support the Implementation Phase, and 19 support the Analysis Phase, with some tools applicable to multiple phases.
Conclusion:
This review revealed significant gaps in tool support for the Design Phase of TTEs, while support for the Implementation and Analysis phases was highly variable. No single tool currently supports all aspects of TTEs from start to finish and few tools are interoperable, meaning they cannot be easily integrated into a unified workflow. The results highlight the need for further development of informatics tools for supporting TTEs.
Title: Review of tools to support Target Trial Emulation. Journal of Biomedical Informatics, vol. 171, Article 104897.
Pub Date: 2025-09-25 | DOI: 10.1016/j.jbi.2025.104916
Yaozheng Zhou, Xingyu Shi, Lingfeng Wang, Jin Xu, Demin Li, Congzhou Chen
Objective:
Drug repositioning plays a pivotal role in expediting the drug discovery pipeline. The rapid development of computational methods has opened new avenues for predicting drug-disease associations (DDAs). Despite advancements in existing methodologies, challenges such as insufficient exploration of diverse relationships in heterogeneous biological networks and inadequate quality of negative samples have persisted.
Methods:
In this study, we introduce DRMGNE, a novel drug repositioning framework that harnesses metapath-guided learning and adaptive negative enhancement for DDA prediction. DRMGNE begins with an autoencoder that extracts semantic features from similarity matrices. A comprehensive set of metapaths is then designed to generate subgraphs, and graph convolutional networks are used to extract enriched node representations reflecting topological structures. Finally, an adaptive negative enhancement strategy improves the quality of negative samples, ensuring balanced learning.
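The idea of adaptive negative enhancement can be sketched as score-guided sampling of reliable negatives from the unlabeled drug-disease pairs (a simplified stand-in for the paper's strategy; the specific weighting scheme below is illustrative):

```python
import numpy as np

def select_negatives(scores, known_positive_mask, n_neg, seed=0):
    """Pick reliable negatives: among unlabeled drug-disease pairs, sample
    preferentially those the current model scores lowest."""
    rng = np.random.default_rng(seed)
    unlabeled = np.flatnonzero(~known_positive_mask)
    s = scores[unlabeled]
    weights = (1.0 - s) / (1.0 - s).sum()  # low score -> high sampling weight
    return rng.choice(unlabeled, size=n_neg, replace=False, p=weights)

scores = np.array([0.95, 0.10, 0.80, 0.05, 0.50, 0.20])  # model confidences
pos = np.array([True, False, False, False, True, False])  # known associations
neg_idx = select_negatives(scores, pos, n_neg=2)
```

Re-drawing negatives from the model's own scores each round steers training away from unlabeled pairs that are likely hidden positives, which is the failure mode of uniform negative sampling.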
Results:
Experimental evaluations demonstrate that DRMGNE outperforms state-of-the-art algorithms across three benchmark datasets. Additionally, case studies and molecular docking validations further underscore its potential in facilitating drug discovery and accelerating drug repurposing efforts.
Conclusion:
DRMGNE is a novel framework for DDA prediction that leverages metapath-based guidance and adaptive negative enhancement. Experiments on benchmark datasets show superior performance over existing methods, underscoring its potential impact in drug discovery.
Title: Drug repositioning with metapath guidance and adaptive negative sampling enhancement. Journal of Biomedical Informatics, vol. 171, Article 104916.
Pub Date : 2025-09-25DOI: 10.1016/j.jbi.2025.104914
Mingxuan Li, Shuai Li, Zhen Li, Mandong Hu
Drug Repositioning (DR) represents an innovative drug development strategy that significantly reduces both cost and time by identifying new therapeutic indications for approved drugs. Current methods primarily focus on extracting information from drug–disease networks, but often overlook critical local structural details between nodes. This study introduces CSDPDR, a novel Dual-branch graph neural network that integrates Topology Feature Information and Salient Feature Information to enhance drug repositioning accuracy and efficiency. Through the Topology-aware branch with Adaptive Residual Graph Attention and the Saliency-aware branch with Score-Driven Top-K Convolutional Graph Pooling, the model can capture both large-scale topology patterns and fine-grained local information. Furthermore, our approach effectively alleviates graph sparsity issues through meta-path-based network enhancement and confidence-based filtering mechanisms. Comparative experiments on two benchmark datasets and an additional dataset demonstrate that CSDPDR significantly outperforms several state-of-the-art baseline methods. Case studies on Alzheimer’s disease and breast neoplasms further validate the model’s practical applicability and effectiveness.
{"title":"Cross-scale semantic fusion integration of dual pathway models in drug repositioning","authors":"Mingxuan Li, Shuai Li, Zhen Li, Mandong Hu","doi":"10.1016/j.jbi.2025.104914","DOIUrl":"10.1016/j.jbi.2025.104914","url":null,"abstract":"<div><div>Drug Repositioning (DR) represents an innovative drug development strategy that significantly reduces both cost and time by identifying new therapeutic indications for approved drugs. Current methods primarily focus on extracting information from drug–disease networks, but often overlook critical local structural details between nodes. This study introduces CSDPDR, a novel Dual-branch graph neural network that integrates Topology Feature Information and Salient Feature Information to enhance drug repositioning accuracy and efficiency. Through the Topology-aware branch with Adaptive Residual Graph Attention and the Saliency-aware branch with Score-Driven Top-K Convolutional Graph Pooling, the model can capture both large-scale topology patterns and fine-grained local information. Furthermore, our approach effectively alleviate graph sparsity issues through meta-path-based network enhancement and confidence-based filtering mechanisms. Comparative experiments on two benchmark datasets an additional dataset demonstrate that CSDPDR significantly outperforms several state-of-the-art baseline methods. 
Case studies on Alzheimer’s disease and breast neoplasms further validate the model’s practical applicability and effectiveness.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"171 ","pages":"Article 104914"},"PeriodicalIF":4.5,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145182173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
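The Score-Driven Top-K pooling step in the Saliency-aware branch can be sketched in the style of generic Top-K graph pooling: score each node, keep the k highest-scoring nodes, and gate their features by the squashed scores. The function name, sigmoid gating, and toy data below are illustrative assumptions, not CSDPDR's exact layer.

```python
import numpy as np

def topk_pool(node_feats, scores, k):
    """Score-driven Top-K pooling sketch: retain the k highest-scoring
    nodes and gate their features by sigmoid-squashed scores."""
    idx = np.argsort(scores)[::-1][:k]          # indices of the top-k nodes
    gate = 1.0 / (1.0 + np.exp(-scores[idx]))   # sigmoid gating of kept features
    return node_feats[idx] * gate[:, None], idx

feats = np.arange(12, dtype=float).reshape(4, 3)   # 4 nodes, 3 features each
scores = np.array([0.2, -1.0, 3.0, 0.5])
pooled, kept = topk_pool(feats, scores, k=2)
print(kept)  # nodes 2 and 3 carry the highest saliency scores
```

Gating by the score (rather than just indexing) keeps the selection step differentiable with respect to the scoring function, which is why Top-K pooling layers typically multiply rather than merely slice.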
Pub Date : 2025-09-23DOI: 10.1016/j.jbi.2025.104918
Pengfei Yin , Abel Armas Cervantes , Daniel Capurro
Importance
Understanding factors that contribute to clinical variability in patient care is critical, as unwarranted variability can lead to increased adverse events and prolonged hospital stays. Determining when this variability becomes excessive can be a step in optimizing patient outcomes and healthcare efficiency.
Objective
Explore the association between clinical variation and clinical outcomes. This study aims to identify the point in time when the relationship between clinical variation and length of stay (LOS) becomes significant.
Methods
This cohort study uses MIMIC-IV, a dataset of electronic health records from the Beth Israel Deaconess Medical Center in the United States. We focused on adult patients who underwent elective coronary bypass surgery, generating 847 patient observations. Demographic factors such as age, race, insurance type, and the Charlson Comorbidity Index (CCI) were recorded. We performed a variability analysis where patients’ clinical processes are represented as sequences of events. The data was segmented based on the initial day of recorded activity to establish observation windows. Using regression analysis, we identified the temporal window where variability’s impact on LOS becomes independently significant.
Result
Regression analysis revealed that patients in the top 20 % of the variability distance group experienced an 81 % increase in LOS (95 % CI: 1.72 to 1.91, p < 0.001). Insurance types, such as Medicare and Other, were associated with 18 % (95 % CI: 0.73 to 0.92, p < 0.001) and 21 % (95 % CI: 0.71 to 0.88, p < 0.001) decreases in LOS, respectively. Neither age nor race significantly affected LOS, but a higher CCI was associated with a 3.3 % increase in LOS (95 % CI: 1.02 to 1.05, p < 0.001). These findings indicate that higher variability and CCI significantly influence LOS, with insurance type also playing a crucial role.
Conclusion
In the studied cohort, patient journeys with greater variability were associated with longer LOS with a dose–response relationship: the higher the variability, the longer LOS. This study presents a standardized way to measure and visualize variability in clinical processes and measure its impact on patient-relevant outcomes.
{"title":"Measuring and visualizing healthcare process variability","authors":"Pengfei Yin , Abel Armas Cervantes , Daniel Capurro","doi":"10.1016/j.jbi.2025.104918","DOIUrl":"10.1016/j.jbi.2025.104918","url":null,"abstract":"<div><h3>Importance</h3><div>Understanding factors that contribute to clinical variability in patient care is critical, as unwarranted variability can lead to increased adverse events and prolonged hospital stays. Determining when this variability becomes excessive can be a step in optimizing patient outcomes and healthcare efficiency.</div></div><div><h3>Objective</h3><div>Explore the association between clinical variation and clinical outcomes. This study aims to identify the point in time when the relationship between clinical variation and length of stay (LOS) becomes significant.</div></div><div><h3>Methods</h3><div>This cohort study uses MIMIC-IV, a dataset collecting electronic health records of the Beth Israel Deaconess Medical Center in the United States. We focused on adult patients who underwent elective coronary bypass surgery, generating 847 patient observations. Demographic factors such as age, race, insurance type, and the Charlson Comorbidity Index (CCI) were recorded. We performed a variability analysis where patients’ clinical processes are represented as sequences of events. The data was segmented based on the initial day of recorded activity to establish observation windows. Using a regression analysis, we identified the temporal window where variability’s impact on LOS becomes independently significant.</div></div><div><h3>Result</h3><div>Regression analysis revealed that patients in the top 20 % of the variability distance group experienced an 81 % increase in LOS (95 % CI: 1.72 to 1.91, p < 0.001). Insurance types, such as Medicare and Other, were associated with 18 % (95 % CI: 0.73 to 0.92, p < 0.001) and 21 % (95 % CI: 0.71 to 0.88, p < 0.001) decreases in LOS, respectively. 
Neither age nor race significantly affected LOS, but a higher CCI was associated with a 3.3 % increase in LOS (95 % CI: 1.02 to 1.05, p < 0.001). These findings indicate that higher variability and CCI significantly influence LOS, with insurance type also playing a crucial role.</div></div><div><h3>Conclusion</h3><div>In the studied cohort, patient journeys with greater variability were associated with longer LOS with a dose–response relationship: the higher the variability, the longer LOS. This study presents a standardized way to measure and visualize variability in clinical processes and measure its impact on patient-relevant outcomes.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"170 ","pages":"Article 104918"},"PeriodicalIF":4.5,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145149215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
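The effect sizes in the Results read as multiplicative changes in LOS, consistent with a regression on log-transformed LOS where an exponentiated coefficient of 1.81 corresponds to an 81% longer stay. A minimal illustration of that interpretation follows; the ratios 0.82 and 1.033 are back-computed here from the reported percent changes, not taken from the paper.

```python
def ratio_to_pct_change(ratio):
    """Convert a multiplicative LOS effect (an exponentiated coefficient
    from a regression on log-transformed LOS) into a percent change."""
    return (ratio - 1.0) * 100.0

# Ratios back-computed from the reported percent changes (illustrative):
print(round(ratio_to_pct_change(1.81)))      # top-20% variability group: +81% LOS
print(round(ratio_to_pct_change(0.82)))      # Medicare insurance: -18% LOS
print(round(ratio_to_pct_change(1.033), 1))  # one-point CCI increase: +3.3% LOS
```

The same convention explains why the reported 95% CIs are given as ratios (e.g., 1.72 to 1.91) rather than as raw day counts.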
Pub Date : 2025-09-23DOI: 10.1016/j.jbi.2025.104913
Yue Zhang , Dengqun Sun , Lei Li , Jian Zhou , Xiuquan Du , Shuo Li
Objective:
Medical time series, a type of multivariate time series with missing values, are widely used for predictive analysis; the “impute first, then predict” end-to-end architecture is commonly used to handle the missing values. However, existing methods are likely to lose the uniqueness and key information of low-frequency sampled variables (LFSVs) when dealing with them. In this paper, we aim to develop a method that effectively handles LFSVs, preserving their distinctive characteristics and essential information throughout the modeling process.
Methods:
We propose a novel end-to-end method named Low-Frequency Variable-Driven network (LFVDNet) for medical time series analysis. Specifically, the Time-Aware Imputer (TA) module encodes the observed values and critical time information, and uses the attention mechanism to establish an association between the observed values and the missing values. TA adopts a channel-independent strategy to prevent interference from high-frequency sampled variables (HFSVs) on LFSVs, thereby preserving the unique information contained in LFSVs. The Offset-Selection Module (OS) independently selects data points for each variable through offsets, avoiding the natural disadvantages of LFSVs in selection-based imputation, thus solving the problem of the loss of key information of LFSVs. LFVDNet is the first method for analyzing multivariate time series with missing values that emphasizes the effective utilization of LFSVs.
Results:
We carried out experiments on four public datasets, and the results indicate that LFVDNet achieves superior robustness and performance. All code is available at https://github.com/dxqllp/LFVDNet.
Conclusions:
This study proposes a novel method for medical time series analysis, namely LFVDNet, which aims to effectively utilize LFSVs. Specifically, we have designed the TA module, which performs imputation through temporal correlations. The OS module, on the other hand, performs selective imputation based on a data point selection strategy. We have verified the effectiveness of this method on four datasets constructed from PhysioNet 2012 and MIMIC-IV.
{"title":"LFVDNet: Low-frequency variable-driven network for medical time series","authors":"Yue Zhang , Dengqun Sun , Lei Li , Jian Zhou , Xiuquan Du , Shuo Li","doi":"10.1016/j.jbi.2025.104913","DOIUrl":"10.1016/j.jbi.2025.104913","url":null,"abstract":"<div><h3>Objective:</h3><div>Medical time series, a type of multivariate time series with missing values, is widely used to predict time series analysis, the “impute first, then predict” end-to-end architecture is used to address this issue. However, existing methods are likely to lead to the loss of uniqueness and key information of low-frequency sampled variables (LFSVs) when dealing with them. In this paper, we aim to develop a method that effectively handles LFSVs, preserving their distinctive characteristics and essential information throughout the modeling process.</div></div><div><h3>Methods:</h3><div>We propose a novel end-to-end method named <em><strong>L</strong>ow-<strong>F</strong>requency <strong>V</strong>ariable-<strong>D</strong>riven network</em> (LFVDNet) for medical time series analysis. Specifically, the Time-Aware Imputer (TA) module encodes the observed values and critical time information, and uses the attention mechanism to establish an association between the observed values and the missing values. TA adopts channel-independent strategy to prevent interference from high-frequency sampled variables (HFSVs) on LFSVs, thereby preserving the unique information contained in LFSVs. The Offset-Selection Module (OS) independently selects data points for each variable through offsets, avoiding the natural disadvantages of LFSVs in selection-based imputation, thus solving the problem of the loss of key information of LFSVs. 
LFVDNet is the first method for analyzing multivariate time series with missing values that emphasizes the effective utilization of LFSVs.</div></div><div><h3>Results:</h3><div>We carried out experiments on four public datasets, and the results indicate that LFVDNet achieves superior robustness and performance. All code is available at <span><span>https://github.com/dxqllp/LFVDNet</span><svg><path></path></svg></span>.</div></div><div><h3>Conclusions:</h3><div>This study proposes a novel method for medical time series analysis, namely LFVDNet, which aims to effectively utilize LFSVs. Specifically, we have designed the TA module, which performs imputation through temporal correlations. The OS module, on the other hand, performs selective imputation based on a data point selection strategy. We have verified the effectiveness of this method on four datasets constructed from PhysioNet 2012 and MIMIC-IV.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"171 ","pages":"Article 104913"},"PeriodicalIF":4.5,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145149181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
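The core idea of the TA module — filling a missing value from observed values of the same variable, weighted by temporal proximity, with each variable handled independently — can be sketched with a simple time-gap attention kernel. This is a minimal illustration under assumptions: the exponential-decay kernel, the `tau` parameter, and the single-channel setting are ours, not the paper's exact TA architecture.

```python
import numpy as np

def time_aware_impute(values, times, mask, tau=1.0):
    """Channel-independent, time-aware imputation sketch for one variable:
    each missing point becomes an attention-weighted average of observed
    points, with attention logits decaying in the time gap
    (softmax over -|t_obs - t_miss| / tau)."""
    out = values.copy()
    obs = np.flatnonzero(mask)                      # observed time indices
    for j in np.flatnonzero(~mask):                 # each missing index
        logits = -np.abs(times[obs] - times[j]) / tau
        w = np.exp(logits - logits.max())           # numerically stable softmax
        w /= w.sum()
        out[j] = float(w @ values[obs])
    return out

# One low-frequency variable: observed at t=0, 2, 4; missing at t=1, 3
vals = np.array([1.0, 0.0, 3.0, 0.0, 5.0])
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
m = np.array([True, False, True, False, True])
filled = time_aware_impute(vals, t, m)
```

Because each variable is imputed from its own observations only, densely sampled HFSVs cannot dominate the attention weights of a sparsely sampled LFSV — the interference the channel-independent strategy is designed to prevent.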