
Latest articles from the Journal of Biomedical Informatics

Multi-feature machine learning for enhanced drug–drug interaction prediction
IF 4.5 · CAS Tier 2 (Medicine) · Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2025-10-08 · DOI: 10.1016/j.jbi.2025.104923
Qiuyang Feng , Xiao Huang
Drug–drug interactions (DDIs) are a major concern in healthcare, as concurrent drug use can cause severe adverse effects. Existing machine learning methods often neglect data imbalance and DDI directionality, limiting clinical reliability. To overcome these issues, we employed the GPT-4o large language model to convert free-text DDI descriptions into structured triplets for directionality analysis and applied SMOTE to alleviate class imbalance. Using four key drug features (molecular fingerprints, enzymes, pathways, targets), our deep neural network (DNN) achieved 88.9% accuracy and showed an average AUPR gain of 0.68 for minority classes attributable to SMOTE. By applying attention-based feature-importance analysis, we demonstrated that the most influential feature in the DNN model was supported by pharmacological evidence. These results demonstrate the effectiveness of our framework for accurate and robust DDI prediction. The source code and data are available at https://github.com/FrankFengF/Drug-drug-interaction-prediction-
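The SMOTE step described in the abstract synthesizes new minority-class samples by interpolating between a minority point and one of its minority-class nearest neighbors. A minimal self-contained sketch of that idea (the feature matrix below is a random stand-in, not the paper's drug-feature data):

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE: create n_new synthetic points, each interpolated between
    a randomly chosen minority sample and one of its k nearest minority neighbors."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neigh = np.argsort(d)[1:k + 1]            # k nearest, skipping the point itself
        j = rng.choice(neigh)
        gap = rng.random()                        # interpolation factor in [0, 1)
        out.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(out)

rng = np.random.default_rng(0)
X_majority = rng.normal(0, 1, (100, 8))           # 100 majority-class samples
X_minority = rng.normal(3, 1, (10, 8))            # 10 minority-class samples
X_syn = smote(X_minority, n_new=90, rng=rng)      # balance the classes
print(X_syn.shape)  # (90, 8)
```

In practice one would use a maintained implementation such as imbalanced-learn's `SMOTE` rather than this sketch.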
Citations: 0
A REDCap advanced randomization module to meet the needs of modern trials
IF 4.5 · CAS Tier 2 (Medicine) · Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2025-10-04 · DOI: 10.1016/j.jbi.2025.104925
Luke Stevens , Nan Kennedy , Rob J. Taylor , Adam Lewis , Frank E. Harrell Jr , Matthew S. Shotwell , Emily S. Serdoz , Gordon R. Bernard , Wesley H. Self , Christopher J. Lindsell , Paul A. Harris , Jonathan D. Casey

Objective

Since 2012, the electronic data capture platform REDCap has included an embedded randomization module allowing a single randomization per study record with the ability to stratify by variables such as study site and participant sex at birth. In recent years, platform, adaptive, decentralized, and pragmatic trials have gained popularity. These trial designs often require approaches to randomization not supported by the original REDCap randomization module, including randomizing patients into multiple domains or at multiple points in time, changing allocation tables to add or drop study groups, or adaptively changing allocation ratios based on data from previously enrolled participants. Our team aimed to develop new randomization functions to address these issues.

Methods

A collaborative process facilitated by the NIH-funded Trial Innovation Network was initiated to modernize the randomization module in REDCap, incorporating feedback from clinical trialists, biostatisticians, technologists, and other experts.

Results

This effort led to the development of an advanced randomization module within the REDCap platform. In addition to supporting platform, adaptive, decentralized, and pragmatic trials, the new module introduces several new features, such as improved support for blinded randomization, additional randomization metadata capture (e.g., user identity and timestamp), additional tools allowing REDCap administrators to support investigators using the randomization module, and the ability for clinicians participating in pragmatic or decentralized trials to perform randomization through a survey without needing log-in access to the study database. As of June 19, 2025, multiple randomizations have been used in 211 projects from 55 institutions, randomizations with real-time trigger logic in 108 projects from 64 institutions, and blinded group allocation in 24 projects from 17 institutions.

Conclusion

The new randomization module aims to streamline the randomization process, improve trial efficiency, and ensure robust data integrity, thereby supporting the conduct of more sophisticated and adaptive clinical trials.
Citations: 0
Accelerating probabilistic privacy-preserving medical record linkage: A three-party MPC approach
IF 4.5 · CAS Tier 2 (Medicine) · Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2025-10-01 · DOI: 10.1016/j.jbi.2025.104920
Şeyma Selcan Mağara, Noah Dietrich, Ali Burak Ünal, Mete Akgün

Objective:

Record linkage is essential for integrating data from multiple sources, with diverse applications in real-world healthcare and research. Probabilistic Privacy-Preserving Record Linkage (PPRL) enables this integration while protecting sensitive information from unauthorized access, especially when datasets lack exact identifiers. As privacy regulations evolve and multi-institutional collaborations expand globally, there is a growing demand for methods that effectively balance security, accuracy, and efficiency. However, ensuring both privacy and scalability in large-scale record linkage remains a key challenge.

Method:

This paper presents a novel and efficient PPRL method based on a secure three-party computation (MPC) framework. Our approach allows multiple parties to compute linkage results without exposing their private inputs and significantly improves the speed of the linkage process compared to existing PPRL solutions.

Result:

Our method preserves the linkage quality of a state-of-the-art (SOTA) MPC-based PPRL method while achieving up to 14 times faster performance. For example, linking a record against a database of 10,000 records takes just 8.74 s in a realistic network with 700 Mbps bandwidth and 60 ms latency, compared to 92.32 s with the SOTA method. Even on a slower internet connection with 100 Mbps bandwidth and 60 ms latency, the linkage completes in 28 s, whereas the SOTA method requires 287.96 s. These results demonstrate the significant scalability and efficiency improvements of our approach.

Conclusion:

Our novel PPRL method, based on secure 3-party computation, offers an efficient and scalable solution for large-scale record linkage while ensuring privacy protection. The approach demonstrates significant performance improvements, making it a promising tool for secure data integration in privacy-sensitive sectors.
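The paper's three-party MPC protocol is considerably more involved, but the core privacy guarantee (parties jointly compute a result while no single party ever sees another party's raw input) can be illustrated with additive secret sharing over a prime field. This is an illustrative sketch, not the authors' protocol:

```python
import secrets

P = 2**61 - 1  # a large prime field modulus

def share(x, n=3):
    """Split x into n additive shares that sum to x mod P.
    Any n-1 shares are uniformly random and reveal nothing about x."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    parts.append((x - sum(parts)) % P)
    return parts

# Each data holder has a private integer input (e.g. a scaled similarity score).
private_inputs = [874, 9232, 2800]

# Every holder distributes one share of its input to each of the 3 compute parties.
shares_by_party = [share(x) for x in private_inputs]

# Each compute party locally sums the shares it received ...
local_sums = [sum(col) % P for col in zip(*shares_by_party)]

# ... and only the recombined total is ever revealed.
total = sum(local_sums) % P
print(total)  # 12906 == 874 + 9232 + 2800
```

Real PPRL protocols build similarity comparisons (not just sums) out of such shared arithmetic, which is where the bandwidth and latency costs reported above arise.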
Citations: 0
GraphFusion: Integrative prediction of drug synergy using multi-scale graph representations and cell line contexts
IF 4.5 · CAS Tier 2 (Medicine) · Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2025-09-30 · DOI: 10.1016/j.jbi.2025.104921
Biyang Zeng, Shikui Tu, Lei Xu
Predicting the synergy of drug combinations is crucial for cancer treatment and drug development. Accurate prediction requires the integration of multiple types of data, including molecular structures of individual drugs, available synergy scores between drugs, and gene expression information from different cancer cell lines. The first two types contain multi-scale information within or between drugs, while the cell lines serve as the contextual background for drug interactions. Existing machine learning methods fail to fully utilize and integrate this information, leading to suboptimal performance. To address this issue, we introduce GraphFusion, an innovative approach that combines molecular graphs and drug synergy graphs with cell line contextual information. By employing novel GCN and Graphormer modules capable of accepting and utilizing external information, GraphFusion integrates these two levels of graph information. Specifically, the molecular graphs pass fine-grained structural information to the synergy graphs, while the synergy graphs convey global drug interaction data to the molecular graphs. Additionally, cell line information is incorporated as contextual background. This comprehensive integration enables GraphFusion to achieve state-of-the-art results on the O’Neil and NCI-ALMANAC datasets.
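The graph convolutions applied to the synergy and molecular graphs follow the standard GCN propagation rule, H' = ReLU(D^(-1/2) (A+I) D^(-1/2) H W). A minimal numpy sketch on a toy synergy graph (the adjacency matrix, features, and weights below are invented for illustration, not the trained model):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: add self-loops, symmetrically normalize, propagate, ReLU."""
    A_hat = A + np.eye(A.shape[0])                       # A + I (self-loops)
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))        # D^(-1/2)
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)               # ReLU activation

# Toy synergy graph over 4 drugs (an edge marks a known synergistic pair).
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 6))   # initial node features (e.g. drug fingerprints)
W = rng.normal(size=(6, 3))   # untrained layer weights
print(gcn_layer(A, H, W).shape)  # (4, 3)
```

Each output row is a drug embedding that mixes the drug's own features with those of its synergy-graph neighbors.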
Citations: 0
Definitions to data flow: Operationalizing MIABIS in HL7 FHIR
IF 4.5 · CAS Tier 2 (Medicine) · Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2025-09-27 · DOI: 10.1016/j.jbi.2025.104919
Radovan Tomášik , Šimon Koňár , Niina Eklund , Cäcilia Engels , Zdenka Dudova , Radoslava Kacová , Roman Hrstka , Petr Holub

Objective

Biobanks and biomolecular resources are increasingly central to data-driven biomedical research, encompassing not only metadata but also granular, sample-related data from diverse sources such as healthcare systems, national registries, and research outputs. However, the lack of a standardised, machine-readable format for representing such data limits interoperability, data reuse and integration into clinical and research environments. While MIABIS provides a conceptual model for biobank data, its abstract nature and reliance on heterogeneous implementations create barriers to practical, scalable adoption. This study presents a pragmatic, operational implementation of MIABIS focused on enabling real-world exchange and integration of sample-level data.

Methods

We systematically evaluated established data exchange standards, comparing HL7 FHIR and OMOP CDM with respect to their suitability for structuring sample-related data in a semantically robust and machine-readable form. Based on this analysis, we developed a FHIR-based representation of MIABIS that supports complex biobank structures and enables integration with federated data infrastructures. Supporting tools, including a Python library and an implementation guide, were created to ensure usability across diverse research and clinical contexts.

Results

We created nine interoperable FHIR profiles covering core MIABIS entities, ensuring consistency with FHIR standards. To support adoption, we developed an open-source Python library that abstracts FHIR interactions and provides schema validation for MIABIS-compliant data. The library was integrated into an ETL tool in operation at Czech Node of BBMRI-ERIC, European Biobanking and Biomolecular Resources Research Infrastructure, to demonstrate usability with real-world sample-related data. Separately, we validated the representation of MIABIS entities at the organisational level by converting the data structures of BBMRI-ERIC Directory into FHIR, demonstrating compatibility with federated data infrastructures.

Conclusion

This work delivers a machine-readable, interoperable implementation of MIABIS, enabling the exchange of both organisational and sample-level data across biobanks and health information systems. By integrating MIABIS with HL7 FHIR, we provide a host of reusable tools and mechanisms for further evolution of the data model. Combined, these benefits can help with the integration into clinical and research workflows, supporting data discoverability, reuse, and cross-institutional collaboration in biomedical research.
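At the sample level, MIABIS concepts map onto FHIR resources such as `Specimen`. A minimal illustrative instance follows; the coding system URL, codes, and identifiers are invented placeholders, and the authoritative bindings are those defined in the authors' implementation guide:

```python
import json

# A minimal FHIR R4 Specimen instance. The coding system and codes below are
# placeholder values, not the MIABIS-on-FHIR profile's actual terminology bindings.
specimen = {
    "resourceType": "Specimen",
    "id": "example-sample-1",
    "status": "available",
    "type": {
        "coding": [{
            "system": "http://example.org/sample-material-type",
            "code": "tissue-frozen",
            "display": "Frozen tissue",
        }]
    },
    "subject": {"reference": "Patient/example-donor-1"},
    "collection": {"collectedDateTime": "2024-05-01"},
}

# Serializing to JSON is all that is needed to exchange it over a FHIR REST API.
payload = json.dumps(specimen, indent=2)
print(specimen["resourceType"])  # Specimen
```

Profile validation (the schema checks the authors' Python library performs) would then confirm that required MIABIS fields are present and correctly coded.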
Citations: 0
Review of tools to support Target Trial Emulation
IF 4.5 · CAS Tier 2 (Medicine) · Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2025-09-26 · DOI: 10.1016/j.jbi.2025.104897
Christina A. van Hal , Elmer V. Bernstam , Todd R. Johnson

Objective:

Randomized Controlled Trials (RCTs) are the gold standard for clinical evidence, but ethical and practical constraints sometimes necessitate or warrant the use of observational data. The aim of this study is to identify informatics tools that support the design and conduct of Target Trial Emulations (TTEs), a framework for designing observational studies that closely emulate RCTs so as to minimize biases that often arise when using real-world evidence (RWE) to estimate causal effects.

Methods:

We divided the process of conducting TTEs into three phases and seven steps. We then systematically reviewed the literature to identify currently available tools that support one or more of the seven steps required to conduct a TTE. For each tool, we noted which step or steps the tool supports.

Results:

A total of 7625 papers were identified in the initial review, with 76 meeting our inclusion criteria. Our review identified 24 distinct tools applicable to the three phases of TTE. Specifically, 3 tools support the Design Phase, 5 support the Implementation Phase, and 19 support the Analysis Phase, with some tools applicable to multiple phases.

Conclusion:

This review revealed significant gaps in tool support for the Design Phase of TTEs, while support for the Implementation and Analysis phases was highly variable. No single tool currently supports all aspects of TTEs from start to finish and few tools are interoperable, meaning they cannot be easily integrated into a unified workflow. The results highlight the need for further development of informatics tools for supporting TTEs.
Drug repositioning with metapath guidance and adaptive negative sampling enhancement
IF 4.5 Tier 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2025-09-25 DOI: 10.1016/j.jbi.2025.104916
Yaozheng Zhou , Xingyu Shi , Lingfeng Wang , Jin Xu , Demin Li , Congzhou Chen

Objective:

Drug repositioning plays a pivotal role in expediting the drug discovery pipeline. The rapid development of computational methods has opened new avenues for predicting drug-disease associations (DDAs). Despite advancements in existing methodologies, challenges such as insufficient exploration of diverse relationships in heterogeneous biological networks and inadequate quality of negative samples have persisted.

Methods:

In this study, we introduce DRMGNE, a novel drug repositioning framework that harnesses metapath-guided learning and adaptive negative enhancement for DDA prediction. DRMGNE begins with an autoencoder that extracts semantic features from similarity matrices. Subsequently, a comprehensive set of metapaths is designed to generate subgraphs, and graph convolutional networks are utilized to extract enriched node representations that reflect topological structures. Furthermore, an adaptive negative enhancement strategy is employed to improve the quality of negative samples, ensuring balanced learning.
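The abstract does not spell out DRMGNE's exact negative-enhancement rule, but the general idea behind adaptive negative sampling can be sketched as ranking unlabeled drug–disease pairs by the model's current score and keeping the hardest ones. The function name and toy scores below are illustrative assumptions, not the paper's implementation:

```python
def select_hard_negatives(scores, known_positives, k):
    """Keep the k highest-scoring unlabeled drug-disease pairs as
    'hard' negatives: pairs the current model is most confused about
    carry the most training signal.  `scores` maps
    (drug, disease) -> predicted association score."""
    candidates = [p for p in scores if p not in known_positives]
    # Rank unlabeled pairs by model score, descending.
    candidates.sort(key=lambda p: scores[p], reverse=True)
    return candidates[:k]

# Toy example: 2 drugs x 3 diseases, one known positive pair.
scores = {("d1", "s1"): 0.9, ("d1", "s2"): 0.7, ("d1", "s3"): 0.1,
          ("d2", "s1"): 0.4, ("d2", "s2"): 0.8, ("d2", "s3"): 0.2}
positives = {("d1", "s1")}
print(select_hard_negatives(scores, positives, k=2))  # -> [('d2', 's2'), ('d1', 's2')]
```

In a full pipeline the scores would come from the previous training epoch, so the negative set adapts as the model improves.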

Results:

Experimental evaluations demonstrate that DRMGNE outperforms state-of-the-art algorithms across three benchmark datasets. Additionally, case studies and molecular docking validations further underscore its potential in facilitating drug discovery and accelerating drug repurposing efforts.

Conclusion:

DRMGNE is a novel framework for DDA prediction that leverages metapath-based guidance and adaptive negative enhancement. Experiments on benchmark datasets show superior performance over existing methods, underscoring its potential impact in drug discovery.
Cross-scale semantic fusion integration of dual pathway models in drug repositioning
IF 4.5 Tier 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2025-09-25 DOI: 10.1016/j.jbi.2025.104914
Mingxuan Li, Shuai Li, Zhen Li, Mandong Hu
Drug Repositioning (DR) represents an innovative drug development strategy that significantly reduces both cost and time by identifying new therapeutic indications for approved drugs. Current methods primarily focus on extracting information from drug–disease networks, but often overlook critical local structural details between nodes. This study introduces CSDPDR, a novel Dual-branch graph neural network that integrates Topology Feature Information and Salient Feature Information to enhance drug repositioning accuracy and efficiency. Through the Topology-aware branch with Adaptive Residual Graph Attention and the Saliency-aware branch with Score-Driven Top-K Convolutional Graph Pooling, the model can capture both large-scale topology patterns and fine-grained local information. Furthermore, our approach effectively alleviates graph sparsity issues through meta-path-based network enhancement and confidence-based filtering mechanisms. Comparative experiments on two benchmark datasets and an additional dataset demonstrate that CSDPDR significantly outperforms several state-of-the-art baseline methods. Case studies on Alzheimer's disease and breast neoplasms further validate the model's practical applicability and effectiveness.
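The Score-Driven Top-K Convolutional Graph Pooling step is not detailed in the abstract; a minimal gPool-style sketch — project node features onto a learnable score vector, keep the top-k nodes, and gate their features by tanh of the score — might look like this (pure Python, names and gating choice are assumptions):

```python
import math

def topk_pool(node_feats, score_vec, k):
    """Score-driven Top-K pooling sketch: score each node by projecting
    its feature vector onto `score_vec`, keep the k best nodes, and
    gate the kept features with tanh(score) so that, in a real deep
    learning framework, the scoring vector would receive gradient."""
    norm = math.sqrt(sum(v * v for v in score_vec))
    scores = [sum(f * v for f, v in zip(feats, score_vec)) / norm
              for feats in node_feats]
    # Indices of the k highest-scoring nodes, best first.
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    pooled = [[f * math.tanh(scores[i]) for f in node_feats[i]] for i in keep]
    return keep, pooled

# Three nodes with 2-D features; the score vector favors the first dimension.
keep, pooled = topk_pool([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]], [1.0, 0.0], k=2)
print(keep)  # -> [2, 0]
```

Dropping low-scoring nodes is what lets a saliency-aware branch focus on the most informative local structure rather than the full graph.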
Measuring and visualizing healthcare process variability
IF 4.5 Tier 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2025-09-23 DOI: 10.1016/j.jbi.2025.104918
Pengfei Yin , Abel Armas Cervantes , Daniel Capurro

Importance

Understanding factors that contribute to clinical variability in patient care is critical, as unwarranted variability can lead to increased adverse events and prolonged hospital stays. Determining when this variability becomes excessive can be a step in optimizing patient outcomes and healthcare efficiency.

Objective

To explore the association between clinical variation and clinical outcomes, and to identify the point in time at which the relationship between clinical variation and length of stay (LOS) becomes significant.

Methods

This cohort study uses MIMIC-IV, a dataset of electronic health records from the Beth Israel Deaconess Medical Center in the United States. We focused on adult patients who underwent elective coronary bypass surgery, yielding 847 patient observations. Demographic factors such as age, race, insurance type, and the Charlson Comorbidity Index (CCI) were recorded. We performed a variability analysis in which each patient's clinical process is represented as a sequence of events. The data were segmented by the initial day of recorded activity to establish observation windows. Using regression analysis, we identified the temporal window in which variability's impact on LOS becomes independently significant.
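The abstract does not give the paper's exact variability measure; one common way to quantify deviation between event sequences is an edit distance to a reference pathway, and with log-transformed LOS as the outcome a regression coefficient translates into a multiplicative effect. The sketch below is illustrative only — the event labels are invented, and the 1.81 ratio echoes the reported result rather than reproducing the analysis:

```python
import math

def edit_distance(a, b):
    """Levenshtein distance between two event sequences (lists of
    activity labels) -- one simple way to quantify how far a patient's
    journey deviates from a reference pathway."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

reference = ["admit", "surgery", "icu", "ward", "discharge"]
journey = ["admit", "surgery", "icu", "imaging", "icu", "ward", "discharge"]
print(edit_distance(journey, reference))  # -> 2 (two extra events)

# With log(LOS) as the outcome, a coefficient beta for the
# high-variability group translates into a multiplicative LOS effect:
beta = math.log(1.81)                # corresponds to the reported 81 % increase
print(round(math.exp(beta) - 1, 2))  # -> 0.81
```

The exponentiated coefficient is what lets a log-linear model report "an 81 % increase in LOS" for the high-variability group.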

Results

Regression analysis revealed that patients in the top 20 % of the variability distance group experienced an 81 % increase in LOS (95 % CI: 1.72 to 1.91, p < 0.001). Insurance types, such as Medicare and Other, were associated with 18 % (95 % CI: 0.73 to 0.92, p < 0.001) and 21 % (95 % CI: 0.71 to 0.88, p < 0.001) decreases in LOS, respectively. Neither age nor race significantly affected LOS, but a higher CCI was associated with a 3.3 % increase in LOS (95 % CI: 1.02 to 1.05, p < 0.001). These findings indicate that higher variability and CCI significantly influence LOS, with insurance type also playing a crucial role.

Conclusion

In the studied cohort, patient journeys with greater variability were associated with longer LOS with a dose–response relationship: the higher the variability, the longer LOS. This study presents a standardized way to measure and visualize variability in clinical processes and measure its impact on patient-relevant outcomes.
LFVDNet: Low-frequency variable-driven network for medical time series
IF 4.5 Tier 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2025-09-23 DOI: 10.1016/j.jbi.2025.104913
Yue Zhang , Dengqun Sun , Lei Li , Jian Zhou , Xiuquan Du , Shuo Li

Objective:

Medical time series are multivariate time series with missing values and are widely used in predictive analysis; the “impute first, then predict” end-to-end architecture is commonly adopted to handle the missingness. However, existing methods are likely to lose the uniqueness and key information of low-frequency sampled variables (LFSVs) when processing them. In this paper, we aim to develop a method that effectively handles LFSVs, preserving their distinctive characteristics and essential information throughout the modeling process.

Methods:

We propose a novel end-to-end method named Low-Frequency Variable-Driven network (LFVDNet) for medical time series analysis. Specifically, the Time-Aware Imputer (TA) module encodes the observed values and critical time information, and uses the attention mechanism to establish an association between the observed values and the missing values. TA adopts channel-independent strategy to prevent interference from high-frequency sampled variables (HFSVs) on LFSVs, thereby preserving the unique information contained in LFSVs. The Offset-Selection Module (OS) independently selects data points for each variable through offsets, avoiding the natural disadvantages of LFSVs in selection-based imputation, thus solving the problem of the loss of key information of LFSVs. LFVDNet is the first method for analyzing multivariate time series with missing values that emphasizes the effective utilization of LFSVs.
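The TA module's internals are not specified beyond the abstract; a minimal illustration of channel-independent, time-aware attention — each missing timestamp of a variable attends only over that variable's own observations, with weights decaying in the time gap — could look like this (the softmax kernel and the `tau` parameter are assumptions, not the paper's formulation):

```python
import math

def impute_channel(times, values, query_times, tau=1.0):
    """Channel-independent, time-aware imputation sketch: each missing
    timestamp attends over the observed points of the SAME variable,
    with attention weights given by a softmax over -|t_q - t_obs| / tau.
    Keeping channels separate prevents densely sampled variables from
    interfering with a low-frequency one."""
    imputed = []
    for tq in query_times:
        logits = [-abs(tq - t) / tau for t in times]
        mx = max(logits)                      # stabilize the softmax
        w = [math.exp(l - mx) for l in logits]
        z = sum(w)
        imputed.append(sum(wi * v for wi, v in zip(w, values)) / z)
    return imputed

# One sparsely observed variable: values 1.0 at t=0 and 3.0 at t=10.
print(impute_channel([0.0, 10.0], [1.0, 3.0], [5.0]))  # midpoint -> [2.0]
```

Because each channel is imputed in isolation, a sparsely sampled lab value is never averaged against unrelated, densely sampled vitals — the property the abstract attributes to the channel-independent strategy.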

Results:

We carried out experiments on four public datasets, and the results indicate that LFVDNet offers better robustness and performance than competing methods. All code is available at https://github.com/dxqllp/LFVDNet.

Conclusions:

This study proposes a novel method for medical time series analysis, namely LFVDNet, which aims to effectively utilize LFSVs. Specifically, we have designed the TA module, which performs imputation through temporal correlations. The OS module, on the other hand, performs selective imputation based on a data point selection strategy. We have verified the effectiveness of this method on four datasets constructed from PhysioNet 2012 and MIMIC-IV.