
Journal of Biomedical Informatics: Latest Articles

Early multi-cancer detection through deep learning: An anomaly detection approach using Variational Autoencoder.
IF 4; Zone 2 (Medicine); Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-11-19; DOI: 10.1016/j.jbi.2024.104751
Innocent Tatchum Sado, Louis Fippo Fitime, Geraud Fokou Pelap, Claude Tinku, Gaelle Mireille Meudje, Thomas Bouetou Bouetou

Cancer causes many deaths worldwide. Treatment depends first and foremost on detection and is most effective when the disease is detected at an early stage. As technology has evolved, several computer-aided diagnosis tools and image-based detection methods have been developed for cancer. However, early detection, which is crucial for patient survival, remains difficult. To detect cancer early, scientists have been using transcriptomic data. This presents challenges of its own, such as unlabelled data and large data volumes, while image-based techniques focus on only one type of cancer. The purpose of this work is to develop a deep learning model that can detect any type of cancer as early as possible, specifically in the early stages, as an anomaly in transcriptomic data. The model must act independently and not be restricted to any specific type of cancer. To achieve this goal, we modeled a deep neural network (a Variational Autoencoder) and then defined an algorithm for detecting anomalies in the output of the Variational Autoencoder. The Variational Autoencoder consists of an encoder and a decoder with a hidden layer. Using TCGA and GTEx data, we trained the model for six types of cancer with the Adam optimizer with decay learning and a two-component loss function. The lowest accuracy we obtained was 0.950 and the lowest recall was 0.830. This research leads us to the design of a deep learning model for the detection of cancer as an anomaly in transcriptomic data.
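The abstract names a two-component loss and an anomaly rule applied to the autoencoder's output but gives no formulas. The following is a minimal sketch of one common way to set this up, not the authors' method. It assumes the two components are a squared-error reconstruction term and a KL-divergence term, and that a sample is flagged when its reconstruction error exceeds a percentile of the errors seen on normal samples. All function names and the 95th-percentile default are hypothetical.

```python
import numpy as np


def vae_loss(x, x_hat, mu, log_var):
    """Two-component VAE loss (assumed form): reconstruction error plus KL divergence."""
    # Reconstruction term: squared error summed over features, averaged over samples.
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    # KL divergence between N(mu, exp(log_var)) and the standard normal prior.
    kl = np.mean(-0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1))
    return recon + kl


def reconstruction_error(x, x_hat):
    """Per-sample mean squared reconstruction error."""
    return np.mean((x - x_hat) ** 2, axis=1)


def fit_threshold(normal_errors, percentile=95.0):
    """Hypothetical rule: threshold at a percentile of errors on normal samples."""
    return float(np.percentile(normal_errors, percentile))


def flag_anomalies(errors, threshold):
    """A sample is treated as anomalous when its error exceeds the threshold."""
    return errors > threshold
```

The idea behind this kind of rule is that a model trained only on normal expression profiles reconstructs them well, so a poorly reconstructed profile is a candidate anomaly regardless of which cancer type produced it.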

Citations: 0
Importance of variables from different time frames for predicting self-harm using health system data.
IF 4; Zone 2 (Medicine); Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-11-16; DOI: 10.1016/j.jbi.2024.104750
Charles J Wolock, Brian D Williamson, Susan M Shortreed, Gregory E Simon, Karen J Coleman, Rodney Yeargans, Brian K Ahmedani, Yihe Daida, Frances L Lynch, Rebecca C Rossom, Rebecca A Ziebell, Maricela Cruz, Robert D Wellman, R Yates Coley

Objective: Self-harm risk prediction models developed using health system data (electronic health records and insurance claims information) often use patient information from up to several years prior to the index visit when the prediction is made. Measurements from some time periods may not be available for all patients. Using the framework of algorithm-agnostic variable importance, we study the predictive potential of variables corresponding to different time horizons prior to the index visit and demonstrate the application of variable importance techniques in the biomedical informatics setting.

Materials and methods: We use variable importance to quantify the potential of recent (up to three months before the index visit) and distant (more than one year before the index visit) patient mental health information for predicting self-harm risk using data from seven health systems. We quantify importance as the decrease in predictiveness when the variable set of interest is excluded from the prediction task. We define predictiveness using discriminative metrics: area under the receiver operating characteristic curve (AUC), sensitivity, and positive predictive value.

Results: Mental health predictors corresponding to the three months prior to the index visit show a strong signal of importance; in one setting, excluding these variables decreased AUC from 0.85 to 0.77. Predictors corresponding to more distant information were less important.
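The importance measure described in the methods, the drop in predictiveness when a variable set is excluded, can be sketched in a few lines. This is an illustrative simplification, not the authors' estimator: it assumes you already have risk scores from a model fitted with all variables and from a model fitted without the variable set of interest, and it uses a plain pairwise AUC. The function names are hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half. Labels are 1 (event) or 0 (no event).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def importance(full_scores, reduced_scores, labels):
    """Importance of a variable set: AUC with the set minus AUC without it."""
    return auc(full_scores, labels) - auc(reduced_scores, labels)
```

With the figures reported in the results, excluding the recent mental health predictors corresponds to an importance of 0.85 - 0.77 = 0.08 on the AUC scale in that setting.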

Discussion: Predictors from the months immediately preceding the index visit are highly important. Implementation of self-harm prediction models may be challenging in settings where recent data are not completely available (e.g., due to lags in insurance claims processing) at the time a prediction is made.

Conclusion: Clinically derived variables from different time frames exhibit varying levels of importance for predicting self-harm. Variable importance analyses can inform whether and how to implement risk prediction models into clinical practice given real-world data limitations. These analyses can be applied more broadly in biomedical informatics research to provide insight into general clinical risk prediction tasks.

Citations: 0
Machine learning approaches for the discovery of clinical pathways from patient data: A systematic review
IF 4; Zone 2 (Medicine); Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-11-12; DOI: 10.1016/j.jbi.2024.104746; Volume 160, Article 104746
Lillian Muyama, Antoine Neuraz, Adrien Coulet

Background:

Clinical pathways are sequences of events followed during the clinical care of a group of patients who meet pre-defined criteria. They have many applications ranging from healthcare evaluation and optimization to clinical decision support. These pathways can be discovered from existing healthcare data, in particular with machine learning which is a family of methods used to learn patterns from data. This review provides a comprehensive overview of the literature concerning the use of machine learning methods for clinical pathway discovery from patient data.

Methods:

Guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) method, we conducted a systematic review of the existing literature. We searched six databases (ACM Digital Library, ScienceDirect, Web of Science, PubMed, IEEE Xplore, and Scopus) for the period from January 2004 to December 2023, using search terms pertinent to clinical pathways and their development. Subsequently, the retrieved papers were analyzed to assess their relevance to the scope of this study.

Results:

In total, 131 papers that met the specified inclusion criteria were identified. These papers expressed diverse motivations behind data-driven clinical pathway discovery ranging from knowledge discovery to conformance checking with established clinical guidelines (derived from existing literature and clinical experts). Notably, the predominant methods employed (67.2%, n=88) involved unsupervised machine learning techniques, such as clustering and process mining.

Conclusions:

Relevant clinical pathways can be discovered from patient data using machine learning methods, with the desirable potential to aid clinical decision-making in healthcare. However, to reach this objective, the methods used to discover pathways should be reproducible, and rigorous performance evaluation by clinical experts needs to be conducted for validation.
Citations: 0
Cross-Modal self-supervised vision language pre-training with multiple objectives for medical visual question answering
IF 4; Zone 2 (Medicine); Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-11-12; DOI: 10.1016/j.jbi.2024.104748; Volume 160, Article 104748
Gang Liu, Jinlong He, Pengfei Li, Zixu Zhao, Shenjun Zhong
Medical Visual Question Answering (VQA) is a task that aims to answer questions about medical images, using both visual and textual information in the reasoning process. The absence of large-scale annotated medical VQA datasets is a formidable obstacle to training a medical VQA model from scratch in an end-to-end manner. Existing works use image captioning datasets in the pre-training stage and then fine-tune on downstream VQA tasks. Following the same paradigm, we use a collection of public medical image captioning datasets to pre-train multimodal models in a self-supervised setup, and fine-tune them on downstream medical VQA tasks. In this work, we propose a method featuring Cross-Modal pre-training with Multiple Objectives (CMMO), which includes masked image modeling, masked language modeling, image-text matching, and image-text contrastive learning. The proposed method is designed to associate the visual features of medical images with the corresponding medical concepts in captions, in order to learn aligned vision and language feature representations and multi-modal interactions. The experimental results show that our proposed CMMO method outperforms state-of-the-art methods on three public medical VQA datasets, with absolute improvements of 2.6%, 0.9%, and 4.0% on the VQA-RAD, PathVQA, and SLAKE datasets, respectively. We also conduct comprehensive ablation studies to validate our method, and visualize the attention maps, which show strong interpretability. The code and pre-trained weights will be released at https://github.com/pengfeiliHEU/CMMO.
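Of the four pre-training objectives listed, image-text contrastive learning is the easiest to show compactly. The sketch below is a generic symmetric contrastive (InfoNCE-style) loss over a batch of paired embeddings. It is an assumption about the general form of such an objective, not the CMMO implementation; the function names and the temperature value of 0.07 are hypothetical.

```python
import numpy as np


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def image_text_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss; row i of each matrix is assumed to be a matched pair."""
    # L2-normalise so the dot product is a cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    idx = np.arange(logits.shape[0])
    # Image-to-text: each image should pick its own caption among all captions.
    i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-image: each caption should pick its own image among all images.
    t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (i2t + t2i)
```

Minimising this loss pulls matched image and caption embeddings together and pushes mismatched pairs apart, which is one way to obtain the aligned vision and language representations the abstract describes.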
Citations: 0
MultiADE: A Multi-domain benchmark for Adverse Drug Event extraction
IF 4; Zone 2 (Medicine); Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-11-12; DOI: 10.1016/j.jbi.2024.104744; Volume 160, Article 104744
Xiang Dai, Sarvnaz Karimi, Abeed Sarker, Ben Hachey, Cecile Paris

Objective:

Active adverse event surveillance monitors Adverse Drug Events (ADE) from different data sources, such as electronic health records, medical literature, social media and search engine logs. Over the years, many datasets have been created, and shared tasks have been organised to facilitate active adverse event surveillance. However, most – if not all – datasets or shared tasks focus on extracting ADEs from a particular type of text. Domain generalisation – the ability of a machine learning model to perform well on new, unseen domains (text types) – is under-explored. Given the rapid advancements in natural language processing, one unanswered question is how far we are from having a single ADE extraction model that is effective on various types of text, such as scientific literature and social media posts.

Methods:

We contribute to answering this question by building a multi-domain benchmark for adverse drug event extraction, which we named MultiADE. The new benchmark comprises several existing datasets sampled from different text types and our newly created dataset—CADECv2, which is an extension of CADEC (Karimi et al., 2015), covering online posts regarding more diverse drugs than CADEC. Our new dataset is carefully annotated by human annotators following detailed annotation guidelines.
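Domain generalisation is usually measured by training on one text type and testing on another. The sketch below shows that bookkeeping only. It is not the MultiADE benchmark code: the function names are hypothetical, the two domain labels are taken from the text types the abstract mentions, and any scores passed in are whatever metric (for example F1) the reader has computed.

```python
from itertools import product


def cross_domain_pairs(domains):
    """Enumerate (train, test, kind) triples over all ordered pairs of domains."""
    return [
        (train, test, "in-domain" if train == test else "cross-domain")
        for train, test in product(domains, repeat=2)
    ]


def generalisation_gap(scores):
    """Mean in-domain score minus mean cross-domain score.

    `scores` maps (train_domain, test_domain) to a metric value.
    """
    in_domain = [v for (tr, te), v in scores.items() if tr == te]
    cross = [v for (tr, te), v in scores.items() if tr != te]
    if not in_domain or not cross:
        raise ValueError("need both in-domain and cross-domain scores")
    return sum(in_domain) / len(in_domain) - sum(cross) / len(cross)


# Text types named in the abstract; a real benchmark would list every dataset.
DOMAINS = ["scientific_literature", "social_media"]
PAIRS = cross_domain_pairs(DOMAINS)
```

A large positive gap indicates that a model trained on one text type does not transfer well to another.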

Conclusion:

Our benchmark results show that the generalisation of the trained models is far from perfect, making them infeasible to deploy for processing different types of text. In addition, although intermediate transfer learning is a promising approach to utilising existing resources, further investigation is needed on methods of domain adaptation, particularly cost-effective methods to select useful training instances.
The newly created CADECv2 and the scripts for building the benchmark are publicly available at CSIRO’s Data Portal (https://data.csiro.au/collection/csiro:62387). These resources enable the research community to further information extraction, leading to more effective active adverse drug event surveillance.
Citations: 0
Disentangling the phenotypic patterns of hypertension and chronic hypotension
IF 4; Zone 2 (Medicine); Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-11-01; DOI: 10.1016/j.jbi.2024.104743; Volume 159, Article 104743
William W. Stead, Adam Lewis, Nunzia B. Giuse, Annette M. Williams, Italo Biaggioni, Lisa Bastarache

Objective

2017 blood pressure (BP) categories focus on cardiac risk. We hypothesize that studying the balance between mechanisms that increase or decrease BP across the medical phenome will lead to new insights. We devised a classifier that uses BP measures to assign individuals to mutually exclusive categories centered in the upper (Htn), lower (Hotn) and middle (Naf) zones of the BP spectrum; and examined the epidemiologic and phenotypic patterns of these BP-categories.

Methods

We classified a cohort of 832,560 deidentified electronic health records by BP-category; compared the frequency of BP-categories and four subtypes of Htn and Hotn by sex and age-decade; visualized the distributions of systolic, diastolic, mean arterial and pulse pressures stratified by BP-category; and ran Phenome-wide Association Studies (PheWAS) for Htn and Hotn. We paired knowledgebases for hypertension and hypotension and computed aggregate knowledgebase status (KB-status) indicating known associations. We assessed alignment of PheWAS results with KB-status for phecodes in the knowledgebase, and paired PheWAS correlations with KB-status to surface phenotypic patterns.
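The derived pressures named in the methods follow from a systolic and diastolic reading by simple arithmetic. The sketch below uses the standard definition of pulse pressure and the common one-third approximation of mean arterial pressure. It does not reproduce the authors' classifier, whose rules the abstract does not give. The function names are hypothetical, values are assumed to be in mmHg, and the cutoff of 60 is the diastolic value the abstract's conclusion singles out.

```python
def pulse_pressure(systolic, diastolic):
    """Pulse pressure: systolic minus diastolic."""
    return systolic - diastolic


def mean_arterial_pressure(systolic, diastolic):
    """Common approximation: diastolic plus one third of the pulse pressure."""
    return diastolic + (systolic - diastolic) / 3.0


def low_diastolic(diastolic, cutoff=60):
    """Flag a diastolic reading strictly below the cutoff."""
    return diastolic < cutoff
```

For a reading of 120/80, this gives a pulse pressure of 40 and a mean arterial pressure of about 93.3.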

Results

BP-categories represent distinct distributions within the multimodal distributions of systolic and diastolic pressure. They are centered in the upper, lower, and middle zones of mean arterial pressure and provide a different signal than pulse pressure. For phecodes in the knowledgebase, 85% of positive correlations align with KB-status. Phenotypic patterns for Htn and Hotn overlap for several phecodes and are separate for others. Our analysis suggests five candidates for hypothesis-testing research: two where the prevalence of the association with Htn or Hotn may be underappreciated, and three where mechanisms that increase and decrease blood pressure may be affecting one another’s expression.

Conclusion

Paired PheWAS methods may open a phenome-wide path to disentangling hypertension and chronic hypotension. Our classifier provides a starting point for assigning individuals to BP-categories representing the upper, lower, and middle zones of the BP spectrum. Of the individuals matching the 2017 BP categories for normal BP, elevated BP, or isolated hypertension, 4.7% have diastolic pressure < 60. Research is needed to fine-tune the classifier, provide external validation, evaluate the clinical significance of diastolic pressure < 60, and test the candidate hypotheses.
Citations: 0
Demonstration-based learning for few-shot biomedical named entity recognition under machine reading comprehension
IF 4 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-11-01 DOI: 10.1016/j.jbi.2024.104739
Leilei Su, Jian Chen, Yifan Peng, Cong Sun

Objective:

Although deep learning techniques have shown significant achievements, they frequently depend on extensive amounts of hand-labeled data and tend to perform inadequately in few-shot scenarios. The objective of this study is to devise a strategy that can improve the model’s capability to recognize biomedical entities in scenarios of few-shot learning.

Methods:

By redefining biomedical named entity recognition (BioNER) as a machine reading comprehension (MRC) problem, we propose a demonstration-based learning method to address few-shot BioNER, which involves constructing appropriate task demonstrations. We compared the proposed method with existing advanced methods on six benchmark datasets: BC4CHEMD, BC5CDR-Chemical, BC5CDR-Disease, NCBI-Disease, BC2GM, and JNLPBA.
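To make the MRC framing concrete, the sketch below assembles one model input from a natural-language query, a few labeled demonstrations, and the target sentence. The template, the `[SEP]` separator, the `Entities:` marker, and the function name `build_mrc_input` are all hypothetical choices made for illustration; the abstract does not specify how the authors construct their demonstrations.

```python
def build_mrc_input(query, demonstrations, context, sep=" [SEP] "):
    """Concatenate a query, labeled demonstrations, and the target sentence.

    demonstrations: list of (sentence, entity_list) pairs used as worked examples.
    """
    demo_parts = []
    for sentence, entities in demonstrations:
        answer = "; ".join(entities) if entities else "none"
        demo_parts.append(f"{sentence} Entities: {answer}")
    return sep.join([query] + demo_parts + [context])


# Hypothetical query, demonstration, and target sentence.
query = "Find all disease mentions in the text."
demos = [("The patient developed hepatitis after treatment.", ["hepatitis"])]
context = "Aspirin may reduce the risk of colorectal cancer."

example = build_mrc_input(query, demos, context)
print(example)
```

A reading-comprehension model would then be asked to extract answer spans from the final segment, with the demonstrations serving as in-context examples of the expected output.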

Results:

We examined the models’ efficacy by reporting F1 scores from both the 25-shot and 50-shot learning experiments. In 25-shot learning, we observed a 1.1% improvement in the average F1 score compared to the baseline method, reaching 61.7%, 84.1%, 69.1%, 70.1%, 50.6%, and 59.9% on the six datasets, respectively. In 50-shot learning, we further improved the average F1 score by 1.0% compared to the baseline method, reaching 73.1%, 86.8%, 76.1%, 75.6%, 61.7%, and 65.4%, respectively.
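For readers who want the averages behind these per-dataset figures, the short calculation below takes the unweighted mean of the six reported F1 scores in each setting. It assumes that "respectively" follows the dataset order given in the Methods paragraph; the abstract does not state the baseline scores, so only the proposed method's averages can be derived.

```python
from statistics import mean

# Dataset order is assumed to match the order listed in the Methods paragraph.
DATASETS = ["BC4CHEMD", "BC5CDR-Chemical", "BC5CDR-Disease",
            "NCBI-Disease", "BC2GM", "JNLPBA"]

# F1 scores (%) reported in the abstract for the proposed method.
F1_25_SHOT = [61.7, 84.1, 69.1, 70.1, 50.6, 59.9]
F1_50_SHOT = [73.1, 86.8, 76.1, 75.6, 61.7, 65.4]

avg_25 = mean(F1_25_SHOT)
avg_50 = mean(F1_50_SHOT)

# Per-dataset gain from doubling the number of shots.
gains = {name: round(f50 - f25, 1)
         for name, f25, f50 in zip(DATASETS, F1_25_SHOT, F1_50_SHOT)}

print(round(avg_25, 1), round(avg_50, 1))  # 65.9 73.1
print(gains)
```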

Conclusion:

We show that in few-shot BioNER, MRC-based language models are considerably more proficient at recognizing biomedical entities than the sequence labeling approach. Furthermore, our MRC-based language models can compete with fully supervised learning methods that rely heavily on abundant annotated data. These results highlight possible pathways for future advances in few-shot BioNER methods.
Citations: 0
HEART: Learning better representation of EHR data with a heterogeneous relation-aware transformer
IF 4 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-11-01 DOI: 10.1016/j.jbi.2024.104741
Tinglin Huang, Syed Asad Rizvi, Rohan Krishna Thakur, Vimig Socrates, Meili Gupta, David van Dijk, R. Andrew Taylor, Rex Ying

Objective:

Pretrained language models have recently demonstrated their effectiveness on Electronic Health Record (EHR) data by modeling patient encounters as sentences. However, existing methods fall short of utilizing the inherent heterogeneous correlations between medical entities, which include diagnoses, medications, procedures, and lab tests. Existing studies either focus merely on diagnosis entities or encode different entities in a homogeneous space, leading to suboptimal performance. Motivated by this, we aim to develop a foundational language model pre-trained on EHR data that explicitly incorporates the heterogeneous correlations among these entities.

Methods:

In this study, we propose HEART, a heterogeneous relation-aware transformer for EHR. Our model includes a range of heterogeneous entities within each input sequence and represents pairwise relationships between entities as a relation embedding. Such a higher-order representation allows the model to perform complex reasoning and derive attention weights in the heterogeneous context. Additionally, a multi-level attention scheme is employed to exploit the connection between different encounters while alleviating the high computational costs. HEART is pretrained on two tasks, missing entity prediction and anomaly detection, both of which enhance the model’s performance on various downstream tasks.
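One generic way to let pairwise relation embeddings shape attention weights is to add a type-pair vector to each key before the scaled dot product, as sketched below in plain Python. This is an illustrative assumption, not HEART's actual formulation, which the abstract does not give; the function `relation_aware_attention`, the entity types, and all vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def relation_aware_attention(queries, keys, types, relation_emb):
    """Row-normalized attention weights with a relation term added to each key.

    score(i, j) = q_i . (k_j + r[type_i, type_j]) / sqrt(d)
    """
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            r = relation_emb[(types[i], types[j])]
            shifted_key = [kv + rv for kv, rv in zip(k, r)]
            scores.append(dot(q, shifted_key) / math.sqrt(d))
        weights.append(softmax(scores))
    return weights


# Two hypothetical entities in one encounter.
types = ["diagnosis", "medication"]
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [1.0, 0.0]]  # identical keys: only the relation term differs
relation_emb = {
    ("diagnosis", "diagnosis"): [0.0, 0.0],
    ("diagnosis", "medication"): [2.0, 0.0],
    ("medication", "diagnosis"): [0.0, 0.0],
    ("medication", "medication"): [0.0, 0.0],
}

attn = relation_aware_attention(queries, keys, types, relation_emb)
print(attn)
```

Because the two keys are identical, any difference within a row of `attn` comes only from the relation embedding, which is the effect the sketch is meant to isolate.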

Results:

Extensive experiments on two EHR datasets and five downstream tasks demonstrate HEART’s superior performance compared to four state-of-the-art foundation models. For instance, HEART achieves improvements of 12.1% and 4.1% over Med-BERT in death and readmission prediction, respectively. Additionally, case studies show that HEART offers interpretable insights into the relationships between entities through the learned relation embeddings.

Conclusion:

We study the problem of EHR representation learning and propose HEART, a model that leverages the heterogeneous relationships between medical entities. Our approach includes a multi-level encoding scheme and two specialized pretraining objectives, designed to boost both the efficiency and effectiveness of the model. We have comprehensively evaluated HEART across five clinically significant downstream tasks using two EHR datasets. The experimental results verify the model’s strong performance and support its practical utility in healthcare applications. Code: https://github.com/Graph-and-Geometric-Learning/HEART.
Citations: 0
Algorithms for evaluation of minimal cut sets
IF 4 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-11-01 DOI: 10.1016/j.jbi.2024.104740
Marcin Radom, Agnieszka Rybarczyk, Igor Piekarz, Piotr Formanowicz

Objective:

We propose a way to enhance the evaluation of minimal cut sets (MCSs) in biological systems modeled by Petri nets, by providing criteria and methodology for determining their optimality in disabling specific processes without affecting critical system components.

Methods:

This study uses Petri nets to model biological systems and applies two primary approaches to MCS evaluation. The first analyzes the impact of an MCS on t-invariants to identify structural dependencies. The second assesses the impact on transitions that may be starved when a specific MCS is inactive; this approach deals with net dynamics. These methodologies aim to offer practical tools for assessing the quality and effectiveness of MCSs.
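The first, structural criterion can be sketched as follows: a vector x is a t-invariant when the incidence matrix C satisfies C·x = 0, and a candidate MCS can be scored by how many t-invariants contain at least one of its transitions. The toy net, the function names, and the tie-break on set size are assumptions for illustration only; this is not the Holmes implementation.

```python
def is_t_invariant(incidence, x):
    """True if C . x = 0, where incidence[p][t] is the net token change
    at place p when transition t fires once."""
    return all(sum(row[t] * x[t] for t in range(len(x))) == 0
               for row in incidence)


def support(x):
    """Indices of the transitions that take part in the invariant."""
    return {t for t, count in enumerate(x) if count > 0}


def disabled_invariants(mcs, invariants):
    """T-invariants that cannot occur once every transition in mcs is switched off."""
    return [x for x in invariants if support(x) & mcs]


def rank_mcs(candidates, invariants):
    """Order candidates by fewest disabled t-invariants, then by smaller size."""
    return sorted(candidates,
                  key=lambda m: (len(disabled_invariants(m, invariants)), len(m)))


# Toy net: t0 produces into p0; t1 moves p0 -> p1; t2 consumes p1; t3 consumes p0.
incidence = [
    [1, -1, 0, -1],  # p0
    [0, 1, -1, 0],   # p1
]
invariants = [[1, 1, 1, 0], [1, 0, 0, 1]]

# Two candidate cut sets that both stop the target transition t2.
candidates = [{0}, {1}]
ranking = rank_mcs(candidates, invariants)
print(ranking)
```

In this toy net, switching off t1 stops t2 while leaving the t0/t3 cycle intact, whereas switching off t0 disables both invariants, so `{1}` ranks ahead of `{0}`.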

Results:

The proposed methodologies were applied to two case studies. In the first case, a cholesterol metabolism network was analyzed to investigate how local inflammation and oxidative stress, in conjunction with cholesterol imbalances, influence the progression of atherosclerosis. The MCSs were ranked, with the top sets presented, focusing on those that disabled the fewest t-invariants. In the second case, a carbohydrate metabolism disorder model was examined to understand its impact on atherosclerosis progression. The analysis aimed to identify MCSs that could inhibit the atherosclerosis process by targeting specific transitions. Both studies used the Holmes software for calculations, demonstrating the effectiveness of the proposed evaluation methodologies in ranking MCSs for practical biological applications.

Conclusion:

The algorithms proposed in this paper offer an analytical approach for evaluating the quality of MCSs in biological systems. By providing criteria for MCS optimality, these approaches have potential to enhance the utility of MCS analysis in systems biology, aiding in the understanding and manipulation of complex biological networks.
The algorithms are implemented within the Holmes software, an open-source project available at https://github.com/bszawulak/HolmesPN.
Citations: 0
Data generation in healthcare environments.
IF 4 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-11-01 DOI: 10.1016/j.jbi.2024.104742
Ricardo Cardoso Pereira, Pedro Pereira Rodrigues, Irina Sousa Moreira, Pedro Henriques Abreu
Citations: 0