Journal of Biomedical Informatics: Latest Articles

Taming vision transformers for clinical laryngoscopy assessment
IF 4 | Zone 2, Medicine | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-02-01 | DOI: 10.1016/j.jbi.2024.104766
Xinzhu Zhang , Jing Zhao , Daoming Zong , Henglei Ren , Chunli Gao

Objective:

Laryngoscopy, essential for diagnosing laryngeal cancer (LCA), faces challenges due to high inter-observer variability and the reliance on endoscopist expertise. Distinguishing precancerous from early-stage cancerous lesions is particularly challenging, even for experienced practitioners, given their similar appearances. This study aims to enhance laryngoscopic image analysis to improve early screening/detection of cancer or precancerous conditions.

Methods:

We propose MedFormer, a laryngeal cancer classification method based on the Vision Transformer (ViT). To address data scarcity, MedFormer employs a customized transfer learning approach that leverages the representational power of pre-trained transformers. This method enables robust out-of-domain generalization by fine-tuning a minimal set of additional parameters.
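The abstract does not say which extra parameters MedFormer trains. One common way to fine-tune "a minimal set of additional parameters" is to freeze the backbone and train small bottleneck adapters plus a classification head. The sketch below only counts parameters under that assumption; the ViT-Base-like sizes, the bottleneck width, and the three-class head are hypothetical values, not figures from the paper.

```python
# Hypothetical ViT-Base-like configuration (assumption, not taken from the paper).
HIDDEN_DIM = 768
NUM_LAYERS = 12
BACKBONE_PARAMS = 86_000_000  # rough size of a ViT-B/16 backbone, kept frozen


def adapter_params(hidden_dim: int, bottleneck: int) -> int:
    """Parameters of one bottleneck adapter: down- and up-projection, with biases."""
    down = hidden_dim * bottleneck + bottleneck
    up = bottleneck * hidden_dim + hidden_dim
    return down + up


def trainable_fraction(bottleneck: int, num_classes: int) -> float:
    """Share of all parameters that are trained when only one adapter per
    layer and a linear classification head are updated."""
    adapters = NUM_LAYERS * adapter_params(HIDDEN_DIM, bottleneck)
    head = HIDDEN_DIM * num_classes + num_classes
    trainable = adapters + head
    return trainable / (BACKBONE_PARAMS + trainable)


if __name__ == "__main__":
    # e.g. a hypothetical 3-class setup (normal / leukoplakia / cancer)
    print(f"{trainable_fraction(bottleneck=64, num_classes=3):.2%}")
```

With these assumed sizes, well under 2% of the parameters are trained, which illustrates why such schemes suit small datasets.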

Results:

MedFormer achieves a sensitivity of 98% and a specificity of 89% for identifying precancerous lesions (leukoplakia), and a sensitivity of 89% and a specificity of 97% for detecting cancer, significantly surpassing its CNN counterparts. MedFormer also outperforms the two selected ViT-based models. It outperforms physician visual evaluation (PVE) in certain scenarios and at least matches PVE performance in all cases. Visualizations using class activation maps (CAM) and deformable patches demonstrate MedFormer's interpretability, helping clinicians understand the model's predictions.
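Per-class sensitivity and specificity in a multi-class setting are usually computed one-vs-rest. A minimal sketch, assuming hypothetical class names ("normal", "leukoplakia", "cancer") rather than the paper's actual label set:

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str
) -> Tuple[float, float]:
    """One-vs-rest sensitivity (recall of `positive`) and specificity
    (fraction of non-`positive` cases not predicted as `positive`)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1  # any non-positive prediction counts as a true negative
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    truth = ["cancer", "cancer", "leukoplakia", "normal", "normal"]
    preds = ["cancer", "leukoplakia", "leukoplakia", "normal", "cancer"]
    print(sensitivity_specificity(truth, preds, positive="cancer"))
```

Note that confusing leukoplakia with normal tissue does not lower the cancer class's specificity under this one-vs-rest definition.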

Conclusion:

We highlight the potential of vision transformers in clinical laryngoscopic assessment, presenting MedFormer as an effective method for the early detection of laryngeal cancer.
Journal of Biomedical Informatics, Volume 162, Article 104766.
Citations: 0
Examining implementation outcomes in health information exchange systems: A scoping review
IF 4 | Zone 2, Medicine | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-01-20 | DOI: 10.1016/j.jbi.2025.104782
Bonnie Lum , Navisha Weerasinghe , Charlene H. Chu , Dan Perri , Lisa Cranley

Background

Health information exchange (HIE) facilitates the secure exchange of digital health data across disparate health systems and settings. Implementing information technology projects in healthcare is complex, and this complexity is compounded by the fact that implementation success, as measured by implementation outcomes, has been inconsistently defined and evaluated. To our knowledge, no scoping review has examined implementation success through implementation outcomes in the field of HIE technologies. This scoping review aimed to synthesize studies reporting implementation outcomes of HIE solutions (and related interoperability technologies), with the goal of informing the implementation of future large-scale HIE projects.

Methods

A scoping review, guided by the Arksey and O'Malley Framework, was conducted in four databases (Medline, Embase, CINAHL, and Web of Science), covering studies from January 2010 to June 2023. Studies were included if they described the implementation of a technology supporting interoperability or HIE across different organizations and/or healthcare settings and evaluated one or more implementation outcomes from the Implementation Outcome Framework (IOF).

Results

Thirty-seven studies were included in this review. Adoption was the most frequently reported implementation outcome (n = 24). Fidelity and penetration were not reported. Few studies defined the outcomes being evaluated, and few described the stage of implementation at which each outcome was examined. No studies used the IOF or other similar implementation science evaluation frameworks.

Conclusion

This review highlights existing gaps in implementation studies of HIE/interoperability solutions. Future studies should use theoretical frameworks to guide their research, standardize the language used to describe implementation outcomes, and expand knowledge of salient outcomes at different stages of implementation.
Journal of Biomedical Informatics, Volume 163, Article 104782.
Citations: 0
Coherence and comprehensibility: Large language models predict lay understanding of health-related content
IF 4 | Zone 2, Medicine | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-01-01 | DOI: 10.1016/j.jbi.2024.104758
Trevor Cohen , Weizhe Xu , Yue Guo , Serguei Pakhomov , Gondy Leroy
Health literacy is a prerequisite for informed health-related decision making. To facilitate understanding, text should be presented at a reading level appropriate for the reader. Cognitive studies suggest that the coherence of a text (the interconnectedness of the ideas it expresses) is especially important for low-knowledge readers, who lack the background knowledge to draw inferences from text whose ideas are only implicitly connected. Prior work in cognitive science has yielded automated methods to estimate coherence. These methods estimate the proximity between text representations in a semantic vector space, on the premise that poorly connected units of text will lie farther apart in this space. In addition, recent work with large language models (LLMs) has produced probabilistic methodological analogues that have yet to be evaluated for this purpose. This work examines the relationship between these automated measures and layperson comprehension of biomedical text. To characterize this relationship, we applied a range of automated coherence measures to a set of text snippets, some of which had been deliberately modified to improve their accessibility in a series of reading comprehension experiments. Results indicate significant associations between reader comprehension (estimated using multiple-choice questions) and LLM-derived coherence metrics. Interventions designed to make passages more comprehensible both improved reader understanding and increased coherence as measured by the best-performing LLM-derived models. These findings support the utility of LLM-derived coherence measures for identifying gaps in connectedness that make biomedical text difficult for laypeople to understand, with the potential to inform both manual and automated efforts to improve the accessibility of the biomedical literature.
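The vector-space idea behind the earlier coherence measures can be illustrated with a toy local-coherence score: embed each sentence, then average the cosine similarity of adjacent sentences. This sketch swaps in plain bag-of-words counts as a stand-in for a learned semantic space (an assumption for illustration only) and does not implement the LLM-based probabilistic measures the study evaluates.

```python
import math
import re
from collections import Counter
from typing import List


def bow_vector(sentence: str) -> Counter:
    """Bag-of-words counts; a crude stand-in for a semantic vector."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def local_coherence(sentences: List[str]) -> float:
    """Mean cosine similarity between adjacent sentences (higher = more connected)."""
    if len(sentences) < 2:
        raise ValueError("need at least two sentences")
    vecs = [bow_vector(s) for s in sentences]
    sims = [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    connected = [
        "Insulin lowers blood sugar.",
        "Blood sugar rises after meals.",
        "After meals, insulin is released.",
    ]
    disconnected = [
        "Insulin lowers blood sugar.",
        "The clinic opens at nine.",
        "Bring a photo ID.",
    ]
    print(local_coherence(connected), local_coherence(disconnected))
```

Adjacent sentences that share no vocabulary score zero here, whereas a learned embedding space could still relate them through meaning.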
Journal of Biomedical Informatics, Volume 161, Article 104758.
Citations: 0
Efficient strabismus diagnosis from small samples: Harnessing spatial features for improved accuracy
IF 4 | Zone 2, Medicine | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-01-01 | DOI: 10.1016/j.jbi.2024.104759
Renzhong Wu , Shenghui Liao , Yongrong Ji , Xiaoyan Kui , Fuchang Han , Ziyang Hu , Xuefei Song
Strabismus is a common ophthalmological condition, and early diagnosis is crucial to preventing visual impairment and loss of stereopsis. However, traditional diagnostic methods often rely on specialized ophthalmic equipment and trained personnel, which limits the accessibility of strabismus diagnosis. Computer-aided strabismus diagnosis is an effective and widely used technology that assists clinicians in making clinical diagnoses and improves efficiency. To address the accessibility limitation, we designed an efficient strabismus diagnosis model, RIS-MLP, based on a small number of samples derived from frontal facial images captured under natural lighting conditions via the Hirschberg test. RIS-MLP combines light reflex point detection and iris detection modules to accurately extract key spatial features even under noisy and occluded conditions. Optimized spatial feature strategies further enhance the performance of the classification module. To validate the superiority of RIS-MLP, we conducted both direct and indirect comparative experiments. Indirect comparisons demonstrate that RIS-MLP has advantages in sample efficiency. Direct comparisons show that RIS-MLP can mitigate overfitting to a certain extent, and that RIS-MLP and its variants (e.g., RIS-SVM) outperform state-of-the-art models on our noisy and imbalanced dataset.
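The Hirschberg test judges alignment by where the corneal light reflex falls relative to the iris or pupil center. Once a reflex point and an iris circle have been detected, one simple spatial feature is the reflex's decentration normalized by iris radius, compared between the two eyes. The sketch below is a hypothetical illustration of that kind of feature; the `EyeLandmarks` structure and the feature set are assumptions, not RIS-MLP's actual design.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class EyeLandmarks:
    """Hypothetical per-eye detections, in image pixel coordinates."""
    iris_center: Tuple[float, float]
    iris_radius: float
    reflex: Tuple[float, float]  # detected corneal light-reflex point


def reflex_decentration(eye: EyeLandmarks) -> float:
    """Distance from iris center to light reflex, in units of iris radius.

    Normalizing by the iris radius makes the feature roughly independent of
    image resolution and camera distance."""
    if eye.iris_radius <= 0:
        raise ValueError("iris_radius must be positive")
    dx = eye.reflex[0] - eye.iris_center[0]
    dy = eye.reflex[1] - eye.iris_center[1]
    return math.hypot(dx, dy) / eye.iris_radius


def spatial_features(left: EyeLandmarks, right: EyeLandmarks) -> Tuple[float, float, float]:
    """(left decentration, right decentration, between-eye asymmetry)."""
    l = reflex_decentration(left)
    r = reflex_decentration(right)
    return l, r, abs(l - r)


if __name__ == "__main__":
    left = EyeLandmarks(iris_center=(100.0, 50.0), iris_radius=6.0, reflex=(100.0, 50.0))
    right = EyeLandmarks(iris_center=(200.0, 50.0), iris_radius=6.0, reflex=(203.0, 50.0))
    print(spatial_features(left, right))
```

A feature vector like this could then feed a small classifier such as an MLP or SVM, which is the general pattern the abstract's RIS-MLP and RIS-SVM names suggest.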
Journal of Biomedical Informatics, Volume 161, Article 104759.
Citations: 0
Enhancing suicidal behavior detection in EHRs: A multi-label NLP framework with transformer models and semantic retrieval-based annotation
IF 4 | Zone 2, Medicine | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-01-01 | DOI: 10.1016/j.jbi.2024.104755
Kimia Zandbiglari , Shobhan Kumar , Muhammad Bilal , Amie Goodin , Masoud Rouhizadeh

Background:

Suicide is a leading cause of death worldwide, making early identification of suicidal behaviors crucial for clinicians. Current Natural Language Processing (NLP) approaches for identifying suicidal behaviors in Electronic Health Records (EHRs) rely on keyword searches, rule-based methods, and binary classification, which may not fully capture the complexity and spectrum of suicidal behaviors. This study aims to create a multi-class labeled dataset with annotation guidelines and to develop a novel NLP approach for fine-grained, multi-label classification of suicidal behaviors, improving both the efficiency of the annotation process and the accuracy of NLP methods.

Methods:

We develop a multi-class labeling system based on guidelines from the FDA, CDC, and WHO that distinguishes six categories of suicidal behaviors and allows multiple labels per data sample. To create an annotated dataset efficiently, we use an MPNet-based semantic retrieval framework to extract relevant sentences from a large EHR dataset, reducing the annotation space while capturing diverse expressions. Experts then annotate the extracted sentences using the multi-class system. Finally, we formulate the task as a multi-label classification problem and fine-tune transformer-based models on the curated dataset to classify suicidal behaviors in EHRs.
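The retrieval step can be pictured as ranking candidate sentences by embedding similarity to a set of query embeddings and keeping the top k for annotation. A minimal sketch, assuming the embeddings have already been produced by some sentence encoder (for example an MPNet-based one); the toy 2-D vectors and the max-over-queries scoring rule are illustrative assumptions, not the paper's exact procedure.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_top_k(
    query_vecs: List[Sequence[float]],
    sentence_vecs: List[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Score each sentence by its best cosine similarity to any query
    embedding and return the k highest-scoring (index, score) pairs."""
    scored = [
        (i, max(cosine(q, s) for q in query_vecs))
        for i, s in enumerate(sentence_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    queries = [[1.0, 0.0]]  # e.g. embedding of a seed phrase (hypothetical)
    sentences = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    print(retrieve_top_k(queries, sentences, k=2))
```

Only the retrieved sentences go to the experts, which is how retrieval shrinks the annotation space.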

Results:

Lexical analysis revealed key themes in assessing suicide risk, spanning an individual's history, mental health, substance use, and family background. Fine-tuned transformer-based models effectively identified suicidal behaviors in EHRs; Bio_ClinicalBERT, BioBERT, and XLNet achieved the top F1 score (0.81), outperforming BERT and RoBERTa. The proposed approach, which combines task-specific NLP models with a multi-label classification system, captures the complexity of suicidal behaviors more effectively than traditional binary classification, particularly for "Suicide Attempt" and "Family History" instances. However, direct comparisons with existing studies are difficult because of differing metrics and label definitions.
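In multi-label classification each label gets its own sigmoid output and threshold, and F1 can be aggregated in more than one way. The abstract does not say whether its F1 is micro- or macro-averaged, so the sketch below computes both; the 0.5 threshold, three-label toy data, and logits are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_multilabel(logits: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Independent per-label decisions: a sample may receive several labels."""
    return [[1 if sigmoid(z) >= threshold else 0 for z in row] for row in logits]


def _f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> Tuple[float, float]:
    n_labels = len(y_true[0])
    tp, fp, fn = [0] * n_labels, [0] * n_labels, [0] * n_labels
    for t_row, p_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            tp[j] += t and p
            fp[j] += (not t) and p
            fn[j] += t and (not p)
    micro = _f1(sum(tp), sum(fp), sum(fn))  # pools counts over all labels
    macro = sum(_f1(tp[j], fp[j], fn[j]) for j in range(n_labels)) / n_labels
    return micro, macro


if __name__ == "__main__":
    y_true = [[1, 0, 1], [0, 1, 0]]
    preds = predict_multilabel([[2.0, -1.0, 0.5], [-3.0, -0.2, 1.0]])
    print(preds, micro_macro_f1(y_true, preds))
```

Macro-F1 weights rare categories equally with common ones, so the two averages can diverge noticeably when some behavior labels are scarce.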

Conclusion:

This study presents a robust NLP framework for detecting suicidal behaviors in EHRs, leveraging task-specific fine-tuning of transformer-based models and a semi-automated pipeline. Despite limitations, the approach demonstrates the potential of advanced NLP techniques in enhancing the identification of suicidal behaviors. Future work should focus on model expansion and integration to further improve patient care and clinical decision-making.
Journal of Biomedical Informatics, Volume 161, Article 104755.
Citations: 0
Human intention recognition for trauma resuscitation: An interpretable deep learning approach for medical process data
IF 4 | Zone 2, Medicine | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-01-01 | DOI: 10.1016/j.jbi.2024.104767
Keyi Li , Mary S. Kim , Wenjin Zhang , Sen Yang , Genevieve J. Sippel , Aleksandra Sarcevic , Randall S. Burd , Ivan Marsic

Objective

Trauma resuscitation is the initial evaluation and management of injured patients in the emergency department. This time-critical process requires the simultaneous pursuit of multiple resuscitation goals. Recognizing whether a required goal is being pursued can reduce errors in goal-related tasks and improve patient outcomes. The intention to pursue a goal can often be inferred from ongoing and completed treatment activities, but monitoring goal pursuit is cognitively demanding and error-prone. We introduce an interpretable deep learning approach that aids decision making by automatically recognizing goal pursuit during trauma resuscitation.

Methods

We developed a predictive model to recognize the pursuit of two resuscitation goals: airway stabilization and circulatory support. We used event logs of 381 pediatric trauma resuscitations from August 2014 to November 2022 to train a neural network model with a dual-GRU structure that learns from both time-level and activity-type-level features. Our model makes predictions based on a sequence of activities and corresponding timestamps. To enhance the model and facilitate interpretation of predictions, we used the attention weights assigned by our model to represent the importance of features. These weights identified the critical time points and contributing activities during a goal pursuit.
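Using attention weights for interpretation generally means scoring each time step's hidden state, normalizing the scores with a softmax, and reading the largest weights as the most influential steps. A minimal sketch of that pooling step, assuming the per-step hidden states are already available (the dual-GRU encoder itself is not reproduced) and using a hypothetical learned scoring vector `w` and made-up activity names:

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden: Sequence[Sequence[float]], w: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Dot-product attention over time steps.

    Returns (weights, context): one weight per step, summing to 1, and the
    weighted average of hidden states that a classifier head would consume."""
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, w)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    context = [
        sum(weights[t] * hidden[t][d] for t in range(len(hidden)))
        for d in range(dim)
    ]
    return weights, context


def top_steps(weights: Sequence[float], activities: Sequence[str], k: int = 2) -> List[Tuple[str, float]]:
    """The k activities with the largest attention weights."""
    order = sorted(range(len(weights)), key=lambda t: weights[t], reverse=True)[:k]
    return [(activities[t], weights[t]) for t in order]


if __name__ == "__main__":
    hidden = [[0.1, 0.0], [2.0, 1.0], [0.5, 0.2]]  # toy per-step states
    weights, context = attention_pool(hidden, w=[1.0, 1.0])
    print(top_steps(weights, ["vital signs", "intubation", "IV access"], k=1))
```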

Results

Our model achieved an average area under the receiver operating characteristic curve (AUC) of 0.84 for recognizing airway stabilization and 0.83 for recognizing circulatory support. The activities and timestamps that contributed most to predictions aligned with domain knowledge.
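The AUC equals the probability that a randomly chosen positive case is scored above a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that pairwise definition and averages it over splits. What the reported "average" is taken over is not stated in the abstract; per-fold averaging is an assumption here.

```python
from statistics import mean
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC: fraction of positive/negative pairs
    ranked correctly, with ties counted as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_auc(folds: Sequence[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Mean AUC over evaluation splits (assumed to be cross-validation folds)."""
    return mean(roc_auc(y, s) for y, s in folds)


if __name__ == "__main__":
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The quadratic pairwise loop is fine for a few hundred cases; larger evaluations would normally use a rank-based formula instead.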

Conclusion

Our interpretable predictive model can recognize provider intention from a limited number of treatment activities. The model outperformed existing predictive models for medical events in both accuracy and interpretability. Integrating it into a decision-support system could automate the tracking of provider actions and optimize workflow to ensure timely delivery of care.
Journal of Biomedical Informatics, Volume 161, Article 104767.
Citations: 0
Novel machine learning model for predicting cancer drugs’ susceptibilities and discovering novel treatments
IF 4 · Zone 2 (Medicine) · Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2025-01-01 · DOI: 10.1016/j.jbi.2024.104762
Xiaowen Cao , Li Xing , Hao Ding , He Li , Yushan Hu , Yao Dong , Hua He , Junhua Gu , Xuekui Zhang

Background and Objective

Timely treatment is crucial for cancer patients, so the appropriate treatment should be administered as soon as possible. Because individuals can respond differently to a given drug owing to their unique genomic profiles, we aim to use genomic information to predict how various drugs will affect each patient and to determine the best course of treatment.

Methods

We present Kernelized Residual Stacking (KRS), a new multi-task learning approach, and use it to predict responses to anti-cancer drugs from genomic data. Using the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) studies, we show that KRS achieves better predictive performance than popular competing methods. We then conduct downstream analysis of the feature genes selected by KRS to discover novel therapies.
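The abstract does not spell out how KRS works. To make the multi-task setting concrete (one genomic profile per cell line, one response column per drug), here is a generic two-stage kernel ridge stacking scheme.

It is a hypothetical illustration, not the published KRS algorithm:

- Stage 1 fits all drugs jointly with a shared RBF kernel.
- Stage 2 regresses each drug's residual on the stage-1 predictions of all drugs, so that related drugs can borrow strength from one another.

The class name, `gamma` and `lam` are assumptions. For brevity the sketch uses in-sample stage-1 predictions; a real stacking scheme would normally use out-of-fold predictions.

```python
import numpy as np

def rbf_kernel(A: np.ndarray, B: np.ndarray, gamma: float) -> np.ndarray:
    # Gaussian (RBF) kernel built from pairwise squared Euclidean distances.
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def krr_fit(K: np.ndarray, Y: np.ndarray, lam: float) -> np.ndarray:
    # Kernel ridge dual coefficients; one column per task (drug).
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), Y)

class TwoStageKernelStack:
    """Hypothetical two-stage multi-task regressor; NOT the published KRS."""

    def __init__(self, gamma: float = 0.1, lam: float = 0.1):
        self.gamma, self.lam = gamma, lam

    def fit(self, X: np.ndarray, Y: np.ndarray) -> "TwoStageKernelStack":
        # X: (n_cell_lines, n_features) genomic profiles
        # Y: (n_cell_lines, n_drugs) drug responses, one column per task
        self.X_train = X
        K1 = rbf_kernel(X, X, self.gamma)
        self.alpha1 = krr_fit(K1, Y, self.lam)
        P = K1 @ self.alpha1  # stage-1 predictions for every drug
        self.P_train = P
        # Stage 2: each drug's residual is regressed on the stage-1
        # predictions of all drugs (the "stacking" across tasks).
        K2 = rbf_kernel(P, P, self.gamma)
        self.alpha2 = krr_fit(K2, Y - P, self.lam)
        return self

    def predict(self, X_new: np.ndarray) -> np.ndarray:
        P = rbf_kernel(X_new, self.X_train, self.gamma) @ self.alpha1
        return P + rbf_kernel(P, self.P_train, self.gamma) @ self.alpha2
```

Because the stage-2 kernel ridge fit can only shrink the residuals it models, the combined training error is never larger than the stage-1 error. Held-out performance is what a comparison such as GDSC/CCLE would actually measure.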

Results

Using two genomic studies, we show that KRS outperforms several popular competing methods in predicting drug susceptibilities. Through downstream analysis of the feature genes selected by KRS, we found that the PI3K-Akt pathway could alter drug susceptibilities and that its expression correlated positively with the hub gene ERBB2. Based on these feature genes, we discovered eight novel small molecules that could be developed into combination therapies with anti-cancer drugs.

Conclusions

KRS outperforms competing methods in predictive performance and selects feature genes that are highly correlated with drug susceptibilities. Investigating these feature genes yields novel biological findings.
Citations: 0
Modeling repeated measurements data using the multilevel Bayesian network: A case of child morbidity
IF 4 · Zone 2 (Medicine) · Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2025-01-01 · DOI: 10.1016/j.jbi.2024.104760
Bezalem Eshetu Yirdaw, Legesse Kassa Debusho

Background and Objective:

In epidemiological research, it is important to study the long-term dependencies between multiple diseases. This study extends the multilevel Bayesian network (MBN) to repeated-measures data, so that it can estimate the rate of change in outcomes over time while quantifying the variability of these rates across higher-level units through various variance–covariance structures.
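As a hedged illustration of "estimating a rate of change while quantifying its variability across higher-level units", consider a generic random-intercept, random-slope model for one continuous outcome such as WAZ. The notation, the covariate vector $\mathbf{x}_{ij}$ and the 2×2 covariance matrix are assumptions for illustration, not the MBN specification used in the paper. Here $i$ indexes a measurement occasion within higher-level unit $j$ (e.g. a village).

```latex
y_{ij} = \beta_0 + \beta_1 t_{ij} + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma}
       + b_{0j} + b_{1j} t_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} b_{0j} \\ b_{1j} \end{pmatrix}
  \sim \mathcal{N}\!\left(\mathbf{0},
  \begin{pmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{pmatrix}\right),
\qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

In this model:

- $\beta_1 + b_{1j}$ is the rate of change over time in unit $j$.
- $\sigma_1^2$ measures how much that rate varies across units.
- Fixing $\sigma_{01} = 0$ (diagonal) versus leaving it free (unstructured) gives two different variance–covariance structures.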

Method:

The performance and reliability of the model are examined through a simulation study, and its practical application is demonstrated using child morbidity data. These data have a hierarchical structure: children were randomly selected from clusters (villages), and their conditions were assessed quarterly from March 2015 to May 2016. MBN was used to explore the relationships among the outcomes weight-for-age (WAZ), height-for-age (HAZ), the number of days a child suffered from diarrhea (NOD), and the number of days a child suffered from flu (NOF), and to estimate the rate of change of these outcomes over time. Because the outcomes were hybrid (a mix of continuous scores and counts), the connected three-parent set block Gibbs sampler was used for structure and parameter learning of the MBN, together with multilevel generalized Poisson regression, multilevel zero-inflated Poisson regression, and linear mixed-effects models.
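The count outcomes NOD and NOF are modeled with generalized Poisson and zero-inflated Poisson regressions. The sketch below evaluates the two underlying probability mass functions.

It assumes Consul's parameterization of the generalized Poisson; the paper may use a different one. The multilevel regression layer (log link, covariates, village random effects) and the Gibbs sampler are not shown. The function names are hypothetical.

```python
import math

def poisson_pmf(y: int, lam: float) -> float:
    # Poisson pmf computed in log space to avoid overflow for large y (lam > 0).
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))

def zip_pmf(y: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson: a structural zero with probability pi, else Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(y, lam)
    return pi + base if y == 0 else base

def gen_poisson_pmf(y: int, theta: float, delta: float) -> float:
    """Consul's generalized Poisson:
    P(y) = theta * (theta + delta*y)^(y-1) * exp(-(theta + delta*y)) / y!

    Mean theta/(1-delta): delta > 0 gives overdispersion, delta = 0 the Poisson.
    This sketch assumes theta > 0 and 0 <= delta < 1 so that all logs are defined.
    """
    m = theta + delta * y
    return math.exp(math.log(theta) + (y - 1) * math.log(m) - m - math.lgamma(y + 1))

if __name__ == "__main__":
    print("P(NOD = 0) under ZIP:", round(zip_pmf(0, lam=2.5, pi=0.3), 4))
    print("P(NOF = 3) under GP: ", round(gen_poisson_pmf(3, theta=2.0, delta=0.3), 4))
```

The zero-inflated form accounts for the many children with no sick days in a quarter. The generalized Poisson's `delta` absorbs variance beyond what a plain Poisson allows.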

Result:

The simulation study confirmed that an MBN using the time metric t as a node performed well for repeated-measures data. Structure learning of the MBN revealed causal relationships among WAZ, HAZ, NOD, and NOF. Furthermore, the number of months of exclusive breastfeeding and the use of micronutrient powder emerged as strong predictors of all outcomes considered in this study.

Conclusion:

This study shows that MBN is suitable for modeling repeated-measures data: it can describe the relationships between outcomes and estimate an outcome's rate of change over time while quantifying the variability due to higher-level clustering variables. Furthermore, the study highlights the importance of closely monitoring children with low WAZ and HAZ scores and of promoting good feeding practices to reduce how often children get flu and diarrhea.
Citations: 0
Repeatable process for extracting health data from HL7 CDA documents
IF 4 · Zone 2 (Medicine) · Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2025-01-01 · DOI: 10.1016/j.jbi.2024.104765
Harry-Anton Talvik , Marek Oja , Sirli Tamm , Kerli Mooses , Dage Särg , Marcus Lõo , Õie Renata Siimon , Hendrik Šuvalov , Raivo Kolde , Jaak Vilo , Sulev Reisberg , Sven Laur

Objective

This study aims to address a gap in the literature on converting real-world Clinical Document Architecture (CDA) data into the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), focusing on the initial steps that precede the mapping phase. We highlight the importance of a repeatable Extract-Transform-Load (ETL) pipeline for extracting health data from HL7 CDA documents in Estonia for research purposes.

Methods

We developed a repeatable ETL pipeline to facilitate the extraction, cleaning, and restructuring of health data from CDA documents into OMOP CDM, ensuring high-quality, structured data. The pipeline was designed to adapt to ongoing changes in the data exchange format and to handle the various CDA document subsets required by different scientific studies.
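The sketch below illustrates the extraction and restructuring steps that come before concept mapping. It flattens numeric CDA observations into staging rows whose field names follow the OMOP `measurement` table, using only the standard-library ElementTree parser.

Several parts are assumptions:

- The function name and the extra `source_code_system` field are hypothetical.
- The sample document is invented and heavily trimmed; a real CDA header has many more required elements.
- Estonian documents may use national code systems rather than LOINC (OID 2.16.840.1.113883.6.1, where 29463-7 is body weight).

This is not the authors' pipeline.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

HL7 = "urn:hl7-org:v3"
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
NS = {"hl7": HL7}

def extract_measurements(cda_xml: str, person_id: int) -> list:
    """Flatten numeric CDA observations into OMOP-measurement-like staging rows.

    Mapping source codes to OMOP concept_ids is deliberately left for a later
    step; only source values are carried over here.
    """
    root = ET.fromstring(cda_xml)
    rows = []
    for obs in root.iter(f"{{{HL7}}}observation"):
        code = obs.find("hl7:code", NS)
        value = obs.find("hl7:value", NS)
        # Keep only physical quantities (xsi:type="PQ"); free text needs other handling.
        if code is None or value is None or value.get(XSI_TYPE) != "PQ":
            continue
        try:
            number = float(value.get("value"))
        except (TypeError, ValueError):
            continue  # malformed numbers are skipped, not guessed
        eff = obs.find("hl7:effectiveTime", NS)
        when = None
        if eff is not None and eff.get("value"):
            when = datetime.strptime(eff.get("value")[:8], "%Y%m%d").date()
        rows.append({
            "person_id": person_id,
            "measurement_source_value": code.get("code"),
            "source_code_system": code.get("codeSystem"),  # staging-only field
            "value_as_number": number,
            "unit_source_value": value.get("unit"),
            "measurement_date": when,
        })
    return rows

# Invented, minimal example document (not a complete, valid CDA).
SAMPLE_CDA = """<ClinicalDocument xmlns="urn:hl7-org:v3"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <component><structuredBody><component><section>
    <entry><observation classCode="OBS" moodCode="EVN">
      <code code="29463-7" codeSystem="2.16.840.1.113883.6.1" displayName="Body weight"/>
      <effectiveTime value="20210314"/>
      <value xsi:type="PQ" value="72.5" unit="kg"/>
    </observation></entry>
    <entry><observation classCode="OBS" moodCode="EVN">
      <code code="29463-7" codeSystem="2.16.840.1.113883.6.1"/>
      <value xsi:type="ST">approx. 70 kg</value>
    </observation></entry>
  </section></component></structuredBody></component>
</ClinicalDocument>"""

if __name__ == "__main__":
    for row in extract_measurements(SAMPLE_CDA, person_id=1):
        print(row)
```

The second observation is skipped because its value is free text. Recovering such values is the kind of challenge the abstract reports for lab results.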

Results

Through selected use cases, we demonstrated that our pipeline successfully transformed a significant portion of diagnosis codes, body weight and eGFR measurements, and PAP test results from CDA documents into OMOP CDM, showing how easily structured data can be extracted. However, we encountered challenges such as harmonising diverse coding systems and extracting lab results from free-text sections. Developing the pipeline iteratively enabled swift error detection and correction, improving the efficiency of the process.

Conclusion

After a decade of focused work, our research has produced an ETL pipeline that effectively transforms HL7 CDA documents in Estonia into OMOP CDM, addressing key data extraction and transformation challenges. The pipeline’s repeatability and adaptability to various data subsets make it a valuable resource for researchers working with health data. Although it was tested on Estonian data, the principles outlined are broadly applicable and may help in handling health data standards that vary by country. Even as newer health data standards emerge, CDA remains relevant for retrospective health studies, which ensures the continuing importance of this work.
Citations: 0
Reviewer acknowledgement 2024
IF 4 · Zone 2 (Medicine) · Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2025-01-01 · DOI: 10.1016/j.jbi.2024.104763
Citations: 0