Journal of Biomedical Informatics: Latest Articles

Toward the European Health Data Space: The IMPaCT-Data secure infrastructure for EHR-based precision medicine research
IF 4 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-06-14 | DOI: 10.1016/j.jbi.2024.104670
Silvia Rodríguez-Mejías , Sara Degli-Esposti , Sara González-García , Carlos Luis Parra-Calderón

Background:

Art. 50 of the proposal for a Regulation on the European Health Data Space (EHDS) states that “health data access bodies shall provide access to electronic health data only through a secure processing environment, with technical and organizational measures and security and interoperability requirements”.

Objective:

To identify specific security measures that nodes participating in health data spaces shall implement based on the results of the IMPaCT-Data project, whose goal is to facilitate the exchange of electronic health records (EHR) between public entities based in Spain and the secondary use of this information for precision medicine research in compliance with the General Data Protection Regulation (GDPR).

Data and methods:

This article analyzes 24 of the 72 security measures identified in the Spanish National Security Scheme (ENS), as adopted by members of the federated data infrastructure developed during the IMPaCT-Data project.

Results:

The IMPaCT-Data case helps clarify the roles and responsibilities of entities wishing to participate in the EHDS by reconciling technical system notions with legal terminology. The most relevant security measures for Data Space Gatekeepers, Enablers and Prosumers are identified and explained.

Conclusion:

The EHDS can be viable only as long as the fiduciary duty of care of public health authorities is preserved; this implies that the secondary use of personal data shall contribute to the public interest and/or protect the vital interests of the data subjects. This condition can be met only if all nodes participating in a health data space adopt the organizational and technical security measures necessary to fulfill their role.

Journal of Biomedical Informatics, Vol. 156, Article 104670
Citations: 0
Applying generative AI with retrieval augmented generation to summarize and extract key clinical information from electronic health records
IF 4.5 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-06-14 | DOI: 10.1016/j.jbi.2024.104662
Mohammad Alkhalaf , Ping Yu , Mengyang Yin , Chao Deng

Background

Malnutrition is a prevalent issue in residential aged care facilities (RACFs), leading to adverse health outcomes. The ability to efficiently extract key clinical information from large volumes of electronic health record (EHR) data can improve understanding of the extent of the problem and support the development of effective interventions. This research aimed to test the efficacy of zero-shot prompt engineering applied to generative artificial intelligence (AI) models, on their own and in combination with retrieval augmented generation (RAG), for automating two tasks: summarizing both structured and unstructured EHR data, and extracting important malnutrition information.

Methodology

We used the Llama 2 13B model with zero-shot prompting. The dataset comprises unstructured and structured EHRs related to malnutrition management in 40 Australian RACFs. We first applied zero-shot prompting to the model alone, then combined it with RAG to accomplish two tasks: generating structured summaries of a client's nutritional status and extracting key information about malnutrition risk factors. We used 25 notes for the first task and 1,399 for the second. We manually evaluated the model's output for each task against a gold standard dataset.
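The retrieval step of a RAG pipeline can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses bag-of-words cosine similarity in place of whatever retriever and embeddings the study used, the prompt template and the functions `retrieve` and `build_prompt` are hypothetical, the toy notes are invented, and the actual Llama 2 call is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; a real system would use embeddings instead.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, notes: list[str], k: int = 2) -> list[str]:
    # Rank notes by similarity to the query and keep the top k as context.
    q = Counter(tokenize(query))
    ranked = sorted(notes, key=lambda n: cosine(q, Counter(tokenize(n))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, notes: list[str], k: int = 2) -> str:
    # Zero-shot prompt: an instruction plus retrieved context, with no worked examples.
    context = "\n".join(f"- {n}" for n in retrieve(query, notes, k))
    return (
        "Answer using only the notes below. If the information is not stated, "
        "reply 'not documented'.\n"
        f"Notes:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


notes = [
    "Resident has poor appetite and lost 3 kg this month.",
    "Family visited on Sunday.",
    "Dietitian referral made due to weight loss.",
]
print(build_prompt("Is there evidence of weight loss or poor appetite?", notes))
```

Grounding the prompt in retrieved notes and instructing the model to answer "not documented" when a detail is absent is one common way to discourage hallucinated details.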

Result

The evaluation outcomes indicated that zero-shot learning applied to a generative AI model is highly effective in summarizing and extracting information about the nutritional status of RACF clients. The generated summaries provided a concise and accurate representation of the original data, with an overall accuracy of 93.25%. Adding RAG improved summarization by 6 percentage points, reaching an accuracy of 99.25%. The model also extracted risk factors with an accuracy of 90%; however, adding RAG did not further improve accuracy on this task. Overall, the model performed robustly when information was explicitly stated in the notes, but it could hallucinate, particularly when details were not explicitly provided.
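The summarization gain reported above is an absolute difference between the two accuracies. Computed from the reported figures, the absolute and relative changes are:

```latex
\Delta_{\text{abs}} = 99.25\% - 93.25\% = 6 \text{ percentage points},
\qquad
\Delta_{\text{rel}} = \frac{99.25 - 93.25}{93.25} \approx 6.43\%
```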

Conclusion

This study demonstrates both the high performance and the limitations of applying zero-shot learning with generative AI models to automatically generate structured summaries of EHR data and extract key clinical information. Including the RAG approach improved model performance and mitigated the hallucination problem.

Journal of Biomedical Informatics, Vol. 156, Article 104662
Citations: 0
Assessing fairness in machine learning models: A study of racial bias using matched counterparts in mortality prediction for patients with chronic diseases
IF 4 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-06-13 | DOI: 10.1016/j.jbi.2024.104677
Yifei Wang , Liqin Wang , Zhengyang Zhou , John Laurentiev , Joshua R. Lakin , Li Zhou , Pengyu Hong

Objective

Existing approaches to fairness evaluation often overlook systematic differences among comparison groups in social determinants of health, such as demographics and socioeconomic status, potentially leading to inaccurate or even contradictory conclusions. This study aims to evaluate racial disparities in mortality prediction for patients with chronic diseases using a fairness detection method that accounts for systematic differences.

Methods

We created five datasets from Mass General Brigham's electronic health records (EHR), each focusing on a different chronic condition: congestive heart failure (CHF), chronic kidney disease (CKD), chronic obstructive pulmonary disease (COPD), chronic liver disease (CLD), and dementia. For each dataset, we developed separate machine learning models to predict 1-year mortality and examined racial disparities by comparing prediction performance between Black and White individuals. We compared fairness evaluations on the overall Black and White populations with evaluations on Black individuals and their matched White counterparts identified by propensity score matching, in which the systematic differences were mitigated.
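The matching step can be illustrated with 1:1 greedy nearest-neighbour matching without replacement. This is a minimal sketch that assumes propensity scores have already been estimated (for example by a logistic regression on the covariates listed in the Results); the caliper of 0.05, the IDs and the scores are hypothetical, and the study's exact matching procedure may differ.

```python
def greedy_match(index_group: dict[str, float],
                 pool: dict[str, float],
                 caliper: float = 0.05) -> list[tuple[str, str]]:
    """Pair each index-group member with the closest unused pool member.

    A pair is kept only if the propensity-score gap is within the caliper;
    matched pool members are removed (matching without replacement).
    """
    available = dict(pool)  # copy so the caller's dict is not modified
    pairs = []
    # Iterate in sorted ID order so the result is reproducible.
    for idx_id in sorted(index_group):
        if not available:
            break
        score = index_group[idx_id]
        best = min(available, key=lambda c: abs(available[c] - score))
        if abs(available[best] - score) <= caliper:
            pairs.append((idx_id, best))
            del available[best]
    return pairs


# Hypothetical propensity scores: Black patients as the index group, White as the pool.
black = {"b1": 0.62, "b2": 0.35, "b3": 0.90}
white = {"w1": 0.60, "w2": 0.33, "w3": 0.10, "w4": 0.71}
print(greedy_match(black, white))  # b3 has no White patient within the caliper
```

Greedy matching is order-dependent; optimal (global) matching is an alternative when the pool is small relative to the index group.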

Results

We identified significant differences between Black and White individuals in age, gender, marital status, education level, smoking status, health insurance type, body mass index, and Charlson comorbidity index (p-value < 0.001). Among the matched Black and White subpopulations identified through propensity score matching, significant differences persisted only for a few covariates, at weaker significance levels: insurance type in the CHF cohort (p = 0.043), insurance type (p = 0.005) and education level (p = 0.016) in the CKD cohort, and body mass index in the dementia cohort (p = 0.041); no other covariates differed significantly. Comparing fairness evaluations of the mortality prediction models across the five cohorts before and after mitigating systematic differences, we found significant differences in the CHF cohort: p = 0.021 (F1 measure) and p = 0.001 (sensitivity) for the AdaBoost model, and p = 0.014 (F1 measure) and p = 0.003 (sensitivity) for the MLP model.
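The group-wise metrics compared above (sensitivity and F1 measure) can be computed from binary predictions as in the sketch below. The labels, predictions and group assignments are hypothetical toy data, and the statistical test the authors used to obtain p-values is not reproduced here.

```python
def group_metrics(y_true, y_pred, groups):
    """Return {group: (sensitivity, f1)} for binary labels (1 = died within 1 year)."""
    results = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        tp = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 1)
        fn = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 0)
        fp = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 1)
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * sensitivity / (precision + sensitivity)
              if precision + sensitivity else 0.0)
        results[g] = (sensitivity, f1)
    return results


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
groups = ["Black"] * 4 + ["White"] * 4
metrics = group_metrics(y_true, y_pred, groups)
# Sensitivity gap between groups: one simple fairness indicator.
gap = metrics["White"][0] - metrics["Black"][0]
print(metrics, gap)
```

Running the same function once on the full cohort and once on the propensity-matched subset gives the "before" and "after" views that the study compares.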

Discussion and conclusion

This study contributes to research on fairness assessment by focusing on the examination of systematic disparities and underscores the potential for revealing racial bias in machine learning models used in clinical settings.

Journal of Biomedical Informatics, Vol. 156, Article 104677
Citations: 0
Identify and mitigate bias in electronic phenotyping: A comprehensive study from computational perspective
IF 4 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-06-12 | DOI: 10.1016/j.jbi.2024.104671
Sirui Ding , Shenghan Zhang , Xia Hu , Na Zou

Electronic phenotyping is a fundamental task that identifies specific groups of patients, and it plays an important role in precision medicine in the era of digital health. Phenotyping provides real-world evidence for other biomedical research and clinical tasks, such as disease diagnosis, drug development, and clinical trials. With the growth of electronic health records, advanced machine learning techniques have significantly boosted the performance of electronic phenotyping. In healthcare, precision and fairness are both essential considerations. However, most related efforts focus on designing more accurate phenotyping models, and little attention has been paid to fairness. Neglecting bias in phenotyping leads to the underrepresentation of patient subgroups, which in turn affects downstream healthcare activities such as patient recruitment for clinical trials. In this work, we aim to bridge this gap through a comprehensive experimental study that identifies the bias in electronic phenotyping models and evaluates the performance of widely used debiasing methods on these models. We choose pneumonia and sepsis as the target diseases. We benchmark 9 electronic phenotyping methods, ranging from rule-based to data-driven approaches, and evaluate 5 bias mitigation strategies covering pre-processing, in-processing, and post-processing. From these extensive experiments, we summarize several insightful findings about the bias identified in phenotyping and key points for mitigating it.
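As one concrete example of a pre-processing strategy, reweighing (Kamiran and Calders) gives each (group, label) cell the weight P(group) · P(label) / P(group, label), so that group membership and label become independent in the weighted training data. This is a sketch on hypothetical data; the abstract does not name the five mitigation methods that were benchmarked, so this is not necessarily one of them.

```python
from collections import Counter


def reweigh(groups, labels):
    """Reweighing: per-sample weight w(g, y) = P(g) * P(y) / P(g, y)."""
    n = len(labels)
    count_g = Counter(groups)
    count_y = Counter(labels)
    count_gy = Counter(zip(groups, labels))
    return [
        (count_g[g] / n) * (count_y[y] / n) / (count_gy[(g, y)] / n)
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Share of (weighted) positive labels within one group."""
    num = sum(w for g, y, w in zip(groups, labels, weights) if g == group and y == 1)
    den = sum(w for g, w in zip(groups, weights) if g == group)
    return num / den


groups = ["A", "A", "A", "B", "B", "B"]
labels = [1, 1, 0, 1, 0, 0]  # unweighted positive rate: A = 2/3, B = 1/3
weights = reweigh(groups, labels)
for g in ("A", "B"):
    print(g, round(weighted_positive_rate(groups, labels, weights, g), 3))
```

The weights would then be passed to any learner that accepts per-sample weights; the phenotyping model itself is left unchanged, which is what distinguishes pre-processing from in-processing methods.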

Journal of Biomedical Informatics, Vol. 156, Article 104671
Citations: 0
Location-enhanced syntactic knowledge for biomedical relation extraction
IF 4.5 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-06-12 | DOI: 10.1016/j.jbi.2024.104676
Yan Zhang, Zhihao Yang, Yumeng Yang, Hongfei Lin, Jian Wang

Biomedical relation extraction has long been considered a challenging task due to the specialization and complexity of biomedical texts. Syntactic knowledge has been widely employed in existing research to enhance relation extraction, guiding the semantic understanding and text representation of models. However, most studies do not fully exploit syntactic knowledge and often lack fine-grained noise reduction, which leads to confusion in relation classification. In this paper, we propose an attention generator that considers both syntactic dependency type information and syntactic position information to distinguish the importance of different dependency connections. Additionally, we integrate positional information, dependency type information, and word representations to introduce location-enhanced syntactic knowledge that guides biomedical relation extraction. On three widely used English benchmark datasets in the biomedical domain, our approach consistently outperforms a range of baseline models, demonstrating that it not only makes full use of syntactic knowledge but also effectively reduces the impact of noisy words.
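The idea of weighting dependency connections by both their type and their position can be illustrated with a toy scoring function. This is a hypothetical sketch, not the authors' attention generator: it assumes a hand-set score per dependency type (`TYPE_SCORE`, invented here) and a penalty proportional to the linear distance between the two words of a connection, then normalizes the scores with a softmax. A trained model would learn type and position embeddings instead.

```python
import math

# Hypothetical per-type scores; unknown types default to 0.0.
TYPE_SCORE = {"nsubj": 1.0, "dobj": 1.0, "prep": 0.5, "det": -1.0}


def edge_attention(edges, alpha=0.2):
    """edges: list of (head_idx, dep_idx, dep_type). Returns softmax weights per edge."""
    # Score = type preference minus a distance penalty (position information).
    scores = [TYPE_SCORE.get(t, 0.0) - alpha * abs(h - d) for h, d, t in edges]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# "aspirin inhibits the COX enzyme" with token indices 0..4
edges = [(1, 0, "nsubj"), (1, 4, "dobj"), (4, 2, "det"), (4, 3, "compound")]
weights = edge_attention(edges)
print([round(w, 3) for w in weights])
```

Here the subject edge receives the most weight and the determiner edge the least, which mirrors the goal of down-weighting connections that carry little relational information.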

Journal of Biomedical Informatics, Vol. 156, Article 104676
Citations: 0
Improving biomedical Named Entity Recognition with additional external contexts
IF 4.5 | Zone 2 (Medicine) | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-06-11 | DOI: 10.1016/j.jbi.2024.104674
Bui Duc Tho , Minh-Tien Nguyen , Dung Tien Le , Lin-Lung Ying , Shumpei Inoue , Tri-Thanh Nguyen

Objective:

Biomedical Named Entity Recognition (bio NER) is the task of recognizing named entities in biomedical texts. This paper introduces a new model that addresses bio NER by considering additional external contexts. Unlike prior methods that mainly use the original input sequences for sequence labeling, the model uses additional contexts to enhance the representation of entities in the original sequences, since such contexts can provide extra information that explains the concepts behind biomedical entities.

Methods:

To exploit an additional context, given an original input sequence, the model first retrieves the relevant sentences from PubMed and then ranks the retrieved sentences to form the contexts. It next combines the context with the original input sequence to form a new enhanced sequence. The original and new enhanced sequences are fed into PubMedBERT for learning feature representation. To obtain more fine-grained features, the model stacks a BiLSTM layer on top of PubMedBERT. The final named entity label prediction is done by using a CRF layer. The model is jointly trained in an end-to-end manner to take advantage of the additional context for NER of the original sequence.
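The context-building step can be sketched as follows, assuming candidate PubMed sentences have already been retrieved. Token-overlap (Jaccard) ranking is used here as a stand-in for the paper's ranking method, and the `[SEP]` joiner, `k` and the example sentences are hypothetical; the resulting string is what would be tokenized and fed to the encoder alongside the original sequence.

```python
import re


def tokens(text):
    # Lowercased word tokens as a set.
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    # Overlap between two token sets.
    return len(a & b) / len(a | b) if a | b else 0.0


def enhance(sentence, candidates, k=1, sep=" [SEP] "):
    """Append the k candidates most similar to the input sentence."""
    q = tokens(sentence)
    ranked = sorted(candidates, key=lambda c: jaccard(q, tokens(c)), reverse=True)
    return sep.join([sentence] + ranked[:k])


sentence = "BRCA1 mutations increase breast cancer risk."
candidates = [
    "The weather in Boston was mild.",
    "BRCA1 is a tumor suppressor gene linked to breast cancer.",
]
print(enhance(sentence, candidates))
```

Ranking before concatenation matters because the abstract reports that relevant context sentences help more than irrelevant ones.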

Results:

Experimental results on six biomedical datasets show that the proposed model achieves promising performance compared to strong baselines and confirms the contribution of additional contexts for bio NER.

Conclusion:

The promising results confirm three important points. First, the additional context from PubMed helps improve the recognition of biomedical entities. Second, PubMed is more appropriate than the Google search engine for providing relevant information for bio NER. Finally, relevant context sentences are more beneficial than irrelevant ones for enhancing the original input sequences. The model can flexibly integrate any additional context type for the NER task.

Journal of Biomedical Informatics, Vol. 156, Article 104674
Citations: 0
DAS-DDI: A dual-view framework with drug association and drug structure for drug–drug interaction prediction
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-06-09 DOI: 10.1016/j.jbi.2024.104672
Dongjiang Niu, Lianwei Zhang, Beiyi Zhang, Qiang Zhang, Zhen Li

In drug development and clinical application, drug–drug interaction (DDI) prediction is crucial for patient safety and therapeutic efficacy. However, traditional DDI prediction methods often overlook the structural features of drugs and the complex interrelationships between them, which limits the accuracy and interpretability of the resulting models. In this paper, a novel dual-view DDI prediction framework, DAS-DDI, is proposed. First, a drug association network is constructed from similarity information among drugs, providing rich contextual information for DDI prediction. Next, a novel drug substructure extraction method is proposed that updates the features of nodes and chemical bonds simultaneously, making the learned features more comprehensive. Furthermore, an attention mechanism dynamically fuses multiple drug embeddings from different views, enhancing the model's discriminative ability on multi-view data. Comparative experiments on three public datasets show that DAS-DDI outperforms other state-of-the-art models in two scenarios.
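This abstract names two ingredients: a similarity-based drug association network and attention fusion of per-view embeddings. The sketch below illustrates both. It is not DAS-DDI itself. The following choices are all assumptions made for the example:

- Tanimoto similarity computed over fingerprint bit sets.
- A 0.5 edge threshold.
- Scoring each view against a single query vector.

```python
import numpy as np


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def association_network(fps: list, threshold: float = 0.5) -> np.ndarray:
    """Adjacency matrix linking drugs whose similarity meets the (assumed) threshold."""
    n = len(fps)
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if tanimoto(fps[i], fps[j]) >= threshold:
                adj[i, j] = adj[j, i] = 1
    return adj


def attention_fuse(views: np.ndarray, query: np.ndarray) -> tuple:
    """Fuse V view embeddings (shape V x d) of one drug into a single d-vector.

    Each view is scored against a query vector (learned in a real model,
    fixed here), scores are softmax-normalised, and the views are averaged
    with those weights.
    """
    scores = views @ query                      # one score per view
    weights = np.exp(scores - scores.max())     # numerically stable softmax
    weights /= weights.sum()
    return weights @ views, weights


# Hypothetical fingerprints for three drugs.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8}]
adj = association_network(fps)                  # only drugs 0 and 1 are linked
views = np.array([[1.0, 0.0], [0.0, 1.0]])      # e.g. association view, structure view
fused, w = attention_fuse(views, query=np.array([2.0, 0.0]))
```

In a trained model the query vector would be a learned parameter. Here it is fixed so that the first view receives most of the weight.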

Journal of Biomedical Informatics, vol. 156, Article 104672.
Citations: 0
Clinical domain knowledge-derived template improves post hoc AI explanations in pneumothorax classification
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-06-09 DOI: 10.1016/j.jbi.2024.104673
Han Yuan, Chuan Hong, Peng-Tao Jiang, Gangming Zhao, Nguyen Tuan Anh Tran, Xinxing Xu, Yet Yen Yan, Nan Liu

Objective

Pneumothorax is an acute thoracic disease caused by abnormal air collection between the lungs and chest wall. Recently, artificial intelligence (AI), especially deep learning (DL), has been increasingly employed for automating the diagnostic process of pneumothorax. To address the opaqueness often associated with DL models, explainable artificial intelligence (XAI) methods have been introduced to outline regions related to pneumothorax. However, these explanations sometimes diverge from actual lesion areas, highlighting the need for further improvement.

Method

We propose a template-guided approach that incorporates clinical knowledge of pneumothorax into model explanations generated by XAI methods, thereby improving the quality of those explanations. Using a single lesion delineation created by radiologists, our approach first generates a template representing the areas where pneumothorax can potentially occur. This template is then superimposed on model explanations to filter out extraneous explanations that fall outside its boundaries. To validate its efficacy, we compared three XAI methods (Saliency Map, Grad-CAM, and Integrated Gradients) with and without our template guidance when explaining two DL models (VGG-19 and ResNet-50) on two real-world datasets (SIIM-ACR and ChestX-Det).
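The template step can be sketched as follows. The sketch assumes the explanation map and the radiologist's delineation are 2-D arrays on the same pixel grid.

- The abstract does not say how the template is derived from the single delineation. The square dilation used here (hypothetical `radius` parameter) is only a stand-in for widening the delineation into a region of potential occurrence.
- The filtering itself is plain element-wise masking.

```python
import numpy as np


def make_template(delineation: np.ndarray, radius: int = 1) -> np.ndarray:
    """Hypothetical template: the binary delineation dilated by a
    (2*radius+1)-square, so nearby pixels also count as potential lesion area."""
    mask = delineation.astype(bool)
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    h, w = mask.shape
    # OR together every shifted copy of the mask within the square neighbourhood.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def apply_template(explanation: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Zero out explanation values that fall outside the template's boundaries."""
    return np.where(template, explanation, 0.0)


delineation = np.zeros((5, 5), dtype=int)
delineation[1, 1] = 1
template = make_template(delineation, radius=1)   # 3x3 block covering rows/cols 0-2
saliency = np.ones((5, 5))                        # stand-in for a Grad-CAM or saliency map
guided = apply_template(saliency, template)
```

Because the filter only removes attribution outside the template, it can be applied after any XAI method without retraining the underlying model.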

Results

The proposed approach consistently improved the baseline XAI methods across twelve benchmark scenarios built from three XAI methods, two DL models, and two datasets. When model explanations were compared with ground-truth lesion areas, the average incremental percentage (the improvement over the baseline performance, relative to that baseline) was 97.8% for Intersection over Union (IoU) and 94.1% for the Dice Similarity Coefficient (DSC). We further visualized baseline and template-guided model explanations on radiographs to illustrate the effect of our approach.
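The two overlap metrics and the incremental percentage can be written out explicitly. Let A be a binary explanation mask and B the ground-truth lesion mask:

- IoU = |A ∩ B| / |A ∪ B|
- DSC = 2|A ∩ B| / (|A| + |B|)
- Incremental percentage, read here as (guided − baseline) / baseline × 100.

The sketch assumes explanations have already been binarized. The 0.5 threshold in `binarize` is an illustrative assumption, not taken from the study.

```python
import numpy as np


def binarize(explanation: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn a continuous explanation map into a binary mask (assumed threshold)."""
    return explanation >= threshold


def iou(a: np.ndarray, b: np.ndarray) -> float:
    union = np.logical_or(a, b).sum()
    # Both masks empty: treat as perfect agreement.
    return float(np.logical_and(a, b).sum() / union) if union else 1.0


def dice(a: np.ndarray, b: np.ndarray) -> float:
    total = a.sum() + b.sum()
    return float(2 * np.logical_and(a, b).sum() / total) if total else 1.0


def incremental_pct(guided: float, baseline: float) -> float:
    """Relative improvement of the guided score over the baseline, in percent."""
    return (guided - baseline) / baseline * 100


truth = np.zeros((4, 4), dtype=bool)
truth[:2, :2] = True                      # 4 lesion pixels
baseline = np.zeros((4, 4), dtype=bool)
baseline[:2, :] = True                    # 8 pixels, 4 of them on the lesion
guided = truth.copy()                     # template removed the extraneous half

gain = incremental_pct(iou(guided, truth), iou(baseline, truth))
```

In this toy case the baseline IoU is 0.5 and the guided IoU is 1.0, a 100% incremental gain. This shows how relative improvements near 100%, like those reported above, can arise when a large share of the baseline explanation lies outside the lesion.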

Conclusions

In the context of pneumothorax diagnoses, we proposed a template-guided approach for improving model explanations. Our approach not only aligns model explanations more closely with clinical insights but also exhibits extensibility to other thoracic diseases. We anticipate that our template guidance will forge a novel approach to elucidating AI models by integrating clinical domain expertise.

Journal of Biomedical Informatics, vol. 156, Article 104673.
Citations: 0
Clinical research text summarization method based on fusion of domain knowledge
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-06-08 DOI: 10.1016/j.jbi.2024.104668
Shiwei Jiang, Qingxiao Zheng, Taiyong Li, Shuanghong Luo

Objective

The objective of this study is to integrate PICO knowledge into the clinical research text summarization process, aiming to enhance the model’s comprehension of biomedical texts while capturing crucial content from the perspective of summary readers, ultimately improving the quality of summaries.

Methods

We propose a clinical research text summarization method called DKGE-PEGASUS (Domain-Knowledge and Graph Convolutional Enhanced PEGASUS), which is based on integrating domain knowledge. The model consists of three main components: a PICO label prediction module, a text information re-mining unit based on Graph Convolutional Neural Networks (GCN), and a pre-trained summarization model. First, the PICO label prediction module identifies PICO elements in clinical research texts and produces word embeddings enriched with PICO knowledge. Then, a GCN reinforces the encoder of the pre-trained summarization model, mining the text more deeply while explicitly injecting PICO knowledge. Finally, the outputs of the PICO label prediction module, the GCN re-mining unit, and the pre-trained encoder are fused into the final encoding, which the decoder uses to generate summaries.
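The GCN re-mining unit relies on graph convolution. A common form of one layer is H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where:

- A is the adjacency matrix of a word graph.
- D is the degree matrix of A + I.
- H holds the token features.
- W is a learned weight matrix.

The sketch below applies one such layer to stand-in encoder states. The abstract does not say how DKGE-PEGASUS builds its graph or fuses the three outputs, so the following are hypothetical choices:

- The chain-shaped word graph.
- The random weights.
- The additive fusion at the end.

```python
import numpy as np


def gcn_layer(adj: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ h @ w, 0.0)


def chain_graph(n_tokens: int) -> np.ndarray:
    """Hypothetical word graph: each token linked to its immediate neighbours."""
    adj = np.zeros((n_tokens, n_tokens))
    idx = np.arange(n_tokens - 1)
    adj[idx, idx + 1] = adj[idx + 1, idx] = 1.0
    return adj


rng = np.random.default_rng(0)
n_tokens, dim = 5, 8
encoder_out = rng.normal(size=(n_tokens, dim))   # stand-in for pre-trained encoder states
w = rng.normal(size=(dim, dim)) / np.sqrt(dim)   # stand-in for learned GCN weights
gcn_out = gcn_layer(chain_graph(n_tokens), encoder_out, w)
fused = encoder_out + gcn_out                     # assumed additive fusion
```

The symmetric normalization keeps feature magnitudes comparable across tokens with different numbers of neighbours. That matters when the GCN output is combined with the encoder states.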

Results

Experiments conducted on two datasets, PubMed and CDSR, demonstrated the effectiveness of our method, which achieved ROUGE-1 scores of 42.64 and 38.57, respectively. Furthermore, when summaries of a segment of biomedical text were compared, ours were of significantly higher quality than those of the baseline model.
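ROUGE-1 measures unigram overlap between a generated summary and a reference. The sketch below computes ROUGE-1 precision, recall and F1 from clipped unigram counts.

- It uses only lowercasing and whitespace tokenization. Its numbers will therefore differ from the official ROUGE toolkit or the `rouge-score` package, which apply their own tokenization and optional stemming.
- The abstract does not state whether the reported scores are F1 or recall values.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """ROUGE-1 precision/recall/F1 from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())        # min count per shared unigram
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge_1("the drug reduced the pain", "the drug reduced pain in patients")
```

The second "the" in the candidate is not credited, because the reference contains it only once. That clipping stops a summary from inflating its score by repeating common words. Scores such as those reported above are conventionally given on a 0-100 scale.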

Conclusion

The method proposed in this paper is better equipped to identify critical elements in clinical research texts and produce a higher-quality summary.

Journal of Biomedical Informatics, vol. 156, Article 104668.
Citations: 0
Event prediction by estimating continuously the completion of a single temporal pattern’s instances
IF 4 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-06-08 DOI: 10.1016/j.jbi.2024.104665
Nevo Itzhak, Szymon Jaroszewicz, Robert Moskovitch

Objective:

Develop a new method for continuous prediction that uses a single temporal pattern ending with an event of interest, together with the multiple instances of that pattern detected in the temporal data.

Methods:

Use temporal abstraction to transform time series, instantaneous events, and time intervals into a uniform representation of symbolic time intervals (STIs). Introduce a new event prediction approach based on a single time intervals-related pattern (TIRP), which learns models that predict whether and when an event of interest will occur from multiple instances of a pattern ending with that event.
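Temporal abstraction can be illustrated with its simplest variant, state abstraction. Each sample of a numeric series is mapped to a symbol through cut-off values, and consecutive samples with the same symbol are merged into one STI (start, end, symbol). The second function returns the Allen-style relation between two STIs, the kind of pairwise relation from which TIRPs are typically composed.

The following are hypothetical choices, not the paper's method:

- The cut-offs.
- The symbol names.
- The glucose-like values.
- The seven-relation set.

The abstract does not give the paper's discretization or relation set.

```python
from typing import List, NamedTuple, Sequence


class STI(NamedTuple):
    start: int
    end: int
    symbol: str


def state_abstraction(times: Sequence[int], values: Sequence[float],
                      cutoffs: Sequence[float], symbols: Sequence[str]) -> List[STI]:
    """Discretize values with ascending cut-offs, then merge runs of equal symbols into STIs."""
    if len(symbols) != len(cutoffs) + 1:
        raise ValueError("need exactly one more symbol than cut-offs")
    intervals: List[STI] = []
    for t, v in zip(times, values):
        sym = symbols[sum(v >= c for c in cutoffs)]   # index of the value's bin
        if intervals and intervals[-1].symbol == sym:
            intervals[-1] = intervals[-1]._replace(end=t)  # extend the current run
        else:
            intervals.append(STI(t, t, sym))
    return intervals


def allen_relation(a: STI, b: STI) -> str:
    """Relation of a to b, where a precedes b in (start, end) order."""
    if (a.start, a.end) > (b.start, b.end):
        raise ValueError("expected a to precede b in (start, end) order")
    if a.start == b.start and a.end == b.end:
        return "equal"
    if a.start == b.start:
        return "starts"
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if a.end == b.end:
        return "finished-by"
    if a.end > b.end:
        return "contains"
    return "overlaps"


# Hypothetical glucose-like readings with one cut-off at 140.
stis = state_abstraction([0, 1, 2, 3, 4, 5], [90, 95, 150, 160, 100, 98],
                         cutoffs=[140], symbols=["normal", "high"])
```

Here `stis` holds three intervals: normal over 0-1, high over 2-3 and normal over 4-5. Because the samples are discrete, adjacent intervals relate as "before" rather than "meets".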

Results:

On real-life datasets, the proposed methods achieved an average AUROC improvement of 5% over LSTM-FCN, the best-performing of the evaluated baseline models (RawXGB, ResNet, LSTM-FCN, and ROCKET).

Conclusion:

The proposed methods for continuous event prediction could be used in a wide range of real-world and real-time applications across diverse domains with heterogeneous multivariate temporal data. For example, they could be used to predict panic attacks early using wearable devices, or to predict complications early in intensive care unit patients.

Journal of Biomedical Informatics, vol. 156, Article 104665.
Citations: 0