A Study of Biomedical Relation Extraction Using GPT Models.

AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science Pub Date : 2024-05-31 eCollection Date: 2024-01-01

Jeffrey Zhang, Maxwell Wibert, Huixue Zhou, Xueqing Peng, Qingyu Chen, Vipina K Keloth, Yan Hu, Rui Zhang, Hua Xu, Kalpana Raja

{"title":"A Study of Biomedical Relation Extraction Using GPT Models.","authors":"Jeffrey Zhang, Maxwell Wibert, Huixue Zhou, Xueqing Peng, Qingyu Chen, Vipina K Keloth, Yan Hu, Rui Zhang, Hua Xu, Kalpana Raja","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>Relation Extraction (RE) is a natural language processing (NLP) task for extracting semantic relations between biomedical entities. Recent developments in pre-trained large language models (LLM) motivated NLP researchers to use them for various NLP tasks. We investigated GPT-3.5-turbo and GPT-4 on extracting the relations from three standard datasets, EU-ADR, Gene Associations Database (GAD), and ChemProt. Unlike the existing approaches using datasets with masked entities, we used three versions for each dataset for our experiment: a version with masked entities, a second version with the original entities (unmasked), and a third version with abbreviations replaced with the original terms. We developed the prompts for various versions and used the chat completion model from GPT API. Our approach achieved a F1-score of 0.498 to 0.809 for GPT-3.5-turbo, and a highest F1-score of 0.84 for GPT-4. For certain experiments, the performance of GPT, BioBERT, and PubMedBERT are almost the same.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":"2024 ","pages":"391-400"},"PeriodicalIF":0.0000,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141827/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

Relation Extraction (RE) is a natural language processing (NLP) task for extracting semantic relations between biomedical entities. Recent developments in pre-trained large language models (LLM) motivated NLP researchers to use them for various NLP tasks. We investigated GPT-3.5-turbo and GPT-4 on extracting the relations from three standard datasets, EU-ADR, Gene Associations Database (GAD), and ChemProt. Unlike the existing approaches using datasets with masked entities, we used three versions for each dataset for our experiment: a version with masked entities, a second version with the original entities (unmasked), and a third version with abbreviations replaced with the original terms. We developed the prompts for various versions and used the chat completion model from GPT API. Our approach achieved a F1-score of 0.498 to 0.809 for GPT-3.5-turbo, and a highest F1-score of 0.84 for GPT-4. For certain experiments, the performance of GPT, BioBERT, and PubMedBERT are almost the same.

微信好友朋友圈 QQ好友复制链接

本刊更多论文

使用 GPT 模型提取生物医学关系的研究。

关系提取（RE）是一项自然语言处理（NLP）任务，用于提取生物医学实体之间的语义关系。预训练大型语言模型（LLM）的最新发展促使 NLP 研究人员将其用于各种 NLP 任务。我们研究了从 EU-ADR、Gene Associations Database (GAD) 和 ChemProt 这三个标准数据集中提取关系的 GPT-3.5-turbo 和 GPT-4。与使用带有屏蔽实体的数据集的现有方法不同，我们在实验中对每个数据集使用了三个版本：带有屏蔽实体的版本、带有原始实体（未屏蔽）的第二个版本以及用原始术语替换缩写的第三个版本。我们为不同版本开发了提示，并使用了 GPT API 的聊天完成模型。我们的方法在 GPT-3.5-turbo 中取得了 0.498 到 0.809 的 F1 分数，在 GPT-4 中取得了 0.84 的最高 F1 分数。在某些实验中，GPT、BioBERT 和 PubMedBERT 的性能几乎相同。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science

自引率

0.00%

发文量