Integrating K+ Entities Into Coreference Resolution on Biomedical Texts

IF 3.4 | Zone 3 (Biology) | JCR Q2, Biochemical Research Methods | IEEE/ACM Transactions on Computational Biology and Bioinformatics | Pub Date: 2024-08-21 | DOI: 10.1109/TCBB.2024.3447273

Yufei Li; Xiaoyong Ma; Xiangyu Zhou; Penghzhen Cheng; Kai He; Tieliang Gong; Chen Li

Vol. 21, No. 6, pp. 2145-2155. Full text: https://ieeexplore.ieee.org/document/10643354/


Abstract

Biomedical coreference resolution focuses on identifying coreferences in biomedical texts. It normally consists of two parts: (i) mention detection, which identifies textual representations of biological entities, and (ii) finding the coreference links between them. Recently, a popular approach to enhancing the task has been to embed knowledge bases into deep neural networks. However, the way these methods integrate knowledge has a shortcoming: the knowledge may play a larger role in mention detection than in coreference resolution. Specifically, they tend to integrate knowledge prior to mention detection, as part of the embeddings. In addition, they focus primarily on mention-dependent knowledge (KBase), i.e., knowledge entities directly related to mentions, while ignoring the correlated knowledge (K+) between the mentions in a mention pair. For mentions with significant differences in word form, this may limit the methods' ability to extract potential correlations between those mentions. This paper therefore develops a novel model that integrates both KBase and K+ entities and achieves state-of-the-art performance on the BioNLP and CRAFT-CR datasets. Empirical studies on detecting mentions of different lengths reveal the effectiveness of the KBase entities. The evaluation on cross-sentence and match/mismatch coreference further demonstrates the superiority of the K+ entities in extracting background potential correlations between mentions.
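As a rough illustration of the distinction the abstract draws between KBase and K+ entities, the following Python sketch uses a tiny hand-made knowledge base. Everything in it is an assumption made for illustration: the dictionaries `MENTION_TO_ENTITIES` and `ENTITY_NEIGHBORS`, the functions `kbase`, `k_plus`, and `pair_features`, and in particular the reading of K+ as the entities shared by the one-hop neighborhoods of the two mentions. The paper's model is a neural network and may define and integrate K+ quite differently.

```python
# Toy knowledge base; all entries are hypothetical and not taken from the paper.
# Maps a mention string to the entities it links to directly.
MENTION_TO_ENTITIES = {
    "p53": {"TP53"},
    "the tumor suppressor": {"tumor suppressor protein"},
    "IL-2": {"IL2"},
}

# Maps an entity to its related entities (one hop in the toy graph).
ENTITY_NEIGHBORS = {
    "TP53": {"tumor suppressor protein", "apoptosis"},
    "tumor suppressor protein": {"TP53", "RB1"},
    "IL2": {"cytokine", "T cell"},
}


def kbase(mention):
    """Mention-dependent knowledge: entities directly linked to one mention."""
    return set(MENTION_TO_ENTITIES.get(mention, set()))


def _reach(mention):
    """Entities of a mention plus their one-hop neighbors."""
    reached = kbase(mention)
    for entity in kbase(mention):
        reached |= ENTITY_NEIGHBORS.get(entity, set())
    return reached


def k_plus(mention_a, mention_b):
    """Assumed reading of K+: entities that connect the two mentions of a pair."""
    return _reach(mention_a) & _reach(mention_b)


def pair_features(mention_a, mention_b):
    """Two simple counts a mention-pair scorer could consume."""
    return {
        "kbase_overlap": len(kbase(mention_a) & kbase(mention_b)),
        "k_plus_size": len(k_plus(mention_a, mention_b)),
    }


if __name__ == "__main__":
    print(pair_features("p53", "the tumor suppressor"))
    print(pair_features("p53", "IL-2"))
```

In this toy setting, "p53" and "the tumor suppressor" share no surface form and no directly linked entity, yet the pair-level lookup still finds connecting entities, while the unrelated pair "p53" and "IL-2" yields none. That mirrors, in a very simplified way, the abstract's argument for pair-level knowledge when mentions differ strongly in word form.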




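Source journal

IEEE/ACM Transactions on Computational Biology and Bioinformatics (Engineering and Technology: Computer Science, Interdisciplinary Applications)

CiteScore: 7.50
Self-citation rate: 6.70%
Articles published: 479
Review time: 3 months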

Journal description: IEEE/ACM Transactions on Computational Biology and Bioinformatics emphasizes the algorithmic, mathematical, statistical and computational methods that are central in bioinformatics and computational biology; the development and testing of effective computer programs in bioinformatics; the development of biological databases; important biological results that are obtained from the use of these methods, programs and databases; and the emerging field of Systems Biology, where many forms of data are used to create a computer-based model of a complex biological system.