用于文档级关系提取的序列到序列方法

Workshop on Biomedical Natural Language Processing Pub Date : 2022-04-03 DOI:10.48550/arXiv.2204.01098

John Giorgi, Gary D Bader, Bo Wang

{"title":"用于文档级关系提取的序列到序列方法","authors":"John Giorgi, Gary D Bader, Bo Wang","doi":"10.48550/arXiv.2204.01098","DOIUrl":null,"url":null,"abstract":"Motivated by the fact that many relations cross the sentence boundary, there has been increasing interest in document-level relation extraction (DocRE). DocRE requires integrating information within and across sentences, capturing complex interactions between mentions of entities. Most existing methods are pipeline-based, requiring entities as input. However, jointly learning to extract entities and relations can improve performance and be more efficient due to shared parameters and training steps. In this paper, we develop a sequence-to-sequence approach, seq2rel, that can learn the subtasks of DocRE (entity extraction, coreference resolution and relation extraction) end-to-end, replacing a pipeline of task-specific components. Using a simple strategy we call entity hinting, we compare our approach to existing pipeline-based methods on several popular biomedical datasets, in some cases exceeding their performance. We also report the first end-to-end results on these datasets for future comparison. Finally, we demonstrate that, under our model, an end-to-end approach outperforms a pipeline-based approach. Our code, data and trained models are available at https://github.com/johngiorgi/seq2rel. An online demo is available at https://share.streamlit.io/johngiorgi/seq2rel/main/demo.py.","PeriodicalId":200974,"journal":{"name":"Workshop on Biomedical Natural Language Processing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"A sequence-to-sequence approach for document-level relation extraction\",\"authors\":\"John Giorgi, Gary D Bader, Bo Wang\",\"doi\":\"10.48550/arXiv.2204.01098\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Motivated by the fact that many relations cross the sentence boundary, there has been increasing interest in document-level relation extraction (DocRE). DocRE requires integrating information within and across sentences, capturing complex interactions between mentions of entities. Most existing methods are pipeline-based, requiring entities as input. However, jointly learning to extract entities and relations can improve performance and be more efficient due to shared parameters and training steps. In this paper, we develop a sequence-to-sequence approach, seq2rel, that can learn the subtasks of DocRE (entity extraction, coreference resolution and relation extraction) end-to-end, replacing a pipeline of task-specific components. Using a simple strategy we call entity hinting, we compare our approach to existing pipeline-based methods on several popular biomedical datasets, in some cases exceeding their performance. We also report the first end-to-end results on these datasets for future comparison. Finally, we demonstrate that, under our model, an end-to-end approach outperforms a pipeline-based approach. Our code, data and trained models are available at https://github.com/johngiorgi/seq2rel. An online demo is available at https://share.streamlit.io/johngiorgi/seq2rel/main/demo.py.\",\"PeriodicalId\":200974,\"journal\":{\"name\":\"Workshop on Biomedical Natural Language Processing\",\"volume\":\"7 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-04-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Workshop on Biomedical Natural Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2204.01098\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Biomedical Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2204.01098","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 20

摘要

由于许多关系跨越句子边界，人们对文档级关系提取(DocRE)越来越感兴趣。DocRE需要整合句子内部和句子之间的信息，捕捉实体提及之间的复杂交互。大多数现有的方法都是基于管道的，需要实体作为输入。然而，由于共享参数和训练步骤，联合学习提取实体和关系可以提高性能和效率。在本文中，我们开发了一种序列到序列的方法，seq2rel，它可以端到端学习DocRE的子任务(实体提取、共同引用解析和关系提取)，取代了任务特定组件的管道。我们使用一种称为实体暗示的简单策略，将我们的方法与几种流行的生物医学数据集上现有的基于流水线的方法进行比较，在某些情况下超过了它们的性能。我们还报告了这些数据集的第一个端到端结果，以便将来进行比较。最后，我们证明，在我们的模型下，端到端方法优于基于管道的方法。我们的代码、数据和经过训练的模型可在https://github.com/johngiorgi/seq2rel上获得。在线演示可在https://share.streamlit.io/johngiorgi/seq2rel/main/demo.py获得。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A sequence-to-sequence approach for document-level relation extraction

Motivated by the fact that many relations cross the sentence boundary, there has been increasing interest in document-level relation extraction (DocRE). DocRE requires integrating information within and across sentences, capturing complex interactions between mentions of entities. Most existing methods are pipeline-based, requiring entities as input. However, jointly learning to extract entities and relations can improve performance and be more efficient due to shared parameters and training steps. In this paper, we develop a sequence-to-sequence approach, seq2rel, that can learn the subtasks of DocRE (entity extraction, coreference resolution and relation extraction) end-to-end, replacing a pipeline of task-specific components. Using a simple strategy we call entity hinting, we compare our approach to existing pipeline-based methods on several popular biomedical datasets, in some cases exceeding their performance. We also report the first end-to-end results on these datasets for future comparison. Finally, we demonstrate that, under our model, an end-to-end approach outperforms a pipeline-based approach. Our code, data and trained models are available at https://github.com/johngiorgi/seq2rel. An online demo is available at https://share.streamlit.io/johngiorgi/seq2rel/main/demo.py.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Workshop on Biomedical Natural Language Processing

自引率

0.00%

发文量