Document-Level Relation Extraction with Progressive Self-Distillation
Authors: Quan Wang, Zhendong Mao, Jie Gao, Yongdong Zhang
Journal: ACM Transactions on Information Systems (Journal Article), published 2024-04-08
Journal metrics (as listed): impact factor 5.4; JCR Q1, Computer Science, Information Systems
DOI: 10.1145/3656168 (https://doi.org/10.1145/3656168)
Citations: 0
Abstract
Document-level relation extraction (RE) aims to simultaneously predict relations (including no-relation cases, denoted as NA) between all entity pairs in a document. It is typically formulated as a relation classification task over pre-detected entities and trained with hard labels, a regime that neglects the divergence of the NA class and the correlations among the other classes. This article introduces progressive self-distillation (PSD), a new training regime that employs online self-knowledge distillation (KD) to produce and incorporate soft labels for document-level RE. The key idea of PSD is to gradually soften hard labels using the RE model's own past predictions, which are adjusted adaptively as training proceeds. As a result, PSD learns only one RE model within a single training pass and requires no extra computation or annotation to pretrain a separate high-capacity teacher. PSD is conceptually simple, easy to implement, and applicable to a wide range of RE models to further improve their performance, without introducing additional parameters or significantly increasing training overhead. It is also a general framework that can be flexibly extended to distill various types of knowledge, not only soft labels. Extensive experiments on four benchmark datasets verify the effectiveness and generality of the proposed approach. The code is available at https://github.com/GaoJieCN/psd.
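The abstract gives only the key idea: soften each hard label with the model's own earlier prediction, and lean on that prediction more as training proceeds. The following is a minimal sketch of that generic idea, not the authors' implementation (see the linked repository for that). It rests on assumptions not stated in the abstract: each entity pair is treated as single-label classification over K classes with index 0 = NA; the "past prediction" is the pair's softmax output from the previous time it was seen (the previous epoch, if each pair is visited once per epoch); the mixing weight grows linearly up to a hypothetical `alpha_max = 0.8`; and the names `alpha_at` and `ProgressiveSoftLabeler` are illustrative only.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over the K relation classes (index 0 = NA, by assumption)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def alpha_at(epoch: int, total_epochs: int, alpha_max: float = 0.8) -> float:
    """Hypothetical linear schedule: weight on past predictions grows from 0 to alpha_max."""
    return alpha_max * epoch / total_epochs


class ProgressiveSoftLabeler:
    """Stores each entity pair's latest prediction and mixes it into that pair's hard label."""

    def __init__(self, total_epochs: int, alpha_max: float = 0.8):
        self.total_epochs = total_epochs
        self.alpha_max = alpha_max
        self.past: Dict[str, List[float]] = {}  # pair id -> last predicted distribution

    def target(self, pair_id: str, hard: List[float], epoch: int) -> List[float]:
        """Soft target = (1 - alpha) * hard label + alpha * past prediction."""
        past = self.past.get(pair_id)
        if past is None:  # no earlier prediction yet: fall back to the hard label
            return list(hard)
        a = alpha_at(epoch, self.total_epochs, self.alpha_max)
        return [(1 - a) * h + a * p for h, p in zip(hard, past)]

    def loss(self, pair_id: str, hard: List[float], logits: List[float], epoch: int) -> float:
        """Cross-entropy against the softened target, then record this prediction for later."""
        target = self.target(pair_id, hard, epoch)  # built from the *previous* prediction
        probs = softmax(logits)
        self.past[pair_id] = probs  # in a real framework this copy would be detached from the graph
        return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


labeler = ProgressiveSoftLabeler(total_epochs=10)
hard = [0.0, 1.0, 0.0]  # hypothetical pair holding relation 1 (not NA)
labeler.loss("doc1:e1-e2", hard, [0.2, 1.5, 0.3], epoch=0)  # epoch 0: plain hard-label loss
print(labeler.target("doc1:e1-e2", hard, epoch=5))  # now spreads some mass to NA and class 2
```

Because the teacher signal is just the same model's stored predictions, no second network is trained, which matches the abstract's claim of a single model and a single training pass. A schedule that starts at zero keeps early, unreliable predictions from dominating the targets.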
Journal description:
The ACM Transactions on Information Systems (TOIS) publishes papers on information retrieval (such as search engines and recommender systems) that contain:
new principled information retrieval models or algorithms with sound empirical validation;
observational, experimental and/or theoretical studies yielding new insights into information retrieval or information seeking;
accounts of applications of existing information retrieval techniques that shed light on the strengths and weaknesses of the techniques;
formalization of new information retrieval or information seeking tasks and of methods for evaluating the performance on those tasks;
development of content (text, image, speech, video, etc.) analysis methods to support information retrieval and information seeking;
development of computational models of user information preferences and interaction behaviors;
creation and analysis of evaluation methodologies for information retrieval and information seeking; or
surveys of existing work that propose a significant synthesis.
The information retrieval scope of ACM Transactions on Information Systems (TOIS) appeals to industry practitioners for its wealth of creative ideas, and to academic researchers for its descriptions of their colleagues' work.