An adaptive confidence-based data revision framework for Document-level Relation Extraction

Information Processing & Management (IF 7.4; Zone 1, Management; JCR Q1, Computer Science, Information Systems) Pub Date: 2024-09-26 DOI: 10.1016/j.ipm.2024.103909
Chao Jiang, Jinzhi Liao, Xiang Zhao, Daojian Zeng, Jianhua Dai
Volume 62, Issue 1, Article 103909. Source: https://www.sciencedirect.com/science/article/pii/S0306457324002681
Citations: 0

Abstract

Noisy annotations have become a key issue limiting Document-level Relation Extraction (DocRE). Previous research explored the problem through manual re-annotation. However, the handcrafted strategy is of low efficiency, incurs high human costs and cannot be generalized to large-scale datasets. To address the problem, we construct a confidence-based Revision framework for DocRE (ReD), aiming to achieve high-quality automatic data revision. Specifically, we first introduce a denoising training module to recognize relational facts and prevent noisy annotations. Second, a confidence-based data revision module is equipped to perform adaptive data revision for long-tail distributed relational facts. After the data revision, we design an iterative training module to create a virtuous cycle, which transforms the revised data into useful training data to support further revision. By capitalizing on ReD, we propose ReD-DocRED, which consists of 101,873 revised annotated documents from DocRED. ReD-DocRED has introduced 57.1% new relational facts, and concurrently, models trained on ReD-DocRED have achieved significant improvements in F1 scores, ranging from 6.35 to 16.55. The experimental results demonstrate that ReD can achieve high-quality data revision and, to some extent, replace manual labeling.
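The abstract does not spell out how the confidence-based revision module decides which facts to add, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes relational facts are `(head, tail, relation)` triples, that a trained model supplies a confidence score per candidate fact, and that rarer (long-tail) relations get a lower acceptance threshold. The function names, the linear threshold rule, and the `base` and `floor` values are all hypothetical.

```python
from collections import Counter


def adaptive_thresholds(labels, base=0.9, floor=0.6):
    """Hypothetical rule: scale each relation's threshold by its frequency.

    The most frequent relation gets `base`; rarer relations move toward `floor`.
    """
    counts = Counter(rel for _, _, rel in labels)
    most_common = max(counts.values())
    return {
        rel: floor + (base - floor) * (n / most_common)
        for rel, n in counts.items()
    }


def revise(labels, predictions, thresholds, default=0.9):
    """Add predicted facts whose confidence clears the per-relation threshold."""
    revised = set(labels)
    for (head, tail, rel), confidence in predictions.items():
        if confidence >= thresholds.get(rel, default):
            revised.add((head, tail, rel))
    return revised


# Toy data (invented for illustration): "country" is frequent, "developer" is rare.
labels = [
    ("A", "B", "country"),
    ("C", "D", "country"),
    ("E", "F", "country"),
    ("G", "H", "developer"),
]
predictions = {
    ("I", "J", "country"): 0.85,    # below the stricter threshold for a frequent relation
    ("K", "L", "developer"): 0.75,  # clears the relaxed threshold for a rare relation
    ("M", "N", "developer"): 0.50,  # too uncertain under either threshold
}

thresholds = adaptive_thresholds(labels)
revised = revise(labels, predictions, thresholds)
```

In an iterative setup like the one the abstract describes, the revised set would be fed back as training data for the next round; that loop is omitted here because it depends on the model being trained.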
Source journal
Information Processing & Management (Engineering & Technology - Computer Science: Information Systems)
CiteScore: 17.00
Self-citation rate: 11.60%
Articles published: 276
Review time: 39 days
About the journal: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.
Latest articles from this journal
Unsupervised Adaptive Hypergraph Correlation Hashing for multimedia retrieval
Enhancing robustness in implicit feedback recommender systems with subgraph contrastive learning
Domain disentanglement and fusion based on hyperbolic neural networks for zero-shot sketch-based image retrieval
Patients' cognitive and behavioral paradoxes in the process of adopting conflicting health information: A dynamic perspective
Study of technology communities and dominant technology lock-in in the Internet of Things domain - Based on social network analysis of patent network