Utber: Utilizing Fine-Grained Entity Types to Relation Extraction with Distant Supervision

Chengmin Wu, Lei Chen
Published in: 2020 IEEE International Conference on Smart Data Services (SMDS). Publication date: 2020-10-01. DOI: 10.1109/SMDS49396.2020.00015 (https://doi.org/10.1109/SMDS49396.2020.00015)
Citations: 1

Abstract

Recently, much effort has been devoted to relation extraction during the construction of large ontological knowledge bases (KBs). However, most traditional relation extraction systems rely on human-annotated training data, which is expensive to produce. Distant supervision has therefore been proposed to help create large amounts of labeled data. In this method, an existing KB is heuristically aligned to text, and the aligned data are treated as training data. Nevertheless, noise in the training data may cause two serious problems. First, the heuristic label alignment may fail, causing the wrong-label problem. Second, existing statistical models are applied to ad hoc features and hence perform poorly because of the dynamic features of noisy data. To address these two problems, we propose a novel framework for automatic relation extraction from unstructured text corpora. Specifically, to solve the first problem, we propose a fine-grained entity typing technique that filters wrong data by choosing positive entity type pairs and conducts joint instance-type selection over a bag of instances. To solve the second problem, instead of directly defining manually crafted features, we propose a deep neural architecture with an attention mechanism that automatically learns positive and negative instance features. Extensive experiments on real-world datasets demonstrate that our method outperforms competitive state-of-the-art techniques in terms of effectiveness.
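
The ideas named in the abstract (heuristic KB-to-text alignment, filtering by positive entity type pairs, and attention over a bag of instances) can be illustrated with a small sketch. This is not the authors' implementation. The toy KB, the corpus, the entity names, the type labels, the `POSITIVE_TYPE_PAIRS` and `MENTION_TYPES` tables, and the instance scores are all hypothetical. Substring matching stands in for real entity linking, and a plain softmax over fixed scores stands in for the learned attention network.

```python
import math
from collections import defaultdict

# Hypothetical toy KB of (head, relation, tail) triples.
KB = [
    ("Dana Reyes", "born_in", "Porto"),
    ("Falcon", "founded_by", "Dana Reyes"),
]

# Hypothetical toy corpus; the second and fourth sentences are noisy matches.
CORPUS = [
    "Dana Reyes was born in Porto.",
    "Dana Reyes gave a talk in Porto.",
    "Dana Reyes founded Falcon in 2010.",
    "Dana Reyes photographed a Falcon near the coast.",
]

# Assumed output of a fine-grained entity typer: sentence -> (head type, tail type).
MENTION_TYPES = {
    CORPUS[0]: ("/person", "/location/city"),
    CORPUS[1]: ("/person", "/location/city"),
    CORPUS[2]: ("/organization/company", "/person"),
    CORPUS[3]: ("/animal/bird", "/person"),
}

# Assumed table of type pairs considered positive for each relation.
POSITIVE_TYPE_PAIRS = {
    "born_in": {("/person", "/location/city")},
    "founded_by": {("/organization/company", "/person")},
}


def align(kb, corpus):
    """Distant supervision heuristic: every sentence that mentions both
    entities of a KB triple is labeled with that triple's relation."""
    bags = defaultdict(list)
    for head, relation, tail in kb:
        for sentence in corpus:
            if head in sentence and tail in sentence:
                bags[(head, relation, tail)].append(sentence)
    return dict(bags)


def filter_bag(relation, sentences, mention_types, positive_pairs):
    """Keep only instances whose (head type, tail type) pair is positive
    for the bag's relation."""
    allowed = positive_pairs.get(relation, set())
    return [s for s in sentences if mention_types.get(s) in allowed]


def attention_weights(scores):
    """Numerically stable softmax over per-instance scores in one bag."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    for (head, relation, tail), sentences in align(KB, CORPUS).items():
        kept = filter_bag(relation, sentences, MENTION_TYPES, POSITIVE_TYPE_PAIRS)
        print(relation, "kept", len(kept), "of", len(sentences))
```

In this toy setup the type filter removes the bird sentence from the `founded_by` bag, but both `born_in` sentences survive because their type pairs are identical. That residual noise is the kind of case where weighting instances within a bag, here approximated by `attention_weights`, would be needed to down-weight the sentence that does not express the relation.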