Semi-Open Information Extraction

Yu Bowen, Zhenyu Zhang, Jiawei Sheng, Tingwen Liu, Yubin Wang, Yu-Chih Wang, Bin Wang
Proceedings of the Web Conference 2021, published 2021-04-19. DOI: 10.1145/3442381.3450029 (https://doi.org/10.1145/3442381.3450029). Citation count: 18.

Abstract

Open Information Extraction (OIE), the task of discovering all textual facts of the form (subject, predicate, object) within a sentence, has gained much attention recently. However, in some knowledge-driven applications such as question answering, we often have a target entity and want to obtain its structured factual knowledge for better understanding, instead of aimlessly extracting all possible facts from the corpus. In this paper, we define a new task, Semi-Open Information Extraction (SOIE), to address this need. The goal of SOIE is to discover domain-independent facts about a particular entity from general and diverse web text. To facilitate research on this new task, we propose a large-scale human-annotated benchmark called SOIED, consisting of 61,984 facts for 8,013 subject entities annotated on 24,000 Chinese sentences collected from a web search engine. In addition, we propose a novel unified model called USE for this task. First, we feed a subject-guided sequence as input to a pre-trained language model and normalize the hidden representations conditioned on the subject embedding, so that the sentence is encoded in a subject-aware manner. Second, we decompose SOIE into three uncoupled subtasks: predicate extraction, object extraction, and boundary alignment. Each can be formulated as a table-filling problem by forming a two-dimensional tag table based on a task-specific tagging scheme. Third, we introduce a collaborative learning strategy that lets the model better exploit the interactions among subtasks by explicitly exchanging informative clues. Finally, we evaluate USE and several strong baselines on our new dataset. Experimental results demonstrate the advantages of the proposed method and reveal insights for future improvement.
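The abstract casts predicate extraction, object extraction, and boundary alignment as table filling over a two-dimensional tag table, but it does not give the tagging scheme. The sketch below is a minimal illustration under an assumed, hypothetical scheme: a span table marks a span with a 1 at cell (start, end), and an alignment table links a predicate to an object with a 1 at cell (predicate start, object start). The function names (`fill_span_table`, `decode_spans`, `fill_alignment_table`, `decode_facts`) and the English toy sentence are illustrative only. USE predicts these cells with a subject-aware encoder on Chinese text, whereas here the tables are filled from gold spans so that only the encoding and decoding steps are shown.

```python
from typing import List, Tuple

Span = Tuple[int, int]   # inclusive (start, end) token indices
Table = List[List[int]]  # n x n grid of tags; 0 means "no tag"


def empty_table(n: int) -> Table:
    """Create an n x n table of zeros, one cell per token pair (i, j)."""
    return [[0] * n for _ in range(n)]


def fill_span_table(n: int, spans: List[Span]) -> Table:
    """Hypothetical scheme: tag 1 at cell (start, end) marks one span.

    Used for both predicate and object extraction; only cells with
    start <= end (the upper triangle) are meaningful.
    """
    table = empty_table(n)
    for start, end in spans:
        table[start][end] = 1
    return table


def decode_spans(table: Table) -> List[Span]:
    """Read every tagged upper-triangle cell back as a (start, end) span."""
    n = len(table)
    return [(i, j) for i in range(n) for j in range(i, n) if table[i][j] == 1]


def fill_alignment_table(n: int, pairs: List[Tuple[Span, Span]]) -> Table:
    """Hypothetical boundary-alignment scheme: tag 1 at (pred_start, obj_start).

    The full n x n grid is used, so an object may precede its predicate.
    """
    table = empty_table(n)
    for (p_start, _), (o_start, _) in pairs:
        table[p_start][o_start] = 1
    return table


def decode_facts(subject: str, tokens: List[str], pred_table: Table,
                 obj_table: Table, align_table: Table) -> List[Tuple[str, str, str]]:
    """Combine three independently filled tables into (subject, predicate, object) facts."""
    facts = []
    for p_start, p_end in decode_spans(pred_table):
        for o_start, o_end in decode_spans(obj_table):
            # Keep a predicate/object pair only if the alignment table links their start tokens.
            if align_table[p_start][o_start] == 1:
                predicate = " ".join(tokens[p_start:p_end + 1])
                obj = " ".join(tokens[o_start:o_end + 1])
                facts.append((subject, predicate, obj))
    return facts


# Toy example with gold tables (a trained model would predict each cell instead).
tokens = ["Marie", "Curie", "was", "born", "in", "Warsaw",
          "and", "won", "the", "Nobel", "Prize"]
n = len(tokens)
predicates = [(3, 4), (7, 7)]   # "born in", "won"
objects = [(5, 5), (8, 10)]     # "Warsaw", "the Nobel Prize"
facts = decode_facts(
    "Marie Curie", tokens,
    fill_span_table(n, predicates),
    fill_span_table(n, objects),
    fill_alignment_table(n, list(zip(predicates, objects))),
)
print(facts)
# → [('Marie Curie', 'born in', 'Warsaw'), ('Marie Curie', 'won', 'the Nobel Prize')]
```

Because each table is filled independently and the three are combined only at decoding time, the subtasks stay uncoupled in the sense described in the abstract. Aligning on start tokens alone is a simplification: two predicates sharing a start token would be indistinguishable, which a real tagging scheme would need to resolve.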