A Hybrid Method for Open Information Extraction Based on Shallow and Deep Linguistic Analysis

Vahideh Reshadat, Maryam Hoorali, Heshaam Faili
Interdisciplinary Information Sciences, vol. 22, no. 1, pp. 87-100, 2016. DOI: 10.4036/IIS.2016.R.03 (https://doi.org/10.4036/IIS.2016.R.03). Citations: 12.

Abstract

Open Information Extraction (Open IE) is a relation-independent extraction paradigm that extracts assertions from massive, heterogeneous corpora such as the Web. Light relation extractors prioritize efficiency by restricting analysis to shallow linguistic tools such as part-of-speech tagging. Although these methods are fast and scalable, they rely only on shallow syntactic features and therefore cannot handle complex sentences, such as those containing complicated or long-distance relations. This paper presents two novel hybrid methods, TextRunner-DepOE (TR-DOE) and ReVerb-DepOE (RV-DOE), which combine a high-performance subset of shallow Open IE systems with the strengths of a deep Open IE system. We find the best trade-off between precision and recall by tuning two combination parameters: sentence length and confidence measure. Because the focus is on time efficiency, we use a fast and robust deep extractor. Experiments indicate that the proposed hybrid methods achieve significantly higher performance than their constituent systems. The best result was obtained by TR-DOE, whose F-measure was almost twice that of TextRunner.
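The abstract names the two combination parameters but does not spell out the combination rule, so the following is only a minimal sketch of one plausible reading: keep the shallow extractor's output for short sentences when its confidence is high enough, and fall back to the deep extractor otherwise. The names `Extraction`, `hybrid_extract`, `max_len`, and `min_conf`, the whitespace tokenization, and the fallback rule itself are assumptions made for illustration, not the authors' implementation. The `f_measure` helper is the standard balanced F-measure over sets of triples, which could serve as the objective when tuning the two thresholds.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (argument 1, relation phrase, argument 2)


@dataclass
class Extraction:
    """A shallow-extractor output with its confidence (hypothetical type)."""
    triple: Triple
    confidence: float  # assumed to lie in [0, 1]


def hybrid_extract(
    sentence: str,
    shallow: Callable[[str], List[Extraction]],
    deep: Callable[[str], List[Triple]],
    max_len: int,
    min_conf: float,
) -> List[Triple]:
    """Route a sentence to the shallow or the deep extractor.

    Assumed rule: trust the shallow extractor only for sentences of at
    most `max_len` whitespace tokens, and only for extractions scoring
    at least `min_conf`; otherwise use the deep extractor.
    """
    if len(sentence.split()) <= max_len:
        kept = [e.triple for e in shallow(sentence) if e.confidence >= min_conf]
        if kept:
            return kept
    return deep(sentence)


def f_measure(predicted: List[Triple], gold: List[Triple]) -> float:
    """Balanced F-measure (harmonic mean of precision and recall)."""
    pred, ref = set(predicted), set(gold)
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Under this reading, tuning amounts to evaluating `f_measure` on a labeled development set for a grid of `(max_len, min_conf)` pairs and keeping the pair with the best score. Lower `max_len` or higher `min_conf` sends more sentences to the slower deep extractor.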