Scalable Knowledge Graph Construction from Text Collections

R. Clancy, I. Ilyas, Jimmy J. Lin
Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER)
DOI: 10.18653/v1/D19-6607 (https://doi.org/10.18653/v1/D19-6607)
Citations: 12

Abstract

We present a scalable, open-source platform that “distills” a potentially large text collection into a knowledge graph. Our platform takes documents stored in Apache Solr and scales out the Stanford CoreNLP toolkit via Apache Spark integration to extract mentions and relations that are then ingested into the Neo4j graph database. The raw knowledge graph is then enriched with facts extracted from an external knowledge graph. The complete product can be manipulated by various applications using Neo4j’s native Cypher query language. We present a subgraph-matching approach to align extracted relations with external facts and show that fact verification, locating textual support for asserted facts, detecting inconsistent and missing facts, and extracting distantly supervised training data can all be performed within the same framework.
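
The alignment idea in the abstract can be illustrated with a small in-memory sketch. This is not the authors' implementation: the platform described above expresses the matching as Cypher queries over Neo4j, whereas the sketch below uses plain Python sets. The `Triple` type, the `align` function, the relation names, and the toy data are all hypothetical, and treating every relation as single-valued when flagging inconsistencies is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A (subject, relation, object) edge; hypothetical stand-in for a graph edge."""
    subject: str
    relation: str
    obj: str


def align(extracted, external):
    """Classify each text-extracted triple against external facts.

    supported:    the same triple exists in the external graph
    inconsistent: the external graph has the same subject and relation
                  but only different objects (assumes single-valued relations)
    missing:      the external graph says nothing about this subject/relation
    """
    external_set = set(external)
    # Index external facts by (subject, relation) to detect conflicts.
    known_keys = {(t.subject, t.relation) for t in external}

    result = {"supported": [], "inconsistent": [], "missing": []}
    for t in extracted:
        if t in external_set:
            result["supported"].append(t)
        elif (t.subject, t.relation) in known_keys:
            result["inconsistent"].append(t)
        else:
            result["missing"].append(t)
    return result


# Toy data for illustration only.
external_facts = [
    Triple("Marie Curie", "born_in", "Warsaw"),
]
extracted_relations = [
    Triple("Marie Curie", "born_in", "Warsaw"),       # matches an external fact
    Triple("Marie Curie", "born_in", "Paris"),        # conflicts with it
    Triple("Marie Curie", "spouse", "Pierre Curie"),  # absent from external facts
]

alignment = align(extracted_relations, external_facts)
for label, triples in alignment.items():
    print(label, [(t.subject, t.relation, t.obj) for t in triples])
```

Under these assumptions, the "supported" bucket corresponds to fact verification and to candidate distantly supervised training examples, while the other two buckets correspond to detecting inconsistent and missing facts.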