Scalable Knowledge Graph Construction from Text Collections

Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER), 2019. DOI: 10.18653/v1/D19-6607

R. Clancy, I. Ilyas, Jimmy J. Lin


Citations: 12

Abstract



We present a scalable, open-source platform that “distills” a potentially large text collection into a knowledge graph. Our platform takes documents stored in Apache Solr and scales out the Stanford CoreNLP toolkit via Apache Spark integration to extract mentions and relations that are then ingested into the Neo4j graph database. The raw knowledge graph is then enriched with facts extracted from an external knowledge graph. The complete product can be manipulated by various applications using Neo4j’s native Cypher query language: We present a subgraph-matching approach to align extracted relations with external facts and show that fact verification, locating textual support for asserted facts, detecting inconsistent and missing facts, and extracting distantly-supervised training data can all be performed within the same framework.
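The alignment idea in the abstract can be pictured with a toy example. The sketch below is not the authors' implementation: it uses plain Python sets in place of Neo4j and Cypher, and every name in it (`align`, the category labels, the sample triples and document ids) is hypothetical. It only illustrates how comparing extracted relations against external facts can separate supported, inconsistent, and missing facts in one pass.

```python
from collections import defaultdict

# Toy extracted relations: (subject, relation, object, source document id).
# All values are made up for illustration.
extracted = [
    ("Alice", "works_at", "Acme", "doc1"),
    ("Alice", "works_at", "Globex", "doc2"),
    ("Bob", "works_at", "Acme", "doc3"),
]

# Toy facts from an external knowledge graph: (subject, relation, object).
external = {
    ("Alice", "works_at", "Acme"),
    ("Carol", "works_at", "Initech"),
}


def align(extracted, external):
    """Compare extracted relations with external facts, triple by triple."""
    # Index external objects by (subject, relation) for quick lookup.
    known_objects = defaultdict(set)
    for subj, rel, obj in external:
        known_objects[(subj, rel)].add(obj)

    report = {
        "supported": [],              # extracted and confirmed externally
        "inconsistent": [],           # external graph gives a different object
        "missing_from_external": [],  # external graph has no such (subject, relation)
    }
    seen = set()
    for subj, rel, obj, doc in extracted:
        seen.add((subj, rel, obj))
        if (subj, rel) not in known_objects:
            report["missing_from_external"].append((subj, rel, obj, doc))
        elif obj in known_objects[(subj, rel)]:
            report["supported"].append((subj, rel, obj, doc))
        else:
            report["inconsistent"].append((subj, rel, obj, doc))

    # External facts for which no extracted relation offers textual support.
    report["lacking_text_support"] = sorted(external - seen)
    return report


report = align(extracted, external)
for category, items in report.items():
    print(category, items)
```

In this sketch, the document ids attached to the `supported` entries play the role of textual support for an asserted fact, and such sentence-level matches are the kind of pairs one might reuse as distantly-supervised training data.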

