Corpus processing service: A Knowledge Graph platform to perform deep data exploration on corpora

Applied AI Letters | Pub Date: 2020-12-16 | DOI: 10.1002/ail2.20
Peter W. J. Staar, Michele Dolfi, Christoph Auer
Citations: 6

Abstract

Knowledge Graphs have rapidly emerged as the de facto standard to model and explore knowledge in weakly structured data. Large corpora of documents constitute a source of weakly structured data of particular interest for both the academic and business world. Key examples include scientific publications, technical reports, manuals, patents, and regulations. Such corpora embed many facts that are essential to critical decision making or to enabling new discoveries. In this paper, we present a scalable cloud platform to create and serve Knowledge Graphs, which we call the Corpus Processing Service (CPS). Its purpose is to process large document corpora, extract the content and embedded facts, and ultimately represent these in a consistent knowledge graph that can be intuitively queried. To accomplish this, we use state-of-the-art natural language understanding models to extract entities and relationships from documents converted with our previously presented Corpus Conversion Service platform. This pipeline is complemented by a newly developed graph engine, which ensures extremely performant graph queries and provides powerful graph analytics capabilities. Both components are tightly integrated and can be easily consumed through REST APIs. Additionally, we provide user interfaces to control the data ingestion flow and to formulate queries using a visual programming approach. The CPS platform is designed as a modular microservice system operating on Kubernetes clusters. Finally, we validate the quality of queries on our end-to-end knowledge pipeline in a real-world application in the oil and gas industry.
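
The idea of representing extracted entities and relationships as a queryable graph can be illustrated with a small sketch. This is not the CPS graph engine or its REST API, neither of which is described in the abstract. It is a toy in-memory graph, assuming that extraction yields (subject, relation, object) triples. The class name `KnowledgeGraph`, its methods, and the sample triples are all hypothetical.

```python
from collections import defaultdict, deque


class KnowledgeGraph:
    """Toy in-memory graph built from (subject, relation, object) triples.

    Hypothetical illustration only; not the CPS data model.
    """

    def __init__(self):
        # node -> list of (relation, neighbouring node)
        self._edges = defaultdict(list)

    def add_triple(self, subject, relation, obj):
        self._edges[subject].append((relation, obj))
        # Store the reverse edge so traversals can run in both directions.
        self._edges[obj].append(("inverse:" + relation, subject))

    def neighbors(self, node, relation=None):
        """Neighbours of `node`, optionally restricted to one relation type."""
        return [n for r, n in self._edges.get(node, [])
                if relation is None or r == relation]

    def within_hops(self, start, max_hops):
        """Breadth-first search: {node: hop distance} up to `max_hops`."""
        seen = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if seen[node] == max_hops:
                continue  # do not expand beyond the hop limit
            for _, nxt in self._edges.get(node, []):
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        return seen


# Made-up triples, standing in for facts extracted from a document corpus.
kg = KnowledgeGraph()
kg.add_triple("report-17", "mentions", "Basin A")
kg.add_triple("Basin A", "contains", "Formation X")
kg.add_triple("Formation X", "has_age", "Jurassic")

print(kg.neighbors("Basin A", "contains"))
print(kg.within_hops("report-17", 2))
```

A typed neighbour lookup answers direct questions ("what does Basin A contain?"), while the bounded breadth-first search gives a simple form of multi-hop exploration starting from a document node. A production graph engine would add indexing, persistence, and analytics that this sketch leaves out.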
