RDFAdaptor: Efficient ETL Plugins for RDF Data Process

Jiao Li, Guojian Xian, Ruixue Zhao, Yongwen Huang, Yuantao Kou, Tingting Luo, Tan Sun
Journal of Data and Information Science (Warsaw, Poland), vol. 6, no. 1, pp. 123-145. Journal Article. Published 2021-04-14. DOI: 10.2478/jdis-2021-0020 (https://doi.org/10.2478/jdis-2021-0020)
Citations: 2

Abstract

Purpose: The interdisciplinary nature and rapid development of the Semantic Web have led to the mass publication of RDF data in a large number of widely accepted serialization formats, creating a need for RDF data processing tailored to specific purposes. This paper assesses the chief challenges around RDF data endpoints and introduces RDFAdaptor, a set of plugins for RDF data processing that covers the whole life cycle with high efficiency.

Design/methodology/approach: RDFAdaptor is built on the prominent ETL tool Pentaho Data Integration, which provides a user-friendly, intuitive interface and connects to various data sources and formats. It reuses the Java framework RDF4J as middleware to access data repositories, SPARQL endpoints, and all leading RDF database solutions with SPARQL 1.1 support. It offers configuration templates for multi-scenario applications and can extend the data processing tasks of other services or tools to complement missing functions.

Findings: RDFAdaptor, the proposed comprehensive RDF ETL solution, provides an easy-to-use, intuitive interface, supports data integration and federation over multi-source heterogeneous repositories or endpoints, and manages linked data in hybrid storage mode.

Research limitations: The plugin set supports several application scenarios of RDF data processing, but error detection and checking, as well as interaction with other graph repositories, remain to be improved.

Practical implications: The plugin set provides a user interface and configuration templates that make it usable in RDF data generation, multi-format data conversion, remote RDF data migration, and RDF graph update during semantic querying.

Originality/value: This is the first attempt to develop components, rather than systems, that can extract, consolidate, and store RDF data on the basis of an ecologically mature data warehousing environment.
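
To make the "RDF data generation" scenario concrete, the sketch below shows one way tabular rows could be mapped to RDF triples through a configuration template and serialized as N-Triples. It is not RDFAdaptor's implementation, which runs as Pentaho Data Integration steps on top of RDF4J; the function names, the sample row, and the mapping are hypothetical and chosen only for illustration, and the example uses the Python standard library rather than RDF4J.

```python
from typing import Dict, Iterable, Iterator


def escape_literal(value: str) -> str:
    """Escape characters that may not appear raw in an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(
    rows: Iterable[Dict[str, str]],
    subject_template: str,
    predicate_map: Dict[str, str],
) -> Iterator[str]:
    """Yield one N-Triples line per mapped, non-empty field of each row."""
    for row in rows:
        # Build the subject IRI from the row, e.g. ".../paper/{id}".
        subject = subject_template.format(**row)
        for field, predicate in predicate_map.items():
            value = row.get(field)
            if value:  # skip missing or empty fields
                yield f'<{subject}> <{predicate}> "{escape_literal(value)}" .'


# Hypothetical input row and mapping template, invented for this sketch.
rows = [{"id": "1", "title": 'RDF "ETL" plugins', "year": "2021"}]
template = "http://example.org/paper/{id}"
mapping = {
    "title": "http://purl.org/dc/terms/title",
    "year": "http://purl.org/dc/terms/issued",
}

for line in rows_to_ntriples(rows, template, mapping):
    print(line)
```

Keeping the subject template and the field-to-predicate mapping as plain data mirrors the idea of reusable configuration templates: the same transformation code can serve a different source by swapping the mapping.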