Enabling RETE Algorithm for RDFS Reasoning on Apache Spark

2018 IEEE 8th International Symposium on Cloud and Service Computing (SC2) Pub Date: 2018-11-01 DOI: 10.1109/SC2.2018.00028

H. Ju, Sangyoon Oh

Citations: 2

Abstract

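Semantic web technology has been used to support various kinds of software, including intelligent personal assistants, by acquiring new data or by understanding knowledge through the relations between data. However, current semantic web schemes such as RDFS reasoning are hard to apply to real-world data because of the huge volume of data that must be processed. In this study, we design and enable parallel RDFS reasoning with the RETE algorithm on Apache Spark. In addition, we apply the rule ordering optimization from existing studies to improve processing performance. Our empirical results verify that the implementation of our design shows strong scalability. However, the current naive approach of deduplicating data with the distinct function provided by Spark should be improved to yield better processing performance. In future work, we will investigate new deduplication methods.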
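To make the terms in the abstract concrete, here is a minimal sketch of RDFS forward-chaining with deduplication. It is not the authors' implementation and it does not build a RETE network or use Spark: plain Python sets stand in for distributed collections, and set difference plays the role of a distinct-style deduplication step. The two rules shown are the standard RDFS entailment rules rdfs11 (subclass transitivity) and rdfs9 (type inheritance); the function names and the `ex:` example data are hypothetical.

```python
# Illustrative sketch only: naive fixed-point RDFS reasoning over a set of
# (subject, predicate, object) triples. Sets stand in for Spark collections.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs11(triples):
    """rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)."""
    sub = [(s, o) for s, p, o in triples if p == SUBCLASS]
    return {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}


def rdfs9(triples):
    """rdfs9: (x type A), (A subClassOf B) => (x type B)."""
    sub = [(s, o) for s, p, o in triples if p == SUBCLASS]
    typed = [(s, o) for s, p, o in triples if p == TYPE]
    return {(x, TYPE, b) for x, a in typed for a2, b in sub if a == a2}


def rdfs_closure(triples, rules=(rdfs11, rdfs9)):
    """Apply the rules in the given order until no new triple is derived."""
    closure = set(triples)
    while True:
        derived = set()
        for rule in rules:
            # Later rules in the sequence see what earlier rules just derived.
            derived |= rule(closure | derived)
        new = derived - closure  # deduplication: drop already-known triples
        if not new:
            return closure
        closure |= new


example = {
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:rex", TYPE, "ex:Dog"),
}
inferred = rdfs_closure(example) - example
```

Running rdfs11 before rdfs9 lets the type-inheritance rule use the freshly derived subclass links in the same pass, which is the general idea behind ordering rules to reduce iterations. The deduplication step is needed because the same triple can be derived along several paths, and at scale that step is the cost the abstract identifies as needing improvement.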
