An efficient approach for data-duplication detection based on RDBMS

2011 Eighth International Joint Conference on Computer Science and Software Engineering (JCSSE). Publication date: 2011-05-11. DOI: 10.1109/JCSSE.2011.5930142

Kiettisak Chanhom, J. Natwichai


Citations: 1

Abstract



Data duplication is one of the most important issues in information system management. An information system should store a single real-world object as one entity. Duplication occurs when it instead stores more than one entity representing that same object. This problem can degrade the quality of service of information systems. In this paper, we propose an efficient approach to detecting duplication that is built on an RDBMS foundation. Our approach assumes that the data to be processed are already stored in the RDBMS in the first place. Thus, the proposed approach does not require the data to be imported into or exported from the storage. In addition, it benefits from the RDBMS's query optimizer. Experimental results on the TPC-H dataset are presented to validate the proposed approach.
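The abstract does not spell out the detection algorithm. As a rough illustration of the general idea it describes, the sketch below runs duplicate detection as a query inside the database, so no rows are exported and the RDBMS's optimizer chooses the execution plan. It uses Python's built-in `sqlite3` module.

The following parts are hypothetical and are not the authors' method:
- the exact-match rule on normalized name and phone values
- the sample rows
- the helper names `find_duplicate_pairs` and `make_sample_db`

Only the column names `c_custkey`, `c_name` and `c_phone` come from the TPC-H CUSTOMER table.

```python
import sqlite3

# Hypothetical duplicate rule (an assumption, not taken from the paper): two
# customers are duplicates if their names match ignoring case and surrounding
# whitespace, and their phone numbers match ignoring dashes.
DUPLICATE_PAIRS_SQL = """
    SELECT a.c_custkey, b.c_custkey
    FROM customer AS a
    JOIN customer AS b
      ON LOWER(TRIM(a.c_name)) = LOWER(TRIM(b.c_name))
     AND REPLACE(a.c_phone, '-', '') = REPLACE(b.c_phone, '-', '')
     AND a.c_custkey < b.c_custkey  -- each pair once, no self-pairs
    ORDER BY a.c_custkey, b.c_custkey
"""


def find_duplicate_pairs(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Run detection inside the database; only matching key pairs are returned."""
    return conn.execute(DUPLICATE_PAIRS_SQL).fetchall()


def make_sample_db() -> sqlite3.Connection:
    """In-memory table using three TPC-H CUSTOMER column names and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        "c_custkey INTEGER PRIMARY KEY, c_name TEXT, c_phone TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [
            (1, "Customer#000000001", "25-989-741-2988"),
            (2, " customer#000000001", "259897412988"),  # same object, messier entry
            (3, "Customer#000000002", "23-768-687-3665"),
            (4, "CUSTOMER#000000002", "23-768-687-3665"),
            (5, "Customer#000000003", "11-719-748-3364"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(find_duplicate_pairs(make_sample_db()))  # [(1, 2), (3, 4)]
```

The comparison is expressed as a single self-join, so the database's planner decides how to execute it. On a large table, an index on the normalized expressions may let it avoid comparing every pair of rows. Both PostgreSQL and SQLite support indexes on expressions.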
