Semantic-based intelligent data clean framework for big data

Proceedings 2014 IEEE International Conference on Security, Pattern Analysis, and Cybernetics (SPAC). Pub Date: 2014-12-15. DOI: 10.1109/SPAC.2014.6982731

Jia Wang, Zhijun Song, Qian Li, Jun Yu, Fei Chen

Citations: 5

Abstract

To overcome the limitations of existing data cleansing methods when applied to massive data, we propose a generic semantic-based framework that uses a parallelized processing model for effective big data cleansing. We also use an improved Semantic-Based Keyword Matching Algorithm to handle duplicate data. Experimental results show that the parallelized framework, combined with the improved Semantic-Based Keyword Matching Algorithm, identifies duplicates with high recall and precision and performs well on big data cleansing.
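The abstract does not spell out the matching algorithm, so the following is only a minimal illustrative sketch of the general idea of keyword-based duplicate detection with a parallel preprocessing step. It is not the authors' improved Semantic-Based Keyword Matching Algorithm. The synonym table, the Jaccard similarity measure, the threshold, and the names `keywords`, `jaccard`, and `find_duplicates` are all assumptions introduced here for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical synonym table standing in for a real semantic resource.
SYNONYMS = {"corp": "company", "inc": "company", "co": "company", "st": "street"}


def keywords(record: str) -> frozenset:
    """Split on whitespace, strip punctuation, lowercase, and map synonyms
    to one canonical keyword."""
    tokens = (t.strip(".,;:").lower() for t in record.split())
    return frozenset(SYNONYMS.get(t, t) for t in tokens if t)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two keyword sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def find_duplicates(records, threshold=0.8, workers=4):
    """Return index pairs (i, j) whose keyword sets are at least
    `threshold` similar."""
    # Keyword extraction is independent per record, so it can be mapped
    # over a worker pool. Threads are used only to keep the sketch simple;
    # a real big-data setting would partition records across processes
    # or cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        keys = list(pool.map(keywords, records))
    # Naive all-pairs comparison; quadratic in the number of records.
    return [
        (i, j)
        for i, j in combinations(range(len(records)), 2)
        if jaccard(keys[i], keys[j]) >= threshold
    ]


records = [
    "Acme Corp, 12 Main St",
    "ACME Inc 12 Main Street",
    "Globex Company 9 Elm Street",
]
print(find_duplicates(records))
```

In this toy input the first two records normalize to the same keyword set and are flagged as a duplicate pair, while the third shares only two keywords with them and falls below the threshold. The all-pairs comparison would not scale to massive data as written; it is kept only to make the matching step easy to read.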

