Cleaning Big Data Streams: A Systematic Literature Review

Obaid Alotaibi, E. Pardede, Sarath Tomy
Technologies, published 2023-07-26. DOI: 10.3390/technologies11040101 (https://doi.org/10.3390/technologies11040101)
Citations: 0

Abstract

In today’s big data era, cleaning big data streams is challenging because of the variety of data formats and the massive volume of data being generated. Many studies have proposed techniques to overcome these challenges, such as cleaning big data in real time. This systematic literature review presents recently developed techniques for the cleaning process and for each data cleaning issue. Following the PRISMA framework, we searched four databases (IEEE Xplore, ACM Digital Library, Scopus, and ScienceDirect) to select relevant studies. From the selected studies, we identify the techniques used to clean big data streams and the evaluation methods used to examine their efficiency. We also define the cleaning issues that may arise during the cleaning process: missing values, duplicated data, outliers, and irrelevant data. Based on our study, we identify future directions for cleaning big data streams.
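
To make the four cleaning issues named in the abstract concrete, the following is a minimal Python sketch of a single-pass stream cleaner. It is an illustration only and is not taken from the review or from any of the surveyed studies: the record fields (`sensor_id`, `ts`, `value`), the function name `clean_stream`, and the specific techniques (forward fill for missing values, a rolling z-score for outliers, a key set for duplicates, a field whitelist for irrelevant data) are all assumptions chosen for brevity.

```python
from collections import deque
from statistics import mean, stdev


def clean_stream(records, fields=("sensor_id", "ts", "value"),
                 window=20, z_max=3.0, min_history=5):
    """Yield cleaned records one at a time (hypothetical schema and thresholds)."""
    seen = set()                    # keys already emitted; unbounded here, a real
                                    # stream would need expiry or a bounded structure
    recent = deque(maxlen=window)   # sliding window of accepted values
    last_value = None               # most recent accepted value, for forward fill

    for rec in records:
        # Irrelevant data: keep only the whitelisted fields.
        rec = {k: rec.get(k) for k in fields}

        # Duplicated data: skip records whose (sensor_id, ts) key was seen before.
        key = (rec["sensor_id"], rec["ts"])
        if key in seen:
            continue
        seen.add(key)

        # Missing values: forward-fill from the last accepted value, if any.
        if rec["value"] is None:
            if last_value is None:
                continue
            rec["value"] = last_value

        # Outliers: drop values far from the rolling mean (z-score test).
        if len(recent) >= min_history:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(rec["value"] - mu) > z_max * sigma:
                continue

        recent.append(rec["value"])
        last_value = rec["value"]
        yield rec
```

Because the function is a generator, it handles one record at a time and keeps only a fixed-size window of past values, which is the property that matters for streams. The duplicate-key set is the one part that grows without bound, so a production version would have to expire old keys.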