An Experimental Based Study to Evaluate the Efficiency among Stream Processing Tools

Akshay Mudgal, Shaveta Bhatia
Journal article, The International Arab Journal of Information Technology (2023). DOI: 10.34028/iajit/20/6/11

Abstract

With advances in internet technology, the volume of data generated every day has grown dramatically. Industries such as hospitality, defense, railways, health care, social media, and education produce large amounts of raw and processed data of many kinds, and each has its own reasons for protecting that data and treating it as critical. Storing and securing data at this scale is the concern of Big Data. A Data Stream Processing Technology (DSPT) is a key mechanism for computing over such large volumes of data: it collects raw data and processes it into usable information. Big data technologies in this space include Apache Spark, Flink, Kafka, Storm, Samza, Hadoop, Atlas.ti, and Cassandra. This paper compares five well-known and widely used open source big data DSPTs: Apache Spark, Flink, Kafka, Storm, and Samza. The comparison is based on 12 distinct but interrelated criteria. An evaluation matrix was designed, five experiments were executed according to it, and the comparison is drawn from their results. The paper summarizes an extensive study of open source big data DSPTs, using a practical experimental approach in a well-controlled and sophisticated environment.
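To make the idea of a DSPT turning raw data into information more concrete, the following is a minimal, tool-agnostic sketch in Python of one operation that stream processors such as Flink and Spark commonly provide: counting events per fixed, non-overlapping (tumbling) time window. The event shape, the function name `tumbling_window_counts`, and the window size are illustrative assumptions. This is not the API of any of the compared tools, and it is not the method used in the paper.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in seconds, event key).
Event = Tuple[int, str]


def tumbling_window_counts(
    events: Iterable[Event], window_size: int
) -> Iterator[Tuple[int, dict]]:
    """Yield (window_start, {key: count}) for each non-empty tumbling window.

    Assumes events arrive in non-decreasing timestamp order, so a window can
    be emitted as soon as the first event of a later window is seen.
    """
    current_start = None
    counts: Counter = Counter()
    for ts, key in events:
        start = ts - ts % window_size  # start of the window this event falls in
        if current_start is None:
            current_start = start
        elif start != current_start:
            # A later window has begun: emit the finished window, reset state.
            yield current_start, dict(counts)
            current_start, counts = start, Counter()
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last open window


if __name__ == "__main__":
    raw = [(0, "click"), (3, "view"), (4, "click"), (12, "click"), (15, "view")]
    for window_start, summary in tumbling_window_counts(raw, window_size=10):
        print(window_start, summary)
```

Production engines add what this sketch deliberately omits, for example handling out-of-order events (via watermarks in Flink and Spark Structured Streaming), distributed state, and fault tolerance; those capabilities are among the aspects on which stream processing tools differ.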