Big data processing with harnessing hadoop - MapReduce for optimizing analytical workloads

K. V. Rama Satish, N. Kavya
2014 International Conference on Contemporary Computing and Informatics (IC3I), published 2014-11-01
DOI: 10.1109/IC3I.2014.7019818 (https://doi.org/10.1109/IC3I.2014.7019818)
Citations: 20

Abstract

Nowadays, social media data is as constant a presence in our lives as a heartbeat. The exponential growth of data first presented challenges to cutting-edge businesses such as Google, MSN, Flipkart, Microsoft, Facebook, Twitter, and LinkedIn. Nevertheless, existing big data analytical models for Hadoop rely on MapReduce analytical workloads that process only a small segment of the whole data set, and thus fail to assess the capabilities of the MapReduce model under heavy workloads that process exponentially accumulating data sizes [1]. In all social, business, and technical research applications, there is a need to process large volumes of ordinary usage data efficiently. In this paper, we propose an efficient technique to classify big data from e-mail using the firefly algorithm and a naïve Bayes classifier. The proposed technique comprises two phases: (i) a MapReduce framework for training and (ii) a MapReduce framework for testing. Initially, the input Twitter data is given to the process to select suitable features for data classification. The traditional firefly algorithm is applied, and the optimized feature space is adopted for the best-fitting results. Once the best feature space is identified by the firefly algorithm, the data is classified using the naïve Bayes classifier. Both processes are distributed following the MapReduce framework. The results of the experiment are validated using the evaluation metrics computation time, accuracy, specificity, and sensitivity. For comparative analysis, the proposed big data classification is compared with existing naïve Bayes and neural network approaches.
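The two-phase structure described in the abstract (a map/reduce pass for training, then classification and evaluation) can be illustrated with a minimal sketch. This is not the authors' implementation: it simulates the map and reduce steps in a single Python process rather than on Hadoop, it skips the firefly feature-selection step and uses every token as a feature, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def map_train(record):
    """Map step (assumed design): emit count pairs for one labeled text."""
    label, text = record
    pairs = [(("doc", label, None), 1)]  # one document seen for this label
    pairs += [(("tok", label, tok), 1) for tok in text.lower().split()]
    return pairs


def reduce_counts(mapped):
    """Reduce step: sum the emitted values per key."""
    totals = Counter()
    for pairs in mapped:
        for key, value in pairs:
            totals[key] += value
    return totals


def build_model(totals):
    """Split the reduced counts into class priors and per-class token counts."""
    doc_counts = {}
    token_counts = defaultdict(Counter)
    for (kind, label, tok), n in totals.items():
        if kind == "doc":
            doc_counts[label] = n
        else:
            token_counts[label][tok] = n
    vocab = {tok for counts in token_counts.values() for tok in counts}
    return doc_counts, token_counts, vocab


def classify(model, text):
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""
    doc_counts, token_counts, vocab = model
    n_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in doc_counts.items():
        total = sum(token_counts[label].values())
        score = math.log(n / n_docs)
        for tok in text.lower().split():
            if tok in vocab:  # tokens never seen in training are ignored
                count = token_counts[label][tok]
                score += math.log((count + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def evaluate(y_true, y_pred, positive):
    """Accuracy, sensitivity, and specificity for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(a, b):
        return a / b if b else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy data, invented purely for illustration.
train = [
    ("pos", "good great fun"),
    ("pos", "great happy"),
    ("neg", "bad awful"),
    ("neg", "awful sad bad"),
]
model = build_model(reduce_counts(map(map_train, train)))
print(classify(model, "great fun"))
```

Because the map step only emits counts and the reduce step only sums them, the training records can be partitioned across workers in any order, which is what makes naïve Bayes a natural fit for MapReduce. A feature-selection step such as the firefly algorithm would restrict `vocab` before classification.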