A Hive and SQL Case Study in Cloud Data Analytics

Shireesha Chandra, A. Varde, Jiayin Wang
Published in: 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON)
Publication date: 2019-10-01
DOI: https://doi.org/10.1109/UEMCON47517.2019.8992925
Citations: 5

Abstract

The digital universe is expanding rapidly, generating massive datasets. To keep up with the processing and storage needs of this big data, and to discover knowledge from it, we need scalable infrastructure and technologies that can access data from multiple disks simultaneously. Cloud computing provides paradigms for data analytics over such huge datasets. While SQL remains popular among database and data mining professionals, Hive has in recent years established itself as a rapidly advancing big data technology that is highly suitable for use over the cloud. In this paper, we present investigatory research on Hive and SQL with a detailed case study comparing the two in the context of cloud data management and mining. We focus on processing Hive queries on cloud infrastructure using three different approaches, and also examine SQL processing on the cloud with similar approaches. Real datasets are used to conduct various operations with Hive and SQL. The paper compares the performance of the two technologies and explains the environments in which one is preferred over the other for processing and analyzing data. It offers recommendations for cloud data analytics based on the case study.