A Hive and SQL Case Study in Cloud Data Analytics
Shireesha Chandra, A. Varde, Jiayin Wang
2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), published 2019-10-01
DOI: 10.1109/UEMCON47517.2019.8992925 (https://doi.org/10.1109/UEMCON47517.2019.8992925)

The digital universe is expanding at a very fast pace, generating massive datasets. To keep up with the processing and storage needs of this big data, and to discover knowledge from it, we need scalable infrastructure and technologies that can access data from multiple disks simultaneously. Cloud computing provides paradigms for data analytics over such huge datasets. While SQL remains popular among database and data mining professionals, Hive has in recent years established itself as a rapidly advancing big data technology, which makes it highly suitable for use over the cloud. In this paper, we present investigatory research on Hive and SQL, with a detailed case study comparing them in the context of cloud data management and mining. We examine the processing of Hive queries on cloud infrastructure using three different approaches, and SQL processing on the cloud using similar approaches. Real datasets are used to conduct various operations with Hive and SQL. We compare the performance of the two technologies and explain the environments in which one is preferred over the other for processing and analyzing data. Based on the case study, we provide recommendations for cloud data analytics.