Comparative Study of Apache Pig & Apache Cassandra in Hadoop Distributed Environment

2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA). Pub Date: 2020-11-05. DOI: 10.1109/ICECA49313.2020.9297532

Y. Gupta, Tanusha Mittal

Citations: 1

Abstract

Big data analytics acquires, organises, and analyses huge volumes of high-velocity data to find patterns and useful information. The data sets are so large that traditional databases cannot manage and process the structured and unstructured data they contain. Hence, big data tools such as Hadoop are required, owing to their high scalability, availability, and cluster-based mechanism for analysing large volumes of data. MapReduce is one of the important components of Hadoop and is able to handle unstructured data, but using it requires strong programming skills. Because of this programming burden, users are moving towards other tools such as Apache Pig or Apache Cassandra, in which data is analysed simply by executing queries or commands. This paper discusses the architecture of Apache Pig and Apache Cassandra and then compares the two technologies on several factors to find out which one is better.
