Comparative Study of Apache Pig & Apache Cassandra in Hadoop Distributed Environment
Y. Gupta, Tanusha Mittal
2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA), 2020-11-05
DOI: 10.1109/ICECA49313.2020.9297532 (https://doi.org/10.1109/ICECA49313.2020.9297532)
Citations: 1
Abstract
Big data analytics is the process of acquiring, organising and analysing huge volumes of high-velocity data to find patterns and useful information. The data sets are so large that traditional databases cannot manage and process the structured and unstructured data they contain. Hence, big data tools such as Hadoop are required, owing to their high scalability, availability and cluster-based mechanism for analysing large volumes of data. MapReduce is one of the important components of Hadoop and is able to handle unstructured data, but using it requires advanced programming skills. Because of this programming burden, users are moving towards other tools such as Apache Pig or Apache Cassandra, in which data is analysed simply by executing queries or commands. This paper discusses the architecture of Apache Pig and Apache Cassandra and then compares the two technologies on several factors to determine which one is better.