Novel Machine Learning Approach for Sentiment Analysis of Real Time Twitter Data with Apache Flume
M. Rashid, A. Hamid, Nazir Ahmad, M. Rehman, Mir Mohammad Yousuf
2020 Sixth International Conference on Parallel, Distributed and Grid Computing (PDGC), 6 November 2020. DOI: 10.1109/PDGC50313.2020.9315782
Large volumes of data are generated from many sources, and this data contains many hidden patterns and much useful information. Many researchers are trying to extract meaningful insights from these patterns. Data from these sources consists mostly of opinions, which can be mined to extract information that is useful from an organizational point of view. One approach to doing so is sentiment analysis. In this paper, the authors store Twitter streaming data in the Hadoop Distributed File System (HDFS) using Apache Flume and then extract it with Apache Hive. Machine learning classification algorithms are then applied to this data with Apache Mahout to determine its sentiment. A novel approach based on a hybrid of the Naïve Bayes and Decision Tree algorithms is used to improve the performance of sentiment analysis on streaming Twitter data. The proposed approach achieved an accuracy of 86.44%, compared with 81.11% for a Naïve Bayes classifier.
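The abstract does not say how the Naïve Bayes and Decision Tree classifiers are combined, and the Mahout code is not shown. The following is a minimal plain-Python sketch (standard library only) under one assumed scheme: soft voting, where the two models' class probabilities are averaged with a weight. The names `NaiveBayes`, `DecisionTree`, `HybridNBTree` and `tokenize`, the `weight=0.5` and `max_depth=3` defaults, and the bag-of-words features are all hypothetical choices for illustration. This is neither Apache Mahout's API nor the authors' implementation. Another well-known way to hybridize these two models is NBTree, which places Naïve Bayes classifiers at the leaves of a decision tree.

```python
# Hypothetical sketch: a soft-voting Naive Bayes + decision-tree hybrid (NOT the paper's Mahout code).
import math
import re
from collections import Counter


def tokenize(text):  # lowercase; keep alphabetic tokens, dropping '#', '@' and punctuation
    return re.findall(r"[a-z']+", text.lower())


def entropy(rows):  # Shannon entropy (bits) of the labels in (token_set, label) rows
    counts, n = Counter(label for _, label in rows), len(rows)
    return -sum(k / n * math.log2(k / n) for k in counts.values())


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}

    def predict_proba(self, doc):
        logp = {}
        for c in self.classes:
            logp[c] = math.log(self.prior[c]) + sum(
                math.log((self.counts[c][t] + 1) / (self.totals[c] + len(self.vocab)))
                for t in tokenize(doc) if t in self.vocab)  # words unseen in training are ignored
        top = max(logp.values())  # subtract the max before exp() for numerical stability
        exp = {c: math.exp(v - top) for c, v in logp.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}


class DecisionTree:
    """Small ID3-style tree over binary 'does the tweet contain word w?' features."""

    def __init__(self, max_depth=3):
        self.max_depth = max_depth

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        rows = [(set(tokenize(d)), c) for d, c in zip(docs, labels)]
        self.root = self._build(rows, set().union(*(r[0] for r in rows)), 0)

    def _leaf(self, rows):
        # Laplace-smoothed class distribution, so a leaf yields probabilities, not a hard vote.
        counts = Counter(c for _, c in rows)
        return {c: (counts[c] + 1) / (len(rows) + len(self.classes)) for c in self.classes}

    def _build(self, rows, words, depth):
        if depth == self.max_depth or len({c for _, c in rows}) == 1:
            return self._leaf(rows)
        base, best, best_gain = entropy(rows), None, 0.0
        for w in sorted(words):  # greedy choice: the word with the highest information gain
            yes = [r for r in rows if w in r[0]]
            no = [r for r in rows if w not in r[0]]
            if yes and no:
                gain = base - (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(rows)
                if gain > best_gain:
                    best, best_gain = (w, yes, no), gain
        if best is None:  # no word separates the classes any further
            return self._leaf(rows)
        w, yes, no = best
        rest = words - {w}
        return (w, self._build(yes, rest, depth + 1), self._build(no, rest, depth + 1))

    def predict_proba(self, doc):
        tokens, node = set(tokenize(doc)), self.root
        while isinstance(node, tuple):  # internal node: (word, yes_branch, no_branch)
            word, yes, no = node
            node = yes if word in tokens else no
        return node


class HybridNBTree:
    """Soft-voting hybrid: weighted average of NB and decision-tree class probabilities."""

    def __init__(self, weight=0.5, max_depth=3):
        self.weight, self.nb, self.tree = weight, NaiveBayes(), DecisionTree(max_depth)

    def fit(self, docs, labels):
        docs, labels = list(docs), list(labels)
        self.nb.fit(docs, labels)
        self.tree.fit(docs, labels)
        return self

    def predict(self, doc):
        p_nb, p_dt = self.nb.predict_proba(doc), self.tree.predict_proba(doc)
        mixed = {c: self.weight * p_nb[c] + (1 - self.weight) * p_dt[c] for c in self.nb.classes}
        return max(mixed, key=mixed.get)
```

Usage: given a list of tweet texts and a parallel list of labels, `HybridNBTree().fit(tweets, labels).predict(text)` returns the predicted label for a new tweet. Averaging probabilities lets the tree's sharp word-presence rules and Naive Bayes's smoother per-word evidence offset each other's errors; in practice the weight would be tuned on held-out data. The sketch does not reproduce the accuracy figures reported above.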