Ascendancy of MapReduce with Hadoop for Weather Data and Word Count Analytics

2021 5th International Conference on Trends in Electronics and Informatics (ICOEI). Published: 2021-06-03. DOI: 10.1109/ICOEI51242.2021.9452980

Sree Lakshmi K, Theertha Jayarajan N, Nitha L

Citations: 1

Abstract

Data flows from various sources in structured, semi-structured, or unstructured form, and such data is referred to as big data. Because of their large scale, rapid growth, and diverse formats, these datasets are difficult to manage with conventional tools and techniques. Big data analysis is a daunting task, as it requires large distributed file systems that are adaptive, resilient, and fault-tolerant. MapReduce is commonly used for the effective analysis of big data. Big data analysis helps researchers, scholars, and business users extract value and knowledge. In the information age, huge amounts of data have become accessible to decision-makers. Given the rapid growth of such data, strategies for managing these datasets and obtaining value and knowledge from them must be studied and delivered. Moreover, decision-makers must be able to extract useful information from this dynamic and rapidly changing data, which ranges from daily transactions to customer contacts and social media. In this paper, we explore Hadoop's parallel processing power in two application areas. The first scenario computes the minimum and maximum temperatures from a large volume of weather data collected from an openly available source: the application analyses the entire weather-station dataset and displays the minimum and maximum temperature (in Fahrenheit) for each station. The second scenario counts words in large datasets, reporting the frequency of each word in a given dataset regardless of data volume.
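Both scenarios follow the same map, shuffle, reduce pattern. As a hedged illustration (the abstract does not include the authors' code), the sketch below simulates that data flow in a single Python process rather than on a Hadoop cluster. The record format `station_id,date,temp_fahrenheit`, the function names (`run_mapreduce`, `temperature_mapper`, `min_max_reducer`, `word_count_mapper`, `sum_reducer`), the tokenisation rule, and the sample records are all hypothetical. A real job would express the same mapper and reducer logic through Hadoop's Java `Mapper`/`Reducer` classes or through Hadoop Streaming, and would run it across many nodes.

```python
# Single-process sketch of the MapReduce data flow (map -> shuffle -> reduce).
# Hypothetical illustration: not the paper's code and not Hadoop itself.
import re
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

KV = Tuple[str, Any]  # a (key, value) pair emitted by a mapper


def run_mapreduce(records: Iterable[str],
                  mapper: Callable[[str], Iterator[KV]],
                  reducer: Callable[[str, List[Any]], Any]) -> Dict[str, Any]:
    """Map every record, group the emitted values by key, then reduce each group."""
    groups: Dict[str, List[Any]] = defaultdict(list)
    for record in records:                 # map phase
        for key, value in mapper(record):
            groups[key].append(value)      # shuffle: collect values per key
    # Reduce phase; keys are visited in sorted order, mirroring Hadoop's sort step.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


# Scenario 1: per-station min/max temperature.
# ASSUMED record format (hypothetical): "station_id,date,temp_fahrenheit"
def temperature_mapper(line: str) -> Iterator[KV]:
    parts = line.strip().split(",")
    if len(parts) != 3:
        return                             # skip malformed lines
    station, _date, temp = parts
    try:
        reading = float(temp)
    except ValueError:
        return                             # skip missing or invalid readings
    yield station, reading


def min_max_reducer(_station: str, temps: List[Any]) -> Tuple[float, float]:
    return min(temps), max(temps)


# Scenario 2: word count (lower-cased; the tokenisation rule is an assumption).
def word_count_mapper(line: str) -> Iterator[KV]:
    for word in re.findall(r"[a-z0-9']+", line.lower()):
        yield word, 1


def sum_reducer(_word: str, counts: List[Any]) -> int:
    return sum(counts)


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    weather = ["ST01,2021-01-01,41.0", "ST01,2021-01-02,29.5",
               "ST02,2021-01-01,75.2", "ST02,2021-01-02,bad"]
    print(run_mapreduce(weather, temperature_mapper, min_max_reducer))
    # {'ST01': (29.5, 41.0), 'ST02': (75.2, 75.2)}
    text = ["the quick brown fox", "The lazy dog and the fox"]
    print(run_mapreduce(text, word_count_mapper, sum_reducer))
    # {'and': 1, 'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because summing counts is associative and commutative, the word-count reducer could also run as a combiner on each mapper's local output, as Hadoop's WordCount tutorial does. The min/max reducer as written could not be reused that way, since it emits a (min, max) pair rather than a single temperature value.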