Comparative Analysis of Hadoop and Spark Performance for Real-time Big Data Smart Platforms Utilizing IoT Technology in Electrical Facilities
Maratbek T. Gabdullin, Yerulan Suinullayev, Yelikbay Kabi, Jeong Won Kang, Assel Mukasheva
Journal of Electrical Engineering & Technology, published 2024-06-01
DOI: 10.1007/s42835-024-01937-1 (https://doi.org/10.1007/s42835-024-01937-1)
Abstract
As IoT adoption in power systems accelerates, the volume of data to be handled grows, and real-time big data smart platforms must keep pace with the processing demands of IoT-integrated power systems. In this study, we assess the performance of Hadoop and Spark for iterative computing and real-time data processing applications. The evaluation uses execution time, resource utilization, and scalability as metrics, with particular attention to behavior as data volume increases. The comparison is intended to guide researchers, practitioners, and entrepreneurs in selecting a platform that matches their specific requirements. The study identifies the strengths and weaknesses of both platforms and offers insights into optimizing the performance of big data applications. Text documents and graph data served as inputs for the Word Count and PageRank tasks, and performance was tested on datasets of different sizes. The results show that Spark outperforms Hadoop in most applications, especially iterative computation and real-time data processing, owing to its use of in-memory computation. Hadoop, however, is best suited to batch processing operations that require multiple steps, which it can run in parallel across multiple cluster nodes to process large amounts of data quickly. This comparison gives researchers, practitioners, and enterprises information on the trade-offs and benefits of the two big data platforms.
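For readers unfamiliar with the two benchmark tasks, the following is a minimal single-machine sketch in plain Python. It is not the authors' benchmark code and uses neither Hadoop nor Spark; the function names `word_count` and `pagerank`, the damping factor of 0.85, and the iteration count are illustrative assumptions. It shows only the shape of each workload: Word Count is a single pass over the input, whereas PageRank revisits the same link structure on every iteration, which is the kind of reuse that in-memory computation is meant to speed up.

```python
from collections import defaultdict


def word_count(lines):
    """Single-pass word count: treat each word as a (word, 1) pair and sum per word."""
    counts = defaultdict(int)
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
    return dict(counts)


def pagerank(links, damping=0.85, iterations=20):
    """Iterative PageRank over an adjacency dict {page: [pages it links to]}.

    Assumption for this sketch: every page appears as a key and has at
    least one outgoing link, so no dangling-node handling is needed.
    """
    n = len(links)
    ranks = {page: 1.0 / n for page in links}
    for _ in range(iterations):
        # Each page splits its current rank evenly among its outgoing links.
        contributions = defaultdict(float)
        for page, outgoing in links.items():
            share = ranks[page] / len(outgoing)
            for target in outgoing:
                contributions[target] += share
        # The same `links` structure is reused on every pass.
        ranks = {
            page: (1.0 - damping) / n + damping * contributions[page]
            for page in links
        }
    return ranks


if __name__ == "__main__":
    print(word_count(["spark and hadoop", "spark in memory"]))
    print(pagerank({"a": ["b"], "b": ["c"], "c": ["a"]}))
```

In a disk-based MapReduce pipeline each PageRank iteration would typically be a separate job that rereads its input, while a framework that caches the link structure in memory avoids that repeated I/O; the sketch mirrors this only conceptually.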
About the journal:
The Journal of Electrical Engineering and Technology (JEET) is the official publication of the Korean Institute of Electrical Engineers (KIEE). It is published bimonthly, and its first issue was released in March 2006. The journal is open to submissions from scholars and experts across the wide range of electrical engineering technologies.
The scope of the journal includes all issues in the field of Electrical Engineering and Technology. Included are techniques for electrical power engineering, electrical machinery and energy conversion systems, electrophysics and applications, information and controls.