{"title":"Performance Optimization of Machine Learning Algorithms Based on Spark","authors":"Weikang Luo, Shenglin Zhang, Yinggen Xu","doi":"10.2478/amns-2024-0416","DOIUrl":null,"url":null,"abstract":"\n This paper proposes a performance optimization strategy for Spark-based machine learning algorithms in Shuffle and memory data management modules. The Shuffle module is optimized by introducing Observer monitoring module in Spark cluster to achieve task status monitoring and dynamic ShuffleWrite task generation. Meanwhile, an adaptive caching mechanism for RDD data addresses the lack of in-memory data caching. The performance-optimized algorithm performs well in the experiments, with a clustering accuracy of 89% and a response time that is 5% faster than the Random Forest algorithm. In road network traffic state discrimination, the optimized algorithm’s classification decision F-measure value is as high as 99.53%, which is 5.32% higher than that before unoptimization, and the running time is 767 seconds less than that of the unoptimized algorithm when dealing with about 6,880,000 pieces of data, which significantly improves the efficiency and accuracy.","PeriodicalId":3,"journal":{"name":"ACS Applied Electronic Materials","volume":"51 1","pages":""},"PeriodicalIF":4.7000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Electronic Materials","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2478/amns-2024-0416","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0
Abstract
This paper proposes a performance optimization strategy for Spark-based machine learning algorithms in Shuffle and memory data management modules. The Shuffle module is optimized by introducing Observer monitoring module in Spark cluster to achieve task status monitoring and dynamic ShuffleWrite task generation. Meanwhile, an adaptive caching mechanism for RDD data addresses the lack of in-memory data caching. The performance-optimized algorithm performs well in the experiments, with a clustering accuracy of 89% and a response time that is 5% faster than the Random Forest algorithm. In road network traffic state discrimination, the optimized algorithm’s classification decision F-measure value is as high as 99.53%, which is 5.32% higher than that before unoptimization, and the running time is 767 seconds less than that of the unoptimized algorithm when dealing with about 6,880,000 pieces of data, which significantly improves the efficiency and accuracy.
期刊介绍:
ACS Applied Electronic Materials is an interdisciplinary journal publishing original research covering all aspects of electronic materials. The journal is devoted to reports of new and original experimental and theoretical research of an applied nature that integrate knowledge in the areas of materials science, engineering, optics, physics, and chemistry into important applications of electronic materials. Sample research topics that span the journal's scope are inorganic, organic, ionic and polymeric materials with properties that include conducting, semiconducting, superconducting, insulating, dielectric, magnetic, optoelectronic, piezoelectric, ferroelectric and thermoelectric.
Indexed/Abstracted:
Web of Science SCIE
Scopus
CAS
INSPEC
Portico