Exploring and Evaluating the Scalability and Efficiency of Apache Spark Using Educational Datasets
Jian Zhang, Zijiang Yang, Y. Benslimane
2019 International Conference on Machine Learning and Cybernetics (ICMLC), published 2019-07-01
DOI: 10.1109/ICMLC48188.2019.8949260 (https://doi.org/10.1109/ICMLC48188.2019.8949260)
Citations: 1
Abstract
Combining data mining and machine learning with web-based education systems is becoming an important research area for improving the quality of education beyond traditional approaches. With the rapid worldwide growth of Information and Communication Technology (ICT), data now arrive in large volume, at high velocity, and in wide variety. In this paper, four popular data mining methods are applied on Apache Spark to large datasets from Online Cognitive Learning Systems in order to explore the scalability and efficiency of Spark. Datasets of various volumes are tested on Spark MLlib under different running configurations and parameter tunings. The results yield practical strategies for allocating and tuning computing resources so that data mining and machine learning tasks on educational datasets take full advantage of Apache Spark's in-memory system.
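The idea of testing under "different running configurations" can be illustrated with a minimal sketch that enumerates a grid of resource settings as `spark-submit` command lines. This is not the authors' experimental setup: the application name `train_model.py` and all numeric values are hypothetical, and the abstract does not state which settings were varied. The flags `--num-executors`, `--executor-memory`, and `--executor-cores` are standard `spark-submit` options, although `--num-executors` applies only on some cluster managers such as YARN.

```python
from itertools import product


def build_spark_submit_grid(app, executors, memories, cores):
    """Return one spark-submit argument list per resource configuration.

    Every combination of executor count, executor memory, and cores per
    executor becomes a separate command, so each run can be timed on its own.
    """
    commands = []
    for n, mem, c in product(executors, memories, cores):
        commands.append([
            "spark-submit",
            "--num-executors", str(n),    # number of executors (e.g. on YARN)
            "--executor-memory", mem,     # memory per executor, e.g. "4g"
            "--executor-cores", str(c),   # cores per executor
            app,
        ])
    return commands


# Hypothetical application and values, chosen only for illustration.
grid = build_spark_submit_grid(
    app="train_model.py",
    executors=[2, 4],
    memories=["2g", "4g"],
    cores=[2],
)

for cmd in grid:
    print(" ".join(cmd))
```

Each generated command could then be run and timed against datasets of different sizes, which is one plausible way to compare how run time changes with allocated resources.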