Improvement of Dynamic Hybrid Collaborative Filtering Based on Spark
Haorui Li, Qiang Huang
2019 IEEE 5th International Conference on Computer and Communications (ICCC), pp. 8-12, December 2019
DOI: 10.1109/ICCC47050.2019.9064416 (https://doi.org/10.1109/ICCC47050.2019.9064416)
Citation count: 0
Abstract
Because Spark's in-memory computing framework is well suited to iterative computation, this paper implements the ALS (alternating least squares) recommendation algorithm on the Spark platform and improves its calculation method. To account for more practical factors and obtain more accurate result sets, we first apply C-Means clustering to group the data during preprocessing, which reduces computation on redundant data and lowers the sparsity of the rating matrix. Second, cosine similarity and Pearson similarity are used together to improve the user similarity calculation. Finally, a hybrid recommendation function is constructed. On the Spark distributed big data platform, the method is trained and its results compared in both offline and real-time settings on the MovieLens dataset; the results show that it reduces computing time and improves the efficiency and accuracy of the algorithm.
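For reference, the textbook regularized objective that ALS minimizes is shown below; the abstract does not state which variant or regularization the authors used, so this is not the paper's exact formulation. The symbols are chosen for illustration: $r_{ui}$ is the observed rating of user $u$ for item $i$, $\Omega$ the set of observed pairs, $\mathbf{x}_u, \mathbf{y}_i \in \mathbb{R}^k$ the latent factors, $\lambda$ the regularization weight, $Y_u$ the matrix whose rows are $\mathbf{y}_i^{\top}$ for the items user $u$ rated, and $\mathbf{r}_u$ the vector of those ratings. (Per the Spark MLlib documentation, Spark's implementation additionally scales $\lambda$ by the number of ratings of each user or item.)

```latex
\min_{X,Y}\; \sum_{(u,i)\in\Omega} \bigl(r_{ui} - \mathbf{x}_u^{\top}\mathbf{y}_i\bigr)^2
  + \lambda\Bigl(\sum_{u}\lVert\mathbf{x}_u\rVert^2 + \sum_{i}\lVert\mathbf{y}_i\rVert^2\Bigr)

% Holding Y fixed, each user factor has a closed-form ridge-regression solution:
\mathbf{x}_u = \bigl(Y_u^{\top}Y_u + \lambda I_k\bigr)^{-1} Y_u^{\top}\mathbf{r}_u
```

ALS alternates this update with the symmetric one for each $\mathbf{y}_i$ (holding $X$ fixed). Because every user's (or item's) solve is independent of the others, each half-step parallelizes naturally across a Spark cluster, which is the property the abstract appeals to.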
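A minimal sketch of the clustering step, assuming the "C-Means" in the abstract is standard fuzzy C-Means (FCM) and that users are clustered on their raw rating rows with 0 for unrated items. The function names `fuzzy_c_means` and `hard_labels`, the fuzzifier `m=2.0`, and the stopping tolerance are illustrative choices, not the authors'. Plain Python is used so the example runs without Spark.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy C-Means (assumed reading of "C-Means"). Returns (centers, memberships).

    points: equal-length feature vectors (here: users' rating rows, 0 = unrated).
    memberships[i][j] is the degree to which point i belongs to cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])

    centers = []
    for _ in range(max_iter):
        # Cluster centers: mean of the points weighted by membership^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        new_u = []
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            zeros = [j for j in range(c) if dists[j] == 0.0]
            if zeros:
                # Point coincides with one or more centers: share membership among them.
                row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                p = 2.0 / (m - 1.0)
                row = [1.0 / sum((dists[j] / dists[k]) ** p for k in range(c))
                       for j in range(c)]
            new_u.append(row)
        # Stop once no membership changed by more than tol.
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


def hard_labels(memberships):
    """Assign each point to its highest-membership cluster."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]
```

Each user can then be assigned to a single cluster with `hard_labels`. One plausible use, not stated in the abstract, is to restrict neighbour search or model training to users in the same cluster, which shrinks each similarity computation.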
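The abstract does not give formulas for the combined similarity or the hybrid function. The sketch below assumes a linear blend `alpha * cosine + (1 - alpha) * pearson` computed over co-rated items, a mean-centred user-based prediction from the top-`k` neighbours, and a final score `beta * als_pred + (1 - beta) * cf_pred`; `alpha`, `beta`, `k`, and all function names are hypothetical. The ALS score would come from a trained factorization model (for instance Spark's `pyspark.ml.recommendation.ALS`) and is simply passed in as a number here.

```python
import math


def cosine_sim(a, b):
    """Cosine similarity of two users ({item: rating}) over co-rated items."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    na = math.sqrt(sum(a[i] ** 2 for i in common))
    nb = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (na * nb) if na and nb else 0.0


def pearson_sim(a, b):
    """Pearson correlation of two users over co-rated items."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    ma = sum(a[i] for i in common) / len(common)
    mb = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - ma) * (b[i] - mb) for i in common)
    va = math.sqrt(sum((a[i] - ma) ** 2 for i in common))
    vb = math.sqrt(sum((b[i] - mb) ** 2 for i in common))
    return cov / (va * vb) if va and vb else 0.0


def blended_sim(a, b, alpha=0.5):
    """Hypothetical linear blend of cosine and Pearson similarity."""
    return alpha * cosine_sim(a, b) + (1 - alpha) * pearson_sim(a, b)


def user_cf_predict(ratings, user, item, k=2, alpha=0.5):
    """Mean-centred user-based CF prediction from the k most similar users who rated item."""
    mu = sum(ratings[user].values()) / len(ratings[user])
    neighbours = sorted(
        ((blended_sim(ratings[user], r, alpha), v) for v, r in ratings.items()
         if v != user and item in r),
        reverse=True)[:k]
    # Weighted sum of each neighbour's deviation from their own mean rating.
    num = sum(s * (ratings[v][item] - sum(ratings[v].values()) / len(ratings[v]))
              for s, v in neighbours)
    den = sum(abs(s) for s, _ in neighbours)
    return mu + num / den if den else mu


def hybrid_score(als_pred, cf_pred, beta=0.5):
    """Hypothetical hybrid: weighted sum of the ALS and user-CF predictions."""
    return beta * als_pred + (1 - beta) * cf_pred
```

The mean-centred prediction can fall outside the rating scale, so a practical version would typically clip it before blending; `alpha` and `beta` would be tuned on held-out ratings.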