Efficient Distributed MST Based Clustering for Recommender Systems
Ahmad Shahzad, Frans Coenen
2020 International Conference on Data Mining Workshops (ICDMW), 2020-11-01
DOI: https://doi.org/10.1109/ICDMW51313.2020.00037
Citations: 1
Abstract
This paper presents the Distributed Kruskal Algorithm for Minimum Spanning Tree (MST) based clustering, intended for use in recommendation engines. The algorithm can operate over large graph data sets distributed across a number of machines. It is evaluated by comparing both the quality of the cluster configurations produced and the accuracy of the resulting predictions against non-MST-based clustering approaches. The results indicate that the proposed approach produces comparable recommendations at much lower storage, and hence runtime, costs.
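The abstract does not spell out how the distributed algorithm partitions or merges work, so the following is only a single-machine sketch of the general idea behind MST-based clustering: build a minimum spanning tree with Kruskal's algorithm, then drop the k-1 heaviest tree edges so that the remaining forest has k connected components. The names `UnionFind`, `kruskal_mst`, and `mst_clusters`, the `(weight, u, v)` edge format, and the "cut the heaviest edges" rule are all illustrative assumptions, not the authors' implementation.

```python
from typing import Hashable, Iterable, List, Set, Tuple

# Hypothetical edge format for this sketch: (weight, node_u, node_v).
Edge = Tuple[float, Hashable, Hashable]


class UnionFind:
    """Disjoint-set structure used by Kruskal's algorithm to detect cycles."""

    def __init__(self, nodes: Iterable[Hashable]) -> None:
        self.parent = {n: n for n in nodes}
        self.size = {n: 1 for n in self.parent}

    def find(self, x: Hashable) -> Hashable:
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> bool:
        """Merge the sets containing a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


def kruskal_mst(nodes: List[Hashable], edges: List[Edge]) -> List[Edge]:
    """Return MST edges in ascending weight order (a forest if disconnected)."""
    uf = UnionFind(nodes)
    mst = []
    for w, u, v in sorted(edges, key=lambda e: e[0]):
        if uf.union(u, v):  # keep the edge only if it joins two components
            mst.append((w, u, v))
    return mst


def mst_clusters(nodes: List[Hashable], edges: List[Edge], k: int) -> List[Set[Hashable]]:
    """Cluster by removing the k-1 heaviest MST edges (assumed cut rule)."""
    mst = kruskal_mst(nodes, edges)
    kept = mst[: max(len(mst) - (k - 1), 0)]
    uf = UnionFind(nodes)
    for _, u, v in kept:
        uf.union(u, v)
    groups = {}
    for n in nodes:
        groups.setdefault(uf.find(n), set()).add(n)
    return list(groups.values())


if __name__ == "__main__":
    toy_nodes = ["a", "b", "c", "d"]
    toy_edges = [(1.0, "a", "b"), (1.0, "c", "d"), (10.0, "b", "c"), (12.0, "a", "d")]
    print(mst_clusters(toy_nodes, toy_edges, k=2))
```

Only the tree edges (at most n-1 for n nodes) need to be retained after the MST is built, which is one plausible reading of the storage saving the abstract mentions; a distributed version would additionally need a strategy for combining partial trees computed on separate machines, which this sketch does not attempt.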