Efficient Distributed Matrix Factorization Alternating Least Squares (EDMFALS) for Recommendation Systems Using Spark
Authors: R. R. S. Ravi Kumar, G. Appa Rao, S. Anuradha
Journal: Journal of Information & Knowledge Management (Impact Factor 0.9; JCR Q3, Information Science & Library Science)
Publication date: 2021-12-04
Publication type: Journal Article
DOI: 10.1142/s0219649222500125 (https://doi.org/10.1142/s0219649222500125)
Citations: 0
Abstract
With the emergence of e-commerce and social networking systems, recommendation systems have become a popular way to predict users' ratings of items. Because large volumes of data are generated from various sources at high speed, predicting ratings accurately in real time greatly helps users choose the right item. A recommendation system must therefore predict ratings accurately even when the data are large. Apache Spark is a distributed framework well suited to processing large datasets and real-time data streams. In this paper, we propose an efficient matrix factorisation algorithm for collaborative filtering based on Spark MLlib alternating least squares (ALS). Optimising the proposed algorithm with Tungsten significantly improved its prediction performance. The experimental results show that the proposed work is significantly faster than existing works for top-N recommendations and rating predictions.
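To make the underlying technique concrete, here is a minimal single-machine sketch of alternating least squares for matrix factorisation, written in plain Python. It is not the authors' EDMFALS algorithm and not Spark MLlib's implementation; it involves no distribution and no Tungsten optimisation. All names (`solve`, `update_factors`, `als`, `predict`, `objective`, `RATINGS`) and the toy ratings are assumptions made for illustration only.

```python
import random

# Hypothetical toy data: (user, item, rating) triples for 4 users and 4 items.
RATINGS = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5), (2, 2, 5),
           (2, 3, 4), (3, 2, 4), (3, 3, 5), (0, 2, 1), (2, 0, 1)]

def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def update_factors(fixed, observed_per_row, rank, reg):
    """Hold one side fixed and re-solve a ridge regression for every row of the other side."""
    updated = []
    for observed in observed_per_row:  # observed: list of (index into fixed, rating)
        A = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, rating in observed:
            v = fixed[other]
            for i in range(rank):
                b[i] += rating * v[i]
                for j in range(rank):
                    A[i][j] += v[i] * v[j]
        updated.append(solve(A, b))  # normal equations: (V^T V + reg I) x = V^T r
    return updated

def als(ratings, n_users, n_items, rank=2, reg=0.1, iterations=20, seed=0):
    """Alternate between updating user factors and item factors."""
    rng = random.Random(seed)
    users = [[rng.random() for _ in range(rank)] for _ in range(n_users)]
    items = [[rng.random() for _ in range(rank)] for _ in range(n_items)]
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    for _ in range(iterations):
        users = update_factors(items, by_user, rank, reg)
        items = update_factors(users, by_item, rank, reg)
    return users, items

def predict(users, items, u, i):
    """Predicted rating is the dot product of the user and item factor vectors."""
    return sum(a * b for a, b in zip(users[u], items[i]))

def objective(ratings, users, items, reg):
    """Regularised squared error that each ALS half-step minimises over one side."""
    sq_err = sum((r - predict(users, items, u, i)) ** 2 for u, i, r in ratings)
    penalty = sum(x * x for row in users + items for x in row)
    return sq_err + reg * penalty

if __name__ == "__main__":
    users, items = als(RATINGS, n_users=4, n_items=4)
    print("objective:", objective(RATINGS, users, items, 0.1))
    print("predicted rating for user 1, item 2:", predict(users, items, 1, 2))
```

Each row's update depends only on the fixed factors and that row's own ratings, so the per-row solves are independent of one another. That independence is what makes ALS a natural fit for a distributed framework such as Spark.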
About the journal:
JIKM is a refereed journal published quarterly by World Scientific and dedicated to the exchange of the latest research and practical information in the field of information processing and knowledge management. The journal publishes original research and case studies by academic, business and government contributors on all aspects of information processing, information management, knowledge management, tools, techniques and technologies, knowledge creation and sharing, best practices, policies and guidelines. JIKM is an international journal aimed at providing quality information to subscribers around the world. Managed by an international editorial board, JIKM positions itself as one of the leading scholarly journals in the field of information processing and knowledge management. It is a good reference for both information and knowledge management professionals.

The journal covers key areas in the field of information and knowledge management. Research papers, practical applications, working papers, and case studies are invited in the following areas:
- Business intelligence and competitive intelligence
- Communication and organizational culture
- e-Learning and lifelong learning
- Electronic records and document management
- Information processing and information management
- Information organization, taxonomies and ontology
- Intellectual capital
- Knowledge creation, retention, sharing and transfer
- Knowledge discovery, data and text mining
- Knowledge management and innovations
- Knowledge management education
- Knowledge management tools and technologies
- Knowledge management measurements
- Knowledge professionals and leadership
- Learning organization and organizational learning
- Practical implementations of knowledge management