Improving top-N recommendations using batch approximation for weighted pair-wise loss

Sofia Aftab, Heri Ramampiaro
{"title":"Improving top-N recommendations using batch approximation for weighted pair-wise loss","authors":"Sofia Aftab,&nbsp;Heri Ramampiaro","doi":"10.1016/j.mlwa.2023.100520","DOIUrl":null,"url":null,"abstract":"<div><p>In collaborative filtering, matrix factorization and collaborative metric learning are challenged by situations where non-preferred items may appear so close to a user in the feature embedding space that they lead to degrading the recommendation performance. We call such items ‘potential impostor’ risks. Addressing the issues with ‘potential impostor’ is important because it can result in inefficient learning and poor feature extraction. To achieve this, we propose a novel loss function formulation designed to enhance learning efficiency by actively identifying and addressing impostors, leveraging item associations and learning the distribution of negative items. This approach is crucial for models to differentiate between positive and negative items effectively, even when they are closely aligned in the feature space. Here, a loss function is generally an objective optimization function that is defined based on user–item interaction data, through either implicit or explicit feedback. The loss function essentially decides how well a recommendation algorithm performs. In this paper, we introduce and define the concept of ‘potential impostor’, highlighting its impact on learned representation quality and algorithmic efficiency. We tackle the limitations of non-metric methods, like the Weighted Approximate Rank Pairwise Loss (WARP) method, which struggles to capture item–item similarities, by using a ‘similarity propagation’ strategy with a new loss term. Similarly, we address fixed margin inefficiencies in Weighted Collaborative Metric Learning (WCML), through density distribution approximation. This moves potential impostors away from the margin for more robust learning. Additionally, we propose a large-scale batch approximation algorithm for increased detection of impostors, coupled with an active learning strategy for improved top-<span><math><mi>N</mi></math></span> recommendation performance. Our extensive empirical analysis across five major and diverse datasets demonstrates the effectiveness and feasibility of our methods, compared to existing techniques with respect to improving AUC, reducing impostor rate, and increasing the average distance metrics. More specifically, our evaluation shows that our two proposed methods outperform the existing state-of-the-art techniques, with an improvement of AUC by 3.5% and 3.7%, NDCG by 1.0% and 9.1% and HR by 1.3% and 3.6%, respectively. 
Similarly, the impostor rate is decreased by 35% and 18%, and their average distance is increased by 33% and 37%, respectively.</p></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"15 ","pages":"Article 100520"},"PeriodicalIF":0.0000,"publicationDate":"2023-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666827023000737/pdfft?md5=9ed329936dd4420c5fffd4c4464c6908&pid=1-s2.0-S2666827023000737-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning with applications","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666827023000737","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

In collaborative filtering, matrix factorization and collaborative metric learning are challenged by situations where non-preferred items appear so close to a user in the feature embedding space that they degrade recommendation performance. We call such items 'potential impostors'. Addressing potential impostors is important because they can cause inefficient learning and poor feature extraction. To address this, we propose a novel loss function formulation designed to enhance learning efficiency by actively identifying and handling impostors, leveraging item associations, and learning the distribution of negative items. This enables models to differentiate effectively between positive and negative items, even when they are closely aligned in the feature space. Here, a loss function is an optimization objective defined over user–item interaction data obtained through implicit or explicit feedback, and it largely determines how well a recommendation algorithm performs. In this paper, we introduce and define the concept of the 'potential impostor', highlighting its impact on learned representation quality and algorithmic efficiency. We tackle a limitation of non-metric methods such as the Weighted Approximate Rank Pairwise (WARP) loss, which struggle to capture item–item similarities, by using a 'similarity propagation' strategy with a new loss term. Similarly, we address the inefficiency of the fixed margin in Weighted Collaborative Metric Learning (WCML) through density distribution approximation, which moves potential impostors away from the margin for more robust learning. Additionally, we propose a large-scale batch approximation algorithm that increases impostor detection, coupled with an active learning strategy that improves top-N recommendation performance. Our extensive empirical analysis across five major and diverse datasets demonstrates the effectiveness and feasibility of our methods compared to existing techniques, with respect to improving AUC, reducing the impostor rate, and increasing the average distance. More specifically, our evaluation shows that our two proposed methods outperform existing state-of-the-art techniques, improving AUC by 3.5% and 3.7%, NDCG by 1.0% and 9.1%, and HR by 1.3% and 3.6%, respectively. Similarly, the impostor rate decreases by 35% and 18%, and the average distance increases by 33% and 37%, respectively.
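To make the building blocks the abstract refers to concrete, the sketch below illustrates two of them: WARP-style negative sampling with an approximate-rank weight, and batch-level counting of 'potential impostors' (negatives that fall inside the margin around a user's positive item). This is a minimal NumPy illustration of the standard techniques, not the paper's implementation; the embeddings, margin value, and function names are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, dim = 8, 200, 16
margin = 1.0

# Randomly initialized embeddings stand in for a trained model.
U = rng.normal(scale=0.1, size=(n_users, dim))   # user embeddings
V = rng.normal(scale=0.1, size=(n_items, dim))   # item embeddings


def warp_sample(u, pos_idx, margin, max_trials=50):
    """WARP-style sampling: draw negatives until one violates the margin,
    then weight the loss by an approximate rank, log((n_items - 1) / trials)."""
    d_pos = np.linalg.norm(u - V[pos_idx])
    for trials in range(1, max_trials + 1):
        neg_idx = int(rng.integers(n_items))
        if neg_idx == pos_idx:
            continue
        d_neg = np.linalg.norm(u - V[neg_idx])
        if d_pos + margin > d_neg:          # margin violated: an impostor found
            weight = np.log((n_items - 1) / trials)
            return neg_idx, weight * (margin + d_pos - d_neg)
    return None, 0.0                        # no violating negative found


def count_batch_impostors(u, pos_idx, neg_indices, margin):
    """Batch impostor detection: count negatives sitting inside the margin
    around the user's positive item, i.e. 'potential impostors'."""
    d_pos = np.linalg.norm(u - V[pos_idx])
    d_neg = np.linalg.norm(u - V[neg_indices], axis=1)
    return int((d_neg < d_pos + margin).sum())


u, pos_idx = U[0], 0
neg_batch = np.arange(1, n_items)            # all other items as candidate negatives
print(warp_sample(u, pos_idx, margin))
print(count_batch_impostors(u, pos_idx, neg_batch, margin))
```

On this reading, a batch approximation amounts to checking many negatives at once (as in `count_batch_impostors`) rather than sampling them one at a time, which is consistent with the abstract's claim of increased impostor detection per update.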

Latest articles in Machine learning with applications

Document Layout Error Rate (DLER) metric to evaluate image segmentation methods
Supervised machine learning for microbiomics: Bridging the gap between current and best practices
Playing with words: Comparing the vocabulary and lexical diversity of ChatGPT and humans
A survey on knowledge distillation: Recent advancements
Texas rural land market integration: A causal analysis using machine learning applications