Improving top-N recommendations using batch approximation for weighted pair-wise loss

Sofia Aftab, Heri Ramampiaro
{"title":"Improving top-N recommendations using batch approximation for weighted pair-wise loss","authors":"Sofia Aftab,&nbsp;Heri Ramampiaro","doi":"10.1016/j.mlwa.2023.100520","DOIUrl":null,"url":null,"abstract":"<div><p>In collaborative filtering, matrix factorization and collaborative metric learning are challenged by situations where non-preferred items may appear so close to a user in the feature embedding space that they lead to degrading the recommendation performance. We call such items ‘potential impostor’ risks. Addressing the issues with ‘potential impostor’ is important because it can result in inefficient learning and poor feature extraction. To achieve this, we propose a novel loss function formulation designed to enhance learning efficiency by actively identifying and addressing impostors, leveraging item associations and learning the distribution of negative items. This approach is crucial for models to differentiate between positive and negative items effectively, even when they are closely aligned in the feature space. Here, a loss function is generally an objective optimization function that is defined based on user–item interaction data, through either implicit or explicit feedback. The loss function essentially decides how well a recommendation algorithm performs. In this paper, we introduce and define the concept of ‘potential impostor’, highlighting its impact on learned representation quality and algorithmic efficiency. We tackle the limitations of non-metric methods, like the Weighted Approximate Rank Pairwise Loss (WARP) method, which struggles to capture item–item similarities, by using a ‘similarity propagation’ strategy with a new loss term. Similarly, we address fixed margin inefficiencies in Weighted Collaborative Metric Learning (WCML), through density distribution approximation. This moves potential impostors away from the margin for more robust learning. Additionally, we propose a large-scale batch approximation algorithm for increased detection of impostors, coupled with an active learning strategy for improved top-<span><math><mi>N</mi></math></span> recommendation performance. Our extensive empirical analysis across five major and diverse datasets demonstrates the effectiveness and feasibility of our methods, compared to existing techniques with respect to improving AUC, reducing impostor rate, and increasing the average distance metrics. More specifically, our evaluation shows that our two proposed methods outperform the existing state-of-the-art techniques, with an improvement of AUC by 3.5% and 3.7%, NDCG by 1.0% and 9.1% and HR by 1.3% and 3.6%, respectively. 
Similarly, the impostor rate is decreased by 35% and 18%, and their average distance is increased by 33% and 37%, respectively.</p></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"15 ","pages":"Article 100520"},"PeriodicalIF":0.0000,"publicationDate":"2023-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666827023000737/pdfft?md5=9ed329936dd4420c5fffd4c4464c6908&pid=1-s2.0-S2666827023000737-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning with applications","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666827023000737","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

In collaborative filtering, matrix factorization and collaborative metric learning are challenged by situations where non-preferred items appear so close to a user in the feature embedding space that they degrade recommendation performance. We call such items 'potential impostors'. Addressing potential impostors is important because they can cause inefficient learning and poor feature extraction. To address this, we propose a novel loss function formulation designed to enhance learning efficiency by actively identifying and handling impostors, leveraging item associations, and learning the distribution of negative items. This enables models to differentiate effectively between positive and negative items, even when they are closely aligned in the feature space. Here, a loss function is an optimization objective defined over user–item interaction data obtained through implicit or explicit feedback, and it largely determines how well a recommendation algorithm performs. In this paper, we introduce and define the concept of the 'potential impostor', highlighting its impact on learned representation quality and algorithmic efficiency. We tackle a limitation of non-metric methods such as the Weighted Approximate Rank Pairwise (WARP) loss, which struggle to capture item–item similarities, by using a 'similarity propagation' strategy with a new loss term. Similarly, we address the inefficiency of the fixed margin in Weighted Collaborative Metric Learning (WCML) through density distribution approximation, which moves potential impostors away from the margin for more robust learning. Additionally, we propose a large-scale batch approximation algorithm that increases impostor detection, coupled with an active learning strategy that improves top-N recommendation performance. Our extensive empirical analysis across five major and diverse datasets demonstrates the effectiveness and feasibility of our methods compared to existing techniques, with respect to improving AUC, reducing the impostor rate, and increasing the average distance. More specifically, our evaluation shows that our two proposed methods outperform existing state-of-the-art techniques, improving AUC by 3.5% and 3.7%, NDCG by 1.0% and 9.1%, and HR by 1.3% and 3.6%, respectively. Similarly, the impostor rate decreases by 35% and 18%, and the average distance increases by 33% and 37%, respectively.
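To make the building blocks the abstract refers to concrete, the sketch below illustrates two of them: WARP-style negative sampling with an approximate-rank weight, and batch-level counting of 'potential impostors' (negatives that fall inside the margin around a user's positive item). This is a minimal NumPy illustration of the standard techniques, not the paper's implementation; the embeddings, margin value, and function names are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, dim = 8, 200, 16
margin = 1.0

# Randomly initialized embeddings stand in for a trained model.
U = rng.normal(scale=0.1, size=(n_users, dim))   # user embeddings
V = rng.normal(scale=0.1, size=(n_items, dim))   # item embeddings


def warp_sample(u, pos_idx, margin, max_trials=50):
    """WARP-style sampling: draw negatives until one violates the margin,
    then weight the loss by an approximate rank, log((n_items - 1) / trials)."""
    d_pos = np.linalg.norm(u - V[pos_idx])
    for trials in range(1, max_trials + 1):
        neg_idx = int(rng.integers(n_items))
        if neg_idx == pos_idx:
            continue
        d_neg = np.linalg.norm(u - V[neg_idx])
        if d_pos + margin > d_neg:          # margin violated: an impostor found
            weight = np.log((n_items - 1) / trials)
            return neg_idx, weight * (margin + d_pos - d_neg)
    return None, 0.0                        # no violating negative found


def count_batch_impostors(u, pos_idx, neg_indices, margin):
    """Batch impostor detection: count negatives sitting inside the margin
    around the user's positive item, i.e. 'potential impostors'."""
    d_pos = np.linalg.norm(u - V[pos_idx])
    d_neg = np.linalg.norm(u - V[neg_indices], axis=1)
    return int((d_neg < d_pos + margin).sum())


u, pos_idx = U[0], 0
neg_batch = np.arange(1, n_items)            # all other items as candidate negatives
print(warp_sample(u, pos_idx, margin))
print(count_batch_impostors(u, pos_idx, neg_batch, margin))
```

On this reading, a batch approximation amounts to checking many negatives at once (as in `count_batch_impostors`) rather than sampling them one at a time, which is consistent with the abstract's claim of increased impostor detection per update.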

Latest articles in Machine learning with applications

Document Layout Error Rate (DLER) metric to evaluate image segmentation methods
Supervised machine learning for microbiomics: Bridging the gap between current and best practices
Playing with words: Comparing the vocabulary and lexical diversity of ChatGPT and humans
A survey on knowledge distillation: Recent advancements
Texas rural land market integration: A causal analysis using machine learning applications