Large Language Model Enhanced Hard Sample Identification for Denoising Recommendation

Tianrui Song, Wenshuo Chao, Hao Liu
DOI: arxiv-2409.10343 (https://doi.org/arxiv-2409.10343)
Journal: arXiv - CS - Information Retrieval
Publication date: 2024-09-16
Citation count: 0

Abstract

Implicit feedback, often used to build recommender systems, unavoidably confronts noise due to factors such as misclicks and position bias. Previous studies have attempted to alleviate this by identifying noisy samples based on their diverged patterns, such as higher loss values, and mitigating the noise through sample dropping or reweighting. Despite the progress, we observe existing approaches struggle to distinguish hard samples and noise samples, as they often exhibit similar patterns, thereby limiting their effectiveness in denoising recommendations. To address this challenge, we propose a Large Language Model Enhanced Hard Sample Denoising (LLMHD) framework. Specifically, we construct an LLM-based scorer to evaluate the semantic consistency of items with the user preference, which is quantified based on summarized historical user interactions. The resulting scores are used to assess the hardness of samples for the pointwise or pairwise training objectives. To ensure efficiency, we introduce a variance-based sample pruning strategy to filter potential hard samples before scoring. Besides, we propose an iterative preference update module designed to continuously refine summarized user preference, which may be biased due to false-positive user-item interactions. Extensive experiments on three real-world datasets and four backbone recommenders demonstrate the effectiveness of our approach.
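The variance-based sample pruning step described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the `loss_history` shape, the use of variance over epoch-wise training losses as the hardness signal, and the `keep_ratio` parameter are all assumptions made for the sketch. The intuition is that hard and noisy samples tend to show unstable (high-variance) losses across epochs, so only those candidates need to be forwarded to the more expensive LLM scorer.

```python
import numpy as np

def variance_based_pruning(loss_history, keep_ratio=0.2):
    """Select candidate hard samples by the variance of their per-epoch losses.

    loss_history: array of shape (n_samples, n_epochs) holding each
        sample's training loss at every recorded epoch.
    keep_ratio: fraction of samples (those with the highest loss
        variance) forwarded to the LLM scorer; the rest are pruned.
    Returns the indices of the retained candidate samples.
    """
    variances = np.var(loss_history, axis=1)           # per-sample loss variance
    n_keep = max(1, int(len(variances) * keep_ratio))  # prune before LLM scoring
    return np.argsort(variances)[-n_keep:]             # highest-variance samples

# Toy example: 5 samples tracked over 4 epochs.
history = np.array([
    [0.9, 0.8, 0.8, 0.8],  # stable loss -> likely easy/clean
    [0.9, 0.2, 0.9, 0.1],  # oscillating loss -> candidate hard sample
    [0.5, 0.5, 0.5, 0.5],
    [0.8, 0.1, 0.7, 0.2],  # oscillating loss -> candidate hard sample
    [0.3, 0.3, 0.3, 0.3],
])
candidates = variance_based_pruning(history, keep_ratio=0.4)
print(sorted(candidates.tolist()))  # -> [1, 3]
```

Only the two oscillating rows survive pruning, which matches the abstract's efficiency argument: the LLM scorer is invoked on a small candidate set rather than on the full interaction log.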