{"title":"Large Language Model Enhanced Hard Sample Identification for Denoising Recommendation","authors":"Tianrui Song, Wenshuo Chao, Hao Liu","doi":"arxiv-2409.10343","DOIUrl":null,"url":null,"abstract":"Implicit feedback, often used to build recommender systems, unavoidably\nconfronts noise due to factors such as misclicks and position bias. Previous\nstudies have attempted to alleviate this by identifying noisy samples based on\ntheir diverged patterns, such as higher loss values, and mitigating the noise\nthrough sample dropping or reweighting. Despite the progress, we observe\nexisting approaches struggle to distinguish hard samples and noise samples, as\nthey often exhibit similar patterns, thereby limiting their effectiveness in\ndenoising recommendations. To address this challenge, we propose a Large\nLanguage Model Enhanced Hard Sample Denoising (LLMHD) framework. Specifically,\nwe construct an LLM-based scorer to evaluate the semantic consistency of items\nwith the user preference, which is quantified based on summarized historical\nuser interactions. The resulting scores are used to assess the hardness of\nsamples for the pointwise or pairwise training objectives. To ensure\nefficiency, we introduce a variance-based sample pruning strategy to filter\npotential hard samples before scoring. Besides, we propose an iterative\npreference update module designed to continuously refine summarized user\npreference, which may be biased due to false-positive user-item interactions.\nExtensive experiments on three real-world datasets and four backbone\nrecommenders demonstrate the effectiveness of our approach.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"69 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.10343","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Implicit feedback, often used to build recommender systems, is unavoidably confronted with noise from factors such as misclicks and position bias. Previous studies have attempted to alleviate this by identifying noisy samples through their divergent patterns, such as higher loss values, and mitigating the noise by dropping or reweighting those samples. Despite this progress, we observe that existing approaches struggle to distinguish hard samples from noisy samples, as the two often exhibit similar patterns, which limits their effectiveness in denoising recommendation. To address this challenge, we propose a Large Language Model Enhanced Hard Sample Denoising (LLMHD) framework. Specifically, we construct an LLM-based scorer to evaluate the semantic consistency of items with the user preference, which is quantified from summarized historical user interactions. The resulting scores are used to assess the hardness of samples under pointwise or pairwise training objectives. To ensure efficiency, we introduce a variance-based sample pruning strategy that filters potential hard samples before scoring. In addition, we propose an iterative preference update module designed to continuously refine the summarized user preference, which may be biased by false-positive user-item interactions. Extensive experiments on three real-world datasets and four backbone recommenders demonstrate the effectiveness of our approach.
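
To make the two-stage idea concrete, below is a minimal Python sketch of the kind of pipeline the abstract describes: variance-based pruning first narrows training interactions to an ambiguous candidate set, and an LLM scorer then separates hard samples (semantically consistent with the summarized user preference) from likely noise. All function names, the 0-1 scoring prompt, and the 0.5 threshold are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of an LLMHD-style denoising pipeline.
# Hypothetical names and prompt format; the paper's real prompts,
# thresholds, and interfaces may differ.
import numpy as np

def prune_by_variance(loss_history: np.ndarray, top_ratio: float = 0.2) -> np.ndarray:
    """Select candidate hard/noisy samples by per-sample loss variance.

    loss_history: (num_samples, num_epochs) training losses per sample.
    Both hard and noisy samples tend to show unstable (high-variance)
    losses, so the top `top_ratio` fraction by variance is kept for
    LLM scoring, avoiding an LLM call on every interaction.
    """
    variances = loss_history.var(axis=1)
    k = max(1, int(top_ratio * len(variances)))
    return np.argsort(variances)[-k:]

def llm_consistency_score(preference_summary: str, item_text: str, llm) -> float:
    """Ask an LLM how consistent an item is with the summarized user
    preference. `llm` is any callable prompt -> text; parsing a single
    0-1 number from the reply is an assumed output format."""
    prompt = (
        f"User preference summary: {preference_summary}\n"
        f"Candidate item: {item_text}\n"
        "On a scale from 0 to 1, how consistent is this item with the "
        "user's preference? Reply with a single number."
    )
    return float(llm(prompt).strip())

def relabel_candidates(candidates, scores, threshold=0.5):
    """Split pruned candidates into hard samples (consistent with the
    preference, so kept or upweighted in training) and likely noise
    (inconsistent, so dropped or downweighted)."""
    hard = [i for i, s in zip(candidates, scores) if s >= threshold]
    noisy = [i for i, s in zip(candidates, scores) if s < threshold]
    return hard, noisy
```

In the full framework, the preference summary fed to the scorer would itself be refined iteratively, since a summary built from raw interaction history can be biased by false-positive interactions; the sketch omits that update loop.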