In recent years, various dense retrieval methods have been developed to improve the performance of search engines with a vectorized index. However, these approaches require a large pre-computed index and have limited capacity to memorize all the semantics of a document within a single vector. To address these issues, researchers have explored end-to-end generative retrieval models that use a seq-to-seq generative model to directly return the identifiers of relevant documents. Although these models have been effective, they are often trained with maximum likelihood estimation, which only encourages the model to assign a high probability to the relevant document identifier and ignores relevance comparisons with other documents. This may degrade performance in ranking tasks, whose core is comparing the relevance of documents. To address this issue, we propose a ranking-oriented generative retrieval model that incorporates relevance signals in order to better estimate the relative relevance of different documents in ranking tasks. Based on an analysis of the optimization objectives of dense retrieval and generative retrieval, we propose utilizing dense retrieval to provide relevance feedback for generative retrieval. Under an alternating training framework, the generative retrieval model gradually acquires higher-quality ranking signals with which to optimize the model. Experimental results show that our approach increases Recall@1 by 12.9% over the baselines on the MS MARCO dataset.
ROGER: Ranking-oriented Generative Retrieval. Yujia Zhou, Jing Yao, Zhicheng Dou, Yiteng Tu, Ledell Wu, Tat-Seng Chua, Ji-Rong Wen. ACM Transactions on Information Systems, published 2024-06-03. https://doi.org/10.1145/3603167
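The contrast the ROGER abstract draws between likelihood-only training and ranking-aware training can be illustrated with a small sketch. The abstract does not state the loss it uses, so the listwise KL term below is an assumption chosen for illustration, and the score values and function names are hypothetical. The "student" stands in for the generative retriever's scores over candidate identifiers, and the "teacher" for dense-retrieval relevance scores.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mle_loss(student_scores, positive_index):
    """Negative log-likelihood of the relevant identifier only."""
    return -math.log(softmax(student_scores)[positive_index])

def listwise_distill_loss(student_scores, teacher_scores):
    """KL(teacher || student) over the whole candidate list (assumed loss)."""
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical candidates: document 0 is relevant, 1 is partly relevant, 2 is not.
teacher = [3.0, 2.0, 0.0]      # assumed dense-retriever scores
student_a = [2.0, 1.0, 0.0]    # ranks the non-positives like the teacher
student_b = [2.0, 0.0, 1.0]    # swaps the two non-positive documents

print(mle_loss(student_a, 0), mle_loss(student_b, 0))
print(listwise_distill_loss(student_a, teacher),
      listwise_distill_loss(student_b, teacher))
```

Both students receive the same likelihood loss because they score the positive identically, while the listwise term penalizes the student that orders the remaining documents differently from the teacher.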
Visually-aware recommender systems have found widespread application in domains where visual elements contribute significantly to inferring users’ potential preferences. While incorporating visual information promises to enhance recommendation accuracy and alleviate the cold-start problem, the inclusion of item images may also introduce substantial security challenges. Some existing works have shown that an item provider can manipulate item exposure rates to its advantage by constructing adversarial images. However, these works cannot reveal the real vulnerability of visually-aware recommender systems because (1) the generated adversarial images are markedly distorted, so human observers can easily detect them; and (2) the attacks are inconsistent, and even ineffective, in some scenarios or datasets. To shed light on the real vulnerabilities of visually-aware recommender systems when confronted with adversarial images, this paper introduces a novel attack method, IPDGI (Item Promotion by Diffusion Generated Image). Specifically, IPDGI employs a guided diffusion model to generate adversarial samples designed to promote the exposure rates of target items (e.g., long-tail items). Because diffusion models accurately model the distribution of benign images, the generated adversarial images have high fidelity to the original images, ensuring the stealth of IPDGI. To demonstrate the effectiveness of our proposed method, we conduct extensive experiments on two commonly used e-commerce recommendation datasets (Amazon Beauty and Amazon Baby) with several typical visually-aware recommender systems. The experimental results show that our attack method significantly improves both the promotion of long-tail (i.e., unpopular) items and the quality of the generated adversarial images.
Adversarial Item Promotion on Visually-Aware Recommender Systems by Guided Diffusion. Lijian Chen, Wei Yuan, Tong Chen, Guanhua Ye, Nguyen Quoc Viet Hung, Hongzhi Yin. ACM Transactions on Information Systems, published 2024-05-28. https://doi.org/10.1145/3666088
Sebastian Bruch, Franco Maria Nardini, Amir Ingber, Edo Liberty
Maximum inner product search (MIPS) over dense and sparse vectors has progressed independently in a bifurcated literature for decades; the latter is better known as top-\(k\) retrieval in Information Retrieval. This duality exists because sparse and dense vectors serve different end goals, even though they are manifestations of the same mathematical problem. In this work, we ask whether algorithms for dense vectors can be applied effectively to sparse vectors, particularly those that violate the assumptions underlying top-\(k\) retrieval methods. We study clustering-based approximate MIPS, in which vectors are partitioned into clusters and only a fraction of the clusters are searched during retrieval. We conduct a comprehensive analysis of dimensionality reduction for sparse vectors, and examine standard and spherical KMeans for partitioning. Our experiments demonstrate that clustering-based retrieval serves as an efficient solution for sparse MIPS. As byproducts, we identify two research opportunities and explore their potential. First, we cast the clustering-based paradigm as dynamic pruning and turn that insight into a novel organization of the inverted index for approximate MIPS over general sparse vectors. Second, we offer a unified regime for MIPS over vectors that have dense and sparse subspaces, which is robust to query distributions.
Bridging Dense and Sparse Maximum Inner Product Search. Sebastian Bruch, Franco Maria Nardini, Amir Ingber, Edo Liberty. ACM Transactions on Information Systems, published 2024-05-17. https://doi.org/10.1145/3665324
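The clustering-based approximate MIPS described in the abstract above (partition the vectors, then search only a fraction of the clusters) can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: it uses small dense random vectors and plain Lloyd-style KMeans, omits the dimensionality reduction and spherical KMeans the abstract mentions, and all function names and the `n_probe` parameter are hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def kmeans(vectors, k, iters=10, seed=0):
    """Plain Lloyd iterations; returns centroids and per-vector assignments."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid (squared Euclidean distance).
        for i, v in enumerate(vectors):
            assign[i] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])),
            )
        # Move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign

def build_index(vectors, k):
    centroids, assign = kmeans(vectors, k)
    clusters = [[] for _ in range(k)]
    for i, c in enumerate(assign):
        clusters[c].append(i)
    return centroids, clusters

def search(query, vectors, centroids, clusters, n_probe, top_k):
    """Score only the vectors in the n_probe clusters whose centroids best match the query."""
    ranked = sorted(range(len(centroids)),
                    key=lambda c: dot(query, centroids[c]), reverse=True)
    candidates = [i for c in ranked[:n_probe] for i in clusters[c]]
    candidates.sort(key=lambda i: dot(query, vectors[i]), reverse=True)
    return candidates[:top_k]

rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
query = [rng.gauss(0, 1) for _ in range(8)]
centroids, clusters = build_index(vectors, k=4)

exact = sorted(range(len(vectors)),
               key=lambda i: dot(query, vectors[i]), reverse=True)[:5]
approx = search(query, vectors, centroids, clusters, n_probe=1, top_k=5)
full = search(query, vectors, centroids, clusters, n_probe=4, top_k=5)
print(exact, approx, full)
```

Probing every cluster reproduces the exact result; lowering `n_probe` trades recall for fewer inner products computed.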
Existing sequential POI recommendation methods overlook the fact that each city exhibits distinct characteristics, and ignore the city signature entirely. In this study, we claim that the city matters in sequential POI recommendation, and that fully exploring the city signature can highlight the characteristics of each city and facilitate cross-city complementary learning. To this end, we consider the two-city scenario and propose a dual-target cross-city sequential POI recommendation model (DCSPR) that performs complementary learning across cities. On the one hand, DCSPR captures the geographical and cultural characteristics of each city by mining the intra-city regions and intra-city functions of POIs. On the other hand, DCSPR builds a transfer channel between cities based on intra-city functions, and adopts a novel transfer strategy to transfer useful cultural characteristics across cities by mining the inter-city functions of POIs. Moreover, to utilize these captured characteristics for sequential POI recommendation, DCSPR employs a new region- and function-aware network for each city to learn transition patterns from multiple views. Extensive experiments conducted on two real-world datasets covering four cities demonstrate the effectiveness of DCSPR.
City Matters! A Dual-Target Cross-City Sequential POI Recommendation Model. Ke Sun, Chenliang Li, Tieyun Qian. ACM Transactions on Information Systems, published 2024-05-10. https://doi.org/10.1145/3664284
Suggesting potential next points of interest (POIs) to users has become a prominent task in location-based social networks and is receiving growing attention from industry and academia. It remains challenging because user movements involve highly dynamic and personalized interactions. Current state-of-the-art works develop various graph-based and sequence-based learning methods to model user-POI interactions and transition regularities. However, these works still have two significant shortcomings: (1) they ignore personalized spatial and temporal interactive characteristics that can exhibit users’ periodic interests; and (2) they insufficiently leverage the sequential patterns of interactions to obtain beyond-pairwise, high-order collaborative signals among users’ sequences. To jointly address these challenges, we propose a novel multi-view hypergraph learning model with spatial-temporal periodic interests for next POI recommendation (MvStHgL). In the local view, we learn the POI representation of each interaction by jointly modeling the periodic characteristics of its spatial and temporal aspects. In the global view, we design a hypergraph that treats interaction sequences as hyperedges in order to capture high-order collaborative signals across users and further refine the POI representations. More specifically, the POI representations output by the local view are used as the initial embeddings, and aggregation and propagation in the hypergraph are performed by a novel Node-to-Hypergraph-to-Node scheme. Furthermore, the resulting POI embeddings are used for sequential dependency modeling to predict the next POI. Extensive experiments on three real-world datasets demonstrate that our proposed model outperforms state-of-the-art models.
MvStHgL: Multi-view Hypergraph Learning with Spatial-temporal Periodic Interests for Next POI Recommendation. Jingmin An, Ming Gao, Jiafu Tang. ACM Transactions on Information Systems, published 2024-05-10. https://doi.org/10.1145/3664651
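The idea in the MvStHgL abstract of treating each user's interaction sequence as a hyperedge, then passing information from nodes to hyperedges and back to nodes, can be sketched with plain mean aggregation. The abstract does not specify the aggregation functions, so the unweighted averages below are an assumption, and the POI names, embeddings, and function names are hypothetical.

```python
def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def node_hyperedge_node(embeddings, hyperedges):
    """One round of node -> hyperedge -> node propagation with mean aggregation."""
    # Step 1: each hyperedge (a user's sequence) averages its member POIs.
    edge_emb = {
        edge: mean([embeddings[poi] for poi in members])
        for edge, members in hyperedges.items()
    }
    # Step 2: each POI averages the hyperedges it belongs to.
    updated = {}
    for poi, vec in embeddings.items():
        incident = [edge_emb[e] for e, members in hyperedges.items() if poi in members]
        updated[poi] = mean(incident) if incident else list(vec)
    return updated

# Hypothetical initial embeddings (in the paper these come from the local view).
embeddings = {"cafe": [1.0, 0.0], "park": [0.0, 1.0], "museum": [1.0, 1.0]}
# Hypothetical interaction sequences, one hyperedge per user.
hyperedges = {"user1_seq": ["cafe", "park"], "user2_seq": ["park", "museum"]}

updated = node_hyperedge_node(embeddings, hyperedges)
print(updated)
```

Here "park" appears in both sequences, so its updated embedding mixes signals from "cafe" and "museum" even though those two POIs never co-occur, which is the kind of beyond-pairwise signal a hypergraph is meant to expose.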
Session-Based Recommendation (SBR) seeks to predict users’ future item preferences by analyzing their interactions with previously clicked items. In recent approaches, Graph Neural Networks (GNNs) have commonly been applied to capture item relations within a session in order to infer user intentions. However, these GNN-based methods typically struggle with feature ambiguity between the sequential session information and the item conversion within an item graph, which may impede the model's ability to accurately infer user intentions. In this paper, we propose a novel Multi-hop Multi-view Memory Transformer (\(\mathrm{M^{3}T}\)) to effectively integrate the sequence-view information and the relation conversion (graph-view information) of items in a session. First, we propose a Multi-view Memory Transformer (\(\mathrm{M^{2}T}\)) module to concurrently obtain multi-view information about items. Then, a set of trainable memory matrices is employed to store sharable item features, which mitigates cross-view item feature ambiguity. To comprehensively capture latent user intentions, a Multi-hop \(\mathrm{M^{2}T}\) (\(\mathrm{M^{3}T}\)) framework is designed to integrate user intentions across different hops of an item graph. Specifically, a k-order power method is proposed to manage the item graph and alleviate the over-smoothing problem when obtaining high-order relations between items. Extensive experiments conducted on three real-world datasets demonstrate the superiority of our method.
Multi-hop Multi-view Memory Transformer for Session-based Recommendation. Xingrui Zhuo, Shengsheng Qian, Jun Hu, Fuxin Dai, Kangyi Lin, Gongqing Wu. ACM Transactions on Information Systems, published 2024-05-08. https://doi.org/10.1145/3663760
Unleashing the power of image-text matching in real-world applications is hampered by noisy correspondence. Manually curating high-quality datasets is expensive and time-consuming, and datasets generated with diffusion models are not sufficiently well aligned. The most promising approach is to collect image-text pairs from the Internet, but this inevitably introduces noisy correspondence. To reduce the negative impact of noisy correspondence, we propose a novel model that first transforms the noisy correspondence filtering problem into a similarity distribution modeling problem by exploiting the powerful capabilities of pre-trained models. Specifically, we use a Gaussian mixture model to model the similarities obtained by CLIP as a clean distribution and a noisy distribution, and use it to filter out most of the noisy correspondence in the dataset. Afterward, we use the relatively clean data to fine-tune the model. To further reduce the negative impact of the noisy correspondence that remains unfiltered during fine-tuning, i.e., the small region where the two distributions overlap, we propose a distribution-sensitive dynamic margin ranking loss that further increases the distance between the two distributions. Through continuous iteration, the noisy correspondence gradually decreases and the model performance gradually improves. Our extensive experiments demonstrate the effectiveness and robustness of our model even under high noise rates.
Breaking Through the Noisy Correspondence: A Robust Model for Image-Text Matching. Haitao Shi, Meng Liu, Xiaoxuan Mu, Xuemeng Song, Yupeng Hu, Liqiang Nie. ACM Transactions on Information Systems, published 2024-04-29. https://doi.org/10.1145/3662732
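The filtering step in the abstract above, fitting a two-component Gaussian mixture to image-text similarity scores and keeping the pairs that likely belong to the clean component, can be sketched in a few lines. This is a minimal illustration, not the authors' code: the similarity values are made up rather than computed by CLIP, the EM routine is a bare one-dimensional version, and treating the higher-mean component as "clean" with a 0.5 threshold is an assumption.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_two_component_gmm(xs, iters=50):
    """Fit a 1-D, two-component Gaussian mixture with plain EM."""
    overall = sum(xs) / len(xs)
    var0 = sum((x - overall) ** 2 for x in xs) / len(xs) or 1e-6
    weights, means, variances = [0.5, 0.5], [min(xs), max(xs)], [var0, var0]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(2)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6
            )
    return weights, means, variances

def clean_probability(x, weights, means, variances):
    """Posterior of the higher-mean component (assumed to be the clean one)."""
    clean = 0 if means[0] > means[1] else 1
    p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(2)]
    return p[clean] / (sum(p) or 1e-300)

# Hypothetical similarity scores: five mismatched pairs, five matched pairs.
sims = [0.10, 0.12, 0.15, 0.18, 0.20, 0.70, 0.72, 0.75, 0.78, 0.80]
weights, means, variances = fit_two_component_gmm(sims)
kept = [s for s in sims if clean_probability(s, weights, means, variances) > 0.5]
print(kept)
```

Pairs near the overlap of the two components receive posteriors close to 0.5; according to the abstract, those are the cases the dynamic margin ranking loss is meant to handle during fine-tuning.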
Xiaocong Chen, Siyu Wang, Julian McAuley, Dietmar Jannach, Lina Yao
Reinforcement learning serves as a potent tool for modeling dynamic user interests within recommender systems and has attracted increasing research attention of late. However, a significant drawback persists: poor data efficiency, which stems from its interactive nature. Training reinforcement learning-based recommender systems demands expensive online interactions to amass enough trajectories for agents to learn user preferences. This inefficiency makes reinforcement learning-based recommender systems a formidable undertaking and calls for the exploration of potential solutions. Recent strides in offline reinforcement learning present a new perspective. Offline reinforcement learning enables agents to learn from offline datasets and deploy the learned policies in online settings. Given that recommender systems possess extensive offline datasets, the offline reinforcement learning framework is a natural fit. Although the field is growing rapidly, works on recommender systems that use offline reinforcement learning remain limited. This survey aims to introduce and examine offline reinforcement learning within recommender systems, offering an inclusive review of the existing literature in this domain. Furthermore, we underscore prevalent challenges, opportunities, and future pathways that may propel research in this evolving field.
On the Opportunities and Challenges of Offline Reinforcement Learning for Recommender Systems. Xiaocong Chen, Siyu Wang, Julian McAuley, Dietmar Jannach, Lina Yao. ACM Transactions on Information Systems, published 2024-04-29. https://doi.org/10.1145/3661996
Pengyang Shao, Le Wu, Kun Zhang, Defu Lian, Richang Hong, Yong Li, Meng Wang
Recently, user-side fairness in Collaborative Filtering (CF) algorithms has gained considerable attention, with the argument that results should not discriminate against an individual or a user subgroup based on users’ sensitive attributes (e.g., gender). Researchers have proposed fairness-aware CF models that reduce statistical associations between predictions and sensitive attributes. A more natural idea is to achieve model fairness from a causal perspective. The remaining challenge is that we have no access to interventions, i.e., the counterfactual world that produces recommendations when each user has changed the sensitive attribute value. To this end, we first borrow the Rubin-Neyman potential outcome framework to define average causal effects of sensitive attributes. Next, we show that removing the causal effects of sensitive attributes is equivalent to average counterfactual fairness in CF. We then use the propensity re-weighting paradigm to estimate the average causal effects of sensitive attributes and formulate the estimated causal effects as an additional regularization term. To the best of our knowledge, this is one of the first attempts to achieve counterfactual fairness in CF from the causal effect estimation perspective, which frees us from building a sophisticated causal graph. Finally, experiments on three real-world datasets show the superiority of our proposed model.
Average User-side Counterfactual Fairness for Collaborative Filtering. Pengyang Shao, Le Wu, Kun Zhang, Defu Lian, Richang Hong, Yong Li, Meng Wang. ACM Transactions on Information Systems, published 2024-04-11. DOI: https://doi.org/10.1145/3656639
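The counterfactual fairness abstract above estimates the average causal effect of a sensitive attribute by propensity re-weighting and adds that estimate to the training objective as a regularizer. The abstract does not give the exact estimator, so the following is only a minimal sketch of the general idea, assuming a binary sensitive attribute, known propensities P(attribute = 1 | user), and a standard self-normalized inverse-propensity estimator. The function names `ipw_average_effect` and `regularised_loss` and the squared penalty are illustrative assumptions, not the authors' implementation.

```python
def ipw_average_effect(scores, attrs, propensities):
    """Self-normalized inverse-propensity estimate of E[Y(1)] - E[Y(0)].

    scores       -- predicted preference score per user (hypothetical input)
    attrs        -- binary sensitive attribute per user (1 or 0)
    propensities -- assumed P(attr = 1 | user), strictly between 0 and 1
    """
    num1 = den1 = num0 = den0 = 0.0
    for y, a, p in zip(scores, attrs, propensities):
        if a == 1:
            w = 1.0 / p            # weight for users observed with attr = 1
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - p)    # weight for users observed with attr = 0
            num0 += w * y
            den0 += w
    if den1 == 0.0 or den0 == 0.0:
        raise ValueError("both attribute groups must be present")
    return num1 / den1 - num0 / den0


def regularised_loss(rec_loss, scores, attrs, propensities, lam=1.0):
    """Recommendation loss plus a penalty on the estimated causal effect.

    The squared penalty and the weight `lam` are assumptions for illustration.
    """
    effect = ipw_average_effect(scores, attrs, propensities)
    return rec_loss + lam * effect ** 2


if __name__ == "__main__":
    scores = [0.9, 0.7, 0.4, 0.2]
    attrs = [1, 1, 0, 0]
    propensities = [0.5, 0.5, 0.5, 0.5]
    print(ipw_average_effect(scores, attrs, propensities))
    print(regularised_loss(1.0, scores, attrs, propensities, lam=2.0))
```

Driving the penalty toward zero pushes the re-weighted mean scores of the two attribute groups together, which is the sense in which removing the estimated causal effect serves as a fairness constraint in this sketch.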
Document-level relation extraction (RE) aims to simultaneously predict relations (including no-relation cases, denoted as NA) between all entity pairs in a document. It is typically formulated as a relation classification task with pre-detected entities and solved with a hard-label training regime, which, however, neglects the divergence of the NA class and the correlations among other classes. This article introduces progressive self-distillation (PSD), a new training regime that employs online self-knowledge distillation (KD) to produce and incorporate soft labels for document-level RE. The key idea of PSD is to gradually soften hard labels using past predictions from the RE model itself, which are adjusted adaptively as training proceeds. As such, PSD has to learn only one RE model within a single training pass, requiring no extra computation or annotation to pretrain another high-capacity teacher. PSD is conceptually simple, easy to implement, and generally applicable to various RE models to further improve their performance, without introducing additional parameters or significantly increasing training overhead. It is also a general framework that can be flexibly extended to distill various types of knowledge, rather than being restricted to soft labels. Extensive experiments on four benchmark datasets verify the effectiveness and generality of the proposed approach. The code is available at https://github.com/GaoJieCN/psd.
Document-Level Relation Extraction with Progressive Self-Distillation. Quan Wang, Zhendong Mao, Jie Gao, Yongdong Zhang. ACM Transactions on Information Systems, published 2024-04-08. DOI: https://doi.org/10.1145/3656168
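The PSD abstract above describes softening hard labels with the model's own past predictions, with the amount of softening adjusted as training proceeds. The abstract does not specify the schedule or the loss, so the following is a minimal sketch under stated assumptions: a linear schedule for the mixing weight, a single-label class distribution per entity pair, and plain cross-entropy. The names `softening_weight`, `soften_labels`, `cross_entropy`, and the parameter `alpha_max` are hypothetical; the repository linked in the abstract is the place to find the authors' actual implementation.

```python
import math


def softening_weight(epoch, total_epochs, alpha_max=0.8):
    """Assumed linear schedule: trust past predictions more as training proceeds."""
    return alpha_max * epoch / total_epochs


def soften_labels(hard, past_probs, alpha):
    """Mix the one-hot label with the model's own prediction from an earlier epoch."""
    return [(1.0 - alpha) * h + alpha * p for h, p in zip(hard, past_probs)]


def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a (soft) target distribution and current predictions."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


if __name__ == "__main__":
    hard = [1.0, 0.0, 0.0]            # gold relation as a one-hot vector
    past_probs = [0.6, 0.3, 0.1]      # this model's prediction at the previous epoch
    current_probs = [0.7, 0.2, 0.1]   # this model's prediction at the current epoch
    for epoch in range(0, 11, 5):
        alpha = softening_weight(epoch, total_epochs=10)
        target = soften_labels(hard, past_probs, alpha)
        print(epoch, target, cross_entropy(target, current_probs))
```

At epoch 0 the weight is zero and the target is the original hard label. Because the same model supplies the past predictions, this sketch needs no separate teacher network, which matches the single-model, single-pass property the abstract emphasizes.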