Personalized session-based recommendation (PSBR) is a general and challenging real-world task that aims to recommend a session’s next clicked item based on the session’s item transitions and the corresponding user’s historical sessions. A session is a sequence of items interacted with during a short period. The PSBR problem has a natural hierarchical architecture: each session consists of a series of items, and each user owns a series of sessions. However, existing PSBR methods capture only pairwise relations among items and users. To capture this hierarchical information effectively, we propose a novel hierarchical hypergraph neural network that models the hierarchical architecture. Moreover, because the items in a session are sequentially ordered while a hypergraph can model only set relations, we propose a directed graph aggregator (DGA) that aggregates sequential information from a directed global item graph. By combining the embeddings of these two modules with an attention mechanism, we obtain a framework dubbed H3GNN (Hybrid Hierarchical HyperGraph Neural Network). Extensive experiments on three benchmark datasets show that the proposed model outperforms state-of-the-art methods, and ablation experiments validate the effectiveness of all proposed components.
H3GNN: Hybrid Hierarchical HyperGraph Neural Network for Personalized Session-based Recommendation. Zhizhuo Yin, Kai Han, Pengzi Wang, Xi Zhu. ACM Transactions on Information Systems, published 2023-10-23. DOI: https://doi.org/10.1145/3630002
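The two H3GNN modules described above rest on different views of the same session data: a hypergraph treats each session as an unordered set of items, while a directed global item graph keeps the order of transitions. The following is a minimal sketch of that contrast only, assuming plain mean aggregation with no learned weights, no attention, and no user-level hyperedges. The names `hypergraph_step` and `directed_item_graph` are hypothetical, and this is not the authors’ implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def hypergraph_step(item_emb: Dict[str, Vector],
                    sessions: List[List[str]]) -> Dict[str, Vector]:
    """One unweighted node -> hyperedge -> node propagation step.

    Each session is a hyperedge over the *set* of its items, so item order
    and repeated clicks are discarded.
    """
    # Node -> hyperedge: a session embedding is the mean of its distinct items.
    edges = [sorted(set(session)) for session in sessions]
    edge_emb = [mean([item_emb[i] for i in members]) for members in edges]

    # Hyperedge -> node: an item averages the embeddings of sessions containing it.
    incident: Dict[str, List[Vector]] = defaultdict(list)
    for members, emb in zip(edges, edge_emb):
        for item in members:
            incident[item].append(emb)

    # Items that occur in no session keep their previous embedding.
    return {i: mean(incident[i]) if incident[i] else list(v)
            for i, v in item_emb.items()}


def directed_item_graph(sessions: List[List[str]]) -> Dict[Tuple[str, str], int]:
    """Count directed transitions item[t] -> item[t+1] over all sessions."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[(src, dst)] += 1
    return dict(counts)


emb = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
sessions = [["a", "b"], ["b", "c", "b"]]
print(hypergraph_step(emb, sessions))   # {'a': [0.5, 0.5], 'b': [0.5, 0.75], 'c': [0.5, 1.0]}
print(directed_item_graph(sessions))    # {('a', 'b'): 1, ('b', 'c'): 1, ('c', 'b'): 1}
```

Reversing every session leaves the output of `hypergraph_step` unchanged but changes the edges returned by `directed_item_graph`, which is the gap the abstract says the DGA is meant to fill.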
Federated recommender systems (FedRecs) have been widely explored recently because they safeguard user data privacy. In these systems, a central server collaboratively learns recommendation models by sharing public parameters with clients, providing a privacy-preserving solution. However, this collaborative approach also creates a vulnerability that allows adversaries to manipulate FedRecs. Existing work on FedRec security has already shown that malicious users can easily promote items via model poisoning attacks, but it focuses mainly on FedRecs that use only collaborative information (i.e., user-item interactions). We contend that these attacks are effective primarily because collaborative signals are sparse. In light of this, we propose a method that addresses both data sparsity and model poisoning threats by incorporating product visual information. Intriguingly, our empirical findings show that including visual information renders all existing model poisoning attacks ineffective. Nevertheless, visual information also opens a new avenue for adversaries to manipulate federated recommender systems, because it typically originates from external sources. To assess this threat, we propose a novel poisoning attack tailored to visually-aware FedRecs, namely image poisoning attacks, in which adversaries gradually modify uploaded images with human-imperceptible perturbations to manipulate item ranks during FedRec training. Moreover, we provide empirical evidence that the threat is heightened when image poisoning attacks are combined with model poisoning attacks, making federated recommender systems easier to manipulate. To ensure the safe use of visual information, we employ a diffusion model in visually-aware FedRecs to purify each uploaded image and to detect adversarial images. Extensive experiments with two FedRecs on two datasets demonstrate the effectiveness and generalizability of the proposed attacks and defenses.
Manipulating Visually-aware Federated Recommender Systems and Its Countermeasures. Wei Yuan, Shilong Yuan, Chaoqun Yang, Quoc Viet Hung Nguyen, Hongzhi Yin. ACM Transactions on Information Systems, published 2023-10-23. DOI: https://doi.org/10.1145/3630005
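The image poisoning attack above gradually modifies an uploaded image with perturbations a human would not notice. A common way to formalize that, borrowed here from projected-gradient (PGD-style) attacks in the adversarial-examples literature and not taken from the paper, is to take small sign-gradient steps and project back into an L-infinity ball of radius `eps` around the original pixels. This sketch assumes the attacker already has `grad`, the gradient of its ranking objective with respect to the pixels, and represents an image as a flat list of pixel values in [0, 1]; `poison_step` and the example values are hypothetical.

```python
from typing import List

Pixels = List[float]


def sign(value: float) -> float:
    """Return 1.0, -1.0, or 0.0 according to the sign of value."""
    return float((value > 0) - (value < 0))


def poison_step(x: Pixels, x_orig: Pixels, grad: Pixels,
                step: float, eps: float) -> Pixels:
    """One projected sign-gradient step that increases the attacker's objective.

    The result stays within eps of the original image in every pixel
    (L-infinity constraint) and inside the valid pixel range [0, 1].
    """
    out = []
    for xi, xo, g in zip(x, x_orig, grad):
        moved = xi + step * sign(g)                     # gradient-ascent step
        bounded = min(max(moved, xo - eps), xo + eps)   # project into the eps-ball
        out.append(min(max(bounded, 0.0), 1.0))         # keep a valid pixel value
    return out


original = [0.5, 0.5, 0.0]
grad = [1.0, -2.0, -1.0]   # hypothetical gradient, held fixed for illustration
image = list(original)
for _ in range(3):         # modify the image gradually over several rounds
    image = poison_step(image, original, grad, step=0.01, eps=0.02)
print(image)
```

The projection is what keeps the cumulative change small no matter how many rounds the attacker runs; a purification defense such as the diffusion model the abstract mentions would aim to remove perturbations of this kind before the image reaches the recommender.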
Surong Yan, Chenglong Shi, Haosen Wang, Lei Chen, Ling Jiang, Ruilin Guo, Kwei-Jay Lin
Casting sequential recommendation (SR) as a reinforcement learning (RL) problem is promising, and some RL-based methods have been proposed for SR. However, these models are sub-optimal because of two limitations: a) they fail to leverage supervision signals during RL training to capture users’ explicit preferences, which leads to slow convergence; and b) they do not use auxiliary information (e.g., a knowledge graph) to guide exploration, so they explore users’ potential interests blindly. To address these limitations, we propose a multiplex information-guided RL model (MELOD), which employs a novel RL training framework with Teach and Explore components for SR. The Teach component accurately captures users’ explicit preferences and speeds up RL convergence. Meanwhile, we design a dynamic intent induction network (DIIN) as the policy function to generate diverse predictions. The Explore component uses the DIIN to mine users’ potential interests through exploration guided jointly by sequential and knowledge information. Moreover, we design a sequential and knowledge-aware reward function to stabilize RL training. Together, these components improve MELOD’s performance and convergence over existing RL algorithms, making it both effective and efficient. Experimental results on seven real-world datasets show that our model significantly outperforms state-of-the-art methods.
Teach and Explore: A Multiplex Information-guided Effective and Efficient Reinforcement Learning for Sequential Recommendation. ACM Transactions on Information Systems, published 2023-10-23. DOI: https://doi.org/10.1145/3630003
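One way to read the Teach and Explore framework above is as a supervised term on the observed next item plus a policy-gradient term on items the policy samples. The sketch below implements only that reading, under explicit assumptions: a softmax policy over candidate items, a REINFORCE loss with a scalar baseline, a weighting `lam`, and a toy reward that mixes an exact next-item hit with knowledge-graph embedding similarity. None of these choices, nor the names `teach_explore_loss` and `knowledge_aware_reward`, come from the MELOD paper.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def knowledge_aware_reward(sampled: int, target: int,
                           kg_emb: Sequence[Sequence[float]],
                           w_seq: float = 0.5) -> float:
    """Toy reward: exact next-item hit plus KG similarity to the true item."""
    hit = 1.0 if sampled == target else 0.0
    return w_seq * hit + (1.0 - w_seq) * cosine(kg_emb[sampled], kg_emb[target])


def teach_explore_loss(logits: Sequence[float], target: int, sampled: int,
                       reward: float, baseline: float, lam: float = 0.5) -> float:
    """Supervised cross-entropy ("teach") plus weighted REINFORCE ("explore")."""
    logp = log_softmax(logits)
    teach = -logp[target]                            # raise the observed next item
    explore = -(reward - baseline) * logp[sampled]   # reinforce rewarded samples
    return teach + lam * explore


kg = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # hypothetical KG embeddings of 3 items
r = knowledge_aware_reward(sampled=1, target=0, kg_emb=kg)
print(r)   # 0.5: a miss, but a KG-similar item still earns partial reward
print(teach_explore_loss([0.0, 0.0, 0.0], target=0, sampled=1, reward=r, baseline=0.0))
```

The supervised term gives a dense learning signal from the first update, which is the intuition behind using it to speed up convergence, while the knowledge-aware reward lets a sampled item that misses the exact target still receive credit when it is related to it.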
An Zhang, Wenchang Ma, Jingnan Zheng, Xiang Wang, Tat-Seng Chua
In leading collaborative filtering (CF) models, user and item representations tend to learn the popularity bias in the training data as a shortcut. Such popularity shortcuts benefit in-distribution (ID) performance but generalize poorly to out-of-distribution (OOD) data, i.e., when the popularity distribution of the test data shifts w.r.t. that of the training data. To close this gap, debiasing strategies try to assess the degree of these shortcuts and remove them from the representations. However, these strategies have two deficiencies: (1) when measuring shortcut degrees, most of them use statistical metrics on only a single aspect (i.e., item frequency for items and user frequency for users), failing to account for the compositional degree of a user-item pair; (2) when mitigating shortcuts, many of them assume that the test distribution is known in advance. The result is low-quality debiased representations. Worse still, these strategies achieve OOD generalizability at the expense of ID performance. In this work, we present a simple yet effective debiasing strategy, PopGo, which quantifies and reduces the interaction-wise popularity shortcut without any assumptions about the test data. It first learns a shortcut model that yields the shortcut degree of a user-item pair from their popularity representations. It then trains the CF model by adjusting its predictions with these interaction-wise shortcut degrees. Analyzing PopGo from both causal and information-theoretic perspectives, we justify why it encourages the CF model to capture critical popularity-agnostic features while leaving out spurious popularity-relevant patterns. We use PopGo to debias two high-performing CF models (MF, LightGCN) on four benchmark datasets. On both ID and OOD test sets, PopGo achieves significant gains over state-of-the-art debiasing strategies (e.g., DICE, MACR).
Robust Collaborative Filtering to Popularity Distribution Shift. ACM Transactions on Information Systems, published 2023-10-12. DOI: https://doi.org/10.1145/3627159
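The PopGo abstract above says a shortcut model scores a user-item pair from popularity representations and that this interaction-wise score adjusts the CF model’s predictions during training. The sketch below shows one plausible shape of that idea under stated assumptions: dot-product scores, a sigmoid-squashed shortcut degree that multiplies the CF score during training only, and the raw CF score used for ranking at test time. The paper’s exact adjustment and losses may differ, and all function names here are hypothetical.

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def shortcut_degree(user_pop: Sequence[float], item_pop: Sequence[float]) -> float:
    """Interaction-wise shortcut degree in (0, 1) from popularity representations.

    It depends on the (user, item) pair jointly, not on item frequency or
    user frequency alone.
    """
    return sigmoid(dot(user_pop, item_pop))


def training_score(cf_user: Sequence[float], cf_item: Sequence[float],
                   user_pop: Sequence[float], item_pop: Sequence[float]) -> float:
    """CF prediction adjusted by the shortcut degree (used only while training)."""
    return dot(cf_user, cf_item) * shortcut_degree(user_pop, item_pop)


def inference_score(cf_user: Sequence[float], cf_item: Sequence[float]) -> float:
    """At test time the shortcut model is dropped; only the CF score ranks items."""
    return dot(cf_user, cf_item)


u, i = [1.0, 1.0], [1.0, 1.0]
print(training_score(u, i, [0.0], [0.0]))   # 1.0: CF score 2.0 scaled by sigmoid(0) = 0.5
print(inference_score(u, i))                # 2.0
```

The intended effect, as we read the abstract, is that the shortcut degree absorbs the popularity signal during training so the CF embeddings need not encode it; dropping the shortcut model at inference then leaves a ranking that relies on popularity-agnostic features.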
Text response generation for multimodal task-oriented dialog systems, which aims to generate a proper text response given a multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: 1) they overlook the benefit of generative pre-training, and 2) they ignore knowledge related to the textual context. To address these limitations, we propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection, dual knowledge-enhanced context learning, and knowledge-enhanced response generation. Specifically, the dual knowledge selection component selects related knowledge according to both the textual and visual modalities of the given context. The dual knowledge-enhanced context learning component then seamlessly integrates the selected knowledge into multimodal context learning from both global and local perspectives, also exploring the cross-modal semantic relation. Finally, the knowledge-enhanced response generation component uses a revised BART decoder with an additional dot-product knowledge-decoder attention sub-layer that explicitly uses the knowledge to improve text response generation. Extensive experiments on a public dataset verify the superiority of DKMD over state-of-the-art competitors.
Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model. Xiaolin Chen, Xuemeng Song, Liqiang Jing, Shuo Li, Linmei Hu, Liqiang Nie. ACM Transactions on Information Systems, published 2023-10-06. DOI: https://doi.org/10.1145/3606368
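The additional sub-layer in DKMD’s revised BART decoder lets each decoder position attend over the selected knowledge. Below is a minimal sketch of a dot-product attention sub-layer of that kind, assuming scaled dot-product scores, knowledge vectors used directly as both keys and values (no learned projections, a single head), and a residual connection. These are standard Transformer conventions, not details confirmed by the abstract, and `knowledge_attention` is a hypothetical name.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def knowledge_attention(decoder_states: Sequence[Vector],
                        knowledge: Sequence[Vector]) -> List[Vector]:
    """Each decoder state attends over knowledge vectors; output = state + context."""
    d = len(knowledge[0])
    outputs = []
    for q in decoder_states:
        # Scaled dot-product score between this decoder state and each knowledge item.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in knowledge]
        weights = softmax(scores)
        # Attention-weighted sum of knowledge vectors (keys double as values here).
        context = [sum(w * k[j] for w, k in zip(weights, knowledge)) for j in range(d)]
        outputs.append([qi + ci for qi, ci in zip(q, context)])   # residual add
    return outputs


states = [[0.0, 0.0], [4.0, 0.0]]   # hypothetical decoder hidden states
facts = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical knowledge embeddings
print(knowledge_attention(states, facts))
```

The first state has no preference and mixes both knowledge vectors equally, while the second state, aligned with the first fact, draws most of its context from it.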
Legal case retrieval is a special Information Retrieval (IR) task focusing on legal case documents. Depending on the downstream tasks for which case documents are retrieved, users’ information needs in legal case retrieval can differ significantly from those in Web search and traditional ad-hoc retrieval. Although several studies retrieve legal cases based on text similarity, the underlying search intents of legal case retrieval users are, as this paper shows, more complicated than that, and they remain mostly unexplored. To this end, we present a novel hierarchical intent taxonomy of legal case retrieval. It consists of five intent types categorized by three criteria: search for Particular Case(s), Characterization, Penalty, Procedure, and Interest. The taxonomy was constructed transparently and evaluated extensively through interviews, editorial user studies, and query log analysis. Through a laboratory user study, we reveal significant differences in user behavior and satisfaction under different search intents in legal case retrieval. Furthermore, we apply the proposed taxonomy to various downstream legal retrieval tasks, e.g., result ranking and satisfaction prediction, and demonstrate its effectiveness. Our work provides important insights into user intents in legal case retrieval and may lead to better retrieval techniques in the legal domain, such as intent-aware ranking strategies and evaluation methodologies.
An Intent Taxonomy of Legal Case Retrieval. Yunqiu Shao, Haitao Li, Yueyue Wu, Yiqun Liu, Qingyao Ai, Jiaxin Mao, Yixiao Ma, Shaoping Ma. ACM Transactions on Information Systems, published 2023-09-29. DOI: https://doi.org/10.1145/3626093
Personalized Query Expansion, the task of expanding queries with additional terms extracted from user-related vocabulary, is a well-known way to improve a system’s retrieval performance on short queries. Recent approaches rely on word embeddings to select expansion terms from user-related texts. Although these methods deliver promising results with earlier word embedding techniques, we argue that they are not suited to contextual word embeddings, which produce a distinct vector representation for each term occurrence. In this article, we propose a Personalized Query Expansion method designed to solve the issues that arise when contextual word embeddings are used with current embedding-based Personalized Query Expansion approaches. Specifically, we employ a clustering-based procedure to identify the terms that best represent the user’s interests and to improve the diversity of the terms selected for expansion, achieving improvements of up to 4% in MAP@100 over the best-performing baseline. Moreover, our approach is more efficient than previous ones, achieving sub-millisecond expansion times even in data-rich scenarios. Finally, we introduce a novel metric to evaluate the diversity of expansion terms and empirically show that previous embedding-based approaches are unsuitable for use with contextual word embeddings, because they select semantically overlapping expansion terms.
Personalized Query Expansion with Contextual Word Embeddings. Elias Bassani, Nicola Tonellotto, Gabriella Pasi. ACM Transactions on Information Systems, published 2023-09-20. DOI: https://doi.org/10.1145/3624988
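Contextual embeddings give every occurrence of a term its own vector, so a user’s vocabulary becomes a cloud of occurrence vectors rather than one vector per term. The sketch below illustrates the general idea of clustering-based selection under assumptions of our own: plain k-means with the first `k` occurrences as initial centroids, one expansion term per cluster chosen by frequency, and mean pairwise cosine distance as a diversity score. It is not the article’s procedure, the diversity function is not the novel metric the article introduces, and all names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points: List[Vector], k: int, iters: int = 10) -> List[int]:
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def select_expansion_terms(occurrences: List[Tuple[str, Vector]], k: int) -> List[str]:
    """Cluster occurrence vectors and keep the most frequent term of each cluster."""
    assign = kmeans([vec for _, vec in occurrences], k)
    selected: List[str] = []
    for c in range(k):
        counts = Counter(term for (term, _), a in zip(occurrences, assign) if a == c)
        if counts:
            best = counts.most_common(1)[0][0]
            if best not in selected:
                selected.append(best)
    return selected


def diversity(vectors: List[Vector]) -> float:
    """Mean pairwise cosine distance (1 - cosine) among expansion-term vectors."""
    pairs = [(i, j) for i in range(len(vectors)) for j in range(i + 1, len(vectors))]
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


occurrences = [   # hypothetical (term, contextual vector) pairs from a user's texts
    ("jazz", [1.0, 0.0]),
    ("python", [0.0, 1.0]),
    ("jazz", [0.9, 0.1]),
    ("saxophone", [0.95, 0.05]),
    ("python", [0.1, 0.9]),
]
print(select_expansion_terms(occurrences, k=2))   # ['jazz', 'python']
print(diversity([[1.0, 0.0], [0.0, 1.0]]))        # 1.0: orthogonal, no overlap
print(diversity([[1.0, 0.0], [2.0, 0.0]]))        # 0.0: same direction, full overlap
```

Picking one term per cluster is what pushes the selection toward different user interests; selecting the top terms by raw similarity instead could return "jazz" and "saxophone" together, two semantically overlapping terms that a diversity score of this kind would penalize.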
Yupeng Hu, Kun Wang, Meng Liu, Haoyu Tang, Liqiang Nie
Localizing a desired moment within an untrimmed video given a natural language query, i.e., cross-modal moment localization, has recently attracted widespread research attention. The task is challenging because it requires not only accurately understanding intra-modal semantic information but also explicitly capturing inter-modal semantic correlations (consistency and complementarity). Existing efforts mainly focus on intra-modal semantic understanding and inter-modal semantic alignment while ignoring the necessary semantic supplement. We therefore present a cross-modal semantic perception network for more effective intra-modal semantic understanding and inter-modal semantic collaboration. Concretely, we design a dual-path representation network for intra-modal semantic modeling. Meanwhile, we develop a semantic collaborative network to achieve multi-granularity semantic alignment and hierarchical semantic supplement. Building on this semantic collaborative learning, the network can localize moments effectively. Extensive comparison experiments demonstrate the promising performance of our model compared with existing state-of-the-art competitors.
Semantic Collaborative Learning for Cross-Modal Moment Localization. ACM Transactions on Information Systems, published 2023-09-07. DOI: https://doi.org/10.1145/3620669
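To make the localization task above concrete: if a model produces a query-relevance score for every clip of an untrimmed video, localizing a moment amounts to choosing a contiguous span of clips. The brute-force sketch below picks the span with the highest mean score, assuming per-clip scores are already given and a minimum span length `min_len`. It only illustrates the output format of moment localization; it says nothing about how the semantic collaborative network computes its scores, and `best_moment` is a hypothetical name.

```python
from typing import Sequence, Tuple


def best_moment(clip_scores: Sequence[float], min_len: int = 1) -> Tuple[int, int]:
    """Return (start, end) clip indices, inclusive, of the highest-mean span.

    Prefix sums make each span's mean O(1), so the search is O(n^2) in the
    number of clips. Ties keep the earliest span found.
    """
    n = len(clip_scores)
    if not 1 <= min_len <= n:
        raise ValueError("min_len must be between 1 and the number of clips")
    prefix = [0.0]
    for s in clip_scores:
        prefix.append(prefix[-1] + s)
    best, best_span = float("-inf"), (0, min_len - 1)
    for start in range(n):
        for end in range(start + min_len - 1, n):
            mean = (prefix[end + 1] - prefix[start]) / (end - start + 1)
            if mean > best:
                best, best_span = mean, (start, end)
    return best_span


print(best_moment([0.1, 0.9, 0.8, 0.2], min_len=2))   # (1, 2)
```

Requiring `min_len` clips prevents the search from collapsing to the single highest-scoring clip, which is usually shorter than the moment a query describes.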
K. Wang, Yanmin Zhu, Tianzi Zang, Chunyang Wang, Kuan Liu, Peibo Ma
Review-based recommender systems explore semantic aspects of users’ preferences by incorporating user-generated reviews into rating-based models. Recent works have demonstrated the potential of review information to improve recommendation capacity. However, most existing studies concentrate on optimizing the review-based representation learning component, thus failing to explicitly capture fine-grained semantic aspects and ignoring the intrinsic correlation between ratings and reviews. To address these problems, we propose a multi-aspect graph contrastive learning framework, named MAGCL, with three distinctive designs: (i) a multi-aspect representation learning module, which projects semantic relations into different subspaces by decoupling review information and then obtains high-order decoupled representations in each aspect via a graph encoder; (ii) a contrastive learning module, which performs graph contrastive learning to capture the correlation between rating and review patterns, using unlabeled data to generate self-supervised signals that, in turn, relieve the sparsity of supervision signals; and (iii) a multi-task learning module, which jointly trains the recommendation task and the self-supervised task to learn high-order, structure-aware yet self-discriminative node representations, helping to alleviate the over-smoothing problem. Extensive experiments on four real-world review datasets show the superiority of MAGCL over several state-of-the-art methods. We also provide further analysis of the multi-aspect representations and graph contrastive learning to verify the advantages of the proposed framework.
Multi-aspect Graph Contrastive Learning for Review-enhanced Recommendation. K. Wang, Yanmin Zhu, Tianzi Zang, Chunyang Wang, Kuan Liu, Peibo Ma. ACM Transactions on Information Systems, published 2023-09-05. Journal article. DOI: 10.1145/3618106 (https://doi.org/10.1145/3618106)
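MAGCL's contrastive learning module relates a rating-based view and a review-based view of the same nodes. A common way to express that kind of cross-view objective is the InfoNCE loss, so the pure-Python sketch below illustrates it under that assumption; it is not the authors' implementation. The function names, the temperature `tau`, and the weight `lam` in the joint objective are hypothetical, and the abstract does not state which contrastive loss or similarity measure MAGCL actually uses.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(view_a: List[Vector], view_b: List[Vector], tau: float = 0.2) -> float:
    """Generic InfoNCE loss (assumed, not MAGCL's exact loss).

    Node i in view_a (e.g. rating graph) should be closer to node i in
    view_b (e.g. review graph), its positive, than to every other node j
    in view_b, which act as negatives.
    """
    if not view_a or len(view_a) != len(view_b):
        raise ValueError("both views must embed the same, non-empty node set")
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / tau for other in view_b]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # -log softmax probability of the positive pair
        total += log_denom - logits[i]
    return total / len(view_a)


def joint_loss(rec_loss: float, cl_loss: float, lam: float = 0.1) -> float:
    """Hypothetical multi-task objective: recommendation loss plus a weighted contrastive term."""
    return rec_loss + lam * cl_loss
```

Shuffling one view so that positives no longer line up raises the loss; minimizing it therefore pulls the two views of each node together while pushing other nodes apart, which is the kind of self-supervised signal the abstract describes.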
Xiaoyu Shi, Quanliang Liu, Hong Xie, Di Wu, Boxin Peng, Mingsheng Shang, Defu Lian
While personalization increases the utility of item recommendation, it also suffers from popularity bias. However, previous methods emphasize adopting supervised learning models to relieve popularity bias in static recommendation, ignoring the dynamic transfer of user preferences and the amplification effects of the feedback loop in the recommender system (RS). In this paper, we study this issue in interactive recommendation. We argue that diversification and novelty are equally crucial for improving user satisfaction in interactive recommender systems (IRS) in this setting. To achieve this goal, we propose a Diversity-Novelty-aware Interactive Recommendation framework (DNaIR) that augments offline reinforcement learning (RL) to increase the exposure rate of high-quality long-tail items. Its main idea is to first aggregate item similarity, popularity, and quality into the reward model to guide the planning of the RL policy. It then designs a diversity-aware stochastic action generator to obtain an efficient and lightweight DNaIR algorithm. Extensive experiments are conducted on three real-world datasets and an authentic RL environment (Virtual-Taobao). The experiments show that our model makes better and fuller use of long-tail items, especially low-popularity items of high quality, to improve recommendation satisfaction, thus achieving state-of-the-art performance.
Relieving Popularity Bias in Interactive Recommendation: A Diversity-Novelty-Aware Reinforcement Learning Approach. Xiaoyu Shi, Quanliang Liu, Hong Xie, Di Wu, Boxin Peng, Mingsheng Shang, Defu Lian. ACM Transactions on Information Systems, published 2023-09-01. Journal article. DOI: 10.1145/3618107 (https://doi.org/10.1145/3618107)
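DNaIR's reward model aggregates item similarity, popularity, and quality, and its action generator samples items in a diversity-aware, stochastic way. The sketch below shows one way such pieces could fit together in plain Python, assuming a linear reward, popularity and quality already normalized to [0, 1], and a softmax sampler that penalizes redundancy with items already in the slate. All names (`Item`, `reward`, `sample_slate`), the weights, and the redundancy penalty are hypothetical; the abstract does not give DNaIR's actual reward formula or generator, and the offline RL policy itself is omitted.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Item:
    item_id: int
    embedding: Sequence[float]
    popularity: float  # hypothetical: normalized to [0, 1]
    quality: float     # hypothetical: e.g. normalized average rating in [0, 1]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def reward(last: Item, cand: Item,
           w_sim: float = 0.5, w_quality: float = 0.3, w_novelty: float = 0.2) -> float:
    """Hypothetical reward: weighted sum of similarity to the last clicked item,
    candidate quality, and novelty (1 - popularity)."""
    return (w_sim * cosine(last.embedding, cand.embedding)
            + w_quality * cand.quality
            + w_novelty * (1.0 - cand.popularity))


def sample_slate(last: Item, candidates: List[Item], k: int, rng: random.Random,
                 temperature: float = 0.1, diversity_penalty: float = 0.5) -> List[int]:
    """Hypothetical diversity-aware stochastic generator: draw up to k distinct
    items, each draw a softmax over rewards minus a penalty for similarity to
    items already placed in the slate."""
    pool = list(candidates)
    chosen: List[Item] = []
    for _ in range(min(k, len(pool))):
        scores = []
        for c in pool:
            redundancy = max((cosine(c.embedding, s.embedding) for s in chosen), default=0.0)
            scores.append((reward(last, c) - diversity_penalty * redundancy) / temperature)
        m = max(scores)  # subtract the max before exp for numerical stability
        weights = [math.exp(s - m) for s in scores]
        pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(pick))  # sample without replacement
    return [c.item_id for c in chosen]
```

In this toy setup, a high-quality long-tail item can outscore a popular but mediocre item even when the popular one is slightly more similar to the last click, which mirrors the effect the abstract attributes to the reward design.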