Junfan Chen, Richong Zhang, Xiaohan Jiang, Chunming Hu
Meta-learning has recently promoted few-shot text classification, which identifies target classes based on information transferred from source classes through a series of small tasks, or episodes. Existing works that construct their meta-learners on Prototypical Networks need improvement in learning discriminative text representations for similar classes, whose confusion may lead to conflicts in label prediction. The overfitting caused by having only a few training instances also needs to be adequately addressed. In addition, efficient episode-sampling procedures that could enhance few-shot training should be utilized. To address these problems, we first present a contrastive learning framework that simultaneously learns discriminative text representations via supervised contrastive learning and mitigates overfitting via unsupervised contrastive regularization; we then build an efficient self-paced episode-sampling approach on top of it to include progressively more difficult episodes as training proceeds. Empirical results on eight few-shot text classification datasets show that our model outperforms current state-of-the-art models. Extensive experimental analysis demonstrates that our supervised contrastive representation learning and unsupervised contrastive regularization improve few-shot text classification performance. The episode-sampling analysis reveals that our self-paced sampling strategy improves training efficiency.
SPContrastNet: A Self-Paced Contrastive Learning Model for Few-Shot Text Classification. Junfan Chen, Richong Zhang, Xiaohan Jiang, Chunming Hu. ACM Transactions on Information Systems, 2024-03-20. DOI: 10.1145/3652600.
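The abstract above names supervised contrastive learning as the mechanism for making text representations of similar classes discriminative, but does not give the objective's form. As a hedged illustration of the general technique (a standard SupCon-style loss, not necessarily the paper's exact formulation; the embeddings, labels, and temperature below are illustrative assumptions):

```python
import math

def sup_con_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over L2-normalized embeddings.

    For each anchor, positives are the other instances sharing its label;
    all remaining instances act as negatives in the softmax denominator.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    n, total = len(embeddings), 0.0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # log-sum-exp denominator over all non-anchor instances
        denom = sum(math.exp(dot(embeddings[i], embeddings[k]) / temperature)
                    for k in range(n) if k != i)
        for j in positives:
            sim = dot(embeddings[i], embeddings[j]) / temperature
            total += -(sim - math.log(denom)) / len(positives)
    return total / n
```

A batch whose same-label embeddings are aligned incurs a much smaller loss than one whose same-label embeddings point in different directions, which is what pulls similar classes apart from the rest.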
Fairness has gradually been recognized as a significant problem in the recommendation domain. Previous models usually achieve fairness by reducing the average performance gap between different user groups. However, the average performance may not sufficiently represent all characteristics of the performances within a user group. Thus, equivalent average performance does not necessarily mean the recommender model is fair; for example, the variances of the performances can still differ. To alleviate this problem, in this paper we define a novel type of fairness that requires the performance distributions across different user groups to be similar. We prove that with the same performance distribution, the numerical characteristics of the group performance, including the expectation, variance, and any higher-order moment, are also the same. To achieve distributional fairness, we propose a generative adversarial training framework. Specifically, we regard the recommender model as the generator that computes the performance for each user in different groups, and we deploy a discriminator to judge which group a performance is drawn from. By iteratively optimizing the generator and the discriminator, we can theoretically prove that the optimal generator (the recommender model) indeed leads to equivalent performance distributions. To smooth the adversarial training process, we propose a novel dual curriculum learning strategy for optimally scheduling training samples. Additionally, we tailor our framework to better suit top-N recommendation tasks by incorporating softened ranking metrics as measures of performance discrepancies. We conduct extensive experiments on real-world datasets to demonstrate the effectiveness of our model.
Distributional Fairness-aware Recommendation. Hao Yang, Xian Wu, Zhaopeng Qiu, Yefeng Zheng, Xu Chen. ACM Transactions on Information Systems, 2024-03-18. DOI: 10.1145/3652854.
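The motivation above, that equal average performance need not mean fairness because other moments of the distribution can differ, is easy to check numerically. The per-user scores below are hypothetical, not from the paper:

```python
from statistics import mean, variance

# Hypothetical per-user NDCG scores for two user groups: identical average
# performance, but a clearly different spread across users.
group_a = [0.50, 0.50, 0.50, 0.50]
group_b = [0.10, 0.90, 0.20, 0.80]

equal_means = abs(mean(group_a) - mean(group_b)) < 1e-9
equal_variances = abs(variance(group_a) - variance(group_b)) < 1e-9
```

A mean-gap fairness criterion would call these two groups equally served; the distributional criterion proposed above, which matches every moment, would not.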
Recently, federated recommendation has become a research hotspot, mainly because of users' growing awareness of data privacy. In heterogeneous one-class collaborative filtering (HOCCF), a recent and important recommendation problem, each user may be involved with two different types of implicit feedback, i.e., examinations and purchases. So far, privacy-preserving HOCCF has received relatively little attention. Existing federated recommendation works often overlook the fact that some privacy-sensitive behaviors, such as purchases, must be collected to ensure basic business imperatives in, for example, e-commerce. Hence, the user privacy constraints can and should be relaxed when deploying a recommendation system in real scenarios. In this paper, we study the federated multi-behavior recommendation problem under the assumption that purchase behaviors can be collected. Moreover, two additional challenges need to be addressed when deploying federated recommendation: the limited storage capacity of users' devices for storing all the item vectors, and their limited computational power for participating in federated learning. To release the potential of privacy-preserving HOCCF, we propose a novel framework, named discrete federated multi-behavior recommendation (DFMR), which allows the server to collect the business-necessary behaviors (i.e., purchases). To reduce the storage overhead, we use discrete hashing techniques, which compress the parameters down to 1.56% of their real-valued size. To further improve computational efficiency, we design a memorization strategy in the cache-updating module to accelerate the training process. Extensive experiments on four public datasets show the superiority of our DFMR in terms of both accuracy and efficiency.
Discrete Federated Multi-behavior Recommendation for Privacy-Preserving Heterogeneous One-Class Collaborative Filtering. Enyue Yang, Weike Pan, Qiang Yang, Zhong Ming. ACM Transactions on Information Systems, 2024-03-18. DOI: 10.1145/3652853.
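The 1.56% storage figure reported above matches replacing a 64-bit real-valued parameter with a single-bit hash code (1/64 = 1.5625%). A minimal sketch of sign-based binarization, offered only as an illustration of discrete hashing in general rather than DFMR's actual quantizer:

```python
def binarize(vector):
    """Map a real-valued embedding to a +/-1 hash code via the sign function,
    the usual first step of discrete-hashing compression."""
    return [1 if x >= 0 else -1 for x in vector]

def storage_ratio(bits_per_code=1, bits_per_float=64):
    """Fraction of storage a 1-bit code needs relative to one float parameter."""
    return bits_per_code / bits_per_float
```

On-device, such codes also make similarity computation cheap, since inner products over +/-1 vectors reduce to bit operations.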
Event prediction is a vital and challenging task in temporal knowledge graphs (TKGs), which play crucial roles in various applications. Recently, many graph neural network-based approaches have been proposed to model the graph structure information in TKGs. However, these approaches only construct graphs based on quadruplets and model the pairwise correlation between entities, failing to capture the high-order correlations among entities. To this end, we propose DHyper, a recurrent Dual Hypergraph neural network for event prediction in TKGs, which simultaneously models the influences of the high-order correlations both among entities and among relations. Specifically, a dual hypergraph learning module is proposed to discover the high-order correlations among entities and among relations in a parameterized way. A dual hypergraph message passing network is introduced to perform information aggregation and representation fusion on the entity hypergraph and the relation hypergraph. Extensive experiments on six real-world datasets demonstrate that DHyper achieves state-of-the-art performance, outperforming the best baseline by an average of 13.09%, 4.26%, 17.60%, and 18.03% in MRR, Hits@1, Hits@3, and Hits@10, respectively.
DHyper: A Recurrent Dual Hypergraph Neural Network for Event Prediction in Temporal Knowledge Graphs. Xing Tang, Ling Chen, Hongyu Shi, Dandan Lyu. ACM Transactions on Information Systems, 2024-03-18. DOI: 10.1145/3653015.
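Hypergraph message passing, as named above, typically alternates node-to-hyperedge and hyperedge-to-node aggregation, which is how a hyperedge can mix information from more than two entities at once. A minimal unparameterized sketch with mean aggregation (DHyper's actual module is learned and parameterized; this only shows the two-step flow):

```python
def hypergraph_pass(features, hyperedges):
    """One round of hypergraph message passing:
    1) each hyperedge averages the features of its member nodes,
    2) each node averages the features of the hyperedges it belongs to.
    `hyperedges` is a list of node-index lists (the incidence structure).
    """
    dim = len(features[0])
    edge_feats = [
        [sum(features[v][d] for v in edge) / len(edge) for d in range(dim)]
        for edge in hyperedges
    ]
    out = []
    for v in range(len(features)):
        member = [e for e, edge in enumerate(hyperedges) if v in edge]
        if not member:
            out.append(features[v][:])  # isolated node keeps its features
            continue
        out.append([sum(edge_feats[e][d] for e in member) / len(member)
                    for d in range(dim)])
    return out
```

Because a hyperedge spans an arbitrary node set, a single pass already propagates high-order (beyond pairwise) correlations, which an ordinary edge-based GNN cannot express directly.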
Previous studies on sequential recommendation (SR) have predominantly concentrated on optimizing recommendation accuracy. However, there remains a significant gap in enhancing recommendation diversity, particularly for short interaction sequences. The limited interaction information available in short sequences hampers the recommender's ability to comprehensively model users' intents, consequently affecting both the diversity and accuracy of recommendation. In light of this challenge, we propose reTrospective and pRospective Transformers for dIversified sEquential Recommendation (TRIER). TRIER addresses the insufficient information in short interaction sequences by first retrospectively learning to predict users' potential historical interactions, thereby introducing additional information and expanding short interaction sequences, and then capturing users' potential intents from multiple augmented sequences. Finally, TRIER learns to generate diverse recommendation lists by covering as many potential intents as possible.
To evaluate the effectiveness of TRIER, we conduct extensive experiments on three benchmark datasets. The experimental results demonstrate that TRIER significantly outperforms state-of-the-art methods, with diversity improvements of up to 11.36% in intra-list distance (ILD@5) on the Steam dataset, 3.43% in ILD@5 on the Yelp dataset, and 3.77% in category coverage (CC@5) on the Beauty dataset. As for accuracy, on the Yelp dataset we observe notable improvements of 7.62% and 8.63% in HR@5 and NDCG@5, respectively. Moreover, we find that TRIER yields more significant accuracy and diversity improvements for short interaction sequences.
Diversifying Sequential Recommendation with Retrospective and Prospective Transformers. Chaoyu Shi, Pengjie Ren, Dongjie Fu, Xin Xin, Shansong Yang, Fei Cai, Zhaochun Ren, Zhumin Chen. ACM Transactions on Information Systems, 2024-03-17. DOI: 10.1145/3653016.
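Intra-list distance (ILD@k), the diversity metric reported above, is the average pairwise dissimilarity among the top-k recommended items. A small sketch using cosine distance over hypothetical item feature vectors (papers vary in the dissimilarity function; cosine distance is one common choice, assumed here):

```python
import math

def ild_at_k(item_vectors, k=5):
    """Intra-List Distance: mean pairwise cosine distance over the top-k items."""
    items = item_vectors[:k]

    def cos_dist(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return 1.0 - dot / norm

    pairs = [(i, j) for i in range(len(items)) for j in range(i + 1, len(items))]
    return sum(cos_dist(items[i], items[j]) for i, j in pairs) / len(pairs)
```

A list of identical items scores 0, while a list of mutually orthogonal items scores 1, so higher ILD@k means a more diverse recommendation list.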
Search result diversification plays a crucial role in improving users' search experience by providing documents that cover more subtopics. Previous studies have made great progress in leveraging inter-document interactions to measure the similarity among documents. However, different parts of a document may embody different subtopics, and existing models ignore the subtle similarities and differences of content within each document. In this paper, we propose a hierarchical attention framework that combines intra-document interactions with inter-document interactions in a complementary manner to conduct multi-grained document modeling. Specifically, we separate each document into passages to model document content from multi-grained perspectives. Then, we design stacked interaction blocks to conduct inter-document and intra-document interactions. Moreover, to measure the subtopic coverage of each document more accurately, we propose a passage-aware document-subtopic interaction mechanism to perform fine-grained matching between documents and subtopics. Experimental results demonstrate that our model achieves state-of-the-art performance compared with existing methods.
Multi-grained Document Modeling for Search Result Diversification. Zhirui Deng, Zhicheng Dou, Zhan Su, Ji-Rong Wen. ACM Transactions on Information Systems, 2024-03-15. DOI: 10.1145/3652852.
Alexander Frummet, Alessandro Speggiorin, David Elsweiler, Anton Leuski, Jeff Dalton
We present two empirical studies investigating users' expectations and behaviours when using digital assistants, such as Alexa and Google Home, in a kitchen context. First, a survey (N=200) queries participants on their expectations for the kinds of information such systems should be able to provide. While consensus exists on expecting information about cooking steps and processes, younger participants who enjoy cooking express a higher likelihood of expecting details on food history or the science of cooking. In a follow-up Wizard-of-Oz study (N=48), users were guided through the steps of a recipe either by an active wizard that alerted participants to information it could provide or by a passive wizard that only answered questions posed by the user. The active policy led to almost double the number of conversational utterances and 1.5 times more knowledge-related user questions compared to the passive policy, and it resulted in 1.7 times more knowledge being communicated. We discuss the findings in the context of related work and draw implications for the design and use of such assistants for cooking and for other purposes such as DIY and craft tasks, as well as lessons learned for evaluating such systems.
Cooking with Conversation: Enhancing User Engagement and Learning with a Knowledge-Enhancing Assistant. Alexander Frummet, Alessandro Speggiorin, David Elsweiler, Anton Leuski, Jeff Dalton. ACM Transactions on Information Systems, 2024-03-15. DOI: 10.1145/3649500.
Sequential recommendation systems aim to exploit users' sequential behavior patterns to capture their interaction intentions and improve recommendation accuracy. Existing sequential recommendation methods mainly focus on modeling the chronological relationships among items in each individual user behavior sequence, which may not be effective for making accurate and robust recommendations. On one hand, the performance of existing methods is usually sensitive to the length of a user's behavior sequence (i.e., the list of the user's historically interacted items). On the other hand, besides the context information in each individual user behavior sequence, the collaborative information among different users' behavior sequences is also crucial for making accurate recommendations; however, this kind of information is usually ignored by existing methods. In this work, we propose a new sequential recommendation framework that encodes the context information in each individual user behavior sequence as well as the collaborative information among the behavior sequences of different users, by building a local dependency graph for each item. We conduct extensive experiments comparing the proposed model with state-of-the-art sequential recommendation methods on five benchmark datasets. The experimental results demonstrate that, by incorporating collaborative information, the proposed model achieves better recommendation performance than existing methods.
Collaborative Sequential Recommendations via Multi-View GNN-Transformers. Tianze Luo, Yong Liu, Sinno Jialin Pan. ACM Transactions on Information Systems, 2024-03-15. DOI: 10.1145/3649436.
Title: "Cross-Model Comparative Loss for Enhancing Neuronal Utility in Language Understanding"
Authors: Yunchang Zhu, Liang Pang, Kangxi Wu, Yanyan Lan, Huawei Shen, Xueqi Cheng
DOI: https://doi.org/10.1145/3652599 (ACM Transactions on Information Systems, published 2024-03-15)
Abstract: Current natural language understanding (NLU) models have been continuously scaling up in both model size and input context, introducing more hidden and input neurons. While this generally improves performance on average, the extra neurons do not yield a consistent improvement for all instances, because some hidden neurons are redundant and the noise mixed into the input neurons tends to distract the model. Previous work mainly avoids this problem by extrinsically reducing low-utility neurons through additional post- or pre-processing, such as network pruning and context selection. Beyond that, can we make the model reduce redundant parameters and suppress input noise by intrinsically enhancing the utility of each neuron? If a model utilizes its neurons efficiently, then no matter which neurons are ablated (disabled), the ablated submodel should perform no better than the original full model. Based on this comparison principle between models, we propose a cross-model comparative loss for a broad range of tasks. Comparative loss is essentially a ranking loss on top of the task-specific losses of the full and ablated models, with the expectation that the task-specific loss of the full model is minimal. We demonstrate the universal effectiveness of comparative loss through extensive experiments on 14 datasets from 3 distinct NLU tasks, based on 5 widely used pretrained language models, and find it particularly beneficial for models with few parameters or long inputs.
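The mechanism the abstract describes — a ranking term that penalizes the full model whenever its task loss exceeds an ablated submodel's — can be sketched in a few lines. The hinge form, the function name, and the optional margin below are our own illustrative choices under the stated principle, not the paper's exact formulation:

```python
def comparative_loss(full_loss, ablated_losses, margin=0.0):
    """Ranking loss on top of task-specific losses: the full model's
    loss should be no larger than that of any ablated submodel. Each
    violation contributes a hinge penalty; the full model's own task
    loss is kept in the objective so both are minimized jointly."""
    ranking = sum(max(0.0, full_loss - ablated + margin)
                  for ablated in ablated_losses)
    return full_loss + ranking
```

In training, `ablated_losses` would come from forward passes of the same network with randomly ablated (dropped) neurons on the same batch, so the comparison is cross-model but per-instance.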
Title: "ELAKT: Enhancing Locality for Attentive Knowledge Tracing"
Authors: Yanjun Pu, Fang Liu, Rongye Shi, Haitao Yuan, Ruibo Chen, Tianhao Peng, WenJun Wu
DOI: https://doi.org/10.1145/3652601 (ACM Transactions on Information Systems, published 2024-03-14)
Abstract: Knowledge tracing models based on deep learning can achieve impressive predictive performance by leveraging attention mechanisms. However, two challenges remain in attentive knowledge tracing. First, classical attentive knowledge tracing models assign relatively low attention when processing exercise sequences with shifting knowledge concepts, making it difficult to capture a comprehensive knowledge state across sequences. Second, classical models do not account for stochastic behaviors, which hinders attentive knowledge tracing models in capturing anomalous knowledge states. This paper proposes an attentive knowledge tracing model, called Enhancing Locality for Attentive Knowledge Tracing (ELAKT), which is a variant of the deep knowledge tracing model. The proposed model leverages the transformer encoder to aggregate knowledge embeddings generated from both exercises and responses over all timesteps. In addition, it uses causal convolutions to aggregate and smooth local knowledge states. Using the comprehensive knowledge-concept states, ELAKT introduces a prediction correction module that forecasts students' future responses to handle the noise caused by stochastic behaviors. Experimental results demonstrate that ELAKT consistently outperforms state-of-the-art baseline knowledge tracing models.
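The causal-convolution smoothing the abstract mentions has a simple defining property: the smoothed knowledge state at step t may depend only on states at steps up to and including t, never on future ones. A minimal sketch of that operation (plain Python rather than the model's actual learned Conv1d layers, with a hand-picked kernel for illustration):

```python
def causal_conv1d(states, kernel):
    """Causal 1-D convolution over a sequence of scalar knowledge
    states. Left-padding with zeros (kernel size - 1 positions) ensures
    output t aggregates only inputs <= t, so no future exercise leaks
    into the current smoothed state."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(states)
    return [sum(kernel[j] * padded[t + j] for j in range(k))
            for t in range(len(states))]

# A width-2 averaging kernel smooths each state with its predecessor.
smoothed = causal_conv1d([1.0, 2.0, 3.0], [0.5, 0.5])
```

In ELAKT's setting the inputs would be vector-valued knowledge embeddings and the kernels learned, but the causal padding scheme is the same idea.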