Fei Liu, Chenyang Bu, Haotian Zhang, Le Wu, Kui Yu, Xuegang Hu
In educational data mining, knowledge tracing (KT) aims to model learning performance based on student knowledge mastery. Deep-learning-based KT models perform remarkably better than traditional KT models and have attracted considerable attention. However, most of them lack interpretability, making it difficult to explain why a model predicts well. In this paper, we propose an interpretable deep KT model based on fuzzy reasoning, referred to as fuzzy deep knowledge tracing (FDKT). Specifically, we convert continuous scores into several fuzzy scores using the fuzzification module. Then, we input the fuzzy scores into the fuzzy reasoning module (FRM). The FRM is designed to deduce the student's current cognitive ability, from which future performance is predicted. FDKT greatly enhances the intrinsic interpretability of deep-learning-based KT by exposing how student cognition is deduced. Furthermore, it broadens the application of KT to continuous scores. Comparisons with state-of-the-art models demonstrate improved performance with regard to both of these advantages.
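The fuzzification step can be illustrated with a minimal sketch. It assumes scores normalized to [0, 1] and three triangular membership functions; the set names, breakpoints, and function names below are hypothetical choices for illustration and are not taken from the FDKT paper.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 1 at `peak`, falling linearly to 0 at `left` and `right`."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets over a normalized score in [0, 1].
FUZZY_SETS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def fuzzify(score):
    """Map one continuous score to its degree of membership in each fuzzy set."""
    return {name: triangular(score, *params) for name, params in FUZZY_SETS.items()}
```

With these assumed breakpoints, a score of 0.75 belongs half to "medium" and half to "high", which is the kind of graded input a fuzzy reasoning module could consume in place of a binary correct/incorrect label.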
{"title":"FDKT: Towards an interpretable deep knowledge tracing via fuzzy reasoning","authors":"Fei Liu, Chenyang Bu, Haotian Zhang, Le Wu, Kui Yu, Xuegang Hu","doi":"10.1145/3656167","DOIUrl":"https://doi.org/10.1145/3656167","url":null,"abstract":"<p>In educational data mining, knowledge tracing (KT) aims to model learning performance based on student knowledge mastery. Deep-learning-based KT models perform remarkably better than traditional KT and have attracted considerable attention. However, most of them lack interpretability, making it challenging to explain why the model performed well in the prediction. In this paper, we propose an interpretable deep KT model, referred to as fuzzy deep knowledge tracing (FDKT) via fuzzy reasoning. Specifically, we formalize continuous scores into several fuzzy scores using the fuzzification module. Then, we input the fuzzy scores into the fuzzy reasoning module (FRM). FRM is designed to deduce the current cognitive ability, based on which the future performance was predicted. FDKT greatly enhanced the intrinsic interpretability of deep-learning-based KT through the interpretation of the deduction of student cognition. Furthermore, it broadened the application of KT to continuous scores. Improved performance with regard to both the advantages of FDKT was demonstrated through comparisons with the state-of-the-art models.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"46 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140583611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generating appropriate emotions for responses is essential for dialog systems to provide human-like interaction in various application scenarios. Most previous dialog systems tried to achieve this goal by learning empathetic manners from anonymous conversational data. However, emotional responses generated by those methods may be inconsistent, which decreases user engagement and service quality. Psychological findings suggest that the emotional expressions of humans are rooted in personality traits. Therefore, we propose a new task, Personality-affected Emotion Generation, which generates emotion based on the personality given to the dialog system, and we investigate a solution through personality-affected mood transition. Specifically, we first construct a daily dialog dataset, the Personality EmotionLines Dataset (PELD), with emotion and personality annotations. Subsequently, we analyze the challenges in this task, i.e., (1) heterogeneously integrating personality and emotional factors and (2) extracting multi-granularity emotional information from the dialog context. Finally, we model the personality as the transition weight by simulating the mood transition process in the dialog system, which addresses the challenges above. We conduct extensive experiments on PELD for evaluation. Results suggest that our method improves emotion generation performance by 13% in macro-F1 and 5% in weighted-F1 over the BERT-base model.
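The idea of personality acting as a transition weight can be pictured with a toy sketch. It assumes that mood and context emotion are vectors in the same affect space and that personality has already been mapped to one weight per dimension; the function name, the vector layout, and the linear update rule are all hypothetical and much simpler than a learned model.

```python
def transition_mood(mood, context_emotion, personality_weight):
    """Move each mood dimension toward the context emotion.

    `personality_weight[i]` in [0, 1] is an assumed, precomputed weight:
    0 keeps the current mood, 1 adopts the context emotion entirely.
    """
    return [
        m + w * (e - m)
        for m, e, w in zip(mood, context_emotion, personality_weight)
    ]
```

Under this sketch, two systems that hear the same utterance end up in different moods solely because their personality weights differ, which is the consistency property the task asks for.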
{"title":"Personality-affected Emotion Generation in Dialog Systems","authors":"Zhiyuan Wen, Jiannong Cao, Jiaxing Shen, Ruosong Yang, Shuaiqi Liu, Maosong Sun","doi":"10.1145/3655616","DOIUrl":"https://doi.org/10.1145/3655616","url":null,"abstract":"<p>Generating appropriate emotions for responses is essential for dialog systems to provide human-like interaction in various application scenarios. Most previous dialog systems tried to achieve this goal by learning empathetic manners from anonymous conversational data. However, emotional responses generated by those methods may be inconsistent, which will decrease user engagement and service quality. Psychological findings suggest that the emotional expressions of humans are rooted in personality traits. Therefore, we propose a new task, Personality-affected Emotion Generation, to generate emotion based on the personality given to the dialog system and further investigate a solution through the personality-affected mood transition. Specifically, we first construct a daily dialog dataset, Personality EmotionLines Dataset (<b>PELD</b>), with emotion and personality annotations. Subsequently, we analyze the challenges in this task, <i>i.e.</i>, (1) heterogeneously integrating personality and emotional factors and (2) extracting multi-granularity emotional information in the dialog context. Finally, we propose to model the personality as the transition weight by simulating the mood transition process in the dialog system and solve the challenges above. We conduct extensive experiments on PELD for evaluation. 
Results suggest that by adopting our method, the emotion generation performance is improved by <b>13%</b> in macro-F1 and <b>5%</b> in weighted-F1 from the BERT-base model.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"50 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140583893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cross-domain Named Entity Recognition (NER) transfers knowledge learned from a resource-rich source domain to improve learning in a low-resource target domain. Most existing works are built on the sequence labeling framework, which treats entity detection and type prediction as a monolithic process. However, they typically ignore the differing transferability of these two sub-tasks: the former, locating spans that correspond to entities, is largely domain-robust, while the latter deals with entity types that differ across domains. Combining them into an entangled learning problem may add to the complexity of domain transfer. In this work, we propose a novel divide-and-transfer paradigm in which the sub-tasks are learned by separate functional modules, each with its own cross-domain transfer. To demonstrate the effectiveness of divide-and-transfer, we implement two NER frameworks that apply this paradigm with different cross-domain transfer strategies. Experimental results on 10 different domain pairs show the notable superiority of our proposed frameworks. Experimental analyses indicate that the significant advantages of the divide-and-transfer paradigm over prior monolithic ones originate from its better performance on low-resource data and its much greater transferability. This gives us a new insight into cross-domain NER. Our code is available on our GitHub.
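The separation of the two sub-tasks can be sketched as two independent modules. This is only an illustration of the interface: the capitalization heuristic and the lookup-table typer below are hypothetical stand-ins for learned models, and none of the names come from the paper.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # token span [start, end)


def detect_spans(tokens: List[str]) -> List[Span]:
    """Toy domain-robust detector: maximal runs of capitalized tokens."""
    spans, start = [], None
    for i, tok in enumerate(tokens):
        if tok[:1].isupper():
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tokens)))
    return spans


def make_typer(lexicon: Dict[str, str]) -> Callable[[List[str], Span], str]:
    """Toy domain-specific typer backed by a per-domain lookup table."""
    def typer(tokens: List[str], span: Span) -> str:
        return lexicon.get(" ".join(tokens[span[0]:span[1]]), "O")
    return typer


def recognize(tokens, typer):
    """Shared span detection followed by a swappable, domain-specific typer."""
    return [(span, typer(tokens, span)) for span in detect_spans(tokens)]
```

Because `detect_spans` never sees the label set, the same detector can be reused when the target domain introduces new entity types; only the typer passed to `recognize` changes.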
{"title":"Cross-domain NER under a Divide-and-Transfer Paradigm","authors":"Xinghua Zhang, Bowen Yu, Xin Cong, Taoyu Su, Quangang Li, Tingwen Liu, Hongbo Xu","doi":"10.1145/3655618","DOIUrl":"https://doi.org/10.1145/3655618","url":null,"abstract":"<p>Cross-domain Named Entity Recognition (NER) transfers knowledge learned from a rich-resource source domain to improve the learning in a low-resource target domain. Most existing works are designed based on the sequence labeling framework, defining entity detection and type prediction as a monolithic process. However, they typically ignore the discrepant transferability of these two sub-tasks: the former locating spans corresponding to entities is largely domain-robust, while the latter owns distinct entity types across domains. Combining them into an entangled learning problem may contribute to the complexity of domain transfer. In this work, we propose the novel divide-and-transfer paradigm in which different sub-tasks are learned using separate functional modules for respective cross-domain transfer. To demonstrate the effectiveness of divide-and-transfer, we concretely implement two NER frameworks by applying this paradigm with different cross-domain transfer strategies. Experimental results on 10 different domain pairs show the notable superiority of our proposed frameworks. Experimental analyses indicate that significant advantages of the divide-and-transfer paradigm over prior monolithic ones originate from its better performance on low-resource data and a much greater transferability. It gives us a new insight into cross-domain NER. 
Our code is available at our github.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"20 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140583912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
User behavior data, such as ratings and clicks, has been widely used to build personalized models for recommender systems. However, many undesirable factors (e.g., popularity, ranking position, users’ selection) significantly affect the performance of the learned recommendation model. Most existing work on unbiased recommendation addresses these biases at the sample level (e.g., sample reweighting, data augmentation) or from the perspective of representation learning (e.g., bias-modeling). However, these methods are usually designed for a specific bias and lack the universal capability to handle complex situations where multiple biases co-exist. Besides, few works free themselves from laborious and sophisticated debiasing configurations (e.g., propensity scores, imputed values, or user behavior-generating processes).
To address this research gap, we propose a universal Generative framework for Bias Disentanglement, termed GBD, which constantly generates calibration perturbations for the intermediate representations during training to keep them from being affected by the bias. Specifically, we first introduce a bias-identifier that tries to retrieve bias-related information from the representations. Subsequently, the calibration perturbations are generated to significantly deteriorate the bias-identifier’s performance, so that the bias is gradually disentangled from the calibrated representations. Therefore, without relying on cumbersome debiasing configurations, a bias-agnostic model is obtained under the guidance of the bias-identifier. We further show its universality by subsuming the representative biases and their mixture under the proposed framework. Finally, extensive experiments on real-world, synthetic, and semi-synthetic datasets demonstrate the superiority of the proposed approach over a wide range of recommendation debiasing methods. The code is available at https://github.com/Zhidan-Wang/GBD.
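The interplay between the bias-identifier and the calibration perturbation can be sketched in a few lines. This is a deliberately simplified stand-in, not the GBD implementation: it assumes a linear bias-identifier with squared-error loss and replaces the learned generator with a single gradient-ascent step on that loss. All names and the step size are hypothetical.

```python
def identifier_loss(h, w, bias):
    """Squared error of a linear bias-identifier predicting `bias` from h."""
    pred = sum(wi * hi for wi, hi in zip(w, h))
    return (pred - bias) ** 2


def calibration_perturbation(h, w, bias, step):
    """One gradient-ascent step on the identifier's loss with respect to h."""
    err = sum(wi * hi for wi, hi in zip(w, h)) - bias
    # d/dh (w.h - bias)^2 = 2 * err * w
    return [step * 2.0 * err * wi for wi in w]


def calibrate(h, w, bias, step=0.1):
    """Return the representation after adding the perturbation."""
    delta = calibration_perturbation(h, w, bias, step)
    return [hi + di for hi, di in zip(h, delta)]
```

After calibration the identifier recovers the bias signal less accurately from the representation, which is the direction of change the abstract describes; a full system would alternate this with retraining the identifier.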
{"title":"Toward Bias-Agnostic Recommender Systems: A Universal Generative Framework","authors":"Zhidan Wang, Lixin Zou, Chenliang Li, Shuaiqiang Wang, Xu Chen, Dawei Yin, Weidong Liu","doi":"10.1145/3655617","DOIUrl":"https://doi.org/10.1145/3655617","url":null,"abstract":"<p>User behavior data, such as ratings and clicks, has been widely used to build personalizing models for recommender systems. However, many unflattering factors (e.g., popularity, ranking position, users’ selection) significantly affect the performance of the learned recommendation model. Most existing work on unbiased recommendation addressed these biases from sample granularity (e.g., sample reweighting, data augmentation) or from the perspective of representation learning (e.g., bias-modeling). However, these methods are usually designed for a specific bias, lacking the universal capability to handle complex situations where multiple biases co-exist. Besides, rare work frees itself from laborious and sophisticated debiasing configurations (e.g., propensity scores, imputed values, or user behavior-generating process). </p><p>Towards this research gap, in this paper, we propose a universal <b>G</b>enerative framework for <b>B</b>ias <b>D</b>isentanglement termed as <b>GBD</b>, constantly generating calibration perturbations for the intermediate representations during training to keep them from being affected by the bias. Specifically, a bias-identifier that tries to retrieve the bias-related information from the representations is first introduced. Subsequently, the calibration perturbations are generated to significantly deteriorate the bias-identifier’s performance, making the bias gradually disentangled from the calibrated representations. Therefore, without relying on notorious debiasing configurations, a bias-agnostic model is obtained under the guidance of the bias identifier. 
We further present its universality by subsuming the representative biases and their mixture under the proposed framework. Finally, extensive experiments on the real-world, synthetic, and semi-synthetic datasets have demonstrated the superiority of the proposed approach against a wide range of recommendation debiasing methods. The code is available at https://github.com/Zhidan-Wang/GBD.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"46 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140584578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information Extraction (IE) focuses on transforming unstructured data into structured knowledge, of which Named Entity Recognition (NER) is a fundamental component. In the realm of Information Retrieval (IR), effectively recognizing entities can substantially enhance the precision of search and recommendation systems. Existing methods frame NER as a sequence labeling task, which requires extra data and therefore may be limited in terms of sustainability. One promising solution is to employ a Machine Reading Comprehension (MRC) approach for NER tasks, thereby eliminating the dependence on additional data. This process encounters key challenges, including (1) unconventional predictions, (2) inefficient multi-stream processing, and (3) the absence of a proficient reasoning strategy. To this end, we present the Single-Stream Reasoner (SSR), a solution that uses a reasoning strategy and standardized inputs. This yields a type-agnostic solution for both flat and nested NER tasks, without the need for additional data. On ten NER benchmarks, SSR achieves state-of-the-art results, highlighting its robustness. Furthermore, we illustrate its efficiency through comparisons of convergence, inference speed, and low-resource performance. Our architecture is adaptable and can readily be combined with various foundation models and reasoning strategies, fostering advancements in both the IR and IE fields.
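For readers unfamiliar with the MRC framing of NER, the sketch below shows the generic idea of pairing each entity type with a natural-language question. It illustrates the general MRC formulation only, not SSR's single-stream input; the question wording, the `[CLS]`/`[SEP]` layout, and the function name are assumptions.

```python
def build_mrc_inputs(sentence, type_questions):
    """Build one question-plus-context input per entity type.

    `type_questions` maps an entity type to a hypothetical question; a span
    extraction model would then mark answer spans inside `sentence`.
    """
    return [
        {"entity_type": etype, "text": f"[CLS] {question} [SEP] {sentence} [SEP]"}
        for etype, question in type_questions.items()
    ]
```

Note that this generic formulation needs one encoder pass per entity type, which is one plausible reading of the multi-stream inefficiency the abstract lists among the challenges.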
{"title":"SSR: Solving Named Entity Recognition Problems via a Single-stream Reasoner","authors":"Yuxiang Zhang, Junjie Wang, Xinyu Zhu, Tetsuya Sakai, Hayato Yamana","doi":"10.1145/3655619","DOIUrl":"https://doi.org/10.1145/3655619","url":null,"abstract":"<p>Information Extraction (IE) focuses on transforming unstructured data into structured knowledge, of which Named Entity Recognition (NER) is a fundamental component. In the realm of Information Retrieval (IR), effectively recognizing entities can substantially enhance the precision of search and recommendation systems. Existing methods frame NER as a sequence labeling task, which requires extra data and, therefore may be limited in terms of sustainability. One promising solution is to employ a Machine Reading Comprehension (MRC) approach for NER tasks, thereby eliminating the dependence on additional data. This process encounters key challenges, including: 1) Unconventional predictions; 2) Inefficient multi-stream processing; 3) Absence of a proficient reasoning strategy. To this end, we present the Single-Stream Reasoner (SSR), a solution utilizing a reasoning strategy and standardized inputs. This yields a type-agnostic solution for both flat and nested NER tasks, without the need for additional data. On ten NER benchmarks, SSR achieved state-of-the-art results, highlighting its robustness. Furthermore, we illustrated its efficiency through convergence, inference speed, and low-resource scenario performance comparisons. 
Our architecture displays adaptability and can effortlessly merge with various foundational models and reasoning strategies, fostering advancements in both IR and IE fields.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"51 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140585146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Point-of-Interest (POI) recommendation, an important research hotspot in the field of urban computing, plays a crucial role in urban construction. However, understanding the process of users’ travel decisions and exploring the causality of POI choice is not easy because of the complex and diverse influencing factors in urban travel scenarios. Moreover, the spurious explanations caused by severe data sparsity, i.e., misrepresenting universal relevance as causality, may also hinder us from understanding users’ travel decisions. To this end, we propose a factor-level causal explanation generation framework based on counterfactual data augmentation for user travel decisions, named Factor-level Causal Explanation for User Travel Decisions (FCE-UTD), which can distinguish between true and false causal factors and generate true causal explanations. Specifically, we first assume that a user decision is composed of a set of several different factors. Then, by preserving the user decision structure with a joint counterfactual contrastive learning paradigm, we learn the representation of factors and detect the relevant factors. Next, we further identify true causal factors by constructing counterfactual decisions with a counterfactual representation generator; in particular, the generator not only augments the dataset and mitigates the sparsity but also helps separate the causal factors from false causal factors that may cause spurious explanations. Besides, a causal dependency learner is proposed to identify causal factors for each decision by learning causal dependency scores. Extensive experiments conducted on three real-world datasets demonstrate the superiority of our approach in terms of check-in rate, fidelity, and downstream tasks under different behavior scenarios. Additional case studies also demonstrate the ability of FCE-UTD to generate causal explanations for POI choice.
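The distinction between a merely relevant factor and a causal one can be shown with a toy counterfactual test. The sketch assumes a known decision function and treats a factor as causal when removing it flips the decision; FCE-UTD instead learns representations and dependency scores, so the function and factor names here are purely hypothetical.

```python
def causal_factors(factors, decide):
    """Return the factors whose removal changes the outcome of `decide`."""
    original = decide(factors)
    return [
        f for f in factors
        if decide([g for g in factors if g != f]) != original
    ]
```

In the usage below, "popular" co-occurs with the visit but removing it leaves the decision unchanged, so it is treated as relevant rather than causal:

```python
visit = lambda fs: "near_home" in fs and "open_now" in fs
causal_factors(["near_home", "open_now", "popular"], visit)
```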
{"title":"Beyond Relevance: Factor-level Causal Explanation for User Travel Decisions with Counterfactual Data Augmentation","authors":"Hanzhe Li, Jingjing Gu, Xinjiang Lu, Dazhong Shen, Yuting Liu, YaNan Deng, Guoliang Shi, Hui Xiong","doi":"10.1145/3653673","DOIUrl":"https://doi.org/10.1145/3653673","url":null,"abstract":"<p>Point-of-Interest (POI) recommendation, an important research hotspot in the field of urban computing, plays a crucial role in urban construction. While understanding the process of users’ travel decisions and exploring the causality of POI choosing is not easy due to the complex and diverse influencing factors in urban travel scenarios. Moreover, the spurious explanations caused by severe data sparsity, i.e., misrepresenting universal relevance as causality, may also hinder us from understanding users’ travel decisions. To this end, in this paper, we propose a factor-level causal explanation generation framework based on counterfactual data augmentation for user travel decisions, named Factor-level Causal Explanation for User Travel Decisions (FCE-UTD), which can distinguish between true and false causal factors and generate true causal explanations. Specifically, we first assume that a user decision is composed of a set of several different factors. Then, by preserving the user decision structure with a joint counterfactual contrastive learning paradigm, we learn the representation of factors and detect the relevant factors. Next, we further identify true causal factors by constructing counterfactual decisions with a counterfactual representation generator, in particular, it can not only augment the dataset and mitigate the sparsity but also contribute to clarifying the causal factors from other false causal factors that may cause spurious explanations. Besides, a causal dependency learner is proposed to identify causal factors for each decision by learning causal dependency scores. 
Extensive experiments conducted on three real-world datasets demonstrate the superiority of our approach in terms of check-in rate, fidelity, and downstream tasks under different behavior scenarios. The extra case studies also demonstrate the ability of FCE-UTD to generate causal explanations in POI choosing.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"21 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140201252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, a novel generative retrieval (GR) paradigm has been proposed, in which a single sequence-to-sequence model is learned to directly generate a list of relevant document identifiers (docids) given a query. Existing GR models commonly employ maximum likelihood estimation (MLE) for optimization: this involves maximizing the likelihood of a single relevant docid given an input query, with the assumption that the likelihood for each docid is independent of the other docids in the list. We refer to these models as the pointwise approach in this paper. While the pointwise approach has been shown to be effective for GR, it is considered sub-optimal because it disregards the fundamental principle that ranking involves making predictions about lists. In this paper, we address this limitation by introducing an alternative listwise approach, which enables the GR model to optimize relevance at the docid list level. Specifically, we view the generation of a ranked docid list as a sequence learning process: at each step we learn a subset of parameters that maximizes the corresponding generation likelihood of the i-th docid given the (preceding) top i − 1 docids. To formalize the sequence learning process, we design a positional conditional probability for GR. To alleviate the potential impact of beam search on the generation quality during inference, we perform relevance calibration on the generation likelihood of model-generated docids according to relevance grades. We conduct extensive experiments on representative binary and multi-graded relevance datasets. Our empirical results demonstrate that our method outperforms state-of-the-art GR baselines in terms of retrieval performance.
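The difference between scoring docids independently and scoring a whole list can be sketched with a Plackett-Luce style likelihood, in which the i-th docid is chosen among the docids not yet placed. This is a swapped-in classical listwise model used for illustration, not the positional conditional probability defined in the paper; the per-docid scores and the function name are assumptions.

```python
import math


def listwise_log_likelihood(scores, ranking):
    """Log-likelihood of `ranking` as a product of stepwise conditionals.

    At step i, the i-th docid is drawn by a softmax over the docids that
    the preceding i - 1 steps have not already placed.
    """
    remaining = set(scores)
    total = 0.0
    for docid in ranking:
        log_norm = math.log(sum(math.exp(scores[d]) for d in remaining))
        total += scores[docid] - log_norm
        remaining.remove(docid)
    return total
```

Unlike a pointwise objective, this quantity changes when two docids swap positions, so maximizing it rewards the model for getting the order of the list right rather than only the membership.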
{"title":"Listwise Generative Retrieval Models via a Sequential Learning Process","authors":"Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Xueqi Cheng","doi":"10.1145/3653712","DOIUrl":"https://doi.org/10.1145/3653712","url":null,"abstract":"<p>Recently, a novel generative retrieval (GR) paradigm has been proposed, where a single sequence-to-sequence model is learned to directly generate a list of relevant document identifiers (docids) given a query. Existing generative retrieval (GR) models commonly employ maximum likelihood estimation (MLE) for optimization: this involves maximizing the likelihood of a single relevant docid given an input query, with the assumption that the likelihood for each docid is independent of the other docids in the list. We refer to these models as the pointwise approach in this paper. While the pointwise approach has been shown to be effective in the context of generative retrieval (GR), it is considered sub-optimal due to its disregard for the fundamental principle that ranking involves making predictions about lists. In this paper, we address this limitation by introducing an alternative listwise approach, which empowers the generative retrieval (GR) model to optimize the relevance at the docid list level. Specifically, we view the generation of a ranked docid list as a sequence learning process: at each step we learn a subset of parameters that maximizes the corresponding generation likelihood of the <i>i</i>-th docid given the (preceding) top <i>i</i> − 1 docids. To formalize the sequence learning process, we design a positional conditional probability for generative retrieval (GR). To alleviate the potential impact of beam search on the generation quality during inference, we perform relevance calibration on the generation likelihood of model-generated docids according to relevance grades. We conduct extensive experiments on representative binary and multi-graded relevance datasets. 
Our empirical results demonstrate that our method outperforms state-of-the-art generative retrieval (GR) baselines in terms of retrieval performance.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"22 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140201406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As people inevitably interact with items across multiple domains or various platforms, cross-domain recommendation (CDR) has gained increasing attention. However, the rising privacy concerns limit the practical applications of existing CDR models since they assume that full or partial data are accessible among different domains. Recent studies on privacy-aware CDR models neglect the heterogeneity from multiple domain data and fail to achieve consistent improvements in cross-domain recommendation; thus, it remains a challenging task to conduct effective CDR in a privacy-preserving way.
In this paper, we propose a novel federated graph learning approach for Privacy-Preserving Cross-Domain Recommendation (denoted as PPCDR) to capture users’ preferences based on distributed multi-domain data and improve recommendation performance for all domains without privacy leakage. The main idea of PPCDR is to model, for a given user, both the global preference across multiple domains and the local preference within a specific domain, which together characterize the user’s shared and domain-specific tastes towards the items they interact with. Specifically, in the private update process of PPCDR, we design a graph transfer module for each domain to fuse global and local user preferences and update them based on local domain data. In the federated update process, we apply the local differential privacy (LDP) technique for privacy preservation, collaboratively learn global user preferences based on multi-domain data, and adapt these global preferences to heterogeneous domain data via personalized aggregation. In this way, PPCDR can effectively approximate, in a privacy-preserving way, the multi-domain training process that directly shares local interaction data. Extensive experiments on three CDR datasets demonstrate that PPCDR consistently outperforms competitive single- and cross-domain baselines and effectively protects domain privacy.
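The federated update with LDP can be sketched as follows. The sketch assumes the standard Laplace mechanism applied per coordinate (clip to [-bound, bound], then add noise of scale 2 * bound / epsilon) and uses plain averaging as a stand-in for the personalized aggregation; privacy budget accounting across coordinates and rounds is omitted, and all names are hypothetical.

```python
import random


def clip(value, bound):
    """Limit a coordinate to [-bound, bound] so its sensitivity is 2 * bound."""
    return max(-bound, min(bound, value))


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(update, bound, epsilon, rng):
    """Clip and perturb a domain's update locally, before it is uploaded."""
    scale = 2.0 * bound / epsilon
    return [clip(v, bound) + laplace_noise(scale, rng) for v in update]


def aggregate(updates):
    """Server-side coordinate-wise mean of the uploaded (noisy) updates."""
    n = len(updates)
    return [sum(column) / n for column in zip(*updates)]
```

Each domain would call `privatize` on its own update and upload only the result, so the server sees noisy vectors rather than raw interaction data; a smaller `epsilon` gives stronger privacy at the cost of noisier aggregates.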
{"title":"Privacy-Preserving Cross-Domain Recommendation with Federated Graph Learning","authors":"Changxin Tian, Yuexiang Xie, Xu Chen, Yaliang Li, Wayne Xin Zhao","doi":"10.1145/3653448","DOIUrl":"https://doi.org/10.1145/3653448","url":null,"abstract":"<p>As people inevitably interact with items across multiple domains or various platforms, cross-domain recommendation (CDR) has gained increasing attention. However, the rising privacy concerns limit the practical applications of existing CDR models since they assume that full or partial data are accessible among different domains. Recent studies on privacy-aware CDR models neglect the heterogeneity from multiple domain data and fail to achieve consistent improvements in cross-domain recommendation; thus, it remains a challenging task to conduct effective CDR in a privacy-preserving way. </p><p>In this paper, we propose a novel federated graph learning approach for <b>P</b>rivacy-<b>P</b>reserving <b>C</b>ross-<b>D</b>omain <b>R</b>ecommendation (denoted as <b>PPCDR</b>) to capture users’ preferences based on distributed multi-domain data and improve recommendation performance for all domains without privacy leakage. The main idea of PPCDR is to model both global preference among multiple domains and local preference at a specific domain for a given user, which characterizes the user’s shared and domain-specific tastes towards the items for interaction. Specifically, in the private update process of PPCDR, we design a graph transfer module for each domain to fuse global and local user preferences and update them based on local domain data. In the federated update process, through applying the local differential privacy (LDP) technique for privacy-preserving, we collaboratively learn global user preferences based on multi-domain data, and adapt these global preferences to heterogeneous domain data via personalized aggregation. 
In this way, PPCDR can effectively approximate the multi-domain training process that directly shares local interaction data in a privacy-preserving way. Extensive experiments on three CDR datasets demonstrate that PPCDR consistently outperforms competitive single- and cross-domain baselines and effectively protects domain privacy.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"13 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140201241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
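The federated update step described in the PPCDR abstract above can be illustrated with a minimal sketch. The abstract does not say which LDP mechanism or aggregation rule the authors use, so everything here is an assumption: the Laplace mechanism stands in for the LDP technique, plain averaging stands in for personalized aggregation, and the names `clip`, `perturb`, and `aggregate` are hypothetical.

```python
import random


def clip(vec, bound):
    """Clip every coordinate into [-bound, bound] so sensitivity is bounded."""
    return [max(-bound, min(bound, x)) for x in vec]


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(vec, bound, epsilon, rng):
    """Assumed LDP step: clip, then add Laplace noise before leaving the domain.

    The L1 sensitivity of a d-dimensional vector clipped to [-bound, bound]
    is 2 * bound * d, so the noise scale is that sensitivity over epsilon.
    """
    clipped = clip(vec, bound)
    scale = 2.0 * bound * len(clipped) / epsilon
    return [x + laplace_noise(scale, rng) for x in clipped]


def aggregate(reports):
    """Coordinate-wise mean of the noisy reports (plain averaging, NOT the
    personalized aggregation named in the abstract)."""
    n = len(reports)
    return [sum(column) / n for column in zip(*reports)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical per-domain embeddings of one user's global preference.
    domain_views = [[0.2, -0.4, 1.5], [0.1, -0.3, 0.9]]
    noisy = [perturb(v, bound=1.0, epsilon=4.0, rng=rng) for v in domain_views]
    print(aggregate(noisy))
```

Each domain perturbs its own vector before sharing it, so the server only ever sees noisy reports; a smaller `epsilon` gives stronger privacy at the cost of a noisier average.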
Kun Yi, Qi Zhang, Hui He, Kaize Shi, Liang Hu, Ning An, Zhendong Niu
Multivariate time series (MTS) forecasting is crucial in many real-world applications. To achieve accurate MTS forecasting, it is essential to simultaneously consider both intra- and inter-series relationships among time series data. However, previous work has typically modeled intra- and inter-series relationships separately and has disregarded multi-order interactions present within and between time series data, which can seriously degrade forecasting accuracy. In this paper, we reexamine intra- and inter-series relationships from the perspective of mutual information and accordingly construct a comprehensive relationship learning mechanism tailored to simultaneously capture the intricate multi-order intra- and inter-series couplings. Based on the mechanism, we propose a novel deep coupling network for MTS forecasting, named DeepCN, which consists of a coupling mechanism dedicated to explicitly exploring the multi-order intra- and inter-series relationships among time series data concurrently, a coupled variable representation module aimed at encoding diverse variable patterns, and an inference module facilitating predictions through one forward step. Extensive experiments conducted on seven real-world datasets demonstrate that our proposed DeepCN achieves superior performance compared with the state-of-the-art baselines.
{"title":"Deep Coupling Network For Multivariate Time Series Forecasting","authors":"Kun Yi, Qi Zhang, Hui He, Kaize Shi, Liang Hu, Ning An, Zhendong Niu","doi":"10.1145/3653447","DOIUrl":"https://doi.org/10.1145/3653447","url":null,"abstract":"<p>Multivariate time series (MTS) forecasting is crucial in many real-world applications. To achieve accurate MTS forecasting, it is essential to simultaneously consider both intra- and inter-series relationships among time series data. However, previous work has typically modeled intra- and inter-series relationships separately and has disregarded multi-order interactions present within and between time series data, which can seriously degrade forecasting accuracy. In this paper, we reexamine intra- and inter-series relationships from the perspective of mutual information and accordingly construct a comprehensive relationship learning mechanism tailored to simultaneously capture the intricate multi-order intra- and inter-series couplings. Based on the mechanism, we propose a novel deep coupling network for MTS forecasting, named DeepCN, which consists of a coupling mechanism dedicated to explicitly exploring the multi-order intra- and inter-series relationships among time series data concurrently, a coupled variable representation module aimed at encoding diverse variable patterns, and an inference module facilitating predictions through one forward step. 
Extensive experiments conducted on seven real-world datasets demonstrate that our proposed DeepCN achieves superior performance compared with the state-of-the-art baselines.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"7 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140201661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Research on search result diversification strives to enhance the variety of subtopics within the list of search results. Existing studies usually treat a document as a whole and represent it with one fixed-length vector. However, because a long document can cover different aspects of a query, a single vector is usually insufficient to represent it. To tackle this problem, we propose to exploit multiple passages to better represent documents in search result diversification. Different passages of a document may reflect different subtopics of the query, and comparing passages can improve result diversity. Specifically, we segment the entire document into multiple passages and train a classifier to filter out the irrelevant ones. Document diversity is then measured based on the passages that can satisfy the information needs of the query. Thereafter, we devise a passage-aware search result diversification framework that takes into account the topic information contained in the selected document sequence and the candidate documents. The novelty of each candidate document is evaluated based on its passages while considering the dynamically selected document sequence. We conducted experiments on a commonly used dataset, and the results indicate that our proposed method outperforms the leading methods.
{"title":"Passage-aware Search Result Diversification","authors":"Zhan Su, Zhicheng Dou, Yutao Zhu, Ji-Rong Wen","doi":"10.1145/3653672","DOIUrl":"https://doi.org/10.1145/3653672","url":null,"abstract":"<p>Research on search result diversification strives to enhance the variety of subtopics within the list of search results. Existing studies usually treat a document as a whole and represent it with one fixed-length vector. However, considering that a long document could cover different aspects of a query, using a single vector to represent the document is usually insufficient. To tackle this problem, we propose to exploit multiple passages to better represent documents in search result diversification. Different passages of each document may reflect different subtopics of the query and comparison among the passages can improve result diversity. Specifically, we segment the entire document into multiple passages and train a classifier to filter out the irrelevant ones. Then the document diversity is measured based on several passages that can offer the information needs of the query. Thereafter, we devise a passage-aware search result diversification framework that takes into account the topic information contained in the selected document sequence and candidate documents. The candidate documents’ novelty is evaluated based on their passages while considering the dynamically selected document sequence. 
We conducted experiments on a commonly utilized dataset, and the results indicate that our proposed method performs better than the most leading methods.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"266 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140201405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
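The passage-level pipeline described in the diversification abstract above (segment, filter, then score candidates against the documents already selected) can be sketched as follows. This is an illustration under stated assumptions, not the authors' framework: fixed-size word windows stand in for passage segmentation, query-term overlap stands in for the trained relevance classifier, Jaccard distance stands in for the learned novelty score, and all function names are hypothetical.

```python
def split_passages(text, size=4):
    """Segment a document into fixed-size word windows (as word sets)."""
    words = text.lower().split()
    return [set(words[i:i + size]) for i in range(0, len(words), size)]


def relevant(passages, query):
    """Stand-in for the trained classifier: keep passages sharing a query term."""
    query_terms = set(query.lower().split())
    return [p for p in passages if p & query_terms]


def jaccard(a, b):
    """Jaccard similarity of two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def novelty(candidate, selected):
    """Novelty of a candidate's most novel passage versus all selected passages."""
    if not candidate:
        return 0.0
    if not selected:
        return 1.0
    return max(1.0 - max(jaccard(p, s) for s in selected) for p in candidate)


def diversify(docs, query, k):
    """Greedily pick k documents, re-scoring novelty after every selection."""
    pool = {name: relevant(split_passages(text), query)
            for name, text in docs.items()}
    ranking, seen = [], []
    while pool and len(ranking) < k:
        best = max(pool, key=lambda name: novelty(pool[name], seen))
        ranking.append(best)
        seen.extend(pool.pop(best))
    return ranking


if __name__ == "__main__":
    docs = {
        "a": "jaguar car speed engine",
        "b": "jaguar car speed engine",
        "c": "jaguar cat jungle habitat",
    }
    print(diversify(docs, "jaguar", k=2))  # prints ['a', 'c']
```

Because novelty is recomputed against the growing `seen` list, the duplicate document `b` scores zero once `a` is chosen, which is the dynamic, sequence-aware behavior the abstract describes.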