ACM Transactions on Information Systems: latest articles, page 4

Target-constrained Bidirectional Planning for Generation of Target-oriented Proactive Dialogue

IF 5.6 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Information Systems

Pub Date : 2024-03-13 DOI: 10.1145/3652598

Jian Wang, Dongding Lin, Wenjie Li

Target-oriented proactive dialogue systems aim to lead conversations from a dialogue context toward a pre-determined target, such as making recommendations on designated items or introducing new specific topics. To this end, it is critical for such dialogue systems to plan reasonable actions to drive the conversation proactively, and meanwhile, to plan appropriate topics to move the conversation forward to the target topic smoothly. In this work, we mainly focus on effective dialogue planning for target-oriented dialogue generation. Inspired by decision-making theories in cognitive science, we propose a novel target-constrained bidirectional planning (TRIP) approach, which plans an appropriate dialogue path by looking ahead and looking back. By formulating the planning as a generation task, our TRIP bidirectionally generates a dialogue path consisting of a sequence of <action, topic> pairs using two Transformer decoders. They are expected to supervise each other and converge on consistent actions and topics by minimizing the decision gap and contrastive generation of targets. Moreover, we propose a target-constrained decoding algorithm with a bidirectional agreement to better control the planning process. Subsequently, we adopt the planned dialogue paths to guide dialogue generation in a pipeline manner, where we explore two variants: prompt-based generation and plan-controlled generation. Extensive experiments are conducted on two challenging dialogue datasets, which are re-purposed for exploring target-oriented dialogue. Our automatic and human evaluations demonstrate that the proposed methods significantly outperform various baseline models.

Citations: 0

Towards Unified Representation Learning for Career Mobility Analysis with Trajectory Hypergraph

IF 5.6 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Information Systems

Pub Date : 2024-03-06 DOI: 10.1145/3651158

Rui Zha, Ying Sun, Chuan Qin, Le Zhang, Tong Xu, Hengshu Zhu, Enhong Chen

Career mobility analysis aims at understanding the occupational movement patterns of talents across distinct labor market entities, which enables a wide range of talent-centered applications, such as job recommendation, labor demand forecasting, and company competitive analysis. Existing studies in this field mainly focus on a single fixed scale, either investigating individual trajectories at the micro-level or crowd flows among market entities at the macro-level. Consequently, the intrinsic cross-scale interactions between talents and the labor market are largely overlooked. To bridge this gap, we propose UniTRep, a novel unified representation learning framework for cross-scale career mobility analysis. Specifically, we first introduce a trajectory hypergraph structure to organize the career mobility patterns in a low-information-loss manner, where market entities and talent trajectories are represented as nodes and hyperedges, respectively. Then, for learning the market-aware talent representations, we attentively propagate the node information to the hyperedges and incorporate the market contextual features into the process of individual trajectory modeling. For learning the trajectory-enhanced market representations, we aggregate the message from hyperedges associated with a specific node to integrate the fine-grained semantics of trajectories into labor market modeling. Moreover, we design two auxiliary tasks to optimize both intra-scale and cross-scale learning with a self-supervised strategy. Extensive experiments on a real-world dataset clearly validate that UniTRep can significantly outperform state-of-the-art baselines for various tasks.
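
The trajectory hypergraph described above can be illustrated with a minimal sketch: market entities are nodes, each talent trajectory is a hyperedge, and information flows node-to-hyperedge and then hyperedge-to-node. The company names, talent IDs, and 2-d features are hypothetical, and plain averaging stands in for the attentive propagation the abstract mentions, so this is not the authors' UniTRep implementation.

```python
from collections import defaultdict

# Hypothetical toy data: each trajectory (hyperedge) lists the companies (nodes) one person worked at.
trajectories = {
    "talent_a": ["acme", "globex", "initech"],
    "talent_b": ["globex", "initech"],
    "talent_c": ["acme", "umbrella"],
}

# Hypothetical 2-d feature vectors for the market entities.
node_features = {
    "acme": [1.0, 0.0],
    "globex": [0.0, 1.0],
    "initech": [1.0, 1.0],
    "umbrella": [0.0, 0.0],
}


def mean(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def node_to_hyperedge(trajectories, node_features):
    """Talent representation: average the features of the nodes on each hyperedge.
    (Assumption: uniform averaging replaces the paper's attention weights.)"""
    return {t: mean([node_features[c] for c in path]) for t, path in trajectories.items()}


def hyperedge_to_node(trajectories, edge_features):
    """Market representation: average the features of the hyperedges incident to each node."""
    incident = defaultdict(list)
    for t, path in trajectories.items():
        for c in set(path):
            incident[c].append(edge_features[t])
    return {c: mean(vs) for c, vs in incident.items()}


talent_repr = node_to_hyperedge(trajectories, node_features)
market_repr = hyperedge_to_node(trajectories, talent_repr)
```

Because a hyperedge can connect any number of nodes, a whole trajectory stays a single object instead of being split into pairwise transitions, which is one way to read the abstract's "low-information-loss" claim.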

Citations: 0

Invisible Black-Box Backdoor Attack against Deep Cross-Modal Hashing Retrieval

IF 5.6 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Information Systems

Pub Date : 2024-03-02 DOI: 10.1145/3650205

Tianshi Wang, Fengling Li, Lei Zhu, Jingjing Li, Zheng Zhang, Heng Tao Shen

Deep cross-modal hashing has advanced multi-modal retrieval thanks to its excellent efficiency and low storage cost, but its vulnerability to backdoor attacks is rarely studied. Notably, current deep cross-modal hashing methods inevitably require large-scale training data, so poisoned samples with imperceptible triggers can easily be camouflaged in the training data to bury backdoors in the victim model. Nevertheless, existing backdoor attacks focus on the uni-modal vision domain, and the multi-modal gap and hash quantization weaken their attack performance. To address these challenges, we undertake an invisible black-box backdoor attack against deep cross-modal hashing retrieval in this paper. To the best of our knowledge, this is the first attempt in this research field. Specifically, we develop a flexible trigger generator to generate the attacker’s specified triggers, which learns the sample semantics of the non-poisoned modality to bridge the cross-modal attack gap. Then, we devise an input-aware injection network, which embeds the generated triggers into benign samples in a sample-specific, stealthy form and realizes cross-modal semantic interaction between triggers and poisoned samples. Because the attack assumes no knowledge of the victim model, we allow any cross-modal hashing knockoff to facilitate the black-box backdoor attack and alleviate the attack weakening caused by hash quantization. Moreover, we propose a confusing perturbation and mask strategy to induce high-performance victim models to focus on imperceptible triggers in poisoned samples. Extensive experiments on benchmark datasets demonstrate that our method achieves state-of-the-art attack performance against deep cross-modal hashing retrieval. In addition, we investigate the influence of transferable attacks, few-shot poisoning, multi-modal poisoning, perceptibility, and potential defenses on backdoor attacks. Our code and datasets are available at https://github.com/tswang0116/IB3A.

Citations: 0

Few-shot Learning for Heterogeneous Information Networks

IF 5.6 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Information Systems

Pub Date : 2024-02-27 DOI: 10.1145/3649311

Yang Fang, Xiang Zhao, Weidong Xiao, Maarten de Rijke

Heterogeneous information networks (HINs) are a key resource in many domain-specific retrieval and recommendation scenarios, and in conversational environments. Current approaches to mining graph data often rely on abundant supervised information. However, supervised signals for graph learning tend to be scarce for a new task and only a handful of labeled nodes may be available. Meta-learning mechanisms are able to harness prior knowledge that can be adapted to new tasks.

In this paper, we design a meta-learning framework, called META-HIN, for few-shot learning problems on HINs. To the best of our knowledge, we are among the first to design a unified framework to realize the few-shot learning of HINs and facilitate different downstream tasks across different domains of graphs. Unlike most previous models, which focus on a single task on a single graph, META-HIN is able to deal with different tasks (node classification, link prediction, and anomaly detection are used as examples) across multiple graphs. Subgraphs are sampled to build the support and query set. Before being processed by the meta-learning module, subgraphs are modeled via a structure module to capture structural features. Then, a heterogeneous GNN module is used as the base model to express the features of subgraphs. We also design a GAN-based contrastive learning module that is able to exploit unsupervised information of the subgraphs.

In our experiments, we fuse several datasets from multiple domains to verify META-HIN’s broad applicability in a multiple-graph scenario. META-HIN consistently and significantly outperforms state-of-the-art alternatives on every task and across all datasets that we consider.
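
The support and query sets mentioned above follow the usual episodic setup in few-shot learning. The sketch below shows one way such an episode could be drawn. It samples labeled nodes only, whereas the abstract says subgraphs are sampled, so extracting a subgraph around each node is left out; the function name `sample_episode`, its parameters, and the toy labels are hypothetical and not taken from META-HIN.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Return (support, query) lists of (node, label) pairs for one few-shot task."""
    by_class = defaultdict(list)
    for node, label in labels.items():
        by_class[label].append(node)
    # Only classes with enough nodes to fill both sets are eligible.
    eligible = sorted(c for c, nodes in by_class.items() if len(nodes) >= k_shot + n_query)
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(sorted(by_class[c]), k_shot + n_query)
        support += [(n, c) for n in picked[:k_shot]]   # few labeled examples to adapt on
        query += [(n, c) for n in picked[k_shot:]]     # held-out examples to evaluate on
    return support, query


# Hypothetical labeled nodes of a small bibliographic network.
labels = {f"paper_{i}": ("db", "ir", "ml")[i % 3] for i in range(12)}
support, query = sample_episode(labels, n_way=2, k_shot=1, n_query=2, rng=random.Random(0))
```

A meta-learner would adapt its base model on `support` and be scored on `query`, repeating this over many episodes drawn from different graphs and tasks.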

Citations: 0

Filter-based Stance Network for Rumor Verification

IF 5.6 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Information Systems

Pub Date : 2024-02-26 DOI: 10.1145/3649462

Jun Li, Yi Bin, Yunshan Ma, Yang Yang, Zi Huang, Tat-Seng Chua

Rumor verification on social media aims to identify the truth value of a rumor, which is important for reducing its harmful effects on the public. A rumor might arouse heated discussions and replies, conveying different stances of users that could be helpful in identifying the rumor. Thus, several works have been proposed to verify a rumor by modelling its entire stance sequence in the time domain. However, these works ignore that such a stance sequence could be decomposed into controversies with different intensities, which could be used to cluster the stance sequences with the same consensus. Besides, existing stance extractors fail to consider both the impact of all the previously posted tweets and the reply chain when obtaining the stance of a new reply. To address these problems, we propose a novel stance-based network that aggregates the controversies of the stance sequence for rumor verification, termed Filter-based Stance Network (FSNet). Because controversies with different intensities are reflected as different changes of stance, they are convenient to represent in the frequency domain but hard to represent in the time domain. Our proposed FSNet decomposes the stance sequence into multiple controversies in the frequency domain and obtains their weighted aggregation. Specifically, FSNet consists of two modules: the stance extractor and the filter block. To obtain better stance features toward the source, the stance extractor contains two stages. In the first stage, the tweet representation of each reply is obtained by aggregating information from all previously posted tweets in a conversation. Then, in the second stage, the features of stance toward the source, i.e., the rumor-aware stance, are extracted with the reply chains. In the filter block module, a rumor-aware stance sequence is constructed by sorting all the tweets of a conversation in chronological order. A Fourier transform is then applied to convert the stance sequence into the frequency domain, where different frequency components reflect controversies of different intensities. Finally, a frequency filter is applied to explore the different contributions of controversies. We supervise our FSNet with both stance labels and rumor labels to strengthen the relations between rumor veracity and crowd stances. Extensive experiments on two benchmark datasets demonstrate that our model substantially outperforms all the baselines.
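
The filter block's idea of moving a chronological stance sequence into the frequency domain and reweighting its components can be sketched as follows. This is a simplified illustration under stated assumptions: each stance is a single scalar (+1 for support, -1 for denial) rather than a learned feature vector, the filter weights are fixed by hand rather than learned, and the function names are hypothetical. It is not the FSNet implementation.

```python
import cmath


def dft(x):
    """Discrete Fourier transform of a sequence (O(n^2), fine for short threads)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def filter_stance_sequence(stances, weights):
    """Scale each frequency component by its weight and return the real part of the result."""
    spectrum = dft(stances)
    filtered = [w * c for w, c in zip(weights, spectrum)]
    return [v.real for v in idft(filtered)]


# Hypothetical thread: stances flip rapidly early on, then settle on support.
stances = [1, -1, 1, -1, 1, 1, 1, 1]

unchanged = filter_stance_sequence(stances, [1.0] * 8)            # all-pass filter
consensus = filter_stance_sequence(stances, [1.0] + [0.0] * 7)    # keep only the zero-frequency term
```

Keeping only the zero-frequency component collapses the sequence to its average stance, while the higher-frequency components carry the rapid flips; a learned filter would weight these components according to how much each helps predict veracity.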

Citations: 0

Generalized Weak Supervision for Neural Information Retrieval

IF 5.6 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Information Systems

Pub Date : 2024-02-21 DOI: 10.1145/3647639

Yen-Chieh Lien, Hamed Zamani, W. Bruce Croft

Neural ranking models (NRMs) have demonstrated effective performance in several information retrieval (IR) tasks. However, training NRMs often requires large-scale training data, which is difficult and expensive to obtain. To address this issue, one can train NRMs via weak supervision, where a large dataset is automatically generated using an existing ranking model (called the weak labeler) for training NRMs. Weakly supervised NRMs can generalize from the observed data and significantly outperform the weak labeler. This paper generalizes this idea through an iterative re-labeling process, demonstrating that weakly supervised models can iteratively play the role of weak labeler and significantly improve ranking performance without using manually labeled data. The proposed Generalized Weak Supervision (GWS) solution is generic and orthogonal to the ranking model architecture. This paper offers four implementations of GWS: self-labeling, cross-labeling, joint cross- and self-labeling, and greedy multi-labeling. GWS also benefits from a query importance weighting mechanism based on query performance prediction methods to reduce noise in the generated training data. We further draw a theoretical connection between self-labeling and Expectation-Maximization. Our experiments on four retrieval benchmarks suggest that our implementations of GWS lead to substantial improvements compared to weak supervision if the weak labeler is sufficiently reliable.
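
The iterative re-labeling process described above, in its self-labeling form, amounts to a loop in which each trained model becomes the labeler for the next round. The sketch below shows only that control flow. The term-overlap weak labeler, the one-feature least-squares "ranker", and all function names are hypothetical stand-ins for a real weak labeler such as an existing ranking model and for a neural ranker, and query importance weighting is omitted.

```python
def overlap(query, doc):
    """Number of distinct terms shared by query and document."""
    return len(set(query.split()) & set(doc.split()))


def weak_labeler(query, doc):
    """Hypothetical weak labeler: raw term overlap as a relevance score."""
    return float(overlap(query, doc))


def train_linear_ranker(pairs, labels):
    """Fit score = slope * (overlap / doc length) + intercept by least squares."""
    xs = [overlap(q, d) / len(d.split()) for q, d in pairs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(labels) / n
    var = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, labels))
    slope = cov / var if var else 0.0
    intercept = mean_y - slope * mean_x
    return lambda q, d: slope * overlap(q, d) / len(d.split()) + intercept


def self_labeling(initial_labeler, train, pairs, rounds):
    """Each round: label the pairs with the current labeler, train a new model, promote it."""
    labeler = initial_labeler
    for _ in range(rounds):
        labels = [labeler(q, d) for q, d in pairs]
        labeler = train(pairs, labels)
    return labeler


pairs = [
    ("neural ranking", "neural ranking models for retrieval"),
    ("neural ranking", "weak supervision for ranking"),
    ("neural ranking", "cooking recipes for dinner"),
]
ranker = self_labeling(weak_labeler, train_linear_ranker, pairs, rounds=3)
scores = [ranker(q, d) for q, d in pairs]
```

In this toy the student has the same capacity in every round, so later rounds change nothing; the abstract's claim is that with neural rankers, which generalize beyond the weak labels, repeating the loop improves ranking quality as long as the initial weak labeler is sufficiently reliable.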

Citations: 0

Improving Semi-Supervised Text Classification with Dual Meta-Learning

IF 5.6 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Information Systems

Pub Date : 2024-02-20 DOI: 10.1145/3648612

Shujie Li, Guanghu Yuan, Min Yang, Ying Shen, Chengming Li, Ruifeng Xu, Xiaoyan Zhao

The goal of semi-supervised text classification (SSTC) is to train a model on both a small amount of labeled data and a large amount of unlabeled data, such that the learned semi-supervised classifier performs better than a supervised classifier trained solely on the labeled samples. Pseudo-labeling, one of the most widely used SSTC techniques, trains a teacher classifier on a small number of labeled examples to predict pseudo labels for the unlabeled data. The pseudo-labeled examples are then used to train a student classifier, with the aim that the student outperforms the teacher. However, the predicted pseudo labels may be inaccurate, which degrades the student classifier, in some cases to the point where it performs worse than the teacher. To alleviate this issue, we introduce a dual meta-learning (DML) technique for semi-supervised text classification, which improves the teacher and student classifiers simultaneously in an iterative manner. Specifically, we propose a meta-noise correction method that improves the student classifier by introducing a Noise Transition Matrix (NTM) with meta-learning to rectify the noisy pseudo labels. In addition, we devise a meta pseudo supervision method to improve the teacher classifier: we exploit feedback on the student classifier's performance to guide the teacher classifier toward more accurate pseudo labels for the unlabeled data. In this way, the teacher and student classifiers co-evolve during iterative training. Extensive experiments on four benchmark datasets highlight the effectiveness of our DML method against existing state-of-the-art methods for semi-supervised text classification. Our code and data are publicly available at https://github.com/GRIT621/DML.
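
To make the role of a noise transition matrix concrete, the sketch below shows the general idea only: a row-stochastic matrix maps a classifier's clean-class probabilities to the distribution expected over noisy pseudo labels. The function name, the two-class setup, and the matrix values are illustrative assumptions; the abstract does not specify how DML parameterizes or meta-learns its NTM, and no learning is done here.

```python
def noisy_label_distribution(clean_probs, transition):
    """Map clean-class probabilities to the expected pseudo-label distribution.

    transition[i][j] is the assumed probability that true class i is
    observed as pseudo label j, so every row sums to 1.
    """
    n_classes = len(transition)
    return [
        sum(clean_probs[i] * transition[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]


# Hypothetical values: the classifier is 90% sure of class 0, and the
# pseudo-labeler is assumed to flip class 0 to class 1 in 20% of cases.
clean_probs = [0.9, 0.1]
transition = [
    [0.8, 0.2],
    [0.3, 0.7],
]
noisy_probs = noisy_label_distribution(clean_probs, transition)
```

Training against `noisy_probs` rather than `clean_probs` lets the loss be computed on noisy pseudo labels while the underlying classifier still models the clean classes.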

Citations: 0

Revisiting Bag of Words Document Representations for Efficient Ranking with Transformers

IF 5.6 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Information Systems

Pub Date : 2024-02-09 DOI: 10.1145/3640460

David Rau, Mostafa Dehghani, Jaap Kamps

Modern transformer-based information retrieval models achieve state-of-the-art performance across various benchmarks. The self-attention of transformer models is a powerful mechanism for contextualizing terms over the whole input, but it quickly becomes prohibitively expensive for the long inputs required in document retrieval. Instead of focusing on the model itself to improve efficiency, this paper explores different bag-of-words document representations that encode full documents by only a fraction of their characteristic terms, allowing us to control and reduce the input length. We experiment with various models for document retrieval on MS MARCO data, as well as zero-shot document retrieval on Robust04, and show large gains in efficiency while retaining reasonable effectiveness. The inference-time efficiency gains lower both time and memory complexity in a controllable way, allowing memory footprint and query latency to be traded off further. More generally, this line of research connects traditional IR models with neural “NLP” models and offers novel, elegant ways to explore the space between (efficient, but less effective) traditional rankers and (effective, but less efficient) neural rankers.
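
One way to picture a reduced bag-of-words representation is to keep only the top-scoring terms of each document before passing it to a ranker. The sketch below is an assumption-laden illustration: it uses a smoothed TF-IDF score as the notion of "characteristic", which the abstract does not confirm as the paper's selection method, and the function names and toy corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def characteristic_terms(doc, corpus, k):
    """Return the k highest-scoring TF-IDF terms of `doc`, highest first."""
    n_docs = len(corpus)
    # Document frequency: number of corpus documents containing each term.
    df = Counter()
    for other in corpus:
        df.update(set(tokenize(other)))
    tf = Counter(tokenize(doc))
    # Smoothed IDF (an assumption) keeps the score finite for unseen terms.
    scores = {
        term: count * math.log((1 + n_docs) / (1 + df[term]))
        for term, count in tf.items()
    }
    # Sort by score (descending); break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "transformers rank documents with self attention",
]
reduced = characteristic_terms(corpus[0], corpus, k=3)
```

The parameter `k` plays the role of the controllable input length: a smaller `k` shortens the ranker's input, and with it the cost of self-attention, at the risk of discarding terms that matter for relevance.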

Citations: 0

Token-Event-Role Structure-based Multi-Channel Document-Level Event Extraction

IF 5.6 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Information Systems

Pub Date : 2024-02-07 DOI: 10.1145/3643885

Qizhi Wan, Changxuan Wan, Keli Xiao, Hui Xiong, Dexi Liu, Xiping Liu, Rong Hu

Document-level event extraction is a long-standing, challenging information retrieval problem involving a sequence of sub-tasks: entity extraction, event type judgment, and event type-specific multi-event extraction. However, addressing the problem as multiple learning tasks increases model complexity. Also, existing methods make insufficient use of the correlation of entities across different events, which limits event extraction performance. This paper introduces a novel framework for document-level event extraction, incorporating a new data structure called token-event-role and a multi-channel argument role prediction module. The proposed data structure enables our model to uncover the primary role of tokens in multiple events, facilitating a more comprehensive understanding of event relationships. By leveraging the multi-channel prediction module, we transform entity and multi-event extraction into a single task of predicting token-event pairs, thereby reducing the overall parameter size and enhancing model efficiency. The results demonstrate that our approach outperforms the state-of-the-art method by 9.5 percentage points in terms of F1 score. Furthermore, an ablation study confirms the significant value of the proposed data structure for event extraction.

Citations: 0

Transferring Causal Mechanism over Meta-representations for Target-unknown Cross-domain Recommendation

IF 5.6 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Information Systems

Pub Date : 2024-02-01 DOI: 10.1145/3643807

Shengyu Zhang, Qiaowei Miao, Ping Nie, Mengze Li, Zhengyu Chen, Fuli Feng, Kun Kuang, Fei Wu

Tackling the pervasive issue of data sparsity in recommender systems, we present an insightful investigation into the burgeoning area of non-overlapping cross-domain recommendation, a technique that facilitates the transfer of interaction knowledge across domains without necessitating inter-domain user/item correspondence. Existing approaches have predominantly depended on auxiliary information, such as user reviews and item tags, to establish inter-domain connectivity, but these resources may become inaccessible due to privacy and commercial constraints.

To address these limitations, our study introduces an in-depth exploration of Target-unknown Cross-domain Recommendation, which contends with the distinct challenge of lacking target domain information during the training phase in the source domain. We illustrate two critical obstacles inherent to Target-unknown CDR: the lack of an inter-domain bridge due to insufficient user/item correspondence or side information, and the potential pitfalls of source-domain training biases when confronting distribution shifts across domains. To surmount these obstacles, we propose the CMCDR framework, a novel approach that leverages causal mechanisms extracted from meta-user/item representations. The CMCDR framework employs a vector-quantized encoder-decoder architecture, enabling the disentanglement of user/item characteristics. We posit that domain-transferable knowledge is more readily discernible from user/item characteristics, i.e., the meta-representations, rather than raw users and items. Capitalizing on these meta-representations, our CMCDR framework adeptly incorporates an attention-driven predictor that approximates the front-door adjustment method grounded in causal theory. This cutting-edge strategy effectively mitigates source-domain training biases and enhances generalization capabilities against distribution shifts. Extensive experiments demonstrate the empirical effectiveness and the rationality of CMCDR for target-unknown cross-domain recommendation.
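
For readers unfamiliar with the front-door adjustment mentioned above, its standard textbook form is given here as background. The symbols X (treatment), M (mediator), and Y (outcome) are generic; how CMCDR maps users, items, and meta-representations onto them, and how its attention-driven predictor approximates the sums, is not specified in the abstract.

```latex
P\bigl(Y = y \mid \mathrm{do}(X = x)\bigr)
  = \sum_{m} P(M = m \mid X = x)
    \sum_{x'} P(Y = y \mid M = m, X = x') \, P(X = x')
```

The inner sum averages over all treatment values x', which is what removes the influence of an unobserved confounder between X and Y, provided the mediator M carries the whole effect of X on Y.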

Citations: 0