Target-oriented proactive dialogue systems aim to lead conversations from a dialogue context toward a pre-determined target, such as making recommendations on designated items or introducing new specific topics. To this end, it is critical for such dialogue systems to plan reasonable actions to drive the conversation proactively, and meanwhile, to plan appropriate topics to move the conversation forward to the target topic smoothly. In this work, we mainly focus on effective dialogue planning for target-oriented dialogue generation. Inspired by decision-making theories in cognitive science, we propose a novel target-constrained bidirectional planning (TRIP) approach, which plans an appropriate dialogue path by looking ahead and looking back. By formulating the planning as a generation task, our TRIP bidirectionally generates a dialogue path consisting of a sequence of <action, topic> pairs using two Transformer decoders. They are expected to supervise each other and converge on consistent actions and topics by minimizing the decision gap and contrastive generation of targets. Moreover, we propose a target-constrained decoding algorithm with a bidirectional agreement to better control the planning process. Subsequently, we adopt the planned dialogue paths to guide dialogue generation in a pipeline manner, where we explore two variants: prompt-based generation and plan-controlled generation. Extensive experiments are conducted on two challenging dialogue datasets, which are re-purposed for exploring target-oriented dialogue. Our automatic and human evaluations demonstrate that the proposed methods significantly outperform various baseline models.
Title: Target-constrained Bidirectional Planning for Generation of Target-oriented Proactive Dialogue. Authors: Jian Wang, Dongding Lin, Wenjie Li. Journal: ACM Transactions on Information Systems. Published: 2024-03-13. DOI: https://doi.org/10.1145/3652598
Career mobility analysis aims at understanding the occupational movement patterns of talents across distinct labor market entities, which enables a wide range of talent-centered applications, such as job recommendation, labor demand forecasting, and company competitive analysis. Existing studies in this field mainly focus on a single fixed scale, either investigating individual trajectories at the micro-level or crowd flows among market entities at the macro-level. Consequently, the intrinsic cross-scale interactions between talents and the labor market are largely overlooked. To bridge this gap, we propose UniTRep, a novel unified representation learning framework for cross-scale career mobility analysis. Specifically, we first introduce a trajectory hypergraph structure to organize the career mobility patterns in a low-information-loss manner, where market entities and talent trajectories are represented as nodes and hyperedges, respectively. Then, for learning the market-aware talent representations, we attentively propagate the node information to the hyperedges and incorporate the market contextual features into the process of individual trajectory modeling. For learning the trajectory-enhanced market representations, we aggregate the message from hyperedges associated with a specific node to integrate the fine-grained semantics of trajectories into labor market modeling. Moreover, we design two auxiliary tasks to optimize both intra-scale and cross-scale learning with a self-supervised strategy. Extensive experiments on a real-world dataset clearly validate that UniTRep can significantly outperform state-of-the-art baselines for various tasks.
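The trajectory hypergraph described above can be illustrated with a minimal sketch. The entity names, talent ids, feature vectors, and helper functions below are hypothetical, and a plain mean stands in for the attentive propagation the abstract mentions; this is not the UniTRep implementation.

```python
from collections import defaultdict

# Hypothetical toy data: each talent trajectory is an ordered list of market entities.
trajectories = {
    "talent_a": ["CompanyX", "CompanyY", "CompanyZ"],
    "talent_b": ["CompanyY", "CompanyZ"],
    "talent_c": ["CompanyX", "CompanyZ"],
}

# Hypothetical node (market entity) features.
node_feats = {
    "CompanyX": [1.0, 0.0],
    "CompanyY": [0.0, 1.0],
    "CompanyZ": [1.0, 1.0],
}


def build_hypergraph(trajs):
    """Market entities are nodes; each trajectory is one hyperedge over the nodes it visits."""
    incidence = defaultdict(set)  # node -> ids of hyperedges that contain it
    for edge_id, nodes in trajs.items():
        for node in nodes:
            incidence[node].add(edge_id)
    return dict(trajs), dict(incidence)


def mean(vectors):
    """Element-wise mean of equally sized vectors."""
    vectors = list(vectors)
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def node_to_hyperedge(hyperedges, feats):
    """Talent view: summarise the nodes on each trajectory (mean used instead of attention)."""
    return {e: mean(feats[n] for n in nodes) for e, nodes in hyperedges.items()}


def hyperedge_to_node(incidence, edge_feats):
    """Market view: aggregate the trajectories that pass through each node."""
    return {n: mean(edge_feats[e] for e in sorted(edges)) for n, edges in incidence.items()}


hyperedges, incidence = build_hypergraph(trajectories)
talent_repr = node_to_hyperedge(hyperedges, node_feats)
market_repr = hyperedge_to_node(incidence, talent_repr)
```

Here `talent_repr` plays the role of the market-aware talent representations and `market_repr` that of the trajectory-enhanced market representations, in a heavily simplified form.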
Title: Towards Unified Representation Learning for Career Mobility Analysis with Trajectory Hypergraph. Authors: Rui Zha, Ying Sun, Chuan Qin, Le Zhang, Tong Xu, Hengshu Zhu, Enhong Chen. Journal: ACM Transactions on Information Systems. Published: 2024-03-06. DOI: https://doi.org/10.1145/3651158
Tianshi Wang, Fengling Li, Lei Zhu, Jingjing Li, Zheng Zhang, Heng Tao Shen
Deep cross-modal hashing has advanced multi-modal retrieval thanks to its efficiency and low storage cost, but its vulnerability to backdoor attacks has rarely been studied. Current deep cross-modal hashing methods require large-scale training data, so poisoned samples with imperceptible triggers can easily be hidden in the training data to plant backdoors in the victim model. However, existing backdoor attacks focus on the uni-modal vision domain, and the multi-modal gap and hash quantization weaken their attack performance. To address these challenges, we present an invisible black-box backdoor attack against deep cross-modal hashing retrieval. To the best of our knowledge, this is the first attempt in this research field. Specifically, we develop a flexible trigger generator that produces the attacker’s specified triggers and learns the sample semantics of the non-poisoned modality to bridge the cross-modal attack gap. We then devise an input-aware injection network, which embeds the generated triggers into benign samples in a sample-specific, stealthy form and realizes cross-modal semantic interaction between triggers and poisoned samples. Because the attacker has no knowledge of the victim model, we allow any cross-modal hashing knockoff to facilitate the black-box backdoor attack and to alleviate the weakening of the attack caused by hash quantization. Moreover, we propose a confusing perturbation and mask strategy to induce high-performance victim models to focus on the imperceptible triggers in poisoned samples. Extensive experiments on benchmark datasets demonstrate that our method achieves state-of-the-art attack performance against deep cross-modal hashing retrieval. We also investigate how transferable attacks, few-shot poisoning, multi-modal poisoning, perceptibility, and potential defenses influence backdoor attacks. Our code and datasets are available at https://github.com/tswang0116/IB3A.
Title: Invisible Black-Box Backdoor Attack against Deep Cross-Modal Hashing Retrieval. Journal: ACM Transactions on Information Systems. Published: 2024-03-02. DOI: https://doi.org/10.1145/3650205
Yang Fang, Xiang Zhao, Weidong Xiao, Maarten de Rijke
Heterogeneous information networks (HINs) are a key resource in many domain-specific retrieval and recommendation scenarios, and in conversational environments. Current approaches to mining graph data often rely on abundant supervised information. However, supervised signals for graph learning tend to be scarce for a new task and only a handful of labeled nodes may be available. Meta-learning mechanisms are able to harness prior knowledge that can be adapted to new tasks.
In this paper, we design a meta-learning framework, called META-HIN, for few-shot learning problems on HINs. To the best of our knowledge, we are among the first to design a unified framework to realize the few-shot learning of HINs and facilitate different downstream tasks across different domains of graphs. Unlike most previous models, which focus on a single task on a single graph, META-HIN is able to deal with different tasks (node classification, link prediction, and anomaly detection are used as examples) across multiple graphs. Subgraphs are sampled to build the support and query set. Before being processed by the meta-learning module, subgraphs are modeled via a structure module to capture structural features. Then, a heterogeneous GNN module is used as the base model to express the features of subgraphs. We also design a GAN-based contrastive learning module that is able to exploit unsupervised information of the subgraphs.
In our experiments, we fuse several datasets from multiple domains to verify META-HIN’s broad applicability in a multiple-graph scenario. META-HIN consistently and significantly outperforms state-of-the-art alternatives on every task and across all datasets that we consider.
Title: Few-shot Learning for Heterogeneous Information Networks. Journal: ACM Transactions on Information Systems. Published: 2024-02-27. DOI: https://doi.org/10.1145/3649311
Jun Li, Yi Bin, Yunshan Ma, Yang Yang, Zi Huang, Tat-Seng Chua
Rumor verification on social media aims to identify the truth value of a rumor, which is important for reducing its detrimental public effects. A rumor may arouse heated discussions and replies that convey users' different stances, which can help in identifying the rumor. Several works have therefore been proposed to verify a rumor by modelling its entire stance sequence in the time domain. However, these works ignore that such a stance sequence can be decomposed into controversies of different intensities, which can be used to cluster stance sequences that share the same consensus. In addition, existing stance extractors fail to consider both the impact of all previously posted tweets and the reply chain when obtaining the stance of a new reply. To address these problems, we propose a novel stance-based network, termed Filter-based Stance Network (FSNet), that aggregates the controversies of the stance sequence for rumor verification. Because controversies of different intensities are reflected in different patterns of stance change, they are convenient to represent in the frequency domain but hard to represent in the time domain. FSNet decomposes the stance sequence into multiple controversies in the frequency domain and computes a weighted aggregation of them. Specifically, FSNet consists of two modules: the stance extractor and the filter block. To obtain better stance features toward the source, the stance extractor works in two stages. In the first stage, the tweet representation of each reply is obtained by aggregating information from all previously posted tweets in the conversation. In the second stage, the features of the stance toward the source, i.e., the rumor-aware stance, are extracted using the reply chains. In the filter block, a rumor-aware stance sequence is constructed by sorting all the tweets of a conversation in chronological order. A Fourier transform then converts the stance sequence into the frequency domain, where different frequency components reflect controversies of different intensities. Finally, a frequency filter is applied to explore the different contributions of the controversies. We supervise FSNet with both stance labels and rumor labels to strengthen the relation between rumor veracity and crowd stances. Extensive experiments on two benchmark datasets demonstrate that our model substantially outperforms all the baselines.
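The filter-block idea (transform a chronologically ordered stance sequence to the frequency domain, reweight the frequency components, transform back) can be sketched as follows. This is a minimal illustration under stated assumptions: stances are reduced to hypothetical scalar scores, the filter weights are fixed by hand rather than learned, and the function names are not taken from FSNet.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def frequency_filter(stances, weights):
    """Scale each frequency component by its weight and return to the time domain.

    Only the real part is kept; the result is exactly real when the weights are
    symmetric (weights[k] == weights[n - k] for k >= 1), as in the examples below.
    """
    spectrum = dft(stances)
    filtered = [w * c for w, c in zip(weights, spectrum)]
    return [v.real for v in idft(filtered)]


# Hypothetical stance scores in chronological order (+1 support, -1 deny).
# The first half alternates quickly (intense controversy), the second half is steady.
stance_scores = [1.0, -1.0, 1.0, -1.0, 0.5, 0.5, 0.5, 0.5]

# All-ones weights leave the sequence unchanged.
identity = frequency_filter(stance_scores, [1.0] * 8)

# Keeping only the zero-frequency component yields the overall mean stance.
low_pass = frequency_filter(stance_scores, [1.0] + [0.0] * 7)
```

In this toy setting, high-frequency components correspond to rapid flips between stances, so changing the weights changes how much such intense controversies contribute to the aggregated signal.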
Title: Filter-based Stance Network for Rumor Verification. Journal: ACM Transactions on Information Systems. Published: 2024-02-26. DOI: https://doi.org/10.1145/3649462
Neural ranking models (NRMs) have demonstrated effective performance in several information retrieval (IR) tasks. However, training NRMs often requires large-scale training data, which is difficult and expensive to obtain. To address this issue, one can train NRMs via weak supervision, where a large dataset is automatically generated using an existing ranking model (called the weak labeler) for training NRMs. Weakly supervised NRMs can generalize from the observed data and significantly outperform the weak labeler. This paper generalizes this idea through an iterative re-labeling process, demonstrating that weakly supervised models can iteratively play the role of weak labeler and significantly improve ranking performance without using manually labeled data. The proposed Generalized Weak Supervision (GWS) solution is generic and orthogonal to the ranking model architecture. This paper offers four implementations of GWS: self-labeling, cross-labeling, joint cross- and self-labeling, and greedy multi-labeling. GWS also benefits from a query importance weighting mechanism based on query performance prediction methods to reduce noise in the generated training data. We further draw a theoretical connection between self-labeling and Expectation-Maximization. Our experiments on four retrieval benchmarks suggest that our implementations of GWS lead to substantial improvements compared to weak supervision if the weak labeler is sufficiently reliable.
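The iterative re-labeling loop can be sketched for the self-labeling case. Everything concrete here is an assumption made for illustration: a tiny linear scorer trained by stochastic gradient descent stands in for a neural ranking model, the feature vectors are invented, and the weak labeler is a hand-written function rather than an existing ranking model. The sketch shows only the control flow, not the gains reported in the paper.

```python
def score(weights, features):
    """Linear relevance score of one query-document feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def train_linear_scorer(examples, labels, epochs=200, lr=0.05):
    """Fit a linear scorer to the given labels with plain SGD on squared error."""
    weights = [0.0] * len(examples[0])
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            err = score(weights, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights


def self_labeling(examples, weak_labeler, rounds=3):
    """Each round: label the data with the current labeler, train a new model,
    then let that model act as the labeler for the next round."""
    labeler = weak_labeler
    models = []
    for _ in range(rounds):
        labels = [labeler(x) for x in examples]
        weights = train_linear_scorer(examples, labels)
        models.append(weights)
        # Bind the current weights so the next round uses this model as labeler.
        labeler = lambda x, w=weights: score(w, x)
    return models


# Hypothetical query-document feature vectors (for example, two matching signals).
examples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]]


def weak_labeler(features):
    """Hypothetical weak labeler that only looks at the first feature."""
    return features[0]


models = self_labeling(examples, weak_labeler, rounds=3)
```

The abstract also describes cross-labeling, joint cross- and self-labeling, and greedy multi-labeling, as well as query importance weighting; none of these are covered by this sketch.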
Title: Generalized Weak Supervision for Neural Information Retrieval. Authors: Yen-Chieh Lien, Hamed Zamani, W. Bruce Croft. Journal: ACM Transactions on Information Systems. Published: 2024-02-21. DOI: https://doi.org/10.1145/3647639
The goal of semi-supervised text classification (SSTC) is to train a model using both a small amount of labeled data and a large amount of unlabeled data, such that the learned semi-supervised classifier performs better than a supervised classifier trained solely on the labeled samples. Pseudo-labeling, one of the most widely used SSTC techniques, trains a teacher classifier on a small number of labeled examples to predict pseudo labels for the unlabeled data. The generated pseudo-labeled examples are then used to train a student classifier, such that the learned student classifier can outperform the teacher classifier. Nevertheless, the predicted pseudo labels may be inaccurate, which degrades the performance of the student classifier; the student may even perform worse than the teacher. To alleviate this issue, we introduce a dual meta-learning (DML) technique for semi-supervised text classification, which improves the teacher and student classifiers simultaneously in an iterative manner. Specifically, we propose a meta-noise correction method that improves the student classifier by introducing a Noise Transition Matrix (NTM) with meta-learning to rectify the noisy pseudo labels. In addition, we devise a meta pseudo supervision method to improve the teacher classifier: we exploit performance feedback from the student classifier to guide the teacher classifier toward producing more accurate pseudo labels for the unlabeled data. In this way, the teacher and student classifiers co-evolve during the iterative training process. Extensive experiments on four benchmark datasets highlight the effectiveness of our DML method against existing state-of-the-art methods for semi-supervised text classification. We release the code and data for this paper publicly at https://github.com/GRIT621/DML.
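How a noise transition matrix can rectify noisy pseudo labels is easiest to see in a small numeric sketch. The code below uses standard forward loss correction with a fixed, hand-picked matrix; the abstract says the NTM is obtained with meta-learning, which is not reproduced here, and all names and numbers are hypothetical.

```python
import math


def forward_correct(clean_probs, transition):
    """Map the student's clean-label distribution to a noisy-label distribution.

    transition[i][j] is the assumed probability that true class i shows up as
    pseudo label j.
    """
    k = len(transition)
    return [sum(clean_probs[i] * transition[i][j] for i in range(k)) for j in range(k)]


def corrected_nll(clean_probs, pseudo_label, transition):
    """Negative log-likelihood of the pseudo label under the corrected distribution."""
    noisy_probs = forward_correct(clean_probs, transition)
    return -math.log(noisy_probs[pseudo_label])


# Hypothetical two-class transition matrix: class 0 is mislabeled as 1 in 20% of
# cases, class 1 is mislabeled as 0 in 10% of cases.
ntm = [[0.8, 0.2],
       [0.1, 0.9]]

student_probs = [0.9, 0.1]  # the student's belief about the clean label
noisy_probs = forward_correct(student_probs, ntm)

# Loss when the teacher's pseudo label is class 1, with and without correction.
loss_corrected = corrected_nll(student_probs, 1, ntm)
loss_uncorrected = -math.log(student_probs[1])
```

With the correction, a pseudo label that disagrees with a confident student is penalised less, because the matrix attributes part of the disagreement to label noise.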
Title: Improving Semi-Supervised Text Classification with Dual Meta-Learning. Authors: Shujie Li, Guanghu Yuan, Min Yang, Ying Shen, Chengming Li, Ruifeng Xu, Xiaoyan Zhao. Journal: ACM Transactions on Information Systems. Published: 2024-02-20. DOI: https://doi.org/10.1145/3648612
Modern transformer-based information retrieval models achieve state-of-the-art performance across various benchmarks. The self-attention of transformer models is a powerful mechanism for contextualizing terms over the whole input, but it quickly becomes prohibitively expensive for the long inputs required in document retrieval. Instead of improving efficiency by changing the model itself, this paper explores different bag-of-words document representations that encode a full document using only a fraction of its characteristic terms, which lets us control and reduce the input length. We experiment with various models for document retrieval on MS MARCO data, as well as zero-shot document retrieval on Robust04, and show large gains in efficiency while retaining reasonable effectiveness. The inference-time efficiency gains lower both time and memory complexity in a controllable way, which allows memory footprint to be traded off against query latency. More generally, this line of research connects traditional IR models with neural “NLP” models and offers novel ways to explore the space between traditional rankers (efficient, but less effective) and neural rankers (effective, but less efficient).
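To make the idea of encoding a document by a fraction of its characteristic terms concrete, here is a minimal sketch that keeps only the `k` highest-scoring terms of each document. Scoring terms by TF-IDF, the whitespace tokenizer, and the function names are assumptions made for illustration; the abstract does not say which term-selection criteria the paper uses.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple stand-in)."""
    return text.lower().split()


def bow_representation(docs, k):
    """Reduce each document to its k highest-scoring TF-IDF terms.

    Returns one list of terms per document, ordered by descending score;
    ties are broken alphabetically so the output is deterministic.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    reduced = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            term: count * math.log((1 + n_docs) / (1 + df[term]))
            for term, count in tf.items()
        }
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        reduced.append(ranked[:k])
    return reduced


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
reduced = bow_representation(docs, k=2)
```

Here `k` is the knob that bounds the input length passed to a ranker: a smaller `k` means a shorter input and lower cost, at the risk of dropping terms that matter for relevance.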
{"title":"Revisiting Bag of Words Document Representations for Efficient Ranking with Transformers","authors":"David Rau, Mostafa Dehghani, Jaap Kamps","doi":"10.1145/3640460","DOIUrl":"https://doi.org/10.1145/3640460","url":null,"abstract":"<p>Modern transformer-based information retrieval models achieve state-of-the-art performance across various benchmarks. The self-attention of the transformer models is a powerful mechanism to contextualize terms over the whole input but quickly becomes prohibitively expensive for long input as required in document retrieval. Instead of focusing on the model itself to improve efficiency, this paper explores different bag of words document representations that encode full documents by only a fraction of their characteristic terms, allowing us to control and reduce the input length. We experiment with various models for document retrieval on MS MARCO data, as well as zero-shot document retrieval on Robust04, and show large gains in efficiency while retaining reasonable effectiveness. Inference time efficiency gains are both lowering the time and memory complexity in a controllable way, allowing for further trading off memory footprint and query latency. 
More generally, this line of research connects traditional IR models with neural “NLP” models and offers novel ways to explore the space between (efficient, but less effective) traditional rankers and (effective, but less efficient) neural rankers elegantly.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"60 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qizhi Wan, Changxuan Wan, Keli Xiao, Hui Xiong, Dexi Liu, Xiping Liu, Rong Hu
Document-level event extraction is a long-standing and challenging information retrieval problem that involves a sequence of sub-tasks: entity extraction, event type judgment, and event type-specific multi-event extraction. However, addressing the problem as multiple learning tasks increases model complexity. In addition, existing methods make insufficient use of the correlations among entities that appear across different events, which limits event extraction performance. This paper introduces a novel framework for document-level event extraction that incorporates a new data structure called token-event-role and a multi-channel argument role prediction module. The proposed data structure enables our model to uncover the primary role of tokens in multiple events, facilitating a more comprehensive understanding of event relationships. By leveraging the multi-channel prediction module, we transform entity and multi-event extraction into a single task of predicting token-event pairs, thereby reducing the overall parameter size and enhancing model efficiency. The results demonstrate that our approach outperforms the state-of-the-art method by 9.5 percentage points in terms of the F1 score. Furthermore, an ablation study confirms that the proposed data structure contributes significantly to the overall performance of the framework.
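One way to picture a token-event-role structure is as a table with one row per token and one column (channel) per event, where each cell holds the role that token plays in that event. The sketch below is a hypothetical layout built from gold annotations and decoded back into events; the names `build_token_event_role`, `decode_events`, and `NO_ROLE`, as well as the example sentence and role labels, are invented for illustration and are not taken from the paper.

```python
NO_ROLE = "O"  # marker for "this token plays no role in this event"


def build_token_event_role(tokens, events):
    """Build a tokens x events table of role labels.

    `events` is a list of dicts mapping a role name to a (start, end)
    token span, with `end` exclusive.
    """
    table = [[NO_ROLE] * len(events) for _ in tokens]
    for e_idx, event in enumerate(events):
        for role, (start, end) in event.items():
            for t_idx in range(start, end):
                table[t_idx][e_idx] = role
    return table


def decode_events(tokens, table):
    """Read each event column back into a role -> argument-text mapping."""
    n_events = len(table[0]) if table else 0
    events = []
    for e_idx in range(n_events):
        args = {}
        for t_idx, row in enumerate(table):
            role = row[e_idx]
            if role != NO_ROLE:
                args.setdefault(role, []).append(tokens[t_idx])
        events.append({role: " ".join(words) for role, words in args.items()})
    return events


tokens = "Acme Corp pledged 500 shares to Bank A and 300 shares to Bank B".split()
gold_events = [
    {"Pledger": (0, 2), "Shares": (3, 5), "Pledgee": (6, 8)},
    {"Pledger": (0, 2), "Shares": (9, 11), "Pledgee": (12, 14)},
]
table = build_token_event_role(tokens, gold_events)
decoded = decode_events(tokens, table)
```

In this toy document the tokens "Acme Corp" carry the same role in both event columns, which is the kind of cross-event correlation the abstract says existing methods underuse. A model trained on such a table predicts one label per token-event pair rather than running separate entity and event stages.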
{"title":"Token-Event-Role Structure-based Multi-Channel Document-Level Event Extraction","authors":"Qizhi Wan, Changxuan Wan, Keli Xiao, Hui Xiong, Dexi Liu, Xiping Liu, Rong Hu","doi":"10.1145/3643885","DOIUrl":"https://doi.org/10.1145/3643885","url":null,"abstract":"<p>Document-level event extraction is a long-standing challenging information retrieval problem involving a sequence of sub-tasks: entity extraction, event type judgment, and event type-specific multi-event extraction. However, addressing the problem as multiple learning tasks leads to increased model complexity. Also, existing methods insufficiently utilize the correlation of entities crossing different events, resulting in limited event extraction performance. This paper introduces a novel framework for document-level event extraction, incorporating a new data structure called token-event-role and a multi-channel argument role prediction module. The proposed data structure enables our model to uncover the primary role of tokens in multiple events, facilitating a more comprehensive understanding of event relationships. By leveraging the multi-channel prediction module, we transform entity and multi-event extraction into a single task of predicting token-event pairs, thereby reducing the overall parameter size and enhancing model efficiency. The results demonstrate that our approach outperforms the state-of-the-art method by 9.5 percentage points in terms of the <i>F</i>1 score, highlighting its superior performance in event extraction. 
Furthermore, an ablation study confirms the significant value of the proposed data structure in improving event extraction tasks, further validating its importance in enhancing the overall performance of the framework.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"27 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shengyu Zhang, Qiaowei Miao, Ping Nie, Mengze Li, Zhengyu Chen, Fuli Feng, Kun Kuang, Fei Wu
To address data sparsity in recommender systems, we investigate non-overlapping cross-domain recommendation, which transfers interaction knowledge across domains without requiring inter-domain user/item correspondence. Existing approaches mostly depend on auxiliary information, such as user reviews and item tags, to connect domains, but these resources may be inaccessible due to privacy and commercial constraints.
To address these limitations, we study Target-unknown Cross-domain Recommendation (CDR), in which no target-domain information is available while training in the source domain. We identify two obstacles inherent to Target-unknown CDR: the lack of an inter-domain bridge, caused by insufficient user/item correspondence or side information, and source-domain training biases that hurt performance under distribution shifts across domains. To overcome these obstacles, we propose the CMCDR framework, which leverages causal mechanisms extracted from meta-user/item representations. CMCDR employs a vector-quantized encoder-decoder architecture to disentangle user/item characteristics. We posit that domain-transferable knowledge is more readily discernible from user/item characteristics, i.e., the meta-representations, than from raw users and items. Building on these meta-representations, CMCDR incorporates an attention-driven predictor that approximates the front-door adjustment method from causal theory. This strategy mitigates source-domain training biases and improves generalization under distribution shifts. Extensive experiments demonstrate the empirical effectiveness and the rationality of CMCDR for target-unknown cross-domain recommendation.
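For readers unfamiliar with the front-door adjustment mentioned above, its standard textbook form is given below. The symbols are generic: $X$ is the treatment, $Y$ the outcome, and $M$ a mediator that carries the whole effect of $X$ on $Y$. How CMCDR maps users, items, and meta-representations onto these variables, and how its attention-driven predictor approximates the sums, is not stated in the abstract, so no such mapping is assumed here.

```latex
P\bigl(y \mid \mathrm{do}(x)\bigr)
  = \sum_{m} P(m \mid x) \sum_{x'} P\bigl(y \mid x', m\bigr)\, P(x')
```

The inner sum averages the outcome over all treatment values $x'$ at a fixed mediator value, which removes the influence of unobserved confounders between $X$ and $Y$. The outer sum then weights each mediator value by how likely it is under the chosen $x$.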
{"title":"Transferring Causal Mechanism over Meta-representations for Target-unknown Cross-domain Recommendation","authors":"Shengyu Zhang, Qiaowei Miao, Ping Nie, Mengze Li, Zhengyu Chen, Fuli Feng, Kun Kuang, Fei Wu","doi":"10.1145/3643807","DOIUrl":"https://doi.org/10.1145/3643807","url":null,"abstract":"<p>Tackling the pervasive issue of data sparsity in recommender systems, we present an insightful investigation into the burgeoning area of non-overlapping cross-domain recommendation, a technique that facilitates the transfer of interaction knowledge across domains without necessitating inter-domain user/item correspondence. Existing approaches have predominantly depended on auxiliary information, such as user reviews and item tags, to establish inter-domain connectivity, but these resources may become inaccessible due to privacy and commercial constraints. </p><p>To address these limitations, our study introduces an in-depth exploration of Target-unknown Cross-domain Recommendation, which contends with the distinct challenge of lacking target domain information during the training phase in the source domain. We illustrate two critical obstacles inherent to Target-unknown CDR: the lack of an inter-domain bridge due to insufficient user/item correspondence or side information, and the potential pitfalls of source-domain training biases when confronting distribution shifts across domains. To surmount these obstacles, we propose the CMCDR framework, a novel approach that leverages causal mechanisms extracted from meta-user/item representations. The CMCDR framework employs a vector-quantized encoder-decoder architecture, enabling the disentanglement of user/item characteristics. We posit that domain-transferable knowledge is more readily discernible from user/item characteristics, <i>i</i>.<i>e</i>., the meta-representations, rather than raw users and items. 
Capitalizing on these meta-representations, our CMCDR framework adeptly incorporates an attention-driven predictor that approximates the front-door adjustment method grounded in causal theory. This cutting-edge strategy effectively mitigates source-domain training biases and enhances generalization capabilities against distribution shifts. Extensive experiments demonstrate the empirical effectiveness and the rationality of CMCDR for target-unknown cross-domain recommendation.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"234 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139657317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}