Search result diversification plays a crucial role in improving users’ search experience by providing documents that cover more subtopics. Previous studies have made great progress in leveraging inter-document interactions to measure the similarity among documents. However, different parts of a document may embody different subtopics, and existing models ignore the subtle similarities and differences of content within each document. In this paper, we propose a hierarchical attention framework that combines intra-document interactions with inter-document interactions in a complementary manner to perform multi-grained document modeling. Specifically, we split each document into passages to model its content from multi-grained perspectives. Then, we design stacked interaction blocks to conduct inter-document and intra-document interactions. Moreover, to measure the subtopic coverage of each document more accurately, we propose a passage-aware document-subtopic interaction that captures document-subtopic relevance at a fine-grained level. Experimental results demonstrate that our model achieves state-of-the-art performance compared with existing methods.
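The following minimal Python sketch makes the passage-level idea concrete. It is not the authors' model. It assumes passages are fixed-length word windows, and it uses bag-of-words cosine similarity in place of learned attention. `split_into_passages` and `subtopic_coverage` are hypothetical helpers. Each subtopic's coverage is the similarity of its best-matching passage, so a document whose parts address different subtopics is not penalized for mixing them.

```python
import math
from collections import Counter
from typing import List, Sequence

def split_into_passages(text: str, passage_len: int = 4) -> List[List[str]]:
    """Split a document into consecutive, non-overlapping windows of `passage_len` words."""
    words = text.split()
    return [words[i:i + passage_len] for i in range(0, len(words), passage_len)]

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def subtopic_coverage(passages: Sequence[Sequence[str]], subtopics: Sequence[str]) -> List[float]:
    """Coverage of each subtopic = similarity of its best-matching passage (hypothetical scoring)."""
    bags = [Counter(p) for p in passages]
    return [
        max((cosine(bag, Counter(s.split())) for bag in bags), default=0.0)
        for s in subtopics
    ]

doc = "jaguar car engine speed jaguar cat habitat forest"
subtopics = ["car engine", "cat forest"]
passages = split_into_passages(doc)
print(subtopic_coverage(passages, subtopics))       # passage level
print(subtopic_coverage([doc.split()], subtopics))  # whole document treated as one unit
```

With this toy document, each subtopic matches one passage with a similarity of about 0.707. Scoring against the whole document gives only about 0.447, because the other subtopic's words dilute the match.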
Multi-grained Document Modeling for Search Result Diversification. Zhirui Deng, Zhicheng Dou, Zhan Su, Ji-Rong Wen. ACM Transactions on Information Systems, 2024-03-15. DOI: 10.1145/3652852 (https://doi.org/10.1145/3652852)
Alexander Frummet, Alessandro Speggiorin, David Elsweiler, Anton Leuski, Jeff Dalton
We present two empirical studies investigating users’ expectations and behaviours when using digital assistants, such as Alexa and Google Home, in a kitchen context. First, a survey (N = 200) asks participants what kinds of information they expect such systems to be able to provide. While there is consensus on expecting information about cooking steps and processes, younger participants who enjoy cooking are more likely to expect details on food history or the science of cooking. In a follow-up Wizard-of-Oz study (N = 48), users were guided through the steps of a recipe either by an active wizard, which alerted participants to information it could provide, or by a passive wizard, which only answered questions posed by the user. Compared with the passive policy, the active policy led to almost double the number of conversational utterances and 1.5 times more knowledge-related user questions. It also resulted in 1.7 times more knowledge being communicated. We discuss the findings in the context of related work and draw implications for the design and use of such assistants for cooking and other purposes, such as DIY and craft tasks, as well as lessons learned for evaluating such systems.
Cooking with Conversation: Enhancing User Engagement and Learning with a Knowledge-Enhancing Assistant. ACM Transactions on Information Systems, 2024-03-15. DOI: 10.1145/3649500 (https://doi.org/10.1145/3649500)
Sequential recommendation systems aim to exploit users’ sequential behavior patterns to capture their interaction intentions and improve recommendation accuracy. Existing sequential recommendation methods mainly focus on modeling the chronological relationships among items within each individual user’s behavior sequence, which may not be sufficient for accurate and robust recommendations. On the one hand, the performance of existing methods is usually sensitive to the length of a user’s behavior sequence (i.e., the list of items the user has historically interacted with). On the other hand, besides the context information in each individual behavior sequence, the collaborative information across different users’ behavior sequences is also crucial for accurate recommendations, yet existing sequential recommendation methods usually ignore it. In this work, we propose a new sequential recommendation framework that encodes both the context information within each user’s behavior sequence and the collaborative information across the behavior sequences of different users by building a local dependency graph for each item. We conduct extensive experiments comparing the proposed model with state-of-the-art sequential recommendation methods on five benchmark datasets. The results demonstrate that, by incorporating collaborative information, the proposed model achieves better recommendation performance than existing methods.
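The abstract does not spell out how the per-item local dependency graph is built. As a rough illustration only, the sketch below assumes that an item's neighbours are the items appearing within `window` positions of it, pooled over every user's sequence. The resulting graph therefore mixes individual context with collaborative co-occurrence. `build_local_graphs` is a hypothetical name, and the edges are undirected co-occurrence counts rather than learned dependencies.

```python
from collections import defaultdict
from typing import Dict, List

def build_local_graphs(sequences: List[List[str]], window: int = 1) -> Dict[str, Dict[str, int]]:
    """For every item, count the items that appear within `window` positions of it,
    pooled across all users' behavior sequences (hypothetical construction)."""
    graphs: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, item in enumerate(seq):
            for j in range(max(0, i - window), min(len(seq), i + window + 1)):
                if j != i:
                    graphs[item][seq[j]] += 1
    # Convert the nested defaultdicts to plain dicts for readability.
    return {item: dict(nbrs) for item, nbrs in graphs.items()}

user_sequences = [["a", "b", "c"], ["b", "c", "d"]]
print(build_local_graphs(user_sequences)["b"])  # {'a': 1, 'c': 2}
```

Here item `b` is linked to `c` with weight 2 because the two items are adjacent in both users' sequences. That cross-user signal is invisible to a model that reads one sequence at a time.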
Collaborative Sequential Recommendations via Multi-View GNN-Transformers. Tianze Luo, Yong Liu, Sinno Jialin Pan. ACM Transactions on Information Systems, 2024-03-15. DOI: 10.1145/3649436 (https://doi.org/10.1145/3649436)
Current natural language understanding (NLU) models have been continuously scaling up in both model size and input context, introducing more hidden and input neurons. While this generally improves average performance, the extra neurons do not yield consistent improvements for all instances. This is because some hidden neurons are redundant, and noise mixed into the input neurons tends to distract the model. To avoid this problem, previous work mainly focuses on extrinsically reducing low-utility neurons through additional post- or pre-processing, such as network pruning and context selection. Beyond that, can we make the model reduce redundant parameters and suppress input noise by intrinsically enhancing the utility of each neuron? If a model utilizes its neurons efficiently, then no matter which neurons are ablated (disabled), the ablated submodel should perform no better than the original full model. Based on this comparison principle between models, we propose a cross-model comparative loss applicable to a broad range of tasks. Comparative loss is essentially a ranking loss on top of the task-specific losses of the full and ablated models, with the expectation that the full model’s task-specific loss is the minimum. We demonstrate the universal effectiveness of comparative loss through extensive experiments on 14 datasets from 3 distinct NLU tasks based on 5 widely used pretrained language models, and find it particularly effective for models with few parameters or long inputs.
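One way to read "a ranking loss on top of the task-specific losses" is a pairwise hinge that penalizes any ablated submodel whose task loss falls below the full model's. The sketch below assumes exactly that pairwise form, with an optional `margin`. It is a simplification, and the paper's actual ranking formulation and ablation scheme may differ. The losses here are plain floats. In training they would be tensors, so that gradients flow through the same expression.

```python
from typing import Sequence

def comparative_loss(full_loss: float, ablated_losses: Sequence[float], margin: float = 0.0) -> float:
    """Pairwise hinge ranking loss (assumed form): the full model's task loss should
    not exceed any ablated submodel's task loss, optionally by at least `margin`."""
    return sum(max(0.0, full_loss - ablated + margin) for ablated in ablated_losses)

# Full model loss 0.5; three hypothetical ablated submodels.
print(comparative_loss(0.5, [0.7, 0.4, 0.5]))
```

In this example only the second submodel outperforms the full model (0.4 < 0.5), so the comparative loss is about 0.1. The other two pairs contribute nothing.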
Cross-Model Comparative Loss for Enhancing Neuronal Utility in Language Understanding. Yunchang Zhu, Liang Pang, Kangxi Wu, Yanyan Lan, Huawei Shen, Xueqi Cheng. ACM Transactions on Information Systems, 2024-03-15. DOI: 10.1145/3652599 (https://doi.org/10.1145/3652599)
Knowledge tracing models based on deep learning can achieve impressive predictive performance by leveraging attention mechanisms. However, attentive knowledge tracing still faces two challenges. First, the attention mechanisms of classical attentive knowledge tracing models assign relatively low attention when processing exercise sequences whose knowledge concepts shift, making it difficult to capture the comprehensive knowledge state across sequences. Second, classical models do not consider stochastic behaviors, which hinders their ability to capture anomalous knowledge states. This paper proposes an attentive knowledge tracing model called Enhancing Locality for Attentive Knowledge Tracing (ELAKT), a variant of the deep knowledge tracing model. The proposed model leverages the Transformer encoder to aggregate the knowledge embeddings generated from both exercises and responses over all timesteps. In addition, it uses causal convolutions to aggregate and smooth local knowledge states. ELAKT also introduces a prediction correction module that uses comprehensive knowledge-concept states to forecast students’ future responses, thereby dealing with noise caused by stochastic behaviors. Experimental results demonstrate that ELAKT consistently outperforms state-of-the-art baseline knowledge tracing models.
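The causal convolution mentioned above can be illustrated with a one-dimensional version. The output at step t mixes only the current step and the previous len(kernel) - 1 steps, so smoothing the local knowledge state never uses a student's future responses. For readability, this sketch reduces each step's knowledge state to a single float and uses a fixed averaging kernel. ELAKT's learned, multi-channel convolution is not reproduced here.

```python
from typing import List, Sequence

def causal_conv1d(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """y[t] = sum_k kernel[k] * x[t - k], treating x[t - k] as 0 when t - k < 0.
    Each output depends only on inputs at steps <= t (no look-ahead)."""
    return [
        sum(w * x[t - k] for k, w in enumerate(kernel) if t - k >= 0)
        for t in range(len(x))
    ]

# Per-step knowledge-state scores (illustrative), smoothed with a 2-step average.
print(causal_conv1d([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))
```

With kernel `[0.5, 0.5]`, the input `[1, 2, 3, 4]` becomes `[0.5, 1.5, 2.5, 3.5]`. The first value is halved because the step before it is treated as zero (left padding).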
ELAKT: Enhancing Locality for Attentive Knowledge Tracing. Yanjun Pu, Fang Liu, Rongye Shi, Haitao Yuan, Ruibo Chen, Tianhao Peng, WenJun Wu. ACM Transactions on Information Systems, 2024-03-14. DOI: 10.1145/3652601 (https://doi.org/10.1145/3652601)
Target-oriented proactive dialogue systems aim to lead conversations from a dialogue context toward a pre-determined target, such as recommending designated items or introducing specific new topics. To this end, such systems must plan reasonable actions to drive the conversation proactively and, at the same time, plan appropriate topics to move the conversation smoothly toward the target topic. In this work, we focus on effective dialogue planning for target-oriented dialogue generation. Inspired by decision-making theories in cognitive science, we propose a novel target-constrained bidirectional planning (TRIP) approach, which plans an appropriate dialogue path by looking ahead and looking back. By formulating planning as a generation task, TRIP bidirectionally generates a dialogue path consisting of a sequence of <action, topic> pairs using two Transformer decoders. The two decoders are expected to supervise each other and converge on consistent actions and topics by minimizing the decision gap and through contrastive generation of targets. Moreover, we propose a target-constrained decoding algorithm with bidirectional agreement to better control the planning process. We then use the planned dialogue paths to guide dialogue generation in a pipeline manner, exploring two variants: prompt-based generation and plan-controlled generation. Extensive experiments are conducted on two challenging dialogue datasets, re-purposed for exploring target-oriented dialogue. Automatic and human evaluations demonstrate that the proposed methods significantly outperform various baseline models.
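A dialogue path is a sequence of <action, topic> pairs that should end at the target. The sketch below makes that concrete. It assumes the backward planner emits its path starting from the target, so that path is reversed before comparison. Agreement is measured as the fraction of matching positions. This toy score only illustrates the idea of two planners converging. It is not TRIP's decision-gap objective or its target-constrained decoding algorithm, and all names are hypothetical.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (action, topic)

def satisfies_target(path: List[Step], target: Step) -> bool:
    """Target constraint: the planned path must end at the target <action, topic>."""
    return bool(path) and path[-1] == target

def bidirectional_agreement(forward: List[Step], backward: List[Step]) -> float:
    """Fraction of positions where the forward path matches the backward path read
    in reverse (the backward planner is assumed to start from the target).
    Paths of different lengths are treated as unaligned and score 0.0."""
    if not forward or len(forward) != len(backward):
        return 0.0
    aligned = list(reversed(backward))
    return sum(f == b for f, b in zip(forward, aligned)) / len(forward)

target = ("recommend", "album_A")
forward = [("chat", "music"), ("ask", "jazz"), target]
backward = [target, ("ask", "jazz"), ("chat", "movies")]
print(satisfies_target(forward, target), bidirectional_agreement(forward, backward))
```

In this example both paths end at the target, but they disagree on the opening topic. The planners therefore agree on two of three steps, an agreement of about 0.67.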
Target-constrained Bidirectional Planning for Generation of Target-oriented Proactive Dialogue. Jian Wang, Dongding Lin, Wenjie Li. ACM Transactions on Information Systems, 2024-03-13. DOI: 10.1145/3652598 (https://doi.org/10.1145/3652598)
Career mobility analysis aims to understand the occupational movement patterns of talents across distinct labor market entities, enabling a wide range of talent-centered applications such as job recommendation, labor demand forecasting, and company competitive analysis. Existing studies in this field mainly focus on a single fixed scale, investigating either individual trajectories at the micro level or crowd flows among market entities at the macro level. Consequently, the intrinsic cross-scale interactions between talents and the labor market are largely overlooked. To bridge this gap, we propose UniTRep, a novel unified representation learning framework for cross-scale career mobility analysis. Specifically, we first introduce a trajectory hypergraph structure that organizes career mobility patterns with low information loss, where market entities and talent trajectories are represented as nodes and hyperedges, respectively. Then, to learn market-aware talent representations, we attentively propagate node information to the hyperedges and incorporate market contextual features into individual trajectory modeling. To learn trajectory-enhanced market representations, we aggregate messages from the hyperedges incident to each node, integrating the fine-grained semantics of trajectories into labor market modeling. Moreover, we design two auxiliary tasks that optimize both intra-scale and cross-scale learning with a self-supervised strategy. Extensive experiments on a real-world dataset show that UniTRep significantly outperforms state-of-the-art baselines on various tasks.
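Plain mean pooling is enough to sketch the two propagation directions on the trajectory hypergraph. Each trajectory (hyperedge) summarizes the market entities (nodes) it visits, and each entity then summarizes the trajectories passing through it. UniTRep propagates attentively and adds market context and self-supervised tasks. This sketch swaps attention for an unweighted mean. The entity names, trajectories, and two-dimensional features are made up for illustration.

```python
from typing import Dict, List

Vector = List[float]

def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def node_to_hyperedge(node_feats: Dict[str, Vector],
                      trajectories: Dict[str, List[str]]) -> Dict[str, Vector]:
    """Talent representation: pool the features of the entities on each trajectory."""
    return {tid: mean([node_feats[n] for n in nodes]) for tid, nodes in trajectories.items()}

def hyperedge_to_node(edge_feats: Dict[str, Vector],
                      trajectories: Dict[str, List[str]]) -> Dict[str, Vector]:
    """Market representation: pool the trajectories incident to each entity."""
    incident: Dict[str, List[Vector]] = {}
    for tid, nodes in trajectories.items():
        for n in dict.fromkeys(nodes):  # count each trajectory once per entity
            incident.setdefault(n, []).append(edge_feats[tid])
    return {n: mean(vs) for n, vs in incident.items()}

# Hypothetical market entities with 2-d features and two talent trajectories.
node_feats = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
trajectories = {"t1": ["A", "B"], "t2": ["B", "C"]}
talents = node_to_hyperedge(node_feats, trajectories)
markets = hyperedge_to_node(talents, trajectories)
print(talents, markets)
```

Entity `B` lies on both trajectories, so its updated vector averages `t1` and `t2`. Entities `A` and `C` each inherit a single trajectory's representation.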
Towards Unified Representation Learning for Career Mobility Analysis with Trajectory Hypergraph. Rui Zha, Ying Sun, Chuan Qin, Le Zhang, Tong Xu, Hengshu Zhu, Enhong Chen. ACM Transactions on Information Systems, 2024-03-06. DOI: 10.1145/3651158 (https://doi.org/10.1145/3651158)
Tianshi Wang, Fengling Li, Lei Zhu, Jingjing Li, Zheng Zhang, Heng Tao Shen
Deep cross-modal hashing has advanced multi-modal retrieval thanks to its high efficiency and low storage cost, but its vulnerability to backdoor attacks is rarely studied. Notably, current deep cross-modal hashing methods inevitably require large-scale training data, so poisoned samples carrying imperceptible triggers can easily be camouflaged within the training data to plant backdoors in the victim model. However, existing backdoor attacks focus on the uni-modal vision domain, and the multi-modal gap and hash quantization weaken their attack performance. To address these challenges, this paper presents an invisible black-box backdoor attack against deep cross-modal hashing retrieval. To the best of our knowledge, this is the first attempt in this research field. Specifically, we develop a flexible trigger generator that produces attacker-specified triggers by learning the sample semantics of the non-poisoned modality, bridging the cross-modal attack gap. Then, we devise an input-aware injection network that embeds the generated triggers into benign samples in a stealthy, sample-specific form and realizes cross-modal semantic interaction between triggers and poisoned samples. Because the attack requires no knowledge of the victim model, any cross-modal hashing knockoff can be used to facilitate the black-box backdoor attack and alleviate the weakening effect of hash quantization. Moreover, we propose a confusing perturbation and mask strategy to induce high-performance victim models to focus on the imperceptible triggers in poisoned samples. Extensive experiments on benchmark datasets demonstrate that our method achieves state-of-the-art attack performance against deep cross-modal hashing retrieval. In addition, we analyze transferable attacks, few-shot poisoning, multi-modal poisoning, perceptibility, and the impact of potential defenses on backdoor attacks.
Our code and datasets are available at https://github.com/tswang0116/IB3A.
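It helps to see what is being attacked. Deep cross-modal hashing maps, for example, images and texts into a shared space. It then quantizes the real-valued outputs to binary codes, commonly with the sign function, and ranks database items by Hamming distance. The minimal sketch below shows only that retrieval step and contains no attack logic. It assumes the embeddings already come from modality-specific encoders, and the numbers are illustrative.

```python
from typing import List, Sequence

def hash_code(embedding: Sequence[float]) -> List[int]:
    """Sign quantization: map a real-valued embedding to a {-1, +1} code."""
    return [1 if v >= 0 else -1 for v in embedding]

def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

def retrieve(query_emb: Sequence[float], database_embs: Sequence[Sequence[float]],
             top_k: int = 2) -> List[int]:
    """Rank database items (e.g. images) for a query (e.g. text) by Hamming distance."""
    q = hash_code(query_emb)
    codes = [hash_code(e) for e in database_embs]
    ranked = sorted(range(len(codes)), key=lambda i: hamming(q, codes[i]))
    return ranked[:top_k]

# Illustrative embeddings, assumed to come from text and image encoders.
text_query = [0.3, -0.2, 0.9, -0.1]
image_db = [[0.5, -0.4, 0.1, -0.7], [-0.2, 0.3, 0.8, -0.5], [0.1, -0.9, -0.3, -0.2]]
print(retrieve(text_query, image_db))  # [0, 2]
```

Quantization keeps only the signs. A small perturbation that flips no sign therefore leaves the retrieval result unchanged. This is one intuition for why hash quantization can weaken an attack.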
Invisible Black-Box Backdoor Attack against Deep Cross-Modal Hashing Retrieval. ACM Transactions on Information Systems, 2024-03-02. DOI: 10.1145/3650205 (https://doi.org/10.1145/3650205)
Yang Fang, Xiang Zhao, Weidong Xiao, Maarten de Rijke
Heterogeneous information networks (HINs) are a key resource in many domain-specific retrieval and recommendation scenarios, and in conversational environments. Current approaches to mining graph data often rely on abundant supervised information. However, supervised signals for graph learning tend to be scarce for a new task and only a handful of labeled nodes may be available. Meta-learning mechanisms are able to harness prior knowledge that can be adapted to new tasks.
In this paper, we design a meta-learning framework, called META-HIN, for few-shot learning problems on HINs. To the best of our knowledge, we are among the first to design a unified framework that enables few-shot learning on HINs and supports different downstream tasks on graphs from different domains. Unlike most previous models, which focus on a single task on a single graph, META-HIN can handle different tasks across multiple graphs; node classification, link prediction, and anomaly detection serve as examples. Subgraphs are sampled to build the support and query sets. Before the meta-learning module processes them, the subgraphs are modeled by a structure module that captures their structural features. A heterogeneous GNN module then serves as the base model to encode the subgraph features. We also design a GAN-based contrastive learning module that exploits unsupervised information in the subgraphs.
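Building support and query sets is the standard episodic setup in few-shot meta-learning. The sketch below makes several assumptions. Each labeled node stands in for the subgraph sampled around it, and the tasks are N-way K-shot node classification. `sample_episode` and its parameters are hypothetical, and META-HIN's actual subgraph sampler may differ.

```python
import random
from collections import defaultdict


def sample_episode(node_labels: dict, n_way: int, k_shot: int, q_query: int,
                   rng: random.Random):
    """Split labeled nodes into disjoint support and query sets for one task.

    Assumption: each node ID represents the subgraph centered on that node.
    Returns two lists of (node, label) pairs.
    """
    by_class = defaultdict(list)
    for node, label in node_labels.items():
        by_class[label].append(node)
    # A class is eligible only if it has enough nodes for both sets.
    eligible = sorted(c for c, nodes in by_class.items()
                      if len(nodes) >= k_shot + q_query)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + q_query labeled nodes")
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        # Draw without replacement so support and query never share a node.
        picked = rng.sample(by_class[c], k_shot + q_query)
        support += [(n, c) for n in picked[:k_shot]]
        query += [(n, c) for n in picked[k_shot:]]
    return support, query
```

The base model adapts on the support set, and the meta-update is computed from the loss on the query set of the same task.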
In our experiments, we fuse several datasets from multiple domains to verify META-HIN’s broad applicability in a multiple-graph scenario. META-HIN consistently and significantly outperforms state-of-the-art alternatives on every task and across all datasets that we consider.
{"title":"Few-shot Learning for Heterogeneous Information Networks","authors":"Yang Fang, Xiang Zhao, Weidong Xiao, Maarten de Rijke","doi":"10.1145/3649311","DOIUrl":"https://doi.org/10.1145/3649311","url":null,"abstract":"<p>Heterogeneous information networks (HINs) are a key resource in many domain-specific retrieval and recommendation scenarios, and in conversational environments. Current approaches to mining graph data often rely on abundant supervised information. However, supervised signals for graph learning tend to be scarce for a new task and only a handful of labeled nodes may be available. Meta-learning mechanisms are able to harness prior knowledge that can be adapted to new tasks. </p><p>In this paper, we design a meta-learning framework, called <sans-serif>META-HIN</sans-serif>, for few-shot learning problems on HINs. To the best of our knowledge, we are among the first to design a unified framework to realize the few-shot learning of HINs and facilitate different downstream tasks across different domains of graphs. Unlike most previous models, which focus on a single task on a single graph, <sans-serif>META-HIN</sans-serif> is able to deal with different tasks (node classification, link prediction, and anomaly detection are used as examples) across multiple graphs. Subgraphs are sampled to build the support and query set. Before being processed by the meta-learning module, subgraphs are modeled via a structure module to capture structural features. Then, a heterogeneous GNN module is used as the base model to express the features of subgraphs. We also design a GAN-based contrastive learning module that is able to exploit unsupervised information of the subgraphs. </p><p>In our experiments, we fuse several datasets from multiple domains to verify <sans-serif>META-HIN</sans-serif>’s broad applicability in a multiple-graph scenario. 
<sans-serif>META-HIN</sans-serif> consistently and significantly outperforms state-of-the-art alternatives on every task and across all datasets that we consider.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2024-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139980284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jun Li, Yi Bin, Yunshan Ma, Yang Yang, Zi Huang, Tat-Seng Chua
Rumor verification on social media aims to determine the truth value of a rumor, which is important for mitigating its harmful effects on the public. A rumor may provoke heated discussions and replies that convey users’ different stances, and these stances can help verify the rumor. Several works therefore verify a rumor by modeling its entire stance sequence in the time domain. However, these works ignore that such a stance sequence can be decomposed into controversies of different intensities, which could be used to cluster stance sequences that share the same consensus. Moreover, existing stance extractors do not jointly consider the impact of all previously posted tweets and of the reply chain when obtaining the stance of a new reply. To address these problems, we propose a novel stance-based network, termed the Filter-based Stance Network (FSNet), that aggregates the controversies of the stance sequence for rumor verification. Because controversies of different intensities are reflected in different patterns of stance change, they are convenient to represent in the frequency domain but hard to represent in the time domain. FSNet therefore decomposes the stance sequence into multiple controversies in the frequency domain and computes their weighted aggregation. Specifically, FSNet consists of two modules: the stance extractor and the filter block. To obtain better stance features toward the source, the stance extractor works in two stages. In the first stage, the tweet representation of each reply is obtained by aggregating information from all previously posted tweets in the conversation. In the second stage, the features of the stance toward the source, i.e., the rumor-aware stance, are extracted along the reply chains. In the filter block, a rumor-aware stance sequence is constructed by sorting all tweets in a conversation in chronological order. A Fourier transform then converts the stance sequence into the frequency domain, where different frequency components reflect controversies of different intensities. Finally, a frequency filter is applied to capture the different contributions of the controversies. We supervise FSNet with both stance labels and rumor labels to strengthen the relation between rumor veracity and crowd stances. Extensive experiments on two benchmark datasets demonstrate that our model substantially outperforms all the baselines.
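The filter block can be sketched with NumPy's real FFT in four steps. Sort the rumor-aware stance features by time, transform them along the time axis, re-weight each frequency bin, and transform back. This sketch rests on two assumptions. First, it uses an element-wise complex filter with one weight per frequency bin and feature channel, which is a common frequency-domain filtering design. Second, `filter_weights` stands in for parameters that FSNet would learn. The function names are hypothetical.

```python
import numpy as np


def sort_chronologically(timestamps: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Order per-tweet stance features (T, d) by posting time."""
    order = np.argsort(timestamps, kind="stable")
    return features[order]


def frequency_filter(stance_seq: np.ndarray, filter_weights: np.ndarray) -> np.ndarray:
    """Re-weight the frequency components of a (T, d) stance sequence.

    filter_weights: complex array of shape (T // 2 + 1, d), one weight per rFFT bin
    and channel (assumed stand-in for learned parameters).
    """
    t = stance_seq.shape[0]
    # Time domain -> frequency domain along the time axis.
    spectrum = np.fft.rfft(stance_seq, axis=0)  # shape (T // 2 + 1, d), complex
    # Scale each frequency component; each bin is one controversy intensity.
    filtered = spectrum * filter_weights
    # Frequency domain -> time domain; n=t restores the original length.
    return np.fft.irfft(filtered, n=t, axis=0)
```

Two settings make the behavior easy to check. With all weights equal to 1 the filter is an identity and reconstructs the input. Keeping only bin 0 returns the per-channel mean at every time step, which is the slowest-changing component.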
{"title":"Filter-based Stance Network for Rumor Verification","authors":"Jun Li, Yi Bin, Yunshan Ma, Yang Yang, Zi Huang, Tat-Seng Chua","doi":"10.1145/3649462","DOIUrl":"https://doi.org/10.1145/3649462","url":null,"abstract":"<p>Rumor verification on social media aims to identify the truth value of a rumor, which is important to decrease the detrimental public effects. A rumor might arouse heated discussions and replies, conveying different stances of users that could be helpful in identifying the rumor. Thus, several works have been proposed to verify a rumor by modelling its entire stance sequence in the time domain. However, these works ignore that such a stance sequence could be decomposed into controversies with different intensities, which could be used to cluster the stance sequences with the same consensus. Besides, the existing stance extractors fail to consider both the impact of all the previously posted tweets and the reply chain on obtaining the stance of a new reply. To address the above problems, in this paper, we propose a novel stance-based network to aggregate the controversies of the stance sequence for rumor verification, termed Filter-based Stance Network (FSNet). As controversies with different intensities are reflected as the different changes of stances, it is convenient to represent different controversies in the frequency domain, but it is hard in the time domain. Our proposed FSNet decomposes the stance sequence into multiple controversies in the frequency domain and obtains the weighted aggregation of them. In specific, FSNet consists of two modules: the stance extractor and the filter block. To obtain better stance features toward the source, the stance extractor contains two stages. In the first stage, the tweet representation of each reply is obtained by aggregating information from all previously posted tweets in a conversation. 
Then, the features of stance toward the source, <i>i.e.</i>, rumor-aware stance, are extracted with the reply chains in the second stage. In the filter block module, a rumor-aware stance sequence is constructed by sorting all the tweets of a conversation in chronological order. Fourier Transform thereafter is employed to convert the stance sequence into the frequency domain, where different frequency components reflect controversies of different intensities. Finally, a frequency filter is applied to explore the different contributions of controversies. We supervise our FSNet with both stance labels and rumor labels to strengthen the relations between rumor veracity and crowd stances. Extensive experiments on two benchmark datasets demonstrate that our model substantially outperforms all the baselines.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2024-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139968649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}