High-utility itemset (HUI) mining is an emerging area of data mining that discovers sets of items generating high profit in transactional datasets. In recent years, several algorithms have been proposed for this task. However, most of them consider neither the on-shelf time periods of items nor items with negative utility. High on-shelf utility itemset (HOUI) mining is more difficult than traditional HUI mining because it must handle on-shelf time periods and negative item utilities. Moreover, most algorithms require a minimum utility threshold (min_util) to find the itemsets of interest, and specifying an appropriate min_util is difficult for users: a threshold set too low may generate too many itemsets, while one set too high may generate too few, and either case can degrade performance. To address these issues, a novel top-k HOUI mining algorithm named TKOS (Top-K high On-Shelf utility itemsets miner) is proposed, which considers on-shelf time periods and negative utility. TKOS introduces a novel branch-and-bound-based strategy to raise the internal min_util threshold efficiently, along with two pruning strategies to speed up the mining process. To reduce the dataset scanning cost, we utilize transaction merging and dataset projection techniques. Extensive experiments have been conducted on real and synthetic datasets with various characteristics. The results show that the proposed algorithm outperforms state-of-the-art algorithms: it is up to 42 times faster and uses up to 19 times less memory than the state-of-the-art KOSHU. Moreover, the proposed algorithm has excellent scalability with respect to the number of time periods and the number of transactions.
{"title":"Mining top-k high on-shelf utility itemsets using novel threshold raising strategies","authors":"Kuldeep Singh, Bhaskar Biswas","doi":"10.1145/3645115","DOIUrl":"https://doi.org/10.1145/3645115","url":null,"abstract":"<p>High utility itemsets (HUIs) mining is an emerging area of data mining which discovers sets of items generating a high profit from transactional datasets. In recent years, several algorithms have been proposed for this task. However, most of them do not consider the on-shelf time period of items and negative utility of items. High on-shelf utility itemset (HOUIs) mining is more difficult than traditional HUIs mining because it deals with on-shelf based time period and negative utility of items. Moreover, most algorithms need minimum utility threshold ((min_util )) to find rules. However, specifying the appropriate (min_util ) threshold is a difficult problem for users. A smaller (min_util ) threshold may generate too many rules and a higher one may generate a few rules, which can degrade performance. To address these issues, a novel top-k HOUIs mining algorithm named TKOS (<b>T</b>op-<b>K</b> high <b>O</b>n-<b>S</b>helf utility itemsets miner) is proposed which considers on-shelf time period and negative utility. TKOS presents a novel branch and bound based strategy to raise the internal (min_util ) threshold efficiently. It also presents two pruning strategies to speed up the mining process. In order to reduce the dataset scanning cost, we utilize transaction merging and dataset projection techniques. Extensive experiments have been conducted on real and synthetic datasets having various characteristics. Experimental results show that the proposed algorithm outperforms the state-of-the-art algorithms. The proposed algorithm is up to 42 times faster and uses up-to 19 times less memory compared to the state-of-the-art KOSHU. Moreover, the proposed algorithm has excellent scalability in terms of time periods and the number of transactions.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"175 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139762912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, Graph Neural Networks (GNNs) have achieved unprecedented success in handling graph-structured data, thereby driving the development of numerous GNN-oriented techniques for inductive knowledge graph completion (KGC). A key limitation of existing methods, however, is their dependence on pre-defined aggregation functions, which lack the adaptability to diverse data, resulting in suboptimal performance on established benchmarks. Another challenge arises from the exponential increase in irrelevant entities as the reasoning path lengthens, introducing unwarranted noise and consequently diminishing the model’s generalization capabilities. To surmount these obstacles, we design an innovative framework that synergizes Multi-Level Sampling with an Adaptive Aggregation mechanism (MLSAA). Distinctively, our model couples GNNs with enhanced set transformers, enabling dynamic selection of the most appropriate aggregation function tailored to specific datasets and tasks. This adaptability significantly boosts both the model’s flexibility and its expressive capacity. Additionally, we unveil a unique sampling strategy designed to selectively filter irrelevant entities while retaining potentially beneficial targets throughout the reasoning process. We undertake an exhaustive evaluation of our novel inductive KGC method across three pivotal benchmark datasets, and the experimental results corroborate the efficacy of MLSAA.
{"title":"Incorporating Multi-Level Sampling with Adaptive Aggregation for Inductive Knowledge Graph Completion","authors":"Kai Sun, Huajie Jiang, Yongli Hu, Baocai Yin","doi":"10.1145/3644822","DOIUrl":"https://doi.org/10.1145/3644822","url":null,"abstract":"<p>In recent years, Graph Neural Networks (GNNs) have achieved unprecedented success in handling graph-structured data, thereby driving the development of numerous GNN-oriented techniques for inductive knowledge graph completion (KGC). A key limitation of existing methods, however, is their dependence on pre-defined aggregation functions, which lack the adaptability to diverse data, resulting in suboptimal performance on established benchmarks. Another challenge arises from the exponential increase in irrelated entities as the reasoning path lengthens, introducing unwarranted noise and consequently diminishing the model’s generalization capabilities. To surmount these obstacles, we design an innovative framework that synergizes <b>M</b>ulti-<b>L</b>evel <b>S</b>ampling with an <b>A</b>daptive <b>A</b>ggregation mechanism (MLSAA). Distinctively, our model couples GNNs with enhanced set transformers, enabling dynamic selection of the most appropriate aggregation function tailored to specific datasets and tasks. This adaptability significantly boosts both the model’s flexibility and its expressive capacity. Additionally, we unveil a unique sampling strategy designed to selectively filter irrelevant entities, while retaining potentially beneficial targets throughout the reasoning process. We undertake an exhaustive evaluation of our novel inductive KGC method across three pivotal benchmark datasets and the experimental results corroborate the efficacy of MLSAA.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"4 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139763230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Early classification of longitudinal data remains an active area of research today. The complexity of these datasets and the high rates of missing data caused by irregular sampling present data-level challenges for the Early Longitudinal Data Classification (ELDC) problem. Coupled with the algorithmic challenge of optimising the opposing objectives of early classification (i.e., earliness and accuracy), ELDC becomes a non-trivial task. Inspired by the generative power and utility of the Generative Adversarial Network (GAN), we propose a novel context-conditional, longitudinal early classifier GAN (LEC-GAN). This model utilises informative missingness, static features, and earlier observations to improve the ELDC objective. It achieves this by incorporating ELDC as an auxiliary task within an imputation optimization process. Our experiments on several datasets demonstrate that LEC-GAN outperforms all relevant baselines in terms of F1 scores while increasing the earliness of prediction.
{"title":"Conditional Generative Adversarial Network for Early Classification of Longitudinal Datasets using an Imputation Approach","authors":"Sharon Torao Pingi, Richi Nayak, Md Abul Bashar","doi":"10.1145/3644821","DOIUrl":"https://doi.org/10.1145/3644821","url":null,"abstract":"<p>Early classification of longitudinal data remains an active area of research today. The complexity of these datasets and the high rates of missing data caused by irregular sampling present data-level challenges for the Early Longitudinal Data Classification (ELDC) problem. Coupled with the algorithmic challenge of optimising the opposing objectives of early classification (i.e., earliness and accuracy), ELDC becomes a non-trivial task. Inspired by the generative power and utility of the Generative Adversarial Network (GAN), we propose a novel context-conditional, longitudinal early classifier GAN (LEC-GAN). This model utilises informative missingness, static features, and earlier observations to improve the ELDC objective. It achieves this by incorporating ELDC as an auxiliary task within an imputation optimization process. Our experiments on several datasets demonstrate that LEC-GAN outperforms all relevant baselines in terms of F1 scores while increasing the earliness of prediction.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"37 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139763107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-instance learning (MIL) is a popular learning paradigm arising from many real applications. It assigns a label to a set of instances, called a bag, and the bag’s label is determined by the instances within it. A bag is positive if and only if it contains at least one positive instance. Since labeling bags is more complicated than labeling each instance, mislabeling is a common problem in MIL. Furthermore, it is more common for a negative bag to be mislabeled as a positive one, since a single mislabeled instance changes the label of the whole bag. This is an important problem arising in real applications such as web mining and image classification, but, to the best of our knowledge, little research has concentrated on it. In this paper, we focus on the MIL problem with one-side label noise, in which negative bags are mislabeled as positive ones. To address this challenging problem, we propose a novel multi-instance learning method with One Side Label Noise (OSLN). We design a new double-weighting approach under the traditional framework to characterize the ’faithfulness’ of each instance and each bag in learning the classifier. Briefly, on the instance level, we employ a sparse weighting method to select the key instances, converting the MIL problem with one-side label noise into a mislabeled supervised learning scenario. On the bag level, the weights of bags, together with the selected key instances, are utilized to identify the real positive bags. In addition, we solve the proposed model with an alternating iteration method with proven convergence behavior. Empirical studies on various datasets have validated the effectiveness of our method.
{"title":"Multi-Instance Learning with One Side Label Noise","authors":"Tianxiang Luan, Shilin Gu, Xijia Tang, Wenzhang Zhuge, Chenping Hou","doi":"10.1145/3644076","DOIUrl":"https://doi.org/10.1145/3644076","url":null,"abstract":"<p>Multi-instance Learning (MIL) is a popular learning paradigm arising from many real applications. It assigns a label to a set of instances, named as a bag, and the bag’s label is determined by the instances within it. A bag is positive if and only if it has at least one positive instance. Since labeling bags is more complicated than labeling each instance, we will often face the mislabeling problem in MIL. Furthermore, it is more common that a negative bag has been mislabeled to a positive one since one mislabeled instance will lead to the change of the whole bag label. This is an important problem that originated from real applications, e.g., web mining and image classification, but little research has concentrated on it as far as we know. In this paper, we focus on this MIL problem with one side label noise that the negative bags are mislabeled as positive ones. To address this challenging problem, we propose a novel multi-instance learning method with One Side Label Noise (OSLN). We design a new double weighting approach under traditional framework to characterize the ’faithfulness’ of each instance and each bag in learning the classifier. Briefly, on the instance level, we employ a sparse weighting method to select the key instances, and the MIL problem with one size label noise is converted to a mislabeled supervised learning scenario. On the bag level, the weights of bags, together with the selected key instances, will be utilized to identify the real positive bags. In addition, we have solved our proposed model by an alternative iteration method with proved convergence behavior. Empirical studies on various datasets have validated the effectiveness of our method.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"125 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139763102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A formidable challenge in the multi-label text classification (MLTC) context is that the labels often exhibit a long-tailed distribution, which typically prevents deep MLTC models from obtaining satisfactory performance. To alleviate this problem, most existing solutions attempt to improve tail performance by means of sampling or introducing extra knowledge. Data-rich labels, though more trustworthy, have not received the attention they deserve. In this work, we propose a multiple-stage training framework to exploit both model- and feature-level knowledge from the head labels, to improve both the representation and generalization ability of MLTC models. Moreover, we theoretically prove the superiority of our framework design over other alternatives. Comprehensive experiments on widely-used MLTC datasets clearly demonstrate that the proposed framework achieves highly superior results to state-of-the-art methods, highlighting the value of head labels in MLTC.
{"title":"On the Value of Head Labels in Multi-Label Text Classification","authors":"Haobo Wang, Cheng Peng, Hede Dong, Lei Feng, Weiwei Liu, Tianlei Hu, Ke Chen, Gang Chen","doi":"10.1145/3643853","DOIUrl":"https://doi.org/10.1145/3643853","url":null,"abstract":"<p>A formidable challenge in the multi-label text classification (MLTC) context is that the labels often exhibit a long-tailed distribution, which typically prevents deep MLTC models from obtaining satisfactory performance. To alleviate this problem, most existing solutions attempt to improve tail performance by means of sampling or introducing extra knowledge. Data-rich labels, though more trustworthy, have not received the attention they deserve. In this work, we propose a multiple-stage training framework to exploit both model- and feature-level knowledge from the head labels, to improve both the representation and generalization ability of MLTC models. Moreover, we theoretically prove the superiority of our framework design over other alternatives. Comprehensive experiments on widely-used MLTC datasets clearly demonstrate that the proposed framework achieves highly superior results to state-of-the-art methods, highlighting the value of head labels in MLTC.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"254 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139688962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph Neural Networks (GNNs) have achieved promising performance in semi-supervised node classification in recent years. However, the problem of insufficient supervision, together with representation collapse, largely limits the performance of GNNs in this field. To alleviate the collapse of node representations in the semi-supervised scenario, we propose a novel graph contrastive learning method, termed Mixed Graph Contrastive Network (MGCN). In our method, we improve the discriminative capability of the latent embeddings through an interpolation-based augmentation strategy and a correlation reduction mechanism. Specifically, we first conduct the interpolation-based augmentation in the latent space and then force the prediction model to change linearly between samples. Second, we enable the learned network to tell apart samples across two interpolation-perturbed views by forcing the cross-view correlation matrix to approximate an identity matrix. By combining the two settings, we extract rich supervision information from both the abundant unlabeled nodes and the rare yet valuable labeled nodes for discriminative representation learning. Extensive experimental results on six datasets demonstrate the effectiveness and generality of MGCN compared to existing state-of-the-art methods. The code of MGCN is available on GitHub at https://github.com/xihongyang1999/MGCN.
{"title":"Mixed Graph Contrastive Network for Semi-Supervised Node Classification","authors":"Xihong Yang, Yiqi Wang, Yue Liu, Yi Wen, Lingyuan Meng, Sihang Zhou, Xinwang Liu, En Zhu","doi":"10.1145/3641549","DOIUrl":"https://doi.org/10.1145/3641549","url":null,"abstract":"<p>Graph Neural Networks (GNNs) have achieved promising performance in semi-supervised node classification in recent years. However, the problem of insufficient supervision, together with representation collapse, largely limits the performance of the GNNs in this field. To alleviate the collapse of node representations in semi-supervised scenario, we propose a novel graph contrastive learning method, termed <b>M</b>ixed <b>G</b>raph <b>C</b>ontrastive <b>N</b>etwork (MGCN). In our method, we improve the discriminative capability of the latent embeddings by an interpolation-based augmentation strategy and a correlation reduction mechanism. Specifically, we first conduct the interpolation-based augmentation in the latent space and then force the prediction model to change linearly between samples. Second, we enable the learned network to tell apart samples across two interpolation-perturbed views through forcing the correlation matrix across views to approximate an identity matrix. By combining the two settings, we extract rich supervision information from both the abundant unlabeled nodes and the rare yet valuable labeled nodes for discriminative representation learning. Extensive experimental results on six datasets demonstrate the effectiveness and the generality of MGCN compared to the existing state-of-the-art methods. The code of MGCN is available at https://github.com/xihongyang1999/MGCN on Github.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"33 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139763104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A networked time series (NETS) is a family of time series on a given graph, one for each node. It has a wide range of applications, from intelligent transportation and environment monitoring to smart grid management. An important task in such applications is to predict the future values of a NETS based on its historical values and the underlying graph. Most existing methods require complete data for training. However, in real-world scenarios, it is not uncommon to have missing data due to sensor malfunction, incomplete sensing coverage, etc. In this paper, we study the problem of NETS prediction with incomplete data. We propose NETS-ImpGAN, a novel deep learning framework that can be trained on incomplete data with missing values in both history and future. Furthermore, we propose Graph Temporal Attention Networks, which incorporate the attention mechanism to capture both inter-time-series and temporal correlations. We conduct extensive experiments on four real-world datasets under different missing patterns and missing rates. The experimental results show that NETS-ImpGAN outperforms existing methods, reducing the MAE by up to 25%.
{"title":"Networked Time Series Prediction with Incomplete Data via Generative Adversarial Network","authors":"Yichen Zhu, Bo Jiang, Haiming Jin, Mengtian Zhang, Feng Gao, Jianqiang Huang, Tao Lin, Xinbing Wang","doi":"10.1145/3643822","DOIUrl":"https://doi.org/10.1145/3643822","url":null,"abstract":"<p>A <i>networked time series (NETS)</i> is a family of time series on a given graph, one for each node. It has a wide range of applications from intelligent transportation, environment monitoring to smart grid management. An important task in such applications is to predict the future values of a NETS based on its historical values and the underlying graph. Most existing methods require complete data for training. However, in real-world scenarios, it is not uncommon to have missing data due to sensor malfunction, incomplete sensing coverage, etc. In this paper, we study the problem of <i>NETS prediction with incomplete data</i>. We propose NETS-ImpGAN, a novel deep learning framework that can be trained on incomplete data with missing values in both history and future. Furthermore, we propose <i>Graph Temporal Attention Networks</i>, which incorporate the attention mechanism to capture both inter-time series and temporal correlations. We conduct extensive experiments on four real-world datasets under different missing patterns and missing rates. The experimental results show that NETS-ImpGAN outperforms existing methods, reducing the MAE by up to 25%.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"111 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139755477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In graph classification, attention- and pooling-based graph neural networks (GNNs) predominate; they extract salient features from the input graph to support the prediction. They mostly follow the paradigm of “learning to attend”, which maximizes the mutual information between the attended graph and the ground-truth label. However, this paradigm causes GNN classifiers to indiscriminately absorb all statistical correlations between input features and labels in the training data, without distinguishing the causal and noncausal effects of features. Rather than emphasizing causal features, the attended graphs tend to rely on noncausal features as shortcuts to predictions. These shortcut features may easily change outside the training distribution, thereby leading to poor generalization for GNN classifiers. In this paper, we take a causal view on GNN modeling. Under our causal assumption, the shortcut feature serves as a confounder between the causal feature and the prediction. It misleads the classifier into learning spurious correlations that facilitate prediction in in-distribution (ID) test evaluation while causing a significant performance drop on out-of-distribution (OOD) test data. To address this issue, we employ the backdoor adjustment from causal theory, combining each causal feature with various shortcut features to identify causal patterns and mitigate the confounding effect. Specifically, we employ attention modules to estimate the causal and shortcut features of the input graph. Then, a memory bank collects the estimated shortcut features, enhancing the diversity of shortcut features available for combination. Simultaneously, we apply a prototype strategy to improve the consistency of intra-class causal features. We term our method CAL+, which can promote stable relationships between causal estimation and prediction, regardless of distribution changes. Extensive experiments on synthetic and real-world OOD benchmarks demonstrate our method’s effectiveness in improving OOD generalization. Our codes are released at https://github.com/shuyao-wang/CAL-plus.
{"title":"Enhancing Out-of-distribution Generalization on Graphs via Causal Attention Learning","authors":"Yongduo Sui, Wenyu Mao, Shuyao Wang, Xiang Wang, Jiancan Wu, Xiangnan He, Tat-Seng Chua","doi":"10.1145/3644392","DOIUrl":"https://doi.org/10.1145/3644392","url":null,"abstract":"<p>In graph classification, attention- and pooling-based graph neural networks (GNNs) predominate to extract salient features from the input graph and support the prediction. They mostly follow the paradigm of “learning to attend”, which maximizes the mutual information between the attended graph and the ground-truth label. However, this paradigm causes GNN classifiers to indiscriminately absorb all statistical correlations between input features and labels in the training data, without distinguishing the causal and noncausal effects of features. Rather than emphasizing causal features, the attended graphs tend to rely on noncausal features as shortcuts to predictions. These shortcut features may easily change outside the training distribution, thereby leading to poor generalization for GNN classifiers. In this paper, we take a causal view on GNN modeling. Under our causal assumption, the shortcut feature serves as a confounder between the causal feature and prediction. It misleads the classifier into learning spurious correlations that facilitate prediction in in-distribution (ID) test evaluation, while causing significant performance drop in out-of-distribution (OOD) test data. To address this issue, we employ the backdoor adjustment from causal theory — combining each causal feature with various shortcut features, to identify causal patterns and mitigate the confounding effect. Specifically, we employ attention modules to estimate the causal and shortcut features of the input graph. Then, a memory bank collects the estimated shortcut features, enhancing the diversity of shortcut features for combination. Simultaneously, we apply the prototype strategy to improve the consistency of intra-class causal features. We term our method as CAL+, which can promote stable relationships between causal estimation and prediction, regardless of distribution changes. Extensive experiments on synthetic and real-world OOD benchmarks demonstrate our method’s effectiveness in improving OOD generalization. Our codes are released at https://github.com/shuyao-wang/CAL-plus.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"18 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139755547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fair feature selection for classification decision tasks has recently garnered significant attention from researchers. However, existing fair feature selection algorithms fall short of providing a full explanation of the causal relationship between features and sensitive attributes, potentially impacting the accuracy of fair feature identification. To address this issue, we propose a Fair Causal Feature Selection algorithm, called FairCFS. Specifically, FairCFS constructs a localized causal graph that identifies the Markov blankets of the class and sensitive variables, in order to block the transmission of sensitive information when selecting fair causal features. Extensive experiments on seven public real-world datasets validate that FairCFS achieves accuracy comparable to eight state-of-the-art feature selection algorithms while delivering superior fairness.
{"title":"Fair Feature Selection: A Causal Perspective","authors":"Zhaolong Ling, Enqi Xu, Peng Zhou, Liang Du, Kui Yu, Xindong Wu","doi":"10.1145/3643890","DOIUrl":"https://doi.org/10.1145/3643890","url":null,"abstract":"<p>Fair feature selection for classification decision tasks has recently garnered significant attention from researchers. However, existing fair feature selection algorithms fall short of providing a full explanation of the causal relationship between features and sensitive attributes, potentially impacting the accuracy of fair feature identification. To address this issue, we propose a Fair Causal Feature Selection algorithm, called FairCFS. Specifically, FairCFS constructs a localized causal graph that identifies the Markov blankets of class and sensitive variables, to block the transmission of sensitive information for selecting fair causal features. Extensive experiments on seven public real-world datasets validate that FairCFS has comparable accuracy compared to eight state-of-the-art feature selection algorithms, while presenting more superior fairness.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"36 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139677909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Weighting strategies prevail in machine learning. For example, a common approach in robust machine learning is to assign low weights to samples that are likely to be noisy or quite hard. This study summarizes another, less-explored strategy, namely perturbation. Various incarnations of perturbation have been utilized, but the strategy itself has not been explicitly characterized. Learning with perturbation is called perturbation learning, and a systematic taxonomy is constructed for it in this study. In our taxonomy, learning with perturbation is divided on the basis of the perturbation targets, directions, inference manners, and granularity levels. Many existing learning algorithms, including some classical ones, can be understood within the constructed taxonomy; in other words, these algorithms share the same component, namely perturbation, in their procedures. Furthermore, a family of new learning algorithms can be obtained by varying existing learning algorithms along the dimensions of our taxonomy. Specifically, three concrete new learning algorithms are proposed for robust machine learning. Extensive experiments on image classification and text sentiment analysis verify the effectiveness of the three new algorithms. Learning with perturbation can also be used in various other learning scenarios, such as imbalanced learning, clustering, and regression.
{"title":"A Taxonomy for Learning with Perturbation and Algorithms","authors":"Rujing Yao, Ou Wu","doi":"10.1145/3644391","DOIUrl":"https://doi.org/10.1145/3644391","url":null,"abstract":"<p>Weighting strategy prevails in machine learning. For example, a common approach in robust machine learning is to exert low weights on samples which are likely to be noisy or quite hard. This study summarizes another less-explored strategy, namely, perturbation. Various incarnations of perturbation have been utilized but it has not been explicitly revealed. Learning with perturbation is called perturbation learning and a systematic taxonomy is constructed for it in this study. In our taxonomy, learning with perturbation is divided on the basis of the perturbation targets, directions, inference manners, and granularity levels. Many existing learning algorithms including some classical ones can be understood with the constructed taxonomy. Alternatively, these algorithms share the same component, namely, perturbation in their procedures. Furthermore, a family of new learning algorithms can be obtained by varying existing learning algorithms with our taxonomy. Specifically, three concrete new learning algorithms are proposed for robust machine learning. Extensive experiments on image classification and text sentiment analysis verify the effectiveness of the three new algorithms. Learning with perturbation can also be used in other various learning scenarios, such as imbalanced learning, clustering, regression, and so on.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"218 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139678288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}