Pub Date : 2024-01-31 | DOI: 10.48550/arXiv.2401.17527
Haotian Ling, Zhihai Wang, Jie Wang
Cutting planes (cuts) play an important role in solving mixed-integer linear programs (MILPs), as they significantly tighten the dual bounds and improve solving performance. A key question is when to stop cut generation, a decision that strongly affects the efficiency of solving MILPs. However, many modern MILP solvers employ hard-coded heuristics for this decision, which tend to neglect underlying patterns among MILPs from particular applications. To address this challenge, we formulate the cut generation stopping problem as a reinforcement learning problem and propose a novel hybrid graph representation model (HYGRO) to learn effective stopping strategies. An appealing feature of HYGRO is that it can effectively capture both the dynamic and static features of MILPs, enabling dynamic decision-making for the stopping strategies. To the best of our knowledge, HYGRO is the first data-driven method for the cut generation stopping problem. By integrating our approach with modern solvers, experiments demonstrate that HYGRO significantly improves the efficiency of solving MILPs compared to competitive baselines, achieving up to a 31% improvement.
{"title":"Learning to Stop Cut Generation for Efficient Mixed-Integer Linear Programming","authors":"Haotian Ling, Zhihai Wang, Jie Wang","doi":"10.48550/arXiv.2401.17527","DOIUrl":"https://doi.org/10.48550/arXiv.2401.17527","url":null,"abstract":"Cutting planes (cuts) play an important role in solving mixed-integer linear programs (MILPs), as they significantly tighten the dual bounds and improve the solving performance. A key problem for cuts is when to stop cuts generation, which is important for the efficiency of solving MILPs. However, many modern MILP solvers employ hard-coded heuristics to tackle this problem, which tends to neglect underlying patterns among MILPs from certain applications. To address this challenge, we formulate the cuts generation stopping problem as a reinforcement learning problem and propose a novel hybrid graph representation model (HYGRO) to learn effective stopping strategies. An appealing feature of HYGRO is that it can effectively capture both the dynamic and static features of MILPs, enabling dynamic decision-making for the stopping strategies. To the best of our knowledge, HYGRO is the first data-driven method to tackle the cuts generation stopping problem. By integrating our approach with modern solvers, experiments demonstrate that HYGRO significantly improves the efficiency of solving MILPs compared to competitive baselines, achieving up to 31% improvement.","PeriodicalId":518480,"journal":{"name":"AAAI Conference on Artificial Intelligence","volume":"206 ","pages":"20759-20767"},"PeriodicalIF":0.0,"publicationDate":"2024-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140529474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The rapidly changing landscape of technology and industries leads to dynamic skill requirements, making it crucial for employees and employers to anticipate such shifts to maintain a competitive edge in the labor market. Existing efforts in this area either rely on domain-expert knowledge or treat skill evolution as a simplified time-series forecasting problem. However, both approaches overlook the sophisticated relationships among different skills and the interconnection between skill demand and supply variations. In this paper, we propose a Cross-view Hierarchical Graph learning Hypernetwork (CHGH) framework for joint skill demand-supply prediction. Specifically, CHGH is an encoder-decoder network consisting of i) a cross-view graph encoder to capture the interconnection between skill demand and supply, ii) a hierarchical graph encoder to model the co-evolution of skills from a cluster-wise perspective, and iii) a conditional hyper-decoder to jointly predict demand and supply variations by incorporating historical demand-supply gaps. Extensive experiments on three real-world datasets demonstrate the superiority of the proposed framework over seven baselines and the effectiveness of its three modules.
{"title":"A Cross-View Hierarchical Graph Learning Hypernetwork for Skill Demand-Supply Joint Prediction","authors":"Wenshuo Chao, Zhaopeng Qiu, Likang Wu, Zhuoning Guo, Zhi Zheng, Hengshu Zhu, Hao Liu","doi":"10.48550/arXiv.2401.17838","DOIUrl":"https://doi.org/10.48550/arXiv.2401.17838","url":null,"abstract":"The rapidly changing landscape of technology and industries leads to dynamic skill requirements, making it crucial for employees and employers to anticipate such shifts to maintain a competitive edge in the labor market. Existing efforts in this area either relies on domain-expert knowledge or regarding the skill evolution as a simplified time series forecasting problem. However, both approaches overlook the sophisticated relationships among different skills and the inner-connection between skill demand and supply variations. \u0000In this paper, we propose a Cross-view Hierarchical Graph learning Hypernetwork (CHGH) framework for joint skill demand-supply prediction. Specifically, CHGH is an encoder-decoder network consisting of i) a cross-view graph encoder to capture the interconnection between skill demand and supply, ii) a hierarchical graph encoder to model the co-evolution of skills from a cluster-wise perspective, and iii) a conditional hyper-decoder to jointly predict demand and supply variations by incorporating historical demand-supply gaps. Extensive experiments on three real-world datasets demonstrate the superiority of the proposed framework compared to seven baselines and the effectiveness of the three modules.","PeriodicalId":518480,"journal":{"name":"AAAI Conference on Artificial Intelligence","volume":"173 ","pages":"19813-19822"},"PeriodicalIF":0.0,"publicationDate":"2024-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140529291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-29 | DOI: 10.48550/arXiv.2401.15820
Yong Guan, Freddy Lecue, Jiaoyan Chen, Ru Li, Jeff Z. Pan
Although neural models have achieved remarkable performance, they still face skepticism due to their lack of transparency. To this end, model prediction explanation is attracting increasing attention. However, current methods rarely incorporate external knowledge and suffer from three limitations: (1) neglecting concept completeness: merely selecting concepts may not be sufficient for prediction; (2) lacking concept fusion: semantically equivalent concepts are not merged; and (3) difficulty in manipulating model behavior: explanations are not verified against the original model. To address these issues, we propose a novel knowledge-aware neuron interpretation framework to explain model predictions for image scene classification. Specifically, for concept completeness, we derive the core concepts of a scene from a knowledge graph, ConceptNet, to gauge the completeness of concepts. Our method, incorporating complete concepts, provides better prediction explanations than baselines. Furthermore, for concept fusion, we introduce a knowledge-graph-based method called Concept Filtering, which yields a gain of over 23 percentage points on neuron behaviors for neuron interpretation. Finally, we propose Model Manipulation, which studies whether the core concepts based on ConceptNet can be employed to manipulate model behavior. The results show that core concepts can improve the performance of the original model by over 26%.
{"title":"Knowledge-Aware Neuron Interpretation for Scene Classification","authors":"Yong Guan, Freddy Lecue, Jiaoyan Chen, Ru Li, Jeff Z. Pan","doi":"10.48550/arXiv.2401.15820","DOIUrl":"https://doi.org/10.48550/arXiv.2401.15820","url":null,"abstract":"Although neural models have achieved remarkable performance, they still encounter doubts due to the intransparency. To this end, model prediction explanation is attracting more and more attentions. However, current methods rarely incorporate external knowledge and still suffer from three limitations: (1) Neglecting concept completeness. Merely selecting concepts may not sufficient for prediction. (2) Lacking concept fusion. Failure to merge semantically-equivalent concepts. (3) Difficult in manipulating model behavior. Lack of verification for explanation on original model. To address these issues, we propose a novel knowledge-aware neuron interpretation framework to explain model predictions for image scene classification. Specifically, for concept completeness, we present core concepts of a scene based on knowledge graph, ConceptNet, to gauge the completeness of concepts. Our method, incorporating complete concepts, effectively provides better prediction explanations compared to baselines. Furthermore, for concept fusion, we introduce a knowledge graph-based method known as Concept Filtering, which produces over 23% point gain on neuron behaviors for neuron interpretation. At last, we propose Model Manipulation, which aims to study whether the core concepts based on ConceptNet could be employed to manipulate model behavior. The results show that core concepts can effectively improve the performance of original model by over 26%.","PeriodicalId":518480,"journal":{"name":"AAAI Conference on Artificial Intelligence","volume":"72 5","pages":"1950-1958"},"PeriodicalIF":0.0,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140529588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-27 | DOI: 10.48550/arXiv.2401.15447
Lokesh Nagalapatti, Akshay Iyer, Abir De, Sunita Sarawagi
We address the individualized continuous treatment effect (ICTE) estimation problem, where we predict the effect of any continuous-valued treatment on an individual using observational data. The main challenge in this estimation task is the potential confounding of treatment assignment with the individual's covariates in the training data, whereas at inference time ICTE requires prediction for independently sampled treatments. In contrast to prior work that relied on regularizers or unstable GAN training, we advocate the direct approach of augmenting training individuals with independently sampled treatments and inferred counterfactual outcomes. We infer counterfactual outcomes using a two-pronged strategy: gradient interpolation for close-to-observed treatments, and Gaussian-process-based kernel smoothing, which allows us to down-weight high-variance inferences. We evaluate our method on five benchmarks and show that it outperforms six state-of-the-art methods on counterfactual estimation error. We analyze the superior performance of our method by showing that (1) our inferred counterfactual responses are more accurate, and (2) adding them to the training data reduces the distributional distance between the confounded training distribution and the test distribution, where treatment is independent of covariates. Our proposed method is model-agnostic, and we show that it improves the ICTE accuracy of several existing models.
{"title":"Continuous Treatment Effect Estimation Using Gradient Interpolation and Kernel Smoothing","authors":"Lokesh Nagalapatti, Akshay Iyer, Abir De, Sunita Sarawagi","doi":"10.48550/arXiv.2401.15447","DOIUrl":"https://doi.org/10.48550/arXiv.2401.15447","url":null,"abstract":"We address the Individualized continuous treatment effect\u0000(ICTE) estimation problem where we predict the effect of\u0000any continuous valued treatment on an individual using ob-\u0000servational data. The main challenge in this estimation task\u0000is the potential confounding of treatment assignment with in-\u0000dividual’s covariates in the training data, whereas during in-\u0000ference ICTE requires prediction on independently sampled\u0000treatments. In contrast to prior work that relied on regularizers\u0000or unstable GAN training, we advocate the direct approach\u0000of augmenting training individuals with independently sam-\u0000pled treatments and inferred counterfactual outcomes. We in-\u0000fer counterfactual outcomes using a two-pronged strategy: a\u0000Gradient Interpolation for close-to-observed treatments, and\u0000a Gaussian Process based Kernel Smoothing which allows\u0000us to down weigh high variance inferences. We evaluate our\u0000method on five benchmarks and show that our method out-\u0000performs six state-of-the-art methods on the counterfactual\u0000estimation error. We analyze the superior performance of our\u0000method by showing that (1) our inferred counterfactual re-\u0000sponses are more accurate, and (2) adding them to the train-\u0000ing data reduces the distributional distance between the con-\u0000founded training distribution and test distribution where treat-\u0000ment is independent of covariates. Our proposed method is\u0000model-agnostic and we show that it improves ICTE accuracy\u0000of several existing models.","PeriodicalId":518480,"journal":{"name":"AAAI Conference on Artificial Intelligence","volume":"178 6","pages":"14397-14404"},"PeriodicalIF":0.0,"publicationDate":"2024-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140530287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-26 | DOI: 10.48550/arXiv.2401.14729
Chao Chen, Jie Liu, Chang Zhou, Jie Tang, Gangshan Wu
Lane detection aims to determine the precise location and shape of lanes on the road. Despite the efforts of current methods, it remains a challenging task due to the complexity of real-world scenarios. Existing approaches, whether proposal-based or keypoint-based, struggle to depict lanes both effectively and efficiently. Proposal-based methods detect lanes by distinguishing and regressing a collection of proposals in a streamlined top-down way, yet lack sufficient flexibility in lane representation. Keypoint-based methods, on the other hand, construct lanes flexibly from local descriptors, but typically entail complicated post-processing. In this paper, we present a “Sketch-and-Refine” paradigm that combines the merits of keypoint-based and proposal-based methods. The motivation is that the local directions of lanes are semantically simple and clear. At the “Sketch” stage, the local directions of keypoints can be easily estimated by fast convolutional layers, from which we build a set of lane proposals with moderate accuracy. At the “Refine” stage, we further optimize these proposals via a novel Lane Segment Association Module (LSAM), which allows adaptive lane-segment adjustment. Last but not least, we propose multi-level feature integration to enrich lane feature representations more efficiently. Based on this paradigm, we develop a fast yet effective lane detector dubbed “SRLane”. Experiments show that SRLane runs at high speed (278 FPS) while yielding an F1 score of 78.9%. The source code is available at: https://github.com/passerer/SRLane.
{"title":"Sketch and Refine: Towards Fast and Accurate Lane Detection","authors":"Chao Chen, Jie Liu, Chang Zhou, Jie Tang, Gangshan Wu","doi":"10.48550/arXiv.2401.14729","DOIUrl":"https://doi.org/10.48550/arXiv.2401.14729","url":null,"abstract":"Lane detection is to determine the precise location and shape of lanes on the road. Despite efforts made by current methods, it remains a challenging task due to the complexity of real-world scenarios. Existing approaches, whether proposal-based or keypoint-based, suffer from depicting lanes effectively and efficiently. Proposal-based methods detect lanes by distinguishing and regressing a collection of proposals in a streamlined top-down way, yet lack sufficient flexibility in lane representation. Keypoint-based methods, on the other hand, construct lanes flexibly from local descriptors, which typically entail complicated post-processing. In this paper, we present a “Sketch-and-Refine” paradigm that utilizes the merits of both keypoint-based and proposal-based methods. The motivation is that local directions of lanes are semantically simple and clear. At the “Sketch” stage, local directions of keypoints can be easily estimated by fast convolutional layers. Then we can build a set of lane proposals accordingly with moderate accuracy. At the “Refine” stage, we further optimize these proposals via a novel Lane Segment Association Module (LSAM), which allows adaptive lane segment adjustment. Last but not least, we propose multi-level feature integration to enrich lane feature representations more efficiently. Based on the proposed “Sketch-and-Refine” paradigm, we propose a fast yet effective lane detector dubbed “SRLane”. Experiments show that our SRLane can run at a fast speed (i.e., 278 FPS) while yielding an F1 score of 78.9%. The source code is available at: https://github.com/passerer/SRLane.","PeriodicalId":518480,"journal":{"name":"AAAI Conference on Artificial Intelligence","volume":"23 2","pages":"1001-1009"},"PeriodicalIF":0.0,"publicationDate":"2024-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140530110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-25 | DOI: 10.48550/arXiv.2401.14184
Anastasia Kurmukova, Deniz Gündüz
This paper introduces a novel approach called "friendly attack," aimed at enhancing the performance of error correction channel codes. Inspired by adversarial attacks, our method leverages the idea that slight perturbations to a neural network's input can have a substantial impact on the network's performance. By introducing small perturbations to fixed-point modulated codewords before transmission, we effectively improve the decoder's performance without violating the input power constraint. The perturbation design is accomplished by a modified iterative fast gradient method. This study investigates various decoder architectures suitable for computing gradients to obtain the desired perturbations. Specifically, we consider belief propagation (BP) for LDPC codes; the error correction code transformer, BP, and neural BP (NBP) for polar codes; and neural BCJR for convolutional codes. We demonstrate that the proposed friendly attack method can improve reliability across different channels, modulations, codes, and decoders. This method allows us to increase the reliability of communication with a legacy receiver simply by modifying the transmitted codeword appropriately.
{"title":"Friendly Attacks to Improve Channel Coding Reliability","authors":"Anastasia Kurmukova, Deniz Gündüz","doi":"10.48550/arXiv.2401.14184","DOIUrl":"https://doi.org/10.48550/arXiv.2401.14184","url":null,"abstract":"This paper introduces a novel approach called \"friendly attack\" aimed at enhancing the performance of error correction channel codes. Inspired by the concept of adversarial attacks, our method leverages the idea of introducing slight perturbations to the neural network input, resulting in a substantial impact on the network's performance. By introducing small perturbations to fixed-point modulated codewords before transmission, we effectively improve the decoder's performance without violating the input power constraint. The perturbation design is accomplished by a modified iterative fast gradient method. This study investigates various decoder architectures suitable for computing gradients to obtain the desired perturbations. Specifically, we consider belief propagation (BP) for LDPC codes; the error correcting code transformer, BP and neural BP (NBP) for polar codes, and neural BCJR for convolutional codes. We demonstrate that the proposed friendly attack method can improve the reliability across different channels, modulations, codes, and decoders. This method allows us to increase the reliability of communication with a legacy receiver by simply modifying the transmitted codeword appropriately.","PeriodicalId":518480,"journal":{"name":"AAAI Conference on Artificial Intelligence","volume":"26 1","pages":"13292-13300"},"PeriodicalIF":0.0,"publicationDate":"2024-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140530685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hierarchical topic modeling aims to discover latent topics from a corpus and organize them into a hierarchy to understand documents at a desirable semantic granularity. However, existing work struggles with these goals, producing topic hierarchies of low affinity, rationality, and diversity, which hampers document understanding. To overcome these challenges, we propose the Transport Plan and Context-aware Hierarchical Topic Model (TraCo). In place of the simple topic dependencies of earlier methods, we propose a transport plan dependency method. It constrains dependencies to ensure their sparsity and balance, and also uses them to regularize topic hierarchy building. This improves the affinity and diversity of hierarchies. We further propose a context-aware disentangled decoder. Rather than the previously entangled decoding, it distributes different semantic granularity to topics at different levels through disentangled decoding. This improves the rationality of hierarchies. Experiments on benchmark datasets demonstrate that our method surpasses state-of-the-art baselines, effectively improving the affinity, rationality, and diversity of hierarchical topic modeling, with better performance on downstream tasks.
{"title":"On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling","authors":"Xiaobao Wu, Fengjun Pan, Thong Nguyen, Yichao Feng, Chaoqun Liu, Cong-Duy Nguyen, A. Luu","doi":"10.48550/arXiv.2401.14113","DOIUrl":"https://doi.org/10.48550/arXiv.2401.14113","url":null,"abstract":"Hierarchical topic modeling aims to discover latent topics from a corpus and organize them into a hierarchy to understand documents with desirable semantic granularity. However, existing work struggles with producing topic hierarchies of low affinity, rationality, and diversity, which hampers document understanding. To overcome these challenges, we in this paper propose Transport Plan and Context-aware Hierarchical Topic Model (TraCo). Instead of early simple topic dependencies, we propose a transport plan dependency method. It constrains dependencies to ensure their sparsity and balance, and also regularizes topic hierarchy building with them. This improves affinity and diversity of hierarchies. We further propose a context-aware disentangled decoder. Rather than previously entangled decoding, it distributes different semantic granularity to topics at different levels by disentangled decoding. This facilitates the rationality of hierarchies. Experiments on benchmark datasets demonstrate that our method surpasses state-of-the-art baselines, effectively improving the affinity, rationality, and diversity of hierarchical topic modeling with better performance on downstream tasks.","PeriodicalId":518480,"journal":{"name":"AAAI Conference on Artificial Intelligence","volume":"279 1-2","pages":"19261-19269"},"PeriodicalIF":0.0,"publicationDate":"2024-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140530359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-22 | DOI: 10.48550/arXiv.2401.11740
Liping Qiu, Qin Zhang, Xiaojun Chen, Shao-Qian Cai
Recently, cross-modal pretraining models have been employed to produce meaningful pseudo-labels to supervise the training of image clustering models. However, numerous erroneous alignments in a cross-modal pretraining model can produce poor-quality pseudo-labels and degrade clustering performance. To solve this issue, we propose a novel Multi-level Cross-modal Alignment method that improves the alignments in a cross-modal pretraining model for downstream tasks by building a smaller but better semantic space and aligning images and texts at three levels, i.e., instance-level, prototype-level, and semantic-level. Theoretical results show that our proposed method converges and suggest effective means to reduce its expected clustering risk. Experimental results on five benchmark datasets clearly show the superiority of our new method.
{"title":"Multi-level Cross-modal Alignment for Image Clustering","authors":"Liping Qiu, Qin Zhang, Xiaojun Chen, Shao-Qian Cai","doi":"10.48550/arXiv.2401.11740","DOIUrl":"https://doi.org/10.48550/arXiv.2401.11740","url":null,"abstract":"Recently, the cross-modal pretraining model has been employed to produce meaningful pseudo-labels to supervise the training of an image clustering model. However, numerous erroneous alignments in a cross-modal pretraining model could produce poor-quality pseudo labels and degrade clustering performance. To solve the aforementioned issue, we propose a novel Multi-level Cross-modal Alignment method to improve the alignments in a cross-modal pretraining model for downstream tasks, by building a smaller but better semantic space and aligning the images and texts in three levels, i.e., instance-level, prototype-level, and semantic-level. Theoretical results show that our proposed method converges, and suggests effective means to reduce the expected clustering risk of our method. Experimental results on five benchmark datasets clearly show the superiority of our new method.","PeriodicalId":518480,"journal":{"name":"AAAI Conference on Artificial Intelligence","volume":"290 17","pages":"14695-14703"},"PeriodicalIF":0.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140530968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-16 | DOI: 10.48550/arXiv.2401.08464
Binghui Xie, Yongqiang Chen, Jiaqi Wang, Kaiwen Zhou, Bo Han, Wei Meng, James Cheng
Domain generalization is a critical challenge for machine learning systems. Prior domain generalization methods focus on extracting domain-invariant features across several stationary domains to enable generalization to new domains. However, in non-stationary tasks where new domains evolve along an underlying continuous structure, such as time, merely extracting the invariant features is insufficient for generalization to the evolving new domains. Nevertheless, it is non-trivial to learn both evolving and invariant features within a single model due to their conflicts. To bridge this gap, we build causal models to characterize the distribution shifts concerning the two patterns, and propose to learn both dynamic and invariant features via a new framework called Mutual Information-Based Sequential Autoencoders (MISTS). MISTS adopts information-theoretic constraints on sequential autoencoders to disentangle the dynamic and invariant features, and leverages an adaptive classifier to make predictions based on both evolving and invariant information. Our experimental results on both synthetic and real-world datasets demonstrate that MISTS succeeds in capturing both evolving and invariant information, and presents promising results on evolving domain generalization tasks.
{"title":"Enhancing Evolving Domain Generalization through Dynamic Latent Representations","authors":"Binghui Xie, Yongqiang Chen, Jiaqi Wang, Kaiwen Zhou, Bo Han, Wei Meng, James Cheng","doi":"10.48550/arXiv.2401.08464","DOIUrl":"https://doi.org/10.48550/arXiv.2401.08464","url":null,"abstract":"Domain generalization is a critical challenge for machine learning systems. Prior domain generalization methods focus on extracting domain-invariant features across several stationary domains to enable generalization to new domains. However, in non-stationary tasks where new domains evolve in an underlying continuous structure, such as time, merely extracting the invariant features is insufficient for generalization to the evolving new domains. Nevertheless, it is non-trivial to learn both evolving and invariant features within a single model due to their conflicts. To bridge this gap, we build causal models to characterize the distribution shifts concerning the two patterns, and propose to learn both dynamic and invariant features via a new framework called Mutual Information-Based Sequential Autoencoders (MISTS). MISTS adopts information theoretic constraints onto sequential autoencoders to disentangle the dynamic and invariant features, and leverage an adaptive classifier to make predictions based on both evolving and invariant information. Our experimental results on both synthetic and real-world datasets demonstrate that MISTS succeeds in capturing both evolving and invariant information, and present promising results in evolving domain generalization tasks.","PeriodicalId":518480,"journal":{"name":"AAAI Conference on Artificial Intelligence","volume":"24 4","pages":"16040-16048"},"PeriodicalIF":0.0,"publicationDate":"2024-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140531699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-16 | DOI: 10.48550/arXiv.2401.08527
Yequan Bie, Luyang Luo, Hao Chen
Black-box deep learning approaches have showcased significant potential in the realm of medical image analysis. However, the stringent trustworthiness requirements intrinsic to the medical field have catalyzed research into Explainable Artificial Intelligence (XAI), with a particular focus on concept-based methods. Existing concept-based methods predominantly apply concept annotations from a single perspective (e.g., the global level), neglecting the nuanced semantic relationships between sub-regions and concepts embedded within medical images. This underutilizes valuable medical information and may cause models to fall short of harmoniously balancing interpretability and performance when employing inherently interpretable architectures such as Concept Bottlenecks. To mitigate these shortcomings, we propose a multi-modal explainable disease diagnosis framework that meticulously aligns medical images and clinically related concepts semantically at multiple levels, encompassing the image level, token level, and concept level. Moreover, our method allows for model intervention and offers both textual and visual explanations in terms of human-interpretable concepts. Experimental results on three skin image datasets demonstrate that our method, while preserving model interpretability, attains high performance and label efficiency for concept detection and disease diagnosis. The code is available at https://github.com/Tommy-Bie/MICA.
{"title":"MICA: Towards Explainable Skin Lesion Diagnosis via Multi-Level Image-Concept Alignment","authors":"Yequan Bie, Luyang Luo, Hao Chen","doi":"10.48550/arXiv.2401.08527","DOIUrl":"https://doi.org/10.48550/arXiv.2401.08527","url":null,"abstract":"Black-box deep learning approaches have showcased significant potential in the realm of medical image analysis. However, the stringent trustworthiness requirements intrinsic to the medical field have catalyzed research into the utilization of Explainable Artificial Intelligence (XAI), with a particular focus on concept-based methods. Existing concept-based methods predominantly apply concept annotations from a single perspective (e.g., global level), neglecting the nuanced semantic relationships between sub-regions and concepts embedded within medical images. This leads to underutilization of the valuable medical information and may cause models to fall short in harmoniously balancing interpretability and performance when employing inherently interpretable architectures such as Concept Bottlenecks. To mitigate these shortcomings, we propose a multi-modal explainable disease diagnosis framework that meticulously aligns medical images and clinical-related concepts semantically at multiple strata, encompassing the image level, token level, and concept level. Moreover, our method allows for model intervention and offers both textual and visual explanations in terms of human-interpretable concepts. Experimental results on three skin image datasets demonstrate that our method, while preserving model interpretability, attains high performance and label efficiency for concept detection and disease diagnosis. The code is available at https://github.com/Tommy-Bie/MICA.","PeriodicalId":518480,"journal":{"name":"AAAI Conference on Artificial Intelligence","volume":"52 9","pages":"837-845"},"PeriodicalIF":0.0,"publicationDate":"2024-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140531532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}