Yi Wu, Yanyang Xu, Wenhao Zhu, Guojie Song, Zhouchen Lin, Liangji Wang, Shaoguo Liu
In recent years, graph Transformers (GTs) have proven to be a robust architecture for a wide range of graph learning tasks. However, their quadratic complexity limits their scalability on large-scale data compared with Graph Neural Networks (GNNs). In this work, we propose the Kernel Decomposition Linear Graph Transformer (KDLGT), an acceleration framework for building scalable and powerful GTs. KDLGT uses a kernel decomposition approach to rearrange the order of matrix multiplication, which reduces the complexity to linear. It also categorizes GTs into three distinct types and provides a tailored acceleration method for each type, so that the framework covers all types of GTs. Furthermore, we provide a theoretical analysis of the performance gap between KDLGT and self-attention to ensure its effectiveness. Under this framework, we select two representative GTs to design our models. Experiments on both real-world and synthetic datasets indicate that KDLGT not only achieves state-of-the-art performance on various datasets but also reaches a speedup of approximately 10× on graphs of certain sizes.
KDLGT: A Linear Graph Transformer Framework via Kernel Decomposition Approach. Yi Wu, Yanyang Xu, Wenhao Zhu, Guojie Song, Zhouchen Lin, Liangji Wang, Shaoguo Liu. International Joint Conference on Artificial Intelligence (IJCAI), 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/263
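The KDLGT abstract says the speedup comes from rearranging the order of matrix multiplication. The generic idea behind such linear attention can be sketched as follows, assuming a positive feature map phi(x) = elu(x) + 1 (our assumption; KDLGT's own kernel decomposition and its three per-category variants are not reproduced here). Computing phi(Q) (phi(K)^T V) never forms the N × N attention matrix, so the cost grows linearly with the number of nodes N instead of quadratically.

```python
import numpy as np

def feature_map(x):
    # Hypothetical positive feature map phi(x) = elu(x) + 1 (assumption, not KDLGT's kernel).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def quadratic_attention(Q, K, V):
    # Reference version: materializes the N x N kernel matrix, O(N^2 d) time.
    A = feature_map(Q) @ feature_map(K).T            # (N, N)
    return (A @ V) / A.sum(axis=1, keepdims=True)

def linear_attention(Q, K, V):
    # Same result with the products reordered: O(N d^2) time, no N x N matrix.
    phi_q, phi_k = feature_map(Q), feature_map(K)
    kv = phi_k.T @ V                                 # (d, d_v) summary of keys and values
    z = phi_k.sum(axis=0)                            # (d,) shared normalizer
    return (phi_q @ kv) / (phi_q @ z)[:, None]

rng = np.random.default_rng(0)
N, d = 500, 16
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape, np.allclose(out, quadratic_attention(Q, K, V)))
```

Both functions compute exactly the same kernelized attention; only the association order of the matrix products differs, which is what removes the quadratic term.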
Automated machine learning (AutoML) has been widely researched and adopted for supervised problems, but progress in unsupervised settings has been limited. We propose LOTUS, a novel framework that automates outlier detection based on meta-learning. Our premise is that the choice of the optimal outlier detection technique depends on the inherent properties of the data distribution. We use optimal transport to find the dataset with the most similar underlying distribution, and then apply the outlier detection techniques that performed best on that distribution. We evaluate the robustness of our framework and find that it outperforms all state-of-the-art automated outlier detection tools. The approach can also be generalized easily to automate other unsupervised settings.
AutoML for Outlier Detection with Optimal Transport Distances. Prabhant Singh, J. Vanschoren. International Joint Conference on Artificial Intelligence (IJCAI), 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/843
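The LOTUS abstract describes a nearest-neighbour step: find the meta-dataset whose distribution is closest in optimal transport distance, then reuse the detector that worked best there. A minimal sketch of that step follows. It swaps in the sliced 2-Wasserstein distance (random 1-D projections, where optimal transport reduces to matching sorted values) as a cheap OT distance and assumes equal-size samples; the dataset names and detector labels are hypothetical placeholders, not LOTUS's actual meta-data or solver.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=50, seed=0):
    # Sliced 2-Wasserstein distance between two equal-size point clouds (assumption:
    # same number of rows; subsample the larger set otherwise).
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((n_projections, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # random unit directions
    px = np.sort(X @ dirs.T, axis=0)                      # 1-D OT matches sorted values
    py = np.sort(Y @ dirs.T, axis=0)
    return np.sqrt(((px - py) ** 2).mean())

def recommend_detector(new_data, meta_datasets, best_detector):
    # Pick the meta-dataset closest in OT distance and reuse its best detector.
    distances = {name: sliced_wasserstein(new_data, X) for name, X in meta_datasets.items()}
    nearest = min(distances, key=distances.get)
    return nearest, best_detector[nearest]

rng = np.random.default_rng(1)
meta = {"blob_at_origin": rng.normal(0.0, 1.0, (200, 3)),   # hypothetical meta-datasets
        "blob_shifted": rng.normal(5.0, 1.0, (200, 3))}
best = {"blob_at_origin": "IsolationForest", "blob_shifted": "LOF"}  # placeholder labels
new = rng.normal(0.2, 1.0, (200, 3))
print(recommend_detector(new, meta, best))
```

Sliced Wasserstein was chosen here only because it needs nothing beyond NumPy; an exact OT solver would slot into the same `recommend_detector` logic.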
Gender bias is a pervasive issue that limits the ability of women and marginalized groups to participate fully in social, economic, and political spheres. This study introduces the novel problem of Gender-biased Language Identification and Extraction (GLIdE) from social media interactions and develops a multi-task deep framework that detects gender-biased content and identifies the connected causal phrases in the text, using the emotional information present in the input. The method uses a zero-shot strategy with emotional information and a mechanism that represents gender-stereotyped information as a knowledge graph. We also introduce the first-of-its-kind Gender-biased Analysis Corpus (GAC) of 12,432 social media posts, and through empirical evaluation and extensive qualitative analysis we show that our framework improves on the best-performing baseline for the identification and extraction tasks by margins of 4.88% and 5 ROS points, respectively. By improving the accuracy of identifying and analyzing gender-biased language, this work can contribute to achieving gender equality and promoting inclusive societies, in line with the United Nations Sustainable Development Goals (UN SDGs) and the Leave No One Behind (LNOB) principle. In keeping with the UN SDGs' principles of transparency and collaboration, we openly share our code and dataset.
Promoting Gender Equality through Gender-biased Language Analysis in Social Media. G. Singh, Soumitra Ghosh, Asif Ekbal. International Joint Conference on Artificial Intelligence (IJCAI), 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/689
Generalized planning (GP) studies the computation of general solutions for a set of planning problems. Computing general solutions with correctness guarantees has long been a key issue in GP. Abstractions are widely used to solve GP problems. For example, a popular abstraction model for GP is qualitative numeric planning (QNP), which extends classical planning with non-negative real variables that can be increased or decreased by arbitrary amounts. Refinements of correct solutions to sound abstractions are solutions to GP problems with correctness guarantees. Recent work proposed a uniform abstraction framework for GP and gave model-theoretic definitions of sound and complete abstractions for GP problems. Building on that work, this paper explores the automatic verification of sound abstractions for GP. First, we present a proof-theoretic characterization of sound abstractions. Second, based on this characterization, we give a sufficient condition for sound abstractions with deterministic actions. We then study how to verify the sufficient condition when the abstraction models are bounded QNPs, in which integer variables can be incremented or decremented by one. To this end, we develop methods to handle counting and transitive closure, which are often used to define numerical variables. Finally, we implement a system that verifies the soundness of bounded QNP abstractions and report experimental results on several domains.
Automatic Verification for Soundness of Bounded QNP Abstractions for Generalized Planning. Zhenhe Cui, Weidu Kuang, Yongmei Liu. International Joint Conference on Artificial Intelligence (IJCAI), 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/351
As autonomous systems tackle more real-world situations, mission success often cannot be guaranteed, and the planner must reason about the probability of failure. Unfortunately, computing a trajectory that satisfies mission goals while constraining the probability of failure is difficult because it requires reasoning about complex, multidimensional probability distributions. Recent methods have been successful with chance-constrained, model-based planning. We argue that these approaches have two main drawbacks. First, current methods cannot handle expressive environment models such as 3D non-convex obstacles. Second, most planners rely on considerable simplifications when computing trajectory risk, including approximations of the agent's dynamics, geometry, and uncertainty. We apply hybrid search to the risk-bounded, goal-directed planning problem. The hybrid search consists of a region planner and a trajectory planner. The region planner makes discrete choices by reasoning about the geometric regions the agent should visit to accomplish its mission. In formulating the region planner, we propose landmark regions that help produce obstacle-free paths. The region planner passes paths through the environment to the trajectory planner, whose task is to optimize trajectories that respect the agent's dynamics and the user's desired risk of mission failure. We discuss three approaches to modeling trajectory risk: a CDF-based approach, a sampling-based collocation method, and an algorithm named Shooting Method Monte Carlo. The full paper presents a variety of 2D and 3D test cases, including a linear case, a Dubins car model, and an autonomous underwater vehicle. The method outperforms other methods in both speed and solution utility. Additionally, the trajectory risk models are shown to approximate risk more accurately in simulation.
Motion Planning Under Uncertainty with Complex Agents and Environments via Hybrid Search (Extended Abstract). Daniel Strawser, B. Williams. International Joint Conference on Artificial Intelligence (IJCAI), 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/792
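The motion-planning abstract lists sampling-based estimates of trajectory risk. The basic Monte Carlo idea behind such estimates can be sketched as below: roll out a fixed control sequence many times under process noise and count the rollouts that hit an obstacle. This is a generic illustration, not the paper's Shooting Method Monte Carlo; the single-integrator dynamics, Gaussian noise level, and disk obstacle are all hypothetical assumptions chosen for brevity.

```python
import numpy as np

def mc_failure_probability(controls, x0, obstacle_center, obstacle_radius,
                           noise_std=0.05, n_samples=10_000, seed=0):
    # Hypothetical model: x_{t+1} = x_t + u_t + w_t with w_t ~ N(0, noise_std^2 I).
    # A rollout fails if it enters the disk obstacle at any time step.
    rng = np.random.default_rng(seed)
    center = np.asarray(obstacle_center, dtype=float)
    x = np.tile(np.asarray(x0, dtype=float), (n_samples, 1))
    failed = np.zeros(n_samples, dtype=bool)
    for u in controls:
        x = x + u + noise_std * rng.standard_normal(x.shape)
        failed |= np.linalg.norm(x - center, axis=1) < obstacle_radius
    return failed.mean()

# Straight-line plan from (0, 0) to (1, 0) in 10 equal steps.
controls = np.tile([0.1, 0.0], (10, 1))
p_clear = mc_failure_probability(controls, [0.0, 0.0], [0.5, 0.3], 0.2)
p_through = mc_failure_probability(controls, [0.0, 0.0], [0.5, 0.1], 0.2)
print(p_clear, p_through)
```

A chance-constrained planner would accept the plan only if the estimate stays below the user's risk bound, e.g. `p_clear <= 0.05`; the sampling error of such an estimate shrinks roughly as one over the square root of `n_samples`.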
Since the advent of Transformer-based models, the trade-off between transparency and accuracy has been a topical issue in the NLP community. Working towards ethical and transparent automated content moderation (ACM), my goal is to identify where linguistic expertise is still worth applying. I show that transparent statistical models based on linguistic knowledge can still be competitive, and that linguistic features have many other useful applications.
Automated Content Moderation Using Transparent Solutions and Linguistic Expertise. Veronika Solopova. International Joint Conference on Artificial Intelligence (IJCAI), 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/823
The availability of large amounts of unstructured text has generated interest in using it for future decision-making and strategy development in various critical domains. Despite some progress, automatically generating accurate reasoning models from raw text remains an active area of research. Furthermore, most proposed approaches focus on a specific domain, so their transformation methods are usually unreliable when applied to other domains. This research aims to develop a framework, SCANER (Semi-automated CAusal Network Extraction from Raw text), to convert raw text into Causal Bayesian Networks (CBNs). The framework will then be applied in various domains to demonstrate its use as a decision-support tool. Preliminary experiments have focused on three domains: political narratives, food insecurity, and medical sciences. Future work will focus on building CBNs from political narratives and modifying them through various methods to reduce the level of aggressiveness or extremity in the narratives without causing conflict among the masses or countries.
On Building a Semi-Automated Framework for Generating Causal Bayesian Networks from Raw Text. Solat J. Sheikh. International Joint Conference on Artificial Intelligence (IJCAI), 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/822
Xixuan Hao, Danqing Huang, Jieru Lin, Chin-Yew Lin
Designers commonly create digital prototypes from a mock-up or screenshot. Reverse engineering a graphic design by detecting its components (e.g., text, icons, buttons) helps expedite this process. This paper first conducts a statistical analysis that highlights the importance of relations in graphic layouts, which motivates us to incorporate relation modeling into component detection. Building on the state-of-the-art DETR (DEtection TRansformer), we introduce a learnable relation matrix to model class correlations. Specifically, the matrix is added in the DETR decoder to update the query-to-query self-attention. Experimental results on three public datasets show that our approach outperforms several strong baselines. We further visualize the learned relation matrix and observe some reasonable patterns. Moreover, we show an application of component detection in which the detection outputs serve as augmented training data for layout generation, with promising results.
Relation-enhanced DETR for Component Detection in Graphic Design Reverse Engineering. Xixuan Hao, Danqing Huang, Jieru Lin, Chin-Yew Lin. International Joint Conference on Artificial Intelligence (IJCAI), 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/532
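The Relation-enhanced DETR abstract states only that a learnable relation matrix is added in the decoder to update query-to-query self-attention. One plausible reading is an additive attention bias P R Pᵀ, where R is the C × C class-relation matrix and P holds each query's predicted class distribution. This sketch treats that reading as an assumption, not the authors' formulation. It is single-head, uses NumPy only, and the weights `Wq`, `Wk`, `Wv` and the class probabilities are random placeholders.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def relation_biased_self_attention(queries, class_probs, relation, Wq, Wk, Wv):
    # queries: (N, d) decoder queries; class_probs: (N, C) per-query class distribution;
    # relation: (C, C) learnable class-relation matrix (hypothetical parameterization).
    q, k, v = queries @ Wq, queries @ Wk, queries @ Wv
    logits = q @ k.T / np.sqrt(q.shape[-1])          # standard scaled dot-product scores
    bias = class_probs @ relation @ class_probs.T    # (N, N) class-correlation prior
    return softmax(logits + bias) @ v

rng = np.random.default_rng(0)
N, d, C = 6, 8, 3
queries = rng.standard_normal((N, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
probs = softmax(rng.standard_normal((N, C)))
relation = np.zeros((C, C))   # neutral start; training would learn nonzero entries
out = relation_biased_self_attention(queries, probs, relation, Wq, Wk, Wv)
print(out.shape)
```

With `relation` at zero the layer reduces to ordinary self-attention, so a learned matrix only nudges attention toward class pairs that tend to co-occur in layouts.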
Multicriteria decision making requires defining how conflicting and possibly interacting criteria are aggregated. Allowing interactions between criteria in a decision model increases the complexity of the preference learning task because of the combinatorial nature of the possible interactions. In this paper, we propose an approach to learning a decision model in which the interaction pattern is revealed from preference data and kept as simple as possible. We consider weighted aggregation functions such as multilinear utilities or Choquet integrals, whose representations include non-linear terms measuring the joint benefit or penalty attached to certain combinations of criteria. The weighting coefficients, known as Möbius masses, model positive or negative synergies among criteria. We propose learning the Möbius masses with iteratively reweighted least squares for sparse recovery, combined with dualization to improve scalability. The approach is applied to learn sparse representations of the multilinear utility model and of conjunctive/disjunctive forms of the discrete Choquet integral from preference examples, in aggregation problems that may involve more than 20 criteria.
Learning Preference Models with Sparse Interactions of Criteria. Margot Herin, P. Perny, Nataliya Sokolovska. International Joint Conference on Artificial Intelligence (IJCAI), 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/421
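To make the role of the Möbius masses concrete, the two models named in the abstract can be evaluated directly from a sparse Möbius representation. The Choquet integral is C(x) = Σ_A m(A) · min_{i∈A} x_i, and the multilinear utility is U(x) = Σ_A m(A) · Π_{i∈A} x_i, with x in [0, 1]^n. The sketch below shows only this evaluation step, not the authors' IRLS learning procedure. The mass values are made up for illustration: they sum to 1 and are all nonnegative, which is sufficient (though not necessary) for a monotone model, and the positive mass on the pair (0, 1) encodes a synergy between criteria 0 and 1.

```python
import math

def choquet_from_mobius(x, mobius):
    # Choquet integral in Möbius form: sum over subsets A of m(A) * min_{i in A} x_i.
    return sum(m * min(x[i] for i in A) for A, m in mobius.items())

def multilinear_from_mobius(x, mobius):
    # Multilinear utility in Möbius form: sum over subsets A of m(A) * prod_{i in A} x_i.
    return sum(m * math.prod(x[i] for i in A) for A, m in mobius.items())

# Sparse representation: only four subsets carry nonzero mass (illustrative values).
mobius = {(0,): 0.3, (1,): 0.3, (2,): 0.1, (0, 1): 0.3}
x = [0.8, 0.4, 1.0]
print(choquet_from_mobius(x, mobius))      # 0.24 + 0.12 + 0.1 + 0.3 * 0.4 = 0.58
print(multilinear_from_mobius(x, mobius))  # 0.24 + 0.12 + 0.1 + 0.3 * 0.32 = 0.556
```

Sparsity matters because n criteria admit 2^n − 1 nonempty subsets; storing only the nonzero masses keeps evaluation cheap even with more than 20 criteria.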
Numerous machine learning models can be formulated as stochastic minimax optimization problems, such as imbalanced data classification with AUC maximization. Developing efficient algorithms for such problems is therefore important. However, most existing algorithms focus on the single-machine setting, so they cannot cope with the large communication overhead of a distributed training system. Moreover, most existing communication-efficient optimization algorithms focus only on traditional minimization problems and cannot handle minimax optimization problems. To address these challenges, we develop two novel communication-efficient stochastic gradient descent ascent with momentum algorithms for the distributed minimax optimization problem, which significantly reduce the communication cost via a two-way compression scheme. However, the compressed momentum makes the convergence rate of our algorithms considerably harder to analyze, especially given the interaction between the minimization and maximization subproblems. In this paper, we address these challenges and establish the convergence rate of our algorithms for nonconvex-strongly-concave problems. To the best of our knowledge, our algorithms are the first communication-efficient algorithms with theoretical guarantees for the minimax optimization problem. Finally, we apply our algorithms to the distributed AUC maximization problem for imbalanced data classification. Extensive experimental results confirm their efficacy in saving communication costs.
Communication-Efficient Stochastic Gradient Descent Ascent with Momentum Algorithms. Yihan Zhang, M. Qiu, Hongchang Gao. International Joint Conference on Artificial Intelligence (IJCAI), 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/512
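The ingredients named in the communication-efficient SGDA abstract can be combined in a small toy. Each worker compresses its stochastic gradients before sending them, the server averages the messages, and momentum drives a descent step on x and an ascent step on y. This sketch rests on several assumptions that may differ from the paper's algorithms. It uses top-k sparsification with error feedback as a swapped-in compressor, compresses only the worker-to-server direction (the paper's scheme is two-way), and keeps the momentum on the server. The toy objective f_i(x, y) = ½‖x − c_i‖² + ½ x·y − ½‖y‖² is strongly concave in y, and because the centers c_i have mean zero, the global saddle point is (0, 0).

```python
import numpy as np

def top_k(v, k):
    # Keep the k largest-magnitude entries and zero the rest (a biased compressor).
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def compressed_sgda_momentum(centers, dim, k, lr=0.05, beta=0.9, steps=2000,
                             noise=0.01, seed=0):
    rng = np.random.default_rng(seed)
    n = len(centers)
    x, y = np.ones(dim), np.ones(dim)
    mx, my = np.zeros(dim), np.zeros(dim)             # server-side momentum buffers
    ex, ey = np.zeros((n, dim)), np.zeros((n, dim))   # per-worker error-feedback memory
    for _ in range(steps):
        gx_sum, gy_sum = np.zeros(dim), np.zeros(dim)
        for i, c in enumerate(centers):
            gx = x - c + 0.5 * y + noise * rng.standard_normal(dim)   # noisy grad in x
            gy = 0.5 * x - y + noise * rng.standard_normal(dim)       # noisy grad in y
            # Worker -> server: send only k of dim entries; keep the rest for later.
            cx, cy = top_k(gx + ex[i], k), top_k(gy + ey[i], k)
            ex[i] += gx - cx
            ey[i] += gy - cy
            gx_sum += cx
            gy_sum += cy
        mx = beta * mx + (1 - beta) * gx_sum / n
        my = beta * my + (1 - beta) * gy_sum / n
        x = x - lr * mx    # descent on the minimization variable
        y = y + lr * my    # ascent on the maximization variable
    return x, y

rng = np.random.default_rng(42)
dim, n_workers = 20, 4
c = 0.1 * rng.standard_normal((n_workers, dim))
centers = c - c.mean(axis=0)          # mean-zero, so the saddle point is (0, 0)
x, y = compressed_sgda_momentum(centers, dim, k=5)
print(np.linalg.norm(np.concatenate([x, y])))   # distance to the saddle point
```

Each worker transmits 5 of 20 coordinates per variable, a quarter of the uncompressed traffic. The error-feedback memory keeps the dropped coordinates from being lost, so over time the iterates still approach the saddle point.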