The generalized arc consistency (GAC) algorithm is the prevailing approach for alldifferent constraints. The core of GAC for alldifferent constraints is identifying and enumerating all the strongly connected components (SCCs) of the graph model. This requires many complex data structures to maintain node information, which incurs substantial overhead in both time and memory. More critically, the complexity of these data structures prevents different GAC optimization schemes from being combined. To address this problem, the key observation of this paper is that the GAC algorithm only needs to know whether a node of the graph model is in an SCC, not which SCC it belongs to. Based on this observation, we propose AllDiffbit, which uses bitwise data structures and operations to efficiently determine whether a node is in an SCC. This greatly reduces the corresponding overhead and makes it easier for existing optimizations to work together synergistically. Our experiments show that AllDiffbit outperforms state-of-the-art GAC algorithms by more than 60%.
Title: A Bitwise GAC Algorithm for Alldifferent Constraints. Authors: Z. Li, Yao-Ming Wang, Zhanshan Li. Venue: International Joint Conference on Artificial Intelligence, 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/221
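To illustrate the observation that GAC only needs SCC membership rather than SCC identities, here is a minimal Python sketch, assuming a directed graph whose node sets are stored as integer bitmasks. It computes bitset reachability by fixed-point iteration and derives (a) which nodes lie on a cycle and (b) whether two nodes share an SCC. In Régin-style filtering, an edge outside the current matching whose endpoints share an SCC lies on an even alternating cycle and is therefore supported. This is a generic transitive-closure sketch, not the AllDiffbit algorithm; the function names are hypothetical, and this naive closure is not linear-time like Tarjan's SCC algorithm.

```python
from typing import List


def reachability(succ: List[int]) -> List[int]:
    """Return reach[v], a bitmask of every node reachable from v by a path
    of length >= 1. succ[v] is a bitmask of v's direct successors."""
    reach = list(succ)
    changed = True
    while changed:  # iterate to a fixed point (transitive closure)
        changed = False
        for v in range(len(reach)):
            new = reach[v]
            pending = reach[v]
            while pending:
                low = pending & -pending      # isolate the lowest set bit
                u = low.bit_length() - 1      # index of that node
                new |= reach[u]               # union in u's reach set
                pending ^= low
            if new != reach[v]:
                reach[v] = new
                changed = True
    return reach


def nodes_in_nontrivial_scc(succ: List[int]) -> int:
    """Bitmask of nodes lying on some directed cycle, i.e. in an SCC with at
    least one edge. No SCC identifiers are ever computed."""
    mask = 0
    for v, r in enumerate(reachability(succ)):
        if (r >> v) & 1:                      # v can reach itself
            mask |= 1 << v
    return mask


def same_scc(reach: List[int], u: int, v: int) -> bool:
    """u and v share an SCC iff each one reaches the other."""
    return u == v or bool((reach[u] >> v) & 1 and (reach[v] >> u) & 1)


# Edges: 0->1, 1->2, 2->0 (a cycle), then 2->3, 3->4 (an acyclic tail).
succ = [1 << 1, 1 << 2, (1 << 0) | (1 << 3), 1 << 4, 0]
print(bin(nodes_in_nontrivial_scc(succ)))  # 0b111
```

Each union over a bitmask processes up to a machine word of nodes per operation, which is the kind of bit-level parallelism a bitwise design can exploit.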
Xin Hu, Lingling Zhang, Jun Liu, Xinyu Zhang, Wenjun Wu, Qianying Wang
Diagram visual grounding aims to capture the correlation between language expressions and local objects in a diagram, and it plays an important role in applications such as textbook question answering and cross-modal retrieval. Most diagrams consist of a few colors and simple geometric shapes. This results in sparse low-level visual features, which further widens the gap between the low-level visual and high-level semantic features of diagrams. This phenomenon poses challenges for diagram visual grounding. To address these issues, we propose a gestalt-perceptual attention model to align diagram objects and language expressions. For low-level visual features, inspired by Gestalt theory, which models the human visual system, we build a gestalt-perception graph network to complement the features learned by a traditional backbone network. For high-level semantic features, we design a multi-modal context attention mechanism that facilitates interaction between diagrams and language expressions, so as to enhance the semantics of diagrams. Finally, guided by diagram features and linguistic embeddings, the target query is gradually decoded to generate the coordinates of the referred object. Comprehensive experiments on diagrams and natural images demonstrate that the proposed model outperforms competing methods. Our code will be released at https://github.com/AIProCode/GPA.
Title: Diagram Visual Grounding: Learning to See with Gestalt-Perceptual Attention. Venue: International Joint Conference on Artificial Intelligence, 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/93
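The abstract mentions a multi-modal context attention mechanism that lets diagrams and language expressions interact. As a hedged illustration of the generic building block such mechanisms usually rely on, here is standard scaled dot-product cross-attention in plain Python, in which language queries attend over diagram object features. The abstract does not specify the paper's actual mechanism; the function names and vector shapes below are assumptions.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)                            # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query (e.g. a word embedding)
    attends over keys/values (e.g. diagram object features)."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
    return outputs


# A neutral query weights both objects equally, so it averages their values.
print(cross_attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]],
                      [[2.0, 0.0], [0.0, 4.0]]))  # [[1.0, 2.0]]
```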
The conservation and restoration of biodiversity, in a way compatible with human well-being, is a necessary condition for achieving several Sustainable Development Goals. However, there is still an important gap between biodiversity research and the management of natural areas. This research project aims to reduce this gap by proposing spatial planning methods that robustly and accurately integrate socio-ecological issues. Artificial intelligence, and notably Constraint Programming, will play a central role in removing the methodological obstacles that prevent us from properly addressing the complexity and heterogeneity of sustainability issues in ecosystem management. The project is organized along three axes: (i) integrating socio-ecological dynamics into spatial planning, (ii) relying on adequate landscape metrics in spatial planning, and (iii) scaling up the performance of spatial planning methods. The main study context of this project is the sustainable management of tropical forests, with a particular focus on New Caledonia and West Africa.
Title: AI and Decision Support for Sustainable Socio-Ecosystems. Authors: Dimitri Justeau‐Allaire. Venue: International Joint Conference on Artificial Intelligence, 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/707
Computer-Aided Design (CAD) plays a crucial role in industrial manufacturing by providing the geometry and the construction workflow of manufactured objects. The construction information enables effective re-editing of parametric CAD models. While boundary representation (B-Rep) is the standard format for representing geometric structure, there is no uniform standard for storing the construction workflow, so JSON is used as an alternative. Unfortunately, most CAD models available on the Internet provide only geometry information, omitting the construction procedure and hampering creation efficiency. This paper proposes CADParser, a learning approach that infers the underlying modeling sequence of a given B-Rep CAD model. It does so by treating the CAD geometry structure as a graph and the construction workflow as a sequence. Because the existing CAD dataset contains only two operations (Sketch and Extrusion), which limits the diversity of CAD model creation, we also introduce a large-scale dataset covering a more comprehensive range of operations, such as Revolution, Fillet, and Chamfer. Each model includes both its geometry structure and its construction sequence. Extensive experiments demonstrate that our method is competitive with existing state-of-the-art methods, both quantitatively and qualitatively. Data is available at https://drive.google.com/CADParserData.
Title: CADParser: A Learning Approach of Sequence Modeling for B-Rep CAD. Authors: Shengdi Zhou, Tianyi Tang, Bin Zhou. Venue: International Joint Conference on Artificial Intelligence, 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/200
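The abstract frames CADParser's input as a graph (the B-Rep geometry) and its output as a sequence (the construction workflow). Here is a minimal Python sketch of those two views, assuming a hypothetical schema in which each face lists the ids of its bounding edges and faces that share an edge are adjacent. The `BRepModel` class, the token ids, and the adjacency rule are illustrative assumptions, not the paper's data format or the dataset's JSON schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import combinations
from typing import Dict, List, Set, Tuple


class Op(Enum):
    """Operation vocabulary named in the abstract (token ids are arbitrary)."""
    SKETCH = 0
    EXTRUSION = 1
    REVOLUTION = 2
    FILLET = 3
    CHAMFER = 4


@dataclass
class BRepModel:
    # Hypothetical schema: face id -> ids of the B-Rep edges bounding that face.
    face_edges: Dict[int, Set[str]]
    # Construction workflow as an ordered list of operations.
    sequence: List[Op] = field(default_factory=list)

    def face_adjacency(self) -> Set[Tuple[int, int]]:
        """Graph view: two faces are adjacent if they share a B-Rep edge."""
        return {(a, b) for a, b in combinations(sorted(self.face_edges), 2)
                if self.face_edges[a] & self.face_edges[b]}

    def sequence_tokens(self) -> List[int]:
        """Sequence view: operations as integer tokens for a sequence decoder."""
        return [op.value for op in self.sequence]


model = BRepModel({0: {"e1", "e2"}, 1: {"e2", "e3"}, 2: {"e3", "e4"}},
                  [Op.SKETCH, Op.EXTRUSION, Op.FILLET])
print(sorted(model.face_adjacency()))  # [(0, 1), (1, 2)]
print(model.sequence_tokens())         # [0, 1, 3]
```

A graph encoder would consume the adjacency view, while a sequence decoder would emit tokens like those returned by `sequence_tokens`.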
Yanbei Liu, Yu Zhao, Xiao Wang, Lei Geng, Zhitao Xiao
Graph-level contrastive learning, which learns a representation for each graph by contrasting two augmented graphs, has attracted considerable attention. Previous studies usually assume that a graph and its augmented graph form a positive pair and that all other pairs are negative. However, it is well known that graph structure is complex and multi-scale, which raises a fundamental question: after graph augmentation, does this assumption still hold in practice? Through an experimental analysis, we find that the semantic information of an augmented graph may not be consistent with that of the original graph, and that whether two augmented graphs form a positive or a negative pair is closely related to the multi-scale structures. Based on this finding, we propose a multi-scale subgraph contrastive learning architecture that can characterize fine-grained semantic information. Specifically, we generate global and local views at different scales based on subgraph sampling, and construct multiple contrastive relationships according to their semantic associations to provide richer self-supervised signals. Extensive experiments and parameter analyses on eight real-world graph classification datasets demonstrate the effectiveness of the proposed method.
Title: Multi-Scale Subgraph Contrastive Learning. Venue: International Joint Conference on Artificial Intelligence, 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/246
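The global and local views at different scales, generated by subgraph sampling, can be illustrated with a minimal Python sketch. It assumes breadth-first sampling around a single random seed node; the abstract does not give the paper's actual sampler, its scales, or how contrastive pairs are formed, and the names below are hypothetical.

```python
import random
from collections import deque
from typing import Dict, List, Set

Graph = Dict[int, List[int]]  # adjacency list of an undirected graph


def bfs_subgraph(adj: Graph, seed: int, size: int) -> Set[int]:
    """Collect up to `size` nodes in breadth-first order from `seed`."""
    nodes = {seed}
    queue = deque([seed])
    while queue and len(nodes) < size:
        v = queue.popleft()
        for u in sorted(adj[v]):               # sorted for determinism
            if len(nodes) >= size:
                break
            if u not in nodes:
                nodes.add(u)
                queue.append(u)
    return nodes


def multi_scale_views(adj: Graph, scales: List[int],
                      rng: random.Random) -> List[Set[int]]:
    """Sample one seed and return a view per scale (large = global,
    small = local). Smaller views are BFS prefixes of larger ones."""
    seed = rng.choice(sorted(adj))
    return [bfs_subgraph(adj, seed, s) for s in scales]


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(bfs_subgraph(path, 2, 3)))  # [1, 2, 3]
global_view, local_view = multi_scale_views(path, [4, 2], random.Random(0))
```

Because both views share a seed, the local view is nested inside the global view, which is one simple way to obtain views whose semantics are related rather than assumed identical.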
Multilingual intent detection, and the exploration of its different characteristics, have been a major field of study in recent years. However, detecting intent dynamics from text or voice, especially in Indian multilingual contexts, is a challenging task. My first research question therefore concerns intent detection, and I then apply it to Indian multilingual healthcare scenarios. Spoken dialogue systems are designed around a predefined set of intents to perform user-specified tasks. New intents may emerge over time and call for retraining. However, these new intents may not be explicitly announced and must be inferred dynamically. This gives rise to two crucial tasks: (a) recognizing newly emergent intents, and (b) annotating data for the new intents in order to retrain the underlying classifier effectively. These tasks become especially challenging when a large number of new intents emerge simultaneously and the manual annotation budget is limited. We develop MNID (Multiple Novel Intent Detection), a cluster-based framework that can identify multiple novel intents while optimizing human annotation cost. Empirical findings on numerous benchmark datasets of varying sizes show that MNID surpasses baseline approaches in accuracy and F1-score by allocating the annotation budget wisely. We apply the intent detection approach to different domains in Indian multilingual scenarios, such as healthcare and finance. The creation of advanced NLU healthcare systems is hindered by a lack of data and by technology constraints for resource-poor languages in developing nations like India. We evaluate the current state of several cutting-edge language models used in healthcare for detecting query intents and the corresponding entities. We conduct comprehensive trials of a number of models in different realistic contexts, and we investigate their practical relevance depending on the budget and the availability of English data.
Title: Exploring Multilingual Intent Dynamics and Applications. Authors: Ankan Mullick. Venue: International Joint Conference on Artificial Intelligence, 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/818
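The abstract credits MNID's gains to allocating a limited annotation budget wisely across clusters. As a hedged illustration only, here is a simple proportional allocation with largest-remainder rounding in Python. The abstract does not describe MNID's actual allocation strategy, and `allocate_budget` is a hypothetical helper, not part of MNID.

```python
import math
from typing import List


def allocate_budget(cluster_sizes: List[int], budget: int) -> List[int]:
    """Split an annotation budget across clusters in proportion to their size,
    using largest-remainder rounding so the shares sum exactly to `budget`."""
    total = sum(cluster_sizes)
    if not 0 <= budget <= total:
        raise ValueError("budget must be between 0 and the number of utterances")
    if total == 0:
        return [0] * len(cluster_sizes)
    quotas = [budget * size / total for size in cluster_sizes]
    shares = [math.floor(q) for q in quotas]
    leftover = budget - sum(shares)
    # Give the remaining labels to the clusters with the largest fractional parts.
    order = sorted(range(len(quotas)), key=lambda i: quotas[i] - shares[i],
                   reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


print(allocate_budget([50, 30, 20], 10))  # [5, 3, 2]
```

A budget-aware method could replace cluster size with another priority signal, such as an estimate of how likely a cluster is to contain a novel intent.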
The growing literature on confidentiality in knowledge representation and reasoning may sometimes cause a false sense of security, due to a lack of details about the attacker and some misconceptions about security-related concepts. This paper analyzes the vulnerabilities of some recent knowledge protection methods to raise awareness of their actual effectiveness and their mutual differences.
Title: A False Sense of Security (Extended Abstract). Authors: P. Bonatti. Venue: International Joint Conference on Artificial Intelligence, 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/770
Xianyong Fang, Yu Shi, Qingqing Guo, Linbo Wang, Zhengyi Liu
This article proposes a novel spectral-domain solution to the challenging problem of polyp segmentation. The main contribution builds on the finding that the middle-frequency sub-band is significantly present during CNN processing. Consequently, a Sub-Band based Attention (SBA) module is proposed, which uniformly adopts either the high or the middle sub-bands of the encoder features to boost the decoder features and thus improve feature discrimination. A strong encoder that supplies informative sub-bands is also very important, and we place high value on CNN features enriched with both local and global information. Therefore, a Transformer Attended Convolution (TAC) module is introduced as the main encoder block. It uses Transformer features to boost the CNN features with stronger long-range object context. The combination of SBA and TAC yields a novel polyp segmentation framework, SBA-Net. It adopts TAC to obtain encoded features, which are also fed into SBA so that efficient sub-band-based attention maps can be generated for progressively decoding the bottleneck features. As the experimental results demonstrate, SBA-Net achieves robust polyp segmentation.
Title: Sub-Band Based Attention for Robust Polyp Segmentation. Venue: International Joint Conference on Artificial Intelligence, 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/82
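The abstract distinguishes high- and middle-frequency sub-bands of encoder features. To make that notion concrete, here is a minimal Python sketch that splits a 1-D signal into low, middle, and high sub-bands using a discrete Fourier transform and band masks. It is a simplification built on assumptions: real feature maps are 2-D and multi-channel, the cut-offs are arbitrary, and the abstract does not state which transform or band boundaries SBA uses. The naive O(n²) DFT keeps the example dependency-free.

```python
import cmath
from typing import List, Tuple


def dft(x: List[float]) -> List[complex]:
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: List[complex]) -> List[float]:
    n = len(spectrum)
    # Taking the real part is safe because every band mask below is symmetric
    # in k, so a masked spectrum of a real signal stays conjugate-symmetric.
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]


def split_bands(x: List[float], low_cut: int, high_cut: int
                ) -> Tuple[List[float], List[float], List[float]]:
    """Split a 1-D signal into low, middle and high frequency sub-bands.
    Bin k has frequency min(k, n - k); bands are [0, low_cut),
    [low_cut, high_cut) and [high_cut, inf)."""
    n = len(x)
    spectrum = dft(x)

    def band(lo: float, hi: float) -> List[float]:
        return idft([c if lo <= min(k, n - k) < hi else 0j
                     for k, c in enumerate(spectrum)])

    return band(0, low_cut), band(low_cut, high_cut), band(high_cut, float("inf"))


low, mid, high = split_bands([1.0, 3.0, 2.0, 5.0, 4.0, 0.0, 2.0, 1.0], 1, 3)
```

Because the three masks partition the frequency bins, the sub-bands sum back to the original signal; an attention module could then emphasize, for example, the middle band.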
Jianxiong Tang, Jianhuang Lai, Xiaohua Xie, Lingxiao Yang
Spiking Neural Networks (SNNs) are promising models for neuromorphic vision recognition. Mean square error (MSE) and cross-entropy (CE) losses are widely used to supervise the training of SNNs on neuromorphic datasets. However, existing loss functions do not model the relationship between output spike counts and predictions well. This paper proposes a Spike Count Maximization (SCM) training approach for SNN-based neuromorphic vision recognition models that optimizes the output spike counts. SCM combines structural risk minimization (SRM) with a specially designed spike counting loss. The spike counting loss counts the output spikes of the SNN using the L0-norm, and SRM maximizes the distance between the margin boundaries of the classifier to ensure that the model generalizes. Because the SCM objective is non-smooth and non-differentiable, we design a fast-converging two-stage algorithm to solve it. Experimental results show that SCM performs satisfactorily in most cases. When the output spikes are used for prediction, SCM achieves accuracies 2.12% to 16.50% higher than popular training losses on the CIFAR10-DVS dataset. The code is available at https://github.com/TJXTT/SCM-SNN.
Title: Spike Count Maximization for Neuromorphic Vision Recognition. Venue: International Joint Conference on Artificial Intelligence, 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/473
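The abstract describes counting output spikes with the L0-norm and predicting from those counts. Here is a minimal Python sketch of that counting and prediction step, assuming binary 0/1 spike trains per output neuron. The `margin_violation` helper is a standard multiclass hinge on spike counts, included only to illustrate a margin-based objective; it is not the paper's SRM formulation, and like the other names here it is hypothetical.

```python
from typing import List

SpikeTrain = List[int]  # binary spikes (0/1) over T time steps


def spike_count(train: SpikeTrain) -> int:
    """L0-norm of a spike train: the number of non-zero time steps."""
    return sum(1 for s in train if s != 0)


def predict(output_trains: List[SpikeTrain]) -> int:
    """Predict the class whose output neuron fires most often
    (ties go to the lowest class index)."""
    counts = [spike_count(t) for t in output_trains]
    return max(range(len(counts)), key=counts.__getitem__)


def margin_violation(counts: List[int], label: int, margin: int) -> int:
    """Multiclass hinge on spike counts (illustrative stand-in, not SCM's loss):
    zero once the true class out-fires every rival by at least `margin`."""
    rival = max(c for i, c in enumerate(counts) if i != label)
    return max(0, margin - (counts[label] - rival))


trains = [[0, 1, 0, 1], [1, 1, 1, 0], [0, 0, 0, 0]]
print(predict(trains))  # 1
```

`spike_count` is piecewise constant in the underlying membrane dynamics, which is consistent with the abstract's point that an objective built on it is non-smooth and non-differentiable.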
Label distribution is an effective label form for portraying label polysemy (i.e., cases in which an instance can be described by multiple labels simultaneously). However, the high cost of annotating label distributions limits their application to a wider range of practical tasks. Label enhancement (LE) techniques have therefore been extensively studied to solve this problem. Existing LE algorithms mostly estimate label distributions from instance relations or label relations. However, they suffer from biased instance relations, limited model capabilities, or suboptimal local label correlations. In this paper, we propose a deep generative model called JRC that simultaneously learns and clusters the joint implicit representations of both features and labels, which can be used to improve any existing LE algorithm that involves instance relations or local label correlations. In addition, we develop a novel label distribution recovery module and integrate it with the JRC model, yielding a novel generative label enhancement model that uses the learned joint implicit representations and instance clusters in a principled way. Finally, extensive experiments validate our proposal.
Title: Label Enhancement via Joint Implicit Representation Clustering. Authors: Yunan Lu, Weiwei Li, Xiuyi Jia. Venue: International Joint Conference on Artificial Intelligence, 2023-08-01. DOI: https://doi.org/10.24963/ijcai.2023/447
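The abstract notes that most existing LE algorithms estimate label distributions from instance relations. To illustrate that baseline idea (not JRC), here is a minimal Python sketch of k-nearest-neighbour label enhancement: each instance's distribution is the normalized sum of its own logical labels and those of its neighbours. Euclidean distance, uniform neighbour weights, and the helper name are assumptions. The sketch also exhibits the weakness the abstract mentions, because a poorly chosen neighbourhood biases the estimate.

```python
import math
from typing import List


def knn_label_enhancement(features: List[List[float]],
                          logical_labels: List[List[int]],
                          k: int) -> List[List[float]]:
    """Estimate a label distribution for each instance by summing the
    logical (0/1) label vectors of the instance and its k nearest neighbours,
    then normalizing so every distribution sums to 1."""
    n, m = len(features), len(logical_labels[0])
    distributions = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(features[i], features[j]))
        neighbourhood = [i] + others[:k]
        summed = [sum(logical_labels[j][c] for j in neighbourhood)
                  for c in range(m)]
        total = sum(summed)
        # Fall back to a uniform distribution if no neighbour carries any label.
        distributions.append([s / total if total else 1 / m for s in summed])
    return distributions


dists = knn_label_enhancement([[0.0], [0.1], [5.0]],
                              [[1, 0], [1, 1], [0, 1]], k=1)
```

Here instance 0 borrows label 2 weight from its close neighbour, so its distribution becomes two thirds label 1 and one third label 2.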