Spatial Attention-Based Capsule Networks With Guaranteed Group Equivariance
Authors: Ru Zeng; Yan Song; Yuzhang Qin
Journal: IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 6076-6087
Publication date: 2024-08-07
DOI: 10.1109/TASE.2024.3438190
URL: https://ieeexplore.ieee.org/document/10630653/
Impact factor: 6.4 (JCR Q1, Automation & Control Systems)
Citations: 0
Abstract
Several recently reported capsule networks (CapsNets) try to make capsule poses equivariant and capsule descriptors invariant by adding extra loss functions as regularization, but they offer no rigorous proof that these properties hold. To address this problem, a group equivariant spatial attention mechanism (GSA) is proposed that guarantees equivariance with a mathematical proof while enhancing the spatial information in capsule poses. In addition, to alleviate the computational burden of the conventional routing algorithm, group pooling operations are developed to generate the descriptors and poses of capsules, which contributes greatly to preserving the invariance and equivariance of CapsNets. Combining GSA and group pooling, a new attentive CapsNet, the spatial attentive group equivariant CapsNet (SAGE-CapsNet), is constructed in this paper. To validate the invariance and equivariance of SAGE-CapsNets, we conduct experiments involving classification, semantic segmentation, and visualization. The results provide empirical evidence of the effectiveness of the proposed approach.

Note to Practitioners: This paper is motivated by the problem that affine transformations occurring in the real world generally degrade the performance of neural networks in vision tasks such as image classification, segmentation, and detection. Conventional capsule networks help to alleviate this problem by learning invariant spatial relationships between features, but their robustness to affine transformations has been shown only empirically, without rigorous proof. To tackle this issue, we propose a novel capsule network with equivariant components, namely group spatial attention and group pooling layers. These components are rigorously proven to be equivariant and contribute greatly to the model's robustness against affine transformations. For practical applications, the proposed attention mechanism improves model performance without significantly increasing computation, and group pooling preserves model equivariance while reducing computational overhead. As a result, the model can be applied to real-world vision applications that require robustness to affine transformations at low computational cost, such as bearing fault diagnosis and facial recognition.
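The difference between an equivariant quantity and an invariant one, and the role pooling over a group plays in moving from the first to the second, can be illustrated with a toy example. The sketch below is not the SAGE-CapsNets implementation. It assumes the cyclic group C4 of 90-degree rotations, a single kernel the same size as the image, and hypothetical helper names `c4_lift` and `group_max_pool` chosen only for illustration. Rotating the input cyclically shifts the four orientation responses (equivariance), and taking the maximum over the orientation axis removes that shift (invariance).

```python
import numpy as np

def c4_lift(image, kernel):
    """Correlate `image` with the four 90-degree rotations of `kernel`.

    Returns one scalar response per element of C4 (hypothetical helper).
    """
    return np.array([np.sum(image * np.rot90(kernel, k)) for k in range(4)])

def group_max_pool(responses):
    """Max over the group (orientation) axis (hypothetical helper)."""
    return responses.max()

rng = np.random.default_rng(0)
image = rng.normal(size=(5, 5))
kernel = rng.normal(size=(5, 5))

base = c4_lift(image, kernel)               # responses for the original image
rotated = c4_lift(np.rot90(image), kernel)  # responses for the rotated image

# Equivariance: rotating the input permutes the orientation responses.
print(np.allclose(rotated, np.roll(base, 1)))

# Invariance: pooling over the group axis discards the permutation.
print(np.isclose(group_max_pool(rotated), group_max_pool(base)))
```

In this toy setting the vector of orientation responses plays the part of a pose-like, equivariant feature, and the pooled scalar plays the part of an invariant descriptor. The paper's group pooling layers and GSA operate on full feature maps and are not reproduced here.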
About the journal:
The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.