Spatial Attention-Based Capsule Networks With Guaranteed Group Equivariance

IEEE Transactions on Automation Science and Engineering | Impact Factor: 6.4 | Zone 2, Computer Science | JCR Q1, Automation & Control Systems | Pub Date: 2024-08-07 | DOI: 10.1109/TASE.2024.3438190

Ru Zeng, Yan Song, Yuzhang Qin

Published in IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 6076-6087. Publication type: Journal Article. Full text: https://ieeexplore.ieee.org/document/10630653/

Citations: 0

Abstract

Several recently reported capsule networks (CapsNets) try to make capsule poses equivariant and capsule descriptors invariant by adding extra loss functions as regularization, but they offer no rigorous proof that these properties hold. To address this problem, a group equivariant spatial attention mechanism (GSA) is proposed that guarantees equivariance with a mathematical proof while enhancing the spatial information in capsule poses. In addition, to reduce the computational burden of the conventional routing algorithm, group pooling operations are developed to generate the descriptors and poses of capsules; these operations contribute greatly to preserving the invariance and equivariance of CapsNets. Combining GSA with group pooling, this paper constructs a new attentive CapsNet, namely spatial attentive group equivariant CapsNets (SAGE-CapsNets). To validate the invariance and equivariance of SAGE-CapsNets, we conduct experiments involving classification, semantic segmentation, and visualization. The results provide empirical evidence of the effectiveness of the proposed approach.

Note to Practitioners: This paper is motivated by the problem that affine transformations encountered in the real world generally degrade the performance of neural networks in vision tasks such as image classification, segmentation, and detection. Conventional capsule networks help to alleviate this problem by learning invariant spatial relationships between features, but their robustness to affine transformations has been shown only through empirical results, without rigorous proof. To tackle this issue, we propose a novel capsule network with equivariant components, including group spatial attention and group pooling layers. These components are rigorously proven to be equivariant and contribute greatly to the model's robustness against affine transformations. Moreover, for practical applications, the proposed attention mechanism improves model performance without significantly increasing computation, and group pooling preserves model equivariance while reducing computational overhead. As a result, the computation-saving model can be applied to real-world vision applications that require robustness to affine transformations, such as bearing fault diagnosis and facial recognition.
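The abstract relies on two properties: a map f is equivariant to a group element g when f(g·x) = g·f(x), and invariant when f(g·x) = f(x). The following is a minimal sketch of how pooling over a group can yield both, assuming the simplest possible setting: the group C4 of 90-degree rotations, a single integer-valued filter, and valid cross-correlation. It is not the SAGE-CapsNet architecture or the authors' GSA; the names `pose_map` and `descriptor` are hypothetical labels chosen only to echo the abstract's terminology.

```python
# Toy illustration (assumption: C4 rotations only, one filter, pure Python).
# NOT the paper's method; it only shows why pooling over a group preserves
# equivariance of spatial maps and invariance of global descriptors.

def rot90(grid):
    """Rotate a square 2D list by 90 degrees counter-clockwise."""
    n = len(grid)
    return [[grid[c][n - 1 - r] for c in range(n)] for r in range(n)]

def correlate_valid(image, kernel):
    """Valid (no padding) cross-correlation of a square image with a square kernel."""
    n, k = len(image), len(kernel)
    m = n - k + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(k) for b in range(k))
            for j in range(m)
        ]
        for i in range(m)
    ]

def lift_to_c4(image, kernel):
    """Correlate the image with all four rotated copies of the kernel."""
    responses = []
    for _ in range(4):
        responses.append(correlate_valid(image, kernel))
        kernel = rot90(kernel)
    return responses

def group_max_pool(responses):
    """Take the element-wise maximum over the group axis (the 4 rotations)."""
    m = len(responses[0])
    return [[max(r[i][j] for r in responses) for j in range(m)] for i in range(m)]

def pose_map(image, kernel):
    """Hypothetical 'pose-like' output: a spatial map that rotates with the input."""
    return group_max_pool(lift_to_c4(image, kernel))

def descriptor(image, kernel):
    """Hypothetical 'descriptor-like' output: a scalar unchanged by rotation."""
    return max(max(row) for row in pose_map(image, kernel))

image = [
    [1, 0, 2, 3],
    [4, 1, 0, 2],
    [0, 5, 1, 1],
    [2, 2, 3, 0],
]
kernel = [
    [1, -1],
    [2, 0],
]

rotated = rot90(image)
# Equivariance: rotating the input rotates the pooled spatial map.
equivariant = pose_map(rotated, kernel) == rot90(pose_map(image, kernel))
# Invariance: the globally pooled descriptor does not change.
invariant = descriptor(rotated, kernel) == descriptor(image, kernel)
print("equivariant:", equivariant, "invariant:", invariant)
```

Rotating the input only permutes the four filter responses and rotates each one, so the maximum over the group axis rotates with the input, and a further global maximum discards the rotation entirely. Integer values are used so that both checks are exact equalities rather than floating-point approximations.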




Source journal

IEEE Transactions on Automation Science and Engineering (Engineering & Technology: Automation & Control Systems)

CiteScore: 12.50

Self-citation rate: 14.30%

Articles published: 404

Review time: 3.0 months

Journal description: The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.