Learning geometric invariants through neural networks

Arpit Rai
The Visual Computer · Published 2024-07-22 · DOI: 10.1007/s00371-024-03398-z

Abstract

Convolutional neural networks have become a fundamental model for solving various computer vision tasks. However, convolution is invariant only to translations of objects, and performance suffers under rotation and other affine transformations. This work proposes a novel neural network that leverages geometric invariants, including curvature, higher-order differentials of curves extracted from object boundaries at multiple scales, and the relative orientations of edges. These features are invariant to affine transformations and can improve the robustness of shape recognition in neural networks. In our experiments on the smallNORB dataset, a 2-layer network operating over these geometric invariants outperforms a 3-layer convolutional network by 9.69% while being more robust to affine transformations, even when trained without any data augmentation. Notably, our network exhibits a mere 6% degradation in test accuracy when test images are rotated by 40°, in contrast to the significant drops of 51.7% and 69% observed in VGG networks and convolutional networks, respectively, under the same transformation. Additionally, our models show greater robustness than invariant feature descriptors such as the SIFT-based bag-of-words classifier and its rotation-invariant extension, the RIFT descriptor, which suffer drops of 35% and 14.1%, respectively, under similar image transformations. Our experimental results further show improved robustness against scale and shear transformations. Furthermore, the multi-scale extension of our geometric invariant network, which extracts curve differentials of higher orders, shows enhanced robustness to scaling and shearing transformations.
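The curvature feature central to the abstract can be illustrated with a minimal sketch (this is an illustration of the general invariant, not the paper's implementation): discrete curvature computed along a sampled closed boundary, κ = (x′y″ − y′x″) / (x′² + y′²)^(3/2), is unchanged when the shape is rotated or translated.

```python
import numpy as np

def curvature(points):
    """Discrete signed curvature of a closed 2D curve.

    points: (N, 2) array of boundary samples, ordered around the contour.
    Periodic central differences treat the curve as closed.
    """
    x, y = points[:, 0], points[:, 1]
    dx = (np.roll(x, -1) - np.roll(x, 1)) / 2.0    # first derivatives
    dy = (np.roll(y, -1) - np.roll(y, 1)) / 2.0
    ddx = np.roll(x, -1) - 2.0 * x + np.roll(x, 1)  # second derivatives
    ddy = np.roll(y, -1) - 2.0 * y + np.roll(y, 1)
    # kappa = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2)
    return (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5

# Rotating the boundary leaves the curvature at every sample unchanged.
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
ellipse = np.stack([2.0 * np.cos(theta), np.sin(theta)], axis=1)
ang = np.deg2rad(40.0)
rot = np.array([[np.cos(ang), -np.sin(ang)],
                [np.sin(ang),  np.cos(ang)]])
assert np.allclose(curvature(ellipse), curvature(ellipse @ rot.T))
```

Note that curvature is invariant to rotation and translation but scales as 1/s under uniform scaling by s, which motivates extracting these differentials at multiple scales as the paper does.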
