Learning geometric invariants through neural networks

Arpit Rai
The Visual Computer · Published 2024-07-22 · DOI: 10.1007/s00371-024-03398-z

Abstract

Convolutional neural networks have become a fundamental model for solving various computer vision tasks. However, convolution is invariant only to translations of objects, and performance suffers under rotation and other affine transformations. This work proposes a novel neural network that leverages geometric invariants, including curvature, higher-order differentials of curves extracted from object boundaries at multiple scales, and the relative orientations of edges. These features are invariant to affine transformations and can improve the robustness of shape recognition in neural networks. In our experiments on the smallNORB dataset, a 2-layer network operating over these geometric invariants outperforms a 3-layer convolutional network by 9.69% while being more robust to affine transformations, even when trained without any data augmentation. Notably, our network exhibits a mere 6% degradation in test accuracy when test images are rotated by 40°, in contrast to the significant drops of 51.7% and 69% observed in VGG networks and convolutional networks, respectively, under the same transformation. Additionally, our models show greater robustness than invariant feature descriptors such as the SIFT-based bag-of-words classifier and its rotation-invariant extension, the RIFT descriptor, which suffer drops of 35% and 14.1%, respectively, under similar image transformations. Our experimental results further show improved robustness against scale and shear transformations. Furthermore, the multi-scale extension of our geometric invariant network, which extracts curve differentials of higher orders, shows enhanced robustness to scaling and shearing transformations.
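The curvature feature central to the abstract can be illustrated with a minimal sketch (this is an illustration of the general invariant, not the paper's implementation): discrete curvature computed along a sampled closed boundary, κ = (x′y″ − y′x″) / (x′² + y′²)^(3/2), is unchanged when the shape is rotated or translated.

```python
import numpy as np

def curvature(points):
    """Discrete signed curvature of a closed 2D curve.

    points: (N, 2) array of boundary samples, ordered around the contour.
    Periodic central differences treat the curve as closed.
    """
    x, y = points[:, 0], points[:, 1]
    dx = (np.roll(x, -1) - np.roll(x, 1)) / 2.0    # first derivatives
    dy = (np.roll(y, -1) - np.roll(y, 1)) / 2.0
    ddx = np.roll(x, -1) - 2.0 * x + np.roll(x, 1)  # second derivatives
    ddy = np.roll(y, -1) - 2.0 * y + np.roll(y, 1)
    # kappa = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2)
    return (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5

# Rotating the boundary leaves the curvature at every sample unchanged.
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
ellipse = np.stack([2.0 * np.cos(theta), np.sin(theta)], axis=1)
ang = np.deg2rad(40.0)
rot = np.array([[np.cos(ang), -np.sin(ang)],
                [np.sin(ang),  np.cos(ang)]])
assert np.allclose(curvature(ellipse), curvature(ellipse @ rot.T))
```

Note that curvature is invariant to rotation and translation but scales as 1/s under uniform scaling by s, which motivates extracting these differentials at multiple scales as the paper does.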
