物体和空间分辨能力让弱监督局部特征更出色

IF 6.3 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Neural Networks Pub Date : 2024-12-01 Epub Date: 2024-09-12 DOI:10.1016/j.neunet.2024.106697

Yifan Yin , Mengxiao Yin , Yunhui Xiong , Pengfei Lai , Kan Chang , Feng Yang

{"title":"物体和空间分辨能力让弱监督局部特征更出色","authors":"Yifan Yin , Mengxiao Yin , Yunhui Xiong , Pengfei Lai , Kan Chang , Feng Yang","doi":"10.1016/j.neunet.2024.106697","DOIUrl":null,"url":null,"abstract":"<div><p>Local feature extraction plays a crucial role in numerous critical visual tasks. However, there remains room for improvement in both descriptors and keypoints, particularly regarding the discriminative power of descriptors and the localization precision of keypoints. To address these challenges, this study introduces a novel local feature extraction pipeline named OSDFeat (Object and Spatial Discrimination Feature). OSDFeat employs a decoupling strategy, training descriptor and detection networks independently. Inspired by semantic correspondence, we propose an Object and Spatial Discrimination ResUNet (OSD-ResUNet). OSD-ResUNet captures features from the feature map that differentiate object appearance and spatial context, thus enhancing descriptor performance. To further improve the discriminative capability of descriptors, we propose a Discrimination Information Retained Normalization module (DIRN). DIRN complementarily integrates spatial-wise normalization and channel-wise normalization, yielding descriptors that are more distinguishable and informative. In the detection network, we propose a Cross Saliency Pooling module (CSP). CSP employs a cross-shaped kernel to aggregate long-range context in both vertical and horizontal dimensions. By enhancing the saliency of keypoints, CSP enables the detection network to effectively utilize descriptor information and achieve more precise localization of keypoints. Compared to the previous best local feature extraction methods, OSDFeat achieves Mean Matching Accuracy of 79.4% in local feature matching task, improving by 1.9% and achieving state-of-the-art results. Additionally, OSDFeat achieves competitive results in Visual Localization and 3D Reconstruction. The results of this study indicate that object and spatial discrimination can improve the accuracy and robustness of local feature, even in challenging environments. The code is available at <span><span>https://github.com/pandaandyy/OSDFeat</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"180 ","pages":"Article 106697"},"PeriodicalIF":6.3000,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Object and spatial discrimination makes weakly supervised local feature better\",\"authors\":\"Yifan Yin , Mengxiao Yin , Yunhui Xiong , Pengfei Lai , Kan Chang , Feng Yang\",\"doi\":\"10.1016/j.neunet.2024.106697\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Local feature extraction plays a crucial role in numerous critical visual tasks. However, there remains room for improvement in both descriptors and keypoints, particularly regarding the discriminative power of descriptors and the localization precision of keypoints. To address these challenges, this study introduces a novel local feature extraction pipeline named OSDFeat (Object and Spatial Discrimination Feature). OSDFeat employs a decoupling strategy, training descriptor and detection networks independently. Inspired by semantic correspondence, we propose an Object and Spatial Discrimination ResUNet (OSD-ResUNet). OSD-ResUNet captures features from the feature map that differentiate object appearance and spatial context, thus enhancing descriptor performance. To further improve the discriminative capability of descriptors, we propose a Discrimination Information Retained Normalization module (DIRN). DIRN complementarily integrates spatial-wise normalization and channel-wise normalization, yielding descriptors that are more distinguishable and informative. In the detection network, we propose a Cross Saliency Pooling module (CSP). CSP employs a cross-shaped kernel to aggregate long-range context in both vertical and horizontal dimensions. By enhancing the saliency of keypoints, CSP enables the detection network to effectively utilize descriptor information and achieve more precise localization of keypoints. Compared to the previous best local feature extraction methods, OSDFeat achieves Mean Matching Accuracy of 79.4% in local feature matching task, improving by 1.9% and achieving state-of-the-art results. Additionally, OSDFeat achieves competitive results in Visual Localization and 3D Reconstruction. The results of this study indicate that object and spatial discrimination can improve the accuracy and robustness of local feature, even in challenging environments. The code is available at <span><span>https://github.com/pandaandyy/OSDFeat</span><svg><path></path></svg></span>.</p></div>\",\"PeriodicalId\":49763,\"journal\":{\"name\":\"Neural Networks\",\"volume\":\"180 \",\"pages\":\"Article 106697\"},\"PeriodicalIF\":6.3000,\"publicationDate\":\"2024-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Networks\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S089360802400621X\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/9/12 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S089360802400621X","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/9/12 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

局部特征提取在许多关键的视觉任务中发挥着至关重要的作用。然而，描述符和关键点仍有改进的余地，尤其是描述符的判别能力和关键点的定位精度。为了应对这些挑战，本研究引入了一种名为 OSDFeat（物体和空间识别特征）的新型局部特征提取管道。OSDFeat 采用解耦策略，独立训练描述符和检测网络。受语义对应的启发，我们提出了一个对象和空间识别 ResUNet（OSD-ResUNet）。OSD-ResUNet 可从特征图中捕捉区分物体外观和空间环境的特征，从而提高描述符的性能。为了进一步提高描述符的鉴别能力，我们提出了一个鉴别信息保留归一化模块（DIRN）。DIRN 对空间归一化和信道归一化进行了互补整合，从而获得了更具区分度和信息量的描述符。在检测网络中，我们提出了交叉 Saliency Pooling 模块（CSP）。CSP 采用十字形内核，在纵向和横向两个维度上聚合长距离上下文。通过增强关键点的显著性，CSP 使检测网络能够有效利用描述符信息，实现更精确的关键点定位。与之前的最佳局部特征提取方法相比，OSDFeat 在局部特征匹配任务中的平均匹配精度达到了 79.4%，提高了 1.9%，达到了最先进的效果。此外，OSDFeat 在视觉定位和三维重建方面也取得了具有竞争力的结果。这项研究的结果表明，即使在具有挑战性的环境中，物体和空间分辨也能提高局部特征的准确性和鲁棒性。代码可在 https://github.com/pandaandyy/OSDFeat 上获取。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Object and spatial discrimination makes weakly supervised local feature better

Local feature extraction plays a crucial role in numerous critical visual tasks. However, there remains room for improvement in both descriptors and keypoints, particularly regarding the discriminative power of descriptors and the localization precision of keypoints. To address these challenges, this study introduces a novel local feature extraction pipeline named OSDFeat (Object and Spatial Discrimination Feature). OSDFeat employs a decoupling strategy, training descriptor and detection networks independently. Inspired by semantic correspondence, we propose an Object and Spatial Discrimination ResUNet (OSD-ResUNet). OSD-ResUNet captures features from the feature map that differentiate object appearance and spatial context, thus enhancing descriptor performance. To further improve the discriminative capability of descriptors, we propose a Discrimination Information Retained Normalization module (DIRN). DIRN complementarily integrates spatial-wise normalization and channel-wise normalization, yielding descriptors that are more distinguishable and informative. In the detection network, we propose a Cross Saliency Pooling module (CSP). CSP employs a cross-shaped kernel to aggregate long-range context in both vertical and horizontal dimensions. By enhancing the saliency of keypoints, CSP enables the detection network to effectively utilize descriptor information and achieve more precise localization of keypoints. Compared to the previous best local feature extraction methods, OSDFeat achieves Mean Matching Accuracy of 79.4% in local feature matching task, improving by 1.9% and achieving state-of-the-art results. Additionally, OSDFeat achieves competitive results in Visual Localization and 3D Reconstruction. The results of this study indicate that object and spatial discrimination can improve the accuracy and robustness of local feature, even in challenging environments. The code is available at https://github.com/pandaandyy/OSDFeat.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Neural Networks 工程技术-计算机：人工智能

CiteScore

13.90

自引率

7.70%

发文量

425

审稿时长

67 days

期刊介绍： Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.