Oliver Grainge, Michael Milford, Indu Bodala, Sarvapali D. Ramchurn, Shoaib Ehsan
arXiv - CS - Computer Vision and Pattern Recognition, arXiv: 2409.07834, published 2024-09-12
Structured Pruning for Efficient Visual Place Recognition
Visual Place Recognition (VPR) is fundamental for the global re-localization
of robots and devices, enabling them to recognize previously visited locations
based on visual inputs. This capability is crucial for maintaining accurate
mapping and localization over large areas. Because VPR methods must
operate in real time on embedded systems, it is critical to optimize these
systems for minimal resource consumption. While the most efficient VPR
approaches employ standard convolutional backbones with fixed descriptor
dimensions, these often lead to redundancy in the embedding space as well as in
the network architecture. Our work introduces a novel structured pruning
method that not only streamlines common VPR architectures but also
strategically removes redundancies within the feature embedding space. This dual
focus significantly enhances the efficiency of the system, reducing both map
and model memory requirements and decreasing feature extraction and retrieval
latencies. Across models, our approach reduces memory usage and latency by 21% and 16%,
respectively, while lowering recall@1 accuracy by
less than 1%. This improvement benefits real-time applications on
edge devices with negligible accuracy loss.
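
The dual effect described above (a smaller model and a smaller map descriptor) can be illustrated with a minimal sketch of structured pruning applied to the final projection layer of a VPR network. The abstract does not state the pruning criterion, so the L1-magnitude ranking used here is an assumption chosen for illustration, not the authors' method; all function names (`row_l1_norms`, `select_kept_rows`, `prune_rows`, `project`) and the toy weights are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def row_l1_norms(weights: Matrix) -> List[float]:
    """L1 norm of each output unit; one row corresponds to one descriptor dimension."""
    return [sum(abs(w) for w in row) for row in weights]


def select_kept_rows(weights: Matrix, keep_ratio: float) -> List[int]:
    """Indices of the rows with the largest L1 norm, returned in original order.

    ASSUMPTION: magnitude-based ranking stands in for whatever importance
    score the paper actually uses.
    """
    n_keep = max(1, int(len(weights) * keep_ratio))
    norms = row_l1_norms(weights)
    ranked = sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:n_keep])


def prune_rows(weights: Matrix, kept: List[int]) -> Matrix:
    """Structured pruning: drop whole rows, so the layer physically shrinks."""
    return [weights[i] for i in kept]


def project(weights: Matrix, features: List[float]) -> List[float]:
    """Apply the (possibly pruned) projection layer to a feature vector."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


# Toy projection layer: 4 descriptor dimensions computed from 3 backbone features.
weights: Matrix = [
    [0.90, -0.80, 0.70],
    [0.01, 0.00, -0.02],
    [0.50, 0.60, -0.40],
    [0.00, 0.03, 0.01],
]

kept = select_kept_rows(weights, keep_ratio=0.5)
pruned = prune_rows(weights, kept)

features = [1.0, 2.0, 3.0]
full_descriptor = project(weights, features)
small_descriptor = project(pruned, features)

params_before = sum(len(row) for row in weights)
params_after = sum(len(row) for row in pruned)
print(f"kept dimensions: {kept}")
print(f"layer parameters: {params_before} -> {params_after}")
print(f"descriptor length: {len(full_descriptor)} -> {len(small_descriptor)}")
```

Because entire rows are removed rather than individual weights being zeroed, the layer and every descriptor stored in the map both shrink, which is why structured pruning can reduce model memory, map memory, and retrieval cost together. In a real network the kept indices would also have to be propagated to any layer consuming this output, and the model would typically be fine-tuned afterwards.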