Towards Global Localization using Multi-Modal Object-Instance Re-Identification
Authors: Aneesh Chavan, Vaibhav Agrawal, Vineeth Bhat, Sarthak Chittawar, Siddharth Srivastava, Chetan Arora, K Madhava Krishna
arXiv: 2409.12002
Venue: arXiv - CS - Robotics
Published: 2024-09-18
Abstract
Re-identification (ReID) is a critical challenge in computer vision,
predominantly studied in the context of pedestrians and vehicles. However,
robust object-instance ReID, which has significant implications for tasks such
as autonomous exploration, long-term perception, and scene understanding,
remains underexplored. In this work, we address this gap by proposing a novel
dual-path object-instance re-identification transformer architecture that
integrates multimodal RGB and depth information. By leveraging depth data, we
demonstrate improvements in ReID across scenes that are cluttered or have
varying illumination conditions. Additionally, we develop a ReID-based
localization framework that enables accurate camera localization and pose
identification across different viewpoints. We validate our methods using two
custom-built RGB-D datasets, as well as multiple sequences from the open-source
TUM RGB-D dataset. Our approach demonstrates significant improvements in both
object-instance ReID (mAP of 75.18) and localization accuracy (success rate of
83% on TUM RGB-D), highlighting the essential role of object ReID in advancing
robotic perception. Our models, frameworks, and datasets have been made
publicly available.
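
The abstract reports ReID quality as mAP (mean average precision). As a minimal sketch of how this metric is commonly computed for retrieval-style ReID, assume each query object yields a gallery of instance IDs ranked by descending similarity. The function names and toy IDs below are hypothetical, and this is not the authors' evaluation code.

```python
from typing import List, Sequence


def average_precision(ranked_ids: Sequence[int], query_id: int) -> float:
    """AP for one query: mean of precision@k over the ranks k where a true match appears."""
    hits = 0
    precisions: List[float] = []
    for rank, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precisions.append(hits / rank)
    # A query with no matching gallery entry contributes an AP of 0.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(
    rankings: Sequence[Sequence[int]], query_ids: Sequence[int]
) -> float:
    """mAP: average of the per-query AP values."""
    aps = [average_precision(r, q) for r, q in zip(rankings, query_ids)]
    return sum(aps) / len(aps) if aps else 0.0


# Toy example with made-up instance IDs (not data from the paper).
rankings = [
    [7, 3, 7, 1],  # query is instance 7: matches at ranks 1 and 3
    [2, 5, 5, 9],  # query is instance 5: matches at ranks 2 and 3
]
query_ids = [7, 5]
score = mean_average_precision(rankings, query_ids)
print(f"mAP = {100 * score:.2f}")  # prints "mAP = 70.83"
```

The result is scaled by 100 when printed on the assumption that the abstract's 75.18 figure is on a 0-100 scale.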