Authors: A. Elashry, C. Toth
Journal: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Published: 2024-06-10
DOI: https://doi.org/10.5194/isprs-annals-x-2-2024-49-2024
Citation count: 0
A Novel Approach to Image Retrieval for Vision-Based Positioning Utilizing Graph Topology
Abstract. This research introduces a novel approach to improving vision-based positioning in the absence of GNSS signals. Specifically, we address the challenge posed by obstacles that alter image information or features, which makes it difficult to retrieve the image that matches a query from the database. The Bag of Visual Words (BoVW) is a widely used image retrieval technique, but it represents each image with a single histogram vector over a vocabulary of visual words, so obstacles that introduce new features into the query image produce different visual words. Our study overcomes this limitation by clustering the features of each image using the k-means method and generating a graph for each cluster. Each node (key point) in the graph obtains additional information from its direct neighbors using functions employed in graph neural networks, which act as a feedforward network with constant parameters. This process generates new node embeddings, and global pooling is then applied to produce one vector per graph, so that each image is represented by a set of graph vectors based on objects or feature classes. When obstacles cover one or more graphs, the query image still retains sufficient information to retrieve the most relevant image from the database. Our approach was applied to indoor positioning, with the database collected in Bolz Hall at The Ohio State University. Traditional BoVW techniques struggle to correctly retrieve most query images from the database because obstacles such as people or recently deployed objects alter image features. In contrast, our approach has shown progress in image retrieval by representing each image with multiple graph vectors, depending on the number of objects in the image. This helps prevent or mitigate changes in image features caused by obstacles that cover existing features or add new ones, as demonstrated in the results.
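The pipeline described in the abstract (k-means clustering, one graph per cluster, neighbor aggregation with constant parameters, global pooling, retrieval over several graph vectors) can be illustrated with a minimal sketch. This is not the authors' implementation. Several choices are assumptions made only for illustration: clustering on key point positions, linking each node to its two nearest neighbors, mean aggregation, mean pooling, and a cosine-based `match_score`. All function names are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def knn_edges(positions, n_neighbors=2):
    """Symmetric adjacency sets linking each node to its nearest neighbors (assumed rule)."""
    adj = [set() for _ in positions]
    for i, p in enumerate(positions):
        others = sorted((j for j in range(len(positions)) if j != i),
                        key=lambda j: math.dist(p, positions[j]))
        for j in others[:n_neighbors]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def aggregate(descriptors, adj):
    """One fixed-weight message-passing step: average a node with its direct neighbors."""
    out = []
    for i, nbrs in enumerate(adj):
        group = [descriptors[i]] + [descriptors[j] for j in nbrs]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def graph_vectors(positions, descriptors, k):
    """Represent one image as a list of pooled vectors, one per non-empty cluster."""
    labels = kmeans(positions, k)
    vectors = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        emb = aggregate([descriptors[i] for i in idx],
                        knn_edges([positions[i] for i in idx]))
        vectors.append([sum(col) / len(emb) for col in zip(*emb)])  # global mean pooling
    return vectors


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def match_score(query_vecs, db_vecs):
    """Mean, over query graphs, of the best cosine similarity to any database graph."""
    best = [max(cosine(q, d) for d in db_vecs) for q in query_vecs]
    return sum(best) / len(best)


if __name__ == "__main__":
    positions = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
    scene = [[1, 0, 0]] * 3 + [[0, 1, 0]] * 3            # two objects, distinct descriptors
    occluded = [[1, 0, 0]] * 3 + [[0.5, 0.5, 0.5]] * 3   # second object hidden by an obstacle
    other = [[0, 0, 1]] * 6                              # a different place
    query = graph_vectors(positions, occluded, k=2)
    print("same place :", match_score(query, graph_vectors(positions, scene, k=2)))
    print("other place:", match_score(query, graph_vectors(positions, other, k=2)))
```

In this toy setup the unobstructed graph of the query still matches its counterpart in the database image, so the occluded query scores higher against the correct place than against the unrelated one. A single histogram over the whole image would instead mix the obstacle's features into the only vector available.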