{"title":"MultiCLU:店面可达性检测与评价的多阶段语境学习与利用","authors":"X. Wang, Jiajun Chen, Hao Tang, Zhigang Zhu","doi":"10.1145/3512527.3531361","DOIUrl":null,"url":null,"abstract":"In this work, a storefront accessibility image dataset is collected from Google street view and is labeled with three main objects for storefront accessibility: doors (for store entrances), doorknobs (for accessing the entrances) and stairs (for leading to the entrances). Then MultiCLU, a new multi-stage context learning and utilization approach, is proposed with the following four stages: Context in Labeling (CIL), Context in Training (CIT), Context in Detection (CID) and Context in Evaluation (CIE). The CIL stage automatically extends the label for each knob to include more local contextual information. In the CIT stage, a deep learning method is used to project the visual information extracted by a Faster R-CNN based object detector to semantic space generated by a Graph Convolutional Network. The CID stage uses the spatial relation reasoning between categories to refine the confidence score. Finally in the CIE stage, a new loose evaluation metric for storefront accessibility, especially for knob category, is proposed to efficiently help BLV users to find estimated knob locations. Our experiment results show that the proposed MultiCLU framework can achieve significantly better performance than the baseline detector using Faster R-CNN, with +13.4% on mAP and +15.8% on recall, respectively. Our new evaluation metric also introduces a new way to evaluate storefront accessibility objects, which could benefit BLV group in real life.","PeriodicalId":179895,"journal":{"name":"Proceedings of the 2022 International Conference on Multimedia Retrieval","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MultiCLU: Multi-stage Context Learning and Utilization for Storefront Accessibility Detection and Evaluation\",\"authors\":\"X. Wang, Jiajun Chen, Hao Tang, Zhigang Zhu\",\"doi\":\"10.1145/3512527.3531361\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this work, a storefront accessibility image dataset is collected from Google street view and is labeled with three main objects for storefront accessibility: doors (for store entrances), doorknobs (for accessing the entrances) and stairs (for leading to the entrances). Then MultiCLU, a new multi-stage context learning and utilization approach, is proposed with the following four stages: Context in Labeling (CIL), Context in Training (CIT), Context in Detection (CID) and Context in Evaluation (CIE). The CIL stage automatically extends the label for each knob to include more local contextual information. In the CIT stage, a deep learning method is used to project the visual information extracted by a Faster R-CNN based object detector to semantic space generated by a Graph Convolutional Network. The CID stage uses the spatial relation reasoning between categories to refine the confidence score. Finally in the CIE stage, a new loose evaluation metric for storefront accessibility, especially for knob category, is proposed to efficiently help BLV users to find estimated knob locations. Our experiment results show that the proposed MultiCLU framework can achieve significantly better performance than the baseline detector using Faster R-CNN, with +13.4% on mAP and +15.8% on recall, respectively. 
Our new evaluation metric also introduces a new way to evaluate storefront accessibility objects, which could benefit BLV group in real life.\",\"PeriodicalId\":179895,\"journal\":{\"name\":\"Proceedings of the 2022 International Conference on Multimedia Retrieval\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-06-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2022 International Conference on Multimedia Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3512527.3531361\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 International Conference on Multimedia Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3512527.3531361","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
MultiCLU: Multi-stage Context Learning and Utilization for Storefront Accessibility Detection and Evaluation
In this work, a storefront accessibility image dataset is collected from Google Street View and labeled with three main object categories for storefront accessibility: doors (store entrances), doorknobs (for accessing the entrances), and stairs (leading to the entrances). MultiCLU, a new multi-stage context learning and utilization approach, is then proposed with four stages: Context in Labeling (CIL), Context in Training (CIT), Context in Detection (CID), and Context in Evaluation (CIE). The CIL stage automatically extends the label for each knob to include more local contextual information. In the CIT stage, a deep learning method projects the visual features extracted by a Faster R-CNN based object detector into a semantic space generated by a Graph Convolutional Network. The CID stage uses spatial relation reasoning between categories to refine the confidence scores. Finally, in the CIE stage, a new loose evaluation metric for storefront accessibility, especially for the knob category, is proposed to efficiently help blind and low-vision (BLV) users find estimated knob locations. Experimental results show that the proposed MultiCLU framework performs significantly better than the baseline Faster R-CNN detector, improving mAP by 13.4% and recall by 15.8%. The new evaluation metric also introduces a new way to evaluate storefront accessibility objects, which could benefit the BLV community in real life.
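
To make the CIL stage concrete, the following is a minimal sketch of the idea of automatically extending each knob label to include local context. The fixed expansion ratio and clipping-to-image-bounds behavior are illustrative assumptions, not the paper's exact labeling rule.

# CIL sketch: expand a knob bounding box to capture surrounding context.
def extend_knob_box(box, img_w, img_h, ratio=0.5):
    """Expand an (x1, y1, x2, y2) knob box by `ratio` of its width/height
    on each side, clipped to the image bounds. `ratio` is an assumed value."""
    x1, y1, x2, y2 = box
    dx, dy = (x2 - x1) * ratio, (y2 - y1) * ratio
    return (max(0.0, x1 - dx), max(0.0, y1 - dy),
            min(float(img_w), x2 + dx), min(float(img_h), y2 + dy))

# Example: a 20x20 knob box in a 640x480 street-view crop
print(extend_knob_box((100, 150, 120, 170), 640, 480))
# -> (90.0, 140.0, 130.0, 180.0)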
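The CID stage is described as using spatial relation reasoning between categories to refine confidence scores. One simple instance of that idea, sketched below under assumptions (a knob center lying inside a detected door box earns a small score boost; the rule and the `boost` value are illustrative, not the paper's reasoning module):

# CID sketch: refine knob confidences using the door-knob spatial relation.
def refine_knob_scores(knobs, doors, boost=0.1):
    """knobs/doors: lists of (box, score) with box = (x1, y1, x2, y2).
    Returns knobs with scores raised (capped at 1.0) when the knob
    center falls inside any detected door box."""
    refined = []
    for box, score in knobs:
        cx = (box[0] + box[2]) / 2.0
        cy = (box[1] + box[3]) / 2.0
        inside_door = any(d[0] <= cx <= d[2] and d[1] <= cy <= d[3]
                          for d, _ in doors)
        refined.append((box, min(1.0, score + boost) if inside_door else score))
    return refined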
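Finally, the CIE stage proposes a loose evaluation metric for knobs, since for a BLV user an approximate knob location is already useful. A sketch in that spirit follows; the center-in-expanded-box criterion and the `slack` parameter are assumptions for illustration, not the paper's metric definition.

# CIE sketch: "loose" matching instead of a strict IoU threshold.
def loose_match(pred_box, gt_box, slack=0.25):
    """Count a predicted knob as correct if its center lies inside the
    ground-truth box expanded by `slack` of its size on each side."""
    cx = (pred_box[0] + pred_box[2]) / 2.0
    cy = (pred_box[1] + pred_box[3]) / 2.0
    x1, y1, x2, y2 = gt_box
    dx, dy = (x2 - x1) * slack, (y2 - y1) * slack
    return (x1 - dx) <= cx <= (x2 + dx) and (y1 - dy) <= cy <= (y2 + dy)

def loose_recall(preds, gts, slack=0.25):
    """Fraction of ground-truth knobs matched by at least one prediction."""
    hits = sum(any(loose_match(p, g, slack) for p in preds) for g in gts)
    return hits / len(gts) if gts else 0.0

Relaxing the match criterion this way trades localization precision for usefulness: a detection that is slightly off still guides a user's hand to the knob, which is the stated goal of the metric.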