{"title":"通过音频描述来理解3D场景:针对视障人士的Kinect-iPad融合","authors":"J. D. Gomez, Sinan Mohammed, G. Bologna, T. Pun","doi":"10.1145/2049536.2049613","DOIUrl":null,"url":null,"abstract":"Microsoft's Kinect 3-D motion sensor is a low cost 3D camera that provides color and depth information of indoor environments. In this demonstration, the functionality of this fun-only camera accompanied by an iPad's tangible interface is targeted to the benefit of the visually impaired. A computer-vision-based framework for real time objects localization and for their audio description is introduced. Firstly, objects are extracted from the scene and recognized using feature descriptors and machine-learning. Secondly, the recognized objects are labeled by instruments sounds, whereas their position in 3D space is described by virtual space sources of sound. As a result, the scene can be heard and explored while finger-triggering the sounds within the iPad, on which a top-view of the objects is mapped. This enables blindfolded users to build a mental occupancy grid of the environment. The approach presented here brings the promise of efficient assistance and could be adapted as an electronic travel aid for the visually-impaired in the near future.","PeriodicalId":351090,"journal":{"name":"The proceedings of the 13th international ACM SIGACCESS conference on Computers and accessibility","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Toward 3D scene understanding via audio-description: Kinect-iPad fusion for the visually impaired\",\"authors\":\"J. D. Gomez, Sinan Mohammed, G. Bologna, T. Pun\",\"doi\":\"10.1145/2049536.2049613\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Microsoft's Kinect 3-D motion sensor is a low cost 3D camera that provides color and depth information of indoor environments. In this demonstration, the functionality of this fun-only camera accompanied by an iPad's tangible interface is targeted to the benefit of the visually impaired. A computer-vision-based framework for real time objects localization and for their audio description is introduced. Firstly, objects are extracted from the scene and recognized using feature descriptors and machine-learning. Secondly, the recognized objects are labeled by instruments sounds, whereas their position in 3D space is described by virtual space sources of sound. As a result, the scene can be heard and explored while finger-triggering the sounds within the iPad, on which a top-view of the objects is mapped. This enables blindfolded users to build a mental occupancy grid of the environment. 
The approach presented here brings the promise of efficient assistance and could be adapted as an electronic travel aid for the visually-impaired in the near future.\",\"PeriodicalId\":351090,\"journal\":{\"name\":\"The proceedings of the 13th international ACM SIGACCESS conference on Computers and accessibility\",\"volume\":\"20 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-10-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The proceedings of the 13th international ACM SIGACCESS conference on Computers and accessibility\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2049536.2049613\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The proceedings of the 13th international ACM SIGACCESS conference on Computers and accessibility","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2049536.2049613","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Microsoft's Kinect 3-D motion sensor is a low-cost 3D camera that provides color and depth information about indoor environments. In this demonstration, this camera, originally aimed at entertainment, is combined with the iPad's tangible interface for the benefit of the visually impaired. A computer-vision-based framework for real-time object localization and audio description is introduced. First, objects are extracted from the scene and recognized using feature descriptors and machine learning. Second, each recognized object is labeled with an instrument sound, while its position in 3D space is conveyed by a virtual spatialized sound source. As a result, the scene can be heard and explored by finger-triggering the sounds on the iPad, onto which a top view of the objects is mapped. This enables blindfolded users to build a mental occupancy grid of the environment. The approach presented here promises efficient assistance and could be adapted as an electronic travel aid for the visually impaired in the near future.
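The abstract only sketches the pipeline, so the following is a minimal, purely illustrative Python sketch of the kind of mapping it describes: turning a recognized object's 3D position into a top-view grid cell (for the iPad display) and a panned, distance-attenuated virtual sound source. The DetectedObject record, coordinate conventions, grid size, and panning/gain rules are all assumptions for illustration, not the authors' implementation.

```python
import math
from dataclasses import dataclass

# Hypothetical object record: label plus 3D position in metres,
# camera-centred coordinates (x: right, y: up, z: forward/depth).
# This is an assumed representation, not the paper's data structure.
@dataclass
class DetectedObject:
    label: str   # e.g. "chair", as produced by the recognizer
    x: float
    y: float
    z: float

def to_topview_cell(obj: DetectedObject, grid_size: int = 8, max_range: float = 4.0):
    """Map the object's ground-plane (x, z) position to a cell of a
    grid_size x grid_size top-view grid, as could be rendered on the iPad.
    grid_size and max_range are assumed values."""
    col = min(grid_size - 1, max(0, int((obj.x + max_range / 2) / max_range * grid_size)))
    row = min(grid_size - 1, max(0, int(obj.z / max_range * grid_size)))
    return row, col

def to_stereo_source(obj: DetectedObject, max_range: float = 4.0):
    """Derive a simple virtual sound source: azimuth controls left/right
    panning, depth controls loudness (nearer objects sound louder)."""
    azimuth = math.atan2(obj.x, obj.z)                   # radians, 0 = straight ahead
    pan = max(-1.0, min(1.0, azimuth / (math.pi / 2)))   # -1 = hard left, +1 = hard right
    gain = max(0.1, 1.0 - min(obj.z, max_range) / max_range)
    return {"instrument": obj.label, "pan": pan, "gain": gain}

if __name__ == "__main__":
    chair = DetectedObject("chair", x=0.8, y=0.0, z=2.0)
    print(to_topview_cell(chair))    # grid cell for the top-view map
    print(to_stereo_source(chair))   # parameters for the virtual sound source
```

In a real system the sound source parameters would feed a spatial audio engine and the grid cells would drive the touch layout on the iPad; this sketch only shows the geometric mapping step under the stated assumptions.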