Toward 3D scene understanding via audio-description: Kinect-iPad fusion for the visually impaired

J. D. Gomez, Sinan Mohammed, G. Bologna, T. Pun
{"title":"通过音频描述来理解3D场景:针对视障人士的Kinect-iPad融合","authors":"J. D. Gomez, Sinan Mohammed, G. Bologna, T. Pun","doi":"10.1145/2049536.2049613","DOIUrl":null,"url":null,"abstract":"Microsoft's Kinect 3-D motion sensor is a low cost 3D camera that provides color and depth information of indoor environments. In this demonstration, the functionality of this fun-only camera accompanied by an iPad's tangible interface is targeted to the benefit of the visually impaired. A computer-vision-based framework for real time objects localization and for their audio description is introduced. Firstly, objects are extracted from the scene and recognized using feature descriptors and machine-learning. Secondly, the recognized objects are labeled by instruments sounds, whereas their position in 3D space is described by virtual space sources of sound. As a result, the scene can be heard and explored while finger-triggering the sounds within the iPad, on which a top-view of the objects is mapped. This enables blindfolded users to build a mental occupancy grid of the environment. The approach presented here brings the promise of efficient assistance and could be adapted as an electronic travel aid for the visually-impaired in the near future.","PeriodicalId":351090,"journal":{"name":"The proceedings of the 13th international ACM SIGACCESS conference on Computers and accessibility","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Toward 3D scene understanding via audio-description: Kinect-iPad fusion for the visually impaired\",\"authors\":\"J. D. Gomez, Sinan Mohammed, G. Bologna, T. Pun\",\"doi\":\"10.1145/2049536.2049613\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Microsoft's Kinect 3-D motion sensor is a low cost 3D camera that provides color and depth information of indoor environments. In this demonstration, the functionality of this fun-only camera accompanied by an iPad's tangible interface is targeted to the benefit of the visually impaired. A computer-vision-based framework for real time objects localization and for their audio description is introduced. Firstly, objects are extracted from the scene and recognized using feature descriptors and machine-learning. Secondly, the recognized objects are labeled by instruments sounds, whereas their position in 3D space is described by virtual space sources of sound. As a result, the scene can be heard and explored while finger-triggering the sounds within the iPad, on which a top-view of the objects is mapped. This enables blindfolded users to build a mental occupancy grid of the environment. 
The approach presented here brings the promise of efficient assistance and could be adapted as an electronic travel aid for the visually-impaired in the near future.\",\"PeriodicalId\":351090,\"journal\":{\"name\":\"The proceedings of the 13th international ACM SIGACCESS conference on Computers and accessibility\",\"volume\":\"20 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-10-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The proceedings of the 13th international ACM SIGACCESS conference on Computers and accessibility\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2049536.2049613\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The proceedings of the 13th international ACM SIGACCESS conference on Computers and accessibility","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2049536.2049613","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 12

Abstract

Microsoft's Kinect 3-D motion sensor is a low-cost 3D camera that provides color and depth information of indoor environments. In this demonstration, the functionality of this entertainment-oriented camera, combined with an iPad's tangible interface, is targeted to the benefit of the visually impaired. A computer-vision-based framework for real-time object localization and audio description is introduced. First, objects are extracted from the scene and recognized using feature descriptors and machine learning. Second, the recognized objects are labeled by instrument sounds, while their position in 3D space is described by virtual spatial sound sources. As a result, the scene can be heard and explored by finger-triggering the sounds on the iPad, onto which a top view of the objects is mapped. This enables blindfolded users to build a mental occupancy grid of the environment. The approach presented here brings the promise of efficient assistance and could be adapted as an electronic travel aid for the visually impaired in the near future.
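The abstract describes two mapping steps: projecting each recognized object's 3D position onto a top-view grid that the user explores by touch on the iPad, and labeling each object class with an instrument sound. The sketch below is a minimal illustration of those two steps, not the authors' implementation; the object classes, instrument sound files, grid dimensions, and coordinate ranges are assumptions made for the example.

```python
# Minimal sketch of two ideas from the abstract (not the paper's code):
# (1) map a recognized object's Kinect 3D position to a top-view grid cell,
# (2) label the object class with an instrument sound.

from dataclasses import dataclass

# Hypothetical object-to-instrument mapping; the abstract only states that
# recognized objects are labeled by instrument sounds.
INSTRUMENT_SOUNDS = {
    "chair": "cello.wav",
    "table": "piano.wav",
    "door": "trumpet.wav",
}

@dataclass
class DetectedObject:
    label: str   # class predicted by the feature-descriptor / ML stage
    x: float     # metres, lateral offset from the sensor (negative = left)
    z: float     # metres, depth reported by the Kinect

def to_top_view_cell(obj: DetectedObject,
                     grid_cols: int = 8, grid_rows: int = 8,
                     x_range: float = 4.0, z_range: float = 4.0) -> tuple[int, int]:
    """Map a 3D detection onto a coarse top-view occupancy grid.

    Columns follow the lateral axis, rows follow depth; the grid is what a
    user would explore by touch on the iPad surface.
    """
    col = int((obj.x + x_range / 2) / x_range * grid_cols)
    row = int(obj.z / z_range * grid_rows)
    # Clamp in case the object lies outside the assumed working range.
    col = max(0, min(grid_cols - 1, col))
    row = max(0, min(grid_rows - 1, row))
    return row, col

def sound_for(obj: DetectedObject) -> str:
    """Return the instrument sample that labels this object class."""
    return INSTRUMENT_SOUNDS.get(obj.label, "default_tone.wav")

if __name__ == "__main__":
    chair = DetectedObject(label="chair", x=-0.8, z=2.1)
    print(to_top_view_cell(chair), sound_for(chair))  # e.g. (4, 2) cello.wav
```

In the actual system, triggering a grid cell with a finger would play the corresponding instrument sound through a spatialized audio source, so that both the object's identity and its position are conveyed; the grid resolution and coordinate ranges here are placeholders.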