An LSTM-based Gesture-to-Speech Recognition System.

Riyad Bin Rafiq, Syed Araib Karim, Mark V Albert
{"title":"An LSTM-based Gesture-to-Speech Recognition System.","authors":"Riyad Bin Rafiq, Syed Araib Karim, Mark V Albert","doi":"10.1109/ichi57859.2023.00062","DOIUrl":null,"url":null,"abstract":"<p><p>Fast and flexible communication options are limited for speech-impaired people. Hand gestures coupled with fast, generated speech can enable a more natural social dynamic for those individuals - particularly individuals without the fine motor skills to type on a keyboard or tablet reliably. We created a mobile phone application prototype that generates audible responses associated with trained hand movements and collects and organizes the accelerometer data for rapid training to allow tailored models for individuals who may not be able to perform standard movements such as sign language. Six participants performed 11 distinct gestures to produce the dataset. A mobile application was developed that integrated a bidirectional LSTM network architecture which was trained from this data. After evaluation using nested subject-wise cross-validation, our integrated bidirectional LSTM model demonstrates an overall recall of 91.8% in recognition of these pre-selected 11 hand gestures, with recall at 95.8% when two commonly confused gestures were not assessed. This prototype is a step in creating a mobile phone system capable of capturing new gestures and developing tailored gesture recognition models for individuals in speech-impaired populations. Further refinement of this prototype can enable fast and efficient communication with the goal of further improving social interaction for individuals unable to speak.</p>","PeriodicalId":73284,"journal":{"name":"IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10894657/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ichi57859.2023.00062","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/12/11 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Fast and flexible communication options are limited for speech-impaired people. Hand gestures coupled with fast, generated speech can enable a more natural social dynamic for those individuals - particularly individuals without the fine motor skills to type on a keyboard or tablet reliably. We created a mobile phone application prototype that generates audible responses associated with trained hand movements and collects and organizes the accelerometer data for rapid training to allow tailored models for individuals who may not be able to perform standard movements such as sign language. Six participants performed 11 distinct gestures to produce the dataset. A mobile application was developed that integrated a bidirectional LSTM network architecture which was trained from this data. After evaluation using nested subject-wise cross-validation, our integrated bidirectional LSTM model demonstrates an overall recall of 91.8% in recognition of these pre-selected 11 hand gestures, with recall at 95.8% when two commonly confused gestures were not assessed. This prototype is a step in creating a mobile phone system capable of capturing new gestures and developing tailored gesture recognition models for individuals in speech-impaired populations. Further refinement of this prototype can enable fast and efficient communication with the goal of further improving social interaction for individuals unable to speak.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于 LSTM 的手势语音识别系统
对于有语言障碍的人来说,快速灵活的交流方式非常有限。手势加上快速生成的语音,可以为这些人提供更自然的社交动态,尤其是没有精细运动技能在键盘或平板电脑上打字的人。我们创建了一个手机应用原型,它能生成与训练有素的手部动作相关的声音反应,并收集和整理加速度计数据以进行快速训练,从而为那些可能无法完成手语等标准动作的人提供量身定制的模型。六名参与者做出了 11 种不同的手势,从而产生了数据集。开发的移动应用程序集成了双向 LSTM 网络架构,该架构根据这些数据进行了训练。在使用嵌套主体交叉验证进行评估后,我们的集成双向 LSTM 模型在识别预选的 11 种手势方面的总体召回率为 91.8%,在不评估两种常见混淆手势的情况下,召回率为 95.8%。这个原型是创建能够捕捉新手势的手机系统和为语言障碍人群开发定制手势识别模型的一个步骤。对这一原型的进一步改进可以实现快速高效的交流,从而进一步改善无法说话人群的社交互动。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Mitigating Membership Inference in Deep Survival Analyses with Differential Privacy. Benchmarking Transformer-Based Models for Identifying Social Determinants of Health in Clinical Notes. End-to-End Models for Chemical-Protein Interaction Extraction: Better Tokenization and Span-Based Pipeline Strategies. End-to-End n-ary Relation Extraction for Combination Drug Therapies. Named Entity Recognition and Normalization for Alzheimer's Disease Eligibility Criteria.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1