An LSTM-based Gesture-to-Speech Recognition System

IEEE International Conference on Healthcare Informatics. Pub Date: 2023-06-01. Epub Date: 2023-12-11. DOI: 10.1109/ichi57859.2023.00062

Riyad Bin Rafiq, Syed Araib Karim, Mark V Albert

Volume 2023, pages 430-438. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10894657/pdf/

Citations: 0

Abstract

Fast and flexible communication options are limited for speech-impaired people. Hand gestures coupled with fast, generated speech can enable a more natural social dynamic for those individuals, particularly those without the fine motor skills to type reliably on a keyboard or tablet. We created a mobile phone application prototype that generates audible responses associated with trained hand movements, and that collects and organizes accelerometer data for rapid training, allowing tailored models for individuals who may not be able to perform standard movements such as sign language. Six participants performed 11 distinct gestures to produce the dataset. A mobile application was developed that integrates a bidirectional LSTM network architecture trained on this data. Evaluated with nested subject-wise cross-validation, our integrated bidirectional LSTM model achieves an overall recall of 91.8% in recognizing these 11 pre-selected hand gestures, and a recall of 95.8% when two commonly confused gestures are excluded. This prototype is a step toward a mobile phone system capable of capturing new gestures and developing tailored gesture recognition models for individuals in speech-impaired populations. Further refinement of this prototype can enable fast and efficient communication, with the goal of further improving social interaction for individuals unable to speak.
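The abstract names nested subject-wise cross-validation and recall but does not describe the procedure in detail. The following is a minimal sketch of what such an evaluation could look like, under assumptions that are not taken from the paper: one participant is held out at each level (outer loop for testing, inner loop for validation), recall is macro-averaged over gesture classes, and the participant IDs and function names are hypothetical.

```python
from collections import Counter


def nested_subject_splits(subjects):
    """Yield (test_subject, validation_subject, training_subjects) triples.

    Outer loop: hold out one subject for testing.
    Inner loop: among the remaining subjects, hold out one for validation
    (for example, to select hyperparameters) and train on the rest.
    No subject ever appears in more than one role within a split.
    """
    for test in subjects:
        remaining = [s for s in subjects if s != test]
        for val in remaining:
            train = [s for s in remaining if s != val]
            yield test, val, train


def macro_recall(y_true, y_pred):
    """Mean over classes of (correct predictions for class / samples of class)."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / support[c] for c in support) / len(support)


# Hypothetical IDs for the six participants mentioned in the abstract.
subjects = ["P1", "P2", "P3", "P4", "P5", "P6"]
splits = list(nested_subject_splits(subjects))
```

With six participants this produces 6 x 5 = 30 train/validation/test combinations. Splitting by subject rather than by sample keeps a participant's recordings out of the training set whenever that participant is used for testing, which is the point of subject-wise evaluation.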
