Sign2Speech: A Novel Sign Language to Speech Synthesis Pipeline

Dan Bigioi, Théo Morales, Ayushi Pandey, Frank Fowley, Peter Corcoran, Julie Carson-Berndsen
{"title":"Sign2Speech: A Novel Sign Language to Speech Synthesis Pipeline","authors":"Dan Bigioi, Théo Morales, Ayushi Pandey, Frank Fowley, Peter Corcoran, Julie Carson-Berndsen","doi":"10.56541/ctdh7516","DOIUrl":null,"url":null,"abstract":"The lack of assistive Sign Language technologies for members of the Deaf community has impeded their access to public information, and curtailed their civil rights and social inclusion. In this paper, we introduce a novel proof-of-concept method for end-to-end Sign Language to speech translation without an intermediate text representation.We propose an LSTM-based method to generate speech from hand pose, where the latter can be obtained from applying an off-the-shelf pose predictor to fingerspelling videos. We train our model using a custom dataset of synthetically generated signs annotated with speech labels, and test on a real-world dataset of fingerspelling signs. Our generated output resembles real-world data sufficiently on quantitative measurements. This indicates that our techniques can be used to generate speech from signs, without reliance on text. The use of synthetic datasets further reduces the reliance on real-world, annotated data. However, results can be further improved using hybrid datasets, combining real-world and synthetic data. Our code and datasets are available at https://github.com/DanBigioi/Sign2Speech.","PeriodicalId":180076,"journal":{"name":"24th Irish Machine Vision and Image Processing Conference","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"24th Irish Machine Vision and Image Processing Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.56541/ctdh7516","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The lack of assistive Sign Language technologies for members of the Deaf community has impeded their access to public information and curtailed their civil rights and social inclusion. In this paper, we introduce a novel proof-of-concept method for end-to-end Sign Language to speech translation without an intermediate text representation. We propose an LSTM-based method to generate speech from hand pose, where the latter can be obtained by applying an off-the-shelf pose predictor to fingerspelling videos. We train our model on a custom dataset of synthetically generated signs annotated with speech labels, and test on a real-world dataset of fingerspelling signs. On quantitative measures, our generated output sufficiently resembles real-world data. This indicates that our techniques can be used to generate speech from signs without reliance on text. The use of synthetic datasets further reduces the reliance on real-world annotated data; however, results can be further improved using hybrid datasets that combine real-world and synthetic data. Our code and datasets are available at https://github.com/DanBigioi/Sign2Speech.
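
To make the pipeline concrete, below is a minimal PyTorch sketch of the core idea described in the abstract: an LSTM that maps a per-frame sequence of hand-pose keypoints (as produced by an off-the-shelf predictor, e.g. MediaPipe Hands) to mel-spectrogram frames, which a separate vocoder would then render as a waveform. The layer sizes, keypoint layout (21 keypoints with 3 coordinates each), mel dimensionality, and L1 training loss are illustrative assumptions, not the authors' exact configuration; the actual implementation is in the linked repository.

```python
# Sketch only: pose-sequence -> mel-spectrogram regression with an LSTM.
# Dimensions are assumptions (21 keypoints x 3 coords, 80 mel bins); the
# paper's real architecture and hyperparameters live in the GitHub repo.

import torch
import torch.nn as nn

class Pose2MelLSTM(nn.Module):
    def __init__(self, n_keypoints=21, coord_dim=3, hidden=256, n_mels=80):
        super().__init__()
        in_dim = n_keypoints * coord_dim            # flattened pose per video frame
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, n_mels)   # one mel frame per pose frame

    def forward(self, poses):
        # poses: (batch, time, n_keypoints * coord_dim)
        out, _ = self.lstm(poses)                   # (batch, time, 2 * hidden)
        return self.proj(out)                       # (batch, time, n_mels)

model = Pose2MelLSTM()
poses = torch.randn(4, 120, 21 * 3)                # 4 clips, 120 pose frames each
mels = model(poses)                                # predicted mel-spectrogram frames
# During training, targets would be mel frames extracted from the paired
# speech labels; here a random tensor stands in for them.
loss = nn.functional.l1_loss(mels, torch.randn_like(mels))
```

In the paper's setting, the target mel frames would come from the speech labels paired with each synthetic sign, and the predicted spectrogram would be passed to a vocoder (for instance Griffin-Lim or HiFi-GAN, as an assumption about the final stage) to produce audible speech, so that no text representation is needed anywhere in the pipeline.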