Savitar:新冠肺炎时代耳聋和发音困难的智能手语翻译方法

IF 1.7 4区 计算机科学 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Data Technologies and Applications Pub Date : 2023-07-07 DOI:10.1108/dta-09-2022-0375
Wuyan Liang, Xiaolong Xu
{"title":"Savitar:新冠肺炎时代耳聋和发音困难的智能手语翻译方法","authors":"Wuyan Liang, Xiaolong Xu","doi":"10.1108/dta-09-2022-0375","DOIUrl":null,"url":null,"abstract":"PurposeIn the COVID-19 era, sign language (SL) translation has gained attention in online learning, which evaluates the physical gestures of each student and bridges the communication gap between dysphonia and hearing people. The purpose of this paper is to devote the alignment between SL sequence and nature language sequence with high translation performance.Design/methodology/approachSL can be characterized as joint/bone location information in two-dimensional space over time, forming skeleton sequences. To encode joint, bone and their motion information, we propose a multistream hierarchy network (MHN) along with a vocab prediction network (VPN) and a joint network (JN) with the recurrent neural network transducer. The JN is used to concatenate the sequences encoded by the MHN and VPN and learn their sequence alignments.FindingsWe verify the effectiveness of the proposed approach and provide experimental results on three large-scale datasets, which show that translation accuracy is 94.96, 54.52, and 92.88 per cent, and the inference time is 18 and 1.7 times faster than listen-attend-spell network (LAS) and visual hierarchy to lexical sequence network (H2SNet) , respectively.Originality/valueIn this paper, we propose a novel framework that can fuse multimodal input (i.e. joint, bone and their motion stream) and align input streams with nature language. Moreover, the provided framework is improved by the different properties of MHN, VPN and JN. Experimental results on the three datasets demonstrate that our approaches outperform the state-of-the-art methods in terms of translation accuracy and speed.","PeriodicalId":56156,"journal":{"name":"Data Technologies and Applications","volume":" ","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2023-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Savitar: an intelligent sign language translation approach for deafness and dysphonia in the COVID-19 era\",\"authors\":\"Wuyan Liang, Xiaolong Xu\",\"doi\":\"10.1108/dta-09-2022-0375\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"PurposeIn the COVID-19 era, sign language (SL) translation has gained attention in online learning, which evaluates the physical gestures of each student and bridges the communication gap between dysphonia and hearing people. The purpose of this paper is to devote the alignment between SL sequence and nature language sequence with high translation performance.Design/methodology/approachSL can be characterized as joint/bone location information in two-dimensional space over time, forming skeleton sequences. To encode joint, bone and their motion information, we propose a multistream hierarchy network (MHN) along with a vocab prediction network (VPN) and a joint network (JN) with the recurrent neural network transducer. The JN is used to concatenate the sequences encoded by the MHN and VPN and learn their sequence alignments.FindingsWe verify the effectiveness of the proposed approach and provide experimental results on three large-scale datasets, which show that translation accuracy is 94.96, 54.52, and 92.88 per cent, and the inference time is 18 and 1.7 times faster than listen-attend-spell network (LAS) and visual hierarchy to lexical sequence network (H2SNet) , respectively.Originality/valueIn this paper, we propose a novel framework that can fuse multimodal input (i.e. joint, bone and their motion stream) and align input streams with nature language. Moreover, the provided framework is improved by the different properties of MHN, VPN and JN. Experimental results on the three datasets demonstrate that our approaches outperform the state-of-the-art methods in terms of translation accuracy and speed.\",\"PeriodicalId\":56156,\"journal\":{\"name\":\"Data Technologies and Applications\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2023-07-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Data Technologies and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1108/dta-09-2022-0375\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Technologies and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1108/dta-09-2022-0375","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

摘要

目的在新型冠状病毒感染症(COVID-19)时代,手语翻译在在线学习中受到关注,因为手语翻译可以评估每个学生的肢体动作,弥合语音障碍者和听力障碍者之间的沟通鸿沟。本文的目的是研究具有高翻译性能的SL序列与自然语言序列之间的对齐。设计/方法/方法sl可以表征为关节/骨骼在二维空间中随时间变化的位置信息,形成骨骼序列。为了对关节、骨骼及其运动信息进行编码,我们提出了一个多流层次网络(MHN)、一个词汇预测网络(VPN)和一个带有循环神经网络传感器的关节网络(JN)。JN用于连接由MHN和VPN编码的序列,并学习它们的序列对齐。结果表明,该方法的翻译准确率分别为94.96%、54.52%和92.88%,推理时间分别比listen- attention -spell network (LAS)和visual hierarchy to lexical sequence network (H2SNet)快18倍和1.7倍。在本文中,我们提出了一个新的框架,可以融合多模态输入(即关节、骨骼及其运动流),并将输入流与自然语言对齐。此外,根据MHN、VPN和JN的不同特性对所提供的框架进行了改进。在三个数据集上的实验结果表明,我们的方法在翻译精度和速度方面都优于目前最先进的方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Savitar: an intelligent sign language translation approach for deafness and dysphonia in the COVID-19 era
PurposeIn the COVID-19 era, sign language (SL) translation has gained attention in online learning, which evaluates the physical gestures of each student and bridges the communication gap between dysphonia and hearing people. The purpose of this paper is to devote the alignment between SL sequence and nature language sequence with high translation performance.Design/methodology/approachSL can be characterized as joint/bone location information in two-dimensional space over time, forming skeleton sequences. To encode joint, bone and their motion information, we propose a multistream hierarchy network (MHN) along with a vocab prediction network (VPN) and a joint network (JN) with the recurrent neural network transducer. The JN is used to concatenate the sequences encoded by the MHN and VPN and learn their sequence alignments.FindingsWe verify the effectiveness of the proposed approach and provide experimental results on three large-scale datasets, which show that translation accuracy is 94.96, 54.52, and 92.88 per cent, and the inference time is 18 and 1.7 times faster than listen-attend-spell network (LAS) and visual hierarchy to lexical sequence network (H2SNet) , respectively.Originality/valueIn this paper, we propose a novel framework that can fuse multimodal input (i.e. joint, bone and their motion stream) and align input streams with nature language. Moreover, the provided framework is improved by the different properties of MHN, VPN and JN. Experimental results on the three datasets demonstrate that our approaches outperform the state-of-the-art methods in terms of translation accuracy and speed.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Data Technologies and Applications
Data Technologies and Applications Social Sciences-Library and Information Sciences
CiteScore
3.80
自引率
6.20%
发文量
29
期刊介绍: Previously published as: Program Online from: 2018 Subject Area: Information & Knowledge Management, Library Studies
期刊最新文献
Understanding customer behavior by mapping complaints to personality based on social media textual data A systematic review of the use of FHIR to support clinical research, public health and medical education Novel framework for learning performance prediction using pattern identification and deep learning A comparative analysis of job satisfaction prediction models using machine learning: a mixed-method approach Assessing the alignment of corporate ESG disclosures with the UN sustainable development goals: a BERT-based text analysis
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1