Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm

Xiao Wang, Yao Rong, Fuling Wang, Jianing Li, Lin Zhu, Bo Jiang, Yaowei Wang
{"title":"Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm","authors":"Xiao Wang, Yao Rong, Fuling Wang, Jianing Li, Lin Zhu, Bo Jiang, Yaowei Wang","doi":"arxiv-2408.10488","DOIUrl":null,"url":null,"abstract":"Sign Language Translation (SLT) is a core task in the field of AI-assisted\ndisability. Unlike traditional SLT based on visible light videos, which is\neasily affected by factors such as lighting, rapid hand movements, and privacy\nbreaches, this paper proposes the use of high-definition Event streams for SLT,\neffectively mitigating the aforementioned issues. This is primarily because\nEvent streams have a high dynamic range and dense temporal signals, which can\nwithstand low illumination and motion blur well. Additionally, due to their\nsparsity in space, they effectively protect the privacy of the target person.\nMore specifically, we propose a new high-resolution Event stream sign language\ndataset, termed Event-CSL, which effectively fills the data gap in this area of\nresearch. It contains 14,827 videos, 14,821 glosses, and 2,544 Chinese words in\nthe text vocabulary. These samples are collected in a variety of indoor and\noutdoor scenes, encompassing multiple angles, light intensities, and camera\nmovements. We have benchmarked existing mainstream SLT works to enable fair\ncomparison for future efforts. Based on this dataset and several other\nlarge-scale datasets, we propose a novel baseline method that fully leverages\nthe Mamba model's ability to integrate temporal information of CNN features,\nresulting in improved sign language translation outcomes. Both the benchmark\ndataset and source code will be released on\nhttps://github.com/Event-AHU/OpenESL","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"52 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Neural and Evolutionary Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.10488","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Sign Language Translation (SLT) is a core task in the field of AI-assisted disability support. Unlike traditional SLT based on visible-light videos, which is easily affected by factors such as poor lighting, rapid hand movements, and privacy breaches, this paper proposes using high-definition Event streams for SLT, effectively mitigating these issues. Event streams have a high dynamic range and dense temporal signals, so they withstand low illumination and motion blur well; because they are spatially sparse, they also effectively protect the privacy of the target person. More specifically, we propose a new high-resolution Event stream sign language dataset, termed Event-CSL, which fills the data gap in this area of research. It contains 14,827 videos, 14,821 glosses, and a text vocabulary of 2,544 Chinese words. The samples were collected in a variety of indoor and outdoor scenes, covering multiple viewing angles, light intensities, and camera movements. We benchmark existing mainstream SLT methods on Event-CSL to enable fair comparison for future work. Based on this dataset and several other large-scale datasets, we also propose a novel baseline that leverages the Mamba model's ability to integrate the temporal information of CNN features, yielding improved sign language translation results. Both the benchmark dataset and the source code will be released at https://github.com/Event-AHU/OpenESL
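The abstract describes the baseline only at a high level: per-frame CNN features whose temporal dynamics are integrated by a Mamba block. Below is a minimal PyTorch sketch of that idea, not the authors' released code: the names `events_to_frames` and `EventSLTBaseline`, the ResNet-18 backbone, the two-channel polarity-count event representation, and all hyperparameters are illustrative assumptions, with the `mamba_ssm` package standing in for the paper's Mamba block.

```python
# Minimal sketch of the pipeline the abstract describes (NOT the authors'
# released code): accumulate a sparse event stream into frame tensors,
# extract per-frame CNN features, and fuse them over time with Mamba.
import torch
import torch.nn as nn
from torchvision.models import resnet18
from mamba_ssm import Mamba  # pip install mamba-ssm (needs a CUDA build)


def events_to_frames(events: torch.Tensor, t_bins: int, h: int, w: int) -> torch.Tensor:
    """Accumulate an (N, 4) event stream of (timestamp, x, y, polarity) rows
    into T two-channel count frames. One common event representation,
    assumed here for illustration; the paper may use a different one."""
    frames = torch.zeros(t_bins, 2, h, w)
    t = events[:, 0]
    bins = ((t - t.min()) / (t.max() - t.min() + 1e-9) * (t_bins - 1)).long()
    x, y, p = events[:, 1].long(), events[:, 2].long(), events[:, 3].long()
    frames.index_put_((bins, p, y, x), torch.ones(len(events)), accumulate=True)
    return frames  # (T, 2, H, W)


class EventSLTBaseline(nn.Module):
    """Per-frame CNN features integrated over time by a Mamba block."""

    def __init__(self, vocab_size: int, d_model: int = 512):
        super().__init__()
        backbone = resnet18(weights=None)
        # Event frames carry 2 channels (positive/negative polarity counts).
        backbone.conv1 = nn.Conv2d(2, 64, kernel_size=7, stride=2, padding=3, bias=False)
        backbone.fc = nn.Identity()  # keep the 512-d pooled features
        self.cnn = backbone
        self.temporal = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.reshape(b * t, c, h, w))  # (B*T, 512)
        feats = self.temporal(feats.view(b, t, -1))       # temporal integration
        return self.head(feats)                           # (B, T, vocab) logits


if __name__ == "__main__":
    device = "cuda"  # mamba_ssm's fused scan kernels expect a CUDA device
    model = EventSLTBaseline(vocab_size=2544).to(device)  # 2,544 words, per the abstract
    ev = torch.cat([torch.rand(5000, 1),                       # timestamps
                    torch.randint(0, 224, (5000, 2)).float(),  # x, y
                    torch.randint(0, 2, (5000, 1)).float()],   # polarity
                   dim=1)
    clip = events_to_frames(ev, t_bins=16, h=224, w=224).unsqueeze(0)
    print(model(clip.to(device)).shape)  # torch.Size([1, 16, 2544])
```

On real data, the linear text head would typically be replaced by an autoregressive decoder over the Chinese vocabulary; the per-timestep logits here only illustrate where the fused spatio-temporal features would feed into translation.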