MLMSign: Multi-lingual multi-modal illumination-invariant sign language recognition

Arezoo Sadeghzadeh, A.F.M. Shahen Shah, Md Baharul Islam
{"title":"MLMSign: Multi-lingual multi-modal illumination-invariant sign language recognition","authors":"Arezoo Sadeghzadeh ,&nbsp;A.F.M. Shahen Shah ,&nbsp;Md Baharul Islam","doi":"10.1016/j.iswa.2024.200384","DOIUrl":null,"url":null,"abstract":"<div><p>Sign language (SL) serves as a visual communication tool bearing great significance for deaf people to interact with others and facilitate their daily life. Wide varieties of SLs and the lack of interpretation knowledge necessitate developing automated sign language recognition (SLR) systems to attenuate the communication gap between the deaf and hearing communities. Despite numerous advanced static SLR systems, they are not practical and favorable enough for real-life scenarios once assessed simultaneously from different critical aspects: accuracy in dealing with high intra- and slight inter-class variations, robustness, computational complexity, and generalization ability. To this end, we propose a novel multi-lingual multi-modal SLR system, namely <em>MLMSign</em>, by taking full strengths of hand-crafted features and deep learning models to enhance the performance and the robustness of the system against illumination changes while minimizing computational cost. The RGB sign images and 2D visualizations of their hand-crafted features, i.e., Histogram of Oriented Gradients (HOG) features and <span><math><msup><mrow><mi>a</mi></mrow><mrow><mo>∗</mo></mrow></msup></math></span> channel of <span><math><mrow><msup><mrow><mi>L</mi></mrow><mrow><mo>∗</mo></mrow></msup><msup><mrow><mi>a</mi></mrow><mrow><mo>∗</mo></mrow></msup><msup><mrow><mi>b</mi></mrow><mrow><mo>∗</mo></mrow></msup></mrow></math></span> color space, are employed as three input modalities to train a novel Convolutional Neural Network (CNN). The number of layers, filters, kernel size, learning rate, and optimization technique are carefully selected through an extensive parametric study to minimize the computational cost without compromising accuracy. The system’s performance and robustness are significantly enhanced by jointly deploying the models of these three modalities through ensemble learning. The impact of each modality is optimized based on their impact coefficient determined by grid search. In addition to the comprehensive quantitative assessment, the capabilities of our proposed model and the effectiveness of ensembling over three modalities are evaluated qualitatively using the Grad-CAM visualization model. Experimental results on the test data with additional illumination changes verify the high robustness of our system in dealing with overexposed and underexposed lighting conditions. Achieving a high accuracy (<span><math><mrow><mo>&gt;</mo><mn>99</mn><mo>.</mo><mn>33</mn><mtext>%</mtext></mrow></math></span>) on six benchmark datasets (i.e., Massey, Static ASL, NUS II, TSL Fingerspelling, BdSL36v1, and PSL) demonstrates that our system notably outperforms the recent state-of-the-art approaches with a minimum number of parameters and high generalization ability over complex datasets. 
Its promising performance for four different sign languages makes it a feasible system for multi-lingual applications.</p></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"22 ","pages":"Article 200384"},"PeriodicalIF":0.0000,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667305324000590/pdfft?md5=9a754731551f7380f553abb3c302ac3a&pid=1-s2.0-S2667305324000590-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Intelligent Systems with Applications","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2667305324000590","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Sign language (SL) is a visual communication tool of great significance for deaf people, enabling them to interact with others and facilitating their daily lives. The wide variety of SLs and the general lack of interpretation skills necessitate automated sign language recognition (SLR) systems to narrow the communication gap between the deaf and hearing communities. Despite numerous advanced static SLR systems, they are not practical enough for real-life scenarios once assessed simultaneously against several critical criteria: accuracy under high intra-class and slight inter-class variation, robustness, computational complexity, and generalization ability. To this end, we propose a novel multi-lingual multi-modal SLR system, MLMSign, which exploits the complementary strengths of hand-crafted features and deep learning models to enhance the system's performance and robustness against illumination changes while minimizing computational cost. RGB sign images and 2D visualizations of their hand-crafted features, i.e., Histogram of Oriented Gradients (HOG) features and the a* channel of the L*a*b* color space, are employed as three input modalities to train a novel Convolutional Neural Network (CNN). The number of layers, filters, kernel size, learning rate, and optimization technique are carefully selected through an extensive parametric study to minimize computational cost without compromising accuracy. The system's performance and robustness are further enhanced by jointly deploying the models of the three modalities through ensemble learning, where the impact of each modality is weighted by an impact coefficient determined by grid search. In addition to a comprehensive quantitative assessment, the capabilities of the proposed model and the effectiveness of ensembling over the three modalities are evaluated qualitatively using Grad-CAM visualizations. Experimental results on test data with additional illumination changes verify the high robustness of our system under overexposed and underexposed lighting conditions. Achieving high accuracy (>99.33%) on six benchmark datasets (Massey, Static ASL, NUS II, TSL Fingerspelling, BdSL36v1, and PSL) demonstrates that our system notably outperforms recent state-of-the-art approaches with a minimal number of parameters and high generalization ability on complex datasets. Its promising performance on four different sign languages makes it a feasible system for multi-lingual applications.
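The paper's code is not reproduced on this page; as a rough illustration of the pipeline the abstract describes, the sketch below builds the three input modalities (RGB image, HOG visualization, and the a* channel of L*a*b*) and fuses per-modality class probabilities with grid-searched impact coefficients. The HOG and color-conversion calls are standard scikit-image/OpenCV; the function names, coefficient grid, and probability arrays are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal sketch, assuming standard OpenCV / scikit-image / NumPy APIs.
# Not the authors' code: the CNN itself and the validation arrays are omitted.
import itertools
import cv2
import numpy as np
from skimage.feature import hog

def build_modalities(bgr_image: np.ndarray):
    """Return the three 2D inputs, one per modality-specific CNN."""
    rgb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    # 2D visualization of the HOG descriptor (second return value).
    _, hog_image = hog(gray, orientations=9, pixels_per_cell=(8, 8),
                       cells_per_block=(2, 2), visualize=True)
    # a* channel is index 1 of OpenCV's L*a*b* conversion.
    a_channel = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2Lab)[:, :, 1]
    return rgb, hog_image, a_channel

def grid_search_weights(probs_rgb, probs_hog, probs_a, labels,
                        step=0.05):
    """Find impact coefficients (w1, w2, w3), summing to 1, that maximize
    ensemble accuracy on held-out per-modality softmax outputs."""
    grid = np.round(np.arange(0.0, 1.0 + step, step), 2)
    best_acc, best_w = -1.0, None
    for w1, w2 in itertools.product(grid, repeat=2):
        w3 = round(1.0 - w1 - w2, 2)
        if w3 < 0:
            continue
        fused = w1 * probs_rgb + w2 * probs_hog + w3 * probs_a
        acc = float(np.mean(fused.argmax(axis=1) == labels))
        if acc > best_acc:
            best_acc, best_w = acc, (w1, w2, w3)
    return best_w, best_acc
```

Weighting the softmax outputs rather than retraining a joint network keeps the ensemble cheap: under this reading, the coefficients are fit once on validation predictions and simply applied at inference.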

