PereiraASLNet: ASL letter recognition with YOLOX taking Mean Average Precision and Inference Time considerations

Noel Pereira
DOI: 10.1109/AISP53593.2022.9760665 (https://doi.org/10.1109/AISP53593.2022.9760665)
Published in: 2022 2nd International Conference on Artificial Intelligence and Signal Processing (AISP), pp. 1-6
Publication date: 2022-02-12
Citations: 2

Abstract

Sign language allows people to communicate without speaking words aloud. American Sign Language (ASL) emerged at the American School for the Deaf in the early 1800s. It is a natural language that combines facial movements and hand gestures to convey thoughts and ideas, and today it is used predominantly by people who are deaf or hard of hearing. Unlike most languages, ASL is not widely taught, which makes it difficult for the general population to communicate effectively with people who rely on ASL as their sole means of communication. This creates the need for a system that detects and predicts letters from images and can be used in real time to overcome this language barrier. This research develops a sign language recognition system on top of YOLOX, which builds on YOLOv3 and uses convolutional neural networks in its architecture. Using the different YOLOX backbones, this paper introduces six models that span the accuracy versus inference-time spectrum, from the least accurate with the fastest response time to the most accurate with the slowest response time. I therefore propose PereiraASLNet, which trains YOLOX with custom classes for the letters A-Z on a Pascal VOC XML American Sign Language dataset developed by Roboflow. Variants were developed for all of the YOLOX backbone architectures, namely YOLOX-nano, YOLOX-tiny, YOLOX-small, YOLOX-medium, YOLOX-large and YOLOX-xlarge, taking into consideration the mean average precision and inference time of each. The testing mean average precision of the models was found to be 0.9046, 0.9070, 0.9227, 0.9304, 0.9329 and 0.9578, and the testing inference time was found to be 3.50 ms, 12.97 ms, 34.86 ms, 64.56 ms, 83.23 ms and 97.56 ms, respectively.
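The trade-off reported in the abstract can be made concrete with a small lookup. The following is a minimal sketch, not code from the paper: it pairs each YOLOX variant with the testing mean average precision and inference time quoted above, in the order the abstract lists them, and picks the most accurate variant that fits a latency budget. The names `Variant`, `VARIANTS` and `best_within_budget` are hypothetical, introduced only for illustration.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    name: str
    test_map: float      # testing mean average precision quoted in the abstract
    inference_ms: float  # testing inference time in milliseconds quoted in the abstract


# Figures copied from the abstract, paired in the order the variants are listed.
VARIANTS = [
    Variant("YOLOX-nano", 0.9046, 3.50),
    Variant("YOLOX-tiny", 0.9070, 12.97),
    Variant("YOLOX-small", 0.9227, 34.86),
    Variant("YOLOX-medium", 0.9304, 64.56),
    Variant("YOLOX-large", 0.9329, 83.23),
    Variant("YOLOX-xlarge", 0.9578, 97.56),
]


def best_within_budget(budget_ms: float) -> Optional[Variant]:
    """Hypothetical helper: highest-mAP variant whose inference time fits the budget."""
    candidates = [v for v in VARIANTS if v.inference_ms <= budget_ms]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v.test_map)


if __name__ == "__main__":
    for budget in (5.0, 40.0, 100.0):
        choice = best_within_budget(budget)
        print(f"{budget:.0f} ms budget:", choice.name if choice else "no variant fits")
```

Because both accuracy and inference time increase monotonically from nano to xlarge in the reported figures, the choice always reduces to the largest variant that fits the budget.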