Character and Word Level Gesture Recognition of Indian Sign Language
Authors: Rohini K Katti, S. C, Padmashri Desai, Shankar G
Published in: 2023 IEEE 8th International Conference for Convergence in Technology (I2CT), 2023-04-07
DOI: 10.1109/I2CT57861.2023.10126314 (https://doi.org/10.1109/I2CT57861.2023.10126314)
Abstract
Communication is essential to humans because it allows the dissemination of knowledge and the formation of interpersonal connections. We communicate through speech, facial expressions, hand gestures, reading, writing, and sketching, among other means, with speech being the most commonly used. People with speech and hearing disabilities communicate primarily through hand gestures, making them extremely reliant on nonverbal modes of communication. Hearing-impaired persons can communicate via sign language. Around 1 percent (5 million) of the Indian population falls into this group. Indian Sign Language (ISL) is a complete language with its own vocabulary, semantics, lexicon, and a variety of other distinctive linguistic features. In our work, we present methods for ISL recognition at the character and word levels. The Bag of Visual Words (BoVW) technique recognizes ISL at the character level (A-Z, 0-9) with an accuracy of 99 percent. The Indian Lexicon Sign Language Dataset (INCLUDE-50) is used for word-level sign language recognition. The Inception model, a deep Convolutional Neural Network (CNN), is used to learn the spatial features, and an LSTM Recurrent Neural Network (RNN) is used to learn the temporal features of the video. Using the CNN predictions as input to the RNN, we achieved an accuracy of 86.7%. To reduce the cost of training, a Meta-Learning model combined with the LSTM RNN is trained on only 60% of the dataset and obtains an accuracy of 84.4%, reducing the training time by 70% while coming close to the accuracy of the pre-trained model.
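The BoVW encoding step mentioned above can be illustrated with a minimal sketch. The abstract does not state which local descriptor or clustering method was used, so the codebook and descriptors below are hypothetical toy values, and the function name `bovw_histogram` is an assumption made only for illustration. In a typical BoVW pipeline, the codebook would come from clustering local image descriptors, and the resulting histogram would be passed to a classifier.

```python
from math import dist


def bovw_histogram(descriptors, codebook):
    """Encode local descriptors as a normalized histogram over visual words."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Assign the descriptor to its nearest visual word (Euclidean distance).
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    # L1-normalize so images with different keypoint counts stay comparable.
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Hypothetical 2-D toy data (assumption); real descriptors are much
# higher-dimensional and the codebook would be learned by clustering.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
histogram = bovw_histogram(descriptors, codebook)
```

Each image is thereby reduced to a fixed-length vector regardless of how many keypoints it contains, which is what allows a standard classifier to separate the 36 character classes (A-Z, 0-9).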