{"title":"用于连续手语识别的对抗式自动编码器","authors":"Suhail Muhammad Kamal, Yidong Chen, Shaozi Li","doi":"10.1002/cpe.8220","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>Sign language serves as a vital communication medium for the deaf community, encompassing a diverse array of signs conveyed through distinct hand shapes along with non-manual gestures like facial expressions and body movements. Accurate recognition of sign language is crucial for bridging the communication gap between deaf and hearing individuals, yet the scarcity of large-scale datasets poses a significant challenge in developing robust recognition technologies. Existing works address this challenge by employing various strategies, such as enhancing visual modules, incorporating pretrained visual models, and leveraging multiple modalities to improve performance and mitigate overfitting. However, the exploration of the contextual module, responsible for modeling long-term dependencies, remains limited. This work introduces an <b>A</b>dversarial <b>A</b>utoencoder for <b>C</b>ontinuous <b>S</b>ign <b>L</b>anguage <b>R</b>ecognition, <b>AA-CSLR</b>, to address the constraints imposed by limited data availability, leveraging the capabilities of generative models. The integration of pretrained knowledge, coupled with cross-modal alignment, enhances the representation of sign language by effectively aligning visual and textual features. Through extensive experiments on publicly available datasets (PHOENIX-2014, PHOENIX-2014T, and CSL-Daily), we demonstrate the effectiveness of our proposed method in achieving competitive performance in continuous sign language recognition.</p>\n </div>","PeriodicalId":55214,"journal":{"name":"Concurrency and Computation-Practice & Experience","volume":"36 22","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2024-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Adversarial autoencoder for continuous sign language recognition\",\"authors\":\"Suhail Muhammad Kamal, Yidong Chen, Shaozi Li\",\"doi\":\"10.1002/cpe.8220\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n <p>Sign language serves as a vital communication medium for the deaf community, encompassing a diverse array of signs conveyed through distinct hand shapes along with non-manual gestures like facial expressions and body movements. Accurate recognition of sign language is crucial for bridging the communication gap between deaf and hearing individuals, yet the scarcity of large-scale datasets poses a significant challenge in developing robust recognition technologies. Existing works address this challenge by employing various strategies, such as enhancing visual modules, incorporating pretrained visual models, and leveraging multiple modalities to improve performance and mitigate overfitting. However, the exploration of the contextual module, responsible for modeling long-term dependencies, remains limited. This work introduces an <b>A</b>dversarial <b>A</b>utoencoder for <b>C</b>ontinuous <b>S</b>ign <b>L</b>anguage <b>R</b>ecognition, <b>AA-CSLR</b>, to address the constraints imposed by limited data availability, leveraging the capabilities of generative models. The integration of pretrained knowledge, coupled with cross-modal alignment, enhances the representation of sign language by effectively aligning visual and textual features. 
Through extensive experiments on publicly available datasets (PHOENIX-2014, PHOENIX-2014T, and CSL-Daily), we demonstrate the effectiveness of our proposed method in achieving competitive performance in continuous sign language recognition.</p>\\n </div>\",\"PeriodicalId\":55214,\"journal\":{\"name\":\"Concurrency and Computation-Practice & Experience\",\"volume\":\"36 22\",\"pages\":\"\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2024-07-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Concurrency and Computation-Practice & Experience\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/cpe.8220\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Concurrency and Computation-Practice & Experience","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cpe.8220","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Adversarial autoencoder for continuous sign language recognition
Sign language serves as a vital communication medium for the deaf community, encompassing a diverse array of signs conveyed through distinct hand shapes along with non-manual gestures like facial expressions and body movements. Accurate recognition of sign language is crucial for bridging the communication gap between deaf and hearing individuals, yet the scarcity of large-scale datasets poses a significant challenge in developing robust recognition technologies. Existing works address this challenge by employing various strategies, such as enhancing visual modules, incorporating pretrained visual models, and leveraging multiple modalities to improve performance and mitigate overfitting. However, the exploration of the contextual module, responsible for modeling long-term dependencies, remains limited. This work introduces an Adversarial Autoencoder for Continuous Sign Language Recognition, AA-CSLR, to address the constraints imposed by limited data availability, leveraging the capabilities of generative models. The integration of pretrained knowledge, coupled with cross-modal alignment, enhances the representation of sign language by effectively aligning visual and textual features. Through extensive experiments on publicly available datasets (PHOENIX-2014, PHOENIX-2014T, and CSL-Daily), we demonstrate the effectiveness of our proposed method in achieving competitive performance in continuous sign language recognition.
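The abstract gives no implementation details, so the following is only a minimal, hypothetical PyTorch-style sketch of the two ideas it names: an adversarial autoencoder (an encoder/decoder pair whose latent codes are pushed toward a prior by a discriminator) combined with a simple cross-modal alignment term between visual and textual embeddings. It assumes pre-extracted per-frame visual features and a pooled gloss/text embedding; the module sizes, Gaussian prior, loss weights, and cosine-based alignment loss are illustrative assumptions, not the authors' AA-CSLR implementation, and a full CSLR system would additionally need a recognition objective (e.g., CTC), which is omitted here.

# Hypothetical sketch (not the authors' AA-CSLR code): a minimal adversarial
# autoencoder plus a cross-modal alignment term. All sizes and weights are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps per-frame visual features to a latent sequence representation."""
    def __init__(self, feat_dim=512, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim))
    def forward(self, x):          # x: (batch, time, feat_dim)
        return self.net(x)         # (batch, time, latent_dim)

class Decoder(nn.Module):
    """Reconstructs visual features from the latent sequence."""
    def __init__(self, latent_dim=128, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, feat_dim))
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Distinguishes encoded latent codes from samples of the prior."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, z):
        return self.net(z)         # raw logits

def alignment_loss(visual_z, text_z):
    """Toy cross-modal alignment: pull pooled visual and text embeddings together."""
    v = F.normalize(visual_z.mean(dim=1), dim=-1)   # (batch, latent_dim)
    t = F.normalize(text_z, dim=-1)                 # (batch, latent_dim)
    return (1.0 - (v * t).sum(dim=-1)).mean()       # 1 - cosine similarity

def training_step(enc, dec, disc, visual_feats, text_emb,
                  opt_ae, opt_disc, lambda_adv=0.1, lambda_align=0.5):
    # Discriminator update: "real" = sample from an assumed Gaussian prior,
    # "fake" = encoded (detached) latent codes.
    z_fake = enc(visual_feats).detach()
    z_real = torch.randn_like(z_fake)
    real_logits, fake_logits = disc(z_real), disc(z_fake)
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # Autoencoder update: reconstruct, fool the discriminator, align modalities.
    z = enc(visual_feats)
    recon = dec(z)
    adv_logits = disc(z)
    g_loss = (F.mse_loss(recon, visual_feats)
              + lambda_adv * F.binary_cross_entropy_with_logits(adv_logits, torch.ones_like(adv_logits))
              + lambda_align * alignment_loss(z, text_emb))
    opt_ae.zero_grad(); g_loss.backward(); opt_ae.step()
    return d_loss.item(), g_loss.item()

In this kind of setup the adversarial term acts as a regularizer that shapes the latent space toward the chosen prior, which is one common way generative models are used to compensate for limited training data, while the alignment term encourages visual and textual representations of the same sign sequence to agree.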
Journal Introduction:
Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of:
Parallel and distributed computing;
High-performance computing;
Computational and data science;
Artificial intelligence and machine learning;
Big data applications, algorithms, and systems;
Network science;
Ontologies and semantics;
Security and privacy;
Cloud/edge/fog computing;
Green computing; and
Quantum computing.