Hu Jin, Shujun Zhang, Zilong Yang, Qi Han, Jianping Cao
Title: Exploiting Locality Sensitive Hashing - Clustering and gloss feature for sign language production
Journal: Speech Communication, volume 171, Article 103227
Publication type: Journal Article
Publication date: 2025-03-28
DOI: 10.1016/j.specom.2025.103227
URL: https://www.sciencedirect.com/science/article/pii/S0167639325000421
Impact factor: 3.0; JCR: Q2 (Acoustics)
Citations: 0
Abstract
Automatic Sign Language Production (SLP), which converts spoken language sentences into continuous sign pose sequences, is crucial for digital interactive applications of sign language. Long text sequence inputs make current deep learning-based SLP models inefficient and unable to take full advantage of the intricate information conveyed by sign language, so the generated skeleton pose sequences may not be readily comprehensible or acceptable to individuals with hearing impairments. In this paper, we propose a sign language production method that uses Locality Sensitive Hashing-Clustering to automatically aggregate similar and identical embedded word vectors and capture long-distance dependencies, thereby enhancing the accuracy of SLP. In addition, a multi-scale feature extraction network is designed to extract local gloss features and combine them with the embedded text vectors to enrich the text information. Extensive experiments on the challenging RWTH-PHOENIX-Weather 2014T (PHOENIX14T) dataset show that our model outperforms the baseline method.
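The abstract does not spell out how the Locality Sensitive Hashing-Clustering step works, so the following is only a minimal sketch of the general idea, assuming a random-hyperplane (sign) hash, which is one common LSH variant and not necessarily the one used in the paper. The function names `lsh_buckets` and `aggregate`, the parameter `n_planes`, and the toy embeddings are all hypothetical. Vectors that fall on the same side of every hyperplane share a bucket, and each bucket is then collapsed to its mean.

```python
import random
from collections import defaultdict


def lsh_buckets(vectors, n_planes=8, seed=0):
    """Group vector indices by the sign pattern of their projections
    onto random hyperplanes (assumed hashing scheme, not the paper's)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Each hyperplane is represented by a random Gaussian normal vector.
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
    buckets = defaultdict(list)
    for idx, vec in enumerate(vectors):
        # The hash key records on which side of each hyperplane the vector lies.
        key = tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0.0 for plane in planes
        )
        buckets[key].append(idx)
    return dict(buckets)


def aggregate(vectors, buckets):
    """Collapse every bucket into the mean of its member vectors."""
    dim = len(vectors[0])
    centroids = []
    for members in buckets.values():
        centroids.append(
            [sum(vectors[i][d] for i in members) / len(members) for d in range(dim)]
        )
    return centroids


# Toy word embeddings (hypothetical values).
embeddings = [
    [1.0, 0.0, 0.5],
    [1.0, 0.0, 0.5],    # identical to the first vector
    [-1.0, 0.0, -0.5],  # points in the opposite direction
]
buckets = lsh_buckets(embeddings)
centroids = aggregate(embeddings, buckets)
```

In this toy input the two identical vectors always hash to the same bucket, while the opposite-direction vector lands in a different one, so three input vectors are reduced to two aggregated vectors. More hyperplanes give finer buckets; fewer give coarser grouping.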
About the journal:
Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results.
The journal's primary objectives are:
• to present a forum for the advancement of human and human-machine speech communication science;
• to stimulate cross-fertilization between different fields of this domain;
• to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.