{"title":"Fine-grained cross-modality consistency mining for Continuous Sign Language Recognition","authors":"Zhenghao Ke, Sheng Liu, Yuan Feng","doi":"10.1016/j.patrec.2025.02.017","DOIUrl":null,"url":null,"abstract":"<div><div>Continuous Sign Language Recognition (CSLR) involves the detection of sequential glosses from visual sign inputs. While current CSLR methods perform well with high-frequency glosses – primarily functors such as conjunctions and pronouns – they struggle to accurately recognize low-frequency content words, which are essential for conveying meaningful information. This challenge arises from limitations in existing datasets and the Connectionist Temporal Classification (CTC) training procedure, leading to poor generalization to diverse linguistic structures. As a result, CSLR systems face limited applicability in real-world scenarios. In this work, we introduce the Fine-Grained Cross-modality Consistency (FGXM) loss, a novel approach designed to align visual and linguistic models. The FGXM loss encourages consistency between visual and language representations, improving the model’s ability to integrate visual context with linguistic understanding. We also propose the unweighted word error rate (uWER), an unbiased metric for CSLR performance. Unlike the conventional word error rate (WER), uWER provides a fairer evaluation by addressing the frequency imbalance between content words and functors, offering a more accurate measure of a model’s real-world effectiveness. We extensively evaluate our approach across multiple datasets and models, demonstrating significant improvements in both accuracy and data efficiency.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"191 ","pages":"Pages 23-30"},"PeriodicalIF":3.3000,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition Letters","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167865525000595","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citation count: 0
Abstract
Continuous Sign Language Recognition (CSLR) involves the detection of sequential glosses from visual sign inputs. While current CSLR methods perform well with high-frequency glosses – primarily functors such as conjunctions and pronouns – they struggle to accurately recognize low-frequency content words, which are essential for conveying meaningful information. This challenge arises from limitations in existing datasets and the Connectionist Temporal Classification (CTC) training procedure, leading to poor generalization to diverse linguistic structures. As a result, CSLR systems face limited applicability in real-world scenarios. In this work, we introduce the Fine-Grained Cross-modality Consistency (FGXM) loss, a novel approach designed to align visual and linguistic models. The FGXM loss encourages consistency between visual and language representations, improving the model’s ability to integrate visual context with linguistic understanding. We also propose the unweighted word error rate (uWER), an unbiased metric for CSLR performance. Unlike the conventional word error rate (WER), uWER provides a fairer evaluation by addressing the frequency imbalance between content words and functors, offering a more accurate measure of a model’s real-world effectiveness. We extensively evaluate our approach across multiple datasets and models, demonstrating significant improvements in both accuracy and data efficiency.
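To make the motivation for uWER concrete, the sketch below contrasts the standard WER, which weights every reference gloss equally and is therefore dominated by frequent functors, with a frequency-balanced variant that macro-averages error rates over gloss types. The abstract does not give the paper's exact uWER formula, so the macro-averaging used here is an illustrative assumption, not the authors' definition.

```python
# Hedged sketch: standard WER vs. a hypothetical frequency-balanced variant.
# The per-type macro-averaging below is an assumption for illustration only;
# it is not the uWER definition from the paper.
from collections import defaultdict

def align(ref, hyp):
    """Levenshtein alignment; returns (op, ref_tok, hyp_tok) triples,
    with op in {'ok', 'sub', 'del', 'ins'}."""
    n, m = len(ref), len(hyp)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match / substitution
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (0 if ref[i - 1] == hyp[j - 1] else 1):
            ops.append(('ok' if ref[i - 1] == hyp[j - 1] else 'sub', ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(('del', ref[i - 1], None))
            i -= 1
        else:
            ops.append(('ins', None, hyp[j - 1]))
            j -= 1
    return list(reversed(ops))

def wer(pairs):
    """Standard WER: total edits / total reference glosses (frequency-weighted)."""
    edits = total = 0
    for ref, hyp in pairs:
        edits += sum(op != 'ok' for op, _, _ in align(ref, hyp))
        total += len(ref)
    return edits / total

def unweighted_wer(pairs):
    """Hypothetical balanced metric: per-gloss-type error rate, macro-averaged,
    so a rare content word counts as much as a frequent functor."""
    errors, counts = defaultdict(int), defaultdict(int)
    for ref, hyp in pairs:
        for op, r_tok, _ in align(ref, hyp):
            if op == 'ins':
                continue  # insertions have no reference gloss to attribute to
            counts[r_tok] += 1
            errors[r_tok] += op != 'ok'
    return sum(errors[g] / counts[g] for g in counts) / len(counts)

if __name__ == '__main__':
    # Toy gloss sequences: frequent functors ('I', 'GO') are recognized,
    # rare content words ('HOSPITAL', 'AIRPORT') are substituted.
    pairs = [
        (['I', 'GO', 'HOSPITAL'], ['I', 'GO', 'HOME']),
        (['I', 'GO', 'SCHOOL'],   ['I', 'GO', 'SCHOOL']),
        (['I', 'GO', 'AIRPORT'],  ['I', 'GO', 'HOME']),
    ]
    print(f"WER : {wer(pairs):.3f}")             # 0.222, diluted by frequent glosses
    print(f"uWER: {unweighted_wer(pairs):.3f}")  # 0.400, rare content words weigh equally
```

On this toy example the conventional WER stays low because the frequent functors are recognized correctly, while the balanced variant exposes that the model misses most rare content words, which is the kind of bias the paper's uWER is designed to surface.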
Journal Description:
Pattern Recognition Letters aims at rapid publication of concise articles of a broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.