{"title":"Towards Cross-Corpora Generalization for Low-Resource Spoken Language Identification","authors":"Spandan Dey;Md Sahidullah;Goutam Saha","doi":"10.1109/TASLP.2024.3492807","DOIUrl":null,"url":null,"abstract":"Low-resource spoken language identification (LID) systems are prone to poor generalization across unknown domains. In this study, using multiple widely used low-resourced South Asian LID corpora, we conduct an in-depth analysis for understanding the key non-lingual bias factors that create corpora mismatch and degrade LID generalization. To quantify the biases, we extract different data-driven and rule-based summary vectors that capture non-lingual aspects, such as speaker characteristics, spoken context, accents or dialects, recording channels, background noise, and environments. We then conduct a statistical analysis to identify the most crucial non-lingual bias factors and corpora mismatch components that impact LID performance. Following these analyses, we then propose effective bias compensation approaches for the most relevant summary vectors. We generate pseudo-labels using hierarchical clustering over language-domain-gender constrained summary vectors and use them to train adversarial networks with conditioned metric loss. The compensations learn invariance for the corpora mismatches due to the non-lingual biases and help to improve the generalization. With the proposed compensation method, we improve equal error rate up to 5.22% and 8.14% for the same-corpora and cross-corpora evaluations, respectively.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"5040-5050"},"PeriodicalIF":4.1000,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10747273/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ACOUSTICS","Score":null,"Total":0}
引用次数: 0
Abstract
Low-resource spoken language identification (LID) systems are prone to poor generalization across unknown domains. In this study, using multiple widely used low-resourced South Asian LID corpora, we conduct an in-depth analysis for understanding the key non-lingual bias factors that create corpora mismatch and degrade LID generalization. To quantify the biases, we extract different data-driven and rule-based summary vectors that capture non-lingual aspects, such as speaker characteristics, spoken context, accents or dialects, recording channels, background noise, and environments. We then conduct a statistical analysis to identify the most crucial non-lingual bias factors and corpora mismatch components that impact LID performance. Following these analyses, we then propose effective bias compensation approaches for the most relevant summary vectors. We generate pseudo-labels using hierarchical clustering over language-domain-gender constrained summary vectors and use them to train adversarial networks with conditioned metric loss. The compensations learn invariance for the corpora mismatches due to the non-lingual biases and help to improve the generalization. With the proposed compensation method, we improve equal error rate up to 5.22% and 8.14% for the same-corpora and cross-corpora evaluations, respectively.
期刊介绍:
The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering and document indexing and retrieval, as well as general language modeling.