A PRISMA-driven Review of Speech Recognition based on English, Mandarin Chinese, Hindi and Urdu Language
Authors: Muhammad Hazique Khatri, Humera Tariq, Maryam Feroze, Ebad Ali, Zeeshan Anjum Junaidi
Journal: International Journal of Information Technology and Computer Science
Publication date: 2024-06-08
DOI: 10.5815/ijitcs.2024.03.04 (https://doi.org/10.5815/ijitcs.2024.03.04)
Citation count: 0
Abstract
The objective of this PRISMA-driven systematic review is to analyze, for the first time, the relative progress of Urdu speech recognition by comparing it with three selected languages: English, Mandarin Chinese, and Hindi. The comparison is organized around the artificial intelligence (AI) building blocks of a speech recognition system, i.e. datasets, feature extraction techniques, experimental design, and acoustic and language models. The languages were selected by number of speakers: the three comparison languages are the world's most widely spoken, while Urdu ranks tenth and is continuously progressing. A total of 176 articles were extracted from the Google Scholar database using custom queries for each language. Of these, 47 articles (5 review articles and 42 research articles) were selected according to our inclusion criteria and after quality assessment checks. The comparative study organizes its findings by four speech types, i.e. spontaneous, continuous, connected words, and isolated words; twenty-one datasets, including benchmarks; MFCC, Triangular, Mel spectrogram, and Log Mel features; state-of-the-art acoustic and language models; and recognition performance. The findings of this systematic literature review direct Urdu and Hindi research towards the best available AI and deep learning practices from English and Mandarin Chinese, primarily Triangular filters, Mel spectrograms, Transformers, and Attention, since these techniques reflect recent trends and have achieved breakthrough performance as evidenced by their word error rate, character error rate, and perplexity.
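For readers unfamiliar with the three performance measures named in the abstract, the following is a minimal Python sketch of how they are commonly defined. It is not taken from the reviewed articles: the function names are hypothetical, word and character error rates are computed here as Levenshtein edit distance divided by reference length, and perplexity is computed as the exponential of the average negative log-probability a language model assigns to each token. Individual studies may tokenize or normalize text differently.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Edit distance over whitespace-separated words, divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference, hypothesis):
    """Edit distance over characters, divided by reference character count."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def perplexity(token_probabilities):
    """exp of the mean negative log-probability assigned to each token."""
    if not token_probabilities:
        raise ValueError("at least one token probability is required")
    mean_nll = -sum(math.log(p) for p in token_probabilities) / len(token_probabilities)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") against 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    print(character_error_rate("abc", "abd"))
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Lower values are better for all three measures. Character error rate is often preferred for languages such as Mandarin Chinese, where word boundaries are not marked by whitespace and a whitespace split as used above would not yield words.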