Spoken Language Identification Using Prosody, Phonotactics, and Acoustics: A Review

Irshad Ahmad Thukroo, Rumaan Bashir, K. Giri
J. Inf. Knowl. Manag. (Journal Article), published 2022-07-14. DOI: 10.1142/s0219649222500575
Citation count: 0

Abstract

Spoken language identification (LID) is the task of identifying the language spoken in a speech segment regardless of its length (duration and speaking rate), context (topic and emotion), and speaker (gender, age, and geographic region). Over the past few decades, information technology has opened new possibilities, largely by simplifying people's everyday lives, and one of its key contributions has been the application of artificial intelligence. Advances in artificial intelligence have driven the growth of computational linguistics and natural language processing (NLP), which provide frameworks for intelligently processing spoken-language knowledge and have moved human-machine interaction into a new stage. In this context, speech, our most basic and generally preferred mode of communication, has become one of the most important interfaces. Spoken language identification serves as a front end for several technologies, such as multilingual dialogue systems, speech translation software, multilingual speech recognition, spoken term extraction, and speech generation systems. This paper reviews and summarises the different levels of information that can be used for language identification. It provides a broad study of acoustic, phonetic, and prosodic features and of the classifiers that have been used for spoken language identification, with particular attention to Indian languages. It examines existing spoken language identification models built on prosodic, phonotactic, acoustic, and deep learning approaches, together with the datasets they use and the performance measures used to evaluate them. It also highlights the main features of these models and the challenges they face. Finally, the review analyses the efficiency of these models to help researchers propose new language identification models for speech signals.
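Phonotactics is one of the information levels the review covers. As a rough illustration only, assuming phone sequences have already been produced by some phone recognizer (in the spirit of the classic "phone recognition followed by language modelling", or PRLM, setup), a per-language phone bigram model can score a test utterance, and the language with the highest log-likelihood is chosen. The class and function names, the add-one smoothing, and the toy phone strings below are hypothetical illustrations, not the reviewed paper's method or data.

```python
import math
from collections import Counter, defaultdict


class PhoneBigramLM:
    """Add-one (Laplace) smoothed phone bigram model for a single language."""

    def __init__(self, phone_set):
        # Predictable symbols: every phone plus the end-of-utterance marker.
        self.vocab = set(phone_set) | {"</s>"}
        # bigrams[prev][cur] = how often phone `cur` followed phone `prev`.
        self.bigrams = defaultdict(Counter)

    def train(self, sequences):
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[prev][cur] += 1

    def log_prob(self, seq):
        """Log-likelihood of a phone sequence under this language's model."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            counts = self.bigrams.get(prev, Counter())
            # Add-one smoothing keeps unseen transitions from scoring log(0).
            total += math.log((counts[cur] + 1) / (sum(counts.values()) + len(self.vocab)))
        return total


def train_models(train_data):
    """Fit one bigram model per language over a shared phone inventory."""
    phones = {p for seqs in train_data.values() for seq in seqs for p in seq}
    models = {}
    for lang, seqs in train_data.items():
        lm = PhoneBigramLM(phones)
        lm.train(seqs)
        models[lang] = lm
    return models


def identify(models, seq):
    """Pick the language whose phone model gives the highest log-likelihood."""
    return max(models, key=lambda lang: models[lang].log_prob(seq))


# Hypothetical toy "phone recognizer output"; not real language data.
TOY_TRAIN = {
    "lang_A": [["k", "a", "t", "a"], ["t", "a", "k", "a"], ["m", "a", "k", "a"]],
    "lang_B": [["s", "t", "r", "i"], ["s", "k", "r", "i"], ["t", "r", "i", "s"]],
}

if __name__ == "__main__":
    models = train_models(TOY_TRAIN)
    print(identify(models, ["k", "a", "m", "a"]))  # lang_A
```

In a practical system the phone sequences would come from a trained phone recognizer, and higher-order n-grams with stronger smoothing would typically replace add-one bigrams; the sketch only shows how phonotactic statistics can separate languages.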