Speech feature extraction using linear Chirplet transform and its applications*

Journal of Information and Telecommunication, vol. 7, no. 1, pp. 376-391 (IF 2.7, Q2, Computer Science, Information Systems). Pub Date: 2023-05-03. DOI: 10.1080/24751839.2023.2207267
H. Do, D. Chau, S. Tran
Citations: 1

ABSTRACT: Most speech processing models begin with feature extraction and then pass the feature vector to the primary processing model. The solution's performance depends mainly on the quality of the feature representation and on the model architecture. In the deep neural network era, much research focuses on designing robust deep network architectures and overlooks the important role of feature representation. This work explores a new approach to designing a speech signal representation in the time-frequency domain via the Linear Chirplet Transform (LCT). The proposed method, which rests on a solid mathematical foundation, provides a feature vector that is sensitive to frequency changes within human speech. This is a promising direction for many applications. Experimental results show that LCT-based features improve on MFCC and Fourier Transform features. In speaker gender recognition, dialect recognition, and speech recognition alike, LCT-based features performed significantly better than MFCC and other features. This result also suggests that LCT-based features are language-independent, so they can be used in a variety of applications.
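The abstract does not spell out how the LCT is computed, so the following is only a minimal sketch of the general technique, not the authors' pipeline. It assumes the common formulation in which each windowed frame is multiplied by a quadratic-phase term before the Fourier transform, so that a chirp rate of zero reduces to an ordinary short-time Fourier transform. The function name `linear_chirplet_transform`, its parameters, and the default frame length and hop size are all hypothetical choices made for illustration.

```python
import numpy as np


def linear_chirplet_transform(x, fs, chirp_rate, frame_len=512, hop=128):
    """Return the magnitude LCT of x, shape (n_frames, frame_len // 2 + 1).

    chirp_rate is in Hz per second; chirp_rate = 0 reduces to an ordinary STFT.
    All names and defaults here are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    # Time axis of one frame in seconds, centred on the frame midpoint.
    tau = (np.arange(frame_len) - frame_len / 2) / fs
    # Quadratic-phase demodulation: cancels a frequency sweep of chirp_rate Hz/s.
    demod = np.exp(-1j * np.pi * chirp_rate * tau ** 2)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    # The demodulated frames are complex, so use the full FFT and keep bins 0..N/2.
    spectrum = np.fft.fft(frames * window * demod, axis=1)
    return np.abs(spectrum[:, :frame_len // 2 + 1])
```

When the chosen chirp rate matches the rate at which a component's frequency is changing, that component's energy collapses into a narrow spectral peak instead of being smeared across many bins, which is one way to read the abstract's claim of sensitivity to frequency change. A feature vector could then be built from, for example, the log magnitudes at one or several chirp rates; that step is likewise an assumption here.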