{"title":"线性Chirplet变换的语音特征提取及其应用*","authors":"H. Do, D. Chau, S. Tran","doi":"10.1080/24751839.2023.2207267","DOIUrl":null,"url":null,"abstract":"ABSTRACT Most speech processing models begin with feature extraction and then pass the feature vector to the primary processing model. The solution's performance mainly depends on the quality of the feature representation and the model architecture. Much research focuses on designing robust deep network architecture and ignoring feature representation's important role during the deep neural network era. This work aims to exploit a new approach to design a speech signal representation in the time-frequency domain via Linear Chirplet Transform (LCT). The proposed method provides a feature vector sensitive to the frequency change inside human speech with a solid mathematical foundation. This is a potential direction for many applications. The experimental results show the improvement of the feature based on LCT compared to MFCC or Fourier Transform. In both speaker gender recognition, dialect recognition, and speech recognition, LCT significantly improved compared with MFCC and other features. This result also implies that the feature based on LCT is independent of language, so it can be used in various applications.","PeriodicalId":32180,"journal":{"name":"Journal of Information and Telecommunication","volume":"7 1","pages":"376 - 391"},"PeriodicalIF":2.7000,"publicationDate":"2023-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Speech feature extraction using linear Chirplet transform and its applications*\",\"authors\":\"H. Do, D. Chau, S. Tran\",\"doi\":\"10.1080/24751839.2023.2207267\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"ABSTRACT Most speech processing models begin with feature extraction and then pass the feature vector to the primary processing model. The solution's performance mainly depends on the quality of the feature representation and the model architecture. Much research focuses on designing robust deep network architecture and ignoring feature representation's important role during the deep neural network era. This work aims to exploit a new approach to design a speech signal representation in the time-frequency domain via Linear Chirplet Transform (LCT). The proposed method provides a feature vector sensitive to the frequency change inside human speech with a solid mathematical foundation. This is a potential direction for many applications. The experimental results show the improvement of the feature based on LCT compared to MFCC or Fourier Transform. In both speaker gender recognition, dialect recognition, and speech recognition, LCT significantly improved compared with MFCC and other features. This result also implies that the feature based on LCT is independent of language, so it can be used in various applications.\",\"PeriodicalId\":32180,\"journal\":{\"name\":\"Journal of Information and Telecommunication\",\"volume\":\"7 1\",\"pages\":\"376 - 391\"},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2023-05-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Information and Telecommunication\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1080/24751839.2023.2207267\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Information and Telecommunication","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/24751839.2023.2207267","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Speech feature extraction using linear Chirplet transform and its applications*
ABSTRACT Most speech processing models begin with feature extraction and then pass the feature vector to the primary processing model. The solution's performance mainly depends on the quality of the feature representation and the model architecture. Much research focuses on designing robust deep network architecture and ignoring feature representation's important role during the deep neural network era. This work aims to exploit a new approach to design a speech signal representation in the time-frequency domain via Linear Chirplet Transform (LCT). The proposed method provides a feature vector sensitive to the frequency change inside human speech with a solid mathematical foundation. This is a potential direction for many applications. The experimental results show the improvement of the feature based on LCT compared to MFCC or Fourier Transform. In both speaker gender recognition, dialect recognition, and speech recognition, LCT significantly improved compared with MFCC and other features. This result also implies that the feature based on LCT is independent of language, so it can be used in various applications.