{"title":"Automatic feature extraction from spectrograms for acoustic-phonetic analysis","authors":"E. Edmonds, L. Pan, Stella M. O'Brien","doi":"10.1109/ICPR.1992.201873","DOIUrl":null,"url":null,"abstract":"Proposes a new approach for automatic feature extraction from spectrograms, which is an essential component of acoustic-phonetic analysis in automatic continuous speech recognition. The method comprised four levels: segmentation, pattern classification, feature recognition and labelling, and a post-processor. There were three types of patterns: fuzzy, formant and silence. The extracted features included voice bar, stripes, cut-off and transitions of the first four formants. Some techniques are presented, such as two special distortion functions used in segmentation, and a peak-iterate function to detect the stripes feature. This software has been implemented as part of a speech knowledge interface, which was an expert system for speech analysis for speaker-independent, continuous speech recognition. It has been tested with a set of data chosen from a spectrogram database; the correct detection rate for most features was over 89%, and in some cases was as high as 98%.","PeriodicalId":34917,"journal":{"name":"模式识别与人工智能","volume":"2 1","pages":"701-704"},"PeriodicalIF":0.0000,"publicationDate":"1992-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"模式识别与人工智能","FirstCategoryId":"1093","ListUrlMain":"https://doi.org/10.1109/ICPR.1992.201873","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"Computer Science","Score":null,"Total":0}
Citation count: 1
Abstract
This paper proposes a new approach to automatic feature extraction from spectrograms, an essential component of acoustic-phonetic analysis in automatic continuous speech recognition. The method comprises four levels: segmentation, pattern classification, feature recognition and labelling, and a post-processor. Patterns are classified into three types: fuzzy, formant and silence. The extracted features include the voice bar, stripes, cut-off, and transitions of the first four formants. The techniques presented include two special distortion functions used in segmentation and a peak-iterate function for detecting the stripes feature. The software has been implemented as part of a speech knowledge interface, an expert system for speech analysis in speaker-independent, continuous speech recognition. It has been tested on a set of data chosen from a spectrogram database; the correct detection rate was over 89% for most features and as high as 98% in some cases.
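The abstract names a peak-iterate function for the stripes feature but does not define it. The sketch below is therefore not the authors' method. It is a generic peak-picking stand-in, written under the assumption that stripes appear as roughly evenly spaced energy peaks along the time axis of a segment. The function names `find_peaks` and `looks_striped`, and the parameters `min_height`, `min_peaks` and `max_jitter`, are hypothetical.

```python
from typing import List


def find_peaks(profile: List[float], min_height: float) -> List[int]:
    """Return indices of strict local maxima whose value is >= min_height."""
    peaks = []
    for i in range(1, len(profile) - 1):
        is_local_max = profile[i - 1] < profile[i] > profile[i + 1]
        if is_local_max and profile[i] >= min_height:
            peaks.append(i)
    return peaks


def looks_striped(profile: List[float], min_height: float,
                  min_peaks: int = 3, max_jitter: int = 1) -> bool:
    """Hypothetical heuristic: enough peaks, with nearly regular spacing.

    `profile` is assumed to be the per-frame energy of one segment,
    i.e. spectrogram magnitudes summed over frequency for each time frame.
    """
    peaks = find_peaks(profile, min_height)
    if len(peaks) < min_peaks:
        return False
    # Gaps between consecutive peaks, measured in frames.
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return max(gaps) - min(gaps) <= max_jitter


# Usage with made-up energy values: three evenly spaced peaks.
striped = [0.0, 5.0, 0.0, 0.0, 5.0, 0.0, 0.0, 5.0, 0.0]
flat = [1.0, 1.0, 1.0, 1.0, 1.0]
print(looks_striped(striped, min_height=2.0))
print(looks_striped(flat, min_height=2.0))
```

The regular-spacing check is a design choice made for illustration. A real detector would need thresholds tuned on labelled spectrograms, which the abstract does not provide.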