An efficient speaker-independent automatic speech recognition by simulation of some properties of human auditory perception
H. Hermansky
ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing, 6 April 1987. DOI: 10.1109/ICASSP.1987.1169803 (https://doi.org/10.1109/ICASSP.1987.1169803)
An auditory model of speech perception, perceptually based linear predictive analysis with the root power sum metric (PLP-RPS), is applied as the front-end of an automatic speech recognizer (ASR). The PLP-RPS front-end is compared with the standard linear predictive front-end with the cepstral metric (LP-CEP), and with LP-RPS and PLP-CEP front-ends. Two-spectral-peak models are the most efficient at modeling linguistic information in speech. Consequently, in speaker-independent ASR, front-ends with a high analysis order are less effective than low-order front-ends. Synthetic speech is used for front-end evaluation. Some of the perceptual inconsistencies of standard LP front-ends are alleviated in PLP front-ends. The PLP-RPS front-end is most sensitive to the harmonic structure of the speech spectrum. Perceptual experiments indicate similar tendencies in human auditory perception.
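
The abstract names the cepstral (CEP) and root power sum (RPS) metrics without defining them. The sketch below illustrates one common reading, stated here as an assumption rather than as the paper's exact formulation: the CEP metric is the Euclidean distance between cepstral coefficients derived from an all-pole (LP) model, and the RPS metric is the same distance with each coefficient weighted by its index. The function names (`lpc_to_cepstrum`, `cep_distance`, `rps_distance`) and the example coefficients are hypothetical and chosen only for illustration.

```python
import math

def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps of the all-pole model 1/A(z),
    where A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p (sign convention assumed)."""
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        a_n = a[n - 1] if n <= p else 0.0
        acc = 0.0
        for k in range(1, n):
            if n - k <= p:  # a_{n-k} is zero beyond the model order
                acc += k * c[k - 1] * a[n - k - 1]
        # Standard recursion: c_n = -a_n - (1/n) * sum_k k * c_k * a_{n-k}
        c.append(-a_n - acc / n)
    return c

def cep_distance(c1, c2):
    """Assumed CEP metric: plain Euclidean distance between cepstra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c1, c2)))

def rps_distance(c1, c2):
    """Assumed RPS metric: Euclidean distance between index-weighted cepstra."""
    return math.sqrt(sum((n * (x - y)) ** 2
                         for n, (x, y) in enumerate(zip(c1, c2), start=1)))

# Two hypothetical first-order models with a single real pole at 0.5 and 0.6.
cep_a = lpc_to_cepstrum([-0.5], n_ceps=3)
cep_b = lpc_to_cepstrum([-0.6], n_ceps=3)
d_cep = cep_distance(cep_a, cep_b)
d_rps = rps_distance(cep_a, cep_b)
```

For a single pole at r, the recursion gives c_n = r^n / n, so the index-weighted coefficient n * c_n equals r^n, the n-th power sum of the model's roots. Under this reading, the index weighting emphasizes higher-order cepstral coefficients relative to the plain cepstral distance.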