R. Patterson, J. Holdsworth, P. Thurston, T. Robinson
Title: Auditory Images As Input For Speech Recognition Systems
Published in: Final Program and Paper Summaries 1991 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics
DOI: 10.1109/ASPAA.1991.634090
Citations: 1
Abstract
Over the past decade, hearing scientists have developed a number of time-domain models of the processing performed by the cochlea in an effort to develop a reasonably accurate multi-channel representation of the pattern of neural activity flowing from the cochlea up the auditory nerve to the cochlear nucleus [1]. It is often assumed that peripheral auditory processing ends at the output of the cochlea and that the pattern of activity in the auditory nerve is in some sense what we hear. In reality, this neural activity pattern (NAP) is not a good representation of our auditory sensations because it includes phase differences that we do not hear and it does not include auditory temporal integration (TI). As a result, several of the models have been extended to include periodicity-sensitive TI [2], [3], [4], which converts the fast-flowing neural activity pattern into a form that is much more like the auditory images we experience in response to sounds. When these models are applied to speech sounds, the auditory images of vowels reveal an elaborate formant structure that is absent in the more traditional representation of speech, the spectrogram. An example is presented on the left in the figure; it is the auditory image of the stationary part of the vowel /ae/ as in 'bab' [4]. The abscissa of the auditory image is 'temporal integration interval' and each line of the image shows the activity in one frequency channel of the auditory model. In general terms, activity on a vertical line in the auditory image shows that there is a correlation in the sound at that temporal interval. The concentrations of activity are the formants of the vowel.
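The layout described above (one row per frequency channel, time interval along the abscissa, aligned activity where the sound is correlated at that interval) can be illustrated with a rough sketch. This is not the periodicity-sensitive TI of the models cited in the abstract: it substitutes a plain per-channel autocorrelation, a brick-wall FFT filterbank in place of a cochlear model, and half-wave rectification as a crude NAP. The function names, band edges, and test signal are all hypothetical choices made for illustration.

```python
import numpy as np

def bandpass_fft(signal, fs, lo, hi):
    """Crude brick-wall band-pass via the FFT (assumed stand-in for a cochlear filter)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

def correlogram(signal, fs, band_edges, max_lag, window):
    """Rows are frequency channels, columns are lags (in samples).

    Each row is the autocorrelation of the half-wave rectified band output,
    computed over a fixed window. This is an assumed simplification, not the
    temporal integration mechanism of the cited models.
    """
    rows = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        nap = np.maximum(bandpass_fft(signal, fs, lo, hi), 0.0)  # crude NAP
        rows.append([np.dot(nap[:window], nap[lag:lag + window])
                     for lag in range(max_lag + 1)])
    return np.array(rows)

# Hypothetical test signal: 30 equal-amplitude harmonics of 100 Hz.
fs = 16000
f0 = 100.0
t = np.arange(int(0.2 * fs)) / fs
signal = sum(np.cos(2 * np.pi * k * f0 * t) for k in range(1, 31))

# Six arbitrary bands covering the harmonics (not a cochlear frequency scale).
band_edges = [50, 300, 600, 1000, 1500, 2200, 3100]

# The window spans 18 whole periods so every lag sees the same signal energy.
image = correlogram(signal, fs, band_edges, max_lag=240, window=2880)

# Summing down the channels exposes the lag at which all channels line up.
summary = image.sum(axis=0)
period_lag = 1 + int(np.argmax(summary[1:]))  # skip the trivial lag-0 peak
print(period_lag, fs / f0)
```

In this toy case every channel repeats at the 10 ms period of the sound, so the channels line up on a vertical ridge at that lag. In a real vowel, the channels near the formant frequencies would carry the most activity along such ridges.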