{"title":"缩小差距:信号增强和模式识别的概率接口","authors":"D. Kolossa","doi":"10.1109/GlobalSIP.2014.7032171","DOIUrl":null,"url":null,"abstract":"Human beings are highly effective at integrating multiple sources of uncertain information, and mounting evidence points to this integration being practically optimal in a Bayesian sense. Yet, in speech processing systems, the two central tasks of speech signal enhancement and of speech or phonetic-state recognition are often performed almost in isolation, with only estimates of mean values being exchanged between them. This paper describes concepts for enhancing the interface of these two systems, considering a range of appropriate probabilistic representations. Examples will illustrate how such interfaces can improve the quality of both components: On the one hand, more reliable pattern recognition can be attained, while on the other hand, enhanced signal quality is achieved when feeding back information from a pattern recognition stage to the signal preprocessing. This latter idea will be described using the example of twin-HMMs, audiovisual speech models that help to recover lost acoustic information by exploiting video data. Overall, it will be shown how broader, probabilistic interfaces between signal processing and pattern recognition can help to achieve better performance in real-world conditions, and to more closely approximate the Bayesian ideal of using all sources of information in accordance with their respective degree of reliability.","PeriodicalId":91429,"journal":{"name":"... IEEE Global Conference on Signal and Information Processing. 
IEEE Global Conference on Signal and Information Processing","volume":"41 1","pages":"517-521"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Narrowing the gap: Probabilistic interfaces for signal enhancement and pattern recognition\",\"authors\":\"D. Kolossa\",\"doi\":\"10.1109/GlobalSIP.2014.7032171\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Human beings are highly effective at integrating multiple sources of uncertain information, and mounting evidence points to this integration being practically optimal in a Bayesian sense. Yet, in speech processing systems, the two central tasks of speech signal enhancement and of speech or phonetic-state recognition are often performed almost in isolation, with only estimates of mean values being exchanged between them. This paper describes concepts for enhancing the interface of these two systems, considering a range of appropriate probabilistic representations. Examples will illustrate how such interfaces can improve the quality of both components: On the one hand, more reliable pattern recognition can be attained, while on the other hand, enhanced signal quality is achieved when feeding back information from a pattern recognition stage to the signal preprocessing. This latter idea will be described using the example of twin-HMMs, audiovisual speech models that help to recover lost acoustic information by exploiting video data. Overall, it will be shown how broader, probabilistic interfaces between signal processing and pattern recognition can help to achieve better performance in real-world conditions, and to more closely approximate the Bayesian ideal of using all sources of information in accordance with their respective degree of reliability.\",\"PeriodicalId\":91429,\"journal\":{\"name\":\"... IEEE Global Conference on Signal and Information Processing. 
IEEE Global Conference on Signal and Information Processing\",\"volume\":\"41 1\",\"pages\":\"517-521\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"... IEEE Global Conference on Signal and Information Processing. IEEE Global Conference on Signal and Information Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/GlobalSIP.2014.7032171\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"... IEEE Global Conference on Signal and Information Processing. IEEE Global Conference on Signal and Information Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/GlobalSIP.2014.7032171","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Narrowing the gap: Probabilistic interfaces for signal enhancement and pattern recognition
Human beings are highly effective at integrating multiple sources of uncertain information, and mounting evidence points to this integration being practically optimal in a Bayesian sense. Yet, in speech processing systems, the two central tasks of speech signal enhancement and of speech or phonetic-state recognition are often performed almost in isolation, with only estimates of mean values being exchanged between them. This paper describes concepts for enhancing the interface of these two systems, considering a range of appropriate probabilistic representations. Examples will illustrate how such interfaces can improve the quality of both components: On the one hand, more reliable pattern recognition can be attained, while on the other hand, enhanced signal quality is achieved when feeding back information from a pattern recognition stage to the signal preprocessing. This latter idea will be described using the example of twin-HMMs, audiovisual speech models that help to recover lost acoustic information by exploiting video data. Overall, it will be shown how broader, probabilistic interfaces between signal processing and pattern recognition can help to achieve better performance in real-world conditions, and to more closely approximate the Bayesian ideal of using all sources of information in accordance with their respective degree of reliability.
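As a minimal illustration of what "using all sources of information in accordance with their respective degree of reliability" can mean, the sketch below fuses two estimates of the same quantity by weighting each with its inverse variance. It assumes both estimates are independent and Gaussian, which the abstract does not state; the function name and the numbers are hypothetical and are not taken from the paper.

```python
def fuse_gaussian_estimates(mean_a, var_a, mean_b, var_b):
    """Fuse two independent Gaussian estimates of the same scalar quantity.

    Each estimate is weighted by its precision (inverse variance), so the
    more reliable source has more influence on the fused result.
    """
    if var_a <= 0 or var_b <= 0:
        raise ValueError("variances must be positive")
    precision_a = 1.0 / var_a
    precision_b = 1.0 / var_b
    # Precisions add, so the fused variance is smaller than either input variance.
    fused_var = 1.0 / (precision_a + precision_b)
    fused_mean = fused_var * (precision_a * mean_a + precision_b * mean_b)
    return fused_mean, fused_var


# Hypothetical values: an unreliable acoustic estimate and a more reliable visual one.
acoustic_mean, acoustic_var = 2.0, 4.0
visual_mean, visual_var = 1.0, 1.0

fused_mean, fused_var = fuse_gaussian_estimates(
    acoustic_mean, acoustic_var, visual_mean, visual_var
)
print(f"fused mean = {fused_mean:.3f}, fused variance = {fused_var:.3f}")
```

Exchanging only mean values would discard `acoustic_var` and `visual_var`, leaving no principled way to decide how much each source should count. Passing the variance along with the mean is one simple form of the broader probabilistic interface the abstract argues for.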