{"title":"AbS for ASR: A New Computational Perspective","authors":"V. R. Lakkavalli","doi":"10.1109/SPCOM55316.2022.9840830","journal":{"name":"2022 IEEE International Conference on Signal Processing and Communications (SPCOM)"},"publicationDate":"2022-07-11","publicationTypes":"Journal Article","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/SPCOM55316.2022.9840830"}
In this paper, the classical paradigm of analysis by synthesis (AbS) is revisited for automatic speech recognition (ASR), with the aim of enhancing ASR performance. Although the AbS paradigm holds promise for explaining the process of perception, as proposed in Motor Theory, many challenges remain to be addressed before a practical ASR system can be built on it. In this paper, i) a general architecture for ASR using AbS is presented; and ii) a new AbS-trellis is proposed, which realizes the AbS loop by combining a transition (coarticulation) cost with a classification cost to search for the best decoding path. Initial results on the TIMIT database show that substitution errors may be reduced by employing AbS. This shows promise for using AbS in ASR, and the results further highlight the need to identify an invariant phonetic representation space, a better distance metric (or coarticulation modelling), and a better synthesizer.
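The abstract does not specify how the AbS-trellis is searched. As a rough illustration of the general idea of combining a classification cost with a transition cost to pick a decoding path, the following is a minimal sketch using a standard Viterbi-style dynamic program. It is not the paper's algorithm: the function `best_path`, the cost tables, and the `weight` parameter are hypothetical names introduced here, and the costs are made-up numbers rather than TIMIT results.

```python
from typing import List, Sequence, Tuple


def best_path(class_cost: Sequence[Sequence[float]],
              trans_cost: Sequence[Sequence[float]],
              weight: float = 1.0) -> Tuple[List[int], float]:
    """Return the minimum-cost label sequence through a trellis.

    class_cost[t][k]: assumed cost of labelling segment t as phone k.
    trans_cost[j][k]: assumed cost of moving from phone j to phone k
                      (a stand-in for a coarticulation cost).
    weight:           assumed scale applied to the transition term.
    """
    n_labels = len(class_cost[0])
    cost = list(class_cost[0])   # best cost of any path ending in each label
    back = []                    # back-pointers, one list per later step

    for t in range(1, len(class_cost)):
        new_cost, pointers = [], []
        for k in range(n_labels):
            # Pick the predecessor that minimises accumulated + transition cost.
            j_best = min(range(n_labels),
                         key=lambda j: cost[j] + weight * trans_cost[j][k])
            new_cost.append(cost[j_best]
                            + weight * trans_cost[j_best][k]
                            + class_cost[t][k])
            pointers.append(j_best)
        cost = new_cost
        back.append(pointers)

    # Trace back from the cheapest final label.
    k = min(range(n_labels), key=lambda i: cost[i])
    total = cost[k]
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path, total


# Toy example: two phone labels, three segments (illustrative numbers only).
class_cost = [[1.0, 2.0], [2.0, 1.0], [1.0, 3.0]]
trans_cost = [[0.0, 5.0], [5.0, 0.0]]

print(best_path(class_cost, trans_cost, weight=0.0))  # classification only
print(best_path(class_cost, trans_cost, weight=1.0))  # with transition cost
```

In this toy case, ignoring the transition term lets the middle segment take whichever label is locally cheapest, while weighting the transition term makes label changes expensive enough that the path keeps a single label throughout. This is only meant to show how a transition cost can override a locally preferred classification; how the paper defines and measures its costs is not stated in the abstract.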