A multiplatform speech recognition decoder based on weighted finite-state transducers
Emilian Stoimenov, Tanja Schultz
2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009-12-01
DOI: 10.1109/ASRU.2009.5373404 (https://doi.org/10.1109/ASRU.2009.5373404)
Speech recognition decoders based on static graphs have recently been shown to significantly outperform the traditional approach of prefix-tree expansion in decoding speed [1], [2]. The reduced search effort makes static-graph decoders an attractive alternative for tasks constrained by limited processing power or memory footprint, on devices such as PDAs, internet tablets, and smartphones. In this paper we explore the benefits of decoding with an optimized speech recognition network compared with the fully task-optimized prefix-tree-based decoder IBIS [3]. We designed and implemented a new decoder called SWIFT (Speedy WeIghted Finite-state Transducer), based on WFSTs, with embedded platforms in mind. After describing the design and the network construction and storage process, we present evaluation results on a small task suitable for embedded applications and on a large task, namely the European Parliament Plenary Sessions (EPPS) task from the TC-STAR project [20]. The SWIFT decoder is up to 50% faster than IBIS on both tasks. In addition, SWIFT significantly reduces memory consumption through our network-specific storage layout optimization.
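To make the idea of decoding on a static graph concrete, here is a minimal Python sketch of time-synchronous Viterbi beam search over a tiny precompiled WFST. It is not SWIFT's implementation: the abstract does not describe the search algorithm or the storage layout, so the `Arc` record, the `viterbi_decode` function, the toy graph, and all costs are hypothetical and chosen only for illustration. The sketch also assumes an epsilon-free graph in which every arc consumes exactly one frame.

```python
import math
from collections import namedtuple

# One transition of the static search graph (hypothetical layout):
# input label (acoustic unit), output label (word or None),
# weight (negative log probability), and destination state.
Arc = namedtuple("Arc", "ilabel olabel weight next_state")


def viterbi_decode(arcs, start, finals, frames, beam=10.0):
    """Viterbi beam search over a static, epsilon-free WFST.

    arcs:   dict mapping state -> list of Arc
    finals: dict mapping final state -> final weight
    frames: list of dicts mapping input label -> acoustic cost
    Returns (cost, words) of the best complete path, or (inf, []).
    """
    # Active hypotheses: state -> (accumulated cost, words emitted so far).
    tokens = {start: (0.0, [])}
    for acoustic_costs in frames:
        new_tokens = {}
        for state, (cost, words) in tokens.items():
            for arc in arcs.get(state, []):
                if arc.ilabel not in acoustic_costs:
                    continue
                new_cost = cost + arc.weight + acoustic_costs[arc.ilabel]
                best = new_tokens.get(arc.next_state)
                # Keep only the cheapest hypothesis per destination state.
                if best is None or new_cost < best[0]:
                    out = words + [arc.olabel] if arc.olabel else words
                    new_tokens[arc.next_state] = (new_cost, out)
        if not new_tokens:
            return math.inf, []
        # Beam pruning: drop hypotheses much worse than the current best.
        threshold = min(c for c, _ in new_tokens.values()) + beam
        tokens = {s: t for s, t in new_tokens.items() if t[0] <= threshold}
    # Choose the best hypothesis that ends in a final state.
    best_cost, best_words = math.inf, []
    for state, (cost, words) in tokens.items():
        if state in finals and cost + finals[state] < best_cost:
            best_cost, best_words = cost + finals[state], words
    return best_cost, best_words


# Toy graph with two two-unit words, "yes" (y s) and "no" (n o).
ARCS = {
    0: [Arc("y", "yes", 0.5, 1), Arc("n", "no", 0.7, 2)],
    1: [Arc("s", None, 0.0, 3)],
    2: [Arc("o", None, 0.0, 3)],
}
FINALS = {3: 0.0}
# Made-up acoustic costs for two frames.
FRAMES = [{"y": 1.0, "n": 2.0}, {"s": 0.5, "o": 0.4}]

if __name__ == "__main__":
    print(viterbi_decode(ARCS, 0, FINALS, FRAMES))
```

Because the graph is built ahead of time, the inner loop only follows stored arcs and adds weights. A prefix-tree decoder, by contrast, expands its search structure during decoding, which is the extra search effort the abstract refers to.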