{"title":"连接主义语音识别系统中说话人自适应的厄米多项式","authors":"S. Siniscalchi, Jinyu Li, Chin-Hui Lee","doi":"10.1109/TASL.2013.2270370","DOIUrl":null,"url":null,"abstract":"Model adaptation techniques are an efficient way to reduce the mismatch that typically occurs between the training and test condition of any automatic speech recognition (ASR) system. This work addresses the problem of increased degradation in performance when moving from speaker-dependent (SD) to speaker-independent (SI) conditions for connectionist (or hybrid) hidden Markov model/artificial neural network (HMM/ANN) systems in the context of large vocabulary continuous speech recognition (LVCSR). Adapting hybrid HMM/ANN systems on a small amount of adaptation data has been proven to be a difficult task, and has been a limiting factor in the widespread deployment of hybrid techniques in operational ASR systems. Addressing the crucial issue of speaker adaptation (SA) for hybrid HMM/ANN system can thereby have a great impact on the connectionist paradigm, which will play a major role in the design of next-generation LVCSR considering the great success reported by deep neural networks - ANNs with many hidden layers that adopts the pre-training technique - on many speech tasks. Current adaptation techniques for ANNs based on injecting an adaptable linear transformation network connected to either the input, or the output layer are not effective especially with a small amount of adaptation data, e.g., a single adaptation utterance. In this paper, a novel solution is proposed to overcome those limits and make it robust to scarce adaptation resources. The key idea is to adapt the hidden activation functions rather than the network weights. The adoption of Hermitian activation functions makes this possible. Experimental results on an LVCSR task demonstrate the effectiveness of the proposed approach.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2270370","citationCount":"79","resultStr":"{\"title\":\"Hermitian Polynomial for Speaker Adaptation of Connectionist Speech Recognition Systems\",\"authors\":\"S. Siniscalchi, Jinyu Li, Chin-Hui Lee\",\"doi\":\"10.1109/TASL.2013.2270370\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Model adaptation techniques are an efficient way to reduce the mismatch that typically occurs between the training and test condition of any automatic speech recognition (ASR) system. This work addresses the problem of increased degradation in performance when moving from speaker-dependent (SD) to speaker-independent (SI) conditions for connectionist (or hybrid) hidden Markov model/artificial neural network (HMM/ANN) systems in the context of large vocabulary continuous speech recognition (LVCSR). Adapting hybrid HMM/ANN systems on a small amount of adaptation data has been proven to be a difficult task, and has been a limiting factor in the widespread deployment of hybrid techniques in operational ASR systems. Addressing the crucial issue of speaker adaptation (SA) for hybrid HMM/ANN system can thereby have a great impact on the connectionist paradigm, which will play a major role in the design of next-generation LVCSR considering the great success reported by deep neural networks - ANNs with many hidden layers that adopts the pre-training technique - on many speech tasks. Current adaptation techniques for ANNs based on injecting an adaptable linear transformation network connected to either the input, or the output layer are not effective especially with a small amount of adaptation data, e.g., a single adaptation utterance. In this paper, a novel solution is proposed to overcome those limits and make it robust to scarce adaptation resources. The key idea is to adapt the hidden activation functions rather than the network weights. The adoption of Hermitian activation functions makes this possible. Experimental results on an LVCSR task demonstrate the effectiveness of the proposed approach.\",\"PeriodicalId\":55014,\"journal\":{\"name\":\"IEEE Transactions on Audio Speech and Language Processing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1109/TASL.2013.2270370\",\"citationCount\":\"79\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Audio Speech and Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TASL.2013.2270370\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Audio Speech and Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TASL.2013.2270370","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Hermitian Polynomial for Speaker Adaptation of Connectionist Speech Recognition Systems
Model adaptation techniques are an efficient way to reduce the mismatch that typically occurs between the training and test condition of any automatic speech recognition (ASR) system. This work addresses the problem of increased degradation in performance when moving from speaker-dependent (SD) to speaker-independent (SI) conditions for connectionist (or hybrid) hidden Markov model/artificial neural network (HMM/ANN) systems in the context of large vocabulary continuous speech recognition (LVCSR). Adapting hybrid HMM/ANN systems on a small amount of adaptation data has been proven to be a difficult task, and has been a limiting factor in the widespread deployment of hybrid techniques in operational ASR systems. Addressing the crucial issue of speaker adaptation (SA) for hybrid HMM/ANN system can thereby have a great impact on the connectionist paradigm, which will play a major role in the design of next-generation LVCSR considering the great success reported by deep neural networks - ANNs with many hidden layers that adopts the pre-training technique - on many speech tasks. Current adaptation techniques for ANNs based on injecting an adaptable linear transformation network connected to either the input, or the output layer are not effective especially with a small amount of adaptation data, e.g., a single adaptation utterance. In this paper, a novel solution is proposed to overcome those limits and make it robust to scarce adaptation resources. The key idea is to adapt the hidden activation functions rather than the network weights. The adoption of Hermitian activation functions makes this possible. Experimental results on an LVCSR task demonstrate the effectiveness of the proposed approach.
期刊介绍:
The IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language. In particular, audio processing also covers auditory modeling, acoustic modeling and source separation. Speech processing also covers speech production and perception, adaptation, lexical modeling and speaker recognition. Language processing also covers spoken language understanding, translation, summarization, mining, general language modeling, as well as spoken dialog systems.