Enhancement of EMG-based Thai number words classification using frame-based time domain features with stacking filter
N. Srisuwan, Michael Wand, M. Janke, P. Phukpattaranont, Tanja Schultz, C. Limsakul
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific, December 2014. DOI: 10.1109/APSIPA.2014.7041549
Citations: 1
Abstract
In order to overcome problems inherent in classical automatic speech recognition (e.g., ambient noise and loss of privacy), electromyography (EMG) signals recorded from the speech production muscles were used in place of the acoustic speech signal. We aim to investigate EMG-based speech recognition for the Thai language. In earlier work, we used five channels of EMG from the facial and neck muscles to classify 11 Thai number words with a neural network classifier; 15 time-domain and frequency-domain features were employed for feature extraction, and we obtained an average accuracy of 89.45% for audible speech and 78.55% for silent speech. However, these results leave room for improvement. This paper proposes to improve the accuracy of EMG-based Thai number word classification. Ten subjects uttered the 11 words in both audible and silent speech while five channels of EMG were captured. Frame-based time-domain features with a stacking filter were computed in the feature extraction stage. LDA was then applied to reduce the dimensionality of the feature vector, and a Hidden Markov Model (HMM) was employed in the classification stage. The results show that this combination of feature extraction, dimensionality reduction, and classification improves the average accuracy for audible speech by 3% absolute compared to the earlier work. We achieved average classification rates of 92.45% and 75.73% for audible and silent speech, respectively.
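The abstract outlines a front-end of per-frame time-domain EMG features, a stacking filter that appends neighbouring frames as context, and LDA for dimensionality reduction before HMM classification. Below is a minimal sketch of that pipeline in Python. The specific feature set, frame length (27 ms), frame shift (10 ms), context width (±5 frames), sampling rate, and LDA target dimension used here are illustrative assumptions, not the configuration reported in the paper, and the frame labels are placeholders standing in for a forced alignment.

```python
# Sketch: frame-based time-domain EMG features + stacking filter + LDA.
# Values (frame size, context width, feature set) are assumed for illustration.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def frame_signal(x, frame_len, frame_shift):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    idx = np.arange(frame_len)[None, :] + frame_shift * np.arange(n_frames)[:, None]
    return x[idx]

def td_features(frames):
    """Simple per-frame time-domain features: mean, power, zero-crossing rate."""
    mean = frames.mean(axis=1)
    power = (frames ** 2).mean(axis=1)
    zcr = (np.diff(np.sign(frames), axis=1) != 0).mean(axis=1)
    return np.stack([mean, power, zcr], axis=1)            # (n_frames, 3)

def stack_context(feats, k):
    """Stacking filter: concatenate each frame with its k left/right neighbours."""
    padded = np.pad(feats, ((k, k), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(feats)] for i in range(2 * k + 1)])

def extract_utterance_features(emg, fs=1000, frame_ms=27, shift_ms=10, context=5):
    """emg: (n_samples, n_channels) multi-channel EMG recording of one utterance."""
    frame_len = int(fs * frame_ms / 1000)
    frame_shift = int(fs * shift_ms / 1000)
    per_channel = [td_features(frame_signal(emg[:, c], frame_len, frame_shift))
                   for c in range(emg.shape[1])]
    feats = np.hstack(per_channel)                         # concatenate the 5 channels
    return stack_context(feats, context)                   # (n_frames, stacked_dim)

# Frame-level LDA reduction before HMM training; synthetic data and
# placeholder labels are used purely to make the sketch runnable.
rng = np.random.default_rng(0)
emg = rng.standard_normal((3000, 5))                       # 3 s of 5-channel EMG at 1 kHz
X = extract_utterance_features(emg)
y = np.arange(len(X)) % 11                                 # hypothetical frame labels (11 words)
X_lda = LinearDiscriminantAnalysis(n_components=10).fit_transform(X, y)
print(X_lda.shape)                                         # reduced frame vectors for the HMM stage
```

In a full system, the LDA-reduced frame vectors would then be modelled by per-word HMMs (as stated in the abstract); the HMM training and decoding toolkit is not specified there and is therefore omitted from this sketch.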