Vocal tract length normalization for vowel recognition in low resource languages
Pub Date: 2014-10-01 | DOI: 10.1109/IALP.2014.6973516
Shubham Sharma, Maulik C. Madhavi, H. Patil
Vocal Tract Length Normalization (VTLN) is used to design vocal-tract-length-normalized Automatic Speech Recognition (ASR) systems. It improves ASR performance by taking into account the physiological differences among speakers. Recently, a number of speech recognition applications have been developed for Indian languages. In this paper, we use a state-of-the-art VTLN method based on a maximum likelihood approach. A vowel recognition system has been developed for two low-resource Indian languages, viz., Gujarati and Marathi. Appropriate warping factors are obtained for all speakers considered in the training and testing procedures. An improvement in vowel recognition performance is observed compared to the baseline using state-of-the-art Mel Frequency Cepstral Coefficients (MFCC).
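A minimal sketch of the maximum-likelihood warping-factor search that underlies VTLN, in its usual grid-search form: each candidate factor warps the speaker's spectra, and the factor maximizing the likelihood under a reference model is kept. The linear spectrum warp and the single-Gaussian "model" below are illustrative stand-ins, not the paper's HMM-based setup.

```python
import numpy as np

# Candidate warping factors: the usual VTLN grid around 1.0.
WARP_FACTORS = np.arange(0.80, 1.21, 0.02)

def warp_spectrum(frames, alpha):
    """Linearly stretch/compress the frequency axis of spectral frames.
    A stand-in for warping the mel filterbank edges as done in filterbank-based VTLN."""
    n_bins = frames.shape[1]
    src = np.clip(np.arange(n_bins) * alpha, 0, n_bins - 1)
    return np.stack([np.interp(src, np.arange(n_bins), f) for f in frames])

def diag_gauss_loglik(frames, mean, var):
    """Log-likelihood under one diagonal Gaussian, a toy stand-in for the HMM acoustic model."""
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (frames - mean) ** 2 / var)))

def estimate_warp_factor(frames, mean, var):
    """Grid search: keep the warping factor that maximizes the data likelihood."""
    scores = {a: diag_gauss_loglik(warp_spectrum(frames, a), mean, var) for a in WARP_FACTORS}
    return max(scores, key=scores.get)

# Toy usage: random "speaker" frames scored against a reference mean/variance.
rng = np.random.default_rng(0)
frames = rng.random((200, 40))
alpha = estimate_warp_factor(frames, mean=frames.mean(0), var=frames.var(0) + 1e-6)
print(f"selected warping factor: {alpha:.2f}")
```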
{"title":"Vocal tract length normalization for vowel recognition in low resource languages","authors":"Shubham Sharma, Maulik C. Madhavi, H. Patil","doi":"10.1109/IALP.2014.6973516","DOIUrl":"https://doi.org/10.1109/IALP.2014.6973516","url":null,"abstract":"Vocal Tract Length Normalization (VTLN) is used to design vocal tract length normalized Automatic Speech Recognition (ASR) systems. It has led to improvement in the performance of ASR systems by taking into account the physiological differences among speakers. Recently, a number of speech recognition applications are being developed for Indian languages. In this paper, we use state-of-the-art method for VTLN based on maximum likelihood approach. A vowel recognition system has been developed for two low resourced Indian languages, viz., Gujarati and Marathi. Appropriate warping factors have been obtained for all speakers considered for training and testing procedures. An improvement in the performance of vowel recognition is observed as compared to state-of-the-art Mel Frequency Cepstral Coefficients (MFCC).","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131276926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Extracting parallel phrases from comparable corpora
Pub Date: 2014-10-01 | DOI: 10.1109/IALP.2014.6973501
Jiexin Zhang, Hailong Cao, T. Zhao
State-of-the-art statistical machine translation (SMT) models are trained on parallel corpora. However, traditional SMT loses its power for language pairs with few bilingual resources. This paper proposes a novel method that treats phrase extraction as a classification task. We first automatically generate training and testing phrase pairs for the classifier. Then, we train an SVM classifier that determines whether a phrase pair is parallel or non-parallel. The proposed approach is evaluated on a Chinese-English translation task. Experimental results show that the precision of the classifier on the test sets is above 70% and the accuracy is above 98%. The quality of the extracted data is also evaluated by measuring its impact on the performance of a state-of-the-art SMT system built with a small parallel corpus, where it shows better results than the baseline system.
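A hedged sketch of the classification step described above, using scikit-learn's SVC. The two features (length ratio and dictionary coverage), the toy dictionary, and the synthetic training data are assumptions for illustration; the paper's actual feature set is not reproduced here.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score

def pair_features(src_phrase, tgt_phrase, bilingual_dict):
    """Hypothetical features for a candidate phrase pair: length ratio and the
    fraction of source words that have a dictionary translation on the target side."""
    src, tgt = src_phrase.split(), tgt_phrase.split()
    len_ratio = len(src) / max(len(tgt), 1)
    covered = sum(1 for w in src if bilingual_dict.get(w, set()) & set(tgt))
    return [len_ratio, covered / max(len(src), 1)]

print(pair_features("猫 坐", "the cat sat", {"猫": {"cat"}, "坐": {"sat", "sit"}}))

# Toy training data standing in for automatically generated parallel (1) and
# non-parallel (0) phrase pairs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([1.0, 0.8], 0.1, (200, 2)),   # parallel-looking pairs
               rng.normal([1.6, 0.2], 0.3, (200, 2))])  # non-parallel-looking pairs
y = np.array([1] * 200 + [0] * 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("precision:", precision_score(y_te, pred), "accuracy:", accuracy_score(y_te, pred))
```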
{"title":"Extracting parallel phrases from comparable corpora","authors":"Jiexin Zhang, Hailong Cao, T. Zhao","doi":"10.1109/IALP.2014.6973501","DOIUrl":"https://doi.org/10.1109/IALP.2014.6973501","url":null,"abstract":"The state-of-the-art statistical machine translation models are trained with the parallel corpora. However, the traditional SMT loses its power when it comes to language pairs with few bilingual resources. This paper proposes a novel method that treats the phrase extraction as a classification task. We first automatically generate the training and testing phrase pairs for the classifier. Then, we train a SVM classifier which can determine the phrase pairs are either parallel or non-parallel. The proposed approach is evaluated on the translation task of Chinese-English. Experimental results show that the precision of the classifier on test sets is above 70% and the accuracy is above 98% The quality of the extracted data is also evaluated by measuring the impact on the performance of a state-of-the-art SMT system, which is built with a small parallel corpus. It shows better results over the baseline system.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121309177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A spectral transition measure based MELCEPSTRAL features for obstruent detection
Pub Date: 2014-10-01 | DOI: 10.1109/IALP.2014.6973511
Bhavik B. Vachhani, Kewal D. Malde, Maulik C. Madhavi, H. Patil
Obstruents are key landmark events found in the speech signal. In this paper, we propose the use of a spectral transition measure (STM) to locate obstruents in continuous speech. The proposed approach does not take into account any prior information (such as the phonetic sequence, speech transcription, or number of obstruents in the speech); hence, it is unsupervised and unconstrained. We propose the use of state-of-the-art Mel Frequency Cepstral Coefficients (MFCC)-based features to capture spectral transitions for the obstruent detection task, since more spectral transition is expected in the vicinity of obstruents. The entire experimental setup is developed on the TIMIT database. The detection efficiency and estimated probability are around 77% and 0.77, respectively (with a 30 ms agreement duration and an STM threshold of 0.4).
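A rough sketch of a spectral transition measure computed from MFCC trajectories, in the spirit of the approach above: per frame, the mean squared regression slope of the cepstral coefficients over a short window. The use of librosa for MFCC extraction, the window size, and the detection threshold are assumptions; the paper's 0.4 threshold applies to its own normalized STM scale.

```python
import numpy as np
import librosa

def spectral_transition_measure(mfcc, w=4):
    """STM per frame: mean squared regression slope of each cepstral coefficient's
    trajectory over a +/- w frame window (in the spirit of Furui's measure)."""
    n_coeff, n_frames = mfcc.shape
    taps = np.arange(-w, w + 1)
    denom = float(np.sum(taps ** 2))
    stm = np.zeros(n_frames)
    for t in range(w, n_frames - w):
        slopes = mfcc[:, t - w:t + w + 1] @ taps / denom
        stm[t] = np.mean(slopes ** 2)
    return stm

# Toy usage on 1 s of noise as a stand-in signal; frame parameters are librosa defaults,
# and the 50%-of-max threshold below is illustrative, not the paper's 0.4 setting.
sr = 16000
y = np.random.default_rng(0).standard_normal(sr)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
stm = spectral_transition_measure(mfcc)
candidates = np.where(stm > 0.5 * stm.max())[0]
print("candidate obstruent/transition frames:", candidates)
```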
{"title":"A spectral transition measure based MELCEPSTRAL features for obstruent detection","authors":"Bhavik B. Vachhani, Kewal D. Malde, Maulik C. Madhavi, H. Patil","doi":"10.1109/IALP.2014.6973511","DOIUrl":"https://doi.org/10.1109/IALP.2014.6973511","url":null,"abstract":"Obstruents are the key landmark events found in the speech signal. In this paper, we propose use of spectral transition measure (STM) to locate the obstruents in the continuous speech. The proposed approach does not take in to account any prior information (like phonetic sequence, speech transcription, and number of obstruents in the speech). Hence this approach is unsupervised and unconstraint approach. In this paper, we propose use of state-of-the-art Mel Frequency Cepstral Coefficients (MFCC)-based features to capture spectral transition for obstruent detection task. It is expected more spectral transition in the vicinity of obstruents. The entire experimental setup is developed on TIMIT database. The detection efficiency and estimated probability are around 77 % and 0.77 respectively (with 30 ms agreement duration and 0.4 STM threshold).","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114900073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Development of language resources for speech application in Gujarati and Marathi
Pub Date: 2014-10-01 | DOI: 10.1109/IALP.2014.6973517
Maulik C. Madhavi, Shubham Sharma, H. Patil
This paper discusses the development of resources, from both linguistic and signal processing perspectives, for two low-resource Indian languages, viz., Gujarati and Marathi. The speech resource development covers the details of data collection, transcription at the phone and syllable level, and the corresponding linguistic units such as phones and syllables. In order to analyze performance at different fluency levels, three recording modes, viz., read, conversation and lecture, are considered. Manual annotation of speech in terms of International Phonetic Alphabet (IPA) symbols is presented. In a later section, we discuss speech segmentation at the syllable level and prosodic-level marking (pitch marking). The short-term energy contour is smoothed using a group-delay-based algorithm in order to detect syllable units in the speech signal. The detection rate obtained for syllable marking within a 20% agreement duration is of the order of 60% for read-mode speech. Prosodic pitch marks are analyzed via the F0 pattern of the speech signal. The key strength of this study is the analysis across different recording modes, viz., read, conversation and lecture. It is found that CV syllables (a consonant followed by a vowel) have the highest occurrence (more than 50%) in both languages. Read speech is observed to perform better than spontaneous speech in terms of automatic prosodic marking.
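A simplified sketch of the syllable-detection idea described above: compute a short-term energy contour, smooth it, and take its peaks as candidate syllable nuclei. The paper smooths with a group-delay-based algorithm; a plain moving average is used here only as a stand-in, and all parameters are illustrative.

```python
import numpy as np

def short_term_energy(signal, sr, frame_ms=20, hop_ms=10):
    """Frame-wise log energy of the signal."""
    frame, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame) // hop)
    return np.array([np.log(np.sum(signal[i * hop:i * hop + frame] ** 2) + 1e-12)
                     for i in range(n_frames)])

def syllable_nuclei(energy, smooth=5):
    """Peaks of a smoothed energy contour as candidate syllable nuclei.
    The paper smooths via a group-delay-based algorithm; a moving average
    is used here purely as a simple stand-in."""
    sm = np.convolve(energy, np.ones(smooth) / smooth, mode="same")
    floor = sm.min() + 0.5 * (sm.max() - sm.min())
    peaks = [i for i in range(1, len(sm) - 1)
             if sm[i - 1] < sm[i] >= sm[i + 1] and sm[i] > floor]
    return sm, peaks

# Toy usage: noise "bursts" separated by silence; detected peaks cluster inside the bursts.
sr = 16000
rng = np.random.default_rng(0)
sig = np.concatenate([rng.standard_normal(3200) * amp if amp else np.zeros(3200)
                      for amp in (1.0, 0, 0.8, 0, 0.6)])
_, peaks = syllable_nuclei(short_term_energy(sig, sr))
print("candidate syllable nuclei at frames:", peaks)
```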
{"title":"Development of language resources for speech application in Gujarati and Marathi","authors":"Maulik C. Madhavi, Shubham Sharma, H. Patil","doi":"10.1109/IALP.2014.6973517","DOIUrl":"https://doi.org/10.1109/IALP.2014.6973517","url":null,"abstract":"This paper discusses development of resources using linguistics and signal processing aspects for two low resource Indian languages, viz., Gujarati and Marathi. Speech resource development discusses the details of data collection, transcription at phone and syllable level and corresponding linguistic units such as phones and syllables. In order to analyze the performance at different fluency levels, three types of recording modes, viz., read, conversation and lecture are considered in this paper. Manual annotation of speech in terms of International Phonetic Alphabet (IPA) symbols is presented. In the later section, we discuss speech segmentation at syllable level and prosodic level marking (pitch marking). Short-term Energy contour is smoothened using group-delay-based algorithm in order to detect syllable units in the speech signal. Detection rate obtained for syllable marking within 20 % agreement duration is of the order of 60 % in case of read mode speech. Prosody pitch marks are analyzed via Fo pattern of a speech signal. The key strength of this study is the analysis for different kinds of recording modes, viz., read, conversation and lecture mode. It is found that CV (where, Consonant is followed by Vowel) type of syllables have highest occurrence (more than 50 %) in both the languages. Read speech is observed to perform better than spontaneous speech in terms of automatic prosodic marking.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129490990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Acoustic features of Mandarin monophthongs by Tibetan speakers
Pub Date: 2014-10-01 | DOI: 10.1109/IALP.2014.6973503
Lu Zhao, Hui Feng, Huixia Wang, J. Dang
The present study, under the framework of Lado's Contrastive Analysis Hypothesis (1957) and Flege's Speech Learning Model (1995), aims to find out whether, and to what extent, the Tibetan vowel space of native Tibetan speakers affects their "working" vowel space of Mandarin Chinese. The experiment adopts an experimental phonetic approach to examine the vowel-space features of 10 Tibetan speakers (5 male and 5 female) reading monosyllabic words in Tibetan and Standard Chinese. Compared with the vowel space of Chinese speakers, the vowel space of Tibetan speakers presents the following features: 1. The overall distribution of the Tibetan speakers' vowel space is higher than that of Chinese speakers, because the vowels produced by Tibetan speakers have lower F1. 2. Under the influence of the Tibetan vowel system, Tibetan speakers' vowel space for Mandarin monophthongs lies slightly to the right of the Chinese speakers' vowel space. 3. Based on the Euclidean distance between Tibetan speakers' and Chinese speakers' monophthongs, the production of Mandarin monophthongs by Tibetan male speakers cannot be explained by Flege's Speech Learning Model, while the production by Tibetan female speakers provides further evidence in support of the model.
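A small sketch of the Euclidean-distance comparison mentioned in point 3, computed in the F1-F2 plane. The formant values below are invented placeholders for illustration only, not measurements from the study.

```python
import numpy as np

# Invented placeholder F1/F2 means (Hz); the study would use formants measured from
# each speaker group's recorded monophthongs.
tibetan_speakers = {"a": (780, 1300), "i": (300, 2250), "u": (330, 800)}
chinese_speakers = {"a": (850, 1220), "i": (280, 2300), "u": (310, 750)}

def vowel_space_distances(group_a, group_b):
    """Euclidean distance between matching monophthongs in the F1-F2 plane."""
    return {v: float(np.hypot(group_a[v][0] - group_b[v][0],
                              group_a[v][1] - group_b[v][1]))
            for v in group_a.keys() & group_b.keys()}

print(vowel_space_distances(tibetan_speakers, chinese_speakers))
```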
{"title":"Acoustic features of Mandarin monophthongs by Tibetan speakers","authors":"Lu Zhao, Hui Feng, Huixia Wang, J. Dang","doi":"10.1109/IALP.2014.6973503","DOIUrl":"https://doi.org/10.1109/IALP.2014.6973503","url":null,"abstract":"The present study, under the framework of Lado's Contrastive Analysis Hypothesis (1957) and Flege's Speech Learning Model (1995), is to find out whether and to what extent the Tibetan vowel space of native Tibetan speakers affects the “working” vowel space of Mandarin Chinese. The experiment adopts the experimental phonetic approach to examine the features of vowel space of 10 Tibetan speakers (5 male and 5 female) when they read monosyllabic words in Tibetan and Standard Chinese. When compared with the vowel space of Chinese speakers, the vowel space of Tibetan speakers presents the following features: 1. The overall distribution of the vowel space of Tibetan speakers is higher than that of Chinese speakers because the vowels produced by Tibetan speakers have lower Fl. 2. Under the influence of Tibetan vowel system, Tibetan speakers' vowel space of Mandarin monophthongs is slightly to the right of Chinese speakers' vowel space. 3. With the calculation of the Euclidean Distance between Tibetan speakers' monophthongs and Chinese speakers' monophthongs, the production of Mandarin monophthongs by Tibetan male speakers cannot be explained by Flege's Speech Learning Model, while the production of Mandarin monophthongs by Tibetan female speakers provides more evidence to the justification of Speech Learning Model.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"183 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133411277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic acquisition of morphological resources for Melanau language
Pub Date: 2014-10-01 | DOI: 10.1109/IALP.2014.6973523
Suhaila Saee, Lay-Ki Soon, T. Lim, Bali Ranaivo-Malançon, J. Juk, E. Tang
Computational morphological resources are a crucial component for providing the morphological information needed to create a morphological analyser. Acquiring morphological resources manually requires two main components, preprocessing and morphology induction, which lead to two issues: i) the process is time consuming, and ii) it introduces ambiguity in managing the resources, particularly from an under-resourced-language perspective. We propose a tool for the automatic acquisition of morphological resources, extending the manual workflow, to overcome these issues. The proposed tool has three main modules: i) tokenization, to tokenise raw text and generate a wordlist; ii) conversion, to convert a softcopy of morphological resources into the required formats; and iii) integration of segmentation tools, to integrate two established segmentation tools, namely Linguistica and Morfessor, for obtaining morphological information from the generated wordlist. Two testing methods were conducted: component and integration testing. The results show that the proposed tool is effective and allows linguists to obtain their wordlist and segmented data easily. We believe the proposed tool will help other researchers construct computational morphological resources automatically for under-resourced languages.
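A minimal sketch of the tokenization module (module i above): produce a frequency wordlist from raw text, which would then be handed to Linguistica or Morfessor for morphology induction. The regular expression and the sample string are assumptions, not taken from the tool.

```python
import re
from collections import Counter

def tokenize(raw_text):
    """Split raw text into lower-cased word tokens; digits and punctuation are discarded."""
    return re.findall(r"[^\W\d_]+", raw_text.lower())

def build_wordlist(raw_text):
    """Frequency wordlist: the output later handed to a segmentation tool such as
    Linguistica or Morfessor for morphology induction."""
    return Counter(tokenize(raw_text))

sample = "Sample raw text of the corpus would be read from a file here."
for word, freq in build_wordlist(sample).most_common(5):
    print(freq, word)
```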
{"title":"Automatic acquisition of morphological resources for Melanau language","authors":"Suhaila Saee, Lay-Ki Soon, T. Lim, Bali Ranaivo-Malançon, J. Juk, E. Tang","doi":"10.1109/IALP.2014.6973523","DOIUrl":"https://doi.org/10.1109/IALP.2014.6973523","url":null,"abstract":"Computational morphological resources are the crucial component needed in providing morphological information to create morphological analyser. To acquire the morphological resources in a manual way, two main components are required. The components, which are preprocessing and morphology induction, have led to two issues: i) time consuming and ii) ambiguity in managing the resources from under-resourced languages perspective. We proposed an automatic acquisition of morphological resources tool, which is an extension from the manual way, to overcome the mentioned issues. In this work, three main modules in the proposed automatic tool are: i) tokenization - to tokenise a raw text and generate a wordlist, ii) conversion - to convert a softcopy of morphological resources into required formats and iii) integration of segmentation tools - to integrate two established segmentation tools, namely, Linguistica and Morfessor, in obtaining morphological information from the generated wordlist. Two testing methods have been conducted are component and integration testing. Result shows the proposed tool has been devised and the effectiveness has been demonstrated which allows the linguist to obtain their wordlist and segmented data easily. We believe the proposed tool will assist other researchers to construct computational morphological resources in automated way for under-resourced languages.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"192 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131323182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Local phrase reordering model for complicated Chinese NPs in patent Chinese-English machine translation
Pub Date: 2014-10-01 | DOI: 10.1109/IALP.2014.6973496
Xiaodie Liu, Yun Zhu, Yaohong Jin
We focus on when and how to reorder complicated Chinese NPs with two, three, four or five semantic units, where the semantic units are the smallest chunks for reordering in Chinese-English machine translation. By analyzing clear parallels and striking distinctions between complicated Chinese NPs and their English counterparts, we built 17 formalized rules that identify the boundaries of semantic units using boundary words deduced from semantic features, so as to recognize what to reorder, and we developed a strategy for reordering the internal structure of complicated Chinese NPs when they are translated into English. Finally, we used a rule-based MT system to test our work, and the experimental results show that our strategy and rule-based method are very efficient.
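A toy sketch of the boundary-word idea described above: split a tokenized Chinese NP into semantic units at a boundary word and reorder the units for English output. The single boundary word, the head-last assumption and the reordering rule are illustrative only; the paper's 17 formalized rules and its boundary-word lexicon are not reproduced here.

```python
# Illustrative boundary word; the paper derives its Boundary-Words from semantic features.
BOUNDARY_WORDS = ["的"]

def split_semantic_units(np_tokens):
    """Split a tokenized Chinese NP into semantic units at boundary words."""
    units, current = [], []
    for tok in np_tokens:
        if tok in BOUNDARY_WORDS:
            if current:
                units.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        units.append(current)
    return units

def reorder_for_english(np_tokens):
    """Toy rule: treat the last semantic unit as the head and emit it first, with the
    modifier units following in reverse, mirroring the common 'head noun + of + modifier'
    pattern when rendering Chinese NPs in English."""
    units = split_semantic_units(np_tokens)
    if len(units) < 2:
        return units
    head, modifiers = units[-1], units[:-1]
    return [head] + modifiers[::-1]

# "半导体 器件 的 制造 方法" ~ "manufacturing method of (a) semiconductor device"
print(reorder_for_english(["半导体", "器件", "的", "制造", "方法"]))
```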
{"title":"Local phrase reordering model for complicated Chinese NPs in patent Chinese-English machine translation","authors":"Xiaodie Liu, Yun Zhu, Yaohong Jin","doi":"10.1109/IALP.2014.6973496","DOIUrl":"https://doi.org/10.1109/IALP.2014.6973496","url":null,"abstract":"We focused on when and how to reorder the complicated Chinese NPs with two, three, four or five semantic-units and the semantic units were smallest chunks for reordering in Chinese-English Machine Translation. By analyzing clear parallels and striking distinctions between complicated Chinese NPs and their English, we built 17 formalized rules to identify the boundaries of semantic units with the Boundary-Words deduced from semantic features to recognized what to reorder and developed a strategy on how to reorder the internal ordering of complicated Chinese NPs when translated into English. At last, we used a rule-based MT system to test our work, and the experimental results showed that our strategy and rule-based method were very efficient.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"2001 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128563330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}