Logical operative processes of semantic grammar for machine interpretation
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973487
Sivakumar Ramakrishnan, Pradeep Isawasan, V. Mohanan
The purpose of this paper is to identify and reveal the significance of the primary logical operative processes of the semantic grammar of any language for establishing machine interpretation. This neo-generative mechanism for logical semantic representation is systematically analyzed through logical, linguistic, and mathematical postulations. These logical operative processes provide a structural way to treat the grammatical properties of a language within a framework of speech acts, accommodating and easing machine interpretation for ontological representation and cognitive acts. This treatment also allows sentences to be semantically interpreted and hermeneutically analyzed within the temporal movement of the speech act. The logical postulation of the operative processes of grammar makes it possible to explain the grammatical intuitions of a native speaker in terms of both a variety of cognitive operations and knowledge of distinct object categories, which can then be applied in machine interpretation.
{"title":"Logical operative processes of semantic grammar for machine interpretation","authors":"Sivakumar Ramakrishnan, Pradeep Isawasan, V. Mohanan","doi":"10.1109/IALP.2014.6973487","DOIUrl":"https://doi.org/10.1109/IALP.2014.6973487","url":null,"abstract":"The purpose of this paper is to identify and reveal the significance of primary logical operative processes of semantic grammar of any languages for the establishment of machine interpretation. This neo generative mechanism for logical semantic representation for machine interpretation has been systematically analyzed by logical linguistic and mathematical postulations. These logical operative processes structurally provide a way in which grammatical properties of language can be treated within a framework of speech acts to accommodate and to ease the machine interpretation for ontological representation and cognitive act. This treatment also allows the sentences to be semantically interpreted and hermeneutically analyzed within the temporal movement of speech act for machine interpretation. The logical postulation of operative processes of grammar enables to provide an explanation of the grammatical intuitions of a native speaker of a language in terms of both a variety of cognitive operations and knowledge of distinct object categories to be applied in the machine interpretation.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129436498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sentiment classification using Enhanced Contextual Valence Shifters
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973485
V. Phu, Phan Thi Tuoi
We have explored different methods of improving the accuracy of sentiment classification. The sentiment orientation of a document can be positive (+), negative (-), or neutral (0). We combine five dictionaries from [2, 3, 4, 5, 6] into a new one with 21,137 entries. The new dictionary contains many verbs, adverbs, phrases, and idioms that are not in the five source dictionaries. The paper shows that our proposed method, which combines the Term-Counting method with the Enhanced Contextual Valence Shifters method, improves the accuracy of sentiment classification. The combined method achieves an accuracy of 68.984% on the testing dataset and 69.224% on the training dataset. All of these methods are implemented to classify reviews based on our new dictionary and the Internet Movie data set.
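The abstract does not give implementation details, but the combination it names is straightforward to sketch: count the valence of lexicon terms, letting nearby shifter words flip, strengthen, or weaken each term's score. The following is a minimal illustrative sketch; the lexicon entries, shifter lists, window size, and weights are invented assumptions, not the authors' 21,137-entry dictionary or their exact rules.

```python
# Minimal sketch of term counting with contextual valence shifters.
# Lexicon, shifter lists, and weights are illustrative assumptions.

NEGATORS = {"not", "never", "no"}          # flip polarity
INTENSIFIERS = {"very", "extremely"}       # strengthen polarity
DIMINISHERS = {"slightly", "somewhat"}     # weaken polarity

LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}

def classify(tokens):
    """Return '+', '-', or '0' for a tokenized review."""
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        valence = LEXICON[tok]
        # Look back a short window for valence shifters.
        for prev in tokens[max(0, i - 3):i]:
            if prev in NEGATORS:
                valence = -valence
            elif prev in INTENSIFIERS:
                valence *= 2.0
            elif prev in DIMINISHERS:
                valence *= 0.5
        score += valence
    return "+" if score > 0 else "-" if score < 0 else "0"

print(classify("the movie was not very good".lower().split()))  # '-'
```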
{"title":"Sentiment classification using Enhanced Contextual Valence Shifters","authors":"V. Phu, Phan Thi Tuoi","doi":"10.1109/IALP.2014.6973485","DOIUrl":"https://doi.org/10.1109/IALP.2014.6973485","url":null,"abstract":"We have explored different methods of improving the accuracy of sentiment classification. The sentiment orientation of a document can be positive (+), negative (-), or neutral (0). We combine five dictionaries from [2, 3, 4, 5, 6] into the new one with 21137 entries. The new dictionary has many verbs, adverbs, phrases and idioms, that are not in five ones before. The paper shows that our proposed method based on the combination of Term-Counting method and Enhanced Contextual Valence Shifters method has improved the accuracy of sentiment classification. The combined method has accuracy 68.984% on the testing dataset, and 69.224% on the training dataset. All of these methods are implemented to classify the reviews based on our new dictionary and the Internet Movie data set.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121989443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Designing an Indonesian part of speech tagset and manually tagged Indonesian corpus
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973519
A. Dinakaramani, Rashel Fam, A. Luthfi, R. Manurung
We describe our work on designing a linguistically principled part-of-speech (POS) tagset for the Indonesian language. The process involves a detailed study and analysis of existing tagsets and the manual tagging of an Indonesian corpus. The results of this work are an Indonesian POS tagset consisting of 23 tags and an Indonesian corpus of over 250,000 lexical tokens that have been manually tagged using this tagset.
{"title":"Designing an Indonesian part of speech tagset and manually tagged Indonesian corpus","authors":"A. Dinakaramani, Rashel Fam, A. Luthfi, R. Manurung","doi":"10.1109/IALP.2014.6973519","DOIUrl":"https://doi.org/10.1109/IALP.2014.6973519","url":null,"abstract":"We describe our work on designing a linguistically principled part of speech (POS) tagset for the Indonesian language. The process involves a detailed study and analysis of existing tagsets and the manual tagging of an Indonesian corpus. The results of this work are an Indonesian POS tagset consisting of 23 tags and an Indonesian corpus of over 250.000 lexical tokens that have been manually tagged using this tagset.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129033432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Category-associated collocative concept primitives extraction
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973475
Zhejie Chi, Quan Zhang
Collocation is studied as an essential linguistic phenomenon in traditional natural language processing. Similarly, collocative concept primitives are introduced in the HNC Concept Primitive Space to represent pairs of concept primitives that co-occur frequently. Collocative concept primitives can be studied together with categories, since concept primitives usually contain category information. To explore the collocation phenomenon in the field of HNC and apply collocative information to language processing, this paper presents a two-stage approach for extracting category-associated collocative concept primitives from a classification corpus. By extracting collocative concept primitives in each sub-category corpus and then extracting category-associated collocative concept primitives from the summarized corpus, we generate a list of category-associated collocative concept primitives for each category. Our experiments show that the extracted items are consistent with reality and are significant.
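As a rough reading of the two-stage idea, one can count frequently co-occurring pairs inside each sub-category corpus and then keep the pairs whose occurrences concentrate in one category. The sketch below uses plain tokens in place of HNC concept primitives, and the counting scheme and thresholds are illustrative guesses, not the paper's method.

```python
# Toy sketch of the two-stage extraction: stage 1 counts co-occurring
# pairs inside each sub-category corpus; stage 2 keeps the pairs whose
# occurrences concentrate in one category. Plain tokens stand in for
# HNC concept primitives; min_count and min_share are assumptions.
from collections import Counter
from itertools import combinations

def pair_counts(sentences):
    """Count unordered token pairs co-occurring within a sentence."""
    counts = Counter()
    for sent in sentences:
        for pair in combinations(sorted(set(sent)), 2):
            counts[pair] += 1
    return counts

def category_collocations(corpora, min_count=2, min_share=0.5):
    """corpora maps a category name to its list of tokenized sentences."""
    per_cat = {cat: pair_counts(sents) for cat, sents in corpora.items()}
    total = Counter()
    for counts in per_cat.values():
        total.update(counts)
    # Keep a pair for a category when it is frequent there and most of
    # its occurrences fall inside that category's sub-corpus.
    return {cat: [pair for pair, n in counts.items()
                  if n >= min_count and n / total[pair] >= min_share]
            for cat, counts in per_cat.items()}
```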
{"title":"Category-associated collocative concept primitives extraction","authors":"Zhejie Chi, Quan Zhang","doi":"10.1109/IALP.2014.6973475","DOIUrl":"https://doi.org/10.1109/IALP.2014.6973475","url":null,"abstract":"Collocation is studied as an essential linguistic phenomenon in traditional natural language processing. Similarity, collocative concept primitives are introduced in HNC Concept Primitive Space to present the concept primitive pair co-occurring frequently. Collocative concept primitives can be studied with categories together as concept primitives usually contain category information. To explore the collocation phenomenon in the field of HNC and apply collocative information to language processing, this paper presents a two-stage approach to extract category-associated collocative concept primitives from a classification corpus. By conducting collocative concept primitives extraction in each sub-category corpus and carrying out category-associated collocative concept primitives extraction in the summarized corpus, we generate a category-associated collocative concept primitives list for each category. Our experiments show the items we extract are consistent with the reality and are of significance.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116617926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Which performs better for new word detection, character based or Chinese Word Segmentation based?
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973474
Haijun Zhang, Shumin Shi
This paper proposes a novel method, based on repeats extraction, for evaluating the performance of New Word Detection (NWD). For small-scale corpora, we employ the Conditional Random Field (CRF) as a statistical framework to estimate the effects of different NWD strategies. For large-scale corpora, since unlimited annotated corpora do not exist, comparative experiments cannot be used for evaluation. Accordingly, this paper proposes a pragmatic quantitative model to analyze and estimate the performance of NWD in all kinds of cases, especially the large-scale corpus situation. Our studies show good mutual validation between the experimental results and the conclusions of the quantitative model. On the basis of the experimental data and the quantitative model, we reach a reliable conclusion about the effectiveness of Chinese NWD under the two strategies, which can offer guidance for follow-up studies in Chinese new word detection.
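To make the character-based strategy in the title concrete, the following is a toy sketch of CRF-based character tagging for new word detection using the sklearn-crfsuite package. The feature template, BIO labeling, toy corpus, and hyperparameters are all assumptions for illustration, not the paper's configuration.

```python
# Character-based BIO tagging sketch for new word detection with a CRF.
# Requires the sklearn-crfsuite package; data and features are toy
# assumptions, not the paper's setup.
import sklearn_crfsuite

def char_features(sent, i):
    feats = {"bias": 1.0, "char": sent[i]}
    if i > 0:
        feats["prev_char"] = sent[i - 1]
    if i < len(sent) - 1:
        feats["next_char"] = sent[i + 1]
    return feats

train_sents = ["给力的表现", "表现很给力"]       # toy corpus
train_tags = [list("BIOOO"), list("OOOBI")]   # B/I mark the new word "给力"

X = [[char_features(s, i) for i in range(len(s))] for s in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, train_tags)
print(crf.predict(X))
```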
{"title":"Which performs better for new word detection, character based or Chinese Word Segmentation based?","authors":"Haijun Zhang, Shumin Shi","doi":"10.1109/IALP.2014.6973474","DOIUrl":"https://doi.org/10.1109/IALP.2014.6973474","url":null,"abstract":"This paper proposed a novel method to evaluate the performance of New Word Detection (NWD) based on repeats extraction. For small-scale corpus, we put forward employing Conditional Random Field (CRF) as statistical framework to estimate the effects of different strategies of NWD. For the situations of large-scale corpus, as there is no infinity of annotated corpus, comparative experiments are unable to carry out evaluation. Accordingly, this paper proposed a pragmatic quantitative model to analyze and estimate the performance of NWD for all kinds of cases, especially for large-scale corpus situation. Studies have shown there is a good mutual authentication between experimental results and conclusion from the quantitative model. On the basis of analysis for experimental data and quantitative model, a reliable conclusion for effects of Chinese NWD basing the two strategies is reached, which can give a certain instruction for follow-up studies in Chinese new word detection.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115327803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Research on recognition of semantic chunk boundary in Tibetan
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973476
Tianhang Wang, Shumin Shi, Heyan Huang, Congjun Long, Ruijing Li
Semantic chunks describe the semantic framework of a sentence well. They play a very important role in Natural Language Processing applications such as machine translation and question answering. At present, research on Tibetan chunking is mainly rule-based. In this paper, guided by the distinctive characteristics of the Tibetan language, we first put forward a descriptive definition of the Tibetan semantic chunk together with a labeling scheme, and then propose a feature selection algorithm that automatically selects suitable feature templates from the candidates. In experiments on two kinds of Tibetan corpora, a sentence corpus and a discourse corpus, the F-measure reaches 95.84% and 94.95% with a Conditional Random Fields (CRF) model, and 91.97% and 88.82% with a Maximum Entropy (ME) model, respectively. These positive results show that our definition of the Tibetan semantic chunk is reasonable and operable. Furthermore, boundary recognition via statistical techniques is feasible and effective on small-scale corpora.
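The abstract names a feature selection algorithm over candidate feature templates without spelling it out; one plausible reading is a greedy forward selection that keeps adding the template that most improves held-out F-measure. The sketch below is that hedged interpretation only; `evaluate` stands in for training and scoring a CRF/ME chunker with the chosen templates.

```python
# Greedy forward selection over candidate feature templates, scored by
# F-measure. This is a plausible reconstruction, not the paper's
# algorithm; evaluate(templates) must train a chunker and return its
# held-out F-measure.

def select_templates(candidates, evaluate):
    chosen, best_f = [], 0.0
    improved = True
    while improved and candidates:
        improved, best_t = False, None
        for t in candidates:
            f = evaluate(chosen + [t])   # score with template t added
            if f > best_f:
                best_f, best_t, improved = f, t, True
        if best_t is not None:
            chosen.append(best_t)
            candidates.remove(best_t)
    return chosen, best_f

# Toy scorer: pretend gains saturate after two templates.
toy_eval = lambda ts: min(len(ts), 2) / 2 * 0.9
print(select_templates(["U00", "U01", "U02"], toy_eval))  # (['U00', 'U01'], 0.9)
```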
{"title":"Research on recognition of semantic chunk boundary in Tibetan","authors":"Tianhang Wang, Shumin Shi, Heyan Huang, Congjun Long, Ruijing Li","doi":"10.1109/IALP.2014.6973476","DOIUrl":"https://doi.org/10.1109/IALP.2014.6973476","url":null,"abstract":"Semantic chunk is able to well describe the sentence semantic framework. It plays a very important role in Natural Language Processing applications, such as machine translation, QA system and so on. At present, the Tibetan chunk researches are mainly based on rule-methods. In this paper, according to the distinctive language characteristics of Tibetan, we firstly put forward the descriptive definition of the Tibetan semantic chunk and its labeling scheme and then we propose a feature selection algorithm to select the suitable ones automatically from the candidate feature-templates. Through the experiment conducted on the two different kinds of Tibetan corpus, namely corpus-sentence and corpus-discourse, the F-Measure achieves 95.84%, 94.95% and 91.97%, 88.82% by using of Conditional Random Fields (CRF) model and Maximum Entropy (ME) model respectively. The positive results show that the definition of Tibetan semantic chunk in this paper is reasonable and operable. Furthermore, its boundary recognition is feasible and effective via statistical techniques in small scale corpus.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124190077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information decompression of Xinjiang travel materials
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973479
Kaihong Yang, Shuzhen Shi
Previous discussions of the translation of travel materials are mainly confined to functional and semiotic perspectives. The authors of this paper hold that Xinjiang travel materials carry implicit information related to distinctive ethnic, geographical, and historical cultures, information that cannot be fully absorbed by English speakers who do not share the same cultural background. They address this problem by applying information decompression, which amplifies information redundancy in order to reduce unpredictability during message transmission. The decompression measures are translation plus comment, translation plus supplementation, and translation plus explanation. Specifically, in Chinese-English translation of Xinjiang travel materials, the authors decompress the original texts and release their cultural connotations through these three measures so as to convey correct and adequate information to receivers, narrow the cultural gap, and achieve effective communication. The paper thus proposes a new perspective on the translation of Xinjiang travel materials.
{"title":"Information decompression of Xinjiang travel materials","authors":"Kaihong Yang, Shuzhen Shi","doi":"10.1109/IALP.2014.6973479","DOIUrl":"https://doi.org/10.1109/IALP.2014.6973479","url":null,"abstract":"Previous discussions on the translation of travel materials are mainly confined to functional and semeiotic perspectives. Authors of this paper hold that Xinjiang travel materials involve implicit information related to distinguished ethnical, geographical and historical cultures which cannot be absorbed comprehensively by English-speakers who do not share the same cultural backgrounds. They try to settle the problem with application of information decompression which means to amplify information redundancy to reduce unpredictability during message transmission. Meanwhile, they take translation plus comment, translation plus supplementation and translation plus explanation as measures in decompression. To be exact, in Chinese-English translation of Xinjiang travel materials, authors of the paper decompress the original texts and release the cultural connotations by means of translation plus comment, translation plus supplementation and translation plus explanation so as to convey correct and adequate information to receivers, shorten the cultural gap and achieve effective communication. This paper tries to propose a new prospective for the translation of Xinjiang travel materials.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128290762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic type disambiguation for Japanese verbs
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973471
Shohei Okada, Kazuhide Yamamoto
Interest has been increasing in recent years in extracting and analyzing evaluations and opinions of services or products from large bodies of text. It is important to classify predicates according to sense, because whether or not a statement includes the speaker's opinion depends strongly on its predicate. It is generally assumed that the Japanese part-of-speech (POS) system classifies predicates according to sense; however, the POS classification differs from a semantic classification. To address this, semantic types, which aim to classify predicates, have been proposed. In this paper, we describe semantic types and present our construction of a disambiguator for Japanese verbs. Specifically, we constructed the disambiguator as a support vector machine over feature vectors built from semantic categories of nouns and the results of morphological analysis. We achieved 69.9% disambiguation accuracy on newspaper articles using 10-fold cross-validation.
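The described setup (feature vectors from noun semantic categories and morphological analysis, an SVM classifier, 10-fold cross-validation) maps naturally onto scikit-learn. The sketch below shows only the plumbing; the feature names, toy samples, and semantic-type labels are invented for illustration and do not reflect the authors' feature set.

```python
# Sketch of the setup: feature dicts vectorized and fed to an SVM,
# scored with 10-fold cross-validation. Features and labels are
# invented examples, not the paper's feature set.
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

samples = [
    {"verb": "taberu", "arg_category": "food", "pos_prev": "noun"},
    {"verb": "omou", "arg_category": "abstract", "pos_prev": "particle"},
] * 10                                   # duplicated toy data
labels = ["action", "thought"] * 10      # hypothetical semantic types

X = DictVectorizer().fit_transform(samples)
scores = cross_val_score(SVC(kernel="linear"), X, labels, cv=10)
print(scores.mean())
```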
{"title":"Semantic type disambiguation for Japanese verbs","authors":"Shohei Okada, Kazuhide Yamamoto","doi":"10.1109/IALP.2014.6973471","DOIUrl":"https://doi.org/10.1109/IALP.2014.6973471","url":null,"abstract":"The interest has been increasing in recent years in extracting and analyzing evaluations and opinions of service or products from large bodies of text. It is important to classify predicates according to sense because whether or not a statement includes the speaker's opinion depends strongly on its predicate. It is generally assumed that Japanese part-of-speech (POS) for predicates is classified according to sense; however, the POS classifications differ from their semantic classification. On this subject, semantic types, which aim to classify predicates, have been proposed. In this paper, we describe semantic types and present our construction of a disambiguator for Japanese verbs. Specifically, we constructed this disambiguator using a support vector machine by building feature vectors. We used semantic categories of noun and results of morphological analysis for the feature vectors. We then achieved 69.9% accuracy of disambiguation for newspaper articles using 10-fold cross-validation.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"10 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120822909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NormAPI: An API for normalizing Filipino shortcut texts
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973494
N. Nocon, G. Cuevas, Darwin Magat, Peter Suministrado, C. Cheng
As the number of Internet and mobile phone users grows, texting and chatting have become popular means of communication. The extensive use of cellphones and the Internet has led to the creation of a new language in which words are transformed and shortened in various styles. Shortcut texting is used in informal venues such as SMS, online chat rooms, forums, and posts on social networks. Huge amounts of data originating from these informal sources can be utilized for various tasks in machine learning and data analytics. Because these data may be written in shortcut forms, text normalization is necessary before NLP tasks such as information extraction, data mining, text summarization, opinion classification, and even bilingual translation can be fully achieved; it acts as a preprocessing stage that transforms informal texts back into their original, more understandable forms. This paper presents NormAPI, an API for normalizing Filipino shortcut texts. NormAPI is primarily intended as a preprocessing system that corrects informalities in shortcut texts before they are handed over for further processing.
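As a minimal sketch of what such a normalizer does, the function below combines a shortcut-to-standard lookup table with a generic rule collapsing repeated letters. The mappings are invented Filipino-texting examples; NormAPI's actual lexicon, rules, and interface are not described in the abstract.

```python
# Dictionary-plus-rules normalization sketch. The shortcut mappings
# are invented examples, not NormAPI's actual lexicon.
import re

SHORTCUTS = {"d2": "dito", "kc": "kasi", "nmn": "naman"}

def normalize(text):
    out = []
    for tok in text.lower().split():
        if tok in SHORTCUTS:
            out.append(SHORTCUTS[tok])
            continue
        # Collapse runs of 3+ repeated letters ("tlgaaa" -> "tlga").
        out.append(re.sub(r"(.)\1{2,}", r"\1", tok))
    return " ".join(out)

print(normalize("kc d2 nmn tlgaaa"))  # "kasi dito naman tlga"
```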
{"title":"NormAPI: An API for normalizing Filipino shortcut texts","authors":"N. Nocon, G. Cuevas, Darwin Magat, Peter Suministrado, C. Cheng","doi":"10.1109/IALP.2014.6973494","DOIUrl":"https://doi.org/10.1109/IALP.2014.6973494","url":null,"abstract":"As the number of Internet and mobile phone users grow, texting and chatting have become popular means of communication. Reaching new heights, the extensive use of cellphones and Internet led into the creation of a new language, where words are transformed and made shorter using various styles. Shortcut texting is used in informal venues such as SMS, online, chat rooms, forums and posts in social networks. Huge amounts of data originating from these informal sources can be utilized for various tasks in machine learning and data analytics. As these data may be written in shortcut forms, text normalization is necessary before NLP actions such as information extraction, data mining, text summarization, opinion classification, and even bilingual translations can be fully achieved, by acting as a preprocessing stage that transforms all informal texts back to their original and more understandable forms. This paper is about NormAPI, an API for normalizing Filipino shortcut texts. NormAPI primarily intends to be used as a preprocessing system that corrects informalities in shortcut texts before they are handed for complete data processing.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126680942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Influence of various asymmetrical contextual factors for TTS in a low resource language
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973509
Nirmesh J. Shah, Mohammadi Zaki, H. Patil
The generalized statistical framework of the Hidden Markov Model (HMM) has been successfully carried over from speech recognition to speech synthesis. In this paper, we apply the HMM-based Speech Synthesis (HTS) method to Gujarati (one of the official languages of India). We adapt and evaluate HTS for Gujarati and, to understand the influence of asymmetrical contextual factors on the quality of synthesized speech, conduct a series of experiments. The different HTS systems built for Gujarati with various asymmetrical contextual factors are evaluated in terms of naturalness and speech intelligibility. The experimental results show that when more weight is given to the left phoneme in the asymmetrical contextual factors, HTS performance improves over conventional symmetrical contextual factors in both the triphone and pentaphone cases. We achieved the best performance for Gujarati HTS with left-left-left-centre-right (i.e., LLLCR) contextual factors.
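To illustrate what an asymmetrical contextual factor such as LLLCR means in practice, the sketch below builds context labels with three left neighbours and one right neighbour for each phone. The label format and the toy phone sequence are illustrative assumptions, not the actual HTS full-context label syntax.

```python
# Building asymmetrical context labels: for LLLCR, each phone is
# described by three left neighbours, itself, and one right neighbour.
# Label format is illustrative, not the HTS question-file syntax.

def context_labels(phones, left=3, right=1, pad="sil"):
    padded = [pad] * left + phones + [pad] * right
    labels = []
    for i in range(left, left + len(phones)):
        ctx = padded[i - left:i] + [padded[i]] + padded[i + 1:i + 1 + right]
        labels.append("-".join(ctx))
    return labels

# LLLCR contexts for a toy phone sequence:
print(context_labels(["k", "a", "m", "a", "l"]))
# ['sil-sil-sil-k-a', 'sil-sil-k-a-m', 'sil-k-a-m-a', ...]
```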
{"title":"Influence of various asymmetrical contextual factors for TTS in a low resource language","authors":"Nirmesh J. Shah, Mohammadi Zaki, H. Patil","doi":"10.1109/IALP.2014.6973509","DOIUrl":"https://doi.org/10.1109/IALP.2014.6973509","url":null,"abstract":"The generalized statistical framework of Hidden Markov Model (HMM) has been successfully applied from the field of speech recognition to speech synthesis. In this paper, we have applied HMM-based Speech Synthesis (HTS) method to Gujarati (one of the official languages of India). Adaption and evaluation of HTS for Gujarati language has been done here. In addition, to understand the influence of asymmetrical contextual factors on quality of synthesized speech, we have conducted series of experiments. Evaluation of different HTS built for Gujarati speech using various asymmetrical contextual factors is done in terms of naturalness and speech intelligibility. From the experimental results, it is evident that when more weightage is given to left phoneme in asymmetrical contextual factor, HTS performance improves compared to conventional symmetrical contextual factors for both triphone and pentaphone case. Furthermore, we achieved best performance for Gujarati HTS with left-left-left-centre-right (i.e., LLLCR) contextual factors.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126214289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}