Polarity detection of Turkish comments on technology companies
Gözde Gül Şahin, Harun Resit Zafer, E. Adali
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973514
In this study, comments about technology brands are collected from a popular Turkish website, eksisözlük, and classified as positive or negative. The Turkish text is preprocessed with different kinds of filters and then modeled with 1-gram, 2-gram, and 3-gram language models. Naive Bayes (NB), Support Vector Machine (SVM), and K-nearest-neighbor (KNN) classifiers are applied to different configurations of preprocessing techniques, language models, and linguistic attributes for comparison. The best F-measure on our test dataset was 0.696.
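The pipeline described, n-gram features fed to a Naive Bayes classifier, can be sketched in plain Python. This is a minimal sketch only: the paper's preprocessing filters, the SVM/KNN variants, and the actual eksisözlük data are omitted, and the toy Turkish tokens below are purely illustrative.

```python
from collections import Counter
import math

def ngrams(tokens, n_max=2):
    """Extract 1-gram and 2-gram features from a token list."""
    feats = list(tokens)
    for n in range(2, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats

def train_nb(docs):
    """docs: list of (tokens, label). Returns per-class feature counts,
    per-class totals, class priors, and the feature vocabulary."""
    counts, totals, priors, vocab = {}, Counter(), Counter(), set()
    for tokens, label in docs:
        priors[label] += 1
        c = counts.setdefault(label, Counter())
        for f in ngrams(tokens):
            c[f] += 1
            totals[label] += 1
            vocab.add(f)
    return counts, totals, priors, vocab

def classify(tokens, counts, totals, priors, vocab):
    """Pick the class with the highest log-posterior under add-one smoothing."""
    n_docs = sum(priors.values())
    best, best_lp = None, float("-inf")
    for label in priors:
        lp = math.log(priors[label] / n_docs)
        for f in ngrams(tokens):
            lp += math.log((counts[label][f] + 1) / (totals[label] + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```

In a real run the tokens would come out of the paper's filter chain rather than being hand-supplied.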
A comparative study of nominal predicate sentences (NPS) and SHI (be) sentences
Li-jun Zhang
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973481
Nominal predicate sentences (NPS) in Chinese can be divided into two categories: NPS1 and NPS2. NPS1 sentences, which can be transformed into SHI (be) sentences, are mainly assertive and static, whereas NPS2 sentences, which cannot, are mainly descriptive, declarative, and dynamic. There are two main differences between NPS and SHI sentences: style and the degree of prominence of the information focus.
Improving malt dependency parser using a simple grammar-driven unlexicalised dependency parser
Anil Krishna Eragani, V. Kuchibhotla
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973482
In this paper, we present an approach to integrating unlexicalised grammatical features into the Malt dependency parser. Malt is a lexicalised parser and, like every lexicalised parser, is prone to data sparseness. We address this problem by providing features from an unlexicalised parser, since, in contrast to lexicalised parsers, unlexicalised parsers are known for their robustness. We build a simple unlexicalised grammatical parser with POS tag sequences as grammar rules and use its output as additional features for Malt. We achieved improvements of about 0.17-0.30% (UAS) over state-of-the-art Malt results on both English and Hindi.
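The abstract does not spell out the exact form of its POS-sequence grammar rules. As one plausible illustration only, the rules could be read as (head POS, dependent POS, direction) triples harvested from a treebank and exposed to Malt as binary arc features:

```python
def extract_rules(treebank):
    """treebank: list of sentences; each sentence is a list of
    (pos, head_index) tuples with 1-indexed heads, 0 = root.
    A 'rule' here is a (head_pos, dep_pos, direction) triple."""
    rules = set()
    for sent in treebank:
        for i, (pos, head) in enumerate(sent, start=1):
            if head == 0:
                continue  # skip the root attachment
            head_pos = sent[head - 1][0]
            direction = "L" if i < head else "R"  # dependent left/right of head
            rules.add((head_pos, pos, direction))
    return rules

def arc_feature(head_pos, dep_pos, direction, rules):
    """Binary feature: is this candidate arc licensed by the POS grammar?"""
    return 1 if (head_pos, dep_pos, direction) in rules else 0
```

Such binary features would be appended to Malt's existing lexicalised feature model; the triple format is an assumption, not the paper's specification.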
Building an Indonesian rule-based part-of-speech tagger
Rashel Fam, A. Luthfi, A. Dinakaramani, R. Manurung
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973521
This paper describes work on a rule-based part-of-speech tagger for the Indonesian language. The system tokenizes documents, taking multi-word expressions into account, and recognizes named entities. It then assigns tags to every token, starting with closed-class words and moving on to open-class words, and disambiguates the tags with a set of manually defined rules. The system currently obtains an accuracy of 79% on a manually tagged corpus of roughly 250,000 tokens.
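The tag-then-disambiguate flow can be sketched as follows. The Indonesian lexicon entries and the single disambiguation rule below are invented placeholders, not the paper's actual lexicon or rule set:

```python
CLOSED_CLASS = {"di": "PREP", "yang": "REL", "dan": "CONJ"}  # hypothetical entries
LEXICON = {"makan": {"VB", "NN"}, "buku": {"NN"}}            # open-class candidates

def tag(tokens):
    """Tag closed-class words first, then open-class words,
    then resolve remaining ambiguity with hand-written rules."""
    tags = []
    for tok in tokens:
        if tok in CLOSED_CLASS:
            tags.append(CLOSED_CLASS[tok])          # closed class: unambiguous
        else:
            tags.append(LEXICON.get(tok, {"NN"}))   # unknown words default to NN
    for i, t in enumerate(tags):
        if isinstance(t, set):
            # illustrative rule: a VB/NN-ambiguous word after a noun reads as a verb
            if t == {"VB", "NN"} and i > 0 and tags[i - 1] == "NN":
                tags[i] = "VB"
            else:
                tags[i] = sorted(t)[0]              # fall back to a fixed choice
    return tags
```

The real system additionally handles multi-word expressions and named entities before this step.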
A bottom-up method for analyzing the domain of a sentence group
Xiangfeng Wei, Quan Zhang, Yi Yuan, Zhejie Chi
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973473
A sentence group is a linguistic unit between a sentence and an article. It is mapped onto the contextual element, one of the four layers of the linguistic conceptual space in HNC (Hierarchical Network of Concepts) theory. The contextual element is composed of three components: domain, situation, and background, with domain at the head. To extract the domain and situation of a sentence group, this paper proposes a bottom-up method: it extracts domain-related conceptual symbols from words, then obtains the domain of a sentence from the frequencies of the domain-related words and their semantic roles in the sentence, and finally obtains the domain of a sentence group, along with its boundary, by merging sentences with the same domain. Experiments show that this method handles some types of sentence groups well in a real corpus. However, many details remain to be studied in extracting the domain, determining the boundary of a sentence group, and extracting the framework of a sentence group.
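The bottom-up procedure, scoring domains from domain-related words and then merging adjacent same-domain sentences, might look roughly like this. The lexicon and the uniform role weights are assumptions, and HNC conceptual symbols are reduced to plain strings:

```python
from collections import Counter

DOMAIN_LEXICON = {  # hypothetical word -> domain-related concept symbol
    "goalkeeper": "sports", "match": "sports",
    "stock": "finance", "dividend": "finance",
}

def sentence_domain(words, role_weight=None):
    """Score candidate domains by frequency of domain-related words;
    semantic-role weights (if supplied) boost words in prominent roles."""
    role_weight = role_weight or {}
    scores = Counter()
    for w in words:
        if w in DOMAIN_LEXICON:
            scores[DOMAIN_LEXICON[w]] += role_weight.get(w, 1)
    return scores.most_common(1)[0][0] if scores else None

def group_sentences(sentences):
    """Merge adjacent sentences sharing a domain into sentence groups;
    the merge boundaries are the sentence-group boundaries."""
    groups = []
    for sent in sentences:
        d = sentence_domain(sent)
        if groups and groups[-1][0] == d:
            groups[-1][1].append(sent)
        else:
            groups.append((d, [sent]))
    return groups
```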
A novel query expansion method for military news retrieval service
Liang-Chu Chen, Wen-Tsan Chao, Chia-Jung Hsieh
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973491
Since most search engines retrieve documents strictly by keywords, they cannot find content that is similar in idea but different in wording. Semantic query expansion is therefore very important, and ontology is a critical foundation for supporting it. Ontologies have been used in information retrieval, data categorization, library science, and medical science; however, their use is rare in the military domain. This research has two purposes. The first is to use a “Military Dictionary” database as a foundation and combine it with formal concept analysis to automatically construct the relationships between a military ontology and vocabulary concepts. The second is to use military news from the “Defense Technology Military Database” as training data, to design a novel query expansion method, the Keyword to Formal Concept Query Expansion (K2FCQE) algorithm, and then to verify the query mode. The results verify that K2FCQE is more efficient than other query expansion methods.
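The K2FCQE algorithm itself is not detailed in the abstract, but the formal-concept-analysis core it builds on, closing a keyword set under the derivation operators, can be sketched with a toy incidence relation. The documents and terms below are invented:

```python
# Toy incidence relation: document -> set of index terms (hypothetical data)
INCIDENCE = {
    "doc1": {"missile", "radar", "defense"},
    "doc2": {"missile", "radar"},
    "doc3": {"radar", "satellite"},
}

def extent(terms):
    """Derivation operator A': documents containing every term in the set."""
    return {d for d, t in INCIDENCE.items() if terms <= t}

def intent(docs):
    """Derivation operator O': terms shared by every document in the set."""
    if not docs:
        return set()
    return set.intersection(*(INCIDENCE[d] for d in docs))

def expand_query(keywords):
    """Close the keyword set under the two operators: the expanded
    query is the intent of the extent of the keywords."""
    return intent(extent(set(keywords)))
```

The pair (extent, intent) of the closed set is a formal concept; the extra terms it brings in are exactly those that co-occur with the keyword in every matching document.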
A maximum entropy based reordering model for Mongolian-Chinese SMT with morphological information
Zhenxin Yang, Miao Li, Zede Zhu, Lei Chen, Linyu Wei, Shaoqi Wang
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973484
The different word order of Mongolian and Chinese and the scarcity of parallel corpora are the main problems in Mongolian-Chinese statistical machine translation (SMT). We propose a method that adopts morphological information as features of a maximum entropy based phrase reordering model for Mongolian-Chinese SMT. Taking advantage of Mongolian morphology, we add the Mongolian stem and affix as phrase boundary information and use a maximum entropy model to predict the reordering of neighboring blocks. To some extent, this alleviates the reordering problems caused by data sparseness. In addition, we add part-of-speech (POS) tags as further features in the reordering model. Experiments show that the approach outperforms a maximum entropy model using only boundary-word information and provides a maximum improvement of 0.8 BLEU points over the baseline.
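A sketch of the feature extraction the abstract describes: boundary words decomposed into stem and affix, plus POS tags, feeding a maximum entropy classifier that is not shown here. The '+' morpheme-boundary convention and the Mongolian examples are assumptions:

```python
def split_morph(word):
    """Toy stem/affix split; a real system would use a Mongolian
    morphological analyser (this '+' convention is an assumption)."""
    stem, _, affix = word.partition("+")
    return stem, affix

def reorder_features(left_block, right_block, pos_left, pos_right):
    """Features for the reordering model: boundary words of the two
    neighboring blocks, their stems and affixes, and their POS tags."""
    feats = []
    for name, block, pos in (("L", left_block, pos_left),
                             ("R", right_block, pos_right)):
        for side, w in (("first", block[0]), ("last", block[-1])):
            stem, affix = split_morph(w)
            feats.append(f"{name}.{side}.stem={stem}")
            if affix:
                feats.append(f"{name}.{side}.affix={affix}")
        feats.append(f"{name}.first.pos={pos[0]}")
        feats.append(f"{name}.last.pos={pos[-1]}")
    return feats
```

The classifier would map these features to a monotone/swap decision for the block pair.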
A computer-aided Chinese pronunciation training program for English-speaking learners
Yi Qin, Guonian Wang
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973499
This paper introduces a pilot study on incorporating effective teaching methods into computer-aided pronunciation training (CAPT) programs to help English-speaking learners acquire Mandarin lexical tones using speech analysis software. The study shows that CAPT programs help learners identify relevant acoustic cues and discern the four tones of Chinese, which they find hard to differentiate and imitate. Acoustic analyses comparing pre- and post-training productions, in the form of visual displays of speakers' pitch curves, reveal the nature of the learners' improvement. The acoustic images also indicate that post-training tone curves approximate native norms more closely than pre-training ones. The methodology developed here may provide a platform for more efficient Chinese learning.
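The pre/post comparison reduces to measuring how far a learner's pitch contour sits from a native reference. A minimal sketch, assuming contours have already been extracted by the speech analysis software and sampled at matching time points:

```python
import math

def rmse(contour_a, contour_b):
    """Root-mean-square distance between two equal-length pitch tracks (Hz)."""
    assert len(contour_a) == len(contour_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(contour_a, contour_b))
                     / len(contour_a))

def improvement(native, pre, post):
    """Positive when the post-training contour is closer to the native norm."""
    return rmse(native, pre) - rmse(native, post)
```

A fuller treatment would normalize for speaker pitch range (e.g., work in semitones) before comparing; that step is omitted here.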
Imperative sentences with assertive mood
Pu Li, Hao Zhao
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973478
Imperative sentences with assertive mood (ISAM), positioned between typical declarative sentences and typical imperative sentences, have the form of declarative sentences but perform the function of imperative sentences. They are characterized by verbs indicating action classification, called “performative verbs”. The paper first explains, on the basis of their formation, why an imperative sentence with assertive mood can perform an imperative function, and shows how such a sentence transforms from a declarative into an imperative sentence. Second, performative verbs are categorized and their distances from the imperative function are presented. Finally, based on the elements attached to the performative verbs, ISAM are classified into four categories.
Acoustic model merging using acoustic models from multilingual speakers for automatic speech recognition
T. Tan, L. Besacier, B. Lecouteux
Pub Date: 2014-12-04 | DOI: 10.1109/IALP.2014.6973492
Many studies have explored the use of existing multilingual speech corpora to build an acoustic model for a target language. These works on multilingual acoustic modeling often use multilingual acoustic models to create an initial model, which is often suboptimal for decoding speech of the target language; some speech of the target language is then used to adapt and improve it. In this paper, however, we investigate multilingual acoustic modeling for enhancing an acoustic model of the target language in an automatic speech recognition system. The proposed approach employs context-dependent acoustic model merging from a source language to adapt the acoustic model of a target language, where the source and target language speech are spoken by speakers from the same country. Our experiments on Malay and English automatic speech recognition show relative improvements in WER from 2% to about 10% when the multilingual acoustic model is employed.
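The reported 2-10% relative WER improvements rest on the standard word error rate computation, which can be sketched as:

```python
def wer(ref, hyp):
    """Word error rate: Levenshtein distance over word lists,
    normalized by the reference length."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

def relative_improvement(wer_baseline, wer_new):
    """Relative WER reduction, the metric behind the paper's 2-10% range."""
    return (wer_baseline - wer_new) / wer_baseline
```

For example, a baseline WER of 25% dropping to 22.5% is a 10% relative improvement.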