Unlike English, Chinese does not mark grammatical number explicitly through inflection. Number information in a Chinese sentence is implied by the noun phrase itself and its surrounding context. In this paper, we explore diverse features, both flat and structured, for number identification of Chinese personal noun phrases. The flat features capture knowledge within the noun phrase, while the structured features capture the surrounding context of the noun phrase in the parse tree of the given sentence. Both kinds of features are used with a kernel-based SVM. Evaluation on the ACE 2005 corpus shows that our method achieves an accuracy of 89.23%, which significantly advances the state of the art.
"Exploring Both Flat and Structured Features for Number Type Identification of Chinese Personal Noun Phrases." Jun Lang. 2011 International Conference on Asian Language Processing, 2011-11-15. https://doi.org/10.1109/IALP.2011.69
This paper improves an unsupervised method for extracting parallel sentence pairs from a comparable corpus by using triangulation through a third language. The unsupervised method, proposed previously, is based on cross-language information retrieval with an iterative process and requires no additional parallel data. It has been validated on Vietnamese-French and Vietnamese-English bilingual data. In this paper, we use triangulation through a third language to improve the parallel data mining process: English is used in the Vietnamese-French mining process, and French is used in the Vietnamese-English mining process. The experiments show that triangulation can improve both the quality of the extracted data and the quality of the translation system.
"Mining Parallel Data from Comparable Corpora via Triangulation." T. Do, E. Castelli, L. Besacier. 2011 International Conference on Asian Language Processing, 2011-11-15. https://doi.org/10.1109/IALP.2011.57
Likun Qiu, Lei Wu, Kai Zhao, Changjian Hu, Lingpeng Kong
To address the data sparseness problem in dependency parsing, most previous studies used features constructed from large-scale auto-parsed data. Unlike previous work, we propose a new approach that improves dependency parsing with context-free dependency triples (CDTs) extracted using self-disambiguating patterns (SDPs). Using SDPs makes it possible to avoid dependence on a baseline parser and to explore the influence of different types of substructures one by one. Additionally, taking the available CDTs as seeds, a label propagation process is used to tag a large number of unlabeled word pairs as CDTs. Experiments show that, when CDT features are integrated into a maximum spanning tree (MST) dependency parser, the new parser improves significantly over the baseline MST parser. Comparative results also show that CDTs with dependency relation labels perform much better than CDTs without dependency relation labels.
"Improving Chinese Dependency Parsing with Self-Disambiguating Patterns." Likun Qiu, Lei Wu, Kai Zhao, Changjian Hu, Lingpeng Kong. 2011 International Conference on Asian Language Processing, 2011-11-15. https://doi.org/10.1109/IALP.2011.36
Modern Vietnamese is a monosyllabic tonal language. Each syllable can be described by an initial, a final and a tone. In this paper, a Vietnamese speech synthesis system is built using a trainable HMM-based speech synthesis method. The basic synthesis units of the system are initials and finals. Following the characteristics of Vietnamese, we collected a corpus, recorded and labeled it, determined the phoneme list, and designed the context attributes and question set. The system is then constructed using the STRAIGHT synthesizer on the HTS platform. Finally, we conduct a subjective test on the synthetic speech signals. Preliminary evaluation results show that the intelligibility of the utterances is approximately 100%, and that the quality of the synthesized speech ranges from fair to good.
"An Experimental Study on Vietnamese Speech Synthesis." Liping Kui, Jian Yang, Bin He, Enxing Hu. 2011 International Conference on Asian Language Processing, 2011-11-15. https://doi.org/10.1109/IALP.2011.40
Dang-Khoa Mac, E. Castelli, V. Aubergé, A. Rilliard
Prosodic attitudes, or social affects, are a main part of face-to-face interaction and are linked to language through culture. This paper presents a study of prosodic attitudes in Vietnamese, a tonal language. Perception experiments on 16 Vietnamese attitudes were carried out with Vietnamese and French participants. The results revealed perception differences between native and non-native listeners. As attitudinal expressions are partially carried by speech prosody, an analysis was also carried out to better understand why these attitudes are recognized or confused, and to bring out some prosodic characteristics of Vietnamese social affects.
"How Vietnamese Attitudes can be Recognized and Confused: Cross-Cultural Perception and Speech Prosody Analysis." Dang-Khoa Mac, E. Castelli, V. Aubergé, A. Rilliard. 2011 International Conference on Asian Language Processing, 2011-11-15. https://doi.org/10.1109/IALP.2011.39
Transliteration is the transformation of a word in its original language into another language based on its pronunciation. Back transliteration is the transformation of an already transliterated word back to its original form. This backward process is inherently more challenging than the forward direction because more information is lost. In many cases, back transliteration returns an almost exact result that differs only slightly in spelling from the original word form. In this work, we propose a lexical word similarity for dictionary matching to re-rank the candidates and enhance the performance of grapheme-based back transliteration of location names. The method was tested on the Vietnamese-English language pair and showed improvement.
"Lexical Word Similarity for Re-ranking in Vietnamese-English Named Entity Back Transliteration." Diem Thi Hoang Le, AiTi Aw. 2011 International Conference on Asian Language Processing, 2011-11-15. https://doi.org/10.1109/IALP.2011.44
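The re-ranking idea in the back-transliteration abstract above can be illustrated with a minimal sketch. The abstract does not define its lexical similarity measure, so this example substitutes the ratio from Python's standard `difflib.SequenceMatcher`; the function names, candidate strings, scores and dictionary entries are all hypothetical.

```python
from difflib import SequenceMatcher


def lexical_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; a stand-in measure, not the one used in the paper."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rerank(candidates, dictionary):
    """Re-rank (candidate, model_score) pairs by their closest dictionary entry.

    Returns (dictionary_entry, similarity, model_score) tuples, best first.
    """
    scored = []
    for cand, model_score in candidates:
        # Find the dictionary word that is lexically closest to this candidate.
        best = max(dictionary, key=lambda w: lexical_similarity(cand, w))
        scored.append((best, lexical_similarity(cand, best), model_score))
    # Sort by similarity first, then by the original model score as a tie-breaker.
    return sorted(scored, key=lambda t: (t[1], t[2]), reverse=True)


# Hypothetical back-transliteration output with slightly misspelled candidates.
candidates = [("Singapor", 0.6), ("Sinhgapo", 0.7)]
dictionary = ["Singapore", "Shanghai", "Sydney"]
top = rerank(candidates, dictionary)[0]
```

Here the candidate with the lower model score ranks first because it sits closer to a dictionary entry, which is the kind of correction a dictionary-matching step is meant to provide.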
Sophia Yat-Mei Lee, Daming Dai, Shoushan Li, K. Ahrens
Sentiment and emotion analysis have been traditionally established as independent research topics in NLP. Although they are two important aspects of subjective information and are closely related, there have been few attempts to combine the two analyses. As a preliminary attempt, we integrate emotion information into sentiment analysis by employing emotion keywords to help automatically extract pseudo-labeled samples. The extracted pseudo-labeled samples are then used as the initial training data to perform semi-supervised learning for sentiment classification. Experimental results across four domains show that our approach using emotion keywords is capable of extracting pseudo-labeled samples with high precision (about 90%). Moreover, the pseudo-labeled samples along with the semi-supervised learning approach further improve the classification performance.
"Extracting Pseudo-Labeled Samples for Sentiment Classification Using Emotion Keywords." Sophia Yat-Mei Lee, Daming Dai, Shoushan Li, K. Ahrens. 2011 International Conference on Asian Language Processing, 2011-11-15. https://doi.org/10.1109/IALP.2011.61
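As a rough illustration of the keyword-based extraction step described in the sentiment abstract above, the sketch below assigns a pseudo-label only when a text contains emotion keywords of a single polarity. The keyword lists, function names and example reviews are hypothetical, and the decision rule is an assumption; the abstract does not specify the actual keyword inventory or rule.

```python
import re

# Hypothetical emotion keyword lists; the paper's actual lists are not given.
POSITIVE_EMOTION = {"happy", "delighted", "love"}
NEGATIVE_EMOTION = {"angry", "disappointed", "sad"}


def pseudo_label(text):
    """Return 'positive', 'negative', or None when the evidence is absent or mixed."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    has_pos = bool(tokens & POSITIVE_EMOTION)
    has_neg = bool(tokens & NEGATIVE_EMOTION)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return None  # no keyword, or conflicting keywords: leave unlabeled


def extract_pseudo_labeled(texts):
    """Keep only the texts that receive a pseudo-label."""
    labeled = []
    for text in texts:
        label = pseudo_label(text)
        if label is not None:
            labeled.append((text, label))
    return labeled


reviews = [
    "I am so happy with this phone!",
    "Very disappointed by the battery.",
    "The screen is 5 inches wide.",
    "I was sad at first but now I love it.",
]
seed_data = extract_pseudo_labeled(reviews)
```

Discarding texts with mixed or missing keywords trades coverage for precision, which suits the role of these samples as initial training data for a semi-supervised learner.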
For the Chinese word segmentation and POS tagging problem, both character-based and word-based discriminative approaches can be used. Experiments show that these two approaches make different errors and can complement each other. In this paper, we propose a joint decoding model that combines character-based and word-based models using a multi-beam search algorithm. Experimental results show that the joint decoding model outperforms both the character-based and the word-based baseline models.
"Joint Decoding for Chinese Word Segmentation and POS Tagging Using Character-Based and Word-Based Discriminative Models." Xinxin Li, Xuan Wang, Lin Yao. 2011 International Conference on Asian Language Processing, 2011-11-15. https://doi.org/10.1109/IALP.2011.24
The paper abandons a strict two-way sub-classification of intransitive verbs into unaccusative and unergative for Hindi and instead plots their distribution in a diffusion chart. The diagnostic tests that Bhatt (2003) applied to Hindi data are ranked by how effectively they assign the correct sub-class to verbs. The diffusion chart shows that a tripartite classification handles intransitive verbs better than the classical binary approach. The tripartite classification is as follows: (1) verbs that take an animate subject and are compatible with adverbs of volitionality; (2) verbs that take an animate subject but are not compatible with adverbs of volitionality; and (3) verbs that take an inanimate subject. The classification is of great advantage for NLP tasks such as machine translation and natural language generation.
"Issues with the Unergative/Unaccusative Classification of the Intransitive Verbs." Nitesh Surtani, Khushboo Jha, Soma Paul. 2011 International Conference on Asian Language Processing, 2011-11-15. https://doi.org/10.1109/IALP.2011.54
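The tripartite classification listed in the abstract above reduces to two yes/no properties of a verb, which the following sketch encodes directly. The class and field names are hypothetical, and the verb entries are placeholders rather than real Hindi judgments.

```python
from dataclasses import dataclass


@dataclass
class IntransitiveVerb:
    lemma: str
    animate_subject: bool           # does the verb take an animate subject?
    allows_volitional_adverb: bool  # is it compatible with an adverb of volitionality?


def classify(verb: IntransitiveVerb) -> int:
    """Return the class number (1, 2 or 3) from the tripartite scheme."""
    if not verb.animate_subject:
        return 3  # inanimate subject
    if verb.allows_volitional_adverb:
        return 1  # animate subject, compatible with volitional adverbs
    return 2      # animate subject, not compatible with volitional adverbs


# Placeholder entries, not actual linguistic judgments.
verbs = [
    IntransitiveVerb("verb_a", animate_subject=True, allows_volitional_adverb=True),
    IntransitiveVerb("verb_b", animate_subject=True, allows_volitional_adverb=False),
    IntransitiveVerb("verb_c", animate_subject=False, allows_volitional_adverb=False),
]
classes = {v.lemma: classify(v) for v in verbs}
```

Checking animacy first mirrors the wording of the scheme, in which compatibility with volitional adverbs only distinguishes verbs that take an animate subject.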
Studies on acoustic space have strengthened the view that vowels are acoustically and perceptually defined in terms of their relative positioning in vowel space. Every speaker identifies an optimal vowel space within which perceptual and phonological contrast is maintained. This is an interdisciplinary study involving speech pathology, the physics of speech and the neurology of speech. Two case studies of dysarthria are presented in this paper: one case of Parkinson's disease (PD) and one case of acute ischemic stroke, with controls matched for age, gender and language. A detailed acoustic analysis shows that acoustic space is considerably reduced in both PD and stroke, and that in these two very different kinds of dysarthria the acoustic space is also modified very differently. The study also examines the third formant to show that the higher formants are consistently lowered in both PD and stroke. Hypokinetic speech production in these cases is reflected in lower intensity. The results have significant applications in clinical acoustics and in the theoretical fields of neurology of speech, linguistics and phonology.
"Acoustic Space in Motor Disorders of Speech: Two Case Studies." Vaishna Narang, Deepshikha Misra, Garima Dalal. 2011 International Conference on Asian Language Processing, 2011-11-15. https://doi.org/10.1109/IALP.2011.25
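One common way to quantify a reduction in acoustic space is the area of the polygon spanned by corner vowels in the F1/F2 plane. The sketch below computes that area with the shoelace formula. This is an assumption about how such a reduction could be measured, not necessarily the procedure used in the study, and all formant values are invented for illustration.

```python
def vowel_space_area(corners):
    """Polygon area (Hz^2) for (F1, F2) points listed in order around the polygon."""
    n = len(corners)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical (F1, F2) values in Hz for the corner vowels /i/, /a/, /u/.
control = [(300, 2300), (750, 1300), (320, 800)]
patient = [(350, 2000), (650, 1350), (360, 1000)]

control_area = vowel_space_area(control)
patient_area = vowel_space_area(patient)
reduction = 1.0 - patient_area / control_area  # fraction of vowel space lost
```

A single area figure cannot show how the space is reshaped, so a comparison of two dysarthria types would also need the individual vowel positions.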