
Latest Publications: 2011 International Conference on Asian Language Processing

How Vietnamese Attitudes can be Recognized and Confused: Cross-Cultural Perception and Speech Prosody Analysis
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.39
Dang-Khoa Mac, E. Castelli, V. Aubergé, A. Rilliard
Prosodic attitudes, or social affects, are a central part of face-to-face interaction and are linked to language through culture. This paper presents a study of prosodic attitudes in Vietnamese, a tonal language. Perception experiments on 16 Vietnamese attitudes were carried out with Vietnamese and French participants. The results revealed perception differences between native and non-native listeners. As attitudinal expressions are partly carried by speech prosody, an analysis was also carried out to better understand why these attitudes are recognized or confused, and to bring out some prosodic characteristics of Vietnamese social affects.
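Perception results of this kind are typically summarized as a confusion matrix over intended versus perceived attitudes. A minimal sketch in Python; the attitude labels and listener responses below are invented for illustration, not taken from the paper:

```python
from collections import Counter

def confusion_matrix(responses):
    """Tally (intended, perceived) attitude pairs from listener responses."""
    counts = Counter(responses)
    attitudes = sorted({a for pair in counts for a in pair})
    return {i: {p: counts[(i, p)] for p in attitudes} for i in attitudes}

def recognition_rate(matrix, attitude):
    """Fraction of stimuli with a given intended attitude that listeners
    perceived correctly (the matrix diagonal, row-normalized)."""
    row = matrix[attitude]
    total = sum(row.values())
    return row[attitude] / total if total else 0.0

# Toy responses as (intended, perceived) pairs
responses = [
    ("irony", "irony"), ("irony", "politeness"),
    ("politeness", "politeness"), ("politeness", "politeness"),
]
m = confusion_matrix(responses)
print(recognition_rate(m, "irony"))        # 0.5
print(recognition_rate(m, "politeness"))   # 1.0
```

Off-diagonal cells of such a matrix directly show which attitude pairs listeners confuse, which is the quantity the cross-cultural comparison rests on.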
Citations: 5
An Experimental Study on Vietnamese Speech Synthesis
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.40
Liping Kui, Jian Yang, Bin He, Enxing Hu
Modern Vietnamese is a monosyllabic tonal language: each syllable can be marked with an initial, a final and a tone. In this paper, a Vietnamese speech synthesis system is realized using a trainable HMM-based synthesis method, with initials and finals as the basic synthesis units. According to the characteristics of Vietnamese, we carried out work including corpus collection, recording, labeling, determining the phoneme list, and designing the context attributes and question set. The Vietnamese speech synthesis system was then constructed using the STRAIGHT synthesizer under the HTS platform. Finally, we conducted a subjective test of the synthesized speech. Preliminary evaluation shows that the intelligibility of the utterances is approximately 100%, and the quality of the synthesized speech ranges from fair to good.
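The initial/final decomposition used as the synthesis unit inventory can be sketched as a longest-match split against a list of initials. A toy illustration, assuming a small subset of the real Vietnamese initial inventory and omitting tone handling entirely:

```python
# A small subset of Vietnamese initials, sorted longest-first so that
# multi-letter initials like "ngh" win over "ng" and "n"
INITIALS = sorted(["b", "ch", "d", "kh", "n", "ng", "ngh", "nh", "t", "th", "tr"],
                  key=len, reverse=True)

def split_syllable(syllable):
    """Split a toneless romanized syllable into (initial, final);
    vowel-initial syllables get an empty initial."""
    for ini in INITIALS:
        if syllable.startswith(ini):
            return ini, syllable[len(ini):]
    return "", syllable

print(split_syllable("nghe"))  # ('ngh', 'e')
print(split_syllable("than"))  # ('th', 'an')
print(split_syllable("an"))    # ('', 'an')
```

Sorting the inventory longest-first is what makes the greedy match correct when one initial is a prefix of another.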
Citations: 1
Exploring Both Flat and Structured Features for Number Type Identification of Chinese Personal Noun Phrases
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.69
Jun Lang
Unlike English, Chinese does not explicitly mark grammatical number by inflection; the number information in a Chinese sentence is implied by the noun phrase itself and its surrounding context. In this paper, we explore diverse features, both flat and structured, for number identification of Chinese personal noun phrases. The flat features exploit knowledge within the noun phrase, while the structured features capture the surrounding context of the noun phrase in the parse tree of the given sentence. These two kinds of features are combined with a kernel-based SVM. Evaluation on the ACE 2005 corpus shows that our method achieves 89.23% accuracy, significantly advancing the state of the art.
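The flat/structured split can be illustrated with two small feature extractors. The features below (head word, phrase length, a 们 plural-suffix check, adjacent context words) are invented stand-ins for the paper's actual feature set, and the adjacent words stand in for its parse-tree context:

```python
def flat_features(np_tokens):
    """Features drawn from inside the noun phrase itself."""
    return {
        "head=" + np_tokens[-1]: 1,
        "len=" + str(len(np_tokens)): 1,
        "has_men_suffix": int(np_tokens[-1].endswith("们")),  # e.g. 学生们 "students"
    }

def structured_features(left_context, right_context):
    """Features from the surrounding context (a flat stand-in for the
    parse-tree features the paper uses)."""
    feats = {}
    if left_context:
        feats["left=" + left_context[-1]] = 1
    if right_context:
        feats["right=" + right_context[0]] = 1
    return feats

def all_features(np_tokens, left, right):
    """Merge both feature groups into one sparse feature dict."""
    return {**flat_features(np_tokens), **structured_features(left, right)}

print(all_features(["学生们"], ["那些"], ["来了"]))
```

A sparse dict of this shape is the usual input representation for a kernel-based SVM after vectorization.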
Citations: 1
Improving Chinese Dependency Parsing with Self-Disambiguating Patterns
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.36
Likun Qiu, Lei Wu, Kai Zhao, Changjian Hu, Lingpeng Kong
To address the data sparseness problem in dependency parsing, most previous studies used features constructed from large-scale auto-parsed data. Unlike previous work, we propose a new approach that improves dependency parsing with context-free dependency triples (CDTs) extracted using self-disambiguating patterns (SDPs). The use of SDPs avoids any dependency on a baseline parser and makes it possible to explore the influence of different types of substructures one by one. Additionally, taking the available CDTs as seeds, a label propagation process tags a large number of unlabeled word pairs as CDTs. Experiments show that, when CDT features are integrated into a maximum spanning tree (MST) dependency parser, the new parser improves significantly over the baseline MST parser. Comparative results also show that CDTs with dependency relation labels perform much better than CDTs without them.
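The seed-expansion step can be sketched as majority-vote label propagation over a similarity graph of word pairs. The graph, word pairs and relation labels below are invented for illustration; the paper's propagation algorithm may differ in its weighting:

```python
from collections import Counter

def propagate(seeds, graph, iterations=10):
    """Majority-vote label propagation: unlabeled nodes repeatedly take
    the most common label among their labeled neighbours; seed labels
    never change. Stops early once no label moves."""
    labels = dict(seeds)
    for _ in range(iterations):
        changed = False
        for node, neighbours in graph.items():
            if node in seeds:
                continue
            votes = Counter(labels[n] for n in neighbours if n in labels)
            if votes:
                best = votes.most_common(1)[0][0]
                if labels.get(node) != best:
                    labels[node] = best
                    changed = True
        if not changed:
            break
    return labels

# Word pairs as nodes; edges link distributionally similar pairs
seeds = {("eat", "apple"): "dobj"}
graph = {
    ("eat", "apple"): [("eat", "bread")],
    ("eat", "bread"): [("eat", "apple")],
}
print(propagate(seeds, graph)[("eat", "bread")])  # dobj
```

Keeping seed labels fixed is what anchors the propagation to the high-precision CDTs extracted by the patterns.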
Citations: 0
Co-reference Resolution in Vietnamese Documents Based on Support Vector Machines
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.63
Duc-Trong Le, Mai-Vu Tran, Tri-Thanh Nguyen, Quang-Thuy Ha
The co-reference resolution task still poses many challenges, due to the complexity of the Vietnamese language and the lack of standard Vietnamese linguistic resources. Based on the mention-pair model of Rahman and Ng (2009) and the characteristics of Vietnamese, this paper proposes a model using support vector machines (SVM) to resolve co-reference in Vietnamese documents. The corpus used to evaluate the proposed model was constructed from 200 articles in the cultural and social categories of the vnexpress.net newspaper website. Initial experiments with the proposed model achieved 76.51% accuracy, compared with 73.79% for a baseline model with similar features.
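In a mention-pair model, training instances are built by pairing each mention with its preceding mentions and labeling a pair positive when both refer to the same entity. A minimal sketch; the mentions and the two features are invented for illustration, not the paper's feature set:

```python
def mention_pairs(mentions):
    """Build mention-pair instances: each mention is paired with every
    preceding mention; the label is 1 when the two corefer."""
    instances = []
    for j in range(1, len(mentions)):
        for i in range(j):
            ante, ana = mentions[i], mentions[j]
            feats = {
                "distance": j - i,
                "same_text": int(ante["text"] == ana["text"]),
            }
            label = int(ante["entity"] == ana["entity"])
            instances.append((feats, label))
    return instances

mentions = [
    {"text": "Nguyen Van A", "entity": 1},
    {"text": "ông", "entity": 1},   # "he"
    {"text": "Hà Nội", "entity": 2},
]
pairs = mention_pairs(mentions)
print(len(pairs))                     # 3
print([label for _, label in pairs])  # [1, 0, 0]
```

The resulting (features, label) pairs are exactly the binary-classification instances an SVM is trained on.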
Citations: 2
Lexical Word Similarity for Re-ranking in Vietnamese-English Named Entity Back Transliteration
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.44
Diem Thi Hoang Le, AiTi Aw
Transliteration is the transformation of a word in the original language into another language based on its pronunciation. Back transliteration transforms an already transliterated word in another language back to its original form. This backward process is inherently more challenging than the forward direction because more information is lost. In many cases, back transliteration can return a near-exact result that differs from the original word form only in minor spelling details. In this work we propose a lexical word similarity for dictionary matching, in order to re-rank the candidates and improve the performance of grapheme-based back transliteration of location names. The method was evaluated on the Vietnamese-English language pair and showed improvement.
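One natural lexical word similarity for dictionary matching is edit distance. The sketch below re-ranks back-transliteration candidates by their minimum Levenshtein distance to any dictionary entry; this is an illustrative assumption, as the paper's own similarity measure may differ:

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming,
    keeping only one row of the DP table at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def rerank(candidates, dictionary):
    """Order candidates by their minimum edit distance to any
    dictionary entry (closer = better)."""
    return sorted(candidates,
                  key=lambda c: min(edit_distance(c, d) for d in dictionary))

cands = ["Londen", "Lonton", "London"]
print(rerank(cands, ["London", "Paris"])[0])  # London
```

Because back transliteration often differs from the true form by only a character or two, even this simple distance tends to promote the correct candidate to the top.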
Citations: 1
Corpus Based Extractive Document Summarization for Indic Script
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.66
P. Reddy, B. V. Vardhan, A. Govardhan
Summarization is the process of generating a condensed form of a given text document that retains its information and overall meaning. Document summarization approaches are broadly classified into two types: extractive and abstractive. In this paper, we perform single-document summarization of Telugu text documents using an extractive approach. Although many document surface features exist, we consider those that extensively cover the original document and generate summaries with little redundancy: sentence position, sentence similarity with the title, sentence centrality, and word frequency. To strengthen these features, we use a corpus of 3000 documents and perform preprocessing steps such as stop-word elimination and stemming to retain the more meaningful words within each sentence. Sentences are ranked by scores computed from all four features simultaneously with optimum weights, which are learned with the help of human-constructed summaries. The machine-generated summaries are evaluated using the F1 measure, followed by human judgements.
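The four-feature scoring scheme can be sketched as a weighted sum per sentence. The weights and toy feature definitions below are illustrative assumptions; the paper learns its weights from human summaries and works on preprocessed Telugu text:

```python
import re
from collections import Counter

def score_sentences(sentences, title, weights=(0.3, 0.3, 0.2, 0.2)):
    """Weighted sum of position, title similarity, centrality and
    average word frequency for each sentence."""
    tokenize = lambda s: re.findall(r"\w+", s.lower())
    toks = [set(tokenize(s)) for s in sentences]
    title_toks = set(tokenize(title))
    freq = Counter(w for t in toks for w in t)
    n = len(sentences)
    w1, w2, w3, w4 = weights
    scores = []
    for i, t in enumerate(toks):
        position = 1 - i / n  # earlier sentences score higher
        title_sim = len(t & title_toks) / (len(title_toks) or 1)
        centrality = sum(len(t & u) for j, u in enumerate(toks) if j != i) / max(n - 1, 1)
        word_freq = sum(freq[w] for w in t) / (len(t) or 1)
        scores.append(w1 * position + w2 * title_sim + w3 * centrality + w4 * word_freq)
    return scores

def summarize(sentences, title, k=1):
    """Return the k top-scoring sentences in document order."""
    scores = score_sentences(sentences, title)
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]

sents = ["Telugu summarization works.", "Weather is nice."]
print(summarize(sents, "Telugu summarization"))  # ['Telugu summarization works.']
```

Returning the selected sentences in document order, rather than score order, keeps the extract readable.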
Citations: 8
A Study of the Classification and Arrangement Rule of Uygur Morphemes for Information Processing
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.50
Pu Li, Shuzhen Shi
In processing a modern Uygur corpus, it is necessary to carry out a word-character marking study at the word level within modern Uygur language data. Since the classification of morphemes serves word-character marking, this article classifies Uygur morphemes by their functions and lists all of their classifications and arrangement rules.
Citations: 0
Polarity Shifting: Corpus Construction and Analysis
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.27
Xiaoqian Zhang, Shoushan Li, Guodong Zhou, Hongxia Zhao
Polarity shifting has been a challenge for automatic sentiment classification. In this paper, we create a corpus consisting of polarity-shifted sentences from various kinds of product reviews, in which both the sentiment words and the shifting trigger words are annotated. Furthermore, we analyze all the polarity-shifted sentences and categorize them into five categories: opinion itself, holder, target, time, and hypothesis. Experimental study shows the annotation agreement and the distribution of the five categories of polarity shifting.
Citations: 10
Sentence Boundary Detection in Colloquial Arabic Text: A Preliminary Result
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.38
A. Al-Subaihin, Hend Suliman Al-Khalifa, A. Al-Salman
Recently, natural language processing tasks have increasingly been conducted over online content. This poses a particular problem for applications in Arabic. Online Arabic content is usually written in informal colloquial Arabic, which tends to be ill-structured and lacks specific linguistic standardization. In this paper, we investigate a preliminary step toward successful NLP processing: the problem of sentence boundary detection. As informal Arabic lacks basic linguistic rules, we establish a list of commonly used punctuation marks after extensively studying a large amount of informal Arabic text. We then evaluate the correct usage of these punctuation marks as sentence delimiters; the result yielded a preliminary accuracy of 70%.
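Once a delimiter list is fixed, splitting text into sentences is a small step. The delimiter set below (the Arabic question mark and semicolon plus the Latin full stop, exclamation and question marks) is an assumption for illustration; the paper derives its actual list from its corpus study:

```python
import re

# Toy delimiter set: Arabic question mark (U+061F), Arabic semicolon
# (U+061B) and the Latin . ! ? marks; an illustrative assumption only
SENTENCE_END = re.compile(r"[.!?\u061F\u061B]+\s*")

def split_sentences(text):
    """Split text on the delimiter set, dropping empty fragments."""
    return [part.strip() for part in SENTENCE_END.split(text) if part.strip()]

print(split_sentences("كيف حالك؟ أنا بخير. شكرا"))
# ['كيف حالك', 'أنا بخير', 'شكرا']
```

The `+` in the pattern collapses runs of repeated punctuation (common in informal text, e.g. "!!!") into a single boundary.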
Citations: 7