
2013 International Conference on Asian Language Processing: Latest Papers

New Language Resources for Arabic: Corpus Containing More Than Two Million Words and a Corpus Processing Tool
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.21
A. Al-Thubaity, Marwa Khan, Manal Al-Mazrua, Maram Al-Mousa
Arabic is a resource-poor language relative to other languages with a similar number of speakers. This situation negatively affects corpus-based linguistic studies in Arabic and, to a lesser extent, Arabic language processing. This paper presents a brief overview of recent freely available Arabic corpora and corpus processing tools, and it examines some of the issues that may be preventing Arabic linguists from using them. These issues reveal the need for new language resources to enrich and foster Arabic corpus-based studies. Accordingly, this paper introduces the design of a new Arabic corpus that covers Modern Standard Arabic varieties, is based on newspapers from all Arab countries, and comprises more than two million words. It also describes the main features of a corpus processing tool specifically designed for Arabic, called "Khawas غواص" ("diver" in English). Khawas provides more features than any other freely available corpus processing tool for Arabic, including n-gram frequency and concordance, collocations, and statistical comparison of two corpora. Finally, we outline modifications and improvements that could be made in future work.
Citations: 26
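The n-gram frequency and concordance features named in the abstract above can be illustrated with a minimal sketch. This is not Khawas itself: the function names, the whitespace tokenization, and the toy English text are all assumptions made for illustration, and real Arabic text would need proper tokenization and normalization.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-token sequence in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def concordance(tokens, keyword, width=2):
    """Return keyword-in-context rows: (left context, keyword, right context)."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


# Toy whitespace-tokenized text (hypothetical stand-in for a corpus file).
tokens = "the diver saw the reef and the diver surfaced".split()
bigrams = ngram_counts(tokens, 2)
kwic = concordance(tokens, "diver")
```

Here `bigrams` maps each two-word tuple to its count, and `kwic` lists each occurrence of the keyword with up to `width` tokens of context on either side.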
Sentiment Classification with Polarity Shifting Detection
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.44
Shoushan Li, Zhongqing Wang, Sophia Yat-Mei Lee, Chu-Ren Huang
Sentiment classification is now an active research topic in the natural language processing community, and the bag-of-words based machine learning approach is the state of the art for this task. However, one important phenomenon, called polarity shifting, remains unsolved in the bag-of-words model, which sometimes makes the machine learning approach fail. In this study, we aim to perform sentiment classification with full consideration of the polarity shifting phenomenon. First, we extract rules for detecting polarity shifting of sentiment words from a corpus of polarity-shifted sentences. Second, we use the detection rules to detect the polarity-shifted words in the test data. Third, a novel term counting-based classifier is designed by fully considering those polarity-shifted words. Evaluation shows that the novel term counting-based classifier significantly improves the performance of sentiment analysis across five domains. Furthermore, when this classifier is combined with a machine learning based classifier, the combined classifier yields better performance than either of them alone.
Citations: 29
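The idea of a term-counting classifier that accounts for polarity-shifted words, as described in the abstract above, might look roughly like the following sketch. The lexicons, the shifter list, and the fixed look-back window are hypothetical stand-ins: the paper extracts its detection rules from a corpus, whereas this example hard-codes a single negation rule.

```python
POSITIVE = {"good", "great", "love"}   # hypothetical toy lexicon
NEGATIVE = {"bad", "poor", "hate"}     # hypothetical toy lexicon
SHIFTERS = {"not", "never", "no"}      # hypothetical stand-in for learned detection rules


def classify(tokens, window=2):
    """Term counting: +1 per positive word, -1 per negative word.

    A word's sign is flipped when a shifter occurs among the `window`
    tokens immediately before it.
    """
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity and any(t in SHIFTERS for t in tokens[max(0, i - window):i]):
            polarity = -polarity
        score += polarity
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

For example, `classify("this is not good".split())` counts "good" as negative because "not" falls inside the look-back window.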
Topic and Its Negation in Chinese Sentences
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.10
Lin He, Qiong Peng
There are two major views on the generation of sentential topics in Chinese, some of which are moved from a syntactic position. Disagreement arises over the so-called dangling topics. One view contends that dangling topics are moved and thematically related to a position inside the comment; the other holds that they are base-generated and licensed by the non-empty set resulting from the intersection of the topic set and the set generated by the semantic variable in the comment. Both views help interpret the negation of topics in Chinese sentential negation from the perspectives of syntax, semantics and pragmatics. It is suggested that the topic can be negated when the variable or related element in the comment is co-referential with the topic or has no definite referent.
Citations: 0
Use of PLP Cepstral Features for Phonetic Segmentation
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.47
Bhavik B. Vachhani, H. Patil
Phonetic segmentation has potential applications in Text-to-Speech (TTS) synthesis and Automatic Speech Recognition (ASR) systems. In this paper, we propose the use of Perceptual Linear Prediction Cepstral Coefficients (PLPCC) features for the phonetic segmentation task. To detect phonetic boundaries, we use the spectral transition measure (STM). Using the proposed approach, we achieve 85 % accuracy for a 20 ms agreement duration (i.e., 3 % better than the state-of-the-art Mel-frequency Cepstral Coefficients (MFCC)) and a 15 % over-segmentation rate (i.e., 8 % less than MFCC) for automatic detection of 234,925 phone boundaries corresponding to the 630 speakers of the entire TIMIT database.
Citations: 7
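The spectral transition measure mentioned in the abstract above is commonly defined as the mean, over feature dimensions, of the squared regression slope of each coefficient across a short window of frames, with boundary candidates taken at its local maxima. The sketch below uses that common form as an assumption; the abstract does not give the exact formulation used in the paper. Feature extraction is not shown: `frames` is assumed to hold precomputed per-frame vectors (PLPCC or otherwise), and the function names and toy data are hypothetical.

```python
def spectral_transition_measure(frames, half_window=2):
    """Return the STM for every frame that has a full window around it.

    frames: list of equal-length feature vectors, one per frame.
    For each coefficient, a least-squares slope is fitted over
    2 * half_window + 1 frames; the STM is the mean squared slope.
    """
    dim = len(frames[0])
    offsets = range(-half_window, half_window + 1)
    norm = sum(k * k for k in offsets)
    stm = []
    for n in range(half_window, len(frames) - half_window):
        slopes = [sum(k * frames[n + k][i] for k in offsets) / norm
                  for i in range(dim)]
        stm.append(sum(a * a for a in slopes) / dim)
    return stm


def peak_indices(values):
    """Indices of local maxima (rising on the left, not rising on the right)."""
    return [j for j in range(1, len(values) - 1)
            if values[j - 1] < values[j] >= values[j + 1]]


# Toy input: six frames of one "phone" followed by six frames of another.
frames = [[0.0, 0.0]] * 6 + [[1.0, 2.0]] * 6
stm = spectral_transition_measure(frames, half_window=2)
boundaries = [j + 2 for j in peak_indices(stm)]  # shift back to frame indices
```

On this toy input the STM is zero inside each steady region and peaks around the change between frames 5 and 6, which is where a boundary would be hypothesized.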
A Method for Network Topic Attention Forecast Based on Feature Words
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.61
Chunlei Yan, Shumin Shi, Heyan Huang, Ruijing Li
The number of people who obtain information and express ideas via the Internet is increasing rapidly. Research on identifying how much attention is paid to a given online topic plays an important role in the field of public opinion management. In this paper, we propose a method to predict netizens' attention to a specific online topic. First, we acquire the attention-degrees of historical topics by analyzing news, reviews and forum posts, then build up the Feature Words Set (FWS) and estimate the popularity of each feature word. After that, we extract the feature words from a new topic and evaluate their contribution to it. Finally, the new attention-degree is computed by comparing the new topic's feature words with those in the FWS. We compare our method with the Support Vector Regression model on a data set of manually selected topics. Experimental results show that our approach is acceptable for predicting the attention-degree of online topics.
Citations: 1
Research on Prosody Features of Mongolian Traditional Folk Long Song
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.15
Yasheng Jin, Wenmin Liu
After labeling and extracting parameters from the voice signals of the Mongolian Long Song 'The rich and vast Alashan', the paper analyzes the prosody features of Mongolian traditional folk Long Song from two perspectives: 1) tone characteristics, exploring the main prosody parameters such as pitch, energy and time; 2) speech production characteristics, explained on the basis of formant and trill characteristics.
Citations: 0
Speech Summarization without Lexical Features for Mandarin Presentation Speech
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.48
Jian Zhang, Huaqiang Yuan
We present the first known empirical study on speech summarization without lexical features for Mandarin presentation speeches. We evaluate acoustic, lexical and structural features as predictors of summary sentences. We find that the summarizer yields good performance, with an average F-measure of 0.625, even when using only the combination of acoustic and structural features, which are independent of lexical features. In addition, we show that our summarizer performs surprisingly well, with an average F-measure of 0.513, when using only acoustic features. These findings enable us to summarize speech without placing a stringent demand on speech recognition accuracy.
Citations: 4
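The F-measure figures quoted in the abstract above can be read against a small worked sketch. The abstract does not say how the score was computed, so the version below assumes a balanced F-measure (F1) over the indices of sentences selected for the summary versus a reference summary; the function name and inputs are hypothetical.

```python
def f_measure(selected, reference):
    """Balanced F-measure between selected and reference sentence indices."""
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(selected)   # share of selected sentences that are correct
    recall = overlap / len(reference)     # share of reference sentences that were found
    return 2 * precision * recall / (precision + recall)
```

For instance, selecting sentences 1 to 4 against a reference of sentences 2, 4 and 6 gives precision 1/2, recall 2/3 and an F-measure of 4/7.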
A Unified Model for Joint Chinese Word Segmentation and POS Tagging with Heterogeneous Annotation Corpora
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.64
Jiayi Zhao, Xipeng Qiu, Xuanjing Huang
Chinese word segmentation and part-of-speech tagging (S&T) are fundamental steps for more advanced Chinese language processing tasks. Recently, exploiting heterogeneous annotation corpora for Chinese S&T has attracted growing research interest. In this paper, we propose a unified model for Chinese S&T with heterogeneous annotation corpora. We first automatically construct a loose and uncertain mapping between two representative heterogeneous corpora, Penn Chinese Treebank (CTB) and PKU's People's Daily (PPD). Then we regard Chinese S&T with heterogeneous corpora as two "related" tasks and train our unified model on the two heterogeneous corpora simultaneously. Experiments show that our unified model can boost performance on both heterogeneous corpora by using the shared information, and that it achieves significant improvements over the state-of-the-art methods.
Citations: 3
Exploiting Hierarchical Discourse Structure for Review Sentiment Analysis
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.42
Fei Wang, Yunfang Wu
The overall sentiment of a text is critically affected by its discourse structure. For the first time, this paper incorporates hierarchical discourse structure into an unsupervised sentiment analysis framework. Experimental results show that integrating discourse structure improves the performance of sentiment analysis by 1.9 percentage points (from 85.1% to 87.0%), demonstrating the effectiveness of exploiting discourse structure for sentiment analysis.
Citations: 8
Improvements in Statistical Phrase-Based Interactive Machine Translation
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.27
Dongfeng Cai, Hua Zhang, Na Ye
State-of-the-art Machine Translation (MT) systems are still far from perfect. An alternative is the so-called Interactive Machine Translation (IMT). In this paper, we present some novel methods to improve statistical phrase-based IMT. We use dynamic distortion limitation to balance the requirements of long-distance reordering and decoding speed. We introduce a difference function into translation hypothesis extension as a heuristic function, to make the final translation candidates as diverse as possible. We also use the user-validated prefix to direct the word selection of the suffix, based on a word co-occurrence model. All these methods aim at optimizing the first N-best candidate translations and at reducing the cognitive burden on users. The experimental results show the effectiveness of our methods.
Citations: 3