2013 International Conference on Asian Language Processing: Latest Papers

Analysis and Evaluation of Terminology Translation Consistency in Scientific and Technical Literature
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.25
Baosheng Yin, Xiaodong Yue, Dongfeng Cai, Guiping Zhang
In large-scale translation of scientific and technical literature that involves many translators, inconsistent translation of the same term is inevitable. This paper first carries out a comprehensive analysis of terminology translation inconsistency and finds that most cases are translations with the same meaning but different wording, which harms the readability of the whole document. We then propose a semantic similarity-based method to identify this category of inconsistency, select the translation with the highest web frequency using Internet search engines, and unify the variants. Finally, we evaluate the improved (post-edited) translation with two indexes: precision and consistency of terminology translation. The experiment uses a 100,000-word patent document (English-Chinese human translation). The consistency index is 0.494 for the original translation and 0.763 for the processed translation. The results indicate that the method effectively improves terminology translation consistency while replacing terms correctly, and thus significantly improves the readability of the translation.
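The abstract does not define its consistency index, so the following is only a minimal sketch under an assumed definition: the share of term occurrences that use the dominant translation of their source term. The function names, the `preferred` mapping (a stand-in for the web-frequency choice), and the sample terms are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def consistency_index(pairs):
    """Assumed index: occurrences using each term's dominant translation / all occurrences.

    pairs is an iterable of (source_term, translation) occurrences.
    """
    by_term = defaultdict(Counter)
    for source, target in pairs:
        by_term[source][target] += 1
    total = sum(sum(counts.values()) for counts in by_term.values())
    if total == 0:
        return 0.0
    dominant = sum(max(counts.values()) for counts in by_term.values())
    return dominant / total


def unify(pairs, preferred):
    """Replace each translation with the preferred one for its source term, if any."""
    return [(source, preferred.get(source, target)) for source, target in pairs]
```

With four occurrences where "bolt" is rendered twice as 螺栓 and once as 螺钉, and "nut" once as 螺母, this assumed index is 3/4 before unification and 1.0 after mapping every "bolt" to 螺栓.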
Citations: 1
A Purely Monotonic Approach to Machine Translation for Similar Languages
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.31
Ye Kyaw Thu, A. Finch, E. Sumita, Y. Sagisaka
This paper investigates the effect of taking a strictly monotonic approach to machine translation for a restricted set of suitable language pairs. We studied the effect of decoding monotonically for a set of language pairs that have similar word order characteristics and found that for some language pairs, namely those where both languages are in SOV order, there was almost no difference in machine translation quality. The results of this experiment motivated the extension of the monotonic approach into the alignment stage of training. We used a Bayesian non-parametric aligner that has been shown to outperform GIZA++ in combination with the grow-diag-final-and heuristic on transliteration data. Our results show that the monotonic aligner was able to match the performance of the GIZA++ baseline, and gains in translation performance were obtained by integrating both aligners into the systems.
Citations: 0
NLP-Oriented Study on the Imperative Sentence with Interrogative Mood
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.18
Pu Li, Hao Zhao
The imperative sentence with interrogative mood (ISIM) is a transitional category between the imperative sentence and the interrogative sentence. It has both the form of an interrogative sentence and the information-transferring function of an imperative sentence. As a result, the classification and meaning of ISIM are disputed in Natural Language Processing research and applications. This paper makes three claims. First, an ISIM has two parts: an imperative center, which transfers the imperative information, and an interrogative structure, which makes the whole sentence more euphemistic. Second, there are four formal types of ISIM: the attached imperative sentence, the imperative sentence of positive and negative, the imperative sentence of right and wrong, and the rhetorical imperative sentence. Third, the semantics and type of the interrogative structure affect the semantics of the whole sentence.
Citations: 1
Tibetan Word Segmentation Based on Word-Position Tagging
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.74
Caijun Kang, Di Jiang, Congjun Long
The main advantage of word-position-based Tibetan word segmentation is that it reduces segmentation errors for unknown words. In this article, the authors extend the usual 4-tag set to a 6-tag set to fit the features of Tibetan characters, use a CRF as the tagging model to train and test on corpus data, and then build post-processing modules to revise the results. The experimental results show that this method performs well and deserves further study, including expanding the corpus and optimizing the tag set and feature templates.
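The abstract does not list the six tags, so the sketch below assumes the commonly used set B, B2, B3, M, E, S (first, second, and third unit of a word, any later middle unit, the last unit, and a single-unit word). It only shows how a segmented sentence maps to position tags and back; the CRF training and the post-processing modules are out of scope, and the function names and the choice of syllables as tagging units are hypothetical.

```python
HEAD_TAGS = ["B", "B2", "B3"]  # assumed tags for the first three units of a word


def tag_word(units):
    """Return one position tag per unit (e.g. syllable) of a single word."""
    n = len(units)
    if n == 0:
        return []
    if n == 1:
        return ["S"]
    heads = HEAD_TAGS[: min(n - 1, 3)]
    middles = ["M"] * max(n - 4, 0)
    return heads + middles + ["E"]


def tag_sentence(words):
    """Flatten a segmented sentence into parallel lists of units and tags."""
    units, tags = [], []
    for word in words:
        units.extend(word)
        tags.extend(tag_word(word))
    return units, tags


def tags_to_words(units, tags):
    """Rebuild words from predicted tags: a word ends at every E or S."""
    words, current = [], []
    for unit, tag in zip(units, tags):
        current.append(unit)
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:  # keep a trailing word that never received an end tag
        words.append(current)
    return words
```

A tagger trained on the output of `tag_sentence` predicts one tag per unit, and `tags_to_words` turns those predictions back into a segmentation.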
Citations: 14
Data Quality Controlling for Cross-Lingual Sentiment Classification
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.43
Shoushan Li, Yunxia Xue, Zhongqing Wang, Sophia Yat-Mei Lee, Chu-Ren Huang
Cross-lingual sentiment classification aims to perform sentiment classification in one language (the target language) with the help of resources from another language (the source language). Previous studies tend to use all available data in the source language, although using all the data is observed to perform no better, or even worse, than using a portion of good data. In this paper, we propose a novel task called data quality controlling in the source language, which selects high-quality samples from the source language. To tackle this task, we propose two kinds of data quality measurements: intra- and extra-quality measurements, which are implemented with certainty and similarity measurements, respectively. The empirical studies demonstrate the effectiveness of the proposed approach to data quality controlling in the source language.
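The abstract does not give the formula for the certainty measurement, so the following is a minimal sketch under an assumption: certainty is the distance of a classifier's positive-class probability from 0.5, and the most certain source-language samples are kept. The function names, the `keep_ratio` parameter, and the sample probabilities are hypothetical; the similarity-based (extra-quality) measurement is not shown.

```python
def certainty(prob_positive):
    """Assumed certainty score in [0, 1]: 0 at p = 0.5, 1 at p = 0 or p = 1."""
    return abs(prob_positive - 0.5) * 2


def select_samples(samples, keep_ratio):
    """Keep the most certain fraction of (sample_id, prob_positive) pairs."""
    ranked = sorted(samples, key=lambda s: certainty(s[1]), reverse=True)
    keep = max(1, int(len(ranked) * keep_ratio))
    return ranked[:keep]
```

For example, given probabilities 0.95, 0.55, 0.10, and 0.48 with `keep_ratio=0.5`, the two samples nearest to 0 or 1 (0.95 and 0.10) are retained and the two near 0.5 are dropped.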
Citations: 1
Evaluating the Difficulty of Concepts on Domain Knowledge Using Latent Semantic Analysis
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.58
Tao-Hsing Chang, Y. Sung, Yao-Tung Lee
In educational research and its applications, evaluating concept difficulty is necessary but difficult to carry out. Previous research employs two major approaches to evaluate the difficulty of concepts. Neither approach takes into account whether the concepts have been acquired by learners and readers. This paper focuses on constructing a basic concept list of domain knowledge with latent semantic analysis (LSA) and uses the age of acquisition of a concept to represent its difficulty. Natural science texts from elementary schools in Taiwan serve as experimental materials to verify the validity of the proposed method for evaluating concept difficulty.
Citations: 9
Discourse Topic in Anaphora Resolution and Discourse Construction
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.69
Donghong Liu
Performing the function of encapsulating the whole discourse, discourse topic is usually considered as the center of the discourse. However, the representation form of discourse topic has not been agreed upon due to the elusive nature of the notion per se. Some people view discourse topic as an entity; others regard it as a question; still others as a proposition or even as an unnecessary form. This paper points out that discourse entity cannot be used in event anaphora if it is considered as discourse topic; that if a discourse topic takes the form of question, anaphora resolution may not be realized because of the uncertainty in the question and in the components of the question; and that if a discourse topic takes null form, the global coherence and the relevance of a discourse might be undermined. The paper also proves that comparatively propositional discourse topics conform to human beings' cognitive psychology, contribute to event anaphora resolution and facilitate discourse construction.
Citations: 1
The Acoustic Research of Stops CV Structure Coarticulation in Amdo Tibetan Xiahe Dialect
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.51
Shiliang Lv, Yasheng Jin, Ning Ma, Xuechen Yin
This paper studies the stop CV structure in the Xiahe dialect of Amdo Tibetan. It uses the methods of speech acoustics, extracting speech formant parameters, to investigate intra-syllable coarticulation in the CV structure. The main results are as follows. In the Xiahe dialect, coarticulation effects exist between consonants and vowels in the CV structure. The consonant has a great impact on the vowel, mainly related to the consonant's place of articulation. Consonant initials are also influenced by the vowel. From the locus slopes of the consonants, we found that consonants with a back place of articulation were affected most, and that the effect on unaspirated sounds is greater than on aspirated sounds.
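The locus slope mentioned in the abstract is conventionally obtained from a locus equation, a linear regression of F2 at consonant release on F2 in the following vowel, where a slope closer to 1 indicates stronger coarticulation. The abstract gives no measurements or fitting details, so the sketch below is only an illustration of that regression with ordinary least squares; the function name, argument names, and any input values are hypothetical.

```python
def fit_locus_equation(f2_vowel, f2_onset):
    """Least-squares fit of F2_onset = slope * F2_vowel + intercept (values in Hz)."""
    n = len(f2_vowel)
    if n < 2 or n != len(f2_onset):
        raise ValueError("need two or more paired F2 measurements")
    mean_x = sum(f2_vowel) / n
    mean_y = sum(f2_onset) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_vowel)
    if sxx == 0:
        raise ValueError("vowel F2 values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(f2_vowel, f2_onset))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

One equation would be fitted per stop consonant across its vowel contexts, so that slopes can be compared across places of articulation and between aspirated and unaspirated stops.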
Citations: 0
Mining Semantic Orientation of Multiword Expression from Chinese Microblogging with Discriminative Latent Model
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.41
Xiao Sun, Chengcheng Li, Chenyi Tang, F. Ren
Extracting the semantic orientation of multiword expressions, especially newly coined ones from the Internet, is an important task for sentiment analysis of web texts and other real-world text, because some multiword expressions can express more integrated sentiment than single words. This paper proposes a method with a novel latent discriminative algorithm, which tackles this problem by integrating a discriminative model and a latent variable model. Although Chinese multiword expressions consist of multiple words, the semantic orientation of a multiword expression is not a simple combination of the orientations of its component words: some words can invert the affective orientation, so the expression can have the opposite semantic orientation. To capture this property, a hidden semi-CRF is adopted, which includes a latent variable layer and can address dual-sequence labeling tasks synchronously. The method is tested on a manually labeled set of positive and negative multiword expressions from microblogs and other Internet resources, and the experiments show very promising results, comparable to the best values reported.
Citations: 3
Mining Parallel Corpus from Sina Microblog
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.29
Haitao Xing, Muyun Yang, Haoliang Qi, Sheng Li, T. Zhao
Finding parallel corpora, as a specific type of information, on microblogging sites with millions of users, such as Sina Microblog, is a challenging task. This paper investigates the feasibility of mining such data from the username, the hashtag, and user relations with three different methods. The initial experiment is encouraging given the current restriction of limited access to microblog content.
Citations: 1