2011 International Conference on Asian Language Processing: Latest Papers

Imbalanced Sentiment Classification with Multi-strategy Ensemble Learning
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.28
Zhongqing Wang, Shoushan Li, Guodong Zhou, Peifeng Li, Qiaoming Zhu
Recently, sentiment classification has become a hot research topic in natural language processing. However, most existing studies assume that the samples in the negative and positive categories are balanced, which might not be true in real applications. In this paper, we investigate sentiment classification tasks where the class distribution of the samples is imbalanced. To handle the imbalance problem, we propose a multi-strategy ensemble learning approach. Our ensemble approach integrates sample-ensemble, feature-ensemble, and classifier-ensemble by exploiting multiple classification algorithms. Evaluation across four domains shows that our ensemble approach outperforms many other popular approaches for handling imbalanced classification problems, such as re-sampling and cost-sensitive approaches, and is effective for imbalanced sentiment classification.
Citations: 6
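The sample-ensemble idea named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes under-sampling of the majority class into minority-sized chunks and simple majority voting, and the names `balanced_subsets`, `train_ensemble`, and `predict` are hypothetical. The `train` argument is any function that takes a list of samples and returns a callable classifier; the minority class is assumed to be non-empty.

```python
import random
from collections import Counter

def balanced_subsets(majority, minority, seed=0):
    """Cut the shuffled majority class into chunks the size of the
    minority class and pair each chunk with all minority samples."""
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    k = len(minority)  # assumed to be at least 1
    # Leftover majority samples that do not fill a whole chunk are dropped.
    return [shuffled[i:i + k] + list(minority)
            for i in range(0, len(shuffled) - k + 1, k)]

def train_ensemble(majority, minority, train, seed=0):
    """Train one base classifier per balanced subset."""
    return [train(s) for s in balanced_subsets(majority, minority, seed)]

def predict(models, x):
    """Combine the base classifiers by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

Each base classifier sees a balanced training set, so no single model is dominated by the majority class. The feature-ensemble and classifier-ensemble strategies mentioned in the abstract are not covered by this sketch.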
BASRAH: Arabic Verses Meters Identification System
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.19
Z. Khalaf, Maytham Alabbas, T. Tan
In this paper, we present BASRAH, a system that automatically identifies the meter of Arabic verse, which is an operation that requires a certain level of human expertise. BASRAH uses the numerical prosody method, which depends on verse coding that is derived from the general concept of al-Khalil's feet through using the two primary units (cord=2 and peg=3). BASRAH has proved to be an efficient tool to help inexperienced users to determine the meters of Arabic verses when we tested it on thousands of old and modern Arabic verses.
Citations: 4
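The cord=2 and peg=3 coding mentioned in the abstract above can be illustrated with a small sketch. Only the two unit values come from the abstract. The foot decompositions and the four idealized meter patterns below are assumptions drawn from traditional Arabic prosody, the names `FEET`, `METERS`, `meter_code`, and `identify` are hypothetical, and real verses contain permitted variations that this lookup ignores.

```python
CORD, PEG = "2", "3"  # unit values stated in the abstract

# Assumed decomposition of four traditional feet into cords and pegs.
FEET = {
    "fa'ulun": PEG + CORD,              # 3 2
    "mafa'ilun": PEG + CORD + CORD,     # 3 2 2
    "fa'ilatun": CORD + PEG + CORD,     # 2 3 2
    "mustaf'ilun": CORD + CORD + PEG,   # 2 2 3
}

# Assumed idealized foot sequences for one hemistich of each meter.
METERS = {
    "tawil": ["fa'ulun", "mafa'ilun", "fa'ulun", "mafa'ilun"],
    "mutaqarib": ["fa'ulun"] * 4,
    "ramal": ["fa'ilatun"] * 3,
    "rajaz": ["mustaf'ilun"] * 3,
}

def meter_code(name):
    """Concatenate the foot codes of a meter into one digit string."""
    return "".join(FEET[foot] for foot in METERS[name])

def identify(hemistich_code):
    """Return the meters whose idealized code equals the given code."""
    return [m for m in METERS if meter_code(m) == hemistich_code]
```

Deriving the digit string from raw Arabic text (vocalization, syllable weights) is the hard part of the task and is outside this sketch.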
Adopting Malay Syllable Structure for Syllable Based Speech Synthesizer for Iban and Bidayuh Languages
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.21
Sarah Flora Samson Juan, V. Edwin, Chai Yeen Cheong, Jun Choi Lee, A. Yeo
Sarawak, Malaysia, has many under-resourced languages, which stand to become extinct if measures are not taken to preserve and maintain them. These languages are mostly spoken by indigenous groups, and not all of them are documented or studied. As a preservation initiative, a Text to Speech (TTS) system has been built for Iban and Bidayuh, two of the 44 living languages in Sarawak. To expedite development, we employed knowledge of a closely related language, Malay, which is the first language in Malaysia. In this paper, we employed a syllabification algorithm based on Malay syllable structure to build the Iban and Bidayuh syllable lists and speech corpora. An accuracy test for the algorithm was conducted, and the quality of the output from the TTS system was assessed using Categorical Estimation (CE). The tests showed a high accuracy percentage, and quality had a mean score of 3.07 out of 5, suggesting the approach works.
Citations: 2
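A rule-based syllabifier in the spirit of the abstract above might look like the sketch below. It is not the authors' algorithm. It assumes a (C)V(C) syllable template, treats the digraphs `ng`, `ny`, `sy`, `kh`, and `gh` as single consonants, and treats every vowel letter as its own nucleus, so diphthongs are not handled. The names `tokenize` and `syllabify` are hypothetical.

```python
VOWELS = set("aeiou")
DIGRAPHS = ("ng", "ny", "sy", "kh", "gh")  # assumed single-consonant digraphs

def tokenize(word):
    """Split a word into sound units, keeping digraphs together."""
    units, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            units.append(word[i:i + 2])
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units

def syllabify(word):
    """Split a word into syllables under a (C)V(C) template."""
    units = tokenize(word.lower())
    nuclei = [i for i, u in enumerate(units) if u in VOWELS]
    if not nuclei:
        return [word.lower()]
    syllables, start = [], 0
    for a, b in zip(nuclei, nuclei[1:]):
        # With consonants between two vowels, the last one opens the
        # next syllable; any earlier ones close the current syllable.
        cut = b - 1 if b - a > 1 else b
        syllables.append("".join(units[start:cut]))
        start = cut
    syllables.append("".join(units[start:]))
    return syllables
```

For example, `syllabify("tempat")` splits the `mp` cluster between the two syllables, while `syllabify("bunga")` keeps `ng` together as the onset of the second syllable.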
Building a Rule-Based Malay Text Segmentation Tool
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.42
Bali Ranaivo-Malançon
This paper presents the different problems that need to be taken into account in building a rule-based Malay text segmentation tool that can split a text into sentences and tokens. The tool was compared to English and Malay tokenisers to highlight the characteristics of Malay texts.
Citations: 5
Formalization and Rules for Recognition of Satirical Irony
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.14
Lingpeng Kong, Likun Qiu
Satirical irony ("反讽") is a very important language phenomenon. Its recognition is of great importance to sentiment analysis. However, research on this topic is still quite rare, and existing studies have problems such as unclear definitions and unclear objects of study. To solve these problems, we first give clear definitions of satirical irony. Then we discuss at what level satirical irony occurs. Finally, we propose some features of satirical irony.
Citations: 1
A Rule-Based Source-Side Reordering on Phrase Structure Subtrees
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.12
Fangli Liang, Lei Chen, Miao Li, Nasun-Urtu
Since different languages put words in different orders, reordering is an important issue in statistical machine translation. The paper proposes a rule-based reordering method at the source side as a preprocessing step, which applies syntactic reordering rules to phrase structure subtrees to reorder the source language. The reordering rules integrate the phrase structure tree with part-of-speech tags, which allows reordering not only between words but also between words and phrases. The problems of long-distance reordering and translation errors can thus be partly solved. Meanwhile, the interference between reordering rules is significantly reduced. Experiments show that our method can improve the performance of state-of-the-art phrase translation models, achieving a 1.71 BLEU score increase over the standard phrase-based machine translation system.
Citations: 5
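To make the idea of reordering on phrase structure subtrees concrete, here is a minimal sketch. It is not the paper's rule set: the tuple tree encoding, the single verb-object rule, and the names `reorder` and `leaves` are all illustrative assumptions. A rule maps a parent label plus the sequence of its child labels (phrase labels or part-of-speech tags) to a new child order.

```python
def reorder(tree, rules):
    """Apply child-permutation rules bottom-up to a phrase structure tree.

    A leaf is (pos_tag, word); an inner node is (label, child, child, ...).
    """
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return tree  # leaf node, nothing to reorder
    children = [reorder(child, rules) for child in children]
    key = (label, tuple(child[0] for child in children))
    order = rules.get(key)
    if order:
        children = [children[i] for i in order]
    return (label, *children)

def leaves(tree):
    """Read the words of a tree from left to right."""
    if len(tree) == 2 and isinstance(tree[1], str):
        return [tree[1]]
    return [word for child in tree[1:] for word in leaves(child)]

# Hypothetical rule: move the object phrase in front of the verb.
RULES = {("VP", ("VV", "NP")): (1, 0)}

TREE = ("IP",
        ("NP", ("PN", "I")),
        ("VP", ("VV", "read"), ("NP", ("NN", "books"))))
```

Calling `leaves(reorder(TREE, RULES))` yields the words in verb-final order. Because the rule key mixes a tag (`VV`) with a phrase label (`NP`), a single rule can move a word past a whole phrase, which matches the word-and-phrase reordering described in the abstract.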
On the Semantic Orientation and Computer Identification of the Chinese Adverb cai
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.76
Lin He, Pengbing Chen
Recognizing the semantic orientation of Chinese adverbs by computer is a new attempt. In this paper, to achieve automatic computer identification of the adverb "cai", the rules and principles of its semantic orientation are summarized and proposed according to its sentence structure. Based on this, automatic identification strategies are explored and the corresponding procedure diagrams are given.
Citations: 0
Research of Event Pronoun Resolution
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.31
Ning Zhang, Fang Kong, Peifeng Li
Event anaphora resolution plays an important role in discourse analysis. Compared with general noun phrases, pronouns carry little information themselves, so resolving event pronouns is a more difficult task. This paper proposes a machine learning-based framework for event pronoun resolution. A variety of features, both flat and structural, are explored for event pronoun resolution. Experiments on the OntoNotes corpus show that both flat and structural features are very effective for this task.
Citations: 2
The Comparison of Chinese Spam Filter Based on Generative Model and Discriminative Model
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.64
Yong Han, Yingying Wang, Huafu Ding, Haoliang Qi
Previous studies on English datasets have shown that discriminative models are better than generative models for spam filtering, but studies on Chinese spam filtering are rare. We therefore compared the performance of Bogo, a classical generative model, with Logistic Regression (LR) and Relaxed Online SVM (ROSVM), two typical discriminative models, on Chinese datasets. The Bogo system adopts a generative model based on a Bayesian algorithm. We chose the public Chinese datasets TREC06c, SEWM 2008, SEWM 2010, and SEWM 2011 as test datasets with immediate feedback. The discriminative models give better results than the generative model, and ROSVM gives the best performance on Chinese spam filtering.
Citations: 1
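The immediate-feedback setting named in the abstract above means the filter scores each message first and is then told the true label before the next message arrives. A minimal sketch of an online logistic regression filter under that protocol follows. The class name `OnlineLR`, the learning rate, and the binary token features are assumptions, and the abstract does not say how messages are tokenized.

```python
import math
from collections import defaultdict

class OnlineLR:
    """Logistic regression over binary token features, updated one
    message at a time (a sketch, not the evaluated system)."""

    def __init__(self, learning_rate=0.5):
        self.weights = defaultdict(float)
        self.learning_rate = learning_rate

    def score(self, tokens):
        """Return the estimated probability that the message is spam."""
        z = sum(self.weights[t] for t in set(tokens))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() in range
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, tokens, is_spam):
        """Take one gradient step using the true label as feedback."""
        error = (1.0 if is_spam else 0.0) - self.score(tokens)
        for t in set(tokens):
            self.weights[t] += self.learning_rate * error

def run_stream(model, stream):
    """Score each message, then learn from its label immediately."""
    scores = []
    for tokens, is_spam in stream:
        scores.append(model.score(tokens))
        model.update(tokens, is_spam)
    return scores
```

The first score in any stream is exactly 0.5 because all weights start at zero. For Chinese text, overlapping character n-grams are one common tokenization choice, but that is a suggestion rather than something the abstract states.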
Centroid Integer Selection Model -- A High Efficiency Method on Dynamic Multi-document Summarization
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.56
Meiling Liu, Dequan Zheng, T. Zhao, Yang Yu
This paper studies centroid integer selection for dynamic multi-document summarization (DMS) and presents a dynamic multi-document summarization model called the Centroid Integer Selection Model (CISM). The model has two main steps. First, several abstracts are extracted from the document sets, each based on a different first sentence. Second, the best abstract is selected by a centroid strategy from all the abstracts created in the first step. The main advantage of the model is that it eliminates the effect of a poorly chosen first sentence. Experiments were conducted on the Update Task test data from TAC2008, and the results of the new model were compared with the results from the TAC2008 evaluation.
Citations: 1
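The second step described in the abstract above, choosing the best candidate summary by a centroid strategy, can be sketched as follows. The abstract does not define the centroid, so this sketch assumes it is the summed bag-of-words vector of all candidates and that closeness is cosine similarity. The names `vectorize`, `cosine`, and `select_best` are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for a whitespace-tokenized text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_best(candidates):
    """Pick the candidate summary closest to the centroid of all
    candidates (assumed here to be the sum of their vectors)."""
    centroid = Counter()
    for candidate in candidates:
        centroid.update(vectorize(candidate))
    return max(candidates, key=lambda c: cosine(vectorize(c), centroid))
```

Summing instead of averaging does not change the ranking, since cosine similarity ignores vector length. The first step, generating one candidate per starting sentence, is not shown.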