2011 International Conference on Asian Language Processing: Latest Papers

A Query Reformulation Model Using Markov Graphic Method
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.62
Jiali Zuo, Mingwen Wang
Information retrieval models still cannot achieve satisfactory performance after decades of development. One reason is that queries cannot express the information need precisely. Research has shown that query reformulation can improve the performance of retrieval models. In this paper, we propose a query reformulation model that uses a Markov network to represent term relationships and obtain useful information from the corpus to reformulate the query. Experimental results show that our model can avoid topic drift and thereby improve retrieval performance.
Citations: 2
Error-Driven Adaptive Language Modeling for Chinese Pinyin-to-Character Conversion
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.46
J. Huang, D. Powers
The performance of Chinese Pinyin-to-Character conversion is severely affected when the characteristics of the training and conversion data differ. As natural language is highly variable and uncertain, it is impossible to build a complete and general language model that suits all tasks. Traditional adaptive MAP models mix task-independent data with task-dependent data using a mixture coefficient, but we can never predict what style of language users have or what new domains will appear. This paper presents a statistical error-driven adaptive language modeling approach for Chinese Pinyin input systems. The model can be incrementally adapted whenever an error occurs during Pinyin-to-Character conversion. It significantly improves the Pinyin-to-Character conversion rate.
Citations: 0
The Phoneme-Level Articulator Dynamics for Pronunciation Animation
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.13
Sheng Li, Lan Wang, En Qi
Speech visualization can be extended to the task of pronunciation animation for language learners. In this paper, a three-dimensional English articulation database is recorded using a Carstens Electro-Magnetic Articulograph (EMA AG500). An HMM-based visual synthesis method for continuous speech is implemented to recover 3D articulatory information. The synthesized articulations are then compared to the EMA recordings for objective evaluation. Using a data-driven 3D talking head, the distinctions between confusable phonemes can be depicted through both external and internal articulatory movements. The experiments demonstrate that HMM-based synthesis with limited training data can achieve a minimum RMS error of less than 2 mm. The synthesized articulatory movements can be used for computer-assisted pronunciation training.
Citations: 5
Polarity Shifting: Corpus Construction and Analysis
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.27
Xiaoqian Zhang, Shoushan Li, Guodong Zhou, Hongxia Zhao
Polarity shifting has been a challenge for automatic sentiment classification. In this paper, we create a corpus of polarity-shifted sentences drawn from various kinds of product reviews. In the corpus, both the sentiment words and the shifting trigger words are annotated. Furthermore, we analyze all the polarity-shifted sentences and group them into five categories: opinion-itself, holder, target, time, and hypothesis. An experimental study reports the annotation agreement and the distribution of the five categories of polarity shifting.
Citations: 10
Corpus Based Extractive Document Summarization for Indic Script
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.66
P. Reddy, B. V. Vardhan, A. Govardhan
Summarization is the process of generating a condensed form of a given text document that retains its information and overall meaning. Document summarization approaches are broadly classified into two kinds: extractive and abstractive. In this paper, we perform single-document summarization of Telugu text documents using an extractive approach. Although many document surface features exist, we consider those that cover the original document extensively and produce summaries with less redundancy: sentence position, sentence similarity with the title, centrality of the sentence, and word frequency. To strengthen these features, we use a corpus of 3000 documents and apply preprocessing steps such as stop-word elimination and stemming to retain the more meaningful words within each sentence. Sentences are ranked by computing a score for each sentence from all four features simultaneously with optimum weights. The optimum feature weights are learned with the help of human-constructed summaries. The machine-generated summaries are evaluated using the F1 measure, followed by human judgements.
Citations: 8
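As a rough illustration of the feature-weighted sentence ranking described in the Telugu summarization abstract above, the following Python sketch scores each sentence on the four named features (position, title similarity, centrality, word frequency) and ranks sentences by a weighted sum. The tokenizer, the Jaccard-based similarity, the individual feature formulas, and the equal weights are all assumptions made for illustration. The abstract says the weights are learned from human-constructed summaries and that stop-word elimination and stemming are applied; this sketch omits both.

```python
from collections import Counter

PUNCT = ".,;:!?"


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; no stemming or stop words)."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def overlap(a, b):
    """Jaccard overlap between two token lists, used here as a stand-in similarity."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def rank_sentences(title, sentences, weights=(0.25, 0.25, 0.25, 0.25)):
    """Return sentence indices ordered from highest to lowest combined score."""
    toks = [tokenize(s) for s in sentences]
    title_toks = tokenize(title)
    freq = Counter(w for t in toks for w in t)
    max_freq = max(freq.values()) if freq else 1
    n = len(sentences)
    scored = []
    for i, t in enumerate(toks):
        # Feature 1: earlier sentences get a higher position score.
        position = 1.0 - i / n
        # Feature 2: similarity between the sentence and the title.
        title_sim = overlap(t, title_toks)
        # Feature 3: centrality as mean similarity to every other sentence.
        others = [overlap(t, u) for j, u in enumerate(toks) if j != i]
        centrality = sum(others) / len(others) if others else 0.0
        # Feature 4: mean document frequency of the sentence's words, normalized.
        word_freq = sum(freq[w] for w in t) / (len(t) * max_freq) if t else 0.0
        features = (position, title_sim, centrality, word_freq)
        score = sum(w * f for w, f in zip(weights, features))
        scored.append((score, i))
    # Sort by descending score; ties fall back to document order.
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


def summarize(title, sentences, k=2, weights=(0.25, 0.25, 0.25, 0.25)):
    """Pick the top-k sentences and return them in their original order."""
    top = sorted(rank_sentences(title, sentences, weights)[:k])
    return [sentences[i] for i in top]
```

Calling `summarize(title, sentences, k=2)` returns the two highest-scoring sentences in document order. Replacing the equal weights with values fitted against reference summaries would be closer in spirit to what the abstract describes.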
A Study of the Classification and Arrangement Rule of Uygur Morphemes for Information Processing
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.50
Pu Li, Shuzhen Shi
In processing a modern Uygur corpus, it is necessary to study word-level part-of-speech tagging (the "word character mark") of modern Uygur language data. Since morpheme classification serves part-of-speech tagging, the article classifies Uygur morphemes by function and lists all of their classes and arrangement rules.
Citations: 0
Sentence Boundary Detection in Colloquial Arabic Text: A Preliminary Result
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.38
A. Al-Subaihin, Hend Suliman Al-Khalifa, A. Al-Salman
Natural language processing tasks are increasingly conducted over online content. This poses a special problem for Arabic-language applications. Online Arabic content is usually written in informal colloquial Arabic, which tends to be ill-structured and lacks specific linguistic standardization. In this paper, we investigate a preliminary step toward successful NLP processing: the problem of sentence boundary detection. As informal Arabic lacks basic linguistic rules, we establish a list of commonly used punctuation marks after extensively studying a large amount of informal Arabic text. Moreover, we evaluate how correctly these punctuation marks are used as sentence delimiters; the result is a preliminary accuracy of 70%.
Citations: 7
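The punctuation-based sentence boundary detection mentioned in the colloquial Arabic abstract above can be pictured with a minimal Python sketch that splits text wherever a run of delimiter characters ends. The delimiter set below (period, exclamation mark, Latin and Arabic question marks) is a hypothetical placeholder, not the list the authors compiled, and the function name is likewise an assumption.

```python
import re

# Hypothetical delimiter set: ".", "!", "?" and the Arabic question mark.
# The paper's actual punctuation list is not given in the abstract.
DELIMITERS = ".!?\u061F"


def split_sentences(text, delimiters=DELIMITERS):
    """Split text after each run of delimiter characters."""
    pattern = re.compile("[" + re.escape(delimiters) + "]+")
    sentences = []
    start = 0
    for match in pattern.finditer(text):
        # Keep the delimiter run attached to the sentence it closes.
        chunk = text[start:match.end()].strip()
        if chunk:
            sentences.append(chunk)
        start = match.end()
    # Informal text often ends without punctuation; keep the remainder.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Treating a run such as `??` as a single boundary is a design choice that suits informal writing, where repeated punctuation is common. A real system would also need to handle delimiters used for other purposes, which is the kind of misuse the reported 70% accuracy reflects.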
Non-native Accent Pronunciation Modeling in Automatic Speech Recognition
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.65
Basem H. A. Ahmed, T. Tan
In this paper, we propose an approach to modeling the pronunciation of non-native accented speech for automatic speech recognition systems. The proposed method consists of two phases: phone adaptation and pronunciation generalization. In phone adaptation, we identify the phones used by non-native speakers, compare them to the standard phones, and then remove the mismatch that results from the influence of the mother tongue. In pronunciation adaptation, we predict how non-native speakers pronounce words. The results show that the proposed approach reduces the WER from 44.8% to 41.9%.
Citations: 6
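The non-native accent abstract above reports its result as word error rate (WER). For readers unfamiliar with the metric, here is a minimal Python sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and whitespace tokenization are assumptions, and nothing here reflects the authors' own evaluation tooling.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

On the reported figures, going from 44.8% to 41.9% is an absolute reduction of 2.9 points, or roughly 6.5% relative to the baseline (2.9 / 44.8).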
Research on Multi-document Summarization Model Based on Dynamic Manifold-Ranking
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.55
Meiling Liu, Honge Ren, Dequan Zheng, T. Zhao
This paper introduces a model that describes the dynamic evolution of network information by identifying and analyzing document collections on the same topic at different stages. To characterize the dynamic relationship among evolving content differences, the paper presents a dynamic multi-document summarization model called the Dynamic Manifold-Ranking Model (DMRM). Experiments were conducted on the Update Task test data from TAC2008, and the results of the new model were compared with results from the TAC2008 evaluation. The comparison demonstrates the effectiveness of the model.
Citations: 0
An Integrated Approach Using Conditional Random Fields for Named Entity Recognition and Person Property Extraction in Vietnamese Text
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.37
Hoang-Quynh Le, Mai-Vu Tran, Nhat-Nam Bui, N. Phan, Quang-Thuy Ha
Personal names are among the most frequently searched items in web search engines, and a person entity is always associated with numerous properties. In this paper, we propose an integrated model for Vietnamese that recognizes person entities and simultaneously extracts the values of a pre-defined set of properties related to each person. We also design a rich feature set using various kinds of knowledge resources and apply a well-known machine learning method, conditional random fields (CRFs), to improve the results. The results show that our method is suitable for Vietnamese, with averages of 84% precision, 82.56% recall, and 83.39% F-measure. Moreover, the running time is good, and the results also show the effectiveness of our feature set.
Citations: 4