
Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010): Latest Publications

Designing effective web mining-based techniques for OOV translation
Haitao Yu, F. Ren, Degen Huang, Lishuang Li
Due to the limited coverage of existing bilingual dictionaries, it is often difficult to translate Out-Of-Vocabulary (OOV) terms in many natural language processing tasks. In this paper, we propose a general three-step cascade mining technique that leverages the OOV category to optimize the effectiveness of each step. An OOV-category-based expansion policy is proposed to retrieve more relevant mixed-language documents. An OOV-category-based hybrid extraction approach is proposed to perform robust extraction. A more flexible model combination based on the OOV category is also proposed. Moreover, we conducted experiments to evaluate the effectiveness of each step and the overall performance of the mining technique. The experimental results show significant performance improvements over existing methods.
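The abstract outlines a three-stage cascade (expansion, extraction, model combination) conditioned on the OOV category but gives no implementation detail. The following is only a minimal Python sketch of that general shape; the category names, query suffixes, and helper callables (`search`, `extractors`, `rankers`) are hypothetical placeholders, not the authors' method.

```python
# Hedged sketch of an OOV-category-conditioned cascade for web-mining-based
# OOV translation. All helpers and category names are illustrative assumptions.

CATEGORIES = ("person_name", "technical_term", "other")  # assumed OOV categories

def translate_oov(oov_term, category, search, extractors, rankers):
    """Run expansion -> extraction -> combination, each conditioned on the category."""
    # Step 1: category-based expansion policy to retrieve mixed-language documents.
    expansions = {
        "person_name":    [oov_term + " 姓名", oov_term + " English name"],
        "technical_term": [oov_term + " 术语", oov_term + " definition"],
    }.get(category, [oov_term])
    documents = [doc for query in expansions for doc in search(query)]

    # Step 2: category-based hybrid extraction of translation candidates.
    candidates = []
    for extractor in extractors[category]:
        candidates.extend(extractor(oov_term, documents))

    # Step 3: category-based model combination to rank the candidates.
    scored = {c: sum(weight * model(oov_term, c, documents)
                     for weight, model in rankers[category])
              for c in candidates}
    return max(scored, key=scored.get) if scored else None
```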
{"title":"Designing effective web mining-based techniques for OOV translation","authors":"Haitao Yu, F. Ren, Degen Huang, Lishuang Li","doi":"10.1109/NLPKE.2010.5587807","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587807","url":null,"abstract":"Due to a limited coverage of the existing bilingual dictionary, it is often difficult to translate the Out-Of-Vocabulary terms (OOV) in many natural language processing tasks. In this paper, we propose a general cascade mining technique of three steps, it leverages OOV category to optimize the effectiveness of each step. OOV category based expansion policy is suggested to get more relevant mixed-language documents. OOV category based hybrid extraction approach is suggested to perform a robust extraction. A more flexible model combination based on OOV category is also suggested. Moreover, we conducted experiments to evaluate the effectiveness of each step and the overall performance of the mining technique. The experimental results show significantly performance improvement than the existing methods.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115674388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Multi-Document summarization based on improved features and clustering
Ying Xiong, Hongyan Liu, Lei Li
Multi-document summarization is an emerging technique for capturing the main content of many documents on the same topic. This paper proposes a new feature selection method to improve the summarization result. When calculating similarity, we use a modified TFIDF formula, which achieves better results. We adopt two methods to extract keywords precisely. Experimental results demonstrate that our improved method performs better than the traditional one.
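The paper's modified TFIDF formula is not reproduced in the abstract. As a reference point, the sketch below computes plain TF-IDF sentence similarity with scikit-learn, i.e. the standard baseline such a modification would start from; the class names are scikit-learn's, not the paper's.

```python
# Baseline TF-IDF sentence similarity for clustering-based multi-document
# summarization. This is the standard weighting, not the paper's modified formula.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "The storm caused widespread flooding in the coastal city.",
    "Coastal flooding followed the storm, damaging hundreds of homes.",
    "The city council will vote on a new budget next week.",
]

vectorizer = TfidfVectorizer()            # tf * idf weighting per term
tfidf = vectorizer.fit_transform(sentences)
similarity = cosine_similarity(tfidf)     # pairwise sentence similarity matrix

# Sentences with high average similarity to the rest are good summary candidates.
avg_sim = similarity.mean(axis=1)
print(sorted(range(len(sentences)), key=lambda i: -avg_sim[i]))
```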
{"title":"Multi-Document summarization based on improved features and clustering","authors":"Ying Xiong, Hongyan Liu, Lei Li","doi":"10.1109/NLPKE.2010.5587834","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587834","url":null,"abstract":"Multi-Document summarization is an emerging technique for understanding the main purpose of many documents about the same topic. This paper proposes a new feature selection method to improve the summarization result. When calculating similarity, we use a modified TFIDF formula which achieves a better result. We adopt two ways for exactly extracting keywords. Experimental results demonstrate that our improved method performs better than the traditional one.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128687187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Bottom up: Exploring word emotions for Chinese sentence chief sentiment classification
Xin Kang, F. Ren, Yunong Wu
In this paper we demonstrate the effectiveness of employing basic sentiment components for analyzing the chief sentiment of a Chinese sentence among nine categories of sentiments (including “No emotion”). Compared to traditional lexicon-based methods, our research explores the emotion intensities of words and phrases in an eight-dimensional sentiment space as features. An emotion matrix kernel is designed to evaluate the inner product of these sentiment features for SVM classification with O(n) time complexity. Experimental results show that our method significantly improves the performance of sentiment classification.
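The emotion matrix kernel itself is defined in the paper, not the abstract. The sketch below only shows the general pattern it implies: sentences represented through 8-dimensional word-emotion vectors and a custom inner-product kernel plugged into an SVM via scikit-learn's precomputed-kernel interface. The sum-pooling and the linear kernel are simplifying assumptions, not the authors' kernel.

```python
# Sketch: sentences as sum-pooled 8-dimensional word-emotion vectors, a simple
# inner-product kernel, and an SVM trained on the precomputed Gram matrix.
import numpy as np
from sklearn.svm import SVC

EMOTION_DIM = 8  # intensity per emotion category, from a word-emotion lexicon

def sentence_vector(words, lexicon):
    """Sum-pool the emotion vectors of the words found in the lexicon."""
    vecs = [lexicon[w] for w in words if w in lexicon]
    return np.sum(vecs, axis=0) if vecs else np.zeros(EMOTION_DIM)

def gram_matrix(sent_vecs_a, sent_vecs_b):
    """Linear kernel: pairwise inner products of sentence emotion vectors."""
    return np.asarray(sent_vecs_a) @ np.asarray(sent_vecs_b).T

# With an annotated corpus (train_vecs, train_labels), training would look like:
# clf = SVC(kernel="precomputed").fit(gram_matrix(train_vecs, train_vecs), train_labels)
# preds = clf.predict(gram_matrix(test_vecs, train_vecs))
```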
{"title":"Bottom up: Exploring word emotions for Chinese sentence chief sentiment classification","authors":"Xin Kang, F. Ren, Yunong Wu","doi":"10.1109/NLPKE.2010.5587793","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587793","url":null,"abstract":"In this paper we demonstrate the effectiveness of employing basic sentiment components for analyzing the chief sentiment of Chinese sentence among nine categories of sentiments (including “No emotion”). Compared to traditional lexicon based methods, our research explores emotion intensities of words and phrases in an eight dimensional sentiment space as features. An emotion matrix kernel is designed to evaluate inner product of these sentiment features for SVM classification with O(n) time complexity. Experimental result shows our method significantly improves performance of sentiment classification.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"18 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130283550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Improving Chinese-English patent machine translation using sentence segmentation
Yaohong Jin, Zhiying Liu
This paper presents a method that uses sentence segmentation to improve the performance of Chinese-English patent machine translation. In this method, long Chinese sentences are segmented into separate short sentences using features from the Hierarchical Network of Concepts theory (HNC theory). Several semantic features are introduced, including the main verb of a CSC (Eg), the main verb of a CSP (Egp), long NPs, and conjunctions. The main purpose of the segmentation algorithm is to detect whether a CSC can be a separate sentence. The segmentation method was integrated with a rule-based MT system. The order of the resulting short translations is adjusted, and the differing modes of expression in Chinese and English are also taken into account. The experimental results show that the performance of Chinese-English patent translation is improved effectively. Our method has been integrated into an online patent MT system running at SIPO.
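The HNC-based features driving the segmentation are specific to the paper. The sketch below only illustrates the general idea of pre-splitting a long Chinese sentence at punctuation and conjunctions before translation; the boundary markers and the length threshold are assumptions for illustration, not the authors' rules.

```python
# Illustrative pre-segmentation of a long Chinese sentence before MT.
# Real systems, like the one in the paper, use semantic features rather than
# just punctuation and a short conjunction list.
import re

BOUNDARY = r"[,;]|但是|因此|并且|从而"  # assumed clause-boundary markers

def presegment(sentence, max_len=20):
    """Split a long sentence into shorter clauses at assumed boundary markers."""
    clauses, current = [], ""
    for piece in re.split(f"({BOUNDARY})", sentence):
        current += piece
        # Flush a clause when a boundary is reached and it is long enough.
        if re.fullmatch(BOUNDARY, piece) and len(current) >= max_len:
            clauses.append(current)
            current = ""
    if current:
        clauses.append(current)
    return clauses
```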
{"title":"Improving Chinese-English patent machine translation using sentence segmentation","authors":"Yaohong Jin, Zhiying Liu","doi":"10.1109/NLPKE.2010.5587855","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587855","url":null,"abstract":"This paper presents a method using sentence segmentation to improve the performance of Chinese-English patent machine translation. In this method, long Chinese sentence was segmented into separated short sentences using some features from the Hierarchical Network of Concepts theory (HNC theory). Some semantic features are introduced, including main verb of CSC (Eg), main verb of CSP (Egp), long NPs and conjunctions. The main purpose of segmentation algorithm is to detect if one CSC can or cannot be a separate sentence. The segmentation method was integrated with a rule-base MT system. The sequence of these short translations was adjusted and the different ways of expressions in both Chinese and English languages also were in consideration. From the result of the experiments, we can see that the performance of the Chinese-English patent translation was improved effectively. Our method had been integrated into an online patent MT system running in SIPO.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"124 20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130009419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Texture image retrieval based on gray-primitive co-occurrence matrix
Wei Wang, Motoyuki Suzuki, F. Ren
Texture similarity is a very important component of content-based image retrieval systems. This paper first proves the rotation invariance of the gray-primitive co-occurrence matrix and then presents a new texture image retrieval technique based on it. Experimental results indicate that the proposed algorithm has low computational complexity and a certain degree of noise resistance.
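The abstract does not define the gray-primitive co-occurrence matrix itself. As background, the sketch below computes an ordinary gray-level co-occurrence matrix with NumPy, the classical structure such a variant builds on; the offset and quantization level are arbitrary choices for illustration.

```python
# Plain gray-level co-occurrence matrix (GLCM) as a reference point; the
# paper's "gray-primitive" variant is a related but different construction.
import numpy as np

def glcm(image, levels=8, offset=(0, 1)):
    """Count co-occurrences of quantized gray levels at a fixed pixel offset."""
    quantized = (image.astype(np.float64) / 256.0 * levels).astype(int)
    quantized = np.clip(quantized, 0, levels - 1)
    dr, dc = offset
    matrix = np.zeros((levels, levels), dtype=np.int64)
    rows, cols = quantized.shape
    for r in range(rows - dr):
        for c in range(cols - dc):
            matrix[quantized[r, c], quantized[r + dr, c + dc]] += 1
    return matrix / matrix.sum()  # normalize to a joint probability table

# Texture descriptors (e.g. contrast, energy) derived from this matrix can then
# be compared between images for retrieval.
```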
{"title":"Texture image retrieval based on gray-primitive co-occurrence matrix","authors":"Wei Wang, Motoyuki Suzuki, F. Ren","doi":"10.1109/NLPKE.2010.5587830","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587830","url":null,"abstract":"The research of texture similarity is very important component of content-based image retrieval system. Firstly the rotation invariance of gray-primitive co-occurrence matrix was proved in this paper, then a new texture image retrieval technique based on gray-primitive co-occurrence matrix was presented. The result of experiment indicates that the algorithm proposed has low computational complexity and certain noise resisting ability.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130347415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Automatic filtration of multiword units
Y. Liu, Zheng Tie
This paper studies how to filter multiword units. We use normalized expectation (NE) to extract multiword unit candidates from a patent corpus. The candidates are then filtered using stop words, frequency, first stop words, last stop words, and contextual entropy. The experimental results show that the precision of the multiword units is improved by 8.7% after filtering.
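Normalized expectation is not spelled out in the abstract; a common formulation scores an n-gram by its frequency divided by the average frequency of its sub-n-grams. The sketch below uses only the two contiguous (n-1)-gram substrings, which is a simplification and may differ from the exact formula used in the paper.

```python
# Simplified normalized-expectation score for ranking multiword unit candidates.
# The exact NE definition in the paper may average over a different set of
# sub-n-grams; this is a hedged reconstruction.
from collections import Counter

def ngram_counts(tokens, max_n=4):
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def normalized_expectation(ngram, counts):
    if len(ngram) < 2 or counts[ngram] == 0:
        return 0.0
    subgrams = [ngram[1:], ngram[:-1]]                 # contiguous (n-1)-gram substrings
    avg_sub = sum(counts[s] for s in subgrams) / len(subgrams)
    return counts[ngram] / avg_sub if avg_sub else 0.0

tokens = "the hidden markov model and the hidden markov model toolkit".split()
counts = ngram_counts(tokens)
print(normalized_expectation(("hidden", "markov", "model"), counts))
```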
{"title":"Automatic filtration of multiword units","authors":"Y. Liu, Zheng Tie","doi":"10.1109/NLPKE.2010.5587783","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587783","url":null,"abstract":"This paper studies how to filtrate multiword units. We use normalized expectation (NE) to extract multiword unit candidates from patent corpus. Then the multiword unit candidates are filtrated using stop words, frequency, first stop words, last stop words, and contextual entropy. The experimental result shows that the precision rate of multiword units is improved by 8.7% after filtration.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"261 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131807931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Needs and challenges of care robots in nursing care setting: A literature review
Yuko Nagai, T. Tanioka, Shoko Fuji, Yuko Yasuhara, Sakiko Sakamaki, Narimi Taoka, R. Locsin, Fuji Ren, Kazuyuki Matsumoto
This study aims to identify the needs and challenges of care robots in nursing care settings through an extensive search of the literature. The results show that there is a shortage of information about the outcomes of introducing care robots, the needs of care recipients and care providers, and the relevant ethical problems. To advance our research and introduce care robots into care settings, much remains to be done: considering the application of natural language processing technology in collaboration with researchers in the robotics field, carrying out investigations, extracting needs, clarifying ethical problems and seeking solutions, conducting on-site experimental studies, and so on.
{"title":"Needs and challenges of care robots in nursing care setting: A literature review","authors":"Yuko Nagai, T. Tanioka, Shoko Fuji, Yuko Yasuhara, Sakiko Sakamaki, Narimi Taoka, R. Locsin, Fuji Ren, Kazuyuki Matsumoto","doi":"10.1109/NLPKE.2010.5587815","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587815","url":null,"abstract":"This study aims to identify needs and challenges of care robot in nursing care setting through an extensive search of the literature. As the result shows, there exists a shortage of information about results of the introduction of care robots, the needs of recipients and care providers, and relevant ethical problems. To advance our research and to introduce care robots into setting, there are so many things to do; consider the application of natural language processing technology by collaborating with researchers in the robotics field, carry out an investigation, extract the needs, clarify ethical problems and seek solutions, conduct the on-site experiment study, and so on.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132952033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
A new cascade algorithm based on CRFs for recognizing Chinese verb-object collocation
Guiping Zhang, Zhichao Liu, Qiaoli Zhou, Dongfeng Cai, Jiao Cheng
This paper proposes a new cascade algorithm based on conditional random fields. The algorithm is applied to the automatic recognition of Chinese verb-object collocations and is combined with a new “ONIY” sequence labeling scheme. Experiments compare recognition results under two segmentation and part-of-speech tag sets. The comprehensive experimental results show that the best performance is a 90.65% F-score on the Tsinghua Treebank and an 82.00% F-score under the segmentation and part-of-speech tagging scheme of Peking University. Our experiments show that the proposed algorithm can greatly improve the recognition accuracy of multi-nested collocations and plays a positive role in long-distance collocation.
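The “ONIY” tag set and the cascade stages are specific to the paper. The sketch below only shows the generic shape of one CRF labeling stage using sklearn-crfsuite, with placeholder BIO-style labels and features rather than the authors' scheme; a cascade would feed one stage's predicted chunks into the next stage's features.

```python
# One generic CRF sequence-labeling stage (sklearn-crfsuite). Labels and
# features are placeholders, not the paper's "ONIY" scheme.
import sklearn_crfsuite

def token_features(sent, i):
    """Simple word/POS context-window features for token i of a tagged sentence."""
    word, pos = sent[i]
    feats = {"word": word, "pos": pos, "bias": 1.0}
    if i > 0:
        feats["prev_pos"] = sent[i - 1][1]
    if i < len(sent) - 1:
        feats["next_pos"] = sent[i + 1][1]
    return feats

def sent2features(sent):
    return [token_features(sent, i) for i in range(len(sent))]

# With train_sents as lists of (word, pos) pairs and train_labels as lists of
# chunk tags such as "B-VO" / "I-VO" / "O":
# crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
# crf.fit([sent2features(s) for s in train_sents], train_labels)
```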
{"title":"A new cascade algorithm based on CRFs for recognizing Chinese verb-object collocation","authors":"Guiping Zhang, Zhichao Liu, Qiaoli Zhou, Dongfeng Cai, Jiao Cheng","doi":"10.1109/NLPKE.2010.5587828","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587828","url":null,"abstract":"This paper proposes a new cascade algorithm based on conditional random fields. The algorithm is applied to automatic recognition of Chinese verb-object collocation, and combined with a new sequence labeling of “ONIY”. Experiments compare identified results under two segmentations and part-of-speech tag sets. The comprehensive experimental results show that the best performance is 90.65 % in F-score over Tsinghua Treebank, and 82.00 % in F-score over the segmentation and part-of-speech tagging scheme of Peking University. Our experiments show that the proposed algorithm can greatly improve recognition accuracy of multi-nested collocation, and play a positive role on long distance collocation.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114551334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Negation disambiguation using the maximum entropy model
Chunliang Zhang, Xiaoxu Fei, Jingbo Zhu
Handling negation is of great significance for sentiment analysis. Most previous studies adopted a simple heuristic rule for sentiment negation disambiguation within a fixed context window. In this paper we present a supervised method to disambiguate which sentiment word a negator such as “(not)” attaches to in an opinionated sentence. Experimental results show that our method achieves better performance than traditional methods.
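Maximum entropy classification is equivalent to multinomial logistic regression, so the general pattern can be sketched with scikit-learn over simple context-window features; the pair-based framing and the feature set below are assumptions for illustration, not the features used in the paper.

```python
# Hedged sketch: decide which candidate sentiment word a negator attaches to,
# framed as binary classification of (negator, candidate) pairs.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def pair_features(tokens, neg_idx, cand_idx):
    """Illustrative features describing a negator/candidate pair in one sentence."""
    lo, hi = sorted((neg_idx, cand_idx))
    return {
        "distance": hi - lo,
        "cand_before_negator": cand_idx < neg_idx,
        "cand_word": tokens[cand_idx],
        "comma_between": "," in tokens[lo + 1:hi],
    }

# With X as a list of feature dicts and y as 1/0 attachment labels:
# model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
# model.fit(X, y)
```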
{"title":"Negation disambiguation using the maximum entropy model","authors":"Chunliang Zhang, Xiaoxu Fei, Jingbo Zhu","doi":"10.1109/NLPKE.2010.5587857","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587857","url":null,"abstract":"Handling negation issue is of great significance for sentiment analysis. Most previous studies adopted a simple heuristic rule for sentiment negation disambiguation within a fixed context window. In this paper we present a supervised method to disambiguate which sentiment word is attached to the negator such as “(not)” in an opinionated sentence. Experimental results show that our method can achieve better performance than traditional methods.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117237956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Distributed training for Conditional Random Fields
Xiaojun Lin, Liang Zhao, Dianhai Yu, Xihong Wu
This paper proposes a novel distributed training method for Conditional Random Fields (CRFs) that utilizes clusters built from commodity computers. The method employs the Message Passing Interface (MPI) to deal with large-scale data in two steps. First, the entire training data set is divided into several small pieces, each of which can be handled by one node. Second, instead of using a root node to collect all features, a new criterion is used to split the whole feature set into non-overlapping subsets and to ensure that each node maintains the global information of one feature subset. Experiments are carried out on the task of Chinese word segmentation (WS) with large-scale data, and we observe significant reductions in both training time and space while preserving performance.
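The abstract describes partitioning both the training data and the feature set across nodes. The sketch below shows only the first part, a data-parallel gradient all-reduce with mpi4py; `local_crf_gradient` is a hypothetical placeholder for the per-shard forward-backward computation, and the feature-set partitioning is not shown.

```python
# Data-parallel gradient step with mpi4py: each rank computes the CRF gradient
# on its own data shard, and the shard gradients are summed with Allreduce.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def distributed_step(weights, shard, learning_rate=0.1):
    """One gradient-ascent step on the summed gradient of all data shards."""
    local_grad = local_crf_gradient(weights, shard)     # hypothetical per-shard helper
    total_grad = np.empty_like(local_grad)
    comm.Allreduce(local_grad, total_grad, op=MPI.SUM)  # sum gradients across ranks
    return weights + learning_rate * total_grad / size  # averaged update

# Each rank loads only its own slice of the corpus, e.g. shard = corpus[rank::size],
# so per-node memory and per-iteration time shrink as nodes are added.
```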
{"title":"Distributed training for Conditional Random Fields","authors":"Xiaojun Lin, Liang Zhao, Dianhai Yu, Xihong Wu","doi":"10.1109/NLPKE.2010.5587803","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587803","url":null,"abstract":"This paper proposes a novel distributed training method of Conditional Random Fields (CRFs) by utilizing the clusters built from commodity computers. The method employs Message Passing Interface (MPI) to deal with large-scale data in two steps. Firstly, the entire training data is divided into several small pieces, each of which can be handled by one node. Secondly, instead of adopting a root node to collect all features, a new criterion is used to split the whole feature set into non-overlapping subsets and ensure that each node maintains the global information of one feature subset. Experiments are carried out on the task of Chinese word segmentation (WS) with large scale data, and we observed significant reduction on both training time and space, while preserving the performance.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123421571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5