Distributed training for Conditional Random Fields

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010). Pub Date: 2010-09-30. DOI: 10.1109/NLPKE.2010.5587803

Xiaojun Lin, Liang Zhao, Dianhai Yu, Xihong Wu

Citations: 5

Abstract

This paper proposes a novel distributed training method for Conditional Random Fields (CRFs) that runs on clusters built from commodity computers. The method uses the Message Passing Interface (MPI) to handle large-scale data in two steps. First, the entire training set is divided into small pieces, each of which can be handled by a single node. Second, instead of using a root node to collect all features, a new criterion splits the whole feature set into non-overlapping subsets, so that each node maintains the global information for one feature subset. Experiments on Chinese word segmentation (WS) with large-scale data show significant reductions in both training time and space while preserving performance.
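The two steps in the abstract can be illustrated with a minimal, single-process Python sketch. The abstract does not state the authors' splitting criterion, so the sketch substitutes a hypothetical one: a CRC32 hash of the feature string modulo the number of nodes. The toy corpus, the BMES tagging, the feature templates, and the helper names (`extract_features`, `owner`, `distributed_feature_counts`) are illustrative assumptions, and the MPI exchange is simulated with an ordinary loop. Each node counts features on its own data shard and then sends each partial count to that feature's owner, so no single root ever holds the full feature table.

```python
import zlib
from collections import Counter

# Toy segmented corpus (illustrative data only, not from the paper).
CORPUS = [
    ["我们", "喜欢", "自然", "语言"],
    ["语言", "模型", "很", "有用"],
    ["我", "喜欢", "模型"],
    ["自然", "语言", "处理"],
]

def bmes_tags(words):
    """Convert a segmented sentence into (char, tag) pairs using the BMES scheme."""
    pairs = []
    for w in words:
        if len(w) == 1:
            pairs.append((w, "S"))
        else:
            pairs.append((w[0], "B"))
            pairs.extend((c, "M") for c in w[1:-1])
            pairs.append((w[-1], "E"))
    return pairs

def extract_features(words):
    """Emit character-window features conjoined with the gold tag (hypothetical templates)."""
    pairs = bmes_tags(words)
    chars = [c for c, _ in pairs]
    feats = []
    for i, (c, tag) in enumerate(pairs):
        prev_c = chars[i - 1] if i > 0 else "<s>"
        next_c = chars[i + 1] if i + 1 < len(chars) else "</s>"
        feats += [f"C-1={prev_c}|{tag}", f"C0={c}|{tag}", f"C1={next_c}|{tag}"]
    return feats

def owner(feature, num_nodes):
    """Hypothetical criterion: a stable (run-independent) hash assigns each feature to one node."""
    return zlib.crc32(feature.encode("utf-8")) % num_nodes

def distributed_feature_counts(corpus, num_nodes):
    # Step 1: split the training data into one shard per node (round-robin).
    shards = [corpus[r::num_nodes] for r in range(num_nodes)]
    # Each node counts features on its own shard only.
    local = [Counter(f for sent in shard for f in extract_features(sent)) for shard in shards]
    # Step 2: route each (feature, count) pair to the feature's owner instead of a root.
    # Under MPI this would be an all-to-all exchange; here it is a plain loop.
    owned = [Counter() for _ in range(num_nodes)]
    for counts in local:
        for feat, n in counts.items():
            owned[owner(feat, num_nodes)][feat] += n
    return owned

if __name__ == "__main__":
    for rank, subset in enumerate(distributed_feature_counts(CORPUS, num_nodes=3)):
        print(f"node {rank}: owns {len(subset)} features")
```

Because every feature has exactly one owner, the per-node tables are disjoint and their union matches what a single machine would compute. The same routing could in principle carry per-feature gradient contributions during CRF training; the observed feature counts shown here correspond to the empirical-expectation term of the log-likelihood gradient.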
