Using CoTraining and Semantic Feature Extraction for Positive and Unlabeled Text Classification

2008 International Seminar on Future Information Technology and Management Engineering. Pub Date: 2008-11-20. DOI: 10.1109/FITME.2008.81

Na Luo, Fuyu Yuan, Wanli Zuo

Citations: 4

Abstract

This paper proposes a novel three-step algorithm. First, CoTraining is employed to filter the likely positive data out of the unlabeled dataset U. Second, we obtain vectors for the documents in the positive set using semantic-based feature extraction, and then identify the strong positives within the likely positive set produced in the first step. The data picked out in this way are added to the positive dataset P. Finally, a linear one-class SVM learns from the purified U as negative data and the expanded P as positive data. Because the algorithm expands the positive dataset automatically, it performs especially well when the given positive dataset P is insufficient. A comprehensive experiment shows that our algorithm is preferable to the existing ones.
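The data flow of the three steps can be illustrated with a small sketch. The abstract does not specify the CoTraining views, the semantic feature extraction, or the SVM settings, so everything below is an assumption made for illustration: two views are formed by splitting each document's tokens in half, plain term-frequency vectors stand in for semantic features, and a nearest-centroid cosine rule stands in for both the CoTraining classifiers and the final linear SVM. All function names and thresholds are hypothetical.

```python
from collections import Counter
from math import sqrt

def vec(tokens):
    # Term-frequency vector (stand-in for the paper's semantic features).
    return Counter(tokens)

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

def view(doc, k):
    # Assumed view split: view 0 is the first half of the tokens, view 1 the rest.
    half = len(doc) // 2
    return vec(doc[:half] if k == 0 else doc[half:])

def margin(v, pos_c, neg_c):
    # Positive when v is closer to the positive centroid than to the negative one.
    return cosine(v, pos_c) - cosine(v, neg_c)

def cotrain_filter(P, U, rounds=2):
    # Step 1: the two views take turns moving their most confident
    # positive-looking document from U into the likely-positive set.
    likely, rest = [], list(U)
    for _ in range(rounds):
        for k in (0, 1):
            if not rest:
                break
            pos_c = centroid(view(d, k) for d in P + likely)
            neg_c = centroid(view(d, k) for d in rest)
            best = max(rest, key=lambda d: margin(view(d, k), pos_c, neg_c))
            if margin(view(best, k), pos_c, neg_c) > 0:
                rest.remove(best)
                likely.append(best)
    return likely, rest

def strong_positives(P, likely, threshold=0.5):
    # Step 2: keep likely positives that lie close to the centroid of P.
    pos_c = centroid(vec(d) for d in P)
    return [d for d in likely if cosine(vec(d), pos_c) >= threshold]

def run_pipeline(P, U):
    likely, purified_U = cotrain_filter(P, U)
    expanded_P = P + strong_positives(P, likely)
    # Step 3: nearest-centroid rule as a stand-in for the final SVM.
    pos_c = centroid(vec(d) for d in expanded_P)
    neg_c = centroid(vec(d) for d in purified_U)
    return expanded_P, purified_U, lambda doc: margin(vec(doc), pos_c, neg_c) > 0

# Toy data: two labeled positives and four unlabeled documents,
# the first of which is a hidden positive.
P = [["svm", "kernel", "margin", "classifier"],
     ["kernel", "svm", "classifier", "margin"]]
U = [["kernel", "svm", "margin", "classifier", "training"],
     ["football", "goal", "match", "league"],
     ["goal", "league", "football", "match"],
     ["recipe", "oven", "bake", "flour"]]

expanded_P, purified_U, classify = run_pipeline(P, U)
print(len(expanded_P), len(purified_U))
print(classify(["svm", "classifier", "training"]))
```

In this toy run the hidden positive moves from U into the expanded P, which is the behavior the abstract credits for good performance when P is small. A faithful implementation would replace the stand-ins with the paper's own feature extraction and SVM.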


