Hierarchical exploration based active learning with support vector machine

Yanping Yang, E. Song, Guangzhi Ma
{"title":"Hierarchical exploration based active learning with support vector machine","authors":"Yanping Yang, E. Song, Guangzhi Ma","doi":"10.1109/CINC.2010.5643833","DOIUrl":null,"url":null,"abstract":"The goal of active learning is to minimize the amount of labeled data required for machine learning. Some methods have focused on exploiting the samples with high uncertainty, but those methods fail in getting a representative set of the data samples. Other methods try to explore the representative samples by utilizing the prior distribution of the dataset. However, they are often computationally expensive and need a large amount of labeled data for initialization. In this paper we develop a hierarchical exploration based active learning algorithm that takes into account both the distribution of the dataset and the decision boundary of the current hypothesis. Our method uses the support vector machine (SVM) as the classifier. The hierarchical clustering algorithm is used to discover the dataset's structure step by step in a top-down manner. In each step of hierarchical structure discovery, the representative samples will be queried for labels to check the relative cluster's purity. The cluster with low purity will be divided further. After the draft SVM model is built with those representative samples, the uncertain samples near decision boundary will be further labeled if it can help reduce the entropy of the classifier. To show the effectiveness of the proposed method, our proposed method is compared with five state-of-art algorithms on six datasets from UCI. Our method shows the best performance through the comparison.","PeriodicalId":227004,"journal":{"name":"2010 Second International Conference on Computational Intelligence and Natural Computing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2010-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 Second International Conference on Computational Intelligence and Natural Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CINC.2010.5643833","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

The goal of active learning is to minimize the amount of labeled data required for machine learning. Some methods focus on exploiting samples with high uncertainty, but they fail to obtain a representative set of data samples. Other methods try to explore representative samples by exploiting the prior distribution of the dataset; however, they are often computationally expensive and need a large amount of labeled data for initialization. In this paper we develop a hierarchical exploration based active learning algorithm that takes into account both the distribution of the dataset and the decision boundary of the current hypothesis. Our method uses the support vector machine (SVM) as the classifier. A hierarchical clustering algorithm discovers the dataset's structure step by step in a top-down manner. At each step of hierarchical structure discovery, representative samples are queried for labels to check the corresponding cluster's purity, and clusters with low purity are divided further. After a draft SVM model is built from these representative samples, uncertain samples near the decision boundary are labeled whenever doing so helps reduce the entropy of the classifier. To show the effectiveness of the proposed method, we compare it with five state-of-the-art algorithms on six UCI datasets; our method achieves the best performance in this comparison.
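
The abstract describes a two-stage procedure: top-down exploration of the data's cluster structure with purity checks on queried representatives, followed by uncertainty sampling near the SVM decision boundary. The sketch below is only an illustration of that idea, not the authors' implementation: the bisecting 2-means splitting, the purity threshold, the choice of representatives closest to each cluster centre, the binary-classification setting, and the margin-based stand-in for the paper's entropy-reduction test are all simplifying assumptions.

```python
# Minimal sketch of hierarchical-exploration active learning with an SVM,
# assuming scikit-learn and a binary classification task.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC


def purity(labels):
    """Fraction of queried labels in a cluster that agree with the majority class."""
    _, counts = np.unique(labels, return_counts=True)
    return counts.max() / counts.sum()


def hierarchical_explore(X, oracle, purity_thresh=0.9, max_depth=5, queries_per_cluster=3):
    """Top-down exploration (assumed scheme): query a few representatives per
    cluster and split clusters whose queried labels look impure."""
    labeled_idx, labels = [], []
    stack = [(np.arange(len(X)), 0)]
    while stack:
        idx, depth = stack.pop()
        center = X[idx].mean(axis=0)
        # choose not-yet-queried samples closest to the cluster centre as representatives
        cand = idx[~np.isin(idx, labeled_idx)]
        if len(cand) == 0:
            continue
        order = np.argsort(np.linalg.norm(X[cand] - center, axis=1))
        reps = cand[order[:queries_per_cluster]]
        rep_labels = [oracle(i) for i in reps]
        labeled_idx.extend(reps)
        labels.extend(rep_labels)
        # impure cluster -> split further with 2-means (bisecting heuristic)
        if (purity(np.array(rep_labels)) < purity_thresh
                and depth < max_depth
                and len(idx) > 2 * queries_per_cluster):
            assign = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[idx])
            stack.append((idx[assign == 0], depth + 1))
            stack.append((idx[assign == 1], depth + 1))
    return np.array(labeled_idx), np.array(labels)


def refine_with_uncertainty(X, oracle, labeled_idx, labels, budget=20):
    """Label the pool samples closest to the current SVM boundary; the margin
    criterion here is a stand-in for the paper's entropy-reduction test."""
    labeled_idx, labels = list(labeled_idx), list(labels)
    if len(set(labels)) < 2:
        raise ValueError("exploration stage must return at least two classes")
    for _ in range(budget):
        clf = SVC(kernel="rbf", gamma="scale").fit(X[labeled_idx], labels)
        pool = np.setdiff1d(np.arange(len(X)), labeled_idx)
        if len(pool) == 0:
            break
        margins = np.abs(clf.decision_function(X[pool]))  # binary case: (n_pool,)
        pick = pool[np.argmin(margins)]                   # most uncertain sample
        labeled_idx.append(pick)
        labels.append(oracle(pick))
    return SVC(kernel="rbf", gamma="scale").fit(X[labeled_idx], labels)
```

Here `oracle` is assumed to be a function mapping a sample index to its true label (in a UCI-style benchmark, a lookup into held-out ground truth); the two functions would be run in sequence, exploration first, then uncertainty-based refinement.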