Hierarchical exploration based active learning with support vector machine
Yanping Yang, E. Song, Guangzhi Ma
DOI: 10.1109/CINC.2010.5643833
2010 Second International Conference on Computational Intelligence and Natural Computing, published 2010-11-22. Citation count: 3.
The goal of active learning is to minimize the amount of labeled data required for machine learning. Some methods focus on exploiting samples with high uncertainty, but they fail to obtain a representative set of data samples. Other methods try to explore representative samples by exploiting the prior distribution of the dataset; however, they are often computationally expensive and need a large amount of labeled data for initialization. In this paper, we develop a hierarchical-exploration-based active learning algorithm that takes into account both the distribution of the dataset and the decision boundary of the current hypothesis. Our method uses a support vector machine (SVM) as the classifier. A hierarchical clustering algorithm discovers the dataset's structure step by step in a top-down manner. At each step of this structure discovery, representative samples are queried for labels to check the corresponding cluster's purity, and clusters with low purity are divided further. After a draft SVM model is built from these representative samples, uncertain samples near the decision boundary are labeled when doing so helps reduce the entropy of the classifier. To show the effectiveness of the proposed method, we compare it with five state-of-the-art algorithms on six UCI datasets; our method achieves the best performance in this comparison.
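The two-phase scheme the abstract describes can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' exact algorithm: the purity test (whether the two queried child representatives disagree), the recursion depth, the choice of representative (point nearest each sub-centroid), and the entropy criterion (approximated here by plain margin-based uncertainty sampling) are all assumptions.

```python
# Sketch of hierarchical exploration followed by SVM uncertainty sampling.
# All function names, thresholds, and the purity heuristic are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.svm import SVC


def explore(X, indices, oracle, labeled, depth=0, max_depth=4):
    """Split the cluster in two, query the sample nearest each sub-centroid,
    and recurse into sub-clusters whose queried labels disagree (low purity)."""
    if depth >= max_depth or len(indices) < 4:
        return
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X[indices])
    reps = []
    for c in range(2):
        members = indices[km.labels_ == c]
        dist = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        rep = members[np.argmin(dist)]          # representative sample
        labeled[rep] = oracle(rep)              # query its label
        reps.append(rep)
    if labeled[reps[0]] != labeled[reps[1]]:    # low purity: divide further
        for c in range(2):
            explore(X, indices[km.labels_ == c], oracle, labeled,
                    depth + 1, max_depth)


def active_learn(X, y, budget=20):
    """Phase 1: hierarchical exploration yields a draft labeled set.
    Phase 2: repeatedly label the unlabeled sample closest to the SVM
    decision boundary until the query budget is spent."""
    labeled = {}
    oracle = lambda i: y[i]                     # simulated labeling oracle
    explore(X, np.arange(len(X)), oracle, labeled)
    svm = SVC(kernel="linear")
    while True:
        idx = np.fromiter(labeled, dtype=int)
        svm.fit(X[idx], y[idx])                 # draft, then refined, model
        if len(labeled) >= budget:
            return svm, labeled
        rest = np.setdiff1d(np.arange(len(X)), idx)
        margins = np.abs(svm.decision_function(X[rest]))
        q = rest[np.argmin(margins)]            # most uncertain sample
        labeled[q] = oracle(q)


X, y = make_blobs(n_samples=200, centers=[[-5, -5], [5, 5]],
                  cluster_std=1.0, random_state=0)
svm, labeled = active_learn(X, y, budget=20)
```

On well-separated data, the exploration phase typically spends only a handful of queries before every cluster looks pure, leaving most of the budget for boundary refinement; this mirrors the abstract's claim that exploration supplies a representative draft while uncertainty sampling sharpens the boundary.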