Selecting Nodes with Inhomogeneous Profile for Labeling for Network-Based Semi-supervised Learning
Bilzã Araújo, Liang Zhao
2013 BRICS Congress on Computational Intelligence and 11th Brazilian Congress on Computational Intelligence, 2013-09-08
DOI: 10.1109/BRICS-CCI-CBIC.2013.77 (https://doi.org/10.1109/BRICS-CCI-CBIC.2013.77)
Citations: 1
Abstract
Network-based Semi-Supervised Learning (NbSSL) propagates labels through affinity networks by exploiting the network topology, much as information spreads in trust networks. In NbSSL, not only the unlabeled data instances but also the labeled ones can bias the classification performance. We present results and a discussion of this phenomenon; even the suitability of the free parameters of NbSSL algorithms varies with the available labeled data. We therefore propose a method for selecting representative data instances to label for NbSSL. In our view, the representativeness of a node is related to how inhomogeneous its profile is with respect to the whole network. The proposed method uses complex-network centrality measures to identify which nodes present an inhomogeneous profile. We carry out this study by applying three NbSSL algorithms to Girvan-Newman and Lancichinetti-Fortunato-Radicchi modular networks. In the former, nodes with a high clustering coefficient are good representatives of the data; in the latter, nodes with high betweenness are. A high clustering coefficient means that the node lies in a densely connected motif (clique), whereas a high betweenness means that the node lies between modular structures, interconnecting them. These results show that NbSSL performance can be improved by selecting representative data instances for manual labeling.
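The two centrality measures named in the abstract can be illustrated on a toy network. The following is a minimal sketch, not the authors' method: the graph (two triangles joined by one edge), the helper names `clustering`, `betweenness` and `top_k`, and the choice of ranking nodes by raw score are all assumptions made for illustration. Betweenness is computed with Brandes' algorithm for unweighted graphs.

```python
from collections import deque

def clustering(adj):
    """Local clustering coefficient: fraction of a node's neighbour pairs that are linked."""
    coeff = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeff[v] = 0.0
            continue
        # Each edge among the neighbours is seen from both endpoints, hence / 2.
        links = sum(len(adj[u] & nbrs) for u in nbrs) / 2
        coeff[v] = 2.0 * links / (k * (k - 1))
    return coeff

def betweenness(adj):
    """Unnormalised shortest-path betweenness (Brandes' algorithm, unweighted graph)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}  # predecessors on shortest paths from s
        order = []
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # accumulate dependencies, farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every node pair is counted from both ends.
    return {v: b / 2 for v, b in bc.items()}

def top_k(scores, k):
    """Hypothetical selection rule: the k highest-scoring nodes, ties broken by node id."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]

# Toy modular network (an assumption): triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {v: set() for v in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

cc = clustering(adj)
bc = betweenness(adj)
print("highest clustering coefficient:", top_k(cc, 4))  # nodes inside the cliques
print("highest betweenness:", top_k(bc, 2))             # nodes bridging the modules
```

On this toy graph the clique-interior nodes 0, 1, 4 and 5 have clustering coefficient 1, while the bridging nodes 2 and 3 have the highest betweenness, which matches the abstract's interpretation of the two measures. Which of the two rankings yields better labeled sets on a given benchmark is the paper's empirical question and is not reproduced here.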