Correcting Relational Bias to Improve Classification in Sparsely-Labeled Networks
J. R. King, Luke K. McDowell
2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), October 2016
DOI: 10.1109/DSAA.2016.11 (https://doi.org/10.1109/DSAA.2016.11)
Citations: 0
Abstract
Many classification problems involve nodes that have a natural connection between them, such as links between people, pages, or social network accounts. Recent work has demonstrated how to learn relational dependencies from these links, then leverage them as predictive features. However, while this can often improve accuracy, the use of linked information can also lead to cascading prediction errors, especially in the common case when a network is only sparsely labeled. In response, this paper examines several existing and new methods for correcting the "relational bias" that leads to such errors. First, we explain how existing approaches can be divided into "resemblance-based" and "assignment-based" methods, and provide the first experimental comparison between them. We demonstrate that all of these methods can improve accuracy, but that the former type typically leads to better accuracy. Moreover, we show that the more flexible methods typically perform best, motivating a new assignment-based method that often improves accuracy compared to a more rigid method. In addition, we demonstrate for the first time that some of these methods can also improve accuracy when combined with Gibbs sampling for inference. However, we show that, with Gibbs, correcting relational bias also requires improving label initialization, and present two new initialization methods that yield large accuracy gains. Finally, we evaluate the effects of relational bias when "neighbor attributes," recently proposed additions that can provide more stability during inference, are included as model features. We show that such attributes reduce the negative impact of bias, but that using some form of bias correction remains important for achieving maximal accuracy.
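To make the setting concrete, the following is a minimal, self-contained sketch of collective classification with Gibbs sampling on a sparsely labeled graph. It is illustrative only and is not the authors' method: the toy graph, the fixed homophily parameter, and the random label initialization are all assumptions introduced here (the paper learns relational dependencies from data and studies better initialization and bias-correction schemes).

```python
import random
from collections import Counter

# Toy graph as an adjacency list; only nodes 0 and 1 are labeled (sparse labels).
# All of these names and values are illustrative assumptions, not from the paper.
edges = {0: [2, 3], 1: [4, 5], 2: [0, 3, 4], 3: [0, 2], 4: [1, 2, 5], 5: [1, 4]}
known = {0: "A", 1: "B"}   # the sparse set of observed labels
classes = ["A", "B"]
homophily = 0.8            # assumed P(neighbor shares my label); the paper learns this

def neighbor_score(node, label, state):
    """Unnormalized probability of `label` given the current neighbor labels."""
    score = 1.0
    for nb in edges[node]:
        score *= homophily if state[nb] == label else (1 - homophily)
    return score

def gibbs(iters=2000, burn_in=500, seed=0):
    rng = random.Random(seed)
    # Label initialization: random guesses for unlabeled nodes. The paper shows
    # that better initialization schemes matter greatly for Gibbs accuracy.
    state = {n: known.get(n, rng.choice(classes)) for n in edges}
    counts = {n: Counter() for n in edges if n not in known}
    for t in range(iters):
        for n in counts:  # resample each unlabeled node given its neighbors
            weights = [neighbor_score(n, c, state) for c in classes]
            state[n] = rng.choices(classes, weights=weights)[0]
            if t >= burn_in:
                counts[n][state[n]] += 1
    # Predict each unlabeled node's label by majority vote over post-burn-in samples.
    return {n: c.most_common(1)[0][0] for n, c in counts.items()}

print(gibbs())
```

Because predictions for unlabeled nodes condition on other (possibly wrong) predictions, errors can cascade through the graph; this is the failure mode that the resemblance-based and assignment-based corrections studied in the paper aim to mitigate.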