Correcting Relational Bias to Improve Classification in Sparsely-Labeled Networks

J. R. King, Luke K. McDowell
Published in: 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2016-10-01. DOI: 10.1109/DSAA.2016.11 (https://doi.org/10.1109/DSAA.2016.11)

Abstract

Many classification problems involve nodes that have a natural connection between them, such as links between people, pages, or social network accounts. Recent work has demonstrated how to learn relational dependencies from these links, then leverage them as predictive features. However, while this can often improve accuracy, the use of linked information can also lead to cascading prediction errors, especially in the common case where a network is only sparsely labeled. In response, this paper examines several existing and new methods for correcting the "relational bias" that leads to such errors. First, we explain how existing approaches can be divided into "resemblance-based" and "assignment-based" methods, and provide the first experimental comparison between them. We demonstrate that all of these methods can improve accuracy, but that the former type typically leads to better accuracy. Moreover, we show that the more flexible methods typically perform best, motivating a new assignment-based method that often improves accuracy over a more rigid method. In addition, we demonstrate for the first time that some of these methods can also improve accuracy when combined with Gibbs sampling for inference. However, we show that, with Gibbs, correcting relational bias also requires improving label initialization, and present two new initialization methods that yield large accuracy gains. Finally, we evaluate the effects of relational bias when "neighbor attributes," recently proposed additions that can provide more stability during inference, are included as model features. We show that such attributes reduce the negative impact of bias, but that using some form of bias correction remains important for achieving maximal accuracy.
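
The abstract does not spell out an algorithm, so the following is only a minimal sketch of the general setting it describes: labels predicted for some nodes are turned into relational features for their neighbors, which is how an early uncertain prediction can spread through a sparsely labeled network. It assumes a plain majority-vote rule over neighbor labels rather than the learned models, bias-correction methods, or Gibbs sampling studied in the paper. All names (`neighbor_label_fractions`, `iterative_classification`, the toy chain graph) are hypothetical and chosen for illustration.

```python
from collections import Counter


def neighbor_label_fractions(node, edges, labels, classes):
    """Relational feature: fraction of a node's labeled neighbors in each class.

    Returns a uniform distribution when no neighbor has a label yet.
    """
    counts = Counter(labels[n] for n in edges[node] if n in labels)
    total = sum(counts.values())
    if total == 0:
        return {c: 1.0 / len(classes) for c in classes}
    return {c: counts[c] / total for c in classes}


def iterative_classification(edges, known, classes, rounds=5):
    """Illustrative majority-vote relabeling (an assumption, not the paper's method).

    Each round recomputes every unknown node's label from the labels that its
    neighbors held at the end of the previous round (synchronous update).
    Ties are broken by the order of `classes`.
    """
    labels = dict(known)
    for _ in range(rounds):
        updated = dict(labels)
        for node in edges:
            if node in known:
                continue  # true labels are never overwritten
            if not any(n in labels for n in edges[node]):
                continue  # nothing to vote on yet
            fracs = neighbor_label_fractions(node, edges, labels, classes)
            updated[node] = max(classes, key=lambda c: fracs[c])
        labels = updated
    return labels


# Toy chain a - b - c - d - e with only the two end nodes labeled.
edges = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
known = {"a": "A", "e": "B"}
classes = ["A", "B"]

result = iterative_classification(edges, known, classes)
print(result)
```

In this toy run, node `c` sees one `A` neighbor and one `B` neighbor, so its label is decided by the arbitrary tie-break. That guess then becomes part of the relational feature for `d` and flips it away from the class of its only truly labeled neighbor. The example is deliberately simplistic, but it shows why predictions that feed back into neighbor features need care when few true labels are available.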