Estimating sharer reputation via social data calibration

Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2013-08-11. DOI: 10.1145/2487575.2487685

Jaewon Yang, Bee-Chung Chen, D. Agarwal


Citations: 7

Abstract

Online social networks have become important channels for users to share content with their connections and diffuse information. Although much work has been done to identify socially influential users, the problem of finding "reputable" sharers, who share good content, has received relatively little attention. Such reputation scores can be useful for various applications, such as recommending people to follow, procuring high-quality content in a scalable way, and creating a content reputation economy to incentivize high-quality sharing. To estimate sharer reputation, it is intuitive to leverage data that records how recipients respond (through clicking, liking, etc.) to content items shared by a sharer. However, such data is usually biased in two ways. It has a selection bias, since in most social networks the shared items can only be seen and responded to by users connected to the sharer. It also has a response bias, since the response is usually influenced by the relationship between the sharer and the recipient, which may not indicate whether the shared content is good. To correct for such biases, we propose to utilize an additional data source that provides unbiased goodness estimates for a small set of shared items, and to calibrate the biased social data through a novel multi-level hierarchical model that describes how the unbiased data and biased data are jointly generated according to sharer reputation scores. The unbiased data also provides the ground truth for quantitative evaluation of different methods. Experiments based on such ground-truth data show that our proposed model significantly outperforms existing methods that estimate social influence using biased social data.
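The calibration idea can be illustrated with a toy simulation. The sketch below is not the paper's multi-level hierarchical model, whose details the abstract does not give; it substitutes a plain least-squares line fitted on the few sharers that have unbiased ratings. Every name and number in it (`simulate`, `fit_line`, `evaluate`, the click model, the sample sizes) is a hypothetical choice made for illustration only.

```python
import random


def simulate(num_sharers=300, num_labeled=60, seed=0):
    """Generate toy sharers; only the first `num_labeled` get unbiased ratings."""
    rng = random.Random(seed)
    sharers = []
    for i in range(num_sharers):
        reputation = rng.uniform(0.1, 0.9)  # latent share quality (assumed)
        closeness = rng.uniform(0.0, 1.0)   # tie strength: source of response bias
        # Assumed response model: clicks depend on quality AND on closeness.
        click_prob = 0.05 + 0.3 * reputation + 0.1 * closeness
        clicks = sum(rng.random() < click_prob for _ in range(200))
        unbiased = None
        if i < num_labeled:
            # Small unbiased sample, e.g. 20 ratings from unconnected judges.
            votes = sum(rng.random() < reputation for _ in range(20))
            unbiased = votes / 20
        sharers.append({
            "reputation": reputation,
            "biased_rate": clicks / 200,
            "unbiased_rate": unbiased,
        })
    return sharers


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def evaluate(sharers):
    """Mean absolute error of raw vs. calibrated scores on unlabeled sharers."""
    labeled = [s for s in sharers if s["unbiased_rate"] is not None]
    held_out = [s for s in sharers if s["unbiased_rate"] is None]
    slope, intercept = fit_line(
        [s["biased_rate"] for s in labeled],
        [s["unbiased_rate"] for s in labeled],
    )
    raw_err = cal_err = 0.0
    for s in held_out:
        calibrated = min(1.0, max(0.0, intercept + slope * s["biased_rate"]))
        raw_err += abs(s["biased_rate"] - s["reputation"])
        cal_err += abs(calibrated - s["reputation"])
    return raw_err / len(held_out), cal_err / len(held_out)


if __name__ == "__main__":
    raw_mae, calibrated_mae = evaluate(simulate())
    print(f"raw MAE: {raw_mae:.3f}, calibrated MAE: {calibrated_mae:.3f}")
```

In this toy setup the raw click rate is a poor reputation score because it is compressed and shifted by the assumed response model, while a mapping learned from a small unbiased sample recovers most of the signal. A hierarchical model of the kind the abstract describes would go further by modeling both data sources jointly rather than fitting one to the other.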
