基于解释因素的网络内容流行度建模与预测方法

Jong Gun Lee, S. Moon, Kave Salamatian
{"title":"基于解释因素的网络内容流行度建模与预测方法","authors":"Jong Gun Lee, S. Moon, Kave Salamatian","doi":"10.1109/WI-IAT.2010.209","DOIUrl":null,"url":null,"abstract":"In this paper, we propose a methodology to predict the popularity of online contents. More precisely, rather than trying to infer the popularity of a content itself, we infer the likelihood that a content will be popular. Our approach is rooted in survival analysis where predicting the precise lifetime of an individual is very hard and almost impossible but predicting the likelihood of one's survival longer than a threshold or another individual is possible. We position ourselves in the standpoint of an external observer who has to infer the popularity of a content only using publicly observable metrics, such as the lifetime of a thread, the number of comments, and the number of views. Our goal is to infer these observable metrics, using a set of explanatory factors, such as the number of comments and the number of links in the first hours after the content publication, which are observable by the external observer. We use a Cox proportional hazard regression model that divides the distribution function of the observable popularity metric into two components: a) one that can be explained by the given set of explanatory factors (called risk factors) and b) a baseline distribution function that integrates all the factors not taken into account. To validate our proposed approach, we use data sets from two different online discussion forums: dpreview.com, one of the largest online discussion groups providing news and discussion forums about all kinds of digital cameras, and myspace.com, one of the representative online social networking services. On these two data sets we model two different popularity metrics, the lifetime of threads and the number of comments, and show that our approach can predict the lifetime of threads from Dpreview (Myspace) by observing a thread during the first 5~6 days (24 hours, respectively) and the number of comments of Dpreview threads by observing a thread during first 2~3 days.","PeriodicalId":340211,"journal":{"name":"2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"115","resultStr":"{\"title\":\"An Approach to Model and Predict the Popularity of Online Contents with Explanatory Factors\",\"authors\":\"Jong Gun Lee, S. Moon, Kave Salamatian\",\"doi\":\"10.1109/WI-IAT.2010.209\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we propose a methodology to predict the popularity of online contents. More precisely, rather than trying to infer the popularity of a content itself, we infer the likelihood that a content will be popular. Our approach is rooted in survival analysis where predicting the precise lifetime of an individual is very hard and almost impossible but predicting the likelihood of one's survival longer than a threshold or another individual is possible. We position ourselves in the standpoint of an external observer who has to infer the popularity of a content only using publicly observable metrics, such as the lifetime of a thread, the number of comments, and the number of views. Our goal is to infer these observable metrics, using a set of explanatory factors, such as the number of comments and the number of links in the first hours after the content publication, which are observable by the external observer. We use a Cox proportional hazard regression model that divides the distribution function of the observable popularity metric into two components: a) one that can be explained by the given set of explanatory factors (called risk factors) and b) a baseline distribution function that integrates all the factors not taken into account. To validate our proposed approach, we use data sets from two different online discussion forums: dpreview.com, one of the largest online discussion groups providing news and discussion forums about all kinds of digital cameras, and myspace.com, one of the representative online social networking services. On these two data sets we model two different popularity metrics, the lifetime of threads and the number of comments, and show that our approach can predict the lifetime of threads from Dpreview (Myspace) by observing a thread during the first 5~6 days (24 hours, respectively) and the number of comments of Dpreview threads by observing a thread during first 2~3 days.\",\"PeriodicalId\":340211,\"journal\":{\"name\":\"2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-08-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"115\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/WI-IAT.2010.209\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WI-IAT.2010.209","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 115

摘要

在本文中,我们提出了一种预测网络内容受欢迎程度的方法。更准确地说,我们不是试图推断内容本身的受欢迎程度,而是推断内容受欢迎的可能性。我们的方法植根于生存分析,在生存分析中,准确预测一个人的寿命非常困难,几乎是不可能的,但预测一个人活得比另一个人的阈值更长的可能性是可能的。我们把自己定位在一个外部观察者的立场上,他必须只使用公开可观察的指标来推断内容的受欢迎程度,比如线程的生命周期、评论的数量和观看的数量。我们的目标是使用一组解释性因素来推断这些可观察的指标,例如在内容发布后的第一个小时内的评论数量和链接数量,这些都是外部观察者可以观察到的。我们使用Cox比例风险回归模型,该模型将可观察到的流行度度量的分布函数分为两个部分:a)可以由给定的解释因素(称为风险因素)集解释的部分,b)集成所有未考虑因素的基线分布函数。为了验证我们提出的方法,我们使用了来自两个不同在线讨论论坛的数据集:dpreview.com,一个最大的在线讨论小组,提供各种数码相机的新闻和讨论论坛,以及myspace.com,一个代表性的在线社交网络服务。在这两个数据集上,我们对两个不同的流行度指标——线程寿命和评论数进行了建模,并表明我们的方法可以通过观察一个线程在前5~6天(分别为24小时)来预测来自Dpreview (Myspace)的线程的寿命,通过观察一个线程在前2~3天的评论数来预测Dpreview线程的评论数。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
An Approach to Model and Predict the Popularity of Online Contents with Explanatory Factors
In this paper, we propose a methodology to predict the popularity of online contents. More precisely, rather than trying to infer the popularity of a content itself, we infer the likelihood that a content will be popular. Our approach is rooted in survival analysis where predicting the precise lifetime of an individual is very hard and almost impossible but predicting the likelihood of one's survival longer than a threshold or another individual is possible. We position ourselves in the standpoint of an external observer who has to infer the popularity of a content only using publicly observable metrics, such as the lifetime of a thread, the number of comments, and the number of views. Our goal is to infer these observable metrics, using a set of explanatory factors, such as the number of comments and the number of links in the first hours after the content publication, which are observable by the external observer. We use a Cox proportional hazard regression model that divides the distribution function of the observable popularity metric into two components: a) one that can be explained by the given set of explanatory factors (called risk factors) and b) a baseline distribution function that integrates all the factors not taken into account. To validate our proposed approach, we use data sets from two different online discussion forums: dpreview.com, one of the largest online discussion groups providing news and discussion forums about all kinds of digital cameras, and myspace.com, one of the representative online social networking services. On these two data sets we model two different popularity metrics, the lifetime of threads and the number of comments, and show that our approach can predict the lifetime of threads from Dpreview (Myspace) by observing a thread during the first 5~6 days (24 hours, respectively) and the number of comments of Dpreview threads by observing a thread during first 2~3 days.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Game Theory for Security: Lessons Learned from Deployed Applications A Decision Rule Method for Assessing the Completeness and Consistency of a Data Warehouse Semantic Structure Content for Dynamic Web Pages Enhancing the Performance of Metadata Service for Cloud Computing Improving Diversity of Focused Summaries through the Negative Endorsements of Redundant Facts
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1