Content-Based Methods for Predicting Web-Site Demographic Attributes

Santosh Kabbur, Eui-Hong Han, G. Karypis
2010 IEEE International Conference on Data Mining, published 2010-12-13
DOI: 10.1109/ICDM.2010.97 (https://doi.org/10.1109/ICDM.2010.97)
Citations: 28

Abstract

Demographic information offers valuable insight into a web-site's user base and is used extensively to target online advertisements and promotions. This paper investigates machine-learning approaches for predicting the demographic attributes of web-sites using only information derived from their content and hyperlinked structure, without relying on any information obtained directly or indirectly from the web-site's users. Such methods are important because users are increasingly concerned about sharing their personal and behavioral information on the Internet. Regression-based approaches for predicting demographic attributes are developed and studied; they differ in the content-derived features they use, in how the prediction models are built, and in how web-page-level predictions are aggregated to take the web's hyperlinked structure into account. In addition, a matrix-approximation-based approach is developed for coupling the predictions of the individual regression models into a single model designed to predict the probability mass function of the attribute. Extensive experiments show that these methods achieve an RMSE of 8-10% and provide insights into how best to train and apply such models.
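
The abstract does not spell out how page-level predictions are combined or how the final probability mass function is formed, so the following is only a minimal sketch under stated assumptions: page-level predictions are averaged with uniform weights, negative values are clipped, and the result is renormalized to sum to 1. This clip-and-renormalize step is a simple stand-in, not the matrix-approximation coupling developed in the paper. The function names and the toy numbers are hypothetical.

```python
import math


def aggregate_site_pmf(page_pmfs, weights=None):
    """Combine page-level predictions into one site-level PMF.

    Assumption: weighted averaging followed by clipping and renormalizing.
    This is an illustrative stand-in, not the paper's aggregation method.
    """
    if not page_pmfs:
        raise ValueError("need at least one page-level prediction")
    if weights is None:
        weights = [1.0] * len(page_pmfs)  # uniform weights (assumed)
    n_bins = len(page_pmfs[0])
    total = [0.0] * n_bins
    for pmf, w in zip(page_pmfs, weights):
        for k in range(n_bins):
            # Independent regressors can output values below 0; clip them.
            total[k] += w * max(pmf[k], 0.0)
    mass = sum(total)
    if mass == 0.0:
        # No usable signal: fall back to a uniform distribution.
        return [1.0 / n_bins] * n_bins
    return [v / mass for v in total]


def rmse(predicted, actual):
    """Root-mean-square error between two equally long PMFs."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data (made up): three pages, one attribute with two categories.
pages = [[0.6, 0.4], [0.5, 0.5], [0.7, 0.3]]
site_pmf = aggregate_site_pmf(pages)
error = rmse(site_pmf, [0.55, 0.45])
```

On this toy input the aggregated distribution is approximately (0.6, 0.4) and the RMSE against the assumed true distribution (0.55, 0.45) is approximately 0.05, that is, 5 percentage points. A link-aware variant could pass non-uniform `weights`, for example derived from a page's position in the site's link graph; that choice is likewise an assumption here.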