Comparing Tag Clustering Algorithms for Mining Twitter Users' Interests

S. S. Rodríguez, Ana Fernández Vilas, R. Redondo, J. Pazos-Arias
2013 International Conference on Social Computing. Published 2013-09-08. DOI: 10.1109/SOCIALCOM.2013.102
Citations: 4

Abstract

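This paper addresses the problem of mining users' interests from the vast, noisy, unstructured and dynamic data generated on social media sites, taking Twitter as a case study. The mining process uses several Natural Language Processing techniques to extract the relevant words from subscribers' tweets and then applies cluster analysis to them. We evaluate the performance of three tag clustering algorithms (PAM, Affinity Propagation and UPGMA) when the hyperlink structure of Wikipedia is used as an external source of semantic closeness between words. The proposed solution requires no a priori knowledge about the number or categories of topics, nor about the users whose interests are being extracted. It relies on an unsupervised measure of clustering quality (the Silhouette width) to estimate the parameters of the cluster analysis. Finally, since human feedback is not as reliable as expected, we validate the approach using Twitter hashtags, the implicit classification method Twitter users employ to organise their tweets.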
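The abstract names the ingredients (a word-to-word distance derived from Wikipedia's link graph, a clustering algorithm, and the Silhouette width used to estimate the clustering parameters) but not the exact formulas. The Python sketch below shows one way these pieces could fit together, using PAM and choosing the number of clusters k that maximises the mean Silhouette width. The Milne-Witten style in-link distance, the helper names (`link_distance`, `pam`, `mean_silhouette`, `choose_k`) and the toy in-link sets are all assumptions made for illustration; this is not the authors' implementation.

```python
import math


def link_distance(inlinks_a, inlinks_b, total_articles):
    """ASSUMED Milne-Witten style distance between two Wikipedia articles.

    inlinks_a / inlinks_b: sets of article ids linking to each word's page.
    total_articles: size of the Wikipedia snapshot. Result is clipped to [0, 1].
    """
    common = len(inlinks_a & inlinks_b)
    if common == 0:
        return 1.0  # no shared in-links: treat the words as maximally distant
    big = max(len(inlinks_a), len(inlinks_b))
    small = min(len(inlinks_a), len(inlinks_b))
    denom = math.log(total_articles) - math.log(small)
    if denom <= 0:
        return 0.0  # degenerate case: every article links to the smaller page
    d = (math.log(big) - math.log(common)) / denom
    return min(max(d, 0.0), 1.0)


def total_cost(dist, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Minimal PAM (k-medoids) on a precomputed distance matrix: BUILD, then SWAP."""
    n = len(dist)
    medoids = []
    for _ in range(k):  # BUILD: greedily add the point that lowers the cost most
        best = min((p for p in range(n) if p not in medoids),
                   key=lambda p: total_cost(dist, medoids + [p]))
        medoids.append(best)
    cost = total_cost(dist, medoids)
    improved = True
    while improved:  # SWAP: replace a medoid by a non-medoid while the cost drops
        improved = False
        for i in range(k):
            for p in range(n):
                if p in medoids:
                    continue
                candidate = medoids[:i] + [p] + medoids[i + 1:]
                c = total_cost(dist, candidate)
                if c < cost - 1e-12:
                    medoids, cost, improved = candidate, c, True
    # Label each point with the index of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[x][medoids[j]]) for x in range(n)]
    return medoids, labels


def mean_silhouette(dist, labels):
    """Average Silhouette width, s(i) = (b - a) / max(a, b); singletons score 0."""
    n = len(dist)
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0  # the Silhouette is undefined for a single cluster
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(dist[i][j] for j in own) / len(own)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


def choose_k(dist, k_max):
    """Return (k, width, labels) for the k whose PAM partition has the widest Silhouette."""
    best = None
    for k in range(2, min(k_max, len(dist) - 1) + 1):
        _, labels = pam(dist, k)
        width = mean_silhouette(dist, labels)
        if best is None or width > best[1]:
            best = (k, width, labels)
    return best


# Toy usage: HYPOTHETICAL in-link sets (article ids) for six extracted words.
inlinks = {
    "football": {1, 2, 3, 4}, "league": {1, 2, 3, 5}, "goal": {2, 3, 4, 5},
    "python": {10, 11, 12, 13}, "compiler": {10, 11, 12, 14}, "java": {11, 12, 13, 14},
}
terms = list(inlinks)
dist = [[0.0 if a == b else link_distance(inlinks[a], inlinks[b], 1000) for b in terms]
        for a in terms]
k, width, labels = choose_k(dist, k_max=4)
print(k, round(width, 3))  # → 2 0.948
```

Because the sweep only needs a partition per parameter value, another algorithm can be dropped into the same loop; what changes is the parameter being swept, for example the cut level of a UPGMA dendrogram, or the preference value of Affinity Propagation, which indirectly controls how many exemplars it returns.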