S. S. Rodríguez, Ana Fernández Vilas, R. Redondo, J. Pazos-Arias
2013 International Conference on Social Computing, published 2013-09-08. DOI: 10.1109/SOCIALCOM.2013.102 (https://doi.org/10.1109/SOCIALCOM.2013.102)
Comparing Tag Clustering Algorithms for Mining Twitter Users' Interests
This paper addresses the problem of mining users' interests from the vast, noisy, unstructured and dynamic data generated on social media sites, taking Twitter as a case study. The mining process uses several Natural Language Processing techniques to extract the relevant words from subscribers' tweets and applies cluster analysis to them. We evaluate the performance of three tag clustering algorithms (PAM, Affinity Propagation and UPGMA), using the hyperlink structure of Wikipedia as an external source of semantic closeness among words. We provide a solution that requires no a priori knowledge about the number and category of topics, nor about the users to whom the extraction is applied. This solution uses an unsupervised measure of clustering quality (Silhouette width) to estimate the parameters of the cluster analysis. Finally, as human feedback is not as reliable as expected, we validate the approach using Twitter hashtags, the implicit classification method that Twitter users employ to organise their tweets.
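The idea of estimating a clustering parameter from the Silhouette width can be illustrated with a minimal sketch. It is not the authors' implementation: the word list, the distance values, and the function names `average_linkage`, `silhouette_width` and `best_k` are all assumptions made for illustration. In the paper the distances would come from the Wikipedia hyperlink structure, whereas here they are hard-coded toy values. The sketch uses average-linkage (UPGMA-style) merging and keeps the number of clusters with the highest mean Silhouette width.

```python
from itertools import combinations

def average_linkage(dist, k):
    """Agglomerative clustering with average linkage (UPGMA-style), stopped at k clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average pairwise distance between the two candidate clusters.
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(dist)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def silhouette_width(dist, labels):
    """Mean Silhouette width; requires at least two clusters."""
    groups = {}
    for i, c in enumerate(labels):
        groups.setdefault(c, []).append(i)
    total = 0.0
    for i in range(len(labels)):
        own = groups[labels[i]]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(sum(dist[i][j] for j in g) / len(g)
                for c, g in groups.items() if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)

def best_k(dist, k_range):
    """Return the k with the highest mean Silhouette width, plus all scores."""
    scores = {k: silhouette_width(dist, average_linkage(dist, k)) for k in k_range}
    return max(scores, key=scores.get), scores

# Toy data (hypothetical): two obvious topics, distances invented for illustration.
words = ["goal", "striker", "league", "guitar", "album", "concert"]
topic = [0, 0, 0, 1, 1, 1]
dist = [[0.0 if i == j else (0.2 if topic[i] == topic[j] else 0.9)
         for j in range(len(words))] for i in range(len(words))]

k, scores = best_k(dist, range(2, 6))
labels = average_linkage(dist, k)
print(k, dict(zip(words, labels)))
```

On this toy matrix the two-cluster solution scores highest, so no topic count has to be supplied in advance. The same selection loop could wrap PAM or Affinity Propagation by swapping the clustering function and the parameter being searched.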