Are the URLs really popular in microblog messages?

Anqi Cui, Min Zhang, Yiqun Liu, Shaoping Ma
DOI: 10.1109/CCIS.2011.6045021 (https://doi.org/10.1109/CCIS.2011.6045021)
Published in: 2011 IEEE International Conference on Cloud Computing and Intelligence Systems
Publication date: 2011-10-13
Citations: 14

Abstract

Microblogging services attract people and companies who want to share their ideas and interests. Because microblog messages are limited in length, people post URLs that link to other websites for detailed information. URLs that attract more attention therefore spread widely and represent popular information. However, not all of these URLs are useful. Many are spam URLs posted by automated agents or pushed automatically by services on other websites. Based on the features of popular URLs, we divide them into four categories and propose a clustering and classification algorithm to distinguish spam URLs from genuinely popular ones. Comparative experiments are conducted on English (Twitter) and Chinese (Sina Weibo) messages. We conclude that more than half of the popular URLs are spam. Most of them are pushed from other websites; even the genuinely popular ones gain much of their attention from pushing services. Although the proportion of URLs differs between Twitter and Sina Weibo messages, the characteristics of the spam URLs are similar. Our method detects spam URLs and their authors efficiently without annotations, and is useful for both research and business on microblogs.
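The abstract names a clustering and classification step but does not list the features, the four categories, or the algorithm itself. The sketch below is therefore only a hypothetical illustration of the general idea of grouping popular URLs without annotations, not the authors' method. It assumes each post is a `(url, author, source)` tuple, and the two features (`author_diversity`, `top_source_share`), the helper names, and the cluster labels are all invented for this example. A tiny two-cluster k-means stands in for whatever clustering the paper actually uses.

```python
from collections import Counter, defaultdict


def url_features(posts):
    """Compute two illustrative features per URL from (url, author, source) tuples.

    Both features are assumptions made for this sketch, not taken from the paper:
      author_diversity = distinct authors / number of posts containing the URL
      top_source_share = share of those posts sent from the most common source
    """
    by_url = defaultdict(list)
    for url, author, source in posts:
        by_url[url].append((author, source))

    features = {}
    for url, items in by_url.items():
        n = len(items)
        distinct_authors = len({author for author, _ in items})
        top_source_count = Counter(src for _, src in items).most_common(1)[0][1]
        features[url] = (distinct_authors / n, top_source_count / n)
    return features


def _dist2(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def two_means(points, iters=20):
    """Minimal 2-means clustering with deterministic seeding.

    Cluster 0 is seeded from the point with the lowest author diversity,
    cluster 1 from the point with the highest. If all points are identical,
    everything ends up in cluster 0.
    """
    centroids = [min(points), max(points)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            0 if _dist2(p, centroids[0]) <= _dist2(p, centroids[1]) else 1
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def classify_urls(posts):
    """Cluster URLs on the two features and attach a hypothetical label to each cluster."""
    features = url_features(posts)
    urls = list(features)
    if not urls:
        return {}
    labels = two_means([features[u] for u in urls])
    # Label names are illustrative; the paper's four categories are not given in the abstract.
    names = {0: "suspected-automated", 1: "likely-organic"}
    return {u: names[lab] for u, lab in zip(urls, labels)}
```

Calling `classify_urls(posts)` on a list of such tuples returns a dictionary from URL to one of the two illustrative labels. A URL posted many times by a single account through one pushing source lands near the low-diversity seed, while a URL shared by many distinct authors through mixed sources lands near the other. Real data would need more features and a validated labelling rule.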