Savvas Zannettou, T. Caulfield, Jeremy Blackburn, Emiliano De Cristofaro, Michael Sirivianos, G. Stringhini, Guillermo Suarez-Tangil
Internet memes are increasingly used to sway and manipulate public opinion. This prompts the need to study their propagation, evolution, and influence across the Web. In this paper, we detect and measure the propagation of memes across multiple Web communities, using a processing pipeline based on perceptual hashing and clustering techniques, and a dataset of 160M images from 2.6B posts gathered from Twitter, Reddit, 4chan's Politically Incorrect board (/pol/), and Gab, over the course of 13 months. We group the images posted on fringe Web communities (/pol/, Gab, and The_Donald subreddit) into clusters, annotate them using meme metadata obtained from Know Your Meme, and also map images from mainstream communities (Twitter and Reddit) to the clusters. Our analysis provides an assessment of the popularity and diversity of memes in the context of each community, showing, e.g., that racist memes are extremely common in fringe Web communities. We also find a substantial number of politics-related memes on both mainstream and fringe Web communities, supporting media reports that memes might be used to enhance or harm politicians. Finally, we use Hawkes processes to model the interplay between Web communities and quantify their reciprocal influence, finding that /pol/ substantially influences the meme ecosystem with the number of memes it produces, while The_Donald has a higher success rate in pushing them to other communities.
网络表情包越来越多地被用来左右和操纵公众舆论。这促使我们有必要研究它们在整个网络中的传播、演变和影响。在本文中,我们使用基于感知哈希和聚类技术的处理管道,以及从Twitter, Reddit, 4chan's political Incorrect board (/pol/)和Gab收集的26条帖子中收集的1.6亿张图片的数据集,在13个月的时间里,检测和测量了模因在多个Web社区中的传播。我们将边缘网络社区(/pol/, Gab和The_Donald subreddit)上发布的图片分组,使用Know Your meme获得的meme元数据对其进行注释,并将主流社区(Twitter和Reddit)的图片映射到集群中。我们的分析对每个社区背景下模因的受欢迎程度和多样性进行了评估,结果显示,例如,种族主义模因在边缘网络社区中极为普遍。我们还在主流和边缘网络社区中发现了大量与政治相关的模因,这支持了媒体关于模因可能被用来提升或伤害政治家的报道。最后,我们使用Hawkes过程来模拟网络社区之间的相互作用,并量化它们的相互影响,发现/pol/通过其产生的模因数量实质性地影响了模因生态系统,而The_Donald在将它们推送到其他社区方面具有更高的成功率。
{"title":"On the Origins of Memes by Means of Fringe Web Communities","authors":"Savvas Zannettou, T. Caulfield, Jeremy Blackburn, Emiliano De Cristofaro, Michael Sirivianos, G. Stringhini, Guillermo Suarez-Tangil","doi":"10.1145/3278532.3278550","DOIUrl":"https://doi.org/10.1145/3278532.3278550","url":null,"abstract":"Internet memes are increasingly used to sway and manipulate public opinion. This prompts the need to study their propagation, evolution, and influence across the Web. In this paper, we detect and measure the propagation of memes across multiple Web communities, using a processing pipeline based on perceptual hashing and clustering techniques, and a dataset of 160M images from 2.6B posts gathered from Twitter, Reddit, 4chan's Politically Incorrect board (/pol/), and Gab, over the course of 13 months. We group the images posted on fringe Web communities (/pol/, Gab, and The_Donald subreddit) into clusters, annotate them using meme metadata obtained from Know Your Meme, and also map images from mainstream communities (Twitter and Reddit) to the clusters. Our analysis provides an assessment of the popularity and diversity of memes in the context of each community, showing, e.g., that racist memes are extremely common in fringe Web communities. We also find a substantial number of politics-related memes on both mainstream and fringe Web communities, supporting media reports that memes might be used to enhance or harm politicians. Finally, we use Hawkes processes to model the interplay between Web communities and quantify their reciprocal influence, finding that /pol/ substantially influences the meme ecosystem with the number of memes it produces, while The_Donald has a higher success rate in pushing them to other communities.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83622714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quirin Scheitle, O. Hohlfeld, Julien Gamba, Jonas Jelten, T. Zimmermann, Stephen D. Strowes, N. Vallina-Rodriguez
A broad range of research areas including Internet measurement, privacy, and network security rely on lists of target domains to be analysed; researchers make use of target lists for reasons of necessity or efficiency. The popular Alexa list of one million domains is a widely used example. Despite their prevalence in research papers, the soundness of top lists has seldom been questioned by the community: little is known about the lists' creation, representativity, potential biases, stability, or overlap between lists. In this study we survey the extent, nature, and evolution of top lists used by research communities. We assess the structure and stability of these lists, and show that rank manipulation is possible for some lists. We also reproduce the results of several scientific studies to assess the impact of using a top list at all, which list specifically, and the date of list creation. We find that (i) top lists generally overestimate results compared to the general population by a significant margin, often even an order of magnitude, and (ii) some top lists have surprising change characteristics, causing high day-to-day fluctuation and leading to result instability. We conclude our paper with specific recommendations on the use of top lists, and how to interpret results based on top lists with caution.
{"title":"A Long Way to the Top: Significance, Structure, and Stability of Internet Top Lists","authors":"Quirin Scheitle, O. Hohlfeld, Julien Gamba, Jonas Jelten, T. Zimmermann, Stephen D. Strowes, N. Vallina-Rodriguez","doi":"10.1145/3278532.3278574","DOIUrl":"https://doi.org/10.1145/3278532.3278574","url":null,"abstract":"A broad range of research areas including Internet measurement, privacy, and network security rely on lists of target domains to be analysed; researchers make use of target lists for reasons of necessity or efficiency. The popular Alexa list of one million domains is a widely used example. Despite their prevalence in research papers, the soundness of top lists has seldom been questioned by the community: little is known about the lists' creation, representativity, potential biases, stability, or overlap between lists. In this study we survey the extent, nature, and evolution of top lists used by research communities. We assess the structure and stability of these lists, and show that rank manipulation is possible for some lists. We also reproduce the results of several scientific studies to assess the impact of using a top list at all, which list specifically, and the date of list creation. We find that (i) top lists generally overestimate results compared to the general population by a significant margin, often even an order of magnitude, and (ii) some top lists have surprising change characteristics, causing high day-to-day fluctuation and leading to result instability. We conclude our paper with specific recommendations on the use of top lists, and how to interpret results based on top lists with caution.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73576621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robert Beverly, Ramakrishnan Durairajan, D. Plonka, Justin P. Rohrer
Existing methods for active topology discovery within the IPv6 Internet largely mirror those of IPv4. In light of the large and sparsely populated address space, in conjunction with aggressive ICMPv6 rate limiting by routers, this work develops a different approach to Internet-wide IPv6 topology mapping. We adopt randomized probing techniques in order to distribute probing load, minimize the effects of rate limiting, and probe at higher rates. Second, we extensively analyze the efficiency and efficacy of various IPv6 hitlists and target generation methods when used for topology discovery, and synthesize new target lists based on our empirical results to provide both breadth (coverage across networks) and depth (to find potential subnetting). Employing our probing strategy, we discover more than 1.3M IPv6 router interface addresses from a single vantage point. Finally, we share our prober implementation, synthesized target lists, and discovered IPv6 topology results.
{"title":"In the IP of the Beholder: Strategies for Active IPv6 Topology Discovery","authors":"Robert Beverly, Ramakrishnan Durairajan, D. Plonka, Justin P. Rohrer","doi":"10.1145/3278532.3278559","DOIUrl":"https://doi.org/10.1145/3278532.3278559","url":null,"abstract":"Existing methods for active topology discovery within the IPv6 Internet largely mirror those of IPv4. In light of the large and sparsely populated address space, in conjunction with aggressive ICMPv6 rate limiting by routers, this work develops a different approach to Internet-wide IPv6 topology mapping. We adopt randomized probing techniques in order to distribute probing load, minimize the effects of rate limiting, and probe at higher rates. Second, we extensively analyze the efficiency and efficacy of various IPv6 hitlists and target generation methods when used for topology discovery, and synthesize new target lists based on our empirical results to provide both breadth (coverage across networks) and depth (to find potential subnetting). Employing our probing strategy, we discover more than 1.3M IPv6 router interface addresses from a single vantage point. Finally, we share our prober implementation, synthesized target lists, and discovered IPv6 topology results.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87250007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}