
EPJ Data Science: Latest Articles

Online advertisement in a pink-colored market
IF 3.6; Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-05-08; DOI: 10.1140/epjds/s13688-024-00473-2
Amir Mehrjoo, Rubén Cuevas, Ángel Cuevas

It is surprising that women are often charged more for products and services marketed explicitly to them. This phenomenon, known as the pink tax, is a major issue that calls women’s buying power into question. It is not limited to physical products, however – even online advertising can be subject to this type of gender-price discrimination. That is where our research comes in. We have developed a new methodology to measure what we call the digital marketing pink tax – the additional expense of delivering advertisements to female audiences. Analyzing data from Facebook advertising platforms across 187 countries and 40 territories shows this issue is systematic. In particular, the digital marketing pink tax is prevalent in 79% of audiences across the world and 98% of audiences in highly developed countries. Advertisers incur a median cost 30% higher to display advertisements to women than to men. In contrast, advertisers pay a smaller digital marketing pink tax in less-developed countries (5%). Our research indicates that countries in the Middle East and Africa with a low Human Development Index (HDI) do not experience this phenomenon. Our comprehensive investigation of 24 industries reveals that advertisers must pay a digital marketing pink tax of up to 64% to target women in some industries. Our findings also suggest a connection between the digital marketing pink tax and the consumer pink tax – the extra charge placed on products marketed to women. Overall, our research sheds light on an important issue affecting women worldwide, raising awareness of the digital marketing pink tax and advocating for better regulation.
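
The abstract does not spell out how the digital marketing pink tax is computed. As a rough illustration only, the sketch below assumes it is the relative difference between the median advertising cost for a female audience and for the matching male audience. The function name `pink_tax` and the cost figures are hypothetical and are not taken from the paper.

```python
from statistics import median


def pink_tax(cost_female, cost_male):
    """Relative extra cost of reaching women: (median_f - median_m) / median_m.

    Assumed definition for illustration; the paper's methodology may differ.
    """
    median_f = median(cost_female)
    median_m = median(cost_male)
    return (median_f - median_m) / median_m


# Hypothetical cost-per-thousand-impressions values, not data from the study.
female_cpm = [1.3, 2.6, 3.9]
male_cpm = [1.0, 2.0, 3.0]

tax = pink_tax(female_cpm, male_cpm)
print(f"digital marketing pink tax: {tax:.0%}")
```

A positive value means the female audience is more expensive to reach; a value at or below zero would correspond to audiences where the phenomenon is absent.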

Citations: 0
Who makes open source code? The hybridisation of commercial and open source practices
IF 3.6; Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-05-06; DOI: 10.1140/epjds/s13688-024-00475-0
Peter Mehler, Eva Iris Otto, Anna Sapienza

While Free and Open Source (F/OSS) coding has traditionally been described as a separate commons linked to values of openness and sharing, recent research suggests an increasing integration of private corporations into F/OSS practices, blurring the boundaries between F/OSS and commodified coding. However, there is a dearth of empirical, and especially quantitative, studies exploring this phenomenon. To address this gap, we model the power dynamics and infrastructural aspects of software production within GitHub, a central hub for F/OSS development, using a large-scale, directed network. Using various network statistics, we detect the ecosystem’s most impactful actors and find a nuanced picture of the influence of individuals, open source organizations, and private corporations in F/OSS practices. We find that the majority of public repositories on GitHub depend on a small core of specialized repositories and users. In accordance with expectations, individuals and open source organizations are more prevalent in this core of elite GitHub users; however, we also find a significant number of private organizations with an indirect, yet consistent, influence within GitHub. In addition, we find that directly influential individuals tend to facilitate sponsorship methods more often than indirectly influential or non-influential individuals. Our research highlights a hybridization of F/OSS and sheds light on the complex interplay between influence, power, and code production in the multi-language dependency ecosystem of GitHub.

Citations: 0
Segmentation using large language models: A new typology of American neighborhoods
IF 3.6; Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-04-22; DOI: 10.1140/epjds/s13688-024-00466-1
Alex D. Singleton, Seth Spielman

In the United States, recent changes to the National Statistical System have amplified the geographic-demographic resolution trade-off. That is, when working with demographic and economic data from the American Community Survey, as one zooms in geographically one loses resolution demographically due to very large margins of error. In this paper, we present a solution to this problem in the form of an AI-based, open and reproducible geodemographic classification system for the United States using small area estimates from the American Community Survey (ACS). We apply a partitioning clustering algorithm to a range of socio-economic, demographic, and built environment variables. Our approach utilizes an open source software pipeline that ensures adaptability to future data updates. A key innovation is the integration of GPT-4, a state-of-the-art large language model, to generate intuitive cluster descriptions and names. This represents a novel application of natural language processing in geodemographic research and showcases the potential for human-AI collaboration within the geospatial domain.

Citations: 0
Early career wins and tournament prestige characterize tennis players’ trajectories
IF 3.6; Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-04-19; DOI: 10.1140/epjds/s13688-024-00472-3
Chiara Zappalà, Sandro Sousa, Tiago Cunha, Alessandro Pluchino, Andrea Rapisarda, Roberta Sinatra

Success in sports is a complex phenomenon that has only garnered limited research attention. In particular, we lack a deep scientific understanding of success in sports like tennis and the factors that contribute to it. Here, we study the unfolding of tennis players’ careers to understand the role of early career stages and the impact of specific tournaments on players’ trajectories. We employ a comprehensive approach combining network science and analysis of the Association of Tennis Professionals (ATP) tournament data and introduce a novel method to quantify tournament prestige based on the eigenvector centrality of the co-attendance network of tournaments. Focusing on the interplay between participation in central tournaments and players’ performance, we find that the level of the tournament where players achieve their first win is associated with becoming a top player. This work sheds light on the critical role of the initial stages in the progression of players’ careers, offering valuable insights into the dynamics of success in tennis.
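
To make the prestige measure more concrete, here is a minimal sketch of eigenvector centrality on a tournament co-attendance network, computed by power iteration. It assumes that two tournaments are linked with a weight equal to the number of players who entered both; the abstract does not state the weighting, so this is an assumption, and the players and tournament labels are hypothetical.

```python
from itertools import combinations


def co_attendance_weights(attendance):
    """Map each tournament pair to the number of players who entered both."""
    weights = {}
    for tournaments in attendance.values():
        for a, b in combinations(sorted(set(tournaments)), 2):
            weights[(a, b)] = weights.get((a, b), 0) + 1
    return weights


def eigenvector_centrality(weights, iterations=200, tol=1e-10):
    """Power iteration on the symmetric weighted adjacency matrix.

    Scores are rescaled so that the most central node has score 1.
    """
    nodes = sorted({n for pair in weights for n in pair})
    if not nodes:
        return {}
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new = {n: 0.0 for n in nodes}
        for (a, b), w in weights.items():
            new[a] += w * score[b]
            new[b] += w * score[a]
        norm = max(new.values()) or 1.0
        new = {n: v / norm for n, v in new.items()}
        converged = max(abs(new[n] - score[n]) for n in nodes) < tol
        score = new
        if converged:
            break
    return score


# Hypothetical attendance records: player -> tournaments entered.
attendance = {
    "p1": ["A", "B", "C"],
    "p2": ["A", "B"],
    "p3": ["A", "C"],
    "p4": ["C", "D"],
}
scores = eigenvector_centrality(co_attendance_weights(attendance))
for name, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(name, round(value, 3))
```

In this toy network tournament A shares the most players with other well-connected tournaments and ranks highest, while D, attended only alongside C, ranks lowest. Plain power iteration can oscillate on bipartite networks, so a library implementation is a safer choice for real data.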

Citations: 0
Multifaceted online coordinated behavior in the 2020 US presidential election
IF 3.6; Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-04-19; DOI: 10.1140/epjds/s13688-024-00467-0
Serena Tardelli, Leonardo Nizzoli, Marco Avvenuti, Stefano Cresci, Maurizio Tesconi

Organized attempts to manipulate public opinion during election run-ups have dominated online debates in the last few years. Such attempts require numerous accounts to act in coordination to exert influence. Yet, the ways in which coordinated behavior surfaces during major online political debates are still largely unclear. This study sheds light on coordinated behaviors that took place on Twitter (now X) during the 2020 US Presidential Election. Utilizing state-of-the-art network science methods, we detect and characterize the coordinated communities that participated in the debate. Our approach goes beyond previous analyses by proposing a multifaceted characterization of the coordinated communities that yields nuanced results. In particular, we uncover three main categories of coordinated users: (i) moderate groups genuinely interested in the electoral debate, (ii) conspiratorial groups that spread false information and divisive narratives, and (iii) foreign influence networks that either sought to tamper with the debate or exploited it to publicize their own agendas. We also reveal a large use of automation by far-right foreign influence and conspiratorial communities. Conversely, left-leaning supporters were overall less coordinated and engaged primarily in harmless, factual communication. Our results also show that Twitter was effective at thwarting the activity of some coordinated groups, while it failed on some other equally suspicious ones. Overall, this study advances the understanding of online human interactions and contributes new knowledge to mitigate cyber social threats.

Citations: 0
Developing a hierarchical model for unraveling conspiracy theories
IF 3.6; Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-04-16; DOI: 10.1140/epjds/s13688-024-00470-5
Mohsen Ghasemizade, Jeremiah Onaolapo

A conspiracy theory (CT) suggests covert groups or powerful individuals secretly manipulate events. Not knowing about existing conspiracy theories could make one more likely to believe them, so this work aims to compile a list of CTs shaped as a tree that is as comprehensive as possible. We began with a manually curated ‘tree’ of CTs from academic papers and Wikipedia. Next, we examined 1769 CT-related articles from four fact-checking websites, focusing on their core content, and used a technique called Keyphrase Extraction to label the documents. This process yielded 769 identified conspiracies, each assigned a label and a family name. The second goal of this project was to detect whether an article is a conspiracy theory, so we built a binary classifier with our labeled dataset. This model is based on RoBERTa, a transformer-based language model pre-trained on a large corpus, and achieves an F1 score of 87%. This model helps to identify potential conspiracy theories in new articles. We used a combination of clustering (HDBSCAN) and a dimension reduction technique (UMAP) to assign a label from the tree to these new articles detected as conspiracy theories. We then labeled these groups accordingly to help us match them to the tree. These can lead us to detect new conspiracy theories and expand the tree using computational methods. We successfully generated a tree of conspiracy theories and built a pipeline to detect and categorize conspiracy theories within any text corpus. This pipeline can yield valuable insights from any database formatted as text.

Citations: 0
Scaling law of real traffic jams under varying travel demand
IF 3.6; Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-04-11; DOI: 10.1140/epjds/s13688-024-00471-4
Rui Chen, Yuming Lin, Huan Yan, Jiazhen Liu, Yu Liu, Yong Li

Urban traffic congestion has reached a critical level due to rapid urbanization, capturing considerable attention within urban science and transportation research. Although previous studies have validated the scale-free distributions in spatio-temporal congestion clusters across cities, the influence of travel demand on that distribution has yet to be explored. Using a unique traffic dataset during the COVID-19 pandemic in Shanghai 2022, we present empirical evidence that travel demand plays a pivotal role in shaping the scaling laws of traffic congestion. We uncover a noteworthy negative linear correlation between the travel demand and the traffic resilience represented by scaling exponents of congestion cluster size and recovery duration. Additionally, we reveal that travel demand broadly dominates the scale of congestion in the form of scaling laws, including the aggregated volume of congestion clusters, the number of congestion clusters, and the number of congested roads. Subsequent micro-level analysis of congestion propagation also unveils that cascade diffusion determines the demand sensitivity of congestion, while other intrinsic components, namely spontaneous generation and dissipation, are rather stable. Our findings on traffic congestion under diverse travel demand can profoundly enrich our understanding of the scale-free nature of traffic congestion and provide insights into internal mechanisms of congestion propagation.
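
The scaling exponents mentioned above describe how quickly the tail of a distribution such as congestion cluster size decays. The abstract does not say how the exponents were fitted; the sketch below uses the standard maximum-likelihood estimator for a continuous power law, which is an assumption made here. The synthetic cluster sizes and the helper `synthetic_sizes` are hypothetical and only serve to check the estimator.

```python
import math


def powerlaw_exponent(samples, x_min):
    """Maximum-likelihood exponent alpha for p(x) ~ x^(-alpha), x >= x_min.

    Uses alpha = 1 + n / sum(ln(x_i / x_min)) over the n samples >= x_min.
    """
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


def synthetic_sizes(alpha, x_min, n):
    """Deterministic power-law quantiles via the inverse CDF at midpoints."""
    return [
        x_min * (1.0 - (i + 0.5) / n) ** (-1.0 / (alpha - 1.0))
        for i in range(n)
    ]


# Synthetic "cluster sizes" with a known exponent, not data from the study.
sizes = synthetic_sizes(alpha=2.5, x_min=1.0, n=1000)
alpha_hat = powerlaw_exponent(sizes, x_min=1.0)
print(f"estimated exponent: {alpha_hat:.3f}")
```

Repeating the fit for periods with different travel demand and then regressing the fitted exponents on demand would be one way to probe the kind of linear relationship the abstract reports.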

Citations: 0
Suspended accounts align with the Internet Research Agency misinformation campaign to influence the 2016 US election
IF 3.6; Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-04-10; DOI: 10.1140/epjds/s13688-024-00464-3
Matteo Serafino, Zhenkun Zhou, José S. Andrade, Alexandre Bovet, Hernán A. Makse

The ongoing debate surrounding the impact of the Internet Research Agency’s (IRA) social media campaign during the 2016 U.S. presidential election has largely overshadowed the involvement of other actors. Our analysis brings to light a substantial group of suspended Twitter users, outnumbering the IRA user group by a factor of 60, who align with the ideologies of the IRA campaign. Our study demonstrates that this group of suspended Twitter accounts significantly influenced individuals categorized as undecided or weak supporters, potentially with the aim of swaying their opinions, as indicated by Granger causality.

Citations: 0
Unveiling the silent majority: stance detection and characterization of passive users on social media using collaborative filtering and graph convolutional networks
IF 3.6 Zone 2 (Computer Science) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-04-04 DOI: 10.1140/epjds/s13688-024-00469-y

Abstract

Social Media (SM) has become a popular medium for individuals to share their opinions on various topics, including politics, social issues, and daily affairs. During controversial events such as political elections, active users often proclaim their stance and try to persuade others to support them. However, disparities in participation levels can lead to misperceptions and cause analysts to misjudge the support for each side. For example, current models usually rely on content production and overlook a vast majority of civically engaged users who passively consume information. These “silent users” can significantly impact the democratic process despite being less vocal. Accounting for the stances of this silent majority is critical to improving our reliance on SM to understand and measure social phenomena. Thus, this study proposes and evaluates a new approach for silent users’ stance prediction based on collaborative filtering and Graph Convolutional Networks, which exploits multiple relationships between users and topics. Furthermore, our method allows us to describe users with different stances and online behaviors. We demonstrate its validity using real-world datasets from two related political events. Specifically, we examine user attitudes leading to the Chilean constitutional referendums in 2020 and 2022 through extensive Twitter datasets. In both datasets, our model outperforms the baselines by over 9% at both the edge and user levels. Thus, our method offers an improvement in effectively quantifying the support and creating a multidimensional understanding of social discussions on SM platforms, especially during polarizing events.

Citations: 0
Science as exploration in a knowledge landscape: tracing hotspots or seeking opportunity?
IF 3.6 Zone 2 (Computer Science) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-04-02 DOI: 10.1140/epjds/s13688-024-00468-z

Abstract

The selection of research topics by scientists can be viewed as an exploration process conducted by individuals with cognitive limitations traversing a complex cognitive landscape influenced by both individual and social factors. While existing theoretical investigations have provided valuable insights, the intricate and multifaceted nature of modern science hinders the implementation of empirical experiments. This study leverages advancements in Geographic Information System (GIS) techniques to investigate the patterns and dynamic mechanisms of topic-transition among scientists. By constructing the knowledge space across 6 large-scale disciplines, we depict the trajectories of scientists’ topic transitions within this space, measuring the flow and distance of research regions across different sub-spaces. Our findings reveal a predominantly conservative pattern of topic transition at the individual level, with scientists primarily exploring local knowledge spaces. Furthermore, simulation modeling analysis identifies research intensity, driven by the concentration of scientists within a specific region, as the key facilitator of topic transition. Conversely, the knowledge distance between fields serves as a significant barrier to exploration. Notably, despite potential opportunities for breakthrough discoveries at the intersection of subfields, empirical evidence suggests that these opportunities do not exert a strong pull on scientists, leading them to favor familiar research areas. Our study provides valuable insights into the exploration dynamics of scientific knowledge production, highlighting the influence of individual cognition, social factors, and the intrinsic structure of the knowledge landscape itself. These findings offer a framework for understanding and potentially shaping the course of scientific progress.

Citations: 0