EPJ Data Science: latest articles

An agent-based model to investigate the effects of urban segregation around the clock on inequalities in health behaviour.

IF 2.5 Zone 2 (Computer Science) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2026-01-01 Epub Date: 2025-12-11 DOI: 10.1140/epjds/s13688-025-00603-4

Clémentine Cottineau-Mugadza, Julien Perret, Romain Reuillon, Sébastien Rey-Coyrehourcq, Julie Vallée

Social segregation in cities refers to the uneven spatial distribution of individuals from unequal social groups, such as affluent and economically vulnerable people. Social segregation may, in turn, produce social inequalities through contextual effects, since neighbourhood mixing or concentration plays a role in shaping individuals' opinions and behaviours in multiple life domains, including health. Because segregation and contextual effects occur at the places of residence as well as throughout the day, as people move between locations in a city, we aim to understand the social effect of urban segregation 'around the clock' on health behaviours (such as the choice of a healthy diet), using an empirical agent-based model initialised on the Paris region with a synthetic population. We built this synthetic population by pulling together data from two health & nutrition surveys conducted 6 years apart, data from the French census and data from an origin-destination survey. We then combined scenarios of residential patterns (random allocation vs. census-based allocation reflecting the empirical level of residential segregation) with scenarios of daily mobility (no daily moves, random moves or survey-based daily moves reflecting the empirical level of daytime segregation in Paris) to assess the effect of spatio-temporal segregation on the diffusion of health behaviours. 
While the same upward trend of healthy behaviours is obtained in all scenarios simulated, we find contrasting results with respect to social inequalities: 1/ when the agents' residence is allocated at random, social inequalities of health decrease in the long run; 2/ randomizing daily mobility can mitigate the increase in social inequalities in dietary behaviours induced by effective residential segregation, with this mitigation effect appearing as soon as a small proportion of daily moves are random; 3/ daytime segregation as it exists in Paris slightly reinforces the unequal distribution of health behaviours between the most and least educated groups compared with the sole effect of residential segregation.

Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00603-4.

Volume: 15(1) Pages: 5 Type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12804204/pdf/

Citations: 0

Whose voice matters? Word embeddings reveal identity bias in news quotes.

IF 3 Zone 2 (Computer Science) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2025-01-01 Epub Date: 2025-04-17 DOI: 10.1140/epjds/s13688-025-00541-1

Nnaemeka Ohamadike, Kevin Durrheim, Mpho Primus

This paper investigates identity bias (gender and race) in the South African news selection and representation of COVID-19 vaccination quotes. Social bias studies have qualitatively examined race and gender bias in South African news, given South Africa's apartheid history; yet, studies that examine and quantify these biases at the speaker level using news quotes from a representative South African news corpus remain limited. To address this gap, we examined race and gender bias in news selection and framing of quotes. We used word embeddings trained on 22,627 vaccination quotes from 76 South African news sources between 2020 and 2023. These large-scale processing embeddings are unbiased by design but can learn and uncover biases hidden in language. Our findings reveal gender and race bias in the news selection and framing of quotes: journalists privilege White voices as more authoritative and connected to global and technical vaccination discourse but confine Black voices to primarily localised contexts. They also quote male speakers more frequently in the news than female speakers. In an era where human biases are becoming increasingly implicit, we argue that embeddings offer a robust tool to unearth, monitor, and evaluate these biases at the micro or speaker level in the news.

Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00541-1.
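
The abstract above says that embeddings can uncover associations between identity terms and other words. One common way to quantify such an association is a difference of mean cosine similarities; the sketch below illustrates that general idea only and is not taken from the paper. The function names and the two-dimensional toy vectors are hypothetical, invented purely for illustration, and are not results from the study.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association_gap(target, group_a, group_b, vectors):
    """Mean cosine similarity of `target` to group_a words minus group_b words.

    A positive value means the target word sits closer to group_a in the
    embedding space; a negative value means it sits closer to group_b.
    """
    sim_a = np.mean([cosine(vectors[target], vectors[w]) for w in group_a])
    sim_b = np.mean([cosine(vectors[target], vectors[w]) for w in group_b])
    return float(sim_a - sim_b)

# Hypothetical toy vectors (NOT from the paper), chosen so the arithmetic is easy to follow.
vectors = {
    "expert": np.array([1.0, 0.0]),
    "he": np.array([1.0, 0.0]),
    "him": np.array([0.8, 0.6]),
    "she": np.array([0.0, 1.0]),
    "her": np.array([0.6, 0.8]),
}

gap = association_gap("expert", ["he", "him"], ["she", "her"], vectors)
print(round(gap, 3))
```

With real embeddings, the dictionary would be replaced by vectors learned from the corpus, and the gap would typically be compared against a permutation baseline before being interpreted.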

Volume: 14(1) Pages: 30 Type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12006212/pdf/

Citations: 0

Safe spaces or toxic places? Content moderation and social dynamics of online eating disorder communities.

IF 2.5 Zone 2 (Computer Science) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2025-01-01 Epub Date: 2025-07-25 DOI: 10.1140/epjds/s13688-025-00575-5

Kristina Lerman, Minh Duc Chu, Charles Bickham, Luca Luceri, Emilio Ferrara

Social media platforms have become critical spaces for discussing mental health concerns, including eating disorders. While these platforms can provide valuable support networks, they may also amplify harmful content that glorifies disordered cognition and self-destructive behaviors. Although social media platforms have implemented various content moderation strategies, from stringent to laissez-faire approaches, we lack a comprehensive understanding of how these different moderation practices interact with user engagement in online communities around these sensitive mental health topics. This study addresses this knowledge gap through a comparative analysis of eating disorder discussions across Twitter/X (2.6M tweets), Reddit (178K submissions), and TikTok (14K videos) spanning 2019 to 2023. Our findings reveal that while users across all platforms engage similarly in expressing concerns and seeking support, platforms with weaker moderation (like Twitter/X) enable the formation of toxic echo chambers that amplify pro-anorexia rhetoric. These results demonstrate how moderation strategies significantly influence the development and impact of online communities, particularly in contexts involving mental health and self-harm.

Volume: 14(1) Pages: 55 Type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12296748/pdf/

Citations: 0

Socioeconomic disparities in mobility behavior during the COVID-19 pandemic in developing countries.

IF 3 Zone 2 (Computer Science) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2025-01-01 Epub Date: 2025-03-24 DOI: 10.1140/epjds/s13688-025-00532-2

Lorenzo Lucchini, Ollin D Langle-Chimal, Lorenzo Candeago, Lucio Melito, Alex Chunet, Aleister Montfort, Bruno Lepri, Nancy Lozano-Gracia, Samuel P Fraiberger

Mobile phone data have played a key role in quantifying human mobility during the COVID-19 pandemic. Existing studies on mobility patterns have primarily focused on regional aggregates in high-income countries, obfuscating the accentuated impact of the pandemic on the most vulnerable populations. Leveraging geolocation data from mobile-phone users and population census for 6 middle-income countries across 3 continents between March and December 2020, we uncovered common disparities in the behavioral response to the pandemic across socioeconomic groups. Users living in low-wealth neighborhoods were less likely to respond by self-isolating, relocating to rural areas, or refraining from commuting to work. The gap in the behavioral responses between socioeconomic groups persisted during the entire observation period. Among users living in low-wealth neighborhoods, those who commuted to work in high-wealth neighborhoods before the pandemic were particularly at risk of experiencing economic stress, both facing the reduction in economic activity in the high-wealth neighborhood and being more likely to be affected by public transport closures due to their longer commute distances. While confinement policies were predominantly country-wide, these results suggest that, when data to identify vulnerable individuals are not readily available, GPS-based analytics could help design targeted place-based policies to aid the most vulnerable.

Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00532-2.

Volume: 14(1) Pages: 25 Type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11933202/pdf/

Citations: 0

Milgram's experiment in the knowledge space: individual navigation strategies.

IF 3 Zone 2 (Computer Science) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2025-01-01 Epub Date: 2025-06-05 DOI: 10.1140/epjds/s13688-025-00558-6

Manran Zhu, János Kertész

The data deluge characteristic of our times has led to information overload, posing a significant challenge to effectively finding our way through the digital landscape. Addressing this issue requires an in-depth understanding of how we navigate through the abundance of information. Previous research has discovered multiple patterns in how individuals navigate in the geographic, social, and information spaces, yet individual differences in strategies for navigation in the knowledge space have remained largely unexplored. To bridge the gap, we conducted an online experiment where participants played a navigation game on Wikipedia and completed questionnaires about their personal information. Utilizing the hierarchical structure of the English Wikipedia and a graph embedding trained on it, we identified two navigation strategies and found significant individual differences in the choice between them. Older, white and female participants tend to adopt a proximity-driven strategy, while younger participants prefer a hub-driven strategy. Our study connects social navigation to knowledge navigation: individuals' differing tendencies to use geographical and occupational information about the target person to navigate in the social space can be understood as different choices between the hub-driven and proximity-driven strategies in the knowledge space.

Volume: 14(1) Pages: 42 Type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12141110/pdf/

Citations: 0

Detection of anomalous spatio-temporal patterns of app traffic in response to catastrophic events.

IF 3 Zone 2 (Computer Science) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2025-01-01 Epub Date: 2025-05-06 DOI: 10.1140/epjds/s13688-025-00546-w

Sofia Medina, Shazia'Ayn Babul, Timothy LaRock, Rohit Sahasrabuddhe, Renaud Lambiotte, Nicola Pedreschi

In this work, we uncover patterns of mobile phone application usage and information spread in response to perturbations caused by unprecedented events. We focus on categorizing patterns of response in both space and time, tracking their relaxation over time. To this end, we use the NetMob2023 Data Challenge dataset, which provides mobile phone application traffic volume data for several cities in France at a spatial resolution of 100 $m^{2}$ and a time resolution of 15 minutes for a time period ranging from March to May 2019. We analyze the spread of information before, during, and after the catastrophic Notre-Dame fire on April 15th and a bombing that took place in the city centre of Lyon on May 24th, using the volume of data uploaded to and downloaded from different mobile applications as a proxy for information transfer dynamics. We identify different clusters of information transfer dynamics in response to the Notre-Dame fire within the city of Paris as well as in other major French cities. We find a clear pattern of significantly above-baseline usage of the application Twitter (currently known as X) in Paris that radially spreads from the area surrounding the Notre-Dame cathedral to the rest of the city. We detect a similar pattern in the city of Lyon in response to the bombing. Further, we present a null model of radial information spread and develop methods of tracking radial patterns over time. Overall, we illustrate the novel analytical methods we devise, showing how they enable a new perspective on mobile phone user response to unplanned catastrophic events and giving insight into how information spreads during a catastrophe in both time and space.

Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00546-w.
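
To make "significantly above-baseline usage" spreading outward over time more concrete, the sketch below computes per-tile z-scores of event-day traffic against ordinary days and then finds the first time slot in which each tile exceeds a threshold. This is a generic anomaly-detection illustration, not the authors' pipeline: the function names, the threshold of 3, the array layout and the synthetic numbers are all assumptions made for the example.

```python
import numpy as np

def baseline_zscores(baseline, event):
    """Z-score of event-day traffic against baseline days.

    baseline: array of shape (days, tiles, slots) with traffic on ordinary days.
    event:    array of shape (tiles, slots) with traffic on the event day.
    """
    mu = baseline.mean(axis=0)
    sigma = baseline.std(axis=0, ddof=1)
    # Tiles/slots with zero baseline variance yield NaN and are never flagged.
    return (event - mu) / np.where(sigma > 0, sigma, np.nan)

def first_anomalous_slot(z, threshold=3.0):
    """Index of the first slot where z exceeds the threshold, per tile (-1 if never)."""
    above = z > threshold
    first = above.argmax(axis=1)
    return np.where(above.any(axis=1), first, -1)

# Synthetic data (assumed, for illustration): 5 baseline days, 4 tiles, 8 time slots.
day_offsets = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
baseline = 100.0 + day_offsets[:, None, None] * np.ones((1, 4, 8))
event = np.full((4, 8), 100.0)
# Tiles 0, 1, 2 lie progressively farther from the event and respond later; tile 3 never responds.
for tile, start in enumerate([2, 4, 6]):
    event[tile, start:] += 50.0

onset = first_anomalous_slot(baseline_zscores(baseline, event))
print(onset.tolist())
```

If onset times grow with distance from the event location, as in this synthetic case, the response is consistent with a radial spread; a null model is still needed to judge whether such ordering could arise by chance.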

Volume: 14(1) Pages: 35 Type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12055615/pdf/

Citations: 0

Mapping global value chains at the product level.

IF 3 Zone 2 (Computer Science) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2025-01-01 Epub Date: 2025-03-12 DOI: 10.1140/epjds/s13688-025-00521-5

Lea Karbevska, César A Hidalgo

Value chain data is crucial for navigating economic disruptions. Yet, despite its importance, we lack publicly available product-level value chain datasets, since resources such as the "World Input-Output Database", "Inter-Country Input-Output Tables", "EXIOBASE", and "EORA" lack information about products (e.g. Radio Receivers, Telephones, Electrical Capacitors, LCDs, etc.) and instead rely on aggregate industrial sectors (e.g. Electrical Equipment, Telecommunications). Here, we introduce a method that leverages ideas from machine learning and trade theory to infer product-level value chain relationships from fine-grained international trade data. We apply our method to data summarizing the exports and imports of 1200+ products and 250+ world regions (e.g. states in the U.S., prefectures in Japan, etc.) to infer value chain information implicit in their trade patterns. In short, we leverage the idea that due to global value chains, regions specialized in the export of a product will tend to specialize in the import of its inputs. We use this idea to develop a novel proportional allocation model to estimate product-level trade flows between regions and countries. This contributes a method to approximate value chain data at the product level that should be of interest to people working in logistics, trade, and sustainable development.

Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00521-5.
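
The central idea in the abstract above, that regions specialized in exporting a product tend to specialize in importing its inputs, can be illustrated with a simple correlation of specialization indices. The sketch below uses the Balassa revealed comparative advantage index and a Pearson correlation across regions; this is a stand-in chosen for illustration and should not be read as the paper's proportional allocation model. The function names and the small trade matrices are hypothetical.

```python
import numpy as np

def rca(flows):
    """Balassa index: a product's share in a region's trade over its share in world trade.

    flows: array of shape (regions, products) with trade values.
    """
    flows = np.asarray(flows, dtype=float)
    region_share = flows / flows.sum(axis=1, keepdims=True)
    world_share = flows.sum(axis=0) / flows.sum()
    return region_share / world_share

def input_scores(exports, imports):
    """scores[p, q]: correlation across regions between export RCA of p and import RCA of q.

    A high score suggests q is a candidate input of p under the specialization idea.
    """
    ex, im = rca(exports), rca(imports)
    n_products = ex.shape[1]
    # np.corrcoef treats rows as variables, so stack products-by-regions blocks.
    both = np.corrcoef(ex.T, im.T)
    return both[:n_products, n_products:]

# Hypothetical data: 4 regions x 3 products (0 = radios, 1 = capacitors, 2 = wheat).
exports = np.array([[8, 1, 1], [7, 2, 1], [1, 1, 8], [1, 2, 7]])
imports = np.array([[1, 8, 1], [1, 7, 2], [7, 1, 2], [8, 1, 1]])

scores = input_scores(exports, imports)
print(int(scores[0].argmax()))  # index of the most likely input of product 0 in this toy data
```

In this toy data the regions that export radios are exactly the ones that import capacitors, so capacitors receive the highest score as a candidate input for radios. Real trade data would call for controls for region size and shared demand before reading such correlations as value chain links.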

Citation: Lea Karbevska, César A Hidalgo. Mapping global value chains at the product level. EPJ Data Science 14(1), 21 (2025). DOI: 10.1140/epjds/s13688-025-00521-5. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11903633/pdf/

Citations: 0

Personalisation and profiling using algorithms and not-so-popular Colombian music: goal-directed mechanisms in music emotion recognition.

IF 2.5, Zone 2 (Computer Science), Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2025-01-01 Epub Date: 2025-11-13 DOI: 10.1140/epjds/s13688-025-00595-1

Juan Sebastián Gómez-Cañón, Thomas Magnus Lennie, Tuomas Eerola, Pablo Aragón, Estefanía Cano, Perfecto Herrera, Emilia Gómez

This work investigates how personalised Music Emotion Recognition (MER) systems may lead to sensitive profiling when applied to musically induced emotions in politically charged contexts. We focus on traditional Colombian music with explicit political content, including (1) vallenatos and social songs aligned with the left-wing guerrilla Fuerzas Armadas Revolucionarias de Colombia (FARC), and (2) corridos linked to sympathisers of the right-wing paramilitary group Autodefensas Unidas de Colombia (AUC). Using data from 49 participants with diverse political leanings, we train personalised machine learning models to predict induced emotional responses - particularly negative emotions. Our findings reveal that political identity plays a significant role in shaping emotional experiences of music with explicit political content, and that emotion recognition models can capture this variation to a certain extent. These results raise critical concerns about the potential misuse of emotion recognition technologies. What is often framed as a tool for wellbeing and emotional regulation could, in politically sensitive contexts, be repurposed for user profiling. This work highlights the ethical risks of deploying AI-driven emotion analysis without safeguards, particularly among populations that are politically or socially vulnerable. We argue that subjective emotional responses may constitute sensitive personal data, and that failing to account for their sociopolitical context could amplify harm and exclusion.

Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00595-1.


Citation: Juan Sebastián Gómez-Cañón, Thomas Magnus Lennie, Tuomas Eerola, Pablo Aragón, Estefanía Cano, Perfecto Herrera, Emilia Gómez. Personalisation and profiling using algorithms and not-so-popular Colombian music: goal-directed mechanisms in music emotion recognition. EPJ Data Science 14(1), 80 (2025). DOI: 10.1140/epjds/s13688-025-00595-1. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12615516/pdf/

Citations: 0

Endogenous labour flow networks.

IF 3, Zone 2 (Computer Science), Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2025-01-01 Epub Date: 2025-05-21 DOI: 10.1140/epjds/s13688-025-00539-9

Kathyrn R Fair, Omar A Guerrero

In the last decade, the study of labour dynamics has led to the introduction of labour flow networks (LFNs) as a way to conceptualise job-to-job transitions, and to the development of mathematical models to explore the dynamics of these networked flows. To date, LFN models have relied upon an assumption of static network structure. However, as recent events (increasing automation in the workplace, the COVID-19 pandemic, a surge in the demand for programming skills, etc.) have shown, we are experiencing drastic shifts in the job landscape that are altering the ways individuals navigate the labour market. Here we develop a novel model in which LFNs emerge from agent-level behaviour, removing the need to assume that future job-to-job flows will follow the same paths along which they have historically been observed. This model, informed by economic theory and microdata for the United Kingdom, generates empirical LFNs with a high level of accuracy. We use the model to explore how shocks impacting the underlying distributions of jobs and wages alter the topology of the LFN. This framework represents a crucial step towards the development of models that can answer questions about the future of work in an ever-changing world.

Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00539-9.
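To make the notion of a labour flow network concrete, the following minimal sketch aggregates observed job-to-job transitions into a weighted directed network. It only illustrates the data structure, not the agent-based model described in the abstract; treating employers as nodes, the function names, and the toy records are our own assumptions.

```python
from collections import Counter


def build_labour_flow_network(transitions):
    """Aggregate (origin, destination) job moves into weighted directed edges.

    transitions: iterable of (origin_employer, destination_employer) pairs.
    Returns a dict mapping (origin, destination) -> number of observed moves.
    """
    edges = Counter()
    for origin, destination in transitions:
        if origin != destination:  # skip records that stay with the same employer
            edges[(origin, destination)] += 1
    return dict(edges)


def out_strength(edges, node):
    """Total number of workers observed leaving `node`."""
    return sum(weight for (origin, _), weight in edges.items() if origin == node)


# Hypothetical toy records of job-to-job moves.
toy_moves = [
    ("firm_a", "firm_b"),
    ("firm_a", "firm_b"),
    ("firm_b", "firm_c"),
    ("firm_c", "firm_c"),
]
toy_network = build_labour_flow_network(toy_moves)
```

A static LFN model would treat these edge weights as fixed; the abstract's point is that the edges themselves can change when jobs and wages shift.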


Citation: Kathyrn R Fair, Omar A Guerrero. Endogenous labour flow networks. EPJ Data Science 14(1), 39 (2025). DOI: 10.1140/epjds/s13688-025-00539-9. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12095427/pdf/

Citations: 0

Endogenous conflict and the limits of predictive optimization.

IF 2.5, Zone 2 (Computer Science), Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2025-01-01 Epub Date: 2025-11-21 DOI: 10.1140/epjds/s13688-025-00599-x

Thomas Chadefaux, Thomas Schincariol

Forecasting models in political violence research increasingly rely on high-dimensional covariates and machine learning. Yet in practice, the most reliable conflict forecasts often come from much simpler systems: autoregressive models that predict future events based solely on recent past outcomes. This paper argues that such models are not merely convenient baselines but theoretically appropriate tools for sparse, dynamic environments like armed conflict. We show that autoregressive models consistently outperform or match more complex alternatives across multiple countries and specifications, while structural covariates frequently add little or degrade performance. We explain this pattern both theoretically and empirically: conflict is driven by internal feedback, burstiness, and short-term adaptation, not by slow-changing structural conditions. By foregrounding the limits of causal modeling in high-entropy settings, we make a broader case for epistemic modesty in prediction. Autoregression, we argue, is not a shortcut, but a principled strategy in systems that resist control.
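For readers unfamiliar with the baseline discussed above, the sketch below fits a first-order autoregressive model, y_t = a + b * y_{t-1}, by ordinary least squares and iterates it forward. It is a minimal illustration under our own assumptions (a single lag, an intercept, a made-up series), not the specification used in the paper.

```python
def fit_ar1(series):
    """Fit y_t = intercept + slope * y_{t-1} by ordinary least squares."""
    lagged = series[:-1]
    current = series[1:]
    n = len(lagged)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x = sum(lagged) / n
    mean_y = sum(current) / n
    var_x = sum((x - mean_x) ** 2 for x in lagged)
    if var_x == 0:
        raise ValueError("lagged series is constant; slope is undefined")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, current))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast(series, steps, intercept, slope):
    """Iterate the fitted recursion forward from the last observation."""
    predictions = []
    last = series[-1]
    for _ in range(steps):
        last = intercept + slope * last
        predictions.append(last)
    return predictions


# Hypothetical series generated exactly by y_t = 2 + 0.5 * y_{t-1}.
toy_series = [1.0, 2.5, 3.25, 3.625, 3.8125]
toy_intercept, toy_slope = fit_ar1(toy_series)
toy_next = forecast(toy_series, 1, toy_intercept, toy_slope)
```

The model uses no covariates at all: the only input is the outcome's own recent history, which is the property the abstract argues makes such models suited to sparse, feedback-driven settings.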


Citations: 0