Pub Date: 2026-01-01; Epub Date: 2026-02-09; DOI: 10.1140/epjds/s13688-026-00627-4
Jenny Martinez, Javier Argota Sánchez-Vaquerizo, Sachit Mahajan
As data-driven "smart city" agendas expand across Latin America, most urban performance metrics remain focused on infrastructure, connectivity, and aggregate efficiency, often neglecting who truly benefits. Urban greenery, a vital determinant of health and climate resilience, is one such blind spot. While some frameworks now consider "green space," they do so at a coarse, citywide scale, overlooking how access is distributed across neighborhoods and social groups. This obscures critical equity gaps, particularly in cities marked by deep socio-spatial segregation. In this study, we develop a fully reproducible geospatial pipeline that integrates high-resolution canopy height models, public park data, gridded population estimates, and socioeconomic strata to assess how greenery is distributed, not just how much exists. Applied to Bogotá and Medellín, the method reveals stark disparities: population-weighted canopy coverage rises significantly between the lowest and highest strata, while access to public parks also shows measurable inequality, especially in high-density, underserved neighborhoods. These inequities persist despite progressive greening policies, revealing the limits of optimization when legacy segregation is ignored. Our open-source pipeline enables finer-grained, justice-oriented audits that go beyond averages to identify where greenery and its benefits are most lacking. By enabling fine-grained equity assessments, this approach underscores the importance of greenery distribution, not just quantity, as a critical indicator for inclusive and equitable smart cities.
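The difference between a plain average and a population-weighted one can be illustrated with a minimal sketch. The tuple layout `(stratum, population, canopy_fraction)` and the sample numbers below are hypothetical and are not taken from the paper's pipeline or data; the sketch only shows the general weighting idea, in which each grid cell's canopy fraction counts in proportion to the people living in it.

```python
from collections import defaultdict


def population_weighted_coverage(cells):
    """Return {stratum: population-weighted mean canopy fraction}.

    Each cell is a (stratum, population, canopy_fraction) tuple. This
    layout is an illustrative assumption, not the paper's data schema.
    """
    weighted_sum = defaultdict(float)
    total_pop = defaultdict(float)
    for stratum, population, canopy_fraction in cells:
        weighted_sum[stratum] += population * canopy_fraction
        total_pop[stratum] += population
    # Skip strata with no residents to avoid dividing by zero.
    return {s: weighted_sum[s] / total_pop[s] for s in total_pop if total_pop[s] > 0}


# Made-up grid cells: stratum 1 has one dense, sparsely vegetated cell.
cells = [
    (1, 900, 0.05),
    (1, 100, 0.25),
    (6, 200, 0.30),
    (6, 200, 0.40),
]
coverage = population_weighted_coverage(cells)
print(coverage)
```

In this toy input the unweighted mean for stratum 1 would be 0.15, while the population-weighted value is about 0.07, because most residents live in the cell with little canopy. That gap is the kind of effect a citywide average hides.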
Not your mean green: beyond averages in mapping socio-spatial inequities in urban greenery for smart cities. Jenny Martinez, Javier Argota Sánchez-Vaquerizo, Sachit Mahajan. EPJ Data Science 15(1):25 (2026). DOI: https://doi.org/10.1140/epjds/s13688-026-00627-4. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12979267/pdf/
Social segregation in cities refers to the uneven spatial distribution of individuals from unequal social groups, such as affluent and economically vulnerable people. Social segregation may, in turn, produce social inequalities through contextual effects, since neighbourhood mixing or concentration plays a role in shaping individuals' opinions and behaviours in multiple life domains, including health. Because segregation and contextual effects occur at the places of residence as well as throughout the day, as people move between locations in a city, we aim to understand the social effect of urban segregation 'around the clock' on health behaviours (such as the choice of a healthy diet), using an empirical agent-based model initialised on the Paris region with a synthetic population. We built this synthetic population by pulling together data from two health & nutrition surveys conducted 6 years apart, data from the French census and data from an origin-destination survey. We then combined scenarios of residential patterns (random allocation vs. census-based allocation reflecting the empirical level of residential segregation) with scenarios of daily mobility (no daily moves, random moves or survey-based daily moves reflecting the empirical level of daytime segregation in Paris) to assess the effect of spatio-temporal segregation on the diffusion of health behaviours. 
While the same upward trend of healthy behaviours is obtained in all scenarios simulated, we find contrasted results with respect to social inequalities: 1/ when the agents' residence is allocated at random, social inequalities of health decrease in the long run; 2/ randomizing daily mobility can mitigate the increase in social inequalities in dietary behaviours induced by effective residential segregation, with this mitigation effect appearing as soon as a small proportion of daily moves are random; 3/ daytime segregation as it exists in Paris slightly reinforces the unequal distribution of health behaviours between the most and least educated groups compared with the sole effect of residential segregation.
Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00603-4.
An agent-based model to investigate the effects of urban segregation around the clock on inequalities in health behaviour. Clémentine Cottineau-Mugadza, Julien Perret, Romain Reuillon, Sébastien Rey-Coyrehourcq, Julie Vallée. EPJ Data Science 15(1):5 (2026). DOI: 10.1140/epjds/s13688-025-00603-4. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12804204/pdf/
Recent breakthroughs in generative artificial intelligence (AI) and large language models (LLMs) unlock new capabilities for AI personal assistants to overcome the cognitive bandwidth limitations of humans, providing decision support or even directly representing abstaining human voters at large scale. However, the quality of this representation, and which underlying biases manifest when collective decision making is delegated to LLMs, is an alarming and timely challenge. By rigorously emulating more than 50K LLM voting personas in 363 real-world elections, we disentangle how AI-generated choices differ from human choices and how this affects collective decision outcomes. Complex preferential ballot formats show significant inconsistencies, whereas simpler majoritarian elections demonstrate higher consistency. Strikingly, proportional ballot aggregation methods such as equal shares prove to be a win-win: fairer voting outcomes for humans and fairer AI representation, especially for voters likely to abstain. This novel relationship proves paramount for building democratic resilience in scenarios of low voter turnout caused by voter fatigue: the effect of abstention is mitigated by AI representatives that recover representative and fair voting outcomes. These interdisciplinary insights provide decision support to policymakers and citizens for developing safeguards and policies against the risks of using AI in democratic innovations.
Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00612-3.
Generative AI voting: fair collective choice is resilient to LLM biases and inconsistencies. Srijoni Majumdar, Edith Elkind, Evangelos Pournaras. EPJ Data Science 15(1):24 (2026). DOI: 10.1140/epjds/s13688-025-00612-3. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12963128/pdf/
Pub Date: 2025-01-01; Epub Date: 2025-04-17; DOI: 10.1140/epjds/s13688-025-00541-1
Nnaemeka Ohamadike, Kevin Durrheim, Mpho Primus
This paper investigates identity bias (gender and race) in the selection and representation of COVID-19 vaccination quotes in South African news. Given South Africa's apartheid history, social bias studies have qualitatively examined race and gender bias in South African news; yet studies that examine and quantify these biases at the speaker level, using news quotes from a representative South African news corpus, remain limited. To address this gap, we examined race and gender bias in news selection and framing of quotes. We used word embeddings trained on 22,627 vaccination quotes from 76 South African news sources between 2020 and 2023. Such embeddings, built by large-scale text processing, are unbiased by design but can learn and uncover biases hidden in language. Our findings reveal gender and race bias in the news selection and framing of quotes: journalists privilege White voices as more authoritative and connected to global and technical vaccination discourse but confine Black voices to primarily localised contexts. They also quote male speakers more frequently than female speakers. In an era where human biases are becoming increasingly implicit, we argue that embeddings offer a robust tool to unearth, monitor, and evaluate these biases at the micro or speaker level in the news.
Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00541-1.
Whose voice matters? Word embeddings reveal identity bias in news quotes. Nnaemeka Ohamadike, Kevin Durrheim, Mpho Primus. EPJ Data Science 14(1):30 (2025). DOI: https://doi.org/10.1140/epjds/s13688-025-00541-1. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12006212/pdf/
Social media platforms have become critical spaces for discussing mental health concerns, including eating disorders. These platforms can provide valuable support networks, but they may also amplify harmful content that glorifies disordered cognition and self-destructive behaviors. Although platforms have implemented various content moderation strategies, from stringent to laissez-faire approaches, we lack a comprehensive understanding of how these different moderation practices interact with user engagement in online communities around these sensitive mental health topics. This study addresses this knowledge gap through a comparative analysis of eating disorder discussions across Twitter/X (2.6M tweets), Reddit (178K submissions), and TikTok (14K videos) spanning 2019 to 2023. Our findings reveal that while users across all platforms engage similarly in expressing concerns and seeking support, platforms with weaker moderation (like Twitter/X) enable the formation of toxic echo chambers that amplify pro-anorexia rhetoric. These results demonstrate how moderation strategies significantly influence the development and impact of online communities, particularly in contexts involving mental health and self-harm.
Safe spaces or toxic places? Content moderation and social dynamics of online eating disorder communities. Kristina Lerman, Minh Duc Chu, Charles Bickham, Luca Luceri, Emilio Ferrara. EPJ Data Science 14(1):55 (2025). DOI: 10.1140/epjds/s13688-025-00575-5. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12296748/pdf/
Pub Date: 2025-01-01; Epub Date: 2025-06-05; DOI: 10.1140/epjds/s13688-025-00558-6
Manran Zhu, János Kertész
The data deluge characteristic of our times has led to information overload, which makes it hard to find our way through the digital landscape. Addressing this issue requires an in-depth understanding of how we navigate through the abundance of information. Previous research has discovered multiple patterns in how individuals navigate in the geographic, social, and information spaces, yet individual differences in strategies for navigating the knowledge space have remained largely unexplored. To bridge the gap, we conducted an online experiment in which participants played a navigation game on Wikipedia and completed questionnaires about their personal information. Using the hierarchical structure of the English Wikipedia and a graph embedding trained on it, we identified two navigation strategies and found significant individual differences in which strategy participants choose. Older, white and female participants tend to adopt a proximity-driven strategy, while younger participants prefer a hub-driven strategy. Our study connects social navigation to knowledge navigation: individuals' differing tendencies to use geographical and occupational information about the target person to navigate in the social space can be understood as different choices between the hub-driven and proximity-driven strategies in the knowledge space.
Milgram's experiment in the knowledge space: individual navigation strategies. Manran Zhu, János Kertész. EPJ Data Science 14(1):42 (2025). DOI: 10.1140/epjds/s13688-025-00558-6. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12141110/pdf/
In this work, we uncover patterns of mobile phone application usage and information spread in response to perturbations caused by unprecedented events. We focus on categorizing patterns of response in both space and time, tracking their relaxation over time. To this end, we use the NetMob2023 Data Challenge dataset, which provides mobile phone application traffic volume data for several cities in France at a spatial resolution of 100 m² and a time resolution of 15 minutes, for a period ranging from March to May 2019. We analyze the spread of information before, during, and after the catastrophic Notre-Dame fire on April 15th and a bombing that took place in the city centre of Lyon on May 24th, using the volume of data uploaded to and downloaded from different mobile applications as a proxy for information transfer dynamics. We identify different clusters of information transfer dynamics in response to the Notre-Dame fire within the city of Paris as well as in other major French cities. We find a clear pattern of significantly above-baseline usage of the application Twitter (currently known as X) in Paris that spreads radially from the area surrounding the Notre-Dame cathedral to the rest of the city. We detect a similar pattern in the city of Lyon in response to the bombing. Further, we present a null model of radial information spread and develop methods of tracking radial patterns over time. Overall, we illustrate the novel analytical methods we devise, showing how they enable a new perspective on mobile phone user response to unplanned catastrophic events and give insight into how information spreads during a catastrophe in both time and space.
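As a rough illustration of what "significantly above-baseline usage" can mean for a single grid cell, the sketch below scores observed traffic against earlier readings for the same cell using a z-score. The abstract does not state which anomaly measure the authors use, so the z-score, the threshold of 3, and all numbers here are assumptions chosen for illustration only.

```python
from statistics import mean, stdev


def z_scores(baseline, observed):
    """Standardise observed traffic against a baseline sample.

    `baseline` holds traffic volumes for the same cell and time slot on
    ordinary days; `observed` holds the volumes to be scored.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation
    if sigma == 0:
        raise ValueError("baseline has zero variance; z-score undefined")
    return [(x - mu) / sigma for x in observed]


# Hypothetical traffic volumes for one cell (arbitrary units).
baseline = [100, 110, 90, 105, 95]
observed = [102, 180]

THRESHOLD = 3.0  # assumed cut-off, not taken from the paper
scores = z_scores(baseline, observed)
flags = [z > THRESHOLD for z in scores]
print(flags)
```

Applying the same scoring to every cell and every 15-minute slot would give a map of flagged cells per time step, which is the kind of input one would need before looking for a radial pattern.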
Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00546-w.
Detection of anomalous spatio-temporal patterns of app traffic in response to catastrophic events. Sofia Medina, Shazia'Ayn Babul, Timothy LaRock, Rohit Sahasrabuddhe, Renaud Lambiotte, Nicola Pedreschi. EPJ Data Science 14(1):35 (2025). DOI: 10.1140/epjds/s13688-025-00546-w. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12055615/pdf/
Pub Date: 2025-01-01; Epub Date: 2025-03-24; DOI: 10.1140/epjds/s13688-025-00532-2
Lorenzo Lucchini, Ollin D Langle-Chimal, Lorenzo Candeago, Lucio Melito, Alex Chunet, Aleister Montfort, Bruno Lepri, Nancy Lozano-Gracia, Samuel P Fraiberger
Mobile phone data have played a key role in quantifying human mobility during the COVID-19 pandemic. Existing studies on mobility patterns have primarily focused on regional aggregates in high-income countries, obscuring the accentuated impact of the pandemic on the most vulnerable populations. Leveraging geolocation data from mobile-phone users and population census data for 6 middle-income countries across 3 continents between March and December 2020, we uncovered common disparities in the behavioral response to the pandemic across socioeconomic groups. Users living in low-wealth neighborhoods were less likely to respond by self-isolating, relocating to rural areas, or refraining from commuting to work. The gap in behavioral responses between socioeconomic groups persisted during the entire observation period. Among users living in low-wealth neighborhoods, those who commuted to work in high-wealth neighborhoods before the pandemic were particularly at risk of experiencing economic stress: they faced the reduction in economic activity in the high-wealth neighborhood and were more likely to be affected by public transport closures because of their longer commute distances. While confinement policies were predominantly country-wide, these results suggest that, when data to identify vulnerable individuals are not readily available, GPS-based analytics could help design targeted place-based policies to aid the most vulnerable.
Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00532-2.
Socioeconomic disparities in mobility behavior during the COVID-19 pandemic in developing countries. Lorenzo Lucchini, Ollin D Langle-Chimal, Lorenzo Candeago, Lucio Melito, Alex Chunet, Aleister Montfort, Bruno Lepri, Nancy Lozano-Gracia, Samuel P Fraiberger. EPJ Data Science 14(1):25 (2025). DOI: 10.1140/epjds/s13688-025-00532-2. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11933202/pdf/
Pub Date: 2025-01-01; Epub Date: 2025-03-12; DOI: 10.1140/epjds/s13688-025-00521-5
Lea Karbevska, César A Hidalgo
Value chain data is crucial for navigating economic disruptions. Yet we lack publicly available product-level value chain datasets: resources such as the "World Input-Output Database", "Inter-Country Input-Output Tables", "EXIOBASE", and "EORA" lack information about products (e.g., Radio Receivers, Telephones, Electrical Capacitors, LCDs) and instead rely on aggregate industrial sectors (e.g., Electrical Equipment, Telecommunications). Here, we introduce a method that leverages ideas from machine learning and trade theory to infer product-level value chain relationships from fine-grained international trade data. We apply our method to data summarizing the exports and imports of 1200+ products and 250+ world regions (e.g., states in the U.S., prefectures in Japan) to infer the value chain information implicit in their trade patterns. In short, we leverage the idea that, because of global value chains, regions specialized in the export of a product will tend to specialize in the import of its inputs. We use this idea to develop a novel proportional allocation model to estimate product-level trade flows between regions and countries. The result is a method for approximating value chain data at the product level that should be of interest to people working in logistics, trade, and sustainable development.
Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00521-5.
Mapping global value chains at the product level. Lea Karbevska, César A Hidalgo. EPJ Data Science 14(1):21, 2025. DOI: 10.1140/epjds/s13688-025-00521-5. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11903633/pdf/
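As a rough illustration of the proportionality idea named in the abstract above, the sketch below splits each exporter's volume of a single product across importers according to each importer's share of total imports. This is a generic textbook-style proportionality assumption, not the authors' model, whose details are not given in this listing. The function name `proportional_allocation`, the region labels, and the toy numbers are all hypothetical.

```python
from typing import Dict, Tuple


def proportional_allocation(
    exports: Dict[str, float],
    imports: Dict[str, float],
) -> Dict[Tuple[str, str], float]:
    """Estimate origin-destination flows for one product.

    Each exporter's volume is split across importers in proportion
    to each importer's share of total imports (a generic
    proportionality assumption, used here for illustration only).
    """
    total_imports = sum(imports.values())
    if total_imports <= 0:
        raise ValueError("total imports must be positive")

    flows: Dict[Tuple[str, str], float] = {}
    for origin, exported in exports.items():
        for destination, imported in imports.items():
            # Share of the destination in total imports, applied to the
            # volume exported by the origin.
            flows[(origin, destination)] = exported * imported / total_imports
    return flows


# Hypothetical toy data: two exporting regions, two importing regions.
exports = {"A": 60.0, "B": 40.0}
imports = {"X": 75.0, "Y": 25.0}
flows = proportional_allocation(exports, imports)

for (origin, destination), value in sorted(flows.items()):
    print(f"{origin} -> {destination}: {value:.1f}")
```

By construction, the estimated flows leaving each origin sum to that origin's exports. They sum to each destination's imports only when total exports equal total imports, as in the toy data here.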
Pub Date: 2025-01-01; Epub Date: 2025-11-13; DOI: 10.1140/epjds/s13688-025-00595-1
Juan Sebastián Gómez-Cañón, Thomas Magnus Lennie, Tuomas Eerola, Pablo Aragón, Estefanía Cano, Perfecto Herrera, Emilia Gómez
This work investigates how personalised Music Emotion Recognition (MER) systems may lead to sensitive profiling when applied to musically induced emotions in politically charged contexts. We focus on traditional Colombian music with explicit political content, including (1) vallenatos and social songs aligned with the left-wing guerrilla Fuerzas Armadas Revolucionarias de Colombia (FARC), and (2) corridos linked to sympathisers of the right-wing paramilitary group Autodefensas Unidas de Colombia (AUC). Using data from 49 participants with diverse political leanings, we train personalised machine learning models to predict induced emotional responses, particularly negative emotions. Our findings reveal that political identity plays a significant role in shaping emotional experiences of music with explicit political content, and that emotion recognition models can capture this variation to a certain extent. These results raise critical concerns about the potential misuse of emotion recognition technologies. What is often framed as a tool for wellbeing and emotional regulation could, in politically sensitive contexts, be repurposed for user profiling. This work highlights the ethical risks of deploying AI-driven emotion analysis without safeguards, particularly among populations that are politically or socially vulnerable. We argue that subjective emotional responses may constitute sensitive personal data, and that failing to account for their sociopolitical context could amplify harm and exclusion.
Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00595-1.
Personalisation and profiling using algorithms and not-so-popular Colombian music: goal-directed mechanisms in music emotion recognition. Juan Sebastián Gómez-Cañón, Thomas Magnus Lennie, Tuomas Eerola, Pablo Aragón, Estefanía Cano, Perfecto Herrera, Emilia Gómez. EPJ Data Science 14(1):80, 2025. DOI: 10.1140/epjds/s13688-025-00595-1. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12615516/pdf/