Pub Date: 2024-08-08 | DOI: 10.1140/epjds/s13688-024-00493-y
Sofía M. del Pozo, Sebastián Pinto, Matteo Serafino, Lucio Garcia, Hernán A. Makse, Pablo Balenzuela
The extensive data generated on social media platforms allow us to gain insights into trending topics and public opinions. Additionally, they offer a window into user behavior, including content engagement and news sharing habits. In this study, we analyze the relationship between users’ political ideologies and the news they share during Argentina’s 2019 election period. Our findings reveal that users predominantly share news that aligns with their political beliefs, despite accessing media outlets with diverse political leanings. Moreover, we observe a consistent pattern of users sharing articles related to topics biased toward their preferred candidates, highlighting a deeper level of political alignment in online discussions. We believe that this systematic analysis framework can be applied to similar scenarios in different countries, especially those marked by significant political polarization, akin to Argentina.
Title: Analyzing user ideologies and shared news during the 2019 Argentinian elections
Journal: EPJ Data Science
Pub Date: 2024-08-05 | DOI: 10.1140/epjds/s13688-024-00492-z
Rohit Ram, Marian-Andrei Rizoiu
Social influence pervades our everyday lives and lays the foundation for complex social phenomena, such as the spread of misinformation and the polarization of communities. A disconnect appears between psychology approaches, generally performed and tested in controlled lab experiments, and quantitative methods, which are usually data-driven and rely on network and event analysis. The former are slow, expensive to deploy, and typically do not generalize well to topical issues; the latter often oversimplify the complexities of social influence and ignore psychosocial literature. This work bridges this gap by introducing a human-in-the-loop active learning method that empirically quantifies social influence by crowdsourcing pairwise influence comparisons. We develop simulation and fitting tools, allowing us to estimate the required budget based on the design features and the worker’s decision accuracy. We perform a series of pilot studies to quantify the impact of design features on worker accuracy. We deploy our method to estimate the influence ranking of 500 X/Twitter users. We validate our measure by showing that the obtained empirical influence is tightly linked with agency and communion, the Big Two of social cognition, with agency being the most important dimension for influence formation.
Title: Empirically measuring online social influence
Journal: EPJ Data Science
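The idea of turning crowdsourced pairwise comparisons into an influence ranking, and of asking how worker accuracy affects the result, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' method: it compares every pair (plain round-robin rather than active learning), ranks by win count, and models worker accuracy as a single probability. The function name `simulate_ranking` and all parameters are hypothetical.

```python
import itertools
import random


def simulate_ranking(true_influence, worker_accuracy, votes_per_pair=1, seed=0):
    """Rank items by wins in simulated pairwise comparisons.

    Assumption: a worker picks the truly more influential item with
    probability `worker_accuracy`, independently for each vote.
    """
    rng = random.Random(seed)
    n = len(true_influence)
    wins = [0] * n
    for i, j in itertools.combinations(range(n), 2):
        # Identify which of the two items is truly more influential.
        stronger, weaker = (i, j) if true_influence[i] > true_influence[j] else (j, i)
        for _ in range(votes_per_pair):
            winner = stronger if rng.random() < worker_accuracy else weaker
            wins[winner] += 1
    # Indices sorted from most to fewest wins.
    return sorted(range(n), key=lambda k: wins[k], reverse=True)


# Hypothetical latent influence scores for four users.
latent = [0.3, 0.9, 0.1, 0.5]
perfect_ranking = simulate_ranking(latent, worker_accuracy=1.0)
noisy_ranking = simulate_ranking(latent, worker_accuracy=0.7, votes_per_pair=5)
```

With perfectly accurate workers the recovered order matches the latent scores exactly; lowering `worker_accuracy` or `votes_per_pair` shows how noise and budget trade off in this toy setting.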
Pub Date: 2024-08-01 | DOI: 10.1140/epjds/s13688-024-00484-z
Ambra Amico, Giacomo Vaccario, Frank Schweitzer
Networks to distribute goods, from raw materials to food and medicines, are the backbone of a functioning economy. They are shaped by several supply relations connecting manufacturers, distributors, and final buyers worldwide. We present a network-based model to describe the mechanisms underlying the emergence and growth of distribution networks. In our model, firms consider two practices when establishing new supply relations: centralization, the tendency to choose highly connected partners, and multi-sourcing, the preference for multiple suppliers. Centralization enhances network efficiency by leveraging short distribution paths; multi-sourcing fosters resilience by providing multiple distribution paths connecting final buyers to the manufacturer. We validate the proposed model using data on drug shipments in the US. Drawing on these data, we reconstruct 22 nationwide pharmaceutical distribution networks. We demonstrate that the proposed model successfully replicates several structural features of the empirical networks, including their out-degree and path length distributions as well as their resilience and efficiency properties. These findings suggest that the proposed firm-level practices effectively capture the network growth process that leads to the observed structures.
Title: Efficiency and resilience: key drivers of distribution network growth
Journal: EPJ Data Science
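The two firm-level practices named in the abstract above, centralization and multi-sourcing, can be illustrated with a toy growth process. This is a hypothetical sketch, not the authors' model: each new firm picks `n_suppliers` distinct existing firms (multi-sourcing), choosing each with probability proportional to `(out_degree + 1) ** centralization` (a preferential-attachment stand-in for centralization). All names and the weighting rule are assumptions made for illustration.

```python
import random


def grow_distribution_network(n_firms, n_suppliers=2, centralization=1.0, seed=0):
    """Grow a directed supplier -> buyer network, one firm at a time.

    Node 0 plays the role of the manufacturer. Assumed rule: suppliers
    are drawn with weight (out_degree + 1) ** centralization.
    """
    rng = random.Random(seed)
    out_degree = {0: 0}
    edges = []
    for new in range(1, n_firms):
        candidates = list(out_degree)
        k = min(n_suppliers, len(candidates))
        weights = [(out_degree[c] + 1) ** centralization for c in candidates]
        chosen = set()
        # Redraw until k distinct suppliers are selected.
        while len(chosen) < k:
            chosen.add(rng.choices(candidates, weights=weights)[0])
        for supplier in chosen:
            edges.append((supplier, new))
            out_degree[supplier] += 1
        out_degree[new] = 0
    return edges, out_degree


edges, out_degree = grow_distribution_network(n_firms=50, n_suppliers=2)
```

Raising `centralization` concentrates supply relations on a few hubs and shortens paths from the manufacturer, while raising `n_suppliers` gives each buyer more alternative paths; this mirrors the efficiency and resilience trade-off the abstract describes, though only qualitatively.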
With the increasing pervasiveness of Information and Communication Technology (ICT) in the fabric of economic activities, the corporate digital divide has become a crucial issue for the assessment of Information Technology (IT) competencies and the digital gap between firms and territories. With little granular data available to measure the phenomenon, most studies have used survey data. To address this empirical gap, we scanned the homepages of 182,705 Italian companies and extracted ten characteristics related to their digital footprint to develop a new index for the corporate digital assessment. Our results show a significant digital divide between Italian companies according to size, sector and geographical location, opening new perspectives for monitoring and data-driven analysis.
Title: Measuring corporate digital divide through websites: insights from Italian firms
Leonardo Mazzoni, Fabio Pinelli, Massimo Riccaboni
Pub Date: 2024-07-30 | DOI: 10.1140/epjds/s13688-024-00491-0
Journal: EPJ Data Science
Pub Date: 2024-07-25 | DOI: 10.1140/epjds/s13688-024-00490-1
Marco Mancastroppa, Iacopo Iacopini, Giovanni Petri, Alain Barrat
The richness of many complex systems stems from the interactions among their components. The higher-order nature of these interactions, involving many units at once, and their temporal dynamics constitute crucial properties that shape the behaviour of the system itself. An adequate description of these systems is offered by temporal hypergraphs, which integrate these features within the same framework. However, tools for their temporal and topological characterization are still scarce. Here we develop a series of methods specifically designed to analyse the structural properties of temporal hypergraphs at multiple scales. Leveraging the hyper-core decomposition of hypergraphs, we follow the evolution of the hyper-cores through time, characterizing the hypergraph structure and its temporal dynamics at different topological scales, and quantifying the multi-scale structural stability of the system. We also define two static hypercoreness centrality measures that provide an overall description of the nodes’ aggregated structural behaviour. We apply the characterization methods to several data sets, establishing connections between structural properties and specific activities within the systems. Finally, we show how the proposed method can be used as a model-validation tool for synthetic temporal hypergraphs, distinguishing the higher-order structures and dynamics generated by different models from the empirical ones, and thus identifying the essential model mechanisms to reproduce the empirical hypergraph structure and evolution. Our work opens several research directions, from the understanding of dynamic processes on temporal higher-order networks to the design of new models of time-varying hypergraphs.
Title: The structural evolution of temporal hypergraphs through the lens of hyper-cores
Journal: EPJ Data Science
Pub Date: 2024-07-17 | DOI: 10.1140/epjds/s13688-024-00489-8
Nandini Iyer, Ronaldo Menezes, Hugo Barbosa
With trends of urbanisation on the rise, providing adequate housing to individuals remains a complex issue to be addressed. Often, the slow output of relevant housing policies, coupled with quickly increasing housing costs, leaves individuals with the burden of finding housing that is affordable and in a safe location. In this paper, we unveil how transit service to employment hubs, not just housing policies, can prevent individuals from improving their housing conditions. We approach this question in three steps, applying the workflow to 20 cities in the United States of America. First, we propose a comprehensive framework to quantify housing insecurity and assign a housing demographic to each neighbourhood. Second, we use transit-pedestrian networks and public transit timetables (GTFS feeds) to estimate the time it takes to travel between two neighbourhoods using public transportation. Third, we apply geospatial autocorrelation to identify employment hotspots for each housing demographic. Finally, we use stochastic modelling to highlight how commuting to areas associated with better housing conditions results in transit commute times of over an hour in 15 cities. Ultimately, we consider the compounded burdens that come with housing insecurity, by having poor transit access to employment areas. In doing so, we highlight the importance of understanding how negative outcomes of housing insecurity coincide with various urban mechanisms, particularly emphasising the role that public transportation plays in locking vulnerable demographics into a cycle of poverty.
Title: The role of transport systems in housing insecurity: a mobility-based analysis
Journal: EPJ Data Science
Bike-sharing systems have emerged as a significant element of urban mobility, providing an environmentally friendly transportation alternative. With the increasing integration of electric bikes alongside mechanical bikes, it is crucial to illuminate distinct usage patterns and their impact on maintenance. Accordingly, this research aims to develop a comprehensive understanding of mobility dynamics, distinguishing between different mobility modes, and introducing a novel predictive maintenance system tailored for bikes. By utilising a combination of trip information and maintenance data from Barcelona’s bike-sharing system, Bicing, this study conducts an extensive analysis of mobility patterns and their relationship to failures of bike components. To accurately predict maintenance needs for essential bike parts, this research delves into various mobility metrics and applies statistical and machine learning survival models, including deep learning models. Due to their complexity, and with the objective of bolstering confidence in the system’s predictions, interpretability techniques explain the main predictors of maintenance needs. The analysis reveals marked differences in the usage patterns of mechanical bikes and electric bikes, with a growing user preference for the latter despite their extra costs. These differences in mobility were found to have a considerable impact on the maintenance needs within the bike-sharing system. Moreover, the predictive maintenance models proved effective in forecasting these maintenance needs, capable of operating across an entire bike fleet. Despite challenges such as approximated bike usage metrics and data imbalances, the study successfully showcases the feasibility of an accurate predictive maintenance system capable of improving operational costs, bike availability, and security.
Title: Cycling into the workshop: e-bike and m-bike mobility patterns for predictive maintenance in Barcelona’s bike-sharing system
Jordi Grau-Escolano, Aleix Bassolas, Julian Vicens
Pub Date: 2024-07-11 | DOI: 10.1140/epjds/s13688-024-00486-x
Journal: EPJ Data Science
Pub Date: 2024-07-10 | DOI: 10.1140/epjds/s13688-024-00488-9
Alexander M. Petersen
We exploit a city-level panel comprised of individual house price estimates to estimate the impact of COVID-19 on both small and big real-estate markets in California, USA. Descriptive analysis of spot house price estimates, including contemporaneous price uncertainty and 30-day price change for individual properties listed on the online real-estate platform Zillow.com, facilitates quantifying both the excess valuation and valuation confidence attributable to this global socio-economic shock. Our quasi-experimental pre-/post-COVID-19 design spans several years around 2020 and leverages contemporaneous price estimates of rental properties – i.e., off-market real estate entering the habitation market, just not for purchase and hence free of speculation – as an appropriate counterfactual to properties listed for sale, which are subject to on-market speculation. Combining unit-level matching and multivariate difference-in-difference regression approaches, we obtain consistent estimates regarding the sign and magnitude of excess price growth observed after the pandemic onset. Specifically, our results indicate that properties listed for sale appreciated an additional 1% per month above what would be expected in the absence of the pandemic. This corresponds to an excess annual price growth of roughly 12.7 percentage points, which accounts for more than half of the actual annual price growth in 2021 observed across the studied regions. Simultaneously, uncertainty in price estimates decreased, signaling the irrational confidence characteristic of prior asset bubbles. We explore how these two trends are related to market size, local market supply and borrowing costs, which altogether lend support for the counterintuitive roles of uncertainty and interruptions in decision-making.
Title: Shift in house price estimates during COVID-19 reveals effect of crisis on collective speculation
Journal: EPJ Data Science
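The step in the house-price abstract above from "an additional 1% per month" to "roughly 12.7 percentage points" per year is consistent with monthly compounding. The short check below assumes exactly that (twelve compounding periods at a constant excess rate); the function name is hypothetical and the abstract does not state how the annual figure was derived.

```python
def annualized_excess(monthly_rate, months=12):
    """Compound a constant monthly excess growth rate over `months` periods."""
    return (1 + monthly_rate) ** months - 1


# 1% excess appreciation per month, compounded over one year.
excess_pct = annualized_excess(0.01) * 100
print(f"{excess_pct:.1f} percentage points")  # prints "12.7 percentage points"
```

Simple multiplication (12 x 1%) would give 12.0 instead, so the reported figure suggests compounding rather than a linear sum.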
Pub Date: 2024-07-05 | DOI: 10.1140/epjds/s13688-024-00487-w
Chengling Tang, Lei Dong, Hao Guo, Xuechen Wang, Xiao-Jian Chen, Quanhua Dong, Yu Liu
A variety of complex socioeconomic phenomena, such as migration, commuting, and trade, can be abstracted by spatial interaction networks, where nodes represent geographic locations and weighted edges convey the interaction and its strength. However, obtaining fine-grained spatial interaction data is very challenging in practice due to limitations in collection methods and costs, so spatial interaction data such as transportation data and trade data are often only available at a coarse scale. Here, we propose a gravity downscaling (GD) method based on readily accessible socioeconomic data and the gravity law to infer fine-grained interactions from coarse-grained data. GD assumes that interactions at different spatial scales are governed by a similar gravity law, and thus the parameters estimated from coarse-grained regions can be transferred to fine-grained regions. Results show that GD has an average improvement of 24.6% in Mean Absolute Percentage Error over alternative downscaling methods (i.e., the areal-weighted method and machine learning models) across datasets with different spatial scales and in various regions. Using simple assumptions, GD enables accurate downscaling of spatial interactions, making it applicable to a wide range of fields, including human mobility, transportation, and trade.
{"title":"Downscaling spatial interaction with socioeconomic attributes","authors":"Chengling Tang, Lei Dong, Hao Guo, Xuechen Wang, Xiao-Jian Chen, Quanhua Dong, Yu Liu","doi":"10.1140/epjds/s13688-024-00487-w","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00487-w","journal":{"name":"EPJ Data Science"},"publicationDate":"2024-07-05","publicationTypes":"Journal Article"}
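The parameter-transfer idea in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a gravity law of the form T_ij = K * m_i^alpha * m_j^beta / d_ij^gamma, fitted by ordinary least squares in log space, and it assumes that a coarse flow is split across fine-grained pairs in proportion to their gravity weights. The function names (`fit_gravity`, `gravity_weights`, `downscale`) and all numbers are hypothetical.

```python
import numpy as np

def gravity_weights(coef, mass_o, mass_d, dist):
    """Gravity-law flow K * m_o^alpha * m_d^beta / d^gamma for coef = [log K, alpha, beta, gamma]."""
    log_k, alpha, beta, gamma = coef
    return np.exp(log_k) * mass_o**alpha * mass_d**beta / dist**gamma

def fit_gravity(flow, mass_o, mass_d, dist):
    """Fit log T = log K + alpha*log m_o + beta*log m_d - gamma*log d by least squares."""
    flow, mass_o, mass_d, dist = (np.asarray(a, dtype=float) for a in (flow, mass_o, mass_d, dist))
    design = np.column_stack(
        [np.ones_like(flow), np.log(mass_o), np.log(mass_d), -np.log(dist)]
    )
    coef, *_ = np.linalg.lstsq(design, np.log(flow), rcond=None)
    return coef  # [log K, alpha, beta, gamma]

def downscale(coarse_flow, coef, fine_mass_o, fine_mass_d, fine_dist):
    """Split one coarse flow across fine origin-destination pairs (assumed proportional split)."""
    weights = gravity_weights(coef, fine_mass_o[:, None], fine_mass_d[None, :], fine_dist)
    return coarse_flow * weights / weights.sum()

# Synthetic coarse-grained observations generated from known (made-up) parameters.
rng = np.random.default_rng(0)
coarse_mass_o = rng.uniform(1e4, 1e6, 40)
coarse_mass_d = rng.uniform(1e4, 1e6, 40)
coarse_dist = rng.uniform(10.0, 500.0, 40)
true_coef = np.array([-5.0, 0.8, 0.7, 1.5])
coarse_flows = gravity_weights(true_coef, coarse_mass_o, coarse_mass_d, coarse_dist)

# Step 1: estimate the parameters at the coarse scale.
coef = fit_gravity(coarse_flows, coarse_mass_o, coarse_mass_d, coarse_dist)

# Step 2: reuse them to distribute one coarse flow over 2 x 3 fine-grained units.
fine_mass_o = np.array([2e4, 5e4])
fine_mass_d = np.array([1e4, 3e4, 6e4])
fine_dist = np.array([[12.0, 20.0, 35.0], [15.0, 18.0, 30.0]])
fine_flows = downscale(1000.0, coef, fine_mass_o, fine_mass_d, fine_dist)
print(fine_flows)
```

Under the proportional split assumed here, the fine-grained flows always sum to the coarse flow and the constant K cancels, so only the exponents affect how the flow is distributed.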
Pub Date : 2024-06-28 DOI: 10.1140/epjds/s13688-024-00483-0
Minje Choi, Daniel M. Romero, David Jurgens
Our social identities determine how we interact and engage with the world surrounding us. In online settings, individuals can make these identities explicit by including them in their public biography, possibly signaling a change in what is important to them and how they should be viewed. While there is evidence suggesting the impact of intentional identity disclosure in online social platforms, its actual effect on engagement activities at the user level has yet to be explored. Here, we perform the first large-scale study of behavioral changes following identity disclosure on Twitter profiles. Combining social networks with methods from natural language processing and quasi-experimental analyses, we discover that after disclosing an identity on their profiles, users (1) tweet and retweet more in a way that aligns with their respective identities, and (2) connect more with users who disclose similar identities. We also examine whether disclosing the identity increases the chance of being targeted for offensive comments and find that in fact (3) the combined effect of disclosing identity via both tweets and profiles is associated with a reduced number of offensive replies from others. Our findings highlight that the decision to disclose one’s identity in online spaces can lead to substantial changes in how people express themselves or forge connections, with a lesser degree of negative consequences than anticipated.
{"title":"Profile update: the effects of identity disclosure on network connections and language","authors":"Minje Choi, Daniel M. Romero, David Jurgens","doi":"10.1140/epjds/s13688-024-00483-0","DOIUrl":"https://doi.org/10.1140/epjds/s13688-024-00483-0","journal":{"name":"EPJ Data Science"},"publicationDate":"2024-06-28","publicationTypes":"Journal Article"}