Pub Date : 2023-11-23 DOI: 10.1140/epjds/s13688-023-00428-z
Yessica Herrera-Guzmán, Eun Lee, Heetae Kim
Ballet, a mainstream performing art predominantly associated with women, exhibits significant gender imbalances in leading positions. However, the collaboration’s structural composition vis-à-vis gender representation in the field remains unexplored. Our study investigates the gendered labor force composition and collaboration patterns in ballet creations. Our findings reveal gender disparities in ballet creations aligned with gendered collaboration patterns and women’s occupation of more peripheral network positions than men. Productivity disparities show women accessing 20–25% of ballet creations compared to men. Mathematically derived perception errors show the underestimation of women artists’ representation within ballet collaboration networks, potentially impacting women’s careers in the field. Our study highlights the structural imbalances that women face in ballet creations and emphasizes the need for a more inclusive and equal professional environment in the ballet industry. These insights contribute to a broader understanding of structural gender imbalances in artistic domains and can inform cultural organizations about potential affirmative actions toward a better representation of women leaders in ballet.
{"title":"Structural gender imbalances in ballet collaboration networks","authors":"Yessica Herrera-Guzmán, Eun Lee, Heetae Kim","doi":"10.1140/epjds/s13688-023-00428-z","DOIUrl":"https://doi.org/10.1140/epjds/s13688-023-00428-z","journal":{"name":"EPJ Data Science"},"publicationDate":"2023-11-23","publicationTypes":"Journal Article"}
Temperature-economic growth relationships are computed to quantify the impact of climate change on the economy. However, model performance and differences in predictions across studies complicate the use of climate econometric estimation. Machine learning methods provide an alternative that might improve predictive performance. However, time series and extrapolation issues constrain methods such as random forests. We apply a simple thought experiment with national marginal GDP growth by aggregating subnational climate impact to alleviate the shortcomings in random forests. This paper uses random forests, multivariate cubic regression, and linear spline regression to examine the direct impacts of temperature on economic development and conducts a performance comparison of the methods. The model results indicate an optimal temperature of 15°C, 15°C, or 21°C, depending on the model. Furthermore, the thought experiment indicates that the marginal prediction of national GDP changes by approximately 1%, −3%, or −6%, depending on the model, under 1°C warming. The performance comparison suggests that random forests have stable model performance and better prediction performance in bootstrapping. However, the extrapolation problem in random forests causes underestimation of climate impact in 5% of cells under 6°C warming. Overall, our results suggest that temperature should be considered in economic projections under climate change scenarios. We also suggest the use of more machine learning methods in climate impact assessment.
{"title":"Temperature impact on the economic growth effect: method development and model performance evaluation with subnational data in China","authors":"Yu Song, Zhihua Pan, Fei Lun, Buju Long, Siyu Liu, Guolin Han, Jialin Wang, Na Huang, Ziyuan Zhang, Shangqian Ma, Guofeng Sun, Cong Liu","doi":"10.1140/epjds/s13688-023-00425-2","DOIUrl":"https://doi.org/10.1140/epjds/s13688-023-00425-2","journal":{"name":"EPJ Data Science"},"publicationDate":"2023-10-27","publicationTypes":"Journal Article"}
Real estate markets depend on various methods to predict housing prices, including models that have been trained on datasets of residential or commercial properties. Most studies endeavor to create more accurate machine learning models by utilizing data such as basic property characteristics as well as urban features like distances from amenities and road accessibility. Even though environmental factors like noise pollution can potentially affect prices, the research around this topic is limited. One of the reasons is the lack of data. In this paper, we reconstruct and make publicly available a general-purpose noise pollution dataset based on published studies conducted by the Hellenic Ministry of Environment and Energy for the city of Thessaloniki, Greece. Then, we train ensemble machine learning models, like XGBoost, on property data for different areas of Thessaloniki to investigate the way noise influences prices through interpretability evaluation techniques. Our study provides a new noise pollution dataset that not only demonstrates the impact noise has on housing prices, but also indicates that the influence of noise on prices significantly varies among different areas of the same city.
{"title":"Does noise affect housing prices? A case study in the urban area of Thessaloniki","authors":"Georgios Kamtziridis, Dimitris Vrakas, Grigorios Tsoumakas","doi":"10.1140/epjds/s13688-023-00424-3","DOIUrl":"https://doi.org/10.1140/epjds/s13688-023-00424-3","journal":{"name":"EPJ Data Science"},"publicationDate":"2023-10-17","publicationTypes":"Journal Article"}
Pub Date : 2023-10-09 DOI: 10.1140/epjds/s13688-023-00411-8
Ana Maria Jaramillo, Hywel T. P. Williams, Nicola Perra, Ronaldo Menezes
Co-authorship networks, where nodes represent authors and edges represent co-authorship relations, are key to understanding the production and diffusion of knowledge in academia. Social constructs, biases (implicit and explicit), and constraints (e.g. spatial, temporal) affect who works with whom and cause co-authorship networks to organise into tight communities with different levels of segregation. We aim to examine aspects of the co-authorship network structure that lead to segregation and its impact on scientific production. We measure segregation using the Spectral Segregation Index (SSI) and find four ordered categories: completely segregated, highly segregated, moderately segregated and non-segregated communities. We direct our attention to the non-segregated and highly segregated communities, quantifying and comparing their structural topologies and k-core positions. When considering communities of both categories (controlling for size), our results show no differences in density and clustering but substantial variability in the core position. Larger non-segregated communities are more likely to occupy cores near the network nucleus, while the highly segregated ones tend to be closer to the network periphery. Finally, we analyse differences in citations gained by researchers within communities of different segregation categories. Researchers in highly segregated communities get more citations from their community members in middle cores and gain more citations per publication in middle/periphery cores. Those in non-segregated communities get more citations per publication in the nucleus. To our knowledge, this work is the first to characterise community segregation in co-authorship networks and investigate the relationship between community segregation and author citations. Our results help study highly segregated communities of scientific co-authors and can pave the way for intervention strategies to improve the growth and dissemination of scientific knowledge.
{"title":"The structure of segregation in co-authorship networks and its impact on scientific production","authors":"Ana Maria Jaramillo, Hywel T. P. Williams, Nicola Perra, Ronaldo Menezes","doi":"10.1140/epjds/s13688-023-00411-8","DOIUrl":"https://doi.org/10.1140/epjds/s13688-023-00411-8","journal":{"name":"EPJ Data Science"},"publicationDate":"2023-10-09","publicationTypes":"Journal Article"}
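The k-core positions mentioned in this abstract can be illustrated with a small sketch. The abstract does not describe the authors' implementation, so the following is only the standard peeling definition of core numbers, written in plain Python; the function name `core_numbers` and the toy graph are hypothetical. Nodes with the highest core number sit closest to what the abstract calls the nucleus, and low core numbers correspond to the periphery.

```python
def core_numbers(adjacency):
    """Map each node to its core number by iterative peeling.

    Assumes an undirected graph without self-loops, given as a dict
    {node: iterable of neighbours} in which every edge is listed in
    both directions.
    """
    remaining = {node: set(nbrs) for node, nbrs in adjacency.items()}
    core = {}
    k = 0
    while remaining:
        # Nodes whose current degree is <= k cannot be part of the (k+1)-core.
        to_remove = [n for n, nbrs in remaining.items() if len(nbrs) <= k]
        if not to_remove:
            k += 1
            continue
        for n in to_remove:
            core[n] = k
            # Detach n from its remaining neighbours.
            for m in remaining.pop(n):
                remaining[m].discard(n)
    return core


# Toy example (hypothetical data): a triangle a-b-c with a pendant node d.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(core_numbers(graph))
```

In the toy graph the triangle forms the 2-core while the pendant node only reaches the 1-core, which is the nucleus-versus-periphery distinction in miniature.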
Pub Date : 2023-10-06 DOI: 10.1140/epjds/s13688-023-00421-6
Fakhri Momeni, Philipp Mayr, Stefan Dietze
Evaluation of researchers’ output is vital for hiring committees and funding bodies, and it is usually measured via their scientific productivity, citations, or a combined metric such as the h-index. Assessing young researchers is more critical because it takes a while to accumulate citations and increase the h-index. Hence, predicting the h-index can help to discover the researchers’ scientific impact. In addition, identifying the influential factors to predict the scientific impact is helpful for researchers and their organizations seeking solutions to improve it. This study investigates the effect of author-, paper-, and venue-specific features on the future h-index. For this purpose, we used a machine learning approach to predict the h-index and feature analysis techniques to advance the understanding of feature impact. Utilizing the bibliometric data in Scopus, we defined and extracted two main groups of features. The first group, which we name ‘prior impact-based features’, relates to prior scientific impact and includes the number of publications, received citations, and h-index. The second group is ‘non-prior impact-based features’ and contains the features related to author, co-authorship, paper, and venue characteristics. We explored their importance in predicting researchers’ h-index in three career phases. Also, we examined the temporal dimension of predicting performance for different feature categories to find out which features are more reliable for long- and short-term prediction. We referred to the gender of the authors to examine the role of this author characteristic in the prediction task. Our findings showed that gender has a very slight effect in predicting the h-index. Although the results demonstrate better performance for the models containing prior impact-based features for all researchers’ groups in the near future, we found that non-prior impact-based features are more robust predictors for younger scholars in the long term. Also, prior impact-based features lose more of their predictive power than other features in the long term.
{"title":"Investigating the contribution of author- and publication-specific features to scholars’ h-index prediction","authors":"Fakhri Momeni, Philipp Mayr, Stefan Dietze","doi":"10.1140/epjds/s13688-023-00421-6","DOIUrl":"https://doi.org/10.1140/epjds/s13688-023-00421-6","journal":{"name":"EPJ Data Science"},"publicationDate":"2023-10-06","publicationTypes":"Journal Article"}
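For readers unfamiliar with the metric being predicted in this entry, the h-index of an author is the largest number h such that h of their papers have at least h citations each. The sketch below computes it from a list of per-paper citation counts; the function name `h_index` and the sample counts are illustrative only and are not taken from the paper or from Scopus.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Illustrative citation counts for five papers (made-up numbers).
print(h_index([10, 8, 5, 4, 3]))  # 4
```

Because the value depends only on accumulated citations, it can only stay flat or grow over a career, which is one reason early-career researchers tend to have low values.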
Pub Date : 2023-10-03 DOI: 10.1140/epjds/s13688-023-00418-1
Gabriele Di Bona, Alberto Bracci, Nicola Perra, Vito Latora, Andrea Baronchelli
Decentralization is a pervasive concept found across disciplines, including Economics, Political Science, and Computer Science, where it is used in distinct yet interrelated ways. Here, we develop and publicly release a general pipeline to investigate the scholarly history of the term, analysing 425,144 academic publications that refer to (de)centralization. We find that the fraction of papers on the topic has been exponentially increasing since the 1950s. In 2021, 1 author in 154 mentioned (de)centralization in the title or abstract of an article. Using both semantic information and citation patterns, we cluster papers in fields and characterize the knowledge flows between them. Our analysis reveals that the topic has independently emerged in the different fields, with small cross-disciplinary contamination. Moreover, we show how Blockchain became the most influential field about 10 years ago, while Governance dominated before the 1990s. In summary, our findings provide a quantitative assessment of the evolution of a key yet elusive concept, which has undergone cycles of rise and fall within different fields. Our pipeline offers a powerful tool to analyze the evolution of any scholarly term in the academic literature, providing insights into the interplay between collective and independent discoveries in science.
{"title":"The concept of decentralization through time and disciplines: a quantitative exploration","authors":"Gabriele Di Bona, Alberto Bracci, Nicola Perra, Vito Latora, Andrea Baronchelli","doi":"10.1140/epjds/s13688-023-00418-1","DOIUrl":"https://doi.org/10.1140/epjds/s13688-023-00418-1","journal":{"name":"EPJ Data Science"},"publicationDate":"2023-10-03","publicationTypes":"Journal Article"}
Pub Date : 2023-10-02 DOI: 10.1140/epjds/s13688-023-00419-0
Haodong Qi, Tuba Bircan
Google Trends (GT) collates the volumes of search keywords over time and by geographical location. Such data could, in theory, provide insights into people’s ex ante intentions to migrate, and hence be useful for predictive analysis of future migration. Empirically, however, the predictive power of GT is sensitive: it may vary depending on geographical context, the search keywords selected for analysis, and Google’s market share and its users’ characteristics and search behavior, among others. Unlike most previous studies attempting to demonstrate the benefit of using GT for forecasting migration flows, this article addresses a critical but less discussed issue: when GT cannot enhance the performance of migration models. Using EUROSTAT statistics on first-time asylum applications and a set of push-pull indicators gathered from various data sources, we train three classes of gravity models that are commonly used in the migration literature, and examine how the inclusion of GT may affect models’ abilities to predict refugees’ destination choices. The results suggest that the effects of including GT are highly contingent on the complexity of different models. Specifically, GT can only improve the performance of relatively simple models, but not of those augmented by flow Fixed-Effects or by Auto-Regressive effects. These findings call for a more comprehensive analysis of the strengths and limitations of using GT, as well as other digital trace data, in the context of modeling and forecasting migration. It is our hope that this nuanced perspective can spur further innovations in the field, and ultimately bring us closer to a comprehensive modeling framework of human migration.
{"title":"Can Google Trends predict asylum-seekers’ destination choices?","authors":"Haodong Qi, Tuba Bircan","doi":"10.1140/epjds/s13688-023-00419-0","DOIUrl":"https://doi.org/10.1140/epjds/s13688-023-00419-0","journal":{"name":"EPJ Data Science"},"publicationDate":"2023-10-02","publicationTypes":"Journal Article"}
Pub Date : 2023-09-26 DOI: 10.1140/epjds/s13688-023-00403-8
Audun Myers, David Muñoz, Firas A. Khasawneh, Elizabeth Munch
{"title":"Correction: Temporal network analysis using zigzag persistence","authors":"Audun Myers, David Muñoz, Firas A. Khasawneh, Elizabeth Munch","doi":"10.1140/epjds/s13688-023-00403-8","DOIUrl":"https://doi.org/10.1140/epjds/s13688-023-00403-8","journal":{"name":"EPJ Data Science"},"publicationDate":"2023-09-26","publicationTypes":"Journal Article"}
Pub Date : 2023-09-22 DOI: 10.1140/epjds/s13688-023-00406-5
Abdullah Alrhmoun, János Kertész
Bots in online social networks can be used for good or bad, but their presence is unavoidable and will increase in the future. To investigate how the interaction networks of bots and humans evolve, we created six social bots on Twitter with AI language models and let them carry out standard user operations. Three different strategies were implemented for the bots: a trend-targeting strategy (TTS), a keywords-targeting strategy (KTS) and a user-targeting strategy (UTS). We examined the interaction patterns such as targeting users, spreading messages, propagating relationships, and engagement. We focused on the emergent local structures or motifs and found that the strategies of the social bots had a significant impact on them. Motifs resulting from interactions with bots following TTS or KTS are simple and show significant overlap, while interactions with UTS-governed bots lead to more complex motifs. These findings provide insights into human-bot interaction patterns in online social networks, and can be used to develop more effective bots for beneficial tasks and to combat malicious actors.
{"title":"Emergent local structures in an ecosystem of social bots and humans on Twitter","authors":"Abdullah Alrhmoun, János Kertész","doi":"10.1140/epjds/s13688-023-00406-5","DOIUrl":"https://doi.org/10.1140/epjds/s13688-023-00406-5","journal":{"name":"EPJ Data Science"},"publicationDate":"2023-09-22","publicationTypes":"Journal Article"}
Pub Date : 2023-09-19 DOI: 10.1140/epjds/s13688-023-00400-x
Peter Sheridan Dodds, Joshua R. Minot, Michael V. Arnold, Thayer Alshaabi, Jane Lydia Adams, David Rushing Dewhurst, Tyler J. Gray, Morgan R. Frank, Andrew J. Reagan, Christopher M. Danforth
Complex systems often comprise many kinds of components which vary over many orders of magnitude in size: populations of cities in countries, individual and corporate wealth in economies, species abundance in ecologies, word frequency in natural language, and node degree in complex networks. Here, we introduce ‘allotaxonometry’ along with ‘rank-turbulence divergence’ (RTD), a tunable instrument for comparing any two ranked lists of components. We analytically develop our rank-based divergence in a series of steps, and then establish a rank-based allotaxonograph which pairs a map-like histogram for rank-rank pairs with a list of components ordered by divergence contribution. We explore the performance of rank-turbulence divergence, which we view as an instrument of ‘type calculus’, for a series of distinct settings including language use on Twitter and in books, species abundance, baby name popularity, market capitalization, performance in sports, mortality causes, and job titles. We provide a series of supplementary flipbooks which demonstrate the tunability and storytelling power of rank-based allotaxonometry.
{"title":"Allotaxonometry and rank-turbulence divergence: a universal instrument for comparing complex systems","authors":"Peter Sheridan Dodds, Joshua R. Minot, Michael V. Arnold, Thayer Alshaabi, Jane Lydia Adams, David Rushing Dewhurst, Tyler J. Gray, Morgan R. Frank, Andrew J. Reagan, Christopher M. Danforth","doi":"10.1140/epjds/s13688-023-00400-x","DOIUrl":"https://doi.org/10.1140/epjds/s13688-023-00400-x","journal":{"name":"EPJ Data Science"},"publicationDate":"2023-09-19","publicationTypes":"Journal Article"}