EPJ Data Science: Latest Articles

Linking physical violence to women’s mobility in Chile
IF 3.6; Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2023-12-04; DOI: 10.1140/epjds/s13688-023-00430-5
Hugo Contreras, Cristian Candia, Rodrigo Troncoso, Leo Ferres, Loreto Bravo, Bruno Lepri, Carlos Rodriguez-Sickert

Despite increased global attention on violence against women, understanding the factors that lead to women becoming victims remains a critical challenge. Notably, the impact of domestic violence on women’s mobility—a critical determinant of their social and economic independence—has remained largely unexplored. This study bridges this gap, employing police records to quantify physical and psychological domestic violence, while leveraging mobile phone data to proxy women’s mobility. Our analyses reveal a negative correlation between physical violence and female mobility, an association that withstands robustness checks, including controls for economic independence variables like education, employment, and occupational segregation, bootstrapping of the data set, and applying a generalized propensity score matching identification strategy. The study emphasizes the potential causal role of physical violence on decreased female mobility, asserting the value of interdisciplinary research in exploring such multifaceted social phenomena to open avenues for preventive measures. The implications of this research extend into the realm of public policy and intervention development, offering new strategies to combat and ultimately eradicate domestic violence against women, thereby contributing to wider efforts toward gender equity.
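
The abstract above lists bootstrapping of the data set among its robustness checks. As a minimal sketch of what such a check can look like, the Python snippet below resamples a synthetic data set and reports a percentile interval for a correlation coefficient. The variable names, the synthetic data, and the choice of a Pearson correlation are all assumptions made for illustration; this is not the authors' code, data, or exact procedure.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def bootstrap_ci(xs, ys, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the correlation of (xs, ys)."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    for _ in range(n_resamples):
        # Resample observation pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson([xs[i] for i in idx], [ys[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Synthetic, hypothetical data: higher "violence" goes with lower "mobility".
data_rng = random.Random(42)
violence = [data_rng.gauss(0, 1) for _ in range(200)]
mobility = [-v + 0.5 * data_rng.gauss(0, 1) for v in violence]

r = pearson(violence, mobility)
low, high = bootstrap_ci(violence, mobility)
print(f"correlation = {r:.3f}, 95% bootstrap interval = [{low:.3f}, {high:.3f}]")
```

If the whole interval stays below zero, the negative association is not driven by a handful of observations in this toy setting; a real analysis would also need the controls the abstract describes.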

Citations: 0
Computational social science is growing up: why puberty consists of embracing measurement validation, theory development, and open science practices
IF 3.6; Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2023-12-01; DOI: 10.1140/epjds/s13688-023-00434-1
Timon Elmer
Citations: 0
Individual mobility deep insight using mobile phones data
IF 3.6; Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2023-12-01; DOI: 10.1140/epjds/s13688-023-00431-4
C. Mizzi, Alex Baroncini, Alessandro Fabbri, Davide Micheli, Aldo Vannelli, Carmen Criminisi, Susanna Jean, Armando Bazzani
Citations: 0
Structural gender imbalances in ballet collaboration networks
IF 3.6; Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2023-11-23; DOI: 10.1140/epjds/s13688-023-00428-z
Yessica Herrera-Guzmán, Eun Lee, Heetae Kim

Ballet, a mainstream performing art predominantly associated with women, exhibits significant gender imbalances in leading positions. However, the collaboration’s structural composition vis-à-vis gender representation in the field remains unexplored. Our study investigates the gendered labor force composition and collaboration patterns in ballet creations. Our findings reveal gender disparities in ballet creations aligned with gendered collaboration patterns and women’s occupation of more peripheral network positions than men. Productivity disparities show women accessing 20–25% of ballet creations compared to men. Mathematically derived perception errors show the underestimation of women artists’ representation within ballet collaboration networks, potentially impacting women’s careers in the field. Our study highlights the structural imbalances that women face in ballet creations and emphasizes the need for a more inclusive and equal professional environment in the ballet industry. These insights contribute to a broader understanding of structural gender imbalances in artistic domains and can inform cultural organizations about potential affirmative actions toward a better representation of women leaders in ballet.

Citations: 2
Temperature impact on the economic growth effect: method development and model performance evaluation with subnational data in China
Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2023-10-27; DOI: 10.1140/epjds/s13688-023-00425-2
Yu Song, Zhihua Pan, Fei Lun, Buju Long, Siyu Liu, Guolin Han, Jialin Wang, Na Huang, Ziyuan Zhang, Shangqian Ma, Guofeng Sun, Cong Liu
Temperature-economic growth relationships are computed to quantify the impact of climate change on the economy. However, model performance and differences in predictions across studies complicate the use of climate econometric estimation. Machine learning methods provide an alternative that might improve predictive performance. However, time series and extrapolation issues constrain methods such as random forests. We apply a simple thought experiment with national marginal GDP growth by aggregating subnational climate impact to alleviate the shortcomings in random forests. This paper uses random forests, multivariate cubic regression, and linear spline regression to examine the direct impacts of temperature on economic development and conducts a performance comparison of the methods. The model results indicate an optimal temperature of 15°C, 15°C or 21°C for each model. Furthermore, a thought experiment indicates that the marginal predictions of national GDP change by approximately 1%, −3%, or −6% for models with 1°C warming. The performance comparison suggests that random forests have stable model performance and better prediction performance in bootstrapping. However, the extrapolation problem in random forests causes underestimation of climate impact in 5% of cells under 6°C warming. Overall, our results suggest that temperature should be considered in economic projections under climate change scenarios. We also suggest the use of more machine learning methods in climate impact assessment.
Citations: 0
Does noise affect housing prices? A case study in the urban area of Thessaloniki
Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2023-10-17; DOI: 10.1140/epjds/s13688-023-00424-3
Georgios Kamtziridis, Dimitris Vrakas, Grigorios Tsoumakas
Real estate markets depend on various methods to predict housing prices, including models that have been trained on datasets of residential or commercial properties. Most studies endeavor to create more accurate machine learning models by utilizing data such as basic property characteristics as well as urban features like distances from amenities and road accessibility. Even though environmental factors like noise pollution can potentially affect prices, the research around this topic is limited. One of the reasons is the lack of data. In this paper, we reconstruct and make publicly available a general purpose noise pollution dataset based on published studies conducted by the Hellenic Ministry of Environment and Energy for the city of Thessaloniki, Greece. Then, we train ensemble machine learning models, like XGBoost, on property data for different areas of Thessaloniki to investigate the way noise influences prices through interpretability evaluation techniques. Our study provides a new noise pollution dataset that not only demonstrates the impact noise has on housing prices, but also indicates that the influence of noise on prices significantly varies among different areas of the same city.
Citations: 0
The structure of segregation in co-authorship networks and its impact on scientific production
Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2023-10-09; DOI: 10.1140/epjds/s13688-023-00411-8
Ana Maria Jaramillo, Hywel T. P. Williams, Nicola Perra, Ronaldo Menezes
Co-authorship networks, where nodes represent authors and edges represent co-authorship relations, are key to understanding the production and diffusion of knowledge in academia. Social constructs, biases (implicit and explicit), and constraints (e.g. spatial, temporal) affect who works with whom and cause co-authorship networks to organise into tight communities with different levels of segregation. We aim to examine aspects of the co-authorship network structure that lead to segregation and its impact on scientific production. We measure segregation using the Spectral Segregation Index (SSI) and find four ordered categories: completely segregated, highly segregated, moderately segregated and non-segregated communities. We direct our attention to the non-segregated and highly segregated communities, quantifying and comparing their structural topologies and k-core positions. When considering communities of both categories (controlling for size), our results show no differences in density and clustering but substantial variability in the core position. Larger non-segregated communities are more likely to occupy cores near the network nucleus, while the highly segregated ones tend to be closer to the network periphery. Finally, we analyse differences in citations gained by researchers within communities of different segregation categories. Researchers in highly segregated communities get more citations from their community members in middle cores and gain more citations per publication in middle/periphery cores. Those in non-segregated communities get more citations per publication in the nucleus. To our knowledge, this work is the first to characterise community segregation in co-authorship networks and investigate the relationship between community segregation and author citations. Our results help study highly segregated communities of scientific co-authors and can pave the way for intervention strategies to improve the growth and dissemination of scientific knowledge.
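
The abstract above compares communities by their k-core positions. For readers unfamiliar with the term, here is a minimal sketch of the standard peeling procedure that assigns each node a core number (the largest k such that the node belongs to a subgraph in which every node has at least k neighbours). The toy edge list and all function names are hypothetical illustrations, not the authors' data or code.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def core_numbers(adj):
    """Return {node: core number} by repeatedly peeling a minimum-degree node."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Pick a node with the smallest remaining degree.
        v = min(remaining, key=degree.__getitem__)
        # The core level never decreases while peeling.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


# Toy co-authorship graph: a 4-clique (A-D), E linked to A and B, F linked to E.
toy_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("B", "D"), ("C", "D"),
    ("A", "E"), ("B", "E"), ("E", "F"),
]
cores = core_numbers(build_adjacency(toy_edges))
print(cores)
```

In this toy graph the clique members sit in the innermost core (the "nucleus" in the abstract's terms), while F sits at the periphery. The selection step is quadratic in the number of nodes, which is fine for a sketch; bucket-based variants run in linear time.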
Citations: 0
Investigating the contribution of author- and publication-specific features to scholars’ h-index prediction
Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2023-10-06; DOI: 10.1140/epjds/s13688-023-00421-6
Fakhri Momeni, Philipp Mayr, Stefan Dietze
Evaluation of researchers’ output is vital for hiring committees and funding bodies, and it is usually measured via their scientific productivity, citations, or a combined metric such as the h-index. Assessing young researchers is more critical because it takes a while to accumulate citations and increase the h-index. Hence, predicting the h-index can help to discover the researchers’ scientific impact. In addition, identifying the influential factors to predict the scientific impact is helpful for researchers and their organizations seeking solutions to improve it. This study investigates the effect of author-, paper-, and venue-specific features on the future h-index. For this purpose, we used a machine learning approach to predict the h-index and feature analysis techniques to advance the understanding of feature impact. Utilizing the bibliometric data in Scopus, we defined and extracted two main groups of features. The first relates to prior scientific impact; we name it ‘prior impact-based features’, and it includes the number of publications, received citations, and h-index. The second group is ‘non-prior impact-based features’ and contains the features related to author, co-authorship, paper, and venue characteristics. We explored their importance in predicting researchers’ h-index in three career phases. Also, we examined the temporal dimension of predicting performance for different feature categories to find out which features are more reliable for long- and short-term prediction. We referred to the gender of the authors to examine the role of this author characteristic in the prediction task. Our findings showed that gender has a very slight effect in predicting the h-index. Although the results demonstrate better performance for the models containing prior impact-based features for all researchers’ groups in the near future, we found that non-prior impact-based features are more robust predictors for younger scholars in the long term. Also, prior impact-based features lose their power to predict more than other features in the long term.
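
Since the abstract above centres on the h-index, a short sketch of the standard definition may help: a researcher has index h if h of their papers have at least h citations each. The function name and the sample citation counts below are hypothetical and only illustrate the metric itself, not the prediction models used in the study.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return h


# Hypothetical citation counts for one researcher's papers.
example_counts = [10, 8, 5, 4, 3]
print(h_index(example_counts))
```

The example also shows why the metric moves slowly for early-career researchers: one highly cited paper raises the index by at most one.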
Citations: 1
The concept of decentralization through time and disciplines: a quantitative exploration
Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2023-10-03; DOI: 10.1140/epjds/s13688-023-00418-1
Gabriele Di Bona, Alberto Bracci, Nicola Perra, Vito Latora, Andrea Baronchelli
Decentralization is a pervasive concept found across disciplines, including Economics, Political Science, and Computer Science, where it is used in distinct yet interrelated ways. Here, we develop and publicly release a general pipeline to investigate the scholarly history of the term, analysing 425,144 academic publications that refer to (de)centralization. We find that the fraction of papers on the topic has been exponentially increasing since the 1950s. In 2021, 1 author in 154 mentioned (de)centralization in the title or abstract of an article. Using both semantic information and citation patterns, we cluster papers in fields and characterize the knowledge flows between them. Our analysis reveals that the topic has independently emerged in the different fields, with small cross-disciplinary contamination. Moreover, we show how Blockchain has become the most influential field about 10 years ago, while Governance dominated before the 1990s. In summary, our findings provide a quantitative assessment of the evolution of a key yet elusive concept, which has undergone cycles of rise and fall within different fields. Our pipeline offers a powerful tool to analyze the evolution of any scholarly term in the academic literature, providing insights into the interplay between collective and independent discoveries in science.
Citations: 1
Can Google Trends predict asylum-seekers’ destination choices?
Zone 2 Computer Science Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2023-10-02 DOI: 10.1140/epjds/s13688-023-00419-0
Haodong Qi, Tuba Bircan
Abstract Google Trends (GT) collates the volumes of search keywords over time and by geographical location. Such data could, in theory, provide insights into people’s ex ante intentions to migrate, and hence be useful for predictive analysis of future migration. Empirically, however, the predictive power of GT is sensitive: it may vary depending on geographical context, the search keywords selected for analysis, and Google’s market share and its users’ characteristics and search behavior, among other factors. Unlike most previous studies, which attempt to demonstrate the benefit of using GT for forecasting migration flows, this article addresses a critical but less discussed issue: when GT cannot enhance the performance of migration models. Using EUROSTAT statistics on first-time asylum applications and a set of push-pull indicators gathered from various data sources, we train three classes of gravity models that are commonly used in the migration literature, and examine how the inclusion of GT may affect the models’ ability to predict refugees’ destination choices. The results suggest that the effects of including GT are highly contingent on the complexity of the model. Specifically, GT can only improve the performance of relatively simple models, but not of those augmented by flow Fixed-Effects or by Auto-Regressive effects. These findings call for a more comprehensive analysis of the strengths and limitations of using GT, as well as other digital trace data, in the context of modeling and forecasting migration. It is our hope that this nuanced perspective can spur further innovations in the field, and ultimately bring us closer to a comprehensive modeling framework of human migration.
Citations: 0