EPJ Data Science: latest articles

Socioeconomic disparities in mobility behavior during the COVID-19 pandemic in developing countries.
IF 3; Zone 2, Computer Science; Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-01-01; Epub Date: 2025-03-24; DOI: 10.1140/epjds/s13688-025-00532-2
Lorenzo Lucchini, Ollin D Langle-Chimal, Lorenzo Candeago, Lucio Melito, Alex Chunet, Aleister Montfort, Bruno Lepri, Nancy Lozano-Gracia, Samuel P Fraiberger

Mobile phone data have played a key role in quantifying human mobility during the COVID-19 pandemic. Existing studies on mobility patterns have primarily focused on regional aggregates in high-income countries, obscuring the accentuated impact of the pandemic on the most vulnerable populations. Leveraging geolocation data from mobile-phone users and population census data for 6 middle-income countries across 3 continents between March and December 2020, we uncovered common disparities in the behavioral response to the pandemic across socioeconomic groups. Users living in low-wealth neighborhoods were less likely to respond by self-isolating, relocating to rural areas, or refraining from commuting to work. The gap in behavioral responses between socioeconomic groups persisted throughout the observation period. Among users living in low-wealth neighborhoods, those who commuted to work in high-wealth neighborhoods before the pandemic were particularly at risk of economic stress: they faced the reduction in economic activity in the high-wealth neighborhood and, because of their longer commute distances, were more likely to be affected by public transport closures. While confinement policies were predominantly country-wide, these results suggest that, when data to identify vulnerable individuals are not readily available, GPS-based analytics could help design targeted place-based policies to aid the most vulnerable.

Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00532-2.

Mapping global value chains at the product level.
IF 3; Zone 2, Computer Science; Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-01-01; Epub Date: 2025-03-12; DOI: 10.1140/epjds/s13688-025-00521-5
Lea Karbevska, César A Hidalgo

Value chain data is crucial for navigating economic disruptions. Yet, despite its importance, we lack publicly available product-level value chain datasets, since resources such as the "World Input-Output Database", "Inter-Country Input-Output Tables", "EXIOBASE", and "EORA" lack information about products (e.g. Radio Receivers, Telephones, Electrical Capacitors, LCDs) and instead rely on aggregate industrial sectors (e.g. Electrical Equipment, Telecommunications). Here, we introduce a method that leverages ideas from machine learning and trade theory to infer product-level value chain relationships from fine-grained international trade data. We apply our method to data summarizing the exports and imports of 1200+ products and 250+ world regions (e.g. states in the U.S., prefectures in Japan) to infer value chain information implicit in their trade patterns. In short, we leverage the idea that, due to global value chains, regions specialized in the export of a product will tend to specialize in the import of its inputs. We use this idea to develop a novel proportional allocation model to estimate product-level trade flows between regions and countries. The result is a method to approximate value chain data at the product level that should be of interest to people working in logistics, trade, and sustainable development.

Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00521-5.
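
The abstract does not spell out the allocation model, so the following is only a rough sketch of the general idea behind proportional allocation. It assumes that, for a single product, we know each exporting region's total exports and each importing region's total imports, and it splits every exporter's total across importers in proportion to their import shares. The function name and the toy figures are hypothetical and are not taken from the paper.

```python
def proportional_allocation(exports, imports):
    """Estimate bilateral flows of one product from regional totals.

    exports, imports: dicts mapping region -> total traded value.
    Returns {(exporter, importer): estimated flow}.
    Assumption: each exporter's total is split by importers' shares of total imports.
    """
    total_imports = sum(imports.values())
    if total_imports == 0:
        return {}
    flows = {}
    for src, exported in exports.items():
        for dst, imported in imports.items():
            flows[(src, dst)] = exported * imported / total_imports
    return flows


# Toy totals (hypothetical): two exporting and two importing regions.
toy_exports = {"A": 60.0, "B": 40.0}
toy_imports = {"X": 75.0, "Y": 25.0}
flows = proportional_allocation(toy_exports, toy_imports)
```

By construction, the estimated flows leaving each exporter add up to that exporter's total, which is the property that makes this kind of allocation a common baseline when bilateral data are missing.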

{"title":"Mapping global value chains at the product level.","authors":"Lea Karbevska, César A Hidalgo","doi":"10.1140/epjds/s13688-025-00521-5","DOIUrl":"https://doi.org/10.1140/epjds/s13688-025-00521-5","url":null,"abstract":"<p><p>Value chain data is crucial for navigating economic disruptions. Yet, despite its importance, we lack publicly available product-level value chain datasets, since resources such as the \"World Input-Output Database\", \"Inter-Country Input-Output Tables\", \"EXIOBASE\", and \"EORA\", lack information about products (e.g. Radio Receivers, Telephones, Electrical Capacitors, LCDs, etc.) and instead rely on aggregate industrial sectors (e.g. Electrical Equipment, Telecommunications). Here, we introduce a method that leverages ideas from machine learning and trade theory to infer product-level value chain relationships from fine-grained international trade data. We apply our method to data summarizing the exports and imports of 1200+ products and 250+ world regions (e.g. states in the U.S., prefectures in Japan, etc.) to infer value chain information implicit in their trade patterns. In short, we leverage the idea that due to global value chains, regions specialized in the export of a product will tend to specialize in the import of its inputs. We use this idea to develop a novel proportional allocation model to estimate product-level trade flows between regions and countries. 
This contributes a method to approximate value chain data at the product level that should be of interest to people working in logistics, trade, and sustainable development.</p><p><strong>Supplementary information: </strong>The online version contains supplementary material available at 10.1140/epjds/s13688-025-00521-5.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"14 1","pages":"21"},"PeriodicalIF":3.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11903633/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143647657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Understanding stock market instability via graph auto-encoders.
IF 3; Zone 2, Computer Science; Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-01-01; Epub Date: 2025-02-19; DOI: 10.1140/epjds/s13688-025-00523-3
Dragos Gorduza, Stefan Zohren, Xiaowen Dong

Understanding stock market instability is a key question in financial management, as practitioners seek to forecast breakdowns in long-run asset co-movement patterns that expose portfolios to rapid and devastating collapses in value. These disruptions are linked to changes in the structure of market-wide stock correlations, which increase the risk of high-volatility shocks. The structure of these co-movements can be described as a network where companies are represented by nodes while edges capture correlations between their price movements. Co-movement breakdowns then manifest as abrupt changes in the topological structure of this network. Measuring the scale of this change and learning a timely indicator of breakdowns is central to both understanding financial stability and forecasting volatility. We propose to use the edge reconstruction accuracy of a graph auto-encoder as an indicator of how homogeneous the connections between assets are, which we use, based on the literature on financial network analysis, as a proxy to infer market volatility. We show, through our experiments on the Standard and Poor's index over the 2015-2022 period, that the reconstruction errors from our model correlate with volatility spikes and can be used to improve out-of-sample autoregressive modeling of volatility. Our results demonstrate that market instability can be predicted by changes in the homogeneity of connections in the financial network, which expands the understanding of instability in the stock market. We discuss the implications of this graph machine learning-based volatility estimation for policy targeted at ensuring financial market stability.
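
The network described above (companies as nodes, correlations as edges) can be sketched in a few lines. This is not the graph auto-encoder itself; it only illustrates one plausible way of building the input network, assuming Pearson correlation of return series and a hypothetical cutoff on its absolute value. The tickers and returns are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_network(returns, threshold=0.5):
    """Keep an edge between two companies when |correlation| >= threshold.

    returns: dict mapping ticker -> list of returns (assumed aligned in time).
    The threshold is an illustrative assumption, not a value from the paper.
    """
    edges = {}
    for a, b in combinations(sorted(returns), 2):
        rho = pearson(returns[a], returns[b])
        if abs(rho) >= threshold:
            edges[(a, b)] = rho
    return edges


# Hypothetical return series; BBB is exactly twice AAA, so they co-move perfectly.
toy_returns = {
    "AAA": [0.01, -0.02, 0.03, -0.01],
    "BBB": [0.02, -0.04, 0.06, -0.02],
    "CCC": [-0.01, 0.01, 0.00, 0.02],
}
edges = correlation_network(toy_returns, threshold=0.9)
```

Recomputing such a network over successive time windows yields the sequence of graphs whose structural changes the abstract refers to.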

Weakly supervised veracity classification with LLM-predicted credibility signals.
IF 3; Zone 2, Computer Science; Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-01-01; Epub Date: 2025-02-21; DOI: 10.1140/epjds/s13688-025-00534-0
João A Leite, Olesya Razuvayevskaya, Kalina Bontcheva, Carolina Scarton

Credibility signals represent a wide range of heuristics typically used by journalists and fact-checkers to assess the veracity of online content. Automating the extraction of credibility signals presents significant challenges due to the necessity of training high-accuracy, signal-specific extractors, coupled with the lack of sufficiently large annotated datasets. This paper introduces Pastel (Prompted weAk Supervision wiTh crEdibility signaLs), a weakly supervised approach that leverages large language models (LLMs) to extract credibility signals from web content, and subsequently combines them to predict the veracity of content without relying on human supervision. We validate our approach using four article-level misinformation detection datasets, demonstrating that Pastel outperforms zero-shot veracity detection by 38.3% and achieves 86.7% of the performance of the state-of-the-art system trained with human supervision. Moreover, in cross-domain settings where training and testing datasets originate from different domains, Pastel significantly outperforms the state-of-the-art supervised model by 63%. We further study the association between credibility signals and veracity, and perform an ablation study showing the impact of each signal on model performance. Our findings reveal that 12 out of the 19 proposed signals exhibit strong associations with veracity across all datasets, while some signals show domain-specific strengths.

Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00534-0.
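
How Pastel combines the extracted signals is not described in the abstract. As a hedged illustration of the weak-supervision idea only, the sketch below aggregates per-signal votes with an unweighted majority vote, a common baseline that may differ from the paper's actual aggregation. The signal names and the vote encoding are hypothetical.

```python
def aggregate_signals(votes):
    """Combine per-signal votes into one weak label by unweighted majority vote.

    votes: dict mapping signal name -> +1 (points to misinformation),
           -1 (points to credible content) or 0 (signal abstains).
    This is a stand-in baseline, not the aggregation used by Pastel.
    """
    score = sum(votes.values())
    if score > 0:
        return "misinformation"
    if score < 0:
        return "credible"
    return "undecided"


# Hypothetical signal names and LLM answers for one article.
article_votes = {"clickbait_title": 1, "cites_sources": -1, "emotional_language": 1}
weak_label = aggregate_signals(article_votes)
```

Labels produced this way need no human annotation, which is the property the abstract emphasizes; a weighted or learned aggregation would replace the plain sum.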

The impact of playlist characteristics on coherence in user-curated music playlists.
IF 3; Zone 2, Computer Science; Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-01-01; Epub Date: 2025-03-19; DOI: 10.1140/epjds/s13688-025-00531-3
Harald Schweiger, Emilia Parada-Cabaleiro, Markus Schedl

Music playlist creation is a crucial, yet not fully explored task in music data mining and music information retrieval. Previous studies have largely focused on investigating diversity, popularity, and serendipity of tracks in human- or machine-generated playlists. However, the concept of playlist coherence (vaguely defined as smooth transitions between tracks) remains poorly understood and even lacks a standardized definition. In this paper, we provide a formal definition for measuring playlist coherence based on the sequential ordering of tracks, offering a more interpretable measurement than the existing literature and allowing for comparisons between playlists with different musical styles. The presented formal framework is applied to a substantial dataset of user-generated playlists to examine how various playlist characteristics influence coherence. We identified four key attributes as potential influencing factors: playlist length, number of edits, track popularity, and collaborative playlist curation. Using correlation and causal inference models, the impact of these attributes on coherence across ten auditory features and one metadata feature is assessed. Our findings indicate that these attributes influence playlist coherence to varying extents. Longer playlists tend to exhibit higher coherence, whereas playlists dominated by popular tracks or those extensively modified by users show reduced coherence. In contrast, collaborative playlist curation yielded mixed results. The insights from this study have practical implications for enhancing recommendation tasks, such as automatic playlist generation and continuation, beyond traditional accuracy metrics. As a demonstration of these findings, we propose a simple greedy algorithm that reorganizes playlists to align coherence with observed trends.

Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-025-00531-3.
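
The paper's formal coherence measure and its greedy algorithm are not given in the abstract. The sketch below is therefore an assumption-laden stand-in: it scores smoothness as the mean absolute change of a single numeric feature (tempo, here) between consecutive tracks, and reorders a playlist with a nearest-neighbour greedy pass. Both function names and the tempo values are hypothetical.

```python
def mean_step(values):
    """Mean absolute difference between consecutive tracks (lower = smoother)."""
    if len(values) < 2:
        return 0.0
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(steps) / len(steps)


def greedy_reorder(values):
    """Keep the first track, then repeatedly append the closest remaining track."""
    if not values:
        return []
    ordered = [values[0]]
    remaining = list(values[1:])
    while remaining:
        nearest = min(remaining, key=lambda v: abs(v - ordered[-1]))
        remaining.remove(nearest)
        ordered.append(nearest)
    return ordered


# Hypothetical tempos (beats per minute) in the original playlist order.
tempos = [120, 90, 125, 95, 130]
smoother = greedy_reorder(tempos)
```

A greedy pass like this is cheap but not optimal, since an early choice can force a large jump later; it is shown only to make the notion of order-dependent coherence concrete.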

Estimating work engagement from online chat tools
IF 3.6; Zone 2, Computer Science; Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-09-05; DOI: 10.1140/epjds/s13688-024-00496-9
Hiroaki Tanaka, Wataru Yamada, Keiichi Ochiai, Shoko Wakamiya, Eiji Aramaki

The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has transformed our lives. To combat the spread of the infection, remote work has become a widespread practice. However, this shift has led to various work-related problems, including prolonged working hours, mental health issues, and communication difficulties. One particular challenge faced by team members is the inability to accurately gauge the work engagement (WE) levels of subordinates, such as their absorption, dedication, and vigor, due to the limited number of in-person interactions that occur in remote work settings. To address this issue, online communication systems utilizing text-based chat tools such as Slack and Microsoft Teams have gained popularity as substitutes for face-to-face communication. In this paper, we propose a novel approach that uses graph neural networks (GNNs) to estimate the work engagement levels (WELs) of users on text-based chat platforms. Specifically, our method involves embedding users in a feature space based solely on the structural information of the communication network, without considering the contents of the conversations that take place. We conduct two studies using Slack data to evaluate our proposal. The first study reveals that the properties of communication networks play a more significant role in estimating WELs than conversation contents do. Building upon this result, the second study involves the development of a machine learning model that estimates WELs using only the structural features of the communication network. In this network representation, each node corresponds to a human user, and edges represent communication logs; i.e., if person A talks to person B, an edge is drawn between node A and node B. Notably, our model achieves a correlation coefficient of 0.60 between the observed and predicted WEL values. Importantly, our proposed approach relies solely on communication network data and does not require linguistic information. This makes it particularly valuable for real-world business situations.
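
The network representation described above can be sketched as follows. The log format (pairs of sender and recipient) and the user names are assumptions made for illustration, and the GNN embedding step is not reproduced; node degree is computed only as an example of a purely structural feature that uses no message text.

```python
from collections import defaultdict


def build_network(logs):
    """Return an undirected adjacency map {user: set of neighbours}.

    logs: iterable of (sender, recipient) pairs; this format is an assumption.
    Repeated conversations between the same pair collapse into one edge.
    """
    adjacency = defaultdict(set)
    for sender, recipient in logs:
        if sender == recipient:
            continue  # ignore self-messages
        adjacency[sender].add(recipient)
        adjacency[recipient].add(sender)
    return dict(adjacency)


# Hypothetical chat log: who talked to whom.
logs = [("alice", "bob"), ("bob", "carol"), ("alice", "bob"), ("dave", "alice")]
network = build_network(logs)
degree = {user: len(neighbours) for user, neighbours in network.items()}
```

Features of this kind depend only on who communicates with whom, which matches the abstract's point that no linguistic information is required.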

Language and the use of law are predictive of judge gender and seniority
IF 3.6; Zone 2, Computer Science; Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-09-02; DOI: 10.1140/epjds/s13688-024-00494-x
Lluc Font-Pomarol, Angelo Piga, Sergio Nasarre-Aznar, Marta Sales-Pardo, Roger Guimerà

There are examples of how unconscious bias can influence people's actions. In the judiciary, however, despite some examples, there is no general theory on whether demographic attributes such as gender, seniority or ethnicity affect case sentencing. We aim to gain insight into this issue by analyzing over 100k decisions from three different areas of law, with the goal of understanding whether judge identity or judge attributes such as gender and seniority can be inferred from decision documents. We find that stylistic features of decisions are predictive of judge identities, their gender and their seniority, a finding that is aligned with results from analyses of written texts outside the judiciary. Surprisingly, we find that features based on the legislation cited are also predictive of judge identities and attributes. While judges' reuse of their own content can explain our ability to predict judge identities, no specific reduced set of features can explain the differences we find in the legislation cited in decisions when we group judges by gender or seniority. Our findings open the door to further research on how these differences translate into how judges apply the law and, ultimately, to promoting a more transparent and fair judiciary system.

Citations: 0
Connection between climatic change and international food prices: evidence from robust long-range cross-correlation and variable-lag transfer entropy with sliding windows approach
IF 3.6 Zone 2 (Computer Science) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-08-14 DOI: 10.1140/epjds/s13688-024-00482-1
Zouhaier Dhifaoui

As nations progress, the impact of climate change on food prices becomes increasingly substantial. While the influence of climate change on the yields of major agricultural products is widely recognized, its specific effect on food prices remains uncertain. This study delves into the impact of the North Atlantic Oscillation (NAO) index, a well-established climate indicator, on global food prices. To accomplish this, a robust bivariate Hurst exponent (robust bHe) is applied. The study employs a sliding windows approach across various time scales to produce a color map of this coefficient, presenting a time-varying version. Furthermore, variable-lag transfer entropy with a sliding windows approach is utilized to discern causal relationships between the NAO index and international food prices. The findings reveal that significant increases in the NAO index are correlated with noteworthy upswings in various international food prices over both short- and long-term periods. Additionally, variable-lag transfer entropy confirms the causal role of the NAO index in influencing international food prices.
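The sliding windows idea described above can be illustrated with a minimal sketch. It is not the paper's method: plain Pearson correlation stands in for the robust bivariate Hurst exponent, and the function names (`pearson`, `sliding_correlation`, `correlation_map`) and the toy series are assumptions made up for illustration. The result is one row of coefficients per window length, which is the kind of grid a color map would display.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def sliding_correlation(x, y, window, step=1):
    """Correlation of x and y inside each consecutive window of length `window`."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    return [
        pearson(x[s:s + window], y[s:s + window])
        for s in range(0, len(x) - window + 1, step)
    ]

def correlation_map(x, y, windows):
    """One row of time-varying coefficients per window length (time scale)."""
    return {w: sliding_correlation(x, y, w) for w in windows}

# Made-up toy series, NOT data from the study.
nao = [0.1, 0.4, 0.3, 0.9, 1.2, 0.8, 0.2, -0.3, -0.1, 0.5]
price = [100, 103, 102, 108, 111, 107, 101, 97, 99, 104]

cmap = correlation_map(nao, price, windows=[4, 6])
for w, row in cmap.items():
    print(w, [round(v, 2) for v in row])
```

Swapping `pearson` for another windowed statistic leaves the rest of the sketch unchanged, which is why the per-window function is kept separate from the windowing loop.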

Citations: 0
Keep your friends close, and your enemies closer: structural properties of negative relationships on Twitter
IF 3.6 Zone 2 (Computer Science) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-08-09 DOI: 10.1140/epjds/s13688-024-00485-y
Jack Tacchi, Chiara Boldrini, Andrea Passarella, Marco Conti

The Ego Network Model (ENM) is a model for the structural organisation of relationships, rooted in evolutionary anthropology, that is found ubiquitously in social contexts. It takes the perspective of a single user (Ego) and organises their contacts (Alters) into a series of (typically 5) concentric circles of decreasing intimacy and increasing size. Alters are sorted based on their tie strength to the Ego; however, this is difficult to measure directly. Traditionally, the interaction frequency has been used as a proxy, but this misses the qualitative aspects of connections, such as signs (i.e. polarity), which have been shown to provide extremely useful information. However, the sign of an online social relationship is usually an implicit piece of information, which needs to be estimated from interaction data from Online Social Networks (OSNs), making sign prediction in OSNs a research challenge in and of itself. This work aims to bring the ENM into the signed networks domain by investigating the interplay of signed connections with the ENM. This paper delivers 2 main contributions: firstly, a new and data-efficient method of signing relationships between individuals using sentiment analysis and, secondly, an in-depth look at the properties of Signed Ego Networks (SENs), using 9 Twitter datasets of various categories of users. We find that negative connections are generally over-represented in the active part of the Ego Networks, suggesting that Twitter greatly over-emphasises negative relationships with respect to “offline” social networks. Further, users who use social networks for professional reasons have an even greater share of negative connections. Despite this, we also found weak signs that less negative users tend to allocate more cognitive effort to individual relationships and thus have smaller ego networks on average. All in all, even though structurally ENMs are known to be similar in both offline and online social networks, our results indicate that relationships on Twitter tend to nurture more negativity than offline contexts.
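A minimal sketch may help make the two ingredients above concrete: signing a relationship from per-interaction sentiment, and ranking alters by interaction frequency into nested circles. The abstract does not specify the signing rule, so the threshold rule, the circle sizes, the function names and the toy data below are all assumptions for illustration only, not the authors' method.

```python
from collections import defaultdict

def sign_relationships(interactions, neg_share_threshold=0.5):
    """Sign each ego-alter relationship from per-interaction sentiment.

    interactions: iterable of (alter, sentiment), sentiment in {-1, 0, +1}.
    Assumed rule: a relationship is negative (-1) when more than
    `neg_share_threshold` of its interactions are negative, else positive (+1).
    Returns {alter: (interaction_count, sign)}.
    """
    counts = defaultdict(lambda: [0, 0])  # alter -> [total, negative]
    for alter, sentiment in interactions:
        counts[alter][0] += 1
        if sentiment < 0:
            counts[alter][1] += 1
    return {
        alter: (total, -1 if negative / total > neg_share_threshold else 1)
        for alter, (total, negative) in counts.items()
    }

def assign_circles(signed, circle_sizes):
    """Rank alters by interaction count (the frequency proxy for tie strength)
    and return nested circles: each circle contains all the inner ones."""
    ranked = sorted(signed, key=lambda a: signed[a][0], reverse=True)
    return [ranked[:size] for size in circle_sizes]

def negative_share(circle, signed):
    """Fraction of alters in a circle whose relationship is signed negative."""
    return sum(1 for a in circle if signed[a][1] < 0) / len(circle)

# Made-up toy interactions for one ego; sizes (1, 2, 3) are illustrative only.
interactions = [("ana", 1), ("ana", 1), ("ana", -1),
                ("bo", -1), ("bo", -1), ("cy", 0)]
signed = sign_relationships(interactions)
circles = assign_circles(signed, circle_sizes=(1, 2, 3))
for circle in circles:
    print(circle, round(negative_share(circle, signed), 2))
```

Comparing `negative_share` across circles is one simple way to ask whether negative ties concentrate in the inner or the outer layers of an ego network.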

Citations: 0
Analyzing user ideologies and shared news during the 2019 argentinian elections
IF 3.6 Zone 2 (Computer Science) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-08-08 DOI: 10.1140/epjds/s13688-024-00493-y
Sofía M. del Pozo, Sebastián Pinto, Matteo Serafino, Lucio Garcia, Hernán A. Makse, Pablo Balenzuela

The extensive data generated on social media platforms allow us to gain insights into trending topics and public opinions. Additionally, they offer a window into user behavior, including content engagement and news-sharing habits. In this study, we analyze the relationship between users’ political ideologies and the news they share during Argentina’s 2019 election period. Our findings reveal that users predominantly share news that aligns with their political beliefs, despite accessing media outlets with diverse political leanings. Moreover, we observe a consistent pattern of users sharing articles related to topics biased toward their preferred candidates, highlighting a deeper level of political alignment in online discussions. We believe that this systematic analysis framework can be applied to similar scenarios in different countries, especially those marked by significant political polarization, akin to Argentina.
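As a rough illustration of what "sharing news that aligns with one's political beliefs" could mean operationally, the sketch below computes, for one user, the fraction of shared items whose outlet carries the same leaning label as the user. The labels "A" and "B", the outlet names, and the function `alignment_share` are hypothetical; the study's actual ideology inference and topic analysis are not described in the abstract and are not reproduced here.

```python
def alignment_share(user_leaning, shared_outlets, outlet_leaning):
    """Fraction of a user's shares that come from outlets labelled with the
    user's own leaning. Shares from unlabelled outlets are ignored.
    Returns None when no share can be labelled."""
    labelled = [outlet_leaning[o] for o in shared_outlets if o in outlet_leaning]
    if not labelled:
        return None
    return sum(1 for leaning in labelled if leaning == user_leaning) / len(labelled)

# Hypothetical outlet labels and one user's share history (made-up data).
outlet_leaning = {"outlet_x": "A", "outlet_y": "B"}
shares = ["outlet_x", "outlet_x", "outlet_y", "outlet_z"]  # outlet_z is unlabelled

print(alignment_share("A", shares, outlet_leaning))
```

A value well above the share expected from the user's overall media diet would be consistent with the selective sharing pattern the abstract reports.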

Citations: 0