
EPJ Data Science: Latest Articles

Shift in house price estimates during COVID-19 reveals effect of crisis on collective speculation
IF 3.6; Zone 2, Computer Science; Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-07-10; DOI: 10.1140/epjds/s13688-024-00488-9
Alexander M. Petersen

We exploit a city-level panel composed of individual house price estimates to estimate the impact of COVID-19 on both small and large real-estate markets in California, USA. Descriptive analysis of spot house price estimates, including contemporaneous price uncertainty and 30-day price change for individual properties listed on the online real-estate platform Zillow.com, facilitates quantifying both the excess valuation and the valuation confidence attributable to this global socio-economic shock. Our quasi-experimental pre-/post-COVID-19 design spans several years around 2020 and leverages contemporaneous price estimates of rental properties (i.e., off-market real estate entering the habitation market, just not for purchase and hence free of speculation) as an appropriate counterfactual to properties listed for sale, which are subject to on-market speculation. Combining unit-level matching and multivariate difference-in-difference regression approaches, we obtain consistent estimates regarding the sign and magnitude of excess price growth observed after the pandemic onset. Specifically, our results indicate that properties listed for sale appreciated an additional 1% per month above what would be expected in the absence of the pandemic. This corresponds to an excess annual price growth of roughly 12.7 percentage points, which accounts for more than half of the actual annual price growth in 2021 observed across the studied regions. Simultaneously, uncertainty in price estimates decreased, signaling the irrational confidence characteristic of prior asset bubbles. We explore how these two trends are related to market size, local market supply and borrowing costs, which altogether lend support to the counterintuitive roles of uncertainty and interruptions in decision-making.
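The link between the monthly and annual figures quoted in the abstract can be checked with a short calculation. This is only a sketch: it assumes the excess monthly appreciation is exactly 1% and that it compounds over 12 months; the variable names are illustrative and the paper's own rounding may differ.

```python
# Assumption: excess appreciation of exactly 1% per month, compounded monthly.
monthly_excess = 0.01

# Compounded annual excess growth: (1 + r)^12 - 1
annual_compounded = (1 + monthly_excess) ** 12 - 1

# Simple (non-compounded) approximation for comparison: 12 * r
annual_simple = 12 * monthly_excess

print(f"compounded: {annual_compounded:.4f}")  # close to 0.127, i.e. ~12.7 points
print(f"simple:     {annual_simple:.4f}")
```

Under these assumptions the compounded value rounds to about 12.7 percentage points, slightly above the simple 12-point approximation.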

Citations: 0
Downscaling spatial interaction with socioeconomic attributes
IF 3.6; Zone 2, Computer Science; Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-07-05; DOI: 10.1140/epjds/s13688-024-00487-w
Chengling Tang, Lei Dong, Hao Guo, Xuechen Wang, Xiao-Jian Chen, Quanhua Dong, Yu Liu

A variety of complex socioeconomic phenomena, such as migration, commuting, and trade, can be abstracted as spatial interaction networks, where nodes represent geographic locations and weighted edges convey the interaction and its strength. However, obtaining fine-grained spatial interaction data is very challenging in practice due to limitations in collection methods and costs, so spatial interaction data such as transportation data and trade data are often only available at a coarse scale. Here, we propose a gravity downscaling (GD) method based on readily accessible socioeconomic data and the gravity law to infer fine-grained interactions from coarse-grained data. GD assumes that interactions at different spatial scales are governed by a similar gravity law, so the parameters estimated from coarse-grained regions can be transferred to fine-grained regions. Results show that GD has an average improvement of 24.6% in Mean Absolute Percentage Error over alternative downscaling methods (i.e., the areal-weighted method and machine learning models) across datasets with different spatial scales and in various regions. Using simple assumptions, GD enables accurate downscaling of spatial interactions, making it applicable to a wide range of fields, including human mobility, transportation, and trade.
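The parameter-transfer idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a power-law gravity form T_ij = K * m_i^a * m_j^b / d_ij^g fitted by log-linear least squares, and the function names `fit_gravity` and `predict_gravity`, the "mass" attribute, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_gravity(mass_o, mass_d, dist, flow):
    """Fit log T = log K + a*log m_o + b*log m_d - g*log d by least squares."""
    X = np.column_stack([
        np.ones_like(dist),   # intercept -> log K
        np.log(mass_o),       # origin attribute exponent a
        np.log(mass_d),       # destination attribute exponent b
        -np.log(dist),        # distance-decay exponent g
    ])
    coef, *_ = np.linalg.lstsq(X, np.log(flow), rcond=None)
    log_k, a, b, g = coef
    return np.exp(log_k), a, b, g

def predict_gravity(params, mass_o, mass_d, dist):
    """Evaluate the gravity law with previously fitted parameters."""
    k, a, b, g = params
    return k * mass_o**a * mass_d**b / dist**g

# Synthetic coarse-scale region pairs (noise-free, for illustration only).
rng = np.random.default_rng(0)
m_o = rng.uniform(1e4, 1e6, 200)
m_d = rng.uniform(1e4, 1e6, 200)
d = rng.uniform(10.0, 500.0, 200)
true_params = (2e-4, 0.8, 0.7, 1.5)
coarse_flow = predict_gravity(true_params, m_o, m_d, d)

# Step 1: estimate parameters at the coarse scale.
params = fit_gravity(m_o, m_d, d, coarse_flow)

# Step 2: transfer them to fine-scale units (hypothetical attributes/distances).
fine_flow = predict_gravity(
    params,
    np.array([5e3, 2e4]),
    np.array([8e3, 1e4]),
    np.array([2.0, 5.0]),
)
```

A practical version would also need to handle zero flows and might rescale fine-scale predictions so that they sum to the observed coarse totals; both are left out of this sketch.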

Citations: 0
Profile update: the effects of identity disclosure on network connections and language
IF 3.6; Zone 2, Computer Science; Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-06-28; DOI: 10.1140/epjds/s13688-024-00483-0
Minje Choi, Daniel M. Romero, David Jurgens

Our social identities determine how we interact and engage with the world surrounding us. In online settings, individuals can make these identities explicit by including them in their public biography, possibly signaling a change in what is important to them and how they should be viewed. While there is evidence suggesting the impact of intentional identity disclosure in online social platforms, its actual effect on engagement activities at the user level has yet to be explored. Here, we perform the first large-scale study on Twitter that examines behavioral changes following identity disclosure on Twitter profiles. Combining social networks with methods from natural language processing and quasi-experimental analyses, we discover that after disclosing an identity on their profiles, users (1) tweet and retweet more in a way that aligns with their respective identities, and (2) connect more with users that disclose similar identities. We also examine whether disclosing the identity increases the chance of being targeted for offensive comments and find that in fact (3) the combined effect of disclosing identity via both tweets and profiles is associated with a reduced number of offensive replies from others. Our findings highlight that the decision to disclose one’s identity in online spaces can lead to substantial changes in how users express themselves and forge connections, with fewer negative consequences than anticipated.

Citations: 0
Analyzing user reactions using relevance between location information of tweets and news articles
IF 3.6; Zone 2, Computer Science; Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-06-26; DOI: 10.1140/epjds/s13688-024-00465-2
Yun-Tae Jin, JaeBeom You, Shoko Wakamiya, Hyuk-Yoon Kwon

In this study, we analyze the extent of user reactions based on users’ tweets about news articles, demonstrating the potential for home location prediction. To achieve this, we quantify users’ reactions to specific news articles based on the textual similarity between tweets and news articles, showing that users’ reactions to news articles about their own cities are significantly stronger than their reactions to articles about other cities. To maximize the difference in reactions, we introduce the concept of News Distinctness, which highlights the news articles that affect a specific location. By incorporating News Distinctness with users’ reactions to the news, we magnify its effects. Through experiments conducted with tweets collected from users whose home locations are in five representative cities within the United States and news articles describing events occurring in those cities, we observed a 6.75% to 40% improvement in the reaction score for home-location news compared with the average reaction to news from outside the home location, which clearly indicates the home location. Furthermore, News Distinctness increases the difference in reaction score between news in the home location and the average of the news outside of the home location by 12% to 194%. These results demonstrate that our proposed idea can be utilized to predict users’ locations, potentially enabling recommendation of meaningful information based on users’ areas of interest.
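The idea of a similarity-based reaction score can be sketched as below. The abstract does not specify the similarity measure, so this example assumes a plain bag-of-words cosine similarity averaged over all tweet-article pairs; the function names `cosine` and `reaction_score` and the toy texts are hypothetical, and News Distinctness is not modeled here.

```python
import math
from collections import Counter

def cosine(text_a, text_b):
    """Bag-of-words cosine similarity between two whitespace-tokenized texts."""
    ca, cb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)  # Counter returns 0 for missing words
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def reaction_score(tweets, articles):
    """Mean similarity over all (tweet, article) pairs; 0.0 if there are none."""
    pairs = [cosine(t, a) for t in tweets for a in articles]
    return sum(pairs) / len(pairs) if pairs else 0.0

# Toy data: one user's tweets and news grouped by candidate city.
tweets = [
    "traffic jam on lake shore drive again",
    "snow storm hits downtown tonight",
]
news_by_city = {
    "city_a": ["snow storm closes lake shore drive"],
    "city_b": ["heat wave breaks record in phoenix"],
}

scores = {city: reaction_score(tweets, arts) for city, arts in news_by_city.items()}
predicted_home = max(scores, key=scores.get)  # city with the strongest reaction
```

In this toy case the user's tweets share vocabulary only with the `city_a` article, so `city_a` receives the higher score.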

Citations: 0
Glitter or gold? Deriving structured insights from sustainability reports via large language models
IF 3.6; Zone 2, Computer Science; Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-06-07; DOI: 10.1140/epjds/s13688-024-00481-2
Marco Bronzini, Carlo Nicolini, Bruno Lepri, Andrea Passerini, Jacopo Staiano

Over the last decade, several regulatory bodies have started requiring the disclosure of non-financial information from publicly listed companies, in light of investors’ increasing attention to Environmental, Social, and Governance (ESG) issues. Publicly released information on sustainability practices is often disclosed in diverse, unstructured, and multi-modal documentation. This poses a challenge in efficiently gathering and aligning the data into a unified framework to derive insights related to Corporate Social Responsibility (CSR). Thus, using Information Extraction (IE) methods becomes an intuitive choice for delivering insightful and actionable data to stakeholders. In this study, we employ Large Language Models (LLMs), In-Context Learning, and the Retrieval-Augmented Generation (RAG) paradigm to extract structured insights related to ESG aspects from companies’ sustainability reports. We then leverage graph-based representations to conduct statistical analyses concerning the extracted insights. These analyses revealed that ESG criteria cover a wide range of topics, exceeding 500, often beyond those considered in existing categorizations, and are addressed by companies through a variety of initiatives. Moreover, disclosure similarities emerged among companies from the same region or sector, validating ongoing hypotheses in the ESG literature. Lastly, by incorporating additional company attributes into our analyses, we investigated which factors have the greatest impact on companies’ ESG ratings, showing that ESG disclosure affects the obtained ratings more than other financial or company data.

Citations: 0
Quantifying polarization in online political discourse
IF 3.6; Zone 2, Computer Science; Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-06-05; DOI: 10.1140/epjds/s13688-024-00480-3
Pau Muñoz, Alejandro Bellogín, Raúl Barba-Rojas, Fernando Díez

In an era of increasing political polarization, its analysis becomes crucial for the understanding of democratic dynamics. This paper presents a comprehensive study on measuring political polarization on X (Twitter) during election cycles in Spain, from 2011 to 2019. A wide comparative analysis is performed on algorithms used to identify and measure polarization or controversy on microblogging platforms. This analysis is specifically tailored towards publications made by official political party accounts during pre-campaign, campaign, election day, and the week post-election. Guided by the findings of this comparative evaluation, we propose a novel algorithm better suited to capture polarization in the context of political events, which is validated with real data. As a consequence, our research contributes a significant advancement to the fields of political science, social network analysis, and computational social science overall, by providing a realistic method to capture polarization from online political discourse.

Citations: 0
First-mover advantage in music
IF 3.6; Zone 2, Computer Science; Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-05-17; DOI: 10.1140/epjds/s13688-024-00476-z
Oleg Sobchuk, Mason Youngblood, Olivier Morin

Why do some songs and musicians become successful while others do not? We show that one of the reasons may be the “first-mover advantage”: artists that stand at the foundation of new music genres tend to be more successful than those who join these genres later on. To test this hypothesis, we have analyzed a massive dataset of over 920,000 songs, including 110 music genres: 10 chosen intentionally and preregistered, and 100 chosen randomly. For this, we collected the data from two music services: Spotify, which provides detailed information about songs’ success (the precise number of times each song was listened to), and Every Noise at Once, which provides detailed genre tags for musicians. Of the 110 genres, 91 show the first-mover advantage, clearly suggesting that it is an important mechanism in music success and evolution.

Citations: 0
Online advertisement in a pink-colored market
IF 3.6; Zone 2, Computer Science; Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2024-05-08; DOI: 10.1140/epjds/s13688-024-00473-2
Amir Mehrjoo, Rubén Cuevas, Ángel Cuevas

It is surprising that women are often charged more for products and services marketed explicitly to them. This phenomenon, known as the pink tax, is a major issue that questions women’s buying power. Nevertheless, it is not limited to physical products: even online advertising can be subject to this type of gender-price discrimination. That is where our research comes in. We have developed a new methodology to measure what we call the digital marketing pink tax, the additional expense of delivering advertisements to female audiences. Analyzing data from Facebook advertising platforms across 187 countries and 40 territories shows this issue is systematic. In particular, the digital marketing pink tax is prevalent in 79% of audiences across the world and 98% of audiences in highly developed countries. Therefore, advertisers incur a median cost 30% higher to display advertisements to women than to men. In contrast, advertisers pay a lower digital marketing pink tax in less-developed countries (5%). Our research indicates that countries in the Middle East and Africa with a low Human Development Index (HDI) do not experience this phenomenon. Our comprehensive investigation of 24 industries reveals that advertisers must pay a digital marketing pink tax of up to 64% to target women in some industries. Our findings also suggest a connection between the digital marketing pink tax and the consumer pink tax, the extra charge placed on products marketed to women. Overall, our research sheds light on an important issue affecting women worldwide, raising awareness of the digital marketing pink tax and advocating for better regulation.
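The two summary statistics quoted in this abstract (prevalence across audiences and median extra cost) can be illustrated with a small calculation. This is a sketch under stated assumptions: the pink tax is taken to be the relative cost difference between the female and male versions of an otherwise identical audience, and the `pink_tax` function, the audience labels, and the cost figures are invented for illustration rather than taken from the paper.

```python
from statistics import median

def pink_tax(cost_women, cost_men):
    """Relative extra cost of reaching women versus men for the same audience."""
    return cost_women / cost_men - 1.0

# Hypothetical (audience, cost for women, cost for men) records,
# e.g. cost per thousand impressions in the same currency.
audiences = [
    ("A", 1.30, 1.00),
    ("B", 2.60, 2.00),
    ("C", 0.95, 1.00),
    ("D", 1.10, 1.00),
]

premiums = [pink_tax(w, m) for _, w, m in audiences]

# Prevalence: share of audiences where women cost more to reach than men.
prevalence = sum(p > 0 for p in premiums) / len(premiums)

# Median premium across audiences.
median_premium = median(premiums)
```

With these invented figures, three of the four audiences show a positive premium and the median premium is about 20%.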

Citations: 0
Who makes open source code? The hybridisation of commercial and open source practices
IF 3.6 Zone 2 Computer Science Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-05-06 DOI: 10.1140/epjds/s13688-024-00475-0
Peter Mehler, Eva Iris Otto, Anna Sapienza

While Free and Open Source (F/OSS) coding has traditionally been described as a separate commons linked to values of openness and sharing, recent research suggests an increasing integration of private corporations into F/OSS practices, blurring the boundaries between F/OSS and commodified coding. However, there is a dearth of empirical, and especially quantitative, studies exploring this phenomenon. To address this gap, we model the power dynamics and infrastructural aspects of software production within GitHub, a central hub for F/OSS development, using a large-scale, directed network. Using various network statistics, we detect the ecosystem’s most impactful actors and find a nuanced picture of the influence of individuals, open source organizations, and private corporations in F/OSS practices. We find that the majority of public repositories on GitHub depend on a small core of specialized repositories and users. In accordance with expectations, individuals and open source organizations are more prevalent in this core of elite GitHub users; however, we also find a significant number of private organizations with an indirect, yet consistent, influence within GitHub. In addition, we find that directly influential individuals tend to facilitate sponsorship methods more often than indirectly influential or non-influential individuals. Our research highlights a hybridization of F/OSS and sheds light on the complex interplay between influence, power, and code production in the multi-language dependency ecosystem of GitHub.
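
To make the distinction between direct and indirect influence in a directed dependency network concrete, here is a minimal sketch using only the Python standard library. The repository names, the edge list, and the use of dependent counts as an influence measure are all hypothetical; the abstract does not state which network statistics the authors used.

```python
from collections import defaultdict, deque

# Hypothetical edges: (dependent, dependency). Not data from the paper.
edges = [
    ("app-a", "lib-x"),
    ("app-b", "lib-x"),
    ("lib-x", "core"),
    ("app-c", "core"),
]

# Reverse adjacency: dependency -> set of direct dependents.
dependents = defaultdict(set)
for dependent, dependency in edges:
    dependents[dependency].add(dependent)


def reach(node: str) -> set[str]:
    """All nodes that depend on `node`, directly or transitively (BFS)."""
    seen: set[str] = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Direct influence: number of immediate dependents.
direct = {n: len(d) for n, d in dependents.items()}
# Total influence: direct plus indirect (transitive) dependents.
total = {n: len(reach(n)) for n in list(dependents)}
```

In this toy graph `core` has two direct dependents but is reached by four nodes in total, which is the kind of gap between direct and indirect influence the abstract alludes to.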

Citations: 0
Segmentation using large language models: A new typology of American neighborhoods
IF 3.6 Zone 2 Computer Science Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-04-22 DOI: 10.1140/epjds/s13688-024-00466-1
Alex D. Singleton, Seth Spielman

In the United States, recent changes to the National Statistical System have amplified the geographic-demographic resolution trade-off. That is, when working with demographic and economic data from the American Community Survey, as one zooms in geographically, one loses resolution demographically due to very large margins of error. In this paper, we present a solution to this problem in the form of an AI-based, open, and reproducible geodemographic classification system for the United States using small area estimates from the American Community Survey (ACS). We apply a partitioning clustering algorithm to a range of socio-economic, demographic, and built environment variables. Our approach utilizes an open source software pipeline that ensures adaptability to future data updates. A key innovation is the integration of GPT4, a state-of-the-art large language model, to generate intuitive cluster descriptions and names. This represents a novel application of natural language processing in geodemographic research and showcases the potential for human-AI collaboration within the geospatial domain.
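
The abstract does not name the partitioning algorithm. As an illustration only, the sketch below uses k-means (Lloyd's algorithm), a common partitioning method, on hypothetical standardized values for six areas. The variable names, the data, and the simplification of initializing centroids from the first `k` points are assumptions, not the authors' pipeline.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns (centroids, labels).

    Simplification: centroids start at the first k points, so results
    depend on input order. Real pipelines typically use better seeding.
    """
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: mean of each cluster; keep the old centroid if empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical standardized (income, density) values for six small areas.
areas = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9), (0.15, 0.15), (0.85, 0.85)]
centroids, labels = kmeans(areas, k=2)
```

Each resulting cluster would then be summarized, for example by its centroid, before a language model is asked to propose a human-readable name and description.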

Citations: 0