EPJ Data Science: latest articles, page 4

Unveiling public perception of AI ethics: an exploration on Wikipedia data

IF 3.6; Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2024-03-26 DOI: 10.1140/epjds/s13688-024-00462-5

Abstract

Artificial Intelligence (AI) technologies have exposed more and more ethical issues while providing services to people. In most cases, it is difficult for people to recognize when an AI ethical issue occurs. The lower the public awareness, the more difficult it is to address AI ethical issues. Many previous studies have explored public reactions and opinions on AI ethical issues through questionnaires and social media platforms like Twitter. However, these approaches primarily focus on categorizing popular topics and sentiments, overlooking the public’s potential lack of knowledge underlying these issues. Few studies have revealed the holistic knowledge structure of AI ethical topics and the relations among the subtopics. As the world’s largest online encyclopedia, Wikipedia encourages people to jointly contribute and share their knowledge by adding new topics and following a well-accepted hierarchical structure. Through public viewing and editing, Wikipedia serves as a proxy for knowledge transmission. This study aims to analyze how the public comprehends the body of knowledge of AI ethics. We adopted a community detection approach to identify the hierarchical communities of AI ethical topics, and further extracted AI ethics-related entities, namely proper nouns, organizations, and persons. The findings reveal that the primary topics in the top-level community, those most pertinent to AI ethics, predominantly revolve around knowledge-based and ethical issues. Examples include transitions from Information Theory to Internet Copyright Infringement. In summary, this study makes three contributions: (1) it presents the holistic knowledge structure of AI ethics, (2) it evaluates and improves the existing body of knowledge of AI ethics, and (3) it enhances public perception of AI ethics to mitigate the risks associated with AI technologies.

Citations: 0

Online disinformation in the 2020 U.S. election: swing vs. safe states

IF 3.6; Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2024-03-26 DOI: 10.1140/epjds/s13688-024-00461-6

Manuel Pratelli, Marinella Petrocchi, Fabio Saracco, Rocco De Nicola

For U.S. presidential elections, most states use the so-called winner-take-all system, in which the state’s presidential electors are awarded to the winning political party in the state after a popular vote phase, regardless of the actual margin of victory. Therefore, election campaigns are especially intense in states where there is no clear direction on which party will be the winning party. These states are often referred to as swing states. To measure the impact of such an election law on the campaigns, we analyze the Twitter activity surrounding the 2020 US preelection debate, with a particular focus on the spread of disinformation. We find that about 88% of the online traffic was associated with swing states. In addition, the sharing of links to unreliable news sources is significantly more prevalent in tweets associated with swing states: in this case, untrustworthy tweets are predominantly generated by automated accounts. Furthermore, we observe that the debate is mostly led by two main communities, one with a predominantly Republican affiliation and the other with accounts of different political orientations. Most of the disinformation comes from the former.

Citations: 0

Human mobility reshaped? Deciphering the impacts of the Covid-19 pandemic on activity patterns, spatial habits, and schedule habits

IF 3.6; Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2024-03-22 DOI: 10.1140/epjds/s13688-024-00463-4

Mohamed Amine Bouzaghrane, Hassan Obeid, Marta González, Joan Walker

Despite the historically documented regularity in human mobility patterns, the relaxation of spatial and temporal constraints, brought by the widespread adoption of telecommuting and e-commerce during the COVID-19 pandemic, as well as a growing desire for flexible work arrangements in a post-pandemic world, indicates a potential reshaping of these patterns. In this paper, we investigate the multifaceted impacts of relaxed spatio-temporal constraints on human mobility, using well-established metrics from the travel behavior literature. Further, we introduce a novel metric for schedule regularity, accounting for specific day-of-week characteristics that previous approaches overlooked. Building on the large body of literature on the impacts of COVID-19 on human mobility, we make use of passively tracked Point of Interest (POI) data for approximately 21,700 smartphone users in the US, and analyze data between January 2020 and September 2022 to answer two key questions: (1) have the COVID-19 pandemic and its associated relaxation of spatio-temporal activity patterns reshaped the different aspects of human mobility, and (2) have we achieved a state of stable post-pandemic “new normal”? We hypothesize that the relaxation of the spatio-temporal constraints around key activities will result in people exhibiting less regular schedules. Findings reveal a complex landscape: while some mobility indicators, such as trip frequency and travel distance, have reverted to pre-pandemic norms, others, notably at-home dwell-time, persist at altered levels, suggesting a recalibration rather than a return to past behaviors. Most notably, our analysis reveals a paradox: despite the documented large-scale shift towards flexible work arrangements, schedule habits have strengthened rather than relaxed, defying our initial hypotheses and highlighting a desire for regularity. The study’s results contribute to a deeper understanding of the post-pandemic “new normal”, offering key insights on how multiple facets of travel behavior were reshaped, if at all, by the COVID-19 pandemic, and will help inform transportation planning in a post-pandemic world.

Citations: 0

Identification of suspicious behavior through anomalies in the tracking data of fishing vessels

IF 3.6; Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2024-03-21 DOI: 10.1140/epjds/s13688-024-00459-0

Abstract

Automated positioning devices can generate large datasets with information on the movement of humans, animals and objects, revealing patterns of movement, hot spots and overlaps, among others. However, in the case of Automatic Identification System (AIS) devices attached to vessels, strange behaviors observed in the tracking datasets may come from intentional manipulation of the electronic devices. Thus, the analysis of anomalies can provide valuable information on suspicious behavior. Here, we analyze anomalies of fishing vessel trajectories obtained with the Automatic Identification System. The map of silent anomalies, those that occur when positioning data are absent for more than 24 hours, shows that they are most likely to occur closer to land, with 87.1% of anomalies observed within 100 km of the coast. This behavior suggests the potential of using silent anomalies as a proxy for illegal activities. With the increasing availability of high-resolution positioning of vessels and the development of powerful statistical analytical tools, we provide hints on the automatic detection of illegal activities that may help optimize the management of fishing resources.
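
The definition of a silent anomaly given in the abstract (positioning data absent for more than 24 hours) can be illustrated with a minimal Python sketch. The function name `find_silent_gaps` and the sample track are hypothetical and serve only as an illustration; they are not the authors' code or data, and the study's full pipeline is not described in the abstract.

```python
from datetime import datetime, timedelta

# Threshold taken from the abstract: positioning data absent for more than 24 hours.
SILENCE_THRESHOLD = timedelta(hours=24)


def find_silent_gaps(timestamps, threshold=SILENCE_THRESHOLD):
    """Return (start, end) pairs where consecutive reports are more than `threshold` apart."""
    ordered = sorted(timestamps)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > threshold:
            gaps.append((previous, current))
    return gaps


# Hypothetical track for one vessel (illustrative values, not data from the study).
track = [
    datetime(2020, 1, 1, 0, 0),
    datetime(2020, 1, 1, 6, 0),
    datetime(2020, 1, 3, 12, 0),  # 54 hours after the previous report
    datetime(2020, 1, 3, 18, 0),
]

gaps = find_silent_gaps(track)
```

Here `gaps` holds one interval, from 2020-01-01 06:00 to 2020-01-03 12:00. Relating such gaps to distance from the coast, as the abstract reports, would additionally require the vessel positions at the start and end of each gap.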

Citations: 0

Human mobility prediction with causal and spatial-constrained multi-task network

IF 3.6; Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2024-03-19 DOI: 10.1140/epjds/s13688-024-00460-7

Zongyuan Huang, Shengyuan Xu, Menghan Wang, Hansi Wu, Yanyan Xu, Yaohui Jin

Modeling human mobility helps to understand how people access resources and come into physical contact with each other in cities, and thus contributes to various applications such as urban planning, epidemic control, and location-based advertisement. Next location prediction is one decisive task in individual human mobility modeling and is usually viewed as sequence modeling, solved with Markov or RNN-based methods. However, existing models pay little attention to the logic of individual travel decisions and the reproducibility of the collective behavior of the population. To this end, we propose a Causal and Spatial-constrained Long and Short-term Learner (CSLSL) for next location prediction. CSLSL utilizes a causal structure based on multi-task learning to explicitly model the “when→what→where”, a.k.a. “time→activity→location” decision logic. We next propose a spatial-constrained loss function as an auxiliary task, to ensure the consistency between the predicted and actual spatial distribution of travelers’ destinations. Moreover, CSLSL adopts modules named Long and Short-term Capturer (LSC) to learn the transition regularities across different time spans. Extensive experiments on three real-world datasets show promising performance improvements of CSLSL over baselines and confirm the effectiveness of introducing the causality and consistency constraints. The implementation is available at https://github.com/urbanmobility/CSLSL.

Citations: 0

Evolving demographics: a dynamic clustering approach to analyze residential segregation in Berlin

IF 3.6; Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2024-03-12 DOI: 10.1140/epjds/s13688-024-00455-4

Abstract

This paper examines the phenomenon of residential segregation in Berlin over time using a dynamic clustering analysis approach. Previous research has examined the phenomenon of residential segregation in Berlin at a high spatial and temporal aggregation and statically, i.e. not over time. We propose a methodology to investigate the existence of clusters of residential areas according to migration background, age group, gender, and socio-economic dimension over time. To this end, we have developed a sequential mixed methods approach that includes a multivariate kernel density estimation technique to estimate the density of subpopulations and a dynamic cluster analysis to discover spatial patterns of residential segregation over time (2009-2020). The dynamic analysis shows the emergence of clusters on the dimensions of migration background, age group, gender and socio-economic variables. We also identified a structural change in 2015, resulting in a new cluster in Berlin that reflects the changing distribution of subpopulations with a particular migratory background. Finally, we discuss the findings of this study in relation to previous research and suggest possibilities for policy applications and future research using a dynamic clustering approach for analyzing changes in residential segregation at the city level.
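
As a rough illustration of the kernel density estimation step mentioned in the abstract, the following Python sketch evaluates a two-dimensional Gaussian kernel density estimate at a query location. The isotropic Gaussian kernel, the single `bandwidth` parameter, and the sample coordinates are assumptions made for this example; the abstract does not specify the kernel, bandwidth selection, or variables the authors used.

```python
import math


def gaussian_kde_2d(points, query, bandwidth):
    """Estimate the density at `query` from 2-D `points` with an isotropic Gaussian kernel."""
    if not points:
        raise ValueError("points must not be empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    qx, qy = query
    # Normalizing constant of a 2-D Gaussian with standard deviation `bandwidth`.
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for px, py in points:
        sq_dist = (qx - px) ** 2 + (qy - py) ** 2
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return norm * total / len(points)


# Hypothetical residence coordinates in arbitrary planar units (not data from the study).
residences = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.1), (5.0, 5.0)]

density_near_cluster = gaussian_kde_2d(residences, (0.1, 0.0), bandwidth=0.5)
density_far_away = gaussian_kde_2d(residences, (2.5, 2.5), bandwidth=0.5)
```

In this toy example `density_near_cluster` is larger than `density_far_away`, since three of the four points lie close to the first query location. Estimating one such surface per subpopulation and per year would give the kind of input a dynamic cluster analysis could work on, although how the authors combine the two steps is not stated in the abstract.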

Citations: 0

Large-scale digital signatures of emotional response to the COVID-19 vaccination campaign

IF 3.6; Zone 2 (Computer Science); Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2024-03-08 DOI: 10.1140/epjds/s13688-024-00452-7

Abstract

The same individuals can express very different emotions in online social media with respect to face-to-face interactions, partially because of intrinsic limitations of the digital environments and partially because of their algorithmic design, which is optimized to maximize engagement. Such differences become even more pronounced for topics concerning socially sensitive and polarizing issues, such as massive pharmaceutical interventions. Here, we investigate how online emotional responses change during the large-scale COVID-19 vaccination campaign with respect to a baseline in which no specific contentious topic dominates. We show that the online discussions during the pandemic generate a vast spectrum of emotional response compared to the baseline, especially when we take into account the characteristics of the users and the type of information shared in the online platform. Furthermore, we analyze the role of the political orientation of shared news, whose circulation seems to be driven not only by their actual informational content but also by the social need to strengthen one’s affiliation to, and positioning within, a specific online community by means of emotionally arousing posts. Our findings stress the importance of better understanding the emotional reactions to contentious topics at scale from digital signatures, while providing a more quantitative assessment of the ongoing online social dynamics to build a faithful picture of offline social implications.

Citations: 0

Evaluating Twitter’s algorithmic amplification of low-credibility content: an observational study

IF 3.6 Zone 2 Computer Science Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2024-03-07 DOI: 10.1140/epjds/s13688-024-00456-3

Giulio Corsi

Artificial intelligence (AI)-powered recommender systems play a crucial role in determining the content that users are exposed to on social media platforms. However, the behavioural patterns of these systems are often opaque, complicating the evaluation of their impact on the dissemination and consumption of disinformation and misinformation. To begin addressing this evidence gap, this study presents a measurement approach that uses observed digital traces to infer the status of algorithmic amplification of low-credibility content on Twitter over a 14-day period in January 2023. Using an original dataset of ≈ 2.7 million posts on COVID-19 and climate change published on the platform, this study identifies tweets sharing information from low-credibility domains, and uses a bootstrapping model with two stratifications, a tweet’s engagement level and a user’s followers level, to compare any differences in impressions generated between low-credibility and high-credibility samples. Additional stratification variables of toxicity, political bias, and verified status are also examined. This analysis provides valuable observational evidence on whether the Twitter algorithm favours the visibility of low-credibility content, with results indicating that, on aggregate, tweets containing low-credibility URL domains perform better than tweets that do not, across both datasets. However, this effect is largely attributable to a difference in high-engagement, high-followers tweets, which are very impactful in terms of impressions generation, and are more likely to receive amplified visibility when containing low-credibility content. Furthermore, high-toxicity tweets and those with right-leaning bias see heightened amplification, as do low-credibility tweets from verified accounts. Ultimately, this suggests that Twitter’s recommender system may have facilitated the diffusion of false content by amplifying the visibility of high-engagement, low-credibility content generated by very influential users.
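The abstract does not spell out the bootstrapping model, so the following is only a minimal sketch of one way a stratified bootstrap comparison of impressions could look. The record fields (`stratum`, `low_cred`, `impressions`), the function name, and the use of a percentile interval on the difference in means are all assumptions made for illustration, not the study's actual procedure.

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_bootstrap_diff(posts, n_boot=1000, seed=0):
    """Bootstrap the low- minus high-credibility gap in mean impressions.

    `posts` is an iterable of dicts with hypothetical keys:
      'stratum'     - e.g. an (engagement level, followers level) tuple
      'low_cred'    - True if the post links to a low-credibility domain
      'impressions' - number of impressions
    Returns {stratum: (observed_diff, ci_low, ci_high)}.
    """
    rng = random.Random(seed)
    groups = defaultdict(lambda: {True: [], False: []})
    for post in posts:
        groups[post["stratum"]][post["low_cred"]].append(post["impressions"])

    results = {}
    for stratum, by_cred in groups.items():
        low, high = by_cred[True], by_cred[False]
        if not low or not high:
            continue  # a stratum needs both groups to be comparable
        diffs = []
        for _ in range(n_boot):
            # Resample each group with replacement, within the stratum only.
            low_sample = rng.choices(low, k=len(low))
            high_sample = rng.choices(high, k=len(high))
            diffs.append(mean(low_sample) - mean(high_sample))
        diffs.sort()
        ci_low = diffs[int(0.025 * n_boot)]
        ci_high = diffs[int(0.975 * n_boot) - 1]
        results[stratum] = (mean(low) - mean(high), ci_low, ci_high)
    return results


# Toy data, invented purely to exercise the function.
toy_posts = (
    [{"stratum": ("high", "high"), "low_cred": True, "impressions": v}
     for v in (100, 110, 120)]
    + [{"stratum": ("high", "high"), "low_cred": False, "impressions": v}
       for v in (10, 20, 30)]
    + [{"stratum": ("low", "low"), "low_cred": False, "impressions": v}
       for v in (1, 2, 3)]
)
print(stratified_bootstrap_diff(toy_posts))
```

Resampling within each stratum keeps the comparison between posts of similar engagement and follower levels, which is the role stratification plays in the description above. An interval that excludes zero would be read as a visibility gap in that stratum only.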



Citations: 0

The right to audit and power asymmetries in algorithm auditing

IF 3.6 Zone 2 Computer Science Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2024-03-07 DOI: 10.1140/epjds/s13688-024-00454-5

Aleksandra Urman, Ivan Smirnov, Jana Lasser

In this paper, we engage with and expand on the keynote talk about the “Right to Audit” given by Prof. Christian Sandvig at the International Conference on Computational Social Science 2021 through a critical reflection on power asymmetries in the algorithm auditing field. We elaborate on the challenges and asymmetries mentioned by Sandvig, such as those related to legal issues and the disparity between early-career and senior researchers. We also contribute a discussion of the asymmetries that were not covered by Sandvig but that we find critically important: those related to other disparities between researchers, incentive structures around access to company data, the targets of auditing, and users and their rights. We also discuss the implications these asymmetries have for algorithm auditing research, such as Western-centrism and a lack of diversity of perspectives. While we focus on the field of algorithm auditing specifically, we suggest that some of the discussed asymmetries affect Computational Social Science more generally and need to be reflected on and addressed.


Citations: 0

The simpliciality of higher-order networks

IF 3.6 Zone 2 Computer Science Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science

Pub Date : 2024-03-07 DOI: 10.1140/epjds/s13688-024-00458-1

Nicholas W. Landry, Jean-Gabriel Young, Nicole Eikmeier

Higher-order networks are widely used to describe complex systems in which interactions can involve more than two entities at once. In this paper, we focus on inclusion within higher-order networks, referring to situations where specific entities participate in an interaction, and subsets of those entities also interact with each other. Traditional modeling approaches to higher-order networks tend to either not consider inclusion at all (e.g., hypergraph models) or explicitly assume perfect and complete inclusion (e.g., simplicial complex models). To allow for a more nuanced assessment of inclusion in higher-order networks, we introduce the concept of “simpliciality” and several corresponding measures. Contrary to current modeling practice, we show that empirically observed systems rarely lie at either end of the simpliciality spectrum. In addition, we show that generative models fitted to these datasets struggle to capture their inclusion structure. These findings suggest new modeling directions for the field of higher-order network science.
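To make the idea of inclusion concrete, here is a minimal sketch of one possible simpliciality-style measure: the fraction of hyperedges whose sub-interactions are all present as hyperedges too. The abstract does not define its measures, so the function names, the `min_size` cutoff, and this particular "simplicial fraction" are illustrative assumptions and may differ from the paper's definitions.

```python
from itertools import combinations


def is_simplex(edge, edge_set, min_size=2):
    """True if every proper subset of `edge` with at least `min_size`
    nodes is itself a hyperedge in `edge_set`."""
    return all(
        frozenset(subset) in edge_set
        for k in range(min_size, len(edge))
        for subset in combinations(edge, k)
    )


def simplicial_fraction(hyperedges, min_size=2):
    """Share of hyperedges larger than `min_size` that are simplices.

    Returns None when there is no hyperedge large enough to be tested.
    """
    edge_set = {frozenset(e) for e in hyperedges}
    candidates = [e for e in edge_set if len(e) > min_size]
    if not candidates:
        return None
    n_simplices = sum(is_simplex(e, edge_set, min_size) for e in candidates)
    return n_simplices / len(candidates)


# {1, 2, 3} has all three of its pairs present; {4, 5, 6} has only one.
example = [(1, 2, 3), (1, 2), (2, 3), (1, 3), (4, 5, 6), (4, 5)]
print(simplicial_fraction(example))  # prints 0.5
```

Under this sketch, a simplicial complex scores 1.0 and a hypergraph with no included sub-interactions scores 0.0, which matches the two ends of the spectrum described above. The example dataset falls in between.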


Citations: 0