
Journal of Computational Social Science: Latest Articles

Anchoring race: improving the construction of race dimensions in word embeddings.
IF 2.3 Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date : 2026-01-01 Epub Date: 2026-01-16 DOI: 10.1007/s42001-025-00449-w
Nnaemeka Ohamadike, Kevin Durrheim, Mpho Primus

Word embeddings have become powerful tools for detecting social biases encoded in language, yet research on measuring race bias through embeddings remains underdeveloped compared to studies on gender bias. This gap largely stems from the complexity of constructing race dimensions, which involve socially contested meanings and less clear semantic oppositions. Existing studies on race bias often rely on intuition and context-specific approaches when choosing anchor terms. In this paper, we address this methodological gap by providing statistical metrics to evaluate the quality and adaptability of race categories in embeddings. We apply these metrics to race categories across three embeddings: Google News (U.S.-centric), South African News (South African context), and Wikipedia (neutral, general-purpose). We find that names are effective for constructing race dimensions, with sub-Saharan African (SSA)/European name categories producing more stable and generalisable dimensions than other categories, while American names were less generalisable. Validation shows that SSA/European name embeddings correlate most strongly with human ratings and demonstrates that our metrics capture the human-perceived semantic structure of race. This research provides a framework for constructing robust race dimensions for measuring race bias in word embeddings.
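
For readers unfamiliar with how a dimension is built from anchor terms, the following is a minimal sketch of one common construction: the difference between the mean vectors of two anchor categories, with a target word scored by cosine similarity against it. It does not reproduce the paper's evaluation metrics, which the abstract does not specify. The toy vectors, the placeholder anchor keys, and the function names are all hypothetical.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def build_dimension(pole_a, pole_b):
    """Difference between the mean anchor vectors of two categories."""
    mean_a = mean_vector(pole_a)
    mean_b = mean_vector(pole_b)
    return [a - b for a, b in zip(mean_a, mean_b)]

# Hypothetical 3-d toy vectors; real embeddings have hundreds of dimensions,
# and the anchors would be actual names drawn from each category.
toy_embeddings = {
    "name_a1": [0.9, 0.1, 0.0],
    "name_a2": [0.8, 0.2, 0.1],
    "name_b1": [-0.9, 0.1, 0.0],
    "name_b2": [-0.8, 0.2, 0.1],
    "target_word": [0.5, 0.5, 0.0],
}

dimension = build_dimension(
    [toy_embeddings["name_a1"], toy_embeddings["name_a2"]],
    [toy_embeddings["name_b1"], toy_embeddings["name_b2"]],
)

# Positive scores lean toward category A, negative scores toward category B.
score = cosine(toy_embeddings["target_word"], dimension)
```

In this construction the choice of anchors fully determines the dimension, which is why the stability of a category across different embeddings matters.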

Supplementary information: The online version contains supplementary material available at 10.1007/s42001-025-00449-w.

Volume 9, issue 1, article 20. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12808247/pdf/
Citations: 0
Simulating relational event history data: why and how.
IF 2.3 Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date : 2025-01-01 Epub Date: 2025-08-21 DOI: 10.1007/s42001-025-00427-2
Rumana Lakdawala, Joris Mulder, Roger Leenders

Many important social phenomena are characterized by repeated interactions among individuals over time, such as email exchanges in an organization or face-to-face interactions in a classroom. To understand the underlying mechanisms of social interaction dynamics, statistical simulation techniques for network data at fine temporal granularity are crucial. This article makes two contributions to the field. First, we present statistical frameworks to simulate relational event networks under dyadic and actor-oriented relational event models, implemented in the R package remulate. Second, we show how this simulation framework can address key challenges in temporal social network analysis through five case studies. The first study illustrates the necessity of simulation-based techniques for model assessment, using a network of criminal gangs. The second shows how simulation supports social theory development, illustrated via optimal distinctiveness theory. The third explores simulation for understanding the effects of network interventions. The fourth illustrates how simulation-based analysis can be used to assess the sensitivity of relational event models. The fifth demonstrates how simulation frameworks can be used to make predictions about future relational dynamics. Through these case studies and software, researchers will be able to better understand social interaction dynamics using relational event data from real-life networks.
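
As a rough illustration of what simulating under a dyadic relational event model involves, the sketch below draws a sequence of timestamped sender-receiver events in plain Python. It is not the remulate package or its API. It assumes a log-linear rate for each directed dyad, a single illustrative inertia statistic (the log of one plus the number of past events on that dyad), and hypothetical parameter values.

```python
import math
import random

def simulate_events(n_actors, n_events, baseline, inertia_effect, seed=0):
    """Draw a relational event sequence under a simple dyadic model."""
    rng = random.Random(seed)
    dyads = [(s, r) for s in range(n_actors) for r in range(n_actors) if s != r]
    past_counts = {d: 0 for d in dyads}
    clock = 0.0
    history = []
    for _ in range(n_events):
        # Log-linear rate per directed dyad; the inertia statistic grows
        # with the number of earlier events on the same dyad.
        rates = [
            math.exp(baseline + inertia_effect * math.log1p(past_counts[d]))
            for d in dyads
        ]
        # Waiting time to the next event is exponential in the total rate.
        clock += rng.expovariate(sum(rates))
        # The dyad that fires is chosen with probability proportional to its rate.
        sender, receiver = rng.choices(dyads, weights=rates, k=1)[0]
        past_counts[(sender, receiver)] += 1
        history.append((clock, sender, receiver))
    return history

# Hypothetical settings: 5 actors, 20 events, assumed effect sizes.
events = simulate_events(n_actors=5, n_events=20, baseline=-2.0, inertia_effect=0.5)
```

Because the rates are recomputed after every event, the history feeds back into future interaction, which is the property that makes simulation useful for the model assessment and prediction tasks the abstract lists.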

Volume 8, issue 4, article 92. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12370817/pdf/
Citations: 0
Open-source LLMs for text annotation: a practical guide for model setting and fine-tuning.
IF 2 Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date : 2025-01-01 Epub Date: 2024-12-18 DOI: 10.1007/s42001-024-00345-9
Meysam Alizadeh, Maël Kubli, Zeynab Samei, Shirin Dehghani, Mohammadmasiha Zahedivafa, Juan D Bermeo, Maria Korobeynikova, Fabrizio Gilardi

This paper studies the performance of open-source Large Language Models (LLMs) in text classification tasks typical of political science research. By examining tasks like stance, topic, and relevance classification, we aim to guide scholars in making informed decisions about their use of LLMs for text analysis and to establish a baseline performance benchmark that demonstrates the models' effectiveness. Specifically, we conduct an assessment of both zero-shot and fine-tuned LLMs across a range of text annotation tasks using datasets of news articles and tweets. Our analysis shows that fine-tuning improves the performance of open-source LLMs, allowing them to match or even surpass zero-shot GPT-3.5 and GPT-4, though still lagging behind fine-tuned GPT-3.5. We further establish that fine-tuning is preferable to few-shot training with a relatively modest quantity of annotated text. Our findings show that fine-tuned open-source LLMs can be effectively deployed in a broad spectrum of text annotation applications. We provide a Python notebook facilitating the application of LLMs in text annotation for other researchers.

Supplementary information: The online version contains supplementary material available at 10.1007/s42001-024-00345-9.

Volume 8, issue 1, article 17. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655591/pdf/
Citations: 0
Capitalizing on a crisis: a computational analysis of all five million British firms during the Covid-19 pandemic.
IF 2 Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date : 2025-01-01 Epub Date: 2025-02-07 DOI: 10.1007/s42001-025-00360-4
Naomi Muggleton, Charles Rahal, Aaron Reeves

The Covid-19 pandemic brought unprecedented changes to business ownership in the UK, affecting a generation of entrepreneurs and their employees. Nonetheless, the impact remains poorly understood. This is because research on capital accumulation has typically lacked high-quality, individualized, population-level data. We overcome these barriers to examine who benefits from economic crises through a computationally orientated lens of firm creation. Leveraging a comprehensive cache of administrative data on every UK firm and all nine million people running them, combined with probabilistic algorithms, we conduct individual-level analysis to understand who became Covid entrepreneurs. Using these techniques, we explore characteristics of entrepreneurs (such as age, gender, region, business experience, and industry) which potentially predict Covid entrepreneurship. By employing an automated time series model selection procedure to generate counterfactuals, we show that Covid entrepreneurs were typically aged 35-49 (40.4%), men (73.1%), and had previously held roles in existing firms (59.4%). For most industries, growth was disproportionately concentrated around London. It was therefore existing corporate elites who were most able to capitalize on the Covid crisis and not, as some hypothesized, young entrepreneurs who were setting up their first businesses. In this respect, the pandemic will likely impact future wealth inequalities. Our work offers methodological guidance for future policymakers during economic crises and highlights the long-term consequences for capital and wealth inequality.

Volume 8, issue 2, article 29. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11805783/pdf/
Citations: 0
Reducing sexual predation and victimization through warnings and awareness among high-risk users.
IF 2 Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date : 2025-01-01 Epub Date: 2025-06-29 DOI: 10.1007/s42001-025-00399-3
Masanori Takano, Mao Nishiguchi, Fujio Toriumi

Online sexual predators target children by building trust, creating dependency, and arranging meetings for sexual purposes. This poses a significant challenge for online communication platforms that strive to monitor and remove such content and terminate predators' accounts. However, these platforms can only take such actions if sexual predators explicitly violate the terms of service, not during the initial stages of relationship-building. This study designed and evaluated a strategy to prevent sexual predation and victimization by delivering warnings and raising awareness among high-risk individuals, based on routine activity theory in criminal psychology. We identified high-risk users as those with a high probability of committing or being subjected to violations, using a machine learning model that analyzed social networks and monitoring data from the platform. We conducted a randomized controlled trial on a Japanese avatar-based communication application, Pigg Party. High-risk players in the intervention group received warnings and awareness-building messages, while those in the control group did not receive the messages, regardless of their risk level. The trial involved 12,842 high-risk players in the intervention group and 12,844 in the control group for 138 days. The intervention successfully reduced both committing and being subjected to violations among women for 12 weeks, although the impact on men was limited. These findings contribute to efforts to combat online sexual abuse and advance understanding of criminal psychology.

Supplementary information: The online version contains supplementary material available at 10.1007/s42001-025-00399-3.

Volume 8, issue 3, article 70. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12206673/pdf/
Citations: 0
Exploring the structure of the school curriculum with graph neural networks.
IF 2.3 Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date : 2025-01-01 Epub Date: 2025-09-03 DOI: 10.1007/s42001-025-00420-9
Benjamín Garzón, Vincenzo Perri, Lisi Qarkaxhija, Ingo Scholtes, Martin J Tomasik

School curricula guide the daily learning activities of millions of students. They embody their designers' understanding of how to organize the knowledge that students should acquire in a way that is optimal for learning. This can be viewed as a learning 'theory' which is, nevertheless, rarely put to the test. Here, we model a data set obtained from a Computer-Based Formative Assessment system used by thousands of students. The student-item response matrix is highly sparse and admits a natural representation as a bipartite graph, in which nodes stand for students or items and an edge between a student and an item represents a response of the student to that item. To predict unobserved edge labels (correct/incorrect responses), we resort to a graph neural network (GNN), a machine learning method for graph-structured data. Nodes and edges are represented as multidimensional embeddings. After fitting the model, the learned item embeddings reflect properties of the curriculum, such as item difficulty and the structure of school subject domains and competences. Simulations show that the GNN is particularly advantageous over a classical model when group patterns are present in the connections between students and items, such that students from a particular group have a higher probability of successfully answering items from a specific set. In sum, important aspects of the structure of the school curriculum are reflected in response patterns from educational assessments and can be partially retrieved by our graph-based neural model.
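
The bipartite representation described above can be sketched with plain Python containers. This covers only the data structure (labelled student-item edges and the two neighbour maps), not the GNN itself. The records and identifiers below are hypothetical.

```python
from collections import defaultdict

# Hypothetical observed responses: (student, item, answered_correctly).
responses = [
    ("student_1", "item_a", True),
    ("student_1", "item_b", False),
    ("student_2", "item_b", True),
    ("student_3", "item_c", True),
]

def build_bipartite_graph(records):
    """Store each response as a labelled edge between a student and an item."""
    edge_labels = {}
    items_by_student = defaultdict(set)
    students_by_item = defaultdict(set)
    for student, item, correct in records:
        edge_labels[(student, item)] = int(correct)  # 1 = correct, 0 = incorrect
        items_by_student[student].add(item)
        students_by_item[item].add(student)
    return edge_labels, items_by_student, students_by_item

edge_labels, items_by_student, students_by_item = build_bipartite_graph(responses)

# Share of possible student-item pairs that were actually observed.
n_students = len(items_by_student)
n_items = len(students_by_item)
density = len(edge_labels) / (n_students * n_items)
```

Only observed responses become edges, so a sparse response matrix costs memory in proportion to the number of responses rather than to students times items. The unobserved pairs are the edges whose labels a link-label model would try to predict.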

Volume 8, issue 4, article 99. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12408710/pdf/
Citations: 0
Identifying the factors influencing the development of bilateral investment treaties with health safeguards: a Machine Learning-based link prediction approach.
IF 2 Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date : 2025-01-01 Epub Date: 2024-12-05 DOI: 10.1007/s42001-024-00341-z
Haohui Lu, Anne Marie Thow, Dori Patay, Takwa Tissaoui, Nicholas Frank, Holly Rippin, Tien Dat Hoang, Fabio Gomes, Wolfgang Alschner, Shahadat Uddin

A network analysis approach, complemented by machine learning (ML) techniques, is applied to analyse the factors influencing Bilateral Investment Treaties (BITs) at the country level. Using the Electronic Database of Investment Treaties, BITs with health safeguards from 167 countries were charted, resulting in 534 connections with countries as nodes and their BITs as edges. Network analysis found that, on average, a country established BITs with six other nations. Additionally, we used node embedding techniques to generate features from the network, such as the Jaccard coefficient, resource allocation, and Adamic Adar for downstream link prediction. This study employed five tree-based ML models to predict future BIT formations with health inclusion. The eXtreme Gradient Boosting model proved to be superior, achieving a 64.02% accuracy rate. Notably, the Common Neighbor centrality feature and the Capital Account Balance Ratio emerged as influential factors in creating new BITs with health inclusions. Beyond economic considerations, our study highlighted a vital intersection: the nexus between BITs, economic growth, and public health policies. In essence, this research underscores the importance of safeguarding public health in BITs and showcases the potential of ML in understanding the intricacies of international treaties.
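
The three neighbourhood-based scores named in the abstract have standard textbook definitions, sketched below in plain Python on a hypothetical four-country toy network. This is not the authors' pipeline. The last lines check the reported average degree using the identity that mean degree equals twice the number of edges divided by the number of nodes.

```python
import math

# Hypothetical undirected treaty network: country -> set of treaty partners.
adjacency = {
    "A": {"C", "D"},
    "B": {"C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C"},
}

def neighbor_scores(graph, u, v):
    """Jaccard, resource allocation and Adamic-Adar scores for a pair u != v."""
    common = graph[u] & graph[v]
    union = graph[u] | graph[v]
    jaccard = len(common) / len(union) if union else 0.0
    # Each common neighbour is linked to both u and v, so its degree is at
    # least 2 and the logarithm below is strictly positive.
    resource_allocation = sum(1 / len(graph[w]) for w in common)
    adamic_adar = sum(1 / math.log(len(graph[w])) for w in common)
    return jaccard, resource_allocation, adamic_adar

jaccard, resource_allocation, adamic_adar = neighbor_scores(adjacency, "A", "B")

# Average degree from the abstract's figures: 534 edges among 167 countries.
average_degree = 2 * 534 / 167
```

With the abstract's figures, the average degree comes to about 6.4, consistent with the reported average of roughly six treaty partners per country.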

Volume 8, issue 1, article 8. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11621195/pdf/
Citations: 0
Telegram channels covering Russia’s invasion of Ukraine: a comparative analysis of large multilingual corpora
IF 3.2 Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date : 2024-01-03 DOI: 10.1007/s42001-023-00240-9
Anton Oleinik
Citations: 0
A modelling study to explore the effects of regional socio-economics on the spreading of epidemics.
IF 2 Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date : 2024-01-01 Epub Date: 2024-08-14 DOI: 10.1007/s42001-024-00322-2
Jan E Snellman, Rafael A Barrio, Kimmo K Kaski, Maarit J Korpi-Lagg

Epidemics, apart from affecting the health of populations, can have large impacts on their social and economic behavior, which in turn feeds back into and influences the spreading of the disease. This calls for a systematic investigation of which factors significantly affect, either beneficially or adversely, the disease spreading and regional socio-economics. Based on our recently developed hybrid agent-based socio-economy and epidemic spreading model, we perform an extensive exploration of the six-dimensional parameter space of the socio-economic part of the model, namely, the attitudes towards the spread of the pandemic, health, and the economic situation for both the population agents and the government agents who impose regulations. We search for significant patterns in the resulting simulated data using basic classification tools, such as self-organizing maps and principal component analysis, and we monitor different quantities of the model output, such as infection rates, the propagation speed of the epidemic, economic activity, government regulations, and the compliance of the population with government restrictions. Of these, the quantities describing the epidemic spreading resulted in the most distinctive clustering of the data, and they were selected as the basis of the remaining analysis. We relate the clusters found to three distinct types of disease spreading: wave-like, chaotic, and transitional spreading patterns. The most important value parameter contributing to phase changes and the speed of the epidemic was found to be the compliance of the population agents with the government regulations. We conclude that in compliant populations, the infection rates are significantly lower and the infection spreads more slowly, while the population agents' health and economic attitudes show a weaker effect.

Citations: 0
Fast meta-analytic approximations for relational event models: applications to data streams and multilevel data.
IF 2 Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date : 2024-01-01 Epub Date: 2024-06-08 DOI: 10.1007/s42001-024-00290-7
Fabio Vieira, Roger Leenders, Joris Mulder

Large relational-event history data stemming from large networks are becoming increasingly available due to recent technological developments (e.g., digital communication and online databases). This opens many new doors to learning about complex interaction behavior between actors in temporal social networks. The relational event model has become the gold standard for relational event history analysis. Currently, however, the main bottleneck in fitting relational event models is computational, in the form of memory storage limitations and computational complexity. Relational event models are therefore mainly used for relatively small datasets, while larger, more interesting datasets, including multilevel data structures and relational event data streams, cannot be analyzed on standard desktop computers. This paper addresses this problem by developing approximation algorithms based on meta-analysis methods that can fit relational event models significantly faster while avoiding the computational issues. In particular, meta-analytic approximations are proposed for analyzing streams of relational event data, multilevel relational event data, and potentially combinations thereof. The accuracy and the statistical properties of the methods are assessed using numerical simulations. Furthermore, real-world data are used to illustrate the potential of the methodology to study social interaction behavior in an organizational network and interaction behavior among political actors. The algorithms are implemented in the publicly available R package 'remx'.
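To give a flavour of what a meta-analytic approximation can look like, the sketch below pools per-batch coefficient estimates with classical fixed-effect inverse-variance weighting. This is a generic textbook technique shown under stated assumptions, not the algorithm from the paper or the 'remx' package; the function name `pool_fixed_effect` and the batch numbers are hypothetical.

```python
def pool_fixed_effect(estimates, std_errors):
    """Fixed-effect inverse-variance pooling of independent estimates.

    Each estimate is weighted by 1 / SE^2; the pooled standard error
    is the square root of the inverse of the summed weights.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = (1.0 / total_weight) ** 0.5
    return pooled, pooled_se


# Hypothetical estimates of one model coefficient, fitted separately
# on three batches of an event stream (illustrative numbers only).
batch_estimates = [0.50, 0.70, 0.60]
batch_std_errors = [0.10, 0.20, 0.10]

pooled, pooled_se = pool_fixed_effect(batch_estimates, batch_std_errors)
print(round(pooled, 4), round(pooled_se, 4))
```

The appeal of this kind of scheme is that each batch can be fitted independently and then discarded, so memory use does not grow with the full event history.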

Citations: 0