Pub Date: 2025-05-07
DOI: 10.1177/00491241251339184
Thomas Davidson, Daniel Karell
Generative artificial intelligence (AI) offers new capabilities for analyzing data, creating synthetic media, and simulating realistic social interactions. This essay introduces a special issue that examines how these and other affordances of generative AI can advance social science research. We discuss three core themes that appear across the contributed articles: rigorous measurement and validation of AI-generated outputs, optimizing model performance and reproducibility via prompting, and novel uses of AI for the simulation of attitudes and behaviors. We highlight how generative AI enables new methodological innovations that complement and augment existing approaches. This essay and the special issue’s ten articles collectively provide a detailed roadmap for integrating generative AI into social science research in theoretically informed and methodologically rigorous ways. We conclude by reflecting on the implications of the ongoing advances in AI.
Title: Integrating Generative Artificial Intelligence into Social Science Research: Measurement, Prompting, and Simulation
Journal: Sociological Methods & Research
Pub Date: 2025-04-24
DOI: 10.1177/00491241251325243
Youngjin Chae, Thomas Davidson
Large language models (LLMs) have tremendous potential for social science research as they are trained on vast amounts of text and can generalize to many tasks. We explore the use of LLMs for supervised text classification, specifically the application to stance detection, which involves detecting attitudes and opinions in texts. We examine the performance of these models across different architectures, training regimes, and task specifications. We compare 10 models ranging in size from tens of millions to hundreds of billions of parameters and test four distinct training regimes: prompt-based zero-shot learning and few-shot learning, fine-tuning, and instruction-tuning, which combines prompting and fine-tuning. The largest, most powerful models generally offer the best predictive performance even with few or no training examples, but fine-tuning smaller models is a competitive solution due to their relatively high accuracy and low cost. Instruction-tuning the latest generative LLMs expands the scope of text classification, enabling applications to more complex tasks than previously feasible. We offer practical recommendations on the use of LLMs for text classification in sociological research and discuss their limitations and challenges. Ultimately, LLMs can make text classification and other text analysis methods more accurate, accessible, and adaptable, opening new possibilities for computational social science.
Title: Large Language Models for Text Classification: From Zero-Shot Learning to Instruction-Tuning
Journal: Sociological Methods & Research
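The difference between the zero-shot and few-shot prompting regimes described above is simply whether labeled examples appear in the prompt. The sketch below builds both kinds of prompt for stance detection and maps a free-text reply back to a label. The label set, prompt wording, and function names are hypothetical illustrations, not the prompts used in the article, and no model call is made.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical label set for a stance-detection task; not taken from the article.
LABELS = ("favor", "against", "none")


def build_stance_prompt(text: str, target: str,
                        examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Return a zero-shot prompt if `examples` is empty, else a few-shot prompt."""
    for _, label in examples:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
    header = (f"Classify the stance of the text toward '{target}'. "
              f"Answer with one word: {', '.join(LABELS)}.\n\n")
    # Each labeled example is rendered in the same format as the query text.
    shots = "".join(f"Text: {t}\nStance: {lab}\n\n" for t, lab in examples)
    return header + shots + f"Text: {text}\nStance:"


def parse_label(response: str) -> Optional[str]:
    """Map a free-text model reply to a label, or None if it is not a valid label."""
    words = response.strip().lower().split()
    if not words:
        return None
    word = words[0].strip(".,:;!\"'")
    return word if word in LABELS else None


zero_shot = build_stance_prompt("Taxes should go up.", "tax increases")
few_shot = build_stance_prompt("Taxes should go up.", "tax increases",
                               [("Cut taxes now!", "against")])
```

Returning `None` for unparseable replies, rather than guessing, keeps invalid model outputs visible so they can be counted and reported.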
Pub Date: 2025-04-24
DOI: 10.1177/00491241251334124
Donald Tomaskovic-Devey, Chen-Shuo Hong
We examine variations in pay gap estimates and inferences associated with distinct conceptualizations of jobs and employment contexts under legal and comparable worth theories of pay bias. We find that job titles produce smaller estimates of within-job pay gaps than job groups, but the inferential importance of job concepts differs across organizational, workplace, and job-group-within-workplace units of observation. Moving from more to less job concept detail, we find almost no inference differences when pay gaps are estimated at the organizational level. Tradeoffs at the workplace and job-group-within-workplace levels are more common, comprising around 10 percent to 20 percent of observations. A legal theoretical framework leads to fewer empirical estimates of significant pay disparities, while comparable worth estimates suggest higher levels of gender and racial bias at the job and workplace levels. This research has implications for future analyses of linked employer-employee data and for both scientific research and regulatory enforcement of equal opportunity law.
Title: Conceptualizing Job and Employment Concepts for Earnings Inequality Estimands With Linked Employer-Employee Data
Journal: Sociological Methods & Research
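The claim that the job concept changes the within-job pay gap can be made concrete with a small calculation. The sketch below computes a worker-weighted average of within-job (men minus women) mean pay gaps under any grouping of workers, so the same records can be grouped by a detailed job title or by a coarser job group. The records and field names are hypothetical toy data, and this simple estimator is an illustration only, not the estimators used in the article.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: men are concentrated in the higher-paid title.
toy_records = [
    {"title": "analyst I", "group": "analyst", "gender": "w", "pay": 50},
    {"title": "analyst I", "group": "analyst", "gender": "w", "pay": 50},
    {"title": "analyst I", "group": "analyst", "gender": "m", "pay": 52},
    {"title": "analyst II", "group": "analyst", "gender": "w", "pay": 60},
    {"title": "analyst II", "group": "analyst", "gender": "m", "pay": 62},
    {"title": "analyst II", "group": "analyst", "gender": "m", "pay": 62},
]


def within_job_gap(records, job_of):
    """Worker-weighted average of within-job (men - women) mean pay gaps.

    job_of maps a record to its job concept (e.g., title or group). Jobs that
    lack either gender contribute nothing, since no within-job comparison exists.
    """
    cells = defaultdict(lambda: {"m": [], "w": []})
    for r in records:
        cells[job_of(r)][r["gender"]].append(r["pay"])
    weighted, total = 0.0, 0
    for pays in cells.values():
        if pays["m"] and pays["w"]:
            n = len(pays["m"]) + len(pays["w"])
            weighted += n * (mean(pays["m"]) - mean(pays["w"]))
            total += n
    return weighted / total if total else float("nan")


gap_by_title = within_job_gap(toy_records, lambda r: r["title"])  # 2.0
gap_by_group = within_job_gap(toy_records, lambda r: r["group"])  # 16/3
```

In this toy example the title-based gap (2.0) is smaller than the group-based gap (about 5.33) because part of the group-level gap reflects sorting of men into the better-paid title; the toy data illustrate the mechanism and are not evidence about real workplaces.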
Pub Date: 2025-04-22
DOI: 10.1177/00491241251314037
John W. Jackson, Yea-Jen Hsu, Raquel C. Greer, Romsai T. Boonyasai, Chanelle J. Howe
We present a conceptual model to measure disparity—the target study—where social groups may be similarly situated (i.e., balanced) on allowable covariates. Our model, based on a sampling design, does not intervene to assign social group membership or alter allowable covariates. To address nonrandom sample selection, we extend our model to generalize or transport disparity or to assess disparity after an intervention on eligibility-related variables that eliminates forms of collider-stratification. To avoid bias from differential timing of enrollment, we aggregate time-specific study results by balancing calendar time of enrollment across social groups. To provide a framework for emulating our model, we discuss study designs, data structures, and G-computation and weighting estimators. We compare our sampling-based model to prominent decomposition-based models used in healthcare and algorithmic fairness. We provide R code for all estimators and apply our methods to measure health system disparities in hypertension control using electronic medical records.
Title: The Target Study: A Conceptual Model and Framework for Measuring Disparity
Journal: Sociological Methods & Research
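The abstract mentions G-computation and weighting estimators for a disparity in which groups are balanced on allowable covariates. The sketch below shows the basic idea for a single discrete allowable covariate: both groups are standardized to a common target covariate distribution (here assumed to be the pooled sample), either by averaging stratum-specific means (G-computation) or by reweighting units (weighting). The data are hypothetical, and this Python sketch is not the authors' R code; it omits their handling of sample selection and enrollment timing.

```python
from collections import Counter, defaultdict

# Each row: (social_group, allowable_covariate_stratum, outcome). Hypothetical data.
toy_data = [
    ("A", "young", 1), ("A", "young", 1), ("A", "old", 0), ("A", "old", 1),
    ("B", "young", 1), ("B", "old", 0), ("B", "old", 0), ("B", "old", 1),
]


def pooled_distribution(data):
    """Covariate distribution of the pooled sample, used here as the target."""
    counts = Counter(c for _, c, _ in data)
    n = sum(counts.values())
    return {c: k / n for c, k in counts.items()}


def gcomp_mean(data, group, target):
    """G-computation: average stratum-specific means over the target distribution."""
    sums, counts = defaultdict(float), Counter()
    for g, c, y in data:
        if g == group:
            sums[c] += y
            counts[c] += 1
    missing = set(target) - set(counts)
    if missing:  # positivity: every target stratum must be observed in this group
        raise ValueError(f"group {group} has no units in strata {missing}")
    return sum(p * sums[c] / counts[c] for c, p in target.items())


def weighted_mean(data, group, target):
    """Weighting: reweight the group's units so its covariate mix matches the target.

    Assumes positivity holds; unlike gcomp_mean it does not check for empty strata.
    """
    rows = [(c, y) for g, c, y in data if g == group]
    within = Counter(c for c, _ in rows)
    n = len(rows)
    weights = [target[c] / (within[c] / n) for c, _ in rows]
    return sum(w * y for w, (_, y) in zip(weights, rows)) / sum(weights)


target = pooled_distribution(toy_data)
disparity = gcomp_mean(toy_data, "A", target) - gcomp_mean(toy_data, "B", target)
```

With one discrete covariate and a saturated model, the two estimators agree exactly. Here the balanced disparity (about 0.104) is smaller than the crude difference in means (0.25) because group B has more units in the stratum with worse outcomes.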
Pub Date: 2025-04-22
DOI: 10.1177/00491241251321152
Chen-Shuo Hong
The social network literature has explored homophily, the tendency to associate with similar others, as a critical boundary-making process contributing to segregated networks along the lines of identities. Yet, social network research generally conceptualizes identities as sociodemographic categories and seldom considers the inherently continuous and heterogeneous nature of differences. Drawing upon the infracategorical model of inequality, this study demonstrates that a computational approach – combining machine learning and exponential random graph models (ERGMs) – can capture the role of categorical conformity in network structures. Through a case study of gender segregation in friendships, this study presents a workflow for developing a machine-learning-based gender conformity measure and applying it to guide the social network analysis of cultural matching. Results show that adolescents with similar gender conformity are more likely to form friendships, net of homophily based on categorical gender and other controls, and homophily by gender conformity mediates homophily by categorical gender. The study concludes by discussing the limitations of this computational approach and its unique strengths in enhancing theories on categories, boundaries, and stratification.
Title: Networks Beyond Categories: A Computational Approach to Examining Gender Homophily
Journal: Sociological Methods & Research
Pub Date: 2025-04-22
DOI: 10.1177/00491241251326865
David Broska, Michael Howes, Austin van Loon
Large language models (LLMs) provide cost-effective but possibly inaccurate predictions of human behavior. Despite growing evidence that predicted and observed behavior are often not interchangeable, there is limited guidance on using LLMs to obtain valid estimates of causal effects and other parameters. We argue that LLM predictions should be treated as potentially informative observations, while human subjects serve as a gold standard in a mixed subjects design. This paradigm preserves validity and offers more precise estimates at a lower cost than experiments relying exclusively on human subjects. We demonstrate and extend prediction-powered inference (PPI), a method that combines predictions and observations. We define the PPI correlation as a measure of interchangeability and derive the effective sample size for PPI. We also introduce a power analysis to optimally choose between informative but costly human subjects and less informative but cheap predictions of human behavior. Mixed subjects designs could enhance scientific productivity and reduce inequality in access to costly evidence.
Title: The Mixed Subjects Design: Treating Large Language Models as Potentially Informative Observations
Journal: Sociological Methods & Research
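To make the mixed subjects idea concrete, the sketch below implements the basic prediction-powered estimate of a population mean: the average LLM prediction on extra units, corrected by the average prediction error ("rectifier") measured on the human subjects, with a normal-approximation confidence interval. This is the standard PPI mean estimator, not the article's extensions; the PPI correlation, effective sample size, and power analysis are not reproduced here, and the function and argument names are assumptions for illustration.

```python
import math
from statistics import mean, variance


def ppi_mean(y_human, f_human, f_extra, z=1.96):
    """Prediction-powered point estimate and normal-approximation CI for a mean.

    y_human: observed outcomes for the human subjects (the gold standard);
    f_human: LLM predictions for those same subjects;
    f_extra: LLM predictions for additional units with no human observation.
    statistics.variance needs at least two values in each list.
    """
    n, N = len(y_human), len(f_extra)
    # Rectifier: how far predictions miss observed behavior, on the human subjects.
    rectifier = [y - f for y, f in zip(y_human, f_human)]
    estimate = mean(f_extra) + mean(rectifier)
    # Variance combines prediction noise (scaled by N) and rectifier noise (scaled by n).
    se = math.sqrt(variance(f_extra) / N + variance(rectifier) / n)
    return estimate, (estimate - z * se, estimate + z * se)


est, ci = ppi_mean([1, 0, 1, 1], [1, 0, 1, 1], [1, 0, 0, 1, 1, 0])
```

If the predictions are systematically off by a constant, the rectifier removes that bias; the less the rectifier varies, the more the cheap predictions tighten the interval relative to using the human subjects alone.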
Pub Date: 2025-04-21
DOI: 10.1177/00491241251320963
Lai Wei, Yu Xie
Mobility effects are an important subject of study in sociology. Empirical investigations of individual mobility effects, however, have been hindered by one fundamental limitation: the unidentifiability of mobility effects when origin and destination are held constant. Given this limitation, we propose to reconceptualize mobility effects, moving from the micro level to the macro level. Instead of micro-level mobility effects, the primary focus of the past literature, we ask alternative research questions about macro-level mobility effects: What happens to the population distribution of an outcome if we manipulate the mobility regime, that is, if we alter the observed association between social origin and social destination? We relate individual-level mobility experience to macro-level mobility effects under special interventions. The proposed method bridges the macro and micro agendas in social stratification research and has wider applications in social stratification beyond the study of mobility effects. We illustrate the method with two analyses that evaluate the impact of social mobility on average fertility and income inequality in the United States. We provide open-source software, the R package socmob, that implements the method.
Title: Social Mobility as Causal Intervention
Journal: Sociological Methods & Research
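The macro-level question in this abstract (what happens to the population outcome if the origin-destination association is altered) can be illustrated with a toy calculation. The sketch below compares the population mean of an outcome under an observed joint distribution of origin and destination with its mean under a counterfactual regime that keeps both margins but makes destination independent of origin. It assumes, strongly and only for illustration, that cell-specific outcome means are unchanged by the intervention. All numbers are hypothetical, and this is not the socmob package or its API.

```python
# Hypothetical origin-by-destination joint distribution (observed mobility regime).
observed = {("low", "low"): 0.4, ("low", "high"): 0.1,
            ("high", "low"): 0.1, ("high", "high"): 0.4}
# Hypothetical mean outcome (e.g., fertility) in each origin-destination cell.
cell_means = {("low", "low"): 2.4, ("low", "high"): 2.2,
              ("high", "low"): 2.0, ("high", "high"): 1.6}


def population_mean(joint, means):
    """Population mean of the outcome implied by a mobility regime."""
    return sum(p * means[cell] for cell, p in joint.items())


def independence_regime(joint):
    """Counterfactual regime: keep both margins, remove the origin-destination association."""
    p_origin, p_dest = {}, {}
    for (o, d), p in joint.items():
        p_origin[o] = p_origin.get(o, 0.0) + p
        p_dest[d] = p_dest.get(d, 0.0) + p
    return {(o, d): po * pd for o, po in p_origin.items() for d, pd in p_dest.items()}


# Macro-level effect of moving from the observed regime to perfect mobility.
macro_effect = (population_mean(independence_regime(observed), cell_means)
                - population_mean(observed, cell_means))
```

Here the observed mean is 2.02 and the counterfactual mean is 2.05, so the toy macro-level effect is 0.03. Because the margins are held fixed, the change comes entirely from altering the association between origin and destination.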
Pub Date: 2025-04-21
DOI: 10.1177/00491241251333372
Alessandra Rister Portinari Maranca, Jihoon Chung, Musashi Hinck, Adam D. Wolsky, Naoki Egami, Brandon M. Stewart
Generative artificial intelligence (AI) has shown incredible leaps in performance across data of a variety of modalities, including texts, images, audio, and videos. This affords social scientists the ability to annotate variables of interest from unstructured media. While rapidly improving, these methods are far from perfect and, as we show, even ignoring the small amounts of error in high-accuracy systems can lead to substantial bias and invalid confidence intervals in downstream analysis. We review how using design-based supervised learning (DSL) guarantees asymptotic unbiasedness and proper confidence interval coverage by making use of a small number of expert annotations. While DSL was originally developed for use with large language models in text, we present a series of applications in the context of image analysis, including an investigation of visual predictors of the perceived level of violence in protest images, an analysis of the images shared in the Black Lives Matter movement on Twitter, and a study of U.S. outlets' reporting on immigrant caravans. These applications are representative of the type of analysis performed in the visual social science landscape today, and our analyses will exemplify how DSL helps us attain statistical guarantees while using automated methods to reduce human labor.
Title: Correcting the Measurement Errors of AI-Assisted Labeling in Image Analysis Using Design-Based Supervised Learning
Journal: Sociological Methods & Research
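The key step in design-based supervised learning is to combine AI labels with a randomly sampled set of expert labels through a bias-corrected pseudo-outcome: the AI label plus the expert-minus-AI error, inverse-weighted by the known probability that the unit was sent to an expert. The sketch below applies this to estimating a simple mean (for example, the share of images coded as violent). The function names and toy values are assumptions; the article's applications use downstream regressions, which this sketch does not cover.

```python
import math
from statistics import mean, stdev


def dsl_pseudo_outcomes(y_ai, y_expert, pi):
    """Bias-corrected pseudo-outcomes for design-based supervised learning.

    y_ai: AI-assisted label for every unit;
    y_expert: expert label, or None if the unit was not sampled for annotation;
    pi: known probability that each unit was sampled for expert annotation.
    """
    out = []
    for ya, ye, p in zip(y_ai, y_expert, pi):
        if ye is None:
            out.append(float(ya))  # no expert label: keep the AI label as is
        else:
            out.append(ya + (ye - ya) / p)  # inverse-probability-weighted correction
    return out


def dsl_mean(y_ai, y_expert, pi, z=1.96):
    """Mean of the pseudo-outcomes with a normal-approximation CI."""
    pseudo = dsl_pseudo_outcomes(y_ai, y_expert, pi)
    est = mean(pseudo)
    se = stdev(pseudo) / math.sqrt(len(pseudo))
    return est, (est - z * se, est + z * se)


est, ci = dsl_mean([1, 1, 0, 0], [1, None, 1, None], [0.5] * 4)
```

Individual pseudo-outcomes can fall outside the label range (here one equals 2.0); that is expected, since only their average needs to be unbiased when the sampling probabilities are known.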
Pub Date: 2025-04-21
DOI: 10.1177/00491241251330582
Julien Boelaert, Samuel Coavoux, Étienne Ollion, Ivaylo Petev, Patrick Präg
Generative artificial intelligence (AI) is increasingly presented as a potential substitute for humans, including as research subjects. However, there is no scientific consensus on how closely these in silico clones can emulate survey respondents. While some defend the use of these “synthetic users,” others point toward social biases in the responses provided by large language models (LLMs). In this article, we demonstrate that these critics are right to be wary of using generative AI to emulate respondents, but probably not for the right reasons. Our results show (i) that to date, models cannot replace research subjects for opinion or attitudinal research; (ii) that they display a strong bias and a low variance on each topic; and (iii) that this bias randomly varies from one topic to the next. We label this pattern “machine bias,” a concept we define, and whose consequences for LLM-based research we further explore.
Title: Machine Bias. How Do Generative Language Models Answer Opinion Polls?
Journal: Sociological Methods & Research
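The pattern the abstract calls machine bias (a strong bias and low variance on each topic, with the bias varying across topics) can be checked descriptively by comparing synthetic and human response distributions topic by topic. The sketch below reports, for each topic, the difference in means and the ratio of variances. It is a simple descriptive check under assumed inputs (numeric responses keyed by topic), not the authors' analysis; the topics and responses are hypothetical.

```python
from statistics import mean, pvariance


def machine_bias_summary(human, synthetic):
    """Per-topic bias (synthetic minus human mean) and variance ratio.

    human, synthetic: dict mapping topic -> list of numeric responses.
    A variance ratio well below 1 signals compressed (low-variance) synthetic answers.
    """
    summary = {}
    for topic, h in human.items():
        s = synthetic[topic]
        h_var = pvariance(h)
        summary[topic] = {
            "bias": mean(s) - mean(h),
            "variance_ratio": pvariance(s) / h_var if h_var else float("nan"),
        }
    return summary


# Hypothetical 1-5 scale responses; not data from the article.
human = {"immigration": [1, 2, 3, 4, 5], "climate": [2, 3, 3, 4]}
synthetic = {"immigration": [4, 4, 4, 5, 4], "climate": [2, 2, 2, 2]}
summary = machine_bias_summary(human, synthetic)
```

In this toy input the synthetic answers are biased upward on one topic and downward on the other, and both have much smaller variance than the human answers, which is the kind of pattern the abstract describes.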
Pub Date: 2025-04-19
DOI: 10.1177/00491241251326819
Sandrine Chausson, Marion Fourcade, David J. Harding, Björn Ross, Grégory Renard
Modern computational text classification methods have brought social scientists tantalizingly close to the goal of unlocking vast insights buried in text data, from centuries of historical documents to streams of social media posts. Yet three barriers still stand in the way: the tedious labor of manual text annotation, the technical complexity that keeps these tools out of reach for many researchers, and, perhaps most critically, the challenge of bridging the gap between sophisticated algorithms and the deep theoretical understanding social scientists have already developed about human interactions, social structures, and institutions. To counter these limitations, we propose an approach to large-scale text analysis that requires substantially less human-labeled data and no machine learning expertise, and that efficiently integrates the social scientist into critical steps of the workflow. This approach, which allows the detection of statements in text, relies on large language models pre-trained for natural language inference, and a “few-shot” threshold-tuning algorithm rooted in active learning principles. We describe and showcase our approach by analyzing tweets collected during the 2020 U.S. presidential election campaign, and benchmark it against various computational approaches across three datasets.
Title: The Insight-Inference Loop: Efficient Text Classification via Natural Language Inference and Threshold-Tuning
Journal: Sociological Methods & Research
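An NLI model returns a score (for example, an entailment probability that a text supports a hypothesis statement), and a threshold turns that score into a yes/no detection. The sketch below picks the threshold that maximizes F1 on a small labeled set by checking every observed score. This plain grid search is a swapped-in, simpler technique, not the article's active-learning-based algorithm, and the scores are hypothetical values standing in for an NLI model's output.

```python
def f1_at(scores, labels, t):
    """F1 of the rule 'predict positive if score >= t' against binary labels."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(scores, labels):
    """Return the observed score that maximizes F1 (lowest such score on ties)."""
    candidates = sorted(set(scores))
    # max() keeps the first maximal candidate, i.e., the lowest tied threshold.
    return max(candidates, key=lambda t: f1_at(scores, labels, t))


# Hypothetical entailment scores for six labeled texts (1 = statement present).
scores = [0.95, 0.9, 0.7, 0.6, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
best_t = tune_threshold(scores, labels)  # 0.6, with F1 = 6/7
```

Tuning only a single threshold on top of a pre-trained NLI model is what keeps the labeling burden small: a handful of labeled examples suffices to choose one number, rather than to train a classifier.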