The Causal Effect of Parent Occupation on Child Occupation: A Multivalued Treatment with Positivity Constraints
Pub Date: 2025-06-02 | DOI: 10.1177/00491241251338412
Ian Lundberg, Daniel Molitor, Jennie E. Brand
To what degree does parent occupation cause a child’s occupational attainment? We articulate this causal question in the potential outcomes framework. Empirically, we show that adjustment for only two confounding variables substantially reduces the estimated association between parent and child occupation in a U.S. cohort. Methodologically, we highlight complications that arise when the treatment variable (parent occupation) can take many categorical values. A central methodological hurdle is positivity: some occupations (e.g., lawyer) are simply never held by some parents (e.g., those who did not complete college). We show how to overcome this hurdle by reporting summaries within subgroups that focus attention on the causal quantities that can be credibly estimated. Future research should build on the longstanding tradition of descriptive mobility research to answer causal questions.
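A minimal sketch of the positivity issue described above: before estimating effects of a multivalued treatment, one can cross-tabulate treatment values against confounder-defined subgroups and flag combinations that never occur, so reported summaries can be restricted to subgroups where the causal quantity is estimable. The column names and toy data are illustrative assumptions, not taken from the article.

```python
# Sketch: empirical positivity check for a multivalued treatment.
# Assumes a pandas DataFrame with a "parent_occupation" column (treatment)
# and a "stratum" column (confounder-defined subgroup); names are hypothetical.
import pandas as pd

def positivity_violations(df: pd.DataFrame,
                          treatment: str = "parent_occupation",
                          stratum: str = "stratum") -> pd.DataFrame:
    """Return (stratum, treatment value) pairs with zero observed cases."""
    counts = df.groupby([stratum, treatment]).size().unstack(fill_value=0)
    return (
        counts.stack()
        .rename("n")
        .reset_index()
        .query("n == 0")
    )

# Toy illustration: 'lawyer' never appears among parents without college.
toy = pd.DataFrame({
    "stratum": ["no_college", "no_college", "college", "college"],
    "parent_occupation": ["clerk", "clerk", "lawyer", "clerk"],
})
print(positivity_violations(toy))
```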
{"title":"The Causal Effect of Parent Occupation on Child Occupation: A Multivalued Treatment with Positivity Constraints","authors":"Ian Lundberg, Daniel Molitor, Jennie E. Brand","doi":"10.1177/00491241251338412","DOIUrl":"https://doi.org/10.1177/00491241251338412","url":null,"abstract":"To what degree does parent occupation cause a child’s occupational attainment? We articulate this causal question in the potential outcomes framework. Empirically, we show that adjustment for only two confounding variables substantially reduces the estimated association between parent and child occupation in a U.S. cohort. Methodologically, we highlight complications that arise when the treatment variable (parent occupation) can take many categorical values. A central methodological hurdle is positivity: some occupations (e.g., lawyer) are simply never held by some parents (e.g., those who did not complete college). We show how to overcome this hurdle by reporting summaries within subgroups that focus attention on the causal quantities that can be credibly estimated. Future research should build on the longstanding tradition of descriptive mobility research to answer causal questions.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"245 1","pages":""},"PeriodicalIF":6.3,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144193171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quantifying Narrative Similarity Across Languages
Pub Date: 2025-06-02 | DOI: 10.1177/00491241251340080
Hannah Waight, Solomon Messing, Anton Shirikov, Margaret E. Roberts, Jonathan Nagler, Jason Greenfield, Megan A. Brown, Kevin Aslett, Joshua A. Tucker
How can one understand the spread of ideas across text data? This is a key measurement problem in sociological inquiry, from the study of how interest groups shape media discourse, to the spread of policy across institutions, to the diffusion of organizational structures and institutions themselves. To study how ideas and narratives diffuse across text, we must first develop a method to identify whether texts share the same information and narratives, rather than the same broad themes or exact features. We propose a novel approach to measure this quantity of interest, which we call “narrative similarity,” by using large language models to distill texts to their core ideas and then compare the similarity of claims rather than of words, phrases, or sentences. The result is an estimand much closer to narrative similarity than what is possible with past relevant alternatives, including exact text reuse, which returns lexically similar documents; topic modeling, which returns topically similar documents; or an array of alternative approaches. We devise an approach to providing out-of-sample measures of performance (precision, recall, F1) and show that our approach outperforms relevant alternatives by a large margin. We apply our approach to an important case study: the spread of Russian claims about the development of a Ukrainian bioweapons program in U.S. mainstream and fringe news websites. While we focus on news in this application, our approach can be applied more broadly to the study of propaganda, misinformation, and the diffusion of policy and cultural objects, among other topics.
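As a rough sketch of the comparison step only (not the authors' pipeline), suppose each document's distilled claims have already been extracted and embedded as vectors by whatever model the researcher prefers; a claim-level similarity can then be computed as a symmetric best-match average of cosine similarities. The extraction and embedding stages are assumed inputs here.

```python
# Sketch: claim-level narrative similarity from pre-computed claim embeddings.
# Claim extraction (e.g., by an LLM) and embedding are assumed to have happened
# upstream; this only illustrates the claim-to-claim comparison idea.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def narrative_similarity(claims_a: list[np.ndarray],
                         claims_b: list[np.ndarray]) -> float:
    """Symmetric best-match average of claim-to-claim cosine similarities."""
    sims = np.array([[cosine(a, b) for b in claims_b] for a in claims_a])
    return 0.5 * (sims.max(axis=1).mean() + sims.max(axis=0).mean())
```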
{"title":"Quantifying Narrative Similarity Across Languages","authors":"Hannah Waight, Solomon Messing, Anton Shirikov, Margaret E. Roberts, Jonathan Nagler, Jason Greenfield, Megan A. Brown, Kevin Aslett, Joshua A. Tucker","doi":"10.1177/00491241251340080","DOIUrl":"https://doi.org/10.1177/00491241251340080","url":null,"abstract":"How can one understand the spread of ideas across text data? This is a key measurement problem in sociological inquiry, from the study of how interest groups shape media discourse, to the spread of policy across institutions, to the diffusion of organizational structures and institution themselves. To study how ideas and narratives diffuse across text, we must first develop a method to identify whether texts share the same information and narratives, rather than the same broad themes or exact features. We propose a novel approach to measure this quantity of interest, which we call “narrative similarity,” by using large language models to distill texts to their core ideas and then compare the similarity of <jats:italic>claims</jats:italic> rather than of words, phrases, or sentences. The result is an estimand much closer to narrative similarity than what is possible with past relevant alternatives, including exact text reuse, which returns lexically similar documents; topic modeling, which returns topically similar documents; or an array of alternative approaches. We devise an approach to providing out-of-sample measures of performance (precision, recall, F1) and show that our approach outperforms relevant alternatives by a large margin. We apply our approach to an important case study: The spread of Russian claims about the development of a Ukrainian bioweapons program in U.S. mainstream and fringe news websites. While we focus on news in this application, our approach can be applied more broadly to the study of propaganda, misinformation, diffusion of policy and cultural objects, among other topics.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"62 1","pages":""},"PeriodicalIF":6.3,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144210939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Optimal Stratification Method for Addressing Nonresponse Bias in Bayesian Adaptive Survey Design
Pub Date: 2025-06-02 | DOI: 10.1177/00491241251345463
Yongchao Ma, Nino Mushkudiani, Barry Schouten
In a probability sampling survey, adaptive data collection strategies may be used to obtain a response set that minimizes nonresponse bias within budget constraints. Previous research has stratified the target population into subgroups defined by categories of auxiliary variables observed for the entire population, and tailored strategies to obtain similar response rates across subgroups. However, if the auxiliary variables are weakly correlated with the target survey variables, optimizing data collection for these subgroups may not reduce nonresponse bias and may actually increase the variance of survey estimates. In this paper, we propose a stratification method to identify subgroups by: (1) predicting values of target survey variables from auxiliary variables, and (2) forming subgroups with different response propensities based on the predicted values of target survey variables. By tailoring different data collection strategies to these subgroups, we can obtain a response set with less variation in response propensities across subgroups that are directly relevant to the target survey variables. Given this rationale, we also propose to measure nonresponse bias by the coefficient of variation of response propensities estimated from the predicted target survey variables. A case study using the Dutch Health Survey shows that the proposed stratification method generally produces less variation in response propensities with respect to the predicted target survey variables compared to traditional methods, thereby leading to a response set that better resembles the population.
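The proposed bias measure is the coefficient of variation of response propensities across the subgroups formed on predicted survey variables; a brief sketch of that calculation, with illustrative numbers rather than values from the Dutch Health Survey case study:

```python
# Sketch: coefficient of variation (CV) of subgroup response propensities,
# the nonresponse-bias proxy described above. Propensities are assumed to be
# estimated within strata formed on the predicted target survey variables.
import numpy as np

def cv_response_propensities(propensities) -> float:
    """CV = standard deviation / mean of subgroup response propensities."""
    p = np.asarray(propensities, dtype=float)
    return float(p.std(ddof=1) / p.mean())

# A more even response set yields a lower CV (illustrative values):
print(cv_response_propensities([0.45, 0.50, 0.55]))  # ~0.10
print(cv_response_propensities([0.20, 0.50, 0.80]))  # ~0.60
```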
{"title":"An Optimal Stratification Method for Addressing Nonresponse Bias in Bayesian Adaptive Survey Design","authors":"Yongchao Ma, Nino Mushkudiani, Barry Schouten","doi":"10.1177/00491241251345463","DOIUrl":"https://doi.org/10.1177/00491241251345463","url":null,"abstract":"In a probability sampling survey, adaptive data collection strategies may be used to obtain a response set that minimizes nonresponse bias within budget constraints. Previous research has stratified the target population into subgroups defined by categories of auxiliary variables observed for the entire population, and tailored strategies to obtain similar response rates across subgroups. However, if the auxiliary variables are weakly correlated with the target survey variables, optimizing data collection for these subgroups may not reduce nonresponse bias and may actually increase the variance of survey estimates. In this paper, we propose a stratification method to identify subgroups by: (1) predicting values of target survey variables from auxiliary variables, and (2) forming subgroups with different response propensities based on the predicted values of target survey variables. By tailoring different data collection strategies to these subgroups, we can obtain a response set with less variation in response propensities across subgroups that are directly relevant to the target survey variables. Given this rationale, we also propose to measure nonresponse bias by the coefficient of variation of response propensities estimated from the predicted target survey variables. A case study using the Dutch Health Survey shows that the proposed stratification method generally produces less variation in response propensities with respect to the predicted target survey variables compared to traditional methods, thereby leading to a response set that better resembles the population.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"51 1","pages":""},"PeriodicalIF":6.3,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144210943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generative Multimodal Models for Social Science: An Application with Satellite and Streetscape Imagery
Pub Date: 2025-05-27 | DOI: 10.1177/00491241251339673
Tina Law, Elizabeth Roberto
Although there is growing social science research examining how generative AI models can be effectively and systematically applied to text-based tasks, whether and how these models can be used to analyze images remain open questions. In this article, we introduce a framework for analyzing images with generative multimodal models, which consists of three core tasks: curation, discovery, and measurement and inference. We demonstrate this framework with an empirical application that uses OpenAI's GPT-4o model to analyze satellite and streetscape images (n = 1,101) to identify built environment features that contribute to contemporary residential segregation in U.S. cities. We find that when GPT-4o is provided with well-defined image labels, the model labels images with high validity compared to expert labels. We conclude with thoughts for other use cases and discuss how social scientists can work collaboratively to ensure that image analysis with generative multimodal models is rigorous, reproducible, ethical, and sustainable.
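For the validation step, one simple way to compare model labels against expert labels is percent agreement plus Cohen's kappa. This is a generic sketch of that comparison, not the article's exact validity metric; label lists are assumed to be aligned image by image.

```python
# Sketch: agreement between model-assigned and expert-assigned image labels.
# Inputs are two equal-length lists of categorical labels for the same images.
from collections import Counter

def agreement_and_kappa(model_labels, expert_labels):
    """Percent agreement and Cohen's kappa between model and expert labels."""
    n = len(expert_labels)
    observed = sum(m == e for m, e in zip(model_labels, expert_labels)) / n
    pm, pe = Counter(model_labels), Counter(expert_labels)
    expected = sum(pm[k] * pe[k] for k in set(pm) | set(pe)) / n ** 2
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)
    return observed, kappa
```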
{"title":"Generative Multimodal Models for Social Science: An Application with Satellite and Streetscape Imagery","authors":"Tina Law, Elizabeth Roberto","doi":"10.1177/00491241251339673","DOIUrl":"https://doi.org/10.1177/00491241251339673","url":null,"abstract":"Although there is growing social science research examining how generative AI models can be effectively and systematically applied to text-based tasks, whether and how these models can be used to analyze images remain open questions. In this article, we introduce a framework for analyzing images with generative multimodal models, which consists of three core tasks: curation, discovery, and measurement and inference. We demonstrate this framework with an empirical application that uses OpenAI's GPT-4o model to analyze satellite and streetscape images ( <jats:italic>n</jats:italic> = 1,101) to identify built environment features that contribute to contemporary residential segregation in U.S. cities. We find that when GPT-4o is provided with well-defined image labels, the model labels images with high validity compared to expert labels. We conclude with thoughts for other use cases and discuss how social scientists can work collaboratively to ensure that image analysis with generative multimodal models is rigorous, reproducible, ethical, and sustainable.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"58 1","pages":""},"PeriodicalIF":6.3,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144153930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using Large Language Models for Qualitative Analysis can Introduce Serious Bias
Pub Date: 2025-05-27 | DOI: 10.1177/00491241251338246
Julian Ashwin, Aditya Chhabra, Vijayendra Rao
Large language models (LLMs) are quickly becoming ubiquitous, but their implications for social science research are not yet well understood. We ask whether LLMs can help code and analyse large-N qualitative data from open-ended interviews, with an application to transcripts of interviews with Rohingya refugees and their Bengali hosts in Bangladesh. We find that using LLMs to annotate and code text can introduce bias that can lead to misleading inferences. By bias we mean that the errors that LLMs make in coding interview transcripts are not random with respect to the characteristics of the interview subjects. Training simpler supervised models on high-quality human codes leads to less measurement error and bias than LLM annotations. Given that high-quality codes are necessary to assess whether an LLM introduces bias, we argue that it may be preferable to train a bespoke model on a subset of transcripts coded by trained sociologists rather than use an LLM.
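The bias definition used above (errors that are not random with respect to subject characteristics) suggests a straightforward diagnostic: test whether the LLM's error rate differs across groups of interviewees. A minimal sketch under assumed column names, not the authors' own procedure:

```python
# Sketch: are LLM coding errors patterned by a subject characteristic?
# Assumes a pandas DataFrame with a boolean "llm_error" column
# (LLM code != human code) and a categorical "group" column; names are hypothetical.
import pandas as pd
from scipy.stats import chi2_contingency

def error_bias_test(df: pd.DataFrame) -> float:
    """Chi-squared test of error rates across groups; returns the p-value."""
    table = pd.crosstab(df["group"], df["llm_error"])
    chi2, p_value, dof, expected = chi2_contingency(table)
    return p_value  # a small p-value suggests errors are patterned by group
```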
{"title":"Using Large Language Models for Qualitative Analysis can Introduce Serious Bias","authors":"Julian Ashwin, Aditya Chhabra, Vijayendra Rao","doi":"10.1177/00491241251338246","DOIUrl":"https://doi.org/10.1177/00491241251338246","url":null,"abstract":"Large language models (LLMs) are quickly becoming ubiquitous, but their implications for social science research are not yet well understood. We ask whether LLMs can help code and analyse large-N qualitative data from open-ended interviews, with an application to transcripts of interviews with Rohingya refugees and their Bengali hosts in Bangladesh. We find that using LLMs to annotate and code text can introduce bias that can lead to misleading inferences. By bias we mean that the errors that LLMs make in coding interview transcripts are not random with respect to the characteristics of the interview subjects. Training simpler supervised models on high-quality human codes leads to less measurement error and bias than LLM annotations. Given that high quality codes are necessary in order to assess whether an LLM introduces bias, we argue that it may be preferable to train a bespoke model on a subset of transcripts coded by trained sociologists rather than use an LLM.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"240 1","pages":""},"PeriodicalIF":6.3,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144153932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accounting for Individual-Specific Heterogeneity in Intergenerational Income Mobility
Pub Date: 2025-05-22 | DOI: 10.1177/00491241251339654
Yoosoon Chang, Steven N. Durlauf, Bo Hu, Joon Y. Park
This article proposes a fully nonparametric model to investigate the dynamics of intergenerational income mobility for discrete outcomes. In our model, an individual’s income class probabilities depend on parental income in a manner that accommodates nonlinearities and interactions among various individual and parental characteristics, including race, education, and parental age at childbearing, and so generalizes Markov chain mobility models. We show how the model may be estimated using kernel techniques from machine learning. Utilizing data from the Panel Study of Income Dynamics, we show how race, parental education, and mother’s age at birth interact with family income to determine mobility between generations.
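To fix ideas, a kernel estimate of a child's income-class probabilities given parental income can be written as a Nadaraya-Watson-style weighted frequency. The sketch below conditions on parental income only and so illustrates the general idea rather than the authors' estimator, which also incorporates covariates such as race, education, and parental age.

```python
# Sketch: kernel-weighted estimate of P(child income class = k | parent income = x0).
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def class_probabilities(x0, parent_income, child_class, n_classes, bandwidth=0.5):
    """Weighted class frequencies among observations with parental income near x0."""
    parent_income = np.asarray(parent_income, dtype=float)
    child_class = np.asarray(child_class)
    w = gaussian_kernel((parent_income - x0) / bandwidth)
    probs = np.array([w[child_class == k].sum() for k in range(n_classes)])
    return probs / w.sum()
```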
{"title":"Accounting for Individual-Specific Heterogeneity in Intergenerational Income Mobility","authors":"Yoosoon Chang, Steven N. Durlauf, Bo Hu, Joon Y. Park","doi":"10.1177/00491241251339654","DOIUrl":"https://doi.org/10.1177/00491241251339654","url":null,"abstract":"This article proposes a fully nonparametric model to investigate the dynamics of intergenerational income mobility for discrete outcomes. In our model, an individual’s income class probabilities depend on parental income in a manner that accommodates nonlinearities and interactions among various individual and parental characteristics, including race, education, and parental age at childbearing, and so generalizes Markov chain mobility models. We show how the model may be estimated using kernel techniques from machine learning. Utilizing data from the panel study of income dynamics, we show how race, parental education, and mother’s age at birth interact with family income to determine mobility between generations.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"35 1","pages":""},"PeriodicalIF":6.3,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144113541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Unrealized Potential of Audits: Applicant-Side Inequalities in Effort, Opportunities, and Certainty
Pub Date: 2025-05-22 | DOI: 10.1177/00491241251338240
Mike Vuolo, Sadé L. Lindsay, Vincent J. Roscigno, Shawn D. Bushway
Randomized audits and correspondence studies are widely regarded as a “gold standard” for capturing discrimination and bias. However, gatekeepers (e.g., employers) are the analytic unit even though stated implications often center on group-level inequalities. Employing simple rules, we show that audits have the potential to uncover applicant-side inequalities and burdens beyond the gatekeeper biases standardly reported. Specifically, applicants from groups facing lower callback rates must submit more applications to ensure an eventual callback, have fewer opportunities to choose from, and face higher uncertainty regarding how many applications to submit. These results reflect several sequential and cumulative stratification processes “real-world” applicants face that warrant attention in conventional audit reporting. Our approach can be straightforwardly applied and, we show, is particularly pertinent for employment relative to other institutional domains (e.g., education, religion). We discuss the methodological and theoretical relevance of our suggested extensions and the implications for the study of inequality, discrimination, and social closure.
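The applicant-side burden implied by group-specific callback rates can be made concrete with simple probability rules: with a per-application callback probability p, the expected number of applications until a first callback is 1/p, and the number needed to reach at least probability q of one callback is ceil(log(1-q)/log(1-p)). The callback rates below are illustrative, not figures from the article.

```python
# Sketch: applicant-side quantities implied by a group's callback rate p.
import math

def expected_applications(p: float) -> float:
    """Expected applications until the first callback (geometric mean wait)."""
    return 1 / p

def applications_for_confidence(p: float, q: float = 0.95) -> int:
    """Applications needed for at least probability q of one or more callbacks."""
    return math.ceil(math.log(1 - q) / math.log(1 - p))

# Illustrative comparison of two hypothetical callback rates:
for p in (0.12, 0.08):
    print(p, round(expected_applications(p), 1), applications_for_confidence(p))
# 0.12 -> ~8.3 expected applications, 24 for a 95% chance of a callback
# 0.08 -> 12.5 expected applications, 36 for a 95% chance of a callback
```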
{"title":"The Unrealized Potential of Audits: Applicant-Side Inequalities in Effort, Opportunities, and Certainty","authors":"Mike Vuolo, Sadé L. Lindsay, Vincent J. Roscigno, Shawn D. Bushway","doi":"10.1177/00491241251338240","DOIUrl":"https://doi.org/10.1177/00491241251338240","url":null,"abstract":"Randomized audits and correspondence studies are widely regarded as a “gold standard” for capturing discrimination and bias. However, gatekeepers (e.g., employers) are the analytic unit even though stated implications often center on group-level inequalities. Employing simple rules, we show that audits have the potential to uncover applicant-side inequalities and burdens beyond the gatekeeper biases standardly reported. Specifically, applicants from groups facing lower callback rates must submit more applications to ensure an eventual callback, have fewer opportunities to choose from, and face higher uncertainty regarding how many applications to submit. These results reflect several sequential and cumulative stratification processes “real-world” applicants face that warrant attention in conventional audit reporting. Our approach can be straightforwardly applied and, we show, is particularly pertinent for employment relative to other institutional domains (e.g., education, religion). We discuss the methodological and theoretical relevance of our suggested extensions and the implications for the study of inequality, discrimination, and social closure.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"57 1","pages":""},"PeriodicalIF":6.3,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144113542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Updating “The Future of Coding”: Qualitative Coding with Generative Large Language Models
Pub Date: 2025-05-21 | DOI: 10.1177/00491241251339188
Nga Than, Leanne Fan, Tina Law, Laura K. Nelson, Leslie McCall
Over the past decade, social scientists have adapted computational methods for qualitative text analysis, with the hope that they can match the accuracy and reliability of hand coding. The emergence of GPT and open-source generative large language models (LLMs) has transformed this process by shifting from programming to engaging with models using natural language, potentially mimicking the in-depth, inductive, and/or iterative process of qualitative analysis. We test the ability of generative LLMs to replicate and augment traditional qualitative coding, experimenting with multiple prompt structures across four closed- and open-source generative LLMs and proposing a workflow for conducting qualitative coding with generative LLMs. We find that LLMs can perform nearly as well as prior supervised machine learning models in accurately matching hand-coding output. Moreover, using generative LLMs as a natural language interlocutor closely replicates traditional qualitative methods, indicating their potential to transform the qualitative research process, despite ongoing challenges.
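A bare-bones sketch of the kind of workflow evaluated here: apply a generative model to each excerpt with a codebook-based prompt, then score its output against hand codes. The prompt wording, the `call_model` interface, and the accuracy metric are placeholders, not the authors' published workflow.

```python
# Sketch: code excerpts with a user-supplied generative model, then compare
# to hand codes. `call_model` stands in for whatever LLM interface is used.
from typing import Callable

def code_excerpts(excerpts: list[str], codebook: str,
                  call_model: Callable[[str], str]) -> list[str]:
    """Return one model-assigned code per excerpt."""
    prompt = "Apply exactly one code from this codebook to the text.\n{cb}\nText: {txt}\nCode:"
    return [call_model(prompt.format(cb=codebook, txt=t)).strip() for t in excerpts]

def accuracy(llm_codes: list[str], hand_codes: list[str]) -> float:
    """Share of excerpts where the model code matches the hand code."""
    return sum(a == b for a, b in zip(llm_codes, hand_codes)) / len(hand_codes)
```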
{"title":"Updating “The Future of Coding”: Qualitative Coding with Generative Large Language Models","authors":"Nga Than, Leanne Fan, Tina Law, Laura K. Nelson, Leslie McCall","doi":"10.1177/00491241251339188","DOIUrl":"https://doi.org/10.1177/00491241251339188","url":null,"abstract":"Over the past decade, social scientists have adapted computational methods for qualitative text analysis, with the hope that they can match the accuracy and reliability of hand coding. The emergence of GPT and open-source generative large language models (LLMs) has transformed this process by shifting from programming to engaging with models using natural language, potentially mimicking the in-depth, inductive, and/or iterative process of qualitative analysis. We test the ability of generative LLMs to replicate and augment traditional qualitative coding, experimenting with multiple prompt structures across four closed- and open-source generative LLMs and proposing a workflow for conducting qualitative coding with generative LLMs. We find that LLMs can perform nearly as well as prior supervised machine learning models in accurately matching hand-coding output. Moreover, using generative LLMs as a natural language interlocutor closely replicates traditional qualitative methods, indicating their potential to transform the qualitative research process, despite ongoing challenges.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"11 1","pages":""},"PeriodicalIF":6.3,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144113548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Balancing Large Language Model Alignment and Algorithmic Fidelity in Social Science Research
Pub Date: 2025-05-21 | DOI: 10.1177/00491241251342008
Alex Lyman, Bryce Hepner, Lisa P. Argyle, Ethan C. Busby, Joshua R. Gubler, David Wingate
Generative artificial intelligence (AI) has the potential to revolutionize social science research. However, researchers face the difficult challenge of choosing a specific AI model, often without social science-specific guidance. To demonstrate the importance of this choice, we present an evaluation of the effect of alignment, or human-driven modification, on the ability of large language models (LLMs) to simulate the attitudes of human populations (sometimes called “silicon sampling”). We benchmark aligned and unaligned versions of six open-source LLMs against each other and compare them to similar responses by humans. Our results suggest that model alignment impacts output in predictable ways, with implications for prompting, task completion, and the substantive content of LLM-based results. We conclude that researchers must be aware of the complex ways in which model training affects their research and carefully consider model choice for each project. We discuss future steps to improve how social scientists work with generative AI tools.
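One generic way to benchmark simulated respondents against human samples is to compare the distributions of answer options they produce, for instance via total variation distance. This is an illustrative metric under assumed inputs, not the evaluation the authors report.

```python
# Sketch: total variation distance between model-generated and human
# answer-option distributions for the same survey item.
from collections import Counter

def total_variation(model_answers: list[str], human_answers: list[str]) -> float:
    """0 = identical answer distributions, 1 = completely disjoint."""
    options = set(model_answers) | set(human_answers)
    pm, ph = Counter(model_answers), Counter(human_answers)
    nm, nh = len(model_answers), len(human_answers)
    return 0.5 * sum(abs(pm[o] / nm - ph[o] / nh) for o in options)
```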
{"title":"Balancing Large Language Model Alignment and Algorithmic Fidelity in Social Science Research","authors":"Alex Lyman, Bryce Hepner, Lisa P. Argyle, Ethan C. Busby, Joshua R. Gubler, David Wingate","doi":"10.1177/00491241251342008","DOIUrl":"https://doi.org/10.1177/00491241251342008","url":null,"abstract":"Generative artificial intelligence (AI) has the potential to revolutionize social science research. However, researchers face the difficult challenge of choosing a specific AI model, often without social science-specific guidance. To demonstrate the importance of this choice, we present an evaluation of the effect of alignment, or human-driven modification, on the ability of large language models (LLMs) to simulate the attitudes of human populations (sometimes called <jats:italic>silicon sampling</jats:italic> ). We benchmark aligned and unaligned versions of six open-source LLMs against each other and compare them to similar responses by humans. Our results suggest that model alignment impacts output in predictable ways, with implications for prompting, task completion, and the substantive content of LLM-based results. We conclude that researchers must be aware of the complex ways in which model training affects their research and carefully consider model choice for each project. We discuss future steps to improve how social scientists work with generative AI tools.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"16 1","pages":""},"PeriodicalIF":6.3,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144113546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generative AI Meets Open-Ended Survey Responses: Research Participant Use of AI and Homogenization
Pub Date: 2025-05-07 | DOI: 10.1177/00491241251327130
Simone Zhang, Janet Xu, AJ Alvero
The growing popularity of generative artificial intelligence (AI) tools presents new challenges for data quality in online surveys and experiments. This study examines participants’ use of large language models to answer open-ended survey questions and describes empirical tendencies in human versus large language model (LLM)-generated text responses. In an original survey of research participants recruited from a popular online platform for sourcing social science research subjects, 34 percent reported using LLMs to help them answer open-ended survey questions. Simulations comparing human-written responses from three pre-ChatGPT studies with LLM-generated text reveal that LLM responses are more homogeneous and positive, particularly when they describe social groups in sensitive questions. These homogenization patterns may mask important underlying social variation in attitudes and beliefs among human subjects, raising concerns about data validity. Our findings shed light on the scope and potential consequences of participants’ LLM use in online research.
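Homogenization of open-ended responses can be quantified in several ways; one simple option is the mean pairwise cosine similarity among response embeddings, compared between human-written and LLM-generated sets. The sketch assumes embeddings from any text-embedding model and is not the article's specific measure.

```python
# Sketch: mean pairwise cosine similarity among response embeddings.
# Higher values indicate a more homogeneous set of responses.
import numpy as np
from itertools import combinations

def mean_pairwise_similarity(embeddings: np.ndarray) -> float:
    """embeddings: (n_responses, embedding_dim) array of response vectors."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = [float(unit[i] @ unit[j]) for i, j in combinations(range(len(unit)), 2)]
    return float(np.mean(sims))
```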
{"title":"Generative AI Meets Open-Ended Survey Responses: Research Participant Use of AI and Homogenization","authors":"Simone Zhang, Janet Xu, AJ Alvero","doi":"10.1177/00491241251327130","DOIUrl":"https://doi.org/10.1177/00491241251327130","url":null,"abstract":"The growing popularity of generative artificial intelligence (AI) tools presents new challenges for data quality in online surveys and experiments. This study examines participants’ use of large language models to answer open-ended survey questions and describes empirical tendencies in human versus large language model (LLM)-generated text responses. In an original survey of research participants recruited from a popular online platform for sourcing social science research subjects, 34 percent reported using LLMs to help them answer open-ended survey questions. Simulations comparing human-written responses from three pre-ChatGPT studies with LLM-generated text reveal that LLM responses are more homogeneous and positive, particularly when they describe social groups in sensitive questions. These homogenization patterns may mask important underlying social variation in attitudes and beliefs among human subjects, raising concerns about data validity. Our findings shed light on the scope and potential consequences of participants’ LLM use in online research.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"74 1","pages":""},"PeriodicalIF":6.3,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143920428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}