Pub Date: 2022-07-29 DOI: 10.1177/00811750221114857
K. Yamaguchi
This article introduces a new causal analytic method for survival analysis that retains the framework of Rubin’s causal model as an alternative to the marginal structural model (MSM). The major limitation of the MSM is a systematic bias in the effects of past treatments when the method is applied to the hazard rate analysis of nonrepeatable events in the presence of unobserved heterogeneity. This systematic bias is demonstrated in the article. The method introduced here assumes a semiparametric conditional-incidence-rate model and provides consistent estimates of the effects of present and past treatments on the conditional cumulative-incidence rate in the analysis of nonrepeatable events in the presence of unobserved heterogeneity. Unlike the MSM, which requires a sequential and cumulative use of the inverse-probability-of-treatment weighting many times for data with many time points, the new method uses the inverse-probability-of-treatment weighting only twice sequentially for estimation of the present and past treatment effects at each time of entry into treatment, and not cumulatively across different treatment entry times. Analysis of the conditional-incidence rate can also provide a more efficient parameter estimate for the treatment effect than the hazard rate model in cases where a majority of sample persons experience the event and thereby cease to be members of the risk set of the hazard rate during the period of observation. An application to an analysis of sexual initiation demonstrates that leaving home promotes sexual initiation, especially premarital sexual initiation, because it greatly increases the rate of premarital sexual initiation during the year after leaving home.
{"title":"A New RCM Approach to Survival Analysis: The Conditional-Incidence-Rate Model","authors":"K. Yamaguchi","doi":"10.1177/00811750221114857","DOIUrl":"https://doi.org/10.1177/00811750221114857","journal":{"name":"Sociological Methodology"},"PeriodicalIF":3.0,"publicationDate":"2022-07-29","publicationTypes":"Journal Article"}
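The abstract above refers to inverse-probability-of-treatment weighting. As a rough illustration of that building block only, the following sketch computes weights for a single time point and a normalized weighted difference in mean outcomes. It is not the article's conditional-incidence-rate estimator; the function names, the toy data, and the assumption that propensity scores are already known are all hypothetical.

```python
def iptw_weights(treated, propensity):
    """Return 1/p for treated units and 1/(1-p) for untreated units."""
    return [1.0 / p if t else 1.0 / (1.0 - p)
            for t, p in zip(treated, propensity)]


def weighted_mean_difference(treated, outcome, weights):
    """Weighted treated mean minus weighted untreated mean.

    Weights are normalized within each treatment group.
    """
    sums = {True: [0.0, 0.0], False: [0.0, 0.0]}  # [sum of w*y, sum of w]
    for t, y, w in zip(treated, outcome, weights):
        sums[bool(t)][0] += w * y
        sums[bool(t)][1] += w
    return sums[True][0] / sums[True][1] - sums[False][0] / sums[False][1]


# Toy data (made up): treatment indicator, assumed-known propensity, outcome.
treated = [1, 1, 0, 0]
propensity = [0.8, 0.4, 0.2, 0.5]
outcome = [3.0, 5.0, 1.0, 1.0]

weights = iptw_weights(treated, propensity)
effect = weighted_mean_difference(treated, outcome, weights)
```

In practice the propensity scores would themselves be estimated from covariates, and in the MSM setting described above the weights are multiplied across time points, which is the cumulative step the article's method is said to avoid.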
Pub Date: 2022-07-05 DOI: 10.1177/00811750221109568
D. Feehan, Vo Hai Son, A. Abdul-Quader
Researchers increasingly use aggregate relational data to learn about the size and distribution of survey respondents’ weak-tie personal networks. Aggregate relational data are collected by asking questions about respondents’ connectedness to many different groups (e.g., “How many teachers do you know?”). This approach can be powerful, but to use aggregate relational data, researchers must locate external information about the size of each group from a census or administrative records (e.g., the number of teachers in the population). This need for external information makes aggregate relational data difficult or impossible to collect in many settings. Here, the authors show that relatively simple modifications can overcome this need for external data, significantly increasing the flexibility of the method and weakening key assumptions required by the associated estimators. The key idea is to estimate the size of these groups from the sample of survey respondents, rather than relying on external sources of information. These methods are appropriate for using a sample survey to study the size and distribution of weak-tie network connections. They can also be used as part of the network scale-up method to estimate the size of hidden populations. The authors illustrate this approach with two empirical studies: a large simulation study and original household survey data collected in Hanoi, Vietnam.
{"title":"Survey Methods for Estimating the Size of Weak-Tie Personal Networks","authors":"D. Feehan, Vo Hai Son, A. Abdul-Quader","doi":"10.1177/00811750221109568","DOIUrl":"https://doi.org/10.1177/00811750221109568","journal":{"name":"Sociological Methodology"},"PeriodicalIF":3.0,"publicationDate":"2022-07-05","publicationTypes":"Journal Article"}
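The key idea in the abstract above, estimating group sizes from the sample instead of from external records, can be illustrated with a small sketch. It uses the basic scale-up form (reported connections divided by total group size, scaled to the population) and plugs in group sizes estimated from respondents' own memberships. This is an illustration under stated assumptions, not the authors' estimator: it assumes an equal-probability sample, and all names and numbers are hypothetical.

```python
def estimate_group_sizes(memberships, population_size):
    """Estimate each group's size as population_size * sample share of members.

    memberships[i][k] is 1 if respondent i belongs to group k, else 0.
    Assumes an equal-probability sample (no survey weights).
    """
    n = len(memberships)
    n_groups = len(memberships[0])
    return [population_size * sum(row[k] for row in memberships) / n
            for k in range(n_groups)]


def estimate_degree(reports, group_sizes, population_size):
    """Basic scale-up degree estimate for one respondent.

    reports[k] is the number of people the respondent knows in group k.
    """
    return population_size * sum(reports) / sum(group_sizes)


# Toy example (made-up numbers): 4 respondents, 2 groups, population of 1000.
memberships = [[1, 0], [0, 0], [0, 1], [1, 0]]
sizes = estimate_group_sizes(memberships, population_size=1000)
degree = estimate_degree([10, 5], sizes, population_size=1000)
```

With real survey data the membership shares would be computed with sampling weights, and the precision of the estimated group sizes would feed into the uncertainty of the degree estimates.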
Pub Date: 2022-06-22 DOI: 10.1177/00811750221106777
L. Owens
The author explores interactions with one research subject who feigns credentials and invents stories in order to participate in social science research interviews online. The possibility of intentional deception among interviewees in virtually mediated fieldwork is a critical consideration in the context of the recent extensive pivot to online-based fieldwork during the need for social distancing associated with the coronavirus disease 2019 pandemic. Following this rapid shift in what is generally accepted as the “gold standard” for social science research interviews, widespread use of online-based interviewing methods will likely endure as equivalent to in-person methods. A methodological case study with implications for virtually mediated fieldwork, this article highlights some of the advantages and disadvantages of virtually mediated interviews and provides practical suggestions.
{"title":"An Implausible Virtual Interview: Conversations with a Professional Research Subject","authors":"L. Owens","doi":"10.1177/00811750221106777","DOIUrl":"https://doi.org/10.1177/00811750221106777","journal":{"name":"Sociological Methodology"},"PeriodicalIF":3.0,"publicationDate":"2022-06-22","publicationTypes":"Journal Article"}
Pub Date: 2022-06-08 DOI: 10.1177/00811750221099503
Beatriz Gallo Cordoba, G. Leckie, W. Browne
Ethnic achievement gaps are often explained in terms of student and school factors. The decomposition of these gaps into their within- and between-school components has therefore been applied as a strategy to quantify the overall influence of each set of factors. Three competing approaches have previously been proposed, but each is limited to the study of student-school decompositions of the gap between two ethnic groups (e.g., White and Black). The authors show that these approaches can be reformulated as mediation models facilitating new extensions to allow additional levels in the school system (e.g., classrooms, school districts, geographic areas) and multiple ethnic groups (e.g., White, Black, Hispanic, Asian). The authors illustrate these extensions using administrative data for high school students in Colombia and highlight the increased substantive insights and nuanced policy implications they afford.
{"title":"Decomposing Ethnic Achievement Gaps across Multiple Levels of Analysis and for Multiple Ethnic Groups","authors":"Beatriz Gallo Cordoba, G. Leckie, W. Browne","doi":"10.1177/00811750221099503","DOIUrl":"https://doi.org/10.1177/00811750221099503","journal":{"name":"Sociological Methodology"},"PeriodicalIF":3.0,"publicationDate":"2022-06-08","publicationTypes":"Journal Article"}
Pub Date: 2022-04-07 DOI: 10.1177/00811750221085586
Andrew Carr
To understand how income inequality affects individuals and communities, researchers must have accurate measures of income inequality at lower geographic levels, such as counties, school districts, and census tracts. Studies of income inequality, however, are constrained by the tabular format in which censuses publish income data. In this article, the author proposes a new method, Lorenz interpolation, for estimating income inequality from binned income data. Using public microsample data from the American Community Survey (ACS), the author shows that Lorenz interpolation produces more accurate and reliable income inequality estimates than do alternative estimation methods. Then, using restricted ACS income data obtained through a Federal Statistical Research Data Center, the author evaluates the accuracy of Lorenz interpolation at the census tract and school district levels. Lorenz interpolation produces reliable school district–level estimates, but the method produces less reliable estimates for some income inequality measures at the tract level. These findings indicate that researchers should refrain from estimating tract-level income inequality measures from tabular data. They also show that aggregating tract income distributions to higher geographic levels can produce valid estimates of income inequality.
{"title":"Lorenz Interpolation: A Method for Estimating Income Inequality from Grouped Income Data","authors":"Andrew Carr","doi":"10.1177/00811750221085586","DOIUrl":"https://doi.org/10.1177/00811750221085586","journal":{"name":"Sociological Methodology"},"PeriodicalIF":3.0,"publicationDate":"2022-04-07","publicationTypes":"Journal Article"}
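To make the problem of estimating inequality from binned income data concrete, the sketch below builds Lorenz curve points from bin counts and bin mean incomes and computes a Gini coefficient with the trapezoid rule. This is a plain linear-interpolation baseline, not the Lorenz interpolation method proposed in the article. It assumes bin means are available, and the function names and numbers are hypothetical.

```python
def lorenz_points(counts, means):
    """Cumulative (population share, income share) points, poorest bin first."""
    total_pop = sum(counts)
    total_inc = sum(c * m for c, m in zip(counts, means))
    points = [(0.0, 0.0)]
    cum_pop = 0.0
    cum_inc = 0.0
    for c, m in sorted(zip(counts, means), key=lambda cm: cm[1]):
        cum_pop += c
        cum_inc += c * m
        points.append((cum_pop / total_pop, cum_inc / total_inc))
    return points


def gini_from_points(points):
    """Gini = 1 - 2 * (area under the piecewise-linear Lorenz curve)."""
    twice_area = sum((x1 - x0) * (y1 + y0)
                     for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return 1.0 - twice_area


# Toy table (made up): households per bin and mean income per bin.
counts = [50, 30, 20]
means = [10.0, 30.0, 80.0]
gini = gini_from_points(lorenz_points(counts, means))
```

Connecting the bin endpoints with straight lines treats every household in a bin as having the bin's mean income, so it ignores within-bin inequality and understates the Gini coefficient. That gap is the motivation for more careful interpolation of the Lorenz curve between the observed points.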
Pub Date: 2022-02-01 DOI: 10.1177/00811750211071129
Amelie Pedneault, Dale W. Willits
Contextual effects refer to the process by which responses given to survey questions can be affected by question order. Generally, contextual effects harm data measurement validity by introducing bias and increasing measurement error; the risk is that responses to a survey’s later questions are partly affected not only by the substance of the question but also by the preceding questions. Two opposite effects are possible: a carryover effect refers to the assimilation of later questions into those previously asked, and a backfire effect refers to the contrasting of earlier and later questions. In the case where a stereotype is activated in earlier questions of a survey, the previous literature suggests a carryover effect is more likely. The present study tests whether this is also the case in factorial vignette research by examining the influence of first presenting a vignette that corresponds more closely to a stereotypical view of sexual abuse. Results indicate a backfire effect, pointing to the distinctively different way in which vignette scenarios activate stereotypes compared to general survey questions. The results also highlight the need for researchers to control for contextual ordering effects when modeling factorial vignette data.
{"title":"Asking about the Worst First: An Examination of Contextual Effects in Factorial Vignettes","authors":"Amelie Pedneault, Dale W. Willits","doi":"10.1177/00811750211071129","DOIUrl":"https://doi.org/10.1177/00811750211071129","journal":{"name":"Sociological Methodology"},"PeriodicalIF":3.0,"publicationDate":"2022-02-01","publicationTypes":"Journal Article"}
Pub Date: 2022-02-01 DOI: 10.1177/00811750211071130
S. Yuen, Gary Tang, Francis L. F. Lee, Edmund W. Cheng
Protest surveys are a standard tool for scholars seeking to understand protests. However, although protest survey methods are well established, the occurrence of spontaneous and leaderless protests has created new challenges for researchers. Not only does their unpredictable occurrence hinder planning, but their fluidity also creates problems in obtaining representative samples. This article addresses these challenges based on our research during Hong Kong’s Anti-Extradition Law Amendment Bill Movement. We propose a mixed-mode sampling method combining face-to-face surveys and smartphone-based online surveys (onsite and post hoc), which can maximize sample sizes while ensuring representativeness in a cost-effective manner. Test results indicate that key variables from the survey modes are not statistically different in a consistent manner, except for age. Our findings show that mixed-mode sampling can better capture protesters’ characteristics in contemporary protests and is replicable in other contexts.
{"title":"Surveying Spontaneous Mass Protests: Mixed-mode Sampling and Field Methods","authors":"S. Yuen, Gary Tang, Francis L. F. Lee, Edmund W. Cheng","doi":"10.1177/00811750211071130","DOIUrl":"https://doi.org/10.1177/00811750211071130","journal":{"name":"Sociological Methodology"},"PeriodicalIF":3.0,"publicationDate":"2022-02-01","publicationTypes":"Journal Article"}
Pub Date: 2021-10-25 DOI: 10.1177/00811750211053370
Jeffrey L. Jensen, Daniel Karell, Cole Tanigawa-Lau, Nizar Habash, Mai Oudah, Dhia Fairus Shofia Fani
Computational methods have become widespread in the social sciences, but probabilistic language models remain relatively underused. We introduce language models to a general social science readership. First, we offer an accessible explanation of language models, detailing how they estimate the probability of a piece of language, such as a word or sentence, on the basis of the linguistic context. Second, we apply language models in an illustrative analysis to demonstrate the mechanics of using these models in social science research. The example application uses language models to classify names in a large administrative database; the classifications are then used to measure a sociologically important phenomenon: the spatial variation of religiosity. This application highlights several advantages of language models, including their effectiveness in classifying text that contains variation around the base structures, as is often the case with localized naming conventions and dialects. We conclude by discussing language models’ potential to contribute to sociological research beyond classification through their ability to generate language.
{"title":"Language Models in Sociological Research: An Application to Classifying Large Administrative Data and Measuring Religiosity","authors":"Jeffrey L. Jensen, Daniel Karell, Cole Tanigawa-Lau, Nizar Habash, Mai Oudah, Dhia Fairus Shofia Fani","doi":"10.1177/00811750211053370","DOIUrl":"https://doi.org/10.1177/00811750211053370","journal":{"name":"Sociological Methodology"},"PeriodicalIF":3.0,"publicationDate":"2021-10-25","publicationTypes":"Journal Article"}
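The abstract above describes language models as estimating the probability of a piece of language from its context and then using those probabilities to classify names. A toy version of that mechanism might look like the sketch below: one character-bigram model per class with add-one smoothing, and classification by the higher log-probability. This is only a sketch. The article's actual models and data are not specified here, and the class labels, training names, and alphabet size are invented for illustration.

```python
import math
from collections import Counter


class CharBigramModel:
    """Character-bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, names, alphabet_size=28):
        # alphabet_size is an assumption: 26 letters plus start/end markers.
        self.alphabet_size = alphabet_size
        self.bigrams = Counter()
        self.contexts = Counter()
        for name in names:
            padded = "^" + name.lower() + "$"
            for prev, nxt in zip(padded, padded[1:]):
                self.bigrams[(prev, nxt)] += 1
                self.contexts[prev] += 1

    def log_prob(self, name):
        """Sum of smoothed log P(next char | previous char) over the name."""
        padded = "^" + name.lower() + "$"
        total = 0.0
        for prev, nxt in zip(padded, padded[1:]):
            numerator = self.bigrams[(prev, nxt)] + 1
            denominator = self.contexts[prev] + self.alphabet_size
            total += math.log(numerator / denominator)
        return total


def classify(name, models):
    """Return the label whose model gives the name the highest log-probability."""
    return max(models, key=lambda label: models[label].log_prob(name))


# Hypothetical training names for two made-up classes.
models = {
    "A": CharBigramModel(["anna", "hanna", "joanna"]),
    "B": CharBigramModel(["otto", "bruno", "hugo"]),
}
label_1 = classify("susanna", models)
label_2 = classify("bruto", models)
```

Because the model scores character transitions instead of whole strings, a name that never appeared in training can still be classified by how closely its spelling patterns resemble each class, which is the kind of robustness to variation around base structures that the abstract highlights.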
Pub Date: 2021-09-28 DOI: 10.1177/00811750211046307
Ryan P. Thombs, Xiaorui Huang, Jared Berry Fitzgerald
Modeling asymmetric relationships is an emerging subject of interest among sociologists. York and Light advanced a method to estimate asymmetric models with panel data, which was further developed by Allison. However, little attention has been given to the large-N, large-T case, wherein autoregression, slope heterogeneity, and cross-sectional dependence are important issues to consider. The authors fill this gap by conducting Monte Carlo experiments comparing the bias and power of the fixed-effects estimator to a set of heterogeneous panel estimators. The authors find that dynamic misspecification can produce substantial biases in the coefficients. Furthermore, even when the dynamics are correctly specified, the fixed-effects estimator will produce inconsistent and unstable estimates of the long-run effects in the presence of slope heterogeneity. The authors demonstrate these findings by testing for directional asymmetry in the economic development–CO2 emissions relationship, a key question in macro sociology, using data for 66 countries from 1971 to 2015. The authors conclude with a set of methodological recommendations on modeling directional asymmetry.
{"title":"What Goes Up Might Not Come Down: Modeling Directional Asymmetry with Large-N, Large-T Data","authors":"Ryan P. Thombs, Xiaorui Huang, Jared Berry Fitzgerald","doi":"10.1177/00811750211046307","DOIUrl":"https://doi.org/10.1177/00811750211046307","journal":{"name":"Sociological Methodology"},"PeriodicalIF":3.0,"publicationDate":"2021-09-28","publicationTypes":"Journal Article"}
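One common way to set up a directional-asymmetry model, sketched here as an assumption and not as the specification used in the article above, is to split a predictor's period-to-period changes into separate increase and decrease components so that each can receive its own coefficient. The function names and the toy series are hypothetical.

```python
def split_changes(series):
    """Split first differences into increases (>= 0) and decreases (<= 0)."""
    diffs = [later - earlier for earlier, later in zip(series, series[1:])]
    increases = [max(d, 0.0) for d in diffs]
    decreases = [min(d, 0.0) for d in diffs]
    return increases, decreases


def cumulate(values):
    """Running sums, giving cumulative increases or decreases over time."""
    totals = []
    running = 0.0
    for v in values:
        running += v
        totals.append(running)
    return totals


# Toy series for one unit (made-up values of a predictor over four periods).
x = [10.0, 12.0, 11.0, 15.0]
increases, decreases = split_changes(x)
cum_increases = cumulate(increases)
cum_decreases = cumulate(decreases)
```

At every period the two components add back up to the original change, so a model that forces their coefficients to be equal reduces to the usual symmetric specification. Testing whether the coefficients differ is then a test of directional asymmetry.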
Sequence analysis (SA) has gained increasing interest in social sciences for the holistic analysis of life course and other longitudinal data. The usual approach is to construct sequences, calculate dissimilarities, group similar sequences with cluster analysis, and use cluster membership as a dependent or independent variable in a linear or nonlinear regression model. This approach may be problematic as the cluster memberships are assumed to be fixed known characteristics of the subjects in subsequent analysis. Furthermore, often it is more reasonable to assume that individual sequences are mixtures of multiple ideal types rather than equal members of some group. Failing to account for these issues may lead to wrong conclusions about the nature of the studied relationships. In this paper, we bring forward and discuss the problems of the "traditional" use of SA clusters and compare four approaches for different types of data. We conduct a simulation study and an empirical study, demonstrating the importance of considering how sequences and outcomes are related and the need to adjust the analysis accordingly. In many typical social science applications, the traditional approach is prone to result in wrong conclusions and so-called position-dependent approaches such as representativeness should be preferred.
{"title":"From sequences to variables – Rethinking the relationship between sequences and outcomes","authors":"S. Helske, Jouni Helske, Guilherme Kenji Chihaya","doi":"10.31235/osf.io/srxag","DOIUrl":"https://doi.org/10.31235/osf.io/srxag","journal":{"name":"Sociological Methodology"},"PeriodicalIF":3.0,"publicationDate":"2021-09-16","publicationTypes":"Journal Article"}
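The "traditional" pipeline described in the abstract above (construct sequences, compute dissimilarities, assign each sequence to one cluster) can be sketched in a few lines. The version below is a deliberate simplification: it uses Hamming distance and nearest-prototype assignment with prototypes given in advance, whereas applied work typically uses measures such as optimal matching and a proper clustering algorithm. The state codes, sequences, and prototypes are hypothetical.

```python
def hamming(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def dissimilarity_matrix(sequences):
    """Pairwise Hamming distances between all sequences."""
    return [[hamming(a, b) for b in sequences] for a in sequences]


def assign_to_nearest(sequences, prototypes):
    """Hard assignment: index of the closest prototype for each sequence."""
    return [min(range(len(prototypes)),
                key=lambda k: hamming(seq, prototypes[k]))
            for seq in sequences]


# Toy yearly states (made up): E = employed, U = unemployed, S = studying.
sequences = ["EEUU", "EEEU", "SSSS"]
prototypes = ["EEEE", "SSSS"]

distances = dissimilarity_matrix(sequences)
clusters = assign_to_nearest(sequences, prototypes)
```

The hard assignment is where the issue raised in the abstract shows up: "EEUU" and "EEEU" land in the same cluster even though they sit at different distances from its prototype, and that information is discarded once cluster membership is treated as a fixed, known characteristic in a later regression.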