Pub Date: 2024-02-05. DOI: 10.1177/00811750231220950
Scott W. Duxbury
Mediation analysis is increasingly used in the social sciences. Extension to social network data, however, has proved difficult because statistical network models are formulated at a lower level of analysis (the dyad) than many outcomes of interest. This study introduces a general approach for micro-macro mediation analysis in social networks. The author defines the average mediated micro effect (AMME) as the indirect effect of a network selection process on an individual, group, or organizational outcome through its effect on an intervening network variable. The author shows that the AMME can be nonparametrically identified using a wide range of common statistical network and regression modeling strategies under the assumption of conditional independence among multiple mediators. Nonparametric and parametric algorithms are introduced to generically estimate the AMME in a multitude of research designs. The author illustrates the utility of the method with an applied example using cross-sectional National Longitudinal Study of Adolescent to Adult Health data to examine the friendship selection mechanisms that indirectly shape adolescent school performance through their effect on network structure.
Title: Micro-Macro Mediation Analysis in Social Networks (Sociological Methodology)
Pub Date: 2024-01-13. DOI: 10.1177/00811750231217734
M. Verhagen
Quantitative sociologists frequently use simple linear functional forms to estimate associations among variables. However, there is little guidance on whether such simple functional forms correctly reflect the underlying data-generating process. Incorrect model specification can lead to misspecification bias, and a lack of scrutiny of functional forms fosters interference of researcher degrees of freedom in sociological work. In this article, I propose a framework that uses flexible machine learning (ML) methods to provide an indication of the fit potential in a dataset containing the exact same covariates as a researcher’s hypothesized model. When this ML-based fit potential strongly outperforms the researcher’s self-hypothesized functional form, it implies a lack of complexity in the latter. Advances in the field of explainable AI, like the increasingly popular Shapley values, can be used to generate understanding into the ML model such that the researcher’s original functional form can be improved accordingly. The proposed framework aims to use ML beyond solely predictive questions, helping sociologists exploit the potential of ML to identify intricate patterns in data to specify better-fitting, interpretable models. I illustrate the proposed framework using a simulation and real-world examples.
Title: Incorporating Machine Learning into Sociological Model-Building (Sociological Methodology)
Pub Date: 2023-11-08. DOI: 10.1177/00811750231203218
Loring J. Thomas, Peng Huang, Xiaoshuang Iris Luo, John R. Hipp, Carter T. Butts
Geospatial population data are typically organized into nested hierarchies of areal units, in which each unit is a union of units at the next lower level. There is increasing interest in analyses at fine geographic detail, but these lowest rungs of the areal unit hierarchy are often incompletely tabulated because of cost, privacy, or other considerations. Here, the authors introduce a novel algorithm to impute crosstabs of up to three dimensions (e.g., race, ethnicity, and gender) from marginal data combined with data at higher levels of aggregation. This method exactly preserves the observed fine-grained marginals, while approximating higher-order correlations observed in more complete higher level data. The authors show how this approach can be used with U.S. census data via a case study involving differences in exposure to crime across demographic groups, showing that the imputation process introduces very little error into downstream analysis, while depicting social process at the more fine-grained level.
Title: Marginal-Preserving Imputation of Three-Way Array Data in Nested Structures, with Application to Small Areal Units (Sociological Methodology)
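To make the idea of fitting a table to known marginals concrete, the sketch below uses classical iterative proportional fitting (IPF) on a two-way table. This is a stand-in technique chosen for illustration, not the authors' algorithm: it works in two dimensions rather than three, returns fractional rather than integer counts, and matches the marginals only up to a numerical tolerance. The function name `ipf_2d` and all numbers are hypothetical.

```python
def ipf_2d(seed, row_targets, col_targets, iters=100, tol=1e-9):
    """Rescale a seed table until its row and column sums match the targets.

    seed        -- crosstab from a higher level of aggregation (assumed positive)
    row_targets -- observed fine-grained marginal for the first variable
    col_targets -- observed fine-grained marginal for the second variable
    """
    table = [[float(v) for v in row] for row in seed]
    for _ in range(iters):
        # Step 1: scale each row so that it sums to its target.
        for i, row in enumerate(table):
            s = sum(row)
            if s > 0:
                table[i] = [v * row_targets[i] / s for v in row]
        # Step 2: scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns are exact after step 2, so only the rows need checking.
        if all(abs(sum(r) - t) < tol for r, t in zip(table, row_targets)):
            break
    return table


# Hypothetical higher-level crosstab and small-area marginals (both sum to 10).
higher_level = [[40, 10], [20, 30]]
small_area = ipf_2d(higher_level, row_targets=[6, 4], col_targets=[5, 5])
for row in small_area:
    print([round(v, 3) for v in row])
```

The seed table supplies the association structure, while the targets pin down the marginals, which mirrors the division of labor described in the abstract.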
Pub Date: 2023-11-08. DOI: 10.1177/00811750231209040
Scott W. Duxbury
How do individuals’ network selection decisions create unique network structures? Despite broad sociological interest in the micro-level social interactions that create macro-level network structure, few methods are available to statistically evaluate micro-macro relationships in social networks. This study introduces a general methodological framework for testing the effect of (micro) network selection processes, such as homophily, reciprocity, or preferential attachment, on unique (macro) network structures, such as segregation, clustering, or brokerage. The approach uses estimates from a statistical network model to decompose the contributions of each parameter to a node, subgraph, or global network statistic specified by the researcher. A flexible parametric algorithm is introduced to estimate variances, confidence intervals, and p values. Prior micro-macro network methods can be regarded as special cases of the general framework. Extensions to hypothetical network interventions, joint parameter tests, and longitudinal and multilevel network data are discussed. An example is provided analyzing the micro foundations of political segregation in a crime policy collaboration network.
Title: Micro Effects on Macro Structure in Social Networks (Sociological Methodology)
Pub Date: 2023-09-07. DOI: 10.1177/00811750231195338
Kenneth R. Hanson, Nicholas Theis
Researchers can use data visualization techniques to explore, analyze, and present data in new ways. Although quantitative data are visualized most often, recent innovations have brought attention to the potential benefits of visualizing qualitative data. In this article, the authors demonstrate one way researchers can use networks to analyze and present ethnographic interview data. The authors suggest that because many respondents know one another in ethnographic research, networks are a useful tool for analyzing the implications of respondents’ familiarity with one another. Moreover, respondents often share familiar cultural references that can be visualized. The authors show how visualizing respondents’ ties in conjunction with their shared cultural references sheds light on the different systems of meaning that respondents within a field site use to make sense of the social phenomena under investigation.
Title: Networked Participants, Networked Meanings: Using Networks to Visualize Ethnographic Data (Sociological Methodology)
Pub Date: 2023-09-05. DOI: 10.1177/00811750231193641
Donghui Wang, Yueqi Xie, Junming Huang
The use of pooled data from different repeated survey series to study long-term trends is handicapped by a measurement difficulty: different survey series often use different scales to measure the same attitude and thus generate scale-incomparable data. In this article, the authors propose the latent attitude method (LAM) to address this scale-incomparability problem, on the basis of the assumption that attitudes measured by ordinal categories reflect a latent attitude with cut points. The method extends the latent variable method in the case of a single survey series to the case of multiple survey series and leverages overlapping years for identification. The authors first assess the validity of the method with simulated data. The results show that the method yields accurate estimates of mean attitudes and cut point values. The authors then apply the method to an empirical study of Americans’ attitudes toward China from 1974 to 2019.
Title: Trend Analysis with Pooled Data from Different Survey Series: The Latent Attitude Method (Sociological Methodology)
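The assumption that ordinal answers reflect a latent attitude divided by cut points can be illustrated with a few lines of code. The sketch below assumes a normally distributed latent attitude with unit standard deviation, which the abstract does not state, and uses made-up cut points for a 3-point and a 5-point scale. It shows only the measurement idea, not the estimation procedure of the LAM itself. The function name `category_probs` is hypothetical.

```python
from statistics import NormalDist


def category_probs(latent_mean, cut_points, sd=1.0):
    """Probability of each ordinal category given a latent mean and cut points.

    cut_points must be sorted in increasing order; k cut points give k + 1
    categories. Normality of the latent attitude is an assumption made here.
    """
    dist = NormalDist(mu=latent_mean, sigma=sd)
    cdf = [0.0] + [dist.cdf(c) for c in cut_points] + [1.0]
    # Each category's probability is the mass between adjacent cut points.
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


# The same latent attitude measured on two hypothetical scales.
three_point = category_probs(0.3, [-0.5, 0.5])
five_point = category_probs(0.3, [-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in three_point])
print([round(p, 3) for p in five_point])
```

Because both scales are driven by the same latent mean, responses that look incomparable on the surface carry information about a common quantity, which is what makes pooling across survey series possible.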
Pub Date: 2023-08-01. DOI: 10.1177/00811750231163832
Lisa Avery, Michael Rotondi
Respondent-driven sampling (RDS) is used to measure trait or disease prevalence in populations that are difficult to reach and often marginalized. The authors evaluated the performance of RDS estimators under varying conditions of trait prevalence, homophily, and relative activity. They used large simulated networks (N = 20,000) derived from real-world RDS degree reports and an empirical Facebook network (N = 22,470) to evaluate estimators of binary and categorical trait prevalence. Variability in prevalence estimates is higher when network degree is drawn from real-world samples than from the commonly assumed Poisson distribution, resulting in lower coverage rates. Newer estimators perform well when the sample is a substantive proportion of the population, but bias is present when the population size is unknown. The choice of preferred RDS estimator needs to be study specific, considering both statistical properties and knowledge of the population under study.
Title: Evaluation of Respondent-Driven Sampling Prevalence Estimators Using Real-World Reported Network Degree (Sociological Methodology). Open-access PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/23/b9/10.1177_00811750231163832.PMC10338697.pdf
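As an illustration of why reported network degree matters for RDS prevalence estimates, here is a minimal sketch of the widely used Volz-Heckathorn (RDS-II) estimator, which weights each respondent by the inverse of their reported degree. The abstract does not list which estimators were evaluated, so treating RDS-II as one of them is an assumption. The function name `rds_ii_prevalence` and the data are hypothetical.

```python
def rds_ii_prevalence(traits, degrees):
    """Inverse-degree-weighted prevalence of a binary trait (RDS-II).

    traits  -- 1 if the respondent has the trait, 0 otherwise
    degrees -- each respondent's reported network degree (must be positive)
    """
    if len(traits) != len(degrees):
        raise ValueError("traits and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("reported degrees must be positive")
    # Well-connected people are more likely to be recruited, so they are
    # down-weighted in proportion to their degree.
    weights = [1.0 / d for d in degrees]
    return sum(w * t for w, t in zip(weights, traits)) / sum(weights)


# Hypothetical sample in which people with the trait report higher degree.
traits = [1, 1, 0, 0]
degrees = [10, 20, 5, 5]
print("naive proportion:", sum(traits) / len(traits))
print("RDS-II estimate: ", round(rds_ii_prevalence(traits, degrees), 4))
```

Since the weights depend directly on reported degree, heavy-tailed or noisy degree reports translate into more variable estimates, consistent with the abstract's finding about real-world degree distributions.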
Pub Date: 2023-07-17. DOI: 10.1177/00811750231183711
S. Park, Suyeon Kang, Chioun Lee
Causal decomposition analysis is among the rapidly growing number of tools for identifying factors (“mediators”) that contribute to disparities in outcomes between social groups. An example of such mediators is college completion, which explains later health disparities between Black women and White men. The goal is to quantify how much a disparity would be reduced (or remain) if we hypothetically intervened to set the mediator distribution equal across social groups. Despite increasing interest in estimating disparity reduction and the disparity that remains, various estimation procedures are not straightforward, and researchers have scant guidance for choosing an optimal method. In this article, the authors evaluate the performance in terms of bias, variance, and coverage of three approaches that use different modeling strategies: (1) regression-based methods that impose restrictive modeling assumptions (e.g., linearity) and (2) weighting-based and (3) imputation-based methods that rely on the observed distribution of variables. The authors find a trade-off between the modeling assumptions required in the method and its performance. In terms of performance, regression-based methods operate best as long as the restrictive assumption of linearity is met. Methods relying on mediator models without imposing any modeling assumptions are sensitive to the ratio of the group-mediator association to the mediator-outcome association. These results highlight the importance of selecting an appropriate estimation procedure considering the data at hand.
Title: Choosing an Optimal Method for Causal Decomposition Analysis with Continuous Outcomes: A Review and Simulation Study (Sociological Methodology)
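The quantities named in the abstract, disparity reduction and disparity remaining, can be illustrated with a deliberately simplified plug-in calculation. The sketch assumes a binary mediator, no covariates, and no confounding, so it is not one of the three estimators the authors evaluate. It only shows how the initial disparity splits into two parts when the mediator distribution of one group is set equal to that of the reference group. The function name `decompose` and the toy records are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def decompose(records, group, reference):
    """Split a mean outcome gap into 'reduction' and 'remaining' parts.

    records -- iterable of (group_label, mediator, outcome), mediator in {0, 1}
    Assumes both mediator levels are observed in `group`.
    """
    def outcomes(g, m=None):
        return [y for gi, mi, y in records if gi == g and (m is None or mi == m)]

    def mediator_share(g, m):
        values = [mi for gi, mi, _ in records if gi == g]
        return sum(1 for mi in values if mi == m) / len(values)

    mean_group = mean(outcomes(group))
    mean_ref = mean(outcomes(reference))
    # Mean outcome in `group` if its mediator were distributed as in `reference`.
    counterfactual = sum(
        mean(outcomes(group, m)) * mediator_share(reference, m) for m in (0, 1)
    )
    return {
        "initial_disparity": mean_group - mean_ref,
        "disparity_reduction": mean_group - counterfactual,
        "disparity_remaining": counterfactual - mean_ref,
    }


# Toy data: (group, completed college, health score).
toy = [
    ("A", 1, 10), ("A", 0, 4), ("A", 0, 6), ("A", 0, 4),
    ("B", 1, 12), ("B", 1, 10), ("B", 1, 11), ("B", 0, 7),
]
for name, value in decompose(toy, group="A", reference="B").items():
    print(name, round(value, 3))
```

By construction the reduction and the remaining disparity add up to the initial disparity; the estimation approaches compared in the article differ in how they model the pieces of this calculation once covariates are involved.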
Pub Date: 2023-07-11. DOI: 10.1177/00811750231184460
O. Aksoy, S. Yıldırım
The flow of resources across nodes over time (e.g., migration, financial transfers, peer-to-peer interactions) is a common phenomenon in sociology. Standard statistical methods are inadequate to model such interdependent flows. We propose a hierarchical Dirichlet-multinomial regression model and a Bayesian estimation method. We apply the model to analyze 25,632,876 migration instances that took place between Turkey’s 81 provinces from 2009 to 2018. We then discuss the methodological and substantive implications of our results. Methodologically, we demonstrate the predictive advantage of our model compared to its most common alternative in migration research, the gravity model. We also discuss our model in the context of other approaches, mostly developed in the social networks literature. Substantively, we find that population, economic prosperity, the spatial and political distance between the origin and destination, the strength of the AKP (Justice and Development Party) in a province, and the network characteristics of the provinces are important predictors of migration, whereas the proportion of ethnic minority Kurds in a province has no positive association with in- and out-migration.
Title: A Model of Dynamic Flows: Explaining Turkey’s Interprovincial Migration (Sociological Methodology)
Pub Date: 2023-06-15. DOI: 10.1177/00811750231177026
Satu Helske, Jouni Helske, Guilherme K. Chihaya
Sequence analysis is increasingly used in the social sciences for the holistic analysis of life-course and other longitudinal data. The usual approach is to construct sequences, calculate dissimilarities, group similar sequences with cluster analysis, and use cluster membership as a dependent or independent variable in a regression model. This approach may be problematic, as cluster memberships are assumed to be fixed known characteristics of the subjects in subsequent analyses. Furthermore, it is often more reasonable to assume that individual sequences are mixtures of multiple ideal types rather than equal members of some group. Failing to account for uncertain and mixed memberships may lead to wrong conclusions about the nature of the studied relationships. In this article, the authors bring forward and discuss the problems of the “traditional” use of sequence analysis clusters as variables and compare four approaches for creating explanatory variables from sequence dissimilarities using different types of data. The authors conduct simulation and empirical studies, demonstrating the importance of considering how sequences and outcomes are related and the need to adjust analyses accordingly. In many typical social science applications, the traditional approach is prone to result in wrong conclusions, and similarity-based approaches such as representativeness should be preferred.
Title: From Sequences to Variables: Rethinking the Relationship between Sequences and Outcomes (Sociological Methodology)