Pub Date: 2019-07-23; DOI: 10.1177/0081175019862839
David Melamed, Michael Vuolo
In multilevel data, cross-classified data structures are common. For example, this occurs when individuals move to different regions in longitudinal data or students go to different secondary schools than their primary school peers. In both cases, the data structure is no longer fully nested. Estimating cross-classified multilevel models is computationally intensive, so researchers have used several shortcuts to decrease run time. We consider how these shortcuts affect parameter estimates. In particular, we compare parameter estimates from fully nested and cross-classified models using a series of Monte Carlo simulations. When the outcome is continuous, we identify systematic differences in estimated standard errors and some differences in the estimated variance components. When the outcome is binary, we also find differences in the estimated coefficients. Accordingly, we caution researchers to avoid fully nested model specifications when cross-classification exists but suggest some limited conditions under which parameter estimates are unlikely to be different.
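The distinction between fully nested and cross-classified structures can be checked directly on the grouping identifiers. The sketch below is illustrative only: the function name `is_nested` and the school IDs are hypothetical, and it does not reproduce the authors' simulations.

```python
from collections import defaultdict


def is_nested(lower_ids, upper_ids):
    """Return True if every lower-level unit maps to exactly one upper-level unit."""
    parents = defaultdict(set)
    for lower, upper in zip(lower_ids, upper_ids):
        parents[lower].add(upper)
    return all(len(p) == 1 for p in parents.values())


# Hypothetical records: one row per student, with primary and secondary school IDs.
primary = ["P1", "P1", "P2", "P2"]
nested_secondary = ["S1", "S1", "S2", "S2"]   # each primary school feeds one secondary school
crossed_secondary = ["S1", "S2", "S2", "S2"]  # P1 pupils split across S1 and S2

print(is_nested(primary, nested_secondary))
print(is_nested(primary, crossed_secondary))
```

When the check fails, primary and secondary schools are crossed rather than nested, which is the situation the abstract warns against modeling as fully nested.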
Assessing Differences between Nested and Cross-Classified Hierarchical Models. Sociological Methodology 49(1): 220-257. https://doi.org/10.1177/0081175019862839
Pub Date: 2019-07-22; DOI: 10.1177/0081175019862259
A. Sjölander, Y. Ning
The case-time-control design is a tool to control for measured, time-varying covariates that increase monotonically in time within each subject while also controlling for all unmeasured covariates that are constant within each subject across time. Until recently, the design was restricted to data with only two timepoints and a single binary covariate, or data with a binary exposure. Sjölander (2017) made an important extension that allows for an arbitrary number of timepoints and covariates and a nonbinary exposure. However, his estimation method requires fairly strong model assumptions, and it may create bias if these assumptions are violated. We propose a novel estimation method for the case-time-control design, which to a large extent relaxes the model assumptions in Sjölander (2017). We show in simulations that this estimation method performs well under a range of scenarios and gives consistent estimates when Sjölander’s estimation does not.
A General and Robust Estimation Method for the Case-Time-Control Design. Sociological Methodology 49(1): 349-365. https://doi.org/10.1177/0081175019862259
Pub Date: 2019-07-19; DOI: 10.1177/0081175019860244
Han Zhang, Jennifer Pan
Protest event analysis is an important method for the study of collective action and social movements and typically draws on traditional media reports as the data source. We introduce collective action from social media (CASM)—a system that uses convolutional neural networks on image data and recurrent neural networks with long short-term memory on text data in a two-stage classifier to identify social media posts about offline collective action. We implement CASM on Chinese social media data and identify more than 100,000 collective action events from 2010 to 2017 (CASM-China). We evaluate the performance of CASM through cross-validation, out-of-sample validation, and comparisons with other protest data sets. We assess the effect of online censorship and find it does not substantially limit our identification of events. Compared to other protest data sets, CASM-China identifies relatively more rural, land-related protests and relatively few collective action events related to ethnic and religious conflict.
CASM: A Deep-Learning Approach for Identifying Collective Action Events with Text and Image Data from Social Media. Sociological Methodology 49(1): 1-57. https://doi.org/10.1177/0081175019860244
Pub Date: 2019-06-20; DOI: 10.1177/0081175019852763
Trenton D. Mize, Long Doan, J. S. Long
Many research questions involve comparing predictions or effects across multiple models. For example, it may be of interest whether an independent variable’s effect changes after adding variables to a model. Or, it could be important to compare a variable’s effect on different outcomes or across different types of models. When doing this, marginal effects are a useful method for quantifying effects because they are in the natural metric of the dependent variable and they avoid identification problems when comparing regression coefficients across logit and probit models. Despite advances that make it possible to compute marginal effects for almost any model, there is no general method for comparing these effects across models. In this article, the authors provide a general framework for comparing predictions and marginal effects across models using seemingly unrelated estimation to combine estimates from multiple models, which allows tests of the equality of predictions and effects across models. The authors illustrate their method to compare nested models, to compare effects on different dependent or independent variables, to compare results from different samples or groups within one sample, and to assess results from different types of models.
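To make "the natural metric of the dependent variable" concrete, the following sketch computes an average marginal effect for a one-predictor logit model, expressed as a change in predicted probability. The coefficients and data are made up for illustration, and the cross-model testing step (seemingly unrelated estimation) described in the abstract is not implemented here.

```python
import math


def logistic(z):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def average_marginal_effect(intercept, slope, xs):
    """Average of dP/dx = slope * p * (1 - p) over the observed x values."""
    effects = []
    for x in xs:
        p = logistic(intercept + slope * x)
        effects.append(slope * p * (1.0 - p))
    return sum(effects) / len(effects)


# Hypothetical coefficients and predictor values (assumptions, not from the article).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ame = average_marginal_effect(intercept=0.0, slope=1.0, xs=xs)
print(round(ame, 4))
```

Because the result is a probability change rather than a log-odds coefficient, it does not depend on the arbitrary scaling that separates logit from probit coefficients.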
A General Framework for Comparing Predictions and Marginal Effects across Models. Sociological Methodology 49(1): 152-189. https://doi.org/10.1177/0081175019852763
Pub Date: 2019-06-18; DOI: 10.1177/0081175019852762
Jan Goldenstein, Philipp Poschmann
Social scientists have recently started discussing the utilization of text-mining tools as being fruitful for scaling inductively grounded close reading. We aim to progress in this direction and provide a contemporary contribution to the literature. By focusing on map analysis, we demonstrate the potential of text-mining tools for text analysis that approaches inductive but still formal in-depth analysis. We propose that a combination of text-mining tools addressing different layers of meaning facilitates a closer analysis of the dynamics of manifest and latent meanings than is currently acknowledged. To illustrate our approach, we combine grammatical parsing and topic modeling to operationalize communication structures within sentences and the semantic surroundings of these communication structures. We use a reliable and downloadable software application to analyze the dynamic interlacement of two layers of meaning over time. We do so by analyzing 15,371 newspaper articles on corporate responsibility published in the United States from 1950 to 2013.
Analyzing Meaning in Big Data: Performing a Map Analysis Using Grammatical Parsing and Topic Modeling. Sociological Methodology 23(1): 131-83. https://doi.org/10.1177/0081175019852762
Pub Date: 2019-05-09; DOI: 10.1177/0081175019842263
Nynke M. D. Niezink, T. Snijders, M. V. van Duijn
The dynamics of individual behavior are related to the dynamics of the social structures in which individuals are embedded. This implies that in order to study social mechanisms such as social selection or peer influence, we need to model the evolution of social networks and the attributes of network actors as interdependent processes. The stochastic actor-oriented model is a statistical approach to study network-attribute coevolution based on longitudinal data. In its standard specification, the coevolving actor attributes are assumed to be measured on an ordinal categorical scale. Continuous variables first need to be discretized to fit into such a modeling framework. This article presents an extension of the stochastic actor-oriented model that does away with this restriction by using a stochastic differential equation to model the evolution of a continuous attribute. We propose a measure for explained variance and give an interpretation of parameter sizes. The proposed method is illustrated by a study of the relationship between friendship, alcohol consumption, and self-esteem among adolescents.
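As a rough illustration of what modeling a continuous attribute with a stochastic differential equation involves, the sketch below simulates a generic linear SDE, dZ = (aZ + b) dt + sigma dW, with the Euler-Maruyama scheme. The drift form, parameter values, and function name are assumptions chosen for illustration; the article's model additionally ties the drift to the evolving network, which is not represented here.

```python
import math
import random


def euler_maruyama(z0, drift_a, drift_b, sigma, dt, n_steps, seed=0):
    """Simulate dZ = (drift_a * Z + drift_b) dt + sigma dW on a fixed time grid."""
    rng = random.Random(seed)
    z = z0
    path = [z]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        z = z + (drift_a * z + drift_b) * dt + sigma * dw
        path.append(z)
    return path


# Hypothetical parameters: mean reversion toward -drift_b / drift_a = 1.0.
noisy_path = euler_maruyama(z0=0.0, drift_a=-1.0, drift_b=1.0, sigma=0.3, dt=0.01, n_steps=1000)
quiet_path = euler_maruyama(z0=0.0, drift_a=-1.0, drift_b=1.0, sigma=0.0, dt=0.01, n_steps=1000)
print(len(noisy_path), round(quiet_path[-1], 3))
```

With `sigma=0.0` the path reduces to the deterministic drift and settles near 1.0, which is a convenient sanity check on the discretization.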
No Longer Discrete: Modeling the Dynamics of Social Networks and Continuous Behavior. Sociological Methodology 49(1): 295-340. https://doi.org/10.1177/0081175019842263
Pub Date: 2018-11-15; DOI: 10.1177/0081175018809150
Marco Giesselmann, Alexander W. Schmidt-Catran
Multilevel models with persons nested in countries are increasingly popular in cross-country research. Recently, social scientists have started to analyze data with a three-level structure: persons at level 1, nested in year-specific country samples at level 2, nested in countries at level 3. By using a country fixed-effects estimator, or an alternative equivalent specification in a random-effects framework, this structure is increasingly used to estimate within-country effects in order to control for unobserved heterogeneity. For the main effects of country-level characteristics, such estimators have been shown to have desirable statistical properties. However, estimators of cross-level interactions in these models are not exhibiting these attractive properties: as algebraic transformations show, they are not independent of between-country variation and thus carry country-specific heterogeneity. Monte Carlo experiments consistently reveal the standard approaches to within estimation to provide biased estimates of cross-level interactions in the presence of an unobserved correlated moderator at the country level. To obtain an unbiased within-country estimator of a cross-level interaction, effect heterogeneity must be systematically controlled. By replicating a published analysis, we demonstrate the relevance of this extended country fixed-effects estimator in research practice. The intent of this article is to provide advice for multilevel practitioners, who will be increasingly confronted with the availability of pooled cross-sectional survey data.
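The basic within (country fixed-effects) transformation for a main effect can be sketched as follows: demeaning both variables by country removes any country-constant heterogeneity. The data are invented and noise-free, and the example covers only the standard main-effect case; it does not implement the extended estimator for cross-level interactions that the article proposes.

```python
def demean_by_group(values, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: true slope is 2.0, plus an unobserved country-level shift.
countries = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
country_effect = {"A": 10.0, "B": -10.0}
y = [2.0 * xi + country_effect[c] for xi, c in zip(x, countries)]

pooled = ols_slope(x, y)
within = ols_slope(demean_by_group(x, countries), demean_by_group(y, countries))
print(round(pooled, 3), round(within, 3))
```

Here the pooled slope is badly distorted by the country shifts, while the within slope recovers the true value; the article's point is that this protection does not automatically carry over to cross-level interaction terms.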
Getting the Within Estimator of Cross-Level Interactions in Multilevel Models with Pooled Cross-Sections: Why Country Dummies (Sometimes) Do Not Do the Job. Sociological Methodology 49(1): 190-219. https://doi.org/10.1177/0081175018809150
Pub Date: 2018-08-01; DOI: 10.1177/0081175018794489
O. Vassend
1. The Bayesian information criterion (BIC) has been proposed as a way to carry out Bayesian hypothesis testing when there are no clear expectations. However, the BIC rests on a particular prior distribution, for which there is rarely any justification. See Raftery (1995) on the case for the BIC and Weakliem (1999) for a critique.
2. The assumption that the sample is of the same size is important. To obtain the expected prediction error in a sample of arbitrary size, it is necessary to know the true model. Consequently, there is no method of model selection that uniformly leads to better out-of-sample predictions.
3. Schultz proposes that the value should be exp(AIC2 – AIC1), or about .0025 in this example. I think this is mistaken, and it should be exp{(AIC2 – AIC1)/2}. The general point about considering the theoretical probability of a nonzero value applies regardless of which formula is correct.
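The arithmetic behind the third note can be checked with a short sketch. The AIC difference is not stated in the text, so it is backed out here from the quoted value of about .0025, which is an assumption; the function name is likewise illustrative.

```python
import math


def relative_likelihood(aic_1, aic_2):
    """Halved form from the note: exp((AIC2 - AIC1) / 2)."""
    return math.exp((aic_2 - aic_1) / 2.0)


# Assumption: the AIC difference implied by exp(AIC2 - AIC1) = .0025.
diff = math.log(0.0025)

unhalved = math.exp(diff)                # the formula attributed to Schultz
halved = relative_likelihood(0.0, diff)  # the formula the note argues for

print(round(diff, 2), round(unhalved, 4), round(halved, 4))
```

Under that assumption the implied difference is roughly -6, and halving the exponent changes the value from about .0025 to about .05, the square root of the original figure.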
Comment: The Inferential Information Criterion from a Bayesian Point of View. Sociological Methodology 48(1): 91-97. https://doi.org/10.1177/0081175018794489