Haiyan Bai, Xing Liu, F. Bai, Yuting Chen, Randyll Pandohie
Machine learning has become one of the important methods for processing big data. It has overcome limitations of traditional statistical models in dealing with high-dimensional data. The current study introduces and discusses how machine learning methods can be applied to high-dimensional education data and help increase model efficacy with such data. A demonstration of the implementation with an empirical data set is also provided.
{"title":"Machine Learning Method for High-Dimensional Education Data","authors":"Haiyan Bai, Xing Liu, F. Bai, Yuting Chen, Randyll Pandohie","doi":"10.2458/jmmss.5396","DOIUrl":"https://doi.org/10.2458/jmmss.5396","url":null,"abstract":"Machine learning has become one of the important methods to process big data. It has made a breakthrough in the limitations of traditional statistical models dealing with high-dimensional data. The current study is to introduce and discuss about how machine learning method can be implemented in high-dimensional education data and help with increasing the model efficacy in dealing with high-dimensional education data. A demonstration of the implementation with an empirical data set is also provided.","PeriodicalId":90602,"journal":{"name":"Journal of methods and measurement in the social sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43313113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Schwartz, Roland B. Stark, Elijah Biletch, Richard B. B. Stuart
Qualitative methods can enhance our understanding of constructs that have not been well portrayed and enable nuanced depiction of experience from study participants who have not been broadly studied. However, qualitative data require time and effort to train raters to achieve validity and reliability. This study compares recent advances in Natural Language Processing (NLP) models with human coding. This web-based study (N=1,253; 3,046 free-text entries, averaging 64 characters per entry) included people with Duchenne Muscular Dystrophy (DMD), their siblings, and a representative comparison group. Human raters (n=6) were trained over multiple sessions in content analysis as per a comprehensive codebook. Three prompts addressed distinct aspects of participants’ aspirations. Unsupervised NLP was implemented using Latent Dirichlet Allocation (LDA), which extracts latent topics across all the free-text entries. Supervised NLP was done using a Bidirectional Encoder Representations from Transformers (BERT) model, which requires training the algorithm to recognize relevant human-coded themes across free-text entries. We compared the human-, LDA-, and BERT-coded themes. The study sample contained 286 people with DMD, 355 DMD siblings, and 997 comparison participants, aged 8-69. Human coders generated 95 codes across the three prompts and had an average inter-rater reliability (Fleiss’s kappa) of 0.77, with minimal rater effect (pseudo R² = 4%). Compared to human coders, LDA did not yield easily interpretable themes. BERT correctly classified only 61-70% of the validation set. LDA and BERT required technical expertise to program and took approximately 1.15 minutes per open-text entry, compared to 1.18 minutes for human raters including training time. LDA and BERT provide potentially viable approaches to analyzing large-scale qualitative data, but both have limitations. When text entries are short, LDA yields latent topics that are hard to interpret. BERT accurately identified only about two thirds of new statements. Humans provided reliable and cost-effective coding in the web-based context. The upfront training enables BERT to process enormous quantities of text data in future work, which should examine NLP’s predictive accuracy given different quantities of training data.
{"title":"Comparing human coding to two natural language processing algorithms in aspirations of people affected by Duchenne Muscular Dystrophy","authors":"C. Schwartz, Roland B. Stark, Elijah Biletch, Richard B. B. Stuart","doi":"10.2458/jmmss.5397","DOIUrl":"https://doi.org/10.2458/jmmss.5397","url":null,"abstract":"Qualitative methods can enhance our understanding of constructs that have not been well portrayed and enable nuanced depiction of experience from study participants who have not been broadly studied. However, qualitative data require time and effort to train raters to achieve validity and reliability. This study compares recent advances in Natural Language Processing (NLP) models with human coding. This web-based study (N=1,253; 3,046 free-text entries, averaging 64 characters per entry) included people with Duchenne Muscular Dystrophy (DMD), their siblings, and a representative comparison group. Human raters (n=6) were trained over multiple sessions in content analysis as per a comprehensive codebook. Three prompts addressed distinct aspects of participants’ aspirations. Unsupervised NLP was implemented using Latent Dirichlet Allocation (LDA), which extracts latent topics across all the free-text entries. Supervised NLP was done using a Bidirectional Encoder Representations from Transformers (BERT) model, which requires training the algorithm to recognize relevant human-coded themes across free-text entries. We compared the human-, LDA-, and BERT-coded themes. Study sample contained 286 people with DMD, 355 DMD siblings, and 997 comparison participants, age 8-69. Human coders generated 95 codes across the three prompts and had an average inter-rater reliability (Fleiss’s kappa) of 0.77, with minimal rater-effect (pseudo R2=4%). Compared to human coders, LDA does not yield easily interpretable themes. BERT correctly classified only 61-70% of the validation set. LDA and BERT required technical expertise to program and took approximately 1.15 minutes per open-text entry, compared to 1.18 minutes for human raters including training time. LDA and BERT provide potentially viable approaches to analyzing large-scale qualitative data, but both have limitations. When text entries are short, LDA yields latent topics that are hard to interpret. BERT accurately identified only about two thirds of new statements. Humans provided reliable and cost-effective coding in the web-based context. The upfront training enables BERT to process enormous quantities of text data in future work, which should examine NLP’s predictive accuracy given different quantities of training data.","PeriodicalId":90602,"journal":{"name":"Journal of methods and measurement in the social sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48496402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
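The inter-rater reliability statistic named in the abstract above, Fleiss's kappa, can be illustrated with a minimal sketch. The function name `fleiss_kappa` and the toy ratings are hypothetical and are not taken from the study; the sketch assumes every item is coded by the same number of raters and that each rater assigns exactly one category per item.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss's kappa for a list of items, each a list of labels (one per rater).

    Assumes every item has the same number of raters (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]
    categories = sorted({label for item in ratings for label in item})

    # Observed agreement: for each item, the share of rater pairs that agree.
    per_item = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    p_expected = sum((sum(cnt[cat] for cnt in counts) / total) ** 2 for cat in categories)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")

    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example (hypothetical data): four text entries, three raters, codes "a" and "b".
toy = [["a", "a", "b"], ["b", "b", "b"], ["a", "a", "a"], ["a", "b", "b"]]
kappa = fleiss_kappa(toy)
print(round(kappa, 3))
```

Values near 1 indicate agreement well above chance, and values near 0 indicate agreement no better than chance.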
{"title":"Invitation for COVID-19 Submissions","authors":"E. Board","doi":"10.2458/jmmss.5395","DOIUrl":"https://doi.org/10.2458/jmmss.5395","url":null,"abstract":"Invitation from the Editor","PeriodicalId":90602,"journal":{"name":"Journal of methods and measurement in the social sciences","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42888446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A barrier that prevents many social scientists from pursuing big data research is the lack of technical training required to assemble and organize big data. To address this barrier, we provide an introductory tutorial on machine learning for social scientists by demonstrating the basic steps and fundamental concepts involved in binary classification. We first describe the data and libraries required for analysis. We then demonstrate data cleaning methods, feature engineering, the model-building process, model assessment, and feature importance. Last, we discuss the ways in which social scientists can use machine learning to complement inference-based approaches and how it can contribute to a richer understanding of social science.
{"title":"Binary Classification: An Introductory Machine Learning Tutorial for Social Scientists","authors":"Vivian P. Ta, Leonardo Carrico, Arthur Bousquet","doi":"10.2458/jmmss.5186","DOIUrl":"https://doi.org/10.2458/jmmss.5186","url":null,"abstract":"A barrier that prevents many social scientists from pursuing big data research is the lack of technical training required to assemble and organize big data. In an effort to address this barrier, we provide an introductory tutorial into machine learning for social scientists by demonstrating the basic steps and fundamental concepts involved in binary classification. We first describe the data and libraries required for analysis. We then demonstrate data cleaning methods, feature engineering, the model-building process, model assessment, and feature importance. Last, we discuss the ways in which social scientists can use machine learning to complement inference-based approaches and how it can contribute to a richer understanding of social science.","PeriodicalId":90602,"journal":{"name":"Journal of methods and measurement in the social sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45687724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
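The steps listed in the binary classification abstract above (data cleaning, feature engineering, model building, model assessment, feature importance) can be sketched end to end as follows. This is only an illustration under stated assumptions: the abstract does not name the tutorial's data set, libraries, or model, so the sketch uses synthetic data, the Python standard library, and a hand-written logistic regression; the names `rows`, `column_stats`, `scale`, and `fit` are hypothetical.

```python
import math
import random

random.seed(0)

# Hypothetical data: each row is [x0, x1, label]; x0 drives the label, x1 is pure noise.
rows = []
for i in range(200):
    x0, x1 = random.gauss(0, 1), random.gauss(0, 1)
    label = 1 if x0 + 0.3 * random.gauss(0, 1) > 0 else 0
    rows.append([x0, None if i % 20 == 0 else x1, label])  # inject some missing values

# 1. Data cleaning: drop rows that contain a missing value.
rows = [r for r in rows if None not in r]

# Hold out the last quarter of the rows for model assessment.
split = int(len(rows) * 0.75)
train, test = rows[:split], rows[split:]
y_train, y_test = [r[2] for r in train], [r[2] for r in test]


# 2. Feature engineering: standardize features using training-set statistics only.
def column_stats(X):
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return means, sds


def scale(X, means, sds):
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


means, sds = column_stats([r[:2] for r in train])
X_train = scale([r[:2] for r in train], means, sds)
X_test = scale([r[:2] for r in test], means, sds)


# 3. Model building: logistic regression fitted by batch gradient descent.
def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit(X, y, lr=0.1, epochs=500):
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


w, b = fit(X_train, y_train)

# 4. Model assessment: accuracy on the held-out rows.
preds = [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
         for xi in X_test]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)

# 5. Feature importance: absolute size of the standardized coefficients.
importance = sorted(zip(["x0", "x1"], (abs(wj) for wj in w)), key=lambda t: -t[1])

print("held-out accuracy:", round(accuracy, 2))
print("features by importance:", [name for name, _ in importance])
```

Standardizing with training-set statistics keeps information from the held-out rows out of the fitted model, which is why the scaler is computed before the test features are transformed.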
Guide for Contributors; Editorial Personnel; From the Editors; Reviewers
{"title":"Journal of Methods and Measurement in the Social Sciences","authors":"Editorial Board","doi":"10.2458/jmmss.5185","DOIUrl":"https://doi.org/10.2458/jmmss.5185","url":null,"abstract":"Guide for ContributorsEditorial PersonnelFrom the EditorsReviewers","PeriodicalId":90602,"journal":{"name":"Journal of methods and measurement in the social sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47373272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Figueredo, V. Smith-Castro, Mateo Peñaherrera-Aguirre
The present article describes the development of a Modern Biased Information Test (MBIT) inspired by the work published by Donald Campbell in 1950 on indirect measures of prejudice. A biased information test aims to tap individuals' intergroup attitudes from the selective information they use to describe group members. Two biased information tests were developed to measure ethnocentric and androcentric biases, respectively, and applied in four convenience samples of students from two different cultural settings (Costa Rica and the USA). The internal consistency for the accuracy indicators derived from both tests was acceptable and comparable across cultures. In contrast, the internal consistency for ethnocentric biases was adequate across samples and cultures, but the internal consistency for androcentric biases was unacceptable across both cultures. Results are discussed in terms of the usefulness of alternative measures for tapping implicit attitudes.
{"title":"The Modern Biased Information Test: Proposing alternatives for implicit measures","authors":"A. Figueredo, V. Smith-Castro, Mateo Peñaherrera-Aguirre","doi":"10.2458/jmmss.2966","DOIUrl":"https://doi.org/10.2458/jmmss.2966","url":null,"abstract":"The present article describes the development of a Modern Biased Information Test (MBIT) inspired by the work published by Donald Campbell in 1950 on indirect measures of prejudice. A biased information test aims to tap individuals' intergroup attitudes from the selective information they use to describe group members. Two biased information tests were developed to measure ethnocentric and androcentric biases, respectively, and applied in four convenience samples of students from two different cultural settings (Costa Rica and the USA). The internal consistency for the accuracy indicators derived from both tests was acceptable and comparable across cultures. In contrast, the internal consistency for ethnocentric biases was adequate across samples and cultures, but the internal consistency for androcentric biases was unacceptable across both cultures. Results are discussed in the line of the usefulness of alternative measures for tapping implicit attitudes.","PeriodicalId":90602,"journal":{"name":"Journal of methods and measurement in the social sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44242682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From the Editors","authors":"E. Board","doi":"10.2458/jmmss.3058","DOIUrl":"https://doi.org/10.2458/jmmss.3058","url":null,"abstract":"","PeriodicalId":90602,"journal":{"name":"Journal of methods and measurement in the social sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47586612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using an example from animal cognition, I argue that the problems of bias—inherent in choosing null hypotheses or setting Bayesian priors—can sometimes be avoided altogether by collecting more and better observational data before setting up tests of any sort.
{"title":"In Defense of Fishing","authors":"R. Byrne","doi":"10.2458/jmmss.3063","DOIUrl":"https://doi.org/10.2458/jmmss.3063","url":null,"abstract":"Using an example from animal cognition, I argue that the problems of bias—inherent in choosing null hypotheses or setting Bayesian priors—can sometimes be avoided altogether by collecting more and better observational data before setting up tests of any sort.","PeriodicalId":90602,"journal":{"name":"Journal of methods and measurement in the social sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44231235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The elder statesmen of psychology – such as Lewis Petrinovich, Donald Campbell, Samuel Messick, Kurt Lewin, and Paul Meehl – have been instructing on methodology for decades. But psychology seems to have a short memory and an aversion to becoming a cumulative science. [...] the work, the measured effects of [...] in specific environments and with specific populations [...] say that Lewis Petrinovich landed squarely on some of my pet peeves about research in psychology. [...] his treatise I find myself asking, [...] the field no memory? When are we going to learn to build a sound science that is cumulative? There are, however, some glimmers of [...]
{"title":"Echoes from the Past: Meaning in Measures, Environments, and Predictions","authors":"B. Krauss","doi":"10.2458/jmmss.3064","DOIUrl":"https://doi.org/10.2458/jmmss.3064","url":null,"abstract":"The elder statesmen of psychology – such as Lewis Petrinovich, Donald Campbell, Samuel Messick, Kurt Lewin, and Paul Meehl – have been instructing on methodology for decades. But psychology seems to have a short memory and an aversion to becoming a cumulative science. the work, the measured effects of in specific environments and with specific populations say that Lewis Petrinovich landed squarely on some of my pet peeves about research in psychology. his treatise I find myself asking, the field no memory? When are we going to learn to build a sound science that is cumulative? There are, however, some glimmers of","PeriodicalId":90602,"journal":{"name":"Journal of methods and measurement in the social sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46051396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Petrinovich’s target article focused on how behavioral science is done, including how it is often done wrong, and how it should be done. I identify another malign influence on behavioral science, which, so far as I know, has, until now, been ignored (I would be happy to be shown that I am wrong on this). To wit, the way that Introductions to papers are written creates a niche that can be exploited for the purposes of promoting one’s work to obtain resources or status, or for self-aggrandizement. I offer a few, probably wrongheaded, suggestions for ending this practice.
{"title":"Marvel Cinematic Universe Introductions","authors":"A. Weiss","doi":"10.2458/jmmss.3066","DOIUrl":"https://doi.org/10.2458/jmmss.3066","url":null,"abstract":"Petrinovich’s target article focused on how behavioral science is done, including how it is often done wrong, and how it should be done. I identify another malign influence on behavioral science, which, so far as I know, has, until now, been ignored (I would be happy to be shown that I am wrong on this). To wit, the way that Introductions to papers are written creates a niche that can be exploited for the purposes of promoting one’s work to obtain resources or status, or for self-aggrandizement. I offer a few, probably wrongheaded, suggestions for ending this practice.","PeriodicalId":90602,"journal":{"name":"Journal of methods and measurement in the social sciences","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43620828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}