Author: Xiaoran Sun
Journal: Journal of Marriage and Family, vol. 86, no. 5, pp. 1468-1494
Published: 2024-02-14
DOI: 10.1111/jomf.12973
URL: https://onlinelibrary.wiley.com/doi/10.1111/jomf.12973
Supervised machine learning for exploratory analysis in family research
Objective
This article introduces supervised machine learning (ML) for conducting exploratory, discovery-oriented family research in a transparent and systematic way.
Background
Supervised ML can examine large numbers of variables simultaneously, identify key predictors, and explore patterns among predictors. This approach may help address concerns in family research about a lack of theoretical specificity and the prevalence of unguided exploratory analysis.
Method
Following an overview of supervised ML, example analyses drew on the National Longitudinal Study of Adolescent Health (Add Health) dataset across Waves I–IV (N = 5114 adolescents, 50.53% female, M_age = 15.94, SD = 1.77 at Wave I). From 143 articles using Add Health data from Waves I through IV, 62 adolescent family variables from eight domains (e.g., socioeconomics, parenting, health) were identified as predictors of young adult (ages 24–32) educational attainment. Following benchmark regression models, ML models were trained using Lasso regression, decision tree, random forest, and extreme gradient boosting; each model was evaluated on test data held out from training and interpreted through SHapley Additive exPlanations (SHAP).
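The idea of testing a model separately from its training data can be sketched in a few lines. The example below is only an illustration under stated assumptions: the data are synthetic, the one-predictor least-squares fit is a stand-in for the benchmark and ML models named above, and the helper names (`train_test_split`, `fit_simple_ols`, `r_squared`) are hypothetical rather than taken from the article.

```python
import random

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

def fit_simple_ols(x, y):
    """Closed-form least squares with one predictor; returns (intercept, slope)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def train_test_split(rows, test_fraction, seed):
    """Shuffle a copy of rows and hold out the first n_test rows for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Synthetic (predictor, outcome) pairs; the values are made up for illustration.
rng = random.Random(0)
rows = []
for _ in range(500):
    x = rng.gauss(0, 1)
    rows.append((x, 2.0 + 0.8 * x + rng.gauss(0, 1)))

# Fit on the training rows only, then score on rows the model never saw.
train, test = train_test_split(rows, test_fraction=0.3, seed=1)
intercept, slope = fit_simple_ols([r[0] for r in train], [r[1] for r in train])
test_pred = [intercept + slope * r[0] for r in test]
test_r2 = r_squared([r[1] for r in test], test_pred)
print(f"held-out R^2 = {test_r2:.3f}")
```

Scoring on held-out rows guards against a flexible model looking good only because it memorized the training sample, which matters most for tree-based learners such as random forests.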
Results
The random forest model performed best (R² = .382 for the model with all the predictors), and 14 variables were identified as key predictors of educational attainment. Patterns among these predictors, including directionality, nonlinearity, and interactions, emerged.
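How Shapley-based attributions can surface directionality and interactions may be easier to see in a toy case. The sketch below computes exact Shapley values by enumerating feature coalitions. It is a minimal illustration, not the article's analysis: `toy_model`, its coefficients, and the feature names are hypothetical, and SHAP implementations typically rely on faster algorithms than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution of predict(instance) - predict(baseline).

    Features outside a coalition are set to their baseline value.
    """
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_model(x):
    """Hypothetical prediction rule with one interaction term."""
    income, warmth, conflict = x
    return 12.0 + 1.0 * income + 0.5 * income * warmth - 0.7 * conflict

baseline = [0.0, 0.0, 0.0]   # reference case
instance = [2.0, 1.0, 1.0]   # case being explained
phi = shapley_values(toy_model, instance, baseline)
print([round(v, 3) for v in phi])  # → [2.5, 0.5, -0.7]
```

The signs give direction (conflict lowers the prediction), and the interaction term's contribution of 1.0 is split evenly between income and warmth. The three values sum to 2.3, the gap between the prediction for the instance (14.3) and for the baseline (12.0).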
Conclusions
Supervised ML research can be used to inform further confirmatory analyses and advance theory.
About the journal:
For more than 70 years, Journal of Marriage and Family (JMF) has been a leading research journal in the family field. JMF features original research and theory, research interpretation and reviews, and critical discussion concerning all aspects of marriage, other forms of close relationships, and families. In 2009, an institutional subscription to Journal of Marriage and Family includes a subscription to Family Relations and Journal of Family Theory & Review.