Supervised machine learning for exploratory analysis in family research

Journal of Marriage and Family (impact factor 2.7; JCR Q1, Family Studies; ranking zone 1, Sociology). Published: 2024-02-14. DOI: 10.1111/jomf.12973
Xiaoran Sun
Volume 86, Issue 5, pages 1468–1494.

Abstract

Objective

This article introduces supervised machine learning (ML) for conducting exploratory, discovery-oriented family research in a transparent and systematic way.

Background

Supervised ML can examine large numbers of variables simultaneously, identify key predictors, and explore patterns among predictors. This approach may help address concerns in family research about a lack of theoretical specificity and the prevalence of unguided exploratory analysis.
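To make "identifying key predictors among many variables" concrete, here is a minimal plain-Python sketch of Lasso regression (one of the models named in the Method) fitted by cyclic coordinate descent. Everything in it is an assumption for illustration: the data are simulated (20 hypothetical candidate predictors, of which only the first three affect the outcome), the penalty `lam` is an arbitrary value rather than one tuned by cross-validation, and the function names are hypothetical. It is not the article's code. The L1 penalty shrinks coefficients and sets those of uninformative predictors to exactly zero, which is what makes Lasso usable as a variable-selection step.

```python
import random
import statistics


def standardize(columns):
    """Center each column and scale it to unit population variance."""
    out = []
    for col in columns:
        mu = statistics.fmean(col)
        sd = statistics.pstdev(col)
        out.append([(v - mu) / sd for v in col])
    return out


def soft_threshold(z, lam):
    """Lasso shrinkage operator: move z toward 0 by lam, clipping at 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(columns, y, lam, sweeps=200, tol=1e-10):
    """Minimize (1/2n) * sum((y - Xb)^2) + lam * sum(|b|) by coordinate descent.

    Assumes standardized columns (mean 0, (1/n) * sum(x^2) = 1) and centered y.
    """
    n, p = len(y), len(columns)
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta; beta starts at zero
    for _ in range(sweeps):
        max_step = 0.0
        for j, col in enumerate(columns):
            # Correlation of feature j with the residual that leaves j out.
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            step = new - beta[j]
            if step:
                # Update the residual in place of recomputing X @ beta.
                resid = [r - step * c for r, c in zip(resid, col)]
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return beta


# Hypothetical data: 20 candidate predictors, only the first three matter.
rng = random.Random(42)
n, p = 500, 20
X = standardize([[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)])
y = [3 * X[0][i] - 2 * X[1][i] + 1.5 * X[2][i] + rng.gauss(0, 1) for i in range(n)]
y_mean = statistics.fmean(y)
y = [v - y_mean for v in y]

beta = lasso_cd(X, y, lam=0.3)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected predictors:", selected)
```

The signs of the surviving coefficients also carry the direction of each association, while their magnitudes are biased toward zero by `lam`; that bias is the price paid for the sparsity.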

Method

Following an overview of supervised ML, example analyses drew on the National Longitudinal Study of Adolescent Health (Add Health) dataset across Waves I–IV (N = 5,114 adolescents; 50.53% female; M_age = 15.94, SD = 1.77 at Wave I). Based on 143 articles that used Add Health Waves I–IV, 62 adolescent family variables from eight domains (e.g., socioeconomics, parenting, health) were identified as predictors of educational attainment in young adulthood (ages 24–32). After benchmark regression models were fitted, ML models were trained using Lasso regression, a decision tree, a random forest, and extreme gradient boosting; the models were tested on data held out from training and interpreted with SHapley Additive exPlanations (SHAP).
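The key methodological step here is that models are fitted on training cases and judged on cases they never saw. Below is a minimal plain-Python sketch of that workflow under simplifying assumptions: a single simulated predictor with a threshold effect (hypothetical data), a least-squares line standing in for the benchmark regression, and a depth-limited regression tree standing in for the tree-based models. Test-set R² is computed as 1 − SS_res / SS_tot. The 75/25 split, tree depth, `min_leaf`, and all function names are illustrative choices, not the article's settings.

```python
import random
import statistics


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_linear(x, y):
    """Benchmark: one-predictor least-squares line."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return lambda v: my + slope * (v - mx)


def fit_tree(x, y, depth=2, min_leaf=10):
    """Regression tree on one predictor: greedy splits minimizing squared error."""
    mean = statistics.fmean(y)
    if depth == 0 or len(y) < 2 * min_leaf:
        return lambda v: mean
    pairs = sorted(zip(x, y))
    xs, ys = [a for a, _ in pairs], [b for _, b in pairs]
    n, total, total_sq = len(ys), sum(ys), sum(b * b for b in ys)
    best, left_sum, left_sq = None, 0.0, 0.0
    for i in range(1, n):
        # Left node holds ys[:i]; keep running sums so each split is O(1).
        left_sum += ys[i - 1]
        left_sq += ys[i - 1] ** 2
        if i < min_leaf or n - i < min_leaf or xs[i] == xs[i - 1]:
            continue
        right_sum, right_sq = total - left_sum, total_sq - left_sq
        # Within-node SSE = sum(y^2) - (sum y)^2 / count, summed over both nodes.
        sse = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if best is None or sse < best[0]:
            best = (sse, (xs[i - 1] + xs[i]) / 2, i)
    if best is None:
        return lambda v: mean
    _, threshold, i = best
    left = fit_tree(xs[:i], ys[:i], depth - 1, min_leaf)
    right = fit_tree(xs[i:], ys[i:], depth - 1, min_leaf)
    return lambda v: left(v) if v <= threshold else right(v)


# Hypothetical data with a threshold effect that a straight line cannot capture.
rng = random.Random(0)
x = [rng.random() for _ in range(600)]
y = [(2.0 if v > 0.5 else 0.0) + rng.gauss(0, 0.5) for v in x]

# Hold out 25% of cases; neither model sees them during fitting.
idx = list(range(len(x)))
rng.shuffle(idx)
cut = int(0.75 * len(idx))
x_tr, y_tr = [x[i] for i in idx[:cut]], [y[i] for i in idx[:cut]]
x_te, y_te = [x[i] for i in idx[cut:]], [y[i] for i in idx[cut:]]

linear = fit_linear(x_tr, y_tr)
tree = fit_tree(x_tr, y_tr, depth=2)
r2_linear = r2_score(y_te, [linear(v) for v in x_te])
r2_tree = r2_score(y_te, [tree(v) for v in x_te])
print(f"test R2  linear: {r2_linear:.3f}  tree: {r2_tree:.3f}")
```

Because R² is computed on held-out cases, it rewards patterns that generalize rather than ones that merely fit the training sample, and it can even fall below zero for a model that predicts worse than the test-set mean.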

Results

The random forest model performed best (R² = .382 for the model with all predictors), and 14 variables were identified as key predictors of educational attainment. Patterns among these predictors emerged, including directionality, nonlinearity, and interactions.
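SHAP attributes each individual prediction to the predictors using Shapley values from cooperative game theory. The sketch below computes those values exactly by enumerating every coalition of predictors. The "value" of a coalition is taken, as an assumption, to be the model's average prediction when the coalition's predictors are fixed at the case's values and the remaining predictors come from a background sample (an interventional definition). The model, case, and background data are hypothetical. The SHAP library provides much faster algorithms (for example, for tree ensembles) than this 2^p enumeration, so treat this as a conceptual sketch rather than the article's implementation.

```python
from itertools import combinations
from math import factorial
import statistics


def shapley_values(f, x, background):
    """Exact interventional Shapley values for one case x (brute force).

    v(S) = mean over background rows b of f(z), where z takes x's values for
    features in S and b's values otherwise. Cost grows as 2^p.
    """
    p = len(x)

    def value(coalition):
        preds = []
        for b in background:
            z = [x[j] if j in coalition else b[j] for j in range(p)]
            preds.append(f(z))
        return statistics.fmean(preds)

    phi = [0.0] * p
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p! for coalitions of this size.
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


# Hypothetical model: a main effect of x0, an x0*x1 interaction, and an unused x2.
def model(z):
    return 2 * z[0] + z[0] * z[1]


x = [1, 3, 5]
phi = shapley_values(model, x, background=[[0, 0, 0]])
print([round(v, 6) for v in phi])  # [3.5, 1.5, 0.0]
```

The attributions sum to the case's prediction minus the average background prediction (here 5 − 0). The main effect of x0 (2) is credited to x0, the interaction term (3) is split evenly between x0 and x1, and the unused x2 gets zero. Signs of such attributions indicate direction, and plotting them against predictor values is one way patterns such as nonlinearity and interactions can be inspected.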

Conclusions

Exploratory research using supervised ML can inform further confirmatory analyses and advance theory.
