Propensity Score Estimation Using Classification and Regression Trees in the Presence of Missing Covariate Data

Epidemiologic Methods (JCR Q3, Mathematics) | Pub Date: 2018-07-25 | DOI: 10.1515/em-2017-0020
Bas B L Penning de Vries, M. van Smeden, R. Groenwold
Citations: 8

Abstract

Data mining and machine learning techniques such as classification and regression trees (CART) represent a promising alternative to conventional logistic regression for propensity score estimation. Whereas incomplete data preclude the fitting of a logistic regression on all subjects, CART is appealing in part because some implementations allow incomplete records to be incorporated in the tree fitting and provide propensity score estimates for all subjects. Based on theoretical considerations, we argue that the automatic handling of missing data by CART may nevertheless not be appropriate. Using a series of simulation experiments, we examined the performance of different approaches to handling missing covariate data: (i) applying the CART algorithm directly to the (partially) incomplete data, (ii) complete case analysis, and (iii) multiple imputation. Performance was assessed in terms of bias in estimating exposure-outcome effects among the exposed, standard error, mean squared error, and coverage. Applying the CART algorithm directly to incomplete data resulted in bias, even in scenarios where data were missing completely at random. Overall, multiple imputation followed by CART resulted in the best performance. Our study showed that automatic handling of missing data in CART can cause serious bias and does not outperform multiple imputation as a means to account for missing data.
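The abstract names four performance measures (bias, standard error, mean squared error, and coverage) without defining them. The sketch below shows one common way such measures are computed from simulation replicates. It is an illustration under stated assumptions, not the authors' code: the function name `summarize_performance`, the use of the empirical standard deviation of the estimates as the standard error, and the normal-approximation 95% interval (`z = 1.96`) are all assumptions made here.

```python
import random
import statistics


def summarize_performance(estimates, std_errors, true_effect, z=1.96):
    """Summarize simulation replicates of an effect estimator.

    estimates   -- one point estimate per simulated data set
    std_errors  -- the estimated standard error reported in each replicate
    true_effect -- the effect used to generate the data
    z           -- normal quantile for the interval (assumed: 1.96 for 95%)
    """
    n = len(estimates)
    # Bias: average estimate minus the true effect.
    bias = statistics.fmean(estimates) - true_effect
    # Empirical standard error: spread of the estimates across replicates.
    empirical_se = statistics.stdev(estimates)
    # Mean squared error: average squared distance from the true effect.
    mse = statistics.fmean([(e - true_effect) ** 2 for e in estimates])
    # Coverage: share of replicates whose interval contains the true effect.
    covered = sum(
        1
        for e, se in zip(estimates, std_errors)
        if e - z * se <= true_effect <= e + z * se
    )
    return {
        "bias": bias,
        "empirical_se": empirical_se,
        "mse": mse,
        "coverage": covered / n,
    }


if __name__ == "__main__":
    # Hypothetical replicates: an estimator that is off by 0.1 on average.
    random.seed(0)
    true_effect = 1.0
    estimates = [random.gauss(true_effect + 0.1, 0.2) for _ in range(1000)]
    std_errors = [0.2] * len(estimates)
    print(summarize_performance(estimates, std_errors, true_effect))
```

Read this way, a biased approach shows up as a nonzero `bias` and, typically, coverage below the nominal level, which is the pattern the abstract reports for applying CART directly to incomplete data.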
Source journal
Epidemiologic Methods (Mathematics, Applied Mathematics)
CiteScore: 2.10
Self-citation rate: 0.00%
Articles published: 7
About the journal: Epidemiologic Methods (EM) seeks contributions comparable to those of the leading epidemiologic journals, but also invites papers that may be more technical or of greater length than what has traditionally been allowed by journals in epidemiology. Applications and examples with real data to illustrate methodology are strongly encouraged but not required. Topics: genetic epidemiology, infectious disease, pharmaco-epidemiology, ecologic studies, environmental exposures, screening, surveillance, social networks, comparative effectiveness, statistical modeling, causal inference, measurement error, study design, meta-analysis.