A Comparison of Two Strategies for Building an Exposure Prediction Model.

Annals of Occupational Hygiene, vol. 60, no. 1, pp. 74-89. Pub Date: 2016-01-01. Epub Date: 2015-09-30. DOI: 10.1093/annhyg/mev072

Marina Heiden, Svend Erik Mathiassen, Jennifer Garza, Per Liv, Jens Wahlström


Citations: 10

Abstract

Cost-efficient assessments of job exposures in large populations may be obtained from models in which 'true' exposures assessed by expensive measurement methods are estimated from easily accessible and cheap predictors. Typically, the models are built on the basis of a validation study comprising 'true' exposure data as well as an extensive collection of candidate predictors from questionnaires or company data, which cannot all be included in the models due to restrictions in the degrees of freedom available for modeling. In these situations, predictors need to be selected using procedures that can identify the best possible subset of predictors among the candidates. The present study compares two strategies for selecting a set of predictor variables. One strategy relies on stepwise hypothesis testing of associations between predictors and exposure, while the other uses cluster analysis to reduce the number of predictors without relying on empirical information about the measured exposure. Both strategies were applied to the same dataset on biomechanical exposure and candidate predictors among computer users, and they were compared in terms of identified predictors of exposure as well as the resulting model fit using bootstrapped resamples of the original data. The identified predictors were, to a large part, different between the two strategies, and the initial model fit was better for the stepwise testing strategy than for the clustering approach. Internal validation of the models using bootstrap resampling with fixed predictors revealed an equally reduced model fit in resampled datasets for both strategies. However, when predictor selection was incorporated in the validation procedure for the stepwise testing strategy, the model fit was reduced to the extent that both strategies showed similar model fit. Thus, the two strategies would both be expected to perform poorly with respect to predicting biomechanical exposure in other samples of computer users.
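The difference between validating with fixed predictors and validating with predictor selection repeated inside each bootstrap resample can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes ordinary least squares on synthetic data, and it swaps the stepwise hypothesis testing for a simple greedy forward selection on R^2. All function names, the sample size, and the number of candidate predictors are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def r_squared(X, y, coef):
    """R^2 of a linear model (intercept in coef[0]) evaluated on X, y."""
    pred = coef[0] + X @ coef[1:]
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def forward_select(X, y, max_predictors=3):
    """Greedy forward selection (a stand-in for stepwise testing):
    repeatedly add the predictor that raises the in-sample R^2 the most."""
    chosen = []
    for _ in range(max_predictors):
        best_j, best_r2 = None, -np.inf
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            cols = chosen + [j]
            r2 = r_squared(X[:, cols], y, fit_ols(X[:, cols], y))
            if r2 > best_r2:
                best_j, best_r2 = j, r2
        chosen.append(best_j)
    return chosen

def bootstrap_validate(X, y, cols, reselect, n_boot=100, seed=0):
    """Fit on bootstrap resamples, score on the original data, average R^2.
    With reselect=True the predictor selection is redone in every resample."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))  # sample with replacement
        Xb, yb = X[idx], y[idx]
        use = forward_select(Xb, yb) if reselect else cols
        coef = fit_ols(Xb[:, use], yb)
        scores.append(r_squared(X[:, use], y, coef))
    return float(np.mean(scores))

# Synthetic data (assumption): 60 subjects, 20 candidate predictors,
# only the first of which is truly related to the exposure.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 20))
y = 0.5 * X[:, 0] + rng.normal(size=60)

fixed_cols = forward_select(X, y)
apparent_r2 = r_squared(X[:, fixed_cols], y, fit_ols(X[:, fixed_cols], y))
fixed_r2 = bootstrap_validate(X, y, fixed_cols, reselect=False)
reselect_r2 = bootstrap_validate(X, y, fixed_cols, reselect=True)

print("apparent R^2:", round(apparent_r2, 3))
print("bootstrap R^2, fixed predictors:", round(fixed_r2, 3))
print("bootstrap R^2, selection repeated:", round(reselect_r2, 3))
```

Keeping the predictors fixed only captures the uncertainty in the coefficients, whereas repeating the selection in each resample also captures the instability of the selection step itself, which is the source of the additional loss of fit described in the abstract.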

