Marina Heiden, Svend Erik Mathiassen, Jennifer Garza, Per Liv, Jens Wahlström
{"title":"A Comparison of Two Strategies for Building an Exposure Prediction Model.","authors":"Marina Heiden, Svend Erik Mathiassen, Jennifer Garza, Per Liv, Jens Wahlström","doi":"10.1093/annhyg/mev072","DOIUrl":null,"url":null,"abstract":"<p><p>Cost-efficient assessments of job exposures in large populations may be obtained from models in which 'true' exposures assessed by expensive measurement methods are estimated from easily accessible and cheap predictors. Typically, the models are built on the basis of a validation study comprising 'true' exposure data as well as an extensive collection of candidate predictors from questionnaires or company data, which cannot all be included in the models due to restrictions in the degrees of freedom available for modeling. In these situations, predictors need to be selected using procedures that can identify the best possible subset of predictors among the candidates. The present study compares two strategies for selecting a set of predictor variables. One strategy relies on stepwise hypothesis testing of associations between predictors and exposure, while the other uses cluster analysis to reduce the number of predictors without relying on empirical information about the measured exposure. Both strategies were applied to the same dataset on biomechanical exposure and candidate predictors among computer users, and they were compared in terms of identified predictors of exposure as well as the resulting model fit using bootstrapped resamples of the original data. The identified predictors were, to a large part, different between the two strategies, and the initial model fit was better for the stepwise testing strategy than for the clustering approach. Internal validation of the models using bootstrap resampling with fixed predictors revealed an equally reduced model fit in resampled datasets for both strategies. However, when predictor selection was incorporated in the validation procedure for the stepwise testing strategy, the model fit was reduced to the extent that both strategies showed similar model fit. Thus, the two strategies would both be expected to perform poorly with respect to predicting biomechanical exposure in other samples of computer users. </p>","PeriodicalId":8458,"journal":{"name":"Annals of Occupational Hygiene","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1093/annhyg/mev072","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of Occupational Hygiene","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/annhyg/mev072","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2015/9/30 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10
Abstract
Cost-efficient assessments of job exposures in large populations may be obtained from models in which 'true' exposures assessed by expensive measurement methods are estimated from easily accessible and cheap predictors. Typically, the models are built on the basis of a validation study comprising 'true' exposure data as well as an extensive collection of candidate predictors from questionnaires or company data, which cannot all be included in the models due to restrictions in the degrees of freedom available for modeling. In these situations, predictors need to be selected using procedures that can identify the best possible subset of predictors among the candidates. The present study compares two strategies for selecting a set of predictor variables. One strategy relies on stepwise hypothesis testing of associations between predictors and exposure, while the other uses cluster analysis to reduce the number of predictors without relying on empirical information about the measured exposure. Both strategies were applied to the same dataset on biomechanical exposure and candidate predictors among computer users, and they were compared in terms of identified predictors of exposure as well as the resulting model fit using bootstrapped resamples of the original data. The identified predictors were, to a large part, different between the two strategies, and the initial model fit was better for the stepwise testing strategy than for the clustering approach. Internal validation of the models using bootstrap resampling with fixed predictors revealed an equally reduced model fit in resampled datasets for both strategies. However, when predictor selection was incorporated in the validation procedure for the stepwise testing strategy, the model fit was reduced to the extent that both strategies showed similar model fit. Thus, the two strategies would both be expected to perform poorly with respect to predicting biomechanical exposure in other samples of computer users.