Sylvia EK Sudat, Elizabeth J Carlton, Edmund YW Seto, Robert C Spear, Alan E Hubbard
Epidemiologic perspectives & innovations : EP+I, volume 7, page 3. Journal Article, published 2010-07-14. DOI: 10.1186/1742-5573-7-3 (https://doi.org/10.1186/1742-5573-7-3). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2913928/pdf/
Using variable importance measures from causal inference to rank risk factors of schistosomiasis infection in a rural setting in China.
Background: Schistosomiasis infection, contracted through contact with contaminated water, is a global public health concern. In this paper we analyze data from a retrospective study reporting water contact and schistosomiasis infection status among 1011 individuals in rural China. We present semi-parametric methods for identifying risk factors through a comparison of three analysis approaches: a prediction-focused machine learning algorithm, a simple main-effects multivariable regression, and a semi-parametric variable importance (VI) estimate inspired by a causal population intervention parameter.
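To make the idea of a population intervention parameter concrete, the following is a minimal sketch of a stratified plug-in (G-computation) estimate of the ratio between the prevalence expected if nobody were exposed and the observed prevalence. It is a deliberately simplified stand-in, not the semi-parametric estimator used in the paper. The function name `population_intervention_ratio`, the single discrete confounder `w`, and the toy records are all hypothetical and are not study data.

```python
from collections import defaultdict


def population_intervention_ratio(records):
    """Plug-in estimate of E_W[P(Y=1 | A=0, W)] / P(Y=1).

    records: iterable of (a, w, y) tuples with binary exposure a,
    a discrete confounder w, and binary outcome y.
    Assumption: every stratum of w contains at least one unexposed
    person (positivity); otherwise a KeyError is raised.
    """
    records = list(records)
    n = len(records)

    # Outcome risk among the unexposed, estimated within each stratum of w.
    unexposed_outcomes = defaultdict(list)
    for a, w, y in records:
        if a == 0:
            unexposed_outcomes[w].append(y)
    risk_unexposed = {w: sum(ys) / len(ys) for w, ys in unexposed_outcomes.items()}

    # Average the unexposed risk over the confounder distribution of everyone.
    counterfactual_prevalence = sum(risk_unexposed[w] for _, w, _ in records) / n
    observed_prevalence = sum(y for _, _, y in records) / n
    return counterfactual_prevalence / observed_prevalence


# Hypothetical toy data: (exposure, confounder stratum, infected).
TOY_RECORDS = (
    [(0, 0, 1)] + [(0, 0, 0)] * 3      # stratum 0, unexposed: 1 of 4 infected
    + [(1, 0, 1)] * 2 + [(1, 0, 0)] * 2  # stratum 0, exposed: 2 of 4 infected
    + [(0, 1, 1)] + [(0, 1, 0)]          # stratum 1, unexposed: 1 of 2 infected
    + [(1, 1, 1)] * 4 + [(1, 1, 0)] * 2  # stratum 1, exposed: 4 of 6 infected
)

TOY_RATIO = population_intervention_ratio(TOY_RECORDS)

if __name__ == "__main__":
    print(f"Estimated ratio on toy data: {TOY_RATIO:.2f}")
```

On this toy data a ratio below 1 would be read as the fraction of current prevalence expected to remain if the exposure were removed, provided the confounder adjustment is adequate.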
Results: The multivariable regression found only tool washing to be associated with the outcome, with a relative risk of 1.03 and a 95% confidence interval (CI) of 1.01-1.05. Three types of water contact were found to be associated with the outcome in the semi-parametric VI analysis: July water contact (VI estimate 0.16, 95% CI 0.11-0.22), water contact from tool washing (VI estimate 0.88, 95% CI 0.80-0.97), and water contact from rice planting (VI estimate 0.71, 95% CI 0.53-0.96). The July VI result, in particular, indicated a strong association with infection status: under a causal interpretation, eliminating water contact in July would reduce the prevalence of schistosomiasis in our study population by 84% (95% CI 78%-89%), that is, from 0.3 to 0.05.
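The arithmetic behind the July figures can be reproduced with a short sketch. It assumes, as the reported numbers suggest, that each VI estimate is the ratio of the prevalence expected without that water contact to the observed prevalence. The function name `implied_reduction` and the dictionary layout are illustrative choices; only the VI estimates, confidence limits, and the baseline prevalence of 0.3 come from the abstract.

```python
# Observed prevalence and VI estimates (estimate, lower, upper) from the abstract.
BASELINE_PREVALENCE = 0.3

VI_ESTIMATES = {
    "July water contact": (0.16, 0.11, 0.22),
    "tool washing": (0.88, 0.80, 0.97),
    "rice planting": (0.71, 0.53, 0.96),
}


def implied_reduction(vi, lower, upper, baseline=BASELINE_PREVALENCE):
    """Translate a VI ratio into an implied relative reduction in prevalence.

    Assumption: VI = (prevalence without the exposure) / (observed prevalence),
    so the relative reduction is 1 - VI. The confidence limits swap places
    because a larger ratio means a smaller reduction.
    """
    return {
        "reduction_pct": round((1 - vi) * 100),
        "reduction_ci_pct": (round((1 - upper) * 100), round((1 - lower) * 100)),
        "prevalence_after": round(baseline * vi, 2),
    }


SUMMARY = {name: implied_reduction(*values) for name, values in VI_ESTIMATES.items()}

if __name__ == "__main__":
    for name, result in SUMMARY.items():
        low, high = result["reduction_ci_pct"]
        print(
            f"{name}: {result['reduction_pct']}% reduction "
            f"(95% CI {low}%-{high}%), prevalence {BASELINE_PREVALENCE} -> "
            f"{result['prevalence_after']}"
        )
```

Applying the same conversion to tool washing and rice planting gives much smaller implied reductions, which is consistent with the abstract singling out the July result.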
Conclusions: The July VI estimate suggests possible within-season variability in schistosomiasis infection risk, an association not detected by the regression analysis. Though many limitations of this study temper the potential for causal interpretation, detecting a high-risk time period in something close to real time would open up new prevention options. Most importantly, we emphasize that traditional regression approaches are usually based on arbitrary pre-specified models, making their parameters difficult to interpret in the context of real-world applications. Our results support the practical application of analysis approaches that, in contrast, do not require arbitrary model pre-specification, estimate parameters that have simple public health interpretations, and apply inference that considers model selection as a source of variation.