Epidemiologic perspectives & innovations : EP+I: latest articles (page 2)

Carcinogen metabolism, cigarette smoking, and breast cancer risk: a Bayes model averaging approach.

Epidemiologic perspectives & innovations : EP+I

Pub Date : 2010-11-16 DOI: 10.1186/1742-5573-7-10

Nadine Stephenson, Lars Beckmann, Jenny Chang-Claude

Background: Standard logistic regression with or without stepwise selection has the disadvantage of not incorporating model uncertainty and the dependency of estimates on the underlying model into the final inference. We explore the use of a Bayes Model Averaging approach as an alternative to analyze the influence of genetic variants, environmental effects and their interactions on disease.

Methods: Logistic regression with and without stepwise selection and Bayes Model Averaging were applied to a population-based case-control study exploring the association of genetic variants in tobacco smoke-related carcinogen pathways with breast cancer.

Results: Both regression and Bayes Model Averaging highlighted a significant effect of NAT1*10 on breast cancer, while regression analysis also suggested a significant effect for packyears and for the interaction of packyears and NAT2.

Conclusions: Bayes Model Averaging allows incorporation of model uncertainty, helps reduce dimensionality and avoids the problem of multiple comparisons. It can be used to incorporate biological information, such as pathway data, into the analysis. As with all Bayesian analysis methods, careful consideration must be given to prior specification.
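As a rough illustration of how model averaging folds model uncertainty into a single estimate, the sketch below weights each candidate model by an approximate posterior probability and then averages one coefficient across models. The BIC-based approximation with equal model priors, the three candidate models, and all numbers are hypothetical assumptions chosen for illustration; the abstract does not state which priors or software the authors used.

```python
import math

def bma_weights(bics):
    """Approximate posterior model probabilities from BIC values,
    assuming every model has the same prior probability."""
    best = min(bics)
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical candidate models: (BIC, log-odds coefficient for one variant).
# None means the variant is not included in that model.
models = [
    (1000.0, 0.40),
    (1001.0, 0.35),
    (1003.0, None),
]

weights = bma_weights([bic for bic, _ in models])

# Posterior probability that the variant belongs in the model.
inclusion_prob = sum(w for w, (_, beta) in zip(weights, models) if beta is not None)

# Model-averaged coefficient; models that exclude the variant contribute zero.
averaged_beta = sum(
    w * (beta if beta is not None else 0.0) for w, (_, beta) in zip(weights, models)
)

print(f"weights={weights}, inclusion={inclusion_prob:.3f}, beta={averaged_beta:.3f}")
```

The averaged coefficient is pulled toward zero in proportion to the weight of models that leave the variant out, which is one way the dependence of an estimate on the chosen model enters the final inference.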


Citations: 13

Categorisation of continuous risk factors in epidemiological publications: a survey of current practice.

Epidemiologic perspectives & innovations : EP+I

Pub Date : 2010-10-15 DOI: 10.1186/1742-5573-7-9

Elizabeth L Turner, Joanna E Dobson, Stuart J Pocock

Background: Reports of observational epidemiological studies often categorise (group) continuous risk factor (exposure) variables. However, there has been little systematic assessment of how categorisation is practiced or reported in the literature and no extended guidelines for the practice have been identified. Thus, we assessed the nature of such practice in the epidemiological literature. Two months (December 2007 and January 2008) of five epidemiological and five general medical journals were reviewed. All articles that examined the relationship between continuous risk factors and health outcomes were surveyed using a standard proforma, with the focus on the primary risk factor. Using the survey results we provide illustrative examples and, combined with ideas from the broader literature and from experience, we offer guidelines for good practice.

Results: Of the 254 articles reviewed, 58 were included in our survey. Categorisation occurred in 50 (86%) of them. Of those, 42% also analysed the variable continuously and 24% considered alternative groupings. Most (78%) used 3 to 5 groups. No articles relied solely on dichotomisation, although it did feature prominently in 3 articles. The choice of group boundaries varied: 34% used quantiles, 18% equally spaced categories, 12% external criteria, 34% other approaches and 2% did not describe the approach used. Categorical risk estimates were most commonly (66%) presented as pairwise comparisons to a reference group, usually the highest or lowest (79%). Reporting of categorical analysis was mostly in tables; only 20% in figures.

Conclusions: Categorical analyses of continuous risk factors are common. Accordingly, we provide recommendations for good practice. Key issues include pre-defining appropriate choice of groupings and analysis strategies, clear presentation of grouped findings in tables and figures, and drawing valid conclusions from categorical analyses, avoiding injudicious use of multiple alternative analyses.
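To make two of the boundary choices named in the results concrete (quantiles and equally spaced categories), the sketch below groups the same variable both ways. The exposure values, the choice of four groups, and the function names are hypothetical and are not taken from the survey.

```python
import statistics

def quantile_cutpoints(values, groups):
    """Boundaries that place roughly equal numbers of subjects in each group."""
    return statistics.quantiles(values, n=groups)

def equal_width_cutpoints(values, groups):
    """Boundaries that split the observed range into equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / groups
    return [lo + width * i for i in range(1, groups)]

def assign_group(value, cutpoints):
    """Index of the group a value falls in (0 = lowest group)."""
    return sum(value > c for c in cutpoints)

# Hypothetical right-skewed exposure values.
exposure = [19, 20, 21, 22, 22, 23, 24, 25, 26, 28, 31, 40]

q_cuts = quantile_cutpoints(exposure, 4)
w_cuts = equal_width_cutpoints(exposure, 4)

q_counts = [sum(assign_group(x, q_cuts) == g for x in exposure) for g in range(4)]
w_counts = [sum(assign_group(x, w_cuts) == g for x in exposure) for g in range(4)]

print("quantile groups:", q_cuts, q_counts)
print("equal-width groups:", w_cuts, w_counts)
```

With skewed data the quantile grouping keeps group sizes balanced, while the equal-width grouping leaves the upper groups sparse. That trade-off is one reason the choice of boundaries is worth defining in advance.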


Citations: 103

Population attributable fraction: comparison of two mathematical procedures to estimate the annual attributable number of deaths.

Epidemiologic perspectives & innovations : EP+I

Pub Date : 2010-08-31 DOI: 10.1186/1742-5573-7-8

Bernard CK Choi

Objective: The purpose of this paper was to compare two mathematical procedures to estimate the annual attributable number of deaths (the Allison et al procedure and the Mokdad et al procedure), and derive a new procedure that combines the best aspects of both procedures. The new procedure calculates attributable number of deaths along a continuum (i.e. for each unit of exposure), and allows for one or more neutral (neither exposed nor nonexposed) exposure categories.

Methods: Mathematical derivations and real datasets were used to demonstrate the theoretical relationship and practical differences between the two procedures. Results of the comparison were used to develop a new procedure that combines the best features of both.

Findings: The Allison procedure is complex because it directly estimates the number of attributable deaths. This necessitates calculation of probabilities of death. The Mokdad procedure is simpler because it estimates the number of attributable deaths indirectly through population attributable fractions. The probabilities of death cancel out in the numerator and denominator of the fractions. However, the Mokdad procedure is not applicable when a neutral exposure category exists.

Conclusion: By combining the innovation of the Allison procedure (allowing for a neutral category) and the simplicity of the Mokdad procedure (using population attributable fractions), this paper proposes a new procedure to calculate attributable numbers of death.
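The indirect route through population attributable fractions can be sketched with the textbook multi-category formula, PAF = Σ p_i (RR_i − 1) / (1 + Σ p_i (RR_i − 1)), followed by multiplication by the total number of deaths. This is only a minimal sketch of that general idea: the prevalences, relative risks, and death count are hypothetical, and the code does not reproduce the paper's new procedure or its handling of neutral exposure categories.

```python
def population_attributable_fraction(exposed_levels):
    """Textbook multi-category population attributable fraction.

    exposed_levels: list of (prevalence, relative_risk) pairs, one per
    exposed level. The unexposed reference level (relative risk 1) is
    implied by the remaining share of the population.
    """
    excess = sum(p * (rr - 1.0) for p, rr in exposed_levels)
    return excess / (1.0 + excess)

# Hypothetical exposure distribution and annual death count.
exposed_levels = [(0.30, 1.5), (0.20, 2.0)]
total_deaths = 100_000

paf = population_attributable_fraction(exposed_levels)
attributable_deaths = paf * total_deaths

print(f"PAF = {paf:.4f}, attributable deaths = {attributable_deaths:.0f}")
```

Only prevalences and relative risks appear in the calculation. No probabilities of death are needed, which matches the abstract's description of the indirect approach as the simpler one.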


Citations: 5

Author's response to Poole, C. Commentary: How Many Are Affected? A Real Limit of Epidemiology.

Epidemiologic perspectives & innovations : EP+I

Pub Date : 2010-08-26 DOI: 10.1186/1742-5573-7-7

Nicolle M Gatto, Ulka B Campbell, Sharon Schwartz

Citations: 0

How many are affected? A real limit of epidemiology.

Epidemiologic perspectives & innovations : EP+I

Pub Date : 2010-08-24 DOI: 10.1186/1742-5573-7-6

Charles Poole

A person can experience an effect on the occurrence of an outcome in a defined follow-up period without experiencing an effect on the risk of that outcome over the same period. Sufficient causes are sometimes used to deepen potential-outcome explanations of this phenomenon. In doing so, care should be taken to avoid tipping the balance between simplification and realism too far toward simplification. Death and other competing risks should not be assumed away. The time scale should be explicit, with specific times for the occurrence of specified component causes and for the completion of each sufficient cause. Component causes that affect risk should occur no later than the start of the risk period. Sufficient causes should be allowed to have component causes in common. When individuals experience all components of two or more sufficient causes, the outcome must be recurrent. In addition to effects on rates and risks, effects on incidence time itself should be considered.


Citations: 6

Redundant causation from a sufficient cause perspective.

Epidemiologic perspectives & innovations : EP+I

Pub Date : 2010-08-02 DOI: 10.1186/1742-5573-7-5

Nicolle M Gatto, Ulka B Campbell

Sufficient causes of disease are redundant when an individual acquires the components of two or more sufficient causes. In this circumstance, the individual still would have become diseased even if one of the sufficient causes had not been acquired. In the context of a study, when any individuals acquire components of more than one sufficient cause over the observation period, the etiologic effect of the exposure (defined as the absolute or relative difference between the proportion of the exposed who develop the disease by the end of the study period and the proportion of those individuals who would have developed the disease at the moment they did even in the absence of the exposure) may be underestimated. Even in the absence of confounding and bias, the observed effect estimate represents only a subset of the etiologic effect. This underestimation occurs regardless of the measure of effect used. To some extent, redundancy of sufficient causes is always present, and under some circumstances, it may make a true cause of disease appear to be not causal. This problem is particularly relevant when the researcher's goal is to characterize the universe of sufficient causes of the disease, identify risk factors for targeted interventions, or construct causal diagrams. In this paper, we use the sufficient component cause model and the disease response type framework to show how redundant causation arises and the factors that determine the extent of its impact on epidemiologic effect measures.


Citations: 23

Fitting additive Poisson models.

Epidemiologic perspectives & innovations : EP+I

Pub Date : 2010-07-20 DOI: 10.1186/1742-5573-7-4

Hendriek C Boshuizen, Edith JM Feskens

This paper describes how to fit an additive Poisson model using standard software. The approach is illustrated with SAS code but can be applied similarly in other software packages.


Citations: 39

Using variable importance measures from causal inference to rank risk factors of schistosomiasis infection in a rural setting in China.

Epidemiologic perspectives & innovations : EP+I

Pub Date : 2010-07-14 DOI: 10.1186/1742-5573-7-3

Sylvia EK Sudat, Elizabeth J Carlton, Edmund YW Seto, Robert C Spear, Alan E Hubbard

Background: Schistosomiasis infection, contracted through contact with contaminated water, is a global public health concern. In this paper we analyze data from a retrospective study reporting water contact and schistosomiasis infection status among 1011 individuals in rural China. We present semi-parametric methods for identifying risk factors through a comparison of three analysis approaches: a prediction-focused machine learning algorithm, a simple main-effects multivariable regression, and a semi-parametric variable importance (VI) estimate inspired by a causal population intervention parameter.

Results: The multivariable regression found only tool washing to be associated with the outcome, with a relative risk of 1.03 and a 95% confidence interval (CI) of 1.01-1.05. Three types of water contact were found to be associated with the outcome in the semi-parametric VI analysis: July water contact (VI estimate 0.16, 95% CI 0.11-0.22), water contact from tool washing (VI estimate 0.88, 95% CI 0.80-0.97), and water contact from rice planting (VI estimate 0.71, 95% CI 0.53-0.96). The July VI result, in particular, indicated a strong association with infection status - its causal interpretation implies that eliminating water contact in July would reduce the prevalence of schistosomiasis in our study population by 84%, or from 0.3 to 0.05 (95% CI 78%-89%).

Conclusions: The July VI estimate suggests possible within-season variability in schistosomiasis infection risk, an association not detected by the regression analysis. Though there are many limitations to this study that temper the potential for causal interpretations, if a high-risk time period could be detected in something close to real time, new prevention options would be opened. Most importantly, we emphasize that traditional regression approaches are usually based on arbitrary pre-specified models, making their parameters difficult to interpret in the context of real-world applications. Our results support the practical application of analysis approaches that, in contrast, do not require arbitrary model pre-specification, estimate parameters that have simple public health interpretations, and apply inference that considers model selection as a source of variation.
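The arithmetic linking the July VI estimate to the 84% figure can be checked in a few lines. The sketch assumes the VI estimate acts as a ratio of the prevalence under the hypothetical intervention to the observed prevalence; that reading is an inference from the numbers in the abstract, not a definition stated there. The function name is illustrative.

```python
def relative_reduction(vi_ratio):
    """Relative reduction in prevalence implied by a ratio-scale VI estimate
    (assumed interpretation: intervened prevalence / observed prevalence)."""
    return 1.0 - vi_ratio

# July water contact figures as reported in the abstract.
vi, ci_low, ci_high = 0.16, 0.11, 0.22
observed_prevalence = 0.3

reduction = relative_reduction(vi)
# The interval flips: the upper VI bound gives the smaller reduction.
reduction_ci = (relative_reduction(ci_high), relative_reduction(ci_low))
intervened_prevalence = observed_prevalence * vi

print(f"reduction = {reduction:.0%}, CI = {reduction_ci[0]:.0%} to {reduction_ci[1]:.0%}")
print(f"prevalence: {observed_prevalence} -> {intervened_prevalence:.2f}")
```

Under that assumption the values reproduce the reported 84% reduction, the 78%-89% interval, and the drop in prevalence from 0.3 to about 0.05.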


Citations: 0

Can we use biomarkers in combination with self-reports to strengthen the analysis of nutritional epidemiologic studies?

Epidemiologic perspectives & innovations : EP+I

Pub Date : 2010-01-20 DOI: 10.1186/1742-5573-7-2

Laurence S Freedman, Victor Kipnis, Arthur Schatzkin, Natasa Tasevska, Nancy Potischman

Identifying diet-disease relationships in nutritional cohort studies is plagued by the measurement error in self-reported intakes. The authors propose using biomarkers known to be correlated with dietary intake, so as to strengthen analyses of diet-disease hypotheses. The authors consider combining self-reported intakes and biomarker levels using principal components, Howe's method, or a joint statistical test of effects in a bivariate model. They compared the statistical power of these methods with that of conventional univariate analyses of self-reported intake or of biomarker level. They used computer simulation of different disease risk models, with input parameters based on data from the literature on the relationship between lutein intake and age-related macular degeneration. The results showed that if the dietary effect on disease was fully mediated through the biomarker level, then the univariate analysis of the biomarker was the most powerful approach. However, combination methods, particularly principal components and Howe's method, were not greatly inferior in this situation, and were as good as, or better than, univariate biomarker analysis if mediation was only partial or non-existent. In some circumstances sample size requirements were reduced to 20-50% of those required for conventional analyses of self-reported intake. The authors conclude that (i) including biomarker data in addition to the usual dietary data in a cohort could greatly strengthen the investigation of diet-disease relationships, and (ii) when the extent of mediation through the biomarker is unknown, use of principal components or Howe's method appears a good strategy.


Citations: 98

A method to predict breast cancer stage using Medicare claims.

Epidemiologic perspectives & innovations : EP+I

Pub Date : 2010-01-15 DOI: 10.1186/1742-5573-7-1

Grace L Smith, Ya-Chen T Shih, Sharon H Giordano, Benjamin D Smith, Thomas A Buchholz

Background: In epidemiologic studies, cancer stage is an important predictor of outcomes. However, cancer stage is typically unavailable in medical insurance claims datasets, thus limiting the usefulness of such data for epidemiologic studies. Therefore, we sought to develop an algorithm to predict cancer stage based on covariates available from claims-based data.

Methods: We identified a cohort of 77,306 women aged ≥ 66 years with stage I-IV breast cancer, using the Surveillance, Epidemiology, and End Results (SEER)-Medicare database. We formulated an algorithm to predict cancer stage using covariates (demographic, tumor, and treatment characteristics) obtained from claims. Logistic regression models derived prediction equations in a training set, and the equations' test characteristics (sensitivity, specificity, positive predictive value [PPV], and negative predictive value [NPV]) were calculated in a validation set.
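The four test characteristics named in the Methods follow directly from a 2×2 table of predicted versus true stage in a validation set. A minimal Python sketch of that calculation is given here; the function name `diagnostic_characteristics` and the toy labels are hypothetical illustrations, not the authors' code or study data.

```python
def diagnostic_characteristics(actual, predicted):
    """Return sensitivity, specificity, PPV and NPV for binary labels (1 = positive).

    Assumes each denominator is non-zero, i.e. the data contain both true
    classes and both predicted classes.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives detected
        "ppv": tp / (tp + fp),          # share of positive calls that are correct
        "npv": tn / (tn + fn),          # share of negative calls that are correct
    }


# Toy labels, invented for illustration only (not SEER-Medicare data)
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
stats = diagnostic_characteristics(actual, predicted)
print(stats)
```

In practice the predicted labels would come from thresholding the fitted logistic regression probabilities; the abstract does not state which threshold was used.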

Results: Of the entire sample of women diagnosed with invasive breast cancer, 51% had stage I; 26% stage II; 11% stage III; and 4% stage IV disease. The equation predicting stage IV disease achieved sensitivity of 81%, specificity of 89%, PPV of 24%, and NPV of 99%, while the equation distinguishing stage I/II from stage III disease achieved sensitivity of 83%, specificity of 78%, PPV of 98%, and NPV of 31%. Combined, the equations most accurately identified early stage disease and ascertained a sample in which 98% of patients were stage I or II.
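The low PPV reported for the stage IV equation, despite its high sensitivity and specificity, is what Bayes' theorem leads one to expect when the predicted condition is uncommon. The sketch below assumes, purely for illustration, that the 4% stage IV share reported for the entire sample also applied in the validation set, which the abstract does not state; the function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Derive (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' theorem."""
    tp = sensitivity * prevalence              # true-positive fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Reported stage IV sensitivity and specificity; the 4% prevalence is an assumption
ppv, npv = predictive_values(sensitivity=0.81, specificity=0.89, prevalence=0.04)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Under that assumption the computed values come out near 0.23 and 0.99, close to the reported PPV of 24% and NPV of 99%.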

Conclusions: A claims-based algorithm predicted breast cancer stage and was particularly successful at identifying early stage disease. These prediction equations may be applied in future studies of breast cancer patients, substantially improving the utility of claims-based studies in this group. This method may similarly be employed to develop algorithms permitting claims-based epidemiologic studies of patients with other cancers.



Citations: 38