Lawrence C McCandless, Paul Gustafson, Peter C Austin, Adrian R Levy
Regression adjustment for the propensity score is a statistical method that reduces confounding from measured variables in observational data. A Bayesian propensity score analysis extends this idea by using simultaneous estimation of the propensity scores and the treatment effect. In this article, we conduct an empirical investigation of the performance of Bayesian propensity scores in the context of an observational study of the effectiveness of beta-blocker therapy in heart failure patients. We study the balancing properties of the estimated propensity scores. Traditional Frequentist propensity scores focus attention on balancing covariates that are strongly associated with treatment. In contrast, we demonstrate that Bayesian propensity scores can be used to balance the association between covariates and the outcome. This balancing property has the effect of reducing confounding bias because it reduces the degree to which covariates are outcome risk factors.
{"title":"Covariate balance in a Bayesian propensity score analysis of beta blocker therapy in heart failure patients.","authors":"Lawrence C McCandless, Paul Gustafson, Peter C Austin, Adrian R Levy","doi":"10.1186/1742-5573-6-5","DOIUrl":"https://doi.org/10.1186/1742-5573-6-5","url":null,"abstract":"<p><p>Regression adjustment for the propensity score is a statistical method that reduces confounding from measured variables in observational data. A Bayesian propensity score analysis extends this idea by using simultaneous estimation of the propensity scores and the treatment effect. In this article, we conduct an empirical investigation of the performance of Bayesian propensity scores in the context of an observational study of the effectiveness of beta-blocker therapy in heart failure patients. We study the balancing properties of the estimated propensity scores. Traditional Frequentist propensity scores focus attention on balancing covariates that are strongly associated with treatment. In contrast, we demonstrate that Bayesian propensity scores can be used to balance the association between covariates and the outcome. This balancing property has the effect of reducing confounding bias because it reduces the degree to which covariates are outcome risk factors.</p>","PeriodicalId":87082,"journal":{"name":"Epidemiologic perspectives & innovations : EP+I","volume":"6 ","pages":"5"},"PeriodicalIF":0.0,"publicationDate":"2009-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1742-5573-6-5","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"28392894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We are pleased to publish an update to "Identifiability, exchangeability and epidemiological confounding" (IEEC) by Sander Greenland and James Robins, originally published in 1986 in the International Journal of Epidemiology. This is the first in a series of updates to classic epidemiologic-methods papers that EP&I has commissioned.
{"title":"Update: Greenland and Robins (1986). Identifiability, exchangeability and epidemiological confounding.","authors":"George Maldonado","doi":"10.1186/1742-5573-6-3","DOIUrl":"https://doi.org/10.1186/1742-5573-6-3","url":null,"abstract":"<p><p>We are pleased to publish an update to \"Identifiabiliity, exchangeability and epidemiological confounding\" (IEEC) by Sander Greenland and James Robins, originally published in 1986 in the International Journal of Epidemiology. This is the first in a series of updates to classic epidemiologic-methods papers that EP&I has commissioned.</p>","PeriodicalId":87082,"journal":{"name":"Epidemiologic perspectives & innovations : EP+I","volume":"6 ","pages":"3"},"PeriodicalIF":0.0,"publicationDate":"2009-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1742-5573-6-3","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"28384047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In 1986 the International Journal of Epidemiology published "Identifiability, Exchangeability and Epidemiological Confounding". We review the article from the perspective of a quarter century after it was first drafted and relate it to subsequent developments on confounding, ignorability, and collapsibility.
{"title":"Identifiability, exchangeability and confounding revisited.","authors":"Sander Greenland, James M Robins","doi":"10.1186/1742-5573-6-4","DOIUrl":"https://doi.org/10.1186/1742-5573-6-4","url":null,"abstract":"<p><p>In 1986 the International Journal of Epidemiology published \"Identifiability, Exchangeability and Epidemiological Confounding\". We review the article from the perspective of a quarter century after it was first drafted and relate it to subsequent developments on confounding, ignorability, and collapsibility.</p>","PeriodicalId":87082,"journal":{"name":"Epidemiologic perspectives & innovations : EP+I","volume":"6 ","pages":"4"},"PeriodicalIF":0.0,"publicationDate":"2009-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1742-5573-6-4","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"28384048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As noted by Wesley Salmon and many others, causal concepts are ubiquitous in every branch of theoretical science, in the practical disciplines and in everyday life. In the theoretical and practical sciences especially, people often base claims about causal relations on applications of statistical methods to data. However, the source and type of data place important constraints on the choice of statistical methods as well as on the warrant attributed to the causal claims based on the use of such methods. For example, much of the data used by people interested in making causal claims come from non-experimental, observational studies in which random allocations to treatment and control groups are not present. Thus, one of the most important problems in the social and health sciences concerns making justified causal inferences using non-experimental, observational data. In this paper, I examine one method of justifying such inferences that is especially widespread in epidemiology and the health sciences generally - the use of causal criteria. I argue that while the use of causal criteria is not appropriate for either deductive or inductive inferences, they do have an important role to play in inferences to the best explanation. As such, causal criteria, exemplified by what Bradford Hill referred to as "aspects of [statistical] associations", have an indispensable part to play in the goal of making justified causal claims.
{"title":"The role of causal criteria in causal inferences: Bradford Hill's \"aspects of association\".","authors":"Andrew C Ward","doi":"10.1186/1742-5573-6-2","DOIUrl":"10.1186/1742-5573-6-2","url":null,"abstract":"<p><p>As noted by Wesley Salmon and many others, causal concepts are ubiquitous in every branch of theoretical science, in the practical disciplines and in everyday life. In the theoretical and practical sciences especially, people often base claims about causal relations on applications of statistical methods to data. However, the source and type of data place important constraints on the choice of statistical methods as well as on the warrant attributed to the causal claims based on the use of such methods. For example, much of the data used by people interested in making causal claims come from non-experimental, observational studies in which random allocations to treatment and control groups are not present. Thus, one of the most important problems in the social and health sciences concerns making justified causal inferences using non-experimental, observational data. In this paper, I examine one method of justifying such inferences that is especially widespread in epidemiology and the health sciences generally - the use of causal criteria. I argue that while the use of causal criteria is not appropriate for either deductive or inductive inferences, they do have an important role to play in inferences to the best explanation. 
As such, causal criteria, exemplified by what Bradford Hill referred to as \"aspects of [statistical] associations\", have an indispensible part to play in the goal of making justified causal claims.</p>","PeriodicalId":87082,"journal":{"name":"Epidemiologic perspectives & innovations : EP+I","volume":"6 ","pages":"2"},"PeriodicalIF":0.0,"publicationDate":"2009-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2706236/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"28250041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
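One possibility for the statistical evaluation of trends in epidemiological exposure studies is the use of a trend test for data organized in a 2 x k contingency table. Commonly, the exposure data are naturally grouped or continuous exposure data are appropriately categorized. The trend test should be sensitive to any shape of the exposure-response relationship. Commonly, a global trend test only determines whether there is a trend or not. Once a trend is seen it is important to identify the likely shape of the exposure-response relationship. This paper introduces a best contrast approach and an alternative approach based on order-restricted information criteria for the model selection of a particular exposure-response relationship. For the simple change point alternative $H_1: \pi_1 = \dots = \pi_q < \pi_{q+1} = \dots = \pi_k$, an appropriate approach for the identification of a global trend as well as for the most likely shape of that exposure-response relationship is characterized by simulation and demonstrated for real data examples. Power and simultaneous confidence intervals can be estimated as well. If the conditions are fulfilled to transform the exposure-response data into a 2 x k table, a simple approach for identification of a global trend and its elementary shape is available for epidemiologists.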
{"title":"Trend tests for the evaluation of exposure-response relationships in epidemiological exposure studies.","authors":"Ludwig A Hothorn, Michael Vaeth, Torsten Hothorn","doi":"10.1186/1742-5573-6-1","DOIUrl":"10.1186/1742-5573-6-1","url":null,"abstract":"<p><p>One possibility for the statistical evaluation of trends in epidemiological exposure studies is the use of a trend test for data organized in a 2 x k contingency table. Commonly, the exposure data are naturally grouped or continuous exposure data are appropriately categorized. The trend test should be sensitive to any shape of the exposure-response relationship. Commonly, a global trend test only determines whether there is a trend or not. Once a trend is seen it is important to identify the likely shape of the exposure-response relationship. This paper introduces a best contrast approach and an alternative approach based on order-restricted information criteria for the model selection of a particular exposure-response relationship. For the simple change point alternative H1 : pi1 = ...= piq <piq+1 = ... = pik an appropriate approach for the identification of a global trend as well as for the most likely shape of that exposure-response relationship is characterized by simulation and demonstrated for real data examples. Power and simultaneous confidence intervals can be estimated as well. 
If the conditions are fulfilled to transform the exposure-response data into a 2 x k table, a simple approach for identification of a global trend and its elementary shape is available for epidemiologists.</p>","PeriodicalId":87082,"journal":{"name":"Epidemiologic perspectives & innovations : EP+I","volume":"6 ","pages":"1"},"PeriodicalIF":0.0,"publicationDate":"2009-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1742-5573-6-1","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"28025678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
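For readers unfamiliar with trend tests on a 2 x k table, the following is a minimal sketch of the classical Cochran-Armitage test for a linear trend in proportions. It is not the best contrast or order-restricted information criterion approach that the abstract above describes; the function name, the equally spaced default scores, and the normal approximation for the p-value are all assumptions made for illustration.

```python
import math


def cochran_armitage(cases, totals, scores=None):
    """Return (z, two_sided_p) for a linear trend across k ordered exposure groups.

    cases[i]  : number of subjects with the outcome in group i
    totals[i] : number of subjects in group i
    scores[i] : exposure score for group i (assumed 0, 1, ..., k-1 if omitted)
    """
    k = len(cases)
    if scores is None:
        scores = list(range(k))  # equally spaced scores: an illustrative assumption
    n_total = sum(totals)
    p_bar = sum(cases) / n_total  # pooled outcome proportion under the null

    # Score-weighted sum of observed minus expected case counts.
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, cases, totals))

    # Null variance of t_stat (large-sample form using N rather than N - 1).
    sum_ns = sum(n * s for s, n in zip(scores, totals))
    sum_ns2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_total)

    # Note: variance is zero (and this fails) if p_bar is 0 or 1,
    # or if every group has the same score.
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


# Hypothetical data: outcome proportion rising over three exposure groups.
z, p = cochran_armitage(cases=[5, 10, 20], totals=[100, 100, 100])
print(f"z = {z:.3f}, p = {p:.5f}")
```

A global test of this kind only indicates whether a trend is present; identifying the shape of the exposure-response relationship requires comparing several contrasts, which is the problem the paper addresses.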
Cathy L Backinger, Deirdre Lawrence, Judith Swan, Deborah M Winn, Nancy Breen, Anne Hartman, Rachel Grana, David Tran, Samantha Farrell
Objective: The National Health Interview Survey (NHIS) is a continuous, nationwide, household interview survey of the civilian noninstitutionalized population of the United States. This annual survey is conducted by the National Center for Health Statistics, part of the Centers for Disease Control and Prevention. Since 1965, the survey and its supplements have provided data on issues related to the use of cigarettes and other tobacco products. This paper describes the survey, provides an overview of peer-reviewed and government-issued research that uses tobacco-related data from the NHIS, and suggests additional areas for exploration and directions for future research.
Data sources: We performed literature searches using the PubMed database, selecting articles from 1966 to 2008.
Study selection: Inclusion criteria were relevancy to tobacco research and primary use of NHIS data; 117 articles met these criteria.
Data extraction and synthesis: Tobacco-related data from the NHIS have been used to analyze smoking prevalence and trends; attitudes, knowledge, and beliefs; initiation; cessation and advice to quit; health care practices; health consequences; secondhand smoke exposure; and use of smokeless tobacco. To date, use of these data has had broad application; however, great potential still exists for additional use.
Conclusion: NHIS data provide information that can be useful to both practitioners and researchers. It is important to explore new and creative ways to best use these data and to address the full range of salient tobacco-related topics. Doing so will better inform future tobacco control research and programs.
{"title":"Using the National Health Interview Survey to understand and address the impact of tobacco in the United States: past perspectives and future considerations.","authors":"Cathy L Backinger, Deirdre Lawrence, Judith Swan, Deborah M Winn, Nancy Breen, Anne Hartman, Rachel Grana, David Tran, Samantha Farrell","doi":"10.1186/1742-5573-5-8","DOIUrl":"10.1186/1742-5573-5-8","url":null,"abstract":"<p><strong>Objective: </strong>The National Health Interview Survey (NHIS) is a continuous, nationwide, household interview survey of the civilian noninstitutionalized population of the United States. This annual survey is conducted by the National Center for Health Statistics, part of the Centers for Disease Control and Prevention. Since 1965, the survey and its supplements have provided data on issues related to the use of cigarettes and other tobacco products. This paper describes the survey, provides an overview of peer-reviewed and government-issued research that uses tobacco-related data from the NHIS, and suggests additional areas for exploration and directions for future research.</p><p><strong>Data sources: </strong>We performed literature searches using the PubMed database, selecting articles from 1966 to 2008. Study selection. Inclusion criteria were relevancy to tobacco research and primary use of NHIS data; 117 articles met these criteria. Data extraction and synthesis. Tobacco-related data from the NHIS have been used to analyze smoking prevalence and trends; attitudes, knowledge, and beliefs; initiation; cessation and advice to quit; health care practices; health consequences; secondhand smoke exposure; and use of smokeless tobacco. To date, use of these data has had broad application; however, great potential still exists for additional use.</p><p><strong>Conclusion: </strong>NHIS data provide information that can be useful to both practitioners and researchers. 
It is important to explore new and creative ways to best use these data and to address the full range of salient tobacco-related topics. Doing so will better inform future tobacco control research and programs.</p>","PeriodicalId":87082,"journal":{"name":"Epidemiologic perspectives & innovations : EP+I","volume":"5 ","pages":"8"},"PeriodicalIF":0.0,"publicationDate":"2008-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2627846/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"27878521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Introduction: Many epidemiological methods for analysing follow-up studies require the calculation of rates based on accumulating person-time and events, stratified by various factors. Managing this stratification and accumulation is often the most difficult aspect of this type of analysis.
Tutorial: We provide a tutorial on accumulating person-time and events, stratified by various factors i.e. creating event-time tables. We show how to efficiently generate event-time tables for many different outcomes simultaneously. We also provide a new vocabulary to characterise and differentiate time-varying factors. The tutorial is focused on using a SAS macro to perform most of the common tasks in the creation of event-time tables. All the most common types of time-varying covariates can be generated and categorised by the macro. It can also provide output suitable for other types of survival analysis (e.g. Cox regression). The aim of our methodology is to support the creation of bug-free, readable, efficient, capable and easily modified programs for making event-time tables. We briefly compare analyses based on event-time tables with Cox regression and nested case-control studies for the analysis of follow-up data.
Conclusion: Anyone working with time-varying covariates, particularly from large detailed person-time data sets, would gain from having these methods in their programming toolkit.
{"title":"Methods for stratification of person-time and events - a prerequisite for Poisson regression and SIR estimation.","authors":"Klaus Rostgaard","doi":"10.1186/1742-5573-5-7","DOIUrl":"10.1186/1742-5573-5-7","url":null,"abstract":"<p><strong>Introduction: </strong>Many epidemiological methods for analysing follow-up studies require the calculation of rates based on accumulating person-time and events, stratified by various factors. Managing this stratification and accumulation is often the most difficult aspect of this type of analysis.</p><p><strong>Tutorial: </strong>We provide a tutorial on accumulating person-time and events, stratified by various factors i.e. creating event-time tables. We show how to efficiently generate event-time tables for many different outcomes simultaneously. We also provide a new vocabulary to characterise and differentiate time-varying factors. The tutorial is focused on using a SAS macro to perform most of the common tasks in the creation of event-time tables. All the most common types of time-varying covariates can be generated and categorised by the macro. It can also provide output suitable for other types of survival analysis (e.g. Cox regression). The aim of our methodology is to support the creation of bug-free, readable, efficient, capable and easily modified programs for making event-time tables. 
We briefly compare analyses based on event-time tables with Cox regression and nested case-control studies for the analysis of follow-up data.</p><p><strong>Conclusion: </strong>Anyone working with time-varying covariates, particularly from large detailed person-time data sets, would gain from having these methods in their programming toolkit.</p>","PeriodicalId":87082,"journal":{"name":"Epidemiologic perspectives & innovations : EP+I","volume":"5 ","pages":"7"},"PeriodicalIF":0.0,"publicationDate":"2008-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2615420/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"27843557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
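The idea of an event-time table described in the abstract above can be illustrated with a small sketch that splits each subject's follow-up across age bands and accumulates person-time and events per band. This is not the SAS macro from the tutorial; the function names, the tuple layout `(entry_age, exit_age, event)`, and the choice of age as the only time-varying factor are assumptions made for illustration.

```python
from collections import defaultdict


def split_follow_up(entry, exit_, event, cuts):
    """Yield (band_index, person_time, events) for one subject.

    cuts are ascending band boundaries; band i covers cuts[i] up to cuts[i + 1].
    Follow-up outside the first and last boundary is ignored.
    """
    for i in range(len(cuts) - 1):
        lo, hi = cuts[i], cuts[i + 1]
        start, stop = max(entry, lo), min(exit_, hi)
        if stop > start:
            # The event is counted only in the band where follow-up ends.
            ended_here = event and lo < exit_ <= hi
            yield i, stop - start, int(ended_here)


def event_time_table(subjects, cuts):
    """Accumulate [person_time, events] per band over all subjects."""
    table = defaultdict(lambda: [0.0, 0])
    for entry, exit_, event in subjects:
        for band, person_time, events in split_follow_up(entry, exit_, event, cuts):
            table[band][0] += person_time
            table[band][1] += events
    return dict(table)


# Hypothetical cohort: (entry age, exit age, event at exit?)
subjects = [(30, 45, True), (38, 42, False)]
cuts = [30, 40, 50]  # two bands: 30-40 and 40-50

for band, (person_time, events) in sorted(event_time_table(subjects, cuts).items()):
    rate = events / person_time
    print(f"ages {cuts[band]}-{cuts[band + 1]}: {person_time} person-years, "
          f"{events} events, rate {rate:.3f}")
```

The resulting rows (person-time and event counts per stratum) are the kind of input that Poisson regression or SIR estimation would take; handling several time scales and outcomes at once is where the bookkeeping becomes difficult.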
{"title":"Case-case analysis of enteric diseases with routine surveillance data: Potential use and example results","authors":"N. Wilson, M. Baker, R. Edwards, G. Simmons","doi":"10.1186/1742-5573-5-6","DOIUrl":"https://doi.org/10.1186/1742-5573-5-6","url":null,"abstract":"","PeriodicalId":87082,"journal":{"name":"Epidemiologic perspectives & innovations : EP+I","volume":"11 1","pages":"6 - 6"},"PeriodicalIF":0.0,"publicationDate":"2008-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83374542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: While the population attributable fraction (PAF) provides potentially valuable information regarding the community-level effect of risk factors, significant limitations exist with current strategies for estimating a PAF in multiple risk factor models. These strategies can result in paradoxical or ambiguous measures of effect, or require unrealistic assumptions regarding variables in the model. A method is proposed in which an overall or total PAF across multiple risk factors is partitioned into components based upon a sequential ordering of effects. This method is applied to several hypothetical data sets in order to demonstrate its application and interpretation in diverse analytic situations.
Results: The proposed method is demonstrated to provide clear and interpretable measures of effect, even when risk factors are related/correlated and/or when risk factors interact. Furthermore, this strategy not only addresses, but also quantifies issues raised by other researchers who have noted the potential impact of population-shifts on population-level effects in multiple risk factor models.
Conclusion: Combined with simple, unadjusted PAF estimates and an aggregate PAF based on all risk factors under consideration, the sequentially partitioned PAF provides valuable additional information regarding the process through which population rates of a disorder may be impacted. In addition, the approach can also be used to statistically control for confounding by other variables, while avoiding the potential pitfalls of attempting to separately differentiate direct and indirect effects.
{"title":"Partitioning the population attributable fraction for a sequential chain of effects.","authors":"Craig A Mason, Shihfen Tu","doi":"10.1186/1742-5573-5-5","DOIUrl":"https://doi.org/10.1186/1742-5573-5-5","url":null,"abstract":"<p><strong>Background: </strong>While the population attributable fraction (PAF) provides potentially valuable information regarding the community-level effect of risk factors, significant limitations exist with current strategies for estimating a PAF in multiple risk factor models. These strategies can result in paradoxical or ambiguous measures of effect, or require unrealistic assumptions regarding variables in the model. A method is proposed in which an overall or total PAF across multiple risk factors is partitioned into components based upon a sequential ordering of effects. This method is applied to several hypothetical data sets in order to demonstrate its application and interpretation in diverse analytic situations.</p><p><strong>Results: </strong>The proposed method is demonstrated to provide clear and interpretable measures of effect, even when risk factors are related/correlated and/or when risk factors interact. Furthermore, this strategy not only addresses, but also quantifies issues raised by other researchers who have noted the potential impact of population-shifts on population-level effects in multiple risk factor models.</p><p><strong>Conclusion: </strong>Combined with simple, unadjusted PAF estimates and an aggregate PAF based on all risk factors under consideration, the sequentially partitioned PAF provides valuable additional information regarding the process through which population rates of a disorder may be impacted. 
In addition, the approach can also be used to statistically control for confounding by other variables, while avoiding the potential pitfalls of attempting to separately differentiate direct and indirect effects.</p>","PeriodicalId":87082,"journal":{"name":"Epidemiologic perspectives & innovations : EP+I","volume":"5 ","pages":"5"},"PeriodicalIF":0.0,"publicationDate":"2008-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1742-5573-5-5","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"27709043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
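To make the notion of partitioning a total PAF along a sequential ordering more concrete, here is a minimal sketch for binary risk factors. It is a generic illustration and not necessarily the authors' exact procedure: the function names, the stratum-level prevalences and risks, and the assumption that removing a factor moves exposed people into the matching unexposed stratum without changing anything else are all hypothetical.

```python
def overall_risk(prev, risk):
    """Population risk as the prevalence-weighted average of stratum risks."""
    return sum(p * risk[stratum] for stratum, p in prev.items())


def remove_factor(prev, index):
    """Shift everyone exposed to factor `index` into the matching unexposed stratum."""
    shifted = {}
    for stratum, p in prev.items():
        target = list(stratum)
        target[index] = 0
        key = tuple(target)
        shifted[key] = shifted.get(key, 0.0) + p
    return shifted


def sequential_paf(prev, risk, order):
    """Return {factor_index: PAF component} for factors removed in the given order."""
    base = overall_risk(prev, risk)
    parts, current, previous_risk = {}, dict(prev), base
    for index in order:
        current = remove_factor(current, index)
        new_risk = overall_risk(current, risk)
        # Share of the original population risk removed at this step.
        parts[index] = (previous_risk - new_risk) / base
        previous_risk = new_risk
    return parts


# Hypothetical strata keyed by (factor A, factor B); values are made up.
prev = {(0, 0): 0.5, (1, 0): 0.2, (0, 1): 0.2, (1, 1): 0.1}
risk = {(0, 0): 0.01, (1, 0): 0.02, (0, 1): 0.03, (1, 1): 0.06}

parts = sequential_paf(prev, risk, order=[0, 1])
print(parts, "total:", sum(parts.values()))
```

Under these assumptions the components sum to the total PAF for both factors whatever the ordering, while the share attributed to each factor depends on the order in which the factors are removed.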
Pascal Wild, Nadine Andrieu, Alisa M Goldstein, Walter Schill
The two-phase design consists of an initial (Phase One) study with known disease status and inexpensive covariate information. Within this initial study one selects a subsample on which to collect detailed covariate data. Two-phase studies have been shown to be efficient compared to standard case-control designs. However, potential problems arise if one cannot assure minimum sample sizes in the rarest categories or if recontact of subjects is difficult. In the case of a rare exposure with an inexpensive proxy, the authors propose the flexible two-phase design for which there is a single time of contact, at which a decision about full covariate ascertainment is made based on the proxy. Subjects are screened until the desired numbers of cases and controls have been selected for full data collection. Strategies for optimizing the cost/efficiency of this design and corresponding software are presented. The design is applied to two examples from occupational and genetic epidemiology. By ensuring minimum numbers for the rarest disease-covariate combination(s), we obtain considerable efficiency gains over standard two-phase studies with an improved practical feasibility. The flexible two-phase design may be the design of choice in the case of well targeted studies of the effect of rare exposures with an inexpensive proxy.
{"title":"Flexible Two-Phase studies for rare exposures: Feasibility, planning and efficiency issues of a new variant.","authors":"Pascal Wild, Nadine Andrieu, Alisa M Goldstein, Walter Schill","doi":"10.1186/1742-5573-5-4","DOIUrl":"https://doi.org/10.1186/1742-5573-5-4","url":null,"abstract":"<p><p>The two-phase design consists of an initial (Phase One) study with known disease status and inexpensive covariate information. Within this initial study one selects a subsample on which to collect detailed covariate data. Two-phase studies have been shown to be efficient compared to standard case-control designs. However, potential problems arise if one cannot assure minimum sample sizes in the rarest categories or if recontact of subjects is difficult. In the case of a rare exposure with an inexpensive proxy, the authors propose the flexible two-phase design for which there is a single time of contact, at which a decision about full covariate ascertainment is made based on the proxy. Subjects are screened until the desired numbers of cases and controls have been selected for full data collection. Strategies for optimizing the cost/efficiency of this design and corresponding software are presented. The design is applied to two examples from occupational and genetic epidemiology. By ensuring minimum numbers for the rarest disease-covariate combination(s), we obtain considerable efficiency gains over standard two-phase studies with an improved practical feasibility. 
The flexible two-phase design may be the design of choice in the case of well targeted studies of the effect of rare exposures with an inexpensive proxy.</p>","PeriodicalId":87082,"journal":{"name":"Epidemiologic perspectives & innovations : EP+I","volume":"5 ","pages":"4"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1742-5573-5-4","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"27706637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}