Pub Date: 2018-03-05 | eCollection Date: 2018-01-01 | DOI: 10.1186/s12982-018-0074-x
Miguel Marino, Marcello Pagano
Background: Nationally representative surveys suggest that females have a higher prevalence of HIV than males in most African countries. Unfortunately, these results are based on surveys with non-ignorable missing data. This study evaluates the impact that differential survey nonresponse rates between males and females can have on the point estimate of the HIV prevalence ratio between these two groups.
Methods: We study 29 Demographic and Health Surveys (DHS) conducted from 2001 to 2010. Multiple imputation models are often used here, but their Missing at Random assumption may not hold in this setting. Instead, we assess the effect of ignoring the missing HIV information for males and females through three proposed statistical measures. These measures can be used in any setting where the aim is to compare the prevalence of a disease between two groups. The proposed measures do not rely on parametric models and can be implemented by researchers at any level of expertise. They are: (1) an upper bound on the potential bias of the usual practice of reporting HIV prevalence estimates that ignore subjects with missing HIV outcomes; (2) plausible range intervals that account for nonresponse without any additional parametric modeling assumptions; and (3) prevalence ratio inflation factors that correct the point estimate of the HIV prevalence ratio when estimates of nonresponders' HIV prevalences are available.
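The three measures reduce to simple arithmetic once nonresponse is made explicit. The sketch below is an illustrative reconstruction under stated assumptions, not the authors' code.

- It assumes that each group's true prevalence decomposes as r * p_r + (1 - r) * p_nr. Here r is the response rate, p_r is the prevalence among responders, and p_nr is the unknown prevalence among nonresponders.
- It takes worst-case bounds by setting p_nr to 0 or 1.
- The names `GroupSurvey`, `ratio_range` and `inflation_factor` are hypothetical.

The paper's exact definitions may differ.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GroupSurvey:
    """Hypothetical summary of one sex group in one survey."""
    response_rate: float         # r: share of eligible people with an HIV result
    responder_prevalence: float  # p_r: HIV prevalence among those tested

    def prevalence_given(self, nonresponder_prevalence: float) -> float:
        # Assumed decomposition: total prevalence = r * p_r + (1 - r) * p_nr
        r, p_r = self.response_rate, self.responder_prevalence
        return r * p_r + (1 - r) * nonresponder_prevalence

    def bounds(self) -> Tuple[float, float]:
        # Worst cases: every nonresponder negative (p_nr = 0) or positive (p_nr = 1)
        return self.prevalence_given(0.0), self.prevalence_given(1.0)

    def max_abs_bias(self) -> float:
        # Complete-case bias is p_r - p = (1 - r) * (p_r - p_nr), with p_nr in [0, 1]
        r, p_r = self.response_rate, self.responder_prevalence
        return (1 - r) * max(p_r, 1 - p_r)


def ratio_range(male: GroupSurvey, female: GroupSurvey) -> Tuple[float, float]:
    """Plausible range of the male/female prevalence ratio under worst-case bounds."""
    m_lo, m_hi = male.bounds()
    f_lo, f_hi = female.bounds()
    upper = m_hi / f_lo if f_lo > 0 else float("inf")
    return m_lo / f_hi, upper


def inflation_factor(male: GroupSurvey, female: GroupSurvey,
                     p_nr_male: float, p_nr_female: float) -> float:
    """Factor turning the complete-case ratio into the ratio implied by assumed p_nr values."""
    naive = male.responder_prevalence / female.responder_prevalence
    corrected = male.prevalence_given(p_nr_male) / female.prevalence_given(p_nr_female)
    return corrected / naive
```

Consider made-up inputs: males with 70% tested and 5% positive, females with 85% tested and 8% positive. `ratio_range` then returns roughly (0.16, 4.93). That interval includes 1.0, even though the complete-case ratio is 0.625.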
Results: In 86% of countries, males have higher upper bounds of HIV prevalence than females; this is consistent with males possibly having higher infection rates than females. Additionally, 74% of surveys have a plausible range for the prevalence ratio that includes 1.0, suggesting that male and female HIV prevalences may plausibly be equal.
Conclusions: DHS nonresponse to the HIV status question is so extensive that it is quite reasonable to conclude that the existing data could plausibly have been generated by a situation in which the virus is equally distributed between the sexes.
Title: Role of survey response rates on valid inference: an application to HIV prevalence estimates | Journal: Emerging Themes in Epidemiology 15:6 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5839032/pdf/
Pub Date: 2018-03-02 | eCollection Date: 2018-01-01 | DOI: 10.1186/s12982-018-0073-y
Robert W Eyre, Thomas House, F Xavier Gómez-Olivé, Frances E Griffiths
Background: Central to the study of populations, and therefore to the analysis of the development of countries undergoing major transitions, is the calculation of fertility patterns and their dependence on variables such as age, education, and socio-economic status. Most epidemiological research on these matters relies on the often unjustified assumption of (generalised) linearity, or alternatively makes a parametric assumption (e.g. for age patterns).
Methods: We consider nonlinearity of fertility in the covariates by combining an established nonlinear parametric model for fertility over age with nonlinear modelling of fertility over other covariates. For the latter, we use the semi-parametric method of Gaussian process regression, a popular methodology in many fields including machine learning, computer science, and systems biology. We applied the method to data from the Agincourt Health and Socio-Demographic Surveillance System, which has conducted annual census rounds in a poor rural region of South Africa since 1992, to analyse fertility patterns over age and socio-economic status.
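Gaussian process regression can be sketched in a few lines of NumPy. The example below is a generic, minimal GP regression with a squared-exponential kernel and a zero prior mean; it is not the authors' model.

As an illustrative assumption, it could be applied to the fertility signal left after dividing out a parametric age curve, regressed on a socio-economic score. One example of such a signal is log observed fertility minus log fitted age-specific fertility.

The hyperparameters `length_scale`, `variance` and `noise` are fixed here. In practice they would be estimated from the data.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1-D input vectors."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, length_scale=1.0, variance=1.0, noise=0.1):
    """Posterior mean and pointwise SD of a zero-mean GP with Gaussian observation noise."""
    K = rbf_kernel(x_train, x_train, length_scale, variance)
    K = K + noise**2 * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, length_scale, variance)
    K_ss = rbf_kernel(x_test, x_test, length_scale, variance)
    L = np.linalg.cholesky(K)                                  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    return mean, np.sqrt(np.clip(np.diag(cov), 0, None))
```

Passing a grid of socio-economic scores as `x_test` yields a smooth curve with pointwise uncertainty, without imposing linearity. Far from the data, the mean reverts to the prior mean of zero and the SD grows back to `sqrt(variance)`.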
Results: We capture a previously established age pattern of fertility, whilst being able to model the relationship between fertility and socio-economic status more robustly, without unjustified a priori assumptions of linearity. Peak fertility over age is shown to be increasing over time, as is fertility among adolescents, but not among those later in life, for whom fertility is generally decreasing over time.
Conclusions: Combining Gaussian process regression with nonlinear parametric modelling of fertility over age allowed for the incorporation of further covariates into the analysis without needing to assume a linear relationship. This enabled us to provide further insights into the fertility patterns of the Agincourt study area, in particular the interaction between age and socio-economic status.
Title: Modelling fertility in rural South Africa with combined nonlinear parametric and semi-parametric methods | Journal: Emerging Themes in Epidemiology 15:5
Pub Date: 2018-02-23 | eCollection Date: 2018-01-01 | DOI: 10.1186/s12982-018-0072-z
Matthew R Grigsby, Junrui Di, Andrew Leroux, Vadim Zipunnikov, Luo Xiao, Ciprian Crainiceanu, William Checkley
Background: Literature surrounding the statistical modeling of childhood growth data involves a diverse set of potential models from which investigators can choose. However, the lack of a comprehensive framework for comparing non-nested models makes model performance difficult to assess. This paper proposes a framework for comparing non-nested growth models using novel metrics of predictive accuracy based on modifications of the mean squared error criterion.
Methods: Three metrics were created: normalized, age-adjusted, and weighted mean squared error (MSE). These predictive performance metrics were used to compare linear mixed effects models and functional regression models. Prediction accuracy was assessed by partitioning the observed data into training and test datasets. The partitions were constructed to assess prediction accuracy backward (i.e., early growth), forward (i.e., late growth), in range, and for new individuals. Analyses used height measurements from 215 Peruvian children, with data spanning from near birth to 2 years of age.
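The four prediction scenarios amount to different rules for choosing test observations. The sketch below is a hypothetical illustration. The following are assumptions, not values or names from the study:

- the age cutoffs (60 and 600 days);
- the 20% held-out fraction;
- the `Obs` record type.

```python
import random
from typing import NamedTuple


class Obs(NamedTuple):
    child_id: int
    age_days: float
    height_cm: float


def split(data, scenario, early_cutoff=60.0, late_cutoff=600.0, frac=0.2, seed=0):
    """Return (train, test) lists for one of four hypothetical prediction scenarios."""
    rng = random.Random(seed)
    idx = range(len(data))
    if scenario == "backward":           # predict early growth from later measurements
        test_idx = {i for i in idx if data[i].age_days < early_cutoff}
    elif scenario == "forward":          # predict late growth from earlier measurements
        test_idx = {i for i in idx if data[i].age_days > late_cutoff}
    elif scenario == "in_range":         # hold out a random share of interior time points
        interior = [i for i in idx if early_cutoff <= data[i].age_days <= late_cutoff]
        test_idx = set(rng.sample(interior, int(frac * len(interior))))
    elif scenario == "new_individuals":  # hold out every measurement of some children
        ids = sorted({o.child_id for o in data})
        held = set(rng.sample(ids, max(1, int(frac * len(ids)))))
        test_idx = {i for i in idx if data[i].child_id in held}
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    train = [data[i] for i in idx if i not in test_idx]
    test = [data[i] for i in sorted(test_idx)]
    return train, test
```

The "new_individuals" split is the only one where no child contributes to both sets. It therefore tests prediction for children the model has never seen.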
Results: Functional models outperformed linear mixed effects models in all scenarios tested. In particular, prediction errors for functional concurrent regression (FCR) and functional principal component analysis models were approximately 6% lower when compared to linear mixed effects models. When we weighted subject-specific MSEs according to subject-specific growth rates during infancy, we found that FCR was the best performer in all scenarios.
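Weighting subject-specific MSEs can be sketched as follows. This is an assumption-laden illustration, because the abstract does not give the exact weighting formula. Here the weights are supplied per child, for example a hypothetical infancy growth rate in cm/month. The weighted MSE is then their normalized weighted average.

```python
from collections import defaultdict


def subject_mse(child_ids, observed, predicted):
    """Mean squared prediction error computed separately for each child."""
    sq = defaultdict(list)
    for cid, y, yhat in zip(child_ids, observed, predicted):
        sq[cid].append((y - yhat) ** 2)
    return {cid: sum(v) / len(v) for cid, v in sq.items()}


def weighted_mse(mse_by_child, weight_by_child):
    """Weighted average of subject-specific MSEs (weights need not sum to 1)."""
    total = sum(weight_by_child[c] for c in mse_by_child)
    return sum(weight_by_child[c] * m for c, m in mse_by_child.items()) / total
```

With equal weights this reduces to the plain average of per-child MSEs. Larger weights make errors for fast-growing children count more when ranking models.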
Conclusion: With this novel approach, we can quantitatively compare non-nested models and weight subgroups of interest to select the best performing growth model for a particular application or problem at hand.
Title: Novel metrics for growth model selection | Journal: Emerging Themes in Epidemiology 15:4 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5824542/pdf/
Pub Date: 2018-02-06 | eCollection Date: 2018-01-01 | DOI: 10.1186/s12982-018-0070-1
Nandita Perumal, Daniel E Roth, Johnna Perdrizet, Aluísio J D Barros, Iná S Santos, Alicia Matijasevich, Diego G Bassani
Background: Postmenstrual and/or gestational age (GA)-corrected age (CA) is required to apply child growth standards to children born preterm (< 37 weeks GA). Yet CA is rarely used in epidemiologic studies in low- and middle-income countries (LMICs), which may bias population estimates of childhood undernutrition. To evaluate the effect of accounting for GA in the application of growth standards, we compared two strategies in the 2004 Pelotas (Brazil) Birth Cohort.

- The 'CA' strategy: GA-specific standards at birth (INTERGROWTH-21st newborn size standards), combined with CA for preterm-born children when applying the World Health Organization Child Growth Standards postnatally.
- The comparison strategy: postnatal age for all children.

We used each strategy to estimate mean length-for-age (LAZ) and weight-for-age (WAZ) z scores at 0, 3, 12, 24, and 48 months of age.
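The corrected-age strategy changes only the age at which a child's measurement is compared with the standard. Below is a minimal sketch based on two assumptions:

- Correction follows the common convention of subtracting the shortfall from 40 weeks of gestation, applied only to children born before 37 weeks.
- z scores use the LMS formula underlying the WHO growth standards.

The L, M and S values used with it are placeholders, not values from the WHO or INTERGROWTH-21st tables. The GA-specific birth standards and WHO's adjustment of weight-based z scores beyond ±3 SD are not reproduced.

```python
import math

TERM_WEEKS = 40.0    # assumed reference gestation for the correction
PRETERM_CUTOFF = 37  # weeks; children born before this are treated as preterm


def corrected_age_days(postnatal_age_days: float, ga_weeks: float) -> float:
    """Age used to look up the standard: corrected for preterm children only."""
    if ga_weeks < PRETERM_CUTOFF:
        return postnatal_age_days - (TERM_WEEKS - ga_weeks) * 7
    return postnatal_age_days


def lms_z(value: float, L: float, M: float, S: float) -> float:
    """LMS z score, given the standard's L, M and S at the chosen age."""
    if L == 0:
        return math.log(value / M) / S
    return ((value / M) ** L - 1) / (L * S)
```

For example, a child born at 32 weeks and measured 90 days after birth is compared with the standard at 34 days instead of 90. This is why uncorrected ages tend to push preterm children's z scores downward.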
Results: At birth (n = 4066), mean LAZ was higher and the prevalence of stunting (LAZ < -2) was lower using CA than using postnatal age. Mean ± SD LAZ was -0.36 ± 1.19 versus -0.67 ± 1.32, and stunting prevalence was 8.3% versus 11.6%, respectively. Using CA rather than postnatal age at birth attenuated the odds ratio (OR) and population attributable risk (PAR) of stunting due to preterm birth, and changed inferences [OR 1.32 (95% confidence interval (CI) 0.95, 1.82) vs 14.7 (95% CI 11.7, 18.4); PAR 3.1% vs 42.9%]. These differences in inference persisted at 3 months. At 12, 24, and 48 months, preterm birth was associated with stunting, but ORs and PARs remained attenuated using CA compared with postnatal age. Findings were similar for weight-for-age z scores.
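Readers reproducing this kind of comparison can compute the odds ratio, a Woolf (log-based) 95% CI, and a population attributable fraction from a 2×2 table, as sketched below. The PAR formula shown is Levin's, with the OR standing in for the risk ratio. This is an assumption, since the abstract does not state which formula the authors used. The counts used with it are invented.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """a, b: exposed with/without the outcome; c, d: unexposed with/without it."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf SE of log(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi


def levin_par(exposure_prevalence, ratio):
    """Levin's attributable fraction; here `ratio` is an OR used as an RR approximation."""
    excess = exposure_prevalence * (ratio - 1)
    return excess / (1 + excess)
```

In this setting the exposure would be preterm birth and the outcome stunting (LAZ < -2). The stunting classification would come from one age strategy at a time. Rerunning the calculation with the other strategy's classification shows how much the choice of age alone moves the OR and PAR.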
Conclusions: Population-based epidemiologic studies in LMICs in which GA is unused or unavailable may overestimate the prevalence of early childhood undernutrition and inflate the fraction of undernutrition attributable to preterm birth.
Title: Effect of correcting for gestational age at birth on population prevalence of early childhood undernutrition | Journal: Emerging Themes in Epidemiology 15:3 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5799899/pdf/
Pub Date: 2018-02-06 | eCollection Date: 2018-01-01 | DOI: 10.1186/s12982-018-0071-0
Kate Sabot, Tanya Marchant, Neil Spicer, Della Berhanu, Meenakshi Gautham, Nasir Umar, Joanna Schellenberg
Background: Understanding the context of a health programme is important in interpreting evaluation findings and in considering their external validity for other settings. Public health researchers can be imprecise and inconsistent in their use of the word "context" and its application to their work. This paper presents an approach to defining context, capturing relevant contextual information, and using such information to help interpret findings. It is written from the perspective of a research group evaluating the effect of diverse innovations on coverage of evidence-based, life-saving interventions for maternal and newborn health in Ethiopia, Nigeria, and India.
Methods: We define "context" as the background environment or setting of any programme, and "contextual factors" as those elements of context that could affect implementation of a programme. Contextual factors were identified through a structured, consultative process that sought to balance comprehensiveness and feasibility. Thematic areas included demographics and socio-economics, epidemiological profile, health systems and service uptake, infrastructure, education, environment, politics, policy and governance. We outline an approach for capturing and using contextual factors while maximizing use of existing data. Methods include desk reviews, secondary data extraction and key informant interviews. Outputs include databases of contextual factors and summaries of existing maternal and newborn health policies and their implementation. Use of contextual data will be qualitative in nature and may assist in interpreting findings in both quantitative and qualitative aspects of programme evaluation.
Discussion: Applying this approach was more resource intensive than expected. This was partly because information expected to be routinely available was not consistently available across settings, so more primary data collection was required than anticipated. Contextual data were used only minimally. This was partly because few evaluation results needed further explanation, but also because contextual data were not available for the precise units of analysis or time periods of interest. We would advise others to consider integrating the collection of contextual factors into other data collection activities, and to conduct regular reviews of maternal and newborn health policies. This approach, and the lessons learned from applying it, could help inform the development of guidelines for the collection and use of contextual factors in public health evaluation.
Title: Contextual factors in maternal and newborn health evaluation: a protocol applied in Nigeria, India and Ethiopia | Journal: Emerging Themes in Epidemiology 15:2
Pub Date: 2018-01-22 | eCollection Date: 2018-01-01 | DOI: 10.1186/s12982-018-0069-7
Mette Lise Lousdal
The instrumental variable method has been employed within economics to infer causality in the presence of unmeasured confounding. Emphasising its parallels to randomisation may increase understanding of the underlying assumptions within epidemiology. An instrument is a variable that predicts exposure but, conditional on exposure, shows no independent association with the outcome. Random assignment in trials is an example of what would be expected to be an ideal instrument. Instruments can also be found in observational settings with a naturally varying phenomenon, such as geographical variation, physical distance to a facility, or physician preference. The fourth identifying assumption (monotonicity) has received less attention but is essential for the generalisability of estimated effects. The instrument identifies the group of compliers, in whom exposure is pseudo-randomly assigned, leading to exchangeability with regard to unmeasured confounders. The underlying assumptions can only partially be tested empirically and require subject-matter knowledge. Future studies employing instruments should carefully seek to validate all four assumptions, possibly drawing on parallels to randomisation.
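The parallel with randomisation can be made concrete with a small simulation. The sketch below assumes a binary instrument Z, a binary exposure X, and an unmeasured confounder U. U also determines each person's compliance type: always-taker, never-taker or complier. There are no defiers, so monotonicity holds.

The Wald estimator is the instrument's effect on the outcome divided by its effect on the exposure. It recovers the causal effect among compliers, whereas the naive exposed-versus-unexposed contrast does not. All parameter values are arbitrary.

```python
import numpy as np


def simulate(n, effect=2.0, seed=1):
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)              # unmeasured confounder
    z = rng.integers(0, 2, size=n)      # instrument: as-good-as-random assignment
    # Compliance type depends on U; there are no defiers (monotonicity)
    always = u > 1.0
    never = u < -1.0
    x = np.where(always, 1, np.where(never, 0, z))
    y = effect * x + 3.0 * u + rng.normal(size=n)  # U confounds X and Y
    return z, x, y


def wald_estimate(z, x, y):
    """(E[Y|Z=1] - E[Y|Z=0]) / (E[X|Z=1] - E[X|Z=0])."""
    itt_y = y[z == 1].mean() - y[z == 0].mean()
    itt_x = x[z == 1].mean() - x[z == 0].mean()
    return itt_y / itt_x


def naive_estimate(x, y):
    """Exposed minus unexposed mean outcome; biased by U."""
    return y[x == 1].mean() - y[x == 0].mean()
```

U is never used in estimation. The Wald estimate stays close to the true effect only because Z is as-good-as-random and affects Y solely through X. Adding Z directly to Y in the simulation, for example, would violate the exclusion restriction and bias it.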
Title: An introduction to instrumental variable assumptions, validation and estimation | Journal: Emerging Themes in Epidemiology 15:1
Pub Date: 2017-12-19 | eCollection Date: 2017-01-01 | DOI: 10.1186/s12982-017-0068-0
R P Cornish, J Macleod, J R Carpenter, K Tilling
Background: When an outcome variable is missing not at random (MNAR: probability of missingness depends on outcome values), estimates of the effect of an exposure on this outcome are often biased. We investigated the extent of this bias and examined whether the bias can be reduced through incorporating proxy outcomes obtained through linkage to administrative data as auxiliary variables in multiple imputation (MI).
Methods: Using data from the Avon Longitudinal Study of Parents and Children (ALSPAC), we estimated the association between breastfeeding and IQ (continuous outcome), incorporating linked attainment data (proxies for IQ) as auxiliary variables in MI models. Simulation studies explored the impact of four factors:

- the proportion of missing data (from 20% to 80%);
- the correlation between the outcome and its proxy (0.1 to 0.9);
- the strength of the missing data mechanism;
- the use of an incomplete proxy variable.
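The benefit of a linked proxy can be illustrated with a small simulation in the spirit of this design, though not a reproduction of it. The sketch makes these assumptions:

- a binary exposure standing in for breastfeeding;
- a normal outcome with a true exposure effect of 3, standing in for IQ;
- a fully observed proxy correlated about 0.9 with the outcome;
- dropout that is more likely at low outcome values (MNAR).

It compares the complete-case estimate with a normal-linear multiple imputation that includes the proxy as an auxiliary variable, pooled with Rubin's rules. All names and parameter values are hypothetical.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients, residual variance and (X'X)^-1."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, sigma2, np.linalg.inv(X.T @ X)


def simulate(n, rng, proxy_noise_sd=7.3):
    x = rng.binomial(1, 0.5, n).astype(float)     # exposure
    y = 100 + 3 * x + rng.normal(0, 15, n)        # outcome; true exposure effect = 3
    proxy = y + rng.normal(0, proxy_noise_sd, n)  # linked proxy, corr(y, proxy) about 0.9
    p_obs = 1 / (1 + np.exp(-(y - 100) / 10))     # MNAR: low scorers drop out more often
    observed = rng.random(n) < p_obs
    return x, y, proxy, observed


def complete_case(x, y, observed):
    X = np.column_stack([np.ones(observed.sum()), x[observed]])
    return ols(X, y[observed])[0][1]


def mi_with_proxy(x, y, proxy, observed, m, rng):
    design = np.column_stack([np.ones_like(x), x, proxy])  # imputation model: y ~ x + proxy
    beta, sigma2, xtx_inv = ols(design[observed], y[observed])
    dof = observed.sum() - design.shape[1]
    miss = ~observed
    analysis_X = np.column_stack([np.ones_like(x), x])      # analysis model: y ~ x
    estimates, variances = [], []
    for _ in range(m):
        # Draw sigma^2 and beta from their approximate posterior (proper imputation)
        s2 = sigma2 * dof / rng.chisquare(dof)
        b = rng.multivariate_normal(beta, s2 * xtx_inv)
        y_completed = y.copy()  # values at missing positions are overwritten below
        y_completed[miss] = design[miss] @ b + rng.normal(0, np.sqrt(s2), miss.sum())
        coef, s2_a, inv_a = ols(analysis_X, y_completed)
        estimates.append(coef[1])
        variances.append(s2_a * inv_a[1, 1])
    # Rubin's rules: total variance = within + (1 + 1/m) * between
    q_bar = np.mean(estimates)
    total_var = np.mean(variances) + (1 + 1 / m) * np.var(estimates, ddof=1)
    return q_bar, np.sqrt(total_var)
```

Because dropout here depends on the outcome itself, the complete-case slope is attenuated. The proxy carries most of the missing outcome's information into the imputation model, which pulls the pooled estimate back toward the true effect.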
Results: Incorporating a linked proxy for the missing outcome as an auxiliary variable reduced bias and increased efficiency in all scenarios, even when 80% of the outcome was missing. Using an incomplete proxy was similarly beneficial. High correlations (> 0.5) between the outcome and its proxy substantially reduced the missing information. Consistent with this, ALSPAC analysis showed inclusion of a proxy reduced bias and improved efficiency. Gains with additional proxies were modest.
Conclusions: In longitudinal studies with loss to follow-up, proxies for the study outcome can be obtained via linkage to external data sources. Incorporating them as auxiliary variables in MI models can give practically important bias reduction and efficiency gains when the study outcome is MNAR.
Citation: R P Cornish, J Macleod, J R Carpenter, K Tilling. Multiple imputation using linked proxy outcome data resulted in important bias reduction and efficiency gains: a simulation study. Emerging Themes in Epidemiology 2017;14:14 (published 2017-12-19). DOI: 10.1186/s12982-017-0068-0
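The idea in the Conclusions above can be illustrated with a small simulation. This is a simplified stand-in, not the authors' ALSPAC analysis or their MI software. It assumes a standard-normal outcome, a fully observed linked proxy with correlation 0.9, missingness that depends on the outcome itself (MNAR), and a bootstrap-based regression imputation pooled with Rubin's rules for the point estimate. The helpers `simulate`, `ols` and `mi_mean_with_proxy` are hypothetical names. Requires Python 3.8+.

```python
import math
import random
import statistics


def simulate(n, rho, seed):
    """Simulate an outcome y, a fully observed linked proxy z with
    corr(y, z) = rho, and an MNAR response indicator: the larger y is,
    the more likely y is to be missing."""
    rng = random.Random(seed)
    y, z, observed = [], [], []
    for _ in range(n):
        yi = rng.gauss(0.0, 1.0)
        zi = rho * yi + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        p_missing = 1.0 / (1.0 + math.exp(-1.5 * yi))  # depends on y itself
        y.append(yi)
        z.append(zi)
        observed.append(rng.random() >= p_missing)
    return y, z, observed


def ols(x, y):
    """Simple linear regression y = a + b*x; returns (a, b, residual SD)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sd = math.sqrt(sum(r * r for r in resid) / (len(x) - 2))
    return a, b, sd


def mi_mean_with_proxy(y, z, observed, m=10, seed=1):
    """Multiple imputation of missing y from the proxy z. Parameter
    uncertainty comes from bootstrapping the complete cases; residual
    noise is added to each imputed value."""
    rng = random.Random(seed)
    complete = [(zi, yi) for zi, yi, o in zip(z, y, observed) if o]
    estimates = []
    for _ in range(m):
        boot = [rng.choice(complete) for _ in complete]
        a, b, sd = ols([p[0] for p in boot], [p[1] for p in boot])
        completed = [yi if o else a + b * zi + rng.gauss(0.0, sd)
                     for yi, zi, o in zip(y, z, observed)]
        estimates.append(statistics.fmean(completed))
    # Rubin's rules: the pooled point estimate is the mean over imputations.
    return statistics.fmean(estimates)


y, z, obs = simulate(n=5000, rho=0.9, seed=42)
true_mean = statistics.fmean(y)  # oracle: uses the values that went missing
cc_mean = statistics.fmean([yi for yi, o in zip(y, obs) if o])
mi_mean = mi_mean_with_proxy(y, z, obs, m=10)
```

In this setup the complete-case mean is pulled well below the full-data mean, and imputing from the proxy moves the estimate much closer, though not all the way, because the imputation model is itself fitted on an outcome-selected sample. That matches "bias reduction" rather than bias removal. Pooling variances with Rubin's rules is omitted for brevity.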
Pub Date: 2017-11-14; eCollection Date: 2017-01-01; DOI: 10.1186/s12982-017-0067-1
Dan Li, Ruth Keogh, John P Clancy, Rhonda D Szczesniak
Background: Epidemiologic surveillance of lung function is key to clinical care of individuals with cystic fibrosis, but lung function decline is nonlinear and often impacted by acute respiratory events known as pulmonary exacerbations. Statistical models are needed that simultaneously estimate lung function decline and the risk of pulmonary exacerbation onset, in order to identify relevant predictors of declining lung function and to understand how these associations could be used to predict the onset of pulmonary exacerbations.
Methods: Using longitudinal lung function (FEV1) measurements and time-to-event data on pulmonary exacerbations from individuals in the United States Cystic Fibrosis Registry, we implemented a flexible semiparametric joint model consisting of a mixed-effects submodel with regression splines to fit repeated FEV1 measurements and a time-to-event submodel for possibly censored data on pulmonary exacerbations. We contrasted this approach with methods currently used in epidemiological studies and highlighted its clinical implications.
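For readers unfamiliar with shared-parameter joint models, a generic form is sketched below. It follows the standard formulation in the joint-modeling literature and is an assumption about, not a transcription of, the authors' specification. Here B_k(t) are regression-spline basis functions with fixed coefficients theta_k, b_i are subject-specific random effects, x_i and w_i are (time-fixed) covariates, and alpha_1 and alpha_2 link the exacerbation hazard to the current FEV1 level and its rate of change.

```latex
\begin{aligned}
y_i(t) &= m_i(t) + \varepsilon_i(t), \qquad \varepsilon_i(t) \sim N(0, \sigma^2),\\
m_i(t) &= \mathbf{x}_i^\top \boldsymbol\beta + \sum_{k=1}^{K} (\theta_k + b_{ik})\, B_k(t),
  \qquad \mathbf{b}_i \sim N(\mathbf{0}, \mathbf{D}),\\
m_i'(t) &= \sum_{k=1}^{K} (\theta_k + b_{ik})\, B_k'(t),\\
h_i(t) &= h_0(t) \exp\left\{ \boldsymbol\gamma^\top \mathbf{w}_i + \alpha_1\, m_i(t) + \alpha_2\, m_i'(t) \right\}.
\end{aligned}
```

Under this form, the hazard ratio per one-standard-deviation increase in the current FEV1 value is exp(alpha_1 * s), where s is that standard deviation; the same holds for alpha_2 and the rate of change.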
Results: The semiparametric joint model had the best fit of all models examined based on deviance information criterion. Higher starting FEV1 implied more rapid lung function decline in both separate and joint models; however, individualized risk estimates for pulmonary exacerbation differed depending upon model type. Based on shared parameter estimates from the joint model, which accounts for the nonlinear FEV1 trajectory, patients with more positive rates of change were less likely to experience a pulmonary exacerbation (HR per one standard deviation increase in FEV1 rate of change = 0.566, 95% CI 0.516-0.619), and having higher absolute FEV1 also corresponded to lower risk of having a pulmonary exacerbation (HR per one standard deviation increase in FEV1 = 0.856, 95% CI 0.781-0.937). At the population level, both submodels indicated significant effects of birth cohort, socioeconomic status and respiratory infections on FEV1 decline, as well as significant effects of gender, socioeconomic status and birth cohort on pulmonary exacerbation risk.
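The hazard ratios above are expressed per one standard deviation. Assuming the log-linear hazard of a proportional-hazards time-to-event submodel, two quick calculations help interpret them. First, the standard error of log(HR) can be backed out of a reported 95% CI. Second, the HR for a k-SD change is HR raised to the power k. Only the HRs and CIs come from the abstract; the helper names are hypothetical.

```python
import math

Z95 = 1.959964  # 97.5th percentile of the standard normal distribution


def log_se_from_ci(lower, upper, z=Z95):
    """Standard error of log(HR) recovered from a 95% CI that is
    symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def hr_for_k_sd(hr_per_sd, k):
    """Under a log-linear hazard, a k-SD change multiplies log(HR) by k."""
    return hr_per_sd ** k


# Values reported in the abstract (HR per 1 SD increase).
slope_hr, slope_ci = 0.566, (0.516, 0.619)  # FEV1 rate of change
level_hr, level_ci = 0.856, (0.781, 0.937)  # absolute FEV1

slope_se = log_se_from_ci(*slope_ci)
level_se = log_se_from_ci(*level_ci)
slope_hr_2sd = hr_for_k_sd(slope_hr, 2)  # HR for a 2-SD higher rate of change
```

For example, a rate of change 2 SD higher corresponds to an HR of about 0.566^2 = 0.32 under this model. Both reported CIs imply a log-scale standard error of roughly 0.046.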
Conclusions: Through a flexible joint-modeling approach, we provide a means to simultaneously estimate lung function trajectories and the risk of pulmonary exacerbations for individual patients; we demonstrate how this approach offers additional insights into the clinical course of cystic fibrosis that were not possible using conventional approaches.
Citation: Dan Li, Ruth Keogh, John P Clancy, Rhonda D Szczesniak. Flexible semiparametric joint modeling: an application to estimate individual lung function decline and risk of pulmonary exacerbations in cystic fibrosis. Emerging Themes in Epidemiology 2017;14:13 (published 2017-11-14). DOI: 10.1186/s12982-017-0067-1
Pub Date: 2017-09-21; eCollection Date: 2017-01-01; DOI: 10.1186/s12982-017-0066-2
Christopher Jarvis, Gian Luca Di Tanna, Daniel Lewis, Neal Alexander, W John Edmunds
Background: Cluster randomised trials (CRTs) often use geographical areas as the unit of randomisation; however, explicit consideration of the location and spatial distribution of observations is rare. In many trials the location of participants will be of little importance, but in some, especially trials of interventions against infectious diseases, spillover effects arising from participants being located close together may affect trial results. This review aims to identify spatial analysis methods used in CRTs and to improve understanding of the impact of spatial effects on trial results.
Methods: We conducted a systematic review of CRTs containing spatial methods, defined as methods that account for the structure, location, or relative distances between observations. We searched three databases: Ovid/Medline, PubMed, and Web of Science. Spatial methods were categorised, and details of the impact of spatial effects on trial results were recorded.
Results: We identified ten papers, comprising thirteen trials, that met the inclusion criteria. Existing approaches fell into two categories: spatial variables and spatial modelling. The spatial variable approach was the most common and involved standard statistical analysis of distance measurements. Spatial modelling is a more sophisticated approach that incorporates the spatial structure of the data within a random effects model. Studies tended to demonstrate the importance of accounting for the location and distribution of observations when estimating unbiased effects.
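The two categories described above can be illustrated side by side. This is a hypothetical example, not taken from any reviewed trial. `distance_to_nearest_intervention` builds a spatial variable: the distance in km from each control participant to the nearest intervention-arm participant, which would then enter a standard regression as a covariate. `exponential_covariance` builds the distance-decaying covariance matrix that a spatial random-effects model might assume. The haversine formula, the mean Earth radius of 6371 km, the exponential covariance form and the toy coordinates are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_nearest_intervention(participants):
    """Spatial-variable approach: for each control participant, the distance
    (km) to the nearest participant in the intervention arm."""
    treated = [(p["lat"], p["lon"]) for p in participants
               if p["arm"] == "intervention"]
    return {
        p["id"]: min(haversine_km(p["lat"], p["lon"], lat, lon)
                     for lat, lon in treated)
        for p in participants if p["arm"] == "control"
    }


def exponential_covariance(coords, sigma2, phi):
    """Spatial-modelling approach: covariance of spatial random effects that
    decays with distance, C_ij = sigma2 * exp(-d_ij / phi)."""
    return [[sigma2 * math.exp(-haversine_km(*a, *b) / phi) for b in coords]
            for a in coords]


# Toy data (hypothetical coordinates).
participants = [
    {"id": "A", "arm": "intervention", "lat": 0.0, "lon": 0.0},
    {"id": "B", "arm": "control", "lat": 0.0, "lon": 1.0},
    {"id": "C", "arm": "control", "lat": 0.0, "lon": 0.1},
]
dist = distance_to_nearest_intervention(participants)
cov = exponential_covariance([(p["lat"], p["lon"]) for p in participants],
                             sigma2=1.0, phi=10.0)
```

The distance variable is simple and fits into any regression, whereas the covariance matrix has to be embedded in a random-effects model whose range parameter phi is usually estimated rather than fixed as it is here.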
Conclusions: There have been a few attempts to control and estimate spatial effects within the context of human CRTs, but our overall understanding is limited. Although spatial effects may bias trial results, their consideration was usually a supplementary, rather than primary analysis. Further work is required to evaluate and develop the spatial methodologies relevant to a range of CRTs.
Citation: Christopher Jarvis, Gian Luca Di Tanna, Daniel Lewis, Neal Alexander, W John Edmunds. Spatial analysis of cluster randomised trials: a systematic review of analysis methods. Emerging Themes in Epidemiology 2017;14:12 (published 2017-09-21). DOI: 10.1186/s12982-017-0066-2
Background: In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods.
Main text: We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees.
Conclusions: Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
Citation: Ashwini Venkatasubramaniam, Julian Wolfson, Nathan Mitchell, Timothy Barnes, Meghan JaKa, Simone French. Decision trees in epidemiological research. Emerging Themes in Epidemiology 2017;14:11 (published 2017-09-20). DOI: 10.1186/s12982-017-0064-4