Emerging Themes in Epidemiology: latest articles, page 5

Role of survey response rates on valid inference: an application to HIV prevalence estimates.

IF 3.6 Q1 PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH

Emerging Themes in Epidemiology

Pub Date : 2018-03-05 eCollection Date: 2018-01-01 DOI: 10.1186/s12982-018-0074-x

Miguel Marino, Marcello Pagano

Background: Nationally representative surveys suggest that females have a higher prevalence of HIV than males in most African countries. Unfortunately, these results are based on surveys with non-ignorable missing data. This study evaluates the impact that differential survey nonresponse rates between males and females can have on the point estimate of the HIV prevalence ratio between these two groups.

Methods: We study 29 Demographic and Health Surveys (DHS) from 2001 to 2010. Instead of employing the commonly used multiple imputation models with a Missing at Random assumption that may not hold in this setting, we assess the effect of ignoring the information contained in the missing HIV results for males and females through three proposed statistical measures. These measures can be used in any setting where the interest is in comparing the prevalence of a disease between two groups. The proposed measures do not rely on parametric models and can be implemented by researchers of any level. They are: (1) an upper bound on the potential bias of the usual practice of reporting HIV prevalence estimates that ignore subjects with missing HIV outcomes; (2) plausible range intervals that account for nonresponse without any additional parametric modeling assumptions; and (3) prevalence ratio inflation factors to correct the point estimate of the HIV prevalence ratio, if estimates of nonresponders' HIV prevalences were known.

Results: In 86% of countries, males have higher upper bounds of HIV prevalence than females; this is consonant with males possibly having higher infection rates than females. Additionally, 74% of surveys have a plausible range for the prevalence ratio that crosses 1.0, suggesting plausible equivalence between male and female HIV prevalences.

Conclusions: It is quite reasonable to conclude that there is so much DHS nonresponse on the HIV status question that the existing data could plausibly have been generated by a situation in which the virus is equally distributed between the sexes.
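The following minimal Python sketch makes the three measures concrete under stated assumptions. It uses worst-case (Manski-style) bounds, in which every nonresponder is assumed to be either HIV-negative or HIV-positive, and the paper's exact formulas may differ. The class `SexGroup`, its methods and the counts in the example are hypothetical and are not DHS data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SexGroup:
    """Hypothetical summary counts for one sex in one survey."""
    n_eligible: int   # respondents eligible for HIV testing
    n_tested: int     # respondents with an observed HIV result
    n_positive: int   # HIV-positive among those tested

    @property
    def response_rate(self) -> float:
        return self.n_tested / self.n_eligible

    @property
    def observed_prevalence(self) -> float:
        # The usual complete-case estimate, which ignores nonresponders.
        return self.n_positive / self.n_tested

    def bias_bound(self) -> float:
        # True prevalence = r*p + (1 - r)*q for some nonresponder prevalence q in [0, 1],
        # so |p - true| = (1 - r)*|p - q| <= (1 - r)*max(p, 1 - p).
        r, p = self.response_rate, self.observed_prevalence
        return (1 - r) * max(p, 1 - p)

    def prevalence_bounds(self) -> tuple[float, float]:
        # Worst case: every nonresponder negative (lower) or positive (upper).
        n_missing = self.n_eligible - self.n_tested
        lower = self.n_positive / self.n_eligible
        upper = (self.n_positive + n_missing) / self.n_eligible
        return lower, upper

    def corrected_prevalence(self, q: float) -> float:
        # Prevalence if the nonresponders' prevalence q were known.
        r = self.response_rate
        return r * self.observed_prevalence + (1 - r) * q


def ratio_range(females: SexGroup, males: SexGroup) -> tuple[float, float]:
    """Plausible range for the female-to-male prevalence ratio (assumes male lower bound > 0)."""
    f_lo, f_hi = females.prevalence_bounds()
    m_lo, m_hi = males.prevalence_bounds()
    return f_lo / m_hi, f_hi / m_lo


def inflation_factor(females: SexGroup, males: SexGroup, q_f: float, q_m: float) -> float:
    """Multiplier taking the observed prevalence ratio to the corrected one."""
    observed = females.observed_prevalence / males.observed_prevalence
    corrected = females.corrected_prevalence(q_f) / males.corrected_prevalence(q_m)
    return corrected / observed


# Hypothetical counts, for illustration only.
females = SexGroup(n_eligible=1000, n_tested=900, n_positive=90)
males = SexGroup(n_eligible=1000, n_tested=700, n_positive=49)
print(ratio_range(females, males))
```

With these made-up counts the male response rate is lower than the female rate. The male prevalence bounds are therefore wide, and the plausible range for the ratio crosses 1.0 even though the observed ratio is about 1.43.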

Volume: 15 Pages: 6 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5839032/pdf/

Citations: 0

Modelling fertility in rural South Africa with combined nonlinear parametric and semi-parametric methods.

IF 2.3 Q1 PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH

Emerging Themes in Epidemiology

Pub Date : 2018-03-02 eCollection Date: 2018-01-01 DOI: 10.1186/s12982-018-0073-y

Robert W Eyre, Thomas House, F Xavier Gómez-Olivé, Frances E Griffiths

Background: Central to the study of populations, and therefore to the analysis of the development of countries undergoing major transitions, is the calculation of fertility patterns and their dependence on different variables such as age, education, and socio-economic status. Most epidemiological research on these matters relies on the often unjustified assumption of (generalised) linearity, or alternatively makes a parametric assumption (e.g. for age patterns).

Methods: We consider nonlinearity of fertility in the covariates by combining an established nonlinear parametric model for fertility over age with nonlinear modelling of fertility over other covariates. For the latter, we use the semi-parametric method of Gaussian process regression, a popular methodology in many fields including machine learning, computer science, and systems biology. We applied the method to data from the Agincourt Health and Socio-Demographic Surveillance System, which has conducted annual census rounds in a poor rural region of South Africa since 1992, to analyse fertility patterns over age and socio-economic status.

Results: We capture a previously established age pattern of fertility, whilst being able to model the relationship between fertility and socio-economic status more robustly, without unjustified a priori assumptions of linearity. Peak fertility over age is shown to be increasing over time, as is adolescent fertility; fertility later in life, by contrast, is generally decreasing over time.

Conclusions: Combining Gaussian process regression with nonlinear parametric modelling of fertility over age allowed for the incorporation of further covariates into the analysis without needing to assume a linear relationship. This enabled us to provide further insights into the fertility patterns of the Agincourt study area, in particular the interaction between age and socio-economic status.
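The abstract names Gaussian process regression as the tool for the non-age covariates. As a rough illustration only, the sketch below fits a zero-mean GP with a squared-exponential kernel to a hypothetical socio-economic score using NumPy. It is not the authors' model. It omits the parametric age component, fixes the kernel hyperparameters by hand instead of estimating them, and uses simulated data. The function names are hypothetical.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float, variance: float) -> np.ndarray:
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, length_scale=1.0, variance=1.0, noise=0.1):
    """Posterior mean and covariance of a zero-mean GP with Gaussian observation noise."""
    k = rbf_kernel(x_train, x_train, length_scale, variance) + noise**2 * np.eye(len(x_train))
    k_star = rbf_kernel(x_train, x_test, length_scale, variance)
    k_star_star = rbf_kernel(x_test, x_test, length_scale, variance)
    chol = np.linalg.cholesky(k)  # k = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_star.T @ alpha
    v = np.linalg.solve(chol, k_star)
    cov = k_star_star - v.T @ v
    return mean, cov


# Hypothetical data: a socio-economic score and a smooth, nonlinear response
# (standing in for, e.g., a log-fertility residual after a parametric age curve).
rng = np.random.default_rng(0)
ses = np.linspace(0.0, 5.0, 20)
response = np.sin(ses) + rng.normal(scale=0.05, size=ses.size)
grid = np.linspace(0.0, 5.0, 50)
mean, cov = gp_posterior(ses, response, grid, length_scale=1.0, variance=1.0, noise=0.05)
sd = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # clip tiny negative round-off
```

The posterior mean follows the nonlinear shape without any linearity assumption. The pointwise standard deviation `sd` shrinks near observed scores, which is the uncertainty information a GP adds over a fixed curve.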

Volume: 15 Pages: 5

Citations: 1

Novel metrics for growth model selection.

IF 2.3 Q1 PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH

Emerging Themes in Epidemiology

Pub Date : 2018-02-23 eCollection Date: 2018-01-01 DOI: 10.1186/s12982-018-0072-z

Matthew R Grigsby, Junrui Di, Andrew Leroux, Vadim Zipunnikov, Luo Xiao, Ciprian Crainiceanu, William Checkley

Background: The literature on the statistical modeling of childhood growth data involves a diverse set of potential models from which investigators can choose. However, the lack of a comprehensive framework for comparing non-nested models makes it difficult to assess model performance. This paper proposes a framework for comparing non-nested growth models using novel metrics of predictive accuracy based on modifications of the mean squared error criterion.

Methods: Three metrics were created: normalized, age-adjusted, and weighted mean squared error (MSE). Predictive performance metrics were used to compare linear mixed effects models and functional regression models. Prediction accuracy was assessed by partitioning the observed data into training and test datasets. This partitioning was constructed to assess prediction accuracy backward (i.e., early growth), forward (i.e., late growth), in-range, and for new individuals. Analyses were done with height measurements from 215 Peruvian children, with data spanning from near birth to 2 years of age.

Results: Functional models outperformed linear mixed effects models in all scenarios tested. In particular, prediction errors for functional concurrent regression (FCR) and functional principal component analysis models were approximately 6% lower when compared to linear mixed effects models. When we weighted subject-specific MSEs according to subject-specific growth rates during infancy, we found that FCR was the best performer in all scenarios.

Conclusion: With this novel approach, we can quantitatively compare non-nested models and weight subgroups of interest to select the best performing growth model for a particular application or problem at hand.
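The abstract names the three metrics but not their formulas. The sketch below shows one plausible reading of each, and every definition in it is an assumption. Normalized MSE divides by the variance of the observed values. Age-adjusted MSE averages the per-age-bin MSEs so that sparsely measured ages count equally. Weighted MSE averages subject-specific MSEs with user-supplied weights, for example infant growth rates. The function names and toy values are hypothetical.

```python
import numpy as np


def normalized_mse(y_true, y_pred):
    """MSE scaled by the variance of the observed outcome (assumed definition)."""
    return float(np.mean((y_true - y_pred) ** 2) / np.var(y_true))


def age_adjusted_mse(y_true, y_pred, age, bin_edges):
    """Mean of per-age-bin MSEs, so each occupied bin counts equally (assumed definition)."""
    bins = np.digitize(age, bin_edges)
    per_bin = [np.mean((y_true[bins == b] - y_pred[bins == b]) ** 2) for b in np.unique(bins)]
    return float(np.mean(per_bin))


def subject_mse(y_true, y_pred, subject_ids):
    """MSE computed separately for each subject."""
    ids = np.unique(subject_ids)
    mses = np.array([np.mean((y_true[subject_ids == i] - y_pred[subject_ids == i]) ** 2) for i in ids])
    return ids, mses


def weighted_mse(y_true, y_pred, subject_ids, weights):
    """Weighted mean of subject-specific MSEs; weights maps subject id -> weight."""
    ids, mses = subject_mse(y_true, y_pred, subject_ids)
    w = np.array([weights[i] for i in ids], dtype=float)
    return float(np.sum(w * mses) / np.sum(w))


# Toy example: two children, two height measurements each (values are made up).
y_true = np.array([50.0, 60.0, 52.0, 64.0])
y_pred = np.array([50.0, 60.0, 52.0, 62.0])
subject_ids = np.array([0, 0, 1, 1])
age_months = np.array([0.0, 3.0, 0.0, 3.0])
print(normalized_mse(y_true, y_pred), age_adjusted_mse(y_true, y_pred, age_months, [1.5]))
print(weighted_mse(y_true, y_pred, subject_ids, {0: 1.0, 1: 3.0}))
```

Because the metrics only need held-out predictions and observed values, they can rank non-nested models, such as a mixed model and a functional regression, on the same test split.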

Volume: 15 Pages: 4 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5824542/pdf/

Citations: 0

Effect of correcting for gestational age at birth on population prevalence of early childhood undernutrition.

IF 2.3 Q1 PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH

Emerging Themes in Epidemiology

Pub Date : 2018-02-06 eCollection Date: 2018-01-01 DOI: 10.1186/s12982-018-0070-1

Nandita Perumal, Daniel E Roth, Johnna Perdrizet, Aluísio J D Barros, Iná S Santos, Alicia Matijasevich, Diego G Bassani

Background: Postmenstrual and/or gestational age (GA)-corrected age (CA) is required to apply child growth standards to children born preterm (< 37 weeks GA). Yet, CA is rarely used in epidemiologic studies in low- and middle-income countries (LMICs), which may bias population estimates of childhood undernutrition. To evaluate the effect of accounting for GA in the application of growth standards, we compared two strategies in the 2004 Pelotas (Brazil) Birth Cohort. The 'CA' strategy used GA-specific standards at birth (INTERGROWTH-21st newborn size standards) together with CA for preterm-born children when applying the World Health Organization Child Growth Standards postnatally; the comparison strategy used postnatal age for all children. Under each strategy, we estimated mean length-for-age (LAZ) and weight-for-age (WAZ) z scores at 0, 3, 12, 24, and 48 months of age.

Results: At birth (n = 4066), using CA versus postnatal age, mean LAZ was higher (mean ± SD: -0.36 ± 1.19 versus -0.67 ± 1.32) and the prevalence of stunting (LAZ < -2) was lower (8.3 versus 11.6%). The odds ratio (OR) and population attributable risk (PAR) of stunting due to preterm birth were attenuated, and inferences changed, using CA versus postnatal age at birth [OR: 1.32 (95% confidence interval (CI) 0.95, 1.82) vs 14.7 (95% CI 11.7, 18.4); PAR: 3.1 vs 42.9%]; the differences in inferences persisted at 3 months. At 12, 24, and 48 months, preterm birth was associated with stunting, but ORs/PARs remained attenuated using CA compared with postnatal age. Findings were similar for weight-for-age z scores.

Conclusions: Population-based epidemiologic studies in LMICs in which GA is unused or unavailable may overestimate the prevalence of early childhood undernutrition and inflate the fraction of undernutrition attributable to preterm birth.
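The corrected-age strategy and the OR/PAR comparisons rest on short calculations, sketched below. The sketch assumes the common convention that corrected age equals postnatal age minus the shortfall from 40 weeks (280 days) of gestation, applied only to preterm births (< 37 weeks). It uses a Wald confidence interval for the odds ratio and Levin's formula, computed from the risk ratio, for the population attributable risk. The paper may have used different estimators. The function names and 2x2 counts are hypothetical.

```python
import math

TERM_DAYS = 280            # 40 weeks, the usual reference for correction
PRETERM_CUTOFF_DAYS = 259  # 37 weeks


def corrected_age_days(postnatal_age_days: float, ga_days: float) -> float:
    """Subtract the gestational shortfall from 40 weeks, for preterm births only."""
    if ga_days < PRETERM_CUTOFF_DAYS:
        return postnatal_age_days - (TERM_DAYS - ga_days)
    return postnatal_age_days


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """OR with Wald 95% CI from a 2x2 table.

    a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - z * se_log), math.exp(math.log(or_) + z * se_log)


def population_attributable_risk(a: int, b: int, c: int, d: int) -> float:
    """Levin's PAR = p_e (RR - 1) / (1 + p_e (RR - 1)), with p_e the exposed proportion."""
    rr = (a / (a + b)) / (c / (c + d))
    p_e = (a + b) / (a + b + c + d)
    excess = p_e * (rr - 1)
    return excess / (1 + excess)


# Hypothetical example: a child born at 32 weeks, measured 90 days after birth.
print(corrected_age_days(90, 32 * 7))  # 90 - (280 - 224) = 34
print(odds_ratio_ci(20, 80, 10, 90))
print(population_attributable_risk(20, 80, 10, 90))
```

Here exposure is preterm birth and the outcome is stunting. Correcting age moves preterm children to a younger reference age, which explains why the associations shrink most strongly at birth.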

Volume: 15 Pages: 3 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5799899/pdf/

Citations: 0

Contextual factors in maternal and newborn health evaluation: a protocol applied in Nigeria, India and Ethiopia.

IF 2.3 Q1 PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH

Emerging Themes in Epidemiology

Pub Date : 2018-02-06 eCollection Date: 2018-01-01 DOI: 10.1186/s12982-018-0071-0

Kate Sabot, Tanya Marchant, Neil Spicer, Della Berhanu, Meenakshi Gautham, Nasir Umar, Joanna Schellenberg

Background: Understanding the context of a health programme is important in interpreting evaluation findings and in considering the external validity for other settings. Public health researchers can be imprecise and inconsistent in their usage of the word "context" and its application to their work. This paper presents an approach to defining context, to capturing relevant contextual information and to using such information to help interpret findings from the perspective of a research group evaluating the effect of diverse innovations on coverage of evidence-based, life-saving interventions for maternal and newborn health in Ethiopia, Nigeria, and India.

Methods: We define "context" as the background environment or setting of any programme, and "contextual factors" as those elements of context that could affect implementation of a programme. Through a structured, consultative process, contextual factors were identified while trying to strike a balance between comprehensiveness and feasibility. Thematic areas included demographics and socio-economics, epidemiological profile, health systems and service uptake, infrastructure, education, environment, politics, policy and governance. We outline an approach for capturing and using contextual factors while maximizing use of existing data. Methods include desk reviews, secondary data extraction and key informant interviews. Outputs include databases of contextual factors and summaries of existing maternal and newborn health policies and their implementation. Use of contextual data will be qualitative in nature and may assist in interpreting findings in both quantitative and qualitative aspects of programme evaluation.

Discussion: Applying this approach was more resource intensive than expected, in part because routinely available information was not consistently available across settings and more primary data collection was required than anticipated. Data was used only minimally, partly due to a lack of evaluation results that needed further explanation, but also because contextual data was not available for the precise units of analysis or time periods of interest. We would advise others to consider integrating contextual factors within other data collection activities, and to conduct regular reviews of maternal and newborn health policies. This approach and the learnings from its application could help inform the development of guidelines for the collection and use of contextual factors in public health evaluation.

Volume: 15 Pages: 2

Citations: 3

An introduction to instrumental variable assumptions, validation and estimation.

IF 2.3 Q1 PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH

Emerging Themes in Epidemiology

Pub Date : 2018-01-22 eCollection Date: 2018-01-01 DOI: 10.1186/s12982-018-0069-7

Mette Lise Lousdal

The instrumental variable method has been employed within economics to infer causality in the presence of unmeasured confounding. Emphasising the parallels to randomisation may increase understanding of the underlying assumptions within epidemiology. An instrument is a variable that predicts exposure but, conditional on exposure, shows no independent association with the outcome. Random assignment in trials is an example of an ideal instrument, but instruments can also be found in observational settings with a naturally varying phenomenon, e.g. geographical variation, physical distance to a facility or physician's preference. The fourth identifying assumption has received less attention, but is essential for the generalisability of estimated effects. The instrument identifies the group of compliers, in which exposure is pseudo-randomly assigned, leading to exchangeability with regard to unmeasured confounders. Underlying assumptions can only partially be tested empirically and require subject-matter knowledge. Future studies employing instruments should carefully seek to validate all four assumptions, possibly drawing on parallels to randomisation.
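A small simulation shows the parallel with randomisation. The sketch below uses NumPy, and all of its coefficients are hypothetical. It generates a binary instrument Z that shifts exposure X, an unmeasured confounder U that affects both X and outcome Y, and no direct path from Z to Y. The naive regression slope is biased by U, while the Wald (ratio) IV estimator, cov(Z, Y) / cov(Z, X), recovers the true effect. The simulation assumes a constant (homogeneous) effect, so it does not show the complier-specific estimand that arises under weaker assumptions.

```python
import numpy as np


def simulate(n: int = 100_000, true_effect: float = 0.5, seed: int = 0):
    """Simulate data satisfying the core IV assumptions (hypothetical coefficients)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                        # unmeasured confounder
    z = rng.binomial(1, 0.5, size=n)              # instrument, independent of u by construction
    x = 0.8 * z + u + rng.normal(size=n)          # relevance: Z shifts exposure
    y = true_effect * x + u + rng.normal(size=n)  # exclusion: Z affects Y only through X
    return z, x, y


def ols_slope(x, y):
    """Naive regression slope of Y on X, confounded by U."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


def wald_iv(z, x, y):
    """Ratio (Wald) IV estimator: effect of Z on Y divided by effect of Z on X."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


z, x, y = simulate()
print("naive:", ols_slope(x, y), "IV:", wald_iv(z, x, y))
```

With a binary instrument, the Wald estimator equals the difference in mean outcome between instrument groups divided by the difference in mean exposure. This is the same ratio used to scale an intention-to-treat effect by the compliance difference in a trial.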

Volume: 15 Pages: 1

Citations: 94

Multiple imputation using linked proxy outcome data resulted in important bias reduction and efficiency gains: a simulation study.

IF 2.3 Q1 PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH

Emerging Themes in Epidemiology

Pub Date : 2017-12-19 eCollection Date: 2017-01-01 DOI: 10.1186/s12982-017-0068-0

R P Cornish, J Macleod, J R Carpenter, K Tilling

Background: When an outcome variable is missing not at random (MNAR: probability of missingness depends on outcome values), estimates of the effect of an exposure on this outcome are often biased. We investigated the extent of this bias and examined whether the bias can be reduced through incorporating proxy outcomes obtained through linkage to administrative data as auxiliary variables in multiple imputation (MI).

Methods: Using data from the Avon Longitudinal Study of Parents and Children (ALSPAC) we estimated the association between breastfeeding and IQ (continuous outcome), incorporating linked attainment data (proxies for IQ) as auxiliary variables in MI models. Simulation studies explored the impact of varying the proportion of missing data (from 20 to 80%), the correlation between the outcome and its proxy (0.1-0.9), the strength of the missing data mechanism, and having a proxy variable that was incomplete.

Results: Incorporating a linked proxy for the missing outcome as an auxiliary variable reduced bias and increased efficiency in all scenarios, even when 80% of the outcome was missing. Using an incomplete proxy was similarly beneficial. High correlations (> 0.5) between the outcome and its proxy substantially reduced the missing information. Consistent with this, ALSPAC analysis showed inclusion of a proxy reduced bias and improved efficiency. Gains with additional proxies were modest.

Conclusions: In longitudinal studies with loss to follow-up, incorporating proxies for this study outcome obtained via linkage to external sources of data as auxiliary variables in MI models can give practically important bias reduction and efficiency gains when the study outcome is MNAR.

Emerging Themes in Epidemiology 14:14 (2017-12-19). DOI: 10.1186/s12982-017-0068-0

Citations: 15

Flexible semiparametric joint modeling: an application to estimate individual lung function decline and risk of pulmonary exacerbations in cystic fibrosis.

IF 2.3 Q1 PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH

Emerging Themes in Epidemiology

Pub Date : 2017-11-14 eCollection Date: 2017-01-01 DOI: 10.1186/s12982-017-0067-1

Dan Li, Ruth Keogh, John P Clancy, Rhonda D Szczesniak

Background: Epidemiologic surveillance of lung function is key to clinical care of individuals with cystic fibrosis, but lung function decline is nonlinear and often impacted by acute respiratory events known as pulmonary exacerbations. Statistical models are needed to simultaneously estimate lung function decline while providing risk estimates for the onset of pulmonary exacerbations, in order to identify relevant predictors of declining lung function and understand how these associations could be used to predict the onset of pulmonary exacerbations.

Methods: Using longitudinal lung function (FEV₁) measurements and time-to-event data on pulmonary exacerbations from individuals in the United States Cystic Fibrosis Registry, we implemented a flexible semiparametric joint model consisting of a mixed-effects submodel with regression splines to fit repeated FEV₁ measurements and a time-to-event submodel for possibly censored data on pulmonary exacerbations. We contrasted this approach with methods currently used in epidemiological studies and highlighted clinical implications.

Results: The semiparametric joint model had the best fit of all models examined based on deviance information criterion. Higher starting FEV₁ implied more rapid lung function decline in both separate and joint models; however, individualized risk estimates for pulmonary exacerbation differed depending upon model type. Based on shared parameter estimates from the joint model, which accounts for the nonlinear FEV₁ trajectory, patients with more positive rates of change were less likely to experience a pulmonary exacerbation (HR per one standard deviation increase in FEV₁ rate of change = 0.566, 95% CI 0.516-0.619), and having higher absolute FEV₁ also corresponded to lower risk of having a pulmonary exacerbation (HR per one standard deviation increase in FEV₁ = 0.856, 95% CI 0.781-0.937). At the population level, both submodels indicated significant effects of birth cohort, socioeconomic status and respiratory infections on FEV₁ decline, as well as significant effects of gender, socioeconomic status and birth cohort on pulmonary exacerbation risk.

Conclusions: Through a flexible joint-modeling approach, we provide a means to simultaneously estimate lung function trajectories and the risk of pulmonary exacerbations for individual patients; we demonstrate how this approach offers additional insights into the clinical course of cystic fibrosis that were not possible using conventional approaches.
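
A generic shared-parameter joint model, stated here as an assumption for illustration rather than as the authors' exact specification, links the exacerbation hazard to the fitted FEV₁ trajectory m_i(t) from the spline-based mixed-effects submodel through its current value and its rate of change, with w_i denoting baseline covariates. Because the linked terms enter the hazard log-linearly, a hazard ratio reported per one standard deviation s of a linked term (coefficient α) compounds multiplicatively for larger contrasts:

$$h_i(t) = h_0(t)\,\exp\{\boldsymbol{\gamma}^{\top}\mathbf{w}_i + \alpha_1\, m_i(t) + \alpha_2\, m_i'(t)\}$$

$$\mathrm{HR}_{k\,\mathrm{SD}} = \exp(k\,\alpha\,s) = \left(\mathrm{HR}_{1\,\mathrm{SD}}\right)^{k}, \qquad 0.566^{2} \approx 0.320, \qquad 0.856^{2} \approx 0.733$$

Read this way, a rate of change two standard deviations higher would correspond to roughly one third of the exacerbation hazard, with the other terms held fixed.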

Emerging Themes in Epidemiology 14:13 (2017-11-14). DOI: 10.1186/s12982-017-0067-1

Citations: 13

Spatial analysis of cluster randomised trials: a systematic review of analysis methods.

IF 2.3 Q1 PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH

Emerging Themes in Epidemiology

Pub Date : 2017-09-21 eCollection Date: 2017-01-01 DOI: 10.1186/s12982-017-0066-2

Christopher Jarvis, Gian Luca Di Tanna, Daniel Lewis, Neal Alexander, W John Edmunds

Background: Cluster randomised trials (CRTs) often use geographical areas as the unit of randomisation; however, explicit consideration of the location and spatial distribution of observations is rare. In many trials, the location of participants will have little importance; however, in some, especially trials of interventions against infectious diseases, spillover effects due to participants being located close together may affect trial results. This review aims to identify spatial analysis methods used in CRTs and improve understanding of the impact of spatial effects on trial results.

Methods: A systematic review of CRTs containing spatial methods, defined as methods that account for the structure, location, or relative distances between observations. We searched three sources: the Ovid/Medline, PubMed, and Web of Science databases. Spatial methods were categorised, and details of the impact of spatial effects on trial results were recorded.

Results: We identified ten papers which met the inclusion criteria, comprising thirteen trials. We found that existing approaches fell into two categories: spatial variables and spatial modelling. The spatial variable approach was most common and involved standard statistical analysis of distance measurements. Spatial modelling is a more sophisticated approach which incorporates the spatial structure of the data within a random effects model. Studies tended to demonstrate the importance of accounting for location and distribution of observations in estimating unbiased effects.

Conclusions: There have been a few attempts to control and estimate spatial effects within the context of human CRTs, but our overall understanding is limited. Although spatial effects may bias trial results, their consideration was usually a supplementary, rather than primary analysis. Further work is required to evaluate and develop the spatial methodologies relevant to a range of CRTs.
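
The spatial variable approach described in the results can be illustrated with a simple spillover check: for each control participant, compute the distance to the nearest intervention-arm participant and compare outcome prevalence across distance bands. This is a hypothetical sketch, not a method taken from the reviewed trials; the `Participant` record, coordinate units and band cutoffs are assumptions, and distances are planar Euclidean on projected coordinates. A real analysis would also need to account for clustering, for example within a random effects model.

```python
import math
from dataclasses import dataclass


@dataclass
class Participant:
    x: float  # hypothetical projected coordinates, e.g. kilometres
    y: float
    arm: str  # "intervention" or "control"
    outcome: int  # 1 = event (e.g. infection), 0 = no event


def distance_to_nearest_intervention(p, participants):
    """Planar Euclidean distance from p to the closest intervention-arm participant."""
    return min(
        math.hypot(p.x - q.x, p.y - q.y)
        for q in participants
        if q.arm == "intervention"
    )


def control_prevalence_by_band(participants, cutoffs):
    """Outcome prevalence among controls, grouped by distance to the intervention arm.

    Each control goes into the smallest cutoff its distance does not exceed;
    controls beyond every cutoff go into an open-ended band keyed math.inf.
    """
    bands = {}
    for p in participants:
        if p.arm != "control":
            continue
        d = distance_to_nearest_intervention(p, participants)
        band = next((c for c in sorted(cutoffs) if d <= c), math.inf)
        bands.setdefault(band, []).append(p.outcome)
    return {band: sum(v) / len(v) for band, v in sorted(bands.items())}
```

If an intervention protects nearby controls, prevalence in the nearest bands would be expected to be lower than in the open-ended far band. The brute-force nearest-neighbour search costs O(n·m); large trials would typically use a spatial index instead.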

Emerging Themes in Epidemiology 14:12 (2017-09-21). DOI: 10.1186/s12982-017-0066-2

Citations: 11

Decision trees in epidemiological research.

IF 2.3 Q1 PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH

Emerging Themes in Epidemiology

Pub Date : 2017-09-20 eCollection Date: 2017-01-01 DOI: 10.1186/s12982-017-0064-4

Ashwini Venkatasubramaniam, Julian Wolfson, Nathan Mitchell, Timothy Barnes, Meghan JaKa, Simone French

Background: In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods.

Main text: We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees.

Conclusions: Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.

Emerging Themes in Epidemiology 14:11 (2017-09-20). DOI: 10.1186/s12982-017-0064-4

Citations: 86