Pub Date: 2020-01-01; DOI: 10.1080/24709360.2018.1469810
Xiao‐Hua Zhou
In biomedical research, missing data are a common problem. The statistical literature on this problem is well developed but overly technical and complicated for health science researchers who are not experts in statistics or methodology. In this paper, we review available statistical methods for handling missing data, with the aim of helping health science researchers understand the importance of missing data in their own research and apply these methods using available software.
Title: Challenges and strategies in analysis of missing data
Journal: Biostatistics and Epidemiology, vol. 4, no. 1, pp. 15–23
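To illustrate why missing data matter (a generic example, not a method taken from the paper), the Python sketch below builds a small hypothetical dataset in which an outcome y is missing more often when an always-observed covariate x equals 1, so the data are missing at random (MAR) given x. The complete-case mean is then biased toward the x = 0 stratum, whereas filling each missing value with the observed mean of its x stratum recovers the full-data mean of 15. The data and helper names are assumptions made for illustration.

```python
from statistics import mean


def make_data():
    """Build hypothetical (x, y) records; y is None when missing."""
    rows = []
    for i in range(50):  # stratum x = 0: fully observed, mean 10
        rows.append((0, 9.0 if i % 2 == 0 else 11.0))
    for i in range(50):  # stratum x = 1: true mean 20, only the first 10 observed
        y = 19.0 if i % 2 == 0 else 21.0
        rows.append((1, y if i < 10 else None))
    return rows


def complete_case_mean(rows):
    """Mean of y over records where y is observed (incomplete rows dropped)."""
    return mean(y for _, y in rows if y is not None)


def stratum_imputed_mean(rows):
    """Fill each missing y with the observed mean of its x stratum, then average.

    This is unbiased here only because missingness depends on x alone (MAR).
    """
    observed = {}
    for x, y in rows:
        if y is not None:
            observed.setdefault(x, []).append(y)
    stratum_means = {x: mean(ys) for x, ys in observed.items()}
    filled = [y if y is not None else stratum_means[x] for x, y in rows]
    return mean(filled)


rows = make_data()
print(round(complete_case_mean(rows), 2))    # 11.67 (biased toward stratum x = 0)
print(round(stratum_imputed_mean(rows), 2))  # 15.0 (matches the full-data mean)
```

Stratum-mean imputation is the simplest MAR-valid fix in this toy setting; with continuous covariates, regression-based or multiple imputation plays the same role.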
Pub Date: 2020-01-01; DOI: 10.1080/24709360.2020.1734391
S. Vilakati, G. Cortese
In two-stage randomization designs with survival end points, the focus is on estimating and comparing survival distributions for the different treatment policies, with the objective of identifying the policy that prolongs survival. In this paper, a method for comparing two treatment policies is proposed; these may be shared-path or independent-path treatment policies. Simulation studies evaluating the performance of the new approach show that it has better statistical power when the survival curves cross. The new method is applied to a clinical trial dataset for leukemia.
Title: Weighted Lin and Xu test for two-stage randomization designs
Journal: Biostatistics and Epidemiology, vol. 4, no. 1, pp. 221–237
Pub Date: 2020-01-01; DOI: 10.1080/24709360.2020.1842048
J. R. Khan, Jahida Gulshan
Earlier studies assessing the effects of risk factors on child nutritional status in Bangladesh have used conventional regression models, which cannot capture the full picture of these effects. This study therefore evaluated the heterogeneous effects of factors at different points of the conditional height-for-age Z-score (HAZ) distribution, accounting for cluster-level variation, using a linear quantile mixed model (LQMM), and compared them with a linear mixed model (LMM). In addition, an unconditional quantile model (UQM) was used to measure the effects of factors on the unconditional (marginal) HAZ distribution. Data on a total of 6340 children aged 0–59 months were extracted from the 2014 Bangladesh Demographic and Health Survey. Maternal characteristics (age, occupation, nutritional status, parity, birth interval), parental education, child age, breastfeeding status, and morbidity had significant heterogeneous effects on the HAZ distribution. For example, parental secondary or higher education had substantially different effects on the lower and upper tails of the child HAZ distribution, a difference masked by the LMM estimate. Moreover, significant cluster-level variation was found across all quantiles of child HAZ. Interventions to address undernutrition in Bangladesh should be designed with these heterogeneous effects and cluster-level variation in mind.
Title: Heterogeneous effects of factors on child nutritional status in Bangladesh using linear quantile mixed model
Journal: Biostatistics and Epidemiology, vol. 4, no. 1, pp. 265–281
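The point that a mean-based (LMM) estimate can mask effects that differ across the tails can be shown with a simplified Python sketch. With a single binary covariate and no random effects, a linear quantile regression slope at quantile tau equals the difference between the two groups' tau-quantiles. The hypothetical HAZ scores below are constructed so that the mean difference is zero while the lower- and upper-tail differences have opposite signs. This is not the LQMM used in the paper, which also models cluster-level random effects; the data and helper names are assumptions.

```python
import math
from statistics import mean


def empirical_quantile(values, tau):
    """Return the order statistic y_(k) with k = ceil(tau * n).

    This value minimizes the quantile (check) loss sum(rho_tau(y - q)),
    the criterion that quantile regression optimizes.
    """
    ordered = sorted(values)
    k = max(1, math.ceil(tau * len(ordered)))
    return ordered[k - 1]


# Hypothetical HAZ scores for two groups of children (illustrative only).
reference = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5, 2.0]
exposed = [-1.0, -0.75, -0.5, -0.25, 0.0, 0.0, 0.25, 0.5, 0.75, 1.0]

# Mean-based effect: what a linear model for the mean would report.
mean_effect = mean(exposed) - mean(reference)

# Quantile-specific effects: with one binary covariate, the quantile
# regression slope at tau equals this difference in group quantiles.
quantile_effects = {
    tau: empirical_quantile(exposed, tau) - empirical_quantile(reference, tau)
    for tau in (0.1, 0.5, 0.9)
}

print(mean_effect)       # 0.0
print(quantile_effects)  # {0.1: 1.0, 0.5: 0.0, 0.9: -0.75}
```

The exposed group has a compressed distribution: its lower tail is shifted up and its upper tail down, and the two shifts cancel in the mean.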
Pub Date: 2020-01-01; DOI: 10.1080/24709360.2020.1719690
Lucy Beggs, R. Briscoe, C. Griffiths, G. Ellison, M. Gilthorpe
Background: Intervention differential effects (IDEs) occur when changes in an outcome depend on the initial values of that outcome. Although methods to identify IDEs are well documented, it remains unclear under which circumstances these methods are robust. One unexplored context is the identification of IDEs in studies where sample selection is based on the initial value of the outcome being evaluated. We hypothesise that, in such settings, established methods for detecting IDEs will struggle to distinguish them from regression to the mean. Methods: Using simulated datasets of weight-loss intervention programmes that recruit according to initial body mass index, we explore the reliability of Oldham's method and multilevel modelling (MLM) for detecting IDEs. Results: In datasets simulated with no IDE, Oldham's method and MLM yield Type I error rates >90%, confirming that threshold selection (truncation) leads to bias due to regression to the mean. Type I error rates return close to 5% for both methods when a control group is introduced. Conclusions: Oldham's method and MLM can robustly detect IDEs in this setting, but only if analyses incorporate a control group for comparison.
Title: Intervention differential effects and regression to the mean in studies where sample selection is based on the initial value of the outcome variable: an evaluation of methods illustrated in weight-management studies
Journal: Biostatistics and Epidemiology, vol. 4, no. 1, pp. 172–188
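The regression-to-the-mean problem described above can be reproduced with a small simulation. The Python sketch below is a hypothetical setup, not the authors' simulation design: true BMI ~ N(30, 4²), measurement error ~ N(0, 2²), and a constant intervention effect of -1 kg/m², so there is no IDE. Oldham's method correlates the change (follow-up minus baseline) with the mean of the two measurements. In an unselected sample that correlation is near zero, but recruiting only people with baseline BMI ≥ 30 makes it clearly positive, which would be misread as an IDE. Comparing against a control arm recruited the same way, here with a Fisher z statistic (an assumed choice of comparison), removes the spurious signal.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_arm(rng, n, effect, threshold=None):
    """Simulate baseline/follow-up BMI with a constant effect (no IDE).

    True BMI ~ N(30, 4); each measurement adds N(0, 2) error. If a threshold
    is given, only people whose baseline BMI >= threshold are recruited.
    """
    base, follow = [], []
    while len(base) < n:
        true_bmi = rng.gauss(30, 4)
        y1 = true_bmi + rng.gauss(0, 2)
        if threshold is not None and y1 < threshold:
            continue  # not recruited
        base.append(y1)
        follow.append(true_bmi + effect + rng.gauss(0, 2))
    return base, follow


def oldham_r(base, follow):
    """Oldham's method: correlate change with the mean of the two measurements."""
    change = [f - b for b, f in zip(base, follow)]
    avg = [(b + f) / 2 for b, f in zip(base, follow)]
    return pearson(change, avg)


rng = random.Random(2020)
r_unselected = oldham_r(*simulate_arm(rng, 4000, effect=-1.0))
r_treated = oldham_r(*simulate_arm(rng, 2000, effect=-1.0, threshold=30))
r_control = oldham_r(*simulate_arm(rng, 2000, effect=0.0, threshold=30))

# Fisher z comparison of the two selected arms (equal sample sizes here).
z = (math.atanh(r_treated) - math.atanh(r_control)) / math.sqrt(2 / (2000 - 3))

print(f"unselected r = {r_unselected:.2f}")  # close to 0
print(f"selected r   = {r_treated:.2f}")     # clearly positive: spurious 'IDE'
print(f"treated vs control z = {z:.2f}")     # typically small: no IDE detected
```

Truncating on baseline shrinks the baseline variance more than the follow-up variance, and Oldham's correlation is proportional to that variance difference; the control arm suffers the same distortion, so the between-arm contrast cancels it.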
Pub Date: 2020-01-01; DOI: 10.1080/24709360.2020.1721965
O. Marrero
We present a detailed exposition of the development and application of a likelihood-ratio test for seasonality. Likelihood-ratio tests are well known to have good, and in some settings optimal, power properties. We assess the test's performance by means of a simulation study. The test's application is illustrated with three examples that have different alternative hypotheses, thus extending the original presentation of the test. These examples are not contrived; they come from real applications. As far as we know, these are the only completely worked-out examples of this test's application available in the literature, so our exposition can also serve as a tutorial on applying the test. Our presentation is detailed so as to facilitate further extension and application of the test to other alternative hypotheses. We supply pertinent R code in an appendix. For those who teach maximum-likelihood estimation, our examples provide interesting real-life cases that may be used in teaching.
Title: Application and extension of a likelihood-ratio test for seasonality in epidemiological data
Journal: Biostatistics and Epidemiology, vol. 4, no. 1, pp. 189–220
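As a generic illustration of the likelihood-ratio idea (not the specific test developed in the paper, whose alternative hypotheses differ), the Python sketch below tests monthly counts against a no-seasonality null in which each month's expected share is proportional to its number of days. Against an unrestricted multinomial alternative, the statistic is G = 2 Σ O_i ln(O_i / E_i), which under the null is approximately chi-square with 11 degrees of freedom for large counts. The chi-square tail uses a closed form valid for odd degrees of freedom to keep the block dependency-free; the function names, the 365-day calendar, and the example counts are assumptions.

```python
import math

DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]  # 365-day year (assumption)


def chi2_sf_odd(x, df):
    """Upper tail P(X > x) of a chi-square with odd df, via the closed form
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{j=1}^{(df-1)/2} x^(j-1) / (1*3*...*(2j-1)).
    """
    if df % 2 != 1:
        raise ValueError("df must be odd")
    tail = math.erfc(math.sqrt(x / 2))
    term_sum, term = 0.0, 1.0
    for j in range(1, (df - 1) // 2 + 1):
        term = 1.0 if j == 1 else term * x / (2 * j - 1)
        term_sum += term
    return tail + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * term_sum


def seasonality_lrt(counts, days=DAYS):
    """G statistic and p-value for H0: monthly risk proportional to month length."""
    n = sum(counts)
    total_days = sum(days)
    g = 0.0
    for observed, d in zip(counts, days):
        expected = n * d / total_days
        if observed > 0:  # 0 * log(0) is taken as 0
            g += observed * math.log(observed / expected)
    g = max(2 * g, 0.0)  # guard against tiny negative rounding error
    df = len(counts) - 1  # unrestricted model has 11 free probabilities; H0 has none
    return g, chi2_sf_odd(g, df)


# Hypothetical monthly case counts with a winter excess (illustrative only).
counts = [60, 50, 45, 35, 30, 25, 25, 25, 30, 35, 45, 55]
g, p = seasonality_lrt(counts)
print(f"G = {g:.2f}, p = {p:.4g}")
```

A structured alternative (for example, a single sinusoidal peak) would replace the unrestricted multinomial with a model that has fewer parameters, giving a test with fewer degrees of freedom and typically more power against that specific pattern.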
Pub Date: 2020-01-01; DOI: 10.1080/24709360.2020.1724003
S. Vijan
Evaluation of treatment effects in randomized clinical trials typically focuses on the average difference in outcomes between trial arms. While this approach is the gold standard for establishing a causal relationship between treatment and outcome, reporting average effects can mask important differences in benefit across subpopulations, a phenomenon known as heterogeneity of treatment effects (HTE). HTE has been demonstrated in many settings, and failing to consider it can lead to inappropriate treatment (or lack of treatment) for many patients. This paper describes approaches to analyzing and reporting trials with explicit consideration of heterogeneity, in order to improve our ability to treat individual patients more effectively.
Title: Evaluating heterogeneity of treatment effects
Journal: Biostatistics and Epidemiology, vol. 4, no. 1, pp. 98–104
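One common way heterogeneity shows up on the absolute scale is sketched below under a simplifying assumption: even if a treatment has the same relative risk in every patient (a hypothetical 0.75 here), the absolute risk reduction (ARR) and number needed to treat (NNT) vary widely with baseline risk, so a single average effect can mislead for both low- and high-risk patients. This is an arithmetic illustration, not an analysis from the paper; the risk values are invented.

```python
def absolute_benefit(baseline_risk, relative_risk):
    """Return (ARR, NNT) for a patient with the given untreated risk,
    assuming treatment multiplies that risk by relative_risk."""
    treated_risk = baseline_risk * relative_risk
    arr = baseline_risk - treated_risk
    return arr, 1 / arr


RELATIVE_RISK = 0.75  # hypothetical constant relative effect
for risk in (0.04, 0.10, 0.40):  # hypothetical low-, medium-, high-risk strata
    arr, nnt = absolute_benefit(risk, RELATIVE_RISK)
    print(f"baseline risk {risk:.0%}: ARR {arr:.3f}, NNT {nnt:.0f}")
```

Here the ARR rises from 0.010 (NNT 100) in the low-risk stratum to 0.100 (NNT 10) in the high-risk stratum, a tenfold difference in benefit despite an identical relative effect.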
Pub Date: 2020-01-01; DOI: 10.1080/24709360.2019.1708660
T. Kashner
Title: Applying statistical and analytical methods to U.S. Department of Veterans Affairs databases
Journal: Biostatistics and Epidemiology, vol. 4, no. 1, pp. 3–5
Pub Date: 2020-01-01; DOI: 10.1080/24709360.2019.1704127
C. Clancy
Title: Transforming data into actionable insights
Journal: Biostatistics and Epidemiology, vol. 4, no. 1, pp. 1–2
Pub Date: 2020-01-01; DOI: 10.1080/24709360.2019.1681211
T. Kashner, Steven S. Henley, R. Golden, Xiao‐Hua Zhou
In the era of big data and cloud computing, analysts need statistical models that go beyond predicting outcomes to forecasting how outcomes change when decision-makers intervene to change one or more causal factors. This paper reviews methods for estimating the causal effects of treatment choices on patient health outcomes from observational datasets. The review is limited to methods that model either the choice of treatment (propensity scoring) or treatment outcomes (instrumental variables, difference in differences, control function). A regression framework was developed to show how unobserved confounding covariates and heterogeneous outcomes can bias effect size estimates. In response to criticisms that outcome approaches are not systematic and are subject to model misspecification error, we extend the control function approach of Lu and White by applying Best Approximating Model technology (BAM-CF). Results from simulation experiments are presented to compare biases between BAM-CF and propensity scoring in the presence of an unobserved confounder. We conclude that no single strategy is 'optimal' for all datasets and that analysts should consider multiple approaches to assess robustness. For both observational and randomized datasets, researchers should assess how moderating covariates affect estimates of treatment effect sizes so that clinicians can understand what is best for each individual patient.
Title: Making causal inferences about treatment effect sizes from observational datasets
Journal: Biostatistics and Epidemiology, vol. 4, no. 1, pp. 48–83
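To make the propensity-scoring idea concrete, the Python sketch below simulates a single binary confounder x that raises both the chance of treatment and the outcome, with a true average treatment effect of 2. The naive difference in means is biased upward (toward 2 + 3 × (0.8 - 0.2) = 3.8), while inverse probability weighting (IPW), with the propensity score estimated as the share treated within each level of x, recovers roughly the true effect. This works only because x is observed; it does not address the unobserved-confounder setting that motivates BAM-CF in the paper. All parameters and function names are assumptions.

```python
import random


def simulate(rng, n):
    """Return lists (x, t, y): binary confounder, treatment, outcome.

    P(t=1 | x) is 0.8 if x else 0.2; y = 2*t + 3*x + N(0, 1), so the true
    average treatment effect is 2.
    """
    xs, ts, ys = [], [], []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        t = 1 if rng.random() < (0.8 if x else 0.2) else 0
        xs.append(x)
        ts.append(t)
        ys.append(2.0 * t + 3.0 * x + rng.gauss(0, 1))
    return xs, ts, ys


def naive_difference(ts, ys):
    """Difference in mean outcome between treated and untreated (confounded)."""
    treated = [y for t, y in zip(ts, ys) if t == 1]
    control = [y for t, y in zip(ts, ys) if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_ate(xs, ts, ys):
    """IPW estimate of the ATE with propensity estimated within strata of x."""
    ps = {}
    for level in (0, 1):
        group = [t for x, t in zip(xs, ts) if x == level]
        ps[level] = sum(group) / len(group)  # e(x) = share treated at this x
    n = len(ys)
    treated_term = sum(t * y / ps[x] for x, t, y in zip(xs, ts, ys)) / n
    control_term = sum((1 - t) * y / (1 - ps[x]) for x, t, y in zip(xs, ts, ys)) / n
    return treated_term - control_term


rng = random.Random(7)
xs, ts, ys = simulate(rng, 20000)
naive = naive_difference(ts, ys)
ipw = ipw_ate(xs, ts, ys)
print(f"naive: {naive:.2f}")  # near 3.8 (confounded by x)
print(f"IPW:   {ipw:.2f}")    # near 2.0 (the true effect)
```

With a saturated propensity model like this one, IPW is equivalent to standardizing the stratum-specific treated-versus-control differences over the distribution of x.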
Pub Date: 2020-01-01; DOI: 10.1080/24709360.2020.1790085
Elsa Vazquez Arreola, Kyle M. Irimata, Jeffrey R. Wilson
What do we wish to investigate? While this may be a common question in research, it does not always come with straightforward answers. This article reviews data-driven methods of collection, the questions asked and the questions answered, and the many different conclusions that may result. We examine differences in answers to questions based on independent versus correlated observations, bivariate versus conditional associations, relations versus extrapolation, and single-membership versus multiple-membership modeling. Regardless of the issue, these differences are usually not due to so-called bad data or bad models; they usually arise because investigators misinterpret the answers they obtained. Most importantly, one cannot ask a question and obtain a meaningful answer without understanding the structure, size, and representativeness of the data. Simply stated, the fact that I went to the store and bought an outfit does not mean the outfit is appropriate for the event. The answers obtained may not answer the question of interest.
Title: Common errors of interpretation in biostatistics
Journal: Biostatistics and Epidemiology, vol. 4, no. 1, pp. 238–246
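The contrast between bivariate and conditional associations mentioned above is easiest to see in a Simpson's-paradox example. In the hypothetical counts below (invented for illustration), treatment A has a higher success rate than B within both severity strata yet a lower rate overall, because A was given mostly to severe cases. Exact fractions are used so the comparisons are not affected by rounding.

```python
from fractions import Fraction

# Hypothetical (successes, patients) by treatment and severity stratum.
COUNTS = {
    "A": {"mild": (18, 20), "severe": (24, 80)},
    "B": {"mild": (68, 80), "severe": (5, 20)},
}


def stratum_rates(treatment):
    """Conditional success rate within each severity stratum."""
    return {s: Fraction(k, n) for s, (k, n) in COUNTS[treatment].items()}


def pooled_rate(treatment):
    """Marginal (bivariate) success rate, ignoring severity."""
    successes = sum(k for k, _ in COUNTS[treatment].values())
    total = sum(n for _, n in COUNTS[treatment].values())
    return Fraction(successes, total)


for stratum in ("mild", "severe"):
    a, b = stratum_rates("A")[stratum], stratum_rates("B")[stratum]
    print(f"{stratum}: A {float(a):.2f} vs B {float(b):.2f}")
print(f"pooled: A {float(pooled_rate('A')):.2f} vs B {float(pooled_rate('B')):.2f}")
# mild: A 0.90 vs B 0.85
# severe: A 0.30 vs B 0.25
# pooled: A 0.42 vs B 0.73
```

Which comparison answers the question of interest depends on the data structure: here severity drives both treatment assignment and outcome, so the stratified (conditional) rates are the relevant ones.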