Evaluating heterogeneity of treatment effects
S. Vijan
Pub Date: 2020-01-01. DOI: 10.1080/24709360.2020.1724003. Biostatistics and Epidemiology, 4(1), 98–104.
Evaluation of treatment effects in randomized clinical trials typically focuses on the average difference in outcomes between arms of a trial. While this approach is the gold standard for establishing a causal relationship between treatment and outcome, reporting of average effects can mask important differences in benefits across subpopulations, a phenomenon known as heterogeneity of treatment effects (HTE). HTE has been demonstrated in many settings, and failing to consider it can lead to inappropriate treatment (or lack of treatment) for many patients. This paper describes approaches to analyzing and reporting trials with explicit consideration of heterogeneity, in order to improve our ability to treat individual patients more effectively.
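The masking effect the abstract describes is easy to see with arithmetic. Below is a minimal sketch with hypothetical trial counts (not data from the paper): the overall absolute risk reduction looks modest, while one subgroup benefits substantially and the other not at all.

```python
# Hypothetical two-subgroup trial illustrating heterogeneity of treatment
# effects (HTE): the pooled effect hides a large subgroup difference.

def absolute_risk_reduction(events_ctrl, n_ctrl, events_trt, n_trt):
    """Event risk in the control arm minus event risk in the treatment arm."""
    return events_ctrl / n_ctrl - events_trt / n_trt

# Made-up counts: high-risk patients benefit, low-risk patients do not.
subgroups = {
    "high risk": dict(events_ctrl=60, n_ctrl=200, events_trt=30, n_trt=200),
    "low risk":  dict(events_ctrl=10, n_ctrl=800, events_trt=10, n_trt=800),
}

overall = absolute_risk_reduction(
    sum(s["events_ctrl"] for s in subgroups.values()), 1000,
    sum(s["events_trt"] for s in subgroups.values()), 1000,
)
per_group = {name: absolute_risk_reduction(**s) for name, s in subgroups.items()}

print(f"overall ARR: {overall:.3f}")   # 0.030 — a modest average effect
for name, arr in per_group.items():
    print(f"{name} ARR: {arr:.3f}")    # 0.150 for high risk, 0.000 for low risk
```

Reporting only the 3% overall risk reduction would understate the benefit for high-risk patients and overstate it for low-risk ones, which is exactly the reporting problem the paper addresses.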
Applying statistical and analytical methods to U.S. Department of Veterans Affairs databases
T. Kashner
Pub Date: 2020-01-01. DOI: 10.1080/24709360.2019.1708660. Biostatistics and Epidemiology, 4(1), 3–5.
Making causal inferences about treatment effect sizes from observational datasets
T. Kashner, Steven S. Henley, R. Golden, Xiao‐Hua Zhou
Pub Date: 2020-01-01. DOI: 10.1080/24709360.2019.1681211. Biostatistics and Epidemiology, 4(1), 48–83.
In the era of big data and cloud computing, analysts need statistical models that go beyond predicting outcomes to forecasting how outcomes change when decision-makers intervene to change one or more causal factors. This paper reviews methods for estimating the causal effects of treatment choices on patient health outcomes using observational datasets. Methods are limited to those that model the choice of treatment (propensity scoring) and treatment outcomes (instrumental variables, difference-in-differences, control function). A regression framework is developed to show how unobserved confounding covariates and heterogeneous outcomes can bias effect size estimates. In response to criticisms that outcome approaches are not systematic and are subject to model misspecification error, we extend the control function approach of Lu and White by applying Best Approximating Model technology (BAM-CF). Results from simulation experiments compare the biases of BAM-CF and propensity scoring in the presence of an unobserved confounder. We conclude that no one strategy is 'optimal' for all datasets, and analysts should consider multiple approaches to assess robustness. For both observational and randomized datasets, researchers should assess how moderating covariates affect estimates of treatment effect sizes so that clinicians can understand what is best for each individual patient.
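Of the families the paper reviews, propensity-score methods are the simplest to sketch. The simulation below (an illustration, not the authors' BAM-CF procedure) shows the core idea of inverse-probability weighting: treatment assignment depends on an observed confounder, the naive mean difference is badly biased, and weighting by the true propensity score recovers the simulated treatment effect. Using the true propensity sidesteps the model-specification issues the paper analyzes.

```python
# Inverse-probability-weighting sketch: confounded simulation where the true
# treatment effect is 1.0 and the propensity score is known exactly.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)                        # observed confounder
p = 1 / (1 + np.exp(-x))                      # true propensity score P(T=1 | x)
t = rng.binomial(1, p)                        # treatment depends on x
y = 1.0 * t + 2.0 * x + rng.normal(size=n)    # true treatment effect = 1.0

# Naive contrast is confounded: treated patients have higher x on average.
naive = y[t == 1].mean() - y[t == 0].mean()

# Hajek-style IPW estimate: reweight each arm to the full population.
w = t / p + (1 - t) / (1 - p)
ipw = (np.sum(w * t * y) / np.sum(w * t)
       - np.sum(w * (1 - t) * y) / np.sum(w * (1 - t)))

print(f"naive: {naive:.2f}, IPW: {ipw:.2f}")  # naive is biased upward; IPW is near 1.0
```

In practice the propensity score must itself be estimated, which is where the misspecification concerns motivating BAM-CF enter.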
Transforming data into actionable insights
C. Clancy
Pub Date: 2020-01-01. DOI: 10.1080/24709360.2019.1704127. Biostatistics and Epidemiology, 4(1), 1–2.
Common errors of interpretation in biostatistics
Elsa Vazquez Arreola, Kyle M. Irimata, Jeffrey R. Wilson
Pub Date: 2020-01-01. DOI: 10.1080/24709360.2020.1790085. Biostatistics and Epidemiology, 4(1), 238–246.
What do we wish to investigate? While this may be a common question in research, it does not always come with straightforward answers. This article reviews data-driven methods of collection, questions asked and questions answered, and the myriad of different conclusions that may result. We examine differences in answers to questions based on independent versus correlated observations, bivariate versus conditional associations, relations versus extrapolation, and single membership versus multiple membership modeling. Regardless of the issue, these differences are usually not due to so-called bad data or bad models; they are usually due to investigators misinterpreting the answers that were given. Most importantly, one cannot ask a question and obtain an answer without understanding the data structure, its size, and its representativeness. Simply stated, the fact that I went to the store and bought an outfit does not mean the outfit is appropriate for the event. The answers obtained may not be answering the question of interest.
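The "bivariate versus conditional associations" contrast the abstract lists is classically illustrated by Simpson's paradox. The sketch below uses the well-known kidney-stone-style counts (hypothetical numbers, not from this article): pooled over strata, the treatment looks worse, yet within every severity stratum it looks better.

```python
# Simpson's paradox: the pooled (bivariate) association reverses relative to
# the stratum-specific (conditional) associations.

# (successes, total) for a hypothetical therapy vs. control, by disease severity
data = {
    "mild":   {"treated": (81, 87),   "control": (234, 270)},
    "severe": {"treated": (192, 263), "control": (55, 80)},
}

def rate(successes, total):
    return successes / total

# Pooled comparison ignores severity: treated 273/350 vs. control 289/350.
pooled_treated = rate(81 + 192, 87 + 263)
pooled_control = rate(234 + 55, 270 + 80)

# Conditional comparison within each stratum: treatment wins both times,
# because severe cases were disproportionately assigned to treatment.
within = {s: rate(*d["treated"]) > rate(*d["control"]) for s, d in data.items()}

print(f"pooled: treated {pooled_treated:.3f} < control {pooled_control:.3f}")
print(f"within-stratum treated better: {within}")
```

Which answer is "right" depends on the question being asked — precisely the interpretive point of the article.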
Statistical modeling methods: challenges and strategies
Steven S. Henley, R. Golden, T. Kashner
Pub Date: 2020-01-01. DOI: 10.1080/24709360.2019.1618653. Biostatistics and Epidemiology, 4(1), 105–139.
Statistical modeling methods are widely used in clinical science, epidemiology, and health services research to analyze data collected in clinical trials as well as observational studies of existing data sources, such as claims files and electronic health records. Diagnostic and prognostic inferences from statistical models are critical to researchers advancing science, clinical practitioners making patient care decisions, and administrators and policy makers seeking to improve quality and reduce costs in the health care system. The veracity of such inferences relies not only on the quality and completeness of the collected data, but also on statistical model validity. A key component of establishing model validity is determining when a model is not correctly specified and therefore incapable of adequately representing the Data Generating Process (DGP). In this article, model validity is first described and methods designed for assessing model fit, specification, and selection are reviewed. Second, data transformations that improve the model's ability to represent the DGP are addressed. Third, model search and validation methods are discussed. Finally, methods for evaluating predictive and classification performance are presented. Together, these methods provide a practical framework with recommendations to guide the development and evaluation of statistical models that provide valid statistical inferences.
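A generic instance of the model search and validation step (not the authors' specific procedure): candidate models of increasing complexity are fit on one split of the data and scored on a held-out split, so that a misspecified (underfit) model and an overfit model both reveal themselves through held-out error.

```python
# Held-out validation for model selection: the true data-generating process is
# a degree-2 polynomial; candidates range from degree 1 (misspecified) to 8.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 300)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0, 0.3, 300)  # true DGP: degree 2

train, valid = slice(0, 200), slice(200, 300)
val_mse = {}
for degree in range(1, 9):
    coefs = np.polyfit(x[train], y[train], degree)     # fit on training split only
    resid = y[valid] - np.polyval(coefs, x[valid])     # score on held-out split
    val_mse[degree] = float(np.mean(resid**2))

best = min(val_mse, key=val_mse.get)
print(f"selected degree: {best}")
# The degree-1 model cannot represent the DGP, so its held-out error stays
# well above the error of the adequately specified candidates.
```

Information criteria (AIC, BIC) or specification tests would be used similarly; the common thread is that goodness of fit is judged on data the model was not tuned to.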
Developments and debates on latent variable modeling in diagnostic studies when there is no gold standard
Zheyu Wang
Pub Date: 2019-10-15. DOI: 10.1080/24709360.2019.1673623. Biostatistics and Epidemiology, 5(1), 100–117.
Latent variable modeling is often used in diagnostic studies where a gold standard reference test is not available. Its applications have become increasingly popular with the rapid discovery of novel biomarkers and the effort to improve healthcare for each individual. This paper provides a review of current developments and debates surrounding these models, with a focus on diagnostic studies, and discusses their value as well as cautionary considerations in their application.
How many clusters exist? Answer via maximum clustering similarity implemented in R
A. Albatineh, M. Wilcox, B. Zogheib, M. Niewiadomska-Bugaj
Pub Date: 2019-01-01. DOI: 10.1080/24709360.2019.1615770. Biostatistics and Epidemiology, 3(1), 62–79.
Finding the number of clusters in a data set is considered one of the fundamental problems in cluster analysis. This paper integrates maximum clustering similarity (MCS), for finding the optimal number of clusters, into R statistical software through the package MCSim. The similarity between two clustering methods is calculated at the same number of clusters, using the Rand [Objective criteria for the evaluation of clustering methods. J Am Stat Assoc. 1971;66:846–850] and Jaccard [The distribution of the flora of the alpine zone. New Phytologist. 1912;11:37–50] indices, corrected for chance agreement. The number of clusters at which an index most frequently attains its maximum is a candidate for the optimal number of clusters. Unlike other criteria, MCS can be used with circular data. Seven clustering algorithms available in R are implemented in MCSim. MCSim produces a graph of the number of clusters versus clustering similarity using the corrected similarity indices, along with the values of the similarity indices and a clustering tree (dendrogram). Several examples, including simulated, real, and circular data sets, show how MCSim works successfully in practice.
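The MCS idea — pick the number of clusters where independent clustering methods agree most, with agreement corrected for chance — can be sketched outside R as well. This is an illustration of the principle, not the MCSim package: two different algorithms are run over a range of cluster counts and their chance-corrected agreement (adjusted Rand index) is compared.

```python
# Agreement-based choice of cluster count, in the spirit of MCS: compare
# k-means and Ward agglomerative clusterings by adjusted Rand index.
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(2)
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
# Three well-separated Gaussian blobs: the true number of clusters is 3.
X = np.vstack([c + rng.normal(0, 0.5, (50, 2)) for c in centers])

agreement = {}
for k in range(2, 7):
    a = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    b = AgglomerativeClustering(n_clusters=k).fit_predict(X)
    agreement[k] = adjusted_rand_score(a, b)   # corrected for chance agreement

print(agreement)  # agreement is essentially perfect at the true k = 3
```

At the true cluster count both algorithms recover the same partition, so the corrected index peaks there; MCSim formalizes this by tracking which k maximizes the index most frequently across method pairs.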
Cohort study design for illness-death processes with disease status under intermittent observation
Nathalie C. Moon, Leilei Zeng, R. Cook
Pub Date: 2019-01-01. DOI: 10.1080/24709360.2019.1699341. Biostatistics and Epidemiology, 3(1), 178–200.
Cohort studies are routinely conducted to learn about the incidence or progression rates of chronic diseases. The illness-death model offers a natural framework for joint consideration of non-fatal events in the semi-competing risks setting. We consider the design of prospective cohort studies where the goal is to estimate the effect of a marker on the risk of a non-fatal event which is subject to interval-censoring due to an intermittent observation scheme. The sample size is shown to depend on the effect of interest, the number of assessments, and the duration of follow-up. Minimum-cost designs are also developed to account for the different costs of recruitment and follow-up examination. We also consider the setting where the event status of individuals is observed subject to misclassification; the consequent need to increase the sample size to account for this error is illustrated through asymptotic calculations.
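The qualitative point of the misclassification result — that imperfectly observed disease status inflates the required sample size — also appears in a much simpler closed-form setting. The sketch below is that simpler analogue (prevalence estimation with the Rogan–Gladen correction), not the paper's illness-death calculation: when status is observed with sensitivity `se` and specificity `sp`, the variance of the corrected estimator grows, so more subjects are needed for the same precision.

```python
# Sample size for a Rogan-Gladen-corrected prevalence estimate: the corrected
# estimator is p_hat = (q_hat + sp - 1) / (se + sp - 1), with variance
# q(1-q) / (n * (se + sp - 1)^2), where q is the apparent prevalence.

def required_n(p, target_se, se=1.0, sp=1.0):
    """Subjects needed for the corrected prevalence estimate to reach target_se."""
    j = se + sp - 1                      # Youden's index; must be positive
    q = p * se + (1 - p) * (1 - sp)      # apparent (observed) prevalence
    return q * (1 - q) / (target_se**2 * j**2)

perfect = required_n(0.2, 0.02)                  # no misclassification: n = 400
noisy = required_n(0.2, 0.02, se=0.9, sp=0.95)   # imperfect observation
inflation = noisy / perfect

print(f"perfect: {perfect:.0f}, noisy: {noisy:.0f}, inflation: {inflation:.2f}x")
```

Even with fairly accurate observation (90% sensitivity, 95% specificity), roughly half again as many subjects are needed — the same direction of effect the authors demonstrate asymptotically for interval-censored illness-death data.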
Modified sparse functional principal component analysis for fMRI data process
Zhengyang Fang, J. Y. Han, N. Simon, Xiaoping Zhou
Pub Date: 2019-01-01. DOI: 10.1080/24709360.2019.1591072. Biostatistics and Epidemiology, 3(1), 80–89.
Sparse and functional principal component analysis is a technique for extracting sparse and smooth principal components from a matrix. In this paper, we propose a modified sparse and functional principal component analysis model for feature extraction. We measure tuning parameters by their robustness against random perturbation and select them by derivative-free optimization. We test our algorithm on the ADNI dataset to distinguish patients with Alzheimer's disease from the control group. By applying appropriate classification methods to the sparse features, we obtain better results than classical singular value decomposition, support vector machines, and logistic regression.
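The "sparse" half of sparse and functional PCA can be sketched with a standard device: power iteration with soft-thresholding of the loadings. This is a generic one-component illustration, not the authors' model — their method additionally imposes a smoothness (functional) penalty and tunes parameters by robustness to perturbation.

```python
# One sparse principal component via alternating power iteration with
# soft-thresholding: only features that truly carry signal keep nonzero loadings.
import numpy as np

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_pc1(X, lam=0.1, n_iter=100):
    """First sparse loading vector of the column-centered data matrix X."""
    Xc = X - X.mean(axis=0)
    v = np.linalg.svd(Xc, full_matrices=False)[2][0]  # dense PC1 as warm start
    for _ in range(n_iter):
        u = Xc @ v
        u /= np.linalg.norm(u)
        v = soft_threshold(Xc.T @ u, lam)             # sparsify the loadings
        if np.linalg.norm(v) == 0:
            break                                     # penalty killed everything
        v /= np.linalg.norm(v)
    return v

rng = np.random.default_rng(3)
n, p = 200, 50
signal = rng.normal(size=n)
X = rng.normal(scale=0.5, size=(n, p))
X[:, :5] += signal[:, None]          # only the first 5 features carry the signal

v = sparse_pc1(X, lam=2.0)
support = np.flatnonzero(np.abs(v) > 1e-8)
print(f"nonzero loadings on features: {support}")  # concentrated on the signal block
```

A functional variant would replace the soft-threshold step with a penalty that also enforces smoothness of the loading vector over its domain (e.g. voxel locations in fMRI).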