Pub Date: 2021-07-31 | DOI: 10.1080/24709360.2021.1948380
The analysis and forecasting COVID-19 cases in the United States using Bayesian structural time series models
Liming Xie
In this paper, the Bayesian structural time series (BSTS) model is used to analyze and predict the total number of confirmed COVID-19 cases in the United States from February 28, 2020 through April 6, 2020, using data collected from the US Centers for Disease Control and Prevention (CDC). The data include the variables days, total confirmed cases, daily confirmed cases, daily death cases, and fatality rates. The author exploits the flexibility of the local linear trend, seasonality, and contemporaneous covariates with dynamic coefficients in the BSTS model. In addition, the CausalImpact function in R is applied to analyze the model and generate its report. The results show that total confirmed COVID-19 cases were most likely to keep rising steeply, breaking through 600,000 in the United States in the subsequent months and peaking around mid-May 2020. The model also suggests that the inclusion probability of the variable daily recovered cases is 0.07.
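The abstract names the bsts and CausalImpact R packages; the sketch below shows how a local linear trend plus weekly seasonality model of the kind described might be specified and forecast. The series `us_cases` is simulated stand-in data, and the period boundaries and iteration counts are illustrative assumptions, not values from the paper.

```r
# A minimal sketch, assuming a daily series like the one described; the
# vector `us_cases` is simulated here and all settings are illustrative.
library(bsts)
library(CausalImpact)

set.seed(1)
us_cases <- cumsum(rpois(39, lambda = 500))     # stand-in for daily totals

ss <- AddLocalLinearTrend(list(), us_cases)     # local linear trend component
ss <- AddSeasonal(ss, us_cases, nseasons = 7)   # weekly seasonality
model <- bsts(us_cases, state.specification = ss, niter = 1000)
pred  <- predict(model, horizon = 30)           # forecast the next 30 days

# CausalImpact fits a BSTS model to the pre-period and reports how the
# post-period departs from the counterfactual forecast (periods illustrative).
impact <- CausalImpact(us_cases, pre.period = c(1, 28), post.period = c(29, 39))
summary(impact, "report")
```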
{"title":"The analysis and forecasting COVID-19 cases in the United States using Bayesian structural time series models","authors":"Liming Xie","doi":"10.1080/24709360.2021.1948380","DOIUrl":"https://doi.org/10.1080/24709360.2021.1948380","url":null,"abstract":"In this paper, the Bayesian structural time series model (BSTS) is used to analyze and predict total confirmed cases who infected COVID-19 in the United States from February 28, 2020 through April 6, 2020 using the collect data from CDC (Center of Disease Control) in the United States. It includes variables of days, total confirmed cases, confirmed cases daily, death cases daily, and fatality rates. The author exploits the flexibility of Local Linear Trend, Seasonality, Contemporaneous covariates of dynamic coefficients in the Bayesian structural time series models. In addition, Causal Impact function in R programming is applied to analyze the model and read report of model. The results of the model show that the total confirmed cases who infected COVID-19 will be still most likely to increase straightly, the total numbers infected COVID-19 would be broken through 600,000 in the United States in near future (in the subsequent months). And then arrive at the peak around mid-May 2020. Also, the model suggests that the probability of variable Recovered cases daily is 0.07.","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"6 1","pages":"1 - 15"},"PeriodicalIF":0.0,"publicationDate":"2021-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/24709360.2021.1948380","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47094435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-07-22 | DOI: 10.1080/24709360.2021.1948381
Joint modeling of two count variables using a shared random effect model in the presence of clusters for complex data
M. Sooriyarachchi
In epidemiology, two or more correlated count response variables are often encountered. In this scenario, it is more efficient to model the data jointly. Moreover, if one of these count variables has an excess of zeros (a spike at zero), the log link cannot be used in general. The situation is more complicated when the data are grouped into clusters; a Generalized Linear Mixed Model (GLMM) is used to accommodate the within-cluster covariance. The objective of this research is to develop a new modeling approach that can handle this situation. The method is illustrated on a global data set of COVID-19 patients. The new model was successfully implemented in both theory and practice, and a plot of the residuals indicated a well-fitting model.
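As a rough illustration of the idea (not the author's exact formulation), the sketch below stacks two count outcomes in long format so that they share a cluster-level random intercept in a zero-inflated negative binomial GLMM fitted with the glmmTMB R package; the data, variable names, and model settings are all invented for the example.

```r
# A minimal sketch: two count outcomes share a cluster-level random
# intercept; the zero-inflation term addresses the spike at zero.
library(glmmTMB)

set.seed(1)
n_clusters <- 50; m <- 10
d <- data.frame(cluster = rep(seq_len(n_clusters), each = 2 * m),
                outcome = rep(c("cases", "deaths"), n_clusters * m))
u  <- rnorm(n_clusters, sd = 0.5)                      # shared random effect
mu <- exp(1 + ifelse(d$outcome == "deaths", -1.5, 0) + u[d$cluster])
d$count <- rpois(nrow(d), mu) * rbinom(nrow(d), 1, 0.8)  # inject extra zeros

fit <- glmmTMB(count ~ outcome + (1 | cluster),        # shared (1 | cluster)
               ziformula = ~1, family = nbinom2, data = d)
summary(fit)
```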
{"title":"Joint modeling of two count variables using a shared random effect model in the presence of clusters for complex data","authors":"M. Sooriyarachchi","doi":"10.1080/24709360.2021.1948381","DOIUrl":"https://doi.org/10.1080/24709360.2021.1948381","url":null,"abstract":"In epidemiology, it is often the case that two or more correlated count response variables are encountered. Under this scenario, it is more efficient to model the data using a joint model. Besides, if one of these count variables has an excess of zeros (spike at zero) the log link cannot be used in general. The situation is more complicated when the data is grouped into clusters. A Generalized Linear Mixed Model (GLMM) is used to accommodate this cluster covariance. The objective of this research is to develop a new modeling approach that can handle this situation. The method is illustrated on a global data set of Covid 19 patients. The important conclusions are that the new model was successfully implemented both in theory and practice. A plot of the residuals indicated a well-fitting model to the data.","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"6 1","pages":"16 - 30"},"PeriodicalIF":0.0,"publicationDate":"2021-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/24709360.2021.1948381","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48934622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-07-03 | DOI: 10.1080/24709360.2021.1953942
Special issue introduction: Statistical Methods in Precision Medicine: Diagnostic, Prognostic, Predictive and Therapeutic
G. Pennello, Xiting Yang
We are delighted to offer this special issue of Biostatistics & Epidemiology on ‘Statistical Methods in Precision Medicine: Diagnostic, Prognostic, Predictive and Therapeutic.’ Precision medicine, often referred to as personalized medicine, has a relatively short history but presents great opportunities and challenges. As former US Health and Human Services Secretary Michael Leavitt said in a 2007 meeting of the Personalized Medicine Coalition, advances in science and technology present an unprecedented ‘opportunity to bring health care to a new level of effectiveness and safety’ [1]. In particular, recent advances have been made in omics-based in vitro measurements [2–4], quantitative imaging biomarkers [5], artificial intelligence/machine learning [6], and electronic health record keeping [7]. These advances and others have led to a surge in medical research activity into personalized medicine, which has been described as ‘providing the right drug for the right patient at the right time’ [8]. As a result, the potential has never been greater to obtain powerful information for individualizing medical decision making, including but not limited to information on diagnosis, prognosis, and treatment selection, and for predicting dose, monitoring disease, modifying behavior, and aiding the development of a tailored therapy, that is, a drug or a medical device [9,10].

The recognition that advances in science, technology, mathematics, and data collection could revolutionize healthcare has led to many important government initiatives. In 2015, the US launched the Precision Medicine Initiative (PMI), with the mission ‘to enable a new era of medicine through research, technology, and policies that empower patients, researchers, and providers to work together toward development of individualized care.’ This announcement was followed by the 21st Century Cures Act [11], which provided funding for PMI to drive research into the genetic, lifestyle and environmental variations of disease. Prior to PMI, the US Food and Drug Administration (FDA) had already made personalized medicine a top priority, issuing the discussion paper Paving the Way for Personalized Medicine: FDA’s Role in a New Era of Medical Product Development [12]. The FDA and the National Institutes of Health (NIH) published a working glossary of terminology for Biomarkers, EndpointS, and other Tools (BEST) [13]. The European Union Council [14] provided discussions on personalized medicine, including a formal definition. The European Medicines Agency (EMA) provided a perspective on pharmacogenomic information in drug labeling [15]. The first goal of EMA’s vision of Regulatory Science Strategy to 2025 [16] is ‘Catalysing the integration of science and technology in medicines development,’ under which the first core recommendation is to ‘support developments in precision medicine, biomarkers and omics’. These are just a few selected examples of regulatory efforts being made across the globe to facilitate precision medicine.
{"title":"Special issue introduction: Statistical Methods in Precision Medicine: Diagnostic, Prognostic, Predictive and Therapeutic","authors":"G. Pennello, Xiting Yang","doi":"10.1080/24709360.2021.1953942","DOIUrl":"https://doi.org/10.1080/24709360.2021.1953942","url":null,"abstract":"We are delighted to offer this special issue of Biostatistics & Epidemiology on ‘Statistical Methods in Precision Medicine: Diagnostic, Prognostic, Predictive and Therapeutic.’ Precision medicine, often referred to as personalized medicine, has a relatively short history but presents great opportunities and challenges. As former US Health and Human Services Secretary Michael Leavitt said in a 2007 meeting of the Personalized Medicine Coalition, advances in science and technology present an unprecedented ‘opportunity to bring health care to a new level of effectiveness and safety’ [1]. In particular, recent advances have been made in omicsbased in vitro measurements [2–4], quantitative imaging biomarkers [5], artificial intelligence/ machine learning [6], and electronic health record keeping [7]. These advances and others have led to a surge in medical research activity into personalized medicine, which has been described as ‘providing the right drug for the right patient at the right time’ [8]. As a result, the potential has never been greater to obtain powerful information for individualizing medical decision making, including but not limited to information on diagnosis, prognosis, and treatment selection, and for predicting dose, monitoring disease, modifying behavior, and aiding the development of a tailored therapy, that is, a drug or a medical device [9, 10]. The recognition that advances in science, technology, mathematics, and data collection could revolutionize healthcare has led to many important government initiatives. In 2015, the US launched the Precision Medicine Initiative (PMI), with the mission ‘to enable a new era of medicine through research, technology, and policies that empower patients, researchers, and providers to work together toward development of individualized care.’ This announcement was followed by the 21st Century Cures Act [11], which provided funding for PMI to drive research into the genetic, lifestyle and environmental variations of disease. Prior to PMI, the US Food and Drug Administration (FDA) had already made personalized medicine a top priority, issuing the discussion paper Paving the Way for Personalized Medicine: FDA’s Role in a New Era of Medical Product Development [12]. The FDA and the National Institutes of Health (NIH) published a working glossary of terminology for Biomarkers, EndpointS, and other Tools (BEST) [13]. The European Union Council [14] provided discussions on personalizedmedicine, including a formal definition. The EuropeanMedicinesAgency (EMA) provided a perspective on pharmacogenomic information in drug labeling [15]. The first goal of EMA’s vision of Regulatory Science Strategy to 2025 [16] is ‘Catalysing the integration of science and technology in medicines development,’ under which the first core recommendation is to ‘support developments in precision medicine, biomarkers and omics’. 
These are just a few selected examples of regulatory efforts being made across the globe to facilita","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"5 1","pages":"93 - 99"},"PeriodicalIF":0.0,"publicationDate":"2021-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49355065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-07-03 | DOI: 10.1080/24709360.2021.1975255
A statistical review: why average weighted accuracy, not accuracy or AUC?
Yunyun Jiang, Q. Pan, Ying Liu, S. Evans
Sensitivity and specificity are key aspects of evaluating the performance of diagnostic tests, and accuracy and AUC are commonly used composite measures that incorporate both. Average Weighted Accuracy (AWA) is motivated by the need for a statistical measure that compares diagnostic tests from a medical-cost and clinical-impact point of view, while incorporating the relevant prevalence range of the disease and the relative importance of false-positive versus false-negative cases. We illustrate the testing procedures in four scenarios: (i) one diagnostic test vs. the best random test, (ii) two diagnostic tests from two independent samples, (iii) two diagnostic tests from the same sample, and (iv) more than two diagnostic tests from different or the same samples. The impacts of sample size, prevalence, and relative importance on power and average medical cost/clinical loss are examined through simulation studies. Accuracy has the highest power, while AWA provides a consistent criterion for selecting the optimal threshold and the better diagnostic test, with direct clinical interpretations. The use of AWA is illustrated on a three-arm clinical trial evaluating three assays for detecting Neisseria gonorrhoeae and Chlamydia trachomatis in the rectum and pharynx.
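A small sketch of the underlying idea, assuming weighted accuracy takes the commonly used form wA(p) = (r·p·Se + (1−p)·Sp)/(r·p + (1−p)), where r is the relative importance of false negatives versus false positives, and that AWA averages it uniformly over a prevalence range; the published definition may weight prevalence differently, so treat this as a hedged approximation.

```r
# Assumed form of weighted accuracy, averaged uniformly over a
# plausible prevalence range [p_lo, p_hi]; all values illustrative.
awa <- function(se, sp, r = 1, p_lo = 0.05, p_hi = 0.20) {
  wa <- function(p) (r * p * se + (1 - p) * sp) / (r * p + (1 - p))
  integrate(wa, p_lo, p_hi)$value / (p_hi - p_lo)   # uniform average
}

awa(se = 0.90, sp = 0.85, r = 2)   # hypothetical test characteristics
```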
{"title":"A statistical review: why average weighted accuracy, not accuracy or AUC?","authors":"Yunyun Jiang, Q. Pan, Ying Liu, S. Evans","doi":"10.1080/24709360.2021.1975255","DOIUrl":"https://doi.org/10.1080/24709360.2021.1975255","url":null,"abstract":"Sensitivity and specificity are key aspects in evaluating the performance of diagnostic tests. Accuracy and AUC are commonly used composite measures that incorporate sensitivity and specificity. Average Weighted Accuracy (AWA) is motivated by the need for a statistical measure that compares diagnostic tests from the medical costs and clinical impact point of view, while incorporating the relevant prevalence range of the disease as well as the relative importance of false-positive versus false-negative cases. We illustrate the testing procedures in four different scenarios: (i) one diagnostic test vs. the best random test, (ii) two diagnostic tests from two independent samples, (iii) two diagnostic tests from the same sample, and (iv) more than two diagnostic tests from different or the same samples. The impacts of sample size, prevalence, and relative importance on power and average medical costs/clinical loss are examined through simulation studies. Accuracy has the highest power while AWA provides a consistent criterion in selecting the optimal threshold and better diagnostic tests with direct clinical interpretations. The use of AWA is illustrated on a three-arm clinical trial evaluating three different assays in detecting Neisseria gonorrhoeae and Chlamydia trachomatis in the rectum and pharynx.","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"5 1","pages":"267 - 286"},"PeriodicalIF":0.0,"publicationDate":"2021-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41893870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-05-10 | DOI: 10.1080/24709360.2021.1913706
The role of statistics in the design and analysis of companion diagnostic (CDx) studies
G. Campbell
Companion diagnostic tests are crucial in the development of precision medicine: they provide information that is essential for the safe and effective use of specific therapeutic products. Statistics plays a key role in the design and analysis of studies that demonstrate the safety and effectiveness of companion diagnostics. This article serves both as an introduction to companion diagnostics for therapeutic and diagnostic statisticians and as a discussion of some of the statistical challenges, covering biomarker development, diagnostic performance, misclassification, prospective-retrospective validation, bridging studies, missing data, follow-on diagnostics, and complex signatures.
{"title":"The role of statistics in the design and analysis of companion diagnostic (CDx) studies","authors":"G. Campbell","doi":"10.1080/24709360.2021.1913706","DOIUrl":"https://doi.org/10.1080/24709360.2021.1913706","url":null,"abstract":"Companion diagnostic tests are crucial in the development of precision medicine. These tests provide information that is essential for the safe and effective use of specific therapeutic products. Statistics plays a key role in the design and analysis of studies to demonstrate the safety and effectiveness of the companion diagnostics. This article can serve as an introduction to companion diagnostics for therapeutic statisticians and for diagnostic ones as well as a discussion of some of the statistical challenges. The topics include biomarker development, diagnostic performance, misclassification, prospective-retrospective validation, bridging studies, missing data, follow-on diagnostics and complex signatures.","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"5 1","pages":"218 - 231"},"PeriodicalIF":0.0,"publicationDate":"2021-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/24709360.2021.1913706","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41503728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-04-25 | DOI: 10.1080/24709360.2021.1913705
A bivariate Bayesian framework for simultaneous evaluation of two candidate companion diagnostic assays in a new drug clinical trial
R. Simon, Songbai Wang
Companion diagnostic tests play an important role in precision medicine. With the advancement of new technologies, multiple companion diagnostic tests can be rapidly developed on multiple platforms, using different samples, to select patients for new treatments. Analytically validated assays must be clinically evaluated before they can be implemented in patient management. The status quo design for validating candidate assays is to employ one candidate assay to select patients for the new drug's clinical trial and then evaluate the second candidate assay in a bridging study. We propose a new enrollment strategy that employs two assays to select patients. We then develop a bivariate Bayesian approach that enables the totality of the data to be used in evaluating whether these assays can be used independently, or in a composite procedure, to select the right patients for the new treatment. We demonstrate through simulations that when proper priors are available, the Bayesian approach is superior to classical methods in terms of statistical power.
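As a loose illustration of the Bayesian ingredient (not the authors' full bivariate model), the sketch below enrolls patients who are positive on either assay and uses conjugate Beta-Binomial updates to summarize response within the assay-concordant and assay-discordant subgroups; all counts and thresholds are made up.

```r
# Illustrative Beta-Binomial summaries for subgroups defined by two
# assays (A1, A2); every number here is hypothetical.
set.seed(2)
groups <- c("A1+A2+", "A1+A2-", "A1-A2+")
n  <- c(120, 40, 35)               # enrolled per subgroup (hypothetical)
x  <- c(66, 14, 13)                # responders per subgroup (hypothetical)
a0 <- 1; b0 <- 1                   # uniform Beta(1, 1) priors

post <- data.frame(group = groups,
                   mean  = (a0 + x) / (a0 + b0 + n),   # posterior mean rate
                   # posterior probability the response rate exceeds 30%
                   p_gt_0.3 = pbeta(0.3, a0 + x, b0 + n - x,
                                    lower.tail = FALSE))
print(post)
```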
{"title":"A bivariate Bayesian framework for simultaneous evaluation of two candidate companion diagnostic assays in a new drug clinical trial","authors":"R. Simon, Songbai Wang","doi":"10.1080/24709360.2021.1913705","DOIUrl":"https://doi.org/10.1080/24709360.2021.1913705","url":null,"abstract":"Companion diagnostic tests play an important role in precision medicine. With the advancement of new technologies, multiple companion diagnostic tests can be rapidly developed in multiple platforms and use different samples to select patients for new treatments. Analytically validated assays must be clinically evaluated before they can be implemented in patient management. The status quo design for validating candidate assays is to employ one candidate assay to select patients for new drug clinical trial and then further evaluate the 2nd candidate assay in a bridging study. We propose a new enrollment strategy that employs two assays to select patients. We then develop a bivariate Bayesian approach that enables the totality of data to be used in evaluating whether these assays can be used independently or in a composite procedure in selecting right patients for new treatment. We demonstrate through simulations that when proper priors are available, the Bayesian approach is superior to classical methods in terms of statistical power.","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"5 1","pages":"207 - 217"},"PeriodicalIF":0.0,"publicationDate":"2021-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/24709360.2021.1913705","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44584920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-03-17 | DOI: 10.1080/24709360.2021.1898731
Estimating the AUC with a graphical lasso method for high-dimensional biomarkers with LOD
Jirui Wang, Yunpeng Zhao, L. Tang
This manuscript concerns estimating the area under the receiver operating characteristic curve (AUC) of combined biomarkers in a high-dimensional setting. We propose a penalization approach to the inference of precision matrices in the presence of a limit of detection (LOD). A new version of the expectation-maximization algorithm is then proposed for the penalized likelihood, using numerical integration and the graphical lasso. The estimated precision matrix is then applied to the inference of AUCs. The proposed method outperforms existing methods in numerical studies. We apply it to a data set from a brain tumor study, where it estimates the AUC more accurately than existing methods.
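The sketch below illustrates only the final step, under an assumed binormal model with a common covariance: the glasso R package supplies a penalized precision matrix, and the Su-Liu best linear combination gives AUC = Φ(√(δ′Ωδ)). The paper's LOD-aware EM step is not reproduced, and the data are simulated.

```r
# A minimal sketch: sparse precision matrix via graphical lasso, then
# the AUC of the best linear biomarker combination. Data are simulated.
library(glasso)
library(MASS)

set.seed(3)
p <- 10
Sigma <- diag(p); Sigma[abs(row(Sigma) - col(Sigma)) == 1] <- 0.3
x0 <- mvrnorm(100, mu = rep(0, p), Sigma = Sigma)        # controls
x1 <- mvrnorm(100, mu = rep(0.4, p), Sigma = Sigma)      # cases

S     <- cov(rbind(scale(x0, scale = FALSE), scale(x1, scale = FALSE)))
Omega <- glasso(S, rho = 0.1)$wi                         # penalized precision
delta <- colMeans(x1) - colMeans(x0)
auc   <- pnorm(sqrt(drop(t(delta) %*% Omega %*% delta))) # best-combination AUC
auc
```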
{"title":"Estimating the AUC with a graphical lasso method for high-dimensional biomarkers with LOD","authors":"Jirui Wang, Yunpeng Zhao, L. Tang","doi":"10.1080/24709360.2021.1898731","DOIUrl":"https://doi.org/10.1080/24709360.2021.1898731","url":null,"abstract":"This manuscript estimates the area under the receiver operating characteristic curve (AUC) of combined biomarkers in a high-dimensional setting. We propose a penalization approach to the inference of precision matrices in the presence of the limit of detection. A new version of expectation-maximization algorithm is then proposed for the penalized likelihood, with the use of numerical integration and the graphical lasso method. The estimated precision matrix is then applied to the inference of AUCs. The proposed method outperforms the existing methods in numerical studies. We apply the proposed method to a data set of brain tumor study. The results show a higher accuracy on the estimation of AUC compared with the existing methods.","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"5 1","pages":"189 - 206"},"PeriodicalIF":0.0,"publicationDate":"2021-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/24709360.2021.1898731","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44107211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-03-17 | DOI: 10.1080/24709360.2021.1898269
Bootstrapping inference of average treatment effect in completely randomized experiments with high-dimensional covariates
Hanzhong Liu
Investigators often use regression adjustment to analyze the results of randomized experiments when baseline covariates are available, aiming to improve the estimation efficiency of treatment effects by adjusting for covariate imbalance. Under mild conditions, the regression-adjusted average treatment effect estimator is asymptotically normal, with asymptotic variance no greater than that of the unadjusted estimator; the asymptotic variance can be estimated conservatively from the residual sum of squares. This article studies alternative inference methods based on the bootstrap and investigates their asymptotic properties under the Neyman–Rubin causal model and the randomization-based inference framework. We show that the weighted, residual, and paired bootstrap methods provide asymptotically conservative variance estimators that perform at least as well as the estimator based on the residual sum of squares. We further provide counterexamples in which the original estimator is asymptotically normal but its bootstrap counterpart is inconsistent for estimating the limiting distribution. Simulation studies indicate that the paired bootstrap is preferable for preserving type I error in small samples. Finally, we apply our methods to HER2+ breast cancer data from the NeOAdjuvant Herceptin trial to examine the effectiveness of trastuzumab combined with neoadjuvant chemotherapy.
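A minimal sketch of the paired bootstrap in a low-dimensional version of this setting: resample (Y, T, X) triples with replacement, refit the adjusted regression, and use the spread of the replicated coefficients for inference. The paper's high-dimensional regression adjustment is more elaborate; everything here is simulated for illustration.

```r
# Paired bootstrap for a regression-adjusted treatment effect estimate.
set.seed(4)
n <- 200
x <- rnorm(n); t <- rbinom(n, 1, 0.5)
y <- 1 + 2 * t + 0.5 * x + rnorm(n)

boot_est <- replicate(2000, {
  idx <- sample.int(n, replace = TRUE)          # resample (Y, T, X) triples
  coef(lm(y[idx] ~ t[idx] + x[idx]))[2]         # adjusted effect estimate
})
c(estimate = coef(lm(y ~ t + x))[2], boot_se = sd(boot_est))
```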
{"title":"Bootstrapping inference of average treatment effect in completely randomized experiments with high-dimensional covariates","authors":"Hanzhong Liu","doi":"10.1080/24709360.2021.1898269","DOIUrl":"https://doi.org/10.1080/24709360.2021.1898269","url":null,"abstract":"Investigators often use regression adjustment methods to analyze the results of randomized experiments when baseline covariates are available. Their aim is to improve the estimation efficiency of treatment effects by adjusting for imbalance of covariates. Under mild conditions, the regression-adjusted average treatment effect estimator is asymptotically normal with asymptotic variance no greater than that of the unadjusted estimator. The asymptotic variance can be estimated conservatively based on residual sum of squares. This article studies alternative inference methods based on the bootstrap and investigates their asymptotic properties under the Neyman–Rubin causal model and randomization-based inference framework. We show that the weighted, residual and paired bootstrap methods provide asymptotically conservative variance estimators that perform at least as good as the estimator based on residual sum of squares. We further provide counterexamples, where the original estimator is asymptotically normal, but the bootstrap counterpart is inconsistent for estimating its limiting distribution. Simulation studies indicate that the paired bootstrap method is preferable, in terms of preserving type I errors, for a small sample size. Finally, our methods analyze HER2+ breast cancer data from the NeOAdjuvant Herceptin trial to examine the effectiveness of trastuzumab in combination with neoadjuvant chemotherapy.","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"6 1","pages":"203 - 220"},"PeriodicalIF":0.0,"publicationDate":"2021-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/24709360.2021.1898269","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45298163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-03-17 | DOI: 10.1080/24709360.2021.1880224
Comparative study of statistical methods for clustered ROC data: nonparametric methods and multiple outputation methods
Zhuang Miao, L. Tang, Ao Yuan
In clustered receiver operating characteristic (ROC) data, each patient contributes several normal and abnormal observations, and observations within the same cluster are naturally correlated. Several nonparametric methods have been proposed in the literature to handle this clustered structure, but their performance on simulated and real datasets has not been compared. Recently, a multiple outputation method has been considered for clustered data in areas other than diagnostic accuracy to account for within-cluster correlation. Multiple outputation offers a resampling-based alternative for one-sample clustered data with or without covariates, and for hypothesis testing with two-sample clustered data; it does not require a specific within-cluster correlation structure and yields a valid estimator while accounting for the within-cluster correlations. This paper contributes to the literature by introducing the multiple outputation method to the ROC setting and empirically comparing the performance of these clustered ROC curve methods. The methods are also evaluated on two real examples.
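A minimal sketch of the point estimator, assuming the simplest variant of multiple outputation: draw one diseased and one non-diseased observation per cluster, compute the Mann-Whitney AUC on each outputted data set, and average. The variance decomposition used for inference is not shown; the data are simulated.

```r
# Multiple outputation for a clustered AUC: average the Mann-Whitney AUC
# over many one-observation-per-cluster subsamples. Simulated data.
set.seed(5)
K <- 40
clusters <- lapply(seq_len(K), function(k) {
  u <- rnorm(1, sd = 0.5)                        # cluster effect
  list(neg = rnorm(3, u), pos = rnorm(3, u + 1)) # non-diseased / diseased
})

auc_once <- function() {
  neg <- sapply(clusters, function(cl) sample(cl$neg, 1))
  pos <- sapply(clusters, function(cl) sample(cl$pos, 1))
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))  # M-W AUC
}
mean(replicate(500, auc_once()))                 # outputation average
```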
{"title":"Comparative study of statistical methods for clustered ROC data: nonparametric methods and multiple outputation methods","authors":"Zhuang Miao, L. Tang, Ao Yuan","doi":"10.1080/24709360.2021.1880224","DOIUrl":"https://doi.org/10.1080/24709360.2021.1880224","url":null,"abstract":"In clustered receiver operating characteristic (ROC) data each patient has several normal and abnormal observations. Within the same cluster, observations are naturally correlated. Several nonparametric methods have been proposed in the literature to handle clustered data structure, but their performances on simulated and real datasets have not been compared. Recently, a multiple outputation method has been considered for clustered data in areas other than diagnostic accuracy to account for within-cluster correlation. The multiple outputation method offers a resampling-based alternative for one sample clustered data with or without covariates, or for hypothesis testing in two sample clustered data. The method does not require a specific within-cluster correlation structure and yields a valid estimator while accounting for the within-cluster correlations. This paper contributes to the literature by introducing the multiple outputation method to the ROC setting, and empirically comparing the performance of these clustered ROC curve methods. The performance of these methods is also evaluated through two real examples.","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"5 1","pages":"169 - 188"},"PeriodicalIF":0.0,"publicationDate":"2021-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/24709360.2021.1880224","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48308687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-03-17 | DOI: 10.1080/24709360.2021.1898730
High-dimensional inference for the average treatment effect under model misspecification using penalized bias-reduced double-robust estimation
Vahe Avagyan, S. Vansteelandt
The presence of confounding by high-dimensional variables complicates the estimation of the average effect of a point treatment. On the one hand, it necessitates the use of variable selection strategies or more general high-dimensional statistical methods. On the other hand, such techniques tend to produce biased estimators with non-standard asymptotic behavior. Double-robust estimators offer a resolution because they possess a so-called small-bias property. This property has been exploited to achieve valid (uniform) inference for the average causal effect when data-adaptive estimators of the propensity score and the conditional outcome mean both converge to their respective truths at sufficiently fast rates. In this article, we extend this work to retain valid (uniform) inference when one of these estimators does not converge to the truth, regardless of which one. This is done by generalizing the prior work for low-dimensional settings of Vermeulen and Vansteelandt [Bias-reduced doubly robust estimation. J Am Stat Assoc. 2015;110(511):1024–1036] to incorporate regularization. The proposed penalized bias-reduced double-robust estimation strategy exhibits promising performance in simulation studies and a data analysis, relative to competing proposals.
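For orientation, the sketch below shows a plain penalized double-robust (AIPW) estimator with lasso-fitted nuisance models via the glmnet R package; the paper's contribution is to fit these nuisance models with special bias-reducing loss functions, which this sketch does not implement.

```r
# A standard AIPW estimator with lasso nuisance fits; simulated data.
library(glmnet)

set.seed(6)
n <- 400; p <- 50
x <- matrix(rnorm(n * p), n, p)
t <- rbinom(n, 1, plogis(x[, 1]))                # confounded treatment
y <- x[, 1] + 2 * t + rnorm(n)

ps <- drop(predict(cv.glmnet(x, t, family = "binomial"),
                   x, type = "response", s = "lambda.min"))  # propensity
m1 <- drop(predict(cv.glmnet(x[t == 1, ], y[t == 1]), x, s = "lambda.min"))
m0 <- drop(predict(cv.glmnet(x[t == 0, ], y[t == 0]), x, s = "lambda.min"))

# Augmented inverse-probability-weighted average treatment effect.
ate <- mean(t * (y - m1) / ps - (1 - t) * (y - m0) / (1 - ps) + m1 - m0)
ate
```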
{"title":"High-dimensional inference for the average treatment effect under model misspecification using penalized bias-reduced double-robust estimation","authors":"Vahe Avagyan, S. Vansteelandt","doi":"10.1080/24709360.2021.1898730","DOIUrl":"https://doi.org/10.1080/24709360.2021.1898730","url":null,"abstract":"The presence of confounding by high-dimensional variables complicates the estimation of the average effect of a point treatment. On the one hand, it necessitates the use of variable selection strategies or more general high-dimensional statistical methods. On the other hand, the use of such techniques tends to result in biased estimators with a non-standard asymptotic behavior. Double-robust estimators are useful for offering a resolution because they possess a so-called small bias property. This property has been exploited to achieve valid (uniform) inference of the average causal effect when data-adaptive estimators of the propensity score and conditional outcome mean both converge to their respective truths at sufficiently fast rate. In this article, we extend this work in order to retain valid (uniform) inference when one of these estimators does not converge to the truth, regardless of which. This is done by generalizing prior work for low-dimensional settings by [Vermeulen K, Vansteelandt S. Bias-reduced doubly robust estimation. Am Stat Assoc. 2015;110(511):1024–1036.] to incorporate regularization. The proposed penalized bias-reduced double-robust estimation strategy exhibits promising performance in simulation studies and a data analysis, relative to competing proposals.","PeriodicalId":37240,"journal":{"name":"Biostatistics and Epidemiology","volume":"6 1","pages":"221 - 238"},"PeriodicalIF":0.0,"publicationDate":"2021-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/24709360.2021.1898730","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48653333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}