Bayesian Empirical Likelihood Regression for Semiparametric Estimation of Optimal Dynamic Treatment Regimes
Pub Date: 2024-12-10 | Epub Date: 2024-10-24 | DOI: 10.1002/sim.10251 | Statistics in Medicine, pp. 5461-5472
Weichang Yu, Howard Bondell
We propose a semiparametric approach to Bayesian modeling of dynamic treatment regimes that is built on a Bayesian likelihood-based regression estimation framework. Methods based on this framework exhibit a probabilistic coherence property that leads to accurate estimation of the optimal dynamic treatment regime. Unlike most Bayesian estimation methods, our proposed method avoids strong distributional assumptions for the intermediate and final outcomes by utilizing empirical likelihoods. Our proposed method allows for either linear or more flexible forms of the mean functions for the stagewise outcomes. A variational Bayes approximation is used for computation to avoid common pitfalls associated with Markov chain Monte Carlo approaches coupled with empirical likelihood. Through simulations and analysis of the STAR*D sequential randomized trial data, our proposed method demonstrates superior accuracy over Q-learning and parametric Bayesian likelihood-based regression estimation, particularly when the parametric assumptions on the regression error distributions are potentially violated.
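For readers less familiar with empirical likelihood, the display below gives the standard profile construction that stands in for a parametric error density. This is generic background rather than the paper's exact formulation: the estimating function g and the Bayesian treatment of the parameter are left abstract.

```latex
% Generic profile empirical likelihood for a regression parameter \theta,
% given a mean-zero estimating function g (e.g., residual-orthogonality
% constraints for a linear or more flexible mean function):
L_{\mathrm{EL}}(\theta) \;=\; \max\Big\{ \prod_{i=1}^{n} p_i \;:\;
  p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1,\
  \sum_{i=1}^{n} p_i \, g(Y_i, X_i; \theta) = 0 \Big\}
```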
{"title":"Bayesian Empirical Likelihood Regression for Semiparametric Estimation of Optimal Dynamic Treatment Regimes.","authors":"Weichang Yu, Howard Bondell","doi":"10.1002/sim.10251","DOIUrl":"10.1002/sim.10251","url":null,"abstract":"<p><p>We propose a semiparametric approach to Bayesian modeling of dynamic treatment regimes that is built on a Bayesian likelihood-based regression estimation framework. Methods based on this framework exhibit a probabilistic coherence property that leads to accurate estimation of the optimal dynamic treatment regime. Unlike most Bayesian estimation methods, our proposed method avoids strong distributional assumptions for the intermediate and final outcomes by utilizing empirical likelihoods. Our proposed method allows for either linear, or more flexible forms of mean functions for the stagewise outcomes. A variational Bayes approximation is used for computation to avoid common pitfalls associated with Markov Chain Monte Carlo approaches coupled with empirical likelihood. Through simulations and analysis of the STAR*D sequential randomized trial data, our proposed method demonstrates superior accuracy over Q-learning and parametric Bayesian likelihood-based regression estimation, particularly when the parametric assumptions of regression error distributions may be potentially violated.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5461-5472"},"PeriodicalIF":1.8,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142508367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Nonparametric Global Win Probability Approach to the Analysis and Sizing of Randomized Controlled Trials With Multiple Endpoints of Different Scales and Missing Data: Beyond O'Brien-Wei-Lachin
Pub Date: 2024-12-10 | Epub Date: 2024-10-17 | DOI: 10.1002/sim.10247 | Statistics in Medicine, pp. 5366-5379
Guangyong Zou, Lily Zou
Multiple primary endpoints are commonly used in randomized controlled trials to assess treatment effects. When the endpoints are measured on different scales, the O'Brien rank-sum test or the Wei-Lachin test for stochastic ordering may be used for hypothesis testing. However, the O'Brien-Wei-Lachin (OWL) approach can neither handle missing data nor adjust for baseline measurements. We present a nonparametric approach for data analysis that encompasses the OWL approach as a special case. Our approach quantifies an endpoint-specific treatment effect as the probability that a participant in the treatment group has a better score than (or a win over) a participant in the control group. The average of the endpoint-specific win probabilities (WinPs), termed the global win probability (gWinP), quantifies the global treatment effect, with the null hypothesis gWinP = 0.50. Our approach involves converting the data for each endpoint to endpoint-specific win fractions and modeling the win fractions using multivariate linear mixed models to obtain estimates of the endpoint-specific WinPs and the associated variance-covariance matrix. Focusing on confidence interval estimation for the gWinP, we derive sample size formulas for clinical trial design. Simulation results demonstrate that our approach performed well in terms of bias, interval coverage percentage, and assurance of achieving a pre-specified precision for the gWinP. Illustrative code for implementing the methods using SAS PROC RANK and PROC MIXED is provided.
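As a rough illustration of the core quantity, the sketch below computes endpoint-specific win fractions and averages the resulting WinPs with plain NumPy. The data and names are hypothetical, and the paper's mixed-model step for interval estimation (implemented there with SAS PROC RANK and PROC MIXED) is not reproduced here.

```python
import numpy as np

def win_fractions(treat, ctrl):
    """Win fraction for each treated participant on one endpoint
    (higher = better): the fraction of control participants they beat,
    counting ties as half a win."""
    t = np.asarray(treat, float)[:, None]   # shape (n_treat, 1)
    c = np.asarray(ctrl, float)[None, :]    # shape (1, n_ctrl)
    return ((t > c) + 0.5 * (t == c)).mean(axis=1)

rng = np.random.default_rng(1)
# Two hypothetical endpoints measured on different scales
endpoints = [
    (rng.normal(0.3, 1.0, 50), rng.normal(0.0, 1.0, 60)),  # continuous
    (rng.poisson(3.5, 50), rng.poisson(3.0, 60)),          # count
]
win_ps = [win_fractions(t, c).mean() for t, c in endpoints]
g_win_p = np.mean(win_ps)  # global win probability; H0: gWinP = 0.50
print(np.round(win_ps, 3), round(g_win_p, 3))
```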
{"title":"A Nonparametric Global Win Probability Approach to the Analysis and Sizing of Randomized Controlled Trials With Multiple Endpoints of Different Scales and Missing Data: Beyond O'Brien-Wei-Lachin.","authors":"Guangyong Zou, Lily Zou","doi":"10.1002/sim.10247","DOIUrl":"10.1002/sim.10247","url":null,"abstract":"<p><p>Multiple primary endpoints are commonly used in randomized controlled trials to assess treatment effects. When the endpoints are measured on different scales, the O'Brien rank-sum test or the Wei-Lachin test for stochastic ordering may be used for hypothesis testing. However, the O'Brien-Wei-Lachin (OWL) approach is unable to handle missing data and adjust for baseline measurements. We present a nonparametric approach for data analysis that encompasses the OWL approach as a special case. Our approach is based on quantifying an endpoint-specific treatment effect using the probability that a participant in the treatment group has a better score than (or a win over) a participant in the control group. The average of the endpoint-specific win probabilities (WinPs), termed the global win probability (gWinP), is used to quantify the global treatment effect, with the null hypothesis gWinP = 0.50. Our approach involves converting the data for each endpoint to endpoint-specific win fractions, and modeling the win fractions using multivariate linear mixed models to obtain estimates of the endpoint-specific WinPs and the associated variance-covariance matrix. Focusing on confidence interval estimation for the gWinP, we derive sample size formulas for clinical trial design. Simulation results demonstrate that our approach performed well in terms of bias, interval coverage percentage, and assurance of achieving a pre-specified precision for the gWinP. Illustrative code for implementing the methods using SAS PROC RANK and PROC MIXED is provided.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5366-5379"},"PeriodicalIF":1.8,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142475166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Testing Similarity of Parametric Competing Risks Models for Identifying Potentially Similar Pathways in Healthcare
Pub Date: 2024-12-10 | Epub Date: 2024-10-12 | DOI: 10.1002/sim.10243 | Statistics in Medicine, pp. 5316-5330
Kathrin Möllenhoff, Nadine Binder, Holger Dette
The identification of similar patient pathways is a crucial task in healthcare analytics. Parametric competing risks models are a flexible tool for addressing this task: transition intensities may be specified by a variety of parametric distributions and, in particular, may be time-dependent. We assess the similarity between two such models by examining the transitions between different health states. We introduce a method to measure the maximum difference in transition intensities over time, leading to a test procedure for assessing similarity. We propose a parametric bootstrap approach for this purpose and provide a proof of the validity of this procedure. The performance of the proposed method is evaluated through a simulation study covering a range of sample sizes, differing amounts of censoring, and various thresholds for similarity. Finally, we demonstrate the practical application of our approach with a case study from routine urological clinical practice, which inspired this research.
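To make the test statistic concrete, here is a minimal sketch of the maximum intensity difference for one transition, assuming Weibull-type intensities. The parameter values and similarity margin are placeholders, and the parametric bootstrap calibration developed in the paper is only indicated in a comment.

```python
import numpy as np

def weibull_intensity(t, shape, scale):
    """Transition intensity of a Weibull distribution:
    lambda(t) = (shape / scale) * (t / scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

# Hypothetical fitted parameters for the same transition in two models
theta1 = dict(shape=1.3, scale=2.0)
theta2 = dict(shape=1.1, scale=2.4)

grid = np.linspace(0.01, 5.0, 500)  # time window under comparison
d_max = np.max(np.abs(weibull_intensity(grid, **theta1)
                      - weibull_intensity(grid, **theta2)))

epsilon = 0.25  # similarity threshold chosen in advance
print(f"max_t |lambda_1(t) - lambda_2(t)| = {d_max:.3f} (similar if < {epsilon})")
# In the actual test, the distribution of d_max at the boundary of the
# hypothesis is approximated by a parametric bootstrap.
```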
{"title":"Testing Similarity of Parametric Competing Risks Models for Identifying Potentially Similar Pathways in Healthcare.","authors":"Kathrin Möllenhoff, Nadine Binder, Holger Dette","doi":"10.1002/sim.10243","DOIUrl":"10.1002/sim.10243","url":null,"abstract":"<p><p>The identification of similar patient pathways is a crucial task in healthcare analytics. A flexible tool to address this issue are parametric competing risks models, where transition intensities may be specified by a variety of parametric distributions, thus in particular being possibly time-dependent. We assess the similarity between two such models by examining the transitions between different health states. This research introduces a method to measure the maximum differences in transition intensities over time, leading to the development of a test procedure for assessing similarity. We propose a parametric bootstrap approach for this purpose and provide a proof to confirm the validity of this procedure. The performance of our proposed method is evaluated through a simulation study, considering a range of sample sizes, differing amounts of censoring, and various thresholds for similarity. Finally, we demonstrate the practical application of our approach with a case study from urological clinical routine practice, which inspired this research.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5316-5330"},"PeriodicalIF":1.8,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142475171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Phase I/II Design for Selecting Subgroup-Specific Optimal Biological Doses for Prespecified Subgroups
Pub Date: 2024-12-10 | Epub Date: 2024-10-18 | DOI: 10.1002/sim.10256 | Statistics in Medicine, pp. 5401-5411
Sydney Porter, Thomas A Murray, Anne Eaton
We propose a phase I/II trial design to support dose-finding when the optimal biological dose (OBD) may differ between two prespecified patient subgroups. The proposed design uses a utility function to quantify efficacy-toxicity trade-offs, together with a Bayesian model with spike-and-slab prior distributions on the subgroup effects for toxicity and efficacy, to guide dosing and to identify either subgroup-specific OBDs or a common OBD, depending on the resulting trial data. In a simulation study, the proposed design performs nearly as well as a design that ignores subgroups when the dose-toxicity and dose-efficacy relationships are the same in both subgroups, and nearly as well as a design with independent dose-finding within each subgroup when these relationships differ across subgroups. In other words, the proposed adaptive design performs similarly to the design that would be chosen if investigators possessed foreknowledge about whether the dose-toxicity and/or dose-efficacy relationship differs across the two prespecified subgroups. Thus, the proposed design may be effective for OBD selection when it is uncertain whether the OBD differs between two prespecified subgroups.
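As a schematic of how a utility function can rank doses, the snippet below scores each dose by a simple linear efficacy-toxicity trade-off and selects the maximizer within each subgroup. All numbers, the weight w_tox, and the subgroup labels are hypothetical; the paper's utility, spike-and-slab model, and adaptive decision rules are richer than this.

```python
import numpy as np

def utility(p_eff, p_tox, w_tox=1.5):
    """Toy efficacy-toxicity trade-off: reward efficacy, penalize toxicity."""
    return p_eff - w_tox * p_tox

# Hypothetical posterior mean toxicity/efficacy probabilities per dose level
doses = [1, 2, 3, 4]
p_eff = {"subgroup_A": [0.15, 0.30, 0.45, 0.50],
         "subgroup_B": [0.20, 0.40, 0.42, 0.41]}
p_tox = {"subgroup_A": [0.05, 0.10, 0.20, 0.40],
         "subgroup_B": [0.05, 0.08, 0.15, 0.35]}

for g in p_eff:
    u = utility(np.array(p_eff[g]), np.array(p_tox[g]))
    print(g, "OBD =", doses[int(np.argmax(u))], "utilities:", np.round(u, 2))
```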
{"title":"Phase I/II Design for Selecting Subgroup-Specific Optimal Biological Doses for Prespecified Subgroups.","authors":"Sydney Porter, Thomas A Murray, Anne Eaton","doi":"10.1002/sim.10256","DOIUrl":"10.1002/sim.10256","url":null,"abstract":"<p><p>We propose a phase I/II trial design to support dose-finding when the optimal biological dose (OBD) may differ in two prespecified patient subgroups. The proposed design uses a utility function to quantify efficacy-toxicity trade-offs, and a Bayesian model with spike and slab prior distributions for the subgroup effect on toxicity and efficacy to guide dosing and to facilitate identifying either subgroup-specific OBDs or a common OBD depending on the resulting trial data. In a simulation study, we find the proposed design performs nearly as well as a design that ignores subgroups when the dose-toxicity and dose-efficacy relationships are the same in both subgroups, and nearly as well as a design with independent dose-finding within each subgroup when these relationships differ across subgroups. In other words, the proposed adaptive design performs similarly to the design that would be chosen if investigators possessed foreknowledge about whether the dose-toxicity and/or dose-efficacy relationship differs across two prespecified subgroups. Thus, the proposed design may be effective for OBD selection when uncertainty exists about whether the OBD differs in two prespecified subgroups.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5401-5411"},"PeriodicalIF":1.8,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142475170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multivariate Cluster Point Process to Quantify and Explore Multi-Entity Configurations: Application to Biofilm Image Data
Pub Date: 2024-12-10 | Epub Date: 2024-10-24 | DOI: 10.1002/sim.10261 | Statistics in Medicine, pp. 5446-5460
Suman Majumder, Brent A Coull, Jessica L Mark Welch, Patrick J La Riviere, Floyd E Dewhirst, Jacqueline R Starr, Kyu Ha Lee
Clusters of similar or dissimilar objects are encountered in many fields. Frequently used approaches treat each cluster's central object as latent. Yet objects of one or more types often cluster around objects of another type. Such arrangements are common in biomedical images of cells, in which nearby cell types likely interact. Quantifying spatial relationships may elucidate biological mechanisms. Parent-offspring statistical frameworks can be usefully applied even when central objects ("parents") differ from peripheral ones ("offspring"). We propose the novel multivariate cluster point process (MCPP) to quantify multi-object (e.g., multi-cellular) arrangements. Unlike commonly used approaches, the MCPP exploits the locations of the central parent objects in clusters. It accounts for possibly multilayered, multivariate clustering. The model formulation requires specifying which object types function as cluster centers and which reside peripherally. If such information is unknown, the relative roles of object types may be explored by comparing the fit of different models via the deviance information criterion (DIC). In simulated data, we compared the DIC across a series of models; the MCPP correctly identified the simulated relationships. It also produced more accurate and precise parameter estimates than the classical univariate Neyman-Scott process model. We also used the MCPP to quantify proposed configurations and explore new ones in human dental plaque biofilm image data. MCPP models quantified the simultaneous clustering of Streptococcus and Porphyromonas around Corynebacterium and of Pasteurellaceae around Streptococcus, and they successfully captured hypothesized structures for all taxa. Further exploration suggested clustering between Fusobacterium and Leptotrichia, a previously unreported relationship.
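To fix ideas, here is a minimal simulation of a parent-offspring cluster process (a Thomas/Neyman-Scott-type construction) in which the parent locations are kept rather than treated as latent, mirroring the MCPP's use of observed cluster centers. The parameters and taxa labels are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(7)

def thomas_process(kappa, mu, sigma, window=1.0):
    """Simulate parents as a homogeneous Poisson process on [0, window]^2
    and offspring as Poisson(mu) counts displaced from each parent by
    isotropic Gaussian noise with standard deviation sigma."""
    n_parents = rng.poisson(kappa * window ** 2)
    parents = rng.uniform(0.0, window, size=(n_parents, 2))
    offspring = [p + rng.normal(0.0, sigma, size=(rng.poisson(mu), 2))
                 for p in parents]
    offspring = np.vstack(offspring) if offspring else np.empty((0, 2))
    return parents, offspring

# e.g., "Corynebacterium" as observed parents, "Streptococcus" as offspring
parents, strep = thomas_process(kappa=20, mu=8, sigma=0.02)
print(parents.shape, strep.shape)
```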
{"title":"Multivariate Cluster Point Process to Quantify and Explore Multi-Entity Configurations: Application to Biofilm Image Data.","authors":"Suman Majumder, Brent A Coull, Jessica L Mark Welch, Patrick J La Riviere, Floyd E Dewhirst, Jacqueline R Starr, Kyu Ha Lee","doi":"10.1002/sim.10261","DOIUrl":"10.1002/sim.10261","url":null,"abstract":"<p><p>Clusters of similar or dissimilar objects are encountered in many fields. Frequently used approaches treat each cluster's central object as latent. Yet, often objects of one or more types cluster around objects of another type. Such arrangements are common in biomedical images of cells, in which nearby cell types likely interact. Quantifying spatial relationships may elucidate biological mechanisms. Parent-offspring statistical frameworks can be usefully applied even when central objects (\"parents\") differ from peripheral ones (\"offspring\"). We propose the novel multivariate cluster point process (MCPP) to quantify multi-object (e.g., multi-cellular) arrangements. Unlike commonly used approaches, the MCPP exploits locations of the central parent object in clusters. It accounts for possibly multilayered, multivariate clustering. The model formulation requires specification of which object types function as cluster centers and which reside peripherally. If such information is unknown, the relative roles of object types may be explored by comparing fit of different models via the deviance information criterion (DIC). In simulated data, we compared a series of models' DIC; the MCPP correctly identified simulated relationships. It also produced more accurate and precise parameter estimates than the classical univariate Neyman-Scott process model. We also used the MCPP to quantify proposed configurations and explore new ones in human dental plaque biofilm image data. MCPP models quantified simultaneous clustering of Streptococcus and Porphyromonas around Corynebacterium and of Pasteurellaceae around Streptococcus and successfully captured hypothesized structures for all taxa. Further exploration suggested the presence of clustering between Fusobacterium and Leptotrichia, a previously unreported relationship.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5446-5460"},"PeriodicalIF":1.8,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142508280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Functional Principal Component Analysis for Continuous Non-Gaussian, Truncated, and Discrete Functional Data
Pub Date: 2024-12-10 | DOI: 10.1002/sim.10240 | Statistics in Medicine, pp. 5431-5445
Debangan Dey, Rahul Ghosal, Kathleen Merikangas, Vadim Zipunnikov
Mobile health studies often collect multiple within-day self-reported assessments of participants' behavior and well-being on different scales such as physical activity (continuous scale), pain levels (truncated scale), mood states (ordinal scale), and the occurrence of daily life events (binary scale). These assessments, when indexed by time of day, can be treated and analyzed as functional data corresponding to their respective types: continuous, truncated, ordinal, and binary. Motivated by these examples, we develop a functional principal component analysis that deals with all four types of functional data in a unified manner. It employs a semiparametric Gaussian copula model, assuming a generalized latent non-paranormal process as the underlying generating mechanism for these four types of functional data. We specify latent temporal dependence using a covariance estimated through the Kendall's τ bridging method, incorporating smoothness in the bridging process. The approach is then extended with methods for handling both dense and sparse sampling designs, calculating subject-specific latent representations of observed data, and computing latent principal components and principal component scores. Simulation studies demonstrate the method's competitive performance under both dense and sparse sampling designs. The method is applied to data from 497 participants in the National Institute of Mental Health Family Study of Mood Spectrum Disorders to characterize differences in within-day temporal patterns of mood in individuals with the major mood disorder subtypes, including Major Depressive Disorder and Type 1 and 2 Bipolar Disorder. Software implementation of the proposed method is provided in an R package.
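The continuous-margin case of the bridging step can be illustrated in a few lines: under a Gaussian copula, Kendall's τ and the latent correlation ρ satisfy τ = (2/π) arcsin(ρ), so ρ = sin(πτ/2) no matter which monotone marginal transforms are applied. This is a generic sketch; the bridges for binary, ordinal, and truncated margins and the smoothing over the time grid follow the paper.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(3)
z = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=2000)

# Monotone, non-Gaussian marginal transforms leave Kendall's tau unchanged
x, y = np.exp(z[:, 0]), z[:, 1] ** 3
tau, _ = kendalltau(x, y)
rho_hat = np.sin(np.pi * tau / 2)  # bridge back to the latent correlation
print(round(rho_hat, 3))           # close to the true latent rho = 0.6
```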
{"title":"Functional Principal Component Analysis for Continuous Non-Gaussian, Truncated, and Discrete Functional Data.","authors":"Debangan Dey, Rahul Ghosal, Kathleen Merikangas, Vadim Zipunnikov","doi":"10.1002/sim.10240","DOIUrl":"10.1002/sim.10240","url":null,"abstract":"<p><p>Mobile health studies often collect multiple within-day self-reported assessments of participants' behavior and well-being on different scales such as physical activity (continuous scale), pain levels (truncated scale), mood states (ordinal scale), and the occurrence of daily life events (binary scale). These assessments, when indexed by time of day, can be treated and analyzed as functional data corresponding to their respective types: continuous, truncated, ordinal, and binary. Motivated by these examples, we develop a functional principal component analysis that deals with all four types of functional data in a unified manner. It employs a semiparametric Gaussian copula model, assuming a generalized latent non-paranormal process as the underlying generating mechanism for these four types of functional data. We specify latent temporal dependence using a covariance estimated through Kendall's <math> <semantics><mrow><mi>τ</mi></mrow> <annotation>$$ tau $$</annotation></semantics> </math> bridging method, incorporating smoothness in the bridging process. The approach is then extended with methods for handling both dense and sparse sampling designs, calculating subject-specific latent representations of observed data, latent principal components and principal component scores. Simulation studies demonstrate the method's competitive performance under both dense and sparse sampling designs. The method is applied to data from 497 participants in the National Institute of Mental Health Family Study of Mood Spectrum Disorders to characterize differences in within-day temporal patterns of mood in individuals with the major mood disorder subtypes, including Major Depressive Disorder and Type 1 and 2 Bipolar Disorder. Software implementation of the proposed method is provided in an R-package.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5431-5445"},"PeriodicalIF":1.8,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142508368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Neural Network-Based Accelerated Failure Time Models Using Rank Loss
Pub Date: 2024-12-10 | Epub Date: 2024-10-12 | DOI: 10.1002/sim.10235 | Statistics in Medicine, pp. 5331-5343
Gwangsu Kim, Jeongho Park, Sangwook Kang
An accelerated failure time (AFT) model assumes a log-linear relationship between failure times and a set of covariates. In contrast to other popular survival models that work on hazard functions, covariates act directly on failure times, which makes interpretation intuitive. The semiparametric AFT model, which does not specify the error distribution, is flexible and robust to departures from distributional assumptions. Owing to these desirable features, this class of models has been considered a promising alternative to the popular Cox model in the analysis of censored failure time data. However, these AFT models typically assume a linear predictor for the mean, and little research has addressed non-linearity of predictors when modeling the mean. Deep neural networks (DNNs) have received much attention over the past few decades and have achieved remarkable success in a variety of fields. DNNs have a number of notable advantages and have been shown to be particularly useful in addressing non-linearity. Here, we propose applying a DNN to fit AFT models using a Gehan-type loss combined with a sub-sampling technique. Finite sample properties of the proposed DNN- and rank-based AFT model (DeepR-AFT) were investigated via an extensive simulation study. The DeepR-AFT model showed superior performance over its parametric and semiparametric counterparts when the predictor was nonlinear. For linear predictors, DeepR-AFT performed better when the dimension of the covariates was large. The superior performance of the proposed DeepR-AFT was demonstrated using three real datasets.
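A minimal PyTorch rendering of a Gehan-type rank loss may help make the proposal concrete: residuals are formed on the log-time scale, and every uncensored observation is compared with every other observation. The toy network and data below are placeholders, and the paper's pair sub-sampling for scalability is omitted.

```python
import torch

def gehan_loss(log_t, delta, pred):
    """Gehan-type rank loss for an AFT model.
    log_t: log observed times; delta: 1 = event, 0 = censored;
    pred: network output replacing the linear predictor x'beta.
    With residuals e_i = log_t_i - pred_i, the loss averages
    delta_i * max(e_j - e_i, 0) over all pairs (i, j)."""
    e = log_t - pred
    diff = e.unsqueeze(0) - e.unsqueeze(1)  # diff[i, j] = e_j - e_i
    return (delta.unsqueeze(1) * torch.relu(diff)).mean()

# Tiny illustration with random data
n, p = 64, 5
x = torch.randn(n, p)
log_t = torch.randn(n)
delta = (torch.rand(n) < 0.7).float()
net = torch.nn.Sequential(torch.nn.Linear(p, 16), torch.nn.ReLU(),
                          torch.nn.Linear(16, 1))
loss = gehan_loss(log_t, delta, net(x).squeeze(-1))
loss.backward()  # ready for an optimizer step
```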
{"title":"Deep Neural Network-Based Accelerated Failure Time Models Using Rank Loss.","authors":"Gwangsu Kim, Jeongho Park, Sangwook Kang","doi":"10.1002/sim.10235","DOIUrl":"10.1002/sim.10235","url":null,"abstract":"<p><p>An accelerated failure time (AFT) model assumes a log-linear relationship between failure times and a set of covariates. In contrast to other popular survival models that work on hazard functions, the effects of covariates are directly on failure times, the interpretation of which is intuitive. The semiparametric AFT model that does not specify the error distribution is sufficiently flexible and robust to depart from the distributional assumption. Owing to its desirable features, this class of model has been considered a promising alternative to the popular Cox model in the analysis of censored failure time data. However, in these AFT models, a linear predictor for the mean is typically assumed. Little research has addressed the non-linearity of predictors when modeling the mean. Deep neural networks (DNNs) have received much attention over the past few decades and have achieved remarkable success in a variety of fields. DNNs have a number of notable advantages and have been shown to be particularly useful in addressing non-linearity. Here, we propose applying a DNN to fit AFT models using Gehan-type loss combined with a sub-sampling technique. Finite sample properties of the proposed DNN and rank-based AFT model (DeepR-AFT) were investigated via an extensive simulation study. The DeepR-AFT model showed superior performance over its parametric and semiparametric counterparts when the predictor was nonlinear. For linear predictors, DeepR-AFT performed better when the dimensions of the covariates were large. The superior performance of the proposed DeepR-AFT was demonstrated using three real datasets.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5331-5343"},"PeriodicalIF":1.8,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142475169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Novel Bayesian Spatio-Temporal Surveillance Metric to Predict Emerging Infectious Disease Areas of High Disease Risk
Pub Date: 2024-12-10 | Epub Date: 2024-10-10 | DOI: 10.1002/sim.10227 | Statistics in Medicine, pp. 5300-5315
Joanne Kim, Andrew B Lawson, Brian Neelon, Jeffrey E Korte, Jan M Eberth, Gerardo Chowell
Identification of areas of high disease risk has been one of the top goals of infectious disease public health surveillance. Accurate prediction of these regions leads to effective resource allocation and faster intervention. This paper proposes a novel prediction surveillance metric based on a Bayesian spatio-temporal model for infectious disease outbreaks. Exceedance probability, which has commonly been used for cluster detection in statistical epidemiology, is extended to predict areas of high risk. The proposed metric consists of three components: the area's own risk profile, the temporal risk trend, and spatial neighborhood influence. We also introduce a weighting scheme to balance these three components, which accommodates the characteristics of the infectious disease outbreak, spatial properties, and disease trends. Thorough simulation studies were conducted to identify the optimal weighting scheme and evaluate the performance of the proposed prediction surveillance metric. Results indicate that the area's own risk and the neighborhood influence play an important role in making the metric highly sensitive, whereas the risk trend term is important for the specificity and accuracy of prediction. The proposed prediction metric was applied to South Carolina COVID-19 case data covering the 30 weeks beginning March 12, 2020.
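To make the exceedance-probability component concrete, the sketch below estimates Pr(theta_area > c | data) from posterior draws. The draws, the threshold c, and the weighting comment are illustrative assumptions rather than the paper's exact specification.

```python
import numpy as np

def exceedance_prob(theta_draws, c=1.0):
    """Pr(theta_area > c | data), estimated as the fraction of MCMC
    draws above c; theta_draws has shape (areas, samples)."""
    return (theta_draws > c).mean(axis=1)

rng = np.random.default_rng(0)
# Hypothetical posterior draws of area-level relative risks
draws = rng.lognormal(mean=np.array([[0.0], [0.2], [-0.1]]),
                      sigma=0.3, size=(3, 4000))
own_risk = exceedance_prob(draws, c=1.2)

# The proposed metric combines components of this kind, e.g.
# metric = w1 * own_risk + w2 * trend_component + w3 * neighbor_component,
# with weights balanced to suit the outbreak setting.
print(np.round(own_risk, 3))
```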
{"title":"A Novel Bayesian Spatio-Temporal Surveillance Metric to Predict Emerging Infectious Disease Areas of High Disease Risk.","authors":"Joanne Kim, Andrew B Lawson, Brian Neelon, Jeffrey E Korte, Jan M Eberth, Gerardo Chowell","doi":"10.1002/sim.10227","DOIUrl":"10.1002/sim.10227","url":null,"abstract":"<p><p>Identification of areas of high disease risk has been one of the top goals for infectious disease public health surveillance. Accurate prediction of these regions leads to effective resource allocation and faster intervention. This paper proposes a novel prediction surveillance metric based on a Bayesian spatio-temporal model for infectious disease outbreaks. Exceedance probability, which has been commonly used for cluster detection in statistical epidemiology, was extended to predict areas of high risk. The proposed metric consists of three components: the area's risk profile, temporal risk trend, and spatial neighborhood influence. We also introduce a weighting scheme to balance these three components, which accommodates the characteristics of the infectious disease outbreak, spatial properties, and disease trends. Thorough simulation studies were conducted to identify the optimal weighting scheme and evaluate the performance of the proposed prediction surveillance metric. Results indicate that the area's own risk and the neighborhood influence play an important role in making a highly sensitive metric, and the risk trend term is important for the specificity and accuracy of prediction. The proposed prediction metric was applied to the COVID-19 case data of South Carolina from March 12, 2020, and the subsequent 30 weeks of data.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5300-5315"},"PeriodicalIF":1.8,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142393470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modeling Chronic Disease Mortality by Methods From Accelerated Life Testing
Pub Date: 2024-12-10 | Epub Date: 2024-10-09 | DOI: 10.1002/sim.10233 | Statistics in Medicine, pp. 5273-5284
Marina Zamsheva, Alexander Kluttig, Andreas Wienke, Oliver Kuss
We propose a parametric model for describing chronic disease mortality from cohort data and illustrate its use for Type 2 diabetes. The model uses ideas from accelerated life testing in reliability theory and conceptualizes the occurrence of a chronic disease as placing the observational unit under an enhanced stress level, which is supposed to shorten its lifetime. It further addresses the issue of semi-competing risk, that is, the asymmetry between death and diagnosis of disease, where the disease can be diagnosed before death, but not after. With respect to the cohort structure of the data, late entry into the cohort is taken into account, and prevalent as well as incident cases inform the analysis. We finally give an extension of the model that allows age at disease diagnosis to be observed not exactly, but only partially within an interval. Model parameters can be straightforwardly estimated by maximum likelihood; assuming a Gompertz distribution, we show in a small simulation study that this works well. Data from the Cardiovascular Disease, Living and Ageing in Halle (CARLA) study, a population-based cohort in the city of Halle (Saale) in eastern Germany, are used for illustration.
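The Gompertz ingredients of such a likelihood are standard and worth writing out. The display below is a sketch under the Gompertz assumption, with late entry handled by conditioning on survival to the age at cohort entry; how disease onset modifies the lifetime distribution (the accelerated life testing part) is the paper's construction and is not specified here.

```latex
% Gompertz hazard and survival (standard forms):
h(t) = a\,e^{b t}, \qquad
S(t) = \exp\!\Big(-\tfrac{a}{b}\big(e^{b t} - 1\big)\Big).
% Likelihood contribution of a subject entering the cohort at age t_{0i},
% observed until age t_i with death indicator \delta_i (late entry is
% handled by conditioning on survival to t_{0i}):
L_i = \frac{h(t_i)^{\delta_i}\, S(t_i)}{S(t_{0i})}.
```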
{"title":"Modeling Chronic Disease Mortality by Methods From Accelerated Life Testing.","authors":"Marina Zamsheva, Alexander Kluttig, Andreas Wienke, Oliver Kuss","doi":"10.1002/sim.10233","DOIUrl":"10.1002/sim.10233","url":null,"abstract":"<p><p>We propose a parametric model for describing chronic disease mortality from cohort data and illustrate its use for Type 2 diabetes. The model uses ideas from accelerated life testing in reliability theory and conceptualizes the occurrence of a chronic disease as putting the observational unit to an enhanced stress level, which is supposed to shorten its lifetime. It further addresses the issue of semi-competing risk, that is, the asymmetry of death and diagnosis of disease, where the disease can be diagnosed before death, but not after. With respect to the cohort structure of the data, late entry into the cohort is taken into account and prevalent as well as incident cases inform the analysis. We finally give an extension of the model that allows age at disease diagnosis to be observed not exactly, but only partially within an interval. Model parameters can be straightforwardly estimated by Maximum Likelihood, using the assumption of a Gompertz distribution we show in a small simulation study that this works well. Data of the Cardiovascular Disease, Living and Ageing in Halle (CARLA) study, a population-based cohort in the city of Halle (Saale) in the eastern part of Germany, are used for illustration.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5273-5284"},"PeriodicalIF":1.8,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142393475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clustering Functional Data With Measurement Errors: A Simulation-Based Approach
Pub Date: 2024-12-10 | Epub Date: 2024-10-15 | DOI: 10.1002/sim.10238 | Statistics in Medicine, pp. 5344-5352
Tingyu Zhu, Lan Xue, Carmen Tekwe, Keith Diaz, Mark Benden, Roger Zoh
Clustering analysis of functional data, which comprises observations that evolve continuously over time or space, has gained increasing attention across various scientific disciplines. Practical applications often involve functional data that are contaminated with measurement errors arising from imprecise instruments, sampling errors, or other sources. These errors can significantly distort the inherent data structure, resulting in erroneous clustering outcomes. In this article, we propose a simulation-based approach designed to mitigate the impact of measurement errors. Our proposed method estimates the distribution of functional measurement errors through repeated measurements. Subsequently, the clustering algorithm is applied to simulated data generated from the conditional distribution of the unobserved true functional data given the observed contaminated functional data, accounting for the adjustments made to rectify measurement errors. We show through simulations that the proposed method has better numerical performance than naive methods that neglect such errors. Our proposed method was applied to a childhood obesity study, yielding more reliable clustering results.
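A stripped-down, working-Gaussian version of the idea may help: estimate the error variance from replicate measurements, simulate pseudo-true curves from a plug-in conditional distribution given the observed curves, and cluster each simulated data set. All settings below are hypothetical, and the paper's treatment of functional covariance structure and aggregation of the resulting partitions goes beyond this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(5)

# W[i, r, t]: replicate r of subject i's curve on a common time grid
n, R, T = 100, 2, 24
group = rng.random(n) < 0.5
truth = np.where(group, 1.0, -1.0)[:, None] * np.sin(np.linspace(0, np.pi, T))
W = truth[:, None, :] + rng.normal(0.0, 0.8, (n, R, T))

# Error variance from replicate differences: Var(W_1 - W_2) = 2 * sigma2_err
sigma2_err = 0.5 * np.mean((W[:, 0] - W[:, 1]) ** 2)
W_bar = W.mean(axis=1)  # averaging replicates reduces error variance by 1/R
sigma2_sig = max(W_bar.var() - sigma2_err / R, 1e-8)
shrink = sigma2_sig / (sigma2_sig + sigma2_err / R)
cond_mean = W_bar.mean(axis=0) + shrink * (W_bar - W_bar.mean(axis=0))
cond_sd = np.sqrt(shrink * sigma2_err / R)

# Cluster repeated draws from the working conditional distribution of the
# true curves given the observed ones; partitions would then be aggregated
labels = [KMeans(n_clusters=2, n_init=10)
          .fit_predict(cond_mean + rng.normal(0.0, cond_sd, W_bar.shape))
          for _ in range(20)]
print(np.mean([adjusted_rand_score(group, lab) for lab in labels]))
```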
{"title":"Clustering Functional Data With Measurement Errors: A Simulation-Based Approach.","authors":"Tingyu Zhu, Lan Xue, Carmen Tekwe, Keith Diaz, Mark Benden, Roger Zoh","doi":"10.1002/sim.10238","DOIUrl":"10.1002/sim.10238","url":null,"abstract":"<p><p>Clustering analysis of functional data, which comprises observations that evolve continuously over time or space, has gained increasing attention across various scientific disciplines. Practical applications often involve functional data that are contaminated with measurement errors arising from imprecise instruments, sampling errors, or other sources. These errors can significantly distort the inherent data structure, resulting in erroneous clustering outcomes. In this article, we propose a simulation-based approach designed to mitigate the impact of measurement errors. Our proposed method estimates the distribution of functional measurement errors through repeated measurements. Subsequently, the clustering algorithm is applied to simulated data generated from the conditional distribution of the unobserved true functional data given the observed contaminated functional data, accounting for the adjustments made to rectify measurement errors. We illustrate through simulations show that the proposed method has improved numerical performance than the naive methods that neglect such errors. Our proposed method was applied to a childhood obesity study, giving more reliable clustering results.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5344-5352"},"PeriodicalIF":1.8,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142475168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}