Meta-analysis is a powerful tool for synthesizing findings from multiple studies. The normal-normal random-effects model is widely used to account for between-study heterogeneity. However, meta-analyses of sparse data, which may arise when the event rate is low for binary or count outcomes, challenge the accuracy and stability of inference under the normal-normal random-effects model, because the normal approximation in the within-study model may be poor. To reduce bias arising from data sparsity, the generalized linear mixed model can be used, replacing the approximate normal within-study model with an exact one. Publication bias is one of the most serious threats in meta-analysis. Several quantitative sensitivity analysis methods for evaluating the potential impact of selective publication are available for the normal-normal random-effects model. We propose a sensitivity analysis method that extends the likelihood-based sensitivity analysis with the $t$-statistic selection function of Copas to several generalized linear mixed-effects models. In applications to several real-world meta-analyses and in simulation studies, the proposed method outperforms the likelihood-based sensitivity analysis based on the normal-normal model. The proposed method provides useful guidance for addressing publication bias in meta-analyses of sparse data.
{"title":"Sensitivity analysis for publication bias in meta-analysis of sparse data based on exact likelihood.","authors":"Taojun Hu, Yi Zhou, Satoshi Hattori","doi":"10.1093/biomtc/ujae092","DOIUrl":"10.1093/biomtc/ujae092","url":null,"abstract":"<p><p>Meta-analysis is a powerful tool to synthesize findings from multiple studies. The normal-normal random-effects model is widely used to account for between-study heterogeneity. However, meta-analyses of sparse data, which may arise when the event rate is low for binary or count outcomes, pose a challenge to the normal-normal random-effects model in the accuracy and stability in inference since the normal approximation in the within-study model may not be good. To reduce bias arising from data sparsity, the generalized linear mixed model can be used by replacing the approximate normal within-study model with an exact model. Publication bias is one of the most serious threats in meta-analysis. Several quantitative sensitivity analysis methods for evaluating the potential impacts of selective publication are available for the normal-normal random-effects model. We propose a sensitivity analysis method by extending the likelihood-based sensitivity analysis with the $t$-statistic selection function of Copas to several generalized linear mixed-effects models. Through applications of our proposed method to several real-world meta-analyses and simulation studies, the proposed method was proven to outperform the likelihood-based sensitivity analysis based on the normal-normal model. The proposed method would give useful guidance to address publication bias in the meta-analysis of sparse data.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142280084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the following discussion, we describe the various exchangeability assumptions that have been made in the context of Bayesian borrowing and related models. This allows us to highlight the difficulty of dynamic Bayesian borrowing under the assumption that individuals in the historical data are exchangeable with those in the current data, and thereby the strengths and novel features of the latent exchangeability prior. Because borrowing methods are popular in clinical trials for augmenting the control arm, we also identify some potential challenges in applying the approach in this setting.
{"title":"Discussion on \"LEAP: the latent exchangeability prior for borrowing information from historical data\" by Ethan M. Alt, Xiuya Chang, Xun Jiang, Qing Liu, May Mo, H. Amy Xia, and Joseph G. Ibrahim.","authors":"Darren Scott, Alex Lewin","doi":"10.1093/biomtc/ujae085","DOIUrl":"https://doi.org/10.1093/biomtc/ujae085","url":null,"abstract":"<p><p>In the following discussion, we describe the various assumptions of exchangeability that have been made in the context of Bayesian borrowing and related models. In this context, we are able to highlight the difficulty of dynamic Bayesian borrowing under the assumption of individuals in the historical data being exchangeable with the current data and thus the strengths and novel features of the latent exchangeability prior. As borrowing methods are popular within clinical trials to augment the control arm, some potential challenges are identified with the application of the approach in this setting.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142340680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
What is the best way to split one stratum into two to maximally reduce the within-stratum imbalance in many covariates? We formulate this as an integer program and approximate the solution by randomized rounding of a linear program. A linear program may assign a fraction of a person to each refined stratum. Randomized rounding views fractional people as probabilities, assigning intact people to strata using biased coins. Randomized rounding is a well-studied theoretical technique for approximating the optimal solution of certain insoluble integer programs. When the number of people in a stratum is large relative to the number of covariates, we prove the following new results: (i) randomized rounding to split a stratum does very little randomizing, so it closely resembles the linear programming relaxation without splitting intact people; (ii) the linear relaxation and the randomly rounded solution place lower and upper bounds on the unattainable integer programming solution; and because of (i), these bounds are often close, thereby ratifying the usable randomly rounded solution. We illustrate using an observational study that balanced many covariates by forming matched pairs composed of 2016 patients selected from 5735 using a propensity score. Instead, we form 5 propensity score strata and refine them into 10 strata, obtaining excellent covariate balance while retaining all patients. An R package optrefine at CRAN implements the method. Supplementary materials are available online.
{"title":"Optimal refinement of strata to balance covariates.","authors":"Katherine Brumberg, Dylan S Small, Paul R Rosenbaum","doi":"10.1093/biomtc/ujae061","DOIUrl":"https://doi.org/10.1093/biomtc/ujae061","url":null,"abstract":"<p><p>What is the best way to split one stratum into two to maximally reduce the within-stratum imbalance in many covariates? We formulate this as an integer program and approximate the solution by randomized rounding of a linear program. A linear program may assign a fraction of a person to each refined stratum. Randomized rounding views fractional people as probabilities, assigning intact people to strata using biased coins. Randomized rounding is a well-studied theoretical technique for approximating the optimal solution of certain insoluble integer programs. When the number of people in a stratum is large relative to the number of covariates, we prove the following new results: (i) randomized rounding to split a stratum does very little randomizing, so it closely resembles the linear programming relaxation without splitting intact people; (ii) the linear relaxation and the randomly rounded solution place lower and upper bounds on the unattainable integer programming solution; and because of (i), these bounds are often close, thereby ratifying the usable randomly rounded solution. We illustrate using an observational study that balanced many covariates by forming matched pairs composed of 2016 patients selected from 5735 using a propensity score. Instead, we form 5 propensity score strata and refine them into 10 strata, obtaining excellent covariate balance while retaining all patients. An R package optrefine at CRAN implements the method. Supplementary materials are available online.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141589543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integrating multiple observational studies to make unconfounded causal or descriptive comparisons of group potential outcomes in a large natural population is challenging. Moreover, retrospective cohorts, being convenience samples, are usually unrepresentative of the natural population of interest and have groups with unbalanced covariates. We propose a general covariate-balancing framework based on pseudo-populations that extends established weighting methods to the meta-analysis of multiple retrospective cohorts with multiple groups. Additionally, by maximizing the effective sample sizes of the cohorts, we propose a FLEXible, Optimized, and Realistic (FLEXOR) weighting method appropriate for integrative analyses. We develop new weighted estimators for unconfounded inferences on wide-ranging population-level features and estimands relevant to group comparisons of quantitative, categorical, or multivariate outcomes. Asymptotic properties of these estimators are examined. Through simulation studies and meta-analyses of TCGA datasets, we demonstrate the versatility and reliability of the proposed weighting strategy, especially for the FLEXOR pseudo-population.
{"title":"Causal meta-analysis by integrating multiple observational studies with multivariate outcomes.","authors":"Subharup Guha, Yi Li","doi":"10.1093/biomtc/ujae070","DOIUrl":"10.1093/biomtc/ujae070","url":null,"abstract":"<p><p>Integrating multiple observational studies to make unconfounded causal or descriptive comparisons of group potential outcomes in a large natural population is challenging. Moreover, retrospective cohorts, being convenience samples, are usually unrepresentative of the natural population of interest and have groups with unbalanced covariates. We propose a general covariate-balancing framework based on pseudo-populations that extends established weighting methods to the meta-analysis of multiple retrospective cohorts with multiple groups. Additionally, by maximizing the effective sample sizes of the cohorts, we propose a FLEXible, Optimized, and Realistic (FLEXOR) weighting method appropriate for integrative analyses. We develop new weighted estimators for unconfounded inferences on wide-ranging population-level features and estimands relevant to group comparisons of quantitative, categorical, or multivariate outcomes. Asymptotic properties of these estimators are examined. Through simulation studies and meta-analyses of TCGA datasets, we demonstrate the versatility and reliability of the proposed weighting strategy, especially for the FLEXOR pseudo-population.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11285113/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141787237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The need to select mediators from a high-dimensional data source, such as neuroimaging or genetic data, arises in many areas of scientific research. In this work, we formulate a multiple-hypothesis testing framework for mediator selection from a high-dimensional candidate set and propose a method that extends recent developments in false discovery rate (FDR)-controlled variable selection with knockoffs to the selection of mediators with FDR control. We show that the proposed method and algorithm achieve finite-sample FDR control. We present extensive simulation results demonstrating the power and finite-sample performance of the method compared with an existing method. Lastly, we apply the method to the Adolescent Brain Cognitive Development (ABCD) study, in which it selects several resting-state functional magnetic resonance imaging connectivity markers as mediators of the relationship between adverse childhood events and the crystallized composite score in the NIH Toolbox.
{"title":"Controlling false discovery rate for mediator selection in high-dimensional data.","authors":"Ran Dai, Ruiyang Li, Seonjoo Lee, Ying Liu","doi":"10.1093/biomtc/ujae064","DOIUrl":"10.1093/biomtc/ujae064","url":null,"abstract":"<p><p>The need to select mediators from a high dimensional data source, such as neuroimaging data and genetic data, arises in much scientific research. In this work, we formulate a multiple-hypothesis testing framework for mediator selection from a high-dimensional candidate set, and propose a method, which extends the recent development in false discovery rate (FDR)-controlled variable selection with knockoff to select mediators with FDR control. We show that the proposed method and algorithm achieved finite sample FDR control. We present extensive simulation results to demonstrate the power and finite sample performance compared with the existing method. Lastly, we demonstrate the method for analyzing the Adolescent Brain Cognitive Development (ABCD) study, in which the proposed method selects several resting-state functional magnetic resonance imaging connectivity markers as mediators for the relationship between adverse childhood events and the crystallized composite score in the NIH toolbox.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11285112/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141787238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prior distributions, which represent one's belief in the distributions of unknown parameters before observing the data, impact Bayesian inference in a critical and fundamental way. With the ability to incorporate external information from expert opinions or historical datasets, the priors, if specified appropriately, can improve the statistical efficiency of Bayesian inference. In survival analysis, based on the concept of unit information (UI) under parametric models, we propose the unit information Dirichlet process (UIDP) as a new class of nonparametric priors for the underlying distribution of time-to-event data. By deriving the Fisher information in terms of the differential of the cumulative hazard function, the UIDP prior is formulated to match its prior UI with the weighted average of UI in historical datasets and thus can utilize both parametric and nonparametric information provided by historical datasets. With a Markov chain Monte Carlo algorithm, simulations and real data analysis demonstrate that the UIDP prior can adaptively borrow historical information and improve statistical efficiency in survival analysis.
{"title":"Unit information Dirichlet process prior.","authors":"Jiaqi Gu, Guosheng Yin","doi":"10.1093/biomtc/ujae091","DOIUrl":"https://doi.org/10.1093/biomtc/ujae091","url":null,"abstract":"<p><p>Prior distributions, which represent one's belief in the distributions of unknown parameters before observing the data, impact Bayesian inference in a critical and fundamental way. With the ability to incorporate external information from expert opinions or historical datasets, the priors, if specified appropriately, can improve the statistical efficiency of Bayesian inference. In survival analysis, based on the concept of unit information (UI) under parametric models, we propose the unit information Dirichlet process (UIDP) as a new class of nonparametric priors for the underlying distribution of time-to-event data. By deriving the Fisher information in terms of the differential of the cumulative hazard function, the UIDP prior is formulated to match its prior UI with the weighted average of UI in historical datasets and thus can utilize both parametric and nonparametric information provided by historical datasets. With a Markov chain Monte Carlo algorithm, simulations and real data analysis demonstrate that the UIDP prior can adaptively borrow historical information and improve statistical efficiency in survival analysis.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142153111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The US Food and Drug Administration launched Project Optimus to reform the dose-optimization and dose-selection paradigm in oncology drug development, calling for a shift from finding the maximum tolerated dose to identifying the optimal biological dose (OBD). Motivated by a real-world drug development program, we propose a master-protocol-based platform trial design to simultaneously identify the OBDs of a new drug, combined with standards of care or other novel agents, in multiple indications. We propose a Bayesian latent-subgroup model to accommodate treatment heterogeneity across indications and employ Bayesian hierarchical models to borrow information within subgroups. At each interim analysis, we update the subgroup membership, the dose-toxicity and dose-efficacy estimates, and the estimated utility for the risk-benefit tradeoff, based on the observed data across treatment arms, to inform arm-specific decisions of dose escalation and de-escalation and to identify the OBD for each arm of a combination partner and an indication. Simulation studies show that the proposed design has desirable operating characteristics, providing a highly flexible and efficient approach to dose optimization. The design has great potential to shorten the drug development timeline, save costs by reducing overlapping infrastructure, and speed up regulatory approval.
{"title":"A Bayesian latent-subgroup platform design for dose optimization.","authors":"Rongji Mu,Xiaojiang Zhan,Rui Sammi Tang,Ying Yuan","doi":"10.1093/biomtc/ujae093","DOIUrl":"https://doi.org/10.1093/biomtc/ujae093","url":null,"abstract":"The US Food and Drug Administration launched Project Optimus to reform the dose optimization and dose selection paradigm in oncology drug development, calling for the paradigm shift from finding the maximum tolerated dose to the identification of optimal biological dose (OBD). Motivated by a real-world drug development program, we propose a master-protocol-based platform trial design to simultaneously identify OBDs of a new drug, combined with standards of care or other novel agents, in multiple indications. We propose a Bayesian latent subgroup model to accommodate the treatment heterogeneity across indications, and employ Bayesian hierarchical models to borrow information within subgroups. At each interim analysis, we update the subgroup membership and dose-toxicity and -efficacy estimates, as well as the estimate of the utility for risk-benefit tradeoff, based on the observed data across treatment arms to inform the arm-specific decision of dose escalation and de-escalation and identify the OBD for each arm of a combination partner and an indication. The simulation study shows that the proposed design has desirable operating characteristics, providing a highly flexible and efficient way for dose optimization. The design has great potential to shorten the drug development timeline, save costs by reducing overlapping infrastructure, and speed up regulatory approval.","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"38 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clustered coefficient regression (CCR) extends the classical regression model by allowing regression coefficients to vary across observations and to form clusters of observations. It has become an increasingly useful tool for modeling heterogeneous relationships between predictor and response variables. A typical issue with existing CCR methods is that the estimation and clustering results can be unstable in the presence of multicollinearity. To address this instability, this paper introduces a low-rank structure on the CCR coefficient matrix and proposes a penalized non-convex optimization problem with an adaptive group fusion-type penalty tailored to this structure. An iterative algorithm is developed to solve this non-convex optimization problem with guaranteed convergence. An upper bound for the coefficient estimation error is also obtained to establish the statistical properties of the estimator. Empirical studies on both simulated datasets and a COVID-19 mortality rate dataset demonstrate the superiority of the proposed method over existing methods.
{"title":"Reduced-rank clustered coefficient regression for addressing multicollinearity in heterogeneous coefficient estimation.","authors":"Yan Zhong, Kejun He, Gefei Li","doi":"10.1093/biomtc/ujae076","DOIUrl":"https://doi.org/10.1093/biomtc/ujae076","url":null,"abstract":"<p><p>Clustered coefficient regression (CCR) extends the classical regression model by allowing regression coefficients varying across observations and forming clusters of observations. It has become an increasingly useful tool for modeling the heterogeneous relationship between the predictor and response variables. A typical issue of existing CCR methods is that the estimation and clustering results can be unstable in the presence of multicollinearity. To address the instability issue, this paper introduces a low-rank structure of the CCR coefficient matrix and proposes a penalized non-convex optimization problem with an adaptive group fusion-type penalty tailor-made for this structure. An iterative algorithm is developed to solve this non-convex optimization problem with guaranteed convergence. An upper bound for the coefficient estimation error is also obtained to show the statistical property of the estimator. Empirical studies on both simulated datasets and a COVID-19 mortality rate dataset demonstrate the superiority of the proposed method to existing methods.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141970574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent years have witnessed a rise in the popularity of information integration without sharing of raw data. By leveraging summary information from external sources, internal studies can achieve enhanced estimation efficiency and prediction accuracy. However, a noteworthy challenge in utilizing summary-level information is accommodating the inherent heterogeneity across diverse data sources. In this study, we address prior probability shift between two cohorts, wherein the difference between the two data distributions depends on the outcome. We introduce a novel semi-parametric constrained-optimization-based approach to integrate information within this framework, which has not been extensively explored in the existing literature. Our proposed method tackles the prior probability shift by introducing an outcome-dependent selection function and effectively addresses the estimation uncertainty associated with summary information from the external source. Our approach facilitates valid inference even in the absence of a known variance-covariance estimate from the external source. Through extensive simulation studies, we observe the superiority of our method over existing ones, showing minimal estimation bias and reduced variance for both binary and continuous outcomes. We further demonstrate the utility of our method by applying it to investigate risk factors related to essential hypertension, where reduced estimation variability is observed after integrating summary information from an external data source.
{"title":"Integrating external summary information in the presence of prior probability shift: an application to assessing essential hypertension.","authors":"Chixiang Chen, Peisong Han, Shuo Chen, Michelle Shardell, Jing Qin","doi":"10.1093/biomtc/ujae090","DOIUrl":"10.1093/biomtc/ujae090","url":null,"abstract":"<p><p>Recent years have witnessed a rise in the popularity of information integration without sharing of raw data. By leveraging and incorporating summary information from external sources, internal studies can achieve enhanced estimation efficiency and prediction accuracy. However, a noteworthy challenge in utilizing summary-level information is accommodating the inherent heterogeneity across diverse data sources. In this study, we delve into the issue of prior probability shift between two cohorts, wherein the difference of two data distributions depends on the outcome. We introduce a novel semi-parametric constrained optimization-based approach to integrate information within this framework, which has not been extensively explored in existing literature. Our proposed method tackles the prior probability shift by introducing the outcome-dependent selection function and effectively addresses the estimation uncertainty associated with summary information from the external source. Our approach facilitates valid inference even in the absence of a known variance-covariance estimate from the external source. Through extensive simulation studies, we observe the superiority of our method over existing ones, showcasing minimal estimation bias and reduced variance for both binary and continuous outcomes. We further demonstrate the utility of our method through its application in investigating risk factors related to essential hypertension, where the reduced estimation variability is observed after integrating summary information from an external data.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11381951/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142153110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a new method for constructing valid covariance functions of Gaussian processes for spatial analysis in irregular, non-convex domains such as bodies of water. Standard covariance functions based on geodesic distances are not guaranteed to be positive definite on such domains, while existing non-Euclidean approaches fail to respect the partially Euclidean nature of these domains where the geodesic distance agrees with the Euclidean distances for some pairs of points. Using a visibility graph on the domain, we propose a class of covariance functions that preserve Euclidean-based covariances between points that are connected in the domain while incorporating the non-convex geometry of the domain via conditional independence relationships. We show that the proposed method preserves the partially Euclidean nature of the intrinsic geometry on the domain while maintaining validity (positive definiteness) and marginal stationarity of the covariance function over the entire parameter space, properties which are not always fulfilled by existing approaches to construct covariance functions on non-convex domains. We provide useful approximations to improve computational efficiency, resulting in a scalable algorithm. We compare the performance of our method with those of competing state-of-the-art methods using simulation studies on synthetic non-convex domains. The method is applied to data regarding acidity levels in the Chesapeake Bay, showing its potential for ecological monitoring in real-world spatial applications on irregular domains.
{"title":"Visibility graph-based covariance functions for scalable spatial analysis in non-convex partially Euclidean domains.","authors":"Brian Gilbert, Abhirup Datta","doi":"10.1093/biomtc/ujae089","DOIUrl":"https://doi.org/10.1093/biomtc/ujae089","url":null,"abstract":"<p><p>We present a new method for constructing valid covariance functions of Gaussian processes for spatial analysis in irregular, non-convex domains such as bodies of water. Standard covariance functions based on geodesic distances are not guaranteed to be positive definite on such domains, while existing non-Euclidean approaches fail to respect the partially Euclidean nature of these domains where the geodesic distance agrees with the Euclidean distances for some pairs of points. Using a visibility graph on the domain, we propose a class of covariance functions that preserve Euclidean-based covariances between points that are connected in the domain while incorporating the non-convex geometry of the domain via conditional independence relationships. We show that the proposed method preserves the partially Euclidean nature of the intrinsic geometry on the domain while maintaining validity (positive definiteness) and marginal stationarity of the covariance function over the entire parameter space, properties which are not always fulfilled by existing approaches to construct covariance functions on non-convex domains. We provide useful approximations to improve computational efficiency, resulting in a scalable algorithm. We compare the performance of our method with those of competing state-of-the-art methods using simulation studies on synthetic non-convex domains. The method is applied to data regarding acidity levels in the Chesapeake Bay, showing its potential for ecological monitoring in real-world spatial applications on irregular domains.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142153112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}