Conditional independence is a foundational concept for understanding probabilistic relationships among variables, with broad applications in fields such as causal inference and machine learning. This study focuses on testing the conditional independence $T \perp X \mid Z$, where $T$ represents survival data possibly subject to right censoring, $Z$ represents established risk factors for $T$, and $X$ represents potential novel biomarkers. The goal is to identify novel biomarkers that offer additional value for risk assessment and prediction. This can be achieved by using either the partial or parametric likelihood ratio statistic to evaluate whether the coefficient vector of $X$ in the conditional model of $T$ given $(X^{\top}, Z^{\top})^{\top}$ is equal to zero. Traditional tests that directly compare likelihood ratios to chi-squared distributions may produce erroneous type-I error rates under model misspecification. As an alternative, we propose a resampling-based method to approximate the distribution of the likelihood ratios. A key advantage of the proposed test is its double robustness: it achieves approximately correct type-I error rates when either the conditional outcome model or the working model of $\mathrm{pr}(X \mid Z)$ is correctly specified. Additionally, machine learning techniques can be incorporated to improve test performance. Simulation studies and an application to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data demonstrate the finite-sample performance of the proposed tests.
{"title":"Double robust conditional independence test for novel biomarkers given established risk factors with survival data.","authors":"Baoying Yang, Jing Qin, Jing Ning, Yukun Liu","doi":"10.1093/biomtc/ujaf133","DOIUrl":"10.1093/biomtc/ujaf133","url":null,"abstract":"<p><p>Conditional independence is a foundational concept for understanding probabilistic relationships among variables, with broad applications in fields such as causal inference and machine learning. This study focuses on testing conditional independence, $Tperp X|Z$, where T represents survival data possibly subject to right censoring, Z represents established risk factors for T, and X represents potential novel biomarkers. The goal is to identify novel biomarkers that offer additional merits for further risk assessment and prediction. This can be achieved by using either the partial or parametric likelihood ratio statistic to evaluate whether the coefficient vector of X in the conditional model of T given $(X^{ mathrm{scriptscriptstyle top } }, Z^{ mathrm{scriptscriptstyle top } })^{ mathrm{scriptscriptstyle top } }$ is equal to zero. Traditional tests such as directly comparing likelihood ratios to chi-squared distributions may produce erroneous type-I error rates under model misspecification. As an alternative, we propose a resampling-based method to approximate the distribution of the likelihood ratios. A key advantage of the proposed test is its double robustness: it achieves approximately correct type-I error rates when either the conditional outcome model or the working model of ${rm pr} (X|Z)$ is correctly specified. Additionally, machine learning techniques can be incorporated to improve test performance. Simulation studies and the application to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data demonstrate the finite-sample performance of the proposed tests.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 4","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145336170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Motivated by a malaria vaccine efficacy trial, this paper investigates generalized nonparametric temporal models of intensity processes with multiple time scales. Through the choice of link functions, the proposed models encompass a wide range of models, such as the multiplicative temporal intensity model and the additive temporal intensity model. A maximum likelihood estimation procedure is developed to estimate the effects of the two time scales via local linear smoothing with double kernels. Computational algorithms are developed to facilitate applications of the proposed method. An adaptive algorithm is developed to overcome the challenges of overlapping covariates. A cross-validation bandwidth selection procedure based on a log-likelihood criterion is discussed. The asymptotic properties of the proposed estimators are investigated. Our simulation study shows that the proposed methods have satisfactory finite-sample performance for both the multiplicative and additive temporal intensity models. The proposed methods are applied to the MAL-094/MAL-095 malaria vaccine efficacy trial data to investigate how the risk of new malaria infection changes over time and how a prior infection or vaccination changes future infection risk. The proposed method provides new insight into the protective effects of the malaria vaccine against new malaria infections and into how vaccine efficacy is modified by the history of prior malaria infection over time.
{"title":"Generalized nonparametric temporal modeling of recurrent events with application to a malaria vaccine trial.","authors":"Fei Heng, Yanqing Sun, Jing Xu, Peter B Gilbert","doi":"10.1093/biomtc/ujaf146","DOIUrl":"10.1093/biomtc/ujaf146","url":null,"abstract":"<p><p>Motivated by a malaria vaccine efficacy trial, this paper investigates generalized nonparametric temporal models of intensity processes with multiple time scales. Through the choice of link functions, the proposed models encompass a wide range of models such as the multiplicative temporal intensity model and the additive temporal intensity model. A maximum likelihood estimation procedure is developed to estimate the effects of two time-scales via the local linear smoothing with double kernels. Computational algorithms are developed to facilitate applications of the proposed method. An adaptive algorithm is developed to overcome the challenges of overlapping covariates. A cross-validation bandwidth selection procedure based on the logarithm of likelihood criteria is discussed. The asymptotic properties of the proposed estimators are investigated. Our simulation study shows that the proposed methods have satisfactory finite sample performance for both the multiplicative temporal intensity model and additive temporal intensity model. The proposed methods are applied to analyze the MAL-094/MAL-095 malaria vaccine efficacy trial data to investigate how the new malaria infection risk changes over time and how a prior infection or vaccination changes the future infection risk. The proposed method provides new insight into the protective effects of the malaria vaccine against new malaria infections and how the vaccine efficacy is modified by the history of prior malaria infection over time.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 4","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12635532/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145562433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Although the Cox proportional hazards (PH) model is well established and extensively used in the analysis of survival data, the PH assumption may not always hold in practical scenarios. The class of semiparametric transformation models extends the Cox model and includes many other survival models as special cases. This paper introduces a deep partially linear transformation model as a general and flexible regression framework for right-censored data. The proposed method avoids the curse of dimensionality while retaining the interpretability of selected covariates of interest. We derive the overall convergence rate of the maximum likelihood estimators, the minimax lower bound of the nonparametric deep neural network estimator, and the asymptotic normality and semiparametric efficiency of the parametric estimator. Comprehensive simulation studies demonstrate the strong performance of the proposed estimation procedure in terms of both estimation accuracy and predictive power, which is further validated by an application to a real-world dataset.
{"title":"Deep partially linear transformation model for right-censored survival data.","authors":"Junkai Yin, Yue Zhang, Zhangsheng Yu","doi":"10.1093/biomtc/ujaf126","DOIUrl":"10.1093/biomtc/ujaf126","url":null,"abstract":"<p><p>Although the Cox proportional hazards (PH) model is well established and extensively used in the analysis of survival data, the PH assumption may not always hold in practical scenarios. The class of semiparametric transformation models extends the Cox model and also includes many other survival models as special cases. This paper introduces a deep partially linear transformation model as a general and flexible regression framework for right-censored data. The proposed method is capable of avoiding the curse of dimensionality while still retaining the interpretability of some covariates of interest. We derive the overall convergence rate of the maximum likelihood estimators, the minimax lower bound of the nonparametric deep neural network estimator, and the asymptotic normality and the semiparametric efficiency of the parametric estimator. Comprehensive simulation studies demonstrate the impressive performance of the proposed estimation procedure in terms of both the estimation accuracy and the predictive power, which is further validated by an application to a real-world dataset.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 4","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145399782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The US Food and Drug Administration (FDA) launched Project Optimus to shift the objective of dose selection from the maximum tolerated dose to the optimal biological dose (OBD), optimizing the benefit-risk tradeoff. One approach recommended by the FDA's guidance is to conduct randomized trials comparing multiple doses. In this paper, using the selection design framework, we propose a Randomized Optimal SElection (ROSE) design, which minimizes the sample size while ensuring the probability of correct selection of the OBD at pre-specified accuracy levels. The ROSE design is simple to implement, involving a straightforward comparison of the difference in response rates between two dose arms against a predetermined decision boundary. We further consider a two-stage ROSE design that allows for early selection of the OBD at the interim analysis when there is sufficient evidence, further reducing the sample size. Simulation studies demonstrate that the ROSE design exhibits desirable operating characteristics in correctly identifying the OBD. A sample size of 15-40 patients per dose arm typically yields a probability of correctly selecting the optimal dose ranging from 60% to 70%.
{"title":"Randomized optimal selection design for dose optimization.","authors":"Shuqi Wang, Ying Yuan, Suyu Liu","doi":"10.1093/biomtc/ujaf124","DOIUrl":"10.1093/biomtc/ujaf124","url":null,"abstract":"<p><p>The US Food and Drug Administration (FDA) launched Project Optimus to shift the objective of dose selection from the maximum tolerated dose to the optimal biological dose (OBD), optimizing the benefit-risk tradeoff. One approach recommended by the FDA's guidance is to conduct randomized trials comparing multiple doses. In this paper, using the selection design framework, we propose a Randomized Optimal SElection (ROSE) design, which minimizes sample size while ensuring the probability of correct selection of the OBD at pre-specified accuracy levels. The ROSE design is simple to implement, involving a straightforward comparison of the difference in response rates between two dose arms against a predetermined decision boundary. We further consider a two-stage ROSE design that allows for early selection of the OBD at the interim when there is sufficient evidence, further reducing the sample size. Simulation studies demonstrate that the ROSE design exhibits desirable operating characteristics in correctly identifying the OBD. A sample size of 15-40 patients per dosage arm typically results in a percentage of correct selection of the optimal dose ranging from 60% to 70%.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 4","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12505323/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145249541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We develop a semiparametric Gaussian Mixture Model (SGMM) for unsupervised learning that takes valuable spatial information into consideration. Specifically, we assume for each instance a random location. Then, conditional on this random location, we assume for the feature vector a standard Gaussian Mixture Model (GMM). The proposed SGMM allows the mixing probability to depend nonparametrically on the spatial location. Compared with a classical GMM, the SGMM is considerably more flexible and allows instances from the same class to be spatially clustered. To estimate the SGMM, novel EM algorithms are developed and rigorous asymptotic theories are established. Extensive numerical simulations are conducted to demonstrate its finite-sample performance. For a real application, we apply the SGMM method to the CAMELYON16 dataset of whole-slide images for breast cancer detection, where it demonstrates outstanding clustering performance.
{"title":"A semiparametric Gaussian Mixture Model with spatial dependence and its application to whole-slide image clustering analysis.","authors":"Baichen Yu, Jin Liu, Hansheng Wang","doi":"10.1093/biomtc/ujaf149","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf149","url":null,"abstract":"<p><p>We develop here a semiparametric Gaussian Mixture Model (SGMM) for unsupervised learning with valuable spatial information taken into consideration. Specifically, we assume for each instance a random location. Then, conditional on this random location, we assume for the feature vector a standard Gaussian Mixture Model (GMM). The proposed SGMM allows the mixing probability to be nonparametrically related to the spatial location. Compared with a classical GMM, SGMM is considerably more flexible and allows the instances from the same class to be spatially clustered. To estimate the SGMM, novel EM algorithms are developed and rigorous asymptotic theories are established. Extensive numerical simulations are conducted to demonstrate our finite sample performance. For a real application, we apply our SGMM method to the CAMELYON16 dataset of whole-slide images for breast cancer detection. The SGMM method demonstrates outstanding clustering performance.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 4","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145653431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High-dimensional error-prone survival data are prevalent in biomedical studies, where numerous clinical or genetic variables are collected for risk assessment. The presence of measurement errors in covariates complicates parameter estimation and variable selection, leading to non-convex optimization challenges. We propose an error-in-variables additive hazards regression model for high-dimensional noisy survival data. By employing the nearest positive semi-definite matrix projection, we develop a fast Lasso approach (semi-definite projection Lasso, SPLasso) and its soft-thresholding variant (SPLasso-T), both with theoretical guarantees. Under mild assumptions, we establish model selection consistency, oracle inequalities, and limiting distributions for these methods. Simulation studies and two real data applications demonstrate the methods' efficiency in handling high-dimensional data; in particular, they perform well in scenarios with missing values, highlighting their robustness and practical utility in complex biomedical settings.
{"title":"SPLasso for high-dimensional additive hazards regression with covariate measurement error.","authors":"Jiarui Zhang, Hongsheng Liu, Xin Chen, Jinfeng Xu","doi":"10.1093/biomtc/ujaf130","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf130","url":null,"abstract":"<p><p>High-dimensional error-prone survival data are prevalent in biomedical studies, where numerous clinical or genetic variables are collected for risk assessment. The presence of measurement errors in covariates complicates parameter estimation and variable selection, leading to non-convex optimization challenges. We propose an error-in-variables additive hazards regression model for high-dimensional noisy survival data. By employing the nearest positive semi-definite matrix projection, we develop a fast Lasso approach (semi-definite projection Lasso, SPLasso) and its soft thresholding variant (SPLasso-T), both with theoretical guarantees. Under mild assumptions, we establish model selection consistency, oracle inequalities, and limiting distributions for these methods. Simulation studies and two real data applications demonstrate the methods' superior efficiency in handling high-dimensional data, particularly showcasing remarkable performance in scenarios with missing values, highlighting their robustness and practical utility in complex biomedical settings.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 4","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145273319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
There has been substantial progress in predictive modeling of cognitive impairment in neurodegenerative disorders such as Alzheimer's disease (AD) based on neuroimaging biomarkers. However, existing approaches typically do not incorporate heterogeneity that may arise from interactions between spatially varying imaging features and supplementary demographic, clinical, and genetic risk factors in AD. Ignoring such heterogeneity may result in poor prediction and biased estimation. Building on the existing scalar-on-image regression framework, we address this issue by incorporating spatially varying interactions between the brain image and supplementary risk factors to model cognitive impairment in AD. The proposed Bayesian method tackles spatial interactions via a hierarchical representation of the functional regression coefficients that depends on supplementary risk factors, embedded in a scalar-on-function framework involving a multi-resolution wavelet decomposition. To address the curse of dimensionality, we induce simultaneous sparsity and clustering via a spike-and-slab mixture prior, where the slab component is characterized by a latent class distribution. We develop an efficient Markov chain Monte Carlo algorithm for posterior computation. Extensive simulations and an application to the longitudinal Alzheimer's Disease Neuroimaging Initiative study illustrate significantly improved prediction of cognitive impairment in AD across multiple visits by our model in comparison with alternative approaches. The proposed approach also identifies key brain regions in AD that exhibit significant associations with cognitive abilities, either directly or through interactions with risk factors.
{"title":"Bayesian scalar-on-image regression with spatial interactions for modeling Alzheimer's disease.","authors":"Nilanjana Chakraborty, Qi Long, Suprateek Kundu","doi":"10.1093/biomtc/ujaf144","DOIUrl":"10.1093/biomtc/ujaf144","url":null,"abstract":"<p><p>There has been substantial progress in predictive modeling for cognitive impairment in neurodegenerative disorders such as Alzheimer's disease (AD), based on neuroimaging biomarkers. However, existing approaches typically do not incorporate heterogeneity that may potentially arise due to interactions between the spatially varying imaging features and supplementary demographic, clinical and genetic risk factors in AD. Unfortunately, ignoring such heterogeneity may potentially result in poor prediction and biased estimation. Building on existing scalar-on-image regression framework, we address this issue by incorporating spatially varying interactions between brain image and supplementary risk factors to model cognitive impairment in AD. The proposed Bayesian method tackles spatial interactions via hierarchical representation for the functional regression coefficients depending on supplementary risk factors, which is embedded in a scalar-on-function framework involving a multi-resolution wavelet decomposition. To address the curse of dimensionality, we induce simultaneous sparsity and clustering via a spike and slab mixture prior, where the slab component is characterized by a latent class distribution. We develop an efficient Markov chain Monte Carlo algorithm for posterior computation. Extensive simulations and application to the longitudinal Alzheimer's Disease Neuroimaging Initiative study illustrate significantly improved prediction of cognitive impairment in AD across multiple visits by our model in comparison with alternate approaches. The proposed approach also identifies key brain regions in AD that exhibit significant association with cognitive abilities, either directly or through interactions with risk factors.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 4","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12613162/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145501754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Latent factor models that integrate data from multiple sources/studies or modalities have garnered considerable attention across various disciplines. However, existing methods predominantly focus on either multi-study integration or multi-modality integration, rendering them insufficient for analyzing the diverse modalities measured across multiple studies. To address this limitation and cater to practical needs, we introduce a high-dimensional generalized factor model that seamlessly integrates multi-modality data from multiple studies while also accommodating additional covariates. We conduct a thorough investigation of the identifiability conditions to enhance the model's interpretability. To tackle the complexity of high-dimensional nonlinear integration caused by four large latent random matrices, we approximate the observed log-likelihood with a variational lower bound based on a variational posterior distribution. By profiling the variational parameters, we establish the asymptotic properties of the estimators of the model parameters using M-estimation theory. Furthermore, we devise a computationally efficient variational expectation-maximization (EM) algorithm to execute the estimation process, together with a criterion to determine the optimal number of both study-shared and study-specific factors. Extensive simulation studies and a real-world application show that the proposed method significantly outperforms existing methods in terms of estimation accuracy and computational efficiency.
{"title":"High-dimensional multi-study multi-modality covariate-augmented generalized factor model.","authors":"Wei Liu, Qingzhi Zhong","doi":"10.1093/biomtc/ujaf107","DOIUrl":"10.1093/biomtc/ujaf107","url":null,"abstract":"<p><p>Latent factor models that integrate data from multiple sources/studies or modalities have garnered considerable attention across various disciplines. However, existing methods predominantly focus either on multi-study integration or multi-modality integration, rendering them insufficient for analyzing the diverse modalities measured across multiple studies. To address this limitation and cater to practical needs, we introduce a high-dimensional generalized factor model that seamlessly integrates multi-modality data from multiple studies, while also accommodating additional covariates. We conduct a thorough investigation of the identifiability conditions to enhance the model's interpretability. To tackle the complexity of high-dimensional nonlinear integration caused by 4 large latent random matrices, we utilize a variational lower bound to approximate the observed log-likelihood by employing a variational posterior distribution. By profiling the variational parameters, we establish the asymptotical properties of estimators for model parameters using M-estimation theory. Furthermore, we devise a computationally efficient variational expectation maximization (EM) algorithm to execute the estimation process and a criterion to determine the optimal number of both study-shared and study-specific factors. Extensive simulation studies and a real-world application show that the proposed method significantly outperforms existing methods in terms of estimation accuracy and computational efficiency.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144871261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An optimal experimental design is a structured data collection plan aimed at maximizing the amount of information gathered. Determining an optimal experimental design, however, relies on the assumption that a predetermined model structure relating the response and covariates is known a priori. In practical scenarios, such as dose-response modeling, the form of the model representing the "true" relationship is frequently unknown, although there exists a finite set, or pool, of potential alternative models. Designing experiments based on a single model from this set may lead to inefficiency or inadequacy if the "true" model differs from that assumed when calculating the design. One approach to minimizing the impact of model uncertainty on the experimental plan is known as model robust design. In this context, we systematically address the challenge of finding approximate optimal model robust experimental designs. Our focus is on locally optimal designs, thus allowing some of the models in the pool to be nonlinear. We present three Semidefinite Programming-based formulations, each aligned with one of the classes of model robustness criteria introduced by Läuter. These formulations exploit the semidefinite representability of the robustness criteria, leading to the representation of the robust problem as a semidefinite program. To ensure comparability of information measures across the various models, we employ standardized designs. To illustrate the application of our approach, we consider a dose-response study in which, initially, seven models were postulated as potential candidates to describe the dose-response relationship.
{"title":"Model robust designs for dose-response models.","authors":"Belmiro P M Duarte, Anthony C Atkinson, Nuno M C Oliveira","doi":"10.1093/biomtc/ujaf112","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf112","url":null,"abstract":"<p><p>An optimal experimental design is a structured data collection plan aimed at maximizing the amount of information gathered. Determining an optimal experimental design, however, relies on the assumption that a predetermined model structure, relating the response and covariates, is known a priori. In practical scenarios, such as dose-response modeling, the form of the model representing the \"true\" relationship is frequently unknown, although there exists a finite set or pool of potential alternative models. Designing experiments based on a single model from this set may lead to inefficiency or inadequacy if the \"true\" model differs from that assumed when calculating the design. One approach to minimize the impact of the uncertainty in the model on the experimental plan is known as model robust design. In this context, we systematically address the challenge of finding approximate optimal model robust experimental designs. Our focus is on locally optimal designs, so allowing some of the models in the pool to be nonlinear. We present three Semidefinite Programming-based formulations, each aligned with one of the classes of model robustness criteria introduced by Läuter. These formulations exploit the semidefinite representability of the robustness criteria, leading to the representation of the robust problem as a semidefinite program. To ensure comparability of information measures across various models, we employ standardized designs. To illustrate the application of our approach, we consider a dose-response study where, initially, seven models were postulated as potential candidates to describe the dose-response relationship.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144941118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}