This study introduces an innovative cumulative link modeling (CLM) approach to monitor crop progress over large areas using remote sensing data. Two distinct models are developed: a fixed-effects CLM and a mixed-effects one that incorporates annual random effects to capture the inherent inter-seasonal variability. Inference is based on partial likelihood with two distributional variants: the standard CLM based on the multinomial distribution and a novel one based on the product binomial distribution. Model performance is evaluated on eight crops, namely corn, oats, sorghum, soybeans, winter wheat, alfalfa, dry beans, and millet, using in-situ data from Nebraska, USA, spanning 20 years. The models use calendar time, thermal time, and the normalized difference vegetation index as predictors. The results demonstrate the wide applicability of this approach to different crops, providing large-scale predictions of crop progress and allowing the estimation of important agronomic parameters. To facilitate reproducibility, an ecosystem of R packages has been developed and made publicly accessible under the name Ages of Man. The packages can be used to implement the presented methodology in any area with this type of data, including the USA.
Ioannis Oikonomidis, Samis Trevezas. "Cumulative link mixed-effects models in the service of remote sensing crop progress monitoring." Biometrics 80(4), published 2024-10-03. https://doi.org/10.1093/biomtc/ujae137
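To make the cumulative link idea in the abstract above concrete, here is a minimal sketch of how ordered-stage probabilities follow from cumulative probabilities of the form P(Y <= j) = F(theta_j - eta). The logistic link, the threshold values, and the function name `cumulative_logit_probs` are assumptions chosen for illustration; the abstract does not state the link function, and this is not the Ages of Man implementation.

```python
import math


def cumulative_logit_probs(thresholds, eta):
    """Return P(Y = j), j = 0..J, for a cumulative logit model.

    thresholds: increasing cut-points theta_0 < ... < theta_{J-1} (illustrative values).
    eta: linear predictor, e.g. built from calendar time, thermal time and NDVI.
    """
    def logistic(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Cumulative probabilities P(Y <= j) = F(theta_j - eta); the last stage closes at 1.
    cum = [logistic(t - eta) for t in thresholds] + [1.0]
    # Stage probabilities are successive differences of the cumulative ones.
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


# Hypothetical example: four ordered progress stages, three thresholds.
early = cumulative_logit_probs([-1.0, 0.5, 2.0], eta=0.3)
late = cumulative_logit_probs([-1.0, 0.5, 2.0], eta=3.0)
print(early, late)
```

With this sign convention a larger linear predictor moves probability mass toward later stages, which matches the intuition that accumulated thermal time advances crop progress.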
In many clinical contexts, the event of interest could occur multiple times for the same patient. Considerable progress has been made in developing recurrent event models that are based on, or make use of, biomarker information. However, less attention has been given to evaluating the prognostic accuracy of a biomarker or a composite score obtained from a fitted recurrent event-rate model. In this manuscript, we propose novel measures to characterize the prognostic accuracy of a marker measured at baseline in the presence of recurrent events. The proposed estimators are based on a semiparametric frailty model that accounts for the informativeness of a marker and unobserved heterogeneity among patients with respect to the rate of event occurrence. We investigate the asymptotic properties of the proposed accuracy estimators and demonstrate these estimators' finite sample performance through simulation studies. The proposed estimators have minimal bias and appropriate coverage. The estimators are applied to evaluate the performance of a baseline forced expiratory volume, a measure of lung capacity, for repeated episodes of pulmonary exacerbations in patients with cystic fibrosis.
R Dey, D E Schaubel, J A Hanley, P Saha-Chaudhuri. "Time-dependent prognostic accuracy measures for recurrent event data." Biometrics 80(4), published 2024-10-03. DOI: 10.1093/biomtc/ujae150. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11669850/pdf/
Yeheng Ge, Tao Li, Xingdong Feng, Mengyun Wu, Hailong Liu
Numerous statistical methods have been developed to search for genomic markers associated with the development, progression, and response to treatment of complex diseases. Among them, feature ranking plays a vital role due to its intuitive formulation and computational efficiency. However, most of the existing methods are based on the marginal importance of molecular predictors and share the limitation that the dependence (network) structures among predictors are not well accommodated, even though a disease phenotype usually reflects various biological processes that interact in a complex network. In this paper, we propose a structured feature ranking method for identifying genomic markers, where such network structures are effectively accommodated using Laplacian regularization. The proposed method investigates multiple network scenarios, where the networks can be either known a priori or estimated from the data. In addition, we rigorously explore the noise and uncertainty in the networks and control their impacts with proper selection of tuning parameters. These characteristics give the proposed method especially broad applicability. The theoretical properties of the proposal are rigorously established. Compared to the original marginal measure, the proposed network structured measure can achieve sure screening properties with a faster convergence rate under mild conditions. Extensive simulations and analysis of The Cancer Genome Atlas melanoma data demonstrate the improvement of finite sample performance and practical usefulness of the proposed method.
Yeheng Ge, Tao Li, Xingdong Feng, Mengyun Wu, Hailong Liu. "Structured feature ranking for genomic marker identification accommodating multiple types of networks." Biometrics 80(4), published 2024-10-03. https://doi.org/10.1093/biomtc/ujae158
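The abstract above relies on Laplacian regularization to encode a predictor network. As a rough illustration of what such a penalty measures, the sketch below computes the quadratic form b^T L b for the graph Laplacian L = D - A and checks that it equals the weighted sum of squared differences across edges. The function names, the toy network, and the scores are hypothetical; how the paper embeds this penalty in its ranking measure is not shown here.

```python
def laplacian_penalty(scores, edges):
    """Quadratic form scores^T L scores, with L = D - A built from weighted edges.

    scores: one value per predictor (for example, a marginal importance score).
    edges: iterable of (i, j, weight) for an undirected network.
    """
    n = len(scores)
    lap = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        lap[i][i] += w  # degree contributions on the diagonal
        lap[j][j] += w
        lap[i][j] -= w  # negative adjacency off the diagonal
        lap[j][i] -= w
    return sum(scores[i] * lap[i][j] * scores[j] for i in range(n) for j in range(n))


def edgewise_penalty(scores, edges):
    """Equivalent form: sum over edges of weight * (score_i - score_j)^2."""
    return sum(w * (scores[i] - scores[j]) ** 2 for i, j, w in edges)


# Hypothetical three-gene network: genes 0 and 1 strongly linked, 1 and 2 weakly linked.
toy_edges = [(0, 1, 1.0), (1, 2, 0.5)]
toy_scores = [1.0, 1.0, 3.0]
print(laplacian_penalty(toy_scores, toy_edges), edgewise_penalty(toy_scores, toy_edges))
```

The penalty is small when connected predictors carry similar scores, which is why it encourages network-linked markers to be ranked together.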
Razieh Nabi, Matteo Bonvini, Edward H Kennedy, Ming-Yueh Huang, Marcela Smid, Daniel O Scharfstein
Establishing cause-effect relationships from observational data often relies on untestable assumptions. It is crucial to know whether, and to what extent, the conclusions drawn from non-experimental studies are robust to potential unmeasured confounding. In this paper, we focus on the average causal effect (ACE) as our target of inference. We generalize the sensitivity analysis approach developed by Robins et al., Franks et al., and Zhou and Yao. We use semiparametric theory to derive the non-parametric efficient influence function of the ACE, for fixed sensitivity parameters. We use this influence function to construct a one-step, split sample, truncated estimator of the ACE. Our estimator depends on semiparametric models for the distribution of the observed data; importantly, these models do not impose any restrictions on the values of sensitivity analysis parameters. We establish sufficient conditions ensuring that our estimator has $\sqrt{n}$ asymptotics. We use our methodology to evaluate the causal effect of smoking during pregnancy on birth weight. We also evaluate the performance of the estimation procedure in a simulation study.
Razieh Nabi, Matteo Bonvini, Edward H Kennedy, Ming-Yueh Huang, Marcela Smid, Daniel O Scharfstein. "Semiparametric sensitivity analysis: unmeasured confounding in observational studies." Biometrics 80(4), published 2024-10-03. https://doi.org/10.1093/biomtc/ujae106
Jon A Steingrimsson, Sarah E Robertson, Sarah Voter, Issa J Dahabreh
We consider estimation of measures of model performance in a target population when covariate and outcome data are available from a source population and covariate data, but not outcome data, are available from the target population. In this setting, identification of measures of model performance is possible under an untestable assumption that the outcome and population (source or target) are independent conditional on covariates. In practice, this assumption is uncertain and, in some cases, controversial. Therefore, sensitivity analysis may be useful for examining the impact of assumption violations on inferences about model performance. Here, we propose an exponential tilt sensitivity analysis model and develop statistical methods to determine how measures of model performance are affected by violations of the assumption of conditional independence between outcome and population. We provide identification results and estimators for the risk in the target population under the sensitivity analysis model, examine the large-sample properties of the estimators, and apply them to data on lung cancer screening.
Jon A Steingrimsson, Sarah E Robertson, Sarah Voter, Issa J Dahabreh. "Sensitivity analysis for studies transporting prediction models." Biometrics 80(4), published 2024-10-03. DOI: 10.1093/biomtc/ujae129. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11582396/pdf/
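For readers unfamiliar with exponential tilting, the following sketch shows the generic form for a binary outcome: the target-population outcome distribution is taken proportional to exp(gamma * y) times the source-population one, so gamma = 0 recovers conditional independence between outcome and population. The parameterization, the function names, and the squared-error loss are assumptions for illustration and may differ from the sensitivity model actually used in the paper.

```python
import math


def tilted_outcome_prob(p_source, gamma):
    """P(Y = 1 | x, target) when the target law is exp(gamma * y) times the source law.

    p_source: P(Y = 1 | x) in the source population.
    gamma: sensitivity parameter; 0 means no violation of conditional independence.
    """
    weighted = math.exp(gamma) * p_source
    return weighted / (weighted + (1.0 - p_source))


def tilted_squared_error(prediction, p_source, gamma):
    """Expected squared-error loss of a fixed prediction under the tilted target law."""
    p_target = tilted_outcome_prob(p_source, gamma)
    return p_target * (1.0 - prediction) ** 2 + (1.0 - p_target) * prediction ** 2


# Hypothetical sweep over the sensitivity parameter for one covariate pattern.
for g in (-1.0, 0.0, 1.0):
    print(g, tilted_outcome_prob(0.2, g), tilted_squared_error(0.2, 0.2, g))
```

Sweeping gamma over a plausible range and re-evaluating the loss is the basic pattern of a sensitivity analysis: it shows how much a performance measure would move if the untestable assumption failed by a given amount.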
Jiachen Cai, Robert J B Goudie, Colin Starr, Brian D M Tom
The increasing availability of high-dimensional, longitudinal measures of gene expression can facilitate understanding of biological mechanisms, as required for precision medicine. Biological knowledge suggests that it may be best to describe complex diseases at the level of underlying pathways, which may interact with one another. We propose a Bayesian approach that allows for characterizing such correlation among different pathways through dependent Gaussian processes (DGP) and mapping the observed high-dimensional gene expression trajectories into unobserved low-dimensional pathway expression trajectories via Bayesian sparse factor analysis. Our proposal is the first attempt to relax the classical assumption of independent factors for longitudinal data and has demonstrated a superior performance in recovering the shape of pathway expression trajectories, revealing the relationships between genes and pathways, and predicting gene expressions (closer point estimates and narrower predictive intervals), as demonstrated through simulations and real data analysis. To fit the model, we propose a Monte Carlo expectation maximization (MCEM) scheme that can be implemented conveniently by combining a standard Markov chain Monte Carlo sampler and an R package GPFDA, which returns the maximum likelihood estimates of DGP hyperparameters. The modular structure of MCEM makes it generalizable to other complex models involving the DGP model component. Our R package DGP4LCF that implements the proposed approach is available on the Comprehensive R Archive Network (CRAN).
Jiachen Cai, Robert J B Goudie, Colin Starr, Brian D M Tom. "Dynamic factor analysis with dependent Gaussian processes for high-dimensional gene expression trajectories." Biometrics 80(4), published 2024-10-03. https://doi.org/10.1093/biomtc/ujae131
Yangfan Ren, Christine B Peterson, Marina Vannucci
In this paper, we propose Varying Effects Regression with Graph Estimation (VERGE), a novel Bayesian method for feature selection in regression. Our model has key aspects that allow it to leverage the complex structure of data sets arising from genomics or imaging studies. We distinguish between the predictors, which are the features utilized in the outcome prediction model, and the subject-level covariates, which modulate the effects of the predictors on the outcome. We construct a varying coefficients modeling framework where we infer a network among the predictor variables and utilize this network information to encourage the selection of related predictors. We employ variable selection spike-and-slab priors that enable the selection of both network-linked predictor variables and covariates that modify the predictor effects. We demonstrate through simulation studies that our method outperforms existing alternative methods in terms of both feature selection and predictive accuracy. We illustrate VERGE with an application to characterizing the influence of gut microbiome features on obesity, where we identify a set of microbial taxa and their ecological dependence relations. We allow subject-level covariates, including sex and dietary intake variables, to modify the coefficients of the microbiome predictors, providing additional insight into the interplay between these factors.
Yangfan Ren, Christine B Peterson, Marina Vannucci. "Bayesian network-guided sparse regression with flexible varying effects." Biometrics 80(4), published 2024-10-03. https://doi.org/10.1093/biomtc/ujae111
Xinyuan Tian, Fan Li, Li Shen, Denise Esserman, Yize Zhao
Technological advancements in noninvasive imaging facilitate the construction of whole brain interconnected networks, known as brain connectivity. Existing approaches to analyze brain connectivity frequently disaggregate the entire network into a vector of unique edges or summary measures, leading to a substantial loss of information. Motivated by the need to explore the effect mechanism among genetic exposure, brain connectivity, and time to disease onset with maximum information extraction, we propose a Bayesian approach to model the effect pathway between each of these components while quantifying the mediating role of brain networks. To accommodate the biological architectures of brain connectivity constructed along white matter fiber tracts, we develop a structural model which includes a symmetric matrix-variate accelerated failure time model for disease onset and a symmetric matrix response regression for the network-variate mediator. We further impose within-graph sparsity and between-graph shrinkage to identify informative network configurations and eliminate the interference of noisy components. Simulations are carried out to confirm the advantages of our proposed method over existing alternatives. By applying the proposed method to the landmark Alzheimer's Disease Neuroimaging Initiative study, we obtain neurobiologically plausible insights that may inform future intervention strategies.
Xinyuan Tian, Fan Li, Li Shen, Denise Esserman, Yize Zhao. "Bayesian pathway analysis over brain network mediators for survival data." Biometrics 80(4), published 2024-10-03. DOI: 10.1093/biomtc/ujae132. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11555425/pdf/
In many applications, the process of identifying a specific feature of interest often involves testing multiple hypotheses for their joint statistical significance. Examples include mediation analysis, which simultaneously examines the existence of the exposure-mediator and the mediator-outcome effects, and replicability analysis, aiming to identify simultaneous signals that exhibit statistical significance across multiple independent studies. In this work, we present a new approach called the joint mirror (JM) procedure that effectively detects such features while maintaining false discovery rate (FDR) control in finite samples. The JM procedure employs an iterative method that gradually shrinks the rejection region based on progressively revealed information until a conservative estimate of the false discovery proportion is below the target FDR level. Additionally, we introduce a more stringent error measure known as the composite FDR (cFDR), which assigns weights to each false discovery based on its number of null components. We use the leave-one-out technique to prove that the JM procedure controls the cFDR in finite samples. To implement the JM procedure, we propose an efficient algorithm that can incorporate partial ordering information. Through extensive simulations, we show that our procedure effectively controls the cFDR and enhances statistical power across various scenarios, including the case that test statistics are dependent across the features. Finally, we showcase the utility of our method by applying it to real-world mediation and replicability analyses.
Linsui Deng, Kejun He, Xianyang Zhang. "Joint mirror procedure: controlling false discovery rate for identifying simultaneous signals." Biometrics 80(4), published 2024-10-03. DOI: 10.1093/biomtc/ujae142. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639532/pdf/
In a sequential multiple-assignment randomized trial (SMART), a sequence of treatments is given to a patient over multiple stages. At each stage, patients may be randomized to different treatment groups. Although SMART designs are becoming increasingly popular among clinical researchers, methodologies for adaptive randomization at different stages of a SMART are few and not sophisticated enough to handle the complexity of optimally allocating treatments at every stage of a trial. The lack of optimal allocation methodologies can raise critical ethical concerns about SMART designs. In this work, we develop an optimal adaptive allocation procedure using constrained optimization that minimizes the total expected number of treatment failures for a SMART with a binary primary outcome, subject to a fixed asymptotic variance of a predefined objective function. Issues related to optimal adaptive allocations are explored theoretically with supporting simulations. The applicability of the proposed methodology is demonstrated using a recently conducted SMART study named M-bridge, which aimed to develop universal and resource-efficient dynamic treatment regimes for incoming first-year college students as a bridge to desirable treatments to address alcohol-related risks.
{"title":"Optimal adaptive SMART designs with binary outcomes.","authors":"Rik Ghosh, Bibhas Chakraborty, Inbal Nahum-Shani, Megan E Patrick, Palash Ghosh","doi":"10.1093/biomtc/ujae140","DOIUrl":"10.1093/biomtc/ujae140","url":null,"PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639531/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142827001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}