Biometrics: Latest Articles, Page 6

Cumulative link mixed-effects models in the service of remote sensing crop progress monitoring.

IF 1.4, Zone 4 (Mathematics), Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae137

Ioannis Oikonomidis, Samis Trevezas

This study introduces an innovative cumulative link modeling (CLM) approach to monitor crop progress over large areas using remote sensing data. Two distinct models are developed, a fixed-effects CLM and a mixed-effects one that incorporates annual random effects to capture the inherent inter-seasonal variability. Inference is based on partial-likelihood with two law variations, the standard CLM based on the multinomial distribution and a novel one based on the product binomial distribution. Model performance is evaluated on eight crops, namely corn, oats, sorghum, soybeans, winter wheat, alfalfa, dry beans, and millet, using in-situ data from Nebraska, USA, spanning 20 years. The models utilize the predictive attributes of calendar time, thermal time, and the normalized difference vegetation index. The results demonstrate the wide applicability of this approach to different crops, providing large-scale predictions of crop progress and allowing the estimation of important agronomic parameters. To facilitate reproducibility, an ecosystem of R packages has been developed and made publicly accessible under the name Ages of Man. The packages can be utilized to implement the presented methodology in any area with this type of data, including the USA.
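
As a rough illustration of the cumulative link idea described in this abstract, the sketch below computes ordinal-stage probabilities under a logit cumulative link model, with an optional additive yearly random effect. The thresholds, linear predictor value, random-effect value, and function name are hypothetical; this is not the authors' R packages or their partial-likelihood inference, only the basic link structure.

```python
import math

def stage_probabilities(thresholds, eta, year_effect=0.0):
    """Category probabilities for an ordinal response under a logit cumulative link.

    Assumes P(Y <= k) = logistic(thresholds[k] - eta - year_effect),
    with thresholds given in increasing order.
    """
    def logistic(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Cumulative probabilities for the first K-1 stages; the last stage closes at 1.
    cumulative = [logistic(t - eta - year_effect) for t in thresholds] + [1.0]
    # Stage probabilities are successive differences of the cumulative ones.
    probs = [cumulative[0]]
    probs += [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]
    return probs

# Hypothetical example: four progress stages and a linear predictor built
# from covariates such as calendar time, thermal time, or NDVI.
thresholds = [-1.0, 0.5, 2.0]
probs_fixed = stage_probabilities(thresholds, eta=0.8)                   # fixed effects only
probs_mixed = stage_probabilities(thresholds, eta=0.8, year_effect=0.3)  # with a year effect
```

In this parameterization a positive year effect shifts probability mass toward later stages, which is one simple way to picture inter-seasonal variability.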

Citations: 0

Time-dependent prognostic accuracy measures for recurrent event data.

IF 1.4, Zone 4 (Mathematics), Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae150

R Dey, D E Schaubel, J A Hanley, P Saha-Chaudhuri

In many clinical contexts, the event of interest could occur multiple times for the same patient. Considerable progress has been made in developing recurrent event models that are based on or use biomarker information. However, less attention has been given to evaluating the prognostic accuracy of a biomarker or a composite score obtained from a fitted recurrent event-rate model. In this manuscript, we propose novel measures to characterize the prognostic accuracy of a marker measured at baseline in the presence of recurrent events. The proposed estimators are based on a semiparametric frailty model that accounts for the informativeness of a marker and unobserved heterogeneity among patients with respect to the rate of event occurrence. We investigate the asymptotic properties of the proposed accuracy estimators and demonstrate these estimators' finite sample performance through simulation studies. The proposed estimators have minimal bias and appropriate coverage. The estimators are applied to evaluate the performance of a baseline forced expiratory volume, a measure of lung capacity, for repeated episodes of pulmonary exacerbations in patients with cystic fibrosis.

Citations: 0

Structured feature ranking for genomic marker identification accommodating multiple types of networks.

IF 1.4, Zone 4 (Mathematics), Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae158

Yeheng Ge, Tao Li, Xingdong Feng, Mengyun Wu, Hailong Liu

Numerous statistical methods have been developed to search for genomic markers associated with the development, progression, and response to treatment of complex diseases. Among them, feature ranking plays a vital role due to its intuitive formulation and computational efficiency. However, most of the existing methods are based on the marginal importance of molecular predictors and share the limitation that the dependence (network) structures among predictors are not well accommodated, even though a disease phenotype usually reflects various biological processes that interact in a complex network. In this paper, we propose a structured feature ranking method for identifying genomic markers, where such network structures are effectively accommodated using Laplacian regularization. The proposed method investigates multiple network scenarios, where the networks can be either known a priori or estimated from the data. In addition, we rigorously explore the noise and uncertainty in the networks and control their impacts with proper selection of tuning parameters. These characteristics give the proposed method especially broad applicability. The theoretical results of our proposal are rigorously established. Compared to the original marginal measure, the proposed network structured measure can achieve sure screening properties with a faster convergence rate under mild conditions. Extensive simulations and analysis of The Cancer Genome Atlas melanoma data demonstrate the improvement of finite sample performance and practical usefulness of the proposed method.
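
To make the role of Laplacian regularization in ranking more concrete, here is a minimal generic sketch: marginal scores s are smoothed over a feature network by minimizing sum_i (b_i - s_i)^2 + lam * sum_{i<j} a_ij (b_i - b_j)^2, and features are then ranked by the smoothed scores. This is a textbook form of graph Laplacian smoothing, not the authors' measure; the function name, the scores, the adjacency matrix, and the value of lam are all hypothetical.

```python
def laplacian_smoothed_scores(scores, adjacency, lam=1.0, n_iter=200):
    """Solve (I + lam * L) b = scores by Jacobi iteration, where L = D - A.

    The system is strictly diagonally dominant for lam >= 0 and a
    nonnegative adjacency matrix, so the Jacobi iteration converges.
    """
    p = len(scores)
    degree = [sum(adjacency[i]) for i in range(p)]
    b = list(scores)
    for _ in range(n_iter):
        # Each update pulls b_i toward its own score and its neighbours' values.
        b = [
            (scores[i] + lam * sum(adjacency[i][j] * b[j] for j in range(p)))
            / (1.0 + lam * degree[i])
            for i in range(p)
        ]
    return b

# Hypothetical example: features 0 and 1 are linked, feature 2 is isolated.
scores = [1.0, 0.0, 0.5]
adjacency = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
smoothed = laplacian_smoothed_scores(scores, adjacency, lam=1.0)
# Rank features from largest to smallest absolute smoothed score.
ranking = sorted(range(len(smoothed)), key=lambda i: -abs(smoothed[i]))
```

In this toy case the weak feature 1 borrows strength from its strong neighbour 0, while the isolated feature 2 keeps its marginal score unchanged.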

Citations: 0

Semiparametric sensitivity analysis: unmeasured confounding in observational studies.

IF 1.4, Zone 4 (Mathematics), Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae106

Razieh Nabi, Matteo Bonvini, Edward H Kennedy, Ming-Yueh Huang, Marcela Smid, Daniel O Scharfstein

Establishing cause-effect relationships from observational data often relies on untestable assumptions. It is crucial to know whether, and to what extent, the conclusions drawn from non-experimental studies are robust to potential unmeasured confounding. In this paper, we focus on the average causal effect (ACE) as our target of inference. We generalize the sensitivity analysis approach developed by Robins et al., Franks et al., and Zhou and Yao. We use semiparametric theory to derive the non-parametric efficient influence function of the ACE, for fixed sensitivity parameters. We use this influence function to construct a one-step, split sample, truncated estimator of the ACE. Our estimator depends on semiparametric models for the distribution of the observed data; importantly, these models do not impose any restrictions on the values of sensitivity analysis parameters. We establish sufficient conditions ensuring that our estimator has $\sqrt{n}$ asymptotics. We use our methodology to evaluate the causal effect of smoking during pregnancy on birth weight. We also evaluate the performance of the estimation procedure in a simulation study.

Citations: 0

Sensitivity analysis for studies transporting prediction models.

IF 1.4, Zone 4 (Mathematics), Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae129

Jon A Steingrimsson, Sarah E Robertson, Sarah Voter, Issa J Dahabreh

We consider estimation of measures of model performance in a target population when covariate and outcome data are available from a source population and covariate data, but not outcome data, are available from the target population. In this setting, identification of measures of model performance is possible under an untestable assumption that the outcome and population (source or target) are independent conditional on covariates. In practice, this assumption is uncertain and, in some cases, controversial. Therefore, sensitivity analysis may be useful for examining the impact of assumption violations on inferences about model performance. Here, we propose an exponential tilt sensitivity analysis model and develop statistical methods to determine how measures of model performance are affected by violations of the assumption of conditional independence between outcome and population. We provide identification results and estimators for the risk in the target population under the sensitivity analysis model, examine the large-sample properties of the estimators, and apply them to data on lung cancer screening.
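
The exponential tilt idea can be sketched for a binary outcome as follows. The assumption here is the simplest tilt, p_target(y | x) proportional to exp(gamma * y) * p_source(y | x), with gamma = 0 recovering conditional independence between outcome and population. The function names, the Brier-type loss, and the numbers are hypothetical, and the paper's own estimators may differ in form.

```python
import math

def tilt_probability(p_source, gamma):
    """Tilt P(Y = 1 | X) from the source population toward the target population.

    Assumes p_target(y | x) is proportional to exp(gamma * y) * p_source(y | x).
    """
    weight = math.exp(gamma) * p_source
    return weight / (weight + (1.0 - p_source))

def target_brier_risk(source_probs, predictions, gamma):
    """Average squared-error risk of the predictions under the tilted outcome law.

    source_probs: P(Y = 1 | X) learned in the source population, evaluated
                  at covariate values sampled from the target population.
    predictions:  the transported model's predicted probabilities at those values.
    """
    risks = []
    for p, g in zip(source_probs, predictions):
        pt = tilt_probability(p, gamma)
        # Expected (Y - g)^2 when Y is Bernoulli(pt).
        risks.append(pt * (1.0 - g) ** 2 + (1.0 - pt) * g ** 2)
    return sum(risks) / len(risks)

# Hypothetical sensitivity curve: risk as a function of the tilt parameter.
source_probs = [0.2, 0.5, 0.7]
predictions = [0.25, 0.45, 0.65]
curve = {gamma: target_brier_risk(source_probs, predictions, gamma)
         for gamma in (-0.5, 0.0, 0.5)}
```

Reporting the risk over a grid of gamma values, rather than at a single value, is what turns the untestable assumption into a sensitivity analysis.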

Citations: 0

Dynamic factor analysis with dependent Gaussian processes for high-dimensional gene expression trajectories.

IF 1.4, Zone 4 (Mathematics), Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae131

Jiachen Cai, Robert J B Goudie, Colin Starr, Brian D M Tom

The increasing availability of high-dimensional, longitudinal measures of gene expression can facilitate understanding of biological mechanisms, as required for precision medicine. Biological knowledge suggests that it may be best to describe complex diseases at the level of underlying pathways, which may interact with one another. We propose a Bayesian approach that allows for characterizing such correlation among different pathways through dependent Gaussian processes (DGP) and mapping the observed high-dimensional gene expression trajectories into unobserved low-dimensional pathway expression trajectories via Bayesian sparse factor analysis. Our proposal is the first attempt to relax the classical assumption of independent factors for longitudinal data and has demonstrated a superior performance in recovering the shape of pathway expression trajectories, revealing the relationships between genes and pathways, and predicting gene expressions (closer point estimates and narrower predictive intervals), as demonstrated through simulations and real data analysis. To fit the model, we propose a Monte Carlo expectation maximization (MCEM) scheme that can be implemented conveniently by combining a standard Markov Chain Monte Carlo sampler and the R package GPFDA, which returns the maximum likelihood estimates of DGP hyperparameters. The modular structure of MCEM makes it generalizable to other complex models involving the DGP model component. Our R package DGP4LCF that implements the proposed approach is available on the Comprehensive R Archive Network (CRAN).

Citations: 0

Bayesian network-guided sparse regression with flexible varying effects.

IF 1.4, Zone 4 (Mathematics), Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae111

Yangfan Ren, Christine B Peterson, Marina Vannucci

In this paper, we propose Varying Effects Regression with Graph Estimation (VERGE), a novel Bayesian method for feature selection in regression. Our model has key aspects that allow it to leverage the complex structure of data sets arising from genomics or imaging studies. We distinguish between the predictors, which are the features utilized in the outcome prediction model, and the subject-level covariates, which modulate the effects of the predictors on the outcome. We construct a varying coefficients modeling framework where we infer a network among the predictor variables and utilize this network information to encourage the selection of related predictors. We employ variable selection spike-and-slab priors that enable the selection of both network-linked predictor variables and covariates that modify the predictor effects. We demonstrate through simulation studies that our method outperforms existing alternative methods in terms of both feature selection and predictive accuracy. We illustrate VERGE with an application to characterizing the influence of gut microbiome features on obesity, where we identify a set of microbial taxa and their ecological dependence relations. We allow subject-level covariates, including sex and dietary intake variables, to modify the coefficients of the microbiome predictors, providing additional insight into the interplay between these factors.

Citations: 0

Bayesian pathway analysis over brain network mediators for survival data.

IF 1.4, Zone 4 (Mathematics), Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae132

Xinyuan Tian, Fan Li, Li Shen, Denise Esserman, Yize Zhao

Technological advancements in noninvasive imaging facilitate the construction of whole brain interconnected networks, known as brain connectivity. Existing approaches to analyze brain connectivity frequently disaggregate the entire network into a vector of unique edges or summary measures, leading to a substantial loss of information. Motivated by the need to explore the effect mechanism among genetic exposure, brain connectivity, and time to disease onset with maximum information extraction, we propose a Bayesian approach to model the effect pathway between each of these components while quantifying the mediating role of brain networks. To accommodate the biological architectures of brain connectivity constructed along white matter fiber tracts, we develop a structural model which includes a symmetric matrix-variate accelerated failure time model for disease onset and a symmetric matrix response regression for the network-variate mediator. We further impose within-graph sparsity and between-graph shrinkage to identify informative network configurations and eliminate the interference of noisy components. Simulations are carried out to confirm the advantages of our proposed method over existing alternatives. By applying the proposed method to the landmark Alzheimer's Disease Neuroimaging Initiative study, we obtain neurobiologically plausible insights that may inform future intervention strategies.

Citations: 0

Joint mirror procedure: controlling false discovery rate for identifying simultaneous signals.

IF 1.4 · Zone 4 (Mathematics) · Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae142

Linsui Deng, Kejun He, Xianyang Zhang

In many applications, the process of identifying a specific feature of interest often involves testing multiple hypotheses for their joint statistical significance. Examples include mediation analysis, which simultaneously examines the existence of the exposure-mediator and the mediator-outcome effects, and replicability analysis, aiming to identify simultaneous signals that exhibit statistical significance across multiple independent studies. In this work, we present a new approach called the joint mirror (JM) procedure that effectively detects such features while maintaining false discovery rate (FDR) control in finite samples. The JM procedure employs an iterative method that gradually shrinks the rejection region based on progressively revealed information until a conservative estimate of the false discovery proportion is below the target FDR level. Additionally, we introduce a more stringent error measure known as the composite FDR (cFDR), which assigns weights to each false discovery based on its number of null components. We use the leave-one-out technique to prove that the JM procedure controls the cFDR in finite samples. To implement the JM procedure, we propose an efficient algorithm that can incorporate partial ordering information. Through extensive simulations, we show that our procedure effectively controls the cFDR and enhances statistical power across various scenarios, including the case that test statistics are dependent across the features. Finally, we showcase the utility of our method by applying it to real-world mediation and replicability analyses.
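
To make the "simultaneous signals" setting concrete, here is a minimal sketch of a classical baseline, not the JM procedure itself: for each feature, take the maximum of its two p-values (a valid p-value for the composite null that at least one component is null) and apply the Benjamini-Hochberg step-up rule. The function names `benjamini_hochberg` and `max_p_simultaneous` and the toy p-values are assumptions made for illustration; the baseline is known to be conservative, which is the kind of power loss procedures such as JM aim to reduce.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float) -> List[int]:
    """Return the sorted indices rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    # Indices ordered from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff_rank])


def max_p_simultaneous(
    p1: Sequence[float], p2: Sequence[float], alpha: float
) -> List[int]:
    """Baseline (hypothetical helper): BH applied to per-feature max p-values."""
    if len(p1) != len(p2):
        raise ValueError("p1 and p2 must have the same length")
    # A feature is a simultaneous signal only if BOTH components are non-null,
    # so the larger of the two p-values is the relevant evidence.
    p_max = [max(a, b) for a, b in zip(p1, p2)]
    return benjamini_hochberg(p_max, alpha)


# Toy example: features 0 and 3 are small in both studies.
p_study1 = [0.001, 0.002, 0.5, 0.03]
p_study2 = [0.004, 0.6, 0.001, 0.02]
print(max_p_simultaneous(p_study1, p_study2, alpha=0.1))  # [0, 3]
```

Features 1 and 2 each have one very small p-value but are not rejected, because a simultaneous signal requires evidence in both components.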

Volume 80, Issue 4 · Journal Article · PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639532/pdf/

Citations: 0

Optimal adaptive SMART designs with binary outcomes.

IF 1.4 · Zone 4 (Mathematics) · Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae140

Rik Ghosh, Bibhas Chakraborty, Inbal Nahum-Shani, Megan E Patrick, Palash Ghosh

In a sequential multiple-assignment randomized trial (SMART), a sequence of treatments is given to a patient over multiple stages. In each stage, randomization may be done to allocate patients to different treatment groups. Even though SMART designs are getting popular among clinical researchers, the methodologies for adaptive randomization at different stages of a SMART are few and not sophisticated enough to handle the complexity of optimal allocation of treatments at every stage of a trial. Lack of optimal allocation methodologies can raise critical concerns about SMART designs from an ethical point of view. In this work, we develop an optimal adaptive allocation procedure using a constrained optimization that minimizes the total expected number of treatment failures for a SMART with a binary primary outcome, subject to a fixed asymptotic variance of a predefined objective function. Issues related to optimal adaptive allocations are explored theoretically with supporting simulations. The applicability of the proposed methodology is demonstrated using a recently conducted SMART study named M-bridge for developing universal and resource-efficient dynamic treatment regimes for incoming first-year college students as a bridge to desirable treatments to address alcohol-related risks.
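
The optimization described in the abstract has a well-known single-stage analogue, sketched below as an illustration only; it is not the SMART procedure of the paper. For a two-arm trial with success probabilities p_A and p_B, minimizing the expected number of failures while holding the variance of the estimated difference in proportions fixed gives an allocation ratio n_A/n_B = sqrt(p_A/p_B). The function names and the numeric inputs are assumptions chosen for the example.

```python
import math


def sqrt_rule_allocation(p_a: float, p_b: float) -> float:
    """Proportion of patients assigned to arm A under the square-root rule."""
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("success probabilities must lie strictly in (0, 1)")
    root_a, root_b = math.sqrt(p_a), math.sqrt(p_b)
    return root_a / (root_a + root_b)


def failures_at_fixed_variance(
    rho_a: float, p_a: float, p_b: float, variance: float
) -> float:
    """Expected failures when the sample size is set to hit a target variance.

    rho_a is the proportion allocated to arm A; variance is the target
    variance of the estimated difference in success proportions.
    """
    q_a, q_b = 1.0 - p_a, 1.0 - p_b
    # Total sample size n solving p_a*q_a/(n*rho_a) + p_b*q_b/(n*(1-rho_a)) = variance.
    n_total = (p_a * q_a / rho_a + p_b * q_b / (1.0 - rho_a)) / variance
    # Expected failures: arm sizes times the per-arm failure probabilities.
    return n_total * (rho_a * q_a + (1.0 - rho_a) * q_b)


# Assumed illustrative values: arm A succeeds more often than arm B.
p_a, p_b, target_var = 0.64, 0.16, 0.01
rho_opt = sqrt_rule_allocation(p_a, p_b)
print(rho_opt)
print(failures_at_fixed_variance(rho_opt, p_a, p_b, target_var))
print(failures_at_fixed_variance(0.5, p_a, p_b, target_var))
```

With these inputs the rule sends about two thirds of patients to the better arm and yields fewer expected failures than equal allocation at the same precision. Extending the idea to multiple stages, where later randomizations depend on earlier responses, is the harder problem the paper addresses.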

Volume 80, Issue 4 · Journal Article · PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639531/pdf/

Citations: 0