Biometrics: Latest Articles

Composite dyadic models for spatio-temporal data.
IF 1.4, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2024-10-03, DOI: 10.1093/biomtc/ujae107
Michael R Schwob, Mevin B Hooten, Vagheesh Narasimhan

Mechanistic statistical models are commonly used to study the flow of biological processes. For example, in landscape genetics, the aim is to infer spatial mechanisms that govern gene flow in populations. Existing statistical approaches in landscape genetics do not account for temporal dependence in the data and may be computationally prohibitive. We infer mechanisms with a Bayesian hierarchical dyadic model that scales well with large data sets and that accounts for spatial and temporal dependence. We construct a fully connected network comprising spatio-temporal data for the dyadic model and use normalized composite likelihoods to account for the dependence structure in space and time. We develop a dyadic model to account for physical mechanisms commonly found in physical-statistical models and apply our methods to ancient human DNA data to infer the mechanisms that affected human movement in Bronze Age Europe.

Biometrics, vol. 80, no. 4
Citations: 0
Group sequential testing of a treatment effect using a surrogate marker.
IF 1.4, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2024-10-03, DOI: 10.1093/biomtc/ujae108
Layla Parast, Jay Bartroff

The identification of surrogate markers is motivated by their potential to make decisions sooner about a treatment effect. However, few methods have been developed to actually use a surrogate marker to test for a treatment effect in a future study. Most existing methods consider combining surrogate marker and primary outcome information to test for a treatment effect, rely on fully parametric methods where strict parametric assumptions are made about the relationship between the surrogate and the outcome, and/or assume the surrogate marker is measured at only a single time point. Recent work has proposed a nonparametric test for a treatment effect using only surrogate marker information measured at a single time point by borrowing information learned from a prior study where both the surrogate and primary outcome were measured. In this paper, we utilize this nonparametric test and propose group sequential procedures that allow for early stopping of treatment effect testing in a setting where the surrogate marker is measured repeatedly over time. We derive the properties of the correlated surrogate-based nonparametric test statistics at multiple time points and compute stopping boundaries that allow for early stopping for a significant treatment effect, or for futility. We examine the performance of our proposed test using a simulation study and illustrate the method using data from two distinct AIDS clinical trials.
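As a rough illustration of why the stopping boundaries mentioned in this abstract are needed, the sketch below estimates by simulation how often a test statistic crosses a boundary at any of several interim looks when there is no treatment effect. It assumes equally spaced looks and standard normal independent increments, which is the textbook group sequential setting and not the correlation structure of the surrogate-based statistics derived in the paper. The function name `crossing_probability` and the value 2.413 (the commonly tabulated Pocock constant for five looks at two-sided level 0.05) are illustrative choices.

```python
import random


def crossing_probability(boundaries, n_sim=20000, seed=0):
    """Monte Carlo estimate of P(|Z_k| >= boundaries[k-1] at some look k) under the null.

    Assumption: Z_k = S_k / sqrt(k), where S_k is a sum of k independent
    standard normal increments (equally spaced looks).
    """
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_sim):
        s = 0.0
        for k, c in enumerate(boundaries, start=1):
            s += rng.gauss(0.0, 1.0)   # information gained between looks
            z = s / k ** 0.5           # standardized statistic at look k
            if abs(z) >= c:            # early stop: boundary crossed
                crossings += 1
                break
    return crossings / n_sim


# Reusing the fixed-sample critical value at every look inflates the error rate;
# a wider constant boundary brings it back near the nominal level.
naive = crossing_probability([1.96] * 5)
pocock = crossing_probability([2.413] * 5)
print(f"naive 1.96 at 5 looks: {naive:.3f}; constant 2.413 at 5 looks: {pocock:.3f}")
```

The comparison shows the general design problem only: any group sequential procedure, including one built on surrogate markers, has to choose boundaries so that the overall crossing probability under the null stays at the intended level.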

Biometrics, vol. 80, no. 4. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11459368/pdf/
Citations: 0
How to achieve model-robust inference in stepped wedge trials with model-based methods?
IF 1.4, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2024-10-03, DOI: 10.1093/biomtc/ujae123
Bingkai Wang, Xueqi Wang, Fan Li

A stepped wedge design is a unidirectional crossover design where clusters are randomized to distinct treatment sequences. While model-based analysis of stepped wedge designs is a standard practice to evaluate treatment effects accounting for clustering and adjusting for covariates, their properties under misspecification have not been systematically explored. In this article, we focus on model-based methods, including linear mixed models and generalized estimating equations with an independence, simple exchangeable, or nested exchangeable working correlation structure. We study when a potentially misspecified working model can offer consistent estimation of the marginal treatment effect estimands, which are defined nonparametrically with potential outcomes and may be functions of calendar time and/or exposure time. We prove a central result that consistency for nonparametric estimands usually requires a correctly specified treatment effect structure, but generally not the remaining aspects of the working model (functional form of covariates, random effects, and error distribution), and valid inference is obtained via the sandwich variance estimator. Furthermore, an additional g-computation step is required to achieve model-robust inference under non-identity link functions or for ratio estimands. The theoretical results are illustrated via several simulation experiments and re-analysis of a completed stepped wedge cluster randomized trial.

Biometrics, vol. 80, no. 4. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11536888/pdf/
Citations: 0
Heterogeneity-aware integrative regression for ancestry-specific association studies.
IF 1.4, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2024-10-03, DOI: 10.1093/biomtc/ujae109
Aaron J Molstad, Yanwei Cai, Alexander P Reiner, Charles Kooperberg, Wei Sun, Li Hsu

Ancestry-specific proteome-wide association studies (PWAS) based on genetically predicted protein expression can reveal complex disease etiology specific to certain ancestral groups. These studies require ancestry-specific models for protein expression as a function of SNP genotypes. In order to improve protein expression prediction in ancestral populations historically underrepresented in genomic studies, we propose a new penalized maximum likelihood estimator for fitting ancestry-specific joint protein quantitative trait loci models. Our estimator borrows information across ancestral groups, while simultaneously allowing for heterogeneous error variances and regression coefficients. We propose an alternative parameterization of our model that makes the objective function convex and the penalty scale invariant. To improve computational efficiency, we propose an approximate version of our method and study its theoretical properties. Our method provides a substantial improvement in protein expression prediction accuracy in individuals of African ancestry, and in a downstream PWAS analysis, leads to the discovery of multiple associations between protein expression and blood lipid traits in the African ancestry population.

Biometrics, vol. 80, no. 4. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11492996/pdf/
Citations: 0
A new robust approach for the polytomous logistic regression model based on Rényi's pseudodistances.
IF 1.4, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2024-10-03, DOI: 10.1093/biomtc/ujae125
Elena Castilla

This paper presents a robust alternative to the maximum likelihood estimator (MLE) for the polytomous logistic regression model, known as the family of minimum Rényi pseudodistance (RP) estimators. The proposed minimum RP estimators are parametrized by a tuning parameter $\alpha \ge 0$ and include the MLE as a special case when $\alpha = 0$. These estimators, along with a family of RP-based Wald-type tests, are shown to exhibit superior performance in the presence of misclassification errors. The paper includes an extensive simulation study and a real data example to illustrate the robustness of these proposed statistics.

Biometrics, vol. 80, no. 4
Citations: 0
A causal inference framework for leveraging external controls in hybrid trials.
IF 1.4, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2024-10-03, DOI: 10.1093/biomtc/ujae095
Michael Valancius, Herbert Pang, Jiawen Zhu, Stephen R Cole, Michele Jonsson Funk, Michael R Kosorok

We consider the challenges associated with causal inference in settings where data from a randomized trial are augmented with control data from an external source to improve efficiency in estimating the average treatment effect (ATE). This question is motivated by the SUNFISH trial, which investigated the effect of risdiplam on motor function in patients with spinal muscular atrophy. While the original analysis used only data generated by the trial, we explore an alternative analysis incorporating external controls from the placebo arm of a historical trial. We cast the setting into a formal causal inference framework and show how these designs are characterized by a lack of full randomization to treatment and heightened dependency on modeling. To address this, we outline sufficient causal assumptions about the exchangeability between the internal and external controls to identify the ATE and establish a connection with novel graphical criteria. Furthermore, we propose estimators, review efficiency bounds, develop an approach for efficient doubly robust estimation even when unknown nuisance models are estimated with flexible machine learning methods, suggest model diagnostics, and demonstrate finite-sample performance of the methods through a simulation study. The ideas and methods are illustrated through their application to the SUNFISH trial, where we find that external controls can increase the efficiency of treatment effect estimation.

Biometrics, vol. 80, no. 4. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11546536/pdf/
Citations: 0
Causal effect estimation in survival analysis with high dimensional confounders.
IF 1.4, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2024-10-03, DOI: 10.1093/biomtc/ujae110
Fei Jiang, Ge Zhao, Rosa Rodriguez-Monguio, Yanyuan Ma

With the continuing advance of modern technologies, it has become increasingly common for the number of collected confounders to exceed the number of subjects in a data set. However, matching-based methods for estimating causal treatment effects in their original forms cannot handle high-dimensional confounders, and their various modified versions lack statistical support and valid inference tools. In this article, we propose a new approach for estimating the causal treatment effect, defined as the difference in restricted mean survival time (RMST) under different treatments, in a high-dimensional setting for survival data. We combine the factor model and sufficient dimension reduction techniques to construct a propensity score and a prognostic score. Based on these scores, we develop a kernel-based doubly robust estimator of the RMST difference. We demonstrate its link to matching and establish the consistency and asymptotic normality of the estimator. We illustrate our method by analyzing a dataset from a study aimed at comparing the effects of two alternative treatments on the RMST of patients with diffuse large B cell lymphoma.
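To make the estimand in this abstract concrete: the RMST up to a horizon tau is the area under the survival curve on [0, tau], and the treatment effect is the difference of that area between arms. The sketch below computes an unadjusted version from Kaplan-Meier curves. It is not the paper's kernel-based doubly robust estimator and makes no adjustment for confounders; the function names `km_rmst` and `rmst_difference` are hypothetical, and the curve is assumed to stay flat after the last observed time if that time falls before tau.

```python
def km_rmst(times, events, tau):
    """Area under the Kaplan-Meier curve on [0, tau].

    times  : follow-up times
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, area, last_t = 1.0, 0.0, 0.0
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        deaths = leaving = 0
        # Group all subjects who leave the risk set at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        area += surv * (t - last_t)             # rectangle under the current step
        if deaths:
            surv *= 1.0 - deaths / n_at_risk    # Kaplan-Meier product-limit update
        n_at_risk -= leaving
        last_t = t
    # Assumption: the curve is carried forward unchanged up to tau.
    return area + surv * (tau - last_t)


def rmst_difference(times1, events1, times0, events0, tau):
    """Unadjusted RMST difference between arm 1 and arm 0 (no confounder control)."""
    return km_rmst(times1, events1, tau) - km_rmst(times0, events0, tau)
```

For example, `rmst_difference(t_treated, e_treated, t_control, e_control, tau=24.0)` would give the difference in mean event-free time over the first 24 time units, where the four input lists are placeholders for the two arms' data.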

Biometrics, vol. 80, no. 4. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11472547/pdf/
Citations: 0
Derivation of outcome-dependent dietary patterns for low-income women obtained from survey data using a supervised weighted overfitted latent class analysis.
IF 1.4, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2024-10-03, DOI: 10.1093/biomtc/ujae122
Stephanie M Wu, Matthew R Williams, Terrance D Savitsky, Briana J K Stephenson

Poor diet quality is a key modifiable risk factor for hypertension and disproportionately impacts low-income women. Analyzing diet-driven hypertensive outcomes in this demographic is challenging due to the complexity of dietary data and selection bias when the data come from surveys, a main data source for understanding diet-disease relationships in understudied populations. Supervised Bayesian model-based clustering methods summarize dietary data into latent patterns that holistically capture relationships among foods and a known health outcome but do not sufficiently account for complex survey design. This leads to biased estimation and inference and lack of generalizability of the patterns. To address this, we propose a supervised weighted overfitted latent class analysis (SWOLCA) based on a Bayesian pseudo-likelihood approach that integrates sampling weights into an exposure-outcome model for discrete data. Our model adjusts for stratification, clustering, and informative sampling, and handles modifying effects via interaction terms within a Markov chain Monte Carlo Gibbs sampling algorithm. Simulation studies confirm that the SWOLCA model exhibits good performance in terms of bias, precision, and coverage. Using data from the National Health and Nutrition Examination Survey (2015-2018), we demonstrate the utility of our model by characterizing dietary patterns associated with hypertensive outcomes among low-income women in the United States.

Biometrics, vol. 80, no. 4. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11518851/pdf/
Citations: 0
Wasserstein regression with empirical measures and density estimation for sparse data.
IF 1.4, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2024-10-03, DOI: 10.1093/biomtc/ujae127
Yidong Zhou, Hans-Georg Müller

The problem of modeling the relationship between univariate distributions and one or more explanatory variables has recently attracted increasing interest. Existing approaches proceed by substituting proxy estimated distributions for the typically unknown response distributions. These estimates are obtained from available data but are problematic when only a few data are available for some of the distributions. Such situations are common in practice and cannot be addressed with currently available approaches, especially when one aims at density estimates. We show how this and other problems associated with density estimation, such as tuning parameter selection and bias issues, can be side-stepped when covariates are available. We also introduce a novel version of distribution-response regression that is based on empirical measures. By avoiding the preprocessing step of recovering complete individual response distributions, the proposed approach is applicable when the sample size available for each distribution varies and especially when it is small for some of the distributions but large for others. In this case, one can still obtain consistent distribution estimates even for distributions with only a few data by gaining strength across the entire sample of distributions, while traditional approaches where distributions or densities are estimated individually fail, since sparsely sampled densities cannot be consistently estimated. The proposed model is demonstrated to outperform existing approaches through simulations and Environmental Influences on Child Health Outcomes data.
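Working with empirical measures directly, as this abstract describes, is possible because the 2-Wasserstein distance between univariate distributions equals the L2 distance between their quantile functions, and the quantile function of an empirical measure is a step function. The sketch below computes that distance exactly for two samples of different sizes, with no density estimation step. It illustrates the metric only, not the regression model proposed in the paper, and the function name `wasserstein2_empirical` is hypothetical.

```python
from math import sqrt


def wasserstein2_empirical(x, y):
    """2-Wasserstein distance between the empirical measures of samples x and y.

    Uses W2^2 = integral over (0, 1) of (Qx(u) - Qy(u))^2 du, where Qx and Qy are
    step functions with jumps at multiples of 1/len(x) and 1/len(y).
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    u = 0          # current position on a common integer grid of size n * m
    total = 0.0
    while i < n and j < m:
        a, b = (i + 1) * m, (j + 1) * n      # next jump of Qx and of Qy on that grid
        u_next = min(a, b)
        total += (xs[i] - ys[j]) ** 2 * (u_next - u)
        if a <= b:
            i += 1
        if b <= a:
            j += 1
        u = u_next
    return sqrt(total / (n * m))
```

Because the breakpoints are merged on an integer grid, the samples may have very different sizes, for instance a single observation compared against hundreds, which is the sparse setting the abstract is concerned with.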

Citations: 0
A formal goodness-of-fit test for spatial binary Markov random field models.
IF 1.4 Zone 4 Mathematics Q3 BIOLOGY Pub Date: 2024-10-03 DOI: 10.1093/biomtc/ujae119
Eva Biswas, Andee Kaplan, Mark S Kaiser, Daniel J Nordman

Binary spatial observations arise in environmental and ecological studies, where Markov random field (MRF) models are often applied. Despite the prevalence and the long history of MRF models for spatial binary data, appropriate model diagnostics have remained an unresolved issue in practice. A complicating factor is that such models involve neighborhood specifications, which are difficult to assess for binary data. To address this, we propose a formal goodness-of-fit (GOF) test for diagnosing an MRF model for spatial binary values. The test statistic involves a type of conditional Moran's I based on the fitted conditional probabilities, which can detect departures in model form, including neighborhood structure. Numerical studies show that the GOF test can perform well in detecting deviations from a null model, with a focus on neighborhoods as a difficult issue. We illustrate the spatial test with an application to Besag's historical endive data as well as the breeding pattern of grasshopper sparrows across Iowa.
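
As background for the test statistic mentioned above, the sketch below computes the classical (unconditional) Moran's I for values on a regular grid with four-nearest-neighbour weights. It is not the conditional statistic proposed in the paper, whose exact form the abstract does not give; the function name `morans_i_lattice` and the choice of unit rook weights are assumptions made here for illustration.

```python
import numpy as np


def morans_i_lattice(values):
    """Classical Moran's I on a 2-D grid with rook (four-nearest) neighbours.

    Hypothetical helper: every pair of horizontally or vertically adjacent
    cells gets weight 1, and all other pairs get weight 0.
    """
    z = np.asarray(values, dtype=float)
    z = z - z.mean()  # centre the observations
    rows, cols = z.shape
    # Sum of z_i * z_j over adjacent pairs; doubled because the weight
    # matrix is symmetric and counts each unordered pair twice.
    cross = 2.0 * (np.sum(z[:, :-1] * z[:, 1:]) + np.sum(z[:-1, :] * z[1:, :]))
    total_weight = 2.0 * (rows * (cols - 1) + (rows - 1) * cols)
    # Undefined (division by zero) when all values are identical.
    return float((z.size / total_weight) * cross / np.sum(z ** 2))
```

Values near 1 indicate that neighbouring cells tend to agree, values near -1 that they tend to alternate. A residual-based variant would replace the centred observations with deviations from fitted conditional probabilities, which is the general idea the abstract points to.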

Citations: 0