Absolute risk from double nested case-control designs: cause-specific proportional hazards models with and without augmented estimating equations.
Minjung Lee, Mitchell H Gail
Biometrics, July 2024. doi:10.1093/biomtc/ujae062

We estimate relative hazards and absolute risks (also called cumulative incidence or crude risk) under cause-specific proportional hazards models for competing risks from double nested case-control (DNCC) data. In the DNCC design, controls are time-matched not only to cases from the cause of primary interest, but also to cases from competing risks (the phase-two sample). Complete covariate data are available in the phase-two sample, but the remaining cohort members have information only on survival outcomes and some covariates. Design-weighted estimators use inverse sampling probabilities computed from Samuelsen-type calculations adapted to the DNCC design. To take advantage of the additional information available on all cohort members, we augment the estimating equations with a mean-zero term that improves the efficiency of estimates from the cause-specific proportional hazards model. We establish the asymptotic properties of the proposed estimators, including the estimator of absolute risk, and derive consistent variance estimators. We show that augmented design-weighted estimators are more efficient than design-weighted estimators. Through simulations, we show that the proposed asymptotic methods yield nominal operating characteristics at practical sample sizes. We illustrate the methods using prostate cancer mortality data from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial of the National Cancer Institute.
Factor-augmented transformation models for interval-censored failure time data.
Hongxi Li, Shuwei Li, Liuquan Sun, Xinyuan Song
Biometrics, July 2024. doi:10.1093/biomtc/ujae078

Interval-censored failure time data frequently arise in scientific studies in which each subject undergoes periodic examinations for the occurrence of the failure event of interest, so that the failure time is known only to lie within a specific time interval. In addition, the collected data may include multiple observed variables with a certain degree of correlation, leading to severe multicollinearity issues. This work proposes a factor-augmented transformation model that analyzes interval-censored failure time data while reducing model dimensionality and avoiding the multicollinearity elicited by multiple correlated covariates. We provide a joint modeling framework that combines a factor analysis model, which groups the multiple observed variables into a few latent factors, with a class of semiparametric transformation models in which the augmented factors enter as covariates, so that the effects of the latent factors and of the other covariates on the failure event can be examined. Furthermore, we propose a nonparametric maximum likelihood estimation approach and develop a computationally stable and reliable expectation-maximization algorithm for its implementation. We establish the asymptotic properties of the proposed estimators and conduct simulation studies to assess the empirical performance of the proposed method. An application to the Alzheimer's Disease Neuroimaging Initiative (ADNI) study is provided, with data obtained from the ADNI database. An R package, ICTransCFA, is also available for practitioners.
Propensity weighting plus adjustment in proportional hazards model is not doubly robust.
Erin E Gabriel, Michael C Sachs, Ingeborg Waernbaum, Els Goetghebeur, Paul F Blanche, Stijn Vansteelandt, Arvid Sjölander, Thomas Scheike
Biometrics, July 2024. doi:10.1093/biomtc/ujae069

Recently, it has become common in applied work to combine widely used survival analysis modeling methods, such as the multivariable Cox model and propensity score weighting, with the intention of forming a doubly robust estimator of an exposure effect hazard ratio, that is, one that is unbiased in large samples when either the Cox model or the propensity score model is correctly specified. This combination does not, in general, produce a doubly robust estimator, even after regression standardization, when there is truly a causal effect. We demonstrate this lack of double robustness via simulation for the semiparametric Cox model, the Weibull proportional hazards model, and a simple proportional hazards flexible parametric model, with the latter two models fit via maximum likelihood. We provide a novel proof that the combination of propensity score weighting and a proportional hazards survival model, fit via either full or partial likelihood, is consistent under the null of no causal effect of the exposure on the outcome, under particular censoring mechanisms, if either the propensity score model or the outcome model is correctly specified and contains all confounders. Given these results, which suggest that double robustness holds only under the null, we outline 2 simple alternative estimators that are doubly robust (in the above sense) for the survival difference at a given time point, provided the censoring mechanism can be correctly modeled, and one doubly robust method of estimation for the full survival curve. We provide R code to use these estimators for estimation and inference in the supporting information.
Improving prediction of linear regression models by integrating external information from heterogeneous populations: James-Stein estimators.
Peisong Han, Haoyue Li, Sung Kyun Park, Bhramar Mukherjee, Jeremy M G Taylor
Biometrics, July 2024. doi:10.1093/biomtc/ujae072

We consider the setting where (1) an internal study builds a linear regression model for prediction based on individual-level data, (2) some external studies have fitted similar linear regression models that use only subsets of the covariates and provide coefficient estimates for the reduced models without individual-level data, and (3) there is heterogeneity across the study populations. The goal is to integrate the external model summary information into the fitting of the internal model to improve prediction accuracy. We adapt the James-Stein shrinkage method to propose estimators that are no worse, and often better, in prediction mean squared error after information integration, regardless of the degree of study population heterogeneity. We conduct comprehensive simulation studies to investigate the numerical performance of the proposed estimators. We also apply the method to enhance a prediction model for patella bone lead level, in terms of blood lead level and other covariates, by integrating summary information from the published literature.
A Gaussian-process approximation to a spatial SIR process using moment closures and emulators.
Parker Trostle, Joseph Guinness, Brian J Reich
Biometrics, July 2024. doi:10.1093/biomtc/ujae068

The dynamics that govern disease spread are hard to model because infections are functions of both the underlying pathogen and human or animal behavior. This challenge grows when modeling how diseases spread between different spatial locations. Many proposed spatial epidemiological models require trade-offs to fit, whether by abstracting away theoretical spread dynamics, fitting a deterministic model, or requiring large computational resources for many simulations. We propose an approach that approximates the complex spatial spread dynamics with a Gaussian process. We first propose a flexible spatial extension to the well-known SIR stochastic process, and we then derive a moment-closure approximation to this stochastic process. The moment-closure approximation yields ordinary differential equations (ODEs) for the evolution of the means and covariances of the susceptible and infectious compartments through time. Because these ODEs are a bottleneck to fitting our model by MCMC, we approximate them using a low-rank emulator. This approximation serves as the basis for our hierarchical model for noisy, underreported counts of new infections by spatial location and time. We demonstrate the use of our model to conduct inference on simulated infections from the underlying, true spatial SIR jump process. We then apply our method to model counts of new Zika infections in Brazil from late 2015 through early 2016.
A generalized outcome-adaptive sequential multiple assignment randomized trial design.
Xue Yang, Yu Cheng, Peter F Thall, Abdus S Wahed
Biometrics, July 2024. doi:10.1093/biomtc/ujae073

A dynamic treatment regime (DTR) is a mathematical representation of a multistage decision process. When applied to sequential treatment selection in medical settings, DTRs are useful for identifying optimal therapies for chronic diseases such as AIDS, mental illnesses, substance abuse, and many cancers. Sequential multiple assignment randomized trials (SMARTs) provide a useful framework for constructing DTRs and making unbiased between-DTR comparisons. A limitation of SMARTs is that they ignore data from past patients that may be useful for reducing the probability of exposing new patients to inferior treatments. In practice, this may result in decreased treatment adherence or dropout. To address this problem, we propose a generalized outcome-adaptive (GO) SMART design that adaptively unbalances stage-specific randomization probabilities in favor of treatments observed to be more effective in previous patients. To correct for the bias induced by outcome-adaptive randomization, we propose G-estimators and inverse-probability-weighted estimators of the effects of DTRs embedded in a GO-SMART, and show analytically that they are consistent. We report simulation results showing that, compared with a SMART, a response-adaptive SMART, and a SMART with adaptive randomization, a GO-SMART design treats significantly more patients with the optimal DTR and achieves a larger number of total responses while maintaining similar or better statistical power.
Post-selection inference in regression models for group testing data.
Qinyan Shen, Karl Gregory, Xianzheng Huang
Biometrics, July 2024. doi:10.1093/biomtc/ujae101

We develop a methodology for valid inference after variable selection in logistic regression when the responses are partially observed, that is, when one observes a set of error-prone testing outcomes instead of the true values of the responses. To select important covariates while accounting for the missing information in the response data, we apply the expectation-maximization algorithm to compute maximum likelihood estimators subject to LASSO penalization. Following variable selection, we make inference on the selected covariate effects by extending post-selection inference methodology based on the polyhedral lemma. Empirical evidence from an extensive simulation study suggests that our post-selection inference results are more reliable than those from naive inference methods that use the same data to perform variable selection and inference without adjusting for the selection.
Controlling false discovery rate for mediator selection in high-dimensional data.
Ran Dai, Ruiyang Li, Seonjoo Lee, Ying Liu
Biometrics, July 2024. doi:10.1093/biomtc/ujae064

The need to select mediators from a high-dimensional data source, such as neuroimaging or genetic data, arises in much scientific research. In this work, we formulate a multiple-hypothesis testing framework for mediator selection from a high-dimensional candidate set and propose a method that extends recent developments in false discovery rate (FDR)-controlled variable selection with knockoffs to select mediators with FDR control. We show that the proposed method and algorithm achieve finite-sample FDR control. We present extensive simulation results demonstrating the power and finite-sample performance of the method compared with an existing method. Lastly, we apply the method to the Adolescent Brain Cognitive Development (ABCD) study, in which it selects several resting-state functional magnetic resonance imaging connectivity markers as mediators of the relationship between adverse childhood events and the crystallized composite score in the NIH Toolbox.
Causal meta-analysis by integrating multiple observational studies with multivariate outcomes.
Subharup Guha, Yi Li
Biometrics, July 2024. doi:10.1093/biomtc/ujae070

Integrating multiple observational studies to make unconfounded causal or descriptive comparisons of group potential outcomes in a large natural population is challenging. Moreover, retrospective cohorts, being convenience samples, are usually unrepresentative of the natural population of interest and have groups with unbalanced covariates. We propose a general covariate-balancing framework based on pseudo-populations that extends established weighting methods to the meta-analysis of multiple retrospective cohorts with multiple groups. Additionally, by maximizing the effective sample sizes of the cohorts, we propose a FLEXible, Optimized, and Realistic (FLEXOR) weighting method appropriate for integrative analyses. We develop new weighted estimators for unconfounded inferences on wide-ranging population-level features and estimands relevant to group comparisons of quantitative, categorical, or multivariate outcomes. Asymptotic properties of these estimators are examined. Through simulation studies and meta-analyses of TCGA datasets, we demonstrate the versatility and reliability of the proposed weighting strategy, especially for the FLEXOR pseudo-population.
Optimal refinement of strata to balance covariates.
Katherine Brumberg, Dylan S Small, Paul R Rosenbaum
Biometrics, July 2024. doi:10.1093/biomtc/ujae061

What is the best way to split one stratum into two so as to maximally reduce the within-stratum imbalance in many covariates? We formulate this as an integer program and approximate the solution by randomized rounding of a linear program. A linear program may assign a fraction of a person to each refined stratum. Randomized rounding views fractional people as probabilities, assigning intact people to strata using biased coins. Randomized rounding is a well-studied theoretical technique for approximating the optimal solution of certain insoluble integer programs. When the number of people in a stratum is large relative to the number of covariates, we prove the following new results: (i) randomized rounding to split a stratum does very little randomizing, so it closely resembles the linear programming relaxation without splitting intact people; and (ii) the linear relaxation and the randomly rounded solution place lower and upper bounds on the unattainable integer programming solution; because of (i), these bounds are often close, thereby ratifying the usable randomly rounded solution. We illustrate using an observational study that balanced many covariates by forming matched pairs composed of 2016 patients selected from 5735 using a propensity score. Instead, we form 5 propensity score strata and refine them into 10 strata, obtaining excellent covariate balance while retaining all patients. An R package, optrefine, available on CRAN, implements the method. Supplementary materials are available online.