Biometrics最新文献_第7页

Modeling longitudinal skewed functional data. 纵向倾斜功能数据建模。

IF 1.4 4区数学 Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae121

Mohammad Samsul Alam, Ana-Maria Staicu

This paper introduces a model for longitudinal functional data analysis that accounts for pointwise skewness. The proposed procedure decouples the marginal pointwise variation from the complex longitudinal and functional dependence using copula methodology. Pointwise variation is described through parametric distribution functions that capture varying skewness and change smoothly both in time and over the functional argument. Joint dependence is quantified through a Gaussian copula with a low-rank approximation-based covariance. The introduced class of models provides a unifying platform for both pointwise quantile estimation and prediction of complete trajectories at new times. We investigate the methods numerically in simulations and discuss their application to a diffusion tensor imaging study of multiple sclerosis patients. This approach is implemented in the R package sLFDA that is publicly available on GitHub.

本文介绍了一种用于纵向功能数据分析的模型，该模型考虑到了点偏度。所提出的程序利用 copula 方法将边际点状变化与复杂的纵向和函数依赖性分离开来。点向变异通过参数分布函数来描述，这些函数能捕捉不同的偏斜度，并在时间和功能参数上平滑变化。联合依赖性通过高斯协方差与基于低阶近似的协方差进行量化。引入的这一类模型提供了一个统一的平台，既能进行点量化估计，又能预测新时间的完整轨迹。我们在模拟中对这些方法进行了数值研究，并讨论了它们在多发性硬化症患者扩散张量成像研究中的应用。这种方法在 GitHub 上公开发布的 sLFDA R 软件包中实现。

引用次数: 0

Clustering computer mouse tracking data with informed hierarchical shrinkage partition priors. 利用有信息的分层收缩分区先验对计算机鼠标跟踪数据进行聚类。

IF 1.4 4区数学 Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae124

Ziyi Song, Weining Shen, Marina Vannucci, Alexandria Baldizon, Paul M Cinciripini, Francesco Versace, Michele Guindani

Mouse-tracking data, which record computer mouse trajectories while participants perform an experimental task, provide valuable insights into subjects' underlying cognitive processes. Neuroscientists are interested in clustering the subjects' responses during computer mouse-tracking tasks to reveal patterns of individual decision-making behaviors and identify population subgroups with similar neurobehavioral responses. These data can be combined with neuroimaging data to provide additional information for personalized interventions. In this article, we develop a novel hierarchical shrinkage partition (HSP) prior for clustering summary statistics derived from the trajectories of mouse-tracking data. The HSP model defines a subjects' cluster as a set of subjects that gives rise to more similar (rather than identical) nested partitions of the conditions. The proposed model can incorporate prior information about the partitioning of either subjects or conditions to facilitate clustering, and it allows for deviations of the nested partitions within each subject group. These features distinguish the HSP model from other bi-clustering methods that typically create identical nested partitions of conditions within a subject group. Furthermore, it differs from existing nested clustering methods, which define clusters based on common parameters in the sampling model and identify subject groups by different distributions. We illustrate the unique features of the HSP model on a mouse tracking dataset from a pilot study and in simulation studies. Our results show the ability and effectiveness of the proposed exploratory framework in clustering and revealing possible different behavioral patterns across subject groups.

鼠标跟踪数据记录了参与者执行实验任务时电脑鼠标的轨迹，为了解受试者的基本认知过程提供了宝贵的信息。神经科学家们有兴趣将受试者在电脑鼠标跟踪任务中的反应进行聚类，以揭示个体决策行为的模式，并识别具有类似神经行为反应的人群亚群。这些数据可与神经影像学数据相结合，为个性化干预提供更多信息。在本文中，我们开发了一种新颖的分层收缩分割（HSP）先验，用于对小鼠追踪数据轨迹得出的汇总统计数据进行聚类。HSP 模型将受试者聚类定义为一组受试者，这组受试者会产生更多相似（而非相同）的条件嵌套分区。所提出的模型可以结合有关受试者或条件分区的先验信息来促进聚类，并允许每个受试者组内的嵌套分区出现偏差。这些特点使 HSP 模型有别于其他双聚类方法，后者通常在一个受试者组内创建相同的条件嵌套分区。此外，它也有别于现有的嵌套聚类方法，后者根据抽样模型中的共同参数定义聚类，并通过不同的分布确定受试者组。我们在一项试验研究的小鼠跟踪数据集和模拟研究中说明了 HSP 模型的独特功能。我们的研究结果表明了所提出的探索性框架在聚类和揭示受试者群体间可能存在的不同行为模式方面的能力和有效性。

{"title":"Clustering computer mouse tracking data with informed hierarchical shrinkage partition priors.","authors":"Ziyi Song, Weining Shen, Marina Vannucci, Alexandria Baldizon, Paul M Cinciripini, Francesco Versace, Michele Guindani","doi":"10.1093/biomtc/ujae124","DOIUrl":"10.1093/biomtc/ujae124","url":null,"abstract":"Mouse-tracking data, which record computer mouse trajectories while participants perform an experimental task, provide valuable insights into subjects' underlying cognitive processes. Neuroscientists are interested in clustering the subjects' responses during computer mouse-tracking tasks to reveal patterns of individual decision-making behaviors and identify population subgroups with similar neurobehavioral responses. These data can be combined with neuroimaging data to provide additional information for personalized interventions. In this article, we develop a novel hierarchical shrinkage partition (HSP) prior for clustering summary statistics derived from the trajectories of mouse-tracking data. The HSP model defines a subjects' cluster as a set of subjects that gives rise to more similar (rather than identical) nested partitions of the conditions. The proposed model can incorporate prior information about the partitioning of either subjects or conditions to facilitate clustering, and it allows for deviations of the nested partitions within each subject group. These features distinguish the HSP model from other bi-clustering methods that typically create identical nested partitions of conditions within a subject group. Furthermore, it differs from existing nested clustering methods, which define clusters based on common parameters in the sampling model and identify subject groups by different distributions. We illustrate the unique features of the HSP model on a mouse tracking dataset from a pilot study and in simulation studies. Our results show the ability and effectiveness of the proposed exploratory framework in clustering and revealing possible different behavioral patterns across subject groups.","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11523067/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142543342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Robust model averaging approach by Mallows-type criterion. 采用 Mallows 型标准的稳健模型平均法。

IF 1.4 4区数学 Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae128

Miaomiao Wang, Kang You, Lixing Zhu, Guohua Zou

Model averaging is an important tool for treating uncertainty from model selection process and fusing information from different models, and has been widely used in various fields. However, the most existing model averaging criteria are proposed based on the methods of ordinary least squares or maximum likelihood, which possess high sensitivity to outliers or violation of certain model assumption. For the mean regression, no optimal robust methods are developed. To fill this gap, in our paper, we propose an outlier-robust model averaging approach by Mallows-type criterion. The idea is that we first construct a generalized M (GM) estimator for each candidate model, and then build robust weighting schemes by the asymptotic expansion of the final prediction error based on the GM-type loss function. So, we can still achieve a trustworthy result even if the dataset is contaminated by outliers in response and/or covariates. Asymptotic properties of the proposed robust model averaging estimators are established under some regularity conditions. The consistency of our weight estimators tending to the theoretically optimal weight vectors is also derived. We prove that our model averaging estimator is robust in terms of having bounded influence function. Further, we define the empirical prediction influence function to evaluate the quantitative robustness of the model averaging estimator. A simulation study and a real data analysis are conducted to demonstrate the finite sample performance of our estimators and compare them with other commonly used model selection and averaging methods.

模型平均法是处理模型选择过程中的不确定性和融合不同模型信息的重要工具，已被广泛应用于各个领域。然而，现有的模型平均准则大多是基于普通最小二乘法或最大似然法提出的，对异常值或违反某些模型假设具有较高的敏感性。对于均值回归，还没有开发出最优的稳健方法。为了填补这一空白，我们在本文中提出了一种采用 Mallows 型准则的离群稳健模型平均方法。其思路是，我们首先为每个候选模型构建一个广义 M（GM）估计器，然后基于 GM 型损失函数，通过最终预测误差的渐近展开建立稳健加权方案。因此，即使数据集受到响应和/或协变量异常值的污染，我们仍然可以获得值得信赖的结果。在一些正则条件下，建立了所提出的稳健模型平均估计器的渐近特性。我们还推导出了趋向于理论最优权重向量的权重估计器的一致性。我们证明了我们的模型平均估计器在有界影响函数方面是稳健的。此外，我们还定义了经验预测影响函数，以评估模型平均估计器的定量稳健性。我们进行了模拟研究和实际数据分析，以证明我们的估计器的有限样本性能，并将其与其他常用的模型选择和平均方法进行比较。

{"title":"Robust model averaging approach by Mallows-type criterion.","authors":"Miaomiao Wang, Kang You, Lixing Zhu, Guohua Zou","doi":"10.1093/biomtc/ujae128","DOIUrl":"https://doi.org/10.1093/biomtc/ujae128","url":null,"abstract":"Model averaging is an important tool for treating uncertainty from model selection process and fusing information from different models, and has been widely used in various fields. However, the most existing model averaging criteria are proposed based on the methods of ordinary least squares or maximum likelihood, which possess high sensitivity to outliers or violation of certain model assumption. For the mean regression, no optimal robust methods are developed. To fill this gap, in our paper, we propose an outlier-robust model averaging approach by Mallows-type criterion. The idea is that we first construct a generalized M (GM) estimator for each candidate model, and then build robust weighting schemes by the asymptotic expansion of the final prediction error based on the GM-type loss function. So, we can still achieve a trustworthy result even if the dataset is contaminated by outliers in response and/or covariates. Asymptotic properties of the proposed robust model averaging estimators are established under some regularity conditions. The consistency of our weight estimators tending to the theoretically optimal weight vectors is also derived. We prove that our model averaging estimator is robust in terms of having bounded influence function. Further, we define the empirical prediction influence function to evaluate the quantitative robustness of the model averaging estimator. A simulation study and a real data analysis are conducted to demonstrate the finite sample performance of our estimators and compare them with other commonly used model selection and averaging methods.","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142614075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Large-scale survival analysis with a cure fraction. 利用治愈率进行大规模生存分析

IF 1.4 4区数学 Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae138

Bo Han, Xiaoguang Wang, Liuquan Sun

With the advent of massive survival data with a cure fraction, large-scale regression for analyzing the effects of risk factors on a general population has become an emerging challenge. This article proposes a new probability-weighted method for estimation and inference for semiparametric cure regression models. We develop a flexible formulation of the mixture cure model consisting of the model-free incidence and the latency assumed by the semiparametric proportional hazards model. The susceptible probability assesses the concordance between the observations and the latency. With the susceptible probability as weight, we propose a weighted estimating equation method in a small-scale setting. Robust nonparametric estimation of the weight permits stable implementation of the estimation of regression parameters. A recursive probability-weighted estimation method based on data blocks with smaller sizes is further proposed, which achieves computational and memory efficiency in a large-scale or online setting. Asymptotic properties of the proposed estimators are established. We conduct simulation studies and a real data application to demonstrate the empirical performance of the proposed method.

随着具有治愈率的海量生存数据的出现，用于分析风险因素对一般人群影响的大规模回归已成为一项新的挑战。本文提出了一种新的概率加权方法，用于半参数治愈回归模型的估计和推断。我们开发了一种灵活的混合治愈模型，该模型由半参数比例危害模型假设的无模型发病率和潜伏期组成。易感概率评估观察结果与潜伏期之间的一致性。以易感概率为权重，我们提出了一种小规模加权估计方程方法。通过对权重进行稳健的非参数估计，可以稳定地实现回归参数的估计。我们进一步提出了一种基于较小规模数据块的递归概率加权估计方法，在大规模或在线环境中实现了计算和内存效率。我们建立了所提估计器的渐近特性。我们进行了模拟研究和实际数据应用，以证明所提方法的经验性能。

{"title":"Large-scale survival analysis with a cure fraction.","authors":"Bo Han, Xiaoguang Wang, Liuquan Sun","doi":"10.1093/biomtc/ujae138","DOIUrl":"https://doi.org/10.1093/biomtc/ujae138","url":null,"abstract":"With the advent of massive survival data with a cure fraction, large-scale regression for analyzing the effects of risk factors on a general population has become an emerging challenge. This article proposes a new probability-weighted method for estimation and inference for semiparametric cure regression models. We develop a flexible formulation of the mixture cure model consisting of the model-free incidence and the latency assumed by the semiparametric proportional hazards model. The susceptible probability assesses the concordance between the observations and the latency. With the susceptible probability as weight, we propose a weighted estimating equation method in a small-scale setting. Robust nonparametric estimation of the weight permits stable implementation of the estimation of regression parameters. A recursive probability-weighted estimation method based on data blocks with smaller sizes is further proposed, which achieves computational and memory efficiency in a large-scale or online setting. Asymptotic properties of the proposed estimators are established. We conduct simulation studies and a real data application to demonstrate the empirical performance of the proposed method.","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142692670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Temporal generative models for learning heterogeneous group dynamics of ecological momentary assessment data. 用于学习生态瞬时评估数据的异质群体动态的时间生成模型。

IF 1.4 4区数学 Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae115

Soohyun Kim, Young-Geun Kim, Yuanjia Wang

One of the goals of precision psychiatry is to characterize mental disorders in an individualized manner, taking into account the underlying dynamic processes. Recent advances in mobile technologies have enabled the collection of ecological momentary assessments that capture multiple responses in real-time at high frequency. However, ecological momentary assessment data are often multi-dimensional, correlated, and hierarchical. Mixed-effect models are commonly used but may require restrictive assumptions about the fixed and random effects and the correlation structure. The recurrent temporal restricted Boltzmann machine (RTRBM) is a generative neural network that can be used to model temporal data, but most existing RTRBM approaches do not account for the potential heterogeneity of group dynamics within a population based on available covariates. In this paper, we propose a new temporal generative model, the HDRBM, to learn the heterogeneous group dynamics and demonstrate the effectiveness of this approach on simulated and real-world ecological momentary assessment datasets. We show that by incorporating covariates, HDRBM can improve accuracy and interpretability, explore the underlying drivers of the group dynamics of participants, and serve as a generative model for ecological momentary assessment studies.

精准精神病学的目标之一是考虑到潜在的动态过程，以个性化的方式描述精神障碍的特征。移动技术的最新进展使得生态学瞬间评估的收集成为可能，这种评估可以高频率地实时捕捉多种反应。然而，生态瞬间评估数据通常是多维、相关和分层的。混合效应模型是常用的模型，但可能需要对固定效应、随机效应和相关结构做出限制性假设。递归时空受限玻尔兹曼机（RTRBM）是一种生成式神经网络，可用于建立时空数据模型，但现有的大多数 RTRBM 方法都没有考虑到基于可用协变量的种群内群体动态的潜在异质性。在本文中，我们提出了一种新的时间生成模型--HDRBM，用于学习异质性群体动态，并在模拟和真实世界的生态瞬时评估数据集上证明了这种方法的有效性。我们表明，通过纳入协变量，HDRBM 可以提高准确性和可解释性，探索参与者群体动态的潜在驱动因素，并可作为生态瞬时评估研究的生成模型。

{"title":"Temporal generative models for learning heterogeneous group dynamics of ecological momentary assessment data.","authors":"Soohyun Kim, Young-Geun Kim, Yuanjia Wang","doi":"10.1093/biomtc/ujae115","DOIUrl":"10.1093/biomtc/ujae115","url":null,"abstract":"One of the goals of precision psychiatry is to characterize mental disorders in an individualized manner, taking into account the underlying dynamic processes. Recent advances in mobile technologies have enabled the collection of ecological momentary assessments that capture multiple responses in real-time at high frequency. However, ecological momentary assessment data are often multi-dimensional, correlated, and hierarchical. Mixed-effect models are commonly used but may require restrictive assumptions about the fixed and random effects and the correlation structure. The recurrent temporal restricted Boltzmann machine (RTRBM) is a generative neural network that can be used to model temporal data, but most existing RTRBM approaches do not account for the potential heterogeneity of group dynamics within a population based on available covariates. In this paper, we propose a new temporal generative model, the HDRBM, to learn the heterogeneous group dynamics and demonstrate the effectiveness of this approach on simulated and real-world ecological momentary assessment datasets. We show that by incorporating covariates, HDRBM can improve accuracy and interpretability, explore the underlying drivers of the group dynamics of participants, and serve as a generative model for ecological momentary assessment studies.","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11472390/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142457177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Leveraging information from secondary endpoints to enhance dynamic borrowing across subpopulations. 利用次要终点的信息，加强亚人群的动态借用。

IF 1.4 4区数学 Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae118

Jack M Wolf, David M Vock, Xianghua Luo, Dorothy K Hatsukami, F Joseph McClernon, Joseph S Koopmeiners

Randomized trials seek efficient treatment effect estimation within target populations, yet scientific interest often also centers on subpopulations. Although there are typically too few subjects within each subpopulation to efficiently estimate these subpopulation treatment effects, one can gain precision by borrowing strength across subpopulations, as is the case in a basket trial. While dynamic borrowing has been proposed as an efficient approach to estimating subpopulation treatment effects on primary endpoints, additional efficiency could be gained by leveraging the information found in secondary endpoints. We propose a multisource exchangeability model (MEM) that incorporates secondary endpoints to more efficiently assess subpopulation exchangeability. Across simulation studies, our proposed model almost uniformly reduces the mean squared error when compared to the standard MEM that only considers data from the primary endpoint by gaining efficiency when subpopulations respond similarly to the treatment and reducing the magnitude of bias when the subpopulations are heterogeneous. We illustrate our model's feasibility using data from a recently completed trial of very low nicotine content cigarettes to estimate the effect on abstinence from smoking within three priority subpopulations. Our proposed model led to increases in the effective sample size two to four times greater than under the standard MEM.

随机试验寻求在目标人群中有效估计治疗效果，但科学兴趣往往也集中在亚人群上。虽然每个亚人群中的受试者通常太少，无法有效估计这些亚人群的治疗效果，但可以通过在亚人群间借力来获得精确度，就像篮子试验中的情况一样。虽然动态借力被认为是估算亚人群对主要终点治疗效果的有效方法，但利用次要终点中的信息还可以提高效率。我们提出了一种多源可交换性模型（MEM），该模型结合了次要终点，可以更有效地评估亚人群的可交换性。在所有模拟研究中，与只考虑主要终点数据的标准 MEM 相比，我们提出的模型几乎一致地降低了均方误差，在亚人群对治疗反应相似时提高了效率，在亚人群异质性时降低了偏差幅度。我们利用最近完成的一项尼古丁含量极低的香烟试验数据来估算三个优先亚人群的戒烟效果，从而说明我们的模型是可行的。与标准模型相比，我们提出的模型使有效样本量增加了两到四倍。

{"title":"Leveraging information from secondary endpoints to enhance dynamic borrowing across subpopulations.","authors":"Jack M Wolf, David M Vock, Xianghua Luo, Dorothy K Hatsukami, F Joseph McClernon, Joseph S Koopmeiners","doi":"10.1093/biomtc/ujae118","DOIUrl":"10.1093/biomtc/ujae118","url":null,"abstract":"Randomized trials seek efficient treatment effect estimation within target populations, yet scientific interest often also centers on subpopulations. Although there are typically too few subjects within each subpopulation to efficiently estimate these subpopulation treatment effects, one can gain precision by borrowing strength across subpopulations, as is the case in a basket trial. While dynamic borrowing has been proposed as an efficient approach to estimating subpopulation treatment effects on primary endpoints, additional efficiency could be gained by leveraging the information found in secondary endpoints. We propose a multisource exchangeability model (MEM) that incorporates secondary endpoints to more efficiently assess subpopulation exchangeability. Across simulation studies, our proposed model almost uniformly reduces the mean squared error when compared to the standard MEM that only considers data from the primary endpoint by gaining efficiency when subpopulations respond similarly to the treatment and reducing the magnitude of bias when the subpopulations are heterogeneous. We illustrate our model's feasibility using data from a recently completed trial of very low nicotine content cigarettes to estimate the effect on abstinence from smoking within three priority subpopulations. Our proposed model led to increases in the effective sample size two to four times greater than under the standard MEM.","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11498028/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142494173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Estimating marginal treatment effect in cluster randomized trials with multi-level missing outcomes. 估计具有多级缺失结果的聚类随机试验的边际治疗效果。

IF 1.4 4区数学 Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae135

Chia-Rui Chang, Rui Wang

Analyses of cluster randomized trials (CRTs) can be complicated by informative missing outcome data. Methods such as inverse probability weighted generalized estimating equations have been proposed to account for informative missingness by weighing the observed individual outcome data in each cluster. These existing methods have focused on settings where missingness occurs at the individual level and each cluster has partially or fully observed individual outcomes. In the presence of missing clusters, for example, all outcomes from a cluster are missing due to drop-out of the cluster, these approaches ignore this cluster-level missingness and can lead to biased inference if the cluster-level missingness is informative. Informative missingness at multiple levels can also occur in CRTs with a multi-level structure where study participants are nested in subclusters such as healthcare providers, and the subclusters are nested in clusters such as clinics. In this paper, we propose new estimators for estimating the marginal treatment effect in CRTs accounting for missing outcome data at multiple levels based on weighted generalized estimating equations. We show that the proposed multi-level multiply robust estimator is consistent and asymptotically normally distributed provided that one of the multiple propensity score models postulated at each clustering level is correctly specified. We evaluate the performance of the proposed method through extensive simulations and illustrate its use with a CRT evaluating a Malaria risk-reduction intervention in rural Madagascar.

分组随机试验（CRTs）的分析可能会因结果数据的信息缺失而变得复杂。有人提出了反概率加权广义估计方程等方法，通过权衡每个群组中观察到的个体结果数据来解释信息缺失。这些现有方法主要针对的是在个体水平上出现缺失，且每个群组都有部分或全部观察到的个体结果的情况。在群组缺失的情况下，例如，由于退出群组而导致群组中的所有结果缺失，这些方法会忽略群组层面的缺失，如果群组层面的缺失具有信息性，则可能导致推断偏差。在多层次结构的 CRT 中，研究参与者嵌套在医疗保健提供者等子群组中，而子群组嵌套在诊所等群组中，也会出现多层次的信息缺失。在本文中，我们基于加权广义估计方程，提出了在 CRT 中估计边际治疗效果的新估计方法，以考虑多层次结果数据的缺失。我们的研究表明，只要在每个聚类水平上假设的多个倾向评分模型中，有一个是正确指定的，那么所提出的多水平多重稳健估计器就是一致和渐近正态分布的。我们通过大量模拟来评估所提出方法的性能，并用一个评估马达加斯加农村地区疟疾风险降低干预措施的 CRT 来说明该方法的应用。

{"title":"Estimating marginal treatment effect in cluster randomized trials with multi-level missing outcomes.","authors":"Chia-Rui Chang, Rui Wang","doi":"10.1093/biomtc/ujae135","DOIUrl":"10.1093/biomtc/ujae135","url":null,"abstract":"Analyses of cluster randomized trials (CRTs) can be complicated by informative missing outcome data. Methods such as inverse probability weighted generalized estimating equations have been proposed to account for informative missingness by weighing the observed individual outcome data in each cluster. These existing methods have focused on settings where missingness occurs at the individual level and each cluster has partially or fully observed individual outcomes. In the presence of missing clusters, for example, all outcomes from a cluster are missing due to drop-out of the cluster, these approaches ignore this cluster-level missingness and can lead to biased inference if the cluster-level missingness is informative. Informative missingness at multiple levels can also occur in CRTs with a multi-level structure where study participants are nested in subclusters such as healthcare providers, and the subclusters are nested in clusters such as clinics. In this paper, we propose new estimators for estimating the marginal treatment effect in CRTs accounting for missing outcome data at multiple levels based on weighted generalized estimating equations. We show that the proposed multi-level multiply robust estimator is consistent and asymptotically normally distributed provided that one of the multiple propensity score models postulated at each clustering level is correctly specified. We evaluate the performance of the proposed method through extensive simulations and illustrate its use with a CRT evaluating a Malaria risk-reduction intervention in rural Madagascar.","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11629964/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142827276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Discussion on "LEAP: the latent exchangeability prior for borrowing information from historical data" by Ethan M. Alt, Xiuya Chang, Xun Jiang, Qing Liu, May Mo, H. Amy Xia, and Joseph G. Ibrahim. 关于 Ethan M. Alt、Xiuya Chang、Xun Jiang、Qing Liu、May Mo、H. Amy Xia 和 Joseph G. Ibrahim 所著《LEAP：从历史数据中借用信息的潜在可交换性先验》的讨论。

IF 1.4 4区数学 Q3 BIOLOGY

Biometrics

Pub Date : 2024-07-01 DOI: 10.1093/biomtc/ujae084

Harlan Campbell, Paul Gustafson

We commend Alt et al.'s innovative approach for analysis with a hybrid control arm while offering insights into two key considerations: the necessity for extrapolation and the potential benefits of curating historical control data before analysis.

我们对 Alt 等人使用混合对照臂进行分析的创新方法表示赞赏，同时对两个关键考虑因素提出了见解：外推的必要性和分析前整理历史对照数据的潜在益处。

引用次数: 0

An interpretable Bayesian clustering approach with feature selection for analyzing spatially resolved transcriptomics data. 用于分析空间解析转录组学数据的带特征选择的可解释贝叶斯聚类方法。

IF 1.4 4区数学 Q3 BIOLOGY

Biometrics

Pub Date : 2024-07-01 DOI: 10.1093/biomtc/ujae066

Huimin Li, Bencong Zhu, Xi Jiang, Lei Guo, Yang Xie, Lin Xu, Qiwei Li

Recent breakthroughs in spatially resolved transcriptomics (SRT) technologies have enabled comprehensive molecular characterization at the spot or cellular level while preserving spatial information. Cells are the fundamental building blocks of tissues, organized into distinct yet connected components. Although many non-spatial and spatial clustering approaches have been used to partition the entire region into mutually exclusive spatial domains based on the SRT high-dimensional molecular profile, most require an ad hoc selection of less interpretable dimensional-reduction techniques. To overcome this challenge, we propose a zero-inflated negative binomial mixture model to cluster spots or cells based on their molecular profiles. To increase interpretability, we employ a feature selection mechanism to provide a low-dimensional summary of the SRT molecular profile in terms of discriminating genes that shed light on the clustering result. We further incorporate the SRT geospatial profile via a Markov random field prior. We demonstrate how this joint modeling strategy improves clustering accuracy, compared with alternative state-of-the-art approaches, through simulation studies and 3 real data applications.

空间分辨转录组学（SRT）技术的最新突破使我们能够在保留空间信息的同时，在点或细胞水平上进行全面的分子特征描述。细胞是组织的基本组成单位，被组织成不同但又相互连接的组成部分。虽然许多非空间和空间聚类方法都被用来根据 SRT 高维分子图谱将整个区域划分为相互排斥的空间域，但大多数方法都需要临时选择可解释性较差的降维技术。为了克服这一难题，我们提出了一种零膨胀负二项混合模型，根据分子轮廓对斑点或细胞进行聚类。为了提高可解释性，我们采用了一种特征选择机制，根据能揭示聚类结果的鉴别基因提供 SRT 分子剖面的低维摘要。我们还通过马尔可夫随机场先验进一步纳入了 SRT 地理空间概况。通过模拟研究和 3 个真实数据应用，我们展示了这种联合建模策略与其他最先进方法相比如何提高聚类准确性。

{"title":"An interpretable Bayesian clustering approach with feature selection for analyzing spatially resolved transcriptomics data.","authors":"Huimin Li, Bencong Zhu, Xi Jiang, Lei Guo, Yang Xie, Lin Xu, Qiwei Li","doi":"10.1093/biomtc/ujae066","DOIUrl":"10.1093/biomtc/ujae066","url":null,"abstract":"Recent breakthroughs in spatially resolved transcriptomics (SRT) technologies have enabled comprehensive molecular characterization at the spot or cellular level while preserving spatial information. Cells are the fundamental building blocks of tissues, organized into distinct yet connected components. Although many non-spatial and spatial clustering approaches have been used to partition the entire region into mutually exclusive spatial domains based on the SRT high-dimensional molecular profile, most require an ad hoc selection of less interpretable dimensional-reduction techniques. To overcome this challenge, we propose a zero-inflated negative binomial mixture model to cluster spots or cells based on their molecular profiles. To increase interpretability, we employ a feature selection mechanism to provide a low-dimensional summary of the SRT molecular profile in terms of discriminating genes that shed light on the clustering result. We further incorporate the SRT geospatial profile via a Markov random field prior. We demonstrate how this joint modeling strategy improves clustering accuracy, compared with alternative state-of-the-art approaches, through simulation studies and 3 real data applications.","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11285114/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141787236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Joint structure learning and causal effect estimation for categorical graphical models. 分类图形模型的联合结构学习和因果效应估计

IF 1.4 4区数学 Q3 BIOLOGY

Biometrics

Pub Date : 2024-07-01 DOI: 10.1093/biomtc/ujae067

Federico Castelletti, Guido Consonni, Marco L Della Vedova

The scope of this paper is a multivariate setting involving categorical variables. Following an external manipulation of one variable, the goal is to evaluate the causal effect on an outcome of interest. A typical scenario involves a system of variables representing lifestyle, physical and mental features, symptoms, and risk factors, with the outcome being the presence or absence of a disease. These variables are interconnected in complex ways, allowing the effect of an intervention to propagate through multiple paths. A distinctive feature of our approach is the estimation of causal effects while accounting for uncertainty in both the dependence structure, which we represent through a directed acyclic graph (DAG), and the DAG-model parameters. Specifically, we propose a Markov chain Monte Carlo algorithm that targets the joint posterior over DAGs and parameters, based on an efficient reversible-jump proposal scheme. We validate our method through extensive simulation studies and demonstrate that it outperforms current state-of-the-art procedures in terms of estimation accuracy. Finally, we apply our methodology to analyze a dataset on depression and anxiety in undergraduate students.

本文的研究范围是涉及分类变量的多变量环境。在对一个变量进行外部操作后，目标是评估其对相关结果的因果影响。一个典型的情景是由代表生活方式、身心特征、症状和风险因素的变量组成的系统，其结果是是否患有某种疾病。这些变量以复杂的方式相互关联，使得干预效果可以通过多种途径传播。我们方法的一个显著特点是在估算因果效应的同时，考虑到依赖结构（我们通过有向无环图（DAG）表示）和 DAG 模型参数的不确定性。具体来说，我们提出了一种马尔可夫链蒙特卡洛算法，该算法基于高效的可逆跳跃建议方案，以 DAG 和参数的联合后验为目标。我们通过大量的模拟研究验证了我们的方法，并证明它在估计精度方面优于目前最先进的程序。最后，我们将我们的方法应用于分析本科生抑郁和焦虑的数据集。

{"title":"Joint structure learning and causal effect estimation for categorical graphical models.","authors":"Federico Castelletti, Guido Consonni, Marco L Della Vedova","doi":"10.1093/biomtc/ujae067","DOIUrl":"https://doi.org/10.1093/biomtc/ujae067","url":null,"abstract":"The scope of this paper is a multivariate setting involving categorical variables. Following an external manipulation of one variable, the goal is to evaluate the causal effect on an outcome of interest. A typical scenario involves a system of variables representing lifestyle, physical and mental features, symptoms, and risk factors, with the outcome being the presence or absence of a disease. These variables are interconnected in complex ways, allowing the effect of an intervention to propagate through multiple paths. A distinctive feature of our approach is the estimation of causal effects while accounting for uncertainty in both the dependence structure, which we represent through a directed acyclic graph (DAG), and the DAG-model parameters. Specifically, we propose a Markov chain Monte Carlo algorithm that targets the joint posterior over DAGs and parameters, based on an efficient reversible-jump proposal scheme. We validate our method through extensive simulation studies and demonstrate that it outperforms current state-of-the-art procedures in terms of estimation accuracy. Finally, we apply our methodology to analyze a dataset on depression and anxiety in undergraduate students.","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141787239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0