Jiachen Cai, Robert J B Goudie, Colin Starr, Brian D M Tom
The increasing availability of high-dimensional, longitudinal measures of gene expression can facilitate understanding of biological mechanisms, as required for precision medicine. Biological knowledge suggests that it may be best to describe complex diseases at the level of underlying pathways, which may interact with one another. We propose a Bayesian approach that characterizes such correlation among pathways through dependent Gaussian processes (DGP) and maps the observed high-dimensional gene expression trajectories into unobserved low-dimensional pathway expression trajectories via Bayesian sparse factor analysis. Our proposal is the first attempt to relax the classical assumption of independent factors for longitudinal data. Simulations and a real data analysis show superior performance in three areas: recovering the shape of pathway expression trajectories, revealing the relationships between genes and pathways, and predicting gene expression (closer point estimates and narrower predictive intervals). To fit the model, we propose a Monte Carlo expectation maximization (MCEM) scheme. It can be implemented conveniently by combining a standard Markov chain Monte Carlo sampler with the R package GPFDA, which returns the maximum likelihood estimates of the DGP hyperparameters. The modular structure of MCEM makes it generalizable to other complex models involving a DGP component. Our R package DGP4LCF, which implements the proposed approach, is available on the Comprehensive R Archive Network (CRAN).
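The key modeling idea can be sketched in a few lines. Latent pathway trajectories are correlated with one another rather than independent, and each gene loads sparsely on those pathways. This is a minimal illustration only, assuming a linear model of coregionalization with a squared-exponential kernel as the dependent-GP construction. The paper and GPFDA may use a different construction. `B`, `loadings`, and `lengthscale` are hypothetical values, not estimates.

```python
import math

def se_kernel(t1, t2, lengthscale=1.0):
    """Squared-exponential correlation between two time points."""
    return math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)

def lmc_covariance(times, B, lengthscale=1.0):
    """Joint covariance of Q dependent latent trajectories under a linear model of
    coregionalization (an assumed construction): Cov(f_q(t), f_r(s)) = B[q][r] * k(t, s).
    Rows/columns are ordered factor-major: (q=0, all times), (q=1, all times), ..."""
    Q, T = len(B), len(times)
    K = [[0.0] * (Q * T) for _ in range(Q * T)]
    for q in range(Q):
        for r in range(Q):
            for i, t in enumerate(times):
                for j, s in enumerate(times):
                    K[q * T + i][r * T + j] = B[q][r] * se_kernel(t, s, lengthscale)
    return K

def gene_mean(loadings, factor_values):
    """Mean expression of each gene at one time point: sum_q loadings[g][q] * f_q(t).
    Zero entries in `loadings` encode sparse gene-pathway membership."""
    return [sum(l * f for l, f in zip(row, factor_values)) for row in loadings]

times = [0.0, 1.0, 2.0]
B = [[1.0, 0.6], [0.6, 1.0]]                        # hypothetical between-pathway covariance
K = lmc_covariance(times, B)
loadings = [[1.2, 0.0], [0.0, -0.8], [0.5, 0.5]]    # hypothetical sparse loading matrix
print(gene_mean(loadings, [0.3, -1.0]))
```

The off-diagonal blocks of `K` are non-zero whenever `B[q][r]` is non-zero. An independent-factor model rules out exactly this kind of cross-pathway covariance.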
{"title":"Dynamic factor analysis with dependent Gaussian processes for high-dimensional gene expression trajectories.","authors":"Jiachen Cai, Robert J B Goudie, Colin Starr, Brian D M Tom","doi":"10.1093/biomtc/ujae131","DOIUrl":"https://doi.org/10.1093/biomtc/ujae131","url":null,"abstract":"<p><p>The increasing availability of high-dimensional, longitudinal measures of gene expression can facilitate understanding of biological mechanisms, as required for precision medicine. Biological knowledge suggests that it may be best to describe complex diseases at the level of underlying pathways, which may interact with one another. We propose a Bayesian approach that allows for characterizing such correlation among different pathways through dependent Gaussian processes (DGP) and mapping the observed high-dimensional gene expression trajectories into unobserved low-dimensional pathway expression trajectories via Bayesian sparse factor analysis. Our proposal is the first attempt to relax the classical assumption of independent factors for longitudinal data and has demonstrated a superior performance in recovering the shape of pathway expression trajectories, revealing the relationships between genes and pathways, and predicting gene expressions (closer point estimates and narrower predictive intervals), as demonstrated through simulations and real data analysis. To fit the model, we propose a Monte Carlo expectation maximization (MCEM) scheme that can be implemented conveniently by combining a standard Markov Chain Monte Carlo sampler and an R package GPFDA,which returns the maximum likelihood estimates of DGP hyperparameters. The modular structure of MCEM makes it generalizable to other complex models involving the DGP model component. 
Our R package DGP4LCF that implements the proposed approach is available on the Comprehensive R Archive Network (CRAN).</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142646915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yangfan Ren, Christine B Peterson, Marina Vannucci
In this paper, we propose Varying Effects Regression with Graph Estimation (VERGE), a novel Bayesian method for feature selection in regression. Key aspects of our model allow it to leverage the complex structure of data sets arising from genomics or imaging studies. We distinguish between two kinds of variables. The predictors are the features used in the outcome prediction model. The subject-level covariates modulate the effects of the predictors on the outcome. We construct a varying-coefficients modeling framework in which we infer a network among the predictor variables and use this network information to encourage the selection of related predictors. We employ spike-and-slab variable selection priors that enable the selection of both network-linked predictor variables and the covariates that modify the predictor effects. Simulation studies show that our method outperforms existing alternative methods in terms of both feature selection and predictive accuracy. We illustrate VERGE with an application characterizing the influence of gut microbiome features on obesity, where we identify a set of microbial taxa and their ecological dependence relations. We allow subject-level covariates, including sex and dietary intake variables, to modify the coefficients of the microbiome predictors, providing additional insight into the interplay between these factors.
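The three ingredients described above can be sketched together: covariate-modulated coefficients, spike-and-slab inclusion indicators, and a network prior that favors selecting linked predictors. This is a minimal sketch, assuming linear varying coefficients and an Ising/MRF-style prior on the indicators. The parameter names `alpha`, `gamma`, `a`, and `b` are hypothetical and are not taken from the VERGE specification. VERGE also places selection indicators on the modifying covariates, which this sketch omits.

```python
def varying_coefficient(alpha_j, z):
    """beta_j(z) = alpha_j[0] + sum_k alpha_j[k] * z[k-1]:
    the effect of predictor j, modulated by subject-level covariates z."""
    return alpha_j[0] + sum(a * zk for a, zk in zip(alpha_j[1:], z))

def linear_predictor(x, z, alpha, gamma):
    """sum_j gamma_j * beta_j(z) * x_j, where gamma_j in {0, 1} is the spike-and-slab
    inclusion indicator (0 = spike: predictor j is excluded)."""
    return sum(g * varying_coefficient(a, z) * xj for xj, a, g in zip(x, alpha, gamma))

def mrf_log_prior(gamma, edges, a=-2.0, b=1.0):
    """Unnormalized log prior a * sum(gamma) + b * sum_{(i,j) in edges} gamma_i * gamma_j.
    a < 0 encourages sparsity; b > 0 rewards selecting both ends of a network edge."""
    return a * sum(gamma) + b * sum(gamma[i] * gamma[j] for i, j in edges)

# Hypothetical example: 3 predictors, 1 covariate (e.g. a sex indicator), chain network 0-1-2.
edges = [(0, 1), (1, 2)]
print(linear_predictor([1.0, 2.0, 0.5], [1.0], [[0.5, 0.2], [0.0, -1.0], [3.0, 0.0]], [1, 1, 0]))
print(mrf_log_prior([1, 1, 0], edges), mrf_log_prior([1, 0, 1], edges))
```

With the same number of selected predictors, the connected selection `[1, 1, 0]` receives a higher prior score than the disconnected `[1, 0, 1]`. This is the sense in which network information "encourages the selection of related predictors."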
{"title":"Bayesian network-guided sparse regression with flexible varying effects.","authors":"Yangfan Ren, Christine B Peterson, Marina Vannucci","doi":"10.1093/biomtc/ujae111","DOIUrl":"https://doi.org/10.1093/biomtc/ujae111","url":null,"abstract":"<p><p>In this paper, we propose Varying Effects Regression with Graph Estimation (VERGE), a novel Bayesian method for feature selection in regression. Our model has key aspects that allow it to leverage the complex structure of data sets arising from genomics or imaging studies. We distinguish between the predictors, which are the features utilized in the outcome prediction model, and the subject-level covariates, which modulate the effects of the predictors on the outcome. We construct a varying coefficients modeling framework where we infer a network among the predictor variables and utilize this network information to encourage the selection of related predictors. We employ variable selection spike-and-slab priors that enable the selection of both network-linked predictor variables and covariates that modify the predictor effects. We demonstrate through simulation studies that our method outperforms existing alternative methods in terms of both feature selection and predictive accuracy. We illustrate VERGE with an application to characterizing the influence of gut microbiome features on obesity, where we identify a set of microbial taxa and their ecological dependence relations. 
We allow subject-level covariates, including sex and dietary intake variables to modify the coefficients of the microbiome predictors, providing additional insight into the interplay between these factors.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142387634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ziyi Song, Weining Shen, Marina Vannucci, Alexandria Baldizon, Paul M Cinciripini, Francesco Versace, Michele Guindani
Mouse-tracking data, which record computer mouse trajectories while participants perform an experimental task, provide valuable insights into subjects' underlying cognitive processes. Neuroscientists are interested in clustering subjects' responses during computer mouse-tracking tasks for two purposes: to reveal patterns of individual decision-making behavior and to identify population subgroups with similar neurobehavioral responses. These data can be combined with neuroimaging data to provide additional information for personalized interventions. In this article, we develop a novel hierarchical shrinkage partition (HSP) prior for clustering summary statistics derived from mouse-tracking trajectories. The HSP model defines a subject cluster as a set of subjects whose nested partitions of the conditions are more similar (rather than identical). The proposed model can incorporate prior information about the partitioning of either subjects or conditions to facilitate clustering. It also allows the nested partitions to deviate within each subject group. These features distinguish the HSP model from other bi-clustering methods, which typically create identical nested partitions of conditions within a subject group. The model also differs from existing nested clustering methods, which define clusters based on common parameters in the sampling model and identify subject groups by different distributions. We illustrate the unique features of the HSP model on a mouse-tracking dataset from a pilot study and in simulation studies. Our results demonstrate the effectiveness of the proposed exploratory framework in clustering subjects and revealing possibly different behavioral patterns across subject groups.
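The idea that subjects in a cluster share similar but not identical partitions of the conditions can be illustrated with a prior weight. The weight decays with the distance between a subject's partition and a group-level partition. This is a sketch under stated assumptions: it uses a Binder-type pairwise distance and an exponential penalty, in the spirit of centered-partition priors. It is not the HSP prior itself, and `psi` is a hypothetical shrinkage strength.

```python
import math
from itertools import combinations

def binder_distance(p1, p2):
    """Number of item pairs that are together in one partition but apart in the other.
    Partitions are label lists, e.g. [0, 0, 1] puts items 0 and 1 in the same block;
    the distance does not depend on how blocks are labeled."""
    n = len(p1)
    return sum((p1[i] == p1[j]) != (p2[i] == p2[j]) for i, j in combinations(range(n), 2))

def shrinkage_weight(partition, center, psi):
    """Unnormalized weight exp(-psi * d(partition, center)). Larger psi shrinks a subject's
    partition of the conditions more strongly toward the group-level (center) partition."""
    return math.exp(-psi * binder_distance(partition, center))

center = [0, 0, 1, 1]        # hypothetical group-level partition of 4 conditions
subject = [0, 0, 1, 2]       # a subject that splits the last two conditions
print(binder_distance(subject, center), shrinkage_weight(subject, center, psi=2.0))
```

Identical partitions get weight 1. Small deviations are penalized but still allowed, which mirrors the "similar rather than identical" clustering criterion.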
{"title":"Clustering computer mouse tracking data with informed hierarchical shrinkage partition priors.","authors":"Ziyi Song, Weining Shen, Marina Vannucci, Alexandria Baldizon, Paul M Cinciripini, Francesco Versace, Michele Guindani","doi":"10.1093/biomtc/ujae124","DOIUrl":"10.1093/biomtc/ujae124","url":null,"abstract":"<p><p>Mouse-tracking data, which record computer mouse trajectories while participants perform an experimental task, provide valuable insights into subjects' underlying cognitive processes. Neuroscientists are interested in clustering the subjects' responses during computer mouse-tracking tasks to reveal patterns of individual decision-making behaviors and identify population subgroups with similar neurobehavioral responses. These data can be combined with neuroimaging data to provide additional information for personalized interventions. In this article, we develop a novel hierarchical shrinkage partition (HSP) prior for clustering summary statistics derived from the trajectories of mouse-tracking data. The HSP model defines a subjects' cluster as a set of subjects that gives rise to more similar (rather than identical) nested partitions of the conditions. The proposed model can incorporate prior information about the partitioning of either subjects or conditions to facilitate clustering, and it allows for deviations of the nested partitions within each subject group. These features distinguish the HSP model from other bi-clustering methods that typically create identical nested partitions of conditions within a subject group. Furthermore, it differs from existing nested clustering methods, which define clusters based on common parameters in the sampling model and identify subject groups by different distributions. We illustrate the unique features of the HSP model on a mouse tracking dataset from a pilot study and in simulation studies. 
Our results show the ability and effectiveness of the proposed exploratory framework in clustering and revealing possible different behavioral patterns across subject groups.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11523067/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142543342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper introduces a model for longitudinal functional data analysis that accounts for pointwise skewness. The proposed procedure decouples the marginal pointwise variation from the complex longitudinal and functional dependence using copula methodology. Pointwise variation is described through parametric distribution functions that capture varying skewness and change smoothly both in time and over the functional argument. Joint dependence is quantified through a Gaussian copula with a low-rank approximation-based covariance. The introduced class of models provides a unifying platform for both pointwise quantile estimation and prediction of complete trajectories at new times. We investigate the methods numerically in simulations and discuss their application to a diffusion tensor imaging study of multiple sclerosis patients. This approach is implemented in the R package sLFDA that is publicly available on GitHub.
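The copula decoupling can be shown with a toy example. Each observation is mapped through its own pointwise marginal CDF and then through the standard normal quantile. Joint dependence then lives entirely in the Gaussian scores, and pointwise quantiles come directly from the marginal. This is a minimal sketch, assuming an exponential marginal as a stand-in for a skewed parametric family. The `rate(t, s)` surface is hypothetical; the paper's marginal family and its smooth parameterization are not reproduced here.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def rate(t, s):
    """Hypothetical marginal parameter, smooth in visit time t and functional argument s."""
    return math.exp(0.2 + 0.5 * t - 0.3 * s)

def marginal_cdf(y, t, s):
    """Exponential CDF: a toy right-skewed pointwise marginal F_{t,s}."""
    return 1.0 - math.exp(-rate(t, s) * y) if y > 0 else 0.0

def marginal_quantile(p, t, s):
    """Pointwise p-quantile obtained by inverting F_{t,s}."""
    return -math.log(1.0 - p) / rate(t, s)

def to_latent_gaussian(y, t, s):
    """Copula step z = Phi^{-1}(F_{t,s}(y)); requires 0 < F_{t,s}(y) < 1."""
    return STD_NORMAL.inv_cdf(marginal_cdf(y, t, s))

def from_latent_gaussian(z, t, s):
    """Back-transform y = F_{t,s}^{-1}(Phi(z)), e.g. for a predicted Gaussian score."""
    return marginal_quantile(STD_NORMAL.cdf(z), t, s)

z = to_latent_gaussian(0.7, t=1.0, s=0.5)
print(z, from_latent_gaussian(z, 1.0, 0.5), marginal_quantile(0.9, 1.0, 0.5))
```

In the full model, the vector of `z` values across times and functional arguments would follow a multivariate normal distribution with the low-rank covariance. Predicting a complete trajectory amounts to predicting `z` and back-transforming it pointwise.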
{"title":"Modeling longitudinal skewed functional data.","authors":"Mohammad Samsul Alam, Ana-Maria Staicu","doi":"10.1093/biomtc/ujae121","DOIUrl":"https://doi.org/10.1093/biomtc/ujae121","url":null,"abstract":"<p><p>This paper introduces a model for longitudinal functional data analysis that accounts for pointwise skewness. The proposed procedure decouples the marginal pointwise variation from the complex longitudinal and functional dependence using copula methodology. Pointwise variation is described through parametric distribution functions that capture varying skewness and change smoothly both in time and over the functional argument. Joint dependence is quantified through a Gaussian copula with a low-rank approximation-based covariance. The introduced class of models provides a unifying platform for both pointwise quantile estimation and prediction of complete trajectories at new times. We investigate the methods numerically in simulations and discuss their application to a diffusion tensor imaging study of multiple sclerosis patients. This approach is implemented in the R package sLFDA that is publicly available on GitHub.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142543343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Model averaging is an important tool for handling uncertainty from the model selection process and for fusing information from different models, and it has been widely used in various fields. However, most existing model averaging criteria are based on ordinary least squares or maximum likelihood. These methods are highly sensitive to outliers and to violations of model assumptions. For mean regression, no optimal robust model averaging methods have been developed. To fill this gap, we propose an outlier-robust model averaging approach based on a Mallows-type criterion. We first construct a generalized M (GM) estimator for each candidate model. We then build robust weighting schemes from the asymptotic expansion of the final prediction error based on the GM-type loss function. As a result, the approach still yields trustworthy results even when the dataset is contaminated by outliers in the response and/or the covariates. Asymptotic properties of the proposed robust model averaging estimators are established under some regularity conditions. We also show that our weight estimators are consistent, converging to the theoretically optimal weight vectors. We prove that our model averaging estimator is robust in the sense of having a bounded influence function. Further, we define the empirical prediction influence function to evaluate the quantitative robustness of the model averaging estimator. A simulation study and a real data analysis demonstrate the finite-sample performance of our estimators and compare them with other commonly used model selection and averaging methods.
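The general shape of a robust Mallows-type weight choice has two parts. A bounded-influence loss measures the fit of the weighted prediction, and a penalty grows with the weighted model dimension; the weights minimizing the sum are then selected. This is a simplified sketch under stated assumptions. It uses the Huber loss as the robust loss, a penalty `sigma2 * sum(w_m * k_m)`, and a grid search over two candidate models. It is not the paper's GM-based criterion derived from the final prediction error, and `sigma2`, `c`, and the data are hypothetical.

```python
def huber(r, c=1.345):
    """Huber loss: quadratic for |r| <= c, linear beyond, so one outlier cannot dominate."""
    a = abs(r)
    return 0.5 * r * r if a <= c else c * (a - 0.5 * c)

def robust_mallows_criterion(w, y, preds, dims, sigma2):
    """Robust fit of the weighted-average prediction plus a Mallows-style penalty
    sigma2 * sum_m w_m * k_m (k_m = number of parameters in candidate model m)."""
    fit = sum(huber(yi - sum(wm * p[i] for wm, p in zip(w, preds))) for i, yi in enumerate(y))
    return fit + sigma2 * sum(wm * k for wm, k in zip(w, dims))

def best_two_model_weights(y, preds, dims, sigma2, grid=101):
    """Grid search over the simplex {(w, 1 - w): 0 <= w <= 1} for two candidate models."""
    candidates = [(g / (grid - 1), 1 - g / (grid - 1)) for g in range(grid)]
    return min(candidates, key=lambda w: robust_mallows_criterion(w, y, preds, dims, sigma2))

y = [1.0, 2.0, 3.0, 50.0]                    # last response is an outlier
model_a = [1.0, 2.0, 3.0, 4.0]               # fitted values of a 2-parameter model
model_b = [2.0, 2.0, 2.0, 2.0]               # fitted values of a 1-parameter model
print(best_two_model_weights(y, [model_a, model_b], dims=[2, 1], sigma2=0.1))
```

Because the Huber loss grows only linearly beyond `c`, the outlying fourth observation contributes a bounded gradient to the weight choice. That is the intuition behind the bounded influence function established in the paper.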
{"title":"Robust model averaging approach by Mallows-type criterion.","authors":"Miaomiao Wang, Kang You, Lixing Zhu, Guohua Zou","doi":"10.1093/biomtc/ujae128","DOIUrl":"https://doi.org/10.1093/biomtc/ujae128","url":null,"abstract":"<p><p>Model averaging is an important tool for treating uncertainty from model selection process and fusing information from different models, and has been widely used in various fields. However, the most existing model averaging criteria are proposed based on the methods of ordinary least squares or maximum likelihood, which possess high sensitivity to outliers or violation of certain model assumption. For the mean regression, no optimal robust methods are developed. To fill this gap, in our paper, we propose an outlier-robust model averaging approach by Mallows-type criterion. The idea is that we first construct a generalized M (GM) estimator for each candidate model, and then build robust weighting schemes by the asymptotic expansion of the final prediction error based on the GM-type loss function. So, we can still achieve a trustworthy result even if the dataset is contaminated by outliers in response and/or covariates. Asymptotic properties of the proposed robust model averaging estimators are established under some regularity conditions. The consistency of our weight estimators tending to the theoretically optimal weight vectors is also derived. We prove that our model averaging estimator is robust in terms of having bounded influence function. Further, we define the empirical prediction influence function to evaluate the quantitative robustness of the model averaging estimator. 
A simulation study and a real data analysis are conducted to demonstrate the finite sample performance of our estimators and compare them with other commonly used model selection and averaging methods.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142614075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One of the goals of precision psychiatry is to characterize mental disorders in an individualized manner, taking into account the underlying dynamic processes. Recent advances in mobile technologies have enabled the collection of ecological momentary assessments that capture multiple responses in real time at high frequency. However, ecological momentary assessment data are often multidimensional, correlated, and hierarchical. Mixed-effects models are commonly used but may require restrictive assumptions about the fixed and random effects and the correlation structure. The recurrent temporal restricted Boltzmann machine (RTRBM) is a generative neural network that can be used to model temporal data. However, most existing RTRBM approaches do not account for potential heterogeneity in group dynamics within a population based on available covariates. In this paper, we propose a new temporal generative model, the HDRBM, to learn heterogeneous group dynamics. We demonstrate the effectiveness of this approach on simulated and real-world ecological momentary assessment datasets. We show that, by incorporating covariates, the HDRBM can improve accuracy and interpretability, help explore the underlying drivers of participants' group dynamics, and serve as a generative model for ecological momentary assessment studies.
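The abstract does not spell out the HDRBM architecture. The generic RTRBM mean-field recursion it builds on can still be sketched, together with one simple way covariates could shift the dynamics. This is a sketch under stated assumptions: hidden activations follow `h_t = sigmoid(W v_t + U h_{t-1} + b + C x)`. The covariate term `C x` is a hypothetical device for subgroup-specific dynamics, not the HDRBM's documented mechanism.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def hidden_trajectory(visible_seq, W, U, b, C, covariates):
    """Mean-field hidden activations of an RTRBM-style recursion:
    h_t = sigmoid(W v_t + U h_{t-1} + b + C x), with h_0 = 0.
    The covariate term C x (hypothetical) shifts the hidden bias per subject."""
    n_hidden = len(b)
    bias = [b[k] + sum(c * x for c, x in zip(C[k], covariates)) for k in range(n_hidden)]
    h_prev = [0.0] * n_hidden
    traj = []
    for v in visible_seq:
        h = [sigmoid(sum(W[k][i] * v[i] for i in range(len(v)))
                     + sum(U[k][m] * h_prev[m] for m in range(n_hidden))
                     + bias[k])
             for k in range(n_hidden)]
        traj.append(h)
        h_prev = h
    return traj

# Two hypothetical subjects with identical responses but different covariate values.
W, U, b, C = [[0.5, -0.5]], [[0.3]], [0.0], [[1.0]]
seq = [[1.0, 0.0], [0.0, 1.0]]
print(hidden_trajectory(seq, W, U, b, C, [0.0]))
print(hidden_trajectory(seq, W, U, b, C, [2.0]))
```

Identical response sequences yield different hidden trajectories when covariates differ. This illustrates how covariate information can separate group dynamics within a population.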
{"title":"Temporal generative models for learning heterogeneous group dynamics of ecological momentary assessment data.","authors":"Soohyun Kim, Young-Geun Kim, Yuanjia Wang","doi":"10.1093/biomtc/ujae115","DOIUrl":"10.1093/biomtc/ujae115","url":null,"abstract":"<p><p>One of the goals of precision psychiatry is to characterize mental disorders in an individualized manner, taking into account the underlying dynamic processes. Recent advances in mobile technologies have enabled the collection of ecological momentary assessments that capture multiple responses in real-time at high frequency. However, ecological momentary assessment data are often multi-dimensional, correlated, and hierarchical. Mixed-effect models are commonly used but may require restrictive assumptions about the fixed and random effects and the correlation structure. The recurrent temporal restricted Boltzmann machine (RTRBM) is a generative neural network that can be used to model temporal data, but most existing RTRBM approaches do not account for the potential heterogeneity of group dynamics within a population based on available covariates. In this paper, we propose a new temporal generative model, the HDRBM, to learn the heterogeneous group dynamics and demonstrate the effectiveness of this approach on simulated and real-world ecological momentary assessment datasets. 
We show that by incorporating covariates, HDRBM can improve accuracy and interpretability, explore the underlying drivers of the group dynamics of participants, and serve as a generative model for ecological momentary assessment studies.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11472390/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142457177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jack M Wolf, David M Vock, Xianghua Luo, Dorothy K Hatsukami, F Joseph McClernon, Joseph S Koopmeiners
Randomized trials seek efficient estimation of treatment effects within target populations, yet scientific interest often also centers on subpopulations. There are typically too few subjects within each subpopulation to estimate subpopulation treatment effects efficiently. Even so, one can gain precision by borrowing strength across subpopulations, as in a basket trial. Dynamic borrowing has been proposed as an efficient approach to estimating subpopulation treatment effects on primary endpoints. Additional efficiency could be gained by also leveraging the information in secondary endpoints. We propose a multisource exchangeability model (MEM) that incorporates secondary endpoints to assess subpopulation exchangeability more efficiently. Across simulation studies, our proposed model almost uniformly reduces the mean squared error compared with the standard MEM, which considers only data from the primary endpoint. It gains efficiency when subpopulations respond similarly to the treatment and reduces the magnitude of bias when the subpopulations are heterogeneous. We illustrate our model's feasibility using data from a recently completed trial of very low nicotine content cigarettes, estimating the effect on smoking abstinence within three priority subpopulations. The increases in effective sample size under our proposed model were two to four times greater than those under the standard MEM.
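Exchangeability assessment can be sketched in its simplest form with one primary and one supplementary subpopulation. Normal approximations for the estimates and an N(0, `tau2`) prior on the effect(s) are assumed. The sketch compares the marginal likelihood under a shared effect with that under separate effects. Secondary endpoints enter through one loudly stated simplifying assumption: all endpoints share a single exchangeability indicator and are independent, so their Bayes factors multiply. This is an illustration of the idea, not the model proposed in the paper, and all numbers are hypothetical.

```python
import math

def normal_pdf(x, var):
    return math.exp(-0.5 * x * x / var) / math.sqrt(2 * math.pi * var)

def bivariate_normal_pdf(x1, x2, v11, v22, v12):
    """Zero-mean bivariate normal density with covariance [[v11, v12], [v12, v22]]."""
    det = v11 * v22 - v12 * v12
    q = (v22 * x1 * x1 - 2 * v12 * x1 * x2 + v11 * x2 * x2) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def exchangeability_bayes_factor(est_p, var_p, est_s, var_s, tau2):
    """Marginal likelihood ratio for one endpoint: 'exchangeable' (one shared effect
    ~ N(0, tau2)) versus 'not exchangeable' (separate effects, each ~ N(0, tau2))."""
    m_exch = bivariate_normal_pdf(est_p, est_s, var_p + tau2, var_s + tau2, tau2)
    m_sep = normal_pdf(est_p, var_p + tau2) * normal_pdf(est_s, var_s + tau2)
    return m_exch / m_sep

def posterior_exchangeability(endpoints, tau2=1.0, prior=0.5):
    """endpoints: list of (est_p, var_p, est_s, var_s), one tuple per endpoint.
    ASSUMPTION (illustrative only): one exchangeability indicator shared by all
    endpoints, and independent endpoints, so Bayes factors multiply."""
    bf = 1.0
    for est_p, var_p, est_s, var_s in endpoints:
        bf *= exchangeability_bayes_factor(est_p, var_p, est_s, var_s, tau2)
    odds = bf * prior / (1 - prior)
    return odds / (1 + odds)

primary = (0.5, 0.1, 0.5, 0.1)      # hypothetical: subpopulations agree on primary endpoint
secondary = (0.4, 0.1, 0.45, 0.1)   # hypothetical: and on a secondary endpoint
print(posterior_exchangeability([primary]), posterior_exchangeability([primary, secondary]))
```

When a secondary endpoint also shows agreement, the posterior probability of exchangeability rises. More borrowing, and hence a larger effective sample size, follows. This is the mechanism the abstract points to, in a deliberately simplified form.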
{"title":"Leveraging information from secondary endpoints to enhance dynamic borrowing across subpopulations.","authors":"Jack M Wolf, David M Vock, Xianghua Luo, Dorothy K Hatsukami, F Joseph McClernon, Joseph S Koopmeiners","doi":"10.1093/biomtc/ujae118","DOIUrl":"https://doi.org/10.1093/biomtc/ujae118","url":null,"abstract":"<p><p>Randomized trials seek efficient treatment effect estimation within target populations, yet scientific interest often also centers on subpopulations. Although there are typically too few subjects within each subpopulation to efficiently estimate these subpopulation treatment effects, one can gain precision by borrowing strength across subpopulations, as is the case in a basket trial. While dynamic borrowing has been proposed as an efficient approach to estimating subpopulation treatment effects on primary endpoints, additional efficiency could be gained by leveraging the information found in secondary endpoints. We propose a multisource exchangeability model (MEM) that incorporates secondary endpoints to more efficiently assess subpopulation exchangeability. Across simulation studies, our proposed model almost uniformly reduces the mean squared error when compared to the standard MEM that only considers data from the primary endpoint by gaining efficiency when subpopulations respond similarly to the treatment and reducing the magnitude of bias when the subpopulations are heterogeneous. We illustrate our model's feasibility using data from a recently completed trial of very low nicotine content cigarettes to estimate the effect on abstinence from smoking within three priority subpopulations. 
Our proposed model led to increases in the effective sample size two to four times greater than under the standard MEM.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11498028/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142494173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Huimin Li, Bencong Zhu, Xi Jiang, Lei Guo, Yang Xie, Lin Xu, Qiwei Li
Recent breakthroughs in spatially resolved transcriptomics (SRT) technologies have enabled comprehensive molecular characterization at the spot or cellular level while preserving spatial information. Cells are the fundamental building blocks of tissues, organized into distinct yet connected components. Many non-spatial and spatial clustering approaches have been used to partition the entire region into mutually exclusive spatial domains based on the high-dimensional SRT molecular profile. However, most of them require an ad hoc choice of dimension-reduction technique whose output is difficult to interpret. To overcome this challenge, we propose a zero-inflated negative binomial mixture model to cluster spots or cells based on their molecular profiles. To increase interpretability, we employ a feature selection mechanism. It provides a low-dimensional summary of the SRT molecular profile in terms of discriminating genes, which shed light on the clustering result. We further incorporate the SRT geospatial profile via a Markov random field prior. Through simulation studies and three real data applications, we demonstrate how this joint modeling strategy improves clustering accuracy compared with alternative state-of-the-art approaches.
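Two building blocks named above can be sketched. One is a zero-inflated negative binomial (ZINB) count likelihood. The other is a Markov random field term that favors giving a spot the same cluster label as its neighbors. This is a minimal sketch, assuming the common mean/dispersion NB parameterization (variance `mu + mu**2 / phi`) and a Potts-type neighbor-agreement term. `beta`, the cluster means, and the counts are hypothetical, and the paper's exact parameterization and feature-selection indicators are not reproduced.

```python
import math

def zinb_logpmf(y, pi, mu, phi):
    """Log-pmf of a zero-inflated negative binomial count (0 <= pi < 1).
    pi: probability of a structural (extra) zero; mu: NB mean; phi: NB dispersion."""
    log_nb = (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
              + phi * math.log(phi / (phi + mu)) + y * math.log(mu / (phi + mu)))
    if y == 0:
        # A zero can come from either the structural-zero component or the NB component.
        return math.log(pi + (1 - pi) * math.exp(log_nb))
    return math.log(1 - pi) + log_nb

def cluster_score(counts, cluster_means, pi, phi, neighbor_labels, k, beta=1.0):
    """Unnormalized log posterior for assigning one spot to cluster k: ZINB log-likelihood
    over (selected) genes plus beta * (number of neighbors already labeled k)."""
    loglik = sum(zinb_logpmf(y, pi, mu, phi) for y, mu in zip(counts, cluster_means[k]))
    return loglik + beta * sum(1 for lab in neighbor_labels if lab == k)

means = [[1.0, 1.0], [20.0, 20.0]]   # hypothetical per-cluster means for 2 genes
scores = [cluster_score([18, 22], means, 0.1, 2.0, [1, 1, 0], k) for k in range(2)]
print(scores)
```

The likelihood term rewards clusters whose gene-level means match the observed counts. The neighbor term rewards spatially coherent labels, which is how the geospatial profile enters the clustering.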
{"title":"An interpretable Bayesian clustering approach with feature selection for analyzing spatially resolved transcriptomics data.","authors":"Huimin Li, Bencong Zhu, Xi Jiang, Lei Guo, Yang Xie, Lin Xu, Qiwei Li","doi":"10.1093/biomtc/ujae066","DOIUrl":"10.1093/biomtc/ujae066","url":null,"abstract":"<p><p>Recent breakthroughs in spatially resolved transcriptomics (SRT) technologies have enabled comprehensive molecular characterization at the spot or cellular level while preserving spatial information. Cells are the fundamental building blocks of tissues, organized into distinct yet connected components. Although many non-spatial and spatial clustering approaches have been used to partition the entire region into mutually exclusive spatial domains based on the SRT high-dimensional molecular profile, most require an ad hoc selection of less interpretable dimensional-reduction techniques. To overcome this challenge, we propose a zero-inflated negative binomial mixture model to cluster spots or cells based on their molecular profiles. To increase interpretability, we employ a feature selection mechanism to provide a low-dimensional summary of the SRT molecular profile in terms of discriminating genes that shed light on the clustering result. We further incorporate the SRT geospatial profile via a Markov random field prior. 
We demonstrate how this joint modeling strategy improves clustering accuracy, compared with alternative state-of-the-art approaches, through simulation studies and 3 real data applications.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11285114/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141787236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Federico Castelletti, Guido Consonni, Marco L Della Vedova
This paper considers a multivariate setting involving categorical variables. The goal is to evaluate the causal effect of an external manipulation of one variable on an outcome of interest. A typical scenario involves a system of variables representing lifestyle, physical and mental features, symptoms, and risk factors, with the outcome being the presence or absence of a disease. These variables are interconnected in complex ways, allowing the effect of an intervention to propagate through multiple paths. A distinctive feature of our approach is that causal effects are estimated while accounting for uncertainty in both the dependence structure and the DAG-model parameters. We represent the dependence structure through a directed acyclic graph (DAG). Specifically, we propose a Markov chain Monte Carlo algorithm that targets the joint posterior over DAGs and parameters, based on an efficient reversible-jump proposal scheme. We validate our method through extensive simulation studies and demonstrate that it outperforms current state-of-the-art procedures in terms of estimation accuracy. Finally, we apply our methodology to analyze a dataset on depression and anxiety in undergraduate students.
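The two ideas above can be combined in a short sketch. The first computes a causal effect from categorical conditional probability tables under one DAG. The second averages DAG-specific effects with posterior DAG probabilities, so that structural uncertainty propagates into the estimate. This is a simplified sketch under stated assumptions: a single binary confounder `Z` and back-door adjustment under the first DAG. A second DAG has no causal path from `X` to `Y` (effect 0), and the posterior weights are hypothetical. With MCMC output such as the paper's sampler produces, one would instead average the effect computed at each draw.

```python
def interventional_prob(p_y_given_xz, p_z, x):
    """Back-door adjustment: P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z),
    valid when Z blocks all back-door paths from X to Y in the assumed DAG."""
    return sum(p_y_given_xz[(x, z)] * pz for z, pz in p_z.items())

def bma_causal_effect(dag_effects, dag_posteriors):
    """Posterior-weighted average of DAG-specific causal effects."""
    total = sum(dag_posteriors)
    return sum(e * w for e, w in zip(dag_effects, dag_posteriors)) / total

# Hypothetical tables for DAG 1 (Z -> X, Z -> Y, X -> Y).
p_y = {(1, 0): 0.6, (1, 1): 0.8, (0, 0): 0.3, (0, 1): 0.5}
p_z = {0: 0.7, 1: 0.3}
effect_dag1 = interventional_prob(p_y, p_z, 1) - interventional_prob(p_y, p_z, 0)
effect_dag2 = 0.0   # hypothetical DAG 2 with no directed path from X to Y
print(effect_dag1, bma_causal_effect([effect_dag1, effect_dag2], [0.8, 0.2]))
```

Here the DAG-specific risk difference is about 0.30. Weighting it by a posterior probability of 0.8 for the first DAG shrinks the averaged effect to about 0.24. Ignoring structural uncertainty would overstate the effect.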
Joint structure learning and causal effect estimation for categorical graphical models. Biometrics 80(3), 2024-07-01. DOI: 10.1093/biomtc/ujae067
Electronic health records and other sources of observational data are increasingly used to draw causal inferences. Because these data are not collected for research purposes, estimating a causal effect from them is complicated by confounding and by irregularly spaced observation times that depend on covariates, both of which affect inference. A previously proposed doubly weighted estimator accounts for both features, but it relies on correct specification of the two nuisance models used to construct the weights. In this work, we propose a novel, consistent, multiply robust estimator. We demonstrate, analytically and in comprehensive simulation studies, that it is more flexible and more efficient than the only alternative estimator proposed for this setting. We further apply it to data from the Add Health study in the United States to estimate the causal effect of therapy counseling on alcohol consumption among American adolescents.
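For orientation, the weighting idea behind a doubly weighted estimator can be sketched as follows. The assumption is that its two nuisance models are a treatment (propensity) model and a visit-intensity model. Each observed visit is weighted by the inverse of P(A = a | X) times its relative visit intensity, and weighted mean outcomes are compared between treatment groups. This is a simplified, Hajek-normalized sketch, not the authors' multiply robust estimator. The record layout, the function name `doubly_weighted_effect`, and the premise that propensities and intensities are already estimated are all hypothetical.

```python
# Hypothetical record layout: one row per observed visit, holding the treatment
# indicator, the outcome, and two nuisance quantities assumed to be estimated
# beforehand: the propensity P(A=1 | X) and the relative visit intensity given X.

def doubly_weighted_effect(records):
    """Weighted difference in mean outcome between treated and untreated visits.

    Each visit gets weight 1 / (P(A = a | X) * visit_intensity), combining
    inverse probability of treatment weighting (for confounding) with inverse
    intensity of visit weighting (for covariate-driven observation times).
    Weighted means are normalized by the sum of weights (Hajek form).
    """
    weighted_sum = {0: 0.0, 1: 0.0}
    weight_total = {0: 0.0, 1: 0.0}
    for a, y, propensity, visit_intensity in records:
        p_a = propensity if a == 1 else 1.0 - propensity
        w = 1.0 / (p_a * visit_intensity)
        weighted_sum[a] += w * y
        weight_total[a] += w
    mean_treated = weighted_sum[1] / weight_total[1]
    mean_control = weighted_sum[0] / weight_total[0]
    return mean_treated - mean_control


# Four made-up visits: (treatment, outcome, propensity, visit_intensity)
records = [
    (1, 2.0, 0.50, 1.0),
    (1, 4.0, 0.50, 2.0),
    (0, 1.0, 0.50, 1.0),
    (0, 1.0, 0.25, 1.0),
]
print(doubly_weighted_effect(records))  # 8/3 - 1, about 1.667
```

With a binary treatment, this weighted mean difference equals the slope of a weighted least-squares marginal structural model E[Y^a] = b0 + b1 * a. If either nuisance quantity is misspecified, the weights no longer remove the corresponding bias. That dependence on both models is the limitation the multiply robust estimator is designed to relax.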
Multiply robust estimation of marginal structural models in observational studies subject to covariate-driven observations. Janie Coulombe, Shu Yang. Biometrics 80(3), 2024-07-01. DOI: 10.1093/biomtc/ujae065. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11250490/pdf/