
Biostatistics: Latest Articles

A Bayesian nonparametric approach to correct for underreporting in count data.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biostatistics/kxad027
Serena Arima, Silvia Polettini, Giuseppe Pasculli, Loreto Gesualdo, Francesco Pesce, Deni-Aldo Procaccini

We propose a nonparametric compound Poisson model for underreported count data that introduces a latent clustering structure for the reporting probabilities. The latter are estimated with the model's parameters based on experts' opinion and exploiting a proxy for the reporting process. The proposed model is used to estimate the prevalence of chronic kidney disease in Apulia, Italy, based on a unique statistical database covering information on m = 258 municipalities obtained by integrating multisource register information. Accurate prevalence estimates are needed for monitoring, surveillance, and management purposes; yet, counts are deemed to be considerably underreported, especially in some areas of Apulia, one of the most deprived and heterogeneous regions in Italy. Our results agree with previous findings and highlight interesting geographical patterns of the disease. We compare our model to existing approaches in the literature using simulated as well as real data on early neonatal mortality risk in Brazil, described in previous research: the proposed approach proves to be accurate and particularly suitable when partial information about data quality is available.
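The core idea of underreporting can be illustrated with a much simpler model than the one in the abstract. The sketch below assumes a single, known reporting probability and independent reporting of each case (binomial thinning); it is not the authors' nonparametric compound Poisson model, and the names `simulate_reported` and `corrected_count` are hypothetical.

```python
import random


def simulate_reported(true_count: int, reporting_prob: float, rng: random.Random) -> int:
    """Binomial thinning: each true case is reported independently
    with probability reporting_prob."""
    return sum(rng.random() < reporting_prob for _ in range(true_count))


def corrected_count(observed: int, reporting_prob: float) -> float:
    """Naive correction: divide the observed count by the (assumed known)
    reporting probability."""
    if not 0.0 < reporting_prob <= 1.0:
        raise ValueError("reporting_prob must lie in (0, 1]")
    return observed / reporting_prob


rng = random.Random(0)
true_cases = 1000        # illustrative value, not from the paper
reporting_prob = 0.6     # illustrative value, not from the paper

reported = simulate_reported(true_cases, reporting_prob, rng)
estimate = corrected_count(reported, reporting_prob)
print(f"reported={reported}, corrected estimate={estimate:.1f}")
```

In practice the reporting probability is unknown and varies by area, which is why the abstract describes estimating it jointly with the other parameters using expert opinion and a proxy.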

Pages: 904-918. Publication type: Journal Article.
Citations: 0
Analyzing microbial evolution through gene and genome phylogenies.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biostatistics/kxad025
Sarah Teichman, Michael D Lee, Amy D Willis

Microbiome scientists critically need modern tools to explore and analyze microbial evolution. Often this involves studying the evolution of microbial genomes as a whole. However, different genes in a single genome can be subject to different evolutionary pressures, which can result in distinct gene-level evolutionary histories. To address this challenge, we propose to treat estimated gene-level phylogenies as data objects, and present an interactive method for the analysis of a collection of gene phylogenies. We use a local linear approximation of phylogenetic tree space to visualize estimated gene trees as points in low-dimensional Euclidean space, and address important practical limitations of existing related approaches, allowing an intuitive visualization of complex data objects. We demonstrate the utility of our proposed approach through microbial data analyses, including by identifying outlying gene histories in strains of Prevotella, and by contrasting Streptococcus phylogenies estimated using different gene sets. Our method is available as an open-source R package, and assists with estimating, visualizing, and interacting with a collection of bacterial gene phylogenies.

Pages: 786-800. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247178/pdf/
Citations: 0
A Bayesian nonparametric approach for multiple mediators with applications in mental health studies.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biostatistics/kxad038
Samrat Roy, Michael J Daniels, Jason Roy

Mediation analysis with contemporaneously observed multiple mediators is a significant area of causal inference. Recent approaches for multiple mediators are often based on parametric models and thus may suffer from model misspecification. Also, much of the existing literature either allows estimation of only the joint mediation effect or estimates the joint mediation effect as just the sum of individual mediator effects, ignoring the interactions among the mediators. In this article, we propose a novel Bayesian nonparametric method that overcomes the two aforementioned drawbacks. We model the joint distribution of the observed data (outcome, mediators, treatment, and confounders) flexibly using an enriched Dirichlet process mixture with three levels. We use standardization (g-computation) to compute all possible mediation effects, including pairwise and all other possible interactions among the mediators. We thoroughly explore our method via simulations and apply it to mental health data from the Wisconsin Longitudinal Study, where we estimate how the effect of births from unintended pregnancies on later-life depression (CES-D) among the mothers is mediated through lack of self-acceptance and autonomy, employment instability, lack of social participation, and increased family stress. Our method identified significant individual mediators, along with some significant pairwise effects.

Pages: 919-932. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247183/pdf/
Citations: 0
Variable selection in high dimensions for discrete-outcome individualized treatment rules: Reducing severity of depression symptoms.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biostatistics/kxad022
Erica E M Moodie, Zeyu Bian, Janie Coulombe, Yi Lian, Archer Y Yang, Susan M Shortreed

Despite growing interest in estimating individualized treatment rules, little attention has been given to the binary outcome setting. Estimation is challenging with nonlinear link functions, especially when variable selection is needed. We use a new computational approach to solve a recently proposed doubly robust regularized estimating equation to accomplish this difficult task in a case study of depression treatment. We demonstrate an application of this new approach in combination with a weighted and penalized estimating equation to this challenging binary outcome setting. We demonstrate the double robustness of the method and its effectiveness for variable selection. The work is motivated by and applied to an analysis of treatment for unipolar depression using a population of patients treated at Kaiser Permanente Washington.

Pages: 633-647. Publication type: Journal Article.
Citations: 0
An integrative latent class model of heterogeneous data modalities for diagnosing kidney obstruction.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biostatistics/kxad020
Jeong Hoon Jang, Changgee Chang, Amita K Manatunga, Andrew T Taylor, Qi Long

Radionuclide imaging plays a critical role in the diagnosis and management of kidney obstruction. However, most practicing radiologists in US hospitals have insufficient time and resources to acquire the training and experience needed to interpret radionuclide images, leading to increased diagnostic errors. To tackle this problem, Emory University embarked on a study that aims to develop a computer-assisted diagnostic (CAD) tool for kidney obstruction by mining and analyzing patient data comprising renogram curves, ordinal expert ratings on the obstruction status, pharmacokinetic variables, and demographic information. The major challenges here are the heterogeneity in data modes and the lack of a gold standard for determining kidney obstruction. In this article, we develop a statistically principled CAD tool based on an integrative latent class model that leverages heterogeneous data modalities available for each patient to provide accurate prediction of kidney obstruction. Our integrative model consists of three sub-models (multilevel functional latent factor regression model, probit scalar-on-function regression model, and Gaussian mixture model), each of which is tailored to the specific data mode and depends on the unknown obstruction status (latent class). An efficient MCMC algorithm is developed to train the model and predict kidney obstruction with associated uncertainty. Extensive simulations are conducted to evaluate the performance of the proposed method. An application to an Emory renal study demonstrates the usefulness of our model as a CAD tool for kidney obstruction.

Pages: 769-785. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247177/pdf/
Citations: 0
Quantification and statistical modeling of droplet-based single-nucleus RNA-sequencing data.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biostatistics/kxad010
Albert Kuo, Kasper D Hansen, Stephanie C Hicks

In complex tissues containing cells that are difficult to dissociate, single-nucleus RNA-sequencing (snRNA-seq) has become the preferred experimental technology over single-cell RNA-sequencing (scRNA-seq) to measure gene expression. To accurately model these data in downstream analyses, previous work has shown that droplet-based scRNA-seq data are not zero-inflated, but whether droplet-based snRNA-seq data follow the same probability distributions has not been systematically evaluated. Using pseudonegative control data from nuclei in mouse cortex sequenced with the 10x Genomics Chromium system and mouse kidney sequenced with the DropSeq system, we found that droplet-based snRNA-seq data follow a negative binomial distribution, suggesting that parametric statistical models applied to scRNA-seq are transferable to snRNA-seq. Furthermore, we found that the quantification choices in adapting quantification mapping strategies from scRNA-seq to snRNA-seq can play a significant role in downstream analyses and biological interpretation. In particular, reference transcriptomes that do not include intronic regions result in significantly smaller library sizes and incongruous cell type classifications. We also confirmed the presence of a gene length bias in snRNA-seq data, which we show is present in both exonic and intronic reads, and investigate potential causes for the bias.

Pages: 801-817. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247185/pdf/
Citations: 0
Blurring cluster randomized trials and observational studies: Two-Stage TMLE for subsampling, missingness, and few independent units.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biostatistics/kxad015
Joshua R Nugent, Carina Marquez, Edwin D Charlebois, Rachel Abbott, Laura B Balzer

Cluster randomized trials (CRTs) often enroll large numbers of participants; yet due to resource constraints, only a subset of participants may be selected for outcome assessment, and those sampled may not be representative of all cluster members. Missing data also present a challenge: if sampled individuals with measured outcomes are dissimilar from those with missing outcomes, unadjusted estimates of arm-specific endpoints and the intervention effect may be biased. Further, CRTs often enroll and randomize few clusters, limiting statistical power and raising concerns about finite sample performance. Motivated by SEARCH-TB, a CRT aimed at reducing incident tuberculosis infection, we demonstrate interlocking methods to handle these challenges. First, we extend Two-Stage targeted minimum loss-based estimation to account for three sources of missingness: (i) subsampling; (ii) measurement of baseline status among those sampled; and (iii) measurement of final status among those in the incidence cohort (persons known to be at risk at baseline). Second, we critically evaluate the assumptions under which subunits of the cluster can be considered the conditionally independent unit, improving precision and statistical power but also causing the CRT to behave like an observational study. Our application to SEARCH-TB highlights the real-world impact of different assumptions on measurement and dependence; estimates relying on unrealistic assumptions suggested the intervention increased the incidence of TB infection by 18% (risk ratio [RR]=1.18, 95% confidence interval [CI]: 0.85-1.63), while estimates accounting for the sampling scheme, missingness, and within-community dependence found the intervention decreased incident TB by 27% (RR=0.73, 95% CI: 0.57-0.92).
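The percentages quoted in the abstract follow directly from the risk ratios. The short sketch below shows that arithmetic and a simple check of whether each 95% interval excludes the null value of 1. The helper names are hypothetical, and the standard error is backed out under the assumption that the interval is symmetric on the log scale, which the abstract does not state.

```python
import math


def percent_change(rr: float) -> float:
    """Relative change in risk implied by a risk ratio, in percent."""
    return (rr - 1.0) * 100.0


def excludes_null(ci_low: float, ci_high: float) -> bool:
    """True when the interval for the risk ratio does not contain 1."""
    return ci_high < 1.0 or ci_low > 1.0


def log_scale_se(ci_low: float, ci_high: float, z: float = 1.96) -> float:
    """Approximate SE of log(RR), assuming a Wald-type interval on the log scale."""
    return (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)


# (RR, lower, upper) exactly as reported in the abstract.
estimates = {
    "unrealistic assumptions": (1.18, 0.85, 1.63),
    "full adjustment": (0.73, 0.57, 0.92),
}

for label, (rr, low, high) in estimates.items():
    print(
        f"{label}: {percent_change(rr):+.0f}% change, "
        f"excludes 1: {excludes_null(low, high)}, "
        f"SE(log RR) ~ {log_scale_se(low, high):.3f}"
    )
```

Only the fully adjusted interval excludes 1, which is why the two analyses lead to different conclusions about the intervention.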

Pages: 599-616. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247188/pdf/
Citations: 0
Uncertainty directed factorial clinical trials.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biostatistics/kxad036
Gopal Kotecha, Steffen Ventz, Sandra Fortini, Lorenzo Trippa

The development and evaluation of novel treatment combinations is a key component of modern clinical research. The primary goals of factorial clinical trials of treatment combinations range from the estimation of intervention-specific effects, or the discovery of potential synergies, to the identification of combinations with the highest response probabilities. Most factorial studies use balanced or block randomization, with an equal number of patients assigned to each treatment combination, irrespective of the specific goals of the trial. Here, we introduce a class of Bayesian response-adaptive designs for factorial clinical trials with binary outcomes. The study design was developed using Bayesian decision-theoretic arguments and adapts the randomization probabilities to treatment combinations during the enrollment period based on the available data. Our approach enables the investigator to specify a utility function representative of the aims of the trial, and the Bayesian response-adaptive randomization algorithm aims to maximize this utility function. We considered several utility functions and factorial designs tailored to them. Then, we conducted a comparative simulation study to illustrate relevant differences of key operating characteristics across the resulting designs. We also investigated the asymptotic behavior of the proposed adaptive designs. We also used data summaries from three recent factorial trials in perioperative care, smoking cessation, and infectious disease prevention to define realistic simulation scenarios and illustrate advantages of the introduced trial designs compared to other study designs.
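To give a feel for response-adaptive randomization with binary outcomes, the sketch below uses a generic Thompson-style rule: each treatment combination gets an independent Beta(1, 1) prior, and allocation probabilities are set to the posterior probability that the combination has the highest response rate. This is a swapped-in textbook technique, not the utility-based design described in the abstract; the arm labels, counts, and the function `probability_best` are all hypothetical.

```python
import random


def probability_best(successes, failures, rng, draws=4000):
    """Monte Carlo estimate of P(arm has the highest response rate | data),
    using independent Beta(1 + successes, 1 + failures) posteriors."""
    wins = [0] * len(successes)
    for _ in range(draws):
        # One posterior draw per arm.
        sampled = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        wins[sampled.index(max(sampled))] += 1
    return [w / draws for w in wins]


# A 2x2 factorial layout with made-up interim data.
arms = ["neither", "A only", "B only", "A + B"]
successes = [2, 3, 4, 18]
failures = [18, 17, 16, 2]

probs = probability_best(successes, failures, random.Random(1))
for arm, p in zip(arms, probs):
    print(f"{arm}: allocation probability {p:.3f}")
```

A rule like this concentrates enrollment on the apparently best combination; a design aimed at estimating individual effects or synergies would need a different objective, which is the role the utility function plays in the abstract.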

Pages: 833-851 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247193/pdf/
Citations: 0
A Bayesian multivariate factor analysis model for causal inference using time-series observational data on mixed outcomes.
IF 1.8 · Zone 3 (Mathematics) · Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY · Pub Date: 2024-07-01 · DOI: 10.1093/biostatistics/kxad030
Pantelis Samartsidis, Shaun R Seaman, Abbie Harrison, Angelos Alexopoulos, Gareth J Hughes, Christopher Rawlinson, Charlotte Anderson, André Charlett, Isabel Oliver, Daniela De Angelis

Assessing the impact of an intervention by using time-series observational data on multiple units and outcomes is a frequent problem in many fields of scientific research. Here, we propose a novel Bayesian multivariate factor analysis model for estimating intervention effects in such settings and develop an efficient Markov chain Monte Carlo algorithm to sample from the high-dimensional and nontractable posterior of interest. The proposed method is one of the few that can simultaneously deal with outcomes of mixed type (continuous, binomial, count), increase efficiency in the estimates of the causal effects by jointly modeling multiple outcomes affected by the intervention, and easily provide uncertainty quantification for all causal estimands of interest. Using the proposed approach, we evaluate the impact that Local Tracing Partnerships had on the effectiveness of England's Test and Trace programme for COVID-19.

Pages: 867-884 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247182/pdf/
Citations: 0
Covariate-guided Bayesian mixture of spline experts for the analysis of multivariate high-density longitudinal data.
IF 1.8 · Zone 3 (Mathematics) · Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY · Pub Date: 2024-07-01 · DOI: 10.1093/biostatistics/kxad034
Haoyi Fu, Lu Tang, Ori Rosen, Alison E Hipwell, Theodore J Huppert, Robert T Krafty

With rapid development of techniques to measure brain activity and structure, statistical methods for analyzing modern brain-imaging data play an important role in the advancement of science. Imaging data that measure brain function are usually multivariate high-density longitudinal data and are heterogeneous across both imaging sources and subjects, which lead to various statistical and computational challenges. In this article, we propose a group-based method to cluster a collection of multivariate high-density longitudinal data via a Bayesian mixture of smoothing splines. Our method assumes each multivariate high-density longitudinal trajectory is a mixture of multiple components with different mixing weights. Time-independent covariates are assumed to be associated with the mixture components and are incorporated via logistic weights of a mixture-of-experts model. We formulate this approach under a fully Bayesian framework using Gibbs sampling where the number of components is selected based on a deviance information criterion. The proposed method is compared to existing methods via simulation studies and is applied to a study on functional near-infrared spectroscopy, which aims to understand infant emotional reactivity and recovery from stress. The results reveal distinct patterns of brain activity, as well as associations between these patterns and selected covariates.
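
The "logistic weights" of a mixture-of-experts model can be sketched as a multinomial-logistic (softmax) gating function: for covariates x and component coefficients beta_k, the weight of component k is exp(x·beta_k) divided by the sum of exp(x·beta_j) over all components. The snippet below is a minimal illustration of that general form only; the function name, covariate layout, and coefficient values are hypothetical, and the paper's exact parameterization may differ.

```python
import math


def gating_weights(covariates, coefficients):
    """Mixing weights from a multinomial-logistic (softmax) gate.

    covariates:   list of covariate values for one subject
                  (include a leading 1.0 if an intercept is wanted).
    coefficients: one coefficient vector per mixture component.
    """
    scores = [sum(b * x for b, x in zip(beta, covariates)) for beta in coefficients]
    top = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical subject: intercept plus one time-independent covariate.
x = [1.0, 0.5]
# Hypothetical coefficients for three components; the first is the reference.
betas = [[0.0, 0.0], [0.2, 1.0], [-0.5, 2.0]]
weights = gating_weights(x, betas)
```

Fixing one component's coefficients at zero, as in the example, is a common identifiability convention for this kind of gate.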

Pages: 666-680 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247181/pdf/
Citations: 0