Biostatistics: Latest Articles

Unlocking the power of time-since-infection models: data augmentation for improved instantaneous reproduction number estimation.
IF 1.8 | Zone 3 (Mathematics) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-31 | DOI: 10.1093/biostatistics/kxae054
Jiasheng Shi, Yizhao Zhou, Jing Huang

Time-since-infection (TSI) models, which use disease surveillance data to model infectious diseases, have become increasingly popular due to their flexibility and capacity to address complex disease control questions. However, a notable limitation of TSI models is their primary reliance on incidence data. Even when hospitalization data are available, existing TSI models have not been crafted to improve the estimation of disease transmission or to estimate hospitalization-related parameters, metrics that are crucial for understanding a pandemic and planning hospital resources. Moreover, their dependence on reported infection data makes them vulnerable to variations in data quality. In this study, we advance TSI models by integrating hospitalization data, marking a significant step forward in modeling with TSI models. We introduce hospitalization propensity parameters to jointly model incidence and hospitalization data. We use a composite likelihood function to accommodate the complex data structure and a Monte Carlo expectation-maximization algorithm to estimate model parameters. We analyze COVID-19 data to estimate disease transmission, assess risk factor impacts, and calculate hospitalization propensity. Our model improves the accuracy of estimating the instantaneous reproduction number in TSI models, particularly when hospitalization data are of higher quality than incidence data. It enables the estimation of key infectious disease parameters without relying on contact tracing data and provides a foundation for integrating TSI models with other infectious disease models.
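
As a rough illustration of the quantity discussed above, the instantaneous reproduction number is commonly written in renewal-equation form as R_t = I_t / sum_s w_s I_{t-s}, where I_t is incidence on day t and w_s is the generation-interval weight at lag s. The sketch below computes only this naive ratio. It is not the composite-likelihood or Monte Carlo EM procedure of the paper, and the function name, the toy case counts, and the one-day generation interval are all assumptions made for illustration.

```python
def instantaneous_r(incidence, gen_interval):
    """Naive renewal-equation estimate R_t = I_t / sum_s w_s * I_{t-s}.

    incidence: new-case counts I_0, I_1, ... (one per day).
    gen_interval: weights w_1, w_2, ... for lags of 1, 2, ... days,
    assumed to sum to 1.
    Returns a list with None wherever the denominator is zero.
    """
    estimates = []
    for t in range(len(incidence)):
        # Total infectiousness on day t contributed by earlier cases
        pressure = sum(
            w * incidence[t - s]
            for s, w in enumerate(gen_interval, start=1)
            if t - s >= 0
        )
        estimates.append(incidence[t] / pressure if pressure > 0 else None)
    return estimates


# Hypothetical toy data: cases double daily, transmission exactly 1 day after infection
cases = [10, 20, 40, 80]
weights = [1.0]
print(instantaneous_r(cases, weights))  # [None, 2.0, 2.0, 2.0]
```

Because the ratio depends directly on reported incidence, under-reporting on some days distorts it, which is the data-quality sensitivity the abstract points to.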

Biostatistics 26(1). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11878408/pdf/
Citations: 0
Penalized likelihood optimization for censored missing value imputation in proteomics.
IF 1.8 | Zone 3 (Mathematics) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-31 | DOI: 10.1093/biostatistics/kxaf006
Lucas Etourneau, Laura Fancello, Samuel Wieczorek, Nelle Varoquaux, Thomas Burger

Label-free bottom-up proteomics using mass spectrometry and liquid chromatography has long been established as one of the most popular high-throughput analysis workflows for proteome characterization. However, it produces data hindered by complex and heterogeneous missing values, whose imputation has long remained problematic. To cope with this, we introduce Pirat, an algorithm that tackles this challenge using an original likelihood maximization strategy. Notably, it models the instrument limit by learning a global censoring mechanism from the data available. Moreover, it estimates the covariance matrix between enzymatic cleavage products (i.e. peptides or precursor ions), while offering a natural way to integrate complementary transcriptomic information when multi-omic assays are available. Our benchmarking on several datasets covering a variety of experimental designs (number of samples, acquisition mode, missingness patterns, etc.) and using a variety of metrics (differential analysis ground truth or imputation errors) shows that Pirat outperforms all pre-existing imputation methods. Beyond the interest of Pirat as an imputation tool, these results pinpoint the need for a paradigm change in proteomics imputation, as most pre-existing strategies could be boosted by incorporating similar models to account for the instrument censorship or for the correlation structures, either grounded in the analytical pipeline or arising from a multi-omic approach.
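
To make the idea of censored missing values concrete, the sketch below evaluates a textbook left-censored Gaussian log-likelihood: an observed value contributes the normal density, while a missing value contributes the probability of falling below a detection limit. This is a simplified stand-in, not Pirat's model. In particular, the hard threshold `limit`, the univariate setting, and the function name are assumptions; the abstract describes a censoring mechanism learned from the data and a covariance model across peptides.

```python
import math


def censored_normal_loglik(values, mu, sigma, limit):
    """Log-likelihood of a Gaussian sample left-censored at `limit`.

    values: observed intensities; None marks a missing value that is
    assumed (hypothetically) to have fallen below `limit`.
    """
    total = 0.0
    for x in values:
        if x is None:
            # Censored entry: contributes P(X < limit) = Phi((limit - mu) / sigma)
            z = (limit - mu) / sigma
            cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
            total += math.log(cdf)
        else:
            # Observed entry: contributes the log of the normal density at x
            z = (x - mu) / sigma
            total += -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
    return total


# Hypothetical log-intensities with two values lost below the limit
sample = [21.3, 19.8, None, 20.5, None]
print(censored_normal_loglik(sample, mu=20.0, sigma=1.0, limit=18.5))
```

Maximizing such a function over `mu` and `sigma` uses the fact that missingness is informative, rather than treating the gaps as missing at random.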

Biostatistics 26(1).
Citations: 0
Selection processes, transportability, and failure time analysis in life history studies.
IF 2 | Zone 3 (Mathematics) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-31 | DOI: 10.1093/biostatistics/kxae039
Richard J Cook, Jerald F Lawless

In life history analysis of data from cohort studies, it is important to address the process by which participants are identified and selected. Many health studies select or enrol individuals based on whether they have experienced certain health-related events, for example, disease diagnosis or some complication from disease. Standard methods of analysis rely on assumptions concerning the independence of selection and a person's prospective life history process, given their prior history. Violations of such assumptions are common, however, and can bias estimation of process features. This has implications for the internal and external validity of cohort studies, and for the transportability of results to a population. In this paper, we study failure time analysis by proposing a joint model for the cohort selection process and the failure process of interest. This allows us to address both independence assumptions and the transportability of study results. It is shown that transportability cannot be guaranteed in the absence of auxiliary information on the population. Conditions that produce dependent selection and types of auxiliary data are discussed and illustrated in numerical studies. The proposed framework is applied to a study of the risk of psoriatic arthritis in persons with psoriasis.

Biostatistics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823244/pdf/
Citations: 0
Meta-analysis models with group structure for pleiotropy detection at gene and variant level using summary statistics from multiple datasets.
IF 2 | Zone 3 (Mathematics) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-31 | DOI: 10.1093/biostatistics/kxaf037
Pierre-Emmanuel Sugier, Yazdan Asgari, Mohammed Sedki, Thérèse Truong, Benoit Liquet

Genome-wide association studies (GWASs) have highlighted the importance of pleiotropy in human diseases, where one gene can impact 2 or more unrelated traits. Examining shared genetic risk factors across multiple diseases can enhance our understanding of these conditions by pinpointing new genes and biological pathways involved. Furthermore, with an increasing wealth of GWAS summary statistics available to the scientific community, leveraging these findings across multiple phenotypes could unveil novel pleiotropic associations. Existing selection methods examine pleiotropic associations one by one at the scale of either the genetic variant or the gene, and thus cannot consider all the genetic information at the same time. To address this limitation, we propose a new approach called MPSG (Meta-analysis model adapted for Pleiotropy Selection with Group structure). This method performs a penalized multivariate meta-analysis adapted for pleiotropy and takes into account the group structure information nested in the data to select relevant variants and genes (or pathways) from all the genetic information. To do so, we implemented an alternating direction method of multipliers algorithm. We compared the performance of the method with other benchmark meta-analysis approaches such as GCPBayes, PLACO, and ASSET by considering as inputs different kinds of summary statistics. We provide an application of our method to the identification of potential pleiotropic genes between breast and thyroid cancers.
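
When an alternating direction method of multipliers (ADMM) is applied to a group-penalized problem, one of its sub-steps is typically a proximal operator that shrinks a whole group of coefficients at once. The sketch below shows the standard block soft-thresholding operator for a group-lasso penalty. Whether MPSG uses exactly this penalty is not stated in the abstract, so the penalty form, the function name, and the toy numbers are assumptions for illustration only.

```python
import math


def group_soft_threshold(group, threshold):
    """Proximal operator of threshold * ||group||_2 (block soft-thresholding).

    group: coefficients that belong to one group (e.g. the variants of one gene).
    Returns the shrunken coefficients; the whole group becomes zero when its
    Euclidean norm does not exceed the threshold.
    """
    norm = math.sqrt(sum(v * v for v in group))
    if norm <= threshold:
        return [0.0] * len(group)
    scale = 1.0 - threshold / norm
    return [scale * v for v in group]


# Hypothetical effect sizes for the variants of one gene
print(group_soft_threshold([3.0, 4.0], 2.5))  # [1.5, 2.0]
print(group_soft_threshold([3.0, 4.0], 5.0))  # [0.0, 0.0]
```

Zeroing an entire group in one step is what lets a method select at the gene level and not only at the variant level.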

Biostatistics 26(1).
Citations: 0
BAMITA: Bayesian multiple imputation for tensor arrays.
IF 2 | Zone 3 (Mathematics) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-31 | DOI: 10.1093/biostatistics/kxae047
Ziren Jiang, Gen Li, Eric F Lock

Data increasingly take the form of a multi-way array, or tensor, in several biomedical domains. Such tensors are often incompletely observed. For example, we are motivated by longitudinal microbiome studies in which several timepoints are missing for several subjects. There is a growing literature on missing data imputation for tensors. However, existing methods give a point estimate for missing values without capturing uncertainty. We propose a multiple imputation approach for tensors in a flexible Bayesian framework that yields realistic simulated values for missing entries and can propagate uncertainty through subsequent analyses. Our model uses efficient and widely applicable conjugate priors for a CANDECOMP/PARAFAC (CP) factorization, with a separable residual covariance structure. This approach is shown to perform well with respect to both imputation accuracy and uncertainty calibration, for scenarios in which either single entries or entire fibers of the tensor are missing. For two microbiome applications, it is shown to accurately capture uncertainty in the full microbiome profile at missing timepoints and is used to infer trends in species diversity for the population. Documented R code to perform our multiple imputation approach is available at https://github.com/lockEF/MultiwayImputation.
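
For readers unfamiliar with the CP factorization mentioned above, a rank-R CP model represents entry (i, j, k) of a three-way tensor as the sum over r of A[i][r] * B[j][r] * C[k][r]. The sketch below only rebuilds a tensor from given factor matrices. It is written in Python for illustration, whereas the authors' code is in R, and it does not implement the Bayesian sampler, the priors, or the residual covariance structure. The function name and the toy factors are assumptions.

```python
def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factor matrices.

    A, B, C are lists of rows with shapes (I x R), (J x R), (K x R).
    Entry [i][j][k] of the result is sum_r A[i][r] * B[j][r] * C[k][r].
    """
    rank = len(A[0])
    return [
        [
            [
                sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))
                for k in range(len(C))
            ]
            for j in range(len(B))
        ]
        for i in range(len(A))
    ]


# Hypothetical rank-1 factors: 2 subjects x 1 taxon x 2 timepoints
A = [[1.0], [2.0]]
B = [[3.0]]
C = [[1.0], [0.5]]
print(cp_reconstruct(A, B, C))  # [[[3.0, 1.5]], [[6.0, 3.0]]]
```

In a multiple-imputation setting, each posterior draw of the factors would give a different reconstruction, and the spread across draws at a missing entry reflects the uncertainty about that entry.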

Biostatistics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823239/pdf/
Citations: 0
DifferentialRegulation: a Bayesian hierarchical approach to identify differentially regulated genes.
IF 1.8 | Zone 3 (Mathematics) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-10-01 | DOI: 10.1093/biostatistics/kxae017
Simone Tiberi, Joël Meili, Peiying Cai, Charlotte Soneson, Dongze He, Hirak Sarkar, Alejandra Avalos-Pacheco, Rob Patro, Mark D Robinson

Although transcriptomics data is typically used to analyze mature spliced mRNA, recent attention has focused on jointly investigating spliced and unspliced (or precursor-) mRNA, which can be used to study gene regulation and changes in gene expression production. Nonetheless, most methods for spliced/unspliced inference (such as RNA velocity tools) focus on individual samples, and rarely allow comparisons between groups of samples (e.g. healthy vs. diseased). Furthermore, this kind of inference is challenging, because spliced and unspliced mRNA abundance is characterized by a high degree of quantification uncertainty, due to the prevalence of multi-mapping reads, i.e. reads compatible with multiple transcripts (or genes), and/or with both their spliced and unspliced versions. Here, we present DifferentialRegulation, a Bayesian hierarchical method to discover changes between experimental conditions with respect to the relative abundance of unspliced mRNA (over the total mRNA). We model the quantification uncertainty via a latent variable approach, where reads are allocated to their gene/transcript of origin, and to the respective splice version. We designed several benchmarks where our approach shows good performance, in terms of sensitivity and error control, vs. state-of-the-art competitors. Importantly, our tool is flexible, and works with both bulk and single-cell RNA-sequencing data. DifferentialRegulation is distributed as a Bioconductor R package.

Biostatistics, pp. 1079-1093. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639160/pdf/
Citations: 0
Projection-based two-sample inference for sparsely observed multivariate functional data.
IF 1.8 | Zone 3 (Mathematics) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-10-01 | DOI: 10.1093/biostatistics/kxae004
Salil Koner, Sheng Luo

Modern longitudinal studies collect multiple outcomes as the primary endpoints to understand the complex dynamics of the diseases. Oftentimes, especially in clinical trials, the joint variation among the multidimensional responses plays a significant role in assessing the differential characteristics between two or more groups, rather than drawing inferences based on a single outcome. We develop a projection-based two-sample significance test to identify the population-level difference between the multivariate profiles observed under a sparse longitudinal design. The methodology is built upon widely adopted multivariate functional principal component analysis to reduce the dimension of the infinite-dimensional multi-modal functions while preserving the dynamic correlation between the components. The test applies to a wide class of (non-stationary) covariance structures of the response, and it detects a significant group difference based on a single p-value, thereby overcoming the issue of adjusting for multiple p-values that arise due to comparing the means in each of the components separately. Finite-sample numerical studies demonstrate that the test maintains the type-I error and has high power to detect significant group differences, compared to the state-of-the-art testing procedures. The test is carried out on two significant longitudinal studies for Alzheimer's disease and Parkinson's disease (PD) patients, namely, the TOMMORROW study of individuals at high risk of mild cognitive impairment to detect differences in the cognitive test scores between the pioglitazone and the placebo groups, and the Azillect study to assess the efficacy of rasagiline as a potential treatment to slow down the progression of PD.

现代纵向研究收集多种结果作为主要终点,以了解疾病的复杂动态。通常情况下,特别是在临床试验中,多维反应之间的联合变化在评估两个或多个组之间的差异特征方面发挥着重要作用,而不是根据单一结果进行推断。我们开发了一种基于投影的双样本显著性检验,以确定在稀疏纵向设计下观察到的多变量特征之间的群体水平差异。该方法建立在广泛采用的多元函数主成分分析的基础上,以降低无限维多模态函数的维度,同时保留各成分之间的动态相关性。该检验适用于反应的多种(非平稳)协方差结构,而且只需一个 p 值就能检测出显著的组间差异,从而克服了因分别比较各分量的均值而产生的多个 p 值的调整问题。有限样本数值研究表明,与最先进的检验程序相比,该检验保持了 I 型误差,并能有力地检测出显著的组间差异。该检验在两项针对阿尔茨海默病和帕金森病(PD)患者的重要纵向研究中进行,即针对轻度认知障碍高危人群的 TOMMORROW 研究,以检测吡格列酮组和安慰剂组之间认知测试得分的差异;以及 Azillect 研究,以评估拉沙吉兰作为一种潜在治疗方法对延缓帕金森病进展的疗效。
{"title":"Projection-based two-sample inference for sparsely observed multivariate functional data.","authors":"Salil Koner, Sheng Luo","doi":"10.1093/biostatistics/kxae004","DOIUrl":"10.1093/biostatistics/kxae004","url":null,"abstract":"<p><p>Modern longitudinal studies collect multiple outcomes as the primary endpoints to understand the complex dynamics of the diseases. Oftentimes, especially in clinical trials, the joint variation among the multidimensional responses plays a significant role in assessing the differential characteristics between two or more groups, rather than drawing inferences based on a single outcome. We develop a projection-based two-sample significance test to identify the population-level difference between the multivariate profiles observed under a sparse longitudinal design. The methodology is built upon widely adopted multivariate functional principal component analysis to reduce the dimension of the infinite-dimensional multi-modal functions while preserving the dynamic correlation between the components. The test applies to a wide class of (non-stationary) covariance structures of the response, and it detects a significant group difference based on a single p-value, thereby overcoming the issue of adjusting for multiple p-values that arise due to comparing the means in each of components separately. Finite-sample numerical studies demonstrate that the test maintains the type-I error, and is powerful to detect significant group differences, compared to the state-of-the-art testing procedures. 
The test is carried out on two significant longitudinal studies for Alzheimer's disease and Parkinson's disease (PD) patients, namely, TOMMORROW study of individuals at high risk of mild cognitive impairment to detect differences in the cognitive test scores between the pioglitazone and the placebo groups, and Azillect study to assess the efficacy of rasagiline as a potential treatment to slow down the progression of PD.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":"1156-1177"},"PeriodicalIF":1.8,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639128/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139984624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Functional support vector machine.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-10-01 DOI: 10.1093/biostatistics/kxae007
Shanghong Xie, R Todd Ogden

Linear and generalized linear scalar-on-function models have been commonly used to understand the relationship between a scalar response variable (e.g. continuous, binary outcomes) and functional predictors. Such techniques are sensitive to model misspecification when the relationship between the response variable and the functional predictors is complex. On the other hand, support vector machines (SVMs) are among the most robust prediction models but do not account for the high correlations between repeated measurements and cannot be used for irregular data. In this work, we propose a novel method to integrate functional principal component analysis with SVM techniques for classification and regression to account for the continuous nature of functional data and the nonlinear relationship between the scalar response variable and the functional predictors. We demonstrate the performance of our method through extensive simulation experiments and two real data applications: the classification of alcoholics using electroencephalography signals and the prediction of glucobrassicin concentration using near-infrared reflectance spectroscopy. Our method is especially advantageous when the measurement errors in the functional predictors are relatively large.
Citations: 0
Correction to: Exponential family measurement error models for single-cell CRISPR screens.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-10-01 DOI: 10.1093/biostatistics/kxae022
Citations: 0
Bayesian semiparametric model for sequential treatment decisions with informative timing.
IF 1.8 Zone 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-10-01 DOI: 10.1093/biostatistics/kxad035
Arman Oganisian, Kelly D Getz, Todd A Alonzo, Richard Aplenc, Jason A Roy

We develop a Bayesian semiparametric model for the impact of dynamic treatment rules on survival among patients diagnosed with pediatric acute myeloid leukemia (AML). The data consist of a subset of patients enrolled in a phase III clinical trial in which patients move through a sequence of four treatment courses. At each course, they undergo treatment that may or may not include anthracyclines (ACT). While ACT is known to be effective at treating AML, it is also cardiotoxic and can lead to early death for some patients. Our task is to estimate the potential survival probability under hypothetical dynamic ACT treatment strategies, but there are several impediments. First, since ACT is not randomized, its effect on survival is confounded over time. Second, subjects initiate the next course depending on when they recover from the previous course, making timing potentially informative of subsequent treatment and survival. Third, patients may die or drop out before ever completing the full treatment sequence. We develop a generative Bayesian semiparametric model based on Gamma Process priors to address these complexities. At each treatment course, the model captures subjects' transition to subsequent treatment or death in continuous time. G-computation is used to compute a posterior over potential survival probability that is adjusted for time-varying confounding. Using our approach, we estimate the efficacy of hypothetical treatment rules that dynamically modify ACT based on evolving cardiac function.
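
The g-computation step mentioned in the abstract can be illustrated with a much simpler sketch. The code below is not the authors' Bayesian model: it assumes a single treatment decision, one discrete confounder, and a binary survival outcome, and it uses plain frequency estimates rather than a posterior built on Gamma Process priors. The function name `g_computation`, the helper `make_cell`, and the toy counts are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


def g_computation(records, treatment):
    """Standardized survival probability if everyone received `treatment`.

    records: iterable of (l, a, y) tuples, where l is a discrete confounder
    level, a is the observed treatment, and y is 1 if the subject survived.
    Computes sum over l of P(Y=1 | A=treatment, L=l) * P(L=l).
    """
    n_total = 0
    n_by_l = defaultdict(int)    # subjects per confounder level
    n_by_la = defaultdict(int)   # subjects per (level, treatment) cell
    y_by_la = defaultdict(int)   # survivors per (level, treatment) cell
    for l, a, y in records:
        n_total += 1
        n_by_l[l] += 1
        n_by_la[(l, a)] += 1
        y_by_la[(l, a)] += y

    estimate = 0.0
    for l, n_l in n_by_l.items():
        cell = n_by_la[(l, treatment)]
        if cell == 0:
            # Positivity fails: no one at this level received the treatment.
            raise ValueError(f"no subjects with L={l!r} and A={treatment!r}")
        estimate += (y_by_la[(l, treatment)] / cell) * (n_l / n_total)
    return estimate


def make_cell(l, a, n, survivors):
    """Hypothetical helper: n records in cell (l, a), `survivors` with y=1."""
    return [(l, a, 1)] * survivors + [(l, a, 0)] * (n - survivors)


# Invented toy counts (not from the trial): L=0 is good cardiac function,
# L=1 is impaired; A=1 means the course included ACT.
records = (
    make_cell(0, 1, 40, 32) + make_cell(0, 0, 20, 10)
    + make_cell(1, 1, 10, 4) + make_cell(1, 0, 30, 9)
)

adjusted_treated = g_computation(records, 1)
adjusted_untreated = g_computation(records, 0)
naive_treated = sum(y for _, a, y in records if a == 1) / sum(
    1 for _, a, _ in records if a == 1
)
```

In this toy data the treated group is dominated by subjects with good cardiac function, so the unadjusted survival among the treated is higher than the standardized estimate. Adjusting for that kind of imbalance is what the g-formula does; the model in the abstract extends the idea to time-varying confounders, informative timing, and posterior uncertainty.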
Citations: 0