Biostatistics最新文献_第4页

Correction. 更正。

IF 1.8 3区数学 Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae029

引用次数: 0

HMM for discovering decision-making dynamics using reinforcement learning experiments. 利用强化学习实验发现决策动态的 HMM。

IF 1.8 3区数学 Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae033

Xingche Guo, Donglin Zeng, Yuanjia Wang

Major depressive disorder (MDD), a leading cause of years of life lived with disability, presents challenges in diagnosis and treatment due to its complex and heterogeneous nature. Emerging evidence indicates that reward processing abnormalities may serve as a behavioral marker for MDD. To measure reward processing, patients perform computer-based behavioral tasks that involve making choices or responding to stimulants that are associated with different outcomes, such as gains or losses in the laboratory. Reinforcement learning (RL) models are fitted to extract parameters that measure various aspects of reward processing (e.g. reward sensitivity) to characterize how patients make decisions in behavioral tasks. Recent findings suggest the inadequacy of characterizing reward learning solely based on a single RL model; instead, there may be a switching of decision-making processes between multiple strategies. An important scientific question is how the dynamics of strategies in decision-making affect the reward learning ability of individuals with MDD. Motivated by the probabilistic reward task within the Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care (EMBARC) study, we propose a novel RL-HMM (hidden Markov model) framework for analyzing reward-based decision-making. Our model accommodates decision-making strategy switching between two distinct approaches under an HMM: subjects making decisions based on the RL model or opting for random choices. We account for continuous RL state space and allow time-varying transition probabilities in the HMM. We introduce a computationally efficient Expectation-maximization (EM) algorithm for parameter estimation and use a nonparametric bootstrap for inference. Extensive simulation studies validate the finite-sample performance of our method. We apply our approach to the EMBARC study to show that MDD patients are less engaged in RL compared to the healthy controls, and engagement is associated with brain activities in the negative affect circuitry during an emotional conflict task.

重度抑郁障碍（MDD）是导致残疾生活年数的主要原因，由于其复杂性和异质性，给诊断和治疗带来了挑战。新出现的证据表明，奖赏处理异常可作为重度抑郁症的行为标记。为了测量奖赏加工，患者要完成基于计算机的行为任务，其中涉及做出选择或对兴奋剂做出反应，而这些选择或反应与不同的结果有关，例如在实验室中的收益或损失。对强化学习（RL）模型进行拟合，以提取衡量奖赏处理各方面（如奖赏敏感性）的参数，从而描述患者在行为任务中如何做出决策。最近的研究结果表明，仅根据单一的 RL 模型来描述奖赏学习是不够的；相反，决策过程可能会在多种策略之间切换。一个重要的科学问题是，决策策略的动态变化如何影响 MDD 患者的奖赏学习能力。受 "建立临床护理中抗抑郁剂反应的调节因子和生物特征"（EMBARC）研究中的概率奖励任务的启发，我们提出了一种新的 RL-HMM（隐马尔可夫模型）框架，用于分析基于奖励的决策。我们的模型允许在 HMM 下的两种不同方法之间切换决策策略：受试者根据 RL 模型做出决策或选择随机选择。我们考虑了连续的 RL 状态空间，并允许 HMM 中的过渡概率随时间变化。我们引入了一种计算高效的期望最大化（EM）算法来进行参数估计，并使用非参数自举法进行推断。广泛的模拟研究验证了我们方法的有限样本性能。我们将我们的方法应用于 EMBARC 研究，结果表明与健康对照组相比，MDD 患者在 RL 中的参与度较低，而参与度与情绪冲突任务中负面情绪回路的大脑活动有关。

{"title":"HMM for discovering decision-making dynamics using reinforcement learning experiments.","authors":"Xingche Guo, Donglin Zeng, Yuanjia Wang","doi":"10.1093/biostatistics/kxae033","DOIUrl":"10.1093/biostatistics/kxae033","url":null,"abstract":"Major depressive disorder (MDD), a leading cause of years of life lived with disability, presents challenges in diagnosis and treatment due to its complex and heterogeneous nature. Emerging evidence indicates that reward processing abnormalities may serve as a behavioral marker for MDD. To measure reward processing, patients perform computer-based behavioral tasks that involve making choices or responding to stimulants that are associated with different outcomes, such as gains or losses in the laboratory. Reinforcement learning (RL) models are fitted to extract parameters that measure various aspects of reward processing (e.g. reward sensitivity) to characterize how patients make decisions in behavioral tasks. Recent findings suggest the inadequacy of characterizing reward learning solely based on a single RL model; instead, there may be a switching of decision-making processes between multiple strategies. An important scientific question is how the dynamics of strategies in decision-making affect the reward learning ability of individuals with MDD. Motivated by the probabilistic reward task within the Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care (EMBARC) study, we propose a novel RL-HMM (hidden Markov model) framework for analyzing reward-based decision-making. Our model accommodates decision-making strategy switching between two distinct approaches under an HMM: subjects making decisions based on the RL model or opting for random choices. We account for continuous RL state space and allow time-varying transition probabilities in the HMM. We introduce a computationally efficient Expectation-maximization (EM) algorithm for parameter estimation and use a nonparametric bootstrap for inference. Extensive simulation studies validate the finite-sample performance of our method. We apply our approach to the EMBARC study to show that MDD patients are less engaged in RL compared to the healthy controls, and engagement is associated with brain activities in the negative affect circuitry during an emotional conflict task.","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142127451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

BAMITA: Bayesian multiple imputation for tensor arrays. BAMITA：张量阵列的贝叶斯多重估算。

IF 1.8 3区数学 Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae047

Ziren Jiang, Gen Li, Eric F Lock

Data increasingly take the form of a multi-way array, or tensor, in several biomedical domains. Such tensors are often incompletely observed. For example, we are motivated by longitudinal microbiome studies in which several timepoints are missing for several subjects. There is a growing literature on missing data imputation for tensors. However, existing methods give a point estimate for missing values without capturing uncertainty. We propose a multiple imputation approach for tensors in a flexible Bayesian framework, that yields realistic simulated values for missing entries and can propagate uncertainty through subsequent analyses. Our model uses efficient and widely applicable conjugate priors for a CANDECOMP/PARAFAC (CP) factorization, with a separable residual covariance structure. This approach is shown to perform well with respect to both imputation accuracy and uncertainty calibration, for scenarios in which either single entries or entire fibers of the tensor are missing. For two microbiome applications, it is shown to accurately capture uncertainty in the full microbiome profile at missing timepoints and used to infer trends in species diversity for the population. Documented R code to perform our multiple imputation approach is available at https://github.com/lockEF/MultiwayImputation.

在一些生物医学领域，数据越来越多地采用多路阵列或张量的形式。这样的张量通常是不完全观察到的。例如，我们受到纵向微生物组研究的激励，其中几个主题缺少几个时间点。关于张量缺失数据的输入的文献越来越多。然而，现有的方法给出了缺失值的点估计，而没有捕捉不确定性。我们在灵活的贝叶斯框架中提出了张量的多重输入方法，该方法可以为缺失的条目产生真实的模拟值，并可以通过随后的分析传播不确定性。我们的模型采用高效且广泛适用的共轭先验进行CANDECOMP/PARAFAC （CP）分解，并具有可分离残差协方差结构。对于缺少张量的单个条目或整个纤维的情况，该方法在输入精度和不确定度校准方面表现良好。对于两个微生物组的应用，它被证明可以准确地捕获缺失时间点的完整微生物组谱的不确定性，并用于推断种群物种多样性的趋势。文档化的R代码来执行我们的多重插值方法可以在https://github.com/lockEF/MultiwayImputation上找到。

{"title":"BAMITA: Bayesian multiple imputation for tensor arrays.","authors":"Ziren Jiang, Gen Li, Eric F Lock","doi":"10.1093/biostatistics/kxae047","DOIUrl":"10.1093/biostatistics/kxae047","url":null,"abstract":"Data increasingly take the form of a multi-way array, or tensor, in several biomedical domains. Such tensors are often incompletely observed. For example, we are motivated by longitudinal microbiome studies in which several timepoints are missing for several subjects. There is a growing literature on missing data imputation for tensors. However, existing methods give a point estimate for missing values without capturing uncertainty. We propose a multiple imputation approach for tensors in a flexible Bayesian framework, that yields realistic simulated values for missing entries and can propagate uncertainty through subsequent analyses. Our model uses efficient and widely applicable conjugate priors for a CANDECOMP/PARAFAC (CP) factorization, with a separable residual covariance structure. This approach is shown to perform well with respect to both imputation accuracy and uncertainty calibration, for scenarios in which either single entries or entire fibers of the tensor are missing. For two microbiome applications, it is shown to accurately capture uncertainty in the full microbiome profile at missing timepoints and used to infer trends in species diversity for the population. Documented R code to perform our multiple imputation approach is available at https://github.com/lockEF/MultiwayImputation.","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823239/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142824682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Selection processes, transportability, and failure time analysis in life history studies. 生命史研究中的选择过程、可迁移性和失效时间分析。

IF 1.8 3区数学 Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae039

Richard J Cook, Jerald F Lawless

In life history analysis of data from cohort studies, it is important to address the process by which participants are identified and selected. Many health studies select or enrol individuals based on whether they have experienced certain health related events, for example, disease diagnosis or some complication from disease. Standard methods of analysis rely on assumptions concerning the independence of selection and a person's prospective life history process, given their prior history. Violations of such assumptions are common, however, and can bias estimation of process features. This has implications for the internal and external validity of cohort studies, and for the transportabilty of results to a population. In this paper, we study failure time analysis by proposing a joint model for the cohort selection process and the failure process of interest. This allows us to address both independence assumptions and the transportability of study results. It is shown that transportability cannot be guaranteed in the absence of auxiliary information on the population. Conditions that produce dependent selection and types of auxiliary data are discussed and illustrated in numerical studies. The proposed framework is applied to a study of the risk of psoriatic arthritis in persons with psoriasis.

在对队列研究的数据进行生命史分析时，重要的是要解决参与者的识别和选择过程。许多健康研究都是根据个人是否经历过某些健康相关事件（如疾病诊断或疾病引起的某些并发症）来选择或招募个人的。标准的分析方法依赖于对选择的独立性和一个人的前瞻性生活史过程（考虑到其先前的历史）的假设。然而，违反这些假设的情况很常见，而且会对过程特征的估计产生偏差。这对队列研究的内部和外部有效性以及结果在人群中的可迁移性都有影响。在本文中，我们通过提出队列选择过程和相关失效过程的联合模型来研究失效时间分析。这样，我们就能同时解决独立性假设和研究结果的可迁移性问题。研究表明，在没有人口辅助信息的情况下，可迁移性无法得到保证。在数值研究中讨论并说明了产生依赖性选择的条件和辅助数据类型。提出的框架适用于银屑病患者银屑病关节炎风险的研究。

{"title":"Selection processes, transportability, and failure time analysis in life history studies.","authors":"Richard J Cook, Jerald F Lawless","doi":"10.1093/biostatistics/kxae039","DOIUrl":"10.1093/biostatistics/kxae039","url":null,"abstract":"In life history analysis of data from cohort studies, it is important to address the process by which participants are identified and selected. Many health studies select or enrol individuals based on whether they have experienced certain health related events, for example, disease diagnosis or some complication from disease. Standard methods of analysis rely on assumptions concerning the independence of selection and a person's prospective life history process, given their prior history. Violations of such assumptions are common, however, and can bias estimation of process features. This has implications for the internal and external validity of cohort studies, and for the transportabilty of results to a population. In this paper, we study failure time analysis by proposing a joint model for the cohort selection process and the failure process of interest. This allows us to address both independence assumptions and the transportability of study results. It is shown that transportability cannot be guaranteed in the absence of auxiliary information on the population. Conditions that produce dependent selection and types of auxiliary data are discussed and illustrated in numerical studies. The proposed framework is applied to a study of the risk of psoriatic arthritis in persons with psoriasis.","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823244/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142513408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

DifferentialRegulation: a Bayesian hierarchical approach to identify differentially regulated genes. DifferentialRegulation：一种贝叶斯分层方法，用于识别差异调控基因。

IF 1.8 3区数学 Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-10-01 DOI: 10.1093/biostatistics/kxae017

Simone Tiberi, Joël Meili, Peiying Cai, Charlotte Soneson, Dongze He, Hirak Sarkar, Alejandra Avalos-Pacheco, Rob Patro, Mark D Robinson

Although transcriptomics data is typically used to analyze mature spliced mRNA, recent attention has focused on jointly investigating spliced and unspliced (or precursor-) mRNA, which can be used to study gene regulation and changes in gene expression production. Nonetheless, most methods for spliced/unspliced inference (such as RNA velocity tools) focus on individual samples, and rarely allow comparisons between groups of samples (e.g. healthy vs. diseased). Furthermore, this kind of inference is challenging, because spliced and unspliced mRNA abundance is characterized by a high degree of quantification uncertainty, due to the prevalence of multi-mapping reads, ie reads compatible with multiple transcripts (or genes), and/or with both their spliced and unspliced versions. Here, we present DifferentialRegulation, a Bayesian hierarchical method to discover changes between experimental conditions with respect to the relative abundance of unspliced mRNA (over the total mRNA). We model the quantification uncertainty via a latent variable approach, where reads are allocated to their gene/transcript of origin, and to the respective splice version. We designed several benchmarks where our approach shows good performance, in terms of sensitivity and error control, vs. state-of-the-art competitors. Importantly, our tool is flexible, and works with both bulk and single-cell RNA-sequencing data. DifferentialRegulation is distributed as a Bioconductor R package.

虽然转录组学数据通常用于分析成熟剪接的 mRNA，但最近的研究重点是联合研究剪接和未剪接（或前体）的 mRNA，这可用于研究基因调控和基因表达生成的变化。然而，大多数剪接/未剪接推断方法（如 RNA 速度工具）都侧重于单个样本，很少允许在样本组（如健康样本与患病样本）之间进行比较。此外，这种推断具有挑战性，因为剪接和非剪接的 mRNA 丰度具有高度的定量不确定性，这是由于多映射读数（即与多个转录本（或基因）兼容的读数，和/或与它们的剪接和非剪接版本兼容的读数）的普遍存在。在此，我们介绍一种贝叶斯分层方法 DifferentialRegulation，用于发现不同实验条件下未剪接 mRNA（相对于总 mRNA）相对丰度的变化。我们通过潜变量方法对量化的不确定性进行建模，将读数分配到其源基因/转录本以及各自的剪接版本。我们设计了几个基准，与最先进的竞争对手相比，我们的方法在灵敏度和误差控制方面表现出色。重要的是，我们的工具非常灵活，既能处理大容量数据，也能处理单细胞 RNA 序列数据。DifferentialRegulation 以 Bioconductor R 软件包的形式发布。

{"title":"DifferentialRegulation: a Bayesian hierarchical approach to identify differentially regulated genes.","authors":"Simone Tiberi, Joël Meili, Peiying Cai, Charlotte Soneson, Dongze He, Hirak Sarkar, Alejandra Avalos-Pacheco, Rob Patro, Mark D Robinson","doi":"10.1093/biostatistics/kxae017","DOIUrl":"10.1093/biostatistics/kxae017","url":null,"abstract":"Although transcriptomics data is typically used to analyze mature spliced mRNA, recent attention has focused on jointly investigating spliced and unspliced (or precursor-) mRNA, which can be used to study gene regulation and changes in gene expression production. Nonetheless, most methods for spliced/unspliced inference (such as RNA velocity tools) focus on individual samples, and rarely allow comparisons between groups of samples (e.g. healthy vs. diseased). Furthermore, this kind of inference is challenging, because spliced and unspliced mRNA abundance is characterized by a high degree of quantification uncertainty, due to the prevalence of multi-mapping reads, ie reads compatible with multiple transcripts (or genes), and/or with both their spliced and unspliced versions. Here, we present DifferentialRegulation, a Bayesian hierarchical method to discover changes between experimental conditions with respect to the relative abundance of unspliced mRNA (over the total mRNA). We model the quantification uncertainty via a latent variable approach, where reads are allocated to their gene/transcript of origin, and to the respective splice version. We designed several benchmarks where our approach shows good performance, in terms of sensitivity and error control, vs. state-of-the-art competitors. Importantly, our tool is flexible, and works with both bulk and single-cell RNA-sequencing data. DifferentialRegulation is distributed as a Bioconductor R package.","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":"1079-1093"},"PeriodicalIF":1.8,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639160/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141421995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Functional support vector machine. 功能支持向量机

IF 1.8 3区数学 Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-10-01 DOI: 10.1093/biostatistics/kxae007

Shanghong Xie, R Todd Ogden

Linear and generalized linear scalar-on-function modeling have been commonly used to understand the relationship between a scalar response variable (e.g. continuous, binary outcomes) and functional predictors. Such techniques are sensitive to model misspecification when the relationship between the response variable and the functional predictors is complex. On the other hand, support vector machines (SVMs) are among the most robust prediction models but do not take account of the high correlations between repeated measurements and cannot be used for irregular data. In this work, we propose a novel method to integrate functional principal component analysis with SVM techniques for classification and regression to account for the continuous nature of functional data and the nonlinear relationship between the scalar response variable and the functional predictors. We demonstrate the performance of our method through extensive simulation experiments and two real data applications: the classification of alcoholics using electroencephalography signals and the prediction of glucobrassicin concentration using near-infrared reflectance spectroscopy. Our methods especially have more advantages when the measurement errors in functional predictors are relatively large.

线性和广义线性标量-函数模型常用于了解标量响应变量（如连续、二元结果）与函数预测因子之间的关系。当响应变量和功能预测因子之间的关系很复杂时，这类技术对模型的错误规范很敏感。另一方面，支持向量机（SVM）是最稳健的预测模型之一，但不能考虑重复测量之间的高度相关性，也不能用于不规则数据。在这项工作中，我们提出了一种新方法，将功能主成分分析与 SVM 分类和回归技术相结合，以考虑功能数据的连续性以及标量响应变量与功能预测因子之间的非线性关系。我们通过大量模拟实验和两个真实数据应用证明了我们方法的性能：利用脑电信号对酗酒者进行分类，以及利用近红外反射光谱预测葡萄糖苷浓度。当功能预测因子的测量误差相对较大时，我们的方法尤其更具优势。

{"title":"Functional support vector machine.","authors":"Shanghong Xie, R Todd Ogden","doi":"10.1093/biostatistics/kxae007","DOIUrl":"10.1093/biostatistics/kxae007","url":null,"abstract":"Linear and generalized linear scalar-on-function modeling have been commonly used to understand the relationship between a scalar response variable (e.g. continuous, binary outcomes) and functional predictors. Such techniques are sensitive to model misspecification when the relationship between the response variable and the functional predictors is complex. On the other hand, support vector machines (SVMs) are among the most robust prediction models but do not take account of the high correlations between repeated measurements and cannot be used for irregular data. In this work, we propose a novel method to integrate functional principal component analysis with SVM techniques for classification and regression to account for the continuous nature of functional data and the nonlinear relationship between the scalar response variable and the functional predictors. We demonstrate the performance of our method through extensive simulation experiments and two real data applications: the classification of alcoholics using electroencephalography signals and the prediction of glucobrassicin concentration using near-infrared reflectance spectroscopy. Our methods especially have more advantages when the measurement errors in functional predictors are relatively large.","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":"1178-1194"},"PeriodicalIF":1.8,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639177/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140112299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Projection-based two-sample inference for sparsely observed multivariate functional data. 基于投影的稀疏观测多变量函数数据的双样本推断。

IF 1.8 3区数学 Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-10-01 DOI: 10.1093/biostatistics/kxae004

Salil Koner, Sheng Luo

Modern longitudinal studies collect multiple outcomes as the primary endpoints to understand the complex dynamics of the diseases. Oftentimes, especially in clinical trials, the joint variation among the multidimensional responses plays a significant role in assessing the differential characteristics between two or more groups, rather than drawing inferences based on a single outcome. We develop a projection-based two-sample significance test to identify the population-level difference between the multivariate profiles observed under a sparse longitudinal design. The methodology is built upon widely adopted multivariate functional principal component analysis to reduce the dimension of the infinite-dimensional multi-modal functions while preserving the dynamic correlation between the components. The test applies to a wide class of (non-stationary) covariance structures of the response, and it detects a significant group difference based on a single p-value, thereby overcoming the issue of adjusting for multiple p-values that arise due to comparing the means in each of components separately. Finite-sample numerical studies demonstrate that the test maintains the type-I error, and is powerful to detect significant group differences, compared to the state-of-the-art testing procedures. The test is carried out on two significant longitudinal studies for Alzheimer's disease and Parkinson's disease (PD) patients, namely, TOMMORROW study of individuals at high risk of mild cognitive impairment to detect differences in the cognitive test scores between the pioglitazone and the placebo groups, and Azillect study to assess the efficacy of rasagiline as a potential treatment to slow down the progression of PD.

现代纵向研究收集多种结果作为主要终点，以了解疾病的复杂动态。通常情况下，特别是在临床试验中，多维反应之间的联合变化在评估两个或多个组之间的差异特征方面发挥着重要作用，而不是根据单一结果进行推断。我们开发了一种基于投影的双样本显著性检验，以确定在稀疏纵向设计下观察到的多变量特征之间的群体水平差异。该方法建立在广泛采用的多元函数主成分分析的基础上，以降低无限维多模态函数的维度，同时保留各成分之间的动态相关性。该检验适用于反应的多种（非平稳）协方差结构，而且只需一个 p 值就能检测出显著的组间差异，从而克服了因分别比较各分量的均值而产生的多个 p 值的调整问题。有限样本数值研究表明，与最先进的检验程序相比，该检验保持了 I 型误差，并能有力地检测出显著的组间差异。该检验在两项针对阿尔茨海默病和帕金森病（PD）患者的重要纵向研究中进行，即针对轻度认知障碍高危人群的 TOMMORROW 研究，以检测吡格列酮组和安慰剂组之间认知测试得分的差异；以及 Azillect 研究，以评估拉沙吉兰作为一种潜在治疗方法对延缓帕金森病进展的疗效。

{"title":"Projection-based two-sample inference for sparsely observed multivariate functional data.","authors":"Salil Koner, Sheng Luo","doi":"10.1093/biostatistics/kxae004","DOIUrl":"10.1093/biostatistics/kxae004","url":null,"abstract":"Modern longitudinal studies collect multiple outcomes as the primary endpoints to understand the complex dynamics of the diseases. Oftentimes, especially in clinical trials, the joint variation among the multidimensional responses plays a significant role in assessing the differential characteristics between two or more groups, rather than drawing inferences based on a single outcome. We develop a projection-based two-sample significance test to identify the population-level difference between the multivariate profiles observed under a sparse longitudinal design. The methodology is built upon widely adopted multivariate functional principal component analysis to reduce the dimension of the infinite-dimensional multi-modal functions while preserving the dynamic correlation between the components. The test applies to a wide class of (non-stationary) covariance structures of the response, and it detects a significant group difference based on a single p-value, thereby overcoming the issue of adjusting for multiple p-values that arise due to comparing the means in each of components separately. Finite-sample numerical studies demonstrate that the test maintains the type-I error, and is powerful to detect significant group differences, compared to the state-of-the-art testing procedures. The test is carried out on two significant longitudinal studies for Alzheimer's disease and Parkinson's disease (PD) patients, namely, TOMMORROW study of individuals at high risk of mild cognitive impairment to detect differences in the cognitive test scores between the pioglitazone and the placebo groups, and Azillect study to assess the efficacy of rasagiline as a potential treatment to slow down the progression of PD.","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":"1156-1177"},"PeriodicalIF":1.8,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639128/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139984624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Correction to: Exponential family measurement error models for single-cell CRISPR screens. 更正：单细胞 CRISPR 筛选的指数族测量误差模型。

IF 1.8 3区数学 Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-10-01 DOI: 10.1093/biostatistics/kxae022

引用次数: 0

Bayesian semiparametric model for sequential treatment decisions with informative timing. 具有信息时间的序列治疗决策的贝叶斯半参数模型。

IF 1.8 3区数学 Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-10-01 DOI: 10.1093/biostatistics/kxad035

Arman Oganisian, Kelly D Getz, Todd A Alonzo, Richard Aplenc, Jason A Roy

We develop a Bayesian semiparametric model for the impact of dynamic treatment rules on survival among patients diagnosed with pediatric acute myeloid leukemia (AML). The data consist of a subset of patients enrolled in a phase III clinical trial in which patients move through a sequence of four treatment courses. At each course, they undergo treatment that may or may not include anthracyclines (ACT). While ACT is known to be effective at treating AML, it is also cardiotoxic and can lead to early death for some patients. Our task is to estimate the potential survival probability under hypothetical dynamic ACT treatment strategies, but there are several impediments. First, since ACT is not randomized, its effect on survival is confounded over time. Second, subjects initiate the next course depending on when they recover from the previous course, making timing potentially informative of subsequent treatment and survival. Third, patients may die or drop out before ever completing the full treatment sequence. We develop a generative Bayesian semiparametric model based on Gamma Process priors to address these complexities. At each treatment course, the model captures subjects' transition to subsequent treatment or death in continuous time. G-computation is used to compute a posterior over potential survival probability that is adjusted for time-varying confounding. Using our approach, we estimate the efficacy of hypothetical treatment rules that dynamically modify ACT based on evolving cardiac function.

我们针对动态治疗规则对小儿急性髓性白血病（AML）患者生存期的影响建立了一个贝叶斯半参数模型。数据由参加 III 期临床试验的患者子集组成，在该试验中，患者依次接受四个疗程的治疗。在每个疗程中，患者接受的治疗可能包括也可能不包括蒽环类药物（ACT）。众所周知，蒽环类药物能有效治疗急性髓细胞白血病，但它也有心脏毒性，可能导致一些患者过早死亡。我们的任务是估算假设的动态 ACT 治疗策略下的潜在生存概率，但这有几个障碍。首先，由于 ACT 不是随机的，它对生存的影响会随着时间的推移而受到干扰。其次，受试者何时开始下一疗程取决于他们何时从上一疗程中康复，这使得时间可能对后续治疗和存活率产生影响。第三，患者可能在完成全部治疗序列之前死亡或退出。我们开发了一种基于伽马过程先验的贝叶斯半参数生成模型来解决这些复杂问题。在每个疗程中，该模型都能连续捕捉受试者向后续治疗或死亡的转变。G 计算用于计算潜在存活概率的后验值，并根据时变混杂因素进行调整。利用我们的方法，我们估算了假设治疗规则的疗效，这些规则根据不断变化的心脏功能动态修改 ACT。

{"title":"Bayesian semiparametric model for sequential treatment decisions with informative timing.","authors":"Arman Oganisian, Kelly D Getz, Todd A Alonzo, Richard Aplenc, Jason A Roy","doi":"10.1093/biostatistics/kxad035","DOIUrl":"10.1093/biostatistics/kxad035","url":null,"abstract":"We develop a Bayesian semiparametric model for the impact of dynamic treatment rules on survival among patients diagnosed with pediatric acute myeloid leukemia (AML). The data consist of a subset of patients enrolled in a phase III clinical trial in which patients move through a sequence of four treatment courses. At each course, they undergo treatment that may or may not include anthracyclines (ACT). While ACT is known to be effective at treating AML, it is also cardiotoxic and can lead to early death for some patients. Our task is to estimate the potential survival probability under hypothetical dynamic ACT treatment strategies, but there are several impediments. First, since ACT is not randomized, its effect on survival is confounded over time. Second, subjects initiate the next course depending on when they recover from the previous course, making timing potentially informative of subsequent treatment and survival. Third, patients may die or drop out before ever completing the full treatment sequence. We develop a generative Bayesian semiparametric model based on Gamma Process priors to address these complexities. At each treatment course, the model captures subjects' transition to subsequent treatment or death in continuous time. G-computation is used to compute a posterior over potential survival probability that is adjusted for time-varying confounding. Using our approach, we estimate the efficacy of hypothetical treatment rules that dynamically modify ACT based on evolving cardiac function.","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":"947-961"},"PeriodicalIF":1.8,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471958/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139479547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Mendelian randomization analysis using multiple biomarkers of an underlying common exposure. 利用潜在共同暴露的多种生物标志物进行孟德尔随机分析。

IF 1.8 3区数学 Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-10-01 DOI: 10.1093/biostatistics/kxae006

Jin Jin, Guanghao Qi, Zhi Yu, Nilanjan Chatterjee

Mendelian randomization (MR) analysis is increasingly popular for testing the causal effect of exposures on disease outcomes using data from genome-wide association studies. In some settings, the underlying exposure, such as systematic inflammation, may not be directly observable, but measurements can be available on multiple biomarkers or other types of traits that are co-regulated by the exposure. We propose a method for MR analysis on latent exposures (MRLE), which tests the significance for, and the direction of, the effect of a latent exposure by leveraging information from multiple related traits. The method is developed by constructing a set of estimating functions based on the second-order moments of GWAS summary association statistics for the observable traits, under a structural equation model where genetic variants are assumed to have indirect effects through the latent exposure and potentially direct effects on the traits. Simulation studies show that MRLE has well-controlled type I error rates and enhanced power compared to single-trait MR tests under various types of pleiotropy. Applications of MRLE using genetic association statistics across five inflammatory biomarkers (CRP, IL-6, IL-8, TNF-α, and MCP-1) provide evidence for potential causal effects of inflammation on increasing the risk of coronary artery disease, colorectal cancer, and rheumatoid arthritis, while standard MR analysis for individual biomarkers fails to detect consistent evidence for such effects.

孟德尔随机化（MR）分析越来越多地用于利用全基因组关联研究的数据测试暴露对疾病结果的因果效应。在某些情况下，潜在的暴露（如系统性炎症）可能无法直接观察到，但可以测量受暴露共同调控的多种生物标志物或其他类型的性状。我们提出了一种对潜在暴露进行磁共振分析（MRLE）的方法，通过利用多个相关性状的信息来测试潜在暴露的显著性和影响方向。该方法是在一个结构方程模型下开发的，在该模型中，假设遗传变异通过潜在暴露产生间接影响，并可能对性状产生直接影响，根据可观察性状的 GWAS 总结关联统计量的二阶矩构建一组估计函数。模拟研究表明，MRLE 具有很好的 I 型误差率控制，在各种类型的多效性条件下，与单性状 MR 检验相比，MRLE 的功率更大。在五个炎症生物标记物（CRP、IL-6、IL-8、TNF-α 和 MCP-1）中使用遗传关联统计的 MRLE 应用提供了炎症对增加冠状动脉疾病、结直肠癌和类风湿性关节炎风险的潜在因果效应的证据，而对单个生物标记物的标准 MR 分析未能检测到此类效应的一致证据。

{"title":"Mendelian randomization analysis using multiple biomarkers of an underlying common exposure.","authors":"Jin Jin, Guanghao Qi, Zhi Yu, Nilanjan Chatterjee","doi":"10.1093/biostatistics/kxae006","DOIUrl":"10.1093/biostatistics/kxae006","url":null,"abstract":"Mendelian randomization (MR) analysis is increasingly popular for testing the causal effect of exposures on disease outcomes using data from genome-wide association studies. In some settings, the underlying exposure, such as systematic inflammation, may not be directly observable, but measurements can be available on multiple biomarkers or other types of traits that are co-regulated by the exposure. We propose a method for MR analysis on latent exposures (MRLE), which tests the significance for, and the direction of, the effect of a latent exposure by leveraging information from multiple related traits. The method is developed by constructing a set of estimating functions based on the second-order moments of GWAS summary association statistics for the observable traits, under a structural equation model where genetic variants are assumed to have indirect effects through the latent exposure and potentially direct effects on the traits. Simulation studies show that MRLE has well-controlled type I error rates and enhanced power compared to single-trait MR tests under various types of pleiotropy. Applications of MRLE using genetic association statistics across five inflammatory biomarkers (CRP, IL-6, IL-8, TNF-α, and MCP-1) provide evidence for potential causal effects of inflammation on increasing the risk of coronary artery disease, colorectal cancer, and rheumatoid arthritis, while standard MR analysis for individual biomarkers fails to detect consistent evidence for such effects.","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":"1015-1033"},"PeriodicalIF":1.8,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140066298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0