Wasserstein regression with empirical measures and density estimation for sparse data.
Yidong Zhou, Hans-Georg Müller
The problem of modeling the relationship between univariate distributions and one or more explanatory variables has lately attracted increasing interest. Existing approaches proceed by substituting proxy estimated distributions for the typically unknown response distributions. These estimates are obtained from available data but are problematic when only a few observations are available for some of the distributions. Such situations are common in practice and cannot be addressed with currently available approaches, especially when one aims at density estimates. We show how this and other problems associated with density estimation, such as tuning parameter selection and bias issues, can be side-stepped when covariates are available. We also introduce a novel version of distribution-response regression that is based on empirical measures. By avoiding the preprocessing step of recovering complete individual response distributions, the proposed approach is applicable when the sample size available for each distribution varies, and especially when it is small for some of the distributions but large for others. In this case, one can still obtain consistent distribution estimates even for distributions with only a few observations by borrowing strength across the entire sample of distributions, whereas traditional approaches that estimate distributions or densities individually fail, since sparsely sampled densities cannot be consistently estimated. The proposed model is demonstrated to outperform existing approaches through simulations and data from the Environmental Influences on Child Health Outcomes program.
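For intuition on why empirical measures suffice in one dimension, recall that the 2-Wasserstein distance between two distributions on the real line equals the L2 distance between their quantile functions, so empirical quantiles can be pooled directly across distributions. The sketch below is a hypothetical illustration, not the authors' estimator: it regresses empirical quantile functions linearly on a scalar covariate, letting sparsely sampled distributions borrow strength from the whole sample; the linear-in-covariate form and all settings are assumptions for the demo.

```python
import numpy as np

def empirical_quantiles(sample, probs):
    """Empirical quantile function of a 1-d sample on a probability grid."""
    return np.quantile(np.asarray(sample), probs)

def wasserstein2(sample_a, sample_b, n_grid=100):
    """2-Wasserstein distance between two empirical measures on the line,
    computed as the L2 distance between their quantile functions."""
    probs = (np.arange(n_grid) + 0.5) / n_grid
    qa = empirical_quantiles(sample_a, probs)
    qb = empirical_quantiles(sample_b, probs)
    return np.sqrt(np.mean((qa - qb) ** 2))

def fit_quantile_regression(samples, x, probs):
    """Pool across distributions: regress each empirical quantile on x
    (one least-squares fit per grid point)."""
    Q = np.stack([empirical_quantiles(s, probs) for s in samples])  # (n, m)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, Q, rcond=None)                    # (2, m)
    return beta

def predict_quantile_function(beta, x_new):
    """Predicted quantile function at a new covariate value; sorting
    enforces monotonicity so the prediction is a valid distribution."""
    return np.sort(np.array([1.0, x_new]) @ beta)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
# Sparse regime: some distributions contribute as few as 3 observations.
samples = [rng.normal(2 * xi, 1, size=rng.integers(3, 30)) for xi in x]
probs = (np.arange(100) + 0.5) / 100
beta = fit_quantile_regression(samples, x, probs)
print(predict_quantile_function(beta, 0.5)[[9, 49, 89]])  # a few predicted quantiles
```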
{"title":"Wasserstein regression with empirical measures and density estimation for sparse data.","authors":"Yidong Zhou, Hans-Georg Müller","doi":"10.1093/biomtc/ujae127","DOIUrl":"https://doi.org/10.1093/biomtc/ujae127","url":null,"abstract":"<p><p>The problem of modeling the relationship between univariate distributions and one or more explanatory variables lately has found increasing interest. Existing approaches proceed by substituting proxy estimated distributions for the typically unknown response distributions. These estimates are obtained from available data but are problematic when for some of the distributions only few data are available. Such situations are common in practice and cannot be addressed with currently available approaches, especially when one aims at density estimates. We show how this and other problems associated with density estimation such as tuning parameter selection and bias issues can be side-stepped when covariates are available. We also introduce a novel version of distribution-response regression that is based on empirical measures. By avoiding the preprocessing step of recovering complete individual response distributions, the proposed approach is applicable when the sample size available for each distribution varies and especially when it is small for some of the distributions but large for others. In this case, one can still obtain consistent distribution estimates even for distributions with only few data by gaining strength across the entire sample of distributions, while traditional approaches where distributions or densities are estimated individually fail, since sparsely sampled densities cannot be consistently estimated. The proposed model is demonstrated to outperform existing approaches through simulations and Environmental Influences on Child Health Outcomes data.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142581081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A formal goodness-of-fit test for spatial binary Markov random field models.
Eva Biswas, Andee Kaplan, Mark S Kaiser, Daniel J Nordman
Binary spatial observations arise in environmental and ecological studies, where Markov random field (MRF) models are often applied. Despite the prevalence and long history of MRF models for spatial binary data, appropriate model diagnostics have remained an unresolved issue in practice. A complicating factor is that such models involve neighborhood specifications, which are difficult to assess for binary data. To address this, we propose a formal goodness-of-fit (GOF) test for diagnosing an MRF model for spatial binary values. The test statistic involves a type of conditional Moran's I based on the fitted conditional probabilities, which can detect departures in model form, including the neighborhood structure. Numerical studies show that the GOF test performs well in detecting deviations from a null model, with particular attention to neighborhood misspecification as the difficult case. We illustrate the test with applications to Besag's historical endive data and to the breeding pattern of grasshopper sparrows across Iowa.
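To convey the flavor of the statistic, here is a hypothetical Monte Carlo sketch, assuming a 4-nearest-neighbor lattice, an autologistic model for data generation, and an independence null: standardized residuals from the fitted conditional probabilities are combined into a Moran's-I-type statistic and compared against replicates simulated under the null. The construction is a simplified stand-in, not the authors' exact test.

```python
import numpy as np

def lattice_neighbors(n):
    """4-nearest-neighbor adjacency matrix for an n x n lattice."""
    idx = np.arange(n * n).reshape(n, n)
    W = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                a, b = i + di, j + dj
                if 0 <= a < n and 0 <= b < n:
                    W[idx[i, j], idx[a, b]] = 1.0
    return W

def conditional_moran_i(y, p_hat, W):
    """Moran's-I-type statistic on standardized residuals (y - p_hat):
    large values indicate residual spatial dependence that the fitted
    conditional probabilities failed to capture."""
    r = (y - p_hat) / np.sqrt(p_hat * (1 - p_hat))
    return (len(y) / W.sum()) * (r @ W @ r) / (r @ r)

def gibbs_autologistic(alpha, eta, W, n_sweeps, rng):
    """Simulate an autologistic MRF by Gibbs sampling; used here to build
    the reference distribution of the statistic under the null."""
    y = (rng.random(W.shape[0]) < 0.5).astype(float)
    for _ in range(n_sweeps):
        for s in range(len(y)):
            logit = alpha + eta * (W[s] @ y)
            y[s] = float(rng.random() < 1.0 / (1.0 + np.exp(-logit)))
    return y

rng = np.random.default_rng(1)
W = lattice_neighbors(10)
y = gibbs_autologistic(0.0, 0.3, W, 200, rng)       # data with real dependence
p0 = np.full_like(y, y.mean())                       # null fit: no spatial structure
obs = conditional_moran_i(y, p0, W)
alpha0 = np.log(y.mean() / (1 - y.mean()))
# Small reference sample (19 replicates) to keep the sketch fast.
null = [conditional_moran_i(gibbs_autologistic(alpha0, 0.0, W, 50, rng), p0, W)
        for _ in range(19)]
p_value = (1 + sum(t >= obs for t in null)) / 20
print(f"statistic={obs:.3f}, Monte Carlo p={p_value:.2f}")
```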
{"title":"A formal goodness-of-fit test for spatial binary Markov random field models.","authors":"Eva Biswas, Andee Kaplan, Mark S Kaiser, Daniel J Nordman","doi":"10.1093/biomtc/ujae119","DOIUrl":"https://doi.org/10.1093/biomtc/ujae119","url":null,"abstract":"<p><p>Binary spatial observations arise in environmental and ecological studies, where Markov random field (MRF) models are often applied. Despite the prevalence and the long history of MRF models for spatial binary data, appropriate model diagnostics have remained an unresolved issue in practice. A complicating factor is that such models involve neighborhood specifications, which are difficult to assess for binary data. To address this, we propose a formal goodness-of-fit (GOF) test for diagnosing an MRF model for spatial binary values. The test statistic involves a type of conditional Moran's I based on the fitted conditional probabilities, which can detect departures in model form, including neighborhood structure. Numerical studies show that the GOF test can perform well in detecting deviations from a null model, with a focus on neighborhoods as a difficult issue. We illustrate the spatial test with an application to Besag's historical endive data as well as the breeding pattern of grasshopper sparrows across Iowa.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142494172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Case-crossover designs and overdispersion with application to air pollution epidemiology.
Samuel Perreault, Gracia Y Dong, Alex Stringer, Hwashin Shin, Patrick E Brown
Over the last three decades, case-crossover designs have found many applications in the health sciences, especially in air pollution epidemiology. They are typically used, in combination with partial likelihood techniques, to define a conditional logistic model for the responses, usually health outcomes, conditional on the exposures. Although conditional logistic models have been shown to be equivalent, in typical air pollution epidemiology setups, to specific instances of the well-known Poisson time series model, it is often claimed that they cannot accommodate overdispersion. This paper clarifies the relationship between case-crossover designs, the models that ensue from their use, and overdispersion. In particular, we propose to relax the assumption of independence between individuals traditionally made in case-crossover analyses, in order to introduce overdispersion explicitly into the conditional logistic model. As we show, the resulting overdispersed conditional logistic model coincides with the overdispersed conditional Poisson model, in the sense that their likelihoods are simple re-expressions of one another. We further provide the technical details of a Bayesian implementation of the proposed case-crossover model, which we use to demonstrate, by means of a large simulation study, that standard case-crossover models can lead to dramatically underestimated coverage probabilities, while the proposed models do not. We also perform an illustrative analysis of the association between air pollution and morbidity in Toronto, Canada, which shows that the proposed models are more robust than standard ones to outliers such as those associated with public holidays.
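The claimed equivalence can be seen concretely: conditioning a Poisson time series on the total count within a stratum yields a multinomial likelihood with cell probabilities proportional to exp(beta * x_t), which is precisely the form of the conditional logistic likelihood arising from a time-stratified case-crossover design. A small numerical sketch with a single simulated stratum and one exposure coefficient (all settings invented):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Daily event counts y_t and exposure x_t within one monthly stratum.
rng = np.random.default_rng(2)
x = rng.normal(size=30)
y = rng.poisson(np.exp(0.1 * x + 2.0))

def neg_conditional_poisson_ll(beta):
    """Poisson log-likelihood conditional on the stratum total: a
    multinomial with cell probabilities proportional to exp(beta * x_t).
    The time-stratified case-crossover conditional logistic likelihood,
    matching each event to all other days of its stratum, has exactly
    this form, which is why the two models coincide."""
    eta = beta * x
    return -(y @ eta - y.sum() * np.log(np.exp(eta).sum()))

res = minimize_scalar(neg_conditional_poisson_ll, bounds=(-2, 2), method="bounded")
print(f"beta_hat = {res.x:.3f}")  # close to the true value 0.1
```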
{"title":"Case-crossover designs and overdispersion with application to air pollution epidemiology.","authors":"Samuel Perreault, Gracia Y Dong, Alex Stringer, Hwashin Shin, Patrick E Brown","doi":"10.1093/biomtc/ujae117","DOIUrl":"https://doi.org/10.1093/biomtc/ujae117","url":null,"abstract":"<p><p>Over the last three decades, case-crossover designs have found many applications in health sciences, especially in air pollution epidemiology. They are typically used, in combination with partial likelihood techniques, to define a conditional logistic model for the responses, usually health outcomes, conditional on the exposures. Despite the fact that conditional logistic models have been shown equivalent, in typical air pollution epidemiology setups, to specific instances of the well-known Poisson time series model, it is often claimed that they cannot allow for overdispersion. This paper clarifies the relationship between case-crossover designs, the models that ensue from their use, and overdispersion. In particular, we propose to relax the assumption of independence between individuals traditionally made in case-crossover analyses, in order to explicitly introduce overdispersion in the conditional logistic model. As we show, the resulting overdispersed conditional logistic model coincides with the overdispersed, conditional Poisson model, in the sense that their likelihoods are simple re-expressions of one another. We further provide the technical details of a Bayesian implementation of the proposed case-crossover model, which we use to demonstrate, by means of a large simulation study, that standard case-crossover models can lead to dramatically underestimated coverage probabilities, while the proposed models do not. We also perform an illustrative analysis of the association between air pollution and morbidity in Toronto, Canada, which shows that the proposed models are more robust than standard ones to outliers such as those associated with public holidays.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142457171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A hierarchical random effects state-space model for modeling brain activities from electroencephalogram data.
Xingche Guo, Bin Yang, Ji Meng Loh, Qinxia Wang, Yuanjia Wang
Mental disorders present challenges in diagnosis and treatment due to their complex and heterogeneous nature. The electroencephalogram (EEG) has shown promise as a source of potential biomarkers for these disorders. However, existing methods for analyzing EEG signals have limitations in addressing heterogeneity and in capturing complex patterns of brain activity between regions. This paper proposes a novel random effects state-space model (RESSM) for analyzing large-scale multi-channel resting-state EEG signals that accounts for the heterogeneity of brain connectivity across groups and individual subjects. We incorporate multi-level random effects for the temporal dynamical and spatial mapping matrices and address non-stationarity, so that brain connectivity patterns can vary over time. The model is fitted within a Bayesian hierarchical framework using a Gibbs sampler. Compared to previous mixed-effects state-space models, we directly model the high-dimensional random effects matrices of interest without structural constraints and tackle the challenge of identifiability. Through extensive simulation studies, we demonstrate that our approach yields valid estimation and inference. We apply RESSM to a multi-site clinical trial of major depressive disorder (MDD). Our analysis uncovers significant differences in resting-state brain temporal dynamics between MDD patients and healthy individuals. In addition, we show that the subject-level EEG features derived from RESSM have superior predictive value for the heterogeneous treatment effect compared to EEG frequency band power, suggesting the potential of EEG as a valuable biomarker for MDD.
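As background, the backbone of such an analysis is the linear-Gaussian state-space model and its Kalman filter; RESSM places multi-level random effects on the transition and spatial mapping matrices and fits the model by Gibbs sampling, none of which is reproduced here. A minimal, self-contained sketch of the backbone with invented dimensions (log-likelihood up to an additive constant):

```python
import numpy as np

def kalman_filter(y, A, C, Q, R, mu0, P0):
    """Standard Kalman filter for x_t = A x_{t-1} + w_t, y_t = C x_t + v_t.
    Returns the Gaussian log-likelihood up to an additive constant."""
    d = len(mu0)
    mu, P = mu0, P0
    loglik = 0.0
    for t in range(len(y)):
        # Predict step.
        mu, P = A @ mu, A @ P @ A.T + Q
        # Update step.
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)
        innov = y[t] - C @ mu
        loglik += -0.5 * (innov @ np.linalg.solve(S, innov)
                          + np.log(np.linalg.det(S)))
        mu = mu + K @ innov
        P = (np.eye(d) - K @ C) @ P
    return loglik

rng = np.random.default_rng(3)
d, p, T = 2, 4, 200              # latent dimension, channels, time points
A = 0.9 * np.eye(d)              # temporal dynamics (fixed here; random in RESSM)
C = rng.normal(size=(p, d))      # spatial mapping (fixed here; random in RESSM)
x = np.zeros(d)
y = np.empty((T, p))
for t in range(T):
    x = A @ x + rng.normal(scale=0.1, size=d)
    y[t] = C @ x + rng.normal(scale=0.5, size=p)

print(kalman_filter(y, A, C, 0.01 * np.eye(d), 0.25 * np.eye(p),
                    np.zeros(d), np.eye(d)))
```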
{"title":"A hierarchical random effects state-space model for modeling brain activities from electroencephalogram data.","authors":"Xingche Guo, Bin Yang, Ji Meng Loh, Qinxia Wang, Yuanjia Wang","doi":"10.1093/biomtc/ujae130","DOIUrl":"10.1093/biomtc/ujae130","url":null,"abstract":"<p><p>Mental disorders present challenges in diagnosis and treatment due to their complex and heterogeneous nature. Electroencephalogram (EEG) has shown promise as a source of potential biomarkers for these disorders. However, existing methods for analyzing EEG signals have limitations in addressing heterogeneity and capturing complex brain activity patterns between regions. This paper proposes a novel random effects state-space model (RESSM) for analyzing large-scale multi-channel resting-state EEG signals, accounting for the heterogeneity of brain connectivities between groups and individual subjects. We incorporate multi-level random effects for temporal dynamical and spatial mapping matrices and address non-stationarity so that the brain connectivity patterns can vary over time. The model is fitted under a Bayesian hierarchical model framework coupled with a Gibbs sampler. Compared to previous mixed-effects state-space models, we directly model high-dimensional random effects matrices of interest without structural constraints and tackle the challenge of identifiability. Through extensive simulation studies, we demonstrate that our approach yields valid estimation and inference. We apply RESSM to a multi-site clinical trial of major depressive disorder (MDD). Our analysis uncovers significant differences in resting-state brain temporal dynamics among MDD patients compared to healthy individuals. In addition, we show the subject-level EEG features derived from RESSM exhibit a superior predictive value for the heterogeneous treatment effect compared to the EEG frequency band power, suggesting the potential of EEG as a valuable biomarker for MDD.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11540184/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142590082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An exploratory penalized regression to identify combined effects of temporal variables: application to agri-environmental issues.
Bénedicte Fontez, Patrice Loisel, Thierry Simonneau, Nadine Hilgert
The development of sensors is opening new avenues in several fields of activity. For agricultural crops, complex combinations of agri-environmental dynamics, such as soil and climate variables, are now commonly recorded. These new kinds of measurements are an opportunity to improve knowledge of the drivers of crop yield and crop quality at harvest. This calls for new statistical approaches that account for the combined variations of these dynamic variables, considered here as temporal variables. The objective of the paper is to estimate an interpretable model for the influence of two such combined temporal inputs on a scalar output. A Sparse and Structured Procedure to Identify Combined Effects of Formatted temporal Predictors, hereafter denoted SpiceFP, is proposed. The method is based on transforming both temporal variables into categorical variables by defining joint modalities, from which a collection of multiple regression models is derived. The regressors are the frequencies associated with the joint class intervals. The class intervals and the related regression coefficients are determined using a generalized fused lasso. SpiceFP is a generic and exploratory approach. Our simulations show that it is flexible enough to select the non-null, influential modalities of values. A motivating example on grape quality is presented.
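To illustrate how joint modalities become regressors, the sketch below discretizes two simulated temporal variables, computes for each observation the fraction of time spent in each joint class interval, and regresses a toy response on these frequencies. A plain lasso stands in for the generalized fused lasso (which would additionally fuse adjacent classes into contiguous intervals); all variable names, grids, and settings are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
n, T = 120, 200                           # observations, time points per curve
temp = rng.normal(20, 4, size=(n, T))     # temporal variable 1 (toy "temperature")
rad = rng.normal(500, 150, size=(n, T))   # temporal variable 2 (toy "radiation")

# Joint modalities: discretize both variables jointly and record, per
# observation, the fraction of time spent in each joint class interval.
edges_t = np.linspace(temp.min(), temp.max(), 9)   # 8 x 8 joint classes
edges_r = np.linspace(rad.min(), rad.max(), 9)

def joint_frequencies(u, v):
    H, _, _ = np.histogram2d(u, v, bins=[edges_t, edges_r])
    return (H / H.sum()).ravel()

X = np.stack([joint_frequencies(temp[i], rad[i]) for i in range(n)])

# Toy response: only hot, high-radiation conditions influence the output.
influential = np.outer(edges_t[:-1] > 24, edges_r[:-1] > 650).ravel()
y = X @ (20.0 * influential) + rng.normal(scale=0.05, size=n)

fit = Lasso(alpha=1e-4, max_iter=50_000).fit(X, y)
print(fit.coef_.reshape(8, 8).round(1))   # the influential block stands out
```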
{"title":"An exploratory penalized regression to identify combined effects of temporal variables-application to agri-environmental issues.","authors":"Bénedicte Fontez, Patrice Loisel, Thierry Simonneau, Nadine Hilgert","doi":"10.1093/biomtc/ujae134","DOIUrl":"https://doi.org/10.1093/biomtc/ujae134","url":null,"abstract":"<p><p>The development of sensors is opening new avenues in several fields of activity. Concerning agricultural crops, complex combinations of agri-environmental dynamics, such as soil and climate variables, are now commonly recorded. These new kinds of measurements are an opportunity to improve knowledge of the drivers of crop yield and crop quality at harvest. This involves renewing statistical approaches to account for the combined variations of these dynamic variables, here considered as temporal variables. The objective of the paper is to estimate an interpretable model to study the influence of the two combined inputs on a scalar output. A Sparse and Structured Procedure is proposed to Identify Combined Effects of Formatted temporal Predictors, hereafter denoted S piceFP. The method is based on the transformation of both temporal variables into categorical variables by defining joint modalities, from which a collection of multiple regression models is then derived. The regressors are the frequencies associated with joint class intervals. The class intervals and related regression coefficients are determined using a generalized fused lasso. S piceFP is a generic and exploratory approach. The simulations we performed show that it is flexible enough to select the non-null or influential modalities of values. A motivating example for grape quality is presented.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142692652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive randomization methods for sequential multiple assignment randomized trials (SMARTs) via Thompson sampling.
Peter Norwood, Marie Davidian, Eric Laber
Response-adaptive randomization (RAR) has been studied extensively in conventional, single-stage clinical trials, where it has been shown to yield ethical and statistical benefits, especially in trials with many treatment arms. However, RAR and its potential benefits are understudied in sequential multiple assignment randomized trials (SMARTs), which are the gold-standard trial design for evaluation of multi-stage treatment regimes. We propose a suite of RAR algorithms for SMARTs based on Thompson Sampling (TS), a widely used RAR method in single-stage trials in which treatment randomization probabilities are aligned with the estimated probability that the treatment is optimal. We focus on two common objectives in SMARTs: (1) comparison of the regimes embedded in the trial and (2) estimation of an optimal embedded regime. We develop valid post-study inferential procedures for treatment regimes under the proposed algorithms. This is nontrivial, as even in single-stage settings standard estimators of an average treatment effect can have nonnormal asymptotic behavior under RAR. Our algorithms are the first for RAR in multi-stage trials that account for non-standard limiting behavior due to RAR. Empirical studies based on real-world SMARTs show that TS can improve in-trial subject outcomes without sacrificing efficiency for post-trial comparisons.
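Thompson sampling itself is simple to state: randomize each participant to an arm with the posterior probability that the arm is optimal, typically implemented by drawing once from each arm's posterior and assigning the argmax. Below is a minimal single-stage, two-arm sketch with Beta-Bernoulli posteriors; the response rates and settings are made up, and the paper's algorithms extend this idea to posterior probabilities that embedded multi-stage regimes are optimal, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(5)
true_p = np.array([0.35, 0.55])   # unknown response rates for the two arms
successes = np.ones(2)            # Beta(1, 1) priors on each arm
failures = np.ones(2)

assignments, outcomes = [], []
for _ in range(200):
    # Thompson sampling: draw one success probability from each posterior
    # and assign the arm with the largest draw, i.e. randomize with the
    # posterior probability that the arm is optimal.
    draws = rng.beta(successes, failures)
    arm = int(np.argmax(draws))
    y = float(rng.random() < true_p[arm])
    successes[arm] += y
    failures[arm] += 1 - y
    assignments.append(arm)
    outcomes.append(y)

print("share assigned to the better arm:", np.mean(np.array(assignments) == 1))
print("mean in-trial outcome:", np.mean(outcomes))
```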
{"title":"Adaptive randomization methods for sequential multiple assignment randomized trials (smarts) via thompson sampling.","authors":"Peter Norwood, Marie Davidian, Eric Laber","doi":"10.1093/biomtc/ujae152","DOIUrl":"10.1093/biomtc/ujae152","url":null,"abstract":"<p><p>Response-adaptive randomization (RAR) has been studied extensively in conventional, single-stage clinical trials, where it has been shown to yield ethical and statistical benefits, especially in trials with many treatment arms. However, RAR and its potential benefits are understudied in sequential multiple assignment randomized trials (SMARTs), which are the gold-standard trial design for evaluation of multi-stage treatment regimes. We propose a suite of RAR algorithms for SMARTs based on Thompson Sampling (TS), a widely used RAR method in single-stage trials in which treatment randomization probabilities are aligned with the estimated probability that the treatment is optimal. We focus on two common objectives in SMARTs: (1) comparison of the regimes embedded in the trial and (2) estimation of an optimal embedded regime. We develop valid post-study inferential procedures for treatment regimes under the proposed algorithms. This is nontrivial, as even in single-stage settings standard estimators of an average treatment effect can have nonnormal asymptotic behavior under RAR. Our algorithms are the first for RAR in multi-stage trials that account for non-standard limiting behavior due to RAR. Empirical studies based on real-world SMARTs show that TS can improve in-trial subject outcomes without sacrificing efficiency for post-trial comparisons.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11647911/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142827259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An efficient joint model for high-dimensional longitudinal and survival data via generic association features.
Van Tuan Nguyen, Adeline Fermanian, Antoine Barbieri, Sarah Zohar, Anne-Sophie Jannot, Simon Bussy, Agathe Guilloux
This paper introduces a prognostic method called FLASH that addresses the problem of joint modeling of longitudinal data and censored durations when a large number of both longitudinal and time-independent features are available. In the literature, standard joint models are either of the shared random effect or joint latent class type. Combining ideas from both worlds and using appropriate regularization techniques, we define a new model with the ability to automatically identify significant prognostic longitudinal features in a high-dimensional context, which is of increasing importance in many areas such as personalized medicine or churn prediction. We develop an estimation methodology based on the expectation-maximization algorithm and provide an efficient implementation. The statistical performance of the method is demonstrated both in extensive Monte Carlo simulation studies and on publicly available medical datasets. Our method significantly outperforms the state-of-the-art joint models in terms of C-index in a so-called "real-time" prediction setting, with a computational speed that is orders of magnitude faster than competing methods. In addition, our model automatically identifies significant features that are relevant from a practical point of view, making it interpretable, which is of the greatest importance for a prognostic algorithm in healthcare.
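For a sense of the overall pipeline (summarize each longitudinal trajectory by a few association features, then relate those features to the censored event time), here is a simplified sketch. The feature choice (per-subject least-squares intercept and slope) and the plain gradient-descent Cox fit are illustrative stand-ins; FLASH uses richer generic features, regularization, and an EM algorithm, none of which appears here.

```python
import numpy as np

rng = np.random.default_rng(6)
n, n_obs = 150, 10

# Per-subject longitudinal marker on a common grid; the association
# features here are each subject's least-squares intercept and slope.
t_grid = np.linspace(0, 1, n_obs)
true_slope = rng.normal(size=n)
traj = true_slope[:, None] * t_grid + rng.normal(scale=0.3, size=(n, n_obs))
B = np.column_stack([np.ones(n_obs), t_grid])
feats = traj @ B @ np.linalg.inv(B.T @ B)      # (n, 2): intercept, slope

# Event times whose hazard increases with the slope feature, plus censoring.
u = rng.exponential(scale=np.exp(-feats[:, 1]))
c = rng.exponential(scale=2 * np.median(u), size=n)
time, event = np.minimum(u, c), (u <= c).astype(float)

def cox_neg_loglik_grad(beta, X, time, event):
    """Breslow partial likelihood (ignoring ties) and its gradient."""
    order = np.argsort(-time)                  # sort by decreasing time
    X, event = X[order], event[order]
    eta = X @ beta
    w = np.exp(eta)
    cumw = np.cumsum(w)                        # running risk-set sums
    cumwx = np.cumsum(w[:, None] * X, axis=0)
    ll = np.sum(event * (eta - np.log(cumw)))
    grad = (event[:, None] * (X - cumwx / cumw[:, None])).sum(axis=0)
    return -ll, -grad

beta = np.zeros(2)
for _ in range(1000):                          # plain gradient descent
    _, g = cox_neg_loglik_grad(beta, feats, time, event)
    beta -= 0.2 * g / n
print("fitted log-hazard ratios (intercept, slope features):", beta.round(2))
```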
{"title":"An efficient joint model for high dimensional longitudinal and survival data via generic association features.","authors":"Van Tuan Nguyen, Adeline Fermanian, Antoine Barbieri, Sarah Zohar, Anne-Sophie Jannot, Simon Bussy, Agathe Guilloux","doi":"10.1093/biomtc/ujae149","DOIUrl":"https://doi.org/10.1093/biomtc/ujae149","url":null,"abstract":"<p><p>This paper introduces a prognostic method called FLASH that addresses the problem of joint modeling of longitudinal data and censored durations when a large number of both longitudinal and time-independent features are available. In the literature, standard joint models are either of the shared random effect or joint latent class type. Combining ideas from both worlds and using appropriate regularization techniques, we define a new model with the ability to automatically identify significant prognostic longitudinal features in a high-dimensional context, which is of increasing importance in many areas such as personalized medicine or churn prediction. We develop an estimation methodology based on the expectation-maximization algorithm and provide an efficient implementation. The statistical performance of the method is demonstrated both in extensive Monte Carlo simulation studies and on publicly available medical datasets. Our method significantly outperforms the state-of-the-art joint models in terms of C-index in a so-called \"real-time\" prediction setting, with a computational speed that is orders of magnitude faster than competing methods. In addition, our model automatically identifies significant features that are relevant from a practical point of view, making it interpretable, which is of the greatest importance for a prognostic algorithm in healthcare.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142827261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Debiased high-dimensional regression calibration for errors-in-variables log-contrast models.
Huali Zhao, Tianying Wang
Motivated by the challenges of analyzing gut microbiome and metagenomic data, this work tackles the issue of measurement errors in high-dimensional regression models that involve compositional covariates. This paper marks a pioneering effort in conducting statistical inference on high-dimensional compositional data affected by measurement error or contamination. We introduce a calibration approach tailored to the linear log-contrast model. Under relatively lenient conditions on the sparsity level of the parameter, we establish the asymptotic normality of the estimator for inference. Numerical experiments and an application to a microbiome study demonstrate the efficacy of our high-dimensional calibration strategy in minimizing bias and achieving the expected coverage rates for confidence intervals. Moreover, the potential application of the proposed methodology extends well beyond compositional data, suggesting its adaptability to a wide range of research contexts.
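For context, the linear log-contrast model takes log-transformed compositions as covariates, with regression coefficients constrained to sum to zero so that the model is invariant to the compositional normalization. Below is a minimal sketch of fitting this model by constrained least squares on simulated, error-free compositions; it contains none of the paper's measurement-error calibration or debiasing, which are the actual contributions.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 200, 10

# Compositional covariates (rows on the simplex), e.g. relative abundances.
raw = rng.gamma(shape=2.0, size=(n, p))
comp = raw / raw.sum(axis=1, keepdims=True)
Z = np.log(comp)

beta_true = np.zeros(p)
beta_true[:2], beta_true[2:4] = 1.0, -1.0    # sums to zero, as required
y = Z @ beta_true + rng.normal(scale=0.5, size=n)

# Least squares under the zero-sum constraint sum(beta) = 0:
# project onto the constraint space {b : sum(b) = 0} and solve there.
ones = np.ones((p, 1))
B = np.eye(p) - ones @ ones.T / p            # projector onto the constraint space
g, *_ = np.linalg.lstsq(Z @ B, y, rcond=None)
beta_hat = B @ g                              # automatically satisfies the constraint
print(beta_hat.round(2), " sum =", beta_hat.sum().round(6))
```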
{"title":"Debiased high-dimensional regression calibration for errors-in-variables log-contrast models.","authors":"Huali Zhao, Tianying Wang","doi":"10.1093/biomtc/ujae153","DOIUrl":"https://doi.org/10.1093/biomtc/ujae153","url":null,"abstract":"<p><p>Motivated by the challenges in analyzing gut microbiome and metagenomic data, this work aims to tackle the issue of measurement errors in high-dimensional regression models that involve compositional covariates. This paper marks a pioneering effort in conducting statistical inference on high-dimensional compositional data affected by mismeasured or contaminated data. We introduce a calibration approach tailored for the linear log-contrast model. Under relatively lenient conditions regarding the sparsity level of the parameter, we have established the asymptotic normality of the estimator for inference. Numerical experiments and an application in microbiome study have demonstrated the efficacy of our high-dimensional calibration strategy in minimizing bias and achieving the expected coverage rates for confidence intervals. Moreover, the potential application of our proposed methodology extends well beyond compositional data, suggesting its adaptability for a wide range of research contexts.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142827275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semi-parametric sensitivity analysis for trials with irregular and informative assessment times.
Bonnie B Smith, Yujing Gao, Shu Yang, Ravi Varadhan, Andrea J Apter, Daniel O Scharfstein
Many trials are designed to collect outcomes at or around pre-specified times after randomization. If there is variability in the times when participants are actually assessed, this can pose a challenge to learning the effect of treatment, since not all participants have outcome assessments at the times of interest. Furthermore, observed outcome values may not be representative of all participants' outcomes at a given time. Methods have been developed that account for some types of such irregular and informative assessment times; however, since these methods rely on untestable assumptions, sensitivity analyses are needed. We develop a sensitivity analysis methodology that is benchmarked at the explainable assessment (EA) assumption, under which assessment and outcomes at each time are related only through data collected prior to that time. Our method uses an exponential tilting assumption, governed by a sensitivity analysis parameter, that posits deviations from the EA assumption. Our inferential strategy is based on a new influence function-based, augmented inverse intensity-weighted estimator. Our approach allows for flexible semiparametric modeling of the observed data, which is separated from specification of the sensitivity parameter. We apply our method to a randomized trial of low-income individuals with uncontrolled asthma, and we illustrate implementation of our estimation procedure in detail.
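To make the exponential-tilting idea concrete, here is a toy, simplified version for a single assessment time, treating assessment as outcome-dependent missingness: the log-odds of being assessed are shifted by a sensitivity parameter gamma times the outcome, observed outcomes are reweighted by the inverse of the posited assessment probability, and gamma is varied over a grid, with gamma = 0 corresponding to the explainable-assessment benchmark. For simplicity the sketch uses the true assessment model rather than an estimated intensity, and it omits the paper's augmentation and influence-function machinery.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 2000
x = rng.normal(size=n)                    # history available for everyone
y = x + rng.normal(size=n)                # outcome at the time of interest
gamma_true = 0.8                          # assessment depends on the outcome itself
logit_pi = -0.5 + 0.5 * x                 # explainable part of the assessment model
p_assess = 1.0 / (1.0 + np.exp(-(logit_pi + gamma_true * y)))
r = (rng.random(n) < p_assess).astype(float)

def tilted_ipw_mean(gamma):
    """Weighted mean of observed outcomes under the exponential-tilting
    assumption indexed by gamma (gamma = 0 is the explainable-assessment
    benchmark). Weights invert the posited assessment probability; the
    weight is only ever used for assessed subjects (r = 1)."""
    w = 1.0 + np.exp(-(logit_pi + gamma * y))   # 1 / P(assessed | x, y)
    return np.sum(r * w * y) / np.sum(r * w)

for g in [0.0, 0.4, 0.8, 1.2]:
    print(f"gamma={g:.1f}: mu_hat={tilted_ipw_mean(g):.3f}")
# The true mean of y is 0; the estimate recovers it near gamma = gamma_true.
```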
{"title":"Semi-parametric sensitivity analysis for trials with irregular and informative assessment times.","authors":"Bonnie B Smith, Yujing Gao, Shu Yang, Ravi Varadhan, Andrea J Apter, Daniel O Scharfstein","doi":"10.1093/biomtc/ujae154","DOIUrl":"10.1093/biomtc/ujae154","url":null,"abstract":"<p><p>Many trials are designed to collect outcomes at or around pre-specified times after randomization. If there is variability in the times when participants are actually assessed, this can pose a challenge to learning the effect of treatment, since not all participants have outcome assessments at the times of interest. Furthermore, observed outcome values may not be representative of all participants' outcomes at a given time. Methods have been developed that account for some types of such irregular and informative assessment times; however, since these methods rely on untestable assumptions, sensitivity analyses are needed. We develop a sensitivity analysis methodology that is benchmarked at the explainable assessment (EA) assumption, under which assessment and outcomes at each time are related only through data collected prior to that time. Our method uses an exponential tilting assumption, governed by a sensitivity analysis parameter, that posits deviations from the EA assumption. Our inferential strategy is based on a new influence function-based, augmented inverse intensity-weighted estimator. Our approach allows for flexible semiparametric modeling of the observed data, which is separated from specification of the sensitivity parameter. We apply our method to a randomized trial of low-income individuals with uncontrolled asthma, and we illustrate implementation of our estimation procedure in detail.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11669851/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142891794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}