Biometrics: Latest Articles (Page 5)

Changepoint detection on daily home activity pattern: a sliced Poisson process method.

IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae114

Israel Martínez-Hernández, Rebecca Killick

The problem of health and care of people is being revolutionized. An important component of that revolution is disease prevention and health improvement from home. A natural approach to the health problem is monitoring changes in people's behavior or activities. These changes can be indicators of potential health problems. However, because of a person's daily pattern, changes will be observed throughout each day, e.g., an increase in events around meal times and fewer events during the night. We do not wish to detect such within-day changes but rather changes in the daily behavior pattern from one day to the next. To this end, we treat the set of event times within a given day as a single observation. We model this observation as the realization of an inhomogeneous Poisson process whose rate function can vary with the time of day. We then propose to detect changes in the sequence of inhomogeneous Poisson processes. This approach is appropriate for many phenomena, particularly for home activity data. Our methodology is evaluated on simulated data. Overall, our approach uses local change information to detect changes across days. At the same time, it allows us to visualize and interpret the results, changes, and trends over time, which can help detect potential health decline.
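
As a rough illustration of the modelling idea (each day's event times treated as one realization of an inhomogeneous Poisson process), the Python sketch below approximates the daily rate function as piecewise constant over a fixed number of time-of-day bins and searches for a single day at which that rate function changes, using a penalized profile likelihood. It is a generic single-changepoint search, not the authors' sliced Poisson process method; the bin count, the penalty, and all function names are assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence


def bin_counts(day_events: Sequence[float], n_bins: int, day_length: float = 24.0) -> List[int]:
    """Count one day's event times (hours in [0, day_length]) per time-of-day bin."""
    width = day_length / n_bins
    counts = [0] * n_bins
    for t in day_events:
        k = min(int(t // width), n_bins - 1)  # an event at exactly day_length goes in the last bin
        counts[k] += 1
    return counts


def segment_loglik(block: Sequence[Sequence[int]], n_bins: int, day_length: float = 24.0) -> float:
    """Maximized Poisson log-likelihood (up to a constant) of a block of days that share
    one piecewise-constant rate: sum of log(rate at each event) minus the integrated rate."""
    width = day_length / n_bins
    ll = 0.0
    for k in range(n_bins):
        c = sum(day[k] for day in block)        # events in bin k over the whole block
        if c > 0:
            rate = c / (len(block) * width)      # MLE of the rate in bin k
            ll += c * math.log(rate) - rate * len(block) * width
    return ll


def single_changepoint(days: Sequence[Sequence[float]], n_bins: int = 6,
                       min_seg: int = 2, penalty: Optional[float] = None) -> Optional[int]:
    """Return the index of the first day after the change, or None when no split
    beats the no-change model by more than the penalty."""
    counts = [bin_counts(d, n_bins) for d in days]
    n = len(counts)
    if penalty is None:
        penalty = n_bins * math.log(n)           # assumed BIC-like penalty for n_bins extra rates
    best_tau, best_score = None, segment_loglik(counts, n_bins)
    for tau in range(min_seg, n - min_seg + 1):
        score = (segment_loglik(counts[:tau], n_bins)
                 + segment_loglik(counts[tau:], n_bins) - penalty)
        if score > best_score:
            best_tau, best_score = tau, score
    return best_tau
```

For instance, ten copies of one daily event pattern followed by ten copies of a clearly different pattern should give a changepoint at index 10, while twenty identical days should give `None`. Binning is the crudest way to let the rate vary with the time of day; a smoother rate estimate would change the details but not the structure of the search.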

Volume 80, Issue 4.

Citations: 0

Functional generalized canonical correlation analysis for studying multiple longitudinal variables.

IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae113

Lucas Sort, Laurent Le Brusquet, Arthur Tenenhaus

In this paper, we introduce functional generalized canonical correlation analysis, a new framework for exploring associations between multiple random processes observed jointly. The framework is based on the multiblock regularized generalized canonical correlation analysis framework. It is robust to sparsely and irregularly observed data, making it applicable in many settings. We establish the monotonic property of the solving procedure and introduce a Bayesian approach for estimating canonical components. We propose an extension of the framework that allows the integration of a univariate or multivariate response into the analysis, paving the way for predictive applications. We evaluate the method's efficiency in simulation studies and present a use case on a longitudinal dataset.

Citations: 0

Bayesian inference for group-level cortical surface image-on-scalar regression with Gaussian process priors.

IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae116

Andrew S Whiteman, Timothy D Johnson, Jian Kang

In regression-based analyses of group-level neuroimage data, researchers typically fit a series of marginal general linear models to image outcomes at each spatially referenced pixel. Spatial regularization of effects of interest is usually induced indirectly by applying spatial smoothing to the data during preprocessing. While this procedure often works well, the resulting inference can be poorly calibrated. Spatial modeling of effects of interest leads to more powerful analyses; however, the number of locations in a typical neuroimage can preclude standard computing methods in this setting. Here, we contribute a Bayesian spatial regression model for group-level neuroimaging analyses. We induce regularization of spatially varying regression coefficient functions through Gaussian process priors. When combined with a simple non-stationary model for the error process, our prior hierarchy can lead to more data-adaptive smoothing than standard methods. We achieve computational tractability through a Vecchia-type approximation of our prior that retains full spatial rank and can be constructed for a wide class of spatial correlation functions. We outline several ways to work with our model in practice and compare performance against standard vertex-wise analyses and several alternatives. Finally, we illustrate our methods in an analysis of cortical surface functional magnetic resonance imaging task contrast data from a large cohort of children enrolled in the adolescent brain cognitive development study.
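
To make the Vecchia-type idea concrete: a Gaussian joint density is factorized by the chain rule into univariate conditionals, $p(y) = \prod_i p(y_i \mid y_1, \dots, y_{i-1})$, and each conditional is approximated by conditioning on at most $m$ nearby earlier points instead of all of them. The Python sketch below applies this to a zero-mean one-dimensional Gaussian process with an exponential covariance; the covariance choice, the nearest-earlier-neighbour rule, and all names are illustrative assumptions, and it does not reproduce the authors' prior on the cortical surface. A hand-written Cholesky factorization keeps it dependency-free.

```python
import math
from typing import Callable, List, Sequence

Cov = Callable[[float, float], float]


def exp_cov(a: float, b: float, sigma2: float = 1.0, length: float = 1.0) -> float:
    """Exponential covariance sigma2 * exp(-|a - b| / length) (illustrative choice)."""
    return sigma2 * math.exp(-abs(a - b) / length)


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with A = L L^T for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def exact_loglik(x: Sequence[float], y: Sequence[float], cov: Cov = exp_cov) -> float:
    """Exact zero-mean Gaussian log-likelihood; cubic cost in the number of points."""
    n = len(x)
    L = cholesky([[cov(a, b) for b in x] for a in x])
    z = [0.0] * n
    for i in range(n):
        z[i] = (y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]  # z = L^{-1} y
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2 * math.pi) + logdet + sum(v * v for v in z))


def vecchia_loglik(x: Sequence[float], y: Sequence[float], m: int, cov: Cov = exp_cov) -> float:
    """Vecchia approximation: condition y[i] only on its m nearest earlier points."""
    total = 0.0
    for i in range(len(x)):
        xi = x[i]
        nbrs = sorted(range(i), key=lambda j: abs(x[j] - xi))[:m]
        mean, var = 0.0, cov(xi, xi)
        if nbrs:
            K = [[cov(x[a], x[b]) for b in nbrs] for a in nbrs]
            k = [cov(x[a], xi) for a in nbrs]
            w = chol_solve(cholesky(K), k)   # kriging weights K^{-1} k
            mean = sum(wj * y[j] for wj, j in zip(w, nbrs))
            var -= sum(wj * kj for wj, kj in zip(w, k))
        total += -0.5 * (math.log(2 * math.pi * var) + (y[i] - mean) ** 2 / var)
    return total
```

Two sanity checks follow from the chain rule: with m equal to the number of points minus one the approximation is exact for any ordering, and for an exponential covariance in one dimension with points sorted left to right the process is Markov, so m = 1 is already exact. In higher dimensions the approximation error shrinks as m grows, while the linear algebra for each conditional involves at most m points.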

Volume 80, Issue 4. PMC full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11518852/pdf/

Citations: 0

Likelihood adaptively incorporated external aggregate information with uncertainty for survival data.

IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae120

Ziqi Chen, Yu Shen, Jing Qin, Jing Ning

Population-based cancer registry databases are critical resources to bridge the information gap that results from a lack of sufficient statistical power from primary cohort data with small to moderate sample size. Although comprehensive data associated with tumor biomarkers often remain either unavailable or inconsistently measured in these registry databases, aggregate survival information sourced from these repositories has been well documented and publicly accessible. An appealing option is to integrate the aggregate survival information from the registry data with the primary cohort to enhance the evaluation of treatment impacts or prediction of survival outcomes across distinct tumor subtypes. Nevertheless, for rare types of cancer, even the sample sizes of cancer registries remain modest. The variability linked to the aggregated statistics could be non-negligible compared with the sample variation of the primary cohort. In response, we propose an externally informed likelihood approach, which facilitates the linkage between the primary cohort and external aggregate data, with consideration of the variation from aggregate information. We establish the asymptotic properties of the estimators and evaluate the finite sample performance via simulation studies. Through the application of our proposed method, we integrate data from the cohort of inflammatory breast cancer (IBC) patients at the University of Texas MD Anderson Cancer Center with aggregate survival data from the National Cancer Data Base, enabling us to appraise the effect of tri-modality treatment on survival across various tumor subtypes of IBC.

Volume 80, Issue 4. PMC full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11518850/pdf/

Citations: 0

On network deconvolution for undirected graphs.

IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae112

Zhaotong Lin, Isaac Pan, Wei Pan

Network deconvolution (ND) is a method to reconstruct a direct-effect network describing direct (or conditional) effects (or associations) between any two nodes from a given network depicting total (or marginal) effects (or associations). Its key idea is that, in a directed graph, a total effect can be decomposed into the sum of a direct effect and an indirect effect, with the latter further decomposed as the sum of various products of direct effects. This yields a simple closed-form solution for the direct-effect network, which makes ND useful for distinguishing direct from indirect effects. Although ND has also been applied to undirected graphs, why it works there is not well understood, which has left the method open to skepticism. We first clarify the implicit linear model assumption underlying ND, then derive a surprisingly simple result on the equivalence between ND and the use of precision matrices, offering justification and interpretation for the application of ND to undirected graphs. We also establish a formal result characterizing the effect of scaling a total-effect graph. Finally, leveraging large-scale genome-wide association study data, we show a novel application of ND to contrast marginal versus conditional genetic correlations between body height and risk of coronary artery disease; the results agree with a causal directed graph inferred using ND. We conclude that ND is a promising approach that is easy to apply to both directed and undirected graphs.
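
The closed-form solution mentioned in this abstract can be reconstructed from the stated decomposition, assuming that the indirect effects are the sum of all higher-order products of the direct-effect matrix and that the spectral radius of $G_{\mathrm{dir}}$ is below 1 so the series converges. The symbols $G_{\mathrm{tot}}$ (total-effect matrix) and $G_{\mathrm{dir}}$ (direct-effect matrix) are introduced here for illustration.

```latex
\begin{aligned}
G_{\mathrm{tot}} &= G_{\mathrm{dir}} + G_{\mathrm{dir}}^{2} + G_{\mathrm{dir}}^{3} + \cdots
                  = G_{\mathrm{dir}}\,(I - G_{\mathrm{dir}})^{-1}, \\
(I - G_{\mathrm{dir}})\,G_{\mathrm{tot}} &= G_{\mathrm{dir}}
  \;\Longrightarrow\; G_{\mathrm{tot}} = G_{\mathrm{dir}}\,(I + G_{\mathrm{tot}}), \\
G_{\mathrm{dir}} &= G_{\mathrm{tot}}\,(I + G_{\mathrm{tot}})^{-1}.
\end{aligned}
```

The second line uses the fact that $G_{\mathrm{dir}}$ commutes with $(I - G_{\mathrm{dir}})^{-1}$. The inverse in the last line always exists under these assumptions, because $I + G_{\mathrm{tot}} = (I - G_{\mathrm{dir}})^{-1}$.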

Volume 80, Issue 4. PMC full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11459367/pdf/

Citations: 0

ROMI: a randomized two-stage basket trial design to optimize doses for multiple indications.

IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae105

Shuqi Wang, Peter F Thall, Kentaro Takeda, Ying Yuan

Optimizing doses for multiple indications is challenging. The pooled approach of finding a single optimal biological dose (OBD) for all indications ignores that dose-response or dose-toxicity curves may differ between indications, resulting in varying OBDs. Conversely, indication-specific dose optimization often requires a large sample size. To address this challenge, we propose a Randomized two-stage basket trial design that Optimizes doses in Multiple Indications (ROMI). In stage 1, for each indication, response and toxicity are evaluated for a high dose, which may be a previously obtained maximum tolerated dose, with a rule that stops accrual to indications where the high dose is unsafe or ineffective. Indications not terminated proceed to stage 2, where patients are randomized between the high dose and a specified lower dose. A latent-cluster Bayesian hierarchical model is employed to borrow information between indications, while considering the potential heterogeneity of OBD across indications. Indication-specific utilities are used to quantify response-toxicity trade-offs. At the end of stage 2, for each indication with at least one acceptable dose, the dose with highest posterior mean utility is selected as optimal. Two versions of ROMI are presented, one using only stage 2 data for dose optimization and the other optimizing doses using data from both stages. Simulations show that both versions have desirable operating characteristics compared to designs that either ignore indications or optimize dose independently for each indication.
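
The end-of-stage-2 selection rule can be sketched in a few lines. The Python below assumes that, for one indication, each dose's posterior mean probabilities of the four joint (response, toxicity) outcomes are already available, together with a flag saying whether the dose passed the design's acceptability criteria; the utility table, its numbers, and the function names are hypothetical. Because expected utility is linear in the outcome probabilities, plugging in posterior mean probabilities gives the posterior mean utility.

```python
from typing import Dict, Optional, Tuple

Outcome = Tuple[int, int]  # (response, toxicity), each coded 0 or 1

# Hypothetical utility table on a 0-100 scale: response without toxicity is best,
# toxicity without response is worst.
UTILITY: Dict[Outcome, float] = {(1, 0): 100.0, (1, 1): 60.0, (0, 0): 40.0, (0, 1): 0.0}


def posterior_mean_utility(outcome_probs: Dict[Outcome, float],
                           utility: Dict[Outcome, float] = UTILITY) -> float:
    """Expected utility of one dose given posterior mean outcome probabilities."""
    return sum(utility[o] * p for o, p in outcome_probs.items())


def select_optimal_dose(doses: Dict[str, Dict[Outcome, float]],
                        acceptable: Dict[str, bool]) -> Optional[str]:
    """Among acceptable doses, pick the one with the highest posterior mean utility;
    return None when no dose is acceptable for this indication."""
    scores = {d: posterior_mean_utility(p) for d, p in doses.items() if acceptable.get(d, False)}
    if not scores:
        return None
    return max(scores, key=scores.get)
```

With hypothetical probabilities of 0.3, 0.3, 0.2, 0.2 for a high dose and 0.4, 0.1, 0.4, 0.1 for a low dose (in the order (1,0), (1,1), (0,0), (0,1)), the mean utilities are 56 and 62, so the low dose would be selected when both are acceptable.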

Volume 80, Issue 4. PMC full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11447723/pdf/

Citations: 0

Unlocking the power of multi-institutional data: Integrating and harmonizing genomic data across institutions.

IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae146

Yuan Chen, Ronglai Shen, Xiwen Feng, Katherine Panageas

Cancer is a complex disease driven by genomic alterations, and tumor sequencing is becoming a mainstay of clinical care for cancer patients. The emergence of multi-institution sequencing data presents a powerful resource for learning real-world evidence to enhance precision oncology. GENIE BPC, led by the American Association for Cancer Research, establishes a unique database linking genomic data with clinical information for patients treated at multiple cancer centers. However, leveraging sequencing data from multiple institutions presents significant challenges. Variability in gene panels can lead to loss of information when analyses focus only on genes common across panels. Additionally, differences in sequencing techniques and patient heterogeneity across institutions add complexity. High data dimensionality, sparse gene mutation patterns, and weak signals at the individual gene level further complicate matters. Motivated by these real-world challenges, we introduce the Bridge model. It uses a quantile-matched latent variable approach to derive integrated features that preserve information beyond common genes and maximize the use of all available data, while leveraging information sharing to enhance both learning efficiency and the model's capacity to generalize. By extracting harmonized and noise-reduced lower-dimensional latent variables, the true mutation pattern unique to each individual is captured. We assess the model's performance and parameter estimation through extensive simulation studies. The latent features extracted by the Bridge model consistently excel in predicting patient survival across six cancer types in GENIE BPC data.
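
Quantile matching on its own is a standard device: a value is mapped to the reference distribution's value at the same empirical quantile, $F_{\mathrm{ref}}^{-1}(F_{\mathrm{src}}(x))$. The Python sketch below shows only this generic step on plain numbers (for example, a score computed on one institution's gene panel mapped onto another institution's scale); it is not the Bridge model's latent-variable construction, and the example scenario and function name are assumptions.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def quantile_match(source: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Map each source value to the reference value at the same empirical quantile
    (empirical CDF of the source, then the step-function inverse ECDF of the reference)."""
    src = sorted(source)
    ref = sorted(reference)
    matched = []
    for x in source:
        p = bisect_right(src, x) / len(src)                       # source ECDF at x
        k = min(max(math.ceil(p * len(ref)) - 1, 0), len(ref) - 1)
        matched.append(ref[k])                                    # smallest reference value with ECDF >= p
    return matched
```

The mapping preserves the ranks of the source values while giving them the reference distribution's marginal shape, which is the sense in which matched features from different sources become comparable.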

Volume 80, Issue 4. PMC full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11647914/pdf/

Citations: 0

A generalized logrank-type test for comparison of treatment regimes in sequential multiple assignment randomized trials.

IF 1.4 | Zone 4 (Mathematics) | Q3 BIOLOGY

Biometrics

Pub Date : 2024-10-03 DOI: 10.1093/biomtc/ujae139

Anastasios A Tsiatis, Marie Davidian

The sequential multiple assignment randomized trial (SMART) is the ideal study design for the evaluation of multistage treatment regimes, which comprise sequential decision rules that recommend treatments for a patient at each of a series of decision points based on their evolving characteristics. A common goal is to compare the set of so-called embedded regimes represented in the design on the basis of a primary outcome of interest. In the study of chronic diseases and disorders, this outcome is often a time to an event, and a goal is to compare the distributions of the time-to-event outcome associated with each regime in the set. We present a general statistical framework in which we develop a logrank-type test for comparison of the survival distributions associated with regimes within a specified set based on the data from a SMART with an arbitrary number of stages that allows incorporation of covariate information to enhance efficiency and can also be used with data from an observational study. The framework provides clarification of the assumptions required to yield a principled test procedure, and the proposed test subsumes or offers an improved alternative to existing methods. We demonstrate performance of the methods in a suite of simulation studies. The methods are applied to a SMART in patients with acute promyelocytic leukemia.
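
For readers less familiar with the logrank family, the Python sketch below computes the classical two-sample logrank statistic: at each distinct event time, the observed number of events in group 1 is compared with its expectation under equal hazards, and the accumulated difference is standardized by its hypergeometric variance. The authors' test generalizes this idea to embedded regimes in a SMART, with covariate adjustment; none of that is reproduced here, and the function name and input layout are assumptions.

```python
import math
from typing import Sequence, Tuple


def logrank_test(times: Sequence[float], events: Sequence[int],
                 groups: Sequence[int]) -> Tuple[float, float]:
    """Classical two-sample logrank test. events[i] is 1 for an observed event and
    0 for censoring; groups[i] is 0 or 1. Returns (chi-square statistic, p-value)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0  # observed minus expected events in group 1
    var = 0.0        # accumulated hypergeometric variance
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if groups[i] == 1)
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and groups[i] == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = o_minus_e ** 2 / var if var > 0 else 0.0
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of a chi-square with 1 df
    return stat, p_value
```

The statistic is symmetric in the group labels, since swapping them only flips the sign of the observed-minus-expected sum; identical event histories in the two groups give a statistic of 0 and a p-value of 1.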

Volume 80, Issue 4. PMC full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11636965/pdf/

Citations: 0

A likelihood approach to incorporating self-report data in HIV recency classification.

IF 1.4 · Zone 4 (Mathematics) · Q3 BIOLOGY

Biometrics

Pub Date: 2024-10-03 DOI: 10.1093/biomtc/ujae147

Wenlong Yang, Danping Liu, Le Bao, Runze Li

Estimating new HIV infections is important yet challenging because it is difficult to distinguish recent infections from long-term ones. We demonstrate that HIV recency status (recent versus long-term) can be determined from self-reported testing history and biomarkers, both of which are increasingly available in bio-behavioral surveys. Given the self-reported testing history, HIV recency status is partially observed. For example, a person who tested positive for HIV more than 1 year ago should have a long-term infection. Using the nationally representative samples collected by the Population-based HIV Impact Assessment (PHIA) Project, we propose a likelihood-based probabilistic model for HIV recency classification. The model includes both individuals whose recency status is known from their testing histories and individuals whose recency status cannot be determined. It integrates two mechanisms: how HIV recency status depends on biomarkers, and how recency status, together with the self-reported time of the most recent HIV test, affects the test results. We compare our method with logistic regression and the binary classification tree (current practice) on Malawi PHIA data and on simulated data. Our model yields more efficient and less biased parameter estimates and is relatively robust to potential reporting error and model misspecification.
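
A toy likelihood can illustrate how a partially observed class label enters the model. The Python sketch below is a simplified illustration built on stated assumptions, not the PHIA model:

- The probability of recent infection is a logistic function of a single hypothetical biomarker `x`.
- Individuals whose status is known contribute that probability directly.
- Individuals with undetermined status contribute a mixture over both statuses. The mixture is weighted by hypothetical factors `a = P(history | recent)` and `b = P(history | long-term)`, which stand in for the authors' testing-history mechanism.
- The `known_status` helper encodes only the abstract's single example rule (positive test more than 1 year ago implies long-term infection).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def known_status(years_since_first_positive):
    """Status implied by self-reported history (1 = recent, 0 = long-term).

    Encodes only the example rule from the abstract: a positive test more
    than 1 year ago implies long-term infection; otherwise unknown (None).
    """
    if years_since_first_positive is not None and years_since_first_positive > 1:
        return 0
    return None


def loglik_and_grad(beta, known, unknown):
    """Log-likelihood (and gradient) with a partially observed binary status.

    beta    : (intercept, slope) of logit P(recent | biomarker x)
    known   : list of (x, r) with r = 1 (recent) or 0 (long-term)
    unknown : list of (x, a, b); a = P(history | recent) and
              b = P(history | long-term) are hypothetical inputs
    """
    ll, g0, g1 = 0.0, 0.0, 0.0
    for x, r in known:
        p = sigmoid(beta[0] + beta[1] * x)
        ll += math.log(p if r == 1 else 1.0 - p)
        g0 += r - p
        g1 += (r - p) * x
    for x, a, b in unknown:
        p = sigmoid(beta[0] + beta[1] * x)
        mix = a * p + b * (1.0 - p)  # marginalize over the unobserved status
        ll += math.log(mix)
        w = (a - b) * p * (1.0 - p) / mix  # d log(mix) / d linear predictor
        g0 += w
        g1 += w * x
    return ll, (g0, g1)


def fit(known, unknown, step=0.05, iters=2000):
    """Plain gradient ascent; adequate for a toy example only."""
    beta = (0.0, 0.0)
    for _ in range(iters):
        _, (g0, g1) = loglik_and_grad(beta, known, unknown)
        beta = (beta[0] + step * g0, beta[1] + step * g1)
    return beta


# Hypothetical toy data: x is a standardized biomarker value
known = [(2.0, 1), (1.5, 1), (0.0, 1), (1.0, 0), (0.5, 0), (-1.0, 0)]
unknown = [(1.8, 0.9, 0.1), (-0.5, 0.3, 0.7)]
beta_hat = fit(known, unknown)
print(beta_hat, loglik_and_grad(beta_hat, known, unknown)[0])
```

Gradient ascent is used here only to keep the sketch dependency-free. A real analysis would use a proper optimizer and the full model described in the abstract.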

Biometrics, Volume 80, Issue 4. PMC full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11647912/pdf/

Citations: 0

Cumulative link mixed-effects models in the service of remote sensing crop progress monitoring.

IF 1.4 · Zone 4 (Mathematics) · Q3 BIOLOGY

Biometrics

Pub Date: 2024-10-03 DOI: 10.1093/biomtc/ujae137

Ioannis Oikonomidis, Samis Trevezas

This study introduces an innovative cumulative link modeling (CLM) approach to monitoring crop progress over large areas using remote sensing data. Two models are developed: a fixed-effects CLM and a mixed-effects CLM that incorporates annual random effects to capture the inherent inter-seasonal variability. Inference is based on partial likelihood under two distributional variants: the standard CLM based on the multinomial distribution and a novel one based on the product binomial distribution. Model performance is evaluated on eight crops (corn, oats, sorghum, soybeans, winter wheat, alfalfa, dry beans, and millet) using 20 years of in situ data from Nebraska, USA. The models use calendar time, thermal time, and the normalized difference vegetation index (NDVI) as predictors. The results demonstrate that the approach applies broadly across crops, providing large-scale predictions of crop progress and allowing the estimation of important agronomic parameters. To facilitate reproducibility, an ecosystem of R packages has been developed and made publicly available under the name Ages of Man. The packages can be used to implement the presented methodology in any area with this type of data, including the USA.
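
The following sketch shows how a cumulative link model assigns probabilities to ordered crop stages. It assumes a logit link and the parameterization logit P(Y <= j) = theta_j - eta - u, as used by the R package `ordinal` (`clm`/`clmm`). Here:

- theta_j are increasing stage thresholds.
- eta is the linear predictor; the sketch uses a single hypothetical covariate such as thermal time.
- u is an annual random intercept.

The sketch evaluates the standard multinomial log-likelihood (one trial per observation) conditional on given year effects. It has three limitations: it does not integrate over the random effects as a real mixed-model fit would, it does not implement the authors' product binomial variant, and it is not the interface of the Ages of Man packages.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def clm_probs(thresholds, eta, random_effect=0.0):
    """Stage probabilities under a logit cumulative link model.

    P(Y <= j) = sigmoid(theta_j - eta - u) for j = 0..K-2 with strictly
    increasing thresholds; the last stage receives the remaining mass.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [sigmoid(t - eta - random_effect) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs


def multinomial_loglik(thresholds, beta, data, year_effects):
    """Log-likelihood of observed stages, conditional on given year effects.

    data         : list of (x, stage_index, year), x a hypothetical covariate
    year_effects : dict mapping year -> random intercept u (0 if missing)
    """
    total = 0.0
    for x, stage, year in data:
        p = clm_probs(thresholds, beta * x, year_effects.get(year, 0.0))
        total += math.log(p[stage])
    return total


# Hypothetical example: 4 ordered stages and one standardized covariate
thresholds = [-1.0, 0.5, 2.0]
for thermal_time in (0.0, 1.0, 2.0):
    print(thermal_time, [round(q, 3) for q in clm_probs(thresholds, 1.2 * thermal_time)])
```

Raising eta (for example, through more accumulated thermal time) shifts probability toward later stages. This is how a single linear predictor can describe the whole progression through a season.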

Biometrics, Volume 80, Issue 4.

Citations: 0