This work revisits optimal response-adaptive designs from a type-I error rate perspective, highlighting when and by how much these allocations exacerbate type-I error rate inflation, an issue previously undocumented. We explore a range of approaches from the literature that can be applied to reduce type-I error rate inflation, but find that none of them provides a robust solution to the problem. To address this, we derive 2 optimal allocation proportions, incorporating the more robust score test (instead of the Wald test) with finite-sample estimators (instead of the unknown true values) in the formulation of the optimization problem. One proportion optimizes statistical power, and the other minimizes the total number of failures in a trial while maintaining a fixed variance level. Through simulations based on an early-phase and a confirmatory trial, we provide crucial practical insight into how these new optimal proportion designs can offer substantial patient-outcome advantages while controlling the type-I error rate. While we focus on binary outcomes, the framework offers valuable insights that naturally extend to other outcome types, multi-armed trials, and alternative measures of interest.
{"title":"Revisiting optimal allocations for binary responses: insights from considering type-I error rate control.","authors":"Lukas Pin, Sofía S Villar, William F Rosenberger","doi":"10.1093/biomtc/ujaf114","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf114","url":null,"abstract":"<p><p>This work revisits optimal response-adaptive designs from a type-I error rate perspective, highlighting when and how much these allocations exacerbate type-I error rate inflation-an issue previously undocumented. We explore a range of approaches from the literature that can be applied to reduce type-I error rate inflation. However, we found that all of these approaches fail to give a robust solution to the problem. To address this, we derive 2 optimal allocation proportions, incorporating the more robust score test (instead of the Wald test) with finite sample estimators (instead of the unknown true values) in the formulation of the optimization problem. One proportion optimizes statistical power, and the other minimizes the total number of failures in a trial while maintaining a fixed variance level. Through simulations based on an early phase and a confirmatory trial, we provide crucial practical insight into how these new optimal proportion designs can offer substantial patient outcomes advantages while controlling type-I error rate. While we focused on binary outcomes, the framework offers valuable insights that naturally extend to other outcome types, multi-armed trials, and alternative measures of interest.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144941151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Inspired by logistic regression, we introduce a regression model for data tuples consisting of a binary response and a set of covariates residing in a metric space without vector structure. Based on the proposed model, we also develop a binary classifier for metric-space valued data. We propose a maximum likelihood estimator for the metric-space valued regression coefficient in the model, and provide upper bounds on the estimation error under various metric entropy conditions that quantify the complexity of the underlying metric space. Matching lower bounds are derived for important metric spaces commonly seen in statistics, establishing optimality of the proposed estimator in such spaces. A finer upper bound and a matching lower bound, and thus optimality of the proposed classifier, are established for Riemannian manifolds. To the best of our knowledge, the proposed regression model and the above minimax bounds are the first of their kind for analyzing a binary response with covariates residing in general metric spaces. We also investigate the numerical performance of the proposed estimator and classifier via simulation studies, and illustrate their practical merits via an application to task-related fMRI data.
{"title":"Binary regression and classification with covariates in metric spaces.","authors":"Yinan Lin, Zhenhua Lin","doi":"10.1093/biomtc/ujaf123","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf123","url":null,"abstract":"<p><p>Inspired by logistic regression, we introduce a regression model for data tuples consisting of a binary response and a set of covariates residing in a metric space without vector structures. Based on the proposed model, we also develop a binary classifier for metric-space valued data. We propose a maximum likelihood estimator for the metric-space valued regression coefficient in the model, and provide upper bounds on the estimation error under various metric entropy conditions that quantify complexity of the underlying metric space. Matching lower bounds are derived for the important metric spaces commonly seen in statistics, establishing optimality of the proposed estimator in such spaces. A finer upper bound and a matching lower bound, and thus optimality of the proposed classifier, are established for Riemannian manifolds. To the best of our knowledge, the proposed regression model and the above minimax bounds are the first of their kind for analyzing a binary response with covariates residing in general metric spaces. We also investigate the numerical performance of the proposed estimator and classifier via simulation studies, and illustrate their practical merits via an application to task-related fMRI data.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145091014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Easy-to-interpret effect estimands are highly desirable in survival analysis. In the competing risks framework, one good candidate is the restricted mean time lost (RMTL). It is defined as the area under the cumulative incidence function up to a prespecified time point and, thus, it summarizes the cumulative incidence function into a meaningful estimand. While existing RMTL-based tests are limited to 2-sample comparisons and mostly to 2 event types, we aim to develop general contrast tests for factorial designs and an arbitrary number of event types based on a Wald-type test statistic. Furthermore, we avoid the often-made, rather restrictive continuity assumption on the event time distribution. This allows for ties in the data, which often occur in practical applications, for example, when event times are measured in whole days. In addition, we develop more reliable tests for RMTL comparisons that are based on a permutation approach to improve the small sample performance. In a second step, multiple tests for RMTL comparisons are developed to test several null hypotheses simultaneously. Here, we incorporate the asymptotically exact dependence structure between the local test statistics to gain more power. The small sample performance of the proposed testing procedures is analyzed in simulations and finally illustrated by analyzing a real-data example about leukemia patients who underwent bone marrow transplantation.
{"title":"Multiple tests for restricted mean time lost with competing risks data.","authors":"Merle Munko, Dennis Dobler, Marc Ditzhaus","doi":"10.1093/biomtc/ujaf086","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf086","url":null,"abstract":"<p><p>Easy-to-interpret effect estimands are highly desirable in survival analysis. In the competing risks framework, one good candidate is the restricted mean time lost (RMTL). It is defined as the area under the cumulative incidence function up to a prespecified time point and, thus, it summarizes the cumulative incidence function into a meaningful estimand. While existing RMTL-based tests are limited to 2-sample comparisons and mostly to 2 event types, we aim to develop general contrast tests for factorial designs and an arbitrary number of event types based on a Wald-type test statistic. Furthermore, we avoid the often-made, rather restrictive continuity assumption on the event time distribution. This allows for ties in the data, which often occur in practical applications, for example, when event times are measured in whole days. In addition, we develop more reliable tests for RMTL comparisons that are based on a permutation approach to improve the small sample performance. In a second step, multiple tests for RMTL comparisons are developed to test several null hypotheses simultaneously. Here, we incorporate the asymptotically exact dependence structure between the local test statistics to gain more power. The small sample performance of the proposed testing procedures is analyzed in simulations and finally illustrated by analyzing a real-data example about leukemia patients who underwent bone marrow transplantation.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144741073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Public health data are often spatially dependent, but standard spatial regression methods can suffer from bias and invalid inference when the independent variable is associated with spatially correlated residuals. This could occur if, for example, there is an unmeasured environmental contaminant associated with the independent and outcome variables in a spatial regression analysis. Geoadditive structural equation modeling (gSEM), in which an estimated spatial trend is removed from both the explanatory and response variables before estimating the parameters of interest, has previously been proposed as a solution, but there has been little investigation of gSEM's properties with point-referenced data. We link gSEM to results on double machine learning and semiparametric regression based on two-stage procedures. We propose using these semiparametric estimators for spatial regression, using Gaussian processes with Matérn covariance to estimate the spatial trends, and term this class of estimators double spatial regression (DSR). We derive regularity conditions for root-n consistency and asymptotic normality together with closed-form variance estimation, and show that, in simulations where standard spatial regression estimators are highly biased and have poor coverage, DSR mitigates bias more effectively than competitors and attains nominal coverage.
{"title":"Two-stage estimators for spatial confounding with point-referenced data.","authors":"Nate Wiecha, Jane A Hoppin, Brian J Reich","doi":"10.1093/biomtc/ujaf093","DOIUrl":"10.1093/biomtc/ujaf093","url":null,"abstract":"<p><p>Public health data are often spatially dependent, but standard spatial regression methods can suffer from bias and invalid inference when the independent variable is associated with spatially correlated residuals. This could occur if, for example, there is an unmeasured environmental contaminant associated with the independent and outcome variables in a spatial regression analysis. Geoadditive structural equation modeling (gSEM), in which an estimated spatial trend is removed from both the explanatory and response variables before estimating the parameters of interest, has previously been proposed as a solution but there has been little investigation of gSEM's properties with point-referenced data. We link gSEM to results on double machine learning and semiparametric regression based on two-stage procedures. We propose using these semiparametric estimators for spatial regression using Gaussian processes with Matèrn covariance to estimate the spatial trends and term this class of estimators double spatial regression (DSR). We derive regularity conditions for root-n asymptotic normality and consistency and closed-form variance estimation, and show that in simulations where standard spatial regression estimators are highly biased and have poor coverage, DSR can mitigate bias more effectively than competitors and obtain nominal coverage.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288666/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144706200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In clinical trials where long follow-up is required to measure the primary outcome of interest, there is substantial interest in using an accepted surrogate outcome that can be measured earlier in time or with less cost to estimate a treatment effect. For example, in clinical trials of chronic kidney disease, the effect of a treatment is often demonstrated on a longitudinal surrogate: the change in the longitudinal outcome (glomerular filtration rate, GFR) per year, or GFR slope. However, estimating the effect of a treatment on the GFR slope is complicated by the fact that GFR measurement can be terminated by the occurrence of a terminal event, such as death or kidney failure. Thus, to estimate this effect, one must consider both the longitudinal GFR trajectory and the terminal event process. In this paper, we build a semiparametric framework to jointly model the longitudinal outcome and the terminal event, where the model for the longitudinal outcome is semiparametric, the relationship between the longitudinal outcome and the terminal event is nonparametric, and the terminal event is modeled via a semiparametric Cox model. The proposed semiparametric joint model is flexible and can be easily extended to include a nonlinear trajectory of the longitudinal outcome. An estimating-equation-based method is proposed to estimate the treatment effect on the longitudinal surrogate outcome (eg, GFR slope). Theoretical properties of the proposed estimators are derived, and finite sample performance is evaluated through simulation studies. We illustrate the proposed method using data from the Reduction of Endpoints in NIDDM with the Angiotensin II Antagonist Losartan (RENAAL) trial to examine the effect of losartan on GFR slope.
{"title":"Semiparametric joint modeling to estimate the treatment effect on a longitudinal surrogate with application to chronic kidney disease trials.","authors":"Xuan Wang, Jie Zhou, Layla Parast, Tom Greene","doi":"10.1093/biomtc/ujaf104","DOIUrl":"10.1093/biomtc/ujaf104","url":null,"abstract":"<p><p>In clinical trials where long follow-up is required to measure the primary outcome of interest, there is substantial interest in using an accepted surrogate outcome that can be measured earlier in time or with less cost to estimate a treatment effect. For example, in clinical trials of chronic kidney disease, the effect of a treatment is often demonstrated on a longitudinal surrogate, the change of the longitudinal outcome (glomerular filtration rate, GFR) per year or GFR slope. However, estimating the effect of a treatment on the GFR slope is complicated by the fact that GFR measurement can be terminated by the occurrence of a terminal event, such as death or kidney failure. Thus, to estimate this effect, one must consider both the longitudinal GFR trajectory and the terminal event process. In this paper, we build a semiparametric framework to jointly model the longitudinal outcome and the terminal event, where the model for the longitudinal outcome is semiparametric, the relationship between the longitudinal outcome and the terminal event is nonparametric, and the terminal event is modeled via a semiparametric Cox model. The proposed semiparametric joint model is flexible and can be easily extended to include a nonlinear trajectory of the longitudinal outcome. An estimating equation based method is proposed to estimate the treatment effect on the longitudinal surrogate outcome (eg, GFR slope). Theoretical properties of the proposed estimators are derived, and finite sample performance is evaluated through simulation studies. We illustrate the proposed method using data from the Reduction of Endpoints in NIDDM with the Angiotensin II Antagonist Losartan (RENAAL) trial to examine the effect of Losartan on GFR slope.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12320702/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144783416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
For progressive rare diseases like Duchenne muscular dystrophy (DMD), evaluating disease burden by measuring the totality of evidence from outcome data over time per patient can be highly informative, especially regarding how a new treatment impacts disease progression and functional outcomes. This paper focuses on new statistical approaches for analyzing data generated over time in a small sample, sequential, multiple assignment, randomized trial (snSMART), with an application to DMD. In addition, the use of external control data can enhance statistical and operational efficiency in rare disease drug development by mitigating participant scarcity and ethical challenges. We employ a two-step robust meta-analytic approach to leverage external control data while adjusting for important baseline confounders and potential conflicts between external controls and trial data. Furthermore, our approach integrates important baseline covariates to account for patient heterogeneity and introduces a novel piecewise model to manage stage-wise treatment assignments. By applying this methodology to a case study in DMD research, we not only demonstrate the practical application and benefits of our approach but also highlight its potential to mitigate challenges in rare disease trials. Our findings advocate for a more nuanced and statistically robust analysis of treatment effects, thereby improving the reliability of clinical trial results.
{"title":"Evaluating longitudinal treatment effects for Duchenne muscular dystrophy using dynamically enriched Bayesian small sample, sequential, multiple assignment randomized trial (snSMART).","authors":"Sidi Wang, Satrajit Roychoudhury, Kelley M Kidwell","doi":"10.1093/biomtc/ujaf103","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf103","url":null,"abstract":"<p><p>For progressive rare diseases like Duchenne muscular dystrophy (DMD), evaluating disease burden by measuring the totality of evidence from outcome data over time per patient can be highly informative, especially regarding how a new treatment impacts disease progression and functional outcomes. This paper focuses on new statistical approaches for analyzing data generated over time in a small sample, sequential, multiple assignment, randomized trial (snSMART), with an application to DMD. In addition, the use of external control data can enhance the statistical and operational efficiency in rare disease drug development by solving participant scarcity issues and ethical challenges. We employ a two-step robust meta-analytic approach to leverage external control data while adjusting for important baseline confounders and potential conflicts between external controls and trial data. Furthermore, our approach integrates important baseline covariates to account for patient heterogeneity and introduces a novel piecewise model to manage stage-wise treatment assignments. By applying this methodology to a case study in DMD research, we not only demonstrate the practical application and benefits of our approach but also highlight its potential to mitigate challenges in rare disease trials. Our findings advocate for a more nuanced and statistically robust analysis of treatment effects, thereby improving the reliability of clinical trial results.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144833844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The joint spatial distribution of two count outcomes (eg, counts of two diseases) is usually studied using a Poisson shared component model (P-SCM), which uses geographically structured latent variables to model spatial variations that are specific to and shared by both outcomes. In this model, the correlation between the outcomes is assumed to be fully accounted for by the latent variables. However, in this article, we show that when the outcomes have an unknown number of cases in common, the bivariate counts exhibit a positive "residual" correlation, which the P-SCM wrongly attributes to the covariance of the latent variables, leading to biased inference and degraded predictive performance. Accordingly, we propose a new SCM based on the bivariate Poisson distribution (BP-SCM hereafter) to study such correlated bivariate data. The BP-SCM decomposes each count into counts of common and distinct cases, and then models each of these three counts (two distinct and one common) using Gaussian Markov Random Fields. The model is formulated in a Bayesian framework using Hamiltonian Monte Carlo inference. Simulations and a real-world application show the good inferential and predictive performance of the BP-SCM and confirm the bias of the P-SCM. The BP-SCM provides rich epidemiological information, such as the mean levels of the unknown counts of common and distinct cases, and their shared and specific spatial variations.
{"title":"Joint disease mapping for bivariate count data with residual correlation due to unknown number of common cases.","authors":"Edouard Chatignoux, Zoé Uhry, Laurent Remontet, Isabelle Albert","doi":"10.1093/biomtc/ujaf119","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf119","url":null,"abstract":"<p><p>The joint spatial distribution of two count outcomes (eg, counts of two diseases) is usually studied using a Poisson shared component model (P-SCM), which uses geographically structured latent variables to model spatial variations that are specific and shared by both outcomes. In this model, the correlation between the outcomes is assumed to be fully accounted for by the latent variables. However, in this article, we show that when the outcomes have an unknown number of cases in common, the bivariate counts exhibit a positive \"residual\" correlation, which the P-SCM wrongly attributes to the covariance of the latent variables, leading to biased inference and degraded predictive performance. Accordingly, we propose a new SCM based on the Bivariate-Poisson distribution (BP-SCM hereafter) to study such correlated bivariate data. The BP-SCM decomposes each count into counts of common and distinct cases, and then models each of these three counts (two distinct and one common) using Gaussian Markov Random Fields. The model is formulated in a Bayesian framework using Hamiltonian Monte Carlo inference. Simulations and a real-world application showed the good inferential and predictive performances of the BP-SCM and confirm the bias in P-SCM. BP-SCM provides rich epidemiological information, such as the mean levels of the unknown counts of common and distinct cases, and their shared and specific spatial variations.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144941109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The case² study, also referred to as the case-case study design, is a valuable approach for conducting inference about treatment effects. Unlike traditional case-control studies, the case² design compares treatment in cases of concern (the first type of case) to other cases (the second type of case). A key quantity of interest is the attributable effect for the first type of case, that is, the number of cases of the first type that would not have occurred had the treatment been withheld from all units. Two key assumptions usually made for inference about this attributable effect in case² studies are that (1) the treatment does not cause the second type of case, and (2) the treatment does not alter an individual's case type. However, these assumptions are not realistic in many real-data applications. In this article, we present a sensitivity analysis framework to scrutinize the impact of deviations from these assumptions on inferences for the attributable effect. We also include sensitivity analyses for unmeasured confounding, recognizing the potential bias introduced by unobserved covariates. The proposed methodology is exemplified through an investigation into whether violent behavior in the last year of life increases suicide risk, using the 1993 National Mortality Followback Survey dataset.
{"title":"Sensitivity analysis for attributable effects in case2 studies.","authors":"Kan Chen, Ting Ye, Dylan S Small","doi":"10.1093/biomtc/ujaf102","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf102","url":null,"abstract":"<p><p>The case$^2$ study, also referred to as the case-case study design, is a valuable approach for conducting inference for treatment effects. Unlike traditional case-control studies, the case$^2$ design compares treatment in cases of concern (the first type of case) to other cases (the second type of case). One of the quantities of interest is the attributable effect for the first type of case-that is, the number of the first type of case that would not have occurred had the treatment been withheld from all units. In some case$^2$ studies, a key quantity of interest is the attributable effect for the first type of case. Two key assumptions that are usually made for making inferences about this attributable effect in case$^2$ studies are (1) treatment does not cause the second type of case, and (2) the treatment does not alter an individual's case type. However, these assumptions are not realistic in many real-data applications. In this article, we present a sensitivity analysis framework to scrutinize the impact of deviations from these assumptions on inferences for the attributable effect. We also include sensitivity analyses related to the assumption of unmeasured confounding, recognizing the potential bias introduced by unobserved covariates. The proposed methodology is exemplified through an investigation into whether having violent behavior in the last year of life increases suicide risk using the 1993 National Mortality Followback Survey dataset.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144941146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian Cox semiparametric regression is an important problem in many clinical settings. The elliptical information geometry of Cox models is underutilized in Bayesian inference but can effectively bridge survival analysis and hierarchical Gaussian models. Survival models should be able to incorporate multilevel modeling, such as case weights, frailties, and smoothing splines, in a straightforward manner similar to Gaussian models. To tackle these challenges, we propose the Cox-Pólya-Gamma algorithm for Bayesian multilevel Cox semiparametric regression and survival functions. Our novel computational procedure succinctly addresses the difficult problem of monotonicity-constrained modeling of the nonparametric baseline cumulative hazard along with multilevel regression. We develop two key strategies based on the elliptical geometry of Cox models that allow computation to be implemented in a few lines of code. First, we exploit an approximation between Cox models and negative binomial processes through the Poisson process to reduce Bayesian computation to iterative Gaussian sampling. Next, we appeal to sufficient dimension reduction to address the difficult computation of nonparametric baseline cumulative hazards, allowing the Markov transition within the Gibbs sampler to be collapsed based on beta sufficient statistics. We explore conditions for uniform ergodicity of the Cox-Pólya-Gamma algorithm. We provide software and demonstrate our multilevel modeling approach using open-source data and simulations.
{"title":"The Cox-Pólya-Gamma algorithm for flexible Bayesian inference of multilevel survival models.","authors":"Benny Ren, Jeffrey S Morris, Ian Barnett","doi":"10.1093/biomtc/ujaf121","DOIUrl":"10.1093/biomtc/ujaf121","url":null,"abstract":"<p><p>Bayesian Cox semiparametric regression is an important problem in many clinical settings. The elliptical information geometry of Cox models is underutilized in Bayesian inference but can effectively bridge survival analysis and hierarchical Gaussian models. Survival models should be able to incorporate multilevel modeling such as case weights, frailties, and smoothing splines, in a straightforward manner similar to Gaussian models. To tackle these challenges, we propose the Cox-Pólya-Gamma algorithm for Bayesian multilevel Cox semiparametric regression and survival functions. Our novel computational procedure succinctly addresses the difficult problem of monotonicity-constrained modeling of the nonparametric baseline cumulative hazard along with multilevel regression. We develop two key strategies based on the elliptical geometry of Cox models that allows computation to be implemented in a few lines of code. First, we exploit an approximation between Cox models and negative binomial processes through the Poisson process to reduce Bayesian computation to iterative Gaussian sampling. Next, we appeal to sufficient dimension reduction to address the difficult computation of nonparametric baseline cumulative hazards, allowing for the collapse of the Markov transition within the Gibbs sampler based on beta sufficient statistics. We explore conditions for uniform ergodicity of the Cox-Pólya-Gamma algorithm. We provide software and demonstrate our multilevel modeling approach using open-source data and simulations.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12449235/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145091074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Microbiome research has immense potential for unlocking insights into human health and disease. A common goal in human microbiome research is identifying subgroups of individuals with similar microbial composition that may be linked to specific health states or environmental exposures. However, existing clustering methods are often not equipped to accommodate the complex structure of microbiome data and typically make limiting assumptions regarding the number of clusters in the data, which can bias inference. Designed for zero-inflated multivariate compositional count data collected in microbiome research, we propose a novel Bayesian semiparametric mixture modeling framework that simultaneously learns the number of clusters in the data while performing cluster allocation. In simulation, we demonstrate the clustering performance of our method compared to distance- and model-based alternatives and the importance of accommodating zero-inflation when present in the data. We then apply the model to identify clusters in microbiome data collected in a study designed to investigate the relation between gut microbial composition and enteric diarrheal disease.
{"title":"A Bayesian semiparametric mixture model for clustering zero-inflated microbiome data.","authors":"Suppapat Korsurat, Matthew D Koslovsky","doi":"10.1093/biomtc/ujaf125","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf125","url":null,"abstract":"<p><p>Microbiome research has immense potential for unlocking insights into human health and disease. A common goal in human microbiome research is identifying subgroups of individuals with similar microbial composition that may be linked to specific health states or environmental exposures. However, existing clustering methods are often not equipped to accommodate the complex structure of microbiome data and typically make limiting assumptions regarding the number of clusters in the data which can bias inference. Designed for zero-inflated multivariate compositional count data collected in microbiome research, we propose a novel Bayesian semiparametric mixture modeling framework that simultaneously learns the number of clusters in the data while performing cluster allocation. In simulation, we demonstrate the clustering performance of our method compared to distance- and model-based alternatives and the importance of accommodating zero-inflation when present in the data. We then apply the model to identify clusters in microbiome data collected in a study designed to investigate the relation between gut microbial composition and enteric diarrheal disease.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145124127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}