Statistics in Medicine: Latest Articles (Page 3)

Improving Survey Inference Using Administrative Records Without Releasing Individual-Level Continuous Data.

IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics in Medicine

Pub Date : 2024-12-30 Epub Date: 2024-11-18 DOI: 10.1002/sim.10270

Sharifa Z Williams, Jungang Zou, Yutao Liu, Yajuan Si, Sandro Galea, Qixuan Chen

Probability surveys are challenged by increasing nonresponse rates, resulting in biased statistical inference. Auxiliary information about populations can be used to reduce bias in estimation. Continuous auxiliary variables in administrative records are often discretized before being released to the public to avoid confidentiality breaches. This may weaken the utility of the administrative records in improving survey estimates, particularly when there is a strong relationship between continuous auxiliary information and the survey outcome. In this paper, we propose a two-step strategy. In the first step, statistical agencies use the confidential continuous auxiliary data in the population to estimate the response propensity score of the survey sample, which is then included in a modified population dataset for data users. In the second step, data users who do not have access to confidential continuous auxiliary data conduct predictive survey inference by including discretized continuous variables and the propensity score as predictors using splines in a Bayesian model. We show by simulation that the proposed method performs well, yielding more efficient estimates of population means with 95% credible intervals providing better coverage than alternative approaches. We illustrate the proposed method using the Ohio Army National Guard Mental Health Initiative (OHARNG-MHI). The methods developed in this work are readily available in the R package AuxSurvey.
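The two-step workflow described in this abstract can be illustrated with a minimal sketch. This is not the AuxSurvey implementation: the simulated data, the helper names `fit_logistic` and `design`, and the use of ordinary least squares with a quadratic in the logit propensity (standing in for the Bayesian spline model) are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Fit logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta

# Hypothetical population: z is the confidential continuous auxiliary variable.
rng = np.random.default_rng(0)
N = 5000
z = rng.normal(size=N)
y = 2.0 + 1.5 * z + rng.normal(size=N)          # outcome, used only for respondents
p_resp = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * z)))
responded = rng.random(N) < p_resp              # nonresponse depends on z

# Step 1 (statistical agency): estimate response propensity from the continuous z,
# then release only the discretized z and the propensity score.
X = np.column_stack([np.ones(N), z])
beta = fit_logistic(X, responded.astype(float))
pscore = 1.0 / (1.0 + np.exp(-X @ beta))
z_cat = np.digitize(z, np.quantile(z, [0.25, 0.5, 0.75]))   # categories 0..3

# Step 2 (data user): predict the outcome for nonrespondents from z_cat and pscore.
def design(z_cat, pscore):
    """Design matrix: intercept, category dummies, and a quadratic in logit(pscore)."""
    dummies = (z_cat[:, None] == np.arange(1, 4)).astype(float)
    logit = np.log(pscore / (1.0 - pscore))
    return np.column_stack([np.ones(len(z_cat)), dummies, logit, logit ** 2])

D = design(z_cat, pscore)
coef, *_ = np.linalg.lstsq(D[responded], y[responded], rcond=None)
y_hat = D @ coef

# Predictive estimate: observed outcomes for respondents, predictions for the rest.
est = np.where(responded, y, y_hat).mean()
naive = y[responded].mean()     # respondent-only mean, biased under this nonresponse
truth = y.mean()
print(f"truth={truth:.3f} naive={naive:.3f} two-step={est:.3f}")
```

In this toy setting the released propensity score carries the information in the continuous variable that the quartile categories alone would lose, which is the intuition behind the proposal.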

Pages: 5803-5813 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639655/pdf/

Citations: 0

Generalized Estimating Equations for Survival Data With Dependent Censoring.

IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics in Medicine

Pub Date : 2024-12-30 Epub Date: 2024-12-01 DOI: 10.1002/sim.10296

Lili Yu, Liang Liu

Independent censoring is usually assumed in survival data analysis. However, dependent censoring, where the survival time is dependent on the censoring time, is often seen in real data applications. In this project, we model the vector of survival time and censoring time marginally through semiparametric heteroscedastic accelerated failure time models and model their association through the vector of errors in the model. We show that this semiparametric model is identified, and the generalized estimating equation approach is extended to estimate the parameters in this model. It is shown that the estimators of the model parameters are consistent and asymptotically normal. Simulation studies are conducted to compare it with the estimation method under a parametric model. A real dataset from a prostate cancer study is used to illustrate the newly proposed method.

Citations: 0

Regression Trees With Fused Leaves.

IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics in Medicine

Pub Date : 2024-12-30 Epub Date: 2024-11-20 DOI: 10.1002/sim.10272

Xiaogang Su, Lei Liu, Lili Liu, Ruiwen Zhou, Guoqiao Wang, Elise Dusseldorp, Tianni Zhou

We propose a novel regression tree method named "TreeFuL," an abbreviation for "Tree with Fused Leaves." TreeFuL combines recursive partitioning with fused regularization, offering a distinct alternative to the conventional pruning method. One of TreeFuL's noteworthy advantages is its capacity for cross-validated amalgamation of non-neighboring terminal nodes. This is facilitated by a leaf coloring scheme that supports tree shearing and node amalgamation. As a result, TreeFuL facilitates the development of more parsimonious tree models without compromising predictive accuracy. The refined model offers enhanced interpretability, making it particularly well-suited for biomedical applications of decision trees, such as disease diagnosis and prognosis. We demonstrate the practical advantages of our proposed method through simulation studies and an analysis of data collected in an obesity study.

Citations: 0

Bayesian Nonparametric Model for Heterogeneous Treatment Effects With Zero-Inflated Data.

IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics in Medicine

Pub Date : 2024-12-30 Epub Date: 2024-11-28 DOI: 10.1002/sim.10266

Chanmin Kim, Yisheng Li, Ting Xu, Zhongxing Liao

One goal of precision medicine is to develop effective treatments for patients by tailoring to their individual demographic, clinical, and/or genetic characteristics. To achieve this goal, statistical models must be developed that can identify and evaluate potentially heterogeneous treatment effects in a robust manner. The oft-cited existing methods for assessing treatment effect heterogeneity are based upon parametric models with interactions or conditioning on covariate values, the performance of which is sensitive to the omission of important covariates and/or the choice of their values. We propose a new Bayesian nonparametric (BNP) method for estimating heterogeneous causal effects in studies with zero-inflated outcome data, which arise commonly in health-related studies. We employ the enriched Dirichlet process (EDP) mixture in our BNP approach, establishing a connection between an outcome DP mixture and a covariate DP mixture. This enables us to estimate posterior distributions concurrently, facilitating flexible inference regarding individual causal effects. We show in a set of simulation studies that the proposed method outperforms two other BNP methods in terms of bias and mean squared error (MSE) of the conditional average treatment effect estimates. In particular, the proposed model has the advantage of appropriately reflecting uncertainty in regions where the overlap condition is violated compared to other competing models. We apply the proposed method to a study of the relationship between heart radiation dose parameters and the blood level of high-sensitivity cardiac troponin T (hs-cTnT) to examine if the effect of a high mean heart radiation dose on hs-cTnT varies by baseline characteristics.

Pages: 5968-5982

Citations: 0

Unlocking Cognitive Analysis Potential in Alzheimer's Disease Clinical Trials: Investigating Hierarchical Linear Models for Analyzing Novel Measurement Burst Design Data.

IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics in Medicine

Pub Date : 2024-12-30 Epub Date: 2024-11-25 DOI: 10.1002/sim.10292

Guoqiao Wang, Jason Hassenstab, Yan Li, Andrew J Aschenbrenner, Eric M McDade, Jorge Llibre-Guerra, Randall J Bateman, Chengjie Xiong

Measurement burst designs typically administer brief cognitive tests four times per day for 1 week, resulting in a maximum of 28 data points per test per burst, with one burst every 6 months. In Alzheimer's disease clinical trials, utilizing measurement burst designs holds great promise for boosting statistical power by collecting a huge amount of data. However, appropriate methods for analyzing these complex datasets are not well investigated. Furthermore, the large amount of burst design data also poses tremendous challenges for traditional computational procedures such as SAS mixed or Nlmixed. We propose to analyze burst design data using novel hierarchical linear mixed effects models or hierarchical mixed models for repeated measures. Through simulations and real-world data applications using the novel SAS procedure Hpmixed, we demonstrate these hierarchical models' efficiency over traditional models. Our sample simulation and analysis code can serve as a catalyst to facilitate the methodology development for burst design data.
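To make the data volume concrete, the nested structure of a burst design (subject, burst, day, session) can be laid out as a long-format schedule. The sketch below is only an illustration: the function name `burst_schedule`, the constants, and the 24-month follow-up are hypothetical choices, not taken from the paper's code.

```python
from itertools import product

SESSIONS_PER_DAY = 4        # brief tests per day, as stated in the abstract
DAYS_PER_BURST = 7          # one week per burst
BURST_INTERVAL_MONTHS = 6   # one burst every 6 months

def burst_schedule(n_subjects, follow_up_months):
    """Return one row per planned test session in long format."""
    burst_months = range(0, follow_up_months + 1, BURST_INTERVAL_MONTHS)
    return [
        {"subject": s, "month": m, "day": d, "session": k}
        for s, m, d, k in product(
            range(n_subjects),
            burst_months,
            range(1, DAYS_PER_BURST + 1),
            range(1, SESSIONS_PER_DAY + 1),
        )
    ]

# Hypothetical trial: 2 subjects followed for 24 months (bursts at 0, 6, 12, 18, 24).
rows = burst_schedule(n_subjects=2, follow_up_months=24)
per_burst = sum(1 for r in rows if r["subject"] == 0 and r["month"] == 0)
print(per_burst, len(rows))
```

Each level of this index (session within day, day within burst, burst within subject) is a natural candidate for a random effect in a hierarchical mixed model, which is why row counts grow quickly with trial size.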

Pages: 5898-5910

Citations: 0

A Hierarchical Bayesian Model for Estimating Age-Specific COVID-19 Infection Fatality Rates in Developing Countries.

IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics in Medicine

Pub Date : 2024-12-30 Epub Date: 2024-11-11 DOI: 10.1002/sim.10259

Sierra Pugh, Andrew T Levin, Gideon Meyerowitz-Katz, Satej Soman, Nana Owusu-Boaitey, Anthony B Zwi, Anup Malani, Ander Wilson, Bailey K Fosdick

The COVID-19 infection fatality rate (IFR) is the proportion of individuals infected with SARS-CoV-2 who subsequently die. As COVID-19 disproportionately affects older individuals, age-specific IFR estimates are imperative to facilitate comparisons of the impact of COVID-19 between locations and prioritize distribution of scarce resources. However, a coherent method is lacking for synthesizing available data to create estimates of IFR and seroprevalence that vary continuously with age and adequately reflect uncertainties inherent in the underlying data. In this article, we introduce a novel Bayesian hierarchical model to estimate IFR as a continuous function of age that acknowledges heterogeneity in population age structure across locations and accounts for uncertainty in the estimates due to seroprevalence sampling variability and imperfect serology test assays. Our approach simultaneously models test assay characteristics, serology, and death data, where the serology and death data are often available only for binned age groups. Information is shared across locations through hierarchical modeling to improve estimation of the parameters with limited data. Modeling data from 26 developing country locations during the first year of the COVID-19 pandemic, we found seroprevalence did not change dramatically with age, and the IFR at age 60 was above the high-income country estimate for most locations.
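The quantities in this abstract can be illustrated with a simple plug-in calculation for one age bin. This is not the paper's hierarchical Bayesian model: it uses the standard Rogan-Gladen correction for imperfect test accuracy and then divides deaths by estimated infections, with no uncertainty propagation. All numbers and function names below are hypothetical.

```python
def rogan_gladen(apparent_prev, sensitivity, specificity):
    """Correct a test-positive proportion for imperfect sensitivity and specificity."""
    adjusted = (apparent_prev + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return min(max(adjusted, 0.0), 1.0)   # truncate to the valid range [0, 1]

def infection_fatality_rate(deaths, population, seroprevalence):
    """IFR = deaths / estimated number of infections."""
    infections = population * seroprevalence
    return deaths / infections

# Hypothetical age bin: 120 of 1,000 sampled people test positive,
# with an assay assumed to have 90% sensitivity and 98% specificity.
apparent = 120 / 1_000
seroprev = rogan_gladen(apparent, sensitivity=0.90, specificity=0.98)
ifr = infection_fatality_rate(deaths=30, population=100_000, seroprevalence=seroprev)
print(f"adjusted seroprevalence={seroprev:.4f}, IFR={ifr:.5f}")
```

A point estimate like this ignores sampling variability in the serosurvey and uncertainty in the assay characteristics, which are the sources of uncertainty the abstract says the hierarchical model accounts for.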

Pages: 5667-5680

Citations: 0

Efficient Risk Assessment of Time-to-Event Targets With Adaptive Information Transfer.

IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics in Medicine

Pub Date : 2024-12-30 Epub Date: 2024-12-01 DOI: 10.1002/sim.10290

Jie Ding, Jialiang Li, Ping Xie, Xiaoguang Wang

Using informative sources to enhance statistical analysis in target studies has become an increasingly popular research topic. However, cohorts with time-to-event outcomes have not received sufficient attention, and external studies often encounter issues of incomparability due to population heterogeneity and unmeasured risk factors. To improve individualized risk assessments, we propose a novel methodology that adaptively borrows information from multiple incomparable sources. By extracting aggregate statistics through transitional models applied to both the external sources and the target population, we incorporate this information efficiently using the control variate technique. This approach eliminates the need to load individual-level records from sources directly, resulting in low computational complexity and strong privacy protection. Asymptotically, our estimators of both relative and baseline risks are more efficient than traditional results, and the power of covariate effects testing is much enhanced. We demonstrate the practical performance of our method via extensive simulations and a real case study.

Citations: 0

$\ell_1$-Penalized Multinomial Regression: Estimation, Inference, and Prediction, With an Application to Risk Factor Identification for Different Dementia Subtypes.

IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics in Medicine

Pub Date : 2024-12-30 Epub Date: 2024-11-12 DOI: 10.1002/sim.10263

Ye Tian, Henry Rusinek, Arjun V Masurkar, Yang Feng

High-dimensional multinomial regression models are very useful in practice but have received less research attention than logistic regression models, especially from the perspective of statistical inference. In this work, we analyze the estimation and prediction error of the contrast-based $\ell_1$-penalized multinomial regression model and extend the debiasing method to the multinomial case, providing a valid confidence interval for each coefficient and a $p$ value for each individual hypothesis test. We also examine cases of model misspecification and non-identically distributed data to demonstrate the robustness of our method when some assumptions are violated. We apply the debiasing method to identify important predictors in the progression into dementia of different subtypes. Results from extensive simulations show the superiority of the debiasing method compared to other inference methods.
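For orientation, one common way to write a contrast-based $\ell_1$-penalized multinomial model is sketched below. The reference-class convention ($\beta_K = 0$), the symbols $K$, $\lambda$, $x_i$, $y_i$, and the exact form of the penalty are assumptions made for illustration and may differ from the paper's formulation.

```latex
% Class probabilities with class K as the reference (its coefficient vector is fixed at 0)
\Pr(y = k \mid x) =
  \frac{\exp(x^\top \beta_k)}{1 + \sum_{j=1}^{K-1} \exp(x^\top \beta_j)},
  \qquad k = 1, \dots, K-1,
\qquad
\Pr(y = K \mid x) =
  \frac{1}{1 + \sum_{j=1}^{K-1} \exp(x^\top \beta_j)}.

% Penalized estimator: average negative log-likelihood plus an l1 penalty on the contrasts
\hat{\beta} = \arg\min_{\beta_1, \dots, \beta_{K-1}}
  \left\{ -\frac{1}{n} \sum_{i=1}^{n} \log \Pr(y_i \mid x_i)
          + \lambda \sum_{k=1}^{K-1} \lVert \beta_k \rVert_1 \right\}.
```

Under this convention each $\beta_k$ is a log-odds contrast between class $k$ and the reference class, so a confidence interval for one of its entries speaks directly to whether a predictor separates that class from the reference.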

Pages: 5711-5747

Citations: 0

Statistical Inference for Counting Processes Under Shape Heterogeneity.

IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics in Medicine

Pub Date : 2024-12-30 Epub Date: 2024-11-19 DOI: 10.1002/sim.10280

Ying Sheng, Yifei Sun

Proportional rate models are among the most popular methods for analyzing recurrent event data. Although providing a straightforward rate-ratio interpretation of covariate effects, the proportional rate assumption implies that covariates do not modify the shape of the rate function. When the proportionality assumption fails to hold, we propose to characterize covariate effects on the rate function through two types of parameters: the shape parameters and the size parameters. The former allows the covariates to flexibly affect the shape of the rate function, and the latter retains the interpretability of covariate effects on the magnitude of the rate function. To overcome the challenges in simultaneously estimating the two sets of parameters, we propose a conditional pseudolikelihood approach to eliminate the size parameters in shape estimation, followed by an event count projection approach for size estimation. The proposed estimators are asymptotically normal with a root-$n$ convergence rate. Simulation studies and an analysis of recurrent hospitalizations using SEER-Medicare data are conducted to illustrate the proposed methods.

Pages: 5849-5861

Citations: 0

Skewness-Corrected Confidence Intervals for Predictive Values in Enrichment Studies.

IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics in Medicine

Pub Date : 2024-12-30 Epub Date: 2024-11-20 DOI: 10.1002/sim.10283

Dadong Zhang, Jingye Wang, Suqin Cai, Johan Surtihadi

The positive predictive value (PPV) and negative predictive value (NPV) can be expressed as functions of disease prevalence ($\rho$) and the ratios of two binomial proportions ($\phi$), where $\phi_{ppv} = \frac{1 - \text{specificity}}{\text{sensitivity}}$ and $\phi_{npv} = \frac{1 - \text{sensitivity}}{\text{specificity}}$. In prospective studies, where the proportion of subjects with the disease in the study cohort is an unbiased estimate of the disease prevalence, the confidence intervals (CIs) of PPV and NPV can be estimated using established methods for a single proportion. However, in enrichment studies, such as case-control studies, where the proportion of diseased subjects significantly differs from disease prevalence, estimating CIs for PPV and NPV remains a challenge in terms of skewness and overall coverage, especially under extreme conditions (e.g., $\mathrm{NPV} = 1$). In this article, we extend the method adopted by Li, where CIs for PPV and NPV were derived from those of $\phi$. We explored additional CI methods for $\phi$, including those by Gart & Nam (GN), MoverJ, and Walter, and converted their corresponding CIs to CIs for PPV and NPV. Through simulations, we compared these methods with the established CI methods (Fieller, Pepe, and Delta) in terms of skewness and overall coverage. While no method proves universally optimal, the GN and MoverJ methods generally emerge as recommended choices.
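The relationship between the predictive values, the prevalence, and the two ratios follows from Bayes' rule and can be sketched in a few lines. The function names and the example numbers are hypothetical. The interval helper only shows the monotone mapping from a CI for $\phi_{ppv}$ to a CI for PPV; it does not implement any of the CI methods for $\phi$ compared in the paper.

```python
def ppv_npv(prevalence, sensitivity, specificity):
    """Express PPV and NPV through the prevalence odds and the ratios phi_ppv, phi_npv."""
    phi_ppv = (1.0 - specificity) / sensitivity
    phi_npv = (1.0 - sensitivity) / specificity
    odds = prevalence / (1.0 - prevalence)
    ppv = 1.0 / (1.0 + phi_ppv / odds)
    npv = 1.0 / (1.0 + phi_npv * odds)
    return ppv, npv

def ppv_interval_from_phi(phi_lower, phi_upper, prevalence):
    """Map a CI for phi_ppv to a CI for PPV at a fixed prevalence.

    PPV decreases as phi_ppv grows, so the interval bounds swap.
    """
    odds = prevalence / (1.0 - prevalence)
    return 1.0 / (1.0 + phi_upper / odds), 1.0 / (1.0 + phi_lower / odds)

# Hypothetical test: 90% sensitivity, 80% specificity, 10% prevalence.
ppv, npv = ppv_npv(prevalence=0.10, sensitivity=0.90, specificity=0.80)
ppv_lo, ppv_hi = ppv_interval_from_phi(0.15, 0.30, prevalence=0.10)
print(f"PPV={ppv:.4f}, NPV={npv:.4f}, PPV CI=({ppv_lo:.4f}, {ppv_hi:.4f})")
```

Because the prevalence enters only through the odds, sensitivity and specificity estimated in an enriched (for example, case-control) sample can be combined with an external prevalence value, which is the setting the abstract addresses.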


Pages: 5862-5871

Citations: 0