International Journal of Biostatistics: Latest Articles

Optimizing personalized treatments for targeted patient populations across multiple domains.
IF 1.2, Zone 4 (Mathematics), Pub Date: 2024-09-26, DOI: 10.1515/ijb-2024-0068
Yuan Chen, Donglin Zeng, Yuanjia Wang

Learning individualized treatment rules (ITRs) for a target patient population with mental disorders is confronted with many challenges. First, the target population may be different from the training population that provided data for learning ITRs. Ignoring differences between the training patient data and the target population can result in sub-optimal treatment strategies for the target population. Second, for mental disorders, a patient's underlying mental state is not observed but can be inferred from measures of high-dimensional combinations of symptomatology. Treatment mechanisms are unknown and can be complex, and thus treatment effect moderation can take complicated forms. To address these challenges, we propose a novel method that connects measurement models, efficient weighting schemes, and flexible neural network architecture through latent variables to tailor treatments for a target population. Patients' underlying mental states are represented by a compact set of latent state variables while preserving interpretability. Weighting schemes are designed based on lower-dimensional latent variables to efficiently balance population differences so that biases in learning the latent structure and treatment effects are mitigated. Extensive simulation studies demonstrated consistent superiority of the proposed method and the weighting approach. Applications to two real-world studies of patients with major depressive disorder have shown a broad utility of the proposed method in improving treatment outcomes in the target population.

Citations: 0
History-restricted marginal structural model and latent class growth analysis of treatment trajectories for a time-dependent outcome.
IF 1.2, Zone 4 (Mathematics), Pub Date: 2024-08-12, DOI: 10.1515/ijb-2023-0116
Awa Diop, Caroline Sirois, Jason R Guertin, Mireille E Schnitzer, James M Brophy, Claudia Blais, Denis Talbot

In previous work, we introduced a framework that combines latent class growth analysis (LCGA) with marginal structural models (LCGA-MSM). LCGA-MSM first summarizes the numerous time-varying treatment patterns into a few trajectory groups and then allows for a population-level causal interpretation of the group differences. However, the LCGA-MSM framework is not suitable when the outcome is time-dependent. In this study, we propose combining a nonparametric history-restricted marginal structural model (HRMSM) with LCGA. HRMSMs can be seen as an application of standard MSMs on multiple time intervals. To the best of our knowledge, we also present the first application of HRMSMs with a time-to-event outcome. It was previously noted that HRMSMs could pose interpretation problems in survival analysis when targeting either a hazard ratio or a survival curve. We propose a causal parameter that bypasses these interpretation challenges. We consider three different estimators of the parameters: inverse probability of treatment weighting (IPTW), g-computation, and a pooled longitudinal targeted maximum likelihood estimator (pooled LTMLE). We conduct simulation studies to measure the performance of the proposed LCGA-HRMSM. For all scenarios, we obtain unbiased estimates when using either g-computation or pooled LTMLE. IPTW produced estimates with slightly larger bias in some scenarios. Overall, all approaches have good coverage of the 95% confidence interval. We applied our approach to a population of older Quebecers composed of 57,211 statin initiators and found that a greater adherence to statins was associated with a lower combined risk of cardiovascular disease or all-cause mortality.
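
As a rough illustration of the IPTW idea named in this abstract, the Python sketch below estimates a marginal risk difference for a single binary treatment at one time point. It is not the authors' LCGA-HRMSM procedure: the simulated data-generating model, the function names, and the use of the true propensity score in place of an estimated one are all assumptions made for illustration.

```python
import random


def simulate(n, seed=0):
    """Simulate (confounder, treatment, outcome, propensity) tuples.

    Hypothetical model: the binary confounder L raises both the chance
    of treatment and the chance of the event. By construction the true
    marginal risk difference of treatment is -0.10.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = 1 if rng.random() < 0.5 else 0
        p_treat = 0.7 if l == 1 else 0.3        # P(A = 1 | L)
        a = 1 if rng.random() < p_treat else 0
        p_event = 0.2 + 0.3 * l - 0.1 * a       # P(Y = 1 | A, L)
        y = 1 if rng.random() < p_event else 0
        rows.append((l, a, y, p_treat))
    return rows


def iptw_risk_difference(rows):
    """Normalized (Hajek-type) IPTW estimate of E[Y(1)] - E[Y(0)]."""
    num1 = den1 = num0 = den0 = 0.0
    for _, a, y, p in rows:
        if a == 1:
            w = 1.0 / p              # weight for treated units
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - p)      # weight for untreated units
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0


def naive_risk_difference(rows):
    """Unweighted difference in event proportions (confounded)."""
    treated = [y for _, a, y, _ in rows if a == 1]
    control = [y for _, a, y, _ in rows if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


if __name__ == "__main__":
    data = simulate(50_000, seed=1)
    print("naive:", round(naive_risk_difference(data), 3))
    print("IPTW: ", round(iptw_risk_difference(data), 3))
```

In this toy setting the unweighted comparison is biased by the confounder, while the weighted estimate should land near the true value of -0.10. In practice the propensity score would have to be estimated, and a longitudinal analysis would use products of time-specific weights.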

Citations: 0
Hybrid classical-Bayesian approach to sample size determination for two-arm superiority clinical trials.
IF 1.2, Zone 4 (Mathematics), Pub Date: 2024-07-01, DOI: 10.1515/ijb-2023-0050
Valeria Sambucini

Traditional methods for Sample Size Determination (SSD) based on power analysis exploit relevant fixed values or preliminary estimates for the unknown parameters. A hybrid classical-Bayesian approach can be used to formally incorporate information or model uncertainty on unknown quantities by using prior distributions according to the Bayesian approach, while still analysing the data in a frequentist framework. In this paper, we propose a hybrid procedure for SSD in two-arm superiority trials that takes into account the different roles played by the unknown parameters involved in the statistical power. Thus, different prior distributions are used to formalize design expectations and to model information or uncertainty on preliminary estimates involved at the analysis stage. To illustrate the method, we consider binary data and derive the proposed hybrid criteria using three possible parameters of interest, i.e. the difference between proportions of successes, the logarithm of the relative risk and the logarithm of the odds ratio. Numerical examples taken from the literature are presented to show how to implement the proposed procedure.
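
To make the contrast between fixed-value and prior-averaged power concrete, the Python sketch below computes a classical per-arm sample size for a difference in proportions and then averages the frequentist power over a Beta design prior on the treatment-arm success probability. This is only one common way of mixing the two viewpoints and is not claimed to be the criteria derived in the paper; the function names, the one-sided normal-approximation test, and the Beta(10, 10) prior are assumptions.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_two_proportions(n, p_control, p_treat, alpha=0.025):
    """Approximate power of a one-sided z-test of p_treat > p_control.

    n is the sample size per arm; the unpooled variance is used.
    """
    se = math.sqrt((p_control * (1 - p_control) + p_treat * (1 - p_treat)) / n)
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    return _STD_NORMAL.cdf((p_treat - p_control) / se - z_alpha)


def classical_n(p_control, p_treat, alpha=0.025, power=0.80):
    """Smallest per-arm n from the usual fixed-value power formula."""
    z_a = _STD_NORMAL.inv_cdf(1 - alpha)
    z_b = _STD_NORMAL.inv_cdf(power)
    var = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return math.ceil((z_a + z_b) ** 2 * var / (p_treat - p_control) ** 2)


def expected_power(n, p_control, a, b, alpha=0.025, draws=20_000, seed=1):
    """Monte Carlo average of the power over a Beta(a, b) prior on p_treat."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        total += power_two_proportions(n, p_control, rng.betavariate(a, b), alpha)
    return total / draws


if __name__ == "__main__":
    n = classical_n(0.30, 0.50)
    print("classical n per arm:", n)
    print("power at design value:", round(power_two_proportions(n, 0.30, 0.50), 3))
    print("prior-averaged power:", round(expected_power(n, 0.30, 10, 10), 3))
```

With a prior centred on the design value but carrying real uncertainty, the averaged power typically falls below the nominal target, which is the practical motivation for incorporating priors at the design stage.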

Citations: 0
An interpretable cluster-based logistic regression model, with application to the characterization of response to therapy in severe eosinophilic asthma.
IF 1.2, Zone 4 (Mathematics), Pub Date: 2024-06-25, DOI: 10.1515/ijb-2023-0061
Massimo Bilancia, Andrea Nigri, Barbara Cafarelli, Danilo Di Bona

Asthma is a disease characterized by chronic airway hyperresponsiveness and inflammation, with signs of variable airflow limitation and impaired lung function leading to respiratory symptoms such as shortness of breath, chest tightness and cough. Eosinophilic asthma is a distinct phenotype that affects more than half of patients diagnosed with severe asthma. It can be effectively treated with monoclonal antibodies targeting specific immunological signaling pathways that fuel the inflammation underlying the disease, particularly Interleukin-5 (IL-5), a cytokine that plays a crucial role in asthma. In this study, we propose a data analysis pipeline aimed at identifying subphenotypes of severe eosinophilic asthma in relation to response to therapy at follow-up, which could have great potential for use in routine clinical practice. Once an optimal partition of patients into subphenotypes has been determined, the labels indicating the group to which each patient has been assigned are used in a novel way. For each input variable in a specialized logistic regression model, a clusterwise effect on response to therapy is determined by an appropriate interaction term between the input variable under consideration and the cluster label. We show that the clusterwise odds ratios can be meaningfully interpreted conditional on the cluster label. In this way, we can define an effect measure for the response variable for each input variable in each of the groups identified by the clustering algorithm, which is not possible in standard logistic regression because the effect of the reference class is aliased with the overall intercept. The interpretability of the model is enforced by promoting sparsity, a goal achieved by learning interactions in a hierarchical manner using a special group-Lasso technique. In addition, valid expressions are provided for computing odds ratios in the unusual parameterization used by the sparsity-promoting algorithm. We show how to apply the proposed data analysis pipeline to subphenotype asthma patients in terms of the quality of their response to therapy with monoclonal antibodies.
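
The notion of a clusterwise odds ratio can be sketched in a few lines of Python. The example assumes a simplified model in which the log-odds contribution of input j in cluster k is beta_j + gamma_jk, so the odds ratio for a one-unit increase in that input, conditional on cluster k, is exp(beta_j + gamma_jk). The paper describes its own parameterization, which may differ; the function name and the coefficient values are hypothetical.

```python
import math


def clusterwise_odds_ratios(main_effect, interaction_effects):
    """Odds ratio per cluster for a one-unit increase in one input.

    main_effect: coefficient shared by all clusters (beta_j).
    interaction_effects: mapping {cluster label: gamma_jk}, the
        cluster-specific deviation from the shared coefficient.
    """
    return {
        cluster: math.exp(main_effect + gamma)
        for cluster, gamma in interaction_effects.items()
    }


if __name__ == "__main__":
    # Hypothetical coefficients for one input across three clusters.
    ors = clusterwise_odds_ratios(0.2, {"C1": 0.5, "C2": 0.0, "C3": -0.4})
    for cluster, value in ors.items():
        print(cluster, round(value, 3))
```

A sparsity-promoting fit would set many of the gamma terms to exactly zero, in which case the corresponding clusters share the common odds ratio exp(beta_j).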

Citations: 0
Response to comments on 'sensitivity of estimands in clinical trials with imperfect compliance'.
IF 1.2, Zone 4 (Mathematics), Pub Date: 2024-05-06, DOI: 10.1515/ijb-2024-0013
Heng Chen, Daniel F Heitjan
Citations: 0
Estimation of a decreasing mean residual life based on ranked set sampling with an application to survival analysis.
IF 1.2, Zone 4 (Mathematics), Pub Date: 2024-03-29, DOI: 10.1515/ijb-2023-0051
Elham Zamanzade, Ehsan Zamanzade, Afshin Parvardeh

The mean residual lifetime (MRL) of a unit in a population at a given time t is the average remaining lifetime among those population units still alive at time t. In some applications, it is reasonable to assume that the MRL function is decreasing over time. Thus, one natural way to improve the estimation of the MRL function is to use this assumption in the estimation process. In this paper, we develop an MRL estimator in ranked set sampling (RSS) which enjoys the monotonicity property. We prove that it is a strongly uniformly consistent estimator of the true MRL function. We also show that the asymptotic distribution of the introduced estimator is the same as the empirical one, and therefore the novel estimator is obtained "free of charge", at least in an asymptotic sense. We then compare the proposed estimator with its competitors in RSS and simple random sampling (SRS) using Monte Carlo simulation. Our simulation results confirm the superiority of the proposed procedure for finite sample sizes. Finally, a real dataset from the Surveillance, Epidemiology and End Results (SEER) program of the US National Cancer Institute (NCI) is used to show that the introduced technique can provide more accurate estimates for the average remaining lifetime of patients with breast cancer.
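
The definition of the MRL at time t translates directly into an empirical estimator: average the excess lifetime x - t over the observed units with x > t. The Python sketch below implements this plain empirical version for a simple random sample without censoring. It is not the monotone RSS estimator proposed in the paper; the function name and the convention of returning 0.0 when no unit survives past t are assumptions.

```python
def empirical_mrl(lifetimes, t):
    """Empirical mean residual life at time t.

    Averages (x - t) over the observed lifetimes x that exceed t.
    Returns 0.0 when no observation exceeds t (a convention chosen
    here for illustration).
    """
    survivors = [x for x in lifetimes if x > t]
    if not survivors:
        return 0.0
    return sum(x - t for x in survivors) / len(survivors)


if __name__ == "__main__":
    sample = [1, 2, 3, 4]
    for t in (0, 2, 5):
        print(t, empirical_mrl(sample, t))
```

The raw empirical curve need not be decreasing in t even when the true MRL function is, which is the gap that a monotonicity-constrained estimator is meant to close.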

Citations: 0
Statistical models for assessing agreement for quantitative data with heterogeneous random raters and replicate measurements.
IF 1.2, Zone 4 (Mathematics), Pub Date: 2024-02-22, DOI: 10.1515/ijb-2023-0037
Claus Thorn Ekstrøm, Bendix Carstensen

Agreement between methods for quantitative measurements is typically assessed by computing limits of agreement between pairs of methods and/or by illustration through Bland-Altman plots. We consider the situation where the observed measurement methods are considered a random sample from a population of possible methods, and discuss how the underlying linear mixed effects model can be extended to this situation. This is relevant when, for example, the methods represent raters/judges that are used to score specific individuals or items. In the case of random methods, we are not interested in estimates pertaining to the specific methods, but are instead interested in quantifying the variation between the methods actually involved in making measurements, and accommodating this as an extra source of variation when generalizing to the clinical performance of a method. In the model we allow raters to have individual precision/skill and permit linked replicates (i.e., when the numbering, labeling or ordering of the replicates within items is important). Applications involving estimation of the limits of agreement for two datasets are shown: a dataset of spatial perception among a group of students and a dataset on consumer preference for French chocolate. The models are implemented in the MethComp package for R [Carstensen B, Gurrin L, Ekstrøm CT, Figurski M. MethComp: functions for analysis of agreement in method comparison studies; 2013. R package version 1.22; R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2012].
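
For readers unfamiliar with limits of agreement, the Python sketch below computes the classical two-method version: the mean of the paired differences plus or minus 1.96 sample standard deviations. It covers only the simple fixed-method case with one measurement per item, not the random-rater mixed model with replicates discussed in the abstract, and it does not use the MethComp package; the function name and example data are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(method_a, method_b, z=1.96):
    """Classical Bland-Altman limits of agreement for paired data.

    Returns (lower limit, bias, upper limit), where bias is the mean
    of the paired differences a - b and the limits are bias -/+ z
    times the sample standard deviation of those differences.
    """
    if len(method_a) != len(method_b):
        raise ValueError("both methods must measure the same items")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias - z * sd, bias, bias + z * sd


if __name__ == "__main__":
    # Hypothetical paired measurements on four items.
    lower, bias, upper = limits_of_agreement([10, 12, 14, 16], [9, 12, 13, 17])
    print(round(lower, 3), round(bias, 3), round(upper, 3))
```

When the methods are themselves a random sample of raters, the between-rater variance adds to the spread of the differences, so limits computed this way for one specific pair would generally understate the variation expected for a new pair of raters.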

Citations: 0
Flexible variable selection in the presence of missing data.
IF 1.2, Zone 4 (Mathematics), Pub Date: 2024-02-13, DOI: 10.1515/ijb-2023-0059
Brian D Williamson, Ying Huang

In many applications, it is of interest to identify a parsimonious set of features, or panel, from multiple candidates that achieves a desired level of performance in predicting a response. This task is often complicated in practice by missing data arising from the sampling design or other random mechanisms. Most recent work on variable selection in missing data contexts relies in some part on a finite-dimensional statistical model, e.g., a generalized or penalized linear model. In cases where this model is misspecified, the selected variables may not all be truly scientifically relevant and can result in panels with suboptimal classification performance. To address this limitation, we propose a nonparametric variable selection algorithm combined with multiple imputation to develop flexible panels in the presence of missing-at-random data. We outline strategies based on the proposed algorithm that achieve control of commonly used error rates. Through simulations, we show that our proposal has good operating characteristics and results in panels with higher classification and variable selection performance compared to several existing penalized regression approaches in cases where a generalized linear model is misspecified. Finally, we use the proposed method to develop biomarker panels for separating pancreatic cysts with differing malignancy potential in a setting where complicated missingness in the biomarkers arose due to limited specimen volumes.

在许多应用中,人们有兴趣从多个候选特征中识别出一组或一组合理的特征,从而在预测响应时达到理想的性能水平。在实践中,由于抽样设计或其他随机机制造成的数据缺失,这项任务往往会变得复杂。最近在缺失数据情况下进行变量选择的大多数工作都在一定程度上依赖于有限维统计模型,例如广义线性模型或惩罚线性模型。在这种模型被错误定义的情况下,所选变量可能并不都具有真正的科学相关性,并可能导致面板的分类性能不理想。为了解决这一局限性,我们提出了一种非参数变量选择算法,结合多重估算,在随机数据缺失的情况下建立灵活的面板。我们概述了基于所提算法的策略,这些策略可实现对常用错误率的控制。通过模拟,我们证明了我们的建议具有良好的操作特性,与现有的几种惩罚回归方法相比,在广义线性模型被错误指定的情况下,我们的面板具有更高的分类和变量选择性能。最后,我们利用所提出的方法开发了生物标记物面板,用于区分具有不同恶性潜能的胰腺囊肿,在这种情况下,由于标本量有限,生物标记物会出现复杂的缺失。
Citations: 0
Improving the mixed model for repeated measures to robustly increase precision in randomized trials.
IF 1.2 Zone 4 Mathematics Pub Date : 2023-11-29 DOI: 10.1515/ijb-2022-0101
Bingkai Wang, Yu Du

In randomized trials, repeated measures of the outcome are routinely collected. The mixed model for repeated measures (MMRM) leverages the information from these repeated outcome measures, and is often used for the primary analysis to estimate the average treatment effect at the primary endpoint. MMRM, however, can suffer from bias and precision loss when it models intermediate outcomes incorrectly, and hence fails to use the post-randomization information harmlessly. This paper proposes an extension of the commonly used MMRM, called IMMRM, that improves the robustness and optimizes the precision gain from covariate adjustment, stratified randomization, and adjustment for intermediate outcome measures. Under regularity conditions and missing completely at random, we prove that the IMMRM estimator for the average treatment effect is robust to arbitrary model misspecification and is asymptotically as precise as or more precise than the analysis of covariance (ANCOVA) estimator and the MMRM estimator. Under missing at random, IMMRM is less likely to be misspecified than MMRM, and we demonstrate via simulation studies that IMMRM continues to have less bias and smaller variance. Our results are further supported by a re-analysis of a randomized trial for the treatment of diabetes.
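
As a point of reference for the ANCOVA comparator named in the abstract, the following is a minimal sketch of how adjusting for a baseline covariate can reduce the variance of a treatment effect estimate in a simulated two-arm trial. It is not the IMMRM or MMRM estimator and uses a single outcome visit with no missing data. The data-generating model, the effect size, and all function names (`simulate_trial`, `unadjusted_estimate`, `ancova_estimate`, `compare_estimators`) are assumptions made for illustration only.

```python
import numpy as np


def simulate_trial(n, rng, effect=1.0):
    """Simulate a two-arm trial with one prognostic baseline covariate (assumed model)."""
    x = rng.normal(size=n)                  # baseline covariate
    a = rng.integers(0, 2, size=n)          # simple 1:1 randomization
    y = effect * a + 2.0 * x + rng.normal(size=n)  # outcome at the final visit
    return x, a, y


def unadjusted_estimate(a, y):
    """Difference in mean outcome between the treated and control arms."""
    return y[a == 1].mean() - y[a == 0].mean()


def ancova_estimate(x, a, y):
    """Treatment coefficient from least squares of y on intercept, arm, and covariate."""
    design = np.column_stack([np.ones_like(y), a, x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


def compare_estimators(n=200, n_rep=500, seed=0):
    """Monte Carlo mean and variance of both estimators over repeated trials."""
    rng = np.random.default_rng(seed)
    unadj = np.empty(n_rep)
    adj = np.empty(n_rep)
    for r in range(n_rep):
        x, a, y = simulate_trial(n, rng)
        unadj[r] = unadjusted_estimate(a, y)
        adj[r] = ancova_estimate(x, a, y)
    return {
        "mean_unadjusted": unadj.mean(),
        "var_unadjusted": unadj.var(ddof=1),
        "mean_ancova": adj.mean(),
        "var_ancova": adj.var(ddof=1),
    }


if __name__ == "__main__":
    for name, value in compare_estimators().items():
        print(f"{name}: {value:.4f}")
```

In this simulated setting both estimators center on the assumed effect, while the adjusted estimator varies less across replications because the covariate explains most of the outcome variance.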

Citations: 0
Prediction-based variable selection for component-wise gradient boosting.
IF 1.2 Zone 4 Mathematics Pub Date : 2023-11-27 eCollection Date: 2024-05-01 DOI: 10.1515/ijb-2023-0052
Sophie Potts, Elisabeth Bergherr, Constantin Reinke, Colin Griesbach

Model-based component-wise gradient boosting is a popular tool for data-driven variable selection. In order to improve its prediction and selection qualities even further, several modifications of the original algorithm have been developed that mainly focus on different stopping criteria, leaving the actual variable selection mechanism untouched. We investigate different prediction-based mechanisms for the variable selection step in model-based component-wise gradient boosting. These approaches include Akaike's Information Criterion (AIC) as well as a selection rule relying on the component-wise test error computed via cross-validation. We implemented the AIC and cross-validation routines for Generalized Linear Models and evaluated them regarding their variable selection properties and predictive performance. An extensive simulation study revealed improved selection properties, whereas the prediction error could be lowered in a real-world application with age-standardized COVID-19 incidence rates.
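
For orientation, the selection step that the abstract refers to can be sketched for the simplest case: squared-error loss with one linear base learner per covariate. This is the conventional mechanism, in which the component with the smallest residual sum of squares is updated at each iteration; it is not the authors' AIC-based or cross-validation-based rule. The function name `componentwise_l2_boost`, the step length `nu`, and the fixed number of steps are assumptions made for illustration.

```python
import numpy as np


def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """Component-wise L2 boosting with one linear base learner per covariate.

    Assumes no covariate column is constant. Returns (intercept, coef);
    covariates that are never selected keep a coefficient of exactly zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center the covariates so the intercept can be handled separately.
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    col_ss = (Xc ** 2).sum(axis=0)
    intercept = y.mean()
    coef = np.zeros(X.shape[1])
    resid = y - intercept
    for _ in range(n_steps):
        # Least-squares slope of the current residuals on each covariate.
        slopes = Xc.T @ resid / col_ss
        # Residual sum of squares left by each candidate base learner.
        rss = (resid ** 2).sum() - slopes ** 2 * col_ss
        # Selection step: update only the best-fitting component.
        j = int(np.argmin(rss))
        coef[j] += nu * slopes[j]
        resid = resid - nu * slopes[j] * Xc[:, j]
    # Translate the intercept back to the uncentered covariate scale.
    intercept = intercept - x_mean @ coef
    return intercept, coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=200)
    intercept, coef = componentwise_l2_boost(X, y)
    print("selected covariates:", np.flatnonzero(coef))
    print("coefficients:", np.round(coef, 3))
```

A prediction-based variant along the lines the abstract describes would replace the `np.argmin(rss)` line with a criterion computed from AIC or from held-out error, leaving the rest of the loop unchanged.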

Citations: 0