Skewness-Corrected Confidence Intervals for Predictive Values in Enrichment Studies.
Dadong Zhang, Jingye Wang, Suqin Cai, Johan Surtihadi
The positive predictive value (PPV) and negative predictive value (NPV) can be expressed as functions of disease prevalence ($\rho$) and the ratios of two binomial proportions ($\phi$), where $\phi_{ppv} = \frac{1 - \text{specificity}}{\text{sensitivity}}$ and $\phi_{npv} = \frac{1 - \text{sensitivity}}{\text{specificity}}$. In prospective studies, where the proportion of subjects with the disease in the study cohort is an unbiased estimate of the disease prevalence, the confidence intervals (CIs) of PPV and NPV can be estimated using established methods for a single proportion. However, in enrichment studies, such as case-control studies, where the proportion of diseased subjects differs substantially from the disease prevalence, estimating CIs for PPV and NPV remains a challenge in terms of skewness and overall coverage, especially under extreme conditions (e.g., $\mathrm{NPV} = 1$). In this article, we extend the method adopted by Li, where CIs for PPV and NPV were derived from those of $\phi$. We explore additional CI methods for $\phi$, including those by Gart & Nam (GN), MoverJ, and Walter, and convert them into corresponding CIs for PPV and NPV. Through simulations, we compare these methods with the established Fieller, Pepe, and Delta CI methods in terms of skewness and overall coverage. While no method proves universally optimal, the GN and MoverJ methods generally emerge as recommended choices.
{"title":"Skewness-Corrected Confidence Intervals for Predictive Values in Enrichment Studies.","authors":"Dadong Zhang, Jingye Wang, Suqin Cai, Johan Surtihadi","doi":"10.1002/sim.10283","DOIUrl":"https://doi.org/10.1002/sim.10283","url":null,"abstract":"<p><p>The positive predictive value (PPV) and negative predictive value (NPV) can be expressed as functions of disease prevalence ( <math> <semantics><mrow><mi>ρ</mi></mrow> <annotation>$$ rho $$</annotation></semantics> </math> ) and the ratios of two binomial proportions ( <math> <semantics><mrow><mi>ϕ</mi></mrow> <annotation>$$ phi $$</annotation></semantics> </math> ), where <math> <semantics> <mrow><msub><mi>ϕ</mi> <mi>ppv</mi></msub> <mo>=</mo> <mfrac><mrow><mn>1</mn> <mo>-</mo> <mtext>specificity</mtext></mrow> <mtext>sensitivity</mtext></mfrac> </mrow> <annotation>$$ {phi}_{ppv}=frac{1- specificity}{sensitivity} $$</annotation></semantics> </math> and <math> <semantics> <mrow><msub><mi>ϕ</mi> <mi>npv</mi></msub> <mo>=</mo> <mfrac><mrow><mn>1</mn> <mo>-</mo> <mtext>sensitivity</mtext></mrow> <mtext>specificity</mtext></mfrac> </mrow> <annotation>$$ {phi}_{npv}=frac{1- sensitivity}{specificity} $$</annotation></semantics> </math> . In prospective studies, where the proportion of subjects with the disease in the study cohort is an unbiased estimate of the disease prevalence, the confidence intervals (CIs) of PPV and NPV can be estimated using established methods for single proportion. However, in enrichment studies, such as case-control studies, where the proportion of diseased subjects significantly differs from disease prevalence, estimating CIs for PPV and NPV remains a challenge in terms of skewness and overall coverage, especially under extreme conditions (e.g., <math> <semantics><mrow><mi>NPV</mi> <mo>=</mo> <mn>1</mn></mrow> <annotation>$$ mathrm{NPV}=1 $$</annotation></semantics> </math> ). In this article, we extend the method adopted by Li, where CIs for PPV and NPV were derived from those of <math> <semantics><mrow><mi>ϕ</mi></mrow> <annotation>$$ phi $$</annotation></semantics> </math> . We explored additional CI methods for <math> <semantics><mrow><mi>ϕ</mi></mrow> <annotation>$$ phi $$</annotation></semantics> </math> , including those by Gart & Nam (GN), MoverJ, and Walter and convert their corresponding CIs for PPV and NPV. Through simulations, we compared these methods with established CI methods, Fieller, Pepe, and Delta in terms of skewness and overall coverage. While no method proves universally optimal, GN and MoverJ methods generally emerge as recommended choices.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142682841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Selection of number of clusters and warping penalty in clustering functional electrocardiogram.
Wei Yang, Harold I Feldman, Wensheng Guo
Clustering functional data aims to identify unique functional patterns over the entire domain, but this can be challenging due to phase variability that distorts the observed patterns. Curve registration can be used to remove this variability, but determining the appropriate level of warping flexibility can be complicated. Curve registration also requires a target to which a functional object is aligned, typically the cross-sectional mean of functional objects within the same cluster; however, this mean is unknown prior to clustering. Furthermore, there is a trade-off between flexible warping and the number of resulting clusters: removing more phase variability through curve registration leaves less remaining variation in the functional data, resulting in fewer clusters. Thus, the optimal number of clusters and the warping flexibility cannot be uniquely identified. We propose to use external information to solve this identification issue. We define a cross-validated Kullback-Leibler information criterion to select the number of clusters and the warping penalty. The criterion is derived from the predictive classification likelihood, considers the joint distribution of the functional data and the external variable, and penalizes the uncertainty in the cluster membership. We evaluate our method through simulation and apply it to electrocardiographic data collected in the Chronic Renal Insufficiency Cohort study. We identify two distinct clusters of electrocardiogram (ECG) profiles, with the second cluster exhibiting ST segment depression, an indication of cardiac ischemia, compared with the normal ECG profiles in the first cluster.
{"title":"Selection of number of clusters and warping penalty in clustering functional electrocardiogram.","authors":"Wei Yang, Harold I Feldman, Wensheng Guo","doi":"10.1002/sim.10192","DOIUrl":"10.1002/sim.10192","url":null,"abstract":"<p><p>Clustering functional data aims to identify unique functional patterns in the entire domain, but this can be challenging due to phase variability that distorts the observed patterns. Curve registration can be used to remove this variability, but determining the appropriate level of warping flexibility can be complicated. Curve registration also requires a target to which a functional object is aligned, typically the cross-sectional mean of functional objects within the same cluster. However, this mean is unknown prior to clustering. Furthermore, there is a trade-off between flexible warping and the number of resulting clusters. Removing more phase variability through curve registration can lead to fewer remaining variations in the functional data, resulting in a smaller number of clusters. Thus, the optimal number of clusters and warping flexibility cannot be uniquely identified. We propose to use external information to solve the identification issue. We define a cross validated Kullback-Leibler information criterion to select the number of clusters and the warping penalty. The criterion is derived from the predictive classification likelihood considering the joint distribution of both the functional data and external variable and penalizes the uncertainty in the cluster membership. We evaluate our method through simulation and apply it to electrocardiographic data collected in the Chronic Renal Insufficiency Cohort study. We identify two distinct clusters of electrocardiogram (ECG) profiles, with the second cluster exhibiting ST segment depression, an indication of cardiac ischemia, compared to the normal ECG profiles in the first cluster.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"4913-4927"},"PeriodicalIF":1.8,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11499710/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142154970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Statistical Inference for Counting Processes Under Shape Heterogeneity.
Ying Sheng, Yifei Sun
Proportional rate models are among the most popular methods for analyzing recurrent event data. Although they provide a straightforward rate-ratio interpretation of covariate effects, the proportional rate assumption implies that covariates do not modify the shape of the rate function. When the proportionality assumption fails to hold, we propose to characterize covariate effects on the rate function through two types of parameters: the shape parameters and the size parameters. The former allows the covariates to flexibly affect the shape of the rate function, and the latter retains the interpretability of covariate effects on the magnitude of the rate function. To overcome the challenges in simultaneously estimating the two sets of parameters, we propose a conditional pseudolikelihood approach to eliminate the size parameters in shape estimation, followed by an event count projection approach for size estimation. The proposed estimators are asymptotically normal with a root-$n$ convergence rate. Simulation studies and an analysis of recurrent hospitalizations using SEER-Medicare data are conducted to illustrate the proposed methods.
{"title":"Statistical Inference for Counting Processes Under Shape Heterogeneity.","authors":"Ying Sheng, Yifei Sun","doi":"10.1002/sim.10280","DOIUrl":"https://doi.org/10.1002/sim.10280","url":null,"abstract":"<p><p>Proportional rate models are among the most popular methods for analyzing recurrent event data. Although providing a straightforward rate-ratio interpretation of covariate effects, the proportional rate assumption implies that covariates do not modify the shape of the rate function. When the proportionality assumption fails to hold, we propose to characterize covariate effects on the rate function through two types of parameters: the shape parameters and the size parameters. The former allows the covariates to flexibly affect the shape of the rate function, and the latter retains the interpretability of covariate effects on the magnitude of the rate function. To overcome the challenges in simultaneously estimating the two sets of parameters, we propose a conditional pseudolikelihood approach to eliminate the size parameters in shape estimation, followed by an event count projection approach for size estimation. The proposed estimators are asymptotically normal with a root- <math> <semantics><mrow><mi>n</mi></mrow> <annotation>$$ n $$</annotation></semantics> </math> convergence rate. Simulation studies and an analysis of recurrent hospitalizations using SEER-Medicare data are conducted to illustrate the proposed methods.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142676818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Instrumental Variable Model Average With Applications in Nonlinear Causal Inference.
Dong Chen, Yuquan Wang, Dapeng Shi, Yunlong Cao, Yue-Qing Hu
The instrumental variable method is widely used in causal inference research to improve the accuracy of estimating causal effects. However, weak correlation between instruments and the exposure, as well as direct effects of instruments on the outcome, can lead to biased estimates. To mitigate the bias introduced by such instruments in nonlinear causal inference, we propose a two-stage nonlinear causal effect estimator based on model averaging. In the first stage, the model uses different subsets of instruments to predict the exposure after a nonlinear transformation, with the help of sliced inverse regression. In the second stage, an adaptive Lasso penalty is applied to the instruments to obtain the estimate of the causal effect. We prove that the proposed estimator exhibits favorable asymptotic properties and evaluate its performance through a series of numerical studies, demonstrating its effectiveness in identifying nonlinear causal effects and its capability to handle scenarios with weak and invalid instruments. We apply the proposed method to the Atherosclerosis Risk in Communities dataset to investigate the relationship between BMI and hypertension.
{"title":"Instrumental Variable Model Average With Applications in Nonlinear Causal Inference.","authors":"Dong Chen, Yuquan Wang, Dapeng Shi, Yunlong Cao, Yue-Qing Hu","doi":"10.1002/sim.10269","DOIUrl":"10.1002/sim.10269","url":null,"abstract":"<p><p>The instrumental variable method is widely used in causal inference research to improve the accuracy of estimating causal effects. However, the weak correlation between instruments and exposure, as well as the direct impact of instruments on the outcome, can lead to biased estimates. To mitigate the bias introduced by such instruments in nonlinear causal inference, we propose a two-stage nonlinear causal effect estimation based on model averaging. The model uses different subsets of instruments in the first stage to predict exposure after a nonlinear transformation with the help of sliced inverse regression. In the second stage, adaptive Lasso penalty is applied to instruments to obtain the estimation of causal effect. We prove that the proposed estimator exhibits favorable asymptotic properties and evaluate its performance through a series of numerical studies, demonstrating its effectiveness in identifying nonlinear causal effects and its capability to handle scenarios with weak and invalid instruments. We apply the proposed method to the Atherosclerosis Risk in Communities dataset to investigate the relationship between BMI and hypertension.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142669200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving Survey Inference Using Administrative Records Without Releasing Individual-Level Continuous Data.
Sharifa Z Williams, Jungang Zou, Yutao Liu, Yajuan Si, Sandro Galea, Qixuan Chen
Probability surveys are challenged by increasing nonresponse rates, resulting in biased statistical inference. Auxiliary information about populations can be used to reduce bias in estimation. Often, continuous auxiliary variables in administrative records are discretized before being released to the public to avoid confidentiality breaches. This may weaken the utility of the administrative records in improving survey estimates, particularly when there is a strong relationship between the continuous auxiliary information and the survey outcome. In this paper, we propose a two-step strategy: first, statistical agencies use the confidential continuous auxiliary data in the population to estimate the response propensity score of the survey sample, which is then included in a modified population data set for data users. In the second step, data users who do not have access to the confidential continuous auxiliary data conduct predictive survey inference by including the discretized continuous variables and the propensity score as predictors, using splines, in a Bayesian model. We show by simulation that the proposed method performs well, yielding more efficient estimates of population means, with 95% credible intervals providing better coverage than alternative approaches. We illustrate the proposed method using the Ohio Army National Guard Mental Health Initiative (OHARNG-MHI). The methods developed in this work are readily available in the R package AuxSurvey.
{"title":"Improving Survey Inference Using Administrative Records Without Releasing Individual-Level Continuous Data.","authors":"Sharifa Z Williams, Jungang Zou, Yutao Liu, Yajuan Si, Sandro Galea, Qixuan Chen","doi":"10.1002/sim.10270","DOIUrl":"10.1002/sim.10270","url":null,"abstract":"<p><p>Probability surveys are challenged by increasing nonresponse rates, resulting in biased statistical inference. Auxiliary information about populations can be used to reduce bias in estimation. Often continuous auxiliary variables in administrative records are first discretized before releasing to the public to avoid confidentiality breaches. This may weaken the utility of the administrative records in improving survey estimates, particularly when there is a strong relationship between continuous auxiliary information and the survey outcome. In this paper, we propose a two-step strategy, where the confidential continuous auxiliary data in the population are first utilized to estimate the response propensity score of the survey sample by statistical agencies, which is then included in a modified population data for data users. In the second step, data users who do not have access to confidential continuous auxiliary data conduct predictive survey inference by including discretized continuous variables and the propensity score as predictors using splines in a Bayesian model. We show by simulation that the proposed method performs well, yielding more efficient estimates of population means with 95% credible intervals providing better coverage than alternative approaches. We illustrate the proposed method using the Ohio Army National Guard Mental Health Initiative (OHARNG-MHI). The methods developed in this work are readily available in the R package AuxSurvey.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142669198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Powerful Test of Heterogeneity in Two-Sample Summary-Data Mendelian Randomization.
Kai Wang, Steven Y Alberding
Background: The success of a Mendelian randomization (MR) study critically depends on the validity of the assumptions underlying MR. We focus on detecting heterogeneity (also known as horizontal pleiotropy) in two-sample summary-data MR. A popular approach is to apply Cochran's $Q$ statistic method, developed for meta-analysis. However, Cochran's $Q$ statistic, including its modifications, is known to lack power when its degrees of freedom are large. Furthermore, there is no theoretical justification for the claimed null distribution of the minimum of the modified Cochran's $Q$ statistic with exact weighting ($Q_{\min}$), although it seems to perform well in simulation studies.
Method: The principle of our proposed method is straightforward: if a set of variables are valid instruments, then any linear combination of these variables is still a valid instrument. In particular, this holds when the linear combinations are formed using eigenvectors derived from a variance matrix. Each linear combination follows a known normal distribution from which a $p$ value can be calculated. We use the minimum $p$ value over these eigenvector-based linear combinations as the test statistic. Additionally, we explore a modification of the modified Cochran's $Q$ statistic that replaces the weighting matrix with a truncated singular value decomposition.
Results: Extensive simulation studies reveal that the proposed methods outperform Cochran's $Q$ statistic, including versions with modified weights, and MR-PRESSO, another popular method for detecting heterogeneity, in cases where the number of instruments is not large or the Wald ratios take two values. We also demonstrate these methods using empirical examples. Furthermore, we show that $Q_{\min}$ does not follow, but is dominated by, the claimed null chi-square distribution. The proposed methods are implemented in the R package iGasso.
Conclusions: Dimension reduction techniques are useful for constructing powerful tests of heterogeneity in MR.
{"title":"Powerful Test of Heterogeneity in Two-Sample Summary-Data Mendelian Randomization.","authors":"Kai Wang, Steven Y Alberding","doi":"10.1002/sim.10279","DOIUrl":"https://doi.org/10.1002/sim.10279","url":null,"abstract":"<p><strong>Background: </strong>The success of a Mendelian randomization (MR) study critically depends on the validity of the assumptions underlying MR. We focus on detecting heterogeneity (also known as horizontal pleiotropy) in two-sample summary-data MR. A popular approach is to apply Cochran's <math> <semantics><mrow><mi>Q</mi></mrow> <annotation>$$ Q $$</annotation></semantics> </math> statistic method, developed for meta-analysis. However, Cochran's <math> <semantics><mrow><mi>Q</mi></mrow> <annotation>$$ Q $$</annotation></semantics> </math> statistic, including its modifications, is known to lack power when its degrees of freedom are large. Furthermore, there is no theoretical justification for the claimed null distribution of the minimum of the modified Cochran's <math> <semantics><mrow><mi>Q</mi></mrow> <annotation>$$ Q $$</annotation></semantics> </math> statistic with exact weighting ( <math> <semantics> <mrow> <msub><mrow><mi>Q</mi></mrow> <mrow><mi>min</mi></mrow> </msub> </mrow> <annotation>$$ {Q}_{mathrm{min}} $$</annotation></semantics> </math> ), although it seems to perform well in simulation studies.</p><p><strong>Method: </strong>The principle of our proposed method is straightforward: if a set of variables are valid instruments, then any linear combination of these variables is still a valid instrument. Specifically, this principle holds when these linear combinations are formed using eigenvectors derived from a variance matrix. Each linear combination follows a known normal distribution from which a <math> <semantics><mrow><mi>p</mi></mrow> <annotation>$$ p $$</annotation></semantics> </math> value can be calculated. We use the minimum <math> <semantics><mrow><mi>p</mi></mrow> <annotation>$$ p $$</annotation></semantics> </math> value for these eigenvector-based linear combinations as the test statistic. Additionally, we explore a modification of the modified Cochran's <math> <semantics><mrow><mi>Q</mi></mrow> <annotation>$$ Q $$</annotation></semantics> </math> statistic by replacing the weighting matrix with a truncated singular value decomposition.</p><p><strong>Results: </strong>Extensive simulation studies reveal that the proposed methods outperform Cochran's <math> <semantics><mrow><mi>Q</mi></mrow> <annotation>$$ Q $$</annotation></semantics> </math> statistic, including those with modified weights, and MR-PRESSO, another popular method for detecting heterogeneity, in cases where the number of instruments is not large or the Wald ratios take two values. We also demonstrate these methods using empirical examples. Furthermore, we show that <math> <semantics> <mrow> <msub><mrow><mi>Q</mi></mrow> <mrow><mi>min</mi></mrow> </msub> </mrow> <annotation>$$ {Q}_{mathrm{min}} $$</annotation></semantics> </math> does not follow, but is dominated by, the claimed null chi-square distribution. 
The proposed methods are implemented in an R package iGasso.</p><p><strong>Conclusions: </strong>Dimension reduction techniques are useful ","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142649176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
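For orientation, the baseline that the paper improves on can be computed in a few lines: the standard Cochran's $Q$ test applied to per-variant Wald ratios in two-sample summary-data MR, with first-order delta-method variances. The simulated summary statistics are purely illustrative; the paper's eigenvector-based minimum-$p$ test is not reproduced here.

```r
# Standard Cochran's Q heterogeneity test for two-sample summary-data MR.
# beta_exp/beta_out are per-variant effects on exposure and outcome with
# standard errors se_exp/se_out; the data below are simulated for illustration.
set.seed(5)
m <- 10
beta_exp <- rnorm(m, 0.1, 0.02)
se_exp   <- rep(0.01, m)
theta    <- 0.3                                  # true causal effect
beta_out <- theta * beta_exp + rnorm(m, 0, 0.01)
se_out   <- rep(0.01, m)

wald <- beta_out / beta_exp                      # per-variant Wald ratios
# first-order (delta-method) variance of each Wald ratio
v    <- se_out^2 / beta_exp^2 + beta_out^2 * se_exp^2 / beta_exp^4
w    <- 1 / v
ivw  <- sum(w * wald) / sum(w)                   # inverse-variance-weighted estimate
Q    <- sum(w * (wald - ivw)^2)                  # heterogeneity statistic
c(IVW = ivw, Q = Q, p = pchisq(Q, df = m - 1, lower.tail = FALSE))
```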
Reinforced Borrowing Framework: Leveraging Auxiliary Data for Individualized Inference.
Ziyu Ji, Julian Wolfson
Over the past decade, researchers have increasingly sought to leverage auxiliary data to enhance individualized inference. Many existing methods, such as multisource exchangeability models (MEM), have been developed to borrow information from multiple supplemental sources to support parameter inference in a primary source. MEM and its alternatives decide how much information to borrow based on the exchangeability of the primary and supplemental sources, where exchangeability is defined as equality of the target parameter; other information that may also help determine the exchangeability of sources is ignored. In this article, we propose a generalized reinforced borrowing framework (RBF) that leverages auxiliary data for enhancing individualized inference through a distance-embedded prior, which draws not only on data about the target parameter but also on different types of auxiliary information sources to "reinforce" inference on the target parameter. RBF improves inference with minimal additional computational burden. We demonstrate the application of RBF to a study investigating the impact of the COVID-19 pandemic on individual activity and transportation behaviors, where RBF achieves 20%-40% lower MSE compared with existing methods.
{"title":"Reinforced Borrowing Framework: Leveraging Auxiliary Data for Individualized Inference.","authors":"Ziyu Ji, Julian Wolfson","doi":"10.1002/sim.10267","DOIUrl":"10.1002/sim.10267","url":null,"abstract":"<p><p>Increasingly during the past decade, researchers have sought to leverage auxiliary data for enhancing individualized inference. Many existing methods, such as multisource exchangeability models (MEM), have been developed to borrow information from multiple supplemental sources to support parameter inference in a primary source. MEM and its alternatives decide how much information to borrow based on the exchangeability of the primary and supplemental sources, where exchangeability is defined as equality of the target parameter. Other information that may also help determine the exchangeability of sources is ignored. In this article, we propose a generalized reinforced borrowing framework (RBF) leveraging auxiliary data for enhancing individualized inference using a distance-embedded prior which uses data not only about the target parameter but also uses different types of auxiliary information sources to \"reinforce\" inference on the target parameter. RBF improves inference with minimal additional computational burden. We demonstrate the application of RBF to a study investigating the impact of the COVID-19 pandemic on individual activity and transportation behaviors, where RBF achieves 20%-40% lower MSE compared with existing methods.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142669202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Partially Randomized Patient Preference, Sequential, Multiple-Assignment, Randomized Trial Design Analyzed via Weighted and Replicated Frequentist and Bayesian Methods.
Marianthie Wank, Sarah Medley, Roy N Tamura, Thomas M Braun, Kelley M Kidwell
Results from randomized controlled trials (RCTs) may not be representative when individuals refuse to be randomized or are excluded for having a preference for which treatment they receive. If trial designs do not allow for participant treatment preferences, trials can suffer in accrual, adherence, retention, and external validity of results. Thus, there is interest in clinical trial designs that incorporate participant treatment preferences. We propose a Partially Randomized, Patient Preference, Sequential, Multiple Assignment, Randomized Trial (PRPP-SMART), which combines a Partially Randomized, Patient Preference (PRPP) design with a Sequential, Multiple Assignment, Randomized Trial (SMART) design. This novel PRPP-SMART design is a multi-stage clinical trial design where, at each stage, participants receive their preferred treatment if they have one and are randomized otherwise. This paper focuses on the design of PRPP-SMARTs and the development of Bayesian and frequentist weighted and replicated regression models (WRRMs) to analyze data from such trials. We propose a two-stage PRPP-SMART with binary end-of-stage outcomes and estimate the embedded dynamic treatment regimes (DTRs). Our WRRMs use data from both randomized and non-randomized participants for efficient estimation of the DTR effects. We compare our method to a more traditional PRPP analysis that considers only participants randomized to treatment. Our Bayesian and frequentist methods produce more efficient DTR estimates with negligible bias despite the inclusion of non-randomized participants in the analysis. The proposed PRPP-SMART design and analytic method is a promising approach for incorporating participant treatment preferences into clinical trial design.
{"title":"A Partially Randomized Patient Preference, Sequential, Multiple-Assignment, Randomized Trial Design Analyzed via Weighted and Replicated Frequentist and Bayesian Methods.","authors":"Marianthie Wank, Sarah Medley, Roy N Tamura, Thomas M Braun, Kelley M Kidwell","doi":"10.1002/sim.10276","DOIUrl":"https://doi.org/10.1002/sim.10276","url":null,"abstract":"<p><p>Results from randomized control trials (RCTs) may not be representative when individuals refuse to be randomized or are excluded for having a preference for which treatment they receive. If trial designs do not allow for participant treatment preferences, trials can suffer in accrual, adherence, retention, and external validity of results. Thus, there is interest surrounding clinical trial designs that incorporate participant treatment preferences. We propose a Partially Randomized, Patient Preference, Sequential, Multiple Assignment, Randomized Trial (PRPP-SMART) which combines a Partially Randomized, Patient Preference (PRPP) design with a Sequential, Multiple Assignment, Randomized Trial (SMART) design. This novel PRPP-SMART design is a multi-stage clinical trial design where, at each stage, participants either receive their preferred treatment, or if they do not have a preferred treatment, they are randomized. This paper focuses on the clinical trial design for PRPP-SMARTs and the development of Bayesian and frequentist weighted and replicated regression models (WRRMs) to analyze data from such trials. We propose a two-stage PRPP-SMART with binary end of stage outcomes and estimate the embedded dynamic treatment regimes (DTRs). Our WRRMs use data from both randomized and non-randomized participants for efficient estimation of the DTR effects. We compare our method to a more traditional PRPP analysis which only considers participants randomized to treatment. Our Bayesian and frequentist methods produce more efficient DTR estimates with negligible bias despite the inclusion of non-randomized participants in the analysis. The proposed PRPP-SMART design and analytic method is a promising approach to incorporate participant treatment preferences into clinical trial design.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142649175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Statistical Inference for Box-Cox based Receiver Operating Characteristic Curves.
Leonidas E Bantis, Benjamin Brewer, Christos T Nakas, Benjamin Reiser
Receiver operating characteristic (ROC) curve analysis is widely used in evaluating the effectiveness of a diagnostic test/biomarker or classifier score. A parametric approach to statistical inference on ROC curves based on a Box-Cox transformation to normality has frequently been discussed in the literature. Many investigators have highlighted the difficulty of taking into account the variability of the estimated transformation parameter when carrying out such an analysis. This variability is often ignored, and inferences are made by treating the estimated transformation parameter as fixed and known. In this paper, we review the literature on the use of the Box-Cox transformation for ROC curves and the methodology for accounting for the estimation of the Box-Cox transformation parameter in the context of ROC analysis, and we detail its application to a number of problems. We present a general framework for inference on any functional of interest, including common measures such as the AUC, the Youden index, and the sensitivity at a given specificity (and vice versa). We have further developed a new R package (named 'rocbc') that carries out all the discussed approaches and is available on CRAN.
{"title":"Statistical Inference for Box-Cox based Receiver Operating Characteristic Curves.","authors":"Leonidas E Bantis, Benjamin Brewer, Christos T Nakas, Benjamin Reiser","doi":"10.1002/sim.10252","DOIUrl":"https://doi.org/10.1002/sim.10252","url":null,"abstract":"<p><p>Receiver operating characteristic (ROC) curve analysis is widely used in evaluating the effectiveness of a diagnostic test/biomarker or classifier score. A parametric approach for statistical inference on ROC curves based on a Box-Cox transformation to normality has frequently been discussed in the literature. Many investigators have highlighted the difficulty of taking into account the variability of the estimated transformation parameter when carrying out such an analysis. This variability is often ignored and inferences are made by considering the estimated transformation parameter as fixed and known. In this paper, we will review the literature discussing the use of the Box-Cox transformation for ROC curves and the methodology for accounting for the estimation of the Box-Cox transformation parameter in the context of ROC analysis, and detail its application to a number of problems. We present a general framework for inference on any functional of interest, including common measures such as the AUC, the Youden index, and the sensitivity at a given specificity (and vice versa). We further developed a new R package (named 'rocbc') that carries out all discussed approaches and is available in CRAN.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142649177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sieve Maximum Likelihood Estimation of Partially Linear Transformation Models With Interval-Censored Data.
Changhui Yuan, Shishun Zhao, Shuwei Li, Xinyuan Song
Partially linear models provide a valuable tool for modeling failure time data with nonlinear covariate effects. Their applicability and importance in survival analysis have been widely acknowledged. To date, numerous inference methods for such models have been developed under traditional right censoring. However, existing studies seldom target interval-censored data, which provide coarser information and frequently occur in many scientific studies involving periodic follow-up. In this work, we propose a flexible class of partially linear transformation models to examine parametric and nonparametric covariate effects for interval-censored outcomes. We consider a sieve maximum likelihood estimation approach that approximates the cumulative baseline hazard function and the nonparametric covariate effect with monotone splines and $B$-splines, respectively. We develop an easy-to-implement expectation-maximization algorithm coupled with three-stage data augmentation to facilitate maximization. We establish the consistency of the proposed estimators and the asymptotic distribution of the parametric components based on empirical process techniques. Numerical results from extensive simulation studies indicate that our proposed method performs satisfactorily in finite samples. An application to a study of hypobaric decompression sickness suggests that the variable TR360 exhibits a significant dynamic and nonlinear effect on the risk of developing hypobaric decompression sickness.
{"title":"Sieve Maximum Likelihood Estimation of Partially Linear Transformation Models With Interval-Censored Data.","authors":"Changhui Yuan, Shishun Zhao, Shuwei Li, Xinyuan Song","doi":"10.1002/sim.10225","DOIUrl":"https://doi.org/10.1002/sim.10225","url":null,"abstract":"<p><p>Partially linear models provide a valuable tool for modeling failure time data with nonlinear covariate effects. Their applicability and importance in survival analysis have been widely acknowledged. To date, numerous inference methods for such models have been developed under traditional right censoring. However, the existing studies seldom target interval-censored data, which provide more coarse information and frequently occur in many scientific studies involving periodical follow-up. In this work, we propose a flexible class of partially linear transformation models to examine parametric and nonparametric covariate effects for interval-censored outcomes. We consider the sieve maximum likelihood estimation approach that approximates the cumulative baseline hazard function and nonparametric covariate effect with the monotone splines and <math> <semantics><mrow><mi>B</mi></mrow> <annotation>$$ B $$</annotation></semantics> </math> -splines, respectively. We develop an easy-to-implement expectation-maximization algorithm coupled with three-stage data augmentation to facilitate maximization. We establish the consistency of the proposed estimators and the asymptotic distribution of parametric components based on the empirical process techniques. Numerical results from extensive simulation studies indicate that our proposed method performs satisfactorily in finite samples. An application to a study on hypobaric decompression sickness suggests that the variable TR360 exhibits a significant dynamic and nonlinear effect on the risk of developing hypobaric decompression sickness.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142628019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}