Randomization-based inference using the Fisher randomization test allows for the computation of Fisher-exact P-values, making it an attractive option for the analysis of small, randomized experiments with non-normal outcomes. Two common test statistics used to perform Fisher randomization tests are the difference-in-means between the treatment and control groups and the covariate-adjusted version of the difference-in-means using analysis of covariance. Modern computing allows for fast computation of the Fisher-exact P-value, but confidence intervals have typically been obtained by inverting the Fisher randomization test over a range of possible effect sizes. The test inversion procedure is computationally expensive, limiting the use of randomization-based inference in applied work. A recent paper by Zhu and Liu developed a closed-form expression for the randomization-based confidence interval using the difference-in-means statistic. We develop an important extension of Zhu and Liu to obtain a closed-form expression for the randomization-based covariate-adjusted confidence interval and give practitioners a sufficient condition, checkable from observed data, that guarantees these confidence intervals have correct coverage. Simulations show that our procedure generates randomization-based covariate-adjusted confidence intervals that are robust to non-normality and that can be calculated in nearly the same time as the Fisher-exact P-value itself, thus removing the computational barrier to performing randomization-based inference when adjusting for covariates. We also demonstrate our method on a re-analysis of phase I clinical trial data.
{"title":"On exact randomization-based covariate-adjusted confidence intervals.","authors":"Jacob Fiksel","doi":"10.1093/biomtc/ujae051","DOIUrl":"https://doi.org/10.1093/biomtc/ujae051","url":null,"abstract":"<p><p>Randomization-based inference using the Fisher randomization test allows for the computation of Fisher-exact P-values, making it an attractive option for the analysis of small, randomized experiments with non-normal outcomes. Two common test statistics used to perform Fisher randomization tests are the difference-in-means between the treatment and control groups and the covariate-adjusted version of the difference-in-means using analysis of covariance. Modern computing allows for fast computation of the Fisher-exact P-value, but confidence intervals have typically been obtained by inverting the Fisher randomization test over a range of possible effect sizes. The test inversion procedure is computationally expensive, limiting the usage of randomization-based inference in applied work. A recent paper by Zhu and Liu developed a closed form expression for the randomization-based confidence interval using the difference-in-means statistic. We develop an important extension of Zhu and Liu to obtain a closed form expression for the randomization-based covariate-adjusted confidence interval and give practitioners a sufficiency condition that can be checked using observed data and that guarantees that these confidence intervals have correct coverage. Simulations show that our procedure generates randomization-based covariate-adjusted confidence intervals that are robust to non-normality and that can be calculated in nearly the same time as it takes to calculate the Fisher-exact P-value, thus removing the computational barrier to performing randomization-based inference when adjusting for covariates. We also demonstrate our method on a re-analysis of phase I clinical trial data.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141260413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thanthirige Lakshika M Ruberu, Danielle Braun, Giovanni Parmigiani, Swati Biswas
Multi-gene panel testing allows many cancer susceptibility genes to be tested quickly at a lower cost, making such testing accessible to a broader population. Thus, more patients carrying pathogenic germline mutations in various cancer susceptibility genes are being identified. This creates a great opportunity, as well as an urgent need, to counsel these patients about appropriate risk-reducing management strategies. Counseling hinges on accurate estimates of age-specific risks of developing various cancers associated with mutations in a specific gene, i.e., penetrance estimation. We propose a meta-analysis approach based on a Bayesian hierarchical random-effects model to obtain penetrance estimates by integrating studies that report different types of risk measures (e.g., penetrance, relative risk, odds ratio) while accounting for the associated uncertainties. After estimating posterior distributions of the parameters via a Markov chain Monte Carlo algorithm, we estimate penetrance and credible intervals. We investigate the proposed method and compare it with an existing approach via simulations based on studies reporting risks for two moderate-risk breast cancer susceptibility genes, ATM and PALB2. Our proposed method is far superior in terms of coverage probability of credible intervals and mean square error of estimates. Finally, we apply our method to estimate the penetrance of breast cancer among carriers of pathogenic mutations in the ATM gene.
{"title":"Bayesian meta-analysis of penetrance for cancer risk.","authors":"Thanthirige Lakshika M Ruberu, Danielle Braun, Giovanni Parmigiani, Swati Biswas","doi":"10.1093/biomtc/ujae038","DOIUrl":"10.1093/biomtc/ujae038","url":null,"abstract":"<p><p>Multi-gene panel testing allows many cancer susceptibility genes to be tested quickly at a lower cost making such testing accessible to a broader population. Thus, more patients carrying pathogenic germline mutations in various cancer-susceptibility genes are being identified. This creates a great opportunity, as well as an urgent need, to counsel these patients about appropriate risk-reducing management strategies. Counseling hinges on accurate estimates of age-specific risks of developing various cancers associated with mutations in a specific gene, ie, penetrance estimation. We propose a meta-analysis approach based on a Bayesian hierarchical random-effects model to obtain penetrance estimates by integrating studies reporting different types of risk measures (eg, penetrance, relative risk, odds ratio) while accounting for the associated uncertainties. After estimating posterior distributions of the parameters via a Markov chain Monte Carlo algorithm, we estimate penetrance and credible intervals. We investigate the proposed method and compare with an existing approach via simulations based on studies reporting risks for two moderate-risk breast cancer susceptibility genes, ATM and PALB2. Our proposed method is far superior in terms of coverage probability of credible intervals and mean square error of estimates. Finally, we apply our method to estimate the penetrance of breast cancer among carriers of pathogenic mutations in the ATM gene.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11140851/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141178675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chan Park, David B Richardson, Eric J Tchetgen Tchetgen
Negative control variables are sometimes used in nonexperimental studies to detect the presence of confounding by hidden factors. A negative control outcome (NCO) is an outcome that is influenced by unobserved confounders of the exposure effects on the outcome in view, but is not causally impacted by the exposure. Tchetgen Tchetgen (2013) introduced the Control Outcome Calibration Approach (COCA) as a formal NCO counterfactual method to detect and correct for residual confounding bias. For identification, COCA treats the NCO as an error-prone proxy of the treatment-free counterfactual outcome of interest, and involves regressing the NCO on the treatment-free counterfactual, together with a rank-preserving structural model, which assumes a constant individual-level causal effect. In this work, we establish nonparametric COCA identification for the average causal effect for the treated, without requiring rank-preservation, therefore accommodating unrestricted effect heterogeneity across units. This nonparametric identification result has important practical implications, as it provides single-proxy confounding control, in contrast to recently proposed proximal causal inference, which relies for identification on a pair of confounding proxies. For COCA estimation we propose 3 separate strategies: (i) an extended propensity score approach, (ii) an outcome bridge function approach, and (iii) a doubly-robust approach. Finally, we illustrate the proposed methods in an application evaluating the causal impact of a Zika virus outbreak on birth rate in Brazil.
{"title":"Single proxy control.","authors":"Chan Park, David B Richardson, Eric J Tchetgen Tchetgen","doi":"10.1093/biomtc/ujae027","DOIUrl":"https://doi.org/10.1093/biomtc/ujae027","url":null,"abstract":"<p><p>Negative control variables are sometimes used in nonexperimental studies to detect the presence of confounding by hidden factors. A negative control outcome (NCO) is an outcome that is influenced by unobserved confounders of the exposure effects on the outcome in view, but is not causally impacted by the exposure. Tchetgen Tchetgen (2013) introduced the Control Outcome Calibration Approach (COCA) as a formal NCO counterfactual method to detect and correct for residual confounding bias. For identification, COCA treats the NCO as an error-prone proxy of the treatment-free counterfactual outcome of interest, and involves regressing the NCO on the treatment-free counterfactual, together with a rank-preserving structural model, which assumes a constant individual-level causal effect. In this work, we establish nonparametric COCA identification for the average causal effect for the treated, without requiring rank-preservation, therefore accommodating unrestricted effect heterogeneity across units. This nonparametric identification result has important practical implications, as it provides single-proxy confounding control, in contrast to recently proposed proximal causal inference, which relies for identification on a pair of confounding proxies. For COCA estimation we propose 3 separate strategies: (i) an extended propensity score approach, (ii) an outcome bridge function approach, and (iii) a doubly-robust approach. Finally, we illustrate the proposed methods in an application evaluating the causal impact of a Zika virus outbreak on birth rate in Brazil.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11033710/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140847741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ruberu et al. (2023) introduce an elegant approach to fit a complicated meta-analysis problem with diverse reporting modalities into the framework of hierarchical Bayesian inference. We discuss issues related to some of the involved parametric model assumptions.
{"title":"Discussion on \"Bayesian meta-analysis of penetrance for cancer risk\" by Thanthirige Lakshika M. Ruberu, Danielle Braun, Giovanni Parmigiani, and Swati Biswas.","authors":"Peter Müller, Bernardo Flores","doi":"10.1093/biomtc/ujae042","DOIUrl":"https://doi.org/10.1093/biomtc/ujae042","url":null,"abstract":"<p><p>Ruberu et al. (2023) introduce an elegant approach to fit a complicated meta-analysis problem with diverse reporting modalities into the framework of hierarchical Bayesian inference. We discuss issues related to some of the involved parametric model assumptions.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141178723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dafne Zorzetto, Falco J Bargagli-Stoffi, Antonio Canale, Francesca Dominici
Several epidemiological studies have provided evidence that long-term exposure to fine particulate matter (PM2.5) increases the mortality rate. Furthermore, some population characteristics (e.g., age, race, and socioeconomic status) might play a crucial role in understanding vulnerability to air pollution. To inform policy, it is necessary to identify the groups of the population that are more or less vulnerable to air pollution. In the causal inference literature, the group average treatment effect (GATE) is a distinctive facet of the conditional average treatment effect. This widely employed metric serves to characterize the heterogeneity of a treatment effect based on some population characteristics. In this paper, we introduce a novel Confounder-Dependent Bayesian Mixture Model (CDBMM) to characterize causal effect heterogeneity. More specifically, our method leverages the flexibility of the dependent Dirichlet process to model the distribution of the potential outcomes conditionally on the covariates and the treatment levels, thus enabling us to: (i) identify heterogeneous and mutually exclusive population groups defined by similar GATEs in a data-driven way, and (ii) estimate and characterize the causal effects within each of the identified groups. Through simulations, we demonstrate the effectiveness of our method in uncovering key insights about treatment effect heterogeneity. We apply our method to claims data from Medicare enrollees in Texas and find six mutually exclusive groups in which the causal effects of PM2.5 on the mortality rate are heterogeneous.
{"title":"Confounder-dependent Bayesian mixture model: Characterizing heterogeneity of causal effects in air pollution epidemiology.","authors":"Dafne Zorzetto, Falco J Bargagli-Stoffi, Antonio Canale, Francesca Dominici","doi":"10.1093/biomtc/ujae025","DOIUrl":"https://doi.org/10.1093/biomtc/ujae025","url":null,"abstract":"<p><p>Several epidemiological studies have provided evidence that long-term exposure to fine particulate matter (pm2.5) increases mortality rate. Furthermore, some population characteristics (e.g., age, race, and socioeconomic status) might play a crucial role in understanding vulnerability to air pollution. To inform policy, it is necessary to identify groups of the population that are more or less vulnerable to air pollution. In causal inference literature, the group average treatment effect (GATE) is a distinctive facet of the conditional average treatment effect. This widely employed metric serves to characterize the heterogeneity of a treatment effect based on some population characteristics. In this paper, we introduce a novel Confounder-Dependent Bayesian Mixture Model (CDBMM) to characterize causal effect heterogeneity. More specifically, our method leverages the flexibility of the dependent Dirichlet process to model the distribution of the potential outcomes conditionally to the covariates and the treatment levels, thus enabling us to: (i) identify heterogeneous and mutually exclusive population groups defined by similar GATEs in a data-driven way, and (ii) estimate and characterize the causal effects within each of the identified groups. Through simulations, we demonstrate the effectiveness of our method in uncovering key insights about treatment effects heterogeneity. We apply our method to claims data from Medicare enrollees in Texas. We found six mutually exclusive groups where the causal effects of pm2.5 on mortality rate are heterogeneous.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11028589/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140874452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sarah C Lotspeich, Brian D Richardson, Pedro L Baldoni, Kimberly P Enders, Michael G Hudgens
People living with HIV on antiretroviral therapy often have undetectable virus levels by standard assays, but "latent" HIV still persists in viral reservoirs. Eliminating these reservoirs is the goal of HIV cure research. The quantitative viral outgrowth assay (QVOA) is commonly used to estimate the reservoir size, that is, the infectious units per million (IUPM) of HIV-persistent resting CD4+ T cells. A new variation of the QVOA, the ultra deep sequencing assay of the outgrowth virus (UDSA), was recently developed that further quantifies the number of viral lineages within a subset of infected wells. Performing the UDSA on a subset of wells provides additional information that can improve IUPM estimation. This paper considers statistical inference about the IUPM from combined dilution assay (QVOA) and deep viral sequencing (UDSA) data, even when some deep sequencing data are missing. Methods are proposed to accommodate assays with wells sequenced at multiple dilution levels and with imperfect sensitivity and specificity, and a novel bias-corrected estimator is included for small samples. The proposed methods are evaluated in a simulation study, applied to data from the University of North Carolina HIV Cure Center, and implemented in the open-source R package SLDeepAssay.
{"title":"Quantifying the HIV reservoir with dilution assays and deep viral sequencing.","authors":"Sarah C Lotspeich, Brian D Richardson, Pedro L Baldoni, Kimberly P Enders, Michael G Hudgens","doi":"10.1093/biomtc/ujad018","DOIUrl":"10.1093/biomtc/ujad018","url":null,"abstract":"<p><p>People living with HIV on antiretroviral therapy often have undetectable virus levels by standard assays, but \"latent\" HIV still persists in viral reservoirs. Eliminating these reservoirs is the goal of HIV cure research. The quantitative viral outgrowth assay (QVOA) is commonly used to estimate the reservoir size, that is, the infectious units per million (IUPM) of HIV-persistent resting CD4+ T cells. A new variation of the QVOA, the ultra deep sequencing assay of the outgrowth virus (UDSA), was recently developed that further quantifies the number of viral lineages within a subset of infected wells. Performing the UDSA on a subset of wells provides additional information that can improve IUPM estimation. This paper considers statistical inference about the IUPM from combined dilution assay (QVOA) and deep viral sequencing (UDSA) data, even when some deep sequencing data are missing. Methods are proposed to accommodate assays with wells sequenced at multiple dilution levels and with imperfect sensitivity and specificity, and a novel bias-corrected estimator is included for small samples. The proposed methods are evaluated in a simulation study, applied to data from the University of North Carolina HIV Cure Center, and implemented in the open-source R package SLDeepAssay.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10873562/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139745915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Epidemiological studies based on 2-phase designs help ensure efficient use of limited resources in situations where certain covariates are prohibitively expensive to measure for a full cohort. Typically, these designs involve 2 steps: In phase I, data on an outcome and inexpensive covariates are acquired, and in phase II, a subsample is chosen in which the costly variable of interest is measured. For right-censored data, 2-phase designs have been primarily based on the Cox model. We develop efficient 2-phase design strategies for settings involving a fraction of long-term survivors due to nonsusceptibility. Using mixture models accommodating a nonsusceptible fraction, we consider 3 regression frameworks, including (a) a logistic "cure" model, (b) a proportional hazards model for those who are susceptible, and (c) regression models for susceptibility and failure time in those susceptible. Importantly, we introduce a novel class of bivariate residual-dependent designs to address the unique challenges presented in scenario (c), which involves 2 parameters of interest. Extensive simulation studies demonstrate the superiority of our approach over various phase II subsampling schemes. We illustrate the method through applications to the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial.
{"title":"Two-phase designs with failure time processes subject to nonsusceptibility.","authors":"Fangya Mao, Li C Cheung, Richard J Cook","doi":"10.1093/biomtc/ujad038","DOIUrl":"10.1093/biomtc/ujad038","url":null,"abstract":"<p><p>Epidemiological studies based on 2-phase designs help ensure efficient use of limited resources in situations where certain covariates are prohibitively expensive to measure for a full cohort. Typically, these designs involve 2 steps: In phase I, data on an outcome and inexpensive covariates are acquired, and in phase II, a subsample is chosen in which the costly variable of interest is measured. For right-censored data, 2-phase designs have been primarily based on the Cox model. We develop efficient 2-phase design strategies for settings involving a fraction of long-term survivors due to nonsusceptibility. Using mixture models accommodating a nonsusceptible fraction, we consider 3 regression frameworks, including (a) a logistic \"cure\" model, (b) a proportional hazards model for those who are susceptible, and (c) regression models for susceptibility and failure time in those susceptible. Importantly, we introduce a novel class of bivariate residual-dependent designs to address the unique challenges presented in scenario (c), which involves 2 parameters of interest. Extensive simulation studies demonstrate the superiority of our approach over various phase II subsampling schemes. We illustrate the method through applications to the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140038631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Serge Aleshin-Guendel, Mauricio Sadinle, Jon Wakefield
We organize the discussants' major comments into the following categories: sensitivity analyses, zero counts, model selection, the marginal no-highest-order interaction (NHOI) assumption, and the usefulness of our proposed framework.
{"title":"Rejoinder to the discussion on \"The central role of the identifying assumption in population size estimation\".","authors":"Serge Aleshin-Guendel, Mauricio Sadinle, Jon Wakefield","doi":"10.1093/biomtc/ujad033","DOIUrl":"10.1093/biomtc/ujad033","url":null,"abstract":"<p><p>We organize the discussants' major comments into the following categories: sensitivity analyses, zero counts, model selection, the marginal no-highest-order interaction (NHOI) assumption, and the usefulness of our proposed framework.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140058584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ce Yang, Benjamin Langworthy, Sharon Curhan, Kenneth I Vaden, Gary Curhan, Judy R Dubno, Molin Wang
Age-related hearing loss has a complex etiology. Researchers have made efforts to classify relevant audiometric phenotypes, aiming to enhance medical interventions and improve hearing health. We leveraged existing pattern analyses of age-related hearing loss and implemented the phenotype classification via quadratic discriminant analysis (QDA). We herein propose a method for analyzing exposure effects on the soft classification probabilities of the phenotypes via estimating equations. Under reasonable assumptions, the estimating equations are unbiased and lead to consistent estimators. The resulting estimator had good finite-sample performance in simulation studies. As an illustrative example, we applied our proposed methods to assess the association between a dietary intake pattern, assessed as adherence scores for the Dietary Approaches to Stop Hypertension (DASH) diet calculated using validated food-frequency questionnaires, and audiometric phenotypes (older-normal, metabolic, sensory, and metabolic plus sensory), determined from data obtained in the Audiology Assessment Arm of the Nurses' Health Study II Conservation of Hearing Study. Our findings suggested that participants with a more healthful dietary pattern were less likely to develop the metabolic plus sensory phenotype of age-related hearing loss.
{"title":"Soft classification and regression analysis of audiometric phenotypes of age-related hearing loss.","authors":"Ce Yang, Benjamin Langworthy, Sharon Curhan, Kenneth I Vaden, Gary Curhan, Judy R Dubno, Molin Wang","doi":"10.1093/biomtc/ujae013","DOIUrl":"10.1093/biomtc/ujae013","url":null,"abstract":"<p><p>Age-related hearing loss has a complex etiology. Researchers have made efforts to classify relevant audiometric phenotypes, aiming to enhance medical interventions and improve hearing health. We leveraged existing pattern analyses of age-related hearing loss and implemented the phenotype classification via quadratic discriminant analysis (QDA). We herein propose a method for analyzing the exposure effects on the soft classification probabilities of the phenotypes via estimating equations. Under reasonable assumptions, the estimating equations are unbiased and lead to consistent estimators. The resulting estimator had good finite sample performances in simulation studies. As an illustrative example, we applied our proposed methods to assess the association between a dietary intake pattern, assessed as adherence scores for the dietary approaches to stop hypertension diet calculated using validated food-frequency questionnaires, and audiometric phenotypes (older-normal, metabolic, sensory, and metabolic plus sensory), determined based on data obtained in the Nurses' Health Study II Conservation of Hearing Study, the Audiology Assessment Arm. Our findings suggested that participants with a more healthful dietary pattern were less likely to develop the metabolic plus sensory phenotype of age-related hearing loss.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10941322/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140130630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hillary M Heiling, Naim U Rashid, Quefeng Li, Xianlu L Peng, Jen Jen Yeh, Joseph G Ibrahim
Modern biomedical datasets are increasingly high-dimensional and exhibit complex correlation structures. Generalized linear mixed models (GLMMs) have long been employed to account for such dependencies. However, proper specification of the fixed and random effects in GLMMs is increasingly difficult in high dimensions, and computational complexity grows with increasing dimension of the random effects. We present a novel reformulation of the GLMM using a factor model decomposition of the random effects, enabling scalable computation of GLMMs in high dimensions by reducing the latent space from a large number of random effects to a smaller set of latent factors. We also extend our prior work to estimate model parameters using a modified Monte Carlo Expectation Conditional Minimization algorithm, allowing us to perform variable selection on both the fixed and random effects simultaneously. We show through simulation that, with this factor model decomposition, our method can fit high-dimensional penalized GLMMs faster than comparable methods and scale more easily to dimensions beyond the reach of existing approaches.
{"title":"Efficient computation of high-dimensional penalized generalized linear mixed models by latent factor modeling of the random effects.","authors":"Hillary M Heiling, Naim U Rashid, Quefeng Li, Xianlu L Peng, Jen Jen Yeh, Joseph G Ibrahim","doi":"10.1093/biomtc/ujae016","DOIUrl":"10.1093/biomtc/ujae016","url":null,"abstract":"<p><p>Modern biomedical datasets are increasingly high-dimensional and exhibit complex correlation structures. Generalized linear mixed models (GLMMs) have long been employed to account for such dependencies. However, proper specification of the fixed and random effects in GLMMs is increasingly difficult in high dimensions, and computational complexity grows with increasing dimension of the random effects. We present a novel reformulation of the GLMM using a factor model decomposition of the random effects, enabling scalable computation of GLMMs in high dimensions by reducing the latent space from a large number of random effects to a smaller set of latent factors. We also extend our prior work to estimate model parameters using a modified Monte Carlo Expectation Conditional Minimization algorithm, allowing us to perform variable selection on both the fixed and random effects simultaneously. We show through simulation that through this factor model decomposition, our method can fit high-dimensional penalized GLMMs faster than comparable methods and more easily scale to larger dimensions not previously seen in existing approaches.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10946237/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140142743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}