Storey's estimator for the proportion of true null hypotheses, originally proposed under the continuous framework, is modified in this work for the discrete framework. The modification improves estimation of the parameter of interest. The proposed estimator is used to formulate an adaptive version of the Benjamini–Hochberg procedure, and control of the false discovery rate by this adaptive procedure is proved analytically. The proposed estimator is also used to formulate an adaptive version of the Benjamini–Hochberg–Heyse procedure; simulation experiments establish the conservative nature of this new adaptive procedure. A substantial gain in power is observed for the new adaptive procedures over the standard procedures. The proposed method is demonstrated on two real-life gene expression data sets, one from an HIV study and the other from a methylation study.
{"title":"Estimating the proportion of true null hypotheses and adaptive false discovery rate control in discrete paradigm","authors":"Aniket Biswas, Gaurangadeb Chattopadhyay","doi":"10.1002/bimj.202200204","DOIUrl":"10.1002/bimj.202200204","PeriodicalId":55360,"journal":{"name":"Biometrical Journal"},"PeriodicalIF":1.7,"publicationDate":"2024-02-14","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139736834","RegionNum":3,"RegionCategory":"Biology"}
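As a rough illustration of the ideas named in the abstract above, the following sketch implements the classical continuous-case Storey estimator and a Benjamini–Hochberg step-up run at the level rescaled by that estimate. It is not the discrete modification proposed in the paper; the function names, the tuning value `lam = 0.5`, and the "+1" finite-sample correction are assumptions chosen for illustration.

```python
from typing import List


def storey_pi0(pvalues: List[float], lam: float = 0.5) -> float:
    """Classical Storey-type estimate of the proportion of true nulls.

    Assumes continuous p-values; `lam` is a hypothetical tuning choice.
    """
    m = len(pvalues)
    # p-values above lam are assumed to come mostly from true nulls
    above = sum(1 for p in pvalues if p > lam)
    # "+1" is a common finite-sample correction; cap the estimate at 1
    return min(1.0, (above + 1) / (m * (1.0 - lam)))


def adaptive_bh(pvalues: List[float], alpha: float = 0.05,
                lam: float = 0.5) -> List[bool]:
    """Benjamini-Hochberg step-up applied at level alpha / pi0_hat."""
    m = len(pvalues)
    pi0 = storey_pi0(pvalues, lam)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= k * alpha / (m * pi0_hat)
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / (m * pi0):
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject
```

When the estimated null proportion is below 1, the rescaled thresholds are larger than the standard ones, which is the source of the power gain that adaptive procedures aim for.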
Karla Monterrubio-Gómez, Nathan Constantine-Cooke, Catalina A. Vallejos
Several techniques for modeling competing risks (CR) survival data have been proposed in both the statistical and machine learning literature. State-of-the-art methods extend classical approaches with more flexible assumptions that can, among other things, improve predictive performance and accommodate high-dimensional data and missing values. Despite this, modern approaches have not been widely employed in applied settings. This article aims to aid the uptake of such methods by providing a condensed compendium of CR survival methods with a unified notation and interpretation across approaches. We highlight available software and, when possible, demonstrate its usage via reproducible R vignettes. Moreover, we discuss two major concerns that can affect benchmark studies in this context: the choice of performance metrics and reproducibility.
{"title":"A review on statistical and machine learning competing risks methods","authors":"Karla Monterrubio-Gómez, Nathan Constantine-Cooke, Catalina A. Vallejos","doi":"10.1002/bimj.202300060","DOIUrl":"10.1002/bimj.202300060","PeriodicalId":55360,"journal":{"name":"Biometrical Journal"},"PeriodicalIF":1.7,"publicationDate":"2024-02-13","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300060","platform":"Semanticscholar","paperid":"139731093","RegionNum":3,"RegionCategory":"Biology","PMCID":"OA"}
Raphael Rehms, Nicole Ellenbach, Eva Rehfuess, Jacob Burns, Ulrich Mansmann, Sabine Hoffmann
Infectious disease models can serve as critical tools to predict the development of cases and associated healthcare demand and to determine the set of nonpharmaceutical interventions (NPIs) that is most effective in slowing the spread of an infectious agent. Current approaches to estimate NPI effects typically focus on relatively short time periods and either on the number of reported cases, deaths, intensive care occupancy, or hospital occupancy as a single indicator of disease transmission. In this work, we propose a Bayesian hierarchical model that integrates multiple outcomes and complementary sources of information in the estimation of the true and unknown number of infections while accounting for time-varying underreporting and weekday-specific delays in reported cases and deaths, allowing us to estimate the number of infections on a daily basis rather than having to smooth the data. To address dynamic changes occurring over long periods of time, we account for the spread of new variants, seasonality, and time-varying differences in host susceptibility. We implement a Markov chain Monte Carlo algorithm to conduct Bayesian inference and illustrate the proposed approach with data on COVID-19 from 20 European countries. The approach shows good performance on simulated data and produces posterior predictions that show a good fit to reported cases, deaths, hospital, and intensive care occupancy.
{"title":"A Bayesian hierarchical approach to account for evidence and uncertainty in the modeling of infectious diseases: An application to COVID-19","authors":"Raphael Rehms, Nicole Ellenbach, Eva Rehfuess, Jacob Burns, Ulrich Mansmann, Sabine Hoffmann","doi":"10.1002/bimj.202200341","DOIUrl":"10.1002/bimj.202200341","PeriodicalId":55360,"journal":{"name":"Biometrical Journal"},"PeriodicalIF":1.7,"publicationDate":"2024-01-28","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202200341","platform":"Semanticscholar","paperid":"139572218","RegionNum":3,"RegionCategory":"Biology","PMCID":"OA"}
Rheanna M. Mainzer, Cattram D. Nguyen, John B. Carlin, Margarita Moreno-Betancur, Ian R. White, Katherine J. Lee
Multiple imputation (MI) is a popular method for handling missing data. Auxiliary variables can be added to the imputation model(s) to improve MI estimates. However, the choice of which auxiliary variables to include is not always straightforward. Several data-driven auxiliary variable selection strategies have been proposed, but there has been limited evaluation of their performance. Using a simulation study we evaluated the performance of eight auxiliary variable selection strategies: (1, 2) two versions of selection based on correlations in the observed data; (3) selection using hypothesis tests of the “missing completely at random” assumption; (4) replacing auxiliary variables with their principal components; (5, 6) forward and forward stepwise selection; (7) forward selection based on the estimated fraction of missing information; and (8) selection via the least absolute shrinkage and selection operator (LASSO). A complete case analysis and an MI analysis using all auxiliary variables (the “full model”) were included for comparison. We also applied all strategies to a motivating case study. The full model outperformed all auxiliary variable selection strategies in the simulation study, with the LASSO strategy the best performing auxiliary variable selection strategy overall. All MI analysis strategies that we were able to apply to the case study led to similar estimates, although computational time was substantially reduced when variable selection was employed. This study provides further support for adopting an inclusive auxiliary variable strategy where possible. Auxiliary variable selection using the LASSO may be a promising alternative when the full model fails or is too burdensome.
{"title":"A comparison of strategies for selecting auxiliary variables for multiple imputation","authors":"Rheanna M. Mainzer, Cattram D. Nguyen, John B. Carlin, Margarita Moreno-Betancur, Ian R. White, Katherine J. Lee","doi":"10.1002/bimj.202200291","DOIUrl":"https://doi.org/10.1002/bimj.202200291","PeriodicalId":55360,"journal":{"name":"Biometrical Journal"},"PeriodicalIF":1.7,"publicationDate":"2024-01-23","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202200291","platform":"Semanticscholar","paperid":"139550466","RegionNum":3,"RegionCategory":"Biology","PMCID":"OA"}
An inference procedure is proposed to provide consistent estimators of parameters in a modal regression model with a covariate prone to measurement error. A score-based diagnostic tool exploiting parametric bootstrap is developed to assess adequacy of parametric assumptions imposed on the regression model. The proposed estimation method and diagnostic tool are applied to synthetic data generated from simulation experiments and data from real-world applications to demonstrate their implementation and performance. These empirical examples illustrate the importance of adequately accounting for measurement error in the error-prone covariate when inferring the association between a response and covariates based on a modal regression model that is especially suitable for skewed and heavy-tailed response data.
{"title":"Parametric modal regression with error in covariates","authors":"Qingyang Liu, Xianzheng Huang","doi":"10.1002/bimj.202200348","DOIUrl":"10.1002/bimj.202200348","PeriodicalId":55360,"journal":{"name":"Biometrical Journal"},"PeriodicalIF":1.7,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139492257","RegionNum":3,"RegionCategory":"Biology"}
Gianmarco Caruso, Pierfrancesco Alaimo Di Loro, Marco Mingione, Luca Tardella, Daniela Silvia Pace, Giovanna Jona Lasinio
This work aims to show how prior knowledge about the structure of a heterogeneous animal population can be leveraged to improve the abundance estimation from capture–recapture survey data. We combine the Open Jolly-Seber model with finite mixtures and propose a parsimonious specification tailored to the residency patterns of the common bottlenose dolphin. We employ a Bayesian framework for our inference, discussing the appropriate choice of priors to mitigate label-switching and nonidentifiability issues, commonly associated with finite mixture models. We conduct a series of simulation experiments to illustrate the competitive advantage of our proposal over less specific alternatives. The proposed approach is applied to data collected on the common bottlenose dolphin population inhabiting the Tiber River estuary (Mediterranean Sea). Our results provide novel insights into this population's size and structure, shedding light on some of the ecological processes governing its dynamics.
{"title":"Finite mixtures in capture–recapture surveys for modeling residency patterns in marine wildlife populations","authors":"Gianmarco Caruso, Pierfrancesco Alaimo Di Loro, Marco Mingione, Luca Tardella, Daniela Silvia Pace, Giovanna Jona Lasinio","doi":"10.1002/bimj.202200350","DOIUrl":"https://doi.org/10.1002/bimj.202200350","PeriodicalId":55360,"journal":{"name":"Biometrical Journal"},"PeriodicalIF":1.7,"publicationDate":"2024-01-16","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202200350","platform":"Semanticscholar","paperid":"139473964","RegionNum":3,"RegionCategory":"Biology","PMCID":"OA"}
The two-sample problem is one of the earliest problems in statistics: given two samples, the question is whether or not the observations were sampled from the same distribution. Many statistical tests have been developed for this problem, and many have been evaluated in simulation studies, but hardly any study has tried to set up a neutral comparison. In this paper, we introduce an open science initiative that potentially allows for neutral comparisons of two-sample tests. It is designed as an open-source R package, a repository, and an online R Shiny app. This paper describes the principles and design of the system and illustrates its use.
{"title":"Neutralise: An open science initiative for neutral comparison of two-sample tests","authors":"Leyla Kodalci, Olivier Thas","doi":"10.1002/bimj.202200237","DOIUrl":"https://doi.org/10.1002/bimj.202200237","PeriodicalId":55360,"journal":{"name":"Biometrical Journal"},"PeriodicalIF":1.7,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202200237","platform":"Semanticscholar","paperid":"139435302","RegionNum":3,"RegionCategory":"Biology","PMCID":"OA"}
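To make the two-sample problem described above concrete, here is a minimal sketch of one generic two-sample test: a permutation test based on the absolute difference in sample means. It is not taken from the Neutralise package; the function name, the number of permutations, and the fixed seed are assumptions made for illustration, and the statistic is mainly sensitive to shifts in location.

```python
import random
from typing import Sequence


def permutation_test(x: Sequence[float], y: Sequence[float],
                     n_perm: int = 999, seed: int = 0) -> float:
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    pooled = list(x) + list(y)
    n_x = len(x)

    def abs_mean_diff(values):
        first, second = values[:n_x], values[n_x:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = abs_mean_diff(pooled)
    hits = 0
    for _ in range(n_perm):
        # Under the null of a common distribution, group labels are
        # exchangeable, so shuffling the pooled sample relabels at random
        rng.shuffle(pooled)
        if abs_mean_diff(pooled) >= observed:
            hits += 1
    # Adding one to numerator and denominator keeps the p-value above zero
    return (hits + 1) / (n_perm + 1)
```

A neutral comparison study would run many such tests, including ones targeting scale or shape differences, over a shared set of simulation scenarios.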
For low-prevalence diseases, we consider estimation of the odds ratio for two specified groups of individuals using group testing data. Broadly, the two groups may be classified as “the exposed” and “the unexposed.” In observational studies, the exposure status is often not correctly recorded. In addition, diagnostic tests are rarely completely accurate. The proposed model accounts for imperfect sensitivity and specificity of diagnostic tests along with misclassification of the exposure status. For model identifiability, we make use of internal validation data, where a subsample of reasonably small size is selected from the original sample by simple random sampling without replacement. A pseudo-maximum likelihood method is employed to estimate the model parameters. The performance of the group testing methodology is compared with individual testing for different parametric configurations. A limited data study related to COVID-19 prevalence is performed to illustrate the methodology.
{"title":"Estimation of odds ratio from group testing data with misclassified exposure","authors":"Surupa Roy, Sumanta Adhya, Subrata Rana","doi":"10.1002/bimj.202200254","DOIUrl":"https://doi.org/10.1002/bimj.202200254","PeriodicalId":55360,"journal":{"name":"Biometrical Journal"},"PeriodicalIF":1.7,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139435301","RegionNum":3,"RegionCategory":"Biology"}
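The role of sensitivity and specificity in group testing, mentioned in the abstract above, can be illustrated with the standard textbook expression for the probability that a pool tests positive. This is a simplified sketch and not the authors' model: it assumes independent individuals, no dilution effect, and correctly recorded exposure, and all function and parameter names are hypothetical.

```python
def pool_positive_prob(prevalence: float, pool_size: int,
                       sensitivity: float, specificity: float) -> float:
    """Probability that a pooled sample tests positive.

    Assumes independent individuals and a test whose accuracy does not
    depend on how many positives are in the pool (no dilution effect).
    """
    # Probability that every individual in the pool is disease-free
    all_negative = (1.0 - prevalence) ** pool_size
    # True positives from infected pools plus false positives from clean ones
    return sensitivity * (1.0 - all_negative) + (1.0 - specificity) * all_negative


def odds_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Odds ratio comparing prevalence in exposed vs. unexposed groups."""
    odds_exposed = p_exposed / (1.0 - p_exposed)
    odds_unexposed = p_unexposed / (1.0 - p_unexposed)
    return odds_exposed / odds_unexposed
```

With a pool of size one and a perfect test, the pool-positive probability reduces to the individual prevalence, which is a useful sanity check on the formula.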
With reference to a stratified case–control (CC) procedure based on a binary variable of primary interest, we derive the expression of the distortion induced by the sampling design on the parameters of the logistic model of a secondary variable. This is particularly relevant when performing mediation analysis (possibly in a causal framework) with stratified case–control (SCC) data in settings where both the outcome and the mediator are binary. Despite being designed for parametric identification, our strategy is general and can be used also in a nonparametric context. With reference to parametric estimation, we derive the maximum likelihood (ML) estimator and the M-estimator of the joint outcome–mediator parameter vector. We then conduct a simulation study focusing on the main causal mediation quantities (i.e., natural effects) and comparing M- and ML estimation to existing methods, based on weighting. As an illustrative example, we reanalyze a German CC data set in order to investigate whether the effect of reduced immunocompetency on listeriosis onset is mediated by the intake of gastric acid suppressors.
{"title":"Mediation analysis with case–control sampling: Identification and estimation in the presence of a binary mediator","authors":"Marco Doretti, Minna Genbäck, Elena Stanghellini","doi":"10.1002/bimj.202300089","DOIUrl":"https://doi.org/10.1002/bimj.202300089","PeriodicalId":55360,"journal":{"name":"Biometrical Journal"},"PeriodicalIF":1.7,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300089","platform":"Semanticscholar","paperid":"139435303","RegionNum":3,"RegionCategory":"Biology","PMCID":"OA"}