Matthew J. Smith, Rachael V. Phillips, Camille Maringe, Miguel Angel Luque Fernandez
arXiv - STAT - Methodology, 2024-09-17. doi: https://doi.org/arxiv-2409.11265
Performance of Cross-Validated Targeted Maximum Likelihood Estimation
Background: Advanced methods for causal inference, such as targeted maximum likelihood estimation (TMLE), require certain conditions for valid statistical inference. However, in situations where differentiability fails because of data sparsity or near-positivity violations, the Donsker class condition is violated. In such situations, the TMLE variance estimator can suffer from type I error inflation and poor coverage, producing anti-conservative confidence intervals. Cross-validation of the TMLE algorithm (CVTMLE) has been proposed to improve performance relative to TMLE in settings with positivity or Donsker class violations. We aim to investigate the performance of CVTMLE compared to TMLE in various settings.
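For context, the background refers to TMLE's targeting (fluctuation) step and its influence-curve-based variance, which is where Donsker-type conditions enter. Below is a minimal, self-contained sketch of TMLE for the average treatment effect (ATE) with a binary treatment and binary outcome; the single logistic regressions used as initial estimators are illustrative assumptions, not the estimators used in the paper.

```python
# Minimal TMLE sketch for the ATE with binary treatment A, binary outcome Y,
# and covariate matrix W. Illustrative only: the paper's initial estimators
# (super learner libraries) are richer than the single logistic models here.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import expit, logit
from sklearn.linear_model import LogisticRegression

def tmle_ate(W, A, Y, eps_clip=1e-6):
    n = W.shape[0]
    # Step 1: initial outcome regressions Qbar(a, W).
    WA = np.column_stack([W, A])
    q_model = LogisticRegression(max_iter=1000).fit(WA, Y)
    Q_A = q_model.predict_proba(WA)[:, 1]
    Q_1 = q_model.predict_proba(np.column_stack([W, np.ones(n)]))[:, 1]
    Q_0 = q_model.predict_proba(np.column_stack([W, np.zeros(n)]))[:, 1]
    # Step 2: propensity score g(W) = P(A = 1 | W), truncated for stability.
    g = LogisticRegression(max_iter=1000).fit(W, A).predict_proba(W)[:, 1]
    g = np.clip(g, 0.01, 0.99)
    # Step 3: "clever covariate" for the ATE.
    H_A = A / g - (1 - A) / (1 - g)
    # Step 4: fluctuation, a one-parameter logistic regression of Y on H_A
    # with offset logit(Qbar(A, W)), fitted by direct likelihood maximisation.
    off = logit(np.clip(Q_A, eps_clip, 1 - eps_clip))
    def negloglik(eps):
        p = np.clip(expit(off + eps * H_A), eps_clip, 1 - eps_clip)
        return -np.sum(Y * np.log(p) + (1 - Y) * np.log(1 - p))
    eps = minimize_scalar(negloglik, bounds=(-10, 10), method="bounded").x
    # Step 5: targeted predictions and plug-in ATE.
    Q1s = expit(logit(np.clip(Q_1, eps_clip, 1 - eps_clip)) + eps / g)
    Q0s = expit(logit(np.clip(Q_0, eps_clip, 1 - eps_clip)) - eps / (1 - g))
    psi = np.mean(Q1s - Q0s)
    # Step 6: influence-curve-based standard error.
    ic = H_A * (Y - expit(off + eps * H_A)) + Q1s - Q0s - psi
    return psi, np.sqrt(np.var(ic) / n)
```

Here psi is the plug-in ATE and the returned standard error comes from the empirical variance of the estimated efficient influence curve, the quantity whose validity the Donsker condition protects.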
Methods: We used the data-generating mechanism described in Leger et al. (2022) to run a Monte Carlo experiment under different Donsker class violations. We then evaluated the statistical performance of TMLE and CVTMLE with different super learner libraries, with and without regression tree methods.
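The abstract does not reproduce the data-generating mechanism of Leger et al. (2022), so the Monte Carlo skeleton below uses a hypothetical stand-in DGP whose confounding strength can be dialled up to push propensity scores toward 0 and 1, i.e. near-positivity violations. Only the overall experiment structure (simulate, estimate, record coverage) mirrors the methods; it reuses tmle_ate from the sketch above.

```python
# Monte Carlo skeleton in the spirit of the Methods section. The DGP below
# is NOT the one from Leger et al. (2022); it is a stand-in whose strong
# confounding induces near-positivity violations.
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(2024)

def simulate(n, confounding=3.0):
    """One dataset; larger `confounding` pushes g(W) toward 0/1."""
    W = rng.normal(size=(n, 2))
    g = expit(confounding * W[:, 0])          # near-positivity as it grows
    A = rng.binomial(1, g)
    p_y = expit(-0.5 + A + 0.8 * W[:, 0] - 0.4 * W[:, 1])
    Y = rng.binomial(1, p_y)
    return W, A, Y

def coverage(n_sims=500, n=300, truth=None):
    """Empirical 95% CI coverage of the TMLE sketch over repeated samples."""
    if truth is None:  # approximate the true ATE on one large sample
        W, _, _ = simulate(2_000_000)
        truth = np.mean(expit(0.5 + 0.8 * W[:, 0] - 0.4 * W[:, 1])
                        - expit(-0.5 + 0.8 * W[:, 0] - 0.4 * W[:, 1]))
    hits = 0
    for _ in range(n_sims):
        W, A, Y = simulate(n)
        psi, se = tmle_ate(W, A, Y)
        hits += abs(psi - truth) <= 1.96 * se
    return hits / n_sims
```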
Results: We found that CVTMLE vastly improves confidence interval coverage without adversely affecting bias, particularly in settings with small sample sizes and near-positivity violations. Furthermore, incorporating regression trees into standard TMLE with ensemble super learner-based initial estimates increases bias and variance, leading to invalid statistical inference.
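A "super learner library" is a set of candidate learners combined by a cross-validated ensemble. As a rough stand-in (the paper's exact candidate sets are not given in the abstract, and it uses the SuperLearner framework rather than scikit-learn), StackingClassifier can express the with/without-regression-trees comparison from the results:

```python
# Stand-in for the two super learner libraries compared in the Results: one
# without and one with a tree-based candidate. StackingClassifier is a rough
# analogue of the SuperLearner cross-validated ensemble, not the paper's tool.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

def make_library(include_trees: bool) -> StackingClassifier:
    candidates = [
        ("glm", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
    ]
    if include_trees:
        candidates.append(("rf", RandomForestClassifier(n_estimators=200)))
    # cv=10: candidates' out-of-fold predictions feed the meta-learner,
    # echoing the cross-validation at the heart of the super learner.
    return StackingClassifier(candidates, final_estimator=LogisticRegression(),
                              cv=10, stack_method="predict_proba")
```

Either library could replace the plain logistic regressions in the tmle_ate sketch above to reproduce, in spirit, the standard-TMLE-with-flexible-learners configuration that the results flag as problematic.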
Conclusions: Our simulations show that, with CVTMLE, the Donsker class condition is no longer necessary for valid statistical inference when using regression trees, under either data sparsity or near-positivity violations. CVTMLE is also much less sensitive to the choice of super learner library, and thereby provides better estimation and inference when the library includes flexible candidates that are prone to overfitting.
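The mechanism behind this conclusion is sample splitting: CVTMLE fits the initial (nuisance) estimators on training folds and evaluates them only on held-out folds, so flexible, overfitting-prone learners such as regression trees need not satisfy the Donsker condition. Below is a minimal sketch of that idea, simplified to a single pooled targeting step and with gradient-boosted trees standing in for a full super learner library; it is not the paper's implementation.

```python
# Sketch of cross-validated TMLE (CVTMLE): nuisance models are fit on
# training folds and evaluated only on held-out folds. Simplified relative
# to the paper: one pooled fluctuation over all folds.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import expit, logit
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import KFold

def cvtmle_ate(W, A, Y, n_folds=10, eps_clip=1e-6):
    n = len(Y)
    Q_A, Q_1, Q_0, g = (np.empty(n) for _ in range(4))
    for train, test in KFold(n_folds, shuffle=True, random_state=0).split(W):
        # Out-of-fold initial estimates: boosted trees stand in for a full
        # super learner library here.
        qm = GradientBoostingClassifier().fit(
            np.column_stack([W[train], A[train]]), Y[train])
        Q_A[test] = qm.predict_proba(np.column_stack([W[test], A[test]]))[:, 1]
        Q_1[test] = qm.predict_proba(
            np.column_stack([W[test], np.ones(len(test))]))[:, 1]
        Q_0[test] = qm.predict_proba(
            np.column_stack([W[test], np.zeros(len(test))]))[:, 1]
        gm = GradientBoostingClassifier().fit(W[train], A[train])
        g[test] = gm.predict_proba(W[test])[:, 1]
    g = np.clip(g, 0.01, 0.99)
    # Targeting step and influence-curve variance, as in the TMLE sketch,
    # but applied to the cross-validated initial estimates.
    H_A = A / g - (1 - A) / (1 - g)
    off = logit(np.clip(Q_A, eps_clip, 1 - eps_clip))
    def negloglik(eps):
        p = np.clip(expit(off + eps * H_A), eps_clip, 1 - eps_clip)
        return -np.sum(Y * np.log(p) + (1 - Y) * np.log(1 - p))
    eps = minimize_scalar(negloglik, bounds=(-10, 10), method="bounded").x
    Q1s = expit(logit(np.clip(Q_1, eps_clip, 1 - eps_clip)) + eps / g)
    Q0s = expit(logit(np.clip(Q_0, eps_clip, 1 - eps_clip)) - eps / (1 - g))
    psi = np.mean(Q1s - Q0s)
    ic = H_A * (Y - expit(off + eps * H_A)) + Q1s - Q0s - psi
    return psi, np.sqrt(np.var(ic) / n)
```

Swapping cvtmle_ate for tmle_ate in the coverage skeleton above illustrates the comparison the paper runs: the same data, the same flexible learners, with and without cross-validated initial estimates.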