A response-adaptive multi-arm design for continuous endpoints based on a weighted information measure
Gianmarco Caruso, Pavel Mozgunov. arXiv - STAT - Methodology, 2024-09-08. https://doi.org/arxiv-2409.04970

Multi-arm trials are gaining interest in practice given the statistical and logistical advantages they can offer. The standard approach is to use an allocation ratio that is fixed throughout the trial, but there are calls to make it adaptive and to skew the allocation of patients towards better-performing arms. However, among other challenges, it is well known that such approaches can suffer from lower statistical power. We present a response-adaptive design for continuous endpoints that allows explicit control of the trade-off between the number of patients allocated to the 'optimal' arm and the statistical power. This balance is achieved through the calibration of a tuning parameter, and we explore various strategies for selecting it effectively. The proposed criterion is based on a context-dependent information measure that gives greater weight to treatment arms whose characteristics are close to a pre-specified clinical target. We also introduce a simulation-based hypothesis testing procedure focused on selecting the target arm, and discuss strategies for effectively controlling the type-I error rate. The potential advantage of the proposed criterion over currently used alternatives is evaluated in simulations, and its practical implementation is illustrated in the context of early Phase IIa proof-of-concept oncology clinical trials.
{"title":"A response-adaptive multi-arm design for continuous endpoints based on a weighted information measure","authors":"Gianmarco Caruso, Pavel Mozgunov","doi":"arxiv-2409.04970","DOIUrl":"https://doi.org/arxiv-2409.04970","url":null,"abstract":"Multi-arm trials are gaining interest in practice given the statistical and\u0000logistical advantages that they can offer. The standard approach is to use a\u0000fixed (throughout the trial) allocation ratio, but there is a call for making\u0000it adaptive and skewing the allocation of patients towards better performing\u0000arms. However, among other challenges, it is well-known that these approaches\u0000might suffer from lower statistical power. We present a response-adaptive\u0000design for continuous endpoints which explicitly allows to control the\u0000trade-off between the number of patients allocated to the 'optimal' arm and the\u0000statistical power. Such a balance is achieved through the calibration of a\u0000tuning parameter, and we explore various strategies to effectively select it.\u0000The proposed criterion is based on a context-dependent information measure\u0000which gives a greater weight to those treatment arms which have characteristics\u0000close to a pre-specified clinical target. We also introduce a simulation-based\u0000hypothesis testing procedure which focuses on selecting the target arm,\u0000discussing strategies to effectively control the type-I error rate. The\u0000potential advantage of the proposed criterion over currently used alternatives\u0000is evaluated in simulations, and its practical implementation is illustrated in\u0000the context of early Phase IIa proof-of-concept oncology clinical trials.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Really Doing Great at Model Evaluation for CATE Estimation? A Critical Consideration of Current Model Evaluation Practices in Treatment Effect Estimation
Hugo Gobato Souto, Francisco Louzada Neto. arXiv - STAT - Methodology, 2024-09-08. https://doi.org/arxiv-2409.05161

This paper critically examines current methodologies for evaluating models for Conditional and Average Treatment Effect (CATE/ATE) estimation, identifying several key pitfalls in existing practices. Current practice over-relies on specific metrics and on empirical means, and its lack of statistical testing calls for a more rigorous evaluation approach. We propose an automated algorithm for selecting appropriate statistical tests, addressing the trade-offs and assumptions inherent in these tests. Additionally, we emphasize the importance of reporting empirical standard deviations alongside performance metrics, and advocate using the Squared Error for Coverage (SEC) and Absolute Error for Coverage (AEC) metrics, together with empirical histograms of the coverage results, as supplementary metrics. These enhancements provide a more comprehensive understanding of model performance across heterogeneous data-generating processes (DGPs). The practical implications are demonstrated through two examples, showcasing the benefits of these methodological improvements, which can significantly improve the robustness and accuracy of future research on statistical models for CATE and ATE estimation.
{"title":"Really Doing Great at Model Evaluation for CATE Estimation? A Critical Consideration of Current Model Evaluation Practices in Treatment Effect Estimation","authors":"Hugo Gobato Souto, Francisco Louzada Neto","doi":"arxiv-2409.05161","DOIUrl":"https://doi.org/arxiv-2409.05161","url":null,"abstract":"This paper critically examines current methodologies for evaluating models in\u0000Conditional and Average Treatment Effect (CATE/ATE) estimation, identifying\u0000several key pitfalls in existing practices. The current approach of\u0000over-reliance on specific metrics and empirical means and lack of statistical\u0000tests necessitates a more rigorous evaluation approach. We propose an automated\u0000algorithm for selecting appropriate statistical tests, addressing the\u0000trade-offs and assumptions inherent in these tests. Additionally, we emphasize\u0000the importance of reporting empirical standard deviations alongside performance\u0000metrics and advocate for using Squared Error for Coverage (SEC) and Absolute\u0000Error for Coverage (AEC) metrics and empirical histograms of the coverage\u0000results as supplementary metrics. These enhancements provide a more\u0000comprehensive understanding of model performance in heterogeneous\u0000data-generating processes (DGPs). The practical implications are demonstrated\u0000through two examples, showcasing the benefits of these methodological\u0000improvements, which can significantly improve the robustness and accuracy of\u0000future research in statistical models for CATE and ATE estimation.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"130 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Forecasting Age Distribution of Deaths: Cumulative Distribution Function Transformation
Han Lin Shang, Steven Haberman. arXiv - STAT - Methodology, 2024-09-08. https://doi.org/arxiv-2409.04981

Like density functions, period life-table death counts are nonnegative and have a constrained integral, and thus live in a constrained nonlinear space. Implementing established modelling and forecasting methods without obeying these constraints can be problematic for such nonlinear data. We introduce a cumulative distribution function transformation to forecast life-table death counts. Using Japanese life-table death counts obtained from the Japanese Mortality Database (2024), we evaluate the point and interval forecast accuracy of the proposed approach, which compares favourably to an existing compositional data analytic approach. The improved forecast accuracy of life-table death counts is of great interest to demographers, for estimating age-specific survival probabilities and life expectancy, and to actuaries, for determining temporary annuity prices for different ages and maturities.
{"title":"Forecasting Age Distribution of Deaths: Cumulative Distribution Function Transformation","authors":"Han Lin Shang, Steven Haberman","doi":"arxiv-2409.04981","DOIUrl":"https://doi.org/arxiv-2409.04981","url":null,"abstract":"Like density functions, period life-table death counts are nonnegative and\u0000have a constrained integral, and thus live in a constrained nonlinear space.\u0000Implementing established modelling and forecasting methods without obeying\u0000these constraints can be problematic for such nonlinear data. We introduce\u0000cumulative distribution function transformation to forecast the life-table\u0000death counts. Using the Japanese life-table death counts obtained from the\u0000Japanese Mortality Database (2024), we evaluate the point and interval forecast\u0000accuracies of the proposed approach, which compares favourably to an existing\u0000compositional data analytic approach. The improved forecast accuracy of\u0000life-table death counts is of great interest to demographers for estimating\u0000age-specific survival probabilities and life expectancy and actuaries for\u0000determining temporary annuity prices for different ages and maturities.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"192 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Projective Techniques in Consumer Research: A Mixed Methods-Focused Review and Empirical Reanalysis
Stephen L. France. arXiv - STAT - Methodology, 2024-09-08. https://doi.org/arxiv-2409.04995

This article gives an integrative review of research using projective methods in the consumer research domain. We give a general historical overview of the use of projective methods, both in psychology and in consumer research applications, and discuss reliability, validity, and measurement for projective techniques. We review the literature on projective techniques in marketing, hospitality & tourism, and consumer & food science, with a mixed methods research focus on the interplay of qualitative and quantitative techniques. We review several quantitative techniques for structuring and analyzing projective data, and run an empirical reanalysis of previously gathered data. We give recommendations for improved rigor and for potential future work involving mixed methods in projective techniques.
{"title":"Projective Techniques in Consumer Research: A Mixed Methods-Focused Review and Empirical Reanalysis","authors":"Stephen L. France","doi":"arxiv-2409.04995","DOIUrl":"https://doi.org/arxiv-2409.04995","url":null,"abstract":"This article gives an integrative review of research using projective methods\u0000in the consumer research domain. We give a general historical overview of the\u0000use of projective methods, both in psychology and in consumer research\u0000applications, and discuss the reliability and validity aspects and measurement\u0000for projective techniques. We review the literature on projective techniques in\u0000the areas of marketing, hospitality & tourism, and consumer & food science,\u0000with a mixed methods research focus on the interplay of qualitative and\u0000quantitative techniques. We review the use of several quantitative techniques\u0000used for structuring and analyzing projective data and run an empirical\u0000reanalysis of previously gathered data. We give recommendations for improved\u0000rigor and for potential future work involving mixed methods in projective\u0000techniques.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"40 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marginal Structural Modeling of Representative Treatment Trajectories
Jiewen Liu, Todd A. Miano, Stephen Griffiths, Michael G. S. Shashaty, Wei Yang. arXiv - STAT - Methodology, 2024-09-07. https://doi.org/arxiv-2409.04933
Marginal structural models (MSMs) are widely used in observational studies to estimate the causal effect of time-varying treatments. Despite their popularity, limited attention has been paid to how the treatment history is summarized in the outcome model, which proves particularly challenging when individuals' treatment trajectories exhibit complex patterns over time. Commonly used summaries such as the average treatment level fail to adequately capture the treatment history, hindering causal interpretation. For scenarios where treatment histories exhibit distinct temporal patterns, we develop a new approach to parameterizing the outcome model. We apply latent growth curve analysis to identify representative treatment trajectories from the observed data and use the posterior probability of latent class membership to summarize the different treatment trajectories. We demonstrate its use in parameterizing MSMs, which facilitates interpretation of the results. We apply the method to data from an existing cohort of lung transplant recipients to estimate the effect of tacrolimus concentrations on the risk of incident chronic kidney disease.
{"title":"Marginal Structural Modeling of Representative Treatment Trajectories","authors":"Jiewen Liu, Todd A. Miano, Stephen Griffiths, Michael G. S. Shashaty, Wei Yang","doi":"arxiv-2409.04933","DOIUrl":"https://doi.org/arxiv-2409.04933","url":null,"abstract":"Marginal structural models (MSMs) are widely used in observational studies to\u0000estimate the causal effect of time-varying treatments. Despite its popularity,\u0000limited attention has been paid to summarizing the treatment history in the\u0000outcome model, which proves particularly challenging when individuals'\u0000treatment trajectories exhibit complex patterns over time. Commonly used\u0000metrics such as the average treatment level fail to adequately capture the\u0000treatment history, hindering causal interpretation. For scenarios where\u0000treatment histories exhibit distinct temporal patterns, we develop a new\u0000approach to parameterize the outcome model. We apply latent growth curve\u0000analysis to identify representative treatment trajectories from the observed\u0000data and use the posterior probability of latent class membership to summarize\u0000the different treatment trajectories. We demonstrate its use in parameterizing\u0000the MSMs, which facilitates the interpretations of the results. We apply the\u0000method to analyze data from an existing cohort of lung transplant recipients to\u0000estimate the effect of Tacrolimus concentrations on the risk of incident\u0000chronic kidney disease.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142225129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
forester: A Tree-Based AutoML Tool in R
Hubert Ruczyński, Anna Kozak. arXiv - STAT - Methodology, 2024-09-07. https://doi.org/arxiv-2409.04789

The majority of automated machine learning (AutoML) solutions are developed in Python, yet a large share of data scientists work in the R language. Unfortunately, the available R solutions are limited. Moreover, their high entry barrier, owing to the machine learning (ML) knowledge they require, puts them out of reach for many users. To fill this gap, we present the forester package, which offers ease of use regardless of the user's proficiency in machine learning. forester is an open-source AutoML package implemented in R, designed for training high-quality tree-based models on tabular data. It fully supports binary and multiclass classification and regression tasks, and partially supports survival analysis. With just a few functions, the user can detect data-quality issues, prepare the preprocessing pipeline, train and tune tree-based models, evaluate the results, and create a report for further analysis.
{"title":"forester: A Tree-Based AutoML Tool in R","authors":"Hubert Ruczyński, Anna Kozak","doi":"arxiv-2409.04789","DOIUrl":"https://doi.org/arxiv-2409.04789","url":null,"abstract":"The majority of automated machine learning (AutoML) solutions are developed\u0000in Python, however a large percentage of data scientists are associated with\u0000the R language. Unfortunately, there are limited R solutions available.\u0000Moreover high entry level means they are not accessible to everyone, due to\u0000required knowledge about machine learning (ML). To fill this gap, we present\u0000the forester package, which offers ease of use regardless of the user's\u0000proficiency in the area of machine learning. The forester is an open-source AutoML package implemented in R designed for\u0000training high-quality tree-based models on tabular data. It fully supports\u0000binary and multiclass classification, regression, and partially survival\u0000analysis tasks. With just a few functions, the user is capable of detecting\u0000issues regarding the data quality, preparing the preprocessing pipeline,\u0000training and tuning tree-based models, evaluating the results, and creating the\u0000report for further analysis.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"192 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spatial Interference Detection in Treatment Effect Model
Wei Zhang, Fang Yao, Ying Yang. arXiv - STAT - Methodology, 2024-09-07. https://doi.org/arxiv-2409.04836

Modeling the interference effect is an important issue in the field of causal inference. Existing studies rely on explicit, and often homogeneous, assumptions regarding interference structures. In this paper, we introduce a low-rank and sparse treatment effect model that leverages data-driven techniques to identify the locations of interference effects. A profiling algorithm is proposed to estimate the model coefficients, and based on these estimates, global test and local detection methods are established to detect the existence of interference and, for each unit, the locations of its interference neighbors. We derive a non-asymptotic bound on the estimation error and establish theoretical guarantees for the global test and for the accuracy of the detection method in terms of the Jaccard index. Simulations and real data examples demonstrate the usefulness of the proposed method.
{"title":"Spatial Interference Detection in Treatment Effect Model","authors":"Wei Zhang, Fang Yao, Ying Yang","doi":"arxiv-2409.04836","DOIUrl":"https://doi.org/arxiv-2409.04836","url":null,"abstract":"Modeling the interference effect is an important issue in the field of causal\u0000inference. Existing studies rely on explicit and often homogeneous assumptions\u0000regarding interference structures. In this paper, we introduce a low-rank and\u0000sparse treatment effect model that leverages data-driven techniques to identify\u0000the locations of interference effects. A profiling algorithm is proposed to\u0000estimate the model coefficients, and based on these estimates, global test and\u0000local detection methods are established to detect the existence of interference\u0000and the interference neighbor locations for each unit. We derive the\u0000non-asymptotic bound of the estimation error, and establish theoretical\u0000guarantees for the global test and the accuracy of the detection method in\u0000terms of Jaccard index. Simulations and real data examples are provided to\u0000demonstrate the usefulness of the proposed method.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Establishing the Parallels and Differences Between Right-Censored and Missing Covariates
Jesus E. Vazquez, Marissa C. Ashner, Yanyuan Ma, Karen Marder, Tanya P. Garcia. arXiv - STAT - Methodology, 2024-09-07. https://doi.org/arxiv-2409.04684
While right-censored time-to-event outcomes have been studied for decades, handling time-to-event covariates, also known as right-censored covariates, is now of growing interest. So far, the literature has treated right-censored covariates as distinct from missing covariates, overlooking the potential applicability of estimators to both scenarios. We bridge this gap by establishing connections between right-censored and missing covariates under various assumptions about censoring and missingness, allowing us to identify parallels and differences that determine when estimators can be used in both contexts. These connections reveal how five estimators for right-censored covariates can be adapted to the unexplored setting of informative covariate right-censoring, and they motivate a new estimator for this setting, in which the event time depends on the censoring time. We establish the asymptotic properties of the six estimators, evaluate their robustness under incorrect distributional assumptions, and establish their comparative efficiency. We conducted a simulation study to confirm our theoretical results, and then applied all estimators to a Huntington disease observational study to analyze cognitive impairments as a function of time to clinical diagnosis.
{"title":"Establishing the Parallels and Differences Between Right-Censored and Missing Covariates","authors":"Jesus E. Vazquez, Marissa C. Ashner, Yanyuan Ma, Karen Marder, Tanya P. Garcia","doi":"arxiv-2409.04684","DOIUrl":"https://doi.org/arxiv-2409.04684","url":null,"abstract":"While right-censored time-to-event outcomes have been studied for decades,\u0000handling time-to-event covariates, also known as right-censored covariates, is\u0000now of growing interest. So far, the literature has treated right-censored\u0000covariates as distinct from missing covariates, overlooking the potential\u0000applicability of estimators to both scenarios. We bridge this gap by\u0000establishing connections between right-censored and missing covariates under\u0000various assumptions about censoring and missingness, allowing us to identify\u0000parallels and differences to determine when estimators can be used in both\u0000contexts. These connections reveal adaptations to five estimators for\u0000right-censored covariates in the unexplored area of informative covariate\u0000right-censoring and to formulate a new estimator for this setting, where the\u0000event time depends on the censoring time. We establish the asymptotic\u0000properties of the six estimators, evaluate their robustness under incorrect\u0000distributional assumptions, and establish their comparative efficiency. We\u0000conducted a simulation study to confirm our theoretical results, and then\u0000applied all estimators to a Huntington disease observational study to analyze\u0000cognitive impairments as a function of time to clinical diagnosis.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Unified Framework for Cluster Methods with Tensor Networks
Erdong Guo, David Draper. arXiv - STAT - Methodology, 2024-09-07. https://doi.org/arxiv-2409.04729

Markov Chain Monte Carlo (MCMC) and Tensor Networks (TN) are two powerful frameworks for numerically investigating many-body systems, each offering distinct advantages. MCMC, with its flexibility and theoretical consistency, is well suited to simulating arbitrary systems by sampling. TN, on the other hand, provides a powerful tensor-based language for capturing the entanglement properties intrinsic to many-body systems, offering a universal representation of these systems. In this work, we leverage the computational strengths of TN to design a versatile cluster MCMC sampler. Specifically, we propose a general framework for constructing tensor-based cluster MCMC methods, enabling arbitrary cluster updates by using TNs to compute the distributions required by the MCMC sampler. Our framework unifies several existing cluster algorithms as special cases and allows for natural extensions. We demonstrate the method by applying it to simulations of the two-dimensional Edwards-Anderson model and the three-dimensional Ising model. This work is dedicated to the memory of Prof. David Draper.
{"title":"A Unified Framework for Cluster Methods with Tensor Networks","authors":"Erdong Guo, David Draper","doi":"arxiv-2409.04729","DOIUrl":"https://doi.org/arxiv-2409.04729","url":null,"abstract":"Markov Chain Monte Carlo (MCMC), and Tensor Networks (TN) are two powerful\u0000frameworks for numerically investigating many-body systems, each offering\u0000distinct advantages. MCMC, with its flexibility and theoretical consistency, is\u0000well-suited for simulating arbitrary systems by sampling. TN, on the other\u0000hand, provides a powerful tensor-based language for capturing the entanglement\u0000properties intrinsic to many-body systems, offering a universal representation\u0000of these systems. In this work, we leverage the computational strengths of TN\u0000to design a versatile cluster MCMC sampler. Specifically, we propose a general\u0000framework for constructing tensor-based cluster MCMC methods, enabling\u0000arbitrary cluster updates by utilizing TNs to compute the distributions\u0000required in the MCMC sampler. Our framework unifies several existing cluster\u0000algorithms as special cases and allows for natural extensions. We demonstrate\u0000our method by applying it to the simulation of the two-dimensional\u0000Edwards-Anderson Model and the three-dimensional Ising Model. This work is\u0000dedicated to the memory of Prof. David Draper.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"67 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benchmarking Estimators for Natural Experiments: A Novel Dataset and a Doubly Robust Algorithm
R. Teal Witter, Christopher Musco. arXiv - STAT - Methodology, 2024-09-06. https://doi.org/arxiv-2409.04500

Estimating the effect of treatments in natural experiments, where treatments are pre-assigned, is an important and well-studied problem. We introduce a novel natural-experiment dataset obtained from an early childhood literacy nonprofit. Surprisingly, applying over 20 established estimators to the dataset produces inconsistent results when evaluating the nonprofit's efficacy. To address this, we create a benchmark for evaluating estimator accuracy using synthetic outcomes whose design was guided by domain experts. The benchmark extensively explores performance as real-world conditions such as sample size, treatment correlation, and propensity score accuracy vary. Based on our benchmark, we observe that the class of doubly robust treatment effect estimators, which are based on simple and intuitive regression adjustment, generally outperforms other, more complicated estimators by orders of magnitude. To better support our theoretical understanding of doubly robust estimators, we derive a closed-form expression for the variance of any such estimator that uses dataset splitting to obtain an unbiased estimate. This expression motivates the design of a new doubly robust estimator that uses a novel loss function when fitting the regression-adjustment functions. We release the dataset and benchmark in a Python package; the package is built in a modular way to facilitate new datasets and estimators.
{"title":"Benchmarking Estimators for Natural Experiments: A Novel Dataset and a Doubly Robust Algorithm","authors":"R. Teal Witter, Christopher Musco","doi":"arxiv-2409.04500","DOIUrl":"https://doi.org/arxiv-2409.04500","url":null,"abstract":"Estimating the effect of treatments from natural experiments, where\u0000treatments are pre-assigned, is an important and well-studied problem. We\u0000introduce a novel natural experiment dataset obtained from an early childhood\u0000literacy nonprofit. Surprisingly, applying over 20 established estimators to\u0000the dataset produces inconsistent results in evaluating the nonprofit's\u0000efficacy. To address this, we create a benchmark to evaluate estimator accuracy\u0000using synthetic outcomes, whose design was guided by domain experts. The\u0000benchmark extensively explores performance as real world conditions like sample\u0000size, treatment correlation, and propensity score accuracy vary. Based on our\u0000benchmark, we observe that the class of doubly robust treatment effect\u0000estimators, which are based on simple and intuitive regression adjustment,\u0000generally outperform other more complicated estimators by orders of magnitude.\u0000To better support our theoretical understanding of doubly robust estimators, we\u0000derive a closed form expression for the variance of any such estimator that\u0000uses dataset splitting to obtain an unbiased estimate. This expression\u0000motivates the design of a new doubly robust estimator that uses a novel loss\u0000function when fitting functions for regression adjustment. We release the\u0000dataset and benchmark in a Python package; the package is built in a modular\u0000way to facilitate new datasets and estimators.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}