arXiv - STAT - Methodology: Latest Papers

A response-adaptive multi-arm design for continuous endpoints based on a weighted information measure
Pub Date : 2024-09-08 DOI: arxiv-2409.04970
Gianmarco Caruso, Pavel Mozgunov
Multi-arm trials are gaining interest in practice given the statistical and logistical advantages that they can offer. The standard approach is to use a fixed (throughout the trial) allocation ratio, but there is a call for making it adaptive and skewing the allocation of patients towards better performing arms. However, among other challenges, it is well-known that these approaches might suffer from lower statistical power. We present a response-adaptive design for continuous endpoints which explicitly allows to control the trade-off between the number of patients allocated to the 'optimal' arm and the statistical power. Such a balance is achieved through the calibration of a tuning parameter, and we explore various strategies to effectively select it. The proposed criterion is based on a context-dependent information measure which gives a greater weight to those treatment arms which have characteristics close to a pre-specified clinical target. We also introduce a simulation-based hypothesis testing procedure which focuses on selecting the target arm, discussing strategies to effectively control the type-I error rate. The potential advantage of the proposed criterion over currently used alternatives is evaluated in simulations, and its practical implementation is illustrated in the context of early Phase IIa proof-of-concept oncology clinical trials.
Citations: 0
Really Doing Great at Model Evaluation for CATE Estimation? A Critical Consideration of Current Model Evaluation Practices in Treatment Effect Estimation
Pub Date : 2024-09-08 DOI: arxiv-2409.05161
Hugo Gobato Souto, Francisco Louzada Neto
This paper critically examines current methodologies for evaluating models in Conditional and Average Treatment Effect (CATE/ATE) estimation, identifying several key pitfalls in existing practices. The current approach of over-reliance on specific metrics and empirical means and lack of statistical tests necessitates a more rigorous evaluation approach. We propose an automated algorithm for selecting appropriate statistical tests, addressing the trade-offs and assumptions inherent in these tests. Additionally, we emphasize the importance of reporting empirical standard deviations alongside performance metrics and advocate for using Squared Error for Coverage (SEC) and Absolute Error for Coverage (AEC) metrics and empirical histograms of the coverage results as supplementary metrics. These enhancements provide a more comprehensive understanding of model performance in heterogeneous data-generating processes (DGPs). The practical implications are demonstrated through two examples, showcasing the benefits of these methodological improvements, which can significantly improve the robustness and accuracy of future research in statistical models for CATE and ATE estimation.
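The abstract names the SEC and AEC metrics without defining them. A minimal sketch, assuming they are the squared and absolute gaps between the empirical coverage of a set of interval estimates and the nominal level; the function name `coverage_errors` and the 0.95 default are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def coverage_errors(lower, upper, truth, nominal=0.95):
    """Return (coverage, SEC, AEC) under the ASSUMED definitions
    SEC = (coverage - nominal)^2 and AEC = |coverage - nominal|."""
    lower, upper, truth = (np.asarray(a, dtype=float) for a in (lower, upper, truth))
    # An interval "covers" when the true effect lies inside it (inclusive).
    covered = (lower <= truth) & (truth <= upper)
    coverage = covered.mean()
    gap = coverage - nominal
    return float(coverage), float(gap ** 2), float(abs(gap))
```

Under these assumed definitions, both metrics are zero exactly when the intervals cover the true effects at the nominal rate, and they penalize over-coverage and under-coverage symmetrically.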
Citations: 0
Forecasting Age Distribution of Deaths: Cumulative Distribution Function Transformation
Pub Date : 2024-09-08 DOI: arxiv-2409.04981
Han Lin Shang, Steven Haberman
Like density functions, period life-table death counts are nonnegative and have a constrained integral, and thus live in a constrained nonlinear space. Implementing established modelling and forecasting methods without obeying these constraints can be problematic for such nonlinear data. We introduce cumulative distribution function transformation to forecast the life-table death counts. Using the Japanese life-table death counts obtained from the Japanese Mortality Database (2024), we evaluate the point and interval forecast accuracies of the proposed approach, which compares favourably to an existing compositional data analytic approach. The improved forecast accuracy of life-table death counts is of great interest to demographers for estimating age-specific survival probabilities and life expectancy and actuaries for determining temporary annuity prices for different ages and maturities.
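A minimal sketch of the idea, assuming the transformation takes the cumulative sum of the normalized death counts, maps the resulting CDF to the real line with a logit, and inverts those steps after forecasting. The function names are hypothetical, the forecasting step is omitted, and the counts are assumed strictly positive so that the logit is finite:

```python
import numpy as np

def dx_to_logit_cdf(dx):
    """Map death counts by age to an unconstrained logit-CDF vector."""
    dx = np.asarray(dx, dtype=float)
    cdf = np.cumsum(dx / dx.sum())
    p = cdf[:-1]  # the last CDF value is 1 by construction, so drop it
    return np.log(p / (1.0 - p))

def logit_cdf_to_dx(z, radix=1.0):
    """Invert the transform: logistic, re-append the final 1, then difference."""
    cdf = np.append(1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float))), 1.0)
    # The differences always sum to `radix`; they are nonnegative only if z
    # is nondecreasing, which this sketch does not enforce.
    return radix * np.diff(cdf, prepend=0.0)
```

Any forecast made on the logit scale maps back to counts that sum to the radix, which is the sum constraint the abstract refers to.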
Citations: 0
Projective Techniques in Consumer Research: A Mixed Methods-Focused Review and Empirical Reanalysis
Pub Date : 2024-09-08 DOI: arxiv-2409.04995
Stephen L. France
This article gives an integrative review of research using projective methods in the consumer research domain. We give a general historical overview of the use of projective methods, both in psychology and in consumer research applications, and discuss the reliability and validity aspects and measurement for projective techniques. We review the literature on projective techniques in the areas of marketing, hospitality & tourism, and consumer & food science, with a mixed methods research focus on the interplay of qualitative and quantitative techniques. We review the use of several quantitative techniques used for structuring and analyzing projective data and run an empirical reanalysis of previously gathered data. We give recommendations for improved rigor and for potential future work involving mixed methods in projective techniques.
Citations: 0
Marginal Structural Modeling of Representative Treatment Trajectories
Pub Date : 2024-09-07 DOI: arxiv-2409.04933
Jiewen Liu, Todd A. Miano, Stephen Griffiths, Michael G. S. Shashaty, Wei Yang
Marginal structural models (MSMs) are widely used in observational studies to estimate the causal effect of time-varying treatments. Despite its popularity, limited attention has been paid to summarizing the treatment history in the outcome model, which proves particularly challenging when individuals' treatment trajectories exhibit complex patterns over time. Commonly used metrics such as the average treatment level fail to adequately capture the treatment history, hindering causal interpretation. For scenarios where treatment histories exhibit distinct temporal patterns, we develop a new approach to parameterize the outcome model. We apply latent growth curve analysis to identify representative treatment trajectories from the observed data and use the posterior probability of latent class membership to summarize the different treatment trajectories. We demonstrate its use in parameterizing the MSMs, which facilitates the interpretations of the results. We apply the method to analyze data from an existing cohort of lung transplant recipients to estimate the effect of Tacrolimus concentrations on the risk of incident chronic kidney disease.
Citations: 0
forester: A Tree-Based AutoML Tool in R
Pub Date : 2024-09-07 DOI: arxiv-2409.04789
Hubert Ruczyński, Anna Kozak
The majority of automated machine learning (AutoML) solutions are developed in Python, however a large percentage of data scientists are associated with the R language. Unfortunately, there are limited R solutions available. Moreover high entry level means they are not accessible to everyone, due to required knowledge about machine learning (ML). To fill this gap, we present the forester package, which offers ease of use regardless of the user's proficiency in the area of machine learning. The forester is an open-source AutoML package implemented in R designed for training high-quality tree-based models on tabular data. It fully supports binary and multiclass classification, regression, and partially survival analysis tasks. With just a few functions, the user is capable of detecting issues regarding the data quality, preparing the preprocessing pipeline, training and tuning tree-based models, evaluating the results, and creating the report for further analysis.
Citations: 0
Spatial Interference Detection in Treatment Effect Model
Pub Date : 2024-09-07 DOI: arxiv-2409.04836
Wei Zhang, Fang Yao, Ying Yang
Modeling the interference effect is an important issue in the field of causal inference. Existing studies rely on explicit and often homogeneous assumptions regarding interference structures. In this paper, we introduce a low-rank and sparse treatment effect model that leverages data-driven techniques to identify the locations of interference effects. A profiling algorithm is proposed to estimate the model coefficients, and based on these estimates, global test and local detection methods are established to detect the existence of interference and the interference neighbor locations for each unit. We derive the non-asymptotic bound of the estimation error, and establish theoretical guarantees for the global test and the accuracy of the detection method in terms of Jaccard index. Simulations and real data examples are provided to demonstrate the usefulness of the proposed method.
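The Jaccard index used to score detection accuracy is the size of the intersection of two sets divided by the size of their union. A minimal sketch for comparing a detected set of interference neighbors with the true set; the function name and the empty-set convention are illustrative assumptions, not taken from the paper:

```python
def jaccard_index(detected, truth):
    """Jaccard index |A ∩ B| / |A ∪ B| between two collections of locations."""
    detected, truth = set(detected), set(truth)
    union = detected | truth
    if not union:
        # Assumed convention: two empty sets agree perfectly.
        return 1.0
    return len(detected & truth) / len(union)
```

A value of 1 means the detected neighbors match the true neighbors exactly, and 0 means the two sets share no location.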
Citations: 0
Establishing the Parallels and Differences Between Right-Censored and Missing Covariates
Pub Date : 2024-09-07 DOI: arxiv-2409.04684
Jesus E. Vazquez, Marissa C. Ashner, Yanyuan Ma, Karen Marder, Tanya P. Garcia
While right-censored time-to-event outcomes have been studied for decades, handling time-to-event covariates, also known as right-censored covariates, is now of growing interest. So far, the literature has treated right-censored covariates as distinct from missing covariates, overlooking the potential applicability of estimators to both scenarios. We bridge this gap by establishing connections between right-censored and missing covariates under various assumptions about censoring and missingness, allowing us to identify parallels and differences to determine when estimators can be used in both contexts. These connections reveal adaptations to five estimators for right-censored covariates in the unexplored area of informative covariate right-censoring and to formulate a new estimator for this setting, where the event time depends on the censoring time. We establish the asymptotic properties of the six estimators, evaluate their robustness under incorrect distributional assumptions, and establish their comparative efficiency. We conducted a simulation study to confirm our theoretical results, and then applied all estimators to a Huntington disease observational study to analyze cognitive impairments as a function of time to clinical diagnosis.
Citations: 0
A Unified Framework for Cluster Methods with Tensor Networks
Pub Date : 2024-09-07 DOI: arxiv-2409.04729
Erdong Guo, David Draper
Markov Chain Monte Carlo (MCMC), and Tensor Networks (TN) are two powerful frameworks for numerically investigating many-body systems, each offering distinct advantages. MCMC, with its flexibility and theoretical consistency, is well-suited for simulating arbitrary systems by sampling. TN, on the other hand, provides a powerful tensor-based language for capturing the entanglement properties intrinsic to many-body systems, offering a universal representation of these systems. In this work, we leverage the computational strengths of TN to design a versatile cluster MCMC sampler. Specifically, we propose a general framework for constructing tensor-based cluster MCMC methods, enabling arbitrary cluster updates by utilizing TNs to compute the distributions required in the MCMC sampler. Our framework unifies several existing cluster algorithms as special cases and allows for natural extensions. We demonstrate our method by applying it to the simulation of the two-dimensional Edwards-Anderson Model and the three-dimensional Ising Model. This work is dedicated to the memory of Prof. David Draper.
Citations: 0
Benchmarking Estimators for Natural Experiments: A Novel Dataset and a Doubly Robust Algorithm
Pub Date : 2024-09-06 DOI: arxiv-2409.04500
R. Teal Witter, Christopher Musco
Estimating the effect of treatments from natural experiments, where treatments are pre-assigned, is an important and well-studied problem. We introduce a novel natural experiment dataset obtained from an early childhood literacy nonprofit. Surprisingly, applying over 20 established estimators to the dataset produces inconsistent results in evaluating the nonprofit's efficacy. To address this, we create a benchmark to evaluate estimator accuracy using synthetic outcomes, whose design was guided by domain experts. The benchmark extensively explores performance as real world conditions like sample size, treatment correlation, and propensity score accuracy vary. Based on our benchmark, we observe that the class of doubly robust treatment effect estimators, which are based on simple and intuitive regression adjustment, generally outperform other more complicated estimators by orders of magnitude. To better support our theoretical understanding of doubly robust estimators, we derive a closed form expression for the variance of any such estimator that uses dataset splitting to obtain an unbiased estimate. This expression motivates the design of a new doubly robust estimator that uses a novel loss function when fitting functions for regression adjustment. We release the dataset and benchmark in a Python package; the package is built in a modular way to facilitate new datasets and estimators.
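To make the idea of a doubly robust estimator with dataset splitting concrete, here is a minimal sketch of the textbook augmented inverse-propensity-weighted (AIPW) estimator with cross-fitting. It is not the authors' new estimator, their loss function, or their Python package: the names `fit_linear` and `cross_fit_aipw` are hypothetical, the outcome model is assumed to be ordinary least squares, and the propensity scores `e` are assumed to be supplied by the caller.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def cross_fit_aipw(X, t, y, e, n_folds=2, seed=0):
    """AIPW estimate of the average treatment effect with sample splitting.

    X: (n, d) covariates, t: (n,) 0/1 treatment indicator, y: (n,) outcomes,
    e: (n,) propensity scores P(t=1 | X), assumed given and strictly in (0, 1).
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    psi = np.empty(n)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        Xtr, ttr, ytr = X[train], t[train], y[train]
        # Fit the two outcome models on the training portion only.
        mu1 = fit_linear(Xtr[ttr == 1], ytr[ttr == 1])
        mu0 = fit_linear(Xtr[ttr == 0], ytr[ttr == 0])
        # Evaluate the doubly robust score on the held-out portion.
        Xh, th, yh, eh = X[held_out], t[held_out], y[held_out], e[held_out]
        m1, m0 = mu1(Xh), mu0(Xh)
        psi[held_out] = (m1 - m0
                         + th * (yh - m1) / eh
                         - (1 - th) * (yh - m0) / (1 - eh))
    return psi.mean()
```

The regression-adjustment term `m1 - m0` carries the estimate when the outcome models are accurate, and the inverse-propensity-weighted residual terms correct it when they are not, provided the propensity scores are accurate. Fitting on one portion of the data and scoring on the other is the splitting step the abstract refers to.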
Citations: 0