
arXiv - STAT - Methodology: Latest Papers

Exact Posterior Mean and Covariance for Generalized Linear Mixed Models
Pub Date : 2024-09-14 DOI: arxiv-2409.09310
Tonglin Zhang
A novel method is proposed for computing the exact posterior mean and covariance of the random effects given the response in a generalized linear mixed model (GLMM) when the response is not normally distributed. The research addresses a long-standing problem in Bayesian statistics that arises when an intractable integral appears in the posterior distribution. It is well known that, when the response in a GLMM is not normal, the posterior distribution of the random effects given the response contains intractable integrals. Previous methods rely on Monte Carlo simulation of the posterior distribution and do not provide the exact posterior mean and covariance of the random effects given the response. The special integral computation (SIC) method is proposed to overcome this difficulty. The SIC method does not use the posterior distribution in the computation; instead, it formulates an optimization problem that accomplishes the task. An advantage is that computing the posterior distribution is unnecessary. The proposed SIC method thus avoids the main difficulty in Bayesian analysis when intractable integrals appear in the posterior distribution.
Citations: 0
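To make the intractable integral concrete, take the simplest case, assumed here for illustration only: a Poisson GLMM with one random intercept b ~ N(0, σ²) shared by counts y_1, ..., y_m, with y_j | b ~ Poisson(exp(β + b)). Then E[b | y] is a ratio of one-dimensional integrals with no closed form. The sketch below evaluates them by brute-force trapezoidal quadrature. It is a baseline for intuition, not the SIC method (whose details the abstract does not give), and the function names, grid width, and grid size are hypothetical choices.

```python
import math

def log_joint(b, y, beta, sigma):
    """log p(y | b) + log p(b), dropping terms that do not depend on b."""
    eta = beta + b
    loglik = sum(yj * eta - math.exp(eta) for yj in y)  # Poisson kernel; log(y!) dropped
    logprior = -0.5 * (b / sigma) ** 2                  # N(0, sigma^2) kernel
    return loglik + logprior

def posterior_mean_var(y, beta, sigma, half_width=10.0, n=4001):
    """Posterior mean and variance of the random intercept b by trapezoidal quadrature."""
    h = 2 * half_width * sigma / (n - 1)
    grid = [-half_width * sigma + i * h for i in range(n)]
    logs = [log_joint(b, y, beta, sigma) for b in grid]
    top = max(logs)  # subtract the maximum before exponentiating to avoid overflow
    w = [math.exp(l - top) for l in logs]
    w[0] *= 0.5      # trapezoid end-point weights
    w[-1] *= 0.5
    z = sum(w)       # proportional to the normalizing integral p(y)
    mean = sum(wi * b for wi, b in zip(w, grid)) / z
    var = sum(wi * (b - mean) ** 2 for wi, b in zip(w, grid)) / z
    return mean, var
```

For example, `posterior_mean_var([0, 0, 0, 0], beta=0.0, sigma=1.0)` returns a negative posterior mean, because all-zero counts pull the random intercept down. A grid like this grows exponentially with the number of random effects, which is why Monte Carlo methods are the usual fallback.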
Joint spatial modeling of mean and non-homogeneous variance combining semiparametric SAR and GAMLSS models for hedonic prices
Pub Date : 2024-09-13 DOI: arxiv-2409.08912
J. D. Toloza-Delgado, O. O. Melo, N. A. Cruz
In spatial econometrics, it is very useful to have methodologies that model the spatial dependence of the observed variables and yield more precise predictions of both the mean and the variability of the response variable, which is valuable for territorial planning and public policy. This paper proposes a new methodology that jointly models the mean and the variance. It also allows the spatial dependence of the dependent variable to be modeled as a function of covariates, and semiparametric effects to be modeled in both the mean and the variance models. The algorithms developed are based on generalized additive models, which allow non-parametric terms in both the mean and the variance while maintaining the traditional theoretical framework of spatial regression. The theory of estimation for this model is developed, and the estimators are shown to have desirable statistical properties. A simulation study shows that the proposed method has remarkable predictive capacity in terms of mean squared error and a notable improvement in the estimation of the spatial autoregressive parameter compared with other traditional methods and some recent developments. The model is also applied to data used to construct a hedonic price model for the city of Bogota; the main result is its ability to model the variability of housing prices, which enriches the resulting analysis.
Citations: 0
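The spatial autoregressive (SAR) part of such a model is usually written y = ρWy + Xβ + ε, where W is a row-standardized spatial weight matrix. Below is a minimal simulation sketch, assuming linear (not semiparametric) predictors and a log-linear model for the error standard deviation, σ_i = exp(γ0 + γ1 x_i), in the spirit of GAMLSS; all names are hypothetical and this is not the authors' estimator. Because W is row-standardized and |ρ| < 1 is assumed, the map y ↦ ρWy + Xβ + ε is a contraction, so y can be obtained by fixed-point iteration instead of inverting I − ρW.

```python
import math
import random

def row_standardize(neighbors):
    """Dense row-standardized weight matrix from a list of neighbor index lists."""
    n = len(neighbors)
    W = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            W[i][j] = 1.0 / len(nbrs)
    return W

def solve_sar(W, rho, rhs, tol=1e-12, max_iter=10_000):
    """Solve y = rho * W y + rhs by fixed-point iteration (converges for |rho| < 1)."""
    y = list(rhs)
    for _ in range(max_iter):
        new = [rho * sum(w * yj for w, yj in zip(row, y)) + r for row, r in zip(W, rhs)]
        if max(abs(a - b) for a, b in zip(new, y)) < tol:
            return new
        y = new
    raise RuntimeError("fixed-point iteration did not converge")

def simulate_sar(W, x, rho, beta0, beta1, gamma0, gamma1, seed=0):
    """Simulate a SAR response whose error scale depends on the covariate x."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, math.exp(gamma0 + gamma1 * xi)) for xi in x]  # heteroscedastic errors
    rhs = [beta0 + beta1 * xi + e for xi, e in zip(x, eps)]             # X beta + eps
    return solve_sar(W, rho, rhs), eps
```

With a ring of five locations, `nbrs = [[(i - 1) % 5, (i + 1) % 5] for i in range(5)]`, the call `simulate_sar(row_standardize(nbrs), [0, 1, 2, 3, 4], 0.5, 1.0, 2.0, -1.0, 0.2)` returns a response and its errors, whose spread grows with x.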
Multi forests: Variable importance for multi-class outcomes
Pub Date : 2024-09-13 DOI: arxiv-2409.08925
Roman Hornung (Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany; Munich Center for Machine Learning), Alexander Hapfelmeier (Institute of AI and Informatics in Medicine, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany)
In prediction tasks with multi-class outcomes, identifying covariates specifically associated with one or more outcome classes can be important. Conventional variable importance measures (VIMs) from random forests (RFs), such as permutation and Gini importance, focus on overall predictive performance or node purity without differentiating between the classes. Therefore, they can be expected to fail to distinguish class-associated covariates from covariates that only distinguish between groups of classes. We introduce a VIM called the multi-class VIM, tailored to identifying exclusively class-associated covariates, via a novel RF variant called multi forests (MuFs). The trees in MuFs use both multi-way and binary splitting. The multi-way splits generate a child node for each class, using a split criterion that evaluates how well these nodes represent their respective classes. This setup forms the basis of the multi-class VIM, which measures the discriminatory ability of the splits performed in the respective covariates with regard to this split criterion. Alongside the multi-class VIM, we introduce a second VIM, the discriminatory VIM. This measure, based on the binary splits, assesses the strength of the general influence of the covariates, irrespective of their class-associatedness. Simulation studies demonstrate that the multi-class VIM specifically ranks class-associated covariates highly, unlike conventional VIMs, which also rank other types of covariates highly. Analyses of 121 datasets reveal that MuFs often have slightly lower predictive performance than conventional RFs. This is, however, not a limiting factor, given that the algorithm's primary purpose is calculating the multi-class VIM.
Citations: 0
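For reference, the conventional Gini importance that the abstract contrasts with accumulates, over the nodes split on a covariate, the size-weighted decrease in Gini impurity produced by each split. The sketch below computes that per-node decrease for one binary split; it illustrates the conventional measure only, not the multi-class or discriminatory VIM of MuFs, and the function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels (0 for an empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Impurity decrease from splitting `parent` into `left` and `right`, weighted by child size."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)
```

In a four-class node, `gini_decrease(list("AABBCCDD"), list("AABB"), list("CCDD"))` is 0.25 even though the split only separates the groups {A, B} and {C, D} and isolates no single class, which is why such covariates can still rank highly under conventional Gini importance.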
Tracing the impacts of Mount Pinatubo eruption on global climate using spatially-varying changepoint detection
Pub Date : 2024-09-13 DOI: arxiv-2409.08908
Samantha Shi-Jun, Lyndsay Shand, Bo Li
Significant events such as volcanic eruptions can have global and long-lasting impacts on climate. These global impacts, however, are not uniform across space and time. Understanding how the Mt. Pinatubo eruption affected global and regional climate is of great interest for predicting the climate impact of similar events. We propose a Bayesian framework to simultaneously detect and estimate spatially varying temporal changepoints for regional climate impacts. Our approach takes into account the diffusing nature of the changes caused by the volcanic eruption and leverages spatial correlation. We illustrate our method on simulated datasets and compare it with an existing changepoint detection method. Finally, we apply our method to monthly stratospheric aerosol optical depth and surface temperature data from 1985 to 1995 to detect and estimate changepoints following the 1991 Mt. Pinatubo eruption.
Citations: 0
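As a much simpler, non-spatial illustration of Bayesian changepoint estimation (not the authors' spatially varying model), consider one series with a single mean shift: y_t ~ N(μ1, σ²) before the changepoint k and N(μ2, σ²) from k on, with σ known, flat priors on μ1 and μ2, and a uniform prior on k. Integrating out the two means gives a posterior over k proportional to (n1 n2)^(-1/2) exp(−RSS_k / (2σ²)), where n1, n2 are the segment lengths and RSS_k is the residual sum of squares around the segment means. Running this separately at each grid cell would ignore the spatial correlation that the proposed framework exploits; the function names are hypothetical.

```python
import math

def segment_rss(values):
    """Residual sum of squares of values around their mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def changepoint_posterior(y, sigma, min_seg=2):
    """Posterior over the changepoint k (index of the first post-change observation).

    Assumes one mean shift, known sigma, flat priors on both segment means,
    and a uniform prior on k in [min_seg, len(y) - min_seg].
    """
    n = len(y)
    ks = list(range(min_seg, n - min_seg + 1))
    logpost = []
    for k in ks:
        rss = segment_rss(y[:k]) + segment_rss(y[k:])
        # log of (n1 * n2)^(-1/2) * exp(-RSS / (2 sigma^2))
        logpost.append(-0.5 * math.log(k * (n - k)) - rss / (2 * sigma ** 2))
    top = max(logpost)  # log-sum-exp normalization
    weights = [math.exp(lp - top) for lp in logpost]
    total = sum(weights)
    return {k: w / total for k, w in zip(ks, weights)}
```

For a step series such as `[0.0] * 30 + [2.0] * 30` with `sigma=1.0`, the posterior mode is at k = 30, with the remaining mass on neighboring indices.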
Cubature-based uncertainty estimation for nonlinear regression models
Pub Date : 2024-09-13 DOI: arxiv-2409.08756
Martin Bubel, Jochen Schmid, Maximilian Carmesin, Volodymyr Kozachynskyi, Erik Esche, Michael Bortz
Calibrating model parameters to measured data by minimizing loss functions is an important step in obtaining realistic predictions from model-based approaches, e.g., for process optimization. This is applicable to both knowledge-driven and data-driven model setups. Due to measurement errors, the calibrated model parameters also carry uncertainty. In this contribution, we use cubature formulas based on sparse grids to calculate the variance of the regression results. The number of cubature points is close to the theoretical minimum required for a given level of exactness. We present exact benchmark results, which we also compare to other cubatures. This scheme is then applied to estimate the prediction uncertainty of the NRTL model, calibrated to observations from different experimental designs.
Citations: 0
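The idea of propagating parameter uncertainty with a cubature rule can be sketched with a much simpler rule than the sparse-grid formulas of the paper: the third-degree spherical cubature rule with 2d points (the rule used in cubature Kalman filters). Assuming Gaussian parameter uncertainty θ ~ N(m, Σ), the points are m ± √d L e_i, where L is the Cholesky factor of Σ, each with weight 1/(2d). The rule integrates polynomials up to degree 3 exactly, so the variance of a linear model output (a degree-2 integrand) is exact; for nonlinear models it is only an approximation. The model f and all names are hypothetical.

```python
import math

def cholesky(cov):
    """Lower-triangular L with L L^T = cov (cov must be symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L

def cubature_mean_var(f, mean, cov):
    """Mean and variance of f(theta), theta ~ N(mean, cov), via the 2d-point degree-3 rule."""
    d = len(mean)
    L = cholesky(cov)
    scale = math.sqrt(d)
    values = []
    for i in range(d):
        for sign in (1.0, -1.0):
            # Cubature point: mean + sign * sqrt(d) * (column i of L)
            point = [mean[r] + sign * scale * L[r][i] for r in range(d)]
            values.append(f(point))
    weight = 1.0 / (2 * d)  # equal weights that sum to one
    m = weight * sum(values)
    v = weight * sum((val - m) ** 2 for val in values)
    return m, v
```

For f(θ) = 2θ₁ + 3θ₂ with m = (1, 2) and Σ = diag(0.5, 0.25), `cubature_mean_var(lambda t: 2 * t[0] + 3 * t[1], [1.0, 2.0], [[0.5, 0.0], [0.0, 0.25]])` gives mean 8 and variance 4.25 up to rounding, matching the exact values 2·1 + 3·2 and 4·0.5 + 9·0.25.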
Angular Co-variance using intrinsic geometry of torus: Non-parametric change points detection in meteorological data
Pub Date : 2024-09-13 DOI: arxiv-2409.08838
Surojit Biswas, Buddhananda Banerjee, Arnab Kumar Laha
In many temporal datasets, the parameters of the underlying distribution may change abruptly at unknown times. Detecting these changepoints is crucial for numerous applications. While this problem has been extensively studied for linear data, there has been remarkably less research on bivariate angular data. For the first time, we address the changepoint problem for the mean direction of toroidal and spherical data, which are types of bivariate angular data. By leveraging the intrinsic geometry of a curved torus, we introduce the concept of the "square" of an angle. This leads us to define the "curved dispersion matrix" for bivariate angular random variables, analogous to the dispersion matrix for bivariate linear random variables. Using this analogous measure of the "Mahalanobis distance", we develop two new non-parametric tests to identify changes in the mean direction parameters of toroidal and spherical distributions. We derive the limiting distributions of the test statistics and evaluate their power surface and contours through extensive simulations. We also apply the proposed methods to detect changes in mean direction for hourly wind-wave direction measurements and the path of the cyclonic storm "Biporjoy", which occurred between 6 and 19 June 2023 over the Arabian Sea, off the western coast of India.
Citations: 0
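A toroidal observation is a pair of angles (φ, ψ), and the mean direction of each angle is the argument of its mean resultant vector. As a simple parametric baseline (not the curved-dispersion-matrix tests proposed here, which the abstract does not spell out), the sketch below scans candidate changepoints k with the statistic R₁ + R₂ − R summed over the two angles, where R₁, R₂, and R are the resultant lengths before k, from k on, and overall. Under a von Mises model with known concentration κ, 2κ(R₁ + R₂ − R) is the likelihood-ratio statistic for a change in the mean direction of one circular variable; treating φ and ψ independently is an assumption made here for simplicity, and the function names are hypothetical.

```python
import math

def resultant(angles):
    """Length and direction (radians) of the resultant vector of a list of angles."""
    c = sum(math.cos(a) for a in angles)
    s = sum(math.sin(a) for a in angles)
    return math.hypot(c, s), math.atan2(s, c)

def torus_changepoint(pairs, min_seg=2):
    """Return (k, statistic) maximizing the sum over both angles of R1 + R2 - R."""
    n = len(pairs)
    columns = [[p[0] for p in pairs], [p[1] for p in pairs]]
    totals = [resultant(col)[0] for col in columns]  # overall resultant length R per angle
    best_k, best_stat = None, -math.inf
    for k in range(min_seg, n - min_seg + 1):
        stat = 0.0
        for col, r_all in zip(columns, totals):
            stat += resultant(col[:k])[0] + resultant(col[k:])[0] - r_all
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat
```

Because directions are averaged as unit vectors, `resultant([math.radians(350), math.radians(10)])` has direction 0 rather than the misleading arithmetic mean of 180°.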
Identification of distributions for risks based on the first moment and c-statistic
Pub Date : 2024-09-13 DOI: arxiv-2409.09178
Mohsen Sadatsafavi, Tae Yoon Lee, John Petkau
We show that for any family of distributions with support on [0,1] with a strictly monotonic cumulative distribution function (CDF) that has no jumps and is quantile-identifiable (i.e., any two distinct quantiles identify the distribution), knowing the first moment and the c-statistic is enough to identify the distribution. The derivations motivate numerical algorithms for mapping a given pair of expected value and c-statistic to the parameters of specified two-parameter distributions for probabilities. We implemented these algorithms in R and, in a simulation study, evaluated their numerical accuracy for common families of distributions for risks (beta, logit-normal, and probit-normal). One area of application for these developments is risk prediction modeling (e.g., sample size calculations and Value of Information analysis), where one might need to estimate the parameters of the distribution of predicted risks from reported summary statistics.
Citations: 0
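For the beta family, fixing the mean m leaves one free parameter, the concentration s = a + b (with a = ms and b = (1 − m)s), so matching the c-statistic reduces to a one-dimensional search. The sketch below is written in Python rather than the authors' R code and does not reproduce their algorithm: it discretizes the beta density on a midpoint grid, computes the c-statistic as P(risk of a random case > risk of a random non-case) + ½P(tie), where a case's risk has density proportional to p f(p) and a non-case's to (1 − p) f(p), and bisects on log s. It assumes the c-statistic decreases monotonically as s grows (more concentrated risks discriminate less); the grid size, search bounds, and names are arbitrary choices.

```python
import math

def beta_grid(a, b, n=2000):
    """Discretize a Beta(a, b) density on a midpoint grid; weights sum to 1."""
    xs = [(i + 0.5) / n for i in range(n)]
    logs = [(a - 1) * math.log(x) + (b - 1) * math.log1p(-x) for x in xs]
    top = max(logs)  # stabilize before exponentiating
    ws = [math.exp(l - top) for l in logs]
    total = sum(ws)
    return xs, [w / total for w in ws]

def c_statistic(xs, ws):
    """c = P(case risk > non-case risk) + 0.5 * P(tie) for risks distributed as (xs, ws)."""
    mean = sum(x * w for x, w in zip(xs, ws))
    num = 0.0
    below = 0.0  # running sum of (1 - q) * w(q) over grid points q below the current x
    for x, w in zip(xs, ws):  # xs is sorted ascending
        num += x * w * below + 0.5 * x * w * (1 - x) * w
        below += (1 - x) * w
    return num / (mean * (1 - mean))

def beta_from_mean_c(mean, c, lo=0.01, hi=1000.0, iters=60):
    """Find Beta(a, b) with the given mean and c-statistic by bisection on log(a + b)."""
    for _ in range(iters):
        s = math.sqrt(lo * hi)  # geometric midpoint, i.e. bisection on the log scale
        if c_statistic(*beta_grid(mean * s, (1 - mean) * s)) > c:
            lo = s  # still too discriminating: increase concentration
        else:
            hi = s
    s = math.sqrt(lo * hi)
    return mean * s, (1 - mean) * s
```

As a round-trip check, computing `c_statistic(*beta_grid(2.0, 5.0))` and passing it with mean 2/7 to `beta_from_mean_c` should recover a ≈ 2 and b ≈ 5.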
Change point analysis with irregular signals
Pub Date : 2024-09-13 DOI: arxiv-2409.08863
Tobias Kley, Yuhan Philip Liu, Hongyuan Cao, Wei Biao Wu
This paper considers the problem of testing for and estimating a change point when the signal after the change point can be highly irregular, which departs from the existing literature that assumes signals after the change point to be piecewise constant or to vary smoothly. A two-step approach is proposed to effectively estimate the location of the change point. The first step consists of a preliminary estimate of the change point, which allows us to obtain the unknown parameters for the second step. In the second step, we use a new procedure to determine the position of the change point. We show that, under suitable conditions, the estimated change point attains the desirable $\mathcal{O}_P(1)$ rate of convergence. We apply our method to analyze the Baidu search index of COVID-19-related symptoms and find 8 December 2019 to be the starting date of the COVID-19 pandemic.
Citations: 0
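For contrast with the irregular-signal setting, the sketch below shows the classical baseline that assumes a piecewise-constant signal: the standardized CUSUM estimator of a single mean shift, k̂ = argmax_k |S_k − (k/n)S_n| / √(k(n − k)/n), with S_k the k-th partial sum. It is not the authors' two-step procedure, and the function name is hypothetical.

```python
import math

def cusum_changepoint(y):
    """Classical standardized CUSUM estimate of a single mean-shift location.

    Returns k such that the shift is estimated to occur between y[k-1] and y[k].
    """
    n = len(y)
    total = sum(y)
    partial = 0.0
    best_k, best_stat = None, -1.0
    for k in range(1, n):
        partial += y[k - 1]  # partial = S_k = y[0] + ... + y[k-1]
        stat = abs(partial - k / n * total) / math.sqrt(k * (n - k) / n)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k
```

For a clean step such as `[0.0] * 50 + [1.0] * 50`, the estimate is k = 50; when the post-change signal is irregular rather than constant, this statistic is no longer tailored to the problem, which is the gap the paper targets.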
Regression-based proximal causal inference for right-censored time-to-event data
Pub Date : 2024-09-13 DOI: arxiv-2409.08924
Kendrick Li, George C. Linderman, Xu Shi, Eric J. Tchetgen Tchetgen
Unmeasured confounding is one of the major concerns in causal inference from observational data. Proximal causal inference (PCI) is an emerging methodological framework to detect and potentially account for confounding bias by carefully leveraging a pair of negative control exposure (NCE) and negative control outcome (NCO) variables, also known as treatment and outcome confounding proxies. Although regression-based PCI is well developed for binary and continuous outcomes, analogous PCI regression methods for right-censored time-to-event outcomes are currently lacking. In this paper, we propose a novel two-stage regression PCI approach for right-censored survival data under an additive hazard structural model. We provide theoretical justification for the proposed approach tailored to different types of NCOs, including continuous, count, and right-censored time-to-event variables. We illustrate the approach with an evaluation of the effectiveness of right heart catheterization among critically ill patients using data from the SUPPORT study.
Our method is implemented in the open-access R package 'pci2s'.
Citations: 0
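To make the two-stage idea in the abstract above concrete, here is a minimal Python sketch under stated assumptions. It is not the pci2s implementation and may differ from the paper's estimator. Assumed throughout: a continuous NCO, no measured covariates, a linear stage-1 regression of the NCO W on the treatment A and the NCE Z, and, for stage 2, the Lin-Ying closed-form estimator for the additive hazards model lambda(t | covariates) = lambda0(t) + beta' covariates, fitted on A and the stage-1 fitted values. The function names and the toy data-generating process (unmeasured confounder U, true treatment effect 0.3 on the hazard scale, uniform censoring) are hypothetical.

```python
import numpy as np


def lin_ying_additive_hazards(time, event, covariates):
    """Lin-Ying estimator of beta in the additive hazards model
    lambda(t | Z) = lambda0(t) + Z @ beta, with lambda0 left unspecified.

    beta_hat = [sum_i int Y_i(t) (Z_i - Zbar(t))(Z_i - Zbar(t))' dt]^{-1}
               [sum_i int (Z_i - Zbar(t)) dN_i(t)]
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    Z = np.asarray(covariates, dtype=float).reshape(len(time), -1)

    order = np.argsort(time, kind="stable")
    t, d, Z = time[order], event[order], Z[order]
    n = len(t)

    # Risk-set sums via reverse cumulative sums: row k sums sorted rows k..n-1.
    S0 = np.arange(n, 0, -1, dtype=float)
    S1 = np.cumsum(Z[::-1], axis=0)[::-1]
    S2 = np.cumsum(np.einsum("ij,ik->ijk", Z, Z)[::-1], axis=0)[::-1]

    # dt-integral: on (t_{k-1}, t_k] the risk set is sorted rows k..n-1.
    dt = np.diff(t, prepend=0.0)
    centred = S2 - np.einsum("ij,ik->ijk", S1, S1) / S0[:, None, None]
    A = np.einsum("k,kij->ij", dt, centred)

    # dN-integral: centre each event at the risk-set mean at its own time;
    # searchsorted maps tied times to the first row of their block.
    first = np.searchsorted(t, t, side="left")
    zbar = S1[first] / S0[first][:, None]
    b = ((Z - zbar) * d[:, None]).sum(axis=0)
    return np.linalg.solve(A, b)


def two_stage_pci(time, event, treatment, nce, nco):
    """Hypothetical two-stage PCI sketch for a continuous NCO."""
    # Stage 1: OLS of the NCO on (1, treatment, NCE); keep the fitted values.
    X1 = np.column_stack([np.ones(len(treatment)), treatment, nce])
    coef, *_ = np.linalg.lstsq(X1, nco, rcond=None)
    nco_hat = X1 @ coef
    # Stage 2: additive hazards fit on (treatment, fitted NCO).
    beta = lin_ying_additive_hazards(time, event, np.column_stack([treatment, nco_hat]))
    return beta[0]  # treatment coefficient


def simulate(n=40_000, beta_a=0.3, beta_u=0.3, seed=0):
    """Toy data with an unmeasured confounder U; every number is illustrative."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=n)
    A = 0.5 * U + rng.normal(size=n)  # continuous treatment, confounded by U
    Z = U + rng.normal(size=n)        # NCE: tracks U, no direct effect on T
    W = U + rng.normal(size=n)        # NCO: tracks U, unaffected by A or Z
    # Additive hazard given (A, U); clipping guards the rare negative rate.
    rate = np.clip(2.0 + beta_a * A + beta_u * U, 1e-3, None)
    T = rng.exponential(1.0 / rate)
    C = rng.uniform(0.0, 1.5, size=n)  # independent right-censoring
    return np.minimum(T, C), T <= C, A, Z, W


if __name__ == "__main__":
    time, event, A, Z, W = simulate()
    print("naive (A only):", lin_ying_additive_hazards(time, event, A)[0])
    print("two-stage PCI: ", two_stage_pci(time, event, A, Z, W))
```

In this toy design (U, A, Z) are jointly Gaussian, so the hazard given (A, Z) stays additive in A and E[U | A, Z], and the latter is linear in the stage-1 fitted NCO. The two-stage coefficient should therefore land near 0.3, while the naive fit on A alone absorbs part of U's effect and is biased upward. Because stage 2 plugs in estimated values, plain Lin-Ying standard errors would understate uncertainty; bootstrapping both stages together is one simple option.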
The underreported death toll of wars: a probabilistic reassessment from a structured expert elicitation
Pub Date : 2024-09-13 DOI: arxiv-2409.08779
Paola Vesco, David Randahl, Håvard Hegre, Stina Högbladh, Mert Can Yilmaz
Event datasets, including those provided by the Uppsala Conflict Data Program (UCDP), are based on reports from the media and international organizations, and are likely to suffer from reporting bias. Since the UCDP has strict inclusion criteria, its figures most likely underestimate conflict-related deaths, but we do not know by how much. Here, we provide a generalizable, cross-national measure of uncertainty around UCDP-reported fatalities that is more robust and realistic than UCDP's documented low and high estimates, and we make available a dataset and an R package that account for this measurement uncertainty. We use a structured expert elicitation combined with statistical modelling to derive a distribution of the plausible number of fatalities given the number of battle-related deaths and the type of violence documented by the UCDP. The results can help scholars understand the extent of bias affecting their empirical analyses of organized violence and contribute to improving the accuracy of conflict forecasting systems.
Citations: 0
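The abstract above does not spell out the elicitation questions or the statistical model, so the following is only a generic Python sketch of one common way to turn elicited judgements into a distribution of plausible fatalities, not the authors' method or R package. Every element is an assumption: each expert is imagined to give 5th/50th/95th percentiles of the ratio of true to UCDP-reported deaths for one type of violence at one reported count; each answer is fitted with a lognormal; the answers are combined with an equal-weight linear opinion pool (the average of the experts' CDFs). `ElicitedRatio`, `pooled_cdf` and `pooled_quantile` are hypothetical names.

```python
import math
from dataclasses import dataclass

Z95 = 1.6448536269514722  # 95th percentile of the standard normal


@dataclass(frozen=True)
class ElicitedRatio:
    """One expert's (hypothetical) 5th/50th/95th percentiles for the ratio
    true deaths / reported deaths, for a given violence type and reported count."""
    q05: float
    q50: float
    q95: float

    def lognormal(self):
        # Assumption: the ratio is lognormal; the median fixes mu, the 90% range fixes sigma.
        mu = math.log(self.q50)
        sigma = (math.log(self.q95) - math.log(self.q05)) / (2 * Z95)
        return mu, sigma


def pooled_cdf(deaths, reported, experts):
    """Equal-weight linear opinion pool: average of the experts' CDFs of true deaths."""
    x = math.log(deaths / reported)
    total = 0.0
    for expert in experts:
        mu, sigma = expert.lognormal()
        total += 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return total / len(experts)


def pooled_quantile(p, reported, experts, lo=1e-3, hi=1e3, iters=100):
    """Invert the pooled CDF by bisection on log(deaths); lo/hi bound the ratio."""
    a, b = math.log(lo * reported), math.log(hi * reported)
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if pooled_cdf(math.exp(mid), reported, experts) < p:
            a = mid
        else:
            b = mid
    return math.exp(0.5 * (a + b))


if __name__ == "__main__":
    panel = [ElicitedRatio(1.0, 1.2, 1.6), ElicitedRatio(1.1, 1.5, 2.5)]
    for p in (0.05, 0.5, 0.95):
        print(p, round(pooled_quantile(p, 1000, panel)))
```

With the two-expert panel above and 1,000 reported deaths, the pooled median necessarily falls between the experts' own medians (1,200 and 1,500). A real analysis might instead weight experts, for example by calibration performance, and let the ratio vary with the reported count and the type of violence through a regression model.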