Regression modeling of censored data based on compound scale mixtures of normal distributions

IF 0.5 | Zone 4, Mathematics | JCR Q4, Statistics & Probability | Brazilian Journal of Probability and Statistics | Pub Date: 2023-06-01 | DOI: 10.1214/22-bjps551

Luis Benites, C. Zeller, H. Bolfarine, V. H. Lachos


Citations: 0

Abstract


In the framework of censored regression models, the distribution of the error term can depart significantly from normality, for instance because of multimodality, skewness, and/or atypical observations. In this paper we propose a novel censored linear regression model in which the random errors follow a finite mixture of scale mixtures of normal (SMN) distributions. The SMN family is an attractive class of symmetric heavy-tailed densities that includes the normal, Student-t, slash, and contaminated normal distributions as special cases. This approach allows us to model data with great flexibility, accommodating multimodality, heavy tails, and skewness simultaneously, depending on the structure of the mixture components. We develop an analytically tractable and efficient EM-type algorithm for iteratively computing the maximum likelihood estimates of the parameters, with standard errors and predictions of the censored values as by-products. The proposed algorithm has closed-form expressions at the E-step that rely on formulas for the mean and variance of truncated SMN distributions. The efficacy of the method is verified through the analysis of simulated and real datasets. The methodology addressed in this paper is implemented in the R package CensMixReg.
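To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch restricted to the normal special case of the SMN family, with left-censoring assumed. It shows the truncated mean and variance that a closed-form E-step of this kind relies on, and the censored log-likelihood of a finite mixture of normal error components. The function names, argument layout, and the left-censoring convention are illustrative assumptions; this is not the paper's algorithm for general SMN components and not the CensMixReg API.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for pdf and cdf


def truncated_normal_moments(mu, sigma, c):
    """Mean and variance of Y ~ N(mu, sigma^2) given Y <= c.

    Hypothetical helper: normal special case only. For c far below mu
    the cdf underflows to zero and the ratio is not computable.
    """
    alpha = (c - mu) / sigma
    ratio = STD_NORMAL.pdf(alpha) / STD_NORMAL.cdf(alpha)
    mean = mu - sigma * ratio
    var = sigma ** 2 * (1.0 - alpha * ratio - ratio ** 2)
    return mean, var


def censored_mixture_loglik(y, censored, mu, weights, locs, sigmas):
    """Log-likelihood with errors following a finite mixture of normals.

    y        : observed responses (the censoring limit where censored)
    censored : True where the response is left-censored at y[i]
    mu       : linear predictor x_i' beta for each observation
    weights, locs, sigmas : mixture weights, component locations, scales
    """
    total = 0.0
    for yi, is_cens, mi in zip(y, censored, mu):
        lik = 0.0
        for w, loc, s in zip(weights, locs, sigmas):
            z = (yi - mi - loc) / s
            # Censored points contribute a cdf, observed points a density.
            lik += w * (STD_NORMAL.cdf(z) if is_cens else STD_NORMAL.pdf(z) / s)
        total += math.log(lik)
    return total
```

For a standard normal truncated above at zero, `truncated_normal_moments(0.0, 1.0, 0.0)` gives a mean of `-sqrt(2/pi)` and a variance of `1 - 2/pi`. Heavy-tailed components such as the Student-t would replace the normal pdf, cdf, and truncated moments with their SMN counterparts.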


Source journal

Brazilian Journal of Probability and Statistics (Statistics & Probability)

CiteScore: 1.60

Self-citation rate: 10.00%

Articles published: (not listed)

Review time: >12 weeks

About the journal: The Brazilian Journal of Probability and Statistics aims to publish high-quality research papers in applied probability, applied statistics, computational statistics, mathematical statistics, probability theory, and stochastic processes. More specifically, the following types of contributions will be considered: (i) Original articles dealing with methodological developments, comparison of competing techniques, or their computational aspects. (ii) Original articles developing theoretical results. (iii) Articles that contain novel applications of existing methodologies to practical problems. For these papers the focus is on the importance and originality of the applied problem, as well as on the application of the best available methodologies to solve it. (iv) Survey articles containing a thorough coverage of topics of broad interest to probability and statistics. The journal will occasionally publish book reviews, invited papers, and essays on the teaching of statistics.