Stat: Latest Articles

Exact confidence intervals for the difference of two proportions based on partially observed binary data
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-11-20 DOI: 10.1002/sta4.631
Chongxiu Yu, Weizhen Wang, Zhongzhan Zhang
In a matched pairs experiment, two binary variables are typically observed on all subjects in the experiment. However, when one of the variables is missing for some subjects, the result is so-called partially observed binary data, which consist of two parts: a multinomial from the subjects with both variables observed and two independent binomials from the subjects with only one observed variable. The goal of this paper is to construct exact confidence intervals for the difference of the two (success) proportions of the two binary variables. We first derive a new test by combining two score tests for the two parts of the data and invert it to obtain an asymptotic confidence interval. Since asymptotic intervals do not achieve the nominal level, this interval and three other existing intervals are made exact using the general $h$-function method. We compare the infimum coverage probability and average interval length of these intervals and recommend the exact intervals obtained by improving the newly proposed interval. Two real data sets are used to illustrate the intervals.
Citations: 0
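The abstract above ranks intervals by their infimum coverage probability. To illustrate what that criterion measures, the sketch below computes the exact coverage of a textbook Wald interval by enumerating every possible outcome. It deliberately simplifies the setting to two independent binomials only, ignoring the matched-pairs multinomial part. The Wald interval, sample sizes and parameter grid are illustrative assumptions. This is neither the authors' combined score interval nor the $h$-function method.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(x1, n1, x2, n2, z):
    """Asymptotic Wald interval for p1 - p2 from two independent binomials."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se


def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def exact_coverage(p1, p2, n1, n2, alpha=0.05):
    """P(interval covers p1 - p2), summed exactly over all (x1, x2) outcomes."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    target = p1 - p2
    cov = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            lo, hi = wald_interval(x1, n1, x2, n2, z)
            if lo <= target <= hi:
                cov += binom_pmf(x1, n1, p1) * binom_pmf(x2, n2, p2)
    return cov


# Minimum over a parameter grid: a numerical stand-in for the infimum
# coverage probability. An exact 95% interval would keep this >= 0.95.
grid = [i / 20 for i in range(1, 20)]
inf_cov = min(exact_coverage(a, b, 15, 15) for a in grid for b in grid)
print(f"approximate infimum coverage: {inf_cov:.3f}")
```

For small samples the minimum falls well below 0.95, which is the gap that exact constructions are designed to close.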
Gaussian process regression and classification using International Classification of Disease codes as covariates
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-10-07 DOI: 10.1002/sta4.618
Sanvesh Srivastava, Zongyi Xu, Yunyi Li, W. Nick Street, Stephanie Gilbertson-White
In the analysis of electronic health record (EHR) data, nonparametric regression and classification using International Classification of Disease (ICD) codes as covariates remain understudied. Automated methods have been developed over the years for predicting biomedical responses from EHRs, but relatively little attention has been paid to developing patient similarity measures that use ICD codes and chronic conditions, where a chronic condition is defined as a set of ICD codes. We address this problem by first developing a string kernel function that measures the similarity between a pair of primary chronic conditions, represented as subsets of ICD codes. Second, we extend this similarity measure to a family of covariance functions on subsets of chronic conditions. This family is used to develop Gaussian process (GP) priors for Bayesian nonparametric regression and classification, with diagnoses and other demographic information as covariates. Markov chain Monte Carlo (MCMC) algorithms are used for posterior inference and prediction. The proposed methods are tuning-free, so they are ideal for automated prediction of biomedical responses that depend on chronic conditions. We evaluate the practical performance of our method on EHR data collected from 1660 patients with six different primary cancer sites at the University of Iowa Hospitals and Clinics (UIHC). Our method provides better sensitivity and specificity than its competitors in classifying different primary cancer sites, and it estimates the marginal associations between chronic conditions and primary cancer sites.
Citations: 0
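The abstract above does not specify the paper's string kernel. The hypothetical kernel below captures a similar intuition: ICD codes are hierarchical, so codes that share a longer prefix are treated as more alike. The function names, the normalised sum-over-pairs set kernel and the example codes are all assumptions for illustration. Each piece is an inner product of feature vectors, so the resulting Gram matrix is positive semidefinite and could in principle serve as a GP covariance matrix.

```python
import math


def icd_prefix_kernel(a: str, b: str) -> int:
    """Number of leading characters two ICD codes share (dots ignored).

    Equals the sum over l of 1[a[:l] == b[:l]] for l up to the shorter code,
    i.e. an inner product of one-hot prefix features, so it is a valid kernel.
    """
    a, b = a.replace(".", ""), b.replace(".", "")
    shared = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        shared += 1
    return shared


def condition_kernel(A: set, B: set) -> float:
    """Normalised sum kernel between two sets of ICD codes (hypothetical)."""
    def raw(X, Y):
        return sum(icd_prefix_kernel(x, y) for x in X for y in Y)

    denom = math.sqrt(raw(A, A) * raw(B, B))
    return raw(A, B) / denom if denom > 0 else 0.0


# Hypothetical patients, each described by a set of diagnosis codes.
patients = [{"C50.9", "E11.9"}, {"C50.1", "I10"}, {"E11.65", "I10"}]
gram = [[condition_kernel(p, q) for q in patients] for p in patients]
for row in gram:
    print([round(v, 3) for v in row])
```

Normalisation puts every self-similarity at 1, so patients with many codes do not dominate the covariance purely through their code count.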
Asymptotic tail properties of Poisson mixture distributions
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-09-26 DOI: 10.1002/sta4.622
Samuel Valiquette, Gwladys Toulemonde, Jean Peyhardi, Éric Marchand, Frédéric Mortier
Count data are omnipresent in many applied fields, often with overdispersion. Mixtures of Poisson distributions are an elegant and appealing modelling strategy, and here we focus on how the tail behaviour of the mixing distribution is related to the tail of the resulting Poisson mixture. We define five sets of mixing distributions and, in each case, identify whether the Poisson mixture is in, close to or far from a domain of attraction of maxima. We also characterize how the Poisson mixture behaves similarly to a standard Poisson distribution when the mixing distribution has finite support. Finally, we study, both analytically and numerically, how goodness-of-fit can be assessed by inspecting tail behaviour.
Citations: 0
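The finite-support case in the abstract above can be checked numerically. The sketch assumes a two-point mixing law with rates 1 and 5 and weights 0.9 and 0.1, chosen arbitrarily. It tracks the scaled ratio (k+1)P(Y=k+1)/P(Y=k), which equals the rate for a single Poisson. For the mixture, the ratio approaches the largest rate in the support, so the far tail looks like a Poisson tail. This illustrates one well-known fact; it does not reproduce the paper's classification of five sets of mixing distributions.

```python
from math import exp, lgamma, log


def log_mixture_pmf(k, rates, weights):
    """log P(Y = k) for a Poisson mixture whose mixing law has finite support."""
    terms = [log(w) - lam + k * log(lam) - lgamma(k + 1)
             for lam, w in zip(rates, weights)]
    m = max(terms)  # log-sum-exp for numerical stability
    return m + log(sum(exp(t - m) for t in terms))


def scaled_ratio(k, rates, weights):
    """(k + 1) * P(Y = k + 1) / P(Y = k); equals lam for a single Poisson(lam)."""
    diff = log_mixture_pmf(k + 1, rates, weights) - log_mixture_pmf(k, rates, weights)
    return (k + 1) * exp(diff)


rates, weights = [1.0, 5.0], [0.9, 0.1]
for k in (0, 5, 20, 100):
    print(k, round(scaled_ratio(k, rates, weights), 4))
```

The same ratio does not settle to a constant for mixing laws with unbounded support, such as the gamma mixing law behind the negative binomial, which is one reason tail behaviour can help discriminate between mixtures.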
An importance sampling approach for reliable and efficient inference in Bayesian ordinary differential equation models
Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-09-18 DOI: 10.1002/sta4.614
Juho Timonen, Nikolas Siccha, Ben Bales, Harri Lähdesmäki, Aki Vehtari
Statistical models can involve implicitly defined quantities, such as solutions to nonlinear ordinary differential equations (ODEs), that must be numerically approximated in order to evaluate the model. The approximation error inherently biases statistical inference results, but the size of this bias is generally unknown and often ignored in Bayesian parameter inference. We propose a computationally efficient method for verifying the reliability of posterior inference for such models when the inference is performed using Markov chain Monte Carlo methods. We validate the efficiency and reliability of our workflow in experiments using simulated and real data and different ODE solvers. We highlight problems that arise with commonly used adaptive ODE solvers and propose robust and effective alternatives which, together with our workflow, can be used without losing the reliability of the inferences.
Citations: 2
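The general idea of checking inference run with an approximate ODE solver can be sketched with importance sampling. Each posterior draw is reweighted by the ratio of its likelihood under a more accurate solver to its likelihood under the approximate one, and the effective sample size (ESS) of the weights is then inspected. Everything in the sketch is an illustrative assumption: an exponential-decay ODE, forward Euler with a coarse and a fine step, Gaussian noise, noise-free synthetic data, and normal draws standing in for MCMC output. A practical workflow would also need a more robust diagnostic than the raw ESS (for example Pareto-smoothed importance sampling), which is not implemented here.

```python
import math
import random


def euler_decay(theta, t_obs, step):
    """Forward-Euler solution of y' = -theta * y, y(0) = 1, at times t_obs."""
    y, t, out = 1.0, 0.0, []
    for t_target in t_obs:
        while t < t_target - 1e-12:
            h = min(step, t_target - t)
            y += h * (-theta * y)
            t += h
        out.append(y)
    return out


def log_lik(theta, t_obs, y_obs, sigma, step):
    """Gaussian log-likelihood up to a constant (constants cancel in ratios)."""
    pred = euler_decay(theta, t_obs, step)
    return sum(-0.5 * ((yo - yp) / sigma) ** 2 for yo, yp in zip(y_obs, pred))


def importance_check(draws, t_obs, y_obs, sigma, coarse_step, fine_step):
    """Self-normalised weights (fine vs coarse solver) and their ESS."""
    log_w = [log_lik(th, t_obs, y_obs, sigma, fine_step)
             - log_lik(th, t_obs, y_obs, sigma, coarse_step) for th in draws]
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]
    ess = 1.0 / sum(wi * wi for wi in w)
    return w, ess


t_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_obs = [math.exp(-0.7 * t) for t in t_obs]  # hypothetical noise-free data
rng = random.Random(0)
draws = [rng.gauss(0.7, 0.05) for _ in range(500)]  # stand-in for MCMC draws
w, ess = importance_check(draws, t_obs, y_obs, 0.05, coarse_step=0.5, fine_step=0.01)
print(f"ESS = {ess:.1f} of {len(draws)}")
```

A small ESS signals that the coarse solver changed the posterior noticeably, so results obtained with it should not be trusted without refinement.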
A modified partial envelope tensor response regression
Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-09-12 DOI: 10.1002/sta4.615
Wenxing Guo, Narayanaswamy Balakrishnan, Shanshan Qin
The envelope model is a useful statistical technique for multivariate linear regression problems. It aims to remove immaterial information via sufficient dimension reduction techniques while gaining efficiency and providing accurate parameter estimates. Recently, tensor versions of the envelope model have been developed to extend this technique to tensor data. In this work, a partial tensor envelope model is proposed that allows a parsimonious version of tensor response regression when only certain predictors are of interest. The consistency and asymptotic normality of the regression coefficient estimator are also established theoretically, which provides a rigorous foundation for the proposed method. In numerical studies using both simulated and real-world data, the partial tensor envelope model is shown to outperform several existing methods in the efficiency of estimating the regression coefficients associated with the selected predictors.
Citations: 0
New penalty in information criteria for the ARCH sequence with structural changes
Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-09-10 DOI: 10.1002/sta4.612
Ryoto Ozaki, Yoshiyuki Ninomiya
For change point models and autoregressive conditional heteroscedasticity (ARCH) models, which have long been important, especially in econometrics, we develop information criteria that work well even when these models are combined. Since the change point model does not satisfy conventional statistical asymptotics, a formal Akaike information criterion (AIC), whose penalty term is twice the number of parameters, would clearly result in overfitting. Therefore, we derive an AIC-type information criterion from its original definition, using asymptotics peculiar to the change point model. Specifically, we consider time series data of the kind treated in econometrics and derive the Takeuchi information criterion (TIC) as the main information criterion, since it allows for model misspecification. We find that the penalty for the change point parameter is almost three times larger than the penalty for a regular parameter. We also derive the AIC in this setting from the TIC by dropping the allowance for model misspecification. In numerical experiments, the derived TIC and AIC are compared with the formal AIC and the Bayesian information criterion (BIC). The derived criteria clearly outperform the others in light of the original purpose of the AIC, which is to give an estimate close to the true structure. We also find that the TIC appears to be superior to the AIC in the presence of model misspecification.
Citations: 0
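The effect of a separate change-point penalty can be illustrated on a much simpler model than the paper's. The sketch uses an i.i.d. Gaussian series with at most one mean shift, not an ARCH process. Regular parameters receive the usual AIC penalty of 2 each, and the change-point location receives its own penalty `cp_penalty`. The value 6 used below is only a hypothetical reading of "almost three times" the regular penalty; it is not the criterion derived in the paper.

```python
import math


def neg2_loglik(rss, n):
    """-2 * maximised Gaussian log-likelihood with MLE variance rss / n."""
    var = max(rss / n, 1e-12)  # guard against a zero residual sum of squares
    return n * (math.log(2 * math.pi * var) + 1)


def rss(seg):
    m = sum(seg) / len(seg)
    return sum((v - m) ** 2 for v in seg)


def select_change_point(x, cp_penalty, min_seg=2):
    """Choose between no change and one mean shift by a penalised criterion.

    Returns (change index or None, criterion value of the chosen model).
    """
    n = len(x)
    ic0 = neg2_loglik(rss(x), n) + 2 * 2  # one mean + one variance
    best_k, best_rss = None, math.inf
    for k in range(min_seg, n - min_seg + 1):
        r = rss(x[:k]) + rss(x[k:])
        if r < best_rss:
            best_k, best_rss = k, r
    # Two means + one variance, plus the separate change-point penalty.
    ic1 = neg2_loglik(best_rss, n) + 2 * 3 + cp_penalty
    return (best_k, ic1) if ic1 < ic0 else (None, ic0)


# Hypothetical deterministic data: alternating +/-1 noise, shifted by 5 at t = 20.
noise = [1.0 if t % 2 == 0 else -1.0 for t in range(40)]
shifted = [v + (5.0 if t >= 20 else 0.0) for t, v in enumerate(noise)]
print(select_change_point(shifted, cp_penalty=6.0))
print(select_change_point(noise, cp_penalty=6.0))
```

Because the change-point location is chosen by searching over all splits, the fitted likelihood gain is optimistically biased, which is why a penalty larger than 2 for that parameter is plausible.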
On pairwise interaction multivariate Pareto models
Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-09-10 DOI: 10.1002/sta4.613
Michaël Lalancette
The rich class of multivariate Pareto distributions forms the basis of recently introduced extremal graphical models. However, most existing literature on the topic is focused on the popular parametric family of Hüsler–Reiss distributions. It is shown that the Hüsler–Reiss family is in fact the only continuous multivariate Pareto model that exhibits the structure of a pairwise interaction model, justifying its use in many high‐dimensional problems. Along the way, useful insight is obtained concerning a certain class of distributions that generalize the Hüsler–Reiss family, a result of independent interest.
Citations: 1
Conditional mixture modelling for heavy-tailed and skewed data
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-08-30 DOI: 10.1002/sta4.608
Aqi Dong, Volodymyr Melnykov, Yang Wang, Xuwen Zhu
Overparameterization is a serious concern for multivariate mixture models, as it can lead to model overfitting and, as a result, underestimation of the mixture order. Parsimonious modelling is one of the most effective remedies in this context. In Gaussian mixture models, most parameters are associated with covariance matrices, and parsimonious models based on factor analysers and on the spectral decomposition of dispersion parameters are the most popular in the literature. Drawbacks of these models include a lack of flexibility in imposing different covariance structures on individual components and limitations in modelling compact clusters. Recently introduced conditional mixture models provide substantial flexibility in addressing these concerns. The components of such mixtures are formulated as a product of conditional distributions, with univariate Gaussian densities being the primary choice. However, heavy tails or skewness in any dimension can lead to fitting problems. We propose a flexible model that is free of the above-mentioned limitations, which we call a contaminated transformation conditional mixture model, and demonstrate in a series of simulation studies that it can effectively account for skewness and heavy tails. Applications to real-life data sets show good results and highlight the promise of the proposed model.
Citations: 0
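The abstract above describes mixture components built as a product of univariate conditional densities. The minimal sketch below assumes each conditional is Gaussian with a mean that is linear in the preceding coordinates. The parameter names `mu`, `coef` and `sd` are hypothetical. The contaminated and transformed extension proposed in the paper is not implemented.

```python
import math


def log_normal(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2


def component_log_density(x, mu, coef, sd):
    """log f(x) as a product of univariate conditionals:
    x[j] | x[0..j-1] ~ N(mu[j] + sum_l coef[j][l] * x[l], sd[j]**2).
    """
    total = 0.0
    for j in range(len(x)):
        mean = mu[j] + sum(coef[j][l] * x[l] for l in range(j))
        total += log_normal(x[j], mean, sd[j])
    return total


def mixture_log_density(x, weights, components):
    """log of sum_g weights[g] * f_g(x), computed with log-sum-exp."""
    terms = [math.log(w) + component_log_density(x, *comp)
             for w, comp in zip(weights, components)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Two hypothetical bivariate components with different dependence structures.
components = [
    ([0.0, 0.0], [[], [0.5]], [1.0, 1.0]),   # x2 depends on x1
    ([3.0, 3.0], [[], [0.0]], [0.5, 2.0]),   # x1 and x2 independent
]
print(mixture_log_density([0.3, -1.2], [0.6, 0.4], components))
```

Setting an entry of `coef` to zero removes a single dependency, which is one way such a factorization lets each component carry its own covariance structure.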
Likelihood-based inference for linear mixed-effects models using the generalized hyperbolic distribution
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-08-17 DOI: 10.1002/sta4.602
V. H. Lachos, M. Galea, C. Zeller, M. Prates
In this paper, we develop statistical methodology for analysing data under nonnormal distributions in the context of mixed-effects models. Although the multivariate normal distribution is useful in many cases, it is not appropriate when, for instance, the data come from skewed and/or heavy-tailed distributions. To analyse data with these characteristics, we extend the standard linear mixed-effects model by considering the family of generalized hyperbolic distributions. We propose methods for statistical inference based on the likelihood function; because of its complexity, the expectation-maximization (EM) algorithm is used to find the maximum likelihood estimates, with the standard errors and the exact likelihood value obtained as by-products. We use simulations to investigate the asymptotic properties of the EM estimates and the prediction accuracy. A real example is analysed, illustrating the usefulness of the proposed methods.
Citations: 0
Proposed variable sampling interval maximum EWMA and distance EWMA charts with unknown process parameters 提出了过程参数未知的变采样区间最大EWMA和距离EWMA图
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-08-16 DOI: 10.1002/sta4.605
R. Parvin, M. Khoo, S. Saha, W. L. Teoh
The variable sampling interval (VSI) exponentially weighted moving average (EWMA) chart varies its sampling interval according to the current value of the plotting statistic, and so detects shifts faster than the standard EWMA chart. Joint monitoring schemes use a single combined statistic to monitor both the process mean and the process variance. This paper proposes two VSI EWMA schemes with unknown process parameters for simultaneously monitoring the mean and variance of a normally distributed process, based on (i) Maximum (Max) and (ii) Distance (Dis) type combining functions. Each scheme uses a single plotting statistic. Monte Carlo simulation is used to study how parameter estimation affects the performance of the proposed VSI Max EWMA and VSI Dis EWMA schemes. Performance is measured by the average time to signal, the standard deviation of the time to signal, the expected average time to signal and the median time to signal. The results show that the proposed schemes detect process shifts more quickly than the existing Max/Dis Shewhart (SH), Max/Dis cumulative sum (CUSUM) and Max/Dis EWMA schemes. The implementation of the proposed schemes is demonstrated using a commercial dataset.
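The abstract above combines a mean EWMA and a variance EWMA into one plotting statistic, and that statistic also chooses the next sampling interval. The Python sketch below illustrates the idea under stated assumptions. It is not the authors' implementation.

- **Transformations.** The sample mean is standardized. The sample variance is mapped through the chi-square CDF to a normal score, a common construction in the Max-EWMA literature.
- **Unknown parameters.** μ0 and σ0 are replaced by plug-in estimates from Phase I subgroups.
- **Combining statistics.** The Max statistic is max(|U|, |V|). The Dis statistic is taken here as the Euclidean norm √(U² + V²). The paper's exact definitions may differ.
- **Hypothetical choices.** The names `estimate_params` and `vsi_joint_ewma` are made up for this sketch, as are the estimators, the control limit `h`, the warning limit `w`, the intervals `d_short`/`d_long` and the short first interval.

```python
import math
import random
from statistics import NormalDist, fmean, variance

_STD_NORMAL = NormalDist()


def chi2_cdf(q, k):
    """Chi-square CDF via the series for the regularized lower incomplete gamma P(k/2, q/2)."""
    if q <= 0.0:
        return 0.0
    a, x = k / 2.0, q / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= x / (a + n)
        total += term
        n += 1
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * total


def estimate_params(phase1):
    """Hypothetical plug-in estimators: grand mean and root pooled variance of Phase I subgroups."""
    mu_hat = fmean([fmean(s) for s in phase1])
    sigma_hat = math.sqrt(fmean([variance(s) for s in phase1]))
    return mu_hat, sigma_hat


def vsi_joint_ewma(phase2, mu_hat, sigma_hat, combine="max", lam=0.1,
                   h=1.0, w=0.4, d_long=1.9, d_short=0.1):
    """Return (time to signal, sample number) of the first signal, or None.

    Hypothetical defaults: h and w are NOT calibrated to any target in-control ATS.
    """
    u = v = 0.0
    elapsed, interval = 0.0, d_short  # assumption: the first interval is the short one
    for i, s in enumerate(phase2, start=1):
        n = len(s)
        elapsed += interval
        # Standardized mean and normal score of the sample variance,
        # both computed with the estimated (not true) parameters.
        z = (fmean(s) - mu_hat) / (sigma_hat / math.sqrt(n))
        p = chi2_cdf((n - 1) * variance(s) / sigma_hat ** 2, n - 1)
        y = _STD_NORMAL.inv_cdf(min(max(p, 1e-12), 1.0 - 1e-12))
        # EWMA recursions for the mean and variance components
        u = lam * z + (1.0 - lam) * u
        v = lam * y + (1.0 - lam) * v
        # Combine both components into a single plotting statistic
        stat = max(abs(u), abs(v)) if combine == "max" else math.hypot(u, v)
        if stat > h:
            return elapsed, i
        # VSI rule: short interval in the warning region (w, h], long interval otherwise
        interval = d_short if stat > w else d_long
    return None
```

In practice `h` and `w` would be calibrated to a target in-control average time to signal, for example by Monte Carlo simulation. The defaults here are placeholders with no such calibration.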
Citations: 0