Mozhgan Alirezaei Dizicheh, Ehsan Zamanzade, N. Iranpanah
This work deals with the problem of estimating the odds using the judgment post stratification (JPS) sampling design. Several estimators of the odds are described and the asymptotic normality of each is established. A Monte Carlo simulation study is then used to compare the different JPS estimators of the odds with the standard estimator under simple random sampling (SRS) with replacement, for both perfect and imperfect ranking and for JPS data with and without empty strata. The comparison results indicate that the estimators developed here can be considerably more efficient than their SRS counterpart in certain circumstances. Finally, a real dataset from the third National Health and Nutrition Examination Survey (NHANES III) is employed for illustration purposes.
{"title":"Efficient estimation of the odds using judgment post stratification","authors":"Mozhgan Alirezaei Dizicheh, Ehsan Zamanzade, N. Iranpanah","doi":"10.1214/20-BJPS481","DOIUrl":"https://doi.org/10.1214/20-BJPS481","url":null,"abstract":"This work deals with problem of estimating the odds using judgment post stratification (JPS) sampling design. Several estimators of the odds are described and the asymptotic normality of each of them is established. Monte Carlo simulation study is then used to compare different estimators of the odds in the JPS with the standard estimator in simple random sampling (SRS) with replacement for both perfect/imperfect ranking and for both JPS data with/without empty strata. The comparison results indicate that the estimators developed here can be highly more efficient than their SRS counterpart in some certain circumstances. Finally, a real dataset from the third National Health and Nutrition Examination Survey (NHANES III) is employed for illustration purposes.","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":"35 1","pages":"375-391"},"PeriodicalIF":1.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43014897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
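The comparison described in this abstract can be illustrated with a minimal sketch. It assumes the plug-in form of the odds, p / (1 - p), and the usual JPS proportion estimator that averages the per-stratum sample proportions over the non-empty strata; the paper's own estimators may differ. All function names, and the latent uniform variable used for perfect ranking, are hypothetical choices made for illustration.

```python
import random


def odds(p):
    """Odds p / (1 - p) for a proportion 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return p / (1.0 - p)


def srs_odds(y):
    """Plug-in odds estimate from a simple random sample of 0/1 outcomes."""
    return odds(sum(y) / len(y))


def jps_proportion(y, ranks, set_size):
    """Average the per-stratum sample proportions over the non-empty strata."""
    totals = [0] * set_size
    counts = [0] * set_size
    for yi, r in zip(y, ranks):
        totals[r - 1] += yi
        counts[r - 1] += 1
    means = [t / c for t, c in zip(totals, counts) if c > 0]
    return sum(means) / len(means)


def jps_odds(y, ranks, set_size):
    """Plug-in odds estimate based on the JPS proportion (assumed form)."""
    return odds(jps_proportion(y, ranks, set_size))


def draw_jps_sample(n, set_size, threshold, rng):
    """Simulate JPS data with perfect ranking on a latent uniform variable.

    Each measured unit is ranked among itself and set_size - 1 extra units;
    the binary outcome is 1 when the latent value exceeds the threshold.
    """
    y, ranks = [], []
    for _ in range(n):
        x = rng.random()
        others = [rng.random() for _ in range(set_size - 1)]
        ranks.append(1 + sum(o < x for o in others))
        y.append(1 if x > threshold else 0)
    return y, ranks
```

Note that the plug-in odds is undefined when the estimated proportion equals 1, which is why `odds` raises an error in that case; a simulation study would need a rule for such samples.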
We study the probability that one beta-distributed random variable exceeds the maximum of two others, allowing all three to have general parameters. This amounts to studying Euler transforms of products of two incomplete beta functions. We provide a closed form for the general problem in terms of Kampé de Fériet functions and a variety of simpler closed forms in special cases. The results are applied to derive the moments of the maximum of two independent beta-distributed random variables and to find inner products of incomplete beta functions. Restricted to positive integer parameters, our results are applied to determine an expected exit time for a conditioned random walk and also to a combinatorial problem of enumerating strings composed of three different letters, subject to constraints.
{"title":"Integrals of incomplete beta functions, with applications to order statistics, random walks and string enumeration","authors":"Stephen B. Connor, C. Fewster","doi":"10.1214/21-bjps522","DOIUrl":"https://doi.org/10.1214/21-bjps522","url":null,"abstract":"Abstract. We study the probability that one beta-distributed random variable exceeds the maximum of two others, allowing all three to have general parameters. This amounts to studying Euler transforms of products of two incomplete beta functions. We provide a closed form for the general problem in terms of Kampé de Fériet functions and a variety of simpler closed forms in special cases. The results are applied to derive the moments of the maximum of two independent beta-distributed random variables and to find inner products of incomplete beta functions. Restricted to positive integer parameters, our results are applied to determine an expected exit time for a conditioned random walk and also to a combinatorial problem of enumerating strings comprised of three different letters, subject to constraints.","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":" ","pages":""},"PeriodicalIF":1.0,"publicationDate":"2021-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44289493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
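The central quantity of this abstract, the probability that one beta variable exceeds the maximum of two others, can be checked numerically. The sketch below is a plain Monte Carlo estimate, not the closed form from the paper; it assumes the three variables are independent, and the function name and defaults are hypothetical.

```python
import random


def prob_beta_exceeds_max(a1, b1, a2, b2, a3, b3, n_draws=100_000, seed=0):
    """Monte Carlo estimate of P(X1 > max(X2, X3)) for independent betas.

    X1 ~ Beta(a1, b1), X2 ~ Beta(a2, b2), X3 ~ Beta(a3, b3).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        x1 = rng.betavariate(a1, b1)
        x2 = rng.betavariate(a2, b2)
        x3 = rng.betavariate(a3, b3)
        if x1 > max(x2, x3):
            hits += 1
    return hits / n_draws
```

Two simple cases give a sanity check. When the three variables are identically distributed, the probability is 1/3 by symmetry. When X1 ~ Beta(a, 1) and the other two are uniform, conditioning on X1 = x gives probability x^2, so the answer is E[X1^2] = a / (a + 2).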
We establish some limit theorems for quasi-arithmetic means of random variables. This class of means contains the arithmetic, geometric and harmonic means. A distinctive feature of our approach is that the generators of quasi-arithmetic means are allowed to be complex-valued, which makes it possible to consider quasi-arithmetic means of random variables that may take negative values. Our motivation for the limit theorems is finding simple estimators of the parameters of the Cauchy distribution. By applying the limit theorems, we obtain some closed-form, unbiased, strongly consistent estimators of the location and scale parameters of the Cauchy distribution jointly, which are easy to compute and analyze.
{"title":"Limit theorems for quasi-arithmetic means of random variables with applications to point estimations for the Cauchy distribution","authors":"Y. Akaoka, K. Okamura, Y. Otobe","doi":"10.1214/22-BJPS531","DOIUrl":"https://doi.org/10.1214/22-BJPS531","url":null,"abstract":"Abstract. We establish some limit theorems for quasi-arithmetic means of random variables. This class of means contains the arithmetic, geometric and harmonic means. Our feature is that the generators of quasiarithmetic means are allowed to be complex-valued, which makes considerations for quasi-arithmetic means of random variables which could take negative values possible. Our motivation for the limit theorems is finding simple estimators of the parameters of the Cauchy distribution. By applying the limit theorems, we obtain some closed-form unbiased strongly-consistent estimators for the joint of the location and scale parameters of the Cauchy distribution, which are easy to compute and analyze.","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":" ","pages":""},"PeriodicalIF":1.0,"publicationDate":"2021-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42558076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
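As a rough illustration of the objects in this abstract, a quasi-arithmetic mean with generator f is f^{-1} applied to the average of f(x_i). The sketch below recovers the arithmetic, geometric and harmonic means, then uses one complex-valued generator, f(x) = 1 / (x + i*alpha) with alpha > 0, on Cauchy data. That generator is an illustrative assumption and not necessarily the one used in the paper. It relies on the fact that for X ~ Cauchy(mu, sigma) the expectation of 1 / (X + i*alpha) equals 1 / (mu + i*sigma + i*alpha), so the resulting mean converges to mu + i*sigma. All function names are hypothetical.

```python
import math
import random


def quasi_arithmetic_mean(xs, f, f_inv):
    """Return f_inv(average of f(x)) for a generator f with inverse f_inv."""
    return f_inv(sum(f(x) for x in xs) / len(xs))


def arithmetic_mean(xs):
    return quasi_arithmetic_mean(xs, lambda x: x, lambda y: y)


def geometric_mean(xs):
    return quasi_arithmetic_mean(xs, math.log, math.exp)


def harmonic_mean(xs):
    return quasi_arithmetic_mean(xs, lambda x: 1.0 / x, lambda y: 1.0 / y)


def cauchy_sample(mu, sigma, n, rng):
    """Draw n Cauchy(mu, sigma) values by inverting the distribution function."""
    return [mu + sigma * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def shifted_harmonic_estimate(xs, alpha=1.0):
    """Quasi-arithmetic mean with the complex generator 1 / (x + i*alpha).

    The real part estimates the location and the imaginary part the scale
    (illustrative generator, assumed for this sketch).
    """
    shift = 1j * alpha
    return quasi_arithmetic_mean(
        xs, lambda x: 1.0 / (x + shift), lambda y: 1.0 / y - shift
    )
```

Because the generator is bounded on the real line, the averaged terms have finite variance even though the Cauchy distribution itself has no mean, which is what makes such an estimator well behaved.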
A multivariate mixed-effects model seems to be the most appropriate for gene expression data collected in a crossover trial. It is, however, difficult to obtain reliable results using standard statistical inference when some responses are missing. Missingness is a particularly serious concern for crossover studies, because such trials involve only a small number of participants. A Monte Carlo EM (MCEM)-based technique is adopted to deal with this situation. In addition to estimation, MCEM likelihood ratio tests (LRTs) are developed to test fixed effects in crossover models with missing data. Intensive simulation studies were conducted prior to analyzing the gene expression data.
{"title":"Likelihood-based missing data analysis in crossover trials","authors":"S. Pareek, K. Das, S. Mukhopadhyay","doi":"10.1214/23-bjps570","DOIUrl":"https://doi.org/10.1214/23-bjps570","url":null,"abstract":"A multivariate mixed-effects model seems to be the most appropriate for gene expression data collected in a crossover trial. It is, however, difficult to obtain reliable results using standard statistical inference when some responses are missing. Particularly for crossover studies, missingness is a serious concern as the trial requires a small number of participants. A Monte Carlo EM (MCEM)-based technique was adopted to deal with this situation. In addition to estimation, MCEM likelihood ratio tests (LRTs) are developed to test fixed effects in crossover models with missing data. Intensive simulation studies were conducted prior to analyzing gene expression data.","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":" ","pages":""},"PeriodicalIF":1.0,"publicationDate":"2021-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43356823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Roohollah Roozegar, N. Balakrishnan, A. Bekker, A. Jamalizadeh
In this paper, we establish some results for multivariate selection scale-mixtures of normal distributions with arbitrary mixing variable. First, we discuss their stochastic representation in terms of multivariate selection normal distributions. Next, the conditional distributions as well as the first two moments of multivariate selection scale-mixtures of normal distributions are obtained when the selection set is an arbitrary rectangle in the q-dimensional Euclidean space $\mathbb{R}^q$. The unified skew-scale mixture of normal (SUSMN) distributions are subsequently discussed as a special case. As a subclass of SUSMN distributions, the class of unified skew-symmetric generalized hyperbolic (SUSGH) distributions is studied in detail. Finally, we show that our results can be used to obtain moments of L-statistics and of multivariate concomitants from multivariate scale-mixtures of normal distributions.
{"title":"On multivariate selection scale-mixtures of normal distributions","authors":"Roohollah Roozegar, N. Balakrishnan, A. Bekker, A. Jamalizadeh","doi":"10.1214/20-BJPS478","DOIUrl":"https://doi.org/10.1214/20-BJPS478","url":null,"abstract":"In this paper, we establish some results for multivariate selection scale-mixtures of normal distributions with arbitrary mixing variable. First, we discuss their stochastic representation in terms of multivariate selection normal distributions. Next, the conditional distributions as well as the first two moments of multivariate selection scale-mixtures of normal distributions are obtained when the selection set is an arbitrary rectangle in the q-dimensional Euclidean space of R. The unified skew-scale mixture of normal (SUSMN) distributions are subsequently discussed as a special case. As a subclass of SUSMN distributions, the class of unified skew-symmetric generalized hyperbolic (SUSGH) distributions are studied in detail. Finally, we show that our results can be used to obtain moments of L-statistics and of multivariate concomitants from multivariate scale-mixtures of normal distributions.","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":"35 1","pages":""},"PeriodicalIF":1.0,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41961519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we consider a semiparametric regression model where the error follows a scale mixture of Gaussian distributions. The aim is to estimate the target function, which is assumed to belong to some class of functions, using the EM algorithm and approximations via P-splines and B-splines. We illustrate the proposed methodology through several simulation studies. Other forms of function approximation are also studied, namely Fourier and wavelet expansions.
{"title":"Estimation of semiparametric models with errors following a scale mixture of Gaussian distributions","authors":"Marcelo M. Taddeo, P. Morettin","doi":"10.1214/20-BJPS476","DOIUrl":"https://doi.org/10.1214/20-BJPS476","url":null,"abstract":"In this paper we consider a semiparametric regression model where the error follows a scale mixture of Gaussian distributions. The purpose is to estimate the target function which is assumed to belong to some class of functions using the EM algorithm and approximations via P -splines and B-splines. We illustrate the proposed methodology through several simulation studies. Other forms of function approximation are also studied, namely Fourier and wavelet expansions.","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":" ","pages":""},"PeriodicalIF":1.0,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47363127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, there has been growing interest in integer-valued time series models, including integer-valued autoregressive (INAR) models and integer-valued generalized autoregressive conditional heteroscedastic (INGARCH) models, but only a few of them can deal with data on the full set of integers, i.e., $\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$. Although some attempts have been made to deal with $\mathbb{Z}$-valued time series, these models do not provide enough flexibility in modeling some specific integers (e.g. 0, ±1). A symmetric Skellam INGARCH(1,1) model was proposed in the literature, but it only considered zero-mean processes, which limits its application. We first extend the symmetric Skellam INGARCH model to an asymmetric version, which can deal with non-zero-mean processes. Then we propose a modified Skellam model which treats the integers 0 and ±1 carefully to satisfy a special feature of the data. Our models are easy to use and flexible. The maximum likelihood method is used to estimate unknown parameters, and the log-likelihood ratio test statistic is provided for testing the asymmetric model against the modified one. Simulation studies are given to evaluate the performance of the parametric estimation and the log-likelihood ratio test. A real data example is also presented to demonstrate the good performance of the newly proposed models.
{"title":"Modeling Z-valued time series based on new versions of the Skellam INGARCH model","authors":"Yan Cui, Qi Li, Fukang Zhu","doi":"10.1214/20-BJPS473","DOIUrl":"https://doi.org/10.1214/20-BJPS473","url":null,"abstract":"Recently, there has been a growing interest in integer-valued time series models, including integer-valued autoregressive (INAR) models and integer-valued generalized autoregressive conditional heteroscedastic (INGARCH) models, but only a few of them can deal with data on the full set of integers, i.e., Z = {...,−2,−1, 0, 1, 2, ...}. Although some attempts have been made to deal with Z-valued time series, these models do not provide enough flexibility in modeling some specific integers (e.g. 0, ±1). A symmetric Skellam INGARCH(1,1) model was proposed in the literature, but it only considered zero-mean processes, which limits its application. We first extend the symmetric Skellam INGARCH model to an asymmetric version, which can deal with non-zero-mean processes. Then we propose a modified Skellam model which adopts a careful treatment on integers 0 and ±1 to satisfy a special feature of the data. Our models are easy-to-use and flexible. The maximum likelihood method is used to estimate unknown parameters and the log-likelihood ratio test statistic is provided for testing the asymmetric model against the modified one. Simulation studies are given to evaluate performances of the parametric estimation and log-likelihood ratio test. A real data example is also presented to demonstrate good performances of newly proposed models.","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":" ","pages":""},"PeriodicalIF":1.0,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42589973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many extreme events are susceptible to outside influences that modify their behavior at some point in time. Change point methods have been used in statistical models to detect when these changes occur. This paper presents a model based on a Bayesian approach that describes the behavior of extreme data regarding river quota, which may present more than one change point. In each regime, a GEV distribution is fitted and each GEV parameter of each regime is written as a function of covariates. In the applications proposed here, the results showed that the model was able to accurately estimate the actual number of change points in the series. They also showed that it was extremely important to account for the change points in the analysis, since the return levels changed considerably after a change of regime. The results also indicated in which months extreme events are more likely to occur.
{"title":"Regression models for change point data in extremes","authors":"F. Nascimento, Alan da Silva Assunção","doi":"10.1214/20-bjps488","DOIUrl":"https://doi.org/10.1214/20-bjps488","url":null,"abstract":"Many extreme events are characterized by being always susceptible to outside influences that will modify their behavior at some point in time. The change point tool has been used in statistical models to detect when these changes occur. This paper presents a model based on a Bayesian approach that describes the behavior of extreme data regarding river quota, which may present more than one change point. In each one of the regimes, the GEV distribution is adjusted and each GEV parameter of each regime is written in function of presence of covariates. In the applications proposed here, the results showed that the model was able to accurately estimate the actual amount of change points in the series, and also showed that it was extremely important to consider them in the analysis, since it was verified that after the change of regime, the levels of return have changed considerably. The results were also able to show which months the occurrence of an extreme event is greater.","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":"35 1","pages":"85-100"},"PeriodicalIF":1.0,"publicationDate":"2021-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45786525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The process capability index (PCI) is one of the indicators used to evaluate the potential and performance of a process effectively. It is of great significance to quality control engineers as it quantifies the relation between the actual performance of the process and the pre-set specifications of the product. Most traditional PCIs perform well when the process follows a normal distribution. In this article, we consider a process capability index, $C_{pk}$, suggested by Kane (Journal of Quality Technology 18 (1986) 41–52), which can be used for normal random variables. The objective of this article is threefold. First, we address different frequentist methods of estimating the process capability index $C_{pk}$ for the normal distribution. We briefly describe the different frequentist approaches, namely, maximum likelihood estimators, least squares and weighted least squares estimators, maximum product of spacings estimators, Cramér–von Mises estimators, Anderson–Darling estimators and right-tail Anderson–Darling estimators, and compare them in terms of their mean squared errors using extensive numerical simulations. Second, we compare three parametric bootstrap confidence intervals (BCIs), namely, standard bootstrap, percentile bootstrap and bias-corrected percentile bootstrap. Third, we consider Bayesian estimation under the squared error loss function, using a normal prior for the location parameter and an inverse gamma prior for the scale parameter of the considered model. A Monte Carlo simulation study has been carried out to compare the performance of the classical BCIs and the highest posterior density (HPD) credible intervals of $C_{pk}$ in terms of average widths and coverage probabilities. Finally, two real data sets have been analyzed for illustrative purposes.
{"title":"Confidence intervals of the index $C_{pk}$ for normally distributed quality characteristics using classical and Bayesian methods of estimation","authors":"Mahendra Saha, S. Dey, A. Yadav, Sajid Ali","doi":"10.1214/20-bjps469","DOIUrl":"https://doi.org/10.1214/20-bjps469","url":null,"abstract":"One of the indicators for evaluating the capability of a process potential and performance in an effective way is the process capability index (PCI). It is of great significance to quality control engineers as it quantifies the relation between the actual performance of the process and the pre-set specifications of the product. Most of the traditional PCIs performed well when process follows the normal behaviour. In this article, we consider a process capability index, $C_{pk}$, suggested by Kane (Journal of Quality Technology 18 (1986) 41–52) which can be used for normal random variables. The objective of this article is three fold: First, we address different methods of estimation of the process capability index $C_{pk}$ from frequentist approaches for the normal distribution. We briefly describe different frequentist approaches, namely, maximum likelihood estimators, least squares and weighted least squares estimators, maximum product of spacings estimators, Cramer–von-Mises estimators, Anderson–Darling estimators and Right-Tail Anderson–Darling estimators and compare them in terms of their mean squared errors using extensive numerical simulations. Second, we compare three parametric bootstrap confidence intervals (BCIs) namely, standard bootstrap, percentile bootstrap and bias-corrected percentile bootstrap. Third, we consider Bayesian estimation under squared error loss function using normal prior for location parameter and inverse gamma for scale parameter for the considered model. Monte Carlo simulation study has been carried out to compare the performances of the classical BCIs and highest posterior density (HPD) credible intervals of $C_{pk}$ in terms of average widths and coverage probabilities. Finally, two real data sets have been analyzed for illustrative purposes.","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":"1 1","pages":""},"PeriodicalIF":1.0,"publicationDate":"2021-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41350479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
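For readers unfamiliar with the index, Kane's $C_{pk}$ is min(USL - mu, mu - LSL) / (3 sigma), where LSL and USL are the lower and upper specification limits. The sketch below computes the maximum likelihood plug-in estimate under normality and a parametric percentile bootstrap interval. It is only one of the several interval types named in the abstract, and the function names, the number of replicates and the quantile indexing rule are assumptions made for illustration.

```python
import random
import statistics


def cpk(mu, sigma, lsl, usl):
    """Kane's index: distance from the mean to the nearer limit, over 3 sigma."""
    return min(usl - mu, mu - lsl) / (3.0 * sigma)


def cpk_mle(data, lsl, usl):
    """Plug-in estimate using the normal MLEs (sample mean, divisor-n std)."""
    mu_hat = statistics.fmean(data)
    sigma_hat = statistics.pstdev(data)
    return cpk(mu_hat, sigma_hat, lsl, usl)


def percentile_bootstrap_ci(data, lsl, usl, n_boot=2000, level=0.95, seed=0):
    """Parametric percentile bootstrap interval for C_pk.

    Resamples are drawn from the fitted normal distribution, the index is
    re-estimated on each, and empirical quantiles of the replicates are used.
    """
    rng = random.Random(seed)
    n = len(data)
    mu_hat = statistics.fmean(data)
    sigma_hat = statistics.pstdev(data)
    reps = []
    for _ in range(n_boot):
        resample = [rng.gauss(mu_hat, sigma_hat) for _ in range(n)]
        reps.append(cpk_mle(resample, lsl, usl))
    reps.sort()
    alpha = 1.0 - level
    lower = reps[int((alpha / 2.0) * n_boot)]
    upper = reps[min(n_boot - 1, int((1.0 - alpha / 2.0) * n_boot))]
    return lower, upper
```

A typical call would be `percentile_bootstrap_ci(measurements, lsl=7.0, usl=13.0)`, where `measurements` is a list of observed quality characteristics and the limits are hypothetical values.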
{"title":"Preface","authors":"","doi":"10.1214/20-bjps351pre","DOIUrl":"https://doi.org/10.1214/20-bjps351pre","url":null,"abstract":"","PeriodicalId":51242,"journal":{"name":"Brazilian Journal of Probability and Statistics","volume":" ","pages":""},"PeriodicalIF":1.0,"publicationDate":"2021-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41523855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}