Pub Date: 2021-03-04 DOI: 10.1186/s40488-021-00113-4
Vladimir Vladimirovich Vinogradov, Richard Bruce Paris
We introduce two extensions of the canonical Feller–Spitzer distribution from the class of Bessel densities, which comprise two distinct stochastically decreasing one-parameter families of positive absolutely continuous infinitely divisible distributions with monotone densities, whose upper tails exhibit a power decay. The densities of the members of the first class are expressed in terms of the modified Bessel function of the first kind, whereas the members of the second class have the densities of their Lévy measure given in terms of the same function. The Laplace transforms for both these families possess closed-form representations in terms of specific hypergeometric functions. We obtain the explicit expressions by virtue of the particular parameter value for the moments of the distributions considered and establish the monotonicity of the mean, variance, skewness and excess kurtosis within the families. We derive numerous properties of members of these classes by employing both new and previously known properties of the special functions involved and determine the variance function for the natural exponential family generated by a member of the second class.
Title: On two extensions of the canonical Feller–Spitzer distribution. Journal: Journal of Statistical Distributions and Applications.
Pub Date: 2021-02-26 DOI: 10.1186/s40488-021-00114-3
Francesco Zuniga, Tomasz J. Kozubowski, Anna K. Panorska
We study the joint distribution of stochastic events described by (X,Y,N), where N has a 1-inflated (or deflated) geometric distribution and X, Y are the sum and the maximum of N exponential random variables. Models with similar structure have been used in several areas of applications, including actuarial science, finance, and weather and climate, where such events naturally arise. We provide basic properties of this class of multivariate distributions of mixed type, and discuss their applications. Our results include marginal and conditional distributions, joint integral transforms, moments and related parameters, stochastic representations, estimation and testing. An example from finance illustrates the modeling potential of this new model.
Title: A new trivariate model for stochastic episodes. Journal: Journal of Statistical Distributions and Applications.
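The (X, Y, N) construction in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `sample_episode` and the parametrization (a mixing weight `q` on the point N = 1, a geometric success probability `p`, an exponential rate `rate`) are hypothetical choices made for illustration and are not taken from the paper, whose 1-inflated (or deflated) geometric law may be parametrized differently.

```python
import random


def sample_episode(p, q, rate, rng):
    """Draw one (X, Y, N) triple.

    Assumed, illustrative parametrization (not the paper's):
    with probability q the count N is forced to 1; otherwise N is
    geometric on {1, 2, ...} with success probability p.
    X is the sum and Y the maximum of N i.i.d. exponential variables.
    """
    if rng.random() < q:
        n = 1
    else:
        n = 1
        # Count trials until the first success.
        while rng.random() >= p:
            n += 1
    durations = [rng.expovariate(rate) for _ in range(n)]
    return sum(durations), max(durations), n


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        x, y, n = sample_episode(p=0.3, q=0.2, rate=1.0, rng=rng)
        print(f"N={n}  X={x:.3f}  Y={y:.3f}")
```

In every draw Y cannot exceed X, and the two coincide whenever N = 1, which is why the joint law is of mixed type: part of the mass sits on the line X = Y.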
Pub Date: 2021-02-21 DOI: 10.1186/s40488-021-00115-2
Kimberly F. Sellers, Ali Arab, Sean Melville, Fanyu Cui
Al-Osh and Alzaid (1988) consider a Poisson moving average (PMA) model to describe the relation among integer-valued time series data; this model, however, is constrained by the underlying equi-dispersion assumption for count data (i.e., that the variance and the mean are equal). This work instead introduces a flexible integer-valued moving average model for count data that contain over- or under-dispersion via the Conway-Maxwell-Poisson (CMP) distribution and related distributions. This first-order sum-of-Conway-Maxwell-Poissons moving average (SCMPMA(1)) model offers a generalizable construct that includes the PMA (among others) as a special case. We highlight the SCMPMA model properties and illustrate its flexibility via simulated data examples.
Title: A flexible univariate moving average time-series model for dispersed count data. Journal: Journal of Statistical Distributions and Applications.
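The equi-dispersion constraint mentioned in the abstract above can be seen in a short simulation of a first-order Poisson moving average. The sketch assumes the usual binomial-thinning form X_t = α∘ε_{t-1} + ε_t with Poisson(λ) innovations; the function names (`poisson`, `thin`, `simulate_pma1`) are illustrative, and the code does not implement the SCMPMA(1) model itself.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def thin(alpha, n, rng):
    """Binomial thinning: keep each of n units independently with probability alpha."""
    return sum(1 for _ in range(n) if rng.random() < alpha)


def simulate_pma1(lam, alpha, length, rng):
    """Simulate X_t = thin(alpha, eps_{t-1}) + eps_t with eps_t ~ Poisson(lam)."""
    series = []
    eps_prev = poisson(lam, rng)
    for _ in range(length):
        eps = poisson(lam, rng)
        series.append(thin(alpha, eps_prev, rng) + eps)
        eps_prev = eps
    return series


if __name__ == "__main__":
    xs = simulate_pma1(lam=2.0, alpha=0.5, length=20000, rng=random.Random(1))
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
    print(f"sample mean {mean:.3f}, sample variance {var:.3f}")
```

Under these assumptions the marginal distribution is Poisson with mean λ(1 + α), so the sample mean and variance should be close to each other; that is the restriction a CMP-based model is meant to relax.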
Pub Date: 2021-01-01 Epub Date: 2021-06-24 DOI: 10.1186/s40488-021-00121-4
Cindy Xin Feng
Count data with excessive zeros are frequently encountered in practice. For example, the number of health services visits often includes many zeros representing the patients with no utilization during a follow-up time. A common feature of this type of data is that the count measure tends to have more zeros than a common count distribution, such as the Poisson or negative binomial, can accommodate. Zero-inflated (ZI) or hurdle models are often used to fit such data. Despite the increasing popularity of ZI and hurdle models, there is still a lack of investigation of the fundamental differences between these two types of models. In this article, we reviewed the zero-inflated and hurdle models and highlighted their differences in terms of their data generating processes. We also conducted simulation studies to evaluate the performances of both types of models. The final choice of regression model should be made after a careful assessment of goodness of fit and should be tailored to the particular data in question.
Title: A comparison of zero-inflated and hurdle models for modeling zero-inflated count data. Journal: Journal of Statistical Distributions and Applications. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8570364/pdf/
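The difference in data generating processes that the abstract above highlights can be sketched for the Poisson case. In a zero-inflated model, zeros come from two sources (a structural-zero component and the count distribution itself); in a hurdle model, all zeros come from the hurdle and positive counts follow a zero-truncated distribution. The function names and the parameters `pi` and `lam` below are illustrative assumptions, not code from the article.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_zip(pi, lam, rng):
    """Zero-inflated Poisson: structural zero with probability pi, else Poisson(lam)."""
    if rng.random() < pi:
        return 0
    return poisson(lam, rng)  # may itself be zero (a sampling zero)


def sample_hurdle(pi, lam, rng):
    """Hurdle Poisson: zero with probability pi, else zero-truncated Poisson(lam)."""
    if rng.random() < pi:
        return 0
    while True:  # rejection sampling of the zero-truncated Poisson
        k = poisson(lam, rng)
        if k > 0:
            return k


if __name__ == "__main__":
    rng = random.Random(2)
    n, pi, lam = 20000, 0.3, 1.5
    zip_zeros = sum(sample_zip(pi, lam, rng) == 0 for _ in range(n)) / n
    hurdle_zeros = sum(sample_hurdle(pi, lam, rng) == 0 for _ in range(n)) / n
    print(f"ZIP zero fraction    {zip_zeros:.3f}")
    print(f"hurdle zero fraction {hurdle_zeros:.3f}")
```

With the same `pi` and `lam`, the zero-inflated sampler has zero probability pi + (1 - pi)·exp(-lam), while the hurdle sampler has zero probability exactly pi, so the parameter `pi` does not mean the same thing in the two models.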
Pub Date: 2020-11-26 DOI: 10.1186/s40488-020-00112-x
Haigang Liu, David B. Hitchcock, S. Zahra Samadi
To investigate the relationship between flood gage height and precipitation in South Carolina from 2012 to 2016, we built a conditional autoregressive (CAR) model using a Bayesian hierarchical framework. This approach allows the modelling of the main spatio-temporal properties of water height dynamics over multiple locations, accounting for the effect of river network, geomorphology, and forcing rainfall. In this respect, a proximity matrix based on watershed information was used to capture the spatial structure of gage height measurements in and around South Carolina. The temporal structure was handled by a first-order autoregressive term in the model. Several covariates, including the elevation of the sites and effects of seasonality, were examined, along with daily rainfall amount. A non-normal error structure was used to account for the heavy-tailed distribution of maximum gage heights. The proposed model captured some key features of the flood process such as seasonality and a stronger association between precipitation and flooding during the summer season. The model is able to forecast short-term flood gage height, which is crucial for informed emergency decisions. As a byproduct, we also developed a Python library to retrieve and handle environmental data provided by some main agencies in the United States. This library can be of general usefulness for studies requiring rainfall, flow, and geomorphological information over specific areas of the conterminous US.
Title: Spatio-temporal analysis of flood data from South Carolina. Journal: Journal of Statistical Distributions and Applications.
Pub Date: 2020-10-28 DOI: 10.1186/s40488-020-00111-y
Hsin-Hsiung Huang, Jie Yang
Title: Affine-transformation invariant clustering models. Journal: Journal of Statistical Distributions and Applications.
Pub Date: 2020-10-19 DOI: 10.1186/s40488-020-00109-6
Chang Yu, Daniel Zelterman
We develop the distribution for the number of hypotheses found to be statistically significant using the rule from Simes (Biometrika 73: 751–754, 1986) for controlling the family-wise error rate (FWER). We find the distribution of the number of statistically significant p-values under the null hypothesis and show this follows a normal distribution under the alternative. We propose a parametric distribution ΨI(·) to model the marginal distribution of p-values sampled from a mixture of null uniform and non-uniform distributions under different alternative hypotheses. The ΨI distribution is useful when there are many different alternative hypotheses and these are not individually well understood. We fit ΨI to data from three cancer studies and use it to illustrate the distribution of the number of notable hypotheses observed in these examples. We model dependence in sampled p-values using a latent variable. These methods can be combined to illustrate a power analysis in planning a larger study on the basis of a smaller pilot experiment.
Title: Distributions associated with simultaneous multiple hypothesis testing. Journal: Journal of Statistical Distributions and Applications.
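As a rough illustration of the kind of count the abstract above refers to, the sketch below compares the ordered p-values p_(1) ≤ ... ≤ p_(m) with the Simes thresholds iα/m. It rests on an assumption: the "number of significant hypotheses" is taken here to be the largest i with p_(i) ≤ iα/m, which is one common step-up reading of the Simes rule and may differ from the exact definition used in the paper. The function name `simes_count` is hypothetical.

```python
def simes_count(pvalues, alpha=0.05):
    """Largest i such that the i-th smallest p-value is <= i * alpha / m.

    Returns 0 when no ordered p-value falls at or below its Simes threshold,
    in which case the Simes global test does not reject at level alpha.
    Assumed step-up reading of the rule; not taken from the paper.
    """
    m = len(pvalues)
    count = 0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= i * alpha / m:
            count = i
    return count


if __name__ == "__main__":
    pvals = [0.2, 0.001, 0.04, 0.012]
    print(simes_count(pvals, alpha=0.05))
```

For the four p-values in the usage example the thresholds are 0.0125, 0.025, 0.0375 and 0.05; the two smallest p-values fall below their thresholds and the other two do not, so the count is 2.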
Pub Date: 2020-10-06 DOI: 10.1186/s40488-020-00110-z
Fadal A.A. Aldhufairi, Jungsywan H. Sepanski
This paper introduces a new family of bivariate copulas constructed using a unit Weibull distortion. Existing copulas play the role of the base or initial copulas that are transformed or distorted into a new family of copulas with additional parameters, allowing more flexibility and better fit to data. We present a general form for the new bivariate copula function and its conditional distribution and density. The tail behaviors are investigated and indicate that the unit Weibull distortion may result in new copulas with upper tail dependence when the base copula has no upper tail dependence. The concordance ordering and Kendall’s tau are derived for the cases when the base copulas are Archimedean, such as the Clayton and Frank copulas. The Loss-ALEA data are analyzed to evaluate the performance of the proposed new families of copulas.
Title: New families of bivariate copulas via unit Weibull distortion. Journal: Journal of Statistical Distributions and Applications.
Pub Date: 2020-09-07 DOI: 10.1186/s40488-020-00107-8
Mohammad A. Aljarrah, Felix Famoye, Carl Lee
A new generalized asymmetric logistic distribution is defined. In some cases, existing three-parameter distributions provide poor fit to heavy-tailed data sets. The proposed new distribution consists of only three parameters and is shown to fit a much wider range of heavy left- and right-tailed data when compared with various existing distributions. The new generalized distribution has the logistic, maximum Gumbel and minimum Gumbel distributions as sub-models. Some properties of the new distribution including mode, skewness, kurtosis, hazard function, and moments are studied. We propose the method of maximum likelihood to estimate the parameters and assess the finite-sample performance of the method. A generalized logistic regression model, based on the new distribution, is presented. Logistic-log-logistic regression, Weibull-extreme value regression and log-Fréchet regression are special cases of the generalized logistic regression model. The model is applied to fit the failure times of a new insulation technique and the survival times from a heart transplant study.
Title: Generalized logistic distribution and its regression model. Journal: Journal of Statistical Distributions and Applications.
Pub Date: 2020-09-05 DOI: 10.1186/s40488-020-00106-9
Jose Guardiola
Title: The spherical-Dirichlet distribution. Journal: Journal of Statistical Distributions and Applications.