A generalization to the log-inverse Weibull distribution and its applications in cancer research
Journal of Statistical Distributions and Applications. Pub Date: 2021-12-12. DOI: 10.1186/s40488-021-00116-1
C. Satheesh Kumar, Subha R. Nair
In this paper we consider a generalization of a log-transformed version of the inverse Weibull distribution. Several theoretical properties of the distribution are studied in detail, including expressions for its probability density function, reliability function, hazard rate function, quantile function, characteristic function, raw moments, percentile measures, entropy measures, median, and mode. Certain structural properties of the distribution are obtained, along with expressions for reliability measures and for the distribution and moments of order statistics. We also discuss maximum likelihood estimation of the parameters of the proposed distribution and illustrate the usefulness of the model through real-life examples. In addition, the asymptotic behaviour of the maximum likelihood estimators is examined with the help of simulated data sets.
Approximations of conditional probability density functions in Lebesgue spaces via mixture of experts models
Journal of Statistical Distributions and Applications. Pub Date: 2021-08-06. DOI: 10.1186/s40488-021-00125-0
H. Nguyen, TrungTin Nguyen, Faicel Chamroukhi, Geoffrey John McLachlan
Structural properties of generalised Planck distributions
Journal of Statistical Distributions and Applications. Pub Date: 2021-08-01. DOI: 10.1186/s40488-021-00124-1
Anthony G. Pakes
A family of generalised Planck (GP) laws is defined and its structural properties explored. Sometimes subject to parameter restrictions, a GP law is a randomly scaled gamma law; it arises as the equilibrium law of a perturbed version of the Feller mean reverting diffusion; the density functions can be decreasing, unimodal or bimodal; it is infinitely divisible. It is argued that the GP law is not a generalised gamma convolution. Characterisations are obtained in terms of invariance under random contraction of a weighted version of a related law. The GP law is a particular instance of equilibrium laws obtained from a recursion suggested by a genetic mutation-selection balance model. Some related infinitely divisible laws are exhibited.
New class of Lindley distributions: properties and applications
Journal of Statistical Distributions and Applications. Pub Date: 2021-07-19. DOI: 10.1186/s40488-021-00127-y
D. Hamed, Ahmad Alzaghal
Tolerance intervals in statistical software and robustness under model misspecification
Journal of Statistical Distributions and Applications. Pub Date: 2021-07-18. DOI: 10.1186/s40488-021-00123-2
Kyung Serk Cho, Hon Keung Tony Ng
A tolerance interval is a statistical interval that covers at least 100ρ% of the population of interest with 100(1−α)% confidence, where ρ and α are pre-specified values in (0, 1). In many scientific fields, such as pharmaceutical sciences, manufacturing processes, clinical sciences, and environmental sciences, tolerance intervals are used for statistical inference and quality control. Despite the usefulness of tolerance intervals, the procedures to compute them are not commonly implemented in statistical software packages. This paper aims to provide a comparative study of the computational procedures for tolerance intervals in some commonly used statistical software packages, including JMP, Minitab, NCSS, Python, R, and SAS. We also investigate the effect of misspecifying the underlying probability model on the performance of tolerance intervals. Via a Monte Carlo simulation study, we examine the performance of tolerance intervals both when the assumed distribution is the same as the true underlying distribution and when it differs from the true distribution. We also propose a robust model selection approach to obtain tolerance intervals that are relatively insensitive to model misspecification. We show that the proposed robust model selection approach performs well when the underlying distribution is unknown but candidate distributions are available.
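The definition of a tolerance interval given in the abstract above can be made concrete with a small sketch. The code below is not the procedure compared in the paper, nor that of any of the listed packages; it is the classical distribution-free construction based on order statistics, chosen here only because it needs nothing beyond the Python standard library. The function names are hypothetical. It uses the fact that the k-th smallest observation X_(k) covers at least a proportion ρ of a continuous population exactly when at most k−1 observations fall below the ρ-quantile, which happens with probability P(Bin(n, ρ) ≤ k−1).

```python
import math


def binom_cdf(k, n, p):
    """P(Bin(n, p) <= k), computed by direct summation."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_tolerance_rank(n, rho, alpha):
    """Smallest rank k such that X_(k) is an upper tolerance bound with
    content rho and confidence 1 - alpha; None if the sample is too small."""
    for k in range(1, n + 1):
        if binom_cdf(k - 1, n, rho) >= 1 - alpha:
            return k
    return None


def upper_tolerance_bound(sample, rho, alpha):
    """Distribution-free one-sided upper tolerance bound (illustrative only)."""
    k = upper_tolerance_rank(len(sample), rho, alpha)
    if k is None:
        raise ValueError("sample too small for the requested content and confidence")
    return sorted(sample)[k - 1]
```

For example, with ρ = 0.95 and α = 0.05, `upper_tolerance_rank(59, 0.95, 0.05)` returns 59 (the sample maximum), while a sample of size 58 is too small and the function returns `None`. Parametric procedures, such as those based on an assumed normal model, generally give tighter bounds when the model is correct, which is where the misspecification question studied in the paper arises.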
Combining assumptions and graphical network into gene expression data analysis
Journal of Statistical Distributions and Applications. Pub Date: 2021-07-08. DOI: 10.1186/s40488-021-00126-z
Demba Fofana, E. O. George, Dale Bowman
Analyzing gene expression data rigorously requires taking assumptions into consideration and also relies on using information about network relations that exist among genes. Combining these different elements can not only improve statistical power, but also provide a better framework through which gene expression can be properly analyzed. We propose a novel statistical model that combines assumptions and gene network information into the analysis. Assumptions are important since every test statistic is valid only when the required assumptions hold. We therefore propose hybrid p-values, which take assumptions into consideration, and show that, under the null hypothesis of primary interest, these p-values are uniformly distributed. We incorporate gene network information into the analysis because neighboring genes share biological functions. This correlation factor is taken into account via similar prior probabilities for neighboring genes. Our approach is compared with other approaches in a series of simulations. Areas under the ROC curve (AUCs) are constructed to compare the different methodologies; the AUC based on our methodology is larger than the others. For regression analysis, the AUC from our proposed method contains the AUCs of the Spearman test and of the Pearson test. In addition, true negative rates (TNRs), also known as specificities, are higher with our approach than with the other approaches. For two-group comparison analysis, for instance, with a sample size of n=10, the specificity corresponding to our proposed methodology is 0.716146, and the specificities for the t-test and the rank-sum test are 0.689223 and 0.69797, respectively. Our method, which combines assumptions and network information into the analysis, is shown to be more powerful. These proposed procedures are introduced as a general class of methods that can incorporate procedure selection, account for multiple testing, and incorporate graphical network information into the analysis. We obtain very good performance in simulations and in real data analysis.
A general stochastic model for bivariate episodes driven by a gamma sequence
Journal of Statistical Distributions and Applications. Pub Date: 2021-04-12. DOI: 10.1186/s40488-021-00120-5
Charles K. Amponsah, Tomasz J. Kozubowski, Anna K. Panorska
We propose a new stochastic model describing the joint distribution of (X,N), where N is a counting variable while X is the sum of N independent gamma random variables. We present the main properties of this general model, which include marginal and conditional distributions, integral transforms, moments and parameter estimation. We also discuss in more detail a special case where N has a heavy tailed discrete Pareto distribution. An example from finance illustrates the modeling potential of this new mixed bivariate distribution.
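The structure of the pair (X, N) described in the abstract above can be illustrated by simulation. The sketch below is only an assumption-laden example and not the paper's model in full: it takes N to be geometric on {1, 2, ...} rather than discrete Pareto, and it takes the gamma terms to be identically distributed with a common shape and scale. The function names and parameter names are hypothetical.

```python
import random


def sample_geometric(p, rng):
    """Draw N on {1, 2, ...} with P(N = n) = p * (1 - p) ** (n - 1)."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n


def sample_episode(p, shape, scale, rng):
    """Draw one pair (X, N): N is geometric (an assumption made here),
    and X is the sum of N i.i.d. Gamma(shape, scale) terms."""
    n = sample_geometric(p, rng)
    # random.gammavariate(alpha, beta) uses beta as the scale parameter.
    x = sum(rng.gammavariate(shape, scale) for _ in range(n))
    return x, n


rng = random.Random(2021)
episodes = [sample_episode(0.25, 2.0, 1.5, rng) for _ in range(20000)]
mean_x = sum(x for x, _ in episodes) / len(episodes)
mean_n = sum(n for _, n in episodes) / len(episodes)
```

Under these assumptions E[N] = 1/p and, by Wald's identity, E[X] = shape × scale / p, so the sample means above should be close to 4 and 12 respectively. In a finance setting one might read N as the length of an episode and X as its accumulated magnitude, though the specific application in the paper is not reproduced here.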
A flexible multivariate model for high-dimensional correlated count data
Journal of Statistical Distributions and Applications. Pub Date: 2021-03-16. DOI: 10.1186/s40488-021-00119-y
Alexander D. Knudson, Tomasz J. Kozubowski, Anna K. Panorska, A. Grant Schissler
We propose a flexible multivariate stochastic model for over-dispersed count data. Our methodology is built upon mixed Poisson random vectors (Y1,…,Yd), where the {Yi} are conditionally independent Poisson random variables. The stochastic rates of the {Yi} are multivariate distributions with arbitrary non-negative margins linked by a copula function. We present basic properties of these mixed Poisson multivariate distributions and provide several examples. A particular case with geometric and negative binomial marginal distributions is studied in detail. We illustrate an application of our model by conducting a high-dimensional simulation motivated by RNA-sequencing data.
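The construction in the abstract above (conditionally independent Poisson counts whose random rates are linked by a copula) can be sketched in two dimensions. The code below is an illustration under stated assumptions, not the authors' implementation: it uses a Gaussian copula and exponential rate margins, both chosen here for convenience. With an exponential rate of mean m, the resulting count is geometric on {0, 1, ...} with mean m and variance m + m², which is one of the marginal cases the abstract mentions. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate rates only."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_mixed_poisson_pair(mean1, mean2, rho, rng):
    """One draw (Y1, Y2) with exponential rates joined by a Gaussian copula."""
    # Correlated standard normals with correlation rho.
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
    rates = []
    for z, mean in ((z1, mean1), (z2, mean2)):
        # Map to a uniform, clamped away from 1 so the logarithm stays finite.
        u = min(_STD_NORMAL.cdf(z), 1.0 - 1e-12)
        # Exponential quantile function with the given mean.
        rates.append(-mean * math.log1p(-u))
    # Given the rates, the two counts are independent Poisson draws.
    return sample_poisson(rates[0], rng), sample_poisson(rates[1], rng)


rng = random.Random(7)
pairs = [sample_mixed_poisson_pair(3.0, 2.0, 0.7, rng) for _ in range(20000)]
m1 = sum(a for a, _ in pairs) / len(pairs)
m2 = sum(b for _, b in pairs) / len(pairs)
cov = sum((a - m1) * (b - m2) for a, b in pairs) / (len(pairs) - 1)
```

The sample means should be near 3 and 2, and the sample covariance should be positive because the dependence between the counts is inherited entirely from the dependence between the rates. Other non-negative margins or copula families could be substituted by replacing the quantile transform and the normal-pair step.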
Generalized fiducial inference on the mean of zero-inflated Poisson and Poisson hurdle models
Journal of Statistical Distributions and Applications. Pub Date: 2021-03-06. DOI: 10.1186/s40488-021-00117-0
Yixuan Zou, Jan Hannig, D. S. Young
Multivariate distributions of correlated binary variables generated by pair-copulas
Journal of Statistical Distributions and Applications. Pub Date: 2021-03-05. DOI: 10.1186/s40488-021-00118-z
Huihui Lin, N. Rao Chaganty
Correlated binary data are prevalent in a wide range of scientific disciplines, including healthcare and medicine. The generalized estimating equations (GEEs) and the multivariate probit (MP) model are two of the popular methods for analyzing such data. However, both methods have some significant drawbacks. The GEEs may not have an underlying likelihood and the MP model may fail to generate a multivariate binary distribution with specified marginals and bivariate correlations. In this paper, we study multivariate binary distributions that are based on D-vine pair-copula models as a superior alternative to these methods. We elucidate the construction of these binary distributions in two and three dimensions with numerical examples. For higher dimensions, we provide a method of constructing a multidimensional binary distribution with specified marginals and equicorrelated correlation matrix. We present a real-life data analysis to illustrate the application of our results.