To test the null hypothesis of a Poisson marginal distribution, test statistics based on the Stein–Chen identity are proposed. For a wide class of Poisson count time series, the asymptotic distribution of different types of Stein–Chen statistics is derived, including the case where multiple statistics are applied jointly. Simulations are used to analyze the performance of the tests and to determine which Stein–Chen functions should be used for which alternative. Illustrative data examples are presented, and possible extensions of the novel Stein–Chen approach are discussed.
{"title":"Goodness‐of‐fit tests for Poisson count time series based on the Stein–Chen identity","authors":"Boris Aleksandrov, C. Weiß, C. Jentsch","doi":"10.1111/stan.12252","DOIUrl":"https://doi.org/10.1111/stan.12252","url":null,"abstract":"To test the null hypothesis of a Poisson marginal distribution, test statistics based on the Stein–Chen identity are proposed. For a wide class of Poisson count time series, the asymptotic distribution of different types of Stein–Chen statistics is derived, also if multiple statistics are jointly applied. The performance of the tests is analyzed with simulations, as well as the question which Stein–Chen functions should be used for which alternative. Illustrative data examples are presented, and possible extensions of the novel Stein–Chen approach are discussed as well.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88392751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
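As a rough illustration of the identity behind these tests: for X ~ Poisson(mu) and a bounded function f, E[X f(X)] = mu E[f(X + 1)]. The Python sketch below checks this numerically and computes one natural sample analogue, the ratio of the sample mean of X_t f(X_t) to the product of the sample mean of X_t and the sample mean of f(X_t + 1), which should be close to 1 under a Poisson marginal. The function names are hypothetical, and the exact statistics and asymptotic variances derived in the paper are not reproduced here.

```python
import math


def poisson_pmf(mu, kmax):
    """Return [P(X=0), ..., P(X=kmax)] for X ~ Poisson(mu).

    Uses the recursion p_k = p_{k-1} * mu / k.
    """
    probs = [math.exp(-mu)]
    for k in range(1, kmax + 1):
        probs.append(probs[-1] * mu / k)
    return probs


def stein_chen_gap(mu, f, kmax=200):
    """E[X f(X)] - mu * E[f(X + 1)] under Poisson(mu), truncated at kmax.

    The Stein-Chen identity says this gap is zero (up to truncation
    and rounding error) for a Poisson distribution.
    """
    probs = poisson_pmf(mu, kmax)
    lhs = sum(k * f(k) * p for k, p in enumerate(probs))
    rhs = mu * sum(f(k + 1) * p for k, p in enumerate(probs))
    return lhs - rhs


def stein_chen_ratio(xs, f):
    """Sample analogue: mean(x f(x)) / (mean(x) * mean(f(x + 1)))."""
    n = len(xs)
    numerator = sum(x * f(x) for x in xs) / n
    denominator = (sum(xs) / n) * (sum(f(x + 1) for x in xs) / n)
    return numerator / denominator


if __name__ == "__main__":
    print(stein_chen_gap(2.5, lambda x: math.exp(-x)))
    print(stein_chen_ratio([0, 1, 2, 3, 1, 0, 2], lambda x: math.exp(-x)))
```

Different choices of f weight different parts of the distribution, which is why the choice of f matters for the alternative one wants to detect.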
The behavior and spatial distribution of crime events can be explained through the characterization of an area in terms of its demography, socioeconomy, and built environment. In particular, recent studies on the incidence of crime in a city have focused on the identification of features of the built environment (specific places or facilities) that may increase crime risk within a certain radius. However, it is hard to identify environmental characteristics that consistently explain crime occurrence across cities and crime types. This article focuses on the assessment of the effect that certain types of places have on the incidence of property crime, robbery, and vandalism in three cities of the Valencian region (Spain): Alicante, Castellon, and Valencia. A nonlinear effects model is used to identify such places and to construct a risk map over the three cities considering the three crime types under research. The results obtained suggest that there are remarkable differences across cities and crime types in terms of the types of places associated with crime outcomes. The identification of high‐risk areas allows verifying that crime is highly concentrated, and also that there is a high level of spatial overlap between the high‐risk areas corresponding to different crime types.
{"title":"Identifying crime generators and spatially overlapping high‐risk areas through a nonlinear model: A comparison between three cities of the Valencian region (Spain)","authors":"Á. Briz‐Redón, J. Mateu, F. Montes","doi":"10.1111/stan.12254","DOIUrl":"https://doi.org/10.1111/stan.12254","url":null,"abstract":"The behavior and spatial distribution of crime events can be explained through the characterization of an area in terms of its demography, socioeconomy, and built environment. In particular, recent studies on the incidence of crime in a city have focused on the identification of features of the built environment (specific places or facilities) that may increase crime risk within a certain radius. However, it is hard to identify environmental characteristics that consistently explain crime occurrence across cities and crime types. This article focuses on the assessment of the effect that certain types of places have on the incidence of property crime, robbery, and vandalism in three cities of the Valencian region (Spain): Alicante, Castellon, and Valencia. A nonlinear effects model is used to identify such places and to construct a risk map over the three cities considering the three crime types under research. The results obtained suggest that there are remarkable differences across cities and crime types in terms of the types of places associated with crime outcomes. The identification of high‐risk areas allows verifying that crime is highly concentrated, and also that there is a high level of spatial overlap between the high‐risk areas corresponding to different crime types.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81648311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Assessing regional population compositions is an important task in many research fields. Small area estimation with generalized linear mixed models is a powerful tool for this purpose. However, the method has limitations in practice. When the data are subject to measurement errors, small area models produce inefficient or biased results since they cannot account for data uncertainty. This is particularly problematic for composition prediction, since generalized linear mixed models often rely on approximate likelihood inference, so the resulting predictions are not reliable. We propose a robust multivariate Fay–Herriot model to solve these issues. It combines compositional data analysis with robust optimization theory. The nonlinear estimation of compositions is restated as a linear problem through isometric logratio transformations. Robust model parameter estimation is performed via penalized maximum likelihood. A robust best predictor is derived. Simulations are conducted to demonstrate the effectiveness of the approach. An application to alcohol consumption in Germany is provided.
{"title":"Robust prediction of domain compositions from uncertain data using isometric logratio transformations in a penalized multivariate Fay–Herriot model","authors":"J. Krause, J. P. Burgard, D. Morales","doi":"10.1111/stan.12253","DOIUrl":"https://doi.org/10.1111/stan.12253","url":null,"abstract":"Assessing regional population compositions is an important task in many research fields. Small area estimation with generalized linear mixed models marks a powerful tool for this purpose. However, the method has limitations in practice. When the data are subject to measurement errors, small area models produce inefficient or biased results since they cannot account for data uncertainty. This is particularly problematic for composition prediction, since generalized linear mixed models often rely on approximate likelihood inference. Obtained predictions are not reliable. We propose a robust multivariate Fay–Herriot model to solve these issues. It combines compositional data analysis with robust optimization theory. The nonlinear estimation of compositions is restated as a linear problem through isometric logratio transformations. Robust model parameter estimation is performed via penalized maximum likelihood. A robust best predictor is derived. Simulations are conducted to demonstrate the effectiveness of the approach. An application to alcohol consumption in Germany is provided.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81420779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
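To make the isometric logratio (ilr) step concrete, the following Python sketch maps a composition with D strictly positive parts to D - 1 unconstrained real coordinates. It uses pivot coordinates, one common orthonormal basis; this choice is an assumption, since the abstract does not say which ilr basis the authors use. The function names are hypothetical.

```python
import math


def clr(x):
    """Centered logratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def ilr(x):
    """Pivot-coordinate ilr of a composition with strictly positive parts.

    Coordinate i contrasts part i with the geometric mean of the
    remaining parts i+1, ..., D, scaled so the basis is orthonormal.
    """
    logs = [math.log(v) for v in x]
    coords = []
    for i in range(len(logs) - 1):
        rest = logs[i + 1:]
        r = len(rest)
        coords.append(math.sqrt(r / (r + 1)) * (logs[i] - sum(rest) / r))
    return coords


if __name__ == "__main__":
    composition = [0.2, 0.3, 0.5]
    print(ilr(composition))
    # Rescaling the parts leaves the coordinates unchanged.
    print(ilr([2.0, 3.0, 5.0]))
```

Because the coordinates live in ordinary Euclidean space, a linear model such as a Fay–Herriot model can be fitted on them and the predictions mapped back to compositions afterwards.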
The primary analysis of time‐to‐event data typically makes the censoring at random assumption, that is, that—conditional on covariates in the model—the distribution of event times is the same, whether they are observed or unobserved. In such cases, we need to explore the robustness of inference to more pragmatic assumptions about patients post‐censoring in sensitivity analyses. Reference‐based multiple imputation, which avoids analysts explicitly specifying the parameters of the unobserved data distribution, has proved attractive to researchers. Building on results for longitudinal continuous data, we show that inference using a Tobit regression imputation model for reference‐based sensitivity analysis with right censored log normal data is information anchored, meaning the proportion of information lost due to missing data under the primary analysis is held constant across the sensitivity analyses. We illustrate our theoretical results using simulation and a clinical trial case study.
{"title":"Information anchored reference‐based sensitivity analysis for truncated normal data with application to survival analysis","authors":"A. Atkinson, S. Cro, J. Carpenter, M. Kenward","doi":"10.1111/stan.12250","DOIUrl":"https://doi.org/10.1111/stan.12250","url":null,"abstract":"The primary analysis of time‐to‐event data typically makes the censoring at random assumption, that is, that—conditional on covariates in the model—the distribution of event times is the same, whether they are observed or unobserved. In such cases, we need to explore the robustness of inference to more pragmatic assumptions about patients post‐censoring in sensitivity analyses. Reference‐based multiple imputation, which avoids analysts explicitly specifying the parameters of the unobserved data distribution, has proved attractive to researchers. Building on results for longitudinal continuous data, we show that inference using a Tobit regression imputation model for reference‐based sensitivity analysis with right censored log normal data is information anchored, meaning the proportion of information lost due to missing data under the primary analysis is held constant across the sensitivity analyses. We illustrate our theoretical results using simulation and a clinical trial case study.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85623926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The product of two zero mean correlated normal random variables, and more generally the sum of independent copies of such random variables, has received much attention in the statistics literature and appears in many application areas. However, many important distributional properties are yet to be recorded. This review paper fills this gap by providing the basic distributional theory for the sum of independent copies of the product of two zero mean correlated normal random variables. Properties covered include probability and cumulative distribution functions, generating functions, moments and cumulants, mode and median, Stein characterisations, representations in terms of other random variables, and a list of related distributions. We also review how the product of two zero mean correlated normal random variables arises naturally as a limiting distribution, with an example given for the distributional approximation of double Wiener‐Itô integrals.
{"title":"The basic distributional theory for the product of zero mean correlated normal random variables","authors":"Robert E. Gaunt","doi":"10.1111/stan.12267","DOIUrl":"https://doi.org/10.1111/stan.12267","url":null,"abstract":"The product of two zero mean correlated normal random variables, and more generally the sum of independent copies of such random variables, has received much attention in the statistics literature and appears in many application areas. However, many important distributional properties are yet to be recorded. This review paper fills this gap by providing the basic distributional theory for the sum of independent copies of the product of two zero mean correlated normal random variables. Properties covered include probability and cumulative distribution functions, generating functions, moments and cumulants, mode and median, Stein characterisations, representations in terms of other random variables, and a list of related distributions. We also review how the product of two zero mean correlated normal random variables arises naturally as a limiting distribution, with an example given for the distributional approximation of double Wiener‐Itô integrals.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76103266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The beta family owes its privileged status among unit interval distributions to several relevant features, such as ease of interpretation and versatility in modeling different types of data. However, its density is not flexible enough at the endpoints of the support to properly model the portions of the data with values close to zero and one. This drawback can be overcome by resorting to the class of noncentral beta distributions, which allows the density to take on arbitrary positive and finite limits that have a very simple form. Nevertheless, the analytical and mathematical complexity of this distribution strongly limits its use as a model for data on the real interval (0, 1). An in‐depth study of a newly found analogue of the noncentral beta distribution is therefore carried out in this article. This analogue preserves the applicative potential of the standard noncentral beta class but has a more straightforward and tractable density.
{"title":"On the conditional noncentral beta distribution","authors":"C. Orsi","doi":"10.1111/stan.12249","DOIUrl":"https://doi.org/10.1111/stan.12249","url":null,"abstract":"The beta family owes its privileged status within unit interval distributions to several relevant features such as, for example, easiness of interpretation and versatility in modeling different types of data. However, the flexibility of its density at the endpoints of the support is poor enough to prevent from properly modeling the data portions having values next to zero and one. Such a drawback can be overcome by resorting to the class of the noncentral beta distributions. Indeed, the latter allows the density to take on arbitrary positive and finite limits which have a really simple form. Nevertheless, the analytical and mathematical complexity of this distribution poses strong limitations on its use as a model for data on the real interval (0, 1). That said, an in‐depth study of a newly found analogue of the noncentral beta distribution is carried out in this article. The latter preserves the applicative potential of the standard noncentral beta class but with the advantage of showing a more straightforward and easily handleable density.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80063061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Random walks, intrinsic autoregression, state‐space models, smoothing splines, and so on have been widely used in various areas of statistics. However, practitioners wanting to fit these models using existing packages for random‐effects models are often faced with the difficulty that their covariance matrices are not uniquely determined. Unfortunately, different specifications of the model lead to different covariance structures, giving different analyses. Even if we make a decision on specification it is not immediately obvious how to make inferences from these models. There have been various suggestions on how to overcome such difficulties. However, they differ, implying that there is as yet no agreed remedy. In this article we provide a unified view on these alternatives and show how the analysis can be made invariant with respect to the choice of covariance by inclusion of a suitable set of covariates. Several examples are used to illustrate the approach.
{"title":"Resolving the ambiguity of random‐effects models with singular precision matrix","authors":"Woojoo Lee, H. Piepho, Youngjo Lee","doi":"10.1111/stan.12244","DOIUrl":"https://doi.org/10.1111/stan.12244","url":null,"abstract":"Random walks, intrinsic autoregression, state‐space models, smoothing splines, and so on have been widely used in various areas of statistics. However, practitioners wanting to fit these models using existing packages for random‐effects models are often faced with the difficulty that their covariance matrices are not uniquely determined. Unfortunately, different specifications of the model lead to different covariance structures, giving different analyses. Even if we make a decision on specification it is not immediately obvious how to make inferences from these models. There have been various suggestions on how to overcome such difficulties. However, they differ, implying that there is as yet no agreed remedy. In this article we provide a unified view on these alternatives and show how the analysis can be made invariant with respect to the choice of covariance by inclusion of a suitable set of covariates. Several examples are used to illustrate the approach.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82348589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. M. Bayer, Francisco Cribari‐Neto, Jéssica Santos
Models based on the Kumaraswamy law are used with variables that assume values in (0, 1). In some cases, however, the data contain zeros and/or ones, that is, there is data inflation. We introduce a class of regression models that can be used with such inflated data, namely: the class of inflated Kumaraswamy regression models. We consider inflation at zero, at one, and at both zero and one. We introduce the model and provide closed‐form expressions for its score vector and Fisher's information matrix. The proposed model is used to evaluate the impacts of different conditioning variables on the proportion of people who live in households with inadequate water supply and sewage in Brazilian municipalities. Our results reveal that policies directed to increasing the population share with college education in places where it is low are particularly effective in reducing the prevalence of people who live under inadequate sanitation conditions.
{"title":"Inflated Kumaraswamy regressions with application to water supply and sanitation in Brazil","authors":"F. M. Bayer, Francisco Cribari‐Neto, Jéssica Santos","doi":"10.1111/stan.12242","DOIUrl":"https://doi.org/10.1111/stan.12242","url":null,"abstract":"Models based on the Kumaraswamy law are used with variables that assume values in (0, 1). In some cases, however, the data contain zeros and/or ones, that is, there is data inflation. We introduce a class of regression models that can be used with such inflated data, namely: the class of inflated Kumaraswamy regression models. We consider inflation at zero, at one, and at both zero and one. We introduce the model and provide closed‐form expressions for its score vector and Fisher's information matrix. The proposed model is used to evaluate the impacts of different conditioning variables on the proportion of people who live in households with inadequate water supply and sewage in Brazilian municipalities. Our results reveal that policies directed to increasing the population share with college education in places where it is low are particularly effective in reducing the prevalence of people who live under inadequate sanitation conditions.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79501911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
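A minimal Python sketch of the likelihood structure described above, assuming the standard Kumaraswamy(a, b) density a b y^(a-1) (1 - y^a)^(b-1) on (0, 1) and a simple mixture with point masses at zero and one. The names p0 and p1 and the (a, b) parameterization are assumptions made for illustration; the paper's regression structure, link functions, and parameterization are not reproduced.

```python
import math


def kw_logpdf(y, a, b):
    """Log-density of the Kumaraswamy(a, b) law at y in (0, 1)."""
    return (math.log(a) + math.log(b) + (a - 1) * math.log(y)
            + (b - 1) * math.log1p(-y ** a))


def kw_cdf(y, a, b):
    """Distribution function 1 - (1 - y^a)^b for y in (0, 1)."""
    return 1.0 - (1.0 - y ** a) ** b


def inflated_kw_loglik(ys, a, b, p0, p1):
    """Log-likelihood with point masses p0 at 0 and p1 at 1.

    Observations strictly inside (0, 1) follow Kumaraswamy(a, b)
    with total weight 1 - p0 - p1 (assumed to be positive).
    """
    total = 0.0
    for y in ys:
        if y == 0:
            total += math.log(p0)
        elif y == 1:
            total += math.log(p1)
        else:
            total += math.log(1.0 - p0 - p1) + kw_logpdf(y, a, b)
    return total


if __name__ == "__main__":
    sample = [0, 0.12, 0.5, 0.93, 1, 0.4]
    print(inflated_kw_loglik(sample, a=2.0, b=3.0, p0=0.1, p1=0.05))
```

Inflation at zero only (or at one only) corresponds to data without ones (or without zeros), with the unused point mass fixed at zero.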
We consider a system of dependent Poisson variables, where each variable is the sum of an independent variate and a common variate. It is the common variate that creates the dependence. Within this system, a test of independence may be constructed where the null hypothesis is that the common variate is identically zero. In the present paper, we consider the maximum log likelihood ratio test. For this test, it is well known that the asymptotic distribution of the test statistic is an equal mixture of a point mass at zero and a chi‐square distribution with one degree of freedom. We examine a Bartlett correction of the test, in the hope of obtaining a better approximation to the nominal size for moderately large sample sizes. A correction of this type is explicitly derived, and its usefulness is explored in a simulation study. For practical purposes, the correction is found to be useful in dimension two, but not in higher dimensions.
{"title":"Bartlett correction of an independence test in a multivariate Poisson model","authors":"Rolf Larsson","doi":"10.1111/stan.12265","DOIUrl":"https://doi.org/10.1111/stan.12265","url":null,"abstract":"We consider a system of dependent Poisson variables, where each variable is the sum of an independent variate and a common variate. It is the common variate that creates the dependence. Within this system, a test of independence may be constructed where the null hypothesis is that the common variate is identically zero. In the present paper, we consider the maximum log likelihood ratio test. For this test, it is well‐known that the asymptotic distribution of the test statistic is an equal mixture of zero and a chi‐square distribution with one degree of freedom. We examine a Bartlett correction of the test, in the hope that we will get better approximation of the nominal size for moderately large sample sizes. A correction of this type is explicitly derived, and its usefulness is explored in a simulation study. For practical purposes, the correction is found to be useful in dimension two, but not in higher dimensions.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87499206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
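The mixture limit has a practical consequence for p-values, which the following Python sketch illustrates. Under the equal mixture of a point mass at zero and a chi-square distribution with one degree of freedom, the p-value of a positive statistic is half the usual chi-square tail probability. The function names are hypothetical, and the Bartlett correction factor derived in the paper is not reproduced; a correction of that type would rescale the statistic before this step.

```python
import math


def chi2_1_sf(t):
    """Upper tail P(chi2_1 > t) for t >= 0.

    Uses P(chi2_1 > t) = P(|Z| > sqrt(t)) = erfc(sqrt(t / 2)).
    """
    return math.erfc(math.sqrt(t / 2.0))


def mixture_pvalue(lr_stat):
    """Asymptotic p-value under 0.5 * (mass at 0) + 0.5 * chi2_1."""
    if lr_stat <= 0.0:
        return 1.0
    return 0.5 * chi2_1_sf(lr_stat)


if __name__ == "__main__":
    # The 5% critical value is the 90% quantile of chi2_1, about 2.7055,
    # rather than the usual 3.8415.
    print(mixture_pvalue(2.705543))
    print(mixture_pvalue(3.841459))
```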
In this paper, we introduce mixed estimators based on product least relative error estimation and least squares estimation in a multiplicative linear regression model. The asymptotic properties of the mixed estimators are established. We present some explicit expressions for the optimal estimator among the mixed estimators, and we also suggest some numerical solutions in the simulation studies and real data analysis. To study model checking problems for multiplicative linear regression models, we propose four test statistics. The first is a score‐type test statistic; the second is a residual‐based empirical process test statistic marked by proper functions of the covariates; the third is an integrated conditional moment test statistic using a linear projection weighting function; and the fourth is an adaptive model test statistic. These test statistics are all related to the mixed estimators. The asymptotic properties of these test statistics are established, and some bootstrap procedures for calculating the critical values are also proposed. Simulation studies are conducted to demonstrate the performance of the proposed estimation procedures, and a real example is analyzed to illustrate their practical usage.
{"title":"Model checking for multiplicative linear regression models with mixed estimators","authors":"Jun Zhang","doi":"10.1111/stan.12239","DOIUrl":"https://doi.org/10.1111/stan.12239","url":null,"abstract":"In this paper, we introduce the mixed estimators based on product least relative error estimation and least squares estimation in a multiplicative linear regression model. The asymptotic properties for the mixed estimators are established. We present some explicit expressions of the optimal estimator of the mixed estimators, and we also suggest some numerical solutions in the simulation studies and real data analysis. Studying model checking problems for multiplicative linear regression models, we propose four test statistics. One is the score‐type test statistic, the second one is the residual‐based empirical process test statistic marked by proper functions of the covariates. The third one is the integrated conditional moment test statistic by using linear projection weighting function, and the fourth one is the adaptive model test statistic. These test statistics are all related to the mixed estimators. The asymptotic properties of these test statistics are established, and some bootstrap procedures for calculating the critical values are also proposed. Simulation studies are conducted to demonstrate the performance of the proposed estimation procedures, and a real example is analyzed to illustrate its practical usage.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87409760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
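For orientation, the two criteria that the mixed estimators build on can be sketched in Python as follows, assuming the multiplicative model Y = exp(x'beta) * error with Y > 0. The least product relative error (LPRE) criterion multiplies the two relative errors |Y - m| / Y and |Y - m| / m, where m = exp(x'beta), which simplifies to Y/m + m/Y - 2. Taking least squares on the log scale is an assumption here, and the way the paper combines the two criteria into a mixed estimator is not reproduced. All function names are hypothetical.

```python
import math


def linear_predictor(beta, x):
    """Inner product x'beta for one observation."""
    return sum(b * v for b, v in zip(beta, x))


def lpre_loss(beta, xs, ys):
    """Least product relative error criterion.

    Each term equals (y - m)^2 / (y * m) = y/m + m/y - 2, m = exp(x'beta).
    """
    total = 0.0
    for x, y in zip(xs, ys):
        fitted = math.exp(linear_predictor(beta, x))
        total += y / fitted + fitted / y - 2.0
    return total


def log_ls_loss(beta, xs, ys):
    """Least squares criterion for log(y) regressed on x (assumed form)."""
    return sum((math.log(y) - linear_predictor(beta, x)) ** 2
               for x, y in zip(xs, ys))


if __name__ == "__main__":
    xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
    ys = [1.7, 1.2, 1.1]
    print(lpre_loss([0.5, -0.25], xs, ys))
    print(log_ls_loss([0.5, -0.25], xs, ys))
```

Both criteria are zero at a perfect fit and positive otherwise, so either can be passed to a general-purpose numerical minimizer.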