An introduction to sampling via measure transport
Y. Marzouk, T. Moselhy, M. Parno, Alessio Spantini
Pub Date: 2016-02-16. DOI: 10.1007/978-3-319-11259-6_23-1
A Unified Monte-Carlo Jackknife for Small Area Estimation after Model Selection
Jiming Jiang, P. Lahiri, Thuan Nguyen
Pub Date: 2016-02-16. DOI: 10.4310/AMSA.2018.V3.N2.A2
We consider estimation of measures of uncertainty in small area estimation (SAE) when model selection is carried out prior to estimation. A unified Monte-Carlo jackknife method, called McJack, is proposed for estimating the logarithm of the mean squared prediction error. We prove the second-order unbiasedness of McJack and demonstrate its performance in assessing uncertainty in SAE after model selection through empirical investigations that include simulation studies and real-data analyses.
Sparse Generalized Principal Component Analysis for Large-scale Applications beyond Gaussianity
Qiaoya Zhang, Yiyuan She
Pub Date: 2015-12-12. DOI: 10.4310/SII.2016.V9.N4.A11
Principal Component Analysis (PCA) is a dimension reduction technique, but it produces inconsistent estimators when the dimensionality is moderate to high. This is often the case in modern large-scale applications, where algorithm scalability and model interpretability are difficult to achieve and missing values are prevalent. While existing sparse PCA methods alleviate this inconsistency, they are constrained by the Gaussian assumption of classical PCA and fail to address algorithm scalability. We generalize sparse PCA to the broad class of exponential-family distributions in a high-dimensional setup, with built-in treatment of missing values, and propose a family of iterative sparse generalized PCA (SG-PCA) algorithms for which, despite the non-convexity and non-smoothness of the optimization task, the loss function decreases at every iteration. In terms of ease and intuitiveness of parameter tuning, our sparsity-inducing regularization is far superior to the popular Lasso. Furthermore, to promote overall scalability, accelerated gradient methods are integrated for fast convergence, while a progressive screening technique gradually squeezes nuisance dimensions out of a large-scale problem to keep the optimization feasible. High-dimensional simulations and real-data experiments demonstrate the efficiency and efficacy of SG-PCA.
Robust Estimation of the Generalized Loggamma Model. The R Package robustloggamma
C. Agostinelli, A. Marazzi, V. Yohai, A. Randriamiharisoa
Pub Date: 2015-12-05. DOI: 10.18637/JSS.V070.I07
robustloggamma is an R package for robust estimation and inference in the generalized loggamma model. We briefly introduce the model, the estimation procedures and the computational algorithms. Then, we illustrate the use of the package with the help of a real data set.
Embarrassingly Parallel Sequential Markov-chain Monte Carlo for Large Sets of Time Series
R. Casarin, Radu V. Craiu, F. Leisen
Pub Date: 2015-12-04. DOI: 10.4310/SII.2016.V9.N4.A9
Bayesian computation crucially relies on Markov chain Monte Carlo (MCMC) algorithms. In the case of massive data sets, running the Metropolis-Hastings sampler to draw from the posterior distribution becomes prohibitive due to the large number of likelihood terms that need to be calculated at each iteration. In order to perform Bayesian inference for a large set of time series, we consider an algorithm that combines "divide and conquer" ideas previously used to design MCMC algorithms for big data with a sequential MCMC strategy. The performance of the method is illustrated using a large set of financial data.
Convergence of the risk for nonparametric IV quantile regression and nonparametric IV regression with full independence
Fabian Dunker
Pub Date: 2015-12-03. DOI: 10.17877/DE290R-16447
In econometrics, some nonparametric instrumental regression models and nonparametric demand models with endogeneity lead to nonlinear integral equations with unknown integral kernels. We prove convergence rates of the risk for the iteratively regularized Newton method applied to these problems. Compared to related results, we rely on a weaker nonlinearity condition and obtain stronger convergence results. Numerical simulations for a nonparametric IV regression problem with a continuous instrument and regressor demonstrate that the method produces better results than the standard method.
blavaan: Bayesian structural equation models via parameter expansion
E. Merkle, Y. Rosseel
Pub Date: 2015-11-17. DOI: 10.18637/jss.v085.i04
This article describes blavaan, an R package for estimating Bayesian structural equation models (SEMs) via JAGS and for summarizing the results. It also describes a novel parameter expansion approach for estimating specific types of models with residual covariances, which facilitates estimation of these models in JAGS. The methodology and software are intended to provide users with a general means of estimating Bayesian SEMs, both classical and novel, in a straightforward fashion. Users can estimate Bayesian versions of classical SEMs with lavaan syntax, they can obtain state-of-the-art Bayesian fit measures associated with the models, and they can export JAGS code to modify the SEMs as desired. These features and more are illustrated by example, and the parameter expansion approach is explained in detail.
spTest: An R Package Implementing Nonparametric Tests of Isotropy
Zachary D. Weller
Pub Date: 2015-09-24. DOI: 10.18637/JSS.V083.I04
An important step in modeling spatially referenced data is appropriately specifying the second-order properties of the random field. A scientist developing a model for spatial data has a number of options regarding the nature of the dependence between observations. One of these options is deciding whether or not the dependence depends on direction, or, in other words, whether the spatial covariance function is isotropic. Isotropy implies that spatial dependence is a function of only the distance, and not the direction, of the spatial separation between sampling locations. A researcher may use graphical techniques, such as directional sample semivariograms, to determine whether an assumption of isotropy holds. These graphical diagnostics can be difficult to assess, are subject to personal interpretation, and can be misleading because they typically do not include a measure of uncertainty. To avoid these issues, a hypothesis test of the assumption of isotropy may be more desirable. To avoid specifying the covariance function, a number of nonparametric tests of isotropy have been developed using both the spatial and spectral representations of random fields. Several of these nonparametric tests are implemented in the R package spTest, available on CRAN. We demonstrate how graphical techniques and the hypothesis tests programmed in spTest can be used in practice to assess isotropy.
Simulations on the combinatorial structure of D-optimal designs
R. Fontana, Fabio Rapallo
Pub Date: 2015-09-21. DOI: 10.1007/978-3-319-76035-3_24
EMMIXcskew: an R Package for the Fitting of a Mixture of Canonical Fundamental Skew t-Distributions
Sharon X. Lee, G. J. McLachlan
Pub Date: 2015-09-07. DOI: 10.18637/JSS.V083.I03
This paper presents the R package EMMIXcskew for fitting the canonical fundamental skew t-distribution (CFUST) and finite mixtures of this distribution (FM-CFUST) via maximum likelihood (ML). The CFUST distribution provides a flexible family of models for non-normal data, with parameters capturing skewness and heavy tails. It formally encompasses the normal, t, and skew-normal distributions as special and/or limiting cases, and several other versions of the skew t-distribution are also nested within it. In this paper, an Expectation-Maximization (EM) algorithm is described for computing the ML estimates of the parameters of the FM-CFUST model, and different strategies for initializing the algorithm are discussed and illustrated. The methodology is implemented in the EMMIXcskew package, and examples are presented using two real datasets. The package contains functions to fit the FM-CFUST model, including procedures for generating different initial values; additional features include random sample generation and contour visualization in 2D and 3D.