Latest papers in arXiv: Computation

Metamodel-based sensitivity analysis: polynomial chaos expansions and Gaussian processes
Pub Date : 2016-06-14 DOI: 10.1007/978-3-319-11259-6_38-1
Loic Le Gratiet, S. Marelli, B. Sudret
Citations: 148
ContaminatedMixt: An R Package for Fitting Parsimonious Mixtures of Multivariate Contaminated Normal Distributions
Pub Date : 2016-06-12 DOI: 10.18637/JSS.V085.I10
A. Punzo, A. Mazza, P. McNicholas
We introduce the R package ContaminatedMixt, conceived to disseminate the use of mixtures of multivariate contaminated normal distributions as a tool for robust clustering and classification under the common assumption of elliptically contoured groups. Thirteen variants of the model are also implemented to introduce parsimony. The expectation-conditional maximization algorithm is adopted to obtain maximum likelihood parameter estimates, and likelihood-based model selection criteria are used to select the model and the number of groups. Parallel computation can be used on multicore PCs and computer clusters, when several models have to be fitted. Differently from the more popular mixtures of multivariate normal and t distributions, this approach also allows for automatic detection of mild outliers via the maximum a posteriori probabilities procedure. To exemplify the use of the package, applications to artificial and real data are presented.
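The outlier-detection idea in this abstract can be illustrated with a minimal univariate sketch. It assumes the common parameterization of a contaminated normal, a mixture of a "good" normal with weight alpha and an inflated-variance normal (variance multiplied by eta > 1) with weight 1 - alpha, and flags a point as a mild outlier when its posterior probability of being "good" falls below 0.5. The function names are hypothetical and are not the ContaminatedMixt API; the package itself is multivariate and written in R.

```python
import math


def normal_pdf(x, mean, variance):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(
        2.0 * math.pi * variance
    )


def prob_good(x, mean, variance, alpha, eta):
    """Posterior probability that x comes from the 'good' component.

    Assumed density: alpha * N(mean, variance) + (1 - alpha) * N(mean, eta * variance),
    with 0 < alpha < 1 and eta > 1 (illustrative parameter names).
    """
    good = alpha * normal_pdf(x, mean, variance)
    bad = (1.0 - alpha) * normal_pdf(x, mean, eta * variance)
    return good / (good + bad)


def is_outlier(x, mean, variance, alpha, eta):
    """Maximum a posteriori rule: outlier if the 'bad' component is more probable."""
    return prob_good(x, mean, variance, alpha, eta) < 0.5
```

For example, with mean 0, variance 1, alpha = 0.9 and eta = 10, a point at 0 is kept as "good" while a point at 5 is flagged.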
Citations: 41
DNest4: Diffusive Nested Sampling in C++ and Python
Pub Date : 2016-06-12 DOI: 10.18637/JSS.V086.I07
B. Brewer, D. Foreman-Mackey
In probabilistic (Bayesian) inference, we typically want to compute properties of the posterior distribution, describing knowledge of unknown quantities in the context of a particular dataset and the assumed prior information. The marginal likelihood, also known as the "evidence", is a key quantity in Bayesian model selection. The Diffusive Nested Sampling algorithm, a variant of Nested Sampling, is a powerful tool for generating posterior samples and estimating marginal likelihoods. It is effective at solving complex problems, including many where the posterior distribution is multimodal or has strong dependencies between variables. DNest4 is an open source (MIT licensed), multi-threaded implementation of this algorithm in C++11, along with associated utilities including: (i) RJObject, a class template for finite mixture models; (ii) a Python package allowing basic use without C++ coding; and (iii) experimental support for models implemented in Julia. In this paper we demonstrate DNest4 usage through examples including simple Bayesian data analysis, finite mixture models, and Approximate Bayesian Computation.
Citations: 44
Maxima Units Search (MUS) algorithm: methodology and applications
Pub Date : 2016-06-08 DOI: 10.1007/978-3-319-73906-9_7
Leonardo Egidi, R. Pappadà, F. Pauli, N. Torelli
Citations: 4
Rectangular Statistical Cartograms in R: The recmap Package
Pub Date : 2016-06-01 DOI: 10.18637/jss.v086.c01
Christian Panse
Cartogram drawing is a technique for showing geography-related statistical information, such as demographic and epidemiological data. The idea is to distort a map by resizing its regions according to a statistical parameter while keeping the map recognizable. This article describes an R package implementing an algorithm called RecMap, which approximates every map region by a rectangle whose area corresponds to the given statistical value (maintaining zero cartographic error). The package implements the computationally intensive tasks in C++. This paper's contribution is that it demonstrates on real and synthetic maps how recmap can be used, how it is implemented, and how it is used with other statistical packages.
Citations: 3
MCMC with Strings and Branes: The Suburban Algorithm
Pub Date : 2016-05-17 DOI: 10.1142/S0217751X17501330
J. Heckman, J. Bernstein, B. Vigoda
Motivated by the physics of strings and branes, we introduce a general suite of Markov chain Monte Carlo (MCMC) "suburban samplers" (i.e., spread out Metropolis). The suburban algorithm involves an ensemble of statistical agents connected together by a random network. Performance of the collective in reaching a fast and accurate inference depends primarily on the average number of nearest neighbor connections. Increasing the average number of neighbors above zero initially leads to an increase in performance, though there is a critical connectivity with effective dimension d_eff ~ 1, above which "groupthink" takes over, and the performance of the sampler declines.
Citations: 3
Forward and Inverse Uncertainty Quantification using Multilevel Monte Carlo Algorithms for an Elliptic Nonlocal Equation
Pub Date : 2016-03-21 DOI: 10.1615/INT.J.UNCERTAINTYQUANTIFICATION.2016018661
A. Jasra, K. Law, Yan Zhou
This paper considers uncertainty quantification for an elliptic nonlocal equation. In particular, it is assumed that the parameters which define the kernel in the nonlocal operator are uncertain and a priori distributed according to a probability measure. It is shown that the induced probability measure on some quantities of interest arising from functionals of the solution to the equation with random inputs is well-defined, as is the posterior distribution on parameters given observations. As the elliptic nonlocal equation cannot be solved, approximate posteriors are constructed. The multilevel Monte Carlo (MLMC) and multilevel sequential Monte Carlo (MLSMC) sampling algorithms are used for a priori and a posteriori estimation, respectively, of quantities of interest. These algorithms reduce the amount of work to estimate posterior expectations, for a given level of error, relative to Monte Carlo and i.i.d. sampling from the posterior at a given level of approximation of the solution of the elliptic nonlocal equation.
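The multilevel Monte Carlo idea mentioned in this abstract can be sketched on a toy problem. The example below is not the paper's nonlocal equation: as a stand-in, the quantity of interest is the integral of exp(X t) over t in [0, 1] with X uniform on (0, 1), and the "approximation level" is the number of midpoint-rule cells, 2**level. The estimator uses the telescoping sum E[P_L] = E[P_0] + sum over l of E[P_l - P_{l-1}], spending many cheap samples on coarse levels and few on fine ones. All names and sample counts are illustrative assumptions.

```python
import math
import random


def midpoint_integral(x, level):
    """Midpoint-rule approximation of the integral of exp(x * t) over [0, 1],
    using 2**level cells (higher level = more accurate, more expensive)."""
    n = 2 ** level
    h = 1.0 / n
    return h * sum(math.exp(x * (i + 0.5) * h) for i in range(n))


def mlmc_estimate(samples_per_level, rng):
    """Multilevel Monte Carlo estimate of E[P_L(X)] for X ~ Uniform(0, 1).

    samples_per_level[l] is the number of independent samples used to
    estimate the level-l correction E[P_l - P_{l-1}] (with P_{-1} = 0).
    """
    total = 0.0
    for level, n_samples in enumerate(samples_per_level):
        acc = 0.0
        for _ in range(n_samples):
            x = rng.random()
            fine = midpoint_integral(x, level)
            # The same x is used for both levels so the correction has small variance.
            coarse = midpoint_integral(x, level - 1) if level > 0 else 0.0
            acc += fine - coarse
        total += acc / n_samples
    return total
```

A call such as `mlmc_estimate([4000, 1000, 250, 60], random.Random(1))` should land close to the exact value of this toy expectation, which is the series sum of 1/(k k!) over k >= 1, roughly 1.3179.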
Citations: 12
kdecopula: An R Package for the Kernel Estimation of Bivariate Copula Densities
Pub Date : 2016-03-14 DOI: 10.18637/JSS.V084.I07
T. Nagler
We describe the R package kdecopula (current version 0.9.0), which provides fast implementations of various kernel estimators for the copula density. Due to a variety of available plotting options it is particularly useful for the exploratory analysis of dependence structures. It can be further used for accurate nonparametric estimation of copula densities and resampling. The implementation features spline interpolation of the estimates to allow for fast evaluation of density estimates and integrals thereof. We utilize this for a fast renormalization scheme that ensures that estimates are bona fide copula densities and additionally improves the estimators' accuracy. The performance of the methods is illustrated by simulations.
Citations: 43
A Poisson process model for Monte Carlo
Pub Date : 2016-02-18 DOI: 10.7551/mitpress/10761.003.0008
Chris J. Maddison
Simulating samples from arbitrary probability distributions is a major research program of statistical computing. Recent work has shown promise in an old idea, that sampling from a discrete distribution can be accomplished by perturbing and maximizing its mass function. Yet, it has not been clearly explained how this research project relates to more traditional ideas in the Monte Carlo literature. This chapter addresses that need by identifying a Poisson process model that unifies the perturbation and accept-reject views of Monte Carlo simulation. Many existing methods can be analyzed in this framework. The chapter reviews Poisson processes and defines a Poisson process model for Monte Carlo methods. This model is used to generalize the perturbation trick to infinite spaces by constructing Gumbel processes, random functions whose maxima are located at samples over infinite spaces. The model is also used to analyze A* sampling and OS*, two methods from distinct Monte Carlo families.
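The "perturbing and maximizing" idea for discrete distributions mentioned in this abstract is commonly known as the Gumbel-max trick: add independent standard Gumbel noise to each log-mass and take the argmax. The sketch below covers only this finite case, using the standard library; it does not implement the chapter's Gumbel processes or A* sampling, and the function names are illustrative.

```python
import math
import random


def sample_gumbel(rng):
    """Draw a standard Gumbel variate via inverse transform: -log(-log(U))."""
    u = rng.random()
    while u == 0.0:  # random() may return exactly 0.0, where log is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max_sample(log_mass, rng):
    """Sample an index with probability proportional to exp(log_mass[i]).

    The log-masses may be unnormalized; only their differences matter.
    """
    perturbed = [lm + sample_gumbel(rng) for lm in log_mass]
    return max(range(len(perturbed)), key=perturbed.__getitem__)


def gumbel_max_frequencies(probs, n_draws, seed=0):
    """Empirical frequencies of repeated Gumbel-max draws, for checking the trick."""
    rng = random.Random(seed)
    log_mass = [math.log(p) for p in probs]
    counts = [0] * len(probs)
    for _ in range(n_draws):
        counts[gumbel_max_sample(log_mass, rng)] += 1
    return [c / n_draws for c in counts]
```

With enough draws, `gumbel_max_frequencies([0.1, 0.2, 0.7], 20000)` should approach the target probabilities.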
Citations: 15
Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters
Pub Date : 2016-02-16 DOI: 10.4310/SII.2016.V9.N4.A3
Nathaniel E. Helwig, Ping Ma
In the current era of big data, researchers routinely collect and analyze data of super-large sample sizes. Data-oriented statistical methods have been developed to extract information from super-large data. Smoothing spline ANOVA (SSANOVA) is a promising approach for extracting information from noisy data; however, the heavy computational cost of SSANOVA hinders its wide application. In this paper, we propose a new algorithm for fitting SSANOVA models to super-large sample data. In this algorithm, we introduce rounding parameters to make the computation scalable. To demonstrate the benefits of the rounding parameters, we present a simulation study and a real data example using electroencephalography data. Our results reveal that (using the rounding parameters) a researcher can fit nonparametric regression models to very large samples within a few seconds using a standard laptop or tablet computer.
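The abstract does not spell out how the rounding parameters work, so the following is only one plausible reading, offered as an assumption: round each predictor value to a fixed resolution so that a very large sample collapses to a small number of unique design points, then carry the counts and mean responses forward as weighted data for the fit. The function name and the one-dimensional setting are hypothetical.

```python
from collections import defaultdict


def round_and_aggregate(xs, ys, rounding):
    """Collapse (x, y) pairs onto a grid with spacing `rounding`.

    Returns a sorted list of (rounded_x, count, mean_y) tuples; a downstream
    smoother could then be fitted to the unique rounded_x values with the
    counts as weights (assumed usage, not taken from the paper).
    """
    groups = defaultdict(lambda: [0, 0.0])  # grid index -> [count, sum of y]
    for x, y in zip(xs, ys):
        key = round(x / rounding)  # integer grid index avoids float-key issues
        groups[key][0] += 1
        groups[key][1] += y
    return [
        (key * rounding, count, total / count)
        for key, (count, total) in sorted(groups.items())
    ]
```

The appeal of this kind of scheme is that the cost of the subsequent fit depends on the number of unique rounded points rather than on the raw sample size.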
Citations: 14