Latest articles in arXiv: Applications

Reducing Storage of Global Wind Ensembles with Stochastic Generators
Pub Date : 2017-02-07 DOI: 10.1214/17-AOAS1105
J. Jeong, S. Castruccio, P. Crippa, M. Genton
Wind has the potential to make a significant contribution to future energy resources. Locating the sources of this renewable energy on a global scale is, however, extremely challenging, given the difficulty of storing the very large data sets generated by modern computer models. We propose a statistical model that aims at reproducing the data-generating mechanism of an ensemble of runs via a Stochastic Generator (SG) of global annual wind data. We introduce an evolutionary spectrum approach with spatially varying parameters based on large-scale geographical descriptors such as altitude to better account for different regimes across the Earth's orography. We consider a multi-step conditional likelihood approach to estimate the parameters that explicitly accounts for nonstationary features while also balancing memory storage and distributed computation. We apply the proposed model to more than 18 million points of yearly global wind speed. The proposed SG requires orders of magnitude less storage for generating surrogate ensemble members from wind than does creating additional wind fields from the climate model, even if an effective lossy data compression algorithm is applied to the simulation output.
Citations: 31
Phenomenological forecasting of disease incidence using heteroskedastic Gaussian processes: a dengue case study
Pub Date : 2017-02-01 DOI: 10.1214/17-AOAS1090
L. Johnson, R. Gramacy, Jeremy M. Cohen, E. Mordecai, C. Murdock, Jason Rohr, S. Ryan, Anna M. Stewart-Ibarra, Daniel P. Weikel
In 2015 the US federal government sponsored a dengue forecasting competition using historical case data from Iquitos, Peru and San Juan, Puerto Rico. Competitors were evaluated on several aspects of out-of-sample forecasts, including the targets of peak week, peak incidence during that week, and total season incidence across each of several seasons. Our team was one of the top performers of that competition, outperforming all other teams in multiple targets/locales. In this paper we report on our methodology, a large component of which, surprisingly, ignores the known biology of epidemics at large (in particular, relationships between dengue transmission and environmental factors) and instead relies on flexible nonparametric nonlinear Gaussian process (GP) regression fits that "memorize" the trajectories of past seasons, and then "match" the dynamics of the unfolding season to past ones in real time. Our phenomenological approach has advantages in situations where disease dynamics are less well understood, e.g., at sites with shorter histories of disease (such as Iquitos), or where measurements and forecasts of ancillary covariates like precipitation are unavailable and/or where the strength of association with cases is as yet unknown. In particular, we show that the GP approach generally outperforms a more classical generalized linear (autoregressive) model (GLM) that we developed to utilize abundant covariate information. We illustrate variations of our method(s) on the two benchmark locales alongside a full summary of results submitted by other contest competitors.
Citations: 34
Testing the Zonal Stationarity of Spatial Point Processes: Applied to prostate tissues and trees locations
Pub Date : 2017-01-29 DOI: 10.4310/SII.2018.V11.N3.A11
Azam Saadatjouy, A. R. Taheriyoun, M. Q. Vahidi-Asl
We consider the problem of testing the stationarity and isotropy of a spatial point pattern based on the concept of local spectra. Using a logarithmic transformation, the mechanism of the proposed test is approximately identical to a simple two-factor analysis of variance procedure when the variance of residuals is known. This procedure is also used for testing stationarity in the neighborhood of a particular point of the window of observation. The same idea is used in post-hoc tests to cluster the point pattern into stationary and nonstationary sub-windows. The performance of the proposed method is examined via a simulation study and applied to real data.
Citations: 0
vtreat: a data.frame Processor for Predictive Modeling
Pub Date : 2016-11-29 DOI: 10.5281/ZENODO.1173314
N. Zumel, J. Mount
We look at common problems found in data that is used for predictive modeling tasks, and describe how to address them with the vtreat R package. vtreat prepares real-world data for predictive modeling in a reproducible and statistically sound manner. We describe the theory of preparing variables so that data has fewer exceptional cases, making it easier to safely use models in production. Common problems dealt with include: infinite values, invalid values, NA, too many categorical levels, rare categorical levels, and new categorical levels (levels seen during application, but not during training). Of special interest are techniques needed to avoid needlessly introducing undesirable nested modeling bias (which is a risk when using a data-preprocessor).
Citations: 11
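The categorical-level problems listed in the vtreat abstract above (NA, rare levels, and levels first seen at application time) can be illustrated with a small sketch. This is plain Python, not the vtreat R API: the helper names `fit_levels` and `encode`, the `min_count` threshold, and the choice to encode rare and novel levels as all-zero indicator rows are assumptions made for illustration.

```python
from collections import Counter

NA_LEVEL = "NA"  # hypothetical label used to give missing values their own level


def fit_levels(train_values, min_count=2):
    """Learn which levels get an indicator column, using training data only.

    Levels seen fewer than `min_count` times are treated as rare and dropped,
    so they receive no indicator of their own.
    """
    counts = Counter(NA_LEVEL if v is None else v for v in train_values)
    return sorted(level for level, c in counts.items() if c >= min_count)


def encode(value, kept_levels):
    """Indicator-encode one value.

    Rare levels and novel levels (never seen in training) map to all zeros,
    so application-time data cannot introduce unexpected columns.
    """
    key = NA_LEVEL if value is None else value
    return {f"lev_{level}": int(key == level) for level in kept_levels}


train = ["a", "a", "b", None, None, "c"]
kept = fit_levels(train)          # "b" and "c" are rare and get no column
row_missing = encode(None, kept)  # missing value sets the NA indicator
row_novel = encode("z", kept)     # novel level: every indicator stays 0
```

Fitting the level list on training data and reusing it unchanged at application time is what keeps the column set stable in production.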
Space and circular time log Gaussian Cox processes with application to crime event data
Pub Date : 2016-11-26 DOI: 10.1214/16-AOAS960
Shinichiro Shirota, A. Gelfand
We view the locations and times of a collection of crime events as a space-time point pattern. So, with either a nonhomogeneous Poisson process or with a more general Cox process, we need to specify a space-time intensity. For the latter, we need a random intensity, which we model as a realization of a spatio-temporal log Gaussian process. Importantly, we view time as circular, not linear, necessitating valid separable and nonseparable covariance functions over a bounded spatial region crossed with circular time. In addition, crimes are classified by crime type. Furthermore, each crime event is recorded by day of the year, which we convert to day of the week marks. The contribution here is to develop models to accommodate such data. Our specifications take the form of hierarchical models which we fit within a Bayesian framework. In this regard, we consider model comparison between the nonhomogeneous Poisson process and the log Gaussian Cox process. We also compare separable vs. nonseparable covariance specifications. Our motivating dataset is a collection of crime events for the city of San Francisco during the year 2012. We have location, hour, day of the year, and crime type for each event. We investigate models to enhance our understanding of the set of incidences.
Citations: 55
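Two steps mentioned in the crime-data abstract above lend themselves to a short sketch: converting a 2012 day of the year to a day-of-week mark, and measuring time on a circle rather than a line. The function names, the 24-hour period, the exponential covariance form, and the range parameters `phi_s` and `phi_t` are assumptions for illustration; the abstract does not state which covariance functions the authors use.

```python
import math
from datetime import date, timedelta


def day_of_week_2012(day_of_year):
    """Map a day of the year in 2012 (1..366) to a weekday, Monday=0 .. Sunday=6."""
    return (date(2012, 1, 1) + timedelta(days=day_of_year - 1)).weekday()


def circular_distance(t1, t2, period=24.0):
    """Shortest arc between two times on a circle, e.g. 23:00 and 01:00 are 2 h apart."""
    d = abs(t1 - t2) % period
    return min(d, period - d)


def separable_cov(s1, s2, t1, t2, sigma2=1.0, phi_s=1.0, phi_t=6.0):
    """Illustrative separable covariance: a spatial term times a circular-time term.

    Both factors are exponential in distance; this particular form is an
    assumption, not taken from the paper.
    """
    ds = math.dist(s1, s2)          # Euclidean distance between locations
    dt = circular_distance(t1, t2)  # arc distance between hours of the day
    return sigma2 * math.exp(-ds / phi_s) * math.exp(-dt / phi_t)


near_midnight = circular_distance(23.0, 1.0)
cov_example = separable_cov((0.0, 0.0), (0.3, 0.4), 23.0, 1.0)
```

With linear time, 23:00 and 01:00 would be 22 hours apart; the circular distance treats them as neighbours, which is the point of modelling time of day on a circle.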
Statistical Methods for Thermal Index Estimation Based on Accelerated Destructive Degradation Test Data
Pub Date : 2016-11-22 DOI: 10.1007/978-981-10-5194-4_12
Yimeng Xie, Zhongnan Jin, Yili Hong, J. V. Mullekom
Citations: 4
Earthquake Number Forecasts Testing
Pub Date : 2016-09-26 DOI: 10.1093/gji/ggx300
Y. Kagan
We study the distributions of earthquake numbers in two global catalogs: Global Centroid-Moment Tensor and Preliminary Determinations of Epicenters. These distributions are required to develop the number test for forecasts of future seismic activity rate. A common assumption is that the numbers are described by the Poisson distribution. In contrast to the one-parameter Poisson distribution, the negative-binomial distribution (NBD) has two parameters. The second parameter characterizes the clustering or over-dispersion of a process. We investigate the dependence of parameters for both distributions on the catalog magnitude threshold and on temporal subdivision of catalog duration. We find that for most cases of interest the Poisson distribution can be rejected statistically at a high significance level in favor of the NBD. Therefore we investigate whether these distributions fit the observed distributions of seismicity. For this purpose we study upper statistical moments of earthquake numbers (skewness and kurtosis) and compare them to the theoretical values for both distributions. Empirical values for the skewness and the kurtosis increase for the smaller magnitude threshold and increase with even greater intensity for small temporal subdivision of catalogs. A calculation of the NBD skewness and kurtosis levels shows rapid increase of these upper moments levels. However, the observed catalog values of skewness and kurtosis are rising even faster. This means that for small time intervals the earthquake number distribution is even more heavy-tailed than the NBD predicts. Therefore for small time intervals we propose using empirical number distributions appropriately smoothed for testing forecasted earthquake numbers.
Citations: 7
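The Poisson versus negative-binomial comparison in the earthquake-number abstract above can be sketched with a method-of-moments fit. The code assumes the parametrization in which the NBD has mean r(1-p)/p and variance r(1-p)/p^2; the counts are hypothetical and the function names are illustrative, not taken from the paper.

```python
from statistics import mean, pvariance


def dispersion_index(counts):
    """Variance-to-mean ratio: about 1 for Poisson counts, above 1 when over-dispersed."""
    return pvariance(counts) / mean(counts)


def nbd_moment_fit(counts):
    """Method-of-moments NBD fit, returning (r, p).

    Assumes mean = r(1-p)/p and variance = r(1-p)/p**2, so p = mean/variance
    and r = mean**2 / (variance - mean).
    """
    m = mean(counts)
    v = pvariance(counts)
    if v <= m:
        raise ValueError("no over-dispersion: the NBD moment fit is undefined")
    return m * m / (v - m), m / v


def poisson_skewness(lam):
    """Theoretical skewness of a Poisson distribution with rate lam."""
    return lam ** -0.5


def nbd_skewness(r, p):
    """Theoretical skewness of the NBD under the parametrization above."""
    return (2.0 - p) / (r * (1.0 - p)) ** 0.5


# Hypothetical earthquake counts per time interval, for illustration only.
counts = [3, 0, 7, 1, 12, 2, 0, 5]
index = dispersion_index(counts)
r_hat, p_hat = nbd_moment_fit(counts)
skew_gap = nbd_skewness(r_hat, p_hat) - poisson_skewness(mean(counts))
```

The single Poisson parameter fixes variance equal to the mean, while the second NBD parameter absorbs clustering; comparing theoretical skewness with the empirical value is the kind of higher-moment check the abstract describes.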
Application of Bootstrap Re-sampling Method to a Categorical Data of HIV/AIDS Spread across different Social-Economic Classes
Pub Date : 2016-09-25 DOI: 10.5923/j.statistics.20150504.04
A. O. Bello, F. Oguntolu, O. Adetutu, Nyor Ngutor, J. P. Ojedokun
This research reports on the relationship and significance of social-economic factors (age, sex, employment status) and modes of HIV/AIDS transmission to the spread of HIV/AIDS. A logistic regression model, a form of probabilistic function for binary response, was used to relate social-economic factors (age, sex, employment status) to the spread of HIV/AIDS. The statistical predictive model was used to project the likelihood response of HIV/AIDS spread in a larger population using 10,000 Bootstrap re-sampling observations.
Citations: 3
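The bootstrap re-sampling step in the abstract above can be sketched as follows. To stay self-contained, the sketch swaps the full logistic regression for an odds ratio on a single binary factor (for one binary covariate, the logistic slope is the log odds ratio). The data are hypothetical, and the function names, the 0.5 continuity correction, and the percentile interval are assumptions, not the authors' procedure.

```python
import random


def odds_ratio(records):
    """Odds ratio from (exposed, outcome) pairs coded 0/1.

    Each cell starts at 0.5 (a continuity correction) so that a resample
    with an empty cell still gives a finite ratio.
    """
    a = b = c = d = 0.5
    for exposed, outcome in records:
        if exposed and outcome:
            a += 1
        elif exposed:
            b += 1
        elif outcome:
            c += 1
        else:
            d += 1
    return (a * d) / (b * c)


def bootstrap_ci(records, stat, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample rows with replacement, recompute stat."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(records)
    stats = sorted(
        stat([records[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical data: 40 exposed (20 positive), 60 unexposed (12 positive).
data = [(1, 1)] * 20 + [(1, 0)] * 20 + [(0, 1)] * 12 + [(0, 0)] * 48
point = odds_ratio(data)
interval = bootstrap_ci(data, odds_ratio, n_boot=2_000)
```

The default of 10,000 resamples mirrors the number quoted in the abstract; the usage line uses fewer only to keep the example fast.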
Predicting human-driving behavior to help driverless vehicles drive: random intercept Bayesian Additive Regression Trees
Pub Date : 2016-09-23 DOI: 10.4310/SII.2018.V11.N4.A1
Y. V. Tan, C. Flannagan, M. Elliott
The development of driverless vehicles has spurred the need to predict human driving behavior to facilitate interaction between driverless and human-driven vehicles. Predicting human driving movements can be challenging, and poor prediction models can lead to accidents between the driverless and human-driven vehicles. We used the vehicle speed obtained from a naturalistic driving dataset to predict whether a human-driven vehicle would stop before executing a left turn. In a preliminary analysis, we found that BART produced less variable and higher AUC values compared to a variety of other state-of-the-art binary predictor methods. However, BART assumes independent observations, whereas our dataset consists of multiple observations clustered by driver. Although methods extending BART to clustered or longitudinal data are available, they lack readily available software and can only be applied to clustered continuous outcomes. We extend BART to handle correlated binary observations by adding a random intercept and use a simulation study to determine bias, root mean squared error, 95% coverage, and average length of 95% credible interval in a correlated data setting. We then successfully applied our random intercept BART model to our clustered dataset and found substantial improvements in prediction performance compared to BART and random intercept linear logistic regression.
Citations: 22
Network Inference and Community Detection, Based on Covariance Matrices, Correlations and Test Statistics from Arbitrary Distributions
Pub Date : 2016-09-06 DOI: 10.6084/M9.FIGSHARE.3807537.V1
E. Thomas
In this paper we propose a methodology for inference of binary-valued adjacency matrices from various measures of the strength of association between pairs of network nodes, or more generally pairs of variables. This strength of association can be quantified by sample covariance and correlation matrices, and more generally by test statistics and hypothesis test p-values from arbitrary distributions. Community detection methods such as block modelling typically require binary-valued adjacency matrices as a starting point. Hence, a main motivation for the methodology we propose is to obtain binary-valued adjacency matrices from such pairwise measures of strength of association between variables. The proposed methodology is applicable to large high-dimensional data sets and is based on computationally efficient algorithms. We illustrate its utility in a range of contexts and data sets.
Citations: 1
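The idea in the abstract above, turning pairwise p-values into a binary adjacency matrix, can be sketched with one standard device. The abstract does not say how the authors choose their threshold, so the Benjamini-Hochberg step-up rule used here is a swapped-in technique, and `bh_adjacency`, the `fdr` level, and the p-values are assumptions for illustration.

```python
def bh_adjacency(pvals, fdr=0.05):
    """Binary adjacency matrix from a symmetric matrix of pairwise p-values.

    Applies the Benjamini-Hochberg step-up rule to the upper triangle:
    with m pairs sorted by p-value, keep the k smallest, where k is the
    largest rank whose p-value is at most fdr * rank / m.
    """
    n = len(pvals)
    pairs = sorted((pvals[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    m = len(pairs)
    cutoff_rank = 0
    for rank, (p, _, _) in enumerate(pairs, start=1):
        if p <= fdr * rank / m:
            cutoff_rank = rank
    adjacency = [[0] * n for _ in range(n)]
    for _, i, j in pairs[:cutoff_rank]:
        adjacency[i][j] = adjacency[j][i] = 1  # undirected edge
    return adjacency


# Hypothetical p-values for four variables; the diagonal is unused.
p = [
    [1.0, 0.001, 0.5, 0.8],
    [0.001, 1.0, 0.004, 0.6],
    [0.5, 0.004, 1.0, 0.9],
    [0.8, 0.6, 0.9, 1.0],
]
adj = bh_adjacency(p)
```

The resulting 0/1 matrix is the kind of input that block-modelling community detection expects, which is the motivation the abstract gives.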