
arXiv: Applications (latest papers)

Estimating the number of casualties in the American Indian war: a Bayesian analysis using the power law distribution
Pub Date : 2017-10-04 DOI: 10.1214/17-AOAS1082
C. Gillespie
The American Indian war lasted over one hundred years, and is a major event in the history of North America. As expected, since the war commenced in the late eighteenth century, casualty records surrounding this conflict contain numerous sources of error, such as rounding and counting errors. Additionally, while major battles such as the Battle of the Little Bighorn were recorded, many smaller skirmishes were completely omitted from the records. Over the last few decades, it has been observed that the number of casualties in major conflicts follows a power law distribution. This paper places this observation within the Bayesian paradigm, which enables modelling of the different error sources and allows inferences to be made about the overall casualty numbers in the American Indian war.
Citations: 11
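To make the power law observation in the abstract above concrete, here is a minimal sketch of fitting a power law exponent. It is not the paper's Bayesian model: it assumes a continuous power law with a known lower cutoff, uses the standard maximum-likelihood estimator, and ignores rounding errors and omitted skirmishes. The function names, the cutoff `x_min` and the simulated data are all illustrative assumptions.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= x_min."""
    # Inverse-CDF sampling: x = x_min * (1 - u)^(-1 / (alpha - 1)), u in [0, 1).
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_alpha(xs, x_min):
    """Maximum-likelihood estimate of alpha using observations at or above x_min."""
    tail = [x for x in xs if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


# Simulated data only (hypothetical); these are not casualty records.
rng = random.Random(0)
data = sample_power_law(20000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat = fit_alpha(data, x_min=1.0)
print(f"estimated alpha: {alpha_hat:.3f}")
```

With a large simulated sample the estimate should land close to the true exponent. A Bayesian treatment like the one the abstract describes would instead place a prior on the exponent and add explicit models for the error sources.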
Understanding the Effect of Incentivized Advertising along the Conversion Funnel
Pub Date : 2017-09-01 DOI: 10.2139/ssrn.3714353
K. Chiong, Sha Yang, Richard Y. Chen
In an effort to combat ad annoyance in mobile apps, publishers have introduced a new ad format called "Incentivized Advertising" or "Rewarded Advertising", whereby users receive rewards in exchange for watching ads. There is much debate in the industry regarding its effectiveness. On the one hand, incentivized advertising is less intrusive and annoying, but on the other hand, users might be more interested in the rewards than in the ad content. Using a large dataset of 1 million impressions from a mobile advertising platform, and three separate quasi-experimental approaches, we find that incentivized advertising leads to lower user click-through rates, but a higher overall install rate of the advertised app. In the second part, we study the mechanism by which incentivized advertising affects users' behavior. We test the hypothesis that incentivized advertising causes a temptation effect, whereby users prefer to collect and enjoy their rewards immediately, instead of pursuing the ads. We find the temptation effect is stronger when (i) users have to wait longer before receiving the rewards and when (ii) the value of the reward is relatively larger. We further find support that incentivized advertising has a positive effect of reducing ad annoyance, an effect that is stronger for small-screen mobile devices, where advertising is more annoying. Finally, we take the publisher's perspective and quantify the overall effect on ad revenue. Our difference-in-differences estimates suggest switching to incentivized advertising would increase the publisher's revenue by $3.10 per 1,000 impressions.
Citations: 4
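The difference-in-differences estimate mentioned in the abstract above can be illustrated with the simplest two-group, two-period case. This is a sketch under stated assumptions, not the authors' specification: the function name and every number below are hypothetical, and a real analysis would use regression with controls and standard errors.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two difference-in-differences on group means.

    Returns (change in treated group) minus (change in control group).
    """
    def mean(values):
        return sum(values) / len(values)

    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical revenue per 1,000 impressions; made-up toy values.
effect = diff_in_diff(
    treated_pre=[10.0, 12.0],   # publishers before switching ad format
    treated_post=[15.0, 17.0],  # the same publishers after switching
    control_pre=[9.0, 11.0],    # publishers that never switched, early period
    control_post=[10.0, 12.0],  # publishers that never switched, late period
)
print(effect)
```

Subtracting the control group's change removes any trend shared by both groups, which is why the estimate relies on the parallel-trends assumption.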
Measuring human activity spaces from GPS data with density ranking and summary curves
Pub Date : 2017-08-16 DOI: 10.1214/19-aoas1311
Yen-Chi Chen, A. Dobra
Activity spaces are fundamental to the assessment of individuals' dynamic exposure to social and environmental risk factors associated with multiple spatial contexts that are visited during activities of daily living. In this paper we survey existing approaches for measuring the geometry, size and structure of activity spaces based on GPS data, and explain their limitations. We propose addressing these shortcomings through a nonparametric approach called density ranking, and also through three summary curves: the mass-volume curve, the Betti number curve, and the persistence curve. We introduce a novel mixture model for human activity spaces, and study its asymptotic properties. We prove that the kernel density estimator, which is currently one of the most widespread methods for measuring activity spaces, is not a stable estimator of their structure. We illustrate the practical value of our methods with a simulation study, and with a recently collected GPS dataset that comprises the locations visited by ten individuals over a six-month period.
Citations: 18
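As a rough illustration of the density ranking idea named in the abstract above, the sketch below ranks a location by the fraction of observed points whose estimated density does not exceed the density at that location. It assumes this common definition of density ranking, works in one dimension for brevity, and uses a Gaussian kernel with a hand-picked bandwidth `h`. The function names and the toy sample are hypothetical, and GPS data would need a two-dimensional kernel.

```python
import math


def kde(x, sample, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    norm = len(sample) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample) / norm


def density_ranking(x, sample, h):
    """Fraction of sample points whose estimated density is <= the density at x."""
    fx = kde(x, sample, h)
    return sum(1 for s in sample if kde(s, sample, h) <= fx) / len(sample)


# Toy one-dimensional "locations": a tight cluster plus one isolated visit.
sample = [0.0, 0.1, 0.15, 0.2, 5.0]
rank_cluster = density_ranking(0.1, sample, h=0.5)
rank_isolated = density_ranking(5.0, sample, h=0.5)
print(rank_cluster, rank_isolated)
```

The ranking always lies between 0 and 1, so frequently visited places score near 1 and rarely visited places score near 0 regardless of the scale of the underlying density.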
Theme Enrichment Analysis: A Statistical Test for Identifying Significantly Enriched Themes in a List of Stories with an Application to the Star Trek Television Franchise
Pub Date : 2017-07-19 DOI: 10.16995/dscn.316
Mikael Onsjo, Paul Sheridan
In this paper, we describe how the hypergeometric test can be used to determine whether a given theme of interest occurs in a storyset at a frequency more than would be expected by chance. By a storyset we mean simply a list of stories defined according to a common attribute (e.g., author, movement, period). The test works roughly as follows: Given a background storyset and a sub-storyset of interest, the test determines whether a given theme is over-represented in the sub-storyset, based on comparing the proportions of stories in the sub-storyset and background storyset featuring the theme. A storyset is said to be "enriched" for a theme with respect to a particular background storyset, when the theme is identified as being significantly over-represented by the test. Furthermore, we introduce here a toy dataset consisting of 280 manually themed Star Trek television franchise episodes. As a proof of concept, we use the hypergeometric test to analyze the Star Trek stories for enriched themes. The hypergeometric testing approach to theme enrichment analysis is implemented for the Star Trek thematic dataset in the R package stoRy. A related R Shiny web application can be found at this https URL.
Citations: 9
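The hypergeometric test described in the abstract above reduces to an upper-tail probability, sketched here in plain Python. This is not the stoRy package's code: the function name and argument names are assumptions, and apart from the background size of 280 episodes taken from the abstract, the counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(background_size, themed_in_background, subset_size, themed_in_subset):
    """P(X >= themed_in_subset) for X ~ Hypergeometric.

    X counts themed stories when subset_size stories are drawn without
    replacement from background_size stories, themed_in_background of
    which feature the theme.
    """
    upper = min(themed_in_background, subset_size)
    favourable = sum(
        comb(themed_in_background, i)
        * comb(background_size - themed_in_background, subset_size - i)
        for i in range(themed_in_subset, upper + 1)
    )
    return favourable / comb(background_size, subset_size)


# Hypothetical counts: 280 background stories, 20 featuring the theme,
# and a sub-storyset of 30 stories, 8 of which feature it.
p = enrichment_p_value(280, 20, 30, 8)
print(f"p-value: {p:.2e}")
```

A small p-value means the theme appears in the sub-storyset more often than random draws from the background would suggest. When many themes are tested at once, a multiple-testing correction would normally follow.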
A Matern based multivariate Gaussian random process for a consistent model of the horizontal wind components and related variables
Pub Date : 2017-07-05 DOI: 10.1175/JAS-D-16-0369.1
Rudiger Hewer, P. Friederichs, A. Hense, M. Schlather
The integration of physical relationships into stochastic models is of major interest e.g. in data assimilation. Here, a multivariate Gaussian random field formulation is introduced, which represents the differential relations of the two-dimensional wind field and related variables such as streamfunction, velocity potential, vorticity and divergence. The covariance model is based on a flexible bivariate Matern covariance function for streamfunction and velocity potential. It allows for different variances in the potentials, non-zero correlations between them, anisotropy and a flexible smoothness parameter. The joint covariance function of the related variables is derived analytically. Further, it is shown that a consistent model with non-zero correlations between the potentials and positive definite covariance function is possible. The statistical model is fitted to forecasts of the horizontal wind fields of a mesoscale numerical weather prediction system. Parameter uncertainty is assessed by a parametric bootstrap method. The estimates reveal only physically negligible correlations between the potentials. In contrast to the numerical estimator, the statistical estimator of the ratio between the variances of the rotational and divergent wind components is unbiased.
Citations: 7
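For readers unfamiliar with the Matern family used in the abstract above, the sketch below evaluates the univariate, isotropic Matern covariance for the three smoothness values that have simple closed forms. It does not reproduce the paper's bivariate, anisotropic model or the derived covariances of wind, vorticity and divergence. The parameter names `sigma2` (variance), `rho` (range) and `nu` (smoothness) are conventional but assumed here.

```python
import math


def matern(r, sigma2=1.0, rho=1.0, nu=1.5):
    """Isotropic Matern covariance at distance r for nu in {0.5, 1.5, 2.5}."""
    if r < 0:
        raise ValueError("distance must be non-negative")
    if nu == 0.5:
        # Exponential covariance: continuous but not differentiable fields.
        return sigma2 * math.exp(-r / rho)
    if nu == 1.5:
        s = math.sqrt(3.0) * r / rho
        return sigma2 * (1.0 + s) * math.exp(-s)
    if nu == 2.5:
        s = math.sqrt(5.0) * r / rho
        return sigma2 * (1.0 + s + s * s / 3.0) * math.exp(-s)
    raise ValueError("only nu = 0.5, 1.5, 2.5 are implemented in this sketch")


for nu in (0.5, 1.5, 2.5):
    values = [round(matern(r, nu=nu), 4) for r in (0.0, 0.5, 1.0, 2.0)]
    print(nu, values)
```

Larger `nu` gives smoother fields, which matters for wind modelling because vorticity and divergence are derivatives of the wind components and so require enough smoothness in the underlying potentials.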
Challenges to Estimating Contagion Effects from Observational Data
Pub Date : 2017-06-26 DOI: 10.1007/978-3-319-77332-2_3
Elizabeth L. Ogburn
Citations: 23
Quantifying the relation between performance and success in soccer
Pub Date : 2017-05-02 DOI: 10.1142/S021952591750014X
L. Pappalardo, Paolo Cintia
The availability of massive data about sports activities now offers the opportunity to quantify the relation between performance and success. In this study, we analyze more than 6,000 games and 10 million events in six European leagues and investigate this relation in soccer competitions. We discover that a team's position in a competition's final ranking is significantly related to its typical performance, as described by a set of technical features extracted from the soccer data. Moreover, we find that, while victories and defeats can be explained by the team's performance during a game, it is difficult to detect draws by using a machine learning approach. We then simulate the outcomes of an entire season of each league relying only on technical data, i.e. excluding the goals scored, and exploiting a machine learning model trained on data from past seasons. The simulation produces a team ranking (the PC ranking) which is close to the actual ranking, suggesting that a complex systems' view on soccer has the potential of revealing hidden patterns regarding the relation between performance and success.
Citations: 47
Linear regression model with a randomly censored predictor: Estimation procedures
Pub Date : 2017-03-30 DOI: 10.19080/BBOAJ.2017.01.555556
F. Atem, Roland A. Matsouaka
We consider linear regression model estimation where the covariate of interest is randomly censored. Under a non-informative censoring mechanism, one may obtain valid estimates by deleting censored observations. However, this comes at a cost of lost information and decreased efficiency, especially under heavy censoring. Other methods for dealing with censored covariates, such as ignoring censoring or replacing censored observations with a fixed number, often lead to severely biased results and are of limited practicality. Parametric methods based on maximum likelihood estimation as well as semiparametric and non-parametric methods have been successfully used in linear regression estimation with censored covariates where censoring is due to a limit of detection. In this paper, we adapt some of these methods to handle randomly censored covariates and compare them under different scenarios to recently-developed semiparametric and nonparametric methods for randomly censored covariates. Specifically, we consider both dependent and independent randomly censored mechanisms as well as the impact of using a non-parametric algorithm on the distribution of the randomly censored covariate. Through extensive simulation studies, we compare the performance of these methods under different scenarios. Finally, we illustrate and compare the methods using the Framingham Health Study data to assess the association between low-density lipoprotein (LDL) in offspring and parental age at onset of a clinically-diagnosed cardiovascular event.
Citations: 1
Modeling and Estimation for Self-Exciting Spatio-Temporal Models of Terrorist Activity
Pub Date : 2017-03-24 DOI: 10.1214/17-AOAS1112
Nicholas J. Clark, P. Dixon
Spatio-temporal hierarchical modeling is an extremely attractive way to model the spread of crime or terrorism data over a given region, especially when the observations are counts and must be modeled discretely. The spatio-temporal diffusion is placed, as a matter of convenience, in the process model allowing for straightforward estimation of the diffusion parameters through Bayesian techniques. However, this method of modeling does not allow for the existence of self-excitation, or a temporal data model dependency, that has been shown to exist in criminal and terrorism data. In this manuscript we will use existing theories on how violence spreads to create models that allow for both spatio-temporal diffusion in the process model as well as temporal diffusion, or self-excitation, in the data model. We will further demonstrate how Laplace approximations similar to their use in Integrated Nested Laplace Approximation can be used to quickly and accurately conduct inference of self-exciting spatio-temporal models allowing practitioners a new way of fitting and comparing multiple process models. We will illustrate this approach by fitting a self-exciting spatio-temporal model to terrorism data in Iraq and demonstrate how choice of process model leads to differing conclusions on the existence of self-excitation in the data and differing conclusions on how violence is spreading spatio-temporally.
Citations: 34
Bayesian Analysis of Censored Spatial Data Based on a Non-Gaussian Model
Pub Date : 2017-03-15 DOI: 10.18869/acadpub.jsri.13.2.155
V. Tadayon
In this paper, we propose a skew Gaussian-log Gaussian model for the Bayesian analysis of censored spatial data. This approach extends the skew log Gaussian model to accommodate skewness, heavy tails, and censoring, three pervasive features of spatial data. We use data augmentation and Markov chain Monte Carlo (MCMC) algorithms for posterior computation. The methodology is illustrated with simulated data and applied to a real data set. Keywords: censored data, data augmentation, non-Gaussian spatial models, outlier, unified skew Gaussian.
Citations: 7