Latest Articles in arXiv: Applications

What are the most important factors that influence the changes in London Real Estate Prices? How to quantify them?
Pub Date : 2018-02-22 DOI: 10.1453/jeb.v5i1.1609
Yiyang Gu
In recent years, the real estate industry has captured government and public attention around the world. The factors influencing real estate prices are diverse and complex. However, owing to the limitations and one-sidedness of their respective views, existing analyses have not provided a sufficient theoretical basis for house price fluctuations and the factors that drive them. The purpose of this paper is to build a housing price model that supports a scientific and objective analysis of London's real estate market trends from 1996 to 2016, and to propose countermeasures for reasonably controlling house prices. Specifically, the paper analyzes eight factors that affect house prices from two aspects, housing supply and housing demand, and identifies the factor that matters most for the increase in housing price per square meter. The high multicollinearity among these factors is addressed using principal components analysis.
Citations: 6
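The London house price abstract above handles multicollinearity with principal components analysis. As a rough illustration of that idea only, the Python sketch below extracts the first principal component of three strongly correlated predictors by power iteration on their correlation matrix. The predictor names and all numbers are made up for illustration; they are not the paper's eight factors or its data, and the paper's own procedure may differ.

```python
import math

def standardize(column):
    """Center a column and scale it to unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

def correlation_matrix(columns):
    """Sample correlation matrix of a list of equally long columns."""
    z = [standardize(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def first_component(matrix, iterations=500):
    """Leading eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    k = len(matrix)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient
    return eigenvalue, v

# Hypothetical, made-up predictors that move almost in lockstep.
population = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
income = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
mortgage_rate = [5.0, 4.6, 4.1, 3.5, 3.1, 2.4]

corr = correlation_matrix([population, income, mortgage_rate])
eigenvalue, loadings = first_component(corr)
share = eigenvalue / len(corr)  # fraction of total variance on the first component
print(f"first component explains {share:.1%} of the variance")
```

When one component carries nearly all the variance, as here, regressing the price on that component instead of the raw predictors sidesteps the collinearity, at the cost of a less direct interpretation of each original factor.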
Automated Quality Assessment of (Citizen) Weather Stations
Pub Date : 2018-02-05 DOI: 10.1553/giscience2018_01_s65
Julian Bruns, J. Riesterer, Bowen Wang, T. Riedel, M. Beigl
Today we have access to a vast amount of weather, air quality, noise or radioactivity data collected by individuals around the globe. This volunteered geographic information (VGI) often contains data of uncertain and heterogeneous quality, in particular when compared to official in-situ measurements. This limits its application, as rigorous, work-intensive data cleaning has to be performed, which reduces the amount of data and cannot be done in real time. In this paper, we propose dynamically learning the quality of individual sensors by optimizing a weighted Gaussian process regression using a genetic algorithm. We chose weather stations as our use case because they are the most common VGI measurements. The evaluation covers the south-west of Germany in August 2016, with temperature data from the Wunderground network and the Deutscher Wetterdienst (DWD), 1561 stations in total. Using a 10-fold cross-validation scheme based on the DWD ground truth, we show significant improvements in the predicted sensor readings. In our experiment we obtained a 12.5% improvement in the mean absolute error.
Citations: 3
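The weather station abstract above reports its result as a relative improvement in mean absolute error under k-fold cross-validation. The Python sketch below shows only how those two evaluation pieces fit together; it does not implement the weighted Gaussian process regression or the genetic algorithm. The temperature values and the interleaved fold assignment are assumptions chosen for illustration, not the authors' data or split.

```python
def mean_absolute_error(truth, predicted):
    """Average absolute difference between reference and predicted readings."""
    return sum(abs(t - p) for t, p in zip(truth, predicted)) / len(truth)

def relative_improvement(baseline_mae, model_mae):
    """Fraction by which the model reduces the baseline error."""
    return (baseline_mae - model_mae) / baseline_mae

def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold f tests every k-th item starting at f."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test

# Made-up temperatures in degrees Celsius at four reference stations.
truth = [20.0, 21.0, 19.0, 22.0]
baseline = [21.0, 19.0, 20.0, 24.0]   # hypothetical unweighted prediction
weighted = [20.5, 20.0, 19.5, 23.0]   # hypothetical quality-weighted prediction

baseline_mae = mean_absolute_error(truth, baseline)
weighted_mae = mean_absolute_error(truth, weighted)
improvement = relative_improvement(baseline_mae, weighted_mae)
folds = list(k_fold_indices(5, 2))
print(f"MAE {baseline_mae} -> {weighted_mae}, improvement {improvement:.1%}")
```

In a full evaluation, the model would be refit on each `train` list and scored on the matching `test` list, with the per-fold errors averaged before computing the improvement.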
Reliability-centered maintenance: analyzing failure in harvest sugarcane machine using some generalizations of the Weibull distribution
Pub Date : 2017-12-08 DOI: 10.1155/2018/1241856
P. Ramos, D. Nascimento, Camila Cocolo, M. J. Nicola, C. Alonso, Luiz Gustavo Ribeiro, A. Ennes, F. Louzada
In this study we considered five generalizations of the standard Weibull distribution to describe the lifetime of two important components of sugarcane harvesting machines. The harvesters considered in the analysis harvest an average of 20 tons of sugarcane per hour, and their malfunction may lead to major losses; an effective maintenance approach is therefore of major interest for cost savings. For the considered distributions, the mathematical background is presented. Maximum likelihood is used for parameter estimation. Further, different discrimination procedures were used to obtain the best fit for each component. Finally, we propose a maintenance schedule for the components of the harvesters using predictive analysis.
Citations: 17
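The sugarcane harvester abstract above estimates lifetime distributions by maximum likelihood. As a baseline illustration, the Python sketch below fits only the standard two-parameter Weibull distribution, not the five generalizations the paper studies, by solving the usual profile-likelihood equation for the shape with bisection. The failure times are invented and the search bounds are assumptions.

```python
import math

def shape_equation(k, times):
    """Profile score for the Weibull shape k; its root is the MLE of the shape."""
    logs = [math.log(t) for t in times]
    powered = [t ** k for t in times]
    weighted_log = sum(p * l for p, l in zip(powered, logs)) / sum(powered)
    return weighted_log - 1.0 / k - sum(logs) / len(times)

def fit_weibull(times, lo=0.01, hi=50.0, tol=1e-10):
    """Return (shape, scale) maximizing the Weibull likelihood for complete data."""
    # shape_equation is increasing in k, so bisection brackets the single root.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if shape_equation(mid, times) > 0:
            hi = mid
        else:
            lo = mid
    shape = (lo + hi) / 2.0
    scale = (sum(t ** shape for t in times) / len(times)) ** (1.0 / shape)
    return shape, scale

def reliability(t, shape, scale):
    """Probability that a component survives beyond time t."""
    return math.exp(-((t / scale) ** shape))

# Invented failure times in operating hours (not data from the paper).
failure_times = [120.0, 340.0, 560.0, 610.0, 890.0, 1020.0, 1300.0]
shape, scale = fit_weibull(failure_times)
print(f"shape={shape:.3f}, scale={scale:.1f}, R(500h)={reliability(500.0, shape, scale):.3f}")
```

A fitted shape above 1 indicates a failure rate that rises with age, which is the situation where scheduled preventive maintenance is usually worth considering.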
A Theoretical Study of Process Dependence for Standard Two-Process Serial Models and Standard Two-Process Parallel Models
Pub Date : 2017-12-02 DOI: 10.4324/9781315169903-6
Ru Zhang, Yanjun Liu, J. Townsend
In this article we differentiate and characterize the standard two-process serial models and the standard two-process parallel models by investigating the behavior of the (conditional) distributions of the total completion times and the survival functions of the intercompletion times, without assuming any particular form for the distributions of processing times. We make our argument through mathematical proofs and computational methods. We find that for the standard two-process serial models, positive dependence between the total completion times does not hold if no specific distributional forms are imposed on the processing times. By contrast, for the standard two-process parallel models the total completion times are independent. Because the nature of the process dependence differs, one can distinguish a standard two-process serial model from a standard two-process parallel model. We also find that in standard two-process parallel models, the monotonicity of the survival function of the intercompletion time of stage 2, conditional on the completion of stage 1, depends on the monotonicity of the hazard function of the processing time. We also find that the survival of the intercompletion time(s) from stage 1 to stage 2 is increasing when the ratio of hazard functions meets a certain criterion. The empirical finding that the intercompletion time grows with the number of recalled words can then be accounted for by standard parallel models. We also find that if the cumulative hazard function is concave or linear, the survival from stage 1 to stage 2 is increasing.
Citations: 3
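The serial and parallel models abstract above is phrased in terms of survival, hazard, and cumulative hazard functions. For reference, the standard textbook definitions are collected below for a continuous processing time $T$ with density $f$; the symbols $S$, $h$, and $H$ are notation assumed here and need not match the paper's.

```latex
\begin{align}
S(t) &= \Pr(T > t), \\
h(t) &= \frac{f(t)}{S(t)}, \\
H(t) &= \int_0^t h(u)\,du = -\log S(t), \\
\Pr(T > s + t \mid T > s) &= \frac{S(s+t)}{S(s)} = \exp\{-[H(s+t) - H(s)]\}.
\end{align}
```

The last line shows why the shape of $H$ matters for conditional statements: the chance of surviving a further $t$ units after time $s$ depends only on the increment of the cumulative hazard over that interval.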
Discovery of Complex Anomalous Patterns of Sexual Violence in El Salvador
Pub Date : 2017-11-17 DOI: 10.5281/zenodo.571551
Maria De-Arteaga, A. Dubrawski
When sexual violence is a product of organized crime or social imaginary, the links between sexual violence episodes can be understood as a latent structure. With this assumption in place, we can use data science to uncover complex patterns. In this paper we focus on the use of data mining techniques to unveil complex anomalous spatiotemporal patterns of sexual violence. We illustrate their use by analyzing all reported rapes in El Salvador over a period of nine years. Through our analysis, we are able to provide evidence of phenomena that, to the best of our knowledge, have not been previously reported in the literature. We devote special attention to a pattern we discover in the East, where underage victims report their boyfriends as perpetrators at anomalously high rates. Finally, we explain how such analyses could be conducted in real time, enabling early detection of emerging patterns so that law enforcement agencies and policy makers can react accordingly.
Citations: 7
Bayesian Gaussian models for interpolating large-dimensional data at misaligned areal units
Pub Date : 2017-11-10 DOI: 10.36334/modsim.2017.a2.bakar
K. Bakar
Areal-level spatial data are often large and sparse, and may come with geographical shapes that are regular or irregular (e.g., postcodes). Moreover, it is sometimes important to obtain predictive inference for regular or irregular areal shapes that are misaligned with the observed areal boundaries. For example, survey respondents may be asked for their postcode, whereas for policy-making purposes researchers are often interested in obtaining information at the SA2 (Statistical Area Level 2) level. The statistical challenge is to obtain spatial predictions at the SA2s, whose geographical boundaries may overlap those of the postcodes. The study is motivated by practical survey data obtained from the Australian National University (ANU) Poll. Here the main research question is to understand respondents' level of satisfaction with the way Australia is heading. The data are observed at 1,944 of the 2,516 available postcodes across Australia, and predictions are obtained at the 2,196 SA2s. The proposed method is also explored through a grid-based simulation study, in which data are observed on one regular grid and spatial prediction is carried out on a second regular grid whose boundaries are misaligned with the first. The real-life example with ANU Poll data addresses the situation of irregular, misaligned geographical boundaries: the model is fitted to postcode data and predictions are obtained at the SA2s. A comparison study is also performed to validate the proposed method. In this paper, a Gaussian model is constructed under a Bayesian hierarchy. The novelty lies in the development of a basis function that can address spatial sparsity and localised spatial structure. It can also address the large-dimensional spatial data modelling problem by constructing knot-based reduced-dimensional basis functions.
Citations: 7
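The misaligned areal units abstract above concerns moving values observed on one set of areas (postcodes) onto another set (SA2s). To make the misalignment problem concrete, the Python sketch below implements plain area-weighted interpolation, a naive baseline that is swapped in here for illustration and is not the Bayesian Gaussian model the paper proposes. The area codes, overlap areas, and satisfaction scores are all hypothetical.

```python
def area_weighted_interpolation(source_values, overlaps):
    """Estimate target-area values as overlap-area-weighted means of source values.

    source_values: {source_id: observed value}; unobserved sources are simply absent.
    overlaps: {target_id: {source_id: overlap area}}.
    Targets that touch no observed source are left out of the result.
    """
    estimates = {}
    for target, parts in overlaps.items():
        observed = {s: a for s, a in parts.items() if s in source_values}
        total_area = sum(observed.values())
        if total_area == 0:
            continue  # nothing observed under this target area
        weighted_sum = sum(source_values[s] * a for s, a in observed.items())
        estimates[target] = weighted_sum / total_area
    return estimates

# Hypothetical mean satisfaction scores by postcode; "2602" is unobserved.
postcode_scores = {"2600": 0.6, "2601": 0.8}

# Hypothetical overlap areas (square kilometres) between SA2s and postcodes.
sa2_overlaps = {
    "SA2_A": {"2600": 3.0, "2601": 1.0},
    "SA2_B": {"2601": 2.0, "2602": 5.0},
    "SA2_C": {"2602": 4.0},
}

estimates = area_weighted_interpolation(postcode_scores, sa2_overlaps)
print(estimates)
```

The baseline cannot say anything about a target area such as `SA2_C` that overlaps only unobserved sources, and it gives no uncertainty for the others. Those are the gaps a model-based approach with spatial structure is meant to fill.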
Finite Sample Correction for Two-Sample Inference with Sparse Covariate Adjusted Functional Data
Pub Date : 2017-11-09 DOI: 10.1016/J.JMVA.2018.04.006
Dominik Liebl
Citations: 4
Bayesian nonparametric models for biomedical data analysis
Pub Date : 2017-10-26 DOI: 10.15781/T2MP4W42K
Tianjian Zhou
In this dissertation, we develop nonparametric Bayesian models for biomedical data analysis. In particular, we focus on inference for tumor heterogeneity and inference for missing data. First, we present a Bayesian feature allocation model for tumor subclone reconstruction using mutation pairs. The key innovation lies in the use of short reads mapped to pairs of proximal single nucleotide variants (SNVs). In contrast, most existing methods use only marginal reads for unpaired SNVs. In the same context of using mutation pairs, in order to recover the phylogenetic relationship of subclones, we then develop a Bayesian treed feature allocation model. In contrast to commonly used feature allocation models, we allow the latent features to be dependent, using a tree structure to introduce dependence. Finally, we propose a nonparametric Bayesian approach to monotone missing data in longitudinal studies with non-ignorable missingness. In contrast to most existing methods, our method allows for incorporating information from auxiliary covariates and is able to capture complex structures among the response, missingness and auxiliary covariates. Our models are validated through simulation studies and are applied to real-world biomedical datasets.
Citations: 0
A Response to: 'NIST Experts Urge Caution in Use of Courtroom Evidence Presentation Method'
Pub Date : 2017-10-16 DOI: 10.2139/SSRN.3054092
G. Morrison
A press release from the National Institute of Standards and Technology (NIST) could potentially impede progress toward improving the analysis of forensic evidence and the presentation of forensic analysis results in courts in the United States and around the world. "NIST experts urge caution in use of courtroom evidence presentation method" was released on October 12, 2017, and was picked up by the this http URL news service. It argues that, except in exceptional cases, the results of forensic analyses should not be reported as "likelihood ratios". The press release, and the journal article by NIST researchers Steven P. Lund & Harri Iyer on which it is based, identify some legitimate points of concern, but make a strawman argument and reach an unjustified conclusion that throws the baby out with the bathwater.
Citations: 6
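The dispute summarized above is about reporting forensic results as likelihood ratios. For readers unfamiliar with the term, the Python sketch below shows the standard odds form of Bayes' theorem, in which the likelihood ratio is the factor that turns prior odds into posterior odds. The probabilities and prior odds are invented numbers, and the function names are assumptions made for this example.

```python
def likelihood_ratio(p_evidence_given_h1, p_evidence_given_h2):
    """How much more probable the evidence is under hypothesis 1 than under hypothesis 2."""
    return p_evidence_given_h1 / p_evidence_given_h2

def posterior_odds(prior_odds, lr):
    """Odds form of Bayes' theorem: posterior odds = likelihood ratio * prior odds."""
    return lr * prior_odds

def odds_to_probability(odds):
    """Convert odds in favour of a hypothesis into a probability."""
    return odds / (1.0 + odds)

# Invented numbers: the evidence is 100 times more probable under hypothesis 1.
lr = likelihood_ratio(0.9, 0.009)
prior = 0.01                      # hypothetical prior odds of 1 to 100
posterior = posterior_odds(prior, lr)
probability = odds_to_probability(posterior)
print(f"LR={lr:.1f}, posterior odds={posterior:.2f}, probability={probability:.2f}")
```

The separation of roles is the point of the framework: the forensic analyst supplies the likelihood ratio, while the prior odds belong to the trier of fact.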
Multivariate Generalized Linear Mixed Models for Joint Estimation of Sporting Outcomes
Pub Date : 2017-10-15 DOI: 10.26398/IJAS.0030-008
Jennifer Broatch, Andrew T. Karl
This paper explores improvements in prediction accuracy and inference capability when allowing for potential correlation in team-level random effects across multiple game-level responses from different assumed distributions. First-order and fully exponential Laplace approximations are used to fit normal-binary and Poisson-binary multivariate generalized linear mixed models with non-nested random effects structures. We have built these models into the R package mvglmmRank, which is used to explore several seasons of American college football and basketball data.
Citations: 4