Latest papers in arXiv: Methodology

Improved Chebyshev inequality: new probability bounds with known supremum of PDF
Pub Date: 2018-08-31 DOI: 10.31219/osf.io/h9zfn
T. Nishiyama
In this paper, we derive new probability bounds for Chebyshev's inequality when the supremum of the probability density function is known. This result holds for one-dimensional or multivariate continuous probability distributions with finite mean and variance (covariance matrix). We also show that a similar result holds for specific discrete probability distributions.
Citations: 1
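For orientation, the classical inequality that this paper sharpens states that P(|X − μ| ≥ kσ) ≤ 1/k² for any distribution with finite mean μ and variance σ². The short Python sketch below is our own illustration: the function names are hypothetical, and it does not implement the paper's improved bound, whose exact form the abstract does not give. It compares the classical bound with the exact two-sided tail of a standard normal, which shows how much slack the bound leaves when more is known about the density.

```python
import math

def chebyshev_bound(k: float) -> float:
    """Classical Chebyshev bound on P(|X - mu| >= k * sigma), valid for k > 0."""
    if k <= 0:
        raise ValueError("k must be positive")
    # The bound is trivial (>= 1) for k <= 1, so cap it at 1.
    return min(1.0, 1.0 / k**2)

def normal_two_sided_tail(k: float) -> float:
    """Exact P(|Z| >= k) for a standard normal Z, i.e. 2 * (1 - Phi(k))."""
    return math.erfc(k / math.sqrt(2.0))

for k in (1.5, 2.0, 3.0):
    print(f"k={k}: Chebyshev <= {chebyshev_bound(k):.4f}, "
          f"normal tail = {normal_two_sided_tail(k):.4f}")
```

The gap between the two columns is the room that extra distributional information, such as a known bound on the density, can exploit.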
The Sliding Window Discrete Fourier Transform
Pub Date: 2018-07-20 DOI: 10.1184/R1/8191937.V1
Lee F. Richardson, W. Eddy
The discrete Fourier transform (DFT) is a widely used tool across science and engineering. Nevertheless, the DFT assumes that the frequency characteristics of a signal remain constant over time, and is unable to detect local changes. Researchers beginning with Gabor (1946) addressed this shortcoming by inventing methods to obtain time-frequency representations, and this thesis focuses on one such method: the Sliding Window Discrete Fourier Transform (SWDFT). Whereas the DFT operates on an entire signal, the SWDFT takes an ordered sequence of smaller DFTs on contiguous subsets of a signal. The SWDFT is a fundamental tool in time-frequency analysis, and is used in a variety of applications, such as spectrogram estimation, image enhancement, neural networks, and more. This thesis studies the SWDFT from three perspectives: algorithmic, statistical, and applied. Algorithmically, we introduce the Tree SWDFT algorithm, and extend it to arbitrary dimensions. Statistically, we derive the marginal distribution and covariance structure of SWDFT coefficients for white noise signals, which allows us to characterize the SWDFT coefficients as a Gaussian process with a known covariance. We also propose a localized version of cosine regression, and show that the approximate maximum likelihood estimate of the frequency parameter in this model is the maximum SWDFT coefficient over all possible window sizes. From an applied perspective, we introduce a new algorithm to decompose signals with multiple non-stationary periodic components, called matching demodulation. We demonstrate the utility of matching demodulation in an analysis of local field potential recordings from a neuroscience experiment.
Citations: 0
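To make the definition concrete: with window size n, the SWDFT of a length-N signal is the sequence of N − n + 1 length-n DFTs of consecutive windows. The Python/NumPy sketch below computes it two ways: directly, at O(n log n) per window, and with the classical one-step sliding-DFT update X_j[k] = e^{2πik/n}(X_{j−1}[k] − x_{j−1} + x_{j+n−1}), at O(n) per window. Neither is the thesis's Tree SWDFT algorithm, and the function names are hypothetical.

```python
import numpy as np

def swdft_naive(x, n):
    """Row j holds the length-n DFT of the window x[j:j+n] (NumPy sign convention)."""
    x = np.asarray(x, dtype=float)
    return np.array([np.fft.fft(x[j:j + n]) for j in range(len(x) - n + 1)])

def swdft_recursive(x, n):
    """Same output, updating each window's DFT from the previous one in O(n)."""
    x = np.asarray(x, dtype=float)
    num_windows = len(x) - n + 1
    out = np.empty((num_windows, n), dtype=complex)
    out[0] = np.fft.fft(x[:n])
    # Phase rotation e^{2*pi*i*k/n} for each frequency index k.
    twiddle = np.exp(2j * np.pi * np.arange(n) / n)
    for j in range(1, num_windows):
        # Drop the oldest sample x[j-1], add the newest x[j+n-1], then rotate phases.
        out[j] = twiddle * (out[j - 1] - x[j - 1] + x[j + n - 1])
    return out
```

The recursive form can accumulate floating-point error over long signals, so in practice a window is sometimes recomputed from scratch periodically.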
JMASM 52: Extremely Efficient Permutation and Bootstrap Hypothesis Tests Using R
Pub Date: 2018-06-28 DOI: 10.22237/jmasm/1604189940
C. Chatzipantsiou, Marios Dimitriadis, M. Papadakis, M. Tsagris
Resampling-based statistical tests are known to be computationally heavy, but reliable when only small samples are available. Despite their nice theoretical properties, little effort has been put into making them efficient. In this paper we treat the cases of the Pearson correlation coefficient and the two-independent-samples t-test. We propose a highly computationally efficient method for calculating permutation-based p-values in these two cases. The method is general and can be applied or adapted to other similar two-sample-mean or two-mean-vector cases.
Citations: 2
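One generic way to make permutation p-values for Pearson's r cheap can be sketched as follows. Permuting y does not change its mean or standard deviation, so each permuted correlation reduces to a dot product, and all permutations can be evaluated with a single matrix product. This is our own Python/NumPy illustration of that observation; it is not the authors' R code, and the function name is hypothetical.

```python
import numpy as np

def perm_pvalue_pearson(x, y, n_perm=9999, seed=0):
    """Return (r, two-sided permutation p-value) for Pearson's correlation."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Standardize once; permutations of ys keep mean 0 and SD 1.
    xs = (x - x.mean()) / x.std()
    ys = (y - y.mean()) / y.std()
    r_obs = xs @ ys / n
    # Each row of `perms` is an independent random permutation of ys.
    perms = np.array([rng.permutation(ys) for _ in range(n_perm)])
    r_perm = perms @ xs / n
    # Count permuted |r| at least as extreme, with the usual +1 correction.
    p_value = (np.sum(np.abs(r_perm) >= abs(r_obs)) + 1) / (n_perm + 1)
    return r_obs, p_value
```

The same idea carries over to the two-sample t-test, where permuting group labels leaves the pooled sum and sum of squares unchanged.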
A functional approach to estimation of the parameters of generalized negative binomial and gamma distributions
Pub Date: 2018-06-27 DOI: 10.1007/978-3-319-99447-5_30
A. Gorshenin, V. Korolev
Citations: 2
A nonparametric spatial test to identify factors that shape a microbiome
Pub Date: 2018-06-16 DOI: 10.1214/19-aoas1262
Susheela P. Singh, Ana-Maria Staicu, R. Dunn, N. Fierer, B. Reich
The advent of high-throughput sequencing technologies has made data from DNA material readily available, leading to a surge of microbiome-related research establishing links between markers of microbiome health and specific outcomes. However, to harness the power of microbial communities we must understand not only how they affect us, but also how they can be influenced to improve outcomes. This area has been dominated by methods that reduce community composition to summary metrics, which can fail to fully exploit the complexity of community data. Recently, methods have been developed to model the abundance of taxa in a community, but they can be computationally intensive and do not account for spatial effects underlying microbial settlement. These spatial effects are particularly relevant in the microbiome setting because we expect communities that are close together to be more similar than those that are far apart. In this paper, we propose a flexible Bayesian spike-and-slab variable selection model for presence-absence indicators that accounts for spatial dependence and cross-dependence between taxa while reducing dimensionality in both directions. We show by simulation that in the presence of spatial dependence, popular distance-based hypothesis testing methods fail to preserve their advertised size, and the proposed method improves variable selection. Finally, we present an application of our method to an indoor fungal community found in homes across the contiguous United States.
Citations: 2
Confidence ellipsoids for regression coefficients by observations from a mixture
Pub Date: 2018-06-04 DOI: 10.15559/18-VMSTA105
V. Miroshnichenko, R. Maiboroda
Confidence ellipsoids for linear regression coefficients are constructed from observations from a mixture with varying concentrations. Two approaches are discussed. The first is a nonparametric approach based on the weighted least squares technique. The second is an approximate maximum likelihood estimation that uses the EM algorithm to compute the estimates.
Citations: 3
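The shape of the weighted-least-squares approach can be sketched generically: fit β̂ by weighted least squares, estimate its covariance Σ̂, and take the confidence ellipsoid as {b : (b − β̂)ᵀ Σ̂⁻¹ (b − β̂) ≤ c} with c a χ² quantile. In this Python/NumPy sketch the weight vector w is simply taken as input; the mixture-specific weights the paper derives from the mixing concentrations are not reproduced. The sandwich covariance is a generic heteroscedasticity-robust choice we assume for illustration, and the function names are hypothetical.

```python
import numpy as np

def wls_ellipsoid(X, y, w):
    """Weighted LS estimate and a sandwich covariance estimate for it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    XtW = X.T * w                      # X^T W without forming diag(w)
    A = XtW @ X
    beta = np.linalg.solve(A, XtW @ y)
    resid = y - X @ beta
    # "Meat": sum_i w_i^2 e_i^2 x_i x_i^T
    meat = (X.T * (w * resid) ** 2) @ X
    A_inv = np.linalg.inv(A)
    return beta, A_inv @ meat @ A_inv

def in_ellipsoid(b, beta, cov, level=0.95):
    """Is b inside the ellipsoid? Closed-form quantile, so 2 coefficients only."""
    beta = np.asarray(beta, dtype=float)
    if beta.size != 2:
        raise ValueError("this sketch needs exactly 2 coefficients")
    d = np.asarray(b, dtype=float) - beta
    stat = d @ np.linalg.solve(cov, d)
    # For 2 degrees of freedom, P(chi^2 <= c) = 1 - exp(-c/2), so c = -2 log(1 - level).
    return bool(stat <= -2.0 * np.log(1.0 - level))
```

With more than two coefficients, the threshold would instead come from a general χ² quantile function.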
Bandwidth selection for kernel density estimators of multivariate level sets and highest density regions
Pub Date: 2018-06-03 DOI: 10.1214/18-EJS1501
Charles R. Doss, Guangwei Weng
We consider bandwidth matrix selection for kernel density estimators (KDEs) of density level sets in $\mathbb{R}^d$, $d \ge 2$. We also consider estimation of highest density regions, which differs from estimating level sets in that one specifies the probability content of the set rather than specifying the level directly. This complicates the problem. Bandwidth selection for KDEs is well studied, but the goal of most methods is to minimize a global loss function for the density or its derivatives. The loss we consider here is instead the measure of the symmetric difference between the true set and the estimated set. We derive an asymptotic approximation to the corresponding risk. The approximation depends on unknown quantities which can be estimated, and the approximation can then be minimized to yield a choice of bandwidth, which we show performs well in simulations. We provide an R package, lsbs, for implementing our procedure.
Citations: 12
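The difference between a level set and a highest density region can be sketched in Python/NumPy. A level set {x : f̂(x) ≥ c} fixes c directly, while an HDR fixes its probability content and must find c. A common plug-in rule takes c as the (1 − prob) quantile of f̂ evaluated at the sample points. This Gaussian KDE uses a user-supplied bandwidth matrix H; it does not implement the paper's bandwidth selector or the lsbs package, and the function names are hypothetical.

```python
import numpy as np

def kde(points, data, H):
    """Gaussian KDE with bandwidth matrix H, evaluated at `points` (m x d)."""
    data = np.asarray(data, dtype=float)
    points = np.asarray(points, dtype=float)
    n, d = data.shape
    H_inv = np.linalg.inv(H)
    norm = 1.0 / (n * np.sqrt((2 * np.pi) ** d * np.linalg.det(H)))
    diff = points[:, None, :] - data[None, :, :]          # shape (m, n, d)
    q = np.einsum("mni,ij,mnj->mn", diff, H_inv, diff)    # squared Mahalanobis distances
    return norm * np.exp(-0.5 * q).sum(axis=1)

def hdr_threshold(data, H, prob=0.9):
    """Level c such that {f_hat >= c} has estimated probability content ~prob."""
    dens = kde(data, data, H)
    return np.quantile(dens, 1.0 - prob)
```

Because c depends on f̂ itself, the bandwidth affects both the estimated density and the threshold, which is one reason the HDR problem is harder than the fixed-level one.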
Anchored Bayesian Gaussian mixture models
Pub Date: 2018-05-21 DOI: 10.1214/20-ejs1756
D. Kunkel, M. Peruggia
Finite mixtures are a flexible modeling tool for irregularly shaped densities and samples from heterogeneous populations. When modeling with mixtures using an exchangeable prior on the component features, the component labels are arbitrary and are indistinguishable in posterior analysis. This makes it impossible to attribute any meaningful interpretation to the marginal posterior distributions of the component features. We propose a model in which a small number of observations are assumed to arise from some of the labeled component densities. The resulting model is not exchangeable, allowing inference on the component features without post-processing. Our method assigns meaning to the component labels at the modeling stage and can be justified as a data-dependent informative prior on the labelings. We show that our method produces interpretable results, often (but not always) similar to those resulting from relabeling algorithms, with the added benefit that the marginal inferences originate directly from a well specified probability model rather than a post hoc manipulation. We provide asymptotic results leading to practical guidelines for model selection that are motivated by maximizing prior information about the class labels, and we demonstrate our method on real and simulated data.
Citations: 7
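The label-switching issue described above can be seen directly in a small Python/NumPy sketch. In an exchangeable mixture the log-likelihood is unchanged when component labels are permuted (weights, means, and SDs together). Fixing even one observation to a labeled component breaks that symmetry. The function name and the dict representation of anchors are hypothetical. Treating an anchored point as contributing only its assigned component's density is a simplifying assumption, not the paper's full Bayesian model, which also places priors on the parameters.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def mixture_loglik(x, weights, means, sds, anchors=None):
    """Log-likelihood of a univariate Gaussian mixture.

    anchors: optional dict {observation index: component index}; an anchored
    observation contributes only its assigned component's density
    (a simplifying assumption for illustration).
    """
    x = np.asarray(x, dtype=float)
    weights, means, sds = (np.asarray(a, dtype=float) for a in (weights, means, sds))
    anchors = anchors or {}
    total = 0.0
    for i, xi in enumerate(x):
        if i in anchors:
            k = anchors[i]
            total += np.log(normal_pdf(xi, means[k], sds[k]))
        else:
            total += np.log(np.sum(weights * normal_pdf(xi, means, sds)))
    return total
```

Without anchors, swapping the two components leaves the value unchanged. With observation 0 anchored to component 0, the swap changes it, so the labels carry meaning.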
BART with targeted smoothing: An analysis of patient-specific stillbirth risk
Pub Date: 2018-05-19 DOI: 10.1214/19-aoas1268
Jennifer Starling, Jared S. Murray, C. Carvalho, R. Bukowski, J. Scott
This article introduces BART with Targeted Smoothing, or tsBART, a new Bayesian tree-based model for nonparametric regression. The goal of tsBART is to introduce smoothness over a single target covariate t, while not necessarily requiring smoothness over other covariates x. TsBART is based on the Bayesian Additive Regression Trees (BART) model, an ensemble of regression trees. TsBART extends BART by parameterizing each tree's terminal nodes with smooth functions of t, rather than independent scalars. Like BART, tsBART captures complex nonlinear relationships and interactions among the predictors. But unlike BART, tsBART guarantees that the response surface will be smooth in the target covariate. This improves interpretability and helps regularize the estimate. After introducing and benchmarking the tsBART model, we apply it to our motivating example: pregnancy outcomes data from the National Center for Health Statistics. Our aim is to provide patient-specific estimates of stillbirth risk across gestational age (t), based on maternal and fetal risk factors (x). Obstetricians expect stillbirth risk to vary smoothly over gestational age, but not necessarily over other covariates, and tsBART has been designed precisely to reflect this structural knowledge. The results of our analysis show the clear superiority of the tsBART model for quantifying stillbirth risk, thereby providing patients and doctors with better information for managing the risk of perinatal mortality. All methods described here are implemented in the R package tsbart.
Citations: 33
Learning Gene Regulatory Networks with High-Dimensional Heterogeneous Data
Pub Date: 2018-05-07 DOI: 10.1007/978-3-319-99389-8_15
B. Jia, F. Liang
Citations: 1