
Statistical Modelling: latest articles

Spatial smoothing revisited: An application to rental data in Munich
IF 1 · Zone 4 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2023-08-18 · DOI: 10.1177/1471082x231158465
L. Fahrmeir, G. Kauermann, G. Tutz, Michael Windmann
Spatial smoothing makes use of spatial information to obtain better estimates in regression models. In particular, flexible smoothing with B-splines and penalties, as popularized by Eilers and Marx (1996), provides strong tools for incorporating available spatial information. We consider alternative smoothing methods in spatial additive regression and employ them to analyse rental data in Munich. The first method applies tensor product P-splines to the geolocation of apartments, measured on a continuous scale through the centroid of the quarter in which an apartment is located. The alternative approach exploits the neighbourhood structure of districts on a discrete scale, where each district consists of a set of neighbouring quarters. The discrete modelling approach yields smooth estimates when using ridge-type penalties, but can also enforce spatial clustering of districts with a homogeneous structure when using Lasso-type penalties.
Citations: 1
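The ridge-type discrete approach described above can be illustrated with a minimal sketch: raw district effects y are pulled towards their neighbours by minimizing Σ(y_i − b_i)² + λ Σ_{i~j}(b_i − b_j)², which amounts to solving (I + λL)b = y, where L is the graph Laplacian of the neighbourhood structure. The function names, the four-district toy map and the rent values are hypothetical; the paper embeds such a penalty in a full additive regression model, and the Lasso-type (clustering) variant is not shown.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [row[n] for row in m]


def smooth_districts(y, neighbours, lam):
    """Ridge-type neighbourhood smoothing of raw district effects y (hypothetical helper).

    Minimizes sum_i (y_i - b_i)^2 + lam * sum_(i, j) (b_i - b_j)^2, where each
    neighbouring pair (i, j) is listed once. The minimizer solves (I + lam * L) b = y.
    """
    n = len(y)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in neighbours:  # add lam * L, with L the graph Laplacian
        a[i][i] += lam
        a[j][j] += lam
        a[i][j] -= lam
        a[j][i] -= lam
    return solve(a, y)


# Hypothetical map: four districts in a row (0-1-2-3) with raw mean rents per m^2.
raw = [10.0, 14.0, 9.0, 13.0]
edges = [(0, 1), (1, 2), (2, 3)]
print(smooth_districts(raw, edges, lam=0.0))    # no penalty: raw values returned
print(smooth_districts(raw, edges, lam=100.0))  # strong penalty: all close to the mean 11.5
```

Because every row of L sums to zero, the penalty leaves the overall mean of the district effects unchanged and only shrinks differences between neighbours.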
Tensor product P-splines using a sparse mixed model formulation
IF 1 · Zone 4 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2023-08-18 · DOI: 10.1177/1471082x231178591
M. Boer
A new approach to representing P-splines as a mixed model is presented. The corresponding matrices are sparse, which allows the new approach to find the optimal values of the penalty parameters in a computationally efficient manner. Although the new mixed model P-spline formulation is similar to the original P-splines, a key difference is that the fixed effects are modelled explicitly and extra constraints are added to the random part of the model. An important feature that keeps the entire computation fast is a sparse implementation of the Automated Differentiation of the Cholesky algorithm. Two examples show that the new approach is fast compared with existing methods. The methodology has been implemented in the R package LMMsolver, available on CRAN (https://CRAN.R-project.org/package=LMMsolver).
Citations: 0
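A small check makes the point about explicitly modelled fixed effects concrete for the common second-order difference penalty λ‖Dθ‖² of P-splines: D annihilates constant and linear coefficient sequences, so that part of the fit is unpenalized and, in mixed-model reformulations, is usually carried by fixed effects while the penalized remainder becomes random. This sketch only inspects the null space of the penalty under that assumption; the helper names are hypothetical, and it is not the sparse formulation or the API of LMMsolver.

```python
def difference_matrix(n_coef, order=2):
    """Return the order-th difference matrix D with shape (n_coef - order) x n_coef."""
    d = [[1.0 if i == j else 0.0 for j in range(n_coef)] for i in range(n_coef)]
    for _ in range(order):
        # each pass replaces row i by (row i+1 - row i), i.e. one more difference
        d = [[d[i + 1][j] - d[i][j] for j in range(n_coef)] for i in range(len(d) - 1)]
    return d


def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


D = difference_matrix(8, order=2)
const = [1.0] * 8
linear = [float(k) for k in range(8)]
quadratic = [float(k * k) for k in range(8)]
print(matvec(D, const))      # all zeros: constants are not penalized
print(matvec(D, linear))     # all zeros: linear trends are not penalized
print(matvec(D, quadratic))  # all 2.0: curvature is penalized
```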
Linear or smooth? Enhanced model choice in boosting via deselection of base-learners
IF 1 · Zone 4 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2023-08-18 · DOI: 10.1177/1471082x231170045
A. Mayr, T. Wistuba, Jan Speller, F. Gudé, B. Hofner
The type of effect (e.g., linear or non-linear) of a covariate in a regression model can be specified based on graphical assessment, subject-matter knowledge, or data-driven model choice procedures. For the latter, we present a boosting approach that is available for a very large number of model classes. Boosting is an indirect regularization technique that leads to variable selection and can also easily incorporate non-linear or smooth effects. Furthermore, the algorithm can be adapted to select automatically whether a continuous variable is modelled with a smooth or a linear effect. We enhance this model choice procedure by trying to compensate for its inherent bias towards the more complex effect, incorporating a pragmatic and simple deselection technique that was originally developed for enhanced variable selection. We illustrate our approach by analysing T3 thyroid hormone levels from a larger Galician cohort and investigate its performance in a simulation study.
Citations: 0
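The deselection idea can be sketched with component-wise L2 boosting that uses only linear base-learners: after boosting, base-learners whose share of the total risk reduction falls below a threshold τ are dropped, and the model is refitted with the rest. The threshold τ = 0.01, the learning rate, mstop, the helper names and the simulated data are illustrative assumptions; the paper's choice between linear and smooth base-learners for the same covariate is not reproduced here.

```python
import random


def centre(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def boost(xs, y, mstop=200, nu=0.1):
    """Component-wise L2 boosting with one linear base-learner per centred covariate.

    Returns the coefficients and the cumulative risk (RSS) reduction per base-learner.
    """
    p = len(xs)
    coef, risk_red = [0.0] * p, [0.0] * p
    u = list(y)  # residuals = negative gradient of the L2 loss
    ss = [sum(v * v for v in x) for x in xs]
    for _ in range(mstop):
        # least-squares fit of each base-learner to the current residuals
        fits = [sum(a * b for a, b in zip(x, u)) / s for x, s in zip(xs, ss)]
        # a step reduces the RSS by nu * (2 - nu) * fit^2 * ss, so maximize fit^2 * ss
        j = max(range(p), key=lambda k: fits[k] ** 2 * ss[k])
        old_rss = sum(v * v for v in u)
        u = [ui - nu * fits[j] * xj for ui, xj in zip(u, xs[j])]
        coef[j] += nu * fits[j]
        risk_red[j] += old_rss - sum(v * v for v in u)
    return coef, risk_red


def deselect(risk_red, tau=0.01):
    """Indices of base-learners whose share of the total risk reduction is >= tau."""
    total = sum(risk_red)
    return [j for j, r in enumerate(risk_red) if r / total >= tau]


# Hypothetical data: y depends linearly on x0 only; x1 is pure noise.
rng = random.Random(1)
n = 200
x0 = centre([rng.gauss(0, 1) for _ in range(n)])
x1 = centre([rng.gauss(0, 1) for _ in range(n)])
y = centre([2.0 * a + rng.gauss(0, 0.5) for a in x0])
xs = [x0, x1]

coef, risk_red = boost(xs, y)
kept = deselect(risk_red)
coef_kept, _ = boost([xs[j] for j in kept], y)  # refit with the kept base-learners only
print(kept, coef_kept)
```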
Joint modelling of non-crossing additive quantile regression via constrained B-spline varying coefficients
IF 1 · Zone 4 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2023-08-14 · DOI: 10.1177/1471082x231181734
V. Muggeo, G. Sottile, G. Cilluffo
We present a unified framework that fits the entire quantile process, that is, it estimates multiple non-crossing quantile curves simultaneously. The framework assumes that each regression parameter varies smoothly across the percentile direction according to B-splines whose coefficients obey appropriate restrictions. Multiple linear and penalized smooth terms are allowed, and the corresponding tuning parameters are estimated efficiently as part of the model fitting. Monotonicity and concavity constraints on the smoothed relationships are also easily accommodated in the framework. Simulation results provide evidence that our proposal performs well statistically relative to competitors while guaranteeing the non-crossing property at a modest computational load. Analyses of a real dataset on vocabulary size growth illustrate the model's capability in practice.
Citations: 0
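Why restricted B-spline coefficients can prevent crossing is easy to check numerically. In this sketch each coefficient curve β(τ) = Σ_k θ_k B_k(τ) uses a clamped cubic B-spline basis in τ; if the θ_k are non-decreasing, β(τ) is non-decreasing, and for a covariate x ≥ 0 the quantile curve Q(τ | x) = β0(τ) + x β1(τ) is then non-decreasing in τ, so curves for different τ cannot cross. The knots, the coefficient values, the helper names and the restriction to x ≥ 0 are assumptions for illustration; the paper's exact constraints and its estimation procedure are not reproduced.

```python
def bspline(i, p, t, knots):
    """Cox-de Boor recursion: value of the i-th B-spline of degree p at t."""
    if p == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    out = 0.0
    d1 = knots[i + p] - knots[i]
    if d1 > 0:
        out += (t - knots[i]) / d1 * bspline(i, p - 1, t, knots)
    d2 = knots[i + p + 1] - knots[i + 1]
    if d2 > 0:
        out += (knots[i + p + 1] - t) / d2 * bspline(i + 1, p - 1, t, knots)
    return out


DEGREE = 3
KNOTS = [0.0] * 4 + [0.25, 0.5, 0.75] + [1.0] * 4  # clamped cubic basis on [0, 1]
N_BASIS = len(KNOTS) - DEGREE - 1  # 7 basis functions


def coef_curve(theta, tau):
    """beta(tau) = sum_k theta_k * B_k(tau)."""
    return sum(th * bspline(k, DEGREE, tau, KNOTS) for k, th in enumerate(theta))


def quantile(tau, x, theta0, theta1):
    """Q(tau | x) = beta0(tau) + x * beta1(tau)."""
    return coef_curve(theta0, tau) + x * coef_curve(theta1, tau)


# Hypothetical non-decreasing spline coefficients for intercept and slope.
theta0 = [-2.0, -1.2, -0.5, 0.0, 0.4, 1.1, 2.0]
theta1 = [0.1, 0.2, 0.2, 0.3, 0.5, 0.6, 0.9]
grid = [k / 100 for k in range(1, 100)]
non_crossing = True
for x in (0.0, 0.5, 2.0):
    q = [quantile(t, x, theta0, theta1) for t in grid]
    non_crossing = non_crossing and all(b >= a for a, b in zip(q, q[1:]))
print("non-crossing on the grid:", non_crossing)
```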
Bayesian semiparametric mixed effects proportional hazards model for clustered partly interval-censored data
IF 1 · Zone 4 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2023-07-24 · DOI: 10.1177/1471082x231165559
Chun Pan, B. Cai
Clustered partly interval-censored survival data arise naturally in many medical and epidemiological studies. We propose a Bayesian semiparametric approach for fitting a mixed effects proportional hazards (PH) model to clustered partly interval-censored data. The proposed method allows not only for a random intercept, as most frailty models for clustered survival data do, but also for random effects of covariates. We assume a normal prior for each random intercept/random effect, given the instability of a gamma prior for a frailty in this situation. Simulation studies with data generated from both a mixed effects PH model and a mixed effects accelerated failure time model are conducted to evaluate the performance of the proposed method and to compare it with the three methods currently available in the literature. The application of the proposed approach is illustrated by analysing progression-free survival data from a phase III metastatic colorectal cancer clinical trial.
Citations: 0
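Partly interval-censored data mix exact event times with interval- and right-censored ones, and under a PH model with linear predictor η = x'β + z'b_i (b_i being the cluster's random effects) each type contributes a different likelihood term: h(t)S(t) for an exact time, S(L) − S(R) for an event in (L, R], and S(L) for right censoring. The sketch below assumes, purely as a placeholder, a Weibull baseline hazard with hypothetical shape and scale; the paper's baseline is semiparametric, and its priors and MCMC sampler are not shown.

```python
import math


def cum_hazard(t, shape=1.5, scale=10.0):
    """Placeholder Weibull baseline cumulative hazard H0(t) = (t / scale)^shape."""
    return (t / scale) ** shape


def hazard(t, shape=1.5, scale=10.0):
    """Corresponding baseline hazard h0(t) = dH0/dt."""
    return shape / scale * (t / scale) ** (shape - 1)


def survival(t, eta):
    """S(t | eta) = exp(-H0(t) * exp(eta)) under proportional hazards."""
    if math.isinf(t):
        return 0.0
    return math.exp(-cum_hazard(t) * math.exp(eta))


def contribution(left, right, eta):
    """Likelihood contribution of one subject with linear predictor eta = x'beta + z'b_i.

    left == right : exact event time, density h0(t) * exp(eta) * S(t)
    right == inf  : right-censored at left, S(left)
    otherwise     : event in (left, right], S(left) - S(right)
    """
    if left == right:
        return hazard(left) * math.exp(eta) * survival(left, eta)
    return survival(left, eta) - survival(right, eta)


eta = 0.3 + (-0.2)  # hypothetical x'beta = 0.3 and random intercept b_i = -0.2
print(contribution(4.0, 4.0, eta))       # exact event at t = 4
print(contribution(2.0, 6.0, eta))       # event known to lie in (2, 6]
print(contribution(8.0, math.inf, eta))  # right-censored at t = 8
```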
Robust clustering based on finite mixture of multivariate fragmental distributions
IF 1 · Zone 4 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2023-06-01 · DOI: 10.1177/1471082X211048660
M. Maleki, G. McLachlan, Sharon X. Lee
A flexible class of multivariate distributions, called scale mixtures of fragmental normal (SMFN) distributions, is introduced. Its extension to a finite mixture of SMFN (FM-SMFN) distributions is also proposed. The SMFN family is convenient and effective for modelling data with skewness, discrepant observations and population heterogeneity. It also has other desirable properties, including an analytically tractable density and ease of computation for simulation and parameter estimation. A stochastic representation of the SMFN distribution is given, followed by a hierarchical representation that aids parameter estimation, the derivation of statistical properties and simulation. Maximum likelihood estimation of the FM-SMFN distribution via the expectation–maximization (EM) algorithm is outlined, and the clustering performance of the proposed mixture model is then illustrated on simulated and real datasets. In particular, the ability of FM-SMFN distributions to model data generated from various well-known families is demonstrated.
Statistical Modelling, vol. 23, pp. 247–272.
Citations: 3
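The EM scheme mentioned above alternates an E-step (posterior component membership probabilities) with an M-step (weighted parameter updates). Because the SMFN density is not given in the abstract, this sketch swaps in plain univariate normal components: it illustrates only the generic EM structure for a finite mixture, not the FM-SMFN model or its hierarchical representation. The initialization, iteration count, helper names and simulated data are assumptions.

```python
import math
import random


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def em_gaussian_mixture(data, n_iter=100):
    """EM for a two-component univariate normal mixture (a stand-in, not SMFN)."""
    mu = [min(data), max(data)]  # crude but deterministic initialization
    sd = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 0
        resp = []
        for x in data:
            p0 = w[0] * normal_pdf(x, mu[0], sd[0])
            p1 = w[1] * normal_pdf(x, mu[1], sd[1])
            resp.append(p0 / (p0 + p1))
        # M-step: weighted updates of mixing weights, means and standard deviations
        for k, r in enumerate((resp, [1.0 - v for v in resp])):
            nk = sum(r)
            w[k] = nk / len(data)
            mu[k] = sum(ri * x for ri, x in zip(r, data)) / nk
            sd[k] = math.sqrt(sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, data)) / nk)
    return w, mu, sd


# Hypothetical data: 300 points around -3 and 200 points around 3.
rng = random.Random(0)
data = [rng.gauss(-3.0, 1.0) for _ in range(300)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
w, mu, sd = em_gaussian_mixture(data)
print(w, mu, sd)
```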
Canonical correlation analysis in high dimensions with structured regularization
IF 1 · Zone 4 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2023-06-01 · DOI: 10.1177/1471082x211041033
Elena Tuzhilina, Leonardo Tozzi, Trevor Hastie

Canonical correlation analysis (CCA) is a technique for measuring the association between two multivariate data matrices. A regularized modification of canonical correlation analysis (RCCA), which imposes an ℓ2 penalty on the CCA coefficients, is widely used in applications with high-dimensional data. One limitation of such regularization is that it ignores any data structure and treats all features equally, which can be ill-suited to some applications. In this article we introduce several approaches to regularizing CCA that take the underlying data structure into account. In particular, the proposed group regularized canonical correlation analysis (GRCCA) is useful when the variables are correlated in groups. We illustrate some computational strategies that avoid excessive computation for regularized CCA in high dimensions. We demonstrate these methods in our motivating application from neuroscience, as well as in a small simulation example.

Statistical Modelling, vol. 23, no. 3, pp. 203–227. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10274416/pdf/nihms-1834734.pdf
Citations: 3
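For the plain ℓ2-penalized RCCA described above (not the proposed GRCCA), the leading canonical pair can be sketched with alternating power iterations: a ∝ (Sxx + λx I)⁻¹ Sxy b and b ∝ (Syy + λy I)⁻¹ Syx a, each rescaled to unit penalized variance. The returned value a'Sxy b is the penalized criterion, which shrinks as λ grows. This pure-Python version, its helper names and the simulated data are illustrative assumptions; for genuinely high-dimensional data the computational strategies mentioned in the abstract would replace the explicit p × p solves used here.

```python
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def matvec(a, v):
    return [sum(p * q for p, q in zip(row, v)) for row in a]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [row[n] for row in m]


def centre_cols(m):
    means = [sum(col) / len(col) for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cov(a, b):
    """Cross-covariance a'b / n of two centred data matrices given as lists of rows."""
    return [[v / len(a) for v in row] for row in matmul(transpose(a), b)]


def quad(v, m, w):
    """Bilinear form v' m w."""
    return sum(vi * mw for vi, mw in zip(v, matvec(m, w)))


def rcca_first_pair(x, y, lam_x, lam_y, n_iter=200):
    """Leading ridge-regularized canonical pair by alternating power iterations.

    Maximizes a' Sxy b subject to a'(Sxx + lam_x I) a = 1 and b'(Syy + lam_y I) b = 1.
    """
    x, y = centre_cols(x), centre_cols(y)
    sxx, syy, sxy = cov(x, x), cov(y, y), cov(x, y)
    for i in range(len(sxx)):
        sxx[i][i] += lam_x  # l2 (ridge) penalty on the x-side weights
    for i in range(len(syy)):
        syy[i][i] += lam_y  # l2 (ridge) penalty on the y-side weights
    syx = transpose(sxy)
    b = [1.0] * len(syy)
    for _ in range(n_iter):
        a = solve(sxx, matvec(sxy, b))
        s = quad(a, sxx, a) ** 0.5
        a = [v / s for v in a]
        b = solve(syy, matvec(syx, a))
        s = quad(b, syy, b) ** 0.5
        b = [v / s for v in b]
    return a, b, quad(a, sxy, b)


# Hypothetical data: the first y variable is a noisy copy of the first x variable.
rng = random.Random(2)
x = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [[row[0] + 0.3 * rng.gauss(0, 1), rng.gauss(0, 1)] for row in x]
rho0 = rcca_first_pair(x, y, 0.0, 0.0)[2]       # ordinary CCA
rho_reg = rcca_first_pair(x, y, 10.0, 10.0)[2]  # heavily ridge-regularized
print(rho0, rho_reg)
```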
A multilevel analysis of real estate valuation using distributional and quantile regression
IF 1 · Zone 4 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2023-04-18 · DOI: 10.1177/1471082x231157205
Alexander Razen, Wolfgang A. Brunauer, N. Klein, T. Kneib, S. Lang, Nikolaus Umlauf
Real estate valuation is typically based on hedonic regression models, in which the expected price of a property is explained as a function of its attributes. However, investors in the housing market are equally interested in the distribution of real estate market values (including price variation), that is, in the impact of a property's attributes on the entire conditional distribution. We therefore consider Bayesian structured additive distributional and quantile regression models for real estate valuation. In the first approach, each parameter of a potentially complex parametric response distribution is related to a structured additive predictor. In contrast, the second approach models arbitrary quantiles of the response distribution directly and nonparametrically. Both models are based on a multilevel version of structured additive regression, thereby exploiting the typical hierarchical structure of real estate data. We demonstrate the proposed methodology in a detailed case study of more than 3 000 owner-occupied single-family homes in Austria, discuss the interpretation of the resulting effect estimates, and compare the models in terms of their predictive ability.
Citations: 1
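The quantile-regression approach rests on the check (pinball) loss ρ_τ(u) = u(τ − 1{u < 0}). The sketch below only minimizes it for a constant predictor, which recovers an empirical τ-quantile; in the paper the constant is replaced by a multilevel structured additive predictor and estimation is Bayesian. Bayesian quantile regression is often based on an asymmetric Laplace working likelihood whose log-density is proportional to −ρ_τ, but that step is not shown here. The function names and the toy price values are hypothetical.

```python
def check_loss(u, tau):
    """Check (pinball) function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def constant_quantile(y, tau):
    """Minimize sum_i rho_tau(y_i - q) over q; some minimizer is always a data point."""
    return min(y, key=lambda q: sum(check_loss(v - q, tau) for v in y))


prices = [float(k) for k in range(1, 10)]  # hypothetical values 1, 2, ..., 9
print(constant_quantile(prices, 0.25))  # 3.0
print(constant_quantile(prices, 0.5))   # 5.0 (the median)
```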
Multidimensional beta-binomial regression model: A joint analysis of patient-reported outcomes
IF 1 · Zone 4 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2023-04-14 · DOI: 10.1177/1471082x231151311
J. Najera-Zuloaga, Dae-Jin Lee, C. Esteban, I. Arostegui
Patient-reported outcomes (PROs) are often used as primary outcomes in clinical research studies. PROs are usually measured on ordinal scales and tend to show variability in excess of the binomial distribution, a property called overdispersion. The beta-binomial distribution has previously been proposed in this context to fit PROs, and beta-binomial regression (BBR) as a good modelling alternative, including its extension to mixed-effects models in a longitudinal framework. Many PROs have several health dimensions, which are commonly correlated within subjects. In clinical analysis, however, these dimensions are analysed separately. In this work, we propose a multidimensional BBR model that jointly analyses a multidimensional outcome comprising several PROs. The proposal has been evaluated and compared with independent analyses through a simulation study and a real-data application in patients with respiratory disease. The results show the advantages of a multidimensional approach in terms of parameter significance and interpretation. Additionally, the proposed methods are implemented in the PROreg R package developed by the authors.
Citations: 0
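Overdispersion in the beta-binomial can be checked directly from its probability mass function. One common mean-dispersion parametrization, assumed here (the PROreg package may use a different one), sets α = p/φ and β = (1 − p)/φ, which gives E[Y] = mp and Var[Y] = mp(1 − p)[1 + (m − 1)φ/(1 + φ)], i.e. the binomial variance inflated by a factor that grows with φ. The score range m = 10 and the values of p and φ are hypothetical; the regression and multidimensional parts of the model are not shown.

```python
import math


def log_beta(a, b):
    """log B(a, b) computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_pmf(k, m, p, phi):
    """P(Y = k) for Y ~ BetaBin(m, alpha = p/phi, beta = (1 - p)/phi), so E[Y] = m * p."""
    a, b = p / phi, (1.0 - p) / phi
    return math.comb(m, k) * math.exp(log_beta(k + a, m - k + b) - log_beta(a, b))


m, p, phi = 10, 0.3, 0.5  # hypothetical PRO score on 0..10
pmf = [betabinom_pmf(k, m, p, phi) for k in range(m + 1)]
mean = sum(k * q for k, q in enumerate(pmf))
var = sum(k * k * q for k, q in enumerate(pmf)) - mean ** 2
binom_var = m * p * (1 - p)
print(mean, var, binom_var)  # the variance exceeds the binomial variance
```

With p = 0.3 and φ = 0.5 the inflation factor is 1 + 9 × (0.5/1.5) = 4, so the variance is 8.4 instead of the binomial 2.1.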
A two-part measurement error model to estimate participation in undeclared work and related earnings
IF 1 · Zone 4 (Mathematics) · Q2 STATISTICS & PROBABILITY · Pub Date: 2023-02-07 · DOI: 10.1177/1471082x221145240
Maria Felice Arezzo, Serena Arima, G. Guagnano
In research on undeclared work, estimating the magnitude of the phenomenon (i.e., the amount of income and/or the percentage of workers involved) is of major interest. This has been done either with indirect methods or through ad hoc surveys such as the Eurobarometer special survey on undeclared work, our motivating study. The extent of undeclared work can be measured by two different outcomes: the event of working off the books (a binary variable) and, when the event occurs, the amount of earnings derived from the undeclared activity (a continuous variable). This setup has typically been modelled with the so-called two-part model: a binary choice model for the probability of observing a positive versus a zero outcome and then, conditional on a positive outcome, a regression model for the positive outcome. We propose an extension of the two-part model in two directions. The first addresses measurement error, which, given the very nature of undeclared activities, is most likely to affect both outcomes of interest. The second generalizes the linear regression part of the model to allow individual-level means. We also conduct an extensive simulation study to investigate the performance of the proposed model and compare it with traditional approaches.
Citations: 0
Journal: Statistical Modelling