
Statistical Modelling: Latest Articles

Block models for generalized multipartite networks: Applications in ecology and ethnobiology
IF 1 | Zone 4, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2020-12-18 | DOI: 10.1177/1471082X20963254
A. Bar-Hen, P. Barbillon, S. Donnet
Generalized multipartite networks consist of the joint observation of several networks involving some common pre-specified groups of individuals. Such complex networks arise commonly in the social sciences, biology, ecology and other fields. We propose a flexible probabilistic model, the Multipartite Block Model (MBM), which unravels the topology of multipartite networks by identifying clusters (blocks) of nodes that share the same patterns of connectivity across the collection of networks they are involved in. The model parameters are estimated with a variational version of the Expectation–Maximization algorithm. The numbers of blocks are chosen using an Integrated Completed Likelihood criterion specifically designed for our model. A simulation study illustrates the robustness of the inference strategy. Finally, two datasets, from ecology and ethnobiology respectively, are analyzed with the MBM to illustrate its flexibility and its relevance for the analysis of real datasets. The inference procedure is implemented in the R package GREMLIN, available on GitHub (https://github.com/Demiperimetre/GREMLIN).
Statistical Modelling, vol. 22, pp. 273–296
Citations: 10
Spatial survival modelling of business re-opening after Katrina: Survival modelling compared to spatial probit modelling of re-opening within 3, 6 or 12 months
IF 1 | Zone 4, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2020-12-15 | DOI: 10.1177/1471082X20967158
R. Bivand, V. Gómez‐Rubio
Zhou and Hanson (2015, Nonparametric Bayesian Inference in Biostatistics, pages 215–46. Cham: Springer; 2018, Journal of the American Statistical Association, 113, 571–81; 2020, spBayesSurv: Bayesian Modeling and Analysis of Spatially Correlated Survival Data. R package version 1.1.4) and Zhou et al. (2020, Journal of Statistical Software, Articles, 92, 1–33) present methods for estimating spatial survival models using areal data. This article applies their methods to a dataset recording the decisions of New Orleans businesses to re-open after Hurricane Katrina; the data were included in LeSage et al. (2011b, Journal of the Royal Statistical Society: Series A (Statistics in Society), 174, 1007–27). In two articles (LeSage et al., 2011a, Significance, 8, 160–63; 2011b, Journal of the Royal Statistical Society: Series A (Statistics in Society), 174, 1007–27), spatial probit models are used to model spatial dependence in this dataset, with decisions to re-open aggregated to the first 90, 180 and 360 days. We re-cast the problem as one of examining the time-to-event records in the data, which are right-censored because observation ceased before 175 of the businesses had re-opened; we omit businesses that had already re-opened when observation began on Day 41. We check whether the conclusions about the covariates drawn from aspatial and spatial probit models change when survival and spatial survival models, estimated using MCMC and INLA, are applied instead. In general, we find that the same covariates are associated with re-opening decisions in both modelling approaches. However, we do find that the data collected from three streets differ substantially, and that the streets are probably better handled separately or the street effect should be included explicitly.
Statistical Modelling, vol. 21, pp. 137–160
Citations: 1
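The Bivand and Gómez-Rubio abstract turns on the link between the binary "re-opened within 90, 180 or 360 days" outcomes used in the probit models and the underlying right-censored time-to-event records. Below is a minimal Python sketch of that link. It uses hypothetical toy records of (day, reopened), not the New Orleans data. The names `Business`, `analysis_sample`, `reopened_within` and `kaplan_meier` are illustrative. The Kaplan–Meier estimator is a standard nonparametric survival summary included for illustration only. It is not the MCMC/INLA spatial survival models fitted in the article.

```python
from dataclasses import dataclass

START_DAY = 41  # observation began on Day 41 (from the abstract)


@dataclass
class Business:
    day: int        # day of re-opening, or the last day observed if censored
    reopened: bool  # False means right-censored at `day`


def analysis_sample(records):
    """Drop businesses that had already re-opened when observation began."""
    return [r for r in records if not (r.reopened and r.day <= START_DAY)]


def reopened_within(records, horizon):
    """Binary probit-style outcome: 1 if re-opened by `horizon` days.

    Assumes (hypothetically) that every censoring time lies beyond `horizon`,
    so a censored record can safely be coded 0.
    """
    return [int(r.reopened and r.day <= horizon) for r in records]


def kaplan_meier(records):
    """Kaplan-Meier estimate of P(still closed after day t) at each re-opening day."""
    surv, curve = 1.0, []
    for t in sorted({r.day for r in records if r.reopened}):
        at_risk = sum(1 for r in records if r.day >= t)
        events = sum(1 for r in records if r.reopened and r.day == t)
        surv *= 1.0 - events / at_risk
        curve.append((t, surv))
    return curve


if __name__ == "__main__":
    # Hypothetical toy records, not the data analysed in the article.
    raw = [Business(30, True), Business(50, True), Business(100, True),
           Business(100, True), Business(200, True), Business(400, False)]
    sample = analysis_sample(raw)
    for h in (90, 180, 360):
        print(h, reopened_within(sample, h))
    print(kaplan_meier(sample))
```

The three binary vectors discard the exact re-opening day. The survival formulation keeps it, and it treats the business still closed at the end of observation as censored rather than as a plain "no".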
A copula-based approach to joint modelling of multiple longitudinal responses with multimodal structures
IF 1 | Zone 4, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2020-12-13 | DOI: 10.1177/1471082X20967168
Zahra Mahdiyeh, I. Kazemi, G. Verbeke
This article introduces a flexible modelling strategy that extends the familiar mixed-effects models for analysing longitudinal responses in the multivariate setting. By introducing a flexible multivariate multimodal distribution, this strategy relaxes the normality assumption usually imposed on the random effects. We use copulas to construct a multimodal form of elliptical distributions, which can deal with the multimodality of responses and the non-linearity of the dependence structure. Moreover, the proposed model can flexibly accommodate clustered subject effects for multiple longitudinal measurements. It is particularly useful when several subpopulations exist but cannot be directly identified. Since the implied marginal distribution is not available in closed form, we approximate the associated likelihood functions with a computational methodology based on Gauss–Hermite quadrature, which in turn allows standard optimization techniques to be used. We conduct a simulation study to highlight the main properties of the theoretical part and compare the approach with regular mixture distributions. Results confirm that the new strategy deserves attention in practice. We illustrate the usefulness of our model by analysing a real-life dataset from a low back pain study.
Statistical Modelling, vol. 22, pp. 327–348
Citations: 0
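The Gauss–Hermite step mentioned in the Mahdiyeh, Kazemi and Verbeke abstract can be shown in its simplest textbook form. This Python sketch is an illustration under stated assumptions, not the authors' method. It uses a single normal random intercept, whereas the article builds multimodal copula-based random-effect distributions. The function names and the model y_j = beta + b + e_j are hypothetical. In this Gaussian case the marginal likelihood also has a closed form, which makes the approximation easy to check.

```python
import numpy as np


def gh_expectation(f, sigma, n_nodes=30):
    """Approximate E[f(b)], b ~ N(0, sigma^2), by Gauss-Hermite quadrature.

    With b = sqrt(2) * sigma * x, E[f(b)] = pi^(-1/2) * sum_i w_i f(sqrt(2) sigma x_i),
    where (x_i, w_i) are the nodes and weights for the weight function exp(-x^2).
    """
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    return float(np.sum(w * f(np.sqrt(2.0) * sigma * x)) / np.sqrt(np.pi))


def marginal_loglik(y, beta, sigma_b, sigma_e, n_nodes=30):
    """Log marginal likelihood of one subject under y_j = beta + b + e_j.

    Hypothetical random-intercept model: b ~ N(0, sigma_b^2), e_j ~ N(0, sigma_e^2).
    The random intercept b is integrated out numerically.
    """
    y = np.asarray(y, dtype=float)

    def cond_lik(b):
        # b holds all quadrature nodes; rows = nodes, columns = repeated measurements
        resid = (y[None, :] - beta - b[:, None]) / sigma_e
        dens = np.exp(-0.5 * resid**2) / (sigma_e * np.sqrt(2.0 * np.pi))
        return np.prod(dens, axis=1)  # conditional likelihood given each node

    return float(np.log(gh_expectation(cond_lik, sigma_b, n_nodes)))
```

For this normal model the exact answer is the multivariate normal log-density with covariance sigma_e^2 I + sigma_b^2 J (J a matrix of ones). A non-Gaussian random-effect distribution generally has no such closed form, and that is where quadrature-based approximations become necessary.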
Response transformations for random effect and variance component models
IF 1 | Zone 4, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2020-12-13 | DOI: 10.1177/1471082X20966919
Amani Almohaimeed, J. Einbeck
Random effect models have been widely used as a mainstream statistical technique for several decades, and the same can be said of response transformation models such as the Box–Cox transformation. The latter aims to ensure that the assumptions of normality and homoscedasticity of the response distribution are fulfilled, which are essential conditions for inference based on a linear model or a linear mixed model. However, methodology that combines response transformation with the simultaneous inclusion of random effects has rarely been developed or implemented, and so far it is restricted to Gaussian random effects. We develop such methodology without requiring parametric assumptions on the distribution of the random effects. This is achieved by extending the ‘Nonparametric Maximum Likelihood’ technique to a ‘Nonparametric profile maximum likelihood’ technique, which can deal with overdispersion as well as two-level data scenarios.
Statistical Modelling, vol. 22, pp. 297–326
Citations: 3
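The Almohaimeed and Einbeck abstract builds on profile likelihood estimation of the Box–Cox parameter. The classical version of that idea, for an ordinary linear model with no random effects, is sketched below in Python. It is explicitly not the authors' nonparametric profile maximum likelihood extension. The function names and the grid of candidate lambda values are assumptions made for illustration. The log-Jacobian term (lambda - 1) * sum(log y) is what makes likelihoods for different lambda comparable on the original response scale.

```python
import numpy as np


def box_cox(y, lam):
    """Box-Cox transform: (y^lam - 1) / lam, with the log limit at lam = 0."""
    y = np.asarray(y, dtype=float)
    if abs(lam) < 1e-12:
        return np.log(y)
    return (y**lam - 1.0) / lam


def profile_loglik(lam, y, X):
    """Profile log-likelihood of lam (up to a constant) for box_cox(y) ~ N(X beta, s^2 I)."""
    y = np.asarray(y, dtype=float)
    z = box_cox(y, lam)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)  # least squares = ML for beta
    rss = float(np.sum((z - X @ beta) ** 2))
    n = len(y)
    # Profiled variance s^2 = RSS / n, plus the log-Jacobian of the transformation.
    return -0.5 * n * np.log(rss / n) + (lam - 1.0) * float(np.sum(np.log(y)))


def best_lambda(y, X, grid=np.linspace(-2.0, 2.0, 81)):
    """Grid search for the lambda that maximises the profile log-likelihood."""
    scores = [profile_loglik(lam, y, X) for lam in grid]
    return float(grid[int(np.argmax(scores))])
```

On data generated as log-normal around a linear predictor, the selected lambda should land near 0 (the log transform). The article's contribution is to keep this profiling idea while adding random effects whose distribution is left unspecified.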
Kernel-based estimation of individual location densities from smartphone data
IF 1 | Zone 4, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2020-12-01 | DOI: 10.1177/1471082X19870331
F. Finazzi, L. Paci
Localizing people across space and over time is a relevant and challenging problem in many modern applications. The ubiquity of smartphones offers an unprecedented opportunity to collect useful individual data. In this work, the focus is on location data collected by smartphone applications. We propose a kernel-based density estimation approach that exploits people's cyclical spatio-temporal patterns to estimate the individual location density at any time, together with its uncertainty. Model parameters are estimated by maximum likelihood cross-validation. Unlike classic tracking methods designed for data with high spatio-temporal resolution, the approach is suitable when location data are sparse in time and affected by non-negligible errors. The approach is applied to location data collected by the Earthquake Network citizen science project, which operates a worldwide smartphone-based earthquake early warning system. The approach is parsimonious and suitable for modelling location data gathered by any location-aware smartphone application.
Statistical Modelling, vol. 20, pp. 617–633
Citations: 1
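The Finazzi and Paci abstract combines three ingredients: kernel density estimation of location, cyclical (daily) temporal patterns, and parameter selection by maximum likelihood cross-validation. A generic Python sketch of how these can fit together is below. It is not the authors' estimator. The von Mises kernel on time of day, the isotropic Gaussian kernel on planar coordinates, the 24-hour period and the grid search are all assumptions made for illustration. Past location fixes recorded at similar hours receive more weight, and the bandwidth h and concentration kappa are chosen to maximise the leave-one-out log-likelihood.

```python
import numpy as np


def time_weights(t_query, t_obs, kappa, period=24.0):
    """von Mises weights on time of day: fixes recorded at similar hours weigh more."""
    angle = 2.0 * np.pi * (np.asarray(t_obs, dtype=float) - t_query) / period
    w = np.exp(kappa * np.cos(angle))
    return w / w.sum()


def location_density(xy_query, t_query, xy_obs, t_obs, h, kappa):
    """Time-weighted 2-D Gaussian kernel density of location at hour t_query."""
    w = time_weights(t_query, t_obs, kappa)
    d2 = np.sum((np.asarray(xy_obs, dtype=float) - np.asarray(xy_query, dtype=float)) ** 2, axis=1)
    k = np.exp(-0.5 * d2 / h**2) / (2.0 * np.pi * h**2)
    return float(np.sum(w * k))


def loo_loglik(xy, t, h, kappa):
    """Leave-one-out log-likelihood used for likelihood cross-validation."""
    n = len(t)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        dens = location_density(xy[i], t[i], xy[keep], t[keep], h, kappa)
        total += np.log(max(dens, 1e-300))  # guard against log(0)
    return total


def select_bandwidths(xy, t, hs, kappas):
    """Grid search for the (h, kappa) pair maximising the LOO log-likelihood."""
    scores = {(h, k): loo_loglik(xy, t, h, k) for h in hs for k in kappas}
    return max(scores, key=scores.get)
```

With kappa = 0 every past fix gets the same weight regardless of hour, so the estimate ignores the daily cycle. Larger kappa lets, for example, a work location dominate the midday density.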
Maximum approximate likelihood estimation of general continuous-time state-space models
IF 1 | Zone 4, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2020-10-28 | DOI: 10.1177/1471082x211065785
S. Mews, R. Langrock, Marius Otting, Houda Yaqine, Jost Reinecke
Continuous-time state-space models (SSMs) are flexible tools for analysing irregularly sampled sequential observations that are driven by an underlying state process. Corresponding applications typically involve restrictive assumptions concerning linearity and Gaussianity to facilitate inference on the model parameters via the Kalman filter. In this contribution, we provide a general continuous-time SSM framework, allowing both the observation and the state process to be non-linear and non-Gaussian. Statistical inference is carried out by maximum approximate likelihood estimation, where multiple numerical integration within the likelihood evaluation is performed via a fine discretization of the state process. The corresponding reframing of the SSM as a continuous-time hidden Markov model, with structured state transitions, enables us to apply the associated efficient algorithms for parameter estimation and state decoding. We illustrate the modelling approach in a case study using data from a longitudinal study on delinquent behaviour of adolescents in Germany, revealing temporal persistence in the deviation of an individual's delinquency level from the population mean.
Citations: 9
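The Mews et al. abstract approximates the likelihood of a continuous-time SSM by finely discretizing the state space. This turns the model into a hidden Markov model whose likelihood the forward algorithm can evaluate. A minimal Python sketch of that idea follows. The Ornstein–Uhlenbeck (OU) state process, the Poisson observation model y | x ~ Poisson(exp(x)), the state range [lo, hi] and the number of intervals m are illustrative assumptions, not the article's specification. Here the transition probabilities over each irregular time gap come from the OU transition density at the interval midpoints, which is one of several possible constructions. Observation times are assumed strictly increasing.

```python
import numpy as np
from math import lgamma


def norm_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def ou_transition(mid, width, dt, theta, mu, sigma):
    """Approximate m x m transition matrix of an OU process over a gap dt > 0.

    Evaluates the exact OU transition density at interval midpoints and
    renormalises each row so that it sums to 1.
    """
    mean = mu + (mid[:, None] - mu) * np.exp(-theta * dt)
    sd = sigma * np.sqrt((1.0 - np.exp(-2.0 * theta * dt)) / (2.0 * theta))
    gamma = norm_pdf(mid[None, :], mean, sd) * width
    return gamma / gamma.sum(axis=1, keepdims=True)


def approx_loglik(y, times, theta, mu, sigma, lo=-3.0, hi=5.0, m=100):
    """Approximate log-likelihood via an m-state HMM and the scaled forward algorithm."""
    edges = np.linspace(lo, hi, m + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    width = edges[1] - edges[0]
    # Start from the stationary OU distribution N(mu, sigma^2 / (2 theta)).
    alpha = norm_pdf(mid, mu, sigma / np.sqrt(2.0 * theta)) * width
    alpha /= alpha.sum()
    loglik = 0.0
    for k, (yk, tk) in enumerate(zip(y, times)):
        if k > 0:
            alpha = alpha @ ou_transition(mid, width, tk - times[k - 1], theta, mu, sigma)
        # Poisson emission with log-intensity equal to the (discretized) state.
        alpha = alpha * np.exp(yk * mid - np.exp(mid) - lgamma(yk + 1))
        c = alpha.sum()
        loglik += np.log(c)  # scaling constants accumulate the log-likelihood
        alpha = alpha / c
    return float(loglik)
```

Passing `approx_loglik` (negated) to a numerical optimizer gives the maximum approximate likelihood estimates. Larger m improves the approximation at a cost that grows roughly with m squared per observation.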
Two-part quantile regression models for semi-continuous longitudinal data: A finite mixture approach
IF 1 | Zone 4, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2020-10-23 | DOI: 10.1177/1471082X21993603
Luca Merlo, A. Maruotti, L. Petrella
This article develops a two-part finite mixture quantile regression model for semi-continuous longitudinal data. The proposed methodology allows the sources of heterogeneity that influence the model for the binary response variable to also influence the distribution of the positive outcomes. As is common in the quantile regression literature, estimation and inference on the model parameters are based on the asymmetric Laplace distribution. Maximum likelihood estimates are obtained through the EM algorithm without parametric assumptions on the random-effects distribution. In addition, a penalized version of the EM algorithm is presented to tackle the problem of variable selection. The proposed method is applied to the well-known RAND Health Insurance Experiment dataset, providing further insight into its empirical behaviour.
Statistical Modelling, vol. 22, pp. 485–508
Citations: 3
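The Merlo, Maruotti and Petrella abstract bases estimation on the asymmetric Laplace distribution (ALD). The reason the ALD is used in quantile regression is that maximising its likelihood in the location parameter is equivalent to minimising the quantile check loss. A minimal Python sketch of that equivalence is below. The function names are hypothetical, and the sketch covers only a location parameter, with no two-part structure, mixture or EM algorithm.

```python
import numpy as np


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))


def ald_logpdf(y, mu, sigma, tau):
    """Log-density of the ALD with location mu, scale sigma and skewness tau:
    log(tau * (1 - tau) / sigma) - rho_tau((y - mu) / sigma)."""
    y = np.asarray(y, dtype=float)
    return np.log(tau * (1.0 - tau) / sigma) - check_loss((y - mu) / sigma, tau)


def ald_location_mle(y, tau, sigma=1.0):
    """Maximise the ALD log-likelihood in mu for fixed sigma.

    This equals minimising the summed check loss, which is piecewise linear
    with kinks at the data, so it suffices to search over the observed values;
    the maximiser is a sample tau-quantile.
    """
    y = np.asarray(y, dtype=float)
    scores = [ald_logpdf(y, mu, sigma, tau).sum() for mu in y]
    return float(y[int(np.argmax(scores))])
```

For y = 1, ..., 10 and tau = 0.25 the maximiser is 3, the sample 25% quantile. In a regression, mu is replaced by a linear predictor, and in the article random effects enter that predictor as well.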
Multiple imputation and selection of ordinal level 2 predictors in multilevel models: An analysis of the relationship between student ratings and teacher practices and attitudes
IF 1 | Zone 4, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2020-10-22 | DOI: 10.1177/1471082X20949710
L. Grilli, Maria Francesca Marino, O. Paccagnella, C. Rampichini
The article is motivated by the analysis of the relationship between university student ratings and teacher practices and attitudes, which are measured via a set of binary and ordinal items collected by an innovative survey. The analysis is conducted through a two-level random intercept model, where student ratings are nested within teachers. The analysis faces two issues concerning the items measuring teacher practices and attitudes, which are level 2 predictors: (a) the items are severely affected by missingness due to teacher non-response, and (b) there is redundancy in both the number of items and the number of categories of their measurement scale. We tackle the missing data issue with a multiple imputation strategy that exploits information at both the student and teacher levels. For the redundancy issue, we rely on regularization techniques for ordinal predictors that also account for the multilevel data structure. The proposed solution addresses the problem at hand in an original way and can be applied whenever level 2 predictors affected by missing values need to be selected. The results obtained with the final model indicate that ratings of teacher ability to motivate students are related to certain teacher practices and attitudes.
Statistical Modelling, vol. 22, pp. 221–238
Citations: 2
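The Grilli et al. abstract handles missing level 2 items by multiple imputation. Whatever imputation model is used, the per-dataset estimates are commonly combined with Rubin's rules, and a minimal Python sketch of that pooling step for a single coefficient follows. It illustrates the standard combining rules only. It does not reproduce the authors' multilevel imputation model or their regularization of ordinal predictors. The function name and the numbers in the usage note are hypothetical.

```python
import numpy as np


def rubin_pool(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    Returns the pooled estimate, its total variance and Rubin's (1987)
    degrees of freedom for a t reference distribution.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()             # pooled point estimate
    w = u.mean()                 # within-imputation variance
    b = q.var(ddof=1)            # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b  # total variance
    if b == 0.0:
        df = np.inf              # imputation adds no extra uncertainty
    else:
        r = (1.0 + 1.0 / m) * b / w  # relative increase in variance due to missingness
        df = (m - 1) * (1.0 + 1.0 / r) ** 2
    return float(q_bar), float(t), float(df)
```

For example, `rubin_pool([1.0, 1.2, 0.8, 1.0], [0.04] * 4)` pools four hypothetical estimates. The between-imputation spread inflates the variance from 0.04 to about 0.073, which reflects the extra uncertainty caused by teacher non-response.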
Bayesian mixture modelling of the high-energy photon counts collected by the Fermi Large Area Telescope
IF 1 | Zone 4, Mathematics | Q2 STATISTICS & PROBABILITY | Pub Date: 2020-09-28 | DOI: 10.1177/1471082X20947222
D. Costantin, Andrea Sottosanti, A. Brazzale, D. Bastieri, J. Fan
Identifying as-yet undetected high-energy sources in the γ-ray sky is one of the declared objectives of the Fermi Large Area Telescope (LAT) Collaboration. We develop a Bayesian mixture model capable of disentangling the high-energy extra-galactic sources present in a given sky region from the pervasive background radiation. We achieve this by combining two model components. The first component models the emission activity of the single sources and incorporates the instrument response function of the Fermi γ-ray space telescope. The second component reliably reflects the current knowledge of the physical phenomena underlying the γ-ray background. The model parameters are estimated using a reversible jump MCMC algorithm, which simultaneously returns the number of detected sources, their locations and relative intensities, and the background component. Our proposal is illustrated using a sample of the Fermi LAT data. In the analysed sky region, our model correctly identifies 116 of the 132 sources present. The detection rate and the estimated directions and intensities of the identified sources are largely unaffected by the number of detected sources.
Statistical Modelling, vol. 22, no. 1, pp. 175-198.
Citations: 2
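To make the two-component structure described in the Fermi LAT abstract concrete, here is a minimal Python sketch of a Poisson mixture intensity on a one-dimensional strip of pixels: a flat background rate plus blurred point sources. It is an illustration under stated assumptions, not the authors' model. The Gaussian point-spread function is a stand-in for the Fermi LAT instrument response function, the flat background is a stand-in for the physically motivated background component, and the number of sources is fixed rather than sampled by reversible jump MCMC. All function names and numbers are hypothetical.

```python
import math
import random


def expected_counts(n_pix, background, sources, psf_sigma):
    """Expected photon counts on a 1-D strip of pixels.

    Mixture of a flat background rate and point sources, each blurred by a
    Gaussian point-spread function (an assumed stand-in for the real
    instrument response function). `sources` is a list of
    (location, intensity) pairs; intensity is the total expected photons.
    """
    norm = psf_sigma * math.sqrt(2.0 * math.pi)
    mu = []
    for i in range(n_pix):
        x = i + 0.5  # pixel centre
        rate = background
        for loc, intensity in sources:
            rate += intensity * math.exp(-0.5 * ((x - loc) / psf_sigma) ** 2) / norm
        mu.append(rate)
    return mu


def poisson_loglik(counts, mu):
    """Poisson log-likelihood of observed counts given expected counts."""
    return sum(c * math.log(m) - m - math.lgamma(c + 1) for c, m in zip(counts, mu))


def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


if __name__ == "__main__":
    rng = random.Random(1)
    n_pix, background, psf = 100, 2.0, 2.0
    true_sources = [(30.0, 60.0), (70.0, 40.0)]  # hypothetical sky
    counts = [draw_poisson(m, rng) for m in expected_counts(n_pix, background, true_sources, psf)]

    # Compare the fit of nested models with 0, 1 and 2 sources.
    for label, srcs in [("background only", []),
                        ("one source", true_sources[:1]),
                        ("two sources", true_sources)]:
        ll = poisson_loglik(counts, expected_counts(n_pix, background, srcs, psf))
        print(f"{label:16s} log-likelihood = {ll:.1f}")
```

Because a model with more sources can always fit at least as well once its intensities are optimized, the likelihood alone does not choose the number of sources; in the approach the abstract describes, that number is returned by the reversible jump MCMC together with the source locations and intensities.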
Pairwise estimation of multivariate longitudinal outcomes in a Bayesian setting with extensions to the joint model
IF 1 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date : 2020-09-28 DOI: 10.1177/1471082X20945069
K. Mauff, N. Erler, I. Kardys, D. Rizopoulos
Multiple longitudinal outcomes can, in theory, easily be modelled by extending the generalized linear mixed effects model (GLMM). However, owing to computational limitations in high dimensions, in practice these models are applied only when there are relatively few outcomes. We adapt the solution proposed by Fieuws and Verbeke (2006) to the Bayesian setting: fitting all pairwise bivariate models instead of a single multivariate model and, for the relevant parameters, combining the Markov chain Monte Carlo (MCMC) realizations obtained from each pairwise bivariate model. We explore importance sampling as a method to more closely approximate the correct multivariate posterior distribution. Simulation studies show satisfactory results in terms of bias, RMSE and coverage of the 95% credible intervals for multiple longitudinal outcomes, even in scenarios with more limited information and non-continuous outcomes, although importance sampling was not successful. We further examine the incorporation of a time-to-event outcome, proposing Bayesian pairwise estimation of a multivariate GLMM within an adaptation of the corrected two-stage estimation procedure for the joint model of multiple longitudinal outcomes and a time-to-event outcome (Mauff et al., 2020, Statistics and Computing). The method does not work as well for the corrected two-stage joint model; however, the results are promising and should be explored further.
Statistical Modelling, vol. 21, no. 1, pp. 115-136.
Citations: 4
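The pairwise strategy in the abstract above can be sketched as follows. With K outcomes there are K(K-1)/2 bivariate models, and each outcome-specific parameter is estimated in the K-1 pairs that contain that outcome. This sketch assumes one simple combination rule: averaging a parameter's draws iteration by iteration across every pair that estimates it, which carries over the averaging idea of Fieuws and Verbeke's frequentist version. The exact rule used by the authors may differ, the parameter names (`beta_y1`, `rho_y1_y2`, ...) are hypothetical, and the "MCMC output" is simulated with `random.gauss` rather than produced by real bivariate GLMM fits.

```python
import random
import statistics
from itertools import combinations


def pairs_for(outcomes):
    """All K(K-1)/2 unordered pairs of outcomes, one bivariate model each."""
    return list(combinations(outcomes, 2))


def combine_pairwise_draws(pair_draws):
    """Combine MCMC draws from the pairwise fits (assumed averaging rule).

    pair_draws maps a pair (a, b) to {parameter name: list of draws}; all
    chains are assumed to have the same length. A parameter estimated in
    several pairs is averaged iteration by iteration across those pairs;
    a pair-specific parameter is returned unchanged.
    """
    collected = {}
    for draws in pair_draws.values():
        for name, chain in draws.items():
            collected.setdefault(name, []).append(chain)
    combined = {}
    for name, chains in collected.items():
        n_iter = len(chains[0])
        combined[name] = [sum(ch[t] for ch in chains) / len(chains) for t in range(n_iter)]
    return combined


if __name__ == "__main__":
    rng = random.Random(0)
    outcomes = ["y1", "y2", "y3", "y4"]
    true_beta = {"y1": 0.5, "y2": -1.0, "y3": 2.0, "y4": 0.0}  # hypothetical
    n_iter = 2000

    # Simulated stand-in for the output of each bivariate fit (not real fits).
    pair_draws = {}
    for a, b in pairs_for(outcomes):
        pair_draws[(a, b)] = {
            f"beta_{a}": [rng.gauss(true_beta[a], 0.1) for _ in range(n_iter)],
            f"beta_{b}": [rng.gauss(true_beta[b], 0.1) for _ in range(n_iter)],
            f"rho_{a}_{b}": [rng.gauss(0.3, 0.05) for _ in range(n_iter)],
        }

    combined = combine_pairwise_draws(pair_draws)
    for name in sorted(combined):
        draws = combined[name]
        print(name, round(statistics.mean(draws), 2), round(statistics.stdev(draws), 3))
```

In this simulation each `beta` appears in three pairs, so averaging three independent chains narrows its spread by roughly a factor of √3. Combined draws like these therefore need not match the true multivariate posterior, which is the gap the abstract's importance-sampling step aims to close.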
Journal
Statistical Modelling
Copyright © 2023 Book学术 All rights reserved.