Pub Date: 2024-09-17 | DOI: 10.1177/1471082x241258261
Andrew McInerney, Kevin Burke
Feedforward neural networks (FNNs) can be viewed as non-linear regression models, where covariates enter the model through a combination of weighted summations and non-linear functions. Although these models have some similarities to the approaches used within statistical modelling, the majority of neural network research has been conducted outside the field of statistics. This has resulted in a lack of statistically based methodology and, in particular, little emphasis on model parsimony. Determining the input-layer structure is analogous to variable selection, while the structure of the hidden layer relates to model complexity. In practice, neural network model selection is often carried out by comparing models on out-of-sample performance. In contrast, constructing an associated likelihood function opens the door to variable and architecture selection based on information criteria. A novel model selection method, which performs both input- and hidden-node selection, is proposed using the Bayesian information criterion (BIC) for FNNs. Choosing BIC rather than out-of-sample performance as the model selection objective increases the probability of recovering the true model, while parsimoniously achieving favourable out-of-sample performance. Simulation studies are used to evaluate and justify the proposed method, and applications to real data are investigated.
Title: A statistical modelling approach to feedforward neural network model selection | Journal: Statistical Modelling
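As a rough illustration of how a likelihood makes BIC-based comparison possible, the sketch below scores single-hidden-layer FNNs with Gaussian errors. It assumes one output node, counts the error variance as a parameter, and takes residual sums of squares from models that have already been fitted; the function names are hypothetical and this is not the authors' implementation.

```python
import math


def fnn_weight_count(n_inputs, n_hidden):
    """Weights and biases in a single-hidden-layer FNN with one output node."""
    hidden = (n_inputs + 1) * n_hidden  # input weights plus one bias per hidden node
    output = n_hidden + 1  # hidden-to-output weights plus the output bias
    return hidden + output


def gaussian_bic(rss, n_obs, n_weights):
    """BIC = -2 log L + k log n, with log L the Gaussian log-likelihood at the MLE."""
    sigma2 = rss / n_obs  # maximum-likelihood estimate of the error variance
    log_lik = -0.5 * n_obs * (math.log(2 * math.pi * sigma2) + 1)
    k = n_weights + 1  # network weights plus the error variance (assumed counted)
    return -2 * log_lik + k * math.log(n_obs)


def select_architecture(candidates, n_obs):
    """Return the (n_inputs, n_hidden) pair with the smallest BIC.

    `candidates` maps (n_inputs, n_hidden) to the residual sum of squares of a
    model already fitted with that architecture (fitting is not shown here).
    """
    def bic(arch):
        return gaussian_bic(candidates[arch], n_obs, fnn_weight_count(*arch))
    return min(candidates, key=bic)
```

For example, with n = 500 and hypothetical fits `{(3, 2): 100.0, (5, 4): 98.0}`, the larger network's small drop in RSS does not pay for its 18 extra parameters, so `select_architecture` returns `(3, 2)`.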
Pub Date: 2024-05-27 | DOI: 10.1177/1471082x241235024
Martje Rave, Göran Kauermann
With the beginning of the COVID-19 pandemic, we became aware of the need for comprehensive data collection and for making such data available to scientists and experts for proper analysis. In Germany, the Robert Koch Institute (RKI) has tried to keep up with this demand for COVID-19 data, but relevant data needed to understand the whole picture of the pandemic were (and still are) missing. In this article, we take a closer look at the severity of the course of COVID-19 in Germany, for which the ideal information would be the number of patients admitted to intensive care units (ICUs). This information was (and still is) not available. Instead, the current ICU occupancy at the district level was reported daily. We demonstrate how this information can be used to predict the number of incoming as well as released COVID-19 patients using a stochastic version of the Expectation Maximization algorithm (SEM). This, in turn, allows us to estimate the influence of district-specific and age-specific infection rates, as well as further covariates including spatial effects, on the number of incoming patients. The article demonstrates that even if relevant data are not recorded or provided officially, statistical modelling allows them to be reconstructed. This includes quantifying the uncertainty that naturally results from applying the SEM algorithm.
Title: The Skellam distribution revisited: Estimating the unobserved incoming and outgoing ICU COVID-19 patients on a regional level in Germany | Journal: Statistical Modelling
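The difference of two independent Poisson counts follows a Skellam distribution. Under the simplifying assumption that a district's daily change in ICU occupancy equals incoming minus released patients, with both counts Poisson with constant rates (no covariates, age groups or spatial effects), one stochastic EM iteration draws the unobserved counts from their conditional distribution given the observed change and then re-estimates the rates. This is a toy version of the idea; the function names and the truncation at 200 patients are illustrative assumptions, not the authors' model.

```python
import math
import random


def poisson_pmf(k, lam):
    """Poisson probability P(K = k) for rate lam > 0, computed on the log scale."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def incoming_given_change(d, lam_in, lam_out, max_count=200):
    """Distribution of incoming patients given the change d = incoming - released.

    Support starts at max(0, d) so that released = incoming - d is never negative,
    and is truncated at max_count for computation.
    """
    support = list(range(max(0, d), max_count + 1))
    weights = [poisson_pmf(n, lam_in) * poisson_pmf(n - d, lam_out) for n in support]
    total = sum(weights)  # Skellam probability of d (up to truncation)
    return support, [w / total for w in weights]


def skellam_pmf(d, lam_in, lam_out, max_count=200):
    """P(incoming - released = d) by summing over the joint Poisson support."""
    return sum(poisson_pmf(n, lam_in) * poisson_pmf(n - d, lam_out)
               for n in range(max(0, d), max_count + 1))


def stochastic_em(changes, lam_in=1.0, lam_out=1.0, n_iter=20, seed=0):
    """Stochastic EM for constant Poisson rates from observed occupancy changes."""
    rng = random.Random(seed)
    for _ in range(n_iter):
        # S-step: draw one plausible (incoming, released) pair per observed change.
        draws = []
        for d in changes:
            support, probs = incoming_given_change(d, lam_in, lam_out)
            n_in = rng.choices(support, weights=probs, k=1)[0]
            draws.append((n_in, n_in - d))
        # M-step: the Poisson MLEs are the means of the imputed counts.
        lam_in = max(sum(a for a, _ in draws) / len(draws), 1e-6)
        lam_out = max(sum(b for _, b in draws) / len(draws), 1e-6)
    return lam_in, lam_out
```

Because every imputed pair reproduces the observed change exactly, the difference of the estimated rates always equals the mean observed change; the data inform the individual rates only through the spread of the changes.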
Pub Date: 2024-04-25 | DOI: 10.1177/1471082x241234139
Yanxi Li, Derek S. Young, Julien Gori, Olivier Rioul
Fitts’ law is often employed as a predictive model for human movement, especially in the field of human-computer interaction. Models with an assumed Gaussian error structure are usually adequate when applied to data collected in controlled studies. However, observational data (often referred to as data gathered ‘in the wild’) typically display noticeable positive skewness relative to a mean trend, because users do not routinely try to minimize their task completion time. For this reason, the exponentially modified Gaussian (EMG) regression model has been applied to aimed-movement data. It is also of interest, however, to characterize the regions where a user was likely not trying to minimize task completion time. In this article, we propose a novel model with a two-component mixture structure on the errors, one Gaussian and one exponential component, to identify such regions. An expectation-conditional-maximization (ECM) algorithm is developed to estimate the model, and some properties of the algorithm are established. The efficacy of the proposed model, as well as its ability to inform model-based clustering, is assessed through extensive simulations and an analysis of a human aiming performance study.
Title: A novel mixture model for characterizing human aiming performance data | Journal: Statistical Modelling
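A minimal sketch of the error structure, assuming the Shannon formulation of Fitts' index of difficulty, ID = log2(1 + D/W), a linear mean trend MT = a + b * ID, and an error density that mixes a zero-mean Gaussian with an exponential supported on non-negative errors. The article's exact parameterization may differ; names and parameter values are illustrative. The responsibilities below are the kind of E-step quantity an ECM algorithm would use, and they could serve to cluster movements into a Gaussian ("trying") and an exponential ("not trying") regime.

```python
import math


def fitts_id(distance, width):
    """Index of difficulty in bits (Shannon formulation of Fitts' law)."""
    return math.log2(1 + distance / width)


def normal_pdf(e, sigma):
    return math.exp(-0.5 * (e / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def exponential_pdf(e, rate):
    # Assumption: the exponential component only produces non-negative (slow) errors.
    return rate * math.exp(-rate * e) if e >= 0 else 0.0


def exponential_responsibility(times, ids, a, b, pi_gauss, sigma, rate):
    """Posterior probability that each movement came from the exponential component.

    Residuals are taken around the assumed Fitts' law trend MT = a + b * ID.
    """
    probs = []
    for t, id_bits in zip(times, ids):
        e = t - (a + b * id_bits)
        g = pi_gauss * normal_pdf(e, sigma)
        x = (1 - pi_gauss) * exponential_pdf(e, rate)
        probs.append(x / (g + x))
    return probs
```

A movement faster than the trend gets responsibility 0 under this assumed error model, while one far slower than the trend is attributed almost entirely to the exponential component.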
Pub Date: 2024-03-08 | DOI: 10.1177/1471082x231225307
Ryan Peterson, Joseph Cavanaugh
The sparsity-ranked lasso (SRL) was developed for model selection and estimation in the presence of interactions and polynomials. The main tenet of the SRL is that an algorithm should be more sceptical a priori of higher-order polynomials and interactions than of main effects, so including these more complex terms should require a higher level of evidence. In time series, the same idea of ranked prior scepticism can be applied to characterize the potentially complex seasonal autoregressive (AR) structure of a series during model fitting, which is especially useful in settings with uncertain or multiple modes of seasonality. The SRL can naturally incorporate exogenous variables, with streamlined options for inference and/or feature selection. The fitting process is fast even for long series with a high-dimensional feature set. In this work, we discuss both the formulation of this procedure and its software implementation in the fastTS R package. We explore the performance of our SRL-based approach in a novel application involving the autoregressive modelling of hourly emergency room arrivals at the University of Iowa Hospitals and Clinics. We find that the SRL is considerably faster than its competitors, while generally producing more accurate predictions.
Title: Fast, effective, and coherent time series modelling using the sparsity-ranked lasso | Journal: Statistical Modelling
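The ranked-scepticism idea can be sketched as a weighted lasso in which more complex feature groups receive larger penalty weights. Here every feature in a group of size s gets weight sqrt(s), which is an assumed weighting rule for illustration; how fastTS assigns weights to seasonal lags is not reproduced and should be checked against the package. The coordinate-descent solver minimizes (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j| in plain Python and is not the package's algorithm.

```python
import math


def group_size_weights(group_sizes):
    """Per-feature penalty weights: every feature in a group of size s gets sqrt(s).

    Assumed rule: larger (more complex) groups are penalized more heavily, so
    their terms need stronger evidence to enter the model.
    """
    weights = []
    for size in group_sizes:
        weights.extend([math.sqrt(size)] * size)
    return weights


def soft_threshold(z, t):
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=100):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * sum_j weights[j] * |b_j|.

    X is a list of rows; no column may be identically zero.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    resid = list(y)  # residuals y - X beta, kept up to date
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual (excluding feature j).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta
```

With orthogonal columns x1 = (1, -1, 1, -1), x2 = (1, 1, -1, -1) and y = 2x1 + 0.6x2, lam = 0.4 keeps a shrunken x2 coefficient of 0.2 under equal weights but drops x2 entirely when its weight is doubled.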
Pub Date: 2024-03-05 | DOI: 10.1177/1471082x231226287
Carlos Vergara-Hernández, Marc Marí-Dell’Olmo, Laura Oliveras, Miguel Angel Martinez-Beneito
Spatial small-area estimation models have become very popular in some contexts, such as disease mapping. Data in disease mapping studies are exhaustive; that is, the available data are assumed to be a complete register of all observable events. In contrast, other small-area studies, such as survey-based studies, do not use exhaustive data: a particular sampling design is typically followed and inferences are later extrapolated to the entire population. In this article, we propose a spatial model for small-area survey studies that takes advantage of spatial dependence between units, which is the key assumption used to yield reliable estimates in studies based on exhaustive data. In addition, and in contrast to most survey-based spatial studies, we also take into account information on the sampling design and additional supplementary variables to obtain estimates in small areas. This makes it possible to merge spatial and sampling models into a common proposal.
Title: Taking advantage of sampling designs in spatial small-area survey studies | Journal: Statistical Modelling
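Two ingredients mentioned above can be sketched separately: a design-weighted (Hájek) direct estimate of a proportion in each small area, with weights equal to inverse inclusion probabilities, and a crude form of spatial borrowing that shrinks each area's estimate towards the mean of its neighbours, more strongly when the area's sample is small. The shrinkage rule and its constant `k` are hypothetical stand-ins; the article's spatial model is not reproduced here.

```python
from collections import defaultdict


def hajek_by_area(records):
    """Design-weighted mean per area; records are (area, y, design_weight) tuples."""
    num, den, n = defaultdict(float), defaultdict(float), defaultdict(int)
    for area, y, w in records:
        num[area] += w * y
        den[area] += w
        n[area] += 1
    return {a: num[a] / den[a] for a in num}, dict(n)


def borrow_from_neighbours(direct, sample_sizes, neighbours, k=1.0):
    """Shrink each direct estimate towards its neighbours' mean (illustrative rule).

    estimate_i = (n_i * direct_i + k * neighbour_mean_i) / (n_i + k)
    """
    smoothed = {}
    for area, est in direct.items():
        nbrs = [direct[b] for b in neighbours.get(area, []) if b in direct]
        if not nbrs:
            smoothed[area] = est  # isolated area: keep the direct estimate
            continue
        nbr_mean = sum(nbrs) / len(nbrs)
        n_i = sample_sizes[area]
        smoothed[area] = (n_i * est + k * nbr_mean) / (n_i + k)
    return smoothed
```

The design weights change the direct estimates (an unweighted mean would give 2/3 for area A below, the Hájek estimate gives 0.75), while the smoothing step moves areas with few respondents furthest towards their neighbours.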
Pub Date: 2024-02-28 | DOI: 10.1177/1471082x231225806
Anneleen Verhasselt, Alvaro J. Flórez, Geert Molenberghs, Ingrid Van Keilegom
Quantile regression can be a helpful technique for analysing clustered (such as longitudinal) data. It can characterize the change in response over time without making distributional assumptions and is robust to outliers in the response. A quantile regression model is introduced that uses a copula-based multivariate asymmetric Laplace distribution to address the correlation due to clustering. Furthermore, we propose a pairwise estimator for the parameters of the model. Because it is based on pseudo-likelihood, it must be modified to avoid bias in the presence of missing data. We therefore enhance the model with inverse probability weighting, so that our proposal is unbiased under the missing at random assumption. Simulations indicate that the estimator is efficient and computationally fast. Finally, the methodology is illustrated using a study in ophthalmology.
Title: Copula-based pairwise estimator for quantile regression with hierarchical missing data | Journal: Statistical Modelling
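Quantile regression with an asymmetric Laplace working likelihood rests on the check loss rho_tau(u) = u * (tau - 1{u < 0}). The sketch shows that loss, the corresponding asymmetric Laplace log-density, and how inverse probability weighting enters: each observed response is weighted by one over its (here, given) probability of being observed, and for an intercept-only model the weighted check loss is minimized at a weighted tau-quantile. The copula-based pairwise likelihood of the article is not reproduced; names and values are illustrative.

```python
import math


def check_loss(u, tau):
    """Quantile (check) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_logpdf(y, mu, sigma, tau):
    """Log-density of the asymmetric Laplace distribution whose tau-quantile is mu."""
    return math.log(tau * (1 - tau) / sigma) - check_loss((y - mu) / sigma, tau)


def ipw_quantile(values, obs_probs, tau):
    """Weighted tau-quantile of observed values with weights 1 / P(observed).

    Returns the smallest value whose cumulative weight reaches tau * total weight,
    which minimizes sum_i w_i * check_loss(y_i - q, tau) over q.
    """
    pairs = sorted(zip(values, (1.0 / p for p in obs_probs)))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for y, w in pairs:
        cum += w
        if cum >= tau * total:
            return y
    return pairs[-1][0]
```

For the values (1, 2, 3, 5), the unweighted median is 2, but if the value 3 had only a 25% chance of being observed, its weight of 4 pulls the IPW median up to 3.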
Pub Date: 2024-02-07 | DOI: 10.1177/1471082x231219847
Umut Altay, John Paige, Andrea Riebler, Geir-Arne Fuglstad
Fine-scale covariate rasters are routinely used in geostatistical models for mapping demographic and health indicators based on household surveys from the Demographic and Health Surveys (DHS) program. However, such geostatistical analyses ignore the fact that GPS coordinates in DHS surveys are jittered for privacy purposes. We demonstrate the need to account for this jittering, and we propose a computationally efficient approach that can be applied routinely. We use the new method to analyse the prevalence of completed secondary education among women aged 20 to 49 in Nigeria, based on the 2018 DHS survey. The analysis shows substantial changes in the estimates of the spatial range and fixed effects compared with an analysis that ignores jittering. Through a simulation study that mimics the dataset, we demonstrate that accounting for jittering reduces attenuation in the estimated covariate coefficients and improves predictions. The results also show that the common approach of averaging covariate values in windows around the observed locations does not yield the same improvements as accounting for jittering.
Title: Impact of jittering on raster- and distance-based geostatistical analyses of DHS data | Journal: Statistical Modelling
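To see why jittering matters, it helps to simulate it. The DHS program displaces urban cluster coordinates by up to 2 km and rural ones by up to 5 km, with 1% of rural clusters displaced by up to 10 km, in a random direction and by a random distance. The sketch assumes planar coordinates in kilometres and a distance drawn uniformly on [0, max]; it ignores the DHS constraint that displaced points stay within the country and survey region, and the function names are hypothetical. Because the distance rather than the location is uniform, displaced points concentrate near the true location: about half of urban displacements are shorter than 1 km, against a quarter if points were spread uniformly over the 2 km disc.

```python
import math
import random


def dhs_displace(x_km, y_km, urban, rng):
    """Displace one cluster location following the DHS rule (planar sketch)."""
    if urban:
        max_km = 2.0
    else:
        # 1% of rural clusters get the larger 10 km maximum displacement.
        max_km = 10.0 if rng.random() < 0.01 else 5.0
    angle = rng.uniform(0.0, 2.0 * math.pi)  # random direction
    dist = rng.uniform(0.0, max_km)  # random distance, assumed uniform
    return x_km + dist * math.cos(angle), y_km + dist * math.sin(angle)


def share_within(radius_km, urban, n_draws=20000, seed=1):
    """Monte Carlo share of displacements shorter than radius_km."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        x, y = dhs_displace(0.0, 0.0, urban, rng)
        hits += math.hypot(x, y) < radius_km
    return hits / n_draws
```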
Pub Date: 2023-12-21 | DOI: 10.1177/1471082x231209361
R. K. Sercundes, G. Molenberghs, G. Verbeke, Clarice G.B. Demétrio, Sila C. da Silva, Rafael A. Moral
Longitudinal studies involving nominal outcomes are carried out in various scientific areas. These outcomes are frequently modelled within a generalized linear mixed model (GLMM) framework. This widely used approach allows the hierarchy in the data to be modelled, which accommodates different degrees of overdispersion. In this article, a combined model (CM) is formulated that accounts for overdispersion and clustering through two separate sets of random effects. Model parameters are estimated by maximum likelihood with analytic-numerical integration. To examine the relative performance of the CM and the GLMM, simulation studies were carried out, exploring scenarios with different sample sizes, types of random effects, and degrees of overdispersion. Both models were applied to a real dataset from an agricultural experiment. We also provide an implementation of these models in SAS code.
Title: A combined overdispersed longitudinal model for nominal data | Journal: Statistical Modelling
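For a nominal outcome with a baseline category, a GLMM typically uses baseline-category logits with normally distributed random effects, and the marginal category probabilities require integrating those effects out. The sketch assumes a single shared random intercept on the non-baseline logits and approximates the normal integral on an evenly spaced grid; it omits the second (overdispersion) set of random effects that defines the article's combined model, and the function names are hypothetical.

```python
import math


def category_probs(etas):
    """Baseline-category logit: category 0 has linear predictor 0, others eta_j."""
    exps = [1.0] + [math.exp(e) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_category_probs(etas, sigma_b, n_grid=401, half_width=6.0):
    """Average category probabilities over a N(0, sigma_b^2) random intercept.

    The standard normal density is evaluated on an evenly spaced grid over
    [-half_width, half_width] and renormalized: a crude numerical integration.
    """
    step = 2.0 * half_width / (n_grid - 1)
    acc = [0.0] * (len(etas) + 1)
    norm = 0.0
    for k in range(n_grid):
        z = -half_width + k * step
        w = math.exp(-0.5 * z * z)  # unnormalized standard normal density
        for j, p in enumerate(category_probs([e + sigma_b * z for e in etas])):
            acc[j] += w * p
        norm += w
    return [a / norm for a in acc]
```

With `sigma_b = 0` the marginal and conditional probabilities coincide; as `sigma_b` grows, the marginal probabilities are pulled towards equal shares, which is why marginal and conditional effects differ in such models.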
Pub Date: 2023-12-21 | DOI: 10.1177/1471082x231204930
Rafael Cabral Fernandez, Kelly Cristina Mota Gonçalves, João Batista de Morais Pereira
This article introduces a new class of nested models that extends the standard combination, found in the literature, of a spatial autoregressive model for areal data with parametric quantile regression, by assuming an asymmetric Laplace distribution for the random errors. In addition to being more flexible, the proposed model can incorporate a hierarchical structure, allowing it to deal with clustered data. This approach yields a robust statistical method for modeling the quantiles of areal data distributed in a geographically hierarchical setting. The proposed non-hierarchical model is evaluated using a well-known house pricing dataset and a simulation study. In addition, its hierarchical version is applied to a real dataset of math scores from public high schools in the metropolitan area of Rio de Janeiro, Brazil.
Title: A flexible Bayesian hierarchical quantile spatial model for areal data | Journal: Statistical Modelling
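Bayesian quantile regression with asymmetric Laplace errors is commonly fitted through a location-scale mixture: if v ~ Exp(1) and z ~ N(0, 1), then theta * v + tau * sqrt(v) * z, with theta = (1 - 2p) / (p(1 - p)) and tau^2 = 2 / (p(1 - p)), has an asymmetric Laplace distribution whose p-th quantile is 0, which makes Gibbs sampling straightforward. The sketch pairs that representation with a spatial autoregressive (SAR) data-generating step, y = rho * W y + X beta + e, solved by fixed-point iteration for a row-standardized neighbourhood matrix W and |rho| < 1. Whether the article uses exactly this parameterization and SAR form is an assumption; the sketch only simulates data and does not fit the model.

```python
import math
import random


def ald_error(p, rng):
    """Draw an error whose p-th quantile is 0 via the exponential-normal mixture."""
    theta = (1 - 2 * p) / (p * (1 - p))
    tau = math.sqrt(2 / (p * (1 - p)))
    v = rng.expovariate(1.0)  # latent exponential scale, mean 1
    return theta * v + tau * math.sqrt(v) * rng.gauss(0.0, 1.0)


def sar_solve(W, c, rho, tol=1e-12, max_iter=10000):
    """Solve y = rho * W y + c by fixed-point iteration.

    Converges when W is row-standardized and |rho| < 1, since the map is then
    a contraction in the max norm.
    """
    y = list(c)
    for _ in range(max_iter):
        new = [c[i] + rho * sum(w * yj for w, yj in zip(W[i], y)) for i in range(len(c))]
        if max(abs(a - b) for a, b in zip(new, y)) < tol:
            return new
        y = new
    raise RuntimeError("SAR fixed-point iteration did not converge")


def simulate_sar_quantile(W, X, beta, rho, p, seed=0):
    """Simulate areal responses from a SAR model with asymmetric Laplace errors."""
    rng = random.Random(seed)
    c = [sum(x * b for x, b in zip(row, beta)) + ald_error(p, rng) for row in X]
    return sar_solve(W, c, rho)
```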
Pub Date: 2023-12-08 | DOI: 10.1177/1471082x231202341
Margaux Delporte, Steffen Fieuws, G. Molenberghs, G. Verbeke, D. De Coninck, Vera Hoorens
In many biomedical studies, multiple responses are collected over time, resulting in high-dimensional longitudinal data. It is often of interest to model continuous and binary responses jointly, which can be done with joint generalized mixed models in which the association is modelled through random effects. Investigating the association between the responses is often limited to scrutinizing the correlations between the latent random effects. In this article, this approach is extended by deriving closed-form formulas for the manifest correlations (and corresponding standard errors), which reflect the correlation between the responses as they are actually observed. In addition, the marginal joint model is constructed, from which predictions can be derived for subvectors of one response conditional on subvectors of the other response(s) and, potentially, on a subvector of that response's own history. Corresponding prediction and confidence intervals are constructed. Two case studies are discussed, in which pseudo-likelihood methodology is further applied to reduce the computational complexity.
Title: A joint normal-binary (probit) model for high-dimensional longitudinal data | Journal: Statistical Modelling
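As a worked instance of a manifest correlation, consider a random-intercept version of the joint model: Y1 = mu + b1 + e and Y2 = 1{eta + b2 + u > 0}, with (b1, b2) bivariate normal with covariance s12, e ~ N(0, se^2) and u ~ N(0, 1) independent. Writing s^2 = s2^2 + 1 and using that E[b1 | b2 + u] is linear in b2 + u gives Cov(Y1, Y2) = s12 * phi(eta/s) / s, so Corr(Y1, Y2) = s12 * phi(eta/s) / (s * sqrt((s1^2 + se^2) * Phi(eta/s) * (1 - Phi(eta/s)))). This is our own derivation for this simplified case, not the article's general formula; the Monte Carlo helper checks it, and all names are hypothetical.

```python
import math
import random


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def manifest_corr(eta, sd_b1, sd_b2, corr_b, sd_e):
    """Closed-form Corr(Y1, Y2) for the random-intercept normal-probit model above."""
    s = math.sqrt(sd_b2 ** 2 + 1)  # sd of b2 + u on the probit scale
    cov_b = corr_b * sd_b1 * sd_b2
    p1 = std_normal_cdf(eta / s)  # P(Y2 = 1)
    cov_y = cov_b * std_normal_pdf(eta / s) / s
    return cov_y / math.sqrt((sd_b1 ** 2 + sd_e ** 2) * p1 * (1 - p1))


def simulate_corr(eta, sd_b1, sd_b2, corr_b, sd_e, n=200000, seed=0):
    """Monte Carlo Pearson correlation between Y1 and Y2 (mu set to 0)."""
    rng = random.Random(seed)
    y1s, y2s = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        b1 = sd_b1 * z1
        b2 = sd_b2 * (corr_b * z1 + math.sqrt(1 - corr_b ** 2) * z2)
        y1s.append(b1 + sd_e * rng.gauss(0, 1))
        y2s.append(1.0 if eta + b2 + rng.gauss(0, 1) > 0 else 0.0)
    m1, m2 = sum(y1s) / n, sum(y2s) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(y1s, y2s)) / n
    v1 = sum((a - m1) ** 2 for a in y1s) / n
    v2 = sum((b - m2) ** 2 for b in y2s) / n
    return cov / math.sqrt(v1 * v2)
```

The example makes the abstract's point concrete: a latent correlation of 0.6 between the random intercepts translates into a manifest correlation of only about 0.24 between the observed responses.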