Statistical Modelling: Latest Articles

A statistical modelling approach to feedforward neural network model selection
IF 1 | Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2024-09-17 | DOI: 10.1177/1471082x241258261
Andrew McInerney, Kevin Burke
Feedforward neural networks (FNNs) can be viewed as non-linear regression models, where covariates enter the model through a combination of weighted summations and non-linear functions. Although these models have some similarities to the approaches used within statistical modelling, the majority of neural network research has been conducted outside of the field of statistics. This has resulted in a lack of statistically based methodology, and, in particular, there has been little emphasis on model parsimony. Determining the input layer structure is analogous to variable selection, while the structure for the hidden layer relates to model complexity. In practice, neural network model selection is often carried out by comparing models using out-of-sample performance. However, in contrast, the construction of an associated likelihood function opens the door to information-criteria-based variable and architecture selection. A novel model selection method, which performs both input- and hidden-node selection, is proposed using the Bayesian information criterion (BIC) for FNNs. The choice of BIC over out-of-sample performance as the model selection objective function leads to an increased probability of recovering the true model, while parsimoniously achieving favourable out-of-sample performance. Simulation studies are used to evaluate and justify the proposed method, and applications on real data are investigated.
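The BIC-based selection described in this abstract can be illustrated with a minimal sketch. It assumes a single-hidden-layer network with one output, bias terms, and Gaussian errors; the function names and the parameter-counting convention are illustrative assumptions, not the authors' implementation.

```python
import math

def gaussian_log_likelihood(residuals):
    """Maximised Gaussian log-likelihood, with sigma^2 set to its MLE (RSS / n).

    Assumes at least one non-zero residual (otherwise log(0) is undefined).
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    sigma2 = rss / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

def fnn_param_count(p, q):
    """Weights in an assumed FNN with p inputs, q hidden nodes, one output, and biases."""
    return (p + 1) * q + (q + 1)

def bic(residuals, p, q):
    """BIC = -2 * log-likelihood + (number of parameters) * log(n)."""
    n = len(residuals)
    k = fnn_param_count(p, q) + 1  # +1 for the error variance (an assumed convention)
    return -2 * gaussian_log_likelihood(residuals) + k * math.log(n)

# With identical residuals, the larger architecture is penalised more heavily.
residuals = [0.3, -0.1, 0.2, -0.4, 0.1, -0.2]
print(bic(residuals, p=2, q=1), bic(residuals, p=2, q=3))
```

Lower BIC is preferred, so an extra input or hidden node is retained only if the gain in log-likelihood outweighs the added log(n) penalty per parameter.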
Citations: 0
The Skellam distribution revisited: Estimating the unobserved incoming and outgoing ICU COVID-19 patients on a regional level in Germany
IF 1 | Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2024-05-27 | DOI: 10.1177/1471082x241235024
Martje Rave, Göran Kauermann
With the beginning of the COVID-19 pandemic, we became aware of the need for comprehensive data collection and its provision to scientists and experts for proper data analyses. In Germany, the Robert Koch Institute (RKI) has tried to keep up with this demand for data on COVID-19, but there were (and still are) relevant data missing that are needed to understand the whole picture of the pandemic. In this article, we take a closer look at the severity of the course of COVID-19 in Germany, for which ideal information would be the number of incoming patients to ICU units. This information was (and still is) not available. Instead, the current occupancy of ICU units on the district level was reported daily. We demonstrate how this information can be used to predict the number of incoming as well as released COVID-19 patients using a stochastic version of the Expectation Maximization algorithm (SEM). This, in turn, allows for estimating the influence of district-specific and age-specific infection rates as well as further covariates, including spatial effects, on the number of incoming patients. The article demonstrates that even if relevant data are not recorded or provided officially, statistical modelling allows for reconstructing them. This also includes the quantification of uncertainty which naturally results from the application of the SEM algorithm.
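The title's Skellam distribution is the distribution of the difference of two independent Poisson counts. A daily change in occupancy can be read as incoming minus outgoing patients, which is presumably why it appears here. The sketch below computes its probability mass function by direct summation; the function names, rates, and truncation length are illustrative assumptions and do not reproduce the authors' SEM procedure.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability P(X = k) for mu > 0, computed on the log scale."""
    if k < 0:
        return 0.0
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def skellam_pmf(d, mu_in, mu_out, terms=200):
    """P(X - Y = d) for independent X ~ Poisson(mu_in), Y ~ Poisson(mu_out).

    Sums over Y = j (so X = j + d); the sum is truncated after `terms` values,
    which is adequate only for moderate rates.
    """
    start = max(0, -d)  # X = j + d must be non-negative
    return sum(poisson_pmf(j + d, mu_in) * poisson_pmf(j, mu_out)
               for j in range(start, terms))

# Hypothetical rates: on average 3 admissions and 2 discharges per day.
for change in (-2, -1, 0, 1, 2):
    print(change, skellam_pmf(change, mu_in=3.0, mu_out=2.0))
```

Only the difference is observed, so many (incoming, outgoing) pairs are compatible with the same occupancy change, which is the identification problem the abstract addresses.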
Citations: 0
A novel mixture model for characterizing human aiming performance data
IF 1 | Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2024-04-25 | DOI: 10.1177/1471082x241234139
Yanxi Li, Derek S. Young, Julien Gori, Olivier Rioul
Fitts’ law is often employed as a predictive model for human movement, especially in the field of human-computer interaction. Models with an assumed Gaussian error structure are usually adequate when applied to data collected from controlled studies. However, observational data (often referred to as data gathered ‘in the wild’) typically display noticeable positive skewness relative to a mean trend as users do not routinely try to minimize their task completion time. As such, the exponentially modified Gaussian (EMG) regression model has been applied to aimed movements data. However, it is also of interest to reasonably characterize those regions where a user likely was not trying to minimize their task completion time. In this article, we propose a novel model with a two-component mixture structure—one Gaussian and one exponential—on the errors to identify such a region. An expectation-conditional-maximization (ECM) algorithm is developed for estimation of such a model and some properties of the algorithm are established. The efficacy of the proposed model, as well as its ability to inform model-based clustering, are addressed in this work through extensive simulations and an insightful analysis of a human aiming performance study.
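As a rough illustration of how a two-component error mixture can flag observations where the user was probably not minimizing completion time, the sketch below computes the posterior probability that a residual came from the exponential component. The parameterization (zero-mean Gaussian, exponential on non-negative residuals) and all names are assumptions made for illustration; this is an E-step-style calculation, not the authors' ECM algorithm.

```python
import math

def normal_pdf(e, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    return math.exp(-0.5 * (e / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def exponential_pdf(e, rate):
    """Density of an exponential distribution; zero for negative residuals."""
    return rate * math.exp(-rate * e) if e >= 0 else 0.0

def exponential_component_probability(e, weight_gauss, sigma, rate):
    """Posterior probability that residual e belongs to the exponential component."""
    g = weight_gauss * normal_pdf(e, sigma)
    x = (1 - weight_gauss) * exponential_pdf(e, rate)
    total = g + x
    return x / total if total > 0 else 0.0  # guard against numerical underflow

# Hypothetical parameter values: residuals far above the trend are
# attributed almost entirely to the exponential component.
for residual in (-0.5, 0.2, 1.0, 3.0):
    print(residual, exponential_component_probability(residual, 0.7, 0.5, 1.0))
```

Thresholding this probability gives one simple way to turn the fitted mixture into a model-based clustering of observations.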
Citations: 0
Fast, effective, and coherent time series modelling using the sparsity-ranked lasso
IF 1 | Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2024-03-08 | DOI: 10.1177/1471082x231225307
Ryan Peterson, Joseph Cavanaugh
The sparsity-ranked lasso (SRL) has been developed for model selection and estimation in the presence of interactions and polynomials. The main tenet of the SRL is that an algorithm should be more sceptical of higher-order polynomials and interactions a priori compared to main effects, and hence the inclusion of these more complex terms should require a higher level of evidence. In time series, the same idea of ranked prior scepticism can be applied to characterize the potentially complex seasonal autoregressive (AR) structure of a series during the model fitting process, becoming especially useful in settings with uncertain or multiple modes of seasonality. The SRL can naturally incorporate exogenous variables, with streamlined options for inference and/or feature selection. The fitting process is quick even for large series with a high-dimensional feature set. In this work, we discuss both the formulation of this procedure and the software we have developed for its implementation via the fastTS R package. We explore the performance of our SRL-based approach in a novel application involving the autoregressive modelling of hourly emergency room arrivals at the University of Iowa Hospitals and Clinics. We find that the SRL is considerably faster than its competitors, while generally producing more accurate predictions.
Citations: 0
Taking advantage of sampling designs in spatial small-area survey studies
IF 1 | Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2024-03-05 | DOI: 10.1177/1471082x231226287
Carlos Vergara-Hernández, Marc Marí-Dell’Olmo, Laura Oliveras, Miguel Angel Martinez-Beneito
Spatial small area estimation models have become very popular in some contexts, such as disease mapping. Data in disease mapping studies are exhaustive, that is, the available data are supposed to be a complete register of all the observable events. In contrast, some other small area studies do not use exhaustive data, such as survey based studies, where a particular sampling design is typically followed and inferences are later extrapolated to the entire population. In this article we propose a spatial model for small area survey studies, taking advantage of spatial dependence between units, which is the key assumption used for yielding reliable estimates in exhaustive data based studies. In addition, and in contrast to most survey-based spatial studies, we also take into account information on the sampling design and additional supplementary variables to obtain estimates in small areas. This makes it possible to merge spatial and sampling models into a common proposal.
Citations: 0
Copula-based pairwise estimator for quantile regression with hierarchical missing data
IF 1 | Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2024-02-28 | DOI: 10.1177/1471082x231225806
Anneleen Verhasselt, Alvaro J. Flórez, Geert Molenberghs, Ingrid Van Keilegom
Quantile regression can be a helpful technique for analysing clustered (such as longitudinal) data. It can characterize the change in response over time without making distributional assumptions and is robust to outliers in the response. A quantile regression model using a copula-based multivariate asymmetric Laplace distribution for addressing correlation due to clustering is introduced. Furthermore, we propose a pairwise estimator for the parameters of the model. Since it is based on pseudo-likelihood, it needs to be modified to avoid bias in presence of missingness. Therefore, we enhance the model with inverse probability weighting. In this way, our proposal is unbiased under the missing at random assumption. Based on simulations, the estimator is efficient and computationally fast. Finally, the methodology is illustrated using a study in ophthalmology.
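Two ingredients mentioned in this abstract can be shown in a small sketch: the check (pinball) loss that underlies quantile regression, whose minimization corresponds to maximizing an asymmetric Laplace likelihood, and observation weights of the kind used in inverse probability weighting. The example estimates a single weighted quantile rather than a regression; the function names and weights are illustrative assumptions, and the copula-based pairwise estimator itself is not reproduced.

```python
def check_loss(u, tau):
    """Check (pinball) loss: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def weighted_quantile(values, tau, weights=None):
    """Minimise the weighted check loss over the observed values.

    The objective is piecewise linear and convex, so a minimiser is always
    attained at one of the data points; ties resolve to the smallest one.
    """
    if weights is None:
        weights = [1.0] * len(values)

    def objective(q):
        return sum(w * check_loss(y - q, tau) for y, w in zip(values, weights))

    return min(sorted(set(values)), key=objective)

# Unweighted median, then a version where the last observation is up-weighted,
# as it would be if its probability of being observed were low (hypothetical).
print(weighted_quantile([1, 2, 3, 4, 5], tau=0.5))
print(weighted_quantile([1, 2, 3], tau=0.5, weights=[1.0, 1.0, 10.0]))
```

In an inverse-probability-weighted analysis, each weight would be the reciprocal of the estimated probability that the observation is not missing.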
Citations: 0
Impact of jittering on raster- and distance-based geostatistical analyses of DHS data
IF 1 | Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2024-02-07 | DOI: 10.1177/1471082x231219847
Umut Altay, John Paige, Andrea Riebler, Geir-Arne Fuglstad
Fine-scale covariate rasters are routinely used in geostatistical models for mapping demographic and health indicators based on household surveys from the Demographic and Health Surveys (DHS) program. However, the geostatistical analyses ignore the fact that GPS coordinates in DHS surveys are jittered for privacy purposes. We demonstrate the need to account for this jittering, and we propose a computationally efficient approach that can be routinely applied. We use the new method to analyse the prevalence of completion of secondary education for 20-49 year old women in Nigeria in 2018 based on the 2018 DHS survey. The analysis demonstrates substantial changes in the estimates of spatial range and fixed effects compared to when we ignore jittering. Through a simulation study that mimics the dataset, we demonstrate that accounting for jittering reduces attenuation in the estimated coefficients for covariates and improves predictions. The results also show that the common approach of averaging covariate values in windows around the observed locations does not lead to the same improvements as accounting for jittering.
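To make the jittering problem concrete, the sketch below displaces a cluster location by a random distance in a random direction. The uniform angle, uniform distance, planar kilometre coordinates, and the 2 km and 5 km limits used in the example are assumptions chosen for illustration (they follow commonly cited descriptions of DHS displacement, not anything stated in the abstract), and the function name is hypothetical.

```python
import math
import random

def jitter_location(x_km, y_km, max_km, rng):
    """Displace a point by a uniform random angle and a uniform random distance.

    Coordinates are treated as planar kilometres, which is a simplification.
    """
    angle = rng.uniform(0.0, 2.0 * math.pi)
    distance = rng.uniform(0.0, max_km)
    return x_km + distance * math.cos(angle), y_km + distance * math.sin(angle)

rng = random.Random(2018)  # fixed seed so the sketch is reproducible
urban = jitter_location(10.0, 20.0, max_km=2.0, rng=rng)  # assumed urban limit
rural = jitter_location(10.0, 20.0, max_km=5.0, rng=rng)  # assumed rural limit
print(urban, rural)
```

A covariate read from a fine-scale raster at the jittered point can differ from its value at the true location, which is one way to see why ignoring jittering attenuates the estimated covariate effects.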
Citations: 0
A combined overdispersed longitudinal model for nominal data
IF 1 | Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2023-12-21 | DOI: 10.1177/1471082x231209361
R. K. Sercundes, G. Molenberghs, G. Verbeke, Clarice G.B. Demétrio, Sila C. da Silva, Rafael A. Moral
Longitudinal studies involving nominal outcomes are carried out in various scientific areas. These outcomes are frequently modelled using a generalized linear mixed modelling (GLMM) framework. This widely used approach allows for the modelling of the hierarchy in the data to accommodate different degrees of overdispersion. In this article, a combined model (CM) that takes into account overdispersion and clustering through two separate sets of random effects is formulated. Maximum likelihood estimation with analytic-numerical integration is used to estimate the model parameters. To examine the relative performance of the CM and the GLMM, simulation studies were carried out, exploring scenarios with different sample sizes, types of random effects, and overdispersion. Both models were applied to a real dataset obtained from an experiment in agriculture. We also provide an implementation of these models through SAS code.
Citations: 0
A flexible Bayesian hierarchical quantile spatial model for areal data
IF 1 | Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2023-12-21 | DOI: 10.1177/1471082x231204930
Rafael Cabral Fernandez, Kelly Cristina Mota Gonçalves, João Batista de Morais Pereira
This article introduces a new class of nested models that extends the literature standard combination of spatial autoregressive model for areal data with parametric quantile regression by considering an asymmetric Laplace distribution for the random errors. In addition to being more flexible, the new proposed model can incorporate a hierarchical structure, allowing it to deal with clustered data. Such an approach produces a robust statistical method for modeling the quantiles of areal data distributed in a geographically hierarchical setting. The proposed non-hierarchical model is evaluated using a well-known house pricing dataset and a simulation study. In addition, its hierarchical version is applied to a real dataset of math scores related to public high schools within the metropolitan area of Rio de Janeiro, Brazil.
Citations: 0
A joint normal-binary (probit) model for high-dimensional longitudinal data
IF 1 Zone 4 Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2023-12-08 DOI: 10.1177/1471082x231202341
Margaux Delporte, Steffen Fieuws, G. Molenberghs, G. Verbeke, D. De Coninck, Vera Hoorens
In many biomedical studies, multiple responses are collected over time, which results in high-dimensional longitudinal data. It is often of interest to model the continuous and binary responses jointly, which can be done with joint generalized mixed models in which the association is modelled through random effects. Investigating the association between the responses is often limited to scrutinizing the correlations between the latent random effects. In this article, this approach is extended by deriving closed-form formulas for the manifest correlations (and corresponding standard errors), which reflect the correlations between the responses as observed. In addition, the marginal joint model is constructed, from which predictions of subvectors of one response conditional on subvectors of other response(s) and potentially a subvector of the history of the response can be derived. Corresponding prediction and confidence intervals are constructed. Two case studies are discussed, in which further pseudo-likelihood methodology is applied to reduce the computational complexity.
Citations: 0