Pub Date: 2024-06-01. Epub Date: 2024-05-22. DOI: 10.1177/0282423x241244917
Harrison Quick, Guangzi Song
When analyzing spatially referenced event data, the criteria for declaring rates "reliable" are still a matter of dispute. What these varying criteria have in common, however, is that they are rarely satisfied for crude estimates in small area analysis settings, prompting the use of spatial models to improve reliability. While this is reasonable, recent work has quantified the extent to which popular models from the spatial statistics literature can overwhelm the information contained in the data, leading to oversmoothing. Here, we begin by providing a definition of a "reliable" estimate for event rates that can be used for crude and model-based estimates and allows for discrete and continuous statements of reliability. We then construct a spatial Bayesian framework that allows users to infuse prior information into their models to improve reliability while also guarding against oversmoothing. We apply our approach to county-level birth data from Pennsylvania, highlighting the effect of oversmoothing in spatial models and how our approach can allow users to better focus their attention on areas where sufficient data exist to drive inferential decisions. We then conclude with a brief discussion of how this definition of reliability can be used in the design of small area studies.
Title: Reliable event rates for disease mapping. Authors: Harrison Quick, Guangzi Song. Journal: Journal of Official Statistics, 40(2), pp. 333-347. Published: 2024-06-01. DOI: 10.1177/0282423x241244917. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11448706/pdf/
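To see why crude estimates in small areas so rarely pass reliability criteria, consider one widely used rule of thumb (not the definition proposed in the article above): treat the event count as Poisson, so that the relative standard error (RSE) of the crude rate depends only on the count. The sketch below is a minimal illustration; the function names and the 0.30 cutoff are assumptions chosen for the example.

```python
import math


def relative_standard_error(events: int) -> float:
    """RSE of a crude rate y/n when y ~ Poisson: sqrt(y) / y = 1 / sqrt(y)."""
    if events <= 0:
        # With no observed events the crude rate has no usable precision.
        return math.inf
    return 1.0 / math.sqrt(events)


def is_reliable(events: int, max_rse: float = 0.30) -> bool:
    """Flag a crude rate as reliable when its RSE is below max_rse.

    max_rse = 0.30 is an assumed cutoff for illustration only.
    """
    return relative_standard_error(events) < max_rse
```

Under this assumed cutoff an area needs at least 12 events to be flagged reliable, regardless of its population size, which many small areas will not reach in a single year.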
In this article, we propose a framework for small area estimation with multiply imputed survey data. Many statistical surveys suffer from (a) high nonresponse rates due to sensitive questions and response burden and (b) sample sizes too small to allow for reliable estimates at (unplanned) disaggregated levels due to budget constraints. One way to deal with missing values is to replace them with several plausible (imputed) values based on a model. Small area estimation, such as the model by Fay and Herriot, is applied to estimate regionally disaggregated indicators when direct estimates are imprecise. The framework presented simultaneously handles multiply imputed values and imprecise direct estimates. In particular, we extend the general class of transformed Fay-Herriot models to account for the additional uncertainty from multiple imputation. We derive three special cases of the Fay-Herriot model with particular transformations and provide point and mean squared error estimators. Depending on the case, the mean squared error is estimated by analytic solutions or resampling methods. Comprehensive simulations in a controlled environment show that the proposed methodology leads to reliable and precise results in terms of bias and mean squared error. The methodology is illustrated by a real data example using European wealth data.
Title: Small Area with Multiply Imputed Survey Data. Authors: Marina Runge, Timo Schmid. Journal: Journal of Official Statistics. Published: 2023-12-10. DOI: https://doi.org/10.2478/jos-2023-0024
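The additional uncertainty from multiple imputation is conventionally summarized with Rubin's combining rules: the pooled estimate is the mean over the M imputed data sets, and the total variance adds the average within-imputation variance to an inflated between-imputation variance. The sketch below applies these standard rules to one area's direct estimate. It is not the transformed Fay-Herriot estimator from the article above, and the function name and inputs are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine M per-imputation estimates and their variances (Rubin's rules).

    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)
    within = mean(variances)          # average sampling variance
    between = variance(estimates)     # sample variance across imputations (M - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Made-up numbers for one small area and M = 3 imputations.
estimate, total_var = pool_rubin([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
```

In an area-level model, a total variance of this kind could take the place of the usual sampling variance of the direct estimate, which is one plausible way for imputation uncertainty to enter the model.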
Sampling variance smoothing is an important topic in small area estimation. In this article, we propose sampling variance smoothing methods for small area proportion estimation. In particular, we consider the generalized variance function and design effect methods for sampling variance smoothing. We evaluate and compare the smoothed sampling variances and the small area estimates based on them through an analysis of survey data from Statistics Canada. The results from the real data analysis and a simulation study indicate that the proposed sampling variance smoothing methods perform very well for small area estimation.
Title: Application of Sampling Variance Smoothing Methods for Small Area Proportion Estimation. Authors: Yong You, Mike Hidiroglou. Journal: Journal of Official Statistics. Published: 2023-12-10. DOI: https://doi.org/10.2478/jos-2023-0026
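The design effect idea mentioned in the abstract above can be illustrated with a simple variant: estimate each area's design effect as the ratio of its direct variance to the simple-random-sampling variance p(1 - p)/n, average those ratios, and rebuild every area's variance from the common average. This is a minimal sketch under that assumption and not necessarily the estimator evaluated in the article; the function name and the numbers are made up.

```python
def smooth_variances_deff(proportions, sample_sizes, direct_variances):
    """Smooth direct variances of proportions with an average design effect.

    Assumes every proportion lies strictly between 0 and 1.
    """
    # Variance each area would have under simple random sampling.
    srs = [p * (1 - p) / n for p, n in zip(proportions, sample_sizes)]
    # Area-level design effects: direct variance relative to SRS variance.
    deffs = [v / s for v, s in zip(direct_variances, srs)]
    deff_bar = sum(deffs) / len(deffs)
    # Smoothed variance: common design effect times the SRS variance.
    return [deff_bar * s for s in srs]


# Two hypothetical areas with design effects 2.0 and 1.0 (average 1.5).
smoothed = smooth_variances_deff([0.2, 0.5], [100, 50], [0.0032, 0.005])
```

Replacing noisy area-level design effects with a pooled value trades some area-specific detail for stability, which is the usual motivation for smoothing before fitting an area-level model.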
Mobile network data records are promising for measuring temporal changes in present populations. This promise has been boosted since high-frequency passively-collected signaling data became available. Its temporal event rate is considerably higher than that of Call Detail Records – on which most of the previous literature is based. Yet, we show it remains a challenge to produce statistics consistent over time, robust to changes in the “measuring instruments” and conveying spatial uncertainty to the end user. In this article, we propose a methodology to estimate – consistently over several months – hourly population presence over France based on signaling data spatially merged with fine-grained official population counts. We draw particular attention to consistency at several spatial scales and over time and to spatial mapping reflecting spatial accuracy. We compare the results with external references and discuss the challenges which remain. We argue data fusion approaches between fine-grained official statistics data sets and mobile network data, spatially merged to preserve privacy, are promising for future methodologies.
Title: Temporally Consistent Present Population from Mobile Network Signaling Data for Official Statistics. Authors: Milena Suarez Castillo, Francois Sémécurbe, Cezary Ziemlicki, Haixuan Xavier Tao, Tom Seimandi. Journal: Journal of Official Statistics. Published: 2023-12-10. DOI: https://doi.org/10.2478/jos-2023-0025
Obtaining reliable estimates in small areas is a challenge because of the coverage and periodicity of data collection. Several small area estimation techniques have been proposed to produce quality measures in small areas, but few focus on updating these estimates. By combining the attributes of the most recent versions of the structure-preserving estimation methods, this article proposes a new alternative to estimate and update cross-classified counts for small domains when the variable of interest is not available in the census. The proposed methodology is used to obtain and update estimates of the incidence of poverty in 81 Costa Rican cantons for six postcensal years (2012–2017). As uncertainty measures, mean squared errors are estimated via parametric bootstrap, and the adequacy of the proposed method is assessed with a design-based simulation.
Title: Small Area Estimates of Poverty Incidence in Costa Rica under a Structure Preserving Estimation (SPREE) Approach. Authors: Alejandra Arias-Salazar. Journal: Journal of Official Statistics. Published: 2023-12-10. DOI: https://doi.org/10.2478/jos-2023-0021
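Classical structure-preserving estimation updates an older cross-classification (for example, area by status) to more recent margins using iterative proportional fitting (IPF), which rescales rows and columns in turn and therefore leaves the table's odds ratios unchanged. The sketch below shows only that classical step, not the updated method proposed in the article above; the function name, the table, and the margins are all made up.

```python
def ipf(table, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Rescale a positive two-way table until it matches both sets of margins."""
    fitted = [row[:] for row in table]  # work on a copy
    for _ in range(max_iter):
        # Step 1: scale each row to its target total.
        for i, row in enumerate(fitted):
            total = sum(row)
            if total > 0:
                fitted[i] = [x * row_targets[i] / total for x in row]
        # Step 2: scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in fitted)
            if total > 0:
                for row in fitted:
                    row[j] *= target / total
        # Columns now match exactly; stop once the rows match too.
        row_error = max(abs(sum(r) - t) for r, t in zip(fitted, row_targets))
        if row_error < tol:
            break
    return fitted


# Hypothetical census table (rows: two areas, columns: poor / not poor),
# updated to assumed recent area totals and status totals.
census = [[40.0, 10.0], [20.0, 30.0]]
updated = ipf(census, row_targets=[60.0, 40.0], col_targets=[55.0, 45.0])
```

Because every step multiplies whole rows or whole columns by a constant, the cross-product ratio of the updated table equals that of the census table (6 in this example), which is the sense in which the association structure is preserved.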
Yang Li, Le Qi, Yichen Qin, Cunjie Lin, Yuhong Yang
In this study, we advocate a two-stage framework to deal with the issues encountered in surveys with long questionnaires. In Stage I, we propose a split questionnaire design (SQD) developed by minimizing a quadratic cost function while achieving reliability constraints on estimates of means, which effectively reduces the survey cost, alleviates the burden on the respondents, and potentially improves data quality. In Stage II, we develop a block weighted least squares (BWLS) estimator of linear regression coefficients that can be used with data collected under the SQD from Stage I. Numerical studies comparing existing methods strongly favor the proposed estimator in terms of prediction and estimation accuracy. Using the European Social Survey (ESS) data, we demonstrate that the proposed SQD can substantially reduce the survey cost and the number of questions answered by each respondent, and that the proposed estimator is much more interpretable and efficient than present alternatives for the SQD data.
Title: Block Weighted Least Squares Estimation for Nonlinear Cost-based Split Questionnaire Design. Authors: Yang Li, Le Qi, Yichen Qin, Cunjie Lin, Yuhong Yang. Journal: Journal of Official Statistics. Published: 2023-12-10. DOI: https://doi.org/10.2478/jos-2023-0022
Joeri Minnen, Sven Rymenants, Ignace Glorieux, Theun Pieter van Tienoven
The modernization of the production of official statistics faces challenges related to technological developments, budget cuts, and growing privacy concerns. At the same time, there is a need for shareable and scalable platforms to support comparable data, leading to several online data collection strategies being rolled out. Time Use Surveys (TUS) are particularly affected by these challenges and needs as they (while producing rich data) are complex, time-intensive studies (because they include multiple tasks and are administered at the household level). This article introduces the Modular Online Time Use Survey (MOTUS) data collection platform and explains how it accommodates the challenges of and changes in the production of a TUS that is carried out in line with the Harmonized European Time Use Survey guidelines. It argues that MOTUS supports a shift in the methodological paradigm of conducting TUS by being timelier and more cost efficient, by lowering respondent burden, and by improving the reliability of the data collected. Importantly, the modular structure allows MOTUS to be easily deployed for various TUS configurations. Moreover, this versatile structure allows comparable, complex diary surveys (such as the household budget survey) to be performed on the same platform and with the same applications.
Title: Answering Current Challenges of and Changes in Producing Official Time Use Statistics Using the Data Collection Platform MOTUS. Authors: Joeri Minnen, Sven Rymenants, Ignace Glorieux, Theun Pieter van Tienoven. Journal: Journal of Official Statistics. Published: 2023-12-10. DOI: https://doi.org/10.2478/jos-2023-0023
Title: Book Review: Silvia Biffignandi and Jelke Bethlehem. Handbook of Web Surveys, 2nd edition. 2021 Wiley, ISBN: 978-1-119-37168-7, 624 pps. Authors: Maria del Mar Rueda Garcia. Journal: Journal of Official Statistics, pp. 591-595. Published: 2023-12-01. DOI: https://doi.org/10.2478/jos-2023-0027
Title: Editorial Collaborators. Journal: Journal of Official Statistics. Published: 2023-12-01. DOI: https://doi.org/10.2478/jos-2023-0028
Title: Index to Volume 39, 2023. Journal: Journal of Official Statistics, pp. 601-603. Published: 2023-12-01. DOI: https://doi.org/10.2478/jos-2023-0029