Letter to Editor: Quality of 2017 Population Census of Pakistan by Age and Sex
A. Wazir, Anne Goujon. Journal of Official Statistics 39: 275–284. doi:10.2478/jos-2023-0013
Abstract This Letter to the Editor is a supplement to the article previously published in the Journal of Official Statistics (Wazir and Goujon 2021). In that article, a reconstruction method based on demographic analysis was applied to assess the quality and validity of the 2017 census data by critically investigating the demographic changes in the intercensal period at the national and provincial levels. However, at the time the article was written, the age and sex structure of the population from the 2017 census had not yet been published, making it hard to fully appreciate the reconstruction of the national and subnational populations. In the meantime, detailed data have become available and offer the possibility to assess the reconstruction’s outcome in more detail. Therefore, the aim of this letter is two-fold: (1) to analyze the quality of the age and sex distribution in the 2017 population census of Pakistan, and (2) to compare the reconstruction by age and sex to the results of the 2017 population census. Our results reveal that the age and sex structure of the population as estimated by the 2017 census suffers from some irregularities. Our analysis by age and sex reinforces the main conclusion of the previous article, namely that the next census in Pakistan should improve in quality through a built-in post-enumeration survey along with post-census demographic analysis.
Database Reconstruction Is Not So Easy and Is Different from Reidentification
Krishnamurty Muralidhar, Josep Domingo-Ferrer. Journal of Official Statistics 39. doi:10.2478/jos-2023-0017
Abstract In recent years, it has been claimed that releasing accurate statistical information on a database is likely to allow its complete reconstruction. Differential privacy has been suggested as the appropriate methodology to prevent these attacks. These claims have recently been taken very seriously by the U.S. Census Bureau and led them to adopt differential privacy for releasing U.S. Census data. This in turn has caused consternation among users of the Census data due to the lack of accuracy of the protected outputs. It has also brought legal action against the U.S. Department of Commerce. In this article, we trace the origins of the claim that releasing information on a database automatically makes it vulnerable to being exposed by reconstruction attacks and we show that this claim is, in fact, incorrect. We also show that reconstruction can be averted by properly using traditional statistical disclosure control (SDC) techniques. We further show that the geographic level at which exact counts are released is even more relevant to protection than the actual SDC method employed. Finally, we caution against confusing reconstruction and reidentification: using the quality of reconstruction as a metric of reidentification results in exaggerated reidentification risk figures.
Predicting Days to Respondent Contact in Cross-Sectional Surveys Using a Bayesian Approach
Stephanie Coffey, Michael R. Elliott. Journal of Official Statistics 39: 325–349. doi:10.2478/jos-2023-0015
Abstract Surveys estimate and monitor a variety of data collection parameters, including response propensity, number of contacts, and data collection costs. These parameters can be used as inputs to a responsive/adaptive design or to monitor the progression of a data collection period against predefined expectations. Recently, Bayesian methods have emerged as a way of combining historical information or external data with data from the in-progress data collection period to improve prediction. We develop a Bayesian method for predicting a measure of case-level progress or productivity: the estimated time lag, in days, between the first contact attempt and the first respondent contact. We compare the quality of predictions from the Bayesian method to predictions generated from more commonly used predictive methods that leverage data from only historical data collection periods or the in-progress round of data collection. Using prediction error and misclassification as short- or long-day lags, we demonstrate that the Bayesian method results in improved predictions close to the day of the first contact attempt, when these predictions may be most informative for interventions or interviewer feedback. This application adds to evidence that combining historical and current information about data collection, in a Bayesian framework, can improve predictions of data collection parameters.
Looking for a New Approach to Measuring the Spatial Concentration of the Human Population
Federico Benassi, Massimo Mucciardi, Giovanni Pirrotta. Journal of Official Statistics 39: 285–324. doi:10.2478/jos-2023-0014
Abstract In this article a new approach for measuring the spatial concentration of the human population is presented and tested. The new procedure is based on the concept of concentration introduced by Gini and, at the same time, on its spatial extension (i.e., taking into account the concepts of spatial autocorrelation and polarization). The proposed indicator, the Spatial Gini Index, is then computed using two different kinds of territorial partitioning methods: MaxMin (MM) and Constant Step (CS) distance. In this framework an ad hoc extension of the Rey and Smith decomposition method is then introduced. We apply this new approach to the Italian and foreign population resident in almost 7,900 statistical units (Italian municipalities) in 2002, 2010 and 2018. All elaborations are based on a new ad hoc library developed and implemented in Python.
Towards Demand-Driven On-The-Fly Statistics
T. Gelsema, Guido van den Heuvel. Journal of Official Statistics 39: 351–379. doi:10.2478/jos-2023-0016
Abstract A prototype of a question answering (QA) system, called Farseer, for the real-time calculation and dissemination of aggregate statistics is introduced. Using techniques from natural language processing (NLP), machine learning (ML), artificial intelligence (AI) and formal semantics, this framework is capable of correctly interpreting a written request for (aggregate) statistics and subsequently generating appropriate results. It is shown that the framework operates independently of the specific statistical domain under consideration, by capturing domain-specific information in a knowledge graph that is input to the framework. However, the prototype still has its limitations: it lacks statistical disclosure control, and searching the knowledge graph remains time-consuming.
Comment to Muralidhar and Domingo-Ferrer (2023) – Legacy Statistical Disclosure Limitation Techniques Were Not An Option for the 2020 US Census of Population And Housing
S. Garfinkel. Journal of Official Statistics 39: 399–410. doi:10.2478/jos-2023-0018
Abstract The article “Database Reconstruction Is Not So Easy and Is Different from Reidentification”, by Krish Muralidhar and Josep Domingo-Ferrer, is an extended attack on the decision of the U.S. Census Bureau to turn its back on legacy statistical disclosure limitation techniques and instead use a bespoke algorithm based on differential privacy to protect the published data products of the Census Bureau’s 2020 Census of Population and Housing (henceforth referred to as the 2020 Census). This response explains why differential privacy was the only realistic choice for protecting sensitive data collected for the 2020 Census. However, differential privacy has a social cost: it requires that practitioners admit that there is inherently a trade-off between the utility of published official statistics and the privacy loss of those whose data are collected under a pledge of confidentiality.
A Rejoinder to Garfinkel (2023) – Legacy Statistical Disclosure Limitation Techniques for Protecting 2020 Decennial US Census: Still a Viable Option
K. Muralidhar, J. Domingo-Ferrer. Journal of Official Statistics 39: 411–420. doi:10.2478/jos-2023-0019
Abstract In our article “Database Reconstruction Is Not So Easy and Is Different from Reidentification”, we show that reconstruction can be averted by properly using traditional statistical disclosure control (SDC) techniques, also sometimes called legacy statistical disclosure limitation (SDL) techniques. Furthermore, we also point out that, even if reconstruction can be performed, it does not imply reidentification. Hence, the risk of reconstruction does not seem to warrant replacing traditional SDC techniques with differential privacy (DP) based protection. In “Legacy Statistical Disclosure Limitation Techniques Were Not an Option for the 2020 US Census of Population and Housing”, Simson Garfinkel insists that the 2020 Census move to DP was justified. In our view, that article contains some misconceptions, which we identify and discuss in some detail below. Consequently, we stand by the arguments given in “Database Reconstruction Is Not So Easy…”.
A Note on the Optimum Allocation of Resources to Follow up Unit Nonrespondents in Probability Surveys
Siu-Ming Tam, A. Holmberg, Summer Wang. Journal of Official Statistics 39: 421–433. doi:10.2478/jos-2023-0020
Abstract Common practice for addressing nonresponse in probability surveys in National Statistical Offices is to follow up every nonrespondent with a view to lifting response rates. As the response rate is an insufficient indicator of data quality, it is argued that one should instead follow up nonrespondents with a view to reducing the mean squared error (MSE) of the estimator of the variable of interest. In this article, we propose a method to allocate the nonresponse follow-up resources in such a way as to minimise the MSE under a quasi-randomisation framework. An example illustrating the method using the 2018/19 Rural Environment and Agricultural Commodities Survey from the Australian Bureau of Statistics is provided.
Design and Sample Size Determination for Experiments on Nonresponse Followup using a Sequential Regression Model
Andrew M. Raim, T. Mathew, Kimberly F. Sellers, Renée Ellis, Mikelyn Meyers. Journal of Official Statistics 39: 173–202. doi:10.2478/jos-2023-0009
Abstract Statistical agencies depend on responses to inquiries made to the public, and occasionally conduct experiments to improve contact procedures. Agencies may wish to assess whether there is a significant change in response rates due to an operational refinement. This work considers the assessment of response rates when up to L attempts are made to contact each subject, and subjects receive one of J possible variations of the operation under experimentation. In particular, the continuation-ratio logit (CRL) model facilitates inference on the probability of success at each step of the sequence, given that failures occurred at previous attempts. The CRL model is investigated as a basis for sample size determination, one of the major decisions faced by an experimenter, to attain a desired power under a Wald test of a general linear hypothesis. An experiment conducted for nonresponse followup in the United States 2020 decennial census provides a motivating illustration.
From Quarterly to Monthly Turnover Figures Using Nowcasting Methods
Daan B. Zult, Sabine Krieg, B. Schouten, P. Ouwehand, Jan van den Brakel. Journal of Official Statistics 39: 253–273. doi:10.2478/jos-2023-0012
Abstract Short-term business statistics at Statistics Netherlands are largely based on Value Added Tax (VAT) administrations. Companies may decide to file their tax return on a monthly, quarterly, or annual basis, and most do so quarterly. So far, these VAT-based short-term business statistics have been published with a quarterly frequency as well. In this article we compare different methods to compile monthly figures, even though a major part of these data are observed quarterly. The methods considered to produce a monthly indicator must address two issues. The first is to combine a high- and a low-frequency series into a single high-frequency series, while both series measure the same phenomenon of the target population. The appropriate method designed for this purpose is usually referred to as “benchmarking”. The second is a missing data problem, because the first and second months of a quarter are published before the corresponding quarterly data are available. A “nowcast” method can be used to estimate these months. The literature on mixed-frequency models provides solutions for both problems, sometimes by dealing with them simultaneously. In this article we combine different benchmarking and nowcasting models and evaluate the combinations. Our evaluation distinguishes between relatively stable periods and periods during and after a crisis, because different approaches might be optimal under these two conditions. We find that during stable periods the so-called Bridge models perform slightly better than the alternatives considered. Until about fifteen months after a crisis, models that rely more heavily on historical patterns, such as the Bridge, MIDAS and structural time series models, are outperformed by more straightforward (S)ARIMA approaches.