
Journal of Official Statistics: Latest Articles

Letter to Editor Quality of 2017 Population Census of Pakistan by Age and Sex
IF 1.1 · Zone 4 (Mathematics) · Q4 SOCIAL SCIENCES, MATHEMATICAL METHODS · Pub Date: 2023-09-01 · DOI: 10.2478/jos-2023-0013
A. Wazir, Anne Goujon
Abstract This Letter to Editor is a supplement to the previously published article in the Journal of Official Statistics (Wazir and Goujon 2021). In 2021, a reconstruction method using demographic analysis was applied to assess the quality and validity of the 2017 census data by critically investigating the demographic changes in the intercensal period at national and provincial levels. However, at the time the article was written, the age and sex structure of the population from the 2017 census had not yet been published, making it hard to fully appreciate the reconstruction of the national and subnational populations. In the meantime, detailed data have become available and offer the possibility of assessing the reconstruction's outcome in more detail. This letter therefore has two aims: (1) to analyze the quality of the age and sex distribution in the 2017 Population Census of Pakistan, and (2) to compare the reconstruction by age and sex to the results of the 2017 population census. Our results reveal that the age and sex structure of the population as estimated by the 2017 census suffers from some irregularities. Our analysis by age and sex reinforces the main conclusion of the previous article: the next census in Pakistan should improve in quality, with an inbuilt post-enumeration survey along with post-census demographic analysis.
Journal of Official Statistics, vol. 39, pp. 275-284 · Citations: 0
Database Reconstruction Is Not So Easy and Is Different from Reidentification
Zone 4 (Mathematics) · Q4 SOCIAL SCIENCES, MATHEMATICAL METHODS · Pub Date: 2023-09-01 · DOI: 10.2478/jos-2023-0017
Krishnamurty Muralidhar, Josep Domingo-Ferrer
Abstract In recent years, it has been claimed that releasing accurate statistical information on a database is likely to allow its complete reconstruction. Differential privacy has been suggested as the appropriate methodology to prevent these attacks. These claims have recently been taken very seriously by the U.S. Census Bureau and led them to adopt differential privacy for releasing U.S. Census data. This in turn has caused consternation among users of the Census data due to the lack of accuracy of the protected outputs. It has also brought legal action against the U.S. Department of Commerce. In this article, we trace the origins of the claim that releasing information on a database automatically makes it vulnerable to being exposed by reconstruction attacks and we show that this claim is, in fact, incorrect. We also show that reconstruction can be averted by properly using traditional statistical disclosure control (SDC) techniques. We further show that the geographic level at which exact counts are released is even more relevant to protection than the actual SDC method employed. Finally, we caution against confusing reconstruction and reidentification: using the quality of reconstruction as a metric of reidentification results in exaggerated reidentification risk figures.
Citations: 1
Predicting Days to Respondent Contact in Cross-Sectional Surveys Using a Bayesian Approach
IF 1.1 · Zone 4 (Mathematics) · Q4 SOCIAL SCIENCES, MATHEMATICAL METHODS · Pub Date: 2023-09-01 · DOI: 10.2478/jos-2023-0015
Stephanie Coffey, Michael R. Elliott
Abstract Surveys estimate and monitor a variety of data collection parameters, including response propensity, number of contacts, and data collection costs. These parameters can be used as inputs to a responsive/adaptive design or to monitor the progression of a data collection period against predefined expectations. Recently, Bayesian methods have emerged as a method for combining historical information or external data with data from the in-progress data collection period to improve prediction. We develop a Bayesian method for predicting a measure of case-level progress or productivity, the estimated time lag, in days, between first contact attempt and first respondent contact. We compare the quality of predictions from the Bayesian method to predictions generated from more commonly-used predictive methods that leverage data from only historical data collection periods or the in-progress round of data collection. Using prediction error and misclassification as short- or long- day lags, we demonstrate that the Bayesian method results in improved predictions close to the day of the first contact attempt, when these predictions may be most informative for interventions or interviewer feedback. This application adds to evidence that combining historical and current information about data collection, in a Bayesian framework, can improve predictions of data collection parameters.
Journal of Official Statistics, vol. 39, pp. 325-349 · Citations: 0
Looking for a New Approach to Measuring the Spatial Concentration of the Human Population
IF 1.1 · Zone 4 (Mathematics) · Q4 SOCIAL SCIENCES, MATHEMATICAL METHODS · Pub Date: 2023-09-01 · DOI: 10.2478/jos-2023-0014
Federico Benassi, Massimo Mucciardi, Giovanni Pirrotta
Abstract In the article a new approach for measuring the spatial concentration of human population is presented and tested. The new procedure is based on the concept of concentration introduced by Gini and, at the same time, on its spatial extension (i.e., taking into account the concept of spatial autocorrelation, polarization). The proposed indicator, the Spatial Gini Index, is then computed by using two different kind of territorial partitioning methods: MaxMin (MM) and the Constant Step (CS) distance. In this framework an ad hoc extension of the Rey and Smith decomposition method is then introduced. We apply this new approach to the Italian and foreign population resident in almost 7,900 statistical units (Italian municipalities) in 2002, 2010 and 2018. All elaborations are based on a new ad hoc library developed and implemented in Python.
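As a point of reference for the concentration concept mentioned in this abstract, the following is a minimal sketch of the classical (aspatial) Gini index computed over population counts per territorial unit. It is not the authors' Spatial Gini Index and does not reproduce their MaxMin or Constant Step partitioning; the function name `gini` and the example counts are assumptions made purely for illustration.

```python
def gini(values):
    """Classical Gini concentration index for non-negative counts.

    Illustrative only: this is the aspatial index, not the
    Spatial Gini Index proposed in the article.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum_i(i * x_(i)) / (n * total) - (n + 1) / n,
    # where x_(i) is the i-th smallest value (1-based rank).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical population counts for four municipalities.
evenly_spread = gini([10, 10, 10, 10])   # no concentration
fully_concentrated = gini([0, 0, 0, 40])  # everyone in one unit
```

With this normalisation the index is 0 for a perfectly even distribution and (n - 1) / n when the whole population sits in a single unit.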
Journal of Official Statistics, vol. 39, pp. 285-324 · Citations: 0
Towards Demand-Driven On-The-Fly Statistics
IF 1.1 · Zone 4 (Mathematics) · Q4 SOCIAL SCIENCES, MATHEMATICAL METHODS · Pub Date: 2023-09-01 · DOI: 10.2478/jos-2023-0016
T. Gelsema, Guido van den Heuvel
Abstract A prototype of a question answering (QA) system, called Farseer, for the real-time calculation and dissemination of aggregate statistics is introduced. Using techniques from natural language processing (NLP), machine learning (ML), artificial intelligence (AI) and formal semantics, this framework is capable of correctly interpreting a written request for (aggregate) statistics and subsequently generating appropriate results. It is shown that the framework operates in a way that is independent of a specific statistical domain under consideration, by capturing domain specific information in a knowledge graph that is input to the framework. However, it is also shown that the prototype still has its limitations, lacking statistical disclosure control. Also, searching the knowledge graph is still time-consuming.
Journal of Official Statistics, vol. 39, pp. 351-379 · Citations: 0
Comment to Muralidhar and Domingo-Ferrer (2023) – Legacy Statistical Disclosure Limitation Techniques Were Not An Option for the 2020 US Census of Population And Housing
IF 1.1 · Zone 4 (Mathematics) · Q4 SOCIAL SCIENCES, MATHEMATICAL METHODS · Pub Date: 2023-09-01 · DOI: 10.2478/jos-2023-0018
S. Garfinkel
Abstract The Article Database Reconstruction is Not So Easy and Is Different from Reidentification, by Krish Muralidhar and Josep Domingo-Ferrer, is an extended attack on the decision of the U.S. Census Bureau to turn its back on legacy statistical disclosure limitation techniques and instead use a bespoke algorithm based on differential privacy to protect the published data products of the Census Bureau’s 2020 Census of Population and Housing (henceforth referred to as the 2020 Census). This response explains why differential privacy was the only realistic choice for protecting sensitive data collected for the 2020 Census. However, differential privacy has a social cost: it requires that practitioners admit that there is inherently a trade-off between the utility of published official statistics and the privacy loss of those whose data are collected under a pledge of confidentiality.
Journal of Official Statistics, vol. 39, pp. 399-410 · Citations: 2
A Rejoinder to Garfinkel (2023) – Legacy Statistical Disclosure Limitation Techniques for Protecting 2020 Decennial US Census: Still a Viable Option
IF 1.1 · Zone 4 (Mathematics) · Q4 SOCIAL SCIENCES, MATHEMATICAL METHODS · Pub Date: 2023-09-01 · DOI: 10.2478/jos-2023-0019
K. Muralidhar, J. Domingo-Ferrer
Abstract In our article “Database Reconstruction Is Not So Easy and Is Different from Reidentification”, we show that reconstruction can be averted by properly using traditional statistical disclosure control (SDC) techniques, also sometimes called legacy statistical disclosure limitation (SDL) techniques. Furthermore, we also point out that, even if reconstruction can be performed, it does not imply reidentification. Hence, the risk of reconstruction does not seem to warrant replacing traditional SDC techniques with differential privacy (DP) based protection. In “Legacy Statistical Disclosure Limitation Techniques Were Not an Option for the 2020 US Census of Population and Housing”, by Simson Garfinkel, the author insists that the 2020 Census move to DP was justified. In our view, this latter article contains some misconceptions that we identify and discuss in some detail below. Consequently, we stand by the arguments given in “Database Reconstruction Is Not So Easy...”.
Journal of Official Statistics, vol. 39, pp. 411-420 · Citations: 0
A Note on the Optimum Allocation of Resources to Follow up Unit Nonrespondents in Probability Surveys
IF 1.1 · Zone 4 (Mathematics) · Q4 SOCIAL SCIENCES, MATHEMATICAL METHODS · Pub Date: 2023-06-07 · DOI: 10.2478/jos-2023-0020
Siu-Ming Tam, A. Holmberg, Summer Wang
Abstract Common practice to address nonresponse in probability surveys in National Statistical Offices is to follow up every nonrespondent with a view to lifting response rates. As response rate is an insufficient indicator of data quality, it is argued that one should follow up nonrespondents with a view to reducing the mean squared error (MSE) of the estimator of the variable of interest. In this article, we propose a method to allocate the nonresponse follow-up resources in such a way as to minimise the MSE under a quasi-randomisation framework. An example to illustrate the method using the 2018/19 Rural Environment and Agricultural Commodities Survey from the Australian Bureau of Statistics is provided.
Journal of Official Statistics, vol. 39, pp. 421-433 · Citations: 0
Design and Sample Size Determination for Experiments on Nonresponse Followup using a Sequential Regression Model
IF 1.1 · Zone 4 (Mathematics) · Q4 SOCIAL SCIENCES, MATHEMATICAL METHODS · Pub Date: 2023-06-01 · DOI: 10.2478/jos-2023-0009
Andrew M. Raim, T. Mathew, Kimberly F. Sellers, Renée Ellis, Mikelyn Meyers
Abstract Statistical agencies depend on responses to inquiries made to the public, and occasionally conduct experiments to improve contact procedures. Agencies may wish to assess whether there is significant change in response rates due to an operational refinement. This work considers the assessment of response rates when up to L attempts are made to contact each subject, and subjects receive one of J possible variations of the operation under experimentation. In particular, the continuation-ratio logit (CRL) model facilitates inference on the probability of success at each step of the sequence, given that failures occurred at previous attempts. The CRL model is investigated as a basis for sample size determination – one of the major decisions faced by an experimenter – to attain a desired power under a Wald test of a general linear hypothesis. An experiment that was conducted for nonresponse followup in the United States 2020 decennial census provides a motivating illustration.
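The conditional structure described in this abstract can be illustrated with a small sketch. It assumes one common form of the continuation-ratio logit, in which the probability of success at attempt l, given failure at all earlier attempts, is the inverse logit of a linear predictor. The function names and the predictor values below are hypothetical and are not taken from the article.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def crl_probabilities(etas):
    """Turn per-attempt linear predictors into response probabilities.

    etas[l] is a hypothetical linear predictor for attempt l + 1.
    Returns (conditional success probabilities,
             unconditional probabilities of first success at each attempt,
             probability of no success after all attempts).
    """
    conditional = [expit(eta) for eta in etas]
    first_success = []
    still_failing = 1.0  # probability that every earlier attempt failed
    for p in conditional:
        first_success.append(still_failing * p)
        still_failing *= 1.0 - p
    return conditional, first_success, still_failing


# Hypothetical predictors for L = 3 contact attempts under one treatment.
cond, first, never = crl_probabilities([0.0, -0.5, -1.0])
overall_response_rate = sum(first)
```

The unconditional probabilities and the final nonresponse probability sum to one, so the overall response rate after L attempts is one minus the product of the conditional failure probabilities.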
Journal of Official Statistics, vol. 39, pp. 173-202 · Citations: 0
From Quarterly to Monthly Turnover Figures Using Nowcasting Methods
IF 1.1 · Zone 4 (Mathematics) · Q4 SOCIAL SCIENCES, MATHEMATICAL METHODS · Pub Date: 2023-06-01 · DOI: 10.2478/jos-2023-0012
Daan B. Zult, Sabine Krieg, B. Schouten, P. Ouwehand, Jan van den Brakel
Abstract Short-term business statistics at Statistics Netherlands are largely based on Value Added Tax (VAT) administrations. Companies may decide to file their tax return on a monthly, quarterly, or annual basis. Most companies file their tax return quarterly. So far, these VAT based short-term business statistics are published with a quarterly frequency as well. In this article we compare different methods to compile monthly figures, even though a major part of these data is observed quarterly. The methods considered to produce a monthly indicator must address two issues. The first issue is to combine a high- and low-frequency series into a single high-frequency series, while both series measure the same phenomenon of the target population. The appropriate method that is designed for this purpose is usually referred to as “benchmarking”. The second issue is a missing data problem, because the first and second month of a quarter are published before the corresponding quarterly data is available. A “nowcast” method can be used to estimate these months. The literature on mixed frequency models provides solutions for both problems, sometimes by dealing with them simultaneously. In this article we combine different benchmarking and nowcasting models and evaluate combinations. Our evaluation distinguishes between relatively stable periods and periods during and after a crisis because different approaches might be optimal under these two conditions. We find that during stable periods the so-called Bridge models perform slightly better than the alternatives considered. Until about fifteen months after a crisis, the models that rely heavier on historic patterns such as the Bridge, MIDAS and structural time series models are outperformed by more straightforward (S)ARIMA approaches.
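To make the benchmarking idea in this abstract concrete, here is a minimal sketch of pro-rata benchmarking, the simplest textbook variant: a monthly indicator is rescaled within each quarter so that its three months add up to the observed quarterly total. This is an assumption chosen for illustration; it is not one of the Bridge, MIDAS, structural time series or (S)ARIMA models evaluated in the article, and the function name and figures are hypothetical.

```python
def prorata_benchmark(monthly, quarterly):
    """Rescale a monthly indicator so each quarter matches its benchmark.

    monthly   : list of monthly indicator values, three per quarter
    quarterly : list of observed quarterly totals (the benchmarks)
    """
    if len(monthly) != 3 * len(quarterly):
        raise ValueError("expected exactly three months per quarter")
    benchmarked = []
    for q, total in enumerate(quarterly):
        block = monthly[3 * q: 3 * q + 3]
        block_sum = sum(block)
        if block_sum == 0:
            raise ValueError("monthly indicator sums to zero in a quarter")
        # Keep the within-quarter pattern, match the quarterly level.
        benchmarked.extend(m * total / block_sum for m in block)
    return benchmarked


# Hypothetical monthly indicator and quarterly turnover totals.
adjusted = prorata_benchmark([1, 2, 3, 2, 2, 2], [12, 30])
```

Pro-rata scaling preserves the month-to-month pattern inside a quarter but can introduce steps at quarter boundaries, which is one reason more elaborate benchmarking methods exist.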
Volume 39, pages 253-273.
Citations: 0
Journal
Journal of Official Statistics