{"title":"‘Good data are used data’: Interview with Stefan Schweinfest1","authors":"Pieter Everaers","doi":"10.3233/sji-240050","DOIUrl":"https://doi.org/10.3233/sji-240050","url":null,"abstract":"","PeriodicalId":55877,"journal":{"name":"Statistical Journal of the IAOS","volume":" 60","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141128691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards the 4th population census in Ethiopia: Some insights into the feasibility of the Post-Enumeration Survey
Giancarlo Carbonetti, Paolo Giacomi, Filomena Grassia, Alessandra Nuccitelli
Statistical Journal of the IAOS, 2024-05-08. DOI: 10.3233/sji-240024
National registry systems are evolving worldwide and, in some cases, are replacing censuses. In countries that lack well-established population registers, however, the population and housing census remains the primary source of detailed data on the number of people, their spatial distribution, age and gender structure, living conditions, and other key socio-economic characteristics. The quality of the census findings is crucial for several reasons, including building public trust in the national statistical system. In many developing countries, conducting a Post-Enumeration Survey appears to be the only feasible way to evaluate the census results, because the lack or incompleteness of reliable demographic data from alternative sources precludes the use of other methods. This paper discusses some aspects of the feasibility of a Post-Enumeration Survey in Ethiopia. In particular, it reports on the main critical issues that emerged from the pilot surveys carried out in the framework of a cooperation project – funded by the Italian Agency for Development Cooperation – aimed at providing methodological support and technical assistance for the preparation of the 4th Ethiopian Population and Housing Census.
Food price inflation nowcasting and monitoring
Luís Silva e Silva, Christian A. Mongeau Ospina, Carola Fabi
Statistical Journal of the IAOS, 2024-05-08. DOI: 10.3233/sji-230083
Rising food prices may rapidly push vulnerable populations into food insecurity, especially in developing economies and in low-income countries, where a substantial share of the financial resources available to the poorest households is spent on food. To capture soaring food prices and help design mitigating measures, we developed two complementary products: a nowcasting model that estimates official food consumer price inflation up to the current month, and a daily food price monitor that checks whether the growth rate of a few basic food commodities exceeds a statistical threshold. Both products were designed on the premise that the rapid acquisition of data and the automated extraction of insights are indispensable tools for policymakers, particularly in times of crisis. Our framework is characterized by three key aspects. First, we leverage two non-traditional data sources to emphasize the importance of real-time information: a crowdsourced repository of daily food prices and textual insights obtained from newspaper articles. Second, the framework offers a global perspective, encompassing 225 countries and territories, which enables the monitoring of food price dynamics on a global scale. Third, results are made accessible daily via an intuitive and user-friendly interactive dashboard.
Using machine learning algorithms to identify farms on the 2022 Census of Agriculture
Gavin Corral, Luca Sartore, Katherine Vande Pol, Denise A. Abreu, Linda J. Young
Statistical Journal of the IAOS, 2024-05-08. DOI: 10.3233/sji-230089
As is the case for many National Statistics Institutes, the United States Department of Agriculture’s (USDA’s) National Agricultural Statistics Service (NASS) has observed dwindling survey response rates, and requests for more information at finer temporal and spatial scales have increased response burden. Non-survey data are becoming increasingly abundant and accessible. Consequently, NASS is exploring the potential to complete some or all of a survey record using non-survey data, which would reduce respondent burden and potentially lead to increased response rates. This paper focuses on a large set of records associated with potential farms, which are operations with undetermined farm status (farm/non-farm) and are referred to here as operations with unknown status (OUS). Although they usually have some agriculture, most OUS records are eventually classified as non-farms. Those OUS that are classified as farms tend to have higher proportions of producers from under-represented groups compared to other records. Determining the probability that an OUS record is a farm is an important step in the imputation process. The OUS records that responded to the 2017 U.S. Census of Agriculture were used to develop models to predict farm status using multiple data sources. The evaluated models include bootstrap random forest (RF), logistic regression (LR), neural network (NN), and support vector machine (SVM). Although the SVM had the best outcomes for three of the five metrics, its sensitivity for identifying farms was the lowest (13.8%). The NN model had a sensitivity of 80.5%, substantially higher than the other models, while its specificity of 45.3% was the lowest of all models. Because sensitivity was the primary metric of interest and the NN performed reasonably well on the other metrics, the NN was selected as the preferred model.
FAOSTAT Food Value Chain Domain implementation: Input Output modelling and analytical applications
Silvia Cerilli, Michele Vollaro, Veronica Boero, Olivier Lavagne d’Ortigue, Jing Yi
Statistical Journal of the IAOS, 2024-05-01. DOI: 10.3233/sji-230079
The recent increase in attention to the economic and policy analysis of food systems from international fora, public institutions and academia calls for information and data capable of describing the interrelations across economic sectors and within value chains. The international policy agenda is pushing for a more effective application of measures at country and regional level in line with the recommendations of the 2030 Agenda and its Sustainable Development Goals, which require more systematic and integrated data on the economic, social and environmental impacts of policies. The Food Value Chain Domain recently published in FAOSTAT responds to this call. Its data shed light on how final domestic food expenditures are distributed across industries (Agriculture, Food Processing, Wholesale, Retail, Accommodations and Food Services) and primary factors (e.g. Labour, Gross Operating Surplus) along the respective food value chain. The FAOSTAT Domain therefore offers robust and granular information on both the farm and the post-farm-gate components of the Food Value Chain. The underlying Global Food Dollar methodology, which FAO is helping to scale up globally, is based on a Leontief decomposition of Input-Output tables. Moreover, whenever Input-Output tables are not available, they can now be imputed from Supply-Use tables through a conversion methodology developed by FAO in compliance with European (EUROSTAT), United Nations (UNSD) and international statistical standards such as the System of National Accounts. This makes it possible to extend the analysis to several African, Asian and Latin American countries that regularly produce only Supply and Use Tables rather than industry-by-industry Input-Output Tables, thereby significantly expanding the potential time and data coverage of the methodology. The aim of this paper is to describe the conceptual framework for converting Supply-Use Tables into Input-Output Tables within the Global Food Dollar methodology, and the potential scope for implementing these methods. Preliminary analytical findings from the applied methodologies are presented as well. Because the new methods and data presented in this paper are based on international statistical standards such as the System of National Accounts, and are therefore comparable across countries, and because they come with larger data availability, they have the potential to effectively support food policies at international, regional and national level, as well as to contribute to decision-making in line with the 2030 Agenda.
Open Science and the impact of Open Access, Open Data, and FAIR publishing principles on data-driven academic research: Towards ever more transparent, accessible, and reproducible academic output?
Gaby Umbach
Statistical Journal of the IAOS, 2024-02-21. DOI: 10.3233/sji-240021
Contemporary evidence-informed policy-making (EIPM) and societies require openly accessible, high-quality knowledge as input into transparent and accountable decision-making and informed societal action. Open Science supports this requirement. As both enablers and logical consequences of the Open Science paradigm, the ideas of Open Access, Open Data, and FAIR publishing principles revolutionise how academic research needs to be conceptualised, conducted, disseminated, published, and used. This ‘academic openness quartet’ is especially relevant for the ways in which research data are created, annotated, curated, managed, shared, reproduced, (re-)used, and further developed in academia. Greater accessibility of scientific output and scholarly data also aims at increasing the transparency and reproducibility of research results and the quality of research itself. From the perspective of the ‘academic openness quartet’, these principles also function as remedies for academic malaises such as the lack of replicability of results or secrecy around research data. Against this backdrop, the present article offers a conceptual discussion of the four academic openness paradigms, their meanings and interrelations, as well as the potential benefits and challenges arising from their application in data-driven research.
Statistics for the public good: What it means and why it matters
Sofi Nickson
Statistical Journal of the IAOS, 2024-02-15. DOI: 10.3233/sji-230116
Official statistics are widely considered to be public goods; this paper, however, explores a higher aspiration: that they also serve the public good. To achieve this goal, and to provide value to societies worldwide, there is a need for discussion around what it truly means for statistics to serve the public good. This paper shares initial perspectives on the matter from the United Kingdom Office for Statistics Regulation (OSR), before demonstrating how serving the public good fits with customer-centric perspectives on value and calling for interested parties to join this discussion so that we may work together in service of statistics for a global good.
Address matching using machine learning methods: An application to register-based census
Zahra Rezaei Ghahroodi, Hassan Ranji, Alireza Rezaee
Statistical Journal of the IAOS, 2024-02-13. DOI: 10.3233/sji-230099
Today, most activities of statistical offices need to be adapted to the modernization policies of the national statistical system, and the application of machine learning techniques has therefore become essential to the main activities of statistical centers. These include important tasks such as coding business activities, address matching, and predicting response propensities, among many others. One common application of machine learning methods in official statistics is matching a statistical address to a postal address, in order to link a register-based census with traditional censuses and thereby provide time-series census information. Since there is no unique identifier with which to map records across the different databases directly, text-based approaches can be applied. In this paper, a novel application of machine learning is investigated for integrating governmental records with census data using text-based learning. In addition, three new machine learning classification methods are proposed. A simulation study was performed to evaluate the robustness of the methods with respect to the degree of duplication and the purity of the texts. Due to the limitations of the R programming environment with big data sets, all programming was implemented in SAS (Statistical Analysis System) software.
The Kantorovich-Wasserstein distance for spatial statistics: The Spatial-KWD library
Fabio Ricciato, Stefano Gualandi
Statistical Journal of the IAOS, 2024-02-02. DOI: 10.3233/sji-230121
In this paper we present Spatial-KWD, a free, open-source tool for the efficient computation of the Kantorovich-Wasserstein Distance (KWD), also known as the Earth Mover’s Distance, between pairs of binned spatial distributions (histograms) of a non-negative variable. KWD can be used in spatial statistics as a measure of (dis)similarity between spatial distributions of physical or social quantities. KWD represents the minimum total cost of moving the “mass” from one distribution to the other when the “cost” of moving a unit of mass is proportional to the Euclidean distance between the source and destination bins. As such, KWD captures the degree of “horizontal displacement” between the two input distributions. Despite its mathematical properties and intuitive physical interpretation, KWD has so far found little application in spatial statistics, mainly because the high computational complexity of previous implementations prevented its use on large problem instances of practical interest. Building upon recent advances in Optimal Transport theory, the Spatial-KWD library makes it possible to compute KWD values for very large instances with hundreds of thousands or even millions of bins. Furthermore, the tool offers a rich set of options and features to enable the flexible use of KWD in diverse practical applications.
A register-based statistical system in New Zealand: Progress and opportunities
Celeste Cutting, Michael Alspach, Sarah Cowell, Michael Judd, Simon McBeth, Mathew Page
Statistical Journal of the IAOS, 2024-02-01. DOI: 10.3233/sji-230106
This paper provides an overview of progress and opportunities in Stats NZ’s journey towards a register-based statistical system. It gives a status update on the components of the system at Stats NZ – the Statistical Business Register (SBR), the Statistical Person Register (SPR), and the Statistical Location Register (SLR). The drivers for change and the changes to the authorising environment are described, including the prioritisation of a register-based statistical system through Stats NZ’s strategic priorities and the updates to the legislative context through the Data and Statistics Act 2022. The current state of each of the base registers is briefly described, and detail is provided on the evolution of the SPR and the concept development of a property-centric location register.