{"title":"‘Good data are used data’: Interview with Stefan Schweinfest1","authors":"Pieter Everaers","doi":"10.3233/sji-240050","DOIUrl":"https://doi.org/10.3233/sji-240050","url":null,"abstract":"","PeriodicalId":55877,"journal":{"name":"Statistical Journal of the IAOS","volume":" 60","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141128691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards the 4th population census in Ethiopia: Some insights into the feasibility of the Post-Enumeration Survey
Giancarlo Carbonetti, Paolo Giacomi, Filomena Grassia, Alessandra Nuccitelli
Statistical Journal of the IAOS, 2024-05-08. DOI: 10.3233/sji-240024
National registry systems are evolving worldwide and, in some cases, are replacing censuses. In countries that lack well-established population registers, however, the population and housing census remains the primary source of detailed data on the number of people, their spatial distribution, age and gender structure, living conditions, and other key socio-economic characteristics. The quality of the census findings is crucial for several reasons, including building public trust in the national statistical system. In many developing countries, conducting a Post-Enumeration Survey appears to be the only feasible way to evaluate the census results, because the lack or incompleteness of reliable demographic data from alternative sources precludes the use of other methods. This paper discusses some aspects of the feasibility of a Post-Enumeration Survey in Ethiopia. In particular, it reports on the main critical issues that emerged from the pilot surveys carried out in the framework of a cooperation project – funded by the Italian Agency for Development Cooperation – aimed at providing methodological support and technical assistance for the preparation of the 4th Ethiopian Population and Housing Census.
Food price inflation nowcasting and monitoring
Luís Silva e Silva, Christian A. Mongeau Ospina, Carola Fabi
Statistical Journal of the IAOS, 2024-05-08. DOI: 10.3233/sji-230083
Rising food prices may rapidly push vulnerable populations into food insecurity, especially in developing economies and in low-income countries, where a substantial share of the financial resources available to the poorest households is spent on food. To capture soaring food prices and help design mitigating measures, we developed two complementary products: a nowcasting model that estimates official food consumer price inflation up to the current month, and a daily food price monitor that checks whether the growth rate of a few basic food commodities exceeds a statistical threshold. Both products were designed on the premise that the rapid acquisition of data and the automated extraction of insights are indispensable tools for policymakers, particularly in times of crisis. Our framework is characterized by three key aspects. First, we leverage two non-traditional data sources to emphasize the importance of real-time information: a crowdsourced repository of daily food prices and textual insights obtained from newspaper articles. Second, the framework offers a global perspective, encompassing 225 countries and territories, which enables the monitoring of food price dynamics on a global scale. Third, results are made accessible daily via an intuitive and user-friendly interactive dashboard.
Using machine learning algorithms to identify farms on the 2022 Census of Agriculture
Gavin Corral, Luca Sartore, Katherine Vande Pol, Denise A. Abreu, Linda J. Young
Statistical Journal of the IAOS, 2024-05-08. DOI: 10.3233/sji-230089
As is the case for many National Statistics Institutes, the United States Department of Agriculture’s (USDA’s) National Agricultural Statistics Service (NASS) has observed dwindling survey response rates, and requests for more information at finer temporal and spatial scales have increased response burden. Non-survey data are becoming increasingly abundant and accessible. Consequently, NASS is exploring the potential to complete some or all of a survey record using non-survey data, which would reduce respondent burden and potentially lead to increased response rates. This paper focuses on a large set of records associated with potential farms, which are operations with undetermined farm status (farm/non-farm) and are referred to here as operations with unknown status (OUS). Although they usually have some agriculture, most OUS records are eventually classified as non-farms. Those OUS that are classified as farms tend to have higher proportions of producers from under-represented groups compared to other records. Determining the probability that an OUS record is a farm is an important step in the imputation process. The OUS records that responded to the 2017 U.S. Census of Agriculture were used to develop models to predict farm status using multiple data sources. The evaluated models include bootstrap random forest (RF), logistic regression (LR), neural network (NN), and support vector machine (SVM). Although the SVM had the best outcomes for three of the five metrics, its sensitivity for identifying farms was the lowest (13.8%). The NN model had a sensitivity of 80.5%, substantially higher than the other models, while its specificity of 45.3% was the lowest of all models. Because sensitivity was the primary metric of interest and the NN performed reasonably well on the other metrics, the NN was selected as the preferred model.
FAOSTAT Food Value Chain Domain implementation: Input Output modelling and analytical applications
Silvia Cerilli, Michele Vollaro, Veronica Boero, Olivier Lavagne d’Ortigue, Jing Yi
Statistical Journal of the IAOS, 2024-05-01. DOI: 10.3233/sji-230079
The recent increase in attention to the economic and policy analysis of food systems from international fora, public institutions and academia calls for information and data capable of describing the interrelations across economic sectors and within value chains. The international policy agenda is pushing for a more effective application of measures at country and regional level in line with the recommendations of the 2030 Agenda and its Sustainable Development Goals, which require more systematic and integrated data on the economic, social and environmental impacts of policies. The Food Value Chain Domain recently published in FAOSTAT responds to this call. Its data shed light on how final domestic food expenditures are distributed across industries (Agriculture, Food Processing, Wholesale, Retail, Accommodations and Food Services) and primary factors (e.g. Labour, Gross Operating Surplus) along the respective food value chain. The FAOSTAT Domain therefore offers robust and granular information on both the farm and the post-farm-gate components of the Food Value Chain. The underlying Global Food Dollar methodology, which FAO is helping to scale up globally, is based on a Leontief decomposition of Input-Output tables. Moreover, whenever Input-Output tables are not available, they can now be imputed from Supply-Use tables through a conversion methodology developed by FAO in compliance with European (EUROSTAT), United Nations (UNSD) and international statistical standards such as the System of National Accounts. This makes it possible to extend the analysis to several African, Asian and Latin American countries that regularly produce only Supply and Use Tables rather than industry-by-industry Input-Output Tables, thereby significantly expanding the potential time and data coverage of the methodology. The aim of this paper is to describe the conceptual framework for converting Supply-Use Tables into Input-Output Tables within the Global Food Dollar methodology, and the potential scope for implementing these methods. Preliminary analytical findings from the applied methodologies are presented as well. Because the new methods and data presented in this paper are based on international statistical standards such as the System of National Accounts, and are therefore comparable across countries, and because they come with larger data availability, they have the potential to effectively support food policies at international, regional and national level, as well as to contribute to decision-making in line with the 2030 Agenda.
Open Science and the impact of Open Access, Open Data, and FAIR publishing principles on data-driven academic research: Towards ever more transparent, accessible, and reproducible academic output?
Gaby Umbach
Statistical Journal of the IAOS, 2024-02-21. DOI: 10.3233/sji-240021
Contemporary evidence-informed policy-making (EIPM) and societies require openly accessible, high-quality knowledge as input into transparent and accountable decision-making and informed societal action. Open Science supports this requirement. As both enablers and logical consequences of the Open Science paradigm, the ideas of Open Access, Open Data, and FAIR publishing principles revolutionise how academic research needs to be conceptualised, conducted, disseminated, published, and used. This ‘academic openness quartet’ is especially relevant for the ways in which research data are created, annotated, curated, managed, shared, reproduced, (re-)used, and further developed in academia. Greater accessibility of scientific output and scholarly data also aims at increasing the transparency and reproducibility of research results and the quality of research itself. From the perspective of the ‘academic openness quartet’, these principles also function as remedies for academic malaises such as the lack of replicability of results or secrecy around research data. Against this backdrop, the present article offers a conceptual discussion of the four academic openness paradigms, their meanings and interrelations, as well as the potential benefits and challenges arising from their application in data-driven research.
Statistics for the public good: What it means and why it matters
Sofi Nickson
Statistical Journal of the IAOS, 2024-02-15. DOI: 10.3233/sji-230116
Official statistics are widely considered to be public goods; this paper, however, explores a higher aspiration: that they also serve the public good. To achieve this goal, and to provide value to societies worldwide, there is a need for discussion around what it truly means for statistics to serve the public good. This paper shares initial perspectives on the matter from the United Kingdom Office for Statistics Regulation (OSR), before demonstrating how serving the public good fits with customer-centric perspectives on value and calling for interested parties to join this discussion so that we may work together in service of statistics for a global good.
Address matching using machine learning methods: An application to register-based census
Zahra Rezaei Ghahroodi, Hassan Ranji, Alireza Rezaee
Statistical Journal of the IAOS, 2024-02-13. DOI: 10.3233/sji-230099
Today, most activities of statistical offices need to be adapted to the modernization policies of the national statistical system, and the application of machine learning techniques has therefore become essential to the main activities of statistical centers. These include important tasks such as coding business activities, address matching, and predicting response propensities, among many others. One common application of machine learning methods in official statistics is matching a statistical address to a postal address, in order to link a register-based census with traditional censuses and thereby provide time-series census information. Since there is no unique identifier with which to map records across the different databases directly, text-based approaches can be applied. In this paper, a novel application of machine learning is investigated for integrating governmental records with census data using text-based learning. In addition, three new machine learning classification methods are proposed. A simulation study was performed to evaluate the robustness of the methods with respect to the degree of duplication and the purity of the texts. Due to the limitations of the R programming environment with big data sets, all programming was implemented in SAS (Statistical Analysis System) software.
The Kantorovich-Wasserstein distance for spatial statistics: The Spatial-KWD library
Fabio Ricciato, Stefano Gualandi
Statistical Journal of the IAOS, 2024-02-02. DOI: 10.3233/sji-230121
In this paper we present Spatial-KWD, a free, open-source tool for the efficient computation of the Kantorovich-Wasserstein Distance (KWD), also known as the Earth Mover’s Distance, between pairs of binned spatial distributions (histograms) of a non-negative variable. KWD can be used in spatial statistics as a measure of (dis)similarity between spatial distributions of physical or social quantities. KWD represents the minimum total cost of moving the “mass” from one distribution to the other when the “cost” of moving a unit of mass is proportional to the Euclidean distance between the source and destination bins. As such, KWD captures the degree of “horizontal displacement” between the two input distributions. Despite its mathematical properties and intuitive physical interpretation, KWD has so far found little application in spatial statistics, mainly because the high computational complexity of previous implementations prevented its use on large problem instances of practical interest. Building upon recent advances in Optimal Transport theory, the Spatial-KWD library makes it possible to compute KWD values for very large instances with hundreds of thousands or even millions of bins. Furthermore, the tool offers a rich set of options and features to enable the flexible use of KWD in diverse practical applications.
A register-based statistical system in New Zealand: Progress and opportunities
Celeste Cutting, Michael Alspach, Sarah Cowell, Michael Judd, Simon McBeth, Mathew Page
Statistical Journal of the IAOS, 2024-02-01. DOI: 10.3233/sji-230106
This paper provides an overview of progress and opportunities in Stats NZ’s journey towards a register-based statistical system. It gives a status update on the components of the system at Stats NZ – the Statistical Business Register (SBR), the Statistical Person Register (SPR), and the Statistical Location Register (SLR). The drivers for change and the changes to the authorising environment are described, including the prioritisation of a register-based statistical system through Stats NZ’s strategic priorities and the updates to the legislative context through the Data and Statistics Act 2022. The current state of each of the base registers is briefly described, and detail is provided on the evolution of the SPR and the concept development of a property-centric location register.