Statistical Journal of the IAOS: Latest Articles

Building trust and facilitating use of data
Q3 Decision Sciences Pub Date : 2024-02-01 DOI: 10.3233/sji-240006
Francesca Perucci, Eric Swanson
Multiple crises, including the COVID-19 pandemic and the increased frequency and intensity of disasters related to climate change, have demonstrated the critical importance of timely and open access to trusted data. Open data principles and practices that facilitate data access and use, ensure relevance to policy needs, and increase the impact and value of data are central to building trust in data. The paper outlines four trends that present opportunities for expanding the adoption and use of open data principles and practices and for building data trust: the modernization of data governance; increased attention to the role of citizens in building trust, in increasing the relevance of data, and in contributing to data throughout the data value chain; the adoption of open data principles; and the work of watchdog organizations that monitor the progress of countries and agencies and identify areas of data governance that still need attention.
Citations: 0
Analyzing commercial grape farm efficiency in Armavir region (Armenia) by using two-stage empirical approach
Q3 Decision Sciences Pub Date : 2024-01-23 DOI: 10.3233/sji-230064
H. Asatryan, V. Aleksanyan, Samvel Asatryan, M. Manucharyan
The purpose of this paper is to provide an empirical assessment of the economic efficiency of grape-producing farms in Armenia. After a review of related studies, frontier analysis was chosen as the methodological basis. More specifically, a two-stage empirical analysis was performed: first, the efficiency levels of grape farms were measured using data envelopment analysis (DEA); second, the determinants of the resulting efficiency scores were assessed using Tobit modeling. To obtain the necessary data, 365 grape farms in the Armavir region were surveyed. The main findings suggest that the average efficiency score of grape farms is 0.72, which leaves room for a 28% improvement in their economic performance. The main determinants of farm efficiency were the grape varieties cultivated, farm size, and the selling price of grapes. The results largely support the findings of similar studies carried out in viticulture regions across the world. This study provides a methodological basis for extending the analysis to other Armenian viticulture regions and to other years, in order to explore changes in the efficiency of grape farms over time. The article offers a knowledge base that policymakers, scholars, researchers, investors, and credit companies can draw on in their decision-making and for other purposes.
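The relation between a mean efficiency score and the remaining room for improvement can be illustrated with a toy calculation. The following is a minimal sketch, assuming a single input and a single output under constant returns to scale, in which case a DEA score reduces to each farm's output/input ratio divided by the best ratio observed. The function name and the farm figures are hypothetical; the paper's actual model and its Tobit second stage are not reproduced here.

```python
def dea_scores_single_io(inputs, outputs):
    """Constant-returns DEA efficiency for one input and one output.

    With a single input and a single output, the efficient frontier is the
    ray through the farm with the highest output/input ratio, so each score
    is that farm's ratio divided by the best ratio.
    """
    if len(inputs) != len(outputs) or not inputs:
        raise ValueError("inputs and outputs must be non-empty and equal in length")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical data: hectares cultivated and tonnes of grapes harvested.
hectares = [2.0, 4.0, 5.0, 10.0]
tonnes = [20.0, 32.0, 30.0, 60.0]

scores = dea_scores_single_io(hectares, tonnes)
mean_score = sum(scores) / len(scores)
# The improvement margin is the gap between the mean score and the frontier (1.0).
improvement_margin = 1.0 - mean_score

print(f"mean efficiency: {mean_score:.2f}")
print(f"room for improvement: {improvement_margin:.0%}")
```

Read this way, a mean score of 0.72 corresponds to a margin of 1 - 0.72 = 0.28, which is the 28% quoted in the abstract.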
Citations: 0
Collaboration between national statistical offices and academia: Benefits, conditions, areas of collaboration and practical level experience in countries
Q3 Decision Sciences Pub Date : 2024-01-23 DOI: 10.3233/sji-230117
Charlotte Juul Hansen, Lina Maria Sanchez Cespedes, Leonardo Trujillo Oyola, X. K. Dimakos, Bianca Walsh, Renata Souza Bueno, Amos T. Kabo-Bah, Omar Seidu, Vibeke Oestreich Nielsen
National statistical offices (NSOs) and academia benefit from establishing partnerships and collaborating in different ways by bringing together their respective expertise. Collaborative alliances of this nature appear to offer numerous advantages for both the partners and the public and seem to be essential for unlocking opportunities within the evolving data ecosystem. Establishing good and fruitful collaboration between academia and NSOs requires a collaborative environment where each partner can see the benefits of the collaboration and how they could contribute. Different areas of collaboration are presented within four categories: education and learning, research, promotion of data use in society and providing services to each other. The article further discusses the benefits and conditions of a successful partnership. Examples from Brazil, Colombia, Ghana, and Norway showcase practical-level experiences and some lessons learned at the country level.
Citations: 0
Spatial and demographic distributions of personal insolvency: An opportunity for official statistics
Q3 Decision Sciences Pub Date : 2023-12-15 DOI: 10.3233/sji-230072
Jonas Klingwort, Sven Alexander Brocker, Christian Borgs
German official statistics include statistics on personal insolvency. These statistics have recently been enhanced by web scraping, which extracts additional information from the public website on which insolvency announcements are published. The scraped data is currently used for quality assurance and to derive an early indicator of personal insolvency. This paper provides novel methodological analyses of the same administrative database and presents further opportunities to improve the detail and timeliness of the current official statistics using web scraping and text mining. The newly derived statistics shed light on several aspects of the demographic and spatial distribution of personal insolvency.
Citations: 0
Unbiased estimation strategies for respondent driven sampling
Q3 Decision Sciences Pub Date : 2023-11-20 DOI: 10.3233/sji-230087
P. D. Falorsi, G. Alleva, Francesca Petrarca
In this paper, we focus on respondent-driven sampling (RDS), which is a valuable survey methodology to estimate the size and the characteristics of hidden or hard-to-measure population groups. The RDS methodology makes it possible to gather information on these populations by exploiting the relationships between their components. However, RDS suffers from the lack of an estimation methodology that is sufficiently robust to accommodate the varying conditions under which it is applied. In this paper, we address the estimation problem of the RDS methodology and, by approaching it as a particular indirect sampling technique, we propose three unbiased estimation methods as possible solutions.
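To show why estimation in RDS needs care, here is a minimal sketch of the widely used Volz-Heckathorn (RDS-II) estimator, which weights each respondent by the inverse of their reported network degree. It is not one of the three estimators proposed in the paper, whose formulas the abstract does not give; the function name and the sample values are hypothetical.

```python
def rds_ii_proportion(has_trait, degrees):
    """Volz-Heckathorn (RDS-II) estimate of a population proportion.

    Respondents with many contacts are more likely to be recruited, so each
    respondent is down-weighted by 1 / degree before averaging.
    """
    if len(has_trait) != len(degrees) or not degrees:
        raise ValueError("has_trait and degrees must be non-empty and equal in length")
    if any(d <= 0 for d in degrees):
        raise ValueError("every reported degree must be positive")
    weights = [1.0 / d for d in degrees]
    weighted_trait = sum(w for w, t in zip(weights, has_trait) if t)
    return weighted_trait / sum(weights)


# Hypothetical sample: trait indicator and reported number of contacts.
has_trait = [True, False, True, False]
degrees = [2, 4, 4, 4]

naive = sum(has_trait) / len(has_trait)
weighted = rds_ii_proportion(has_trait, degrees)

print(f"naive sample proportion: {naive:.2f}")
print(f"RDS-II estimate: {weighted:.2f}")
```

The two figures differ because the low-degree respondent who has the trait represents more of the hidden population than the sample share suggests.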
Citations: 0
Hard-to-reach population groups in administrative sources: main challenges and future work
Q3 Decision Sciences Pub Date : 2023-11-20 DOI: 10.3233/sji-230074
Donatella Zindato, Maciej Truszczynski
The paper deals with the concept and definitions of hard-to-reach groups and the ways of capturing them in administrative sources. It provides a detailed discussion of what hard-to-reach means in the context of administrative sources and how it relates to the traditional hard-to-count groups in censuses and surveys. The review of country practices shows that hard-to-reach populations in administrative data can be interpreted in different ways and that their definition depends on each country's circumstances, though there are two main reasons for identifying a group as hard-to-reach in administrative sources. One interpretation selects groups typically considered difficult to reach with traditional survey methods (such as homeless people, illegal immigrants or indigenous people) and then tries to capture them in registers, in order to overcome the challenges of traditional field collection or to obtain more complete information. At first glance, administrative data might offer the potential to improve frame coverage for some target populations, but it may also create other hard-to-reach or “hidden” populations among different population groups. Indeed, the other interpretation refers to the incompleteness of registers or linked administrative databases, which makes some groups, such as children or the elderly, hard to reach and hence hard to describe with data, owing to time lags in the reporting of some events or to other accuracy problems with the source itself. The paper summarizes the experience of national statistical offices in accessing hard-to-reach groups and describes the problems and challenges in capturing them. It also proposes further possible work to improve access to hard-to-reach groups using administrative data.
Citations: 0
Machine learning estimation of the resident population
Q3 Decision Sciences Pub Date : 2023-11-19 DOI: 10.3233/sji-230090
Violeta Calian, Margherita Zuppardo, Omar Hardarson
In this paper, we formulate the problem of estimating the resident population, i.e. correcting for over-counts in administrative register data, as a binary classification problem. We propose a solution based on machine learning algorithms. The selection and optimisation of the best algorithm are shown to depend on the goal of prediction. We illustrate this method for two important cases in official statistics: the Census resident population and survey design with minimum non-response. The performance of the algorithms and the uncertainty of the estimates and of the evaluation metrics are described in detail and implemented in shared, open source code. We illustrate the method with results obtained by applying it to Icelandic register and survey data.
Citations: 0
Web scraping for price statistics in the Philippines
Q3 Decision Sciences Pub Date : 2023-11-17 DOI: 10.3233/sji-230030
Manuel Leonard F. Albis, Sabrina O. Romasoc, Shushimita G. Pelayo, Bea Andrea C. Gavira, Jazzen Paul J. Asombrado
Official price statistics in the Philippines are mainly sourced from regular surveys and censuses, which entail high costs. As businesses move onto digital platforms, alternatives to these traditional data sources have become more available; one of these is web scraping, the process of collecting information from the web. As digital and online platforms are increasingly used for commerce, web scraping offers a way to increase the frequency of data collection while reducing its cost relative to price surveys. This paper surveys the experiences of various government statistical agencies with web scraping for the Consumer Price Index (CPI). It also details the Philippines' experience of using web-scraped data to estimate the food and alcoholic beverages CPI of the National Capital Region, and compares the result with the official CPI estimate of the Philippine Statistics Authority. Finally, the paper discusses the challenges encountered and gives recommendations for enhancing the approach.
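How scraped prices might feed into an index can be sketched as follows. The abstract does not state which index formula was used; this sketch assumes a Jevons elementary index (the geometric mean of price relatives over products observed in both periods), a common choice for elementary aggregates. The function name, product identifiers and prices are hypothetical.

```python
import math


def jevons_index(base_prices, current_prices):
    """Jevons elementary price index (base period = 100).

    Only products observed in both periods are used, which mirrors the
    matched-model approach often applied to scraped price data.
    """
    matched = base_prices.keys() & current_prices.keys()
    if not matched:
        raise ValueError("no product is observed in both periods")
    log_relatives = [
        math.log(current_prices[p] / base_prices[p]) for p in matched
    ]
    return 100.0 * math.exp(sum(log_relatives) / len(log_relatives))


# Hypothetical scraped prices in pesos, keyed by product identifier.
base = {"rice_5kg": 250.0, "eggs_dozen": 100.0, "beer_330ml": 50.0}
current = {"rice_5kg": 275.0, "eggs_dozen": 110.0, "milk_1l": 95.0}

index = jevons_index(base, current)
print(f"Jevons index: {index:.1f}")
```

Products that appear or disappear between scrapes (here the beer and the milk) drop out of the matched sample, which is one of the practical challenges such an approach has to handle.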
Citations: 0
To count or to estimate: A note on compiling population estimates from administrative data
Q3 Decision Sciences Pub Date : 2023-11-15 DOI: 10.3233/sji-230067
John Dunne, Francesca Kay, Timothy Linehan
Like many countries, Ireland has been researching new systems of population estimates compiled from administrative data. Ireland does not have a Central Population Register from which the estimates can be compiled. The first step in compiling population estimates from administrative data is to build a Statistical Population Dataset (SPD). Ideally, an SPD has one record for each person in the population, containing the relevant attributes. Such an ideal SPD allows statistics to be compiled by simply counting over records. In practice, the compilation of SPDs is prone to error. These errors can be classified into four types: overcoverage, undercoverage, domain misclassification and linkage error. To date, Ireland has investigated two different approaches to compiling population estimates from administrative data. The first, labeled in this paper the simple count method, builds an SPD that minimises the overall number of individual record errors, so that simple counts from the SPD provide population estimates. The second, labeled in this paper the estimation method, builds an SPD that aims to eliminate all error types except undercoverage and then adjusts counts for undercoverage using Dual System Estimation (DSE) methods to obtain population estimates. This paper explores the advantages and disadvantages of both methods before considering how they could be integrated to eliminate the disadvantages. Many national statistical institutes (NSIs) will face similar challenges when compiling annual Census-like population estimates, and this paper aims to contribute to that discussion.
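The undercoverage adjustment in the estimation method can be illustrated with the basic two-list (Lincoln-Petersen) form of Dual System Estimation. This is a minimal sketch that assumes the two lists are independent, are linked without error and contain no overcoverage; the function name and counts are hypothetical, and the paper's actual DSE specification may differ.

```python
def dual_system_estimate(n_list_a, n_list_b, n_matched):
    """Lincoln-Petersen dual system estimate of population size.

    n_list_a:  persons on the first list (for example, the SPD)
    n_list_b:  persons on the second, independent list
    n_matched: persons found on both lists after record linkage
    """
    if n_matched <= 0:
        raise ValueError("at least one matched record is required")
    if n_matched > min(n_list_a, n_list_b):
        raise ValueError("matches cannot exceed the size of either list")
    return n_list_a * n_list_b / n_matched


# Hypothetical counts for one population domain.
spd_count = 9000
second_list_count = 1000
matched_count = 900

population_estimate = dual_system_estimate(
    spd_count, second_list_count, matched_count
)
# Persons estimated to be in the population but missing from the SPD.
undercoverage = population_estimate - spd_count

print(f"estimated population: {population_estimate:.0f}")
print(f"estimated SPD undercoverage: {undercoverage:.0f}")
```

A simple count would report 9000 for this domain, whereas the dual system estimate scales the count up by the share of the second list that the SPD failed to capture.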
Citations: 0
Classifying respondent comments from the 2021 Canadian Census of Population using machine learning methods
Q3 Decision Sciences Pub Date : 2023-11-14 DOI: 10.3233/sji-230063
Joanne Yoon
To improve the analysis of respondent comments from the Canadian Census of Population, data scientists at Statistics Canada compared and evaluated traditional machine learning, deep learning and transformer-based techniques. XLM-R (Cross-lingual Language Model-Robustly Optimized Bidirectional Encoder Representations from Transformers), a cross-lingual language model fine-tuned on census respondent comments, yielded the best result, an overall F1 score of 89.91%, despite language and class imbalances. Following the evaluation, the fine-tuned model was successfully implemented to categorize comments from the 2021 Census of Population objectively and with high accuracy. As a result, feedback from respondents was directed to the appropriate subject matter analysts for post-collection analysis.
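As a rough illustration of why an F1 score is a natural metric when classes are imbalanced, the sketch below computes per-class, macro-averaged and micro-averaged F1 for single-label predictions. It is not taken from the paper: the abstract does not say which averaging the 89.91% figure uses, and the function name `f1_scores`, the class labels and the toy data are all hypothetical.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (per_class_f1, macro_f1, micro_f1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the truth but was missed

    # Per-class F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0.
    per_class = {}
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class[c] = 2 * tp[c] / denom if denom else 0.0

    # Macro F1 weights every class equally, so rare classes count as much as common ones.
    macro = sum(per_class.values()) / len(labels)

    # Micro F1 pools the counts over all classes before computing the score.
    total_tp = sum(tp.values())
    total_denom = 2 * total_tp + sum(fp.values()) + sum(fn.values())
    micro = 2 * total_tp / total_denom if total_denom else 0.0
    return per_class, macro, micro


# Hypothetical, imbalanced toy data: 8 "content" comments and 2 "technical" ones.
y_true = ["content"] * 8 + ["technical"] * 2
y_pred = ["content"] * 8 + ["content", "technical"]

per_class, macro, micro = f1_scores(y_true, y_pred)
print(per_class, macro, micro)
```

In this toy case the pooled (micro) score stays high because the majority class dominates, while the macro score drops because the minority class is only half recovered. That gap is the reason the choice of averaging matters when reporting a single overall F1 on imbalanced data.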
Citations: 0