The purpose of this article is to explore a complementary approach to the analysis of countries’ participation in global production networks. We analyze the methodological aspect from the standpoint of micro- and macro-level theories of international production networks and value chains. In the context of globalization, production networks form actively both within individual industries and at the intersectoral level, and they operate successfully not only within limited territories but also at the interstate, interregional and global levels. The study of methods for analyzing countries’ participation in global production networks is therefore relevant. The article uses statistical analysis methods; analytical methods are used to determine countries’ leading types of economic activity (fields of specialization and qualitative indicators characterizing each of the countries’ industries). The issue is studied using the example of the EU countries. One method of assessing bilateral relations between partner countries’ national economies is complementarity, and the article examines the complementarity index as an indicator that characterizes the trade structure of partner countries. We obtain a model of a global map of the international production network (nodes of trade) for specific industries: manufacturing; chemicals and non-metallic mineral products; rubber and plastics products; computers, electronic and electrical equipment; and transport equipment. To obtain accurate results, we selected specific countries (Germany, the USA, Japan and China) and examined their statistics in two dimensions, gross exports and gross imports, in the selected industries.
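The complementarity index mentioned in the abstract can be sketched in a few lines. The sketch below uses one common formulation (after Michaely): TC = 100 × (1 − Σ|m_k − x_k| / 2), where m_k is the share of good k in one country’s imports and x_k the share of good k in the partner’s exports, so 100 means perfectly matching trade structures. The industry breakdown and the flow values are illustrative, not figures from the article.

```python
# Minimal sketch of a trade complementarity index between two partners.
# Formula: TC = 100 * (1 - sum_k |m_k - x_k| / 2), a common formulation;
# the article may use a variant. All numbers below are illustrative.

def shares(values):
    """Normalize a list of trade values to shares summing to 1."""
    total = sum(values)
    return [v / total for v in values]

def complementarity_index(importer_imports, exporter_exports):
    """Return TC in [0, 100]; 100 means perfectly matching structures."""
    m = shares(importer_imports)
    x = shares(exporter_exports)
    return 100.0 * (1.0 - sum(abs(mi - xi) for mi, xi in zip(m, x)) / 2.0)

# Illustrative gross flows by industry (chemicals, electronics, transport):
germany_exports = [120.0, 90.0, 150.0]
china_imports = [100.0, 110.0, 140.0]

tc = complementarity_index(china_imports, germany_exports)
print(round(tc, 1))
```

A high value here would indicate that the exporter’s specialization matches the importer’s demand structure, which is exactly what makes the index useful for mapping nodes of trade.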
Complementary approach to the analysis of countries’ participation in global production networks. Oleksandr Osaulenko, Andriy Krysovatyy, I. Zvarych, Nataliia Reznikova, Oksana Brodovska, Ihor Krysovatyy. Statistical Journal of the IAOS, published 2023-09-12. DOI: 10.3233/sji-220094.
Recent years have seen increased interest in the use of alternative data sources in the definition and production of official statistics and indicators for the UN Sustainable Development Goals. In this paper, we consider the application of data science to the production of official statistics, illustrating our perspective through the use of poverty targeting as an application. We show that machine learning can play a central role in the generation of official statistics, combining a variety of types of data (survey, administrative and alternative). We focus on the problem of poverty targeting using the Proxy Means Test in Indonesia, comparing a number of existing statistical and machine learning methods. We then introduce new approaches in the spirit of small area estimation that utilize area-level features and data augmentation at the subdistrict level to develop more refined models at the district level, and we evaluate the methods on three districts in Indonesia on the problem of estimating 2020 per capita household expenditure using data from 2016–2019. The best performing method, XGBoost, reduces inclusion/exclusion errors on the problem of identifying the poorest 40% of the population, in comparison to the commonly used Ridge Regression method, by between 4.5% and 13.9% in the districts studied.
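The inclusion/exclusion errors used as the evaluation criterion can be made concrete with a small sketch: rank households by predicted per-capita expenditure, flag the bottom 40%, and compare against the flags implied by actual expenditure. The data and the threshold handling below are illustrative, not the paper’s exact evaluation protocol.

```python
# Minimal sketch of inclusion/exclusion error for poverty targeting.
# Exclusion error: truly poor households missed by the prediction.
# Inclusion error: non-poor households wrongly flagged as poor.
# All expenditure values are illustrative.

def bottom_k_set(expenditures, fraction=0.4):
    """Indices of the poorest `fraction` of households by expenditure."""
    n_poor = int(len(expenditures) * fraction)
    order = sorted(range(len(expenditures)), key=lambda i: expenditures[i])
    return set(order[:n_poor])

def targeting_errors(actual, predicted, fraction=0.4):
    """Return (exclusion error, inclusion error) as fractions."""
    truly_poor = bottom_k_set(actual, fraction)
    flagged = bottom_k_set(predicted, fraction)
    exclusion = len(truly_poor - flagged) / len(truly_poor)
    inclusion = len(flagged - truly_poor) / len(flagged)
    return exclusion, inclusion

actual = [150, 220, 90, 310, 120, 400, 80, 260, 180, 500]
predicted = [160, 200, 100, 290, 170, 420, 70, 250, 130, 480]

excl, incl = targeting_errors(actual, predicted)
print(excl, incl)
```

A better predictor (such as the XGBoost models the paper studies) lowers both error rates; with the 40% cutoff applied symmetrically, the two errors count the same number of misranked households on each side of the threshold.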
Machine learning and data augmentation in the proxy means test for poverty targeting. W. Wobcke, Siti Mariyah. Statistical Journal of the IAOS, published 2023-08-21. DOI: 10.3233/sji-230033.
Editorial. Statistical Journal of the IAOS, published 2023-08-09. DOI: 10.3233/sji-230070.
Interview with Dominik Rozkrut. Statistical Journal of the IAOS, published 2023-08-09. DOI: 10.3233/sji-230069.
Official statisticians managed to adapt quickly to the consequences of the COVID-19 pandemic. Looking forward, an important question is whether, and how, the way data are produced and consumed should be fundamentally rethought in a “new normal” state of the world. The pandemic underlined that data producers have to provide more, and more varied, types of information to their users. It was also a reminder that the statistical landscape has to evolve continuously. For central bank statisticians, this calls for relying more heavily on data science, making better use of the large amount of micro-level information available in today’s societies, adapting statistical frameworks to meet evolving policy objectives and user needs, and continuing to cooperate closely with other relevant stakeholders.
The post-pandemic new normal for central bank statistics. Saira Jahangir-Abdoelrahman, B. Tissot. Statistical Journal of the IAOS, published 2023-08-09. DOI: 10.3233/sji-230050.
Conventional wisdom holds that North American Industry Classification System (NAICS) codes chosen by people inexperienced with the system are often misspecified, but there has been little formal research into the scope of the problem. In this paper we explore the prevalence of, and patterns in, misspecification of NAICS codes self-reported on two kinds of business tax forms. Errors are identified by comparing as-filed codes with codes validated by Statistics of Income. We find that over a third of the codes are wrong, but that the errors are not random and often (though not always) seem to have logical reasons behind them.
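The core comparison described here, tallying how often a self-assigned NAICS code disagrees with the validated code and whether disagreements still match at the 2-digit sector level (one plausible “logical reason” for an error, since NAICS is hierarchical), can be sketched as follows. The codes below are illustrative examples, not records from the tax data.

```python
# Minimal sketch: exact-match rate of self-assigned vs. validated NAICS
# codes, plus the share of errors that still agree at the 2-digit sector
# level. NAICS codes are hierarchical, so a 2-digit match suggests the
# filer picked the right sector but the wrong detailed industry.
# All codes below are illustrative.

def code_accuracy(filed, validated):
    """Return (exact-match rate, share of errors agreeing at 2 digits)."""
    pairs = list(zip(filed, validated))
    errors = [(f, v) for f, v in pairs if f != v]
    exact_rate = 1 - len(errors) / len(pairs)
    if not errors:
        return exact_rate, 0.0
    sector_agree = sum(1 for f, v in errors if f[:2] == v[:2]) / len(errors)
    return exact_rate, sector_agree

filed     = ["541511", "541512", "722511", "445110", "236115", "541990"]
validated = ["541511", "541511", "722513", "445120", "531110", "541990"]

exact, sector = code_accuracy(filed, validated)
print(exact, sector)
```

Breaking errors down by how many leading digits agree is one simple way to separate near-miss errors from codes that are wrong at the sector level.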
Accuracy and errors in self-assigned NAICS codes in tax return data. C. Oehlert. Statistical Journal of the IAOS, published 2023-08-08. DOI: 10.3233/sji-230035.
D. Whittard, F. Ritchie, Van Phan, A. Bryson, J. Forth, L. Stokes, Carl Singleton
The role of the National Statistical Institute (NSI) is changing, with many now making microdata available to researchers through secure research environments. This provides NSIs with an opportunity to benefit from the methodological input of researchers who challenge the data in new ways. This article uses the United Kingdom’s Annual Survey of Hours and Earnings (ASHE) to illustrate the point. We study whether the use of pre-filled forms in ASHE may create inaccurate values in one of the key fields, workplace location, despite there being no direct evidence of this in the data supplied to researchers. We link surveys to examine the hypothesis that employees working for multi-site employers making an ASHE survey submission are more likely to have their work location incorrectly recorded, because the respondent fails to correct the work location variable that has been pre-filled. In the short term, we make suggestions to improve the quality of ASHE microdata; in the longer term, we suggest that the burden of collecting additional data could be offset through greater use of electronic data capture. More generally, at a time when statistical budgets are under pressure, this study encourages NSIs to make greater use of the microdata research community to help inform statistical developments.
The perils of pre-filling: Lessons from the UK’s Annual Survey of Hours and Earnings microdata. D. Whittard, F. Ritchie, Van Phan, A. Bryson, J. Forth, L. Stokes, Carl Singleton. Statistical Journal of the IAOS, published 2023-08-02. DOI: 10.3233/sji-230013.
Social cohesion is a multi-dimensional concept referring to social connectedness, or the ‘glue’ that connects members of a society through bonds of solidarity and trust, within and across communities and organizations, and within society at large. The concept of social cohesion continues to garner interest in public and policy circles, perhaps reflecting the intuitive appeal of the concept and the role that cohesion can play in societies’ abilities to respond to challenges, to function effectively, and to support rewarding lives. As a latent concept that is not directly observable or measurable, social cohesion is often measured through key dimensions. In this context, a dimension refers to a constituent part of social cohesion. Using factor analysis and data from Statistics Canada’s 2020 General Social Survey on Social Identity, this study identifies nine key dimensions of social cohesion. Latent class modelling is then used to sort respondents into three latent classes or groups (“Low”, high “Confidence-Belonging” and high “Trust-Participation” cohesion groups) of individuals that share common traits and prioritize certain dimensions of social cohesion. The probabilistic classification of individuals in accordance with latent classes provides valuable insights into social sorting mechanisms and how this extends to cohesiveness within Canadian society.
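The probabilistic classification step of a latent class model can be illustrated with a toy example: given class weights and per-class response probabilities for a few yes/no cohesion items, Bayes’ rule gives each respondent’s posterior probability of belonging to each class. The class labels mirror the three groups named in the abstract, but the weights and item probabilities below are invented for illustration; estimating them (normally done by EM) is out of scope here.

```python
# Minimal sketch of posterior class membership under a latent class model
# with binary items: P(class | responses) via Bayes' rule. The parameters
# below are illustrative assumptions, not estimates from the survey.

def posterior(responses, weights, item_probs):
    """Posterior P(class | responses) for 0/1 item responses."""
    likelihoods = []
    for w, probs in zip(weights, item_probs):
        lik = w  # prior class weight
        for r, p in zip(responses, probs):
            lik *= p if r == 1 else (1 - p)
        likelihoods.append(lik)
    total = sum(likelihoods)
    return [lik / total for lik in likelihoods]

# Three illustrative classes: Low, Confidence-Belonging, Trust-Participation.
weights = [0.3, 0.4, 0.3]
item_probs = [
    [0.2, 0.1, 0.2],   # Low: unlikely to endorse any item
    [0.9, 0.8, 0.3],   # Confidence-Belonging: endorses items 1 and 2
    [0.4, 0.3, 0.9],   # Trust-Participation: endorses item 3
]

post = posterior([1, 1, 0], weights, item_probs)
print([round(p, 3) for p in post])
```

A respondent endorsing the confidence and belonging items but not the participation item is assigned to the Confidence-Belonging class with high posterior probability, which is the “sorting” the abstract refers to.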
What holds us together? Measuring dimensions of social cohesion in Canada. Samuel MacIsaac, David Wavrock, G. Schellenberg. Statistical Journal of the IAOS, published 2023-07-30. DOI: 10.3233/sji-230055.
M. Gissler (Region Stockholm; Karolinska Institutet)
Cause-of-death statistics are an essential part of the health information system. Finland has collected statistics on causes of death for more than 250 years. Since 1936, medical experts at Statistics Finland have been in charge of the coding. Changes in the ICD classification and in coding praxis, as well as the use of different standard populations and short-lists, hamper time-trend analyses and international benchmarking. The five Nordic countries and three Baltic countries have made cause-of-death coding comparisons since 2001; a random sample of death certificates is regularly reviewed. This exercise has demonstrated that national coding systems have not always agreed on the main causes of death. However, there has been a clear trend towards greater agreement, even for specific diagnostic groups such as cancers, external causes and respiratory conditions. Most international data collection is voluntary, but the European Union has adopted a mandatory regulation to ensure that cause-of-death statistics provide adequate information for all EU Member States to monitor Community actions in the field of public health. Since 2011, the data on causes of death have had to be provided within 24 months after the end of the reference year. Therefore, cause-of-death statistics at Eurostat are more up to date than those in other international databases.
How to improve mortality statistics nationally and internationally? M. Gissler. Statistical Journal of the IAOS, published 2023-07-01. DOI: 10.3233/sji-230026.
An ever-increasing deluge of big data is becoming available to national statistical offices globally, but it is well documented that statistics produced by big data alone often suffer from selection bias and are not usually representative of the population at large. In this paper, we construct a new design-based estimator of the median by integrating big data and survey data. Our estimator is asymptotically unbiased and has a smaller variance than a median estimator produced using survey data alone.
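One simple way such data integration can work, not necessarily the paper’s estimator, is to treat the big data source as a census of the units it covers, use design weights from a probability survey for the uncovered part of the population, and invert the combined estimated distribution function to obtain a median. The sketch below implements that idea with illustrative numbers.

```python
# Hedged sketch of integrating big data and survey data for a median:
# big-data units (observed in full) get weight 1, survey units carry
# their design weights, and we take the weighted median of the pooled
# distribution. This is one plausible construction, not the estimator
# proposed in the paper. All numbers are illustrative.

def combined_median(big_values, survey_values, survey_weights):
    """Weighted median over big-data units (weight 1) plus survey units."""
    pool = [(v, 1.0) for v in big_values]
    pool += list(zip(survey_values, survey_weights))
    pool.sort(key=lambda t: t[0])
    total = sum(w for _, w in pool)
    running = 0.0
    for v, w in pool:
        running += w
        if running >= total / 2.0:
            return v
    return pool[-1][0]

big = [10, 12, 15, 18, 20, 22]   # units covered by the big data source
survey = [30, 40, 55]            # sampled from the uncovered part
weights = [2.0, 2.0, 2.0]        # each represents ~2 uncovered units

print(combined_median(big, survey, weights))
```

Because the survey corrects for the part of the population the big data misses, the pooled median avoids the selection bias of a big-data-only estimate, which would here ignore the higher-valued uncovered units entirely.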
Integrating big data and survey data for efficient estimation of the median. Ryan Covey. Statistical Journal of the IAOS, published 2023-06-28. DOI: 10.3233/sji-230054.