Linda J. Young, G. Carletto, Graciela Márquez, Dominik A. Rozkrut, Spiro Stefanou
Many National Statistical Offices are modernizing the systems and processes underpinning the production of official agricultural statistics. Moving data and processes to the cloud, collecting survey data via the web, automating editing and imputation, incorporating more administrative, remotely sensed and other non-survey data in the estimation process, and more flexible dissemination of information are only some of the areas of current effort. Although specific modernization efforts have been described, less discussion has focused on exactly what the future of official agricultural statistics will be. During the 9th International Conference on Agricultural Statistics, held May 17–19, 2023, at the World Bank in Washington, DC, USA, four statistical leaders with diverse perspectives envisioned the not-too-distant future of official agricultural statistics in 2040.
{"title":"The production of official agricultural statistics in 2040: What does the future hold?","authors":"Linda J. Young, G. Carletto, Graciela Márquez, Dominik A. Rozkrut, Spiro Stefanou","doi":"10.3233/sji-240043","DOIUrl":"https://doi.org/10.3233/sji-240043","url":null,"PeriodicalId":509522,"journal":{"name":"Statistical Journal of the IAOS","volume":"14 9","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141357335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vandercasteelen Joachim, Namesh Nazar, Yahya Bajwa, Willem Janssen
This paper proposes a conceptual and empirical framework to develop rural transformation strategies tailored to the agroecological potential and market access of rural areas in Pakistan. Such a framework makes it possible to move away from one-size-fits-all countrywide policies such as those in use in Pakistan and many other countries. Using publicly available geospatial measures of vegetation greenness and an urban gravity model to proxy agricultural market demand, we classify Pakistan’s rural districts into categories with similar comparative advantages and describe their dominant livelihood activities. The framework recommends market-based approaches to support commercial agriculture or non-agricultural business development in well-connected areas and where households have accumulated human and physical capital. In areas with less developed agricultural potential or market access, households will benefit from area-based and community-driven development, skill development, and labor programs. Since data collection is often challenging in rural areas, statistical agencies can use such an empirical framework to advise policymakers on prioritizing public investments and tailoring rural transformation pathways. Statistical agencies can also extend the framework to different levels of resolution, from national to local, and complement it with primary data sources to validate the usefulness of the approach.
{"title":"Identifying spatially differentiated pathways for rural transformation in Pakistan","authors":"Vandercasteelen Joachim, Namesh Nazar, Yahya Bajwa, Willem Janssen","doi":"10.3233/sji-230082","DOIUrl":"https://doi.org/10.3233/sji-230082","url":null,"PeriodicalId":509522,"journal":{"name":"Statistical Journal of the IAOS","volume":"30 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141360079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
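The classification idea in the Pakistan abstract (vegetation greenness for agroecological potential, an urban gravity model for market demand) can be illustrated with a minimal sketch. The abstract does not give the gravity formula, so the inverse-power form below, the decay exponent `beta`, the cut-off values and all function names are assumptions made purely for illustration, not the authors' specification.

```python
def market_access(city_links, beta=2.0, min_km=1.0):
    """Gravity-style market access for one district.

    city_links: iterable of (city_population, distance_km) pairs.
    Assumed form: sum of population / distance**beta, with distances
    floored at min_km so that very close cities do not explode the sum.
    """
    return sum(pop / max(dist, min_km) ** beta for pop, dist in city_links)


def classify_district(greenness, access, greenness_cut, access_cut):
    """Place a district in a 2x2 typology (cut-offs are hypothetical)."""
    agro = "high" if greenness >= greenness_cut else "low"
    market = "high" if access >= access_cut else "low"
    return f"{agro} potential / {market} access"


# Hypothetical district: two cities at 50 km and 200 km.
links = [(1_000_000, 50.0), (5_000_000, 200.0)]
access = market_access(links)
label = classify_district(greenness=0.55, access=access,
                          greenness_cut=0.40, access_cut=300.0)
```

With a typology like this, each cell can then be matched to a policy package, for instance market-based support where both potential and access are high.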
Identification and replacement of erroneous data is of fundamental importance for the quality of statistical surveys. If statistical units are continuously sampled over an extended period, time series methods can facilitate this task. Numerous outlier identification and replacement procedures are available for this purpose, such as the RegARIMA approaches within the seasonal adjustment procedures in X13-Arima or Tramo/Seats. These algorithms can identify different types of outliers, such as additive outliers, level shifts or transitory changes. This paper proposes an alternative outlier identification procedure based on a nonlinear model estimated with support vector regression. The procedure focuses on the identification of additive outliers and on applicability to short time series with fewer than 3 years of observations.
{"title":"Outlier identification and adjustment for time series","authors":"Markus Fröhlich","doi":"10.3233/sji-230109","DOIUrl":"https://doi.org/10.3233/sji-230109","url":null,"PeriodicalId":509522,"journal":{"name":"Statistical Journal of the IAOS","volume":"10 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141355876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
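The general pattern behind additive-outlier detection (fit a smooth model, flag observations whose residuals are unusually large, replace them with the fitted value) can be sketched in a few lines. Note that this sketch deliberately swaps in a running median for the support vector regression used in the paper, so that it runs with the standard library only; the window length, the threshold and the function name are illustrative assumptions.

```python
from statistics import median


def flag_additive_outliers(series, window=5, threshold=3.5):
    """Flag and replace additive outliers in a short series.

    Stand-in smoother: a centred running median (NOT the paper's SVR fit).
    A point is flagged when its residual lies more than `threshold`
    robust standard deviations (1.4826 * MAD) from the median residual.
    """
    half = window // 2
    n = len(series)
    fitted = [median(series[max(0, i - half):min(n, i + half + 1)])
              for i in range(n)]
    residuals = [y - f for y, f in zip(series, fitted)]
    centre = median(residuals)
    scale = 1.4826 * median(abs(r - centre) for r in residuals)
    if scale == 0:
        # Degenerate case: no residual spread, so nothing can be ranked.
        return [], list(series)
    flagged = [i for i, r in enumerate(residuals)
               if abs(r - centre) > threshold * scale]
    cleaned = list(series)
    for i in flagged:
        cleaned[i] = fitted[i]  # replace the outlier by the smoothed value
    return flagged, cleaned


# Hypothetical monthly series with one spike at position 5.
monthly = [10, 11, 10, 12, 11, 40, 12, 11, 10, 12, 11, 10]
flagged, cleaned = flag_additive_outliers(monthly)
```

Replacing the running median with any other fitted curve (for example an SVR prediction) leaves the flag-and-replace logic unchanged.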
In a rapidly globalizing world, understanding the relationships between major stock markets is of paramount importance for investors and financial analysts. This study explores the interdependence and cointegration of the stock markets in Japan, India, and the USA, examining the dynamics of global financial markets and the existence of long-term and short-term links between the three indices. These leading stock markets were selected in order to learn more about the connections between them. We use monthly data from April 2012 through March 2022 for three major stock market indices: the NIKKEI (Japan), the BSE SENSEX (India), and the NASDAQ (USA). Stock market performance in the United States and India tends to move together, and the two markets are highly correlated. The Granger causality (GC) test is then applied to ascertain whether the markets have any forecasting ability. The tests indicate that the NASDAQ index predicts the SENSEX index with high precision, whereas no comparable predictive relationship is found for the NIKKEI index. Once a causal relationship has been established, we look for evidence of short- and long-term connections.
{"title":"The interdependence and cointegration of stock markets: Evidence from Japan, India and USA","authors":"John Pradeep Kumar, N. Mukund Sharma","doi":"10.3233/sji-240011","DOIUrl":"https://doi.org/10.3233/sji-240011","url":null,"PeriodicalId":509522,"journal":{"name":"Statistical Journal of the IAOS","volume":"26 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141358838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kevin A. Hunt, Jonathon Abernethy, Peter C. Beeson, Maria Bowman, Steven Wallander, Ryan Williams
Gridded landcover datasets like the NASS Cropland Data Layer (CDL) provide a useful resource for analyses of cropland management. However, many farm operation decisions are made at the field level, not the pixel level. To capture relationships between land cover and field characteristics (size, contiguity, etc.), some method is needed to aggregate gridded data into crop fields. To provide a uniform and consistent approach for aggregating gridded data at the field level over a series of years, this research project developed a set of Crop Sequence Boundaries (CSBs), which are polygons that delineate areas of homogeneous cropping sequences for the contiguous US. The CSBs are open-source, algorithm-based geospatial polygons derived from historic CDLs together with road and rail networks to capture areas with common cropping sequences. The CSB approach uses geospatial functions in Google Earth Engine (GEE) and in the ArcGIS Pro application. These functions are run in parallel by sub-dividing the contiguous US into smaller regions based on road and rail boundaries, which prevents overlaps or gaps in the data. As a new set of algorithmically delineated field polygons, the CSBs enhance applications requiring large-scale crop mapping with vector-based data.
{"title":"Crop sequence boundaries using USDA national agricultural statistics service historic cropland data layers","authors":"Kevin A. Hunt, Jonathon Abernethy, Peter C. Beeson, Maria Bowman, Steven Wallander, Ryan Williams","doi":"10.3233/sji-230078","DOIUrl":"https://doi.org/10.3233/sji-230078","url":null,"PeriodicalId":509522,"journal":{"name":"Statistical Journal of the IAOS","volume":" 11","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140997857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
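The core idea of aggregating pixels into areas of homogeneous cropping sequence can be shown on a toy grid. The sketch below is not the CSB workflow (which runs in Google Earth Engine and ArcGIS Pro and also uses road and rail networks); it only assumes that each year is a small raster of integer crop codes and groups 4-connected pixels that share the same multi-year sequence. All names and the sample grids are hypothetical.

```python
from collections import deque


def crop_sequence_regions(yearly_grids):
    """Label 4-connected regions whose pixels share one crop sequence.

    yearly_grids: list of equally sized 2-D lists, one per year, holding
    integer crop codes. Returns (labels, regions), where labels is a 2-D
    list of region ids and regions maps id -> sequence and pixel count.
    """
    rows, cols = len(yearly_grids[0]), len(yearly_grids[0][0])
    # Per-pixel tuple of crop codes across years, e.g. (1, 5).
    seq = [[tuple(g[r][c] for g in yearly_grids) for c in range(cols)]
           for r in range(rows)]
    labels = [[None] * cols for _ in range(rows)]
    regions = {}
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] is not None:
                continue
            # Breadth-first flood fill from an unlabelled pixel.
            labels[r][c] = next_id
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] is None
                            and seq[ny][nx] == seq[r][c]):
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
            regions[next_id] = {"sequence": seq[r][c], "pixels": size}
            next_id += 1
    return labels, regions


# Two hypothetical years on a 2 x 3 grid of crop codes.
year1 = [[1, 1, 5],
         [1, 5, 5]]
year2 = [[5, 5, 1],
         [5, 1, 1]]
labels, regions = crop_sequence_regions([year1, year2])
```

In a production setting the labelled regions would be vectorized into polygons; here the label grid simply stands in for that step.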
Ana Moltedo, Cristina Álvarez-Sánchez, Nathalie Troubat, Carlo Cafiero
This paper presents an approach to estimate the between-subject variability in nutrient intake (through the coefficient of variation [CV]) and a method to estimate the prevalence of nutrient inadequacy (PoNI) for eight micronutrients using household consumption and expenditure survey (HCES) data. Prevalence values are compared to individual-level estimates derived using the National Cancer Institute method. Data come from the 2015 Bangladesh Integrated Household Survey, which conducted a household-level 7-day recall (7DR) and two rounds of individual-level 24-hour recall (24HR), completed by one respondent on behalf of all members, for the same rural households. The PoNI values based on 7DR are lower than those calculated from 24HR data, because the average intake estimates from 7DR data are larger. After controlling for differences in average intake estimates and adjusting household-level data for random measurement errors, the PoNI values from 7DR and 24HR data are remarkably close. This highlights the potential use of HCES data (collected according to internationally agreed standards) for estimating the level of between-subject variability in usual nutrient intake in a population. The CVs from HCES could be used to compute the PoNI using average intake estimates from individual-level data, and to assess the inadequacy of global nutrient supply using Supply and Utilization Accounts data.
{"title":"Computing levels of nutrient inadequacy from household consumption and expenditure surveys: A case study","authors":"Ana Moltedo, Cristina Álvarez-Sánchez, Nathalie Troubat, Carlo Cafiero","doi":"10.3233/sji-230086","DOIUrl":"https://doi.org/10.3233/sji-230086","url":null,"PeriodicalId":509522,"journal":{"name":"Statistical Journal of the IAOS","volume":" 16","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140999169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
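To see how a mean intake and a CV can be turned into a prevalence figure, consider a minimal sketch. It assumes, for illustration only, that usual intake is lognormally distributed and that inadequacy is the share of the population below a single requirement value (a cut-point approach); the abstract does not state which distribution or requirement values the authors use, and the numbers below are hypothetical.

```python
import math
from statistics import NormalDist


def prevalence_of_inadequacy(mean_intake, cv, requirement):
    """Share of the population with usual intake below `requirement`.

    Assumes a lognormal usual-intake distribution parameterised by its
    arithmetic mean and coefficient of variation (an assumption of this
    sketch, not a statement of the paper's method).
    """
    sigma2 = math.log(1.0 + cv ** 2)            # variance of log-intake
    mu = math.log(mean_intake) - sigma2 / 2.0   # mean of log-intake
    z = (math.log(requirement) - mu) / math.sqrt(sigma2)
    return NormalDist().cdf(z)


# Hypothetical figures: same CV and requirement, two different means.
poni_lower_mean = prevalence_of_inadequacy(100.0, 0.30, 80.0)
poni_higher_mean = prevalence_of_inadequacy(120.0, 0.30, 80.0)
```

Holding the CV fixed, a larger mean intake yields a lower prevalence, which is the direction of the 7DR versus 24HR difference described in the abstract.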
Astrid Mathiassen, Owen Siyoto, Ellen Cathrine Kiøsterud
Household Consumption and Expenditure Surveys (HCES) collect comprehensive information on households’ consumption and can support a range of analyses on access to food. They are key to estimating poverty (SDG 1.2.1) and the Prevalence of Undernourishment (SDG 2.1.1). Before the food data become meaningful for analysis, they need extensive preparation. While NSOs are responsible for poverty statistics and typically prepare the data for this purpose, it is often other organizations or researchers that use the data for food security analysis. Although the preparation of the data for these two purposes has a lot in common, the two rely on different traditions and guidelines. This paper presents results from an ongoing project that aims to bridge the gap between these two processes. The project’s goal is for NSOs to take the lead in preparing the HCES food data for all uses. An expected result is that food security statistics will be available at the same time as the other main outputs from the survey and can be used in planning for improved food security. The project includes preparing a guideline for NSOs and others (endorsed by the United Nations Statistical Commission in 2024), building capacity in NSOs, and using results in a regional context.
{"title":"More efficient use of household consumption and expenditure surveys (HCES) to inform food security","authors":"Astrid Mathiassen, Owen Siyoto, Ellen Cathrine Kiøsterud","doi":"10.3233/sji-230098","DOIUrl":"https://doi.org/10.3233/sji-230098","url":null,"PeriodicalId":509522,"journal":{"name":"Statistical Journal of the IAOS","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141021558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dinda Pusparahmi Sholawatunnisa, L. H. Suadaa, Usep Nugraha, Setia Pramana
Gross Domestic Product (GDP) stands as a pivotal indicator, offering strategic insights into economic dynamics. Recent technological advancements, particularly in real-time information dissemination through online economic news platforms, provide an accessible and alternative data source for analyzing GDP movements. This study employs online news classification to identify patterns in the movement and growth rate of Indonesia’s GDP. Utilizing a web scraping technique, we collected data for analysis. The classification models employed include transfer learning from pre-trained language model transformers, with classical machine learning methods serving as baseline models. The results indicate superior performance by the pre-trained language model transformers, achieving the highest accuracy of 0.8880 and 0.7899. In comparison, hyperparameter-tuned classical machine learning models also demonstrated commendable results, with the best accuracy reaching 0.845 and 0.7811. This research underscores the efficacy of leveraging online news classification, particularly through advanced language models. The findings contribute to a nuanced understanding of economic dynamics, aligning with the contemporary landscape of information accessibility and technological progress.
{"title":"Indonesian GDP movement detection using online news classification","authors":"Dinda Pusparahmi Sholawatunnisa, L. H. Suadaa, Usep Nugraha, Setia Pramana","doi":"10.3233/sji-230038","DOIUrl":"https://doi.org/10.3233/sji-230038","url":null,"PeriodicalId":509522,"journal":{"name":"Statistical Journal of the IAOS","volume":"41 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141040901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The question “What is a small-scale producer?” keeps receiving different answers depending on the context in which it is posed. Alternative ways of defining smallholders reflect heterogeneous historical and institutional eco-systemic contexts and depend on the role of small-scale agriculture in the rural economy. This has become a pressing issue given the need to monitor the Sustainable Development Goals (SDGs), which refer to “small” farmers. Two important related issues are: 1) the adoption of a specific and robust definition of small-scale food producer (SSFP) and 2) the empirical implementation of this definition to identify the SSFPs. The calculations require suitable databases with microdata at the level of individual farms. Based on the 2020 agricultural census results, we identified the small-scale food producers in Italy. We also proposed and compared other approaches to identifying SSFPs, which are simpler than the one proposed by the FAO and could also be calculated for other census years. Since revenues are not available for every farm (even the census did not collect this information), the standard indicator of production was used instead of revenues to identify SSFPs.
{"title":"Who are small-scale food producers in Italy? Comparisons among different approaches","authors":"Roberto Gismondi","doi":"10.3233/sji-230085","DOIUrl":"https://doi.org/10.3233/sji-230085","url":null,"PeriodicalId":509522,"journal":{"name":"Statistical Journal of the IAOS","volume":"55 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140755131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
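A threshold-based identification of small-scale producers from farm microdata could look roughly like the sketch below. The abstract does not spell out the criteria, so everything here is an assumption for illustration: a farm is flagged when it falls in the bottom 40% of farms on both land area and standard output (used in place of revenue, as in the abstract); the 40% share, the two-criteria rule, the field names and the sample farms are all hypothetical, and any livestock criterion is omitted.

```python
def bottom_share_threshold(values, share_pct=40):
    """Largest value among the lowest `share_pct` percent of observations."""
    ordered = sorted(values)
    k = -((-len(ordered) * share_pct) // 100)  # ceiling of n * share / 100
    return ordered[max(k, 1) - 1]


def flag_small_scale(farms, share_pct=40):
    """Return ids of farms at or below both thresholds (assumed rule)."""
    land_cut = bottom_share_threshold([f["land_ha"] for f in farms], share_pct)
    output_cut = bottom_share_threshold(
        [f["standard_output"] for f in farms], share_pct)
    return [f["id"] for f in farms
            if f["land_ha"] <= land_cut and f["standard_output"] <= output_cut]


# Hypothetical microdata: land in hectares, standard output in thousand euro.
farms = [
    {"id": "A", "land_ha": 1, "standard_output": 5},
    {"id": "B", "land_ha": 2, "standard_output": 8},
    {"id": "C", "land_ha": 3, "standard_output": 30},
    {"id": "D", "land_ha": 10, "standard_output": 40},
    {"id": "E", "land_ha": 50, "standard_output": 200},
]
small = flag_small_scale(farms)
```

Making the share a parameter keeps it easy to compare alternative definitions on the same microdata, which is the kind of comparison the abstract describes.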
M. Thurow, Florian Dumpert, Burim Ramosaj, Markus Pauly
Imputation methods are popular tools that allow for a wide range of subsequent analyses on complete data sets. However, for these analyses to be trustworthy, it is important that the imputation procedure reflects the true distribution of the unobserved data sufficiently well. This raises the question of how well different imputation methods can reproduce multivariate correlations, associations or even the entire multivariate distribution. The paper gives first answers to this question by means of an extensive comparative simulation study. In particular, we evaluate the multivariate distributional accuracy of six state-of-the-art imputation algorithms with respect to different measures and give practical recommendations.
{"title":"Assessing the multivariate distributional accuracy of common imputation methods","authors":"M. Thurow, Florian Dumpert, Burim Ramosaj, Markus Pauly","doi":"10.3233/sji-230015","DOIUrl":"https://doi.org/10.3233/sji-230015","url":null,"PeriodicalId":509522,"journal":{"name":"Statistical Journal of the IAOS","volume":"13 13","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140241185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
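Why distributional accuracy matters can be seen in a tiny deterministic example. The sketch below uses plain mean imputation, chosen only because it is simple (the abstract does not name the six algorithms studied, and mean imputation is not claimed to be one of them), and compares the Pearson correlation before and after imputation. The data and function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_impute(values):
    """Replace None entries by the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


# Complete data with a perfect linear relation y = 2x.
x = list(range(1, 11))
y_full = [2 * v for v in x]

# Delete the two extreme y values, then impute them with the mean.
y_missing = [None] + y_full[1:-1] + [None]
y_imputed = mean_impute(y_missing)

r_full = pearson(x, y_full)
r_imputed = pearson(x, y_imputed)
```

Here the imputed data set is complete, yet the correlation drops well below its true value, which illustrates the kind of distortion a multivariate accuracy measure is meant to detect.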