Longitudinal Classification and Predictive Modeling for Historical CPS Data Using Random Forests
Authors: Ce Johnson, Hannah E. Schmuckler
Venue: 2022 Systems and Information Engineering Design Symposium (SIEDS)
Published: 2022-04-28
DOI: https://doi.org/10.1109/sieds55548.2022.9799352
The US Census Bureau uses its decennial census codes for industry and occupation in the monthly Current Population Survey. The Census Bureau has regularly revised these three- and four-digit codes to more accurately reflect the reality of work in the United States. These changes make it difficult to study industries and occupations over time. While limited crosswalks exist, there is currently no way to translate an individual's coded occupation or industry to every other scheme for long-term comparison by social scientists. This project aims to impute the most likely code for an individual's occupation and industry into each year's coding scheme by using random forest models to translate industry and occupation across decades. To our knowledge, this is the first tool that can map industry and occupation at scale with a high degree of accuracy into any year's scheme.
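To make the imputation idea in the abstract concrete, here is a minimal sketch in plain Python of a random forest (bootstrap samples plus a random feature subset at each split) that votes on the most likely code in a target scheme. It is an illustration only, not the authors' implementation. The codes (`OLD_100`, `NEW_1000`, and so on), the features (old code, industry group, education), and the premise that some records carry a code in both schemes for training are all assumptions made up for this example; the abstract does not describe the features or the training data.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, n_try, rng):
    """Best `feature == value` split over a random feature subset.
    Keeps looking past n_try features until one valid split is found."""
    features = list(range(len(rows[0])))
    rng.shuffle(features)
    best = None
    for tried, f in enumerate(features, start=1):
        for v in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] == v]
            right = [i for i, r in enumerate(rows) if r[f] != v]
            if not right:
                continue  # every row shares this value, nothing to split
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, v, left, right)
        if tried >= n_try and best is not None:
            break
    return best

def build_tree(rows, labels, n_try, rng):
    """Grow a tree until leaves are pure or cannot be split further."""
    if len(set(labels)) == 1:
        return Counter(labels)  # pure leaf
    split = best_split(rows, labels, n_try, rng)
    if split is None:
        return Counter(labels)  # identical rows with mixed labels
    _, f, v, left, right = split
    def pick(idx):
        return [rows[i] for i in idx], [labels[i] for i in idx]
    return (f, v,
            build_tree(*pick(left), n_try, rng),
            build_tree(*pick(right), n_try, rng))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's majority label."""
    while isinstance(node, tuple):
        f, v, yes, no = node
        node = yes if row[f] == v else no
    return node.most_common(1)[0][0]

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n_try = max(1, int(len(rows[0]) ** 0.5))  # features tried per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_try, rng))
    return forest

def impute(forest, row):
    """Most likely target-scheme code by majority vote across trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

# Hypothetical dual-coded records: (old code, industry group, education).
# OLD_100 was split into two new codes that depend on industry.
examples = [
    (("OLD_100", "manufacturing", "hs"), "NEW_1000"),
    (("OLD_100", "manufacturing", "college"), "NEW_1000"),
    (("OLD_100", "services", "hs"), "NEW_1010"),
    (("OLD_100", "services", "college"), "NEW_1010"),
    (("OLD_200", "manufacturing", "hs"), "NEW_2000"),
    (("OLD_200", "services", "college"), "NEW_2000"),
] * 5
rows = [features for features, _ in examples]
labels = [code for _, code in examples]

forest = fit_forest(rows, labels)
print(impute(forest, ("OLD_100", "services", "hs")))
```

A fixed crosswalk cannot resolve a code like the hypothetical `OLD_100`, which maps to more than one successor; the forest uses the other fields on the record to pick among them. In practice a library implementation such as scikit-learn's `RandomForestClassifier` would be the more usual choice than a hand-rolled forest.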