A new algorithm for matching Chinese NBS firm-level with customs data
Authors: P. Egger, Susie Xi Rao, S. Papini
Journal: China Economic Journal. Published: 2021-09-02.
DOI: https://doi.org/10.1080/17538963.2021.1963046
ABSTRACT Combining accounting-type firm data and transactions-type customs data has become increasingly important for research in international and industrial economics. The statistical authorities in several countries, such as the United States or France, provide such linked data without details on sources, and researchers have to assume that the matching is correct and that the firm identifiers in the source data are unique and flawless. For some other countries, such as Switzerland or China, firm and customs data contain information that permits such linking ex post, using string matching based on firm names and their meta-information such as addresses. Because of spelling variation and typos, such matching is prone to errors. Obtaining the largest possible number of high-quality matches helps avoid potential biases while keeping crucial details. We report on a new algorithm that considerably improves on the linking efforts available so far for the National Bureau of Statistics (NBS) firm-level data and the customs trade data for China.
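The abstract does not describe the algorithm itself. As a purely illustrative sketch of the general idea of ex post linking by name similarity, the Python below matches one firm name against a list of customs names using the standard library's difflib. It is not the authors' method: the function names, the normalization rule, and the 0.85 threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase the name and keep only Unicode letters and digits.

    Hypothetical normalization rule chosen for this sketch.
    """
    return "".join(ch for ch in name.lower() if ch.isalnum())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(firm_name, customs_names, threshold=0.85):
    """Return (customs_name, score) for the most similar candidate,
    or None if no candidate reaches the threshold.

    The threshold is an assumed value, not one taken from the paper.
    """
    if not customs_names:
        return None
    scored = [(c, similarity(firm_name, c)) for c in customs_names]
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None


if __name__ == "__main__":
    candidates = ["Shanghai Textle Co Ltd", "Beijing Steel Group"]
    print(best_match("Shanghai Textile Co., Ltd.", candidates))
```

A threshold trades off the two goals the abstract names: lowering it yields more matches, while raising it keeps only higher-quality ones. A fuller approach would presumably also compare meta-information such as addresses, which this sketch omits.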