Entity Extraction of Customs Clearance Data Based on CRF and Specific Rules
Yonghua Xu, Yi Guo, Zhihong Wang, Wei Sun
2017 International Conference on Network and Information Systems for Computers (ICNISC), April 2017
DOI: 10.1109/ICNISC.2017.00038
To address entity extraction from dirty data in the customs import and export domain, this paper proposes a data cleaning method for commodity named entity extraction based on specific rules and machine learning. First, a KNN (k-Nearest Neighbor) classification algorithm is used to resolve mismatches between attributes and their values in the fields of domain data tuples. Second, commodity named entities and the sub-attributes of fields are extracted using specific rules and a CRF (conditional random field) model, so that the proposed method can extract the correct sub-attributes of each entity. Experimental results show that the proposed method outperforms the compared methods in precision and recall.
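The abstract does not specify the features, similarity measure, or value of k used in the KNN step. The Python sketch below shows one way the first stage (detecting values that sit under the wrong attribute) could work, under assumptions that are ours rather than the paper's: each field value is represented as a bag of character bigrams, neighbors are ranked by cosine similarity, and the predicted attribute is the one with the highest similarity-weighted vote among the k nearest labeled values. The labeled examples, the attribute names (`material`, `weight`, `origin`), and the helper names are all hypothetical. The rule-based and CRF second stage is not shown.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled values: (field value, attribute it belongs to).
LABELED = [
    ("stainless steel", "material"), ("carbon steel", "material"),
    ("cotton", "material"), ("polyester", "material"),
    ("10 kg", "weight"), ("25 kg", "weight"),
    ("3.5 kg", "weight"), ("500 g", "weight"),
    ("Germany", "origin"), ("Japan", "origin"),
    ("China", "origin"), ("Viet Nam", "origin"),
]


def char_bigrams(text):
    """Bag of character bigrams, with ^ and $ marking the value boundaries."""
    padded = f"^{text.lower()}$"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a, b):
    """Cosine similarity between two bigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_attribute(value, labeled, k=3):
    """Return the attribute with the largest similarity-weighted vote among
    the k most similar labeled values, or None if nothing is similar."""
    feats = char_bigrams(value)
    neighbors = sorted(
        ((cosine(feats, char_bigrams(v)), attr) for v, attr in labeled),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for score, attr in neighbors:
        votes[attr] += score  # each neighbor votes with its similarity
    best = max(votes, key=votes.get)
    return best if votes[best] > 0 else None


def find_mismatches(record, labeled, k=3):
    """List (column, value, predicted_attribute) for every value whose
    predicted attribute differs from the column it was found in."""
    mismatches = []
    for column, value in record.items():
        predicted = predict_attribute(value, labeled, k)
        if predicted is not None and predicted != column:
            mismatches.append((column, value, predicted))
    return mismatches


if __name__ == "__main__":
    # A dirty tuple: the material and weight values have been swapped.
    record = {"material": "12 kg", "weight": "alloy steel", "origin": "Japan"}
    print(find_mismatches(record, LABELED))
```

On this toy tuple the sketch flags `12 kg` as belonging under `weight` and `alloy steel` under `material`, and leaves `Japan` in place. Weighting votes by similarity, rather than taking a plain majority, is a design choice here so that one strong match is not outvoted by several weak ones; a value with no similar labeled example is left unclassified.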