Invoice Recognition Method Based on Separation of Template and Content
R. Shi, Sanxin Jiang
2022 7th International Conference on Communication, Image and Signal Processing (CCISP), published 2022-11-01
DOI: 10.1109/CCISP55629.2022.9974564 (https://doi.org/10.1109/CCISP55629.2022.9974564)
Extracting and saving structured information from invoices is a necessary task. Existing methods run detection and recognition over each invoice in full, repeating work on content that is identical across invoices. Because invoices of the same type share a large amount of repeated content and a fixed table structure, the proposed method separates the template from the filled-in content of an invoice by pixel segmentation. A perceptual hash algorithm then matches the template of the invoice under test against the invoices in a template database. After a successful match, an improved template alignment module aligns the newly filled-in content with the template invoice, and the new invoice is exported to Excel for storage. Experimental results show that, compared with the original method, the text detection time, recognition time and prediction time are reduced by 68%, 91.13% and 89.94% respectively, and the overall prediction time is reduced by 27.26 seconds.
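The abstract does not say which perceptual hash variant is used or how a match is accepted. As a minimal sketch of the template-matching step, the Python below assumes an average hash (one simple member of the perceptual hash family) and a Hamming-distance threshold. The names `average_hash`, `hamming`, `match_template` and the default `max_distance` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional

Image = List[List[int]]  # grayscale pixels in 0..255, row-major


def average_hash(img: Image, size: int = 8) -> int:
    """Average hash (an assumed stand-in for the paper's perceptual hash).

    The image is reduced to size x size block means, and each block
    contributes one bit: 1 if its mean is above the overall mean, else 0.
    Assumes the image is at least size x size pixels.
    """
    h, w = len(img), len(img[0])
    means = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            means.append(sum(block) / len(block))
    avg = sum(means) / len(means)
    bits = 0
    for m in means:
        bits = (bits << 1) | (1 if m > avg else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def match_template(
    img: Image, templates: Dict[str, int], max_distance: int = 10
) -> Optional[str]:
    """Return the name of the closest stored template hash, or None.

    `templates` maps a template name to its precomputed hash.
    `max_distance` is an illustrative threshold, not a value from the paper.
    """
    query = average_hash(img)
    best = min(
        templates.items(), key=lambda kv: hamming(query, kv[1]), default=None
    )
    if best is None or hamming(query, best[1]) > max_distance:
        return None
    return best[0]
```

In this sketch the template database is just a dictionary of precomputed hashes, so matching a new invoice costs one hash computation plus one XOR and bit count per stored template. An invoice whose template matches nothing within the threshold returns `None` and would presumably fall back to full detection and recognition.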