Panas Thongtaweechaikij, Piyawat Tangpong, J. Inthiam, W. Tangsuksant
Title: Text Extraction by Optical Character Recognition-Based on the Template Card
Published in: 2024 1st International Conference on Robotics, Engineering, Science, and Technology (RESTCON), pp. 188-192
Publication date: 2024-02-16
DOI: 10.1109/RESTCON60981.2024.10463567 (https://doi.org/10.1109/RESTCON60981.2024.10463567)
Citations: 0
Abstract
This study evaluates the effectiveness of Optical Character Recognition (OCR) in extracting and organizing data from student cards. It assesses diverse OCR techniques to identify the best methods for accurate text extraction across different formats and languages. The research also investigates OCR's impact on information retrieval by analyzing its integration into databases for improved searchability and usability. Our proposed method adds a pre-processing stage to the OCR process, including SIFT, KNN feature matching, and the MSER technique for noise detection and image transformation. For the experiment, all student cards were from King Mongkut’s University of Technology North Bangkok and were captured by smartphone cameras with a resolution greater than 2 megapixels. This research compares traditional Tesseract OCR with our proposed method at Intersection over Union (IoU) thresholds of 50% and 70%. The experimental results show that our proposed method with an IoU of 70% achieves the highest accuracy, 97.36%. According to these results, the proposed method is feasible for our system.
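The IoU thresholds mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which regions are compared or how boxes are represented, so the axis-aligned `(x_min, y_min, x_max, y_max)` box format and the names `iou` and `is_match` are assumptions made for illustration only, not the authors' implementation.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_match(detected: Box, reference: Box, threshold: float = 0.7) -> bool:
    """Hypothetical helper: a detection counts as correct when its IoU
    with the reference region reaches the chosen threshold (0.5 or 0.7)."""
    return iou(detected, reference) >= threshold


if __name__ == "__main__":
    detected = (10, 10, 110, 60)    # hypothetical detected text region
    reference = (20, 10, 120, 60)   # hypothetical template region
    print(round(iou(detected, reference), 4))
    print(is_match(detected, reference, 0.5), is_match(detected, reference, 0.7))
```

In this hypothetical example the two boxes overlap in a 90 x 50 pixel area out of a 5500-pixel union, giving an IoU of about 0.82, which passes both the 50% and the 70% threshold. A stricter threshold accepts only regions that align more closely with the reference.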