juktoMala: A Handwritten Bengali Consonant Conjuncts Dataset for Optical Character Recognition Using BiT-based M-ResNet-101x3 Architecture
M. Hasan, Md. Ali Hossain, Azmain Yakin Srizon, Abu Sayeed
2023 International Conference on Electrical, Computer and Communication Engineering (ECCE), published 2023-02-23
DOI: 10.1109/ECCE57851.2023.10101581 (https://doi.org/10.1109/ECCE57851.2023.10101581)
Abstract
Bengali, the seventh most spoken language in the world by number of speakers, lacks a well-established Optical Character Recognition (OCR) system for handwritten text. A major reason for this gap is the absence of a complete database of conjunct characters: no dataset available today covers all the conjuncts in use by writers around the globe. In this research, we prepared a complete dataset of 292 consonant conjunct characters, which is, to our knowledge, the largest consonant conjunct dataset in the literature by number of classes. We applied a Big Transfer (BiT)-based M-ResNet-101x3 Deep Convolutional Neural Network (DCNN), which achieves 91.32% accuracy and outperforms the baseline EfficientNetB7 approach, which obtained 81.05% accuracy.
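To put the two reported accuracies side by side, the short sketch below computes the absolute gain in percentage points and the corresponding relative reduction in error rate. It assumes both figures were measured on the same test split, which the abstract does not state explicitly. The helper name `accuracy_gain` is hypothetical, and the derived numbers are simple arithmetic on the reported values, not results from the paper.

```python
def accuracy_gain(model_acc: float, baseline_acc: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative error reduction in %).

    Both inputs are accuracies in percent. Assumes they were measured on the
    same test set (an assumption, not confirmed by the abstract).
    """
    abs_gain = model_acc - baseline_acc
    baseline_err = 100.0 - baseline_acc  # baseline error rate in percent
    rel_err_reduction = 100.0 * abs_gain / baseline_err
    return abs_gain, rel_err_reduction


# Reported values: BiT M-ResNet-101x3 vs. the EfficientNetB7 baseline.
gain, err_cut = accuracy_gain(91.32, 81.05)
print(f"{gain:.2f} points, {err_cut:.1f}% fewer errors")
# prints "10.27 points, 54.2% fewer errors"
```

Read this way, the reported improvement of about 10 percentage points corresponds to roughly halving the baseline's error rate across the 292 classes.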