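Comparative Study of Font Recognition Using Convolutional Neural Networks and Two Feature Extraction Methods with Support Vector Machine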

Aveen Jalal Mohammed, Jwan Abdulkhaliq Mohammed, Amera Ismail Melhum
Journal: Iraqi Journal for Computers and Informatics
Publication date: 2023-09-28
Publication type: Journal Article
DOI: 10.25195/ijci.v49i2.434 (https://doi.org/10.25195/ijci.v49i2.434)
Citations: 0

Abstract

Font recognition is one of the essential problems in document recognition and analysis, and it is frequently a complex and time-consuming process. Many optical character recognition (OCR) techniques have been proposed, and some have been commercialized, but few of them consider font recognition. A limitation of OCR is that it stores searchable copies of documents, but those copies lose the original appearance. To address this problem, this paper presents a system that recognizes three and six English fonts from character images using a convolutional neural network (CNN), and then compares the results of the proposed system with those of two earlier studies. The first study used NCM features with a support vector machine (SVM) as the classifier, and the second used DP features with an SVM as the classifier. The data were taken from the Al-Khaffaf dataset [21]. Two datasets were used: the first contains about 27,620 samples for three-font classification, and the second contains about 72,983 samples for six-font classification. Both consist of English character images in 8-bit grayscale format. The results show that the CNN achieved a higher recognition rate than the two earlier studies, reaching 99.75% for three-font recognition and 98.329% for six-font recognition. The CNN also required the least time to build a model: about 6 minutes for three fonts and 23 to 24 minutes for six fonts. Based on these results, we conclude that, among the approaches compared, the CNN is the best and most accurate model for recognizing fonts.
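The abstract does not describe the network's layers, so the following is only a minimal sketch of the basic operation a CNN applies to an 8-bit grayscale character image: normalization, one convolution, ReLU, and 2x2 max pooling. It is written in plain Python for readability. The 6x6 glyph, the hand-picked edge kernel, and all function names are hypothetical illustrations, not the authors' architecture or data; a trained CNN learns its kernels rather than using a fixed one.

```python
from typing import List

Image = List[List[float]]


def normalize(pixels: List[List[int]]) -> Image:
    """Scale 8-bit gray levels (0..255) to floats in [0, 1]."""
    return [[p / 255.0 for p in row] for row in pixels]


def conv2d_valid(image: Image, kernel: Image) -> Image:
    """Slide the kernel over the image with stride 1 and no padding.

    As in most deep learning libraries, this is cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap: Image) -> Image:
    """Zero out negative responses."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap: Image) -> Image:
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]


# Hypothetical 6x6 glyph fragment: a dark vertical stroke on a white background.
glyph = [[255, 255, 0, 0, 255, 255] for _ in range(6)]

# Hypothetical hand-picked kernel that responds to bright-to-dark transitions
# from left to right (an assumption for illustration only).
edge_kernel = [[1.0, 0.0, -1.0] for _ in range(3)]

feature_map = conv2d_valid(normalize(glyph), edge_kernel)  # 4x4 response map
features = max_pool_2x2(relu(feature_map))                 # 2x2 pooled features
print(features)
```

In a full classifier, many such learned feature maps would be stacked, flattened, and passed to a final layer with one output per font (three or six in this study).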