Comparison and Evaluation of Data Composition and Deep Learning Models in Archival Handwritten Digit Classification
Nathan LeBlanc, I. Valova
Journal of Machine Intelligence and Data Science. DOI: 10.11159/jmids.2022.001 (https://doi.org/10.11159/jmids.2022.001)
Archival maritime logs are a well-preserved treasure trove of climate-related data. Analysing these documents is instrumental to understanding historical climate trends and informing future predictions. Transcribing such handwritten logs depends on handwritten letter and digit recognition, which is our aim. The shortcomings of OCR (Optical Character Recognition) show up as frequent confusion of digits and letters in archival handwritten documents. In this extension of conference and thesis work, two deep learning methods are put to the test: convolutional neural networks (CNN) and long short-term memory (LSTM) neural networks. A compound model, in which a convolutional network is followed by an LSTM, is also considered. While all models register high accuracy, the compound model is observed to run faster while achieving accuracy above that of the CNN alone. We also analyse dataset composition and test for the effects of size and balance.
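The abstract does not say how dataset balance was measured or adjusted, so the following is only a minimal sketch of one common approach, under stated assumptions: it counts samples per digit class, reports the ratio between the largest and smallest class, and balances the set by random undersampling. The function names (`class_counts`, `imbalance_ratio`, `undersample`) and the choice of undersampling are illustrative and are not taken from the paper.

```python
import random
from collections import Counter


def class_counts(labels):
    """Return a mapping from each class label to its number of samples."""
    return Counter(labels)


def imbalance_ratio(labels):
    """Ratio of the largest class size to the smallest (1.0 means balanced)."""
    counts = class_counts(labels)
    return max(counts.values()) / min(counts.values())


def undersample(samples, labels, seed=0):
    """Randomly drop samples so every class keeps as many as the rarest class.

    Hypothetical helper: undersampling is an assumed balancing strategy,
    not necessarily the one used by the authors.
    """
    rng = random.Random(seed)

    # Group samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Every class is cut down to the size of the smallest class.
    target = min(len(group) for group in by_class.values())

    balanced = []
    for label in sorted(by_class):
        for sample in rng.sample(by_class[label], target):
            balanced.append((sample, label))

    # Shuffle so that classes are interleaved rather than grouped.
    rng.shuffle(balanced)
    return balanced
```

Calling `undersample(images, digit_labels)` on a skewed set returns `(sample, label)` pairs with equal counts per class. Comparing a model trained on the original set with one trained on the balanced set is one way to probe the effect of balance, at the cost of discarding data from the larger classes.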