Active Learning and Transfer Learning for Document Segmentation
Authors: D. M. Kiranov, M. A. Ryndin, I. S. Kozlov
Journal: Programming and Computer Software (Journal Article, published 2023-12-07)
DOI: https://doi.org/10.1134/s0361768823070046
Impact factor: 0.7; JCR: Q4 (Computer Science, Software Engineering)
Citation count: 1
Abstract
In this paper, we investigate how effective classical active learning approaches are for document segmentation, with the aim of reducing the size of the training sample. A modified approach to selecting document images for labeling and subsequent model training is presented. The results of active learning are compared with those of transfer learning on fully labeled data. The paper also investigates how the problem domain of the training set on which a model is initialized for transfer learning affects the subsequent fine-tuning of the model.
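As an illustration of what a classical active learning selection step can look like, here is a minimal sketch of pool-based uncertainty sampling. It is not the modified approach described in the paper, which the abstract does not specify. The function names, the use of mean per-region entropy as the image score, and the toy probabilities are all assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def image_uncertainty(region_probs: Sequence[Sequence[float]]) -> float:
    """Mean entropy over the predicted regions of one document image.

    Assumption: the segmentation model outputs one class-probability
    vector per detected region; an image with no regions scores 0.0.
    """
    if not region_probs:
        return 0.0
    return sum(entropy(p) for p in region_probs) / len(region_probs)


def select_for_labeling(
    pool: Dict[str, Sequence[Sequence[float]]], budget: int
) -> List[str]:
    """Return the ids of the `budget` most uncertain unlabeled images."""
    ranked = sorted(pool, key=lambda k: image_uncertainty(pool[k]), reverse=True)
    return ranked[:budget]


# Hypothetical model outputs for three unlabeled pages (3 classes each).
pool = {
    "page_a": [[0.98, 0.01, 0.01], [0.90, 0.05, 0.05]],  # confident
    "page_b": [[0.40, 0.35, 0.25], [0.50, 0.30, 0.20]],  # very uncertain
    "page_c": [[0.70, 0.20, 0.10]],                      # in between
}
chosen = select_for_labeling(pool, budget=2)
print(chosen)
```

In a full loop, the selected images would be labeled, added to the training set, and the model retrained before the next selection round.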
About the journal:
Programming and Computer Software is a peer-reviewed journal devoted to problems in all areas of computer science: operating systems, compiler technology, software engineering, artificial intelligence, etc.