Active learning strategies for handwritten text transcription
Nicolás Serrano, Adrià Giménez, A. Sanchís, Alfons Juan-Císcar
ICMI-MLMI '10, published 2010-11-08. DOI: 10.1145/1891903.1891962 (https://doi.org/10.1145/1891903.1891962)
Citations: 9
Abstract
Active learning strategies are increasingly used in a variety of real-world tasks, but their application to handwritten text transcription in old manuscripts remains nearly unexplored. The basic idea is to transcribe the whole manuscript sequentially, line by line, with a continuously retrained system that interacts with the user to transcribe each new line efficiently. This approach has recently been explored with a conventional strategy in which the user is asked to supervise only the words that are not recognized with high confidence. In this paper, the conventional strategy is improved by also letting the system recompute the most probable hypotheses under the constraints imposed by the user's supervisions. In particular, two strategies are studied that differ in how often the hypothesis for the current line is recomputed: after each user correction (iterative) or once after all user corrections (delayed). Empirical results on two real tasks show that these strategies outperform the conventional approach.
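The three supervision strategies can be contrasted on a single line with a toy sketch. This is not the authors' system: the abstract does not describe their recognizer or confidence measure, so everything here is an assumption for illustration. The line is modeled as a hypothetical word lattice (`LATTICE`) with invented optical log-scores, a hypothetical bigram table (`BIGRAM`, with `UNSEEN` as the score for missing pairs) supplies the language model, confidence is taken to be the softmax of optical scores among a position's candidates, and a simulated user who knows the ground truth (`TRUTH`) supervises doubtful words. The iterative loop picks the leftmost doubtful word each time, and continuous retraining between lines is omitted.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy recognizer output (NOT the paper's model): candidate words per
# position with optical log-scores, plus bigram language-model log-scores.
Lattice = List[Dict[str, float]]
Bigram = Dict[Tuple[str, str], float]
UNSEEN = -5.0  # assumed log-score for word pairs missing from BIGRAM

LATTICE: Lattice = [{"the": -0.4, "tho": -0.2},
                    {"old": -0.2, "odd": -2.2},
                    {"book": -1.5, "look": -0.2}]
BIGRAM: Bigram = {("tho", "odd"): -0.5, ("the", "old"): -3.0,
                  ("odd", "look"): 0.0, ("old", "book"): 0.0}
TRUTH = ["the", "old", "book"]


def decode(lattice: Lattice, bigram: Bigram, fixed: Dict[int, str]) -> List[str]:
    """Viterbi search for the most probable line, constrained by supervised words."""
    best: Dict[str, Tuple[float, List[str]]] = {}
    for i, cands in enumerate(lattice):
        # A supervised position admits only the word confirmed by the user.
        options = {fixed[i]: 0.0} if i in fixed else cands
        step: Dict[str, Tuple[float, List[str]]] = {}
        for word, score in options.items():
            if i == 0:
                step[word] = (score, [word])
                continue
            # Best predecessor for this word under the bigram scores.
            prev, (prev_score, prev_path) = max(
                best.items(),
                key=lambda kv: kv[1][0] + bigram.get((kv[0], word), UNSEEN))
            total = prev_score + bigram.get((prev, word), UNSEEN) + score
            step[word] = (total, prev_path + [word])
        best = step
    return max(best.values(), key=lambda v: v[0])[1]


def confidence(lattice: Lattice, i: int, word: str) -> float:
    """Assumed confidence: softmax posterior of `word` among position i's candidates."""
    scores = lattice[i]
    if word not in scores:
        return 0.0
    z = sum(math.exp(s) for s in scores.values())
    return math.exp(scores[word]) / z


def transcribe(lattice: Lattice, bigram: Bigram, truth: List[str],
               strategy: str, threshold: float = 0.7) -> Tuple[List[str], int]:
    """Simulate one line; returns (final transcription, number of supervised words)."""
    fixed: Dict[int, str] = {}
    hyp = decode(lattice, bigram, fixed)
    if strategy in ("conventional", "delayed"):
        # The user supervises every low-confidence word of the initial hypothesis.
        for i, w in enumerate(hyp):
            if confidence(lattice, i, w) < threshold:
                fixed[i] = truth[i]
        if strategy == "conventional":
            # No recomputation: only the supervised words change.
            return [fixed.get(i, w) for i, w in enumerate(hyp)], len(fixed)
        # Delayed: recompute once, after all supervisions.
        return decode(lattice, bigram, fixed), len(fixed)
    # Iterative: recompute after each supervision, then look for the next doubt.
    while True:
        doubts = [i for i, w in enumerate(hyp)
                  if i not in fixed and confidence(lattice, i, w) < threshold]
        if not doubts:
            return hyp, len(fixed)
        fixed[doubts[0]] = truth[doubts[0]]
        hyp = decode(lattice, bigram, fixed)


if __name__ == "__main__":
    for strategy in ("conventional", "delayed", "iterative"):
        print(strategy, transcribe(LATTICE, BIGRAM, TRUTH, strategy))
    # conventional (['the', 'old', 'look'], 2)
    # delayed (['the', 'old', 'book'], 2)
    # iterative (['the', 'old', 'book'], 2)
```

In this invented example the initial hypothesis is "tho odd look", where "look" is wrong but confidently recognized, so it is never shown to the user. With the same two supervisions, the conventional strategy leaves that error in place, while both recomputing strategies let the corrected context pull the decoder to "book". The numbers only illustrate the mechanism; they say nothing about the paper's empirical results.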