Tien-Nam Nguyen, J. Burie, Thi-Lan Le, Anne-Valérie Schweyer
{"title":"Two-step sequence transformer based method for Cham to Latin script transliteration","authors":"Tien-Nam Nguyen, J. Burie, Thi-Lan Le, Anne-Valérie Schweyer","doi":"10.1145/3604951.3605525","DOIUrl":null,"url":null,"abstract":"Fusion information between visual and textual information is an interesting way to better represent the features. In this work, we propose a method for the text line transliteration of Cham manuscripts by combining visual and textual modality. Instead of using a standard approach that directly recognizes the words in the image, we split the problem into two steps. Firstly, we propose a scenario for recognition where similar characters are considered as unique characters, then we use the transformer model which considers both visual and context information to adjust the prediction when it concerns similar characters to be able to distinguish them. Based on this two-step strategy, the proposed method consists of a sequence to sequence model and a multi-modal transformer. Thus, we can take advantage of both the sequence-to-sequence model and the transformer model. Extensive experiments prove that the proposed method outperforms the approaches of the literature on our Cham manuscripts dataset.","PeriodicalId":375632,"journal":{"name":"Proceedings of the 7th International Workshop on Historical Document Imaging and Processing","volume":"77 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 7th International Workshop on Historical Document Imaging and Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3604951.3605525","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Fusion information between visual and textual information is an interesting way to better represent the features. In this work, we propose a method for the text line transliteration of Cham manuscripts by combining visual and textual modality. Instead of using a standard approach that directly recognizes the words in the image, we split the problem into two steps. Firstly, we propose a scenario for recognition where similar characters are considered as unique characters, then we use the transformer model which considers both visual and context information to adjust the prediction when it concerns similar characters to be able to distinguish them. Based on this two-step strategy, the proposed method consists of a sequence to sequence model and a multi-modal transformer. Thus, we can take advantage of both the sequence-to-sequence model and the transformer model. Extensive experiments prove that the proposed method outperforms the approaches of the literature on our Cham manuscripts dataset.