Target-Directed MixUp for Labeling Tangut Characters

Guangwei Zhang, Yinliang Zhao

2019 International Conference on Document Analysis and Recognition (ICDAR), September 2019

DOI: 10.1109/ICDAR.2019.00041
Citations: 0
Abstract
Deep learning has greatly improved performance on computer vision and image-understanding tasks, but it depends on large training datasets of labeled images. Labeling data, however, is usually expensive and time-consuming, even though unlabeled data are easy to obtain. Given a limited budget or newly emerging categories, it is practical to build the training dataset iteratively from a small set of manually labeled data. The labeled data can be used not only to train the model but also as a source of knowledge for finding examples of classes not yet included in the training dataset. Mixup [1] improves a model's accuracy and generalization by augmenting the training dataset with "virtual examples" generated by mixing randomly selected pairs of training examples. Motivated by Mixup, we propose the Target-Directed Mixup (TDM) method for building the training dataset of a deep learning-based Tangut character recognition system. The virtual examples are generated by mixing two or more similar examples from the training dataset, together with target examples of unseen classes that need to be labeled, which is a kind of generative few-shot learning. This method helps expand the training dataset by finding real examples of unseen Tangut characters, and it provides virtual examples that can represent rare characters that occur only infrequently in historical documents. In our experiments, TDM helps recognize unseen examples with an accuracy of 80% given only 4 to 5 real target examples, greatly reducing the human labor required for data annotation.
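The mixing operation the abstract describes can be sketched as follows. The first function is standard Mixup as in [1]: a convex combination of two examples and their one-hot labels with a Beta-distributed coefficient. The second is a minimal sketch, under our own assumptions, of the "two or more similar examples" combination mentioned for TDM; the function names, the uniform default weights, and the `alpha=0.2` value are illustrative choices, not taken from the paper.

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Standard Mixup: x~ = lam*x1 + (1-lam)*x2 and the same
    convex combination of the one-hot labels, lam ~ Beta(alpha, alpha)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x_virtual = lam * x1 + (1.0 - lam) * x2
    y_virtual = lam * y1 + (1.0 - lam) * y2
    return x_virtual, y_virtual

def mix_many(examples, weights=None):
    """Hypothetical multi-example mix (our reading of TDM's
    "two or more similar examples"): a convex combination of
    n examples; weights default to uniform and must sum to 1."""
    examples = np.asarray(examples, dtype=float)
    if weights is None:
        weights = np.full(len(examples), 1.0 / len(examples))
    weights = np.asarray(weights, dtype=float)
    # Weighted sum over the first (example) axis.
    return np.tensordot(weights, examples, axes=1)
```

Because the combination is convex, the virtual image stays inside the span of its source images and the virtual label remains a valid probability distribution over the mixed classes.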