Hong Zhao, Jinhai Huang, Wengai Li, Zhaobin Chang, Weijie Wang
PSA-HWT: handwritten font generation based on pyramid squeeze attention
PeerJ Computer Science, published 2024-08-23. DOI: 10.7717/peerj-cs.2261 (https://doi.org/10.7717/peerj-cs.2261)
Abstract
Generators that combine a convolutional neural network (CNN) with a Transformer as their core modules are the primary models for handwritten font generation and perform effectively. However, they still extract insufficient features for the overall structure of the font, stroke thickness, and stroke curvature, which results in poor detail in the generated fonts. To address these problems, we propose PSA-HWT, a handwritten font generation model based on Pyramid Squeeze Attention. The PSA-HWT model consists of two parts: an encoder and a decoder. In the encoder, a multi-branch structure extracts spatial information at different scales from the input feature map, achieving multi-scale feature extraction. This helps the model capture the semantic information and global structure of the font and learn fine-grained features such as shape, thickness, and curvature. In the decoder, a self-attention mechanism captures dependencies across positions in the input sequence. This helps the model relate the generated strokes or characters to the handwriting style being generated, which helps keep the generated handwritten text coherent overall. Experimental results on the IAM dataset show that PSA-HWT achieves a 16.35% decrease in Fréchet inception distance (FID) and a 13.09% decrease in Geometry Score (GS) compared with current advanced methods. This indicates that PSA-HWT generates higher-quality handwritten fonts, making it more practically valuable.
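The two mechanisms named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it works on plain 1-D lists, uses a zero-padded moving average as a stand-in for each convolution branch, omits the learned excitation layers that a real Pyramid Squeeze Attention module would contain, and uses self-attention without learned projections (Q = K = V). All function names and the kernel sizes are hypothetical choices made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def box_filter(signal, k):
    """Assumed stand-in for a conv branch with odd kernel size k:
    a zero-padded moving average that keeps the input length."""
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(padded[i:i + k]) / k for i in range(len(signal))]


def pyramid_squeeze(signal, kernel_sizes=(3, 5, 7)):
    """Toy multi-branch attention: filter the input at several scales,
    squeeze each branch to one scalar by global average pooling,
    softmax those scalars across branches, and reweight each branch."""
    branches = [box_filter(signal, k) for k in kernel_sizes]
    squeezed = [sum(b) / len(b) for b in branches]
    weights = softmax(squeezed)
    reweighted = [[w * v for v in b] for w, b in zip(weights, branches)]
    return weights, reweighted


def self_attention(x):
    """Scaled dot-product self-attention over a list of vectors,
    with Q = K = V = x (no learned projections, for illustration)."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        attn = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(attn, x)) for j in range(d)])
    return out


if __name__ == "__main__":
    # Hypothetical inputs: a 1-D "stroke profile" and three 2-D tokens.
    weights, _ = pyramid_squeeze([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])
    print("branch weights:", weights)
    print("attended tokens:", self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```

In the sketch, the softmax across branches lets the scale with the strongest pooled response contribute most, which is the general idea behind weighting multi-scale features. Each self-attention output row is a weighted average of all input vectors, which is how every position can draw on every other position in the sequence.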
About the journal:
PeerJ Computer Science is the new open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.