VATr++: Choose Your Words Wisely for Handwritten Text Generation

Bram Vanherle; Vittorio Pippi; Silvia Cascianelli; Nick Michiels; Frank Van Reeth; Rita Cucchiara
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 2, pp. 934-948. Published 2024-10-15.
DOI: 10.1109/TPAMI.2024.3481154
URL: https://ieeexplore.ieee.org/document/10716806/

Abstract

Styled Handwritten Text Generation (HTG) has received significant attention in recent years, propelled by the success of learning-based solutions employing GANs, Transformers, and, preliminarily, Diffusion Models. Despite this surge in interest, there remains a critical yet understudied aspect – the impact of the input, both visual and textual, on the HTG model training and its subsequent influence on performance. This work extends the VATr (Pippi et al. 2023) Styled-HTG approach by addressing the pre-processing and training issues that it faces, which are common to many HTG models. In particular, we propose generally applicable strategies for input preparation and training regularization that allow the model to achieve better performance and generalization capabilities. Moreover, in this work, we go beyond performance optimization and address a significant hurdle in HTG research – the lack of a standardized evaluation protocol. In particular, we propose a standardization of the evaluation protocol for HTG and conduct a comprehensive benchmarking of existing approaches. By doing so, we aim to establish a foundation for fair and meaningful comparisons between HTG strategies, fostering progress in the field.