Photogenic Guided Image-to-Image Translation With Single Encoder

IEEE Open Journal of the Computer Society, Pub Date: 2024-09-25, DOI: 10.1109/OJCS.2024.3462477

Rina Oh;T. Gonsalves

IEEE Open Journal of the Computer Society, vol. 5, pp. 624-635. Publication type: Journal Article. Document page: https://ieeexplore.ieee.org/document/10694773/ (PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10694773)

Citations: 0



Photogenic Guided Image-to-Image Translation With Single Encoder

Image-to-image translation involves combining content and style from different images to generate new images. This technology is particularly valuable for exploring artistic aspects, such as how artists from different eras would depict scenes. Deep learning models are ideal for achieving these artistic styles. This study introduces an unpaired image-to-image translation architecture that extracts style features directly from input style images, without requiring a special encoder. Instead, the model uses a single encoder for the content image. To process the spatial features of the content image and the artistic features of the style image, a new normalization function called Direct Adaptive Instance Normalization with Pooling is developed. This function extracts style images more effectively, reducing the computational costs compared to existing guided image-to-image translation models. Additionally, we employed a Vision Transformer (ViT) in the Discriminator to analyze entire spatial features. The new architecture, named Single-Stream Image-to-Image Translation (SSIT), was tested on various tasks, including seasonal translation, weather-based environment transformation, and photo-to-art conversion. The proposed model successfully reflected the design information of the style images, particularly in translating photos to artworks, where it faithfully reproduced color characteristics. Moreover, the model consistently outperformed state-of-the-art translation models in each experiment, as confirmed by Fréchet Inception Distance (FID) and Kernel Inception Distance (KID) scores.
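The abstract names a new normalization function, Direct Adaptive Instance Normalization with Pooling, but does not define it. As background only, the following is a minimal sketch of the standard Adaptive Instance Normalization (AdaIN) operation that such functions build on: each content channel is normalized to zero mean and unit standard deviation, then rescaled to the mean and standard deviation of the matching style channel. This is not the authors' method; the function names `channel_stats` and `adain`, the flat-list feature layout, and the `eps` value are assumptions made for illustration.

```python
import math


def channel_stats(channel, eps=1e-5):
    """Return (mean, std) of one flattened feature channel.

    eps is added to the variance for numerical stability (assumed value).
    """
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    return mean, math.sqrt(var + eps)


def adain(content, style, eps=1e-5):
    """Standard AdaIN, NOT the paper's pooled variant.

    content, style: lists of channels, each channel a flat list of floats.
    Each content channel is re-scaled to the statistics of the style
    channel at the same index.
    """
    out = []
    for c_ch, s_ch in zip(content, style):
        c_mean, c_std = channel_stats(c_ch, eps)
        s_mean, s_std = channel_stats(s_ch, eps)
        out.append([s_std * (v - c_mean) / c_std + s_mean for v in c_ch])
    return out


if __name__ == "__main__":
    # Toy single-channel features (hypothetical values).
    content_features = [[1.0, 2.0, 3.0, 4.0]]
    style_features = [[10.0, 20.0, 30.0, 40.0]]
    print(adain(content_features, style_features))
```

In this sketch the output keeps the spatial arrangement of the content channel while taking on the style channel's mean and spread, which is the general mechanism by which AdaIN-style layers transfer color and texture statistics.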


Source journal

IEEE Open Journal of the Computer Society

CiteScore

12.60

Self-citation rate

0.00%

Number of articles published