{"title":"Photogenic Guided Image-to-Image Translation With Single Encoder","authors":"Rina Oh;T. Gonsalves","doi":"10.1109/OJCS.2024.3462477","DOIUrl":null,"url":null,"abstract":"Image-to-image translation involves combining content and style from different images to generate new images. This technology is particularly valuable for exploring artistic aspects, such as how artists from different eras would depict scenes. Deep learning models are ideal for achieving these artistic styles. This study introduces an unpaired image-to-image translation architecture that extracts style features directly from input style images, without requiring a special encoder. Instead, the model uses a single encoder for the content image. To process the spatial features of the content image and the artistic features of the style image, a new normalization function called Direct Adaptive Instance Normalization with Pooling is developed. This function extracts style images more effectively, reducing the computational costs compared to existing guided image-to-image translation models. Additionally, we employed a Vision Transformer (ViT) in the Discriminator to analyze entire spatial features. The new architecture, named Single-Stream Image-to-Image Translation (SSIT), was tested on various tasks, including seasonal translation, weather-based environment transformation, and photo-to-art conversion. The proposed model successfully reflected the design information of the style images, particularly in translating photos to artworks, where it faithfully reproduced color characteristics. Moreover, the model consistently outperformed state-of-the-art translation models in each experiment, as confirmed by Fréchet Inception Distance (FID) and Kernel Inception Distance (KID) scores.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"5 ","pages":"624-635"},"PeriodicalIF":0.0000,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10694773","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Open Journal of the Computer Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10694773/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0
Abstract
Image-to-image translation combines content and style from different images to generate new images. This technology is particularly valuable for exploring artistic questions, such as how artists from different eras would have depicted a given scene. Deep learning models are well suited to reproducing such artistic styles. This study introduces an unpaired image-to-image translation architecture that extracts style features directly from the input style image, without requiring a dedicated style encoder; the model uses a single encoder, applied to the content image. To process the spatial features of the content image together with the artistic features of the style image, a new normalization function called Direct Adaptive Instance Normalization with Pooling is developed. This function extracts style features more effectively while reducing computational cost compared to existing guided image-to-image translation models. Additionally, a Vision Transformer (ViT) is employed in the discriminator to analyze global spatial features. The new architecture, named Single-Stream Image-to-Image Translation (SSIT), was tested on a range of tasks, including seasonal translation, weather-based environment transformation, and photo-to-art conversion. The proposed model successfully reflected the design information of the style images, particularly when translating photos to artworks, where it faithfully reproduced color characteristics. Moreover, it consistently outperformed state-of-the-art translation models in every experiment, as measured by Fréchet Inception Distance (FID) and Kernel Inception Distance (KID) scores.
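The abstract does not give the exact formulation of Direct Adaptive Instance Normalization with Pooling, so the PyTorch sketch below only illustrates the general idea it describes: style statistics are pooled directly from the raw style image, with no second encoder, and then used to re-normalize the content encoder's features in the standard AdaIN fashion (Huang & Belongie). The `DAdaINPool` class name, the 1x1 projection layer, and the use of global average pooling are assumptions for illustration, not the paper's verified method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DAdaINPool(nn.Module):
    """Sketch of AdaIN-style normalization where style statistics come
    straight from the raw style image (no style encoder), in the spirit
    of the paper's Direct Adaptive Instance Normalization with Pooling.
    The actual SSIT formulation may differ."""

    def __init__(self, content_channels: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Hypothetical lightweight mapping from pooled RGB values to
        # per-channel scale and shift; stands in for whatever projection
        # the paper uses in place of a dedicated style encoder.
        self.to_params = nn.Conv2d(3, 2 * content_channels, kernel_size=1)

    def forward(self, content_feat: torch.Tensor, style_img: torch.Tensor):
        # content_feat: (N, C, H, W) from the single content encoder
        # style_img:    (N, 3, H', W') raw style image
        c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
        c_std = content_feat.std(dim=(2, 3), keepdim=True) + self.eps
        normalized = (content_feat - c_mean) / c_std  # instance-normalized

        # Pool the style image to per-channel statistics, then project
        # to a scale and shift for each content channel.
        pooled = F.adaptive_avg_pool2d(style_img, 1)          # (N, 3, 1, 1)
        s_std, s_mean = self.to_params(pooled).chunk(2, dim=1)  # (N, C, 1, 1)
        return s_std * normalized + s_mean  # restyled content features
```

Under these assumptions the cost saving claimed in the abstract is plausible: pooling the raw style image replaces an entire encoder forward pass, so only the single content encoder runs per translation.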