StyleGAN-Based Advanced Semantic Segment Encoder for Generative AI
Authors: Byungseok Kang, Youngjae Jo
Journal: IT Professional
Publication date: 2024-05-01
DOI: https://doi.org/10.1109/mitp.2023.3338026
StyleGAN is a widely used model for generating high-quality images across many AI domains. Despite its many advantages, it relies on per-pixel noise inputs. These noise inputs are independent of location information, and because random noise is inserted pixel by pixel at intervals, they hinder the learning of natural location information. The problem is especially pronounced when generating human faces. StyleGAN3 was introduced to overcome this, but it did not completely solve the existing problems. When a human face is turned more than 30° from the front, the restoration rate decreases further. In this article, we propose an advanced semantic segment encoder that accurately generates the eyes, nose, and mouth even when the face is rotated more than 60°. We also developed a face-angle analyzer to accurately measure the angle of a person’s face. The proposed approach improved restoration performance by approximately 30% compared with existing encoders when the face is not facing straight ahead.
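As a rough illustration of the per-pixel noise inputs the abstract refers to, the following minimal sketch assumes the noise-injection scheme of the original StyleGAN generator: a single-channel Gaussian noise image is added to every feature map, scaled by a per-channel strength. The function name `inject_noise`, the nested-list tensor layout, and the fixed strengths are illustrative assumptions and are not taken from the article.

```python
import random


def inject_noise(features, strengths, rng):
    """Add per-pixel Gaussian noise to a stack of feature maps.

    features:  list of channels, each a height x width list of floats.
    strengths: one scale per channel (learned in a real model; fixed here
               purely for illustration).
    rng:       a random.Random instance, passed in for reproducibility.
    """
    height, width = len(features[0]), len(features[0][0])
    # One single-channel noise image. Each pixel is sampled independently,
    # so the value carries no information about where the pixel is located.
    noise = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]
    # The same noise image is shared by all channels; only the scale differs.
    return [
        [[channel[y][x] + s * noise[y][x] for x in range(width)]
         for y in range(height)]
        for channel, s in zip(features, strengths)
    ]


rng = random.Random(0)
features = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]  # 3 channels, 4x4
strengths = [0.0, 0.5, 1.0]
out = inject_noise(features, strengths, rng)
```

Because the noise at each pixel is sampled without reference to its coordinates, nothing in this input tells the generator where a feature such as an eye or a mouth should sit, which is the limitation the abstract describes.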
IT Professional
Categories: Computer Science, Information Systems; Computer Science, Software Engineering
CiteScore: 5.00
Self-citation rate: 0.00%
Articles published: 111
Review time: >12 weeks
About the journal:
IT Professional is a technical magazine of the IEEE Computer Society. It publishes peer-reviewed articles, columns and departments written for and by IT practitioners and researchers covering:
practical aspects of emerging and leading-edge digital technologies,
original ideas and guidance for IT applications, and
novel IT solutions for the enterprise.
IT Professional’s goal is to inform the broad spectrum of IT executives, IT project managers, IT researchers, and IT application developers from industry, government, and academia.