Pub Date: 2020-07-01 | DOI: 10.1109/ICIVC50857.2020.9177430
Xuchou Xu, Zhou Ruan, Lei Yang
Facial expressions are one of the most powerful, natural, and immediate means for human beings to convey their emotions and intentions. In this paper, we present a novel method for fully automatic facial expression recognition (FER). Facial landmarks are detected to characterize facial expressions, and a graph convolutional neural network is proposed for feature extraction and expression classification. Experiments were performed on three facial expression databases. The results show that the proposed FER method achieves recognition accuracy of up to 95.85%.
{"title":"Facial Expression Recognition Based on Graph Neural Network","authors":"Xuchou Xu, Zhou Ruan, Lei Yang","doi":"10.1109/ICIVC50857.2020.9177430","DOIUrl":"https://doi.org/10.1109/ICIVC50857.2020.9177430","url":null,"abstract":"Facial expressions are one of the most powerful, natural and immediate means for human being to present their emotions and intensions. In this paper, we present a novel method for fully automatic facial expression recognition. The facial landmarks are detected for characterizing facial expressions. A graph convolutional neural network is proposed for feature extraction and facial expression recognition classification. The experiments were performed on the three facial expression databases. The result shows that the proposed FER method can achieve good recognition accuracy up to 95.85% using the proposed method.","PeriodicalId":6806,"journal":{"name":"2020 IEEE 5th International Conference on Image, Vision and Computing (ICIVC)","volume":"80 1","pages":"211-214"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88967613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-07-01 | DOI: 10.1109/ICIVC50857.2020.9177479
Hongtao Deng, D. Jiang, Kai Wang, Q. Fei
A full-field, three-dimensional, non-contact deformation measurement method for high-temperature environments based on 3D digital image correlation (3D-DIC) is introduced. To reduce the impact of high-temperature radiation on image quality, a band-pass filter is placed in front of each camera lens. Two cameras simultaneously photograph the object before and after deformation, and 3D-DIC is used to measure the three-dimensional deformation field of the object surface. A high-temperature deformation measurement test shows that 3D-DIC can accurately and conveniently measure the deformation field of an object in a high-temperature environment.
{"title":"High Temperature Deformation Field Measurement Using 3D Digital Image Correlation Method","authors":"Hongtao Deng, D. Jiang, Kai Wang, Q. Fei","doi":"10.1109/ICIVC50857.2020.9177479","DOIUrl":"https://doi.org/10.1109/ICIVC50857.2020.9177479","url":null,"abstract":"A full-field, three-dimensional and non-contact deformation field measurement method under high temperature environment based on 3D digital image correlation (3D-DIC) is introduced. In order to reduce the impact of high temperature radiation on the image quality, a band-pass filter is placed in front of the camera lens. The two cameras simultaneously take pictures of the object before and after deformation, and use 3D-DIC to measure the three-dimensional deformation field of the object surface. The high temperature deformation field measurement test shows that 3D-DIC can accurately and conveniently measure the deformation field of an object under high temperature environment.","PeriodicalId":6806,"journal":{"name":"2020 IEEE 5th International Conference on Image, Vision and Computing (ICIVC)","volume":"96 1","pages":"188-192"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88969096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-07-01 | DOI: 10.1109/ICIVC50857.2020.9177478
Tianpeng Xia, S. Liao
In this research, we study color image filtering in the Bessel-Fourier moments domain. Bessel-Fourier moments of two test color images are computed independently for the three color channels (RGB); lowpass and highpass filters are then applied to the data in the Bessel-Fourier moments domain. For comparison, the same filters are also applied in the Fourier frequency domain. The experimental results suggest that lower-order Bessel-Fourier moments mainly capture the smoothly varying components of images, while higher-order moments relate more to details such as sharp transitions in intensity. We also find that Gaussian filters reduce the ringing effect in the Bessel-Fourier moments domain just as they do in the Fourier frequency domain.
{"title":"Color Image Filtering in Bessel-Fourier Moments Domain","authors":"Tianpeng Xia, S. Liao","doi":"10.1109/ICIVC50857.2020.9177478","DOIUrl":"https://doi.org/10.1109/ICIVC50857.2020.9177478","url":null,"abstract":"In this research, we have conducted a study on color image filtering in Bessel-Fourier moments domain. Bessel-Fourier moments of the two testing color images are computed independently from the three color channels (RGB), then lowpass and highpass filters are applied to the data in Bessel-Fourier moments domain for our investigation. For comparison, filters are applied in Fourier Frequency domain as well. The experimental results suggest that Bessel-Fourier moments of the lower orders contain mainly information of smooth varying components of images, while those of the higher orders are more related to details such as sharp transitions in intensity. It is also found that the Gaussian filters would reduce the ringing effect in Bessel-Fourier moments domain as they do in the Fourier Frequency domain.","PeriodicalId":6806,"journal":{"name":"2020 IEEE 5th International Conference on Image, Vision and Computing (ICIVC)","volume":"112 1","pages":"75-81"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88776646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-07-01 | DOI: 10.1109/ICIVC50857.2020.9177435
Jie Zhou, Mengying Xu, Rui Yang
One of the most interesting issues in wireless multimedia sensor networks (WMSNs) is maximizing network lifetime. Because sensor nodes are energy-constrained, it is important to develop novel duty-cycle design algorithms, which directly affect network lifetime in WMSNs. The contribution of this paper is a clone chaotic niche evolutionary algorithm (CCNEA) for the duty-cycle design problem in WMSNs. Novel clone and chaotic operators are designed to generate candidate solutions, merging the merits of clonal selection, chaotic generation, and a niche operator. CCNEA is a swarm-style algorithm with strong global search ability, and its chaotic generation approach helps it avoid local optima. Simulations verify the robustness and efficacy of CCNEA compared with methods based on particle swarm optimization (PSO) and the quantum genetic algorithm (QGA) under WMSN conditions. The simulation experiments show that CCNEA outperforms PSO and QGA under different conditions, especially for WMSNs with a large number of sensors.
{"title":"Clone Chaotic Niche Evolutionary Algorithm for Duty Cycle Control Optimization in Wireless Multimedia Sensor Networks","authors":"Jie Zhou, Mengying Xu, Rui Yang","doi":"10.1109/ICIVC50857.2020.9177435","DOIUrl":"https://doi.org/10.1109/ICIVC50857.2020.9177435","url":null,"abstract":"One of the most interesting issue regarding to wireless multimedia sensor networks (WMSNs) is to maximizing the network lifetime. Because sensor nodes are constrained in energy, it is very important and necessary to exploit novel duty cycle design algorithms. Such a problem is important in improving network lifetime in WMSNs. The new contribution of our paper is that we propose a clone chaotic niche evolutionary algorithm (CCNEA) for duty cycle design problem in WMSNs. Novel clone operator and chaotic operator have been designed to develop solutions randomly. The strategy merges the merits of clone selection, chaotic generation, and niche operator. CCNEA is a style of swarm algorithm, which has strong global exploit ability. CCNEA utilizes chaotic generation approach which targets to avoid local optima. Then, simulations are performed to verify the robust and efficacy performance of CCNEA compared to methods according to particle swarm optimization (PSO) and quantum genetic algorithm (QGA) under an WMSNs conditions. Simulation experiments denote that the presented CCNEA outperforms PSO and QGA under different conditions, especially for WMSNs that has large number of sensors.","PeriodicalId":6806,"journal":{"name":"2020 IEEE 5th International Conference on Image, Vision and Computing (ICIVC)","volume":"15 1","pages":"278-282"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90767734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-07-01 | DOI: 10.1109/ICIVC50857.2020.9177439
Changyuan Zhong, Zelin Hu, Miao Li, Hualong Li, Xuanjiang Yang, Fei Liu
Modern real-time segmentation methods employ a two-branch framework to achieve a good speed-accuracy trade-off. However, we observe that low-level features from the shallow layers undergo less processing, producing a potential semantic gap between different levels of features. Meanwhile, a rigid fusion is less effective because it ignores the characteristics of the two-branch framework. In this paper, we propose two novel modules, the Unified Interplay Module and the Separate Pyramid Pooling Module, to address these two issues respectively. Based on these modules, we present the Dual Stream Segmentation Network (DSSNet), a two-branch framework for real-time semantic segmentation. Compared with BiSeNet, our DSSNet with a ResNet18 backbone achieves better performance, 76.45% mIoU on the Cityscapes test set, at a similar computational cost. Furthermore, our DSSNet with a ResNet34 backbone outperforms previous real-time models, achieving 78.5% mIoU on the Cityscapes test set at 39 FPS on a GTX 1080 Ti.
{"title":"Dual Stream Segmentation Network for Real-Time Semantic Segmentation","authors":"Changyuan Zhong, Zelin Hu, Miao Li, Hualong Li, Xuanjiang Yang, Fei Liu","doi":"10.1109/ICIVC50857.2020.9177439","DOIUrl":"https://doi.org/10.1109/ICIVC50857.2020.9177439","url":null,"abstract":"Modern real-time segmentation methods employ two-branch framework to achieve good speed and accuracy trade-off. However, we observe that low-level features coming from the shallow layers go through less processing, producing a potential semantic gap between different levels of features. Meanwhile, a rigid fusion is less effective due to the absence of consideration for two-branch framework characteristics. In this paper, we propose two novel modules: Unified Interplay Module and Separate Pyramid Pooling Module to address those two issues respectively. Based on our proposed modules, we present a novel Dual Stream Segmentation Network (DSSNet), a two-branch framework for real-time semantic segmentation. Compared with BiSeNet, our DSSNet based on ResNet18 achieves better performance 76.45% mIoU on the Cityscapes test dataset while sharing similar computation costs with BiSeNet. Furthermore, our DSSNet with ResNet34 backbone outperforms previous real-time models, achieving 78.5% mIoU on the Cityscapes test dataset with speed of 39 FPS on GTX1080Ti.","PeriodicalId":6806,"journal":{"name":"2020 IEEE 5th International Conference on Image, Vision and Computing (ICIVC)","volume":"46 1","pages":"144-149"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91101276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-07-01 | DOI: 10.1109/ICIVC50857.2020.9177433
Ni Jiang, Fei-hong Yu
Cell counting is an important task in medical image analysis because of the clinically meaningful information it provides. In this paper, we propose a cell counting network that predicts the number of cells in an image along with their spatial distribution. The network learns to predict a density map that relates directly to the number of cells. A foreground mask is designed to filter the low-level feature maps so that only favorable information is fed to the decoder, helping it recover spatial information. The foreground mask is a probability map indicating which pixels are more likely to belong to cells. Experiments on three public datasets show that the proposed model achieves promising performance. In particular, an ablation study on the Adipocyte cell dataset demonstrates the necessity of the foreground mask.
{"title":"A Foreground Mask Network for Cell Counting","authors":"Ni Jiang, Fei-hong Yu","doi":"10.1109/ICIVC50857.2020.9177433","DOIUrl":"https://doi.org/10.1109/ICIVC50857.2020.9177433","url":null,"abstract":"Cell counting is important in medical image analysis for its meaningful information. In this paper, we propose a cell counting network to predict the number of cells in an image with the distribution of cells. The proposed network learns to predict the density map which has a direct relationship with the number of cells. A foreground mask is designed to filter the low-level feature maps and the favorable information is fed to the decoder to recover the spatial information better. The foreground mask is a probability map indicating the pixels are more likely to belong to cells. Experiments on three public datasets show that the proposed model can achieve promising performances. Especially the ablation study on the Adipocyte Cells demonstrates the necessity of the foreground mask.","PeriodicalId":6806,"journal":{"name":"2020 IEEE 5th International Conference on Image, Vision and Computing (ICIVC)","volume":"4 1","pages":"128-132"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80141776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-07-01 | DOI: 10.1109/ICIVC50857.2020.9177463
Yujun Liu, Shi Bao, Chuanying Yang, Shaoying Ma
Dichromacy is an inherited color vision deficiency characterized by an inability to distinguish certain colors. In this paper, a new lightness modification method based on the Craik-O'Brien (C-O) effect is proposed to improve the color recognition ability of dichromats. The main idea is to modify the lightness values along the contours of regions that dichromats easily confuse, by establishing an objective function that considers color distance and finding the optimal lightness modification with the steepest descent method. The modified images produce the C-O effect, inducing a perceived lightness difference in observers and thereby improving color recognition. The proposed method retains the detail and overall naturalness of the original color images, making it easier for dichromats to obtain information and perceive color variations and overall characteristics. The effectiveness and feasibility of the method are shown experimentally through comparison and analysis of the test images, the dichromacy-simulation images, and the resulting images.
{"title":"A Craik-O'Brien Effect Based Lightness Modification Method Considering Color Distance for Dichromats","authors":"Yujun Liu, Shi Bao, Chuanying Yang, Shaoying Ma","doi":"10.1109/ICIVC50857.2020.9177463","DOIUrl":"https://doi.org/10.1109/ICIVC50857.2020.9177463","url":null,"abstract":"Dichromacy, also called as visual impairment, is an inherited disease of defective or abnormal color vision characterized by an inability to recognize certain colors. In this paper, a new lightness modification method based on Craik-O'Brien (C-O) effect was proposed in order to improve the color recognition ability of dichromats. The main idea is to modify the lightness values of the contour parts of the regions which are easy to be confused for dichromats by establishing the objective function of considering color distance, and find the optimal lightness modification value by using the steepest descent method. The modified images will generate C-O effect, which will make the observers produce a visual lightness difference, thus improving the color recognition of the images. The proposed method can retain the information details and overall naturalness of the original color images, making it easier to obtain information and perceive color variations and overall characteristics in the original color images for dichromats. The effectiveness and feasibility of the proposed method is shown in the experimental part by means of the comparison and analysis among the test image, the dichromacy simulation images and the result images obtained by this method.","PeriodicalId":6806,"journal":{"name":"2020 IEEE 5th International Conference on Image, Vision and Computing (ICIVC)","volume":"1 1","pages":"86-91"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84663772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-07-01 | DOI: 10.1109/ICIVC50857.2020.9177492
Hai-Wu Lee, Wen-Tan Gu, Yuan-yuan Wang
In recent years, face recognition technology has developed rapidly and its range of applications has grown; it is one of the most important application fields of computer vision. However, many technical factors still restrict its application and adoption: shadows, occlusions, uneven lighting, dim light, highlights, and similar conditions can make the recognition rate drop sharply. Face recognition therefore remains of high research and application value. We use the Local Binary Patterns (LBP) algorithm with histogram equalization to enhance image quality and improve the recognition rate in different scenarios, and we apply the resulting face recognition pipeline to attendance tracking.
{"title":"Design of Face Recognition Attendance","authors":"Hai-Wu Lee, Wen-Tan Gu, Yuan-yuan Wang","doi":"10.1109/ICIVC50857.2020.9177492","DOIUrl":"https://doi.org/10.1109/ICIVC50857.2020.9177492","url":null,"abstract":"In recent years, face recognition technology has developed rapidly, and its application range has become more and more extensive. It is one of the most important application fields in computer vision technology. However, there are still many technical factors that restrict the application and promotion of face recognition technology. For example: shadows, occlusions, light and dark areas, dark light, highlights and other factors will make the face recognition rate drop sharply. Therefore, face recognition has extremely high research and application value. We use the Local Binary Patterns (LBP) algorithms with histogram equalization to obtain high-resolution images and improve the recognition rate in different scenarios, and try to apply face recognition to attendance.","PeriodicalId":6806,"journal":{"name":"2020 IEEE 5th International Conference on Image, Vision and Computing (ICIVC)","volume":"84 1","pages":"222-226"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83810027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-07-01 | DOI: 10.1109/ICIVC50857.2020.9177464
S. Teng
Color constancy (CC) is an essential part of machine vision. Previously reported CC algorithms have lacked consistent, clear-cut evaluation diagrams. This paper instead presents a gain-pixel visualization CC algorithm that uses optimization-based numerical analysis and 2D/3D graphical displays. This graph-based CC algorithm differs from others in that it gives a clear overall perspective on finding the appropriate amount of RGB gain adjustment to achieve image CC. The ground-truth (GT) image, which is critical for data accuracy, is used as a benchmark or target in image CC; however, GT images in CC are often determined inconsistently or checked manually. This paper illustrates that an accurate and specific GT image can be obtained or checked using an optimization scheme, namely grayscale pixel maximization (GPM). Using previously published image CC results for evaluation and comparison, this paper demonstrates the usefulness, accuracy, and especially the forensic capability of this CC algorithm.
{"title":"Gain-Pixel Visualization Algorithm Designed for Computational Color Constancy Scheme","authors":"S. Teng","doi":"10.1109/ICIVC50857.2020.9177464","DOIUrl":"https://doi.org/10.1109/ICIVC50857.2020.9177464","url":null,"abstract":"Color constancy (CC) is an essential part of machine vision. Previously reported CC algorithms lacked consistent and clear-cut evaluation diagrams. This paper instead presents a gain-pixel visualization CC algorithm which uses optimization numerical analysis and 2D-3D graphical displays. This graph-based CC algorithm differs from others in that it gives a clear overall perspective on finding the appropriate amount of RGB gain adjustment to achieve image CC. The ground truth (GT) image, which is critical for data accuracy, has been used as a benchmark or a target in image CC. However, GT images in CC are often inconsistently determined or manually checked. This paper will illustrate that an accurate and specific GT image can be obtained or checked using an optimization scheme, namely the grayscale pixel maximization (GPM). Using previously published image CC results for evaluation and comparison, this paper demonstrates the usefulness, accuracy, and especially the forensic capability of this CC algorithm.","PeriodicalId":6806,"journal":{"name":"2020 IEEE 5th International Conference on Image, Vision and Computing (ICIVC)","volume":"20 1","pages":"237-246"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88969213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-07-01 | DOI: 10.1109/ICIVC50857.2020.9177494
Zihan Chen, Lianghong Chen, Zhiyuan Zhao, Yue Wang
In recent years, people's pursuit of art has been on the rise, and there is growing demand for computers that can create artistic paintings from descriptions. In this paper, we propose a novel project, Painting Creator, which uses deep learning to enable a computer to generate artistic illustrations from a short piece of text. Our scheme includes two models: an image generation model and a style transfer model. For the image generation model, inspired by the application of stacked generative adversarial networks to text-to-image generation, we propose an improved model, IStackGAN. We add a classifier to the original model and introduce an image structure loss and a feature extraction loss to improve the generator: the generator obtains additional hidden information from the classification signal to produce better pictures, the image structure loss forces the generator to restore the real image, and the feature extraction loss verifies whether the generator has extracted the features of the real image set. For the style transfer model, we improve the generator of the original cycle-consistent generative adversarial network (CycleGAN), using residual blocks to improve the stability and performance of the U-Net generator, and we add a cycle-consistency loss based on MS-SSIM. Experimental results show that our model improves significantly on the original baselines: the generated pictures are more vivid in detail, and the style-transferred pictures are more artistically pleasing.
{"title":"AI Illustrator: Art Illustration Generation Based on Generative Adversarial Network","authors":"Zihan Chen, Lianghong Chen, Zhiyuan Zhao, Yue Wang","doi":"10.1109/ICIVC50857.2020.9177494","DOIUrl":"https://doi.org/10.1109/ICIVC50857.2020.9177494","url":null,"abstract":"In recent years, people's pursuit of art has been on the rise. People want computers to be able to create artistic paintings based on descriptions. In this paper, we proposed a novel project, Painting Creator, which uses deep learning technology to enable the computer to generate artistic illustrations from a short piece of text. Our scheme includes two models, image generation model and style transfer model. In the real image generation model, inspired by the application of stack generative adversarial networks in text to image generation, we proposed an improved model, IStackGAN, to solve the problem of image generation. We added a classifier based on the original model and added image structure loss and feature extraction loss to improve the performance of the generator. The generator network can get additional hidden information from the classification information to produce better pictures. The loss of image structure can force the generator to restore the real image, and the loss of feature extraction can verify whether the generator network has extracted the features of the real image set. For the style transfer model, we improved the generator based on the original cycle generative adversarial networks and used the residual block to improve the stability and performance of the u-net generator. To improve the performance of the generator, we also added the cycle consistent loss with MS-SSIM. The experimental results show that our model is improved significantly based on the original paper, and the generated pictures are more vivid in detail, and pictures after the style transfer are more artistic to watch.","PeriodicalId":6806,"journal":{"name":"2020 IEEE 5th International Conference on Image, Vision and Computing (ICIVC)","volume":"69 1","pages":"155-159"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84354177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}