Color image quantization is an important task in graphics manipulation and image processing, and its key step is generating an efficient color palette. Many color image quantization methods have been proposed, most of which are fundamentally clustering-based algorithms; the K-means clustering algorithm is a popular example. However, K-means has received limited attention in color quantization because of the high computational cost of its many iterations and its sensitivity to initialization. This paper presents an efficient color clustering method for fast color quantization. It addresses the drawbacks of conventional K-means by reducing the number of data samples and using the triangle inequality to accelerate the nearest-neighbor search. The method has two stages: in the first, an initial palette is generated; in the second, quantized images are produced by a modified K-means procedure. The main modifications are data sampling and mean sorting, which avoid traversing all cluster centers and shorten the palette search time. Experimental results show that the method is competitive with previously published color quantization algorithms in both efficiency and effectiveness.
{"title":"Fast image quantization with efficient color clustering","authors":"Yingying Liu","doi":"10.1117/12.2668985","DOIUrl":"https://doi.org/10.1117/12.2668985","url":null,"abstract":"Color image quantization has been widely used as an important task in graphics manipulation and image processing. The key to color image quantization is to generate an efficient color palette. At present, there are many color image quantization methods that have been presented, which are fundamentally clustering-based algorithms. As an illustration, the K-means clustering algorithm is quite popular. However, the K-means algorithm has not been given sufficient focus in the field of color quantization due to its high computational effort caused by multiple iterations and its very susceptibility to initialization. This paper presented an efficient color clustering method to implement fast color quantization. This method mainly addresses the drawbacks of the conventional K-means clustering algorithm, which involves reducing the data samples and making use of triangular inequalities to accelerate the nearest neighbor search. The method mainly contains two stages. During the first phase, an initial palette is generated. In the second phase, quantized images are generated by a modified K-means method. Major modifications include data sampling and mean sorting, avoiding traversal of all cluster centers, and speeding up the time to search the palette. The experimental results illustrate that this presented method is quite competitive with previously presented color quantization algorithms both in the matter of efficiency and effectiveness.","PeriodicalId":236099,"journal":{"name":"International Workshop on Frontiers of Graphics and Image Processing","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129494544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the development of the mobile internet, real-time target detection on mobile devices has wide application prospects, but limited terminal computing power constrains both the speed and the accuracy of detection. Edge-cloud collaborative computing is the main way to compensate for this shortfall, yet current methods do not solve the problem of computation scheduling in an edge-cloud collaboration system. To address these problems, this paper proposes pruning of classical deep learning target detection networks; a training and prediction offloading strategy for the edge-to-cloud network; and a dynamic load-balancing migration strategy based on changes in CPU, memory, bandwidth, and disk state across the cluster. In tests, the edge-to-cloud deep learning method reduces inference delay by 50% and increases system throughput by 40%, while the maximum waiting time for a job is reduced by about 20%. The efficiency and accuracy of target detection are effectively improved.
{"title":"Performance optimization of target detection based on edge-to-cloud deep learning","authors":"Zhongkui Fan, Yepeng Guan","doi":"10.1117/12.2668891","DOIUrl":"https://doi.org/10.1117/12.2668891","url":null,"abstract":"With the development of mobile internet, real-time target detection using mobile devices has wide application prospects, but the computing power of the terminal greatly limits the speed and accuracy of target detection. Edge-cloud collaborative computing is the main method to solve the lack of computing power of mobile terminals. The current method can't settle the problem of computation scheduling in the edge-cloud collaboration system. Given the existing problems, this paper proposes the pruning technology of classical target detection deep learning networks; training and prediction offloading strategy of edge-to-cloud deep learning network; dynamic load balancing migration strategy based on CPU, memory, bandwidth, and disk state-changing in cluster. After testing, the edge-to-cloud deep learning method can reduce the inference delay by 50% and increase the system throughput by 40%. The maximum waiting time for operation can be reduced by about 20%. The efficiency and accuracy of target detection are effectively improved.","PeriodicalId":236099,"journal":{"name":"International Workshop on Frontiers of Graphics and Image Processing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125768413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the increasing number of private cars, traffic management departments pay growing attention to vehicle traffic problems, and in daily management video images are often the most intuitive, effective, and fast source of information. In practice, managing vehicle access to a parking lot is complex and laborious, since most lots still rely on manual checks to record vehicles entering and leaving. To solve this problem, this paper designs a parking lot vehicle entry system based on human image recognition and analysis technology that records the information of incoming and outgoing personnel and vehicles efficiently, so that the data can be analyzed statistically and a response plan can be produced quickly in an emergency. The system also improves road traffic efficiency and safety, and gives the relevant staff a simple, fast, and easy-to-operate tool, which has practical value.
{"title":"Design of parking lot vehicle entry system based on human image recognition analysis technology","authors":"Liang Zhu, Junhong Xi","doi":"10.1117/12.2669161","DOIUrl":"https://doi.org/10.1117/12.2669161","url":null,"abstract":"Due to the increasing number of private cars, traffic management departments are paying more and more attention to vehicle traffic problems. In the daily management, video image is often the most intuitive, effective and fast way to obtain information resources. In the actual parking lot, vehicle access management is a very complex and difficult job. As most of the parking lots are scanned manually to complete the task of entering and leaving the parking lot. In order to solve this problem, this paper is based on the human image recognition analysis technology to realize the effective and fast recording of incoming and outgoing personnel and vehicle information of the parking lot vehicle entry system, so that these information can be statistically analyzed and the corresponding processing plan can be made quickly when there is an unexpected situation, and at the same time can improve the road traffic efficiency and safety performance, and also provide a simple and fast, easy to operate work for the relevant staff. It also provides a simple, fast and easy to operate method for the relevant staff, which has certain practical value.","PeriodicalId":236099,"journal":{"name":"International Workshop on Frontiers of Graphics and Image Processing","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116096286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Video frame interpolation (VFI), which aims to synthesize intermediate frames from bidirectional historical references, has made remarkable progress with the development of deep convolutional neural networks (CNNs) in recent years. Existing CNNs generally struggle with large motions because of the locality of convolution operations, resulting in slow inference. We introduce the Real-time Video Frame Interpolation Transformer (RVFIT), a novel framework that overcomes this limitation. Unlike traditional CNN-based methods, RVFIT does not process video frames separately with different network modules in the spatial domain; instead, it batches adjacent frames through a single end-to-end, UNet-style Transformer architecture. Moreover, it adds two-stage interpolation sampling before and after the end-to-end network to get the most out of classical computer vision processing. Experiments show that, compared with the state-of-the-art TMNet, RVFIT achieves comparable quality with only 50% of the parameters (6.2M vs. 12.3M) and 80% higher speed (26.1 fps vs. 14.3 fps at a frame size of 720×576).
{"title":"RVFIT: Real-time Video Frame Interpolation Transformer","authors":"Linlin Ou, Yuanping Chen","doi":"10.1117/12.2669055","DOIUrl":"https://doi.org/10.1117/12.2669055","url":null,"abstract":"Video frame interpolation (VFI), which aims to synthesize predictive frames from bidirectional historical references, has made remarkable progress with the development of deep convolutional neural networks (CNNs) over the past years. Existing CNNs generally face challenges in handing large motions due to the locality of convolution operations, resulting in a slow inference structure. We introduce a Real-time video frame interpolation transformer (RVFIT), a novel framework to overcome this limitation. Unlike traditional methods based on CNNs, this paper does not process video frames separately with different network modules in the spatial domain but batches adjacent frames through a single UNet-style structure end-to-end Transformer network architecture. Moreover, this paper creatively sets up two-stage interpolation sampling before and after the end-to-end network to maximize the performance of the traditional CV algorithm. The experimental results show that compared with SOTA TMNet, RVFIT has only 50% of the network size (6.2M vs 12.3M, parameters) while ensuring comparable performance, and the speed is increased by 80% (26.1 fps vs 14.3 fps, frame size is 720*576).","PeriodicalId":236099,"journal":{"name":"International Workshop on Frontiers of Graphics and Image Processing","volume":"211 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133114220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analyzing data from the Sloan Digital Sky Survey (SDSS) Data Release 16, in which quasar spectra form the main sample, we investigate possible variations of the fine-structure constant on cosmological time scales. Applying the emission-line method to the [OIII] doublet of 14,495 quasar samples (redshift z < 1) constrained in the literature, we obtain Δα/α = (0.70 ± 1.6) × 10⁻⁵. We also investigate the precision limit of fine-structure-constant measurements from SDSS spectra by simulating the three main sources of systematics: noise, gas outflow, and skylines. In addition, we perform a cross-correlation analysis on a high-resolution MagE spectrum (observed at the Magellan II Clay telescope) of J131651.29+055646.9 and obtain Δα/α = (−9.16 ± 11.38) × 10⁻⁷. Better constraints (e.g., a skyline-subtraction algorithm) may improve the SDSS precision slightly; a more promising and efficient approach is to constrain Δα/α with high-resolution spectroscopy and large active galaxy/QSO surveys.
{"title":"Measuring the fine-structure constant on quasar spectra: High spectral resolution gains more than large size of moderate spectral resolution spectra","authors":"Haoran Liang, Zhe Wu","doi":"10.1117/12.2670012","DOIUrl":"https://doi.org/10.1117/12.2670012","url":null,"abstract":"By analyzing the data from the Sloan Digital Sky Survey (SDSS) Data Release 16, which the spectra of the Quasars are major samples, we focus on the investigation of the possible variations of the fine structure constant on the cosmological temporal scales over the universe. We analyzed 14495 quasar samples (red shift z<1) constrained in the literature by using emission-line method on [OIII] doublet and obtained Δα/α=0.70±1.6×10-5. We investigated the precision limit for the measurement of fine-structure constant by SDSS spectrum analysis by designing the simulation about three main sources of systematics: Noise, Outflow of gas, and Skyline. In addition, we exerted cross-correlation analysis on a high-resolution spectrum from MagE (MagE Observations at the Magellan II Clay telescope) named “J131651.29+055646.9” and got the result Δα/α=-9.16±11.38×10-7. Better constraints (Skyline subtraction algorithm) may improve the precision slightly by using SDSS. The more possible and efficient method may be to constrain Δα/α with the spectra of high-resolution spectroscopy and large active galaxy/QSO surveys.","PeriodicalId":236099,"journal":{"name":"International Workshop on Frontiers of Graphics and Image Processing","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113998950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To avoid errors in distance measurement caused by physical characteristics, environmental influences, human error, and so on, visual measurement and modern digital image processing techniques are used. This requires building an image acquisition system and detecting the edges of the acquired images with specialized procedures, on the basis of which the differences between key points on the image edges enter the measurement equation. The values obtained show that the final results of this measurement method are consistent with the actual values, confirming the accuracy of digital image correlation processing in distance measurement. Digital image processing technology is a product of advanced manufacturing technology, and the rapid development of computer technology has advanced digital image recognition and analysis capabilities. In engineering surveying, it is mainly applied to engineering construction and management, where it can greatly reduce manufacturing error and shorten inspection cycles. Based on digital image processing technology, this paper proposes an engineering displacement measurement method, an industrial part size measurement method, and an industrial thread standard measurement method. Compared with traditional manual measurement, digital image technology shortens the work cycle and improves efficiency.
{"title":"Application of digital pictures process technique in engineering surveying","authors":"Xiaowen Hu, Yeming Wang","doi":"10.1117/12.2669154","DOIUrl":"https://doi.org/10.1117/12.2669154","url":null,"abstract":"In order to avoid some errors in distance measurement, e.g. due to physical characteristics, environmental influences, human errors etc. during the measurement process, visual measurement and modern digital image related processing techniques are used. This requires the creation of relevant image acquisition systems and the detection of the edges of the acquired images using specialised procedures, on the basis of which the differences between the basic points of the image edges are brought into the measurement equation. The values obtained prove that the final results of this measurement method are consistent with the actual values, proving the accuracy of digital image correlation processing techniques in distance measurement. Digital image processing technology is a derivative of advanced manufacturing technology, and the rapid development of computer technology has led to the development of digital image recognition and image analysis capabilities. Engineering survey research is mainly applied in the process of engineering construction and engineering management, which can greatly reduce the error of engineering manufacturing and the period of engineering inspection. Based on digital image processing technology, this paper proposes an engineering displacement measurement method, an industrial part size measurement method and an industrial thread standard measurement method. Compared with the traditional manual measurement technology, the use of digital image technology can shorten the working period and improve the working efficiency.","PeriodicalId":236099,"journal":{"name":"International Workshop on Frontiers of Graphics and Image Processing","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128851527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Naked-eye 3D imaging uses the cell phone's front camera to detect the position of the human eyes. Because the front camera sits some distance from the center of the phone screen, there is an offset between the eye position the camera detects and the position relative to the screen, while the 3D display switches views using, as its origin, the midpoint of the two eyes detected when the viewer looks directly at the screen center. This paper therefore corrects the eye-positioning offset by a direct measurement method combined with formula derivation.
{"title":"Offset correction scheme for human eye positioning in naked eye 3D for Android","authors":"Ke Wang, zonghai pan, Yuting Chen, Fei Li, C. Lan","doi":"10.1117/12.2669407","DOIUrl":"https://doi.org/10.1117/12.2669407","url":null,"abstract":"Naked-eye 3D imaging needs to call the cell phone camera to detect the position of the human eye, the front camera of the cell phone is a certain distance away from the center of the cell phone screen, so there is a certain offset between the front camera and the human eye position detected by the front camera and the cell phone screen, and the 3D image display is the center of the two eyes detected by the front camera when the human eye looks directly at the center of the cell phone screen as the origin to switch the image. Therefore, the human eye positioning offset problem will be solved by direct measurement method and formula derivation method.","PeriodicalId":236099,"journal":{"name":"International Workshop on Frontiers of Graphics and Image Processing","volume":"87 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127410766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Verification of IP cores implementing image processing algorithms is important for SoC and FPGA applications in machine vision. This paper proposes a general-purpose, real-time, and agile verification framework for such IP cores using a heterogeneous platform composed of an ARM processor and an FPGA. In the framework, Gigabit Ethernet communication is established between the PC and the ARM side. The FPGA hosts a data bus compatible with multiple image types and, combined with partial reconfiguration, enables fast iteration of the algorithm IP cores under verification. The framework is reusable across algorithm IP cores, and deploying an IP core under test is 25 times faster than with global reconfiguration. Compared with existing FPGA verification techniques, it offers better reusability, a shorter verification cycle, more targeted test stimuli, and faster deployment of the IP cores to be verified.
{"title":"Research on verification framework of image processing IP core based on real-time reconfiguration","authors":"Wei Mo, Lu Zhao, Jianping Wen","doi":"10.1117/12.2669153","DOIUrl":"https://doi.org/10.1117/12.2669153","url":null,"abstract":"The verification of IP core with image processing algorithm is important for SoC and FPGA application in the field of machine vision. This paper proposes a verification framework with general purpose, real-time performance and agility for IP core with image processing algorithm by using heterogeneous platform composed of ARM and FPGA. In the verification framework, the Gigabit Ethernet communication between PC and ARM is established. The FPGA is used to build the data bus to be compatible with multiple types of images, and combine with a partial reconfiguration to achieve fast iteration of IP cores of the algorithm to be verified. The validation framework is reusable for the algorithm IP core, and the deployment speed of the IP cores to be verified is 25 times faster than global reconfiguration. Compared with the existing FPGA verification technology, it has better reusability, shorter verification cycle, more targeted test stimulus, and faster deployment of IP cores to be verified.","PeriodicalId":236099,"journal":{"name":"International Workshop on Frontiers of Graphics and Image Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115017067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Object grasping is a challenging problem in computer vision and robotics. Existing algorithms generally have large numbers of training parameters, which leads to long training times and demands high-performance hardware. In this paper, we present a lightweight neural network for object grasping. Our network generates grasps at real-time speeds (∼30 ms) and can therefore run on mobile devices. The main idea of GhostNet is to reduce the parameter count by generating some feature maps cheaply from others during convolution. We adopt this idea and also apply it to the deconvolution process, and we build the lightweight grasp network on these two components. Extensive experiments on grasping datasets show that our network performs well: we achieve 94% accuracy on the Cornell grasp dataset and 91.8% on the Jacquard dataset, while requiring only 15% of the parameters and 47% of the training time of traditional models.
{"title":"A lightweight object grasping network using GhostNet","authors":"Yangfan Deng, Qinghua Guo, Yong Zhao, Junli Xu","doi":"10.1117/12.2669156","DOIUrl":"https://doi.org/10.1117/12.2669156","url":null,"abstract":"Object grasping is a very challenging problem in computer vision and robotics. Existing algorithms generally have a large number of training parameters, which lead to long training times and require high performance facilities. In this paper, we present a lightweight neural network to solve the problem of object grasping. Our network is able to generate grasps at real-time speeds (∼30ms), thus can be used on mobile devices. The main idea of GhostNet is to reduce the number of parameters by generating feature maps from each other in the process of convolution. We adopt this idea and apply it on the deconvolution process. Besides, we construct the lightweight grasp network based on these two processes. A lot of experiments on grasping datasets demonstrate that our network performs well. We achieve accuracy of 94% on Cornell grasp dataset and 91.8% on Jacquard dataset. At the same time, compared to traditional models, our model only requires 15% of the number of parameters and 47% of training time.","PeriodicalId":236099,"journal":{"name":"International Workshop on Frontiers of Graphics and Image Processing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132101477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Focusing on how to obtain high-quality and sufficient synthetic aperture radar (SAR) data for deep learning, this paper proposes a new method named SARCUT (Self-Attention Relativistic Contrastive Learning for Unpaired Image-to-Image Translation) that translates optical images into SAR images. To improve the coherence of the generated images and stabilize training, we construct a generator with a self-attention mechanism and spectral normalization. A relativistic discrimination adversarial loss function is designed to accelerate model convergence and improve the authenticity of the generated images. Experiments on open datasets with six quantitative image evaluation metrics show that the model learns deeper internal relations and the main features shared across the source images. Compared with classical methods, SARCUT is better at establishing the mapping to the real image domain, and both the quality and the authenticity of the generated images improve significantly.
{"title":"SARCUT: Contrastive learning for optical-SAR image translation with self-attention and relativistic discrimination","authors":"Yusen ZHANG, Min Li, Wei Cai, Yao Gou, Shuaibing Shi","doi":"10.1117/12.2669086","DOIUrl":"https://doi.org/10.1117/12.2669086","url":null,"abstract":"Focusing on how to obtain high-quality and sufficient synthetic aperture radar (SAR) data in deep learning, this paper proposed a new mothed named SARCUT (Self-Attention Relativistic Contrastive Learning for Unpaired Image-to-Image Translation) to translate optical images into SAR images. In order to improve the coordination of generated images and stabilize the training process, we constructed a generator with the self-attention mechanism and spectral normalization operation. Meanwhile, relativistic discrimination adversarial loss function was designed to accelerate the model convergence and improved the authenticity of the generated images. Experiments on open datasets with 6 image quantitative evaluation metrics showed our model can learn the deeper internal relations and main features between multiple source images. Compared with the classical methods, SARCUT has more advantages in establishing the real image domain mapping, both the quality and authenticity of the generated image are significantly improved.","PeriodicalId":236099,"journal":{"name":"International Workshop on Frontiers of Graphics and Image Processing","volume":"12644 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131223347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}