No-reference video quality assessment based on human visual perception
Pub Date: 2024-07-01 | DOI: 10.1117/1.jei.33.4.043029
Zhou Zhou, Guangqian Kong, Xun Duan, Huiyun Long
Conducting video quality assessment (VQA) for user-generated content (UGC) videos and achieving consistency with subjective quality assessment are highly challenging tasks. We propose a no-reference video quality assessment (NR-VQA) method for UGC scenarios by considering characteristics of human visual perception. To distinguish between varying levels of human attention within different regions of a single frame, we devise a dual-branch network. This network extracts spatial features containing positional information of moving objects from frame-level images. In addition, we employ the temporal pyramid pooling module to effectively integrate temporal features of different scales, enabling the extraction of inter-frame temporal information. To mitigate the time-lag effect in the human visual system, we introduce the temporal pyramid attention module. This module evaluates the significance of individual video frames and simulates the varying attention levels exhibited by humans towards frames. We conducted experiments on the KoNViD-1k, LIVE-VQC, CVD2014, and YouTube-UGC databases. The experimental results demonstrate the superior performance of our proposed method compared to recent NR-VQA techniques in terms of both objective assessment and consistency with subjective assessment.
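To make the temporal pyramid pooling idea concrete, here is a minimal sketch of pooling frame-level features at several temporal scales and concatenating the results; the pyramid levels, the average-pooling operator, and the feature dimensions are illustrative assumptions rather than the authors' exact design.

```python
import torch
import torch.nn.functional as F

def temporal_pyramid_pool(features: torch.Tensor, levels=(1, 2, 4)) -> torch.Tensor:
    """Pool frame-level features at several temporal scales and concatenate.

    features: tensor of shape (T, C) -- one feature vector per frame.
    levels:   number of temporal bins at each pyramid level (assumed values).
    Returns a fixed-length vector of shape (C * sum(levels),).
    """
    t, c = features.shape
    x = features.t().unsqueeze(0)                 # (1, C, T) for 1-D adaptive pooling
    pooled = []
    for n_bins in levels:
        # Average-pool the temporal axis into n_bins segments.
        p = F.adaptive_avg_pool1d(x, n_bins)      # (1, C, n_bins)
        pooled.append(p.flatten(1))               # (1, C * n_bins)
    return torch.cat(pooled, dim=1).squeeze(0)

# Example: 32 frames, 256-dim features per frame.
feats = torch.randn(32, 256)
video_descriptor = temporal_pyramid_pool(feats)   # shape: (256 * 7,)
print(video_descriptor.shape)
```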
{"title":"No-reference video quality assessment based on human visual perception","authors":"Zhou Zhou, Guangqian Kong, Xun Duan, Huiyun Long","doi":"10.1117/1.jei.33.4.043029","DOIUrl":"https://doi.org/10.1117/1.jei.33.4.043029","url":null,"abstract":"Conducting video quality assessment (VQA) for user-generated content (UGC) videos and achieving consistency with subjective quality assessment are highly challenging tasks. We propose a no-reference video quality assessment (NR-VQA) method for UGC scenarios by considering characteristics of human visual perception. To distinguish between varying levels of human attention within different regions of a single frame, we devise a dual-branch network. This network extracts spatial features containing positional information of moving objects from frame-level images. In addition, we employ the temporal pyramid pooling module to effectively integrate temporal features of different scales, enabling the extraction of inter-frame temporal information. To mitigate the time-lag effect in the human visual system, we introduce the temporal pyramid attention module. This module evaluates the significance of individual video frames and simulates the varying attention levels exhibited by humans towards frames. We conducted experiments on the KoNViD-1k, LIVE-VQC, CVD2014, and YouTube-UGC databases. The experimental results demonstrate the superior performance of our proposed method compared to recent NR-VQA techniques in terms of both objective assessment and consistency with subjective assessment.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"16 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141771009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background-focused contrastive learning for unpaired image-to-image translation
Pub Date: 2024-07-01 | DOI: 10.1117/1.jei.33.4.043023
Mingwen Shao, Minggui Han, Lingzhuang Meng, Fukang Liu
Contrastive learning for unpaired image-to-image translation (CUT) aims to learn a mapping from a source to a target domain with an unpaired dataset, combining a contrastive loss to maximize the mutual information between real and generated images. However, existing CUT-based methods exhibit unsatisfactory visual quality due to incorrect localization of objects and backgrounds; in particular, on layout-changing datasets they wrongly transform the background to match the object pattern. To alleviate this issue, we present background-focused contrastive learning for unpaired image-to-image translation (BFCUT) to improve background consistency between real images and their generated counterparts. Specifically, we first generate heat maps to explicitly locate objects and backgrounds for the subsequent contrastive loss and global background similarity loss. Then, representative queries of objects and backgrounds, rather than randomly sampled queries, are selected for the contrastive loss to promote the realism of objects and the preservation of backgrounds. Meanwhile, global semantic vectors with less object information are extracted with the help of the heat maps, and we further align the vectors of real images with those of their corresponding generated images to encourage background preservation through the global background similarity loss. BFCUT alleviates the erroneous translation of backgrounds and generates more realistic images. Extensive experiments on three datasets demonstrate better quantitative results and qualitative visual effects.
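As a rough illustration of selecting queries from a heat map rather than sampling them at random, the sketch below computes a PatchNCE-style contrastive loss over the most background-like patch locations; the temperature, the number of selected patches, and the per-location positive/negative scheme are assumptions, not BFCUT's actual losses.

```python
import torch
import torch.nn.functional as F

def infonce(query, positive, negatives, tau=0.07):
    """InfoNCE for a single query against one positive and a bank of negatives."""
    q = F.normalize(query, dim=-1)            # (1, D)
    pos = F.normalize(positive, dim=-1)       # (1, D)
    neg = F.normalize(negatives, dim=-1)      # (N, D)
    logits = torch.cat([(q * pos).sum(-1, keepdim=True), q @ neg.t()], dim=-1) / tau
    return F.cross_entropy(logits, torch.zeros(1, dtype=torch.long))

def background_focused_nce(feat_gen, feat_real, heat, k=8):
    """Use the k most background-like patch locations (lowest heat) as queries,
    pulling each generated patch toward the real patch at the same location and
    pushing it away from the other real patches."""
    bg_idx = (-heat).topk(k).indices          # low heat = background-dominant
    loss = 0.0
    for i in bg_idx.tolist():
        others = torch.cat([feat_real[:i], feat_real[i + 1:]], dim=0)
        loss = loss + infonce(feat_gen[i:i + 1], feat_real[i:i + 1], others)
    return loss / k

# Example: 64 patch features of dimension 256 and a flattened object heat map.
feat_gen, feat_real = torch.randn(64, 256), torch.randn(64, 256)
heat = torch.rand(64)
print(background_focused_nce(feat_gen, feat_real, heat).item())
```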
{"title":"Background-focused contrastive learning for unpaired image-to-image translation","authors":"Mingwen Shao, Minggui Han, Lingzhuang Meng, Fukang Liu","doi":"10.1117/1.jei.33.4.043023","DOIUrl":"https://doi.org/10.1117/1.jei.33.4.043023","url":null,"abstract":"Contrastive learning for unpaired image-to-image translation (CUT) aims to learn a mapping from source to target domain with an unpaired dataset, which combines contrastive loss to maximize the mutual information between real and generated images. However, the existing CUT-based methods exhibit unsatisfactory visual quality due to the wrong locating of objects and backgrounds, particularly where it incorrectly transforms the background to match the object pattern in layout-changing datasets. To alleviate the issue, we present background-focused contrastive learning for unpaired image-to-image translation (BFCUT) to improve the background’s consistency between real and its generated images. Specifically, we first generate heat maps to explicitly locate the objects and backgrounds for subsequent contrastive loss and global background similarity loss. Then, the representative queries of objects and backgrounds rather than randomly sampling queries are selected for contrastive loss to promote reality of objects and maintenance of backgrounds. Meanwhile, global semantic vectors with less object information are extracted with the help of heat maps, and we further align the vectors of real images and their corresponding generated images to promote the maintenance of the backgrounds in global background similarity loss. Our BFCUT alleviates the wrong translation of backgrounds and generates more realistic images. Extensive experiments on three datasets demonstrate better quantitative results and qualitative visual effects.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"22 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141745419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Early quadtree with nested multitype tree partitioning algorithm based on convolution neural network for the versatile video coding standard
Pub Date: 2024-07-01 | DOI: 10.1117/1.jei.33.4.043024
Bouthaina Abdallah, Sonda Ben Jdidia, Fatma Belghith, Mohamed Ali Ben Ayed, Nouri Masmoudi
The Joint Video Experts Team has recently finalized the versatile video coding (VVC) standard, which incorporates various advanced encoding tools. These tools deliver substantial gains in coding efficiency, yielding a bitrate reduction of up to 50% compared with the previous standard, high-efficiency video coding. However, this enhancement comes at the expense of high computational complexity. Within this context, we address the new quadtree (QT) with nested multitype tree block partitioning in VVC under the all-intra configuration. We propose a fast intra coding unit (CU) partition algorithm that uses several convolutional neural network (CNN) classifiers to directly predict the partition mode, skip unnecessary split modes, and exit the partitioning process early. The proposed approach first predicts the QT depth of a 64×64 CU with the corresponding CNN classifier. Four CNN classifiers are then applied to predict the partition decision tree of a 32×32 CU using multiple threshold values, bypassing the rate-distortion optimization process to speed up partition coding. The developed method is implemented in the reference software VTM 16.2 and tested on different video sequences. The experimental results confirm that the proposed solution achieves an encoding time reduction of about 46% on average, reaching up to 67.3%, with an acceptable increase in bitrate and an insignificant decrease in quality.
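The multi-threshold decision logic described above might look like the following sketch, in which a confident CNN prediction lets the encoder skip rate-distortion optimization and a less confident one only prunes unlikely split modes; the threshold values and the listed mode set are assumptions for illustration.

```python
import numpy as np

SPLIT_MODES = ["NO_SPLIT", "QT", "BT_H", "BT_V", "TT_H", "TT_V"]  # VVC QT/MTT modes

def decide_partition(probs: np.ndarray, t_high=0.9, t_low=0.1):
    """Multi-threshold early decision for one 32x32 CU (illustrative thresholds).

    probs: softmax output of the CNN classifier over SPLIT_MODES.
    Returns (chosen_modes, skip_rdo): either a single confident mode with RDO
    skipped, or the subset of plausible modes left for the normal RDO search.
    """
    best = int(np.argmax(probs))
    if probs[best] >= t_high:
        return [SPLIT_MODES[best]], True          # early exit: trust the CNN
    # Otherwise prune only the clearly unlikely modes and let RDO decide the rest.
    candidates = [m for m, p in zip(SPLIT_MODES, probs) if p > t_low]
    return candidates, False

probs = np.array([0.02, 0.05, 0.72, 0.11, 0.06, 0.04])
print(decide_partition(probs))   # -> (['BT_H', 'BT_V'], False): only these go through RDO
```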
{"title":"Early quadtree with nested multitype tree partitioning algorithm based on convolution neural network for the versatile video coding standard","authors":"Bouthaina Abdallah, Sonda Ben Jdidia, Fatma Belghith, Mohamed Ali Ben Ayed, Nouri Masmoudi","doi":"10.1117/1.jei.33.4.043024","DOIUrl":"https://doi.org/10.1117/1.jei.33.4.043024","url":null,"abstract":"The Joint Video Experts Team has recently finalized the versatile video coding (VVC) standard, which incorporates various advanced encoding tools. These tools ensure great enhancements in the coding efficiency, leading to a bitrate reduction up to 50% when compared to the previous standard, high-efficiency video coding. However, this enhancement comes at the expense of high computational complexity. Within this context, we address the new quadtree (QT) with nested multitype tree partition block in VVC for all-intra configuration. In fact, we propose a fast intra-coding unit (CU) partition algorithm using various convolution neural network (CNN) classifiers to directly predict the partition mode, skip unnecessary split modes, and early exit the partitioning process. The proposed approach first predicts the QT depth at a CU of size 64×64 by the corresponding CNN classifier. Then four CNN classifiers are applied to predict the partition decision tree at a CU of size 32×32 using multithreshold values and ignore the rate-distortion optimization process to speed up the partition coding time. Thus the developed method is implemented on the reference software VTM 16.2 and tested for different video sequences. The experimental results confirm that the proposed solution achieves an encoding time reduction of about 46% in average, reaching up to 67.3% with an acceptable increase in bitrate and an unsignificant decrease in quality.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"93 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141745421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High-resolution cloud detection network
Pub Date: 2024-07-01 | DOI: 10.1117/1.jei.33.4.043027
Jingsheng Li, Tianxiang Xue, Jiayi Zhao, Jingmin Ge, Yufang Min, Wei Su, Kun Zhan
The complexity of clouds, particularly in terms of texture detail at high resolutions, has not been well explored by most existing cloud detection networks. We introduce the high-resolution cloud detection network (HR-cloud-Net), which utilizes a hierarchical high-resolution integration approach. HR-cloud-Net integrates a high-resolution representation module, a layer-wise cascaded feature fusion module, and a multiresolution pyramid pooling module to effectively capture complex cloud features. This architecture preserves detailed cloud texture information while facilitating feature exchange across different resolutions, thereby enhancing overall cloud detection performance. Additionally, an approach is introduced wherein a student view, trained on noisy augmented images, is supervised by a teacher view processing normal images. This setup enables the student to learn from the cleaner supervision provided by the teacher, leading to improved performance. Extensive evaluations on three optical satellite image cloud detection datasets validate the superior performance of HR-cloud-Net compared with existing methods.
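A minimal sketch of the teacher-student setup described above is given below: the teacher processes clean images, the student sees noise-augmented copies, and a consistency term pulls the student toward the teacher's detached predictions; the Gaussian noise augmentation and MSE consistency loss are assumed details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def consistency_step(student, teacher, clean_batch, noise_std=0.1):
    """One step of the teacher-student scheme: the teacher sees the clean
    images, the student sees a noisy augmented copy and is pulled toward the
    teacher's detached cloud masks (noise_std and MSE are assumed details)."""
    noisy_batch = clean_batch + noise_std * torch.randn_like(clean_batch)
    with torch.no_grad():
        target = torch.sigmoid(teacher(clean_batch))   # cleaner supervision
    pred = torch.sigmoid(student(noisy_batch))
    return F.mse_loss(pred, target)

# Toy stand-ins for the two views: any network mapping an image to a 1-channel mask.
student = nn.Conv2d(3, 1, kernel_size=3, padding=1)
teacher = nn.Conv2d(3, 1, kernel_size=3, padding=1)
loss = consistency_step(student, teacher, torch.randn(2, 3, 64, 64))
loss.backward()   # gradients flow only into the student
print(loss.item())
```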
{"title":"High-resolution cloud detection network","authors":"Jingsheng Li, Tianxiang Xue, Jiayi Zhao, Jingmin Ge, Yufang Min, Wei Su, Kun Zhan","doi":"10.1117/1.jei.33.4.043027","DOIUrl":"https://doi.org/10.1117/1.jei.33.4.043027","url":null,"abstract":"The complexity of clouds, particularly in terms of texture detail at high resolutions, has not been well explored by most existing cloud detection networks. We introduce the high-resolution cloud detection network (HR-cloud-Net), which utilizes a hierarchical high-resolution integration approach. HR-cloud-Net integrates a high-resolution representation module, layer-wise cascaded feature fusion module, and multiresolution pyramid pooling module to effectively capture complex cloud features. This architecture preserves detailed cloud texture information while facilitating feature exchange across different resolutions, thereby enhancing the overall performance in cloud detection. Additionally, an approach is introduced wherein a student view, trained on noisy augmented images, is supervised by a teacher view processing normal images. This setup enables the student to learn from cleaner supervisions provided by the teacher, leading to an improved performance. Extensive evaluations on three optical satellite image cloud detection datasets validate the superior performance of HR-cloud-Net compared with existing methods.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"15 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141771011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Event-frame object detection under dynamic background condition
Pub Date: 2024-07-01 | DOI: 10.1117/1.jei.33.4.043028
Wenhao Lu, Zehao Li, Junying Li, Yuncheng Lu, Tony Tae-Hyoung Kim
Neuromorphic vision sensors (NVS), featuring low data redundancy and transmission latency, are widely used in Internet of Things applications. Previous studies have developed various object detection algorithms based on the NVS's unique event data format. However, most of these methods are suited only to scenarios with stationary backgrounds. Under dynamic background conditions, the NVS also captures events from non-target objects because of its mechanism of detecting pixel intensity changes, and the performance of existing detection methods degrades considerably. To address this shortcoming, we add a refinement process to the conventional histogram-based (HIST) detection method. For the regions proposed by HIST, we apply a practical decision condition to categorize them as either object-dominant or background-dominant. Object-dominant regions then undergo a second HIST-based region proposal for precise localization, while background-dominant regions employ an upper-outline determination strategy for target object identification. Finally, the refined results are tracked using a simplified Kalman filter. Evaluated on outdoor drone surveillance with an event camera, the proposed scheme outperforms other methods in both intersection-over-union and F1 score metrics.
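The final tracking stage can be illustrated with a small constant-velocity Kalman filter over a detected box center; the state layout and noise settings below are assumptions, since the paper's exact simplification is not spelled out in the abstract.

```python
import numpy as np

class SimpleKalman:
    """Constant-velocity Kalman filter for a 2-D box center (illustrative
    noise settings; not the paper's exact simplification)."""

    def __init__(self, q=1e-2, r=1.0):
        self.x = np.zeros(4)                       # state: [cx, cy, vx, vy]
        self.P = np.eye(4)
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = 1.0   # dt = 1 frame
        self.H = np.zeros((2, 4)); self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = q * np.eye(4)                     # process noise
        self.R = r * np.eye(2)                     # measurement noise

    def step(self, z):
        # Predict.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Update with the measured center z = [cx, cy].
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]                          # filtered center

kf = SimpleKalman()
for z in [(10, 12), (11, 13), (13, 15)]:
    print(kf.step(z))
```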
{"title":"Event-frame object detection under dynamic background condition","authors":"Wenhao Lu, Zehao Li, Junying Li, Yuncheng Lu, Tony Tae-Hyoung Kim","doi":"10.1117/1.jei.33.4.043028","DOIUrl":"https://doi.org/10.1117/1.jei.33.4.043028","url":null,"abstract":"Neuromorphic vision sensors (NVS) with the features of small data redundancy and transmission latency are widely implemented in Internet of Things applications. Previous studies have developed various object detection algorithms based on NVS’s unique event data format. However, most of these methods are only adaptive for scenarios with stationary backgrounds. Under dynamic background conditions, NVS can also acquire the events of non-target objects due to its mechanism of detecting pixel intensity changes. As a result, the performance of existing detection methods is greatly degraded. To address this shortcoming, we introduce an extra refinement process to the conventional histogram-based (HIST) detection method. For the proposed regions from HIST, we apply a practical decision condition to categorize them as either object-dominant or background-dominant cases. Then, the object-dominant regions undergo a second-time HIST-based region proposal for precise localization, while background-dominant regions employ an upper outline determination strategy for target object identification. Finally, the refined results are tracked using a simplified Kalman filter approach. Evaluated in an outdoor drone surveillance with an event camera, the proposed scheme demonstrates superior performance in both intersection over union and F1 score metrics compared to other methods.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"62 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141771012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust video hashing with canonical polyadic decomposition and Hahn moments
Zhenjun Tang, Huijiang Zhuang, Mengzhu Yu, Lv Chen, Xiaoping Liang, Xianquan Zhang
Pub Date: 2024-07-01 | DOI: 10.1117/1.jei.33.4.043007
Video hashing is an efficient technique for tasks such as copy detection and retrieval. This paper utilizes canonical polyadic (CP) decomposition and Hahn moments to design a robust video hashing algorithm. The first significant contribution is the secondary frame construction, which uses three weighting techniques to generate three secondary frames for each video group; these effectively capture features of video frames from different aspects and thus improve discrimination. Another contribution is deep feature extraction via ResNet50 and CP decomposition: ResNet50 provides rich features, and CP decomposition learns a compact and discriminative representation from them. In addition, the Hahn moments of the secondary frames are used to construct hash elements. Extensive experiments on an open video dataset demonstrate that the proposed algorithm surpasses several state-of-the-art algorithms in balancing discrimination and robustness.
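A compact representation via CP decomposition, as described above, can be sketched with the tensorly library; the tensor layout, the rank, and the use of tensorly itself are assumptions for illustration rather than the paper's implementation.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Suppose deep features from the three secondary frames of each video group are
# stacked into a 3rd-order tensor (groups x frames x feature_dim); rank 8 is an
# assumed value.
features = np.random.rand(20, 3, 2048).astype(np.float32)   # e.g. ResNet50 outputs
tensor = tl.tensor(features)

weights, factors = parafac(tensor, rank=8, n_iter_max=200)
# factors[0]: (20, 8) -- one compact 8-dim representation per video group, which
# could then be combined with Hahn moments and quantized into hash elements.
compact = factors[0]
print(compact.shape)
```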
{"title":"Robust video hashing with canonical polyadic decomposition and Hahn moments","authors":"Zhenjun Tang, Huijiang Zhuang, Mengzhu Yu, Lv Chen, Xiaoping Liang, Xianquan Zhang","doi":"10.1117/1.jei.33.4.043007","DOIUrl":"https://doi.org/10.1117/1.jei.33.4.043007","url":null,"abstract":"Video hashing is an efficient technique for tasks like copy detection and retrieval. This paper utilizes canonical polyadic (CP) decomposition and Hahn moments to design a robust video hashing. The first significant contribution is the secondary frame construction. It uses three weighted techniques to generate three secondary frames for each video group, which can effectively capture features of video frames from different aspects and thus improves discrimination. Another contribution is the deep feature extraction via the ResNet50 and CP decomposition. The use of the ResNet50 can provide rich features and the CP decomposition can learn a compact and discriminative representation from the rich features. In addition, the Hahn moments of secondary frames are taken to construct hash elements. Extensive experiments on the open video dataset demonstrate that the proposed algorithm surpasses several state-of-the-art algorithms in balancing discrimination and robustness.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"25 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141568281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hyperspectral image denoising via self-modulated cross-attention deformable convolutional neural network
Pub Date: 2024-07-01 | DOI: 10.1117/1.jei.33.4.043015
Ying Wang, Jie Qiu, Yanxiang Zhao
Compared with ordinary images, hyperspectral images (HSIs) consist of many bands that provide rich spatial and spectral information and are widely used in remote sensing. However, HSIs are subject to various types of noise due to limited sensor sensitivity, low light intensity in some bands, and corruption during acquisition, transmission, and storage. The problem of HSI denoising has therefore attracted extensive attention. Although recent HSI denoising methods provide effective solutions in various optimization directions, their performance under real, complex noise is still not optimal. To address these issues, this article proposes a self-modulated cross-attention network that fully utilizes spatial and spectral information. The core of the model is the use of deformable convolution to cross-fuse spatial and spectral features and improve the network's denoising capability. At the same time, a self-modulating residual block, which we call a feature enhancement block, allows the network to transform features adaptively based on neighboring bands, improving its ability to deal with complex noise. Finally, we propose a three-segment network architecture that improves the stability of the model. Comparative experiments on synthetic and real data show that the proposed method outperforms other state-of-the-art methods.
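A minimal sketch of cross-fusing spatial and spectral feature maps with a deformable convolution is shown below, using torchvision's DeformConv2d; the channel sizes, the concatenation-based fusion, and the offset-prediction layer are illustrative assumptions, not the paper's exact block.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformFuse(nn.Module):
    """Fuse spatial and spectral feature maps with a 3x3 deformable convolution
    whose sampling offsets are predicted from the concatenated features."""

    def __init__(self, channels=64):
        super().__init__()
        # 2 * 3 * 3 = 18 offset channels for a 3x3 deformable kernel.
        self.offset = nn.Conv2d(2 * channels, 18, kernel_size=3, padding=1)
        self.deform = DeformConv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, spatial_feat, spectral_feat):
        x = torch.cat([spatial_feat, spectral_feat], dim=1)
        return self.deform(x, self.offset(x))

fuse = DeformFuse()
a, b = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
print(fuse(a, b).shape)   # torch.Size([1, 64, 32, 32])
```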
{"title":"Hyperspectral image denoising via self-modulated cross-attention deformable convolutional neural network","authors":"Ying Wang, Jie Qiu, Yanxiang Zhao","doi":"10.1117/1.jei.33.4.043015","DOIUrl":"https://doi.org/10.1117/1.jei.33.4.043015","url":null,"abstract":"Compared with ordinary images, hyperspectral images (HSIs) consist of many bands that can provide rich spatial and spectral information and are widely used in remote sensing. However, HSIs are subject to various types of noise due to limited sensor sensitivity; low light intensity in the bands; and corruption during acquisition, transmission, and storage. Therefore, the problem of HSI denoising has attracted extensive attention from society. Although recent HSI denoising methods provide effective solutions in various optimization directions, their performance under real complex noise is still not optimal. To address these issues, this article proposes a self-modulated cross-attention network that fully utilizes spatial and spectral information. The core of the model is the use of deformable convolution to cross-fuse spatial and spectral features to improve the network denoising capability. At the same time, a self-modulating residual block allows the network to transform features in an adaptive manner based on neighboring bands, improving the network’s ability to deal with complex noise, which we call a feature enhancement block. Finally, we propose a three-segment network architecture that improves the stability of the model. The method proposed in this work outperforms other state-of-the-art methods through comparative analysis of experiments in synthetic and real data.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"28 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141612603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scene adaptive color compensation and multi-weight fusion of underwater image
Pub Date: 2024-07-01 | DOI: 10.1117/1.jei.33.4.043031
Muhammad Aon, Huibing Wang, Muhammad Noman Waleed, Yulin Wei, Xianping Fu
Capturing high-quality photos in an underwater environment is difficult, as light attenuation, color distortion, and reduced contrast pose significant challenges. One fact that is usually ignored is the non-uniform texture degradation in distorted images; the loss of fine textures in underwater images hinders object detection and recognition. To address this problem, we introduce an image enhancement model called scene adaptive color compensation and multi-weight fusion, which extracts fine textural details under diverse environments and enhances the overall quality of underwater imagery. Our method blends three input images derived from an adaptive color-compensated and color-corrected version of the degraded image. The first two inputs address the low contrast and haze of the image, respectively, while the third extracts fine texture details at different scales and orientations. Finally, the inputs and their associated weight maps are normalized and fused through multi-weight fusion. The proposed model is tested on a distinct set of underwater imagery with varying levels of degradation and frequently outperforms state-of-the-art methods, yielding significant improvements in texture visibility, reduced color distortion, and better overall quality of the submerged images.
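The multi-weight fusion step can be sketched as a per-pixel normalized blend of the derived inputs; the choice of weight maps and the simple weighted averaging shown here are assumptions, not the paper's exact formulation.

```python
import numpy as np

def multi_weight_fusion(inputs, weight_maps, eps=1e-8):
    """Fuse derived images with per-pixel weight maps (illustrative sketch).

    inputs:      list of float images, each of shape (H, W, 3) in [0, 1].
    weight_maps: list of matching (H, W) weight maps (e.g. contrast, saturation,
                 or texture measures -- assumed choices).
    """
    stack = np.stack(inputs, axis=0)                       # (K, H, W, 3)
    w = np.stack(weight_maps, axis=0).astype(np.float64)   # (K, H, W)
    w = w / (w.sum(axis=0, keepdims=True) + eps)           # normalize per pixel
    return (stack * w[..., None]).sum(axis=0)              # weighted blend

h, wd = 4, 5
imgs = [np.random.rand(h, wd, 3) for _ in range(3)]
maps = [np.random.rand(h, wd) for _ in range(3)]
fused = multi_weight_fusion(imgs, maps)
print(fused.shape)   # (4, 5, 3)
```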
{"title":"Scene adaptive color compensation and multi-weight fusion of underwater image","authors":"Muhammad Aon, Huibing Wang, Muhammad Noman Waleed, Yulin Wei, Xianping Fu","doi":"10.1117/1.jei.33.4.043031","DOIUrl":"https://doi.org/10.1117/1.jei.33.4.043031","url":null,"abstract":"Capturing high-quality photos in an underwater atmosphere is complicated, as light attenuation, color distortion, and reduced contrast pose significant challenges. However, one fact usually ignored is the non-uniform texture degradation in distorted images. The loss of comprehensive textures in underwater images poses obstacles in object detection and recognition. To address this problem, we have introduced an image enhancement model called scene adaptive color compensation and multi-weight fusion for extracting fine textural details under diverse environments and enhancing the overall quality of the underwater imagery. Our method blends three input images derived from the adaptive color-compensating and color-corrected version of the degraded image. The first two input images are used to adjust the low contrast and dehazing of the image respectively. Similarly, the third input image is used to extract the fine texture details based on different scales and orientations of the image. Finally, the input images with their associated weight maps are normalized and fused through multi-weight fusion. The proposed model is tested on a distinct set of underwater imagery with varying levels of degradation and frequently outperformed state-of-the-art methods, producing significant improvements in texture visibility, reducing color distortion, and enhancing the overall quality of the submerged images.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"36 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141771010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Research on image segmentation effect based on denoising preprocessing
Pub Date: 2024-06-01 | DOI: 10.1117/1.jei.33.3.033033
Lu Ronghui, Tzong-Jer Chen
Our study investigates the impact of denoising preprocessing on the accuracy of image segmentation. Specifically, images with Gaussian noise were segmented using the fuzzy c-means method (FCM), local binary fitting (LBF), the adaptive active contour model coupling local and global information (EVOL_LCV), and the U-Net semantic segmentation method. These methods were then quantitatively evaluated. Subsequently, various denoising techniques, such as mean, median, Gaussian, bilateral filtering, and feed-forward denoising convolutional neural network (DnCNN), were applied to the original images, and the segmentation was performed using the methods mentioned above, followed by another round of quantitative evaluations. The two quantitative evaluations revealed that the segmentation results were clearly enhanced after denoising. Specifically, the Dice similarity coefficient of the FCM segmentation improved by 4% to 44%, LBF improved by 16%, and EVOL_LCV presented limited changes. Additionally, the U-Net network trained on denoised images attained a segmentation improvement of over 5%. The accuracy of traditional segmentation and semantic segmentation of Gaussian noise images is improved effectively using DnCNN.
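A minimal sketch of the denoising preprocessing and the Dice evaluation used in the study is shown below with OpenCV's classical filters; the kernel sizes are assumed values, and DnCNN is omitted because it requires a separately trained network.

```python
import cv2
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(pred > 0, gt > 0).sum()
    return 2.0 * inter / (np.count_nonzero(pred) + np.count_nonzero(gt) + eps)

def denoise_all(noisy: np.ndarray) -> dict:
    """Classical denoisers compared in the study (kernel sizes are assumed)."""
    return {
        "mean":      cv2.blur(noisy, (5, 5)),
        "median":    cv2.medianBlur(noisy, 5),
        "gaussian":  cv2.GaussianBlur(noisy, (5, 5), 0),
        "bilateral": cv2.bilateralFilter(noisy, 9, 75, 75),
    }

img = (np.random.rand(128, 128) * 255).astype(np.uint8)   # stand-in noisy image
for name, out in denoise_all(img).items():
    print(name, out.shape)

# Dice between a predicted and a ground-truth segmentation mask (toy masks here).
pred = np.zeros((128, 128), np.uint8); pred[20:80, 20:80] = 1
gt = np.zeros((128, 128), np.uint8); gt[30:90, 30:90] = 1
print("dice:", dice(pred, gt))
```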
{"title":"Research on image segmentation effect based on denoising preprocessing","authors":"Lu Ronghui, Tzong-Jer Chen","doi":"10.1117/1.jei.33.3.033033","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033033","url":null,"abstract":"Our study investigates the impact of denoising preprocessing on the accuracy of image segmentation. Specifically, images with Gaussian noise were segmented using the fuzzy c-means method (FCM), local binary fitting (LBF), the adaptive active contour model coupling local and global information (EVOL_LCV), and the U-Net semantic segmentation method. These methods were then quantitatively evaluated. Subsequently, various denoising techniques, such as mean, median, Gaussian, bilateral filtering, and feed-forward denoising convolutional neural network (DnCNN), were applied to the original images, and the segmentation was performed using the methods mentioned above, followed by another round of quantitative evaluations. The two quantitative evaluations revealed that the segmentation results were clearly enhanced after denoising. Specifically, the Dice similarity coefficient of the FCM segmentation improved by 4% to 44%, LBF improved by 16%, and EVOL_LCV presented limited changes. Additionally, the U-Net network trained on denoised images attained a segmentation improvement of over 5%. The accuracy of traditional segmentation and semantic segmentation of Gaussian noise images is improved effectively using DnCNN.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"32 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141518296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Special Section Guest Editorial: Quality Control by Artificial Vision VII
Pub Date: 2024-06-01 | DOI: 10.1117/1.jei.33.3.031201
Igor Jovančević, Jean-José Orteu
Guest Editors Igor Jovančević and Jean-José Orteu introduce the Special Section on Quality Control by Artificial Vision VII.
{"title":"Special Section Guest Editorial: Quality Control by Artificial Vision VII","authors":"Igor Jovančević, Jean-José Orteu","doi":"10.1117/1.jei.33.3.031201","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.031201","url":null,"abstract":"Guest Editors Igor Jovančević and Jean-José Orteu introduce the Special Section on Quality Control by Artificial Vision VII.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"19 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}