Anomaly detection in multimedia datasets is a widely studied area, yet the challenge of concept drift has been ignored or poorly handled by most anomaly detection frameworks. State-of-the-art approaches assume that the data distribution at training and deployment time will be the same. However, due to various real-life environmental factors, the data distribution may drift over time, or samples may drift from one class to another. Thus, a model trained once might not perform adequately later. In this paper, we systematically investigate the effect of concept drift on various detection models and propose a modified Adaptive Gaussian Mixture Model (AGMM) based framework for anomaly detection in multimedia data. In contrast to the baseline AGMM, the proposed extension remembers the past for a longer period in order to handle drift better. Extensive experimental analysis shows that the proposed model handles drift in the data better than the baseline AGMM. Further, to facilitate research and comparison with the proposed framework, we contribute three multimedia datasets consisting of face samples. The face samples of each individual span an age difference of more than ten years to incorporate a longer temporal context.
{"title":"Concept drift challenge in multimedia anomaly detection: A case study with facial datasets","authors":"Pratibha Kumari , Priyankar Choudhary , Vinit Kujur , Pradeep K. Atrey , Mukesh Saini","doi":"10.1016/j.image.2024.117100","DOIUrl":"10.1016/j.image.2024.117100","url":null,"abstract":"<div><p>Anomaly detection<span> in multimedia datasets is a widely studied area. Yet, the concept drift challenge in data has been ignored or poorly handled by the majority of the anomaly detection frameworks. The state-of-the-art approaches assume that the data distribution at training and deployment time will be the same. However, due to various real-life environmental factors, the data may encounter drift in its distribution or can drift from one class to another in the late future. Thus, a one-time trained model might not perform adequately. In this paper, we systematically investigate the effect of concept drift on various detection models and propose a modified Adaptive Gaussian Mixture Model (AGMM) based framework for anomaly detection in multimedia data. In contrast to the baseline AGMM, the proposed extension of AGMM remembers the past for a longer period in order to handle the drift better. Extensive experimental analysis shows that the proposed model better handles the drift in data as compared with the baseline AGMM. Further, to facilitate research and comparison with the proposed framework, we contribute three multimedia datasets constituting faces as samples. The face samples of individuals correspond to the age difference of more than ten years to incorporate a longer temporal context.</span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"123 ","pages":"Article 117100"},"PeriodicalIF":3.5,"publicationDate":"2024-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139423693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-08 | DOI: 10.1016/j.image.2024.117101
Qi Zheng, Zhengzhong Tu, Pavan C. Madhusudana, Xiaoyang Zeng, Alan C. Bovik, Yibo Fan
Video quality assessment (VQA) remains an important and challenging problem that affects many applications at the widest scales. Recent advances in mobile devices and cloud computing techniques have made it possible to capture, process, and share high resolution, high frame rate (HFR) videos across the Internet nearly instantaneously. Being able to monitor and control the quality of these streamed videos enables the delivery of more enjoyable content and perceptually optimized rate control. Accordingly, there is a pressing need for VQA models that can be deployed at enormous scales. While some recent efforts have been applied to full-reference (FR) analysis of variable frame rate and HFR video quality, the development of no-reference (NR) VQA algorithms targeting frame rate variations has been little studied. Here, we propose a first-of-a-kind blind VQA model for evaluating HFR videos, which we dub the Framerate-Aware Video Evaluator w/o Reference (FAVER). FAVER uses extended models of spatial natural scene statistics that encompass space–time wavelet-decomposed video signals, and leverages deep neural network features that provide motion perception, to conduct efficient frame rate sensitive quality prediction. Our extensive experiments on several HFR video quality datasets show that FAVER outperforms other blind VQA algorithms at a reasonable computational cost. To facilitate reproducible research and public evaluation, an implementation of FAVER is freely available online: https://github.com/uniqzheng/HFR-BVQA.
{"title":"FAVER: Blind quality prediction of variable frame rate videos","authors":"Qi Zheng , Zhengzhong Tu , Pavan C. Madhusudana , Xiaoyang Zeng , Alan C. Bovik , Yibo Fan","doi":"10.1016/j.image.2024.117101","DOIUrl":"10.1016/j.image.2024.117101","url":null,"abstract":"<div><p><span><span>Video quality assessment (VQA) remains an important and challenging problem that affects many applications at the widest scales. Recent advances in mobile devices<span><span> and cloud computing techniques have made it possible to capture, process, and share high resolution, high frame rate (HFR) videos across the Internet nearly instantaneously. Being able to monitor and control the quality of these streamed videos can enable the delivery of more enjoyable content and perceptually optimized rate control. Accordingly, there is a pressing need to develop VQA models that can be deployed at enormous scales. While some recent effects have been applied to full-reference (FR) analysis of variable frame rate and HFR video quality, the development of no-reference (NR) VQA algorithms targeting frame rate variations has been little studied. Here, we propose a first-of-a-kind blind VQA model for evaluating HFR videos, which we dub the Framerate-Aware Video </span>Evaluator w/o Reference (FAVER). FAVER uses extended models of spatial natural scene statistics that encompass space–time wavelet-decomposed video signals, and leverages the advantages of the </span></span>deep neural network to provide motion perception, to conduct efficient frame rate sensitive quality prediction. Our extensive experiments on several HFR video quality datasets show that FAVER outperforms other blind VQA algorithms at a reasonable computational cost. To facilitate reproducible research and public evaluation, an implementation of FAVER is being made freely available online: </span><span>https://github.com/uniqzheng/HFR-BVQA</span><svg><path></path></svg>.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"122 ","pages":"Article 117101"},"PeriodicalIF":3.5,"publicationDate":"2024-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139422016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-06 | DOI: 10.1016/j.image.2023.117088
Omar Sallam, Rihui Feng, Jack Stason, Xinguo Wang, Mirjam Fürth
Using computer vision techniques such as stereo vision systems for sea-state measurement or for offshore structure monitoring can improve measurement fidelity and accuracy at no significant additional cost. In this paper, two experiments (in-lab and open-sea) are conducted to study the performance of a stereo vision system in measuring water wave surface elevation and rigid body heaving motion. For the in-lab experiment, regular water waves are generated in a wave tank at different frequencies and wave heights, and the water surface is scanned by a stereo vision camera installed on top of the tank. The surface elevation inferred by stereo vision is verified against a stationary side camera that records the water surface through the tank's transparent side window; the surface elevation in the side-camera recordings is extracted with an edge detection algorithm. During the in-lab experiment, a heaving buoy is also installed to test the performance of a Visual Simultaneous Localization and Mapping (VSLAM) algorithm in monitoring the buoy's heave motion. The VSLAM algorithm fuses the buoy's onboard stereo vision recordings with an embedded Inertial Measurement Unit (IMU) to estimate the 6-DOF motion of a rigid body. The buoy motion estimated by VSLAM is verified with a KLT tracking algorithm applied to the video recordings of the stationary side camera. The open-sea experiment is conducted in Lake Somerville, Texas, where the stereo vision system is installed to measure the water surface elevation and the directional spectrum of wind-generated irregular waves. The open-sea wave measurements from stereo vision are verified against Sofar commercial wave buoys deployed at the testing location.
{"title":"Stereo vision based systems for sea-state measurement and floating structures monitoring","authors":"Omar Sallam, Rihui Feng, Jack Stason, Xinguo Wang, Mirjam Fürth","doi":"10.1016/j.image.2023.117088","DOIUrl":"10.1016/j.image.2023.117088","url":null,"abstract":"<div><p><span>Using computer vision<span> techniques such as stereo vision systems for sea state measurement or for </span></span>offshore structures<span><span> monitoring can improve the measurement fidelity<span> and accuracy with no significant additional cost. In this paper, two experiments (in-lab/open-sea) are conducted to study the performance of stereo vision system to measure the water wave surface elevation and rigid body heaving motion. For the in-lab experiment, regular water waves are generated in a wave tank for different frequencies and wave heights, where the water surface is scanned by the stereo vision camera installed on the top of the tank. Surface elevation inferred by the stereo vision is verified by an installed stationary side camera that records the water surface through the tank transparent side window, water surface elevation measured by the side camera recordings is extracted using edge detection algorithm. During the in-lab experiment a heaving buoy is installed to test the performance of Visual Simultaneous </span></span>Localization<span> and Mapping (VSLAM) algorithm to monitor the buoy heave motion. The VSLAM algorithm fuses a buoy onboard stereo vision recordings with an embedded Inertial Measurement Unit<span> (IMU) to estimate the 6-DOF of a rigid body. The Buoy motion VSLAM measurements are verified by a KLT tracking algorithm implemented on the video recordings of the stationary side camera. The open-sea experiment is implemented in Lake Somerville, Texas. The stereo vision system is installed to measure the water surface elevation and directional spectrum of the wind generated irregular waves. The open-sea wave measurements by the stereo vision are verified by a Sofar commercial wave buoys deployed in the testing location.</span></span></span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"122 ","pages":"Article 117088"},"PeriodicalIF":3.5,"publicationDate":"2024-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139374052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-02 | DOI: 10.1016/j.image.2023.117099
Huaping Zhou, Tao Wu, Senmao Ye, Xinru Qin, Kelei Sun
Synthesizing images with fine details from text descriptions is a challenge. Existing single-stage generative adversarial networks (GANs) fuse sentence features into the image generation process through affine transformation, which alleviates the problems of missing details and the heavy computation of stacked networks. However, existing single-stage networks ignore the word features in the text description, resulting in a lack of detail in the generated image. To address this issue, we propose a text aggregation module (TAM) that fuses the sentence features and word features of a text through a simple spatial attention mechanism. We then build a text connection fusion (TCF) block, consisting mainly of a gated recurrent unit (GRU) and up-sampling blocks, which connects the text features used in the up-sampling blocks to improve text utilization. Besides, to further improve the semantic consistency between the text and the generated images, we introduce the deep attentional multimodal similarity model (DAMSM) loss, which measures the similarity between text and image and improves semantic consistency. Experimental results show that our method is superior to state-of-the-art models on the CUB and COCO datasets, regarding both image fidelity and semantic consistency with the text.
{"title":"Enhancing fine-detail image synthesis from text descriptions by text aggregation and connection fusion module","authors":"Huaping Zhou , Tao Wu , Senmao Ye , Xinru Qin , Kelei Sun","doi":"10.1016/j.image.2023.117099","DOIUrl":"10.1016/j.image.2023.117099","url":null,"abstract":"<div><p><span><span>Synthesizing images with fine details from text descriptions is a challenge. The existing single-stage generative adversarial networks<span> (GANs) fuse sentence features into the image generation process through affine transformation, which alleviate the problems of missing details and large computation from stacked networks. However, existing single-stage networks ignore the word features in the text description, resulting in a lack of detail in the generated image. To address this issue, we proposed a text aggregation module (TAM) to fuse sentence features and word features in a text by a simple spatial </span></span>attention mechanism. Then we built a text connection fusion (TCF) block consisting mainly of gated </span>recurrent<span> unit (GRU) and up-sampled block. It can connect text features used in the up-sampled blocks to improve text utilization. Besides, to further improve the semantic consistency between text and the generated images, we introduce the deep attentional multimodal similarity model (DAMSM) loss, which monitors the similarity between text and improves semantic consistency. Experimental results prove that our method is superior to the state-of-the-art models on the CUB and COCO datasets, regarding both image fidelity and semantic consistency with the text.</span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"122 ","pages":"Article 117099"},"PeriodicalIF":3.5,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139093167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-12-29 | DOI: 10.1016/j.image.2023.117089
Nofre Sanmartin-Vich, Javier Calpe, Filiberto Pla
Continuous-wave indirect Time-of-Flight cameras obtain depth images by emitting a modulated continuous light wave and measuring the delay of the received signal. In this paper we generalize the estimation of the effect of shot noise on the recovered phase delay to an arbitrary number of points in the Discrete Fourier Transform, extending and generalizing the analysis done in previous works for the case of four points. For that particular case, we compare our analysis with the state of the art. Moreover, we extend the error model using a second-order approximation in the error propagation analysis, which provides more accurate estimates according to Monte Carlo simulation experiments. The analysis, based on both analytical and numerical methods, shows that the phase error is, in general, related to the exposure time and only weakly to the number of points in the Discrete Fourier Transform. It also depends on the background illumination level, on the amplitude of the received signal, and, when using a three-point DFT, on the distance to the objects.
{"title":"Analyzing the effect of shot noise in indirect Time-of-Flight cameras","authors":"Nofre Sanmartin-Vich , Javier Calpe , Filiberto Pla","doi":"10.1016/j.image.2023.117089","DOIUrl":"10.1016/j.image.2023.117089","url":null,"abstract":"<div><p>Continuous wave indirect Time-of-Flight cameras obtain depth images by emitting a modulated continuous light wave and measuring the delay of the received signal. In this paper we generalize the estimation of the effect of the shot noise when obtaining the phase delay with an arbitrary number of points in the Discrete Fourier Transform<span>, extending and generalizing the analysis done in previous works for the case of four points. For that particular case, we compare our analysis with the state of art. Moreover, we extend the error model using a second order approximation in the error propagation analysis, which provides more accurate estimations according to the Montecarlo simulation experiments. The analysis, based on both analytical and numerical methods, shows that the phase error is, in general, related to the exposure time and weakly to the number of points in the Discrete Fourier Transform. It also depends on the background illumination level, on the amplitude of the received signal, and, when using a three point DFT, on the distance to the objects.</span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"122 ","pages":"Article 117089"},"PeriodicalIF":3.5,"publicationDate":"2023-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139065281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-12-05 | DOI: 10.1016/j.image.2023.117086
Ali Fahmi Jafargholkhanloo, Mousa Shamsi
Localization of facial landmarks plays an important role in the measurement of facial metrics applicable to beauty analysis and facial plastic surgery. The first step in detecting facial landmarks is to estimate the face bounding box. Clinical images of patients' faces usually show intensity non-uniformity, and under such varying illumination common face detection algorithms do not perform well. To solve this problem, a modified fuzzy c-means (MFCM) algorithm is used together with varying-illumination modeling. The cascade regression method (CRM) performs well in face alignment, but it has two main drawbacks. (1) In the training phase, increasing the real data without considering normal data can lead to over-fitting; to solve this problem, a weighted CRM (WCRM) is presented. (2) In the test phase, using a mean shape causes the initial shape to be either near to or far from the true face shape; to overcome this problem, a Procrustes-based analysis is presented. One of the most important steps in facial landmark localization is feature extraction. In this study, to increase the detection accuracy of the cephalometric landmarks, local phase quantization (LPQ) is used for feature extraction in all three channels of the RGB color space. Finally, the proposed algorithm is used to measure facial anthropometric metrics. Experimental results show that the proposed algorithm achieves better facial landmark localization than the compared algorithms.
{"title":"Quantitative analysis of facial soft tissue using weighted cascade regression model applicable for facial plastic surgery","authors":"Ali Fahmi Jafargholkhanloo, Mousa Shamsi","doi":"10.1016/j.image.2023.117086","DOIUrl":"10.1016/j.image.2023.117086","url":null,"abstract":"<div><p>Localization of facial landmarks plays an important role in the measurement of facial metrics applicable for beauty analysis and facial plastic surgery. The first step in detecting facial landmarks is to estimate the face bounding box. Clinical images of patients' faces usually show intensity non-uniformity. These conditions cause common face detection algorithms do not perform well in face detection under varying illumination. To solve this problem, a modified fuzzy c-means (MFCM) algorithm is used under varying illumination modeling. The cascade regression method (CRM) has an appropriate performance in face alignment. This algorithm has two main drawbacks. (1) In the training phase, increasing the real data without considering normal data can lead to over-fitting. To solve this problem, a weighted CRM (WCRM) is presented. (2) In the test phase, using a mean shape causes the initial shape to be either near to or far from the face shape. To overcome this problem, a Procrustes-based analysis is presented. One of the most important steps in facial landmark localization is feature extraction. In this study, to increase detection accuracy of the cephalometric landmarks, local phase quantization (LPQ) is used for feature extraction in all three channels of RGB color space. Finally, the proposed algorithm is used to measure facial anthropometric metrics. Experimental results show that the proposed algorithm has a better performance in facial landmark localization than other compared algorithms.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"121 ","pages":"Article 117086"},"PeriodicalIF":3.5,"publicationDate":"2023-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138547351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-11-28 | DOI: 10.1016/j.image.2023.117064
Hui Lan, Cheolkon Jung
Although high resolution (HR) depth images are required in many applications such as virtual reality and autonomous navigation, the resolution and quality of depth maps generated by consumer depth cameras fall short of the requirements. Existing depth upsampling methods focus on extracting multiscale features of the HR color image to guide low resolution (LR) depth upsampling, thus causing blurry and inaccurate edges in depth. In this paper, we propose a depth super-resolution (SR) network guided by blurry depth and clear intensity edges, called DSRNet. DSRNet differentiates effective edges from the many HR edges with the guidance of blurry depth and clear intensity edges. First, we perform global residual estimation based on an encoder–decoder architecture to extract edge structure from the HR color image for depth SR. Then, we distinguish effective edges from HR edges on the decoder side with the guidance of LR depth upsampling. To maintain edges for depth SR, we use intensity edge guidance that extracts clear intensity edges from the HR image. Finally, we use a residual loss to generate an accurate high frequency (HF) residual and reconstruct HR depth maps. Experimental results show that DSRNet successfully reconstructs depth edges in SR results and outperforms state-of-the-art methods in terms of visual quality and quantitative measurements.
{"title":"DSRNet: Depth Super-Resolution Network guided by blurry depth and clear intensity edges","authors":"Hui Lan, Cheolkon Jung","doi":"10.1016/j.image.2023.117064","DOIUrl":"https://doi.org/10.1016/j.image.2023.117064","url":null,"abstract":"<div><p><span><span>Although high resolution (HR) depth images are required in many applications such as virtual reality and autonomous navigation<span>, their resolution and quality generated by consumer depth cameras fall short of the requirements. Existing depth upsampling methods focus on extracting multiscale features of HR color image to guide low resolution (LR) depth upsampling, thus causing blurry and inaccurate edges in depth. In this paper, we propose a depth super-resolution (SR) network guided by blurry depth and clear intensity edges, called DSRNet. DSRNet differentiates effective edges from a number of HR edges with the guidance of blurry depth and clear intensity edges. First, we perform global residual estimation based on an encoder–decoder architecture to extract edge structure from HR color image for depth SR. Then, we distinguish effective edges from HR edges in the decoder side with the guidance of LR depth upsampling. To maintain edges for depth SR, we use intensity edge guidance that extracts clear intensity edges from HR image. Finally, we use residual loss to generate accurate high frequency (HF) residual and reconstruct HR depth maps. Experimental results show that DSRNet successfully reconstructs depth edges in SR results as well as outperforms the state-of-the-art methods in terms of visual quality and </span></span>quantitative measurements.</span><span><sup>1</sup></span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"121 ","pages":"Article 117064"},"PeriodicalIF":3.5,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138490174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-11-12 | DOI: 10.1016/j.image.2023.117077
Zhiyu Lyu, Yan Chen, Haojun Sun, Yimin Hou
Blind image denoising and edge preservation are two primary challenges in recovering an image from low-level vision for high-level vision. Blind denoising requires that a single denoiser can denoise images with any intensity of noise, which has practical utility since accurate noise levels cannot be acquired from realistic images. On the other hand, edge preservation retains more image features for subsequent processing, which is also important for denoising. In this paper, we propose a novel blind universal image denoiser that removes synthetic and realistic noise while preserving image texture. The denoiser consists of a noise network and a prior network in parallel, and a fusion block is then used to weight the two networks to balance computation cost and denoising performance. We also use the Non-subsampled Shearlet Transform (NSST) to enlarge the receptive field and obtain more detailed information. Extensive denoising experiments on synthetic and realistic images show the effectiveness of our denoiser.
{"title":"A dual fusion deep convolutional network for blind universal image denoising","authors":"Zhiyu Lyu, Yan Chen, Haojun Sun, Yimin Hou","doi":"10.1016/j.image.2023.117077","DOIUrl":"https://doi.org/10.1016/j.image.2023.117077","url":null,"abstract":"<div><p><span>Blind image denoising and edge-preserving are two primary challenges to recover an image from low-level vision to high-level vision. Blind denoising requires a single denoiser can denoise images with any intensity of noise, and it has practical utility since accurate noise levels cannot be acquired from realistic images. On the other hand, </span>edge preservation<span><span> can provide more image features for subsequent processing which is also important for the denoising. In this paper, we propose a novel blind universal image denoiser to remove synthesis and realistic noise while preserving the image texture. The denoiser consists of noise network and prior network parallelly, and then a fusion block is used to give the weight between these two networks to balance computation cost and denoising performance. We also use the Non-subsampled Shearlet Transform (NSST) to enlarge the size of receptive field to obtain more detailed information. Extensive denoising experiments on </span>synthetic images and realistic images show the effectiveness of our denoiser.</span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"120 ","pages":"Article 117077"},"PeriodicalIF":3.5,"publicationDate":"2023-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134656277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-11-04 | DOI: 10.1016/j.image.2023.117074
Vivek Sharma, Ashish Kumar Tripathi, Purva Daga, Nidhi M., Himanshu Mittal
With the advancement of technologies, automatic plant leaf disease detection has received considerable attention from researchers working in the area of precision agriculture. A number of deep learning-based methods have been introduced in the literature for automated plant disease detection. However, the majority of datasets collected from real fields have blurred background information, data imbalance, limited generalization, and tiny lesion features, which may lead to over-fitting of the model. Moreover, the large parameter size of deep learning models is also a concern, especially for agricultural applications with limited resources. In this paper, a novel ClGan (Crop Leaf GAN) with an improved loss function has been developed with a reduced number of parameters compared to existing state-of-the-art methods. The generator and discriminator of the developed ClGan incorporate an encoder–decoder network to avoid the vanishing gradient problem, training instability, and non-convergence failure while preserving complex intricacies during synthetic image generation with significant lesion differentiation. The proposed improved loss function introduces a dynamic correction factor that stabilizes learning while perpetuating effective weight optimization. In addition, a novel plant leaf classification method, ClGanNet, has been introduced to classify plant diseases efficiently. The efficiency of the proposed ClGan was validated on the maize leaf dataset in terms of the number of parameters and FID score, and the results are compared against five other state-of-the-art GAN models, namely DC-GAN, W-GAN, WGan-GP, InfoGan, and LeafGan. Moreover, the performance of the proposed classifier, ClGanNet, was evaluated against seven state-of-the-art methods on eight parameters using the original, basic augmented, and ClGan augmented datasets. Experimental results show that ClGanNet outperformed all the considered methods with 99.97% training and 99.04% testing accuracy while using the fewest parameters.
{"title":"ClGanNet: A novel method for maize leaf disease identification using ClGan and deep CNN","authors":"Vivek Sharma , Ashish Kumar Tripathi , Purva Daga , Nidhi M. , Himanshu Mittal","doi":"10.1016/j.image.2023.117074","DOIUrl":"https://doi.org/10.1016/j.image.2023.117074","url":null,"abstract":"<div><p>With the advancement of technologies, automatic plant leaf disease detection has received considerable attention from researchers working in the area of precision agriculture. A number of deep learning-based methods have been introduced in the literature for automated plant disease detection. However, the majority of datasets collected from real fields have blurred background information, data imbalances, less generalization, and tiny lesion features, which may lead to over-fitting of the model. Moreover, the increased parameter size of deep learning models is also a concern, especially for agricultural applications due to limited resources. In this paper, a novel ClGan (Crop Leaf Gan) with improved loss function has been developed with a reduced number of parameters as compared to the existing state-of-the-art methods. The generator and discriminator of the developed ClGan have been encompassed with an encoder–decoder network to avoid the vanishing gradient problem, training instability, and non-convergence failure while preserving complex intricacies during synthetic image generation with significant lesion differentiation. The proposed improved loss function introduces a dynamic correction factor that stabilizes learning while perpetuating effective weight optimization. In addition, a novel plant leaf classification method ClGanNet, has been introduced to classify plant diseases efficiently. The efficiency of the proposed ClGan was validated on the maize leaf dataset in terms of the number of parameters and FID score, and the results are compared against five other state-of-the-art GAN models namely, DC-GAN, W-GAN, <span><math><mrow><mi>W</mi><mi>G</mi><mi>a</mi><msub><mrow><mi>n</mi></mrow><mrow><mi>G</mi><mi>P</mi></mrow></msub></mrow></math></span>, InfoGan, and LeafGan. Moreover, the performance of the proposed classifier, ClGanNet, was evaluated with seven state-of-the-art methods against eight parameters on the original, basic augmented, and ClGan augmented datasets. Experimental results of ClGanNet have outperformed all the considered methods with 99.97% training and 99.04% testing accuracy while using the least number of parameters.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"120 ","pages":"Article 117074"},"PeriodicalIF":3.5,"publicationDate":"2023-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91987222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-10-26 | DOI: 10.1016/j.image.2023.117075
Xueyu Han, Ishtiaq Rasool Khan, Susanto Rahardja
Natural scenes generally have a very high dynamic range (HDR) that cannot be captured in standard dynamic range (SDR) images. HDR imaging techniques can capture these details in both dark and bright regions, and the resulting HDR images can be tone mapped to reproduce them on SDR displays. To adapt to different applications, a tone mapping operator (TMO) should achieve high performance across diverse HDR scenes. In this paper, we present a clustering-based TMO that embeds human visual system models that function effectively in different scenes. A hierarchical clustering scheme is applied to reduce the computational complexity. We also propose a detail preservation method that superimposes the details of the original HDR images to enhance local contrast, and a color preservation method that limits the adaptive saturation parameter to control color saturation attenuation. The effectiveness of our method is assessed by comparing it with state-of-the-art TMOs quantitatively on large-scale HDR datasets and qualitatively with a group of subjects. Experimental results of both objective and subjective evaluations show that the proposed method improves over the competing methods in generating high quality tone-mapped images with good contrast and natural color appearance for diverse HDR scenes.
{"title":"Image tone mapping based on clustering and human visual system models","authors":"Xueyu Han , Ishtiaq Rasool Khan , Susanto Rahardja","doi":"10.1016/j.image.2023.117075","DOIUrl":"10.1016/j.image.2023.117075","url":null,"abstract":"<div><p><span><span>Natural scenes generally have very high dynamic range (HDR) which cannot be captured in the standard dynamic range (SDR) images. HDR imaging techniques can be used to capture these details in both dark and bright regions, and the resultant HDR images can be tone mapped to reproduce them on SDR displays. To adapt to different applications, the tone mapping operator (TMO) should be able to achieve high performance for diverse HDR scenes. In this paper, we present a clustering-based TMO by embedding </span>human visual system models that function effectively in different scenes. A hierarchical scheme is applied for clustering to reduce the </span>computational complexity<span>. We also propose a detail preservation method by superimposing the details of original HDR images to enhance local contrasts, and a color preservation method by limiting the adaptive saturation parameter to control the color saturation attenuating. The effectiveness of our method is assessed by comparing with state-of-the-art TMOs quantitatively on large-scale HDR datasets and qualitatively with a group of subjects. Experimental results of both objective and subjective evaluations show that the proposed method achieves improvements over the competing methods in generating high quality tone-mapped images with good contrast and natural color appearance for diverse HDR scenes.</span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"120 ","pages":"Article 117075"},"PeriodicalIF":3.5,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136093478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}