Determination of Lagrange multipliers for interframe EZBC/JP2K
Yuan Liu, John W. Woods
Signal Processing: Image Communication, Volume 118, Article 117030 | Pub Date: 2023-10-01 | DOI: 10.1016/j.image.2023.117030
Interframe EZBC/JP2K has been shown to be an effective fine-grain scalable video coding system. However, its Lagrange multiplier values for motion estimation at the multiple temporal levels are not prescribed by the codec and must be supplied by the user in the config file before the program can run. In this paper, we investigate how to select these Lagrange parameters for optimized performance. We design an iterative mechanism that lets the encoder adaptively select the Lagrange multipliers based on feedback of the closed-GOP Y-PSNR performance. Experimental results on both classic test video clips and their concatenations are presented and discussed. We also present a new analytical model for optimized Lagrange multiplier selection in terms of a target Y-PSNR.
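The iterative selection the abstract describes can be pictured with a minimal Python sketch; the multiplicative update rule, the `encode_gop` trial-encode callback, and the tolerance are hypothetical stand-ins, not the authors' actual mechanism.

```python
# Minimal sketch of feedback-driven Lagrange multiplier selection.
# `encode_gop` is a hypothetical callback standing in for the interframe
# EZBC/JP2K encoder: it encodes one closed GOP with the given per-level
# lambdas and returns the resulting Y-PSNR in dB.

def select_lambdas(encode_gop, num_levels, target_psnr,
                   lambda_init=100.0, tol=0.05, max_iter=20):
    """Adapt one lambda per temporal level until Y-PSNR meets the target."""
    lambdas = [lambda_init] * num_levels
    for _ in range(max_iter):
        psnr = encode_gop(lambdas)        # feedback from a trial encode
        error = target_psnr - psnr
        if abs(error) < tol:
            break
        # Heuristic multiplicative step toward the target Y-PSNR; the real
        # update rule would be tuned to the encoder's rate-distortion behavior.
        scale = 2.0 ** (error / 6.0)
        lambdas = [lam / scale for lam in lambdas]
    return lambdas
```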
{"title":"Determination of Lagrange multipliers for interframe EZBC/JP2K","authors":"Yuan Liu , John W. Woods","doi":"10.1016/j.image.2023.117030","DOIUrl":"https://doi.org/10.1016/j.image.2023.117030","url":null,"abstract":"<div><p><span>Interframe<span> EZBC/JP2K has been shown to be an effective fine-grain scalable video coding system. However, its </span></span>Lagrange multiplier<span><span> values for motion estimation of multiple temporal levels are not specified, and must be specified by the user in the config file in order to run the program. In this paper, we investigate how to select these </span>Lagrange parameters for optimized performance. By designing an iterative mechanism, we make it possible for the encoder to adaptively select Lagrange multipliers based on the feedback of Y-PSNR closed GOP performance. Experimental results regarding both classic test video clips and their concatenations are obtained and discussed. We also present a new analytical model for optimized Lagrange multiplier selection in terms of target Y-PSNR.</span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118 ","pages":"Article 117030"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49896212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep steerable pyramid wavelet network for unified JPEG compression artifact reduction
Yi Zhang, Damon M. Chandler, Xuanqin Mou
Signal Processing: Image Communication, Volume 118, Article 117011 | Pub Date: 2023-10-01 | DOI: 10.1016/j.image.2023.117011
Although numerous methods have been proposed to remove blocking artifacts in JPEG-compressed images, one important issue not yet well addressed is the construction of a unified model that requires no prior knowledge of the JPEG encoding parameters, operates effectively on grayscale and color images across compression levels, and occupies relatively little storage to save and run. To address this issue, we present a unified JPEG compression artifact reduction model called DSPW-Net, which employs (1) a deep steerable pyramid wavelet transform network for Y-channel restoration, and (2) the classic U-Net architecture for CbCr-channel restoration. To enable the model to work effectively on images with a wide range of compression levels, quality-factor (QF) related features extracted by the convolutional layers of a QF-estimation network are incorporated into both restoration branches. Meanwhile, recursive blocks with shared parameters are used to drastically reduce the number of model parameters, and shared-source residual learning is employed to avoid gradient vanishing/explosion during training. Extensive quantitative and qualitative results on various benchmark datasets demonstrate the effectiveness of our model compared with other state-of-the-art deblocking methods.
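A toy PyTorch sketch of the layout described above, under stated assumptions: the steerable pyramid wavelet network and the U-Net are replaced by plain convolutional blocks, and the widths, depths, and recursion count are illustrative guesses rather than the actual DSPW-Net configuration.

```python
# Toy sketch of a QF-conditioned, two-branch restorer with shared recursive blocks.
import torch
import torch.nn as nn

class RecursiveBlock(nn.Module):
    """One conv block applied several times with shared parameters."""
    def __init__(self, ch, steps=4):
        super().__init__()
        self.steps = steps
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        out = x
        for _ in range(self.steps):
            out = out + self.body(out)      # shared-parameter residual reuse
        return out

class QFConditionedRestorer(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        # Small CNN standing in for the QF-estimation network.
        self.qf_net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
        # Y branch and CbCr branch both consume the QF features.
        self.y_head = nn.Conv2d(1 + ch, ch, 3, padding=1)
        self.y_body = RecursiveBlock(ch)
        self.y_tail = nn.Conv2d(ch, 1, 3, padding=1)
        self.c_head = nn.Conv2d(2 + ch, ch, 3, padding=1)
        self.c_body = RecursiveBlock(ch)
        self.c_tail = nn.Conv2d(ch, 2, 3, padding=1)
    def forward(self, ycbcr):               # (N, 3, H, W), channels Y, Cb, Cr
        y, cbcr = ycbcr[:, :1], ycbcr[:, 1:]
        qf_feat = self.qf_net(ycbcr)
        y_out = y + self.y_tail(self.y_body(self.y_head(torch.cat([y, qf_feat], 1))))
        c_out = cbcr + self.c_tail(self.c_body(self.c_head(torch.cat([cbcr, qf_feat], 1))))
        return torch.cat([y_out, c_out], 1)

# x = torch.rand(1, 3, 64, 64); print(QFConditionedRestorer()(x).shape)
```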
{"title":"Deep steerable pyramid wavelet network for unified JPEG compression artifact reduction","authors":"Yi Zhang , Damon M. Chandler , Xuanqin Mou","doi":"10.1016/j.image.2023.117011","DOIUrl":"https://doi.org/10.1016/j.image.2023.117011","url":null,"abstract":"<div><p><span>Although numerous methods have been proposed to remove blocking artifacts in JPEG-compressed images, one important issue not well addressed so far is the construction of a unified model that requires no prior knowledge of the JPEG encoding parameters to operate effectively on different compression-level images (grayscale/color) while occupying relatively small storage space to save and run. To address this issue, in this paper, we present a unified JPEG compression artifact<span> reduction model called DSPW-Net, which employs (1) the deep steerable pyramid wavelet transform network for Y-channel restoration, and (2) the classic U-Net architecture for CbCr-channel restoration. To enable our model to work effectively on images with a wide range of compression levels, the quality factor (QF) related features extracted by the </span></span>convolutional layers in the QF-estimation network are incorporated in the two restoration branches. Meanwhile, recursive blocks with shared parameters are utilized to drastically reduce model parameters and shared-source residual learning is employed to avoid the gradient vanishing/explosion problem in training. Extensive quantitative and qualitative results tested on various benchmark datasets demonstrate the effectiveness of our model as compared with other state-of-the-art deblocking methods.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118 ","pages":"Article 117011"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49896187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Soccer line mark segmentation and classification with stochastic watershed transform
Daniel Berjón, Carlos Cuevas, Narciso García
Signal Processing: Image Communication, Volume 118, Article 117014 | Pub Date: 2023-10-01 | DOI: 10.1016/j.image.2023.117014
Augmented reality applications are beginning to change the way sports are broadcast, providing richer experiences and valuable insights to fans. The first step of augmented reality systems is camera calibration, possibly based on detecting the line markings of the playing field. Most existing proposals for line detection rely on edge detection and the Hough transform, but radial distortion and extraneous edges cause inaccurate or spurious detections of line markings. We propose a novel strategy to automatically and accurately segment and classify line markings. First, line points are segmented using a stochastic watershed transform that is robust to radial distortions, since it makes no assumptions about line straightness, and is unaffected by the presence of players or the ball. The line points are then linked into primitive structures (straight lines and ellipses) by a very efficient procedure that makes no assumptions about the number of primitives in each image. The strategy has been tested on a new public database composed of 60 annotated images from matches in five stadiums. The results show that the proposed strategy is more robust and accurate than existing approaches, achieving successful line mark detection even under challenging conditions.
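The stochastic watershed step can be illustrated with a short sketch that accumulates boundaries from repeated random-marker watersheds into a boundary-probability map; the marker and iteration counts are arbitrary, and the primitive-linking and classification stages of the paper are not reproduced.

```python
# Minimal stochastic watershed: average boundary maps over random seedings.
import numpy as np
from skimage.segmentation import watershed, find_boundaries
from skimage.filters import sobel

def stochastic_watershed(gray, n_markers=50, n_runs=100, seed=0):
    """gray: 2-D float image. Returns a boundary-probability map in [0, 1]."""
    rng = np.random.default_rng(seed)
    gradient = sobel(gray)                     # relief that the watershed floods
    prob = np.zeros_like(gray, dtype=float)
    for _ in range(n_runs):
        markers = np.zeros(gray.shape, dtype=int)
        rows = rng.integers(0, gray.shape[0], n_markers)
        cols = rng.integers(0, gray.shape[1], n_markers)
        markers[rows, cols] = np.arange(1, n_markers + 1)
        labels = watershed(gradient, markers)
        prob += find_boundaries(labels, mode='inner')
    return prob / n_runs                       # high values = stable contours
```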
{"title":"Soccer line mark segmentation and classification with stochastic watershed transform","authors":"Daniel Berjón, Carlos Cuevas, Narciso García","doi":"10.1016/j.image.2023.117014","DOIUrl":"https://doi.org/10.1016/j.image.2023.117014","url":null,"abstract":"<div><p>Augmented reality applications are beginning to change the way sports are broadcast, providing richer experiences and valuable insights to fans. The first step of augmented reality systems is camera calibration, possibly based on detecting the line markings of the playing field. Most existing proposals for line detection rely on edge detection and Hough transform, but radial distortion and extraneous edges cause inaccurate or spurious detections of line markings. We propose a novel strategy to automatically and accurately segment and classify line markings. First, line points are segmented thanks to a stochastic watershed transform that is robust to radial distortions, since it makes no assumptions about line straightness, and is unaffected by the presence of players or the ball. The line points are then linked to primitive structures (straight lines and ellipses) thanks to a very efficient procedure that makes no assumptions about the number of primitives that appear in each image. The strategy has been tested on a new and public database composed by 60 annotated images from matches in five stadiums. The results obtained have proven that the proposed strategy is more robust and accurate than existing approaches, achieving successful line mark detection even under challenging conditions.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118 ","pages":"Article 117014"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49845015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-scale deep feature fusion based sparse dictionary selection for video summarization
Xiao Wu, Mingyang Ma, Shuai Wan, Xiuxiu Han, Shaohui Mei
Signal Processing: Image Communication, Volume 118, Article 117006 | Pub Date: 2023-10-01 | DOI: 10.1016/j.image.2023.117006
The explosive growth of video data poses a series of new challenges in computer vision, and the role of video summarization (VS) is becoming increasingly prominent. Recent works have shown the effectiveness of sparse dictionary selection (SDS) based VS, which selects a representative frame set that sufficiently reconstructs a given video. Existing SDS-based VS methods use conventional handcrafted features or single-scale deep features, which can diminish their summarization performance due to the underutilization of frame feature representation. Deep learning techniques based on convolutional neural networks (CNNs) exhibit powerful capabilities across vision tasks, as the CNN provides excellent feature representation. Therefore, in this paper, a multi-scale deep feature fusion based sparse dictionary selection (MSDFF-SDS) method is proposed for VS. Specifically, the multi-scale features include features extracted directly from the last fully connected layer and global average pooling (GAP) processed features from intermediate layers; VS is then formulated as the problem of minimizing the reconstruction error using the multi-scale deep feature fusion. In our formulation, the contribution of each scale of features can be adjusted by a balance parameter, and the row-sparsity consistency of the simultaneous reconstruction coefficient is used to select as few keyframes as possible. The resulting MSDFF-SDS model is solved with an efficient greedy pursuit algorithm. Experimental results on two benchmark datasets demonstrate that the proposed MSDFF-SDS improves the F-score of keyframe-based summarization by more than 3% compared with existing SDS methods, and performs better than most deep-learning methods for skimming-based summarization.
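A minimal sketch of greedy sparse dictionary selection over per-frame feature columns, assuming a simple "add the frame that most reduces the Frobenius reconstruction error" rule; the multi-scale feature fusion, the balance parameter, and the exact greedy pursuit used by MSDFF-SDS are not reproduced.

```python
# Greedy frame selection: each step adds the frame whose inclusion in the
# dictionary best reconstructs all frames in the least-squares sense.
import numpy as np

def greedy_sds(features, num_keyframes):
    """features: (d, n) matrix with one fused feature column per frame."""
    d, n = features.shape
    selected = []
    for _ in range(num_keyframes):
        best_idx, best_err = None, np.inf
        for j in range(n):
            if j in selected:
                continue
            D = features[:, selected + [j]]              # candidate dictionary
            coef, *_ = np.linalg.lstsq(D, features, rcond=None)
            err = np.linalg.norm(features - D @ coef)    # reconstruction error
            if err < best_err:
                best_idx, best_err = j, err
        selected.append(best_idx)
    return sorted(selected)

# keyframes = greedy_sds(np.random.rand(128, 200), num_keyframes=5)
```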
{"title":"Multi-scale deep feature fusion based sparse dictionary selection for video summarization","authors":"Xiao Wu , Mingyang Ma , Shuai Wan , Xiuxiu Han , Shaohui Mei","doi":"10.1016/j.image.2023.117006","DOIUrl":"https://doi.org/10.1016/j.image.2023.117006","url":null,"abstract":"<div><p>The explosive growth of video data constitutes a series of new challenges in computer vision<span><span>, and the function of video summarization (VS) is becoming more and more prominent. Recent works have shown the effectiveness of sparse dictionary selection (SDS) based VS, which selects a representative frame set to sufficiently reconstruct a given video. Existing SDS based VS methods use conventional handcrafted features or single-scale deep features, which could diminish their summarization performance due to the underutilization of frame feature representation. Deep learning<span> techniques based on convolutional neural networks<span> (CNNs) exhibit powerful capabilities among various vision tasks, as the CNN provides excellent feature representation. Therefore, in this paper, a multi-scale deep feature fusion<span> based sparse dictionary selection (MSDFF-SDS) is proposed for VS. Specifically, multi-scale features include the directly extracted features from the last fully connected layer and the global average pooling (GAP) processed features from intermediate layers, then VS is formulated as a problem of minimizing the reconstruction error using the multi-scale deep feature fusion. In our formulation, the contribution of each scale of features can be adjusted by a balance parameter, and the row-sparsity consistency of the simultaneous reconstruction coefficient is used to select as few </span></span></span></span>keyframes as possible. The resulting MSDFF-SDS model is solved by using an efficient greedy pursuit algorithm. Experimental results on two benchmark datasets demonstrate that the proposed MSDFF-SDS improves the F-score of keyframe based summarization more than 3% compared with the existing SDS methods, and performs better than most deep-learning methods for skimming based summarization.</span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118 ","pages":"Article 117006"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49844963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Surprise-based JND estimation for perceptual quantization in H.265/HEVC codecs
Hongkui Wang, Li Yu, Hailang Yang, Haifeng Xu, Haibing Yin, Guangtao Zhai, Tianzong Li, Zhuo Kuang
Signal Processing: Image Communication, Volume 118, Article 117019 | Pub Date: 2023-10-01 | DOI: 10.1016/j.image.2023.117019
Just noticeable distortion (JND), which directly reflects perceptual redundancy, has been widely used in image and video compression. However, the human visual system (HVS) is extremely complex and its visual signal processing is not fully understood. As a result, existing JND models are not accurate enough, and the bitrate saving of JND-based perceptual compression schemes is limited. This paper presents a novel pixel-based JND model for videos and a JND-based perceptual quantization scheme for HEVC codecs. In particular, the positive and negative perceptual effects of the inter-frame difference and the motion information are analyzed and measured with an information-theoretic approach. Then, a surprise-based JND model is developed for perceptual video coding (PVC). In our PVC scheme, the frame-level perceptual quantization parameter (QP) is derived on the premise that the coding distortion is infinitely close to the estimated JND threshold. On the basis of the frame-level perceptual QP, we determine the perceptual QP for each coding unit through a perceptual adjustment function to achieve better perceptual quality. Experimental results indicate that the proposed JND model significantly outperforms existing models, and that the proposed perceptual quantization scheme improves video compression efficiency with better perceptual quality and lower coding complexity.
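One way to picture the frame-level QP derivation is to map a JND distortion budget to a QP under the textbook uniform-quantization model D ≈ Q_step²/12 and the standard HEVC relation Q_step = 2^((QP−4)/6); this is an assumed stand-in, not the paper's own derivation or its CU-level perceptual adjustment function.

```python
# Sketch: map a JND-derived distortion budget (MSE) to a frame-level HEVC QP.
import math

def qp_from_jnd_mse(jnd_mse, qp_min=0, qp_max=51):
    qstep = math.sqrt(12.0 * jnd_mse)          # distortion just at the JND
    qp = 4.0 + 6.0 * math.log2(qstep)          # HEVC: Qstep = 2^((QP - 4) / 6)
    return int(round(min(max(qp, qp_min), qp_max)))

# Example: a JND threshold of MSE = 30 maps to roughly QP 29.
```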
{"title":"Surprise-based JND estimation for perceptual quantization in H.265/HEVC codecs","authors":"Hongkui Wang , Li Yu , Hailang Yang , Haifeng Xu , Haibing Yin , Guangtao Zhai , Tianzong Li , Zhuo Kuang","doi":"10.1016/j.image.2023.117019","DOIUrl":"https://doi.org/10.1016/j.image.2023.117019","url":null,"abstract":"<div><p><span>Just noticeable distortion (JND), reflecting the perceptual redundancy directly, has been widely used in image and video compression. However, the </span>human visual system<span><span> (HVS) is extremely complex and the visual signal processing has not been fully understood, which result in existing JND models are not accurate enough and the bitrate saving of JND-based perceptual compression schemes<span> is limited. This paper presents a novel pixel-based JND model for videos and a JND-based perceptual quantization scheme for HEVC codecs. In particular, positive and negative perception effects of the inter-frame difference and the motion information are analyzed and measured with an information-theoretic approach. Then, a surprise-based JND model is developed for perceptual video coding (PVC). In our PVC scheme, the frame-level perceptual quantization parameter (QP) is derived on the premise that the coding distortion is infinitely close to the estimated JND threshold. On the basis of the frame-level perceptual QP, we determine the perceptual QP for each coding unit through a perceptual adjustment function to achieve better </span></span>perceptual quality. Experimental results indicate that the proposed JND model outperforms existing models significantly, the proposed perceptual quantization scheme improves video compression efficiency with better perceptual quality and lower coding complexity.</span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118 ","pages":"Article 117019"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49881551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joint adjustment image steganography networks
Le Zhang, Yao Lu, Tong Li, Guangming Lu
Signal Processing: Image Communication, Volume 118, Article 117022 | Pub Date: 2023-10-01 | DOI: 10.1016/j.image.2023.117022
Image steganography aims to achieve covert communication between two partners using stego images generated by hiding secret images within cover images. Deep image steganography methods have developed rapidly in this area. Such methods, however, usually generate the stego images and reveal the secret images with one-process networks and therefore lack sufficient refinement. Thus, the security and quality of stego and revealed secret images still have much room for improvement, especially for large-capacity image steganography. This paper proposes Joint Adjustment Image Steganography Networks (JAIS-Nets), containing a series of coarse-to-fine iterative adjustment processes, for image steganography. JAIS-Nets first introduce a Cross-Process Contrastive Refinement (CPCR) adjustment method, which uses cross-process contrastive information from cover-stego and secret-revealed-secret image pairs to iteratively refine the generated stego and revealed secret images, respectively. In addition, JAIS-Nets introduce a Cross-Process Multi-Scale (CPMS) adjustment method, which uses cross-process multi-scale information from cover-stego and secret-revealed-secret image pairs at different scales to directly adjust and enhance the intermediate representations of the networks. By integrating the proposed CPCR and CPMS methods, JAIS-Nets can jointly adjust the quality of the stego and revealed secret images at both the learning-process and image-scale levels. Extensive experiments demonstrate that JAIS-Nets achieve state-of-the-art performance in the security and quality of the stego and revealed secret images for both regular- and large-capacity image steganography.
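A toy hide/reveal pair with one residual-driven refinement pass gives the flavor of the coarse-to-fine adjustment idea; the network sizes are arbitrary, and the actual CPCR and CPMS modules of JAIS-Nets are not shown.

```python
# Toy hide/reveal steganography pair with a single refinement pass driven by
# the cover-stego residual (a stand-in for iterative adjustment).
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
                         nn.Conv2d(cout, cout, 3, padding=1))

class ToyStegoNet(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.hide = nn.Sequential(conv_block(6, ch), nn.Conv2d(ch, 3, 3, padding=1))
        self.refine = nn.Sequential(conv_block(6, ch), nn.Conv2d(ch, 3, 3, padding=1))
        self.reveal = nn.Sequential(conv_block(3, ch), nn.Conv2d(ch, 3, 3, padding=1))
    def forward(self, cover, secret):
        stego = self.hide(torch.cat([cover, secret], 1))
        # Refinement consumes the stego image and the cover-stego residual.
        stego = stego + self.refine(torch.cat([stego, cover - stego], 1))
        return stego, self.reveal(stego)

# cover, secret = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
# stego, revealed = ToyStegoNet()(cover, secret)
```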
{"title":"Joint adjustment image steganography networks","authors":"Le Zhang , Yao Lu , Tong Li , Guangming Lu","doi":"10.1016/j.image.2023.117022","DOIUrl":"https://doi.org/10.1016/j.image.2023.117022","url":null,"abstract":"<div><p>Image steganography aims to achieve covert communication<span><span> between two partners utilizing stego images generated by hiding </span>secret images<span> within cover images. Existing deep image steganography methods have been rapidly developed in this area. Such methods, however, usually generate the stego images and reveal the secret images using one-process networks, lacking sufficient refinement in these methods. Thus, the security and quality of stego and revealed secret images still have much room for promotion, especially for large-capacity image steganography. This paper proposes Joint Adjustment Image Steganography Networks (JAIS-Nets), containing a series of coarse-to-fine iterative adjustment processes, for image steganography. Our JAIS-Nets first proposes Cross-Process Contrastive Refinement (CPCR) adjustment method, using the cross-process contrastive information from cover-stego and secret-revealed secret image pairs, to iteratively refine the generated stego and revealed secret images, respectively. In addition, our JAIS-Nets further proposes Cross-Process Multi-Scale (CPMS) adjustment method, using the cross-process multi-scale information from different scales cover-stego and secret-revealed secret image pairs, to directly adjust and enhance the intermediate representations of the proposed JAIS-Nets. Integrating the proposed CPCR with CPMS methods, the proposed JAIS-Nets can jointly adjust the quality of the stego and revealed secret images at both the learning process and image scale levels. Extensive experiments demonstrate that our JAIS-Nets can achieve state-of-the-art performances on the security and quality of the stego and revealed secret images on both the regular and large capacity image steganography.</span></span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118 ","pages":"Article 117022"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49896214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-embedding reversible color-to-grayscale conversion with watermarking feature
Felix S.K. Yu, Yuk-Hee Chan, Kenneth K.M. Lam, Daniel P.K. Lun
Signal Processing: Image Communication, Volume 119, Article 117061 | Pub Date: 2023-09-26 | DOI: 10.1016/j.image.2023.117061
This paper presents a self-embedding reversible color-to-grayscale conversion (RCGC) algorithm that combines deep learning, vector quantization, and halftoning techniques. By decoupling the luminance information of a pixel from its chrominance information, it explicitly controls the luminance error of both the conversion outputs and their corresponding reconstructed color images. It also alleviates the burden on the deep learning network used to restore the embedded chrominance information during reconstruction of the color image. Luminance-guided chrominance quantization and checkerboard-based halftoning are introduced to encode the chrominance information to be embedded, while reference-guided inverse halftoning is proposed to restore the color image. Simulation results verify that its performance is remarkably superior to conventional state-of-the-art RCGC algorithms in various measures. For authentication, the watermark and chrominance information are embedded using context-based pixel-wise encryption and a key-based watermark-bit positioning mechanism, which makes it possible to locate tampered regions and prevent unauthorized use of the chrominance information.
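The decoupling and vector-quantization steps can be sketched as follows, assuming plain k-means over CbCr samples; the luminance-guided quantization, checkerboard halftoning, embedding, and watermarking of the actual algorithm are omitted, and the codebook size is an arbitrary choice.

```python
# Luminance/chrominance decoupling plus vector quantization of the chroma:
# the codebook and index map are the payload a self-embedding scheme would
# hide inside the grayscale image.
import numpy as np
from sklearn.cluster import KMeans

def quantize_chrominance(ycbcr, n_codes=64, seed=0):
    """ycbcr: (H, W, 3) float array. Returns Y plane, codebook, index map."""
    h, w, _ = ycbcr.shape
    y_plane = ycbcr[:, :, 0]                    # luminance is kept exactly
    chroma = ycbcr[:, :, 1:].reshape(-1, 2)     # (H*W, 2) CbCr samples
    km = KMeans(n_clusters=n_codes, n_init=4, random_state=seed).fit(chroma)
    index_map = km.labels_.reshape(h, w)        # 64 codes fit in one byte each
    return y_plane, km.cluster_centers_, index_map

def reconstruct_color(y_plane, codebook, index_map):
    chroma = codebook[index_map]                # (H, W, 2) quantized CbCr
    return np.dstack([y_plane, chroma])
```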
{"title":"Self-embedding reversible color-to-grayscale conversion with watermarking feature","authors":"Felix S.K. Yu, Yuk-Hee Chan, Kenneth K.M. Lam, Daniel P.K. Lun","doi":"10.1016/j.image.2023.117061","DOIUrl":"https://doi.org/10.1016/j.image.2023.117061","url":null,"abstract":"<div><p>This paper presents a self-embedding reversible color-to-grayscale conversion (RCGC) algorithm that makes good use of deep learning, vector quantization, and halftoning techniques to achieve its goals. By decoupling the luminance information of a pixel from its chrominance information, it explicitly controls the luminance error of both the conversion outputs and their corresponding reconstructed color images. It can also alleviate the burden of the deep learning network used to restore the embedded chrominance information during the reconstruction of the color image. Luminance-guided chrominance quantization and checkerboard-based halftoning are introduced in the paper to encode the chrominance information to be embedded while reference-guided inverse halftoning is proposed to restore the color image. Simulation results verify that its performance is remarkably superior to conventional state-of-art RCGC algorithms in various measures. In the aspect of authentication, embedding the watermark and chrominance information is realized with context-based pixel-wise encryption and a key-based watermark bit positioning mechanism, which makes us possible to locate tampered regions and prevent unauthorized use of the chrominance information.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"119 ","pages":"Article 117061"},"PeriodicalIF":3.5,"publicationDate":"2023-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49838796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Blind quality-based pairwise ranking of contrast changed color images using deep networks
Aladine Chetouani, Muhammad Ali Qureshi, Mohamed Deriche, Azeddine Beghdadi
Signal Processing: Image Communication, Volume 121, Article 117059 | Pub Date: 2023-09-23 | DOI: 10.1016/j.image.2023.117059
Next-generation multimedia networks are expected to provide systems and applications with top Quality of Experience (QoE) to users. To this end, robust quality evaluation metrics are critical. Unfortunately, most current research focuses mainly on modeling and evaluating distortions across the pipeline of multimedia networks. While distortions are important, it is equally important to consider the effects of enhancement and other manipulations of multimedia content, especially images and videos. In contrast to most existing works dedicated to evaluating image/video quality in its traditional context, very few research efforts have been devoted to Image Quality Enhancement Assessment (IQEA) and, more specifically, Contrast Enhancement Evaluation (CEE). Our contribution fills this gap by proposing a pairwise ranking scheme for estimating and evaluating the perceptual quality of contrast-changed images (contrast-enhanced and/or contrast-distorted). We propose a novel Deep Learning-based Blind Quality pairwise Ranking scheme for Contrast-Changed (Deep-BQRCC) images. This method provides an automatic pairwise ranking of a set of contrast-changed images. The proposed framework is based on a pair of Convolutional Neural Networks (CNN) together with a saliency-based attention model and a color-difference visual map. Extensive experiments were conducted to validate the effectiveness of the proposed workflow through an ablation analysis. Different combinations of CNN models and pooling strategies were analyzed. The proposed Deep-BQRCC approach was evaluated over three dedicated publicly available datasets. The experimental results showed an increase in performance within a range of 3–10% compared to state-of-the-art IQEA measures.
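The pairwise-ranking core can be sketched with a shared scoring CNN trained under a margin ranking loss; the backbone, the saliency-based attention model, and the color-difference map of Deep-BQRCC are not included, and the layer sizes are placeholders.

```python
# A shared CNN scores each image of a contrast-changed pair; a margin ranking
# loss pushes the better image of the pair to receive the higher score.
import torch
import torch.nn as nn

class QualityScorer(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(ch, 1)
    def forward(self, x):
        return self.head(self.features(x).flatten(1)).squeeze(1)

scorer = QualityScorer()
rank_loss = nn.MarginRankingLoss(margin=0.1)
img_a, img_b = torch.rand(4, 3, 64, 64), torch.rand(4, 3, 64, 64)
label = torch.ones(4)            # +1 means img_a should rank above img_b
loss = rank_loss(scorer(img_a), scorer(img_b), label)
loss.backward()
```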
{"title":"Blind quality-based pairwise ranking of contrast changed color images using deep networks","authors":"Aladine Chetouani , Muhammad Ali Qureshi , Mohamed Deriche , Azeddine Beghdadi","doi":"10.1016/j.image.2023.117059","DOIUrl":"10.1016/j.image.2023.117059","url":null,"abstract":"<div><p><span><span><span>Next-generation multimedia networks are expected to provide systems and applications with top </span>Quality of Experience<span><span> (QoE) to users. To this end, robust quality evaluation metrics<span> are critical. Unfortunately, most current research focuses only on modeling and evaluating mainly distortions across the pipeline of multimedia networks. While distortions are important, it is also as important to consider the effects of enhancement and other manipulations of multimedia content, especially images and videos. In contrast to most existing works dedicated to evaluating image/video quality in its traditional context, very few research efforts have been devoted to Image Quality Enhancement Assessment (IQEA) and more specifically, Contrast Enhancement Evaluation (CEE). Our contribution fills this gap by proposing a pairwise ranking scheme for estimating and evaluating the </span></span>perceptual quality of image contrast change (contrast enhancement and/or contrast-distorted images) process. We propose a novel Deep Learning-based Blind Quality pairwise Ranking scheme for Contrast-Changed (Deep-BQRCC) images. This method provides an automatic pairwise ranking of a set of contrast-changed images. The proposed framework is based on using a pair of </span></span>Convolutional Neural Networks (CNN) together with a saliency-based attention model and a color-difference visual map. Extensive experiments were conducted to validate the effectiveness of the proposed workflow through an ablation analysis. Different combinations of CNN models and pooling strategies were analyzed. The proposed Deep-BQRCC approach was evaluated over three dedicated publicly available datasets. The experimental results showed an increase in performance within a range of </span><span><math><mrow><mn>3</mn><mtext>–</mtext><mn>10</mn></mrow></math></span>% compared to state-of-the-art IQEA measures.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"121 ","pages":"Article 117059"},"PeriodicalIF":3.5,"publicationDate":"2023-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135484422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Co-occurrence spatial–temporal model for adaptive background initialization in high-dynamic complex scenes
Wenjun Zhou, Yuheng Deng, Bo Peng, Sheng Xiang, Shun’ichi Kaneko
Signal Processing: Image Communication, Volume 119, Article 117056 | Pub Date: 2023-09-20 | DOI: 10.1016/j.image.2023.117056
Background information is an important aspect of pre-processing for advanced applications in computer vision, and the literature has made rapid progress in background initialization. However, background initialization still struggles in high-dynamic complex scenes involving illumination change, background motion, or camera jitter. Therefore, this study presents a novel Co-occurrence Spatial–Temporal (CoST) model for background initialization in high-dynamic complex scenes. CoST builds a spatial–temporal model through a co-occurrence pixel-block structure. The proposed approach extracts the spatial–temporal information of pixels to self-adaptively generate the background without being affected by high-dynamic complex scenes. The effectiveness of CoST is verified through experimental comparisons with state-of-the-art algorithms. The source code of CoST is available online at: https://github.com/HelloMrDeng/CoST.git.
{"title":"Co-occurrence spatial–temporal model for adaptive background initialization in high-dynamic complex scenes","authors":"Wenjun Zhou , Yuheng Deng , Bo Peng , Sheng Xiang , Shun’ichi Kaneko","doi":"10.1016/j.image.2023.117056","DOIUrl":"https://doi.org/10.1016/j.image.2023.117056","url":null,"abstract":"<div><p><span>Background information is an important aspect of pre-processing for advanced applications in computer vision<span>. The literature has made rapid progress in background initialization. However, background initialization still suffers from high-dynamic complex scenes, such as illumination change, background motion, or camera jitter. Therefore, this study presents a novel Co-occurrence Spatial–Temporal (CoST) model for background initialization in high-dynamic complex scenes. CoST achieves a spatial–temporal model through a co-occurrence pixel-block structure. The proposed approach extracts the spatial–temporal information of pixels to self-adaptively generate the background without the influence of high-dynamic complex scenes. The efficiency of CoST is verified through experimental results compared with state-of-the-art algorithms. The source code of CoST is available online at: </span></span><span>https://github.com/HelloMrDeng/CoST.git</span><svg><path></path></svg>.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"119 ","pages":"Article 117056"},"PeriodicalIF":3.5,"publicationDate":"2023-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49838801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-operator Image Retargeting based on Saliency Object Ranking and Similarity Evaluation Metric
Yingchun Guo, Dan Wang, Ye Zhu, Gang Yan
Signal Processing: Image Communication, Volume 119, Article 117063 | Pub Date: 2023-09-19 | DOI: 10.1016/j.image.2023.117063
Image Retargeting (IR) technology flexibly displays images on various display devices while keeping their important content undistorted. IR methods mainly use Salient Object Detection (SOD) to obtain important content; however, most existing SOD methods assign the same saliency degree to multiple salient objects, which makes IR methods assign the same retargeting ratios to different objects and leads to retargeted results with information loss. Multi-operator IR generalizes better than single-operator IR by searching for the optimal sequence of operators, but its tremendous processing time limits its practical use. To address these problems, we propose a multi-operator IR method based on Salient Object Ranking (SOR) and a Similarity Evaluation Metric (SORSEM-IR), which includes two stages: importance map generation and multi-operator IR. In the first stage, a SOR module with Context-aware Semantic Refinement (SORCSR) is proposed, which extracts the salient instances and infers their saliency ranks with a context-aware semantic refinement module; the SOR map, face map, and gradient map are then fused into the importance map. In the second stage, to speed up the multiple operations, a similarity evaluation metric is proposed to measure the similarity between the original image and the seam-removal image produced by the Seam Carving (SC) operation, and SC is switched to uniform scaling to meet the target aspect ratio once the distortion caused by SC reaches a certain extent. Experimental results show that the SORCSR network achieves state-of-the-art performance on the ASSR dataset both subjectively and objectively, and that SORSEM-IR guided by SORCSR not only protects the salient objects with minimum deformation but also meets human aesthetic perception.
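The operator-switching idea can be sketched as follows, assuming a plain gradient-magnitude energy in place of the fused importance map, SSIM as a stand-in similarity metric, float RGB inputs in [0, 1], and an arbitrary similarity threshold.

```python
# Remove vertical seams while the result stays similar enough to the original,
# then switch to uniform scaling for the remaining width reduction.
import numpy as np
from skimage.color import rgb2gray
from skimage.metrics import structural_similarity
from skimage.transform import resize

def remove_one_seam(img):
    gray = rgb2gray(img)
    energy = np.abs(np.gradient(gray, axis=0)) + np.abs(np.gradient(gray, axis=1))
    h, w = energy.shape
    cost = energy.copy()
    for i in range(1, h):                       # dynamic-programming pass
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 2, w)
            cost[i, j] += cost[i - 1, lo:hi].min()
    seam = np.zeros(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):              # backtrack the cheapest seam
        lo, hi = max(seam[i + 1] - 1, 0), min(seam[i + 1] + 2, w)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    keep = np.ones((h, w), dtype=bool)
    keep[np.arange(h), seam] = False
    return img[keep].reshape(h, w - 1, img.shape[2])

def retarget_width(img, target_w, sim_threshold=0.85):
    """img: (H, W, 3) float RGB in [0, 1]."""
    original, out = img, img
    while out.shape[1] > target_w:
        candidate = remove_one_seam(out)
        ref = resize(original, candidate.shape, anti_aliasing=True)
        if structural_similarity(ref, candidate, channel_axis=2,
                                 data_range=1.0) < sim_threshold:
            break                               # too much distortion: stop carving
        out = candidate
    return resize(out, (img.shape[0], target_w), anti_aliasing=True)
```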
{"title":"Multi-operator Image Retargeting based on Saliency Object Ranking and Similarity Evaluation Metric","authors":"Yingchun Guo, Dan Wang, Ye Zhu, Gang Yan","doi":"10.1016/j.image.2023.117063","DOIUrl":"https://doi.org/10.1016/j.image.2023.117063","url":null,"abstract":"<div><p>Image Retargeting (IR) technology is proposed to flexibly display images on various display devices while protecting the important content of the images undistorted. IR methods mainly use Salient Object Detection (SOD) to obtain important content, however, most existing SOD methods treat multiple salient objects with the same saliency degrees, which makes IR methods assign the same retargeting ratios for different objects and leads to producing information-loss retargeted results. Multi-operator IR demonstrates better generalization than single operator by using multiple operators to find the optimal sequence of operators. Meanwhile, the tremendous processing time limits its practical use. To address these problems, we propose a multi-operator IR method based on Salient Object Ranking (SOR) and Similarity Evaluation Metric<span> (SORSEM-IR), which includes two stages: importance map generation and multi-operator IR. In the first stage, a SOR module with Context-aware Semantic Refinement (SORCSR) is proposed, which extracts the salient instances and infers their saliency ranks with a context-aware semantic refinement module, then the SOR map, face map, and gradient map are fused as the importance map. In the second stage, to speed up multiple operations, a similarity evaluation metric is proposed to measure the similarity between the original image and the seam-removal image by Seam Carving (SC) operation, and switch SC to uniform scaling to meet the aspect ratio when distortion caused by SC arrives at a certain extent. Experimental results show that the SORCSR network achieves state-of-the-art performance on the ASSR dataset subjectively and objectively, and the SORSEM-IR guided by SORCSR can not only protect the salient objects with minimum deformation but also meet human aesthetic perception.</span></p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"119 ","pages":"Article 117063"},"PeriodicalIF":3.5,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49838794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}