Journal of Visual Communication and Image Representation: Latest Articles

Optimized deep learning enabled lecture audio video summarization
IF 2.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-10-01; Volume 104, Article 104309; DOI: 10.1016/j.jvcir.2024.104309
Preet Chandan Kaur, Dr. Leena Ragha
Video summarization plays an important role in many applications by condensing lengthy video content into a compact representation. This work presents a fine-tuned deep model for lecture audio-video summarization. First, the input lecture audio-visual video is taken from the dataset. Video shot segmentation (slide segmentation) is then performed using the YCbCr colour space model. Within each video shot, the audio and video are segmented using the Honey Badger-based Bald Eagle Algorithm (HBBEA), which combines Bald Eagle Search (BES) and the Honey Badger Algorithm (HBA). HBBEA also drives the DRN training by selecting the best DRN weights. The relevant video frames are then merged with the audio. The proposed HBBEA-based DRN achieved an F1-score of 91.9%, negative predictive value (NPV) of 89.6%, positive predictive value (PPV) of 90.7%, accuracy of 91.8%, precision of 91%, and recall of 92.8%.
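The slide-segmentation step can be illustrated with a minimal sketch. It assumes full-range BT.601 (JFIF) conversion weights and a simple luma-jump rule with a hypothetical threshold; the abstract does not state which criterion the authors apply in YCbCr space, so this is only one plausible reading.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (BT.601 / JFIF weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def mean_luma(frame):
    """Average Y value of a frame given as a flat list of (r, g, b) tuples."""
    return sum(rgb_to_ycbcr(*px)[0] for px in frame) / len(frame)


def shot_boundaries(frames, threshold=30.0):
    """Return the indices i at which frame i starts a new shot.

    A boundary is declared when the mean luma changes by more than
    `threshold` between consecutive frames. The threshold is a
    hypothetical tuning value, not one taken from the paper.
    """
    lumas = [mean_luma(f) for f in frames]
    return [i for i in range(1, len(lumas))
            if abs(lumas[i] - lumas[i - 1]) > threshold]


# Toy "video": two bright slide frames followed by two dark ones.
white = [(255, 255, 255)] * 4
black = [(0, 0, 0)] * 4
print(shot_boundaries([white, white, black, black]))
```

Working on the Y channel alone keeps the sketch short; a real detector would probably also compare chroma or histograms so that slides with similar brightness are still separated.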
Citations: 0
A Transformer-based invertible neural network for robust image watermarking
IF 2.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-10-01; Volume 104, Article 104317; DOI: 10.1016/j.jvcir.2024.104317
Zhouyan He, Renzhi Hu, Jun Wu, Ting Luo, Haiyong Xu
In existing encoder-noise-decoder (END) watermarking models, the coupling between the encoder and the decoder is weak, so the encoder generally embeds redundant features into the cover image to let the decoder extract the watermark completely, which degrades watermark invisibility. To address this problem, this paper proposes a Transformer-based invertible neural network (INN) for robust image watermarking (IWFormer). To reduce redundant features, the INN framework is used for both watermark embedding and extraction, so that the encoded features are highly consistent with the features required for decoding. To enhance robustness, an affine Transformer module is designed that mines the global correlation of the cover image. In addition, because the human visual system is sensitive to low-frequency variations, a wavelet low-frequency sub-band loss guides the watermark into middle- and high-frequency components, further improving the quality of the encoded images. Experimental results demonstrate that, compared with existing state-of-the-art watermarking models, the proposed IWFormer has clear advantages in both watermark invisibility and robustness.
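To see why an invertible network ties embedding and extraction together, consider a generic affine coupling layer. The sketch below is not the IWFormer module: the functions `s` and `t` are hypothetical stand-ins for learned sub-networks, and the layout follows the common coupling-layer construction rather than anything stated in the abstract.

```python
import math


def s(x1):
    """Hypothetical scale function; a learned sub-network in a real INN."""
    return [0.5 * math.tanh(v) for v in x1]


def t(x1):
    """Hypothetical translation function; also learned in practice."""
    return [0.1 * v for v in x1]


def coupling_forward(x1, x2):
    """Embedding direction: x1 passes through, x2 is scaled and shifted."""
    y2 = [b * math.exp(sc) + tr for b, sc, tr in zip(x2, s(x1), t(x1))]
    return x1, y2


def coupling_inverse(y1, y2):
    """Extraction direction: exactly undoes coupling_forward."""
    x2 = [(b - tr) * math.exp(-sc) for b, sc, tr in zip(y2, s(y1), t(y1))]
    return y1, x2


x1, x2 = [0.2, -1.0], [3.0, 4.0]
y1, y2 = coupling_forward(x1, x2)
r1, r2 = coupling_inverse(y1, y2)
print(r1, r2)  # matches (x1, x2) up to floating-point error
```

Because the inverse reuses the same `s` and `t`, the decoder has no separate parameters that could drift away from the encoder, which is the consistency property the abstract relies on.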
Citations: 0
A robust watermarking approach for medical image authentication using dual image and quorum function
IF 2.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-10-01; Volume 104, Article 104299; DOI: 10.1016/j.jvcir.2024.104299
Ashis Dey, Partha Chowdhuri, Pabitra Pal, Utpal Nandi
Watermarking strategies are widely used to safeguard the identity and copyright of a patient’s medical documents. This work presents a new dual-image watermarking approach using the quorum function (QF) and the AD interpolation technique. AD interpolation creates the dual images, which helps increase the embedding capacity. Moreover, the rules for applying the QF are designed so that the original bits are least affected by embedding, which improves the visual quality of the stego images. A shared secret key protects the information hidden in the medical image and maintains privacy and confidentiality. Experimental results measured with PSNR, SSIM, NCC, and EC show that the proposed technique gives an average PSNR of 68.44 dB and an SSIM close to 0.99 after inserting 786432 watermark bits, which demonstrates the superiority of the scheme over other state-of-the-art schemes.
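For readers unfamiliar with the headline metric, the following sketch computes PSNR for 8-bit images and inverts the formula to show what mean squared error a given PSNR implies. It assumes a peak value of 255 and images supplied as flat lists of pixel values; the helper name `mse_for_psnr` is introduced here for illustration only.

```python
import math


def psnr(original, distorted, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


def mse_for_psnr(psnr_db, peak=255):
    """Mean squared error that corresponds to a given PSNR value."""
    return peak ** 2 / 10 ** (psnr_db / 10)


cover = [10, 20, 30, 40]
stego = [11, 21, 31, 41]          # every pixel off by one, so MSE = 1
print(round(psnr(cover, stego), 2))
print(mse_for_psnr(68.44))        # MSE implied by the reported average PSNR
```

Under these assumptions a PSNR of 68.44 dB corresponds to a mean squared error of roughly 0.009, meaning that on average fewer than one pixel in a hundred changes by a single grey level.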
Citations: 0
HPIDN: A Hierarchical prior-guided iterative denoising network with global–local fusion for enhancing low-dose CT images
IF 2.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-10-01; Volume 104, Article 104297; DOI: 10.1016/j.jvcir.2024.104297
Xiuya Shi, Yi Yang, Hao Liu, Litai Ma, Zhibo Zhao, Chao Ren
Low-dose computed tomography (LDCT) is an emerging medical diagnostic tool that reduces radiation exposure but suffers from noise retention. Current CNN-based LDCT denoising algorithms struggle to capture comprehensive global representations, impacting diagnostic accuracy. To address this, we propose a novel Hierarchical Prior-guided Iterative Denoising Network (HPIDN) for LDCT images, consisting of two main modules: the Dynamic Feature Extraction and Fusion Module (DFEFM) and the Feature-domain Iterative Denoising Module (FIDM). DFEFM dynamically captures a comprehensive representation, encompassing detailed local features in intra-relationships and complex global features in inter-relationships. It effectively guides the multi-stage iterative denoising process. FIDM hierarchically fuses the prior with image features from DFEFM by using the dual-domain attention fusion sub-network (DAFSN), enhancing denoising robustness and adaptability. This yields higher-quality images with reduced noise artifacts. Extensive experiments on the Mayo and ELCAP Datasets demonstrate the superior performance of our method quantitatively and qualitatively, improving diagnostic accuracy of lung diseases.
Citations: 0
Lossless medical ultrasound image compression based on frequency domain decomposition
IF 2.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-10-01; Volume 104, Article 104306; DOI: 10.1016/j.jvcir.2024.104306
Yaqi Zhao, Yue Li
Medical ultrasound imaging is a widely used non-invasive method for diagnosing diseases. However, these images contain significant speckle noise, which differs from the characteristics of natural images. This makes effective lossless compression of medical ultrasound images a challenging task. In this paper, we propose a novel hybrid learned lossless compression framework for ultrasound images. First, we use the traditional discrete cosine transform (DCT) to transform the raw pixels of ultrasound images into the frequency domain. Second, to compress the frequency-domain values effectively, we decompose the DCT coefficients into groups to reduce local and global information redundancy. Finally, we compress the DCT coefficients of the different groups separately with learned and non-learned methods. Experimental results show that, on the Breast ultrasound image dataset, the proposed method reduces the bit rate by 8.6% to 68.9% compared with learned and non-learned methods.
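The transform-then-group idea can be sketched in one dimension. The code below uses an orthonormal DCT-II and splits the coefficients at a hypothetical cutoff index; the paper's actual grouping rule and its 2-D handling are not given in the abstract, so `split_bands` and its `cutoff` parameter are assumptions made for illustration.

```python
import math


def dct(x):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(x)))
    return out


def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    return [sum(
        math.sqrt((1 if k == 0 else 2) / n) * ck
        * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        for k, ck in enumerate(c)) for i in range(n)]


def split_bands(coeffs, cutoff):
    """Split coefficients into a low-frequency and a high-frequency group."""
    return coeffs[:cutoff], coeffs[cutoff:]


pixels = [52, 55, 61, 66]
low, high = split_bands(dct(pixels), cutoff=1)
restored = [round(v) for v in idct(low + high)]
print(restored)  # rounds back to the original integer pixels
```

A floating-point DCT is only reversible up to rounding, so a truly lossless codec has to use an integer transform or code the rounding residual as well; the sketch sidesteps that by rounding after the inverse.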
Citations: 0
Local and global mixture network for image inpainting
IF 2.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-10-01; Volume 104, Article 104312; DOI: 10.1016/j.jvcir.2024.104312
Seunggyun Woo, Keunsoo Ko, Chang-Su Kim
In general, CNN-based inpainting can recover local patterns effectively using convolutional filters, but it may not fully exploit global correlation. Transformer-based inpainting, on the other hand, can fill in large holes faithfully by relying on global rather than local correlation. In this paper, we propose a novel image inpainting algorithm, called local and global mixture (LGM), to take advantage of the strengths of both approaches and compensate for their weaknesses. The LGM network comprises the local inpainting network (LIN) and the global inpainting network (GIN) in parallel, which are based on convolutional layers and transformer blocks, respectively, and exchange their intermediate results with each other. Furthermore, we develop an error propagation model with a continuous error mask, updated in LIN but used in both LIN and GIN, to provide more reliable inpainting results. Extensive experiments demonstrate that the proposed LGM algorithm provides excellent inpainting performance, which indicates the efficacy of the parallel combination of LIN and GIN and the effectiveness of the error propagation model.
Citations: 0
An illumination-guided dual-domain network for image exposure correction
IF 2.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-10-01; Volume 104, Article 104313; DOI: 10.1016/j.jvcir.2024.104313
Jie Yang, Yuantong Zhang, Zhenzhong Chen, Daiqin Yang
Exposure problems, including underexposure and overexposure, can significantly degrade image quality. Poorly exposed images often suffer from coupled illumination and detail degradation, which makes recovery harder. Correcting them requires spatially discriminative exposure correction, so obtaining uniformly exposed and visually consistent images is challenging. To address these issues, we propose an Illumination-guided Dual-domain Network (IDNet), which employs a Dual-Domain Module (DDM) to simultaneously recover illumination and details from the frequency and spatial domains, respectively. The DDM also integrates a structural re-parameterization technique to enhance detail awareness at reduced computational cost. An Illumination Mask Predictor (IMP) guides exposure correction by estimating the optimal illumination mask. A comparison with 26 methods on three benchmark datasets shows that IDNet achieves superior performance with fewer parameters and lower computational complexity. These results confirm the effectiveness and efficiency of our approach in enhancing image quality across various exposure scenarios.
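Structural re-parameterization, in its usual form, trains with several parallel branches and then folds them into a single convolution for inference. The sketch below shows the folding for one channel with a 3x3 branch, a 1x1 branch, and an identity shortcut; this branch layout is an assumption borrowed from the common pattern, since the abstract does not describe the DDM's actual branches.

```python
def conv3x3(img, k):
    """Single-channel 3x3 cross-correlation with zero padding."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += k[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc
    return out


def multi_branch(img, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch + identity shortcut."""
    a = conv3x3(img, k3)
    return [[a[y][x] + k1 * img[y][x] + img[y][x]
             for x in range(len(img[0]))] for y in range(len(img))]


def merge_kernels(k3, k1):
    """Fold the 1x1 weight and the identity into the 3x3 kernel centre."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1 + 1
    return merged


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k3 = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
k1 = 3
print(multi_branch(img, k3, k1) == conv3x3(img, merge_kernels(k3, k1)))
```

The merged kernel gives the same output with one convolution instead of three operations, which is where the reduced inference cost comes from. Real layers would also fold batch-normalization statistics and handle multiple channels.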
Citations: 0
Robust text watermarking based on average skeleton mass of characters against cross-media attacks
IF 2.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-10-01; Volume 104, Article 104300; DOI: 10.1016/j.jvcir.2024.104300
Xinyi Huang, Hongxia Wang
The widespread use of digital documents makes it essential to protect intellectual property and information security. As a key method of digital copyright protection, robust document watermarking has attracted much attention. With the rapid development of electronic devices, document theft is no longer limited to copying and transmission. Because a camera can quickly and conveniently capture paper or a screen, text watermarking methods must be robust to cross-media transmission. To this end, this paper proposes a text watermarking scheme based on the average skeleton mass of characters, in which the average skeleton mass of adjacent characters represents the watermark information. The scheme modifies character pixels, which alters glyphs without loss of transparency and provides high embedding capacity. Compared with existing manually designed font-based text watermarking schemes, it does not need to segment characters accurately, nor does it rely on stretching characters to the same size for matching, which reduces the need for character segmentation. In addition, experimental results show that the proposed scheme is robust to transmission modes including print-scan, print-camera, and screen-camera.
Citations: 0
Effective image compression using hybrid DCT and hybrid capsule auto encoder for brain MR images
IF 2.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-10-01; Volume 104, Article 104296; DOI: 10.1016/j.jvcir.2024.104296
Bindu Puthentharayil Vikraman, Jabeena Afthab
Image compression is gaining popularity in various fields because it eases storage and transmission. This work introduces a medical image (MI) compression model for brain magnetic resonance images (MRI) to mitigate bandwidth and storage issues. First, pre-processing with the Adaptive Linear Smoothing and Histogram Equalization (ALSHE) method removes noise from the inputs. Then, the Region of Interest (ROI) and Non-ROI parts are segmented separately with the Optimized Fuzzy C-Means (OFCM) approach to reduce complexity. Finally, a novel Hybrid Discrete Cosine Transform-Improved Zero Wavelet (DCT-IZW) is proposed for lossless compression and a Hybrid Equilibrium Optimization-Capsule Auto Encoder (EO-CAE) for lossy compression. The compressed ROI and Non-ROI images are then combined, and the inverse of the compression process is applied to obtain the reconstructed image. The study used the BRATS (2015, 2018) datasets for simulation and attained better performance than other existing methods.
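The overall ROI/Non-ROI pipeline can be sketched with simple stand-ins. The code assumes the ROI is the part coded losslessly and the Non-ROI the part coded lossily, which the abstract implies but does not state outright. It also swaps in `zlib` for DCT-IZW and uniform quantization for EO-CAE, so it shows only the split-compress-recombine structure, not the authors' codecs.

```python
import zlib


def compress_hybrid(pixels, roi_mask, step=16):
    """Compress ROI pixels exactly and Non-ROI pixels after quantization.

    `step` is a hypothetical quantization step for the lossy branch.
    """
    roi = bytes(p for p, m in zip(pixels, roi_mask) if m)
    background = bytes(p // step for p, m in zip(pixels, roi_mask) if not m)
    return zlib.compress(roi), zlib.compress(background)


def decompress_hybrid(roi_blob, bg_blob, roi_mask, step=16):
    """Invert compress_hybrid and recombine the two regions by mask."""
    roi = iter(zlib.decompress(roi_blob))
    background = iter(zlib.decompress(bg_blob))
    return [next(roi) if m else next(background) * step + step // 2
            for m in roi_mask]


pixels = [200, 201, 37, 90, 91, 255, 0, 128]
roi_mask = [1, 1, 0, 0, 1, 0, 0, 1]   # 1 marks a diagnostically relevant pixel
blobs = compress_hybrid(pixels, roi_mask)
restored = decompress_hybrid(*blobs, roi_mask)
print(restored)
```

ROI pixels come back bit-exact, while Non-ROI pixels are reconstructed to the midpoint of their quantization bin, so their error never exceeds half the step. The mask itself must also be stored or transmitted, which the sketch leaves out.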
Journal of Visual Communication and Image Representation, Volume 104, Article 104296 (2024-10-01).
Citations: 0
Image quilting heuristic compressed sensing video privacy protection coding for abnormal behavior detection in private scenes
IF 2.6 Zone 4, Computer Science Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-10-01 DOI: 10.1016/j.jvcir.2024.104307
Jixin Liu, Shabo Hu, Haigen Yang, Ning Sun
For video intelligence applications in private scenes such as home environments, traditional image processing methods usually operate on clear raw data and are therefore prone to privacy leakage. Therefore, our team proposed multilayer compressed sensing (MCS) encoding, which reduces image quality for visual privacy protection (VPP). However, the way MCS coding is implemented leads to unavoidable information loss. Inspired by the image quilting (IQ) algorithm, this paper proposes an image quilting heuristic MCS (IQ-MCS) coding method that mitigates the rapid information loss in the MCS coding process, so that a similar privacy protection effect is achieved at lower coding layers and application performance improves. To evaluate the level of VPP, a VPP evaluation algorithm is proposed that agrees more closely with subjective assessment. Finally, taking the detection of abnormal human behavior in private scenes as an example, a correlation model between the VPP level and the performance of smart applications is established to balance the two. The model can also serve as a reference for evaluating other privacy protection methods.
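To make the notion of layered encoding concrete, here is a minimal sketch of stacked random measurements in the compressed-sensing style. It is an assumption about what "multilayer" could look like, not the authors' MCS or IQ-MCS scheme: each layer multiplies the current image by Gaussian sensing matrices on rows and columns, halving each side, so detail is progressively lost. The names `measurement_matrix` and `multilayer_encode`, the `ratio` parameter, and the choice of Gaussian matrices are all hypothetical.

```python
import numpy as np


def measurement_matrix(m: int, n: int, rng: np.random.Generator) -> np.ndarray:
    """Gaussian sensing matrix of shape (m, n), scaled by 1/sqrt(m).

    With this scaling the expected squared norm of a measured vector
    equals the squared norm of the input vector.
    """
    return rng.standard_normal((m, n)) / np.sqrt(m)


def multilayer_encode(image, layers, ratio=0.5, seed=0):
    """Apply `layers` successive random measurements to rows and columns.

    Each layer shrinks both image sides by `ratio` (illustrative choice).
    With layers=0 the image is returned unchanged as float64.
    """
    rng = np.random.default_rng(seed)
    x = image.astype(np.float64)
    for _ in range(layers):
        rows, cols = x.shape
        m_r = max(1, int(rows * ratio))   # measurements along rows
        m_c = max(1, int(cols * ratio))   # measurements along columns
        phi_r = measurement_matrix(m_r, rows, rng)
        phi_c = measurement_matrix(m_c, cols, rng)
        x = phi_r @ x @ phi_c.T           # one encoding layer
    return x
```

In this sketch the number of layers acts as the privacy knob: a 64x64 frame becomes 32x32 after one layer and 16x16 after two, and every additional layer discards more of the information a downstream detector could use. That is the trade-off the abstract describes between the VPP level and application performance.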
Journal of Visual Communication and Image Representation, Volume 104, Article 104307 (2024-10-01).
Citations: 0