
Latest publications in Signal Processing-Image Communication

Are metrics measuring what they should? An evaluation of Image Captioning task metrics
IF 3.5 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2023-10-14 | DOI: 10.1016/j.image.2023.117071
Othón González-Chávez, Guillermo Ruiz, Daniela Moctezuma, Tania Ramirez-delReal

Image Captioning is an active research task that aims to describe image content in terms of the objects in the scene and their relationships. Two important research areas converge to tackle this task: computer vision and natural language processing. In Image Captioning, as in any computational intelligence task, performance metrics are crucial for knowing how well (or badly) a method performs. In recent years, it has been observed that classical n-gram-based metrics are insufficient to capture the semantics and critical meaning needed to describe the content of an image. To measure how well the current and more recent metrics are doing, this article presents an evaluation of several kinds of Image Captioning metrics and a comparison between them on the well-known MS-COCO and Flickr8k datasets. The metrics were selected from among the most used in prior work: those based on n-grams, such as BLEU, SacreBLEU, METEOR, ROUGE-L, CIDEr, and SPICE, and those based on embeddings, such as BERTScore and CLIPScore. We designed two scenarios: (1) a set of artificially built captions of varying quality and (2) a comparison of some state-of-the-art Image Captioning methods. Interesting findings emerged while trying to answer the questions: Are the current metrics helping to produce high-quality captions? How do the metrics compare to each other? What are the metrics really measuring?
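To make concrete what the n-gram family of metrics actually rewards, the sketch below implements a minimal BLEU-style score (modified n-gram precision plus a brevity penalty) in plain Python. It illustrates the metric class discussed above, not the evaluation code used in the paper; the whitespace tokenization and flooring choices are assumptions made for the example.

```python
# Minimal sketch (not the authors' code): a BLEU-style score computed from
# scratch, to show that n-gram metrics count surface word overlap, not meaning.
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU-like score with uniform n-gram weights."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, cnt in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(max(clipped, 1e-9) / total)
    # Brevity penalty discourages captions much shorter than the closest reference.
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


if __name__ == "__main__":
    refs = ["a brown dog is playing with a ball in the park"]
    print(bleu("a dog plays with a ball in the park", refs))   # high n-gram overlap
    print(bleu("a puppy chases a round toy outdoors", refs))   # same meaning, near-zero score
```

The second example is exactly the failure mode the article studies: a semantically correct caption with little lexical overlap is scored close to zero by an n-gram metric, while an embedding-based metric such as BERTScore or CLIPScore would treat it more favorably.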

Citations: 0
A transformer-based network for perceptual contrastive underwater image enhancement
IF 3.5 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2023-10-01 | DOI: 10.1016/j.image.2023.117032
Na Cheng, Zhixuan Sun, Xuanbing Zhu, Hongyu Wang

Vision-based underwater image enhancement methods have received much attention for applications in marine engineering and marine science. The absorption and scattering of light in real underwater scenes lead to severe information degradation in the acquired underwater images, which limits the further development of underwater tasks. To address these problems, a novel transformer-based perceptual contrastive network for underwater image enhancement (TPC-UIE) is proposed to achieve visually friendly, high-quality images, where contrastive learning is applied to the underwater image enhancement (UIE) task for the first time. Specifically, to address the limitations of a purely convolution-based network, we embed the transformer into the UIE network to improve its ability to capture global dependencies. The limits of the transformer are then taken into account as convolution is reintroduced to better capture local attention. At the same time, a dual-attention module strengthens the network's focus on the spatial and color channels that are more severely attenuated. Finally, a perceptual contrastive regularization method is proposed, in which a multi-loss function made up of reconstruction loss, perceptual loss, and contrastive loss jointly optimizes the model to simultaneously ensure texture detail, contrast, and color consistency. Experimental results on several existing datasets show that TPC-UIE achieves excellent performance in both subjective and objective evaluations compared to other methods. In addition, the enhancement significantly improves the visual quality of underwater images and effectively facilitates the further development of underwater tasks.
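As a rough illustration of the joint objective described above (reconstruction plus perceptual plus contrastive terms), here is a hedged PyTorch sketch. The VGG-16 feature extractor, L1 distances, and loss weights are assumptions made for illustration; the paper's exact formulation is not given in this abstract.

```python
# Hedged sketch of a reconstruction + perceptual + contrastive objective.
# All concrete choices (VGG-16 backbone, L1 distances, weights) are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class JointUIELoss(nn.Module):
    def __init__(self, w_rec=1.0, w_perc=0.1, w_contrast=0.1):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg                    # frozen feature extractor for feature-space terms
        self.l1 = nn.L1Loss()
        self.w = (w_rec, w_perc, w_contrast)

    def forward(self, enhanced, clean, raw):
        # Reconstruction: pixel-wise fidelity to the reference image.
        rec = self.l1(enhanced, clean)
        # Perceptual: feature-space distance to the reference.
        f_e, f_c, f_r = self.vgg(enhanced), self.vgg(clean), self.vgg(raw)
        perc = self.l1(f_e, f_c)
        # Contrastive: pull the output toward the clean "positive" and push it
        # away from the degraded "negative" input in feature space.
        contrast = self.l1(f_e, f_c) / (self.l1(f_e, f_r) + 1e-7)
        w_rec, w_perc, w_con = self.w
        return w_rec * rec + w_perc * perc + w_con * contrast


if __name__ == "__main__":
    loss_fn = JointUIELoss()
    out, ref, raw = (torch.rand(2, 3, 64, 64) for _ in range(3))   # toy tensors
    print(loss_fn(out, ref, raw))
```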

Citations: 0
No-reference blurred image quality assessment method based on structure of structure features
IF 3.5 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2023-10-01 | DOI: 10.1016/j.image.2023.117008
Jian Chen, Shiyun Li, Li Lin, Jiaze Wan, Zuoyong Li

The deep structure of an image carries information that helps in perceiving its quality. Inspired by the deep-level image features extracted by deep learning methods, we propose a no-reference blurred image quality evaluation model based on structure-of-structure features. In the spatial domain, novel weighted local binary patterns are proposed that leverage maximum local variation maps to extract structural features from multi-resolution images. In the spectral domain, gradient information of multi-scale Log-Gabor filtered images is extracted as the structure-of-structure features and combined with entropy features. The features extracted from both domains are then fused to form a quality-perception feature vector and mapped to a quality score via support vector regression (SVR). Experiments are conducted to evaluate the performance of the proposed method on various IQA databases, including LIVE, CSIQ, TID2008, TID2013, CID2013, CLIVE, and BID. The experimental results show that, compared with some state-of-the-art methods, our proposed method achieves better evaluation results and is more consistent with the human visual system. The source code will be released at https://github.com/JamesC0321/s2s_features/.
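The final stage described above, fusing spatial- and spectral-domain features and regressing a quality score with SVR, can be sketched with scikit-learn as follows. The feature extractors here are crude placeholders standing in for the weighted-LBP and Log-Gabor features; only the fusion-and-regression flow is illustrated, and the training data are random stand-ins.

```python
# Minimal sketch of the fusion + SVR mapping stage; feature extractors are placeholders.
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def extract_spatial_features(img: np.ndarray) -> np.ndarray:
    """Placeholder for the weighted-LBP / maximum-local-variation features."""
    return np.histogram(img, bins=32, range=(0, 255), density=True)[0]


def extract_spectral_features(img: np.ndarray) -> np.ndarray:
    """Placeholder for the multi-scale Log-Gabor gradient + entropy features."""
    mag = np.abs(np.fft.fft2(img))
    return np.log1p([mag.mean(), mag.std(), np.median(mag)])


def quality_features(img: np.ndarray) -> np.ndarray:
    # Fuse both domains into one quality-perception feature vector.
    return np.concatenate([extract_spatial_features(img), extract_spectral_features(img)])


# Training would use images paired with subjective scores (MOS/DMOS) from an IQA database;
# random data is used here only so the sketch runs end to end.
train_imgs = [np.random.randint(0, 256, (64, 64)).astype(float) for _ in range(50)]
train_mos = np.random.uniform(0, 100, 50)
X = np.stack([quality_features(im) for im in train_imgs])

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X, train_mos)

test_img = np.random.randint(0, 256, (64, 64)).astype(float)
print("predicted quality:", model.predict(quality_features(test_img)[None]))
```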

Citations: 0
Magnifying multimodal forgery clues for Deepfake detection
IF 3.5 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2023-10-01 | DOI: 10.1016/j.image.2023.117010
Xiaolong Liu, Yang Yu, Xiaolong Li, Yao Zhao

Advancements in computer vision and deep learning have made generated Deepfake media difficult to distinguish from authentic content. In addition, recent forgery techniques also modify the audio information of the forged video, which brings new challenges. However, due to cross-modal bias, recent multimodal detection methods do not explore intra-modal and cross-modal forgery clues well, which limits detection performance. In this paper, we propose a novel audio-visual aware multimodal Deepfake detection framework to magnify intra-modal and cross-modal forgery clues. Firstly, to capture temporal intra-modal defects, a Forgery Clues Magnification Transformer (FCMT) module is proposed to magnify forgery clues based on sequence-level relationships. Then, a Distribution Difference based Inconsistency Computing (DDIC) module based on Jensen–Shannon divergence is designed to adaptively align multimodal information and further magnify cross-modal inconsistency. Next, we further explore spatial artifacts by connecting multi-scale feature representations to provide comprehensive information. Finally, a feature fusion module is designed to adaptively fuse features into a more discriminative representation. Experiments demonstrate that the proposed framework outperforms independently trained models and, at the same time, yields superior generalization capability on unseen types of Deepfakes.
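The DDIC module's use of Jensen–Shannon divergence can be illustrated with a short, hedged sketch: given per-frame audio and visual feature sequences, turn them into distributions and score their divergence as a cross-modal inconsistency signal. The softmax normalization and mean pooling below are assumptions for illustration, not the paper's exact design.

```python
# Hedged sketch: Jensen-Shannon divergence as an audio-visual inconsistency score.
import torch
import torch.nn.functional as F


def js_divergence(p: torch.Tensor, q: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Symmetric JS divergence between two batches of distributions (last dim sums to 1)."""
    m = 0.5 * (p + q)
    kl_pm = (p * (torch.log(p + eps) - torch.log(m + eps))).sum(dim=-1)
    kl_qm = (q * (torch.log(q + eps) - torch.log(m + eps))).sum(dim=-1)
    return 0.5 * (kl_pm + kl_qm)


def cross_modal_inconsistency(audio_feats: torch.Tensor, visual_feats: torch.Tensor):
    """audio_feats, visual_feats: (batch, frames, dim) temporal feature sequences."""
    p = F.softmax(audio_feats, dim=-1)      # distribution over feature dimensions (assumption)
    q = F.softmax(visual_feats, dim=-1)
    return js_divergence(p, q).mean(dim=1)  # average inconsistency over frames


if __name__ == "__main__":
    a = torch.randn(2, 16, 128)   # toy audio-stream features
    v = torch.randn(2, 16, 128)   # toy visual-stream features
    print(cross_modal_inconsistency(a, v))  # higher values suggest mismatched streams
```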

Citations: 0
Multi-scale graph neural network for global stereo matching
IF 3.5 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2023-10-01 | DOI: 10.1016/j.image.2023.117026
Xiaofeng Wang, Jun Yu, Zhiheng Sun, Jiameng Sun, Yingying Su

Currently, deep learning-based stereo matching relies solely on local convolutional networks, which lack sufficient global information for accurate disparity estimation. Motivated by the excellent global representation capacity of graphs, a novel Multi-scale Graph Neural Network (MGNN) is proposed to fundamentally improve stereo matching from a global perspective. Firstly, we construct a multi-scale graph structure in which multi-scale nodes carrying projected multi-scale image features can be directly linked by inner-scale and cross-scale edges, instead of relying solely on local convolutions. To enhance spatial position information in the non-Euclidean multi-scale graph space, we further propose a multi-scale position embedding that embeds the potential position features of Euclidean space into the projected multi-scale image features. Secondly, we propose multi-scale graph feature inference to extract global context information on the multi-scale graph structure. The features can thus not only be inferred globally at each scale but also inferred interactively across different scales to comprehensively consider global context with multi-scale receptive fields. Finally, MGNN is deployed for dense stereo matching, and experiments demonstrate that our method achieves state-of-the-art performance on Scene Flow, KITTI 2012/2015, and Middlebury Stereo Evaluation v.3/2021.
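A hedged sketch of the multi-scale graph idea follows: node features are taken at several scales and updated with one toy message-passing step that mixes inner-scale and cross-scale information. It illustrates the structure described above, not the MGNN architecture itself; the dense mean-aggregation "graph" and the single linear update are simplifying assumptions.

```python
# Toy illustration of multi-scale nodes with inner-scale and cross-scale mixing.
import torch
import torch.nn as nn
import torch.nn.functional as F


def build_pyramid(feat: torch.Tensor, num_scales: int = 3):
    """feat: (C, H, W) backbone features -> list of (H_s*W_s, C) node tensors, one per scale."""
    nodes = []
    for s in range(num_scales):
        f = F.avg_pool2d(feat.unsqueeze(0), kernel_size=2 ** s).squeeze(0)
        nodes.append(f.flatten(1).t())
    return nodes


class CrossScaleMessagePassing(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.update = nn.Linear(2 * channels, channels)

    def forward(self, nodes):
        # Inner-scale message: mean of all nodes at the same scale (fully connected toy graph).
        # Cross-scale message: mean node feature of every other scale.
        scale_means = [n.mean(dim=0) for n in nodes]
        out = []
        for s, n in enumerate(nodes):
            others = torch.stack([m for t, m in enumerate(scale_means) if t != s]).mean(0)
            msg = 0.5 * scale_means[s] + 0.5 * others
            out.append(F.relu(self.update(torch.cat([n, msg.expand_as(n)], dim=-1))))
        return out


if __name__ == "__main__":
    feat = torch.randn(32, 64, 64)               # per-pixel features from a CNN backbone
    refined = CrossScaleMessagePassing(32)(build_pyramid(feat))
    print([n.shape for n in refined])
```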

Citations: 0
Enhancing transferability of adversarial examples with pixel-level scale variation
IF 3.5 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2023-10-01 | DOI: 10.1016/j.image.2023.117020
Zhongshu Mao, Yiqin Lu, Zhe Cheng, Xiong Shen

The transferability of adversarial examples under the black-box attack setting has attracted extensive attention from the community. Input transformation is one of the most effective approaches proposed recently to improve transferability. However, existing methods either improve transferability only slightly or are not robust to defense models. We delve into the generation process of adversarial examples and find that existing input transformation methods tend to craft adversarial examples by transforming the entire image, which we term image-level transformations. This naturally motivates us to perform pixel-level transformations, i.e., transforming only part of the pixels of the image. Experimental results show that pixel-level transformations can considerably enhance the transferability of adversarial examples while remaining robust to defense models. We believe that pixel-level transformations are more fine-grained than image-level transformations and can thus achieve better performance. Based on this finding, we propose the pixel-level scale variation (PSV) method to further improve the transferability of adversarial examples. The proposed PSV randomly samples a set of scaled mask matrices and transforms part of the pixels of the input image with these matrices to increase pixel-level diversity. Empirical evaluations on the standard ImageNet dataset demonstrate the effectiveness and superior performance of the proposed PSV on both normally trained models (with the highest average attack success rate of 79.2%) and defense models (with the highest average attack success rate of 61.4%). Our method can further improve transferability (with the highest average attack success rate of 88.2%) when combined with other input transformation methods.
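A minimal sketch of the pixel-level scale variation idea, as described above, follows: sample random pixel masks, scale only the selected pixels, and average the loss gradient over the transformed copies (the usual way an input transformation is plugged into a transfer-based attack). The mask density, scale range, and number of copies are illustrative assumptions rather than the paper's hyper-parameters.

```python
# Hedged sketch of a pixel-level scale variation transform for gradient averaging.
import torch


def pixel_scale_variation(x: torch.Tensor, num_copies: int = 5,
                          keep_prob: float = 0.5, scales=(0.5, 1.0)) -> list:
    """x: (B, C, H, W) image batch in [0, 1]; returns transformed copies."""
    copies = []
    for _ in range(num_copies):
        mask = (torch.rand_like(x[:, :1]) < keep_prob).float()   # per-pixel selection mask
        scale = torch.empty_like(x[:, :1]).uniform_(*scales)     # per-pixel scale factor
        # Selected pixels are scaled, the rest are left untouched.
        copies.append(x * (1 - mask) + x * scale * mask)
    return copies


def averaged_input_gradient(model, x, y, loss_fn, num_copies: int = 5):
    """Average the loss gradient over the transformed copies of the input."""
    copies = pixel_scale_variation(x, num_copies=num_copies)
    grad = torch.zeros_like(x)
    for xt in copies:
        xt = xt.detach().requires_grad_(True)
        loss_fn(model(xt), y).backward()
        grad += xt.grad
    return grad / len(copies)
```

The averaged gradient would then drive an iterative attack step (e.g. a sign-gradient update), with `model` being the white-box surrogate classifier; those pieces are assumed rather than shown here.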

Citations: 0
A coarse-to-fine multi-scale feature hybrid low-dose CT denoising network
IF 3.5 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2023-10-01 | DOI: 10.1016/j.image.2023.117009
Zefang Han, Hong Shangguan, Xiong Zhang, Xueying Cui, Yue Wang

With the growing development and wide clinical application of CT technology, the potential radiation damage to patients has sparked public concern. However, reducing the radiation dose may introduce large amounts of noise and artifacts into the reconstructed images, which can affect the accuracy of clinical diagnosis. Therefore, improving the quality of low-dose CT scans has become a popular research topic. Generative adversarial networks (GANs) have provided new research ideas for low-dose CT (LDCT) denoising. However, utilizing only image decomposition or adding new functional subnetworks cannot effectively fuse same-type features at different scales (or different types of features). Thus, most current GAN-based denoising networks suffer from low feature utilization and increased network complexity. To address these problems, we propose a coarse-to-fine multi-scale feature hybrid low-dose CT denoising network (CMFHGAN). The generator consists of a global denoising module, a local texture feature enhancement module, and a self-calibration feature fusion module. The three modules complement each other and guarantee overall denoising performance. In addition, to further improve denoising performance, we propose a multi-resolution inception discriminator with multi-scale feature extraction ability. Experiments were performed on the Mayo and Piglet datasets, and the results showed that the proposed method outperforms state-of-the-art denoising algorithms.
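The generator's three-module layout can be sketched, very loosely, as below. Every layer choice is a placeholder assumption; only the composition (global denoising branch, local texture branch, gated fusion, residual output) follows the description above.

```python
# Heavily hedged skeleton of a three-part LDCT denoising generator; layers are placeholders.
import torch
import torch.nn as nn


def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))


class CTDenoisingGenerator(nn.Module):
    def __init__(self, ch: int = 32):
        super().__init__()
        self.global_denoise = nn.Sequential(conv_block(1, ch), conv_block(ch, ch))
        self.local_texture = nn.Sequential(conv_block(1, ch), conv_block(ch, ch))
        # Fusion predicts a per-pixel weight that balances the two branches.
        self.fusion_gate = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.ReLU(inplace=True),
                                         nn.Conv2d(ch, 1, 1), nn.Sigmoid())
        self.to_image = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, ldct):
        g = self.global_denoise(ldct)
        l = self.local_texture(ldct)
        w = self.fusion_gate(torch.cat([g, l], dim=1))
        fused = w * g + (1 - w) * l
        return ldct + self.to_image(fused)       # residual prediction of the clean slice


if __name__ == "__main__":
    x = torch.randn(1, 1, 128, 128)              # toy low-dose CT slice
    print(CTDenoisingGenerator()(x).shape)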

Citations: 0
RGB pixel n-grams: A texture descriptor
IF 3.5 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2023-10-01 | DOI: 10.1016/j.image.2023.117028
Fátima Belén Paiva Pavón, María Cristina Orué Gil, José Luis Vázquez Noguera, Helena Gómez-Adorno, Valentín Calzada-Ledesma

This article proposes the "RGB Pixel N-grams" descriptor, which uses sequences of n pixels to represent RGB color texture images. We conducted classification experiments with three different classifiers and five color texture image databases to evaluate the descriptor's performance, using accuracy as the evaluation metric. These databases include various textures from different surfaces, sometimes under different lighting, scale, or rotation conditions. The proposed descriptor proved robust and competitive compared with other state-of-the-art descriptors, achieving better classification accuracy on most databases and with most classifiers.
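A minimal sketch of an RGB pixel n-gram style descriptor is shown below: quantize each RGB pixel to a small set of color codes, read off horizontal runs of n consecutive codes, and histogram them. The quantization level, n, and scan direction are illustrative assumptions, not the paper's definition; the resulting histogram would feed the classifiers mentioned above.

```python
# Hedged sketch of an RGB pixel n-gram histogram descriptor.
import numpy as np
from collections import Counter


def rgb_pixel_ngrams(img: np.ndarray, n: int = 3, levels: int = 4) -> np.ndarray:
    """img: (H, W, 3) uint8 image -> normalized histogram over n-gram codes."""
    q = (img.astype(np.int64) * levels) // 256           # quantize each channel to `levels` bins
    codes = q[..., 0] * levels * levels + q[..., 1] * levels + q[..., 2]
    num_codes = levels ** 3
    counts = Counter()
    for row in codes:                                     # horizontal scan, row by row
        for i in range(len(row) - n + 1):
            gram = 0
            for c in row[i:i + n]:                        # pack the n color codes into one index
                gram = gram * num_codes + int(c)
            counts[gram] += 1
    hist = np.zeros(num_codes ** n, dtype=np.float64)
    for gram, cnt in counts.items():
        hist[gram] = cnt
    return hist / max(hist.sum(), 1)


if __name__ == "__main__":
    texture = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
    desc = rgb_pixel_ngrams(texture)
    print(desc.shape, desc.sum())                         # feature vector for a classifier
```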

Citations: 0
Dual attention guided multi-scale fusion network for RGB-D salient object detection
IF 3.5 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2023-10-01 | DOI: 10.1016/j.image.2023.117004
Huan Gao, Jichang Guo, Yudong Wang, Jianan Dong

While recent research on salient object detection (SOD) has shown remarkable progress in leveraging both RGB and depth data, it is still worth exploring how to use the inherent relationship between the two to extract and fuse features more effectively and to make more accurate predictions. In this paper, we combine the attention mechanism with the characteristics of SOD and propose the Dual Attention Guided Multi-scale Fusion Network. We design a multi-scale fusion block that combines multi-scale branches with channel attention to achieve better fusion of RGB and depth information. Exploiting a characteristic of SOD, a dual attention module is proposed to make the network pay more attention to the currently unpredicted saliency regions and to the wrong parts of the already predicted regions. We perform an ablation study to verify the effectiveness of each component. Quantitative and qualitative experimental results demonstrate that our method achieves state-of-the-art (SOTA) performance.
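One ingredient named above, channel attention applied while fusing RGB and depth features, can be sketched with a standard squeeze-and-excitation style gate. This is a generic stand-in for illustration, not the paper's exact fusion block.

```python
# Hedged sketch: channel-attention gate for fusing RGB and depth feature maps.
import torch
import torch.nn as nn


class ChannelAttentionFusion(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                  # squeeze: global context per channel
            nn.Conv2d(2 * channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())  # excitation weights

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))   # (B, C, 1, 1) channel weights
        # Channels where depth is informative take more of the depth stream; the rest keep RGB.
        return w * depth_feat + (1 - w) * rgb_feat


if __name__ == "__main__":
    rgb = torch.randn(2, 64, 56, 56)
    depth = torch.randn(2, 64, 56, 56)
    print(ChannelAttentionFusion(64)(rgb, depth).shape)
```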

Citations: 0
Low-light image enhancement based on virtual exposure
IF 3.5 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2023-10-01 | DOI: 10.1016/j.image.2023.117016
Wencheng Wang, Dongliang Yan, Xiaojin Wu, Weikai He, Zhenxue Chen, Xiaohui Yuan, Lun Li

Under poor illumination, the image information captured by a camera is partially lost, which seriously affects human visual perception. Inspired by the idea that fusing multi-exposure images can yield a single high-quality image, an adaptive enhancement framework for a single low-light image is proposed based on a virtual exposure strategy. In this framework, the exposure control parameters are adaptively generated through a statistical analysis of the low-light image, and a virtual exposure enhancer constructed from a quadratic function is applied to generate several image frames from the single input image. Then, on the basis of weight maps generated from three factors, i.e., contrast, saturation, and saliency, the image sequence and weight images are transformed by a Laplacian pyramid and a Gaussian pyramid, respectively, and multi-scale fusion is performed layer by layer. Finally, the enhanced result is obtained by the pyramid reconstruction rule. Compared with several state-of-the-art methods on five datasets, the proposed method shows superiority on several image quality evaluation metrics. The method requires neither image calibration nor camera response function estimation and has a more flexible application range. It reduces the possibility of over-enhancement, effectively avoids halos in the enhancement results, and adaptively improves visual information fidelity.
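The virtual-exposure pipeline described above can be approximated with a short OpenCV sketch: synthesize several exposures of the single low-light frame with a quadratic mapping, then fuse them with pyramid-based exposure fusion. OpenCV's Mertens fusion (contrast/saturation/well-exposedness weights) stands in for the paper's contrast/saturation/saliency weighting, and the gain values and quadratic coefficients are illustrative assumptions rather than the paper's adaptively estimated parameters.

```python
# Hedged sketch: virtual exposures from one low-light frame + Mertens pyramid fusion.
import cv2
import numpy as np


def virtual_exposures(img_bgr: np.ndarray, gains=(1.0, 1.8, 2.6, 3.4)) -> list:
    """img_bgr: uint8 low-light image -> list of synthetically exposed uint8 frames."""
    x = img_bgr.astype(np.float32) / 255.0
    frames = []
    for k in gains:
        # Quadratic virtual-exposure curve: brightens shadows more than highlights.
        y = np.clip(k * x - (k - 1.0) * x * x, 0.0, 1.0)
        frames.append((y * 255.0).astype(np.uint8))
    return frames


def enhance_low_light(img_bgr: np.ndarray) -> np.ndarray:
    frames = virtual_exposures(img_bgr)
    fused = cv2.createMergeMertens().process(frames)      # multi-scale pyramid exposure fusion
    return np.clip(fused * 255.0, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    dark = cv2.imread("low_light.jpg")                    # hypothetical input path
    if dark is not None:
        cv2.imwrite("enhanced.jpg", enhance_low_light(dark))
```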

Citations: 1