
Signal Processing-Image Communication: Latest Publications

A novel theoretical analysis on optimal pipeline of multi-frame image super-resolution using sparse coding
IF 3.4 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-07 | DOI: 10.1016/j.image.2024.117198
Mohammad Mahdi Afrasiabi, Reshad Hosseini, Aliazam Abbasfar

Super-resolution is the process of obtaining a high-resolution (HR) image from one or more low-resolution (LR) images. Single image super-resolution (SISR) deals with one LR image, while multi-frame super-resolution (MFSR) employs several LR images to reach the HR output. The MFSR pipeline consists of alignment, fusion, and reconstruction. We conduct a theoretical analysis using sparse coding (SC) and the iterative shrinkage-thresholding algorithm (ISTA) to fill the gap in mathematical justification for the execution order of the optimal MFSR pipeline. Our analysis recommends executing alignment and fusion before the reconstruction stage (whether reconstruction is performed through deconvolution or SISR techniques). The suggested order ensures enhanced performance in terms of peak signal-to-noise ratio and structural similarity index. The optimal pipeline also reduces computational complexity compared to intuitive approaches that apply SISR to each input LR image. We also demonstrate the usefulness of SC in the analysis of computer vision tasks such as MFSR, leveraging the sparsity assumption in natural images. Simulation results support the findings of our theoretical analysis, both quantitatively and qualitatively.
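
The analysis builds on the iterative shrinkage-thresholding algorithm (ISTA) for sparse coding. As a point of reference, the following is a minimal generic ISTA solver for min_x 0.5*||Dx - y||^2 + lambda*||x||_1; it is only a sketch of the underlying algorithm, not the authors' pipeline, and the dictionary D, observation y, and step-size choice are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    """Element-wise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(D, y, lam=0.1, n_iter=200):
    """Minimize 0.5*||D @ x - y||^2 + lam*||x||_1 with a fixed step size.

    D : (m, n) dictionary, y : (m,) observed (e.g. fused LR) signal.
    Returns the sparse code x of length n.
    """
    # Step size 1/L, where L is the Lipschitz constant of the gradient,
    # i.e. the largest eigenvalue of D^T D (squared spectral norm of D).
    L = np.linalg.norm(D, ord=2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)                    # gradient of the quadratic term
        x = soft_threshold(x - grad / L, lam / L)   # gradient step + shrinkage
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((64, 256))
    x_true = np.zeros(256)
    x_true[rng.choice(256, 8, replace=False)] = 1.0
    y = D @ x_true + 0.01 * rng.standard_normal(64)
    x_hat = ista(D, y, lam=0.05)
    print("recovered non-zeros:", np.sum(np.abs(x_hat) > 1e-3))
```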

Citations: 0
Underwater image enhancement via brightness mask-guided multi-attention embedding
IF 3.4 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-06 | DOI: 10.1016/j.image.2024.117200
Yuanyuan Li, Zetian Mi, Peng Lin, Xianping Fu

Numerous new underwater image enhancement methods have been proposed to correct color and enhance contrast. Although these methods have achieved satisfactory enhancement results in some respects, few take into account the effect of the raw image illumination distribution on the enhancement results, often leading to oversaturation or undersaturation. To solve these problems, an underwater image enhancement network guided by a brightness mask with multi-attention embedding, called BMGMANet, is designed. Specifically, considering that different regions in underwater images have different degradation degrees, which can be implicitly reflected by a brightness mask characterizing the image illumination distribution, a decoder network guided by a reverse brightness mask is designed to enhance the dark regions while suppressing excessive enhancement of the bright regions. In addition, a triple-attention module is designed to further enhance the contrast of the underwater image and recover more details. Extensive comparative experiments demonstrate that the enhancement results of our network outperform those of existing state-of-the-art methods. Furthermore, additional experiments also prove that our BMGMANet can effectively enhance non-uniformly illuminated underwater images and improve the performance of salient object detection in underwater images.
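
The core guidance signal is a reverse brightness mask, which is large in dark regions and small in bright ones. The sketch below shows one plausible way to build such a mask and use it to modulate enhancement strength; the Gaussian-blur kernel size, the gain, and the simple multiplicative boost are assumptions for illustration, not the network described in the paper.

```python
import numpy as np
import cv2  # OpenCV, used here only for color conversion and blurring

def reverse_brightness_mask(bgr, blur_ksize=31):
    """Return a mask in [0, 1] that is large in dark regions and small in bright ones."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    illumination = cv2.GaussianBlur(gray, (blur_ksize, blur_ksize), 0)  # smooth brightness map
    return 1.0 - illumination                                           # invert: dark -> high weight

def mask_guided_boost(bgr, gain=0.6):
    """Brighten dark regions more than bright ones, guided by the reverse mask."""
    mask = reverse_brightness_mask(bgr)[..., None]            # (H, W, 1) for broadcasting
    img = bgr.astype(np.float32) / 255.0
    boosted = np.clip(img * (1.0 + gain * mask), 0.0, 1.0)    # stronger boost where mask is high
    return (boosted * 255).astype(np.uint8)

if __name__ == "__main__":
    # Dummy image standing in for a real underwater frame.
    frame = (np.random.rand(240, 320, 3) * 255).astype(np.uint8)
    print(mask_guided_boost(frame).shape)
```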

Citations: 0
DJUHNet: A deep representation learning-based scheme for the task of joint image upsampling and hashing
IF 3.4 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-06 | DOI: 10.1016/j.image.2024.117187
Alireza Esmaeilzehi , Morteza Mirzaei , Hossein Zaredar , Dimitrios Hatzinakos , M. Omair Ahmad

In recent years, numerous efficient schemes that employ deep neural networks have been developed for the task of image hashing. However, little attention has been paid to enhancing the performance and robustness of these deep hashing networks when the input images do not possess high spatial resolution and visual quality. This is a critical problem, as access to high-quality, high-resolution images is often not guaranteed in real-life applications. In this paper, we propose a novel method for the task of joint image upsampling and hashing that uses a three-stage design. Specifically, in the first two stages of the proposed scheme, we obtain two deep neural networks, trained individually for the tasks of image super-resolution and image hashing, respectively. We then fine-tune the two deep networks thus obtained by using the ideas of representation learning and an alternating optimization process, in order to produce a set of optimal parameters for the task of joint image upsampling and hashing. The effectiveness of the various ideas used in designing the proposed method is demonstrated through a range of experiments. It is shown that the proposed scheme is able to outperform the state-of-the-art image super-resolution and hashing methods, even when they are trained simultaneously in a joint end-to-end manner.
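
The third stage alternates between updating the super-resolution network and the hashing network on a joint objective. The following PyTorch sketch shows one way such an alternating fine-tuning loop could look; sr_net, hash_net, the data loader, and the loss functions are placeholders, and the even/odd epoch schedule is an assumption rather than the paper's exact procedure.

```python
import torch

def alternating_finetune(sr_net, hash_net, loader, sr_loss_fn, hash_loss_fn,
                         epochs=4, lr=1e-4, device="cpu"):
    """Alternately update the SR network and the hashing network on a joint loss."""
    opt_sr = torch.optim.Adam(sr_net.parameters(), lr=lr)
    opt_hash = torch.optim.Adam(hash_net.parameters(), lr=lr)
    for epoch in range(epochs):
        update_sr = (epoch % 2 == 0)             # even epochs: SR branch, odd epochs: hashing branch
        for lr_img, hr_img, code_target in loader:
            lr_img, hr_img = lr_img.to(device), hr_img.to(device)
            code_target = code_target.to(device)
            sr_img = sr_net(lr_img)              # upsampled image
            codes = hash_net(sr_img)             # hash codes computed from the upsampled image
            loss = sr_loss_fn(sr_img, hr_img) + hash_loss_fn(codes, code_target)
            opt_sr.zero_grad()
            opt_hash.zero_grad()
            loss.backward()
            (opt_sr if update_sr else opt_hash).step()  # only one branch moves per epoch
    return sr_net, hash_net
```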

Citations: 0
Globally and locally optimized Pannini projection for high FoV rendering of 360° images
IF 3.4 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-08-30 | DOI: 10.1016/j.image.2024.117190
Falah Jabar, João Ascenso, Maria Paula Queluz

To render a spherical (360° or omnidirectional) image on planar displays, a 2D image, called a viewport, must be obtained by projecting a sphere region onto a plane, according to the user's viewing direction and a predefined field of view (FoV). However, any sphere-to-plane projection introduces geometric distortions, such as object stretching and/or bending of straight lines, whose intensity increases with the considered FoV. In this paper, a fully automatic content-aware projection is proposed, aiming to reduce the geometric distortions when high FoVs are used. This new projection is based on the Pannini projection, whose parameters are first globally optimized according to the image content, followed by a local conformality improvement of relevant viewport objects. A crowdsourcing subjective test showed that the proposed projection is the most preferred solution among the considered state-of-the-art sphere-to-plane projections, producing viewports with a more pleasant visual quality.
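
For reference, the basic (unoptimized) Pannini projection maps a viewing direction with longitude lon and latitude lat to the image plane through a single distance parameter d. The sketch below implements this generic forward mapping; the parameter value and the sampled directions are illustrative, and the content-aware optimization of the parameters and the local conformality correction proposed in the paper are not included.

```python
import numpy as np

def pannini_project(lon, lat, d=1.0):
    """Map spherical coordinates (longitude, latitude, in radians) to the
    Pannini image plane with distance parameter d.

    d = 0 reduces to the rectilinear (gnomonic) projection; larger d
    compresses horizontal stretching at wide fields of view.
    """
    S = (d + 1.0) / (d + np.cos(lon))   # radial scale factor
    x = S * np.sin(lon)
    y = S * np.tan(lat)
    return x, y

# Example: project a 120-degree-wide row of viewing directions at the equator.
lon = np.radians(np.linspace(-60, 60, 5))
lat = np.zeros_like(lon)
print(pannini_project(lon, lat, d=1.0))
```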

Citations: 0
Prototype-wise self-knowledge distillation for few-shot segmentation
IF 3.4 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-08-21 | DOI: 10.1016/j.image.2024.117186
Yadang Chen , Xinyu Xu , Chenchen Wei , Chuhan Lu

Few-shot segmentation was proposed to obtain segmentation results for an image with an unseen class by referring to a few labeled samples. However, due to the limited number of samples, many few-shot segmentation models suffer from poor generalization. Prototypical network-based few-shot segmentation still has issues with spatial inconsistency and prototype bias. Since the target class has a different appearance in each image, some specific features in the prototypes generated from the support image and its mask do not accurately reflect the generalized features of the target class. To address the support prototype consistency issue, we put forward two modules: Data Augmentation Self-knowledge Distillation (DASKD) and Prototype-wise Regularization (PWR). The DASKD module focuses on enhancing spatial consistency by using data augmentation and self-knowledge distillation. Self-knowledge distillation helps the model acquire generalized features of the target class and learn hidden knowledge from the support images. The PWR module focuses on obtaining a more representative support prototype by applying a prototype-level loss that pulls support prototypes closer to the category center. Extensive evaluation experiments on PASCAL-5i and COCO-20i demonstrate that our model outperforms prior works on few-shot segmentation. Our approach surpasses the state of the art by 7.5% on PASCAL-5i and 4.2% on COCO-20i.
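
Prototypical few-shot segmentation typically builds a class prototype by masked average pooling of support features and matches it to query features with cosine similarity. The sketch below shows this standard prototype construction, which the proposed modules regularize; the feature tensors are random stand-ins for backbone outputs, and nothing here reproduces the DASKD or PWR losses themselves.

```python
import torch
import torch.nn.functional as F

def masked_average_pooling(feat, mask):
    """Compute a class prototype from support features and a binary mask.

    feat : (C, H, W) feature map of the support image.
    mask : (H, W) binary mask of the target class, resized to the feature size.
    """
    mask = mask.unsqueeze(0).float()                          # (1, H, W)
    proto = (feat * mask).sum(dim=(1, 2)) / (mask.sum() + 1e-6)
    return proto                                              # (C,)

def cosine_similarity_map(query_feat, prototype):
    """Dense cosine similarity between query features and the prototype."""
    q = F.normalize(query_feat, dim=0)                        # (C, H, W)
    p = F.normalize(prototype, dim=0).view(-1, 1, 1)          # (C, 1, 1)
    return (q * p).sum(dim=0)                                 # (H, W), values in [-1, 1]

# Toy example with random tensors standing in for backbone features.
feat_support = torch.randn(256, 32, 32)
mask_support = (torch.rand(32, 32) > 0.5)
feat_query = torch.randn(256, 32, 32)
proto = masked_average_pooling(feat_support, mask_support)
print(cosine_similarity_map(feat_query, proto).shape)
```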

Citations: 0
Transformer-CNN for small image object detection
IF 3.4 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-08-21 | DOI: 10.1016/j.image.2024.117194
Yan-Lin Chen , Chun-Liang Lin , Yu-Chen Lin , Tzu-Chun Chen

Object recognition has been a popular research field in computer vision in recent years. Although the detection success rate for regular objects has reached impressive levels, small object detection (SOD) is still a challenging issue. In the Microsoft Common Objects in Context (MS COCO) public dataset, the detection rate of small objects is typically half that of regular-sized objects. The main reason is that small objects are often affected by multi-layer convolution and pooling, leaving insufficient detail to distinguish them from the background or from similar objects, which results in poor recognition rates or even no detections. This paper presents a network architecture, Transformer-CNN, that combines a self-attention mechanism-based transformer and a convolutional neural network (CNN) to improve the recognition rate of SOD. It captures global information through a transformer and uses the translation invariance and translation equivariance of CNNs to maximize the retention of global and local features while improving the reliability and robustness of SOD. Our experiments show that the proposed model improves the small object recognition rate by 2-5% compared with general transformer architectures.
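
A common way to combine the two ingredients is to let a small CNN produce a feature map and then feed its spatial positions as tokens to a transformer encoder for global context. The toy model below illustrates that wiring in PyTorch; the layer sizes, token pooling, and classification head are arbitrary assumptions and not the architecture evaluated in the paper.

```python
import torch
import torch.nn as nn

class TinyTransformerCNN(nn.Module):
    """CNN features reused as transformer tokens; a toy stand-in, not the paper's network."""
    def __init__(self, in_ch=3, dim=128, n_heads=4, n_layers=2, n_classes=80):
        super().__init__()
        self.backbone = nn.Sequential(                      # local features (translation equivariant)
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)  # global context
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        f = self.backbone(x)                                 # (B, dim, H/4, W/4)
        tokens = f.flatten(2).transpose(1, 2)                # (B, H*W/16, dim) token sequence
        tokens = self.encoder(tokens)                        # self-attention over all positions
        return self.head(tokens.mean(dim=1))                 # pooled classification logits

print(TinyTransformerCNN()(torch.randn(2, 3, 64, 64)).shape)  # torch.Size([2, 80])
```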

Citations: 0
Feature extractor optimization for discriminative representations in Generalized Category Discovery
IF 3.4 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-08-17 | DOI: 10.1016/j.image.2024.117195
Zhonghao Chang, Xiao Li, Zihao Zhao

The Generalized Category Discovery (GCD) task involves transferring knowledge from labeled known categories to recognize both known and novel categories within an unlabeled dataset. A significant challenge arises from the lack of prior information for novel categories. To address this, we develop a feature extractor that can learn discriminative features for both known and novel categories. Our approach leverages the observation that similar samples often belong to the same class. We construct a similarity matrix and employ a similarity contrastive loss to increase the similarity between similar samples in the feature space. Additionally, we incorporate cluster labels to further refine the feature extractor, utilizing K-means clustering to assign these labels to unlabeled data and thereby provide valuable supervision. Our feature extractor is optimized using instance-level and class-level contrastive learning constraints. These constraints promote similarity maximization in both the instance space and the label space for instances sharing the same pseudo-labels. These three components complement each other, facilitating the learning of discriminative representations for both known and novel categories. Through comprehensive evaluations on generic image recognition datasets and challenging fine-grained datasets, we demonstrate that our proposed method achieves state-of-the-art performance.
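
A similarity contrastive loss of this kind treats pairs whose feature similarity exceeds a threshold as pseudo-positives and pulls them together while pushing other samples apart. The sketch below is one plausible formulation of such a loss; the similarity threshold, temperature, and InfoNCE-style normalization are assumptions and may differ from the paper's exact definition.

```python
import torch
import torch.nn.functional as F

def similarity_contrastive_loss(features, threshold=0.9, temperature=0.1):
    """Pull together embeddings whose cosine similarity exceeds a threshold.

    features : (N, D) batch of embeddings.
    Pairs above `threshold` are treated as pseudo-positives; all other
    samples in the batch act as negatives, in an InfoNCE-style objective.
    """
    z = F.normalize(features, dim=1)
    sim = z @ z.t()                                          # (N, N) cosine similarity matrix
    n = sim.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=sim.device)
    positives = (sim > threshold) & ~eye                     # pseudo-positive pairs
    logits = (sim / temperature).masked_fill(eye, float("-inf"))  # exclude self-pairs
    log_prob = F.log_softmax(logits, dim=1)
    pos_count = positives.sum(dim=1).clamp(min=1)
    loss = -(log_prob * positives.float()).sum(dim=1) / pos_count  # average over positives
    has_pos = positives.any(dim=1)                           # anchors with at least one positive
    return loss[has_pos].mean() if has_pos.any() else sim.new_tensor(0.0)

print(similarity_contrastive_loss(torch.randn(8, 32)))
```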

Citations: 0
Image-based virtual try-on: Fidelity and simplification
IF 3.4 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-08-16 | DOI: 10.1016/j.image.2024.117189
Tasin Islam, Alina Miron, Xiaohui Liu, Yongmin Li

We introduce a novel image-based virtual try-on model designed to replace a candidate’s garment with a desired target item. The proposed model comprises three modules: segmentation, garment warping, and candidate-clothing fusion. Previous methods have shown limitations in cases involving significant differences between the original and target clothing, as well as substantial overlapping of body parts. Our model addresses these limitations by employing two key strategies. Firstly, it utilises a candidate representation based on an RGB skeleton image to enhance spatial relationships among body parts, resulting in robust segmentation and improved occlusion handling. Secondly, a truncated U-Net is employed in both the segmentation and warping modules, enhancing segmentation performance and accelerating the try-on process. The warping module leverages an efficient affine transform for ease of training. Comparative evaluations against state-of-the-art models demonstrate the competitive performance of our proposed model across various scenarios, particularly in handling occlusions and large differences between the original and target clothing. This research presents a promising solution for image-based virtual try-on, advancing the field by overcoming key limitations and achieving superior performance.
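
An affine garment warp of this kind can be applied differentiably in PyTorch with affine_grid and grid_sample, so the 2x3 warp parameters can be predicted by a network and trained end to end. The sketch below shows only this mechanism, with a fixed, hand-chosen matrix purely for illustration; it is not the warping module itself.

```python
import torch
import torch.nn.functional as F

def affine_warp(garment, theta):
    """Warp a garment image with a batch of 2x3 affine matrices.

    garment : (B, C, H, W) tensor, theta : (B, 2, 3) affine parameters.
    Both operations are differentiable, so theta can come from a network.
    """
    grid = F.affine_grid(theta, garment.size(), align_corners=False)
    return F.grid_sample(garment, grid, align_corners=False)

# Identity-plus-shift example with a dummy garment image.
garment = torch.rand(1, 3, 256, 192)
theta = torch.tensor([[[1.0, 0.0, 0.1],    # slight horizontal translation
                       [0.0, 1.0, 0.0]]])
print(affine_warp(garment, theta).shape)   # torch.Size([1, 3, 256, 192])
```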

Citations: 0
Duration-aware and mode-aware micro-expression spotting for long video sequences
IF 3.4 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-08-10 | DOI: 10.1016/j.image.2024.117192
Jing Liu , Xin Li , Jiaqi Zhang , Guangtao Zhai , Yuting Su , Yuyi Zhang , Bo Wang

Micro-expressions (MEs) are unconscious, instantaneous and slight facial movements that reveal people’s true emotions. Locating MEs is a prerequisite for classifying them, yet only a few studies focus on this task. Among them, sliding window based methods are the most prevalent. Due to differences in individual physiological and psychological mechanisms, as well as some uncontrollable factors, the durations and transition modes of different MEs fluctuate greatly. Limited to a fixed window scale and mode, traditional sliding window based ME spotting methods fail to capture the motion changes of all MEs exactly, resulting in performance degradation. In this paper, an ensemble learning based duration- and mode-aware (DMA) ME spotting framework is proposed. Specifically, we exploit multiple sliding windows of different scales and modes to generate multiple weak detectors, each of which is adapted to MEs with a certain duration and transition mode. Additionally, to obtain a more comprehensive strong detector, we integrate the analysis results of multiple weak detectors using a voting based aggregation module. Furthermore, a novel interval generation scheme is designed to merge close peaks and their neighboring frames into a complete ME interval. Experimental results on two long video databases show the promising performance of our proposed DMA framework compared with state-of-the-art methods. The codes are available at https://github.com/TJUMMG/DMA-ME-Spotting.
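
A simplified view of the multi-scale sliding-window idea: compute a per-frame motion score, let windows of several lengths flag frames that stand out from their local statistics, and keep frames flagged by enough scales before merging them into intervals. The sketch below follows that outline; the motion score, thresholds, window lengths, and vote count are placeholder assumptions, not the DMA framework's actual detectors.

```python
import numpy as np

def spot_intervals(motion, window_sizes=(9, 15, 27), k=1.5, min_votes=2):
    """Vote-based spotting of candidate micro-expression frames.

    motion : (T,) per-frame motion magnitude (e.g. optical-flow energy).
    Each window scale flags frames that exceed the local mean by k standard
    deviations; frames flagged by at least `min_votes` scales are kept.
    """
    T = len(motion)
    votes = np.zeros(T, dtype=int)
    for w in window_sizes:
        half = w // 2
        for t in range(T):
            local = motion[max(0, t - half):min(T, t + half + 1)]
            if motion[t] > local.mean() + k * local.std():
                votes[t] += 1
    flagged = votes >= min_votes
    # Merge consecutive flagged frames into candidate intervals.
    intervals, start = [], None
    for t, f in enumerate(flagged):
        if f and start is None:
            start = t
        elif not f and start is not None:
            intervals.append((start, t - 1))
            start = None
    if start is not None:
        intervals.append((start, T - 1))
    return intervals

rng = np.random.default_rng(0)
motion = rng.random(300) * 0.1
motion[120:128] += 1.0          # synthetic micro-expression burst
print(spot_intervals(motion))
```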

Citations: 0
Low-rank tensor completion based on tensor train rank with partially overlapped sub-blocks and total variation
IF 3.4 | CAS Tier 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-08-10 | DOI: 10.1016/j.image.2024.117193
Jingfei He, Zezhong Yang, Xunan Zheng, Xiaoyue Zhang, Ao Li

Recently, low-rank tensor completion methods based on tensor train (TT) rank have achieved promising performance. Ket augmentation (KA) is commonly used in TT rank-based methods to improve performance by converting low-dimensional tensors to higher-dimensional tensors. However, block artifacts arise because KA destroys the original structure and image continuity of the low-dimensional tensors. To tackle this issue, a low-rank tensor completion method based on TT rank with tensor augmentation by partially overlapped sub-blocks (TAPOS) and total variation (TV) is proposed in this paper. The proposed TAPOS preserves the image continuity of the original tensor and enhances the low-rankness of the generated higher-dimensional tensors, and a weighted de-augmentation method is used to assign different weights to the elements of the sub-blocks, further reducing block artifacts. To further alleviate block artifacts and improve reconstruction accuracy, TV is introduced into the TAPOS-based model to add a piecewise-smooth prior. A parallel matrix factorization method is introduced to estimate the TT rank and reduce the computational cost. Numerical experiments show that the proposed method outperforms existing state-of-the-art methods.
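
The idea of partially overlapped sub-blocks with weighted de-augmentation can be illustrated on a plain 2D image: split it into overlapping blocks, process them, and merge them back with weights that taper toward block borders so that seams are averaged out. The sketch below shows only this split/merge round trip; the block size, stride, and Hanning-shaped weights are assumptions, and no tensor completion is performed.

```python
import numpy as np

def split_overlapped(img, block=32, stride=24):
    """Split a 2D image into partially overlapped square sub-blocks."""
    H, W = img.shape
    blocks, coords = [], []
    for i in range(0, H - block + 1, stride):
        for j in range(0, W - block + 1, stride):
            blocks.append(img[i:i + block, j:j + block].copy())
            coords.append((i, j))
    return blocks, coords

def merge_weighted(blocks, coords, shape, block=32):
    """Weighted de-augmentation: down-weight block borders so overlaps are averaged smoothly."""
    acc = np.zeros(shape, dtype=np.float64)
    weight = np.zeros(shape, dtype=np.float64)
    w1d = np.hanning(block + 2)[1:-1] + 1e-3     # separable weight tapering toward borders
    w2d = np.outer(w1d, w1d)
    for b, (i, j) in zip(blocks, coords):
        acc[i:i + block, j:j + block] += b * w2d
        weight[i:i + block, j:j + block] += w2d
    return acc / np.maximum(weight, 1e-8)

img = np.random.rand(128, 128)
blocks, coords = split_overlapped(img)
rec = merge_weighted(blocks, coords, img.shape)
print(np.abs(rec - img).max())  # near zero: the split/merge round trip is lossless here
```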

Citations: 0