
Latest Publications in IEEE Transactions on Multimedia

DeepSpoof: Deep Reinforcement Learning-Based Spoofing Attack in Cross-Technology Multimedia Communication
IF 8.4 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-06-20 | DOI: 10.1109/TMM.2024.3414660
Demin Gao;Liyuan Ou;Ye Liu;Qing Yang;Honggang Wang
Cross-technology communication is essential for the Internet of Multimedia Things (IoMT) applications, enabling seamless integration of diverse media formats, optimized data transmission, and improved user experiences across devices and platforms. This integration drives innovative and efficient IoMT solutions in areas like smart homes, smart cities, and healthcare monitoring. However, this integration of diverse wireless standards within cross-technology multimedia communication increases the susceptibility of wireless networks to attacks. Current methods lack robust authentication mechanisms, leaving them vulnerable to spoofing attacks. To mitigate this concern, we introduce DeepSpoof, a spoofing system that utilizes deep learning to analyze historical wireless traffic and anticipate future patterns in the IoMT context. This innovative approach significantly boosts an attacker's impersonation capabilities and offers a higher degree of covertness compared to traditional spoofing methods. Rigorous evaluations, leveraging both simulated and real-world data, confirm that DeepSpoof significantly elevates the average success rate of attacks.
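The abstract does not detail DeepSpoof's architecture or its reinforcement-learning formulation, so the following is only a minimal sketch of the forecasting idea it describes: learn from historical traffic and predict the timing of upcoming transmissions so a spoofed frame could be injected ahead of the legitimate sender. The `TrafficForecaster` model, the inter-frame-gap feature, and all hyperparameters are illustrative assumptions, not the paper's design.

```python
# Illustrative stand-in only: a GRU forecaster for the next inter-frame gap.
# DeepSpoof's real features, architecture, and RL components are not specified
# in the abstract and are not reproduced here.
import torch
import torch.nn as nn

class TrafficForecaster(nn.Module):
    def __init__(self, hidden_size: int = 64):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)  # predicts the next inter-frame gap

    def forward(self, gaps: torch.Tensor) -> torch.Tensor:
        # gaps: (batch, seq_len, 1) observed inter-frame intervals
        out, _ = self.rnn(gaps)
        return self.head(out[:, -1])  # (batch, 1)

# Toy usage: fit on synthetic periodic traffic and predict the next gap.
model = TrafficForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
history = torch.sin(torch.linspace(0, 12.0, 200)).add(2.0).reshape(1, -1, 1)
x, y = history[:, :-1], history[:, -1]
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
print("predicted next inter-frame gap:", model(x).item())
```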
{"title":"DeepSpoof: Deep Reinforcement Learning-Based Spoofing Attack in Cross-Technology Multimedia Communication","authors":"Demin Gao;Liyuan Ou;Ye Liu;Qing Yang;Honggang Wang","doi":"10.1109/TMM.2024.3414660","DOIUrl":"10.1109/TMM.2024.3414660","url":null,"abstract":"Cross-technology communication is essential for the Internet of Multimedia Things (IoMT) applications, enabling seamless integration of diverse media formats, optimized data transmission, and improved user experiences across devices and platforms. This integration drives innovative and efficient IoMT solutions in areas like smart homes, smart cities, and healthcare monitoring. However, this integration of diverse wireless standards within cross-technology multimedia communication increases the susceptibility of wireless networks to attacks. Current methods lack robust authentication mechanisms, leaving them vulnerable to spoofing attacks. To mitigate this concern, we introduce DeepSpoof, a spoofing system that utilizes deep learning to analyze historical wireless traffic and anticipate future patterns in the IoMT context. This innovative approach significantly boosts an attacker's impersonation capabilities and offers a higher degree of covertness compared to traditional spoofing methods. Rigorous evaluations, leveraging both simulated and real-world data, confirm that DeepSpoof significantly elevates the average success rate of attacks.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"10879-10891"},"PeriodicalIF":8.4,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Perceptual Image Hashing Using Feature Fusion of Orthogonal Moments
IF 8.4 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-06-20 | DOI: 10.1109/TMM.2024.3405660
Xinran Li;Zichi Wang;Guorui Feng;Xinpeng Zhang;Chuan Qin
Due to the limited number of stable image feature descriptors and the simplistic concatenation approach to hash generation, existing hashing methods have not achieved a satisfactory balance between robustness and discrimination. To this end, a novel perceptual hashing method is proposed in this paper using feature fusion of fractional-order continuous orthogonal moments (FrCOMs). Specifically, two robust image descriptors, i.e., fractional-order Chebyshev Fourier moments (FrCHFMs) and fractional-order radial harmonic Fourier moments (FrRHFMs), are used to extract global structural features of a color image. Then, the canonical correlation analysis (CCA) strategy is employed to fuse these features during the final hash generation process. Compared to direct concatenation, CCA excels in eliminating redundancies between feature vectors, resulting in a shorter hash sequence and higher authentication performance. A series of experiments demonstrate that the proposed method achieves satisfactory robustness, discrimination and security. Particularly, the proposed method exhibits better tampering detection ability and robustness against combined content-preserving manipulations in practical applications.
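As a concrete illustration of the CCA fusion step described above, the sketch below fuses two global descriptors with scikit-learn's CCA and binarizes the result into a short hash. The random matrices stand in for FrCHFM/FrRHFM moment features (not computed here), and the hash length and thresholding rule are arbitrary choices, not the authors' settings.

```python
# CCA-based feature fusion followed by binarization -- an illustration of the
# fusion idea only, not the authors' implementation.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
# Pretend each row is one image's moment-based descriptor (e.g., 40-D FrCHFMs
# and 40-D FrRHFMs); real descriptors would come from the moment transforms.
feats_a = rng.normal(size=(100, 40))
feats_b = 0.5 * feats_a + 0.5 * rng.normal(size=(100, 40))

cca = CCA(n_components=8)
cca.fit(feats_a, feats_b)

def perceptual_hash(fa: np.ndarray, fb: np.ndarray) -> np.ndarray:
    """Project both descriptors onto the canonical space and binarize."""
    za, zb = cca.transform(fa.reshape(1, -1), fb.reshape(1, -1))
    fused = np.concatenate([za, zb], axis=1).ravel()
    return (fused > fused.mean()).astype(np.uint8)  # 16-bit hash in this toy setup

h1 = perceptual_hash(feats_a[0], feats_b[0])
h2 = perceptual_hash(feats_a[0] + 0.01, feats_b[0] + 0.01)  # mildly perturbed copy
print("Hamming distance:", int(np.sum(h1 != h2)))
```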
{"title":"Perceptual Image Hashing Using Feature Fusion of Orthogonal Moments","authors":"Xinran Li;Zichi Wang;Guorui Feng;Xinpeng Zhang;Chuan Qin","doi":"10.1109/TMM.2024.3405660","DOIUrl":"10.1109/TMM.2024.3405660","url":null,"abstract":"Due to the limited number of stable image feature descriptors and the simplistic concatenation approach to hash generation, existing hashing methods have not achieved a satisfactory balance between robustness and discrimination. To this end, a novel perceptual hashing method is proposed in this paper using feature fusion of fractional-order continuous orthogonal moments (FrCOMs). Specifically, two robust image descriptors, i.e., fractional-order Chebyshev Fourier moments (FrCHFMs) and fractional-order radial harmonic Fourier moments (FrRHFMs), are used to extract global structural features of a color image. Then, the canonical correlation analysis (CCA) strategy is employed to fuse these features during the final hash generation process. Compared to direct concatenation, CCA excels in eliminating redundancies between feature vectors, resulting in a shorter hash sequence and higher authentication performance. A series of experiments demonstrate that the proposed method achieves satisfactory robustness, discrimination and security. Particularly, the proposed method exhibits better tampering detection ability and robustness against combined content-preserving manipulations in practical applications.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"10041-10054"},"PeriodicalIF":8.4,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
LLIC: Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression
IF 7.3 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-06-19 | DOI: 10.1109/tmm.2024.3416831
Wei Jiang, Peirong Ning, Jiayu Yang, Yongqi Zhai, Feng Gao, Ronggang Wang
{"title":"LLIC: Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression","authors":"Wei Jiang, Peirong Ning, Jiayu Yang, Yongqi Zhai, Feng Gao, Ronggang Wang","doi":"10.1109/tmm.2024.3416831","DOIUrl":"https://doi.org/10.1109/tmm.2024.3416831","url":null,"abstract":"","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"32 1","pages":""},"PeriodicalIF":7.3,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Screen-Shooting Resistant Watermarking With Grayscale Deviation Simulation
IF 8.4 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-06-17 | DOI: 10.1109/TMM.2024.3415415
Yiyi Li;Xin Liao;Xiaoshuai Wu
With the prevalence of electronic devices in our daily lives, content leakages frequently occur, and to enable leakage tracing, screen-shooting resistant watermarking has attracted tremendous attention. However, current studies often overlook a thoughtful investigation of the cross-media screen-camera process and fail to consider the effect of grayscale deviation on the screen. In this paper, we propose screen-shooting distortion simulation (SSDS), which involves a grayscale deviation function for constructing a more practical noise layer. We divide SSDS into screen displaying and camera shooting. For screen displaying, different viewing angles result in grayscale deviation with distinct intensities, and we simulate the distortions by modeling the relative position of the viewing point and the screen plane. For camera shooting, a series of distortion functions are used to approximate the perturbations in the camera pipeline, including defocus blur, noise and JPEG compression. Furthermore, the gradient-guided encoder is designed to conduct the embedding in the texture region using a modification cost map. Experimental results show that our proposed watermarking framework outperforms the state-of-the-art methods in terms of robustness and visual quality.
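A minimal sketch of the camera-shooting part of such a noise layer, chaining the three perturbations the abstract lists (defocus blur, noise, JPEG compression). The grayscale-deviation model for screen displaying and the gradient-guided encoder are not reproduced, and all parameter values below are illustrative assumptions.

```python
# Toy camera-shooting distortion pipeline: blur -> sensor noise -> JPEG.
# Parameter values are arbitrary illustrations, not the paper's settings.
import io
import numpy as np
from PIL import Image, ImageFilter

def simulate_screen_shot(img: Image.Image, blur_radius: float = 1.2,
                         noise_sigma: float = 4.0, jpeg_quality: int = 60) -> Image.Image:
    # 1) defocus blur approximated with a Gaussian kernel
    out = img.filter(ImageFilter.GaussianBlur(radius=blur_radius))
    # 2) additive sensor noise
    arr = np.asarray(out).astype(np.float32)
    arr += np.random.normal(scale=noise_sigma, size=arr.shape)
    out = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    # 3) JPEG compression round-trip
    buf = io.BytesIO()
    out.save(buf, format="JPEG", quality=jpeg_quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

# Toy usage on a synthetic image.
cover = Image.fromarray((np.random.rand(128, 128, 3) * 255).astype(np.uint8))
distorted = simulate_screen_shot(cover)
print(distorted.size, distorted.mode)
```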
{"title":"Screen-Shooting Resistant Watermarking With Grayscale Deviation Simulation","authors":"Yiyi Li;Xin Liao;Xiaoshuai Wu","doi":"10.1109/TMM.2024.3415415","DOIUrl":"10.1109/TMM.2024.3415415","url":null,"abstract":"With the prevalence of electronic devices in our daily lives, content leakages frequently occur, and to enable leakage tracing, screen-shooting resistant watermarking has attracted tremendous attention. However, current studies often overlook a thoughtful investigation of the cross-media screen-camera process and fail to consider the effect of grayscale deviation on the screen. In this paper, we propose \u0000<underline>s</u>\u0000creen-\u0000<underline>s</u>\u0000hooting \u0000<underline>d</u>\u0000istortion \u0000<underline>s</u>\u0000imulation (\u0000<inline-formula><tex-math>$bf {SSDS}$</tex-math></inline-formula>\u0000), which involves a grayscale deviation function for constructing a more practical noise layer. We divide SSDS into screen displaying and camera shooting. For screen displaying, different viewing angles result in grayscale deviation with distinct intensities, and we simulate the distortions by modeling the relative position of the viewing point and the screen plane. For camera shooting, a series of distortion functions are used to approximate the perturbations in the camera pipeline, including defocus blur, noise and JPEG compression. Furthermore, the gradient-guided encoder is designed to conduct the embedding in the texture region using a modification cost map. Experimental results show that our proposed watermarking framework outperforms the state-of-the-art methods in terms of robustness and visual quality.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"10908-10923"},"PeriodicalIF":8.4,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Deeply Hybrid Contrastive Learning Based on Semantic Pseudo-Label for Salient Object Detection in Optical Remote Sensing Images
IF 8.4 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-06-14 | DOI: 10.1109/TMM.2024.3414669
Yu Qiu;Yuhang Sun;Jie Mei;Jing Xu
Salient object detection in natural scene images (NSI-SOD) has undergone remarkable advancements in recent years. However, compared to those of natural images, the properties of optical remote sensing images (ORSIs), such as diverse spatial resolutions, complex background structures, and varying visual attributes of objects, are more complicated. Hence, exploring the multiscale structural perceptual information of ORSIs to accurately detect salient objects is more challenging. In this paper, inspired by the superiority of contrastive learning, we propose a novel training paradigm for ORSI-SOD, named Deeply Hybrid Contrastive Learning Based on Semantic Pseudo-Label (DHCont), to force the network to extract rich structural perceptual information and further learn better-structured feature embedding spaces. Specifically, DHCont first splits the ORSI into several local subregions composed of color- and texture-similar pixels, which act as semantic pseudo-labels. This strategy can effectively explore the underdeveloped semantic categories in ORSI-SOD. To delve deeper into multiscale structure-aware optimization, DHCont incorporates a hybrid contrast strategy that integrates “pixel-to-pixel”, “region-to-region”, “pixel-to-region”, and “region-to-pixel” contrasts at multiple scales. Additionally, to enhance the edge details of salient regions, we develop a hard edge contrast strategy that focuses on improving the detection accuracy of hard pixels near the object boundary. Moreover, we introduce a deep contrast algorithm that adds additional deep-level constraints to the feature spaces of multiple stages. Extensive experiments on two popular ORSI-SOD datasets demonstrate that simply integrating our DHCont into the existing ORSI-SOD models can significantly improve the performance.
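To make the "pixel-to-pixel" contrast concrete, the sketch below implements a supervised InfoNCE-style loss over pixel embeddings that share a semantic pseudo-label. The region-level contrasts, the pseudo-label construction from color/texture subregions, and the hard-edge and deep-contrast strategies are not shown; shapes and the temperature are assumptions rather than DHCont's settings.

```python
# Supervised InfoNCE over pixel embeddings grouped by pseudo-label -- an
# illustration of one contrast term only.
import torch
import torch.nn.functional as F

def pixel_contrast_loss(embed: torch.Tensor, pseudo: torch.Tensor,
                        temperature: float = 0.1) -> torch.Tensor:
    """embed: (N, D) pixel embeddings, pseudo: (N,) pseudo-label per pixel."""
    z = F.normalize(embed, dim=1)
    sim = z @ z.t() / temperature                      # (N, N) similarities
    same = pseudo.unsqueeze(0) == pseudo.unsqueeze(1)  # positive-pair mask
    eye = torch.eye(len(z), dtype=torch.bool)
    same = same & ~eye                                 # exclude self-pairs
    # log-softmax over all other pixels, then average over the positives
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float("-inf")), dim=1, keepdim=True)
    pos_counts = same.sum(1).clamp(min=1)
    return -(log_prob.masked_fill(~same, 0.0).sum(1) / pos_counts).mean()

# Toy usage: 64 pixel embeddings drawn from 4 pseudo-label regions.
feats = torch.randn(64, 32, requires_grad=True)
labels = torch.randint(0, 4, (64,))
loss = pixel_contrast_loss(feats, labels)
loss.backward()
print(float(loss))
```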
{"title":"Deeply Hybrid Contrastive Learning Based on Semantic Pseudo-Label for Salient Object Detection in Optical Remote Sensing Images","authors":"Yu Qiu;Yuhang Sun;Jie Mei;Jing Xu","doi":"10.1109/TMM.2024.3414669","DOIUrl":"10.1109/TMM.2024.3414669","url":null,"abstract":"Salient object detection in natural scene images (NSI-SOD) has undergone remarkable advancements in recent years. However, compared to those of natural images, the properties of remote sensing images (ORSIs), such as diverse spatial resolutions, complex background structures, and varying visual attributes of objects, are more complicated. Hence, how to explore the multiscale structural perceptual information of ORSIs to accurately detect salient objects is more challenging. In this paper, inspired by the superiority of contrastive learning, we propose a novel training paradigm for ORSI-SOD, named Deeply Hybrid Contrastive Learning Based on Semantic Pseudo-Label (DHCont), to force the network to extract rich structural perceptual information and further learn the better-structured feature embedding spaces. Specifically, DHCont first splits the ORSI into several local subregions composed of color- and texture-similar pixels, which act as semantic pseudo-labels. This strategy can effectively explore the underdeveloped semantic categories in ORSI-SOD. To delve deeper into multiscale structure-aware optimization, DHCont incorporates a hybrid contrast strategy that integrates “pixel-to-pixel”, “region-to-region”, “pixel-to-region”, and “region-to-pixel” contrasts at multiple scales. Additionally, to enhance the edge details of salient regions, we develop a hard edge contrast strategy that focuses on improving the detection accuracy of hard pixels near the object boundary. Moreover, we introduce a deep contrast algorithm that adds additional deep-level constraints to the feature spaces of multiple stages. Extensive experiments on two popular ORSI-SOD datasets demonstrate that simply integrating our DHCont into the existing ORSI-SOD models can significantly improve the performance.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"10892-10907"},"PeriodicalIF":8.4,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Low-Light Image Enhancement With SAM-Based Structure Priors and Guidance
IF 8.4 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-06-13 | DOI: 10.1109/TMM.2024.3414328
Guanlin Li;Bin Zhao;Xuelong Li
Low-light images often suffer from severe loss of detail in darker areas and non-uniform illumination distribution across distinct regions. Thus, structure modeling and region-specific illumination manipulation are crucial for high-quality enhanced image generation. However, previous methods encounter limitations in exploring robust structure priors and lack adequate modeling of illumination relationships among different regions, resulting in structure artifacts and color deviations. To alleviate this limitation, we propose a Segmentation-Guided Framework (SGF) which integrates the constructed robust segmentation priors to guide the enhancement process. Specifically, SGF first constructs a robust image-level edge prior based on the segmentation results of the Segment Anything Model (SAM) in a zero-shot manner. Then, we generate a lighted-up region-aware feature-level prior by incorporating region-aware dynamic convolution. To adequately model long-distance illumination interactions across distinct regions, we design a segmentation-guided transformer block (SGTB), which utilizes the lighted-up region-aware feature-level prior to guide self-attention calculation. By arranging the SGTBs in a symmetric hierarchical structure, we derive a segmentation-guided enhancement module that operates under the guidance of both the image and feature-level priors. Comprehensive experimental results show that our SGF performs remarkably well in both quantitative evaluation and visual comparison.
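The sketch below illustrates one way an image-level edge prior could be derived from segmentation masks such as those produced by SAM (not invoked here): take the union of mask boundaries as a structure map to feed alongside the low-light input. This is an assumption-laden stand-in for SGF's prior construction, not the authors' implementation.

```python
# Edge prior from binary segmentation masks -- illustrative only; SAM itself
# is not called, and SGF's actual prior construction may differ.
import numpy as np

def edge_prior_from_masks(masks: list[np.ndarray]) -> np.ndarray:
    """masks: list of HxW boolean arrays; returns an HxW float32 edge map in [0, 1]."""
    h, w = masks[0].shape
    edges = np.zeros((h, w), dtype=bool)
    for m in masks:
        m = m.astype(bool)
        # a pixel is a boundary pixel if any 4-neighbor differs from it
        diff = np.zeros_like(m)
        diff[1:, :] |= m[1:, :] ^ m[:-1, :]
        diff[:-1, :] |= m[:-1, :] ^ m[1:, :]
        diff[:, 1:] |= m[:, 1:] ^ m[:, :-1]
        diff[:, :-1] |= m[:, :-1] ^ m[:, 1:]
        edges |= diff & m  # keep the inside rim of each mask
    return edges.astype(np.float32)

# Toy usage with two synthetic masks.
m1 = np.zeros((64, 64), bool); m1[8:32, 8:32] = True
m2 = np.zeros((64, 64), bool); m2[30:60, 20:50] = True
prior = edge_prior_from_masks([m1, m2])
print(prior.shape, prior.sum())
```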
{"title":"Low-Light Image Enhancement With SAM-Based Structure Priors and Guidance","authors":"Guanlin Li;Bin Zhao;Xuelong Li","doi":"10.1109/TMM.2024.3414328","DOIUrl":"10.1109/TMM.2024.3414328","url":null,"abstract":"Low-light images often suffer from severe detail lost in darker areas and non-uniform illumination distribution across distinct regions. Thus, structure modeling and region-specific illumination manipulation are crucial for high-quality enhanced image generation. However, previous methods encounter limitations in exploring robust structure priors and lack adequate modeling of illumination relationships among different regions, resulting in structure artifacts and color deviations. To alleviate this limitation, we propose a Segmentation-Guided Framework (SGF) which integrates the constructed robust segmentation priors to guide the enhancement process. Specifically, SGF first constructs a robust image-level edge prior based on the segmentation results of the Segment Anything Model (SAM) in a zero-shot manner. Then, we generate lighted-up region-aware feature-level prior by incorporating region-aware dynamic convolution. To adequately model long-distance illumination interactions across distinct regions, we design a segmentation-guided transformer block (SGTB), which utilizes the lighted-up region-aware feature-level prior to guide self-attention calculation. By arranging the SGTBs in a symmetric hierarchical structure, we derive a segmentation-guided enhancement module that operates under the guidance of both the image and feature-level priors. Comprehensive experimental results show that our SGF performs remarkably in both quantitative evaluation and visual comparison.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"10854-10866"},"PeriodicalIF":8.4,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Modeling Inner- and Cross-Task Contrastive Relations for Continual Image Classification
IF 8.4 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-06-13 | DOI: 10.1109/TMM.2024.3414277
Yuxuan Luo;Runmin Cong;Xialei Liu;Horace Ho Shing Ip;Sam Kwong
Existing continual image classification methods demonstrate that samples from all sequences of continual classification tasks contain common (task-invariant) features and class-specific (task-variant) features that can be decoupled for classification tasks. However, the existing feature decomposition strategies only focus on individual tasks while neglecting the essential cues that the relationship between different tasks can provide, thereby hindering the improvement of continual image classification results. To address this issue, we propose an Adversarial Contrastive Continual Learning (ACCL) method that decouples task-invariant and task-variant features by constructing all-round, multi-level contrasts on sample pairs within individual tasks or from different tasks. Specifically, three constraints on the distribution of task-invariant and task-variant features are included, i.e., task-invariant features across different tasks should remain consistent, task-variant features should exhibit differences, and task-invariant and task-variant features should differ from each other. At the same time, we also design an effective contrastive replay strategy to make full use of the replay samples to participate in the construction of sample pairs, further alleviating the forgetting problem, and modeling cross-task relationships. Through extensive experiments on continual image classification tasks on CIFAR100, MiniImageNet and TinyImageNet, we show the superiority of our proposed strategy, which improves accuracy and yields better visualized outcomes.
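The three distribution constraints listed above can be written as simple similarity penalties, as in the hedged sketch below: align task-invariant features across tasks, push task-variant features apart, and decouple invariant from variant features of the same task. The adversarial training and contrastive replay components are not reproduced, and the cosine-similarity form and weights are assumptions, not ACCL's actual losses.

```python
# Three feature-distribution penalties in the spirit of the constraints above.
# Purely illustrative; not the paper's loss functions.
import torch
import torch.nn.functional as F

def accl_style_loss(inv_a, inv_b, var_a, var_b, w=(1.0, 1.0, 1.0)):
    """All inputs are (N, D) feature batches from task A and task B."""
    cos = lambda x, y: F.cosine_similarity(x, y, dim=1).mean()
    align_invariant   = 1.0 - cos(inv_a, inv_b)                 # pull task-invariant parts together
    separate_variant  = cos(var_a, var_b).clamp(min=0.0)        # push task-variant parts apart
    decouple_features = cos(inv_a, var_a).abs() + cos(inv_b, var_b).abs()  # invariant vs. variant
    return w[0] * align_invariant + w[1] * separate_variant + w[2] * decouple_features

# Toy usage on random feature batches.
feats = [torch.randn(16, 128, requires_grad=True) for _ in range(4)]
loss = accl_style_loss(*feats)
loss.backward()
print(float(loss))
```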
{"title":"Modeling Inner- and Cross-Task Contrastive Relations for Continual Image Classification","authors":"Yuxuan Luo;Runmin Cong;Xialei Liu;Horace Ho Shing Ip;Sam Kwong","doi":"10.1109/TMM.2024.3414277","DOIUrl":"10.1109/TMM.2024.3414277","url":null,"abstract":"Existing continual image classification methods demonstrate that samples from all sequences of continual classification tasks contain common (task-invariant) features and class-specific (task-variant) features that can be decoupled for classification tasks. However, the existing feature decomposition strategies only focus on individual tasks while neglecting the essential cues that the relationship between different tasks can provide, thereby hindering the improvement of continual image classification results. To address this issue, we propose an Adversarial Contrastive Continual Learning (ACCL) method that decouples task-invariant and task-variant features by constructing all-round, multi-level contrasts on sample pairs within individual tasks or from different tasks. Specifically, three constraints on the distribution of task-invariant and task-variant features are included, i.e., task-invariant features across different tasks should remain consistent, task-variant features should exhibit differences, and task-invariant and task-variant features should differ from each other. At the same time, we also design an effective contrastive replay strategy to make full use of the replay samples to participate in the construction of sample pairs, further alleviating the forgetting problem, and modeling cross-task relationships. Through extensive experiments on continual image classification tasks on CIFAR100, MiniImageNet and TinyImageNet, we show the superiority of our proposed strategy, improving the accuracy and with better visualized outcomes.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"10842-10853"},"PeriodicalIF":8.4,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Bilateral Interaction for Local-Global Collaborative Perception in Low-Light Image Enhancement
IF 8.4 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-06-12 | DOI: 10.1109/TMM.2024.3413293
Rui Xu;Yuezhou Li;Yuzhen Niu;Huangbiao Xu;Yuzhong Chen;Tiesong Zhao
Low-light image enhancement is a challenging task due to the limited visibility in dark environments. While recent advances have shown progress in integrating CNNs and Transformers, inadequate local-global perceptual interaction still impedes their application in complex degradation scenarios. To tackle this issue, we propose BiFormer, a lightweight framework that facilitates local-global collaborative perception via bilateral interaction. Specifically, our framework introduces a core CNN-Transformer collaborative perception block (CPB) that combines local-aware convolutional attention (LCA) and a global-aware recursive Transformer (GRT) to simultaneously preserve local details and ensure global consistency. To promote perceptual interaction, we adopt a bilateral interaction strategy for both local and global perception, which involves local-to-global second-order interaction (SoI) in the dual domain, as well as a mixed-channel fusion (MCF) module for global-to-local interaction. The MCF is also a highly efficient feature fusion module tailored for degraded features. Extensive experiments conducted on low-level and high-level tasks demonstrate that BiFormer achieves state-of-the-art performance. Furthermore, it exhibits a significant reduction in model parameters and computational cost compared to existing Transformer-based low-light image enhancement methods.
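A minimal sketch in the spirit of the collaborative perception block described above: a depthwise-convolution branch for local detail and a multi-head self-attention branch for global consistency, fused channel-wise. The recursive Transformer, second-order interaction, and MCF designs are not reproduced; the block below is an assumed simplification, not the CPB itself.

```python
# Local (depthwise conv) + global (self-attention) branches with a simple
# channel-wise fusion -- illustrative only.
import torch
import torch.nn as nn

class LocalGlobalBlock(nn.Module):
    def __init__(self, channels: int = 32, heads: int = 4):
        super().__init__()
        self.local = nn.Sequential(                       # local-aware branch
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1), nn.GELU())
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)  # simple channel fusion

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.local(x)
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (B, HW, C)
        global_, _ = self.attn(tokens, tokens, tokens)
        global_ = global_.transpose(1, 2).reshape(b, c, h, w)
        return x + self.fuse(torch.cat([local, global_], dim=1))

# Toy usage on a random feature map.
out = LocalGlobalBlock()(torch.randn(1, 32, 24, 24))
print(out.shape)
```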
{"title":"Bilateral Interaction for Local-Global Collaborative Perception in Low-Light Image Enhancement","authors":"Rui Xu;Yuezhou Li;Yuzhen Niu;Huangbiao Xu;Yuzhong Chen;Tiesong Zhao","doi":"10.1109/TMM.2024.3413293","DOIUrl":"10.1109/TMM.2024.3413293","url":null,"abstract":"Low-light image enhancement is a challenging task due to the limited visibility in dark environments. While recent advances have shown progress in integrating CNNs and Transformers, the inadequate local-global perceptual interactions still impedes their application in complex degradation scenarios. To tackle this issue, we propose BiFormer, a lightweight framework that facilitates local-global collaborative perception via bilateral interaction. Specifically, our framework introduces a core CNN-Transformer collaborative perception block (CPB) that combines local-aware convolutional attention (LCA) and global-aware recursive Transformer (GRT) to simultaneously preserve local details and ensure global consistency. To promote perceptual interaction, we adopt bilateral interaction strategy for both local and global perception, which involves local-to-global second-order interaction (SoI) in the dual-domain, as well as a mixed-channel fusion (MCF) module for global-to-local interaction. The MCF is also a highly efficient feature fusion module tailored for degraded features. Extensive experiments conducted on low-level and high-level tasks demonstrate that BiFormer achieves state-of-the-art performance. Furthermore, it exhibits a significant reduction in model parameters and computational cost compared to existing Transformer-based low-light image enhancement methods.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"10792-10804"},"PeriodicalIF":8.4,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Towards Specific Domain Prompt Learning via Improved Text Label Optimization
IF 8.4 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-06-12 | DOI: 10.1109/TMM.2024.3413318
Liangchen Liu;Nannan Wang;Decheng Liu;Xi Yang;Xinbo Gao;Tongliang Liu
Prompt learning has emerged as a thriving parameter-efficient fine-tuning technique for adapting pre-trained vision-language models (VLMs) to various downstream tasks. However, existing prompt learning approaches still exhibit limited capability for adapting foundational VLMs to specific domains that require specialized and expert-level knowledge. Since this kind of specific knowledge is primarily embedded in the pre-defined text labels, we infer that foundational VLMs cannot directly interpret semantically meaningful information from these specific text labels, which causes the above limitation. From this perspective, this paper additionally models text labels with learnable tokens and casts this operation into the traditional prompt learning framework. By optimizing label tokens, semantically meaningful text labels are automatically learned for each class. Nevertheless, directly optimizing text labels still faces two critical problems, i.e., insufficient optimization and biased optimization. We further address these problems by proposing Modality Interaction Text Label Optimization (MITLOp) and Color-based Consistency Augmentation (CCAug) respectively, thereby effectively improving the quality of the optimized text labels. Extensive experiments indicate that our proposed method achieves significant improvements in VLM adaptation on specific domains.
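The core idea of modeling text labels with learnable tokens can be sketched as below: label-token embeddings (together with context-prompt tokens) are optimized through a frozen text encoder against image features. A toy GRU stands in for a real VLM text tower such as CLIP's, and nothing here reflects the MITLOp or CCAug mechanisms themselves; names, dimensions, and the temperature are assumptions.

```python
# Learnable label tokens optimized through a frozen toy text encoder -- an
# illustration of the idea only, not the paper's pipeline or a real VLM.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, n_classes, n_ctx, n_label_tok = 64, 5, 4, 2
text_encoder = nn.GRU(dim, dim, batch_first=True)  # frozen stand-in for a VLM text tower
for p in text_encoder.parameters():
    p.requires_grad_(False)

ctx_tokens = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)                      # shared context prompt
label_tokens = nn.Parameter(torch.randn(n_classes, n_label_tok, dim) * 0.02)   # learnable text labels
opt = torch.optim.Adam([ctx_tokens, label_tokens], lr=1e-2)

def class_embeddings() -> torch.Tensor:
    prompts = torch.cat([ctx_tokens.expand(n_classes, -1, -1), label_tokens], dim=1)
    _, h = text_encoder(prompts)              # final hidden state per class: (1, n_classes, dim)
    return F.normalize(h.squeeze(0), dim=-1)

# Toy training loop against random "image features" with known classes.
img_feats = F.normalize(torch.randn(32, dim), dim=-1)
labels = torch.randint(0, n_classes, (32,))
for _ in range(50):
    opt.zero_grad()
    logits = img_feats @ class_embeddings().t() / 0.07  # CLIP-style temperature
    loss = F.cross_entropy(logits, labels)
    loss.backward()
    opt.step()
print("final loss:", float(loss))
```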
{"title":"Towards Specific Domain Prompt Learning via Improved Text Label Optimization","authors":"Liangchen Liu;Nannan Wang;Decheng Liu;Xi Yang;Xinbo Gao;Tongliang Liu","doi":"10.1109/TMM.2024.3413318","DOIUrl":"10.1109/TMM.2024.3413318","url":null,"abstract":"Prompt learning has emerged as a thriving parameter-efficient fine-tuning technique for adapting pre-trained vision-language models (VLMs) to various downstream tasks. However, existing prompt learning approaches still exhibit limited capability for adapting foundational VLMs to specific domains that require specialized and expert-level knowledge. Since this kind of specific knowledge is primarily embedded in the pre-defined text labels, we infer that foundational VLMs cannot directly interpret semantic meaningful information from these specific text labels, which causes the above limitation. From this perspective, this paper additionally models text labels with learnable tokens and casts this operation into traditional prompt learning framework. By optimizing label tokens, semantic meaningful text labels are automatically learned for each class. Nevertheless, directly optimizing text label still remains two critical problems, i.e., insufficient optimization and biased optimization. We further address these problems by proposing Modality Interaction Text Label Optimization (MITLOp) and Color-based Consistency Augmentation (CCAug) respectively, thereby effectively improving the quality of the optimized text labels. Extensive experiments indicate that our proposed method achieves significant improvements in VLM adaptation on specific domains.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"10805-10815"},"PeriodicalIF":8.4,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ADMNet: Attention-Guided Densely Multi-Scale Network for Lightweight Salient Object Detection
IF 8.4 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-06-12 | DOI: 10.1109/TMM.2024.3413529
Xiaofei Zhou;Kunye Shen;Zhi Liu
Recently, benefiting from the rapid development of deep learning technology, research on salient object detection has achieved great progress. However, the performance of existing cutting-edge saliency models relies on large network size and high computational overhead. This is unfriendly to real-world applications, especially practical platforms with low cost and limited computing resources. In this paper, we propose a novel lightweight saliency model, namely the Attention-guided Densely Multi-scale Network (ADMNet), to tackle this issue. Firstly, we design the multi-scale perception (MP) module to acquire different contextual features by using different receptive fields. Building on the MP module, we construct the encoder of our model, where each convolutional block adopts a dense structure to connect MP modules. In this way, our model can provide powerful encoder features for the characterization of salient objects. Secondly, we employ a dual attention (DA) module to equip the decoder blocks. Particularly, in the DA module, the binarized coarse saliency inference of the decoder block (i.e., a hard spatial attention map) is first employed to filter out interference cues from the decoder feature; then, by introducing large receptive fields, the enhanced decoder feature is used to generate a soft spatial attention map, which further purifies the fused features. In this way, the deep features are steered to pay more attention to salient regions. Extensive experiments on five challenging public datasets, including ECSSD, DUT-OMRON, DUTS-TE, HKU-IS, and PASCAL-S, clearly show that our model achieves performance comparable to state-of-the-art saliency models while running at 219.4 fps on a GPU and 1.76 fps on a CPU for a 368×368 image with only 0.84 M parameters.
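As an illustration of the dual attention (DA) idea described above, the sketch below binarizes a coarse saliency prediction into a hard spatial mask, filters the decoder feature with it, and then uses a dilated-convolution branch (a stand-in for the large receptive fields) to produce a soft attention map. Channel widths, the threshold, and the residual combination are assumptions, not ADMNet's exact design.

```python
# Hard-then-soft spatial attention over a decoder feature map -- illustrative only.
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        self.coarse = nn.Conv2d(channels, 1, 3, padding=1)           # coarse saliency head
        self.soft = nn.Sequential(                                    # larger receptive field via dilation
            nn.Conv2d(channels, channels, 3, padding=4, dilation=4), nn.ReLU(),
            nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        hard = (torch.sigmoid(self.coarse(feat)) > 0.5).float()      # hard spatial attention map
        filtered = feat * hard                                       # suppress interference cues
        soft = self.soft(filtered)                                   # soft spatial attention map
        return feat + filtered * soft                                # purified decoder feature

# Toy usage on a random decoder feature.
out = DualAttention()(torch.randn(1, 32, 46, 46))
print(out.shape)
```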
{"title":"ADMNet: Attention-Guided Densely Multi-Scale Network for Lightweight Salient Object Detection","authors":"Xiaofei Zhou;Kunye Shen;Zhi Liu","doi":"10.1109/TMM.2024.3413529","DOIUrl":"10.1109/TMM.2024.3413529","url":null,"abstract":"Recently, benefitting from the rapid development of deep learning technology, the research of salient object detection has achieved great progress. However, the performance of existing cutting-edge saliency models relies on large network size and high computational overhead. This is unamiable to real-world applications, especially the practical platforms with low cost and limited computing resources. In this paper, we propose a novel lightweight saliency model, namely Attention-guided Densely Multi-scale Network (ADMNet), to tackle this issue. Firstly, we design the multi-scale perception (MP) module to acquire different contextual features by using different receptive fields. Embarking on MP module, we build the encoder of our model, where each convolutional block adopts a dense structure to connect MP modules. Following this way, our model can provide powerful encoder features for the characterization of salient objects. Secondly, we employ dual attention (DA) module to equip the decoder blocks. Particularly, in DA module, the binarized coarse saliency inference of the decoder block (\u0000<italic>i.e.</i>\u0000, a hard spatial attention map) is first employed to filter out interference cues from the decoder feature, and then by introducing large receptive fields, the enhanced decoder feature is used to generate a soft spatial attention map, which further purifies the fused features. Following this way, the deep features are steered to give more concerns to salient regions. Extensive experiments on five public challenging datasets including ECSSD, DUT-OMRON, DUTS-TE, HKU-IS, and PASCAL-S clearly show that our model achieves comparable performance with the state-of-the-art saliency models while running at a 219.4fps GPU speed and a 1.76fps CPU speed for a 368×368 image with only 0.84 M parameters.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"10828-10841"},"PeriodicalIF":8.4,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0