
Latest Publications from IEEE Transactions on Multimedia

AnimeDiff: Customized Image Generation of Anime Characters using Diffusion Model
IF 7.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-08 | DOI: 10.1109/tmm.2024.3415357
Yuqi Jiang, Qiankun Liu, Dongdong Chen, Lu Yuan, Ying Fu
{"title":"AnimeDiff: Customized Image Generation of Anime Characters using Diffusion Model","authors":"Yuqi Jiang, Qiankun Liu, Dongdong Chen, Lu Yuan, Ying Fu","doi":"10.1109/tmm.2024.3415357","DOIUrl":"https://doi.org/10.1109/tmm.2024.3415357","url":null,"abstract":"","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":null,"pages":null},"PeriodicalIF":7.3,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141568317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Toward Efficient Video Compression Artifact Detection and Removal: A Benchmark Dataset
IF 7.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-03 | DOI: 10.1109/tmm.2024.3414549
Liqun Lin, Mingxing Wang, Jing Yang, Keke Zhang, Tiesong Zhao
{"title":"Toward Efficient Video Compression Artifact Detection and Removal: A Benchmark Dataset","authors":"Liqun Lin, Mingxing Wang, Jing Yang, Keke Zhang, Tiesong Zhao","doi":"10.1109/tmm.2024.3414549","DOIUrl":"https://doi.org/10.1109/tmm.2024.3414549","url":null,"abstract":"","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":null,"pages":null},"PeriodicalIF":7.3,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141549758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Human-centric Behavior Description in Videos: New Benchmark and Model
IF 7.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-02 | DOI: 10.1109/tmm.2024.3414263
Lingru Zhou, Yiqi Gao, Manqing Zhang, Peng Wu, Peng Wang, Yanning Zhang
{"title":"Human-centric Behavior Description in Videos: New Benchmark and Model","authors":"Lingru Zhou, Yiqi Gao, Manqing Zhang, Peng Wu, Peng Wang, Yanning Zhang","doi":"10.1109/tmm.2024.3414263","DOIUrl":"https://doi.org/10.1109/tmm.2024.3414263","url":null,"abstract":"","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":null,"pages":null},"PeriodicalIF":7.3,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141531141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficient Cross-Modal Video Retrieval with Meta-Optimized Frames
IF 7.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-06-28 | DOI: 10.1109/tmm.2024.3416669
Ning Han, Xun Yang, Ee-Peng Lim, Hao Chen, Qianru Sun
{"title":"Efficient Cross-Modal Video Retrieval with Meta-Optimized Frames","authors":"Ning Han, Xun Yang, Ee-Peng Lim, Hao Chen, Qianru Sun","doi":"10.1109/tmm.2024.3416669","DOIUrl":"https://doi.org/10.1109/tmm.2024.3416669","url":null,"abstract":"","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":null,"pages":null},"PeriodicalIF":7.3,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141504209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multimodal Progressive Modulation Network for Micro-Video Multi-Label Classification
IF 8.4 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-06-26 | DOI: 10.1109/TMM.2024.3405724
Peiguang Jing;Xuan Zhao;Fugui Fan;Fan Yang;Yun Li;Yuting Su
Micro-videos, as an increasingly popular form of user-generated content (UGC), naturally include diverse multimodal cues. However, in pursuit of consistent representations, existing methods neglect the simultaneous consideration of exploring modality discrepancy and preserving modality diversity. In this paper, we propose a multimodal progressive modulation network (MPMNet) for micro-video multi-label classification, which enhances the indicative ability of each modality through gradually regulating various modality biases. In MPMNet, we first leverage a unimodal-centered parallel aggregation strategy to obtain preliminary comprehensive representations. We then integrate feature-domain disentangled modulation process and category-domain adaptive modulation process into a unified framework to jointly refine modality-oriented representations. In the former modulation process, we constrain inter-modal dependencies in a latent space to obtain modality-oriented sample representations, and introduce a disentangled paradigm to further maintain modality diversity. In the latter modulation process, we construct global-context-aware graph convolutional networks to acquire modality-oriented category representations, and develop two instance-level parameter generators to further regulate unimodal semantic biases. Extensive experiments on two micro-video multi-label datasets show that our proposed approach outperforms the state-of-the-art methods.
Citations: 0
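The abstract above describes MPMNet only at the architecture level. As a rough illustration of what its unimodal-centered parallel aggregation step might look like, here is a minimal PyTorch sketch; the class name, feature size, shared projection, and mean-pooled context are our assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a unimodal-centered parallel aggregation step,
# loosely following the MPMNet abstract; all names and sizes are assumptions.
import torch
import torch.nn as nn


class UnimodalCenteredAggregation(nn.Module):
    """For each modality, aggregate the remaining modalities around it to get
    one preliminary comprehensive representation per modality."""

    def __init__(self, dim: int = 256):
        super().__init__()
        # A single shared projection is used purely for illustration.
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, feats: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
        out = {}
        for center, f_c in feats.items():
            # Mean of the other modalities serves as the context for this center.
            others = [f for m, f in feats.items() if m != center]
            context = torch.stack(others, dim=0).mean(dim=0)
            out[center] = self.proj(torch.cat([f_c, context], dim=-1))
        return out


if __name__ == "__main__":
    batch, dim = 4, 256
    feats = {m: torch.randn(batch, dim) for m in ("visual", "acoustic", "textual")}
    fused = UnimodalCenteredAggregation(dim)(feats)
    print({m: tuple(v.shape) for m, v in fused.items()})  # each (4, 256)
```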
Alleviating Over-fitting in Hashing-based Fine-grained Image Retrieval: From Causal Feature Learning to Binary-injected Hash Learning
IF 7.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-06-21 | DOI: 10.1109/tmm.2024.3410136
Xinguang Xiang, Xinhao Ding, Lu Jin, Zechao Li, Jinhui Tang, Ramesh Jain
{"title":"Alleviating Over-fitting in Hashing-based Fine-grained Image Retrieval: From Causal Feature Learning to Binary-injected Hash Learning","authors":"Xinguang Xiang, Xinhao Ding, Lu Jin, Zechao Li, Jinhui Tang, Ramesh Jain","doi":"10.1109/tmm.2024.3410136","DOIUrl":"https://doi.org/10.1109/tmm.2024.3410136","url":null,"abstract":"","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":null,"pages":null},"PeriodicalIF":7.3,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141504210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Relation-Aware Weight Sharing in Decoupling Feature Learning Network for UAV RGB-Infrared Vehicle Re-Identification
IF 8.4 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-06-21 | DOI: 10.1109/TMM.2024.3400675
Xingyue Liu;Jiahao Qi;Chen Chen;Kangcheng Bin;Ping Zhong
Owing to the capacity of performing full-time target searches, cross-modality vehicle re-identification based on unmanned aerial vehicles (UAV) is gaining more attention in both video surveillance and public security. However, this promising and innovative research has not been studied sufficiently due to the issue of data inadequacy. Meanwhile, the cross-modality discrepancy and orientation discrepancy challenges further aggravate the difficulty of this task. To this end, we pioneer a cross-modality vehicle Re-ID benchmark named UAV Cross-Modality Vehicle Re-ID (UCM-VeID), containing 753 identities with 16015 RGB and 13913 infrared images. Moreover, to meet cross-modality discrepancy and orientation discrepancy challenges, we present a hybrid weights decoupling network (HWDNet) to learn the shared discriminative orientation-invariant features. For the first challenge, we proposed a hybrid weights siamese network with a well-designed weight restrainer and its corresponding objective function to learn both modality-specific and modality shared information. In terms of the second challenge, three effective decoupling structures with two pretext tasks are investigated to flexibly conduct orientation-invariant feature separation task. Comprehensive experiments are carried out to validate the effectiveness of the proposed method.
Citations: 0
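To make the "hybrid weights" idea above concrete, the sketch below pairs modality-specific stems with a shared trunk and uses a simple L2 penalty between the stems as a stand-in for the paper's weight restrainer; the architecture, layer sizes, and penalty form are assumptions, not HWDNet's actual design.

```python
# Illustrative hybrid-weights siamese encoder for RGB/infrared Re-ID features,
# with an assumed L2 "weight restrainer" between the modality-specific stems.
import torch
import torch.nn as nn


class HybridWeightsSiamese(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Modality-specific stems capture modality-specific information.
        self.stem_rgb = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU())
        self.stem_ir = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU())
        # A shared trunk captures modality-shared information.
        self.trunk = nn.Sequential(
            nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim),
        )

    def forward(self, x_rgb: torch.Tensor, x_ir: torch.Tensor):
        return self.trunk(self.stem_rgb(x_rgb)), self.trunk(self.stem_ir(x_ir))

    def weight_restrainer(self) -> torch.Tensor:
        # Penalize divergence between the two stems so they stay "hybrid":
        # modality-specific, but not arbitrarily far apart.
        loss = torch.zeros(())
        for p_rgb, p_ir in zip(self.stem_rgb.parameters(), self.stem_ir.parameters()):
            loss = loss + (p_rgb - p_ir).pow(2).mean()
        return loss


if __name__ == "__main__":
    model = HybridWeightsSiamese()
    rgb, ir = torch.randn(2, 3, 128, 64), torch.randn(2, 3, 128, 64)
    f_rgb, f_ir = model(rgb, ir)
    print(f_rgb.shape, f_ir.shape, float(model.weight_restrainer()))
```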
Align and Retrieve: Composition and Decomposition Learning in Image Retrieval With Text Feedback
IF 8.4 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-06-21 | DOI: 10.1109/TMM.2024.3417694
Yahui Xu;Yi Bin;Jiwei Wei;Yang Yang;Guoqing Wang;Heng Tao Shen
We study the task of image retrieval with text feedback, where a reference image and modification text are composed to retrieve the desired target image. To accomplish this goal, existing methods always get the multimodal representations through different feature encoders and then adopt different strategies to model the correlation between the composed inputs and the target image. However, the multimodal query brings more challenges as it requires not only the synergistic understanding of the semantics from the heterogeneous multimodal inputs but also the ability to accurately build the underlying semantic correlation existing in each inputs-target triplet, i.e., reference image, modification text, and target image. In this paper, we tackle these issues with a novel Align and Retrieve (AlRet) framework. First, our proposed methods employ the contrastive loss in the feature encoders to learn meaningful multimodal representation while making the subsequent correlation modeling process in a more harmonious space. Then we propose to learn the accurate correlation between the composed inputs and target image in a novel composition-and-decomposition paradigm. Specifically, the composition network couples the reference image and modification text into a joint representation to learn the correlation between the joint representation and target image. The decomposition network conversely decouples the target image into visual and text subspaces to exploit the underlying correlation between the target image with each query element. The composition-and-decomposition paradigm forms a closed loop, which can be optimized simultaneously to promote each other in the performance. Massive comparison experiments on three real-world datasets confirm the effectiveness of the proposed method.
Citations: 0
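A rough sketch of the composition-and-decomposition loop described above: the reference image and modification text are composed into a joint query, while the target image is decomposed into visual and textual subspaces. The module designs, feature dimension, and cosine-similarity scoring are illustrative assumptions, not the paper's implementation.

```python
# Toy composition-and-decomposition module for image retrieval with text
# feedback; feature extractors are omitted and all designs are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ComposeDecompose(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        # Composition: (reference image, modification text) -> joint query.
        self.compose = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        # Decomposition: target image -> visual and textual subspaces.
        self.to_visual = nn.Linear(dim, dim)
        self.to_textual = nn.Linear(dim, dim)

    def forward(self, ref_img, mod_txt, tgt_img):
        query = self.compose(torch.cat([ref_img, mod_txt], dim=-1))
        tgt_vis, tgt_txt = self.to_visual(tgt_img), self.to_textual(tgt_img)
        # Composition path: the composed query should match the target image.
        sim_compose = F.cosine_similarity(query, tgt_img, dim=-1)
        # Decomposition path: target subspaces should match the query elements.
        sim_decompose = 0.5 * (F.cosine_similarity(tgt_vis, ref_img, dim=-1)
                               + F.cosine_similarity(tgt_txt, mod_txt, dim=-1))
        return sim_compose, sim_decompose


if __name__ == "__main__":
    dim, batch = 256, 8
    model = ComposeDecompose(dim)
    ref, txt, tgt = (torch.randn(batch, dim) for _ in range(3))
    s1, s2 = model(ref, txt, tgt)
    print(s1.shape, s2.shape)  # torch.Size([8]) torch.Size([8])
```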
DeepSpoof: Deep Reinforcement Learning-Based Spoofing Attack in Cross-Technology Multimedia Communication
IF 7.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-06-20 | DOI: 10.1109/tmm.2024.3414660
Demin Gao, Liyuan Ou, Ye Liu, Qing Yang, Honggang Wang
{"title":"DeepSpoof: Deep Reinforcement Learning-Based Spoofing Attack in Cross-Technology Multimedia Communication","authors":"Demin Gao, Liyuan Ou, Ye Liu, Qing Yang, Honggang Wang","doi":"10.1109/tmm.2024.3414660","DOIUrl":"https://doi.org/10.1109/tmm.2024.3414660","url":null,"abstract":"","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":null,"pages":null},"PeriodicalIF":7.3,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Perceptual Image Hashing Using Feature Fusion of Orthogonal Moments
IF 8.4 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-06-20 | DOI: 10.1109/TMM.2024.3405660
Xinran Li;Zichi Wang;Guorui Feng;Xinpeng Zhang;Chuan Qin
Due to the limited number of stable image feature descriptors and the simplistic concatenation approach to hash generation, existing hashing methods have not achieved a satisfactory balance between robustness and discrimination. To this end, a novel perceptual hashing method is proposed in this paper using feature fusion of fractional-order continuous orthogonal moments (FrCOMs). Specifically, two robust image descriptors, i.e., fractional-order Chebyshev Fourier moments (FrCHFMs) and fractional-order radial harmonic Fourier moments (FrRHFMs), are used to extract global structural features of a color image. Then, the canonical correlation analysis (CCA) strategy is employed to fuse these features during the final hash generation process. Compared to direct concatenation, CCA excels in eliminating redundancies between feature vectors, resulting in a shorter hash sequence and higher authentication performance. A series of experiments demonstrate that the proposed method achieves satisfactory robustness, discrimination and security. Particularly, the proposed method exhibits better tampering detection ability and robustness against combined content-preserving manipulations in practical applications.
Citations: 0
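The fusion step described above, CCA applied to two moment-based feature vectors before hash generation, can be sketched as follows. The FrCHFM/FrRHFM features are replaced by random placeholders, and the hash length and mean-threshold binarization are assumptions rather than the paper's exact scheme.

```python
# Minimal sketch of CCA-based feature fusion followed by binarization.
# Real FrCHFM/FrRHFM moment extraction is not shown.
import numpy as np
from sklearn.cross_decomposition import CCA


def cca_fuse_and_hash(feat_a: np.ndarray, feat_b: np.ndarray, n_bits: int = 64) -> np.ndarray:
    """Project two feature matrices into a shared CCA space and binarize.

    feat_a, feat_b: (n_images, d1) and (n_images, d2) moment features.
    Returns an (n_images, n_bits) binary hash matrix.
    """
    cca = CCA(n_components=n_bits // 2)
    a_c, b_c = cca.fit_transform(feat_a, feat_b)            # correlated projections
    fused = np.concatenate([a_c, b_c], axis=1)              # (n_images, n_bits)
    return (fused > fused.mean(axis=0)).astype(np.uint8)    # per-dimension threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frchfm = rng.normal(size=(100, 80))   # placeholder FrCHFM features
    frrhfm = rng.normal(size=(100, 80))   # placeholder FrRHFM features
    hashes = cca_fuse_and_hash(frchfm, frrhfm, n_bits=64)
    # Hamming distance between the hashes of two images:
    print(hashes.shape, int(np.count_nonzero(hashes[0] != hashes[1])))
```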