
IEEE Transactions on Multimedia: Latest Publications

MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition
IF 9.7 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-07-04 | DOI: 10.1109/TMM.2025.3586118
Hongyu Qu;Rui Yan;Xiangbo Shu;Hailiang Gao;Peng Huang;Guosen Xie
Recent few-shot action recognition (FSAR) methods typically perform semantic matching on learned discriminative features to achieve promising performance. However, most FSAR methods focus on single-scale (e.g., frame-level or segment-level) feature alignment, which ignores the fact that human actions with the same semantics may appear at different velocities. To this end, we develop a novel Multi-Velocity Progressive-alignment (MVP-Shot) framework to progressively learn and align semantic-related action features at multiple velocity levels. Concretely, a Multi-Velocity Feature Alignment (MVFA) module is designed to measure the similarity between features from support and query videos at different velocity scales and then merge all similarity scores in a residual fashion. To avoid the multi-velocity features deviating from the underlying motion semantics, our proposed Progressive Semantic-Tailored Interaction (PSTI) module injects velocity-tailored text information into the video features via feature interaction in the channel and temporal domains at different velocities. The two modules complement each other to make more accurate query-sample predictions under few-shot settings. Experimental results show our method outperforms current state-of-the-art methods on multiple standard few-shot benchmarks (i.e., HMDB51, UCF101, Kinetics, SSv2-full, and SSv2-small).
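To make the multi-velocity matching idea concrete, the sketch below scores support-query similarity at several temporal scales and merges the scores. The pooling-based notion of velocity, the tensor shapes, and the function name are illustrative assumptions, not the authors' MVFA implementation.

```python
import torch
import torch.nn.functional as F

def multi_velocity_similarity(support, query, strides=(1, 2, 4)):
    """Illustrative sketch: compare support/query clip features at several
    temporal "velocities" (simulated here by temporal pooling with different
    strides) and merge the per-velocity similarities.

    support, query: (T, C) frame-level features for one video each.
    """
    score = 0.0
    for s in strides:
        # Temporal pooling with stride s gives a coarser (faster-velocity) view.
        sup = F.avg_pool1d(support.t().unsqueeze(0), kernel_size=s, stride=s).squeeze(0).t()
        qry = F.avg_pool1d(query.t().unsqueeze(0), kernel_size=s, stride=s).squeeze(0).t()
        # Cosine similarity between temporally aligned pooled segments.
        sim = F.cosine_similarity(sup, qry, dim=-1).mean()
        score = score + sim  # accumulate the per-velocity scores
    return score / len(strides)

# Toy usage: 8 frames, 512-dim features per frame.
support = torch.randn(8, 512)
query = torch.randn(8, 512)
print(multi_velocity_similarity(support, query))
```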
{"title":"MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition","authors":"Hongyu Qu;Rui Yan;Xiangbo Shu;Hailiang Gao;Peng Huang;Guosen Xie","doi":"10.1109/TMM.2025.3586118","DOIUrl":"https://doi.org/10.1109/TMM.2025.3586118","url":null,"abstract":"Recent few-shot action recognition (FSAR) methods typically perform semantic matching on learned discriminative features to achieve promising performance. However, most FSAR methods focus on single-scale (e.g., frame-level, segment-level, etc.) feature alignment, which ignores that human actions with the same semantic may appear at different velocities. To this end, we develop a novel Multi-Velocity Progressive-alignment (MVP-Shot) framework to progressively learn and align semantic-related action features at multi-velocity levels. Concretely, a Multi-Velocity Feature Alignment (MVFA) module is designed to measure the similarity between features from support and query videos with different velocity scales and then merge all similarity scores in a residual fashion. To avoid the multiple velocity features deviating from the underlying motion semantic, our proposed Progressive Semantic-Tailored Interaction (PSTI) module injects velocity-tailored text information into the video feature via feature interaction on channel and temporal domains at different velocities. The above two modules compensate for each other to make more accurate query sample predictions under the few-shot settings. Experimental results show our method outperforms current state-of-the-art methods on multiple standard few-shot benchmarks (<italic>i.e.</i>, HMDB51, UCF101, Kinetics, SSv2-full, and SSv2-small).","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"6593-6605"},"PeriodicalIF":9.7,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Progressive Prompt-Driven Low-Light Image Enhancement With Frequency Aware Learning
IF 9.7 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-07-04 | DOI: 10.1109/TMM.2025.3586101
Xiaoyan Sun;De Cheng;Yan Li;Nannan Wang;Dingwen Zhang;Xinbo Gao;Jiande Sun
Low-light Image Enhancement (LLIE) aims to rectify inadequate illumination conditions and achieve superior visual quality in images, and it plays a pivotal role in low-level computer vision. Due to poor illumination, many high-frequency details are obscured, which leads to an uneven distribution of low- and high-frequency information. However, most existing LLIE methods do not pay special attention to restoring high-frequency detail information and challenging-to-recover areas in images. To address this issue, we propose a novel progressive prompt-driven LLIE framework with frequency-aware learning, built on a two-stage coarse-to-fine learning mechanism. Specifically, the proposed method fully utilizes both a specially designed brightness-aware prompt and a detail-aware prompt on the pre-trained model to produce an enhanced image with more natural brightness and richer detail. Furthermore, the proposed frequency-aware learning objective can adaptively adjust the contribution of individual pixels to image reconstruction based on the statistics of high- and low-frequency features, which enables the network to focus on learning intricate details and other challenging areas in low-light images. Extensive experimental results demonstrate the effectiveness of the proposed method, which achieves performance superior to state-of-the-art methods on representative real-world and synthetic datasets.
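The frequency-aware objective can be illustrated with a toy loss that up-weights pixels in high-frequency regions. The box-filter high-pass and the weighting formula below are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def frequency_aware_l1(pred, target, alpha=2.0):
    """Sketch of a frequency-aware reconstruction loss: pixels whose
    neighbourhood carries more high-frequency energy receive a larger weight,
    so the network focuses on detail regions.

    pred, target: (N, C, H, W) tensors in [0, 1].
    """
    # Low-pass the target with a 5x5 box filter; the residual approximates
    # its high-frequency content.
    low = F.avg_pool2d(target, kernel_size=5, stride=1, padding=2)
    high = (target - low).abs().mean(dim=1, keepdim=True)          # (N, 1, H, W)
    weight = 1.0 + alpha * high / (high.amax(dim=(2, 3), keepdim=True) + 1e-6)
    return (weight * (pred - target).abs()).mean()

# Toy usage
pred = torch.rand(2, 3, 64, 64)
target = torch.rand(2, 3, 64, 64)
print(frequency_aware_l1(pred, target))
```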
{"title":"Progressive Prompt-Driven Low-Light Image Enhancement With Frequency Aware Learning","authors":"Xiaoyan Sun;De Cheng;Yan Li;Nannan Wang;Dingwen Zhang;Xinbo Gao;Jiande Sun","doi":"10.1109/TMM.2025.3586101","DOIUrl":"https://doi.org/10.1109/TMM.2025.3586101","url":null,"abstract":"Low-light Image Enhancement (LLIE) aims to rectify inadequate illumination conditions and achieve superior visual quality in images, which plays a pivotal role in the domain of low-level computer vision. Due to poor illumination in images, many high-frequency details are obscured, which leads to an uneven distribution of low- and high-frequency information. However, most existing LLIE methods do not pay special attention to the restoration of high-frequency detail information and some challenging-to-recover areas in images. To address this issue, we propose a novel progressive prompt-driven LLIE framework with frequency aware learning, through a two-stage coarse-to-fine learning mechanism. Specifically, the proposed method fully utilizes both the specially designed brightness-aware prompt and detail-aware prompt on the prior trained model, to achieve an excellent enhanced image that exhibits more natural brightness and richer detail information. Furthermore, the proposed frequency aware learning objective can adaptively adjust the contribution of individual pixels for image reconstruction based on the statistics of high- and low-frequency features, which enables the network to focus on learning intricate details and other challenging areas in low-light images. Extensive experimental results demonstrate the effectiveness of the proposed method, achieving superior performances to state-of-the-art methods on representative real-world and synthetic datasets.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"6620-6634"},"PeriodicalIF":9.7,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
UBTransformer: Uncertainty-Based Transformer Model for Complex Scenarios Detection in Autonomous Driving
IF 9.7 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-07-04 | DOI: 10.1109/TMM.2025.3586103
Ke Wang;Qi Ma;Xingcan Li;Chongqiang Shen;Rui Leng;Jianbo Lu
Traditional object detection algorithms in intelligent vehicle perception systems cannot maintain stable recognition performance in unknown and changing road environments. We find that uncertainty quantification is of great significance for detection in unknown, complex environments and helps to improve the robustness and safety of autonomous driving systems. Therefore, this paper proposes an Uncertainty-Based Transformer (UBT) object detection algorithm. First, a double Gaussian feature map network (DGF) is designed to quantify and utilize the uncertainty of the features derived from the backbone network. Second, we propose an RBF-based query filtering model (RBQF), which uses the uncertainty sum as the criterion for query-vector screening. In addition, this paper proposes an uncertainty detection head (UDH); the final model outputs quantified uncertainty along with improved detection performance and enhanced reliability. To further demonstrate the detection performance of the proposed method in real driving scenes, we use COCO, Cityscapes, FoggyCityscapes, RainCityscapes, and a self-made traffic scene dataset for verification, which shows that our algorithm is well suited to large datasets and complex road scenes.
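A minimal sketch of uncertainty-driven query screening is given below: transformer queries are ranked by an uncertainty score and only the most confident ones are kept. The keep ratio and the scoring are hypothetical and do not reproduce the RBQF module.

```python
import torch

def filter_queries_by_uncertainty(queries, uncertainty, keep_ratio=0.5):
    """Sketch of uncertainty-based query filtering for a DETR-style detector.

    queries:     (N, Q, D) query embeddings
    uncertainty: (N, Q) per-query uncertainty sums (lower = more confident)
    """
    n, q, d = queries.shape
    k = max(1, int(q * keep_ratio))
    # Indices of the k lowest-uncertainty queries per image.
    idx = uncertainty.topk(k, dim=1, largest=False).indices          # (N, k)
    return torch.gather(queries, 1, idx.unsqueeze(-1).expand(-1, -1, d))

queries = torch.randn(2, 100, 256)
uncertainty = torch.rand(2, 100)
print(filter_queries_by_uncertainty(queries, uncertainty).shape)  # (2, 50, 256)
```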
{"title":"UBTransformer: Uncertainty-Based Transformer Model for Complex Scenarios Detection in Autonomous Driving","authors":"Ke Wang;Qi Ma;Xingcan Li;Chongqiang Shen;Rui Leng;Jianbo Lu","doi":"10.1109/TMM.2025.3586103","DOIUrl":"https://doi.org/10.1109/TMM.2025.3586103","url":null,"abstract":"The traditional object detection algorithm in the intelligent vehicle perception system cannot maintain stable recognition performance in the unknown and changing road environment. We find that uncertainty quantification is of great significance in detecting unknown complex environments and helps to improve the robustness and safety of autonomous driving systems. Therefore, this paper proposes an Uncertainty-based Transformer (UBT) object detection algorithm. Firstly, the double Gaussian feature map network (DGF) is designed to quantify and utilize the uncertainty of the features derived from the backbone network. Secondly, we propose a RBF-based query filtering model(RBQF), which takes uncertainty sum as the index of query vector screening. At the same time, this paper proposes an uncertainty detection head (UDH); the final model output results are quantitative uncertainty, improved detection performance and enhanced algorithm reliability. To further prove the detection performance of the proposed method in real driving scenes, we use COCO, Cityscapes, FoggyCityscapes, RainCityscapes and self-made traffic scene datasets for verification, which shows that our algorithm is well applicable to large datasets and complex road scenes.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"6581-6592"},"PeriodicalIF":9.7,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
DomainVerse: A Benchmark Towards Real-World Distribution Shifts for Training-Free Adaptive Domain Generalization
IF 9.7 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-07-04 | DOI: 10.1109/TMM.2025.3586108
Feng Hou;Jin Yuan;Ying Yang;Yao Zhang;Yang Liu;Yang Zhang;Cheng Zhong;Zhongchao Shi;Jianping Fan;Zhiqiang He;Yong Rui
Traditional cross-domain tasks, including unsupervised domain adaptation (UDA), domain generalization (DG), and test-time adaptation (TTA), rely heavily on training models with source-domain data, whether for specific or arbitrary target domains. With the recent advance of vision-language models (VLMs), recognized as natural source models that can be transferred to various downstream tasks without any parameter training, we propose a novel cross-domain task that directly combines the strengths of both UDA and DG, named Training-Free Adaptive Domain Generalization (TF-ADG). However, current cross-domain datasets have many limitations, such as unrealistic domains, unclear domain definitions, and the inability to support fine-grained domain decomposition; because they do not allow accurate and fair evaluation on fine-grained realistic domains, they hinder the real-world application of current cross-domain models. These insights motivate us to establish a novel realistic benchmark for TF-ADG. Benefiting from the introduced hierarchical definition of domain shifts, our proposed dataset DomainVerse addresses these issues by providing about 0.5 million images from 390 realistic, hierarchical, and balanced domains, allowing for decomposition across multiple domains within each image. With the help of the constructed DomainVerse and VLMs, we further propose two algorithms, called Domain CLIP and Domain++ CLIP, for training-free adaptive domain generalization. Extensive and comprehensive experiments demonstrate the significance of the dataset and the effectiveness of the proposed methods.
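The training-free setting itself can be sketched in a few lines: with a frozen vision-language model, prediction reduces to similarity between image embeddings and class-prompt embeddings, with no parameter updated on the target domain. The embeddings are assumed to be precomputed; this is not the Domain CLIP algorithm, only an illustration of the setting.

```python
import torch
import torch.nn.functional as F

def training_free_predict(image_emb, class_text_emb, temperature=0.01):
    """Sketch of training-free prediction with a frozen VLM.

    image_emb:      (N, D) image embeddings from the frozen model
    class_text_emb: (C, D) text embeddings of class prompts
    """
    image_emb = F.normalize(image_emb, dim=-1)
    class_text_emb = F.normalize(class_text_emb, dim=-1)
    logits = image_emb @ class_text_emb.t() / temperature   # (N, C)
    return logits.softmax(dim=-1).argmax(dim=-1)

image_emb = torch.randn(4, 512)
class_text_emb = torch.randn(10, 512)
print(training_free_predict(image_emb, class_text_emb))
```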
{"title":"DomainVerse: A Benchmark Towards Real-World Distribution Shifts for Training-Free Adaptive Domain Generalization","authors":"Feng Hou;Jin Yuan;Ying Yang;Yao Zhang;Yang Liu;Yang Zhang;Cheng Zhong;Zhongchao Shi;Jianping Fan;Zhiqiang He;Yong Rui","doi":"10.1109/TMM.2025.3586108","DOIUrl":"https://doi.org/10.1109/TMM.2025.3586108","url":null,"abstract":"Traditional cross-domain tasks, including unsupervised domain adaptation (UDA), domain generalization (DG) and test-time adaptation (TTA), rely heavily on the training model by source domain data whether for specific or arbitrary target domains. With the recent advance of vision-language models (VLMs), recognized as natural source models that can be transferred to various downstream tasks without any parameter training, we propose a novel cross-domain task directly combining the strengths of both UDA and DG, named Training-Free Adaptive Domain Generalization (TF-ADG). However, current cross-domain datasets have many limitations, such as unrealistic domains, unclear domain definitions, and the inability to fine-grained domain decomposition, which hinder the real-world application of current cross-domain models due to the lack of accurate and fair evaluation of fine-grained realistic domains. These insights motivate us to establish a novel realistic benchmark for TF-ADG. Benefiting from the introduced hierarchical definition of domain shifts, our proposed dataset DomainVerse addresses these issues by providing about 0.5 million images from 390 realistic, hierarchical, and balanced domains, allowing for decomposition across multiple domains within each image. With the help of the constructed DomainVerse and VLMs, we further propose two algorithms called Domain CLIP and Domain++ CLIP for training-free adaptive domain generalization. Extensive and comprehensive experiments demonstrate the significance of the dataset and the effectiveness of the proposed methods.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"6648-6660"},"PeriodicalIF":9.7,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
PIMG: Progressive Image-to-Music Generation With Contrastive Diffusion Models
IF 9.7 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-07-03 | DOI: 10.1109/TMM.2025.3586119
Mulin Chen;Yajie Wang;Xuelong Li
The goal of image-to-music generation is to create pure music according to a given image. Unlike existing tasks such as text-to-image generation, there is no explicit connection between image content and musical melody. Some existing studies attempt to generate music by directly mapping image features (such as color, edges, etc.) into musical notes, which may result in melodic incoherence. Inspired by neuroscience, it is desirable to employ emotion to bridge these two modalities. However, the continuity and complexity of emotions make it difficult to capture the cross-modal correlation. Drawing on human mechanisms of emotion perception, a Progressive Image-to-Music Generation (PIMG) framework is proposed. The framework designs a mean-teacher-based association network to guide the music generation process progressively, starting from highly correlated image-music pairs. The generation network gradually receives more challenging sample pairs, eventually capturing complex cross-modal emotional correspondences. Additionally, a contrastive learning strategy is introduced into the diffusion models to better capture the consistency between pieces of music with similar emotions. Extensive experimental results demonstrate that the proposed framework is able to generate high-quality and emotionally consistent music from images.
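The contrastive component can be illustrated with a standard symmetric InfoNCE loss over matched image-music pairs in a batch. This generic formulation is an assumption for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, music_emb, temperature=0.07):
    """Symmetric InfoNCE sketch: pull each image embedding towards the music
    generated for it and push it away from the music of other batch samples.

    img_emb, music_emb: (N, D), row i of each forms a matched pair.
    """
    img = F.normalize(img_emb, dim=-1)
    mus = F.normalize(music_emb, dim=-1)
    logits = img @ mus.t() / temperature                 # (N, N)
    labels = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

print(contrastive_loss(torch.randn(8, 256), torch.randn(8, 256)))
```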
{"title":"PIMG: Progressive Image-to-Music Generation With Contrastive Diffusion Models","authors":"Mulin Chen;Yajie Wang;Xuelong Li","doi":"10.1109/TMM.2025.3586119","DOIUrl":"https://doi.org/10.1109/TMM.2025.3586119","url":null,"abstract":"The goal of Image-to-Music Generation is to create pure music according to the given image. Unlike existing tasks such as text-to-image generation, there is no explicit connection between image content and musical melody. Some existing studies attempt to generate music by directly mapping image features (such as color, edges, etc.) into musical notes, which may result in the melodic incoherence. Inspired by neuroscience, it is desirable to employ emotion to bridge these two modalities. However, the continuity and complexity of emotions make it difficult to capture the cross-modal correlation. Drawing from human perception mechanisms of emotions, a Progressive Image-to-Music Generation (PIMG) framework is proposed. The framework designs a mean-teacher based association network to guide the music generation process progressively, starting from highly correlated image-music pairs. The generation network receives more challenging sample pairs gradually, eventually capturing complex cross-modal emotional correspondences. Additionally, a contrastive learning strategy is introduced into the diffusion models to better capture the consistency between pieces of music with the similar emotions. Extensive experimental results demonstrate that the proposed framework is able to generate high-quality and emotionally consistent music from images.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"6732-6739"},"PeriodicalIF":9.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multi-Modal Hybrid Interaction Vision-Language Tracking
IF 9.7 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-04-30 | DOI: 10.1109/TMM.2025.3565984
Lei Lei;Xianxian Li
Vision-language tracking is a crucial branch of multi-modal object tracking, aiming to jointly locate an object by utilizing visual information and language descriptions. Typically, existing vision-language trackers employ language and visual encoders to extract features from language descriptions and visual information, respectively. Based on these extracted visual and language features, a cross-modal interaction module is used to extract multi-modal features to locate the targets. However, they ignore the differences between visual and language modalities. Due to the lack of pixel-level position information in language descriptions, the positional information of the multi-modal features is greatly weakened by the cross-modal interaction modules. As a result, the vision-language trackers cannot effectively capture subtle changes in the target's positions. To address this problem, we propose a multi-modal hybrid interaction vision-language tracking method (named MHITrack), in which a multi-modal hybrid interaction decoder is designed to enhance the positional information of multi-modal features. The proposed multi-modal hybrid interaction decoder consists of a visual-language interaction module, a multi-level position interaction module, and a hybrid interaction module. Firstly, the multi-level position interaction module is utilized to capture fine-grained position information of the target from multi-level features. Meanwhile, the visual-language interaction module performs cross-modal interaction between visual and language features to obtain multi-modal features. Furthermore, the hybrid interaction module is employed to integrate the multi-modal features with target position information, enhancing the positional information of the multi-modal features. Finally, the proposed tracker can effectively capture subtle changes in the target's positions. Through extensive experiments on four benchmark datasets, namely TNL2k, LaSOT, OTB-Lang, and LaSOText, we demonstrate that the proposed vision-language tracker achieves promising performance compared to existing state-of-the-art vision-language trackers.
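A minimal sketch of the visual-language interaction step is shown below: visual tokens attend to language tokens through cross-attention and are fused with a residual connection, so positional cues in the visual stream are retained. The single-layer design and the dimensions are assumptions, not the MHITrack decoder.

```python
import torch
import torch.nn as nn

class VisualLanguageInteraction(nn.Module):
    """Cross-modal interaction sketch for vision-language tracking: visual
    tokens query the language description to produce multi-modal features."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual, language):
        # visual:   (N, H*W, D) flattened visual tokens
        # language: (N, L, D)   language description tokens
        fused, _ = self.attn(query=visual, key=language, value=language)
        return self.norm(visual + fused)   # residual keeps visual positional cues

vis = torch.randn(2, 196, 256)
lang = torch.randn(2, 12, 256)
print(VisualLanguageInteraction()(vis, lang).shape)  # (2, 196, 256)
```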
{"title":"Multi-Modal Hybrid Interaction Vision-Language Tracking","authors":"Lei Lei;Xianxian Li","doi":"10.1109/TMM.2025.3565984","DOIUrl":"https://doi.org/10.1109/TMM.2025.3565984","url":null,"abstract":"Vision-language tracking is a crucial branch of multi-modal object tracking, aiming to jointly locate an object by utilizing visual information and language descriptions. Typically, existing vision-language trackers employ language and visual encoders to extract features from language descriptions and visual information, respectively. Based on these extracted visual and language features, a cross-modal interaction module is used to extract multi-modal features to locate the targets. However, they ignore the differences between visual and language modalities. Due to the lack of pixel-level position information in language descriptions, the positional information of the multi-modal features is greatly weakened by the cross-modal interaction modules. As a result, the vision-language trackers cannot effectively capture subtle changes in the target's positions. To address this problem, we propose a multi-modal hybrid interaction vision-language tracking method (named MHITrack), in which a multi-modal hybrid interaction decoder is designed to enhance the positional information of multi-modal features. The proposed multi-modal hybrid interaction decoder consists of a visual-language interaction module, a multi-level position interaction module, and a hybrid interaction module. Firstly, the multi-level position interaction module is utilized to capture fine-grained position information of the target from multi-level features. Meanwhile, the visual-language interaction module performs cross-modal interaction between visual and language features to obtain multi-modal features. Furthermore, the hybrid interaction module is employed to integrate the multi-modal features with target position information, enhancing the positional information of the multi-modal features. Finally, the proposed tracker can effectively capture subtle changes in the target's positions. Through extensive experiments on four benchmark datasets, namely TNL2k, LaSOT, OTB-Lang, and LaSOText, we demonstrate that the proposed vision-language tracker achieves promising performance compared to existing state-of-the-art vision-language trackers.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"5857-5865"},"PeriodicalIF":9.7,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144914159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Compact-Yet-Separate: Proto-Centric Multi-Modal Hashing With Pronounced Category Differences for Multi-Modal Retrieval
IF 9.7 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-04-30 | DOI: 10.1109/TMM.2025.3565973
Ruifan Zuo;Chaoqun Zheng;Lei Zhu;Wenpeng Lu;Jiasheng Si;Weiyu Zhang
Multi-modal hashing achieves low storage costs and high retrieval speeds by using compact hash codes to represent complex and heterogeneous multi-modal data, effectively addressing the inefficiency and resource intensiveness challenges faced by the traditional multi-modal retrieval methods. However, balancing intraclass compactness and interclass separability remains a struggle in existing works due to coarse-grained feature limitations, simplified fusion strategies that overlook semantic complementarity, and neglect of the structural information within the multi-modal data. To address these limitations comprehensively, we propose a Proto-centric Multi-modal Hashing with Pronounced Category Differences (PMH-PCD) model. Specifically, PMH-PCD first learns modality-specific prototypes by deeply exploring within-modality class information, ensuring effective fusion of each modality's unique characteristics. Furthermore, it learns multi-modal integrated class prototypes that seamlessly incorporate semantic information across modalities to effectively capture and represent the intricate relationships and complementary semantic content embedded within the multi-modal data. Additionally, to generate more discriminative and representative binary hash codes, PMH-PCD integrates multifaceted semantic information, encompassing both low-level pairwise relations and high-level structural patterns, holistically capturing intricate data details and leveraging underlying structures. The experimental results demonstrate that, compared with existing advanced methods, PMH-PCD achieves superior and consistent performances in multi-modal retrieval tasks.
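The retrieval side of multi-modal hashing can be sketched as follows: fused features are binarised into compact codes and the database is ranked by Hamming distance. The random projection below stands in for the learned hashing network and is purely illustrative.

```python
import torch

def hamming_retrieval(query_feat, db_feats, proj, top_k=5):
    """Hash-based retrieval sketch with +/-1 binary codes.

    query_feat: (D,)   fused multi-modal feature of the query
    db_feats:   (N, D) fused features of database items
    proj:       (D, B) projection to B hash bits
    """
    q_code = torch.sign(query_feat @ proj)          # (B,)  values in {-1, +1}
    db_codes = torch.sign(db_feats @ proj)          # (N, B)
    # Hamming distance from {-1,+1} codes: (B - <q, d>) / 2
    hamming = (proj.shape[1] - db_codes @ q_code) / 2
    return hamming.topk(top_k, largest=False).indices

d, bits = 512, 64
proj = torch.randn(d, bits)
print(hamming_retrieval(torch.randn(d), torch.randn(1000, d), proj))
```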
{"title":"Compact-Yet-Separate: Proto-Centric Multi-Modal Hashing With Pronounced Category Differences for Multi-Modal Retrieval","authors":"Ruifan Zuo;Chaoqun Zheng;Lei Zhu;Wenpeng Lu;Jiasheng Si;Weiyu Zhang","doi":"10.1109/TMM.2025.3565973","DOIUrl":"https://doi.org/10.1109/TMM.2025.3565973","url":null,"abstract":"Multi-modal hashing achieves low storage costs and high retrieval speeds by using compact hash codes to represent complex and heterogeneous multi-modal data, effectively addressing the inefficiency and resource intensiveness challenges faced by the traditional multi-modal retrieval methods. However, balancing intraclass compactness and interclass separability remains a struggle in existing works due to coarse-grained feature limitations, simplified fusion strategies that overlook semantic complementarity, and neglect of the structural information within the multi-modal data. To address these limitations comprehensively, we propose a <italic>Proto-centric Multi-modal Hashing with Pronounced Category Differences</i> (PMH-PCD) model. Specifically, PMH-PCD first learns modality-specific prototypes by deeply exploring within-modality class information, ensuring effective fusion of each modality's unique characteristics. Furthermore, it learns multi-modal integrated class prototypes that seamlessly incorporate semantic information across modalities to effectively capture and represent the intricate relationships and complementary semantic content embedded within the multi-modal data. Additionally, to generate more discriminative and representative binary hash codes, PMH-PCD integrates multifaceted semantic information, encompassing both low-level pairwise relations and high-level structural patterns, holistically capturing intricate data details and leveraging underlying structures. The experimental results demonstrate that, compared with existing advanced methods, PMH-PCD achieves superior and consistent performances in multi-modal retrieval tasks.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"5843-5856"},"PeriodicalIF":9.7,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144914158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Progressive Semi-Decoupled Detector for Accurate Object Detection
IF 9.7 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-04-30 | DOI: 10.1109/TMM.2025.3565933
Bo Han;Lihuo He;Junjie Ke;Jinjian Wu;Xinbo Gao
Inconsistent accuracy between classification and localization tasks is a common challenge in modern object detection. Task decoupling, which employs distinct features or labeling strategies for each task, is a widely used approach to address this issue. Although it has led to noteworthy advancements, this approach is insufficient as it neglects task interdependence and lacks an explicit consistency constraint. To bridge this gap, this paper proposes the Progressive Semi-Decoupled Detector (ProSDD) to enhance both classification and localization accuracy. Specifically, a new detection head is designed that incorporates feature suppression and enhancement mechanism (FSEM) and bidirectional interaction module (BIM). Compared with the decoupled head, it not only filters out task-irrelevant information and enhances task-related information, but also avoids excessive decoupling at the feature level. Moreover, both FSEM and BIM are used multiple times, thus forming a progressive semi-decoupled head. Then, a novel consistency loss is proposed and integrated into the loss function of object detection, ensuring harmonic performance in classification and localization. Experimental results demonstrate that the proposed ProSDD effectively alleviates inconsistent accuracy and achieves high-quality object detection. Taking the pretrained ResNet-50 as the backbone, ProSDD achieves a remarkable 43.3 AP on the MS COCO dataset, surpassing contemporary state-of-the-art detectors by a substantial margin under the equivalent configurations.
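A consistency constraint between the two tasks can be sketched as a penalty on the gap between classification confidence and localization quality. The simple absolute-gap form below is an assumption, not ProSDD's actual consistency loss.

```python
import torch

def consistency_loss(cls_scores, pred_ious):
    """Penalise disagreement between a box's classification confidence and its
    localization quality so the two tasks stay aligned.

    cls_scores: (N,) classification confidences in [0, 1] for positive boxes
    pred_ious:  (N,) IoUs of the corresponding predicted boxes with ground truth
    """
    return (cls_scores - pred_ious).abs().mean()

print(consistency_loss(torch.rand(16), torch.rand(16)))
```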
{"title":"Progressive Semi-Decoupled Detector for Accurate Object Detection","authors":"Bo Han;Lihuo He;Junjie Ke;Jinjian Wu;Xinbo Gao","doi":"10.1109/TMM.2025.3565933","DOIUrl":"https://doi.org/10.1109/TMM.2025.3565933","url":null,"abstract":"Inconsistent accuracy between classification and localization tasks is a common challenge in modern object detection. Task decoupling, which employs distinct features or labeling strategies for each task, is a widely used approach to address this issue. Although it has led to noteworthy advancements, this approach is insufficient as it neglects task interdependence and lacks an explicit consistency constraint. To bridge this gap, this paper proposes the <bold>Pro</b>gressive <bold>S</b>emi-<bold>D</b>ecoupled <bold>D</b>etector (ProSDD) to enhance both classification and localization accuracy. Specifically, a new detection head is designed that incorporates feature suppression and enhancement mechanism (FSEM) and bidirectional interaction module (BIM). Compared with the decoupled head, it not only filters out task-irrelevant information and enhances task-related information, but also avoids excessive decoupling at the feature level. Moreover, both FSEM and BIM are used multiple times, thus forming a progressive semi-decoupled head. Then, a novel consistency loss is proposed and integrated into the loss function of object detection, ensuring harmonic performance in classification and localization. Experimental results demonstrate that the proposed ProSDD effectively alleviates inconsistent accuracy and achieves high-quality object detection. Taking the pretrained ResNet-50 as the backbone, ProSDD achieves a remarkable 43.3 AP on the MS COCO dataset, surpassing contemporary state-of-the-art detectors by a substantial margin under the equivalent configurations.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"5866-5878"},"PeriodicalIF":9.7,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144914371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
High Throughput Shelf Life Determination of Atlantic Cod (Gadus morhua L.) by Use of Hyperspectral Imaging
IF 8.4 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-04-16 | DOI: 10.1109/TMM.2025.3561661
Samuel Ortega;Tatiana N. Ageeva;Silje Kristoffersen;Karsten Heia;Heidi A. Nilsen
Fish quality and shelf life can be evaluated using various assessment methods, such as sensory analysis, biochemical tests, microbiological evaluations, and physicochemical analyses. However, these methods are invasive and time-consuming, driving interest in technologies capable of estimating shelf life through non-invasive procedures. This study investigates the potential of hyperspectral imaging as a non-invasive technology for predicting the shelf life of Atlantic cod. A storage experiment was conducted that included both gutted fish with heads (GFWH) and fillets, with sensory evaluation and biochemical measurements employed to determine shelf life. Subsequently, hyperspectral images of the fish samples were captured under industrial production conditions, and the spectral data were analyzed using different regression algorithms. The majority of the regression techniques utilized in this research successfully predicted shelf life for both fillets and GFWH, achieving a root mean square error (RMSE) lower than one day. While most regression models exhibited comparable performance in predicting the shelf life of fillets, deep learning-based models demonstrated superior performance for GFWH. These results suggest that hyperspectral imaging technology has significant potential as a non-invasive tool for estimating the shelf life of Atlantic cod, thereby enabling effective quality-based sorting, reducing food waste, and enhancing sustainability in the seafood supply chain.
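The overall regression workflow can be sketched with partial least squares on per-sample spectra. PLS is a common choice for spectral data and stands in here for the regression algorithms used in the study (which the abstract does not specify); the data below are synthetic placeholders for real hyperspectral measurements.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic stand-in data: one mean reflectance spectrum per fish sample and a
# remaining shelf life in days (e.g., derived from sensory analysis).
rng = np.random.default_rng(0)
n_samples, n_bands = 200, 120
spectra = rng.normal(size=(n_samples, n_bands))
shelf_life = rng.uniform(0, 14, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(spectra, shelf_life, random_state=0)
model = PLSRegression(n_components=10).fit(X_train, y_train)
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print(f"RMSE: {rmse:.2f} days")
```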
{"title":"High Throughput Shelf Life Determination of Atlantic Cod (Gadus morhua L.) by Use of Hyperspectral Imaging","authors":"Samuel Ortega;Tatiana N. Ageeva;Silje Kristoffersen;Karsten Heia;Heidi A. Nilsen","doi":"10.1109/TMM.2025.3561661","DOIUrl":"https://doi.org/10.1109/TMM.2025.3561661","url":null,"abstract":"Fish quality and shelf life can be evaluated using various assessment methods, such as sensory analysis, biochemical tests, microbiological evaluations, and physicochemical analyses. However, these methods are invasive and time-consuming, driving interest in technologies capable of estimating shelf life through non-invasive procedures. This study investigates the potential of hyperspectral imaging as a non-invasive technology for predicting the shelf life of Atlantic cod. A storage experiment was conducted that included both gutted fish with heads (GFWH) and fillets, with sensory evaluation and biochemical measurements employed to determine shelf life. Subsequently, hyperspectral images of the fish samples were captured under industrial production conditions, and the spectral data were analyzed using different regression algorithms. The majority of the regression techniques utilized in this research successfully predicted shelf life for both fillets and GFWH, achieving a root mean square error (RMSE) lower than one day. While most regression models exhibited comparable performance in predicting the shelf life of fillets, deep learning-based models demonstrated superior performance for GFWH. These results suggest that hyperspectral imaging technology has significant potential as a non-invasive tool for estimating the shelf life of Atlantic cod, thereby enabling effective quality-based sorting, reducing food waste, and enhancing sustainability in the seafood supply chain.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"2809-2824"},"PeriodicalIF":8.4,"publicationDate":"2025-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10966199","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144170923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
TrAME: Trajectory-Anchored Multi-View Editing for Text-Guided 3D Gaussian Manipulation
IF 8.4 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-04-14 | DOI: 10.1109/TMM.2025.3557618
Chaofan Luo;Donglin Di;Xun Yang;Yongjia Ma;Zhou Xue;Wei Chen;Xiaofei Gou;Yebin Liu
Despite significant strides in the field of 3D scene editing, current methods encounter substantial challenges, particularly in preserving 3D consistency during the multi-view editing process. To tackle this challenge, we propose a progressive 3D editing strategy that ensures multi-view consistency via a Trajectory-Anchored Scheme (TAS) with a dual-branch editing mechanism. Specifically, TAS facilitates a tightly coupled iterative process between 2D view editing and 3D updating, preventing the error accumulation that arises from the text-to-image process. Additionally, we explore the connection between optimization-based and reconstruction-based methods, offering a unified perspective for selecting superior design choices and supporting the rationale behind the designed TAS. We further present a tuning-free View-Consistent Attention Control (VCAC) module that leverages cross-view semantic and geometric references from the source branch to yield aligned views from the target branch during the editing of 2D views. To validate the effectiveness of our method, we analyze 2D examples to demonstrate the improved consistency achieved with the VCAC module. Extensive quantitative and qualitative results in text-guided 3D scene editing clearly indicate that our method achieves superior editing quality compared with state-of-the-art 3D scene editing methods.
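The trajectory-anchored control flow can be sketched as a loop that edits one rendered view at a time and immediately folds that edit back into the 3D representation before moving to the next camera anchor, so errors cannot accumulate across views. All helper functions below are hypothetical placeholders, not the TrAME implementation.

```python
def render_view(scene, camera):
    """Placeholder: render the current 3D scene from one camera pose."""
    return {"camera": camera, "image": scene["state"]}

def edit_view(view, prompt):
    """Placeholder: apply a text-guided 2D edit to one rendered view."""
    return {**view, "image": view["image"] + [prompt]}

def update_scene(scene, edited_view):
    """Placeholder: optimise the 3D representation towards the edited view."""
    return {"state": edited_view["image"]}

def trajectory_anchored_edit(scene, trajectory, prompt):
    # Tightly coupled 2D-edit / 3D-update loop along the camera trajectory.
    for camera in trajectory:
        view = render_view(scene, camera)
        edited = edit_view(view, prompt)
        scene = update_scene(scene, edited)
    return scene

print(trajectory_anchored_edit({"state": []}, ["cam0", "cam1", "cam2"], "make it snowy"))
```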
{"title":"TrAME: Trajectory-Anchored Multi-View Editing for Text-Guided 3D Gaussian Manipulation","authors":"Chaofan Luo;Donglin Di;Xun Yang;Yongjia Ma;Zhou Xue;Wei Chen;Xiaofei Gou;Yebin Liu","doi":"10.1109/TMM.2025.3557618","DOIUrl":"https://doi.org/10.1109/TMM.2025.3557618","url":null,"abstract":"Despite significant strides in the field of 3D scene editing, current methods encounter substantial challenge, particularly in preserving 3D consistency during the multi-view editing process. To tackle this challenge, we propose a progressive 3D editing strategy that ensures multi-view consistency via a Trajectory-Anchored Scheme (TAS) with a dual-branch editing mechanism. Specifically, TAS facilitates a tightly coupled iterative process between 2D view editing and 3D updating, preventing error accumulation yielded from the text-to-image process. Additionally, we explore the connection between optimization-based methods and reconstruction-based methods, offering a unified perspective for selecting superior design choices, supporting the rationale behind the designed TAS. We further present a tuning-free View-Consistent Attention Control (VCAC) module that leverages cross-view semantic and geometric reference from the source branch to yield aligned views from the target branch during the editing of 2D views. To validate the effectiveness of our method, we analyze 2D examples to demonstrate the improved consistency with the VCAC module. Extensive quantitative and qualitative results in text-guided 3D scene editing clearly indicate that our method can achieve superior editing quality compared with state-of-the-art 3D scene editing methods.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"2886-2898"},"PeriodicalIF":8.4,"publicationDate":"2025-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144171044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0