
IEEE Transactions on Multimedia: Latest Publications

UBTransformer: Uncertainty-Based Transformer Model for Complex Scenarios Detection in Autonomous Driving
IF 9.7, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-07-04. DOI: 10.1109/TMM.2025.3586103
Ke Wang;Qi Ma;Xingcan Li;Chongqiang Shen;Rui Leng;Jianbo Lu
Traditional object detection algorithms in intelligent vehicle perception systems cannot maintain stable recognition performance in unknown and changing road environments. We find that uncertainty quantification is of great significance for detecting unknown, complex environments and helps to improve the robustness and safety of autonomous driving systems. Therefore, this paper proposes an Uncertainty-based Transformer (UBT) object detection algorithm. Firstly, a double Gaussian feature map network (DGF) is designed to quantify and utilize the uncertainty of the features derived from the backbone network. Secondly, we propose an RBF-based query filtering model (RBQF), which uses the summed uncertainty as the criterion for screening query vectors. In addition, this paper proposes an uncertainty detection head (UDH), so that the final model outputs quantified uncertainty alongside its detections, with improved detection performance and enhanced reliability. To further verify the detection performance of the proposed method in real driving scenes, we evaluate it on COCO, Cityscapes, FoggyCityscapes, RainCityscapes, and a self-collected traffic scene dataset; the results show that our algorithm is well suited to large datasets and complex road scenes.
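To make the query-screening idea concrete, here is a minimal PyTorch sketch that ranks decoder queries by a summed-uncertainty score, assuming per-query Gaussian parameters (mean and log-variance) are available; the function name, shapes, and keep count are illustrative assumptions, not the paper's RBQF implementation.

```python
import torch

def filter_queries_by_uncertainty(queries: torch.Tensor,
                                  mean: torch.Tensor,
                                  log_var: torch.Tensor,
                                  keep: int = 100) -> torch.Tensor:
    """Keep the `keep` decoder queries with the lowest summed uncertainty.

    queries : (N, D) candidate query embeddings
    mean, log_var : (N, D) per-feature Gaussian parameters; the summed variance
                    serves as a scalar uncertainty score per query.
    """
    uncertainty = log_var.exp().sum(dim=-1)   # (N,) summed variance per query
    idx = torch.argsort(uncertainty)[:keep]   # lowest-uncertainty queries first
    return queries[idx]

# toy usage: 300 candidate queries of dimension 256
q = torch.randn(300, 256)
mu, logvar = torch.randn(300, 256), torch.randn(300, 256)
selected = filter_queries_by_uncertainty(q, mu, logvar, keep=100)
print(selected.shape)  # torch.Size([100, 256])
```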
Citations: 0
DomainVerse: A Benchmark Towards Real-World Distribution Shifts for Training-Free Adaptive Domain Generalization
IF 9.7, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-07-04. DOI: 10.1109/TMM.2025.3586108
Feng Hou;Jin Yuan;Ying Yang;Yao Zhang;Yang Liu;Yang Zhang;Cheng Zhong;Zhongchao Shi;Jianping Fan;Zhiqiang He;Yong Rui
Traditional cross-domain tasks, including unsupervised domain adaptation (UDA), domain generalization (DG), and test-time adaptation (TTA), rely heavily on training models with source-domain data, whether for specific or arbitrary target domains. With the recent advance of vision-language models (VLMs), which are recognized as natural source models that can be transferred to various downstream tasks without any parameter training, we propose a novel cross-domain task that directly combines the strengths of both UDA and DG, named Training-Free Adaptive Domain Generalization (TF-ADG). However, current cross-domain datasets have many limitations, such as unrealistic domains, unclear domain definitions, and the inability to support fine-grained domain decomposition; the resulting lack of accurate and fair evaluation on fine-grained realistic domains hinders the real-world application of current cross-domain models. These insights motivate us to establish a novel realistic benchmark for TF-ADG. Benefiting from the introduced hierarchical definition of domain shifts, our proposed dataset, DomainVerse, addresses these issues by providing about 0.5 million images from 390 realistic, hierarchical, and balanced domains, allowing for decomposition across multiple domains within each image. With the help of the constructed DomainVerse and VLMs, we further propose two algorithms, Domain CLIP and Domain++ CLIP, for training-free adaptive domain generalization. Extensive and comprehensive experiments demonstrate the significance of the dataset and the effectiveness of the proposed methods.
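For context, the training-free setting that TF-ADG builds on can be illustrated with plain zero-shot CLIP classification, which labels target-domain images without any parameter updates; the sketch below uses the Hugging Face CLIP interface with hypothetical prompts and image path, and is not the Domain CLIP or Domain++ CLIP algorithm.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# zero-shot, training-free classification of a target-domain image
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of a car", "a photo of a bicycle", "a photo of a pedestrian"]
image = Image.open("target_domain_sample.jpg")  # hypothetical file path

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image   # (1, num_labels) similarity logits
print(labels[logits.softmax(dim=-1).argmax().item()])
```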
Citations: 0
PIMG: Progressive Image-to-Music Generation With Contrastive Diffusion Models
IF 9.7, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-07-03. DOI: 10.1109/TMM.2025.3586119
Mulin Chen;Yajie Wang;Xuelong Li
The goal of Image-to-Music Generation is to create pure music according to a given image. Unlike existing tasks such as text-to-image generation, there is no explicit connection between image content and musical melody. Some existing studies attempt to generate music by directly mapping image features (such as color, edges, etc.) into musical notes, which may result in melodic incoherence. Inspired by neuroscience, it is desirable to employ emotion to bridge these two modalities. However, the continuity and complexity of emotions make it difficult to capture the cross-modal correlation. Drawing on human mechanisms of emotion perception, a Progressive Image-to-Music Generation (PIMG) framework is proposed. The framework designs a mean-teacher based association network to guide the music generation process progressively, starting from highly correlated image-music pairs. The generation network gradually receives more challenging sample pairs, eventually capturing complex cross-modal emotional correspondences. Additionally, a contrastive learning strategy is introduced into the diffusion models to better capture the consistency between pieces of music with similar emotions. Extensive experimental results demonstrate that the proposed framework is able to generate high-quality and emotionally consistent music from images.
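The contrastive learning component can be illustrated with a standard symmetric InfoNCE loss over paired image and music embeddings; the embedding size, batch size, and temperature below are assumptions for illustration rather than the PIMG configuration.

```python
import torch
import torch.nn.functional as F

def info_nce(image_emb: torch.Tensor, music_emb: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/music embeddings.

    Matching pairs (same index) are pulled together while all other pairs in the
    batch act as negatives, encouraging emotionally matched image-music pairs to
    share a common embedding space.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    music_emb = F.normalize(music_emb, dim=-1)
    logits = image_emb @ music_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(image_emb.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```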
Citations: 0
Multi-Modal Hybrid Interaction Vision-Language Tracking
IF 9.7, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-04-30. DOI: 10.1109/TMM.2025.3565984
Lei Lei;Xianxian Li
Vision-language tracking is a crucial branch of multi-modal object tracking, aiming to jointly locate an object by utilizing visual information and language descriptions. Typically, existing vision-language trackers employ language and visual encoders to extract features from language descriptions and visual information, respectively. Based on these extracted visual and language features, a cross-modal interaction module is used to extract multi-modal features to locate the targets. However, these trackers ignore the differences between the visual and language modalities. Because language descriptions lack pixel-level position information, the positional information of the multi-modal features is greatly weakened by the cross-modal interaction modules. As a result, vision-language trackers cannot effectively capture subtle changes in the target's position. To address this problem, we propose a multi-modal hybrid interaction vision-language tracking method (named MHITrack), in which a multi-modal hybrid interaction decoder is designed to enhance the positional information of multi-modal features. The proposed decoder consists of a visual-language interaction module, a multi-level position interaction module, and a hybrid interaction module. Firstly, the multi-level position interaction module is utilized to capture fine-grained position information of the target from multi-level features. Meanwhile, the visual-language interaction module performs cross-modal interaction between visual and language features to obtain multi-modal features. Furthermore, the hybrid interaction module integrates the multi-modal features with target position information, enhancing the positional information of the multi-modal features. Finally, the proposed tracker can effectively capture subtle changes in the target's position. Through extensive experiments on four benchmark datasets, namely TNL2k, LaSOT, OTB-Lang, and LaSOText, we demonstrate that the proposed vision-language tracker achieves promising performance compared with existing state-of-the-art vision-language trackers.
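A minimal sketch of the kind of visual-language interaction described above (visual tokens attending to word tokens via cross-attention) is shown below; the module name, dimensions, and residual design are illustrative assumptions, not the MHITrack implementation.

```python
import torch
import torch.nn as nn

class VisionLanguageInteraction(nn.Module):
    """Visual tokens attend to language tokens via standard cross-attention."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual: torch.Tensor, language: torch.Tensor) -> torch.Tensor:
        # visual: (B, Nv, D) search-region tokens, language: (B, Nl, D) word tokens
        fused, _ = self.attn(query=visual, key=language, value=language)
        return self.norm(visual + fused)   # residual multi-modal features

module = VisionLanguageInteraction()
out = module(torch.randn(2, 196, 256), torch.randn(2, 12, 256))
print(out.shape)  # torch.Size([2, 196, 256])
```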
Citations: 0
Compact-Yet-Separate: Proto-Centric Multi-Modal Hashing With Pronounced Category Differences for Multi-Modal Retrieval
IF 9.7, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-04-30. DOI: 10.1109/TMM.2025.3565973
Ruifan Zuo;Chaoqun Zheng;Lei Zhu;Wenpeng Lu;Jiasheng Si;Weiyu Zhang
Multi-modal hashing achieves low storage costs and high retrieval speeds by using compact hash codes to represent complex and heterogeneous multi-modal data, effectively addressing the inefficiency and resource intensiveness of traditional multi-modal retrieval methods. However, balancing intra-class compactness and inter-class separability remains a struggle in existing works due to coarse-grained feature limitations, simplified fusion strategies that overlook semantic complementarity, and neglect of the structural information within the multi-modal data. To address these limitations comprehensively, we propose a Proto-centric Multi-modal Hashing with Pronounced Category Differences (PMH-PCD) model. Specifically, PMH-PCD first learns modality-specific prototypes by deeply exploring within-modality class information, ensuring effective fusion of each modality's unique characteristics. Furthermore, it learns multi-modal integrated class prototypes that seamlessly incorporate semantic information across modalities to effectively capture and represent the intricate relationships and complementary semantic content embedded within the multi-modal data. Additionally, to generate more discriminative and representative binary hash codes, PMH-PCD integrates multifaceted semantic information, encompassing both low-level pairwise relations and high-level structural patterns, holistically capturing intricate data details and leveraging underlying structures. The experimental results demonstrate that, compared with existing advanced methods, PMH-PCD achieves superior and consistent performance in multi-modal retrieval tasks.
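As a generic illustration of how compact binary codes are produced and compared in hashing-based retrieval, the sketch below projects fused multi-modal features to K bits with a tanh relaxation and computes pairwise Hamming distances; it is a simplified stand-in, not the PMH-PCD model.

```python
import torch
import torch.nn as nn

class HashHead(nn.Module):
    """Project fused multi-modal features to K-bit codes.

    tanh gives a differentiable relaxation during training; sign() yields the
    final binary codes (+1/-1) used for Hamming-distance retrieval.
    """
    def __init__(self, in_dim: int = 512, bits: int = 64):
        super().__init__()
        self.proj = nn.Linear(in_dim, bits)

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.proj(fused))             # relaxed codes in (-1, 1)

head = HashHead()
relaxed = head(torch.randn(4, 512))
binary = torch.sign(relaxed)                            # discrete codes at test time
hamming = 0.5 * (binary.size(1) - binary @ binary.t())  # pairwise Hamming distances
print(binary.shape, hamming.shape)
```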
Citations: 0
Progressive Semi-Decoupled Detector for Accurate Object Detection
IF 9.7, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-04-30. DOI: 10.1109/TMM.2025.3565933
Bo Han;Lihuo He;Junjie Ke;Jinjian Wu;Xinbo Gao
Inconsistent accuracy between classification and localization tasks is a common challenge in modern object detection. Task decoupling, which employs distinct features or labeling strategies for each task, is a widely used approach to address this issue. Although it has led to noteworthy advancements, this approach is insufficient, as it neglects task interdependence and lacks an explicit consistency constraint. To bridge this gap, this paper proposes the Progressive Semi-Decoupled Detector (ProSDD) to enhance both classification and localization accuracy. Specifically, a new detection head is designed that incorporates a feature suppression and enhancement mechanism (FSEM) and a bidirectional interaction module (BIM). Compared with a decoupled head, it not only filters out task-irrelevant information and enhances task-related information, but also avoids excessive decoupling at the feature level. Moreover, both FSEM and BIM are applied multiple times, forming a progressive semi-decoupled head. Then, a novel consistency loss is proposed and integrated into the object detection loss function, ensuring harmonized performance in classification and localization. Experimental results demonstrate that the proposed ProSDD effectively alleviates the accuracy inconsistency and achieves high-quality object detection. Taking pretrained ResNet-50 as the backbone, ProSDD achieves a remarkable 43.3 AP on the MS COCO dataset, surpassing contemporary state-of-the-art detectors by a substantial margin under equivalent configurations.
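A consistency constraint of the kind described can be sketched as a simple penalty on the gap between classification confidence and localization quality (IoU) for positive samples; the exact form used by ProSDD may differ, so treat this as an assumption-laden illustration.

```python
import torch

def consistency_loss(cls_scores: torch.Tensor, ious: torch.Tensor) -> torch.Tensor:
    """Penalize disagreement between classification confidence and localization
    quality for positive samples.

    cls_scores : (N,) predicted class confidences in [0, 1]
    ious       : (N,) IoU of each predicted box with its ground-truth box
    """
    return torch.mean((cls_scores - ious) ** 2)

loss = consistency_loss(torch.rand(16), torch.rand(16))
print(loss.item())
```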
Citations: 0
High Throughput Shelf Life Determination of Atlantic Cod (Gadus morhua L.) by Use of Hyperspectral Imaging
IF 8.4, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-04-16. DOI: 10.1109/TMM.2025.3561661
Samuel Ortega;Tatiana N. Ageeva;Silje Kristoffersen;Karsten Heia;Heidi A. Nilsen
Fish quality and shelf life can be evaluated using various assessment methods, such as sensory analysis, biochemical tests, microbiological evaluations, and physicochemical analyses. However, these methods are invasive and time-consuming, driving interest in technologies capable of estimating shelf life through non-invasive procedures. This study investigates the potential of hyperspectral imaging as a non-invasive technology for predicting the shelf life of Atlantic cod. A storage experiment was conducted that included both gutted fish with heads (GFWH) and fillets, with sensory evaluation and biochemical measurements employed to determine shelf life. Subsequently, hyperspectral images of the fish samples were captured under industrial production conditions, and the spectral data were analyzed using different regression algorithms. The majority of the regression techniques utilized in this research successfully predicted shelf life for both fillets and GFWH, achieving a root mean square error (RMSE) lower than one day. While most regression models exhibited comparable performance in predicting the shelf life of fillets, deep learning-based models demonstrated superior performance for GFWH. These results suggest that hyperspectral imaging technology has significant potential as a non-invasive tool for estimating the shelf life of Atlantic cod, thereby enabling effective quality-based sorting, reducing food waste, and enhancing sustainability in the seafood supply chain.
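As a rough illustration of the regression setup described here (spectra in, remaining shelf life in days out, evaluated by RMSE), the sketch below fits a partial least squares model on synthetic stand-in data; the band count, labels, and component number are assumptions, not the study's actual pipeline.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error

# synthetic stand-in: 200 mean spectra (120 wavelength bands) with shelf-life labels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 120))       # per-sample mean reflectance spectra
y = rng.uniform(0, 14, size=200)      # remaining shelf life in days (toy values)

model = PLSRegression(n_components=10)
model.fit(X[:150], y[:150])
pred = np.ravel(model.predict(X[150:]))
rmse = np.sqrt(mean_squared_error(y[150:], pred))
print(f"RMSE: {rmse:.2f} days")
```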
Citations: 0
TrAME: Trajectory-Anchored Multi-View Editing for Text-Guided 3D Gaussian Manipulation
IF 8.4, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-04-14. DOI: 10.1109/TMM.2025.3557618
Chaofan Luo;Donglin Di;Xun Yang;Yongjia Ma;Zhou Xue;Wei Chen;Xiaofei Gou;Yebin Liu
Despite significant strides in the field of 3D scene editing, current methods encounter substantial challenges, particularly in preserving 3D consistency during the multi-view editing process. To tackle this challenge, we propose a progressive 3D editing strategy that ensures multi-view consistency via a Trajectory-Anchored Scheme (TAS) with a dual-branch editing mechanism. Specifically, TAS facilitates a tightly coupled iterative process between 2D view editing and 3D updating, preventing the error accumulation that arises from the text-to-image process. Additionally, we explore the connection between optimization-based and reconstruction-based methods, offering a unified perspective for selecting superior design choices and supporting the rationale behind the designed TAS. We further present a tuning-free View-Consistent Attention Control (VCAC) module that leverages cross-view semantic and geometric references from the source branch to yield aligned views from the target branch during the editing of 2D views. To validate the effectiveness of our method, we analyze 2D examples to demonstrate the improved consistency achieved with the VCAC module. Extensive quantitative and qualitative results in text-guided 3D scene editing clearly indicate that our method achieves superior editing quality compared with state-of-the-art 3D scene editing methods.
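The idea of sharing attention references across branches can be sketched as blending the target branch's attention with keys and values taken from the source branch; the blending weight, shapes, and function name below are illustrative assumptions, not the VCAC module itself.

```python
import torch
import torch.nn.functional as F

def view_consistent_attention(q_tgt, k_src, v_src, k_tgt, v_tgt, alpha=0.5):
    """Blend target-branch attention with reference keys/values from the source
    branch so that edited views stay aligned with the source views."""
    def attn(q, k, v):
        w = F.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
        return w @ v
    return alpha * attn(q_tgt, k_src, v_src) + (1 - alpha) * attn(q_tgt, k_tgt, v_tgt)

q = torch.randn(1, 196, 64)
k_s, v_s = torch.randn(1, 196, 64), torch.randn(1, 196, 64)
k_t, v_t = torch.randn(1, 196, 64), torch.randn(1, 196, 64)
out = view_consistent_attention(q, k_s, v_s, k_t, v_t)
print(out.shape)  # torch.Size([1, 196, 64])
```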
Citations: 0
Imp: Highly Capable Large Multimodal Models for Mobile Devices
IF 8.4, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-04-11. DOI: 10.1109/TMM.2025.3557680
Zhenwei Shao;Zhou Yu;Jun Yu;Xuecheng Ouyang;Lihao Zheng;Zhenbiao Gai;Mingyang Wang;Zhenzhong Kuang;Jiajun Ding
By harnessing the capabilities of large language models (LLMs), recent large multimodal models (LMMs) have shown remarkable versatility in open-world multimodal understanding. Nevertheless, they are usually parameter-heavy and computation-intensive, which hinders their applicability in resource-constrained scenarios. To this end, several lightweight LMMs have been proposed successively to maximize capability under a constrained scale (e.g., 3B). Despite the encouraging results achieved by these methods, most of them only focus on one or two aspects of the design space, and the key design choices that influence model capability have not yet been thoroughly investigated. In this paper, we conduct a systematic study of lightweight LMMs from the aspects of model architecture, training strategy, and training data. Based on our findings, we obtain Imp, a family of highly capable LMMs at the 2B to 4B scales. Notably, our Imp-3B model steadily outperforms all existing lightweight LMMs of similar size, and even surpasses state-of-the-art LMMs at the 13B scale. With low-bit quantization and resolution reduction techniques, our Imp model can be deployed on a Qualcomm Snapdragon 8Gen3 mobile chip with a high inference speed of about 13 tokens/s.
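A minimal sketch of the low-bit weight quantization mentioned above (symmetric per-tensor int8) is given below; Imp's actual deployment pipeline on the Snapdragon chip is not described here, so this is only a generic illustration.

```python
import torch

def quantize_int8(weight: torch.Tensor):
    """Symmetric per-tensor int8 quantization of a weight matrix."""
    scale = weight.abs().max() / 127.0
    q = torch.clamp(torch.round(weight / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(1024, 1024)
q, s = quantize_int8(w)
print((w - dequantize(q, s)).abs().max())  # maximum rounding error observed
```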
Citations: 0
AutoGeo: Automating Geometric Image Dataset Creation for Enhanced Geometry Understanding
IF 8.4, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2025-04-10. DOI: 10.1109/TMM.2025.3557720
Zihan Huang;Tao Wu;Wang Lin;Shengyu Zhang;Jingyuan Chen;Fei Wu
With the rapid advancement of large language models, there has been growing interest in their capabilities in mathematical reasoning. However, existing research has primarily focused on text-based algebra problems, neglecting the study of geometry due to the lack of high-quality geometric datasets. To address this gap, this paper introduces AutoGeo, a novel approach for automatically generating mathematical geometric images to fulfill the demand for large-scale and diverse geometric datasets. AutoGeo facilitates the creation of AutoGeo-100k, an extensive repository comprising 100k high-quality geometry image-text pairs. By leveraging precisely defined geometric clauses, AutoGeo-100k contains a wide variety of geometric shapes, including lines, polygons, circles, and complex spatial relationships. Furthermore, this paper demonstrates the efficacy of AutoGeo-100k in enhancing the performance of multimodal large language models through fine-tuning. Experimental results indicate significant improvements in the models' ability to handle geometric images, as evidenced by enhanced accuracy in tasks such as geometric captioning and mathematical reasoning. This research not only fills a critical gap in the availability of geometric datasets but also paves the way for the advancement of sophisticated AI-driven tools in education and research.
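To illustrate how a geometric clause can be rendered into an image programmatically, here is a small matplotlib sketch that draws a triangle with an approximate circumscribed circle from a toy clause dictionary; the clause format, coordinates, and file name are hypothetical and not AutoGeo's actual pipeline.

```python
import matplotlib.pyplot as plt
import numpy as np

def draw_clause(clause: dict, path: str = "triangle_with_circumcircle.png") -> None:
    """Render a toy geometric clause: a triangle and an optional dashed circle."""
    pts = np.array(clause["triangle"])
    fig, ax = plt.subplots(figsize=(3, 3))
    ax.add_patch(plt.Polygon(pts, fill=False, linewidth=1.5))
    if "circle" in clause:
        cx, cy, r = clause["circle"]
        ax.add_patch(plt.Circle((cx, cy), r, fill=False, linestyle="--"))
    ax.set_xlim(-1.5, 1.5)
    ax.set_ylim(-1.5, 1.5)
    ax.set_aspect("equal")
    ax.axis("off")
    fig.savefig(path, dpi=150)
    plt.close(fig)

# triangle with its (approximate) circumscribed circle
draw_clause({"triangle": [(-1, -0.5), (1, -0.5), (0, 1)],
             "circle": (0.0, -0.083, 1.083)})
```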
Citations: 0