
Latest Publications in IEEE Transactions on Multimedia

Crafting More Transferable Adversarial Examples via Quality-Aware Transformation Combination
IF 9.7 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-09-01 | DOI: 10.1109/TMM.2025.3604967
Junlin Liu;Xinchen Lyu;Chenshan Ren;Qimei Cui
Input diversity is an effective technique for crafting transferable adversarial examples that can deceive unknown AI models. Existing input-diversity-based methods typically use a single input transformation, limiting targeted transferability and defense robustness. Combining different transformation types is challenging, as continually adding types degrades semantic information and targeted transferability. This paper proposes a quality-aware transformation combination attack (TCA) that selects high-quality transformation combinations. The quality-aware selection enables expansion of transformation types, enhances input diversity, and hence improves targeted transferability and defense robustness. We first design a quality-evaluation framework to quantify the effectiveness of transformation combinations, which jointly considers convergence, transferability, and robustness. Only a small group (up to 10) of images is required for computation-efficient quality evaluation. Experiments validate TCA’s superiority over state-of-the-art baselines in adversarial transferability and robustness. When defenses are secured, the average targeted success rate of TCA with four transformation types (i.e., TCA-t4) outperforms the best baseline by 26%–42% on ImageNet.
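The selection step can be illustrated with a minimal sketch: score every candidate combination of transformation types on a small probe set with proxy metrics for convergence, transferability, and robustness, then keep the top-scoring combinations. The probe-set size, equal weighting, and user-supplied metric callables are assumptions for illustration, not the paper's exact quality-evaluation framework.

```python
from itertools import combinations

def quality_score(combo, probe_images, metrics, weights=(1.0, 1.0, 1.0)):
    # combo        : tuple of callables, each mapping an image to a transformed image
    # probe_images : small list of images (the paper uses at most 10)
    # metrics      : dict of user-supplied proxy evaluators, each taking
    #                (combo, probe_images) and returning a float (assumed interface)
    c = metrics["convergence"](combo, probe_images)
    t = metrics["transferability"](combo, probe_images)
    r = metrics["robustness"](combo, probe_images)
    return weights[0] * c + weights[1] * t + weights[2] * r

def select_combinations(transforms, probe_images, metrics, k_types=4, top_k=5):
    # Enumerate k_types-sized combinations of transformation types and keep the
    # top_k highest-scoring ones for crafting adversarial examples.
    scored = [(quality_score(c, probe_images, metrics), c)
              for c in combinations(transforms, k_types)]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [c for _, c in scored[:top_k]]
```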
Citations: 0
Cross-Projection Distilling Knowledge for Omnidirectional Image Quality Assessment
IF 9.7 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-07-28 | DOI: 10.1109/TMM.2025.3590920
Huixin Hu;Feng Shao;Hangwei Chen;Xiongli Chai;Qiuping Jiang
Nowadays, virtual reality technology is advancing rapidly and becoming increasingly mature, and omnidirectional images have become part of many people's daily lives. However, these images are susceptible to irreversible distortion during the encoding and transmission processes. Given the unique deformation and distortion characteristics of omnidirectional images, the development of a quality assessment method is crucial. To ensure that our network not only delivers efficient and stable performance but also maintains a minimal parameter count, we have integrated the concept of knowledge distillation into our network. This involves utilizing a full-reference (FR) teacher network to guide the training of a no-reference (NR) student network by cross-projection distilling knowledge. To implement this method, a Dual Projection Format Fusion (DPFF) module is specifically designed to complement and integrate the mutual fusion of the two projection formats of omnidirectional images. In the design of our knowledge distillation process and loss function, we have introduced a review mechanism to enhance the performance and efficiency of response-based knowledge, and utilized intermediate fusion features to improve the effectiveness of feature-based knowledge. These components are combined to formulate the final loss function. Experimental results validate the superiority of our proposed model over existing FR and NR methods when evaluated on four omnidirectional image databases. This highlights the effectiveness of our proposed model in elevating the quality assessment of omnidirectional images.
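A minimal sketch of the distillation objective, assuming a standard combination of a supervised quality-regression term, a response-based term that matches the NR student's score to the FR teacher's, and a feature-based term on the intermediate fusion features; the loss weights and the exact review mechanism are assumptions, not the paper's formulation.

```python
import torch.nn.functional as F

def distillation_loss(student_score, teacher_score, student_feat, teacher_feat,
                      mos, alpha=1.0, beta=0.5):
    # student_score / teacher_score : (B, 1) predicted quality scores
    # student_feat / teacher_feat   : (B, D) intermediate fusion features
    # mos                           : (B, 1) ground-truth mean opinion scores
    task = F.l1_loss(student_score, mos)                          # supervised regression
    response = F.mse_loss(student_score, teacher_score.detach())  # response-based KD
    feature = F.mse_loss(student_feat, teacher_feat.detach())     # feature-based KD
    return task + alpha * response + beta * feature
```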
Citations: 0
Multi-Grained Vision-and-Language Model for Medical Image and Text Alignment
IF 9.7 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-07-23 | DOI: 10.1109/TMM.2025.3590930
Huimin Yan;Xian Yang;Liang Bai;Jiamin Li;Jiye Liang
The increasing interest in learning from paired medical images and textual reports highlights the need for methods that can achieve multi-grained alignment between these two modalities. However, most existing approaches overlook fine-grained semantic alignment, which can constrain the quality of the generated representations. To tackle this problem, we propose the Multi-Grained Vision-and-Language Alignment (MGVLA) model, which effectively leverages multi-grained correspondences between medical images and texts at different levels, including disease, instance, and token levels. For disease-level alignment, our approach adopts the concept of contrastive learning and uses medical terminologies detected from textual reports as soft labels to guide the alignment process. At the instance level, we propose a strategy for sampling hard negatives, where images and texts with the same disease type but differing in details such as disease locations and severity are considered as hard negatives. This strategy helps our approach to better distinguish between positive and negative image-text pairs, ultimately enhancing the quality of our learned representations. For token-level alignment, we employ a masking and recovery technique to achieve fine-grained semantic alignment between patches and sub-words. This approach effectively aligns the different levels of granularity between the image and language modalities. To assess the efficacy of our MGVLA model, we conduct comprehensive experiments on the image-text retrieval and phrase grounding tasks.
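The instance-level objective can be illustrated with a generic InfoNCE-style image-text contrastive loss in which the sampled hard negatives (same disease type, differing details) are appended to the in-batch negatives; the temperature and the way hard negatives are encoded are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def image_text_contrastive(img_emb, txt_emb, hard_txt_emb, tau=0.07):
    # img_emb, txt_emb : (B, D) paired image/report embeddings (positives on the diagonal)
    # hard_txt_emb     : (B, M, D) sampled hard-negative report embeddings per image
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    hard = F.normalize(hard_txt_emb, dim=-1)

    in_batch = img @ txt.t() / tau                           # (B, B) in-batch similarities
    hard_sim = torch.einsum("bd,bmd->bm", img, hard) / tau   # (B, M) extra negatives
    logits = torch.cat([in_batch, hard_sim], dim=1)          # (B, B + M)
    targets = torch.arange(img.size(0), device=img.device)   # positive index per row
    return F.cross_entropy(logits, targets)
```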
Citations: 0
XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework
IF 9.7 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-07-23 | DOI: 10.1109/TMM.2025.3590912
Sida Tian;Can Zhang;Wei Yuan;Wei Tan;Wenjie Zhu
In recent years, remarkable advancements in artificial intelligence-generated content (AIGC) have been achieved in the fields of image synthesis and text generation, producing content comparable to that created by humans. However, the quality of AI-generated music has not yet reached this standard, primarily due to the challenge of effectively controlling musical emotions and ensuring high-quality outputs. This paper presents a generalized symbolic music generation framework, XMusic, which supports flexible prompts (i.e., images, videos, texts, tags, and humming) to generate emotionally controllable and high-quality symbolic music. XMusic consists of two core components, XProjector and XComposer. XProjector parses the prompts of various modalities into symbolic music elements (i.e., emotions, genres, rhythms and notes) within the projection space to generate matching music. XComposer contains a Generator and a Selector. The Generator produces emotionally controllable and melodious music based on our innovative symbolic music representation, whereas the Selector identifies high-quality symbolic music by constructing a multi-task learning scheme involving quality assessment, emotion recognition, and genre recognition tasks. In addition, we build XMIDI, a large-scale symbolic music dataset that contains 108,023 MIDI files annotated with precise emotion and genre labels. Objective and subjective evaluations show that XMusic significantly outperforms the current state-of-the-art methods with impressive music quality. XMusic was recognized as one of the nine Highlights of Collectibles at WAIC 2023. The project homepage of XMusic is: https://xmusic-project.github.io.
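A minimal sketch of the Selector's multi-task structure, assuming a shared pooled music representation feeding separate quality-regression, emotion-classification, and genre-classification heads; the dimensions and head designs are illustrative, not the paper's architecture.

```python
import torch.nn as nn

class SelectorHeads(nn.Module):
    # One shared encoder output feeds three task heads, mirroring the multi-task
    # scheme described for the Selector (quality, emotion, genre).
    def __init__(self, dim=512, num_emotions=8, num_genres=16):
        super().__init__()
        self.quality = nn.Linear(dim, 1)             # quality assessment (regression)
        self.emotion = nn.Linear(dim, num_emotions)  # emotion recognition
        self.genre = nn.Linear(dim, num_genres)      # genre recognition

    def forward(self, h):                            # h: (B, dim) pooled music feature
        return self.quality(h), self.emotion(h), self.genre(h)
```

Joint training would then sum a regression loss on quality with cross-entropy losses on emotion and genre, which is one common way to realize such a multi-task scheme.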
Citations: 0
AFAN: An Attention-Driven Forgery Adversarial Network for Blind Image Inpainting
IF 9.7 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-07-22 | DOI: 10.1109/TMM.2025.3590914
Jiahao Wang;Gang Pan;Di Sun;Jinyuan Li;Jiawan Zhang
Blind image inpainting is a challenging task aimed at reconstructing corrupted regions without relying on mask information. Due to the lack of mask priors, previous methods usually integrate a mask prediction network in the initial phase, followed by an inpainting backbone. However, this multi-stage generation process may result in feature misalignment. While recent end-to-end generative methods bypass the mask prediction step, they typically struggle with weak perception of contaminated regions and introduce structural distortions. This study presents a novel mask region perception strategy for blind image inpainting by combining adversarial training with forgery detection. To implement this strategy, we propose an attention-driven forgery adversarial network (AFAN), which leverages adaptive contextual attention (ACA) blocks for effective feature modulation. Specifically, within the generator, ACA employs self-attention to enhance content reconstruction by utilizing the rich contextual information of adjacent tokens. In the discriminator, ACA utilizes cross-attention with noise priors to guide adversarial learning for forgery detection. Moreover, we design a high-frequency omni-dimensional dynamic convolution (HODC) based on edge feature enhancement to improve detail representation. Extensive evaluations across multiple datasets demonstrate that the proposed AFAN model outperforms existing generative methods in blind image inpainting, particularly in terms of quality and texture fidelity.
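A rough sketch of an ACA-style block: self-attention over spatial tokens when no extra context is given (the generator path), and cross-attention when a separate key/value source such as noise-prior features is supplied (the discriminator path). Layer sizes and the residual/normalization layout are assumptions, not the paper's exact module.

```python
import torch.nn as nn

class ContextAttentionBlock(nn.Module):
    # Self-attention when context is None; cross-attention when a context tensor
    # (e.g., noise-prior features) is supplied.
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, context=None):           # x: (B, N, dim) spatial tokens
        q = self.norm(x)
        kv = q if context is None else context    # keys/values from x or from the prior
        out, _ = self.attn(q, kv, kv)
        return x + out                            # residual connection
```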
Citations: 0
ROSA: A Robust Self-Adaptive Model for Multimodal Emotion Recognition With Uncertain Missing Modalities
IF 9.7 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-07-22 | DOI: 10.1109/TMM.2025.3590929
Ziming Li;Yaxin Liu;Chuanpeng Yang;Yan Zhou;Songlin Hu
The rapid development of online media has heightened the importance of multimodal emotion recognition (MER) in video analysis. However, practical applications often encounter challenges due to missing modalities caused by various interferences. It is difficult to predict the specific missing situations, such as the number and types of missing modalities. Current approaches to missing modalities typically apply a uniform method to all missing cases, which is insufficiently adaptive to dynamic conditions. For example, translation-based methods can efficiently complete missing text from audio, but generating audio or video features that retain the original emotional information from other modalities is challenging and may introduce additional noise. In this paper, we introduce ROSA, a novel robust self-adaptive model designed to address various missing cases with tailored approaches, leveraging available modalities effectively and reducing the introduction of additional noise. Specifically, the A-T Completion module based on the encoder-decoder architecture enables ROSA to generate missing raw text from audio rather than mere embedding representations, capturing more nuanced modal features. Additionally, we design the T-V Fusion module based on a vision-language large model for deep extraction and fusion of textual and visual features. Comprehensive experiments conducted on three widely used public datasets demonstrate the superiority and effectiveness of our model. ROSA outperforms other models in both fixed missing rate and fixed missing modality cases. The ablation studies further highlight the contribution of each designed module.
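As a minimal illustration of operating under uncertain missing patterns, the sketch below fuses only the modalities that are actually present using availability masks; ROSA's tailored per-case routing (A-T completion, T-V fusion) is more elaborate than this generic fallback.

```python
import torch

def masked_average_fusion(feats, masks):
    # feats: list of (B, D) modality features (absent modalities passed as zeros)
    # masks: list of (B,) availability flags (1 = present, 0 = missing)
    stacked = torch.stack(feats)                             # (M, B, D)
    m = torch.stack(masks).unsqueeze(-1).to(stacked.dtype)   # (M, B, 1)
    # Average only over the modalities that are actually available per sample.
    return (stacked * m).sum(0) / m.sum(0).clamp_min(1e-6)
```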
Citations: 0
Image Super-Resolution With Taylor Expansion Approximation and Large Field Reception
IF 9.7 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-07-21 | DOI: 10.1109/TMM.2025.3590917
Jiancong Feng;Yuan-Gen Wang;Mingjie Li;Fengchuang Xing
Self-similarity techniques are booming in no-reference super-resolution (SR) due to accurate estimation of the degradation types involved in low-resolution images. However, the high-dimensional matrix multiplication within self-similarity computation incurs prohibitive computational cost. We find that the high-dimensional attention map is derived from the matrix multiplication between query and key, followed by a softmax function. This softmax makes the matrix multiplication inseparable, posing a great challenge in simplifying computational complexity. To address this issue, we first propose a second-order Taylor expansion approximation (STEA) to separate the matrix multiplication of query and key, reducing the complexity from $\mathcal{O}(N^{2})$ to $\mathcal{O}(N)$. Then, we design a multi-scale large field reception (MLFR) to compensate for the performance degradation caused by STEA. Finally, we apply these two core designs to laboratory and real-world scenarios by constructing LabNet and RealNet, respectively. Extensive experimental results on five synthetic datasets demonstrate that our LabNet sets a new benchmark in qualitative and quantitative evaluations. Tested on the real-world dataset, our RealNet achieves superior visual quality over existing methods. Ablation studies further verify the contributions of STEA and MLFR to both the LabNet and RealNet frameworks.
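The core idea can be sketched as follows: approximating $\exp(q^{\top}k)$ by its second-order Taylor expansion $1 + q^{\top}k + (q^{\top}k)^{2}/2$ turns the attention weights into an inner product of explicit feature maps $\phi(q)\cdot\phi(k)$, so keys and values can be aggregated once and reused for every query, reducing the cost in sequence length from $\mathcal{O}(N^{2})$ to $\mathcal{O}(N)$. This is a generic sketch of the approximation, not the authors' STEA/MLFR implementation.

```python
import torch

def taylor_feature_map(x):
    # x: (N, d).  phi(x) = [1, x, vec(x x^T)/sqrt(2)], so that
    # phi(q) . phi(k) = 1 + q^T k + (q^T k)^2 / 2  ≈  exp(q^T k).
    n, d = x.shape
    ones = torch.ones(n, 1, dtype=x.dtype, device=x.device)
    outer = (x.unsqueeze(2) * x.unsqueeze(1)).reshape(n, d * d) / 2 ** 0.5
    return torch.cat([ones, x, outer], dim=-1)            # (N, 1 + d + d^2)

def linear_taylor_attention(q, k, v):
    # q, k: (N, d); v: (N, dv).  Linear in N: keys/values are summarized once.
    fq, fk = taylor_feature_map(q), taylor_feature_map(k)
    kv = fk.t() @ v                                       # (1 + d + d^2, dv) key-value summary
    z = fk.sum(dim=0)                                     # (1 + d + d^2,)   normalizer summary
    num = fq @ kv                                         # (N, dv)
    den = (fq @ z).unsqueeze(-1).clamp_min(1e-6)          # (N, 1)
    return num / den
```

Because `kv` and `z` are computed once and shared by all queries, the per-query work no longer depends on the sequence length, which is exactly where the linear complexity comes from.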
Citations: 0
Towards Student Actions in Classroom Scenes: New Dataset and Baseline
IF 9.7 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-07-21 | DOI: 10.1109/TMM.2025.3590899
Zhuolin Tan;Chenqiang Gao;Anyong Qin;Ruixin Chen;Tiecheng Song;Feng Yang;Deyu Meng
Analyzing student actions is an important and challenging task in educational research. Existing efforts have been hampered by the lack of accessible datasets to capture the nuanced action dynamics in classrooms. In this paper, we present a new multi-label Student Action Video (SAV) dataset, specifically designed for action detection in classroom settings. The SAV dataset consists of 4,324 carefully trimmed video clips from 758 different classrooms, annotated with 15 distinct student actions. Compared to existing action detection datasets, the SAV dataset stands out by providing a wide range of real classroom scenarios, high-quality video data, and unique challenges, including subtle movement differences, dense object engagement, significant scale differences, varied shooting angles, and visual occlusion. These complexities introduce new opportunities and challenges to advance action detection methods. To benchmark this, we propose a novel baseline method based on a visual transformer, designed to enhance attention to key local details within small and dense object regions. Our method demonstrates excellent performance with a mean Average Precision (mAP) of 67.9% and 27.4% on the SAV and AVA datasets, respectively. This paper not only provides the dataset but also calls for further research into AI-driven educational tools that may transform teaching methodologies and learning outcomes.
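Since performance is reported as mean Average Precision over action classes, the sketch below shows a generic multi-label mAP computation (not the paper's exact evaluation protocol).

```python
import numpy as np
from sklearn.metrics import average_precision_score

def multilabel_map(y_true, y_score):
    # y_true : (num_clips, num_classes) binary ground-truth action labels
    # y_score: (num_clips, num_classes) predicted confidence scores
    aps = [average_precision_score(y_true[:, c], y_score[:, c])
           for c in range(y_true.shape[1]) if y_true[:, c].any()]  # skip empty classes
    return float(np.mean(aps))
```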
Citations: 0
Single-Domain Generalized Object Detection With Frequency Whitening and Contrastive Learning
IF 9.7 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-07-21 | DOI: 10.1109/TMM.2025.3590915
Xiaolong Guo;Chengxu Liu;Xueming Qian;Zhixiao Wang;Xubin Feng;Yao Xue
Single-Domain Generalization Object Detection (Single-DGOD) refers to training a model with only one source domain, enabling the model to generalize to any unseen domain. For instance, a detector trained on a sunny daytime dataset should also perform well in scenarios such as rainy nighttime. The main challenge is to improve the detector’s ability to learn the domain-invariant representation (DIR) while removing domain-specific information. Recent progress in Single-DGOD has demonstrated the efficacy of removing domain-specific information by adjusting feature distributions. Nonetheless, simply adjusting the global feature distribution in the Single-DGOD task is insufficient to learn the potential relationship from sunny to adverse weather, as this ignores the significant domain gaps between instances across different weather conditions. In this paper, we propose a novel object detection method for more robust single-domain generalization. In particular, it mainly consists of a frequency-aware selective whitening module (FSW) for removing redundant domain-specific information and a contrastive feature alignment module (CFA) for enhancing domain-invariant information among instances. Specifically, FSW extracts the magnitude spectrum of the feature and uses a group whitening loss to selectively eliminate redundant domain-specific information in the magnitude. To further eliminate domain differences among instances, we apply a style transfer method for data augmentation and use the augmented data in the CFA module. CFA formulates both the original and augmented RoI features into a series of groups with different categories, and utilizes contrastive learning across them to facilitate the learning of DIR in various categories. Experiments show that our method achieves favorable performance on existing standard benchmarks.
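The FSW idea can be sketched as follows: take the per-channel magnitude spectrum of the feature map and apply a whitening-style penalty within channel groups so that style-like, domain-specific statistics are suppressed. The group size and the choice of pushing the full within-group covariance toward the identity are assumptions, not the paper's selective criterion.

```python
import torch

def magnitude_spectrum(feat):
    # feat: (B, C, H, W) -> per-channel amplitude of the 2-D Fourier transform.
    return torch.fft.fft2(feat, norm="ortho").abs()

def group_whitening_loss(mag, groups=4):
    # Push the within-group channel covariance of the magnitude spectrum toward the
    # identity, decorrelating (whitening) domain-specific statistics.
    # Assumes the channel count C is divisible by `groups`.
    b, c, h, w = mag.shape
    x = mag.reshape(b, groups, c // groups, h * w)
    x = x - x.mean(dim=-1, keepdim=True)
    cov = x @ x.transpose(-1, -2) / (h * w - 1)          # (B, G, C/G, C/G)
    eye = torch.eye(c // groups, device=mag.device, dtype=mag.dtype)
    return ((cov - eye) ** 2).mean()
```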
Citations: 0
Boosting Modal-Specific Representations for Sentiment Analysis With Incomplete Modalities
IF 9.7 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-07-21 | DOI: 10.1109/TMM.2025.3590909
Xin Jiang;Lihuo He;Fei Gao;Kaifan Zhang;Jie Li;Xinbo Gao
Multimodal sentiment analysis aims at exploiting complementary information from multiple modalities or data sources to enhance the understanding and interpretation of sentiment. While existing multi-modal fusion techniques offer significant improvements in sentiment analysis, real-world scenarios often involve missing modalities, introducing complexity due to uncertainty about which modalities may be absent. To tackle the challenge of incomplete modality-specific feature extraction caused by missing modalities, this paper proposes a Cosine Margin-Aware Network (CMANet), which centers on the Cosine Margin-Aware Distillation (CMAD) module. The core module measures the distance between samples and the classification boundary, enabling CMANet to focus on samples near the boundary. In this way, it effectively captures the unique features of different modal combinations. To address the issue of modality imbalance during modality-specific feature extraction, this paper proposes a Weak Modality Regularization (WMR) strategy, which aligns the feature distributions between strong and weak modalities at the dataset level while also enhancing the prediction loss of samples at the sample level. This dual mechanism improves the recognition robustness of weak modality combinations. Extensive experiments demonstrate that the proposed method outperforms the previous best model, MMIN, with a 3.82% improvement in unweighted accuracy. These results underscore the robustness of the approach under conditions of uncertain and missing modalities.
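The boundary-focusing idea can be illustrated with a small sketch: compute cosine similarities between normalized features and class prototypes, treat the gap between the top-1 and runner-up similarities as the distance to the decision boundary, and up-weight samples whose gap is small. The weighting function and temperature are assumptions, not the paper's CMAD formulation.

```python
import torch
import torch.nn.functional as F

def boundary_sample_weights(features, class_prototypes, temperature=0.1):
    # features         : (B, D) modality-specific sample features
    # class_prototypes : (C, D) classifier weight vectors / class centers
    cos = F.normalize(features, dim=-1) @ F.normalize(class_prototypes, dim=-1).t()
    top2 = cos.topk(2, dim=-1).values                   # (B, 2) best and runner-up similarity
    margin = top2[:, 0] - top2[:, 1]                    # small margin => near the boundary
    return torch.softmax(-margin / temperature, dim=0)  # larger weight near the boundary
```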
Citations: 0