
Latest Publications in IEEE Transactions on Multimedia

Cross-Lingual Adaptation for Vision-Language Model via Multimodal Semantic Distillation
IF 8.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2025-04-03 · DOI: 10.1109/TMM.2025.3557678
Yu Weng;Wenbin He;Jun Dong;Chaomurilige;Xuan Liu;Zheng Liu
Large Multimodal Models (LMMs) excel in English multimedia tasks but face challenges in adapting to other languages due to linguistic diversity, limited non-English multimodal data, and high training costs. Existing approaches rely on machine-translated multimodal corpora or multilingual large language models, yet they demand substantial resources and achieve only modest zero-shot cross-lingual transfer performance, as shown in the IGLUE benchmark. In this work, we propose SMSA, a Syntax-aware Multimodal Semantic Adaptation approach, which efficiently extends vision-language models (VLMs) to multiple languages via a lightweight adaptation module. Instead of learning from scratch, SMSA transfers multimodal knowledge from English-trained models using two key components: (1) a Syntax-aware Adapter (SAA), which restructures multilingual text representations to align better with English syntax, reducing cross-lingual misalignment; (2) a Multimodal Semantic Distillation (MSD) method, which enables the model to mimic English sequence processing and retain multimodal associations across languages. This allows efficient adaptation to new languages while preserving the original model's strong multimodal capabilities. We extend an MoE-based VLM to 8 languages using a small translation dataset. Evaluations on the IGLUE benchmark show that SMSA achieves strong zero-shot transfer, outperforming some multilingual LMMs and demonstrating its effectiveness in cross-lingual vision-language adaptation.
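The core of the MSD component is a multilingual student mimicking the feature sequence of a frozen English-trained teacher. A minimal sketch of such feature-level distillation follows; the MSE objective, shapes, and all names are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def distillation_loss(student_feats, teacher_feats):
    """Mean-squared error between the multilingual student's token
    features and the frozen English teacher's features (illustrative;
    the paper's actual distillation objective may differ)."""
    return float(np.mean((student_feats - teacher_feats) ** 2))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))                   # 8 tokens, 16-dim English features
student = teacher + 0.1 * rng.normal(size=(8, 16))   # lightly perturbed multilingual features
loss = distillation_loss(student, teacher)
```

Minimizing this loss pulls the adapted multilingual representations toward the English ones, which is the intuition behind transferring multimodal knowledge without retraining from scratch.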
IEEE Transactions on Multimedia, vol. 27, pp. 3184-3196.
Citations: 0
ExpLLM: Towards Chain of Thought for Facial Expression Recognition
IF 8.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2025-04-03 · DOI: 10.1109/TMM.2025.3557704
Xing Lan;Jian Xue;Ji Qi;Dongmei Jiang;Ke Lu;Tat-Seng Chua
Facial expression recognition (FER) is a critical task in multimedia with significant implications across various domains. However, analyzing the causes of facial expressions is essential for accurately recognizing them. Current approaches, such as those based on facial action units (AUs), typically provide AU names and intensities but lack insight into the interactions and relationships between AUs and the overall expression. In this paper, we propose a novel method called ExpLLM, which leverages large language models to generate an accurate chain of thought (CoT) for facial expression recognition. Specifically, we have designed the CoT mechanism from three key perspectives: key observations, overall emotional interpretation, and conclusion. The key observations describe the AU's name, intensity, and associated emotions. The overall emotional interpretation provides an analysis based on multiple AUs and their interactions, identifying the dominant emotions and their relationships. Finally, the conclusion presents the final expression label derived from the preceding analysis. Furthermore, we also introduce the Exp-CoT Engine, designed to construct this expression CoT and generate instruction-description data for training our ExpLLM. Extensive experiments on the RAF-DB and AffectNet datasets demonstrate that ExpLLM outperforms current state-of-the-art FER methods. ExpLLM also surpasses the latest GPT-4o in expression CoT generation, particularly in recognizing micro-expressions where GPT-4o frequently fails.
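The three-part CoT described above (key observations → overall interpretation → conclusion) maps naturally onto a small structured record that can be rendered as prompt text. A sketch, with all field names and example values assumed for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class KeyObservation:
    au_name: str        # facial action unit, e.g. "AU12 Lip Corner Puller"
    intensity: str      # e.g. "slight", "marked"
    emotions: list      # emotions this AU is associated with

@dataclass
class ExpressionCoT:
    observations: list = field(default_factory=list)
    interpretation: str = ""
    conclusion: str = ""

    def render(self) -> str:
        """Assemble the three-stage chain of thought as plain prompt text."""
        lines = ["Key observations:"]
        for o in self.observations:
            lines.append(f"- {o.au_name} ({o.intensity}): linked to {', '.join(o.emotions)}")
        lines.append(f"Overall interpretation: {self.interpretation}")
        lines.append(f"Conclusion: {self.conclusion}")
        return "\n".join(lines)

cot = ExpressionCoT(
    observations=[KeyObservation("AU12 Lip Corner Puller", "marked", ["happiness"])],
    interpretation="A marked AU12 with no negative-valence AUs suggests genuine positive affect.",
    conclusion="happiness",
)
text = cot.render()
```

The Exp-CoT Engine presumably emits records of roughly this shape as instruction-description training data for ExpLLM.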
IEEE Transactions on Multimedia, vol. 27, pp. 3069-3081.
Citations: 0
Efficient Transfer From Image-Based Large Multimodal Models to Video Tasks
IF 8.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2025-04-03 · DOI: 10.1109/TMM.2025.3557692
Shidong Cao;Zhonghan Zhao;Shengyu Hao;Wenhao Chai;Jenq-Neng Hwang;Hongwei Wang;Gaoang Wang
Extending image-based Large Multimodal Models (LMMs) to video-based LMMs always requires temporal modeling in the pre-training. However, training the temporal modules gradually erases the knowledge of visual features learned from various image-text-based scenarios, leading to degradation in some downstream tasks. To address this issue, in this paper, we introduce a novel, efficient transfer approach termed MTransLLAMA, which employs transfer learning from pre-trained image LMMs for fine-grained video tasks with only small-scale training sets. Our method enables fewer trainable parameters and achieves faster adaptation and higher accuracy than pre-training video-based LMM models. Specifically, our method adopts early fusion between textual and visual features to capture fine-grained information, reuses spatial attention weights in temporal attentions for cyclical spatial-temporal reasoning, and introduces dynamic attention routing to capture both global and local information in spatial-temporal attentions. Experiments demonstrate that across multiple datasets and tasks, without relying on video pre-training, our model achieves state-of-the-art performance, enabling lightweight and efficient transfer from image-based LMMs to fine-grained video tasks.
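One component above, dynamic attention routing between global and local attention, can be illustrated with a toy single-head version. The shapes, the softmax gate (random stand-ins here for learned routing weights), and the window size are all assumptions for the sketch, not the paper's design:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, mask=None):
    """Plain scaled dot-product attention; masked positions are blocked."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ v

def routed_attention(x, window=2):
    """Blend global attention (all tokens) with local attention
    (a +/-window band) using a per-token routing gate; the random
    gate here stands in for learned routing parameters."""
    n = x.shape[0]
    idx = np.arange(n)
    local_mask = np.abs(idx[:, None] - idx[None, :]) <= window
    global_out = attention(x, x, x)
    local_out = attention(x, x, x, mask=local_mask)
    rng = np.random.default_rng(0)
    gate = softmax(rng.normal(size=(n, 2)))       # routing weights per token
    return gate[:, :1] * global_out + gate[:, 1:] * local_out

tokens = np.random.default_rng(1).normal(size=(6, 8))  # 6 tokens, 8-dim
out = routed_attention(tokens)
```

The gate lets each token draw on either long-range context or its local neighborhood, which is the intuition behind capturing both global and local spatial-temporal information.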
IEEE Transactions on Multimedia, vol. 27, pp. 3045-3056.
Citations: 0
HA-FGOVD: Highlighting Fine-Grained Attributes via Explicit Linear Composition for Open-Vocabulary Object Detection
IF 8.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2025-04-03 · DOI: 10.1109/TMM.2025.3557624
Yuqi Ma;Mengyin Liu;Chao Zhu;Xu-Cheng Yin
Open-vocabulary object detection (OVD) models are considered Large Multi-modal Models (LMMs) due to their extensive training data and large number of parameters. Mainstream OVD models prioritize coarse-grained object categories rather than fine-grained attributes, e.g., colors or materials, and thus fail to identify objects specified by particular attributes. Despite being pretrained on large-scale image-text pairs rich in attribute information, their latent feature space does not highlight these fine-grained attributes. In this paper, we introduce HA-FGOVD, a universal and explicit method that enhances the attribute-level detection capabilities of frozen OVD models by highlighting fine-grained attributes in an explicit linear space. Our approach uses an LLM to extract attribute words from the input text as a zero-shot task. Then, token attention masks are adjusted to guide text encoders in extracting both global and attribute-specific features, which are explicitly composited as two vectors in linear space to form a new attribute-highlighted feature for detection tasks. The composition weight scalars can be learned or transferred across different OVD models, showcasing the universality of our method. Experimental results show that HA-FGOVD achieves state-of-the-art performance on the FG-OVD benchmark and demonstrates promising generalization on the OVDEval benchmark, suggesting that our method addresses significant limitations in fine-grained attribute detection and has potential for broader fine-grained detection applications.
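The explicit linear composition at the heart of the method, blending a global text feature with an attribute-specific one via two weight scalars, reduces to a few lines. The feature dimension and the example weights below are assumptions; the paper learns or transfers these scalars:

```python
import numpy as np

def highlight_attributes(global_feat, attr_feat, w_global=1.0, w_attr=0.6):
    """Compose the attribute-highlighted feature as an explicit linear
    combination in the text-feature space, then re-normalize so it can
    be matched against region features with cosine similarity."""
    composed = w_global * global_feat + w_attr * attr_feat
    return composed / np.linalg.norm(composed)

rng = np.random.default_rng(0)
g = rng.normal(size=512)   # global text embedding (dimension assumed)
a = rng.normal(size=512)   # attribute-specific embedding
f = highlight_attributes(g, a)
```

Because the composition is a plain weighted sum in the existing feature space, the underlying OVD model stays frozen; only the two scalars need to be learned or transferred.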
IEEE Transactions on Multimedia, vol. 27, pp. 3171-3183.
Citations: 0
Open-Vocabulary Multi-Object Tracking With Domain Generalized and Temporally Adaptive Features
IF 8.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2025-04-03 · DOI: 10.1109/TMM.2025.3557619
Run Li;Dawei Zhang;Yanchao Wang;Yunliang Jiang;Zhonglong Zheng;Sang-Woon Jeon;Hua Wang
Open-vocabulary multi-object tracking (OVMOT) is a cutting-edge research direction within the multi-object tracking field. It employs large multi-modal models to address the challenge of tracking unseen objects in dynamic visual scenes. While such models require robust domain generalization and temporal adaptability, OVTrack, the only existing open-vocabulary multi-object tracker, relies solely on static appearance information and lacks these crucial adaptive capabilities. In this paper, we propose OVSORT, a new framework designed to improve domain generalization and temporal information processing. Specifically, we first propose the Adaptive Contextual Normalization (ACN) technique, which dynamically adjusts feature maps based on the dataset's statistical properties, thereby fine-tuning our model to improve domain generalization. Then, we introduce motion cues for the first time: using our Joint Motion and Appearance Tracking (JMAT) strategy, we obtain a joint similarity measure and apply the Hungarian algorithm for data association. Finally, our Hierarchical Adaptive Feature Update (HAFU) strategy adaptively adjusts feature updates according to the current state of each trajectory, greatly improving the utilization of temporal information. Extensive experiments on the TAO validation and test sets confirm the superiority of OVSORT, which significantly improves the handling of novel and base classes. It surpasses existing methods in accuracy and generalization, setting a new state of the art for OVMOT.
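The JMAT association step, fusing appearance similarity with motion overlap and then solving a bipartite matching, can be sketched on a toy track/detection cost matrix. The blending form and alpha are assumptions, and the exhaustive matcher below is a stand-in for the Hungarian algorithm (real trackers would use `scipy.optimize.linear_sum_assignment`):

```python
import numpy as np
from itertools import permutations

def joint_similarity(app_sim, motion_iou, alpha=0.5):
    """Joint measure blending appearance similarity and predicted-box IoU
    (the linear blend and alpha are assumptions for this sketch)."""
    return alpha * app_sim + (1 - alpha) * motion_iou

def best_assignment(sim):
    """Exhaustive matcher, a stand-in for the Hungarian algorithm
    (fine for toy sizes only)."""
    n = sim.shape[0]
    best = max(permutations(range(n)),
               key=lambda p: sum(sim[i, p[i]] for i in range(n)))
    return list(enumerate(best))

# 3 tracks x 3 detections: appearance cosine similarity and motion IoU
app = np.array([[0.9, 0.2, 0.1],
                [0.3, 0.8, 0.2],
                [0.1, 0.3, 0.7]])
iou = np.array([[0.8, 0.1, 0.0],
                [0.2, 0.9, 0.1],
                [0.0, 0.2, 0.6]])
sim = joint_similarity(app, iou)
matches = best_assignment(sim)   # pairs each track with its best detection
```

Here the dominant diagonal makes the identity matching optimal, i.e., each track keeps its own detection.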
IEEE Transactions on Multimedia, vol. 27, pp. 3009-3022.
Citations: 0
Multi-View User Preference Modeling for Personalized Text-to-Image Generation
IF 8.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2025-04-03 · DOI: 10.1109/TMM.2025.3557683
Huaiwen Zhang;Tianci Wu;Yinwei Wei
Personalized text-to-image generation aims to synthesize images tailored to individual user preferences. Existing methods primarily generate customized content from a few reference images; they often struggle to mine user preferences from historical records and thus fail to synthesize truly personalized content. In addition, it is difficult to incorporate extracted user-preference features directly into the feature space of the generation model, since a considerable gap exists between them. In this paper, we propose a novel multi-view personalized text-to-image generation method based on the diffusion model, named MVP-Diffusion, which learns instance- and user-level preferences from historical records and integrates them into the generation model. For instance-level user preference modeling, we employ a chain-of-thought prompting strategy to deduce preference keywords and integrate them into input prompts with the aid of a large language model. For user-level preference modeling, we construct a learnable embedding for each user to capture more comprehensive preferences by analyzing their historical records. An adaptive user preference fusion module is proposed to inject user preferences into the generation model via a set of learnable parameters. Experimental results demonstrate that the proposed method significantly enhances the personalization of the generated images compared with other personalized text-to-image generation methods.
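The user-level fusion idea, injecting a per-user preference embedding into the prompt representation through a small set of learnable parameters, can be sketched as a gated residual. The gating form, dimensions, and all names are assumptions for illustration; the paper's fusion module may be structured differently:

```python
import numpy as np

def fuse_preferences(prompt_emb, user_emb, fusion_w):
    """Inject the user-level preference embedding into the prompt
    representation via a sigmoid-gated residual controlled by a
    small vector of learnable fusion parameters (illustrative)."""
    gate = 1.0 / (1.0 + np.exp(-(prompt_emb * fusion_w)))  # per-dimension gate
    return prompt_emb + gate * user_emb

rng = np.random.default_rng(0)
prompt = rng.normal(size=64)   # text-prompt embedding (dimension assumed)
user = rng.normal(size=64)     # learnable per-user preference embedding
w = rng.normal(size=64)        # learnable fusion parameters
fused = fuse_preferences(prompt, user, w)
```

A residual form like this keeps the prompt semantics intact while letting the learned gate decide, per dimension, how strongly the user's preferences bend the generation.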
IEEE Transactions on Multimedia, vol. 27, pp. 3082-3091.
Citations: 0
Multi-Target Pose Estimation and Behavior Analysis Based on Symmetric Cascaded AdderNet
IF 8.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2025-04-03 · DOI: 10.1109/TMM.2025.3557614
Xiaoshuo Jia;Qingzhen Xu;Aiqing Zhu;Xiaomei Kuang
In pose estimation and behavior analysis tasks in computer vision, conventional models are often constrained by various factors or complex environments (such as multiple targets, small targets, or occluded targets). To address this problem, this paper proposes a symmetric cascaded additive network (MulAG) to improve the accuracy of pose estimation and behavior analysis in complex environments. MulAG consists of two modules, MulA and MulG. The MulA module is designed around a cascaded symmetric network structure and incorporates the addition operation; it extracts the spatial posture features of the target from a single frame. The MulG module is built from three consecutive GRUs (gated recurrent units); on top of MulA, it extracts temporal posture features from the spatial ones and predicts the temporal posture features of the moving target. The paper first demonstrates the feasibility of addition operations in pose estimation tasks by comparing against MobileNet-v3 in ablation experiments. Second, on the HiEve and CrowdPose datasets, MulA achieves accuracies of 79.6% and 80.4%, respectively, outperforming the PTM model by 12.0% and 21.2%. MulA also attains the best detection speed at 8.6 ms, a twofold improvement over HDGCN, demonstrating its effectiveness for multi-target pose estimation in complex scenes. Finally, on the HMDB-51 and UCF-101 datasets, MulAG achieves accuracies of 74.8% and 86.3%, respectively, outperforming HDGCN by 9.6% and 9.5%. Compared with SKP and GIST, the fps of MulAG (44.8 s−1) is improved by 8.2% and 8.9%. These experiments highlight the generalizability and superiority of MulAG in behavior analysis and pose estimation tasks.
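The "addition operation" that AdderNet-style layers substitute for multiply-accumulate can be shown in a tiny 1-D form: each output unit responds with the negative L1 distance between the input and its filter, so the layer uses only additions and subtractions. Shapes and values are illustrative; MulA applies this idea inside a cascaded symmetric convolutional structure:

```python
import numpy as np

def adder_layer(x, weights):
    """AdderNet-style layer: each output unit is the negative L1 distance
    between the input vector and its filter, replacing multiply-accumulate
    with additions/subtractions only.
    x: (d,) input, weights: (units, d) filter bank."""
    return -np.abs(x[None, :] - weights).sum(axis=1)

x = np.array([1.0, 2.0, 3.0])
w = np.array([[1.0, 2.0, 3.0],    # filter identical to the input -> distance 0
              [0.0, 0.0, 0.0]])   # distant filter -> large negative response
out = adder_layer(x, w)
```

A perfectly matching filter yields the maximum response (0), and responses decay as the filter moves away from the input, mirroring how correlation behaves in an ordinary convolution but at lower arithmetic cost.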
IEEE Transactions on Multimedia, vol. 27, pp. 3197-3209.
Citations: 0
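The addition operation at the heart of MulA follows the AdderNet idea: the multiplications in a convolution are replaced by additions, with a filter response computed as the negative L1 distance between the kernel and each input patch. A minimal single-channel NumPy sketch of this operation (shapes and names are illustrative, not the paper's implementation):

```python
import numpy as np

def adder_conv2d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """AdderNet-style 'convolution': the response at each position is the
    negative L1 distance between the k x k kernel w and the corresponding
    k x k patch of x -- no multiplications involved."""
    k = w.shape[0]
    h, wid = x.shape
    out = np.empty((h - k + 1, wid - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = -np.abs(x[i:i + k, j:j + k] - w).sum()
    return out

# Each 2x2 patch of zeros differs from the all-ones kernel by |0-1| * 4 = 4,
# so every response is -4.
feat = adder_conv2d(np.zeros((3, 3)), np.ones((2, 2)))
```

Because the response is a (negative) distance rather than a correlation, larger values still mean a better match, which is what lets such layers substitute for ordinary convolutions in the cascaded structure.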
Guest Editorial: When Multimedia Meets Food: Multimedia Computing for Food Data Analysis and Applications
IF 8.4 Tier 1 Computer Science Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-03-28 DOI: 10.1109/TMM.2025.3566452
Weiqing Min;Shuqiang Jiang;Petia Radeva;Vladimir Pavlovic;Chong-Wah Ngo;Kiyoharu Aizawa;Wanqing Li
{"title":"Guest Editorial: When Multimedia Meets Food: Multimedia Computing for Food Data Analysis and Applications","authors":"Weiqing Min;Shuqiang Jiang;Petia Radeva;Vladimir Pavlovic;Chong-Wah Ngo;Kiyoharu Aizawa;Wanqing Li","doi":"10.1109/TMM.2025.3566452","DOIUrl":"https://doi.org/10.1109/TMM.2025.3566452","url":null,"abstract":"","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"2708-2712"},"PeriodicalIF":8.4,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11016286","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144170808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Continuous Bijection Supervised Pyramid Diffeomorphic Deformation for Learning Tooth Meshes From CBCT Images
IF 9.7 Tier 1 Computer Science Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-03-10 DOI: 10.1109/TMM.2025.3543091
Zechu Zhang;Weilong Peng;Jinyu Wen;Keke Tang;Meie Fang;David Dagan Feng;Ping Li
Accurate and high-quality tooth mesh generation from cone-beam computerized tomography (CBCT) is an essential computer-aided technology for digital dentistry. However, existing segmentation-based methods require complicated post-processing and significant manual correction to generate regular tooth meshes. In this paper, we propose a method of continuous bijection supervised pyramid diffeomorphic deformation (PDD) for learning tooth meshes, which can be used to directly generate high-quality tooth meshes from CBCT images. Overall, we adopt a classic two-stage framework. In the first stage, we devise an enhanced detector to accurately locate and crop every tooth. In the second stage, a PDD network is designed to deform a sphere mesh from a coarse resolution to a fine one according to pyramid flows based on diffeomorphic mesh deformations, so that the generated mesh approximates the ground truth arbitrarily closely and efficiently. To achieve this, a novel continuous bijection distance loss on the diffeomorphic sphere is also designed to supervise the deformation learning, which overcomes the shortcoming of losses based on nearest-neighbour mapping and improves the fitting precision.
{"title":"Continuous Bijection Supervised Pyramid Diffeomorphic Deformation for Learning Tooth Meshes From CBCT Images","authors":"Zechu Zhang;Weilong Peng;Jinyu Wen;Keke Tang;Meie Fang;David Dagan Feng;Ping Li","doi":"10.1109/TMM.2025.3543091","DOIUrl":"https://doi.org/10.1109/TMM.2025.3543091","url":null,"abstract":"Accurate and high-quality tooth mesh generation from cone-beam computerized tomography (CBCT) is an essential computer-aided technology for digital dentistry. However, existing segmentation-based methods require complicated post-processing and significant manual correction to generate regular tooth meshes. In this paper, we propose a method of continuous bijection supervised pyramid diffeomorphic deformation (PDD) for learning tooth meshes, which could be used to directly generate high-quality tooth meshes from CBCT Images. Overall, we adopt a classic two-stage framework. In the first stage, we devise an enhanced detector to accurately locate and crop every tooth. In the second stage, a PDD network is designed to deform a sphere mesh from low resolution to high one according to pyramid flows based on diffeomorphic mesh deformations, so that the generated mesh approximates the ground truth infinitely and efficiently. To achieve that, a novel continuous bijection distance loss on the diffeomorphic sphere is also designed to supervise the deformation learning, which overcomes the shortcoming of loss based on nearest-neighbour mapping and improves the fitting precision. 
Experiments show that our method outperforms the state-of-the-art methods in terms of both different evaluation metrics and the geometry quality of reconstructed tooth surfaces.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"5696-5708"},"PeriodicalIF":9.7,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144914120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
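For context on the loss design: the nearest-neighbour mapping losses the paper improves upon are typified by the Chamfer distance between point sets sampled from the predicted and ground-truth surfaces. A minimal NumPy sketch of that baseline (illustrative only; the paper's continuous bijection loss replaces exactly this kind of many-to-one matching):

```python
import numpy as np

def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3).
    Each point is matched to its nearest neighbour in the other set -- a
    many-to-one mapping, which is the shortcoming a bijective loss avoids."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (N, M) pairwise distances
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

pts = np.array([[0.0, 0.0, 0.0]])
tgt = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
loss = chamfer_distance(pts, tgt)
```

Note that both target points here match the single source point, so no penalty forces the mesh to spread over the whole target surface; a bijective correspondence rules out such degenerate matchings.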
Secure Neural Network Watermarking Protocol Against Evidence Exposure Attack
IF 9.7 Tier 1 Computer Science Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-03-06 DOI: 10.1109/TMM.2025.3542975
Huixin Luo;Li Li;Xinpeng Zhang
Trigger-based backdoor watermarking is an extensively utilized and effective method to safeguard the copyright of deep neural networks (DNNs), in which the trigger set serves as the key of the watermark. However, during the verification stage, there is a risk that the trigger set could be leaked and exposed to adversaries. If this occurs, the adversaries might apply this leaked trigger set to claim ownership of the model, posing significant copyright issues for the watermarked DNN. To address such an evidence exposure problem, a secure neural network watermarking protocol is put forward in this paper. In the proposed protocol, the trigger set is not fixed: once a trigger has been utilized for verification, it becomes invalid and cannot be used for verification again. As a result, even if the trigger set is leaked during the verification process and obtained by the attacker, it cannot be used for copyright verification since it is invalid. To assist the protocol, a trigger set generation method is designed in which the auxiliary classifier generative adversarial network (ACGAN) and the target classification model are trained together. In this way, the special logits distribution and the labels of the generated trigger samples can be effectively ensured and verified.
{"title":"Secure Neural Network Watermarking Protocol Against Evidence Exposure Attack","authors":"Huixin Luo;Li Li;Xinpeng Zhang","doi":"10.1109/TMM.2025.3542975","DOIUrl":"https://doi.org/10.1109/TMM.2025.3542975","url":null,"abstract":"Trigger-based backdoor watermarking is an extensively utilized and effective method to safeguard the copyright of deep neural networks (DNNs), in which the trigger set could be taken as the key of the watermark. However, during the verification stage, there is a risk that the trigger set could be leaked and exposed to adversaries. If this occurs, the adversaries might apply this leaked trigger set to claim ownership of the model, posing significant copyright issues for the watermarked DNN. To address such an evidence exposure problem, a secure neural network watermarking protocol is put forward in this paper. In the proposed protocol, the trigger set is not fixed, once the trigger is utilized for verification, it is invalid and cannot be used for verification in the future. As a result, even if the trigger set is leaked during the verification process and obtained by the attacker, they cannot use it for copyright verification since it is invalid. To assist the protocol, a trigger set generation method is designed, in which the auxiliary classifier generative adversarial network (ACGAN) and the target classification model are trained together. The special logits distribution and the labels of the generated trigger samples can be ensured and verified effectively in this way. 
The performance of the trigger generation methods regarding effectiveness, fidelity, and robustness is verified by experiments, and the security analysis of the designed watermarking protocol is conducted.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"5563-5574"},"PeriodicalIF":9.7,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144914323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
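The protocol's central mechanism — a trigger becomes invalid once it has been used for verification, so an exposed (leaked) trigger cannot support a later ownership claim — can be sketched as a one-time registry. This is a hypothetical illustration of that bookkeeping only, not the paper's ACGAN-based trigger generation or its full protocol:

```python
import hashlib

class OneTimeTriggerRegistry:
    """Hypothetical sketch: each trigger sample may back a verification claim
    exactly once; after use its hash is recorded as spent, so an adversary
    replaying a leaked trigger is rejected."""

    def __init__(self, triggers):
        # Store only hashes of the valid trigger samples (bytes).
        self._valid = {hashlib.sha256(t).hexdigest() for t in triggers}
        self._spent = set()

    def verify(self, trigger: bytes) -> bool:
        h = hashlib.sha256(trigger).hexdigest()
        if h in self._valid and h not in self._spent:
            self._spent.add(h)  # invalidate: this trigger is now exposed
            return True         # fresh trigger: ownership claim accepted
        return False            # unknown or already-exposed trigger: rejected

reg = OneTimeTriggerRegistry([b"trigger-1", b"trigger-2"])
```

A real deployment would also need the model-side check that the trigger elicits the expected special logits distribution; the registry above only captures the freshness requirement.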
Journal
IEEE Transactions on Multimedia