Latest Publications in IEEE Transactions on Multimedia
Learning Visual Conditioning Tokens to Correct Domain Shift for Fully Test-time Adaptation
IF 7.3 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-14 | DOI: 10.1109/tmm.2024.3443633
Yushun Tang, Shuoshuo Chen, Zhehan Kan, Yi Zhang, Qinghai Guo, Zhihai He
Citations: 0
Bayesian Uncertainty Calibration for Federated Time Series Analysis
IF 7.3 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-14 | DOI: 10.1109/tmm.2024.3443627
Chao Cai, Weide Liu, Xue Xia, Zhenghua Chen, Yuming Fang
Citations: 0
Colored Point Cloud Quality Assessment Using Complementary Features in 3D and 2D Spaces
IF 7.3 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-14 | DOI: 10.1109/tmm.2024.3443634
Mao Cui, Yun Zhang, Chunling Fan, Raouf Hamzaoui, Qinglan Li
Citations: 0
Guest Editorial Introduction to the Issue on Pre-Trained Models for Multi-Modality Understanding
IF 8.4 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-31 | DOI: 10.1109/TMM.2024.3384680
Wengang Zhou;Jiajun Deng;Niculae Sebe;Qi Tian;Alan L. Yuille;Concetto Spampinato;Zakia Hammal
In the ever-evolving domain of multimedia, the significance of multi-modality understanding cannot be overstated. As multimedia content becomes increasingly sophisticated and ubiquitous, the ability to effectively combine and analyze the diverse information from different types of data, such as text, audio, image, video and point clouds, will be paramount in pushing the boundaries of what technology can achieve in understanding and interacting with the world around us. Accordingly, multi-modality understanding has attracted a tremendous amount of research, establishing itself as an emerging topic. Pre-trained models, in particular, have revolutionized this field, providing a way to leverage vast amounts of data without task-specific annotation to facilitate various downstream tasks.
Citations: 0
Adaptive Multi-scale Degradation-Based Attack for Boosting the Adversarial Transferability
IF 7.3 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-23 | DOI: 10.1109/tmm.2024.3428311
Ran Ran, Jiwei Wei, Chaoning Zhang, Guoqing Wang, Yang Yang, Heng Tao Shen
Citations: 0
Zero-Shot Video Moment Retrieval With Angular Reconstructive Text Embeddings
IF 8.4 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-19 | DOI: 10.1109/TMM.2024.3396272
Xun Jiang;Xing Xu;Zailei Zhou;Yang Yang;Fumin Shen;Heng Tao Shen
Given an untrimmed video and a text query, Video Moment Retrieval (VMR) aims at retrieving a specific moment where the video content is semantically related to the text query. Conventional VMR methods rely on video-text paired data or specific temporal annotations for each target event. However, the subjectivity and time-consuming nature of the labeling process limit their practicality in multimedia applications. To address this issue, researchers recently proposed a Zero-Shot Learning setting for VMR (ZS-VMR) that trains VMR models without manual supervision signals, thereby reducing the data cost. In this paper, we tackle the challenging ZS-VMR problem with Angular Reconstructive Text embeddings (ART), generalizing the image-text matching pre-trained model CLIP to the VMR task. Specifically, assuming that visual embeddings are close to their semantically related text embeddings in angular space, our ART method generates pseudo-text embeddings of video event proposals through the hypersphere of CLIP. Moreover, to address the temporal nature of videos, we also design local multimodal fusion learning to narrow the gaps between image-text matching and video-text matching. Our experimental results on two widely used VMR benchmarks, Charades-STA and ActivityNet-Captions, show that our method outperforms current state-of-the-art ZS-VMR methods. It also achieves competitive performance compared to recent weakly-supervised VMR methods.
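The core assumption in the abstract above — that semantically related text embeddings lie within a small angular distance of a visual embedding on CLIP's unit hypersphere — can be illustrated with a minimal sketch. This is not the paper's implementation: the sampling scheme and the `max_angle_rad` budget are illustrative assumptions, and plain NumPy stands in for CLIP embeddings.

```python
import numpy as np

def pseudo_text_embedding(visual_emb, max_angle_rad=0.3, rng=None):
    """Sample a unit vector within max_angle_rad of visual_emb in angular space.

    Hypothetical stand-in for generating a pseudo-text embedding from a
    visual embedding on the unit hypersphere; max_angle_rad is an
    illustrative hyperparameter, not a value from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = visual_emb / np.linalg.norm(visual_emb)   # project onto the unit hypersphere
    # Build a random direction orthogonal to v (Gram-Schmidt on a Gaussian draw).
    r = rng.standard_normal(v.shape)
    u = r - np.dot(r, v) * v
    u /= np.linalg.norm(u)
    # Rotate v toward u by a random angle within the budget; result stays unit-norm.
    theta = rng.uniform(0.0, max_angle_rad)
    return np.cos(theta) * v + np.sin(theta) * u

def angular_distance(a, b):
    """Angle in radians between two embeddings."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))
```

For example, sampling several pseudo-text embeddings for a video event proposal's visual embedding and checking `angular_distance` against the budget shows that every sample lies in the intended angular neighborhood while still varying in direction.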
Citations: 0
Prototype-Decomposed Knowledge Distillation for Learning Generalized Federated Representation
IF 7.3 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-16 | DOI: 10.1109/tmm.2024.3428352
Aming Wu, Jiaping Yu, Yuxuan Wang, Cheng Deng
Citations: 0
CenterFormer: A Novel Cluster Center Enhanced Transformer for Unconstrained Dental Plaque Segmentation
IF 7.3 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-16 | DOI: 10.1109/tmm.2024.3428349
Wenfeng Song, Xuan Wang, Yuting Guo, Shuai Li, Bin Xia, Aimin Hao
Citations: 0
Hierarchical Aggregated Graph Neural Network for Skeleton-based Action Recognition
IF 7.3 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-15 | DOI: 10.1109/tmm.2024.3428330
Pei Geng, Xuequan Lu, Wanqing Li, Lei Lyu
Citations: 0
LMEye: An Interactive Perception Network for Large Language Models
IF 7.3 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-15 | DOI: 10.1109/tmm.2024.3428317
Yunxin Li, Baotian Hu, Xinyu Chen, Lin Ma, Yong Xu, Min Zhang
Citations: 0