
Latest Publications in IEEE Transactions on Multimedia

GS-SFS: Joint Gaussian Splatting and Shape-From-Silhouette for Multiple Human Reconstruction in Large-Scale Sports Scenes
IF 8.4 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-14 | DOI: 10.1109/TMM.2024.3443637
Yuqi Jiang;Jing Li;Haidong Qin;Yanran Dai;Jing Liu;Guodong Zhang;Canbin Zhang;Tao Yang
We introduce GS-SFS, a method that utilizes a camera array with wide baselines for high-quality multiple human mesh reconstruction in large-scale sports scenes. Traditional human reconstruction methods in sports scenes, such as Shape-from-Silhouette (SFS), struggle with sparse camera setups and small human targets, making it challenging to obtain complete and accurate human representations. Despite advances in differentiable rendering, including 3D Gaussian Splatting (3DGS), which can produce photorealistic novel-view renderings from dense inputs, accurate depiction of surfaces and generation of detailed meshes remain challenging. Our approach uniquely combines 3DGS's view synthesis with an optimized SFS method, thereby significantly enhancing the quality of multi-person mesh reconstruction in large-scale sports scenes. Specifically, we introduce body shape priors, including human surface point clouds extracted through SFS and human silhouettes, to constrain 3DGS to a more accurate representation of the human body alone. Then, we develop an improved SFS-based mesh reconstruction method, mainly by adding viewpoints through 3DGS and obtaining a more accurate surface to achieve higher-quality reconstruction models. We implement a high-density scene resampling strategy based on spherical sampling of human bounding boxes and render new perspectives using 3D Gaussian Splatting to create precise and dense multi-view human silhouettes. During mesh reconstruction, we integrate the human body's 2D Signed Distance Function (SDF) into the computation of the SFS implicit surface field, resulting in smoother and more accurate surfaces. Moreover, we enhance mesh texture mapping by blending original and rendered images with different weights, preserving high-quality textures while compensating for missing details. Experimental results from real basketball game scenarios demonstrate the significant improvements of our approach for multiple human body model reconstruction in complex sports settings.
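To make the spherical resampling step concrete, here is a minimal sketch of sampling virtual camera poses on a sphere around a human bounding box, from which additional silhouettes could then be rendered with 3DGS. The Fibonacci lattice, view count, radius scale, and z-up convention are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def look_at(cam_pos, target, up=np.array([0.0, 0.0, 1.0])):
    """World-to-camera rotation for a camera at cam_pos looking at target."""
    forward = target - cam_pos
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up)          # degenerates if forward is parallel to up
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)
    return np.stack([right, true_up, -forward])  # rows = camera axes in world coords

def spherical_camera_samples(bbox_min, bbox_max, n_views=64, radius_scale=2.5):
    """Sample near-uniform virtual viewpoints on a sphere enclosing a human
    bounding box (Fibonacci lattice); returns (camera center, rotation) pairs."""
    center = 0.5 * (bbox_min + bbox_max)
    radius = radius_scale * 0.5 * np.linalg.norm(bbox_max - bbox_min)
    golden = np.pi * (3.0 - np.sqrt(5.0))
    views = []
    for i in range(n_views):
        z = 1.0 - 2.0 * (i + 0.5) / n_views            # height in (-1, 1)
        r = np.sqrt(max(0.0, 1.0 - z * z))
        theta = golden * i
        direction = np.array([r * np.cos(theta), r * np.sin(theta), z])
        cam_pos = center + radius * direction
        views.append((cam_pos, look_at(cam_pos, center)))
    return views

# Example: 64 virtual cameras around a roughly 1.8 m-tall player.
cameras = spherical_camera_samples(np.zeros(3), np.array([0.6, 0.6, 1.8]))
```

Each sampled pose would then be handed to the 3DGS renderer to produce one more silhouette for the SFS stage.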
Citations: 0
Learning Visual Conditioning Tokens to Correct Domain Shift for Fully Test-time Adaptation
IF 7.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-14 | DOI: 10.1109/tmm.2024.3443633
Yushun Tang, Shuoshuo Chen, Zhehan Kan, Yi Zhang, Qinghai Guo, Zhihai He
{"title":"Learning Visual Conditioning Tokens to Correct Domain Shift for Fully Test-time Adaptation","authors":"Yushun Tang, Shuoshuo Chen, Zhehan Kan, Yi Zhang, Qinghai Guo, Zhihai He","doi":"10.1109/tmm.2024.3443633","DOIUrl":"https://doi.org/10.1109/tmm.2024.3443633","url":null,"abstract":"","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"16 1","pages":""},"PeriodicalIF":7.3,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142178717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Bayesian Uncertainty Calibration for Federated Time Series Analysis
IF 8.4 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-14 | DOI: 10.1109/TMM.2024.3443627
Chao Cai;Weide Liu;Xue Xia;Zhenghua Chen;Yuming Fang
Deep learning models for time series analysis often require large-scale labeled datasets for training. However, acquiring such datasets is cost-intensive and challenging, particularly for individual institutions. To overcome this challenge, and to address concerns about data confidentiality among different institutions, federated learning (FL) serves as a viable solution by offering a decentralized learning framework. However, the datasets collected by each institution often suffer from imbalance and may not adhere to uniform protocols, leading to diverse data distributions. To address this problem, we design a global model to approximate the global data distribution of all participant clients, then transfer it to local clients as an induction in the training phase. Discrepancies between the approximate distribution and the actual distribution introduce uncertainty into the predicted results. Moreover, the diverse data distributions among clients within the FL framework, combined with the inherent lack of reliability and interpretability in deep learning models, further amplify the uncertainty of the prediction results. To address these issues, we propose an uncertainty calibration method based on Bayesian deep learning techniques, which captures uncertainty by learning a fidelity transformation to reconstruct the output of time series regression and classification tasks, utilizing deterministic pre-trained models. Extensive experiments on the regression dataset (C-MAPSS) and classification datasets (ESR, Sleep-EDF, HAR, and FD) in Independent and Identically Distributed (IID) and non-IID settings show that our approach effectively calibrates uncertainty within the FL framework and facilitates better generalization performance in both regression and classification tasks, achieving state-of-the-art performance.
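As a rough illustration of calibrating a deterministic pre-trained model with a learned Bayesian transformation, the sketch below uses an MC-dropout head that reconstructs the backbone's output and returns a predictive mean and variance. The head architecture, dropout rate, sample count, and the `backbone` name are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class BayesianFidelityHead(nn.Module):
    """Small MC-dropout head on top of a frozen, deterministic backbone.

    It learns to reconstruct the backbone's output (a stand-in for the
    fidelity transformation) and, by sampling with dropout kept active,
    yields a predictive mean and variance as an uncertainty estimate.
    """

    def __init__(self, dim_out, hidden=64, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_out, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, dim_out),
        )

    def forward(self, y_det):
        return self.net(y_det)

    @torch.no_grad()
    def predict(self, y_det, n_samples=50):
        self.train()  # keep dropout active at inference time (MC dropout)
        draws = torch.stack([self(y_det) for _ in range(n_samples)])
        return draws.mean(dim=0), draws.var(dim=0)

# Hypothetical usage with a frozen pre-trained regressor `backbone`:
#   y_det = backbone(x)                     # deterministic prediction
#   head = BayesianFidelityHead(dim_out=1)  # train it to reconstruct targets
#   mean, var = head.predict(y_det)         # calibrated output + uncertainty
```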
Citations: 0
Colored Point Cloud Quality Assessment Using Complementary Features in 3D and 2D Spaces
IF 8.4 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-14 | DOI: 10.1109/TMM.2024.3443634
Mao Cui;Yun Zhang;Chunling Fan;Raouf Hamzaoui;Qinglan Li
Point Cloud Quality Assessment (PCQA) plays an essential role in optimizing point cloud acquisition, encoding, transmission, and rendering for human-centric visual media applications. In this paper, we propose an objective PCQA model using Complementary Features from 3D and 2D spaces, called CF-PCQA, to measure the visual quality of colored point clouds. First, we develop four effective features in 3D space to represent the perceptual properties of colored point clouds, which include curvature, kurtosis, luminance distance and hue features of points in 3D space. Second, we project the 3D point cloud onto 2D planes using patch projection and extract a structural similarity feature of the projected 2D images in the spatial domain, as well as a sub-band similarity feature in the wavelet domain. Finally, we propose a feature selection and a learning model to fuse high dimensional features and predict the visual quality of the colored point clouds. Extensive experimental results show that the Pearson Linear Correlation Coefficients (PLCCs) of the proposed CF-PCQA were 0.9117, 0.9005, 0.9340 and 0.9826 on the SIAT-PCQD, SJTU-PCQA, WPC2.0 and ICIP2020 datasets, respectively. Moreover, statistical significance tests demonstrate that the CF-PCQA significantly outperforms the state-of-the-art PCQA benchmark schemes on the four datasets.
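As a small worked example around the reported metric, the sketch below computes the PLCC between a quality model's predictions and subjective scores, together with one representative 3D statistical feature (kurtosis of per-point luminance). The Rec. 601 luma weights and the toy score vectors are illustrative choices, not the paper's data.

```python
import numpy as np
from scipy.stats import kurtosis, pearsonr

def luminance_kurtosis(colors_rgb):
    """Excess kurtosis of per-point luminance for a colored point cloud
    (one example of a 3D statistical feature; Rec. 601 luma weights)."""
    luma = colors_rgb @ np.array([0.299, 0.587, 0.114])
    return kurtosis(luma)  # Fisher definition: a normal distribution gives 0

# PLCC between predicted quality scores and subjective scores (MOS).
predicted = np.array([3.1, 4.2, 2.5, 4.8, 3.9])  # toy values
mos = np.array([3.0, 4.5, 2.2, 4.9, 3.7])        # toy values
plcc, _ = pearsonr(predicted, mos)
print(f"PLCC = {plcc:.4f}")
```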
Citations: 0
Phase-shifted tACS can modulate cortical alpha waves in human subjects.
IF 3.1 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-01 | Epub Date: 2023-08-29 | DOI: 10.1007/s11571-023-09997-1
Alexandre Aksenov, Malo Renaud-D'Ambra, Vitaly Volpert, Anne Beuter

In the present study, we investigated traveling waves induced by transcranial alternating current stimulation in the alpha frequency band of healthy subjects. Electroencephalographic data were recorded in 12 healthy subjects before, during, and after phase-shifted stimulation with a device combining both electroencephalographic and stimulation capacities. In addition, we analyzed the results of numerical simulations and compared them to the results of identical analysis on real EEG data. The results of numerical simulations indicate that imposed transcranial alternating current stimulation induces a rotating electric field. The direction of waves induced by stimulation was observed more often during at least 30 s after the end of stimulation, demonstrating the presence of aftereffects of the stimulation. Results suggest that the proposed approach could be used to modulate the interaction between distant areas of the cortex. Non-invasive transcranial alternating current stimulation can be used to facilitate the propagation of circulating waves at a particular frequency and in a controlled direction. The results presented open new opportunities for developing innovative and personalized transcranial alternating current stimulation protocols to treat various neurological disorders.

Supplementary information: The online version contains supplementary material available at 10.1007/s11571-023-09997-1.
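For intuition about the stimulation pattern, here is a minimal sketch of phase-shifted alpha-band (10 Hz) sinusoids across an electrode array, the kind of montage that can induce a rotating or traveling field. Electrode count, amplitude, sampling rate, and the uniform phase spacing are assumptions for illustration, not the study's protocol.

```python
import numpy as np

def phase_shifted_tacs(n_electrodes=4, freq_hz=10.0, duration_s=2.0,
                       fs=1000, amp_ma=1.0):
    """Alpha-band sinusoids with a progressive phase shift across
    electrodes; returns an (n_electrodes, n_samples) array plus the
    time axis. Neighboring electrodes lag by 2*pi/n_electrodes."""
    t = np.arange(0.0, duration_s, 1.0 / fs)
    phases = 2 * np.pi * np.arange(n_electrodes) / n_electrodes
    signals = np.array([amp_ma * np.sin(2 * np.pi * freq_hz * t + p)
                        for p in phases])
    return signals, t

signals, t = phase_shifted_tacs()  # 4 electrodes, 10 Hz, 2 s at 1 kHz
```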

Citations: 0
Guest Editorial Introduction to the Issue on Pre-Trained Models for Multi-Modality Understanding
IF 8.4 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-31 | DOI: 10.1109/TMM.2024.3384680
Wengang Zhou;Jiajun Deng;Niculae Sebe;Qi Tian;Alan L. Yuille;Concetto Spampinato;Zakia Hammal
In the ever-evolving domain of multimedia, the significance of multi-modality understanding cannot be overstated. As multimedia content becomes increasingly sophisticated and ubiquitous, the ability to effectively combine and analyze the diverse information from different types of data, such as text, audio, image, video and point clouds, will be paramount in pushing the boundaries of what technology can achieve in understanding and interacting with the world around us. Accordingly, multi-modality understanding has attracted a tremendous amount of research, establishing itself as an emerging topic. Pre-trained models, in particular, have revolutionized this field, providing a way to leverage vast amounts of data without task-specific annotation to facilitate various downstream tasks.
Citations: 0
Adaptive Multi-scale Degradation-Based Attack for Boosting the Adversarial Transferability
IF 7.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-23 | DOI: 10.1109/tmm.2024.3428311
Ran Ran, Jiwei Wei, Chaoning Zhang, Guoqing Wang, Yang Yang, Heng Tao Shen
{"title":"Adaptive Multi-scale Degradation-Based Attack for Boosting the Adversarial Transferability","authors":"Ran Ran, Jiwei Wei, Chaoning Zhang, Guoqing Wang, Yang Yang, Heng Tao Shen","doi":"10.1109/tmm.2024.3428311","DOIUrl":"https://doi.org/10.1109/tmm.2024.3428311","url":null,"abstract":"","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"353 1","pages":""},"PeriodicalIF":7.3,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141778393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Zero-Shot Video Moment Retrieval With Angular Reconstructive Text Embeddings
IF 8.4 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-19 | DOI: 10.1109/TMM.2024.3396272
Xun Jiang;Xing Xu;Zailei Zhou;Yang Yang;Fumin Shen;Heng Tao Shen
Given an untrimmed video and a text query, Video Moment Retrieval (VMR) aims at retrieving a specific moment where the video content is semantically related to the text query. Conventional VMR methods rely on video-text paired data or specific temporal annotations for each target event. However, the subjectivity and time-consuming nature of the labeling process limit their practicality in multimedia applications. To address this issue, recently researchers proposed a Zero-Shot Learning setting for VMR (ZS-VMR) that trains VMR models without manual supervision signals, thereby reducing the data cost. In this paper, we tackle the challenging ZS-VMR problem with Angular Reconstructive Text embeddings (ART), generalizing the image-text matching pre-trained model CLIP to the VMR task. Specifically, assuming that visual embeddings are close to their semantically related text embeddings in angular space, our ART method generates pseudo-text embeddings of video event proposals through the hypersphere of CLIP. Moreover, to address the temporal nature of videos, we also design local multimodal fusion learning to narrow the gaps between image-text matching and video-text matching. Our experimental results on two widely used VMR benchmarks, Charades-STA and ActivityNet-Captions, show that our method outperforms current state-of-the-art ZS-VMR methods. It also achieves competitive performance compared to recent weakly-supervised VMR methods.
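To illustrate the angular-space intuition, the sketch below places embeddings on the unit hypersphere and scores a text query against a video proposal by angular distance. The mean-pooled "pseudo-text" embedding is only a placeholder for the paper's learned reconstruction, and the embedding dimension is assumed.

```python
import numpy as np

def unit(v):
    """L2-normalize embeddings onto the unit hypersphere."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def angular_distance(a, b):
    """Angle (radians) between unit embeddings; smaller means the pair
    is closer in the joint embedding space."""
    cos = np.clip(unit(a) @ unit(b).T, -1.0, 1.0)
    return np.arccos(cos)

# Toy stand-in: pool a proposal's frame embeddings and renormalize to get
# a pseudo-text embedding (the paper learns this reconstruction instead).
rng = np.random.default_rng(0)
frame_embs = unit(rng.standard_normal((8, 512)))   # 8 frames, assumed dim 512
pseudo_text = unit(frame_embs.mean(axis=0, keepdims=True))
query_emb = unit(rng.standard_normal((1, 512)))    # encoded text query
angle = angular_distance(query_emb, pseudo_text)[0, 0]
print(f"query-proposal angle: {angle:.3f} rad")
```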
Citations: 0
Prototype-Decomposed Knowledge Distillation for Learning Generalized Federated Representation
IF 7.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-16 | DOI: 10.1109/tmm.2024.3428352
Aming Wu, Jiaping Yu, Yuxuan Wang, Cheng Deng
{"title":"Prototype-Decomposed Knowledge Distillation for Learning Generalized Federated Representation","authors":"Aming Wu, Jiaping Yu, Yuxuan Wang, Cheng Deng","doi":"10.1109/tmm.2024.3428352","DOIUrl":"https://doi.org/10.1109/tmm.2024.3428352","url":null,"abstract":"","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"74 1","pages":""},"PeriodicalIF":7.3,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141720778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CenterFormer: A Novel Cluster Center Enhanced Transformer for Unconstrained Dental Plaque Segmentation
IF 7.3 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-16 | DOI: 10.1109/tmm.2024.3428349
Wenfeng Song, Xuan Wang, Yuting Guo, Shuai Li, Bin Xia, Aimin Hao
{"title":"CenterFormer: A Novel Cluster Center Enhanced Transformer for Unconstrained Dental Plaque Segmentation","authors":"Wenfeng Song, Xuan Wang, Yuting Guo, Shuai Li, Bin Xia, Aimin Hao","doi":"10.1109/tmm.2024.3428349","DOIUrl":"https://doi.org/10.1109/tmm.2024.3428349","url":null,"abstract":"","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"38 1","pages":""},"PeriodicalIF":7.3,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141720777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0