Pub Date: 2024-08-14 | DOI: 10.1109/TMM.2024.3443637
Yuqi Jiang;Jing Li;Haidong Qin;Yanran Dai;Jing Liu;Guodong Zhang;Canbin Zhang;Tao Yang
We introduce GS-SFS, a method that uses a wide-baseline camera array for high-quality reconstruction of multiple human meshes in large-scale sports scenes. Traditional human reconstruction methods in sports scenes, such as Shape-from-Silhouette (SFS), struggle with sparse camera setups and small human targets, making it challenging to obtain complete and accurate human representations. Despite advances in differentiable rendering, including 3D Gaussian Splatting (3DGS), which can produce photorealistic novel-view renderings from dense inputs, accurately depicting surfaces and generating detailed meshes remain challenging. Our approach uniquely combines 3DGS's view synthesis with an optimized SFS method, thereby significantly enhancing the quality of multi-person mesh reconstruction in large-scale sports scenes. Specifically, we introduce body shape priors, including human surface point clouds extracted through SFS and human silhouettes, to constrain 3DGS to a more accurate representation of the human body only. We then develop an improved SFS-based mesh reconstruction method that adds viewpoints through 3DGS and recovers a more accurate surface, yielding higher-quality reconstruction models. We implement a high-density scene resampling strategy based on spherical sampling of human bounding boxes and render new perspectives using 3D Gaussian Splatting to create precise and dense multi-view human silhouettes. During mesh reconstruction, we integrate the human body's 2D Signed Distance Function (SDF) into the computation of the SFS implicit surface field, resulting in smoother and more accurate surfaces. Moreover, we enhance mesh texture mapping by blending original and rendered images with different weights, preserving high-quality textures while compensating for missing details.
The experimental results from real basketball game scenarios demonstrate the significant improvements of our approach for multiple human body model reconstruction in complex sports settings.
"GS-SFS: Joint Gaussian Splatting and Shape-From-Silhouette for Multiple Human Reconstruction in Large-Scale Sports Scenes." IEEE Transactions on Multimedia, vol. 26, pp. 11095-11110.
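The entry above builds on classic shape-from-silhouette. As a generic illustration of the idea (not the paper's optimized SFS pipeline), the sketch below carves a visual hull from binary silhouette masks, assuming known 3×4 camera projection matrices; all names and shapes are illustrative:

```python
import numpy as np

def carve_visual_hull(grid_points, silhouettes, projections):
    """Keep only the points whose projection lands inside every silhouette.

    grid_points : (N, 3) candidate voxel centers
    silhouettes : list of (H, W) boolean masks, one per camera
    projections : list of (3, 4) camera projection matrices
    """
    homog = np.hstack([grid_points, np.ones((len(grid_points), 1))])  # (N, 4)
    inside = np.ones(len(grid_points), dtype=bool)
    for mask, P in zip(silhouettes, projections):
        uvw = homog @ P.T                      # (N, 3) homogeneous image coords
        uv = uvw[:, :2] / uvw[:, 2:3]          # perspective divide
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        h, w = mask.shape
        visible = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(grid_points), dtype=bool)
        hit[visible] = mask[v[visible], u[visible]]
        inside &= hit                          # carve away points missing any mask
    return grid_points[inside]
```

The paper's contribution is precisely to densify the silhouette set (via 3DGS-rendered views) before a step like this, so the intersection is much tighter than with the sparse original cameras.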
Deep learning models for time series analysis often require large-scale labeled datasets for training. However, acquiring such datasets is cost-intensive and challenging, particularly for individual institutions. To overcome this challenge and address concerns about data confidentiality among institutions, federated learning (FL) serves as a viable solution by offering a decentralized learning framework. However, the datasets collected by each institution often suffer from imbalance and may not adhere to uniform protocols, leading to diverse data distributions. To address this problem, we design a global model to approximate the global data distribution of all participating clients, then transfer it to local clients as guidance in the training phase. Discrepancies between the approximate distribution and the actual distribution, however, introduce uncertainty into the predicted results. Moreover, the diverse data distributions among clients within the FL framework, combined with the inherent lack of reliability and interpretability in deep learning models, further amplify this uncertainty. To address these issues, we propose an uncertainty calibration method based on Bayesian deep learning techniques, which captures uncertainty by learning a fidelity transformation to reconstruct the output of time series regression and classification tasks, utilizing deterministic pre-trained models. Extensive experiments on a regression dataset (C-MAPSS) and classification datasets (ESR, Sleep-EDF, HAR, and FD) in Independent and Identically Distributed (IID) and non-IID settings show that our approach effectively calibrates uncertainty within the FL framework and facilitates better generalization in both regression and classification, achieving state-of-the-art performance.
"Bayesian Uncertainty Calibration for Federated Time Series Analysis." Chao Cai; Weide Liu; Xue Xia; Zhenghua Chen; Yuming Fang. Pub Date: 2024-08-14 | DOI: 10.1109/TMM.2024.3443627. IEEE Transactions on Multimedia, vol. 26, pp. 11151-11163.
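The paper's fidelity-transformation calibration is not reproduced here. As background for the decentralized framework it builds on, the sketch below shows standard FedAvg-style aggregation, where a server averages client parameters weighted by local dataset size; the function name and data layout are illustrative assumptions:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Aggregate per-client model parameters, weighted by local dataset size.

    client_weights : list (one entry per client) of lists of np.ndarray layers
    client_sizes   : number of training samples held by each client
    """
    total = float(sum(client_sizes))
    coeffs = [n / total for n in client_sizes]      # larger clients count more
    n_layers = len(client_weights[0])
    return [
        sum(c * w[layer] for c, w in zip(coeffs, client_weights))
        for layer in range(n_layers)
    ]
```

Under non-IID data, this size-weighted average can drift from any single client's local optimum, which is one source of the prediction uncertainty the paper sets out to calibrate.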
Pub Date: 2024-08-14 | DOI: 10.1109/TMM.2024.3443634
Mao Cui;Yun Zhang;Chunling Fan;Raouf Hamzaoui;Qinglan Li
Point Cloud Quality Assessment (PCQA) plays an essential role in optimizing point cloud acquisition, encoding, transmission, and rendering for human-centric visual media applications. In this paper, we propose an objective PCQA model using Complementary Features from 3D and 2D spaces, called CF-PCQA, to measure the visual quality of colored point clouds. First, we develop four effective features in 3D space to represent the perceptual properties of colored point clouds, which include curvature, kurtosis, luminance distance and hue features of points in 3D space. Second, we project the 3D point cloud onto 2D planes using patch projection and extract a structural similarity feature of the projected 2D images in the spatial domain, as well as a sub-band similarity feature in the wavelet domain. Finally, we propose a feature selection and a learning model to fuse high dimensional features and predict the visual quality of the colored point clouds. Extensive experimental results show that the Pearson Linear Correlation Coefficients (PLCCs) of the proposed CF-PCQA were 0.9117, 0.9005, 0.9340 and 0.9826 on the SIAT-PCQD, SJTU-PCQA, WPC2.0 and ICIP2020 datasets, respectively. Moreover, statistical significance tests demonstrate that the CF-PCQA significantly outperforms the state-of-the-art PCQA benchmark schemes on the four datasets.
"Colored Point Cloud Quality Assessment Using Complementary Features in 3D and 2D Spaces." IEEE Transactions on Multimedia, vol. 26, pp. 11111-11125.
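The PLCC figures quoted above measure linear agreement between predicted quality scores and subjective mean opinion scores (MOS). A minimal sketch of the metric (PCQA benchmarks often first fit a logistic mapping to the predictions, which is omitted here):

```python
import numpy as np

def plcc(predicted, mos):
    """Pearson Linear Correlation Coefficient between model scores and MOS."""
    p = np.asarray(predicted, dtype=float)
    m = np.asarray(mos, dtype=float)
    p = p - p.mean()                        # center both score vectors
    m = m - m.mean()
    return float((p * m).sum() / np.sqrt((p ** 2).sum() * (m ** 2).sum()))
```

A PLCC of 1.0 means the model's scores are a perfect increasing linear function of the MOS, so values such as 0.9826 indicate near-linear agreement.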
Pub Date: 2024-08-01 | Epub Date: 2023-08-29 | DOI: 10.1007/s11571-023-09997-1
Alexandre Aksenov, Malo Renaud-D'Ambra, Vitaly Volpert, Anne Beuter
In the present study, we investigated traveling waves induced by transcranial alternating current stimulation in the alpha frequency band of healthy subjects. Electroencephalographic data were recorded in 12 healthy subjects before, during, and after phase-shifted stimulation with a device combining both electroencephalographic and stimulation capacities. In addition, we analyzed the results of numerical simulations and compared them to the results of identical analysis on real EEG data. The results of numerical simulations indicate that imposed transcranial alternating current stimulation induces a rotating electric field. The direction of waves induced by stimulation was observed more often during at least 30 s after the end of stimulation, demonstrating the presence of aftereffects of the stimulation. Results suggest that the proposed approach could be used to modulate the interaction between distant areas of the cortex. Non-invasive transcranial alternating current stimulation can be used to facilitate the propagation of circulating waves at a particular frequency and in a controlled direction. The results presented open new opportunities for developing innovative and personalized transcranial alternating current stimulation protocols to treat various neurological disorders.
Supplementary information: The online version contains supplementary material available at 10.1007/s11571-023-09997-1.
"Phase-shifted tACS can modulate cortical alpha waves in human subjects." pp. 1575-1592. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11297852/pdf/
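As a rough illustration of the stimulation idea above (not the authors' device or protocol), phase-shifted tACS drives each electrode with the same alpha-band sinusoid at a progressively lagged phase, so the peak sweeps across the array like a traveling wave. All parameter values below are hypothetical:

```python
import numpy as np

def phase_shifted_tacs(n_electrodes, freq_hz=10.0, amp_ma=1.0,
                       duration_s=1.0, fs=1000.0):
    """Sinusoidal stimulation currents with an equal phase lag between
    adjacent electrodes, spanning one full cycle across the array.

    Returns an (n_electrodes, n_samples) array of currents in mA.
    """
    t = np.arange(0.0, duration_s, 1.0 / fs)
    phases = 2 * np.pi * np.arange(n_electrodes) / n_electrodes
    return np.stack([amp_ma * np.sin(2 * np.pi * freq_hz * t - ph)
                     for ph in phases])
```

With four electrodes at 10 Hz, each channel lags its neighbor by a quarter period (25 ms), which is the kind of controlled phase gradient the study uses to bias wave propagation direction.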
Pub Date: 2024-07-31 | DOI: 10.1109/TMM.2024.3384680
Wengang Zhou;Jiajun Deng;Niculae Sebe;Qi Tian;Alan L. Yuille;Concetto Spampinato;Zakia Hammal
In the ever-evolving domain of multimedia, the significance of multi-modality understanding cannot be overstated. As multimedia content becomes increasingly sophisticated and ubiquitous, the ability to effectively combine and analyze the diverse information from different types of data, such as text, audio, image, video and point clouds, will be paramount in pushing the boundaries of what technology can achieve in understanding and interacting with the world around us. Accordingly, multi-modality understanding has attracted a tremendous amount of research, establishing itself as an emerging topic. Pre-trained models, in particular, have revolutionized this field, providing a way to leverage vast amounts of data without task-specific annotation to facilitate various downstream tasks.
"Guest Editorial Introduction to the Issue on Pre-Trained Models for Multi-Modality Understanding." IEEE Transactions on Multimedia, vol. 26, pp. 8291-8296.
Pub Date: 2024-07-19 | DOI: 10.1109/TMM.2024.3396272
Xun Jiang;Xing Xu;Zailei Zhou;Yang Yang;Fumin Shen;Heng Tao Shen
Given an untrimmed video and a text query, Video Moment Retrieval (VMR) aims at retrieving the specific moment whose video content is semantically related to the text query. Conventional VMR methods rely on video-text paired data or specific temporal annotations for each target event. However, the subjectivity and time-consuming nature of the labeling process limit their practicality in multimedia applications. To address this issue, researchers recently proposed a Zero-Shot Learning setting for VMR (ZS-VMR) that trains VMR models without manual supervision signals, thereby reducing the data cost. In this paper, we tackle the challenging ZS-VMR problem with Angular Reconstructive Text embeddings (ART)