
Journal of Visual Communication and Image Representation: Latest Publications

Analysis and evaluation of improved algorithm for aerial image homogenization processing
IF 2.6 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-30 | DOI: 10.1016/j.jvcir.2025.104403
Zhihua Zhang , Hao Yuan , Xinxiu Zhang , Dongdong Feng , Yikun Li , Shuwen Yang
Imaging time, lighting changes, sensor lens angle and ground characteristics all contribute to significant discrepancies in the brightness and colour distribution of aerial images. These discrepancies strongly affect the production of digital orthophoto maps (DOM) and complicate interpretation, feature extraction, and other downstream processes. Existing methods suffer from subjective manual intervention and offer little overall control over the processing result, so an improved method based on the Mask dodging algorithm is proposed after a comprehensive evaluation of several aerial image dodging algorithms. The aerial image is first processed with the Mask dodging (uniform-light) algorithm. Because the tonal contrast remains unevenly distributed, a gradient stretching algorithm on the image histogram is then applied to stretch the aerial and drone images. Guided by statistics, the image quality evaluation index and the contrast stretch are improved through parameter selection and the Linear 2% stretching algorithm. Comparative analysis shows that the improved histogram-based gradient stretching algorithm keeps the overall brightness and texture contrast of aerial images consistent while significantly improving image clarity.
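As a rough illustration of the "Linear 2%" contrast step mentioned above, the sketch below applies a generic percentile-based linear stretch to an image; the 2% clip fraction and the per-band handling are assumptions, since the paper's exact parameters are not given in the abstract.

```python
import numpy as np

def linear_percent_stretch(img: np.ndarray, clip: float = 0.02) -> np.ndarray:
    """Clip `clip` of the pixels at each histogram tail, then stretch to [0, 255]."""
    img = img.astype(np.float32)
    bands = img[..., None] if img.ndim == 2 else img   # treat grey and colour alike
    out = np.empty_like(bands)
    for b in range(bands.shape[-1]):
        band = bands[..., b]
        lo, hi = np.percentile(band, [100 * clip, 100 * (1 - clip)])
        if hi <= lo:                                    # flat band: nothing to stretch
            out[..., b] = band
            continue
        out[..., b] = np.clip((band - lo) / (hi - lo), 0.0, 1.0) * 255.0
    return (out[..., 0] if img.ndim == 2 else out).astype(np.uint8)
```

A smaller clip fraction preserves more of the original dynamic range; 2% is the conventional default implied by the algorithm's name.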
{"title":"Analysis and evaluation of improved algorithm for aerial image homogenization processing","authors":"Zhihua Zhang ,&nbsp;Hao Yuan ,&nbsp;Xinxiu Zhang ,&nbsp;Dongdong Feng ,&nbsp;Yikun Li ,&nbsp;Shuwen Yang","doi":"10.1016/j.jvcir.2025.104403","DOIUrl":"10.1016/j.jvcir.2025.104403","url":null,"abstract":"<div><div>The imaging time, light change, sensor lens angle and ground characteristics have all contributed to significant discrepancies in the brightness and colour distribution of the aerial images. These phenomena will have a significant impact on the production of DOM, and will present challenges in the interpretation, transliteration, feature extraction, and other related processes. In addressing these issues, the current methodologies exhibit shortcomings in terms of artificial subjectivity and an inability to exert comprehensive control over the processing effects, an improved method based on Mask dodging algorithm is proposed after a comprehensive evaluation of some different aerial image dodging algorithms. The aerial image is subjected to the Mask uniform light algorithm for processing. However, due to the uneven distribution of tonal contrast, a gradient stretching algorithm for the image histogram is employed to stretch the aerial image and drone image. Based on statistics, we are aiming to enhance the evaluation index value of image quality and improve contrast stretching through parameter selection and the Linear2% stretching processing algorithm. The comparative analysis reveals that the improved gradient stretching algorithm, based on image histogram, effectively ensures consistent overall brightness and texture contrast in aerial images while significantly improve the clarity of images.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"108 ","pages":"Article 104403"},"PeriodicalIF":2.6,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143463804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Visible-Infrared person re-identification algorithm based on skeleton Insight Criss-Cross network
IF 2.6 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-28 | DOI: 10.1016/j.jvcir.2025.104395
Pan Jiaxing , Zhang Baohua , Zhang Jiale , Gu Yu , Shan Chongrui , Sun Yanxia , Wu Dongyang
There are significant inter-class differences in the cross-modal feature space. If pedestrian skeleton information is used as the discrimination basis for cross-modal person re-identification, a mismatch between skeleton features and identity (ID) attributes is inevitable. To address this problem, this paper proposes a novel Skeleton Insight Criss-Cross Network (SI-CCN), which consists of a Skeleton Insight Module (SIM) and a Criss-Cross Module (CCM). The SIM uses a skeleton hierarchy to extract key skeleton information from the pedestrian's limb regions, obtains pixel-level features of the skeleton key points, and uses those key points as graph nodes to construct the pedestrian's skeletal posture structure. As a result, the SIM not only captures the spatial information of each body part accurately but also preserves the relative positions of the skeleton key points, forming a complete skeleton structure. The CCM jointly optimizes high-dimensional skeleton features and low-dimensional identity features through a cross-learning mechanism: to capture diverse skeleton postures effectively, the attention assigned to the two feature types during feature extraction is adjusted dynamically while identity details are integrated, improving the consistency of cross-modal features. Experiments on the SYSU-MM01 and RegDB cross-modal person re-identification datasets show that SI-CCN achieves Rank-1/mAP of 81.94%/76.92% on SYSU-MM01 and 95.49%/95.67% on RegDB, outperforming recent representative methods.
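To make the "skeleton key points as graph nodes" idea concrete, the sketch below builds a normalized adjacency matrix over detected joints; the joint list and limb connections are hypothetical placeholders, not the skeleton topology actually used by SI-CCN.

```python
import numpy as np

# Hypothetical joint ordering and limb connections; the actual skeleton
# definition used by SI-CCN is not specified in the abstract.
JOINTS = ["head", "neck", "r_shoulder", "r_elbow", "r_wrist",
          "l_shoulder", "l_elbow", "l_wrist", "hip",
          "r_knee", "r_ankle", "l_knee", "l_ankle"]
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7),
         (1, 8), (8, 9), (9, 10), (8, 11), (11, 12)]

def skeleton_graph(keypoints: np.ndarray):
    """keypoints: (J, 2) joint pixel coordinates -> (node features, normalized adjacency)."""
    n = len(JOINTS)
    adj = np.zeros((n, n), dtype=np.float32)
    for i, j in EDGES:
        adj[i, j] = adj[j, i] = 1.0
    adj += np.eye(n, dtype=np.float32)       # self-loops keep each joint's own feature
    adj /= adj.sum(axis=1, keepdims=True)    # row-normalize for graph propagation
    return keypoints.astype(np.float32), adj
```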
{"title":"A Visible-Infrared person re-identification algorithm based on skeleton Insight Criss-Cross network","authors":"Pan Jiaxing ,&nbsp;Zhang Baohua ,&nbsp;Zhang Jiale ,&nbsp;Gu Yu ,&nbsp;Shan Chongrui ,&nbsp;Sun Yanxia ,&nbsp;Wu Dongyang","doi":"10.1016/j.jvcir.2025.104395","DOIUrl":"10.1016/j.jvcir.2025.104395","url":null,"abstract":"<div><div>There are significant inter-class differences in the cross-modal feature space. If the pedestrian skeleton information is used as the discrimination basis for cross-modal person re-identification, the problem of mismatch between the skeleton features and the ID attributes is inevitable. In order to solve the above problems, this paper proposes a novel Skeleton Insight Criss-Cross Network (SI-CCN), which consists of a Skeleton Insight Module (SIM) and a Criss-Cross Module (CCM). The former uses the skeleton hierarchical mechanism to extract the key skeleton information of the pedestrian limb area, obtain the characteristics of the skeleton key points at the pixel level, and the skeleton key points are used as the graph nodes to construct the skeleton posture structure of the pedestrian. And as a result, the SIM module can not only accurately capture the spatial information of various parts of the pedestrian, but also maintain the relative positional relationship between the key points of the skeleton to form a complete skeleton structure. The latter cooperatively optimizes the characteristics of high-dimensional skeleton and low-dimensional identity identification by using a cross-learning mechanism. In order to effectively capture the diverse skeleton posture, the attention distribution of the two in the feature extraction process is dynamically adjusted to integrate identity details at the same time, and the consistency of cross-modal features is improved. The experiments on the two cross-modal person re-identification data sets of SYSU-MM01 and RegDB show that the Rank-1 and mAP of the SI-CCN on the SYSU-MM01 data set are 81.94% and 76.92%, respectively, and the Rank-1 and mAP on the RegDB data set are 95.49% and 95.67%, respectively. The proposed method has better performance than that of the recent representative methods.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104395"},"PeriodicalIF":2.6,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Visual object tracking based on adaptive deblurring integrating motion blur perception
IF 2.6 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-27 | DOI: 10.1016/j.jvcir.2025.104388
Lifan Sun , Baocheng Gong , Jianfeng Liu , Dan Gao
Visual object tracking in motion-blurred scenes is crucial for applications such as intelligent video surveillance, traffic monitoring, robotic vision navigation, and automated driving. Existing tracking algorithms primarily cater to sharp images and degrade significantly in motion-blurred scenes, where image degradation and decreased contrast compromise feature extraction quality. This paper proposes a visual object tracking algorithm, SiamADP, based on adaptive deblurring that integrates motion blur perception. First, the algorithm employs a blur perception mechanism to detect whether the input image is severely blurred. An effective motion blur removal network is then used to generate blur-free images, facilitating the extraction of rich and useful feature information. Given the scarcity of motion blur datasets for object tracking evaluation, four test datasets are proposed: three synthetic datasets and a manually collected and labeled real motion-blur dataset. Comparative experiments with existing trackers demonstrate the effectiveness and robustness of SiamADP in motion blur scenarios.
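A minimal sketch of the gating logic described above, assuming the widely used variance-of-Laplacian sharpness heuristic stands in for the paper's blur perception mechanism, and `deblur_net` is a hypothetical deblurring model:

```python
import cv2
import numpy as np

def is_severely_blurred(frame_bgr: np.ndarray, threshold: float = 100.0) -> bool:
    """Sharpness test via variance of the Laplacian (assumed stand-in for the
    paper's blur perception mechanism); low variance suggests heavy blur."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var() < threshold

def preprocess_for_tracking(frame_bgr: np.ndarray, deblur_net=None) -> np.ndarray:
    """Deblur only frames judged blurred; pass sharp frames through untouched."""
    if deblur_net is not None and is_severely_blurred(frame_bgr):
        return deblur_net(frame_bgr)   # hypothetical motion blur removal network
    return frame_bgr
```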
{"title":"Visual object tracking based on adaptive deblurring integrating motion blur perception","authors":"Lifan Sun ,&nbsp;Baocheng Gong ,&nbsp;Jianfeng Liu ,&nbsp;Dan Gao","doi":"10.1016/j.jvcir.2025.104388","DOIUrl":"10.1016/j.jvcir.2025.104388","url":null,"abstract":"<div><div>Visual object tracking in motion-blurred scenes is crucial for applications such as traffic monitoring and navigation, including intelligent video surveillance, robotic vision navigation, and automated driving. Existing tracking algorithms primarily cater to sharp images, exhibiting significant performance degradation in motion-blurred scenes. Image degradation and decreased contrast resulting from motion blur compromise feature extraction quality. This paper proposes a visual object tracking algorithm, SiamADP, based on adaptive deblurring and integrating motion blur perception. First, the proposed algorithm employs a blur perception mechanism to detect whether the input image is severely blurred. After that, an effective motion blur removal network is used to generate blur-free images, facilitating rich and useful feature information extraction. Given the scarcity of motion blur datasets for object tracking evaluation, four test datasets are proposed: three synthetic datasets and a manually collected and labeled real motion blur dataset. Comparative experiments with existing trackers demonstrate the effectiveness and robustness of SiamADP in motion blur scenarios, validating its performance.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104388"},"PeriodicalIF":2.6,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Skeleton-guided and supervised learning of hybrid network for multi-modal action recognition
IF 2.6 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-25 | DOI: 10.1016/j.jvcir.2025.104389
Ziliang Ren , Li Luo , Yong Qin
With the wide application of multi-modal data in computer vision classification tasks, multi-modal action recognition has become a high-profile research area. However, fully exploiting the complementarity between modalities and extracting high-level semantic features closely related to actions remains challenging. In this paper, we employ a skeleton alignment mechanism and design a sampling and skeleton-guided cropping module (SSGCM) that crops redundant background information from RGB and depth sequences, thereby enhancing the representation of action-related RGB and depth information. In addition, we transform the skeleton information into a set of pseudo-images by mapping and normalizing the skeleton data into a matrix, which serves as a supervised information stream for extracting complementary multi-modal features. Furthermore, we propose a multi-modal supervised learning framework based on a hybrid network that learns compensatory features from the RGB, depth and skeleton modalities to improve multi-modal action recognition. We evaluate our framework on three benchmark multi-modal datasets: NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD. The results show that our method achieves state-of-the-art action recognition performance on all three benchmarks through joint training and supervised learning with SSGCM.
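As a rough sketch of skeleton-guided cropping, the function below crops aligned RGB and depth frames to the bounding box of the skeleton joints plus a margin; the margin fraction is an assumed value and the SSGCM sampling step is omitted.

```python
def skeleton_guided_crop(rgb, depth, joints_xy, margin=0.15):
    """Crop aligned RGB and depth frames to the skeleton's bounding box plus a margin.

    rgb, depth: (H, W, ...) arrays; joints_xy: (J, 2) joint pixel coordinates.
    The margin fraction is an assumed value, not taken from the paper.
    """
    h, w = rgb.shape[:2]
    x_min, y_min = joints_xy.min(axis=0)
    x_max, y_max = joints_xy.max(axis=0)
    mx, my = margin * (x_max - x_min), margin * (y_max - y_min)
    x0, x1 = int(max(0.0, x_min - mx)), int(min(float(w), x_max + mx))
    y0, y1 = int(max(0.0, y_min - my)), int(min(float(h), y_max + my))
    return rgb[y0:y1, x0:x1], depth[y0:y1, x0:x1]
```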
{"title":"Skeleton-guided and supervised learning of hybrid network for multi-modal action recognition","authors":"Ziliang Ren ,&nbsp;Li Luo ,&nbsp;Yong Qin","doi":"10.1016/j.jvcir.2025.104389","DOIUrl":"10.1016/j.jvcir.2025.104389","url":null,"abstract":"<div><div>With the wide application of multi-modal data in computer vision classification tasks, multi-modal action recognition has become a high-profile research area. However, it has been a challenging task to fully utilize the complementarities between different modalities and extract high-level semantic features that are closely related to actions. In this paper, we employ a skeleton alignment mechanism and design a sampling and skeleton-guided cropping module (SSGCM), which serves to crop redundant background information in RGB and depth sequences, thereby enhancing the representation of important RGB and depth information that is closely related to actions. In addition, we transform the entire skeleton information into a set of pseudo-images by mapping and normalizing the information of skeleton data in a matrix, which is used as a supervised information flow for extracting multi-modal complementary features. Furthermore, we propose an innovative multi-modal supervised learning framework based on a hybrid network, which aims to learn compensatory features from RGB, depth and skeleton modalities to improve the performance of multi-modal action recognition. We comprehensively evaluate our recognition framework on the three benchmark multi-modal dataset: NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD. The results show that our method achieved the state-of-the-art action recognition performance on these three benchmark datasets through the joint training and supervised learning strategies with SSGCM.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104389"},"PeriodicalIF":2.6,"publicationDate":"2025-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Learning scalable Omni-scale distribution for crowd counting
IF 2.6 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-18 | DOI: 10.1016/j.jvcir.2025.104387
Huake Wang , Xingsong Hou , Kaibing Zhang , Xin Zeng , Minqi Li , Wenke Sun , Xueming Qian
Crowd counting is challenged by large appearance variations of individuals in uncontrolled scenes. Many previous approaches address this by learning multi-scale features and concatenating them for better performance. However, such naive fusion is intuitive but suboptimal for a wide range of scale variations. In this paper, we propose a novel feature fusion scheme, Scalable Omni-scale Distribution Fusion (SODF), which leverages the scale distributions of multi-layer feature maps to approximate the real distribution of the target scale. Inspired by the Gaussian mixture model, which treats multi-scale feature fusion from a probabilistic perspective, our SODF module adaptively integrates multi-layer feature maps without embedding any multi-scale structures. The SODF module comprises two major components: an interaction block that perceives the real distribution and an assignment block that assigns weights to the multi-layer or multi-column feature maps. The SODF module is scalable, lightweight, and plug-and-play, and can be flexibly embedded into other counting networks. In addition, we design a counting model (SODF-Net) with the SODF module and a multi-layer structure. Extensive experiments on four benchmark datasets show that SODF-Net performs favorably against state-of-the-art counting models. Furthermore, the SODF module efficiently improves the prediction performance of canonical counting networks, e.g., MCNN, CSRNet, and CAN.
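The sketch below illustrates the assignment idea in a simplified form: predict softmax weights from pooled statistics of same-shaped multi-layer feature maps and fuse them as a weighted sum. It is a stand-in under that assumption, not the paper's exact interaction and assignment blocks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftLayerFusion(nn.Module):
    """Fuse K same-shaped feature maps with softmax weights predicted from their
    pooled statistics (a simplified stand-in for SODF's assignment block)."""
    def __init__(self, channels: int, num_layers: int):
        super().__init__()
        self.fc = nn.Linear(channels * num_layers, num_layers)

    def forward(self, feats):                                    # feats: list of (B, C, H, W)
        pooled = torch.cat([f.mean(dim=(2, 3)) for f in feats], dim=1)   # (B, C*K)
        w = F.softmax(self.fc(pooled), dim=1)                            # per-layer weights (B, K)
        stacked = torch.stack(feats, dim=1)                              # (B, K, C, H, W)
        return (w[:, :, None, None, None] * stacked).sum(dim=1)          # (B, C, H, W)

# Usage sketch:
# feats = [torch.randn(2, 64, 32, 32) for _ in range(3)]
# fused = SoftLayerFusion(channels=64, num_layers=3)(feats)
```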
{"title":"Learning scalable Omni-scale distribution for crowd counting","authors":"Huake Wang ,&nbsp;Xingsong Hou ,&nbsp;Kaibing Zhang ,&nbsp;Xin Zeng ,&nbsp;Minqi Li ,&nbsp;Wenke Sun ,&nbsp;Xueming Qian","doi":"10.1016/j.jvcir.2025.104387","DOIUrl":"10.1016/j.jvcir.2025.104387","url":null,"abstract":"<div><div>Crowd counting is challenged by large appearance variations of individuals in uncontrolled scenes. Many previous approaches elaborated on this problem by learning multi-scale features and concatenating them together for more impressive performance. However, such a naive fusion is intuitional and not optimal enough for a wide range of scale variations. In this paper, we propose a novel feature fusion scheme, called Scalable Omni-scale Distribution Fusion (SODF), which leverages the benefits of different scale distributions from multi-layer feature maps to approximate the real distribution of target scale. Inspired by Gaussian Mixture Model that surmounts multi-scale feature fusion from a probabilistic perspective, our SODF module adaptively integrate multi-layer feature maps without embedding any multi-scale structures. The SODF module is comprised of two major components: an interaction block that perceives the real distribution and an assignment block which assigns the weights to the multi-layer or multi-column feature maps. The newly proposed SODF module is scalable, light-weight, and plug-and-play, and can be flexibly embedded into other counting networks. In addition, we design a counting model (SODF-Net) with SODF module and multi-layer structure. Extensive experiments on four benchmark datasets manifest that the proposed SODF-Net performs favorably against the state-of-the-art counting models. Furthermore, the proposed SODF module can efficiently improve the prediction performance of canonical counting networks, e.g., MCNN, CSRNet, and CAN.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104387"},"PeriodicalIF":2.6,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
PFFNet: A point cloud based method for 3D face flow estimation
IF 2.6 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-10 | DOI: 10.1016/j.jvcir.2024.104382
Dong Li, Yuchen Deng, Zijun Huang
In recent years, research on 3D facial flow has received increasing attention and is of great significance for 3D face analysis. Point cloud based 3D face flow estimation is inherently challenging due to non-rigid and large-scale motion. In this paper, we propose a novel method called PFFNet that estimates 3D face flow in a coarse-to-fine network. Specifically, an adaptive sampling module is proposed to learn sampling points, and an effective channel-wise feature extraction module is incorporated to jointly learn facial priors from the point clouds. Additionally, to accommodate large-scale motion, we introduce a normal vector angle upsampling module to enhance local semantic consistency, and a context-aware cost volume that learns the correlation between the two point clouds using context information. Experiments on the FaceScape dataset demonstrate that the proposed method outperforms state-of-the-art scene flow methods by a significant margin.
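For the coarse-to-fine step, a common baseline is to upsample per-point flow from the coarse level to the fine level by inverse-distance k-nearest-neighbour interpolation; the sketch below shows that generic baseline, not the paper's normal vector angle upsampling module.

```python
import numpy as np

def knn_upsample_flow(coarse_pts, coarse_flow, fine_pts, k=3, eps=1e-8):
    """Interpolate per-point flow from a coarse point set to a fine one.

    coarse_pts: (M, 3), coarse_flow: (M, 3), fine_pts: (N, 3) -> upsampled flow (N, 3).
    """
    # Dense pairwise distances (N, M); acceptable for the small point counts of a sketch.
    d = np.linalg.norm(fine_pts[:, None, :] - coarse_pts[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]                  # k nearest coarse points per fine point
    nd = np.take_along_axis(d, idx, axis=1)             # (N, k) neighbour distances
    w = 1.0 / (nd + eps)                                 # inverse-distance weights
    w /= w.sum(axis=1, keepdims=True)
    return (coarse_flow[idx] * w[..., None]).sum(axis=1)
```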
{"title":"PFFNet: A point cloud based method for 3D face flow estimation","authors":"Dong Li,&nbsp;Yuchen Deng,&nbsp;Zijun Huang","doi":"10.1016/j.jvcir.2024.104382","DOIUrl":"10.1016/j.jvcir.2024.104382","url":null,"abstract":"<div><div>In recent years, the research on 3D facial flow has received more attention, and it is of great significance for related research on 3D faces. Point cloud based 3D face flow estimation is inherently challenging due to non-rigid and large-scale motion. In this paper, we propose a novel method called PFFNet for estimating 3D face flow in a coarse-to-fine network. Specifically, an adaptive sampling module is proposed to learn sampling points, and an effective channel-wise feature extraction module is incorporated to learn facial priors from the point clouds, jointly. Additionally, to accommodate large-scale motion, we also introduce a normal vector angle upsampling module to enhance local semantic consistency, and a context-aware cost volume that learns the correlation between the two point clouds with context information. Experiments conducted on the FaceScape dataset demonstrate that the proposed method outperforms state-of-the-art scene flow methods by a significant margin.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104382"},"PeriodicalIF":2.6,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SAFA: Lifelong Person Re-Identification learning by statistics-aware feature alignment
IF 2.6 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-31 | DOI: 10.1016/j.jvcir.2024.104378
Qiankun Gao, Mengxi Jia, Jie Chen, Jian Zhang
The goal of Lifelong Person Re-Identification (Re-ID) is to continuously update a model with new data to improve its generalization ability without forgetting previously learned knowledge. Lifelong Re-ID approaches usually employ classifier-based knowledge distillation to overcome forgetting, where classifier parameters grow with the amount of learning data. In the fine-grained Re-ID task, features carry more valuable information than classifiers. However, due to feature space drift, naive feature distillation can overly suppress the model's plasticity. This paper proposes SAFA, which combines statistics-aware feature alignment with progressive feature distillation. Specifically, we align new and old features based on the coefficient of variation and gradually increase the strength of feature distillation. This encourages the model to learn new knowledge in early epochs, penalizes forgetting in later epochs, and ultimately achieves a better stability-plasticity balance. Experiments on domain-incremental and intra-domain benchmarks demonstrate that SAFA significantly outperforms its counterparts while achieving better memory and computation efficiency.
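A minimal sketch of the two ingredients named above, under the assumption that "alignment based on coefficient of variation" means rescaling features by their std/mean ratio and that the distillation weight grows linearly with training progress; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def cv_normalize(feat: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Rescale each feature vector by its coefficient of variation (std / |mean|)."""
    mean = feat.mean(dim=1, keepdim=True)
    std = feat.std(dim=1, keepdim=True)
    cv = std / (mean.abs() + eps)
    return feat / (cv + eps)

def progressive_distill_loss(new_feat, old_feat, epoch, total_epochs):
    """Feature distillation whose weight ramps up with training progress:
    weak early (learn new knowledge), strong late (punish forgetting)."""
    weight = epoch / max(1, total_epochs - 1)
    aligned_new = cv_normalize(new_feat)
    aligned_old = cv_normalize(old_feat.detach())   # old model's features are frozen targets
    return weight * F.mse_loss(aligned_new, aligned_old)
```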
{"title":"SAFA: Lifelong Person Re-Identification learning by statistics-aware feature alignment","authors":"Qiankun Gao,&nbsp;Mengxi Jia,&nbsp;Jie Chen,&nbsp;Jian Zhang","doi":"10.1016/j.jvcir.2024.104378","DOIUrl":"10.1016/j.jvcir.2024.104378","url":null,"abstract":"<div><div>The goal of Lifelong Person Re-Identification (Re-ID) is to continuously update a model with new data to improve its generalization ability, without forgetting previously learned knowledge. Lifelong Re-ID approaches usually employs classifier-based knowledge distillation to overcome forgetting, where classifier parameters grow with the amount of learning data. In the fine-grained Re-ID task, features contain more valuable information than classifiers. However, due to feature space drift, naive feature distillation can overly suppress model’s plasticity. This paper proposes SAFA with statistics-aware feature alignment and progressive feature distillation. Specifically, we align new and old features based on coefficient of variation and gradually increase the strength of feature distillation. This encourages the model to learn new knowledge in early epochs, punishes it for forgetting in later epochs, and ultimately achieves a better stability–plasticity balance. Experiments on domain-incremental and intra-domain benchmarks demonstrate that our SAFA significantly outperforms counterparts while achieving better memory and computation efficiency.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104378"},"PeriodicalIF":2.6,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dense video captioning using unsupervised semantic information
IF 2.6 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-30 | DOI: 10.1016/j.jvcir.2024.104385
Valter Estevam , Rayson Laroca , Helio Pedrini , David Menotti
We introduce a method to learn unsupervised semantic visual information based on the premise that complex events can be decomposed into simpler events and that these simple events are shared across several complex events. We first employ a clustering method to group representations, producing a visual codebook. Then, we learn a dense representation by encoding the co-occurrence probability matrix of the codebook entries. This representation boosts the performance of dense video captioning when only visual features are available. For example, we replace the audio signal in the BMT method and produce temporal proposals with comparable performance. Furthermore, we concatenate the visual representation with our descriptor in a vanilla transformer method, achieving state-of-the-art performance in the captioning subtask among methods that use only visual features, as well as competitive performance against multi-modal methods. Our code is available at https://github.com/valterlej/dvcusi.
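A small sketch of the codebook pipeline described above: cluster frame-level features into a visual codebook, then describe a video by the row-normalized co-occurrence matrix of adjacent codeword assignments. The cluster count and the adjacent-frame windowing are assumptions, not the paper's exact settings.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(features: np.ndarray, k: int = 128) -> KMeans:
    """Cluster frame-level features (N, D) into a k-entry visual codebook."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)

def cooccurrence_descriptor(frame_features: np.ndarray, codebook: KMeans) -> np.ndarray:
    """Encode a video as the co-occurrence of consecutive codebook assignments."""
    k = codebook.n_clusters
    codes = codebook.predict(frame_features)        # one codeword per frame
    cooc = np.zeros((k, k), dtype=np.float64)
    for a, b in zip(codes[:-1], codes[1:]):         # adjacent-frame co-occurrence counts
        cooc[a, b] += 1.0
    row_sums = cooc.sum(axis=1, keepdims=True)
    cooc = np.divide(cooc, row_sums, out=np.zeros_like(cooc), where=row_sums > 0)
    return cooc.flatten()                           # dense (k*k,) video descriptor
```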
{"title":"Dense video captioning using unsupervised semantic information","authors":"Valter Estevam ,&nbsp;Rayson Laroca ,&nbsp;Helio Pedrini ,&nbsp;David Menotti","doi":"10.1016/j.jvcir.2024.104385","DOIUrl":"10.1016/j.jvcir.2024.104385","url":null,"abstract":"<div><div>We introduce a method to learn unsupervised semantic visual information based on the premise that complex events can be decomposed into simpler events and that these simple events are shared across several complex events. We first employ a clustering method to group representations producing a visual codebook. Then, we learn a dense representation by encoding the co-occurrence probability matrix for the codebook entries. This representation leverages the performance of the dense video captioning task in a scenario with only visual features. For example, we replace the audio signal in the BMT method and produce temporal proposals with comparable performance. Furthermore, we concatenate the visual representation with our descriptor in a vanilla transformer method to achieve state-of-the-art performance in the captioning subtask compared to the methods that explore only visual features, as well as a competitive performance with multi-modal methods. Our code is available at <span><span>https://github.com/valterlej/dvcusi</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104385"},"PeriodicalIF":2.6,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Quality assessment of windowed 6DoF video with viewpoint switching
IF 2.6 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-27 | DOI: 10.1016/j.jvcir.2024.104352
Wenhui Zou , Tingyan Tang , Weihua Chen , Gangyi Jiang , Zongju Peng
Windowed six degrees of freedom (6DoF) video systems can provide users with highly interactive experiences by offering free movement along three rotational and three translational axes. Free viewing in immersive scenes requires extensive viewpoint switching, which introduces new distortions (such as jitter and discomfort) to windowed 6DoF videos in addition to traditional compression and rendering distortions. This paper proposes a quality assessment method based on spatiotemporal features and view-switching smoothness for windowed 6DoF-synthesized videos with a wide field of view. First, edges are extracted from video frames to obtain local spatial distortion features by measuring their statistical characteristics with a generalized Gaussian distribution. Then, the synthesized videos are decomposed and reassembled in the temporal domain to intuitively describe the horizontal and vertical characteristics of the temporal distortions, and a gradient-weighted local binary pattern is used to measure temporal flicker distortions. Next, to assess the impact of viewpoint switching on visual perception, a velocity model for retinal image motion is established. Finally, the objective quality score is predicted by a weighted regression model. The experimental results confirm that the proposed method is highly competitive.
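The generalized Gaussian fit mentioned above is typically done with the standard moment-matching estimator; the sketch below fits a zero-mean GGD to edge responses under that assumption (the paper's exact feature construction is not given in the abstract).

```python
import numpy as np
from scipy.special import gamma

def fit_ggd(x: np.ndarray):
    """Moment-matching fit of a zero-mean generalized Gaussian distribution.

    Returns (shape alpha, scale sigma), the usual natural-scene-statistics
    estimator: match E[x^2] / (E|x|)^2 against the GGD ratio function.
    """
    x = x.ravel().astype(np.float64)
    candidates = np.arange(0.2, 10.0, 0.001)
    ratios = (gamma(1.0 / candidates) * gamma(3.0 / candidates)
              / gamma(2.0 / candidates) ** 2)
    r = np.mean(x ** 2) / (np.mean(np.abs(x)) ** 2 + 1e-12)
    alpha = candidates[np.argmin((ratios - r) ** 2)]   # best-matching shape parameter
    sigma = np.sqrt(np.mean(x ** 2))                   # scale from the second moment
    return alpha, sigma
```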
{"title":"Quality assessment of windowed 6DoF video with viewpoint switching","authors":"Wenhui Zou ,&nbsp;Tingyan Tang ,&nbsp;Weihua Chen ,&nbsp;Gangyi Jiang ,&nbsp;Zongju Peng","doi":"10.1016/j.jvcir.2024.104352","DOIUrl":"10.1016/j.jvcir.2024.104352","url":null,"abstract":"<div><div>Windowed six degrees of freedom (6DoF) video systems can provide users with highly interactive experiences by offering three rotational and three translational free movements. Free viewing in immersive scenes requires extensive viewpoint switching, which introduces new distortions (such as jitter and discomfort) to windowed 6DoF videos in addition to traditional compression and rendering distortions. This paper proposes a quality assessment method via spatiotemporal features and view switching smoothness for windowed 6DoF-synthesized videos with a wide field of view. Firstly, the edges are extracted from video frames to obtain local spatial distortion features by measuring their statistical characteristics through a generalized Gaussian distribution. Then, the synthesized videos are decomposed and reassembled in the temporal domain to intuitively describe the horizontal and vertical characteristics of the temporal distortions. A gradient-weighted local binary pattern is used to measure temporal flicker distortions. Next, to assess the impact of viewpoint switching on visual perception, a velocity model for retinal image motion is established. Finally, the objective quality score is predicted by a weighted regression model. The experimental results confirm that the proposed method is highly competitive.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104352"},"PeriodicalIF":2.6,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Three-dimension deep model for body mass index estimation from facial image sequences with different poses
IF 2.6 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-12-25 | DOI: 10.1016/j.jvcir.2024.104381
Chenghao Xiang, Boxiang Liu, Liang Zhao, Xiujuan Zheng
Body mass index (BMI), an essential indicator of human health, can be calculated from height and weight. Previous studies have carried out visual BMI estimation from a frontal facial image; however, they ignore the visual information that different face poses provide for BMI estimation. Considering the contributions of different face poses, this study applies a perspective transformation to a public facial image dataset to simulate face rotation and collects a video dataset with faces rotating in yaw. A three-dimensional convolutional neural network that integrates facial three-dimensional information from an image sequence with different face poses is proposed for BMI estimation. The proposed methods are validated on public and private datasets. Ablation experiments demonstrate that face sequences with different poses improve visual BMI estimation, and comparison experiments indicate that the proposed method increases classification accuracy and reduces BMI estimation errors. Code has been released: https://github.com/xiangch1910/STNET-BMI.
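A minimal sketch of simulating yaw rotation from a frontal face image with a perspective warp; the corner-displacement model below is a crude assumption for illustration, not the paper's exact transformation.

```python
import cv2
import numpy as np

def simulate_yaw(img: np.ndarray, yaw_deg: float) -> np.ndarray:
    """Warp a frontal face image to mimic a horizontal (yaw) head rotation.

    One side of the image plane is squeezed vertically, a simple perspective
    approximation of the face turning left or right.
    """
    h, w = img.shape[:2]
    shrink = 0.5 * h * abs(np.sin(np.radians(yaw_deg)))   # vertical squeeze of the receding edge
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    if yaw_deg >= 0:   # right side recedes
        dst = np.float32([[0, 0], [w, shrink], [w, h - shrink], [0, h]])
    else:              # left side recedes
        dst = np.float32([[0, shrink], [w, 0], [w, h], [0, h - shrink]])
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(img, M, (w, h))
```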
{"title":"Three-dimension deep model for body mass index estimation from facial image sequences with different poses","authors":"Chenghao Xiang,&nbsp;Boxiang Liu,&nbsp;Liang Zhao,&nbsp;Xiujuan Zheng","doi":"10.1016/j.jvcir.2024.104381","DOIUrl":"10.1016/j.jvcir.2024.104381","url":null,"abstract":"<div><div>Body mass index (BMI), an essential indicator of human health, can be calculated based on height and weight. Previous studies have carried out visual BMI estimation from a frontal facial image. However, these studies have ignored the visual information provided by the different face poses on BMI estimation. Considering the contributions of different face poses, this study applies the perspective transformation to the public facial image dataset to simulate face rotation and collects a video dataset with face rotation in yaw type. A three-dimensional convolutional neural network, which integrates the facial three-dimensional information from an image sequence with different face poses, is proposed for BMI estimation. The proposed methods are validated using the public and private datasets. Ablation experiments demonstrate that the face sequence with different poses can improve the performance of visual BMI estimation. Comparison experiments indicate that the proposed method can increase classification accuracy and reduce visual BMI estimation errors. Code has been released: <span><span>https://github.com/xiangch1910/STNET-BMI</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104381"},"PeriodicalIF":2.6,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143174813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0