Event cameras are innovative neuromorphic sensors that asynchronously capture scene dynamics. Owing to their event-triggering mechanism, these cameras record event streams with much shorter response latency and higher intensity sensitivity than conventional cameras. Building on these features, previous works have attempted to reconstruct high dynamic range (HDR) videos from events, but they have either suffered from unrealistic artifacts or failed to provide sufficiently high frame rates. In this paper, we present a recurrent convolutional neural network that reconstructs high-speed HDR videos from event sequences, with key frame guidance to prevent the error accumulation caused by sparse event data. Additionally, to address the severe shortage of real paired data, we develop a new optical system to collect a real-world dataset with paired high-speed HDR videos and event streams, facilitating future research in this field. Ours is the first real paired dataset for event-to-HDR reconstruction, avoiding the potential inaccuracies of simulation strategies. Experimental results demonstrate that our method can generate high-quality, high-speed HDR videos. We further explore the potential of our work in cross-camera reconstruction and in downstream computer vision tasks, including object detection, panoramic segmentation, optical flow estimation, and monocular depth estimation under HDR scenarios.
{"title":"EventHDR: From Event to High-Speed HDR Videos and Beyond.","authors":"Yunhao Zou, Ying Fu, Tsuyoshi Takatani, Yinqiang Zheng","doi":"10.1109/TPAMI.2024.3469571","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3469571","url":null,"abstract":"<p><p>Event cameras are innovative neuromorphic sensors that asynchronously capture the scene dynamics. Due to the event-triggering mechanism, such cameras record event streams with much shorter response latency and higher intensity sensitivity compared to conventional cameras. On the basis of these features, previous works have attempted to reconstruct high dynamic range (HDR) videos from events, but have either suffered from unrealistic artifacts or failed to provide sufficiently high frame rates. In this paper, we present a recurrent convolutional neural network that reconstruct high-speed HDR videos from event sequences, with a key frame guidance to prevent potential error accumulation caused by the sparse event data. Additionally, to address the problem of severely limited real dataset, we develop a new optical system to collect a real-world dataset with paired high-speed HDR videos and event streams, facilitating future research in this field. Our dataset provides the first real paired dataset for event-to-HDR reconstruction, avoiding potential inaccuracies from simulation strategies. Experimental results demonstrate that our method can generate high-quality, high-speed HDR videos. We further explore the potential of our work in cross-camera reconstruction and downstream computer vision tasks, including object detection, panoramic segmentation, optical flow estimation, and monocular depth estimation under HDR scenarios.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142396306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-09 | DOI: 10.1109/TPAMI.2024.3476683
Pixel is All You Need: Adversarial Spatio-Temporal Ensemble Active Learning for Salient Object Detection
Zhenyu Wu, Wei Wang, Lin Wang, Yacong Li, Fengmao Lv, Qing Xia, Chenglizhao Chen, Aimin Hao, Shuo Li
Although weakly-supervised techniques can reduce the labeling effort, it is unclear whether a saliency model trained with weakly-supervised data (e.g., point annotations) can match the performance of its fully-supervised version. This paper attempts to answer this unexplored question by proving a hypothesis: there exists a point-labeled dataset on which trained saliency models achieve performance equivalent to models trained on the densely annotated dataset. To prove this conjecture, we propose a novel yet effective adversarial spatio-temporal ensemble active learning approach. Our contributions are fourfold: 1) Our proposed adversarial attack for triggering uncertainty overcomes the overconfidence of existing active learning methods and accurately locates uncertain pixels. 2) Our proposed spatio-temporal ensemble strategy not only achieves outstanding performance but also significantly reduces the model's computational cost. 3) Our proposed relationship-aware diversity sampling mitigates oversampling while boosting model performance. 4) We provide a theoretical proof of the existence of such a point-labeled dataset. Experimental results show that our approach can find such a point-labeled dataset: a saliency model trained on it attains 98%-99% of the performance of its fully-supervised version with only ten annotated points per image. The code is available at https://github.com/wuzhenyubuaa/ASTE-AL.
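As an aside, the "adversarial attack triggering uncertainty" idea can be illustrated with a short sketch: a one-step FGSM-style perturbation pushes the input toward the model's decision boundary, and pixels whose saliency prediction shifts most are treated as uncertain candidates for point annotation. The function name, epsilon value, and shift-based score below are assumptions for illustration, not the paper's released code.

```python
# Illustrative sketch only (hypothetical): per-pixel uncertainty for a saliency
# model, measured as the prediction shift under a one-step adversarial attack.
import torch
import torch.nn.functional as F

def adversarial_uncertainty(model, image, epsilon=2.0 / 255):
    image = image.detach().clone().requires_grad_(True)
    pred = torch.sigmoid(model(image))                      # (B, 1, H, W) saliency probabilities
    # Push the input away from the model's own hard prediction (FGSM-style, one step).
    loss = F.binary_cross_entropy(pred, (pred > 0.5).float())
    loss.backward()
    adv_image = (image + epsilon * image.grad.sign()).clamp(0, 1)
    with torch.no_grad():
        adv_pred = torch.sigmoid(model(adv_image))
    # Large prediction shifts mark uncertain pixels: candidates for point annotation.
    return (adv_pred - pred).abs().detach()
```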
{"title":"Pixel is All You Need: Adversarial Spatio-Temporal Ensemble Active Learning for Salient Object Detection.","authors":"Zhenyu Wu, Wei Wang, Lin Wang, Yacong Li, Fengmao Lv, Qing Xia, Chenglizhao Chen, Aimin Hao, Shuo Li","doi":"10.1109/TPAMI.2024.3476683","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3476683","url":null,"abstract":"<p><p>Although weakly-supervised techniques can reduce the labeling effort, it is unclear whether a saliency model trained with weakly-supervised data (e.g., point annotation) can achieve the equivalent performance of its fully-supervised version. This paper attempts to answer this unexplored question by proving a hypothesis: there is a point-labeled dataset where saliency models trained on it can achieve equivalent performance when trained on the densely annotated dataset. To prove this conjecture, we proposed a novel yet effective adversarial spatio-temporal ensemble active learning. Our contributions are four- fold: 1) Our proposed adversarial attack triggering uncertainty can conquer the overconfidence of existing active learning methods and accurately locate these uncertain pixels. 2) Our proposed spatio-temporal ensemble strategy not only achieves outstanding performance but significantly reduces the model's computational cost. 3) Our proposed relationship-aware diversity sampling can conquer oversampling while boosting model performance. 4) We provide theoretical proof for the existence of such a point-labeled dataset. Experimental results show that our approach can find such a point-labeled dataset, where a saliency model trained on it obtained 98%-99% performance of its fully-supervised version with only ten annotated points per image. The code is available at https://github.com/wuzhenyubuaa/ASTE-AL.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142396308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-09 | DOI: 10.1109/TPAMI.2024.3475249
Latent Diffusion Enhanced Rectangle Transformer for Hyperspectral Image Restoration
Miaoyu Li, Ying Fu, Tao Zhang, Ji Liu, Dejing Dou, Chenggang Yan, Yulun Zhang
The restoration of hyperspectral images (HSIs) plays a pivotal role in subsequent hyperspectral imaging applications. Despite the remarkable capabilities of deep learning, current HSI restoration methods face challenges in effectively exploiting the spatial non-local self-similarity and the spectral low-rank property inherent in HSIs. This paper addresses these challenges by introducing a latent diffusion enhanced rectangle Transformer for HSI restoration, which tackles non-local spatial similarity together with an HSI-specific low-rank property enhanced by latent diffusion. To effectively capture non-local spatial similarity, we propose a multi-shape spatial rectangle self-attention module that operates in both horizontal and vertical directions, enabling the model to utilize informative spatial regions for HSI restoration. Meanwhile, we propose a spectral latent diffusion enhancement module that generates an image-specific latent dictionary based on the content of the HSI for low-rank vector extraction and representation. This module utilizes a diffusion model to generatively obtain representations of global low-rank vectors, thereby aligning more closely with the desired HSI. Comprehensive experiments were carried out on four common hyperspectral image restoration tasks: HSI denoising, HSI super-resolution, HSI reconstruction, and HSI inpainting. The results highlight the effectiveness of our proposed method, as demonstrated by improvements in both objective metrics and subjective visual quality.
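For intuition, a minimal sketch of the rectangle self-attention idea follows: attention is restricted to horizontal and vertical strip-shaped windows over the feature map. The class name, strip size, and residual combination are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only (hypothetical): self-attention within horizontal and
# vertical rectangle windows, the basic idea behind multi-shape rectangle attention.
import torch
import torch.nn as nn

class RectangleSelfAttention(nn.Module):
    def __init__(self, channels, heads=4, strip=8):
        super().__init__()
        self.strip = strip
        self.attn_h = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.attn_v = nn.MultiheadAttention(channels, heads, batch_first=True)

    def _strip_attention(self, x, attn, vertical):
        b, c, h, w = x.shape
        if vertical:
            x = x.transpose(2, 3)  # swap H and W so columns become rows
            h, w = w, h
        s = self.strip
        # group every `s` consecutive rows into one (s x w) rectangle window
        tokens = x.reshape(b, c, h // s, s, w).permute(0, 2, 3, 4, 1).reshape(b * (h // s), s * w, c)
        tokens, _ = attn(tokens, tokens, tokens)
        x = tokens.reshape(b, h // s, s, w, c).permute(0, 4, 1, 2, 3).reshape(b, c, h, w)
        return x.transpose(2, 3) if vertical else x

    def forward(self, x):
        # x: (B, C, H, W) feature map; H and W assumed divisible by `strip`
        return (x + self._strip_attention(x, self.attn_h, vertical=False)
                  + self._strip_attention(x, self.attn_v, vertical=True))

# Example with hypothetical shapes:
# out = RectangleSelfAttention(32)(torch.rand(1, 32, 64, 64))
```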
{"title":"Latent Diffusion Enhanced Rectangle Transformer for Hyperspectral Image Restoration.","authors":"Miaoyu Li, Ying Fu, Tao Zhang, Ji Liu, Dejing Dou, Chenggang Yan, Yulun Zhang","doi":"10.1109/TPAMI.2024.3475249","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3475249","url":null,"abstract":"<p><p>The restoration of hyperspectral image (HSI) plays a pivotal role in subsequent hyperspectral image applications. Despite the remarkable capabilities of deep learning, current HSI restoration methods face challenges in effectively exploring the spatial non-local self-similarity and spectral low-rank property inherently embedded with HSIs. This paper addresses these challenges by introducing a latent diffusion enhanced rectangle Transformer for HSI restoration, tackling the non-local spatial similarity and HSI-specific latent diffusion low-rank property. In order to effectively capture non-local spatial similarity, we propose the multi-shape spatial rectangle self-attention module in both horizontal and vertical directions, enabling the model to utilize informative spatial regions for HSI restoration. Meanwhile, we propose a spectral latent diffusion enhancement module that generates the image-specific latent dictionary based on the content of HSI for low-rank vector extraction and representation. This module utilizes a diffusion model to generatively obtain representations of global low-rank vectors, thereby aligning more closely with the desired HSI. A series of comprehensive experiments were carried out on four common hyperspectral image restoration tasks, including HSI denoising, HSI super-resolution, HSI reconstruction, and HSI inpainting. The results of these experiments highlight the effectiveness of our proposed method, as demonstrated by improvements in both objective metrics and subjective visual quality.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142396307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-04 | DOI: 10.1109/TPAMI.2024.3462453
Xin Liu, Rong Qin, Junchi Yan, Jufeng Yang
Correspondence pruning, which aims to identify correct correspondences (inliers) among the initial ones, plays a crucial role in a variety of feature-matching-based tasks. Seeking consistent $k$