Low-light video enhancement (LLVE) has received little attention compared to low-light image enhancement (LLIE) mainly due to the lack of paired low-/normal-light video datasets. Consequently, a common approach to LLVE is to enhance each video frame individually using LLIE methods. However, this practice introduces temporal inconsistencies in the resulting video. In this work, we propose a recurrent neural network (RNN) that, given a low-light video and its per-frame enhanced version, produces a temporally consistent video preserving the underlying frame-based enhancement. We achieve this by training our network with a combination of a new forward-backward temporal consistency loss and a content-preserving loss. At inference time, we can use our trained network to correct videos processed by any LLIE method. Experimental results show that our method achieves the best trade-off between temporal consistency improvement and fidelity with the per-frame enhanced video, exhibiting a lower memory complexity and comparable time complexity with respect to other state-of-the-art methods for temporal consistency.
{"title":"A RNN for Temporal Consistency in Low-Light Videos Enhanced by Single-Frame Methods","authors":"Claudio Rota;Marco Buzzelli;Simone Bianco;Raimondo Schettini","doi":"10.1109/LSP.2024.3475969","DOIUrl":"https://doi.org/10.1109/LSP.2024.3475969","url":null,"abstract":"Low-light video enhancement (LLVE) has received little attention compared to low-light image enhancement (LLIE) mainly due to the lack of paired low-/normal-light video datasets. Consequently, a common approach to LLVE is to enhance each video frame individually using LLIE methods. However, this practice introduces temporal inconsistencies in the resulting video. In this work, we propose a recurrent neural network (RNN) that, given a low-light video and its per-frame enhanced version, produces a temporally consistent video preserving the underlying frame-based enhancement. We achieve this by training our network with a combination of a new forward-backward temporal consistency loss and a content-preserving loss. At inference time, we can use our trained network to correct videos processed by any LLIE method. Experimental results show that our method achieves the best trade-off between temporal consistency improvement and fidelity with the per-frame enhanced video, exhibiting a lower memory complexity and comparable time complexity with respect to other state-of-the-art methods for temporal consistency.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142438500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prototypical Metric Segment Anything Model for Data-Free Few-Shot Semantic Segmentation
Pub Date: 2024-10-08. DOI: 10.1109/LSP.2024.3476208
Zhiyu Jiang; Ye Yuan; Yuan Yuan
Few-shot semantic segmentation (FSS) is crucial for image interpretation, yet it is constrained by the need for extensive base data and a narrow focus on foreground-background differentiation. This work introduces Data-free Few-shot Semantic Segmentation (DFSS), a task that requires only a limited number of labeled images and forgoes the need for extensive base data, allowing for comprehensive image segmentation. The proposed method utilizes the Segment Anything Model (SAM) for its generalization capabilities. The Prototypical Metric Segment Anything Model is introduced, featuring an initial segmentation phase followed by prototype matching, effectively addressing the learning challenges posed by limited data. To enhance discrimination in multi-class segmentation, the Supervised Prototypical Contrastive Loss (SPCL) is designed to refine prototype features, ensuring intra-class cohesion and inter-class separation. To further accommodate intra-class variability, the Adaptive Prototype Update (APU) strategy dynamically refines prototypes, adapting the model to class heterogeneity. The method's effectiveness is demonstrated through superior performance over existing techniques on the DFSS task, marking a significant advancement in UAV image segmentation.
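The abstract describes prototype matching on top of SAM features but does not give its exact form. A standard formulation builds class prototypes by masked average pooling of support features and labels each query pixel by cosine similarity to the nearest prototype; an exponential-moving-average rule is one simple way to make prototypes adaptive. The PyTorch sketch below follows that generic recipe; the function names, tensor shapes, and the EMA update are our assumptions, not the paper's Prototypical Metric SAM, SPCL, or APU definitions.

```python
import torch
import torch.nn.functional as F

def masked_average_prototypes(support_features, support_masks):
    """One prototype per class via masked average pooling.

    support_features: (B, C, H, W) pixel embeddings (e.g. from SAM's image encoder).
    support_masks:    (B, K, H, W) binary masks, one channel per class.
    Returns prototypes of shape (K, C).
    """
    feats = support_features.unsqueeze(1)                  # (B, 1, C, H, W)
    masks = support_masks.unsqueeze(2)                     # (B, K, 1, H, W)
    summed = (feats * masks).sum(dim=(0, 3, 4))            # (K, C)
    counts = masks.sum(dim=(0, 3, 4)).clamp(min=1e-6)      # (K, 1)
    return summed / counts

def prototype_matching(query_features, prototypes):
    """Assign every query pixel to its most similar prototype (cosine similarity)."""
    q = F.normalize(query_features, dim=1)                 # (B, C, H, W)
    p = F.normalize(prototypes, dim=1)                     # (K, C)
    logits = torch.einsum("bchw,kc->bkhw", q, p)           # per-class similarity maps
    return logits.argmax(dim=1), logits                    # labels (B, H, W), scores (B, K, H, W)

def adaptive_prototype_update(prototypes, new_prototypes, momentum=0.9):
    """A simple adaptive update: exponential moving average of old and new estimates."""
    return momentum * prototypes + (1.0 - momentum) * new_prototypes
```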
{"title":"Prototypical Metric Segment Anything Model for Data-Free Few-Shot Semantic Segmentation","authors":"Zhiyu Jiang;Ye Yuan;Yuan Yuan","doi":"10.1109/LSP.2024.3476208","DOIUrl":"https://doi.org/10.1109/LSP.2024.3476208","url":null,"abstract":"Few-shot semantic segmentation (FSS) is crucial for image interpretation, yet it is constrained by requirements for extensive base data and a narrow focus on foreground-background differentiation. This work introduces Data-free Few-shot Semantic Segmentation (DFSS), a task that requires limited labeled images and forgoes the need for extensive base data, allowing for comprehensive image segmentation. The proposed method utilizes the Segment Anything Model (SAM) for its generalization capabilities. The Prototypical Metric Segment Anything Model is introduced, featuring an initial segmentation phase followed by prototype matching, effectively addressing the learning challenges posed by limited data. To enhance discrimination in multi-class segmentation, the Supervised Prototypical Contrastive Loss (SPCL) is designed to refine prototype features, ensuring intra-class cohesion and inter-class separation. To further accommodate intra-class variability, the Adaptive Prototype Update (APU) strategy dynamically refines prototypes, adapting the model to class heterogeneity. The method's effectiveness is demonstrated through superior performance over existing techniques on the DFSS task, marking a significant advancement in UAV image segmentation.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142438589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-08. DOI: 10.1109/LSP.2024.3475358
Arka Roy; Udit Satija
Respiratory disorders have become the third leading cause of death worldwide. They can be assessed through two key diagnostic modalities: breathing patterns (BPs), i.e., airflow signals, and respiratory sounds (RSs). In recent years, few studies have investigated the correlation between these two modalities, which reflects the structural flaws of the lungs under disease conditions. In this letter, we propose 'RS-2-BP': a unified deep learning framework for deriving electrical impedance tomography-based airflow signals from respiratory sounds using a hybrid neural network architecture, namely ReSTL, which comprises cascaded standard and residual shrinkage convolution blocks followed by feature-refined transformer encoders and long short-term memory (LSTM) units. The proposed framework is extensively evaluated on the publicly available BRACETS dataset. Experimental results suggest that ReSTL can accurately derive BPs from RSs, with average mean absolute errors of $0.024 \pm 0.011$, $0.436 \pm 0.120$, $0.020 \pm 0.011$, and $0.134 \pm 0.068$
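The abstract lists ReSTL's building blocks (standard and residual shrinkage convolution blocks, transformer encoders, LSTM units) but not their configuration. The PyTorch sketch below shows one way such a sound-to-airflow regression stack could be wired together; the channel widths, layer counts, simplified channel-wise soft thresholding, and the class names are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ResidualShrinkageBlock(nn.Module):
    """1-D residual block with soft thresholding ("shrinkage") on the residual branch."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels),
        )
        # Per-channel threshold scale predicted from the residual's average magnitude.
        self.threshold = nn.Sequential(
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(channels, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        r = self.conv(x)                                             # (B, C, T)
        tau = self.threshold(r.abs()).unsqueeze(-1) * r.abs().mean(dim=-1, keepdim=True)
        r = torch.sign(r) * torch.clamp(r.abs() - tau, min=0.0)      # soft thresholding
        return torch.relu(x + r)

class SoundToAirflow(nn.Module):
    """Conv blocks -> transformer encoder -> LSTM -> per-sample regression head."""
    def __init__(self, channels=64, n_transformer_layers=2):
        super().__init__()
        self.stem = nn.Conv1d(1, channels, kernel_size=7, padding=3)
        self.blocks = nn.Sequential(*[ResidualShrinkageBlock(channels) for _ in range(2)])
        encoder_layer = nn.TransformerEncoderLayer(d_model=channels, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_transformer_layers)
        self.lstm = nn.LSTM(channels, channels, batch_first=True)
        self.head = nn.Linear(channels, 1)

    def forward(self, x):                # x: (B, 1, T) respiratory sound waveform
        h = self.blocks(self.stem(x))    # (B, C, T)
        h = h.transpose(1, 2)            # (B, T, C) for the transformer and LSTM
        h = self.encoder(h)
        h, _ = self.lstm(h)
        return self.head(h).squeeze(-1)  # (B, T) estimated airflow / breathing pattern
```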