Junjian Huang, Hao Ren, Shulin Liu, Yong Liu, Chuanlu Lv, Jiawen Lu, Changyong Xie, Hong Lu
{"title":"用于极暗图像增强的实时注意力稀释 U-Net","authors":"Junjian Huang, Hao Ren, Shulin Liu, Yong Liu, Chuanlu Lv, Jiawen Lu, Changyong Xie, Hong Lu","doi":"10.1145/3654668","DOIUrl":null,"url":null,"abstract":"<p>Images taken under low-light conditions suffer from poor visibility, color distortion and graininess, all of which degrade the image quality and hamper the performance of downstream vision tasks, such as object detection and instance segmentation in the field of autonomous driving, making low-light enhancement an indispensable basic component of high-level visual tasks. Low-light enhancement aims to mitigate these issues, and has garnered extensive attention and research over several decades. The primary challenge in low-light image enhancement arises from the low signal-to-noise ratio (SNR) caused by insufficient lighting. This challenge becomes even more pronounced in near-zero lux conditions, where noise overwhelms the available image information. Both traditional image signal processing (ISP) pipeline and conventional low-light image enhancement methods struggle in such scenarios. Recently, deep neural networks have been used to address this challenge. These networks take unmodified RAW images as input and produce the enhanced sRGB images, forming a deep learning-based ISP pipeline. However, most of these networks are computationally expensive and thus far from practical use. In this paper, we propose a lightweight model called attentive dilated U-Net (ADU-Net) to tackle this issue. Our model incorporates several innovative designs, including an asymmetric U-shape architecture, dilated residual modules (DRMs) for feature extraction, and attentive fusion modules (AFMs) for feature fusion. The DRMs provide strong representative capability while the AFMs effectively leverage low-level texture information and high-level semantic information within the network. Both modules employ a lightweight design but offer significant performance gains. Extensive experiments demonstrate our method is highly-effective, achieving an excellent balance between image quality and computational complexity, <i>i</i>.<i>e</i>., taking less than 4ms for a high-definition 4K image on a single GTX 1080Ti GPU and yet maintaining competitive visual quality. Furthermore, our method exhibits pleasing scalability and generalizability, highlighting its potential for widespread applicability.</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"52 1","pages":""},"PeriodicalIF":5.2000,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Real-time Attentive Dilated U-Net for Extremely Dark Image Enhancement\",\"authors\":\"Junjian Huang, Hao Ren, Shulin Liu, Yong Liu, Chuanlu Lv, Jiawen Lu, Changyong Xie, Hong Lu\",\"doi\":\"10.1145/3654668\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Images taken under low-light conditions suffer from poor visibility, color distortion and graininess, all of which degrade the image quality and hamper the performance of downstream vision tasks, such as object detection and instance segmentation in the field of autonomous driving, making low-light enhancement an indispensable basic component of high-level visual tasks. Low-light enhancement aims to mitigate these issues, and has garnered extensive attention and research over several decades. The primary challenge in low-light image enhancement arises from the low signal-to-noise ratio (SNR) caused by insufficient lighting. This challenge becomes even more pronounced in near-zero lux conditions, where noise overwhelms the available image information. Both traditional image signal processing (ISP) pipeline and conventional low-light image enhancement methods struggle in such scenarios. Recently, deep neural networks have been used to address this challenge. These networks take unmodified RAW images as input and produce the enhanced sRGB images, forming a deep learning-based ISP pipeline. However, most of these networks are computationally expensive and thus far from practical use. In this paper, we propose a lightweight model called attentive dilated U-Net (ADU-Net) to tackle this issue. Our model incorporates several innovative designs, including an asymmetric U-shape architecture, dilated residual modules (DRMs) for feature extraction, and attentive fusion modules (AFMs) for feature fusion. The DRMs provide strong representative capability while the AFMs effectively leverage low-level texture information and high-level semantic information within the network. Both modules employ a lightweight design but offer significant performance gains. Extensive experiments demonstrate our method is highly-effective, achieving an excellent balance between image quality and computational complexity, <i>i</i>.<i>e</i>., taking less than 4ms for a high-definition 4K image on a single GTX 1080Ti GPU and yet maintaining competitive visual quality. Furthermore, our method exhibits pleasing scalability and generalizability, highlighting its potential for widespread applicability.</p>\",\"PeriodicalId\":50937,\"journal\":{\"name\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"volume\":\"52 1\",\"pages\":\"\"},\"PeriodicalIF\":5.2000,\"publicationDate\":\"2024-03-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3654668\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Multimedia Computing Communications and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3654668","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Real-time Attentive Dilated U-Net for Extremely Dark Image Enhancement
Images taken under low-light conditions suffer from poor visibility, color distortion and graininess, all of which degrade the image quality and hamper the performance of downstream vision tasks, such as object detection and instance segmentation in the field of autonomous driving, making low-light enhancement an indispensable basic component of high-level visual tasks. Low-light enhancement aims to mitigate these issues, and has garnered extensive attention and research over several decades. The primary challenge in low-light image enhancement arises from the low signal-to-noise ratio (SNR) caused by insufficient lighting. This challenge becomes even more pronounced in near-zero lux conditions, where noise overwhelms the available image information. Both traditional image signal processing (ISP) pipeline and conventional low-light image enhancement methods struggle in such scenarios. Recently, deep neural networks have been used to address this challenge. These networks take unmodified RAW images as input and produce the enhanced sRGB images, forming a deep learning-based ISP pipeline. However, most of these networks are computationally expensive and thus far from practical use. In this paper, we propose a lightweight model called attentive dilated U-Net (ADU-Net) to tackle this issue. Our model incorporates several innovative designs, including an asymmetric U-shape architecture, dilated residual modules (DRMs) for feature extraction, and attentive fusion modules (AFMs) for feature fusion. The DRMs provide strong representative capability while the AFMs effectively leverage low-level texture information and high-level semantic information within the network. Both modules employ a lightweight design but offer significant performance gains. Extensive experiments demonstrate our method is highly-effective, achieving an excellent balance between image quality and computational complexity, i.e., taking less than 4ms for a high-definition 4K image on a single GTX 1080Ti GPU and yet maintaining competitive visual quality. Furthermore, our method exhibits pleasing scalability and generalizability, highlighting its potential for widespread applicability.
期刊介绍:
The ACM Transactions on Multimedia Computing, Communications, and Applications is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It is soliciting paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome.
TOMM is a peer-reviewed, archival journal, available in both print form and digital form. The Journal is published quarterly; with roughly 7 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure a timely publication. The transactions consists primarily of research papers. This is an archival journal and it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.