用于极暗图像增强的实时注意力稀释 U-Net

IF 5.2 3区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS ACM Transactions on Multimedia Computing Communications and Applications Pub Date : 2024-03-27 DOI:10.1145/3654668

Junjian Huang, Hao Ren, Shulin Liu, Yong Liu, Chuanlu Lv, Jiawen Lu, Changyong Xie, Hong Lu

{"title":"用于极暗图像增强的实时注意力稀释 U-Net","authors":"Junjian Huang, Hao Ren, Shulin Liu, Yong Liu, Chuanlu Lv, Jiawen Lu, Changyong Xie, Hong Lu","doi":"10.1145/3654668","DOIUrl":null,"url":null,"abstract":"Images taken under low-light conditions suffer from poor visibility, color distortion and graininess, all of which degrade the image quality and hamper the performance of downstream vision tasks, such as object detection and instance segmentation in the field of autonomous driving, making low-light enhancement an indispensable basic component of high-level visual tasks. Low-light enhancement aims to mitigate these issues, and has garnered extensive attention and research over several decades. The primary challenge in low-light image enhancement arises from the low signal-to-noise ratio (SNR) caused by insufficient lighting. This challenge becomes even more pronounced in near-zero lux conditions, where noise overwhelms the available image information. Both traditional image signal processing (ISP) pipeline and conventional low-light image enhancement methods struggle in such scenarios. Recently, deep neural networks have been used to address this challenge. These networks take unmodified RAW images as input and produce the enhanced sRGB images, forming a deep learning-based ISP pipeline. However, most of these networks are computationally expensive and thus far from practical use. In this paper, we propose a lightweight model called attentive dilated U-Net (ADU-Net) to tackle this issue. Our model incorporates several innovative designs, including an asymmetric U-shape architecture, dilated residual modules (DRMs) for feature extraction, and attentive fusion modules (AFMs) for feature fusion. The DRMs provide strong representative capability while the AFMs effectively leverage low-level texture information and high-level semantic information within the network. Both modules employ a lightweight design but offer significant performance gains. Extensive experiments demonstrate our method is highly-effective, achieving an excellent balance between image quality and computational complexity, i.e., taking less than 4ms for a high-definition 4K image on a single GTX 1080Ti GPU and yet maintaining competitive visual quality. Furthermore, our method exhibits pleasing scalability and generalizability, highlighting its potential for widespread applicability.","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"52 1","pages":""},"PeriodicalIF":5.2000,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Real-time Attentive Dilated U-Net for Extremely Dark Image Enhancement\",\"authors\":\"Junjian Huang, Hao Ren, Shulin Liu, Yong Liu, Chuanlu Lv, Jiawen Lu, Changyong Xie, Hong Lu\",\"doi\":\"10.1145/3654668\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Images taken under low-light conditions suffer from poor visibility, color distortion and graininess, all of which degrade the image quality and hamper the performance of downstream vision tasks, such as object detection and instance segmentation in the field of autonomous driving, making low-light enhancement an indispensable basic component of high-level visual tasks. Low-light enhancement aims to mitigate these issues, and has garnered extensive attention and research over several decades. The primary challenge in low-light image enhancement arises from the low signal-to-noise ratio (SNR) caused by insufficient lighting. This challenge becomes even more pronounced in near-zero lux conditions, where noise overwhelms the available image information. Both traditional image signal processing (ISP) pipeline and conventional low-light image enhancement methods struggle in such scenarios. Recently, deep neural networks have been used to address this challenge. These networks take unmodified RAW images as input and produce the enhanced sRGB images, forming a deep learning-based ISP pipeline. However, most of these networks are computationally expensive and thus far from practical use. In this paper, we propose a lightweight model called attentive dilated U-Net (ADU-Net) to tackle this issue. Our model incorporates several innovative designs, including an asymmetric U-shape architecture, dilated residual modules (DRMs) for feature extraction, and attentive fusion modules (AFMs) for feature fusion. The DRMs provide strong representative capability while the AFMs effectively leverage low-level texture information and high-level semantic information within the network. Both modules employ a lightweight design but offer significant performance gains. Extensive experiments demonstrate our method is highly-effective, achieving an excellent balance between image quality and computational complexity, i.e., taking less than 4ms for a high-definition 4K image on a single GTX 1080Ti GPU and yet maintaining competitive visual quality. Furthermore, our method exhibits pleasing scalability and generalizability, highlighting its potential for widespread applicability.\",\"PeriodicalId\":50937,\"journal\":{\"name\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"volume\":\"52 1\",\"pages\":\"\"},\"PeriodicalIF\":5.2000,\"publicationDate\":\"2024-03-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3654668\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Multimedia Computing Communications and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3654668","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

在低照度条件下拍摄的图像存在能见度低、色彩失真和颗粒感等问题，所有这些问题都会降低图像质量，妨碍下游视觉任务的执行，例如自动驾驶领域的物体检测和实例分割，因此低照度增强是高级视觉任务不可或缺的基本组成部分。弱光增强旨在缓解这些问题，几十年来已引起广泛关注和研究。弱光图像增强的主要挑战来自照明不足造成的低信噪比（SNR）。在接近零勒克斯的条件下，这一挑战变得更加突出，因为噪声会淹没可用的图像信息。在这种情况下，传统的图像信号处理（ISP）管道和传统的低照度图像增强方法都显得力不从心。最近，深度神经网络被用来应对这一挑战。这些网络将未修改的 RAW 图像作为输入，生成增强的 sRGB 图像，形成了基于深度学习的 ISP 管道。然而，这些网络大多计算成本高昂，因此远未得到实际应用。在本文中，我们提出了一种名为entive dilated U-Net（ADU-Net）的轻量级模型来解决这一问题。我们的模型采用了多项创新设计，包括非对称 U 型架构、用于特征提取的稀释残差模块（DRM）和用于特征融合的殷勤融合模块（AFM）。DRM 具有很强的代表性，而 AFM 则能有效利用网络中的低级纹理信息和高级语义信息。这两个模块都采用了轻量级设计，但性能提升显著。广泛的实验证明，我们的方法非常有效，在图像质量和计算复杂度之间实现了极佳的平衡，即在单个 GTX 1080Ti GPU 上处理高清 4K 图像的时间小于 4 毫秒，同时还能保持极具竞争力的视觉质量。此外，我们的方法还表现出令人满意的可扩展性和通用性，突显了其广泛应用的潜力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Real-time Attentive Dilated U-Net for Extremely Dark Image Enhancement

Images taken under low-light conditions suffer from poor visibility, color distortion and graininess, all of which degrade the image quality and hamper the performance of downstream vision tasks, such as object detection and instance segmentation in the field of autonomous driving, making low-light enhancement an indispensable basic component of high-level visual tasks. Low-light enhancement aims to mitigate these issues, and has garnered extensive attention and research over several decades. The primary challenge in low-light image enhancement arises from the low signal-to-noise ratio (SNR) caused by insufficient lighting. This challenge becomes even more pronounced in near-zero lux conditions, where noise overwhelms the available image information. Both traditional image signal processing (ISP) pipeline and conventional low-light image enhancement methods struggle in such scenarios. Recently, deep neural networks have been used to address this challenge. These networks take unmodified RAW images as input and produce the enhanced sRGB images, forming a deep learning-based ISP pipeline. However, most of these networks are computationally expensive and thus far from practical use. In this paper, we propose a lightweight model called attentive dilated U-Net (ADU-Net) to tackle this issue. Our model incorporates several innovative designs, including an asymmetric U-shape architecture, dilated residual modules (DRMs) for feature extraction, and attentive fusion modules (AFMs) for feature fusion. The DRMs provide strong representative capability while the AFMs effectively leverage low-level texture information and high-level semantic information within the network. Both modules employ a lightweight design but offer significant performance gains. Extensive experiments demonstrate our method is highly-effective, achieving an excellent balance between image quality and computational complexity, i.e., taking less than 4ms for a high-definition 4K image on a single GTX 1080Ti GPU and yet maintaining competitive visual quality. Furthermore, our method exhibits pleasing scalability and generalizability, highlighting its potential for widespread applicability.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Multimedia Computing Communications and Applications 工程技术-计算机：理论方法

CiteScore

8.50

自引率

5.90%

发文量

285

审稿时长

7.5 months

期刊介绍： The ACM Transactions on Multimedia Computing, Communications, and Applications is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It is soliciting paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome. TOMM is a peer-reviewed, archival journal, available in both print form and digital form. The Journal is published quarterly; with roughly 7 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure a timely publication. The transactions consists primarily of research papers. This is an archival journal and it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.