Title: Multi-Scale Video Inverse Tone Mapping with Deformable Alignment
Authors: Jiaqi Zou, Ke Mei, Songlin Sun
Venue: 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)
Published: 2020-12-01
DOI: 10.1109/VCIP49819.2020.9301780
Citations: 3
Abstract
Inverse tone mapping (iTM) transforms low-dynamic-range (LDR) content into high-dynamic-range (HDR) content and is an effective technique for improving the visual experience. iTM has developed rapidly with deep learning in recent years. However, the great majority of deep-learning-based iTM methods are aimed at images and ignore the temporal correlations of consecutive frames in videos. In this paper, we propose a multi-scale video iTM network with deformable alignment, which improves temporal consistency in videos. We first align the input consecutive LDR frames at the feature level by deformable convolutions and then use multi-frame information simultaneously to generate the HDR frame. Additionally, we adopt a multi-scale iTM architecture with a pyramid pooling module, which enables our network to reconstruct details as well as global features. The proposed network achieves better performance than other iTM methods on quantitative metrics and gains a significant visual improvement.
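The feature-level alignment described above relies on deformable convolutions, which sample neighbour-frame features at learned offset locations rather than on a fixed grid. As a minimal sketch of just that sampling step (not the authors' implementation — a real deformable convolution learns multiple offsets per kernel tap and applies convolution weights afterwards), the NumPy snippet below bilinearly warps a feature map by a per-pixel offset field; the function name `warp_with_offsets` and the tensor shapes are illustrative assumptions:

```python
import numpy as np

def warp_with_offsets(feat, offsets):
    """Bilinearly sample `feat` at per-pixel offset locations.

    feat:    (C, H, W) neighbour-frame feature map
    offsets: (2, H, W) per-pixel (dy, dx) displacements
    Returns an aligned (C, H, W) feature map.
    """
    C, H, W = feat.shape
    # Absolute sampling coordinates, clamped to the image border.
    ys = np.clip(np.arange(H)[:, None] + offsets[0], 0, H - 1)
    xs = np.clip(np.arange(W)[None, :] + offsets[1], 0, W - 1)
    # Integer corners and fractional weights for bilinear interpolation.
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wy = ys - y0
    wx = xs - x0
    # Blend the four neighbouring samples.
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bot = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

In the paper's setting the offsets would be predicted by a small network from the concatenated reference and neighbour features, so that each neighbouring LDR frame is warped into alignment with the reference before multi-frame fusion.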