{"title":"基于室内环境单幅图像的结构感知照明估计","authors":"Junhong Zhao;Bing Xue;Mengjie Zhang","doi":"10.1109/TIP.2024.3512381","DOIUrl":null,"url":null,"abstract":"High Dynamic Range (HDR) lighting plays a pivotal role in modern augmented and mixed-reality (AR/MR) applications, facilitating immersive experiences through realistic object insertion and dynamic relighting. However, the acquisition of precise HDR environment maps remains cost-prohibitive and impractical when using standard devices. To bridge this gap, this paper introduces SALENet, a novel deep network for estimating global lighting conditions from a single image, to effectively mitigate the need for resource-intensive acquisition methods. In contrast to earlier studies, we focus on exploring the inherent structural relationships within the lighting distribution. We design a hierarchical transformer-based neural network architecture with a proposed cross-attention mechanism between different resolution lighting source representations, optimizing the spatial distribution of lighting sources simultaneously for enhanced consistency. To further improve accuracy, a structure-based contrastive learning method is proposed to select positive-negative pairs based on lighting distribution similarity. By harnessing the synergy of hierarchical transformers and structure-based contrastive learning, our framework yields a significant enhancement in lighting prediction accuracy, enabling high-fidelity augmented and mixed reality to achieve cost-effectively immersive and realistic lighting effects.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6806-6820"},"PeriodicalIF":0.0000,"publicationDate":"2024-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SALENet: Structure-Aware Lighting Estimations From a Single Image for Indoor Environments\",\"authors\":\"Junhong Zhao;Bing Xue;Mengjie Zhang\",\"doi\":\"10.1109/TIP.2024.3512381\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"High Dynamic Range (HDR) lighting plays a pivotal role in modern augmented and mixed-reality (AR/MR) applications, facilitating immersive experiences through realistic object insertion and dynamic relighting. However, the acquisition of precise HDR environment maps remains cost-prohibitive and impractical when using standard devices. To bridge this gap, this paper introduces SALENet, a novel deep network for estimating global lighting conditions from a single image, to effectively mitigate the need for resource-intensive acquisition methods. In contrast to earlier studies, we focus on exploring the inherent structural relationships within the lighting distribution. We design a hierarchical transformer-based neural network architecture with a proposed cross-attention mechanism between different resolution lighting source representations, optimizing the spatial distribution of lighting sources simultaneously for enhanced consistency. To further improve accuracy, a structure-based contrastive learning method is proposed to select positive-negative pairs based on lighting distribution similarity. By harnessing the synergy of hierarchical transformers and structure-based contrastive learning, our framework yields a significant enhancement in lighting prediction accuracy, enabling high-fidelity augmented and mixed reality to achieve cost-effectively immersive and realistic lighting effects.\",\"PeriodicalId\":94032,\"journal\":{\"name\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"volume\":\"33 \",\"pages\":\"6806-6820\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-12-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10794602/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10794602/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
SALENet: Structure-Aware Lighting Estimations From a Single Image for Indoor Environments
High Dynamic Range (HDR) lighting plays a pivotal role in modern augmented and mixed-reality (AR/MR) applications, facilitating immersive experiences through realistic object insertion and dynamic relighting. However, the acquisition of precise HDR environment maps remains cost-prohibitive and impractical when using standard devices. To bridge this gap, this paper introduces SALENet, a novel deep network for estimating global lighting conditions from a single image, to effectively mitigate the need for resource-intensive acquisition methods. In contrast to earlier studies, we focus on exploring the inherent structural relationships within the lighting distribution. We design a hierarchical transformer-based neural network architecture with a proposed cross-attention mechanism between different resolution lighting source representations, optimizing the spatial distribution of lighting sources simultaneously for enhanced consistency. To further improve accuracy, a structure-based contrastive learning method is proposed to select positive-negative pairs based on lighting distribution similarity. By harnessing the synergy of hierarchical transformers and structure-based contrastive learning, our framework yields a significant enhancement in lighting prediction accuracy, enabling high-fidelity augmented and mixed reality to achieve cost-effectively immersive and realistic lighting effects.