SGformer：用于从单张图像估算室内照明的增强变换器

IF 17.3 3区计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Computational Visual Media Pub Date : 2024-08-21 DOI:10.1007/s41095-024-0447-8

Junhong Zhao, Bing Xue, Mengjie Zhang

{"title":"SGformer：用于从单张图像估算室内照明的增强变换器","authors":"Junhong Zhao, Bing Xue, Mengjie Zhang","doi":"10.1007/s41095-024-0447-8","DOIUrl":null,"url":null,"abstract":"<p>Estimating lighting from standard images can effectively circumvent the need for resource-intensive high-dynamic-range (HDR) lighting acquisition. However, this task is often ill-posed and challenging, particularly for indoor scenes, due to the intricacy and ambiguity inherent in various indoor illumination sources. We propose an innovative transformer-based method called SGformer for lighting estimation through modeling spherical Gaussian (SG) distributions—a compact yet expressive lighting representation. Diverging from previous approaches, we explore underlying local and global dependencies in lighting features, which are crucial for reliable lighting estimation. Additionally, we investigate the structural relationships spanning various resolutions of SG distributions, ranging from sparse to dense, aiming to enhance structural consistency and curtail potential stochastic noise stemming from independent SG component regressions. By harnessing the synergy of local-global lighting representation learning and incorporating consistency constraints from various SG resolutions, the proposed method yields more accurate lighting estimates, allowing for more realistic lighting effects in object relighting and composition. Our code and model implementing our work can be found at https://github.com/junhong-jennifer-zhao/SGformer.\n</p>","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":"10 1","pages":""},"PeriodicalIF":17.3000,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SGformer: Boosting transformers for indoor lighting estimation from a single image\",\"authors\":\"Junhong Zhao, Bing Xue, Mengjie Zhang\",\"doi\":\"10.1007/s41095-024-0447-8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Estimating lighting from standard images can effectively circumvent the need for resource-intensive high-dynamic-range (HDR) lighting acquisition. However, this task is often ill-posed and challenging, particularly for indoor scenes, due to the intricacy and ambiguity inherent in various indoor illumination sources. We propose an innovative transformer-based method called SGformer for lighting estimation through modeling spherical Gaussian (SG) distributions—a compact yet expressive lighting representation. Diverging from previous approaches, we explore underlying local and global dependencies in lighting features, which are crucial for reliable lighting estimation. Additionally, we investigate the structural relationships spanning various resolutions of SG distributions, ranging from sparse to dense, aiming to enhance structural consistency and curtail potential stochastic noise stemming from independent SG component regressions. By harnessing the synergy of local-global lighting representation learning and incorporating consistency constraints from various SG resolutions, the proposed method yields more accurate lighting estimates, allowing for more realistic lighting effects in object relighting and composition. Our code and model implementing our work can be found at https://github.com/junhong-jennifer-zhao/SGformer.\\n</p>\",\"PeriodicalId\":37301,\"journal\":{\"name\":\"Computational Visual Media\",\"volume\":\"10 1\",\"pages\":\"\"},\"PeriodicalIF\":17.3000,\"publicationDate\":\"2024-08-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computational Visual Media\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s41095-024-0447-8\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Visual Media","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s41095-024-0447-8","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

摘要

从标准图像中估算光照可以有效地避免对资源密集型高动态范围（HDR）光照采集的需求。然而，由于各种室内照明光源错综复杂、模棱两可，这项任务往往困难重重、极具挑战性，尤其是在室内场景中。我们提出了一种基于变压器的创新方法，称为 SGformer，通过对球形高斯分布（SG）建模来进行光照估计--球形高斯分布是一种紧凑而又富有表现力的光照表示。与以往的方法不同，我们探索了照明特征中潜在的局部和全局依赖关系，这对于可靠的照明估计至关重要。此外，我们还研究了从稀疏到密集的各种 SG 分布分辨率之间的结构关系，旨在增强结构的一致性，并减少独立 SG 分量回归产生的潜在随机噪声。通过利用局部-全局照明表征学习的协同作用，并结合来自不同 SG 分辨率的一致性约束，所提出的方法可以获得更准确的照明估计，从而在物体重照和合成中实现更逼真的照明效果。我们的代码和模型可在 https://github.com/junhong-jennifer-zhao/SGformer 上找到。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

SGformer: Boosting transformers for indoor lighting estimation from a single image

Estimating lighting from standard images can effectively circumvent the need for resource-intensive high-dynamic-range (HDR) lighting acquisition. However, this task is often ill-posed and challenging, particularly for indoor scenes, due to the intricacy and ambiguity inherent in various indoor illumination sources. We propose an innovative transformer-based method called SGformer for lighting estimation through modeling spherical Gaussian (SG) distributions—a compact yet expressive lighting representation. Diverging from previous approaches, we explore underlying local and global dependencies in lighting features, which are crucial for reliable lighting estimation. Additionally, we investigate the structural relationships spanning various resolutions of SG distributions, ranging from sparse to dense, aiming to enhance structural consistency and curtail potential stochastic noise stemming from independent SG component regressions. By harnessing the synergy of local-global lighting representation learning and incorporating consistency constraints from various SG resolutions, the proposed method yields more accurate lighting estimates, allowing for more realistic lighting effects in object relighting and composition. Our code and model implementing our work can be found at https://github.com/junhong-jennifer-zhao/SGformer.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Computational Visual Media Computer Science-Computer Graphics and Computer-Aided Design

CiteScore

16.90

自引率

5.80%

发文量

243

审稿时长

6 weeks

期刊介绍： Computational Visual Media is a peer-reviewed open access journal. It publishes original high-quality research papers and significant review articles on novel ideas, methods, and systems relevant to visual media. Computational Visual Media publishes articles that focus on, but are not limited to, the following areas: • Editing and composition of visual media • Geometric computing for images and video • Geometry modeling and processing • Machine learning for visual media • Physically based animation • Realistic rendering • Recognition and understanding of visual media • Visual computing for robotics • Visualization and visual analytics Other interdisciplinary research into visual media that combines aspects of computer graphics, computer vision, image and video processing, geometric computing, and machine learning is also within the journal''s scope. This is an open access journal, published quarterly by Tsinghua University Press and Springer. The open access fees (article-processing charges) are fully sponsored by Tsinghua University, China. Authors can publish in the journal without any additional charges.

期刊最新文献

TrafPS: A shapley-based visual analytics approach to interpret traffic CLIP-Flow: Decoding images encoded in CLIP space CLIP-SP: Vision-language model with adaptive prompting for scene parsing SGformer: Boosting transformers for indoor lighting estimation from a single image Central similarity consistency hashing for asymmetric image retrieval