Mobile-PBR: A 28-nm Energy-Efficient Rendering Processor for Photorealistic Augmented Reality With Inverse Rendering and Background Clustering
Shiyu Guo, Yuhao Ju, Xi Chen, Sachin S. Sapatnekar, Jie Gu
IEEE Journal of Solid-State Circuits, vol. 60, no. 1, pp. 125-135. Published online 2024-11-05. DOI: 10.1109/JSSC.2024.3484212. IEEE Xplore: https://ieeexplore.ieee.org/document/10744559/
Abstract
This work presents a low-power physically based ray-tracing (PBRT) rendering processor for photorealistic augmented reality (AR) rendering on mobile devices, referred to as the mobile physical-based renderer (Mobile-PBR). By introducing inverse rendering (IR) and background clustering, Mobile-PBR enables complex photorealistic lighting effects such as reflection, refraction, and shadows with minimal resources on mobile edge devices. The key features of this work include: 1) an ASIC rendering processor that embeds an end-to-end ray-tracing (RT) solution with IR for AR on mobile devices; 2) a reconfigurable mixed-precision processing element (PE) design supporting diverse computing tasks in both IR and RT modes; 3) a background-clustered, field-of-view (FOV)-focused 3-D construction that reduces conventional background scene complexity from $O(n \log n)$ to $O(1)$; 4) a scalable partitioning scheme for complex 3-D objects, with an average $13\times$ speedup on the test scenes; and 5) a global RT scheduler (GRTS) and a global memory access controller (GMAC) that overcome irregular memory access patterns and varying PE runtimes, yielding an overall $684\times$ speedup over the baseline design. A 28-nm test chip was fabricated, demonstrating power efficiencies of 500 and 1418 frames/s/W in IR and RT modes, respectively, and achieving $28.8\times$ and $3.95\times$ higher RT rendering efficiency compared with existing ASIC solutions. The chip reaches an average of 25.8 frames/s across the test scenes, enabling real-time physically based RT rendering on mobile edge devices.
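Because 1 W = 1 J/s, an efficiency quoted in frames/s/W is equivalently frames per joule, so its reciprocal is the energy spent per frame. The short derivation below is not stated in the abstract; it assumes each reported efficiency $\eta$ (a symbol introduced here) applies at its own mode's operating point and simply converts it into an approximate energy per frame:

$$E_{\text{frame}} = \frac{1}{\eta}, \qquad 1\ \text{frame/s/W} = 1\ \text{frame/J}$$
$$E_{\text{RT}} = \frac{1}{1418\ \text{frames/J}} \approx 7.05 \times 10^{-4}\ \text{J} \approx 0.705\ \text{mJ per frame}$$
$$E_{\text{IR}} = \frac{1}{500\ \text{frames/J}} = 2.0 \times 10^{-3}\ \text{J} = 2.0\ \text{mJ per frame}$$

Under these assumptions, a frame in RT mode costs roughly $1418/500 \approx 2.8$ times less energy than a frame in IR mode.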
About the journal:
The IEEE Journal of Solid-State Circuits publishes papers each month in the broad area of solid-state circuits with particular emphasis on transistor-level design of integrated circuits. It also provides coverage of topics such as circuit modeling, technology, systems design, layout, and testing that relate directly to IC design. Integrated circuits and VLSI are of principal interest; material related to discrete circuit design is seldom published. Experimental verification is strongly encouraged.