Efficient Hybrid Zoom Using Camera Fusion on Mobile Phones

ACM Transactions on Graphics (TOG), published 2023-12-04. DOI: 10.1145/3618362

Xiaotong Wu, Wei-Sheng Lai, Yi-Chang Shih, Charles Herrmann, Michael Krainin, Deqing Sun, Chia-Kai Liang

Abstract

DSLR cameras can achieve multiple zoom levels by shifting lens distances or swapping lens types. However, these techniques are not possible on smartphones due to space constraints. Most smartphone manufacturers adopt a hybrid zoom system: commonly a Wide (W) camera at a low zoom level and a Telephoto (T) camera at a high zoom level. To simulate zoom levels between W and T, these systems crop and digitally upsample images from W, leading to significant detail loss. In this paper, we propose an efficient system for hybrid zoom super-resolution on mobile devices, which captures a synchronous pair of W and T shots and leverages machine learning models to align and transfer details from T to W. We further develop an adaptive blending method that accounts for depth-of-field mismatches, scene occlusion, flow uncertainty, and alignment errors. To minimize the domain gap, we design a dual-phone camera rig to capture real-world inputs and ground-truth images for supervised training. Our method generates a 12-megapixel image in 500 ms on a mobile platform and compares favorably against state-of-the-art methods in extensive evaluations on real-world scenes.
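To make the two pipelines in the abstract concrete, here is a minimal, dependency-free sketch. It is not the authors' implementation. It shows the conventional "crop and digitally upsample W" baseline, together with a generic per-pixel blend against a T frame that is assumed to be already warped onto W's pixel grid. The helper names (`crop_center`, `upsample_bilinear`, `blend`, `tele_coverage`) are hypothetical. The confidence masks are placeholders for the depth-of-field, occlusion, flow-uncertainty, and alignment-error checks the paper mentions; how the paper actually computes and combines them is not described here. `tele_coverage` assumes co-axial cameras with no parallax and defines zoom as the inverse of field-of-view width.

```python
# Hypothetical sketch, NOT the paper's method: digital-zoom baseline for the
# Wide (W) frame plus a confidence-weighted blend with an already-aligned
# Telephoto (T) frame. Images are grayscale nested lists of floats.


def tele_coverage(zoom, zoom_t, zoom_w=1.0):
    """Fraction of the output width covered by T at an intermediate zoom.
    Output FOV ~ zoom_w / zoom and T FOV ~ zoom_w / zoom_t (relative to W),
    so the ratio is zoom / zoom_t. Assumes co-axial cameras, no parallax."""
    assert zoom_w <= zoom <= zoom_t
    return zoom / zoom_t


def crop_center(img, zoom):
    """Keep the central 1/zoom portion of img (the 'digital zoom' crop)."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, round(h / zoom)), max(1, round(w / zoom))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in img[top:top + ch]]


def upsample_bilinear(img, out_h, out_w):
    """Bilinear resize. Interpolation cannot restore detail lost in the crop."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(out_h):
        # Map the output pixel center back to input coordinates, clamped to the image.
        sy = min(max((y + 0.5) * h / out_h - 0.5, 0.0), h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(out_w):
            sx = min(max((x + 0.5) * w / out_w - 0.5, 0.0), w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out


def blend(w_up, t_aligned, masks):
    """out = (1 - a) * W_up + a * T_aligned, where a is the product of the
    per-pixel confidence masks (each in [0, 1]). Pixels outside T's field
    of view should carry confidence 0 in at least one mask."""
    out = []
    for y, (w_row, t_row) in enumerate(zip(w_up, t_aligned)):
        row = []
        for x, (wv, tv) in enumerate(zip(w_row, t_row)):
            a = 1.0
            for m in masks:
                a *= m[y][x]
            row.append((1 - a) * wv + a * tv)
        out.append(row)
    return out
```

For example, with W at 1x and T at 5x, an output at 3x gives `tele_coverage(3.0, 5.0)`, which is 0.6. T can therefore supply detail for only the central 60% of the output width, and the remaining pixels fall back to the upsampled W crop. Multiplying the masks means that any single failed check (for example, an occluded or defocused pixel) is enough to suppress T at that pixel. This is one simple design choice among several possible ones.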
