Multi-scale Monocular Panorama Depth Estimation

Authors: Mohadikar, Payal; Fan, Chuanmao; Zhao, Chenxi; Duan, Ye
Editors: Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Date: 2023-10-09
ISBN: 978-3-03868-234-9
DOI: https://doi.org/10.2312/pg.20231282
URL: https://diglib.eg.org:443/handle/10.2312/pg20231282
Pages: 113-114 (2 pages)
License: Attribution 4.0 International License
CCS Concepts: Computing methodologies -> Scene understanding

Abstract: Panorama images are widely used for scene depth estimation because they provide a comprehensive representation of the scene. Existing deep-learning monocular panorama depth estimation networks produce inconsistent, discontinuous, and poor-quality depth maps. To overcome this, we propose a novel multi-scale monocular panorama depth estimation framework. We use a coarse-to-fine depth estimation approach, in which multi-scale tangent perspective images, projected from 360° images, are fed to coarse and fine encoder-decoder networks to produce multi-scale perspective depth maps, which are then merged to obtain low- and high-resolution 360° depth maps. The coarse branch extracts holistic features that guide the features extracted by the fine branch through a Multi-Scale Feature Fusion (MSFF) module at the network bottleneck. Experiments on the Stanford2D3D benchmark dataset show that our model outperforms existing methods, producing consistent, smooth, structurally detailed, and accurate depth maps.
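The abstract's MSFF idea, where low-resolution holistic features from the coarse branch guide the fine branch at the bottleneck, can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification: the names `upsample_nearest` and `msff_fuse` and the fixed additive blend are assumptions for illustration, not the paper's learned fusion module.

```python
import numpy as np

def upsample_nearest(x, factor):
    # Nearest-neighbour upsampling of a (C, H, W) feature map.
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def msff_fuse(coarse_feat, fine_feat):
    # Hypothetical fusion step: bring the coarse (holistic) features
    # up to the fine-branch resolution and blend them in, so the
    # coarse branch "guides" the fine branch at the bottleneck.
    # The paper's MSFF module is learned; this fixed 50/50 blend is
    # only an illustrative stand-in.
    _, h, _ = fine_feat.shape
    factor = h // coarse_feat.shape[1]
    guided = upsample_nearest(coarse_feat, factor)
    return 0.5 * guided + 0.5 * fine_feat

coarse = np.ones((8, 4, 4))    # low-resolution holistic features
fine = np.zeros((8, 16, 16))   # high-resolution fine-branch features
fused = msff_fuse(coarse, fine)
print(fused.shape)  # (8, 16, 16)
```

The same upsample-and-blend pattern would apply per scale when merging the multi-scale perspective depth maps into low- and high-resolution 360° depth maps.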