Search Results

Now showing 1 - 4 of 4
  • Item
    Evaluating AI-based static stereoscopic rendering of indoor panoramic scenes
    (The Eurographics Association, 2024) Jashari, Sara; Tukur, Muhammad; Boraey, Yehia; Alzubaidi, Mahmood; Pintore, Giovanni; Gobbetti, Enrico; Villanueva, Alberto Jaspe; Schneider, Jens; Fetais, Noora; Agus, Marco; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
    Panoramic imaging has recently become a widely used technology for representing and exploring indoor environments. Panoramic cameras generate omnidirectional images that provide a comprehensive 360-degree view, making them a valuable tool for applications such as virtual tours in real estate, architecture, and cultural heritage. However, constructing truly immersive experiences from panoramic images presents challenges, particularly in generating panoramic stereo pairs that offer consistent depth cues and visual comfort across all viewing directions. Traditional stereo-imaging techniques do not directly apply to spherical panoramic images, which require complex processing to avoid artifacts that can disrupt immersion. To address these challenges, various imaging and processing technologies have been developed, including multi-camera systems and computational methods that generate stereo images from a single panoramic input. Although effective, these solutions often involve complicated hardware and processing pipelines. Recently, deep learning approaches have emerged that enable novel view generation from single panoramic images. While these methods show promise, they have not yet been thoroughly evaluated in practical scenarios. This paper presents a series of evaluation experiments aimed at assessing different technologies for creating static stereoscopic environments from omnidirectional imagery, with a focus on 3DOF immersive exploration. A user study was conducted using a WebXR prototype and a Meta Quest 3 headset to quantitatively and qualitatively compare traditional image composition techniques with AI-based methods. Our results indicate that while traditional methods provide a satisfactory level of immersion, AI-based generation is nearing a quality level suitable for deployment in web-based environments.
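As a rough illustration of the depth-based view synthesis that stereo-from-panorama pipelines build on (not the specific traditional or AI-based methods evaluated in the paper), the sketch below shifts an equirectangular image horizontally by a depth-dependent angular disparity to approximate a second eye view; `pano`, `depth`, and the interpupillary distance `ipd` are assumed inputs.

```python
# Hypothetical sketch: approximate a right-eye equirectangular view by
# re-sampling the left-eye panorama with a depth-dependent horizontal shift.
import numpy as np

def synthesize_right_eye(pano, depth, ipd=0.064):
    """pano: (H, W, 3) equirectangular image; depth: (H, W) distances in metres."""
    h, w, _ = pano.shape
    # A point at distance d seen from an eye offset by ipd/2 shifts by roughly
    # asin((ipd/2) / d) radians of longitude (small-baseline approximation).
    disparity = np.arcsin(np.clip((ipd / 2.0) / np.maximum(depth, 1e-3), -1.0, 1.0))
    shift_px = np.round(disparity / (2.0 * np.pi) * w).astype(int)
    cols = np.arange(w)[None, :]
    rows = np.arange(h)[:, None]
    src = (cols + shift_px) % w          # wrap around the 360-degree longitude
    return pano[rows, src]               # gather horizontally shifted pixels

# right = synthesize_right_eye(pano, depth)  # (pano, right) feed a 3DOF stereo viewer
```

A real pipeline must additionally handle disocclusions and pole distortions, which is precisely where learned novel-view generation becomes attractive.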
  • Item
    DDD: Deep indoor panoramic Depth estimation with Density maps consistency
    (The Eurographics Association, 2024) Pintore, Giovanni; Agus, Marco; Signoroni, Alberto; Gobbetti, Enrico; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
    We introduce a novel deep neural network for rapid and structurally consistent monocular 360° depth estimation in indoor environments. The network infers a depth map from a single gravity-aligned or gravity-rectified equirectangular image of the environment, ensuring that the predicted depth aligns with the typical depth distribution and features of cluttered interior spaces, which are usually enclosed by walls, ceilings, and floors. By leveraging the distinct characteristics of vertical and horizontal features in man-made indoor environments, we introduce a lean network architecture that employs gravity-aligned feature flattening and specialized vision transformers that utilize the input's omnidirectional nature, without segmentation into patches and positional encoding. To enhance the structural consistency of the predicted depth, we introduce a new loss function that evaluates the consistency of density maps by projecting points derived from the inferred depth map onto horizontal and vertical planes. This lightweight architecture has very small computational demands, provides greater structural consistency than competing methods, and does not require the explicit imposition of strong structural priors.
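The density-map consistency idea can be pictured with a small sketch. The following code is an assumed, simplified formulation rather than the authors' exact loss: it back-projects an equirectangular depth map to 3D points and compares top-down (floor-plane) occupancy histograms of the predicted and ground-truth depths. `pred_depth` and `gt_depth` are assumed (H, W) tensors; hard binning is used for clarity, whereas training would require a differentiable soft binning.

```python
# Hypothetical sketch of a horizontal density-map consistency term.
import math
import torch

def depth_to_points(depth):
    """depth: (H, W) distances for a gravity-aligned equirectangular image."""
    h, w = depth.shape
    lon = (torch.arange(w, dtype=depth.dtype) / w - 0.5) * 2.0 * math.pi
    lat = (0.5 - torch.arange(h, dtype=depth.dtype) / h) * math.pi
    lat, lon = torch.meshgrid(lat, lon, indexing="ij")
    x = depth * torch.cos(lat) * torch.sin(lon)
    y = depth * torch.sin(lat)                      # gravity (vertical) axis
    z = depth * torch.cos(lat) * torch.cos(lon)
    return torch.stack([x, y, z], dim=-1).reshape(-1, 3)

def horizontal_density(points, bins=64, extent=8.0):
    """Top-down occupancy histogram over a (2*extent) x (2*extent) metre area."""
    xz = points[:, [0, 2]].clamp(-extent, extent - 1e-4)
    idx = ((xz + extent) / (2.0 * extent) * bins).long()
    flat = idx[:, 0] * bins + idx[:, 1]
    hist = torch.zeros(bins * bins, dtype=points.dtype)
    hist.scatter_add_(0, flat, torch.ones_like(flat, dtype=points.dtype))
    return (hist / hist.sum()).reshape(bins, bins)

def density_consistency_loss(pred_depth, gt_depth):
    d_pred = horizontal_density(depth_to_points(pred_depth))
    d_gt = horizontal_density(depth_to_points(gt_depth))
    return torch.abs(d_pred - d_gt).sum()           # L1 between density maps
```

The same construction applied to the (x, y) and (z, y) projections gives the vertical-plane counterparts mentioned in the abstract.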
  • Item
    Disk-NeuralRTI: Optimized NeuralRTI Relighting through Knowledge Distillation
    (The Eurographics Association, 2024) Dulecha, Tinsae Gebrechristos; Righetto, Leonardo; Pintus, Ruggero; Gobbetti, Enrico; Giachetti, Andrea; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
    Relightable images created from Multi-Light Image Collections (MLICs) are among the most employed models for interactive object exploration in cultural heritage (CH). In recent years, neural representations have been shown to produce higher-quality images at similar storage costs to the more classic analytical models such as Polynomial Texture Maps (PTM) or Hemispherical Harmonics (HSH). However, the Neural RTI models proposed in the literature perform the image relighting with decoder networks with a high number of parameters, making decoding slower than for classical methods. Despite recent efforts targeting model reduction and multi-resolution adaptive rendering, exploring high-resolution images, especially on high-pixel-count displays, still requires significant resources and is only achievable through progressive rendering in typical setups. In this work, we show how, by using knowledge distillation from an original (teacher) Neural RTI network, it is possible to create a more efficient RTI decoder (student network). We evaluated the performance of the network compression approach on existing RTI relighting benchmarks, including both synthetic and real datasets, and on novel acquisitions of high-resolution images. Experimental results show that we can keep the student prediction close to the teacher with up to 80% parameter reduction and almost ten times faster rendering when embedded in an online viewer.
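Knowledge distillation of a relighting decoder can be sketched as follows. This is an assumed toy setup (the layer sizes, latent dimension, and the randomly initialized stand-in teacher and pixel codes are hypothetical, not the paper's networks): a small student MLP is trained to reproduce the output of a much larger teacher MLP over per-pixel codes and light directions.

```python
# Hypothetical sketch: distill a large per-pixel relighting decoder (teacher)
# into a smaller one (student) by matching their RGB outputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(sizes):
    layers = []
    for a, b in zip(sizes[:-1], sizes[1:]):
        layers += [nn.Linear(a, b), nn.ELU()]
    return nn.Sequential(*layers[:-1])      # drop the activation after the last layer

latent_dim = 9                                   # assumed per-pixel code size
teacher = mlp([latent_dim + 2, 64, 64, 64, 3])   # input: code + 2D light direction
student = mlp([latent_dim + 2, 16, 3])           # far fewer parameters, faster decoding

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for _ in range(1000):
    codes = torch.randn(4096, latent_dim)        # stand-in for encoder outputs per pixel
    light = torch.rand(4096, 2) * 2.0 - 1.0      # (lx, ly) light directions in [-1, 1]
    x = torch.cat([codes, light], dim=1)
    with torch.no_grad():
        target = teacher(x)                      # teacher's relit RGB prediction
    loss = F.mse_loss(student(x), target)
    opt.zero_grad(); loss.backward(); opt.step()
```

In an actual distillation run the teacher would be a trained NeuralRTI decoder and the codes would come from its encoder over real MLIC pixels; the sketch only conveys the loss structure.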
  • Item
    VISPI: Virtual Staging Pipeline for Single Indoor Panoramic Images
    (The Eurographics Association, 2024) Shah, Uzair; Jashari, Sara; Tukur, Muhammad; Pintore, Giovanni; Gobbetti, Enrico; Schneider, Jens; Agus, Marco; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
    Taking a 360° image is the quickest and most cost-effective way to capture the entire environment around the viewer in a form that can be directly exploited for creating immersive content [PBAG23]. In this work, we introduce novel solutions for the virtual staging of indoor environments, supporting automatic emptying, object insertion, and relighting. Our solution, dubbed VISPI (Virtual Staging Pipeline for Single Indoor Panoramic Images), integrates data-driven processing components that exploit knowledge learned from massive data collections within a real-time rendering and editing system, allowing for interactive restaging of indoor scenes. Key components of VISPI include: i) a holistic architecture based on a multi-task vision transformer for extracting geometric, semantic, and material information from a single panoramic image, ii) a lighting model based on spherical Gaussians, iii) a method for lighting estimation from the geometric, semantic, and material signals, and iv) a real-time editing and rendering component. The proposed framework provides an interactive and user-friendly solution for creating immersive visualizations of indoor spaces. We present a preliminary assessment of VISPI using a synthetic dataset (Structured3D) and demonstrate its application in creating restaged indoor scenes.
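The spherical-Gaussian lighting component can be illustrated with a generic formulation (an assumption of this note, not necessarily VISPI's exact parameterization): environment radiance is approximated as a sum of lobes G(v; xi, lambda, mu) = mu * exp(lambda * (dot(v, xi) - 1)), which the sketch below evaluates for a batch of query directions.

```python
# Hypothetical sketch: evaluate an environment light expressed as a sum of
# spherical Gaussian lobes for a set of unit query directions.
import numpy as np

def eval_spherical_gaussians(dirs, axes, sharpness, amplitudes):
    """dirs: (N, 3) unit directions; axes: (K, 3) unit lobe axes;
    sharpness: (K,) lambda values; amplitudes: (K, 3) RGB mu values."""
    cos = dirs @ axes.T                                   # (N, K) dot products
    weights = np.exp(sharpness[None, :] * (cos - 1.0))    # lobe falloff
    return weights @ amplitudes                           # (N, 3) radiance

# Example: a sharp warm lobe from above plus a broad dim fill from the side.
axes = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
sharpness = np.array([20.0, 4.0])
amplitudes = np.array([[3.0, 2.5, 2.0], [0.4, 0.4, 0.5]])
v = np.array([[0.0, 1.0, 0.0], [0.7071, 0.7071, 0.0]])
print(eval_spherical_gaussians(v, axes, sharpness, amplitudes))
```

Because each lobe is a simple exponential of a dot product, such a representation can be evaluated per fragment in a real-time shader, which fits the interactive editing and rendering goal described in the abstract.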