Search Results (showing 1-8 of 8)
Item: An Interactive Tuning Method for Generator Networks Trained by GAN (The Eurographics Association, 2022)
Authors: Zhou, Mengyuan; Yamaguchi, Yasushi
Editors: Cabiddu, Daniela; Schneider, Teseo; Allegra, Dario; Catalano, Chiara Eva; Cherchi, Gianmarco; Scateni, Riccardo
Abstract: Recent studies on GANs have achieved impressive results in image synthesis. However, their outputs are still imperfect, and generated images may contain unnatural regions. We propose a tuning method for generator networks trained by GAN that improves their results by interactively removing unexpected objects and textures or changing object colors. Our method finds and ablates the units in the generator network that are highly related to specific regions or their colors. Compared to related studies, our method can tune pre-trained generator networks without relying on any additional information such as segmentation-based networks. We built an interactive system based on our method, capable of tuning generator networks so that the resulting images match expectations. The experiments show that our method removes only the unexpected objects and textures, and it can also change the color of a selected area. The method also offers hints for discussing the properties of generator networks: which layers and units are associated with objects, textures, or colors.

Item: STRONGER: Simple TRajectory-based ONline GEsture Recognizer (The Eurographics Association, 2021)
Authors: Emporio, Marco; Caputo, Ariel; Giachetti, Andrea
Editors: Frosini, Patrizio; Giorgi, Daniela; Melzi, Simone; Rodolà, Emanuele
Abstract: In this paper, we present STRONGER, a client-server solution for online gesture recognition from captured hand-joint sequences. The system leverages a CNN-based recognizer that improves on current state-of-the-art solutions for segmented gesture classification, trained and tested for the online gesture recognition task on a recent benchmark including heterogeneous gestures.
The recognizer provides good classification accuracy and a limited number of false positives on most of the gesture classes of the benchmark, and it has been used to create a demo application in a Mixed Reality scenario using a HoloLens 2 optical see-through head-mounted display with hand-tracking capability.

Item: Exploring Upper Limb Segmentation with Deep Learning for Augmented Virtuality (The Eurographics Association, 2021)
Authors: Gruosso, Monica; Capece, Nicola; Erra, Ugo
Editors: Frosini, Patrizio; Giorgi, Daniela; Melzi, Simone; Rodolà, Emanuele
Abstract: Sense of presence, immersion, and body ownership are among the main challenges concerning Virtual Reality (VR) and freehand-based interaction methods. Through specific hand-tracking devices, freehand-based methods allow users to use their hands to interact with the virtual environment (VE). To visualize the hands and ease freehand interaction, recent approaches use 3D meshes to represent the user's hands in the VE. However, this can reduce user immersion due to the meshes' unnatural correspondence with the real hands. To overcome this limit, we propose an augmented virtuality (AV) pipeline that allows users to visualize their own limbs in the VE. In particular, the limbs are captured by a single monocular RGB camera placed in an egocentric perspective, segmented using a deep convolutional neural network (CNN), and streamed into the VE. In addition, the hands are tracked through a Leap Motion controller to enable user interaction. We introduce two case studies as a preliminary investigation of this approach.
Finally, both quantitative and qualitative evaluations of the CNN results are provided, highlighting the effectiveness of the proposed CNN, which achieves remarkable results in several real-life unconstrained scenarios.

Item: FAST GDRNPP: Improving the Speed of State-of-the-Art 6D Object Pose Estimation (The Eurographics Association, 2024)
Authors: Pöllabauer, Thomas; Pramod, Ashwin; Knauthe, Volker; Wahl, Michael
Editors: Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Abstract: 6D object pose estimation involves determining the three-dimensional translation and rotation of an object within a scene, relative to a chosen coordinate system. This problem is of particular interest for many practical applications in industrial tasks such as quality control, bin picking, and robotic manipulation, where both speed and accuracy are critical for real-world deployment. Current models, both classical and deep-learning-based, often struggle with the trade-off between accuracy and latency. Our research focuses on enhancing the speed of a prominent state-of-the-art deep learning model, GDRNPP, while preserving its high accuracy. We employ several techniques to reduce the model size and improve inference time, including smaller and quicker backbones, pruning unnecessary parameters, and distillation to transfer knowledge from a large, high-performing model to a smaller, more efficient student model. Our findings demonstrate that the proposed configuration maintains accuracy comparable to the state of the art while significantly improving inference time.
This advancement could lead to more efficient and practical applications in various industrial scenarios, thereby enhancing the overall applicability of 6D object pose estimation models in real-world settings.

Item: Evaluating AI-based static stereoscopic rendering of indoor panoramic scenes (The Eurographics Association, 2024)
Authors: Jashari, Sara; Tukur, Muhammad; Boraey, Yehia; Alzubaidi, Mahmood; Pintore, Giovanni; Gobbetti, Enrico; Villanueva, Alberto Jaspe; Schneider, Jens; Fetais, Noora; Agus, Marco
Editors: Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Abstract: Panoramic imaging has recently become an extensively used technology for the representation and exploration of indoor environments. Panoramic cameras generate omnidirectional images that provide a comprehensive 360-degree view, making them a valuable tool for applications such as virtual tours in real estate, architecture, and cultural heritage. However, constructing truly immersive experiences from panoramic images presents challenges, particularly in generating panoramic stereo pairs that offer consistent depth cues and visual comfort across all viewing directions. Traditional stereo-imaging techniques do not directly apply to spherical panoramic images and require complex processing to avoid artifacts that can disrupt immersion. To address these challenges, various imaging and processing technologies have been developed, including multi-camera systems and computational methods that generate stereo images from a single panoramic input. Although effective, these solutions often involve complicated hardware and processing pipelines. Recently, deep learning approaches have emerged, enabling novel view generation from single panoramic images. While these methods show promise, they have not yet been thoroughly evaluated in practical scenarios.
This paper presents a series of evaluation experiments aimed at assessing different technologies for creating static stereoscopic environments from omnidirectional imagery, with a focus on 3DOF immersive exploration. A user study was conducted using a WebXR prototype and a Meta Quest 3 headset to quantitatively and qualitatively compare traditional image-composition techniques with AI-based methods. Our results indicate that while traditional methods provide a satisfactory level of immersion, AI-based generation is nearing a quality level suitable for deployment in web-based environments.

Item: DDD: Deep indoor panoramic Depth estimation with Density maps consistency (The Eurographics Association, 2024)
Authors: Pintore, Giovanni; Agus, Marco; Signoroni, Alberto; Gobbetti, Enrico
Editors: Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Abstract: We introduce a novel deep neural network for rapid and structurally consistent monocular 360° depth estimation in indoor environments. The network infers a depth map from a single gravity-aligned or gravity-rectified equirectangular image of the environment, ensuring that the predicted depth aligns with the typical depth distribution and features of cluttered interior spaces, which are usually enclosed by walls, ceilings, and floors. By leveraging the distinct characteristics of vertical and horizontal features in man-made indoor environments, we introduce a lean network architecture that employs gravity-aligned feature flattening and specialized vision transformers that exploit the input's omnidirectional nature, without segmentation into patches or positional encoding. To enhance the structural consistency of the predicted depth, we introduce a new loss function that evaluates the consistency of density maps obtained by projecting points derived from the inferred depth map onto horizontal and vertical planes.
This lightweight architecture has very small computational demands, provides greater structural consistency than competing methods, and does not require the explicit imposition of strong structural priors.

Item: A Sparse Mesh Sampling Scheme for Graph-based Relief Pattern Classification (The Eurographics Association, 2023)
Authors: Paolini, Gabriele; Guiducci, Niccolò; Tortorici, Claudio; Berretti, Stefano
Editors: Banterle, Francesco; Caggianese, Giuseppe; Capece, Nicola; Erra, Ugo; Lupinetti, Katia; Manfredi, Gilda
Abstract: In the context of geometric deep learning, the classification of relief patterns involves recognizing the surface characteristics of a 3D object, regardless of its global shape. State-of-the-art methods leverage powerful 2D image-based deep learning techniques by converting local patches of the surface into texture images. However, their effectiveness is guaranteed only when the mesh is simple enough to allow this projection onto a 2D subspace. Developing deep learning techniques that work directly on manifolds therefore represents an interesting line of research for addressing these challenges. The objective of our paper is to extend and enhance the architecture described in a recent GNN approach for relief pattern classification through the introduction of a new sampling technique for meshes. In their method, local mesh structures, referred to as SpiderPatches, are connected to form the nodes of a graph, called MeshGraph, that captures global structures of the mesh. These two data structures are then fed into a bi-level architecture based on Graph Attention Networks. The MeshGraph construction proves important in ensuring optimal classification results. With the proposed subsampling process, we tackle the problem of fine-tuning the multiple hyperparameters inherent in the MeshGraph by defining a graph structure that is aware of the mesh's geometric details.
We demonstrate that the graph constructed using this approach robustly captures the relief patterns on the surface, obviating the need for data augmentation during training. The resulting network is robust, easily customizable, and shows performance comparable to recent methods, all while operating directly on 3D data.

Item: VISPI: Virtual Staging Pipeline for Single Indoor Panoramic Images (The Eurographics Association, 2024)
Authors: Shah, Uzair; Jashari, Sara; Tukur, Muhammad; Pintore, Giovanni; Gobbetti, Enrico; Schneider, Jens; Agus, Marco
Editors: Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Abstract: Taking a 360° image is the quickest and most cost-effective way to capture the entire environment around the viewer in a form that can be directly exploited for creating immersive content [PBAG23]. In this work, we introduce novel solutions for the virtual staging of indoor environments, supporting automatic emptying, object insertion, and relighting. Our solution, dubbed VISPI (Virtual Staging Pipeline for Single Indoor Panoramic Images), integrates data-driven processing components, which take advantage of knowledge learned from massive data collections, within a real-time rendering and editing system, allowing for interactive restaging of indoor scenes. Key components of VISPI include: i) a holistic architecture based on a multi-task vision transformer for extracting geometry, semantic, and material information from a single panoramic image; ii) a lighting model based on spherical Gaussians; iii) a method for lighting estimation from the geometric, semantic, and material signals; and iv) a real-time editing and rendering component. The proposed framework provides an interactive and user-friendly solution for creating immersive visualizations of indoor spaces. We present a preliminary assessment of VISPI using a synthetic dataset, Structured3D, and demonstrate its application in creating restaged indoor scenes.
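The spherical-Gaussian lighting model mentioned in the VISPI entry can be illustrated with a minimal sketch. The lobe parameterization below (axis `mu`, sharpness `lam`, amplitude `a`) is the standard spherical-Gaussian form; the toy two-lobe environment is a hypothetical example, not VISPI's actual estimated lighting:

```python
import numpy as np

def sg_eval(v, mu, lam, a):
    """Evaluate one spherical Gaussian lobe: G(v) = a * exp(lam * (v . mu - 1)).
    v and mu are direction vectors (normalized here); lam controls sharpness."""
    v = v / np.linalg.norm(v)
    mu = mu / np.linalg.norm(mu)
    return a * np.exp(lam * (np.dot(v, mu) - 1.0))

def radiance(v, lobes):
    """Environment radiance in direction v as a sum of SG lobes (mu, lam, a)."""
    return sum(sg_eval(v, mu, lam, a) for mu, lam, a in lobes)

# A toy two-lobe environment: a sharp overhead light and a broad lateral fill.
env = [
    (np.array([0.0, 1.0, 0.0]), 40.0, 5.0),  # sharp lobe pointing up
    (np.array([1.0, 0.0, 0.0]), 2.0, 0.5),   # broad lobe pointing sideways
]
up = radiance(np.array([0.0, 1.0, 0.0]), env)    # dominated by the sharp lobe
side = radiance(np.array([1.0, 0.0, 0.0]), env)  # sharp lobe has decayed away
```

A handful of such lobes gives a compact, differentiable stand-in for a full environment map, which is why spherical Gaussians are a common choice for lighting estimation and real-time relighting.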