Search Results

Now showing 1–10 of 11
  • Item
    Virtual Instrument Performances (VIP): A Comprehensive Review
    (The Eurographics Association and John Wiley & Sons Ltd., 2024) Kyriakou, Theodoros; Alvarez de la Campa Crespo, Merce; Panayiotou, Andreas; Chrysanthou, Yiorgos; Charalambous, Panayiotis; Aristidou, Andreas; Aristidou, Andreas; Macdonnell, Rachel
    Driven by recent advancements in Extended Reality (XR), the hype around the Metaverse, and real-time computer graphics, the transformation of the performing arts, particularly in digitizing and visualizing musical experiences, is an ever-evolving landscape. This transformation offers significant potential for promoting inclusivity, fostering creativity, and enabling live performances in diverse settings. Despite this potential, however, the field of Virtual Instrument Performances (VIP) has remained relatively unexplored due to numerous challenges. These challenges arise from the complex, multi-modal nature of musical instrument performances; the need for high-precision motion capture under occlusions, including the intricate interactions of a musician's body and fingers with the instrument; the precise synchronization and seamless integration of various sensory modalities; and the need to accommodate variations in musicians' playing styles and facial expressions while addressing instrument-specific nuances. This comprehensive survey delves into the intersection of technology, innovation, and artistic expression in the domain of virtual instrument performances. It explores multi-modal musical performance databases and investigates a wide range of data acquisition methods, encompassing diverse motion capture techniques, facial expression recording, and various approaches for capturing audio and MIDI (Musical Instrument Digital Interface) data. The survey also explores Music Information Retrieval (MIR) tasks, with a particular emphasis on the field of Musical Performance Analysis (MPA), and offers an overview of work on Musical Instrument Performance Synthesis (MIPS), including recent advancements in generative models. The ultimate aim of this survey is to unveil the technological limitations, initiate a dialogue about the current challenges, and propose promising avenues for future research at the intersection of technology and the arts.
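
    The survey above treats MIDI as one of the core capture modalities alongside motion and audio. As a minimal illustration of what acquiring that modality involves, the sketch below logs timestamped note events from a MIDI input using the mido Python library; the port name is a hypothetical placeholder, and real VIP pipelines would additionally synchronize these events with mocap and audio streams.

```python
# Minimal sketch: logging timestamped note events from a MIDI instrument
# with the mido library. The port name is a hypothetical placeholder; a real
# capture pipeline would also align these events with mocap/audio timelines.
import time
import mido

def capture_note_events(port_name="Digital Piano"):  # hypothetical port name
    events = []
    with mido.open_input(port_name) as port:
        start = time.time()
        for msg in port:  # blocks, yielding incoming messages
            if msg.type in ("note_on", "note_off"):
                # Wall-clock timestamps allow later alignment with other modalities.
                events.append((time.time() - start, msg.type, msg.note, msg.velocity))
    return events
```
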
  • Item
    State of the Art on 3D Reconstruction with RGB-D Cameras
    (The Eurographics Association and John Wiley & Sons Ltd., 2018) Zollhöfer, Michael; Stotko, Patrick; Görlitz, Andreas; Theobalt, Christian; Nießner, Matthias; Klein, Reinhard; Kolb, Andreas; Hildebrandt, Klaus and Theobalt, Christian
    The advent of affordable consumer-grade RGB-D cameras has brought about a profound advancement of visual scene reconstruction methods. Both computer graphics and computer vision researchers have spent significant effort developing entirely new algorithms to capture comprehensive shape models of static and dynamic scenes with RGB-D cameras, leading to significant advances in the state of the art along several dimensions. Some methods achieve very high reconstruction detail despite limited sensor resolution. Others achieve real-time performance, though possibly at lower quality. New concepts were developed to capture scenes at larger spatial and temporal extents. Other recent algorithms pair shape reconstruction with concurrent material and lighting estimation, even in general scenes and unconstrained conditions. In this state-of-the-art report, we analyze these recent developments in RGB-D scene reconstruction in detail and review essential related work. We explain, compare, and critically analyze the common underlying algorithmic concepts that enabled these recent advancements. Furthermore, we show how algorithms are designed to best exploit the benefits of RGB-D data while suppressing their often non-trivial data distortions. In addition, this report identifies and discusses important open research questions and suggests relevant directions for future work.
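
    One of the common algorithmic concepts this report analyzes is volumetric depth-map fusion into a truncated signed distance field (TSDF), popularized by KinectFusion. The NumPy sketch below shows the standard per-voxel weighted-average update for one depth frame; the dense voxel layout, truncation distance, and known camera intrinsics/pose are simplifying assumptions for illustration.

```python
# Sketch of TSDF fusion (KinectFusion-style), assuming known intrinsics K and
# world-to-camera pose (R, t). Truncation distance is an illustrative value.
import numpy as np

def fuse_depth_frame(tsdf, weights, voxel_centers, depth, K, R, t, trunc=0.03):
    # Transform voxel centers into the camera frame and project them.
    pts_cam = voxel_centers @ R.T + t                      # (N, 3)
    z = pts_cam[:, 2]
    uv = pts_cam @ K.T
    u = np.round(uv[:, 0] / z).astype(int)
    v = np.round(uv[:, 1] / z).astype(int)

    h, w = depth.shape
    valid = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.full_like(z, np.nan)
    d[valid] = depth[v[valid], u[valid]]                   # observed depth per voxel

    # Signed distance along the viewing ray, truncated to [-trunc, trunc];
    # voxels far behind the observed surface are left untouched.
    sdf = d - z
    update = valid & ~np.isnan(d) & (sdf > -trunc)
    sdf_clipped = np.clip(sdf[update], -trunc, trunc)

    # Weighted running average: the standard fusion rule.
    w_old = weights[update]
    tsdf[update] = (tsdf[update] * w_old + sdf_clipped) / (w_old + 1.0)
    weights[update] = w_old + 1.0
    return tsdf, weights
```
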
  • Item
    ShapeVerse: Physics-based Characters with Varied Body Shapes
    (The Eurographics Association, 2024) Vyas, Bharat; O'Sullivan, Carol; Liu, Lingjie; Averkiou, Melinos
    Computer animation of realistic human characters remains a significant challenge. This work uses deep reinforcement learning to generate physics-based characters with diverse body shapes. We aim to replicate reference motions such as walking or jogging while accounting for individual variations in body shape and mass. Reference motions serve as training targets, with shape parameters taken into account to accommodate mass variations. The method produces animations that accurately capture the details of human motion, leading to diverse and lifelike character performances.
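
    A central ingredient in this style of physics-based imitation learning is a reward that scores how closely the simulated character tracks the reference motion. The sketch below shows a DeepMimic-style pose-and-velocity imitation reward; the weights and scales are illustrative assumptions, not values from the paper.

```python
# Sketch of a DeepMimic-style imitation reward: exponentiated tracking errors
# between simulated and reference joint rotations and root velocity.
# Weights and scales are illustrative, not taken from ShapeVerse itself.
import numpy as np

def imitation_reward(sim_quats, ref_quats, sim_root_vel, ref_root_vel):
    # Per-joint rotation error: angle between simulated and reference quaternions.
    dots = np.clip(np.abs(np.sum(sim_quats * ref_quats, axis=1)), 0.0, 1.0)
    angle_err = 2.0 * np.arccos(dots)                 # radians per joint
    r_pose = np.exp(-2.0 * np.sum(angle_err ** 2))

    # Root-velocity error rewards matching the reference gait speed, which
    # matters when body mass and proportions vary across characters.
    r_vel = np.exp(-0.1 * np.sum((sim_root_vel - ref_root_vel) ** 2))
    return 0.7 * r_pose + 0.3 * r_vel
```
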
  • Item
    Fine-Grained Semantic Segmentation of Motion Capture Data using Dilated Temporal Fully-Convolutional Networks
    (The Eurographics Association, 2019) Cheema, Noshaba; Hosseini, Somayeh; Sprenger, Janis; Herrmann, Erik; Du, Han; Fischer, Klaus; Slusallek, Philipp; Cignoni, Paolo and Miguel, Eder
    Human motion capture data has been widely used in data-driven character animation. In order to generate realistic, natural-looking motions, most data-driven approaches require considerable pre-processing effort, including motion segmentation and annotation. Existing (semi-)automatic solutions either require hand-crafted features for motion segmentation or do not produce the semantic annotations required for motion synthesis and for building large-scale motion databases. In addition, human-labeled annotation data suffers from inter- and intra-labeler inconsistencies by design. We propose a semi-automatic framework for semantic segmentation of motion capture data based on supervised machine learning techniques. It first transforms a motion capture sequence into a "motion image" and then applies a convolutional neural network for image segmentation. Dilated temporal convolutions enable the extraction of temporal information from a large receptive field. Our model outperforms two state-of-the-art models for action segmentation, as well as a popular network for sequence modeling. Above all, our method is very robust to noisy and inaccurate training labels and can thus handle human errors during the labeling process.
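
    To make the "motion image" idea concrete, the PyTorch sketch below treats joint features as channels and time as the spatial axis, stacking 1D convolutions with exponentially increasing dilation to obtain a large temporal receptive field. The layer sizes are illustrative, not the paper's exact architecture.

```python
# Sketch of a dilated temporal fully-convolutional segmenter in PyTorch.
# A mocap clip is treated as a "motion image": channels are joint features,
# the spatial axis is time. All widths below are illustrative only.
import torch
import torch.nn as nn

class DilatedTemporalFCN(nn.Module):
    def __init__(self, in_channels=63, num_classes=10, width=128, layers=6):
        super().__init__()  # in_channels=63 would be e.g. 21 joints x 3 coords
        blocks, ch = [], in_channels
        for i in range(layers):
            d = 2 ** i  # exponentially growing dilation -> large receptive field
            blocks += [nn.Conv1d(ch, width, kernel_size=3, padding=d, dilation=d),
                       nn.ReLU()]
            ch = width
        self.backbone = nn.Sequential(*blocks)
        self.head = nn.Conv1d(width, num_classes, kernel_size=1)

    def forward(self, x):                    # x: (batch, channels, frames)
        return self.head(self.backbone(x))   # per-frame class logits

# Usage: logits = DilatedTemporalFCN()(clip); labels = logits.argmax(dim=1)
```
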
  • Item
    Smoothing Noisy Skeleton Data in Real Time
    (The Eurographics Association, 2018) Hoxey, Thomas; Stephenson, Ian; Jain, Eakta and Kosinka, Jirí
    The aim of this project is to visualise live skeleton-tracking data in a virtual analogue of a real-world environment, to be viewed in VR. Using a single RGB-D camera for motion tracking is a cost-effective way to obtain real-time 3D skeleton-tracking data; moreover, the people being tracked do not need any special markers. This makes the approach much more practical outside a studio or lab environment. However, the skeleton it provides is not as accurate as that of a traditional multiple-camera system. With a single fixed viewpoint, the body can easily occlude itself, for example by standing side-on to the camera. Furthermore, without marked tracking points there can be inconsistencies in where the joints are identified, leading to inconsistent body proportions. In this paper we outline a method for improving the quality of motion-capture data in real time, providing an off-the-shelf framework for importing the data into a virtual scene. Our method uses a two-stage approach to smooth smaller inconsistencies and to estimate the positions of improperly proportioned or occluded joints.
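
    A minimal version of such a two-stage clean-up might look like the sketch below: exponential smoothing to damp frame-to-frame jitter, followed by rescaling each bone to a fixed length so the skeleton keeps consistent proportions. This is a simplified illustration under assumed conventions, not the authors' exact filter.

```python
# Sketch of a two-stage skeleton clean-up: (1) exponential smoothing of joint
# positions, (2) enforcing fixed bone lengths from parent to child joints.
# The smoothing factor and skeleton layout are illustrative assumptions.
import numpy as np

def smooth_joints(prev, current, alpha=0.5):
    # Stage 1: exponential smoothing damps frame-to-frame jitter.
    return alpha * current + (1.0 - alpha) * prev

def enforce_bone_lengths(joints, parents, bone_lengths):
    # Stage 2: rescale each bone to its known length. Assumes joints are
    # ordered so every parent is processed before its children.
    fixed = joints.copy()
    for j, p in enumerate(parents):
        if p < 0:
            continue  # the root joint has no parent bone
        direction = fixed[j] - fixed[p]
        norm = np.linalg.norm(direction)
        if norm > 1e-8:
            fixed[j] = fixed[p] + direction / norm * bone_lengths[j]
    return fixed
```
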
  • Item
    Recent Trends in 3D Reconstruction of General Non-Rigid Scenes
    (The Eurographics Association and John Wiley & Sons Ltd., 2024) Yunus, Raza; Lenssen, Jan Eric; Niemeyer, Michael; Liao, Yiyi; Rupprecht, Christian; Theobalt, Christian; Pons-Moll, Gerard; Huang, Jia-Bin; Golyanik, Vladislav; Ilg, Eddy; Aristidou, Andreas; Macdonnell, Rachel
    Reconstructing models of the real world, including the 3D geometry, appearance, and motion of real scenes, is essential for computer graphics and computer vision. It enables the synthesis of photorealistic novel views, which is useful for the movie industry and AR/VR applications. It also facilitates the content creation needed in computer games and AR/VR by avoiding laborious manual design processes. Further, such models are fundamental for intelligent computing systems that need to interpret real-world scenes and actions in order to act and interact safely with the human world. Notably, the world surrounding us is dynamic, and reconstructing models of dynamic, non-rigidly moving scenes is a severely underconstrained and challenging problem. This state-of-the-art report (STAR) offers the reader a comprehensive summary of state-of-the-art techniques with monocular and multi-view inputs, such as data from RGB and RGB-D sensors, among others, conveying an understanding of the different approaches, their potential applications, and promising further research directions. The report covers 3D reconstruction of general non-rigid scenes and further addresses techniques for scene decomposition, editing and controlling, and generalizable and generative modeling. More specifically, we first review the common and fundamental concepts necessary to understand and navigate the field, and then discuss the state-of-the-art techniques by reviewing recent approaches that use traditional and machine-learning-based neural representations, including a discussion of the newly enabled applications. The STAR concludes with a discussion of the remaining limitations and open challenges.
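
    One recurring neural representation in this literature is a time-conditioned deformation field that maps observations at each time step into a shared canonical space. The PyTorch sketch below shows the basic shape of such a network; the layer sizes and time encoding are illustrative assumptions, and no specific method from the report is implied.

```python
# Sketch of a time-conditioned deformation field, a recurring building block
# in neural non-rigid reconstruction: an MLP maps a 3D point plus a time code
# to an offset into a canonical space. Sizes are illustrative only.
import torch
import torch.nn as nn

class DeformationField(nn.Module):
    def __init__(self, time_dim=8, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + time_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x, t_code):
        # x: (N, 3) sample points at time t; t_code: (N, time_dim) embedding.
        offset = self.mlp(torch.cat([x, t_code], dim=-1))
        return x + offset  # corresponding points in the canonical frame
```
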
  • Item
    State of the Art on Monocular 3D Face Reconstruction, Tracking, and Applications
    (The Eurographics Association and John Wiley & Sons Ltd., 2018) Zollhöfer, Michael; Thies, Justus; Garrido, Pablo; Bradley, Derek; Beeler, Thabo; Pérez, Patrick; Stamminger, Marc; Nießner, Matthias; Theobalt, Christian; Hildebrandt, Klaus and Theobalt, Christian
    The computer graphics and vision communities have dedicated long-standing efforts to building computerized tools for reconstructing, tracking, and analyzing human faces from visual input. Over the past years, rapid progress has been made, leading to novel and powerful algorithms that obtain impressive results even in the very challenging case of reconstruction from a single RGB or RGB-D camera. The range of applications is vast and steadily growing as these technologies further improve in speed, accuracy, and ease of use. Motivated by this rapid progress, this state-of-the-art report summarizes recent trends in monocular facial performance capture and discusses its applications, which range from performance-based animation to real-time facial reenactment. We focus our discussion on methods where the central task is to recover and track a three-dimensional model of the human face using optimization-based reconstruction algorithms. We provide an in-depth overview of the underlying concepts of real-world image formation, and we discuss common assumptions and simplifications that make these algorithms practical. In addition, we extensively cover the priors that are used to better constrain the under-constrained monocular reconstruction problem, and we discuss the optimization techniques that are employed to recover dense, photo-geometric 3D face models from monocular 2D data. Finally, we discuss a variety of use cases for the reviewed algorithms in the context of motion capture, facial animation, and image and video editing.
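
    The optimization-based reconstruction these methods perform typically minimizes an analysis-by-synthesis energy that combines a dense photometric term, a sparse landmark term, and statistical priors on the morphable-model parameters. The sketch below writes out such an energy in NumPy; the weighting and the simple quadratic prior are illustrative assumptions.

```python
# Sketch of the analysis-by-synthesis energy minimized by optimization-based
# monocular face trackers: photometric term + landmark term + statistical
# prior on model parameters. The weights below are illustrative.
import numpy as np

def face_fitting_energy(rendered, observed, proj_landmarks, detected_landmarks,
                        params, w_lm=1.0, w_reg=0.01):
    # Dense photometric consistency between the rendered model and the frame.
    e_photo = np.sum((rendered - observed) ** 2)
    # Sparse 2D facial landmarks constrain pose and coarse shape.
    e_lm = np.sum((proj_landmarks - detected_landmarks) ** 2)
    # Prior keeps identity/expression coefficients near the morphable-model
    # mean, regularizing the under-constrained monocular problem.
    e_reg = np.sum(params ** 2)
    return e_photo + w_lm * e_lm + w_reg * e_reg
```
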
  • Item
    Kinder-Gator: The UF Kinect Database of Child and Adult Motion
    (The Eurographics Association, 2018) Aloba, Aishat; Flores, Gianne; Woodward, Julia; Shaw, Alex; Castonguay, Amanda; Cuba, Isabella; Dong, Yuzhu; Jain, Eakta; Anthony, Lisa; Diamanti, Olga and Vaxman, Amir
    Research has suggested that children's whole-body motions are different from those of adults. However, research on children's motions, and on how these motions differ from those of adults, is limited. One possible reason is that there are few motion capture (mocap) datasets for children, with most datasets focusing on adults instead. There are even fewer datasets that contain both children's and adults' motions to allow comparison between them. To address these problems, we present Kinder-Gator, a new dataset of ten children and ten adults performing whole-body motions in front of the Kinect v1.0. The dataset contains RGB and 3D joint positions for 58 motions, such as wave, walk in place, kick, and point, which have been manually labeled according to the category of the participant (child vs. adult) and the motion being performed. We believe this dataset will be useful in supporting research and applications in animation and whole-body motion recognition and interaction.
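
    As an illustration of how such a dataset might be consumed, the sketch below loads joint trajectories grouped by participant category and motion. The file layout and column names are hypothetical, since the abstract does not specify a distribution format.

```python
# Hypothetical loader for a Kinder-Gator-style dataset. The CSV layout
# (columns: category, motion, frame, joint, x, y, z) is an assumption made
# for illustration; the actual distribution format may differ.
import csv
from collections import defaultdict

def load_motions(path):
    motions = defaultdict(list)  # (category, motion) -> list of joint samples
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            key = (row["category"], row["motion"])  # e.g. ("child", "kick")
            motions[key].append(
                (int(row["frame"]), row["joint"],
                 float(row["x"]), float(row["y"]), float(row["z"]))
            )
    return motions
```
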
  • Item
    A Probabilistic Motion Planning Algorithm for Realistic Walk Path Simulation
    (The Eurographics Association, 2018) Agethen, Philipp; Neher, Thomas; Gaisbauer, Felix; Manns, Martin; Rukzio, Enrico; Jain, Eakta and Kosinka, Jirí
    This paper presents an approach that combines a hybrid A* path planner with a statistical motion graph to effectively generate a rich repertoire of walking trajectories. The motion graph is generated from a comprehensive database (20,000 steps) of captured human motion and covers a wide range of gait variants. The hybrid A* path planner can be regarded as an orchestration instance, stitching together successive left and right steps drawn from the statistical motion model. Moreover, the hybrid A* planner ensures a collision-free path between a start and an end point. A preliminary evaluation underlines the benefits of the proposed algorithm.
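
    The sketch below gives a greatly simplified version of this idea: an A*-style search over footstep placements in which each expansion appends a candidate left or right step from a step library, and collision checks prune branches. All names, costs, and the 2D state are illustrative; the paper's hybrid planner and statistical motion model are considerably richer.

```python
# Simplified sketch of A*-style footstep planning: states are (position, foot),
# expansions append candidate steps from a step library, and a straight-line
# heuristic drives the search toward the goal. All names are illustrative.
import heapq
import math

def plan_steps(start, goal, step_library, in_collision, tol=0.3):
    # step_library: list of (dx, dy) step displacements mined from mocap data.
    frontier = [(math.dist(start, goal), 0.0, start, "left", [])]
    visited = set()
    while frontier:
        f, g, pos, foot, path = heapq.heappop(frontier)
        if math.dist(pos, goal) < tol:
            return path + [pos]                 # goal reached within tolerance
        key = (round(pos[0], 1), round(pos[1], 1), foot)
        if key in visited:
            continue
        visited.add(key)
        for dx, dy in step_library:
            nxt = (pos[0] + dx, pos[1] + dy)
            if in_collision(nxt):
                continue                        # prune colliding step placements
            new_g = g + math.dist(pos, nxt)
            h = math.dist(nxt, goal)            # admissible straight-line heuristic
            other = "right" if foot == "left" else "left"
            heapq.heappush(frontier, (new_g + h, new_g, nxt, other, path + [pos]))
    return None                                 # no collision-free step sequence
```
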
  • Item
    3D Human Shape and Pose from a Single Depth Image with Deep Dense Correspondence Enabled Model Fitting
    (The Eurographics Association, 2022) Wang, Xiaofang; Boukhayma, Adnane; Prévost, Stéphanie; Desjardin, Eric; Loscos, Celine; Multon, Franck; Sauvage, Basile; Hasic-Telalovic, Jasminka
    We propose a two-stage hybrid method, requiring no initialization, for 3D human shape and pose estimation from a single depth image, combining the benefits of deep learning and optimization. First, a convolutional neural network predicts pixel-wise dense semantic correspondences to a template geometry, in the form of body-part segmentation labels and normalized canonical geometry vertex coordinates. From these two outputs, pixel-to-vertex correspondences are computed via nearest-neighbor search in a six-dimensional embedding of the template geometry. Second, a parametric shape model (SMPL) is fitted to the depth data by minimizing vertex distances to the input. Extensive evaluation on both real and synthetic datasets of human shapes in motion shows that our method yields quantitatively and qualitatively satisfactory results and state-of-the-art reconstruction errors.
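
    The correspondence stage described above can be written compactly with a k-d tree: each pixel's predicted 6D code (body-part label embedding plus canonical coordinates) is matched to the nearest template vertex, and the resulting pairs drive the model fitting. The SciPy sketch below pairs that lookup with a toy translation-only fit; the embedding layout and the fitting step are illustrative stand-ins for the actual SMPL optimization.

```python
# Sketch of the correspondence-then-fit pipeline: nearest-neighbour matching
# of per-pixel 6D embeddings to template vertices (SciPy cKDTree), followed
# by a toy least-squares fit. The embedding layout is an assumption.
import numpy as np
from scipy.spatial import cKDTree

def match_correspondences(template_embed, pixel_embed):
    # template_embed: (V, 6) per-vertex codes; pixel_embed: (P, 6) CNN output.
    tree = cKDTree(template_embed)
    _, vertex_ids = tree.query(pixel_embed)   # nearest template vertex per pixel
    return vertex_ids

def fit_translation(model_verts, depth_points, vertex_ids):
    # Toy stand-in for the SMPL optimization: align matched model vertices to
    # the back-projected depth points with a single rigid translation, which
    # has the closed-form least-squares solution below.
    matched = model_verts[vertex_ids]
    return (depth_points - matched).mean(axis=0)
```
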