Search Results
DragPoser: Motion Reconstruction from Variable Sparse Tracking Signals via Latent Space Optimization
(The Eurographics Association and John Wiley & Sons Ltd., 2025) Ponton, Jose Luis; Pujol, Eduard; Aristidou, Andreas; Andujar, Carlos; Pelechano, Nuria; Bousseau, Adrien; Day, Angela
High-quality motion reconstruction that follows the user's movements can be achieved by high-end mocap systems with many sensors. However, obtaining such animation quality with fewer input devices is gaining popularity as it brings mocap closer to the general public. The main challenges include the loss of end-effector accuracy in learning-based approaches, or the lack of naturalness and smoothness in IK-based solutions. In addition, such systems are often finely tuned to a specific number of trackers and are highly sensitive to missing data, e.g., in scenarios where a sensor is occluded or malfunctions. In response to these challenges, we introduce DragPoser, a novel deep-learning-based motion reconstruction system that accurately represents hard and dynamic constraints, attaining real-time high end-effectors position accuracy. This is achieved through a pose optimization process within a structured latent space. Our system requires only one-time training on a large human motion dataset, and then constraints can be dynamically defined as losses, while the pose is iteratively refined by computing the gradients of these losses within the latent space. To further enhance our approach, we incorporate a Temporal Predictor network, which employs a Transformer architecture to directly encode temporality within the latent space. This network ensures the pose optimization is confined to the manifold of valid poses and also leverages past pose data to predict temporally coherent poses. Results demonstrate that DragPoser surpasses both IK-based and the latest data-driven methods in achieving precise end-effector positioning, while it produces natural poses and temporally coherent motion. In addition, our system showcases robustness against on-the-fly constraint modifications, and exhibits adaptability to various input configurations and changes. The complete source code, trained model, animation databases, and supplementary material used in this paper can be found at https://upc-virvig.github.io/DragPoser.

Multi-Modal Instrument Performances (MMIP): A Musical Database
(The Eurographics Association and John Wiley & Sons Ltd., 2025) Kyriakou, Theodoros; Aristidou, Andreas; Charalambous, Panayiotis; Bousseau, Adrien; Day, Angela
Musical instrument performances are multimodal creative art forms that integrate audiovisual elements, resulting from musicians' interactions with instruments through body movements, finger actions, and facial expressions. Digitizing such performances for archiving, streaming, analysis, or synthesis requires capturing every element that shapes the overall experience, which is crucial for preserving the performance's essence. In this work, following current trends in large-scale dataset development for deep learning analysis and generative models, we introduce the Multi-Modal Instrument Performances (MMIP) database (https://mmip.cs.ucy.ac.cy). This is the first dataset to incorporate synchronized high-quality 3D motion capture data for the body, fingers, facial expressions, and instruments, along with audio, multi-angle videos, and MIDI data. The database currently includes 3.5 hours of performances featuring three instruments: guitar, piano, and drums. Additionally, we discuss the challenges of acquiring these multi-modal data, detailing our approach to data collection, signal synchronization, annotation, and metadata management. Our data formats align with industry standards for ease of use, and we have developed an open-access online repository that offers a user-friendly environment for data exploration, supporting data organization, search capabilities, and custom visualization tools. Notable features include a MIDI-to-instrument animation project for visualizing the instruments and a script for playing back FBX files with synchronized audio in a web environment.

CEDRL: Simulating Diverse Crowds with Example-Driven Deep Reinforcement Learning
(The Eurographics Association and John Wiley & Sons Ltd., 2025) Panayiotou, Andreas; Aristidou, Andreas; Charalambous, Panayiotis; Bousseau, Adrien; Day, Angela
The level of realism in virtual crowds is strongly affected by the presence of diverse crowd behaviors. In real life, we can observe various scenarios, ranging from pedestrians moving on a shopping street, people talking in static groups, or wandering around in a public park. Most of the existing systems optimize for specific behaviors such as goal-seeking and collision avoidance, neglecting to consider other complex behaviors that are usually challenging to capture or define. Departing from the conventional use of Supervised Learning, which requires vast amounts of labeled data and often lacks controllability, we introduce Crowds using Example-driven Deep Reinforcement Learning (CEDRL), a framework that simultaneously leverages multiple crowd datasets to model a broad spectrum of human behaviors. This approach enables agents to adaptively learn and exhibit diverse behaviors, enhancing their ability to generalize decisions across unseen states. The model can be applied to populate novel virtual environments while providing real-time controllability over the agents' behaviors. We achieve this through the design of a reward function aligned with real-world observations and by employing curriculum learning that gradually diminishes the agents' observation space. A complexity characterization metric defines each agent's high-level crowd behavior, linking it to the agent's state and serving as an input to the policy network. Additionally, a parametric reward function, influenced by the type of crowd task, facilitates the learning of a diverse and abstract behavior "skill" set. We evaluate our model on both training and unseen real-world data, comparing against other simulators, showing its ability to generalize across scenarios and accurately reflect the observed complexity of behaviors. We also examine our system's controllability by adjusting the complexity weight, discovering that higher values lead to more complex behaviors such as wandering, static interactions, and group dynamics like joining or leaving. Finally, we demonstrate our model's capabilities in novel synthetic scenarios.
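
The DragPoser abstract above describes defining constraints as losses and refining the pose by following the gradients of those losses within a latent space. The sketch below illustrates that general idea only; it is not the authors' implementation. Everything in it is an assumption made for illustration: `decode` is a hand-written toy stand-in for a trained decoder, the "pose" is a planar two-link arm, the single constraint is an end-effector target, and the gradients come from finite differences rather than backpropagation through a network.

```python
import math

def decode(z):
    """Toy stand-in for a trained decoder (hypothetical): latent code -> two joint angles."""
    return [math.pi * math.tanh(z[0]), math.pi * math.tanh(z[1])]

def end_effector(angles, lengths=(1.0, 1.0)):
    """Forward kinematics of a planar two-link arm."""
    a1, a2 = angles
    x = lengths[0] * math.cos(a1) + lengths[1] * math.cos(a1 + a2)
    y = lengths[0] * math.sin(a1) + lengths[1] * math.sin(a1 + a2)
    return x, y

def constraint_loss(z, target):
    """Squared distance between the decoded end effector and the target position."""
    x, y = end_effector(decode(z))
    return (x - target[0]) ** 2 + (y - target[1]) ** 2

def numeric_grad(f, z, eps=1e-5):
    """Central finite-difference gradient of f at z (a stand-in for autodiff)."""
    grad = []
    for i in range(len(z)):
        z_plus, z_minus = list(z), list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        grad.append((f(z_plus) - f(z_minus)) / (2 * eps))
    return grad

def optimize_latent(z0, target, steps=2000, lr=0.01):
    """Plain gradient descent on the latent code; the decoder itself stays fixed."""
    z = list(z0)
    for _ in range(steps):
        g = numeric_grad(lambda v: constraint_loss(v, target), z)
        z = [zi - lr * gi for zi, gi in zip(z, g)]
    return z

target = (1.0, 1.0)      # desired end-effector position (illustrative value)
z_start = [0.1, 0.1]     # initial latent code (illustrative value)
z_opt = optimize_latent(z_start, target)

print("loss before:", constraint_loss(z_start, target))
print("loss after: ", constraint_loss(z_opt, target))
```

Because the constraint enters only through `constraint_loss`, swapping in a different target or adding further loss terms does not require changing the decoder, which mirrors the abstract's point that constraints can be defined dynamically after one-time training.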