PG2025 Conference Papers, Posters, and Demos
Browsing by Issue Date: showing items 1-20 of 61
Item DiffQN: Differentiable Quasi-Newton Method for Elastodynamics (The Eurographics Association, 2025)
Cai, Youshuai; Li, Chen; Song, Haichuan; Xie, Youchen; Wang, ChangBo; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
We propose DiffQN, an efficient differentiable quasi-Newton method for elastodynamics simulation, addressing the challenges of high computational cost and limited material generality in existing differentiable physics frameworks. Our approach employs a per-frame initial Hessian approximation and selectively delays Hessian updates, resulting in improved convergence and faster forward simulation compared to prior methods such as DiffPD. During backpropagation, we further reduce gradient evaluation costs by reusing prefactorized linear system solvers from the forward pass. Unlike previous approaches, our method supports a wide range of hyperelastic materials without restrictions on material energy functions, enabling the simulation of more general physical phenomena. To efficiently handle high-resolution systems with large degrees of freedom, we introduce a subspace optimization strategy that projects both forward simulation and backpropagation into a low-dimensional subspace, significantly improving computational and memory efficiency. Our subspace method can provide effective initial guesses for subsequent full-space optimization.
We validate our framework on diverse applications, including system identification, initial state optimization, and facial animation, demonstrating robust performance and achieving 1.8× to 18.9× speedups over state-of-the-art methods.

Item SPLICE: Part-Level 3D Shape Editing from Local Semantic Extraction to Global Neural Mixing (The Eurographics Association, 2025)
Zhou, Jin; Yang, Hongliang; Xu, Pengfei; Huang, Hui; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
Neural implicit representations of 3D shapes have shown great potential in 3D shape editing due to their ability to model high-level semantics and continuous geometric representations. However, existing methods often suffer from limited editability, lack of part-level control, and unnatural results when modifying or rearranging shape parts. In this work, we present SPLICE, a novel part-level neural implicit representation of 3D shapes that enables intuitive, structure-aware, and high-fidelity shape editing. By encoding each shape part independently and positioning them using parameterized Gaussian ellipsoids, SPLICE effectively isolates part-specific features while discarding global context that may hinder flexible manipulation. A global attention-based decoder is then employed to integrate parts coherently, further enhanced by an attention-guiding filtering mechanism that prevents information leakage across symmetric or adjacent components. Through this architecture, SPLICE supports various part-level editing operations, including translation, rotation, scaling, deletion, duplication, and cross-shape part mixing. These operations enable users to flexibly explore design variations while preserving semantic consistency and maintaining structural plausibility.
Extensive experiments demonstrate that SPLICE outperforms existing approaches both qualitatively and quantitatively across a diverse set of shape-editing tasks.

Item Distance-Aware Tri-Perspective View for Efficient 3D Perception in Autonomous Driving (The Eurographics Association, 2025)
Tang, Yutao; Zhao, Jigang; Qin, Zhengrui; Qiu, Rui; Zhao, Lingying; Ren, Jie; Chen, Guangxi; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
Three-dimensional environmental perception remains a critical bottleneck in autonomous driving, where existing vision-based dense representations face an intractable trade-off between spatial resolution and computational complexity. Current methods, including Bird's Eye View (BEV) and Tri-Perspective View (TPV), apply uniform perception precision across all spatial regions, disregarding the fundamental safety principle that near-field objects demand high-precision detection for collision avoidance while distant objects permit lower initial accuracy. This uniform treatment squanders computational resources and constrains real-time deployment. We introduce Distance-Aware Tri-Perspective View (DA-TPV), a novel framework that allocates computational resources proportional to operational risk. DA-TPV employs a hierarchical dual-plane architecture for each viewing direction: low-resolution planes capture global scene context while high-resolution planes deliver fine-grained perception within safety-critical reaction zones. Through distance-adaptive feature fusion, our method dynamically concentrates processing power where it most directly impacts vehicle safety. Extensive experiments on nuScenes demonstrate that DA-TPV matches or exceeds single high-resolution TPV performance while reducing memory consumption by 26.3% and achieving real-time inference.
This work establishes distance-aware perception as a practical paradigm for deploying sophisticated three-dimensional understanding within automotive computational constraints. Code is available at https://github.com/yytang2012/DA-TPVFormer.

Item Unsupervised 3D Shape Parsing with Primitive Correspondence (The Eurographics Association, 2025)
Zhao, Tianshu; Guan, Yanran; Kaick, Oliver van; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
3D shape parsing, the process of analyzing and breaking down a 3D shape into components or parts, has become an important task in computer graphics and vision. Approaches for shape parsing include segmentation and approximation methods. Approximation methods often represent shapes with a set of primitives fit to the shapes, such as cuboids, cylinders, or superquadrics. However, existing approximation methods typically rely on a large number of initial primitives and aim to maximize their coverage of the target shape, without accounting for correspondences among the primitives. In this paper, we introduce a novel 3D shape approximation method that integrates reconstruction and correspondence into a single objective, providing approximations that are consistent across the input set of shapes. Our method is unsupervised but also supports supervised learning. Experimental results demonstrate that integrating correspondences into the fitting process not only provides consistent correspondences across a set of input shapes, but also improves approximation quality when using a small number of primitives.
Moreover, although correspondences are estimated in an unsupervised manner, our method effectively leverages this knowledge, leading to improved approximations.

Item Learning Transformation-Isomorphic Latent Space for Accurate Hand Pose Estimation (The Eurographics Association, 2025)
Ren, Kaiwen; Hu, Lei; Zhang, Zhiheng; Ye, Yongjing; Xia, Shihong; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
Vision-based regression tasks, such as hand pose estimation, have achieved higher accuracy and faster convergence through representation learning. However, existing representation learning methods often encounter the following issues: the high semantic level of features extracted from images is inadequate for regressing low-level information, and the extracted features include task-irrelevant information, reducing their compactness and interfering with regression tasks. To address these challenges, we propose TI-Net, a highly versatile visual network backbone designed to construct a transformation-isomorphic latent space. Specifically, we employ linear transformations to model geometric transformations in the latent space and ensure that TI-Net aligns them with those in the image space. This ensures that the latent features capture compact, low-level information beneficial for pose estimation tasks. We evaluated TI-Net on the hand pose estimation task to demonstrate the network's superiority. On the DexYCB dataset, TI-Net achieved a 10% improvement in the PA-MPJPE metric compared to specialized state-of-the-art (SOTA) hand pose estimation methods.
Our code is available at https://github.com/Mine268/TI-Net.

Item Structural Entropy Based Visualization of Social Networks (The Eurographics Association, 2025)
Xue, Mingliang; Chen, Lu; Wei, Chunyu; Hou, Shuowei; Cui, Lizhen; Deussen, Oliver; Wang, Yunhai; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
Social networks exhibit the small-world phenomenon, characterized by highly interconnected nodes (clusters) with short average path distances. While force-directed layouts are widely employed to visualize such networks, they often result in visual clutter, obscuring community structures due to high node connectivity. In this paper, we present a novel approach that leverages structural entropy and coding trees to enhance community visualization in social networks. Our method computes the structural entropy of graph partitions to construct coding trees that guide hierarchical partitioning with O(E) time complexity. These partitions are then used to assign edge weights that influence attractive forces in the layout, promoting clearer community separation while preserving local cohesion. We evaluate our approach through both quantitative and qualitative comparisons with state-of-the-art community-aware layout algorithms and present two case studies that highlight its practical utility in the analysis of real-world social networks. The results demonstrate that our method enhances community visibility without compromising layout performance.
Code and demonstrations are available at https://github.com/IDEAS-Laboratory/SEL.

Item Stable Sample Caching for Interactive Stereoscopic Ray Tracing (The Eurographics Association, 2025)
Philippi, Henrik; Jensen, Henrik Wann; Frisvad, Jeppe Revall; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
We present an algorithm for interactive stereoscopic ray tracing that decouples visibility from shading and enables caching of radiance results for temporally stable and stereoscopically consistent rendering. Starting from interactive stable ray tracing, we build a screen space cache that carries surface samples from frame to frame via forward reprojection. Using a visibility heuristic, we adaptively trace the samples and achieve high performance with few temporal artefacts. Our method also serves as a shading cache, which enables temporal reuse and filtering of shading results in virtual reality (VR). We demonstrate good antialiasing and temporal coherence when filtering geometric edges. We compare our sample-based radiance caching, which operates in screen space, with temporal antialiasing (TAA) and a hash-based shading cache that operates in a voxel representation of world space. In addition, we show how to extend the shading cache into a radiance cache.
Finally, we use the per-sample radiance values to improve stereo vision by employing stereo blending with improved estimates of the blending parameter between the two views.

Item KIN-FDNet: Dual-Branch KAN-INN Decomposition Network for Multi-Modality Image Fusion (The Eurographics Association, 2025)
Dong, Aimei; Meng, Hao; Chen, Zhen; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
Multi-modality image fusion (MMIF) aims to integrate information from different source images to preserve the complementary information of each modality, such as feature highlights and texture details. However, current fusion methods fail to effectively address the inter-modality interference and feature redundancy issues. To address these issues, we propose an end-to-end dual-branch KAN-INN decomposition network (KIN-FDNet) with an effective feature decoupling mechanism for separating shared and specific features. It first employs a gated attention-based Transformer module for cross-modal shallow feature extraction. Then, we embed KAN into the Transformer architecture to extract low-frequency global features and solve the problem of low parameter efficiency in multi-branch models. Meanwhile, an invertible neural network (INN) processes high-frequency local information to preserve fine-grained modality-specific details. In addition, we design a dual-frequency cross-fusion module to promote information interaction between low and high frequencies to obtain high-quality fused images.
Extensive experiments on visible and infrared image fusion (VIF) and medical image fusion (MIF) tasks demonstrate the superior performance and generalization ability of our KIN-FDNet framework.

Item Motion Vector-Based Frame Generation for Real-Time Rendering (The Eurographics Association, 2025)
Ha, Inwoo; Ahn, Young Chun; Yoon, Sung-eui; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
The demand for high frame rate rendering is rapidly increasing, especially in the graphics and gaming industries. Although recent learning-based frame interpolation methods have demonstrated promising results, they have not yet achieved the quality required for real-time gaming. High-quality frame interpolation is critical for rendering faster, dynamic motion during gameplay. In graphics, motion vectors are typically favored over optical flow due to their accuracy and efficiency in game engines. However, motion vectors alone are insufficient for frame interpolation, as they lack the bilateral motions needed for the target frame to interpolate and struggle to capture non-geometric movements. To address this, we propose a novel method that leverages fast, low-cost motion vectors as guiding flows, integrating them into a task-specific intermediate flow estimation process. Our approach employs a combined motion and image context encoder-decoder to produce more accurate intermediate bilateral flows.
As a result, our method significantly improves interpolation quality and achieves state-of-the-art performance on rendered content.

Item ER-Diff: A Multi-Scale Exposure Residual-Guided Diffusion Model for Image Exposure Correction (The Eurographics Association, 2025)
Chen, TianZhen; Liu, Jie; Ru, Yi; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
This paper proposes an Exposure Residual-guided Diffusion Model (ER-Diff) to address the performance limitations of existing image restoration methods in handling non-uniform exposure. Current exposure correction techniques struggle with detail recovery in extreme over/underexposed regions and global exposure balancing. While diffusion models offer powerful generative capabilities for image restoration, effectively leveraging exposure information to guide the denoising process remains underexplored. Additionally, content reconstruction fidelity in severely degraded regions is challenging to ensure. To tackle these issues, ER-Diff explicitly constructs exposure residual features to guide the diffusion process. Specifically, we design a multi-scale exposure residual guidance module that first computes the residual between the input image and an ideally exposed reference, then transforms it into hierarchical feature representations via a multi-scale extraction network, and finally integrates these features progressively into the denoising process. This design enhances feature representation in locally distorted exposure areas while maintaining global exposure consistency. By decoupling content reconstruction and exposure correction, our method achieves more natural exposure adjustment with better detail preservation while ensuring content authenticity.
Extensive experiments demonstrate that ER-Diff outperforms state-of-the-art exposure correction methods in both quantitative and qualitative evaluations, particularly in complex lighting conditions, effectively balancing detail retention and exposure correction.

Item Trajectory-guided Anime Video Synthesis via Effective Motion Learning (The Eurographics Association, 2025)
Lin, Jian; Li, Chengze; Qin, Haoyun; Liu, Hanyuan; Liu, Xueting; Ma, Xin; Chen, Cunjian; Wong, Tien-Tsin; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
Cartoon and anime motion production is traditionally labor-intensive, requiring detailed animatics and extensive inbetweening from keyframes. To streamline this process, we propose a novel framework that synthesizes motion directly from a single colored keyframe, guided by user-provided trajectories. Addressing the limitations of prior methods, which struggle with anime due to reliance on optical flow estimators and models trained on natural videos, we introduce an efficient motion representation specifically adapted for anime, leveraging CoTracker to capture sparse frame-to-frame tracking effectively. To achieve our objective, we design a two-stage learning mechanism: the first stage predicts sparse motion from input frames and trajectories, generating a motion preview sequence via explicit warping; the second stage refines these previews into high-quality anime frames by fine-tuning ToonCrafter, an anime-specific video diffusion model. We train our framework on a novel animation video dataset comprising more than 500,000 clips. Experimental results demonstrate significant improvements in animating still frames, achieving better alignment with user-provided trajectories and more natural motion patterns while preserving anime stylization and visual quality.
Our method also supports versatile applications, including motion manga generation and 2D vector graphic animations. The data and code will be released upon acceptance. For models, datasets, and additional visual comparisons and ablation studies, visit our project page: https://animemotiontraj.github.io/.

Item A Region-Based Facial Motion Analysis and Retargeting Model for 3D Characters (The Eurographics Association, 2025)
Zhu, ChangAn; Soltanpour, Sima; Joslin, Chris; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
As the application scenarios of 3D facial animation expand, abundant research has been conducted on facial motion capture, 3D face parameterization, and retargeting. However, current retargeting methods still struggle to reflect the source motion on a target 3D face accurately. One major reason is that the source motion is not translated into precise representations of the motion meanings and intensities, resulting in the target 3D face presenting inaccurate motion semantics. We propose a region-based facial motion analysis and retargeting model that focuses on predicting detailed facial motion representations and providing a plausible retargeting result through 3D facial landmark input. We have defined the regions based on facial muscle behaviours and trained a motion-to-representation regression for each region. We also introduce a refinement process, designed using an autoencoder and a motion predictor for facial landmarks, that works for both real-life subjects' and fictional characters' face rigs and improves the precision of the retargeting. The region-based strategy effectively balances the motion scales of the different facial regions, providing reliable representation prediction and retargeting results.
The representation prediction and refinement with 3D facial landmark input enable flexible application scenarios such as video-based and marker-based motion retargeting, and the reuse of animation assets for Computer-Generated (CG) characters. Our evaluation shows that the proposed model provides semantically more accurate and visually more natural results than similar methods and the commercial solution from Faceware. Our ablation study demonstrates the positive effects of the region-based strategy and the refinement process.

Item Iterative Lightmap Updates for Scene Editing (The Eurographics Association, 2025)
Lu, Guowei; Peters, Christoph; Kellnhofer, Petr; Eisemann, Elmar; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
Lightmaps are popular for precomputed global illumination, but require costly recomputation when the scene changes. We present the theory and an iterative algorithm to update lightmaps efficiently when objects are inserted or removed. Our method is based on path tracing, but focuses on updates to those paths that are affected by the scene change. Using an importance sampling scheme, our solution substantially accelerates convergence. Our GPU implementation is well-suited for interactive scene editing scenarios.

Item Uni-IR: One Stage is Enough for Ambiguity-Reduced Inverse Rendering (The Eurographics Association, 2025)
Ge, Wenhang; Feng, Jiawei; Shen, Guibao; Chen, Ying-Cong; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
Inverse rendering aims to decompose an image into geometry, materials, and lighting. Recently, Neural Radiance Fields (NeRF)-based inverse rendering has advanced significantly, bridging the gap between NeRF-based models and conventional rendering engines.
Existing methods typically adopt a two-stage optimization approach, beginning with volume rendering for geometry reconstruction, followed by physically based rendering (PBR) for materials and lighting estimation. However, the inherent ambiguity between materials and lighting during PBR, along with the suboptimal nature of geometry reconstruction by volume rendering, compromises the outcomes. To address these challenges, we introduce Uni-IR, a unified framework that imposes mutual constraints to alleviate ambiguity by integrating volume rendering and physically based rendering. Specifically, we employ a physically-based volume rendering (PBVR) approach that incorporates PBR concepts into volume rendering, directly facilitating connections with materials and lighting, in addition to geometry. Both rendering methods are utilized simultaneously during optimization, imposing mutual constraints and optimizing geometry, materials, and lighting synergistically. By employing a carefully designed unified representation for both lighting and materials, Uni-IR achieves high-quality geometry reconstruction, materials, and lighting estimation across various object types.

Item Semi-supervised Dual-teacher Comparative Learning with Bidirectional Bisect Copy-paste for Medical Image Segmentation (The Eurographics Association, 2025)
Fang, Jiangxiong; Qi, Shikuan; Liu, Huaxiang; Fu, Youyao; Zhang, Shiqing; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
Semi-supervised learning leverages limited pixel-level annotated data and abundant unlabeled data to achieve effective semantic image segmentation. To this end, we propose a semi-supervised learning framework integrated with a bidirectional bisect copy-paste (B2CP) mechanism. We introduce a B2CP strategy applied to labeled and unlabeled data in the second teacher network, preserving both data types to enhance training diversity.
This mechanism, coupled with copy-paste-based supervision for the student network, effectively mitigates interference from uncontrollable regions. Extensive experiments on the public ACDC dataset demonstrate the effectiveness of the proposed model, which surpasses the fully supervised U-Net using only 5% and 20% labeled data.

Item CMI-MTL: Cross-Mamba Interaction Based Multi-Task Learning for Medical Visual Question Answering (The Eurographics Association, 2025)
Jin, Qiangguo; Zheng, Xianyao; Cui, Hui; Sun, Changming; Fang, Yuqi; Cong, Cong; Su, Ran; Wei, Leyi; Xuan, Ping; Wang, Junbo; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
Medical visual question answering (Med-VQA) is a crucial multimodal task in clinical decision support and telemedicine. Recent self-attention based methods struggle to effectively handle cross-modal semantic alignments between vision and language. Moreover, classification-based methods rely on predefined answer sets. Treating this task as a simple classification problem may make it unable to adapt to the diversity of free-form answers and overlook the detailed semantic information of free-form answers. To tackle these challenges, we introduce a Cross-Mamba Interaction based Multi-Task Learning (CMI-MTL) framework that learns cross-modal feature representations from images and texts. CMI-MTL comprises three key modules: fine-grained visual-text feature alignment (FVTA), cross-modal interleaved feature representation (CIFR), and free-form answer-enhanced multi-task learning (FFAE). FVTA extracts the most relevant regions in image-text pairs through fine-grained visual-text feature alignment. CIFR captures cross-modal sequential interactions via cross-modal interleaved feature representation.
FFAE leverages auxiliary knowledge from open-ended questions through free-form answer-enhanced multi-task learning, improving the model's capability for open-ended Med-VQA. Experimental results show that CMI-MTL outperforms existing state-of-the-art methods on three Med-VQA datasets: VQA-RAD, SLAKE, and OVQA. Furthermore, we conduct interpretability experiments to further demonstrate its effectiveness. The code is publicly available at https://github.com/BioMedIA-repo/CMI-MTL.

Item Animating Multi-Vehicle Interactions in Traffic Conflict Zones Using Operational Plans (The Eurographics Association, 2025)
Chang, Feng-Jui; Wong, Sai-Keung; Huang, Bo-Rui; Lin, Wen-Chieh; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
This paper introduces an agent-based method for generating animations of intricate vehicle interactions by regulating behaviors in conflict zones on non-signalized road segments. As vehicles move along their paths, they create sweeping regions representing the areas they may occupy. The method assigns operation plans to vehicles, regulating their crossing and yielding strategies within intersecting or merging conflict zones. This approach enables various vehicle interactions, combining basic actions such as acceleration, deceleration, keeping speed, and stopping. Experimental results demonstrate that our method generates plausible interaction behaviors in diverse road structures, including intersections, Y-junctions, and midblocks.
This method could benefit applications in traffic scenario planning, self-driving vehicles, driver training, and education.

Item Skeletal Gesture Recognition Based on Joint Spatio-Temporal and Multi-Modal Learning (The Eurographics Association, 2025)
Yu, Zhijing; Zhu, Zhongjie; Ge, Di; Tu, Renwei; Bai, Yongqiang; Yang, Yueping; Wang, Yuer; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
Hand skeleton-based gesture recognition is a crucial task in human-computer interaction and virtual reality. It aims to achieve precise classification by analyzing the spatio-temporal dynamics of skeleton joints. However, existing methods struggle to effectively model highly entangled spatio-temporal features and fuse heterogeneous Joint, Bone, and Motion (J/B/JM) modalities. These limitations hinder recognition performance. To address these challenges, we propose an Adaptive Spatio-Temporal Network (ASTD-Net) for gesture recognition. Our approach centers on integrated spatio-temporal feature learning and collaborative optimization. First, for spatial feature learning, we design an Adaptive Multi-Subgraph Convolution Module (AMS-GCN), which mitigates spatial coupling interference and enhances structural representation. Subsequently, for temporal feature learning, we introduce a Multi-Scale Dilated Temporal Fusion Module (MD-TFN) that captures multi-granularity temporal patterns, spanning local details to global evolution. This allows for comprehensive modeling of temporal dependencies. Finally, we propose a Self-Supervised Spatio-Temporal Channel Adaptation Module (SSTC-A). Using a temporal discrepancy loss, SSTC-A dynamically optimizes cross-modal dependencies and strengthens alignment between heterogeneous J/B/JM features, enhancing their fusion. On the SHREC'17 and DHG-14/28 datasets, ASTD-Net achieves recognition accuracies of 97.50% and 93.57%, respectively.
This performance surpasses current state-of-the-art methods by up to 0.50% and 1.07%, respectively. These results verify the effectiveness and superiority of our proposed method.

Item Exploring Perceptual Homogenization through a VR-Based AI Narrative (The Eurographics Association, 2025)
Kao, Bing-Chen; Tsai, Tsun-Hung; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
This research explores how the drive for cognitive efficiency in Artificial Intelligence (AI) may contribute to the homogenization of sensory experiences. We present Abstract.exe, a Virtual Reality (VR) installation designed as a critical medium for this inquiry. The experience places participants in a detailed virtual forest where their exploration triggers an AI-driven ''simplification'' of the world. Visuals, models, and lighting progressively degrade, aiming to transform the 3D scene into abstract 2D color fields. This work attempts to translate the abstract logic of AI-driven summarization into a tangible, immersive experience. This paper outlines the concept and technical implementation in Unreal Engine 5 (UE5), which utilizes a Procedural Content Generation (PCG) framework. Abstract.exe is intended as both an artistic inquiry and a cautionary exploration of how we might preserve experiential richness in an algorithmically influenced world.

Item Attention-Guided Multi-scale Neural Dual Contouring (The Eurographics Association, 2025)
Wu, Fuli; Hu, Chaoran; Li, Wenxuan; Hao, Pengyi; Christie, Marc; Han, Ping-Hsuan; Lin, Shih-Syun; Pietroni, Nico; Schneider, Teseo; Tsai, Hsin-Ruey; Wang, Yu-Shuen; Zhang, Eugene
Reconstructing high-quality meshes from binary voxel data is a fundamental task in computer graphics.
However, existing methods struggle with low information density and strong discreteness, making it difficult to capture complex geometry and long-range boundary features, often leading to jagged surfaces and loss of sharp details. We propose an Attention-Guided Multi-scale Neural Dual Contouring (AGNDC) method to address this challenge. AGNDC refines surface reconstruction through a multi-scale framework, using a hybrid feature extractor that combines global attention and dynamic snake convolution to enhance perception of long-range and high-curvature features. A dynamic feature fusion module aligns multi-scale predictions to improve local detail continuity, while a geometric postprocessing module further refines mesh boundaries and suppresses artifacts. Experiments on the ABC dataset demonstrate the superior performance of AGNDC in both visual and quantitative metrics. It achieves a Chamfer Distance (CD×10^5) of 9.013 and an F-score of 0.440, significantly reducing jaggedness and improving surface smoothness.