PARC: A Two-Stage Multi-Modal Framework for Point Cloud Completion

dc.contributor.author: Cai, Yujiao
dc.contributor.author: Su, Yuhao
dc.contributor.editor: Christie, Marc
dc.contributor.editor: Pietroni, Nico
dc.contributor.editor: Wang, Yu-Shuen
dc.date.accessioned: 2025-10-07T05:03:18Z
dc.date.available: 2025-10-07T05:03:18Z
dc.date.issued: 2025
dc.description.abstract: Point cloud completion is vital for accurate 3D reconstruction, yet real-world scans frequently exhibit large structural gaps that compromise recovery. Meanwhile, in 2D vision, VAR (Visual Auto-Regression) has demonstrated that coarse-to-fine "next-scale prediction" can significantly improve generation quality, inference speed, and generalization. Because this coarse-to-fine approach closely aligns with the progressive nature of filling in missing geometry in point clouds, we were inspired to develop PARC (Patch-Aware Coarse-to-Fine Refinement Completion), a two-stage multi-modal framework specifically designed for handling missing structures. In the pre-training stage, PARC leverages complete point clouds together with a Patch-Aware Coarse-to-Fine Refinement (PAR) strategy and a Mixture-of-Experts (MoE) architecture to generate high-quality local fragments, thereby improving geometric structure understanding and feature representation quality. During fine-tuning, the model is adapted to partial scans, further enhancing its resilience to incomplete inputs. To address remaining uncertainties in regions with missing structure, we introduce a dual-branch architecture that incorporates image cues: point cloud and image features are extracted independently and then fused via the MoE with an alignment loss, allowing complementary modalities to guide reconstruction in occluded or missing regions. Experiments on the ShapeNet-ViPC dataset show that PARC achieves highly competitive performance. Code is available at https://github.com/caiyujiaocyj/PARC.
dc.description.number: 7
dc.description.sectionheaders: Creating and Processing Point Clouds
dc.description.seriesinformation: Computer Graphics Forum
dc.description.volume: 44
dc.identifier.doi: 10.1111/cgf.70266
dc.identifier.issn: 1467-8659
dc.identifier.pages: 10 pages
dc.identifier.uri: https://doi.org/10.1111/cgf.70266
dc.identifier.uri: https://diglib.eg.org/handle/10.1111/cgf70266
dc.publisher: The Eurographics Association and John Wiley & Sons Ltd.
dc.subject: CCS Concepts: Computer vision → Reconstruction; Machine learning → Neural networks; Information systems → Multimedia information systems; Multimedia and multimodal retrieval
dc.title: PARC: A Two-Stage Multi-Modal Framework for Point Cloud Completion
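Editorial note: the dual-branch design described in the abstract (independent point-cloud and image encoders, MoE fusion, alignment loss) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the encoder designs, feature dimension (256), number of experts (4), and the cosine-based alignment loss below are all assumptions chosen only to show the overall wiring; the actual PARC decoder and training losses are described in the paper.

# Illustrative sketch only; module names, dimensions, and the alignment loss
# are assumptions, not the published PARC architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointEncoder(nn.Module):
    """Toy per-point MLP followed by max pooling (PointNet-style global feature)."""
    def __init__(self, dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, pts):                      # pts: (B, N, 3)
        return self.mlp(pts).max(dim=1).values   # (B, dim)

class ImageEncoder(nn.Module):
    """Toy CNN producing one global feature per input image."""
    def __init__(self, dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.proj = nn.Linear(64, dim)

    def forward(self, img):                      # img: (B, 3, H, W)
        return self.proj(self.conv(img).flatten(1))  # (B, dim)

class MoEFusion(nn.Module):
    """Soft mixture-of-experts over the concatenated point/image features."""
    def __init__(self, dim=256, n_experts=4):
        super().__init__()
        self.gate = nn.Linear(2 * dim, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(n_experts)])

    def forward(self, f_pts, f_img):
        x = torch.cat([f_pts, f_img], dim=-1)                     # (B, 2*dim)
        w = F.softmax(self.gate(x), dim=-1)                       # (B, n_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=1)   # (B, n_experts, dim)
        return (w.unsqueeze(-1) * outs).sum(dim=1)                # (B, dim)

def alignment_loss(f_pts, f_img):
    """Pull the two modality features together (cosine form is an assumption)."""
    return 1.0 - F.cosine_similarity(f_pts, f_img, dim=-1).mean()

if __name__ == "__main__":
    # Dummy data; in a real pipeline the fused feature would condition a completion decoder.
    pts, img = torch.randn(2, 1024, 3), torch.randn(2, 3, 128, 128)
    pe, ie, fuse = PointEncoder(), ImageEncoder(), MoEFusion()
    f_p, f_i = pe(pts), ie(img)
    fused = fuse(f_p, f_i)                       # (2, 256)
    print(fused.shape, alignment_loss(f_p, f_i).item())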
Files
cgf70266.pdf (1.7 MB, Adobe Portable Document Format)