PF-UCDR: A Local-Aware RGB-Phase Fusion Network with Adaptive Prompts for Universal Cross-Domain Retrieval

dc.contributor.author: Wu, Yiqi [en_US]
dc.contributor.author: Hu, Ronglei [en_US]
dc.contributor.author: Wu, Huachao [en_US]
dc.contributor.author: He, Fazhi [en_US]
dc.contributor.author: Zhang, Dejun [en_US]
dc.contributor.editor: Christie, Marc [en_US]
dc.contributor.editor: Han, Ping-Hsuan [en_US]
dc.contributor.editor: Lin, Shih-Syun [en_US]
dc.contributor.editor: Pietroni, Nico [en_US]
dc.contributor.editor: Schneider, Teseo [en_US]
dc.contributor.editor: Tsai, Hsin-Ruey [en_US]
dc.contributor.editor: Wang, Yu-Shuen [en_US]
dc.contributor.editor: Zhang, Eugene [en_US]
dc.date.accessioned: 2025-10-07T06:03:42Z
dc.date.available: 2025-10-07T06:03:42Z
dc.date.issued: 2025
dc.description.abstract: Universal Cross-Domain Retrieval (UCDR) aims to match semantically related images across domains and categories not seen during training. While vision-language pre-trained models offer strong global alignment, we are inspired by the observation that local structures, such as shapes, contours, and textures, often remain stable across domains, and thus propose to model them explicitly at the patch level. We present PF-UCDR, a framework built upon frozen vision-language backbones that performs patch-wise fusion of RGB and phase representations. Central to our design is a Fusing Vision Encoder, which applies masked cross-attention to spatially aligned RGB and phase patches, enabling fine-grained integration of complementary appearance and structural cues. Additionally, we incorporate adaptive visual prompts that condition image encoding based on domain and class context. Local and global fusion modules aggregate these enriched features, and a two-stage training strategy progressively optimizes alignment and retrieval objectives. Experiments on standard UCDR benchmarks demonstrate that PF-UCDR significantly outperforms existing methods, validating the effectiveness of structure-aware local fusion grounded in multimodal pretraining. Our code is publicly available at https://github.com/djzgroup/PF-UCDR. [en_US]
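The abstract's two core ideas — a phase representation that isolates structural cues, and masked cross-attention that fuses each RGB patch only with its spatially aligned phase patch — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (their code is at the repository linked above); the function names, the single-head attention, and the unit-magnitude phase reconstruction are illustrative assumptions.

```python
import numpy as np


def phase_image(img: np.ndarray) -> np.ndarray:
    """Phase-only reconstruction of a grayscale image (illustrative).

    Setting every Fourier magnitude to 1 while keeping the original
    phase and inverting the FFT preserves edges and contours -- the
    kind of domain-stable structural cue the paper fuses with RGB.
    """
    spectrum = np.fft.fft2(img)
    phase_only = np.exp(1j * np.angle(spectrum))  # unit magnitude, same phase
    return np.real(np.fft.ifft2(phase_only))


def masked_cross_attention(q: np.ndarray, kv: np.ndarray,
                           mask: np.ndarray) -> np.ndarray:
    """Single-head cross-attention restricted by a boolean (N, N) mask.

    q:  (N, d) query patch tokens (e.g. RGB patches)
    kv: (N, d) key/value patch tokens (e.g. phase patches)
    An identity mask makes each query attend only to its spatially
    aligned partner, mimicking patch-wise alignment.
    """
    d = q.shape[-1]
    scores = q @ kv.T / np.sqrt(d)          # (N, N) attention logits
    scores = np.where(mask, scores, -1e9)   # block non-aligned pairs
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ kv                      # (N, d) fused tokens


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb_tokens = rng.normal(size=(4, 8))
    phase_tokens = rng.normal(size=(4, 8))
    # With an identity mask, patch i fuses only with phase patch i.
    fused = masked_cross_attention(rgb_tokens, phase_tokens,
                                   np.eye(4, dtype=bool))
    print(fused.shape)  # (4, 8)
```

With the identity mask each attention row collapses onto a single key, so the fused token for patch *i* is exactly its aligned phase token; relaxing the mask to a local neighborhood would let patches also draw on nearby structure.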
dc.description.sectionheaders: Multi-Modality
dc.description.seriesinformation: Pacific Graphics Conference Papers, Posters, and Demos
dc.identifier.doi: 10.2312/pg.20251279
dc.identifier.isbn: 978-3-03868-295-0
dc.identifier.pages: 10 pages
dc.identifier.uri: https://doi.org/10.2312/pg.20251279
dc.identifier.uri: https://diglib.eg.org/handle/10.2312/pg20251279
dc.publisher: The Eurographics Association [en_US]
dc.rights: Attribution 4.0 International License
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: CCS Concepts: Computing methodologies → Computer vision tasks; Visual content-based indexing and retrieval; Image representations
dc.subject: Computing methodologies → Computer vision tasks
dc.subject: Visual content-based indexing and retrieval
dc.subject: Image representations
dc.title: PF-UCDR: A Local-Aware RGB-Phase Fusion Network with Adaptive Prompts for Universal Cross-Domain Retrieval [en_US]
Files
Original bundle
Name: pg20251279.pdf
Size: 5.84 MB
Format: Adobe Portable Document Format