Title: CLIP-based Neural Neighbor Style Transfer for 3D Assets
Authors: Mishra, Shailesh; Granskog, Jonathan; Babaei, Vahid; Skouras, Melina
Date: 2023-05-03
ISBN: 978-3-03868-209-7
ISSN: 1017-4656
DOI: https://doi.org/10.2312/egs.20231006
Handle: https://diglib.eg.org:443/handle/10.2312/egs20231006
Pages: 25-28 (4 pages)
License: Attribution 4.0 International License
CCS Concepts: Computing methodologies → Appearance and texture representations; Rasterization; Supervised learning by regression

Abstract: We present a method for transferring the style from a set of images to the texture of a 3D object. The texture of the asset is optimized with a differentiable renderer and losses based on pretrained deep neural networks. More specifically, we use a nearest-neighbor feature matching (NNFM) loss with CLIP-ResNet50, which we extend to support multiple style images. We improve color accuracy and artistic control with an extra loss on user-provided or automatically extracted color palettes. Finally, we show that a CLIP-based NNFM loss produces a different appearance than a VGG-based one, focusing more on textural details than on geometric shapes. We note, however, that user preference between the two remains subjective.
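The abstract names the NNFM loss but not its form. As a point of reference, the following is a minimal sketch of a nearest-neighbor feature matching loss over flattened feature maps, with multi-style support implemented by concatenating the candidate features of all style images. It follows the common NNFM formulation (cosine distance to the nearest style feature); the function name, signature, and choice of distance are assumptions here, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def nnfm_loss(render_feats: torch.Tensor,
              style_feats_list: list[torch.Tensor]) -> torch.Tensor:
    """Nearest-neighbor feature matching over flattened feature maps.

    render_feats:     (N, D) features of the rendered view (gradients flow here).
    style_feats_list: one (M_i, D) tensor per style image; multi-style support
                      reduces to concatenating the candidate features.
    """
    style_feats = torch.cat(style_feats_list, dim=0)   # (sum(M_i), D)
    r = F.normalize(render_feats, dim=-1)
    s = F.normalize(style_feats, dim=-1)
    cos_dist = 1.0 - r @ s.t()                         # (N, sum(M_i)) cosine distances
    nn_dist, _ = cos_dist.min(dim=1)                   # nearest style feature per render feature
    return nn_dist.mean()
```

Style features can be precomputed once per style set. With CLIP-ResNet50, which the paper uses, they could for instance be tapped from an intermediate layer of the visual backbone via a forward hook; the choice of `layer3` below is an illustrative assumption, not the paper's stated layer.

```python
import clip  # https://github.com/openai/CLIP

model, _ = clip.load("RN50", device="cpu")
feats = {}
def grab(_module, _inp, out):
    feats["layer3"] = out
model.visual.layer3.register_forward_hook(grab)
with torch.no_grad():
    model.encode_image(torch.randn(1, 3, 224, 224))      # stand-in for a preprocessed style image
style_feats = feats["layer3"].flatten(2).squeeze(0).t()  # (H*W, D), ready for nnfm_loss
```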
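The palette term is described only as "an extra loss on user-provided or automatically extracted color palettes." One plausible reading, sketched below purely as an assumption, penalizes each rendered pixel's squared distance to its nearest palette color; the paper's actual formulation may differ.

```python
import torch

def palette_loss(pixels: torch.Tensor, palette: torch.Tensor) -> torch.Tensor:
    """Pull rendered colors toward a small color palette.

    pixels:  (P, 3) RGB values of the rendered view.
    palette: (K, 3) RGB palette colors (user-provided, or e.g. k-means
             centers extracted from the style images).
    """
    d2 = torch.cdist(pixels, palette).pow(2)   # (P, K) squared distances
    return d2.min(dim=1).values.mean()         # nearest palette color per pixel
```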
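Putting the pieces together, the outer loop optimizes the texture through a differentiable renderer. The sketch below substitutes a toy `grid_sample` texture lookup for the renderer so that it runs self-contained; the fixed UV grid, resolutions, palette, and learning rate are all illustrative placeholders, and a real pipeline would rasterize the asset under sampled cameras (e.g. with nvdiffrast or PyTorch3D) and add the NNFM term.

```python
import torch
import torch.nn.functional as F

# Toy stand-in for the rasterizer: fixed per-pixel UVs on a flat quad. In the
# real pipeline the UVs come from rasterizing the 3D asset under a sampled
# camera with a differentiable renderer.
uv = torch.stack(torch.meshgrid(
    torch.linspace(-1, 1, 128), torch.linspace(-1, 1, 128), indexing="xy"
), dim=-1).unsqueeze(0)                                   # (1, 128, 128, 2)

texture = torch.rand(1, 3, 512, 512, requires_grad=True)  # the optimized texture
palette = torch.tensor([[0.9, 0.1, 0.1],                  # illustrative 3-color palette
                        [0.1, 0.2, 0.8],
                        [0.95, 0.9, 0.2]])
opt = torch.optim.Adam([texture], lr=1e-2)

for step in range(200):
    image = F.grid_sample(texture, uv, align_corners=True)  # differentiable "render"
    pixels = image.permute(0, 2, 3, 1).reshape(-1, 3)       # (P, 3) rendered pixels
    d2 = torch.cdist(pixels, palette).pow(2)                 # palette term only here;
    loss = d2.min(dim=1).values.mean()                       # the full method adds NNFM
    opt.zero_grad()
    loss.backward()
    opt.step()
```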