Evaluating Zero-Shot Monocular Depth Estimation Models for Tactile Rendering of Paintings

dc.contributor.author: Magherini, Roberto [en_US]
dc.contributor.author: Servi, Michaela [en_US]
dc.contributor.author: Buonamici, Francesco [en_US]
dc.contributor.author: Furferi, Rocco [en_US]
dc.contributor.editor: Campana, Stefano [en_US]
dc.contributor.editor: Ferdani, Daniele [en_US]
dc.contributor.editor: Graf, Holger [en_US]
dc.contributor.editor: Guidi, Gabriele [en_US]
dc.contributor.editor: Hegarty, Zackary [en_US]
dc.contributor.editor: Pescarin, Sofia [en_US]
dc.contributor.editor: Remondino, Fabio [en_US]
dc.date.accessioned: 2025-09-05T20:56:56Z
dc.date.available: 2025-09-05T20:56:56Z
dc.date.issued: 2025
dc.description.abstract: Access to pictorial art remains a significant challenge for visually impaired individuals, as 2D paintings must be transformed into tactile 2.5D/3D models. While deep learning offers promising tools for monocular depth estimation (MDE), applying state-of-the-art (SOTA) zero-shot models to artworks presents unique difficulties due to artistic conventions (perspective, lighting, texture) and the lack of ground truth, especially concerning details crucial for tactile perception. This paper addresses this gap by qualitatively evaluating a wide range of SOTA zero-shot MDE models - including DepthAnything (v1/v2), Marigold, Metric3D v2, ZoeDepth, UniDepth (v1/v2/v2_old), GeoWizard (v1/v2), and Depth-Pro - on their ability to generate depth maps suitable for tactile rendering from two 20th-century Italian paintings with distinct styles and input qualities. The assessment, based on criteria such as detail preservation, contour definition, spatial coherence, and absence of artifacts, reveals that while zero-shot models can interpret basic spatial structures, performance varies considerably. Models such as DepthAnything v2 and GeoWizard v2 demonstrated superior capabilities in preserving key features for tactile exploration, emerging as promising candidates. However, no model produced a directly usable output, highlighting persistent challenges in handling artistic styles and pictorial textures. This study provides the first systematic comparison in this niche application, offering practical insights for cultural institutions aiming to leverage AI for accessibility. It concludes that current zero-shot models show significant potential as starting points but still require validation and refinement, and that further research is needed in areas such as targeted post-processing, art-specific metrics, and user-centered validation to make cultural heritage truly accessible to all. [en_US]
dc.description.sectionheaders: Digital Technologies for CHANGES (CHANGES SESSION) - Part 3
dc.description.seriesinformation: Digital Heritage
dc.identifier.doi: 10.2312/dh.20253048
dc.identifier.isbn: 978-3-03868-277-6
dc.identifier.pages: 8 pages
dc.identifier.uri: https://doi.org/10.2312/dh.20253048
dc.identifier.uri: https://diglib.eg.org/handle/10.2312/dh20253048
dc.publisher: The Eurographics Association [en_US]
dc.rights: Attribution 4.0 International License
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: Evaluating Zero-Shot Monocular Depth Estimation Models for Tactile Rendering of Paintings [en_US]
Files
Original bundle
Name: dh20253048.pdf
Size: 1.08 MB
Format: Adobe Portable Document Format