High-quality real-time view synthesis methods are based on volume rendering, splatting, or surface rendering.
While surface-based methods are generally the fastest, they cannot faithfully model fuzzy geometry such as hair. Alpha-blending techniques, in turn, excel at representing fuzzy materials but require an unbounded number of samples per ray (P1). Further overheads are induced by empty-space skipping in volume rendering (P2) and by sorting input primitives in splatting (P3).
We present a novel representation for real-time view synthesis where the (P1) number of sampling locations is small and bounded, (P2) sampling locations are efficiently found via rasterization, and (P3) rendering is sorting-free. We achieve this by representing objects as semi-transparent multi-layer meshes, rendered in fixed order. First, we model surface layers as SDF shells with optimal spacing learned during training. Then, we bake them as meshes and fit UV textures.
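Because the layers are nested shells, a ray crosses them in a known, fixed order, so the per-pixel samples can be alpha-composited front to back without any sorting. A minimal sketch of this compositing step (not the authors' implementation; `composite_fixed_order` and its arguments are hypothetical names):

```python
def composite_fixed_order(colors, alphas):
    """Front-to-back alpha compositing over K layer samples.

    colors: K RGB triples, already ordered front to back along the ray
            (guaranteed by the fixed shell traversal order).
    alphas: K opacities in [0, 1], one per layer sample.
    Returns the composited RGB and the remaining transmittance.
    """
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for rgb, a in zip(colors, alphas):
        weight = transmittance * a
        out = [o + weight * c for o, c in zip(out, rgb)]
        transmittance *= (1.0 - a)
    return out, transmittance
```

Since the number of layers K is small and fixed, the loop is bounded (addressing P1), and the fixed traversal order removes the need for per-frame primitive sorting (P3).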
Unlike single-surface methods, our multi-layer representation effectively models fuzzy objects. In contrast to volume-based and splatting-based methods, our approach enables real-time rendering on low-cost smartphones.
We focus our evaluation on object-centric datasets with prominent fuzzy structures. Our method, tested on renderings from our real-time WebGL renderer, consistently delivers higher image quality than surface-based competitors and renders faster than 3DGS.
Our key baselines focus on real-time rendering, with 3DGS and MobileNeRF as primary competitors. 3DGS represents the fastest volumetric approach, while MobileNeRF pioneers surface-based neural graphics for mobile devices.
Method | FPS ⋄ ⬆️ | FPS ⋆ ⬆️ | PSNR ⬆️ | Size (MB) ⬇️ |
---|---|---|---|---|
MobileNeRF | 24 | 35 | 29.30 | 194 |
3DGS-75K | 13 | 115 | 33.05 | 18 |
3DGS | 8 | 18 | 35.44 | 57 |
3-Mesh | 65 | 145 | 33.39 | 46 |
5-Mesh | 55 | 90 | 34.25 | 77 |
7-Mesh | 42 | 70 | 34.50 | 110 |
9-Mesh | 35 | 55 | 34.38 | 140 |
Framerate is measured on close-up views at HD (720p) resolution on a low-power smartphone (Samsung A52s, marked ⋄) and a laptop (Dell XPS 13 i5, marked ⋆), using the respective WebGL renderers; the memory footprint is measured as stored on disk. Metrics are averaged over the scenes of the Shelly dataset.
We find that using seven layers (7-Mesh) offers a good balance between image quality, model size, and rendering speed. Refer to our paper for more extensive quantitative results.
Qualitative comparisons between (b) PermutoSDF and (c) our method (7-Mesh) demonstrate that our approach convincingly represents fuzzy objects. This is achieved by trading (a) high-frequency geometry for the number of integration points, which are found by rasterizing smooth, lightweight meshes defined as (d) adaptively-spaced shells around the object and traversed in a fixed order.
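One way to parameterize the adaptive shell spacing described above is to learn unconstrained logits and map them to monotonically increasing SDF offsets within a fixed band around the zero level set. This is a hedged sketch of one such parameterization (the function name, the softmax mapping, and the `band` parameter are assumptions, not the paper's exact scheme):

```python
import math

def shell_offsets(logits, band=0.02):
    """Map K-1 unconstrained logits to K strictly increasing SDF
    level-set offsets spanning [-band, +band].

    Each shell k is then the level set f(x) = offsets[k] of the SDF f.
    A softmax over the logits yields positive gap fractions, so the
    offsets stay ordered while their spacing remains learnable.
    """
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    fracs = [e / total for e in exps]  # positive, sum to 1
    offsets = [-band]
    for f in fracs:
        offsets.append(offsets[-1] + f * (2.0 * band))
    return offsets  # last offset equals +band
```

Uniform logits recover evenly spaced shells; during training, the logits can shift capacity toward the level sets that matter most for fuzzy regions.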
3DGS demonstrates superior performance in modeling thin structures but is significantly less effective than our method at representing large, textured areas. In 3DGS-75K, we limit the maximum number of primitives during optimization to 75,000. Our method renders faster than 3DGS-75K on mobile devices, demonstrating a clear efficiency advantage even against a limited number of Gaussians.
Our method surpasses MobileNeRF in modeling volumetric hair while also achieving superior performance on flat surfaces.
@inproceedings{Esposito2025VolSurfs,
author = {Esposito, Stefano and Chen, Anpei and Reiser, Christian and Rota Bulò, Samuel and Porzi, Lorenzo and Schwarz, Katja and Richardt, Christian and Zollhoefer, Michael and Kontschieder, Peter and Geiger, Andreas},
title = {Volumetric Surfaces: Representing Fuzzy Geometries with Layered Meshes},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2025}
}
Plushy Blender object by kaizentutorials.
Stefano Esposito acknowledges travel support from the European Union’s Horizon 2020 research and innovation program under ELISE Grant Agreement No. 951847.