TL;DR: We propose a method for joint geometry estimation (specifically, surface normals and coordinates) from multi-view images and video clips within a unified space by leveraging the inter-frame correspondence prior embedded in a video diffusion model.
The website template is borrowed from Nerfies.