UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation

arXiv 2025
¹The University of Hong Kong, ²Beihang University, ³VAST

TL;DR: We propose a method that jointly estimates geometry (specifically, surface normals and coordinates) from multi-view images and video clips within a unified space, by leveraging the inter-frame correspondence prior embedded in video diffusion models.

Motivation

  • (a) Pre-trained video diffusion models capture accurate inter-frame correspondence: the same patch in different frames is highlighted in the attention maps.
  • (b) This correspondence can be specified by applying identical positional encodings to different frames.
  • (c) Geometric properties expressed in a shared global coordinate system are naturally aligned across frames.
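
To make point (b) concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes standard sinusoidal positional encodings and uses them directly as attention queries and keys, ignoring image content and learned projections. All function names and the `shared_positions` flag are hypothetical and introduced only for illustration. Under these assumptions, when two frames share the same positional encodings, the positional part of the attention from patch i in one frame peaks at patch i in the other frame.

```python
import math


def sinusoidal_encoding(position, dim):
    """Standard sinusoidal encoding of an integer position (dim must be even)."""
    enc = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        enc.append(math.sin(position * freq))
        enc.append(math.cos(position * freq))
    return enc


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_frame_attention(num_patches, dim, shared_positions):
    """Attention from each patch of frame 0 to every patch of frame 1.

    Queries and keys are the positional encodings only (a simplifying
    assumption). If shared_positions is True, both frames use positions
    0..num_patches-1; otherwise frame 1 uses positions shifted by
    num_patches, as if the frames were laid out one after the other.
    """
    offset = 0 if shared_positions else num_patches
    queries = [sinusoidal_encoding(p, dim) for p in range(num_patches)]
    keys = [sinusoidal_encoding(p + offset, dim) for p in range(num_patches)]
    scale = math.sqrt(dim)
    attn = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        attn.append(softmax(scores))
    return attn


def argmax_per_row(attn):
    """Index of the most-attended frame-1 patch for each frame-0 patch."""
    return [row.index(max(row)) for row in attn]
```

Calling `argmax_per_row(cross_frame_attention(4, 8, shared_positions=True))` returns `[0, 1, 2, 3]`: each patch attends most strongly to the patch at the same location in the other frame. With `shared_positions=False` the positional term no longer favors the same patch index, which is the behavior that identical encodings are meant to avoid.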

Video Geometry Estimation

Multi-View Image Geometry Estimation

Dynamic Video Geometry Estimation

Long Video Stitching

Comparison with Other Methods

Acknowledgements

The website template is borrowed from Nerfies.