GeoFusionLRM
Geometry-Aware Self-Correction for Consistent 3D Reconstruction
Ahmet Burak Yildirim1,* Tuna Saygin1,* Duygu Ceylan2 Aysegul Dundar1
1Bilkent University, Ankara, Turkey 2Adobe Research, London, United Kingdom
*These authors contributed equally.

Our method emphasizes geometric consistency over appearance. By refining self-predicted depth and normals, it improves normal accuracy and corrects structural errors often hidden in RGB renderings of existing LRM-based methods.

GeoFusionLRM teaser figure
Abstract

Single-image 3D reconstruction with large reconstruction models (LRMs) has advanced rapidly, yet reconstructions often exhibit geometric inconsistencies and misaligned details that limit fidelity. We introduce GeoFusionLRM, a geometry-aware self-correction framework that leverages the model's own normal and depth predictions to refine structural accuracy. Unlike prior approaches that rely solely on features extracted from the input image, GeoFusionLRM feeds back geometric cues through a dedicated transformer and fusion module, enabling the model to correct errors and enforce consistency with the conditioning image. This design improves the alignment between the reconstructed mesh and the input views without additional supervision or external signals. Extensive experiments demonstrate that GeoFusionLRM achieves sharper geometry, more consistent normals, and higher fidelity than state-of-the-art LRM baselines.

Method
GeoFusionLRM method figure
Overview of the proposed GeoFusionLRM architecture. Given a conditioning image, semantic features are extracted with a pre-trained vision encoder, while geometric cues from normals and depths of the intermediate mesh are encoded by the geometry-aware GeoFormer. The GeoFuser module merges these two streams of embeddings at the token level to produce refined conditioning features, which guide the LRM in generating an updated 3D mesh. This process corrects residual geometric errors and improves the consistency of surface normals and RGB renderings with respect to the conditioning image.
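To make the token-level fusion concrete, the following is a minimal PyTorch sketch of how a GeoFuser-style module could merge the semantic and geometric token streams. This is an illustrative assumption, not the released implementation: the class name, the cross-attention layout, and the token counts in the usage example are ours; the paper only states that the two embedding streams are fused at the token level to produce refined conditioning features for the LRM.

```python
# Illustrative sketch only: assumes semantic and geometric features are both
# token sequences with the same embedding width; the actual GeoFuser may differ.
import torch
import torch.nn as nn


class GeoFuserSketch(nn.Module):
    """Hypothetical token-level fusion of semantic and geometric embeddings."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # Semantic tokens attend to geometry tokens (encoded normals/depths).
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_sem = nn.LayerNorm(dim)
        self.norm_geo = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, sem_tokens: torch.Tensor, geo_tokens: torch.Tensor) -> torch.Tensor:
        # sem_tokens: (B, N_sem, dim) from the pre-trained vision encoder.
        # geo_tokens: (B, N_geo, dim) from the geometry-aware GeoFormer.
        q = self.norm_sem(sem_tokens)
        kv = self.norm_geo(geo_tokens)
        fused, _ = self.cross_attn(q, kv, kv)
        x = sem_tokens + fused   # residual injection of geometric cues
        x = x + self.mlp(x)      # refined conditioning tokens passed to the LRM
        return x


# Usage sketch: fuse 257 image tokens with 1024 geometry tokens (counts are illustrative).
sem = torch.randn(1, 257, 768)
geo = torch.randn(1, 1024, 768)
refined = GeoFuserSketch()(sem, geo)
print(refined.shape)  # torch.Size([1, 257, 768])
```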
Qualitative Results and Comparisons

We present qualitative comparisons across multiple datasets, including GSO, OmniObject3D, and synthetic Flux-generated images, highlighting differences in geometric consistency, surface normals, and structural accuracy. Results are visualized with both RGB renderings and surface normal maps, so that geometric artifacts masked by RGB appearance become clearly visible when the normals are inspected under changing viewpoints.
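The normal renderings below use the standard convention of mapping unit normals into RGB so that surface errors show up as color changes. A minimal NumPy sketch of this visualization (the function name normals_to_rgb is ours, for illustration only):

```python
# Minimal sketch, assuming normals are unit vectors in camera space:
# map each component from [-1, 1] to [0, 255] for display.
import numpy as np


def normals_to_rgb(normals: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) array of unit normals to an 8-bit RGB image."""
    n = normals / np.maximum(np.linalg.norm(normals, axis=-1, keepdims=True), 1e-8)
    return ((n * 0.5 + 0.5) * 255.0).astype(np.uint8)


# Example: a flat surface facing the camera renders as a uniform bluish color.
flat = np.tile(np.array([0.0, 0.0, 1.0]), (64, 64, 1))
print(normals_to_rgb(flat)[0, 0])  # [127 127 255]
```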

Video panels (left to right): GT Input, InstantMesh, Ours.
Qualitative comparison on Flux-generated single-image inputs. We compare InstantMesh and GeoFusionLRM using reconstructed RGB videos (top) and surface normal renderings (bottom), revealing geometric errors that are often visually masked in RGB appearance.
Comparison panels (left to right): GT, LRM, Spar3D, LGM, InstantMesh, Ours.
Qualitative comparison on the GSO dataset across competing methods. All results are rendered from identical viewpoints. RGB renderings (top) and surface normals (bottom) highlight differences in geometric consistency and surface accuracy.
Comparison panels (left to right): GT, LRM, Spar3D, LGM, InstantMesh, Ours.
Qualitative comparison on the OmniObject3D dataset across competing methods. All results are rendered from identical viewpoints. RGB renderings (top) and surface normals (bottom) highlight differences in geometric consistency and surface accuracy.