DeOcc-1-to-3: 3D De-Occlusion from a Single Image via Self-Supervised Multi-View Diffusion

1 Key Laboratory of Multimedia Trusted Perception and Efficient Computing,
Ministry of Education of China, Xiamen University
2 State Key Laboratory of Virtual Reality Technology and Systems, Beihang University


DeOcc-1-to-3 takes a single occluded image (left) as input and synthesizes structurally consistent multi-view de-occluded images (middle). These outputs can be seamlessly integrated into various 3D reconstruction or generation frameworks to produce accurate meshes and surface normals (right).

TL;DR

Given a single partially occluded image, DeOcc-1-to-3 directly generates six structurally consistent de-occluded views that plug into existing 3D reconstruction pipelines, with no separate inpainting step and no manual annotations.

Abstract

Reconstructing 3D objects from a single image is a long-standing challenge, especially under real-world occlusions. While recent diffusion-based view synthesis models can generate consistent novel views from a single RGB image, they generally assume fully visible inputs and fail when parts of the object are occluded. This leads to inconsistent views and degraded 3D reconstruction quality. To overcome this limitation, we propose an end-to-end framework for occlusion-aware multi-view generation. Our method directly synthesizes six structurally consistent novel views from a single partially occluded image, enabling downstream 3D reconstruction without requiring prior inpainting or manual annotations. We construct a self-supervised training pipeline, leveraging occluded-unoccluded image pairs and pseudo-ground-truth views to teach the model structure-aware completion and view consistency. Without modifying the original architecture, we fully fine-tune the view synthesis model to jointly learn completion and multi-view generation. Additionally, we introduce the first benchmark for occlusion-aware reconstruction, encompassing diverse occlusion levels, object categories, and mask patterns. This benchmark provides a standardized protocol for evaluating future methods under partial occlusions.
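To make the intended workflow concrete, the sketch below shows how a model of this kind is meant to be driven: one occluded image in, six de-occluded views out, then any off-the-shelf multi-view reconstruction backend. This is a minimal sketch under assumed names, not the released code; DeOccPipeline, load_pretrained, and reconstruct_mesh are hypothetical, and the sampling parameters are illustrative.

# Minimal usage sketch (hypothetical API). A fine-tuned multi-view diffusion
# model maps one partially occluded RGB image to six de-occluded views, which
# any multi-view 3D reconstruction backend can then consume.
from PIL import Image

from deocc import DeOccPipeline       # hypothetical module wrapping the model
from recon import reconstruct_mesh    # hypothetical multi-view-to-mesh backend

# Completion and multi-view generation are learned jointly via full fine-tuning,
# so a single model call replaces the usual inpaint-then-synthesize pipeline.
pipe = DeOccPipeline.load_pretrained("deocc-1-to-3")

occluded = Image.open("occluded_input.png").convert("RGB")

# One forward pass: six structurally consistent, de-occluded novel views.
views = pipe(occluded, num_views=6, num_inference_steps=50)

# The generated views feed a reconstruction framework without further processing.
mesh = reconstruct_mesh(views)
mesh.export("object.obj")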

Examples

Qualitative results on diverse occluded objects. Each row presents (from left to right) an occluded input image, the generated multi-view de-occluded images, a rendered video, and the final 3D object. Our method reconstructs coherent geometry and appearance across a wide range of shapes, materials, and occlusion scenarios.

[Results gallery: each entry shows an occluded input image, the generated multi-view de-occluded images, a rendered video, and the reconstructed full object.]

Gradio Demo

🎥 Demo Tutorial: This video demonstrates how to use the DeOcc-1-to-3 Gradio demo for de-occlusion. It walks through the input process, the interaction options, and the output visualization.
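For readers who want to host a similar demo locally, the snippet below is a minimal sketch of such an interface built with Gradio's standard Interface, Image, and Gallery components. generate_views is a hypothetical stand-in for the actual model call; everything else uses the public Gradio API.

# Minimal Gradio interface sketch: upload an occluded image, get back the
# generated de-occluded multi-view images. generate_views is a placeholder
# for the real DeOcc-1-to-3 model call.
import gradio as gr
from PIL import Image

def generate_views(occluded: Image.Image) -> list[Image.Image]:
    # Placeholder: the real demo would run the fine-tuned multi-view
    # diffusion model here and return its six de-occluded views.
    return [occluded] * 6

demo = gr.Interface(
    fn=generate_views,
    inputs=gr.Image(type="pil", label="Occluded input"),
    outputs=gr.Gallery(label="Generated multi-view images", columns=3),
    title="DeOcc-1-to-3",
    description="Single-image 3D de-occlusion via multi-view diffusion.",
)

if __name__ == "__main__":
    demo.launch()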

Citation

If you want to cite our work, please use:

@article{qu2025deocc,
    title={DeOcc-1-to-3: 3D De-Occlusion from a Single Image via Self-Supervised Multi-View Diffusion},
    author={Qu, Yansong and Dai, Shaohui and Li, Xinyang and Wang, Yuze and Shen, You and Cao, Liujuan and Ji, Rongrong},
    journal={arXiv preprint arXiv:2506.21544},
    year={2025}
}

Acknowledgements

The website template was borrowed from Michaël Gharbi and MipNeRF360.