The Beijing Academy of Artificial Intelligence (BAAI, also known as the Zhiyuan Institute) has released See3D, a 3D generation model that learns from massive amounts of unlabeled Internet video and generates 3D content from what it sees, marking a major advance for the "See Video, Get 3D" concept. See3D does not rely on traditional camera parameters or 3D annotations. Using a visual-conditioning technique, it generates multi-view images with controllable camera direction and consistent geometry from the visual cues in video alone, which greatly reduces the cost and difficulty of obtaining 3D data and opens new possibilities for 3D generation technology.
See3D supports 3D generation from text, from a single view, or from sparse views, and also offers 3D editing and Gaussian rendering. The model, code, and demo have been open-sourced so that researchers can study and build on them. Application scenarios include unlocking interactive 3D worlds, 3D reconstruction from sparse images, open-world 3D generation, and single-view 3D generation. Its core advantages are data scalability, camera controllability, and geometric consistency. Training on WebVi3D, a dataset of 16 million video clips comprising 320 million frames, has yielded significant improvements in 3D generation.
The research team built the large-scale WebVi3D dataset by automatically filtering video data. By adding time-dependent noise to masked video data, they produce a purely 2D visual signal that supports scalable training of a multi-view diffusion model, ultimately achieving 3D generation without camera conditions. See3D brings new ideas to the field of 3D generation and is expected to encourage the use of large-scale data without camera annotations in 3D research, reduce the cost of 3D data collection, and narrow the gap with existing closed-source 3D solutions.
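The idea of adding time-dependent noise to masked video data can be illustrated with a small sketch. This is not the See3D implementation: the function name `make_visual_condition`, the random patch mask, the linear noise schedule, and every parameter below are assumptions chosen only to show the general shape of such a corruption step.

```python
import numpy as np


def make_visual_condition(video, t, num_steps=1000, mask_ratio=0.5, patch=4, rng=None):
    """Illustrative only: mask a clip, then blend in noise that grows with timestep t.

    video: float array of shape (frames, height, width, channels).
    t:     integer timestep in [0, num_steps].
    Returns the corrupted clip and the binary mask that was applied.
    """
    rng = np.random.default_rng() if rng is None else rng
    frames, height, width, _ = video.shape
    assert height % patch == 0 and width % patch == 0, "frame size must divide by patch"

    # Assumed masking scheme: hide random square patches in every frame.
    keep = rng.random((frames, height // patch, width // patch)) >= mask_ratio
    mask = np.repeat(np.repeat(keep, patch, axis=1), patch, axis=2)
    mask = mask[..., None].astype(video.dtype)  # broadcast over channels
    masked = video * mask

    # Assumed linear schedule: no noise at t = 0, pure noise at t = num_steps.
    sigma = t / num_steps
    noise = rng.standard_normal(video.shape).astype(video.dtype)
    condition = (1.0 - sigma) * masked + sigma * noise
    return condition, mask


if __name__ == "__main__":
    clip = np.ones((2, 8, 8, 3), dtype=np.float32)
    cond, mask = make_visual_condition(clip, t=500, rng=np.random.default_rng(0))
    print(cond.shape, mask.shape)
```

In this sketch the condition carries most of the visible pixels at small timesteps and fades to pure noise at large ones, so no camera pose or 3D label is needed to build it.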
In practice, these advantages mean that See3D can generate scenes along arbitrarily complex camera trajectories while keeping the geometry consistent between consecutive views, which makes it applicable to a wide variety of 3D creation tasks.
By scaling up the dataset, See3D offers a new direction for the development of 3D generation technology. The team hopes this work will draw the 3D research community's attention to large-scale data without camera annotations, reduce the cost of 3D data acquisition, and narrow the gap with closed-source 3D solutions.
Project address: https://vision.baai.ac.cn/see3d
All in all, the open-source release of See3D brings a new technical approach and new directions to the field of 3D generation. Its efficiency and convenience should enable innovation across more application scenarios, and its further development and applications are worth looking forward to.