Researchers at ShanghaiTech University have developed CLAY, an AI model that generates detailed 3D objects from text descriptions or 2D images. Its generation speed and output quality suggest considerable potential for 3D modeling, with possible applications in game development, film production, and 3D printing. At its core are a multi-resolution variational autoencoder and a diffusion transformer. CLAY processes 3D content directly, without first converting it to 2D images, and lets users steer the result with custom shapes or bounding boxes.
Scientists at ShanghaiTech University recently developed an artificial intelligence model called CLAY that can generate detailed 3D objects from text descriptions or 2D images. Compared with earlier approaches, CLAY markedly improves both the quality and the diversity of the generated 3D objects.
CLAY is built around two components: a multi-resolution variational autoencoder (VAE) and a diffusion transformer (DiT). The VAE encodes 3D geometries at different levels of detail into a latent space, while the DiT generates these geometries. Unlike many other systems, CLAY processes 3D content directly, without first converting it to 2D images.
CLAY was trained on more than 500,000 3D models, ranging from simple everyday objects to complex fantasy creatures. It can also be guided by additional input: users can control the result precisely by specifying a rough shape (such as a voxel structure or a point cloud) or a bounding box. This flexibility allows CLAY to generate entire city scenes and even to reconstruct detailed 3D models from hand-drawn sketches.
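To make the control inputs mentioned above more concrete, the following is a minimal sketch of how a rough point cloud could be turned into a bounding box and a coarse voxel grid. It is only an illustration: the function names `bounding_box` and `voxelize`, the `resolution` parameter, and the sample points are assumptions made for this example and are not taken from CLAY or from the Rodin service.

```python
from typing import List, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def bounding_box(points: List[Point]) -> Tuple[Point, Point]:
    """Return the axis-aligned bounding box as (min corner, max corner)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def voxelize(points: List[Point], resolution: int) -> Set[Voxel]:
    """Map points to occupied cells of a cubic grid with `resolution` cells per axis.

    The grid spans the longest side of the bounding box, so the shape keeps
    its proportions. This is a hypothetical helper, not part of CLAY.
    """
    lo, hi = bounding_box(points)
    # Longest side of the box; fall back to 1.0 if all points coincide.
    extent = max(h - l for l, h in zip(lo, hi)) or 1.0
    occupied: Set[Voxel] = set()
    for p in points:
        # Scale each coordinate into [0, resolution] and clamp the upper edge
        # so that points on the far boundary land in the last cell.
        idx = tuple(
            min(int((c - l) / extent * resolution), resolution - 1)
            for c, l in zip(p, lo)
        )
        occupied.add(idx)
    return occupied


# Illustrative input: four corner-like points of a rough 1 x 2 x 1 shape.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (1.0, 2.0, 1.0)]
lo, hi = bounding_box(points)
voxels = voxelize(points, resolution=4)
print("bounding box:", lo, hi)
print("occupied voxels:", sorted(voxels))
```

A coarse grid like this captures only the overall proportions of the shape, which matches the kind of rough guidance described above; the fine detail would be left to the generative model.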
Compared with other systems such as Shap-E, DreamFusion, and Wonder3D, CLAY shows clear advantages. In both text-to-3D and image-to-3D generation, it produces more consistent geometry, smoother surfaces, and finer details. It is also fast: a high-quality 3D asset takes only about 45 seconds to generate, whereas some of the comparison systems can need hours of optimization.
CLAY has a wide range of potential applications, including game development, film production, and 3D printing. The researchers are nevertheless aware of the potential risks of AI-generated virtual content and plan to add further safeguards to ensure responsible use.
The researchers also plan to expand the training data, improve model quality, and integrate geometry generation and material synthesis into a single model for more comprehensive functionality. A version of CLAY is available through the 3D generation service Rodin.
Product link: https://hyperhuman.deemos.com/rodin
CLAY marks a significant step forward in 3D modeling technology. Its fast, high-quality generation and broad range of applications could make it an important tool for 3D content creation. As the technology continues to develop, CLAY may open up new possibilities across many industries.