Researchers from Adobe Research and the Australian National University have developed a Large Reconstruction Model (LRM) that can convert a 2D image into a 3D model within 5 seconds. LRM uses a transformer-based architecture and can generate high-fidelity 3D shapes. The model is scalable, efficient, and adaptable to various datasets. Future plans include increasing the model’s size and exploring multi-modal generative models in 3D. This technology has the potential to automate some tasks performed by 3D designers and enhance accessibility in the creative sector.
Introducing the Large Reconstruction Model (LRM) for 3D Object Prediction
Imagine a world where any 2D image can be instantly transformed into a 3D model. This vision has motivated researchers to develop a generic and efficient method for achieving this objective, with applications in industrial design, animation, gaming, and augmented reality/virtual reality.
Early learning-based approaches to 3D modeling were category-specific, relying on category-level priors to resolve the inherent ambiguity of recovering 3D geometry from a single view. More recent work leverages advances in 2D image generation to provide multi-view supervision. However, these approaches require careful parameter tuning and regularization, and their output quality is bounded by the pre-trained 2D generative models they rely on.
The Solution: Large Reconstruction Model (LRM)
Researchers from Adobe Research and the Australian National University have developed a breakthrough solution. LRM uses a large transformer-based encoder-decoder architecture to learn 3D object representations from a single image. Given an input image, LRM directly predicts a triplane representation of a NeRF (Neural Radiance Field).
LRM’s architecture first extracts image features using a pre-trained vision transformer as the image encoder, then applies an image-to-triplane transformer decoder that projects the 2D image features onto a 3D triplane, while self-attention layers model the relations among the triplane tokens. The output tokens are reshaped and upsampled into the final triplane feature maps, which support volume rendering of the object from any viewpoint.
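The triplane idea can be illustrated with a small sketch: to evaluate a 3D point, you project it onto the three axis-aligned feature planes, bilinearly sample and sum the features, and decode them into density and color with a small MLP. The code below is an illustrative NumPy sketch with random weights and hypothetical helper names, not LRM's actual decoder or feature dimensions.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample a feature plane of shape (H, W, C) at
    continuous coordinates u, v in [0, 1]."""
    H, W, _ = plane.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * plane[y0, x0]
            + wx * (1 - wy) * plane[y0, x1]
            + (1 - wx) * wy * plane[y1, x0]
            + wx * wy * plane[y1, x1])

def query_triplane(planes, point, mlp):
    """Query a triplane field at a 3D point in [0, 1]^3: project the
    point onto the XY, XZ, and YZ planes, sample and sum features,
    then decode with a small MLP into (density, rgb)."""
    x, y, z = point
    feat = (bilinear_sample(planes['xy'], x, y)
            + bilinear_sample(planes['xz'], x, z)
            + bilinear_sample(planes['yz'], y, z))
    return mlp(feat)

# Toy decoder MLP: one hidden layer mapping C features -> (sigma, r, g, b).
rng = np.random.default_rng(0)
C, HIDDEN = 32, 64
W1 = rng.normal(size=(C, HIDDEN)) * 0.1
W2 = rng.normal(size=(HIDDEN, 4)) * 0.1

def tiny_mlp(f):
    h = np.maximum(W1.T @ f, 0)        # ReLU hidden layer
    out = W2.T @ h
    sigma = np.maximum(out[0], 0)      # non-negative density
    rgb = 1 / (1 + np.exp(-out[1:]))   # colors squashed into (0, 1)
    return sigma, rgb

# Random triplane features standing in for the transformer's output.
planes = {k: rng.normal(size=(64, 64, C)) for k in ('xy', 'xz', 'yz')}
sigma, rgb = query_triplane(planes, (0.5, 0.25, 0.75), tiny_mlp)
```

Summing projected plane features keeps queries cheap: three bilinear lookups and one small MLP per point, instead of storing and indexing a full 3D feature grid.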
LRM offers practical benefits:
- Scalability and efficiency from its streamlined transformer design
- Lower computational cost than alternative 3D representations such as dense voxel grids
- A triplane representation that stays geometrically close to the input image
- Efficient training and adaptability to diverse multi-view image datasets
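The volume rendering mentioned above follows the standard NeRF compositing rule: each sample along a camera ray contributes its color weighted by its opacity and the transmittance accumulated in front of it. A minimal NumPy sketch with toy values rather than real LRM outputs:

```python
import numpy as np

def composite_ray(sigmas, rgbs, deltas):
    """Standard NeRF volume rendering along one ray.
    alpha_i = 1 - exp(-sigma_i * delta_i); each sample's weight is its
    alpha times the transmittance T_i = prod_{j<i} (1 - alpha_j)."""
    alphas = 1.0 - np.exp(-sigmas * deltas)
    T = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = T * alphas
    color = (weights[:, None] * rgbs).sum(axis=0)
    return color, weights

# Toy ray: 8 samples at uniform spacing through a density bump.
sigmas = np.array([0.0, 0.1, 0.5, 2.0, 5.0, 1.0, 0.2, 0.0])
rgbs = np.tile(np.array([0.8, 0.3, 0.1]), (8, 1))
deltas = np.full(8, 0.125)
color, weights = composite_ray(sigmas, rgbs, deltas)
```

Repeating this compositing for every pixel's ray produces a rendered image from any viewpoint, which is what lets LRM train directly against multi-view image supervision.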
LRM is the first large-scale 3D reconstruction model of its kind, with over 500 million learnable parameters and training data comprising approximately one million 3D shapes and videos spanning diverse categories. Experimental results demonstrate high-fidelity 3D shape generation from both real-world photographs and images produced by generative models.
Future Directions
The research team plans to further enhance LRM by increasing its size and training data using a simpler transformer-based design with minimal regularization. They also aim to extend it to multi-modal generative models in 3D.
Practical Applications and Value
LRM and similar image-to-3D reconstruction models have the potential to automate certain tasks currently performed by 3D designers, driving growth and accessibility in the creative sector.
If you’re looking to evolve your company with AI and stay competitive, consider leveraging the capabilities of LRM. AI can redefine your work processes, automate customer interactions, and drive business outcomes. Connect with us at hello@itinai.com for AI KPI management advice and explore our AI solutions at itinai.com.
Spotlight on a Practical AI Solution:
Discover the AI Sales Bot from itinai.com/aisalesbot, designed to automate customer engagement and manage interactions across all stages of the customer journey. Explore how AI can redefine your sales processes and customer engagement.