Nvidia's new text-to-3D model shows how fast generative AI is advancing

An image of a 3D origami dog on a skateboard generated by Nvidia LATTE3D

(Image credit: Nvidia)

Nvidia's on quite a roll. After revealing its Blackwell superchip, which is designed to train more powerful AI models like GPT, Claude and Gemini, it has now teased a text-to-3D AI tool of its own.

The graphics card giant closed GTC week by showcasing LATTE3D, a text-to-3D generative AI model that it described as a "virtual 3D printer". It can turn text prompts into 3D representations of objects and animals within a second.

Nvidia says the 3D shapes generated by LATTE3D can be "easily served up in virtual environments for developing video games, ad campaigns, design projects or virtual training grounds for robotics". We've seen text-to-3D tools before, and comments online suggest some people aren't too impressed with the quality of LATTE3D's results. But the new model represents a big advance, especially in terms of speed.

Nvidia says it can produce 3D shapes almost instantly when running inference on a single GPU, such as the NVIDIA RTX A6000 used for the research demo. This means a creator starting a design from scratch, or combing through a 3D asset library, could use LATTE3D to generate detailed objects as quickly as the ideas occur to them.

The model generates several 3D shape options based on each text prompt. The desired objects can be optimised for higher quality and then exported to graphics software applications or platforms like NVIDIA Omniverse, which enables Universal Scene Description (OpenUSD)-based 3D workflows and applications.

“A year ago, it took an hour for AI models to generate 3D visuals of this quality, and the current state of the art is now around 10 to 12 seconds,” said Sanja Fidler, Nvidia's vice president of AI research. “We can now produce results an order of magnitude faster, putting near-real-time text-to-3D generation within reach for creators across industries.”

3D dogs generated by the Nvidia LATTE3D AI model (Image credit: Nvidia)

LATTE3D was developed by Nvidia's Toronto-based AI lab and was trained on text prompts generated with ChatGPT, to improve the model's ability to handle the various phrases a user might come up with to describe a particular 3D object. While the researchers trained LATTE3D on two specific datasets, animals and everyday objects, the same architecture could be used to train the AI on other types of data. It remains a research project only and is not available for public use.

AI creator Bilawal Sidhu wrote on X: "This leap is huge. DreamFusion circa 2022 was slow and low quality, but kicked off this generative 3D revolution. Efforts like ATT3D (Amortized Text-to-3D Object Synthesis) chased speed at the cost of quality. Now LATTE3D is high quality and processes in less than a second! Meaning you can quickly iterate and populate a 3D world using text or image to 3D."

Along with video, 3D is the next frontier for generative AI. Also this week, Adobe announced that it has integrated its first Firefly AI-driven tools into Substance 3D.


Joe is a regular freelance journalist and editor at Creative Bloq. He writes news, features and buying guides and keeps track of the best equipment and software for creatives, from video editing programs to monitors and accessories. A veteran news writer and photographer, he now works as a project manager at the London and Buenos Aires-based design, production and branding agency Hermana Creatives. There he manages a team of designers, photographers and video editors who specialise in producing visual content and design assets for the hospitality sector. He also dances Argentine tango.