From a 2D Photo to a 3D Model with Nvidia GANverse3D in Nvidia Omniverse Create

Remember when, a few months ago, I talked about a ground-breaking, yet-to-be-released application from Nvidia Research called GANverse3D, capable of reconstructing textured 3D models from 2D photos?

Not only that, GANverse3D was also supposed to be capable of animating the 3D model at the click of a button.

Well, the wait is over. Anyone lucky enough to own an RTX graphics card can now try this ground-breaking 3D deep learning application.
All you need to do is install Nvidia Omniverse.

But let me temper your expectations. This demo of GANverse3D only works with photos of cars, and only with the right type of car. You will see what I mean. Also, the 3D models don’t look amazing yet. But don’t worry, I am sure that after a few updates they will be looking great!

If you haven’t heard of Nvidia Omniverse before, you need to know about it. Nvidia is not just selling hardware anymore. It is now making money from the metaverse. 

I bet you have heard that Mark Zuckerberg, of Facebook, is trying to create the metaverse too. While Facebook is trying to create an alternative reality in the digital world, or shall we call it a cloud cuckoo land, Nvidia is doing something entirely different!

Nvidia’s metaverse is a virtual collaboration world for engineers, with applications in the real world. 

What Omniverse does is visualize and simulate a virtual world, with realistic physics and realistic materials, using 3D assets, textures, and point clouds created in a number of 3D modeling applications. And, by the way, Blender is included too!

There’s so much more to Nvidia Omniverse, including the ability to simulate robots in a virtual factory, test driverless cars in a virtual world, and much more! If you want to know more about what Nvidia Omniverse is, then check out my last video on this.

But GANverse3D is not the only AI app in Nvidia Omniverse. There’s more! For example, Audio2Face!

And what is Audio2Face? Audio2Face is an application inside Nvidia Omniverse that allows you to animate any humanoid mesh from a sample of audio. This is similar to Wav2Lip, which I also talked about a few months ago, but in this case, instead of animating 2D photos, it can animate a mesh in 3D. 

When you try GANverse3D, you can see the different parts of the car that it has reconstructed.

Nvidia has created a special game scene that lets us test-drive our GANverse3D cars. Let’s give it a go!

You can definitely see that it is a work in progress. My car was floating! But anyway, it was fun to defy the laws of physics!

What you are seeing in GANverse3D is the result of at least three different research papers from Nvidia. And let’s not forget all the research papers that came before those.

Nvidia didn’t get here alone! 

The most recent paper is DatasetGAN, from 2021. In it, Nvidia researchers describe a technique that uses a Generative Adversarial Network (GAN) to generate the massive dataset of car images required to train GANverse3D, the neural network introduced in the GANverse3D paper.

These images are semantically segmented down to the pixel level, and DatasetGAN is also able to control the camera viewpoints for each image it generates. 

This allows DatasetGAN to generate images with the exact viewpoints required for training GANverse3D, the neural network that reconstructs a 3D model using only 2D supervision, thanks to DIB-R, the differentiable renderer described in another Nvidia paper.
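To give a feel for what "2D supervision" means, here is a minimal sketch in Python with NumPy. It is not Nvidia's code: in the real system a differentiable renderer such as DIB-R would turn the predicted mesh into an image and a silhouette, and gradients would flow back through it. Here the "rendered" arrays are plain stand-ins, and the function names and loss weights are my own assumptions, chosen only to show that the training signal comes from comparing 2D images rather than 3D ground truth.

```python
import numpy as np


def silhouette_iou_loss(rendered_mask, target_mask, eps=1e-6):
    """Return 1 - soft IoU between two masks with values in [0, 1]."""
    intersection = np.sum(rendered_mask * target_mask)
    union = np.sum(rendered_mask + target_mask - rendered_mask * target_mask)
    return float(1.0 - intersection / (union + eps))


def color_l1_loss(rendered_rgb, target_rgb):
    """Mean absolute difference between two RGB images."""
    return float(np.mean(np.abs(rendered_rgb - target_rgb)))


def supervision_2d_loss(rendered_rgb, rendered_mask, target_rgb, target_mask,
                        w_color=1.0, w_iou=1.0):
    """Combine both 2D terms; the weights are illustrative assumptions."""
    return (w_color * color_l1_loss(rendered_rgb, target_rgb)
            + w_iou * silhouette_iou_loss(rendered_mask, target_mask))


# Stand-in data: a 4x4 target photo whose object covers a 2x2 square.
target_mask = np.zeros((4, 4))
target_mask[1:3, 1:3] = 1.0
target_rgb = np.zeros((4, 4, 3))
target_rgb[1:3, 1:3] = 0.8

# A "render" that matches the photo, and one that misses it completely.
good_loss = supervision_2d_loss(target_rgb, target_mask, target_rgb, target_mask)

bad_mask = np.zeros((4, 4))
bad_mask[0, 0] = 1.0
bad_rgb = np.zeros((4, 4, 3))
bad_rgb[0, 0] = 0.8
bad_loss = supervision_2d_loss(bad_rgb, bad_mask, target_rgb, target_mask)

print(good_loss, bad_loss)
```

The better the render matches the photo from the same viewpoint, the lower the loss, and that is all the network needs in order to improve its 3D guess.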

GANverse3D is also capable of distinguishing the functional parts of the car, for example, the headlights and the wheels, and this is why GANverse3D can animate a car!
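If you are wondering what pixel-level part labels look like in practice, here is a tiny sketch in Python with NumPy. The class ids and the helper name are hypothetical, made up for illustration, and the real labels used by Nvidia may be organized differently. The idea is simply that every pixel carries a part id, so each part can be pulled out as its own mask.

```python
import numpy as np

# Hypothetical class ids, chosen only for this example.
PART_IDS = {"background": 0, "body": 1, "wheel": 2, "headlight": 3}


def split_into_part_masks(label_map, part_ids=PART_IDS):
    """Turn a per-pixel label map into one boolean mask per part."""
    return {name: label_map == idx for name, idx in part_ids.items()}


# A toy 4x6 "car": body pixels, one headlight pixel, two wheel pixels.
label_map = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 3, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 2, 0, 0, 2, 0],
])

masks = split_into_part_masks(label_map)
pixel_counts = {name: int(mask.sum()) for name, mask in masks.items()}
print(pixel_counts)
```

Once the wheels and headlights are separate masks, it becomes possible to treat them differently from the body, for example spinning the wheels or switching the lights on.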


Research Papers

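Yuxuan Zhang, Wenzheng Chen, Huan Ling, Jun Gao, Yinan Zhang, Antonio Torralba, Sanja Fidler
Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering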

Wenzheng Chen, Jun Gao, Huan Ling, Edward J. Smith, Jaakko Lehtinen, Alec Jacobson, Sanja Fidler
Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer

Shunyu Yao, Tzu Ming Hsu, Jun-Yan Zhu, Jiajun Wu, Antonio Torralba, Bill Freeman, and Josh Tenenbaum
3D-aware scene manipulation via inverse graphics. In Advances in Neural Information Processing Systems,

P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, and P. Perona. Caltech-UCSD Birds 200.
Technical Report CNS-TR-2010–001, California Institute of Technology, 2010.

Yu Xiang, Roozbeh Mottaghi, and Silvio Savarese. Beyond pascal: A benchmark for 3d object detection in the
wild. In IEEE Winter Conference on Applications of Computer Vision (WACV), 2014.

Related Videos

PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization (ICCV 2019)