Oct 29, 2023 | Biraj Sarmah

AI Image Generator Explained

Hi everyone, I’m Biraj, from Audio AI, and I research how to construct intelligent robot agents. Today, I’ll be explaining how AI image generators work.

First question: how do large text-guided diffusion models work?
Large text-guided diffusion models are trained on a very large dataset of images and captions from the internet. When you search for different words on Google, you can find all kinds of images for a particular phrase. Text-to-image diffusion models are trained on millions and millions of these image and text pairs from the internet, and they’re trained to reconstruct and generate images that are similar to the images on the internet, which is possible because they’re trained on such massive amounts of data.
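
The talk does not go into the mechanics, so as a rough illustration only: in one common formulation of diffusion training, a clean image is mixed with Gaussian noise and the model is trained to predict the noise that was added. The sketch below shows that noising step and the loss on a toy four-pixel "image". All function names and schedule values here are assumptions chosen for illustration, and the zero-noise "prediction" is a placeholder for a real neural network that would also take the caption as input.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly (values are illustrative)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta) up to step t: how much signal is left."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(pixels, t, alpha_bars, rng):
    """Jump straight to step t by mixing the clean pixels with Gaussian noise."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [signal * p + noise_scale * n for p, n in zip(pixels, noise)]
    return noisy, noise

def denoising_loss(predicted_noise, true_noise):
    """Mean squared error between the predicted and the actual noise."""
    squared = [(p - n) ** 2 for p, n in zip(predicted_noise, true_noise)]
    return sum(squared) / len(squared)

rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
image = [0.1, 0.5, 0.9, 0.3]  # toy "image" of four pixel values
noisy, noise = add_noise(image, t=500, alpha_bars=alpha_bars, rng=rng)

# Placeholder model: always predicts zero noise. A trained network would
# take (noisy, t, caption) and try to output something close to `noise`.
loss = denoising_loss([0.0] * len(noise), noise)
print(f"signal kept at t=500: {alpha_bars[500]:.4f}, loss: {loss:.4f}")
```

Training repeats this over many image and caption pairs and random values of `t`, adjusting the network so the loss goes down; generation then runs the process in reverse, starting from pure noise.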

You can imagine that when you give this model a new text prompt, because it’s been trained to recapture the images that it’s seen before, it is able to generate an image that matches the prompt.

Next question: what makes diffusion models so impactful?
I think the main reason we’re seeing all these incredible advances with diffusion models is that we are training them with so much data and so much computational power.

The idea of diffusion models itself is a very old one, but what really makes them work now is that we have so many computational resources and so much computational power that we’re able to train them at very large scales.

Can you sell AI-generated images?
Yeah, I guess right now there isn’t really copyright or anything like that. I think people have definitely been selling some of these AI artworks. I heard that Midjourney was selling artwork for millions of dollars each month. So yeah, I think you can definitely sell them.

I think this raises the question of whether we should add some licensing or something like that, because these models are trained on many, many images that are copyrighted. But right now, it seems like you can easily sell them.

How are models prevented from showing harmful or offensive images, especially considering the use of open source programs?
Yeah, this is a great question. In practice, these models are trained on hundreds of millions of images from the internet, and there are all types of harmful things on the internet. One way to prevent these models from generating harmful images would be to very carefully select and curate the images that these models are trained on. But currently, this is a huge problem with a lot of these models, which is why I think a lot of these companies are not willing to release their very large pre-trained models for public use right now.

Are there potential applications of generative AI outside of image or text? Can you apply it, for example, to robotics or control?
Yeah, I think this is a really great question. Actually, I’ve been working on a project where we have been directly applying these generative models exactly to the use case of control. We have this paper called Planning with Diffusion, where you can use a similar generative process to represent different trajectories of actions.

So you can train these models on huge amounts of robotic actions or manipulation demonstrations in video, and then you can imagine giving the robot a new instruction like “pick up this mug,” and it can use the large generative model that’s trained on millions of other videos of robots picking up objects to carry out that instruction.
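
To make the idea of generating a whole trajectory a little more concrete, here is a toy sketch, not the method from the paper. It starts from a random sequence of one-dimensional waypoints and repeatedly refines the whole sequence at once, while pinning the known start and goal. The function names, the step size, and especially `toy_denoiser` are hypothetical: a real system would use a network trained on demonstrations, and real diffusion sampling also follows a noise schedule that this simple refinement loop leaves out.

```python
import random

def toy_denoiser(trajectory, start, goal):
    """Hypothetical stand-in for a trained network.

    It ignores the noisy values and returns evenly spaced waypoints from
    start to goal, with the same length as the input trajectory.
    """
    last = len(trajectory) - 1
    return [start + (goal - start) * i / last for i in range(len(trajectory))]

def plan(start, goal, horizon=8, num_steps=50, seed=0):
    """Refine a random trajectory step by step into a plan from start to goal."""
    rng = random.Random(seed)
    # Begin with pure noise: one random position per time step.
    trajectory = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
    for _ in range(num_steps):
        guess = toy_denoiser(trajectory, start, goal)
        # Move every waypoint 20% of the way toward the denoiser's guess.
        trajectory = [x + 0.2 * (g - x) for x, g in zip(trajectory, guess)]
        # Conditioning: the first and last waypoints are known, so pin them.
        trajectory[0], trajectory[-1] = start, goal
    return trajectory

path = plan(start=0.0, goal=7.0)
print([round(x, 3) for x in path])
```

The point of the sketch is the shape of the computation: the plan is treated like an image, generated all at once and cleaned up over many small steps, rather than chosen one action at a time.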

So I think there are many different applications for these generative models, for example in robotics, or you can imagine other domains such as protein synthesis or molecular design.

Well, it looks like we’re out of questions now. Thank you for listening and asking all these wonderful questions. Hopefully you learned something about AI image generation. Thank you.


