Build an AI Image Generator App With Tkinter

Let’s take a brief look at diffusion models, which are used to create images from text. A diffusion model uses a Markov chain to gradually add noise to the data, then learns to reverse the process, generating the desired data sample from noise. Notable diffusion models include Stability AI’s Stable Diffusion, Google’s Imagen, and OpenAI’s DALL·E 2.
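To make the forward (noising) half of that Markov chain concrete, here is a minimal NumPy sketch of a variance-preserving noising step. The linear beta schedule and step count are illustrative assumptions, not the values Stable Diffusion actually uses:

```python
import numpy as np

rng = np.random.default_rng(0)

# A unit-variance "data sample" stands in for an image here
x = rng.standard_normal(1000)

# Illustrative linear noise schedule (not Stable Diffusion's actual schedule)
betas = np.linspace(1e-4, 0.02, 100)

# Forward diffusion: each step mixes the signal with fresh Gaussian noise.
# The sqrt(1 - beta) / sqrt(beta) weights keep the overall variance near 1.
for beta in betas:
    noise = rng.standard_normal(x.shape)
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise

# After many steps, x is approximately pure Gaussian noise
print(round(float(x.std()), 2))
```

The reverse process, which the model learns, runs this chain backwards: starting from pure noise, it removes a little noise at each step until a clean sample emerges.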

Here, we’ll look at how to create an app using Tkinter and Stable Diffusion from Stability AI. While generative models such as flow-based models, variational autoencoders (VAEs), and generative adversarial networks (GANs) are well known, diffusion models are a more recent approach that often produces higher-quality results.

Building the App

We’ll walk through creating a basic GUI application using CustomTkinter. The goal is to set up an application window that can serve as the foundation for the “Text to Image” converter. Below is a snippet of Python code that creates a basic GUI application:

Python
from customtkinter import *

app = CTk()
# Setting the width and height of the app 
app.geometry("500x500") 
# Theme of the app 
set_appearance_mode('light')
# Title 
app.title('Text to Image')
app.mainloop()

Output:

Output of the above code

Setting Up the Inline Text Bar and Clickable Button

Building on our basic GUI, we can improve the user interface with interactive elements like buttons and input fields. The code below upgrades the earlier version by adding an entry widget for text input and a button to trigger the action.

Python
# Setting the inline text bar for entering the prompt by user
prompt = CTkEntry(app, height=30, width=350, font=("Arial", 15), text_color="black", fg_color="white") 
prompt.place(x=10, y=10)

# Generate button for image generation
trigger = CTkButton(app, height=30, width=120, font=("Arial", 15), text_color="white", fg_color="#3DB7E4") 
trigger.configure(text="Generate") 
trigger.place(x=370, y=10) 

Output

Inline text bar and generate button

Adding a Label to Display Images

Let’s add a label widget to our GUI application to enhance it even more. It will act as a placeholder for images that are created based on the text input.

Python
# Adding a label for displaying the image 
lmain = CTkLabel(app, height=400, width=400, bg_color='#f1ac85', corner_radius=15)
lmain.place(x=50, y=70)

Output

Label for holding and displaying the image (the orange color is shown for illustration only; it will later be replaced by the generated image)

Setting Up the Access Token on Hugging Face

Hugging Face gives us access to a Stable Diffusion model. Hugging Face is a platform and community that offers free, open-source machine learning models and datasets. Most models are free to use, though a paid tier is also available. If you don’t already have one, create a Hugging Face account. After registering, you will need an “Access token”.

Steps to set up the access token on Hugging Face:

  • Click on your profile icon
  • Click on “Settings”
  • Navigate to “Access Tokens” on the left tab
  • Either generate a new token or use an existing one
  • Copy the token
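Rather than pasting the token directly into your script, one common approach is to read it from an environment variable. The variable name `HF_TOKEN` below is an assumption for illustration; any name works:

```python
import os

# Read the Hugging Face token from an environment variable instead of
# hard-coding it in the source file (HF_TOKEN is an assumed variable name)
auth_token = os.environ.get("HF_TOKEN", "")

if not auth_token:
    print("Warning: HF_TOKEN is not set; the model download will fail without it.")
```

Set the variable in your shell before running the app, e.g. `export HF_TOKEN=<your token>` on Linux/macOS, so the token never ends up committed to source control.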

Image Generation

We import a pre-trained Stable Diffusion model, identified by modelid. The model is loaded into the pipe variable using the provided authentication token, and the encoder part of the model’s VAE is deleted to save memory, since only the decoder is needed for generation.

Using the Stable Diffusion model, we define a generate() function that reads the text prompt from the entry field, generates an image, saves it, and updates image_label to show the result. The generate() function is called when the “Generate” button is clicked.

Python
from customtkinter import *  
from PIL import ImageTk
import torch
from diffusers import StableDiffusionPipeline 

auth_token = "Your Auth Token from Huggingface"

app = CTk() 
app.geometry("500x500")
set_appearance_mode('light') 

prompt = CTkEntry(app, height=30, width=350, font=("Arial", 15), text_color="black", fg_color="white") 
prompt.place(x=10, y=10)

image_label = CTkLabel(app, height=400, width=400, bg_color='white', corner_radius=15)
image_label.place(x=50, y=70)

modelid = "CompVis/stable-diffusion-v1-4"
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(modelid, use_auth_token=auth_token)
pipe = pipe.to(device)  # Move the pipeline to the GPU if one is available

# The VAE encoder is only needed for image-to-image tasks; drop it to save memory
del pipe.vae.encoder

def generate(): 
    image = pipe(prompt.get(), guidance_scale=8.5).images[0]
    image.save('generatedimage.png')
    img = ImageTk.PhotoImage(image)
    image_label.configure(image=img)
    image_label.image = img  # Keep a reference so the image is not garbage-collected

trigger = CTkButton(app, height=30, width=120, font=("Arial", 15), text_color="white", fg_color="#3DB7E4", command=generate) 
trigger.configure(text="Generate") 
trigger.place(x=370, y=10) 

app.title('Text to Image') 
app.mainloop() 

The guidance_scale parameter is set to 8.5. The guidance scale controls the strength of the guidance signal sent to the model so that the generated image closely matches the text prompt. Higher values produce images that follow the prompt more closely but may be less varied. Once you run the code, give it some time to download the Stable Diffusion model weights. Depending on your internet speed and system specifications, this could take a few minutes.

Note: If you don’t have access to a GPU, the model will run on the CPU. This works, but depending on your CPU, image generation may take considerably longer.

Output

Images Generated by the model

Conclusion

In conclusion, this article offers a walkthrough for developing an application that integrates AI image generation using Python’s Tkinter framework. It covers essential topics including configuring the development environment, implementing the user interface, and integrating an AI model to produce images.