ETRI’s KOALA Model: A Game-Changer for Ultra-Fast AI Image Generation

ETRI has developed a new generative AI model that creates high-resolution images from a text description in about two seconds. The model could significantly affect industries such as creative services, content production, and education.

In Short:

  1. ETRI has developed an ultra-fast generative visual intelligence model, known as the ‘KOALA’ model, that can create images from text inputs in just 2 seconds.
  2. ETRI has considerably reduced the model’s size and increased its generation speed.
  3. The ‘KOALA’ model is significantly faster than other models on the market.

What is the ‘KOALA’ Model?

The ‘KOALA’ model is a fast text-to-image model developed by the Electronics and Telecommunications Research Institute (ETRI). It uses a technique called knowledge distillation to compress the U-Net of the Stable Diffusion XL (SDXL) model. The KOALA-700M model can generate a 1024×1024 image in less than 1.5 seconds on an NVIDIA 4090 GPU, which is more than 2x faster than SDXL. This model offers a balance between speed and performance, making it a cost-effective alternative to SDXL in resource-constrained environments.
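
As a rough illustration of what knowledge distillation means, the sketch below combines a task loss with two terms that pull a smaller student model toward a larger teacher: one on the final outputs and one on intermediate features. It is a generic, pure-Python sketch and not ETRI's training code; the function names, the choice of mean squared error, and the weights `w_out` and `w_feat` are assumptions made for illustration.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    target: Sequence[float],
    teacher_out: Sequence[float],
    student_out: Sequence[float],
    teacher_feats: Sequence[Sequence[float]],
    student_feats: Sequence[Sequence[float]],
    w_out: float = 1.0,   # hypothetical weight for output-level distillation
    w_feat: float = 1.0,  # hypothetical weight for feature-level distillation
) -> float:
    """Task loss plus output-level and feature-level distillation terms."""
    task = mse(student_out, target)         # student vs. training target
    out_kd = mse(student_out, teacher_out)  # student vs. teacher output
    # One feature term per matched pair of intermediate layers.
    feat_kd = sum(mse(s, t) for s, t in zip(student_feats, teacher_feats))
    return task + w_out * out_kd + w_feat * feat_kd
```

The loss is zero only when the student reproduces the target, the teacher's output, and the teacher's intermediate features, which is what lets a smaller network inherit behavior from a larger one.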

What is ETRI?

The Electronics and Telecommunications Research Institute (ETRI) is a South Korean government-funded research institution. Established in 1976, ETRI has been at the forefront of technological excellence for over 40 years. It is one of the leading research institutes in the wireless communication domain with more than 2,500 patents filed. ETRI strives to advance science by formulating innovative ideas, developing new techniques, and training professional individuals in the area of information telecommunications.

How Does the KOALA Model Work?

The ‘KOALA’ model, developed by ETRI, is a breakthrough in AI image generation. It uses knowledge distillation to cut the parameter count from the 2.56 billion of the publicly available model (SDXL) to 700 million. Fewer parameters mean fewer computations, which lowers processing time and operating costs. The model shrinks to roughly a third of its original size, and high-resolution image generation becomes twice as fast as before and five times faster than DALL-E 3. This efficiency makes KOALA a game-changer in the field.
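
The size reduction can be checked with simple arithmetic. The snippet below uses the two parameter counts quoted above; treating 700 million as an exact figure is an assumption, since it is the rounded number from the model's name.

```python
sdxl_params = 2.56e9   # parameter count quoted above for the original model
koala_params = 700e6   # nominal KOALA-700M count (rounded, so approximate)

remaining = koala_params / sdxl_params  # fraction of parameters kept
reduction = 1 - remaining               # fraction of parameters removed

print(f"kept: {remaining:.1%}, removed: {reduction:.1%}")
# prints "kept: 27.3%, removed: 72.7%"
```

With these nominal figures, the compressed model keeps a little over a quarter of the original parameters, so it ends up at somewhat under a third of the original size.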

Download and Install the KOALA Model

Here are the steps to download and install the KOALA model:

Step 1: Go to the Hugging Face page for the KOALA model.

Step 2: Choose between the two compressed U-Net variants, KOALA-1B and KOALA-700M.

Step 3: Download the model files for the variant you chose.

Step 4: After downloading, install the model in your environment.

Please note that the exact usage might vary based on the specific environment and requirements.
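
For readers who prefer a scripted download, the sketch below uses `snapshot_download` from the `huggingface_hub` package to fetch all files of one variant. The repository IDs `etri-vilab/koala-700m` and `etri-vilab/koala-1b` are assumptions that this article does not state, so confirm them on the Hugging Face page before running. The `repo_for` and `download_koala` helpers are hypothetical names introduced for illustration.

```python
# Assumed repository IDs; verify them on the Hugging Face model page.
KOALA_REPOS = {
    "700m": "etri-vilab/koala-700m",
    "1b": "etri-vilab/koala-1b",
}


def repo_for(variant: str) -> str:
    """Map a variant name such as '700M' to its (assumed) repository ID."""
    key = variant.strip().lower()
    if key not in KOALA_REPOS:
        raise ValueError(f"unknown KOALA variant: {variant!r}")
    return KOALA_REPOS[key]


def download_koala(variant: str = "700m") -> str:
    """Download every file of the chosen variant and return the local path."""
    # Imported lazily so repo_for() works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_for(variant))
```

Calling `download_koala("700m")` requires `pip install huggingface_hub` and network access; it returns the local directory that holds the downloaded files.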

How to Use the KOALA Model?

Here are the steps to use the KOALA model:

Step 1: Visit the Hugging Face model page to access the KOALA model.

Step 2: Choose between KOALA-1B and KOALA-700M.

Step 3: Prepare the text prompt that you want to convert into an image.

Step 4: Use the Diffusers library to run the model with the prepared prompt.

Step 5: The model generates a 1024×1024 image; on an NVIDIA 4090 GPU, KOALA-700M does this in less than 1.5 seconds.
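
The steps above can be sketched with the Diffusers library as follows. This is a minimal sketch under stated assumptions, not an official recipe: it assumes the model loads through `StableDiffusionXLPipeline` (because KOALA is a compressed SDXL), that the repository ID is `etri-vilab/koala-700m`, and that a CUDA GPU is available. The `generate` helper and the example prompt are hypothetical.

```python
from typing import Optional


def generate(
    prompt: str,
    model_id: str = "etri-vilab/koala-700m",  # assumed repository ID
    negative_prompt: Optional[str] = None,
    device: str = "cuda",
):
    """Generate one 1024x1024 image from a text prompt and return it."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")

    # Heavy dependencies are imported only when an image is actually requested.
    import torch
    from diffusers import StableDiffusionXLPipeline

    # Load the weights in half precision to reduce GPU memory use.
    pipe = StableDiffusionXLPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    )
    pipe = pipe.to(device)

    result = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        height=1024,
        width=1024,
    )
    return result.images[0]  # a PIL image
```

A typical call would be `generate("a watercolor painting of a koala reading a book").save("koala.png")`, which needs `torch`, `diffusers`, and their dependencies installed.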

Benefits of The KOALA Model

Here are the key benefits of the KOALA model:

  1. Efficient U-Net Architecture: KOALA models use a simplified U-Net architecture that reduces the model size by up to 54% (KOALA-1B) and 69% (KOALA-700M) compared to its predecessor, Stable Diffusion XL (SDXL).
  2. Fast Image Generation: KOALA-700M can generate a 1024×1024 image in less than 1.5 seconds on an NVIDIA 4090 GPU, which is more than 2x faster than SDXL.
  3. Cost-Effective: The model’s reduced size and increased generation speed enable its operation on low-cost GPUs with only 8GB of memory.
  4. High Quality: Despite its efficiency, the model maintains a high quality of image generation.
  5. Accessible: ETRI has released the KOALA models on Hugging Face, making them easily accessible for use.

What is the LAION-aesthetics-V2 6+ Dataset?

The LAION-Aesthetics V2 6+ dataset is a subset of the LAION-5B dataset selected for high visual quality. It contains images that an aesthetics prediction model scored 6 or higher. Such models are trained to predict the rating people give when asked “How much do you like this image on a scale from 1 to 10?”. The subset is used in AI research and applications, including the training of models like KOALA.
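
Conceptually, building such a subset is a threshold filter over predicted scores. The sketch below shows the idea on a few made-up records; the field names, captions, scores, and the `filter_by_aesthetics` helper are all hypothetical, and the real dataset was produced with LAION's own tooling. The threshold is a parameter, set here to 6.0 to match the "6+" in the dataset name.

```python
from typing import Dict, Iterable, List


def filter_by_aesthetics(records: Iterable[Dict], min_score: float) -> List[Dict]:
    """Keep only records whose predicted aesthetic score meets the threshold."""
    return [r for r in records if r["aesthetic_score"] >= min_score]


# Made-up sample records for illustration only.
samples = [
    {"caption": "sunset over a lake", "aesthetic_score": 6.8},
    {"caption": "blurry receipt", "aesthetic_score": 3.1},
    {"caption": "mountain at dawn", "aesthetic_score": 6.0},
]

kept = filter_by_aesthetics(samples, min_score=6.0)
print([r["caption"] for r in kept])
# prints ['sunset over a lake', 'mountain at dawn']
```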

What is the ‘Ko-LLaVA’ Model?

The ‘Ko-LLaVA’ model, developed by the Electronics and Telecommunications Research Institute (ETRI), is a conversational visual-language model. It adds visual intelligence to conversational AI like ChatGPT. The model can retrieve images or videos and answer questions about them in Korean. It was developed in an international joint research project between ETRI and the University of Wisconsin-Madison. The model builds on the open-source LLaVA (Large Language and Vision Assistant), which has image interpretation capabilities at the level of GPT-4.

KOALA Model Vs Ko-LLaVA Model

KOALA: Generates images from text descriptions (text-to-image).

Ko-LLaVA: Adds visual intelligence to conversational AI (conversational visual language).

Ko-LLaVA Model Capabilities

  1. Text Generation: It can generate text in Korean.
  2. Image and Video Retrieval: The model can retrieve images or videos based on the input.
  3. Question-Answering: Ko-LLaVA can perform question-answering in Korean about images or videos.
  4. Image Description: The model can provide descriptions for images.
  5. Video Description: In addition to images, it can also provide descriptions for videos.
  6. Integration with Other Models: Ko-LLaVA can be used in conjunction with other models like KOALA.

Practical Applications of the KOALA Model

The KOALA model developed by ETRI has several practical applications:

  1. Art and Design: The model can be used to generate artwork and assist in design processes. By inputting descriptive text, artists and designers can use the model to quickly generate visual concepts.
  2. Educational and Creative Tools: The model can be integrated into educational or creative tools to provide visual aids based on textual descriptions. This can enhance learning experiences and stimulate creativity.
  3. Research on Generative Models: The KOALA model serves as a valuable resource for researchers studying generative models. Its efficiency and performance can provide insights into the development of future models.
  4. Safe Deployment of Models: The KOALA model can be used to explore safe deployment strategies for models that have the potential to generate harmful content.
  5. Understanding Limitations and Biases: The model can be used to probe and understand the limitations and biases of generative models.

System Requirements For the KOALA Model

The KOALA model developed by ETRI is designed to run efficiently on GPUs. Specifically, the KOALA-700M model can generate a 1024×1024 image in less than 1.5 seconds on an NVIDIA 4090 GPU. However, the exact system requirements may vary depending on your specific setup and requirements.
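
One quick way to sanity-check whether a GPU is in the right range is to estimate how much memory the weights alone occupy: parameter count times bytes per parameter. The sketch below assumes 16-bit (2-byte) weights and the nominal parameter count from the model's name. It ignores activations, the text encoders, and the image decoder, so real memory use is higher. The `weight_memory_gib` helper is hypothetical.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights in GiB (weights only)."""
    return num_params * bytes_per_param / 1024**3


# Nominal KOALA-700M parameter count stored as 16-bit floats.
print(f"{weight_memory_gib(700e6):.2f} GiB")
# prints "1.30 GiB"
```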

Limitations of the KOALA Model

The KOALA model, despite its impressive capabilities, does have some limitations:

  1. Text Rendering: The models face challenges in rendering long, legible text within images. This means that the model might struggle to generate images that contain a lot of text or require specific text to be legible.
  2. Complex Prompts: KOALA sometimes struggles with complex prompts involving multiple attributes. This means that the model might not always accurately generate images when given complex or multi-faceted prompts.
  3. Dataset Dependencies: The current limitations are partially attributed to the characteristics of the training dataset (LAION-aesthetics-V2 6+). This means that the model’s performance and capabilities might be influenced by the specific characteristics and limitations of the dataset it was trained on.


ETRI’s ultra-fast generative visual intelligence model is a significant step forward in the field of AI. By combining generative AI and visual intelligence, this model can create images from text inputs in just 2 seconds, making it a game-changer in the industry.


Is the KOALA Model free?

Yes, the KOALA model developed by ETRI is available for free on the Hugging Face platform.

How has ETRI improved the generation of high-resolution images?

ETRI has improved the generation of high-resolution images by shrinking the model to roughly a third of its original size, making it twice as fast as before and five times faster than DALL-E 3.

What is the future of the ‘KOALA’ model?

The ‘KOALA’ model is expected to propel the field of ultra-fast generative visual intelligence, making it a game-changer in the industry.