Kubernetes Horizontal Pod Autoscaler

Horizontal Pod Autoscaler (HPA) is a controller that scales pod-based workloads, such as Deployments and StatefulSets, up and down based on your application load. It does this by changing the number of pod replicas once preconfigured thresholds are crossed. For many of the applications we deploy, scaling depends mostly on a single metric: CPU usage. To use HPA, we define the minimum and maximum number of pods we want for a particular application, along with the target CPU utilization percentage. Once HPA is enabled for an application, Kubernetes automatically monitors it and scales the pods up and down within the minimum and maximum limits we have defined.

For example, consider an application like Airbnb that runs in Kubernetes and sees a surge of users whenever there is an offer on hotel or flight bookings. If the application is not prepared to handle this traffic, users may experience slow response times or even downtime. With HPA, you can specify a target CPU usage percentage, a minimum and a maximum number of running pods, and other parameters. When CPU utilization rises above the specified level, Kubernetes automatically increases the number of pods to handle the extra traffic.

YAML code for HPA:

apiVersion: autoscaling/v2
# this specifies the Kubernetes API version
kind: HorizontalPodAutoscaler
# this specifies the Kubernetes object type, like HPA or VPA
metadata:
  # object names must be lowercase and may not contain underscores
  name: name-of-app
spec:
  scaleTargetRef:
    # Deployments belong to the apps/v1 API group
    apiVersion: apps/v1
    kind: Deployment
    name: name-of-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 40
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 40

The 'averageUtilization' field under each metric specifies the target utilization percentage that the HPA aims for when scaling the deployment. In this case, it is set to 40% for both CPU and memory, measured as a percentage of the resource requests of the pods' containers. This YAML code automatically scales the specified deployment between a minimum of 1 and a maximum of 10 replicas. If the average utilization across the pods exceeds 40% for either metric, the HPA scales up the deployment to maintain performance, and it scales back down only when both metrics allow it.
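To make the scaling decision concrete, here is a minimal Python sketch of the replica calculation described in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with a default tolerance of 0.1 around the target and the largest result taken when several metrics are configured. The function names and the sample utilization values are hypothetical and used only for illustration. The sketch also leaves out parts of the real controller such as stabilization windows, pod readiness, and missing metrics.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the replica count for a single metric (illustrative only)."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


def desired_for_metrics(current_replicas, metrics, **limits):
    """With several metrics, the largest proposed replica count wins.

    `metrics` is a list of (current_utilization, target_utilization) pairs.
    """
    return max(
        desired_replicas(current_replicas, current, target, **limits)
        for current, target in metrics
    )


if __name__ == "__main__":
    # Assumed sample values: 2 replicas, CPU at 80% and memory at 30%,
    # both with a 40% target as in the manifest above.
    print(desired_for_metrics(2, [(80, 40), (30, 40)]))
```

With these sample values, CPU alone asks for 4 replicas while memory would allow staying at 2, so the sketch returns 4. This mirrors why a deployment scales down only once every configured metric is below its target.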

Kubernetes – Autoscaling

Pre-requisite: Kubernetes

Before Kubernetes, we either pushed our code to physical servers in a data center and managed the resources each server needed to run the application smoothly, or we deployed the code to virtual machines (VMs). VMs bring their own problems: the hardware and software components they require are costly, and they carry some security risks. This is where Kubernetes comes in. It is an open-source platform for deploying, managing, and maintaining groups of containers, and it acts as a tool that manages multiple Docker environments together. Kubernetes (K8s) overcomes many of the problems we faced with VMs.
