Container and cluster scaling on Kubernetes using Horizontal Pod Autoscaler and Cluster Autoscaler on AWS EKS


Kubernetes is built for scaling, but that doesn’t mean we get autoscaling out of the box – we must activate some additional components and configure them. In this article I want to show you a working example of simple service scaling on Kubernetes using the Horizontal Pod Autoscaler. After reaching maximum cluster capacity, we will automatically add more workers to the cluster using the Cluster Autoscaler. Everything will run on Amazon EKS – managed Kubernetes on AWS.


As always, the full source code can be found on my GitHub:

Simple deployment creating load

To test autoscaling we need a simple service that generates enough load for scaling to be triggered. For that task we are going to use the following Python code:

import flask
import uuid
import hashlib

app = flask.Flask(__name__)

@app.route("/")
def hello():
    # Burn CPU: compute 800k SHA-224 hashes of random UUID fragments
    for i in range(0, 800000):
        hashlib.sha224(uuid.uuid4().hex.upper()[0:6].encode()).hexdigest()
    return "Done"

app.run(host="0.0.0.0", threaded=True)

That code is really simple – it creates a web endpoint using the Flask framework. A GET request on “/” triggers a long loop which calculates a lot of SHA-224 hashes of random UUID fragments, which should take about 5–10 seconds and consume a lot of CPU during that time.

To avoid building our own Docker image (and having to create a Docker registry, which would complicate the example), we can use a simple trick: take the jazzdd/alpine-flask:python3 image, which is available on Docker Hub and has Python and Flask installed. We can then write our Python file in the “command” section and run it – see the full yaml below:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  namespace: default
  name: microservice
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: microservice
    spec:
      containers:
      - name: microservice
        image: jazzdd/alpine-flask:python3
        command: ["sh"]
        args: ["-c", "printf \"import flask\\nimport uuid\\nimport hashlib\\napp = flask.Flask(__name__)\\n@app.route(\\\"/\\\")\\ndef hello():\\n    for i in range(0,800000):\\n     hashlib.sha224(uuid.uuid4().hex.upper()[0:6].encode()).hexdigest()\\n    return \\\"Done\\\"\\napp.run(host=\\\"0.0.0.0\\\", threaded=True)\" > app.py && python3 app.py"]
        ports:
        - name: http-port
          containerPort: 5000
        resources:
          requests:
            cpu: 200m

The important thing here is the resources request block, which says that on a 1-CPU-core machine (which we are going to use in this article) we can fit 5 microservice PODs (200m x 5 = 1000m = 1 CPU); reaching that number means that particular node is at full capacity. Reaching cluster capacity is the trigger for the Cluster Autoscaler.
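The capacity arithmetic above can be sketched like this (an illustrative helper, not part of the deployment; real nodes reserve some CPU for system daemons, so allocatable capacity is usually a bit below the full core count):

```python
# Hypothetical helper: how many pods with a given CPU request fit on a node.
def pods_per_node(allocatable_millicores: int, request_millicores: int) -> int:
    """Integer number of pods whose CPU requests fit into the node's allocatable CPU."""
    return allocatable_millicores // request_millicores

# A 1-core node (1000m) with 200m requests fits 5 pods.
print(pods_per_node(1000, 200))  # 5
# With ~60m reserved for system daemons, only 4 pods fit.
print(pods_per_node(940, 200))   # 4
```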

Horizontal Pod Autoscaler

Horizontal scaling in the Kubernetes world means adding more pods to a particular deployment. To achieve that, the Horizontal Pod Autoscaler can be used, but we need to note one important thing: in the newest Kubernetes versions, metrics-server needs to be installed to use HPA – Heapster is deprecated and shouldn’t be used anymore.

To test that on minikube you just need to type:

minikube addons enable metrics-server

To deploy metrics-server on EKS you need to clone the metrics-server repository and then issue the command:

kubectl apply -f metrics-server/deploy/1.8+/

To activate HPA for our microservice we need to apply following yaml file:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  namespace: default
  name: microservice
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: microservice
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50

targetAverageUtilization: 50 means that Kubernetes will try to keep the average CPU usage per POD at half of what our microservice requests (50% * 200m = 100m). E.g. when we have a single POD using 200m of CPU, Kubernetes will create a new POD so the 200m can be divided across 2 PODs (100m and 100m).
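Under the hood, the HPA follows roughly this formula from the Kubernetes documentation (a simplified sketch that ignores tolerances and readiness handling):

```python
import math

# desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float) -> int:
    return math.ceil(current_replicas * current_utilization / target_utilization)

# One pod at 100% of its request with a 50% target -> scale to 2 pods.
print(desired_replicas(1, 100, 50))  # 2
# Two pods still at 100% each -> scale to 4 pods.
print(desired_replicas(2, 100, 50))  # 4
```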

AWS EKS and Cluster Autoscaler

Disclaimer – why use Cluster Autoscaler instead of ASG scaling trigger based on CPU?

From Cluster Autoscaler FAQ:

“Cluster Autoscaler makes sure that all pods in the cluster have a place to run, no matter if there is any CPU load or not. Moreover, it tries to ensure that there are no unneeded nodes in the cluster.

CPU-usage-based (or any metric-based) cluster/node group autoscalers don’t care about pods when scaling up and down. As a result, they may add a node that will not have any pods, or remove a node that has some system-critical pods on it, like kube-dns. Usage of these autoscalers with Kubernetes is discouraged.”

For the EKS deployment we are going to use a modified version of the EKS setup from my previous article.

Cluster Autoscaler is a component which will be installed on the EKS cluster. It watches the Kubernetes API and makes requests to the AWS API to scale the worker nodes’ ASG. This means the node on which Cluster Autoscaler resides needs a proper IAM policy which will allow containers on that node to perform operations on the ASG.

resource "aws_iam_role_policy" "for-autoscaler" {
  name = "for-autoscaler"
  policy = <<POLICY
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:DescribeTags",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup"
            ],
            "Resource": "*"
        }
    ]
}
POLICY
  role = "${}"
}

That policy should probably be limited in the Resource section, but we will leave * to simplify the example.

We put some additional tags on the ASG so the Cluster Autoscaler can use them for auto-discovery:

  tag {
    key                 = ""
    value               = "whatever"
    propagate_at_launch = false
  }

  tag {
    key                 = "<cluster-name>" # substitute your EKS cluster name
    value               = "owned"
    propagate_at_launch = true
  }
We must also set up security groups to allow port 443 communication from the cluster control plane to the worker nodes, as mentioned in this issue:

For the Cluster Autoscaler we will slightly modify the example deployment from here:

We need to modify the tags which the Cluster Autoscaler will use to discover the ASG to be scaled:
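The relevant part of the Cluster Autoscaler container spec typically looks like the sketch below (based on the tag-based auto-discovery setup; `<cluster-name>` is a placeholder and the tag keys must match those put on the ASG above):

```yaml
command:
  - ./cluster-autoscaler
  - --v=4
  - --stderrthreshold=info
  - --cloud-provider=aws
  - --skip-nodes-with-local-storage=false
  - --node-group-auto-discovery=,<cluster-name>
```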


Add env with region in which we are operating:

   - name: "AWS_REGION"
     value: eu-west-1

Change the certificate mount to what is required for EKS:

   - name: ssl-certs
     mountPath: /etc/ssl/certs/ca-bundle.crt
     readOnly: true

The Cluster Autoscaler is now ready to use and will scale the worker nodes up or down via the ASG, between 1 and 10 instances.


The last step is to create a load balancer attached to the microservice and test autoscaling by making some requests to generate load.

apiVersion: v1
kind: Service
metadata:
  name: microservice
  annotations: "nlb"
  labels:
    app: microservice
spec:
  ports:
  - port: 80
    targetPort: http-port
  selector:
    app: microservice
  type: LoadBalancer

You can simply open the load balancer endpoint at the root path in a web browser and hit F5 a few times to generate load, or use a script like:

while true; do sleep 2; timeout 1 curl http://<elb_id>; done


After that you should see in Kubernetes that the HPA scaled your containers up and reached maximum node capacity. After a while the Cluster Autoscaler should scale the AWS ASG and add a new worker node so the HPA can complete the POD scaling.
