Setting up Kubernetes cluster autoscaling
This guide takes you through setting up your Brightbox Kubernetes cluster to automatically add new capacity when required.
Requirements
You’ll need to have deployed a Kubernetes cluster with our Terraform configuration.
Configure the cluster to autoscale
Our Terraform manifests automatically deploy the Brightbox Kubernetes cluster autoscaler, so all we need to do is configure the maximum number of new workers that can be built. Just set the Terraform variable worker_max in your local.auto.tfvars:
worker_max = 5
And run terraform apply, which will configure the cluster with the new setting:
$ terraform apply
...
Apply complete! Resources: 1 added, 1 changed, 1 destroyed.
And we’re ready to scale up an application!
Connect to your Kubernetes cluster
If you’re using our Terraform configuration, the master output is the public IP address of the Kubernetes master server. You can SSH into this server using your SSH key:
$ ssh ubuntu@$(terraform output master)
Welcome to Ubuntu 18.04.4 LTS (GNU/Linux 4.15.0-111-generic x86_64)
Last login: Thu Jul 16 09:48:47 2020 from 86.31.15.94
And use kubectl on the master to inspect the cluster. In this particular example cluster, I have 1 master node and just 1 worker node:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
srv-4dbz0 Ready master 16m v1.18.5
srv-hrgv1 Ready worker 14m v1.18.5
Deploy an example application
We’ll deploy a basic hello world application to play with here. First create a namespace for it:
$ kubectl create namespace example
Then create the Deployment, specifically requesting 512MiB of RAM per pod:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-world
  namespace: example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-world
  template:
    metadata:
      labels:
        app: hello-world
    spec:
      containers:
        - name: app
          image: brightbox/rails-hello-world
          ports:
            - name: web
              containerPort: 3000
              protocol: TCP
          resources:
            requests:
              memory: "512Mi"
And apply it:
$ kubectl apply -f hello-world-deployment.yaml
deployment.apps/hello-world configured
Inspect the deployment
So now we can see the hello-world deployment on this cluster is set to just 1 replica, which means there is only 1 pod running the application:
$ kubectl -n example get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
hello-world 1/1 1 1 3m
$ kubectl -n example get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
hello-world-5f48c6bb68-drw4n 1/1 Running 0 38s 192.168.146.71 srv-hrgv1 <none> <none>
Scale up the deployment
Now let’s scale up the deployment. First, let’s increase the replicas to 2:
$ kubectl -n example scale --replicas=2 deployment/hello-world
deployment.apps/hello-world scaled
$ kubectl -n example get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
hello-world-5f48c6bb68-drw4n 1/1 Running 0 8m48s 192.168.146.71 srv-hrgv1 <none> <none>
hello-world-5f48c6bb68-fxtnc 1/1 Running 0 3s 192.168.146.72 srv-hrgv1 <none> <none>
Here we can see that an additional pod was created for the application, but since there was room on the existing worker (srv-hrgv1), it was simply scheduled onto it by the Kubernetes scheduler.
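If you want to check how much headroom a worker has, kubectl describe node reports the node’s allocatable resources and the CPU and memory currently requested under its Allocated resources section (srv-hrgv1 being the existing worker here):
$ kubectl describe node srv-hrgv1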
Let’s scale it up a bit further, beyond the existing capacity of the cluster:
$ kubectl -n example scale --replicas=4 deployment/hello-world
deployment.apps/hello-world scaled
$ kubectl -n example get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
hello-world-5f48c6bb68-drw4n 1/1 Running 0 10m 192.168.146.71 srv-hrgv1 <none> <none>
hello-world-5f48c6bb68-fxtnc 1/1 Running 0 2m10s 192.168.146.72 srv-hrgv1 <none> <none>
hello-world-5f48c6bb68-c2mjx 0/1 Pending 0 4s <none> <none> <none> <none>
hello-world-5f48c6bb68-qrv2h 0/1 Pending 0 4s <none> <none> <none> <none>
Now we can see two additional pods are listed in Pending state, as there is no capacity available for them.
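If you want to see exactly why the scheduler couldn’t place them, describing one of the Pending pods will show its FailedScheduling events (the pod name here is just taken from the output above):
$ kubectl -n example describe pod hello-world-5f48c6bb68-c2mjx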
This is where the autoscaler kicks in. If we give it a couple of minutes and check the pod status again, we’ll see they’ve been allocated to a brand new node (srv-e4isv):
$ kubectl -n example get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
hello-world-5f48c6bb68-drw4n 1/1 Running 0 15m 192.168.146.71 srv-hrgv1 <none> <none>
hello-world-5f48c6bb68-fxtnc 1/1 Running 0 7m4s 192.168.146.72 srv-hrgv1 <none> <none>
hello-world-5f48c6bb68-c2mjx 1/1 Running 0 4m58s 192.168.102.1 srv-e4isv <none> <none>
hello-world-5f48c6bb68-qrv2h 1/1 Running 0 4m58s 192.168.102.2 srv-e4isv <none> <none>
And indeed there is a new node in the cluster:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
srv-4dbz0 Ready master 5h41m v1.18.5
srv-e4isv Ready <none> 115m v1.18.5
srv-hrgv1 Ready worker 5h39m v1.18.5
We can have a peek behind the curtain by inspecting the logs of the cluster autoscaler deployment over in the kube-system namespace. It detected that some pods needed scheduling, figured out how many new workers were needed and built them.
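The exact name of the autoscaler Deployment can vary, so check kubectl -n kube-system get deployments first; assuming it’s called cluster-autoscaler, something like this will show the recent log lines:
$ kubectl -n kube-system logs deployment/cluster-autoscaler --tail=20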
I0716 13:19:14.581283 1 scale_up.go:322] Pod example/hello-world-5f48c6bb68-qrv2h is unschedulable
I0716 13:19:14.581486 1 scale_up.go:322] Pod example/hello-world-5f48c6bb68-c2mjx is unschedulable
I0716 13:19:14.581852 1 scale_up.go:452] Best option to resize: grp-v28su
I0716 13:19:14.581964 1 scale_up.go:456] Estimated 1 nodes needed in grp-v28su
I0716 13:19:14.582073 1 scale_up.go:569] Final scale-up plan: [{grp-v28su 1->2 (max: 5)}]
I0716 13:19:14.582156 1 scale_up.go:658] Scale-up: setting group grp-v28su size to 2
Scale the deployment down
And when we’re done, we can scale the application back down to 1 pod. After a few minutes, if they’re no longer needed, the autoscaler will remove the servers it built:
$ kubectl -n example scale --replicas=1 deployment/hello-world
deployment.apps/hello-world scaled
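If you want to watch it happen, you can keep an eye on the node list while the autoscaler removes the spare worker:
$ kubectl get nodes -w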
Again, the logs from the autoscaler:
I0716 15:30:59.513137 1 scale_down.go:790] srv-e4isv was unneeded for 10m9.233169593s
I0716 15:30:59.513258 1 scale_down.go:1053] Scale-down: removing empty node srv-e4isv
I0716 15:30:59.519680 1 delete.go:103] Successfully added ToBeDeletedTaint on node srv-e4isv
Terraform vs. autoscaler
So here a new worker server was built by the autoscaler, outside of Terraform, which means Terraform knows nothing about it: it’s entirely managed by the autoscaler. Conversely, the autoscaler knows which workers are managed by Terraform and won’t ever touch those, to avoid stepping on any toes.
You should consider the worker nodes built by Terraform as static, and the worker nodes built by the autoscaler as dynamic.
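One quick way to see the split is to compare what Terraform knows about with what the cluster reports. The grep pattern below is only illustrative, as the exact resource names depend on the manifests:
$ terraform state list | grep -i worker
$ kubectl get nodes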
Fully automated scaling
This is convenient, but you don’t want to have to scale deployments up and down manually whenever your system’s load changes. To fully automate scaling, something needs to monitor the load and adjust the replicas for your deployments automatically. That something is the Horizontal Pod Autoscaler, and it’s a topic for a future post.
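As a taste of what that looks like, you could ask Kubernetes to scale the example deployment on CPU usage with kubectl autoscale. This needs metrics-server running in the cluster, and the Deployment above would also need a CPU request on its container, so treat the command and thresholds below purely as a sketch:
$ kubectl -n example autoscale deployment hello-world --min=1 --max=10 --cpu-percent=50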