We’ve just launched our Kubernetes autoscaler, which creates and destroys Kubernetes cluster nodes in response to demand, and I thought I’d go into more detail about how it all works.
Perhaps surprisingly, Kubernetes doesn’t include mechanisms in its core code to create and destroy cloud resources. Instead, it delegates that task to components running on the cluster (like our Cloud Controller Manager, which provides node information and controls access to the Load Balancers).
The component that adjusts the size of a Kubernetes cluster is the Cluster Autoscaler, and we have created a version you can run to automatically resize the cluster so that all pods have a place to run and there are no unneeded nodes.
The autoscaler looks for Server Groups named after the cluster-name option passed to the autoscaler (--cluster-name). A group named with a suffix of the cluster-name (e.g. k8s-worker.k8s-test.cluster.local) is a candidate to be a scaling group. The autoscaler then checks the group's description to see if it is a pair of integers separated by a colon (e.g. 1:4). If it finds those numbers they become the minimum and maximum size for that group, and the autoscaler will attempt to scale the group between those sizes.
The server type, the image used and the target zone for new servers are determined dynamically from the group's existing members. If these differ between members, or there are no existing servers, the autoscaler will log an error and will not scale that group.
A group named precisely the same as the cluster-name (e.g. k8s-test.cluster.local) is considered to be the default cluster group, and all autoscaled servers created are placed within it as well as the scaling group.
The Brightbox Autoscaler only supports auto-discovery mode using this pattern. The node-group-auto-discovery and nodes options are effectively ignored.
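To make that pattern concrete, here's a rough sketch of the kind of check involved. It's an illustration only (the function name and details are simplified for this post), not the actual driver code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// scalingLimits applies the discovery rules described above: a group whose
// name ends with the cluster name (but isn't the cluster name itself) is a
// scaling candidate, and its description must hold "min:max" integers.
func scalingLimits(groupName, description, clusterName string) (min, max int, ok bool) {
	if groupName == clusterName {
		return 0, 0, false // the default cluster group is not scaled directly
	}
	if !strings.HasSuffix(groupName, clusterName) {
		return 0, 0, false // unrelated group
	}
	parts := strings.Split(description, ":")
	if len(parts) != 2 {
		return 0, 0, false // description isn't in "min:max" form
	}
	minVal, errMin := strconv.Atoi(strings.TrimSpace(parts[0]))
	maxVal, errMax := strconv.Atoi(strings.TrimSpace(parts[1]))
	if errMin != nil || errMax != nil {
		return 0, 0, false
	}
	return minVal, maxVal, true
}

func main() {
	min, max, ok := scalingLimits("k8s-worker.k8s-test.cluster.local", "1:4", "k8s-test.cluster.local")
	fmt.Println(min, max, ok) // 1 4 true
}
```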
If you are using the Kubernetes Cluster Builder, set the worker_max values to scale the worker group and the storage_max values to scale the storage group.
The Cluster Builder will ensure the group name and description are updated with the correct values in the format that autoscaler can recognise.
The Cluster Autoscaler works by listening to the Kubernetes scheduler and picking up on any unschedulable pod events. If it spots any, it runs a ghost scheduling process against an expanded list of nodes and, if that would allow the pods to schedule, asks the cloud provider to create the extra nodes. Similarly, if it detects idle resources, it calculates a proposed smaller set of nodes and asks the scheduler whether it could still reschedule the pods. If it can, the node is drained and deleted on the cloud provider.
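Here's a greatly simplified sketch of that loop. The types and helper functions are hypothetical stand-ins; the real Cluster Autoscaler reuses the Kubernetes scheduler code to run these simulations properly:

```go
package main

type Pod struct{ Name string }

type Node struct {
	Name string
	Pods []Pod
}

type Group struct {
	Min, Max int
	Nodes    []*Node
}

// wouldFitWithExtraNode stands in for the "ghost" scheduling run against a
// node list with one more node added to the group.
func wouldFitWithExtraNode(p Pod, g *Group) bool { return true }

// podsFitElsewhere stands in for asking the scheduler whether the node's pods
// could be rescheduled onto the remaining nodes.
func podsFitElsewhere(n *Node, g *Group) bool { return len(n.Pods) == 0 }

func reconcile(pending []Pod, groups []*Group) {
	// Scale up: for each unschedulable pod, find a group below its maximum
	// where an extra node would let the pod schedule, then ask the cloud
	// provider to create that server.
	for _, p := range pending {
		for _, g := range groups {
			if len(g.Nodes) < g.Max && wouldFitWithExtraNode(p, g) {
				g.Nodes = append(g.Nodes, &Node{Name: "autoscaled-node"}) // cloud create
				break
			}
		}
	}
	// Scale down: drain and delete idle nodes whose pods fit elsewhere, but
	// never shrink a group below its minimum.
	for _, g := range groups {
		remaining := len(g.Nodes)
		kept := make([]*Node, 0, remaining)
		for _, n := range g.Nodes {
			if remaining > g.Min && podsFitElsewhere(n, g) {
				remaining-- // drain the node, then delete the server on the cloud provider
				continue
			}
			kept = append(kept, n)
		}
		g.Nodes = kept
	}
}

func main() {
	workers := &Group{Min: 1, Max: 4}
	reconcile([]Pod{{Name: "web"}}, []*Group{workers})
}
```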
The main challenge was updating our Kubernetes Cluster manifests so they create workers that bootstrap themselves and connect to the cluster. Once that was in place, we implemented a cloud driver in the Cluster Autoscaler and everything worked nicely.
However, this is a compiled-in driver, which means we have to maintain a fork of the autoscaler repository. And because the autoscaler uses the Kubernetes scheduler code, that means maintaining a version for each current release of Kubernetes.
Obviously we’d rather avoid having to do this work, which is why we’re excited to see the autoscaler merge a Cluster API driver into the code. The Cluster API project allows each cloud provider to expose its resources as native objects on Kubernetes clusters, letting you deploy cloud resources as you would normal Kubernetes pods, deployments or services.
We’ll be keeping an eye on this to see how the Cluster API matures.
If you want to play with Kubernetes, you can sign up for Brightbox real quick and use your £50 free credit to give it a go.
If instead you want us to run Kubernetes for you, or anything else for that matter, we offer hands-on support and managed services too. Drop us a line.