When I was doing the first round of AKS cluster upgrades at my current client, I noticed we were running a lot of pods with only 1 replica. I always try to lift my clients to the next level by leveraging Cloud Native technologies as much as possible. I’m therefore starting a project to always run applications with multiple replicas.

However, running multiple replicas is not the only necessary improvement here. Even though a pod is running with multiple replicas, that does not mean that Kubernetes will always keep them alive. When you do an AKS cluster upgrade, nodes are drained one by one and the pods are moved to a node with the higher k8s version. Technically, when draining a node, Kubernetes could kill both of the pods at the same time if they are running on the same node.

To prevent this we use pod disruption budgets

Helm

We’re using a Helm chart to deploy with ArgoCD. I’m adding this to the templates directory:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pyramid-backend-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      tier: backend

I’m selecing all the pods that have the label “tier: backend”.

To verify that I’m targeting the right pods, I run:

kubectl get pods -l tier=backend

Currently my deployment has only two replicas:

(ins)$ k get deploy
NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
pyramid-deployment-backend   1/1     1            1           297d

So when I check my PDB it will show no allowed disruptions.

(ins)$ k get pdb
NAME                  MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
pyramid-backend-pdb   1               N/A               0                     13m

This is good, this is the expected behaviour. Kubernetes will now prevent me from draining the node because there are no allowed disruptions.

Then I scale up the deployment:

(ins)$ k scale deploy --replicas=2 pyramid-deployment-backend
deployment.apps/pyramid-deployment-backend scaled

And then I get an allowed disruption of 1:

(ins)$ k get pdb
NAME                  MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
pyramid-backend-pdb   1               N/A               1                     15m

With this configuration, Kubernetes will never kill all of the pods simultaneously in case a node needs to be drained. It will make sure that one of the pods keeps running when it is rescheduling pods for an upgrade.

https://kubernetes.io/docs/tasks/run-application/configure-pdb/

202311271411