Using kind to test our Kubernetes Cassandra Operator

Author: Sebastien Bonnet | Posted on: September 25, 2020



How would you test a Kubernetes operator? Would you write unit tests for all its features? Would you want to test its behaviour by isolating it from the components it interacts with? This might be tedious and require a fair amount of mocking, but achieving good test coverage is possible. Would this be enough though? How would you then know that your operator actually works once deployed? How would you know that it is watching the correct resources and making the right API calls?

Wouldn’t you be missing a great deal if you didn’t try to test it in a Kubernetes cluster?

That’s the conclusion we arrived at when deciding on the testing strategy for our Cassandra Kubernetes operator. We figured we would never be truly confident that the cluster deployment or our reconciliation process would work in a real-life integrated environment unless we ran the tests against a Kubernetes cluster. We kept unit tests to validate process flows and complex algorithms, but mostly relied on a comprehensive end-to-end test suite to validate the features of our operator inside a Kubernetes cluster, only stubbing out Cassandra nodes whenever possible to reduce bootstrapping time.

Why kind?

This is all well-intended, but how do you run a Kubernetes cluster in an open source CI environment? You could set up a self-hosted, public-facing Kubernetes service, but this would require a significant amount of expertise, and you would need to ensure it remains up to date and secure. You could opt for a hosted solution instead, but this would mean paying for an additional subscription that you may not be willing to commit to. Besides, in both cases, you would be paying for resources that you only use during CI. The rest of the time your cluster would sit idle but still cost you money.

Wouldn’t it be great to bootstrap a Kubernetes cluster just for the duration of the CI build, provided it’s fast and lightweight?

This prompted us to investigate our options for a lightweight Kubernetes cluster. Our initial requirements were:

  • it should support multiple nodes
  • it should be fast to spin up, ideally in under 5 minutes
  • it should use a small amount of resources such that it can run on standard Travis VMs (or laptops)
  • it should work as close to a real Kubernetes cluster as possible

When we started writing our Cassandra operator a few years ago, minikube was not going to work for us, as we wanted to simulate a Cassandra cluster with multiple nodes spread across different availability zones, and k3s was in its infancy. DinD seemed the most mature offering and was close enough to the 5-minute startup time we were targeting. Shortly after we investigated DinD, the project was retired in favour of kind, so we adopted kind instead.

kind is both fast and easy to use: it takes 4 minutes to create a 4-node cluster in Travis and just under 1 minute on my laptop. kind can also be configured to support multiple availability zones, provide PersistentVolumes and pull images from a local registry. We’ll show you how to get this done.

Spreading nodes across multiple availability zones

Nodes in a Cassandra cluster should be placed on different racks to make them highly available and fault-tolerant. A usual setup consists of distributing the racks across 3 availability zones to ensure the cluster can survive a zone outage with RF=3 and CL=QUORUM.
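
To see why three zones are enough, recall that a quorum is a majority of the replicas, that is floor(RF / 2) + 1. The short sketch below works through that arithmetic. The function names are our own, and it assumes exactly one replica per zone, which is a simplification for illustration rather than a statement about any particular cluster.

```shell
# Quorum size for a given replication factor: floor(RF / 2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Succeeds if QUORUM is still reachable after losing a number of zones,
# assuming exactly one replica per zone (an illustrative simplification).
survives_zone_loss() {
  rf=$1
  zones_lost=$2
  [ $(( rf - zones_lost )) -ge "$(quorum "${rf}")" ]
}

quorum 3                                             # prints 2
survives_zone_loss 3 1 && echo "RF=3 survives losing 1 zone"
survives_zone_loss 3 2 || echo "RF=3 does not survive losing 2 zones"
```

With RF=3 a quorum needs 2 replicas, so losing one of three zones still leaves enough replicas to serve QUORUM reads and writes.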

As we are only focusing on the functions of the operator (not of Cassandra) in our testing, setting up a 4-node cluster with 2 racks spread across 2 zones is enough to satisfy the rack awareness feature.

Let’s start by creating a 4-node cluster. All we need to do is provide kind with the docker image for the Kubernetes version we want and the types of nodes via a config file. kind then bootstraps the control plane and joins the nodes. In this setup, we used kind v0.5.1 as we needed to support an old version of Kubernetes.

tmpDir=$(mktemp -d)
cat << EOF > ${tmpDir}/kind-cluster.yml
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
  - role: control-plane
  - role: worker
  - role: worker
  - role: worker
  - role: worker
EOF

kind create cluster --loglevel=info --config ${tmpDir}/kind-cluster.yml --image kindest/node:v1.12.10@sha256:e43003c6714cc5a9ba7cf1137df3a3b52ada5c3f2c77f8c94a4d73c82b64f6f3
export KUBECONFIG="$(kind get kubeconfig-path --name="kind")"
kubectl config rename-context "kubernetes-admin@kind" "kind"

Once we have a cluster up and running, the nodes need to be labelled so we can pretend they belong to different availability zones. This script sample assigns the kind-worker node to eu-west-1a. We’ll then need to repeat the same process for the other nodes, ensuring they are assigned to the correct zone so we end up with 2 nodes in each zone.

node=kind-worker
zone=a
kubectl --context kind label --overwrite node ${node} failure-domain.beta.kubernetes.io/zone=eu-west-1${zone}
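
Repeating this for every worker can be scripted with a small loop. The following is only a sketch under our own assumptions: the zone_for_node and label_workers helpers and the KUBECTL override (handy for a dry run with KUBECTL=echo) are hypothetical names, and the node-to-zone split simply mirrors the 2-nodes-per-zone layout described above.

```shell
# Hypothetical helper: zone suffix for each kind worker node (2 nodes per zone).
zone_for_node() {
  case "$1" in
    kind-worker|kind-worker2)  echo a ;;
    kind-worker3|kind-worker4) echo b ;;
    *) echo "unknown node: $1" >&2; return 1 ;;
  esac
}

# Label every worker with its zone.
# KUBECTL is an assumed override; it defaults to the real kubectl.
label_workers() {
  for node in kind-worker kind-worker2 kind-worker3 kind-worker4; do
    zone=$(zone_for_node "${node}") || return 1
    ${KUBECTL:-kubectl} --context kind label --overwrite node "${node}" \
      "failure-domain.beta.kubernetes.io/zone=eu-west-1${zone}"
  done
}
```

Calling label_workers applies the labels, while KUBECTL=echo label_workers only prints the arguments that would be passed to kubectl.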

When all the nodes are done we should end up with something similar to this:

$ kubectl --context kind get node --show-labels
NAME                 STATUS   ROLES    AGE   VERSION    LABELS
kind-control-plane   Ready    master   93s   v1.12.10   kubernetes.io/hostname=kind-control-plane,node-role.kubernetes.io/master=
kind-worker          Ready    <none>   71s   v1.12.10   failure-domain.beta.kubernetes.io/zone=eu-west-1a,kubernetes.io/hostname=kind-worker
kind-worker2         Ready    <none>   71s   v1.12.10   failure-domain.beta.kubernetes.io/zone=eu-west-1a,kubernetes.io/hostname=kind-worker2
kind-worker3         Ready    <none>   71s   v1.12.10   failure-domain.beta.kubernetes.io/zone=eu-west-1b,kubernetes.io/hostname=kind-worker3
kind-worker4         Ready    <none>   71s   v1.12.10   failure-domain.beta.kubernetes.io/zone=eu-west-1b,kubernetes.io/hostname=kind-worker4

We then define a zone-specific storage class to better illustrate zone awareness. We’ll need to repeat the same process for zone b. If you look closely you will notice that we’ve defined a no-op provisioner. This is because we will use a specific volume provisioner to create volumes; more on that in the next section.

zone=a
cat <<EOF | kubectl --context kind apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-zone-${zone}
provisioner: kubernetes.io/no-provisioner
reclaimPolicy: Delete
EOF
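
Since the manifest differs only by the zone suffix, both storage classes can be generated from one template. This is a minimal sketch rather than the script we actually ran; render_storage_class and render_all_storage_classes are hypothetical helper names.

```shell
# Print the StorageClass manifest for one zone suffix (e.g. "a" or "b").
render_storage_class() {
  zone=$1
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-zone-${zone}
provisioner: kubernetes.io/no-provisioner
reclaimPolicy: Delete
EOF
}

# Render both manifests as one multi-document YAML stream.
render_all_storage_classes() {
  for zone in a b; do
    echo "---"
    render_storage_class "${zone}"
  done
}
```

The stream can then be applied in one go with render_all_storage_classes | kubectl --context kind apply -f -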

Using local persistent volumes

When building our Cassandra operator in a CI service for open source projects such as Travis, we’d rather not use remote volumes such as Amazon Elastic Block Store (EBS), but instead switch to local persistent volumes for speed and ease of setup. Local persistent volumes allow users to access local storage through the standard Persistent Volume Claim (PVC) interface, making it completely transparent to the application, which can continue to use PVCs in the same way as with remote volumes. Here we’ll use the static provisioner from Kubernetes SIGs (https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner). It works by scanning a directory on the host for mount points and then creating a corresponding Persistent Volume (PV) for each one it finds.

The first thing we need to do is set up the volume mount points on each node. The following script sample sets up one node with a mount point for zone a, so we’ll need to repeat this process for the other 3 nodes. Take note of the PV path as it will be used in the provisioner configmap when defining how the storage class maps to the local volume.

node=kind-worker
zone=a
pv_path="/mnt/pv-zone-${zone}"
docker exec ${node} mkdir -p /data/vol ${pv_path}/bindmount
docker exec ${node} mount -o bind /data/vol ${pv_path}/bindmount
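
The same two commands can be run against all four workers from a list of node:zone pairs. The sketch below makes a few assumptions of our own: the setup_pv_mount and setup_all_pv_mounts helpers and the DOCKER override (for example DOCKER=echo for a dry run) are hypothetical names, and the pairs match the zone labels applied earlier.

```shell
# Create and bind-mount the PV directory on one node for one zone.
# DOCKER is an assumed override; it defaults to the real docker CLI.
setup_pv_mount() {
  node=$1
  zone=$2
  pv_path="/mnt/pv-zone-${zone}"
  ${DOCKER:-docker} exec "${node}" mkdir -p /data/vol "${pv_path}/bindmount"
  ${DOCKER:-docker} exec "${node}" mount -o bind /data/vol "${pv_path}/bindmount"
}

# Iterate over node:zone pairs, splitting each pair on the colon.
setup_all_pv_mounts() {
  for pair in kind-worker:a kind-worker2:a kind-worker3:b kind-worker4:b; do
    setup_pv_mount "${pair%%:*}" "${pair##*:}" || return 1
  done
}
```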

As kind enables RBAC by default, we need to create a service account and grant it the permissions the provisioner needs to create and bind PVs.

kubectl --context kind create ns local-volume-provisioning
cat <<EOF | kubectl --context kind apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: local-storage-admin
  namespace: local-volume-provisioning
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: local-storage-provisioner-pv-binding
  namespace: local-volume-provisioning
subjects:
- kind: ServiceAccount
  name: local-storage-admin
  namespace: local-volume-provisioning
roleRef:
  kind: ClusterRole
  name: system:persistent-volume-provisioner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: local-storage-provisioner-node-clusterrole
  namespace: local-volume-provisioning
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: local-storage-provisioner-node-binding
  namespace: local-volume-provisioning
subjects:
- kind: ServiceAccount
  name: local-storage-admin
  namespace: local-volume-provisioning
roleRef:
  kind: ClusterRole
  name: local-storage-provisioner-node-clusterrole
  apiGroup: rbac.authorization.k8s.io
EOF

The last step is to deploy a local provisioner on every node. Recall that we have 2 availability zones, each with its own storage class, so here we deploy a local provisioner that will run on nodes in zone eu-west-1a and manage volumes for the storage class of that zone. We’ll need to repeat the same process for nodes in zone eu-west-1b.

zone=a
cat <<EOF | kubectl --context kind apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-provisioner-config-${zone}
  namespace: local-volume-provisioning
data:
  storageClassMap: |
    standard-zone-${zone}:
       hostDir: /mnt/pv-zone-${zone}
       mountDir: /mnt/pv-zone-${zone}
       volumeMode: Filesystem
       fsType: ext4
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: local-volume-provisioner-${zone}
  namespace: local-volume-provisioning
spec:
  selector:
    matchLabels:
      app: local-volume-provisioner-${zone}
  template:
    metadata:
      labels:
        app: local-volume-provisioner-${zone}
    spec:
      serviceAccountName: local-storage-admin
      containers:
        - image: "quay.io/external_storage/local-volume-provisioner:v2.3.3"
          name: provisioner
          securityContext:
            privileged: true
          env:
          - name: MY_NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
          - name: MY_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          volumeMounts:
            - mountPath: /etc/provisioner/config
              name: provisioner-config
              readOnly: true
            - mountPath: /mnt/pv-zone-${zone}
              name: pv-zone-${zone}
              mountPropagation: "HostToContainer"
      nodeSelector:
        failure-domain.beta.kubernetes.io/zone: eu-west-1${zone}
      volumes:
        - name: provisioner-config
          configMap:
            name: local-provisioner-config-${zone}
        - name: pv-zone-${zone}
          hostPath:
            path: /mnt/pv-zone-${zone}
EOF

You should end up with a zone-specific local volume provisioner on each node:

$ kubectl --context kind -n local-volume-provisioning get pods -o wide
NAME                               READY   STATUS    RESTARTS   AGE    IP           NODE           NOMINATED NODE
local-volume-provisioner-a-2fn87   1/1     Running   0          115s   10.244.3.2   kind-worker2   <none>
local-volume-provisioner-a-h2k8w   1/1     Running   0          115s   10.244.4.2   kind-worker    <none>
local-volume-provisioner-b-96qnj   1/1     Running   0          115s   10.244.1.2   kind-worker4   <none>
local-volume-provisioner-b-p76v6   1/1     Running   0          115s   10.244.2.2   kind-worker3   <none>

Moments later the static PVs will be provisioned automatically, ready to be bound directly to a pod or via a PVC.

$ kubectl --context kind get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS      REASON   AGE
local-pv-a7c00434   245Gi      RWO            Delete           Available           standard-zone-a            35s
local-pv-b6f98e8    245Gi      RWO            Delete           Available           standard-zone-b            35s
local-pv-c3b01b62   245Gi      RWO            Delete           Available           standard-zone-a            38s
local-pv-cfe437df   245Gi      RWO            Delete           Available           standard-zone-b            36s
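
In a CI script, "moments later" is best made explicit by polling until the expected number of volumes is available. The sketch below shows one way to do that. The function names, the KUBECTL override and the default retry settings are assumptions of ours, not part of kind or the provisioner.

```shell
# Count PVs currently in the Available phase.
# KUBECTL is an assumed override; it defaults to the real kubectl.
count_available_pvs() {
  ${KUBECTL:-kubectl} --context kind get pv \
    -o jsonpath='{.items[*].status.phase}' | tr ' ' '\n' | grep -c '^Available$'
}

# Poll until at least $1 PVs are Available, giving up after $2 attempts
# spaced $3 seconds apart (defaults: 30 attempts, 2 seconds).
wait_for_pvs() {
  expected=$1
  attempts=${2:-30}
  interval=${3:-2}
  i=0
  while [ "${i}" -lt "${attempts}" ]; do
    if [ "$(count_available_pvs)" -ge "${expected}" ]; then
      return 0
    fi
    i=$(( i + 1 ))
    sleep "${interval}"
  done
  echo "timed out waiting for ${expected} available PVs" >&2
  return 1
}
```

For the setup above, wait_for_pvs 4 would block until all four volumes are listed as Available or the attempts run out.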

We should now be able to create a pod with a persistent volume. If our storage class were topology-aware we could do without the affinity rule.

cat <<EOF | kubectl --context kind apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim
spec:
  storageClassName: standard-zone-a
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
EOF

cat <<EOF | kubectl --context kind apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pv-pod
spec:
  volumes:
    - name: pv-storage
      persistentVolumeClaim:
        claimName: pv-claim
  containers:
    - name: pv-container
      image: nginx
      volumeMounts:
        - mountPath: "/path/to/storage"
          name: pv-storage
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: failure-domain.beta.kubernetes.io/zone
            operator: In
            values:
            - eu-west-1a
EOF

The PVC we just created should now show that it’s bound to one of the zone a PVs.

$ kubectl --context kind get pvc
NAME       STATUS   VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS      AGE
pv-claim   Bound    local-pv-a7c00434   245Gi      RWO            standard-zone-a   2m31s

Deploying local images

While developing the operator we could have opted to push our docker images to a public repo so they could be used in kind. Our Travis CI would then build and push images remotely and kind would download them when deploying the operator and the Cassandra pods. This could have been acceptable if we had just one image to manage, but our operator uses up to 5! On top of that, the fact that Cassandra pods run on different nodes meant that each node would need to download the images for each initial deployment. The time quickly added up and we soon got to a stage where we wasted minutes waiting for remote images that we already had built locally.

We needed to find a way to avoid the unnecessary round trip to the remote registry.

kind does offer a load image function, but this still felt too slow and a bit cumbersome to use. All we wanted was for kind to pull images directly from the host it’s running on.

After some digging around we found an interesting idea that suggested running a local registry and setting up the worker nodes to forward registry requests to the docker host. This solution was specific to DinD so we had to adapt it slightly to make it work with kind. Instead of running a registry-proxy via docker, which is not available on kind nodes, we used a DaemonSet. More recent versions of kind added support for a local registry; see https://kind.sigs.k8s.io/docs/user/local-registry/

docker run -d --name=kind-registry --restart=always -p 5000:5000 registry:2

cat <<EOF | kubectl --context kind apply -f -
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: registry-proxy
  namespace: kube-system
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: registry-proxy
  template:
    metadata:
      labels:
        app: registry-proxy
    spec:
      hostNetwork: true
      containers:
        - image: "tecnativa/tcp-proxy"
          name: tcp-proxy
          command: ["/bin/sh", "-c"]
          args:
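            # Escaped so the command runs inside the proxy container, not in the shell rendering this heredoc.
            - export TALK=\$(/sbin/ip route |  awk '/default/ { print \$3 ":5000"}');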
              export LISTEN=:5000;
              /magic-entrypoint /docker-entrypoint.sh haproxy -f /usr/local/etc/haproxy/haproxy.cfg;
          ports:
            - containerPort: 5000
EOF
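
With the registry and proxy in place, an image built on the host only needs to be tagged for the local registry and pushed there. The sketch below assumes pods reference images through that same localhost:5000 name. The local_image_ref and push_local helpers, the REGISTRY and DOCKER overrides and the example image name are hypothetical.

```shell
# Registry address as published by the `docker run -p 5000:5000` command above.
REGISTRY=${REGISTRY:-localhost:5000}

# Compute the registry-qualified name for a locally built image.
local_image_ref() {
  echo "${REGISTRY}/$1"
}

# Tag a locally built image for the local registry and push it.
# DOCKER is an assumed override; it defaults to the real docker CLI.
push_local() {
  image=$1
  ref=$(local_image_ref "${image}")
  ${DOCKER:-docker} tag "${image}" "${ref}" &&
    ${DOCKER:-docker} push "${ref}"
}
```

For example, after push_local my-operator:v1 (a made-up image name), a pod spec would refer to the image as localhost:5000/my-operator:v1.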

In the end, we arrived at a faster and more elegant solution that didn’t need to push untested images to a remote repository or consume unnecessary network bandwidth.

Conclusion

kind provides a fast and lightweight Kubernetes cluster, which makes it a great candidate for testing your Kubernetes components in a CI build. Using it to test our Kubernetes operator gave us confidence that it would work once deployed into an integrated environment.