Single Node Kubeflow 1.7 cluster with Nvidia GPU support


In Kubeflow 1.4 on a Minikube Kubernetes Node, I described a Kubeflow/Minikube setup. Still, making GPUs available to Kubeflow is difficult if Minikube is used: Minikube only supports GPUs for selected drivers, the none driver is not recommended, and the kvm2 driver adds another layer of virtualization that I would like to avoid.

Still, Minikube has the advantage that it provides network plugins and storage provisioners out of the box. Nevertheless, I went the long and difficult road of setting up a pure Kubernetes cluster for Kubeflow, which includes Calico as network plugin and Rook/Ceph as storage provisioner. Such a setup scales easily to a multi-node cluster, since Kubernetes as well as Rook/Ceph are designed for such use cases. Only due to our limited hardware do we restrict our setup to a single node. A multi-node cluster would offer redundancy, high availability, and scalability, characteristics desired in production environments.

Our Hardware Setup

Our setup is based on an ASUS ESC4000A-E10 with

  • 2x AMD EPYC 7413 (24 cores each)
  • 512 GB RAM DDR4-3200
  • 7x Nvidia A30 GPU
  • 2x 1.92 TB SSD/NVMe 2.5" system partition as software RAID 1
  • 2x 15.3 TB NVMe M.2 data partition as redundant Ceph partitions

Our Software Setup

The most difficult part is finding a set of components that work together. So far I used:

I installed Ubuntu 22.04 with:

  1. Containerd
    • In previous installations it was difficult to find a working container engine; for now, containerd seems to work well.
  2. Kubernetes 1.25
    • Kubeflow 1.7 is tested with Kubernetes 1.24/1.25.
  3. Rook and Ceph as storage provider
    • Kubeflow uses persistent volume claims; therefore, a storage provider is required that can serve them. We use Ceph as a single-node installation.
  4. Nvidia GPU Operator to make GPUs available to notebooks
  5. Kustomize v5.0.3
    • Since Kubeflow 1.3, Kustomize manifests are used to deploy Kubeflow.
  6. Kubeflow 1.7

Software Installation

This section gives a brief summary of the commands used for installation and references to the related documentation.


Create /etc/sysctl.d/kubeflow.conf and insert

fs.inotify.max_user_instances = 1280

which seems to solve problems caused by the default inotify instance limit.
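The new limit can be activated and checked without a reboot:

```shell
# Load all sysctl configuration files, including the new kubeflow.conf
sudo sysctl --system

# Verify the value is active (should print 1280)
sysctl fs.inotify.max_user_instances
```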

Following the Kubernetes container-runtime prerequisites, load the required kernel modules and enable bridging and forwarding:

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

# Apply sysctl params without reboot
sudo sysctl --system
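Whether the modules are loaded and the parameters are set can be verified as follows:

```shell
# Both modules should be listed
lsmod | grep -e overlay -e br_netfilter

# All three parameters should report 1
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
```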

Install containerd


sudo apt-get update
sudo apt-get install ca-certificates curl gnupg lsb-release
curl -fsSL | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install containerd.io

Install Kubernetes

Kubeflow 1.7 is tested with Kubernetes 1.24/1.25; we use the newer release, 1.25.

Since we use Ubuntu with systemd, configure containerd to use the systemd cgroup driver:

sudo su
containerd config default > /etc/containerd/config.toml

Set SystemdCgroup = true in section [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options] in /etc/containerd/config.toml.

and restart containerd:

sudo systemctl restart containerd
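After the restart, confirm the cgroup driver setting in the active configuration:

```shell
# Should show: SystemdCgroup = true
containerd config dump | grep SystemdCgroup
```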

Afterwards, install Kubernetes.

Prepare the software repositories

sudo apt-get install -y apt-transport-https ca-certificates curl
curl -fsSL | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt update

Install the Kubernetes command-line tools

Install the latest patch release of 1.25, which is 1.25.11-00.

KVER=1.25.11-00
sudo apt-get install -y kubelet=$KVER kubeadm=$KVER kubectl=$KVER
sudo apt-mark hold kubelet kubeadm kubectl

Set the packages to hold to avoid upgrades.
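The hold status can be verified with:

```shell
# Lists packages excluded from upgrades; should include kubeadm, kubectl, kubelet
apt-mark showhold
```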

Init the Cluster and install the Network Plugin Calico

Note that the control-plane (master) taint is removed for our single-node cluster, so that pods are scheduled on the master node, too.

sudo kubeadm init --pod-network-cidr= --kubernetes-version="1.25"

mkdir -p $HOME/.kube 
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

kubectl create -f
kubectl create -f
kubectl taint nodes --all

We also increase the maximum number of pods from the default of 110 by adding --max-pods=243 to ExecStart in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, since Kubeflow schedules pods for every user who has logged in at least once.
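After editing the drop-in file, reload systemd and restart the kubelet so the new limit takes effect; the advertised capacity can then be checked (single-node setup assumed):

```shell
# Reload unit files and restart the kubelet after changing --max-pods
sudo systemctl daemon-reload
sudo systemctl restart kubelet

# The node should now advertise the increased pod capacity
kubectl get nodes -o jsonpath='{.items[0].status.capacity.pods}'
```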

Install the storage provider

Kubeflow requires a storage provider. For our setup, we use Ceph deployed by Rook and provide two spare disks that are initialized by Ceph. Ensure that the disks do not contain any filesystem; otherwise, Ceph will not use them.

To wipe a filesystem, see the Rook documentation on cleaning up disks.
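As a sketch, leftover filesystem signatures and partition tables can be removed with wipefs and sgdisk; /dev/nvme1n1 is a placeholder here, so double-check the device name of the spare disk before running this:

```shell
# CAUTION: irreversibly destroys all data on the given device
sudo wipefs --all /dev/nvme1n1
sudo sgdisk --zap-all /dev/nvme1n1
```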

Typically, Rook/Ceph is used in a multi-node cluster for high availability. Here, we make a single-node deployment. Example YAML files are already included in the Rook repository; the single-node-specific configurations are in cluster-test.yaml and storageclass-test.yaml. For details, see below.

Install a single node Rook/Ceph cluster:

git clone --single-branch --branch master
cd rook/deploy/examples
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
kubectl create -f cluster-test.yaml

and create the storage class used for the persistent volume claims:

cd csi/rbd
kubectl create -f storageclass-test.yaml

Make this class the default class:

kubectl patch storageclass rook-ceph-block -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

Check that the OSD pods rook-ceph-osd-... are running with

kubectl -n rook-ceph get pods
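Once the OSDs are running, provisioning can be smoke-tested with a throwaway claim (the name test-pvc is arbitrary):

```shell
# Create a small claim against the rook-ceph-block storage class
cat << EOF | kubectl create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: rook-ceph-block
  resources:
    requests:
      storage: 1Gi
EOF

# The claim should reach status Bound after a few seconds
kubectl get pvc test-pvc

# Clean up
kubectl delete pvc test-pvc
```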

Check the Rook/Ceph storage health status with the toolbox:

# Change back to deploy/examples in the rook folder
cd ../..
kubectl create -f toolbox.yaml
ROOK_CLUSTER_NAMESPACE=rook-ceph
TOOLS_POD=$(kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[*].metadata.name}')
kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph status

Add GPU support

Different ways exist to make GPUs available to Kubernetes.

We select the Nvidia GPU operator, which handles the installation of drivers and additional required libraries.


curl -fsSL -o && chmod 700 && ./
helm repo add nvidia && helm repo update
helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator

We had to restart the server so that the GPU operator initialized successfully.

Test with

cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vectoradd
    image: "nvidia/samples:vectoradd-cuda11.2.1"
    resources:
      limits:
        nvidia.com/gpu: 1
EOF

and check the logs:

kubectl logs cuda-vectoradd
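Independently of the sample pod, the node should advertise the GPUs to the scheduler (7 in our setup); clean up the test pod afterwards:

```shell
# Number of GPUs the node reports to the scheduler
kubectl get nodes -o jsonpath='{.items[0].status.capacity.nvidia\.com/gpu}'

# Remove the test pod
kubectl delete pod cuda-vectoradd
```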

Install Kustomize v5.0.3

tar xvzf kustomize_v5.0.3_linux_amd64.tar.gz
sudo mv kustomize /usr/bin

Install Kubeflow 1.7



Since Kubeflow 1.3, Kustomize manifests are used to deploy Kubeflow.

Download the manifests, change into the directory, and check out the latest release:

git clone
cd manifests
git checkout v1.7.0

For the next commands stay in this directory.

If desired, set a non-default password for the default user. First, create a password hash with Python:

sudo apt install python3-passlib python3-bcrypt
python3 -c 'from passlib.hash import bcrypt; import getpass; print(bcrypt.using(rounds=12, ident="2y").hash(getpass.getpass()))'

and set the hash option in the file ./common/dex/base/config-map.yaml to the generated password hash:

vi ./common/dex/base/config-map.yaml

Bugfix: the login gets stuck in an infinite loop due to an outdated image of authservice. To fix it, change the referenced image tag in common/oidc-authservice/base/kustomization.yaml from

newTag: e236439

to

newTag: 0c4ea9a


Install all components with one command:

while ! kustomize build example | awk '!/well-defined/' | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done

Now we should have a running single-node Kubeflow cluster; verify with kubectl get pods -A that all pods are in the state Running or Completed.

Login via SSH port forwarding

So far, we do not make the web interface publicly available and use SSH port forwarding instead.

On the Kubeflow machine, expose port 8080:

kubectl port-forward -n istio-system service/istio-ingressgateway 8080:80 --address=

On the connecting client forward the local port 8080 to the remote port 8080:

ssh -L 8080:localhost:8080 <remote-user>@<kubeflow-machine>

Open a web browser on the client and open localhost:8080.

For better usability, use a load balancer.

Setup the Loadbalancer with TLS

Adapted from the MetalLB and Istio documentation.

Prerequisite: create a valid TLS certificate beforehand.

kubectl apply -f

Specify an IPAddressPool in pool.yaml with a single address (x.x.x.x is the address to be used):

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: first-pool
  namespace: metallb-system
spec:
  addresses:
  - x.x.x.x/32

Specify an advertisement in advert.yaml:

apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: example
  namespace: metallb-system

and apply both:

kubectl apply -f pool.yaml
kubectl apply -f advert.yaml
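Check that both resources were created:

```shell
# Both the pool and the advertisement should be listed
kubectl -n metallb-system get ipaddresspools.metallb.io,l2advertisements.metallb.io
```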

Have your certificate cert.pem and the private key file key.pem ready and add them as a secret:

kubectl create -n istio-system secret tls kubeflowcrt --key=key.pem --cert=cert.pem

Now adapt the Istio Kubeflow gateway with

kubectl -n kubeflow edit gateway kubeflow-gateway

set the spec section to

spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - '*'
    port:
      name: http
      number: 80
      protocol: HTTP
    tls:
      httpsRedirect: true
  - hosts:
    - '*'
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      credentialName: kubeflowcrt
      mode: SIMPLE

Change the type of the istio-ingressgateway service to LoadBalancer and get the IP

kubectl -n istio-system  patch service istio-ingressgateway -p '{"spec": {"type": "LoadBalancer"}}'
kubectl -n istio-system get svc istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0]}'

This should be the IP you have configured for metallb.
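A quick reachability test can be run from any machine in the network (-k skips certificate verification, which is useful for self-signed certificates):

```shell
# Port 80 should redirect to HTTPS, port 443 should answer with the login redirect
curl -kI http://x.x.x.x
curl -kI https://x.x.x.x
```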

Add REDIRECT_URL to the data section of the oidc-authservice-parameters configmap, where x.x.x.x is your IP address (or your DNS name, if you have one):

kubectl -n istio-system edit configmap oidc-authservice-parameters

apiVersion: v1
kind: ConfigMap
data:
  OIDC_AUTH_URL: /dex/auth
  OIDC_PROVIDER: http://dex.auth.svc.cluster.local:5556/dex
  OIDC_SCOPES: profile email groups
  PORT: '"8080"'
  REDIRECT_URL: https://x.x.x.x/login/oidc

Also append https://x.x.x.x/login/oidc to the redirectURIs list in the dex configmap:

kubectl -n auth edit configmap dex

Roll out and restart the services:

kubectl -n istio-system rollout restart statefulset authservice
kubectl -n auth rollout restart deployment dex

Now Kubeflow should be accessible via https://x.x.x.x.

LDAP integration

Change the fields and the filter below according to your LDAP setup.

Get the current config:

kubectl get configmap dex -n auth -o jsonpath='{.data.config\.yaml}' > dex-config.yaml

Add the LDAP connector:

cat << EOF >> dex-config.yaml
connectors:
- type: ldap
  id: ldap
  name: LDAP
  config:
    host: <LDAP host>
    usernamePrompt: username
    userSearch:
      baseDN: dc=<domain>,dc=<>
      filter: (&(objectClass=posixAccount)(|(uid=<username>)))
      username: uid
      idAttr: uid
      emailAttr: mail
      nameAttr: givenName
EOF

Apply the config:

kubectl create configmap dex --from-file=config.yaml=dex-config.yaml -n auth --dry-run=client -oyaml | kubectl apply -f -

Restart Dex

kubectl rollout restart deployment dex -n auth
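Confirm that the connector is present in the live configmap:

```shell
# The LDAP connector entry should appear in the config
kubectl -n auth get configmap dex -o jsonpath='{.data.config\.yaml}' | grep -A 2 'type: ldap'
```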

For details, see the Dex LDAP connector documentation.

Disable unused ports in istio

Istio opens a few ports by default. If not required, you can disable ports other than 80 and 443 by modifying the service:

kubectl edit services istio-ingressgateway -n istio-system
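Before removing entries, it helps to list the currently exposed ports:

```shell
# Name and port of every entry in the ingress gateway service
kubectl -n istio-system get svc istio-ingressgateway \
  -o jsonpath='{range .spec.ports[*]}{.name}{"\t"}{.port}{"\n"}{end}'
```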