Installing Kubeflow 1.4 on a Minikube Kubernetes Node

Why Kubeflow

Kubeflow is a software framework for machine learning workflows, covering training, optimization, and serving.

It integrates various components for this task. One of the most prominent is Jupyter Notebooks, but it also includes useful tools to automate feature selection, network topology optimization, and hyperparameter optimization.

Furthermore, it provides a web interface, multi-tenancy via namespaces, and scalability by using Kubernetes as its infrastructure framework.

It is typically well suited for large deployments and is available for various public clouds such as Amazon Web Services, Google Cloud, Azure, and others.

Here, we target a single-host deployment to make Kubeflow available in a small lab environment.

For previous versions, there was an official description for installing Kubeflow on Minikube, i.e. a single-host Kubernetes environment. That description was removed, and the installation process now relies heavily on Kubernetes manifests. For local deployments, options such as Arrikto MiniKF or MicroK8s Kubeflow are described at https://www.kubeflow.org/docs/started/installing-kubeflow/. Still, all of these installations have some flaws I encountered during installation, which are also mentioned in the video tutorial https://youtu.be/C9Cl8EcqnfE.

Therefore, I share my installation process for a local single-machine setup. The main difficulties lie in selecting the correct Kubernetes version and a working combination of container driver and container runtime.

This blog post follows the description given at https://github.com/kubeflow/manifests#installation, but fills the gaps by naming a concrete selection of software packages and versions.

Our Hardware Setup

Our setup is based on an ASUS ESC4000A-E10 with

  • AMD Epyc CPU 7443P
  • 128 GB RAM DDR4-3200
  • 1x Nvidia RTX 3090 GPU
  • 1 TB NVMe M.2 SSD system partition
  • 3.84 TB NVMe 2.5" SSD data partition

Currently, we deploy only one GPU for machine learning tasks, although the server can host up to four cards. This description does not enable GPU usage in the setup; that may be covered in a future post.

Our Software Setup

The most difficult part is finding a set of components that work together. So far we use:

  • Ubuntu 20.04 LTS with the HWE kernel 5.11; the HWE kernel improves TensorFlow benchmark performance compared to the default kernel 5.4 (see the quick check after this list). The HWE kernel can easily be installed with:
sudo apt install --install-recommends linux-generic-hwe-20.04
  • Docker and Containerd
    • I also tried Podman and CRI-O, but the cert-manager-webhook deployment does not start with them; using Docker and containerd solves the problem.
  • Minikube 1.22
    • the most recent version at the time of writing
  • Kustomize v3.2.0
    • the Kubeflow manifests are not compatible with Kustomize 4.x
  • Kubernetes 1.21
    • API removals in Kubernetes 1.22 prevent its use with Kubeflow 1.4
  • Kubeflow 1.4
    • the most recent version at the time of writing
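
After installing the HWE kernel and rebooting, the running kernel version can be confirmed as shown below; the other components can be checked the same way once the following sections are done (illustrative commands only):

uname -r
docker --version
minikube version
kustomize version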

Specific adaptations for our environment

To provide sufficient space for the containers, we move Docker's data-root to our data drive. Edit the docker.service file

sudo vi /usr/lib/systemd/system/docker.service

and add the --data-root option to the dockerd invocation in the ExecStart line:

...
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --data-root=/data/docker
...
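
Once Docker is installed (see the next section), reload systemd and restart the service so the new data-root takes effect; docker info shows the active root directory:

sudo systemctl daemon-reload
sudo systemctl restart docker
docker info | grep "Docker Root Dir"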

Install docker and containerd

See https://docs.docker.com/engine/install/ubuntu/

sudo apt-get update
sudo apt-get install ca-certificates curl gnupg lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
sudo usermod -aG docker $USER && newgrp docker

At the time of writing, this installs dockerd version 20.10.12.
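
To check that the non-root user can access the Docker daemon, a quick smoke test is:

docker run --rm hello-world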

Install minikube

See https://minikube.sigs.k8s.io/docs/start/

curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube_latest_amd64.deb
sudo dpkg -i minikube_latest_amd64.deb

Create a Minikube cluster with the latest Kubernetes version below 1.22:

minikube start --kubernetes-version=1.21.8
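
Depending on your hardware and planned workloads, it may make sense to assign resources explicitly; the following values are only an example for our machine, not required settings:

minikube start --kubernetes-version=1.21.8 --driver=docker --cpus=16 --memory=64g --disk-size=200g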

For convenience, set an alias for the kubectl command:

alias kubectl="minikube kubectl --"

You may also add the previous line to ~/.bash_aliases to make the alias permanent.
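
For example, assuming bash as the login shell:

echo 'alias kubectl="minikube kubectl --"' >> ~/.bash_aliases
source ~/.bash_aliases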

Install Kustomize

wget https://github.com/kubernetes-sigs/kustomize/releases/download/v3.2.0/kustomize_3.2.0_linux_amd64
chmod +x kustomize_3.2.0_linux_amd64
sudo mv kustomize_3.2.0_linux_amd64 /usr/bin
sudo ln -s /usr/bin/kustomize_3.2.0_linux_amd64 /usr/bin/kustomize
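
A quick check that the symlinked binary is picked up; it should report version 3.2.0:

kustomize version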

Install Kubeflow 1.4

See https://github.com/kubeflow/manifests#installation

Download the manifests, change into the directory, and check out the latest release:

git clone https://github.com/kubeflow/manifests.git
cd manifests
git checkout v1.4.1

For the next commands stay in this directory.

If desired, set a custom password for the default user. First, create a password hash with Python:

sudo apt install python3-passlib python3-bcrypt
python3 -c 'from passlib.hash import bcrypt; import getpass; print(bcrypt.using(rounds=12, ident="2y").hash(getpass.getpass()))'

and set the hash option in the file ./common/dex/base/config-map.yaml.

vi ./common/dex/base/config-map.yaml
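
The relevant part of the config-map is the staticPasswords entry for the default user; replace its hash value with the one generated above. The snippet below is abbreviated and shows only a placeholder hash, not the actual content of the file:

staticPasswords:
- email: user@example.com
  hash: <your-bcrypt-hash>
  username: user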

Install

while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
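
The loop retries until all resources are applied successfully. Afterwards, wait until all pods across the Kubeflow namespaces are running; you can watch them, for example, with:

kubectl get pods -A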

Login

So far, we do not make the web interface publicly available and use SSH port forwarding.

On the Kubeflow machine, forward port 8080 to the Istio ingress gateway:

kubectl port-forward -n istio-system service/istio-ingressgateway 8080:80 --address=0.0.0.0

On the client, forward local port 8080 to remote port 8080:

ssh -L 8080:localhost:8080 <remote-user>@<kubeflow-machine>

Open a web browser on the client machine and go to http://localhost:8080. Log in with the default user defined in the dex config-map (at the time of writing, user@example.com with password 12341234) or with the password you set above.
