# Upgrade Keda and GPU components versions #1575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
```diff
@@ -5,6 +5,21 @@ CUDA workloads require the NVIDIA Container Runtime, so containerd needs to be c
 The K3s container itself also needs to run with this runtime.
 If you are using Docker you can install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).

+## Preparing Server to run Keda
+
+To Create a Server that has all the drivers installed you can use exmaple script prepared for Ubuntu 24.04
```

Suggested change (typo fix):

```diff
-To Create a Server that has all the drivers installed you can use exmaple script prepared for Ubuntu 24.04
+To Create a Server that has all the drivers installed you can use example script prepared for Ubuntu 24.04
```
```diff
@@ -1,30 +1,60 @@
-ARG K3S_TAG="v1.28.8-k3s1"
-ARG CUDA_TAG="12.4.1-base-ubuntu22.04"
+ARG K3S_TAG="v1.31.7-k3s1"
+ARG CUDA_TAG="12.8.1-base-ubuntu24.04"
+ARG NVIDIA_DRIVER_VERS="570"

-FROM rancher/k3s:$K3S_TAG as k3s
-FROM nvcr.io/nvidia/cuda:$CUDA_TAG
+# Stage 1: Pull k3s base image
+FROM rancher/k3s:${K3S_TAG} AS k3s
+# Nothing else needed here except the base

-# Install the NVIDIA container toolkit
-RUN apt-get update && apt-get install -y curl \
-    && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
+# Stage 2: CUDA + NVIDIA Toolkit layer
+FROM nvcr.io/nvidia/cuda:${CUDA_TAG}
+
+# Re-declare all ARGs you want to use in this stage
+ARG NVIDIA_DRIVER_VERS
+
+# Optional: useful for runtime debugging
+ENV NVIDIA_DRIVER_VERS=${NVIDIA_DRIVER_VERS}
+ENV PAGER=less
+
-# Install NVIDIA container toolkit & matching utilities
+# Install NVIDIA container toolkit, utilities, and graphics-drivers PPA
+# YES WE KNOW we are at Ubuntu 24.04 but NVidia container toolkit does not support 24.04 YET
+# https://github.com/NVIDIA/nvidia-container-toolkit/issues/482
+RUN apt-get update && apt-get install -y \
+    curl \
+    gnupg \
+    less \
+    ca-certificates \
+    software-properties-common \
+    lsb-release \
+    less \
```

A review suggestion replaces the duplicated `less \` entries in the package list with a single `less \`.
```diff
@@ -1,19 +1,52 @@
-#!/bin/bash
+#!/usr/bin/env bash
+# set -euxo pipefail

 set -euxo pipefail

-K3S_TAG=${K3S_TAG:="v1.28.8-k3s1"} # replace + with -, if needed
-CUDA_TAG=${CUDA_TAG:="12.4.1-base-ubuntu22.04"}
-IMAGE_REGISTRY=${IMAGE_REGISTRY:="MY_REGISTRY"}
-IMAGE_REPOSITORY=${IMAGE_REPOSITORY:="rancher/k3s"}
-IMAGE_TAG="$K3S_TAG-cuda-$CUDA_TAG"
+# Set default values
+K3S_TAG=${K3S_TAG:="v1.31.7-k3s1"} # replace + with -, if needed
+CUDA_TAG=${CUDA_TAG:="12.8.1-base-ubuntu24.04"}
+#IMAGE_REGISTRY=${IMAGE_REGISTRY:="techmakers.azurecr.io"}
+IMAGE_REGISTRY=${IMAGE_REGISTRY:="docker.io"}
+IMAGE_REPOSITORY=${IMAGE_REPOSITORY:="k3s"}
+IMAGE_TAG="${K3S_TAG//+/-}-cuda-$CUDA_TAG"
 IMAGE=${IMAGE:="$IMAGE_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG"}

 echo "IMAGE=$IMAGE"

-docker build \
+# Check if Docker is installed
+if ! command -v docker &> /dev/null; then
+    echo "Docker is not installed. Please install Docker first." >&2
+    exit 1
+fi
+
+# Check if Docker service is running
+if ! systemctl is-active --quiet docker; then
+    echo "Docker service is not running. Attempting to start it..." >&2
+    sudo systemctl start docker
+fi
+
+# Check if user is in docker group
+if ! groups | grep -q '\bdocker\b'; then
+    echo "WARNING: You are not in the 'docker' group. You may need to use sudo for docker commands."
+fi
+
+# Check if az CLI is installed
+if ! command -v az &> /dev/null; then
+    echo "Azure CLI (az) is not installed. Please install it first." >&2
+    exit 1
+fi
+
+# Login to Azure container registry
+# echo "Logging into Azure..."
+# az acr login --name "$(echo $IMAGE_REGISTRY | cut -d. -f1)"
+
+# --- Build and Push ---
+echo "Building image..."
+docker build --debug \
     --build-arg K3S_TAG=$K3S_TAG \
     --build-arg CUDA_TAG=$CUDA_TAG \
-    -t $IMAGE .
-docker push $IMAGE
-echo "Done!"
+    -t "$IMAGE" .
+
+# echo "Pushing image..."
+# docker push "$IMAGE"
+
+echo "Done!"
```
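The new `IMAGE_TAG` line uses bash parameter expansion to sanitize the k3s tag, since `+` (as in upstream release names like `v1.31.7+k3s1`) is not a valid character in a Docker image tag. A minimal sketch of just that substitution, with sample values standing in for the script's defaults:

```shell
#!/usr/bin/env bash
# k3s upstream releases are tagged like "v1.31.7+k3s1"; Docker tags may
# not contain "+", so the script rewrites every "+" to "-".
K3S_TAG="v1.31.7+k3s1"
CUDA_TAG="12.8.1-base-ubuntu24.04"

# ${VAR//pattern/replacement} replaces ALL occurrences of the pattern
IMAGE_TAG="${K3S_TAG//+/-}-cuda-$CUDA_TAG"
echo "$IMAGE_TAG"   # v1.31.7-k3s1-cuda-12.8.1-base-ubuntu24.04
```

Using `//` (replace all) rather than `/` (replace first) keeps the tag valid even if a future release name contained more than one `+`.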
New file (`@@ -0,0 +1,54 @@`):

```bash
#!/bin/bash

NVIDIA_DRIVER_VERS="570"
sudo apt-get remove --purge '^nvidia-.*' 'libnvidia-*' -y && \
sudo apt-get autoremove -y && \
sudo apt-get autoclean -y && \
sudo apt update && \
sudo apt-get install -y software-properties-common && \
sudo add-apt-repository -y ppa:graphics-drivers/ppa && \
sudo apt-get update && \
sudo apt install nvidia-driver-${NVIDIA_DRIVER_VERS} -y && \
sudo reboot
# Then install following to have nvidia-smi
# sudo apt install nvidia-utils-570 -y
#
#
# Add NVIDIA GPG key
distribution="ubuntu22.04"
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | \
    sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && \
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker to use NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker

# Restart Docker
sudo systemctl restart docker



# Add NVIDIA GPG key
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | \
    sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Add repo using fake distribution
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list


sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

sudo nvidia-ctk runtime configure --runtime=docker

sudo systemctl restart docker
```
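The `sed` pipeline in the script rewrites each `deb https://...` line from NVIDIA's repository list so apt pins the repo to the dearmored keyring via `[signed-by=...]`. A sketch of just that transformation on a sample input line (the line is a stand-in for what `libnvidia-container.list` contains, not fetched from NVIDIA):

```shell
#!/usr/bin/env bash
# Sample apt source line, standing in for the real libnvidia-container.list
# content ($(ARCH) is apt syntax, left untouched by the single quotes).
line='deb https://nvidia.github.io/libnvidia-container/stable/ubuntu22.04/$(ARCH) /'

# Same sed expression as the script: insert the signed-by keyring option
# after "deb" on every https source line.
rewritten=$(printf '%s\n' "$line" | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g')
echo "$rewritten"
```

Using `#` as the sed delimiter avoids having to escape the `/` characters in the URL and keyring path.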
**Comment on lines +37 to +53:**
The heading mentions 'Keda' but should be 'CUDA' based on the context. This appears to be a typo as the section is about preparing a server for GPU/CUDA workloads, not Keda (Kubernetes Event-driven Autoscaling).