annotate kube-gpu/start.sh @ 332:d4893670f888 default tip

WIP: use watchdog reboot timer on pi
author drewp@bigasterisk.com
date Thu, 27 Feb 2025 11:09:29 -0800
parents 34ab4aec7d4b
children
rev   line source
268
34ab4aec7d4b notes and changes for getting nvidia gpu k3d support going, which was very hard
drewp@bigasterisk.com
#!/bin/bash

linux-amd64/helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
  && linux-amd64/helm repo update
# Quote each --set argument so the shell never treats [N] as a glob pattern.
linux-amd64/helm install --wait nvidiagpu \
  -n gpu-operator --create-namespace \
  --set 'toolkit.env[0].name=CONTAINERD_CONFIG' --set 'toolkit.env[0].value=/var/lib/rancher/k3s/agent/etc/containerd/config.toml' \
  --set 'toolkit.env[1].name=CONTAINERD_SOCKET' --set 'toolkit.env[1].value=/run/k3s/containerd/containerd.sock' \
  --set 'toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS' --set 'toolkit.env[2].value=nvidia' \
  --set 'toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT' --set-string 'toolkit.env[3].value=true' \
  --set 'validator.driver.env[0].name=DISABLE_DEV_CHAR_SYMLINK_CREATION' --set-string 'validator.driver.env[0].value=true' \
  nvidia/gpu-operator
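
The long run of --set flags above could also be kept in a values file and passed with helm's -f/--values option. The following is only a sketch of that alternative, not what this script does: the VALUES_FILE path is a hypothetical name chosen for the example, and the helm command is echoed rather than executed so the snippet can be tried without a cluster.

```shell
#!/bin/bash
# Sketch: the same settings as the --set flags, written to a values file.
# VALUES_FILE is a hypothetical path picked for this example.
VALUES_FILE="${VALUES_FILE:-${TMPDIR:-/tmp}/gpu-operator-values.yaml}"

write_values() {
  # Quoted heredoc delimiter: the YAML is written verbatim, no shell expansion.
  cat > "$1" <<'EOF'
toolkit:
  env:
    - name: CONTAINERD_CONFIG
      value: /var/lib/rancher/k3s/agent/etc/containerd/config.toml
    - name: CONTAINERD_SOCKET
      value: /run/k3s/containerd/containerd.sock
    - name: CONTAINERD_RUNTIME_CLASS
      value: nvidia
    - name: CONTAINERD_SET_AS_DEFAULT
      value: "true"
validator:
  driver:
    env:
      - name: DISABLE_DEV_CHAR_SYMLINK_CREATION
        value: "true"
EOF
}

write_values "$VALUES_FILE"

# Only print the command here; remove the leading echo to actually install.
echo linux-amd64/helm install --wait nvidiagpu \
  -n gpu-operator --create-namespace \
  -f "$VALUES_FILE" \
  nvidia/gpu-operator
```

Quoting "true" in the YAML keeps the value a string, which is what --set-string does on the command line.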

# and maybe `kubectl edit ClusterPolicy` to do this:

# Error: error validating driver installation:
# error creating symlink creator:
# failed to create NVIDIA device nodes:
# failed to create device node nvidiactl:
# failed to determine major:
# invalid device node

# Failed to create symlinks under /dev/char that point to all possible NVIDIA character devices.
# The existence of these symlinks is required to address the following bug:

#     https://github.com/NVIDIA/gpu-operator/issues/430

# This bug impacts container runtimes configured with systemd cgroup management enabled.
# To disable the symlink creation, set the following envvar in ClusterPolicy:

#     validator:
#       driver:
#         env:
#         - name: DISABLE_DEV_CHAR_SYMLINK_CREATION
#           value: "true"
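
The error quoted above ends in "failed to determine major: invalid device node". One way to look at what a device path actually is on the node is to print its major and minor numbers. This is a rough diagnostic sketch, not the validator's own logic: the helper name dev_numbers and the device paths in the loop are assumptions for illustration, and it relies on GNU stat's %t/%T format specifiers.

```shell
#!/bin/bash
# Print "major:minor" (decimal) for a character device.
# Returns non-zero if the path is missing or is not a character device.
# Assumes GNU stat, whose %t and %T print major and minor in hex.
dev_numbers() {
  local path="$1" major_hex minor_hex
  if [ ! -c "$path" ]; then
    echo "$path: not a character device" >&2
    return 1
  fi
  major_hex=$(stat -c '%t' "$path") || return 1
  minor_hex=$(stat -c '%T' "$path") || return 1
  # Convert the hex values to decimal with bash base-16 arithmetic.
  echo "$((16#$major_hex)):$((16#$minor_hex))"
}

# Device paths are assumptions; adjust them to what exists on the node.
for dev in /dev/nvidiactl /dev/nvidia0; do
  dev_numbers "$dev" || true
done
```

Run inside the k3d node container, a "not a character device" message for /dev/nvidiactl would suggest the node is not exposed the way the validator expects.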