annotate kube-gpu/start.sh @ 332:d4893670f888 default tip
WIP: use watchdog reboot timer on pi
| author | drewp@bigasterisk.com |
|---|---|
| date | Thu, 27 Feb 2025 11:09:29 -0800 |
| parents | 34ab4aec7d4b |
| children | |
Every line of the file below is annotated with rev 268 (changeset 34ab4aec7d4b) by drewp@bigasterisk.com: "notes and changes for getting nvidia gpu k3d support going, which was very hard".
```bash
#!/bin/bash

# Add NVIDIA's Helm chart repo and refresh the index.
linux-amd64/helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
    && linux-amd64/helm repo update
# Install the GPU Operator. k3s runs its own containerd, so the container
# toolkit is pointed at k3s's containerd config and socket, names the runtime
# "nvidia", and makes it containerd's default runtime. The validator's
# /dev/char symlink creation is also disabled (see the notes below).
# Arguments containing [N] are quoted so the shell cannot treat them as globs.
linux-amd64/helm install --wait nvidiagpu \
    -n gpu-operator --create-namespace \
    --set 'toolkit.env[0].name=CONTAINERD_CONFIG' --set 'toolkit.env[0].value=/var/lib/rancher/k3s/agent/etc/containerd/config.toml' \
    --set 'toolkit.env[1].name=CONTAINERD_SOCKET' --set 'toolkit.env[1].value=/run/k3s/containerd/containerd.sock' \
    --set 'toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS' --set 'toolkit.env[2].value=nvidia' \
    --set 'toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT' --set-string 'toolkit.env[3].value=true' \
    --set 'validator.driver.env[0].name=DISABLE_DEV_CHAR_SYMLINK_CREATION' --set-string 'validator.driver.env[0].value=true' \
    nvidia/gpu-operator
```
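The `--set` flags above could also be kept in a Helm values file. This avoids shell-quoting the `[N]` list indices and keeps the `"true"` strings explicit. What follows is a minimal sketch, assuming the chart reads these keys from a values file the same way it reads them from `--set` (it should, since `--set` only overrides values). The helper name `write_gpu_operator_values` and the output path are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: write the same toolkit/validator settings that the
# --set flags pass, as a Helm values file at the path given in $1.
write_gpu_operator_values() {
    local out="$1"
    cat > "$out" <<'EOF'
toolkit:
  env:
    # k3s keeps its containerd config and socket under its own paths
    - name: CONTAINERD_CONFIG
      value: /var/lib/rancher/k3s/agent/etc/containerd/config.toml
    - name: CONTAINERD_SOCKET
      value: /run/k3s/containerd/containerd.sock
    - name: CONTAINERD_RUNTIME_CLASS
      value: nvidia
    # quoted so YAML keeps it a string, matching --set-string
    - name: CONTAINERD_SET_AS_DEFAULT
      value: "true"
validator:
  driver:
    env:
      - name: DISABLE_DEV_CHAR_SYMLINK_CREATION
        value: "true"
EOF
}
```

With this helper, the install would become `write_gpu_operator_values gpu-operator-values.yaml` followed by `linux-amd64/helm install --wait nvidiagpu -n gpu-operator --create-namespace -f gpu-operator-values.yaml nvidia/gpu-operator`.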
```bash
# and maybe k edit ClusterPolicy to do this:

# Error: error validating driver installation:
# error creating symlink creator:
# failed to create NVIDIA device nodes:
# failed to create device node nvidiactl:
# failed to determine major:
# invalid device node

# Failed to create symlinks under /dev/char that point to all possible NVIDIA character devices.
# The existence of these symlinks is required to address the following bug:

# https://github.com/NVIDIA/gpu-operator/issues/430

# This bug impacts container runtimes configured with systemd cgroup management enabled.
# To disable the symlink creation, set the following envvar in ClusterPolicy:

# validator:
#   driver:
#     env:
#     - name: DISABLE_DEV_CHAR_SYMLINK_CREATION
#       value: "true"
```
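The same env var could probably be applied with `kubectl patch` rather than an interactive `k edit ClusterPolicy`. This sketch assumes the ClusterPolicy is named `cluster-policy`, the usual name when the chart creates it. Check with `kubectl get clusterpolicies` first. It also assumes a JSON merge patch is acceptable: a merge patch replaces the whole `validator.driver.env` list, so any other entries there would be dropped. Both function names are hypothetical.

```bash
#!/bin/bash
# Hypothetical helpers for setting DISABLE_DEV_CHAR_SYMLINK_CREATION on the
# ClusterPolicy without an interactive editor.

# Print a JSON merge patch that sets spec.validator.driver.env. A merge patch
# replaces the entire env list, so existing entries there are overwritten.
symlink_patch_json() {
    printf '%s\n' '{"spec":{"validator":{"driver":{"env":[{"name":"DISABLE_DEV_CHAR_SYMLINK_CREATION","value":"true"}]}}}}'
}

# Apply the patch. Assumes the ClusterPolicy is named "cluster-policy";
# pass a different name as $1 if yours differs. ClusterPolicy is
# cluster-scoped, so no namespace is given.
apply_symlink_patch() {
    local policy="${1:-cluster-policy}"
    kubectl patch clusterpolicies.nvidia.com "$policy" --type merge -p "$(symlink_patch_json)"
}
```

Since the `helm install` above already passes this env var through `validator.driver.env`, the patch is presumably only needed for an operator that was installed without it. Run `apply_symlink_patch`, or `apply_symlink_patch <name>` for a differently named policy.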
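The pasted message says the bug affects runtimes configured with systemd cgroup management. One way to tell whether k3s's containerd is set up that way is to look for runc's `SystemdCgroup = true` option in the config file the script passes as `CONTAINERD_CONFIG`. This is a sketch only. The function name is hypothetical, and it assumes k3s writes the key into its generated config.toml in the usual `key = value` form.

```bash
#!/bin/bash
# Hypothetical check: does a containerd config.toml enable the systemd
# cgroup driver for runc? Returns 0 if an uncommented "SystemdCgroup = true"
# line is present, non-zero otherwise.
uses_systemd_cgroup() {
    local config="${1:-/var/lib/rancher/k3s/agent/etc/containerd/config.toml}"
    # Allow any spacing around "="; lines starting with "#" do not match.
    grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$config"
}
```

Run it with no argument to check the k3s default path (reading that file likely needs root), or pass another config file. For example: `uses_systemd_cgroup && echo "containerd uses systemd cgroups"`.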