This is because the NVIDIA manager opens a handle on NVML at NewNvidiaManager time. This is problematic in a Kubernetes setting, where the kubelet now holds an open handle on the NVIDIA driver, preventing an update of the driver unless the kubelet is restarted. Additionally, with the new metrics pipeline in Kubernetes, metrics are now expected to be collected through a container rather than through the kubelet itself.

Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>

- container_test.go
- container.go
- manager_test.go
- manager.go