From 6ba3fa4e8c031161f3cca0b0eb6518aea602eef9 Mon Sep 17 00:00:00 2001
From: Rohit Agarwal
Date: Mon, 27 Nov 2017 17:41:56 -0800
Subject: [PATCH] Add docs for using nvidia gpu monitoring.

---
 docs/running.md | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/docs/running.md b/docs/running.md
index b798c1c9..c1c7fd03 100644
--- a/docs/running.md
+++ b/docs/running.md
@@ -82,3 +82,26 @@ cAdvisor is now running (in the foreground) on `http://localhost:8080/`.
 ## Runtime Options
 
 cAdvisor has a series of flags that can be used to configure its runtime behavior. More details can be found in runtime [options](runtime_options.md).
+
+## Hardware Accelerator Monitoring
+
+cAdvisor can export some metrics for hardware accelerators attached to containers.
+Currently only Nvidia GPUs are supported. There are no machine-level metrics,
+so no metrics show up unless a container with accelerators attached is running.
+Metrics only show up if accelerators are explicitly attached to the container, e.g., by passing the `--device /dev/nvidia0:/dev/nvidia0` flag to docker.
+If nothing is explicitly attached to the container, metrics will NOT show up. This can happen when you access accelerators from privileged containers.
+
+There are two things that cAdvisor needs to show Nvidia GPU metrics:
+- access to the NVML library (`libnvidia-ml.so.1`).
+- access to the GPU devices.
+
+If you are running cAdvisor inside a container, you will need to do the following to give the container access to the NVML library:
+```
+-e LD_LIBRARY_PATH=
+--volume :
+```
+
+If you are running cAdvisor inside a container, you can do one of the following to give it access to the GPU devices:
+- Run with `--privileged`
+- If you are on docker v17.04.0-ce or above, run with `--device-cgroup-rule 'c 195:* mrw'`
+- Run with `--device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidia1:/dev/nvidia1 `
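
A minimal sketch of how the flags described in the patch might be assembled for a containerized cAdvisor. The helper name `cadvisor_gpu_flags`, the placeholder `/path/to/nvml`, mounting the NVML directory at the same path inside the container, and the `google/cadvisor:latest` image name are assumptions made for illustration and do not come from the patch. The script only prints the command (a dry run) and leaves out the other volumes and ports a real cAdvisor container needs.

```shell
#!/bin/sh
# Hypothetical helper (not part of cAdvisor or docker): prints the docker
# flags that give a container access to NVML and to the NVIDIA device nodes.
#   $1    host directory containing libnvidia-ml.so.1 (you must supply this)
#   $2..  GPU device nodes to expose, e.g. /dev/nvidia0
cadvisor_gpu_flags() {
    nvml_dir=$1
    shift
    # Assumption: mount the NVML directory at the same path inside the
    # container and add that path to the dynamic loader's search path.
    flags="-e LD_LIBRARY_PATH=${nvml_dir} --volume ${nvml_dir}:${nvml_dir}"
    # docker takes one --device flag per device node; the control node
    # /dev/nvidiactl is always included alongside the requested GPUs.
    for dev in /dev/nvidiactl "$@"; do
        flags="${flags} --device ${dev}:${dev}"
    done
    printf '%s\n' "$flags"
}

# Dry run: print the command instead of executing it.
# /path/to/nvml is a placeholder, not a real location.
echo "docker run $(cadvisor_gpu_flags /path/to/nvml /dev/nvidia0 /dev/nvidia1) google/cadvisor:latest"
```

Listing each device node is only one of the three options in the patch; running with `--privileged` or with `--device-cgroup-rule 'c 195:* mrw'` would take the place of the `--device` flags.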