Rohit Agarwal 4a35130019 Collect container-level GPU metrics using NVML.
When cAdvisor starts up, it would read the `vendor` files in
`/sys/bus/pci/devices/*` to see if any NVIDIA devices (vendor ID: 0x10de) are
attached to the node. If no NVIDIA devices are found, this code path would
become dormant for the rest of cAdvisor's lifetime. If NVIDIA devices are found,
we would start a goroutine that would check for the presence of NVML by trying
to dynamically load it at regular intervals. We need to check repeatedly rather
than just once because cAdvisor may be started before the NVIDIA drivers and
NVML are installed. Once the NVML
dynamic loading succeeds, we would use NVML’s query methods to find out how
many devices exist on the node and create a map from their minor numbers to
their handles and cache that map. The goroutine would exit at this point.

If we detected the presence of NVML in the previous step, whenever a new
container is detected by cAdvisor, cAdvisor would read the `devices.list` file
from the container's devices cgroup. The `devices.list` file lists the
major:minor number of all the devices that the container is allowed to access.
If we find any device with major number 195 (which is the major number assigned
to NVIDIA devices), we would cache the list of corresponding minor numbers for
that container.

During every housekeeping operation, in addition to collecting all the existing
metrics, we will use the cached NVIDIA device minor numbers and the map from
minor numbers to device handles to get metrics for GPU devices attached to the
container.
2017-11-06 11:54:59 -08:00
accelerators Collect container-level GPU metrics using NVML. 2017-11-06 11:54:59 -08:00
api Add an API to get FsStats from filesystem UUID 2017-08-23 12:33:42 -07:00
build Vendor Go bindings for NVML. Don't build a static binary. 2017-11-01 14:41:35 -07:00
cache re-order the import package 2015-11-30 16:43:22 +08:00
client new client with http timeout 2017-04-14 10:20:36 +08:00
collector Update prometheus_collector.go 2017-08-24 08:54:27 +08:00
container Collect container-level GPU metrics using NVML. 2017-11-06 11:54:59 -08:00
deploy fix builds 2017-09-07 10:47:04 -07:00
devicemapper update runc godep 2017-08-25 13:42:26 -07:00
docs Add runtime options for TLS support 2017-08-24 15:14:33 +02:00
events Don't create a EventStore if the event limit is set to 0 2016-04-25 16:50:17 -07:00
fs Stop AfterFunc timer after findCmd.Wait regardless of errors to prevent memory leak 2017-10-24 15:18:07 +02:00
Godeps Vendor Go bindings for NVML. Don't build a static binary. 2017-11-01 14:41:35 -07:00
healthz Fix imported package names to not use mixedCaps or under_scores 2015-10-22 12:10:57 +08:00
http Cleanup cAdvisor error responses with proper headers & response codes 2017-08-10 12:32:25 -07:00
info Add accelerator metrics to the API. 2017-11-01 14:41:35 -07:00
integration [docker] add overlay2 storage driver 2017-04-13 13:39:45 +02:00
machine Simplify Utsname string conversion 2017-10-31 12:14:10 +01:00
manager Collect container-level GPU metrics using NVML. 2017-11-06 11:54:59 -08:00
metrics Add accelerator metrics to the API. 2017-11-01 14:41:35 -07:00
pages Cleanup cAdvisor error responses with proper headers & response codes 2017-08-10 12:32:25 -07:00
storage feat: handling retention policy in influxdb 2017-04-28 21:13:00 +02:00
summary Export type to calculate percentiles 2015-07-21 17:52:01 -07:00
utils fix #1743; move off of docker/engine-api 2017-09-28 11:05:13 -07:00
validate fix #1743; move off of docker/engine-api 2017-09-28 11:05:13 -07:00
vendor Vendor Go bindings for NVML. Don't build a static binary. 2017-11-01 14:41:35 -07:00
version Simplify cAdvisor release versioning 2016-06-29 18:27:07 -07:00
zfs Add watcher for zfs similar to devicemapper 2017-03-15 18:31:11 -04:00
.gitignore Gitignore Files generated by JetBrains IDEs 2017-03-18 16:51:36 +05:30
AUTHORS Remove mention of contributors file. We don't have one. 2014-12-30 17:16:46 +00:00
cadvisor_test.go Add udp and udp6 network statistics 2017-04-10 20:41:51 +01:00
cadvisor.go Add udp and udp6 network statistics 2017-04-10 20:41:51 +01:00
CHANGELOG.md 0.27.1 changelog 2017-09-06 16:09:30 -07:00
CONTRIBUTING.md Add CONTRIBUTING.md 2014-06-10 13:09:14 -07:00
LICENSE Migrating cAdvisor code from lmctfy 2014-06-09 12:12:07 -07:00
logo.png Run PNG crusher on logo.png 2016-02-10 15:02:44 -08:00
Makefile Minor cleanup. 2017-09-27 00:22:11 -07:00
README.md Minor cleanup. 2017-09-27 00:22:11 -07:00
storagedriver.go Inline storageDriver usage & fix error message 2016-05-05 17:13:41 -07:00
test.htdigest Added HTTP Auth and HTTP Digest authentication #302 2014-12-11 17:25:43 +05:30
test.htpasswd Added HTTP Auth and HTTP Digest authentication #302 2014-12-11 17:25:43 +05:30

cAdvisor

cAdvisor (Container Advisor) provides container users an understanding of the resource usage and performance characteristics of their running containers. It is a running daemon that collects, aggregates, processes, and exports information about running containers. Specifically, for each container it keeps resource isolation parameters, historical resource usage, histograms of complete historical resource usage and network statistics. This data is exported by container and machine-wide.

cAdvisor has native support for Docker containers and should support just about any other container type out of the box. We strive for support across the board, so feel free to open an issue if that is not the case. cAdvisor's container abstraction is based on lmctfy's, so containers are inherently nested hierarchically.

Quick Start: Running cAdvisor in a Docker Container

To quickly try out cAdvisor on your machine with Docker, we have a Docker image that includes everything you need to get started. You can run a single cAdvisor to monitor the whole machine. Simply run:

sudo docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest

cAdvisor is now running (in the background) on http://localhost:8080. The setup includes directories with Docker state cAdvisor needs to observe.

Note: If you're running on CentOS, Fedora, or RHEL, or are using LXC, take a look at our running instructions.

We have detailed instructions on running cAdvisor standalone outside of Docker. cAdvisor running options may also be interesting for advanced use cases. If you want to build your own cAdvisor Docker image, see our deployment page.

Building and Testing

See the more detailed instructions in the build page. This includes instructions for building and deploying the cAdvisor Docker image.

Exporting stats

cAdvisor supports exporting stats to various storage plugins. See the documentation for more details and examples.

Web UI

cAdvisor exposes a web UI at its port:

http://<hostname>:<port>/

See the documentation for more details.

Remote REST API & Clients

cAdvisor exposes its raw and processed stats via a versioned remote REST API. See the API's documentation for more information.

There is also an official Go client implementation in the client directory. See the documentation for more information.

Roadmap

cAdvisor aims to improve the resource usage and performance characteristics of running containers. Today, we gather and expose this information to users. In our roadmap:

  • Advise on the performance of a container (e.g., when it is being negatively affected by another, when it is not receiving the resources it requires, etc.)
  • Auto-tune the performance of the container based on previous advice.
  • Provide usage prediction to cluster schedulers and orchestration layers.

Community

Contributions, questions, and comments are all welcomed and encouraged! cAdvisor developers hang out on Slack in the #sig-node channel (get an invitation here). We also have the kubernetes-users Google Groups mailing list.