Commit Graph

81 Commits

Author SHA1 Message Date
WanLinghao
4eab5b671e Add support to disable diskIO metrics 2019-01-15 09:43:33 +08:00
Davanum Srinivas
4da6d809be
Move from glog to klog
Change-Id: Ic92f57c2d7f268d8d985797974883c1a537d6993
2018-11-08 18:06:28 -05:00
Sashank Appireddy
da29418c31 cache process metrics 2018-11-06 13:29:14 -08:00
David Ashpole
2fa6c624a2
Merge pull request #2034 from usabilla/mapped_file
Adds mapped_file metric
2018-08-29 10:25:29 -07:00
Gijs Kunze
9e175e9ea9 Adds mapped_file metric 2018-08-09 15:14:46 +02:00
Valentyn Boginskey
b09b68c4a9 Fix cache reporting with cgroup hierarchy 2018-07-28 07:20:42 -04:00
David Ashpole
c225d06adf don't emit prometheus metrics that are ignored 2018-07-09 13:17:49 -07:00
nielsole
08f0c2397c Adding /proc/<pid>/schedstat (#1872)
Add /proc/<pid>/schedstat metrics for scheduler metrics
2018-03-08 09:27:06 -08:00
David Ashpole
e1d602d7af create libcontainer handler for common code 2018-02-21 08:53:42 -08:00
Bryan Boreham
ec6da3acae Prometheus metrics: optionally export total CPU instead of per-CPU
Per-CPU stats are more expensive to transport and store, and that
level of detail is not required in many cases.

We export overall total cpu in the same metric as per-cpu, so that
dashboards which previously summed over cpu will work identically.
2018-02-20 13:58:44 +00:00
Euan Kemp
1ecd24ea8d libcontainer: Use first cgroup subsystem found (#1792)
2017-11-06 15:33:59 -08:00
Rohit Agarwal
4a35130019 Collect container-level GPU metrics using NVML.
When cAdvisor starts up, it would read the `vendor` files in
`/sys/bus/pci/devices/*` to see if any NVIDIA devices (vendor ID: 0x10de) are
attached to the node. If no NVIDIA devices are found, this code path would
become dormant for the rest of cAdvisor lifetime. If NVIDIA devices are found,
we would start a goroutine that would check for the presence of NVML by trying
to dynamically load it at regular intervals. We need to do this regular
checking instead of doing it just once because it may happen that cAdvisor is
started before the NVIDIA drivers and NVML are installed. Once the NVML
dynamic loading succeeds, we would use NVML’s query methods to find out how
many devices exist on the node and create a map from their minor numbers to
their handles and cache that map. The goroutine would exit at this point.

If we detected the presence of NVML in the previous step, whenever a new
container is detected by cAdvisor, cAdvisor would read the `devices.list` file
from the container's devices cgroup. The `devices.list` file lists the
major:minor number of all the devices that the container is allowed to access.
If we find any device with major number 195 (which is the major number assigned
to NVIDIA devices), we would cache the list of corresponding minor numbers for
that container.

During every housekeeping operation, in addition to collecting all the existing
metrics, we will use the cached NVIDIA device minor numbers and the map from
minor numbers to device handles to get metrics for GPU devices attached to the
container.
2017-11-06 11:54:59 -08:00
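
The `devices.list` step described in the commit above can be illustrated with a minimal Go sketch. It is not the code from the commit: the helper name `nvidiaMinors` is hypothetical, and the sketch assumes each line has the form `type major:minor access` (for example `c 195:0 rwm`), skipping wildcard and malformed entries.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// nvidiaMajor is the major number the commit message gives for NVIDIA devices.
const nvidiaMajor = 195

// nvidiaMinors is a hypothetical helper (not cAdvisor's actual function).
// It scans devices.list content and returns the minor numbers of entries
// whose major number is 195. Entries with a wildcard ("*") or otherwise
// non-numeric major/minor are skipped in this sketch.
func nvidiaMinors(devicesList string) []int {
	var minors []int
	scanner := bufio.NewScanner(strings.NewReader(devicesList))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 3 {
			continue // assumed layout: "<type> <major>:<minor> <access>"
		}
		parts := strings.SplitN(fields[1], ":", 2)
		if len(parts) != 2 {
			continue
		}
		major, err := strconv.Atoi(parts[0])
		if err != nil || major != nvidiaMajor {
			continue
		}
		minor, err := strconv.Atoi(parts[1])
		if err != nil {
			continue
		}
		minors = append(minors, minor)
	}
	return minors
}

func main() {
	sample := "c 1:3 rwm\nc 195:0 rwm\nc 195:1 rwm\nc 195:* rwm\n"
	fmt.Println(nvidiaMinors(sample)) // prints [0 1]
}
```

The returned minor numbers would then be looked up in the cached minor-to-handle map during housekeeping, as the commit message describes.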
Euan Kemp
587691c7f3 libcontainer: ignore nil cpustats
Cadvisor can inotify watch for new cgroups. This leads to it racing
fairly tightly with cgroup creation... So tightly, that sometimes
cpustats are nil.

The runc library code we call
(https://github.com/opencontainers/runc/blob/v1.0.0-rc4/libcontainer/cgroups/fs/apply_raw.go#L179-L182)
doesn't actually consider this an error, so we have to handle that
scenario ourselves.

This fixes https://github.com/google/cadvisor/issues/1765
2017-10-20 13:08:23 -07:00
Derek Carr
9ea61176bf Expose memory.max_usage_in_bytes in container stats 2017-10-10 17:31:31 -04:00
Euan Kemp
d2e11efba2 libcontainer: use real number of CPUs for usage
As of the 4.7 kernel, the cpustats field returned from libcontainer
contains values for every possible cpu (including nonexistent ones).
The extra values are all 0s.

If we assume that hotplug events won't happen, we can get a more
accurate cpu count by using runtime.NumCPU and then ignoring any values
beyond that.
2017-08-30 14:26:26 -07:00
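
As a rough illustration of the idea in the commit above, the following Go sketch drops per-CPU values beyond the real CPU count. The helper name `trimPerCPU` and the sample values are hypothetical, and the sketch carries over the commit's assumption that hotplug events do not happen.

```go
package main

import (
	"fmt"
	"runtime"
)

// trimPerCPU is a hypothetical helper (not cAdvisor's actual function).
// It keeps only the first numCPU entries of a per-CPU usage slice and
// ignores the trailing entries reported for nonexistent CPUs.
func trimPerCPU(perCPU []uint64, numCPU int) []uint64 {
	if numCPU < 0 {
		numCPU = 0
	}
	if len(perCPU) > numCPU {
		return perCPU[:numCPU]
	}
	return perCPU
}

func main() {
	// Made-up sample: two real CPUs followed by zero-valued padding.
	usage := []uint64{120, 80, 0, 0, 0, 0, 0, 0}
	fmt.Println(trimPerCPU(usage, 2)) // prints [120 80]

	// In practice the count would come from runtime.NumCPU.
	fmt.Println(len(trimPerCPU(usage, runtime.NumCPU())))
}
```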
Derek Carr
6fa48d9048 Expose total_rss when hierarchy is enabled 2017-08-23 14:56:59 -04:00
Derek Carr
d493f11f0b Reduce log spam when unable to get network stats 2017-08-18 16:11:03 -04:00
Tristan Colgate
227bb3a79d Add udp and udp6 network statistics 2017-04-10 20:41:51 +01:00
derekwaynecarr
b84046f12c Look at all cgroup mounts 2016-09-22 15:34:59 -04:00
Florian Koch
3ce98a46c4 add cgroup swap usage and export as prometheus metric 2016-08-09 07:33:37 +02:00
Tobias Schmidt
1653733ea7 Expose cpu cgroup CFS prometheus metrics
If CPU quota is configured (cpu.cfs_quota != -1) the CFS will provide
stats about elapsed periods and throttling in cpu.stats. This change
makes this information available as container_cpu_cfs_* metrics.
2016-08-06 18:08:26 -04:00
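
For context on the commit above, here is a minimal Go sketch of reading CFS statistics. It assumes the cgroup v1 layout in which the stats file contains `key value` lines with the keys `nr_periods`, `nr_throttled`, and `throttled_time`. The helper name `parseCPUStat` and the sample values are hypothetical, and the mapping from these keys to the `container_cpu_cfs_*` metric names is not shown.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseCPUStat is a hypothetical helper (not cAdvisor's actual function).
// It parses "key value" lines into a map and skips lines that do not
// match that shape or whose value is not an unsigned integer.
func parseCPUStat(content string) map[string]uint64 {
	stats := make(map[string]uint64)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 2 {
			continue
		}
		value, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			continue
		}
		stats[fields[0]] = value
	}
	return stats
}

func main() {
	// Made-up sample content for illustration only.
	sample := "nr_periods 100\nnr_throttled 7\nthrottled_time 3500000\n"
	stats := parseCPUStat(sample)
	fmt.Println(stats["nr_periods"], stats["nr_throttled"], stats["throttled_time"])
}
```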
Michael Taufen
307d1b1cb3 Modify working set memory stats calculation
Change working set calculation to usage - total_inactive_file, rather than
usage - total_inactive_anon - total_inactive_file. Since writes to tmpfs
get tracked as total_inactive_anon when swap is disabled, the old
calculation would under-report memory pressure.

See this Kubernetes issue for context:
https://github.com/kubernetes/kubernetes/issues/28619
2016-07-15 10:58:25 -07:00
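
The revised calculation in the commit above can be written as a short Go sketch. The function name `workingSet` is hypothetical, and clamping at zero is an assumption added here to avoid unsigned underflow; the commit message itself only gives the formula usage - total_inactive_file.

```go
package main

import "fmt"

// workingSet is a hypothetical helper illustrating the formula from the
// commit message: usage - total_inactive_file. The clamp to zero is an
// assumption of this sketch, guarding against unsigned underflow.
func workingSet(usage, totalInactiveFile uint64) uint64 {
	if usage < totalInactiveFile {
		return 0
	}
	return usage - totalInactiveFile
}

func main() {
	// Made-up values in bytes, for illustration only.
	fmt.Println(workingSet(1000, 300)) // prints 700
}
```

Because total_inactive_anon is no longer subtracted, tmpfs pages counted there stay in the working set, which is the under-reporting the commit aims to fix.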
Tim St. Clair
4c506006f2 Don't validate docker state file, since it's no longer used 2016-05-06 19:29:24 -07:00
Tim St. Clair
dc6415aef7 Check docker container existence the same way as raw & rkt 2016-04-15 11:35:31 -07:00
Tim St. Clair
4a8f3e4c93 Read docker container spec from cgroupfs, rather than libcontainer spec 2016-04-14 17:10:03 -07:00
Tim St. Clair
7b1820b1d4 Look for container state in containerd path 2016-04-13 15:09:08 -07:00
Vishnu kannan
e2717d8bb7 Avoid collecting network stats for non root cgroups in raw handler.
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2016-03-15 12:16:11 -07:00
Vishnu kannan
36415f465a Support opt out for metrics.
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2016-02-24 15:57:31 -08:00
Jimmi Dyson
33386f899b bump(github.com/opencontainers/runc/libcontainer)
Fixes issues with breaking changes to `GetPids` which is affecting
downstream consumers of cadvisor (e.g. Kubernetes).
2016-01-26 09:46:59 +00:00
Shimin Guo
a26b58ec8e expose page cache size 2016-01-15 08:45:51 -08:00
Shimin Guo
1a867bdadd expose RSS 2016-01-15 08:45:51 -08:00
Lei Xue
15b34b0131 add test case for compatibility.go 2015-12-02 11:01:50 +08:00
Lei Xue
7343ae4583 fix unmarshal container config failure with Docker 1.8.3 2015-12-02 11:01:12 +08:00
Lei Xue
dbbe38dfed re-order the import package 2015-11-30 16:43:22 +08:00
Jimmi Dyson
82810f13cd Remove unused code (via deadcode linter) 2015-11-27 21:48:33 +00:00
Jimmi Dyson
360c73c6fd Improve perf of interface stats parsing 2015-11-27 14:12:41 +00:00
Jimmi Dyson
f9eb56e800 Merge pull request #966 from afein/godep_update_runc
[Godeps] changed docker/libcontainer dependency to runc/libcontainer
2015-11-26 15:19:28 +00:00
Jimmi Dyson
d1fce20304 Regexp tidy up 2015-11-26 09:14:26 +00:00
Alex Mavrogiannis
4533dd7d18 changed libcontainer dependency to runc 2015-11-21 14:04:01 -08:00
Jimmi Dyson
561cc1da4f Use file reader directly for net stats 2015-10-28 12:51:19 +00:00
Jimmi Dyson
c72e0c23a5 Add test for net dev stats 2015-10-28 12:51:13 +00:00
Jimmi Dyson
da771a0977 Drop regexp for net stats parsing
Reported in kubernetes/kubernetes#16296
2015-10-27 20:16:49 +00:00
Jimmi Dyson
8b6e002e0a Disable tcp stats collection
Fixes #938
2015-10-22 21:05:46 +01:00
Jimmi Dyson
5a5d0575f5 Docker, libcontainer, docker client bumps 2015-10-20 09:22:12 +01:00
Tomas Kral
bd61caf0c3 add failcnt 2015-10-02 14:24:22 +02:00
Florian Koch
e4262b91b1 move TCP and TCP6 stats to NetworkStats 2015-09-25 09:04:53 +02:00
Florian Koch
dd041457b5 some fixes 2015-09-24 15:44:42 +02:00
Florian Koch
c331982f21 add tcp/tcp6 statistics 2015-09-24 15:44:42 +02:00
Jimmi Dyson
7e10398a50 Use proc fs to get network stats.
Reasons discussed in
https://github.com/google/cadvisor/issues/822#issuecomment-135811901 &
following.
2015-08-29 00:20:07 +01:00
Jimmi Dyson
d5fa97c998 Get network stats by switching network namespace on newer Docker versions.

Fixes #822
2015-08-25 23:27:01 +01:00