Commit Graph

98 Commits

Author SHA1 Message Date
Sashank Appireddy
02ecf721f5 Emit number of processes and file descriptors of a container 2018-10-29 16:55:34 -07:00
David Ashpole
c094ef0d2a Merge pull request #1859 from andyxning/reduce_labels_for_container_info
reduce labels for container info
2018-02-21 08:33:17 -08:00
Davanum Srinivas
b1656b253f Fix Warning->Warningf for better logging 2018-02-02 19:19:07 -05:00
Andy Xie
1ccbe6fdd0 reduce labels for container info 2018-01-12 00:14:01 +08:00
Tim Allclair
5b435b4b70 Clean up cAdvisor logging 2017-11-27 19:48:05 -08:00
David Ashpole
3166cdae87 add utils/clock dependency 2017-11-21 16:19:57 -08:00
David Ashpole
3d6ad6dd86 on demand metrics 2017-11-20 14:51:04 -08:00
Rohit Agarwal
4a35130019 Collect container-level GPU metrics using NVML.
When cAdvisor starts up, it would read the `vendor` files in
`/sys/bus/pci/devices/*` to see if any NVIDIA devices (vendor ID: 0x10de) are
attached to the node. If no NVIDIA devices are found, this code path would
become dormant for the rest of cAdvisor lifetime. If NVIDIA devices are found,
we would start a goroutine that would check for the presence of NVML by trying
to dynamically load it at regular intervals. We need to do this regular
checking instead of doing it just once because it may happen that cAdvisor is
started before the NVIDIA drivers and NVML are installed.  Once the NVML
dynamic loading succeeds, we would use NVML’s query methods to find out how
many devices exist on the node and create a map from their minor numbers to
their handles and cache that map. The goroutine would exit at this point.

If we detected the presence of NVML in the previous step, whenever a new
container is detected by cAdvisor, cAdvisor would read the `devices.list` file
from the container's devices cgroup. The `devices.list` file lists the
major:minor number of all the devices that the container is allowed to access.
If we find any device with major number 195 (which is the major number assigned
to NVIDIA devices), we would cache the list of corresponding minor numbers for
that container.

During every housekeeping operation, in addition to collecting all the existing
metrics, we will use the cached NVIDIA device minor numbers and the map from
minor numbers to device handles to get metrics for GPU devices attached to the
container.
2017-11-06 11:54:59 -08:00
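The two detection steps described in commit 4a35130019 can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function names `isNvidiaVendor` and `nvidiaMinors` are hypothetical, and the parser assumes the usual `type major:minor access` layout of `devices.list` lines. Only the vendor ID 0x10de and the major number 195 come from the commit message.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Values quoted in the commit message.
const (
	nvidiaVendorID = "0x10de"
	nvidiaMajor    = 195
)

// isNvidiaVendor reports whether the content of a sysfs `vendor` file
// names the NVIDIA vendor ID. (Hypothetical helper.)
func isNvidiaVendor(vendorFileContent string) bool {
	return strings.TrimSpace(vendorFileContent) == nvidiaVendorID
}

// nvidiaMinors returns the minor numbers of all entries with major
// number 195 in the text of a devices.list file. Entries with a
// wildcard major or minor are skipped. (Hypothetical helper; assumes
// lines of the form "c 195:0 rwm".)
func nvidiaMinors(devicesList string) []int {
	var minors []int
	scanner := bufio.NewScanner(strings.NewReader(devicesList))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 3 {
			continue // not a "type major:minor access" line
		}
		majorMinor := strings.SplitN(fields[1], ":", 2)
		if len(majorMinor) != 2 {
			continue
		}
		major, err := strconv.Atoi(majorMinor[0])
		if err != nil || major != nvidiaMajor {
			continue // wildcard major or a non-NVIDIA device
		}
		minor, err := strconv.Atoi(majorMinor[1])
		if err != nil {
			continue // wildcard minor
		}
		minors = append(minors, minor)
	}
	return minors
}

func main() {
	fmt.Println(isNvidiaVendor("0x10de\n"))
	sample := "c 1:3 rwm\nc 195:0 rwm\nc 195:2 rwm\na *:* rwm\n"
	fmt.Println(nvidiaMinors(sample))
}
```

The cached minor numbers would then be looked up in the minor-to-handle map during housekeeping; that step depends on NVML and is left out of the sketch.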
Seth Jennings
3ba4699c12 skip subcontainer update on v2 calls 2017-08-17 12:36:38 -05:00
Thomas Orozco
2e1f0e2a08 Use a dedicated CpuLoadReader per container
This ensures each goroutine is given its own Netlink connection, and
presumably avoids having a message destined for one goroutine read by
another goroutine.
2016-05-18 09:34:13 +02:00
Vishnu kannan
ae38e6f460 Update docker dependency.
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2016-01-21 14:57:18 -08:00
Timothy St. Clair
3256647668 Add jitter to housekeeping interval 2016-01-16 08:17:46 -06:00
Jimmi Dyson
4e9d29a408 Fix FS usage goroutine leaks 2016-01-14 19:30:48 +00:00
Pavel Tikhomirov
97257ccf61 v2: Fix cgroupPathRegExp to match path after first colon after devices
If getCgroupPath in cgroups sees other hierarchies after "devices",
using ".*" can sometimes match the wrong string as the container path,
so we need a negated character class here: "[^:]*".

e.g.
If cgroups string is
"153:name=systemd:/system.slice/docker-f55e7cad1fcc02f992e0c33c210ecdc6d641858a665f28370523c27c05bdde0e.scope,4:freezer,devices,name=container:/system.slice/docker-f55e7cad1fcc02f992e0c33c210ecdc6d641858a665f28370523c27c05bdde0e.scope,3:cpuacct,cpu,cpuset,name=fairsched:/system.slice/docker-f55e7cad1fcc02f992e0c33c210ecdc6d641858a665f28370523c27c05bdde0e.scope,2:memory:/system.slice/docker-f55e7cad1fcc02f992e0c33c210ecdc6d641858a665f28370523c27c05bdde0e.scope,1:blkio,name=beancounter:/system.slice/docker-f55e7cad1fcc02f992e0c33c210ecdc6d641858a665f28370523c27c05bdde0e.scope"

match[1] will be "blkio" but not:
/system.slice/docker-f55e7cad1fcc02f992e0c33c210ecdc6d641858a665f28370523c27c05bdde0e.scope

This fixes the commit:
4cbd91c761 Make getCgroupPath work in case of named or multi- hierarchies

v2: use negated character class, correct the example, remove .* on
either end as they don't do anything in FindSubmatch.
2015-12-14 10:33:26 +03:00
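The difference between ".*" and "[^:]*" described in commit 97257ccf61 can be reproduced with a small sketch. The two patterns below are assumptions written for illustration; the commit message quotes only the ".*" and "[^:]*" fragments, not the full expression. The sample string is a shortened, made-up variant of the one in the message.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Greedy variant: ".*" after "devices" may run past the first colon.
	greedyRe = regexp.MustCompile(`devices.*:(.*?)[,;$]`)
	// Negated character class: stops at the first colon after "devices".
	negatedRe = regexp.MustCompile(`devices[^:]*:(.*?)[,;$]`)
)

// Shortened sample in which more hierarchies follow "devices".
const sample = "4:freezer,devices,name=container:/docker-abc.scope," +
	"2:memory:/docker-abc.scope," +
	"1:blkio,name=beancounter:/docker-abc.scope"

// cgroupPath returns the first capture group, or "" if nothing matches.
func cgroupPath(re *regexp.Regexp, cgroups string) string {
	m := re.FindStringSubmatch(cgroups)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	fmt.Println(cgroupPath(greedyRe, sample))  // prints "blkio"
	fmt.Println(cgroupPath(negatedRe, sample)) // prints "/docker-abc.scope"
}
```

With the greedy pattern the engine backtracks from the end of the string to the last colon that is still followed by a comma, which is the one after "1", so the capture is "blkio". The negated class cannot cross a colon, so the capture starts right after the devices entry.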
Vishnu kannan
a6daa760c8 Fix goroutine leak in docker fs handler logic.
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2015-12-04 11:19:49 -08:00
Jimmi Dyson
1d6e0ec9bc Merge pull request #1002 from timstclair/loaddecay
Fix usage of housekeeping_interval flag
2015-12-04 12:55:11 +00:00
Tim St. Clair
33216870d8 Fix usage of housekeeping_interval flag
Defer calculation of `loadDecay`. Flags must be parsed before they can
be read, and therefore cannot reliably be read at package init time.
2015-12-03 18:39:11 -08:00
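The problem fixed in commit 33216870d8 can be illustrated with a short sketch: a value derived from a flag before parsing sees only the flag's default. The default of one second, the decay formula, and the names `decayFor` and `decayBeforeAndAfterParse` are assumptions made for this example; only the flag name `housekeeping_interval` and the idea of deferring the calculation come from the commit message.

```go
package main

import (
	"flag"
	"fmt"
	"math"
	"time"
)

// decayFor derives an exponential decay factor from an interval.
// The formula is an assumption made for this sketch.
func decayFor(interval time.Duration) float64 {
	return math.Exp(-interval.Seconds() / 10)
}

// decayBeforeAndAfterParse computes the factor once before the flags
// are parsed (as package init code would) and once afterwards.
func decayBeforeAndAfterParse(args []string) (early, late float64) {
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	interval := fs.Duration("housekeeping_interval", 1*time.Second,
		"interval between housekeeping runs (assumed default)")

	// Too early: only the default value is visible here.
	early = decayFor(*interval)

	if err := fs.Parse(args); err != nil {
		panic(err)
	}

	// Deferred: the value supplied on the command line is visible.
	late = decayFor(*interval)
	return early, late
}

func main() {
	early, late := decayBeforeAndAfterParse(
		[]string{"-housekeeping_interval=10s"})
	fmt.Println(early, late)
}
```

Computing the value on first use, after parsing, is one way to defer it; the sketch does not claim to mirror how the commit itself does so.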
Lei Xue
dbbe38dfed re-order the import package 2015-11-30 16:43:22 +08:00
Jimmi Dyson
b9ff5c098c Fix up ignored/inefficient assigns (via ineffassign linter) 2015-11-27 22:01:54 +00:00
Jimmi Dyson
d1fce20304 Regexp tidy up 2015-11-26 09:14:26 +00:00
Pavel Tikhomirov
4cbd91c761 Make getCgroupPath work in case of named or multi- hierarchies
If the devices hierarchy is mounted in a named cgroup or together
with another hierarchy, the regexp parse will fail.
So between "devices" and ":" there can be the name of a cgroup or
the names of other hierarchies.

E.g.:
1) remount cgroups:
umount /sys/fs/cgroup/devices
mkdir /sys/fs/cgroup/named_cgroup
mount -n -t cgroup -o devices,name=named_cgroup cgroup /sys/fs/cgroup/named_cgroup

2) add some task to nested device cgroup and check ps output
mkdir /sys/fs/cgroup/named_cgroup/test.slice
sleep 1000 &
[1] 22734
echo 22734 > /sys/fs/cgroup/named_cgroup/test.slice/tasks
ps -ao pid,cgroup | grep 22734
22734 14:devices,name=named_cgroup:/test.slice,1:name=systemd:/user.slice/user-1000.slice/session-1.scope

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2015-09-18 17:55:23 +03:00
Jin-Hwan Jeong
9bb7a0278d Fix "high CPU consumption without sleeping in housekeeping()" problem when the system time has changed
Signed-off-by: Jin-Hwan Jeong <jhjeong.kr@gmail.com>
2015-08-18 09:03:12 +09:00
anushree-n
04a78502ca Modify generic collector 2015-08-12 17:56:01 -07:00
Rohit Jnagal
15664a6a0c Fix cgroup name parsing logic in ps output for centos6.
Centos 6 uses a different ps lib resulting in different output.
2015-08-08 02:23:58 +00:00
Rohit Jnagal
ef41402a39 Merge pull request #838 from rjnagal/docker
Add custom metrics to spec.
2015-07-27 16:37:32 -07:00
Rohit Jnagal
c0b3f779f5 Add custom metrics to spec.
Remove spec-related fields from stat.
We can simplify the stats a bit further by handling Int and Float better.
But this was a big enough change already.
Verified v1 and v2 spec/stats/appmetrics APIs.
2015-07-25 20:17:54 +00:00
Victor Marmol
b581ee2e67 Merge pull request #835 from rjnagal/docker
Fix conversion of rss and vsz in ps output.
2015-07-24 10:12:25 -07:00
Rohit Jnagal
a5e65b38c6 Fix conversion of rss and vsz in ps output. 2015-07-24 15:35:58 +00:00
Victor Marmol
ca7fd6d40a Merge pull request #831 from rjnagal/docker
Two small fixes to custom metric collection.
2015-07-22 21:34:23 -07:00
Rohit Jnagal
3f8e065947 Two small fixes to custom metric collection.
- a typo in minPollingFrequency multiplies it with time.Second twice.
- Updating custom metrics is unnecessarily called for all containers.
2015-07-23 01:56:07 +00:00
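The first fix in commit 3f8e065947 concerns a duration that was multiplied by time.Second twice. A minimal sketch of that class of bug follows; the function name `pollingFrequencies` and the value of five seconds are hypothetical and are not taken from the commit.

```go
package main

import (
	"fmt"
	"time"
)

// pollingFrequencies converts a number of seconds to a time.Duration
// twice: once correctly and once with the extra multiplication.
func pollingFrequencies(seconds int64) (buggy, fixed time.Duration) {
	// Correct: one multiplication turns the count into a Duration.
	fixed = time.Duration(seconds) * time.Second
	// Buggy: the value is already in nanoseconds, so multiplying by
	// time.Second again scales it by a further factor of 1e9.
	buggy = fixed * time.Second
	return buggy, fixed
}

func main() {
	buggy, fixed := pollingFrequencies(5)
	fmt.Println("fixed:", fixed)
	fmt.Println("buggy:", buggy)
}
```

Because time.Duration is an integer count of nanoseconds, the compiler accepts both expressions, which is why the mistake is easy to miss.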
Victor Marmol
17c45c6ec3 Merge pull request #827 from rjnagal/docker
Add logic to read custom metric config files from container root.
2015-07-22 13:39:17 -07:00
Rohit Jnagal
a123fd72d8 Add logic to read custom metric config files from container root.
Docker does not provide the rootfs path through docker inspect or statefile
and the path is dependent on the storage driver being used.

Instead of enumerating the storage drivers, we pick a pid from the container
and get the config from /proc/pid/root. Although a bit expensive, this method
works for non-docker containers too.
2015-07-22 15:45:07 +00:00
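The /proc/pid/root approach described in commit a123fd72d8 amounts to resolving a path inside the container through the root link of one of its processes. A minimal sketch, assuming a hypothetical helper `configPathInContainer` and a made-up config location:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
)

// configPathInContainer maps a path as seen inside a container to the
// path through which the host can reach it, via /proc/<pid>/root of
// any process that runs in that container. (Hypothetical helper.)
func configPathInContainer(pid int, pathInContainer string) string {
	return filepath.Join("/proc", strconv.Itoa(pid), "root", pathInContainer)
}

func main() {
	// Both the pid and the config path are made up for illustration.
	fmt.Println(configPathInContainer(1234, "/etc/metrics/config.json"))
}
```

Since the lookup goes through the kernel's view of the process root, it does not depend on which storage driver laid out the container filesystem, which matches the reasoning in the commit message.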
Piotr Szczesniak
90ca5f9286 Moved max_housekeeping and allow_dynamic_housekeeping flags to cadvisor.go 2015-07-21 20:26:57 +02:00
anushree-n
e2e193c1fd Add metrics caching 2015-07-20 11:24:20 -07:00
Victor Marmol
675c09e296 Remove stats from cache when container is destroyed 2015-06-10 07:53:46 -07:00
Rohit Jnagal
eb8f941ba6 Make process listing work when cadvisor is running in docker.
Use /rootfs/proc to build the process listing.
2015-06-04 18:49:43 +00:00
Rohit Jnagal
1a2781819e Separate in-memory cache from storage drivers. 2015-06-02 16:06:01 +00:00
Rohit Jnagal
d8fb3c802f Add cgroup info and links to the process list on root page. 2015-05-20 03:19:56 +00:00
Rohit Jnagal
1ca29f8f20 Improve process table output.
Use pretty prints, but maintain sorting capabilities.
2015-05-19 16:33:56 +00:00
Rohit Jnagal
3bcae7f430 Add memory-percent to ps output. 2015-05-12 22:44:48 +00:00
Rohit Jnagal
2a99748874 Add process information to the UI.
For root, we report all processes. Process stats are refreshed every minute.
2015-05-12 19:08:12 +00:00
Victor Marmol
d61a381e84 Merge pull request #707 from rjnagal/docker
Add an api to support ps/top.
2015-05-11 23:01:24 -07:00
Rohit Jnagal
5e10989a78 Add an api to support ps/top. 2015-05-12 00:06:47 +00:00
Victor Marmol
4fdd709717 Collectors export metrics from Collect(). 2015-05-11 12:26:51 -07:00
Victor Marmol
834d1cf34e Lower logging level of some common logs. 2015-05-06 10:24:50 -07:00
Victor Marmol
bce54ce3f5 Run custom collectors in container housekeeping.
This will allow us to register and run custom collectors for each
container.
2015-05-04 15:57:18 -07:00
Victor Marmol
11462d80bc Lowering log levels.
Reduce common logging using Kubernetes logging standards.
2015-04-13 15:05:41 -07:00
Clayton Coleman
3e7c1b3613 Reduce the level for a few common log messages
This follows the Kubernetes convention that V(2) is normal verbosity
(log each request to a webserver).
2015-03-19 23:55:36 -04:00
Victor Marmol
54bc33dd2c Lowering log level for frequent events.
Lowering all frequent normal logs to v=3. Kubelet runs by default on
debug of v=2 and we don't want to log these events in that case.
2015-03-09 14:53:53 -07:00
Rohit Jnagal
d3db8503f4 Move derived stats to v2. Add v2 container spec. 2015-03-04 18:27:57 +00:00