
287 Commits

Author SHA1 Message Date
James Ravn
e660d8b8b7 Retry docker status on startup
For https://github.com/google/cadvisor/issues/1866.
2018-02-06 10:31:11 +00:00
Davanum Srinivas
b1656b253f Fix Warning->Warningf for better logging 2018-02-02 19:19:07 -05:00
James Ravn
57e17d8be2 Add timeouts for docker queries
As these can otherwise block indefinitely due to docker issues.

This is to fix https://github.com/kubernetes/kubernetes/issues/53207,
where kubelet relies on cadvisor for gathering docker information as
part of its periodic node status update.
2017-12-05 13:50:48 +00:00
Tim Allclair
5b435b4b70 Clean up cAdvisor logging 2017-11-27 19:48:05 -08:00
Tim Allclair
3a40bbfc5c Raise verbosity on runtime registration failure 2017-11-27 19:48:04 -08:00
David Ashpole
3166cdae87 add utils/clock dependency 2017-11-21 16:19:57 -08:00
David Ashpole
3d6ad6dd86 on demand metrics 2017-11-20 14:51:04 -08:00
David Ashpole
6988e70a3d Revert "fix #1708; move from inotify to fsnotify"
This reverts commit e6b6a1ac57.
2017-11-17 10:28:28 -08:00
abhi
6ad15431f4 Integrating containerd to cadvisor
This commit includes changes to integrate containerd
runtime to cadvisor to collect container stats

Signed-off-by: abhi <abhi@docker.com>

Test cases and minor changes

This commit includes test cases and minor fixes
for the same

Signed-off-by: abhi <abhi@docker.com>
2017-11-14 17:37:36 -08:00
Rohit Agarwal
4a35130019 Collect container-level GPU metrics using NVML.
When cAdvisor starts up, it would read the `vendor` files in
`/sys/bus/pci/devices/*` to see if any NVIDIA devices (vendor ID: 0x10de) are
attached to the node. If no NVIDIA devices are found, this code path would
become dormant for the rest of cAdvisor lifetime. If NVIDIA devices are found,
we would start a goroutine that would check for the presence of NVML by trying
to dynamically load it at regular intervals. We need to do this regular
checking instead of doing it just once because it may happen that cAdvisor is
started before the NVIDIA drivers and NVML are installed. Once the NVML
dynamic loading succeeds, we would use NVML’s query methods to find out how
many devices exist on the node and create a map from their minor numbers to
their handles and cache that map. The goroutine would exit at this point.

If we detected the presence of NVML in the previous step, whenever a new
container is detected by cAdvisor, cAdvisor would read the `devices.list` file
from the container's devices cgroup. The `devices.list` file lists the
major:minor number of all the devices that the container is allowed to access.
If we find any device with major number 195 (which is the major number assigned
to NVIDIA devices), we would cache the list of corresponding minor numbers for
that container.

During every housekeeping operation, in addition to collecting all the existing
metrics, we will use the cached NVIDIA device minor numbers and the map from
minor numbers to device handles to get metrics for GPU devices attached to the
container.
2017-11-06 11:54:59 -08:00
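The `devices.list` step from the commit message above could look roughly like this. It is a sketch under stated assumptions, not the code from the commit: `nvidiaMinors` is a hypothetical name, each line is assumed to have the form `type major:minor access` (for example `c 195:0 rwm`), and wildcard entries are simply skipped.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// nvidiaMajor is the major number the commit message gives for NVIDIA devices.
const nvidiaMajor = 195

// nvidiaMinors (hypothetical helper) scans the text of a devices.list file
// and returns the minor numbers of all entries whose major number is 195.
func nvidiaMinors(devicesList string) []int {
	var minors []int
	scanner := bufio.NewScanner(strings.NewReader(devicesList))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 {
			continue // not a "type major:minor access" line
		}
		parts := strings.SplitN(fields[1], ":", 2)
		if len(parts) != 2 {
			continue
		}
		major, err := strconv.Atoi(parts[0])
		if err != nil || major != nvidiaMajor {
			continue // wildcard major or a different device class
		}
		minor, err := strconv.Atoi(parts[1])
		if err != nil {
			continue // wildcard minor such as "195:*"
		}
		minors = append(minors, minor)
	}
	return minors
}

func main() {
	sample := "c 1:3 rwm\nc 195:0 rwm\nc 195:1 rwm\nc 195:* m\na *:* rwm\n"
	fmt.Println(nvidiaMinors(sample)) // prints [0 1]
}
```

The returned minor numbers are what would be cached per container and later looked up in the map from minor numbers to NVML device handles during housekeeping.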
David Ashpole
53820123e6 Merge pull request #1336 from ronnielai/test
Don't rely on the returned value when there's an error
2017-10-24 15:56:55 -07:00
David Ashpole
e6b6a1ac57 fix #1708; move from inotify to fsnotify 2017-09-28 10:57:49 -07:00
Antonio Murdaca
4b002b3bd3 *: add CRI-O handler
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-09-05 17:01:58 +02:00
Yang Guo
a5adaad26d Add an API to get FsStats from filesystem UUID 2017-08-23 12:33:42 -07:00
Seth Jennings
2ab9f25f4e fixup manager after runc bump 2017-08-22 16:38:37 -05:00
Seth Jennings
3ba4699c12 skip subcontainer update on v2 calls 2017-08-17 12:36:38 -05:00
Chris Bui
cdf78981fb Allow finding docker containers by short name
Allow docker containers to be found by a short prefix name to match
the behavior of the docker daemon. This change now matches the
examples on the API docs.

Return an error if the given short name of a container is not unique.
2017-05-18 11:56:48 -05:00
Manjunath A Kumatagi
8fb1158353 Add Docker API version 2017-04-04 10:56:11 +05:30
David Ashpole
696b82ae97 do not log multiple filesystems if root container 2017-01-09 10:55:41 -08:00
David Ashpole
3fec19a10e added getdirfsinfo, which finds fsinfo for the filesystem containing dir 2016-10-21 15:41:18 -07:00
David Ashpole
d6be0547f4 found a much simpler way to fix the go vet problem 2016-10-19 14:54:31 -07:00
David Ashpole
a9b9dbe6be Revert "Merge pull request #1503 from dashpole/configure_root_path"
Undo this commit
This reverts commit 719df516db, reversing
changes made to cae5bfaee6.
2016-10-19 13:47:01 -07:00
David Ashpole
9e47be7bdf Cadvisor allows the RootPath to be configured. The RootPath is used to determine which filesystem is the RootFs. 2016-10-19 10:39:35 -07:00
Jimmi Dyson
041c5af905 Switch to Prometheus decoder 2016-09-22 22:22:07 +01:00
derekwaynecarr
6c114be580 Expose total inodes 2016-08-02 10:47:51 -04:00
derekwaynecarr
cccf9d5fec Allow clients to know if inodes are supported on a filesystem 2016-07-26 11:15:07 -04:00
mwringe
b8b541d86a Update collectors to use a customized httpClient. 2016-07-21 16:00:21 -04:00
Lantao Liu
41e74494b3 Continue watching other directories when there is watch error. 2016-07-18 16:49:47 -07:00
Matt Wringe
6ef612f21e Update collectors to be able to directly access containers by their ip address. 2016-07-14 10:36:53 -04:00
Tim St. Clair
35f7bc5ee7 Move mocks to testing package to remove +build tags
Some go tools (e.g. godef, gorename) don't handle +build tags well, so
I refactored some packages to remove the test tags from cAdvisor.
2016-07-06 14:15:43 -07:00
Ron Lai
29ffb3b6b9 Adding inode info 2016-06-27 11:52:40 -07:00
Ron Lai
7ba1f7e60f Don't rely on the returned value when there's an error 2016-06-24 13:35:42 -07:00
Tim St. Clair
f02ec8a967 Downgrade failure to register runtime factory to warning
It is not an error to fail to register the Docker factory on a system
running only rkt, and vice-versa, so these failures are downgraded from
an Error to a Warning. The raw handler should always be registered.
2016-06-21 13:21:13 -07:00
Tim St. Clair
81786ec1d2 Fix nil-interface partialFailure bug 2016-05-19 16:16:58 -07:00
Thomas Orozco
2e1f0e2a08 Use a dedicated CpuLoadReader per container
This ensures each goroutine is given its own Netlink connection, and
presumably avoids having a message destined for one goroutine read by
another goroutine.
2016-05-18 09:34:13 +02:00
Vish Kannan
05809d5936 Merge pull request #1286 from timstclair/subcontainers2
Make manager multi-container functions robust to partial failures
2016-05-17 13:22:17 -07:00
Shaya Potter
6fa3687720 Polling rkt implementation of new watcher interface (#1284)
polling rkt implementation of new watcher interface
2016-05-17 10:34:56 -07:00
Tim St. Clair
39fe454f32 Make manager multi-container functions robust to partial failures 2016-05-16 13:25:33 -07:00
Shaya Potter
e02632463b Refactor container watching out of raw handler into its own interface / package 2016-05-11 20:27:10 -07:00
Yu-Ju Hong
f695b7cfc8 Print the event when failed to process it 2016-05-02 17:47:41 -07:00
Tim St. Clair
4d3ef349fb Move utils/machine -> machine 2016-05-02 15:56:49 -07:00
Tim St. Clair
f365c6a115 Move docker types to v1 API 2016-05-02 15:52:29 -07:00
Tim St. Clair
9961e37168 Delete unused ManagerMock 2016-05-02 12:24:33 -07:00
Tim St. Clair
0c89fd1b71 Refactor docker-specific functions from manager to docker 2016-05-02 12:24:31 -07:00
Tim St. Clair
b983d32d96 Refactor manager/machine.go -> utils/machine/info.go 2016-05-02 11:54:10 -07:00
Lantao Liu
ece4c555cc switch to the new engine-api 2016-04-25 19:22:05 -07:00
derekwaynecarr
d01934a3e4 on systemd, we should ignore .mount cgroups 2016-04-20 23:47:19 -04:00
Tim St. Clair
b553e02476 Fix cAdvisor docker validation 2016-04-08 17:05:26 -07:00
Tim St. Clair
d9c864324b Fix usage of the latest go-dockerclient 2016-04-04 18:01:47 -07:00
Shaya Potter
206670a655 first cut of rkt handler 2016-03-21 17:34:42 -07:00